MR is performed using a set of genetic instruments that are strongly associated with one trait - ‘Trait 1’. To find out if Trait 1 affects another trait - ‘Trait 2’ - MR can be applied to see if variants strongly associated with Trait 1 are proportionally associated with Trait 2.
There are three main assumptions that MR relies on:
It is important to note that even if a casual effect is identified by MR, this finding should be replicated and validated using different datasets and different methods. For example, if a causal effect was identified between a molecular trait and a disease outcome, RNAseq or proteomic analysis of tissues from patients with the disease could aid validation of an association between increased/decreased molecular levels and the disease.
MR can be performed on any two traits that have been analysed using a GWAS. The following example looks at whether plasma triglyceride level might have a causal effect on systolic blood pressure.
There are two useful R packages for performing MR -
library(RadialMR)
library(TwoSampleMR)
First both GWAS datasets are loaded into R, and any information that is missing is added:
# Set wd
setwd(('/Users/alicesmail/Desktop/KCL/LCV/TriGBPs'))
# Get GWAS data
options(datatable.fread.datatable=FALSE)
TriG <- data.table::fread("./GWASFiles/TG_with_Effect.tbl", header=TRUE, sep="\t", stringsAsFactors = FALSE)
sBP <- data.table::fread("./GWASFiles/GCST90029011_buildGRCh37.tsv", header=TRUE, sep="\t", stringsAsFactors = FALSE)
# Add phenotype information
TriG$Phenotype <- 'TriG'
TriG$samplesize <- 96598
sBP$Phenotype <- 'sBP'
Next the exposure GWAS is filtered to obtain SNPs that are strongly associated with plasma triglyceride level:
# Get instrumental variables for TriG
TriGSig <- TriG %>% filter(GC.Pvalue <= 1e-8)
The TwoSampleMR function
# Format exposure data
TriGMR <- TwoSampleMR::format_data(snps=NULL, type='exposure', TriGSig, header=TRUE,
phenotype_col='Phenotype', snp_col='MarkerName',
beta_col='Effect', se_col='StdErr',
effect_allele_col='Allele1', other_allele_col='Allele2',
pval_col='GC.Pvalue',
samplesize='samplesize')
# LD clumping
options(ieugwasr_api='gwas-api.mrcieu.ac.uk/')
TriGMRClumped <- clump_data(TriGMR, clump_kb=500, clump_r2=0.01)
Another step to take here is removing any
# Get SNPs in region
MHC <- sBP %>%
filter(chromosome==6) %>%
filter(base_pair_location >= 28510120 & base_pair_location <= 33480577)
MHCSNP <- MHC %>% dplyr::select(variant_id)
# Remove
sBPNoMHC <- sBP %>% filter(!variant_id %in% unlist(MHCSNP))
Next use TwoSampleMR to format the outcome GWAS:
# Format outcome data
sBPMR <- TwoSampleMR::format_data(snps=TriGMRClumped$SNP, type='outcome', sBPNoMHC, header=TRUE,
phenotype_col='Phenotype', snp_col='variant_id',
beta_col='beta', se_col='standard_error',
effect_allele_col='effect_allele', other_allele_col='other_allele',
pval_col='p_value',
samplesize='n')
# Harmonise
harm <- harmonise_data(exposure_dat=TriGMRClumped, outcome_dat=sBPMR)
# MR
res <- mr(harm)
knitr::kable(res)
id.exposure | id.outcome | outcome | exposure | method | nsnp | b | se | pval |
---|---|---|---|---|---|---|---|---|
1QhhMh | YcFEc9 | sBP | TriG | MR Egger | 23 | -0.0138922 | 0.0385138 | 0.7219214 |
1QhhMh | YcFEc9 | sBP | TriG | Weighted median | 23 | 0.0301564 | 0.0145819 | 0.0386332 |
1QhhMh | YcFEc9 | sBP | TriG | Inverse variance weighted | 23 | 0.0308218 | 0.0208680 | 0.1396787 |
1QhhMh | YcFEc9 | sBP | TriG | Simple mode | 23 | 0.0038519 | 0.0290564 | 0.8957403 |
1QhhMh | YcFEc9 | sBP | TriG | Weighted mode | 23 | 0.0531585 | 0.0147576 | 0.0015834 |
The results from the IVW method suggest that there is not a causal relationship between plasma triglyceride level and systolic blood pressure, with a beta effect that is not significantly different from 0.
After performing MR, different sensitivity methods can be implemented to
assess
# Heterogeneity
het <- mr_heterogeneity(harm)
# Single SNP
singleSNP <- mr_singlesnp(harm)
I2 <- Isq(singleSNP$b, singleSNP$se)
# Pleiotropy
pleio <- mr_pleiotropy_test(harm)
# Print
knitr::kable(het)
id.exposure | id.outcome | outcome | exposure | method | Q | Q_df | Q_pval |
---|---|---|---|---|---|---|---|
1QhhMh | YcFEc9 | sBP | TriG | MR Egger | 150.1872 | 21 | 0 |
1QhhMh | YcFEc9 | sBP | TriG | Inverse variance weighted | 163.6190 | 22 | 0 |
knitr::kable(pleio)
id.exposure | id.outcome | outcome | exposure | egger_intercept | se | pval |
---|---|---|---|---|---|---|
1QhhMh | YcFEc9 | sBP | TriG | 0.0036629 | 0.0026728 | 0.1850226 |
message('I2: ', round(I2[1], 4))
## I2: 0.8545
These sensitivity analyses show that this MR analysis has
substantial heterogeneity - the Q value is very large and significant,
and an I2 of
A mean F-statistic for the instrumental variables can assess their strength:
# Individual f-statistics
dat <- harm %>% filter(mr_keep=TRUE)
total_F <- 0
num <- 0
for (row in 1:nrow(dat)){
SNP = dat[row,]
Fstat <- (SNP$beta.exposure**2)/(SNP$se.exposure**2)
total_F <- total_F+Fstat
num <- num+1
}
# Mean f-statistic
FstatMean <- total_F/nrow(dat)
message(round(FstatMean,3))
## 146.025
As well as sensitivity analyses, there are some plots that can be used to inspect the MR results.
# Scatter plot
scatter <- mr_scatter_plot(res, harm)
scatter[[1]] + theme + guides(color=guide_legend(ncol=1))
This
Another useful plot is the
# LOO plot
LOO <- mr_leaveoneout(harm, method=mr_ivw_fe)
mr_leaveoneout_plot(LOO)[[1]] + theme + theme(legend.position="none")
Something else to do is examine if there are any
# Identify outliers
radial <- harm %>% filter(mr_keep==TRUE) %>% dat_to_RadialMR
radialIVW <- ivw_radial(radial[[1]], alpha=0.05/nrow(radial[[1]]))
##
## Radial IVW
##
## Estimate Std.Error t value Pr(>|t|)
## Effect (Mod.2nd) 0.03090395 0.020873880 1.480508 1.387377e-01
## Iterative 0.03090438 0.020873910 1.480527 1.387328e-01
## Exact (FE) 0.03290962 0.007680747 4.284689 1.829946e-05
## Exact (RE) 0.03110524 0.018924598 1.643641 1.144669e-01
##
##
## Residual standard error: 2.719 on 22 degrees of freedom
##
## F-statistic: 2.19 on 1 and 22 DF, p-value: 0.153
## Q-Statistic for heterogeneity: 162.6372 on 22 DF , p-value: 1.914582e-23
##
## Outliers detected
## Number of iterations = 3
# Radial plot
plot_radial(radialIVW, radial_scale=FALSE, show_outliers=TRUE) + theme
Then we can remove the outliers we identified and see how our results are affected.
# Remove outliers
outliers <- radialIVW$outliers
TriGMRClumpedrmOutliers <- TriGMRClumped %>% filter(!SNP %in% outliers$SNP)
# Re-harmonise data
harm <- harmonise_data(exposure_dat=TriGMRClumpedrmOutliers, outcome_dat=sBPMR)
# MR
res <- mr(harm)
knitr::kable(res)
id.exposure | id.outcome | outcome | exposure | method | nsnp | b | se | pval |
---|---|---|---|---|---|---|---|---|
1QhhMh | YcFEc9 | sBP | TriG | MR Egger | 19 | 0.0383116 | 0.0280671 | 0.1900468 |
1QhhMh | YcFEc9 | sBP | TriG | Weighted median | 19 | 0.0348414 | 0.0147743 | 0.0183620 |
1QhhMh | YcFEc9 | sBP | TriG | Inverse variance weighted | 19 | 0.0250642 | 0.0142645 | 0.0789006 |
1QhhMh | YcFEc9 | sBP | TriG | Simple mode | 19 | 0.0027901 | 0.0368017 | 0.9404040 |
1QhhMh | YcFEc9 | sBP | TriG | Weighted mode | 19 | 0.0599157 | 0.0161682 | 0.0016178 |
As these outliers are identified by their contribution to the overall heterogeneity of the analysis, when they are removed the overall heterogeneity should be reduced:
# Heterogeneity
het <- mr_heterogeneity(harm)
# Single SNP
singleSNP <- mr_singlesnp(harm)
I2 <- Isq(singleSNP$b, singleSNP$se)
# Print
knitr::kable(het)
id.exposure | id.outcome | outcome | exposure | method | Q | Q_df | Q_pval |
---|---|---|---|---|---|---|---|
1QhhMh | YcFEc9 | sBP | TriG | MR Egger | 56.27195 | 17 | 4.3e-06 |
1QhhMh | YcFEc9 | sBP | TriG | Inverse variance weighted | 57.28027 | 18 | 5.6e-06 |
message('I2: ', round(I2[1], 4))
## I2: 0.6521
While the heterogeneity is not as extreme as before, it is still substantial, meaning the results from this analysis may be biased.
In conclusion, while the IVW results from this analysis are borderline significant (p=0.08), there isn't enough evidence to say that plasma triglyceride levels causally effect systolic blood pressure. Additionally, even after removing outliers there is a substantial amount of heterogeneity in the effects of each SNP on each trait, which makes these results less reliable.
Some positives of this analysis are that the instrumental variables are sufficiently strong (F-stat=146), and that the exposure trait (plasma triglyceride levels) could be sufficiently modelled in MR by several strongly associated SNPs.
The LCV model is a similar method to MR that aims to identify causal effects between two traits using GWAS data. The results from applying the LCV model to the same GWAS datasets are here.
As the LCV model takes into account all SNPs from the input GWAS, it may be better powered to detect causal effects: MR only uses selected variants associated with exposure trait. It may be the case that this MR analysis was underpowered, relative to applying LCV, to detect a casual effect.