This chapter is a work in progress. The idea is to explore BLUPs in more depth than in the previous two chapters.
For a model defined as
\[ y_{ij} = \mu + \theta_i + e_{ij} \] The prediction (in the balanced case with constant within-study variance) is defined as:
\[ \tilde{\theta}_i = \lambda\hat{\mu} + (1 - \lambda)\hat{\theta}_i, \qquad \lambda = \frac{\sigma^2}{\sigma^2+\tau^2} \] This implies that when the within-trial variance (\(\sigma^2\)) is high relative to the between-trial variance (\(\tau^2\)), \(\lambda\) approaches 1 and the prediction for an individual trial, \(\tilde{\theta}_i\), is pulled toward the overall mean (\(\hat{\mu}\)).
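A quick numerical check makes this explicit (the variance values below are made up for illustration):

## hypothetical variances; lambda is the weight given to the overall mean
sigma2 <- c(0.5, 1, 4, 16)
tau2 <- 1
lambda <- sigma2 / (sigma2 + tau2)
## with mu-hat = 0 and theta-hat = 2 the prediction shrinks toward zero
mu.hat <- 0; theta.hat <- 2
round(lambda * mu.hat + (1 - lambda) * theta.hat, 2)
## [1] 1.33 1.00 0.40 0.12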
Higgins et al. (2009) state that:
“Attempts to use the empirical distribution of observed effect sizes, \(\hat{\theta}_i\), to predict future effects should be avoided. The distribution of the estimates is overdispersed, since the true variance of an individual \(\hat{\theta}_i\) around the population mean \(\mu\) is \(\tau^2 + \sigma_i^2\), not \(\tau^2\). Further, the sampling errors (\(\sigma_i^2\)) may be substantially different for different studies. A histogram of observed results, for example, may therefore be misleading.”
An important observation when looking at data from on-farm research networks (OFRNs) is that it is tempting to consider only the studies which had a positive response. For example, this was done in the summary of fungicide application data at https://www.pioneer.com/home/site/us/agronomy/library/foliar-fungicide-insecticide-soybeans/ where it was stated that 83% of the trials had a positive response and the average response was 2.6 bu/ac. This approach ignores the uncertainty of each individual trial. We might say “8 out of 10 trials had a positive response to the treatment”, but the amount of uncertainty matters: it could be that none of those 8 trials had a confidence interval excluding zero, or that all of them did. The reverse can happen in a meta-analysis: even when none of the individual trials are significant, the aggregate mean over all trials can be significantly different from zero, although in that case the magnitude of the difference is likely to be small.

With OFRNs we rarely expect to find large differences in magnitude, so quantifying uncertainty is crucially important. The ultimate goal is to be able to make a statement such as: “This practice has an 83% probability of providing a 2.6 bushel/acre increase.” There is uncertainty about both the 83% figure and the 2.6 bu/ac figure. How do we convey this message? One approach is to use the predictions from a mixed model for a random effect, known as ‘empirical best linear unbiased predictions’ (BLUPs). These predictions incorporate both the between-study and the within-study variance. A related concept is the intra-class correlation:
\[ icc = \frac{\sigma_B^2}{\sigma_B^2 + \sigma^2} \]
Where \(\sigma^2_B\) is the between-studies variance and \(\sigma^2\) is the within-study variance. This quantity is important because it is closely related to the degree to which predictions are shrunk toward the overall mean. When the between-studies variance is large the correlation is closer to one, and when the within-study variance is large this quantity is closer to zero. The implication is that a relatively high between-study variance results in a lower degree of shrinkage.
\[ \mathrm{BLUP}(s_i) = \frac{\sigma^2_b}{\sigma^2_b + \sigma^2/n}(\bar{Y}_{s_i} - \bar{Y}_{..}) \]
What this means is that the difference between the raw mean for study \(s_i\) (\(\bar{Y}_{s_i}\)) and the overall average (\(\bar{Y}_{..}\)) is adjusted (the shrinkage effect) by the relative contribution of the between-studies variance and the within-study variance (here assumed to be the same for all studies). These examples are taken from McCulloch and Searle (2000).
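The shrinkage factor is easy to compute directly. A minimal sketch with hypothetical variances also shows that more replicates per study (larger \(n\)) reduce the amount of shrinkage:

## shrinkage factor from the BLUP formula above (hypothetical values)
shrink <- function(sigma2.b, sigma2, n) sigma2.b / (sigma2.b + sigma2 / n)
## more replicates per study -> factor closer to 1 -> less shrinkage
round(shrink(sigma2.b = 1, sigma2 = 4, n = c(1, 4, 16)), 2)
## [1] 0.20 0.50 0.80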
What are BLUPs? See https://en.wikipedia.org/wiki/Best_linear_unbiased_prediction.
Are BLUPs good? See this r-sig-mixed-models thread: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2011q1/013045.html
Hadfield, J.D., Wilson, A.J., Garant, D., Sheldon, B.C., and Kruuk, L.E.B. (2010), “The Misuse of BLUP in Ecology and Evolution”. The American Naturalist, 175(1), 116-125.
https://academic.oup.com/beheco/article/28/4/948/3059669
https://www.frontiersin.org/articles/10.3389/fgene.2018.00195/full
BLUPs are not “best” when it comes to bootstrapping; see Jeffrey S. Morris: https://www.stat.ubc.ca/~nancy/papers/morris.pdf
In this case the error bars are the minimum and maximum values within each trial; no model is being assumed at this point. If we actually want to analyze these data with a linear model, we might start by treating the trials as levels of a fixed factor (later we will consider them as random).
## Fitting a linear model (trial fixed)
fit <- lm(rr ~ trial, data = dat)
## one row per trial for the predictions
ndat <- data.frame(trial = unique(dat$trial))
preds <- predict(fit, newdata = ndat, interval = "confidence")
ndatp <- cbind(ndat, preds)
mdat <- merge(datm, ndatp)
## order trials by response ratio for plotting
mdat.o <- mdat[order(mdat$rr),]
mdat.o$otrial <- 1:nrow(mdat.o)
ggplot(data = mdat.o, aes(x = rr, y = otrial)) +
geom_point() +
geom_errorbarh(aes(xmin = tmin, xmax = tmax)) +
geom_vline(xintercept = 1) +
geom_point(aes(x = fit, y = otrial), color = "red") +
geom_errorbarh(aes(xmin = lwr, xmax = upr), color = "blue") +
ylab("Study") +
xlab("Response ratio") +
scale_y_continuous(breaks = NULL)
In this case, when the trial is fixed, the point prediction matches the average for each trial and the confidence interval (in blue) is naturally narrower than the data range for each trial. Notice also that the confidence intervals all have the same width because we assume equal variance for all the trials.
What is the implication of considering the trial as random instead of fixed?
fit.lmer <- lmer(rr ~ 1 + (1 | trial), data = dat)
## Calculating ICC
sdre <- sqrt(VarCorr(fit.lmer)[[1]][1])
print(icc <- sdre^2 / (sdre^2 + sigma(fit.lmer)^2), digits = 2)
## [1] 0.71
## Calculating predictions
ndat2 <- ndat
ndat2$mmpred <- predict(fit.lmer, newdata = ndat)
mdat.o2 <- merge(mdat.o, ndat2)
The intra-class correlation coefficient is 0.71. The between-study variance accounts for about 70% of the total \(\sigma^2_b + \sigma^2\); in other words, there is more spread between trials than within trials.
## Graphing
ggplot(data = mdat.o2, aes(x = rr, y = otrial)) +
geom_point() +
geom_errorbarh(aes(xmin = tmin, xmax = tmax)) +
geom_vline(xintercept = 1) +
geom_point(aes(x = mmpred, y = otrial), color = "red") +
ylab("Study") +
xlab("Response ratio") +
scale_y_continuous(breaks = NULL)
The figure shows a slight ‘pull’ of the red dots (the BLUPs) toward the mean. This is the shrinkage factor at work. It is also possible to estimate the uncertainty of the BLUPs.
blup.u <- ranef(fit.lmer, condVar = TRUE)
## requires 'lattice' package
dotplot(blup.u)
## $trial
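Instead of the lattice dotplot, the conditional means and their standard deviations can be extracted directly; this sketch assumes a recent version of lme4, where the ranef object has an as.data.frame method:

## conditional modes and SDs of the random effects
re <- as.data.frame(ranef(fit.lmer, condVar = TRUE))
## approximate 95% intervals for the BLUPs
re$lwr <- re$condval - 1.96 * re$condsd
re$upr <- re$condval + 1.96 * re$condsd
head(re[, c("grp", "condval", "lwr", "upr")])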
We can implement a Bayesian meta-analysis using JAGS. Here we explicitly define the parameters in the model and also set up priors. These are non-informative priors, but if we had specific information we could include it explicitly through the priors.
## note: for JAGS, 'trial' should be an integer index running from 1 to t.n
yd <- list(rr = dat$rr, trial = dat$trial,
           t.n = length(unique(dat$trial)),
           n = nrow(dat))
init <- list(mu = 0, tau = 0.01, tau.trial = 0.01)
modelstring <- "
model {
  # single-intercept model likelihood
  for (i in 1:n) {
    rr[i] ~ dnorm(mu + b[trial[i]], tau)
  }
  # trial effects
  for (j in 1:t.n) {
    b[j] ~ dnorm(0, tau.trial)
  }
  # priors
  mu ~ dnorm(0, 0.00001)        # intercept prior
  tau ~ dgamma(0.0001, 0.0001)  # tau is the residual precision
  sigma <- 1.0/sqrt(tau)
  tau.trial ~ dgamma(0.0001, 0.0001)
  sigma.trial <- 1.0/sqrt(tau.trial)
}
"
mdl <- jags.model(textConnection(modelstring), data = yd, inits = init,
                  n.chains = 2)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 175
## Unobserved stochastic nodes: 38
## Total graph size: 433
##
## Initializing model
## burn-in (update discards these samples)
update(mdl, n.iter = 10000)
output <- coda.samples(model = mdl,
                       variable.names = c("mu", "sigma", "sigma.trial"),
                       n.iter = 10000, thin = 10)
print(summary(output))
##
## Iterations = 10010:20000
## Thinning interval = 10
## Number of chains = 2
## Sample size per chain = 1000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## mu 1.06205 0.009318 2.084e-04 3.318e-04
## sigma 0.03362 0.002047 4.578e-05 4.578e-05
## sigma.trial 0.05278 0.007170 1.603e-04 1.603e-04
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## mu 1.04391 1.05578 1.06187 1.06818 1.08093
## sigma 0.03008 0.03226 0.03354 0.03486 0.03786
## sigma.trial 0.04053 0.04778 0.05213 0.05724 0.06791
plot(output, trace = FALSE)
plot(output, density = FALSE)
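One advantage of the Bayesian fit is that derived quantities such as the intra-class correlation come with a full posterior. A sketch using the samples stored in ‘output’:

## posterior of the intra-class correlation from the MCMC samples
smp <- as.matrix(output)
icc.post <- smp[, "sigma.trial"]^2 /
  (smp[, "sigma.trial"]^2 + smp[, "sigma"]^2)
quantile(icc.post, probs = c(0.025, 0.5, 0.975))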
The results are very similar to those obtained using the linear mixed model, which can be confirmed by looking at the object ‘fit.lmer’.
## Compare parameters from lmer and jags
lmer.prms <- c(fixef(fit.lmer),sqrt(VarCorr(fit.lmer)[[1]][1]),sigma(fit.lmer))
names(lmer.prms) <- c("mu","sigma.trial","sigma")
opp <- summary(output)[[1]][,1]
jags.prms <- c(opp[1],opp[3],opp[2])
names(jags.prms) <- c("mu","sigma.trial","sigma")
kable(data.frame(rbind(lmer.prms,jags.prms)))
|           | mu       | sigma.trial | sigma     |
|-----------|----------|-------------|-----------|
| lmer.prms | 1.062081 | 0.0516421   | 0.0333837 |
| jags.prms | 1.062047 | 0.0527806   | 0.0336181 |
It is possible to fit the same model using MCMCglmm.
prior1 <- list(B = list(mu = 0, V = 100),
R = list(V = 1, nu = 0.001),
G = list(G1 = list(V = 1, nu = 0.001)))
fit.mcmcglmm <- MCMCglmm(rr ~ 1, random = ~trial,
prior = prior1,
data = dat, verbose = FALSE)
summary(fit.mcmcglmm)
##
## Iterations = 3001:12991
## Thinning interval = 10
## Sample size = 1000
##
## DIC: -658.705
##
## G-structure: ~trial
##
## post.mean l-95% CI u-95% CI eff.samp
## trial 0.00285 0.001439 0.004249 1000
##
## R-structure: ~units
##
## post.mean l-95% CI u-95% CI eff.samp
## units 0.00114 0.0009077 0.001424 1000
##
## Location effects: rr ~ 1
##
## post.mean l-95% CI u-95% CI eff.samp pMCMC
## (Intercept) 1.062 1.046 1.081 1000 <0.001 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
sl.mc <- summary(fit.mcmcglmm)$solutions
vs.mc <- sqrt(summary(fit.mcmcglmm$VCV)$statistics[,1])
mcmcglmm.prms <- c(sl.mc[1],vs.mc)
names(mcmcglmm.prms) <- c("mu","sigma.trial","sigma")
kable(data.frame(rbind(lmer.prms,jags.prms,mcmcglmm.prms)),
caption = "Similar estimates for lmer, jags and mcmcglmm")
|               | mu       | sigma.trial | sigma     |
|---------------|----------|-------------|-----------|
| lmer.prms     | 1.062081 | 0.0516421   | 0.0333837 |
| jags.prms     | 1.062047 | 0.0527806   | 0.0336181 |
| mcmcglmm.prms | 1.062220 | 0.0533867   | 0.0337566 |

Table: Similar estimates for lmer, jags and mcmcglmm
plot(fit.mcmcglmm$VCV, density = FALSE)
plot(fit.mcmcglmm$VCV, trace = FALSE)
The analysis of these data is interesting because they come from a very extensive network. Some details can be found in ISOFAST (https://analytics.iasoybeans.com/cool-apps/ISOFAST/), under Soybean -> Foliar Fungicide -> Headline.
mzfg <- read.csv("./ch3/data/maize_head.csv")[,c(1,2,3,11,13)]
mzfg$lrr <- with(mzfg, log(TRT_Yld/CTR_Yld))
mzfg.fm <- lmer(lrr ~ 1 + (1|Trial_ID), data = mzfg)
mzfg.lm <- lm(lrr ~ Trial_ID, data = mzfg)
## Summaries
summary(mzfg.fm)
## Linear mixed model fit by REML ['lmerMod']
## Formula: lrr ~ 1 + (1 | Trial_ID)
## Data: mzfg
##
## REML criterion at convergence: -2094.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -6.0677 -0.4608 -0.0238 0.4389 7.4379
##
## Random effects:
## Groups Name Variance Std.Dev.
## Trial_ID (Intercept) 0.0002438 0.01561
## Residual 0.0027326 0.05227
## Number of obs: 703, groups: Trial_ID, 143
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.022846 0.002404 9.505
It turns out this is an important dataset because we might use it to develop priors for the analysis of other management networks, so I will analyze it carefully.
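As a sketch of what such a prior might look like in MCMCglmm, using the variance estimates from the lmer fit above (the choice of nu here is arbitrary and would need more thought):

## informative prior built from the estimates above (hypothetical nu)
prior.inf <- list(B = list(mu = 0.023, V = 0.01),
                  R = list(V = 0.0027, nu = 1),
                  G = list(G1 = list(V = 0.00024, nu = 1)))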
First, compare the residual variance for a linear model and a linear mixed model.
## These two are nearly identical as would be expected
sigma(mzfg.fm)
## [1] 0.05227413
sigma(mzfg.lm)
## [1] 0.05227631
## One thing to notice is that the between-trial
## variance is much smaller than the within-trial variance
VarCorr(mzfg.fm)
## Groups Name Std.Dev.
## Trial_ID (Intercept) 0.015613
## Residual 0.052274
In this case the between-trial variance is much smaller than the within-trial variance with an ICC of only 0.08.
## Computing the ICC for this model
vb <- VarCorr(mzfg.fm)[[1]][1]
cat("ICC:", vb / (vb + sigma(mzfg.fm)^2), "\n")
## ICC: 0.08189905
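This ties back to the kind of statement we ultimately want to make. Under the normal model for the trial effects, a rough point estimate of the probability that the true (log) response in a new trial is positive is given below; note that it ignores the uncertainty in \(\hat{\mu}\) and in the variance estimates.

## P(true effect of a new trial > 0) under the fitted model
1 - pnorm(0, mean = fixef(mzfg.fm)[1], sd = sqrt(VarCorr(mzfg.fm)[[1]][1]))
## roughly 0.93

A fully Bayesian version of this probability, with its own uncertainty, is sketched at the end of the chapter.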
Fitting the model using MCMCglmm.
## Providing priors is important
set.seed(1234)
prior1 <- list(B = list(mu = 0, V = 100),
R = list(V = 1, nu = 0.001),
G = list(G1 = list(V = 1, nu = 0.001)))
mzfg.mc1 <- MCMCglmm(lrr ~ 1, random = ~ Trial_ID,
prior = prior1,
nitt = 2e4, burnin = 1e4,
data = mzfg, verbose = FALSE)
mzfg.mc2 <- MCMCglmm(lrr ~ 1, random = ~ Trial_ID,
prior = prior1,
nitt = 2e4, burnin = 1e4,
data = mzfg, verbose = FALSE)
mzfg.mc3 <- MCMCglmm(lrr ~ 1, random = ~ Trial_ID,
prior = prior1,
nitt = 2e4, burnin = 1e4,
data = mzfg, verbose = FALSE)
mzfg.mc <- mcmc.list(mzfg.mc1$VCV,mzfg.mc2$VCV,mzfg.mc3$VCV)
gelman.diag(mzfg.mc)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## Trial_ID 1 1
## units 1 1
##
## Multivariate psrf
##
## 1
plot(mzfg.mc, density = FALSE)
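Besides the Gelman-Rubin diagnostic, it is worth checking the effective sample sizes of the chains (effectiveSize is part of the coda package):

## effective sample sizes for the variance components
effectiveSize(mzfg.mc)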
How do the results from MCMCglmm compare with lmer?
lmer.prms.mzfg <- c(fixef(mzfg.fm),sqrt(VarCorr(mzfg.fm)[[1]][1]),sigma(mzfg.fm))
names(lmer.prms.mzfg) <- c("mu","sigma.trial","sigma")
sl.mc <- summary(mzfg.mc1)$solutions
vs.mc <- sqrt(summary(mzfg.mc1$VCV)$statistics[,1])
mcmcglmm.prms.mzfg <- c(sl.mc[1],vs.mc)
names(mcmcglmm.prms.mzfg) <- c("mu","sigma.trial","sigma")
kable(data.frame(rbind(lmer.prms.mzfg,mcmcglmm.prms.mzfg)),
caption = "Estimates for lmer and mcmcglmm")
|                    | mu        | sigma.trial | sigma     |
|--------------------|-----------|-------------|-----------|
| lmer.prms.mzfg     | 0.0228459 | 0.0156128   | 0.0522741 |
| mcmcglmm.prms.mzfg | 0.0229376 | 0.0174158   | 0.0521590 |

Table: Estimates for lmer and mcmcglmm
Given the similarity between the two fits I would expect the prediction intervals to be very similar.
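The function ‘pred_int’ used below is not part of lme4. A minimal sketch of such a helper, which combines the uncertainty in the mean with the between-trial variance to produce an interval for a new trial, might look like this:

## sketch of a prediction interval helper for a new trial
## (returns the point estimate followed by the lower and upper bounds)
pred_int <- function(fm, level = 0.95) {
  mu <- fixef(fm)[1]
  se.mu <- summary(fm)$coefficients[1, 2]
  sd.b <- sqrt(VarCorr(fm)[[1]][1])
  ## a new trial mean varies around mu with the between-trial SD
  sdp <- sqrt(se.mu^2 + sd.b^2)
  qz <- qnorm(1 - (1 - level) / 2)
  c(fit = unname(mu), lwr = unname(mu - qz * sdp), upr = unname(mu + qz * sdp))
}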
## Calculate confidence intervals
mzfg.ci <- as.vector(confint(mzfg.fm, "(Intercept)", quiet = TRUE))
## Calculate prediction interval
mzfg.pdi <- pred_int(mzfg.fm)
## Estimate BLUPs
mzfgd <- data.frame(Trial_ID = unique(mzfg$Trial_ID))
## For this model these are the same as BLUPs
mzfgd$prd <- predict(mzfg.fm, newdata = mzfgd)
## intervals from lm model
mzfgd$prd.lm <- predict(mzfg.lm, newdata = mzfgd)
mzfgd$prd.lm.lwr <- predict(mzfg.lm, newdata = mzfgd,
interval = "confidence")[,2]
mzfgd$prd.lm.upr <- predict(mzfg.lm, newdata = mzfgd,
interval = "confidence")[,3]
mzfgd.o <- mzfgd[order(mzfgd$prd),]
mzfgd.o$trial <- 1:nrow(mzfgd.o)
## Bootstrap BLUPs
nit <- 500
## residuals
mzfg.resid <- residuals(mzfg.lm)
mzfg.fitted <- fitted(mzfg.lm)
blpsm <- matrix(NA, ncol = nit, nrow = length(unique(mzfg$Trial_ID)))
for (i in 1:nit) {
  ## resample residuals and add them back to the fitted values
  simr <- mzfg.fitted + sample(mzfg.resid, length(mzfg.resid), replace = TRUE)
  simrd <- data.frame(mzfg, simr = simr)
  mzfg.sim.fm <- lmer(simr ~ 1 + (1|Trial_ID), data = simrd)
  ## estimate BLUPs on the simulated data
  blpsm[,i] <- predict(mzfg.sim.fm, newdata = mzfgd)
}
## Calculate quantiles
blpsd <- data.frame(Trial_ID = unique(mzfg$Trial_ID),
blp = predict(mzfg.fm, newdata = mzfgd))
## 2.5%, 50% and 97.5% quantiles (a 95% bootstrap interval)
blpsd$q5 <- apply(blpsm, 1, quantile, probs = 0.025)
blpsd$q50 <- apply(blpsm, 1, quantile, probs = 0.5)
blpsd$q95 <- apply(blpsm, 1, quantile, probs = 0.975)
## Merge datasets
mzfg.simd <- merge(mzfgd.o, blpsd)
## Plot just the blps
ggplot(data = mzfg.simd, aes(x = blp, y = trial)) +
xlab("LRR") + ggtitle("Means, BLUPS and more") +
geom_point(aes(x = prd.lm, y = trial), color = "blue") +
geom_point(color = "purple") +
geom_point(aes(x = q50, y = trial), color = "red") +
geom_errorbarh(aes(xmin = q5, xmax = q95)) +
geom_point(aes(x = fixef(mzfg.fm), y = -1), color = "red") +
geom_errorbarh(aes(xmin = mzfg.ci[1], xmax = mzfg.ci[2], y = -1), color = "red") +
geom_point(aes(x = mzfg.pdi[1], y = -7), color = "red") +
geom_errorbarh(aes(xmin = mzfg.pdi[2], xmax = mzfg.pdi[3], y = -7), color = "red") +
geom_text(aes(x = 0.1, y = -1), label = "confidence interval", color = "black") +
geom_text(aes(x = 0.1, y = -7), label = "prediction interval", color = "black") +
geom_text(aes(x = 0.1, y = 100), label = "Purple = BLUPs", color = "purple") +
geom_text(aes(x = 0.1, y = 85), label = "Blue = trial mean", color = "blue") +
geom_text(aes(x = 0.1, y = 65), label = "Red = bootstrap \n BLUP median", color = "red")
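As an aside, a parametric version of the bootstrap above is available through lme4’s bootMer; a sketch (with nsim kept small for speed):

## parametric bootstrap of the trial-level predictions (mean + BLUP)
bb <- bootMer(mzfg.fm, nsim = 200,
              FUN = function(x) predict(x, newdata = mzfgd))
## 2.5%, 50% and 97.5% quantiles, one column per trial
bb.q <- apply(bb$t, 2, quantile, probs = c(0.025, 0.5, 0.975))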
Finally, we refit the model with three chains, this time setting a seed, and check convergence again before looking at the posterior of the mean parameter.
set.seed(101)
mzfg.mc1 <- MCMCglmm(lrr ~ 1, random = ~ Trial_ID,
                     nitt = 2e4, burnin = 1e4,
                     prior = prior1,
                     data = mzfg, verbose = FALSE)
mzfg.mc2 <- MCMCglmm(lrr ~ 1, random = ~ Trial_ID,
nitt = 2e4, burnin = 1e4,
prior = prior1,
data = mzfg, verbose = FALSE)
mzfg.mc3 <- MCMCglmm(lrr ~ 1, random = ~ Trial_ID,
nitt = 2e4, burnin = 1e4,
prior = prior1,
data = mzfg, verbose = FALSE)
mzfg.mc <- mcmc.list(mzfg.mc1$VCV,mzfg.mc2$VCV,mzfg.mc3$VCV)
gelman.diag(mzfg.mc)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## Trial_ID 1 1
## units 1 1
##
## Multivariate psrf
##
## 1
plot(mzfg.mc, density = FALSE)
plot(mzfg.mc, trace = FALSE)
## Look at the mean parameter
mzfg.mcS <- mcmc.list(mzfg.mc1$Sol,mzfg.mc2$Sol,mzfg.mc3$Sol)
plot(mzfg.mcS, density = FALSE)
plot(mzfg.mcS, trace = FALSE)
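Finally, returning to the goal stated at the beginning of the chapter, the posterior samples can be combined to attach uncertainty to a statement such as “this practice has an X% probability of providing a positive response”. A sketch:

sol <- as.matrix(mzfg.mcS)   ## samples of the intercept
vcv <- as.matrix(mzfg.mc)    ## samples of the variance components
## probability of a positive response in a new trial, one per MCMC draw
p.new <- pnorm(sol[, "(Intercept)"] / sqrt(vcv[, "Trial_ID"]))
## posterior median and 95% credible interval for that probability
quantile(p.new, probs = c(0.025, 0.5, 0.975))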