This chapter is a work in progress. The idea is to explore BLUPs in more depth than in the previous two chapters.
For a model defined as
\[ y_{ij} = \mu + \theta_i + e_{ij} \] The prediction (in the balanced case with constant within-study variance) is defined as:
\[ \tilde{\theta}_i = \lambda\hat{\mu} + (1 - \lambda)\hat{\theta}_i, \qquad \lambda = \frac{\sigma^2}{\sigma^2+\tau^2} \] This implies that when the within-trial variance (\(\sigma^2\)) is high relative to the between-trial variance (\(\tau^2\)), \(\lambda\) approaches 1 and the prediction for an individual trial, \(\tilde{\theta}_i\), is pulled toward the overall mean (\(\hat{\mu}\)).
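A quick numerical check makes this explicit (the variance values below are made up for illustration):

## hypothetical variances; lambda is the weight given to the overall mean
sigma2 <- c(0.5, 1, 4, 16)
tau2 <- 1
lambda <- sigma2 / (sigma2 + tau2)
## with mu-hat = 0 and theta-hat = 2 the prediction shrinks toward zero
mu.hat <- 0; theta.hat <- 2
round(lambda * mu.hat + (1 - lambda) * theta.hat, 2)
## [1] 1.33 1.00 0.40 0.12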
Higgins et al. (2009) state that:
“Attempts to use the empirical distribution of observed effect sizes, \(\hat{\theta}_i\), to predict future effects should be avoided. The distribution of the estimates is overdispersed, since the true variance of an individual \(\hat{\theta}_i\) around the population mean \(\mu\) is \(\tau^2 + \sigma_i^2\), not \(\tau^2\). Further, the sampling errors (\(\sigma_i^2\)) may be substantially different for different studies. A histogram of observed results, for example, may therefore be misleading.”
An important observation when looking at data from on-farm research networks (OFRNs) is that it is tempting to consider only the studies which had a positive response. For example, this was done in the summary of fungicide application data at https://www.pioneer.com/home/site/us/agronomy/library/foliar-fungicide-insecticide-soybeans/ where it was stated that 83% of the trials had a positive response and the average response was 2.6 bu/ac. This approach ignores the uncertainty of each individual trial. We might say “8 out of 10 trials had a positive response to the treatment”, but the amount of uncertainty matters: it could be that none of those 8 trials had a confidence interval excluding zero, or that all of them did. The reverse can happen in a meta-analysis: even when none of the individual trials are significant, the aggregate mean over all trials can be significantly different from zero, although in that case the magnitude of the difference is likely to be small.

With OFRNs we rarely expect to find large differences in magnitude, so quantifying uncertainty is crucially important. The ultimate goal is to be able to make a statement such as: “This practice has an 83% probability of providing a 2.6 bushel/acre increase.” There is uncertainty about both the 83% figure and the 2.6 bu/ac figure. How do we convey this message? One approach is to use the predictions from a mixed model for a random effect, known as ‘empirical best linear unbiased predictions’ (BLUPs). These predictions incorporate both the between-study and the within-study variance. A related concept is the intra-class correlation:
\[ icc = \frac{\sigma_B^2}{\sigma_B^2 + \sigma^2} \]
Where \(\sigma^2_B\) is the between-studies variance and \(\sigma^2\) is the within-study variance. This quantity is important because it is closely related to the degree to which predictions are shrunk toward the overall mean. When the between-studies variance is large the correlation is closer to one, and when the within-study variance is large this quantity is closer to zero. The implication is that a relatively high between-study variance results in a lower degree of shrinkage.
\[ \mathrm{BLUP}(s_i) = \frac{\sigma^2_b}{\sigma^2_b + \sigma^2/n}(\bar{Y}_{s_i} - \bar{Y}_{..}) \]
What this means is that the difference between the raw mean for study \(s_i\) (\(\bar{Y}_{s_i}\)) and the overall average (\(\bar{Y}_{..}\)) is adjusted (the shrinkage effect) by the relative contribution of the between-studies variance and the within-study variance (here assumed to be the same for all studies). These examples are taken from McCulloch and Searle (2000).
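The shrinkage factor is easy to compute directly. A minimal sketch with hypothetical variances also shows that more replicates per study (larger \(n\)) reduce the amount of shrinkage:

## shrinkage factor from the BLUP formula above (hypothetical values)
shrink <- function(sigma2.b, sigma2, n) sigma2.b / (sigma2.b + sigma2 / n)
## more replicates per study -> factor closer to 1 -> less shrinkage
round(shrink(sigma2.b = 1, sigma2 = 4, n = c(1, 4, 16)), 2)
## [1] 0.20 0.50 0.80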
What are BLUPs? See https://en.wikipedia.org/wiki/Best_linear_unbiased_prediction.
Are BLUPs good? See this r-sig-mixed-models thread: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2011q1/013045.html
Hadfield, J.D., Wilson, A.J., Garant, D., Sheldon, B.C., and Kruuk, L.E.B. (2010), “The Misuse of BLUP in Ecology and Evolution”. The American Naturalist, 175(1), 116-125.
https://academic.oup.com/beheco/article/28/4/948/3059669
https://www.frontiersin.org/articles/10.3389/fgene.2018.00195/full
BLUPs are not “best” when it comes to bootstrapping; see Jeffrey S. Morris: https://www.stat.ubc.ca/~nancy/papers/morris.pdf
In this case the error bars are the minimum and maximum values within each trial; no model is being assumed at this point. If we actually want to analyze these data with a linear model, we might start by treating the trials as levels of a fixed factor (later we will consider them as random).
## Fitting a linear model (trial fixed)
fit <- lm(rr ~ trial, data = dat)
## one row per trial for the predictions
ndat <- data.frame(trial = unique(dat$trial))
preds <- predict(fit, newdata = ndat, interval = "confidence")
ndatp <- cbind(ndat, preds)
mdat <- merge(datm, ndatp)
## order trials by response ratio for plotting
mdat.o <- mdat[order(mdat$rr),]
mdat.o$otrial <- 1:nrow(mdat.o)
ggplot(data = mdat.o, aes(x = rr, y = otrial)) +
geom_point() +
geom_errorbarh(aes(xmin = tmin, xmax = tmax)) +
geom_vline(xintercept = 1) +
geom_point(aes(x = fit, y = otrial), color = "red") +
geom_errorbarh(aes(xmin = lwr, xmax = upr), color = "blue") +
ylab("Study") +
xlab("Response ratio") +
scale_y_continuous(breaks = NULL)
In this case, when the trial is fixed, the point prediction matches the average for each trial and the confidence interval (in blue) is naturally narrower than the data range for each trial. Notice also that the confidence intervals all have the same width because we assume equal variance for all the trials.
What is the implication of considering the trial as random instead of fixed?
fit.lmer <- lmer(rr ~ 1 + (1 | trial), data = dat)
## Calculating ICC
sdre <- sqrt(VarCorr(fit.lmer)[[1]][1])
print(icc <- sdre^2 / (sdre^2 + sigma(fit.lmer)^2), digits = 2)
## [1] 0.71
## Calculating predictions
ndat2 <- ndat
ndat2$mmpred <- predict(fit.lmer, newdata = ndat)
mdat.o2 <- merge(mdat.o, ndat2)
The intra-class correlation coefficient is 0.71. The between-study variance accounts for about 70% of the total \(\sigma^2_b + \sigma^2\); in other words, there is more spread between trials than within trials.
## Graphing
ggplot(data = mdat.o2, aes(x = rr, y = otrial)) +
geom_point() +
geom_errorbarh(aes(xmin = tmin, xmax = tmax)) +
geom_vline(xintercept = 1) +
geom_point(aes(x = mmpred, y = otrial), color = "red") +
ylab("Study") +
xlab("Response ratio") +
scale_y_continuous(breaks = NULL)
The figure shows a slight ‘pull’ of the red dots (the BLUPs) toward the mean. This is the shrinkage factor at work. It is also possible to estimate the uncertainty of the BLUPs.
blup.u <- ranef(fit.lmer, condVar = TRUE)
## requires 'lattice' package
dotplot(blup.u)
## $trial
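Instead of the lattice dotplot, the conditional means and their standard deviations can be extracted directly; this sketch assumes a recent version of lme4, where the ranef object has an as.data.frame method:

## conditional modes and SDs of the random effects
re <- as.data.frame(ranef(fit.lmer, condVar = TRUE))
## approximate 95% intervals for the BLUPs
re$lwr <- re$condval - 1.96 * re$condsd
re$upr <- re$condval + 1.96 * re$condsd
head(re[, c("grp", "condval", "lwr", "upr")])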
We can implement a Bayesian meta-analysis using JAGS. Here we explicitly define the parameters in the model and also set up priors. These are non-informative priors, but if we had specific information we could include it explicitly through the priors.
## note: for JAGS, 'trial' should be an integer index running from 1 to t.n
yd <- list(rr = dat$rr, trial = dat$trial,
           t.n = length(unique(dat$trial)),
           n = nrow(dat))
init <- list(mu = 0, tau = 0.01, tau.trial = 0.01)
modelstring <- "
model {
  # single-intercept model likelihood
  for (i in 1:n) {
    rr[i] ~ dnorm(mu + b[trial[i]], tau)
  }
  # trial effects
  for (j in 1:t.n) {
    b[j] ~ dnorm(0, tau.trial)
  }
  # priors
  mu ~ dnorm(0, 0.00001)        # intercept prior
  tau ~ dgamma(0.0001, 0.0001)  # tau is the residual precision
  sigma <- 1.0/sqrt(tau)
  tau.trial ~ dgamma(0.0001, 0.0001)
  sigma.trial <- 1.0/sqrt(tau.trial)
}
"
mdl <- jags.model(textConnection(modelstring), data = yd, inits = init,
                  n.chains = 2)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 175
## Unobserved stochastic nodes: 38
## Total graph size: 433
##
## Initializing model
## burn-in (update discards these samples)
update(mdl, n.iter = 10000)
output <- coda.samples(model = mdl,
                       variable.names = c("mu", "sigma", "sigma.trial"),
                       n.iter = 10000, thin = 10)
print(summary(output))
##
## Iterations = 10010:20000
## Thinning interval = 10
## Number of chains = 2
## Sample size per chain = 1000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## mu 1.06205 0.009318 2.084e-04 3.318e-04
## sigma 0.03362 0.002047 4.578e-05 4.578e-05
## sigma.trial 0.05278 0.007170 1.603e-04 1.603e-04
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## mu 1.04391 1.05578 1.06187 1.06818 1.08093
## sigma 0.03008 0.03226 0.03354 0.03486 0.03786
## sigma.trial 0.04053 0.04778 0.05213 0.05724 0.06791
plot(output, trace = FALSE)
plot(output, density = FALSE)
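One advantage of the Bayesian fit is that derived quantities such as the intra-class correlation come with a full posterior. A sketch using the samples stored in ‘output’:

## posterior of the intra-class correlation from the MCMC samples
smp <- as.matrix(output)
icc.post <- smp[, "sigma.trial"]^2 /
  (smp[, "sigma.trial"]^2 + smp[, "sigma"]^2)
quantile(icc.post, probs = c(0.025, 0.5, 0.975))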
The results are very similar to those obtained using the linear mixed model, which can be confirmed by looking at the object ‘fit.lmer’.
## Compare parameters from lmer and jags
lmer.prms <- c(fixef(fit.lmer),sqrt(VarCorr(fit.lmer)[[1]][1]),sigma(fit.lmer))
names(lmer.prms) <- c("mu","sigma.trial","sigma")
opp <- summary(output)[[1]][,1]
jags.prms <- c(opp[1],opp[3],opp[2])
names(jags.prms) <- c("mu","sigma.trial","sigma")
kable(data.frame(rbind(lmer.prms,jags.prms)))
|           | mu       | sigma.trial | sigma     |
|-----------|----------|-------------|-----------|
| lmer.prms | 1.062081 | 0.0516421   | 0.0333837 |
| jags.prms | 1.062047 | 0.0527806   | 0.0336181 |
It is possible to fit the same model using MCMCglmm.
prior1 <- list(B = list(mu = 0, V = 100),
R = list(V = 1, nu = 0.001),
G = list(G1 = list(V = 1, nu = 0.001)))
fit.mcmcglmm <- MCMCglmm(rr ~ 1, random = ~trial,
prior = prior1,
data = dat, verbose = FALSE)
summary(fit.mcmcglmm)
##
## Iterations = 3001:12991
## Thinning interval = 10
## Sample size = 1000
##
## DIC: -658.705
##
## G-structure: ~trial
##
## post.mean l-95% CI u-95% CI eff.samp
## trial 0.00285 0.001439 0.004249 1000
##
## R-structure: ~units
##
## post.mean l-95% CI u-95% CI eff.samp
## units 0.00114 0.0009077 0.001424 1000
##
## Location effects: rr ~ 1
##
## post.mean l-95% CI u-95% CI eff.samp pMCMC
## (Intercept) 1.062 1.046 1.081 1000 <0.001 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
sl.mc <- summary(fit.mcmcglmm)$solutions
vs.mc <- sqrt(summary(fit.mcmcglmm$VCV)$statistics[,1])
mcmcglmm.prms <- c(sl.mc[1],vs.mc)
names(mcmcglmm.prms) <- c("mu","sigma.trial","sigma")
kable(data.frame(rbind(lmer.prms,jags.prms,mcmcglmm.prms)),
caption = "Similar estimates for lmer, jags and mcmcglmm")
|               | mu       | sigma.trial | sigma     |
|---------------|----------|-------------|-----------|
| lmer.prms     | 1.062081 | 0.0516421   | 0.0333837 |
| jags.prms     | 1.062047 | 0.0527806   | 0.0336181 |
| mcmcglmm.prms | 1.062220 | 0.0533867   | 0.0337566 |

Table: Similar estimates for lmer, jags and mcmcglmm
plot(fit.mcmcglmm$VCV, density = FALSE)
plot(fit.mcmcglmm$VCV, trace = FALSE)
The analysis of these data is interesting because they come from a very extensive network. Some details can be found in ISOFAST (https://analytics.iasoybeans.com/cool-apps/ISOFAST/), under Soybean -> Foliar Fungicide -> Headline.
mzfg <- read.csv("./ch3/data/maize_head.csv")[,c(1,2,3,11,13)]
mzfg$lrr <- with(mzfg, log(TRT_Yld/CTR_Yld))
mzfg.fm <- lmer(lrr ~ 1 + (1|Trial_ID), data = mzfg)
mzfg.lm <- lm(lrr ~ Trial_ID, data = mzfg)
## Summaries
summary(mzfg.fm)
## Linear mixed model fit by REML ['lmerMod']
## Formula: lrr ~ 1 + (1 | Trial_ID)
## Data: mzfg
##
## REML criterion at convergence: -2094.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -6.0677 -0.4608 -0.0238 0.4389 7.4379
##
## Random effects:
## Groups Name Variance Std.Dev.
## Trial_ID (Intercept) 0.0002438 0.01561
## Residual 0.0027326 0.05227
## Number of obs: 703, groups: Trial_ID, 143
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.022846 0.002404 9.505
It turns out this is an important dataset because we might use it to develop priors for the analysis of other management networks, so I will analyze it carefully.
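As a sketch of what such a prior might look like in MCMCglmm, using the variance estimates from the lmer fit above (the choice of nu here is arbitrary and would need more thought):

## informative prior built from the estimates above (hypothetical nu)
prior.inf <- list(B = list(mu = 0.023, V = 0.01),
                  R = list(V = 0.0027, nu = 1),
                  G = list(G1 = list(V = 0.00024, nu = 1)))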
First, compare the residual variance for a linear model and a linear mixed model.
## These two are nearly identical as would be expected
sigma(mzfg.fm)
## [1] 0.05227413
sigma(mzfg.lm)
## [1] 0.05227631
## One thing to notice is that the between-trial
## variance is much smaller than the within-trial variance
VarCorr(mzfg.fm)
## Groups Name Std.Dev.
## Trial_ID (Intercept) 0.015613
## Residual 0.052274
In this case the between-trial variance is much smaller than the within-trial variance with an ICC of only 0.08.
## Computing the ICC for this model
vb <- VarCorr(mzfg.fm)[[1]][1]
cat("ICC:", vb / (vb + sigma(mzfg.fm)^2), "\n")
## ICC: 0.08189905
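This ties back to the kind of statement we ultimately want to make. Under the normal model for the trial effects, a rough point estimate of the probability that the true (log) response in a new trial is positive is given below; note that it ignores the uncertainty in \(\hat{\mu}\) and in the variance estimates.

## P(true effect of a new trial > 0) under the fitted model
1 - pnorm(0, mean = fixef(mzfg.fm)[1], sd = sqrt(VarCorr(mzfg.fm)[[1]][1]))
## roughly 0.93

A fully Bayesian version of this probability, with its own uncertainty, is sketched at the end of the chapter.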
Fitting the model using MCMCglmm.
## Providing priors is important
set.seed(1234)
prior1 <- list(B = list(mu = 0, V = 100),
R = list(V = 1, nu = 0.001),
G = list(G1 = list(V = 1, nu = 0.001)))
mzfg.mc1 <- MCMCglmm(lrr ~ 1, random = ~ Trial_ID,
prior = prior1,
nitt = 2e4, burnin = 1e4,
data = mzfg, verbose = FALSE)
mzfg.mc2 <- MCMCglmm(lrr ~ 1, random = ~ Trial_ID,
prior = prior1,
nitt = 2e4, burnin = 1e4,
data = mzfg, verbose = FALSE)
mzfg.mc3 <- MCMCglmm(lrr ~ 1, random = ~ Trial_ID,
prior = prior1,
nitt = 2e4, burnin = 1e4,
data = mzfg, verbose = FALSE)
mzfg.mc <- mcmc.list(mzfg.mc1$VCV,mzfg.mc2$VCV,mzfg.mc3$VCV)
gelman.diag(mzfg.mc)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## Trial_ID 1 1
## units 1 1
##
## Multivariate psrf
##
## 1
plot(mzfg.mc, density = FALSE)
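Besides the Gelman-Rubin diagnostic, it is worth checking the effective sample sizes of the chains (effectiveSize is part of the coda package):

## effective sample sizes for the variance components
effectiveSize(mzfg.mc)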
How do the results from MCMCglmm compare with lmer?
lmer.prms.mzfg <- c(fixef(mzfg.fm),sqrt(VarCorr(mzfg.fm)[[1]][1]),sigma(mzfg.fm))
names(lmer.prms.mzfg) <- c("mu","sigma.trial","sigma")
sl.mc <- summary(mzfg.mc1)$solutions
vs.mc <- sqrt(summary(mzfg.mc1$VCV)$statistics[,1])
mcmcglmm.prms.mzfg <- c(sl.mc[1],vs.mc)
names(mcmcglmm.prms.mzfg) <- c("mu","sigma.trial","sigma")
kable(data.frame(rbind(lmer.prms.mzfg,mcmcglmm.prms.mzfg)),
caption = "Estimates for lmer and mcmcglmm")
|                    | mu        | sigma.trial | sigma     |
|--------------------|-----------|-------------|-----------|
| lmer.prms.mzfg     | 0.0228459 | 0.0156128   | 0.0522741 |
| mcmcglmm.prms.mzfg | 0.0229376 | 0.0174158   | 0.0521590 |

Table: Estimates for lmer and mcmcglmm
Given the similarity between the two fits I would expect the prediction intervals to be very similar.
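The function ‘pred_int’ used below is not part of lme4. A minimal sketch of such a helper, which combines the uncertainty in the mean with the between-trial variance to produce an interval for a new trial, might look like this:

## sketch of a prediction interval helper for a new trial
## (returns the point estimate followed by the lower and upper bounds)
pred_int <- function(fm, level = 0.95) {
  mu <- fixef(fm)[1]
  se.mu <- summary(fm)$coefficients[1, 2]
  sd.b <- sqrt(VarCorr(fm)[[1]][1])
  ## a new trial mean varies around mu with the between-trial SD
  sdp <- sqrt(se.mu^2 + sd.b^2)
  qz <- qnorm(1 - (1 - level) / 2)
  c(fit = unname(mu), lwr = unname(mu - qz * sdp), upr = unname(mu + qz * sdp))
}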
## Calculate confidence intervals
mzfg.ci <- as.vector(confint(mzfg.fm, "(Intercept)", quiet = TRUE))
## Calculate prediction interval
mzfg.pdi <- pred_int(mzfg.fm)
## Estimate BLUPs
mzfgd <- data.frame(Trial_ID = unique(mzfg$Trial_ID))
## For this model these are the same as BLUPs
mzfgd$prd <- predict(mzfg.fm, newdata = mzfgd)
## intervals from lm model
mzfgd$prd.lm <- predict(mzfg.lm, newdata = mzfgd)
mzfgd$prd.lm.lwr <- predict(mzfg.lm, newdata = mzfgd,
interval = "confidence")[,2]
mzfgd$prd.lm.upr <- predict(mzfg.lm, newdata = mzfgd,
interval = "confidence")[,3]
mzfgd.o <- mzfgd[order(mzfgd$prd),]
mzfgd.o$trial <- 1:nrow(mzfgd.o)
## Bootstrap BLUPs
nit <- 500
## residuals
mzfg.resid <- residuals(mzfg.lm)
mzfg.fitted <- fitted(mzfg.lm)
blpsm <- matrix(NA, ncol = nit, nrow = length(unique(mzfg$Trial_ID)))
for (i in 1:nit) {
  ## resample residuals and add them back to the fitted values
  simr <- mzfg.fitted + sample(mzfg.resid, length(mzfg.resid), replace = TRUE)
  simrd <- data.frame(mzfg, simr = simr)
  mzfg.sim.fm <- lmer(simr ~ 1 + (1|Trial_ID), data = simrd)
  ## estimate BLUPs on the simulated data
  blpsm[,i] <- predict(mzfg.sim.fm, newdata = mzfgd)
}
## Calculate quantiles
blpsd <- data.frame(Trial_ID = unique(mzfg$Trial_ID),
blp = predict(mzfg.fm, newdata = mzfgd))
## 2.5%, 50% and 97.5% quantiles (a 95% bootstrap interval)
blpsd$q5 <- apply(blpsm, 1, quantile, probs = 0.025)
blpsd$q50 <- apply(blpsm, 1, quantile, probs = 0.5)
blpsd$q95 <- apply(blpsm, 1, quantile, probs = 0.975)
## Merge datasets
mzfg.simd <- merge(mzfgd.o, blpsd)
## Plot just the blps
ggplot(data = mzfg.simd, aes(x = blp, y = trial)) +
xlab("LRR") + ggtitle("Means, BLUPS and more") +
geom_point(aes(x = prd.lm, y = trial), color = "blue") +
geom_point(color = "purple") +
geom_point(aes(x = q50, y = trial), color = "red") +
geom_errorbarh(aes(xmin = q5, xmax = q95)) +
geom_point(aes(x = fixef(mzfg.fm), y = -1), color = "red") +
geom_errorbarh(aes(xmin = mzfg.ci[1], xmax = mzfg.ci[2], y = -1), color = "red") +
geom_point(aes(x = mzfg.pdi[1], y = -7), color = "red") +
geom_errorbarh(aes(xmin = mzfg.pdi[2], xmax = mzfg.pdi[3], y = -7), color = "red") +
geom_text(aes(x = 0.1, y = -1), label = "confidence interval", color = "black") +
geom_text(aes(x = 0.1, y = -7), label = "prediction interval", color = "black") +
geom_text(aes(x = 0.1, y = 100), label = "Purple = BLUPs", color = "purple") +
geom_text(aes(x = 0.1, y = 85), label = "Blue = trial mean", color = "blue") +
geom_text(aes(x = 0.1, y = 65), label = "Red = bootstrap \n BLUP median", color = "red")
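As an aside, a parametric version of the bootstrap above is available through lme4’s bootMer; a sketch (with nsim kept small for speed):

## parametric bootstrap of the trial-level predictions (mean + BLUP)
bb <- bootMer(mzfg.fm, nsim = 200,
              FUN = function(x) predict(x, newdata = mzfgd))
## 2.5%, 50% and 97.5% quantiles, one column per trial
bb.q <- apply(bb$t, 2, quantile, probs = c(0.025, 0.5, 0.975))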
Finally, we refit the model with three chains, this time setting a seed, and check convergence again before looking at the posterior of the mean parameter.
set.seed(101)
mzfg.mc1 <- MCMCglmm(lrr ~ 1, random = ~ Trial_ID,
                     nitt = 2e4, burnin = 1e4,
                     prior = prior1,
                     data = mzfg, verbose = FALSE)
mzfg.mc2 <- MCMCglmm(lrr ~ 1, random = ~ Trial_ID,
nitt = 2e4, burnin = 1e4,
prior = prior1,
data = mzfg, verbose = FALSE)
mzfg.mc3 <- MCMCglmm(lrr ~ 1, random = ~ Trial_ID,
nitt = 2e4, burnin = 1e4,
prior = prior1,
data = mzfg, verbose = FALSE)
mzfg.mc <- mcmc.list(mzfg.mc1$VCV,mzfg.mc2$VCV,mzfg.mc3$VCV)
gelman.diag(mzfg.mc)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## Trial_ID 1 1
## units 1 1
##
## Multivariate psrf
##
## 1
plot(mzfg.mc, density = FALSE)
plot(mzfg.mc, trace = FALSE)
## Look at the mean parameter
mzfg.mcS <- mcmc.list(mzfg.mc1$Sol,mzfg.mc2$Sol,mzfg.mc3$Sol)
plot(mzfg.mcS, density = FALSE)
plot(mzfg.mcS, trace = FALSE)
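Finally, returning to the goal stated at the beginning of the chapter, the posterior samples can be combined to attach uncertainty to a statement such as “this practice has an X% probability of providing a positive response”. A sketch:

sol <- as.matrix(mzfg.mcS)   ## samples of the intercept
vcv <- as.matrix(mzfg.mc)    ## samples of the variance components
## probability of a positive response in a new trial, one per MCMC draw
p.new <- pnorm(sol[, "(Intercept)"] / sqrt(vcv[, "Trial_ID"]))
## posterior median and 95% credible interval for that probability
quantile(p.new, probs = c(0.025, 0.5, 0.975))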