1 Chapter 1: Fitting distributions

The purpose of this chapter is to illustrate how to fit a variety of distributions using moment matching and maximum likelihood. Being proficient with a variety of statistical distributions is important for Bayesian analysis.

https://en.wikipedia.org/wiki/Probability_distribution

https://en.wikipedia.org/wiki/List_of_probability_distributions

https://journal.r-project.org/archive/2013-1/lebauer-dietze-bolker.pdf

In R a list of distributions are found in ‘?Distributions’.

soy <- read.csv("./ch1/data/srs.csv")[,c(1,4,7,9)]
## Select only a subset of data
soy$lrr <- log(soy$TRT1_Yld/soy$TRT2_Yld)
soy.o <- soy[order(soy$lrr),]
soy.o$id <- 1:nrow(soy.o)

1.1 Modeling Variances

Variances (or standard deviations) do not always get the attention they deserve. They are somewhat harder to interpret than means and are not usually the focus of most research studies. They represent the uncertainty and thus are fundamental to the construction of predictions.

One argument for presenting prediction intervals is that they are a statement about the mean prediciton as well as the variance estimation.

1.1.1 Gamma distribution for variance modeling

Going back to the soybean example, we can assume that the individual trials in our example are essentially estimating the same unknown. We could describe the distribution of these variances using a gamma distribution. This next bit of R code shows how to calcualte the moments for the gamma, inverse gamma and one-dimensional Wishart (equivalent to the inverse gamma). This last function follows the parameterization in the MCMCglmm package.

## These simple functions calculate the parameters
## for the gamma, inverse-gamma and 1d inverse Wishart
## This is known as moment matching
fgp <- function(x){
  ## https://en.wikipedia.org/wiki/Gamma_distribution
  ## the mean of gamma is E(x) = a * s
  ## the variance of gamma is Var(X) = a * s^2
  m.x <- mean(x)
  v.x <- var(x)
  scale <- v.x / m.x
  shape <- m.x / scale
  return(c(shape, scale))
}
## Inverse gamma distribution
figp <- function(x){
  ## https://en.wikipedia.org/wiki/Inverse-gamma_distribution
  ## shape = alpha
  ## scale = beta
  ## the mean of inverse gamma is E(x) = beta / (alpha - 1) for alpha > 1 
  ## the variance of inverse gamma is Var(X) = beta^2 / (alpha - 1)^2(alpha - 2)
  ## alpha (shape) = m.x^2 / v.x + 2
  m.x <- mean(x)
  v.x <- var(x)
  shape <- m.x^2 / v.x + 2
  scale <- m.x * (shape - 1)
  return(c(shape, scale))
}
f1diwp <- function(x){
  ## https://en.wikipedia.org/wiki/Inverse-Wishart_distribution
  ## This is only for a 1d inverse Wishart parameterized as in the 
  ## MCMCglmm package
  m.x <- mean(x)
  v.x <- var(x)
  shape <- m.x^2 / v.x + 2
  scale <- m.x * (shape - 1)
  V <- scale/shape
  nu <- shape * 2
  return(c(V, nu))
}
soy.av <- aggregate(lrr ~Trial_ID, FUN = var, data = soy.o)
soy.fgp <- fgp(soy.av$lrr)
soy.figp <- figp(soy.av$lrr)
m1 <- matrix(rbind(soy.fgp, soy.figp), ncol = 2, nrow = 2, 
             dimnames = list(c("gamma","inverse gamma"),c("shape","scale")))
kable(m1, caption = "Gamma and inverse gamma parameters")
Gamma and inverse gamma parameters
shape scale
gamma 0.7772621 0.0027550
inverse gamma 2.7772621 0.0038057

In this first example, the gamma seems to be a more natural fit for the variance than the inverse gamma. Gelman has argued that the inverse gamma is not the best choice for priors (http://www.stat.columbia.edu/~gelman/research/published/taumain.pdf)(page 528).

“We do not recommend the inverse-gamma(\(e\), \(e\)) family of noninformative prior distributions because, as discussed in Sections 4.3 and 5.1, in cases where \(\sigma_{\alpha}\) is estimated to be near zero, the resulting inferences will be sensitive to \(e\). The setting of near-zero variance parameters is important partly because this is where classical and Bayesian inferences for hierarchical models will differ the most (see Draper and Browne, 2005, and Section 3.4 of Gelman, 2005).”

1.2 Fitting distributions to data

More generally we can use R to fit other distributions to data. One option is to use the MASS package and another option is to use the fitdistrplus package. The ‘::’ is not strictly needed but it is used here to emphasize that these are functions from different pacakges.

1.2.1 Using the gamma to fit the variance

fitgd0 <- MASS::fitdistr(soy.av$lrr, "gamma") 
fitgd <- fitdistrplus::fitdist(soy.av$lrr, "gamma")
kable(data.frame(rbind(coef(fitgd0),coef(fitgd)),
                 row.names = c("MASS","fitdistrplus")),
      caption = "Gamma parameters")
Gamma parameters
shape rate
MASS 0.9899396 462.2847
fitdistrplus 0.9899056 462.2650
## The fitdistrplus package has nice plotting 
plot(fitgd)

Here I’m illustrating the fit for the gamma using maximum likelihood (only showing the MASS results).

The goal of getting comfortable fitting these function is that we want to be able to generate informative priors for future Bayesian analysis. In some Bayesian applications the gamma is often used to model the precision which is the inverse of the variance. However, it is possible to use other distributions to model either the variance or the precision. The main argument for using the gamma for modeling the precision is that it is a conjugate prior to the normal, so it is considered a ‘proper’ prior and sampling from the posterior can be more efficient.

1.2.2 Using the gamma to fit the precision

soy.prec <- 1/soy.av$lrr
## One challenge with these data is that the very high values
## Can have a large effect on the numerical stability
## Of the optimization algorithm
## I remove very small values for the variance which
## Create very high precisions
thrsh <- 3000
soy.fgp2 <- MASS::fitdistr(soy.prec[soy.prec < thrsh], "gamma")
soy.fgp3 <- fitdist(soy.prec[soy.prec < thrsh], "gamma", method = "mme")
## Method mle failed for fitdist. I think the algorithms in MASS are better
kable(data.frame(rbind(coef(soy.fgp2),coef(soy.fgp3)),
                 row.names = c("MASS","fitdistrplus")),
      caption = "Gamma parameters")
Gamma parameters
shape rate
MASS 1.413668 0.0015579
fitdistrplus 1.586384 0.0017483

1.2.3 Using the inverse gamma to fit the variance

If we use the inverse gamma to model the variances then we can estimate the parameters using the figp function; this is ‘moment matching’. The function ‘f1diwp’ does moment matching for the one-dimensional inverse Wishart with the specific parameterization in the MCMCglmm package. From the notes:

“The inverse gamma is a special case of the inverse Wishart, although it is parametrised using shape and scale, where nu = 2\(\times\)shape and V = scale/shape (or shape = nu and scale = nu\(\times\)V )”

## The rinvgamma function is from MCMCpack 
rigv <- MCMCpack::rinvgamma(2e3, shape = soy.figp[1], scale = soy.figp[2])
## If we want to use the inverse Wishart from the MCMCglmm package
Vnu <- f1diwp(soy.av$lrr)
rdiw <- rIW(V = Vnu[1]*diag(1), nu = Vnu[2], n = 2e3)

We can also apply the bootstrap technique to augment the ‘observed’ variances and visualize a more ‘smooth’ distribution. The boostrap method shows that there is a bit of a concentration for low values of the variance, which do not conform well to the gamma or inverse gamma. We expect to see few values for very low variances a peak for the ‘most likely’ variance and a tail with lower densities for large variances. Instead we see a distribution that resembles a half bell-shaped distribution.

The previous figure shows that the bootstrapped variances do not conform to the inverse Wishart distribution. Rather it looks more like a half bell-shaped distribution. If we truly want to use this as our prior in future analysis we could use a truncated normal, t-student or Cauchy, for example.

1.2.4 Bootstrapping to augment the distribution

## Bootstrapping the variance to augment the visualization of the 
## distribution
nit <- 500
ntrial <- length(unique(soy.o$Trial_ID))
vmat <- matrix(NA, nrow = ntrial, ncol = nit)

soy.lm1 <- lm(lrr ~ Trial_ID, data = soy.o)
## Predictions  
prd <- predict(soy.lm1)
for(i in 1:nit){
  ## Resample the residuals
  bresid <- sample(resid(soy.lm1), length(prd), replace = TRUE)
  yy2 <- prd + bresid # New 'observed' data
  tmp <- data.frame(lrr = yy2, Trial_ID = soy.o$Trial_ID)
  vaa <- aggregate(yy2 ~ Trial_ID, FUN = var, data = tmp)$yy2
  vmat[,i] <- vaa 
}
vmat2 <- as.vector(vmat)
iwp <- f1diwp(vmat2)
rdiw2 <- rIW(V = iwp[1]*diag(1), nu = iwp[2], n = 5e3)
## Fitting the exponential distribution to the bootstrapped data
fxp <- fitdist(vmat2, "exp")
exx <- seq(0.0001,0.02,0.0005)
dxp <- dexp(exx, rate = coef(fxp))
## Fitting the generalized gamma (package flexsurv)
fgg <- fitdist(vmat2, "gengamma", 
               start = list(mu = 0, sigma = 1, Q = 0.5))
plot(fgg)

fggc <- coef(fgg)
dgg <- dgengamma(exx, fggc[1],fggc[2],fggc[3])

1.2.5 Attempt to incorporate weights (bbmle package)

Here I will include an example using the bbmle package. The reason for doing this is that we might want to include weights, which, apparently, I can’t do with the other functions. Including weights means that I recognize that not all the variances are estimated with the same confidence (varying sample size). The variances associated with a larger sample size will have more weight in the maximization of the likelihood. So far this has not worked all that well and it is possible that I’m making some kind of mistake.

soy.av$n <- aggregate(lrr ~ Trial_ID, data = soy.o, FUN = length)$lrr
soy.av$w <- soy.av$n / sum(soy.av$n)
## Here I do the following:
## Reparameterize the inverse gamma in terms of the
## log-shape and log-scale to avoid evaluation 
## for negative values
## Apply a weighting scheme so that variances estimated 
## with more data are more important in the calculation 
## of the likelihood
dinvgamma2 <- 
function (x, lshape, lscale = log(1), log = FALSE) 
{
  ## Reparameterized in terms of the logshape and logscale
    alpha <- exp(lshape)
    beta <- exp(lscale)
    log.density <- alpha * log(beta) - lgamma(alpha) - (alpha + 
        1) * log(x) - (beta/x)
    if(log == FALSE){
      return(exp(log.density))
    }else{
      return(log.density)
    }
}
## minus log likelihood function without weights
mll <- function(lshape, lscale, log = TRUE){
  x <- soy.av$lrr
  ans <- -sum(dinvgamma2(x, lshape = lshape, lscale = lscale, log = log))
  return(ans)
}
## Run the optimizer
invgp <- mle2(mll, 
              start = list(lshape = log(soy.figp[1]), lscale = log(soy.figp[2])),
              data = soy.av) 
invgp
## 
## Call:
## mle2(minuslogl = mll, start = list(lshape = log(soy.figp[1]), 
##     lscale = log(soy.figp[2])), data = soy.av)
## 
## Coefficients:
##      lshape      lscale 
##  0.06873764 -7.19586710 
## 
## Log-likelihood: 93.54
cfig <- exp(coef(invgp))
## minus log likelihood function with weights
mll.w <- function(lshape, lscale, log = TRUE){
  x <- soy.av$lrr
  w <- soy.av$w
  ans <- -sum(w*dinvgamma2(x, lshape = lshape, lscale = lscale, log = log))
  return(ans)
}
## Run the optimizer
invgp.w <- mle2(mll.w, 
              start = list(lshape = log(soy.figp[1]), lscale = log(soy.figp[2])),
              data = soy.av) 
invgp.w
## 
## Call:
## mle2(minuslogl = mll.w, start = list(lshape = log(soy.figp[1]), 
##     lscale = log(soy.figp[2])), data = soy.av)
## 
## Coefficients:
##      lshape      lscale 
## -0.08665078 -7.05297893 
## 
## Log-likelihood: 4.7
cfig.w <- exp(coef(invgp.w))
kable(data.frame(moment.matching = soy.figp,
           maximum.likelihood = cfig,
           w.maximum.likelihood = cfig.w,
           row.names = c("shape","scale")), 
      caption = "inverse Gamma parameters")
inverse Gamma parameters
moment.matching maximum.likelihood w.maximum.likelihood
shape 2.7772621 1.0711551 0.9169973
scale 0.0038057 0.0007497 0.0008648

So far I consider this somewhat of a succeful exercise. At the moment I’m not happy with the behavior of the weighted maximum likelihood. Now I can fit a custom weighted distribution to the data using the mle2 function. The fit is not meant to match the data exactly because I’m using the weighting scheme and there is very few data points here.

1.3 Variance modeling using a larger dataset

Given the previous examples, we can now fit a variety of distributions and are able to model variances using the gamma, or fitting the gamma to the precision or fit the inverse gamma using a weighted likelihood.

mzfg <- read.csv("./ch1/data/maize_head.csv")[,c(1,2,3,11,13)]
mzfg$lrr <- with(mzfg, log(TRT_Yld/CTR_Yld))

## What is the sample size for each trial?
mzfg.n <- aggregate(lrr ~ Trial_ID, data = mzfg, FUN = length)
summary(mzfg.n)
##        Trial_ID        lrr        
##  ST2006007 :  1   Min.   : 1.000  
##  ST2006104 :  1   1st Qu.: 3.000  
##  ST2006313 :  1   Median : 5.000  
##  ST2006317a:  1   Mean   : 4.916  
##  ST2006317b:  1   3rd Qu.: 6.000  
##  ST2006354 :  1   Max.   :19.000  
##  (Other)   :137
## Calculate variance for each trial
mzfg.tv <- aggregate(lrr ~ Trial_ID, data = mzfg, FUN = var)
mzfg.tv$n <- aggregate(lrr ~ Trial_ID, data = mzfg, FUN = length)$lrr
mzfg.tv$w <- mzfg.tv$n/sum(mzfg.tv$n)
names(mzfg.tv) <- c("Trial_ID", "var","n","w")
## There is one NA because one trial has one rep only
mzfg.tv <- na.omit(mzfg.tv)
## I will eliminate very large values to simplify the data
mzfg.tv <- subset(mzfg.tv, var < 0.015)

ggplot(data = mzfg.tv, aes(x = var)) + geom_histogram(bins = 30)

1.3.1 Fitting the gamma function to the variance

## Using method matching functions
mzfg.gp <- fgp(mzfg.tv$var) ## This is method matching
## Using mle method MASS package
mzfgp.m <- MASS::fitdistr(mzfg.tv$var, "gamma") 
## Using mle method fitdistrplus package
mzfgp.f <- fitdistrplus::fitdist(mzfg.tv$var, "gamma") 
kable(data.frame(gamma=mzfg.gp,
                 gamma.mle=c(coef(mzfgp.m)[1],1/coef(mzfgp.m)[2]),
                 gamma.mle2=c(coef(mzfgp.f)[1],1/coef(mzfgp.f)[2]),
                 row.names = as.character(c("shape","scale"))))
gamma gamma.mle gamma.mle2
shape 0.7320812 0.9182416 0.9186453
scale 0.0024434 0.0019481 0.0019470

1.3.2 Fitting the gamma to the precision (1/variance)

## I restrict the range of the precision because
## It can get a little to big for very small values
## for the variance
thrsh <- 7000
mzfg.prec <- 1/mzfg.tv$var
mzfg.prec <- mzfg.prec[mzfg.prec < thrsh]
##hist(mzfg.prec)
## Using method matching functions
mzfg.gpp <- fgp(mzfg.prec) ## This is method matching
## Using mle method MASS package
mzfgp.mp <- MASS::fitdistr(mzfg.prec, "gamma") 
## Using mle method fitdistrplus package
## Using method matching because mle did not work
mzfgp.fp <- fitdistrplus::fitdist(mzfg.prec, "gamma", method = "mme") 
gamma gamma.mle gamma.mme2
shape 1.185131 1.339959 1.194463
scale 1083.839913 958.601547 1075.372414

1.3.3 Fitting the inverse gamma to the variance with and without weights

At the moment trying to use weights is tricky, I think because there is one observation with high weights.

## What if I delete the observation with higher weight?
##mzfg.tv <- subset(mzfg.tv, w < 0.02)
## Using weights for this dataset does not work well
## Using method matching functions
mzfig.gp <- figp(mzfg.tv$var) ## This is method matching
## The inverse gamma is not supported by the MASS package
## But we can try deriving the parameters from the fit of the
## precision
mzfig.mp <- 1/c(coef(mzfgp.mp)[1], 1/coef(mzfgp.mp)[2])
## Apparently it is not trivial to use fitdistrplus to fit the 
## inverse gamma
## We have defined the inverse gamma before
## Create the minus log likelihood function
## minus log likelihood function
## No weights
mll <- function(lshape, lscale, log = TRUE){
  x <- mzfg.tv$var
  ans <- -sum(dinvgamma2(x, lshape = lshape, lscale = lscale, log = log))
  return(ans)
}
## Run the optimizer
invgp.m <- mle2(mll, method = "Nelder-Mead",
              start = list(lshape = log(mzfig.mp[1]), 
                           lscale = log(mzfig.mp[2])),
              data = mzfg.tv)

invgp.m
## 
## Call:
## mle2(minuslogl = mll, start = list(lshape = log(mzfig.mp[1]), 
##     lscale = log(mzfig.mp[2])), method = "Nelder-Mead", data = mzfg.tv)
## 
## Coefficients:
##     lshape     lscale 
## -0.9684366 -9.6770171 
## 
## Log-likelihood: 643.89
confint(invgp.m)
##             2.5 %     97.5 %
## lshape  -1.164044 -0.7824635
## lscale -10.032626 -9.3651613
p2 <- suppressWarnings(profile(invgp.m))
plot(p2)

kable(data.frame(invgamma.mm=mzfig.gp,
                 invgamma.mle=mzfig.mp,
                 invgamma.mle2=exp(coef(invgp.m)),
                 row.names = as.character(c("shape","scale"))))
invgamma.mm invgamma.mle invgamma.mle2
shape 2.7320812 0.7462915 0.3796762
scale 0.0030983 0.0010432 0.0000627
## Parameters for inverse Wisahrt (MCMCglmm parameterization)
## First using method matching
V.nu <- f1diwp(mzfg.tv$var)
V.p <- V.nu[1]; nu.p <- V.nu[2]
## Second based on maximum likelihood
nu.p2 <- as.vector(2 * mzfig.mp[1])
V.p2 <- as.vector(mzfig.mp[2]/mzfig.mp[1])
## Third based on mle2
invgp.m.cf <- exp(coef(invgp.m))
nu.p3 <- as.vector(2 * invgp.m.cf[2])
V.p3 <- as.vector(invgp.m.cf[2]/invgp.m.cf[1])
kable(data.frame(moment.matching=V.nu,
                 maximum.likelihood=c(V.p2,nu.p2),
                 maximum.likelihood.2=c(V.p3,nu.p3),
                 row.names = as.character(c("V","nu"))),
      caption = "Parameterization for inverse Wishart for MCMCglmm")
Parameterization for inverse Wishart for MCMCglmm
moment.matching maximum.likelihood maximum.likelihood.2
V 0.001134 0.0013978 0.0001652
nu 5.464162 1.4925830 0.0001254

According to the documentation from the MCMCglmm package we can derive the parameters for the inverse Wishart by considering that:

\[ \nu = 2 \times shape \]

\[ V = \frac{scale}{shape} \]

Let’s compare the fit from the inverse gamma with the fit from the inverse Wishart.

Visually, the inverse gamma (or inverse Wishart) do not fit the distribution for the within trial variance as well as the gamma fits either the variance or the precision.