Bootstrap and Simulation for Nonlinear Models

Fernando Miguez

2020-04-23

Preliminaries: first, we need to load required libraries.

library(ggplot2)
library(nlraa)
library(car)
## Loading required package: carData
library(nlme)

Bootstrapping

The bootstrap is a technique in statistics which consists of resampling the observed data in order to create an empirical distribution of some statistic (Fox and Weisberg, 2012, 2017). This is often done to evaluate the assumptions of the model or to derive empirical distributions which are difficult to approximate. For nonlinear models, bootstrapping can be useful because often questions arise that a typical analysis does not answer. In many instances the distributions of parameters from a nonlinear model are in fact normal and standard approximations work well, in other instances, however, this is not the case.

In the following sections I will illustrate the use of the bootstrap for nonlinear model (nls), generalized nonlinear models (gnls) and nonlinear mixed models (nlme). In another section I will illustrate the ability to simulate from nonlinear (mixed) models

Nonlinear models

The first type of models illustrated here are nonlinear models for which we make standard assumptions for error distributions and there are no random effects.

Simple example

As a simple example, we can use the dataset ‘barley’ in the ‘nlraa’ package. This represents the response of barley yield to different doses of fertilizer over several seasons. We will ignore the effect of ‘year’ in this section.

We can fit a very simple model known as the linear-plateau.

At this point several questions might arise:

  1. What are the confidence intervals for the model parameters?
  2. How do we describe the uncertainty around the fitted values of the model?
  3. What is the estimate and confidence interval for the asymptote?

Confidence interval for model parameters

The confidence intervals for the model parameters can be derived using a method called profiling. The generic function ‘confint’ can be used which invokes ‘confint.nls’ from the ‘MASS’ package.

## Waiting for profiling to be done...
##          2.5%     97.5%
## a  104.821725 167.73660
## b   19.937411  38.00267
## xs   5.849474  12.23099

For nonlinear models this method is preferred to simple confidence intervals that would directly use the standard error of the model parameters. In many cases, there is good agreement between the two, but this is not always the case and it can be informative to compare both methods. Those standard errors can be obtained using the ‘summary’ function, along with hypothesis testing.

## 
## Formula: yield ~ SSlinp(NF, a, b, xs)
## 
## Parameters:
##    Estimate Std. Error t value Pr(>|t|)    
## a   137.067     16.179   8.472 1.83e-12 ***
## b    27.501      5.269   5.219 1.62e-06 ***
## xs    7.682      1.211   6.342 1.68e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 70.79 on 73 degrees of freedom
## 
## Number of iterations to convergence: 2 
## Achieved convergence tolerance: 6.548e-16

Some interesting plots can be produced which illustrate the symmetry (or lack thereof) of the distribution for the model parameters. Profile plots which show large departures from normality would indicate that the normal approximation for the confidence intervals might not be the best choice.

Being able to see the whole profile for a parameter can be interesting in terms of understanding its behavior. For example, in this model, which has a ‘break-point’, we would not expect all the parameters to have identical shaped profiles and that is illustrated above for parameter ‘xs’.

In this case, bootstraping is an alternative to computing the confidence intervals and it can be used as a way to double-check the previous results, but it can also be used when profiling fails. A function which can perform bootstraping for nonlinear models is ‘Boot’ in the ‘car’ pacakge.

## Loading required namespace: boot
## 
##  Number of bootstraps was 814 out of 999 attempted

In this case, this function uses the base ‘boot’ package under the hood and it returns an object of that class. This also allows for other options such as defining a function as a combination of model parameters, and the use of more than one core to speed up computation. In this case, since we are also interested in the asymptote, which is ‘a + b * xs’, we can obtain confidence intervals for this parameter by doing the following.

## 
##  Number of bootstraps was 813 out of 999 attempted
## Bootstrap bca confidence intervals
## 
##              2.5 %   97.5 %
## asymptote 324.0106 380.8581

The bootstrap method takes a few seconds here, but it can be computationally much more demanding in larger problems since it re-fits the model many, many times. An alternative is to use the delta method, for which there is a function in the ‘car’ pacakge. The delta method has the advantage of being fast and the disadvantage that it makes the assumption of a normal distribution. In this case the lower bound for the confidence interval is similar to the one obtained with the bootstrap, but the upper bound for the confidence interval is somewhat higher using the bootstrap (381 vs. 370). Since the bootstrap method makes fewer assumptions it is probably the better one to report.

##            Estimate      SE   2.5 % 97.5 %
## a + b * xs  348.326  11.484 325.817 370.84

In order to answer our second question above related to the uncertainty around the fitted values we could plug-in the values from the bootstrap sampling and obtain different regression lines.

The previous graph shows the black line with the original fit and the variability in the fitted values due to the resampling generated during the bootstrap process. The function ‘predict.nls’ at the moment ignores the arguments ‘se.fit’ and ‘interval’ which means that this functionality is not available in the base ‘stats’ package. (There is an alternative approach in the ‘propagate’ package - see references.)

So we could use the quantiles of ‘prd.dat’ object above to derive confidence intervals for the regression line. The previous example is an attempt to make it clear how the uncertainty could be displayed. Equivalently, we can use ‘Boot’ for this purpose, but with some effort manipulating the data.

## 
##  Number of bootstraps was 820 out of 999 attempted

Bootstrapping generalized nonlinear models

Implementing bootstrap for more complex models takes extra work. For this, I’m taking the approach of sampling from the vector of fixed parameters and also bootstrapping the standardized residuals for ‘gnls’ and ‘nlme’ objects. (This methodology can be improved in the future.) This takes advantage that we assume that the residuals are normally distributed.

As a first example, we can compare the bootstrapped confidence intervals with the ones obtained by ‘intervals’. Note: I’m running this only a few hundred times in the examples below for efficiency, but it is a good practice to run these a few thousand times.

## Approximate 95% confidence intervals
## 
##  Coefficients:
##        lower       est.      upper
## a  138.79981 176.800000 214.800186
## b   13.75248  27.603093  41.453706
## xs   3.64251   6.174127   8.705744
## attr(,"label")
## [1] "Coefficients:"
## 
##  Residual standard error:
##    lower     est.    upper 
## 25.50341 35.17935 56.67543
## Number of times model fit did not converge 22 out of 200
## 
## Number of bootstrap replications R = 178 
##   original  bootBias  bootSE  bootMed
## 1 176.8000 -1.881489 23.1895 174.7183
## 2  27.6031  1.095069  7.4412  28.4382
## 3   6.1741 -0.072778  1.3834   5.9537
## Bootstrap percent confidence intervals
## 
##        2.5 %     97.5 %
## 1 125.320034 224.479385
## 2  14.440280  42.522212
## 3   4.040724   9.591933

The confidence intervals obtained by bootstrap are wider (as expected) than the ones obtained using intervals because they consider the uncertainty in the parameters of the nonlinear model. The next example shows a slightly more complex model in which we introduce the (fixed) effect of year for a subset of the data.

## Number of times model fit did not converge 152 out of 300
## Bootstrap percent confidence intervals
## 
##          2.5 %     97.5 %
## 1  111.1375192 196.119143
## 2  -45.3383112  86.644389
## 3  -52.9212287  65.101962
## 4   -0.5059473 114.993794
## 5   14.3750448  41.574802
## 6  -20.9463834  15.697041
## 7  -17.7796470  16.955886
## 8  -18.1083851  16.965825
## 9    4.4735481   9.074686
## 10  -2.6628309   4.960814
## 11  -4.1799869   1.682101
## 12  -2.8113050   3.862596

This is simply to illustrate the use of bootstrapping for a ‘gnls’ object, which is something that function ‘car::Boot’ does not seem to be able to handle (the deltaMethod function also fails for this type of model, because it cannot handle the names in the vector of coefficients which uses periods).

Bootstrapping nonlinear mixed models

For illustration, I will continue to use the barley example, but this time the model is fitted to each year individually and then a nonlinear mixed model which assumes a diagonal matrix for the random effects (for simplicity). In this case we want an esimtate of the confidence interval for the asymptote which is not an explicit parameter but rather a combination ‘a + b * xs’ of the three parameters.

## Approximate 95% confidence intervals
## 
##  Fixed effects:
##         lower       est.     upper
## a  109.212647 134.663569 160.11449
## b   23.970656  27.939486  31.90832
## xs   6.823868   7.671139   8.51841
## attr(,"label")
## [1] "Fixed effects:"
## Number of times model fit did not converge 90 out of 200
## Bootstrap percent confidence intervals
## 
##     2.5 %   97.5 %
## 1 306.707 382.6985

For this model the estimate for the asymptote is 349, but from the model fit we do not get a direct confidence interval. Bootstrapping makes this easy (but it takes a bit of time) and it results in 307, 383.

Confidence bands for generalized nonlinear models

For this example I will use the Orange dataset. The goal here is to place confidence bands around the mean response function.

## Number of times model fit did not converge 3 out of 300

Simulation

Here I will continue with the Orange dataset and illustrate different types of simulation which are relevant depending on what the inference scope is. This simulation is, in fact, used inside of the bootstrap function above; it is a necessary step for bootstrap for these type of models.

Fitted values

The first level of predictions we can generate for the Orange dataset are at the ‘population’ level. These are similar to the fitted values for the ‘gnls’ case above. Notice that we do not have the option of requesting standard errors for the predicitons for this model either.

One approach we can use to assess the uncertainty in the response would be to sample from the vector of parameters and the associated variance-covariance matrix. In this case, we could do many realizations, but I’ll keep it to just 100. A frequentist interpretation of the bands below would be that in repeated sampling from this same population we would expect that the true relationship will be within this band with a specified probability. One way to compute these bands (which I’m not doing below), would be to calculate the quantiles of the empirical distribution. These could be called confidence bands, which would be equivalent to the ‘interval = confidence’ in a linear model (?predict.lm).

In this case, each tree is an ‘individual’ and the previous one is a population level confidence. Next, we would like to produce individual level ‘prediction’ bands. This is, if we were able to resample from a population with a similar structure and if we wanted to have bands that include the true relationship between circumference and age for these trees, we could produce the confidence bands from the quantiles of the empirical distribution of the lines in the figure below (we would need a few thousand samples in order to do this).

Finally, the next level would be to produce intervals that would likely contain a future observation instead of just the relationship, which was captured in the figure above. The ‘bands’ below are wider than in the previous figure. The reason why the difference is small, is, because in this case the residual variance is comparatively small.

The previous examples are simulations, but in order to generate confidence intervals or bands we would need to perform bootstrap, but it takes much longer, so I’m not including it in this vignette.

Appendix

Note on simulate methodology

In the case of simulating data from a nonlinear model and how this relates to bootstrapping. I’ve been thinking about a number of things. In a nonlinear model, the mean function should account for a substantial part of the variability. If we think about models in terms of

\[ data = signal + noise \] And for mixed models: \[ data = fixed + random + residual \]

I can see how in the linear case often the random part can be of substantial interest and sometimes more important than the fixed part, but for nonlinear models, the fixed part should dominate, otherwise we would need a better function or the data are not amenable to nonlinear modeling. In simulate_nlme, psim = 0, gives you the fitted values, so should be equivalent to ‘predict.nlme’, if theres is ever a discrepancy, then that is a bug. For psim = 1, I sample from the vector of parameters assuming that they are normally distributed. This seems to be very reasonable in practice and, in fact, much more important is the variance-covariance matrix of the vector of parameters and in particular the correlations. The problem is that in nonlinear mixed models, the relationship among parameters is not linear and can look a bit like a ‘banana’. This can be investigated through the use of bootstrap. For psim = 2, I add the residual variability which considers the modeling of the variance (heterogeneous variances), but not the possible lack of independence of the residuals. (This seems to be unlikely to arise for the type of models that we usually fit. Adding the correlation structure of the residuals can be done, but I’ll work on it when the need arises - not a priority.) It would be possible to add an options for psim = 3, where I also consider the uncertainty in the ‘random’ part above, this is is better captures, in my opinion, the ‘level’ argument in the ‘simulate_nlme’ function. By default, ‘predict.nlme’ predicts at the deepest level in the hierarchy (i.e. Q) and changing this argument allows for simulations at the different levels. Another reason, for not implementing psim = 3 (meaning sampling for the random part) is that, often, there is meaning or interest in these specific random terms. For example, sites/locations/subjects/experimenta.units are higher or lower because, in part, some characteristics that are know and some unknown, but which are out of our control. For this reason, assigning resampling or assigning a deviation at random at this level might be undesirable. If we have 10 sites labeled “A” through “J” and we know that site “A” is higher due to some characteristics (which we do not control), it would not make sense to assign the deviation from site “A” to “J” (which might be the lowest). In our analysis it might still make sense to treat ‘sites’ as random. In this case, when we use simulate_nlme, it makes more sense to use ‘psim = 2, level = 1’ (‘site’ would be our deepest level in the hierarchy - like ‘Tree’ in the ‘Orange’ example above) to generate data which appears similar to the data which was actually collected. It is true that this does not allow us to incorporate the uncertainty from the fact that we obtained these specific random deviates at the level of site (or Tree), but these will vary anyway because we are simulating forward from the vector of fixed parameters, which should account for the majority of the variability in these models. For this reason, I know that my approach is not perfect, but, so far, it seems to be usable. I’m thinking about Morris argument (see References) and I tested whether there is bias in the estimate of the random term variances using my method and I did not detect a bias - but I’m not resampling from the BLUPs. I’m also thinking that the non-parameteric part of my method (resampling the residuals) is somewhat trivial and implementing a fully parameteric option is feasible.

References