Multilevel models, or mixed effect models, can easily be estimated in R. Several packages are available. Here, the lme() function from the nlme-package is described. The specification of several types of models will be shown, using a fictive example. A detailed description of the specification rules is given. Output of the specified models is given, but not described or interpreted.
Please note that this description is very closely related to the description of the specification of the lmer() function of the lme4-package. The results are similar and here exactly the same possibilities are offered.
In this example, the dependent variable is the standardized result of a student on a specific exam. This variable is called “normexam”. In estimating the score on the exam, two levels will be discerned: student and school. On each level, one explanatory variable is present. On individual level, we are taking into account the standardized score of the student on a LR-test (“standLRT”). On the school-level, we take into account the average intake-score (“schavg”).
Preparation
Before analyses can be performed, preparation needs to take place. Using the library() command, two packages are loaded. The nlme-package contains functions for estimation of multilevel or hierarchical regression models. The mlmRev-package contains, amongst many other things, the data we are going to use here. In the output below, we see that R-Project automatically loads the Matrix- and the lattice-packages as well. These are needed for the mlmRev-package to work properly.
Finally, the names() command is used to examine which variables are contained in the ‘Exam’ data.frame.
require(nlme)
require(mlmRev)
names(Exam)
Note that in the output below, R-Project notifies us that the objects ‘Oxboys’ and ‘bdf’ are masked when the mlmRev-package is loaded. This is simply caused by the fact, that objects with similar names were already loaded as part of the nlme-package. When we know that these objects contain exactly the same data or functions, there is nothing to worry about. If we don’t know that or if we do indeed have knowledge of differences, we should be careful in which order the packages are loaded. Since we don’t need those here, there is no need to further investigate that.
> require(nlme) Loading required package: nlme [1] TRUE > require(mlmRev) Loading required package: mlmRev Loading required package: lme4 Loading required package: Matrix Loading required package: lattice Attaching package: 'mlmRev' The following object(s) are masked from package:nlme : Oxboys, bdf [1] TRUE > names(Exam) [1] "school" "normexam" "schgend" "schavg" "vr" "intake" [7] "standLRT" "sex" "type" "student"
null-model
The syntax below specifies the most simple multilevel regression model of all: the null-model. Only the levels are defined. Using the lme-function, the first level (here: students) do not have to be specified. It is assumed that the dependent variable (here: normexam) is on the first level (which it should be).
The model is specified using two standard R formulas, respectively for the fixed part and for the random part, in indicating the different levels as well. The fixed and the random formulas are preceded by respectively ‘fixed =’ and ‘random =’. In the formula for the fixed part, first the dependent variable is given, followed by a tilde ( ~ ). The ~ should be read as: “follows”, or: “is defined by”. Next, the predictors are defined. In this case, only the intercept is defined by entering a ‘1’.
The formula for the random part is given, stating with a tilde (~). The dependent variable is not given here. Then the random variables are given, followed by a vertical stripe ( | ), after which the group-level is specified.
After the model specification, several parameters can be given to the model. Here, we specify the data that should be used by data=Exam. Another often used parameter indicates the estimation method. If left unspecified, restricted maximum likelihood (REML) is used. Another option would be: method=”ML”, which calls for full maximum likelihood estimation. All this leads to the following model specification:
lme(fixed = normexam ~ 1,
data = Exam,
random = ~ 1 | school)
This leads to the following output:
> lme(fixed = normexam ~ 1, + data = Exam, + random = ~ 1 | school) Linear mixed-effects model fit by REML Data: Exam Log-restricted-likelihood: -5507.327 Fixed: normexam ~ 1 (Intercept) -0.01325213 Random effects: Formula: ~1 | school (Intercept) Residual StdDev: 0.4142457 0.9207376 Number of Observations: 4059 Number of Groups: 65
random intercept, fixed predictor in individual level
For the next model, we add a predictor to the individual level. We do this, by replacing the ‘1’ in the formula for the fixed part of the previous model by the predictor (here: standLRT). An intercept is always assumed, so it is still estimated here. It only needs to be specified when no other predictors are specified. Since we don’t want the effect of the predictor to vary between groups, the specification of the random part of the model remains identical to the previous model. The same data is used, so we specify data=Exam again.
lme(fixed = normexam ~ standLRT,
data = Exam,
random = ~ 1 | school)
> lme(fixed = normexam ~ standLRT, + data = Exam, + random = ~ 1 | school) Linear mixed-effects model fit by REML Data: Exam Log-restricted-likelihood: -4684.383 Fixed: normexam ~ standLRT (Intercept) standLRT 0.002322823 0.563306914 Random effects: Formula: ~1 | school (Intercept) Residual StdDev: 0.3063315 0.7522402 Number of Observations: 4059 Number of Groups: 65
random intercept, random slope
The next model that will be specified, is a model with a random intercept on individual level and a predictor that is allowed to vary between groups. In other words, the effect of doing homework on the score on a math-test varies between schools. In order to estimate this model, the ‘1’ that indicates the intercept in the random part of the model specification is replaced by the variable of which we want the effect to vary between the groups.
lme(fixed = normexam ~ standLRT,
data = Exam,
random = ~ standLRT | school)
> lme(fixed = normexam ~ standLRT, + data = Exam, + random = ~ standLRT | school) Linear mixed-effects model fit by REML Data: Exam Log-restricted-likelihood: -4663.8 Fixed: normexam ~ standLRT (Intercept) standLRT -0.01164834 0.55653379 Random effects: Formula: ~standLRT | school Structure: General positive-definite, Log-Cholesky parametrization StdDev Corr (Intercept) 0.3034980 (Intr) standLRT 0.1223499 0.494 Residual 0.7440699 Number of Observations: 4059 Number of Groups: 65
random intercept, individual and group level predictor
It is possible to enter variables on group level as well. Here, we will add a predictor that indicates the size of the school. The lme()-function needs this variable to be of the same length as variables on individual length. In other words: for every unit on the lowest level, the variable indicating the group level value (here: the average score on the intake-test for every school) should have a value. For this example, this implies that all respondents that attend the same school, have the same value on the variable “schavg”. We enter this variable to the model in the same way as individual level variables, leading to the following syntax:
lme(fixed = normexam ~ standLRT + schavg,
data = Exam,
random = ~ standLRT | school)
> lme(fixed = normexam ~ standLRT + schavg, + data = Exam, + random = ~ standLRT | school) Linear mixed-effects model fit by REML Data: Exam Log-restricted-likelihood: -4661.943 Fixed: normexam ~ standLRT + schavg (Intercept) standLRT schavg -0.001422435 0.552241377 0.294758810 Random effects: Formula: ~standLRT | school Structure: General positive-definite, Log-Cholesky parametrization StdDev Corr (Intercept) 0.2778443 (Intr) standLRT 0.1237837 0.373 Residual 0.7440440 Number of Observations: 4059 Number of Groups: 65
random intercept, cross-level interaction
Finally, a cross-level interaction is specified. This basically works the same as any other interaction specified in R. In contrast with many other statistical packages, it is not necessary to calculate separate interaction variables (but you’re free to do so, of course).
In this example, the cross-level interaction between time spend on homework and size of the school can be specified by entering a model formula containing standLRT * schavg. This leads to the following syntax and output.
lme(fixed = normexam ~ standLRT * schavg,
data = Exam,
random = ~ standLRT | school)
> lme(fixed = normexam ~ standLRT * schavg, + data = Exam, + random = ~ standLRT | school) Linear mixed-effects model fit by REML Data: Exam Log-restricted-likelihood: -4660.194 Fixed: normexam ~ standLRT * schavg (Intercept) standLRT schavg standLRT:schavg -0.007091769 0.557943270 0.373396511 0.161829150 Random effects: Formula: ~standLRT | school Structure: General positive-definite, Log-Cholesky parametrization StdDev Corr (Intercept) 0.2763292 (Intr) standLRT 0.1105667 0.357 Residual 0.7441656 Number of Observations: 4059 Number of Groups: 65
– – — — —– ——–
- Discuss this article and pose additional questions in the R-Sessions Forum
- Find the original article embedded in the manual.
– – — — —– ——–
R-Sessions is a collection of manual chapters for R-Project, which are maintained on Curving Normality. All posts are linked to the chapters from the R-Project manual on this site. The manual is free to use, for it is paid by the advertisements, but please refer to it in your work inspired by it. Feedback and topic requests are highly appreciated.
——– —– — — – –
Why do I keep getting this error?
**Iteration 1
LME step: Loglik: -1606.834 , nlm iterations: 32
reStruct parameters:
ID1 ID2 ID3 ID4
10.055430 2.372683 0.238945 10.159490
Error in nlme.formula((weight) ~ Reed1model(age_months, a, b, c, d), data = wei02yfemale_gr, :
Step halving factor reduced below minimum in PNLS step
>
How do you then specify the syntax if multiple random effects are included in the model?
Thanks