Multilevel models, or mixed effects models, can easily be estimated in R. Several packages are available. Here, the lmer() function from the lme4-package is described. The specification of several types of models will be shown, using a fictive example. A detailed description of the specification rules is given. Output of the specified models is given, but not described or interpreted.
Please note that this description is very closely related to the description of the specification of the lme() function of the nlme-package. The results are similar and here exactly the same possibilities are offered.
In this example, the dependent variable is the standardized result of a student on a specific exam. This variable is called “normexam”. In estimating the score on the exam, two levels will be discerned: student and school. On each level, one explanatory variable is present. On individual level, we are taking into account the standardized score of the student on a LR-test (“standLRT”). On the school-level, we take into account the average intake-score (“schavg”).
Preparation
Before analyses can be performed, preparation needs to take place. Using the library() command, two packages are loaded. The lme4-package contains functions for estimation of multilevel or hierarchical regression models. The mlmRev-package contains, amongst many other things, the data we are going to use here. In the output below, we see that R-Project automatically loads the Matrix- and the lattice-packages as well. These are needed for the lme4-package to work properly.
Finally, the names() command is used to examine which variables are contained in the ‘Exam’ data.frame.
library(lme4)
library(mlmRev)
names(Exam)
>library(lme4) Loading required package: lme4 Loading required package: Matrix Loading required package: lattice [1] TRUE >library(mlmRev) Loading required package: mlmRev [1] TRUE >names(Exam) [1] "school" "normexam" "schgend" "schavg" "vr" "intake" [7] "standLRT" "sex" "type" "student"
null-model
The syntax below specifies the most simple multilevel regression model of all: the null-model. Only the levels are defined. Using the lmer-function, the first level (here: students) do not have to be specified. It is assumed that the dependent variable (here: normexam) is on the first level (which it should be).
The model is specified using standard R formulas: First the dependent variable is given, followed by a tilde ( ~ ). The ~ should be read as: “follows”, or: “is defined by”. Next, the predictors are defined. In this case, only the intercept is defined by entering a ‘1’. Next, the random elements are specified between brackets ( ). Inside these brackets we specify the random predictors, followed by a vertical stripe ( | ), after which the group-level is specified.
After the model specification, several parameters can be given to the model. Here, we specify the data that should be used by data=Exam. Another often used parameter indicates the estimation method. If left unspecified, restricted maximum likelihood (REML) is used. Another option would be: method=”ML”, which calls for full maximum likelihood estimation. All this leads to the following model specification:
lmer(normexam ~ 1 + (1 | school), data=Exam)
This leads to the following output:
> lmer(normexam ~ 1 + (1 | school), data=Exam) Linear mixed-effects model fit by REML Formula: normexam ~ 1 + (1 | school) Data: Exam AIC BIC logLik MLdeviance REMLdeviance 11019 11031 -5507 11011 11015 Random effects: Groups Name Variance Std.Dev. school (Intercept) 0.17160 0.41425 Residual 0.84776 0.92074 number of obs: 4059, groups: school, 65 Fixed effects: Estimate Std. Error t value (Intercept) -0.01325 0.05405 -0.2452
random intercept, fixed predictor in individual level
For the next model, we add a predictor to the individual level. We do this, by replacing the ‘1’ of the previous model by the predictor (here: standLRT). An intercept is always assumed, so it is still estimated here. It only needs to be specified when no other predictors are specified. Since we don’t want the effect of the predictor to vary between groups, the specification of the random part of the model remains identical to the previous model. The same data is used, so we specify data=Exam again.
lmer(normexam ~ standLRT + (1 | school), data=Exam)
> lmer(normexam ~ standLRT + (1 | school), data=Exam) Linear mixed-effects model fit by REML Formula: normexam ~ standLRT + (1 | school) Data: Exam AIC BIC logLik MLdeviance REMLdeviance 9375 9394 -4684 9357 9369 Random effects: Groups Name Variance Std.Dev. school (Intercept) 0.093839 0.30633 Residual 0.565865 0.75224 number of obs: 4059, groups: school, 65 Fixed effects: Estimate Std. Error t value (Intercept) 0.002323 0.040354 0.06 standLRT 0.563307 0.012468 45.18 Correlation of Fixed Effects: (Intr) standLRT 0.008
random intercept, random slope
The next model that will be specified, is a model with a random intercept on individual level and a predictor that is allowed to vary between groups. In other words, the effect of doing homework on the score on a math-test varies between schools. In order to estimate this model, the ‘1’ that indicates the intercept in the random part of the model specification is replaced by the variable of which we want the effect to vary between the groups.
lmer(normexam ~ standLRT + (standLRT | school), data=Exam, method=”ML”)
> lmer(normexam ~ standLRT + (standLRT | school), data=Exam, method="ML") Linear mixed-effects model fit by maximum likelihood Formula: normexam ~ standLRT + (standLRT | school) Data: Exam AIC BIC logLik MLdeviance REMLdeviance 9327 9358 -4658 9317 9328 Random effects: Groups Name Variance Std.Dev. Corr school (Intercept) 0.090406 0.30068 standLRT 0.014548 0.12062 0.497 Residual 0.553656 0.74408 number of obs: 4059, groups: school, 65 Fixed effects: Estimate Std. Error t value (Intercept) -0.01151 0.03978 -0.289 standLRT 0.55673 0.01994 27.917 Correlation of Fixed Effects: (Intr) standLRT 0.365
random intercept, individual and group level predictor
It is possible to enter variables on group level as well. Here, we will add a predictor that indicates the size of the school. The lmer-function needs this variable to be of the same length as variables on individual length. In other words: for every unit on the lowest level, the variable indicating the group level value (here: the average score on the intake-test for every school) should have a value. For this example, this implies that all respondents that attend the same school, have the same value on the variable “schavg”. We enter this variable to the model in the same way as individual level variables, leading to the following syntax:
lmer(normexam ~ standLRT + schavg + (1 + standLRT | school), data=Exam)
> lmer(normexam ~ standLRT + schavg + (1 + standLRT | school), data=Exam) Linear mixed-effects model fit by REML Formula: normexam ~ standLRT + schavg + (1 + standLRT | school) Data: Exam AIC BIC logLik MLdeviance REMLdeviance 9336 9374 -4662 9310 9324 Random effects: Groups Name Variance Std.Dev. Corr school (Intercept) 0.077189 0.27783 standLRT 0.015318 0.12377 0.373 Residual 0.553604 0.74405 number of obs: 4059, groups: school, 65 Fixed effects: Estimate Std. Error t value (Intercept) -0.001422 0.037253 -0.038 standLRT 0.552243 0.020352 27.135 schavg 0.294737 0.107262 2.748 Correlation of Fixed Effects: (Intr) stnLRT standLRT 0.266 schavg 0.089 -0.085
random intercept, cross-level interaction
Finally, a cross-level interaction is specified. This basically works the same as any other interaction specified in R. In contrast with many other statistical packages, it is not necessary to calculate separate interaction variables (but you’re free to do so, of course).
In this example, the cross-level interaction between time spend on homework and size of the school can be specified by entering a model formula containing standLRT * schavg. This leads to the following syntax and output.
lmer(normexam ~ standLRT * schavg + (1 + standLRT | school), data=Exam)
> lmer(normexam ~ standLRT * schavg + (1 + standLRT | school), data=Exam) Linear mixed-effects model fit by REML Formula: normexam ~ standLRT * schavg + (1 + standLRT | school) Data: Exam AIC BIC logLik MLdeviance REMLdeviance 9334 9379 -4660 9303 9320 Random effects: Groups Name Variance Std.Dev. Corr school (Intercept) 0.076326 0.27627 standLRT 0.012240 0.11064 0.357 Residual 0.553780 0.74416 number of obs: 4059, groups: school, 65 Fixed effects: Estimate Std. Error t value (Intercept) -0.00709 0.03713 -0.191 standLRT 0.55794 0.01915 29.134 schavg 0.37341 0.11094 3.366 standLRT:schavg 0.16182 0.05773 2.803 Correlation of Fixed Effects: (Intr) stnLRT schavg standLRT 0.236 schavg 0.070 -0.064 stndLRT:sch -0.065 0.087 0.252
– – — — —– ——–
- Discuss this article and pose additional questions in the R-Sessions Forum
- Find the original article embedded in the manual.
– – — — —– ——–
R-Sessions is a collection of manual chapters for R-Project, which are maintained on Curving Normality. All posts are linked to the chapters from the R-Project manual on this site. The manual is free to use, for it is paid by the advertisements, but please refer to it in your work inspired by it. Feedback and topic requests are highly appreciated.
——– —– — — – –
I found theses sessions to be extremely helpful. I’m surprised there are no comments here. Just to encourage you to keep providing these sessions, I’m writing to say thanks.
I’m currently studying for my PhD in Economics and decided to switch from Stata to R.
I agree — this was really helpful. I hadn’t found a clear answer to this question anywhere else. Thanks for the post.
Second that. This is the clearest answer I’ve found in 2 years of trying to wrap my head around this. Thanks.
Thanks for posting this – I’m struggling my way through the final few months of an ecological PhD with a lot of mixed modelling involved. It really helps to see the examples and explanations for specifying different models.
Can we consider the group-level variable as the dependent variable?.
This is one of the best explanations I have seen so far. However, I am having a hard time finding examples of code when there are 2 levels. In my case, units (counts so I using glmer) within households and then households within communities.
Two years later… me too! I need to work with several 2- or 3-level models, and can’t seem to find any instructions for how to specify those in lme4 or nlme.
great – very clear and good to include the crossed design. be good to see SPSS syntax equivalents and often people start with spss and move onto R
I found this helpful to illustrate the different syntaxes but could you please provide some information on the interpretation of the output? And also how it differs across scenarios? What exactly are we meant to be looking at to determine our coefficients of interest?
Hi Laura,
thank you for your comment, and for your interest on my (old) blogpost. I refrained from interpreting the results, as interpretations are really dependent on the purpose of the regression model. All parameter estimates are important, but which ones you emphasise in your interpretation depends on your research questions / hypotheses / modelling goals.
Should you have more specific questions, you are always welcome to contact me.
Great post, very clear thanks!
That’s a great post! I have one question…could we include a Level 2 variable as DV e.g. school review scores in your example?
lmer(schoolreviewscore ~ standLRT * schavg + (1 + standLRT | school), data=Exam)
Absolutely!
Excellent discussion of multi level model fitting in R.
Would you be kind enough to supply a PDF copy so that I could give a hand out to students.
Many thanks
John Fresen
Thanks!
A .PDF version of most of the blogs about R is available: http://www.rensenieuwenhuis.nl/documents/Applied%20R.pdf