Several functions already present in R-Project are very useful when analyzing multilevel models or when preparing data to do so. Three of these helper functions will be described: aggregating data, the behavior of the plot() function when applied to a multilevel model and finally setting contrasts for categorical functions. Note that none of these functions are related to multilevel analysis only.
Aggregate
We will continue to work with the Exam-dataset that is made available through the mlmRev-package. In the syntax below we load that package and use the names() function to see what variables are available in the Exam data-set. One of the variables on individual level is the normexam-variable. Let’s say we want to aggregate this variable to the school-level, so that the new variable represents for each student the level of the exam-score at school level.
library(mlmRev)
names(Exam)meansch <- tapply(Exam$normexam, Exam$school, mean)
meansch[1:10]
Exam$meanexam <- meansch[Exam$school]
names(Exam)
Using tapply(), we create a table of the normexam-variable by school, calculating the mean for each school. The result of this is stored in the meansch-variable. This variable now contains 65 values representing the mean score on the normexam-variable for each school. The first ten of these values is shown. This meansch-variable is only a temporary helping variable. Then we create a new variable in the Exam-data.frame called meanexam as well (this can be any name you want, as long as it does not already exist in the data.frame).
We give the correct value on this new variable to each respondent, by indexing the meansch by the school-index stored in the Exam-data.frame. When we look at the names of the Exam data.frame now, we see that our new variable is added to it.
> library(mlmRev) > names(Exam) [1] "school" "normexam" "schgend" "schavg" "vr" [6] "intake" "standLRT" "sex" "type" "student" [11] "meansch" > > meansch <- tapply(Exam$normexam, Exam$school, mean) > meansch[1:10] 1 2 3 4 5 0.501209573 0.783102291 0.855444696 0.073628516 0.403608663 6 7 8 9 10 0.944579967 0.391500023 -0.048192541 -0.435682141 -0.269390664 > Exam$meanexam <- meansch[Exam$school] > names(Exam) [1] "school" "normexam" "schgend" "schavg" "vr" [6] "intake" "standLRT" "sex" "type" "student" [11] "meansch" "meanexam"
plot (multilevel.model)
R-Project has many generic functions, that behave differently when different data is provided to it. A good example of this is the plot()-function. We have already seen how this function can be used to plot simple data. When it is used to plot an estimated multilevel model, it will extract the residuals from it and the result is a plot of the standardized residuals.
library(nlme)
model.01 <- lme(fixed = normexam ~ standLRT,
data = Exam,
random = ~ 1 | school)plot(model.01, main="Residual plot of model.01")
In the syntax above, this is illustrated using a basic multilevel model, based on the Exam-data. First, the nlme-package is loaded, which results in a few warning messages (not shown here) that indicate that some data-sets are loaded twice (they were already loaded with the mlmRev-package). We assign the estimated model, that has a random intercept by schools and a predictor on the first (individual) level, to an object we have called ‘model.01′.
When this is plotted, we see that the residuals are distributed quite homoscedastically. The layout of the plot-window is quite different from what we have seen before. The reason for this is that here the lattice-package for advanced graphics is automatically used.
Contrast
When using categorical data in regression analyses (amongst other types of analysis), it is needed to code dummy-variables or contrasts. This can be done manually of course, as needs to be done in some other statistical packages, but in R-Project this is done automatically.
The ‘vr’-variable in the Exam-dataset represents “Student level Verbal Reasoning (VR) score band at intake – a factor. Levels are bottom 25%, mid 50%, and top 25%.” (quote from the description of the dataset, found by using ?Exam). In the syntax below, a summary of this variable is asked for, resulting in a confirmation that this variable contains three categories indeed.
Then, a basic model is estimated using the ‘vr’-variable and assigned to an object called model.02 (it is a random intercept model at school level containing two variables on individual level (‘vr’ and ‘standLRT’) and one variable on second level (‘schavg’). The summary of that model shows that it is the first category, representing the lowest Verbal Reasoning score band, that is used as reference category in the analyses.
summary(Exam$vr)
model.02 <- lme(fixed = normexam ~ vr + standLRT + schavg,
data = Exam,
random = ~ 1 | school)summary(model.02)
> summary(Exam$vr) bottom 25% mid 50% top 25% 640 2263 1156 > > model.02 <- lme(fixed = normexam ~ vr + standLRT + schavg, + data = Exam, + random = ~ 1 | school) > > summary(model.02) Linear mixed-effects model fit by REML Data: Exam AIC BIC logLik 9379.377 9423.53 -4682.689 Random effects: Formula: ~1 | school (Intercept) Residual StdDev: 0.2844674 0.7523679 Fixed effects: normexam ~ vr + standLRT + schavg Value Std.Error DF t-value p-value (Intercept) 0.0714396 0.16717547 3993 0.42733 0.6692 vrmid 50% -0.0834973 0.16341508 61 -0.51095 0.6112 vrtop 25% -0.0537963 0.27433650 61 -0.19610 0.8452 standLRT 0.5594779 0.01253795 3993 44.62276 0.0000 schavg 0.4007454 0.28016707 61 1.43038 0.1577 Correlation: (Intr) vrm50% vrt25% stnLRT vrmid 50% -0.946 vrtop 25% -0.945 0.886 standLRT 0.000 0.000 0.000 schavg 0.863 -0.794 -0.914 -0.045 Standardized Within-Group Residuals: Min Q1 Med Q3 Max -3.71568977 -0.63052325 0.02781354 0.68294752 3.25580231 Number of Observations: 4059 Number of Groups: 65 >
Should we want to change that, we need to use the contrast() function. When used on the ‘vr’-variable, it makes clear why the first category of the variable is used as reference: in the resulting matrix it is the only category without a ‘1’ on one of the columns.
To change this, we have two options at hand. First we use the contr.treatment()-function. We want the third category used as reference now. So, we specify that the contrasts should be treated in such a way that the categories are based on the levels of the ‘vr’-variable and that the base, or reference, should be the third. The result of the contr.treatment()-function looks exactly like the result of the contrasts()-function (except for a different reference category obviously). To change the used reference in an analysis, the result of the contr.treatment()-function should be assigned to the contrast(Exam$vr). The old contrasts -settings are replaced by the new ones in that way.
Instead of using the contr.treatment() function we can simply create a matrix ourselves and use this to change the contrasts. This is done below in order to make clear what is actually happening when the contr.treatment()-function is used. First the matrix is shown, then it is assigned to the contrasts of the ‘vr’-variable. Now we have chosen the middle category to be the reference. Although this works perfectly, the draw-back of this procedure is that the value-labels of the variable are lost, while these are maintained when contr.treatment() is used.
Finally, using the new contrasts-settings of the ‘vr’-variable, the same model is estimated again. In the output below, we see that indeed the value labels are gone now (because of using a basic matrix to set the contrasts) and that the middle category is used as reference, although the categories are now referred to as ‘1’ and ‘2’ which might lead to confusing interpretations.
contrasts(Exam$vr)
contrasts(Exam$vr) <- contr.treatment(levels(Exam$vr), base=3)
contrasts(Exam$vr)
matrix(data=c(1,0,0,0,0,1), nrow = 3, ncol = 2, byrow=FALSE)
contrasts(Exam$vr) <- matrix(data=c(1,0,0,0,0,1), nrow = 3, ncol = 2, byrow=FALSE)
contrasts(Exam$vr)model.03 <- lme(fixed = normexam ~ vr + standLRT + schavg,
data = Exam,
random = ~ 1 | school)summary(model.03)
> contrasts(Exam$vr) mid 50% top 25% bottom 25% 0 0 mid 50% 1 0 top 25% 0 1 > contr.treatment(levels(Exam$vr), base=3) bottom 25% mid 50% bottom 25% 1 0 mid 50% 0 1 top 25% 0 0 > contrasts(Exam$vr) <- contr.treatment(levels(Exam$vr), base=3) > contrasts(Exam$vr) bottom 25% mid 50% bottom 25% 1 0 mid 50% 0 1 top 25% 0 0 > matrix(data=c(1,0,0,0,0,1), nrow = 3, ncol = 2, byrow=FALSE) [,1] [,2] [1,] 1 0 [2,] 0 0 [3,] 0 1 > contrasts(Exam$vr) <- matrix(data=c(1,0,0,0,0,1), nrow = 3, ncol = 2, byrow=FALSE) > contrasts(Exam$vr) [,1] [,2] bottom 25% 1 0 mid 50% 0 0 top 25% 0 1 > > model.03 <- lme(fixed = normexam ~ vr + standLRT + schavg, + data = Exam, + random = ~ 1 | school) > > summary(model.03) Linear mixed-effects model fit by REML Data: Exam AIC BIC logLik 9379.377 9423.53 -4682.689 Random effects: Formula: ~1 | school (Intercept) Residual StdDev: 0.2844674 0.7523679 Fixed effects: normexam ~ vr + standLRT + schavg Value Std.Error DF t-value p-value (Intercept) -0.0120577 0.05427674 3993 -0.22215 0.8242 vr1 0.0834973 0.16341508 61 0.51095 0.6112 vr2 0.0297010 0.15018792 61 0.19776 0.8439 standLRT 0.5594779 0.01253795 3993 44.62276 0.0000 schavg 0.4007454 0.28016707 61 1.43038 0.1577 Correlation: (Intr) vr1 vr2 stnLRT vr1 -0.096 vr2 -0.551 -0.530 standLRT 0.000 0.000 0.000 schavg 0.267 0.794 -0.806 -0.045 Standardized Within-Group Residuals: Min Q1 Med Q3 Max -3.71568977 -0.63052325 0.02781354 0.68294752 3.25580231 Number of Observations: 4059 Number of Groups: 65 >
– – — — —– ——–
- Discuss this article and pose additional questions in the R-Sessions Forum
- Find the original article embedded in the manual.
– – — — —– ——–
R-Sessions is a collection of manual chapters for R-Project, which are maintained on Curving Normality. All posts are linked to the chapters from the R-Project manual on this site. The manual is free to use, for it is paid by the advertisements, but please refer to it in your work inspired by it. Feedback and topic requests are highly appreciated.
——– —– — — – –
One comment on “R-Sessions 18: Helper Functions”