This nested structure can tell us important things about the social world. Knowing how much variation we have at each level can inform policy and theory.
Multilevel models allow us to estimate these different sources of variation.
- The estimated relationship between X and Y can differ depending on whether we use a multilevel model or ordinary linear regression.
When data are nested, the assumption of independent errors made by ordinary linear regression is often violated. For example, the math achievement scores of students who attend the same school tend to be more strongly correlated than the scores of students who attend different schools.
This may be because students in the same school have the same teachers, curriculum, and community, or for other reasons.
Ignoring this correlation within schools yields inaccurate (typically underestimated) standard errors for the model parameters. This in turn can lead to errors of statistical inference, such as p-values that are smaller than they should be, and therefore to Type I errors.
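To make the consequence concrete, here is a minimal simulation sketch using only the Python standard library. All numbers (20 schools, 30 students each, unit variances, 300 replications) and the helper name `simulate_clustered_mean` are illustrative assumptions, not values from these notes. The true mean is zero, so every rejection by the naive test is a Type I error.

```python
import random
import statistics


def simulate_clustered_mean(n_schools=20, n_students=30, tau=1.0, sigma=1.0, rng=None):
    """Return (sample mean, naive standard error) for one simulated dataset.

    Scores follow y_ij = 0 + U_0j + e_ij, so the true population mean is 0.
    """
    rng = rng or random.Random()
    scores = []
    for _ in range(n_schools):
        school_effect = rng.gauss(0.0, tau)  # shared by every student in the school
        for _ in range(n_students):
            scores.append(school_effect + rng.gauss(0.0, sigma))
    mean = statistics.fmean(scores)
    # The naive SE treats all observations as independent.
    naive_se = statistics.stdev(scores) / len(scores) ** 0.5
    return mean, naive_se


rng = random.Random(42)
results = [simulate_clustered_mean(rng=rng) for _ in range(300)]

# How much the sample mean really varies across replications.
empirical_sd = statistics.stdev([m for m, _ in results])
# What the independence-based formula claims on average.
average_naive_se = statistics.fmean([se for _, se in results])
# Share of replications in which a naive z-test rejects the (true) null.
false_positive_rate = sum(abs(m) / se > 1.96 for m, se in results) / len(results)

print(f"empirical SD of the mean: {empirical_sd:.3f}")
print(f"average naive SE:         {average_naive_se:.3f}")
print(f"naive rejection rate:     {false_positive_rate:.2f} (nominal 0.05)")
```

Under these assumptions the naive standard error should come out well below the actual spread of the sample mean, and the rejection rate well above the nominal 5%.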
Now, we have the same dataset as before.
But this time we introduce a multilevel data structure, i.e. our data points are nested in six groups.
First, we estimate a null model again.
In a multilevel context this means estimating a random intercept only model, without any predictors.
Notice how each group gets its own intercept.
The intercept in thick blue is called the grand mean and shows the average intercept across all groups.
In MLM, analysis often begins with a null model, which can be written as follows. The null model serves as a starting point for model building and as a baseline for model comparison.
\[ y_{ij}=\gamma_{00}+U_{0j}+\varepsilon_{ij} \]
The null model includes an error term for each observation, \(\varepsilon_{ij}\), i.e. the difference between a data point and its own group's intercept.
This captures the within-group variation.
It also includes an error term for each group intercept, \(U_{0j}\), i.e. the difference between the grand mean \(\gamma_{00}\) and the group-specific intercept.
This captures the between-group variation.
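The decomposition can be sketched in a few lines of standard-library Python. The sketch simulates six balanced groups from the null model and splits every observation into grand mean, group deviation, and within-group residual. The parameter values are assumptions chosen for illustration, and the group deviations here are simple sample-mean differences, not the shrunken estimates a fitted multilevel model would report.

```python
import random
import statistics

rng = random.Random(0)
gamma_00 = 50.0                 # grand mean (assumed value)
tau, sigma = 5.0, 2.0           # SDs of U_0j and epsilon_ij (assumed values)
n_groups, n_per_group = 6, 10   # six balanced groups, as in the example above

# Generate y_ij = gamma_00 + U_0j + epsilon_ij.
data = {}
for j in range(n_groups):
    u_0j = rng.gauss(0.0, tau)  # one draw per group
    data[j] = [gamma_00 + u_0j + rng.gauss(0.0, sigma) for _ in range(n_per_group)]

# Group-specific intercepts are estimated by the group means.
group_means = {j: statistics.fmean(ys) for j, ys in data.items()}
# With balanced groups, the mean of the group means equals the overall mean.
grand_mean = statistics.fmean(list(group_means.values()))

# Between-group part: how far each group intercept sits from the grand mean.
between = {j: group_means[j] - grand_mean for j in data}
# Within-group part: how far each observation sits from its group intercept.
within = {j: [y - group_means[j] for y in ys] for j, ys in data.items()}

for j in data:
    print(f"group {j}: intercept {group_means[j]:.2f}, deviation {between[j]:+.2f}")
```

Adding `grand_mean`, `between[j]`, and `within[j][i]` back together reproduces `data[j][i]` exactly, which mirrors the three terms of the equation.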
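The intraclass correlation coefficient (ICC), \(\rho_I\), is the proportion of the total variance in the outcome that lies between clusters. It can also be conceptualized as the correlation on the dependent measure between two individuals randomly selected from the same cluster. It can be expressed as: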
\[ \rho_I = \frac {\tau^2}{\tau^2 + \sigma^2} \]
where \(\tau^2\) is the population variance between clusters and \(\sigma^2\) is the population variance within clusters.
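As a rough illustration of the formula, the sketch below computes the ICC from given variance components and also estimates it from balanced grouped data. The estimator is the classic one-way ANOVA (method-of-moments) estimator, which is an assumption of this sketch; multilevel software typically estimates \(\tau^2\) and \(\sigma^2\) by ML or REML instead. The function names are hypothetical.

```python
import statistics


def icc_from_variances(tau2, sigma2):
    """ICC = between-cluster variance / total variance."""
    return tau2 / (tau2 + sigma2)


def icc_anova(groups):
    """One-way ANOVA estimate of the ICC for balanced groups.

    `groups` is a list of equally sized lists of observations.
    """
    n = len(groups[0])                      # observations per group
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes balanced groups")
    k = len(groups)                         # number of groups
    group_means = [statistics.fmean(g) for g in groups]
    grand_mean = statistics.fmean(group_means)

    # Mean squares between and within groups.
    msb = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    msw = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g) / (k * (n - 1))

    sigma2_hat = msw
    # The moment estimate can be negative in small samples; truncate at zero.
    tau2_hat = max((msb - msw) / n, 0.0)
    return icc_from_variances(tau2_hat, sigma2_hat)


print(icc_from_variances(1.0, 3.0))          # one quarter of the variance is between clusters
print(icc_anova([[1, 2, 3], [4, 5, 6]]))     # two well-separated groups give a high ICC
```

For the toy data, the mean square between is 13.5 and the mean square within is 1, so the estimate is \(25/31 \approx 0.81\).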
The ICC is an important tool in multilevel modeling, in large part because it is an indicator of the degree to which the multilevel data structure might impact the outcome variable of interest.
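The higher the ICC value, the larger the share of the total variance that lies between groups/clusters rather than within them, and the more similar observations from the same group are to one another.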