Part 3: a mixed effects example
Key idea:
leave one data point out of the data
re-fit the model
predict the value for that one data point and compare with observed value
re-do this n times
The loo package
Leave-one-out as described is almost impossible!
loo uses a "shortcut making use of the mathematics of Bayesian inference" [1]
Result: elpd ("expected log predictive density"); a higher elpd implies better model fit without being sensitive to over-fitting!
loo code
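A minimal sketch of what this could look like with brms; the object names M1, M2, and M3 are assumptions for the three models fitted in the exercise below:

library(brms)

# Approximate leave-one-out cross-validation for each model
# (M1, M2, M3 are assumed object names; see the exercise below)
loo_M1 <- loo(M1)
loo_M2 <- loo(M2)
loo_M3 <- loo(M3)

# Compare the models: higher elpd = better expected predictive performance
loo_compare(loo_M1, loo_M2, loo_M3)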
WritingData.RData
Experimental study on Writing instructions
2 conditions:
Model 1: Intercept varies between classes j + overall effect of FirstVersion
Frequentist way of writing…
Bayesian way of writing…
Model 2: Intercept and effect of FirstVersion vary between classes j (random slopes)
Frequentist way of writing…
Bayesian way of writing…
or
Model 3: M2 + Effect of condition
Open WritingData.RData
Estimate 3 models with SecondVersion as dependent variable:
Model 1: FirstVersion_GM + random effect of Class ((1 | Class))
Model 2: FirstVersion_GM with random intercept and slope for Class ((1 + FirstVersion_GM | Class))
Model 3: Model 2 + Experimental_condition
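As a sketch (object names and the sampler options are assumptions, mirroring the brm( ) call shown later in these notes), the three models could be fitted like this:

library(brms)

# Sketch: the three models from the exercise, with default priors
M1 <- brm(SecondVersion ~ FirstVersion_GM + (1 | Class),
          data = WritingData, backend = "cmdstanr", cores = 4, seed = 1975)

M2 <- brm(SecondVersion ~ FirstVersion_GM + (1 + FirstVersion_GM | Class),
          data = WritingData, backend = "cmdstanr", cores = 4, seed = 1975)

M3 <- brm(SecondVersion ~ FirstVersion_GM + Experimental_condition +
            (1 + FirstVersion_GM | Class),
          data = WritingData, backend = "cmdstanr", cores = 4, seed = 1975)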
Compare the models on their fit
What do we learn?
Make a summary of the best fitting model
Something to worry about!
Essentially: the sampling of parameter values went wrong
Fixes:
control = list(adapt_delta = 0.9) works
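For example, a sketch of re-fitting Model 3 with a higher adapt_delta (the other arguments are assumptions carried over from the call used elsewhere in these notes):

M3 <- brm(
  SecondVersion ~ FirstVersion_GM + Experimental_condition + (1 + FirstVersion_GM | Class),
  data = WritingData,
  backend = "cmdstanr",
  cores = 4,
  seed = 1975,
  control = list(adapt_delta = 0.9)  # make the sampler take smaller, more careful steps
)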
brms priors
The priors of M3
The overall intercept
Maybe a little wide?? But, let’s stick with it…
What about the overall effect of FirstVersion_GM?
What about the overall effect of ExperimentalCondition?
Nice to know:
What about the overall effect of FirstVersion_GM?
library(ggplot2)

# Normal distribution with mean = 1 and sd = 5
Prior_beta1 <- ggplot( ) +
  stat_function(
    fun = dnorm,                    # we use the normal distribution
    args = list(mean = 1, sd = 5),  # with mean 1 and sd 5
    xlim = c(-9, 11)                # mean +/- 2 sd
  ) +
  scale_y_continuous(name = "density") +
  labs(title = "Prior for the effect of FirstVersion",
       subtitle = "N(1,5)") +
  geom_vline(xintercept = 1, linetype = 3)

Prior_beta1
What about the overall effect of ExperimentalCondition?
Assuming a "small effect size" of 0.2 SDs = an effect of 3.4 points on our scale (so 1 SD on our scale is roughly 17 points)
We also want to give probability to big effects as well as negative effects, so we set the standard deviation of our prior distribution wide enough (e.g., 17).
# Normal distribution with mean = 3.4 and sd = 17
Prior_beta2 <- ggplot( ) +
  stat_function(
    fun = dnorm,                      # we use the normal distribution
    args = list(mean = 3.4, sd = 17), # with mean 3.4 and sd 17
    xlim = c(-30.6, 37.4)             # mean +/- 2 sd
  ) +
  scale_y_continuous(name = "density") +
  labs(title = "Prior for the effect of Experimental_condition",
       subtitle = "N(3.4,17)") +
  geom_vline(xintercept = 3.4, linetype = 3)

Prior_beta2
Variance between classes for the intercept (expressed as an SD)?
How large are differences between classes for an average student (for first version)?
So, a high prior probability for SDs bigger than 13.3; what does that mean?
Imagine an SD of 13.3 for the differences between classes’ intercepts
95% of classes will score between 83.8 and 137
Distribution alert!! This is not a probability distribution for a parameter but a distribution of hypothetical observations (classes)!!
Imagine an SD of 20 for the differences between classes’ intercepts
95% of classes will score between 70.4 and 150.4
Distribution alert!! This is not a probability distribution for a parameter but a distribution of hypothetical observations (classes)!!
Variance between classes for the effect of FirstVersion (expressed as an SD)?
How large are differences between classes for the effect of first version?
So, a high prior probability for SDs bigger than 13.3; what does that mean?
Imagine an SD of 13.3 for the differences in slopes between classes
For 95% of classes the effect of first version will vary between -25.6 and 27.6!
Distribution alert!! This is not a probability distribution for a parameter but a distribution of hypothetical observations (classes)!!
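The ranges above follow from the usual "mean ± 2 SD" rule; a small sketch of the arithmetic (the overall intercept of about 110.4 and slope of 1 are read off from the ranges quoted above):

# 95% ranges of hypothetical classes, computed as mean +/- 2 * SD
110.4 + c(-2, 2) * 13.3   # intercepts, SD = 13.3: ~  83.8 to 137
110.4 + c(-2, 2) * 20     # intercepts, SD = 20:   ~  70.4 to 150.4
1     + c(-2, 2) * 13.3   # slopes,     SD = 13.3: ~ -25.6 to  27.6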
Did you set sensible priors? How informative are the priors (taken together)?
Step 1: fit the model in brms with custom priors and the option sample_prior = "only"
M3_custom_priorsOnly <- brm(
  SecondVersion ~ FirstVersion_GM + Experimental_condition + (1 + FirstVersion_GM | Class),
  data = WritingData,
  prior = Custom_priors,
  backend = "cmdstanr",
  cores = 4,
  seed = 1975,
  sample_prior = "only"
)
Note
This will not work with the default priors of brms, as brms sets flat priors for the betas. So you have to set custom priors for the effects of the independent variables to be able to perform the prior predictive checks.
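A sketch of how Custom_priors could be defined; the two normal priors match the N(1,5) and N(3.4,17) distributions discussed above, everything else is left at the brms defaults:

library(brms)

# Sketch: custom priors for the two fixed effects (values from the discussion above)
Custom_priors <- c(
  set_prior("normal(1, 5)",    class = "b", coef = "FirstVersion_GM"),
  set_prior("normal(3.4, 17)", class = "b", coef = "Experimental_condition")
)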
Step 2: visualize the data with the pp_check( ) function from brms
300 distributions of simulated datasets (scores on SecondVersion) overlaid with our observed data (to have an anchoring point)
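A sketch of the call (the number of draws is set to 300 to match the plots described here):

# Density overlay: 300 prior-predictive distributions of SecondVersion vs. the observed data
pp_check(M3_custom_priorsOnly, ndraws = 300)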
How are summary statistics of simulated datasets (e.g., median, min, max, …) distributed over the datasets?
How does that compare to our real data?
Use the type = "stat" argument within pp_check()
300 medians of simulated datasets (scores on SecondVersion) overlaid with the median of our observed data (to have an anchoring point)
300 lowest scores (in the simulated datasets) overlaid with the lowest observed score in our data (to have an anchoring point)
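A sketch of the corresponding calls (the choice of statistics mirrors the plots described above):

# Distribution of the median over 300 prior-predictive datasets vs. the observed median
pp_check(M3_custom_priorsOnly, type = "stat", stat = "median", ndraws = 300)

# Distribution of the minimum over 300 prior-predictive datasets vs. the observed minimum
pp_check(M3_custom_priorsOnly, type = "stat", stat = "min", ndraws = 300)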
For each data point, situated within its group (Class), a set of 300 predicted scores based on the priors only.
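One possible way to get such a plot (the plot type here is an assumption, not necessarily the one used in the original material):

# Per-observation prior-predictive intervals, grouped by Class
pp_check(M3_custom_priorsOnly, type = "intervals_grouped", group = "Class", ndraws = 300)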
Put priors in the context of the likelihood
The priors are non-informative!!
They do not predict our data perfectly.
Often we rely on arbitrarily chosen (default) weakly informative priors
What is the influence of the prior (and the likelihood) on our results?
You could set new priors ad hoc, re-run the analyses, and compare (a lot of work, without strict systematic guidelines)
Semi-automated checks can be done with the priorsense package
The priorsense package
Recently, a package dedicated to prior sensitivity analyses was launched
Key idea: power-scaling (of both the prior and the likelihood)
background reading:
YouTube talk:
A first check is done by using the powerscale_sensitivity( ) function
the column prior contains info on sensitivity to the prior (should be lower than 0.05)
the column likelihood contains info on sensitivity to the likelihood (which we want to be high: 'let our data speak')
the column diagnosis is a verbalization of potential problems ('-' if none)
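A sketch of the call that produces the table below (M3_custom is an assumed name for Model 3 fitted to the data with the custom priors):

library(priorsense)

# Power-scaling sensitivity diagnostics for all parameters of the fitted model
powerscale_sensitivity(M3_custom)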
Sensitivity based on cjs_dist:
# A tibble: 47 × 4
variable prior likelihood diagnosis
<chr> <dbl> <dbl> <chr>
1 b_Intercept 0.00282 0.0448 -
2 b_FirstVersion_GM 0.00201 0.0152 -
3 b_Experimental_condition 0.00170 0.0229 -
4 sd_Class__Intercept 0.00549 0.169 -
5 sd_Class__FirstVersion_GM 0.00411 0.0296 -
6 cor_Class__Intercept__FirstVersion_GM 0.000935 0.171 -
7 sigma 0.00199 0.301 -
8 r_Class[1,Intercept] 0.00116 0.0803 -
9 r_Class[2,Intercept] 0.00114 0.133 -
10 r_Class[3,Intercept] 0.00167 0.123 -
# ℹ 37 more rows