Part 2: Prior probability distributions
by Laurent Smeets and Rens van der Schoot
Before estimating the model:
1. Do you understand the priors?
After estimation, before inspecting the results:
Understanding the exact influence of the priors
Uninformative/Weakly informative
When objectivity is crucial and you want to let the data speak for itself…
Informative
When including substantive prior information is crucial
or
If I dichotomize the respondents into those who are rated “very attractive” and everyone else, the difference in the proportion of sons between the two groups (0.52 vs. 0.44) is statistically significant (t = 2.44, p<0.05).
n = 3000
Parameter: difference in % girls for both groups
Best estimate using a “flat prior”: 8% difference between both groups
What we know from extensive research:
So, we could argue for a fully informative prior, e.g., a normal distribution with mean = 0 and sd = 0.001
Best estimate using a fully informative prior: 0.007% difference
Best estimate using a weakly informative prior: 0.2% difference
What’s the problem?
Prior can only be interpreted in the context of the likelihood
For each parameter in the model we set priors
In a complex model there can be a complex interplay between priors
Setting weak priors for each individual parameter may still add up to a lot of information overall…
Informativeness can only be judged in comparison to the likelihood
Suggested help source: https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations
brms defaults
Weakly informative priors
If the dataset is big, the impact of the priors is minimal
But, always better to know what you are doing!
Complex models might run into convergence issues → specifying more informative priors might help!
So, how to deviate from the defaults?
Remember our model 2 for Marathon Times (see slides Part 1):
$$\text{MarathonTimeM}_i \sim N(\mu, \sigma_e)$$
$$\mu = \beta_0 + \beta_1 \cdot \text{km4week}_i + \beta_2 \cdot \text{sp4week}_i$$
What are potential priors for each of the parameters?
What do we need to be aware of to start thinking in priors?
Note: I centred both km4week and sp4week around their mean!
Packages needed:
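The slides do not list the packages explicitly here; inferring from the code used below, a minimal setup could be (ggtext is assumed to supply element_markdown(); cmdstanr is assumed to be installed for the sampling backend):

library(brms)       # fitting Bayesian regression models via Stan
library(ggplot2)    # visualizing the priors
library(ggtext)     # element_markdown() used in the plotting theme
library(metRology)  # dt.scaled() for scaled t-distributions
library(priorsense) # power-scaling prior sensitivity checks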
Load the dataset and the model:
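A sketch of this step; the file name "MarathonData.csv" and the exact centering code are assumptions based on the note above:

# Load the marathon data (file name assumed)
MarathonData <- read.csv("MarathonData.csv")

# Center both predictors around their mean (see the note above)
MarathonData$km4week <- MarathonData$km4week - mean(MarathonData$km4week)
MarathonData$sp4week <- MarathonData$sp4week - mean(MarathonData$sp4week)

# Fit model 2 with the brms defaults
Fit_Model <- brm(
  MarathonTimeM ~ 1 + km4week + sp4week,
  data = MarathonData,
  backend = "cmdstanr",
  cores = 4
)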
The brms function get_prior()
Remember our model 2 for Marathon Times:
$$\text{MarathonTimeM}_i \sim N(\mu, \sigma_e)$$
$$\mu = \beta_0 + \beta_1 \cdot \text{km4week}_i + \beta_2 \cdot \text{sp4week}_i$$
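To list the default priors without fitting anything, get_prior() only needs the formula and the data (a sketch, using the object names assumed above):

# Show the default priors brms would use for model 2
get_prior(
  MarathonTimeM ~ 1 + km4week + sp4week,
  data = MarathonData
)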
Columns in the get_prior() output:
prior: type of prior distribution
class: parameter class (with b being the population-level effects)
coef: name of the coefficient within the parameter class
group: grouping factor for group-level parameters (when using mixed effects models)
resp: name of the response variable when using multivariate models
lb & ub: lower and upper bound for parameter restrictions
The best way to make sense of the priors used is visualizing them!
Many options; here we use ggplot2
Generate a plot for the prior of the effects of independent variables
# Setting a plotting theme (element_markdown() comes from the ggtext package)
theme_set(theme_linedraw() +
            theme(text = element_text(family = "Times", size = 10),
                  panel.grid = element_blank(),
                  plot.title = element_markdown()))

# Say we set a Normal distribution with mean = 0 and sd = 10
Prior_betas <- ggplot() +
  stat_function(
    fun = dnorm,                    # density of the normal distribution
    args = list(mean = 0, sd = 10), # mean and sd of the prior
    xlim = c(-20, 20)               # plotting range on the x-axis
  ) +
  scale_y_continuous(name = "density") +
  labs(title = "Prior for the effects of independent variables",
       subtitle = "N(0,10)")

Prior_betas
Generate the plot for the prior of the Intercept (mu)
# Use the brms default t-distribution for the intercept
library(metRology) # provides dt.scaled()

Prior_mu <- ggplot() +
  stat_function(
    fun = dt.scaled,                              # scaled t-distribution from metRology
    args = list(df = 3, mean = 199.2, sd = 24.9), # brms default: student_t(3, 199.2, 24.9)
    xlim = c(120, 300)                            # plotting range on the x-axis
  ) +
  scale_y_continuous(name = "density") +
  labs(title = "Prior for the intercept",
       subtitle = "student_t(3,199.2,24.9)")

Prior_mu
Generate the plot for the prior of the residual standard deviation (sigma)
# Use the brms default half-t distribution (sigma is restricted to be positive)
Prior_sigma <- ggplot() +
  stat_function(
    fun = dt.scaled,                          # scaled t-distribution from metRology
    args = list(df = 3, mean = 0, sd = 24.9), # brms default: student_t(3, 0, 24.9)
    xlim = c(0, 100)                          # wide enough to show the decaying tail
  ) +
  scale_y_continuous(name = "density") +
  labs(title = "Prior for the residual standard deviation",
       subtitle = "student_t(3,0,24.9)")

Prior_sigma
Experimental study (pretest - posttest design) with 3 conditions:
Model:
$$\text{Posttest}_i \sim N(\mu, \sigma_e)$$
$$\mu = \beta_0 + \beta_1 \cdot \text{Pretest}_i + \beta_2 \cdot \text{Exp\_cond1}_i + \beta_3 \cdot \text{Exp\_cond2}_i$$
With: pretest and posttest standardized (mean = 0 ; sd = 1)
Our job: coming up with priors reflecting that we expect both conditions to have a positive effect (a hypothesis based on the literature) and that we have no indications that one experimental condition will outperform the other.
Remember Cohen’s d: 0.2 = small effect size; 0.5 = medium effect size; 0.8 or higher = large effect size
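A hedged sketch of how these expectations could be encoded (the values and the coefficient names Exp_cond1 and Exp_cond2 are illustrative assumptions, not the slides' solution): because the outcome is standardized, the condition effects live on Cohen's d scale, so identical normal priors centered on a small-to-medium positive effect encode "positive effect, no expected difference between conditions".

Priors_experiment <- c(
  # Identical priors for both conditions: positive, small-to-medium effect;
  # the values (mean 0.35, sd 0.2) are illustrative assumptions
  set_prior("normal(0.35, 0.2)", class = "b", coef = "Exp_cond1"),
  set_prior("normal(0.35, 0.2)", class = "b", coef = "Exp_cond2"),
  # Pretest-posttest relations are typically positive (assumption)
  set_prior("normal(0.5, 0.3)", class = "b", coef = "Pretest")
)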
Setting our custom priors can be done with the set_prior() command in brms
E.g., change the priors for the betas (the effects of km4week and sp4week):
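For example (the distributions and values below are illustrative assumptions, not the slides' exact choices):

# Custom priors for the two regression weights (values are assumptions)
Custom_priors <- c(
  set_prior("normal(0, 2)", class = "b", coef = "km4week"),
  set_prior("normal(0, 2)", class = "b", coef = "sp4week")
)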
Did you set sensible priors? How informative are the priors (taken together)?
Step 1: Fit the model with custom priors with the option sample_prior = "only"

Fit_Model_priors <- brm(
  MarathonTimeM ~ 1 + km4week + sp4week,
  data = MarathonData,
  prior = Custom_priors,
  backend = "cmdstanr",
  cores = 4,
  sample_prior = "only"
)
Note: this will not work with the default priors of brms, as brms sets flat priors for the betas. So, you have to set custom priors for the effects of the independent variables to be able to perform the prior predictive checks.
Step 2: Visualize the data with the pp_check() function
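A minimal call, assuming 300 prior-predictive draws as in the plots described below (ndraws is the argument name in recent brms versions):

# Density overlay: 300 simulated datasets vs. the observed marathon times
pp_check(Fit_Model_priors, ndraws = 300)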
300 distributions of simulated datasets (marathon times) overlaid with our observed data (to have an anchoring point)
How are summary statistics of simulated datasets (e.g., median, min, max, …) distributed over the datasets?
How does that compare to our real data?
Use the type = "stat" argument within pp_check() (see the sketch below)
300 medians of simulated datasets (marathon times) overlaid with the median of our observed data (to have an anchoring point)
300 fastest marathon times (in the simulated datasets) overlaid with the fastest observed time in our data (to have an anchoring point)
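A sketch of both stat-based checks described above (object and argument names as before; the stat argument is passed on to the underlying bayesplot function):

# Medians of 300 prior-predictive datasets vs. the observed median
pp_check(Fit_Model_priors, type = "stat", stat = "median", ndraws = 300)

# Minima (fastest simulated marathon times) vs. the fastest observed time
pp_check(Fit_Model_priors, type = "stat", stat = "min", ndraws = 300)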
Put priors in the context of the likelihood
The priors are non-informative! They will not predict our data perfectly.
Often we rely on ‘arbitrarily’ chosen (default) weakly informative priors
What is the influence of the prior (and the likelihood) on our results?
You could set new priors ad hoc, re-run the analyses, and compare (a lot of work, without strict systematic guidelines)
Semi-automated checks can be done with the priorsense package
priorsense is a recently launched package dedicated to prior sensitivity analyses
Key idea: power-scaling (both the prior and the likelihood are raised to a power α to see how much the posterior changes)
background reading:
YouTube talk:
A first check is done by using the powerscale_sensitivity() function
the prior column contains info on sensitivity to the prior (should be lower than 0.05)
the likelihood column contains info on sensitivity to the likelihood (which we want to be high: ‘let our data speak’)
the diagnosis column is a verbalization of the potential problem (‘-’ if none)
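A minimal call that produces output like the table below (the name of the fitted-model object is an assumption):

library(priorsense)
# Power-scale both prior and likelihood and summarize sensitivity per parameter
powerscale_sensitivity(Fit_Model)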
Sensitivity based on cjs_dist:
# A tibble: 4 × 4
variable prior likelihood diagnosis
<chr> <dbl> <dbl> <chr>
1 b_Intercept 0.000858 0.0856 -
2 b_km4week 0.000515 0.0807 -
3 b_sp4week 0.000372 0.0837 -
4 sigma 0.00574 0.152 -