Part 1: Introduction
Let’s get to know each other first!
Part 1: Introduction to Bayesian inference and some basics in brms
Part 2: Think and set priors in a simple model
Part 3: Comparing models and an introduction to Bayesian mixed effects models
Part 4: Reporting and interpreting Bayesian models
Integrated exercise: analyse your own data, or do the integrated exercise as homework whenever you want
All material is on the online dashboard:
https://abar-turku2024.netlify.app/#quick-overview
You can find:
an overview of the course
how to prepare
for each day: the slides and datasets
html slides can be exported to pdf in your browser (use the menu button at the bottom right)
We want to learn more about a population but we have sample data!
Therefore, we have uncertainty
What is the effect of the average number of kilometres run per week on marathon time?
We can estimate the effect by using a regression model based on a sample:
$\text{MarathonTime}_i = \beta_0 + \beta_1 \cdot \text{n\_km}_i + \epsilon_i$
Of course, we are not interested in the value of $\beta_1$ in the sample; we want to learn more about $\beta_1$ in the population!
Maximum likelihood estimation:
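In R, this maximum likelihood fit is obtained with lm(); a minimal sketch, assuming the MarathonData dataset is loaded (ML_fit is a name chosen here for illustration):

# Frequentist fit: effect of km4week on marathon time
ML_fit <- lm(MarathonTimeM ~ km4week, data = MarathonData)
summary(ML_fit)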
Call:
lm(formula = MarathonTimeM ~ km4week, data = MarathonData)
Residuals:
Min 1Q Median 3Q Max
-42.601 -12.469 -0.536 12.816 38.670
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 199.14483 1.93856 102.728 < 2e-16 ***
km4week -0.50907 0.07233 -7.038 4.68e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 18.08 on 85 degrees of freedom
Multiple R-squared: 0.3682, Adjusted R-squared: 0.3608
F-statistic: 49.53 on 1 and 85 DF, p-value: 4.675e-10
Sample: each additional km per week is associated with an approximately half a minute faster marathon time
Population:
==> So: -0.6 is just as probable as -0.4 …
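For reference, the frequentist 95% confidence interval for the slope can be computed with confint(), using the hypothetical ML_fit object from the sketch above:

# 95% confidence interval for the km4week slope
confint(ML_fit, "km4week", level = 0.95)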
Advantages:
Disadvantage:
$P(\theta \mid \text{data}) = \dfrac{P(\text{data} \mid \theta)\,P(\theta)}{P(\text{data})}$ with
$P(\text{data} \mid \theta)$: the likelihood of the data given the parameters in the model, $\theta$
This part is about our MODEL and the parameters (aka the unknowns) in that model
Example: a model on “time needed to run a marathon” could be a normal distribution: $y_i \sim N(\mu, \sigma)$
So: for a certain average (μ) and standard deviation (σ) we can calculate the probability of observing a marathon time of 240 minutes
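In R this density is given by dnorm(); the values of μ and σ below are chosen purely for illustration:

# Probability density of observing a marathon time of 240 minutes,
# given mu = 200 and sigma = 30 (illustrative values)
dnorm(240, mean = 200, sd = 30)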
Expression of our prior knowledge (belief) on the parameter values
Example: a model on “time needed to run a marathon” could be a normal distribution: $y_i \sim N(\mu, \sigma)$
e.g., How probable is an average marathon time of 120 minutes (P(μ=120)) and how probable is a standard deviation of 40 minutes (P(σ=40))?
How probable are different values as average marathon time in a population?
Uninformative/Weakly informative
When objectivity is crucial and you want to let the data speak for itself…
Informative
When including significant information is crucial
Two potential priors for the mean marathon time in the population
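As a minimal sketch, two such candidate priors could be visualised like this (the exact distributions below are illustrative assumptions, not the ones from the slides):

# Comparing a weakly informative and an informative prior for the
# mean marathon time (both choices are hypothetical)
library(ggplot2)

mu_grid <- seq(0, 500, by = 1)

priors <- data.frame(
  mu      = rep(mu_grid, 2),
  density = c(dnorm(mu_grid, mean = 250, sd = 100),  # weakly informative
              dnorm(mu_grid, mean = 240, sd = 15)),  # informative
  prior   = rep(c("N(250, 100): weakly informative",
                  "N(240, 15): informative"), each = length(mu_grid))
)

ggplot(priors, aes(x = mu, y = density, colour = prior)) +
  geom_line() +
  labs(x = "Average marathon time (minutes)", y = "Prior density")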
“For Bayesians, the data are treated as fixed and the parameters vary. […] Bayes’ rule tells us that to calculate the posterior probability distribution we must combine a likelihood with a prior probability distribution over parameter values.”
(Lambert, 2018, p.88)
Important: 2 types/contexts
Distribution of values of a variable (e.g., how observed marathon times are distributed)
Distribution of probabilities of a parameter (prior or posterior)
This can be confusing, so each time, slow down and think!
Our Model for marathon times
$y_i \sim N(\mu, \sigma)$
Our Priors related to mu and sigma:
Our data:
[1] 185 193 240 245 155 234 189 196 206 263
What is the posterior probability of mu = 180 combined with a sigma = 15?
What is the posterior probability of mu = 210 combined with a sigma = 30?
We calculated the posterior probability of a combination of 2 parameters
We could repeat this systematically for the following values:
aka: we create a grid of possible parameter-value combinations and approximate the distribution of posterior probabilities
C1: first combo of mu and sigma
& C2: second combo of mu and sigma
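A minimal sketch of such a grid approximation in R, using the 10 observed marathon times (the prior choices here are illustrative assumptions; a flat prior is used for σ):

# Grid approximation of the joint posterior of mu and sigma
MT <- c(185, 193, 240, 245, 155, 234, 189, 196, 206, 263)

grid <- expand.grid(
  mu    = seq(150, 260, by = 1),
  sigma = seq(5, 60, by = 0.5)
)

# log prior: mu ~ N(200, 40) (illustrative); flat prior on sigma
log_prior <- dnorm(grid$mu, mean = 200, sd = 40, log = TRUE)

# log likelihood of the 10 observations for each (mu, sigma) combo
log_lik <- mapply(
  function(m, s) sum(dnorm(MT, mean = m, sd = s, log = TRUE)),
  grid$mu, grid$sigma
)

# unnormalised log posterior -> normalised posterior probabilities
log_post <- log_prior + log_lik
grid$posterior <- exp(log_post - max(log_post))
grid$posterior <- grid$posterior / sum(grid$posterior)

# e.g., the most probable combination in the grid
grid[which.max(grid$posterior), ]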
Instead of a fixed grid of combinations of parameter values,
sample pairs/triplets/… of parameter values
reconstruct the posterior probability distribution
brms
A ‘simple’ linear model
$MT_i \sim N(\mu_i, \sigma_e)$
$\mu_i = \beta_0 + \beta_1 \cdot \text{sp4week}_i + \beta_2 \cdot \text{km4week}_i + \beta_3 \cdot \text{Age}_i + \beta_4 \cdot \text{Gender}_i + \beta_5 \cdot \text{CrossTraining}_i$
So you can get a REALLY LARGE number of parameters!
Complex models → large number of parameters → exponentially large number of combinations!
The posterior becomes intractable for grid approximation
We will approximate the ‘joint posterior’ by ‘smart’ sampling
Samples of combinations of parameter values are drawn
BUT: samples will not be random!
The following link brings you to an interactive tool that lets you get the basic idea behind MCMC sampling:
https://chi-feng.github.io/mcmc-demo/app.html#HamiltonianMC,standard
Different dedicated software packages are available: JAGS / BUGS / Stan
The most powerful is Stan! Specifically, the Hamiltonian Monte Carlo algorithm makes it the best choice at the moment
Stan is a probabilistic programming language that uses C++
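Below is the Stan code that brms generated for a simple intercept-only model; for any fitted brms model, the underlying Stan code can be inspected with stancode().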
// generated with brms 2.14.4
functions {
}
data {
int<lower=1> N; // total number of observations
vector[N] Y; // response variable
int prior_only; // should the likelihood be ignored?
}
transformed data {
}
parameters {
real Intercept; // temporary intercept for centered predictors
real<lower=0> sigma; // residual SD
}
transformed parameters {
}
model {
// likelihood including all constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = Intercept + rep_vector(0.0, N);
target += normal_lpdf(Y | mu, sigma);
}
// priors including all constants
target += normal_lpdf(Intercept | 500,50);
target += student_t_lpdf(sigma | 3, 0, 68.8)
- 1 * student_t_lccdf(0 | 3, 0, 68.8);
}
generated quantities {
// actual population-level intercept
real b_Intercept = Intercept;
}
How brms works
The brms syntax is very similar to lm or lme4, and in line with the typical R-style way of writing up a model …
Notice: backend = "cmdstanr" indicates the way we want to interact with Stan and C++.
The simplest model looked like:
$MT_i \sim N(\mu, \sigma_e)$
In brms, this model is:
MT <- c(185, 193, 240, 245, 155, 234, 189, 196, 206, 263)
# First make a dataset from our MT vector
DataMT <- tibble(MT)  # data_frame() is deprecated; tibble() is its replacement
Mod_MT1 <- brm(
MT ~ 1, # We only model an intercept
data = DataMT,
backend = "cmdstanr",
seed = 1975
)
🏃 Try it yourself and run the model … (don’t forget to load the necessary packages: brms & tidyverse)
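The estimates can then be inspected with summary(), which produces the output below:

summary(Mod_MT1)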
Family: gaussian
Links: mu = identity; sigma = identity
Formula: MT ~ 1
Data: DataMT (Number of observations: 10)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 209.58 11.18 187.37 231.78 1.00 2386 1935
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 36.21 9.51 23.26 58.13 1.00 2487 2329
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Markov Chain Monte Carlo
brms defaults: 4 chains, each of 2000 iterations, of which 1000 are used as burn-in (warmup)
Overriding the brms defaults:
Mod_MT1 <- brm(
  MT ~ 1,
  data = DataMT,
  backend = "cmdstanr",
  chains = 5,
  iter = 6000,
  warmup = 1500,
  cores = 4,
  seed = 1975
)
Checking the chains with the plot() function
🏃 Exercise: load MarathonData.RData and model the effect of km4week and sp4week on MarathonTimeM, using the brms defaults; inspect the chains with the plot() function (see the sketch after this list)
Rules of thumb for convergence:
R̂ < 1.015 for each parameter estimate
at least 4 chains are recommended
Effective Sample Size (ESS) > 400 to rely on R̂
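A minimal sketch of the exercise model, assuming MarathonData.RData contains a data frame called MarathonData (Mod_MT2 is a name chosen here for illustration):

# Effect of km4week and sp4week on MarathonTimeM,
# using the brms defaults (4 chains, 2000 iterations, 1000 warmup)
load("MarathonData.RData")

Mod_MT2 <- brm(
  MarathonTimeM ~ km4week + sp4week,
  data = MarathonData,
  backend = "cmdstanr",
  seed = 1975
)

plot(Mod_MT2)     # trace plots and posterior densities per parameter
summary(Mod_MT2)  # check the Rhat, Bulk_ESS and Tail_ESS columns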
brms defaults to family = gaussian(link = "identity")
But many alternatives are available!
All the default family types known in R (e.g., binomial(link = "logit"), Gamma(link = "inverse"), poisson(link = "log"), …); see help(family)
And even more! brmsfamily(...) offers a lot of possible models; see help(brmsfamily)
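For instance, a logistic regression would only swap in a different family; the outcome variable Finished below is a hypothetical binary indicator, chosen purely for illustration:

# Hypothetical example: Bernoulli family with a logit link
Mod_binary <- brm(
  Finished ~ km4week,            # Finished is an invented binary outcome
  data = MarathonData,
  family = bernoulli(link = "logit"),
  backend = "cmdstanr",
  seed = 1975
)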
brms can estimate models for more complex designs, like multilevel models or, more generally, mixed effects models:
Random intercepts…
Random intercepts and slopes…
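In brms these use the familiar lme4-style formula syntax; a minimal sketch, in which the grouping variable Club is hypothetical:

# Random intercepts per club (Club is an invented grouping variable)
Mod_RI <- brm(MarathonTimeM ~ km4week + (1 | Club),
              data = MarathonData, backend = "cmdstanr", seed = 1975)

# Random intercepts and random slopes for km4week
Mod_RS <- brm(MarathonTimeM ~ km4week + (1 + km4week | Club),
              data = MarathonData, backend = "cmdstanr", seed = 1975)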