This is a template that can be used to systematically inspect the Bayesian modeling process, mainly inspired by the WAMBS checklist (see Depaoli (2020) for a tutorial). Based on this report, a supplementary document can be created for the analyses presented in the actual paper.
1. Before estimating the model
1.1 Do you understand the priors?
An important aspect of the Bayesian workflow is checking the priors that are used in the model (Schad, Betancourt, and Vasishth 2021). In this document we first report on the definition of the priors and then run some analyses to check the priors used for the final model. All the code and results are included in this document. The necessary files (i.e., the raw data file) can be retrieved from OSF (make link here).
1.1.1 Preparation
We will visualize the priors and run "prior predictive checks". To do so, we need to load the dataset and a number of packages. This is done in the following code block.
Note
This template is prepared as material for the ABAR 2023 workshop. In that workshop the running example is a dataset on predicting how fast marathons are run and whether some characteristics of the training before the race impact the results. The dependent variable is MarathonTimeM and we want to know whether there is an effect of the average number of kilometers run per week during the preparation phase (km4week) and the average speed while training during the preparation phase (sp4week).
Code
library(here)       # to easily access files
library(tidyverse)  # for all data wrangling and plotting
library(brms)       # to have functions to run and check models
library(ggmcmc)     # to generate some information on our model
library(bayesplot)  # to use certain plotting functions
library(mcmcplots)  # to use certain plotting functions
library(patchwork)  # to combine multiple ggplots into one plot
library(priorsense) # to perform prior sensitivity analyses

load(here("Presentations", "MarathonData.RData"))
MarathonTimes_Mod2 <- readRDS(here("Presentations", "Output", "MarathonTimes_Mod2.RDS"))
To get an overview of the different parameters for which a prior has to be defined, we can use the get_prior() function and specify our model. This results in an overview of the default priors set by the brms package.
Code
get_prior(MarathonTimeM ~ 1 + km4week + sp4week, data = MarathonData)
                     prior     class    coef group resp dpar nlpar lb ub       source
                    (flat)         b                                          default
                    (flat)         b km4week                             (vectorized)
                    (flat)         b sp4week                             (vectorized)
 student_t(3, 199.2, 24.9) Intercept                                          default
     student_t(3, 0, 24.9)     sigma                                  0       default
The following table lists all the priors that brms sets by default.
Table 1. Overview of the default priors used in the example model

| Parameter(s) | Description               | Prior                                           | Rationale                           |
|--------------|---------------------------|-------------------------------------------------|-------------------------------------|
| Intercept    | Intercept of the model    | student_t(3, 199.2, 24.9)                       | provide a rationale for this prior  |
| b_km4week    | Overall effect of km4week | Flat prior                                      | provide a rationale for this prior  |
| b_sp4week    | Overall effect of sp4week | Flat prior                                      | provide a rationale for this prior  |
| sigma        | Residual variance         | student_t(3, 0, 24.9), with lower bound being 0 | provide a rationale for this prior  |
1.1.2 Defining custom priors
We could also define our own priors. The next table gives an overview of our own weakly informative priors; we mainly defined custom priors for the effects of our two independent variables.
Table 2. Overview of the custom priors used in the example model

| Parameter(s) | Description               | Prior                                                                | Rationale |
|--------------|---------------------------|----------------------------------------------------------------------|-----------|
| Intercept    | Intercept of the model    | student_t(3, 199.2, 24.9) (brms default, kept)                       | This prior implies that we concentrate our prior probabilities for the expected marathon time of a person scoring zero on both independent variables somewhere between 150 minutes (2.5 hours) and 250 minutes (4 hours and 10 minutes). If you know that the fastest marathon is just above 2 hours and that many runners run their marathon between 3 and 4.5 hours, this prior might make sense. |
| b_km4week    | Overall effect of km4week | normal(0, 10)                                                        | Here we use a normal distribution as our prior, with 0 as mean (we have no idea about the direction of the effect, so 0 gets the most probability) and an sd of 10. |
| b_sp4week    | Overall effect of sp4week | normal(0, 10)                                                        | Here we use a normal distribution as our prior, with 0 as mean (we have no idea about the direction of the effect, so 0 gets the most probability) and an sd of 10. |
| sigma        | Residual variance         | student_t(3, 0, 24.9), with lower bound being 0 (brms default, kept) | provide a rationale for this prior |
1.1.3 Visualisation of the priors
To make some of the priors more tangible, we plot the prior distributions in this part. We can make use of different functions from base R, and we also rely on the metRology package for visualizing the Student t distributions. Flat priors (for the effects of both independent variables) don't need to be plotted, as they imply that all values get a similar probability.
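The code that produced the figure below is folded away in the rendered report. A minimal sketch of how these densities could be drawn is given here, assuming base R curve() combined with dt.scaled() from metRology for the scaled and shifted Student t densities; the plotting ranges and panel layout are our own assumptions.

library(metRology)  # provides dt.scaled() for scaled/shifted Student t densities

par(mfrow = c(1, 2))
# Default prior for the intercept: student_t(3, 199.2, 24.9)
curve(dt.scaled(x, df = 3, mean = 199.2, sd = 24.9),
      from = 100, to = 300,
      xlab = "Marathon time (minutes)", ylab = "Density",
      main = "Prior for the Intercept")
# Default prior for sigma: student_t(3, 0, 24.9), truncated at 0 by brms
curve(dt.scaled(x, df = 3, mean = 0, sd = 24.9),
      from = 0, to = 100,
      xlab = "sigma", ylab = "Density",
      main = "Prior for the residual standard deviation")
par(mfrow = c(1, 1))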
Probability density plots for the different priors used in the example model
To use these priors for our model, we use the set_prior() function in brms. We only have to define the priors that we want to change. The resulting object (Custom_priors) can be used in the model definition.
Code
Custom_priors <- c(
  set_prior("normal(0,10)", class = "b", coef = "km4week"),
  set_prior("normal(0,10)", class = "b", coef = "sp4week")
)
1.1.4 Check the predictions based on the priors
To check the predictions based on the priors (aka prior predictive checks), we "estimate the model" based on the priors only. This amounts to generating the default number of chains and iterations of potential parameter values, given the prior probabilities defined and without taking the data into account. The results can be stored in an object.
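The model call itself is folded in this rendered version; it presumably mirrors the call used later for Fit_Model_priors2, along these lines (the backend and number of cores are assumptions based on that later call):

Fit_Model_priors <- brm(
  MarathonTimeM ~ 1 + km4week + sp4week,
  data         = MarathonData,
  prior        = Custom_priors,
  backend      = "cmdstanr",
  cores        = 4,
  sample_prior = "only"   # ignore the data and sample from the priors only
)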
Next, we can check the implications of these priors by using the pp_check() function. This function simulates data based on ndraws draws of probable parameter values from our prior probability distribution. So, in the following code 300 new datasets are simulated, each based on a different set of potential parameter values. The same function then plots the distribution of each of these 300 datasets (light blue lines) overlaid by the distribution of the real observed data (dark blue line). We can see that some of the simulated datasets show very different distributions compared to the actual dataset.
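The corresponding call (folded in the rendered report) is presumably simply:

pp_check(Fit_Model_priors, ndraws = 300)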
We can also generate other types of plots to see whether our priors result in impossible data. For instance, we can compare some summary statistics of our dataset to how the same summary statistics are distributed across the simulated datasets. First, let's apply this to the median of our data, by including the argument type = "stat" and setting stat = "median".
Code
pp_check(Fit_Model_priors, type = "stat", stat = "median")
As we can see in the above graph, some of the simulated datasets based on our priors result in a negative median marathon time, which is of course impossible! We could reconsider our priors here. If we think more carefully about our prior distribution for the effect of km4week, we are still giving quite some probability to strong negative effects (e.g., a gain of 10 minutes for each extra km run in a week). So, if an athlete runs 20 km more on average in a week, that would result in a gain of 200 minutes! The same holds for the other side of the coin: we also give some probability to being 200 minutes slower if an athlete runs 20 km more on average per week. Clearly our prior is too weakly defined here. Alternatively, we could also argue that our model doesn't really make sense (is there really a linear effect of these variables?). Let's introduce a more restrictive prior for the beta of km4week (a normal distribution with sd = 2) and do a similar check.
Code
Custom_priors2 <- c(
  set_prior("normal(0,2)",  class = "b", coef = "km4week"),
  set_prior("normal(0,10)", class = "b", coef = "sp4week")
)

Fit_Model_priors2 <- brm(
  MarathonTimeM ~ 1 + km4week + sp4week,
  data         = MarathonData,
  prior        = Custom_priors2,
  backend      = "cmdstanr",
  cores        = 4,
  sample_prior = "only"
)

pp_check(Fit_Model_priors2, type = "stat", stat = "median")
This looks better! Now none of the simulated datasets results in a negative median marathon time.
2. After estimation before inspecting results
2.1 Check convergence?
An important step in the Bayesian workflow is to check whether the MCMC algorithm converged well and whether all parameters were estimated properly. Different complementary methods can be used to inspect convergence of the MCMC algorithm.
Note
In what follows we work with the model estimated with the default brms priors; we did not re-run the model with our custom priors. Of course, in your own project you should do this first and only then perform the checks below.
2.1.1 Does the trace-plot exhibit convergence?
Below we plot the different trace-plots (aka caterpillar plots) for the parameters estimated. As can be seen from these plots no strange patterns occur. All the chains mix well, meaning that we can conclude that the model converged.
Note
To create the caterpillar plots we made use of the ggs() function from the ggmcmc package. We then use the information in the Parameter column of the resulting object to create a plot with ggplot2.
Code
Model_chains <- ggs(MarathonTimes_Mod2)

Model_chains %>%
  filter(Parameter %in% c("b_Intercept", "b_km4week", "b_sp4week", "sigma")) %>%
  ggplot(aes(x = Iteration, y = value, col = as.factor(Chain))) +
  geom_line() +
  facet_grid(Parameter ~ ., scale = 'free_y', switch = 'y') +
  labs(title = "Caterpillar Plots for the parameters", col = "Chains")
Caterpillar plots for fixed part of the model
2.1.2 R-hat statistic
Next we can have a look at both the R-hat values and the Effective Sample Sizes (ESS) for each parameter in the model. Following the rules of thumb in Vehtari et al. (2021), we require R-hat < 1.015 and ESS > 400 to rely on R-hat as an indicator of convergence of the MCMC sampling. Printing the summary of our model gives us the necessary information:
Code
summary(MarathonTimes_Mod2)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: MarathonTimeM ~ km4week + sp4week
Data: MarathonData (Number of observations: 87)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 199.16 1.50 196.34 202.06 1.00 4537 3209
km4week -0.42 0.05 -0.53 -0.31 1.00 3936 3002
sp4week -9.98 1.27 -12.43 -7.48 1.00 4063 2622
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 13.94 1.09 12.04 16.29 1.00 3618 2751
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
All R-hat values are lower than 1.015, and given that the ESS is higher than 400 for all parameters, we can rely on the R-hat statistics. So we can conclude that the MCMC algorithm converged properly.
A more detailed plot of the R-hat values confirms this conclusion. Here we make use of the mcmc_rhat() function from the bayesplot package.
Code
mcmc_rhat(rhat(MarathonTimes_Mod2), size = 1) +
  yaxis_text(hjust = 1)   # to print parameter names
R-hat plot for the parameters in the model
2.1.3 Checking auto-correlation
The MCMC algorithm introduces autocorrelation in the values it samples, so not every sampled value is independent. The ratio of the effective sample size to the total sample size gives an idea of the amount of autocorrelation; this ratio should be higher than 0.1 (Gelman et al., 2013). It can be visualized by passing the result of neff_ratio() to the mcmc_neff() function from the bayesplot package.
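The plot below was presumably created along these lines; the size and axis-text settings are assumptions, mirroring the R-hat plot above.

mcmc_neff(neff_ratio(MarathonTimes_Mod2), size = 1) +
  yaxis_text(hjust = 1)   # to print parameter names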
Plot of the ratio of effective sample size to total sample size for each parameter
Another way to get insight into the degree of autocorrelation in the posterior draws is by plotting it. The plots below show the level of autocorrelation for a time lag of 20. Positive values indicate a positive correlation between sampled values (and imply that the algorithm stayed in the same region for a while); autocorrelation should therefore drop to 0 as quickly as possible. The plots are generated with the mcmc_acf() function from bayesplot.
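The calls behind the two autocorrelation figures below are folded away; they were presumably along these lines (the parameter selection via regex_pars/pars is our assumption):

mcmc_acf(MarathonTimes_Mod2, regex_pars = "^b_", lags = 20)   # intercept and betas
mcmc_acf(MarathonTimes_Mod2, pars = "sigma", lags = 20)       # residual variance parameter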
Level of autocorrelation per chain for the intercept and beta’s in the model
Level of autocorrelation per chain for the residual variance in the model
2.1.4 Rank plots
Rank plots are an additional way to assess the convergence of MCMC. These plots show histograms of the ranked posterior draws per chain. If the algorithm converged, the plots of all chains look similar (Vehtari et al., 2021).
Code
mcmc_rank_hist(MarathonTimes_Mod2, regex_pars = "b")
Rank plots for the intercept and beta’s in the model
2.2 Does the posterior distribution histogram have enough information?
Here we make histograms of the posterior probability distribution for each of the estimated parameters. What we want to see in these histograms is a clear peak and sliding slopes. If the posterior distribution does not contain enough information, it is not wise to draw conclusions about the probability distribution; this could potentially be overcome by running more iterations.
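The exact code used in the report is not shown; a sketch using the mcmc_hist() function from bayesplot could look like this:

mcmc_hist(MarathonTimes_Mod2,
          pars = c("b_Intercept", "b_km4week", "b_sp4week", "sigma"))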
Posterior distribution histograms for the estimates in the final model
2.3 Posterior Predictive Checks
To see how well our model predicts the observed data, we can perform a posterior predictive check. This is done with the pp_check() function from the brms package (Bürkner 2017).
Code
pp_check(MarathonTimes_Mod2, ndraws = 100)
Posterior predictive check for the final model with 100 draws
From this we can learn that our model predicts the data rather well.
3. Understanding the exact influence of the priors
3.1 Prior sensitivity analyses
If we have a big dataset, our priors will be overruled by the data, but with smaller datasets a prior can have a considerable influence on the results. In either situation, it is important to check how strongly the results are influenced by the priors used in the model. For this we rely on some functions from the priorsense package (Kallioinen et al. 2021).
A first quick check is done with the powerscale_sensitivity() function. This results in a tibble with some information on each of the parameters:
- the column prior gives a sensitivity index for the prior (i.e., how sensitive this parameter is to changes in the prior);
- the column likelihood gives a similar sensitivity index for the likelihood;
- the column diagnosis gives a verbal diagnosis if there is something to consider.
The sensitivity indices can be interpreted using a critical value of 0.05: if an index is higher than 0.05, the parameter is sensitive to changes in the prior and/or likelihood.
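The call itself is folded away in the rendered report; it is presumably simply:

powerscale_sensitivity(MarathonTimes_Mod2)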
In our example the prior sensitivity indices are all lower than 0.05, indicating that none of the parameter estimates are sensitive to changes in the prior. This is what we are hoping for. On the other hand, we see that all parameters are sensitive to changes in the likelihood. That is also what we hope for, as it implies that our parameter estimates are mainly informed by the data (which enters Bayes' theorem through the likelihood).
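The graph discussed below comes from priorsense's power-scaling density plot. The exact code is not shown here; a hedged sketch of the call follows, noting that, depending on the priorsense version, the plotting functions accept either the fitted model directly or a powerscale_sequence() object.

powerscale_plot_dens(powerscale_sequence(MarathonTimes_Mod2))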
The resulting graph shows different panels. The panels in the first row show the impact of changing the priors on the posterior probability distribution of each of the parameters (= prior sensitivity). Here we see a single line (all lines are plotted on top of each other), meaning that changing the prior has no impact on the posterior probability distributions of these parameters.
The panels in the second row show the likelihood sensitivity. Focussing on the graph for the effect of km4week, we can see that changing the likelihood results in different posterior distributions: when the likelihood gets more weight (is up-scaled by a positive factor, the redder lines), the posterior gets narrower and more peaked, and vice versa. The same holds for the other two parameters, but this is hidden in the graph because of the uninformative y-axis scales at this point.
Another interesting diagnostic visualization comes from the powerscale_plot_quantities() function. Here we apply it to the two slope parameters:
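A sketch of the call (the exact code is folded in the report; the parameter-selection argument, variable vs. variables, differs between priorsense versions and is an assumption here):

powerscale_plot_quantities(powerscale_sequence(MarathonTimes_Mod2),
                           variable = c("b_km4week", "b_sp4week"))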
The first column shows a summary statistic for the distance between the actual posterior probability distribution and the one obtained after giving the prior (or likelihood) more weight. The lines summarizing the effect of changing the likelihood show the typical V shape: changing the likelihood makes the posterior distribution move away from the one based on our model, which indicates likelihood sensitivity. The line for the prior is flat.
The second column shows how the mean of the posterior probability distribution is changed when giving the prior or likelihood more weight. Here we see that there is no impact on the mean.
The last column shows how the standard deviation of the posterior probability distribution changes when giving the prior or likelihood more weight. Here we see that it is strongly influenced by changing the likelihood: giving the likelihood more weight results in a lower standard deviation of the posterior distribution. This makes perfect sense, as more weight on the likelihood resembles having more data, which makes our estimates more precise and hence results in a smaller standard deviation of the posterior distribution.
Depaoli, Sarah, Rens van de Schoot, Duco Veen, Laurent Smeets, and Sonja D. Winter. 2020. "A Tutorial on Using the WAMBS Checklist to Avoid the Misuse of Bayesian Statistics." In Small Sample Size Solutions: A Guide for Applied Researchers and Practitioners. Routledge.
Kallioinen, Noa, Topi Paananen, Paul-Christian Bürkner, and Aki Vehtari. 2021. “Detecting and Diagnosing Prior and Likelihood Sensitivity with Power-Scaling.”
Schad, Daniel J., Michael Betancourt, and Shravan Vasishth. 2021. "Toward a Principled Bayesian Workflow in Cognitive Science." Psychological Methods 26 (1): 103–26. https://doi.org/10.1037/met0000275.
Vehtari, Aki, Andrew Gelman, Daniel Simpson, Bob Carpenter, and Paul-Christian Bürkner. 2021. "Rank-Normalization, Folding, and Localization: An Improved R̂ for Assessing Convergence of MCMC (with Discussion)." Bayesian Analysis 16 (2). https://doi.org/10.1214/20-ba1221.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D'Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. "Welcome to the Tidyverse." Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.