This is a template that can be used to systematically inspect the Bayesian modeling process, mainly inspired by the WAMBS checklist (see Depaoli (2020) for a tutorial). Based on this report, a supplementary document can be created for the analyses presented in the actual paper.
1. Before estimating the model
1.1 Do you understand the priors?
An important aspect of the Bayesian workflow is checking the priors that are used in the model (Schad, Betancourt, and Vasishth 2021). In this document we first report on the definition of the priors and then run some analyses to check the priors used for the final model. All the code and results are included in this document. The necessary files (i.e., the raw data file) can be retrieved from OSF (make link here).
1.1.1 Preparation
We will visualize the priors and run "prior predictive checks". To do so, we need to load the dataset and a number of packages. This is done in the following code block.
Note
This template is prepared as material for the ABAR 2023 workshop. In that workshop the running example is a dataset on predicting how fast marathons are run and whether some characteristics of the training before the race impact the results. The dependent variable is MarathonTimeM and we want to know whether there is an effect of the average number of kilometers run per week during the preparation phase (km4week) and the average speed while training during the preparation phase (sp4week).
Code
library(here)       # to easily access files
library(tidyverse)  # for all data wrangling and plotting
library(brms)       # to have functions to run and check models
library(ggmcmc)     # to generate some information on our model
library(bayesplot)  # to use certain plotting functions
library(mcmcplots)  # to use certain plotting functions
library(patchwork)  # to combine multiple ggplots into one plot
library(priorsense) # to perform prior sensitivity analyses

load(here("Presentations", "MarathonData.RData"))
MarathonTimes_Mod2 <- readRDS(here("Presentations", "Output", "MarathonTimes_Mod2.RDS"))
To get an overview of the different parameters for which a prior has to be defined, we can use the get_prior() function and specify our model. This results in an overview of the default priors set by the brms package.
Code
get_prior(MarathonTimeM ~ 1 + km4week + sp4week, data = MarathonData)
                     prior     class    coef group resp dpar nlpar lb ub       source
                    (flat)         b                                          default
                    (flat)         b km4week                             (vectorized)
                    (flat)         b sp4week                             (vectorized)
 student_t(3, 199.2, 24.9) Intercept                                          default
     student_t(3, 0, 24.9)     sigma                                  0       default
The following table lists all the priors that brms sets by default.
Table 1. Overview of the default priors used in the example model

| Parameter(s) | Description               | Prior                                           | Rationale                           |
|--------------|---------------------------|-------------------------------------------------|-------------------------------------|
| Intercept    | Intercept of the model    | student_t(3, 199.2, 24.9)                       | provide a rationale for this prior  |
| b_km4week    | Overall effect of km4week | Flat prior                                      | provide a rationale for this prior  |
| b_sp4week    | Overall effect of sp4week | Flat prior                                      | provide a rationale for this prior  |
| sigma        | Residual variance         | student_t(3, 0, 24.9), with lower bound being 0 | provide a rationale for this prior  |
1.1.2 Defining custom priors
We could also define our own priors. The next table gives an overview of our own weakly informative priors; we mainly defined custom priors for the effects of our two independent variables.
Table 2. Overview of the custom priors used in the example model

| Parameter(s) | Description               | Prior                                                                | Rationale |
|--------------|---------------------------|----------------------------------------------------------------------|-----------|
| Intercept    | Intercept of the model    | student_t(3, 199.2, 24.9) (brms default, kept)                       | This prior implies that we concentrate our prior probabilities for the expected marathon time of a person scoring zero on both independent variables somewhere between 150 minutes (2.5 hours) and 250 minutes (4 hours and 10 minutes). If you know that the fastest marathon is just above 2 hours and that many runners run their marathon between 3 and 4.5 hours, this prior might make sense. |
| b_km4week    | Overall effect of km4week | normal(0, 10)                                                        | Here we use a normal distribution as our prior, with 0 as mean (we have no idea about the direction of the effect, so 0 gets the most probability) and an sd of 10. |
| b_sp4week    | Overall effect of sp4week | normal(0, 10)                                                        | Here we use a normal distribution as our prior, with 0 as mean (we have no idea about the direction of the effect, so 0 gets the most probability) and an sd of 10. |
| sigma        | Residual variance         | student_t(3, 0, 24.9), with lower bound being 0 (brms default, kept) | provide a rationale for this prior |
1.1.3 Visualisation of the priors
To make some of the priors more tangible, we plot the prior distributions in this part. We can make use of different functions from base R, and we also rely on the metRology package for visualizing the Student t distributions. Flat priors (for the effects of both independent variables) don't need to be plotted, as they imply that all values get a similar probability.
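The code that produced the figure below is folded away in the rendered report. A minimal sketch of how these densities could be drawn is given here, assuming base R curve() combined with dt.scaled() from metRology for the scaled and shifted Student t densities; the plotting ranges and panel layout are our own assumptions.

library(metRology)  # provides dt.scaled() for scaled/shifted Student t densities

par(mfrow = c(1, 2))
# Default prior for the intercept: student_t(3, 199.2, 24.9)
curve(dt.scaled(x, df = 3, mean = 199.2, sd = 24.9),
      from = 100, to = 300,
      xlab = "Marathon time (minutes)", ylab = "Density",
      main = "Prior for the Intercept")
# Default prior for sigma: student_t(3, 0, 24.9), truncated at 0 by brms
curve(dt.scaled(x, df = 3, mean = 0, sd = 24.9),
      from = 0, to = 100,
      xlab = "sigma", ylab = "Density",
      main = "Prior for the residual standard deviation")
par(mfrow = c(1, 1))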
Probability density plots for the different priors used in the example model
To use these priors for our model, we use the set_prior() function in brms. We only have to define the priors that we want to change. The resulting object (Custom_priors) can be used in the model definition.
Code
Custom_priors <- c(
  set_prior("normal(0,10)", class = "b", coef = "km4week"),
  set_prior("normal(0,10)", class = "b", coef = "sp4week")
)
1.1.4 Check the predictions based on the priors
To check the predictions based on the priors (aka prior predictive checks), we "estimate the model" based on the priors only. This amounts to generating the default number of chains and iterations of potential parameter values, given the prior probabilities defined and without taking the data into account. The results can be stored in an object.
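The model call itself is folded in this rendered version; it presumably mirrors the call used later for Fit_Model_priors2, along these lines (the backend and number of cores are assumptions based on that later call):

Fit_Model_priors <- brm(
  MarathonTimeM ~ 1 + km4week + sp4week,
  data         = MarathonData,
  prior        = Custom_priors,
  backend      = "cmdstanr",
  cores        = 4,
  sample_prior = "only"   # ignore the data and sample from the priors only
)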
Next, we can check the implications of these priors by using the pp_check() function. This function simulates data based on ndraws draws of probable parameter values from our prior probability distribution. So, in the following code 300 new datasets are simulated, each based on a different set of potential parameter values. The same function then plots the distribution of each of these 300 datasets (light blue lines) overlaid by the distribution of the real observed data (dark blue line). We can see that some of the simulated datasets show very different distributions compared to the actual dataset.
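The corresponding call (folded in the rendered report) is presumably simply:

pp_check(Fit_Model_priors, ndraws = 300)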
We can also generate other types of plots to see whether our priors result in impossible data. For instance, we can compare some summary statistics of our dataset to how the same summary statistics are distributed across the simulated datasets. First, let's apply this to the median of our data, by including the argument type = "stat" and setting stat = "median".
Code
pp_check(Fit_Model_priors, type = "stat", stat = "median")
As we can see in the above graph, some of the simulated datasets based on our priors result in a negative median marathon time, which is of course impossible! We could reconsider our priors here. If we think more carefully about our prior distribution for the effect of km4week, we are still giving quite some probability to strong negative effects (e.g., a gain of 10 minutes for each extra km run in a week). So, if an athlete runs 20 km more on average in a week, that would result in a gain of 200 minutes! The same holds for the other side of the coin: we also give some probability to being 200 minutes slower if an athlete runs 20 km more on average per week. Clearly our prior is too weakly defined here. Alternatively, we could also argue that our model doesn't really make sense (is there really a linear effect of these variables?). Let's introduce a more restrictive prior for the beta of km4week (a normal distribution with sd = 2) and do a similar check.
Code
Custom_priors2 <- c(
  set_prior("normal(0,2)",  class = "b", coef = "km4week"),
  set_prior("normal(0,10)", class = "b", coef = "sp4week")
)

Fit_Model_priors2 <- brm(
  MarathonTimeM ~ 1 + km4week + sp4week,
  data         = MarathonData,
  prior        = Custom_priors2,
  backend      = "cmdstanr",
  cores        = 4,
  sample_prior = "only"
)

pp_check(Fit_Model_priors2, type = "stat", stat = "median")
This looks better! Now none of the simulated datasets results in a negative median marathon time.
2. After estimation before inspecting results
2.1 Check convergence?
An important step in the Bayesian workflow is to check whether the MCMC algorithm converged well and whether all parameters were estimated properly. Different complementary methods can be used to inspect convergence of the MCMC algorithm.
Note
In what follows we work with the model estimated with the default brms priors; we did not re-run the model with our custom priors. Of course, in your own project you should do this first and only then perform the checks below.
2.1.1 Does the trace-plot exhibit convergence?
Below we plot the different trace-plots (aka caterpillar plots) for the parameters estimated. As can be seen from these plots no strange patterns occur. All the chains mix well, meaning that we can conclude that the model converged.
Note
To create the caterpillar plots we made use of the ggs() function from the ggmcmc package. We then use the information in the Parameter column of the resulting object to create a plot with ggplot2.
Code
Model_chains <- ggs(MarathonTimes_Mod2)

Model_chains %>%
  filter(Parameter %in% c("b_Intercept", "b_km4week", "b_sp4week", "sigma")) %>%
  ggplot(aes(x = Iteration, y = value, col = as.factor(Chain))) +
  geom_line() +
  facet_grid(Parameter ~ ., scale = 'free_y', switch = 'y') +
  labs(title = "Caterpillar Plots for the parameters", col = "Chains")
Caterpillar plots for fixed part of the model
2.1.2 R-hat statistic
Next we can have a look at both the R-hat values and the Effective Sample Sizes (ESS) for each parameter in the model. Following the rules of thumb in Vehtari et al. (2021), we require R-hat < 1.015 and ESS > 400 to rely on R-hat as an indicator of convergence of the MCMC sampling. Printing the summary of our model gives us the necessary information:
Code
summary(MarathonTimes_Mod2)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: MarathonTimeM ~ km4week + sp4week
Data: MarathonData (Number of observations: 87)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 199.16 1.50 196.34 202.06 1.00 4537 3209
km4week -0.42 0.05 -0.53 -0.31 1.00 3936 3002
sp4week -9.98 1.27 -12.43 -7.48 1.00 4063 2622
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 13.94 1.09 12.04 16.29 1.00 3618 2751
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
All R-hat values are lower than 1.015, and given that the ESS is higher than 400 for all parameters, we can rely on the R-hat statistics. So we can conclude that the MCMC algorithm converged properly.
A more detailed plot of the R-hat values confirms this conclusion. Here we make use of the mcmc_rhat() function from the bayesplot package.
Code
mcmc_rhat(rhat(MarathonTimes_Mod2), size = 1) +
  yaxis_text(hjust = 1)   # to print parameter names
R-hat plot for the parameters in the model
2.1.3 Checking auto-correlation
The MCMC algorithm introduces autocorrelation in the values it samples, so not every sampled value is independent. The ratio of the effective sample size to the total sample size gives an idea of the amount of autocorrelation; this ratio should be higher than 0.1 (Gelman et al., 2013). It can be visualized by passing the result of neff_ratio() to the mcmc_neff() function from the bayesplot package.
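The plot below was presumably created along these lines; the size and axis-text settings are assumptions, mirroring the R-hat plot above.

mcmc_neff(neff_ratio(MarathonTimes_Mod2), size = 1) +
  yaxis_text(hjust = 1)   # to print parameter names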
Plot of the ratio of effective sample size to total sample size for each parameter
Another way to get insight into the degree of autocorrelation in the posterior draws is by plotting it. The plots below show the level of autocorrelation for a time lag of 20. Positive values indicate a positive correlation between sampled values (and imply that the algorithm stayed in the same region for a while); autocorrelation should therefore drop to 0 as quickly as possible. The plots are generated with the mcmc_acf() function from bayesplot.
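The calls behind the two autocorrelation figures below are folded away; they were presumably along these lines (the parameter selection via regex_pars/pars is our assumption):

mcmc_acf(MarathonTimes_Mod2, regex_pars = "^b_", lags = 20)   # intercept and betas
mcmc_acf(MarathonTimes_Mod2, pars = "sigma", lags = 20)       # residual variance parameter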
Level of autocorrelation per chain for the intercept and beta’s in the model
Level of autocorrelation per chain for the residual variance in the model
2.1.4 Rank plots
Rank plots are an additional way to assess the convergence of MCMC. These plots show histograms of the ranked posterior draws per chain. If the algorithm converged, the plots of all chains look similar (Vehtari et al., 2021).
Code
mcmc_rank_hist(MarathonTimes_Mod2, regex_pars = "b")
Rank plots for the intercept and beta’s in the model
2.2 Does the posterior distribution histogram have enough information?
Here we make histograms of the posterior probability distribution for each of the estimated parameters. What we want to see in these histograms is a clear peak and sliding slopes. If the posterior distribution does not contain enough information, it is not wise to draw conclusions about the probability distribution; this could potentially be overcome by running more iterations.
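The exact code used in the report is not shown; a sketch using the mcmc_hist() function from bayesplot could look like this:

mcmc_hist(MarathonTimes_Mod2,
          pars = c("b_Intercept", "b_km4week", "b_sp4week", "sigma"))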
Posterior distribution histograms for the estimates in the final model
2.3 Posterior Predictive Checks
To see how well our model predicts the observed data, we can perform a posterior predictive check. This is done with the pp_check() function from the brms package (Bürkner 2017).
Code
pp_check(MarathonTimes_Mod2, ndraws = 100)
Posterior predictive check for the final model with 100 draws
From this we can learn that our model predicts the data rather well.
3. Understanding the exact influence of the priors
3.1 Prior sensitivity analyses
If we have a big dataset, our priors will be overruled by the data, but with smaller datasets a prior can have a considerable influence on the results. In either situation, it is important to check how strongly the results are influenced by the priors used in the model. For this we rely on some functions from the priorsense package (Kallioinen et al. 2021).
A first quick check is done with the powerscale_sensitivity() function. This results in a tibble with some information on each of the parameters:
- the column prior gives a sensitivity index for the prior (i.e., how sensitive this parameter is to changes in the prior);
- the column likelihood gives a similar sensitivity index for the likelihood;
- the column diagnosis gives a verbal diagnosis if there is something to consider.
The sensitivity indices can be interpreted using a critical value of 0.05: if an index is higher than 0.05, the parameter is sensitive to changes in the prior and/or likelihood.
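The call itself is folded away in the rendered report; it is presumably simply:

powerscale_sensitivity(MarathonTimes_Mod2)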
In our example the prior sensitivity indices are all lower than 0.05, indicating that none of the parameter estimates are sensitive to changes in the prior. This is what we are hoping for. On the other hand, we see that all parameters are sensitive to changes in the likelihood. That is also what we hope for, as it implies that our parameter estimates are mainly informed by the data (which enters Bayes' theorem through the likelihood).
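The graph discussed below comes from priorsense's power-scaling density plot. The exact code is not shown here; a hedged sketch of the call follows, noting that, depending on the priorsense version, the plotting functions accept either the fitted model directly or a powerscale_sequence() object.

powerscale_plot_dens(powerscale_sequence(MarathonTimes_Mod2))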
The resulting graph shows different panels. The panels in the first row show the impact of changing the priors on the posterior probability distribution of each of the parameters (= prior sensitivity). Here we see a single line (all lines are plotted on top of each other), meaning that changing the prior has no impact on the posterior probability distributions of these parameters.
The panels in the second row show the likelihood sensitivity. Focussing on the graph for the effect of km4week, we can see that changing the likelihood results in different posterior distributions: when the likelihood gets more weight (is up-scaled by a positive factor, the redder lines), the posterior gets narrower and more peaked, and vice versa. The same holds for the other two parameters, but this is hidden in the graph because of the uninformative y-axis scales at this point.
Another interesting diagnostic visualization comes from the powerscale_plot_quantities() function. Here we apply it to the two slope parameters:
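A sketch of the call (the exact code is folded in the report; the parameter-selection argument, variable vs. variables, differs between priorsense versions and is an assumption here):

powerscale_plot_quantities(powerscale_sequence(MarathonTimes_Mod2),
                           variable = c("b_km4week", "b_sp4week"))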
The first column shows a summary statistic for the distance between the actual posterior probability distribution and the one obtained after giving the prior (or likelihood) more weight. The lines summarizing the effect of changing the likelihood show the typical V shape: changing the likelihood makes the posterior distribution move away from the one based on our model, which indicates likelihood sensitivity. The line for the prior is flat.
The second column shows how the mean of the posterior probability distribution is changed when giving the prior or likelihood more weight. Here we see that there is no impact on the mean.
The last column shows how the standard deviation of the posterior probability distribution changes when giving the prior or likelihood more weight. Here we see that it is strongly influenced by changing the likelihood: giving the likelihood more weight results in a lower standard deviation of the posterior distribution. This makes perfect sense, as more weight on the likelihood resembles having more data, which makes our estimates more precise and hence results in a smaller standard deviation of the posterior distribution.
Depaoli, Sarah, Rens van de Schoot, Duco Veen, Laurent Smeets, and Sonja D. Winter. 2020. "A Tutorial on Using the WAMBS Checklist to Avoid the Misuse of Bayesian Statistics." In Small Sample Size Solutions: A Guide for Applied Researchers and Practitioners. Routledge.
Kallioinen, Noa, Topi Paananen, Paul-Christian Bürkner, and Aki Vehtari. 2021. “Detecting and Diagnosing Prior and Likelihood Sensitivity with Power-Scaling.”
Schad, Daniel J., Michael Betancourt, and Shravan Vasishth. 2021. "Toward a Principled Bayesian Workflow in Cognitive Science." Psychological Methods 26 (1): 103–26. https://doi.org/10.1037/met0000275.
Vehtari, Aki, Andrew Gelman, Daniel Simpson, Bob Carpenter, and Paul-Christian Bürkner. 2021. "Rank-Normalization, Folding, and Localization: An Improved R̂ for Assessing Convergence of MCMC (with Discussion)." Bayesian Analysis 16 (2). https://doi.org/10.1214/20-ba1221.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D'Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. "Welcome to the Tidyverse." Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.