#### Comparison of Rhat versions: clarifying formulas for the potential scale reduction factor and its implications

##### Author: Xavier Fernández-i-Marín

##### March 6, 2019 - 7 minutes

##### Description of how to calculate Rhat (Potential Scale Reduction Factor) for Bayesian convergence using different formulas, and its impact on the length of the chains


One of the formal measures of convergence in Bayesian inference is the Potential Scale Reduction Factor (\(\hat{R}\), “Rhat”). It is a weighted average of the between-chain and within-chain variances. The idea is that the closer this weighted average is to 1, the less evidence there is that the chains have not converged.
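As a minimal sketch (not the `ggmcmc` implementation), the basic quantity can be computed by hand from the within-chain variance \(W\) and the between-chain variance \(B\); the simulated chains below are purely illustrative:

```r
# Minimal sketch of the basic Rhat, without any degrees-of-freedom correction.
set.seed(42)
n <- 1000                                   # iterations per chain
m <- 2                                      # number of chains
chains <- matrix(rnorm(n * m), nrow = n)    # two well-mixed "chains"

W <- mean(apply(chains, 2, var))            # within-chain variance
B <- n * var(colMeans(chains))              # between-chain variance
var.hat <- (n - 1) / n * W + B / n          # pooled variance estimate
Rhat <- sqrt(var.hat / W)
Rhat                                        # close to 1 for converged chains
```

Since the simulated chains are independent draws from the same distribution, the resulting value is very close to 1.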

# Multiple versions of the Rhat

The concrete formula for this weighted average, however, has changed over time, and there are multiple specifications of it. A more detailed discussion can be found in this post on stackexchange.com, but the main ideas are the following:

- The *first idea* of a potential scale reduction factor appears in (Gelman and Rubin 1992).
- There is a minor change in the second version in (Brooks and Gelman 1998), which changes the degrees-of-freedom correction from `df/(df-2)` to `(df+3)/(df+1)`. This is the version that the R package `coda` uses, and the one available in `ggmcmc` via the argument `version_rhat = "BG98"` in the `ggs_Rhat()` function. Let’s call this the “BG98” version.
- A newer version appears in the first edition of *Bayesian Data Analysis* (Gelman et al. 1995). It does not include any weighting by degrees of freedom.
- The *most prevalent* version, though, is the one in the second edition of *Bayesian Data Analysis* (let’s call this the “BDA2” version), although paradoxically it is not the one currently used in coda, nor in stan. But this may change in the future.
- In the third edition of *Bayesian Data Analysis*, the formula is equal to “BDA2”, but each chain is split in half and the halves are treated as if there were twice as many chains of half the original length.
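The split-chain convention from the last bullet can be sketched as follows (`split_chains` is a hypothetical helper written for illustration, not part of `ggmcmc` or `coda`):

```r
# Hypothetical helper illustrating the BDA3 convention: each chain is split in
# half, yielding twice as many chains of half the original length.
split_chains <- function(chains) {   # chains: iterations x chains matrix
  half <- floor(nrow(chains) / 2)
  cbind(chains[1:half, , drop = FALSE],
        chains[(half + 1):(2 * half), , drop = FALSE])
}

m <- matrix(rnorm(200), nrow = 100, ncol = 2)  # 2 chains, 100 iterations
dim(split_chains(m))                           # 50 iterations, 4 chains
```

The usual between/within calculation is then applied to the four half-chains, which makes the diagnostic sensitive to trends within a single chain.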

`ggmcmc`’s function `ggs_Rhat()` includes (starting from version 1.3) an argument that allows the user to specify which version of the computation is used: “BG98” or “BDA2” (until now the only one available, and still the default).

# Example using the radon data in `ggmcmc`

This example aims to clarify the differences between the two versions (“BG98” and “BDA2”) and their implications in different problems, specifically when comparing longer and shorter chains.

```
library(ggmcmc)
data(radon)
library(coda)
```

I am comparing the following:

- *Source:* the Rhat as calculated by “coda” or by “ggmcmc”.
- *Version:* “BG98” or “BDA2”.
- *Chain length:* the original “radon” example contains only 100 iterations in 2 chains (“Long”), and the shorter version even fewer (20, “Short”).

The following lines simply arrange the Rhat values for all the parameters (175) in the “radon” object (an “mcmc.list”) in a tidy dataframe.

```
### Full output (100 iterations, "Long" length)
s.full <- radon$s.radon
gg.full <- ggs_Rhat(ggs(s.full), version_rhat = "BG98")$data
gg.full.bda2 <- ggs_Rhat(ggs(s.full), version_rhat = "BDA2")$data
coda.full <- gelman.diag(s.full, autoburnin = F, multivariate = F)$psrf
comp.full <- as.data.frame.table(coda.full, responseName = "Rhat") %>%
  tbl_df() %>%
  filter(Var2 == "Point est.") %>%
  select(-Var2) %>%
  rename(Parameter = Var1) %>%
  mutate(Source = "coda", Version = "BG98", Length = "Long")
comp.full <- bind_rows(comp.full, mutate(comp.full, Version = "BDA2")) %>%
  bind_rows(mutate(gg.full, Source = "ggmcmc", Version = "BG98", Length = "Long")) %>%
  bind_rows(gg.full.bda2 %>%
              select(-B, -W, -wa) %>%
              mutate(Source = "ggmcmc", Version = "BDA2", Length = "Long"))

### Part of the output (20 iterations, "Short" length)
s.short <- lapply(radon$s.radon, function(x) x[1:20, ])
class(s.short) <- "mcmc.list"
gg.short <- ggs_Rhat(ggs(s.short), version_rhat = "BG98")$data
gg.short.bda2 <- ggs_Rhat(ggs(s.short), version_rhat = "BDA2")$data
coda.short <- gelman.diag(s.short, autoburnin = F, multivariate = F)$psrf
comp.short <- as.data.frame.table(coda.short, responseName = "Rhat") %>%
  tbl_df() %>%
  filter(Var2 == "Point est.") %>%
  select(-Var2) %>%
  rename(Parameter = Var1) %>%
  mutate(Source = "coda", Version = "BG98", Length = "Short")
comp.short <- bind_rows(comp.short, mutate(comp.short, Version = "BDA2")) %>%
  bind_rows(mutate(gg.short, Source = "ggmcmc", Version = "BG98", Length = "Short")) %>%
  bind_rows(gg.short.bda2 %>%
              select(-B, -W, -wa) %>%
              mutate(Source = "ggmcmc", Version = "BDA2", Length = "Short"))

comp <- bind_rows(comp.full, comp.short)
```

The comparison object `comp` is a data frame with 1,400 observations (175 parameters * 2 lengths * 2 sources * 2 versions).

```
comp
```

```
## # A tibble: 1,400 x 5
## Parameter Rhat Source Version Length
## <fct> <dbl> <chr> <chr> <chr>
## 1 alpha[1] 1.01 coda BG98 Long
## 2 alpha[2] 0.997 coda BG98 Long
## 3 alpha[3] 1.00 coda BG98 Long
## 4 alpha[4] 1.00 coda BG98 Long
## 5 alpha[5] 0.996 coda BG98 Long
## 6 alpha[6] 1.02 coda BG98 Long
## 7 alpha[7] 1.00 coda BG98 Long
## 8 alpha[8] 0.999 coda BG98 Long
## 9 alpha[9] 0.998 coda BG98 Long
## 10 alpha[10] 1.01 coda BG98 Long
## # … with 1,390 more rows
```

In order to focus on a comparison at the “Source” level (coda vs. ggmcmc), I spread the “Source” variable into two columns containing the “Rhat” values.

```
comp.l <- comp %>%
spread(Source, Rhat)
comp.l
```

```
## # A tibble: 700 x 5
## Parameter Version Length coda ggmcmc
## <fct> <chr> <chr> <dbl> <dbl>
## 1 alpha[1] BDA2 Long 1.01 1.00
## 2 alpha[1] BDA2 Short 0.980 0.978
## 3 alpha[1] BG98 Long 1.01 1.01
## 4 alpha[1] BG98 Short 0.980 0.980
## 5 alpha[2] BDA2 Long 0.997 0.995
## 6 alpha[2] BDA2 Short 0.989 0.975
## 7 alpha[2] BG98 Long 0.997 0.997
## 8 alpha[2] BG98 Short 0.989 0.989
## 9 alpha[3] BDA2 Long 1.00 0.999
## 10 alpha[3] BDA2 Short 0.989 0.975
## # … with 690 more rows
```

```
ggplot(comp.l, aes(x = coda, y = ggmcmc)) +
geom_point() +
facet_grid(Version ~ Length)
```

Of course, the `coda` package uses only the “BG98” version, and therefore in the second row the “coda” and “ggmcmc” values map onto each other perfectly. But when `ggmcmc` is used and the two versions of the \(\hat{R}\) calculation are compared directly, it can be seen that the “BDA2” version tends to produce values closer to 1, especially when the chain is longer.

In fact, the “rule of thumb” is to look at parameters with \(\hat{R} > 1.1\). By this criterion, in this example there are several such cases with the “BG98” version, a few with “BDA2” using a really, really short chain (only 20 iterations), and none in the long chain (top left panel).
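The 1.1 rule of thumb amounts to a simple filter on the vector of \(\hat{R}\) values; a toy illustration with made-up values:

```r
# Toy illustration of the Rhat > 1.1 rule of thumb (the values are made up,
# not taken from the radon example).
rhats <- c("alpha[1]" = 1.01, "alpha[2]" = 1.15, "beta" = 0.99, "sigma" = 1.12)
flagged <- names(rhats)[rhats > 1.1]
flagged  # "alpha[2]" "sigma"
```

Parameters that survive this filter are the ones whose chains would usually be run longer before trusting the posterior summaries.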

I would say that the “BDA2” version is more liberal: if you prefer a more restrictive / conservative convergence test (remember that Bayesians can never be sure of convergence; they can only collect evidence against non-convergence), the “BG98” version is preferred.

Another way to see it is to compare the differences between “BDA2” and “BG98” (calculated as the ratio of “BG98” over “BDA2”).

```
bda2.diff <- comp %>%
  filter(Source == "ggmcmc") %>%
  spread(Version, Rhat) %>%
  mutate(p.diff = BG98 / BDA2)
ggplot(bda2.diff, aes(x = p.diff, fill = Length, color = Length)) +
  geom_density(alpha = 0.4)
```

Shorter chains, not surprisingly, show larger differences between the two versions of the estimator, reaching 20 percent in this example. But for the “Long” chains (relatively long, as in this case the length is only 100 iterations), most of the deviations are below 3 percent, and rarely above 5 percent.

# Final remarks

For a Bayesian data analysis involving more than, say, 100 iterations, there is going to be virtually no difference in the Potential Scale Reduction Factor (\(\hat{R}\)) whether it is calculated using the “BDA2” or the “BG98” version. But for extreme cases with a low number of iterations (and chains), the more conservative measure is “BG98”. In any case, the function `ggs_Rhat()` in `ggmcmc` allows you to pick which version of the \(\hat{R}\) you want.

## Citation

When using `ggmcmc`, don’t forget to cite it: (Fernández-i-Marín 2016)

Brooks, Stephen P, and Andrew Gelman. 1998. “General Methods for Monitoring Convergence of Iterative Simulations.” *Journal of Computational and Graphical Statistics* 7 (4). Taylor & Francis Group: 434–55.

Fernández-i-Marín, Xavier. 2016. “ggmcmc: Analysis of MCMC Samples and Bayesian Inference.” *Journal of Statistical Software* 70 (1): 1–20. doi:10.18637/jss.v070.i09.

Gelman, Andrew, and Donald B Rubin. 1992. “Inference from Iterative Simulation Using Multiple Sequences.” *Statistical Science*, 457–72.

Gelman, Andrew, John B. Carlin, Hal S. Stern, and Donald B. Rubin. 1995. *Bayesian Data Analysis*. Boca Ratón, FL: Chapman & Hall/CRC.
