#### Families and batches of parameters in Bayesian inference: how to treat them using ggmcmc

##### Author: Xavier Fernández-i-Marín

##### February 22, 2019 - 10 minutes

##### Tutorial of how to deal with families of parameters in Bayesian inference using ggmcmc

`ggmcmc`

`R`

`Bayesian`

When working with Bayesian inference using sampling procedures (JAGS, Stan, Bugs) it is usually the case that the parameters come in different families (the “thetas”, the “betas”, the “alphas”), specially when dealing with hierarchical / multilevel models.

Sometimes parameters come in a single dimension (like `beta[1]`

, `beta[2]`

, `beta[3]`

, …), and sometimes they come in multiple dimensions (`beta[1,1]`

, `beta[1,2]`

, …). The reasons may be diverse, but it is quite common that one dimension refers to, for instance, an intercept, and the second dimension referst to different conditions of the experiment or the observation. This entry shows how to flexibly manage different situations in which families of parameters are involved.

Let’s start with the data provided by the `ggmcmc`

package.

```
library(ggmcmc)
data(linear)
```

The original data is a `coda`

object called `s`

, that stores several parameters, some of them belonging to the same *family*, in this case, the *beta* family.

```
S <- ggs(s)
levels(S$Parameter)
```

`## [1] "beta[1]" "beta[2]" "sigma"`

# Single dimensional parameters

In case only the *beta* family of parameters is needed, this can be easily accomplished with the argument `family`

, that accepts an R regular expression. In its simplest form, passing “beta” will keep all parameters containing it, which includes not only “beta”, but also possible “sigma.beta”, or other combinations.

```
S.beta <- ggs(s, family = "beta")
S.beta
```

```
## # A tibble: 4,000 x 4
## Iteration Chain Parameter value
## <int> <int> <fct> <dbl>
## 1 1 1 beta[1] 3.04
## 2 2 1 beta[1] 2.85
## 3 3 1 beta[1] 2.85
## 4 4 1 beta[1] 2.58
## 5 5 1 beta[1] 2.46
## 6 6 1 beta[1] 3.05
## 7 7 1 beta[1] 2.67
## 8 8 1 beta[1] 2.64
## 9 9 1 beta[1] 2.76
## 10 10 1 beta[1] 2.74
## # … with 3,990 more rows
```

`levels(S.beta$Parameter)`

`## [1] "beta[1]" "beta[2]"`

The `S.beta`

object only contains the “beta” parameters, and also only contains the bare minimum variables for a `ggs()`

object to work, namely: Iteration, Chain, Parameter and value.

Now the `S.beta`

object can be treated like any other `ggs()`

object:

`ggs_traceplot(S.beta)`

# Single dimensional parameters with labels

There is the possibility to complicate a bit more the representation when combining the `family`

parameter with the `par_labels`

one, that allows to incorporate meaningful labels to parameters. In this case, for instance, we will define the “beta” parameters as having different labels, but sharing a single dimension. We create a data frame that maps/links the original parameter names (`beta[1]`

and `beta[2]`

to different properties (like being in a single dimension called “First dimension”, but being different in the value of their Variable name.

```
L.betas <- expand.grid(b = 1:2) %>%
mutate(Parameter = paste("beta[", b, "]", sep = ""),
Variable = factor(b, labels = c("One", "Two")),
Dimension = "First dimension",
Label = paste(Variable, Dimension, sep = " : "))
L.betas
```

```
## b Parameter Variable Dimension Label
## 1 1 beta[1] One First dimension One : First dimension
## 2 2 beta[2] Two First dimension Two : First dimension
```

Now we also incorporate this dataframe into the `ggs()`

object, using the `par_labels`

argument:

```
S.beta <- ggs(s, family = "beta", par_labels = L.betas)
S.beta
```

```
## # A tibble: 4,000 x 8
## Iteration Chain Parameter value ParameterOrigin… b Variable
## <int> <int> <fct> <dbl> <fct> <int> <fct>
## 1 1 1 One : Fi… 3.04 beta[1] 1 One
## 2 2 1 One : Fi… 2.85 beta[1] 1 One
## 3 3 1 One : Fi… 2.85 beta[1] 1 One
## 4 4 1 One : Fi… 2.58 beta[1] 1 One
## 5 5 1 One : Fi… 2.46 beta[1] 1 One
## 6 6 1 One : Fi… 3.05 beta[1] 1 One
## 7 7 1 One : Fi… 2.67 beta[1] 1 One
## 8 8 1 One : Fi… 2.64 beta[1] 1 One
## 9 9 1 One : Fi… 2.76 beta[1] 1 One
## 10 10 1 One : Fi… 2.74 beta[1] 1 One
## # … with 3,990 more rows, and 1 more variable: Dimension <chr>
```

In this case, the object now does not only contain the bare minimum variables, but also the other ones passed through the mapping of labels and features of the “beta” parameters, namely the “b”, the “Variable”, and the “Dimension”. The original parameter name is kept under “ParameterOriginal” and the new parameter name used in the labels is the one defined by “Label” in the `L.betas`

object.

Now we can treat this object with more flexibility, as it contains features that define the “beta” parameters.

```
ggs_traceplot(S.beta) +
facet_grid(~ Variable)
```

# Multiple dimensions

In order to work with multiple dimensional parameters, we first have to fake it using the samples at hand. The following steps are never needed when the data is already structured in two (or more) dimensions.

```
# generate a bidimensional beta matrix out of repeating the first
# chain of the original object and the second chain with minus/plus
# differences.
s.dup <- list(cbind(s[[1]] - 2, s[[1]] + 2), cbind(s[[2]] - 2, s[[2]] + 2))
dimnames(s.dup[[1]])[[2]] <- dimnames(s.dup[[2]])[[2]] <-
c("beta[1,1]", "beta[1,2]", "sigma[1]", "beta[2,1]", "beta[2,2]", "sigma[2]")
attributes(s.dup[[1]])$mcpar <- attributes(s.dup[[2]])$mcpar <-
attributes(s[[1]])$mcpar
attributes(s.dup[[1]])$class <- attributes(s.dup[[2]])$class <-
attributes(s[[1]])$class
class(s.dup) <- "mcmc.list"
```

The `s.dup`

object now contains a two chains for 6 parameters (4 of them are from the “beta” family).

```
S.dup <- ggs(s.dup)
str(S.dup)
```

```
## Classes 'tbl_df', 'tbl' and 'data.frame': 12000 obs. of 4 variables:
## $ Iteration: int 1 2 3 4 5 6 7 8 9 10 ...
## $ Chain : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Parameter: Factor w/ 6 levels "beta[1,1]","beta[1,2]",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ value : num 1.043 0.853 0.85 0.575 0.463 ...
## - attr(*, "nChains")= int 2
## - attr(*, "nParameters")= int 6
## - attr(*, "nIterations")= int 1000
## - attr(*, "nBurnin")= num 1000
## - attr(*, "nThin")= num 1
## - attr(*, "description")= chr "s.dup"
```

Then, create again the object with the labels for a two-dimensional family of parameters.

```
L.betas <- expand.grid(v = 1:2, d = 1:2) %>%
mutate(Parameter = paste("beta[", v, ",", d, "]", sep = ""),
Variable = factor(v, labels = c("One", "Two")),
Dimension = factor(d, labels = c("First", "Second")),
Label = paste(Variable, Dimension, sep = " : "))
L.betas
```

```
## v d Parameter Variable Dimension Label
## 1 1 1 beta[1,1] One First One : First
## 2 2 1 beta[2,1] Two First Two : First
## 3 1 2 beta[1,2] One Second One : Second
## 4 2 2 beta[2,2] Two Second Two : Second
```

`S.beta <- ggs(s.dup, family = "beta", par_labels = L.betas)`

And now just play with the display of the two variables (“Variable” and “Dimension”) than contain characteristics of the “beta” parameters.

```
ggs_traceplot(S.beta) +
facet_grid(Dimension ~ Variable)
```

```
ggs_density(S.beta, hpd = TRUE) +
facet_wrap(Variable ~ Dimension)
```

The examples covered here are only for families of parameters with one or two dimensions, but extending it to more dimensions is trivial with the modification of the dataset with the parameter labels. For instance, adding another dimension (let’s call it “Year”) would be as follows:

```
L.betas.time <- expand.grid(v = 1:2, d = 1:2, y = 1:9) %>%
mutate(Parameter = paste("beta[", v, ",", d, ",", y, "]", sep = ""),
Variable = factor(v, labels = c("One", "Two")),
Dimension = factor(d, labels = c("First", "Second")),
Year = as.numeric(as.character(factor(y, labels = 2001:2009))),
Label = paste(Variable, Dimension, sep = " : "))
```

# Example: electoral forecast in Bayern 2018

In a post on forecasting elections I presented the electoral forecasts for different parties through time. The \(\xi\) (*xi*) parameters account for the electoral support of every party (\(p\)) at a specific day (\(t\), for “time”), and therefore have two dimensions (\(\xi_{p,t}\)).

In order to generate a data frame with the meaningful labels for \(\xi\) we use the following:

```
# "names.parties" is a character vector containing the names of the parties
# in the correct order.
# "series.start" is a date when the forecast begins.
L.xis <- expand.grid(p = 1:PA, t = 1:T) %>%
mutate(Parameter = paste("xi[", p, ",", t, "]", sep = "")) %>%
mutate(Party = factor(p, label = names.parties)) %>%
mutate(Date = series.start + t - 1) %>%
mutate(Label = paste(Party, Date, sep = " : "))
xis <- ggs(s, family = "xi", par_labels = L.xis)
str(xis)
```

```
## Classes 'tbl_df', 'tbl' and 'data.frame': 5564800 obs. of 9 variables:
## $ Iteration : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Chain : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Parameter : Factor w/ 13912 levels "AfD : 2014-01-09",..: 1740 1740 1740 1740 1740 1740 1740 1740 1740 1740 ...
## $ value : num 49.8 48.7 49.5 49.6 48.5 ...
## $ ParameterOriginal: Factor w/ 13912 levels "xi[1,1]","xi[1,10]",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ p : int 1 1 1 1 1 1 1 1 1 1 ...
## $ t : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Party : Factor w/ 8 levels "CSU","SPD","FW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Date : Date, format: "2014-01-09" "2014-01-09" ...
## - attr(*, "nChains")= int 2
## - attr(*, "nParameters")= int 13912
## - attr(*, "nIterations")= int 200
## - attr(*, "nBurnin")= num 50091
## - attr(*, "nThin")= num 10
## - attr(*, "description")= chr "s"
```

`xis`

is a huge object containing 5,564,800 entries, corresponding to 13,912 parameters, two chains and 200 iterations. The reason is that the forecast is performed for every day between the last election (in 2014) until the election date in late 2018. Once the convergence tests have been performed using the `ggs_...()`

functions, we can simply summarize the posteriors using the `ci()`

function and keeping only the point estimates of interest.

```
ci.xis <- ci(xis)
ci.xis
```

```
## # A tibble: 13,912 x 11
## Parameter low Low median High high ParameterOrigin… p t
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <int> <int>
## 1 AfD : 20… 7.94 8.01 8.92 9.47 9.62 xi[8,1] 8 1
## 2 AfD : 20… 7.93 8.01 8.93 9.51 9.56 xi[8,2] 8 2
## 3 AfD : 20… 7.99 8.02 8.96 9.44 9.58 xi[8,3] 8 3
## 4 AfD : 20… 7.96 8.02 8.97 9.48 9.56 xi[8,4] 8 4
## 5 AfD : 20… 7.96 8.03 9.02 9.48 9.59 xi[8,5] 8 5
## 6 AfD : 20… 7.95 7.99 9.03 9.44 9.49 xi[8,6] 8 6
## 7 AfD : 20… 7.93 8.01 9.03 9.43 9.50 xi[8,7] 8 7
## 8 AfD : 20… 7.91 7.96 9.04 9.43 9.50 xi[8,8] 8 8
## 9 AfD : 20… 7.90 7.97 9.04 9.42 9.50 xi[8,9] 8 9
## 10 AfD : 20… 7.94 7.96 9.07 9.44 9.49 xi[8,10] 8 10
## # … with 13,902 more rows, and 2 more variables: Party <fct>, Date <date>
```

In this case `ci.xis`

contains “only” 13,912 entries, with the median, 90 and 95 percent credible intervals. Because it also incorporates the labels and meta-data about \(xi\), it is possible to use a simple call to `ggplot()`

to generate the figures of interest.

This is a simple point estimate of the temporal evolution of the forecasted value for every party.

```
ggplot(ci.xis, aes(x = Date, y = median,
color = Party, group = Party)) +
geom_line()
```

In order to incorporate the uncertainty bands we can simply use the `geom_ribbon()`

element overimposed to the line.

```
ggplot(ci.xis, aes(x = Date, y = median,
color = Party, group = Party)) +
geom_line() +
geom_ribbon(aes(ymin = low, ymax = high,
fill = Party), color = NA, alpha = 0.2)
```

# Final remarks

The combination of a tidy dataset in the `ggs`

format with the arguments “family” and “par_labels” provides a simple and unified way to treat samples generated by JAGS / Stan / Bugs using as a basis the functions provided by the `ggmcmc`

package, namely those that start with `ggs_...()`

. Combining them with `facet_grid()`

or `facet_wrap()`

, or `aes()`

and many other options in `ggplot2`

gives the user a lot of flexibility.

## Citation

When usind the `ggmcmc`

don’t forget to cite it using: (Fernández-i-Marín 2016)

Fernández-i-Marín, Xavier. 2016. “Ggmcmc: Analysis of Mcmc Samples and Bayesian Inference.” *Journal of Statistical Software* 70 (1): 1–20. doi:10.18637/jss.v070.i09.

#### Related

#### Comparison of Rhat versions: clarifying formulas for the potential scale reduction factor and its implications

##### Author: Xavier Fernández-i-Marín

##### March 6, 2019 - 7 minutes

##### Description of how to calculate Rhat (Potential Scale Reduction Factor) for Bayesian convergence usind different formulas, and its impact on the length of the chains

`ggmcmc`

`R`

`Bayesian`

#### V-Dem by default: Load and process V-Dem democracy scores in R

##### Author: Xavier Fernández-i-Marín

##### January 11, 2019 - 6 minutes

##### Tutorial of how to perform data analysis of the Varieties of Democracies (V-Dem) dataset in R

`Governance`

`R`

`Data visualization`

#### Bavarian state elections 2018 - Evaluation of the forecasts

##### Author: Xavier Fernández-i-Marín

##### October 16, 2018 - 3 minutes

##### Comparison of the electoral results and forecasts for Bavarian elections 2018

`Electoral forecast`

`Bayesian`

`Data visualization`