Processing Stan samples with ggmcmc

Author: Xavier Fernández-i-Marín
October 16, 2012 - 4 minutes
Bayesian mcmc R Stan ggmcmc

Stan is a relative newcomer in the scenario of Bayesian tools aimed at provide samples from posteriors. But despite its novelty, it offers many interesting features for Bayesians. Amongst the features, which will probably be discussed here in the future, the possibility of having a sampler in a really fast programming language (C++) makes many of us dream of shorter times to run our models.

In order to make Stan and ggmcmc work together there are two possibilities, depending on how Stan is being used, either inside R with rstan or as a standalone program.

Create a fake dataset

In R we can easily create a fake dataset to play with.

This creates 500 data points from a plain simple linear model with intercept=3,slope=5 and standard deviation=1 (which is the default for rnorm()).

Data can be easily passed to Stan via a list:

The model declaration is stored simply as an R character object

There are three parts in this declaration. The first one (data) declares the data that will be used. The type of variables can (must) be specified. In this case N is an integer with a minimum value of 0, since it will loop across observations. X and y will be declared as any real number.

The second part (parameters) declares the prior specifications of the parameters to be sampled.

The third part (model) is really the core of the model specification. It declares the relationships between data and parameters. Prior specifications of the parameters can also be included here. In this case y is distributed normally with mean equal to an arithmetic relationship between the intercept, the slope and the data, and standard deviation equals to sigma. Notice that, contrary to JAGS and *Bugs, there is no more need to specify the dispersion parameter in terms of its precision (inverse of the variance).

Running Stan inside R with rstan

Now, with the data and the model ready we can run stan inside R

By default, the stan function call runs 4 chains of length 2000, with a warm-up interval of half the length (hence, 1000). Notice that Stan refers to “warm-up” instead of “burn-in” period, used in JAGS and *Bugs.

In order to access the samples we can use the extract method, discarding the warm-up period.

s is now an array with dimensions [1:n_iter, 1:n_chains, 1:n_parameters], without warm-up.

Then s can be converted into a typical coda object by

Finally the mcmc list can be easily transformed in an object that ggmcmc understands using a call to ggs().

Now the ggmcmc functions can be called using S as the argument: ggs_histogram, ggs_traceplot, ggs_density, ggs_compare_partial, ggs_running, ggs_autocorrelation, ggs_crosscorrelation, ggs_caterpillar

Or generating a single pdf file with all the output:

A traceplot (or any of the other plots that ggmcmc provides) of the samples can be easily obtained with

<img title=“traceplot-ggmcmc.png” alt=“traceplot-ggmcmc.png” src=“"class=“aligncenter” />

Running Stan alone

In order to run Stan without using the rstan function the model and the data must be written to disk

Then the usual steps required to run Stan follow, namely the conversion of stan code to C++ code, the compilation of the code, and the execution of the generated binary against the data. Please refer to the manual for the execution in your specific platform. In my Gentoo GNU/Linux system with Stan installed in /usr/local/stan/ I simply run the following in the command line

To generate two chains:

The chains can be imported easily as a table and transformed into an mcmc object with the following lines:

The -c(1:3) avoids loading Stan internal variables.

Finally, the s.mcmc object can also be converted into a ggmcmc object:

Is Stan fast?

My experience with Stan with respect to execution times is mixed, with some examples giving valid samples in short period of time and others taking much more than JAGS. I will do more extensive benchmarks and post the results here.

Comparison of Rhat versions: clarifying formulas for the potential scale reduction factor and its implications

Author: Xavier Fernández-i-Marín
March 6, 2019 - 7 minutes
Description of how to calculate Rhat (Potential Scale Reduction Factor) for Bayesian convergence usind different formulas, and its impact on the length of the chains
ggmcmc R Bayesian

Families and batches of parameters in Bayesian inference: how to treat them using ggmcmc

Author: Xavier Fernández-i-Marín
February 22, 2019 - 10 minutes
Tutorial of how to deal with families of parameters in Bayesian inference using ggmcmc
ggmcmc R Bayesian

V-Dem by default: Load and process V-Dem democracy scores in R

Author: Xavier Fernández-i-Marín
January 11, 2019 - 6 minutes
Tutorial of how to perform data analysis of the Varieties of Democracies (V-Dem) dataset in R
Governance R Data visualization
comments powered by Disqus