Stan is a relative newcomer in the scenario of Bayesian tools aimed at provide samples from posteriors. But despite its novelty, it offers many interesting features for Bayesians. Amongst the features, which will probably be discussed here in the future, the possibility of having a sampler in a really fast programming language (C++) makes many of us dream of shorter times to run our models.

In order to make Stan and ggmcmc work together there are two possibilities, depending on how Stan is being used, either inside R with rstan or as a standalone program.

## Create a fake dataset

In R we can easily create a fake dataset to play with.

N <- 500 X <- runif(N) y <- 3 + 5 * X + rnorm(N)

This creates 500 data points from a plain simple linear model with intercept=3,slope=5 and standard deviation=1 (which is the default for `rnorm()`

).

Data can be easily passed to Stan via a list:

D <- list(N=N, X=X, y=y)

The model declaration is stored simply as an R character object

md <- ' data { int<lower=0> N; real X[N]; real y[N]; } parameters { real theta[2]; real<lower=0> sigma; } model { for (n in 1:N) { y[n] ~ normal(theta[1] + theta[2] * X[n], sigma); } } '

There are three parts in this declaration. The first one (data) declares the data that will be used. The type of variables can (must) be specified. In this case `N`

is an integer with a minimum value of 0, since it will loop across observations. `X`

and `y`

will be declared as any real number.

The second part (parameters) declares the prior specifications of the parameters to be sampled.

The third part (model) is really the core of the model specification. It declares the relationships between data and parameters. Prior specifications of the parameters can also be included here. In this case `y`

is distributed normally with mean equal to an arithmetic relationship between the intercept, the slope and the data, and standard deviation equals to `sigma`

. Notice that, contrary to JAGS and *Bugs, there is no more need to specify the dispersion parameter in terms of its precision (inverse of the variance).

## Running Stan inside R with rstan

Now, with the data and the model ready we can run stan inside R

library(rstan) m <- stan(model_code=md, data=D)

By default, the `stan`

function call runs 4 chains of length 2000, with a warm-up interval of half the length (hence, 1000). Notice that Stan refers to “warm-up” instead of “burn-in” period, used in JAGS and *Bugs.

In order to access the samples we can use the `extract`

method, discarding the warm-up period.

s <- extract(m, inc_warmup=FALSE)

`s`

is now an array with dimensions [1:n_iter, 1:n_chains, 1:n_parameters], without warm-up.

Then s can be converted into a typical `coda`

object by

s <- mcmc.list(mcmc(s[,1,-4]), mcmc(s[,2,-4]), mcmc(s[,3,-4]), mcmc(s[,4,-4])) # Or a cleaner way thanks to Sebastian Weber: s <- do.call(mcmc.list, alply(s[,,-4], 2, mcmc)) # Or even a more general way, thanks to Forrest Stevens s <- do.call(mcmc.list, alply(s[,,-(length(s[1,1,]))]], 2, mcmc))

Finally the mcmc list can be easily transformed in an object that ggmcmc understands using a call to `ggs()`

.

S <- ggs(s)

Now the ggmcmc functions can be called using `S`

as the argument: ggs_histogram, ggs_traceplot, ggs_density, ggs_compare_partial, ggs_running, ggs_autocorrelation, ggs_crosscorrelation, ggs_caterpillar

Or generating a single pdf file with all the output:

ggmcmc(S)

A traceplot (or any of the other plots that ggmcmc provides) of the samples can be easily obtained with

ggs_traceplot(S)

## Running Stan alone

In order to run Stan without using the `rstan`

function the model and the data must be written to disk

write(md, file="model.stan") dump(c("N", "X", "y"), file="data.Rdump")

Then the usual steps required to run Stan follow, namely the conversion of stan code to C++ code, the compilation of the code, and the execution of the generated binary against the data. Please refer to the manual for the execution in your specific platform. In my Gentoo GNU/Linux system with Stan installed in `/usr/local/stan/`

I simply run the following in the command line

stanc --name=model --o=model.cpp model.stan g++ -march=native -O3 -L/usr/local/stan/bin \ -I /usr/local/stan/src \ -I /usr/local/stan/lib/boost_1.51.0/ \ -I /usr/local/stan/lib/eigen_3.1.1 \ model.cpp -o model -lstan

To generate two chains:

./model --data=data.Rdump --iter=50000 --thin=1 --refresh=10000 \ --samples=s_1.csv --chain_id=1 ./model --data=data.Rdump --iter=50000 --thin=1 --refresh=10000 \ --samples=s_2.csv --chain_id=2

The chains can be imported easily as a table and transformed into an mcmc object with the following lines:

library(coda) s1 <- read.table("s_1.csv", sep=",", header=TRUE) s2 <- read.table("s_2.csv", sep=",", header=TRUE) s.mcmc <- mcmc.list(mcmc(s1[,-c(1:3)]), mcmc(s2[,-c(1:3)]))

The `-c(1:3)`

avoids loading Stan internal variables.

Finally, the `s.mcmc`

object can also be converted into a ggmcmc object:

library(ggmcmc) S <- ggs(s.mcmc)

## Is Stan fast?

My experience with Stan with respect to execution times is mixed, with some examples giving valid samples in short period of time and others taking much more than JAGS. I will do more extensive benchmarks and post the results here.

## By Forrest Stevens December 10, 2012 - 7:17 pm

I think you made a slight error with regards to the “cleaner” way to generate the mcmc.list object. The index to remove the log posterior should be negative:

# Or a cleaner way thanks to Sebastian Weber:

s <- do.call(mcmc.list, alply(s[,,-4], 2, mcmc))

But you might actually generalize this to a model of arbitrary complexity doing this:

s <- do.call(mcmc.list, alply(s[,,-(length(s[1,1,]))]], 2, mcmc))

Thanks *very* much for your examples though, it's been very helpful!

## By xavierfim December 13, 2012 - 9:25 pm

Absolutely Forrest. Many thanks. I did a mistake when translating the code from the editor into the blog.

Notice that ggmcmc-0.3,which is now ready, allows to directly import rstan output. I am preparing an entry to explain it.

Thanks for letting me know.

## By Nathaniel Decker September 10, 2013 - 9:42 pm

RStan has been updated such that the extract function now requires “permuted = FALSE” to work correctly with your example.