#### V-Dem by default: Load and process V-Dem democracy scores in R

##### Tutorial on how to analyze the Varieties of Democracy (V-Dem) dataset in R
Governance R Data visualization

The Varieties of Democracy project (V-Dem) is the new kid in town among the efforts and strategies of political scientists to classify, rank and evaluate the democratic features of states.

I personally believe that it represents a great advance over the Polity IV score, which is still the default measure that you usually plug in to “control for democracy”, or even use for more demanding tasks:

• Multidimensional: It views democracy as a multidimensional feature, and can therefore be better grounded theoretically in what we aim to use it for. Are we using it for freedoms and liberties? Are we using it for electoral rules? V-Dem offers 5 major dimensions, namely electoral, liberal, deliberative, egalitarian and participatory.
• Model: It is the result of applying a measurement model, not of a simple arithmetic combination of features (as with many, too many, scores in comparative politics, but this is a story for another moment).

V-Dem must be the new default democracy score from now on. So unless you have an argument for preferring another democracy score, I would advise using V-Dem by default.

This entry intends to provide an R way to load the data, treat it, and produce several basic figures of interest.

The following packages are required to replicate the steps in this document:

library(dplyr)
library(tidyr)
library(ggplot2)

The V-Dem team requires registration in order to use the dataset, so unfortunately it is not possible to load the data directly from their servers.

So be sure to download the dataset, store it in your current working directory in R, and unzip it. The current version is 8. This example uses the “Country-Year: V-Dem Core” dataset in its CSV format. Use the function getwd() to find out where you are on your computer and setwd() to change it.

file <- "Country_Year_V-Dem_Core_CSV_v8/V-Dem-CY-Core-v8.csv"
vdem.raw <- read.table(file, sep = ",", header = TRUE)
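
Since the download and unzip steps are manual, it is easy to end up with the file in a different place than R expects. The following is a minimal sketch, not part of the original workflow, of a guard that fails with a readable message when the file is missing. The helper name `read_vdem()` and the toy CSV (with made-up scores) are assumptions for illustration only. It uses `read.csv()`, which sets `sep = ","` and `header = TRUE` by default and only treats double quotes as quoting characters.

```r
# Hypothetical helper (not from the V-Dem project): read a V-Dem CSV file,
# stopping with an informative message if the file is not where we expect it.
read_vdem <- function(path) {
  if (!file.exists(path)) {
    stop("V-Dem file not found at '", path,
         "'. Current working directory: ", getwd())
  }
  read.csv(path, stringsAsFactors = FALSE)
}

# Toy stand-in for the real file; the scores below are made up.
toy.path <- tempfile(fileext = ".csv")
writeLines(c("country_name,country_text_id,year,v2x_polyarchy",
             "Germany,DEU,2016,0.88",
             "Egypt,EGY,2016,0.21"),
           toy.path)

toy.raw <- read_vdem(toy.path)
dim(toy.raw)
```

With the real data, the call would be `read_vdem(file)`, using the `file` path defined above.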

The structure of the object shows that we have more than 26,000 rows and 2,000 columns. The rows contain entries for every country over time, and the columns contain all the variables employed to generate the final scores.

dim(vdem.raw)
## [1] 26537  2021
names(vdem.raw)[1:10]
##  [1] "country_name"    "country_text_id" "country_id"
##  [4] "year"            "historical_date" "project"
##  [7] "historical"      "histname"        "codingstart"
## [10] "codingend"

## Clean and select only the minimal set of variables

There is no need to work with the full set of variables. If only the 5 main dimensions are of interest, we can easily produce an object using:

vdem.select <- vdem.raw %>%
  # keep the identifiers and rename the five high-level indices
  select(Country = country_name, iso3c = country_text_id, COWcode,
         Year = year,
         Electoral = v2x_polyarchy,
         Liberal = v2x_libdem,
         Participatory = v2x_partipdem,
         Deliberative = v2x_delibdem,
         Egalitarian = v2x_egaldem) %>%
  as_tibble()  # tbl_df() is deprecated in recent versions of dplyr
dim(vdem.select)
## [1] 26537     9
names(vdem.select)
## [1] "Country"       "iso3c"         "COWcode"       "Year"
## [5] "Electoral"     "Liberal"       "Participatory" "Deliberative"
## [9] "Egalitarian"

The resulting object has only 9 columns: the first four identify the case and the remaining five hold the scores. Although it is now in a quite compact format, it is not yet tidy. To tidy it properly, the dimensions must not sit in different columns; instead, a single variable must identify the dimension and another must hold its score. This is what gather() does.

vdem <- vdem.select %>%
  gather(Dimension, score, -Country, -iso3c, -COWcode, -Year)
vdem
## # A tibble: 132,685 x 6
##    Country     iso3c COWcode  Year Dimension score
##    <fct>       <fct>   <int> <int> <chr>     <dbl>
##  1 Afghanistan AFG       700  1789 Electoral    NA
##  2 Afghanistan AFG       700  1790 Electoral    NA
##  3 Afghanistan AFG       700  1791 Electoral    NA
##  4 Afghanistan AFG       700  1792 Electoral    NA
##  5 Afghanistan AFG       700  1793 Electoral    NA
##  6 Afghanistan AFG       700  1794 Electoral    NA
##  7 Afghanistan AFG       700  1795 Electoral    NA
##  8 Afghanistan AFG       700  1796 Electoral    NA
##  9 Afghanistan AFG       700  1797 Electoral    NA
## 10 Afghanistan AFG       700  1798 Electoral    NA
## # … with 132,675 more rows

The cleaned and tidy object is now more than 132,000 rows long and has 6 columns, only the last of which contains the data (the score); the others identify the observation (country, year and dimension).
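
In tidyr 1.0.0 and later, gather() still works but is superseded by pivot_longer(). As a hedged sketch of the same reshaping step, the example below uses a toy stand-in for `vdem.select` with made-up scores and only two dimensions; the object names `toy.select` and `toy.long` are assumptions for illustration.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for vdem.select; the scores are made up for illustration.
toy.select <- tibble(
  Country   = c("Germany", "Egypt"),
  Year      = c(2016L, 2016L),
  Electoral = c(0.9, 0.2),
  Liberal   = c(0.8, 0.1)
)

# One row per country, year and dimension, with the value in 'score'.
toy.long <- toy.select %>%
  pivot_longer(cols = c(Electoral, Liberal),
               names_to = "Dimension", values_to = "score")
toy.long
```

The content is the same as with gather(), but the row order differs: pivot_longer() keeps the rows of each country-year together, whereas gather() stacks one dimension after another.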

The score is bounded between 0 (low, poor democratic features) and 1 (high, democratic), and therefore the figures must cover the whole range accordingly.
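
Before plotting, it can be worth confirming that the scores really fall inside the unit interval once missing values are set aside. This is a small sketch on a made-up vector; with the real data, the same check would be applied to `vdem$score`.

```r
# Made-up scores, including a missing value as in the early years of the data.
toy.scores <- c(0.12, 0.87, NA, 0.45)

# Observed range, ignoring missing values.
score.range <- range(toy.scores, na.rm = TRUE)

# TRUE when every observed score lies between 0 and 1.
all.in.unit.interval <- score.range[1] >= 0 && score.range[2] <= 1
all.in.unit.interval
```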

## Basic data description

The distribution of democracy scores in the last year:

ggplot(filter(vdem, Year == max(Year)),
       aes(x = score, color = Dimension, fill = Dimension)) +
  geom_density(alpha = 0.4)

The relationships amongst dimensions for the last year of data available can be observed using a ggpairs() figure (from package GGally).

library(GGally)
vdem.only.dimensions.last.year <- vdem %>%
  filter(Year == max(Year)) %>%
  spread(Dimension, score) %>%  # back to one column per dimension
  select(-Country, -iso3c, -COWcode, -Year)
vdem.only.dimensions.last.year
## # A tibble: 178 x 5
##    Deliberative Egalitarian Electoral Liberal Participatory
##           <dbl>       <dbl>     <dbl>   <dbl>         <dbl>
##  1       0.228        0.143     0.345  0.216         0.140
##  2       0.299        0.390     0.551  0.463         0.338
##  3       0.280        0.288     0.350  0.180         0.138
##  4       0.141        0.102     0.252  0.141         0.0879
##  5       0.605        0.581     0.765  0.631         0.512
##  6       0.275        0.321     0.399  0.239         0.235
##  7       0.804        0.760     0.875  0.827         0.632
##  8       0.725        0.745     0.841  0.763         0.631
##  9       0.0703       0.108     0.202  0.0668        0.0754
## 10       0.0515       0.117     0.128  0.0536        0.0389
## # … with 168 more rows
ggpairs(vdem.only.dimensions.last.year)
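
As a numeric complement to the pairs plot, the pairwise correlations between dimensions can be computed with base R's cor(). The sketch below uses a toy data frame with made-up scores and three dimensions; with the real data, `vdem.only.dimensions.last.year` would take its place. The argument `use = "pairwise.complete.obs"` keeps, for each pair of dimensions, the rows where both scores are observed.

```r
# Toy stand-in for the wide, dimensions-only object; scores are made up.
toy.wide <- data.frame(
  Electoral     = c(0.2, 0.5, 0.9, NA),
  Liberal       = c(0.1, 0.4, 0.8, 0.3),
  Participatory = c(0.1, 0.3, 0.6, 0.2)
)

# Correlation matrix between dimensions, tolerating missing scores.
toy.cor <- cor(toy.wide, use = "pairwise.complete.obs")
round(toy.cor, 2)
```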

## Country on dimensions

In order to trace a single country in several dimensions we can create an object containing only the values of the selected country.

selected.country <- "Germany"
vdem.country <- filter(vdem, Country == selected.country)
ggplot(vdem.country, aes(x = Year, y = score,
                         group = Dimension, color = Dimension)) +
  geom_line() +
  expand_limits(y = c(0, 1)) # make the axis cover the whole range of the score

## Countries compared

When comparing countries, it is useful to use facets to differentiate the democratic dimensions, as in this case:

selected.countries <- c("Egypt", "Germany", "Venezuela")
# %in% tests membership; == would recycle the vector and silently drop rows
vdem.countries <- filter(vdem, Country %in% selected.countries)
ggplot(vdem.countries, aes(x = Year, y = score, color = Country)) +
  geom_line() +
  expand_limits(y = c(0, 1)) +
  facet_wrap(~ Dimension)
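
A note on matching several countries at once: `==` compares element by element and recycles the shorter vector, so it only keeps the rows where the country happens to line up with its position in the recycled vector, while `%in%` tests membership for every row. The toy vectors below are assumptions for illustration.

```r
# Toy country column and the countries we want to keep.
toy.country <- c("Egypt", "Germany", "Venezuela",
                 "Germany", "Egypt", "Venezuela")
selected <- c("Egypt", "Germany", "Venezuela")

# Position-by-position comparison against the recycled 'selected' vector.
match.by.position <- toy.country == selected

# Membership test: is each element one of the selected countries?
match.by.membership <- toy.country %in% selected

sum(match.by.position)    # rows kept with ==
sum(match.by.membership)  # rows kept with %in%
```

Here `==` keeps only four of the six rows, although all six belong to the selected countries.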

## Sparklines

Finally, the following code produces sparklines that cover the whole set of countries and dimensions over time:

ggplot(vdem, aes(x = Year, y = score, color = Dimension)) +
  geom_line(size = 0.5) +
  facet_grid(Country ~ Dimension) +
  #  scale_x_continuous(breaks = seq(1800, 2000, by = 50)) +
  #  scale_y_continuous() +
  theme(strip.text.y = element_text(angle = 0, hjust = 0),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank()) +
  ylab("Democracy score") +
  ggtitle("Varieties of Democracy")

## Citation

When using the V-Dem data, don’t forget to cite it (Coppedge et al. 2017):

Coppedge, Michael, John Gerring, Staffan I Lindberg, Svend-Erik Skaaning, Jan Teorell, Joshua Krusell, Kyle L Marquardt, et al. 2017. “V-Dem Methodology V7.”
