V-Dem by default: Load and process V-Dem democracy scores in R

Author: Xavier Fernández-i-Marín
January 11, 2019 - 6 minutes
Tutorial on how to analyse the Varieties of Democracy (V-Dem) dataset in R
Governance R Data visualization

The Varieties of Democracy project (V-Dem) is the new kid in town among the efforts and strategies of political scientists to classify, rank and evaluate the democratic features of states.

I personally believe that it represents a great advance over the Polity IV score, which is still the default value that you usually plug in to “control for democracy”, or even use for more demanding tasks.

V-Dem must be the new default democracy score from now on. So unless you have an argument for preferring other democracy scores, I would advise using V-Dem by default.

This entry intends to provide an R way to load the data, treat it, and produce several basic figures of interest.

The following packages are required to replicate the steps in this document:

library(dplyr)
library(tidyr)
library(ggplot2)

Load data

The V-Dem team requires registration before the dataset can be downloaded, so unfortunately it is not possible to load the data directly from their servers.

So be sure to download the dataset, store it in your current working directory in R, and then unzip it. The current version is 8. This example uses the “Country-Year: V-Dem Core” dataset in its CSV version. Use the function getwd() to find out where you are on your computer and setwd() to change it.

file <- "Country_Year_V-Dem_Core_CSV_v8/V-Dem-CY-Core-v8.csv"
vdem.raw <- read.table(file, sep = ",", header = TRUE)
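
If read.table() reports that the file cannot be opened, the working directory is usually the culprit. The following is only a minimal sketch of a check that could be run beforehand; the helper name vdem_file_found() is hypothetical and not part of any package.

```r
# Hypothetical helper (not from any package): report whether the V-Dem CSV
# is reachable from the current working directory, and print where R is
# looking when it is not.
vdem_file_found <- function(path) {
  found <- file.exists(path)
  if (!found) {
    message("Cannot find '", path, "' under ", getwd())
  }
  found
}

# Same relative path as used in this post
vdem_file_found("Country_Year_V-Dem_Core_CSV_v8/V-Dem-CY-Core-v8.csv")
```

If it returns FALSE, either move the unzipped folder or call setwd() before loading the data.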

The structure of the object shows that we have more than 26,000 rows and 2,000 columns. The rows contain entries for every country over time, and the columns contain all the variables employed to generate the final scores.

dim(vdem.raw)
## [1] 26537  2021
names(vdem.raw)[1:10]
##  [1] "country_name"    "country_text_id" "country_id"     
##  [4] "year"            "historical_date" "project"        
##  [7] "historical"      "histname"        "codingstart"    
## [10] "codingend"

Clean and select only the minimal set of variables

There is no need to work with the full set of variables. If only the 5 main dimensions are of interest, we can easily produce an object using:

vdem.select <- vdem.raw %>%
  select(Country = country_name, iso3c = country_text_id, COWcode,
         Year = year,
         Electoral = v2x_polyarchy,
         Liberal = v2x_libdem,
         Participatory = v2x_partipdem,
         Deliberative = v2x_delibdem,
         Egalitarian = v2x_egaldem) %>%
  as_tibble() # tbl_df() is deprecated in recent versions of dplyr
dim(vdem.select)
## [1] 26537     9
names(vdem.select)
## [1] "Country"       "iso3c"         "COWcode"       "Year"         
## [5] "Electoral"     "Liberal"       "Participatory" "Deliberative" 
## [9] "Egalitarian"

The resulting object has only 9 columns: the first four identify the case and the remaining five hold the scores. Although it is now quite compact, it is not yet tidy. To tidy it properly, the dimensions must not sit in different columns but be identified by a single variable (using gather()).

vdem <- vdem.select %>%
  gather(Dimension, score, -Country, -iso3c, -COWcode, -Year)
vdem
## # A tibble: 132,685 x 6
##    Country     iso3c COWcode  Year Dimension score
##    <fct>       <fct>   <int> <int> <chr>     <dbl>
##  1 Afghanistan AFG       700  1789 Electoral    NA
##  2 Afghanistan AFG       700  1790 Electoral    NA
##  3 Afghanistan AFG       700  1791 Electoral    NA
##  4 Afghanistan AFG       700  1792 Electoral    NA
##  5 Afghanistan AFG       700  1793 Electoral    NA
##  6 Afghanistan AFG       700  1794 Electoral    NA
##  7 Afghanistan AFG       700  1795 Electoral    NA
##  8 Afghanistan AFG       700  1796 Electoral    NA
##  9 Afghanistan AFG       700  1797 Electoral    NA
## 10 Afghanistan AFG       700  1798 Electoral    NA
## # … with 132,675 more rows

Now the cleaned and tidy object is more than 132,000 rows long and has 6 columns. Only the last one contains the data (the score), whereas the others identify the observation (country, year and dimension).
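
In recent versions of tidyr, gather() is superseded by pivot_longer(). As a hedged sketch, the same reshaping can be written as follows. The toy data frame stands in for vdem.select and its values are made up for illustration only.

```r
library(tidyr)

# Toy stand-in for vdem.select (made-up values, two dimensions only)
toy.select <- data.frame(
  Country   = c("A", "B"),
  iso3c     = c("AAA", "BBB"),
  COWcode   = c(1L, 2L),
  Year      = c(2000L, 2000L),
  Electoral = c(0.2, 0.8),
  Liberal   = c(0.1, 0.7)
)

# Every column except the four identifiers becomes a Dimension/score pair
toy.long <- pivot_longer(toy.select,
                         cols = -c(Country, iso3c, COWcode, Year),
                         names_to = "Dimension",
                         values_to = "score")
toy.long
```

Note that pivot_longer() orders the result row by row (all dimensions of the first case, then the second), whereas gather() stacks one dimension after another. The content is the same.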

The score is bounded between 0 (low, poor democratic features) and 1 (high, democratic), and therefore the figures must cover the whole range accordingly.
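
A quick sanity check of that range can be sketched in base R. The vector below is made up and stands in for vdem$score; missing values, which are frequent in the early years, are ignored in the range test and counted separately.

```r
# Made-up scores standing in for vdem$score
score <- c(0.12, NA, 0.87, 0.5)

# TRUE if every non-missing score lies within [0, 1]
in.range <- all(score >= 0 & score <= 1, na.rm = TRUE)

# Share of missing observations
missing.share <- mean(is.na(score))

in.range
missing.share
```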

Basic data description

The distribution of democracy scores in the last year of data available:

ggplot(filter(vdem, Year == max(Year)), 
       aes(x = score, color = Dimension, fill = Dimension)) +
  geom_density(alpha = 0.4)

The relationships amongst dimensions for the last year of data available can be observed using a ggpairs() figure (from package GGally).

library(GGally)
vdem.only.dimensions.last.year <- vdem %>%
  filter(Year == max(Year)) %>%
  spread(Dimension, score) %>%
  select(-Country, -iso3c, -COWcode, -Year)
vdem.only.dimensions.last.year
## # A tibble: 178 x 5
##    Deliberative Egalitarian Electoral Liberal Participatory
##           <dbl>       <dbl>     <dbl>   <dbl>         <dbl>
##  1       0.228        0.143     0.345  0.216         0.140 
##  2       0.299        0.390     0.551  0.463         0.338 
##  3       0.280        0.288     0.350  0.180         0.138 
##  4       0.141        0.102     0.252  0.141         0.0879
##  5       0.605        0.581     0.765  0.631         0.512 
##  6       0.275        0.321     0.399  0.239         0.235 
##  7       0.804        0.760     0.875  0.827         0.632 
##  8       0.725        0.745     0.841  0.763         0.631 
##  9       0.0703       0.108     0.202  0.0668        0.0754
## 10       0.0515       0.117     0.128  0.0536        0.0389
## # … with 168 more rows
ggpairs(vdem.only.dimensions.last.year)

Figure 1: Pair plot of the democracy scores in each dimension, for the last year of data available.
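
As a numeric companion to the pair plot, the correlations between dimensions can be computed with cor() from base R. This is only a sketch: the toy data frame below stands in for vdem.only.dimensions.last.year and its values are invented.

```r
# Made-up scores standing in for vdem.only.dimensions.last.year
toy.dims <- data.frame(
  Electoral   = c(0.2, 0.5, 0.9, NA),
  Liberal     = c(0.1, 0.4, 0.8, 0.3),
  Egalitarian = c(0.3, 0.35, 0.7, 0.2)
)

# Pairwise deletion keeps every country that has both scores of a given pair
cor.dims <- cor(toy.dims, use = "pairwise.complete.obs")
round(cor.dims, 2)
```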

Country on dimensions

In order to trace a single country in several dimensions we can create an object containing only the values of the selected country.

selected.country <- "Germany"
vdem.country <- filter(vdem, Country == selected.country)
ggplot(vdem.country, aes(x = Year, y = score,
                         group = Dimension, color = Dimension)) +
  geom_line() +
  expand_limits(y = c(0, 1)) # to limit to the range of the score

Figure 2: Temporal evolution of the democratic dimensions for a selected country.

Countries compared

When comparing countries it is useful to use facets to differentiate the democratic dimensions, as in this example:

selected.countries <- c("Egypt", "Germany", "Venezuela")
# %in% tests membership; == would recycle the vector and silently drop rows
vdem.countries <- filter(vdem, Country %in% selected.countries)
ggplot(vdem.countries, aes(x = Year, y = score, color = Country)) +
  geom_line() +
  expand_limits(y = c(0, 1)) +
  facet_wrap(~ Dimension)

Figure 3: Temporal evolution of the democratic dimensions comparing countries.
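
A word of caution when selecting several countries: membership should be tested with %in%, not with ==. The following base R sketch, using made-up data, shows how == recycles the shorter vector and compares position by position, so that most matching rows are lost without any warning.

```r
# Made-up column of country names, two rows per country
country  <- c("Egypt", "Egypt", "Germany", "Germany", "Venezuela", "Venezuela")
selected <- c("Egypt", "Germany", "Venezuela")

# == recycles `selected` along `country` and compares element by element
kept.with.equals <- sum(country == selected)

# %in% asks, for each row, whether its country is among the selected ones
kept.with.in <- sum(country %in% selected)

kept.with.equals
kept.with.in
```

Here only 2 of the 6 rows survive the == comparison, while %in% keeps all 6.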

Sparklines

Finally, the following code produces sparklines that contain the whole set of countries and dimensions over time:

ggplot(vdem, aes(x = Year, y = score, color = Dimension)) +
  geom_line(size = 0.5) + 
  facet_grid(Country ~ Dimension) +
#  scale_x_continuous(breaks = seq(1800, 2000, by = 50)) +
#  scale_y_continuous() +
  theme(strip.text.y = element_text(angle = 0, hjust = 0),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank()) +
  ylab("Democracy score") +
  ggtitle("Varieties of Democracy")

Figure 4: Sparklines with the temporal evolution of democracy scores in every dimension.
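
With one facet row per country, the sparklines only become readable in a very tall figure. The sketch below shows one way to save such a plot with ggsave(). The toy data, the 0.3 inches per country and the 3-inch minimum are assumptions chosen for illustration, not values from the original analysis.

```r
library(ggplot2)

# Made-up data standing in for vdem: 3 countries, 2 dimensions, 2 years
toy <- expand.grid(Country = c("A", "B", "C"),
                   Dimension = c("Electoral", "Liberal"),
                   Year = c(2000, 2001))
toy$score <- seq(0.1, 0.9, length.out = nrow(toy))

p <- ggplot(toy, aes(x = Year, y = score, color = Dimension)) +
  geom_line() +
  facet_grid(Country ~ Dimension)

# Assumed rule of thumb: 0.3 inches per country, never less than 3 inches
n.countries <- length(unique(toy$Country))
fig.height <- max(3, 0.3 * n.countries)

# limitsize = FALSE lifts ggsave()'s default 50-inch cap on each dimension
out.file <- tempfile(fileext = ".pdf")
ggsave(out.file, plot = p, width = 8, height = fig.height, limitsize = FALSE)
```

With the 178 countries present in the last year alone, 0.3 inches per country already gives more than 50 inches, which is why limitsize = FALSE is set here.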

Citation

When using the V-Dem data, don’t forget to cite it: (Coppedge et al. 2017)

Coppedge, Michael, John Gerring, Staffan I Lindberg, Svend-Erik Skaaning, Jan Teorell, Joshua Krusell, Kyle L Marquardt, et al. 2017. “V-Dem Methodology V7.”
