Author: Xavier Fernández-i-Marín
January 11, 2019
Tutorial on how to perform data analysis of the Varieties of Democracy (V-Dem) dataset in R
The Varieties of Democracy project (V-Dem) is the new kid in town among the many efforts and strategies of political scientists to classify, rank and evaluate the democratic features of states.
I personally believe that it is a great advance over the Polity IV score, which is still the default measure that you usually plug in to “control for democracy”, or even use for more demanding tasks:
- Multidimensional: It views democracy as a multidimensional feature, and can therefore be more theoretically grounded in what we aim to use it for. Are we using it for freedoms and liberties? Are we using it for electoral rules? V-Dem offers five major dimensions, namely electoral, liberal, deliberative, egalitarian and participatory.
- Model: It is the result of applying a measurement model, not of a simple arithmetic combination of features (as with many, too many, scores in comparative politics, but that is a story for another time).
V-Dem should be the new default democracy score from now on. So unless you have an argument for preferring another democracy score, I would advise using V-Dem by default.
This entry provides a way to load the data in R, treat it, and produce several basic figures of interest.
The following packages are required to replicate the steps in this document:
library(dplyr)    # data manipulation (select, filter, %>%)
library(tidyr)    # reshaping (gather, spread)
library(ggplot2)  # figures
To use the dataset, the V-Dem team requires registration, so unfortunately it is not possible to load the data directly from their servers.
So be sure to download the dataset, store it in your current working directory in R, and then unzip it. At the time of writing, the current version is 8. This example uses the “Country-Year: V-Dem Core” dataset in its CSV format. Use the function getwd() to find out where you are on your computer and setwd() to change it.
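# path to the unzipped CSV file, relative to the working directory
file <- "Country_Year_V-Dem_Core_CSV_v8/V-Dem-CY-Core-v8.csv"
# read.csv() is read.table() with CSV defaults: comma separator, header row,
# double quotes only and no comment character, so apostrophes or # in names are safe
vdem.raw <- read.csv(file)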
The structure of the object shows that it has more than 26,000 rows and 2,000 columns. The rows contain entries for every country over time, and the columns contain all the variables employed to generate the final scores:
dim(vdem.raw)
## [1] 26537  2021
head(names(vdem.raw), 10)
##  [1] "country_name"    "country_text_id" "country_id"
##  [4] "year"            "historical_date" "project"
##  [7] "historical"      "histname"        "codingstart"
## [10] "codingend"
Clean and select only the minimal set of variables
There is no need to work with the full set of variables. If only the 5 main dimensions are of interest, we can easily produce an object using:
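vdem.select <- vdem.raw %>%
  # keep the case identifiers and the five high-level indices, renaming them
  select(Country = country_name, iso3c = country_text_id, COWcode, Year = year,
         Electoral = v2x_polyarchy, Liberal = v2x_libdem,
         Participatory = v2x_partipdem, Deliberative = v2x_delibdem,
         Egalitarian = v2x_egaldem) %>%
  as_tibble()  # convert to a tibble (tbl_df() is deprecated in favour of as_tibble())
dim(vdem.select)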
## [1] 26537     9
names(vdem.select)
## [1] "Country"       "iso3c"         "COWcode"       "Year"
## [5] "Electoral"     "Liberal"       "Participatory" "Deliberative"
## [9] "Egalitarian"
The resulting object has only 9 columns, the first four being the identifiers of the case and the remaining five the scores. Although it is now in quite a compact format, it is not yet tidy. To tidy it properly we must ensure that the dimensions are not spread across different columns, but identified by a single variable (using gather() from tidyr):
# stack the five dimension columns into Dimension (name) / score (value) pairs
vdem <- vdem.select %>%
  gather(Dimension, score, -Country, -iso3c, -COWcode, -Year)
vdem
## # A tibble: 132,685 x 6
##    Country     iso3c COWcode  Year Dimension score
##    <fct>       <fct>   <int> <int> <chr>     <dbl>
##  1 Afghanistan AFG       700  1789 Electoral    NA
##  2 Afghanistan AFG       700  1790 Electoral    NA
##  3 Afghanistan AFG       700  1791 Electoral    NA
##  4 Afghanistan AFG       700  1792 Electoral    NA
##  5 Afghanistan AFG       700  1793 Electoral    NA
##  6 Afghanistan AFG       700  1794 Electoral    NA
##  7 Afghanistan AFG       700  1795 Electoral    NA
##  8 Afghanistan AFG       700  1796 Electoral    NA
##  9 Afghanistan AFG       700  1797 Electoral    NA
## 10 Afghanistan AFG       700  1798 Electoral    NA
## # … with 132,675 more rows
Now the cleaned and tidy object is more than 132,000 rows long (26,537 country-years × 5 dimensions) with 6 columns, only the last of which contains the data (the score), whereas the others identify the observation (country, year and dimension).
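To see what gather() does without downloading the data, here is a minimal sketch on a hypothetical two-row toy table (country names and scores are invented for illustration, not V-Dem values). Each of the five dimension columns becomes one row per country-year, which is why the long object has five times as many rows as the wide one. The same reshape is also shown with pivot_longer(), which superseded gather() in tidyr 1.0.0; the two return rows in a different order but with the same content.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in the same wide layout as vdem.select (values invented)
toy.wide <- tibble(
  Country = c("A", "B"),
  Year = c(2017L, 2017L),
  Electoral = c(0.8, 0.3),
  Liberal = c(0.7, 0.2),
  Participatory = c(0.6, 0.2),
  Deliberative = c(0.7, 0.1),
  Egalitarian = c(0.6, 0.2)
)

# gather(): every column except the identifiers becomes a Dimension/score pair
toy.long <- toy.wide %>%
  gather(Dimension, score, -Country, -Year)

# The same reshape with pivot_longer() (tidyr >= 1.0.0)
toy.long2 <- toy.wide %>%
  pivot_longer(-c(Country, Year), names_to = "Dimension", values_to = "score")

dim(toy.wide)  # 2 rows, 7 columns
dim(toy.long)  # 10 rows (2 country-years x 5 dimensions), 4 columns
```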
The score is bounded between 0 (low, poor democratic features) and 1 (high, democratic), and therefore the figures should cover the whole range.
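Because the score should lie in [0, 1], a quick sanity check before plotting is cheap. The sketch below uses a hypothetical vector of scores; on the real data the same test would be applied to vdem$score. Missing values, like the early Afghanistan years shown above, are skipped with na.rm = TRUE.

```r
# Hypothetical scores, including missing values as in the early years of the data
score <- c(NA, 0.12, 0.55, 0.98, NA)

# TRUE when every non-missing score lies inside the [0, 1] range
in.range <- all(score >= 0 & score <= 1, na.rm = TRUE)
in.range

# Share of missing values, worth knowing before deciding how figures treat NA
mean(is.na(score))
```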
Basic data description
The distribution of democracy scores in the last year available can be shown with one density curve per dimension:
# densities of the five dimensions in the most recent year of the data
ggplot(filter(vdem, Year == max(Year)),
       aes(x = score, color = Dimension, fill = Dimension)) +
  geom_density(alpha = 0.4)
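Inside filter(), max(Year) picks the latest year in the whole object. If that year has not been coded yet for some countries, their scores are NA and geom_density() drops them with a warning. A minimal sketch, on hypothetical data, of finding the most recent year with at least one observed score before filtering:

```r
library(dplyr)

# Hypothetical long data: the latest year (2018) has no coded scores yet
toy <- tibble(
  Country = rep(c("A", "B"), times = 3),
  Year = rep(c(2016L, 2017L, 2018L), each = 2),
  score = c(0.4, 0.7, 0.5, 0.8, NA, NA)
)

# Most recent year with at least one non-missing score
last.year <- toy %>%
  filter(!is.na(score)) %>%
  summarize(last = max(Year)) %>%
  pull(last)

# Keep only that year for plotting
last.year.data <- filter(toy, Year == last.year)
```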
The relationships amongst dimensions for the last year of data available can be observed using a ggpairs() figure (from package GGally):
library(GGally)
# one row per country and one column per dimension for the most recent year
vdem.only.dimensions.last.year <- vdem %>%
  filter(Year == max(Year)) %>%
  spread(Dimension, score) %>%
  select(-Country, -iso3c, -COWcode, -Year)
vdem.only.dimensions.last.year
## # A tibble: 178 x 5
##    Deliberative Egalitarian Electoral Liberal Participatory
##           <dbl>       <dbl>     <dbl>   <dbl>         <dbl>
##  1       0.228        0.143     0.345  0.216         0.140
##  2       0.299        0.390     0.551  0.463         0.338
##  3       0.280        0.288     0.350  0.180         0.138
##  4       0.141        0.102     0.252  0.141         0.0879
##  5       0.605        0.581     0.765  0.631         0.512
##  6       0.275        0.321     0.399  0.239         0.235
##  7       0.804        0.760     0.875  0.827         0.632
##  8       0.725        0.745     0.841  0.763         0.631
##  9       0.0703       0.108     0.202  0.0668        0.0754
## 10       0.0515       0.117     0.128  0.0536        0.0389
## # … with 168 more rows
# scatterplot matrix: pairwise scatterplots, densities and correlations
ggpairs(vdem.only.dimensions.last.year)
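The same pairwise relationships can be summarised numerically with base R's cor(). A minimal sketch on a hypothetical three-country table with the same columns as vdem.only.dimensions.last.year (values invented); use = "pairwise.complete.obs" keeps a pair of dimensions for a country whenever both are observed, so one missing score does not discard the whole row.

```r
library(tibble)

# Hypothetical stand-in for vdem.only.dimensions.last.year (values invented)
toy.dims <- tibble(
  Deliberative = c(0.2, 0.6, 0.8),
  Egalitarian = c(0.1, 0.6, 0.7),
  Electoral = c(0.3, 0.7, 0.9),
  Liberal = c(0.2, 0.6, 0.8),
  Participatory = c(0.1, NA, 0.6)
)

# Pearson correlations between every pair of dimensions
dims.cor <- cor(toy.dims, use = "pairwise.complete.obs")
round(dims.cor, 2)
```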
Country on dimensions
To trace a single country across several dimensions, we can create an object containing only the values of the selected country:
selected.country <- "Germany"
vdem.country <- filter(vdem, Country == selected.country)
ggplot(vdem.country, aes(x = Year, y = score, group = Dimension, color = Dimension)) +
  geom_line() +
  expand_limits(y = c(0, 1))  # make sure the y axis spans the full 0-1 range of the score
When comparing countries, it is useful to use facets to separate the democratic dimensions, as in this case:
selected.countries <- c("Egypt", "Germany", "Venezuela")
# %in% keeps every row of any selected country (== would recycle the vector and drop rows)
vdem.countries <- filter(vdem, Country %in% selected.countries)
ggplot(vdem.countries, aes(x = Year, y = score, color = Country)) +
  geom_line() +
  expand_limits(y = c(0, 1)) +
  facet_wrap(~ Dimension)
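Filtering on several countries needs %in% rather than ==. A minimal sketch on a hypothetical six-row table shows why: == compares element by element, recycling the vector of names along the rows, so a row survives only when its country happens to sit at the matching position. %in% instead asks whether each row's country appears anywhere in the vector.

```r
library(dplyr)

# Hypothetical long data: three countries, two years each
toy <- tibble(
  Country = rep(c("Egypt", "Germany", "Venezuela"), each = 2),
  Year = rep(c(2016L, 2017L), times = 3)
)
selected.countries <- c("Egypt", "Germany", "Venezuela")

# == recycles the 3 names along the 6 rows: only rows 1 and 6 line up
wrong <- filter(toy, Country == selected.countries)

# %in% keeps every row whose country is in the vector
right <- filter(toy, Country %in% selected.countries)

nrow(wrong)  # 2
nrow(right)  # 6
```

When the number of rows is not a multiple of the number of names, base R also emits a "longer object length is not a multiple of shorter object length" warning, which is a useful hint that == was used by mistake.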
Finally, the following code produces sparklines for the whole set of countries and dimensions over time:
ggplot(vdem, aes(x = Year, y = score, color = Dimension)) +
  geom_line(size = 0.5) +
  facet_grid(Country ~ Dimension) +  # one row per country, one column per dimension
  # scale_x_continuous(breaks = seq(1800, 2000, by = 50)) +
  # scale_y_continuous() +
  theme(strip.text.y = element_text(angle = 0, hjust = 0),  # horizontal country labels
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank()) +
  ylab("Democracy score") +
  ggtitle("Varieties of Democracy")
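With close to 180 countries the faceted figure is far too tall for the screen, so it is easier to inspect once written to a file. A hedged sketch using ggsave(), where the height grows with the number of facet rows; the 0.25 inches per country, the 8-inch width and the file name are arbitrary assumptions to tune, and a small hypothetical dataset stands in for vdem.

```r
library(ggplot2)

# Hypothetical stand-in for vdem: 4 countries, 2 dimensions, 3 years
toy <- expand.grid(
  Country = c("A", "B", "C", "D"),
  Dimension = c("Electoral", "Liberal"),
  Year = 2015:2017
)
toy$score <- seq(0.1, 0.9, length.out = nrow(toy))

p <- ggplot(toy, aes(x = Year, y = score, color = Dimension)) +
  geom_line() +
  facet_grid(Country ~ Dimension)

# Height proportional to the number of countries (0.25 in per row is an assumption)
n.countries <- length(unique(toy$Country))
out.file <- file.path(tempdir(), "vdem-sparklines.pdf")
# limitsize = FALSE lets ggsave() write figures taller than 50 inches if needed
ggsave(out.file, p, width = 8, height = max(3, 0.25 * n.countries),
       units = "in", limitsize = FALSE)
```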
When using the V-Dem data, don’t forget to cite it: (Coppedge et al. 2017)
Coppedge, Michael, John Gerring, Staffan I Lindberg, Svend-Erik Skaaning, Jan Teorell, Joshua Krusell, Kyle L Marquardt, et al. 2017. “V-Dem Methodology V7.”