Agency proliferation: processing the dataset of institutional characteristics of regulatory agencies - RegGov 2018
Author: Xavier Fernández-i-Marín
December 17, 2018
Introduction
This tutorial shows how to process the dataset of institutional characteristics
of regulatory agencies in R. It uses the dataset introduced in the journal article
entitled “Agency proliferation and the globalization of the regulatory state:
Introducing a data set on the institutional features of regulatory agencies”, by
Jacint Jordana, Xavier Fernández-i-Marín and Andrea C. Bianculli, published in
Regulation & Governance (December 2018).
The following R packages are needed to follow the instructions:
library(dplyr) # Data manipulation; also provides the pipe operator ("%>%")
library(ggplot2) # Figures
library(tidyr) # Separating and reshaping variables
The code below makes extensive use of pipes (%>%) for more flexible, simple and readable code.
Load the dataset
The dataset is available in both long and wide formats. Recall that the data contain normalized scores (mean zero and standard deviation one) that correspond to the position of each institution in 2010.
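As a reminder of what this normalization means, here is a minimal sketch on a made-up vector (the values in raw are hypothetical, not taken from the dataset). It assumes the usual z-score with the sample standard deviation, which is what base R's scale() uses; the exact procedure used in the article is not described here.

```r
# Hypothetical raw scores, used only to illustrate standardization
raw <- c(2, 4, 4, 4, 5, 5, 7, 9)

# z-scores by hand: subtract the mean, divide by the sample standard deviation
z_manual <- (raw - mean(raw)) / sd(raw)

# scale() does the same but returns a one-column matrix, so convert it to a vector
z_scale <- as.vector(scale(raw))

# Both versions have (numerically) mean zero and standard deviation one
mean(z_scale)
sd(z_scale)
```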
The long format has a row for every institution/dimension pair, with a column containing the value of the score.
The first four columns (Institution, Coverage, Cluster and Dimension) are
identifiers of the data, whereas the data itself is stored only in the last column
(score).
url.l <- "http://xavier-fim.net/publication/regulation-governance-2018-agency-proliferation-globalization-regulatory-state/ra-reggov_2018-institutions-long.csv"
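# ra.l: regulatory agencies in the long format
# (stringsAsFactors = FALSE keeps text columns as characters in R versions before 4.0)
ra.l <- read.table(url.l, header = TRUE, sep = ";", check.names = FALSE,
                   stringsAsFactors = FALSE) %>%
  as_tibble()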
str(ra.l)
## tibble [3,196 × 5] (S3: tbl_df/tbl/data.frame)
## $ Institution: chr [1:3196] "National Regulatory Authority for Electricity" "Drug, Food and Medical Technology Administration" "Central Bank of Argentina" "National Commission for Protection of Competition" ...
## $ Coverage : chr [1:3196] "Argentina :: Electricity" "Argentina :: Food Safety - Pharmaceuticals" "Argentina :: Central Bank - Financial Services" "Argentina :: Competition" ...
## $ Cluster : chr [1:3196] "#6 Responsible" "#6 Responsible" "#1 Ideal" "#1 Ideal" ...
## $ Dimension : chr [1:3196] "Managerial autonomy" "Managerial autonomy" "Managerial autonomy" "Managerial autonomy" ...
## $ score : num [1:3196] -0.26 -0.48 0.46 0.77 1.12 0.84 0.86 0.05 0.25 0.7 ...
head(ra.l)
## # A tibble: 6 x 5
## Institution Coverage Cluster Dimension score
## <chr> <chr> <chr> <chr> <dbl>
## 1 National Regulatory Auth… Argentina :: Electric… #6 Respon… Managerial … -0.26
## 2 Drug, Food and Medical T… Argentina :: Food Saf… #6 Respon… Managerial … -0.48
## 3 Central Bank of Argentina Argentina :: Central … #1 Ideal Managerial … 0.46
## 4 National Commission for … Argentina :: Competit… #1 Ideal Managerial … 0.77
## 5 National Regulatory Auth… Argentina :: Gas #1 Ideal Managerial … 1.12
## 6 Superintendence of Healt… Argentina :: Health S… #5 Autono… Managerial … 0.84
The wide format has a row for every institution, and the scores of the four dimensions are
stored in separate columns. The first three columns (Institution, Coverage and Cluster) identify the observation,
and the remaining four columns store the data (Managerial autonomy, Political independence,
Public accountability and Regulatory capabilities).
url.w <- "http://xavier-fim.net/publication/regulation-governance-2018-agency-proliferation-globalization-regulatory-state/ra-reggov_2018-institutions-wide.csv"
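# ra.w: regulatory agencies in the wide format
# (stringsAsFactors = FALSE keeps text columns as characters in R versions before 4.0)
ra.w <- read.table(url.w, header = TRUE, sep = ";", check.names = FALSE,
                   stringsAsFactors = FALSE) %>%
  as_tibble()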
str(ra.w)
## tibble [799 × 7] (S3: tbl_df/tbl/data.frame)
## $ Institution : chr [1:799] "Administration of Occupational Safety and Health " "Administrative Council for Economic Defense" "Afghan Atomic Energy High Commission" "Afghanistan Telecommunications Regulatory Authority" ...
## $ Coverage : chr [1:799] "Iceland :: Work Safety" "Brazil :: Competition" "Afghanistan :: Nuclear Safety and Radiological Protection" "Afghanistan :: Telecommunications" ...
## $ Cluster : chr [1:799] "#1 Ideal" "#2 Constrained" "#4 Dependent" "#4 Dependent" ...
## $ Managerial autonomy : num [1:799] 0.18 0.6 -0.63 0.38 -0.16 -0.49 -0.74 0.44 -0.34 -2 ...
## $ Political independence : num [1:799] 0.38 0.89 -2.01 -1.82 0.57 0.77 0.31 -0.47 -0.48 -1.33 ...
## $ Public accountability : num [1:799] 0.52 0.38 -1.6 -1.16 -0.7 0.39 -0.54 0.16 0.15 -0.4 ...
## $ Regulatory capabilities: num [1:799] 0.03 -0.57 -0.78 0.75 -1.29 -0.93 0.01 0.42 0.03 0.05 ...
head(ra.w)
## # A tibble: 6 x 7
## Institution Coverage Cluster `Managerial aut… `Political indep…
## <chr> <chr> <chr> <dbl> <dbl>
## 1 "Administration o… Iceland :: Wor… #1 Ideal 0.18 0.38
## 2 "Administrative C… Brazil :: Comp… #2 Cons… 0.6 0.89
## 3 "Afghan Atomic En… Afghanistan ::… #4 Depe… -0.63 -2.01
## 4 "Afghanistan Tele… Afghanistan ::… #4 Depe… 0.38 -1.82
## 5 "Agence National … Tunisia :: Env… #3 Mime… -0.16 0.570
## 6 "Agency for Envir… France :: Envi… #2 Cons… -0.49 0.77
## # … with 2 more variables: Public accountability <dbl>,
## # Regulatory capabilities <dbl>
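The two formats carry the same information, so either one can be rebuilt from the other. Below is a minimal sketch using tidyr's pivot_wider() and pivot_longer() (available from tidyr 1.0.0) on a small made-up tibble that mimics the columns of ra.l. The names toy.l, toy.w and toy.back, and all their values, are hypothetical. Applied to the real data, ra.l %>% pivot_wider(names_from = Dimension, values_from = score) should give the same content as ra.w up to row order, assuming Institution, Coverage and Cluster jointly identify each agency.

```r
library(dplyr)
library(tidyr)

# Toy long-format data with the same column layout as ra.l
# (institutions and scores are made up for illustration)
toy.l <- tibble(
  Institution = rep(c("Agency A", "Agency B"), each = 2),
  Coverage    = rep(c("Country X :: Electricity", "Country Y :: Competition"), each = 2),
  Cluster     = rep(c("#1 Ideal", "#4 Dependent"), each = 2),
  Dimension   = rep(c("Managerial autonomy", "Political independence"), times = 2),
  score       = c(0.5, 0.7, -0.3, -1.2)
)

# Long -> wide: one row per institution, one column per dimension
toy.w <- toy.l %>%
  pivot_wider(names_from = Dimension, values_from = score)

# Wide -> long: stack the dimension columns back into Dimension / score
toy.back <- toy.w %>%
  pivot_longer(cols = c(`Managerial autonomy`, `Political independence`),
               names_to = "Dimension", values_to = "score")

toy.w
```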
Cluster characteristics
Several characteristics of the clusters can be calculated by aggregating the
values of the institutions in each of them, such as the mean (mean()) and the standard
deviation (sd()):
cl.ch <- ra.l %>% # cl.ch: cluster characteristics
  group_by(Cluster, Dimension) %>%
  summarize(Mean = mean(score), SD = sd(score))
head(cl.ch)
## # A tibble: 6 x 4
## # Groups: Cluster [2]
## Cluster Dimension Mean SD
## <chr> <chr> <dbl> <dbl>
## 1 #1 Ideal Managerial autonomy 0.614 0.277
## 2 #1 Ideal Political independence 0.719 0.259
## 3 #1 Ideal Public accountability 0.581 0.396
## 4 #1 Ideal Regulatory capabilities 0.343 0.251
## 5 #2 Constrained Managerial autonomy 0.123 0.496
## 6 #2 Constrained Political independence 0.560 0.153
These cluster summaries can be plotted directly:
ggplot(cl.ch, aes(x = Mean, y = SD, color = Cluster)) +
  geom_point() +
  facet_wrap(~ Dimension)
The figure shows that clusters with higher means in a given dimension are also generally more homogeneous (lower standard deviations), especially for Managerial autonomy and Regulatory capabilities. The pattern is less clear for Political independence and almost nonexistent for Public accountability.
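The visual impression can also be summarized with one number per dimension, for instance the correlation between cluster means and standard deviations (a negative value means that clusters with higher means tend to be more homogeneous). This is a minimal sketch on a made-up tibble with the same columns as cl.ch; toy.ch, toy.r and their values are hypothetical. With the real data the same pipeline could be applied to cl.ch (group_by() replaces its existing Cluster grouping). With only six clusters per dimension, such a coefficient is a rough descriptive summary, not a test.

```r
library(dplyr)

# Made-up cluster summaries with the same columns as cl.ch
toy.ch <- tibble(
  Cluster   = rep(c("#1", "#2", "#3"), times = 2),
  Dimension = rep(c("Managerial autonomy", "Public accountability"), each = 3),
  Mean      = c(0.6, 0.1, -0.5,   0.6, 0.1, -0.5),
  SD        = c(0.3, 0.5,  0.8,   0.4, 0.6,  0.3)
)

# One Pearson correlation between Mean and SD per dimension
toy.r <- toy.ch %>%
  group_by(Dimension) %>%
  summarize(r = cor(Mean, SD))

toy.r
```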
Getting the coverage by country and sector
If the interest lies in aggregated measures of country and sector characteristics,
the variable Coverage provides the country (or countries) and sector(s) covered by
each regulatory agency.
Observations (institutions) must therefore be expanded into more cases, so that every space (country/sector) covered by an institution becomes a row of its own.
First we split the coverage into countries and sectors. Then we replicate each institution once for each of the countries and sectors it covers.
ra.cv <- ra.l %>% # ra.cv: regulatory agencies coverage
  # Split Coverage ("Country :: Sectors") into two new variables
  separate(Coverage, c("countries", "sectors"), " :: ") %>%
  # Create a new row for each of the sectors listed in the plural
  # variable (sectors, separated by " - "), copying the rest of the
  # observation. Rename the resulting variable into singular (Sector)
  separate_rows(sectors, sep = " - ") %>%
  rename(Sector = sectors) %>%
  # Do the same for institutions that cover more than one country
  separate_rows(countries, sep = " - ") %>%
  rename(Country = countries)
Starting from the original object ra.l, with the following numbers of observations and variables:
dim(ra.l)
## [1] 3196 5
We obtain a new object ra.cv that contains more observations (the institutions
copied over and over to fit the number of sectors and countries they cover)
and one more variable (Coverage is replaced by the Country and Sector identifiers).
dim(ra.cv)
## [1] 4732 6
head(ra.cv)
## # A tibble: 6 x 6
## Institution Country Sector Cluster Dimension score
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 National Regulatory Author… Argenti… Electricity #6 Respo… Managerial a… -0.26
## 2 Drug, Food and Medical Tec… Argenti… Food Safety #6 Respo… Managerial a… -0.48
## 3 Drug, Food and Medical Tec… Argenti… Pharmaceut… #6 Respo… Managerial a… -0.48
## 4 Central Bank of Argentina Argenti… Central Ba… #1 Ideal Managerial a… 0.46
## 5 Central Bank of Argentina Argenti… Financial … #1 Ideal Managerial a… 0.46
## 6 National Commission for Pr… Argenti… Competition #1 Ideal Managerial a… 0.77
This way, we have moved the dataset from being based on institutions to being based on the spaces covered by regulation, which makes it suitable for any kind of aggregation (country- or sector-based) that the research has in mind. Be aware, however, that the original dataset was conceived and designed for institutions, not spaces, and therefore any transformation must be adapted to the researcher’s theoretical arguments and empirical needs.
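To see what the expansion does, and why the caution above matters, here is a minimal sketch with two made-up agencies (toy, toy.cv and their values are hypothetical). It applies the same separate() and separate_rows() steps used for ra.cv: the agency covering two sectors is copied twice, so in any space-based average it weighs twice as much as the single-sector agency.

```r
library(dplyr)
library(tidyr)

# Two made-up agencies: one covering a single space, one covering two sectors
toy <- tibble(
  Institution = c("Agency A", "Agency B"),
  Coverage    = c("Country X :: Electricity",
                  "Country Y :: Food Safety - Pharmaceuticals"),
  score       = c(0.4, -0.2)
)

toy.cv <- toy %>%
  # "Country :: Sectors" -> two columns
  separate(Coverage, c("countries", "sectors"), " :: ") %>%
  # One row per sector; the institution's score is copied to each row
  separate_rows(sectors, sep = " - ") %>%
  rename(Sector = sectors, Country = countries)

toy.cv

# Institution-based vs space-based mean: Agency B now counts twice
mean(toy$score)
mean(toy.cv$score)
```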
Sector aggregations
We can now ask questions such as: which sectors have the highest medians in Regulatory capabilities?
ra.cv %>%
  filter(Dimension == "Regulatory capabilities") %>%
  group_by(Sector) %>%
  summarize(`Sector median` = median(score)) %>%
  arrange(desc(`Sector median`))
Sector | Sector median |
---|---|
Electricity | 0.410 |
Gas | 0.410 |
Water | 0.395 |
Telecommunications | 0.390 |
Postal Services | 0.375 |
Central Bank | 0.345 |
Financial Services | 0.220 |
Securities and Exchange | 0.085 |
Pensions | 0.060 |
Insurance | 0.050 |
Competition | 0.040 |
Pharmaceuticals | 0.020 |
Health Services | -0.115 |
Nuclear Safety and Radiological Protection | -0.360 |
Food Safety | -0.370 |
Work Safety | -0.480 |
Environment | -0.700 |
Or we can generalize the study of medians to include all dimensions:
sm <- ra.cv %>% # sm: sector medians
  group_by(Dimension, Sector) %>%
  summarize(`Sector median` = median(score))
ggplot(sm, aes(x = `Sector median`,
               y = reorder(Sector, `Sector median`),
               color = Dimension)) +
  ylab("Sector") +
  geom_point()
Country aggregations
Country aggregations are a bit more problematic than sector ones, because each country has far fewer institutions than each sector does, so country-level summaries rest on only a handful of observations. Therefore, we must proceed with care.
For instance, to get the number of institutions in the 15 countries with the fewest institutions:
ra.cv %>%
  # Get the unique pairs of country * institution
  # so that we obtain institutions, not spaces
  select(Country, Institution) %>%
  unique() %>%
  # By country, count the number of institutions (using the n() function)
  group_by(Country) %>%
  summarize(`Number of regulatory agencies` = n()) %>%
  # Order by ascending number of agencies
  # and report back only the first 15 cases
  arrange(`Number of regulatory agencies`) %>%
  slice(1:15)
Country | Number of regulatory agencies |
---|---|
Cambodia | 1 |
Korea, Democratic People’s Republic of | 1 |
Lao People’s Democratic Republic | 1 |
Madagascar | 1 |
Myanmar | 1 |
Brunei Darussalam | 2 |
Côte d’Ivoire | 2 |
Haiti | 2 |
Kuwait | 2 |
Syrian Arab Republic | 2 |
Burkina Faso | 3 |
Chad | 3 |
Congo, the Democratic Republic of the | 3 |
Nepal | 3 |
Niger | 3 |
Remember that the number of agencies does not necessarily match the number of spaces (sectors) covered. So here we do almost exactly the same as before, but instead of counting agencies, we count the spaces covered by those agencies.
ra.cv %>%
  # Get the unique pairs of country * sector
  # so that we obtain spaces, not institutions
  select(Country, Sector) %>%
  unique() %>%
  # By country, count the number of sectors covered
  group_by(Country) %>%
  summarize(`Number of sectors covered` = n()) %>%
  arrange(`Number of sectors covered`) %>%
  slice(1:15)
Country | Number of sectors covered |
---|---|
Côte d’Ivoire | 2 |
Madagascar | 2 |
Myanmar | 2 |
Brunei Darussalam | 3 |
Burkina Faso | 3 |
Cambodia | 3 |
Chad | 3 |
Congo, the Democratic Republic of the | 3 |
Haiti | 3 |
Korea, Democratic People’s Republic of | 3 |
Kuwait | 3 |
Lao People’s Democratic Republic | 3 |
Afghanistan | 4 |
Cameroon | 4 |
Syrian Arab Republic | 4 |
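Both counts can also be obtained in a single pass with dplyr's n_distinct(), which counts unique values within each group and so replaces the select() + unique() + n() steps. A minimal sketch on made-up space-level data with the same Country, Institution and Sector columns as ra.cv (toy.cv, toy.counts and their values are hypothetical); country X shows how two agencies can cover three sectors.

```r
library(dplyr)

# Made-up space-level data: Agency A covers two sectors in country X
toy.cv <- tibble(
  Country     = c("X", "X", "X", "Y"),
  Institution = c("Agency A", "Agency A", "Agency B", "Agency C"),
  Sector      = c("Food Safety", "Pharmaceuticals", "Competition", "Gas")
)

# Number of distinct agencies and distinct sectors per country
toy.counts <- toy.cv %>%
  group_by(Country) %>%
  summarize(`Number of regulatory agencies` = n_distinct(Institution),
            `Number of sectors covered`     = n_distinct(Sector))

toy.counts
```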
We can also look at the aggregated scores by country in each of the dimensions, considering their centrality (mean) and dispersion (standard deviation). Again, the choice of aggregation function is entirely the researcher’s responsibility.
c.ch <- ra.cv %>% # c.ch: country characteristics
  group_by(Country, Dimension) %>%
  summarize(Mean = mean(score), SD = sd(score))
head(c.ch)
## # A tibble: 6 x 4
## # Groups: Country [2]
## Country Dimension Mean SD
## <chr> <chr> <dbl> <dbl>
## 1 Afghanistan Managerial autonomy -0.05 0.694
## 2 Afghanistan Political independence -0.888 1.30
## 3 Afghanistan Public accountability -1.27 0.389
## 4 Afghanistan Regulatory capabilities -0.192 0.796
## 5 Algeria Managerial autonomy 0.0387 0.579
## 6 Algeria Political independence -0.256 0.921
These country summaries can be plotted in the same way:
ggplot(c.ch, aes(x = Mean, y = SD, color = Dimension, label = Country)) +
  geom_point() +
  geom_text(hjust = 0, nudge_x = 0.02) +
  facet_wrap(~ Dimension)
There does not seem to be any association between the mean scores of each country’s agencies in a dimension and their dispersion.
But other stories can also be read from this figure. For instance, in Political independence Yemen has an intermediate mean across its spaces, but those spaces appear to be very heterogeneous. One may therefore conclude that in Yemen the regulatory spaces covered by the agencies tend to have either a very high or a low value of Political independence.
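Because some countries have very few spaces, and because an agency covering several sectors contributes identical copies of its score (which pulls the SD towards zero), it may help to carry the number of spaces along with each summary before interpreting it. This is a minimal sketch on made-up data; toy.cv, toy.ch and the cut-off min.spaces are hypothetical, and any real threshold is a choice for the researcher. The .groups argument of summarize() requires dplyr 1.0.0 or later.

```r
library(dplyr)

# Made-up space-level scores; the two Y rows mimic one agency
# copied over two sectors, as produced by the expansion into ra.cv
toy.cv <- tibble(
  Country   = c("X", "X", "X", "X", "Y", "Y"),
  Dimension = "Political independence",
  score     = c(1.2, -0.8, 0.3, -1.1, 0.5, 0.5)
)

min.spaces <- 3  # hypothetical minimum number of spaces per summary

toy.ch <- toy.cv %>%
  group_by(Country, Dimension) %>%
  summarize(Mean = mean(score), SD = sd(score), n = n(), .groups = "drop") %>%
  # Keep only country/dimension summaries backed by enough spaces
  filter(n >= min.spaces)

toy.ch
```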
Sys.time()
## [1] "2021-03-15 11:48:28 CET"