On replication material in political science: challenges, opportunities, limits
Author: Xavier Fernández-i-Marín
May 6, 2021 - 9 minutes
Open science · PolicyPortfolios · Replication within science
One of the key principles of science is replicability, which in essence means that you must give enough detail in your final report about the concrete procedure you used to arrive at your conclusions. This is a mere technical requirement, but a very powerful one. But science and replicability can also carry an ethical commitment, and this is where open science comes in. It means that you not only ensure replicability from a technical point of view, but also commit to presenting your work as openly as possible and to lowering any barrier to its diffusion as much as possible. This is a very similar scenario to the software world, where you have open source software (a technical discussion) and then you have free software (a technical plus ethical approach). Although I am a member of the Open Science Center at the LMU, this post is not about it, but only about the concrete meaning of replication today, its potential, and also its limits.
Cases
Curiously, I have had three examples of replicability in the last two months, and by reflecting upon them I would like to share some thoughts, because the concrete implementation of replicability is not always easy or clear.
Comparative politics: Carbon pricing
With Yves Steinebach and Christian Aschenbrenner we published an article in Climate Policy on the likelihood of countries adopting different forms of carbon pricing policies. It took a while, but we recently made the code and the data available. The code is contained inside a full report, and the data shared is the original dataset that we compiled. What we do not share is the original source data on several country variables, only their cleaned and arranged working values.
Survey data: bureaucratic discrimination
With Christian Adam, Oliver James, Anita Manatschal, Carolin Rapp and Eva Thomann we recently published a piece in the Journal of European Public Policy. The replication material includes the full set of code used to go from the original data source provided by YouGov (responsible for the fieldwork of the survey) to all the final output. The original data contains individual-level data about prioritization and discrimination, and although it is anonymized, we would probably have had to anonymize it even further before sharing it. So in addition to the details about the procedure given in the article, we also provide the full code necessary to move from the original data to the final results, including all cleaning and transformations of the data.
Comparative public policy: policy diversity
Also very recently, with Christoph Knill and Yves Steinebach we have published an article in the American Political Science Review.
The journal is very strict on the replication material and requires it to be deposited at the Harvard Dataverse of the journal (although to work with it, it is better to get it from the first link on my personal webpage than from the Dataverse).
In this case, we opted to make it as reproducible as possible.
All. From the very beginning.
We made available the whole data collection code, and all the transformations and cleaning steps that took us to the final report.
Plus, we created an R package, PolicyPortfolios, to also distribute the data and several functions to make working with policy portfolios easier.
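As a rough illustration of how this lowers the barrier to entry, the sketch below installs the package from CRAN and lists what it ships; the only name I am assuming here is the package itself, the datasets and functions are discovered at run time rather than hard-coded.

# PolicyPortfolios is on CRAN, so the data and the helper functions travel
# together with the paper's replication material.
install.packages("PolicyPortfolios")
library(PolicyPortfolios)

# Discover what the package bundles, without assuming any particular names:
data(package = "PolicyPortfolios")   # datasets shipped with the package
ls("package:PolicyPortfolios")       # functions the package exports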
Is it a matter of sharing the code?
Sharing the code that has allowed you to produce your results is good, no doubt. It allows others to check for mistakes, it ensures transparency, and it makes you accountable. But it does not equate to replicability.
Take for instance the case of the natural sciences and experiments in the lab: the brand of the equipment that you use, the brand of the supplies, the way you clean, the way you process your samples… All of this is usually reported in the papers, because reproducibility relies on it, but these kinds of details are usually left out in the social sciences. This sort of micro-decision is not always reported. For instance, we do not always report what we do with missing data. That is my personal battle, by the way. The issue is that these micro-decisions, sometimes so subtle that we do not even notice them ourselves, can make a difference.
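To make this less abstract, here is a toy sketch (simulated data, nothing to do with any of the papers above) of how a single unreported micro-decision about missing values can move an estimate:

set.seed(42)
n <- 1000
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)
full <- coef(lm(y ~ x))["x"]               # slope estimated on the complete data

# Suppose observations with a high y are more likely to go unreported
x_obs <- ifelse(runif(n) < plogis(2 * y - 1), NA, x)
listwise <- coef(lm(y ~ x_obs))["x_obs"]   # lm() silently drops the NA rows

c(full = unname(full), listwise = unname(listwise))
# The two slopes typically differ, yet a paper would often just report "the"
# coefficient without saying how the missing values were handled.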
About the code, let me also say that sharing it is excellent as a learning tool, both for you as a scientist (you need to clean it up, which forces you to use good programming habits and avoid quick but dirty solutions) and for other people, who can learn, improve, and build on top of it (which is, by the way, an idea very close to the free software approach mentioned earlier).
Is it a matter of sharing the data?
The point, for replication, is that, much like in an experiment, the source data should not be strictly necessary to ensure replicability.
We have detailed the whole process. So, in the extreme, providing the full original data would only mean that anyone can take the data, process it using our code, and get the same results. But that is not the point of replicability in science. That is transparency. Replicability, the element that makes science move forward, is to ensure that someone else can repeat the procedure and arrive at the same substantive conclusions. But if we equate replicability with technical reproducibility, we are doing something counterproductive for science. “You can replicate my results because all the data and the code are available.” Well, yes, I can reproduce your results. But replicability means that I must be able to follow the instructions and arrive at the same place, not that by using your data and your code I simply reproduce what you got.
So, all in all, I am not fully convinced that sharing the whole data is actually good for science. Certainly, it is transparent, and it is also good in the sense that reproducibility is ensured. But it may discourage replicability in the end, and as a scientist I want as many other people as possible to do what I have done (follow my instructions, not run my code with my data) and prove me wrong. For instance, in the bureaucratic discrimination paper at JEPP, I want other people to use another survey firm to do the same, and see how robust the results are to the sampling procedure. That is replicability, and that is the way science has to advance. Plus, I want them to process the data in their own way, with their own code, making their own, different micro-decisions. Do we get the same results? Excellent. The findings are robust, the conclusions are reinforced, and the margin of error (what makes us scientists) shrinks. Do we not get the same results? Also excellent: there is something we need to understand here, because the results do not hold under different circumstances. So let's investigate under which conditions they hold. We can advance with this, too. (Of course, if the reason for not arriving at the same conclusions is a mistake, and not the procedure or the replicability itself, that is also valid, although embarrassing.)
Limits
As mentioned before, the replication material for the APSR article is complete. Full. Absolute. But even in this case, nothing ensures that replication is possible (technically speaking, obtaining 100% the same values), for two reasons:
The first one is that in the process of data collection we used the WDI package, which allows R to connect directly to the World Bank API and retrieve the data. It turns out, however, that nothing guarantees that if you call the same function in the future, the same values will be returned. Official statistics are constantly revised, and last year's GDP per capita for a country that you download today may not be the same value if you download it in five years, because there is a constant process of revision of statistical data.
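To make the point concrete, here is a rough sketch of the kind of call involved (the indicator code is just the World Bank's GDP per capita series, chosen for illustration) and of one way to freeze today's values instead of relying on the live API:

library(WDI)

# Download GDP per capita (constant US$) for all countries, 2000-2020.
# Running this again in five years may return different numbers for the
# very same country-years, because the series gets revised.
gdp <- WDI(country = "all", indicator = "NY.GDP.PCAP.KD",
           start = 2000, end = 2020)

# One way out: snapshot today's values and ship the snapshot with the
# replication material, instead of relying on the live API.
saveRDS(gdp, "gdp_wdi_snapshot_2021-05-06.rds")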
So even in the extreme case of observational data, which is the norm in comparative politics, there is no such thing as an immutable dataset.
While with experimental or survey data you can always sample again or perform the experiment again, with observational data sharing the whole dataset makes more sense; but even in this case, the revision of the data plus the micro-decisions can make a difference.
The second one is that software changes, evolves, and default options may no longer be the same in future versions.
Using Bayesian inference methods and MCMC involves a stochastic process that relies heavily on the computer’s capacity to produce random (or pseudo-random) numbers.
While setting seeds helps reproduce the stochastic part, nothing guarantees that computers will use exactly the same default algorithm to produce random numbers in the future, especially considering that generating good random numbers is one of the most active fields of research in computational science and also a key component of blockchain technologies and security.
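In R, for example, the seed only pins down the draws for a given set of generators, and the defaults themselves have changed before (R 3.6.0 changed the default behaviour of sample()), so a cautious sketch fixes both explicitly:

RNGkind()    # which generators are currently in use (uniform, normal, sample)

# The seed reproduces the draws only under these particular generators.
set.seed(20210506)
rnorm(3)

# Being explicit about the generators protects against future changes in the
# defaults (as happened with sample() in R 3.6.0).
set.seed(20210506, kind = "Mersenne-Twister",
         normal.kind = "Inversion", sample.kind = "Rejection")
rnorm(3)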
By the way, and related to this, I find it very necessary to always provide the output of sessionInfo() in R, with the details of the software (version and linked matrix algebra libraries) and of its packages (versions and which ones are loaded).
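In practice this is a one-liner at the end of every analysis script, something along these lines:

# Dump the exact software environment next to the results: R version,
# platform, BLAS/LAPACK libraries, and the versions of all attached and
# loaded packages.
writeLines(capture.output(sessionInfo()), "sessionInfo.txt")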
So with these two caveats in mind, the next thing to do would have been to a) collect the data from the World Bank without relying on their API, or at least snapshot the values as they are today, and b) produce a Docker image with the current versions of the software involved. That is on my to-do list for the future, even though I already maintain a Docker image for Bayesian analysis.
Conclusion
I am still ethically committed to open science and will try to make my research as available as possible. But more and more, I fear that the move towards reproducibility will distract us: by equating it with replicability, and through laziness, we will end up simply accepting that because the data is available and the code is reproducible, that is it, and we do not have to check it independently by ourselves. By processing the data in a different way, by making different micro-decisions. By, in the end, stress-testing our procedures so that the conclusions are robust. So I guess that my next move is to lead by example and try to replicate someone else's results using my own ways, even if that is hardly sellable to journals. But that is another story for another day.