Reading data from CIS into R

The Spanish Center for Sociological Research (CIS, Centro de Investigaciones Sociológicas) provides its data to researchers in SAS and SPSS formats. I will not get into the discussion today about the need to provide publicly funded data in open formats; I will only cover the technical workaround that R users need in order to import the datasets easily.

I have tried several times to use the read.spss.format() function from the package memisc, maintained by Martin Elff, but this function has problems with the CIS files.

My approach is to use another layer of software: PSPP. Yes, it is another complication, but PSPP works well and runs from the command line, which makes things easier.

In addition, I tend to convert the files to UTF-8 from the very beginning, so I work with the original files converted from ISO-8859-1 to UTF-8. The tool for this is recode.

So this is what I do in order to import CIS datasets into R.

I will use survey 2384, the post-electoral survey for the 2000 general elections.

Recode the original file

recode latin1..utf8 ES2384
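Note that recode converts the file in place, overwriting the original. As a minimal sketch of the same conversion that leaves the original untouched, the following uses iconv as a stand-in for recode. The helper name to_utf8 and the .utf8 suffix are hypothetical, chosen only for illustration, and are not part of the workflow described in this post.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): convert a Latin-1
# (ISO-8859-1) file to UTF-8, writing the result to a separate file so
# the original stays untouched. iconv is used here instead of recode.
to_utf8() {
    src="$1"
    # Default output name is an assumption: the source name plus ".utf8".
    dst="${2:-$1.utf8}"
    iconv -f ISO-8859-1 -t UTF-8 "$src" > "$dst"
}

# Example call (commented out so that sourcing this file has no side effects):
# to_utf8 ES2384 ES2384.utf8
```

If you go this way, the later steps would need to point at the converted file rather than at the original.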

Run PSPP to generate a SAV SPSS file

pspp ES2384

Load the SAV file from R

The old ES2384 text file is now a SAV file, so within R it can be read using read.spss() from the foreign package.

library(foreign)
d <- read.spss("ES2384", to.data.frame=TRUE)
