The new essurvey 1.0.3 is here! This release is mainly about downloading weight data from the European Social Survey (ESS), a feature that has been in the works since 2017! As usual, you can install from CRAN or GitHub with:
# From CRAN
install.packages("essurvey")
# or the development version from GitHub
devtools::install_github("ropensci/essurvey")
# and load
library(essurvey)
set_email("your@email.com")
Remember to set your registered email with set_email to download ESS data. This is as easy as running set_email("your@email.com") with your own email.
The package now has two main functions to download weight data (called SDDF by the ESS): show_sddf_cntrounds and import_sddf_country. The first one returns the rounds for which a specific country has weight data available. For example, for which rounds does Italy have weight data?
ita_rnds <- show_sddf_cntrounds("Italy")
ita_rnds
## [1] 6 8
How about Germany?
show_sddf_cntrounds("Germany")
## [1] 1 2 3 4 5 6 7 8
For some rounds, some countries used simple random sampling, so they didn’t need any weight data for correct estimation. Italy did not use a simple random sample for round 8, so let’s focus on that round for the example. To actually download the weight data for this round, we use import_sddf_country:
# Download weight data
ita_dt <- import_sddf_country("Italy", 8)
ita_dt
## # A tibble: 2,626 x 10
##    name  essround edition proddate cntry  idno   psu domain stratum    prob
##    <chr>    <dbl> <chr>   <chr>    <chr> <dbl> <dbl>  <dbl>   <dbl>   <dbl>
##  1 ESS8…        8 1.2     11.02.2… IT        1 11029      2     658 1.01e-4
##  2 ESS8…        8 1.2     11.02.2… IT        2 11170      2     665 1.11e-4
##  3 ESS8…        8 1.2     11.02.2… IT        4 11127      2     660 1.03e-4
##  4 ESS8…        8 1.2     11.02.2… IT        5 10771      2     671 1.04e-4
##  5 ESS8…        8 1.2     11.02.2… IT        6 11148      2     666 1.06e-4
##  6 ESS8…        8 1.2     11.02.2… IT        9 11163      1     667 1.05e-4
##  7 ESS8…        8 1.2     11.02.2… IT       14 11183      1     657 1.06e-4
##  8 ESS8…        8 1.2     11.02.2… IT       15 11184      2     661 9.97e-5
##  9 ESS8…        8 1.2     11.02.2… IT       16 10928      2     652 1.01e-4
## 10 ESS8…        8 1.2     11.02.2… IT       22 11171      2     664 9.97e-5
## # … with 2,616 more rows
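If the prob column holds each respondent's inclusion probability (an assumption here; check the ESS weighting documentation for the round you are using), a design weight could be sketched as its inverse, rescaled to average 1. The probabilities below are made up for illustration and are not real ESS values:

```r
# Hypothetical inclusion probabilities, standing in for ita_dt$prob
prob <- c(0.0001, 0.0002, 0.0001, 0.0004)

# Raw design weight: inverse of the inclusion probability
raw_weight <- 1 / prob

# Rescale so the weights average 1 (equivalently, sum to the sample size)
design_weight <- raw_weight / mean(raw_weight)

design_weight
```

Respondents with a smaller chance of being selected end up with a larger weight, which is the intuition behind using this column in weighted estimates.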
Notice that the weight data has an idno column. This column can be used to match each respondent from each country to the main ESS data. This means that you can now do properly weighted analysis of the ESS data on the fly! How would we match the data for Italy, for example?
We download the main data:
library(dplyr)
# Download main data
ita_main <- import_country("Italy", 8)
And then merge it with the weight data:
# Let's keep only the important weight columns
ita_dt <- ita_dt %>% select(idno, psu, domain, stratum, prob)
# Merge main data and weight data
complete_data <- inner_join(ita_main, ita_dt, by = "idno")
## Warning: Column `idno` has different attributes on LHS and RHS of join
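Because inner_join silently drops respondents that appear in only one of the two tables, it can be worth checking for unmatched idno values before moving on. Here is a minimal sketch with dplyr's anti_join, using small made-up tables (main_toy and sddf_toy are hypothetical stand-ins for ita_main and ita_dt):

```r
library(dplyr)

# Made-up stand-ins for the main data and the weight (SDDF) data
main_toy <- tibble(idno = c(1, 2, 4, 5), cntry = "IT")
sddf_toy <- tibble(idno = c(1, 2, 4, 9), psu = c(11029, 11170, 11127, 11163))

# Rows present in both tables
matched <- inner_join(main_toy, sddf_toy, by = "idno")

# Respondents in the main data without weight data, and vice versa
unmatched_main <- anti_join(main_toy, sddf_toy, by = "idno")
unmatched_sddf <- anti_join(sddf_toy, main_toy, by = "idno")

nrow(matched)
unmatched_main$idno
unmatched_sddf$idno
```

If both anti_join results are empty, as the matching row counts suggest for Italy in round 8, the inner join has not dropped anyone.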
# There we have the matched data
complete_data %>%
  select(essround, idno, cntry, psu, stratum, prob)
## # A tibble: 2,626 x 6
##    essround  idno cntry   psu stratum      prob
##       <dbl> <dbl> <chr> <dbl>   <dbl>     <dbl>
##  1        8     1 IT    11029     658 0.000101
##  2        8     2 IT    11170     665 0.000111
##  3        8     4 IT    11127     660 0.000103
##  4        8     5 IT    10771     671 0.000104
##  5        8     6 IT    11148     666 0.000106
##  6        8     9 IT    11163     667 0.000105
##  7        8    14 IT    11183     657 0.000106
##  8        8    15 IT    11184     661 0.0000997
##  9        8    16 IT    10928     652 0.000101
## 10        8    22 IT    11171     664 0.0000997
## # … with 2,616 more rows
There we have the matched data! It can easily be piped to the survey package to perform properly weighted analysis of the ESS data. In fact, we’re thinking of developing an official ESS package to make analyzing ESS data very easy.
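As a rough sketch of that last step, the design columns could be passed to survey::svydesign like this. The toy data frame, its values and the trust variable are made up for illustration, and treating psu, stratum and prob as the cluster, stratum and inclusion-probability columns is an assumption to check against the ESS documentation for your country and round:

```r
library(survey)

# Made-up stand-in for complete_data; none of these values are real ESS data
toy <- data.frame(
  idno    = 1:8,
  psu     = c(1, 1, 2, 2, 3, 3, 4, 4),
  stratum = c(1, 1, 1, 1, 2, 2, 2, 2),
  prob    = c(0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2),
  trust   = c(5, 6, 4, 7, 3, 2, 4, 3)
)

# Declare the sampling design: clusters, strata and inclusion probabilities
des <- svydesign(ids = ~psu, strata = ~stratum, probs = ~prob, data = toy)

# Design-based estimate of the mean, with its standard error
est <- svymean(~trust, des)
est
```

With the real data you would pass complete_data instead of toy and pick an actual ESS variable. Strata that contain a single PSU may need extra handling, for example through the survey.lonely.psu option.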
Weight data (or SDDF data) is a bit tricky because not all country/round combinations come in the same file format (some are SPSS files, some are Stata files, etc.) or with the same structure (number of columns, column names, etc.). We would appreciate it if you could report any errors you find on GitHub, and we’ll try to take care of them as soon as possible.
Special thanks to phnk, djhurio and Stefan Zins for helping to push this forward.
Enjoy this new release!