Extending srvyr

Greg Freedman Ellis

2024-08-19

## Loading required package: convey
## Loading required package: laeken

I don’t expect this vignette to be help for most srvyr users, it is instead intended for other package developers. An exciting new feature that is easier now that I have reworked srvyr’s non-standard evaluation to match dplyr 0.7+ is that it is now possible for non-srvyr functions to be called from within summarize. This vignette describes some of the inner-workings of summarize so that others can extend srvyr. This is kind of a fiddly part of srvyr, and I don’t expect that many people will want or need to understand it, so this guide is mostly aimed at package authors who already have an understanding of how survey objects work. If you’d like more explanation, please let me know on github!

This guide has also been rewritten for srvyr 1.0, as I had to rework summarize and was unable to maintain backwards compatibility.

Translating from survey to srvyr

srvyr implements the “survey statistics” functions from the survey package. Some examples are the svymean, svytotal, svyciprop, svyquantile and svyratio all return a svystat object which usually prints out the estimate and its standard error and other estimates of the variance can be calculated from it. In srvyr, these estimates are created inside of a summarize call and the variance estimates are specified at the same time.

The combination of srvyr’s group_by and summarize is analogous to the svyby function that performs one of the survey statistic function and performs it on multiple groups. However, as of srvyr 1.0, srvyr no longer uses svyby, instead the survey object is split into each group’s

What summarize expects

srvyr’s summarize expects that the survey statistics functions will return objects that are formatted in a particular way. Below, I’ll explain some of the functions that will help create these objects for you in most cases, but the return should be:

Helper functions exported by srvyr

srvyr now exports several functions that can help convert functions designed for the survey package to this format.

Note that these functions may not work in all cases. In srvyr, I’ve actually had to write multiple versions of get_var_est() because of minor differences in the way survey objects are returned. Hopefully they will help in most situations, or at least give you a good place to start.

Miscellaneous conventions

Two less important conventions that srvyr functions follow are:

  1. snake_case function names (to better match the tidyverse)
  2. Multiple choice arguments that default to the first (so for var_type, if no parameters are specified, use only “se” not all of them).

Example: convey::svygini -> survey_gini

That was just a lot of text, but I think it’s probably easiest just to provide an example. The convey package provides several methods for analysis of inequality using survey data. The svygini function calculates the gini coefficient. Here, we’ll write functions that make a srvyr version survey_gini.

# S3 generic function
survey_gini <- function(
  x, na.rm = FALSE, vartype = c("se", "ci", "var", "cv"), ...
) {
  if (missing(vartype)) vartype <- "se"
  vartype <- match.arg(vartype, several.ok = TRUE)
  .svy <- srvyr::set_survey_vars(srvyr::cur_svy(), x)
  
  out <- convey::svygini(~`__SRVYR_TEMP_VAR__`, na.rm = na.rm, design = .svy)
  out <- srvyr::get_var_est(out, vartype)
  as_srvyr_result_df(out)
}

And here’s what this function looks like in practice:

# Example from ?convey::svygini
suppressPackageStartupMessages({
  library(srvyr)
  library(survey)
  library(convey)
  library(laeken)
})
data(eusilc) ; names( eusilc ) <- tolower( names( eusilc ) )

# Setup for survey package
des_eusilc <- svydesign(
  ids = ~rb030, 
  strata = ~db040,  
  weights = ~rb050, 
  data = eusilc
)
des_eusilc <- convey_prep(des_eusilc)

# Setup for srvyr package
srvyr_eusilc <- eusilc %>% 
  as_survey(
    ids = rb030,
    strata = db040,
    weights = rb050
  ) %>%
  convey_prep()

## Ungrouped
# Calculate ungrouped for survey package
svygini(~eqincome, design = des_eusilc)
#>             gini     SE
#> eqincome 0.26497 0.0019

# Use new function from summarize
srvyr_eusilc %>% 
  summarize(eqincome = survey_gini(eqincome))
#> # A tibble: 1 × 2
#>   eqincome eqincome_se
#>      <dbl>       <dbl>
#> 1    0.265     0.00195

## Groups
# Calculate by groups for survey
survey::svyby(~eqincome, ~rb090, des_eusilc, convey::svygini)
#>         rb090  eqincome          se
#> male     male 0.2578983 0.002617279
#> female female 0.2702080 0.002892713

# Use new function from summarize
srvyr_eusilc %>% 
  group_by(rb090) %>%
  summarize(eqincome = survey_gini(eqincome))
#> # A tibble: 2 × 3
#>   rb090  eqincome eqincome_se
#>   <fct>     <dbl>       <dbl>
#> 1 male      0.258     0.00262
#> 2 female    0.270     0.00289