Custom statistics

2024-07-14

The normal tidystats workflow consists of running statistics functions such as lm(), saving the output into variables, and then using the add_stats() function to add the statistics to a list. This works as long as tidystats has built-in support for the statistics functions you ran. So what should you do when this is not the case?

The first thing to do is let me know that there’s a function you would like tidystats to support. There are various ways to contact me. The preferred method is to create an issue on the GitHub page, because that makes it easy to paste code snippets and ask follow-up questions. For alternative ways to contact me, see the tidystats website.

Of course, it’s not always possible to wait for me to add support for a function to tidystats, nor is it always possible for me to add it. The latter happens when the output of a function cannot be identified as belonging to a particular type of analysis. For example, confint() returns a plain matrix, which carries no information marking it as a matrix of confidence intervals.

For these reasons, it is useful to know that there are two possible solutions.

The first solution is that I can still add support for functions that return objects without sufficient information. You can tell add_stats() what kind of object it is using the class argument. This only works if I explicitly coded support for a specific class. You can see which classes are supported in the help document of add_stats() (?add_stats).

The second solution is to extract the statistics yourself and create an object to pass to add_stats(). I’ve created two helper functions for this: custom_stats() and custom_stat(). They work together to store the statistics in the format that tidystats needs.

custom_stats() has two arguments: method and statistics. The method should contain a description of the type of method you used. The statistics argument requires a vector of statistics created with the custom_stat() function.

The custom_stat() function creates a single statistic, along with the information needed to report it. At a minimum, it requires the name and value of the statistic. Optionally, you can also specify a symbol and a subscript so that the text editor add-ins can report the statistic correctly. Finally, the statistic may be a ranged statistic, i.e., one with a lower and upper bound. For these statistics you must specify the type of interval (e.g., "CI", "HDI"), the level (e.g., .95), and the lower and upper bound.
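As a minimal sketch of a ranged statistic, the code below stores a sample mean together with its 95% confidence interval. The simulated data, the variable names, and the "Mean" method label are made up for illustration; the interval, level, lower, and upper arguments are the ones described above.

```r
library(tidystats)

# Simulate a sample (illustrative data) and compute its mean and 95% CI
set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)
x_mean <- mean(x)
x_ci <- t.test(x, conf.level = 0.95)$conf.int

# Describe the statistic, including its interval
mean_stat <- custom_stat(
  name = "mean",
  value = x_mean,
  symbol = "M",
  interval = "CI",
  level = 0.95,
  lower = x_ci[1],
  upper = x_ci[2]
)

# Wrap it in a custom_stats() object and add it to a new list
mean_stats <- custom_stats(method = "Mean", statistics = mean_stat)
statistics <- add_stats(list(), mean_stats)
```

The interval bounds are computed with base R here; any other source of bounds (for example, a bootstrap) could be passed to lower and upper in the same way.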

Below I show a few examples of adding custom statistics.

Example 1: Using the class argument

Say we want to calculate the confidence intervals for several parameters in a linear model, using the confint() function.

# Load tidystats and the pipe operator (%>%)
library(tidystats)
library(magrittr)

# Run a linear model
fit <- lm(100 / mpg ~ disp, data = mtcars)

# Compute the confidence intervals
fit_confint <- confint(fit)

# Create an empty list
statistics <- list()

# Add linear model and confidence intervals to the list
statistics <- statistics %>%
  add_stats(fit) %>%
  add_stats(fit_confint)

Unfortunately, we get an error:

Error in UseMethod("tidy_stats") : no applicable method for 'tidy_stats' applied to an object of class "c('matrix', 'array', 'double', 'numeric')"

That’s because confint() returns a standard matrix, rather than an object specific to the confint() function. You can check this yourself by running the class() function on the output of confint() (e.g., class(fit_confint)). You’ll see that it simply says "matrix" "array". That’s not enough for tidystats to work with. Ideally it would say something like "confint", so that tidystats knows what it is working with and can extract the statistics.
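The check described above can be sketched with base R alone. The snippet refits the same model so that it runs on its own; no tidystats functions are involved.

```r
# Fit the same model as above and compute the confidence intervals
fit <- lm(100 / mpg ~ disp, data = mtcars)
fit_confint <- confint(fit)

# The result is a plain numeric matrix without a confint-specific class
class(fit_confint)
is.matrix(fit_confint)
inherits(fit_confint, "confint")

# List the attributes the matrix carries
names(attributes(fit_confint))
```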

Thankfully, we can still add statistics from confint() to a list via add_stats(), using the class argument. We can specify that the statistics are from the confint() function by saying class = "confint".

statistics <- statistics %>%
  add_stats(fit) %>%
  add_stats(fit_confint, class = "confint")

We don’t get an error this time, indicating that it worked.

Example 2: Using custom_stats()

Say you want to calculate a Bayes Factor using the BIC approach (Wagenmakers, 2007). An example of this approach can be found here, and I’ll repeat it below.

# Set seed for reproducibility
set.seed(14)

# Simulate some data
intercept_data <- data.frame(score = scale(rnorm(40), center = 0.72))

# Run two models and calculate the BIC
full_lm <- lm(score ~ 1, intercept_data)
null_lm <- lm(score ~ 0, intercept_data)

BF_BIC <- exp((BIC(null_lm) - BIC(full_lm)) / 2)

The Bayes Factor is 5.05. Now, how do you add this value to a tidystats list? If we try it the standard way, we’ll see that it fails.

# Load the tidystats package
library(tidystats)

# Create an empty list
statistics <- list()

# Add the Bayes Factor to the list using add_stats()
statistics <- add_stats(statistics, BF_BIC)

This produces an error message that says:

Error in UseMethod("tidy_stats") : no applicable method for 'tidy_stats' applied to an object of class "c('double', 'numeric')"

This is because BF_BIC is simply a number and not the output of a statistics function, so tidystats doesn’t know how to store it. Let’s fix this using custom_stats() and custom_stat().

# Create a list of custom statistics
BIC <- custom_stats(
  method = "BIC",
  statistics = custom_stat(
    name = "BIC Bayes Factor",
    value = BF_BIC,
    symbol = "BF",
    subscript = "10"
  )
)

# Add the statistics to the list
statistics <- add_stats(statistics, BIC)

Now we don’t get an error. Thanks to custom_stats() and custom_stat() we correctly structured the statistic so it can be added to the list via add_stats().
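To see what was stored, you can inspect the list with base R. This sketch repeats the steps above in condensed form so that it runs on its own; the name BIC_stats is only an illustrative choice, and the printed structure may differ between tidystats versions, so no output is shown.

```r
library(tidystats)

# Recreate the Bayes Factor from the example above
set.seed(14)
intercept_data <- data.frame(score = scale(rnorm(40), center = 0.72))
full_lm <- lm(score ~ 1, intercept_data)
null_lm <- lm(score ~ 0, intercept_data)
BF_BIC <- exp((BIC(null_lm) - BIC(full_lm)) / 2)

# Store it as a custom statistic and add it to a new list
BIC_stats <- custom_stats(
  method = "BIC",
  statistics = custom_stat(
    name = "BIC Bayes Factor",
    value = BF_BIC,
    symbol = "BF",
    subscript = "10"
  )
)
statistics <- add_stats(list(), BIC_stats)

# Inspect the structure of the resulting list
str(statistics)
```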

Summary

tidystats works by taking the output of statistical tests, extracting the statistics, and reorganizing them into a particular structure. This works if 1) tidystats has built-in support for the function and 2) the function used to run the statistical test returns an object that tidystats can use to identify the test.

If you want to use tidystats on a function that is not supported yet, please contact me to let me know that I should add support for it.

Alternatively, you can use custom_stats() and custom_stat() to manually create a list of statistics and supply it to the add_stats() function.

The goal is to have tidystats support as many tests as possible, so that you rarely have to resort to this solution.