To install the package, use:
library(devtools)
install_github("doomlab/ViSe")
This installation gives you the latest development version of the package, regardless of its status on CRAN.
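If you would rather use a released version (assuming one is available on CRAN), the standard installation also works:
install.packages("ViSe")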
library(ViSe)
library(cowplot)
library(ggplot2)
library(plotly)
#>
#> Attaching package: 'plotly'
#> The following object is masked from 'package:ggplot2':
#>
#> last_plot
#> The following object is masked from 'package:stats':
#>
#> filter
#> The following object is masked from 'package:graphics':
#>
#> layout
To demonstrate the use of visual sensitivity analysis, we use the effect of child maltreatment on the extent of mental health problems in terms of internalising and externalising behaviour. The study by Kisely and colleagues (Kisely et al., 2018) was based on a general population sample in Brisbane, Australia, and compared 3554 mother-child pairs without ‘substantiated child maltreatment’ to, for example, 73 pairs with child neglect. Note that the results vary across the different types of maltreatment assessed; we chose child neglect because its results (a smaller but still considerable lower confidence limit l after adjustment) are particularly suited to illustrating sensitivity analysis. Maltreatment was assessed ‘by linkage to state child protection agency data’. Internalising and externalising behaviours were measured using the Youth Self-Report (YSR) scale (Achenbach & Rescorla, 2001) at around age 21. The study reports unadjusted mean differences and mean differences adjusted for ‘gender, parental ethnicity, maternal age, family income, maternal relationship status, maternal education, youth income level, youth education level, youth marital status’ (likely based on ordinary least squares regression; the paper does not specify).
See the supplemental document at https://static.cambridge.org/content/id/urn:cambridge.org:id:article:S0007125018002076/resource/name/S0007125018002076sup001.docx
To obtain the estimates and two-tailed 95% confidence intervals on the effect size d scale (via \(d = M_{difference} / SD_{Total}\), similar to Glass’ \(\Delta\)), we divided the reported unadjusted and adjusted mean differences for internalising and externalising by the respective standard deviations of the total sample.
internal_unadjusted <- list(
  mean_diff = 3.68,
  lower_diff = 1.73,
  upper_diff = 5.62,
  sd = 8.29
)
internal_adjusted <- list(
  mean_diff = 2.73,
  lower_diff = 0.77,
  upper_diff = 4.69,
  sd = 8.28
)
external_unadjusted <- list(
  mean_diff = 3.72,
  lower_diff = 2.13,
  upper_diff = 5.32,
  sd = 6.81
)
external_adjusted <- list(
  mean_diff = 3.10,
  lower_diff = 1.49,
  upper_diff = 4.71,
  sd = 6.81
)
list_values <- list(internal_unadjusted, internal_adjusted,
                    external_unadjusted, external_adjusted)
list_names <- c("internal_unadjusted", "internal_adjusted",
                "external_unadjusted", "external_adjusted")
names(list_values) <- list_names
# convert each mean difference and its confidence limits to the d scale
for (i in list_names){
  list_values[[i]][["d"]] <- list_values[[i]][["mean_diff"]] / list_values[[i]][["sd"]]
  list_values[[i]][["lower_d"]] <- list_values[[i]][["lower_diff"]] / list_values[[i]][["sd"]]
  list_values[[i]][["upper_d"]] <- list_values[[i]][["upper_diff"]] / list_values[[i]][["sd"]]
}
unlist(list_values)
#> internal_unadjusted.mean_diff internal_unadjusted.lower_diff
#> 3.68000000 1.73000000
#> internal_unadjusted.upper_diff internal_unadjusted.sd
#> 5.62000000 8.29000000
#> internal_unadjusted.d internal_unadjusted.lower_d
#> 0.44390832 0.20868516
#> internal_unadjusted.upper_d internal_adjusted.mean_diff
#> 0.67792521 2.73000000
#> internal_adjusted.lower_diff internal_adjusted.upper_diff
#> 0.77000000 4.69000000
#> internal_adjusted.sd internal_adjusted.d
#> 8.28000000 0.32971014
#> internal_adjusted.lower_d internal_adjusted.upper_d
#> 0.09299517 0.56642512
#> external_unadjusted.mean_diff external_unadjusted.lower_diff
#> 3.72000000 2.13000000
#> external_unadjusted.upper_diff external_unadjusted.sd
#> 5.32000000 6.81000000
#> external_unadjusted.d external_unadjusted.lower_d
#> 0.54625551 0.31277533
#> external_unadjusted.upper_d external_adjusted.mean_diff
#> 0.78120411 3.10000000
#> external_adjusted.lower_diff external_adjusted.upper_diff
#> 1.49000000 4.71000000
#> external_adjusted.sd external_adjusted.d
#> 6.81000000 0.45521292
#> external_adjusted.lower_d external_adjusted.upper_d
#> 0.21879589 0.69162996
To visualise the lower confidence limit l, we must obtain the lower bound of the confidence interval for our proposed effect size. ViSe includes a function calculate_d() that calculates d from t.test() output, dataframes, individual vectors of data, sample statistics (M, SD, and n for each group), t-test values, or a pre-calculated d value. In this example, we have our d value from the original research, along with the sample sizes from the study, which can be used to calculate the two-tailed (dlow_central) or one-tailed (done_low_central) lower confidence limit. The package vignette shows all possible ways to calculate values from data and includes the noncentral confidence intervals for effect size d as well (below).
internal_adj_output <- calculate_d(
  d = list_values$internal_adjusted$d, # the adjusted d value
  a = .05, # alpha for the confidence interval
  lower = TRUE, # you expect d to be positive
  n1 = 71, # sample size group 1
  n2 = 3653 # sample size group 2
)
internal_adj_output$dlow_central
#> [1] 0.09465985
internal_adj_output$done_low_central
#> [1] 0.1324648
# note: the function also provides noncentral t confidence intervals
# in this case, they are unusable because d has been calculated from
# mean difference / SD of the total sample rather than
# mean difference / spooled, so the approximation of t and the
# noncentral limits is not appropriate
All of the input options for calculate_d() are demonstrated below:
# from dataframe
calculate_d(
  df = mtcars,
  x_col = "am",
  y_col = "hp",
  a = .05,
  lower = TRUE
)
#> $d
#> [1] -0.501465
#>
#> $dlow
#> [1] -1.2067
#>
#> $dhigh
#> [1] 0.2260463
#>
#> $dlow_central
#> [1] -1.247618
#>
#> $dhigh_central
#> [1] 0.2446883
#>
#> $done_low
#> [1] -1.091478
#>
#> $done_low_central
#> [1] -1.121567
#>
#> $M1
#> [1] 126.8462
#>
#> $sd1
#> [1] 84.06232
#>
#> $se1
#> [1] 23.31469
#>
#> $M1low
#> [1] 76.0478
#>
#> $M1high
#> [1] 177.6445
#>
#> $M2
#> [1] 160.2632
#>
#> $sd2
#> [1] 53.9082
#>
#> $se2
#> [1] 19.28522
#>
#> $M2low
#> [1] 119.7464
#>
#> $M2high
#> [1] 200.7799
#>
#> $spooled
#> [1] 67.60359
#>
#> $sepooled
#> [1] 24.33304
#>
#> $n1
#> [1] 13
#>
#> $n2
#> [1] 19
#>
#> $df
#> [1] 30
#>
#> $t
#> [1] -1.373318
#>
#> $p
#> [1] 0.1798309
#>
#> $estimate
#> [1] "$d_s$ = -0.50, 95\\% CI [-1.21, 0.23]"
#>
#> $statistic
#> [1] "$t$(30) = -1.37, $p$ = .180"
# from two columns
x <- mtcars$am
y <- mtcars$hp
calculate_d(
  x_col = x,
  y_col = y,
  a = .05,
  lower = TRUE
)
#> $d
#> [1] -0.501465
#>
#> $dlow
#> [1] -1.2067
#>
#> $dhigh
#> [1] 0.2260463
#>
#> $dlow_central
#> [1] -1.247618
#>
#> $dhigh_central
#> [1] 0.2446883
#>
#> $done_low
#> [1] -1.091478
#>
#> $done_low_central
#> [1] -1.121567
#>
#> $M1
#> [1] 126.8462
#>
#> $sd1
#> [1] 84.06232
#>
#> $se1
#> [1] 23.31469
#>
#> $M1low
#> [1] 76.0478
#>
#> $M1high
#> [1] 177.6445
#>
#> $M2
#> [1] 160.2632
#>
#> $sd2
#> [1] 53.9082
#>
#> $se2
#> [1] 19.28522
#>
#> $M2low
#> [1] 119.7464
#>
#> $M2high
#> [1] 200.7799
#>
#> $spooled
#> [1] 67.60359
#>
#> $sepooled
#> [1] 24.33304
#>
#> $n1
#> [1] 13
#>
#> $n2
#> [1] 19
#>
#> $df
#> [1] 30
#>
#> $t
#> [1] -1.373318
#>
#> $p
#> [1] 0.1798309
#>
#> $estimate
#> [1] "$d_s$ = -0.50, 95\\% CI [-1.21, 0.23]"
#>
#> $statistic
#> [1] "$t$(30) = -1.37, $p$ = .180"
# from summary statistics
calculate_d(m1 = 14.37, # neglect mean
            sd1 = 10.716, # neglect sd
            n1 = 71, # neglect n
            m2 = 10.69, # none mean
            sd2 = 8.219, # none sd
            n2 = 3653, # none n
            a = .05, # alpha/confidence interval
            lower = TRUE) # lower or upper bound
#> $d
#> [1] 0.4448249
#>
#> $dlow
#> [1] 0.2097233
#>
#> $dhigh
#> [1] 0.6798669
#>
#> $dlow_central
#> [1] 0.2096767
#>
#> $dhigh_central
#> [1] 0.6799731
#>
#> $done_low
#> [1] 0.2475166
#>
#> $done_low_central
#> [1] 0.2474974
#>
#> $M1
#> [1] 14.37
#>
#> $sd1
#> [1] 10.716
#>
#> $se1
#> [1] 1.271755
#>
#> $M1low
#> [1] 11.83356
#>
#> $M1high
#> [1] 16.90644
#>
#> $M2
#> [1] 10.69
#>
#> $sd2
#> [1] 8.219
#>
#> $se2
#> [1] 0.135986
#>
#> $M2low
#> [1] 10.42338
#>
#> $M2high
#> [1] 10.95662
#>
#> $spooled
#> [1] 8.272918
#>
#> $sepooled
#> [1] 0.9913101
#>
#> $n1
#> [1] 71
#>
#> $n2
#> [1] 3653
#>
#> $df
#> [1] 3722
#>
#> $t
#> [1] 3.712259
#>
#> $p
#> [1] 0.0002084243
#>
#> $estimate
#> [1] "$d_s$ = 0.44, 95\\% CI [0.21, 0.68]"
#>
#> $statistic
#> [1] "$t$(3722) = 3.71, $p$ < .001"
# from t-test model
output <- t.test(mtcars$hp ~ mtcars$am, var.equal = TRUE)
n_values <- tapply(mtcars$hp, mtcars$am, length)
calculate_d(
  model = output,
  n1 = unname(n_values[1]),
  n2 = unname(n_values[2]),
  a = .05,
  lower = TRUE
)
#> $d
#> [1] 0.501465
#>
#> $dlow
#> [1] -0.2260463
#>
#> $dhigh
#> [1] 1.2067
#>
#> $dlow_central
#> [1] -0.2446883
#>
#> $dhigh_central
#> [1] 1.247618
#>
#> $done_low
#> [1] -0.1109204
#>
#> $done_low_central
#> [1] -0.1186368
#>
#> $M1
#> NULL
#>
#> $sd1
#> NULL
#>
#> $se1
#> NULL
#>
#> $M1low
#> NULL
#>
#> $M1high
#> NULL
#>
#> $M2
#> NULL
#>
#> $sd2
#> NULL
#>
#> $se2
#> NULL
#>
#> $M2low
#> NULL
#>
#> $M2high
#> NULL
#>
#> $spooled
#> NULL
#>
#> $sepooled
#> NULL
#>
#> $n1
#> [1] 19
#>
#> $n2
#> [1] 13
#>
#> $df
#> [1] 30
#>
#> $t
#> [1] 1.373318
#>
#> $p
#> [1] 0.1798309
#>
#> $estimate
#> [1] "$d_s$ = 0.50, 95\\% CI [-0.23, 1.21]"
#>
#> $statistic
#> [1] "$t$(30) = 1.37, $p$ = .180"
# from t-values
calculate_d(
  t = 1.37,
  n1 = unname(n_values[1]),
  n2 = unname(n_values[2]),
  a = .05,
  lower = TRUE
)
#> $d
#> [1] 0.5002533
#>
#> $dlow
#> [1] -0.2271793
#>
#> $dhigh
#> [1] 1.205462
#>
#> $dlow_central
#> [1] -0.2458469
#>
#> $dhigh_central
#> [1] 1.246353
#>
#> $done_low
#> [1] -0.1120615
#>
#> $done_low_central
#> [1] -0.1198044
#>
#> $M1
#> NULL
#>
#> $sd1
#> NULL
#>
#> $se1
#> NULL
#>
#> $M1low
#> NULL
#>
#> $M1high
#> NULL
#>
#> $M2
#> NULL
#>
#> $sd2
#> NULL
#>
#> $se2
#> NULL
#>
#> $M2low
#> NULL
#>
#> $M2high
#> NULL
#>
#> $spooled
#> NULL
#>
#> $sepooled
#> NULL
#>
#> $n1
#> [1] 19
#>
#> $n2
#> [1] 13
#>
#> $df
#> [1] 30
#>
#> $t
#> [1] 1.37
#>
#> $p
#> [1] 0.1808537
#>
#> $estimate
#> [1] "$d_s$ = 0.50, 95\\% CI [-0.23, 1.21]"
#>
#> $statistic
#> [1] "$t$(30) = 1.37, $p$ = .181"
Not all research studies use d as the effect size of interest, and ViSe provides functionality to convert between effect sizes. The other_to_d() function can be used to convert the effect sizes f, \(f^2\), NNT, r, probability of superiority, U1, U2, U3, and proportional overlap into d.
All options of other_to_d():
other_to_d(
  f = NULL,
  f2 = NULL,
  nnt = NULL,
  r = NULL,
  prob = NULL,
  prop_u1 = NULL,
  prop_u2 = NULL,
  prop_u3 = NULL,
  prop_overlap = NULL
)
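As an example, the conversions can be round-tripped: converting the adjusted internalising d to a correlation with d_to_r() (demonstrated further below) and feeding that value back through other_to_d() should recover the original d. This is a minimal sketch, assuming other_to_d() inverts the same \(d = 2r/\sqrt{1 - r^2}\) relation that underlies d_to_r().
# convert the adjusted internalising d to r and back again
r_value <- d_to_r(d = list_values$internal_adjusted$d)
other_to_d(r = r_value) # should recover the original d of about 0.33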
The effect size d value can then be used to visualize all effects and their conversions at once with the visualize_effects() function. You can use the percent, color, and font family options to adjust the resulting graph for readability and color scheme.
visualize_effects(d = list_values$internal_adjusted$d,
                  circle_color = "lightblue",
                  circle_fill = "gray",
                  percent_color = "darkblue",
                  percent_size = 10,
                  text_color = "black",
                  font_family = "Times")
#> $graph
# note graphs look better scaled, try saving them
# ggsave(filename = "visualize_effects.png")
# you can make very ugly graphs if you want
visualize_effects(d = .2,
                  circle_color = "green",
                  circle_fill = "orange",
                  percent_color = "pink",
                  percent_size = 20,
                  text_color = "purple",
                  font_family = "Arial")
#> $graph
This package uses the following conversion functions, which can also be called separately:
d_to_r(d = list_values$internal_adjusted$d)
#> [1] 0.1626596
d_to_f2(d = list_values$internal_adjusted$d)
#> $f
#> [1] 0.1648551
#>
#> $f2
#> [1] 0.02717719
d_to_nnt(d = list_values$internal_adjusted$d)
#> [1] 5.424537
probability_superiority(d = list_values$internal_adjusted$d)
#> [1] 0.5921738
proportion_overlap(d = list_values$internal_adjusted$d)
#> $u1
#> [1] 0.2315626
#>
#> $u2
#> [1] 0.565471
#>
#> $u3
#> [1] 0.6291905
#>
#> $p_o
#> [1] 0.8690581
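As a quick sanity check, two of these conversions can be reproduced by hand, assuming ViSe uses the common formulas \(p = \Phi(d/\sqrt{2})\) for the probability of superiority and \(r = d / \sqrt{d^2 + 4}\) for the correlation:
# reproduce two of the conversions directly from the formulas
d <- list_values$internal_adjusted$d
pnorm(d / sqrt(2)) # probability of superiority
#> [1] 0.5921738
d / sqrt(d^2 + 4) # correlation r
#> [1] 0.1626596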
Now that we have an idea of our d value, the lower confidence limit l, and other potential effect sizes, we can create a sensitivity plot of the effect size to bias. To visualize different d values and their representation of the distribution overlap, use estimate_d() to examine different effect sizes:
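(A minimal sketch; we assume estimate_d() accepts the effect size as its d argument and returns a graph, in line with the other ViSe functions.)
# examine the distribution overlap implied by a d of .25
estimate_d(d = .25)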
Similarly, estimate_r() shows a scatterplot of the entered correlation coefficient to visualize the relation between two variables:
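(Again a minimal sketch, assuming estimate_r() takes the correlation as its r argument.)
# visualize the relation implied by a correlation of .3
estimate_r(r = .3)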
You can then combine these estimates in visualize_c_map() to create a graph that shows which combinations of effect size and r would indicate a causal effect. You can enter any effect size from our conversion options, and these will be converted to d with labels for inspection. The shaded area represents the region that would be considered the effect, and the points represent the entered combinations of r and effect size. Points in the shaded area would be considered sensitive to bias.
You can use the plotly library to hover over those dots and see their values (this functionality is also embedded in our Shiny app). Note: the interactive version is not run here to keep the package CRAN compatible (plotly makes HTML files large).
visual_c_mapped <- visualize_c_map(
  # your lower confidence limit (required)
  dlow = list_values$internal_adjusted$lower_d,
  # correlation values (required)
  r_values = c(.1, .4, .3),
  # other effect sizes you want to plot
  d_values = c(.2, .8),
  nnt_values = c(60),
  # if you think d will be positive
  lower = TRUE,
  # as many colors as the maximum number of effect sizes entered
  point_colors = c("red", "green", "blue"),
  # a size for the shapes
  size = 2,
  # shape 1
  shape_1 = 2,
  # shape 2: make these the same number if you
  # want the shapes overlapping; we think two
  # different shapes help readability
  shape_2 = 3,
  # color of the background highlighted area
  ribbon_color = "lightblue"
)
visual_c_mapped$graph
ggsave(filename = "visualize_c_map.png", width = 8,
height = 6, dpi = 300)
# ggplotly(visual_c_mapped$graph)
All options for the graph, including their defaults:
visualize_c_map(
  dlow,
  r_values,
  d_values = NULL,
  f_values = NULL,
  f2_values = NULL,
  nnt_values = NULL,
  prob_values = NULL,
  prop_u1_values = NULL,
  prop_u2_values = NULL,
  prop_u3_values = NULL,
  prop_overlap_values = NULL,
  lower = TRUE,
  point_colors = c("red", "green", "blue"),
  size = 2,
  shape_1 = 2,
  shape_2 = 3,
  ribbon_color = "lightblue"
)