ATE.ERROR.XY: Estimating Average Treatment Effect with Measurement Error in X and Misclassification in Y

Loading the Simulated Data

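First, we load the simulated dataset shipped with the package. It contains the observed outcome Y_star, which may be misclassified, and the observed covariate X_star, which is subject to measurement error. It also contains their error-free counterparts Y and X, the treatment indicator T, and the covariate Z.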

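library(ATE.ERROR)
set.seed(1)
data(Simulated_data)
Y_star <- Simulated_data$Y_star  # observed, possibly misclassified outcome
Y <- Simulated_data$Y            # true outcome (used only for the true ATE)
A <- Simulated_data$T            # treatment indicator
Z <- Simulated_data$Z            # covariate Z
X_star <- Simulated_data$X_star  # error-prone covariate
X <- Simulated_data$X            # true covariate (used only for the true ATE)
p11 <- 0.8                       # misclassification probability p11
p10 <- 0.2                       # misclassification probability p10
sigma_epsilon <- 0.1             # measurement error variance
B <- 100                         # number of simulated datasets per lambda value
Lambda <- seq(0, 2, by = 0.5)    # grid of lambda values
bootstrap_number <- 10           # number of bootstrap samples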

In this section, we load the package and the data. The misclassification probabilities p11 and p10 are set to 0.8 and 0.2, respectively. We also set the measurement error variance sigma_epsilon, the number of simulated datasets generated for each lambda value (B), the grid of lambda values that controls how much extra measurement error is added in the simulation step (Lambda), and the number of bootstrap samples (bootstrap_number).
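To make the roles of p11, p10, and sigma_epsilon more concrete, the sketch below generates error-prone versions of a binary outcome and a continuous covariate under a common error model. It rests on assumptions not stated in this vignette: that p11 = P(Y_star = 1 | Y = 1) and p10 = P(Y_star = 1 | Y = 0), and that X_star = X + e with e normal, mean zero, and variance sigma_epsilon (following the description of sigma_epsilon as a variance). The helpers add_misclassification and add_measurement_error are hypothetical and are not part of ATE.ERROR. Because the sketch calls set.seed(), run it in a separate R session so it does not alter the random-number stream used by the calls that follow.

```r
# Hypothetical helpers illustrating an ASSUMED error model (not ATE.ERROR code).

# Misclassify a binary outcome, assuming P(Y* = 1 | Y = 1) = p11 and P(Y* = 1 | Y = 0) = p10
add_misclassification <- function(y, p11, p10) {
  prob_one <- ifelse(y == 1, p11, p10)
  rbinom(length(y), size = 1, prob = prob_one)
}

# Additive measurement error, assuming sigma2 is the error VARIANCE
add_measurement_error <- function(x, sigma2) {
  x + rnorm(length(x), mean = 0, sd = sqrt(sigma2))
}

set.seed(123)
n <- 100000
y_true <- rbinom(n, size = 1, prob = 0.5)
x_true <- rnorm(n)
y_obs <- add_misclassification(y_true, p11 = 0.8, p10 = 0.2)
x_obs <- add_measurement_error(x_true, sigma2 = 0.1)

# Empirical misclassification rates should be close to p11 and p10
sens_hat <- mean(y_obs[y_true == 1])  # estimates P(Y* = 1 | Y = 1)
fpr_hat  <- mean(y_obs[y_true == 0])  # estimates P(Y* = 1 | Y = 0)

# Empirical error variance should be close to sigma2
err_var_hat <- var(x_obs - x_true)
```

Under this model, roughly 20% of true 1s are recorded as 0 and 20% of true 0s are recorded as 1, which is the kind of distortion ATE.ERROR.XY is designed to correct.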

Applying the ATE.ERROR.XY Function

We apply the ATE.ERROR.XY function with three types of extrapolation function: linear, quadratic, and nonlinear. The result of each call is stored in its own variable.

ATE.ERROR.XY_results_linear <- ATE.ERROR.XY(Y_star, A, Z, X_star, p11, p10, sigma_epsilon, 
                                            B, Lambda, extrapolation = "linear", 
                                            bootstrap_number)

ATE.ERROR.XY_results_quadratic <- ATE.ERROR.XY(Y_star, A, Z, X_star, p11, p10, sigma_epsilon, 
                                               B, Lambda, extrapolation = "quadratic", 
                                               bootstrap_number)

ATE.ERROR.XY_results_nonlinear <- ATE.ERROR.XY(Y_star, A, Z, X_star, p11, p10, sigma_epsilon, 
                                               B, Lambda, extrapolation = "nonlinear", 
                                               bootstrap_number)
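The Lambda, B, and extrapolation arguments point to a simulation-extrapolation (SIMEX) style correction for the error in X_star. In that approach, extra measurement error is added at increasing levels lambda, the estimate is recomputed and averaged over B replicates at each level, and a curve fitted to these averages is extrapolated back to lambda = -1, the hypothetical no-error case. The sketch below applies this generic recipe to the slope of a simple linear regression using base R only. It illustrates the idea and is not the ATE.ERROR.XY implementation, whose details (including its handling of misclassification in Y) may differ. Only linear and quadratic extrapolants are shown. In the SIMEX literature, a "nonlinear" extrapolant is commonly the rational form a + b / (c + lambda), which requires an iterative fit such as nls(). The names (sigma2_u, B_toy, lambda_grid, and so on) are illustrative, and because the code calls set.seed() it is best run in a separate session.

```r
# Toy SIMEX correction of a regression slope (illustration only, not ATE.ERROR.XY internals)
set.seed(2024)
n <- 5000
sigma2_u <- 0.5                          # known measurement error variance (toy value)
x <- rnorm(n)                            # true covariate, variance 1
y <- 2 * x + rnorm(n, sd = 0.5)          # outcome; true slope is 2
w <- x + rnorm(n, sd = sqrt(sigma2_u))   # error-prone version of x

# OLS slope of outcome on covariate
ols_slope <- function(outcome, covariate) cov(outcome, covariate) / var(covariate)

naive_slope <- ols_slope(y, w)           # attenuated towards 0

# Simulation step: add extra error with variance lambda * sigma2_u and
# average the re-estimated slope over B_toy replicates for each lambda
lambda_grid <- seq(0, 2, by = 0.5)
B_toy <- 50
theta_lambda <- sapply(lambda_grid, function(lam) {
  mean(replicate(B_toy, ols_slope(y, w + rnorm(n, sd = sqrt(lam * sigma2_u)))))
})

# Extrapolation step: fit a polynomial in lambda and predict at lambda = -1
extrapolate <- function(degree) {
  fit <- lm(theta_lambda ~ poly(lambda_grid, degree = degree, raw = TRUE))
  unname(predict(fit, newdata = data.frame(lambda_grid = -1)))
}
simex_linear    <- extrapolate(1)
simex_quadratic <- extrapolate(2)
```

With these settings the expected naive slope is about 2 / 1.5 ≈ 1.33. The linear and quadratic extrapolants move the estimate toward the true value of 2 (to roughly 1.56 and 1.80 in expectation), but neither removes the bias completely. This is one reason the choice of extrapolation function matters.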

Combining Summaries

The summaries from the three extrapolation methods are stacked into one table, to which the true ATE is added in the next step.

combined_summary <- rbind(
  ATE.ERROR.XY_results_linear$summary,
  ATE.ERROR.XY_results_quadratic$summary,
  ATE.ERROR.XY_results_nonlinear$summary)

Adding the True ATE to the Result Summary

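We compute the true ATE from the error-free data (Y, X) with True_Estimation and the naive ATE from the error-prone data (Y_star, X_star) with Naive_Estimation. The true ATE is then added as the first column of the summary. The naive estimate already appears in the summary returned by ATE.ERROR.XY, and Naive_ATE_XY is used again as a reference line in the boxplot below: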

True_ATE <- True_Estimation(Y, A, Z, X)

Naive_ATE_XY <- Naive_Estimation(Y_star, A, Z, X_star)

combined_summary <- data.frame(True_ATE = round(True_ATE, 3), combined_summary)
print(combined_summary)
#>   True_ATE Naive_ATE_XY Sigma_epsilon p10 p11 Extrapolation   ATE    SE             CI
#> 1    0.162        0.092           0.1 0.2 0.8        linear 0.149 0.004 (0.144, 0.157)
#> 2    0.162        0.092           0.1 0.2 0.8     quadratic 0.153 0.012 (0.135, 0.168)
#> 3    0.162        0.092           0.1 0.2 0.8     nonlinear 0.149 0.002 (0.147, 0.153)

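This table summarizes the results of the ATE.ERROR.XY function for each extrapolation method. Each row reports the true ATE, the naive ATE, the measurement error variance sigma_epsilon, the misclassification probabilities p10 and p11, the type of extrapolation, the corrected ATE estimate, its standard error (SE), and its 95% confidence interval (CI). All three corrected estimates (0.149 to 0.153) lie much closer to the true ATE (0.162) than the naive estimate (0.092) does.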
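The comparison in the table can be made quantitative. The sketch below uses the numbers from the printed summary to compute the bias of each estimate relative to the true ATE, the share of the naive bias removed by each correction, and whether each 95% interval contains the true ATE. It assumes the CI column is a character string of the form "(lower, upper)", as it prints. The variable names are illustrative.

```r
# Values copied from the printed combined_summary table
true_ate  <- 0.162
naive_ate <- 0.092
corrected <- c(linear = 0.149, quadratic = 0.153, nonlinear = 0.149)
ci_text   <- c(linear = "(0.144, 0.157)", quadratic = "(0.135, 0.168)",
               nonlinear = "(0.147, 0.153)")

# Bias of each estimate relative to the true ATE
naive_bias     <- naive_ate - true_ate
corrected_bias <- corrected - true_ate

# Share of the naive bias removed by each correction
bias_reduction <- 1 - abs(corrected_bias) / abs(naive_bias)

# Parse "(lower, upper)" strings into numeric bounds (one row per method)
ci_bounds <- t(sapply(strsplit(gsub("[()]", "", ci_text), ","), as.numeric))

# Does each interval contain the true ATE?
covers_true <- ci_bounds[, 1] <= true_ate & true_ate <= ci_bounds[, 2]
```

On these numbers, the corrections remove roughly 81% to 87% of the naive bias of -0.070, and only the quadratic interval contains the true ATE. With the actual object, the same quantities can be computed from combined_summary$ATE and combined_summary$True_ATE, assuming those columns are numeric as printed.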

Visualizing the Distribution of ATE Estimates Using a Boxplot

We combine the data underlying the boxplot returned by each ATE.ERROR.XY call and draw a single boxplot of the ATE estimates for the three extrapolation methods, with the naive and true ATEs shown as dashed reference lines.

library(ggplot2)  # provides ggplot(), geom_boxplot(), geom_hline(), and the theme functions

combined_data <- rbind(
  ATE.ERROR.XY_results_linear$boxplot$data,
  ATE.ERROR.XY_results_quadratic$boxplot$data,
  ATE.ERROR.XY_results_nonlinear$boxplot$data
)

combined_plot <- ggplot(combined_data, aes(x = Method, y = ATE, fill = Method)) +
  geom_boxplot() +
  geom_hline(aes(yintercept = Naive_ATE_XY, color = "naive estimate"), 
             linetype = "dashed") +
  geom_hline(aes(yintercept = True_ATE, color = "true estimate"), 
             linetype = "dashed") +
  scale_color_manual(name = NULL, values = c("naive estimate" = "red", 
                                             "true estimate" = "blue")) +
  labs(title = "ATE Estimates from the ATE.ERROR.XY Method with Different\nApproximations of the Extrapolation Function",
       y = "ATE Estimate") +
  theme_minimal() +
  theme(legend.position = "right") +
  guides(fill = guide_legend(title = NULL, order = 1),
         color = guide_legend(title = NULL, override.aes = list(linetype = "dashed"),
                              order = 2))

print(combined_plot)
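A numeric summary of the same data can complement the boxplot. The sketch below assumes, as the plotting code does, that combined_data has a Method column and an ATE column. summarize_by_method is a hypothetical helper, and the small data frame used to exercise it is toy input, not package output.

```r
# Hypothetical helper: median and interquartile range of the ATE estimates per method
summarize_by_method <- function(d) {
  stats <- aggregate(ATE ~ Method, data = d,
                     FUN = function(v) c(median = median(v), IQR = IQR(v)))
  # aggregate() stores the two statistics in a matrix column; flatten it
  data.frame(Method = stats$Method, stats$ATE)
}

# Toy input with the same two columns as combined_data
toy <- data.frame(Method = rep(c("linear", "quadratic"), each = 3),
                  ATE = c(0.14, 0.15, 0.16, 0.12, 0.15, 0.18))
toy_summary <- summarize_by_method(toy)
```

Applied to the vignette's data, summarize_by_method(combined_data) returns one row per extrapolation method, with the median and IQR corresponding to the centre line and box height in the plot.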

Explanation

Naive Estimation: The Naive_Estimation function calculates the naive estimate of the ATE without correcting for measurement error or misclassification.
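ATE.ERROR.XY Function: The ATE.ERROR.XY function corrects for both measurement error in X_star and misclassification in Y_star. In this example, its estimates (0.149 to 0.153) are much closer to the true ATE (0.162) than the naive estimate (0.092) is.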
Summary Table: For each extrapolation method, the summary table reports the true and naive ATEs, the error parameters (sigma_epsilon, p10, p11), the corrected ATE estimate, its standard error, and its 95% confidence interval.
Boxplot: The boxplot visualizes the distribution of ATE estimates using different extrapolation methods, with dashed lines indicating the naive and true estimates.

This vignette showed how to apply the ATE.ERROR.XY function, interpret its summary, and visualize the resulting ATE estimates. On the simulated data, all three extrapolation methods substantially reduce the bias of the naive estimate.