This vignette recreates an analysis of Pixar ratings that can be found here.
library(pixarfilms)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
library(forcats)
library(ggplot2)
library(irr)
#> Loading required package: lpSolve
Before visualizing the data, let’s wrangle it into a tidy, long format that will make plotting easier later on.
df <-
  public_response %>%
  select(-cinema_score) %>%
  mutate(film = fct_inorder(film)) %>%
  pivot_longer(cols = c("rotten_tomatoes", "metacritic", "critics_choice"),
               names_to = "ratings",
               values_to = "value") %>%
  mutate(ratings = case_when(
    ratings == "metacritic" ~ "Metacritic",
    ratings == "rotten_tomatoes" ~ "Rotten Tomatoes",
    ratings == "critics_choice" ~ "Critics Choice"
  )) %>%
  drop_na()
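As a quick sanity check (a small sketch, not part of the original analysis), we can peek at the reshaped data to confirm each film now has one row per rating source:

# Inspect the first rows of the long-format data: film, ratings, value.
head(df)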
Their first plot compared the Pixar films’ ratings over time.
df %>%
  ggplot(aes(x = film, y = value, col = ratings)) +
  geom_point() +
  geom_line(aes(group = ratings)) +
  scale_color_brewer(palette = "Dark2") +
  labs(x = "Pixar film", y = "Rating value") +
  guides(col = guide_legend(title = "Ratings")) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5),
        legend.position = "bottom")
Verdict: people and critics generally agree that Cars 2 was not as good as the other Pixar films.
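To back that verdict with numbers (a sketch using the `df` built above, not shown in the original), we can pull out the Cars 2 rows directly:

# Ratings for Cars 2 from each critic group.
df %>%
  filter(film == "Cars 2")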
Next, let’s group the rating categories to see if there is consistency across them.
df %>%
  ggplot(aes(x = ratings, y = value, col = ratings)) +
  geom_boxplot(width = 1.75 / length(unique(df$ratings))) +
  ggbeeswarm::geom_beeswarm() +
  ggrepel::geom_text_repel(data = . %>%
                             filter(film == "Cars 2") %>%
                             filter(ratings == "Rotten Tomatoes"),
                           aes(label = film),
                           point.padding = 0.4) +
  scale_color_brewer(palette = "Dark2") +
  guides(col = guide_legend(title = "Ratings")) +
  labs(x = "Rating group", y = "Rating value") +
  ylim(c(30, 100)) +
  theme_minimal() +
  theme(legend.position = "bottom")
Verdict: people at Rotten Tomatoes generally like Pixar films more than Metacritic and Critics Choice do. The exception is Cars 2, which received the lowest rating from all three critic groups.
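To quantify that comparison (a small sketch using the same `df`, not part of the original write-up), we can summarise each critic group’s median rating; based on the boxplot above, Rotten Tomatoes should sit highest:

# Median rating per critic group.
df %>%
  group_by(ratings) %>%
  summarise(median_rating = median(value), .groups = "drop")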
Are the groups statistically consistent? Let’s perform an intraclass correlation among the different critic groups.
public_response %>%
  select(-c(cinema_score, film)) %>%
  drop_na() %>%
  icc(model = "twoway", type = "consistency")
#> Single Score Intraclass Correlation
#>
#> Model: twoway
#> Type : consistency
#>
#> Subjects = 21
#> Raters = 3
#> ICC(C,1) = 0.797
#>
#> F-Test, H0: r0 = 0 ; H1: r0 > 0
#> F(20,40) = 12.8 , p = 1.25e-11
#>
#> 95%-Confidence Interval for ICC Population Values:
#> 0.633 < ICC < 0.904
Verdict: with a null hypothesis that the critic groups are not consistent, and using the 21 Pixar films that have ratings from all three groups, we reject the null hypothesis: the groups are consistent in how they rate Pixar films (p < 0.001).
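If you want to see where the 21 subjects in the ICC output come from (a sketch, not in the original), count the films that have a value from all three critic groups after dropping the cinema score:

# Number of films with complete ratings across the three critic groups;
# this should match the "Subjects" count reported by icc().
public_response %>%
  select(-cinema_score) %>%
  drop_na() %>%
  nrow()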
sessionInfo()
#> R version 3.6.1 (2019-07-05)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 17134)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=C
#> [2] LC_CTYPE=English_United States.1252
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] irr_0.84.1 lpSolve_5.6.15 ggplot2_3.3.2 forcats_0.5.0
#> [5] tidyr_1.1.2 dplyr_1.0.5 pixarfilms_0.2.1
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.5 vipor_0.4.5 pillar_1.4.6 compiler_3.6.1
#> [5] RColorBrewer_1.1-2 tools_3.6.1 digest_0.6.27 evaluate_0.14
#> [9] lifecycle_1.0.0 tibble_3.0.4 gtable_0.3.0 pkgconfig_2.0.3
#> [13] rlang_0.4.10 DBI_1.1.0 ggrepel_0.8.2 yaml_2.2.1
#> [17] beeswarm_0.3.1 xfun_0.16 withr_2.4.1 stringr_1.4.0
#> [21] knitr_1.29 generics_0.1.0 vctrs_0.3.7 grid_3.6.1
#> [25] tidyselect_1.1.0 glue_1.4.2 R6_2.4.1 ggbeeswarm_0.6.0
#> [29] rmarkdown_2.7 farver_2.0.3 purrr_0.3.4 magrittr_1.5
#> [33] scales_1.1.1 ellipsis_0.3.1 htmltools_0.5.0 assertthat_0.2.1
#> [37] colorspace_1.4-1 labeling_0.3 stringi_1.4.6 munsell_0.5.0
#> [41] crayon_1.4.1