ridgetorus
: PCA on the Torus
via Density RidgesIn a wide spectrum of applied fields, data is present as multivariate
circular data (toroidal data), rendering spurious the application of
many standard statistical tools. The package ridgetorus
implements Toroidal Ridge PCA (TR-PCA), an alternative to PCA suited to
toroidal data (García-Portugués and Prieto-Tirado
2023). Below are some simple examples of the use of TR-PCA
through ridge_pca()
, the main function in
ridgetorus
.
The following example illustrates the application of TR-PCA to a small dataset comprised by wind directions measured at 6:00 and 7:00 from June 1, 2003 to June 30, 2003 at a weather station in Texas. The direction is measured in radians in \([-\pi, \pi)\) with \(-\pi/-\frac{\pi}{2}/0/\frac{\pi}{2}/\pi\) representing the East/South/West/North/East directions.
library(ridgetorus)
#> Loading required package: Rcpp
data("wind")
plot(wind, xlim = c(-pi, pi), ylim = c(-pi, pi), axes = FALSE,
xlab = expression(theta[1]), ylab = expression(theta[2]))
sdetorus::torusAxis()
TR-PCA works by first modeling the data with the best-fitting model among a Bivariate Wrapped Cauchy (Kato and Pewsey 2015) and a Bivariate Sine von Mises (Singh, Hnizdo, and Demchuk 2002). Then, the density ridge (see Genovese et al. (2014)) is computed and its connected component through the mode(s) of the distribution is extracted. In this case, this connected component explains \(62\%\) of the variance of the original data. The plot below shows how the scores are computed, and how their signs are assigned. The blue asterisk stands for the Fréchet mean of the sample through TR-PCA.
The produced scores allow the identification of outliers and “rectify” the original sample.
The following dataset represents pre-earthquake direction of steepest descent and the direction of lateral ground movement after an earthquake in Noshiro, Japan. We use TR-PCA to unveil the behavior and explain the variability of the earthquake phenomena.
data("earthquakes")
earthquakes <- sdetorus::toPiInt(earthquakes)
plot(earthquakes, xlim = c(-pi, pi), ylim = c(-pi, pi),
xlab = expression(theta[1]), ylab = expression(theta[2]), axes = FALSE)
sdetorus::torusAxis()
The dataset is modeled with the best-fitting model among a Bivariate Wrapped Cauchy and a Bivariate Sine von Mises. Once the distribution is determined, the connected component of the density ridge can be computed. This flexible first principal component explains \(66\%\) of the variance and shows a clear linear trend between both studied angles.