This R package provides a novel cluster extraction method for the OPTICS algorithm, OPTICS k-Xi, along with ggplot2 visualizations and a framework to compare clustering models with varying parameters using distance-based metrics.
Density-based clustering methods are well adapted to the clustering of high-dimensional data and enable the discovery of core groups of various shapes despite large amounts of noise.
The opticskxi R package provides a novel density-based cluster extraction method, OPTICS k-Xi, and a framework to compare k-Xi models using distance-based metrics, in order to investigate datasets with an unknown number of clusters. The vignette first introduces density-based algorithms with simulated datasets, then presents and evaluates the k-Xi cluster extraction method. Finally, the model comparison framework is described and applied to two genetic datasets to identify groups and their discriminating features.
The k-Xi algorithm is a novel OPTICS cluster extraction method that directly specifies the number of clusters and, unlike the OPTICS Xi method, does not require fine-tuning of the steepness parameter. Combined with a framework that compares models with varying parameters, the OPTICS k-Xi method can identify groups in noisy datasets with an unknown number of clusters.
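To make the contrast with a steepness threshold concrete, here is a minimal base R sketch. It is not the package's implementation. It assumes a hand-made reachability profile and a hypothetical helper, `steepest_drops`, that simply keeps the k largest downward steps instead of comparing every step to a tuned threshold.

```r
# Toy reachability profile (made-up values): three valleys, i.e. dense groups,
# each preceded by a peak.
reach <- c(1.0, 0.2, 0.2, 0.25, 0.9, 0.3, 0.3, 0.35, 1.1, 0.15, 0.2, 0.2)

# Hypothetical helper, for illustration only: positions that follow the
# k steepest downward steps of the profile.
steepest_drops <- function(reach, k) {
  drops <- -diff(reach)                               # positive where the profile falls
  idx <- order(drops, decreasing = TRUE)[seq_len(k)]  # k largest falls
  sort(idx + 1)                                       # first point after each fall
}

# Asking for k = 3 candidate cluster starts; no steepness threshold is tuned.
starts <- steepest_drops(reach, k = 3)
print(starts)
```

The point of the sketch is only the design choice: the user supplies a count, and the ranking of drops decides where clusters begin. The actual k-Xi extraction on a real OPTICS profile is described in the vignette.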
Using the devtools package in R:
devtools::install_git('https://framagit.org/thomaschln/opticskxi.git')
Compute OPTICS profile and k-Xi clustering
# Load the package so that its datasets and functions are available
library(opticskxi)
data('multishapes')
optics_shapes <- dbscan::optics(multishapes[1:2])
kxi_shapes <- opticskxi(optics_shapes, n_xi = 5, pts = 30)
Visualize with ggplot2
ggplot_optics(optics_shapes)
ggplot_kxi_profile(kxi_shapes)
Compare multiple k-Xi models on a dataset with an unknown number of clusters and visualize the best models:
data('hla')
m_hla <- hla[-c(1:2)] %>% scale
df_params_hla <- expand.grid(n_xi = 3:5, pts = c(20, 30, 40),
  dist = c('manhattan', 'euclidean', 'abscorrelation', 'abspearson'))
df_kxi_hla <- opticskxi_pipeline(m_hla, df_params_hla)
ggplot_kxi_metrics(df_kxi_hla, n = 8)
gtable_kxi_profiles(df_kxi_hla) %>% plot
best_kxi_hla <- get_best_kxi(df_kxi_hla, rank = 2)
clusters_hla <- best_kxi_hla$clusters
fortify_pca(m_hla, sup_vars = data.frame(Clusters = clusters_hla)) %>%
  ggpairs('Clusters', ellipses = TRUE, variables = TRUE)
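As a rough illustration of what comparing models with a distance-based metric means, the base R sketch below scores two candidate labelings of the same points by their average silhouette width. This README does not list the metrics the package computes, so the silhouette is used here only as a common stand-in, and `avg_silhouette` is a hypothetical helper, not a function of opticskxi.

```r
# Hypothetical helper: mean silhouette width from distances and cluster labels.
# Requires at least two clusters.
avg_silhouette <- function(d, labels) {
  d <- as.matrix(d)
  groups <- unique(labels)
  stopifnot(length(groups) >= 2)
  widths <- vapply(seq_along(labels), function(i) {
    own <- labels == labels[i]
    if (sum(own) < 2) return(0)            # singleton clusters score 0 by convention
    a <- sum(d[i, own]) / (sum(own) - 1)   # mean distance to own cluster (d[i, i] is 0)
    b <- min(vapply(setdiff(groups, labels[i]),
                    function(g) mean(d[i, labels == g]), numeric(1)))
    (b - a) / max(a, b)                    # near 1 when well placed, negative when misplaced
  }, numeric(1))
  mean(widths)
}

# Made-up data: two well-separated groups on a line.
x <- matrix(c(0, 1, 2, 10, 11, 12), ncol = 1)
d <- dist(x)                               # Euclidean distances

# Two candidate "models": one matching the groups, one mixing them.
candidates <- list(good = c(1, 1, 1, 2, 2, 2),
                   bad  = c(1, 2, 1, 2, 1, 2))
scores <- vapply(candidates, function(lab) avg_silhouette(d, lab), numeric(1))

# Rank candidates from best to worst, as a model comparison would.
print(sort(scores, decreasing = TRUE))
```

The labeling that matches the two groups scores higher, which is the kind of ranking a distance-based comparison relies on when the number of clusters is unknown.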
See the vignette for results and further details.
This work was inspired by Jérôme Wojcik (Precision for Medicine) and Sviatoslav Voloshynovskiy (University of Geneva).
This package is free and open source software, licensed under GPL-3.