README

The UniversalCVI package evaluates clustering results using a range of cluster validity indices. It provides functions that compute the indices listed below over a user-specified range of numbers of clusters and compare them in grid plots. The package supports six clustering methods: K-means, fuzzy C-means, EM clustering, and hierarchical clustering with single, average, or complete linkage. It also includes a function that computes the accuracy of clustering results when the true classes are known.

UniversalCVI requires two packages: e1071 and mclust, which perform the fuzzy C-means (FCM) and EM algorithms, respectively.

In addition to the evaluation tools, the UniversalCVI package includes 17 simulated datasets initially used for testing and comparing cluster validity indices from several perspectives in Wiroonsri (2024) and Wiroonsri and Preedasawakul (2023).

Hard clustering indices: Dunn’s index, Calinski–Harabasz index, Davies–Bouldin’s index, Point biserial correlation index, Chou-Su-Lai measure, Davies–Bouldin*’s index, Score function, Starczewski index, Pakhira–Bandyopadhyay–Maulik (for crisp clustering) index, and Wiroonsri index.

Fuzzy clustering indices: Xie–Beni index, KWON index, KWON2 index, TANG index, HF index, Wu–Li index, Pakhira–Bandyopadhyay–Maulik (for fuzzy clustering) index, KPBM index, Correlation Cluster Validity index, Generalized C index, and Wiroonsri and Preedasawakul index.

Installation

If mclust and e1071 are not already installed on your system, install them first:

install.packages(c('e1071','mclust'))

install.packages('UniversalCVI')

suppressPackageStartupMessages({
  library(UniversalCVI)
  library(e1071)
  library(mclust)
})

Example

Check accuracy of clustering results when the true classes are known


# Use a dataset in this package
x = R1_data

# Check accuracy of clustering results obtained by K-means, FCM, and EM clustering
AccClust(x, label.names = "label", algorithm = c("FCM", "EM", "Kmeans"), fzm = 2,
  scale = TRUE, nstart = 20, iter = 100)
  
x = D1_data

# Check accuracy of a clustering result obtained by the FCM algorithm
AccClust(x, label.names = "label", algorithm = "FCM", fzm = 2,
  scale = TRUE, nstart = 20, iter = 100)
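Cluster labels are arbitrary, so accuracy against known classes is usually taken over the best one-to-one matching between clusters and classes. The base-R sketch below illustrates that common definition. The README does not state how AccClust computes accuracy internally, so this is an illustration rather than the package's implementation; the helper names `perms` and `cluster_accuracy` and the toy vectors are hypothetical.

```r
# All permutations of a vector (brute force, so only suitable for small k)
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Accuracy maximised over all one-to-one relabelings of the clusters
cluster_accuracy <- function(truth, cluster) {
  tab <- table(truth, cluster)        # contingency table: classes x clusters
  stopifnot(nrow(tab) == ncol(tab))   # sketch assumes as many clusters as classes
  k <- nrow(tab)
  best <- 0
  for (p in perms(seq_len(k))) {
    # p[i] is the cluster column matched to true class i
    hits <- sum(tab[cbind(seq_len(k), p)])
    best <- max(best, hits)
  }
  best / length(truth)
}

# Toy labels (hypothetical): the clustering is correct up to relabeling,
# except for one point of class 2 placed in the cluster of class 3
truth   <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
cluster <- c(2, 2, 2, 3, 3, 1, 1, 1, 1)
acc <- cluster_accuracy(truth, cluster)
```

The brute-force search grows factorially with the number of clusters; for larger k an assignment solver would be the usual replacement.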

Compute hard cluster validity indices

Use Hvalid to compute the selected indices for clustering results with the number of clusters k ranging from 2 to 15.

library(UniversalCVI)

# The data is from Wiroonsri (2024).
x = R1_data[,1:2]


# Compute six cluster validity indices of an average-linkage hierarchical clustering result for k from 2 to 15
IDX.list = c("NCI", "DI", "DB", "STR", "CSL", "CH")

Hvalid.result = Hvalid(scale(x), kmax = 15, kmin = 2, indexlist = IDX.list,
  method = "hclust_average", p = 2, q = 2, corr = "pearson", nstart = 100, NCstart = TRUE)

# Plot the computed indices for k from 2 to 15
plot_idx(Hvalid.result)
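For intuition about what one of these indices measures, the Calinski–Harabasz index can be computed directly from a base-R `kmeans` fit as the ratio of between-cluster to within-cluster sum of squares, each divided by its degrees of freedom. The sketch below is independent of the package: the function name `ch_index` and the toy data are assumptions for illustration, and the values are not guaranteed to match Hvalid output exactly.

```r
# Calinski-Harabasz index from a stats::kmeans fit:
# CH = (between-cluster SS / (k - 1)) / (within-cluster SS / (n - k))
ch_index <- function(fit, n) {
  k <- nrow(fit$centers)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

set.seed(1)
# Toy data (hypothetical): three well-separated 2-D groups of 50 points each
toy <- rbind(
  matrix(rnorm(100, mean = 0,  sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 5,  sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 10, sd = 0.3), ncol = 2)
)

# Evaluate the index over a range of k, as the package does for its indices
ks <- 2:6
ch <- sapply(ks, function(k) {
  fit <- kmeans(toy, centers = k, nstart = 20)
  ch_index(fit, n = nrow(toy))
})
names(ch) <- ks

# Larger CH is better, so the maximiser is the suggested number of clusters
best_k <- ks[which.max(ch)]
```

Other indices in the list follow different conventions (for some, smaller is better), so the direction of each index should be checked before picking a number of clusters.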

Soft clustering

Use FzzyCVIs to compute all the fuzzy cluster validity indices for clustering results with the number of clusters c ranging from 2 to 15.
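As an illustration of what a fuzzy index measures, the Xie–Beni index divides the membership-weighted within-cluster scatter by n times the smallest squared distance between two cluster centers, so smaller values indicate a better partition. The base-R sketch below is not the package's code: the function name `xie_beni`, its argument names, and the toy data are assumptions, and it uses the fuzzifier m as the membership exponent (the original definition uses 2).

```r
# Xie-Beni index.
# x: n x d data matrix, u: n x c membership matrix,
# v: c x d matrix of cluster centers, m: fuzzifier used as the exponent
xie_beni <- function(x, u, v, m = 2) {
  x <- as.matrix(x)
  v <- as.matrix(v)
  # d2[i, j] = squared Euclidean distance from point i to center j
  d2 <- sapply(seq_len(nrow(v)), function(j) rowSums(sweep(x, 2, v[j, ])^2))
  compactness <- sum(u^m * d2)
  separation  <- min(dist(v))^2   # smallest squared distance between centers
  compactness / (nrow(x) * separation)
}

# Toy example (hypothetical): two tight pairs of points, with crisp
# memberships treated as a degenerate fuzzy partition
x <- rbind(c(0, 0), c(0, 1), c(4, 0), c(4, 1))
v <- rbind(c(0, 0.5), c(4, 0.5))
u <- rbind(c(1, 0), c(1, 0), c(0, 1), c(0, 1))
xb <- xie_beni(x, u, v)

# With e1071 loaded, memberships and centers would typically come from a fit:
# fit <- e1071::cmeans(x, centers = 2, m = 2)
# xie_beni(x, fit$membership, fit$centers, m = 2)
```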
