hd_assess_clusters()
assesses the stability of the clusters by bootstrapping the data.
It calculates the mean Jaccard index for each cluster and removes clusters with a mean
Jaccard index below 0.5 or a sample size below 10. The remaining clusters are renamed
in order of size.
Usage
hd_assess_clusters(
cluster_object,
nrep = 100,
ji_lim = 0.5,
nsample_lim = 10,
seed = 123,
verbose = FALSE
)
Arguments
- cluster_object
The clustering results obtained from
hd_cluster_samples()
.- nrep
The number of bootstrap replicates. Default is 100.
- ji_lim
The minimum limit for the mean Jaccard index. Default is 0.5.
- nsample_lim
The minimum limit for the sample size. Default is 10.
- seed
The seed to use for reproducibility. Default is 123.
- verbose
A logical value indicating whether to print the progress of the bootstrapping. Default is FALSE.
Examples
# Create the HDAnalyzeR object providing the data and metadata
hd_object <- hd_initialize(example_data, example_metadata)
# Clustered data
clustering <- hd_cluster_samples(hd_object, gap_b = 10)
#> Determining optimal number of clusters using gap statistic.
#> Optimal number of clusters: 4
# Assess clusters
clustering <- hd_assess_clusters(clustering, nrep = 20)
#> Cluster results updated with stability assessment. Low-quality clusters removed.
# Access the results
clustering$cluster_assessment
#> # A tibble: 4 × 4
#> Cluster cluster_og n Mean_ji
#> <dbl> <int> <int> <dbl>
#> 1 4 4 70 0.557
#> 2 3 3 96 0.688
#> 3 2 1 112 0.741
#> 4 1 2 308 0.624