cluster_data()
takes a dataset and returns it reordered according to the hierarchical
clustering of its rows and columns. The reordered data can be used to plot
a clustered heatmap with ggplot2, which has no built-in clustering functionality.
Usage
cluster_data(
df,
distance_method = "euclidean",
clustering_method = "ward.D2",
cluster_rows = TRUE,
cluster_cols = TRUE,
wide = TRUE
)
Arguments
- df
The dataset to cluster.
- distance_method
The distance method to use. Default is "euclidean".
- clustering_method
The clustering method to use. Default is "ward.D2".
- cluster_rows
Whether to cluster rows. Default is TRUE.
- cluster_cols
Whether to cluster columns. Default is TRUE.
- wide
Whether the input data is in wide format (TRUE) or long format (FALSE). Default is TRUE.
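The Call lines printed in the Examples output suggest that distance_method and clustering_method are passed on to stats::dist() and stats::hclust(). The following is a minimal sketch of that idea on a toy matrix, not the package's actual implementation; the matrix and the object names are made up for illustration.

```r
# Toy wide matrix (hypothetical data): rows are samples, columns are features.
set.seed(1)
wide_data <- matrix(
  rnorm(20),
  nrow = 5,
  dimnames = list(paste0("s", 1:5), paste0("f", 1:4))
)

distance_method <- "euclidean"
clustering_method <- "ward.D2"

# Cluster rows on the matrix itself and columns on its transpose.
hc_rows <- stats::hclust(
  stats::dist(wide_data, method = distance_method),
  method = clustering_method
)
hc_cols <- stats::hclust(
  stats::dist(t(wide_data), method = distance_method),
  method = clustering_method
)

# Reorder rows and columns by the dendrogram leaf order.
ordered <- wide_data[hc_rows$order, hc_cols$order]
ordered
```

If the arguments are indeed forwarded this way, other values accepted by stats::dist() (for example "manhattan") and stats::hclust() (for example "complete" or "average") should work as well.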
Value
A list with the following elements:
- clustered_data
A dataset ordered according to the hierarchical clustering of the rows and columns.
- hc_rows
The hierarchical clustering object for rows.
- hc_cols
The hierarchical clustering object for columns.
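The clustering objects are printed in the Examples as the result of stats::hclust(), so standard tools for hclust objects should apply to them. The sketch below builds a stand-in hclust object on toy data (an assumption, used in place of a real hc_rows element) and shows a few common follow-up steps.

```r
# Stand-in for the hc_rows element: an hclust object built on toy data.
set.seed(42)
toy <- matrix(rnorm(30), nrow = 6, dimnames = list(paste0("s", 1:6), NULL))
hc_rows <- stats::hclust(stats::dist(toy), method = "ward.D2")

# Row labels in dendrogram (leaf) order.
leaf_order <- hc_rows$labels[hc_rows$order]

# Cut the tree into two groups; the result is a named integer vector.
groups <- stats::cutree(hc_rows, k = 2)

# Convert to a dendrogram object, which base graphics can draw with plot().
dend <- stats::as.dendrogram(hc_rows)
```

Calling plot(dend) would then draw the tree, which can be placed alongside a heatmap.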
Examples
# Original data
clean_df <- example_data |> dplyr::select(DAid, Assay, NPX)
clean_df
#> # A tibble: 56,142 × 3
#> DAid Assay NPX
#> <chr> <chr> <dbl>
#> 1 DA00001 AARSD1 3.39
#> 2 DA00001 ABL1 2.76
#> 3 DA00001 ACAA1 1.71
#> 4 DA00001 ACAN 0.0333
#> 5 DA00001 ACE2 1.76
#> 6 DA00001 ACOX1 -0.919
#> 7 DA00001 ACP5 1.54
#> 8 DA00001 ACP6 2.15
#> 9 DA00001 ACTA2 2.81
#> 10 DA00001 ACTN4 0.742
#> # ℹ 56,132 more rows
# Clustered data
cluster_data(clean_df, wide = FALSE)
#> $clustered_data
#> # A tibble: 58,600 × 3
#> x y value
#> <fct> <fct> <dbl>
#> 1 DA00032 ALPP -4.99
#> 2 DA00032 AMY2A -1.14
#> 3 DA00032 AMY2B -0.291
#> 4 DA00032 ARG1 1.19
#> 5 DA00032 AGR3 -0.214
#> 6 DA00032 AOC1 0.924
#> 7 DA00032 ATP5PO 0.107
#> 8 DA00032 ATP6V1D -0.153
#> 9 DA00032 ADGRG2 -1.22
#> 10 DA00032 ADCYAP1R1 -1.16
#> # ℹ 58,590 more rows
#>
#> $hc_rows
#>
#> Call:
#> stats::hclust(d = stats::dist(wide_data, method = distance_method), method = clustering_method)
#>
#> Cluster method : ward.D2
#> Distance : euclidean
#> Number of objects: 586
#>
#>
#> $hc_cols
#>
#> Call:
#> stats::hclust(d = stats::dist(t(wide_data), method = distance_method), method = clustering_method)
#>
#> Cluster method : ward.D2
#> Distance : euclidean
#> Number of objects: 100
#>
#>
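The x, y and value columns of clustered_data map directly onto a tile plot. The sketch below uses toy data in the same layout and assumes that the factor levels of x and y carry the clustered order; the data, object names and colour scale are illustrative choices, not part of the package.

```r
library(ggplot2)

# Toy wide matrix (hypothetical values) and its row/column clustering.
set.seed(7)
wide <- matrix(
  rnorm(24),
  nrow = 4,
  dimnames = list(paste0("s", 1:4), paste0("a", 1:6))
)
hc_r <- stats::hclust(stats::dist(wide), method = "ward.D2")
hc_c <- stats::hclust(stats::dist(t(wide)), method = "ward.D2")

# Long format with the x / y / value column names used in clustered_data.
long <- as.data.frame(as.table(wide), responseName = "value")
names(long)[1:2] <- c("x", "y")

# Encode the clustered order in the factor levels so ggplot2 respects it.
long$x <- factor(long$x, levels = rownames(wide)[hc_r$order])
long$y <- factor(long$y, levels = colnames(wide)[hc_c$order])

# Heatmap: one tile per (x, y) pair, filled by value.
p <- ggplot(long, aes(x = x, y = y, fill = value)) +
  geom_tile() +
  scale_fill_gradient2()
p
```

With real output, long would be replaced by the clustered_data element of the returned list.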