Cluster data — cluster_data • HDAnalyzeR

cluster_data() takes a dataset and returns the same dataset ordered according to the hierarchical clustering of its rows and columns. The ordered data can be used to plot a heatmap with ggplot2, which has no built-in clustering functionality.

Usage

cluster_data(
  df,
  distance_method = "euclidean",
  clustering_method = "ward.D2",
  cluster_rows = TRUE,
  cluster_cols = TRUE,
  wide = TRUE
)

Arguments

df: The dataset to cluster.
distance_method: The distance method to use, passed to stats::dist(). Default is "euclidean".
clustering_method: The clustering (agglomeration) method to use, passed to stats::hclust(). Default is "ward.D2".
cluster_rows: Whether to cluster rows. Default is TRUE.
cluster_cols: Whether to cluster columns. Default is TRUE.
wide: Whether the input data is in wide format (TRUE) or long format (FALSE). Default is TRUE.
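
The calls printed in the example output on this page suggest that the distance and clustering arguments are forwarded to stats::dist() and stats::hclust() on a wide matrix. The sketch below illustrates that idea with base R only. It is not the package's implementation: the toy data frame, the tapply() pivot, and the names long_df, wide_mat and hc_alt are assumptions made for illustration.

```r
# Hypothetical long-format data: one row per (sample, assay) pair
long_df <- data.frame(
  DAid  = rep(c("DA1", "DA2", "DA3", "DA4"), each = 3),
  Assay = rep(c("A", "B", "C"), times = 4),
  NPX   = c(1.0, 2.0, 3.0,
            1.1, 2.1, 2.9,
            5.0, 0.5, 0.1,
            5.2, 0.4, 0.0)
)

# Pivot to a wide numeric matrix: samples in rows, assays in columns
# (assumed stand-in for what wide = FALSE would require internally)
wide_mat <- tapply(long_df$NPX, list(long_df$DAid, long_df$Assay), mean)

# Row clustering: distances between samples
hc_rows <- stats::hclust(stats::dist(wide_mat, method = "euclidean"),
                         method = "ward.D2")

# Column clustering: distances between assays, hence the transpose
hc_cols <- stats::hclust(stats::dist(t(wide_mat), method = "euclidean"),
                         method = "ward.D2")

# Any method accepted by stats::dist() / stats::hclust() can be swapped in
hc_alt <- stats::hclust(stats::dist(wide_mat, method = "manhattan"),
                        method = "average")

# Reorder the matrix by the clustering result
ordered_mat <- wide_mat[hc_rows$order, hc_cols$order]
```

Transposing the matrix before computing column distances is what makes one clustering describe samples and the other describe features.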

Value

A list with the following elements:

clustered_data: A dataset ordered according to the hierarchical clustering of the rows and columns.
hc_rows: The hierarchical clustering object for rows.
hc_cols: The hierarchical clustering object for columns.
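
The example output on this page prints hc_rows and hc_cols in the format of stats::hclust() results, so the usual base R tools for such objects should apply. The following is a minimal sketch on a toy matrix rather than on actual cluster_data() output; mat and the choice of k = 3 are hypothetical.

```r
# Toy data: 10 samples x 4 features (hypothetical values)
set.seed(42)
mat <- matrix(rnorm(40), nrow = 10,
              dimnames = list(paste0("S", 1:10), paste0("F", 1:4)))

# Stand-in for the hc_rows element of the returned list
hc_rows <- stats::hclust(stats::dist(mat), method = "ward.D2")

# Row labels in dendrogram (left-to-right) order
ordered_labels <- hc_rows$labels[hc_rows$order]

# Cut the tree into k groups; returns a named integer vector
groups <- stats::cutree(hc_rows, k = 3)

# Convert to a dendrogram object for further manipulation or plotting
dend <- stats::as.dendrogram(hc_rows)
```

Calling plot(hc_rows) or plot(dend) would draw the tree, which can be useful for checking the ordering shown in a heatmap.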

Examples

# Original data
clean_df <- example_data |> dplyr::select(DAid, Assay, NPX)
clean_df
#> # A tibble: 56,142 × 3
#>    DAid    Assay      NPX
#>    <chr>   <chr>    <dbl>
#>  1 DA00001 AARSD1  3.39  
#>  2 DA00001 ABL1    2.76  
#>  3 DA00001 ACAA1   1.71  
#>  4 DA00001 ACAN    0.0333
#>  5 DA00001 ACE2    1.76  
#>  6 DA00001 ACOX1  -0.919 
#>  7 DA00001 ACP5    1.54  
#>  8 DA00001 ACP6    2.15  
#>  9 DA00001 ACTA2   2.81  
#> 10 DA00001 ACTN4   0.742 
#> # ℹ 56,132 more rows

# Clustered data
cluster_data(clean_df, wide = FALSE)
#> $clustered_data
#> # A tibble: 58,600 × 3
#>    x       y          value
#>    <fct>   <fct>      <dbl>
#>  1 DA00032 ALPP      -4.99 
#>  2 DA00032 AMY2A     -1.14 
#>  3 DA00032 AMY2B     -0.291
#>  4 DA00032 ARG1       1.19 
#>  5 DA00032 AGR3      -0.214
#>  6 DA00032 AOC1       0.924
#>  7 DA00032 ATP5PO     0.107
#>  8 DA00032 ATP6V1D   -0.153
#>  9 DA00032 ADGRG2    -1.22 
#> 10 DA00032 ADCYAP1R1 -1.16 
#> # ℹ 58,590 more rows
#> 
#> $hc_rows
#> 
#> Call:
#> stats::hclust(d = stats::dist(wide_data, method = distance_method),     method = clustering_method)
#> 
#> Cluster method   : ward.D2 
#> Distance         : euclidean 
#> Number of objects: 586 
#> 
#> 
#> $hc_cols
#> 
#> Call:
#> stats::hclust(d = stats::dist(t(wide_data), method = distance_method),     method = clustering_method)
#> 
#> Cluster method   : ward.D2 
#> Distance         : euclidean 
#> Number of objects: 100 
#> 
#>
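
The clustered_data tibble printed above has columns x, y and value, with x and y stored as factors. Assuming the factor levels carry the clustered order (the printed output does not confirm this), a ggplot2 heatmap could be sketched as follows. The toy data frame toy_clustered is built by hand here to mimic that shape and is not produced by cluster_data().

```r
# Toy matrix standing in for wide data (hypothetical values)
set.seed(1)
mat <- matrix(rnorm(24), nrow = 6,
              dimnames = list(paste0("S", 1:6), paste0("F", 1:4)))

# Cluster rows and columns, then reorder the matrix
row_order <- stats::hclust(stats::dist(mat), method = "ward.D2")$order
col_order <- stats::hclust(stats::dist(t(mat)), method = "ward.D2")$order
ordered <- mat[row_order, col_order]

# Long format with factor levels fixed in clustered order.
# as.vector() walks the matrix column by column, so row names repeat
# as a block (times =) and each column name repeats nrow times (each =).
toy_clustered <- data.frame(
  x = factor(rep(rownames(ordered), times = ncol(ordered)),
             levels = rownames(ordered)),
  y = factor(rep(colnames(ordered), each = nrow(ordered)),
             levels = colnames(ordered)),
  value = as.vector(ordered)
)

# ggplot2 draws discrete axes in factor-level order, so the tiles
# follow the clustering without any extra sorting step
if (requireNamespace("ggplot2", quietly = TRUE)) {
  heatmap_plot <- ggplot2::ggplot(
    toy_clustered,
    ggplot2::aes(x = x, y = y, fill = value)
  ) +
    ggplot2::geom_tile() +
    ggplot2::labs(x = "Sample", y = "Feature", fill = "Value")
}
```

Printing heatmap_plot draws the heatmap. Keeping the order in the factor levels, rather than in the row order of the data frame, is what lets ggplot2 respect the clustering.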