pkgdown/extra.css

Skip to contents

do_pca() runs a PCA analysis on the provided data. The function can visualize the sample points on the first and second PC plane as well as the PCA loadings and the explained variance. It can also save the plots in the results directory.

Usage

do_pca(
  olink_data,
  metadata = NULL,
  pcs = 5,
  color = "Disease",
  palette = NULL,
  wide = TRUE,
  assay = FALSE,
  impute = TRUE,
  plots = TRUE,
  x = "PC1",
  y = "PC2",
  npcs = 4,
  nproteins = 8,
  loadings = FALSE,
  save = FALSE
)

Arguments

A tibble with the data to be used in the PCA analysis.

metadata

A tibble with metadata information to be used in the PCA plots. Default is NULL.

pcs

The number of PCs to be calculated. Default is 5.

color

The name of the column in metadata that contains the variable. to be used to plot the points color. Default is "Disease".

palette

The color palette for the plot. If it is a character, it should be one of the palettes from get_hpa_palettes().

wide

If TRUE, the data is assumed to be in wide format. Default is TRUE.

assay

If TRUE, each point is an assay and not a sample. Default is FALSE.

impute

If TRUE, the data is imputed before the PCA analysis. Default is TRUE.

plots

If TRUE, the function creates plots of the PCA results. Default is TRUE.

x

The component to be plotted on the x-axis. Default is "PC1".

y

The component to be plotted on the y-axis. Default is "PC2".

npcs

The number of PCs to be plotted. Default is 4.

nproteins

The number of proteins to be plotted. Default is 8.

loadings

If TRUE, the PCA loadings are plotted on the 2 dimensional plot. Default is FALSE.

save

If TRUE, the plots are saved in the results directory. Default is FALSE.

Value

A list with the PCA results and, if requested, the PCA plots.

  • pca_res: A tibble with the PCA results.

  • loadings: A tibble with the PCA loadings.

  • pca_plot: A ggplot object with the data points on the 1st and 2nd PCs plane.

  • loadings_plot: A PCA loadings ggplot object.

  • variance_plot: A ggplot object with the explained variance and cumulative explained variance.

Examples

do_pca(example_data,
       metadata = example_metadata,
       pcs = 8,
       wide = FALSE,
       color = "Disease",
       palette = "cancers12")
#> $pca_res
#> # A tibble: 586 × 9
#>    DAid        PC1     PC2      PC3    PC4    PC5     PC6     PC7    PC8
#>    <fct>     <dbl>   <dbl>    <dbl>  <dbl>  <dbl>   <dbl>   <dbl>  <dbl>
#>  1 DA00001  -3.67  -4.28   -2.34    -3.10  -2.65  -2.73   -2.78    0.433
#>  2 DA00002   4.11  -2.64    2.04    -0.441 -4.43  -1.91   -0.897   1.54 
#>  3 DA00003  -3.34   4.72    1.41     0.881 -0.561  0.308  -0.0612 -0.267
#>  4 DA00004  -4.78   0.443   1.41     0.107 -1.10  -0.262   0.350   3.33 
#>  5 DA00005  -4.98  -3.67    0.711   -5.70  -0.807 -3.77   -0.969   1.83 
#>  6 DA00006   0.395  0.0572 -1.90    -7.75   0.707 -2.70   -0.681   0.301
#>  7 DA00007 -10.5   -2.91   -0.382   -0.841 -1.61   1.96   -1.30    2.51 
#>  8 DA00008   2.64  -2.01    2.75    -0.128 -0.852  0.408  -1.70    0.595
#>  9 DA00009  -1.79  -0.461   2.79    -2.61  -2.71   0.0732 -3.08    1.75 
#> 10 DA00010   3.57   0.821  -0.00756  1.50  -2.66  -0.856   0.0190  1.17 
#> # ℹ 576 more rows
#> 
#> $loadings
#> # A tibble: 10,000 × 4
#>    Assay    Value PC    id       
#>    <chr>    <dbl> <chr> <chr>    
#>  1 AARSD1 -0.133  PC1   pca_EoYnc
#>  2 ABL1   -0.198  PC1   pca_EoYnc
#>  3 ACAA1  -0.163  PC1   pca_EoYnc
#>  4 ACAN    0.0123 PC1   pca_EoYnc
#>  5 ACE2   -0.0583 PC1   pca_EoYnc
#>  6 ACOX1  -0.135  PC1   pca_EoYnc
#>  7 ACP5   -0.0626 PC1   pca_EoYnc
#>  8 ACP6   -0.0933 PC1   pca_EoYnc
#>  9 ACTA2  -0.0751 PC1   pca_EoYnc
#> 10 ACTN4  -0.0424 PC1   pca_EoYnc
#> # ℹ 9,990 more rows
#> 
#> $pca_plot

#> 
#> $loadings_plot

#> 
#> $variance_plot

#>