Over-representation analysis — hd

hd_ora() performs over-representation analysis (ORA) using the clusterProfiler package.

Usage

hd_ora(
  gene_list,
  database = c("GO", "Reactome", "KEGG"),
  ontology = c("BP", "CC", "MF", "ALL"),
  background = NULL,
  pval_lim = 0.05
)

Arguments

gene_list: A character vector containing the gene names. These can be differentially expressed proteins or selected protein features from classification models.
database: The database to perform the ORA. It can be either "GO", "KEGG", or "Reactome".
ontology: The ontology to use when database = "GO". It can be "BP" (Biological Process), "CC" (Cellular Component), "MF" (Molecular Function), or "ALL". In the case of KEGG and Reactome, this parameter is ignored.
background: A character vector containing the background genes or a string with the name of the background gene list to use (use hd_show_backgrounds() to see available lists). If NULL, the full proteome is used as background.
pval_lim: The p-value threshold to consider a term as significant in the enrichment analysis.

Value

A list containing the results of the ORA.

Details

To perform the ORA, clusterProfiler package is used. The qvalueCutoff is set to 1 by default to prioritize filtering by adjusted p-values (p.adjust). This simplifies the workflow by ensuring a single, clear significance threshold based on the false discovery rate (FDR). While q-values are not used for filtering by default, they are still calculated and included in the results for users who wish to apply additional criteria. For more information, please refer to the clusterProfiler documentation.

If you want to learn more about ORA, please refer to the following publications:

Chicco D, Agapito G. Nine quick tips for pathway enrichment analysis. PLoS Comput Biol. 2022 Aug 11;18(8):e1010348. doi: 10.1371/journal.pcbi.1010348. PMID: 35951505; PMCID: PMC9371296. https://pmc.ncbi.nlm.nih.gov/articles/PMC9371296/
https://yulab-smu.top/biomedical-knowledge-mining-book/enrichment-overview.html#gsea-algorithm

Examples

# Initialize an HDAnalyzeR object
hd_object <- hd_initialize(example_data, example_metadata)

# Run differential expression analysis for AML vs all others
de_results <- hd_de_limma(hd_object, case = "AML")

# Extract the up-regulated proteins for AML
sig_up_proteins_aml <- de_results$de_res |>
  dplyr::filter(adj.P.Val < 0.05 & logFC > 0) |>
  dplyr::pull(Feature)

# Perform ORA with `GO` database and `BP` ontology
enrichment <- hd_ora(sig_up_proteins_aml, database = "GO", ontology = "BP")
#> No background gene list provided. For meaningful enrichment results, it is recommended to specify a relevant background list of genes (e.g., the full proteome or a set of genes that could be impacted in your experiment). The absence of a background may lead to misleading results in the over-representation analysis (ORA).
#> 'select()' returned 1:1 mapping between keys and columns
#> Warning: coercing argument of type 'double' to logical

# Access the results
head(enrichment$enrichment@result)
#>                    ID                          Description GeneRatio   BgRatio
#> GO:0050900 GO:0050900                  leukocyte migration      7/22 395/18805
#> GO:0050926 GO:0050926    regulation of positive chemotaxis      3/22  27/18805
#> GO:0071674 GO:0071674           mononuclear cell migration      5/22 230/18805
#> GO:0045785 GO:0045785 positive regulation of cell adhesion      6/22 499/18805
#> GO:0001666 GO:0001666                  response to hypoxia      5/22 316/18805
#> GO:0036293 GO:0036293  response to decreased oxygen levels      5/22 330/18805
#>            RichFactor FoldEnrichment    zScore       pvalue     p.adjust
#> GO:0050900 0.01772152       15.14787  9.725607 2.222549e-07 0.0002611495
#> GO:0050926 0.11111111       94.97475 16.723315 3.991584e-06 0.0022807251
#> GO:0071674 0.02173913       18.58202  9.181680 5.823128e-06 0.0022807251
#> GO:0045785 0.01202405       10.27783  7.188758 1.760247e-05 0.0051707246
#> GO:0001666 0.01582278       13.52488  7.684481 2.702828e-05 0.0063516447
#> GO:0036293 0.01515152       12.95110  7.495950 3.326012e-05 0.0065134398
#>                  qvalue                         geneID Count
#> GO:0050900 0.0001586198 100/566/9048/25/2683/199/30817     7
#> GO:0050926 0.0013852915                   566/285/9048     3
#> GO:0071674 0.0013852915           566/9048/25/2683/199     5
#> GO:0045785 0.0031406506      100/566/54518/9289/25/199     6
#> GO:0001666 0.0038579306         100/285/51129/405/1386     5
#> GO:0036293 0.0039562036         100/285/51129/405/1386     5

# With a background gene list
enrichment <- hd_ora(sig_up_proteins_aml,
                     database = "GO",
                     ontology = "BP",
                     background = "olink_explore_ht")
#> 'select()' returned 1:1 mapping between keys and columns
#> 'select()' returned 1:1 mapping between keys and columns
#> Warning: 1.31% of input gene IDs are fail to map...
#> Warning: coercing argument of type 'double' to logical

# Access the results
head(enrichment$enrichment@result)
#>                    ID                                   Description GeneRatio
#> GO:0050900 GO:0050900                           leukocyte migration      7/22
#> GO:0007186 GO:0007186  G protein-coupled receptor signaling pathway      7/22
#> GO:0050926 GO:0050926             regulation of positive chemotaxis      3/22
#> GO:0001666 GO:0001666                           response to hypoxia      5/22
#> GO:0040012 GO:0040012                      regulation of locomotion      9/22
#> GO:0060135 GO:0060135 maternal process involved in female pregnancy      3/22
#>             BgRatio RichFactor FoldEnrichment   zScore       pvalue   p.adjust
#> GO:0050900 209/4805 0.03349282       7.315137 6.330362 2.600575e-05 0.02554723
#> GO:0007186 244/4805 0.02868852       6.265835 5.725254 7.066822e-05 0.02554723
#> GO:0050926  19/4805 0.15789474      34.485646 9.917719 7.700889e-05 0.02554723
#> GO:0001666 115/4805 0.04347826       9.496047 6.253772 1.368573e-04 0.02554723
#> GO:0040012 479/4805 0.01878914       4.103720 4.854781 1.368986e-04 0.02554723
#> GO:0060135  23/4805 0.13043478      28.488142 8.961221 1.390836e-04 0.02554723
#>                qvalue                                geneID Count
#> GO:0050900 0.02125936        100/566/9048/25/2683/199/30817     7
#> GO:0007186 0.02125936         100/566/9289/181/25/976/30817     7
#> GO:0050926 0.02125936                          566/285/9048     3
#> GO:0001666 0.02125936                100/285/51129/405/1386     5
#> GO:0040012 0.02125936 100/566/9289/285/9048/25/59/51742/199     9
#> GO:0060135 0.02125936                         285/181/59272     3