hd_ora()
performs over-representation analysis (ORA) using the clusterProfiler package.
Arguments
- gene_list
A character vector containing the gene names. These can be differentially expressed proteins or selected protein features from classification models.
- database
The database to perform the ORA. It can be either "GO", "KEGG", or "Reactome".
- ontology
The ontology to use when database = "GO". It can be "BP" (Biological Process), "CC" (Cellular Component), "MF" (Molecular Function), or "ALL". In the case of KEGG and Reactome, this parameter is ignored.
- background
A character vector containing the background genes or a string with the name of the background gene list to use (use
hd_show_backgrounds()
to see available lists). If NULL, the full proteome is used as background.- pval_lim
The p-value threshold to consider a term as significant in the enrichment analysis.
Details
To perform the ORA, clusterProfiler
package is used.
The qvalueCutoff
is set to 1 by default to prioritize filtering by adjusted
p-values (p.adjust). This simplifies the workflow by ensuring a single, clear
significance threshold based on the false discovery rate (FDR). While q-values
are not used for filtering by default, they are still calculated and included
in the results for users who wish to apply additional criteria.
For more information, please refer to the clusterProfiler
documentation.
If you want to learn more about ORA, please refer to the following publications:
Chicco D, Agapito G. Nine quick tips for pathway enrichment analysis. PLoS Comput Biol. 2022 Aug 11;18(8):e1010348. doi: 10.1371/journal.pcbi.1010348. PMID: 35951505; PMCID: PMC9371296. https://pmc.ncbi.nlm.nih.gov/articles/PMC9371296/
https://yulab-smu.top/biomedical-knowledge-mining-book/enrichment-overview.html#gsea-algorithm
Examples
# Initialize an HDAnalyzeR object
hd_object <- hd_initialize(example_data, example_metadata)
# Run differential expression analysis for AML vs all others
de_results <- hd_de_limma(hd_object, case = "AML")
# Extract the up-regulated proteins for AML
sig_up_proteins_aml <- de_results$de_res |>
dplyr::filter(adj.P.Val < 0.05 & logFC > 0) |>
dplyr::pull(Feature)
# Perform ORA with `GO` database and `BP` ontology
enrichment <- hd_ora(sig_up_proteins_aml, database = "GO", ontology = "BP")
#> No background gene list provided. For meaningful enrichment results, it is recommended to specify a relevant background list of genes (e.g., the full proteome or a set of genes that could be impacted in your experiment). The absence of a background may lead to misleading results in the over-representation analysis (ORA).
#> 'select()' returned 1:1 mapping between keys and columns
#> Warning: coercing argument of type 'double' to logical
# Access the results
head(enrichment$enrichment@result)
#> ID Description GeneRatio BgRatio
#> GO:0050900 GO:0050900 leukocyte migration 7/22 395/18805
#> GO:0050926 GO:0050926 regulation of positive chemotaxis 3/22 27/18805
#> GO:0071674 GO:0071674 mononuclear cell migration 5/22 230/18805
#> GO:0045785 GO:0045785 positive regulation of cell adhesion 6/22 499/18805
#> GO:0001666 GO:0001666 response to hypoxia 5/22 316/18805
#> GO:0036293 GO:0036293 response to decreased oxygen levels 5/22 330/18805
#> RichFactor FoldEnrichment zScore pvalue p.adjust
#> GO:0050900 0.01772152 15.14787 9.725607 2.222549e-07 0.0002611495
#> GO:0050926 0.11111111 94.97475 16.723315 3.991584e-06 0.0022807251
#> GO:0071674 0.02173913 18.58202 9.181680 5.823128e-06 0.0022807251
#> GO:0045785 0.01202405 10.27783 7.188758 1.760247e-05 0.0051707246
#> GO:0001666 0.01582278 13.52488 7.684481 2.702828e-05 0.0063516447
#> GO:0036293 0.01515152 12.95110 7.495950 3.326012e-05 0.0065134398
#> qvalue geneID Count
#> GO:0050900 0.0001586198 100/566/9048/25/2683/199/30817 7
#> GO:0050926 0.0013852915 566/285/9048 3
#> GO:0071674 0.0013852915 566/9048/25/2683/199 5
#> GO:0045785 0.0031406506 100/566/54518/9289/25/199 6
#> GO:0001666 0.0038579306 100/285/51129/405/1386 5
#> GO:0036293 0.0039562036 100/285/51129/405/1386 5
# With a background gene list
enrichment <- hd_ora(sig_up_proteins_aml,
database = "GO",
ontology = "BP",
background = "olink_explore_ht")
#> 'select()' returned 1:1 mapping between keys and columns
#> 'select()' returned 1:1 mapping between keys and columns
#> Warning: 1.31% of input gene IDs are fail to map...
#> Warning: coercing argument of type 'double' to logical
# Access the results
head(enrichment$enrichment@result)
#> ID Description GeneRatio
#> GO:0050900 GO:0050900 leukocyte migration 7/22
#> GO:0007186 GO:0007186 G protein-coupled receptor signaling pathway 7/22
#> GO:0050926 GO:0050926 regulation of positive chemotaxis 3/22
#> GO:0001666 GO:0001666 response to hypoxia 5/22
#> GO:0040012 GO:0040012 regulation of locomotion 9/22
#> GO:0060135 GO:0060135 maternal process involved in female pregnancy 3/22
#> BgRatio RichFactor FoldEnrichment zScore pvalue p.adjust
#> GO:0050900 209/4805 0.03349282 7.315137 6.330362 2.600575e-05 0.02554723
#> GO:0007186 244/4805 0.02868852 6.265835 5.725254 7.066822e-05 0.02554723
#> GO:0050926 19/4805 0.15789474 34.485646 9.917719 7.700889e-05 0.02554723
#> GO:0001666 115/4805 0.04347826 9.496047 6.253772 1.368573e-04 0.02554723
#> GO:0040012 479/4805 0.01878914 4.103720 4.854781 1.368986e-04 0.02554723
#> GO:0060135 23/4805 0.13043478 28.488142 8.961221 1.390836e-04 0.02554723
#> qvalue geneID Count
#> GO:0050900 0.02125936 100/566/9048/25/2683/199/30817 7
#> GO:0007186 0.02125936 100/566/9289/181/25/976/30817 7
#> GO:0050926 0.02125936 566/285/9048 3
#> GO:0001666 0.02125936 100/285/51129/405/1386 5
#> GO:0040012 0.02125936 100/566/9289/285/9048/25/59/51742/199 9
#> GO:0060135 0.02125936 285/181/59272 3