hd_ora()
performs over-representation analysis (ORA) using the clusterProfiler package.
Arguments
- gene_list
A character vector containing the gene names. These can be differentially expressed proteins or selected protein features from classification models.
- database
The database to perform the ORA. It can be either "GO", "KEGG", or "Reactome".
- ontology
The ontology to use when database = "GO". It can be "BP" (Biological Process), "CC" (Cellular Component), "MF" (Molecular Function), or "ALL". In the case of KEGG and Reactome, this parameter is ignored.
- background
A character vector containing the background genes or a string with the name of the background gene list to use (use
hd_show_backgrounds()
to see available lists). If NULL, the full proteome is used as background.- pval_lim
The p-value threshold to consider a term as significant in the enrichment analysis.
Details
To perform the ORA, clusterProfiler
package is used.
The qvalueCutoff
is set to 1 by default to prioritize filtering by adjusted
p-values (p.adjust). This simplifies the workflow by ensuring a single, clear
significance threshold based on the false discovery rate (FDR). While q-values
are not used for filtering by default, they are still calculated and included
in the results for users who wish to apply additional criteria.
For more information, please refer to the clusterProfiler
documentation.
If you want to learn more about ORA, please refer to the following publications:
Chicco D, Agapito G. Nine quick tips for pathway enrichment analysis. PLoS Comput Biol. 2022 Aug 11;18(8):e1010348. doi: 10.1371/journal.pcbi.1010348. PMID: 35951505; PMCID: PMC9371296. https://pmc.ncbi.nlm.nih.gov/articles/PMC9371296/
https://yulab-smu.top/biomedical-knowledge-mining-book/enrichment-overview.html#gsea-algorithm
Examples
# Initialize an HDAnalyzeR object
hd_object <- hd_initialize(example_data, example_metadata)
# Run differential expression analysis for AML vs all others
de_results <- hd_de_limma(hd_object, case = "AML")
# Extract the up-regulated proteins for AML
sig_up_proteins_aml <- de_results$de_res |>
dplyr::filter(adj.P.Val < 0.05 & logFC > 0) |>
dplyr::pull(Feature)
# Perform ORA with `GO` database and `BP` ontology
enrichment <- hd_ora(sig_up_proteins_aml, database = "GO", ontology = "BP")
#> No background gene list provided. For meaningful enrichment results, it is recommended to specify a relevant background list of genes (e.g., the full proteome or a set of genes that could be impacted in your experiment). The absence of a background may lead to misleading results in the over-representation analysis (ORA).
#> 'select()' returned 1:1 mapping between keys and columns
#> Warning: coercing argument of type 'double' to logical
# Access the results
head(enrichment$enrichment@result)
#> ID Description GeneRatio BgRatio
#> GO:0050900 GO:0050900 leukocyte migration 7/22 404/18986
#> GO:0050926 GO:0050926 regulation of positive chemotaxis 3/22 27/18986
#> GO:0071674 GO:0071674 mononuclear cell migration 5/22 231/18986
#> GO:0045785 GO:0045785 positive regulation of cell adhesion 6/22 499/18986
#> GO:0001666 GO:0001666 response to hypoxia 5/22 317/18986
#> GO:0036293 GO:0036293 response to decreased oxygen levels 5/22 331/18986
#> RichFactor FoldEnrichment zScore pvalue p.adjust
#> GO:0050900 0.01732673 14.95297 9.655234 2.427348e-07 0.0002854562
#> GO:0050926 0.11111111 95.88889 16.805104 3.879176e-06 0.0022259912
#> GO:0071674 0.02164502 18.67965 9.208160 5.678549e-06 0.0022259912
#> GO:0045785 0.01202405 10.37675 7.229716 1.667718e-05 0.0049030906
#> GO:0001666 0.01577287 13.61199 7.712662 2.621632e-05 0.0061660792
#> GO:0036293 0.01510574 13.03625 7.524183 3.224239e-05 0.0063195083
#> qvalue geneID Count
#> GO:0050900 0.0001729805 100/566/9048/25/2683/199/30817 7
#> GO:0050926 0.0013489045 566/285/9048 3
#> GO:0071674 0.0013489045 566/9048/25/2683/199 5
#> GO:0045785 0.0029711711 100/566/54518/9289/25/199 6
#> GO:0001666 0.0037365159 100/285/51129/405/1386 5
#> GO:0036293 0.0038294908 100/285/51129/405/1386 5
# With a background gene list
enrichment <- hd_ora(sig_up_proteins_aml,
database = "GO",
ontology = "BP",
background = "olink_explore_ht")
#> 'select()' returned 1:1 mapping between keys and columns
#> 'select()' returned 1:1 mapping between keys and columns
#> Warning: 1.24% of input gene IDs are fail to map...
#> Warning: coercing argument of type 'double' to logical
# Access the results
head(enrichment$enrichment@result)
#> ID Description GeneRatio
#> GO:0050900 GO:0050900 leukocyte migration 7/22
#> GO:0050926 GO:0050926 regulation of positive chemotaxis 3/22
#> GO:0007186 GO:0007186 G protein-coupled receptor signaling pathway 7/22
#> GO:0040012 GO:0040012 regulation of locomotion 9/22
#> GO:0001666 GO:0001666 response to hypoxia 5/22
#> GO:0060135 GO:0060135 maternal process involved in female pregnancy 3/22
#> BgRatio RichFactor FoldEnrichment zScore pvalue p.adjust
#> GO:0050900 213/4854 0.03286385 7.250960 6.294827 2.756335e-05 0.02448193
#> GO:0050926 19/4854 0.15789474 34.837321 9.970744 7.473564e-05 0.02448193
#> GO:0007186 250/4854 0.02800000 6.177818 5.671555 7.740526e-05 0.02448193
#> GO:0040012 478/4854 0.01882845 4.154241 4.900306 1.244693e-04 0.02448193
#> GO:0001666 115/4854 0.04347826 9.592885 6.292129 1.305178e-04 0.02448193
#> GO:0060135 23/4854 0.13043478 28.778656 9.009686 1.349942e-04 0.02448193
#> qvalue geneID Count
#> GO:0050900 0.02031746 100/566/9048/25/2683/199/30817 7
#> GO:0050926 0.02031746 566/285/9048 3
#> GO:0007186 0.02031746 100/566/9289/181/25/976/30817 7
#> GO:0040012 0.02031746 100/566/9289/285/9048/25/59/51742/199 9
#> GO:0001666 0.02031746 100/285/51129/405/1386 5
#> GO:0060135 0.02031746 285/181/59272 3