pkgdown/extra.css

Skip to contents

hd_ora() performs over-representation analysis (ORA) using the clusterProfiler package.

Usage

hd_ora(
  gene_list,
  database = c("GO", "Reactome", "KEGG"),
  ontology = c("BP", "CC", "MF", "ALL"),
  background = NULL,
  pval_lim = 0.05
)

Arguments

gene_list

A character vector containing the gene names. These can be differentially expressed proteins or selected protein features from classification models.

database

The database to perform the ORA. It can be either "GO", "KEGG", or "Reactome".

ontology

The ontology to use when database = "GO". It can be "BP" (Biological Process), "CC" (Cellular Component), "MF" (Molecular Function), or "ALL". In the case of KEGG and Reactome, this parameter is ignored.

background

A character vector containing the background genes or a string with the name of the background gene list to use (use hd_show_backgrounds() to see available lists). If NULL, the full proteome is used as background.

pval_lim

The p-value threshold to consider a term as significant in the enrichment analysis.

Value

A list containing the results of the ORA.

Details

To perform the ORA, clusterProfiler package is used. The qvalueCutoff is set to 1 by default to prioritize filtering by adjusted p-values (p.adjust). This simplifies the workflow by ensuring a single, clear significance threshold based on the false discovery rate (FDR). While q-values are not used for filtering by default, they are still calculated and included in the results for users who wish to apply additional criteria. For more information, please refer to the clusterProfiler documentation.

If you want to learn more about ORA, please refer to the following publications:

  • Chicco D, Agapito G. Nine quick tips for pathway enrichment analysis. PLoS Comput Biol. 2022 Aug 11;18(8):e1010348. doi: 10.1371/journal.pcbi.1010348. PMID: 35951505; PMCID: PMC9371296. https://pmc.ncbi.nlm.nih.gov/articles/PMC9371296/

  • https://yulab-smu.top/biomedical-knowledge-mining-book/enrichment-overview.html#gsea-algorithm

Examples

# Initialize an HDAnalyzeR object
hd_object <- hd_initialize(example_data, example_metadata)

# Run differential expression analysis for AML vs all others
de_results <- hd_de_limma(hd_object, case = "AML")

# Extract the up-regulated proteins for AML
sig_up_proteins_aml <- de_results$de_res |>
  dplyr::filter(adj.P.Val < 0.05 & logFC > 0) |>
  dplyr::pull(Feature)

# Perform ORA with `GO` database and `BP` ontology
enrichment <- hd_ora(sig_up_proteins_aml, database = "GO", ontology = "BP")
#> No background gene list provided. For meaningful enrichment results, it is recommended to specify a relevant background list of genes (e.g., the full proteome or a set of genes that could be impacted in your experiment). The absence of a background may lead to misleading results in the over-representation analysis (ORA).
#> 'select()' returned 1:1 mapping between keys and columns
#> Warning: coercing argument of type 'double' to logical

# Access the results
head(enrichment$enrichment@result)
#>                    ID                          Description GeneRatio   BgRatio
#> GO:0050900 GO:0050900                  leukocyte migration      7/22 404/18986
#> GO:0050926 GO:0050926    regulation of positive chemotaxis      3/22  27/18986
#> GO:0071674 GO:0071674           mononuclear cell migration      5/22 231/18986
#> GO:0045785 GO:0045785 positive regulation of cell adhesion      6/22 499/18986
#> GO:0001666 GO:0001666                  response to hypoxia      5/22 317/18986
#> GO:0036293 GO:0036293  response to decreased oxygen levels      5/22 331/18986
#>            RichFactor FoldEnrichment    zScore       pvalue     p.adjust
#> GO:0050900 0.01732673       14.95297  9.655234 2.427348e-07 0.0002854562
#> GO:0050926 0.11111111       95.88889 16.805104 3.879176e-06 0.0022259912
#> GO:0071674 0.02164502       18.67965  9.208160 5.678549e-06 0.0022259912
#> GO:0045785 0.01202405       10.37675  7.229716 1.667718e-05 0.0049030906
#> GO:0001666 0.01577287       13.61199  7.712662 2.621632e-05 0.0061660792
#> GO:0036293 0.01510574       13.03625  7.524183 3.224239e-05 0.0063195083
#>                  qvalue                         geneID Count
#> GO:0050900 0.0001729805 100/566/9048/25/2683/199/30817     7
#> GO:0050926 0.0013489045                   566/285/9048     3
#> GO:0071674 0.0013489045           566/9048/25/2683/199     5
#> GO:0045785 0.0029711711      100/566/54518/9289/25/199     6
#> GO:0001666 0.0037365159         100/285/51129/405/1386     5
#> GO:0036293 0.0038294908         100/285/51129/405/1386     5

# With a background gene list
enrichment <- hd_ora(sig_up_proteins_aml,
                     database = "GO",
                     ontology = "BP",
                     background = "olink_explore_ht")
#> 'select()' returned 1:1 mapping between keys and columns
#> 'select()' returned 1:1 mapping between keys and columns
#> Warning: 1.24% of input gene IDs are fail to map...
#> Warning: coercing argument of type 'double' to logical

# Access the results
head(enrichment$enrichment@result)
#>                    ID                                   Description GeneRatio
#> GO:0050900 GO:0050900                           leukocyte migration      7/22
#> GO:0050926 GO:0050926             regulation of positive chemotaxis      3/22
#> GO:0007186 GO:0007186  G protein-coupled receptor signaling pathway      7/22
#> GO:0040012 GO:0040012                      regulation of locomotion      9/22
#> GO:0001666 GO:0001666                           response to hypoxia      5/22
#> GO:0060135 GO:0060135 maternal process involved in female pregnancy      3/22
#>             BgRatio RichFactor FoldEnrichment   zScore       pvalue   p.adjust
#> GO:0050900 213/4854 0.03286385       7.250960 6.294827 2.756335e-05 0.02448193
#> GO:0050926  19/4854 0.15789474      34.837321 9.970744 7.473564e-05 0.02448193
#> GO:0007186 250/4854 0.02800000       6.177818 5.671555 7.740526e-05 0.02448193
#> GO:0040012 478/4854 0.01882845       4.154241 4.900306 1.244693e-04 0.02448193
#> GO:0001666 115/4854 0.04347826       9.592885 6.292129 1.305178e-04 0.02448193
#> GO:0060135  23/4854 0.13043478      28.778656 9.009686 1.349942e-04 0.02448193
#>                qvalue                                geneID Count
#> GO:0050900 0.02031746        100/566/9048/25/2683/199/30817     7
#> GO:0050926 0.02031746                          566/285/9048     3
#> GO:0007186 0.02031746         100/566/9289/181/25/976/30817     7
#> GO:0040012 0.02031746 100/566/9289/285/9048/25/59/51742/199     9
#> GO:0001666 0.02031746                100/285/51129/405/1386     5
#> GO:0060135 0.02031746                         285/181/59272     3