pkgdown/extra.css

Skip to contents

hd_gsea() performs gene set enrichment analysis (GSEA) using the clusterProfiler package.

Usage

hd_gsea(
  de_results,
  database = c("GO", "Reactome", "KEGG"),
  ontology = c("BP", "CC", "MF", "ALL"),
  ranked_by = "logFC",
  pval_lim = 0.05
)

Arguments

de_results

An hd_de object from hd_de_limma() or hd_de_ttest() or a tibble containing the results of a differential expression analysis.

database

The database to perform the ORA. It can be either "GO", "KEGG", or "Reactome".

ontology

The ontology to use when database = "GO". It can be "BP" (Biological Process), "CC" (Cellular Component), "MF" (Molecular Function), or "ALL". In the case of KEGG and Reactome, this parameter is ignored.

ranked_by

The variable to rank the proteins. It can be "logFC", "both" which is the product of logFC and -log(adj.P.Val) or a custom sorting variable. It should be however a column in the DE results tibble (de_results argument).

pval_lim

The p-value threshold to consider a term as significant in the enrichment analysis. Default is 0.05.

Value

A list containing the results of the GSEA.

Details

To perform the GSEA, clusterProfiler package is used. For more information, please refer to the clusterProfiler documentation.

If you want to learn more about GSEA, please refer to the following publications:

  • Chicco D, Agapito G. Nine quick tips for pathway enrichment analysis. PLoS Comput Biol. 2022 Aug 11;18(8):e1010348. doi: 10.1371/journal.pcbi.1010348. PMID: 35951505; PMCID: PMC9371296. https://pmc.ncbi.nlm.nih.gov/articles/PMC9371296/

  • https://yulab-smu.top/biomedical-knowledge-mining-book/enrichment-overview.html#gsea-algorithm

Examples

# Initialize an HDAnalyzeR object
hd_object <- hd_initialize(example_data, example_metadata)

# Run differential expression analysis for AML vs all others
de_results <- hd_de_limma(hd_object, case = "AML")

# Run GSEA with GO database
hd_gsea(de_results,
        database = "GO",
        ontology = "BP",
        ranked_by = "logFC",
        pval_lim = 0.9)
#> 
#> 
#> 'select()' returned 1:1 mapping between keys and columns
#> using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).
#> preparing geneSet collections...
#> GSEA analysis...
#> leading edge analysis...
#> done...
#> $gene_list
#>          328          566          100         9289        54518         9048 
#>  1.559583713  1.532943471  1.462382647  1.231859388  1.173013982  0.828793582 
#>          285          181           25          191        51129         9296 
#>  0.772836518  0.756171703  0.746830743  0.734041132  0.522682913  0.519848239 
#>           59        59272        51742        25814          199          267 
#>  0.504682910  0.466035852  0.458170283  0.451765263  0.432416150  0.428812794 
#>         2683         9938          976        30817         1386          405 
#>  0.421745793  0.397319743  0.365431820  0.359446007  0.338581655  0.337841863 
#>        23452        10159        51816          374          410          133 
#>  0.292444281  0.271377450  0.253959306  0.242008838  0.226357122  0.199535962 
#>       170689        51327          411        84335          214       375790 
#>  0.177164675  0.175706968  0.173203848  0.170442918  0.162775605  0.159570184 
#>         8751           95       129138          308        80755          558 
#>  0.123946384  0.122930955  0.113185504  0.108778505  0.106165339  0.093214100 
#>         9068          283          218          258          189        81693 
#>  0.092176270  0.086140961  0.081630236  0.059789250  0.045913185  0.043794218 
#>          290           54           81         8745           51         1109 
#>  0.026847324  0.019847155 -0.003178737 -0.009723351 -0.009971140 -0.016368666 
#>          231          177          383          117          392          350 
#> -0.018310077 -0.038228906 -0.039306455 -0.043239615 -0.054674439 -0.076035424 
#>        51382          539          333          419       155465         9131 
#> -0.097147761 -0.098498094 -0.104025127 -0.104158209 -0.104553945 -0.110089916 
#>          127         8312          101          259           30        27329 
#> -0.129895411 -0.135089674 -0.138922933 -0.160969394 -0.161931907 -0.183584355 
#>         8639          216       347902          279        11199           26 
#> -0.194111783 -0.195512297 -0.200816339 -0.215411972 -0.244774741 -0.245691583 
#>        10218       170690        51205        11093        10000          475 
#> -0.272748339 -0.298014216 -0.308354203 -0.314105581 -0.333881970 -0.337183300 
#>        11095          307        93974        10149          280          176 
#> -0.339134804 -0.342512836 -0.356798140 -0.375853514 -0.376116879 -0.383419881 
#>          311        55937          203          306       115201        10551 
#> -0.404836340 -0.440825951 -0.481572885 -0.619682407 -0.629226223 -0.704655666 
#>        23365          351          250          284 
#> -0.806377522 -0.822744636 -1.036735621 -1.696033915 
#> 
#> $enrichment
#> #
#> # Gene Set Enrichment Analysis
#> #
#> #...@organism 	 Homo sapiens 
#> #...@setType 	 BP 
#> #...@keytype 	 ENTREZID 
#> #...@geneList 	 Named num [1:100] 1.56 1.53 1.46 1.23 1.17 ...
#>  - attr(*, "names")= chr [1:100] "328" "566" "100" "9289" ...
#> #...nPerm 	 
#> #...pvalues adjusted by 'BH' with cutoff <0.9 
#> #...236 enriched terms found
#> 'data.frame':	236 obs. of  11 variables:
#>  $ ID             : chr  "GO:0015031" "GO:0071495" "GO:0065007" "GO:0050789" ...
#>  $ Description    : chr  "protein transport" "cellular response to endogenous stimulus" "biological regulation" "regulation of biological process" ...
#>  $ setSize        : int  10 13 76 71 11 20 11 16 16 42 ...
#>  $ enrichmentScore: num  -0.769 0.738 0.552 0.526 -0.706 ...
#>  $ NES            : num  -1.88 1.73 1.63 1.58 -1.8 ...
#>  $ pvalue         : num  0.00227 0.00331 0.00166 0.00395 0.00664 ...
#>  $ p.adjust       : num  0.241 0.241 0.241 0.241 0.272 ...
#>  $ qvalue         : num  0.231 0.231 0.231 0.231 0.261 ...
#>  $ rank           : num  7 24 28 28 15 15 7 7 7 19 ...
#>  $ leading_edge   : chr  "tags=30%, list=7%, signal=31%" "tags=62%, list=24%, signal=54%" "tags=36%, list=28%, signal=107%" "tags=37%, list=28%, signal=91%" ...
#>  $ core_enrichment: chr  "115201/351/284" "9289/285/181/25/59/51742/1386/405" "328/566/100/9289/54518/9048/285/181/25/51129/9296/59/59272/51742/25814/199/267/2683/9938/976/30817/1386/405/234"| __truncated__ "328/566/100/9289/54518/9048/285/181/25/51129/59/59272/51742/25814/199/267/2683/9938/976/30817/1386/405/23452/10159/51816/374" ...
#> #...Citation
#> S Xu, E Hu, Y Cai, Z Xie, X Luo, L Zhan, W Tang, Q Wang, B Liu, R Wang, W Xie, T Wu, L Xie, G Yu. Using clusterProfiler to characterize multiomics data. Nature Protocols. 2024, 19(11):3292-3320 
#> 
#> 
#> attr(,"class")
#> [1] "hd_enrichment"
# Remember that the data is artificial, this is why we use an absurdly high p-value cutoff

# Run GSEA with different ranking variable
enrichment <- hd_gsea(de_results,
                      database = "GO",
                      ontology = "BP",
                      ranked_by = "both",
                      pval_lim = 0.9)
#> 'select()' returned 1:1 mapping between keys and columns
#> using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).
#> preparing geneSet collections...
#> GSEA analysis...
#> Warning: There are ties in the preranked stats (22% of the list).
#> The order of those tied genes will be arbitrary, which may produce unexpected results.
#> Warning: All values in the stats vector are greater than zero and scoreType is "std", maybe you should switch to scoreType = "pos".
#> leading edge analysis...
#> done...

# Access the results
head(enrichment$enrichment@result)
#>                    ID                               Description setSize
#> GO:0003008 GO:0003008                            system process      15
#> GO:0080134 GO:0080134          regulation of response to stress      13
#> GO:0090304 GO:0090304            nucleic acid metabolic process      16
#> GO:0044093 GO:0044093 positive regulation of molecular function      10
#> GO:0002252 GO:0002252                   immune effector process      12
#> GO:0044249 GO:0044249             cellular biosynthetic process      40
#>            enrichmentScore      NES     pvalue  p.adjust    qvalue rank
#> GO:0003008       0.7263352 1.324918 0.01822844 0.8108102 0.8108102   19
#> GO:0080134       0.7333143 1.323335 0.02369916 0.8108102 0.8108102   18
#> GO:0090304       0.7151243 1.311893 0.02395296 0.8108102 0.8108102   22
#> GO:0044093       0.7465882 1.307659 0.03423968 0.8108102 0.8108102   19
#> GO:0002252       0.7285557 1.300938 0.03524673 0.8108102 0.8108102   10
#> GO:0044249       0.6657842 1.285208 0.01548580 0.8108102 0.8108102   25
#>                              leading_edge
#> GO:0003008 tags=27%, list=19%, signal=25%
#> GO:0080134 tags=38%, list=18%, signal=36%
#> GO:0090304 tags=44%, list=22%, signal=41%
#> GO:0044093 tags=50%, list=19%, signal=45%
#> GO:0002252 tags=25%, list=10%, signal=26%
#> GO:0044249 tags=38%, list=25%, signal=47%
#>                                                           core_enrichment
#> GO:0003008                                             231/177/81693/8312
#> GO:0080134                                             54/383/177/117/350
#> GO:0090304                                 81/177/9131/333/8312/80755/283
#> GO:0044093                                           81/177/9131/350/8312
#> GO:0002252                                                     54/383/177
#> GO:0044249 81/51/231/1109/54/189/383/177/117/333/350/8312/80755/283/51327