Perform gene set enrichment analysis — do_gsea

This function performs gene set enrichment analysis (GSEA) using the clusterProfiler package.

Usage

do_gsea(de_results, database = c("GO", "Reactome"), pval_lim = 0.05)

Arguments

de_results: A tibble containing the results of a differential expression analysis.
database: The database to use for the GSEA. Either "GO" or "Reactome".
pval_lim: The p-value threshold below which a term is considered significant.

Value

A list containing the results of the GSEA.

Details

The ontology option used when database = "GO" is "ALL".
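
GSEA in clusterProfiler operates on a ranked, named numeric vector: the values are a ranking statistic, the names are gene identifiers, and the vector is sorted in decreasing order. The geneList slot in the example output has this shape. The following is a minimal base R sketch of how such a vector can be built. It is not the internal code of do_gsea; the column names `entrez_id` and `logFC` and all values are made up for illustration.

```r
# Toy differential expression table.
# Assumption: the column names `entrez_id` and `logFC` are hypothetical and
# may differ from the columns that do_limma() actually returns.
de_toy <- data.frame(
  entrez_id = c("566", "328", "100", "9289"),
  logFC     = c(1.48, 1.54, -0.30, 1.21),
  stringsAsFactors = FALSE
)

# Named numeric vector: values are the ranking statistic, names are gene IDs
gene_list <- setNames(de_toy$logFC, de_toy$entrez_id)

# Sort in decreasing order, as a ranked gene list for GSEA should be
gene_list <- sort(gene_list, decreasing = TRUE)

print(gene_list)
```

Sorting a named vector keeps each name attached to its value, so the gene identifiers stay aligned with their statistics after ranking.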

Examples

# Run Differential Expression Analysis and extract results
control <- c("BRC", "CLL", "CRC", "CVX", "ENDC", "GLIOM", "LUNGC", "LYMPH", "MYEL", "OVC", "PRC")
de_res <- do_limma(example_data,
                   example_metadata,
                   case = "AML",
                   control = control,
                   wide = FALSE)
#> Comparing AML with BRC, CLL, CRC, CVX, ENDC, GLIOM, LUNGC, LYMPH, MYEL, OVC, PRC.
de_results <- de_res$de_results

# Run GSEA with GO database
do_gsea(de_results,
        database = "GO",
        pval_lim = 0.9)
#> 
#> 
#> 'select()' returned 1:1 mapping between keys and columns
#> using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).
#> preparing geneSet collections...
#> GSEA analysis...
#> leading edge analysis...
#> done...
#> #
#> # Gene Set Enrichment Analysis
#> #
#> #...@organism 	 Homo sapiens 
#> #...@setType 	 BP 
#> #...@keytype 	 ENTREZID 
#> #...@geneList 	 Named num [1:100] 1.54 1.48 1.4 1.21 1.12 ...
#>  - attr(*, "names")= chr [1:100] "566" "328" "100" "9289" ...
#> #...nPerm 	 
#> #...pvalues adjusted by 'BH' with cutoff <0.9 
#> #...265 enriched terms found
#> 'data.frame':	265 obs. of  11 variables:
#>  $ ID             : chr  "GO:0051641" "GO:0045184" "GO:0048585" "GO:0033036" ...
#>  $ Description    : chr  "cellular localization" "establishment of protein localization" "negative regulation of response to stimulus" "macromolecule localization" ...
#>  $ setSize        : int  25 10 18 19 15 15 42 10 15 14 ...
#>  $ enrichmentScore: num  -0.552 -0.713 0.657 -0.579 -0.612 ...
#>  $ NES            : num  -1.7 -1.69 1.68 -1.65 -1.64 ...
#>  $ pvalue         : num  0.01297 0.01367 0.00905 0.01525 0.02033 ...
#>  $ p.adjust       : num  0.501 0.501 0.501 0.501 0.501 ...
#>  $ qvalue         : num  0.482 0.482 0.482 0.482 0.482 ...
#>  $ rank           : num  10 10 24 13 10 10 15 20 10 10 ...
#>  $ leading_edge   : chr  "tags=24%, list=10%, signal=29%" "tags=30%, list=10%, signal=30%" "tags=50%, list=24%, signal=46%" "tags=26%, list=13%, signal=28%" ...
#>  $ core_enrichment: chr  "93974/306/115201/10551/351/284" "93974/115201/284" "100/285/25/51742/199/267/9938/59272/405" "55937/93974/115201/10551/284" ...
#> #...Citation
#>  T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
#>  clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
#>  The Innovation. 2021, 2(3):100141 
#> 
# Remember that the data is artificial, which is why we use an absurdly high p-value cutoff
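
The printed summary above reports that p-values are adjusted with the 'BH' method before the cutoff is applied. The following is a minimal base R sketch of that kind of filtering on made-up p-values. It only illustrates the idea and is not the code that do_gsea or clusterProfiler runs; the term names and the `significant` variable are hypothetical.

```r
# Made-up raw p-values for five hypothetical terms
pvalue <- c(term_a = 0.001, term_b = 0.004, term_c = 0.04,
            term_d = 0.2,   term_e = 0.8)

# Benjamini-Hochberg adjustment using base R
p_adjusted <- p.adjust(pvalue, method = "BH")

# Keep the terms whose adjusted p-value falls below the threshold
pval_lim <- 0.05
significant <- names(p_adjusted)[p_adjusted < pval_lim]

print(round(p_adjusted, 4))
print(significant)
```

In this toy example, term_c has a raw p-value below 0.05 but is not retained after adjustment. That is the behaviour to expect with real data and a conventional cutoff such as 0.05.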