This notebook will demonstrate how to:
clusterProfiler
packageenrichplot
packageIn this notebook, we will perform Gene Set Enrichment Analysis (GSEA) on the neuroblastoma cell line differential gene expression (DGE) results we generated during the RNA-seq module.
To refresh our memory, we analyzed data from Harenza et al.
(2017) and specifically tested for DGE between with MYCN
amplified cell lines and non-amplified cell lines using
DESeq2
. We have a table of results that contains our log2
fold changes and adjusted p-values.
GSEA is a functional class scoring (FCS) approach to pathway analysis that was first introduced in Subramanian et al. (2005). The rationale behind FCS approaches is that small changes in individual genes that participate in the same biological process or pathway can be significant and of biological interest. FCS methods are better suited for identifying these pathways that show coordinated changes than ORA. In ORA, we pick a cutoff that typically only captures genes with large individual changes.
There are 3 general steps in FCS methods (Khatri et al. 2012):
clusterProfiler
for GSEA, see
Intro
to DGE: Functional Analysis. from Harvard Chan Bioinformatics Core
Training.clusterProfiler
here uses
fgsea
(Fast Gene Set Enrichment Analysis) under the hood.
You can read more about fgsea in Korotkevich et al.
(2021).# Package to run GSEA
library(clusterProfiler)
clusterProfiler v4.12.0 For help: https://yulab-smu.top/biomedical-knowledge-mining-book/
If you use clusterProfiler in published research, please cite:
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation. 2021, 2(3):100141
Attaching package: 'clusterProfiler'
The following object is masked from 'package:stats':
filter
# Package that contains the MSigDB gene sets in tidy format
library(msigdbr)
# Where the DGE results are stored
input_dir <- file.path("..", "RNA-seq", "results", "NB-cell")
# We will create a directory to specifically hold our GSEA results if it does
# not yet exist
output_dir <- file.path("results", "NB-cell")
if (!dir.exists(output_dir)) {
dir.create(output_dir, recursive = TRUE)
}
# DGE results
dge_results_file <- file.path(input_dir,
"NB-cell_DESeq_amplified_v_nonamplified_results.tsv")
# GSEA pathway-level scores and statistics
gsea_results_file <- file.path(output_dir,
"NB-cell_gsea_results.tsv")
Adapted from refine.bio examples