Resources for Working with Single-Cell RNA-seq
On this page, we’ve assembled some resources you may find helpful for working with single-cell RNA-seq data.
Obtaining practice datasets
The Single-Cell Pediatric Cancer Atlas (ScPCA)
The Childhood Cancer Data Lab builds and maintains the Single-cell Pediatric Cancer Atlas (ScPCA) Portal, a resource of single-cell/nuclei transcriptomic data generated by ALSF-funded labs and uniformly processed by the Data Lab.
The data here have been quantified and processed into Bioconductor SingleCellExperiment objects, saved as .rds files, and are ready for downstream analysis using the tools we have shown in this workshop.
You can read more about how we process data in ScPCA and how you can use ScPCA datasets in our documentation.
The scRNAseq Bioconductor package
The scRNAseq Bioconductor package contains dozens of scRNA-seq datasets formatted as SingleCellExperiment objects.
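As a rough illustration, a dataset from this package can be loaded with a single function call. The sketch below uses ZeiselBrainData(), one of the dataset functions the package provides; the choice of dataset is only an example. It assumes the scRNAseq and SingleCellExperiment packages are installed and that the session has internet access, since the first call downloads the data.

```r
library(SingleCellExperiment)

# Load one example dataset; the first call downloads and caches it.
# ZeiselBrainData() is only one of the datasets in the scRNAseq package.
sce <- scRNAseq::ZeiselBrainData()

# Print a summary of the object and peek at the per-cell metadata
sce
head(colData(sce))
```

The package documentation lists the other available datasets, each of which has its own loading function.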
Tabula Muris data
This is a more extensive set of the Tabula Muris data (mouse tissue) that are used in our “Introduction to scRNA-seq” training.
These samples, already processed by salmon alevin, can be found in the ~/shared-data/training-data/tabula-muris/alevin directory.
Metadata, including tissue of origin for each sample (since the sample names themselves are not informative), can be found in ~/training-modules/scRNA-seq/data/tabula-muris/TM_droplet_metadata.csv.
Note that this data is given at the cell level: simplifying the table to the sample level is a good opportunity to practice some data wrangling skills!
(It is also a CSV file; don’t forget to use readr::read_csv() when loading it!)
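One possible way to collapse the cell-level table to one row per sample is sketched below. The column names channel (the sample identifier) and tissue are assumptions about the contents of the file, and summarize_samples() is a hypothetical helper written for this example; check the real column names with colnames() and adjust before relying on it.

```r
# Path to the cell-level metadata file described above
metadata_file <- file.path(
  "~", "training-modules", "scRNA-seq", "data",
  "tabula-muris", "TM_droplet_metadata.csv"
)

# Hypothetical helper: one row per sample, with the number of cells in each.
# ASSUMPTION: `channel` and `tissue` are column names in the metadata table.
summarize_samples <- function(cell_metadata) {
  dplyr::count(cell_metadata, channel, tissue, name = "n_cells")
}

# Only read the file if it is present (it lives on the training server)
if (file.exists(metadata_file)) {
  cell_metadata <- readr::read_csv(metadata_file)
  sample_metadata <- summarize_samples(cell_metadata)
  print(head(sample_metadata))
}
```

Counting rows per sample, rather than only dropping duplicates, also records how many cells each sample contributed, which can be handy when choosing samples to practice with.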
Human Cell Atlas data
Another potential source for processed single-cell data is the Human Cell Atlas (HCA) Data Portal. The data here is from a mix of technologies, including 10X, Smart-seq2, and DropSeq. The HCA has standardized processing pipelines for 10X and Smart-seq2, though it seems that most of the processed data is 10X, so we recommend focusing on those projects.
To download a data set, first browse or search to find a project of interest. Click on the project name to see an abstract and other information for the project.
You can then select “Project Matrices” from the left side to download the processed single-cell expression data.
Scroll down to the “DCP Generated Matrices” section on the “Project Matrices” page, as the data here will be uniformly processed and in a standard data format.
That format is called loom, and we can read it into R in a fairly straightforward way.
Once you find a loom file listed (not all projects have one, unfortunately), you have two options:
- Click the “Copy download link” button (the tiny clipboard icon) and then use that URL to download the file directly to the RStudio server following these instructions. Be sure to put quotes around the very long URL that is provided, and specify a filename for the download with the -O option.
- Download the loom file to your computer (look for the tiny icon with the arrow pointing down) and upload it to the server following these instructions.
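If you would rather stay inside R for the first option, the download can also be sketched with base R's download.file(). This is an alternative to the command-line approach with the -O option, not a description of the linked instructions. The helper name download_loom() and the one-hour timeout are assumptions made for this example.

```r
# Hypothetical helper: download a file from a URL to a chosen path.
download_loom <- function(url, destfile, timeout_sec = 3600) {
  # download.file() gives up after getOption("timeout") seconds
  # (60 by default), which is often too short for large files
  old_options <- options(timeout = timeout_sec)
  on.exit(options(old_options))
  # mode = "wb" writes the file as binary so it is not altered on download
  status <- download.file(url, destfile = destfile, mode = "wb")
  # download.file() returns 0 on success
  stopifnot(status == 0)
  invisible(destfile)
}

# Usage: paste the copied download link between the quotes
# download_loom("<copied download link>", file.path("path", "to", "file.loom"))
```

As with the command-line approach, the URL must be quoted and the destination filename is chosen by you.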
Reading loom format data in R
Once you have a .loom file on the server, you can use the following commands in R to import the data as a SingleCellExperiment-compatible object.
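# load the package that provides assayNames(), rowData(), and colData()
library(SingleCellExperiment)
loomfile <- file.path("path", "to", "file.loom")
# import the loom file as a SingleCellLoomExperiment object
sce <- LoomExperiment::import(loomfile, type = "SingleCellLoomExperiment")
# the first assay matrix should be named "counts"
assayNames(sce)[1] <- "counts"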
The last command is to be sure that the main data matrix, which contains count data, has the name that the SingleCellExperiment commands expect.
The gene and cell identifiers are stored in rowData and colData respectively, but those identifiers aren’t used as row names and column names.
To make the format a little closer to what we work with during instruction (and so we can visualize individual genes), we need to do the following:
rownames(sce) <- rowData(sce)$Gene
colnames(sce) <- colData(sce)$CellID
Once that is done, all of the SingleCellExperiment commands that we have demonstrated should work!
You will want to be sure to look at rowData() and colData(), as some of the contents will be different from what we have seen in previous data sets (and may vary among projects).
Some of the QC calculations may have already been performed, but the data will not be filtered or normalized.
You will need to perform those steps on your own.
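A minimal sketch of those two steps is shown below, using functions from the scuttle and scran Bioconductor packages. The helper name filter_and_normalize() and the cutoff of 500 detected genes are hypothetical choices for illustration only; appropriate filtering thresholds depend on the dataset and should be chosen after looking at the QC statistics.

```r
library(SingleCellExperiment)

# Hypothetical helper: filter cells by detected genes, then normalize.
# `min_detected` is an illustrative cutoff, not a recommendation.
filter_and_normalize <- function(sce, min_detected = 500) {
  # add per-cell QC statistics (`sum`, `detected`, `total`) to colData
  sce <- scuttle::addPerCellQC(sce)
  # keep only cells with at least `min_detected` genes detected
  sce <- sce[, sce$detected >= min_detected]
  # rough clustering so size factors are estimated among similar cells
  qclust <- scran::quickCluster(sce)
  sce <- scran::computeSumFactors(sce, clusters = qclust)
  # add a "logcounts" assay of log-normalized expression values
  scuttle::logNormCounts(sce)
}

# Usage, assuming `sce` is the object imported from the loom file above:
# sce_norm <- filter_and_normalize(sce, min_detected = 500)
```

The function expects an assay named "counts", which is why renaming the first assay after import matters.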
Additional single-cell RNA-seq resources
This list provides some links to external resources on single-cell RNA-seq analysis methods that may be useful to you as you develop your own single-cell RNA-seq analysis skills and practices. Please note, this is not an exhaustive list! It includes multiple types of resources for various topics in single-cell RNA-seq analysis, but does not represent the complete breadth of analysis topics. Resources are listed by topic and in alphabetical order, not in order of recommendation.
General Single-cell resources
- An introduction to the SingleCellExperiment class - Bioconductor
- Analysis of single cell RNA-seq data - Hemberg Lab
- Best practices for single-cell analysis across modalities - Heumos et al. (2023)
- Orchestrating Single-cell Analysis (OSCA) with Bioconductor - Bioconductor
Alignment and quantification of gene expression
- A like-for-like comparison of lightweight-mapping pipelines for single-cell RNA-seq data pre-processing - Zakeri et al. (2021)
- Alevin-fry unlocks rapid, accurate and memory-frugal quantification of single-cell RNA-seq data - He et al. (2022)
- Cell Ranger Overview - 10X Genomics
- Comparative analysis of common alignment tools for single-cell RNA sequencing - Bruning et al. (2022)
Filtering and normalization
- EmptyDrops: distinguishing cells from empty droplets in droplet-based single-cell RNA sequencing data - Lun et al. (2019)
- miQC: An adaptive probabilistic framework for quality control of single-cell RNA-sequencing data - Hippen et al. (2021)
- OSCA Basics
- Utilities for handling droplet-based single-cell RNA-seq data: Detecting empty droplets
Dimensionality reduction and clustering
- OSCA chapter on clustering
- OSCA chapter on clustering metrics
- OSCA chapter on dimensionality reduction
- PCA - Principal Component Analysis
- Principal Component Analysis - A Brief Introduction
Cell type annotation
- AUCell: Identifying cells with active gene sets
- Azimuth
- Identifying cell types to interpret scRNA-seq data: how, why and more possibilities - Wang et al. (2020)
- OSCA chapter on cell type annotation
- scType
- The SingleR Book - Bioconductor
- Web resources for cell type annotation - 10X Genomics
CITE-seq
- OSCA chapter on integrating with protein abundance
- Simultaneous epitope and transcriptome measurement in single cells - Stoeckius et al. (2017)
Integrating scRNA-seq samples
- A description of the theory behind the fastMNN algorithm
- Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors - Haghverdi et al. (2018)
- Benchmarking atlas-level data integration in single-cell genomics - Luecken et al. (2021)
- Fast, sensitive and accurate integration of single-cell data with Harmony - Korsunsky et al. (2019)
- OSCA section on multi-sample analyses
Differential expression analysis
- Batch effects and the effective design of single-cell gene expression studies - Tung et al. (2017)
- Differential gene expression analyses in scRNA-seq data between conditions with biological replicates - 10X Genomics
- Harvard-Chan Bioinformatics Core tutorial on differential expression analysis with DESeq2
- OSCA chapter on DE analyses between conditions