On this page, we’ve assembled some resources you may find helpful for working with single-cell RNA-seq data.

Obtaining practice datasets

The Single-cell Pediatric Cancer Atlas (ScPCA)

The Childhood Cancer Data Lab builds and maintains the Single-cell Pediatric Cancer Atlas (ScPCA) Portal, a resource of single-cell and single-nuclei transcriptomic data generated by ALSF-funded labs and uniformly processed by the Data Lab. The data have been quantified, processed into Bioconductor SingleCellExperiment objects, and saved as .rds files, so they are ready for downstream analysis using the tools we have shown in this workshop. You can read more about how we process data in ScPCA and how you can use ScPCA datasets in our documentation.
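
As a minimal sketch of loading one of these files, assuming you have already downloaded data from the portal: the directory and file name below are placeholders rather than real ScPCA identifiers, so substitute the path to your own download.

```r
library(SingleCellExperiment)

# Placeholder path: replace with the location of a downloaded ScPCA .rds file
scpca_file <- file.path("data", "scpca", "example_library_processed.rds")

if (file.exists(scpca_file)) {
  # The objects are saved as .rds files, so readRDS() restores them directly
  sce <- readRDS(scpca_file)

  # Printing the object summarizes its dimensions, assays, and metadata
  print(sce)
}
```

The `file.exists()` check is only there so the sketch does nothing until the placeholder path points at a real file.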

The scRNAseq Bioconductor package

The scRNAseq Bioconductor package contains dozens of scRNA-seq datasets formatted as SingleCellExperiment objects.
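
A short sketch of how you might browse the package and retrieve one dataset. The choice of `ZeiselBrainData()` is arbitrary and only for illustration; any of the retrieval functions listed by `listDatasets()` could be used in the same way. Note that the data are downloaded the first time the function is called, so this requires an internet connection.

```r
library(scRNAseq)

# Table of the datasets the package provides,
# including the function call used to retrieve each one
available_datasets <- listDatasets()
head(available_datasets)

# Retrieve one dataset as a SingleCellExperiment object
# (downloaded and cached locally on first use)
sce_zeisel <- ZeiselBrainData()
sce_zeisel
```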

Tabula Muris data

This is a more extensive set of the Tabula Muris data (mouse tissue) that are used in our “Introduction to scRNA-seq” training. These samples, already processed by salmon alevin, can be found in the ~/shared-data/training-data/tabula-muris/alevin directory. Metadata, including tissue of origin for each sample (since the sample names themselves are not informative), can be found in ~/training-modules/scRNA-seq/data/tabula-muris/TM_droplet_metadata.csv. Note that this data is given at the cell level: simplifying the table to the sample level is a good opportunity to practice some data wrangling skills! (It is also a CSV file; don’t forget to use readr::read_csv() when loading it!)
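
One way to approach the cell-to-sample simplification is sketched below. To keep the example self-contained, it uses a small made-up table in place of the real file, and the column names (`cell`, `channel`, `tissue`) are assumptions: check the actual column names after loading the CSV with `readr::read_csv()` and adjust accordingly.

```r
library(dplyr)

# Toy stand-in for the cell-level metadata table.
# The column names and values here are made up for illustration.
cell_metadata <- tibble::tribble(
  ~cell,            ~channel,   ~tissue,
  "sample_A_cell1", "sample_A", "Lung",
  "sample_A_cell2", "sample_A", "Lung",
  "sample_A_cell3", "sample_A", "Lung",
  "sample_B_cell1", "sample_B", "Spleen",
  "sample_B_cell2", "sample_B", "Spleen"
)

# Collapse to one row per sample, keeping the tissue of origin
# and recording how many cells each sample contributed
sample_metadata <- cell_metadata %>%
  count(channel, tissue, name = "n_cells")

sample_metadata
```

Using `count()` rather than `distinct()` keeps the number of cells per sample, which can be handy when choosing samples to work with.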

Human Cell Atlas data

Another potential source for processed single-cell data is the Human Cell Atlas (HCA) Data Portal. The data here are from a mix of technologies, including 10X, Smart-seq2, and DropSeq. The HCA has standardized processing pipelines for 10X and Smart-seq2, though most of the processed data appears to be 10X, so we recommend focusing on those projects.

To download a data set, first browse or search to find a project of interest. Click on the project name to see an abstract and other information for the project.

You can then select “Project Matrices” from the left side to download the processed single-cell expression data. Scroll down to the “DCP Generated Matrices” section on the “Project Matrices” page, as the data here will be uniformly processed and in a standard data format. That format is called loom, and we can read it into R in a fairly straightforward way. Once you find a loom file listed (not all projects have one, unfortunately), you have two options:

  1. Click the “Copy download link” button (the tiny clipboard icon) and then use that URL to download the file directly to the RStudio server following these instructions. Be sure to put quotes around the very long URL that is provided, and specify a filename for the download with the -O option.

  2. Download the loom file to your computer (look for the tiny icon with the arrow pointing down) and upload it to the server following these instructions.
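
If you would rather stay within R for the first option, the download can also be sketched with base R's `download.file()`. This is an alternative to the terminal-based instructions, not a replacement for them; the helper name `download_loom()` and the URL in the example call are hypothetical placeholders.

```r
# Download a loom file from a URL to a destination path using base R.
# `download_loom()` is a hypothetical helper written for this example.
download_loom <- function(loom_url, dest_file, timeout_seconds = 3600) {
  # Create the destination directory if it does not exist yet
  dir.create(dirname(dest_file), recursive = TRUE, showWarnings = FALSE)

  # Large files can exceed R's default 60 second download timeout,
  # so raise it temporarily and restore the previous setting on exit
  old_options <- options(timeout = timeout_seconds)
  on.exit(options(old_options), add = TRUE)

  # mode = "wb" writes the file in binary mode, which loom (HDF5) files need
  download.file(loom_url, destfile = dest_file, mode = "wb")

  invisible(dest_file)
}

# Example call with a placeholder URL (note the quotes around the URL):
# download_loom(
#   "https://example.org/path/to/project.loom",
#   file.path("data", "hca", "project.loom")
# )
```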

Reading loom format data in R

Once you have a .loom file on the server, you can use the following commands in R to import the data as a SingleCellExperiment-compatible object.

# load SingleCellExperiment for assayNames(), rowData(), and colData()
library(SingleCellExperiment)

loomfile <- file.path("path", "to", "file.loom")
sce <- LoomExperiment::import(loomfile, type = "SingleCellLoomExperiment")
# the first assay matrix should be named "counts"
assayNames(sce)[1] <- "counts"

The last command ensures that the main data matrix, which contains count data, has the name that the SingleCellExperiment commands expect.

The gene and cell identifiers are stored in rowData and colData respectively, but those identifiers aren’t used as row names and column names. To make the format a little closer to what we work with during instruction (and so we can visualize individual genes), we need to do the following:

rownames(sce) <- rowData(sce)$Gene
colnames(sce) <- colData(sce)$CellID

Once that is done, all of the SingleCellExperiment commands that we have demonstrated should work! You will want to be sure to look at rowData() and colData(), as some of the contents will be different from what we have seen in previous data sets (and may vary among projects). Some of the QC calculations may have already been performed, but the data will not be filtered or normalized. You will need to perform those steps on your own.
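
The remaining steps might look something like the sketch below. To keep it runnable on its own, simulated data from `scuttle::mockSCE()` stand in for an imported loom file. The filtering thresholds are arbitrary placeholders, and library-size normalization with `scuttle::logNormCounts()` is just one simple option, so choose cutoffs and a normalization method that suit your own data.

```r
library(SingleCellExperiment)

set.seed(2024)

# Simulated data standing in for an object imported from a loom file
sce <- scuttle::mockSCE(ncells = 200, ngenes = 1000)

# Check which per-cell and per-gene annotations are already present
colnames(colData(sce))
colnames(rowData(sce))

# Calculate per-cell QC metrics: `sum` (total counts) and
# `detected` (number of genes with nonzero counts) are added to colData
sce <- scuttle::addPerCellQCMetrics(sce)

# Filter cells using placeholder thresholds (adjust for your data)
keep_cells <- sce$sum >= 500 & sce$detected >= 100
sce_filtered <- sce[, keep_cells]

# Normalize by library size and log-transform;
# this adds a "logcounts" assay alongside "counts"
sce_filtered <- scuttle::logNormCounts(sce_filtered)
assayNames(sce_filtered)
```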

Additional single-cell RNA-seq resources

This list provides some links to external resources on single-cell RNA-seq analysis methods that may be useful to you as you develop your own single-cell RNA-seq analysis skills and practices. Please note, this is not an exhaustive list! It includes multiple types of resources for various topics in single-cell RNA-seq analysis, but does not represent the complete breadth of analysis topics. Resources are listed by topic and in alphabetical order, not in order of recommendation.

General single-cell resources

Alignment and quantification of gene expression

Filtering and normalization

Dimensionality reduction and clustering

Cell type annotation

CITE-seq

Integrating scRNA-seq samples

Differential expression analysis

Differential abundance analysis