Resources for Consultation Sessions
Our consultation sessions are designed so that you can spend your time as you like, with the support of your instructors.
You can review instruction materials, work through exercise notebooks we provide, or analyze your own data.
On this page, we’ve assembled some resources you may find helpful during these sessions. For more information about the structure of consultation sessions and how to get help, please review the Consultation sessions section of the Workshop Structure page.
Table of contents
- Module cheatsheets
- Working with your own data on RStudio Server
- Obtaining practice datasets
- Transcriptome indices for common organisms
Module cheatsheets
The `modules-cheatsheets` directory of our GitHub repository of training materials contains Markdown and PDF versions of “cheatsheets”: tables with short descriptions of the functions used throughout the training modules, along with links to documentation.
- Introduction to R/Tidyverse cheatsheet (View Markdown, Download PDF)
- RNA-seq module cheatsheet (View Markdown, Download PDF)
You may find these helpful as you review instruction material or work through exercise notebooks.
Working with your own data on RStudio Server
If you plan to work with your own data during consultations, you may find it helpful to use our RStudio Server.
You can find instructions for working with your own data on RStudio Server here. Please read these instructions carefully.
We’ll reiterate some of the most important points from those instructions below:
- As a rule of thumb, if the data you are working with would be released under controlled access, rather than made publicly available, at the time of publication of a scientific manuscript, it should not be uploaded to our RStudio Server.
- You have 50GB of space available. If your data is larger than 50GB, please contact an instructor.
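To check ahead of time whether a folder of data fits within the 50GB limit, a shell sketch like the following could be run in a terminal on the machine where the data currently lives. The function name `check_size` and the folder in the usage comment are hypothetical, and the sketch assumes 1 GB = 1024 × 1024 KB, which may differ slightly from how the server counts space.

```shell
# Report how much space a directory uses and whether it fits within a limit.
# Usage: check_size <directory> [limit_in_GB]   (the limit defaults to 50)
check_size() {
  dir="$1"
  limit_gb="${2:-50}"
  # du -sk prints the total size of the directory in kilobytes (1024 bytes)
  used_kb=$(du -sk "$dir" | cut -f1)
  # Assumption: 1 GB is counted as 1024 * 1024 KB
  limit_kb=$((limit_gb * 1024 * 1024))
  if [ "$used_kb" -le "$limit_kb" ]; then
    echo "OK: $dir uses ${used_kb} KB (limit ${limit_gb} GB)"
  else
    echo "TOO LARGE: $dir uses ${used_kb} KB (limit ${limit_gb} GB)"
    return 1
  fi
}

# Example (hypothetical folder name):
#   check_size ~/my_data
```

The function returns a non-zero status when the folder exceeds the limit, so it can also be used in a script that decides whether to contact an instructor before uploading.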
Obtaining practice datasets
The Childhood Cancer Data Lab built and maintains refine.bio, a resource of uniformly processed transcriptomic data obtained from publicly available sources. You can read more about how we process data in refine.bio in our documentation.
If you’d like to practice some of the skills we cover in training or gain some additional ones, like making highly customizable heatmaps with the `ComplexHeatmap` R package, obtaining processed data from refine.bio is a great starting point.
You may find our examples for working with data from refine.bio helpful as you look to practice and expand your skills.
In those examples, we use R Notebooks, which you will be familiar with from this workshop!
See the “Getting Started” section for more information on utilizing our example notebooks.
You can start by searching refine.bio for keywords relevant to your scientific questions and filtering to the organism and technology (e.g., microarray vs. RNA-seq; refine.bio contains both) you’re interested in.
Microarray data
In this version of our workshop, we won’t work with microarray data, but there are hundreds of thousands of microarray samples available from refine.bio.
The microarray datasets you can download from the refine.bio web interface are quantile normalized and are distributed as TSV files you can read into R using functions we cover in training.
The metadata is included in your download in a TSV file that starts with `metadata_`.
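Before reading the files into R, it can help to confirm from a terminal which metadata file came with the download and which columns it holds. The sketch below assumes the download has already been unzipped into a folder. The function names `find_metadata` and `tsv_columns` and the folder in the usage comment are hypothetical.

```shell
# Print the path of every metadata_*.tsv file in a download folder.
find_metadata() {
  for f in "$1"/metadata_*.tsv; do
    # If nothing matched, the pattern is left unexpanded, so test for existence
    if [ -e "$f" ]; then
      echo "$f"
    fi
  done
}

# Print the column names of a tab-separated file, one per line.
tsv_columns() {
  head -n 1 "$1" | tr '\t' '\n'
}

# Example (hypothetical folder name):
#   find_metadata my_refinebio_download
#   tsv_columns "$(find_metadata my_refinebio_download | head -n 1)"
```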
You may find our microarray example notebooks for working with refine.bio data helpful with your differential expression, dimension reduction, or GSEA pathway analyses, to name a few.
Note that our training material is largely RNA-seq specific, so if you obtain microarray data from refine.bio, you should not expect to use the exact same code as we do in training.
RNA-seq data
The format of the RNA-seq data you can download from the refine.bio web interface will be slightly different from what we cover in training.
Before you download it, we summarize the data to the gene level with `tximport` (docs), rather than with `tximeta` as we do in training.
When downloading your data from refine.bio, we recommend checking the box that says “Skip quantile normalization for RNA-seq samples” to obtain the non-quantile normalized data (docs).
You will receive a TSV file that you can use as the counts matrix input for a `DESeqDataSet`.
Note that we recommend using non-quantile normalized data because the `DESeqDataSetFromMatrix()` function requires a counts matrix, not a matrix of normalized or corrected values such as TPMs.
See this nice `DESeq2` vignette for more information (Love et al., 2014).
You can read more about using `DESeq2` with refine.bio data here.
If you identify an RNA-seq experiment from refine.bio that you’d like to use with `DESeq2` (specifically with `DESeqDataSetFromMatrix()`), you can begin by following the instructions in the “Obtain the dataset from refine.bio” section of any of our RNA-seq refinebio example notebooks. Continue through the steps up to the “Create a DESeqDataset” section, as these steps remain largely the same across notebooks. Note that to create a `DESeqDataSet` object you will also need the associated metadata file, which is included in your download as a TSV file whose name starts with `metadata_`.
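`DESeqDataSetFromMatrix()` expects the rows of the sample metadata to be in the same order as the columns of the counts matrix, so a quick terminal check before moving into R can save some debugging. The sketch below assumes, without confirmation from this page, that the expression TSV has gene identifiers in its first column and sample accessions as the remaining column names, and that the metadata TSV lists the sample accession in its first column. All function names and the file names in the usage comment are hypothetical.

```shell
# Print the sample IDs from the header of an expression TSV
# (every column name after the first, one per line).
expression_samples() {
  head -n 1 "$1" | tr '\t' '\n' | tail -n +2
}

# Print the first column of a metadata TSV, skipping the header row.
metadata_samples() {
  tail -n +2 "$1" | cut -f1
}

# Succeed only if both files list the same samples in the same order.
samples_match() {
  [ "$(expression_samples "$1")" = "$(metadata_samples "$2")" ]
}

# Example (hypothetical file names):
#   samples_match SRP000000.tsv metadata_SRP000000.tsv && echo "samples line up"
```

If the check fails, the metadata can be reordered in R to match the counts matrix before the `DESeqDataSet` is created.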
Transcriptome indices for common organisms
During the introduction to bulk RNA-seq module, we used human data and included a transcriptome index for human in `training-modules/RNA-seq/index/`.
If you have non-human RNA-seq data you would like to quantify, or want to experiment with slightly different index parameters, we have prepared indices for select organisms relevant to the study of childhood cancer.
Note that for most of these, you will need to perform a few extra steps to read in the quantification data with `tximeta` after performing quantification.
Please see the notebook `RNA-seq/00c-tximeta_other_species.Rmd` for details on how to set this up.
If you have RNA-seq data for an organism that is not listed, please post in the training-specific Slack channel and let your instructors know.
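Each organism listed on this page has two indices, `short` (`-k 23`) and `long` (`-k 31`), and the choice depends on read length. As a rough sketch of how that choice and the resulting path might be assembled, the shell functions below pick an index from the read length and print a `salmon quant` command without running it. The function names, the FASTQ file names, and the output folder are hypothetical. The `-l A` (automatic library type), `-1`/`-2`, and `-o` options are standard `salmon quant` usage for paired-end reads rather than settings taken from this page, so the options used in the training module may differ.

```shell
# Pick the index subdirectory from the read length, following the tables on
# this page: "short" (-k 23) for reads under 75bp, "long" (-k 31) otherwise.
pick_index() {
  if [ "$1" -lt 75 ]; then
    echo "short"
  else
    echo "long"
  fi
}

# Print (but do not run) a salmon quant command for paired-end reads.
# Arguments: refgenie folder name, read length, R1 file, R2 file, output folder
salmon_command() {
  index="$HOME/shared-data/reference/refgenie/$1/salmon_index/$(pick_index "$2")"
  echo "salmon quant -i $index -l A -1 $3 -2 $4 -o $5"
}

# Hypothetical 50bp mouse reads; the file names are placeholders.
salmon_command mm10_cdna 50 sample_R1.fastq.gz sample_R2.fastq.gz quant/sample
```

Printing the command first makes it easy to confirm the index path before starting a long quantification run. The tables mention `--validateMappings` alongside the `-k 23` index, and it can be appended to the printed command if that behavior is wanted.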
Homo sapiens
Ensembl GRCh38 (hg38) v95
File description | File use | File path |
---|---|---|
Human Salmon index `-k 23` | Salmon index for use with `salmon quant`; appropriate for reads shorter than 75bp or for increased sensitivity with `--validateMappings` (docs) | `~/shared-data/reference/refgenie/hg38_cdna/salmon_index/short` |
Human Salmon index `-k 31` | Salmon index for use with `salmon quant`; appropriate for reads 75bp or longer (docs) | `~/shared-data/reference/refgenie/hg38_cdna/salmon_index/long` |
Mus musculus
Ensembl GRCm38 (mm10) v95
File description | File use | File path |
---|---|---|
Mouse Salmon index `-k 23` | Salmon index for use with `salmon quant`; appropriate for reads shorter than 75bp or for increased sensitivity with `--validateMappings` (docs) | `~/shared-data/reference/refgenie/mm10_cdna/salmon_index/short` |
Mouse Salmon index `-k 31` | Salmon index for use with `salmon quant`; appropriate for reads 75bp or longer (docs) | `~/shared-data/reference/refgenie/mm10_cdna/salmon_index/long` |
Danio rerio
Ensembl GRCz11 v95
File description | File use | File path |
---|---|---|
Zebrafish Salmon index `-k 23` | Salmon index for use with `salmon quant`; appropriate for reads shorter than 75bp or for increased sensitivity with `--validateMappings` (docs) | `~/shared-data/reference/refgenie/z11_cdna/salmon_index/short` |
Zebrafish Salmon index `-k 31` | Salmon index for use with `salmon quant`; appropriate for reads 75bp or longer (docs) | `~/shared-data/reference/refgenie/z11_cdna/salmon_index/long` |
Canis lupus familiaris
Ensembl CanFam3.1 v95
File description | File use | File path |
---|---|---|
Dog Salmon index `-k 23` | Salmon index for use with `salmon quant`; appropriate for reads shorter than 75bp or for increased sensitivity with `--validateMappings` (docs) | `~/shared-data/reference/refgenie/CanFam3p1_cdna/salmon_index/short` |
Dog Salmon index `-k 31` | Salmon index for use with `salmon quant`; appropriate for reads 75bp or longer (docs) | `~/shared-data/reference/refgenie/CanFam3p1_cdna/salmon_index/long` |