1 Purpose of this analysis

This notebook takes RNA-seq expression data and metadata from refine.bio and identifies differentially expressed genes between two experimental groups.

Differential expression analysis identifies genes with significantly varying expression among experimental groups by comparing the variation among samples within a group to the variation between groups. The simplest version of this analysis is comparing two groups where one of those groups is a control group.

Our refine.bio RNA-seq examples use DESeq2 for these analyses because it handles RNA-seq data well and has great documentation.
Read more about DESeq2 and why we like it on our Getting Started page.

2 How to run this example

For general information about our tutorials and the basic software packages you will need, please see our ‘Getting Started’ section. We recommend taking a look at our Resources for Learning R if you have not written code in R before.

2.1 Obtain the `.Rmd` file

To run this example yourself, download the .Rmd for this analysis by clicking this link.

Clicking the link will most likely download the file to the downloads folder on your computer. Move this .Rmd file to the folder where you would like this example and its files to be stored.

You can open this .Rmd file in RStudio and follow the rest of these steps from there. (See our section about getting started with R notebooks if you are unfamiliar with .Rmd files.)

2.2 Set up your analysis folders

Good file organization is helpful for keeping your data analysis project on track! We have written some code that will automatically create a folder structure for you. Run the next chunk to set up your folders!

If you have trouble running this chunk, see our introduction to using .Rmds for more resources and explanations.

# Create the data folder if it doesn't exist
if (!dir.exists("data")) {
  dir.create("data")
}

# Define the file path to the plots directory
plots_dir <- "plots"

# Create the plots folder if it doesn't exist
if (!dir.exists(plots_dir)) {
  dir.create(plots_dir)
}

# Define the file path to the results directory
results_dir <- "results"

# Create the results folder if it doesn't exist
if (!dir.exists(results_dir)) {
  dir.create(results_dir)
}

In the same place you put this .Rmd file, you should now have three new empty folders called data, plots, and results!
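As a compact alternative, the same three folders could be created in a loop. This is only a sketch: `analysis_dirs` is a name introduced here for illustration and is not used elsewhere in this notebook.

```r
# Hypothetical helper vector: the three folders this example expects
analysis_dirs <- c("data", "plots", "results")

for (dir_name in analysis_dirs) {
  # Only create the folder if it is not already there
  if (!dir.exists(dir_name)) {
    dir.create(dir_name)
  }
}

# Returns one TRUE/FALSE value per folder
dir.exists(analysis_dirs)
```

If any value printed is FALSE, check that your working directory is the folder that holds this .Rmd file.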

2.3 Obtain the dataset from refine.bio

For general information about downloading data for these examples, see our ‘Getting Started’ section.

Go to this dataset’s page on refine.bio.

Click the “Download Now” button on the right side of this screen.

Fill out the pop-up window with your email address and agree to our Terms and Conditions.

We are going to use non-quantile normalized data for this analysis. To get this data, you will need to check the box that says “Skip quantile normalization for RNA-seq samples.” Note that this option will only be available for RNA-seq datasets.

It may take a few minutes for the dataset to process. You will get an email when it is ready.

2.4 About the dataset we are using for this example

For this example analysis, we are using RNA-seq data from an acute lymphoblastic leukemia (ALL) mouse lymphoid cell model (Kampen et al. 2019). All of the mouse lymphoid cell samples in this experiment carry a human RPL10 gene: three have the reference (wild-type) RPL10 gene and three have the R98S mutation. We will perform our differential expression analysis using these knock-in (R98S) and wild-type designations.

2.5 Place the dataset in your new `data/` folder

refine.bio will email you a download button when the dataset is ready. Follow the prompt to download a zip file whose name is a series of letters and numbers ending in .zip. Double-clicking the file should unzip it and create a folder of the same name.

For more details on the contents of this folder see these docs on refine.bio.

The <experiment_accession_id> folder has the data and metadata TSV files you will need for this example analysis. Experiment accession ids usually look something like GSE1235 or SRP12345.

Copy and paste the SRP123625 folder into your newly created data/ folder.

2.6 Check out our file structure!

Your new analysis folder should contain:

The example analysis .Rmd you downloaded
A folder called “data” which contains:
- The SRP123625 folder which contains:
  - The gene expression TSV
  - The metadata TSV
A folder for plots (currently empty)
A folder for results (currently empty)

Your example analysis folder should now look something like this (except with respective experiment accession ID and analysis notebook name you are using):

In order for our example to run without a hitch, these files need to be in these locations, so we’ve constructed a test to check before we get started with the analysis. The next chunks will declare your file paths and double-check that your files are in the right place.

First we will declare the file paths to our data and metadata files, which should be in our data directory. This is handy because if we want to switch the dataset we are using for this analysis (see the next section for more on this), we will only have to change the file paths here to get started.

# Define the file path to the data directory
# Replace with the path of the folder the files will be in
data_dir <- file.path("data", "SRP123625")

# Declare the file path to the gene expression matrix file
# inside directory saved as `data_dir`
# Replace with the path to your dataset file
data_file <- file.path(data_dir, "SRP123625.tsv")

# Declare the file path to the metadata file
# inside the directory saved as `data_dir`
# Replace with the path to your metadata file
metadata_file <- file.path(data_dir, "metadata_SRP123625.tsv")

Now that our file paths are declared, we can use the file.exists() function to check that the files are where we specified above.

# Check if the gene expression matrix file is at the path stored in `data_file`
file.exists(data_file)

## [1] TRUE

# Check if the metadata file is at the file path stored in `metadata_file`
file.exists(metadata_file)

## [1] TRUE

If either of the chunks above printed FALSE, you won’t be able to run this analysis as is until those files are in the appropriate place.

If the concept of a “file path” is unfamiliar to you, we recommend taking a look at our section about file paths.
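The two checks above can also be combined into a single step that names any file that is missing. The sketch below assumes the same paths declared earlier; `input_files` and `missing_files` are names introduced here for illustration only.

```r
# Paths as declared above for this example dataset
data_dir <- file.path("data", "SRP123625")
input_files <- c(
  file.path(data_dir, "SRP123625.tsv"),
  file.path(data_dir, "metadata_SRP123625.tsv")
)

# Keep only the paths that do not point to an existing file
missing_files <- input_files[!file.exists(input_files)]

if (length(missing_files) > 0) {
  message("Missing input file(s): ", paste(missing_files, collapse = ", "))
} else {
  message("All input files found.")
}
```

Printing the missing paths makes it easier to spot a typo in a folder or file name than a bare FALSE does.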

3 Using a different refine.bio dataset with this analysis?

If you’d like to adapt an example analysis to use a different dataset from refine.bio, we recommend placing the files in the data/ directory you created and changing the filenames and paths in the notebook to match these files (we’ve put comments to signify where you would need to change the code). We suggest saving plots and results to plots/ and results/ directories, respectively, as these are automatically created by the notebook. From here you can customize this analysis example to fit your own scientific questions and preferences.

4 Differential Expression

4.1 Install libraries

See our Getting Started page with instructions for package installation for a list of the other software you will need, as well as more tips and resources.

In this analysis, we will be using DESeq2 (Love et al. 2014) for the differential expression testing. We will also use EnhancedVolcano (Blighe et al. 2020) for plotting and apeglm (Zhu et al. 2018) for some log fold change estimates in the results table.

if (!("DESeq2" %in% installed.packages())) {
  # Install this package if it isn't installed yet
  BiocManager::install("DESeq2", update = FALSE)
}
if (!("EnhancedVolcano" %in% installed.packages())) {
  # Install this package if it isn't installed yet
  BiocManager::install("EnhancedVolcano", update = FALSE)
}
if (!("apeglm" %in% installed.packages())) {
  # Install this package if it isn't installed yet
  BiocManager::install("apeglm", update = FALSE)
}

Attach the libraries we need for this analysis:

# Attach the DESeq2 library
library(DESeq2)

# Attach the ggplot2 library for plotting
library(ggplot2)

# We will need this so we can use the pipe: %>%
library(magrittr)

The jitter plot we make later on with the DESeq2::plotCounts() function involves some randomness. As is good practice when our analysis involves randomness, we will set the seed.

set.seed(12345)
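To see what setting the seed does, here is a small base R sketch that is separate from the analysis itself. It draws random numbers twice after setting the same seed; `first_draw` and `second_draw` are illustrative names.

```r
# Draw three random numbers after setting the seed
set.seed(12345)
first_draw <- runif(3)

# Reset the seed and draw again
set.seed(12345)
second_draw <- runif(3)

# TRUE: the same seed gives the same "random" numbers
identical(first_draw, second_draw)
```

The same idea is why the jitter in the plot should look the same each time the notebook is run from the top.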

4.2 Import data and metadata

Data downloaded from refine.bio include a metadata tab-separated values (TSV) file and a data TSV file. This chunk of code will read both TSV files and add them as data frames to your environment.

We stored our file paths as objects named metadata_file and data_file in a previous step.

# Read in metadata TSV file
metadata <- readr::read_tsv(metadata_file)

## 
## ── Column specification ──────────────────────────────────────────────
## cols(
##   .default = col_logical(),
##   refinebio_accession_code = col_character(),
##   experiment_accession = col_character(),
##   refinebio_organism = col_character(),
##   refinebio_platform = col_character(),
##   refinebio_source_database = col_character(),
##   refinebio_specimen_part = col_character(),
##   refinebio_subject = col_character(),
##   refinebio_title = col_character()
## )
## ℹ Use `spec()` for the full column specifications.

# Read in data TSV file
expression_df <- readr::read_tsv(data_file) %>%
  tibble::column_to_rownames("Gene")

## 
## ── Column specification ──────────────────────────────────────────────
## cols(
##   Gene = col_character(),
##   SRR6255584 = col_double(),
##   SRR6255585 = col_double(),
##   SRR6255586 = col_double(),
##   SRR6255587 = col_double(),
##   SRR6255588 = col_double(),
##   SRR6255589 = col_double()
## )

Let’s ensure that the metadata and data are in the same sample order.

# Make the data in the order of the metadata
expression_df <- expression_df %>%
  dplyr::select(metadata$refinebio_accession_code)

# Check if this is in the same order
all.equal(colnames(expression_df), metadata$refinebio_accession_code)

## [1] TRUE
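To make the reordering step concrete, here is a toy sketch in base R. The sample IDs and values are made up, and indexing columns by name stands in for the `dplyr::select()` call used above.

```r
# Toy metadata: sample IDs in the order we want (made-up IDs)
toy_metadata <- data.frame(
  refinebio_accession_code = c("sample_A", "sample_B", "sample_C"),
  stringsAsFactors = FALSE
)

# Toy expression data with the columns in a different order
toy_expression <- data.frame(
  sample_C = c(5, 0),
  sample_A = c(10, 3),
  sample_B = c(7, 1),
  row.names = c("gene_1", "gene_2")
)

# Index the columns by name so they follow the metadata order
toy_expression <- toy_expression[, toy_metadata$refinebio_accession_code]

# TRUE when the columns now match the metadata order
identical(colnames(toy_expression), toy_metadata$refinebio_accession_code)
```

This matters because DESeq2 pairs each column of counts with a row of metadata by position, so a mismatch would assign samples to the wrong group.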

The information we need to make the comparison is in the refinebio_title column of the metadata data.frame.

head(metadata$refinebio_title)

## [1] "R98S11_mRNA_Suppl" "R98S13_mRNA_Suppl" "R98S35_mRNA_Suppl"
## [4] "WT28_mRNA_Suppl"   "WT29_mRNA_Suppl"   "WT36_mRNA_Suppl"

4.3 Set up metadata

This dataset includes data from mouse lymphoid cells with human RPL10, with and without an R98S mutation. The mutation status is stored along with other information in a single string, which is not very convenient for us. We need to extract the mutation status information into its own column to make it easier to use.

metadata <- metadata %>%
  # Let's get the RPL10 mutation status from this variable
  dplyr::mutate(mutation_status = dplyr::case_when(
    stringr::str_detect(refinebio_title, "R98S") ~ "R98S",
    stringr::str_detect(refinebio_title, "WT") ~ "reference"
  ))

Let’s take a look at metadata to see if this worked by looking at the refinebio_title and mutation_status columns.

# Let's take a look at the original metadata column's info
# and our new `mutation_status` column
dplyr::select(metadata, refinebio_title, mutation_status)
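As a cross-check, the same labels can be derived in base R from the six sample titles printed earlier. This is a sketch rather than the notebook's method; `sample_titles` is an illustrative name and `grepl()` stands in for `stringr::str_detect()`.

```r
# Sample titles as printed by head(metadata$refinebio_title) above
sample_titles <- c(
  "R98S11_mRNA_Suppl", "R98S13_mRNA_Suppl", "R98S35_mRNA_Suppl",
  "WT28_mRNA_Suppl", "WT29_mRNA_Suppl", "WT36_mRNA_Suppl"
)

# Start with NA so that any title matching neither pattern is easy to spot
mutation_status <- rep(NA_character_, length(sample_titles))
mutation_status[grepl("WT", sample_titles)] <- "reference"
mutation_status[grepl("R98S", sample_titles)] <- "R98S"

# Count samples per group; NA would be listed if a title matched nothing
table(mutation_status, useNA = "ifany")
```

With these titles, each group should contain three samples and no NA values.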

Before we set up our model in the next step, we want to check if our modeling variable is set correctly. We want our “control” to be set as the first level in the variable we provide as our experimental variable. Here we will use the str() function to print out a preview of the structure of our variable.

# Print out a preview of `mutation_status`
str(metadata$mutation_status)

##  chr [1:6] "R98S" "R98S" "R98S" "reference" "reference" ...

Currently, mutation_status is stored as a character, which is not necessarily what we want. To make sure it is set how we want for the DESeq object and subsequent testing, let’s change it to a factor so we can explicitly set the levels.

In the levels argument, we will list reference first since that is our control group.

# Make mutation_status a factor and set the levels appropriately
metadata <- metadata %>%
  dplyr::mutate(
    # Here we define the values our factor variable can have and their order.
    mutation_status = factor(mutation_status, levels = c("reference", "R98S"))
  )

Note that if you don’t specify levels, the factor() function will set levels in alphabetical order, which sometimes means your control group will not be listed first!
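The short base R sketch below illustrates this with the two group labels from this dataset. The vector `status` is a made-up stand-in for the metadata column, and `relevel()` is shown as one alternative way to move the control group to the front.

```r
# A small stand-in for the mutation status column
status <- c("R98S", "R98S", "reference", "reference")

# Without `levels`, the levels are sorted, so "R98S" comes first
default_factor <- factor(status)
levels(default_factor)

# relevel() moves the chosen reference level to the front
releveled_factor <- relevel(default_factor, ref = "reference")
levels(releveled_factor)
```

Either approach works; setting `levels` explicitly, as done above, has the advantage of stating the full order in one place.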

Let’s double check if the levels are what we want using the levels() function.

levels(metadata$mutation_status)

## [1] "reference" "R98S"

Yes! reference is the first level as we want it to be. We’re all set and ready to move on to making our DESeqDataSet object.

4.4 Define a minimum counts cutoff

We want to filter out the genes that have not been expressed or that have low expression counts, since these do not have high enough counts to yield reliable differential expression results. Removing these genes saves on memory usage during the tests. We are going to do some pre-filtering to keep only genes with 10 or more reads in total across the samples.

# Define a minimum counts cutoff and filter the data to include
# only rows (genes) that have total counts at or above the cutoff
filtered_expression_df <- expression_df %>%
  dplyr::filter(rowSums(.) >= 10)

If you have a bigger dataset, you will probably want to make this cutoff larger.
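To show what this filter keeps and drops, here is a toy sketch in base R with made-up counts. The cutoff is stored in a variable, `min_total_counts` (an illustrative name), so it is easy to change.

```r
# Toy counts for three hypothetical genes across three samples
toy_counts <- data.frame(
  sample_1 = c(0, 4, 250),
  sample_2 = c(0, 5, 310),
  sample_3 = c(1, 1, 275),
  row.names = c("gene_low", "gene_borderline", "gene_high")
)

# Cutoff for the total counts per gene
min_total_counts <- 10

# Total counts per gene across all samples
total_counts <- rowSums(toy_counts)

# Keep genes whose total is at or above the cutoff
filtered_toy_counts <- toy_counts[total_counts >= min_total_counts, ]
rownames(filtered_toy_counts)
```

Here gene_low (1 read in total) is removed, while gene_borderline (exactly 10 reads) is kept because the comparison is `>=`.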

4.5 Create a DESeq2Dataset

We will be using the DESeq2 package for differential expression testing, which requires us to format our data into a DESeqDataSet object. First we need to prep our gene expression data frame so that all of the count values are integers, making it compatible with the DESeqDataSetFromMatrix() function in the next step.

# round all expression counts
gene_matrix <- round(filtered_expression_df)

Now we need to create a DESeqDataSet from our expression dataset. We use the mutation_status variable we created in the design formula because that will allow us to model the presence or absence of the R98S mutation.

ddset <- DESeqDataSetFromMatrix(
  # Here we supply non-normalized count data
  countData = gene_matrix,
  # Supply the `colData` with our metadata data frame
  colData = metadata,
  # Supply our experimental variable to `design`
  design = ~mutation_status
)

## converting counts to integer mode

4.6 Run differential expression analysis

We’ll use the wrapper function DESeq() to do our differential expression analysis. When we created our DESeqDataSet object, we supplied our mutation_status variable to the design argument. Because of this, the DESeq() function will use the groups defined by mutation_status to test for differential expression.

deseq_object <- DESeq(ddset)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

Let’s extract the results table from the DESeq object.

deseq_results <- results(deseq_object)

Here we will use the lfcShrink() function to obtain shrunken log fold change estimates based on a negative binomial distribution. This replaces the log fold change estimates in your results table with the shrunken ones. Using lfcShrink() can help decrease noise and preserve large differences between groups (Zhu et al. 2018). It requires that the apeglm package be installed.

deseq_results <- lfcShrink(
  deseq_object, # The original DESeq2 object after running DESeq()
  coef = 2, # Index of the coefficient to shrink; here, R98S vs. reference
  res = deseq_results # The original DESeq2 results table
)

## using 'apeglm' for LFC shrinkage. If used in published research, please cite:
##     Zhu, A., Ibrahim, J.G., Love, M.I. (2018) Heavy-tailed prior distributions for
##     sequence count data: removing the noise and preserving large differences.
##     Bioinformatics. https://doi.org/10.1093/bioinformatics/bty895
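The `coef = 2` argument above refers to the second coefficient of the model. As a rough illustration in base R (not DESeq2's internals), the design matrix for `~mutation_status` has two columns, and the second one encodes R98S relative to reference. DESeq2 labels its coefficients differently; `resultsNames(deseq_object)` lists them as DESeq2 names them.

```r
# Mutation status for the six samples, with "reference" as the first level
mutation_status <- factor(
  c("R98S", "R98S", "R98S", "reference", "reference", "reference"),
  levels = c("reference", "R98S")
)

# Build the design matrix that corresponds to `~mutation_status`
design_matrix <- model.matrix(~mutation_status)

# Column 1 is the intercept; column 2 is the R98S effect relative to reference
colnames(design_matrix)
```

This is also why the factor levels were set earlier: if R98S had been the first level, the second coefficient would describe reference relative to R98S and the signs of the fold changes would flip.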

Now let’s take a peek at what our new results table looks like.

head(deseq_results)

## log2 fold change (MAP): mutation status R98S vs reference 
## Wald test p-value: mutation status R98S vs reference 
## DataFrame with 6 rows and 5 columns
##                     baseMean log2FoldChange     lfcSE      pvalue
##                    <numeric>      <numeric> <numeric>   <numeric>
## ENSMUSG00000000001 9579.0571     -0.4349384  0.160640 2.59595e-03
## ENSMUSG00000000028 1199.7333      0.0647514  0.134708 6.04429e-01
## ENSMUSG00000000056 1287.5086      0.3243824  0.272978 1.02032e-01
## ENSMUSG00000000058   20.1703      5.0170059  1.515508 6.85780e-05
## ENSMUSG00000000078 4939.6277     -0.9574237  0.234363 4.75060e-06
## ENSMUSG00000000085 1150.9626      0.0929495  0.126941 4.32755e-01
##                           padj
##                      <numeric>
## ENSMUSG00000000001 0.019791734
## ENSMUSG00000000028 0.808664075
## ENSMUSG00000000056 0.283225795
## ENSMUSG00000000058 0.001074535
## ENSMUSG00000000078 0.000113951
## ENSMUSG00000000085 0.682936007
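Log2 fold changes can be easier to interpret after converting them back to ratios. The sketch below does this in base R for two genes, with the values copied from the table above; the variable names are illustrative.

```r
# Shrunken log2 fold changes copied from the table above
log2_fold_change <- c(
  ENSMUSG00000000058 = 5.0170059,
  ENSMUSG00000000078 = -0.9574237
)

# Undo the log2 transformation to get the R98S / reference ratio
fold_change <- 2^log2_fold_change
round(fold_change, 2)
```

A log2 fold change of about 5 corresponds to roughly 32 times higher expression in the R98S samples, while a value of about -1 corresponds to roughly half the expression seen in the reference samples.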

Note that the results table is not filtered or sorted, so we will use tidyverse functions to do this before saving our results to a file.

# this is of class DESeqResults -- we want a data frame
deseq_df <- deseq_results %>%
  # make into data.frame
  as.data.frame() %>%
  # the gene names are row names -- let's make them a column for easy display
  tibble::rownames_to_column("Gene") %>%
  # add a column for significance threshold results
  dplyr::mutate(threshold = padj < 0.05) %>%
  # sort by log2 fold change -- the highest values will be genes with
  # higher expression in RPL10 mutated samples
  dplyr::arrange(dplyr::desc(log2FoldChange))

Let’s print out the top results.

head(deseq_df)

4.6.1 Check results by plotting one gene

To double-check what a differentially expressed gene looks like, we can plot one with the DESeq2::plotCounts() function.

plotCounts(ddset, gene = "ENSMUSG00000026623", intgroup = "mutation_status")

The R98S mutated samples have higher expression of this gene than the control group, which helps assure us that the results are showing us what we are looking for.

4.7 Save results to TSV

Write the results table to file.

readr::write_tsv(
  deseq_df,
  file.path(
    results_dir,
    "SRP123625_diff_expr_results.tsv" # Replace with a relevant output file name
  )
)

4.8 Create a volcano plot

We’ll use the EnhancedVolcano package’s main function to plot our data (Blighe et al. 2020).

Here we are plotting the log2FoldChange (which was estimated by the lfcShrink() step) on the x axis and padj on the y axis. The padj variable contains the p-values corrected with the Benjamini-Hochberg method (the default from the results() step).

Because we are using adjusted p-values, we can feel safe in making our pCutoff argument 0.01 (the default is 1e-05).
Take a look at all the options for tailoring this plot using ?EnhancedVolcano.
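To get a feel for what the Benjamini-Hochberg correction does, here is a small base R sketch using `p.adjust()` on made-up p-values. It is an illustration only: the results() step also applies independent filtering by default, so padj values from DESeq2 will not generally match a plain `p.adjust()` call on the pvalue column.

```r
# Made-up raw p-values for five hypothetical genes
raw_p_values <- c(0.001, 0.008, 0.039, 0.041, 0.60)

# Benjamini-Hochberg adjustment
adjusted_p_values <- p.adjust(raw_p_values, method = "BH")

# Compare raw and adjusted values side by side
data.frame(raw_p_values, adjusted_p_values)

# Number of genes passing a 0.01 cutoff before and after adjustment
sum(raw_p_values < 0.01)
sum(adjusted_p_values < 0.01)
```

Adjusted p-values are never smaller than the raw ones, which is why a cutoff applied to padj can be less strict than one applied to raw p-values.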

We will save the plot to our environment as volcano_plot to make it easier to save the figure separately later.

# We'll assign this as `volcano_plot`
volcano_plot <- EnhancedVolcano::EnhancedVolcano(
  deseq_df,
  lab = deseq_df$Gene,
  x = "log2FoldChange",
  y = "padj",
  pCutoff = 0.01 # Loosen the cutoff since we supplied corrected p-values
)

## Registered S3 methods overwritten by 'ggalt':
##   method                  from   
##   grid.draw.absoluteGrob  ggplot2
##   grobHeight.absoluteGrob ggplot2
##   grobWidth.absoluteGrob  ggplot2
##   grobX.absoluteGrob      ggplot2
##   grobY.absoluteGrob      ggplot2

# Print out plot here
volcano_plot

This looks pretty good! Let’s save it to a PNG.

ggsave(
  plot = volcano_plot,
  file.path(plots_dir, "SRP123625_volcano_plot.png")
) # Replace with a plot name relevant to your data

## Saving 7 x 5 in image

Heatmaps are also a pretty common way to show differential expression results. You can take your results from this example and make a heatmap following our heatmap module.

5 Further learning resources about this analysis

DESeq2 vignette
DESeq2 paper (Love et al. 2014)
StatQuest Video: DESeq2, part 1, Library Normalization
The EnhancedVolcano vignette has more examples on how to tailor your volcano plot (Blighe et al. 2020).

6 Session info

At the end of every analysis, before saving your notebook, we recommend printing out your session info. This helps make your code more reproducible by recording what versions of software and packages you used to run this.

# Print session info
sessioninfo::session_info()

## ─ Session info ─────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 4.0.5 (2021-03-31)
##  os       Ubuntu 20.04.3 LTS          
##  system   x86_64, linux-gnu           
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       Etc/UTC                     
##  date     2022-03-01                  
## 
## ─ Packages ─────────────────────────────────────────────────────────
##  package              * version    date       lib source        
##  annotate               1.68.0     2020-10-27 [1] Bioconductor  
##  AnnotationDbi          1.52.0     2020-10-27 [1] Bioconductor  
##  apeglm                 1.12.0     2020-10-27 [1] Bioconductor  
##  ash                    1.0-15     2015-09-01 [1] RSPM (R 4.0.3)
##  assertthat             0.2.1      2019-03-21 [1] RSPM (R 4.0.3)
##  backports              1.2.1      2020-12-09 [1] RSPM (R 4.0.3)
##  bbmle                  1.0.23.1   2020-02-03 [1] RSPM (R 4.0.3)
##  bdsmatrix              1.3-4      2020-01-13 [1] RSPM (R 4.0.3)
##  beeswarm               0.3.1      2021-03-07 [1] RSPM (R 4.0.3)
##  Biobase              * 2.50.0     2020-10-27 [1] Bioconductor  
##  BiocGenerics         * 0.36.1     2021-04-16 [1] Bioconductor  
##  BiocParallel           1.24.1     2020-11-06 [1] Bioconductor  
##  bit                    4.0.4      2020-08-04 [1] RSPM (R 4.0.3)
##  bit64                  4.0.5      2020-08-30 [1] RSPM (R 4.0.3)
##  bitops                 1.0-7      2021-04-24 [1] RSPM (R 4.0.4)
##  blob                   1.2.1      2020-01-20 [1] RSPM (R 4.0.3)
##  bslib                  0.2.5      2021-05-12 [1] RSPM (R 4.0.4)
##  cachem                 1.0.5      2021-05-15 [1] RSPM (R 4.0.4)
##  cli                    2.5.0      2021-04-26 [1] RSPM (R 4.0.4)
##  coda                   0.19-4     2020-09-30 [1] RSPM (R 4.0.3)
##  colorspace             2.0-1      2021-05-04 [1] RSPM (R 4.0.4)
##  crayon                 1.4.1      2021-02-08 [1] RSPM (R 4.0.3)
##  DBI                    1.1.1      2021-01-15 [1] RSPM (R 4.0.3)
##  DelayedArray           0.16.3     2021-03-24 [1] Bioconductor  
##  DESeq2               * 1.30.1     2021-02-19 [1] Bioconductor  
##  digest                 0.6.27     2020-10-24 [1] RSPM (R 4.0.3)
##  dplyr                  1.0.6      2021-05-05 [1] RSPM (R 4.0.4)
##  ellipsis               0.3.2      2021-04-29 [1] RSPM (R 4.0.4)
##  emdbook                1.3.12     2020-02-19 [1] RSPM (R 4.0.0)
##  EnhancedVolcano        1.8.0      2020-10-27 [1] Bioconductor  
##  evaluate               0.14       2019-05-28 [1] RSPM (R 4.0.3)
##  extrafont              0.17       2014-12-08 [1] RSPM (R 4.0.3)
##  extrafontdb            1.0        2012-06-11 [1] RSPM (R 4.0.3)
##  fansi                  0.4.2      2021-01-15 [1] RSPM (R 4.0.3)
##  farver                 2.1.0      2021-02-28 [1] RSPM (R 4.0.3)
##  fastmap                1.1.0      2021-01-25 [1] RSPM (R 4.0.3)
##  genefilter             1.72.1     2021-01-21 [1] Bioconductor  
##  geneplotter            1.68.0     2020-10-27 [1] Bioconductor  
##  generics               0.1.0      2020-10-31 [1] RSPM (R 4.0.3)
##  GenomeInfoDb         * 1.26.7     2021-04-08 [1] Bioconductor  
##  GenomeInfoDbData       1.2.4      2022-03-01 [1] Bioconductor  
##  GenomicRanges        * 1.42.0     2020-10-27 [1] Bioconductor  
##  getopt                 1.20.3     2019-03-22 [1] RSPM (R 4.0.0)
##  ggalt                  0.4.0      2017-02-15 [1] RSPM (R 4.0.0)
##  ggbeeswarm             0.6.0      2017-08-07 [1] RSPM (R 4.0.0)
##  ggplot2              * 3.3.3      2020-12-30 [1] RSPM (R 4.0.3)
##  ggrastr                0.2.3      2021-03-01 [1] RSPM (R 4.0.5)
##  ggrepel                0.9.1      2021-01-15 [1] RSPM (R 4.0.3)
##  glue                   1.4.2      2020-08-27 [1] RSPM (R 4.0.3)
##  gtable                 0.3.0      2019-03-25 [1] RSPM (R 4.0.3)
##  highr                  0.9        2021-04-16 [1] RSPM (R 4.0.4)
##  hms                    1.0.0      2021-01-13 [1] RSPM (R 4.0.3)
##  htmltools              0.5.1.1    2021-01-22 [1] RSPM (R 4.0.3)
##  httr                   1.4.2      2020-07-20 [1] RSPM (R 4.0.3)
##  IRanges              * 2.24.1     2020-12-12 [1] Bioconductor  
##  jquerylib              0.1.4      2021-04-26 [1] RSPM (R 4.0.4)
##  jsonlite               1.7.2      2020-12-09 [1] RSPM (R 4.0.3)
##  KernSmooth             2.23-18    2020-10-29 [2] CRAN (R 4.0.5)
##  knitr                  1.33       2021-04-24 [1] RSPM (R 4.0.4)
##  labeling               0.4.2      2020-10-20 [1] RSPM (R 4.0.3)
##  lattice                0.20-41    2020-04-02 [2] CRAN (R 4.0.5)
##  lifecycle              1.0.0      2021-02-15 [1] RSPM (R 4.0.3)
##  locfit                 1.5-9.4    2020-03-25 [1] RSPM (R 4.0.3)
##  magrittr             * 2.0.1      2020-11-17 [1] RSPM (R 4.0.3)
##  maps                   3.3.0      2018-04-03 [1] RSPM (R 4.0.3)
##  MASS                   7.3-53.1   2021-02-12 [2] CRAN (R 4.0.5)
##  Matrix                 1.3-2      2021-01-06 [2] CRAN (R 4.0.5)
##  MatrixGenerics       * 1.2.1      2021-01-30 [1] Bioconductor  
##  matrixStats          * 0.58.0     2021-01-29 [1] RSPM (R 4.0.3)
##  memoise                2.0.0      2021-01-26 [1] RSPM (R 4.0.3)
##  munsell                0.5.0      2018-06-12 [1] RSPM (R 4.0.3)
##  mvtnorm                1.1-1      2020-06-09 [1] RSPM (R 4.0.3)
##  numDeriv               2016.8-1.1 2019-06-06 [1] RSPM (R 4.0.3)
##  optparse             * 1.6.6      2020-04-16 [1] RSPM (R 4.0.0)
##  pillar                 1.6.1      2021-05-16 [1] RSPM (R 4.0.4)
##  pkgconfig              2.0.3      2019-09-22 [1] RSPM (R 4.0.3)
##  plyr                   1.8.6      2020-03-03 [1] RSPM (R 4.0.3)
##  proj4                  1.0-10.1   2021-01-26 [1] RSPM (R 4.0.3)
##  ps                     1.6.0      2021-02-28 [1] RSPM (R 4.0.3)
##  purrr                  0.3.4      2020-04-17 [1] RSPM (R 4.0.3)
##  R.cache                0.15.0     2021-04-30 [1] RSPM (R 4.0.4)
##  R.methodsS3            1.8.1      2020-08-26 [1] RSPM (R 4.0.3)
##  R.oo                   1.24.0     2020-08-26 [1] RSPM (R 4.0.3)
##  R.utils                2.10.1     2020-08-26 [1] RSPM (R 4.0.3)
##  R6                     2.5.0      2020-10-28 [1] RSPM (R 4.0.3)
##  RColorBrewer           1.1-2      2014-12-07 [1] RSPM (R 4.0.3)
##  Rcpp                   1.0.6      2021-01-15 [1] RSPM (R 4.0.3)
##  RCurl                  1.98-1.3   2021-03-16 [1] RSPM (R 4.0.4)
##  readr                  1.4.0      2020-10-05 [1] RSPM (R 4.0.4)
##  rematch2               2.1.2      2020-05-01 [1] RSPM (R 4.0.3)
##  rlang                  0.4.11     2021-04-30 [1] RSPM (R 4.0.4)
##  rmarkdown              2.8        2021-05-07 [1] RSPM (R 4.0.4)
##  RSQLite                2.2.7      2021-04-22 [1] RSPM (R 4.0.4)
##  rstudioapi             0.13       2020-11-12 [1] RSPM (R 4.0.3)
##  Rttf2pt1               1.3.8      2020-01-10 [1] RSPM (R 4.0.3)
##  S4Vectors            * 0.28.1     2020-12-09 [1] Bioconductor  
##  sass                   0.4.0      2021-05-12 [1] RSPM (R 4.0.4)
##  scales                 1.1.1      2020-05-11 [1] RSPM (R 4.0.3)
##  sessioninfo            1.1.1      2018-11-05 [1] RSPM (R 4.0.3)
##  stringi                1.6.1      2021-05-10 [1] RSPM (R 4.0.4)
##  stringr                1.4.0      2019-02-10 [1] RSPM (R 4.0.3)
##  styler                 1.4.1      2021-03-30 [1] RSPM (R 4.0.4)
##  SummarizedExperiment * 1.20.0     2020-10-27 [1] Bioconductor  
##  survival               3.2-10     2021-03-16 [2] CRAN (R 4.0.5)
##  tibble                 3.1.2      2021-05-16 [1] RSPM (R 4.0.4)
##  tidyselect             1.1.1      2021-04-30 [1] RSPM (R 4.0.4)
##  utf8                   1.2.1      2021-03-12 [1] RSPM (R 4.0.3)
##  vctrs                  0.3.8      2021-04-29 [1] RSPM (R 4.0.4)
##  vipor                  0.4.5      2017-03-22 [1] RSPM (R 4.0.0)
##  withr                  2.4.2      2021-04-18 [1] RSPM (R 4.0.4)
##  xfun                   0.23       2021-05-15 [1] RSPM (R 4.0.4)
##  XML                    3.99-0.6   2021-03-16 [1] RSPM (R 4.0.4)
##  xtable                 1.8-4      2019-04-21 [1] RSPM (R 4.0.3)
##  XVector                0.30.0     2020-10-27 [1] Bioconductor  
##  yaml                   2.2.1      2020-02-01 [1] RSPM (R 4.0.3)
##  zlibbioc               1.36.0     2020-10-27 [1] Bioconductor  
## 
## [1] /usr/local/lib/R/site-library
## [2] /usr/local/lib/R/library

References

Blighe K., S. Rana, and M. Lewis, 2020 EnhancedVolcano: Publication-ready volcano plots with enhanced colouring and labeling. https://github.com/kevinblighe/EnhancedVolcano

Kampen K. R., L. Fancello, T. Girardi, G. Rinaldi, M. Planque, et al., 2019 Translatome analysis reveals altered serine and glycine metabolism in t-cell acute lymphoblastic leukemia cells. Nature Communications 10. https://doi.org/10.1038/s41467-019-10508-2

Love M. I., W. Huber, and S. Anders, 2014 Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2. Genome Biology 15. https://doi.org/10.1186/s13059-014-0550-8

Zhu A., J. G. Ibrahim, and M. I. Love, 2018 Heavy-tailed prior distributions for sequence count data: Removing the noise and preserving large differences. Bioinformatics. https://doi.org/10.1093/bioinformatics/bty895

Differential Expression - RNA-seq

CCDL for ALSF

December 2020

1 Purpose of this analysis

2 How to run this example

2.1 Obtain the `.Rmd` file

2.2 Set up your analysis folders

2.3 Obtain the dataset from refine.bio

2.4 About the dataset we are using for this example

2.5 Place the dataset in your new `data/` folder

2.6 Check out our file structure!

3 Using a different refine.bio dataset with this analysis?

4 Differential Expression

4.1 Install libraries

4.2 Import data and metadata

4.3 Set up metadata

4.4 Define a minimum counts cutoff

4.5 Create a DESeq2Dataset

4.6 Run differential expression analysis

4.6.1 Check results by plotting one gene

4.7 Save results to TSV

4.8 Create a volcano plot

5 Further learning resources about this analysis

6 Session info

References
