Getting Started

Table of Contents

About refine.bio
About how this tutorial book is structured
What you need to install to run the examples
- Required software
How to get the data for these examples
How to use R Markdown Documents
An important note about file paths and .Rmds
Resources for learning R
Additional resources from the CCDL

0.1 About refine.bio

refine.bio is a collection of ready-to-use transcriptomic data! Publicly available gene expression data is uniformly processed and made available for easy download. This tutorial has follow-along examples for use with refine.bio downloads.

0.2 About how this tutorial book is structured

This tutorial contains follow-along analysis examples for refine.bio gene expression data. The analysis examples are organized by technology, “Microarray” or “RNA-seq”, in addition to an “Advanced Topics” section. Each analysis is self-contained and explains how to obtain the dataset used in the example from refine.bio. We encourage you to download the .Rmd and follow its “getting started” section before diving into the analysis itself.

Each analysis contains:

A README that introduces you to the analyses, concepts, requirements, and workflows for that module.
An R Notebook which consists of:
- One or more R Markdown (.Rmd) files that you can use in RStudio to run the analysis. Each contains its own “getting started” section, which describes how to download the example dataset from refine.bio.
- An nb.html file, which is the output of the .Rmd file rendered as HTML.

0.3 What you need to install to run the examples

To run the examples, you need to install the software listed in the section below. Each item can be installed by following the instructions at its link. We recommend installing devtools from CRAN (e.g., by running install.packages("devtools") in R).

0.3.1 Required software

R (R Core Team 2019)
RStudio, an integrated development environment for working with R and R Notebooks (RStudio Team 2020).
Bioconductor, for installing packages from this package repository (Huber et al. 2015).
tidyverse - we opt for using tidyverse packages for handling and cleaning the data (Wickham et al. 2019).
devtools will be required for installing some packages from GitHub (Wickham et al. 2020).

Each example analysis requires additional packages; the analysis checks whether they are installed and installs any that are missing. Depending on your particular configuration, installation problems sometimes occur (here’s a list of the most common R package installation errors and what they mean). Each example module directory includes further instructions for following along with the examples.
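The check-then-install pattern described above can be sketched in a few lines of R. This is a minimal illustration, not necessarily the exact code the examples use: `missing_packages()` and `install_if_missing()` are hypothetical helper names introduced here, and the installation calls are left commented out because they download packages.

```r
# Return the packages in `pkgs` that are not installed in this R library.
# (Hypothetical helper, not part of the refine.bio examples.)
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Install only the packages that are missing. `installer` defaults to
# install.packages() for CRAN; pass BiocManager::install for Bioconductor.
install_if_missing <- function(pkgs, installer = install.packages) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    installer(to_install)
  }
  # Invisibly return what was (or would have been) installed
  invisible(to_install)
}

# Packages that ship with R are always present, so nothing is reported
missing_packages(c("stats", "utils"))

# Example usage (uncomment to run; this may download packages):
# install_if_missing(c("devtools", "tidyverse"))
# install_if_missing("BiocManager")
```

Checking before installing keeps a notebook fast to re-run, since packages that are already present are skipped.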

0.4 How to get the data for these examples

Each analysis includes a link to the example dataset’s page on refine.bio as well as step-by-step instructions on how to set up your data and folders for the analysis. As you become more comfortable with the analysis, we encourage you to swap the example dataset for another that better suits your interests and scientific questions. We’ve placed comments where the code will absolutely need to be changed for a different dataset, but you may find other parts of the analysis that you want to alter to fit your needs. Each analysis also includes links to resources and documentation that we hope will help you make useful alterations. You will likely encounter errors and bugs as you make changes. Don’t let this discourage you; it is all part of the process. See this debugging guide for our list of the most common R errors and how you might be able to address them.

0.5 How to use R Markdown Documents

We use R Markdown throughout this tutorial. R Markdown documents are helpful for scientific code because they let you keep detailed notes, code, and output in one place.

When you execute code within the notebook, the results appear beneath the code. Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Cmd+Shift+Enter (Ctrl+Shift+Enter on Windows and Linux).

print("The output from the code in this chunk will print below!")

## [1] "The output from the code in this chunk will print below!"

R Markdown documents also have the added benefit of producing HTML output that is nicely rendered and easy to read. When you save one of our R Markdown files (the files that end in .Rmd) on your computer, an HTML file containing the code and output is saved alongside it (its name ends in .nb.html).

See this guide to using R Notebooks for more information about inserting and executing code chunks.

0.6 An important note about file paths and .Rmds

The current directory (also called the working directory) is where R looks for files and otherwise operates. Directories are the folders of files on your computer; a file path is the series of folders leading to the file you are referring to. In an R Markdown document, the current directory is always the folder where the .Rmd file itself is saved. This means all file paths in the .Rmd must be specified relative to the location of the .Rmd.
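As a small illustration of relative paths, the sketch below builds paths with `file.path()`. The folder layout and the names `data` and `EXAMPLE_ACCESSION` are placeholders assumed for this example, not files the tutorial provides.

```r
# Assumed (hypothetical) layout:
#   my_analysis/
#     analysis.Rmd
#     data/EXAMPLE_ACCESSION/EXAMPLE_ACCESSION.tsv

# Inside analysis.Rmd, this reports the folder that contains the .Rmd
getwd()

# Build the path to the data file relative to the .Rmd's location
data_dir <- file.path("data", "EXAMPLE_ACCESSION")
data_file <- file.path(data_dir, "EXAMPLE_ACCESSION.tsv")
data_file
# "data/EXAMPLE_ACCESSION/EXAMPLE_ACCESSION.tsv"

# ".." steps up one folder from the .Rmd's location
shared_dir <- file.path("..", "shared_data")

# Check that the file is where you expect before trying to read it
file.exists(data_file)
```

If `file.exists()` returns FALSE, the path is most likely being interpreted relative to a different folder than you intended.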

For more practice with setting file paths in .Rmd files see these:

0.7 Resources for learning R

0.8 Additional resources from the CCDL

References

Huber W., V. J. Carey, R. Gentleman, S. Anders, M. Carlson, et al., 2015 Orchestrating high-throughput genomic analysis with Bioconductor. Nature Methods 12: 115–121. https://doi.org/10.1038/nmeth.3252

R Core Team, 2019 R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org

RStudio Team, 2020 RStudio: Integrated development environment for R. RStudio, PBC., Boston, MA. http://www.rstudio.com/

Wickham H., M. Averick, J. Bryan, W. Chang, L. D. McGowan, et al., 2019 Welcome to the tidyverse. Journal of Open Source Software 4: 1686. https://doi.org/10.21105/joss.01686

Wickham H., J. Hester, and W. Chang, 2020 devtools: Tools to make developing R packages easier. https://CRAN.R-project.org/package=devtools