Data analyses are generally not “one size fits all”; this is particularly true of the approaches used to analyze RNA-seq and microarray data. The characteristics of the data produced by these two technologies can be quite different. This tutorial has example analyses organized by technology so you can follow examples that are more closely tailored to the nature of the data at hand.
Microarrays measure gene expression using chips filled with oligonucleotide probes designed to hybridize to labeled RNA samples. After hybridization, the microarrays are scanned, and the fluorescence intensity for each probe is measured. The fluorescence intensity indicates the number of labeled fragments bound and therefore the relative quantity of the transcript the probe is designed for.
(based on diagram from Farina 2020)
There are many different kinds of microarray platforms, which can be broadly separated into single-color and two-color arrays. At this time, refine.bio only supports single-color arrays, so our examples and advice are generally from the perspective of using single-color arrays. The diagram above shows an overview of the single-color array process, which includes extracting the total RNA from a sample, labeling the RNA with fluorescent dye, hybridizing the labeled RNA to the array, and scanning the array to measure the fluorescence intensity.
The two most common microarray platforms on refine.bio are Affymetrix GeneChips and Illumina BeadArray. A longer list of specific arrays that are supported by refine.bio can be found here.
As with all experimental methods, microarrays have strengths and limitations that you should consider with regard to your scientific questions.
As a result of these historical advantages, vast quantities of data have been generated worldwide using microarrays. The microarray data compiled by refine.bio includes over 500,000 individual samples across over 25,000 experiments. For many scientific questions, the best available gene expression data may be microarray based!
refine.bio drops outdated probes based on Brainarray’s annotation packages and uses SCAN normalization methods prior to your downloads to help address these probe nucleotide composition biases (Dai et al. 2005; Piccolo et al. 2012).
Microarray chips are generally processed experimentally in groups, which can lead to experimental batch effects. To minimize this, all refine.bio microarray data downloads come quantile-normalized, which enables more confident comparisons of expression levels among experiments. The use of different microarray chips is also a type of batch effect, but quantile normalization allows us to compare data from different chips to a limited degree, if we proceed with caution. See the refine.bio docs for more about the microarray processing steps, including the quantile normalization.
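To make the idea of quantile normalization more concrete, here is a minimal Python sketch of the generic technique: each sample's values are replaced by the average value found at the same rank across all samples, so every sample ends up with the same distribution. This is an illustration only and not refine.bio's pipeline, which may differ in its details (see the refine.bio docs). The function name `quantile_normalize`, the sample names, and the expression values are all made up for this example.

```python
def quantile_normalize(samples):
    """Generic quantile normalization (illustrative sketch, not refine.bio's code).

    samples: dict mapping a sample name to a list of expression values,
             with genes in the same order for every sample.
    Ties are not treated specially; tied values are ranked by gene order.
    """
    names = list(samples)
    n_genes = len(samples[names[0]])

    # Sort each sample's values, then average across samples at each rank
    sorted_values = [sorted(samples[name]) for name in names]
    rank_means = [
        sum(column[rank] for column in sorted_values) / len(names)
        for rank in range(n_genes)
    ]

    normalized = {}
    for name in names:
        values = samples[name]
        # Gene indices ordered from lowest to highest expression in this sample
        order = sorted(range(n_genes), key=lambda i: values[i])
        result = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            # Replace each value with the mean of all samples at that rank
            result[gene_index] = rank_means[rank]
        normalized[name] = result
    return normalized


# Hypothetical expression values for four genes measured on three chips
samples = {
    "chip_a": [5.0, 2.0, 3.0, 4.0],
    "chip_b": [4.0, 1.0, 4.5, 2.0],
    "chip_c": [3.0, 4.0, 6.0, 8.0],
}
normalized = quantile_normalize(samples)
```

After normalization, every sample contains the same set of values; only the assignment of those values to genes differs from sample to sample. This is why quantile-normalized data are easier to compare across experiments, and also why some caution is still needed: the method forces the distributions to match whether or not the underlying biology does.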
A common and simple reason you may not see your gene of interest is that the microarray chip used in the experiment you are analyzing did not originally have probes designed to target that gene.
refine.bio uses Brainarray packages to annotate the microarray probe data for microarray platforms that have this available (Dai et al. 2005). This annotation identifies which probes map to which genes according to the updated transcriptome annotation (which likely changed since the microarray’s probes were first designed). Some probes may have since become obsolete (they do not bind reliably to one location according to updated genome annotations), which may result in the gene they targeted being removed. If your gene of interest was covered by the original probes of the microarray chip and the version of the Brainarray package used maintains that it is still accurate, your gene of interest will show up in the Gene column. You can find your dataset’s microarray chip and Brainarray version information on the refine.bio dataset page and by following these instructions.
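The effect of dropping obsolete probes can be sketched with a toy example. This is a simplified illustration under stated assumptions, not how the Brainarray packages are implemented: the probe names, gene names, and the `genes_after_reannotation` function are hypothetical, and real re-annotation can also reassign probes to different genes.

```python
def genes_after_reannotation(probe_to_gene, retained_probes):
    """Return the genes that still have at least one retained probe.

    probe_to_gene: dict mapping probe ID -> gene targeted in the original design.
    retained_probes: set of probe IDs still considered reliable (hypothetical).
    """
    return {
        gene for probe, gene in probe_to_gene.items() if probe in retained_probes
    }


# Hypothetical original chip design: four probes targeting three genes
original_design = {
    "probe_1": "GENE_A",
    "probe_2": "GENE_A",
    "probe_3": "GENE_B",
    "probe_4": "GENE_C",
}

# Hypothetical set of probes that the updated annotation still supports
still_reliable = {"probe_1", "probe_3"}

kept_genes = genes_after_reannotation(original_design, still_reliable)
dropped_genes = set(original_design.values()) - kept_genes
```

In this toy example, `GENE_A` survives because one of its two probes is retained, while `GENE_C` disappears because its only probe was dropped. A gene missing from your data for this reason was on the original chip but is no longer reliably measured according to the updated annotation.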
One additional reason you may not see a gene of interest applies only if you are using refine.bio’s aggregate by species option. When data are aggregated across different platforms, only the genes common to all of the aggregated experiments will be kept.
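Keeping only the genes common to every dataset is a set intersection. The sketch below shows the idea in Python with made-up experiment names and gene identifiers; the `common_genes` function is hypothetical and is not part of refine.bio.

```python
def common_genes(genes_by_experiment):
    """Return the genes present in every experiment (set intersection).

    genes_by_experiment: dict mapping an experiment name to an iterable of gene IDs.
    """
    gene_sets = [set(genes) for genes in genes_by_experiment.values()]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


# Hypothetical experiments run on different platforms with different gene coverage
genes_by_experiment = {
    "experiment_1": ["GENE_A", "GENE_B", "GENE_C"],
    "experiment_2": ["GENE_A", "GENE_C", "GENE_D"],
    "experiment_3": ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
}

shared = common_genes(genes_by_experiment)
missing_from_aggregate = set().union(*genes_by_experiment.values()) - shared
```

Here only `GENE_A` and `GENE_C` are measured in all three experiments, so `GENE_B` and `GENE_D` would be absent from the aggregated data even though each appears in two of the three experiments. If your gene of interest is missing after aggregating, checking each experiment's gene list individually can tell you which platform lacks it.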