Introduction to Microarray Data

Data analyses are generally not “one size fits all”; this is particularly true of the approaches used to analyze RNA-seq and microarray data. The data produced by these two technologies can have quite different characteristics, so this tutorial organizes its example analyses by technology. That way, you can follow examples that are tailored to the nature of the data at hand.

0.1 Introduction to microarray technology

Microarrays measure gene expression using chips filled with oligonucleotide probes designed to hybridize to labeled RNA samples. After hybridization, the microarrays are scanned, and the fluorescence intensity for each probe is measured. The fluorescence intensity indicates the number of labeled fragments bound and therefore the relative quantity of the transcript the probe is designed for.

Figure: Overview of the single-color microarray process (based on a diagram from Farina 2020).

There are many different kinds of microarray platforms, which can be broadly separated into single-color and two-color arrays. At this time, refine.bio only supports single-color arrays, so our examples and advice are generally written from the perspective of single-color arrays. The diagram above gives an overview of the single-color array process: extracting total RNA from a sample, labeling the RNA with a fluorescent dye, hybridizing the labeled RNA to the probes on the array, and scanning the array to measure the fluorescence intensity of each probe.

The two most common microarray platforms on refine.bio are Affymetrix GeneChips and Illumina BeadArray. A longer list of specific arrays that are supported by refine.bio can be found here.

As with all experimental methods, microarrays have strengths and limitations that you should consider in light of your scientific questions.

0.1.1 Microarray data strengths:

- Microarrays were historically less expensive than RNA-seq, which allowed for more replicates and greater statistical power (Tarca et al. 2006).
- Microarrays generally had a faster turnaround time than RNA-seq (LCSciences 2014).

As a result of these historical advantages, vast quantities of data have been generated worldwide using microarrays. The microarray data compiled by refine.bio includes more than 500,000 individual samples across more than 25,000 experiments. For many scientific questions, the best available gene expression data may be microarray-based!

0.1.2 Microarray data limitations:

- If a transcript doesn’t have a probe designed for it on a microarray, it won’t be measured; standard microarrays can’t be used for transcript discovery (Mantione et al. 2014).
- A chip’s probe designs are only as up to date as the genome annotation that was available when the chip was designed (Mantione et al. 2014).
- As with all techniques that involve nucleotide hybridization (including RNA-seq), microarray probes have some biases that depend on their nucleotide sequence composition (such as GC bias).

refine.bio drops outdated probes based on Brainarray’s annotation packages and applies SCAN normalization before data are made available for download; both steps help address these probe nucleotide composition biases (Dai et al. 2005; Piccolo et al. 2012).
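Conceptually, dropping outdated probes means keeping only the probes that an updated annotation still assigns to exactly one gene, then grouping the remaining probes by gene. The Python sketch below illustrates that idea on a small, made-up probe-to-gene table. It is not refine.bio’s or Brainarray’s actual implementation; the `filter_probes` function, its input format, and all probe and gene IDs are hypothetical.

```python
from collections import defaultdict


def filter_probes(annotation):
    """Keep probes that map to exactly one gene and group them by gene.

    `annotation` maps a probe ID to the list of genes that the probe
    matches under an updated genome annotation (hypothetical format).
    Returns (probes_by_gene, dropped_probes).
    """
    probes_by_gene = defaultdict(list)
    dropped = []
    for probe, genes in annotation.items():
        if len(set(genes)) == 1:
            # Unique match: the probe still reports on a single gene.
            probes_by_gene[genes[0]].append(probe)
        else:
            # No match (obsolete) or several matches (ambiguous): drop it.
            dropped.append(probe)
    return dict(probes_by_gene), dropped


# Made-up example; probe and gene IDs are illustrative only.
annotation = {
    "probe_1": ["GENE_A"],
    "probe_2": ["GENE_A"],
    "probe_3": [],                    # no longer matches any gene
    "probe_4": ["GENE_B", "GENE_C"],  # matches more than one gene
    "probe_5": ["GENE_D"],
}
kept, dropped = filter_probes(annotation)
print(kept)     # {'GENE_A': ['probe_1', 'probe_2'], 'GENE_D': ['probe_5']}
print(dropped)  # ['probe_3', 'probe_4']
```

In this toy example, GENE_B and GENE_C lose their only probe, so they would no longer appear in the processed data even though the chip originally targeted them.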

0.2 About quantile normalization

Microarray chips are generally processed experimentally in groups, which can introduce batch effects. To minimize this, all refine.bio microarray data downloads are quantile-normalized, which allows expression levels to be compared across experiments with more confidence. Using different microarray chips is also a type of batch effect, but quantile normalization lets us compare data from different chips to a limited degree, provided we proceed with caution. See the refine.bio docs for more about the microarray processing steps, including quantile normalization.
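Quantile normalization forces every sample to share the same distribution of values: each value is replaced by the average of the values that hold the same rank across all samples. The Python sketch below (using NumPy) shows the textbook version on a tiny made-up genes x samples matrix. It is an illustration only; refine.bio’s own procedure is described in its docs and may differ in details such as the target distribution and the handling of ties. The `quantile_normalize` function name is hypothetical.

```python
import numpy as np


def quantile_normalize(x):
    """Quantile-normalize the columns (samples) of a genes x samples matrix.

    Each value is replaced by the mean, across samples, of the values
    that share its within-sample rank. Ties are broken by position
    (a simplification; some implementations average tied ranks).
    """
    x = np.asarray(x, dtype=float)
    # Rank of each value within its own column (0 = smallest).
    order = np.argsort(x, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = np.sort(x, axis=0).mean(axis=1)
    # Map every value to the reference value at its rank.
    return reference[ranks]


# Made-up expression values: 4 genes (rows) x 3 samples (columns).
x = np.array([
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
])
normalized = quantile_normalize(x)
# After normalization, every column has the same sorted values.
print(np.sort(normalized, axis=0))
```

Because only the within-sample ranks are kept, differences in overall distribution between samples are removed whether they are technical or biological, which is one reason to proceed with caution when comparing across chips.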

0.3 More resources on microarray technology:

0.4 Common questions

0.4.1 Why doesn’t the gene I care about show up in this microarray dataset?

- A common and simple reason you may not see your gene of interest is that the microarray chip used in the experiment you are analyzing never had probes designed to target that gene.
- refine.bio uses Brainarray packages to annotate the microarray probe data for platforms where these packages are available (Dai et al. 2005). This annotation identifies which probes map to which genes according to an updated transcriptome annotation, which has likely changed since the microarray’s probes were first designed. Some probes may since have become obsolete (according to the updated genome annotation, they no longer bind reliably to a single location), which may result in the gene they targeted being removed. If your gene of interest was covered by the chip’s original probes and the version of the Brainarray package used still maps probes to it, your gene of interest will show up in the Gene column. You can find your dataset’s microarray chip and Brainarray version information on the refine.bio dataset page and by following these instructions.
- One additional reason you may not see a gene of interest applies only if you are using refine.bio’s aggregate by species option. When data are aggregated across different platforms, only the genes common to all of the aggregated experiments are kept.
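Aggregating by species keeps the intersection of the genes measured in each experiment. The Python sketch below reads the `Gene` column from several tab-separated tables and checks whether a gene of interest survives that intersection. The file layout (a tab-separated genes x samples table with a `Gene` header) is an assumption for illustration, the helper names `read_genes` and `genes_kept_after_aggregation` are hypothetical, and small in-memory tables with made-up gene IDs stand in for real downloaded files.

```python
import csv
import io


def read_genes(tsv_file):
    """Return the set of IDs in the `Gene` column of a tab-separated table.

    Assumes (for illustration) a genes x samples table whose header
    row includes a column named `Gene`.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return {row["Gene"] for row in reader}


def genes_kept_after_aggregation(gene_sets):
    """Only genes present in every experiment survive: a set intersection."""
    return set.intersection(*gene_sets)


# Made-up stand-ins for two experiments run on different platforms.
experiment_1 = io.StringIO(
    "Gene\tSAMPLE_1\tSAMPLE_2\n"
    "GENE_A\t5.1\t4.8\n"
    "GENE_B\t7.3\t7.9\n"
    "GENE_C\t2.2\t2.6\n"
)
experiment_2 = io.StringIO(
    "Gene\tSAMPLE_3\n"
    "GENE_A\t6.0\n"
    "GENE_C\t3.1\n"
)

kept = genes_kept_after_aggregation(
    [read_genes(experiment_1), read_genes(experiment_2)]
)
gene_of_interest = "GENE_B"
print(sorted(kept))              # ['GENE_A', 'GENE_C']
print(gene_of_interest in kept)  # False: GENE_B is missing from experiment_2
```

With files on disk, you would open each one with `open(path, newline="")` and pass the file object to `read_genes` in the same way.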

References

Dai M., P. Wang, A. D. Boyd, G. Kostov, B. Athey, et al., 2005 Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Research 33: e175. https://doi.org/10.1093/nar/gni179

Farina D., 2020 Gene expression analysis and DNA microarray assays. https://www.youtube.com/watch?v=Hv5flUOsE0s

Govindarajan R., J. Duraiyan, K. Kaliyappan, and M. Palanisamy, 2012 Microarray and its applications. Journal of Pharmacy and Bioallied Sciences 4: S310–312. https://doi.org/10.4103/0975-7406.100283

LCSciences, 2014 Microarray or RNA sequencing? https://www.lcsciences.com/news/microarray-or-rna-sequencing/

Mantione K. J., R. M. Kream, H. Kuzelova, R. Ptacek, J. Raboch, et al., 2014 Comparing bioinformatic gene expression profiling methods: Microarray and RNA-Seq. Medical Science Monitor Basic Research 20: 138–142. https://doi.org/10.12659/MSMBR.892101

Piccolo S. R., Y. Sun, J. D. Campbell, M. E. Lenburg, A. H. Bild, et al., 2012 A single-sample microarray normalization method to facilitate personalized-medicine workflows. Genomics 100: 337–344. https://doi.org/10.1016/j.ygeno.2012.08.003

Sánchez A., and M. C. R. de Villa, 2008 A tutorial review of microarray data analysis. http://www.ub.edu/stat/docencia/bioinformatica/microarrays/ADM/slides/A_Tutorial_Review_of_Microarray_data_Analysis_17-06-08.pdf

Slonim D. K., and I. Yanai, 2009 Getting started in gene expression microarray analysis. PLOS Computational Biology 5: e1000543. https://doi.org/10.1371/journal.pcbi.1000543

Tarca A. L., R. Romero, and S. Draghici, 2006 Analysis of microarray experiments of gene expression profiling. American Journal of Obstetrics and Gynecology 195: 373–388. https://doi.org/10.1016/j.ajog.2006.07.001

Wu H., Introduction to gene expression microarray data analysis. http://web1.sph.emory.edu/users/hwu30/teaching/bioc/GE1.pdf