Objectives

This notebook will demonstrate how to:

  • Normalize expression counts to better compare expression among cells
  • Explore the effects of normalization on variation among cells

In this notebook, we’ll continue processing the same dataset that we have been working with, moving on to normalization of the scRNA-seq count data that we have already quality-controlled.

For this tutorial, we will use a pair of R packages designed for single-cell analysis, scater and scran, to work with our data. This tutorial is based in part on the scran tutorial.

Roadmap: QC and filtering

Set Up

Load the libraries we will be using, and set the random number generation seed value for reproducibility.

# Set seed for reproducibility
set.seed(1234)

# GGPlot2 for the plots
library(ggplot2)

# Packages for single cell processing
library(scater)
library(scran)

Now let’s set up the files we will be using:

# main data directory
data_dir <- file.path("data", "tabula-muris")

# Filtered count matrix file from previous notebook
filtered_sce_file <- file.path(data_dir, "filtered", "filtered_sce.rds")

# Metadata file location
metadata_file <- file.path(data_dir, "TM_droplet_metadata.csv")

# Output directory for normalized data
norm_dir <- file.path(data_dir, "normalized")
fs::dir_create(norm_dir)

Read in the filtered count matrix and metadata

bladder_sce <- readr::read_rds(filtered_sce_file)
sc_metadata <- readr::read_csv(metadata_file)
Rows: 70118 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (9): cell, channel, mouse.id, tissue, subtissue, mouse.sex, cell_ontolog...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Adding more metadata to the SCE object

Because the Tabula Muris project is a well-studied data set, we actually have some cell type information for this data set that we can refer to.

Note that we would normally NOT have this information until later in the analysis pipeline! Nonetheless, adding it here will be useful for visualizing the results of our normalization (and demonstrating how one might add metadata to the SingleCellExperiment object).

# get the column (cell) metadata (this includes earlier QC stats!)
# and convert to a data frame
cell_info <- data.frame(colData(bladder_sce)) |>
  # convert the row names of this data frame to a separate column
  tibble::rownames_to_column("barcode")

cell_metadata <- sc_metadata |>
  # filter to just the sample we are working with
  dplyr::filter(channel == "10X_P4_3") |>
  # extract the 16 nt cell barcodes from the `cell` column
  dplyr::mutate(barcode = stringr::str_sub(cell, start = -16)) |>
  # choose only the columns we want to add
  dplyr::select(barcode, cell_ontology_class, free_annotation)

# Join the tables together, using `left_join()` to preserve all rows in cell_info
cell_info <- cell_info |>
  dplyr::left_join(cell_metadata, by = "barcode")

Check that the cell barcodes in cell_info are still in the same order as the columns of our data.

all.equal(cell_info$barcode, colnames(bladder_sce))
[1] TRUE
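Two details of this step are easy to miss: a left join keeps cells that have no match (filling their annotations with NA), and all.equal() returns a descriptive string rather than FALSE when its arguments differ. The base R sketch below illustrates both. The toy tables, barcodes, and labels are made up and only stand in for cell_info and cell_metadata.

```r
# Toy stand-ins for `cell_info` and `cell_metadata` (made-up barcodes and labels)
toy_info <- data.frame(
  barcode = c("AAAC", "AAAG", "AACT", "AAGT"),
  sum = c(1200, 800, 1500, 950)
)
toy_annot <- data.frame(
  barcode = c("AAGT", "AAAC", "AACT"), # different order, one barcode absent
  cell_type = c("type_A", "type_B", "type_A")
)

# Base R equivalent of a left join on `barcode`: look up each barcode in the
# annotation table; barcodes with no match get NA
toy_joined <- toy_info
toy_joined$cell_type <- toy_annot$cell_type[match(toy_info$barcode, toy_annot$barcode)]

# Row order still follows `toy_info`, so this comparison succeeds
order_ok <- isTRUE(all.equal(toy_joined$barcode, toy_info$barcode))

# On a mismatch, all.equal() returns a character description, not FALSE,
# so wrap it in isTRUE() (or use identical()) before using it in a condition
mismatch <- all.equal(rev(toy_joined$barcode), toy_info$barcode)
is.character(mismatch)
```

In a script, where nobody is reading the printed result, wrapping the check in stopifnot(isTRUE(...)) or using identical() makes a mismatch stop the analysis instead of passing silently.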

Now we can add that data back to the SingleCellExperiment object. To match the format of that object, we have to convert our table to a DataFrame object. Just to keep things confusing, a DataFrame is not the same as the data.frame that we have been using throughout. We also need to include the row.names argument so that the cell barcodes stay attached as row names.

Note that this will replace all of the previous column (cell) metadata, which is part of the reason that we pulled out all previous column data content first.

# add new metadata data back to `bladder_sce`
colData(bladder_sce) <- DataFrame(cell_info, row.names = cell_info$barcode)

Normalization of count data

In whatever data we are working with, we are always looking to maximize biological variance and minimize technical variance. A primary source of technical variation we are concerned with is the variation in library sizes among our cells. While different cells may truly have different total transcript counts, it seems more likely that most of the variation we see is due to library construction, amplification, and sequencing.

This is where normalization methods usually come into the workflow. The distribution of the counts that we saw in the previous notebook, and in particular the fact that the count data are noisy with many zero counts, makes normalization particularly tricky. To handle this noise, we normalize cells in groups with other cells like them, a method introduced in Lun et al. (2016).

Briefly, we first cluster the cells to find groups of similar cells, then compute normalization factors based on the sums of expression in those groups. The group normalization is then applied back to the individual cells within the group to create a normalized count matrix. In this case, we will also log-transform the normalized counts to get a less skewed distribution of expression measures. Note that because of the zero counts, the logNormCounts() function will add a pseudocount of 1 to each value before computing the log.

# Step 1) Group cells with other like cells by clustering.
qclust <- scran::quickCluster(bladder_sce)
Warning in regularize.values(x, y, ties, missing(ties), na.rm = na.rm):
collapsing to unique 'x' values
# Step 2) Compute sum factors for each cell cluster grouping.
bladder_sce <- scran::computeSumFactors(bladder_sce, clusters = qclust)

# Step 3) Normalize using these pooled sum factors and log transform.
bladder_sce <- scater::logNormCounts(bladder_sce)
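To make the arithmetic behind Step 3 concrete, here is a minimal base R sketch of size-factor normalization followed by a log transform. It assumes simple library-size factors (each cell's total count divided by the mean total), which is a simpler stand-in for the pooled factors that computeSumFactors() estimates, not a reimplementation of them. The toy matrix is made up, and the log2(x + 1) form reflects the pseudocount of 1 described above together with the base-2 log that logNormCounts() uses by default.

```r
# Toy count matrix: 4 genes (rows) x 3 cells (columns), made-up values
toy_counts <- matrix(
  c(10, 0, 5, 5,
    20, 0, 10, 10,
    4, 2, 0, 2),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3))
)

# Library-size factors: each cell's total, scaled so the factors average to 1
lib_sizes <- colSums(toy_counts)
size_factors <- lib_sizes / mean(lib_sizes)

# Divide each cell (column) by its size factor
norm_counts <- sweep(toy_counts, 2, size_factors, FUN = "/")

# Add a pseudocount of 1 and take log2, so zero counts map to zero
toy_logcounts <- log2(norm_counts + 1)

# After scaling, every cell has the same total
colSums(norm_counts)
```

In this toy matrix, cell1 and cell2 have the same relative expression and differ only in depth, so their normalized profiles coincide, while cell3 remains distinct.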

Compare normalized data to count data

One way to determine whether our normalization yields biologically relevant results is to plot the data and see if similarly labeled cells end up together. Because plotting expression for thousands of genes together isn’t practical, we will reduce the dimensions of our data using Principal Components Analysis (PCA).

We will also make the same plot with our unnormalized data, to visualize the effect of normalization on our sample. We’ll do this comparison twice:

  • Once coloring the points by their total UMI count
  • Once coloring the points based on their cell labels

Before plotting the unnormalized data, we will log transform the raw counts to make their scaling more comparable to the normalized data. To do this we will use the log1p() function, which is specifically designed for the case where we want to add 1 to all of our values before taking the log, as we do here. (We could do something like log(counts + 1), but this is both more efficient and more accurate.)
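The accuracy claim can be checked quickly in base R. For very small inputs, 1 + x rounds to exactly 1 in double precision, so log(1 + x) returns 0 while log1p(x) keeps the value. Counts are whole numbers, so the difference is negligible for our data. Note also that log1p() is a natural log, whereas logNormCounts() uses base 2 by default, so the two scales differ by a constant factor. The values below are made up for illustration.

```r
# A value far smaller than machine epsilon (about 2.2e-16)
tiny <- 1e-20

naive <- log(1 + tiny)   # 1 + tiny rounds to 1, so the result is exactly 0
stable <- log1p(tiny)    # retains the small value

# For count-sized inputs the two approaches agree
example_counts <- c(0, 1, 10, 1000)
max_diff <- max(abs(log1p(example_counts) - log(example_counts + 1)))

# log1p() is a natural log; dividing by log(2) converts to the log2 scale
log2_version <- log1p(example_counts) / log(2)
```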

# Use PCA for dimension reduction of cells' scran normalized data
norm_pca <- scater::calculatePCA(bladder_sce)

# PCA on the raw counts, log transformed
log_pca <- counts(bladder_sce) |> # get the raw counts
  log1p() |> # log transform to make these more comparable to the normalized values
  scater::calculatePCA() # calculate PCA scores

Note that we are using scater::calculatePCA() two different ways here: once on the full bladder_sce object, and once on just the counts matrix. When we use calculatePCA() on the object, it automatically uses the log normalized matrix from inside the object.
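For intuition about what calculatePCA() returns, the base R sketch below runs prcomp() on a toy log-expression matrix. It is a simplified stand-in: as we understand the defaults, calculatePCA() also restricts the analysis to the most variable genes before computing components, which is skipped here. The toy matrix has genes in rows and cells in columns, the orientation calculatePCA() expects for a matrix, so it has to be transposed for prcomp(), which expects observations in rows.

```r
set.seed(1234)

# Toy log-expression matrix: 50 genes (rows) x 20 cells (columns), random values
toy_logexpr <- matrix(
  rnorm(50 * 20),
  nrow = 50,
  dimnames = list(paste0("gene", 1:50), paste0("cell", 1:20))
)

# prcomp() expects observations (cells) in rows, so transpose first;
# center each gene but leave the scale alone
toy_pca <- prcomp(t(toy_logexpr), center = TRUE, scale. = FALSE)

# One row of scores per cell, one column per principal component
toy_scores <- toy_pca$x[, 1:2]
head(toy_scores)

# Proportion of variance explained by each component
var_explained <- toy_pca$sdev^2 / sum(toy_pca$sdev^2)
var_explained[1:2]
```

The score matrix has one row per cell with columns named PC1, PC2, and so on, which is why the plotting code can refer to PC1 and PC2 directly.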

Next we will arrange the PCA scores for plotting, adding a column for each of the total UMI counts and the cell type labels so we can color each point of the plot.

# Set up the PCA scores for plotting
norm_pca_scores <- data.frame(norm_pca,
                              barcode = rownames(norm_pca),
                              total_umi = bladder_sce$sum,
                              cell_type = bladder_sce$cell_ontology_class)
log_pca_scores <- data.frame(log_pca,
                             barcode = rownames(log_pca),
                             total_umi = bladder_sce$sum,
                             cell_type = bladder_sce$cell_ontology_class)

First, we will plot the unnormalized PCA scores with their total UMI counts:

# Now plot counts pca
ggplot(log_pca_scores, aes(x = PC1, y = PC2, color = total_umi)) +
  geom_point() +
  labs(title = "Log counts (unnormalized) PCA scores",
       color = "Total UMI count") +
  scale_color_viridis_c() +
  theme_bw()

We’ve plotted the unnormalized data for you. Knowing that we want the same graph, but different data, use the above template to plot the normalized data. Feel free to customize the plot with a different theme or color scheme!

Let’s plot the norm_pca_scores data:

ggplot(norm_pca_scores, aes(x = PC1, y = PC2, color = total_umi)) +
  geom_point() +
  labs(title = "Normalized log counts PCA scores",
       color = "Total UMI count") +
  scale_color_viridis_c() +
  theme_bw()

Do you see an effect from the normalization when comparing these plots?
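One way to put a number on what the plots show is to correlate each principal component with the total UMI count. The helper below, depth_association(), is hypothetical (it is not part of scater or scran) and the toy scores are made up. With the notebook's own objects, one might pass log_pca_scores or norm_pca_scores and their total_umi column instead.

```r
# Hypothetical helper (not part of scater or scran): absolute Spearman
# correlation between each PC column and a per-cell depth measure
depth_association <- function(scores, depth, pcs = c("PC1", "PC2")) {
  vapply(
    pcs,
    function(pc) abs(cor(scores[[pc]], depth, method = "spearman")),
    numeric(1)
  )
}

# Toy scores: PC1 increases steadily with depth, PC2 alternates in sign
toy_scores <- data.frame(
  PC1 = c(-2, -1, 0, 1, 2, 3),
  PC2 = c(1, -1, 1, -1, 1, -1),
  total_umi = c(500, 800, 1200, 2000, 3500, 6000)
)

# Named vector with one value per requested component
toy_assoc <- depth_association(toy_scores, toy_scores$total_umi)
toy_assoc
```

A value near 1 for PC1 would suggest that the leading axis of variation mostly tracks sequencing depth rather than biology.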

Now, let’s plot these two sets of PCA scores again, but colored by cell type. Do you see an effect from the normalization when comparing these plots?

# First, plot the normalized pca
ggplot(norm_pca_scores, aes(x = PC1, y = PC2, color = cell_type)) +
  geom_point() +
  labs(title = "Normalized log counts PCA scores",
       color = "Cell Type") +
  scale_color_brewer(palette = "Dark2", na.value = "grey70") + # add a visually distinct color palette
  theme_bw()

# Next, plot log count pca
ggplot(log_pca_scores, aes(x = PC1, y = PC2, color = cell_type)) +
  geom_point() +
  labs(title = "Log counts (unnormalized) PCA scores",
       color = "Cell Type") +
  scale_color_brewer(palette = "Dark2", na.value = "grey70") + # add a visually distinct color palette
  theme_bw()

Save the normalized data to a TSV file

In case we want to return to these data later, let’s save the normalized data to a TSV file. To do this, we need to extract our normalized counts from bladder_sce. Refer back to the SingleCellExperiment figure above to determine why we are using the logcounts() function here.

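Recall that readr::write_tsv() requires a data frame, so we need to convert the logcounts matrix to a data frame. We have to do this in two steps: first converting the sparse matrix to a standard R matrix, then converting that to a data frame. Because write_tsv() does not write row names, we also move the gene identifiers into their own column (named gene_id here) so they are not lost.

# Save this gene matrix to a tsv file
logcounts(bladder_sce) |>
  as.matrix() |>
  as.data.frame() |>
  # keep the gene identifiers, which `write_tsv()` would otherwise drop
  tibble::rownames_to_column("gene_id") |>
  readr::write_tsv(file.path(norm_dir, "scran_norm_gene_matrix.tsv"))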

We may want to return to our normalized bladder_sce object in the future, so we will also save it in an RDS file that we can re-load into our R environment as a SingleCellExperiment object.

# Save the data as an RDS
readr::write_rds(bladder_sce, file.path(norm_dir, "normalized_bladder_sce.rds"))
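As a small illustration of why RDS is convenient for R objects, the sketch below saves and restores a list with base R's saveRDS() and readRDS(), which readr::write_rds() and readr::read_rds() wrap. The object and temporary file are placeholders; the same round trip applies to a SingleCellExperiment.

```r
# Placeholder object with mixed contents, standing in for a larger R object
toy_object <- list(
  counts = matrix(
    1:6,
    nrow = 2,
    dimnames = list(c("g1", "g2"), c("c1", "c2", "c3"))
  ),
  labels = factor(c("a", "b", "a"))
)

# Write to a temporary file and read it back
rds_path <- tempfile(fileext = ".rds")
saveRDS(toy_object, rds_path)
restored <- readRDS(rds_path)

# Structure, dimnames, and factor levels all survive the round trip
identical(toy_object, restored)

# Clean up the temporary file
unlink(rds_path)
```

Unlike the TSV file, which stores only the expression matrix, the RDS file keeps the whole object, including its metadata.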