Summary:

Sequence Read Archive (SRA) is a public repository of sequencing data, including many RNA-seq datasets.
SRAdb is an R package on Bioconductor that can help you retrieve metadata associated with samples on SRA, which is what we’ll use it for here. It can also help you look up URLs for files you might want to retrieve.

In this example, we will show you how to retrieve metadata from SRAdb for a project or study ID (e.g., SRP123456). The authors of SRAdb provide an sqlite file, SRAmetadb.sqlite, that contains the database that we will query to get the metadata we need.

The Tidyverse includes the dbplyr package, made for working with databases such as sqlite using much of the same dplyr syntax that we have already been using. We will use dbplyr (a database backend for dplyr functions) to extract sample information related to our SRA project (SRP) ID of interest. You will recognize a lot of the functions from our Intro to the Tidyverse notebook.
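To make the "same dplyr syntax" point concrete, here is a minimal sketch that uses a toy in-memory sqlite database rather than the real SRAdb file. The table contents and accession values are made up for illustration; only the column names mirror ones used later in this notebook. dplyr::show_query() prints the SQL that dbplyr writes on our behalf.

```r
# magrittr pipe
`%>%` <- dplyr::`%>%`

# Toy in-memory database; the rows below are made up for illustration
toy_con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(
  toy_con, "sra",
  data.frame(
    run_accession = c("SRR0000001", "SRR0000002", "SRR0000003"),
    study_accession = c("SRP000001", "SRP000001", "SRP000002"),
    stringsAsFactors = FALSE
  )
)

# Familiar dplyr verbs, applied to a database table instead of a data frame
toy_query <- dplyr::tbl(toy_con, "sra") %>%
  dplyr::filter(study_accession == "SRP000001") %>%
  dplyr::select(run_accession)

# Print the SQL that dbplyr generated from the verbs above
dplyr::show_query(toy_query)

# Run the query and bring the result into R
toy_result <- dplyr::collect(toy_query)

DBI::dbDisconnect(toy_con)
```

Nothing here is specific to SRAdb; the same pattern should apply to any table you can reach through a DBI connection.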

Obtaining the SRAdb sqlite file

SRAmetadb.sqlite is already downloaded for you in ~/shared-data/SRAdb. We are including this code for your reference so you know how to obtain this file outside of our RStudio Server.

Note that this file is very large (24 GB!), so please be sure to avoid downloading another copy to our RStudio Server.

# Install the SRAdb package if it is not installed.
# if (!("SRAdb" %in% installed.packages())) {
#   BiocManager::install("SRAdb")
# }

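# Declare directory to hold SRAdb file
# sra_db_dir <- <DIRECTORY NAME HERE>

# Declare the path where the database file will be saved
# sqlite_file <- file.path(sra_db_dir, "SRAmetadb.sqlite")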

# Create the directory where the SRAdb file will be downloaded to
# if (!dir.exists(sra_db_dir)) {
#   dir.create(sra_db_dir, recursive = TRUE)
# }

# Downloading this file will take some time, so we will only download it if 
# it doesn't exist.
# if (!file.exists(sqlite_file)) {
#   # Use SRAdb's function to download the file
#   SRAdb::getSRAdbFile(destdir = sra_db_dir)
# }

Set Up

For this example, we will use SRP045496, an RNA-seq medulloblastoma mouse model experiment.

For more on SRA IDs, and more than you’ll need to know about the SRA database, see this knowledge base (https://www.ncbi.nlm.nih.gov/books/NBK56913/).

# Declare the project id you are interested in.
study_id <- "SRP045496"
# magrittr pipe
`%>%` <- dplyr::`%>%`

Set up SRA input directory and file path.

We have already downloaded this SRAdb file to the shared-data folder for you.

# Declare SRA directory 
# Here ~ refers to your home directory
sra_db_dir <- file.path("~", "shared-data", "SRAdb")

# Declare database file path
sqlite_file <- file.path(sra_db_dir, "SRAmetadb.sqlite")

Set up output directory and file path.

# Declare output directory 
output_dir <- file.path("SRA_metadata")

# Declare output file path using the project ID
output_metadata_file <- file.path(output_dir, paste0(study_id, "_metadata.tsv"))

# Create the directory if it doesn't exist.
if (!dir.exists(output_dir)) {
  dir.create(output_dir,  recursive = TRUE)
}

Connect to the sqlite file.

# Make connection to sqlite database file
sra_con <- DBI::dbConnect(RSQLite::SQLite(), sqlite_file)
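One thing to be aware of: with its default settings, DBI::dbConnect() with RSQLite creates a new, empty database when the file path does not exist, so a typo in the path will not produce an error at this step. The sketch below checks for the file first and then lists the tables and columns that are available. It uses a small temporary sqlite file with a made-up study table as a stand-in for SRAmetadb.sqlite, so all toy_ names and values are assumptions for illustration.

```r
# Stand-in for SRAmetadb.sqlite: a temporary sqlite file with one made-up table
toy_sqlite_file <- tempfile(fileext = ".sqlite")
setup_con <- DBI::dbConnect(RSQLite::SQLite(), toy_sqlite_file)
DBI::dbWriteTable(
  setup_con, "study",
  data.frame(
    study_accession = "SRP000001",
    study_title = "A made-up study",
    stringsAsFactors = FALSE
  )
)
DBI::dbDisconnect(setup_con)

# Stop early if the file is missing, rather than silently creating an empty
# database at the mistyped path
if (!file.exists(toy_sqlite_file)) {
  stop("Cannot find database file: ", toy_sqlite_file)
}
toy_con <- DBI::dbConnect(RSQLite::SQLite(), toy_sqlite_file)

# Which tables does the database hold, and which columns does one of them have?
toy_tables <- DBI::dbListTables(toy_con)
toy_fields <- DBI::dbListFields(toy_con, "study")
print(toy_tables)
print(toy_fields)

# Close the connection once you are finished with the database
DBI::dbDisconnect(toy_con)
unlink(toy_sqlite_file)
```

With the real file, the same dbListTables() and dbListFields() calls on sra_con are a quick way to see what is available before writing any queries.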

Find information on our selected SRA project id

sqlite databases are made up of a set of tables, much like the data frames we have been using. We need to create variables that point to each table that we want to use with dplyr.

Using dplyr::tbl(), create an object that refers to the sra table through sra_con, our connection to the sqlite file, sqlite_file.

sra_table <- dplyr::tbl(sra_con, "sra") 
Registered S3 methods overwritten by 'dbplyr':
  method         from
  print.tbl_lazy     
  print.tbl_sql      

Create an object that refers to the study table from the same sqlite connection.

study_table <- dplyr::tbl(sra_con, "study")

Use the dbplyr extension of dplyr functions to collect information related to our declared study_id. In this context, the transformation steps that you’ll recognize (filter(), inner_join()) work on the sqlite database directly (using dbplyr), which is why the as.data.frame() step is required at the end to bring the data into the R environment as a standard data frame.

# Use the sqlite connection table to apply functions to it
sra_df <- sra_table %>% 
  # Filter to the study_id we are looking for
  dplyr::filter(study_accession == study_id) %>% 
  # Inner join to the study table that has more specific info about the study
  dplyr::inner_join(study_table, by = "study_accession") %>% 
  # We need to do this so the dbplyr queries are transformed to a data frame
  as.data.frame() 
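To see why that final step matters, here is a small sketch on a toy in-memory database with made-up sra and study tables. Before as.data.frame(), the object is only a description of a query (a lazy table). Afterwards, it is an ordinary data frame. All toy_ names and values are assumptions for illustration.

```r
# magrittr pipe
`%>%` <- dplyr::`%>%`

# Toy database with two made-up tables that share study_accession
toy_con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(toy_con, "sra", data.frame(
  run_accession = c("SRR0000001", "SRR0000002", "SRR0000003"),
  study_accession = c("SRP000001", "SRP000001", "SRP000002"),
  stringsAsFactors = FALSE
))
DBI::dbWriteTable(toy_con, "study", data.frame(
  study_accession = c("SRP000001", "SRP000002"),
  study_title = c("Made-up study A", "Made-up study B"),
  stringsAsFactors = FALSE
))

toy_study_id <- "SRP000001"

# These steps only build up a query; no rows have been brought into R yet
lazy_result <- dplyr::tbl(toy_con, "sra") %>%
  dplyr::filter(study_accession == toy_study_id) %>%
  dplyr::inner_join(dplyr::tbl(toy_con, "study"), by = "study_accession")

# A lazy table is not a data frame
is_lazy <- inherits(lazy_result, "tbl_lazy")
lazy_is_df <- is.data.frame(lazy_result)

# as.data.frame() runs the query and returns a standard data frame
toy_df <- as.data.frame(lazy_result)

DBI::dbDisconnect(toy_con)
```

dplyr::collect() is another way to run the query; it returns a tibble instead of a base data frame.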

Retrieve sample-level information

Create a vector of sample ids that are related to our study_id.

# Pull out sample accessions for the corresponding study
sample_accessions <- sra_df %>% 
  dplyr::pull(sample_accession)

Connect to the sample table in our sqlite connection, sra_con.

sample_table <- dplyr::tbl(sra_con, "sample") 

Use dbplyr functions to filter the sample_table to only the samples related to our study_id.

# Filter the sample_table of the sqlite file
sample_df <- sample_table %>% 
  # Collect the samples that we identified in the previous set of steps
  dplyr::filter(sample_accession %in% sample_accessions) %>%
  # Turn into a data.frame
  as.data.frame()

Clean the metadata

Here we’ll do some cleaning. These cleaning steps will depend on the metadata itself and on what information you are interested in.

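cleaned_sample_df <- sample_df %>% 
  # Here we are getting rid of any columns that consist only of NAs.
  # select_if() keeps the columns for which the condition is TRUE, so this
  # also behaves sensibly when no column is entirely NA
  dplyr::select_if(~ !all(is.na(.)))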
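The idea of dropping all-NA columns can also be sketched in base R on a toy data frame (the values below are made up). The sketch also shows an edge case worth knowing about: removing columns by negative position with which() selects zero columns when nothing matches, because -integer(0) is still integer(0). Logical indexing does not have this problem.

```r
# Toy sample table; description is entirely NA
toy_df <- data.frame(
  sample_accession = c("SRS0000001", "SRS0000002"),
  sample_attribute = c("tissue: cerebellum", "tissue: cerebellum"),
  description = c(NA, NA),
  stringsAsFactors = FALSE
)

# TRUE for each column made up entirely of NAs
all_na <- vapply(toy_df, function(column) all(is.na(column)), logical(1))

# Logical indexing keeps the columns that are not all NA
no_empty_columns <- toy_df[, !all_na, drop = FALSE]

# Edge case: a table where no column is all NA
complete_df <- toy_df[, c("sample_accession", "sample_attribute")]
none_all_na <- vapply(complete_df, function(column) all(is.na(column)),
                      logical(1))

# Negative positions from which() select nothing at all here
n_cols_negative_which <- ncol(complete_df[, -which(none_all_na), drop = FALSE])

# Logical indexing keeps both columns, as intended
n_cols_logical <- ncol(complete_df[, !none_all_na, drop = FALSE])
```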

We’re interested in the sample accession and the sample attributes only, so we will use the dplyr::select() function to pick the columns we want to retain.

cleaned_sample_df <- cleaned_sample_df %>%
  dplyr::select(sample_accession,
                sample_attribute)

Sample attributes

The metadata we’re interested in is in a column called sample_attribute, but unfortunately it is in a format that is not usable yet. Let’s take a look at that column.

cleaned_sample_df$sample_attribute
 [1] "source_name: cerebellum || strain: C57BL/6 || tissue: cerebellum || age: post natal day 60 || genotype: wild type"                                    
 [2] "source_name: cerebellum || strain: C57BL/6 || tissue: cerebellum || age: post natal day 60 || genotype: wild type"                                    
 [3] "source_name: cerebellum || strain: C57BL/6 || tissue: cerebellum || age: post natal day 60 || genotype: wild type"                                    
 [4] "source_name: cerebellum || strain: C57BL/6 || tissue: cerebellum || age: post natal day 60 || genotype: Gsa conditional knockout mediated by Olig1Cre"
 [5] "source_name: cerebellum || strain: C57BL/6 || tissue: cerebellum || age: post natal day 60 || genotype: Gsa conditional knockout mediated by Olig1Cre"
 [6] "source_name: cerebellum || strain: C57BL/6 || tissue: cerebellum || age: post natal day 60 || genotype: Gsa conditional knockout mediated by Olig1Cre"
 [7] "source_name: cerebellum || strain: C57BL/6 || tissue: cerebellum || age: post natal day 60 || genotype: Gsa conditional knockout mediated by Olig1Cre"
 [8] "source_name: cerebellum || strain: C57BL/6 || tissue: cerebellum || age: post natal day 60 || genotype: Gsa conditional knockout mediated by Olig1Cre"
 [9] "source_name: cerebellum || strain: C57BL/6 || tissue: cerebellum || age: post natal day 60 || genotype: Gsa conditional knockout mediated by Olig1Cre"
[10] "source_name: cerebellum || strain: C57BL/6 || tissue: cerebellum || age: post natal day 60 || genotype: Gsa conditional knockout mediated by Olig1Cre"
[11] "source_name: cerebellum || strain: C57BL/6 || tissue: cerebellum || age: post natal day 60 || genotype: Gsa conditional knockout mediated by Olig1Cre"
[12] "source_name: cerebellum || strain: C57BL/6 || tissue: cerebellum || age: post natal day 60 || genotype: Gsa conditional knockout mediated by hGFAPCre"
[13] "source_name: cerebellum || strain: C57BL/6 || tissue: cerebellum || age: post natal day 60 || genotype: Gsa conditional knockout mediated by hGFAPCre"
[14] "source_name: cerebellum || strain: C57BL/6 || tissue: cerebellum || age: post natal day 60 || genotype: Gsa conditional knockout mediated by hGFAPCre"
[15] "source_name: cerebellum || strain: C57BL/6 || tissue: cerebellum || age: post natal day 60 || genotype: Gsa conditional knockout mediated by hGFAPCre"
[16] "source_name: cerebellum || strain: C57BL/6 || tissue: cerebellum || age: post natal day 60 || genotype: Gsa conditional knockout mediated by hGFAPCre"
[17] "source_name: cerebellum || strain: C57BL/6 || tissue: cerebellum || age: post natal day 60 || genotype: Gsa conditional knockout mediated by hGFAPCre"
[18] "source_name: cerebellum || strain: C57BL/6 || tissue: cerebellum || age: post natal day 60 || genotype: Gsa conditional knockout mediated by hGFAPCre"
[19] "source_name: cerebellum || strain: C57BL/6 || tissue: cerebellum || age: post natal day 60 || genotype: Gsa conditional knockout mediated by hGFAPCre"

Notice how this single column contains 5 columns’ worth of information: source_name, strain, tissue, age, and genotype. We’d prefer the 5 columns!

Luckily there’s another package in the Tidyverse that has functionality for splitting up the column, tidyr, and the function is called separate().

Each column’s worth of information is separated by " || " (two pipes with a space on either side), so we can use that to split sample_attribute up into individual columns. | is a special character in regular expressions (regex), which means we need to place \\ in front of each | to escape the character.
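As a quick illustration of the escaping, here is a sketch using base R's strsplit() on a made-up attribute string. The escaped regular expression and a fixed (literal) string split give the same pieces. The cat() call shows what the pattern looks like once R has processed the doubled backslashes.

```r
# A made-up attribute string in the same style as sample_attribute
toy_attribute <- "tissue: cerebellum || genotype: wild type"

# Regular expression: each | is escaped so that it is matched literally
regex_split <- strsplit(toy_attribute, " \\|\\| ")[[1]]

# Fixed string: no escaping is needed because nothing is treated as special
fixed_split <- strsplit(toy_attribute, " || ", fixed = TRUE)[[1]]

# In R source code we type \\ to get a single backslash in the pattern itself
cat(" \\|\\| ", "\n")

print(regex_split)
```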

First we need to create a character vector that contains the new column names.

# We use the attribute names from above. This character vector will be specific
# to the experiment you are working with.
new_columns <- c("source_name", 
                 "strain", 
                 "tissue", 
                 "age", 
                 "genotype")
cleaned_sample_df <- cleaned_sample_df %>%
  # We first tell separate which column we'd like to separate into multiple
  # columns
  tidyr::separate(col = sample_attribute,
                  # Here we need to tell the function what columns to split 
                  # things *into* - we'll use the vector we created above
                  into = new_columns,
                  # What characters can be used to mark the separation between
                  # items that should become multiple columns
                  # We need to use \\ in front of each | because | is a 
                  # special character
                  sep = " \\|\\| ")
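Splitting into a fixed set of column names works here because every sample lists the same five attributes in the same order. For experiments where that is not the case, one possible approach (a sketch under assumptions, not something this dataset requires) is to split each attribute into its own row, separate the name from the value, and then spread the names out into columns. The toy data below is made up.

```r
# magrittr pipe
`%>%` <- dplyr::`%>%`

# Made-up samples whose attributes differ in number and order
toy_sample_df <- data.frame(
  sample_accession = c("SRS0000001", "SRS0000002"),
  sample_attribute = c(
    "tissue: cerebellum || age: post natal day 60",
    "age: post natal day 60 || tissue: cerebellum || genotype: wild type"
  ),
  stringsAsFactors = FALSE
)

toy_wide_df <- toy_sample_df %>%
  # One row per "name: value" pair
  tidyr::separate_rows(sample_attribute, sep = " \\|\\| ") %>%
  # Split the name from the value at the first ": " only
  tidyr::separate(sample_attribute,
                  into = c("attribute", "value"),
                  sep = ": ",
                  extra = "merge") %>%
  # One column per attribute name; missing attributes become NA
  tidyr::pivot_wider(names_from = attribute,
                     values_from = value)

print(toy_wide_df)
```

A side effect of this approach is that the attribute names are removed from the values in the same step.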

Let’s take a look at our new columns.

cleaned_sample_df %>%
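  # all_of() tells select() that new_columns is a character vector of
  # column names, not a column itself
  dplyr::select(dplyr::all_of(new_columns))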

This is certainly better, but it’s not quite what we want yet. We’ll want to remove the words before the : in each column. We can use dplyr::mutate() and yet another Tidyverse package, stringr.

Let’s first use stringr::str_extract() to check what our pattern matches. The information we want to keep is everything after the ": " in each entry. We’ll use source_name as an example. In the pattern below, .* matches any run of characters, so ".*: " matches everything up to and including the ": ".

stringr::str_extract(cleaned_sample_df$source_name, ".*: ")
 [1] "source_name: " "source_name: " "source_name: " "source_name: " "source_name: " "source_name: " "source_name: " "source_name: " "source_name: "
[10] "source_name: " "source_name: " "source_name: " "source_name: " "source_name: " "source_name: " "source_name: " "source_name: " "source_name: "
[19] "source_name: "

Using ".*: " returns everything we want to remove from this column. We can use the same pattern with stringr::str_remove() to remove everything up to and including the ": ":

stringr::str_remove(cleaned_sample_df$source_name, ".*: ")
 [1] "cerebellum" "cerebellum" "cerebellum" "cerebellum" "cerebellum" "cerebellum" "cerebellum" "cerebellum" "cerebellum" "cerebellum" "cerebellum" "cerebellum"
[13] "cerebellum" "cerebellum" "cerebellum" "cerebellum" "cerebellum" "cerebellum" "cerebellum"

This is exactly what we want to keep in the source_name column!
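One caveat to keep in mind for other datasets: .* is greedy, so ".*: " removes everything up to the last ": " in a string. That makes no difference for the values in this experiment, but it would for a value that itself contains ": ". The sketch below uses made-up values to show the difference between the greedy pattern and one that stops at the first colon.

```r
# Made-up values; the second one has a ": " inside the value itself
toy_values <- c("genotype: wild type", "treatment: drug: 5 mg")

# Greedy: .* runs to the last ": " in each string
greedy_removed <- stringr::str_remove(toy_values, ".*: ")
print(greedy_removed)
# "wild type" "5 mg"

# [^:]* cannot cross a colon, so only the first label is removed
first_label_removed <- stringr::str_remove(toy_values, "^[^:]*: ")
print(first_label_removed)
# "wild type" "drug: 5 mg"
```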

Now we’re ready to use stringr::str_remove() with dplyr::mutate() to get some cleaned metadata! We’ll actually use dplyr::mutate_at(), which will apply the function at each column we specify.

cleaned_sample_df <- cleaned_sample_df %>%
  # We can use the same character vector as before to specify which columns
  # we want to apply stringr::str_remove() at
  dplyr::mutate_at(new_columns,
                   # This tells mutate_at to apply stringr::str_remove where
                   # the column is the first argument to stringr::str_remove -
                   # the string we are removing a pattern from
                   ~ stringr::str_remove(., ".*: "))
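In dplyr 1.0.0 and later, the mutate_at() family is superseded by dplyr::across() used inside dplyr::mutate(). If you are working with a recent version of dplyr, the same step can be sketched as follows. The toy data frame and toy_columns vector are made up for illustration.

```r
# magrittr pipe
`%>%` <- dplyr::`%>%`

# Made-up metadata that still has the attribute names in each value
toy_df <- data.frame(
  sample_accession = c("SRS0000001", "SRS0000002"),
  tissue = c("tissue: cerebellum", "tissue: cerebellum"),
  genotype = c("genotype: wild type", "genotype: knockout"),
  stringsAsFactors = FALSE
)
toy_columns <- c("tissue", "genotype")

toy_cleaned_df <- toy_df %>%
  dplyr::mutate(
    # Apply str_remove() to each column named in toy_columns; .x stands for
    # the column currently being processed
    dplyr::across(dplyr::all_of(toy_columns),
                  ~ stringr::str_remove(.x, ".*: "))
  )

print(toy_cleaned_df)
```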

Let’s take a look at our mutated columns.

head(cleaned_sample_df)

Adding run accessions

Run accessions in SRA essentially correspond to libraries. refine.bio, and often other resources dealing with processed RNA-seq data, use run accessions because that is what usually corresponds to FASTQ files. Multiple runs can map to the same sample accession, but a run will only map to a single sample accession. To use this metadata with refine.bio expression values, we’ll want to include the run accession for each sample. We have that information in sra_df already.

cleaned_sample_df <- sra_df %>%
  # Select only the relevant accessions, run and sample, from sra_df to
  # join to our cleaned metadata
  dplyr::select(run_accession,
                sample_accession) %>%
  # This effectively "tacks on" the run accessions as the first column of
  # our sample metadata
  dplyr::inner_join(cleaned_sample_df, 
                    by = "sample_accession")
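Because multiple runs can map to the same sample, the joined table can have more rows than the sample metadata, with the sample-level columns repeated for each run. Here is a small sketch with made-up accessions that shows this, along with one way to count the runs per sample.

```r
# magrittr pipe
`%>%` <- dplyr::`%>%`

# Made-up accessions: the first sample has two runs
toy_runs_df <- data.frame(
  run_accession = c("SRR0000001", "SRR0000002", "SRR0000003"),
  sample_accession = c("SRS0000001", "SRS0000001", "SRS0000002"),
  stringsAsFactors = FALSE
)
toy_samples_df <- data.frame(
  sample_accession = c("SRS0000001", "SRS0000002"),
  genotype = c("wild type", "knockout"),
  stringsAsFactors = FALSE
)

# One row per run; sample-level metadata is repeated for each run
toy_joined_df <- toy_runs_df %>%
  dplyr::inner_join(toy_samples_df, by = "sample_accession")

# How many runs does each sample have?
runs_per_sample <- toy_joined_df %>%
  dplyr::count(sample_accession, name = "n_runs")

print(toy_joined_df)
print(runs_per_sample)
```

A check like this on the real data can tell you whether each row of the final table corresponds to a unique sample or only to a unique run.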

Write the cleaned sample metadata to a TSV file

cleaned_sample_df %>% 
  readr::write_tsv(output_metadata_file) 
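If you would like to confirm that a file was written as expected, one option is to read it back in. The sketch below writes a made-up metadata table to a temporary file, so it does not touch the real output, and reads every column back as character.

```r
# Made-up metadata table
toy_metadata_df <- data.frame(
  run_accession = c("SRR0000001", "SRR0000002"),
  genotype = c("wild type", "knockout"),
  stringsAsFactors = FALSE
)

# Write to a temporary file instead of the real output path
toy_file <- tempfile(fileext = ".tsv")
readr::write_tsv(toy_metadata_df, toy_file)

# Read the file back in, treating every column as character
toy_read_back <- readr::read_tsv(
  toy_file,
  col_types = readr::cols(.default = readr::col_character())
)

print(toy_read_back)
unlink(toy_file)
```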