OpenPBTA-analysis

Analysis Modules

This directory contains the analysis modules of the OpenPBTA project. See the README of an individual analysis module for more information about that module.

Modules at a glance

The table below is intended to help project organizers quickly get an idea of what files (and therefore types of data) are consumed by each analysis module, what the module does, and what output files it produces that can be consumed by other analysis modules. In addition, this table reflects which analyses are included in the OpenPBTA manuscript. This is in service of documenting interdependent analyses. In the field Output Files Consumed by Other Analyses, if the given data file is marked (included in data download), that means the analysis module created the data file, but the relevant “other analyses” will read that file in from the data release directly, not from that module’s internal results. Note that nearly all modules use the harmonized clinical data file (pbta-histologies.tsv) even when it is not explicitly included in the table below.
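The "(included in data download)" convention described above can be illustrated with a minimal sketch. It assumes a hypothetical `data` directory for the data release (the directory name is not given in this document) and a hand-written lookup containing two entries taken from the table; `resolve_input` is an illustrative helper, not part of the project.

```python
from pathlib import PurePosixPath

# Hypothetical location of the data release; this README does not name it.
DATA_RELEASE_DIR = PurePosixPath("data")

# file name -> (producing module, path inside that module, included in data download?)
# The two entries are taken from the table in this README.
PRODUCED_FILES = {
    "pbta-fusion-putative-oncogenic.tsv": (
        "fusion_filtering",
        "results/pbta-fusion-putative-oncogenic.tsv",
        True,
    ),
    "cnv_breaks_densities.tsv": (
        "chromosomal-instability",
        "breakpoint-data/cnv_breaks_densities.tsv",
        False,
    ),
}


def resolve_input(filename):
    """Return the path a consuming module would read `filename` from."""
    module, path_in_module, in_data_download = PRODUCED_FILES[filename]
    if in_data_download:
        # Marked "(included in data download)": read from the data release.
        return DATA_RELEASE_DIR / filename
    # Otherwise read from the producing module's own directory.
    return PurePosixPath("analyses") / module / path_in_module


for name in PRODUCED_FILES:
    print(name, "->", resolve_input(name))
```

Under these assumptions, a file marked as included in the data download resolves to the release directory, while an unmarked file resolves to a path under `analyses/`.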

Module Input Files Brief Description Output Files Consumed by Other Analyses Analysis included in manuscript? Produces files for data release?
chromosomal-instability pbta-histologies.tsv
pbta-sv-manta.tsv.gz
pbta-cnv-cnvkit.seg.gz
Evaluates chromosomal instability by calculating chromosomal breakpoint densities and by creating circular plot visuals analyses/chromosomal-instability/breakpoint-data/cnv_breaks_densities.tsv
analyses/chromosomal-instability/breakpoint-data/sv_breaks_densities.tsv
Yes No
chromothripsis pbta-sv-manta.tsv.gz
pbta-cnv-consensus.seg.gz
independent-specimens.wgs.primary-plus.tsv
figures/palettes/histology_label_color_table.tsv
analyses/chromosomal-instability/breakpoint-data/cnv_breaks_densities.tsv
analyses/chromosomal-instability/breakpoint-data/sv_breaks_densities.tsv
This module runs ShatterSeek, identifies chromothripsis regions, and visualizes the results. N/A Yes No
cnv-chrom-plot pbta-cnv-consensus-gistic.zip
analyses/copy_number_consensus_call/results/pbta-cnv-consensus.seg
Plots genome-wide visualizations relating to copy number results N/A Yes No
cnv-comparison Earlier version of SEG files Deprecated; compared earlier version of the CNV methods. N/A No No
collapse-rnaseq pbta-gene-expression-rsem-fpkm.polya.rds
pbta-gene-expression-rsem-fpkm.stranded.rds
gencode.v27.primary_assembly.annotation.gtf.gz
Collapses RSEM FPKM matrices such that gene symbols are de-duplicated. results/pbta-gene-expression-rsem-fpkm-collapsed.polya.rds (included in data download; too large for tracking via GitHub)
results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds (included in data download; too large for tracking via GitHub)
Yes Yes
comparative-RNASeq-analysis pbta-gene-expression-rsem-tpm.polya.rds
pbta-gene-expression-rsem-tpm.stranded.rds
pbta-histologies.tsv
pbta-mend-qc-manifest.tsv
pbta-mend-qc-results.tar.gz
Produces expression outlier profiles per #229 N/A No No
compare-gistic analyses/run-gistic/results/pbta-cnv-consensus-gistic.zip
analyses/run-gistic/results/pbta-cnv-consensus-hgat-gistic.zip
analyses/run-gistic/results/pbta-cnv-consensus-lgat-gistic.zip
analyses/run-gistic/results/pbta-cnv-consensus-medulloblastoma-gistic.zip
Comparison of the GISTIC results of the entire cohort with the GISTIC results of three individual histologies, namely, LGAT, HGAT and medulloblastoma (#547) N/A No No
copy_number_consensus_call pbta-cnv-cnvkit.seg.gz
pbta-cnv-controlfreec.tsv.gz
pbta-sv-manta.tsv.gz
Produces consensus copy number calls per #128 and a set of excluded regions where CNV calls are not made results/cnv_consensus.tsv
results/pbta-cnv-consensus.seg.gz (included in data download)
ref/cnv_excluded_regions.bed
ref/cnv_callable.bed
Yes Yes
count-contributions N/A - uses Git logs Counts Git contributions to the repository N/A No No
create-subset-files All files This module contains the code to create the subset files used in continuous integration All subset files for continuous integration Not directly No
focal-cn-file-preparation pbta-cnv-cnvkit.seg.gz
pbta-cnv-controlfreec.tsv.gz
pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
analyses/copy_number_consensus_call/results/pbta-cnv-consensus.seg.gz
Maps from copy number variant caller segments to “most focal unit” results/cnvkit_annotated_cn_autosomes.tsv.gz
results/cnvkit_annotated_cn_x_and_y.tsv.gz
results/controlfreec_annotated_cn_autosomes.tsv.gz
results/controlfreec_annotated_cn_x_and_y.tsv.gz
results/consensus_seg_annotated_cn_autosomes.tsv.gz (included in data download)
results/consensus_seg_annotated_cn_x_and_y.tsv.gz (included in data download)
results/consensus_seg_with_status.tsv (included in data download)
Yes Yes
fusion_filtering pbta-fusion-arriba.tsv.gz
pbta-fusion-starfusion.tsv.gz
Standardizes, filters, and prioritizes fusion calls results/pbta-fusion-putative-oncogenic.tsv (included in data download)
results/pbta-fusion-recurrent-fusion-byhistology.tsv (included in data download)
results/pbta-fusion-recurrent-fusion-bysample.tsv (included in data download)
Yes Yes
fusion-summary pbta-histologies.tsv
pbta-fusion-putative-oncogenic.tsv
pbta-fusion-arriba.tsv.gz
pbta-fusion-starfusion.tsv.gz
Generate summary tables from fusion files (#398; #623) results/fusion_summary_embryonal_foi.tsv (included in data download)
results/fusion_summary_ependymoma_foi.tsv (included in data download)
results/fusion_summary_ewings_foi.tsv (included in data download)
Yes Yes
gene-set-enrichment-analysis analyses/collapse-rnaseq/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
analyses/collapse-rnaseq/pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
Updated gene set enrichment analysis with appropriate RNA-seq expression data results/gsva_scores_stranded.tsv
results/gsva_scores_polya.tsv
for stranded, polya expression data respectively
Yes No
hotspots-detection pbta-snv-strelka2.vep.maf.gz
pbta-snv-mutect2.vep.maf.gz
pbta-snv-vardict.vep.maf.gz
pbta-snv-lancet.vep.maf.gz
Scavenges any cancer hotspot calls from each caller and merges them with consensus (3/3) calls if they were missed in the snv-callers workflow. pbta-snv-hotspots-mutation.maf.tsv.gz (included in data download) Yes Yes
immune-deconv pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
Immune/Stroma characterization across PBTA (part of #15) results/quantiseq_deconv-output.rds Yes No
independent-samples pbta-histologies.tsv Generates independent specimen lists for WGS/WXS samples results/independent-specimens.wgs.primary.tsv (included in data download)
results/independent-specimens.wgs.primary-plus.tsv (included in data download)
results/independent-specimens.wgswxs.primary.tsv (included in data download)
results/independent-specimens.wgswxs.primary-plus.tsv (included in data download)
Yes Yes
interaction-plots independent-specimens.wgs.primary-plus.tsv
pbta-snv-consensus-mutation.maf.tsv.gz
Creates interaction plots for mutation mutual exclusivity/co-occurrence #13; may be updated to include other data types (e.g., fusions) N/A Yes No
molecular-subtyping-ATRT analyses/gene-set-enrichment-analysis/results/gsva_scores_stranded.tsv
pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_autosomes.tsv.gz
pbta-snv-consensus-mutation-tmb-all.tsv
pbta-cnv-consensus-gistic.zip
Summarizing data into tabular format in order to molecularly subtype ATRT samples #244; this analysis did not work N/A No No
molecular-subtyping-CRANIO pbta-histologies-base.tsv
pbta-snv-consensus-mutation.maf.tsv.gz
pbta-snv-scavenged-hotspots.maf.tsv.gz
Molecular subtyping of craniopharyngioma samples #810 results/CRANIO_molecular_subtype.tsv Yes No
molecular-subtyping-EPN pbta-histologies-base.tsv
pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
pbta-cnv-consensus-gistic.zip
analyses/chromosomal-instability/breakpoint-data/union_of_breaks_densities.tsv
fusion_summary_ependymoma_foi.tsv
analyses/gene-set-enrichment-analysis/results/gsva_scores_stranded.tsv
Molecular subtyping of ependymoma tumors results/EPN_all_data_withsubgroup.tsv Yes No
molecular-subtyping-EWS pbta-histologies-base.tsv
fusion_summary_ewings_foi.tsv
Reclassification of tumors based on the presence of defining fusions for Ewing Sarcoma per #623 results/EWS_samples.tsv Yes No
molecular-subtyping-HGG pbta-histologies-base.tsv
pbta-snv-consensus-mutation.maf.tsv.gz
pbta-snv-scavenged-hotspots.maf.tsv.gz
consensus_seg_annotated_cn_autosomes.tsv.gz
pbta-fusion-putative-oncogenic.tsv
pbta-cnv-consensus-gistic.zip
pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
Molecular subtyping of high-grade glioma samples #249 results/HGG_molecular_subtype.tsv Yes No
molecular-subtyping-LGAT pbta-histologies-base.tsv
pbta-snv-consensus-mutation.maf.tsv.gz
pbta-snv-scavenged-hotspots.maf.tsv.gz
analyses/fusion_filtering/results/pbta-fusion-putative-oncogenic.tsv
pbta-fusion-recurrently-fused-genes-bysample.tsv
Molecular subtyping of Low-grade astrocytic tumor samples #631 results/lgat_subtyping.tsv Yes No
molecular-subtyping-MB pbta-histologies-base.tsv
pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
Molecular classification of Medulloblastoma subtypes (part of #731) results/MB_molecular_subtype.tsv
results/MB_batchcorrected_molecular_subtype.tsv
for uncorrected and batch-corrected input matrix
Yes No
molecular-subtyping-SHH-tp53 pbta-histologies
pbta-snv-consensus-mutation.maf.tsv.gz
Deprecated; Identify the SHH-classified medulloblastoma samples that have TP53 mutations #247 N/A No No
molecular-subtyping-chordoma consensus_seg_annotated_cn_autosomes.tsv.gz
pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
In progress; identifying poorly-differentiated chordoma samples per #250 N/A Yes No
molecular-subtyping-embryonal pbta-histologies-base.tsv
fusion_summary_embryonal_foi.tsv
pbta-sv-manta.tsv.gz
consensus_seg_annotated_cn_x_and_y.tsv.gz
pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
Molecular subtyping of non-medulloblastoma, non-ATRT embryonal tumors #251 results/embryonal_tumor_molecular_subtypes.tsv Yes No
molecular-subtyping-integrate pbta-histologies-base.tsv
results/compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv
Add molecular subtype information to base histology results/pbta-histologies.tsv (included in data download) Yes Yes
molecular-subtyping-neurocytoma pbta-histologies-base.tsv Molecular subtyping of Neurocytoma samples #805 results/neurocytoma_subtyping.tsv Yes No
molecular-subtyping-pathology analyses/molecular-subtyping-CRANIO/results/CRANIO_molecular_subtype.tsv
analyses/molecular-subtyping-EPN/results/EPN_all_data_withsubgroup.tsv
analyses/molecular-subtyping-MB/results/MB_molecular_subtype.tsv
analyses/molecular-subtyping-neurocytoma/results/neurocytoma_subtyping.tsv
analyses/molecular-subtyping-EWS/results/EWS_samples.tsv
analyses/molecular-subtyping-HGG/results/HGG_molecular_subtype.tsv
analyses/molecular-subtyping-LGAT/results/lgat_subtyping.tsv
analyses/molecular-subtyping-embryonal/results/embryonal_tumor_molecular_subtypes.tsv
analyses/molecular-subtyping-chordoma/results/chordoma_smarcb1_status.tsv
Compile output from other molecular subtyping modules and incorporate pathology feedback #645 results/compiled_molecular_subtyping_with_clinical_feedback.tsv
results/compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv
results/compiled_molecular_subtypes_with_clinical_pathology_feedback_and_report_info.tsv
Yes No
mutational-signatures pbta-snv-consensus-mutation.maf.tsv.gz Performs three separate analyses of mutational signatures: 1) Analyzes COSMIC and Alexandrov et al. mutational signatures using the consensus SNV data; 2) Performs de novo signature extraction using only the WGS samples from the consensus SNV data; 3) Fits known CNS signatures to the WGS samples from the consensus SNV data N/A Yes No
mutect2-vs-strelka2 pbta-snv-mutect2.vep.maf.gz
pbta-snv-strelka2.vep.maf.gz
Deprecated; comparison of only two SNV callers, subsumed by snv-callers N/A No No
oncoprint-landscape pbta-snv-consensus-mutation.maf.tsv.gz
pbta-fusion-putative-oncogenic.tsv
consensus_seg_annotated_cn_autosomes.tsv.gz
consensus_seg_annotated_cn_x_and_y.tsv.gz
independent-specimens.*
Combines mutation, copy number, and fusion data into an OncoPrint plot N/A Yes No
rna-seq-composition pbta-gene-expression-rsem-tpm.stranded.rds
pbta-histologies.tsv
pbta-mend-qc-results.tar.gz
pbta-mend-qc-manifest.tsv
pbta-star-log-manifest.tsv
pbta-star-log-final.tar.gz
Analyzes the fraction of read types that comprise each RNA-Seq sample; flags samples with unusual composition N/A No No
run-gistic pbta-histologies.tsv
pbta-cnv-consensus.seg.gz
Runs GISTIC 2.0 on SEG files pbta-cnv-consensus-gistic.zip (included in data download) Yes Yes
sample-distribution-analysis pbta-histologies.tsv Produces plots and tables that illustrate the distribution of different histologies in the PBTA data N/A No No
selection-strategy-comparison pbta-gene-expression-rsem-fpkm.polya.rds
pbta-gene-expression-rsem-fpkm.stranded.rds
Deprecated; Comparison of RNA-seq data from different selection strategies N/A No No
sex-prediction-from-RNASeq pbta-gene-expression-kallisto.stranded.rds
pbta-histologies.tsv
Predicts genetic sex using RNA-seq data (#84) N/A No No
snv-callers pbta-snv-lancet.vep.maf.gz
pbta-snv-mutect2.vep.maf.gz
pbta-snv-strelka2.vep.maf.gz
pbta-snv-vardict.vep.maf.gz
tcga-snv-lancet.vep.maf.gz
tcga-snv-mutect2.vep.maf.gz
tcga-snv-strelka2.vep.maf.gz
Generates consensus SNV and indel calls for PBTA and TCGA data; calculates tumor mutation burden using the consensus calls results/consensus/pbta-snv-consensus-mutation.maf.tsv.gz (included in data download; too large for tracking via GitHub)
results/consensus/pbta-snv-consensus-mutation-tmb-all.tsv (included in data download)
results/consensus/pbta-snv-consensus-mutation-tmb-coding.tsv (included in data download; too large for tracking via GitHub)
results/consensus/tcga-snv-consensus-mutation.maf.tsv.gz (included in data download)
results/consensus/tcga-snv-mutation-tmb.tsv (included in data download)
results/consensus/tcga-snv-mutation-tmb-coding.tsv (included in data download)
Yes Yes
ssgsea-hallmark pbta-gene-counts-rsem-expected_count.stranded.rds Deprecated; performs GSVA using Hallmark gene sets N/A No, subsumed by gene-set-enrichment-analysis No
survival-analysis pbta-histologies.tsv
independent-specimens.wgswxs.primary.tsv
tp53_altered_status.tsv (results from tp53_nf1_score module)
quantiseq_deconv-output.rds (results from immune-deconv module)
pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
Performs Kaplan-Meier, log-rank, and/or Cox regression univariate or multivariate survival modeling N/A Yes No
telomerase-activity-prediction pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
pbta-gene-counts-rsem-expected_count.stranded.rds
pbta-gene-counts-rsem-expected_count.polya.rds
Quantify telomerase activity across pediatric brain tumors (part of #148) results/TelomeraseScores_PTBAPolya_counts
results/TelomeraseScores_PTBAPolya_FPKM.txt
results/TelomeraseScores_PTBAStranded_counts.txt
results/TelomeraseScores_PTBAStranded_FPKM.txt
results/EXTENDScores_{broad_histology}.tsv
Yes No
tmb-compare pbta-snv-consensus-mutation-tmb-coding.tsv Deprecated. Compares PBTA tumor mutation burden to adult TCGA data. N/A Not directly, similar figure generated in figures/ No
tp53_nf1_score pbta-snv-consensus-mutation.maf.tsv.gz
pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
Applies TP53 inactivation, NF1 inactivation, and Ras activation classifiers to RNA-seq data #165 N/A Yes No
transcriptomic-dimension-reduction pbta-gene-expression-rsem-fpkm.polya.rds
pbta-gene-expression-rsem-fpkm.stranded.rds
pbta-gene-expression-kallisto.polya.rds
pbta-gene-expression-kallisto.stranded.rds
Dimension reduction and visualization of RNA-seq data N/A Yes No
tcga-capture-kit-investigation pbta-snv-lancet.vep.maf.gz
pbta-snv-mutect2.vep.maf.gz
pbta-snv-strelka2.vep.maf.gz
tcga-snv-lancet.vep.maf.gz
tcga-snv-mutect2.vep.maf.gz
tcga-snv-strelka2.vep.maf.gz
pbta-histologies.tsv
pbta-tcga-manifest.tsv
WGS.hg38.lancet.unpadded.bed
WGS.hg38.strelka2.unpadded.bed
WGS.hg38.mutect2.vardict.unpadded.bed
Deprecated; Investigation of the TMB discrepancy between PBTA and TCGA data results/*.bed No No
tumor-purity-exploration pbta-histologies.tsv This module explores tumor purity distributions and potential covariates, and establishes a cancer-group-specific threshold for selecting high tumor purity samples. thresholded_rna_stranded_same-extraction.tsv Yes No
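
The interdependencies recorded in the table above can be sketched as a small dependency graph. The example below is only an illustration: the `DEPENDS_ON` mapping is transcribed by hand from a few rows of the table (it is not generated by the project), `run_order` is a hypothetical helper, and the standard-library `graphlib` module requires Python 3.9 or later. A module is listed as depending on another whenever it consumes a file that the other module produces, regardless of whether that file is read from the data download.

```python
from graphlib import TopologicalSorter

# module -> modules whose output files it consumes.
# Hand-transcribed from a subset of rows in the table above.
DEPENDS_ON = {
    "chromothripsis": {"chromosomal-instability"},
    "gene-set-enrichment-analysis": {"collapse-rnaseq"},
    "fusion-summary": {"fusion_filtering"},
    "molecular-subtyping-EPN": {
        "chromosomal-instability",
        "fusion-summary",
        "gene-set-enrichment-analysis",
    },
    "molecular-subtyping-pathology": {"molecular-subtyping-EPN"},
    "molecular-subtyping-integrate": {"molecular-subtyping-pathology"},
}


def run_order(depends_on):
    """Return one valid order in which every module runs after its producers."""
    return list(TopologicalSorter(depends_on).static_order())


order = run_order(DEPENDS_ON)
for position, module in enumerate(order, start=1):
    print(position, module)
```

Several valid orders can exist for the same graph, so the printed sequence is one possibility rather than a prescribed pipeline. `TopologicalSorter` raises `graphlib.CycleError` if the mapping contains a circular dependency, which makes a sketch like this a quick consistency check when rows are added to the table.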