This notebook will demonstrate how to:
tximeta
We will continue with the Tabula Muris data set that we started with in the previous notebook.
# tximeta for importing alevin results
library(tximeta)
Warning: replacing previous import 'S4Arrays::makeNindexFromArrayViewport' by
'DelayedArray::makeNindexFromArrayViewport' when loading 'SummarizedExperiment'
# SingleCellExperiment package for organizing our results
library(SingleCellExperiment)
Loading required package: SummarizedExperiment
Loading required package: MatrixGenerics
Loading required package: matrixStats
Attaching package: 'MatrixGenerics'
The following objects are masked from 'package:matrixStats':
colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
colWeightedMeans, colWeightedMedians, colWeightedSds,
colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
rowWeightedSds, rowWeightedVars
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:stats':
IQR, mad, sd, var, xtabs
The following objects are masked from 'package:base':
anyDuplicated, aperm, append, as.data.frame, basename, cbind,
colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
Position, rank, rbind, Reduce, rownames, sapply, setdiff, table,
tapply, union, unique, unsplit, which.max, which.min
Loading required package: S4Vectors
Attaching package: 'S4Vectors'
The following object is masked from 'package:utils':
findMatches
The following objects are masked from 'package:base':
expand.grid, I, unname
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Attaching package: 'Biobase'
The following object is masked from 'package:MatrixGenerics':
rowMedians
The following objects are masked from 'package:matrixStats':
anyMissing, rowMedians
# ggplot2 for the plots
library(ggplot2)
The data files we will be using for this part of the project are in the data/tabula-muris subdirectory of the scRNA-seq directory where this notebook is located.
The main files we will be using at this stage are the results from our earlier quantification, located in the alevin-quant subdirectory. Rather than just the subset, we will use the full data in order to get a somewhat more realistic view of a 10x data set. This data set is still a few years old though: newer data sets will tend to have more cells!
# main data directory
data_dir <- file.path("data", "tabula-muris")
# reference files
ref_dir <- file.path("data", "reference")
# Path to the single-sample alevin results
alevin_file <- file.path(data_dir, "alevin-quant",
"10X_P4_3", "alevin", "quants_mat.gz")
# Mitochondrial gene table
mito_file <- file.path(ref_dir,
"mm_mitochondrial_genes.tsv")
# create the output directory using fs::dir_create()
filtered_dir <- file.path(data_dir, "filtered")
fs::dir_create(filtered_dir)
# Output file
filtered_sce_file <- file.path(filtered_dir, "filtered_sce.rds")
tximeta needs a data frame with at least these two columns:
- a files column with the file paths to the quants_mat.gz files
- a names column with the sample names
In this case, we are only importing a single experiment, so we will create a data frame with only one row.
coldata <- data.frame(files = alevin_file,
names = "10X_P4_3")
Using the coldata data frame that we set up, we can now run tximeta() to import our expression data while automatically finding and associating the transcript annotations that were used when we performed the quantification.
The first time you run tximeta() you may get a message about storing downloaded transcriptome data in a cache directory so that it can retrieve the data more quickly the next time. We recommend you use the cache, and accept the default location.
# Read in alevin results with tximeta
bladder_sce <- tximeta(coldata, type = "alevin")
importing quantifications
reading in alevin gene-level counts across cells with 'eds'
found matching transcriptome:
[ Ensembl - Mus musculus - release 95 ]
useHub=TRUE: checking for EnsDb via 'AnnotationHub'
found matching EnsDb via 'AnnotationHub'
downloading 1 resources
retrieving 1 resource
loading from cache
require("ensembldb")
generating gene ranges
generating gene ranges
A quick aside! When we ran alevinQC on this data in the last notebook, we saw that salmon alevin had identified a “whitelist” of barcodes that passed its quality control standards. We could use this filtered list directly, but salmon alevin can be quite strict, and methods for filtering are quite variable. Instead, we will use the default behavior of tximeta() and read in all of the barcodes for which there is a non-zero UMI count (after barcode correction). If you wanted instead to include only barcodes that passed salmon alevin’s filter, you could supply the additional argument alevinArgs = list(filterBarcodes=TRUE) to the tximeta() function. Even if you do choose to read in pre-filtered data, it’s still important to explore the data as we’re about to do here and potentially filter further based on your observations, in particular since mapping software’s quality control measures (spoilers!) don’t always filter based on mitochondrial gene content.
In the intro-to-R-tidyverse module notebook, 01-intro-to-base_R.Rmd, we discuss base R object types, but there are some ‘special’ object types that are package-specific. tximeta creates a SummarizedExperiment object (or more specifically a RangedSummarizedExperiment object), which is used by many Bioconductor packages to store and process results from gene expression studies.
# Explore the SummarizedExperiment data
bladder_sce
class: RangedSummarizedExperiment
dim: 35429 344
metadata(6): tximetaInfo quantInfo ... txomeInfo txdbInfo
assays(1): counts
rownames(35429): ENSMUSG00000000001 ENSMUSG00000000003 ...
ENSMUSG00000117649 ENSMUSG00000117651
rowData names(8): gene_id gene_name ... symbol entrezid
colnames(344): CGGAGTCAGTACGCCC TTGGCAACATGATCCA ... ACGTCAAGTGTAATGA
ATTACTCAGAGAACAG
colData names(0):
The main component we are concerned with for now is the counts matrix, which is stored as an “assay”, with a row for each gene and a column for each cell. In this case, we can see there is information for 35,429 genes, and Alevin reports data for 344 cells.
tximeta also automatically added some annotation information about each gene, which can be seen by extracting the rowData table.
# Examine row (gene) metadata
rowData(bladder_sce)
DataFrame with 35429 rows and 8 columns
gene_id gene_name gene_biotype
<character> <character> <character>
ENSMUSG00000000001 ENSMUSG00000000001 Gnai3 protein_coding
ENSMUSG00000000003 ENSMUSG00000000003 Pbsn protein_coding
ENSMUSG00000000028 ENSMUSG00000000028 Cdc45 protein_coding
ENSMUSG00000000037 ENSMUSG00000000037 Scml2 protein_coding
ENSMUSG00000000049 ENSMUSG00000000049 Apoh protein_coding
... ... ... ...
ENSMUSG00000117643 ENSMUSG00000117643 AC122453.2 processed_pseudogene
ENSMUSG00000117644 ENSMUSG00000117644 AC108777.1 processed_pseudogene
ENSMUSG00000117646 ENSMUSG00000117646 AC122271.3 processed_pseudogene
ENSMUSG00000117649 ENSMUSG00000117649 AC165087.2 processed_pseudogene
ENSMUSG00000117651 ENSMUSG00000117651 CT485613.6 processed_pseudogene
seq_coord_system description
<character> <character>
ENSMUSG00000000001 chromosome guanine nucleotide b..
ENSMUSG00000000003 chromosome probasin [Source:MGI..
ENSMUSG00000000028 chromosome cell division cycle ..
ENSMUSG00000000037 chromosome Scm polycomb group p..
ENSMUSG00000000049 chromosome apolipoprotein H [So..
... ... ...
ENSMUSG00000117643 chromosome Wilms tumour 1-assoc..
ENSMUSG00000117644 chromosome gametocyte specific ..
ENSMUSG00000117646 chromosome developmental plurip..
ENSMUSG00000117649 chromosome heterogeneous nuclea..
ENSMUSG00000117651 chromosome NSE1 homolog, SMC5-S..
gene_id_version symbol entrezid
<character> <character> <list>
ENSMUSG00000000001 ENSMUSG00000000001.4 Gnai3 14679
ENSMUSG00000000003 ENSMUSG00000000003.15 Pbsn 54192
ENSMUSG00000000028 ENSMUSG00000000028.15 Cdc45 12544
ENSMUSG00000000037 ENSMUSG00000000037.16 Scml2 107815
ENSMUSG00000000049 ENSMUSG00000000049.11 Apoh 11818
... ... ... ...
ENSMUSG00000117643 ENSMUSG00000117643.1 AC122453.2 NA
ENSMUSG00000117644 ENSMUSG00000117644.1 AC108777.1 NA
ENSMUSG00000117646 ENSMUSG00000117646.1 AC122271.3 NA
ENSMUSG00000117649 ENSMUSG00000117649.1 AC165087.2 NA
ENSMUSG00000117651 ENSMUSG00000117651.1 CT485613.6 NA
We could leave the object as it is, but we can unlock some extra functionality by converting this from a SummarizedExperiment object to a SingleCellExperiment, so we will go ahead and do that next. SingleCellExperiment objects are a subtype of SummarizedExperiment objects that a lot of single-cell analysis R packages use, so we will try to get acquainted with them.
For more information on SingleCellExperiment objects, as well as many other topics related to this course, we highly recommend the e-book Orchestrating Single-Cell Analysis with Bioconductor (OSCA) and/or Amezquita et al. (2020).
Below is a figure from OSCA that shows the general structure of SingleCellExperiment objects.
Note that there are slots for raw data, metadata about cells, metadata about genes or features, and slots for various transformations of the input data. Many of these will not be filled in when we first create the object, but as we proceed through the workshop we will add in more data to these slots as we compute new summaries and transformations.
To perform the conversion to a SingleCellExperiment, we will use the R function as(), which “coerces” objects from one type to another.
# Convert the SummarizedExperiment to a SingleCellExperiment
bladder_sce <- as(bladder_sce, "SingleCellExperiment")
bladder_sce
class: SingleCellExperiment
dim: 35429 344
metadata(6): tximetaInfo quantInfo ... txomeInfo txdbInfo
assays(1): counts
rownames(35429): ENSMUSG00000000001 ENSMUSG00000000003 ...
ENSMUSG00000117649 ENSMUSG00000117651
rowData names(8): gene_id gene_name ... symbol entrezid
colnames(344): CGGAGTCAGTACGCCC TTGGCAACATGATCCA ... ACGTCAAGTGTAATGA
ATTACTCAGAGAACAG
colData names(0):
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
Doing this added a couple of (currently empty) slots for things like dimensionality reduction results and alternative feature experiments. Foreshadowing!
For a first pass at the data, we will extract just the counts matrix from the SingleCellExperiment object, and use some base R functions to look at our results.
We can extract the gene by cell count matrix using the counts() function. This actually returns a special format of matrix called a “sparse” matrix. Since single-cell count data is mostly zeros, this format (a dgCMatrix object) allows R to save a lot of memory. This object takes up about 6.4 MB, but if we stored it in the normal format, it would be closer to 100 MB! Thankfully, most of the functions that we use to work with regular matrices work just fine with these as well.
sc_counts <- counts(bladder_sce)
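To get a feel for the memory savings described above, here is a minimal sketch using a simulated matrix rather than the real data. The object names (toy_sparse, toy_dense), the matrix dimensions, and the 5% density are all assumptions chosen for illustration; only the Matrix package is needed.

```r
library(Matrix)

set.seed(2024)
# Simulated stand-in for a count matrix: 2000 "genes" x 300 "cells",
# with roughly 5% of the entries non-zero (all values here are made up)
toy_sparse <- rsparsematrix(
  nrow = 2000, ncol = 300, density = 0.05,
  rand.x = function(n) rpois(n, lambda = 3) + 1
)

# The same values stored as an ordinary dense matrix
toy_dense <- as.matrix(toy_sparse)

# Compare the storage class and the memory used by each format
class(toy_sparse)
format(object.size(toy_sparse), units = "MB")
format(object.size(toy_dense), units = "MB")

# Familiar summary functions give the same answer for both formats
all.equal(rowMeans(toy_sparse), rowMeans(toy_dense))
```

The exact sizes will depend on how sparse the matrix is, so the ratio seen here should not be expected to match the real data.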
Let’s look at the mean expression of the genes in this dataset. We will use apply() to calculate summaries across our count matrix. The second argument in apply() specifies whether we are calculating by rows or columns (1 = rows, 2 = columns).
In the code chunk below, use apply() with the correct arguments to calculate the gene means.
# Let's calculate the gene means (by row)
gene_means <- apply(sc_counts, 1, mean)
This works just fine, but you may have noticed it is a bit slow. For a few common summary functions like means and sums, R has much more efficient functions to calculate across rows or columns. In this case, we can use rowMeans() to do the same calculation much more quickly.
# use rowMeans() to calculate gene means
gene_means <- rowMeans(sc_counts)
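If you want to convince yourself that the two approaches agree, the following sketch compares them on a simulated matrix. The matrix toy_counts and its dimensions are assumptions for illustration; timings are captured but will vary from machine to machine.

```r
set.seed(1)
# Simulated counts: 5000 "genes" x 200 "cells" (made-up values)
toy_counts <- matrix(rpois(5000 * 200, lambda = 0.2), nrow = 5000)

# Time both approaches; the elapsed times depend on your machine
apply_time <- system.time(means_apply <- apply(toy_counts, 1, mean))
rowmeans_time <- system.time(means_rowmeans <- rowMeans(toy_counts))

# Print the timings side by side
rbind(apply = apply_time, rowMeans = rowmeans_time)[, "elapsed"]

# The results themselves should match
all.equal(means_apply, means_rowmeans)
```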
Let’s make our first density plot with these data. We will use ggplot() as you have seen before, but since the object we want to plot, gene_means, is a vector not a data frame, we will skip the data argument and go straight to the mapping aesthetics. The remainder of the ggplot code should look familiar.
# Plot the density of the means using ggplot2
ggplot(mapping = aes(x = gene_means)) +
geom_density() +
labs(x = "Mean gene count")
That plot is not quite as informative as we might like, as a few genes with high expression are making the scale just a bit wide. Let’s zoom in on the left part of the graph by adding an xlim() layer. (Note that xlim() will remove points outside the specified range, so you will get a warning.)
# Plot the density of the means using ggplot2
ggplot(mapping = aes(x = gene_means)) +
geom_density() +
labs(x = "Mean gene count") +
xlim(0, 5)
Warning: Removed 203 rows containing non-finite outside the scale range
(`stat_density()`).
Even as we zoom in, the counts data has many zeroes, which we actually expect in a single cell RNA-seq experiment.
Let’s calculate what proportion of the count data is zeros:
sum(sc_counts == 0)/(nrow(sc_counts) * ncol(sc_counts))
[1] 0.9447591
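The comparison sc_counts == 0 builds a logical matrix that is TRUE almost everywhere, which can be memory-hungry for larger data sets. As a hedged alternative, the sketch below counts the non-zero entries with nnzero() from the Matrix package and subtracts from one. The small matrix toy is made up for illustration.

```r
library(Matrix)

# A tiny made-up sparse matrix: 3 "genes" x 4 "cells"
toy <- Matrix(c(0, 2, 0, 0,
                1, 0, 0, 0,
                0, 0, 0, 3),
              nrow = 3, byrow = TRUE, sparse = TRUE)

# The approach used above: compare every entry to zero
prop_zero_compare <- sum(toy == 0) / (nrow(toy) * ncol(toy))

# Alternative: count the non-zero entries and subtract from one
prop_zero_nnzero <- 1 - nnzero(toy) / length(toy)

c(compare = prop_zero_compare, nnzero = prop_zero_nnzero)
# both are 0.75: 9 of the 12 entries are zero
```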
The small amount of RNA in a single cell results in higher chances of errors and biases in RNA isolation, amplification, and sequencing. We should check that the overall data we observe for each sample/cell are reasonable before proceeding too far.
The next section explores some of the ways we can filter the data set to clean things up before we continue to downstream analysis.
First, let’s look at the total number of counts per cell, across all genes. For this we will use colSums(), as each column represents a different sampled cell.
# Make a vector of the total counts per cell using colSums()
total_counts <- colSums(sc_counts)
# Take a look at the summary statistics for the total counts
summary(total_counts)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.0 287.8 3971.5 8089.9 12008.8 62446.0
Yikes, at least one of the cells has a total count of only 1, compared to the median of ~4000! It’s highly likely that this ‘cell’ is either an empty droplet or did not get sequenced properly.
Let’s visualize the distribution of total counts to see if this is the only cell we might want to exclude.
In the following graphs, we will use vertical red lines to indicate possible cutoffs.
# Let's use the same kind of plot as above but add more layers
ggplot(mapping = aes(x = total_counts)) +
geom_density(fill = "lightblue") +
geom_vline(xintercept = 1000, color = "red") +
labs(x = "Counts per cell")
How many cells would be removed with this (or other cutoffs) for counts per sample?
# Calculate the number of cells that would be removed with a given cutoff
count_cutoff <- 1000
sum(total_counts <= count_cutoff)
[1] 133
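To compare several candidate cutoffs at once, one option is to loop over them. This is only a sketch: the vector toy_total_counts and the cutoff values are invented for illustration and stand in for total_counts and whatever thresholds you are considering.

```r
# Made-up total counts for ten "cells"
toy_total_counts <- c(1, 50, 300, 900, 1500, 4000, 8000, 12000, 20000, 60000)

# Candidate cutoffs to compare (arbitrary choices)
cutoffs <- c(500, 1000, 5000)

# For each cutoff, count the cells at or below it
cells_removed <- vapply(
  cutoffs,
  function(cutoff) sum(toy_total_counts <= cutoff),
  integer(1)
)
names(cells_removed) <- cutoffs
cells_removed
# 500 -> 3, 1000 -> 4, 5000 -> 6
```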
What if a single gene accounted for all counts in a particular cell? This cell would not have helpful data for us, so we should look to remove any cells we suspect might not have a useful amount of its transcriptome measured.
But before we can determine how many genes we consider a particular cell to be expressing, we need to choose a numeric cutoff for what we consider to be a detected gene. How many counts must there be for you to consider a gene expressed? Here let’s go for a simple detection cutoff of > 0.
# make a detection_mat matrix that is TRUE when a gene is expressed in a sample
detection_mat <- sc_counts > 0
Now that we have turned our data into a matrix of TRUE/FALSE for detection, we can sum this data by column to effectively get a vector of how many genes were measured in each cell.
# Make a vector that contains the number of genes expressed by a particular cell
num_genes_exp <- colSums(detection_mat)
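Summing a logical matrix works because R treats TRUE as 1 and FALSE as 0. The tiny example below, with made-up gene and cell names, shows the two steps on a matrix small enough to check by eye.

```r
# A made-up count matrix: 4 "genes" x 3 "cells"
toy_counts <- matrix(c(0, 3, 1,
                       2, 0, 4,
                       0, 0, 5,
                       1, 0, 0),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(paste0("gene", 1:4),
                                     paste0("cell", 1:3)))

# TRUE wherever a gene has at least one count in a cell
toy_detected <- toy_counts > 0

# Column sums of the logical matrix count the detected genes per cell
colSums(toy_detected)
# cell1 cell2 cell3
#     2     1     3
```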
Let’s plot this using the same style and type of graph as above.
ggplot(mapping = aes(x = num_genes_exp)) +
geom_density(fill = "lightblue") +
labs(x = "Number of genes expressed") +
theme_classic()
This plot helps us visualize the distribution of genes per cell and can help inform how we choose the cutoff. It’s important to remember that different cell types can have quite different patterns with regards to number of genes expressed. If we were to use strict cutoffs to select which cells are “valid”, there is the possibility that we could bias our results, so this is something we want to be careful about.
Let’s see what happens if we only keep cells with > 500 expressed genes. Just like when we looked at total counts, we can add in a vertical line to the previous plot where the possible cutoff would be.
ggplot(mapping = aes(x = num_genes_exp)) +
geom_density(fill = "lightblue") +
labs(x = "Number of genes expressed") +
theme_classic() +
geom_vline(xintercept = 500, color = "red")
How many cells would be removed with this cutoff?
# Calculate the number of cells that would be removed with a given cutoff
gene_cutoff <- 500
sum(num_genes_exp <= gene_cutoff)
[1] 145
If a cell is dead or dying, its mRNA will tend to leak out of the cell, leaving an overabundance of mitochondrial RNA, which is more likely to stay within the mitochondria longer. To look for this, we would like to calculate the fraction of mitochondrial expression for each cell as well. First, we will need a list of the mitochondrial genes, which we have prepared in a tsv file mm_mitochondrial_genes.tsv that we will now read in, and filter to just the genes that are found in the data set.
# read `mm_mitochondrial_genes.tsv` from ref_dir and
# create from it a single vector containing only the gene ids
mito_genes <- readr::read_tsv(mito_file) |>
# filter to only genes in the sce object
dplyr::filter(gene_id %in% rownames(bladder_sce)) |>
# pull takes this column out of the data frame as a stand-alone vector
dplyr::pull(gene_id)
Rows: 37 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (9): gene_id, gene_name, seqnames, strand, gene_biotype, seq_coord_syste...
dbl (4): start, end, width, entrezid
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Now we can use the genes from that list to select only the rows of the count matrix that correspond to the mitochondrial genes and sum their expression for each sample.
# create a mito_rows vector that is TRUE for mitochondrial genes in our dataset
mito_rows <- rownames(sc_counts) %in% mito_genes
# sum the counts from just those genes for all samples
mito_counts <- colSums(sc_counts[mito_rows, ])
# calculate mito_fraction for all samples
mito_fraction <- mito_counts/total_counts
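Here is the same calculation on a small made-up matrix so the arithmetic can be checked by hand. The row labels and values are illustrative only (the real data uses Ensembl gene ids). The drop = FALSE argument is an addition not used above: it keeps the result a matrix even if only one mitochondrial gene happens to match, in which case colSums() would otherwise fail.

```r
# A made-up count matrix: 4 "genes" x 3 "cells"
toy_counts <- matrix(c(10, 0,  5,
                        0, 4,  5,
                        5, 1,  0,
                        5, 5, 10),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(c("geneA", "mt-Nd1", "geneB", "mt-Co1"),
                                     c("cell1", "cell2", "cell3")))

# An illustrative mitochondrial gene list; "mt-Atp6" is not in the matrix
toy_mito_genes <- c("mt-Nd1", "mt-Co1", "mt-Atp6")

# TRUE for rows of the matrix that are in the mitochondrial list
toy_mito_rows <- rownames(toy_counts) %in% toy_mito_genes

# Mitochondrial counts divided by total counts, per cell
toy_mito_counts <- colSums(toy_counts[toy_mito_rows, , drop = FALSE])
toy_mito_fraction <- toy_mito_counts / colSums(toy_counts)
toy_mito_fraction
# cell1 cell2 cell3
#  0.25  0.90  0.75
```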
Let’s make a plot of this distribution as well!
ggplot(mapping = aes(x = mito_fraction)) +
geom_density(fill = "lightblue") +
labs(x = "Mitochondrial fraction") +
geom_vline(xintercept = 0.2, color = "red") +
theme_classic()
Here, we want to keep cells with a low fraction of reads corresponding to mitochondrial genes and remove any cells with a high mitochondrial fraction. Again, it’s important to take this step even if you started with filtered data, since mapping software like salmon alevin and Cell Ranger do not usually consider mitochondrial read percentages when filtering.
Let’s put all of the QC measures we have calculated into a single data frame, so we can look at how they might relate to one another.
# make a data frame with number of genes expressed, total counts, and mito fraction
qc_df <- data.frame(barcode = names(num_genes_exp),
genes_exp = num_genes_exp,
total_counts = total_counts,
mito_fraction = mito_fraction)
Now we can plot these measures all together, along with some possible cutoffs.
ggplot(qc_df, aes(x = total_counts,
y = genes_exp,
color = mito_fraction)) +
geom_point(alpha = 0.5) +
scale_color_viridis_c() +
geom_vline(xintercept = 1000, color = "red") +
geom_hline(yintercept = 500, color = "red") +
labs(x = "Total Count",
y = "Number of Genes Expressed",
color = "Mitochondrial\nFraction") +
theme_bw()
If we want to filter our data based on these measures and the cutoffs we like, we can do this with dplyr::filter() and then select the resulting columns from the matrix.
# create a filtered_samples data frame from qc_df
filtered_samples <- qc_df |>
dplyr::filter(total_counts > 1000,
genes_exp > 500,
mito_fraction < 0.2)
# select only passing samples for sc_counts_filtered
sc_counts_filtered <- sc_counts[, filtered_samples$barcode]
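The sketch below walks through the same filter-then-subset pattern on made-up data. To stay self-contained it uses base R subset() in place of dplyr::filter(); the barcodes and QC values are invented, and drop = FALSE is added so the result stays a matrix even when only one cell passes.

```r
# A made-up count matrix with short fake barcodes as column names
toy_counts <- matrix(1:12, nrow = 3,
                     dimnames = list(paste0("gene", 1:3),
                                     c("AAAC", "CCCG", "GGGT", "TTTA")))

# Invented QC values for each barcode (not derived from toy_counts)
toy_qc <- data.frame(barcode = colnames(toy_counts),
                     total_counts = c(1500, 200, 9000, 3000),
                     genes_exp = c(800, 100, 2500, 450),
                     mito_fraction = c(0.05, 0.10, 0.40, 0.02))

# Keep only the rows (cells) that pass all three cutoffs
toy_passing <- subset(toy_qc,
                      total_counts > 1000 &
                        genes_exp > 500 &
                        mito_fraction < 0.2)

# Select the matching columns of the count matrix by barcode
toy_counts_filtered <- toy_counts[, toy_passing$barcode, drop = FALSE]
dim(toy_counts_filtered)
# 3 genes, 1 cell: only "AAAC" passes every cutoff
```

Selecting columns by name, rather than by position, helps guard against the QC table and the matrix getting out of order.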
SingleCellExperiment
directlyscater
The methods above were nice for demonstrating the kinds of filtering we might do, but all the steps would certainly be repetitive if we had to do them for each sample. Thankfully, there are some nice methods that have been developed in packages like scater to perform them all at once and add the results to the SingleCellExperiment object. The advantages of using functions like this are that we can keep all of the metadata together, filter directly on the object of interest, avoid a lot of repetition, and in doing so avoid many potential errors.
We will start with the function addPerCellQC(), which takes a SingleCellExperiment and a list of gene sets that we might want to calculate subset information for. In our case, we will just look at mitochondrial genes again.
bladder_sce <- scater::addPerCellQC(
bladder_sce,
# a list of named gene subsets that we want stats for
# here we are using mitochondrial genes
subsets = list(mito = mito_genes)
)
The results of these calculations are now stored as a data frame in the colData slot of the SingleCellExperiment object, which we can pull out with the colData() function. (Unfortunately, it is not quite a regular data frame, but we can easily convert it to one.) Even nicer, we can access the QC data in those columns directly with just the familiar $ syntax!
The calculated statistics include sum, the total UMI count for the cell, detected, the number of genes detected, and a few different statistics for each subset that we gave, including the percent (not fraction!) of all UMIs from the subset. Since the subset we used was named mito, this column is called subsets_mito_percent.
Using these, we can recreate the plot from before:
# extract the column data and convert to a data frame
bladder_qc <- data.frame(colData(bladder_sce))
# plot with the qc data frame
ggplot(bladder_qc, aes(x = sum,
y = detected,
color = subsets_mito_percent)) +
geom_point(alpha = 0.5) +
scale_color_viridis_c() +
labs(x = "Total Count",
y = "Number of Genes Expressed",
color = "Mitochondrial\nPercent") +
theme_bw()
SingleCellExperiment
Filtering the SingleCellExperiment object is done as if it were just the counts matrix, with brackets and indexes. While this will look much like what we did before, it is better, because it will also keep the filtered QC stats alongside, in case we wanted to revisit them later. Otherwise, we would have to filter our QC results separately, which is an easy place for errors to creep in.
# create a boolean vector of QC filters
cells_to_keep <- bladder_sce$sum > 1000 &
bladder_sce$detected > 500 &
bladder_sce$subsets_mito_percent < 20
# filter the sce object (cells are columns)
bladder_sce_filtered <- bladder_sce[, cells_to_keep]
Just to check, we should have the same number of cells in bladder_sce_filtered as our previous sc_counts_filtered.
ncol(sc_counts_filtered) == ncol(bladder_sce_filtered)
[1] TRUE
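Matching cell counts is a quick sanity check, but comparing the barcodes themselves is a stricter one. The sketch below illustrates this on invented values, and also shows that a cutoff of 0.2 on a fraction and a cutoff of 20 on a percent select the same cells. All names and numbers here are made up.

```r
# Invented barcodes and mitochondrial fractions
toy_qc <- data.frame(barcode = c("AAAC", "CCCG", "GGGT", "TTTA"),
                     mito_fraction = c(0.01, 0.15, 0.25, 0.35))

# The same measure expressed as a percent
toy_qc$mito_percent <- toy_qc$mito_fraction * 100

# Apply the equivalent cutoff on each scale
kept_by_fraction <- toy_qc$barcode[toy_qc$mito_fraction < 0.2]
kept_by_percent <- toy_qc$barcode[toy_qc$mito_percent < 20]

# Same number of cells...
length(kept_by_fraction) == length(kept_by_percent)
# ...and, more strictly, exactly the same barcodes in the same order
identical(kept_by_fraction, kept_by_percent)
```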
Now we have an idea of what cells we probably want to get rid of. But what if our data contains genes that we can’t reliably measure in these cells?
We could use our earlier detection_mat to add up how many cells express each gene, but we will skip straight to the scater function this time, which is called addPerFeatureQC(). This will add QC statistics to the rowData for each gene (alongside the annotation data we already had there). The columns it adds are the average expression level of each gene (mean) and the percentage of cells in which it was detected (detected).
bladder_sce_filtered <- scater::addPerFeatureQC(bladder_sce_filtered)
Let’s make another density plot with the percentage of samples that express each gene:
# extract the gene information with rowData() and convert to a data frame
gene_info <- data.frame(rowData(bladder_sce_filtered))
# Plot the detected percentage
ggplot(gene_info, aes(x = detected)) +
geom_density(fill = "lightblue") +
labs(x = "Percent of Cells Expressing Each Gene") +
theme_classic()
How many genes will be excluded if we draw our cutoff at 5% of cells?
sum(gene_info$detected < 5)
[1] 23960
That’s a lot! How do we feel about that?
cutoff <- 2
# filter bladder_sce_filtered to only genes above a cutoff value
bladder_sce_filtered <- bladder_sce_filtered[gene_info$detected >= cutoff, ]
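To make the meaning of the detected percentage concrete, the sketch below computes it by hand on a small made-up matrix, treating any count above zero as detected, and then filters the rows. The toy cutoff of 50% is arbitrary and chosen only so the result is easy to verify.

```r
# A made-up count matrix: 4 "genes" x 4 "cells"
toy_counts <- matrix(c(1, 0, 2, 3,
                       0, 0, 0, 1,
                       0, 0, 0, 0,
                       5, 1, 0, 0),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(paste0("gene", 1:4),
                                     paste0("cell", 1:4)))

# Percent of cells with a non-zero count for each gene
toy_detected_percent <- rowMeans(toy_counts > 0) * 100
toy_detected_percent
# gene1 gene2 gene3 gene4
#    75    25     0    50

# Keep genes detected in at least the cutoff percentage of cells
toy_cutoff <- 50
toy_counts_kept <- toy_counts[toy_detected_percent >= toy_cutoff, , drop = FALSE]
rownames(toy_counts_kept)
# "gene1" "gene4"
```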
How big is the SingleCellExperiment object now?
dim(bladder_sce_filtered)
[1] 13648 186
We will save the filtered SingleCellExperiment object as a .rds file for later use.
# Save object to the file filtered_sce_file, which
# we defined at the top of this notebook
readr::write_rds(bladder_sce_filtered, file = filtered_sce_file)
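An .rds file stores a single R object, which can later be read back in with its structure intact. The round trip is sketched below with base R saveRDS() and readRDS() on a made-up object written to a temporary file; the object and file names are illustrative only.

```r
# A small made-up object standing in for the filtered data
toy_object <- list(counts = matrix(1:6, nrow = 2), note = "toy example")

# Write to a temporary file so nothing in the project is touched
toy_file <- tempfile(fileext = ".rds")
saveRDS(toy_object, file = toy_file)

# Read it back and confirm nothing changed
toy_reloaded <- readRDS(toy_file)
identical(toy_object, toy_reloaded)

# Clean up the temporary file
unlink(toy_file)
```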
sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] ensembldb_2.28.0 AnnotationFilter_1.28.0
[3] GenomicFeatures_1.56.0 AnnotationDbi_1.66.0
[5] ggplot2_3.5.1 SingleCellExperiment_1.26.0
[7] SummarizedExperiment_1.34.0 Biobase_2.64.0
[9] GenomicRanges_1.56.0 GenomeInfoDb_1.40.0
[11] IRanges_2.38.0 S4Vectors_0.42.0
[13] BiocGenerics_0.50.0 MatrixGenerics_1.16.0
[15] matrixStats_1.3.0 tximeta_1.22.0
[17] optparse_1.7.5
loaded via a namespace (and not attached):
[1] jsonlite_1.8.8 tximport_1.32.0
[3] magrittr_2.0.3 ggbeeswarm_0.7.2
[5] farver_2.1.1 rmarkdown_2.26
[7] fs_1.6.4 BiocIO_1.14.0
[9] zlibbioc_1.50.0 vctrs_0.6.5
[11] memoise_2.0.1 Rsamtools_2.20.0
[13] DelayedMatrixStats_1.26.0 RCurl_1.98-1.14
[15] htmltools_0.5.8.1 S4Arrays_1.4.0
[17] progress_1.2.3 AnnotationHub_3.12.0
[19] curl_5.2.1 BiocNeighbors_1.22.0
[21] SparseArray_1.4.0 sass_0.4.9
[23] bslib_0.7.0 httr2_1.0.1
[25] cachem_1.0.8 GenomicAlignments_1.40.0
[27] mime_0.12 lifecycle_1.0.4
[29] pkgconfig_2.0.3 rsvd_1.0.5
[31] Matrix_1.7-0 R6_2.5.1
[33] fastmap_1.1.1 GenomeInfoDbData_1.2.12
[35] digest_0.6.35 colorspace_2.1-0
[37] scater_1.32.0 irlba_2.3.5.1
[39] RSQLite_2.3.6 beachmat_2.20.0
[41] filelock_1.0.3 labeling_0.4.3
[43] fansi_1.0.6 httr_1.4.7
[45] abind_1.4-5 compiler_4.4.1
[47] bit64_4.0.5 withr_3.0.0
[49] BiocParallel_1.38.0 viridis_0.6.5
[51] DBI_1.2.2 highr_0.10
[53] biomaRt_2.60.0 rappdirs_0.3.3
[55] DelayedArray_0.30.0 rjson_0.2.21
[57] tools_4.4.1 vipor_0.4.7
[59] beeswarm_0.4.0 glue_1.7.0
[61] restfulr_0.0.15 grid_4.4.1
[63] generics_0.1.3 gtable_0.3.5
[65] tzdb_0.4.0 hms_1.1.3
[67] ScaledMatrix_1.12.0 BiocSingular_1.20.0
[69] xml2_1.3.6 utf8_1.2.4
[71] XVector_0.44.0 ggrepel_0.9.5
[73] BiocVersion_3.19.1 pillar_1.9.0
[75] stringr_1.5.1 vroom_1.6.5
[77] dplyr_1.1.4 getopt_1.20.4
[79] BiocFileCache_2.12.0 lattice_0.22-6
[81] rtracklayer_1.64.0 bit_4.0.5
[83] tidyselect_1.2.1 Biostrings_2.72.0
[85] scuttle_1.14.0 knitr_1.46
[87] gridExtra_2.3 ProtGenerics_1.36.0
[89] xfun_0.43 eds_1.6.0
[91] stringi_1.8.3 UCSC.utils_1.0.0
[93] lazyeval_0.2.2 yaml_2.3.8
[95] evaluate_0.23 codetools_0.2-20
[97] tibble_3.2.1 BiocManager_1.30.22
[99] cli_3.6.2 munsell_0.5.1
[ reached getOption("max.print") -- omitted 18 entries ]