IMMUcan_2022_CancerExample
Obtain the IMMUcan_2022_CancerExample dataset, which consists of three data objects: single cell data, multichannel images and cell segmentation masks. Data were obtained by imaging mass cytometry (IMC) of tissue sections from 4 patients with different tumor indications.
IMMUcan_2022_CancerExample(
data_type = c("sce", "spe", "images", "masks"),
version = "latest",
metadata = FALSE,
on_disk = FALSE,
h5FilesPath = NULL,
force = FALSE
)
`data_type`
: type of object to load: `images` for multichannel images or `masks` for cell segmentation masks. Single cell data are retrieved using either `sce` for the SingleCellExperiment format or `spe` for the SpatialExperiment format.
`version`
: dataset version. By default, the latest version is returned.
`metadata`
: if `FALSE` (default), the data object selected in `data_type` is returned. If `TRUE`, only the metadata associated with this object are returned.
`on_disk`
: logical indicating whether images, in the form of HDF5Array objects (as .h5 files), should be stored on disk rather than in memory. This setting only applies when downloading `images` or `masks`.
`h5FilesPath`
: path to the directory where the .h5 files for the on-disk representation are stored. This path must be defined when `on_disk = TRUE`. To store the files on disk only temporarily, set `h5FilesPath = getHDF5DumpDir()`.
`force`
: logical indicating whether images should be overwritten when files with the same name already exist on disk.
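The `data_type = "spe"` option returns the same single cell data as a SpatialExperiment. A minimal sketch, assuming the imcdatasets and SpatialExperiment packages are installed and ExperimentHub can be reached (or the data are already cached), loads it and inspects the cell coordinates through the standard `spatialCoords()` accessor:

```r
library(imcdatasets)
library(SpatialExperiment)

# Retrieve the single cell data as a SpatialExperiment
# (downloaded from ExperimentHub on first use, then read from the local cache)
spe <- IMMUcan_2022_CancerExample(data_type = "spe")

# In a SpatialExperiment, cell coordinates live in the spatialCoords slot:
# one row per cell, one column per spatial dimension
head(spatialCoords(spe))
```

The `sce` and `spe` objects carry the same cells and channels; choosing `spe` is mainly useful when downstream tools expect coordinates in `spatialCoords()` rather than in `colData()`.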
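A SingleCellExperiment object with single cell data (or a SpatialExperiment object when `data_type = "spe"`), a CytoImageList object containing multichannel images, or a CytoImageList object containing cell segmentation masks. When `metadata = TRUE`, the ExperimentHub record describing the selected object is returned instead.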
This is an Imaging Mass Cytometry (IMC) dataset used in the IMC data analysis book.
`images`
: contains 14 multichannel images, each containing 50 channels, in the form of a CytoImageList object.
`masks`
: contains the cell segmentation masks associated with the images, in the form of a CytoImageList object.
`sce`
: contains the single cell data extracted from the multichannel images using the cell segmentation masks, as well as the associated metadata, in the form of a SingleCellExperiment object. Single cell data can also be retrieved as a SpatialExperiment object (`spe`). This represents a total of 47,794 cells x 40 channels.
All data are downloaded from ExperimentHub and cached for local re-use.
Mapping between the three data objects is performed via variables located in their metadata columns: `mcols()` for the CytoImageList objects and `colData()` for the SingleCellExperiment object. Mapping at the image level can be performed with the `sample_id` or `image_name` variables. Mapping between cell segmentation masks and single cell data is performed with the `cell_number` variable, whose values correspond to the pixel intensity values of the `masks` object. For practical examples, please refer to the "Accessing IMC datasets" vignette.
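The mask-to-cell mapping described above amounts to a lookup: every non-zero pixel value in a mask is a `cell_number`, so matching pixel values against the single cell table attaches cell-level information to each pixel. The sketch below uses a hypothetical 4 x 4 mask and a hypothetical three-cell table in base R only; the cell types and values are illustrative and not taken from the dataset.

```r
# Toy 4 x 4 segmentation mask: 0 is background, positive integers are cell
# identifiers playing the role of the cell_number variable (hypothetical values)
mask <- matrix(c(0, 1, 1, 0,
                 0, 1, 2, 2,
                 3, 3, 2, 0,
                 3, 0, 0, 0), nrow = 4, byrow = TRUE)

# Toy single cell table, one row per cell (hypothetical cell types)
cells <- data.frame(
  sample_id   = "Patient1_001",
  cell_number = c(1, 2, 3),
  cell_type   = c("Tumor", "CD8", "Myeloid")
)

# Look up each pixel's value in cell_number; background pixels (0) get NA
idx <- match(mask, cells$cell_number)
type_img <- matrix(cells$cell_type[idx], nrow = nrow(mask))

# Cell area in pixels, keyed by cell_number
area <- table(mask[mask > 0])
```

With the real objects, the same mapping is what, for example, `cytomapper::plotCells(mask = masks, object = sce, img_id = "sample_id", cell_id = "cell_number", colour_by = "cell_type")` relies on; that call assumes `sample_id` is present in both `mcols(masks)` and `colData(sce)`, as stated above.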
This imaging mass cytometry dataset serves as an example to demonstrate downstream analysis tools, including spatial data analysis. The data were generated as part of the Integrated iMMUnoprofiling of large adaptive CANcer patient cohorts (IMMUcan) project (immucan.eu) using the Hyperion imaging system.
Relevant entries in the `colData` slot are as follows:
`sample_id`
: image name.
`cell_number`
: cell identifier.
`width_px`
: width of the image in pixels.
`height_px`
: height of the image in pixels.
`patient_id`
: patient identifier.
`ROI`
: region of interest identifier.
`indication`
: cancer type.
`cell_labels`
: labels of manually labelled cells.
`cell_type`
: cell type as defined by classification.
`spatial_community`
: identifiers of each spatial tumor or non-tumor community.
`cn_celltypes`
: cellular neighborhoods as defined by clustering cells based on the frequency of neighboring cell types.
`cn_expression`
: cellular neighborhoods as defined by clustering cells based on the mean expression of neighboring cells.
`lisa_clusters`
: cellular neighborhoods as detected by the lisaClust package.
`spatial_context`
: spatial contexts defined based on `cn_celltypes`.
`spatial_context_filtered`
: filtered spatial context identifiers.
`patch_id`
: identifier of the spatial tumor patch.
`cell_x`
: spatial x coordinate.
`cell_y`
: spatial y coordinate.
The marker-associated metadata, including antibody information and metal tags, are stored in the `rowData` slot of the SingleCellExperiment object.
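The cell-level and marker-level metadata listed above are regular columns of `colData()` and `rowData()`. A short sketch, assuming imcdatasets and SingleCellExperiment are installed and the data can be downloaded or are cached, summarizes a few of them. The column names `channel` and `metal` are taken from the `rowData` names shown in the example output; the cross-tabulations are only illustrations.

```r
library(imcdatasets)
library(SingleCellExperiment)

sce <- IMMUcan_2022_CancerExample(data_type = "sce")

# Cell-level metadata (colData): number of cells per image and per cancer type
table(sce$sample_id)
table(sce$indication)

# Cell type composition per patient
table(sce$patient_id, sce$cell_type)

# Marker-level metadata (rowData): channel and metal tag of each marker
rowData(sce)[, c("channel", "metal")]
```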
The `assays` slot of the SingleCellExperiment object contains two assays:
`counts`
: mean ion counts per cell.
`exprs`
: arsinh-transformed counts per cell, with cofactor 1.
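The relationship between the two assays is `exprs = asinh(counts / cofactor)`, here with a cofactor of 1. The base R sketch below applies this transformation to a hypothetical 2 x 2 counts matrix (markers in rows, cells in columns, matching the SingleCellExperiment layout) and shows that the transformation is invertible.

```r
# Toy counts matrix: markers in rows, cells in columns (hypothetical values)
counts <- matrix(c(0, 1, 10, 100), nrow = 2,
                 dimnames = list(c("MPO", "H3"), c("1_1", "1_2")))

# Inverse hyperbolic sine transformation with cofactor 1
cofactor <- 1
exprs <- asinh(counts / cofactor)

# asinh is invertible, so the counts can be recovered from exprs
recovered <- sinh(exprs) * cofactor
```

On the real object, comparing `assay(sce, "exprs")` with `asinh(assay(sce, "counts") / 1)` via `all.equal()` should confirm the stated cofactor, assuming the stored assay was computed exactly this way.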
The `colPair` slot of the SingleCellExperiment object contains the following spatial object graphs:
`neighborhood`
: graph generated by steinbock.
`knn_interaction_graph`
: 20-nearest neighbor graph.
`expansion_interaction_graph`
: expansion graph using a threshold of 20.
`delaunay_interaction_graph`
: interaction graph constructed by Delaunay triangulation.
`knn_spatialcontext_graph`
: 40-nearest neighbor graph.
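A k-nearest neighbor graph such as `knn_interaction_graph` links each cell to the k cells whose centroids are closest to it. The base R sketch below builds such a directed edge list from hypothetical coordinates with k = 2; it illustrates the concept only and is not necessarily how the stored graphs were computed. The helper `knn_edges()` is hypothetical.

```r
# Hypothetical helper: directed k-nearest-neighbor edge list from coordinates
knn_edges <- function(x, y, k) {
  d <- as.matrix(dist(cbind(x, y)))  # pairwise Euclidean distances
  diag(d) <- Inf                     # a cell is not its own neighbor
  # For each cell (row), indices of its k closest cells
  nn <- apply(d, 1, function(row) order(row)[seq_len(k)])
  data.frame(from = rep(seq_along(x), each = k), to = as.vector(nn))
}

# Toy coordinates standing in for cell_x / cell_y of four cells
edges <- knn_edges(x = c(0, 1, 0, 10), y = c(0, 0, 1, 10), k = 2)
```

The stored graphs can be retrieved with `colPair(sce, "knn_interaction_graph")`, which returns a SelfHits object of cell index pairs. Their names match the defaults of `imcRtools::buildSpatialGraph()`, so an analogous graph could presumably be rebuilt per image with `buildSpatialGraph(sce, img_id = "sample_id", type = "knn", k = 20, coords = c("cell_x", "cell_y"))`; that the stored graph was produced this way is an assumption.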
File sizes:
`images`
: size in memory = 1.5 GB, size on disk = 786 MB.
`masks`
: size in memory = 19 MB, size on disk = 1.2 MB.
`sce`
: size in memory = 182 MB, size on disk = 82 MB.
`spe`
: size in memory = 183 MB, size on disk = 81 MB.
When storing images on disk, they first need to be fully read into memory before being written to disk. As a result, downloading the data is slower than keeping them in memory. However, downstream analyses then avoid the memory overhead of holding the images in memory.
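For the largest object, `images` (about 1.5 GB in memory), on-disk storage is the most relevant. The sketch below, assuming imcdatasets and HDF5Array are installed and the data can be downloaded, writes the .h5 files to a dedicated directory and uses `force = TRUE` so that files from a previous run are overwritten. The directory name is hypothetical; replace it with a persistent path to keep the files across sessions.

```r
library(imcdatasets)
library(HDF5Array)

# Hypothetical directory for the .h5 files; swap in a persistent path if
# the files should survive the current R session
h5_dir <- file.path(tempdir(), "IMMUcan_2022_h5")
dir.create(h5_dir, showWarnings = FALSE, recursive = TRUE)

images <- IMMUcan_2022_CancerExample(
  data_type   = "images",
  on_disk     = TRUE,
  h5FilesPath = h5_dir,
  force       = TRUE  # overwrite .h5 files with the same name, if present
)

# .h5 files written to the chosen directory
list.files(h5_dir, pattern = "\\.h5$")
```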
# Load single cell data
sce <- IMMUcan_2022_CancerExample(data_type = "sce")
#> snapshotDate(): 2023-10-24
#> see ?imcdatasets and browseVignettes('imcdatasets') for documentation
#> loading from cache
print(sce)
#> class: SingleCellExperiment
#> dim: 40 47794
#> metadata(5): color_vectors cluster_codes SOM_codes delta_area
#> filterSpatialContext
#> assays(2): counts exprs
#> rownames(40): MPO H3 ... DNA1 DNA2
#> rowData names(17): channel metal ... ilastik deepcell
#> colnames(47794): 1_1 1_2 ... 14_2844 14_2845
#> colData names(43): sample_id ObjectNumber ... cell_x cell_y
#> reducedDimNames(8): UMAP TSNE ... seurat UMAP_seurat
#> mainExpName: IMMUcan_2022_CancerExample_v1
#> altExpNames(0):
# Display metadata
IMMUcan_2022_CancerExample(data_type = "sce", metadata = TRUE)
#> snapshotDate(): 2023-10-24
#> ExperimentHub with 1 record
#> # snapshotDate(): 2023-10-24
#> # names(): EH7842
#> # package(): imcdatasets
#> # $dataprovider: University of Zurich
#> # $species: Homo sapiens
#> # $rdataclass: SingleCellExperiment
#> # $rdatadateadded: 2023-01-30
#> # $title: IMMUcan_2022_CancerExample - sce - v1
#> # $description: Single cell data for the IMMUcan_2022_CancerExample IMC dataset
#> # $taxonomyid: 9606
#> # $genome: NA
#> # $sourcetype: Zip
#> # $sourceurl: https://zenodo.org/record/6810879
#> # $sourcesize: NA
#> # $tags: c("Homo_sapiens_Data", "ImmunoOncologyData",
#> # "ReproducibleResearch", "SingleCellData", "SpatialData",
#> # "TechnologyData", "Tissue")
#> # retrieve record with 'object[["EH7842"]]'
# Load masks on disk
library(HDF5Array)
masks <- IMMUcan_2022_CancerExample(data_type = "masks", on_disk = TRUE,
h5FilesPath = getHDF5DumpDir())
#> snapshotDate(): 2023-10-24
#> see ?imcdatasets and browseVignettes('imcdatasets') for documentation
#> loading from cache
print(head(masks))
#> CytoImageList containing 6 image(s)
#> names(6): Patient1_001 Patient1_002 Patient1_003 Patient2_001 Patient2_002 Patient2_003
#> Each image contains 1 channel