vignettes/imcdatasets.Rmd
imcdatasets.Rmd
The imcdatasets package provides access to publicly available datasets generated using imaging mass cytometry (IMC) (Giesen et al. 2014).
IMC is a technology that enables measurement of up to 50 markers from tissue sections at a resolution of 1 \(\mu m\) Giesen et al. (2014). In classical processing pipelines, such as the ImcSegmentationPipeline or steinbock, the multichannel images are segmented to generate cells masks. These masks are then used to extract single cell features from the multichannel images.
Each dataset in imcdatasets
is composed of three
elements that can be retrieved separately:
1. Single-cell data in the form of a SingleCellExperiment
or SpatialExperiment
class object (named
sce.rds
).
2. Multichannel images in the form of a CytoImageList
class
object (named images.rds
).
3. Cell segmentation masks in the form of a CytoImageList
class object (named masks.rds
).
The listDatasets()
function returns all available
datasets in imcdatasets
, along with associated information.
The FunctionCall
column gives the name of the R function
that enables to load the dataset.
datasets <- listDatasets()
datasets <- as.data.frame(datasets)
datasets$FunctionCall <- sprintf("`%s`", datasets$FunctionCall)
knitr::kable(datasets)
FunctionCall | Species | Tissue | NumberOfCells | NumberOfImages | NumberOfChannels | Reference |
---|---|---|---|---|---|---|
Damond_2019_Pancreas() |
Human | Pancreas | 252059 | 100 | 38 | Damond et al. (2019) |
HochSchulz_2022_Melanoma() |
Human | Metastatic melanoma | 325881 | 50 | 41 | Hoch et al. (2022) |
JacksonFischer_2020_BreastCancer() |
Human | Primary breast tumour | 285851 | 100 | 42 | Jackson et al. (2020) |
Zanotelli_2020_Spheroids() |
Human | Cell line spheroids | 229047 | 517 | 51 | Zanotelli et al. (2020) |
IMMUcan_2022_CancerExample() |
Human | Primary tumor | 46825 | 14 | 40 | None |
Users can import the datasets by calling a single function and specifying the type of data to retrieve. The following examples highlight accessing an example dataset linked to the IMMUcan project.
Importing single-cell expression data and metadata
sce <- IMMUcan_2022_CancerExample("sce")
sce
## class: SingleCellExperiment
## dim: 40 47794
## metadata(5): color_vectors cluster_codes SOM_codes delta_area
## filterSpatialContext
## assays(2): counts exprs
## rownames(40): MPO H3 ... DNA1 DNA2
## rowData names(17): channel metal ... ilastik deepcell
## colnames(47794): 1_1 1_2 ... 14_2844 14_2845
## colData names(43): sample_id ObjectNumber ... cell_x cell_y
## reducedDimNames(8): UMAP TSNE ... seurat UMAP_seurat
## mainExpName: IMMUcan_2022_CancerExample_v1
## altExpNames(0):
Importing multichannel images
images <- IMMUcan_2022_CancerExample("images")
images
## CytoImageList containing 14 image(s)
## names(14): Patient1_001 Patient1_002 Patient1_003 Patient2_001 Patient2_002 Patient2_003 Patient2_004 Patient3_001 Patient3_002 Patient3_003 Patient4_005 Patient4_006 Patient4_007 Patient4_008
## Each image contains 40 channel(s)
## channelNames(40): MPO H3 SMA CD16 CD38 HLA_DR CD27 CD15 CD45RA CD163 B2M CD20 CD68 IDO1 CD3e LAG3 CD11c PD_1 PDGFRB CD7 GZMB PD_L1 TCF7 CD45RO FOXP3 ICOS CD8a CA9 CD33 Ki67 VISTA CD40 CD4 CD14 CDH1 CD303 CD206 c_PARP DNA1 DNA2
Importing cell segmentation masks
masks <- IMMUcan_2022_CancerExample("masks")
masks
## CytoImageList containing 14 image(s)
## names(14): Patient1_001 Patient1_002 Patient1_003 Patient2_001 Patient2_002 Patient2_003 Patient2_004 Patient3_001 Patient3_002 Patient3_003 Patient4_005 Patient4_006 Patient4_007 Patient4_008
## Each image contains 1 channel
On disk storage
Objects containing multi-channel images and segmentation masks can furthermore be stored on disk rather than in memory. Nevertheless, they need to be loaded into memory once before writing them to disk. This process takes longer than keeping them in memory but reduces memory requirements during downstream analysis.
To write images or masks to disk, set on_disk = TRUE
and
specify a path where images/masks will be stored as .h5 files:
# Create temporary location
cur_path <- tempdir()
masks <- IMMUcan_2022_CancerExample(data_type = "masks", on_disk = TRUE,
h5FilesPath = cur_path)
masks
## CytoImageList containing 14 image(s)
## names(14): Patient1_001 Patient1_002 Patient1_003 Patient2_001 Patient2_002 Patient2_003 Patient2_004 Patient3_001 Patient3_002 Patient3_003 Patient4_005 Patient4_006 Patient4_007 Patient4_008
## Each image contains 1 channel
Additional information about each dataset is available in the help page:
?IMMUcan_2022_CancerExample
The metadata associated with a specific data object can be displayed as follows:
IMMUcan_2022_CancerExample(data_type = "sce", metadata = TRUE)
IMMUcan_2022_CancerExample(data_type = "images", metadata = TRUE)
IMMUcan_2022_CancerExample(data_type = "masks", metadata = TRUE)
The SingleCellExperiment
class objects can be used for
data analysis. For more information, please refer to the SingleCellExperiment
package and to the Orchestrating
Single-Cell Analysis with Bioconductor workflow.
The CytoImageList
class objects can be used for plotting
cell and pixel information. Some typical use cases are given below. For
more information, please see the cytomapper
package and the associated
vignette.
Subsetting the images and masks
cur_images <- images[1:5]
cur_masks <- masks[1:5]
Plotting pixel information
The images
objects can be used to display pixel-level
data.
plotPixels(
cur_images,
colour_by = c("CD8a", "CD68", "CDH1"),
bcg = list(
CD8a = c(0,4,1),
CD68 = c(0,5,1),
CDH1 = c(0,5,1)
)
)
Plotting cell information
The masks
and sce
objects can be combined
to display cell-level data.
plotCells(
cur_masks, object = sce,
img_id = "image_number", cell_id = "cell_number",
colour_by = c("CD8a", "CD68", "CDH1"),
exprs_values = "exprs"
)
Outlining cells on images
Cell information can be displayed on top of images by combining the
images
, masks
and sce
objects.
## R version 4.3.1 (2023-06-16 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19045)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_United Kingdom.utf8
## [2] LC_CTYPE=English_United Kingdom.utf8
## [3] LC_MONETARY=English_United Kingdom.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United Kingdom.utf8
##
## time zone: Europe/Zurich
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] imcdatasets_1.10.0 SpatialExperiment_1.12.0
## [3] cytomapper_1.14.0 EBImage_4.44.0
## [5] SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0
## [7] Biobase_2.62.0 GenomicRanges_1.54.1
## [9] GenomeInfoDb_1.38.7 IRanges_2.36.0
## [11] S4Vectors_0.40.2 BiocGenerics_0.48.1
## [13] MatrixGenerics_1.14.0 matrixStats_1.2.0
## [15] BiocStyle_2.30.0
##
## loaded via a namespace (and not attached):
## [1] DBI_1.2.2 bitops_1.0-7
## [3] gridExtra_2.3 rlang_1.1.3
## [5] magrittr_2.0.3 svgPanZoom_0.3.4
## [7] shinydashboard_0.7.2 RSQLite_2.3.5
## [9] compiler_4.3.1 png_0.1-8
## [11] systemfonts_1.0.6 fftwtools_0.9-11
## [13] vctrs_0.6.5 pkgconfig_2.0.3
## [15] crayon_1.5.2 fastmap_1.1.1
## [17] dbplyr_2.4.0 magick_2.8.3
## [19] XVector_0.42.0 ellipsis_0.3.2
## [21] utf8_1.2.4 promises_1.2.1
## [23] rmarkdown_2.26 ggbeeswarm_0.7.2
## [25] ragg_1.3.0 bit_4.0.5
## [27] purrr_1.0.2 xfun_0.42
## [29] zlibbioc_1.48.2 cachem_1.0.8
## [31] jsonlite_1.8.8 blob_1.2.4
## [33] highr_0.10 later_1.3.2
## [35] rhdf5filters_1.14.1 DelayedArray_0.28.0
## [37] interactiveDisplayBase_1.40.0 Rhdf5lib_1.24.2
## [39] BiocParallel_1.36.0 jpeg_0.1-10
## [41] tiff_0.1-12 terra_1.7-71
## [43] parallel_4.3.1 R6_2.5.1
## [45] bslib_0.6.1 RColorBrewer_1.1-3
## [47] jquerylib_0.1.4 Rcpp_1.0.12
## [49] bookdown_0.38 knitr_1.45
## [51] httpuv_1.6.14 Matrix_1.6-5
## [53] nnls_1.5 tidyselect_1.2.1
## [55] viridis_0.6.5 abind_1.4-5
## [57] yaml_2.3.8 codetools_0.2-19
## [59] curl_5.2.1 lattice_0.22-5
## [61] tibble_3.2.1 withr_3.0.0
## [63] KEGGREST_1.42.0 shiny_1.8.0
## [65] evaluate_0.23 desc_1.4.3
## [67] BiocFileCache_2.10.1 Biostrings_2.70.3
## [69] ExperimentHub_2.10.0 filelock_1.0.3
## [71] pillar_1.9.0 BiocManager_1.30.22
## [73] generics_0.1.3 sp_2.1-3
## [75] RCurl_1.98-1.14 BiocVersion_3.18.1
## [77] ggplot2_3.5.0 munsell_0.5.0
## [79] scales_1.3.0 xtable_1.8-4
## [81] glue_1.7.0 tools_4.3.1
## [83] AnnotationHub_3.10.0 locfit_1.5-9.9
## [85] fs_1.6.3 rhdf5_2.46.1
## [87] grid_4.3.1 AnnotationDbi_1.64.1
## [89] colorspace_2.1-0 GenomeInfoDbData_1.2.11
## [91] raster_3.6-26 beeswarm_0.4.0
## [93] HDF5Array_1.30.1 vipor_0.4.7
## [95] cli_3.6.2 rappdirs_0.3.3
## [97] textshaping_0.3.7 fansi_1.0.6
## [99] viridisLite_0.4.2 S4Arrays_1.2.1
## [101] svglite_2.1.3 dplyr_1.1.4
## [103] gtable_0.3.4 sass_0.4.8
## [105] digest_0.6.35 SparseArray_1.2.4
## [107] rjson_0.2.21 htmlwidgets_1.6.4
## [109] memoise_2.0.1 htmltools_0.5.7
## [111] pkgdown_2.0.7 lifecycle_1.0.4
## [113] httr_1.4.7 mime_0.12
## [115] bit64_4.0.5