Downstream analysis of IMC data
This page gives a brief overview of the common steps for downstream analysis of IMC™ data. For a more detailed workflow using example data, please refer to the IMC Data Analysis repository (under development).
The analysis described here relies on the Bioconductor framework for single-cell analysis. The book Orchestrating Single-Cell Analysis with Bioconductor provides a good introduction to general single-cell analysis tasks. Although the book mainly focuses on single-cell RNA sequencing data analysis, most concepts (e.g., data handling, clustering, and visualization) can be applied to single-cell IMC data. The book is offered in basic and advanced versions.
Reading in the data
The imcRtools package [1] reads data generated by steinbock (using read_steinbock) or by the IMC Segmentation Pipeline (using read_cpout) for downstream analysis in R.
Single-cell data (mean intensities per cell and channel, morphological features and locations, and spatial object graphs) are read into SpatialExperiment or SingleCellExperiment objects.
The structure is as follows:
- Mean intensities per cell and channel are stored in the counts assay of the objects (cells as columns, channels as rows).
- Cell-specific morphological features and image information are stored in the colData slot of the objects.
- Marker-specific information extracted from the panel file is stored in the rowData slot of the objects.
- Spatial object graphs are stored as edge lists in the colPair slot of the objects.
With the SingleCellExperiment object, the locations of cells are stored in the colData slot. For the SpatialExperiment container, the cells' locations are stored in the spatialCoords slot.
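The layout described above can be illustrated with a small, language-agnostic sketch. The class below is a hypothetical stand-in written in plain Python; it is not the SingleCellExperiment or SpatialExperiment API, and all field names, marker names, and values are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SingleCellContainer:
    """Hypothetical stand-in for a SingleCellExperiment-like object."""
    counts: list    # counts[channel][cell]: mean intensity (channels as rows, cells as columns)
    row_data: list  # one dict per channel (marker information from the panel file)
    col_data: list  # one dict per cell (morphology, image information, here also x/y)
    col_pairs: dict = field(default_factory=dict)  # graph name -> list of (from_cell, to_cell)

    def dims(self):
        """Validate the layout and return (n_channels, n_cells)."""
        n_channels, n_cells = len(self.row_data), len(self.col_data)
        if len(self.counts) != n_channels:
            raise ValueError("counts needs one row per channel")
        if any(len(row) != n_cells for row in self.counts):
            raise ValueError("each counts row needs one value per cell")
        for name, edges in self.col_pairs.items():
            if any(not (0 <= a < n_cells and 0 <= b < n_cells) for a, b in edges):
                raise ValueError(f"edge list '{name}' refers to unknown cells")
        return n_channels, n_cells


# Two channels and three cells with made-up values.
sce = SingleCellContainer(
    counts=[[0.5, 2.1, 0.0],
            [3.2, 0.1, 1.7]],
    row_data=[{"name": "marker_A"}, {"name": "marker_B"}],
    col_data=[
        {"image": "img1", "area": 40, "x": 10.0, "y": 5.0},
        {"image": "img1", "area": 55, "x": 12.0, "y": 6.0},
        {"image": "img1", "area": 38, "x": 30.0, "y": 9.0},
    ],
    col_pairs={"neighborhood": [(0, 1), (1, 0), (1, 2), (2, 1)]},
)
dims = sce.dims()
```

In this sketch the cell locations live in the per-cell metadata, which mirrors the SingleCellExperiment case; a SpatialExperiment-like container would keep them in a separate coordinate field.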
Spillover correction
Channel-to-channel spillover, which can occur between neighboring channels, leads to false signals and should be corrected as described previously [2]. Spillover correction is an optional step in IMC data analysis and requires a specific experimental setup:
On an agarose slide, metal-tagged antibodies are spotted individually, and an individual region of interest is ablated for each antibody. Data from individual regions are processed using imcRtools as explained in the spillover correction section of IMC data analysis. After obtaining a spillover matrix, single-cell data can be compensated using the CATALYST package.
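The idea behind compensation can be sketched as a small linear-algebra problem. The example below assumes a simple linear model in which the observed signal in channel j is the sum over channels i of the true signal in i times the spillover fraction `spill[i][j]`; it then recovers the true signal with a plain linear solve. This is an illustrative assumption only: CATALYST may use a different solver (for example, non-negative least squares), and the matrix values here are made up.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def compensate(observed, spill):
    """Recover true signals, assuming observed[j] = sum_i true[i] * spill[i][j]."""
    n = len(spill)
    spill_t = [[spill[i][j] for i in range(n)] for j in range(n)]
    return solve(spill_t, observed)


# Made-up spillover matrix: row = emitting channel, column = receiving channel.
spill = [[1.00, 0.03, 0.00],
         [0.01, 1.00, 0.02],
         [0.00, 0.00, 1.00]]

true_signal = [100.0, 0.0, 50.0]
# Simulate what would be measured for one cell under the assumed model.
observed = [sum(true_signal[i] * spill[i][j] for i in range(3)) for j in range(3)]
recovered = compensate(observed, spill)
```

In this toy case the second channel shows a signal of 3.0 although its true signal is zero; compensation removes that false signal.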
Quality control
Quality control (QC) is a crucial step to avoid technical artefacts in downstream analyses. Broadly speaking, there are two levels of data quality control:
- Pixel-level QC: The first step after image acquisition and pre-processing is the visual assessment of image quality. For this, interactive tools such as napari-imc, histoCAT, QuPath, or cytomapper (see below) can be used. Questions to consider when viewing images for the first time are: (i) are markers with known specific expression detected as expected (e.g., high signal-to-noise ratio, cell type specificity, adequate maximum intensity per channel), (ii) is channel spillover visually detectable, and (iii) are staining differences detectable between acquisitions?
- Cell-level QC: After reading in the single-cell data (e.g., using imcRtools), cell-level QC includes assessing (i) global staining differences between acquisitions or batches (e.g., in the form of ridge plots using dittoSeq), (ii) marker-to-marker correlations across all cells, (iii) low-dimensional representations of the data (see below), and (iv) the cell density and cell size distributions per acquisition.
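Two of the cell-level checks listed above, marker-to-marker correlation and per-acquisition cell counts and sizes, can be sketched in a few lines. The example is a plain Python illustration on made-up values; the field names (`image`, `area`, `marker_A`, `marker_B`) are hypothetical, and cell counts are reported instead of densities because no image area is assumed.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up single-cell table: one dict per cell.
cells = [
    {"image": "img1", "area": 40.0, "marker_A": 1.0, "marker_B": 2.0},
    {"image": "img1", "area": 60.0, "marker_A": 2.0, "marker_B": 4.5},
    {"image": "img1", "area": 50.0, "marker_A": 3.0, "marker_B": 5.5},
    {"image": "img2", "area": 30.0, "marker_A": 4.0, "marker_B": 8.0},
]

# Number of cells and mean cell area per acquisition.
cells_per_image = Counter(c["image"] for c in cells)
mean_area = {
    img: sum(c["area"] for c in cells if c["image"] == img) / n
    for img, n in cells_per_image.items()
}

# Marker-to-marker correlation across all cells.
corr = pearson([c["marker_A"] for c in cells], [c["marker_B"] for c in cells])
```

Acquisitions with unusually few cells or unusual size distributions, or marker pairs with unexpectedly high correlation, would be candidates for closer inspection.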
Cell-type identification
One of the first analysis tasks is cell phenotyping. This is commonly done by grouping cells based on marker expression and labeling these groups based on their biological roles. There are three major approaches for identifying cell types in single-cell IMC data.
- Unsupervised clustering-based: Clustering is the most common approach for identifying cell types in an unbiased manner. To cluster single cells based on their marker expression, software packages have been developed that use graph-based clustering strategies (e.g., Rphenograph, scran) or self-organising maps (the flowSOM implementation in the CATALYST package). The bluster package offers a wide variety of clustering strategies applicable to single-cell data. After clustering, the mean marker expression per group is used to label clusters.
- Gating and classification: An alternative, more supervised approach to identifying cell types in highly multiplexed images relies on manual gating (labeling) of cells and subsequent classification of all unlabeled cells. The cytomapperShiny application available in the cytomapper package allows gating of cells based on marker expression with joint visualization of the selected cells on composite images. The ground truth labels are saved and can be used as a training dataset for general machine learning approaches (e.g., using caret, mlr or tidymodels). We observed a high classification accuracy when applying random forest-based classification. Additionally, a more specialized framework for classifying cells is the SingleR package.
- Classification based on prior knowledge: Classification approaches that do not rely on manual labeling of cells were implemented in the R Garnett and Python Astir modeling frameworks. Using these strategies, the user defines which cell types express which markers and cell labeling is performed automatically.
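The step of labeling clusters by their mean marker expression can be sketched as follows. This is a minimal illustration in plain Python, assuming cluster assignments are already available from any clustering method; the rule of naming a cluster after its most highly expressed marker and the marker-to-cell-type mapping are simplifying assumptions, not the behaviour of any of the packages named above.

```python
from collections import defaultdict


def mean_expression_per_cluster(expr, clusters):
    """Average each marker over the cells assigned to each cluster."""
    sums = defaultdict(lambda: defaultdict(float))
    sizes = defaultdict(int)
    for cell, cluster in zip(expr, clusters):
        sizes[cluster] += 1
        for marker, value in cell.items():
            sums[cluster][marker] += value
    return {
        cluster: {marker: total / sizes[cluster] for marker, total in markers.items()}
        for cluster, markers in sums.items()
    }


def label_clusters(means, marker_to_type):
    """Name each cluster after its most highly expressed marker (a simplification)."""
    labels = {}
    for cluster, profile in means.items():
        top_marker = max(profile, key=profile.get)
        labels[cluster] = marker_to_type.get(top_marker, "unknown")
    return labels


# Made-up expression values for three cells and their cluster assignments.
expr = [
    {"CD3": 5.0, "CD20": 0.2},
    {"CD3": 4.0, "CD20": 0.4},
    {"CD3": 0.1, "CD20": 6.0},
]
clusters = [1, 1, 2]

means = mean_expression_per_cluster(expr, clusters)
labels = label_clusters(means, {"CD3": "T cell", "CD20": "B cell"})
```

In practice, cluster labeling is usually done by inspecting a heatmap of mean expression across all markers rather than by a single top marker.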
Data visualization
Visualization of bioimaging data can be performed at the pixel and single-cell levels:
Pixel level visualization
Interactive visualization can be performed using GUI-based tools as described in the Image visualization section. As an alternative, the cytomapper Bioconductor package supports handling and visualization of multi-channel images using the statistical programming language R.
The main visualization functionalities of cytomapper are three-fold:
plotPixels
This function takes a CytoImageList object (available via the cytomapper package) containing multi-channel images representing pixel-level expression values, and optionally a CytoImageList object containing segmentation masks and a SingleCellExperiment object containing cell-level metadata.
It allows the visualization of pixel-level information of up to six channels and outlining of cells based on cell-level metadata.
plotCells
This function takes a CytoImageList object containing segmentation masks, and a SingleCellExperiment object containing cell-level mean expression values and metadata information.
It allows the visualization of cell-level expression data and metadata information.
cytomapperShiny
This Shiny application allows gating of cells based on their expression values, and visualizes selected cells on their corresponding images.
It requires at least a SingleCellExperiment as input and optionally CytoImageList objects containing segmentation masks and multi-channel images.
Single-cell level visualization
Multiple Bioconductor packages have been released to visualize single-cell data contained in a SingleCellExperiment or a SpatialExperiment object.
A common approach to single-cell visualization is to first perform dimensionality reduction via PCA, tSNE, UMAP or Diffusion Maps. These functions are available in the scater Bioconductor package.
The scater package further supports visualization of cells in low dimensions and a variety of functions to visualize single-cell expression (see here).
In addition, the dittoSeq package offers an extensive range of visualization functionality using a SingleCellExperiment object as input.
For spatial visualization of single cells and their interactions, the imcRtools package exports the plotSpatial function.
Spatial analysis
A number of data analysis approaches have previously been described to extract biological information from spatially annotated, single-cell resolved data [3-7].
The imcRtools package standardizes spatial data analysis by providing functionalities that wrap around the SpatialExperiment and SingleCellExperiment object classes.
Whereas steinbock and the IMC Segmentation Pipeline compute spatial object graphs, the imcRtools package provides the buildSpatialGraph function for constructing interaction graphs "on the fly".
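As an illustration of what constructing such a graph involves, the sketch below builds a k-nearest-neighbor graph from cell coordinates and returns it as a directed edge list, one image at a time. It is a plain Python illustration, not the buildSpatialGraph implementation; the function name `knn_edges` and its parameters are hypothetical, and k-nearest neighbors is only one possible way of defining an interaction graph.

```python
from math import dist  # Euclidean distance, Python 3.8+


def knn_edges(coords, image_ids, k):
    """Return directed edges (i, j) linking each cell to its k nearest cells in the same image."""
    edges = []
    for i, point in enumerate(coords):
        candidates = [
            j for j in range(len(coords))
            if j != i and image_ids[j] == image_ids[i]
        ]
        candidates.sort(key=lambda j: dist(point, coords[j]))
        edges.extend((i, j) for j in candidates[:k])
    return edges


# Made-up cell centroids from two images.
coords = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (0.0, 0.5), (3.0, 0.0)]
image_ids = ["img1", "img1", "img1", "img2", "img2"]

edges = knn_edges(coords, image_ids, k=1)
```

Note that k-nearest-neighbor graphs are directed: in this example cell 2 points to cell 1, but cell 1 points to cell 0. Neighbors are never searched across images, even when coordinates happen to be close.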
The aggregateNeighbors function computes the proportions of cell types within the direct neighborhood of each cell. Alternatively, expression counts of neighboring cells are summarized. These summarized values can be used to cluster cells and to detect cellular neighborhoods [3,6].
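The first variant, cell type proportions within each cell's neighborhood, can be sketched as follows. This is a minimal plain Python illustration operating on a directed edge list; it is not the aggregateNeighbors implementation, and the function name and example labels are hypothetical.

```python
from collections import Counter


def neighbor_type_proportions(edges, cell_types):
    """For each cell, compute the fraction of its neighbors belonging to each cell type."""
    types = sorted(set(cell_types))
    neighbors = {i: [] for i in range(len(cell_types))}
    for source, target in edges:
        neighbors[source].append(target)
    proportions = []
    for i in range(len(cell_types)):
        counts = Counter(cell_types[j] for j in neighbors[i])
        total = sum(counts.values())
        # Cells without neighbors get all-zero proportions.
        proportions.append({t: (counts[t] / total if total else 0.0) for t in types})
    return proportions


# Made-up graph and cell type labels for four cells.
edges = [(0, 1), (0, 2), (1, 0), (2, 0), (2, 3), (3, 2)]
cell_types = ["tumor", "T cell", "T cell", "B cell"]

props = neighbor_type_proportions(edges, cell_types)
```

Each cell is now described by a vector of neighborhood proportions. Clustering these vectors (for example with k-means) groups cells that share a similar local composition, which is the idea behind cellular neighborhood detection.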
The patchDetection function detects spatial clusters of defined types of cells. By defining a certain distance threshold, all cells within the vicinity of these clusters are detected as well [7].
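One way to think about patch detection is as a connected-components problem followed by a distance-based expansion. The sketch below follows that idea in plain Python under simplifying assumptions: patches are connected components of the selected cells in the interaction graph, small components are discarded, and "vicinity" is measured as the distance to the nearest patch cell. The actual patchDetection function may define patches and their surroundings differently (for example, relative to the patch outline); all names here are hypothetical.

```python
from math import dist


def detect_patches(edges, is_patch_cell, min_size=1):
    """Assign a patch id to each connected group of selected cells (None otherwise)."""
    adjacency = {i: set() for i, flag in enumerate(is_patch_cell) if flag}
    for a, b in edges:
        if a in adjacency and b in adjacency:
            adjacency[a].add(b)
            adjacency[b].add(a)
    patch_id = [None] * len(is_patch_cell)
    visited = set()
    next_id = 0
    for start in adjacency:
        if start in visited:
            continue
        visited.add(start)
        component, stack = [], [start]
        while stack:  # depth-first search over selected cells only
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node] - visited:
                visited.add(neighbor)
                stack.append(neighbor)
        if len(component) >= min_size:
            for node in component:
                patch_id[node] = next_id
            next_id += 1
    return patch_id


def expand_patches(coords, patch_id, max_dist):
    """Also assign cells lying within max_dist of a patch cell to that patch."""
    members = [i for i, p in enumerate(patch_id) if p is not None]
    expanded = list(patch_id)
    for i, p in enumerate(patch_id):
        if p is None and members:
            nearest = min(members, key=lambda j: dist(coords[i], coords[j]))
            if dist(coords[i], coords[nearest]) <= max_dist:
                expanded[i] = patch_id[nearest]
    return expanded


# Made-up example: cells 0, 1 and 3 are of the selected type.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (2.5, 0.0), (20.0, 0.0)]
is_patch_cell = [True, True, False, True, False, False]
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 4), (4, 2)]

patches = detect_patches(edges, is_patch_cell, min_size=2)
expanded = expand_patches(coords, patches, max_dist=1.0)
```

Here cells 0 and 1 form a patch, the isolated cell 3 is dropped because of the minimum size, and cell 2 is added during expansion because it lies within the distance threshold.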
The countInteractions and testInteractions functions allow counting and testing of the average cell-type-to-cell-type interactions per image [4,5].
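The counting and testing steps can be sketched for a single image as follows. The example assumes one particular definition of an interaction count (the number of edges from cells of one type to cells of another, divided by the number of cells of the first type) and a label-permutation test with a one-sided p-value. These choices are illustrative assumptions and may differ from what countInteractions and testInteractions compute; the function names are hypothetical.

```python
import random
from collections import Counter


def count_interactions(edges, labels):
    """Average number of 'to'-type neighbors per cell of the 'from' type."""
    pair_counts = Counter((labels[a], labels[b]) for a, b in edges)
    n_cells = Counter(labels)
    return {pair: n / n_cells[pair[0]] for pair, n in pair_counts.items()}


def permutation_p_value(edges, labels, pair, n_perm=1000, seed=0):
    """One-sided p-value: how often shuffled labels give a count at least as high as observed."""
    rng = random.Random(seed)
    observed = count_interactions(edges, labels).get(pair, 0.0)
    shuffled = list(labels)
    at_least_as_high = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keeps the graph fixed, randomizes cell type labels
        if count_interactions(edges, shuffled).get(pair, 0.0) >= observed:
            at_least_as_high += 1
    return observed, (at_least_as_high + 1) / (n_perm + 1)


# Made-up image with four cells of two types.
labels = ["A", "A", "B", "B"]
edges = [(0, 1), (1, 0), (2, 3), (3, 2), (1, 2), (2, 1)]

counts = count_interactions(edges, labels)
observed, p_value = permutation_p_value(edges, labels, ("A", "A"))
```

A small p-value would suggest that the two cell types neighbor each other more often than expected under random labeling; a test for avoidance would count how often the shuffled value is at most as high as the observed one.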
Example datasets
The imcdatasets Bioconductor package provides a collection of publicly available IMC datasets.
These can be used for data re-analysis and methods development and are provided in standardized formats using the SingleCellExperiment and CytoImageList object classes.
1. Windhager J. et al. (2021) An end-to-end workflow for multiplexed image processing and analysis. bioRxiv
2. Chevrier S. et al. (2018) Compensation of Signal Spillover in Suspension and Imaging Mass Cytometry. Cell Systems
3. Goltsev Y. et al. (2018) Deep Profiling of Mouse Splenic Architecture with CODEX Multiplexed Imaging. Cell 174
4. Schapiro D. et al. (2017) histoCAT: analysis of cell phenotypes and interactions in multiplex image cytometry data. Nature Methods
5. Schulz D. et al. (2018) Simultaneous Multiplexed Imaging of mRNA and Proteins with Subcellular Resolution in Breast Cancer Tissue Samples by Mass Cytometry. Cell Systems
6. Schürch C. M. et al. (2021) Coordinated Cellular Neighborhoods Orchestrate Antitumoral Immunity at the Colorectal Cancer Invasive Front. Cell
7. Hoch T. et al. (2021) Multiplexed Imaging Mass Cytometry of Chemokine Milieus in Metastatic Melanoma Characterizes Features of Response to Immunotherapy. bioRxiv