
Pre-processing


In the first step of the segmentation pipeline, the raw data need to be converted into file formats that can be read by external software (Fiji, R, Python, histoCAT).

Please follow the preprocessing.ipynb script to pre-process the raw data.
To get started, please refer to the instructions here.

Input

The zipped .mcd files

The Hyperion Imaging System produces vendor-controlled .mcd and .txt files in the following folder structure:

├── {XYZ}_ROI_001_1.txt
├── {XYZ}_ROI_002_2.txt
├── {XYZ}_ROI_003_3.txt
├── {XYZ}.mcd

where XYZ defines the file name, ROI_001, ROI_002 and ROI_003 are the names (descriptions) of the selected regions of interest (ROIs), and 1, 2 and 3 are the acquisition identifiers. The ROI description can be specified in the Fluidigm software when selecting ROIs. The .mcd file contains the raw imaging data of all acquired ROIs, while each .txt file contains the data of a single ROI. To enforce a consistent naming scheme and to bundle all metadata, we recommend zipping the folder and specifying the location of all .zip files for preprocessing. Each .zip file should only contain data from a single .mcd file, and the name of the .zip file should match the name of the .mcd file.
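Before pre-processing, it can help to check that each archive follows these rules. The following is a minimal sketch using only the Python standard library; the function name check_mcd_zip is hypothetical and not part of the pipeline, and the .txt pattern <mcd name>_<ROI description>_<acquisition id>.txt is an assumption based on the folder structure above.

```python
import re
import zipfile
from pathlib import Path


def check_mcd_zip(zip_path):
    """Return a list of problems found in a zipped .mcd folder (empty list = looks fine).

    Hypothetical helper; assumes .txt files are named <mcd stem>_<ROI description>_<id>.txt.
    """
    zip_path = Path(zip_path)
    problems = []
    with zipfile.ZipFile(zip_path) as zf:
        # Compare bare file names and ignore any folder prefix or directory entries in the archive
        names = [Path(n).name for n in zf.namelist() if not n.endswith("/")]
    mcd_files = [n for n in names if n.endswith(".mcd")]
    txt_files = [n for n in names if n.endswith(".txt")]
    if len(mcd_files) != 1:
        problems.append(f"expected exactly one .mcd file, found {len(mcd_files)}")
        return problems
    mcd_stem = Path(mcd_files[0]).stem
    if mcd_stem != zip_path.stem:
        problems.append(f"zip name {zip_path.stem!r} does not match .mcd name {mcd_stem!r}")
    # Each .txt file should start with the .mcd name and end with a numeric acquisition id
    txt_pattern = re.compile(re.escape(mcd_stem) + r"_.+_(\d+)\.txt$")
    for name in txt_files:
        if not txt_pattern.match(name):
            problems.append(f"unexpected .txt name: {name}")
    return problems
```

Returning a list of problems rather than raising on the first one lets you report every issue in an archive at once.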

The panel file

The panel file (in .csv format) specifies the antibodies that were used in the experiment and all additional channels (e.g. metals used for counterstaining [1]) that you want to include in downstream processing. Example entries in the panel file can look like this:

Metal Tag,Target,full,ilastik
Dy161,Ecad,1,1
Dy162,CD45,1,0
Er166,CD3,1,1
Ru100,Counterstain,1,0

Usually there are more columns, but the important ones here are Metal Tag, full and ilastik. A 1 in the full column marks channels that should be written to an image stack that is later used to extract features (also referred to as the "full stack"); set every channel that you want to include in the analysis to 1. A 1 in the ilastik column marks channels that will be used for Ilastik pixel classification and therefore for image segmentation (also referred to as the "Ilastik stack"). During pre-processing, you will need to specify the name of the panel column that contains the metal isotopes, the name of the column that contains the 1 or 0 entries for the channels to be analysed, and the name of the column that indicates the channels used for Ilastik training, as in the example above.
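To make the role of the three column names concrete, here is a minimal sketch that reads the panel with Python's csv module and returns the metals selected for each stack. The function select_channels and its default column names are illustrative assumptions, not the pipeline's own API; in the notebook you pass the column names of your own panel.

```python
import csv
import io


def select_channels(panel_csv, metal_col="Metal Tag", full_col="full", ilastik_col="ilastik"):
    """Return (full_metals, ilastik_metals) from panel CSV text (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(panel_csv))
    full, ilastik = [], []
    for row in reader:
        metal = row[metal_col].strip()
        # A "1" marks inclusion; any other value (0, empty) excludes the channel
        if row[full_col].strip() == "1":
            full.append(metal)
        if row[ilastik_col].strip() == "1":
            ilastik.append(metal)
    return full, ilastik


panel = """Metal Tag,Target,full,ilastik
Dy161,Ecad,1,1
Dy162,CD45,1,0
Er166,CD3,1,1
Ru100,Counterstain,1,0
"""
full, ilastik = select_channels(panel)
print(full)     # ['Dy161', 'Dy162', 'Er166', 'Ru100']
print(ilastik)  # ['Dy161', 'Er166']
```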

Naming conventions

When going through the preprocessing script, you will notice the _full and _ilastik suffixes, which indicate the two image stacks mentioned above.

Example data

We provide raw IMC example data at zenodo.org/record/5949116. This dataset contains 4 .zip archives, each of which holds one .mcd file and multiple .txt files. The data were acquired as part of the Integrated iMMUnoprofiling of large adaptive CANcer patient cohorts (IMMUcan) project immucan.eu using the Hyperion Imaging System. Data of 4 patients with different cancer types are provided. To download the raw data together with the panel file, sample metadata and a pre-trained Ilastik classifier, please follow the download script.

Conversion from .mcd to .ome.tiff files

In the first step of the pipeline, raw .mcd files are converted into .ome.tiff files [2]. This allows vendor-independent downstream analysis and visualization of the images. For in-depth information on the .ome.tiff file format, see here. Each .mcd file can contain multiple acquisitions, so multiple multi-channel .ome.tiff files are produced per .mcd file. For each channel, the Fluor and Name entries are set: Name contains the actual name of the antibody as defined in the panel file, and Fluor contains the metal tag of the antibody. For IMC data, the metal tag is defined as (IsotopeShortname)(Mass), e.g. Ir191 for the iridium isotope 191.
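The (IsotopeShortname)(Mass) convention is easy to handle programmatically. The following is a small sketch with a hypothetical helper, parse_metal_tag, which assumes a one- or two-letter element symbol followed by a two- or three-digit mass.

```python
import re

# Assumed shape of a metal tag: element symbol (e.g. "Ir") followed by the isotope mass (e.g. 191)
METAL_TAG = re.compile(r"^([A-Z][a-z]?)(\d{2,3})$")


def parse_metal_tag(tag):
    """Split an IMC metal tag such as 'Ir191' into ('Ir', 191) (hypothetical helper)."""
    match = METAL_TAG.match(tag)
    if match is None:
        raise ValueError(f"not a metal tag of the form <Element><Mass>: {tag!r}")
    element, mass = match.groups()
    return element, int(mass)


print(parse_metal_tag("Ir191"))  # ('Ir', 191)
print(parse_metal_tag("Ru100"))  # ('Ru', 100)
```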

To perform this conversion, we use the extract_mcd_file function of the internal imcsegpipe python package. It uses the readimc python package to read .mcd files and the xtiff python package to write the .ome.tiff files.

The ometiff output folder for each sample has the following form:

├── {XYZ}_s0_a1_ac.ome.tiff
├── {XYZ}_s0_a2_ac.ome.tiff
├── {XYZ}_s0_a3_ac.ome.tiff
├── {XYZ}_s0_a1_ac.ome.csv
├── {XYZ}_s0_a2_ac.ome.csv
├── {XYZ}_s0_a3_ac.ome.csv
├── {XYZ}_s0_p1_pano.png
├── {XYZ}_s0_slide.png
├── {XYZ}_schema.xml

Next to the individual .ome.tiff files (one per acquisition), .csv files are generated that contain the channel name (the metal isotope) and the channel label (the name of the antibody) in the correct channel order. The _pano.png files contain brightfield panorama acquisitions of the sample, where the slide overview is stored as _p1_pano.png. The _schema.xml file contains the internal metadata of the .mcd file in .xml format.
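Because later steps refer back to acquisitions by file name, it can be useful to build and parse these names in code. The following is a minimal sketch with hypothetical helpers. It assumes, based on the listing above, that s0 encodes the slide index and a1, a2, a3 the acquisition identifiers; that interpretation is not stated explicitly in the pipeline documentation.

```python
import re

# Assumed naming scheme: <mcd stem>_s<slide id>_a<acquisition id>_ac.ome.tiff
OME_TIFF_NAME = re.compile(r"^(?P<mcd>.+)_s(?P<slide>\d+)_a(?P<acq>\d+)_ac\.ome\.tiff$")


def acquisition_outputs(mcd_stem, slide_id, acquisition_id):
    """File names of the .ome.tiff image and its channel .csv for one acquisition."""
    stem = f"{mcd_stem}_s{slide_id}_a{acquisition_id}_ac"
    return [f"{stem}.ome.tiff", f"{stem}.ome.csv"]


def parse_ome_tiff_name(file_name):
    """Recover (mcd stem, slide id, acquisition id) from an .ome.tiff file name."""
    match = OME_TIFF_NAME.match(file_name)
    if match is None:
        raise ValueError(f"unexpected .ome.tiff name: {file_name!r}")
    return match["mcd"], int(match["slide"]), int(match["acq"])


print(acquisition_outputs("XYZ", 0, 1))             # ['XYZ_s0_a1_ac.ome.tiff', 'XYZ_s0_a1_ac.ome.csv']
print(parse_ome_tiff_name("XYZ_s0_a3_ac.ome.tiff"))  # ('XYZ', 0, 3)
```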

The .mcd to .ome.tiff conversion step additionally generates the analysis/cpinp/acquisition_metadata.csv file, which stores per-acquisition metadata for later use in CellProfiler.

Conversion from .ome.tiff to single-channel tiffs

In the next pre-processing step, .ome.tiff files are converted to a format that is supported by the histoCAT software [3]. To be loaded into histoCAT, images need to be stored as unsigned 16-bit or unsigned 32-bit single-channel .tiff files. For each acquisition (each .ome.tiff file), the export_to_histocat converter function exports one folder containing all measured channels as single-channel 32-bit .tiff files. These .tiff files are named Name_Fluor, where Name is the name of the antibody (or the metal if no name is available) and Fluor is the name of the metal isotope. For full documentation of the histoCAT format, please follow the manual. Part of a single histoCAT folder looks as follows:
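The Name_Fluor convention can be expressed as a small function. This is an illustrative sketch only; histocat_filename is a hypothetical name, not the export_to_histocat implementation. Note that in the listing the unlabelled xenon channel appears with the label 131Xe, so the exact label used as a fallback depends on the converter; the sketch simply uses whatever label it is given and only substitutes the metal tag when no label is passed at all.

```python
def histocat_filename(fluor, name=None):
    """Build a Name_Fluor.tiff file name for one histoCAT channel (hypothetical helper).

    fluor: metal isotope of the channel, e.g. "Nd148".
    name: antibody name; when missing, the metal tag is used in its place.
    """
    label = name if name else fluor
    return f"{label}_{fluor}.tiff"


channels = [("Xe131", "131Xe"), ("Nd148", "Beta-2M-1855((2962))Nd148"), ("Ir191", None)]
for fluor, name in channels:
    print(histocat_filename(fluor, name))
# 131Xe_Xe131.tiff
# Beta-2M-1855((2962))Nd148_Nd148.tiff
# Ir191_Ir191.tiff
```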

├── 131Xe_Xe131.tiff
├── Beta-2M-1855((2962))Nd148_Nd148.tiff
├── ...

Conversion from .ome.tiff to multi-channel tiffs

For downstream analysis and Ilastik pixel classification, the .ome.tiff files are converted into two multi-channel image stacks in TIFF format:

1. Full stack: The full stack contains all channels specified by the "1" entries in the full column of the panel file. This stack will later be used to measure cell-specific expression features of the selected channels.

2. Ilastik stack: The Ilastik stack contains all channels specified by the "1" entries in the ilastik column of the panel file. This stack will be used to perform the Ilastik training to generate cell, cytoplasm and background pixel probabilities (see Ilastik training).

Additional image stacks can be generated by adapting the panel file and specifying the suffix of the file name.
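Conceptually, building a stack amounts to selecting channels of a (C, Y, X) image by metal and naming the result with a suffix. The following is a minimal NumPy sketch of that idea; make_stack is a hypothetical helper, and keeping the image's own channel order and raising an error for panel metals missing from the acquisition are design choices of this sketch, not documented pipeline behavior.

```python
import numpy as np


def make_stack(img, channel_metals, selected_metals, acquisition_stem, suffix):
    """Return (stack, stack_metals, file_name) for one acquisition (hypothetical helper).

    img: (C, Y, X) array whose channels are named by channel_metals, in order.
    selected_metals: metals flagged with 1 in the relevant panel column.
    """
    selected = set(selected_metals)
    missing = selected - set(channel_metals)
    if missing:
        raise KeyError(f"panel channels not found in acquisition: {sorted(missing)}")
    # Keep the image's own channel order so the stack and its channel list stay aligned
    channel_idx = [i for i, metal in enumerate(channel_metals) if metal in selected]
    stack = np.asarray(img)[channel_idx]
    stack_metals = [channel_metals[i] for i in channel_idx]
    return stack, stack_metals, f"{acquisition_stem}_{suffix}.tiff"


img = np.arange(16).reshape(4, 2, 2)
metals = ["Dy161", "Dy162", "Er166", "Ru100"]
stack, names, file_name = make_stack(img, metals, ["Er166", "Dy161"], "XYZ_s0_a1_ac", "ilastik")
print(stack.shape, names, file_name)  # (2, 2, 2) ['Dy161', 'Er166'] XYZ_s0_a1_ac_ilastik.tiff
```

Adding a further stack then only requires another panel column and another suffix.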

Hot pixel filtering: Each pixel intensity is compared against the maximum intensity of its 8 neighboring pixels in the surrounding 3x3 neighborhood. If the difference is larger than a specified threshold, the pixel intensity is clipped to this neighborhood maximum. Setting hpf=None disables hot pixel filtering in this conversion step.
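The rule above can be sketched with NumPy alone. This is an illustration, not the pipeline's implementation: the function name filter_hot_pixels is assumed, and so is reflecting the image at its borders so that edge pixels also have 8 neighbors.

```python
import numpy as np


def filter_hot_pixels(img, thres):
    """Clip pixels that exceed the maximum of their 8 neighbors by more than `thres`.

    img: 2-D (Y, X) image or (C, Y, X) stack; filtering acts on the last two axes.
    thres=None returns the image unchanged, analogous to hpf=None.
    """
    img = np.asarray(img, dtype=np.float32)
    if thres is None:
        return img
    h, w = img.shape[-2:]
    # Reflect-pad the spatial axes by one pixel so border pixels also have 8 neighbors
    pad = [(0, 0)] * (img.ndim - 2) + [(1, 1), (1, 1)]
    padded = np.pad(img, pad, mode="reflect")
    neighbor_max = np.full(img.shape, -np.inf, dtype=np.float32)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue  # the pixel itself is not part of its neighborhood
            shifted = padded[..., 1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            neighbor_max = np.maximum(neighbor_max, shifted)
    # Replace hot pixels by the neighborhood maximum and keep all other pixels
    return np.where(img - neighbor_max > thres, neighbor_max, img)


img = np.zeros((5, 5))
img[2, 2] = 100  # isolated bright pixel
print(filter_hot_pixels(img, 50)[2, 2])  # 0.0
```

Excluding the center pixel is essential: if it were included, the neighborhood maximum could never be smaller than the pixel itself and no pixel would ever be clipped.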

By default, the hot-pixel-filtered full stack is written to the analysis/cpout/images folder and the hot-pixel-filtered Ilastik stack is written to the analysis/ilastik folder.

The analysis/ilastik folder contains files such as:

├── {XYZ}_s0_a1_ac_ilastik.tiff
├── {XYZ}_s0_a1_ac_ilastik.csv
├── ...

The analysis/cpout/images folder contains the following files:

├── {XYZ}_s0_a1_ac_full.tiff
├── {XYZ}_s0_a1_ac_full.csv
├── ...

The matching .csv files contain the channel names (metals) in the correct channel order.
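Since each stack is only interpretable together with its channel list, a quick consistency check is to confirm that every .tiff stack in these folders has a matching .csv and vice versa. The following is a minimal sketch with a hypothetical helper, unpaired_stack_files, assuming the pairs share the same file stem as in the listings above.

```python
from pathlib import Path


def unpaired_stack_files(folder):
    """Return stems that have a .tiff stack without a .csv channel list, or vice versa."""
    folder = Path(folder)
    tiff_stems = {p.stem for p in folder.glob("*.tiff")}
    csv_stems = {p.stem for p in folder.glob("*.csv")}
    # Symmetric difference: stems present with one extension but not the other
    return sorted(tiff_stems ^ csv_stems)
```

For example, calling unpaired_stack_files("analysis/ilastik") should return an empty list after a successful run.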

Export of acquisition-specific metadata

In the final step of the pre-processing pipeline, a .csv file containing the full stack channel names (metal isotopes) and a .csv file containing the channel names of the images storing pixel probabilities (see Ilastik training) are written out to the analysis/cpinp/ folder.

Output

After image pre-processing, the following files have been generated:

  • analysis/ometiff: contains one folder per sample, each of which contains multiple .ome.tiff files (one per acquisition).
  • analysis/histocat: contains one folder per acquisition, each of which contains multiple single-channel .tiff files for upload to histoCAT.
  • analysis/ilastik: contains the Ilastik stacks for pixel classification as well as .csv files indicating the channel order.
  • analysis/cpout/images: contains the full stacks for analysis as well as .csv files indicating the channel order.
  • analysis/cpout/panel.csv: a copy of the panel file in the final output folder.
  • analysis/cpinp: contains the acquisition_metadata.csv, full_channelmeta.csv and probab_channelmeta_manual.csv files, which store acquisition and channel metadata.
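To confirm that a pre-processing run completed, you can check for these outputs programmatically. The following is a minimal sketch with a hypothetical helper, missing_outputs; it assumes the analysis folder sits directly under the directory you pass as root.

```python
from pathlib import Path

# Expected outputs from the list above, relative to the assumed project root
EXPECTED_OUTPUTS = [
    "analysis/ometiff",
    "analysis/histocat",
    "analysis/ilastik",
    "analysis/cpout/images",
    "analysis/cpout/panel.csv",
    "analysis/cpinp/acquisition_metadata.csv",
    "analysis/cpinp/full_channelmeta.csv",
    "analysis/cpinp/probab_channelmeta_manual.csv",
]


def missing_outputs(root="."):
    """Return the expected pre-processing outputs that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_OUTPUTS if not (root / rel).exists()]
```

An empty list means all expected folders and files are present; it does not check their contents.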

  1. Catena R. et al. (2018) Ruthenium counterstaining for imaging mass cytometry. The Journal of Pathology 244(4), pages 479–484.

  2. Goldberg I.G. et al. (2005) The open microscopy environment (OME) data model and XML file: open tools for informatics and quantitative analysis in biological imaging. Genome Biology 6(5), R47.

  3. Shapiro D. et al. (2017) histoCAT: analysis of cell phenotypes and interactions in multiplex image cytometry data. Nature Methods 14, pages 873–876.