Creating an Image Processing Pipeline¶
Welcome to the user guide for building an image processing pipeline using starfish! This tutorial will cover all the steps necessary for going from raw images to a single cell gene expression matrix. If you are wondering what is starfish, check out The Introduction. If you only have a few minutes to try out starfish, check out a pre-built pipeline by following the Guide to Getting Started. If you are ready to learn how to build your own image processing pipeline using starfish then read on!
The data model
This part of the tutorial goes into more detail about why each of the stages in the example are needed, and provides some alternative approaches that can be used to build similar pipelines.
The core functionalities of starfish pipelines are the detection (and decoding) of spots, and the segmentation of cells. Each of the other approaches are designed to address various characteristics of the imaging system, or the optical characteristics of the tissue sample being measured, which might bias the resulting spot calling, decoding, or cell segmentation decisions. Not all parts of image processing are always needed; some are dependent on the specific characteristics of the tissues. In addition, not all components are always found in the same order. Starfish is flexible enough to omit some pipeline stages or disorder them, but the typical order might match the following. The links show how and when to use each component of starfish, and the final section demonstrates putting together a “pipeline recipe” and running it on an experiment.
Sometimes it can be useful subset the images by, for example, excluding out-of-focus images or cropping out edge effects. For sparse data, it can be useful to project the z-volume into a single image, as this produces a much faster processing routine.
These stages are typically specific to the microscope, camera, filters, chemistry, and any tissue handling or microfluidices that are involved in capturing the images. These steps are typically independent of the assay. Starfish enables the user to design a pipeline that matches their imaging system
Enhancing Signal & Removing Background Noise¶
These stages are usually specific to the sample being analyzed. For example, tissues often have some level of autofluorescence which causes cellular compartments to have more background noise than intracellular regions. This can confound spot finders, which look for local intensity differences. These approaches ameliorate these problems.
Most assays are designed such that intensities need to be compared between rounds and/or channels in order to decode spots. As a basic example, smFISH spots are labeled by the channel with the highest intensity value. But because different channels use different fluorophores, excitation sources, etc. the images have different ranges of intensity values. The background intensity values in one channel might be as high as the signal intensity values of another channel. Normalizing image intensities corrects for these differences and allows comparisons to be made.
Whether to normalize¶
The decision of whether to normalize depends on your data and decoding method used in the next
step of the pipeline.
ImageStack has approximately the same
range of intensities across rounds and
channels then normalizing may have a trivial effect on pixel values. Starfish provides utility
functions imshow_plane and
intensity_histogram to visualize images and their intensity
Accurately normalized images is important if you plan to decode features with
PixelSpotDecoder. These two algorithms use the
feature trace to construct a vector whose distance from
other vectors is used decode the feature. Poorly normalized images with some systematic or random
variation in intensity will bias the results of decoding.
However if you decode with
PerRoundMaxChannel, which only compares intensities
between channels of the same round, precise normalization is not necessary. As long the intensity
values of signal in all three channels are greater than background in all three channels the
features will be decoded correctly.
How to normalize¶
How to normalize depends on your data and a key assumption. There are two approaches for normalizing images in starfish:
If you know a priori that image volumes acquired for every channel and/or every round should have
the same distribution of intensities then the intensity distributions of image volumes can be
MatchHistograms. Typically this means the number of spots and amount of
background autofluorescence in every image volume is approximately uniform across channels and/or
In most data sets the differences in gene expression leads to too much variation in number of
spots between channels and rounds. Normalizing intensity distributions would incorrectly skew the
intensities. Instead you can use
ClipValueToZero to normalize intensity values by clipping extreme values and
Finding and Decoding Spots¶
Assigning Spots to Cells¶
Assessing Performance Metrics¶
Feature Identification and Assignment¶
Once images have been corrected for tissue and optical aberrations, spot finding can be run to turn those spots into features that can be counted up. Separately, The dots and nuclei images can be segmented to identify the locations where the cells can be found in the images. Finally, the two sets of features can be combined to assign each spot to its cell of origin. At this point, it’s trivial to create a cell x gene matrix.