Spots

Starfish provides a number of methods for which spots (or other regions of interest) are the main substrate. starfish.spots.DetectPixels exposes methods that identify which target code best corresponds to each pixel and merge adjacent pixels into ROIs. starfish.spots.FindSpots exposes methods that find bright spots against dark backgrounds. starfish.spots.DecodeSpots exposes methods that match patterns of spots detected in the same spatial positions across rounds and channels to target codes. starfish.spots.AssignTargets exposes methods that assign spots to cells.
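
These modules are typically chained into a pipeline. The sketch below is a minimal, illustrative example only; it assumes imgs (a primary ImageStack), dots (a reference ImageStack), codebook (a Codebook) and masks (a BinaryMaskCollection) have already been loaded, e.g. from an Experiment, and the parameter values are placeholders rather than recommendations.

from starfish.spots import FindSpots, DecodeSpots, AssignTargets

# find spots in a reference image, then measure them in every (round, channel)
bd = FindSpots.BlobDetector(min_sigma=1, max_sigma=8, num_sigma=10, threshold=0.01)
spots = bd.run(image_stack=imgs, reference_image=dots)

# match spot patterns across rounds and channels to codebook targets
decoded = DecodeSpots.PerRoundMaxChannel(codebook=codebook).run(spots=spots)

# assign decoded spots to segmented cells
labeled = AssignTargets.Label().run(masks, decoded)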

Detecting Pixels

Pixel Detectors can be imported using starfish.spots.DetectPixels, which registers all classes that subclass DetectPixelsAlgorithm:

from starfish.spots import DetectPixels
class starfish.spots.DetectPixels.PixelSpotDecoder(codebook: Codebook, metric: str, distance_threshold: float, magnitude_threshold: int, min_area: int, max_area: int, norm_order: int = 2)

Decode an image by first coding each pixel, then combining the results into spots

Parameters:
codebook : Codebook

Codebook object mapping codewords to the targets they are designed to detect

metric : str

the sklearn metric string to pass to NearestNeighbors

distance_threshold : float

spots whose codewords are more than this metric distance from an expected code are filtered

magnitude_threshold : int

spots with intensity less than this value are filtered

min_area : int

spots with total area less than this value are filtered

max_area : int

spots with total area greater than this value are filtered

norm_order : int

order of L_p norm to apply to intensities and codes when using metric_decode to pair each intensity with its closest target (default = 2)

Methods

run(primary_image[, n_processes])

decode pixels and combine them into spots using connected component labeling

run(primary_image: ImageStack, n_processes: Optional[int] = None, *args) → Tuple[DecodedIntensityTable, ConnectedComponentDecodingResult]

decode pixels and combine them into spots using connected component labeling

Parameters:
primary_image : ImageStack

ImageStack containing spots

n_processes : Optional[int]

The number of processes to use for CombineAdjacentFeatures. If None, uses the output of os.cpu_count() (default = None).

Returns:
DecodedIntensityTable

IntensityTable containing decoded spots

ConnectedComponentDecodingResult

Results of connected component labeling
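
A minimal usage sketch, assuming imgs is the primary ImageStack and codebook the experiment's Codebook; the threshold and area values below are illustrative placeholders, not recommendations.

from starfish.spots import DetectPixels

psd = DetectPixels.PixelSpotDecoder(
    codebook=codebook,
    metric='euclidean',
    distance_threshold=0.5,
    magnitude_threshold=0.0005,  # illustrative; choose for your data's intensity scale
    min_area=2,
    max_area=10,
)
# returns a DecodedIntensityTable and the connected-component labeling results
decoded, ccdr = psd.run(primary_image=imgs)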

Finding Spots

Spot Finders can be imported using starfish.spots.FindSpots, which registers all classes that subclass FindSpotsAlgorithm:

from starfish.spots import FindSpots
class starfish.spots.FindSpots.BlobDetector(min_sigma: Union[int, float, Tuple[Union[int, float], ...]], max_sigma: Union[int, float, Tuple[Union[int, float], ...]], num_sigma: int, threshold: Union[int, float], overlap: float = 0.5, measurement_type='max', is_volume: bool = True, detector_method: str = 'blob_log', exclude_border: Union[Tuple[int], int, bool] = False)

Multi-dimensional gaussian spot detector

This method is a wrapper for skimage.feature.blob_log()

Parameters:
min_sigma : Number

The minimum standard deviation for Gaussian Kernel. Keep this low to detect smaller blobs.

max_sigma : Number

The maximum standard deviation for Gaussian Kernel. Keep this high to detect larger blobs.

num_sigma : int

The number of intermediate values of standard deviations to consider between min_sigma and max_sigma.

threshold : float

The absolute lower bound for scale space maxima. Local maxima smaller than threshold are ignored. Reduce this to detect blobs with lower intensities.

is_volume: bool

If True, pass 3d volumes (x, y, z) to func, else pass 2d tiles (x, y) to func. (default: True)

overlap : float [0, 1]

If two spots have more than this fraction of overlap, the spots are combined (default: 0.5)

measurement_type : str [‘max’, ‘mean’]

name of the function used to calculate the intensity for each identified spot area (default: max)

detector_method: str [‘blob_dog’, ‘blob_doh’, ‘blob_log’]

name of the detection method used from skimage.feature (default: blob_log)

Notes

See also: Blob Detection

Methods

image_to_spots(data_image)

Find spots using a gaussian blob finding algorithm

run(image_stack[, reference_image, n_processes])

Find spots in the given ImageStack using a gaussian blob finding algorithm.

image_to_spots(data_image: ndarray) → PerImageSliceSpotResults

Find spots using a gaussian blob finding algorithm

Parameters:
data_image : np.ndarray

image containing spots to be detected

Returns:
PerImageSliceSpotResults

includes a SpotAttributes DataFrame of metadata containing the coordinates, intensity and radius of each spot, as well as any extra information collected during spot finding.

run(image_stack: ImageStack, reference_image: Optional[ImageStack] = None, n_processes: Optional[int] = None, *args) → SpotFindingResults

Find spots in the given ImageStack using a gaussian blob finding algorithm. If a reference image is provided, spots are detected in it and then measured across all rounds and channels in the corresponding ImageStack. If no reference image is provided, spots are detected independently in each channel. This assumes a non-multiplexed imaging experiment, as only one (ch, round) will be measured for each spot.

Parameters:
image_stack : ImageStack

The ImageStack in which to find spots.

reference_image : Optional[ImageStack]

(Optional) a reference image. If provided, spots will be found in this image, and then the locations that correspond to these spots will be measured across each channel.

n_processes : Optional[int]

Number of processes to devote to spot finding.
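
A minimal sketch constructing and running the detector, assuming imgs is an ImageStack already loaded; the sigma and threshold values are illustrative placeholders.

from starfish.spots import FindSpots

bd = FindSpots.BlobDetector(
    min_sigma=1,
    max_sigma=10,
    num_sigma=30,
    threshold=0.01,
    measurement_type='mean',
)
# detect spots independently in every (round, channel) image
spots = bd.run(image_stack=imgs)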

class starfish.spots.FindSpots.FindSpotsAlgorithm

Starfish spot finders use a variety of means to detect bright spots against dark backgrounds. Starfish’s spot detectors each have different strengths and weaknesses.

Fixed-position spot finders

The following spot finders have two modes of operation.

The first mode is suitable for coded experiments where genes are identified by patterns of spots over all rounds and channels of the experiment. In this mode, the spot finders identify spots in a single reference image, which can be either a dots auxiliary image, or a maximum intensity projection of the primary images. The positions of the maxima are then measured in all other images, and the intensities across the complete experiment are stored in an IntensityTable.

The second mode is suitable for assays that detect spots in a single round, such as single molecule FISH and RNAscope. This mode simply finds all the spots and concatenates them into a long-form IntensityTable. In this mode, the spots are not measured in images that correspond to other (round, channel) pairs; those positions of the IntensityTable are filled with np.nan.

1. The BlobDetector allows the user to pre-filter an image using either a Laplacian-of-Gaussians or Difference-of-Gaussians (a fast approximation to Laplacian-of-Gaussians) filter. These filters are applied with a user-specified range of Gaussian kernel sizes, and the best-fitting size is selected automatically, allowing the detector to find Gaussian-shaped blobs of various sizes. Both modes of operation are illustrated in the sketch below.
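
A hedged sketch of the two modes, assuming imgs is the primary ImageStack and bd a FindSpots detector such as the BlobDetector constructed above:

from starfish.types import Axes

# mode 1 (coded experiments): detect spots in a reference image, then measure
# those positions in every (round, channel). The reference can be a dots
# auxiliary image or, as here, a max projection of the primary images.
reference = imgs.reduce({Axes.CH, Axes.ROUND}, func="max")
spots = bd.run(image_stack=imgs, reference_image=reference)

# mode 2 (sequential assays such as smFISH): detect spots independently in
# each (round, channel); unmeasured positions are filled with np.nan.
spots_per_round = bd.run(image_stack=imgs)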

Methods

run(image_stack[, reference_image, n_processes])

Find and measure spots across rounds and channels in the provided ImageStack.

abstract run(image_stack: ImageStack, reference_image: Optional[ImageStack] = None, n_processes: Optional[int] = None, *args) → SpotFindingResults

Find and measure spots across rounds and channels in the provided ImageStack.

class starfish.spots.FindSpots.LocalMaxPeakFinder(min_distance: int, stringency: int, min_obj_area: int, max_obj_area: int, threshold: Optional[Union[int, float]] = None, measurement_type: str = 'max', min_num_spots_detected: int = 3, is_volume: bool = True, verbose: bool = True, **kwargs)

local-max peak finder that wraps skimage.feature.peak_local_max()

Parameters:
min_distance : int

Minimum number of pixels separating peaks in a region of 2 * min_distance + 1 (i.e. peaks are separated by at least min_distance). To find the maximum number of peaks, use min_distance=1.

stringency : int
min_obj_area : int
max_obj_area : int
threshold : Optional[Number]
measurement_type : str, {‘max’, ‘mean’}

default ‘max’ calculates the maximum intensity inside the object

min_num_spots_detected : int

When fewer than this number of spots are detected, spot searching stops considering higher threshold values (default = 3).

is_volume : bool

If True, run the algorithm on 3d volumes of the provided stack. (default = True)

verbose : bool

If True, report the percentage completed during processing (default = True)

kwargs

Additional keyword arguments to pass to skimage.feature.peak_local_max()

Notes

See also: skimage.feature.peak_local_max()

Methods

image_to_spots(data_image, **kwargs)

measure attributes of spots detected by binarizing the image using the selected threshold

run(image_stack[, reference_image, n_processes])

Find spots in the given ImageStack using a local maxima finding algorithm.

image_to_spots(data_image: DataArray, **kwargs) → PerImageSliceSpotResults

measure attributes of spots detected by binarizing the image using the selected threshold

Parameters:
data_image : xr.DataArray

image containing spots to be detected

kwargs

Additional keyword arguments to pass to peak_local_max()

Returns:
PerImageSliceSpotResults

includes a SpotAttributes DataFrame of metadata containing the coordinates, intensity and radius of each spot, as well as any extra information collected during spot finding.

run(image_stack: ImageStack, reference_image: Optional[ImageStack] = None, n_processes: Optional[int] = None, *args, **kwargs) → SpotFindingResults

Find spots in the given ImageStack using a local maxima finding algorithm. If a reference image is provided, spots are detected in it and then measured across all rounds and channels in the corresponding ImageStack. If no reference image is provided, spots are detected independently in each channel. This assumes a non-multiplexed imaging experiment, as only one (ch, round) will be measured for each spot.

Parameters:
image_stack : ImageStack

The ImageStack in which to find spots.

reference_image : Optional[ImageStack]

(Optional) a reference image. If provided, spots will be found in this image, and then the locations that correspond to these spots will be measured across each channel.

n_processes : Optional[int]

Number of processes to devote to spot finding.
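
A minimal sketch, assuming imgs is an ImageStack already loaded; the distance and area values are illustrative placeholders.

from starfish.spots import FindSpots

lmp = FindSpots.LocalMaxPeakFinder(
    min_distance=4,
    stringency=0,
    min_obj_area=6,
    max_obj_area=600,
    is_volume=True,
)
spots = lmp.run(image_stack=imgs)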

class starfish.spots.FindSpots.TrackpyLocalMaxPeakFinder(spot_diameter, min_mass, max_size, separation, percentile=0, noise_size: Tuple[int, int, int] = (1, 1, 1), smoothing_size=None, threshold=None, preprocess: bool = False, max_iterations: int = 10, measurement_type: str = 'max', is_volume: Optional[bool] = None, verbose=False, radius_is_gyration: bool = False)

Find spots using a local max peak finding algorithm

This is a wrapper for trackpy.locate, which implements a version of the Crocker-Grier algorithm.

Parameters:
spot_diameter : odd integer or tuple of odd integers

This may be a single number or a tuple giving the feature’s extent in each dimension, useful when the dimensions do not have equal resolution (e.g. confocal microscopy). The tuple order is the same as the image shape, conventionally (z, y, x) or (y, x). The number(s) must be odd integers. When in doubt, round up.

min_mass : Optional[float]

The minimum integrated brightness. This is a crucial parameter for eliminating spurious features. Recommended minimum values are 100 for integer images and 1 for float images. Defaults to 0 (no filtering).

max_size : float

maximum radius-of-gyration of brightness, default None

separation : Union[float, tuple]

Minimum separation between features. Default is diameter + 1. May be a tuple, see diameter for details.

percentile : float

Features must have a peak brighter than pixels in this percentile. This helps eliminate spurious peaks. (default = 0)

noise_size : Union[float, tuple]

Width of the Gaussian blurring kernel, in pixels. Default is 1. May be a tuple, see diameter for details.

smoothing_size : Union[float, tuple]

The size of the sides of the square kernel used in boxcar (rolling average) smoothing, in pixels. Default is diameter. May be a tuple, making the kernel rectangular.

threshold : float

Clip bandpass result below this value. Thresholding is done on the already background-subtracted image. By default, 1 for integer images and 1/255 for float images.

measurement_type : str [‘max’, ‘mean’]

name of the function used to calculate the intensity for each identified spot area (default = max)

preprocess : boolean

Set to False to turn off bandpass preprocessing.

max_iterations : integer

Max number of loops to refine the center of mass (default = 10)

verbose : bool

If True, report the percentage completed during processing (default = False)

Notes

See also: trackpy.locate

Methods

image_to_spots(data_image)

Find spots in the given three-dimensional image.

run(image_stack[, reference_image, n_processes])

Find spots in the given ImageStack using a version of the Crocker-Grier algorithm.

image_to_spots(data_image: ndarray) → PerImageSliceSpotResults
Parameters:
data_image : np.ndarray

three-dimensional image containing spots to be detected

Returns:
PerImageSliceSpotResults

includes a SpotAttributes DataFrame of metadata containing the coordinates, intensity and radius of each spot, as well as any extra information collected during spot finding.

run(image_stack: ImageStack, reference_image: Optional[ImageStack] = None, n_processes: Optional[int] = None, *args) → SpotFindingResults

Find spots in the given ImageStack using a version of the Crocker-Grier algorithm. If a reference image is provided, spots are detected in it and then measured across all rounds and channels in the corresponding ImageStack. If no reference image is provided, spots are detected independently in each channel. This assumes a non-multiplexed imaging experiment, as only one (ch, round) will be measured for each spot.

Parameters:
image_stack : ImageStack

The ImageStack in which to find spots.

reference_image : Optional[ImageStack]

(Optional) a reference image. If provided, spots will be found in this image, and then the locations that correspond to these spots will be measured across each channel.

n_processes : Optional[int]

Number of processes to devote to spot finding.
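
A minimal sketch, assuming imgs is an ImageStack already loaded; the values below are illustrative placeholders (spot_diameter must be odd).

from starfish.spots import FindSpots

tlmpf = FindSpots.TrackpyLocalMaxPeakFinder(
    spot_diameter=5,
    min_mass=0.02,
    max_size=2,
    separation=7,
    percentile=10,
    preprocess=False,
    verbose=True,
)
spots = tlmpf.run(image_stack=imgs)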

Decoding Spots

Spot Decoders can be imported using starfish.spots.DecodeSpots, which registers all classes that subclass DecodeSpotsAlgorithm:

from starfish.spots import DecodeSpots
class starfish.spots.DecodeSpots.CheckAll(codebook: Codebook, search_radius: float = 3, error_rounds: int = 0, mode='med', physical_coords=False)

Decode spots by generating all possible combinations of neighboring spots that can form barcodes, given a search radius within which spots may lie in order to be combined into a barcode. The best set of nonoverlapping spot combinations is then chosen: those with the least spatial variance of their spot coordinates, the highest normalized intensity, and that are also found to be the best barcode for multiple of their constituent spots (see algorithm below). Allows for one error-correction round (an option for more may be added in the future).

Two slightly different algorithms are used to balance precision (the proportion of targets that represent true mRNA molecules) and recall (the proportion of true mRNA molecules that are recovered). They share mostly the same steps, but two steps are swapped between the versions. The following is the “filter-first” version:

1. For each spot in each round, find all neighbors in other rounds that are within the search radius.

2. For each spot in each round, build all possible full-length barcodes based on the channel labels of the spot’s neighbors and itself.

3. Choose the “best” barcode of each spot’s possible barcodes by calculating a score based on minimizing the spatial variance and maximizing the intensities of the spots in the barcode. Each spot is assigned a “best” barcode in this way.

4. Drop “best” barcodes that don’t have a matching target in the codebook.

5. Only keep barcodes/targets that were found as “best” using at least x of the spots that make up each barcode (x is determined by parameters).

6. Find a maximum independent set (approximation) of the spot combinations so that no two barcodes use the same spot.

The other method (which I’ll call “decode-first”) is the same except that steps 3 and 4 are switched, so that the minimum-scoring barcode is chosen from the set of possible codes that have a match in the codebook. The filter-first method returns fewer decoded targets (lower recall) but has a lower false positive rate (higher precision), while the decode-first method finds more targets (higher recall) at the cost of an increased false positive rate (lower precision).

Decoding is run in multiple stages, with the parameters becoming less strict in later stages. The high-accuracy algorithm (filter-first) is always run first, followed by the low-accuracy method (decode-first), each with slightly different parameters based on the choice of the “mode” parameter. After each decoding, the spots found to be in decoded barcodes are removed from the original set of spots before the remaining spots are decoded again with a new set of parameters. To simplify the number of parameters to choose from, I have sorted them into three sets of presets (“high”, “medium”, or “low” accuracy) determined by the “mode” parameter.

Decoding is also done multiple times at multiple search radius values that start at 0 and increase incrementally until they reach the user-specified search radius. This allows high-confidence barcodes to be called first, making it easier to call the remaining codes in later passes.

If error_rounds is set to 1 (values greater than 1 are not currently supported), then after running all decodings for barcodes that exactly match the codebook, another set of decodings is run to find barcodes that are missing a spot in exactly one round. If the codes in the codebook all have a hamming distance of at least 2 from all other codes, each can still be uniquely identified using a partial code with a single round dropped. Barcodes decoded with a partial code like this are inherently less accurate, so an extra dimension called “rounds_used” is added to the DecodedIntensityTable output that labels each decoded target with the number of rounds used to decode it, allowing you to easily separate these less accurate codes from your high-accuracy set if you wish.

Parameters:
codebook : Codebook

Contains codes to decode IntensityTable

search_radius : float

Maximum allowed distance (in pixels) that spots in different rounds can be from each other and still be allowed to be combined into a barcode together

error_rounds : int

Maximum hamming distance a barcode can be from its target in the codebook and still be uniquely identified (i.e. the number of error-correction rounds in the experiment)

mode : string

One of three preset parameter sets. Choices are: “low”, “med”, or “high”. Low accuracy mode returns more decoded targets but at a cost to accuracy (high recall, low precision), while high accuracy mode finds fewer false positives but also fewer targets overall (high precision, low recall); medium is a balance between the two.

physical_coords : bool

If True, decoding uses physical distances from the original ImageStack that spot finding was performed on. Should be used when the distance between z pixels is much greater than the distance between x and y pixels.

Methods

run(spots[, n_processes])

Decode spots by finding the set of nonoverlapping barcodes that have the minimum spatial variance within each barcode.

run(spots: SpotFindingResults, n_processes: int = 1, *args) → DecodedIntensityTable

Decode spots by finding the set of nonoverlapping barcodes that have the minimum spatial variance within each barcode.

Parameters:
spots : SpotFindingResults

A Dict of tile indices and their corresponding measured spots

n_processes : int

Number of threads to run decoder in parallel with

Returns:
DecodedIntensityTable

IntensityTable decoded and appended with Features.TARGET values.
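
A minimal sketch, assuming spots is a SpotFindingResults from one of the spot finders above and codebook the experiment's Codebook; parameter values are illustrative.

from starfish.spots import DecodeSpots

decoder = DecodeSpots.CheckAll(
    codebook=codebook,
    search_radius=3,
    error_rounds=1,  # also attempt barcodes with one round dropped
    mode='med',
)
decoded = decoder.run(spots=spots, n_processes=4)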

class starfish.spots.DecodeSpots.DecodeSpotsAlgorithm

Performs decoding on the spots found, using the codebook specified.

Methods

run

class starfish.spots.DecodeSpots.MetricDistance(codebook: Codebook, max_distance: Union[int, float], min_intensity: Union[int, float], norm_order: int = 2, metric: str = 'euclidean', trace_building_strategy: TraceBuildingStrategies = TraceBuildingStrategies.EXACT_MATCH, anchor_round: int = 1, search_radius: int = 3, return_original_intensities: bool = False)

Normalizes both the magnitudes of the codes and the spot intensities, then decodes spots by assigning each spot to the closest code, measured by the provided metric.

Spots whose codes are farther than max_distance from the nearest code, or dimmer than min_intensity, are discarded.

Parameters:
codebook : Codebook

codebook containing targets the experiment was designed to quantify

max_distance : Number

spots greater than this distance from their nearest target are not decoded

min_intensity : Number

spots dimmer than this intensity are not decoded

metric : str

the metric to use to measure distance. Can be any metric that satisfies the triangle inequality that is implemented by scipy.spatial.distance (default “euclidean”)

norm_order : int

the norm to use to normalize the magnitudes of spots and codes (default 2, L2 norm)

trace_building_strategy: TraceBuildingStrategies

Defines the strategy for building spot traces to decode across rounds and channels of spot finding results.

anchor_round : Optional[int]

Only applicable if trace_building_strategy is TraceBuildingStrategies.NEAREST_NEIGHBORS. The imaging round against which other rounds will be checked for spots in the same approximate pixel location.

search_radius : Optional[int]

Only applicable if trace_building_strategy is TraceBuildingStrategies.NEAREST_NEIGHBORS. Number of pixels over which to search for spots in other rounds and channels.

return_original_intensities: bool

If True, returns original intensity values in the DecodedIntensityTable instead of normalized ones (default = False)

Methods

run(spots, *args)

Decode spots by assigning each spot to the closest code in the codebook, as measured by the provided metric.

run(spots: SpotFindingResults, *args) → DecodedIntensityTable

Decode spots by assigning each spot to the closest code in the codebook, as measured by the provided metric.

Parameters:
spots : SpotFindingResults

A Dict of tile indices and their corresponding measured spots

Returns:
DecodedIntensityTable

IntensityTable decoded and appended with Features.TARGET and Features.QUALITY values.
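
A minimal sketch, assuming spots is a SpotFindingResults and codebook the experiment's Codebook; the distance and intensity thresholds are illustrative placeholders.

from starfish.spots import DecodeSpots

decoder = DecodeSpots.MetricDistance(
    codebook=codebook,
    max_distance=0.5,
    min_intensity=0.1,
    metric='euclidean',
    norm_order=2,
)
decoded = decoder.run(spots=spots)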

class starfish.spots.DecodeSpots.PerRoundMaxChannel(codebook: Codebook, trace_building_strategy: TraceBuildingStrategies = TraceBuildingStrategies.EXACT_MATCH, anchor_round: Optional[int] = 1, search_radius: Optional[int] = 3)

Decode spots by selecting the max-valued channel in each sequencing round.

Note that this assumes that the codebook contains only one “on” channel per sequencing round, a common pattern in experiments that assign one fluorophore to each DNA nucleotide and read DNA sequentially. It is also a characteristic of single-molecule FISH and RNAscope codebooks.

Parameters:
codebook : Codebook

Contains codes to decode IntensityTable

trace_building_strategy: TraceBuildingStrategies

Defines the strategy for building spot traces to decode across rounds and channels of spot finding results.

search_radius : Optional[int]

Only applicable if trace_building_strategy is TraceBuildingStrategies.NEAREST_NEIGHBORS. Number of pixels over which to search for spots in other rounds and channels.

anchor_round : Optional[int]

Only applicable if trace_building_strategy is TraceBuildingStrategies.NEAREST_NEIGHBORS. The imaging round against which other rounds will be checked for spots in the same approximate pixel location.

Methods

run(spots, *args)

Decode spots by selecting the max-valued channel in each sequencing round

run(spots: SpotFindingResults, *args) → DecodedIntensityTable

Decode spots by selecting the max-valued channel in each sequencing round

Parameters:
spots: SpotFindingResults

A Dict of tile indices and their corresponding measured spots

Returns:
DecodedIntensityTable

IntensityTable decoded and appended with Features.TARGET and Features.QUALITY values.
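
A minimal sketch, assuming spots is a SpotFindingResults found against a reference image (the default EXACT_MATCH trace-building strategy assumes this) and codebook the experiment's Codebook.

from starfish.spots import DecodeSpots

decoder = DecodeSpots.PerRoundMaxChannel(codebook=codebook)
decoded = decoder.run(spots=spots)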

class starfish.spots.DecodeSpots.SimpleLookupDecoder(codebook: Codebook)

Decode spots by assigning the target value of a spot to the corresponding target value of the round/ch it was found in. This method only makes sense to use in non-multiplexed sequential assays, where each (round, ch) pair has only one target assigned to it.

Parameters:
codebook : Codebook

Contains codes to decode IntensityTable

Methods

run(spots, *args)

Decode spots by looking up the associated target value for the round and ch each spot is in.

run(spots: SpotFindingResults, *args) → DecodedIntensityTable

Decode spots by looking up the associated target value for the round and ch each spot is in.

Parameters:
spots: SpotFindingResults

A Dict of tile indices and their corresponding measured spots

Returns:
DecodedIntensityTable

IntensityTable decoded and appended with Features.TARGET values.
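
A minimal sketch for a sequential (non-multiplexed) assay, assuming spots is a SpotFindingResults and codebook maps each (round, ch) pair to a single target.

from starfish.spots import DecodeSpots

decoded = DecodeSpots.SimpleLookupDecoder(codebook=codebook).run(spots=spots)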

Target Assignment

Target Assignment can be imported using starfish.spots.AssignTargets, which registers all classes that subclass AssignTargetsAlgorithm:

from starfish.spots import AssignTargets
class starfish.spots.AssignTargets.AssignTargetsAlgorithm

AssignTargets assigns cell IDs to detected spots using an IntensityTable and SegmentationMaskCollection.

Methods

run(masks, decoded_intensity_table[, ...])

Performs target (e.g. gene) assignment given the spots and the regions.

abstract run(masks: BinaryMaskCollection, decoded_intensity_table: DecodedIntensityTable, verbose: bool = False, in_place: bool = False) → DecodedIntensityTable

Performs target (e.g. gene) assignment given the spots and the regions.

class starfish.spots.AssignTargets.Label(**kwargs)

Extract cell ids for features in IntensityTable from a set of segmentation masks.

Methods

run(masks, decoded_intensity_table[, ...])

Extract cell ids for features in IntensityTable from a segmentation label image

run(masks: BinaryMaskCollection, decoded_intensity_table: DecodedIntensityTable, verbose: bool = False, in_place: bool = False) → DecodedIntensityTable

Extract cell ids for features in IntensityTable from a segmentation label image

Parameters:
masks : BinaryMaskCollection

binary masks segmenting each cell

decoded_intensity_table : IntensityTable

spot information

in_place : bool

if True, process ImageStack in-place, otherwise return a new stack

verbose : bool

if True, report on the percentage completed during processing (default = False)

Returns:
IntensityTable

IntensityTable with added features variable containing cell ids. Points outside of cells will be assigned nan.
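
A minimal sketch, assuming masks is a BinaryMaskCollection produced by cell segmentation and decoded is a DecodedIntensityTable from one of the decoders above.

from starfish.spots import AssignTargets

al = AssignTargets.Label()
labeled = al.run(masks, decoded)
# spots that fall outside every mask receive a nan cell id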