
class starfish.core.codebook.codebook.Codebook(data=<NA>, coords=None, dims=None, name=None, attrs=None, indexes=None, fastpath=False)[source]

Codebook for an image-based transcriptomics experiment

The codebook is a three dimensional tensor with shape (feature, round, channel) whose values are the expected intensity of features (spots or pixels) that correspond to each target (gene or protein) in each of the image tiles of an experiment.

This class supports the construction of synthetic codebooks for testing, and exposes decode methods to assign target identifiers to spots. This codebook provides an in-memory representation of the codebook defined in the SpaceTx format.

The codebook is a subclass of xarray.DataArray, and exposes the complete public API of that class in addition to the methods and constructors listed below.


Build a codebook using Codebook.synthetic_one_hot_codebook():

>>> from starfish import Codebook
>>> sd = Codebook.synthetic_one_hot_codebook(n_round=4, n_channel=3, n_codes=2)
>>> sd
<xarray.Codebook (target: 2, c: 3, r: 4)>
array([[[0, 0, 0, 0],
        [0, 0, 1, 1],
        [1, 1, 0, 0]],

       [[1, 0, 0, 0],
        [0, 0, 1, 0],
        [0, 1, 0, 1]]], dtype=uint8)
  * target     (target) object 08b1a822-a1b4-4e06-81ea-8a4bd2b004a9 ...
  * c          (c) int64 0 1 2
  * r          (r) int64 0 1 2 3

code_length

Return the length of codes in this codebook.


decode_metric(self, intensities, …)

Assigns intensity patterns that have been extracted from an ImageStack and stored in an IntensityTable by a SpotFinder to the gene targets that they encode.

decode_per_round_max(self, intensities)

Assigns intensity patterns that have been extracted from an ImageStack and stored in an IntensityTable by a SpotFinder to the gene targets that they encode.

from_code_array(code_array, …)

Construct a codebook from a python list of SpaceTx-Format codewords.

from_numpy(code_names, n_round, n_channel, data)

create a codebook of shape (code_names, n_round, n_channel) from a 3-d numpy array

get_partial(self, indexers)

Slice the codebook data according to the provided indexing parameters.

open_json(json_codebook, n_round, …)

Load a codebook from a SpaceTx Format json file or a url pointing to such a file.

synthetic_one_hot_codebook(n_round, …)

Generate codes where one channel is “on” in each imaging round

to_json(self, filename)

Save a codebook to json using SpaceTx Format.

zeros(code_names, n_round, n_channel)

Create an empty codebook of shape (code_names, n_round, n_channel)

property code_length

return the length of codes in this codebook

Return type

int


decode_metric(self, intensities:starfish.core.intensity_table.intensity_table.IntensityTable, max_distance:Union[int, float], min_intensity:Union[int, float], norm_order:int, metric:str='euclidean', return_original_intensities:bool=False) → starfish.core.intensity_table.decoded_intensity_table.DecodedIntensityTable[source]

Assigns intensity patterns that have been extracted from an ImageStack and stored in an IntensityTable by a SpotFinder to the gene targets that they encode.

This method carries out the assignment by first normalizing both the codes and the recovered intensities to unit magnitude using an L2 norm, and then finding the closest code for each feature according to a distance metric (default=euclidean).

Features that are farther than max_distance from their nearest code, or that have an average intensity below min_intensity, are not assigned to any target.


intensities: IntensityTable

features to be decoded

max_distance: Union[int, float]

maximum distance between a feature and its closest code for which the coded target will be assigned.

min_intensity: Union[int, float]

minimum intensity for a feature to receive a target annotation

norm_order: int

the scipy.linalg norm to apply to normalize codes and intensities

metric: str

the sklearn metric string to pass to NearestNeighbors

return_original_intensities: bool

If True returns original intensity values in the DecodedIntensityTable instead of normalized ones (default=False)

DecodedIntensityTable :

Intensity table containing normalized intensities, target assignments, distances to the nearest code, and the filtering status of each feature.


The available norms for this function are the orders accepted by the scipy.linalg norm. The available metrics are the metric strings accepted by sklearn's NearestNeighbors.

Return type

DecodedIntensityTable
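The normalize-then-nearest-code procedure described for decode_metric can be illustrated with a minimal pure-Python sketch. This is not starfish's implementation: the helper names (unit_normalize, decode_nearest) are hypothetical, each code and feature is flattened to a plain list, an L2 norm with Euclidean distance is assumed, and the intensity filter is assumed to use the mean of the raw values.

```python
import math


def unit_normalize(vec):
    """Scale a flattened (round x channel) vector to unit L2 magnitude."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def decode_nearest(features, codes, max_distance, min_intensity):
    """Assign each feature to its nearest code, or None if it fails a filter.

    features: list of flattened intensity vectors
    codes: dict mapping target name -> flattened code vector
    (both names and layouts are assumptions made for this sketch)
    """
    unit_codes = {name: unit_normalize(code) for name, code in codes.items()}
    results = []
    for feature in features:
        unit_feature = unit_normalize(feature)
        # Euclidean distance from the normalized feature to each normalized code
        distances = {
            name: math.dist(unit_feature, code)
            for name, code in unit_codes.items()
        }
        target = min(distances, key=distances.get)
        distance = distances[target]
        mean_intensity = sum(feature) / len(feature)
        passes = distance <= max_distance and mean_intensity >= min_intensity
        results.append((target if passes else None, distance))
    return results


# Two targets, 2 rounds x 2 channels, flattened into 4-element vectors
codes = {"ACTA": [1, 0, 1, 0], "ACTB": [0, 1, 0, 1]}
features = [
    [0.9, 0.1, 0.8, 0.0],      # close to the ACTA code
    [0.5, 0.5, 0.5, 0.5],      # too far from either code
    [0.0, 0.002, 0.0, 0.001],  # ACTB-like pattern, but too dim
]
decoded = decode_nearest(features, codes, max_distance=0.5, min_intensity=0.01)
targets_only = [target for target, _ in decoded]  # ['ACTA', None, None]
```

Because both sides are normalized first, the distance filter compares the shape of the intensity pattern rather than its brightness, which is why a separate min_intensity threshold is needed to reject dim features.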


decode_per_round_max(self, intensities:starfish.core.intensity_table.intensity_table.IntensityTable) → starfish.core.intensity_table.decoded_intensity_table.DecodedIntensityTable[source]

Assigns intensity patterns that have been extracted from an ImageStack and stored in an IntensityTable by a SpotFinder to the gene targets that they encode.

This method carries out the assignment by identifying the maximum-intensity channel for each round, and assigning each spot to a code if the maximum-intensity pattern exists in the codebook.

This method is only compatible with one-hot codebooks, where exactly one channel is expected to contain fluorescence in each imaging round. This is a common coding strategy for experiments that read out one DNA base with a distinct fluorophore in each imaging round.


intensities: IntensityTable

features to be decoded

DecodedIntensityTable :

intensity table containing additional data variables for target assignments


  • If no code matches the per-round maximum for a feature, it will be assigned ‘nan’ instead of a target value

  • Numpy’s argmax breaks ties by picking the first of the tied values – this can lead to unexpected results in low-precision images where some features with “tied” channels will decode, but others will be assigned ‘nan’.

Return type

DecodedIntensityTable
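As a rough illustration of the per-round maximum strategy and the tie-breaking caveat, the following pure-Python sketch decodes one feature at a time. The function name per_round_max_decode, the representation of each code as a tuple of "on" channels, and the use of None in place of 'nan' are assumptions made for this example, not starfish's implementation.

```python
def per_round_max_decode(feature, codes):
    """Return the target whose one-hot code matches the per-round maxima.

    feature: list of rounds, each a list of channel intensities
    codes: dict mapping target name -> tuple with the "on" channel per round
    """
    # list.index(max(...)) returns the first maximum, so ties resolve to the
    # lowest channel index, mirroring the argmax behaviour described above
    max_channels = tuple(rnd.index(max(rnd)) for rnd in feature)
    for target, on_channels in codes.items():
        if on_channels == max_channels:
            return target
    return None  # stands in for the 'nan' assignment


codes = {"ACTA": (0, 2), "ACTB": (1, 1)}

clear_spot = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.7]]  # maxima at channels (0, 2)
unmatched = [[0.1, 0.1, 0.8], [0.9, 0.0, 0.1]]   # maxima at (2, 0): no such code
tied_hit = [[5, 5, 0], [0, 0, 3]]                # round-0 tie resolves to channel 0
tied_miss = [[5, 0, 0], [0, 3, 3]]               # round-1 tie resolves to channel 1

results = [
    per_round_max_decode(f, codes)
    for f in (clear_spot, unmatched, tied_hit, tied_miss)
]
# results == ['ACTA', None, 'ACTA', None]
```

The last two features both contain a tie, yet only one of them decodes, which is the inconsistency the second note warns about for low-precision images.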


classmethod from_code_array(code_array:List[Dict[Union[str, Any], Any]], n_round:Union[int, NoneType]=None, n_channel:Union[int, NoneType]=None) → 'Codebook'[source]

Construct a codebook from a python list of SpaceTx-Format codewords.

Note: Loading the SpaceTx-Format codebook with json.load() will produce a code array that can be passed to this constructor.

code_array: List[Dict[str, Any]]

Array of dictionaries, each containing a codeword and target

n_round: Optional[int]

The number of imaging rounds used in the codes. Will be inferred if not provided

n_channel: Optional[int]

The number of channels used in the codes. Will be inferred if not provided

Codebook :

Codebook with shape (targets, channels, imaging_rounds)


Construct a codebook from some array data in python memory

>>> from starfish.types import Axes, Features
>>> from starfish import Codebook
>>> codebook = [
...     {
...         Features.CODEWORD: [
...             {Axes.ROUND.value: 0, Axes.CH.value: 3, Features.CODE_VALUE: 1},
...             {Axes.ROUND.value: 1, Axes.CH.value: 3, Features.CODE_VALUE: 1},
...         ],
...         Features.TARGET: "ACTB_human"
...     },
...     {
...         Features.CODEWORD: [
...             {Axes.ROUND.value: 0, Axes.CH.value: 3, Features.CODE_VALUE: 1},
...             {Axes.ROUND.value: 1, Axes.CH.value: 1, Features.CODE_VALUE: 1},
...         ],
...         Features.TARGET: "ACTB_mouse"
...     },
... ]
>>> Codebook.from_code_array(codebook)
<xarray.Codebook (target: 2, c: 4, r: 2)>
array([[[0, 0],
        [0, 0],
        [0, 0],
        [1, 1]],

       [[0, 0],
        [0, 1],
        [0, 0],
        [1, 0]]], dtype=uint8)
  * target     (target) object 'ACTB_human' 'ACTB_mouse'
  * c          (c) int64 0 1 2 3
  * r          (r) int64 0 1

Return type

Codebook
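To make the mapping from a code array to a dense tensor concrete, here is a small pure-Python sketch of how sparse codewords could be expanded and how n_round and n_channel could be inferred when omitted. The helper codewords_to_dense is hypothetical, and the literal keys "codeword", "target", "r", "c" and "v" are assumed stand-ins for the Features and Axes constants used in the example above.

```python
def codewords_to_dense(code_array, n_round=None, n_channel=None):
    """Expand codeword dicts into (targets, nested [target][channel][round])."""
    entries = [e for code in code_array for e in code["codeword"]]
    # Infer sizes from the largest index in use when they are not supplied
    if n_round is None:
        n_round = max(e["r"] for e in entries) + 1
    if n_channel is None:
        n_channel = max(e["c"] for e in entries) + 1

    targets, dense = [], []
    for code in code_array:
        grid = [[0] * n_round for _ in range(n_channel)]
        for e in code["codeword"]:
            grid[e["c"]][e["r"]] = e["v"]
        targets.append(code["target"])
        dense.append(grid)
    return targets, dense


code_array = [
    {
        "codeword": [{"r": 0, "c": 3, "v": 1}, {"r": 1, "c": 3, "v": 1}],
        "target": "ACTB_human",
    },
    {
        "codeword": [{"r": 0, "c": 3, "v": 1}, {"r": 1, "c": 1, "v": 1}],
        "target": "ACTB_mouse",
    },
]
targets, dense = codewords_to_dense(code_array)
# dense[0] == [[0, 0], [0, 0], [0, 0], [1, 1]]
# dense[1] == [[0, 0], [0, 1], [0, 0], [1, 0]]
```

Inferring the sizes from the largest index only works when the highest round and channel actually appear in some codeword, which is one reason to pass n_round and n_channel explicitly.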


classmethod from_numpy(code_names:Sequence[str], n_round:int, n_channel:int, data:numpy.ndarray) → 'Codebook'[source]

create a codebook of shape (code_names, n_round, n_channel) from a 3-d numpy array


code_names: Sequence[str]

the targets to be coded

n_round: int

number of imaging rounds used to build the codes

n_channel: int

number of channels used to build the codes

data: numpy.ndarray

array of uint8 values with len(code_names) x n_channel x n_round elements

Codebook :

codebook with filled values


Build a 2-round, 3-channel codebook where ACTA is specified by intensity in round 1, channel 0, and ACTB is coded by fluorescence in rounds 0 and 1, channel 2:

>>> import numpy as np
>>> from starfish import Codebook
>>> data = np.zeros((2, 3, 2), dtype=np.uint8)
>>> data[0, 0, 1] = 1                 # ACTA
>>> data[[1, 1], [2, 2], [0, 1]] = 1  # ACTB
>>> Codebook.from_numpy(['ACTA', 'ACTB'], n_channel=3, n_round=2, data=data)
<xarray.Codebook (target: 2, c: 3, r: 2)>
array([[[0, 1],
        [0, 0],
        [0, 0]],

       [[0, 0],
        [0, 0],
        [1, 1]]], dtype=uint8)
  * target     (target) object 'ACTA' 'ACTB'
  * c          (c) int64 0 1 2
  * r          (r) int64 0 1
Return type

Codebook


get_partial(self, indexers:Mapping[starfish.core.types._constants.Axes, Union[int, slice, Sequence]])[source]

Slice the codebook data according to the provided indexing parameters. Used in a composite codebook scenario.

indexers: Mapping[Axes, Union[int, slice, Sequence]]

A dictionary of dim:index where index is the value, values or range to index the dimension

classmethod open_json(json_codebook:str, n_round:Union[int, NoneType]=None, n_channel:Union[int, NoneType]=None) → 'Codebook'[source]

Load a codebook from a SpaceTx Format json file or a url pointing to such a file.


json_codebook: str

Path or url to json file containing a SpaceTx codebook.

n_round: Optional[int]

The number of imaging rounds used in the codes. Will be inferred if not provided.

n_channel: Optional[int]

The number of channels used in the codes. Will be inferred if not provided.

Codebook :

Codebook with shape (targets, channels, imaging_rounds)


Write an in-memory codebook to a json file, then load it:

>>> from starfish.types import Axes, Features
>>> from starfish import Codebook
>>> import tempfile
>>> import json
>>> import os
>>> dir_ = tempfile.mkdtemp()
>>> codebook = [
...     {
...         Features.CODEWORD: [
...             {Axes.ROUND.value: 0, Axes.CH.value: 3, Features.CODE_VALUE: 1},
...             {Axes.ROUND.value: 1, Axes.CH.value: 3, Features.CODE_VALUE: 1},
...         ],
...         Features.TARGET: "ACTB_human"
...     },
...     {
...         Features.CODEWORD: [
...             {Axes.ROUND.value: 0, Axes.CH.value: 3, Features.CODE_VALUE: 1},
...             {Axes.ROUND.value: 1, Axes.CH.value: 1, Features.CODE_VALUE: 1},
...         ],
...         Features.TARGET: "ACTB_mouse"
...     },
... ]
>>> # make a fake file
>>> json_codebook = os.path.join(dir_, 'codebook.json')
>>> with open(json_codebook, 'w') as f:
...     json.dump(codebook, f)
>>> # read codebook from file
>>> Codebook.open_json(json_codebook)
<xarray.Codebook (target: 2, c: 4, r: 2)>
array([[[0, 0],
        [0, 0],
        [0, 0],
        [1, 1]],

       [[0, 0],
        [0, 1],
        [0, 0],
        [1, 0]]], dtype=uint8)
  * target     (target) object 'ACTB_human' 'ACTB_mouse'
  * c          (c) int64 0 1 2 3
  * r          (r) int64 0 1
Return type

Codebook


classmethod synthetic_one_hot_codebook(n_round:int, n_channel:int, n_codes:int, target_names:Union[Sequence, NoneType]=None) → 'Codebook'[source]

Generate codes where one channel is “on” in each imaging round


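n_round: int

number of imaging rounds per code

n_channel: int

number of channels per code

n_codes: int

number of codes to generate

target_names: Optional[Sequence]

if provided, names for targets in codebook

Codebook :

codebook containing the generated one-hot codes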


Create a Codebook with 2 rounds, 3 channels, and 2 codes

>>> from starfish import Codebook
>>> Codebook.synthetic_one_hot_codebook(n_round=2, n_channel=3, n_codes=2)
<xarray.Codebook (target: 2, c: 3, r: 2)>
array([[[0, 1],
        [0, 0],
        [1, 0]],

       [[1, 1],
        [0, 0],
        [0, 0]]], dtype=uint8)
  * target     (target) object b25180dc-8af5-48f1-bff4-b5649683516d ...
  * c          (c) int64 0 1 2
  * r          (r) int64 0 1
Return type

Codebook
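The one-hot property (exactly one channel "on" per imaging round) can be sketched in pure Python as follows. The helper one_hot_codes is hypothetical; drawing distinct codes from a seeded random generator is an assumption of this sketch, and the document does not state how starfish samples its codes.

```python
import random


def one_hot_codes(n_round, n_channel, n_codes, seed=0):
    """Generate n_codes distinct codes with exactly one channel on per round.

    Returns a list of nested [channel][round] grids of 0/1 values.
    """
    if n_codes > n_channel ** n_round:
        raise ValueError("not enough distinct one-hot codes for this shape")
    rng = random.Random(seed)
    seen = set()
    # Each candidate code is the tuple of "on" channels, one entry per round
    while len(seen) < n_codes:
        seen.add(tuple(rng.randrange(n_channel) for _ in range(n_round)))

    codes = []
    for on_channels in sorted(seen):
        grid = [[0] * n_round for _ in range(n_channel)]
        for r, c in enumerate(on_channels):
            grid[c][r] = 1
        codes.append(grid)
    return codes


codes = one_hot_codes(n_round=2, n_channel=3, n_codes=2)
# Every round of every code sums to 1 across channels
round_sums = [
    sum(grid[c][r] for c in range(3)) for grid in codes for r in range(2)
]
```

A codebook built this way is compatible with decode_per_round_max, since each round of each code has a single expected maximum.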


to_json(self, filename:Union[str, pathlib.Path]) → None[source]

Save a codebook to json using SpaceTx Format.

filename: Union[str, Path]

The name of the file in which to save the codebook.

Return type

None
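The exact file layout written by to_json is not shown on this page. As a hedged sketch, the code below converts dense code values back into the bare list of codewords that the open_json example reads, then round-trips it through json. The helper dense_to_codewords and the literal keys "codeword", "target", "r", "c" and "v" are assumptions made for illustration.

```python
import json


def dense_to_codewords(targets, dense):
    """Convert nested [target][channel][round] values into codeword dicts."""
    code_array = []
    for target, grid in zip(targets, dense):
        codeword = [
            {"r": r, "c": c, "v": value}
            for c, row in enumerate(grid)
            for r, value in enumerate(row)
            if value  # only non-zero entries are recorded
        ]
        # Order entries by round, then channel, for stable output
        codeword.sort(key=lambda e: (e["r"], e["c"]))
        code_array.append({"codeword": codeword, "target": target})
    return code_array


targets = ["ACTA", "ACTB"]
dense = [
    [[0, 1], [0, 0], [0, 0]],  # ACTA: channel 0 in round 1
    [[0, 0], [0, 0], [1, 1]],  # ACTB: channel 2 in rounds 0 and 1
]
serialized = json.dumps(dense_to_codewords(targets, dense))
restored = json.loads(serialized)
```

Recording only the non-zero entries keeps the file compact for sparse codes, at the cost of needing n_round and n_channel (or inference from the largest indices) when the file is loaded again.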


classmethod zeros(code_names:Sequence[str], n_round:int, n_channel:int)[source]

Create an empty codebook of shape (code_names, n_round, n_channel)


code_names: Sequence[str]

The targets to be coded.

n_round: int

Number of imaging rounds used to build the codes.

n_channel: int

Number of channels used to build the codes.

Codebook :

codebook whose values are all zero


Build an empty 2-round 3-channel codebook:

>>> from starfish import Codebook
>>> Codebook.zeros(['ACTA', 'ACTB'], n_round=2, n_channel=3)
<xarray.Codebook (target: 2, c: 3, r: 2)>
array([[[0, 0],
        [0, 0],
        [0, 0]],

       [[0, 0],
        [0, 0],
        [0, 0]]], dtype=uint8)
  * target     (target) object 'ACTA' 'ACTB'
  * c          (c) int64 0 1 2
  * r          (r) int64 0 1