Codebook

class starfish.core.codebook.codebook.Codebook(data=<NA>, coords=None, dims=None, name=None, attrs=None, indexes=None, fastpath=False)[source]

Codebook for an image-based transcriptomics experiment

The codebook is a three-dimensional tensor with shape (target, round, channel) whose values are the expected intensities of features (spots or pixels) that correspond to each target (gene or protein) in each (round, channel) image tile of an experiment.

This class supports the construction of synthetic codebooks for testing, and exposes decode methods to assign target identifiers to spots. This codebook provides an in-memory representation of the codebook defined in the SpaceTx format.

The codebook is a subclass of xarray.DataArray, and exposes the complete public API of that class in addition to the methods and constructors listed below.

Examples

Build a codebook using Codebook.synthetic_one_hot_codebook():

>>> from starfish import Codebook
>>> sd = Codebook.synthetic_one_hot_codebook(n_round=4, n_channel=3, n_codes=2)
>>> sd
<xarray.Codebook (target: 2, r: 4, c: 3)>
array([[[1, 0, 0],
        [0, 0, 1],
        [1, 0, 0],
        [0, 1, 0]],

       [[1, 0, 0],
        [1, 0, 0],
        [1, 0, 0],
        [1, 0, 0]]], dtype=uint8)
Coordinates:
  * target   (target) object 6d7fff11-8905-4421-ab49-4f6d8ecdb4b7     1f5f7087-0618-49fc-a6a5-82fee14360b3
  * r        (r) int64 0 1 2 3
  * c        (c) int64 0 1 2
Attributes
code_length

return the length of codes in this codebook

Methods

decode_metric(intensities, max_distance, …)

Assigns intensity patterns that have been extracted from an ImageStack and stored in an IntensityTable by a SpotFinder to the gene targets that they encode.

decode_per_round_max(intensities)

Assigns intensity patterns that have been extracted from an ImageStack and stored in an IntensityTable by a SpotFinder to the gene targets that they encode.

from_code_array(code_array[, n_round, n_channel])

Construct a codebook from a python list of SpaceTx-Format codewords.

from_numpy(code_names, n_round, n_channel, data)

create a codebook of shape (code_names, n_round, n_channel) from a 3-d numpy array

get_partial(indexers)

Slice the codebook data according to the provided indexing parameters.

open_json(json_codebook[, n_round, n_channel])

Load a codebook from a SpaceTx Format json file or a url pointing to such a file.

synthetic_one_hot_codebook(n_round, …[, …])

Generate codes where one channel is “on” in each imaging round

to_json(filename)

Save a codebook to json using SpaceTx Format.

zeros(code_names, n_round, n_channel)

Create an empty codebook of shape (code_names, n_round, n_channel)

property code_length

return the length of codes in this codebook

Return type

int

decode_metric(intensities, max_distance, min_intensity, norm_order, metric='euclidean', return_original_intensities=False)[source]

Assigns intensity patterns that have been extracted from an ImageStack and stored in an IntensityTable by a SpotFinder to the gene targets that they encode.

This method first normalizes both the codes and the recovered intensities to unit magnitude using the norm selected by norm_order (for example, an L2 norm), and then finds the closest code for each feature according to a distance metric (default: euclidean).

Features that are farther than max_distance from their nearest code, or that have an average intensity below min_intensity, are not assigned to any target.

Parameters
intensities : IntensityTable

features to be decoded

max_distance : Number

maximum distance between a feature and its closest code for which the coded target will be assigned.

min_intensity : Number

minimum intensity for a feature to receive a target annotation

norm_order : int

the order of the norm used to normalize codes and intensities, as accepted by numpy.linalg.norm

metric : str

the sklearn metric string to pass to NearestNeighbors

return_original_intensities : bool

If True returns original intensity values in the DecodedIntensityTable instead of normalized ones (default=False)

Returns
DecodedIntensityTable :

Intensity table containing normalized intensities, target assignments, distances to the nearest code, and the filtering status of each feature.

Notes

The available norms for this function can be found at the following link: numpy.linalg.norm

The available metrics for this function can be found at the following link: Distance computations (scipy.spatial.distance)

Return type

DecodedIntensityTable
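
The normalize-then-nearest-code procedure described for decode_metric can be sketched in plain Python. This is an illustrative sketch, not the starfish implementation: the function names unit_normalize and decode_metric_sketch are hypothetical, features and codes are assumed to be already flattened to (round * channel) vectors, the norm is fixed to L2, and the metric is fixed to Euclidean distance.

```python
import math


def unit_normalize(vector):
    """Scale a flat intensity vector to unit L2 norm (zero vectors stay zero)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)


def decode_metric_sketch(features, codes, max_distance, min_intensity):
    """Return a (target or None, distance) pair for each flattened feature.

    features: list of flat intensity vectors (hypothetical input layout)
    codes: dict mapping target name -> flat code vector of the same length
    """
    normalized_codes = {t: unit_normalize(c) for t, c in codes.items()}
    results = []
    for feature in features:
        unit_feature = unit_normalize(feature)
        # Nearest code by Euclidean distance between unit vectors.
        target, distance = min(
            ((t, math.dist(unit_feature, c)) for t, c in normalized_codes.items()),
            key=lambda pair: pair[1],
        )
        # Apply both filters from the description above.
        mean_intensity = sum(feature) / len(feature)
        passes = distance <= max_distance and mean_intensity >= min_intensity
        results.append((target if passes else None, distance))
    return results


# Two codes over 2 rounds x 2 channels, flattened round-major.
codes = {"ACTA": [1, 0, 0, 1], "ACTB": [0, 1, 1, 0]}
features = [
    [0.9, 0.1, 0.0, 0.8],    # bright and close to the ACTA pattern
    [0.5, 0.5, 0.5, 0.5],    # equally far from both codes
    [0.0, 0.01, 0.01, 0.0],  # ACTB pattern, but too dim
]
decoded = decode_metric_sketch(features, codes, max_distance=0.5, min_intensity=0.05)
targets = [target for target, _ in decoded]
```

In this toy data only the first feature passes both filters; the second is rejected by max_distance and the third by min_intensity.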

decode_per_round_max(intensities)[source]

Assigns intensity patterns that have been extracted from an ImageStack and stored in an IntensityTable by a SpotFinder to the gene targets that they encode.

This method carries out the assignment by identifying the maximum-intensity channel for each round, and assigning each spot to a code if the maximum-intensity pattern exists in the codebook.

This method is only compatible with one-hot codebooks, where exactly one channel is expected to contain fluorescence in each imaging round. This is a common coding strategy for experiments that read out one DNA base with a distinct fluorophore in each imaging round.

Parameters
intensities : IntensityTable

features to be decoded

Returns
DecodedIntensityTable :

intensity table containing additional data variables for target assignments

Notes

  • If no code matches the per-round maximum for a feature, it will be assigned ‘nan’ instead of a target value

  • Numpy’s argmax breaks ties by picking the first of the tied values – this can lead to unexpected results in low-precision images where some features with “tied” channels will decode, but others will be assigned ‘nan’.

Return type

DecodedIntensityTable
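
The per-round-max rule and the tie-breaking caveat from the notes above can be sketched as follows. This is a simplified illustration rather than the library's code: decode_per_round_max_sketch is a hypothetical helper, each feature is assumed to be a list of rounds holding channel intensities, and the codebook is reduced to the index of the "on" channel in each round.

```python
def decode_per_round_max_sketch(feature, codebook):
    """Return the target whose one-hot code matches the per-round maxima.

    feature: list of rounds, each a list of channel intensities
    codebook: dict mapping target -> list of the "on" channel index per round
    Returns None when no code matches.
    """
    # list.index(max(...)) returns the first maximum, which mirrors the
    # first-of-ties behaviour of argmax described in the notes.
    max_channels = [round_.index(max(round_)) for round_ in feature]
    for target, on_channels in codebook.items():
        if on_channels == max_channels:
            return target
    return None


codebook = {"ACTB_human": [3, 3], "ACTB_mouse": [3, 1]}

clear = [[0.1, 0.0, 0.2, 0.9], [0.0, 0.8, 0.1, 0.3]]
unmatched = [[0.9, 0.0, 0.0, 0.1], [0.0, 0.8, 0.0, 0.0]]
# Low-precision features with tied channels: the first tied channel wins.
tied_decodes = [[0, 0, 0, 2], [0, 2, 0, 2]]  # round 1 tie resolves to channel 1
tied_fails = [[2, 0, 0, 2], [0, 2, 0, 0]]    # round 0 tie resolves to channel 0

results = [
    decode_per_round_max_sketch(f, codebook)
    for f in (clear, unmatched, tied_decodes, tied_fails)
]
```

The two tied features show the effect described in the notes: one happens to decode, while the other matches no code even though channel 3 is equally bright in round 0.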

classmethod from_code_array(code_array, n_round=None, n_channel=None)[source]

Construct a codebook from a python list of SpaceTx-Format codewords.

Note: Loading the SpaceTx-Format codebook with json.load() will produce a code array that can be passed to this constructor.

Parameters
code_array : List[Dict[str, Any]]

Array of dictionaries, each containing a codeword and target

n_round : Optional[int]

The number of imaging rounds used in the codes. Will be inferred if not provided

n_channel : Optional[int]

The number of channels used in the codes. Will be inferred if not provided

Returns
Codebook :

Codebook with shape (targets, imaging_rounds, channels)

Examples

Construct a codebook from some array data in python memory

>>> from starfish.types import Axes, Features
>>> from starfish import Codebook
>>> codebook = [
...     {
...         Features.CODEWORD: [
...             {Axes.ROUND.value: 0, Axes.CH.value: 3, Features.CODE_VALUE: 1},
...             {Axes.ROUND.value: 1, Axes.CH.value: 3, Features.CODE_VALUE: 1},
...         ],
...         Features.TARGET: "ACTB_human"
...     },
...     {
...         Features.CODEWORD: [
...             {Axes.ROUND.value: 0, Axes.CH.value: 3, Features.CODE_VALUE: 1},
...             {Axes.ROUND.value: 1, Axes.CH.value: 1, Features.CODE_VALUE: 1},
...         ],
...         Features.TARGET: "ACTB_mouse"
...     },
... ]
>>> Codebook.from_code_array(codebook)
<xarray.Codebook (target: 2, r: 2, c: 4)>
array([[[0, 0, 0, 1],
        [0, 0, 0, 1]],

       [[0, 0, 0, 1],
        [0, 1, 0, 0]]], dtype=uint8)
Coordinates:
  * target   (target) object 'ACTB_human' 'ACTB_mouse'
  * r        (r) int64 0 1
  * c        (c) int64 0 1 2 3
Return type

Codebook
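
To make the relationship between sparse codewords and the dense (target, round, channel) array concrete, here is a minimal sketch of the expansion, including how n_round and n_channel might be inferred when omitted. It is not the starfish implementation: dense_from_code_array is a hypothetical helper, and the literal keys "codeword", "target", "r", "c" and "v" are assumed to be the string values behind Features.CODEWORD, Features.TARGET, Axes.ROUND, Axes.CH and Features.CODE_VALUE.

```python
def dense_from_code_array(code_array, n_round=None, n_channel=None):
    """Expand sparse codewords into (targets, dense [round][channel] grids)."""
    entries = [entry for code in code_array for entry in code["codeword"]]
    # Assumption: missing sizes are inferred from the largest index seen.
    if n_round is None:
        n_round = max(entry["r"] for entry in entries) + 1
    if n_channel is None:
        n_channel = max(entry["c"] for entry in entries) + 1

    targets, dense = [], []
    for code in code_array:
        grid = [[0] * n_channel for _ in range(n_round)]
        for entry in code["codeword"]:
            grid[entry["r"]][entry["c"]] = entry["v"]
        targets.append(code["target"])
        dense.append(grid)
    return targets, dense


code_array = [
    {"codeword": [{"r": 0, "c": 3, "v": 1}, {"r": 1, "c": 3, "v": 1}],
     "target": "ACTB_human"},
    {"codeword": [{"r": 0, "c": 3, "v": 1}, {"r": 1, "c": 1, "v": 1}],
     "target": "ACTB_mouse"},
]
targets, dense = dense_from_code_array(code_array)
```

The resulting grids hold the same values as the array printed in the example above.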

classmethod from_numpy(code_names, n_round, n_channel, data)[source]

create a codebook of shape (code_names, n_round, n_channel) from a 3-d numpy array

Parameters
code_names : Sequence[str]

the targets to be coded

n_round : int

number of imaging rounds used to build the codes

n_channel : int

number of channels used to build the codes

data : np.ndarray

array of uint8 values with shape (len(code_names), n_round, n_channel)

Returns
Codebook :

codebook with filled values

Examples

Build a 3-round 4-channel codebook where ACTA is specified by intensity in round 0, channel 1, and ACTB is coded by fluorescence in channels 0, 1, and 2 of rounds 0, 1, and 2.

>>> import numpy as np
>>> from starfish import Codebook
>>> data = np.zeros((2,3,4), dtype=np.uint8)
>>> data[0, 0, 1] = 1                 # ACTA
>>> data[[1, 1, 1], [0, 1, 2], [0, 1, 2]] = 1  # ACTB
>>> Codebook.from_numpy(['ACTA', 'ACTB'], n_channel=4, n_round=3, data=data)
<xarray.Codebook (target: 2, r: 3, c: 4)>
array([[[0, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]],

       [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0]]], dtype=uint8)
Coordinates:
  * target   (target) object 'ACTA' 'ACTB'
  * r        (r) int64 0 1 2
  * c        (c) int64 0 1 2 3
Return type

Codebook

get_partial(indexers)[source]

Slice the codebook data according to the provided indexing parameters. Used in a composite codebook scenario.

Parameters
indexers : Mapping[Axes, Union[int, Sequence]]

A dictionary of dim:index where index is the value, values or range to index the dimension

classmethod open_json(json_codebook, n_round=None, n_channel=None)[source]

Load a codebook from a SpaceTx Format json file or a url pointing to such a file.

Parameters
json_codebook : str

Path or url to json file containing a SpaceTx codebook.

n_round : Optional[int]

The number of imaging rounds used in the codes. Will be inferred if not provided.

n_channel : Optional[int]

The number of channels used in the codes. Will be inferred if not provided.

Returns
Codebook :

Codebook with shape (targets, imaging_rounds, channels)

Examples

Write an in-memory code array to a JSON file, then load a codebook from that file

>>> from starfish.types import Axes, Features
>>> from starfish import Codebook
>>> import tempfile
>>> import json
>>> import os
>>> dir_ = tempfile.mkdtemp()
>>> codebook = [
...     {
...         Features.CODEWORD: [
...             {Axes.ROUND.value: 0, Axes.CH.value: 3, Features.CODE_VALUE: 1},
...             {Axes.ROUND.value: 1, Axes.CH.value: 3, Features.CODE_VALUE: 1},
...         ],
...         Features.TARGET: "ACTB_human"
...     },
...     {
...         Features.CODEWORD: [
...             {Axes.ROUND.value: 0, Axes.CH.value: 3, Features.CODE_VALUE: 1},
...             {Axes.ROUND.value: 1, Axes.CH.value: 1, Features.CODE_VALUE: 1},
...         ],
...         Features.TARGET: "ACTB_mouse"
...     },
... ]
>>> # make a fake file
>>> json_codebook = os.path.join(dir_, 'codebook.json')
>>> with open(json_codebook, 'w') as f:
...     json.dump(codebook, f)
>>> # read codebook from file
>>> Codebook.open_json(json_codebook)
<xarray.Codebook (target: 2, r: 2, c: 4)>
array([[[0, 0, 0, 1],
        [0, 0, 0, 1]],

       [[0, 0, 0, 1],
        [0, 1, 0, 0]]], dtype=uint8)
Coordinates:
  * target   (target) object 'ACTB_human' 'ACTB_mouse'
  * r        (r) int64 0 1
  * c        (c) int64 0 1 2 3
Return type

Codebook

classmethod synthetic_one_hot_codebook(n_round, n_channel, n_codes, target_names=None)[source]

Generate codes where one channel is “on” in each imaging round

Parameters
n_round : int

number of imaging rounds per code

n_channel : int

number of channels per code

n_codes : int

number of codes to generate

target_names : Optional[List[str]]

if provided, names for targets in codebook

Returns
Codebook :

codebook containing the generated one-hot codes

Examples

Create a Codebook with 4 rounds, 3 channels, and 2 codes

>>> from starfish import Codebook
>>> sd = Codebook.synthetic_one_hot_codebook(n_round=4, n_channel=3, n_codes=2)
>>> sd
<xarray.Codebook (target: 2, r: 4, c: 3)>
array([[[1, 0, 0],
        [0, 0, 1],
        [1, 0, 0],
        [0, 1, 0]],

       [[1, 0, 0],
        [1, 0, 0],
        [1, 0, 0],
        [1, 0, 0]]], dtype=uint8)
Coordinates:
  * target   (target) object 6d7fff11-8905-4421-ab49-4f6d8ecdb4b7         1f5f7087-0618-49fc-a6a5-82fee14360b3
  * r        (r) int64 0 1 2 3
  * c        (c) int64 0 1 2
Return type

Codebook
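
As a rough illustration of what "one channel on in each imaging round" means, the following sketch draws random one-hot codes with the standard library. It is not the starfish implementation: one_hot_codes_sketch and its seed parameter are hypothetical, UUID strings are assumed as default target names (as the example output above suggests), and this sketch makes no attempt to prevent two targets from receiving the same code.

```python
import random
import uuid


def one_hot_codes_sketch(n_round, n_channel, n_codes, target_names=None, seed=None):
    """Return {target: [round][channel] grid} with one "on" channel per round."""
    rng = random.Random(seed)
    if target_names is None:
        # Assumption: unnamed targets get random UUID strings.
        target_names = [str(uuid.uuid4()) for _ in range(n_codes)]
    codes = {}
    for name in target_names[:n_codes]:
        # Pick the single "on" channel for every round of this code.
        on_channels = [rng.randrange(n_channel) for _ in range(n_round)]
        codes[name] = [
            [1 if channel == on else 0 for channel in range(n_channel)]
            for on in on_channels
        ]
    return codes


codes = one_hot_codes_sketch(n_round=4, n_channel=3, n_codes=2, seed=0)
named = one_hot_codes_sketch(2, 4, 2, target_names=["ACTA", "ACTB"], seed=1)
```

Every row of every grid sums to 1, which is the property decode_per_round_max relies on.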

to_json(filename)[source]

Save a codebook to json using SpaceTx Format.

Parameters
filename : Union[str, Path]

The name of the file in which to save the codebook.

Return type

None
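
Saving is conceptually the inverse of from_code_array: each dense code is reduced to the list of its non-zero (round, channel, value) entries before being written as json. The sketch below shows that reduction only. code_array_from_dense is a hypothetical helper, the key names "codeword", "target", "r", "c" and "v" are assumptions, and any top-level wrapper or version field used by a real SpaceTx file is not reproduced here.

```python
import json


def code_array_from_dense(targets, dense):
    """Convert dense [round][channel] grids into sparse codeword dictionaries."""
    code_array = []
    for target, grid in zip(targets, dense):
        codeword = [
            {"r": r, "c": c, "v": value}
            for r, row in enumerate(grid)
            for c, value in enumerate(row)
            if value != 0  # only non-zero entries are stored
        ]
        code_array.append({"codeword": codeword, "target": target})
    return code_array


dense = [
    [[0, 0, 0, 1], [0, 0, 0, 1]],  # ACTB_human
    [[0, 0, 0, 1], [0, 1, 0, 0]],  # ACTB_mouse
]
code_array = code_array_from_dense(["ACTB_human", "ACTB_mouse"], dense)
serialized = json.dumps(code_array)
```

Reading serialized back with json.loads yields a list of the same form as the code_array argument shown for from_code_array.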

classmethod zeros(code_names, n_round, n_channel)[source]

Create an empty codebook of shape (code_names, n_round, n_channel)

Parameters
code_names : Sequence[str]

The targets to be coded.

n_round : int

Number of imaging rounds used to build the codes.

n_channel : int

Number of channels used to build the codes.

Returns
Codebook :

codebook whose values are all zero

Examples

Build an empty 2-round 4-channel codebook:

>>> from starfish import Codebook
>>> Codebook.zeros(['ACTA', 'ACTB'], n_round=2, n_channel=4)
<xarray.Codebook (target: 2, r: 2, c: 4)>
array([[[0, 0, 0, 0],
        [0, 0, 0, 0]],

       [[0, 0, 0, 0],
        [0, 0, 0, 0]]], dtype=uint8)
Coordinates:
  * target   (target) object 'ACTA' 'ACTB'
  * r        (r) int64 0 1
  * c        (c) int64 0 1 2 3