Codebook

class starfish.core.codebook.codebook.Codebook(data: typing.Any = <NA>, coords: typing.Optional[typing.Union[collections.abc.Sequence[collections.abc.Sequence[Any] | pandas.core.indexes.base.Index | xarray.core.dataarray.DataArray], collections.abc.Mapping[typing.Any, typing.Any]]] = None, dims: typing.Optional[typing.Union[collections.abc.Hashable, collections.abc.Sequence[collections.abc.Hashable]]] = None, name: typing.Optional[collections.abc.Hashable] = None, attrs: typing.Optional[collections.abc.Mapping] = None, indexes: typing.Optional[collections.abc.Mapping[typing.Any, xarray.core.indexes.Index]] = None, fastpath: bool = False)

Codebook for an image-based transcriptomics experiment

The codebook is a three-dimensional tensor with shape (target, round, channel). Its values are the expected intensities of features (spots or pixels) that correspond to each target (gene or protein) in each imaging round and channel of an experiment.
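
As a rough mental model, the tensor can be pictured as nested Python lists indexed `[target][round][channel]`. The sketch below is a plain-Python illustration only, not starfish or xarray code. The target names, the values, and the helper `expected_intensity` are hypothetical.

```python
# Plain-Python picture of a (target, round, channel) codebook.
# Hypothetical targets and values; a real Codebook is an xarray.DataArray.
targets = ["GENE_A", "GENE_B"]

# codebook[t][r][c] is the expected intensity of target t in round r, channel c
codebook = [
    [[1, 0, 0], [0, 0, 1]],  # GENE_A: channel 0 in round 0, channel 2 in round 1
    [[0, 1, 0], [0, 1, 0]],  # GENE_B: channel 1 in both rounds
]


def expected_intensity(target: str, round_: int, channel: int) -> int:
    """Look up one entry of the codebook tensor by target name."""
    return codebook[targets.index(target)][round_][channel]


shape = (len(codebook), len(codebook[0]), len(codebook[0][0]))
print(shape)                               # (2, 2, 3)
print(expected_intensity("GENE_B", 1, 1))  # 1
```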

This class supports the construction of synthetic codebooks for testing, and exposes decode methods to assign target identifiers to spots. This codebook provides an in-memory representation of the codebook defined in the SpaceTx format.

The codebook is a subclass of xarray.DataArray. It exposes the complete public API of that class in addition to the methods and constructors listed below.

Examples

Build a codebook using Codebook.synthetic_one_hot_codebook(). The generated codes and default target names are random, so your output will differ:

>>> from starfish import Codebook
>>> codebook = Codebook.synthetic_one_hot_codebook(n_round=4, n_channel=3, n_codes=2)
>>> codebook
<xarray.Codebook (target: 2, r: 4, c: 3)>
array([[[1, 0, 0],
        [0, 0, 1],
        [1, 0, 0],
        [0, 1, 0]],

       [[1, 0, 0],
        [1, 0, 0],
        [1, 0, 0],
        [1, 0, 0]]], dtype=uint8)
Coordinates:
  * target   (target) object 6d7fff11-8905-4421-ab49-4f6d8ecdb4b7     1f5f7087-0618-49fc-a6a5-82fee14360b3
  * r        (r) int64 0 1 2 3
  * c        (c) int64 0 1 2
Attributes:
code_length

Return the length of codes in this codebook.

Methods

decode_metric(intensities, max_distance, ...)

Assigns intensity patterns that have been extracted from an ImageStack and stored in an IntensityTable by a SpotFinder to the gene targets that they encode.

decode_per_round_max(intensities)

Assigns intensity patterns that have been extracted from an ImageStack and stored in an IntensityTable by a SpotFinder to the gene targets that they encode.

from_code_array(code_array[, n_round, n_channel])

Construct a codebook from a python list of SpaceTx-Format codewords.

from_numpy(code_names, n_round, n_channel, data)

Create a codebook of shape (len(code_names), n_round, n_channel) from a 3-D NumPy array.

get_partial(indexers)

Slice the codebook data according to the provided indexing parameters.

item(*args)

Copy an element of an array to a standard Python scalar and return it.

open_json(json_codebook[, n_round, n_channel])

Load a codebook from a SpaceTx Format json file or a url pointing to such a file.

searchsorted(v[, side, sorter])

Find indices where elements of v should be inserted in a to maintain order.

synthetic_one_hot_codebook(n_round, ...[, ...])

Generate codes where one channel is "on" in each imaging round

to_json(filename)

Save a codebook to json using SpaceTx Format.

zeros(code_names, n_round, n_channel)

Create an empty (all-zero) codebook of shape (len(code_names), n_round, n_channel).

property code_length: int

Return the length of codes in this codebook.
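
The text does not define "length". A minimal sketch follows, under the assumption that a code's length is the number of (round, channel) entries it contains, that is, n_round * n_channel. The helper `code_length` here is hypothetical. Verify this against your starfish version before relying on it.

```python
from typing import Tuple


def code_length(shape: Tuple[int, int, int]) -> int:
    """Hypothetical helper: number of (round, channel) entries in one code.

    ASSUMPTION: code_length equals n_round * n_channel for a
    (target, round, channel) codebook.
    """
    _, n_round, n_channel = shape
    return n_round * n_channel


print(code_length((2, 4, 3)))  # 12
```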

decode_metric(intensities: IntensityTable, max_distance: Union[int, float], min_intensity: Union[int, float], norm_order: int, metric: str = 'euclidean', return_original_intensities: bool = False) -> DecodedIntensityTable

Assigns intensity patterns that have been extracted from an ImageStack and stored in an IntensityTable by a SpotFinder to the gene targets that they encode.

This method first normalizes both the codes and the recovered intensities to unit magnitude, using the norm of order norm_order (for example, norm_order=2 for an L2 norm). It then finds the closest code for each feature according to a distance metric (default='euclidean').

Features are not assigned to any target if the distance to their nearest code is greater than max_distance, or if their average intensity is below min_intensity.
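
The steps above can be sketched in plain Python. This is a brute-force, stdlib-only illustration, not the starfish implementation, which delegates the nearest-code search to sklearn's NearestNeighbors. Features and codes are assumed to be flattened (round, channel) vectors.

One further assumption concerns the intensity filter. The sketch compares min_intensity against each feature's norm before normalization. The real implementation may use a different summary, such as the average intensity described above.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def normalize(v: Vector, order: int = 2) -> Tuple[Vector, float]:
    """Scale v to unit magnitude under the p-norm of the given order."""
    norm = sum(abs(x) ** order for x in v) ** (1.0 / order)
    if norm == 0:
        return [0.0] * len(v), 0.0
    return [x / norm for x in v], norm


def decode_metric_sketch(
    features: List[Vector],
    codes: Dict[str, Vector],
    max_distance: float,
    min_intensity: float,
    norm_order: int = 2,
) -> List[Tuple[Optional[str], float]]:
    """Return (target or None, distance) for each flattened feature."""
    unit_codes = {t: normalize(c, norm_order)[0] for t, c in codes.items()}
    results = []
    for feature in features:
        unit, norm = normalize(feature, norm_order)
        # brute-force nearest code by Euclidean distance between unit vectors
        target, dist = min(
            ((t, math.dist(unit, c)) for t, c in unit_codes.items()),
            key=lambda pair: pair[1],
        )
        # ASSUMPTION: the intensity filter uses the pre-normalization norm
        passes = dist <= max_distance and norm >= min_intensity
        results.append((target if passes else None, dist))
    return results


# codes and features are flattened (round, channel) vectors: 2 rounds x 2 channels
codes = {"A": [1, 0, 0, 1], "B": [0, 1, 1, 0]}
features = [[5, 0, 0, 4], [0, 0.1, 0.1, 0], [3, 3, 3, 3]]
results = decode_metric_sketch(features, codes, max_distance=0.5, min_intensity=1.0)
print([t for t, _ in results])  # ['A', None, None]
```

In this example, the second feature matches code B exactly but is too dim. The third feature is equally far (about 0.77) from both codes and fails the distance threshold.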

Parameters:
intensities : IntensityTable

features to be decoded

max_distance : Number

maximum distance between a feature and its closest code for which the coded target will be assigned.

min_intensity : Number

minimum intensity for a feature to receive a target annotation

norm_order : int

the order of the norm (see numpy.linalg.norm) used to normalize codes and intensities

metric : str

the sklearn metric string to pass to NearestNeighbors

return_original_intensities : bool

If True, return the original intensity values in the DecodedIntensityTable instead of the normalized ones (default=False).

Returns:
DecodedIntensityTable

Intensity table containing normalized intensities (or the original intensities if return_original_intensities is True), target assignments, distances to the nearest code, and the filtering status of each feature.

Notes

The available norm orders are listed in the numpy.linalg.norm documentation.

The available metrics are listed in Distance computations (scipy.spatial.distance).

decode_per_round_max(intensities: IntensityTable) -> DecodedIntensityTable

Assigns intensity patterns that have been extracted from an ImageStack and stored in an IntensityTable by a SpotFinder to the gene targets that they encode.

This method carries out the assignment by identifying the maximum-intensity channel for each round, and assigning each spot to a code if the maximum-intensity pattern exists in the codebook.

This method is only compatible with one-hot codebooks, where exactly one channel is expected to contain fluorescence in each imaging round. This is a common coding strategy for experiments that read out one DNA base with a distinct fluorophore in each imaging round.
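
A minimal stdlib-only sketch of per-round-max decoding follows. It is not the starfish implementation. It assumes a feature is given as `[round][channel]` intensities and that each one-hot code is stored as the tuple of its "on" channel per round; `decode_per_round_max_sketch` is a hypothetical name. `list.index(max(row))` returns the first maximal position, mirroring the numpy.argmax tie-breaking described in the Notes below, and `None` stands in for 'nan'.

```python
from typing import Dict, List, Optional, Tuple


def decode_per_round_max_sketch(
    feature: List[List[float]],         # feature[round][channel] intensities
    codes: Dict[str, Tuple[int, ...]],  # target -> index of the "on" channel in each round
) -> Optional[str]:
    """Return the target whose one-hot code matches the per-round max channels, else None."""
    # row.index(max(row)) returns the FIRST maximal channel, like numpy.argmax on ties
    pattern = tuple(row.index(max(row)) for row in feature)
    lookup = {code: target for target, code in codes.items()}
    return lookup.get(pattern)


codes = {"ACTB": (0, 2), "GAPDH": (1, 1)}
print(decode_per_round_max_sketch([[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]], codes))  # ACTB
# round 0 is tied between channels 0 and 1; the first wins, giving (0, 1), which matches no code
print(decode_per_round_max_sketch([[0.5, 0.5, 0.0], [0.1, 0.9, 0.0]], codes))  # None
```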

Parameters:
intensities : IntensityTable

features to be decoded

Returns:
DecodedIntensityTable

Intensity table containing additional data variables for target assignments.

Notes

  • If no code matches the per-round maximum for a feature, the feature is assigned ‘nan’ instead of a target value.

  • NumPy’s argmax breaks ties by picking the first of the tied values. This can lead to unexpected results in low-precision images, where some features with “tied” channels will decode but others will be assigned ‘nan’.

classmethod from_code_array(code_array: List[Dict[Union[str, Any], Any]], n_round: Optional[int] = None, n_channel: Optional[int] = None) -> Codebook

Construct a codebook from a python list of SpaceTx-Format codewords.

Note: Loading the SpaceTx-Format codebook with json.load() will produce a code array that can be passed to this constructor.

Parameters:
code_array : List[Dict[str, Any]]

List of dictionaries, each containing a codeword and a target.

n_round : Optional[int]

The number of imaging rounds used in the codes. Will be inferred if not provided.

n_channel : Optional[int]

The number of channels used in the codes. Will be inferred if not provided.

Returns:
Codebook

Codebook with dimensions (target, round, channel)

Examples

Construct a codebook from codeword data held in Python memory:

>>> from starfish.types import Axes, Features
>>> from starfish import Codebook
>>> codebook = [
...     {
...         Features.CODEWORD: [
...             {Axes.ROUND.value: 0, Axes.CH.value: 3, Features.CODE_VALUE: 1},
...             {Axes.ROUND.value: 1, Axes.CH.value: 3, Features.CODE_VALUE: 1},
...         ],
...         Features.TARGET: "ACTB_human"
...     },
...     {
...         Features.CODEWORD: [
...             {Axes.ROUND.value: 0, Axes.CH.value: 3, Features.CODE_VALUE: 1},
...             {Axes.ROUND.value: 1, Axes.CH.value: 1, Features.CODE_VALUE: 1},
...         ],
...         Features.TARGET: "ACTB_mouse"
...     },
... ]
>>> Codebook.from_code_array(codebook)
<xarray.Codebook (target: 2, r: 2, c: 4)>
array([[[0, 0, 0, 1],
        [0, 0, 0, 1]],

       [[0, 0, 0, 1],
        [0, 1, 0, 0]]], dtype=uint8)
Coordinates:
  * target   (target) object 'ACTB_human' 'ACTB_mouse'
  * r        (r) int64 0 1
  * c        (c) int64 0 1 2 3
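
The expansion from sparse codewords to a dense tensor can be sketched with the standard library alone. This is not the starfish implementation. It assumes the SpaceTx keys "codeword", "r", "c", "v", and "target", matching the values of Features.CODEWORD, Axes.ROUND, Axes.CH, Features.CODE_VALUE, and Features.TARGET. It also assumes that, when n_round or n_channel is omitted, each is inferred as the largest index seen plus one, which is consistent with the c: 4 dimension in the output above. The function name `dense_from_code_array` is hypothetical.

```python
import json
from typing import Any, Dict, List, Optional


def dense_from_code_array(
    code_array: List[Dict[str, Any]],
    n_round: Optional[int] = None,
    n_channel: Optional[int] = None,
) -> Dict[str, List[List[int]]]:
    """Expand sparse SpaceTx-style codewords into target -> [round][channel] lists."""
    entries = [e for code in code_array for e in code["codeword"]]
    # ASSUMPTION: sizes are inferred from the largest round/channel index when not given
    if n_round is None:
        n_round = max(e["r"] for e in entries) + 1
    if n_channel is None:
        n_channel = max(e["c"] for e in entries) + 1
    dense = {}
    for code in code_array:
        grid = [[0] * n_channel for _ in range(n_round)]
        for e in code["codeword"]:
            grid[e["r"]][e["c"]] = e["v"]
        dense[code["target"]] = grid
    return dense


raw = json.loads("""[
  {"codeword": [{"r": 0, "c": 3, "v": 1}, {"r": 1, "c": 3, "v": 1}], "target": "ACTB_human"},
  {"codeword": [{"r": 0, "c": 3, "v": 1}, {"r": 1, "c": 1, "v": 1}], "target": "ACTB_mouse"}
]""")
print(dense_from_code_array(raw)["ACTB_mouse"])  # [[0, 0, 0, 1], [0, 1, 0, 0]]
```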

classmethod from_numpy(code_names: Sequence[str], n_round: int, n_channel: int, data: ndarray) -> Codebook

Create a codebook of shape (len(code_names), n_round, n_channel) from a 3-D NumPy array.

Parameters:
code_names : Sequence[str]

the targets to be coded

n_round : int

number of imaging rounds used to build the codes

n_channel : int

number of channels used to build the codes

data : np.ndarray

array of uint8 values with shape (len(code_names), n_round, n_channel)

Returns:
Codebook

codebook with filled values

Examples

Build a 3-round, 4-channel codebook where ACTA is coded by fluorescence in round 0, channel 1, and ACTB is coded by fluorescence in channel 0 of round 0, channel 1 of round 1, and channel 2 of round 2.

>>> import numpy as np
>>> from starfish import Codebook
>>> data = np.zeros((2,3,4), dtype=np.uint8)
>>> data[0, 0, 1] = 1                 # ACTA
>>> data[[1, 1, 1], [0, 1, 2], [0, 1, 2]] = 1  # ACTB
>>> Codebook.from_numpy(['ACTA', 'ACTB'], n_channel=4, n_round=3, data=data)
<xarray.Codebook (target: 2, r: 3, c: 4)>
array([[[0, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]],

       [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0]]], dtype=uint8)
Coordinates:
  * target   (target) object 'ACTA' 'ACTB'
  * r        (r) int64 0 1 2
  * c        (c) int64 0 1 2 3

get_partial(indexers: Mapping[Axes, Union[int, slice, Sequence]])

Slice the codebook data according to the provided indexing parameters. Used in a composite codebook scenario.

Parameters:
indexers : Mapping[Axes, Union[int, slice, Sequence]]

A mapping from each dimension to an index, where the index is a single value, a sequence of values, or a slice (range) used to select along that dimension.

item(*args)

Copy an element of an array to a standard Python scalar and return it.

Parameters:
*args : Arguments (variable number and type)
  • none: in this case, the method only works for arrays with one element (a.size == 1), and that element is copied into a standard Python scalar object and returned.

  • int_type: this argument is interpreted as a flat index into the array, specifying which element to copy and return.

  • tuple of int_types: functions as does a single int_type argument, except that the argument is interpreted as an nd-index into the array.

Returns:
z : Standard Python scalar object

A copy of the specified element of the array as a suitable Python scalar

Notes

When the data type of a is longdouble or clongdouble, item() returns a scalar array object because there is no available Python scalar that would not lose information. Void arrays return a buffer object for item(), unless fields are defined, in which case a tuple is returned.

item is very similar to a[args], except, instead of an array scalar, a standard Python scalar is returned. This can be useful for speeding up access to elements of the array and doing arithmetic on elements of the array using Python’s optimized math.

Examples

>>> import numpy as np
>>> np.random.seed(123)
>>> x = np.random.randint(9, size=(3, 3))
>>> x
array([[2, 2, 6],
       [1, 3, 6],
       [1, 0, 1]])
>>> x.item(3)
1
>>> x.item(7)
0
>>> x.item((0, 1))
2
>>> x.item((2, 2))
1

classmethod open_json(json_codebook: str, n_round: Optional[int] = None, n_channel: Optional[int] = None) -> Codebook

Load a codebook from a SpaceTx Format json file or a url pointing to such a file.

Parameters:
json_codebook : str

Path or URL to a JSON file containing a SpaceTx codebook.

n_round : Optional[int]

The number of imaging rounds used in the codes. Will be inferred if not provided.

n_channel : Optional[int]

The number of channels used in the codes. Will be inferred if not provided.

Returns:
Codebook

Codebook with dimensions (target, round, channel)

Examples

Write an in-memory codebook to a temporary JSON file, then load it:

>>> from starfish.types import Axes, Features
>>> from starfish import Codebook
>>> import tempfile
>>> import json
>>> import os
>>> dir_ = tempfile.mkdtemp()
>>> codebook = [
...     {
...         Features.CODEWORD: [
...             {Axes.ROUND.value: 0, Axes.CH.value: 3, Features.CODE_VALUE: 1},
...             {Axes.ROUND.value: 1, Axes.CH.value: 3, Features.CODE_VALUE: 1},
...         ],
...         Features.TARGET: "ACTB_human"
...     },
...     {
...         Features.CODEWORD: [
...             {Axes.ROUND.value: 0, Axes.CH.value: 3, Features.CODE_VALUE: 1},
...             {Axes.ROUND.value: 1, Axes.CH.value: 1, Features.CODE_VALUE: 1},
...         ],
...         Features.TARGET: "ACTB_mouse"
...     },
... ]
>>> # write the codebook to a temporary file
>>> json_codebook = os.path.join(dir_, 'codebook.json')
>>> with open(json_codebook, 'w') as f:
...     json.dump(codebook, f)
>>> # read the codebook back from the file
>>> Codebook.open_json(json_codebook)
<xarray.Codebook (target: 2, r: 2, c: 4)>
array([[[0, 0, 0, 1],
        [0, 0, 0, 1]],

       [[0, 0, 0, 1],
        [0, 1, 0, 0]]], dtype=uint8)
Coordinates:
  * target   (target) object 'ACTB_human' 'ACTB_mouse'
  * r        (r) int64 0 1
  * c        (c) int64 0 1 2 3

searchsorted(v, side='left', sorter=None)

Find indices where elements of v should be inserted into the array to maintain order.

For full documentation, see numpy.searchsorted

See also

numpy.searchsorted

equivalent function
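
For intuition only, the side argument behaves like Python's standard-library bisect functions applied to a sorted list. side='left' corresponds to bisect.bisect_left, and side='right' corresponds to bisect.bisect_right. This stdlib analogy is not how NumPy implements searchsorted.

```python
import bisect

a = [1, 3, 5, 7]  # must already be sorted in ascending order
values = [0, 3, 6, 8]

# side='left': first position where v can be inserted while keeping `a` sorted
left = [bisect.bisect_left(a, v) for v in values]
# side='right': last such position, i.e. after any elements equal to v
right = [bisect.bisect_right(a, v) for v in values]
print(left)   # [0, 1, 3, 4]
print(right)  # [0, 2, 3, 4]
```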

classmethod synthetic_one_hot_codebook(n_round: int, n_channel: int, n_codes: int, target_names: Optional[Sequence] = None) -> Codebook

Generate codes where one channel is “on” in each imaging round

Parameters:
n_round : int

number of imaging rounds per code

n_channel : int

number of channels per code

n_codes : int

number of codes to generate

target_names : Optional[Sequence[str]]

if provided, names for targets in codebook

Returns:
Codebook

codebook containing the generated one-hot codes

Examples

Create a Codebook with 4 rounds, 3 channels, and 2 codes. The generated codes and default target names are random, so your output will differ:

>>> from starfish import Codebook
>>> codebook = Codebook.synthetic_one_hot_codebook(n_round=4, n_channel=3, n_codes=2)
>>> codebook
<xarray.Codebook (target: 2, r: 4, c: 3)>
array([[[1, 0, 0],
        [0, 0, 1],
        [1, 0, 0],
        [0, 1, 0]],

       [[1, 0, 0],
        [1, 0, 0],
        [1, 0, 0],
        [1, 0, 0]]], dtype=uint8)
Coordinates:
  * target   (target) object 6d7fff11-8905-4421-ab49-4f6d8ecdb4b7         1f5f7087-0618-49fc-a6a5-82fee14360b3
  * r        (r) int64 0 1 2 3
  * c        (c) int64 0 1 2
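
The generation idea can be sketched with the standard library. This sketch assumes that each code picks its "on" channel uniformly at random in every round, that codes are distinct, and that default target names are random UUIDs, as the UUID coordinates in the output above suggest. It is not the starfish implementation, and the `seed` argument is a hypothetical addition for reproducibility.

```python
import random
import uuid
from typing import Dict, List, Optional, Sequence


def synthetic_one_hot_codes(
    n_round: int,
    n_channel: int,
    n_codes: int,
    target_names: Optional[Sequence[str]] = None,
    seed: Optional[int] = None,  # hypothetical: not part of the starfish signature
) -> Dict[str, List[List[int]]]:
    """Return n_codes distinct one-hot codes as target -> [round][channel] lists."""
    if n_codes > n_channel ** n_round:
        raise ValueError("too few distinct one-hot codes for these dimensions")
    if target_names is not None and len(target_names) != n_codes:
        raise ValueError("target_names must contain n_codes names")
    rng = random.Random(seed)
    patterns = set()
    while len(patterns) < n_codes:
        # pick one "on" channel independently for every round
        patterns.add(tuple(rng.randrange(n_channel) for _ in range(n_round)))
    names = list(target_names) if target_names is not None else [
        str(uuid.uuid4()) for _ in range(n_codes)
    ]
    return {
        name: [[int(c == on) for c in range(n_channel)] for on in pattern]
        for name, pattern in zip(names, sorted(patterns))
    }


codes = synthetic_one_hot_codes(n_round=4, n_channel=3, n_codes=2, target_names=["X", "Y"], seed=0)
for name, code in codes.items():
    print(name, code)
```

Storing codes as a set of per-round channel tuples makes the distinctness check cheap. The loop terminates because the function first checks that enough distinct codes exist.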

to_json(filename: Union[str, Path]) -> None

Save a codebook to json using SpaceTx Format.

Parameters:
filename : Union[str, Path]

The name of the file in which to save the codebook.
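
The serialization step can be sketched as the inverse of the codeword format used in the from_code_array and open_json examples. This stdlib-only sketch assumes the keys "codeword", "r", "c", "v", and "target", and it stores only nonzero entries. It writes a bare list of codewords, the same shape that the open_json example reads. It does not attempt to reproduce any extra metadata that starfish may include in the file. `to_code_array` is a hypothetical helper.

```python
import json
import os
import tempfile
from typing import Dict, List


def to_code_array(dense: Dict[str, List[List[int]]]) -> List[dict]:
    """Convert target -> [round][channel] values into sparse SpaceTx-style codewords."""
    code_array = []
    for target, grid in dense.items():
        codeword = [
            {"r": r, "c": c, "v": v}
            for r, row in enumerate(grid)
            for c, v in enumerate(row)
            if v != 0  # only "on" entries are stored
        ]
        code_array.append({"codeword": codeword, "target": target})
    return code_array


dense = {"ACTA": [[0, 1, 0, 0], [0, 0, 0, 0]]}
path = os.path.join(tempfile.mkdtemp(), "codebook.json")
with open(path, "w") as f:
    json.dump(to_code_array(dense), f)
with open(path) as f:
    print(json.load(f))  # [{'codeword': [{'r': 0, 'c': 1, 'v': 1}], 'target': 'ACTA'}]
```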

classmethod zeros(code_names: Sequence[str], n_round: int, n_channel: int)

Create an empty (all-zero) codebook of shape (len(code_names), n_round, n_channel).

Parameters:
code_names : Sequence[str]

The targets to be coded.

n_round : int

Number of imaging rounds used to build the codes.

n_channel : int

Number of channels used to build the codes.

Returns:
Codebook

codebook whose values are all zero

Examples

Build an empty 2-round, 4-channel codebook:

>>> from starfish import Codebook
>>> Codebook.zeros(['ACTA', 'ACTB'], n_round=2, n_channel=4)
<xarray.Codebook (target: 2, r: 2, c: 4)>
array([[[0, 0, 0, 0],
        [0, 0, 0, 0]],

       [[0, 0, 0, 0],
        [0, 0, 0, 0]]], dtype=uint8)
Coordinates:
  * target   (target) object 'ACTA' 'ACTB'
  * r        (r) int64 0 1
  * c        (c) int64 0 1 2 3