ExpressionMatrix

The ExpressionMatrix is a 2-dimensional cells (x) by genes (y) array in which each value is the expression of one gene in one cell. Each cell is additionally annotated with the x, y, z coordinates of its centroid in pixel space and the xc, yc, zc coordinates of its centroid in physical coordinate space. Users may attach additional metadata to either the cells or the genes.

Data

Gene expression values are stored as numeric values. In image-based transcriptomics experiments they are typically integers, since each value is a count of fluorescent spots and each spot corresponds to a single detected RNA molecule.
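As a rough illustration of why the values are integer counts, the sketch below tallies decoded spots into a cells by genes table using only the Python standard library. The spot list, cell identifiers, and gene symbols are made-up example data, and this is not how starfish itself builds the matrix.

```python
from collections import Counter

# Hypothetical decoded spots: one (cell_id, gene_name) pair per detected RNA molecule.
spots = [
    (0, "ACTB"), (0, "ACTB"), (0, "GAPDH"),
    (1, "GAPDH"),
    (2, "ACTB"), (2, "GAPDH"), (2, "GAPDH"),
]

# Row and column labels of the matrix, in a stable sorted order.
cell_ids = sorted({cell for cell, _ in spots})
gene_names = sorted({gene for _, gene in spots})

# Count spots per (cell, gene) pair; a Counter returns 0 for pairs never observed.
spot_counts = Counter(spots)
counts = [[spot_counts[(cell, gene)] for gene in gene_names] for cell in cell_ids]

for cell, row in zip(cell_ids, counts):
    print(cell, dict(zip(gene_names, row)))
```

Each row is one cell and each column is one gene, matching the cells by genes layout described above.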

Metadata

cells; cell_id (int): cell identifier

cells; x, y, z (int): coordinates of cell centroid in pixel space

cells; xc, yc, zc (float): coordinates of cell centroid in global coordinate space (um)

genes; gene_id (int): GENCODE gene ID

genes; gene_name (str): Human-readable gene symbol (e.g. HGNC gene symbol for human data)

Implementation

Starfish implements the ExpressionMatrix as an xarray.DataArray object to take advantage of xarray’s high performance, flexible metadata storage capabilities, and serialization options.
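The following is a minimal sketch of how such an array could be laid out with plain xarray, not the starfish constructor itself. The dimension names "cells" and "genes", all numeric values, and the placeholder gene_id integers are assumptions chosen for illustration; the coordinate names follow the metadata fields listed above.

```python
import numpy as np
import xarray as xr

# Toy data: 3 cells x 2 genes of integer spot counts (illustrative values only).
counts = np.array([[2, 1], [0, 1], [1, 2]], dtype=np.int64)

expression = xr.DataArray(
    counts,
    dims=("cells", "genes"),  # assumed dimension names
    coords={
        # Per-cell metadata, attached along the "cells" dimension.
        "cell_id": ("cells", [0, 1, 2]),
        "x": ("cells", [10, 52, 97]),      # centroid in pixel space
        "y": ("cells", [14, 60, 33]),
        "z": ("cells", [0, 0, 1]),
        "xc": ("cells", [1.0, 5.2, 9.7]),  # centroid in physical space (um)
        "yc": ("cells", [1.4, 6.0, 3.3]),
        "zc": ("cells", [0.0, 0.0, 0.5]),
        # Per-gene metadata, attached along the "genes" dimension.
        "gene_id": ("genes", [1, 2]),      # placeholder identifiers
        "gene_name": ("genes", ["ACTB", "GAPDH"]),
    },
)

# Select the expression of one gene across all cells using its metadata.
is_actb = (expression["gene_name"] == "ACTB").values
actb = expression[:, is_actb]
print(actb.values.ravel())
```

Because the metadata travels with the array as coordinates, subsetting by cell or gene keeps the matching annotations aligned with the remaining values.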

Serialization

The ExpressionMatrix can use any of the serialization formats available through xarray, including CSV, Zarr, and NetCDF. We choose NetCDF because it currently has the strongest support and the best interoperability between R and Python. R users can load and manipulate an ExpressionMatrix with the ncdf4 package. In the future, NetCDF serialization may be deprecated if R gains Zarr support.
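A NetCDF round trip with plain xarray might look like the sketch below. It assumes a NetCDF backend such as netCDF4 or scipy is installed alongside xarray; the file name, dimension names, and values are illustrative, and starfish's own save and load helpers are not shown here.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Small stand-in for an ExpressionMatrix (illustrative values only).
expression = xr.DataArray(
    np.array([[2, 1], [0, 1]]),
    dims=("cells", "genes"),
    coords={
        "cell_id": ("cells", [0, 1]),
        "gene_name": ("genes", ["ACTB", "GAPDH"]),
    },
    name="expression",
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "expression_matrix.nc")  # hypothetical file name
    expression.to_netcdf(path)
    # load_dataarray reads the data into memory and closes the file,
    # so the temporary directory can be removed safely afterwards.
    restored = xr.load_dataarray(path)

# Values and metadata survive the round trip; the integer width may differ
# depending on which backend wrote the file.
print(np.array_equal(restored.values, expression.values))
print([str(name) for name in restored["gene_name"].values])
```

The resulting .nc file is what an R user would open with the ncdf4 package.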