Introduction

starfish is a Python library for building scalable pipelines that process image-based transcriptomics data. It is a work in progress, developed in collaboration with users and developers of image-based transcriptomics assays.

What is starfish?

Starfish is a library for counting spots in image data. It transforms potentially multiplexed imaging experiments, acquired as panoramic images broken up by microscope field of view, into a table of spots (genes, proteins) localized in 3D. It can then aggregate those localized spots into a cell × gene table by comparing the physical positions of spots and cells.
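The aggregation step described above can be pictured with a small, library-independent sketch. It does not use the starfish API: the `Spot` and `Cell` types, the bounding-box model of a cell, and every coordinate below are assumptions made for illustration. A real pipeline would more likely compare spots against segmentation masks.

```python
from collections import Counter, defaultdict
from typing import NamedTuple, Optional


class Spot(NamedTuple):
    """A decoded spot: a physical position plus the target it was assigned to."""
    x: float
    y: float
    z: float
    target: str


class Cell(NamedTuple):
    """Hypothetical cell model: an axis-aligned box in physical coordinates."""
    cell_id: str
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    z_min: float
    z_max: float

    def contains(self, spot: Spot) -> bool:
        # Half-open intervals so that adjacent cells never both claim a spot.
        return (
            self.x_min <= spot.x < self.x_max
            and self.y_min <= spot.y < self.y_max
            and self.z_min <= spot.z < self.z_max
        )


def assign_cell(spot: Spot, cells) -> Optional[str]:
    """Return the id of the first cell containing the spot, or None."""
    for cell in cells:
        if cell.contains(spot):
            return cell.cell_id
    return None


def cell_by_gene(spots, cells):
    """Count spots per (cell, target); spots outside every cell are dropped."""
    table = defaultdict(Counter)
    for spot in spots:
        cell_id = assign_cell(spot, cells)
        if cell_id is not None:
            table[cell_id][spot.target] += 1
    return {cell_id: dict(counts) for cell_id, counts in table.items()}


# Made-up example data.
spots = [
    Spot(1.0, 1.0, 0.5, "ACTB"),
    Spot(2.0, 3.0, 0.5, "ACTB"),
    Spot(2.5, 2.5, 0.5, "GAPDH"),
    Spot(12.0, 1.0, 0.5, "GAPDH"),
    Spot(50.0, 50.0, 0.5, "ACTB"),  # lies outside both cells
]
cells = [
    Cell("cell_0", 0, 10, 0, 10, 0, 1),
    Cell("cell_1", 10, 20, 0, 10, 0, 1),
]
counts = cell_by_gene(spots, cells)
```

Here `counts` maps each cell id to a dictionary of per-target spot counts, which is the cell × gene table in its simplest form.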

To achieve this generality, starfish exposes a set of objects that work both for discrete assays, where each spot represents a molecule, and for assays that build codes across many images. Starfish breaks processing up into fields of view, each corresponding to the data a microscope produces at a single location on a slide, and can process single fields of view for each of the assays listed below. To support all of these assays, starfish requires that data be converted into SpaceTx-Format, a lightweight JSON wrapper around 2-dimensional TIFF images.
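To make the idea of a JSON wrapper around 2D images concrete, here is a minimal sketch of an index that records which file holds which 2D plane. It is not the SpaceTx-Format schema: the field names (`tiles`, `indices`, `file`), the round/channel/z indexing, and the file naming pattern are all invented for this example, so consult the format specification for the real layout.

```python
import json


def build_manifest(n_rounds, n_channels, n_z):
    """Build an illustrative index with one entry per 2D tile.

    The keys used here are hypothetical and are not taken from the
    SpaceTx-Format specification.
    """
    tiles = []
    for r in range(n_rounds):
        for c in range(n_channels):
            for z in range(n_z):
                tiles.append({
                    "indices": {"round": r, "channel": c, "z": z},
                    "file": f"fov_000_r{r}_c{c}_z{z}.tiff",
                })
    return {"tiles": tiles}


def find_tile(manifest, round_, channel, z):
    """Look up the file that stores one 2D plane of the image stack."""
    wanted = {"round": round_, "channel": channel, "z": z}
    for tile in manifest["tiles"]:
        if tile["indices"] == wanted:
            return tile["file"]
    raise KeyError(f"no tile with indices {wanted}")


manifest = build_manifest(n_rounds=2, n_channels=3, n_z=1)

# Round-trip through JSON text, as the index would be stored on disk.
restored = json.loads(json.dumps(manifest, indent=2))
path = find_tile(restored, round_=1, channel=2, z=0)
```

The point of such a wrapper is that the image data stays in ordinary 2D files, while the JSON carries the information needed to reassemble them into a higher-dimensional stack.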

Starfish is agnostic to the workflow runner, but it does not provide a solution for processing complete experiments; you will need to decide how to orchestrate the processing of multiple fields of view. We made this decision because our users rely on a wide variety of computational infrastructures (high-performance computing clusters, Amazon Web Services, and Google Cloud) and workflow engines (Snakemake, Nextflow, and Cromwell). As a result, starfish focuses on being feature complete for processing individual fields of view and on exposing methods to merge data across fields of view, and it leaves orchestration across fields of view to the user. Starfish runs on Mac OS X and Linux, and on Windows through the Windows Subsystem for Linux.
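As a rough picture of what that orchestration involves, the sketch below processes each field of view independently and then merges the results. It uses only the Python standard library as a stand-in for a real workflow engine. `process_fov` is a hypothetical placeholder, not a starfish function, and the field-of-view names and counts are made up.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def process_fov(fov_name):
    """Hypothetical stand-in for a single-field-of-view pipeline.

    A real implementation would load the field of view, find and decode
    spots, and return them. This placeholder returns made-up target counts.
    """
    fake_results = {
        "fov_000": {"ACTB": 3, "GAPDH": 1},
        "fov_001": {"ACTB": 2},
        "fov_002": {"GAPDH": 4},
    }
    return Counter(fake_results[fov_name])


def process_experiment(fov_names, max_workers=2):
    """Run every field of view independently, then merge the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_fov = list(pool.map(process_fov, fov_names))

    merged = Counter()
    for counts in per_fov:
        merged.update(counts)  # Counter.update adds counts together
    return merged


totals = process_experiment(["fov_000", "fov_001", "fov_002"])
```

Because fields of view do not depend on one another, the same map-then-merge shape carries over to a cluster or cloud setting, where each call to `process_fov` would become a separate job in Snakemake, Nextflow, Cromwell, or a similar engine.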

To validate starfish’s performance, we are working with the SpaceTx consortium to reproduce the authors’ pipelines for each of the following assays.

Assay

Loads Data

Single-FoV Pipeline

Multi-FoV Pipeline

MERFISH

🔜

ISS

🔜

osmFISH

🔲

allen_smFISH

🤞

🔲

BaristaSeq

🤞

🔲

DARTFISH

🤞

🔲

ex-FISH

🔲

🔲

StarMAP

🔲

seq-FISH

🔜

🔲

RNAscope

🔲

Legend:

  • ✅ - Done

  • 🤞 - In Review

  • 🔜 - In Process

  • 🔲 - TODO

  • ❌ - Not supported

To explore starfish in more detail, please proceed to the Getting Started section.