Specform

Keep ML and analysis stable as your data evolves.

Deterministic dataset snapshots for clinical and biotech workflows built on continuously changing upstream data.

pip install specform

quickstart.py
    from specform.sdk.specform import Specform

    sf = Specform(home=".", author="krish")
    brca = sf.dataset("brca")
    brca.add("data/brca.csv", note="raw export")

    df = brca.df()
    df_qc = df.dropna()
    brca.checkpoint(df_qc, note="drop NA rows")

    hist = brca.history()
    hist
version  current  created_at  author  row_count  fingerprint_short  action      note          presence
1        false    2026-01-15  krish   1043       5a91be2c           add         raw export    present
2        ✓ true   2026-01-22  krish   1021       c4f10d9a           checkpoint  drop NA rows  present

Clinical workflows are state-centric.

  • “the dataset as of this moment”
  • “what changed since last week”
  • “the analysis I ran with that data”

Notebooks preserve code and outputs, but they do not preserve state identity. Specform models state explicitly so each dataset snapshot and analysis run is addressable, repeatable, and auditable.

Immutable Artifacts

DS (dataset snapshots), AS, and ER are immutable records with deterministic identifiers.
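
One way to picture a deterministic identifier is a content hash: the same rows and columns always produce the same ID, and any change produces a different one. The sketch below is an illustration only, using the Python standard library. The `fingerprint` helper, the sample columns, and the choice of SHA-256 over canonical JSON are assumptions, not Specform's actual scheme.

```python
import hashlib
import json


def fingerprint(rows, columns):
    """Return a SHA-256 hex digest of tabular data.

    Hypothetical helper for illustration; not part of the Specform SDK.
    Row order is significant; dict key order within a row is not.
    """
    h = hashlib.sha256()
    # Hash the schema first so a column rename changes the identifier.
    h.update(json.dumps(list(columns)).encode("utf-8"))
    for row in rows:
        # Serialize each row in fixed column order with fixed separators.
        line = json.dumps([row[c] for c in columns], separators=(",", ":"))
        h.update(b"\n")
        h.update(line.encode("utf-8"))
    return h.hexdigest()


# Made-up sample data standing in for a raw export.
columns = ["patient_id", "stage"]
raw = [
    {"patient_id": "p1", "stage": "II"},
    {"patient_id": "p2", "stage": None},
]
# Drop rows with missing values, analogous to a QC step.
cleaned = [r for r in raw if all(r[c] is not None for c in columns)]

fp_raw = fingerprint(raw, columns)
fp_cleaned = fingerprint(cleaned, columns)
```

Because the digest depends only on content, recomputing it on the same data gives the same value, while the cleaned data gets a different identifier than the raw export.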

Aliases, Not Versions

Aliases are views. History is append-only. Identity stays local and explicit.
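
The alias idea can be sketched as a named pointer whose history only grows: moving the alias appends an event rather than rewriting an earlier one. The `Alias` and `AliasEvent` classes and the `ds-raw` / `ds-clean` identifiers below are hypothetical and exist only to illustrate the pattern; they are not Specform's data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AliasEvent:
    """One immutable entry in an alias's history (hypothetical type)."""
    version: int
    snapshot_id: str
    action: str
    note: str


@dataclass
class Alias:
    """A local name that points at snapshots; history is append-only."""
    name: str
    _events: List[AliasEvent] = field(default_factory=list)

    def point_to(self, snapshot_id: str, action: str, note: str = "") -> AliasEvent:
        # Moving the alias appends a new event; earlier events are untouched.
        event = AliasEvent(len(self._events) + 1, snapshot_id, action, note)
        self._events.append(event)
        return event

    @property
    def current(self) -> Optional[AliasEvent]:
        # The most recent event is the alias's current view.
        return self._events[-1] if self._events else None

    def history(self) -> Tuple[AliasEvent, ...]:
        # Return a tuple so callers cannot append to or reorder the log.
        return tuple(self._events)


brca = Alias("brca")
brca.point_to("ds-raw", "add", "raw export")
brca.point_to("ds-clean", "checkpoint", "drop NA rows")
```

Here `brca.current` resolves to the latest snapshot, while `brca.history()` still lists every earlier state the alias pointed to.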

Notebook-Native SDK

Use .df(), .checkpoint(), .history(), and .use() in place.

The abstraction

Kernel identity stays factual, aliases stay local, and SDK workflows keep trust properties close to the notebook.

DS

Kernel = reality

Immutable dataset snapshots (DS).

Alias

Aliases = perspective

Local pointers with append-only history.

DAO

SDK = trust layer

Notebook-native DAOs that make state feel obvious.

Git-like guarantees without Git workflows.

Deterministic by design

Design property

Immutable artifacts

Data, intent, and receipts never mutate.

Design property

Explicit state

Your ‘current dataset’ is a first-class object.

Design property

Reproducible analysis

Snapshots + intent make runs rerunnable.
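
As a rough illustration of why a snapshot plus recorded intent is enough to identify a run, the sketch below derives a run identifier from both. The `run_id` helper, the shape of the `intent` dict, and the snapshot names are assumptions made for this example, not Specform's format.

```python
import hashlib
import json


def run_id(snapshot_id, intent):
    """Derive a short identifier from a snapshot ID and a description of intent.

    Hypothetical helper for illustration; not part of the Specform SDK.
    """
    # sort_keys makes the serialization independent of dict key order.
    payload = json.dumps(
        {"snapshot": snapshot_id, "intent": intent},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


intent = {"op": "dropna", "columns": ["stage"]}

same_a = run_id("ds-clean", intent)
same_b = run_id("ds-clean", {"columns": ["stage"], "op": "dropna"})
other = run_id("ds-raw", intent)
```

The same snapshot and intent always map to the same identifier, so a rerun can be matched to its original; changing either input yields a different identifier.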

Built for state-centric clinical analysis workflows.

Bring deterministic state to your notebooks.