Keep ML and analysis stable as your data evolves.
Deterministic dataset snapshots for clinical and biotech workflows built on continuously changing upstream data.
pip install specform
from specform.sdk.specform import Specformsf = Specform(home=".", author="krish")brca = sf.dataset("brca")brca.add("data/brca.csv", note="raw export")df = brca.df()df_qc = df.dropna()brca.checkpoint(df_qc, note="drop NA rows")hist = brca.history()hist| version | current | created_at | author | row_count | fingerprint_short | action | note | presence |
|---|---|---|---|---|---|---|---|---|
| 1 | false | 2026-01-15 | krish | 1043 | 5a91be2c | add | raw export | present |
| 2 | ✓ true | 2026-01-22 | krish | 1021 | c4f10d9a | checkpoint | drop NA rows | present |
Clinical workflows are state-centric.
- “the dataset as of this moment”
- “what changed since last week”
- “the analysis I ran with that data”
Notebooks preserve code and outputs, but they do not preserve state identity. Specform models state explicitly so each dataset snapshot and analysis run is addressable, repeatable, and auditable.
Immutable Artifacts
DS, AS, and ER are immutable records with deterministic identifiers.
Aliases, Not Versions
Aliases are views. History is append-only. Identity stays local and explicit.
Notebook-Native SDK
Use .df(), .checkpoint(), .history(), and .use()in place.
The abstraction
Kernel identity stays factual, aliases stay local, and SDK workflows keep trust properties close to the notebook.
Kernel = reality
Immutable dataset snapshots (DS).
Aliases = perspective
Local pointers with append-only history.
SDK = trust layer
Notebook-native DAOs that make state feel obvious.
Deterministic by design
Immutable artifacts
Data, intent, and receipts never mutate.
Explicit state
Your ‘current dataset’ is a first-class object.
Reproducible analysis
Snapshots + intent make runs rerunnable.
Built for state-centric clinical analysis workflows.