Specform

About Specform

Specform is deterministic dataset snapshotting for clinical notebooks—so analysis and ML stay stable as upstream data evolves.

What goes wrong in real workflows

Clinical data doesn’t “version” cleanly. EHR extracts get refreshed, cohort logic shifts, feature pipelines change, and suddenly the question isn’t whether your code ran—it’s whether your result is comparable.

  • Why did this metric move—data drift, logic drift, or both?
  • Which exact dataset did this notebook (and model) use?
  • Can we reproduce last month’s figure exactly—today?
  • When upstream changes, what downstream artifacts should be re-run?

What Specform enforces

Specform turns your datasets into immutable, addressable snapshots, then gives you notebook-native objects for loading, inspecting, and advancing dataset states deliberately. The result is a workflow where every table, chart, and model can be tied back to a specific dataset state.

Immutable snapshots

Each dataset state is captured as a stable identity. No silent overwrites, no “latest.csv” ambiguity—just reproducible inputs.
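To make "stable identity" concrete, here is a minimal sketch of a content-addressed, append-only store. It is an illustration only: the class name SnapshotStore, its methods, and the choice of SHA-256 are assumptions made for this example, not Specform's actual API or storage format.

```python
import hashlib


class SnapshotStore:
    """Hypothetical append-only store: content is keyed by its own SHA-256 digest."""

    def __init__(self):
        self._blobs = {}  # snapshot id -> bytes

    def put(self, data: bytes) -> str:
        snapshot_id = hashlib.sha256(data).hexdigest()
        # Storing identical bytes twice is a no-op; existing entries are never replaced.
        self._blobs.setdefault(snapshot_id, data)
        return snapshot_id

    def get(self, snapshot_id: str) -> bytes:
        return self._blobs[snapshot_id]


# Made-up extracts standing in for two refreshes of the same dataset.
store = SnapshotStore()
march = store.put(b"patient_id,hba1c\n1,6.1\n2,7.4\n")
april = store.put(b"patient_id,hba1c\n1,6.1\n2,7.9\n")
```

Because the identifier is computed from the content, a refreshed extract gets a new identifier instead of replacing the old one, and the March bytes remain retrievable under the March identifier.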

Notebook-native handles (DAOs)

In notebooks you work with a single object that resolves “current,” loads data, shows history, and updates pointers safely—without forcing Git semantics onto analysts.
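As a rough picture of what such a handle could look like, the sketch below keeps a history of snapshot identifiers and treats "current" as the most recent one. Everything here (the DatasetHandle name, commit, current, history, load) is hypothetical and chosen for illustration; the real objects may differ.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DatasetHandle:
    """Hypothetical notebook handle: one dataset name, many immutable snapshots."""

    name: str
    _blobs: dict = field(default_factory=dict)    # snapshot id -> bytes
    _history: list = field(default_factory=list)  # snapshot ids, oldest first

    def commit(self, data: bytes) -> str:
        """Store a new state and move the 'current' pointer to it."""
        snapshot_id = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(snapshot_id, data)
        if not self._history or self._history[-1] != snapshot_id:
            self._history.append(snapshot_id)
        return snapshot_id

    @property
    def current(self) -> str:
        """Identifier of the most recently committed state."""
        return self._history[-1]

    def history(self) -> list:
        """Copy of the snapshot identifiers, oldest first."""
        return list(self._history)

    def load(self, snapshot_id=None) -> bytes:
        """Load the current state, or an explicitly pinned older one."""
        return self._blobs[snapshot_id or self.current]


# Made-up data: two refreshes of the same cohort extract.
cohort = DatasetHandle("diabetes_cohort")
v1 = cohort.commit(b"patient_id,hba1c\n1,6.1\n")
v2 = cohort.commit(b"patient_id,hba1c\n1,6.1\n2,7.4\n")
```

After the second commit, cohort.load() returns the newer bytes, while cohort.load(v1) still returns the original extract, so a notebook can pin an older state without any branching or merging vocabulary.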

Deterministic provenance

Every result can answer two questions: what data was this run on, and what has changed since then? Both are critical for clinical ML and reporting.
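One simple way to picture this is a result record that always carries the identifier of the snapshot it was computed on. The sketch below assumes a made-up record schema and helper names (record_result, data_changed); the metric values and snapshot identifiers are placeholders, not real outputs.

```python
import json


def record_result(name, value, snapshot_id):
    """Bundle a result with the snapshot it was computed on (hypothetical schema)."""
    return {"result": name, "value": value, "snapshot_id": snapshot_id}


def data_changed(earlier, later):
    """True when two results were computed on different dataset states."""
    return earlier["snapshot_id"] != later["snapshot_id"]


# Placeholder identifiers and metric values, for illustration only.
march = record_result("auroc", 0.81, snapshot_id="snap-march")
april = record_result("auroc", 0.78, snapshot_id="snap-april")

# A sorted-key JSON string can be stored next to the figure or model it describes.
serialized = json.dumps(march, sort_keys=True)
```

Here data_changed(march, april) is true, which says the metric moved across different inputs; if it were false, the change would have to come from the code or its configuration.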

Why this matters for clinical ML

Model performance is only as trustworthy as the dataset state it was trained and evaluated on. Specform makes upstream change explicit, so teams can do clean comparisons across training runs, cohorts, and extraction updates—without hand-rolled conventions.

  • Repeatable training/eval splits tied to immutable dataset states
  • Auditable cohort definitions across refreshes
  • Clear diffs between “same code, different data” outcomes
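As an illustration of the first point, a split can be made repeatable by hashing the snapshot identifier together with a patient identifier, so the same snapshot always yields the same assignment. This is a sketch under assumptions: the function name assign_split, the identifiers, and the 20% evaluation fraction are invented for the example and are not part of Specform.

```python
import hashlib


def assign_split(snapshot_id: str, patient_id: str, eval_fraction: float = 0.2) -> str:
    """Deterministically map a patient in a given snapshot to 'train' or 'eval'."""
    key = f"{snapshot_id}:{patient_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Reduce the first 8 bytes of the digest to a bucket in [0, 10000).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "eval" if bucket < int(eval_fraction * 10_000) else "train"


# Placeholder snapshot and patient identifiers.
splits = {pid: assign_split("snap-march", pid) for pid in ["p001", "p002", "p003"]}
```

No random seed or saved index file is needed: re-running the notebook on the same snapshot reproduces the same split. Leaving the snapshot identifier out of the hashed key is an alternative design that keeps each patient in the same split across refreshes.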

Design principles

  • Determinism: snapshot identity is derived from canonical data bytes.
  • Immutability: past dataset states never change—only new states are created.
  • Ergonomics: notebooks stay simple; the DAO carries the workflow.
  • Trust: provenance is first-class, not a spreadsheet convention.
  • Portability: snapshot identities travel with the work—across machines and teams.
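To illustrate the determinism principle, the sketch below derives an identifier from a canonical serialization of a small table, so that logically equal data produces the same identifier regardless of row or column order. The canonical form used here (sorted JSON lines) and the function names are assumptions for the example; Specform's actual canonicalization is not described in this page.

```python
import hashlib
import json


def canonical_bytes(rows):
    """Serialize rows (a list of dicts) so logically equal tables yield equal bytes."""
    # sort_keys fixes column order; sorting the serialized rows fixes row order.
    lines = sorted(
        json.dumps(row, sort_keys=True, separators=(",", ":")) for row in rows
    )
    return "\n".join(lines).encode("utf-8")


def snapshot_id(rows):
    """Derive an identifier from the canonical bytes (SHA-256 assumed here)."""
    return hashlib.sha256(canonical_bytes(rows)).hexdigest()


# Made-up rows: same data in a different order, then one changed value.
original = [{"patient_id": 1, "hba1c": 6.1}, {"patient_id": 2, "hba1c": 7.4}]
reordered = [{"hba1c": 7.4, "patient_id": 2}, {"hba1c": 6.1, "patient_id": 1}]
refreshed = [{"patient_id": 1, "hba1c": 6.1}, {"patient_id": 2, "hba1c": 7.9}]
```

In this sketch, original and reordered share an identifier while refreshed gets a different one. Treating row order as irrelevant is itself a design choice; a pipeline where order carries meaning would need a different canonical form.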