Get Started
Specform
Specform is a dataset snapshot ledger for reproducible clinical
analysis.
You work with dataset aliases via a notebook-native object: DatasetRef.
Core idea: immutable dataset snapshots (DS), mutable alias pointers (your "current dataset"), and a DAO (DatasetRef) that makes working with them feel natural.
Install
pip install specform
Recommended First Step (Quickstart Showcase)
This step is optional, but it's the fastest way to understand Specform.
Initialize a workspace:
specform init
This creates:
.specform/workspace
demo_brca_03-22-2006.csv
specform_template.ipynb
Open the notebook and run it.
It demonstrates:
- Adding a CSV snapshot to alias brca
- Editing it as a DataFrame
- Checkpointing a new immutable DS
- Viewing alias history
- Exporting a portable data bundle
- Rolling back to version 1
That's the entire mental model in 30 seconds.
Minimal Manual Workflow (No Template)
1) Create a session
from specform import Specform
sf = Specform(home=".", author="krish")
2) Add a dataset snapshot (DS)
brca = sf.dataset("brca")
brca.add("data/brca.csv", note="raw export")
You just created an immutable Dataset Snapshot (DS) and pointed alias brca at it.
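To make "immutable snapshot" concrete, here is a minimal sketch of content-based identity using only the Python standard library. It is an illustration, not Specform's implementation: the `canonical_bytes` and `fingerprint` helpers are hypothetical, and the canonical form chosen here (normalized line endings, SHA-256) is an assumption about what a "canonical bytes fingerprint" could look like.

```python
import hashlib


def canonical_bytes(csv_text: str) -> bytes:
    """Hypothetical canonical form: normalize line endings, drop trailing blank lines."""
    lines = csv_text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    while lines and lines[-1] == "":
        lines.pop()
    return ("\n".join(lines) + "\n").encode("utf-8")


def fingerprint(csv_text: str) -> str:
    """Identity derived only from the canonical content, never from notes or file names."""
    return hashlib.sha256(canonical_bytes(csv_text)).hexdigest()


raw_unix = "id,stage\n1,II\n2,III\n"
raw_windows = "id,stage\r\n1,II\r\n2,III\r\n"  # same rows, different line endings
edited = "id,stage\n1,II\n"                     # one row removed

same_content = fingerprint(raw_unix) == fingerprint(raw_windows)
changed_content = fingerprint(raw_unix) != fingerprint(edited)
print(same_content, changed_content)
```

Under this scheme, two exports with identical rows map to the same identity, while any change to the data yields a different one, which is why an edit has to become a new snapshot rather than a modification of an old one.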
3) Work with it
df = brca.df()
df.head()
brca.checkpoint(df.dropna(), note="drop NA rows")
brca.history()
Mental Model
- DS is immutable (identity = canonical bytes fingerprint)
- Aliases are mutable pointers (your "current dataset")
- DatasetRef is the DAO
- Notes never affect identity; they are metadata only
- History is append-only
Nothing silently mutates.
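The rules above can be sketched as a toy in-memory ledger. Everything here (`ToyLedger`, `commit`, `current`, `rollback`) is hypothetical and exists only to illustrate the model; it is not Specform's API or storage format. In particular, recording a rollback as a new history entry is an assumption made to keep the history append-only.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    ds_id: str   # derived from the content, so it cannot drift
    data: bytes


@dataclass(frozen=True)
class HistoryEntry:
    version: int
    ds_id: str
    note: str    # metadata only; never part of the snapshot identity


@dataclass
class ToyLedger:
    snapshots: dict = field(default_factory=dict)  # ds_id -> Snapshot
    history: dict = field(default_factory=dict)    # alias -> list of HistoryEntry

    def commit(self, alias: str, data: bytes, note: str = "") -> str:
        ds_id = hashlib.sha256(data).hexdigest()[:12]
        # Existing snapshots are never overwritten.
        self.snapshots.setdefault(ds_id, Snapshot(ds_id, data))
        entries = self.history.setdefault(alias, [])
        # Moving the alias means appending an entry, not editing one.
        entries.append(HistoryEntry(len(entries) + 1, ds_id, note))
        return ds_id

    def current(self, alias: str) -> Snapshot:
        return self.snapshots[self.history[alias][-1].ds_id]

    def rollback(self, alias: str, version: int) -> str:
        target = self.history[alias][version - 1]
        data = self.snapshots[target.ds_id].data
        return self.commit(alias, data, note=f"rollback to v{version}")


ledger = ToyLedger()
raw = b"id,stage\n1,II\n2,\n"
v1 = ledger.commit("brca", raw, note="raw export")
v2 = ledger.commit("brca", b"id,stage\n1,II\n", note="drop NA rows")
v3 = ledger.rollback("brca", 1)

for entry in ledger.history["brca"]:
    print(entry.version, entry.ds_id, entry.note)
```

After the rollback, the alias points at the original snapshot again, the history holds three entries, and only two distinct snapshots are stored, because identical content resolves to the same identity.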
Next
- Deep dive: Datasets (DS + DAO) → /docs/datasets/datasetref
- Portability (export/import/merge/bundles) → /docs/portability/export-import
- CLI (optional) → /docs/cli
- Analysis (AS/ER) → /docs/analysis