Trace every dataset to its origin
Understand exactly how a dataset was created — from the original seed file through the Blueprint to every generated version.
Why it matters
Data provenance at every step
When a model behaves unexpectedly, teams need to trace the issue back to the data. LiteSeed records the full lineage of every dataset version (seed file, Blueprint, run parameters and generation identity) so teams can always answer: where did this data come from?
Full traceability
Every dataset version links back to the seed file, Blueprint version and run that created it.
Visual lineage graph
Explore the full lineage as an interactive graph — Seed → Blueprint → Dataset Versions.
Audit-ready
Blueprint hash and generation seed are stored with every dataset version for verification.
Core capabilities
Lineage graph
Every dataset has a visual lineage graph that shows the full chain from seed file to dataset versions.
- Nodes: Seed → Blueprint → Dataset Versions
- Clickable nodes navigate to source records
- Version count and row count per node
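The chain behind the graph can be pictured as a simple lookup from a dataset version back to its Blueprint and seed file. The sketch below is illustrative only: the record types, field names and IDs are assumptions, not LiteSeed's actual schema or API.

```python
from dataclasses import dataclass

# Hypothetical record shapes; names are illustrative, not LiteSeed's schema.
@dataclass(frozen=True)
class SeedFile:
    id: str
    filename: str

@dataclass(frozen=True)
class Blueprint:
    id: str
    seed_file_id: str

@dataclass(frozen=True)
class DatasetVersion:
    id: str
    blueprint_id: str
    row_count: int

def trace_lineage(version_id, versions, blueprints, seeds):
    """Return node IDs in graph order: seed file, Blueprint, dataset version."""
    version = versions[version_id]
    blueprint = blueprints[version.blueprint_id]
    seed = seeds[blueprint.seed_file_id]
    return [seed.id, blueprint.id, version.id]

def blueprint_node_summary(blueprint_id, versions):
    """Count versions and total rows under one Blueprint node.

    Summing rows across versions is an assumption about what a per-node
    row count means.
    """
    rows = [v.row_count for v in versions.values() if v.blueprint_id == blueprint_id]
    return {"version_count": len(rows), "row_count": sum(rows)}

seeds = {"seed-1": SeedFile("seed-1", "customers.csv")}
blueprints = {"bp-1": Blueprint("bp-1", "seed-1")}
versions = {
    "dv-1": DatasetVersion("dv-1", "bp-1", 1000),
    "dv-2": DatasetVersion("dv-2", "bp-1", 5000),
}

chain = trace_lineage("dv-2", versions, blueprints, seeds)
summary = blueprint_node_summary("bp-1", versions)
```

Here `chain` lists the nodes a user would click through from seed file to dataset version, and `summary` holds the counts shown on the Blueprint node.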
Blueprint version history
Every Blueprint has a full version history with parent-child relationships. Changes to distributions, constraints or policies create a new Blueprint version.
- Parent Blueprint ID stored with every version
- Blueprint hash for integrity verification
- Full diff between Blueprint versions
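One way to picture how a parent ID, an integrity hash and a diff fit together is sketched below. The page does not say how LiteSeed computes its Blueprint hash, so SHA-256 over canonical JSON is an assumption, as are the field names and the shape of the Blueprint spec.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional

def compute_blueprint_hash(spec: dict) -> str:
    """Hash a Blueprint spec. Assumed scheme: SHA-256 over key-sorted JSON."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical version record; field names are illustrative.
@dataclass(frozen=True)
class BlueprintVersion:
    id: str
    parent_id: Optional[str]
    spec: dict
    blueprint_hash: str

def new_version(version_id, spec, parent=None):
    """Create a version that records its parent ID and its own hash."""
    parent_id = parent.id if parent is not None else None
    return BlueprintVersion(version_id, parent_id, spec, compute_blueprint_hash(spec))

def diff_specs(old: dict, new: dict) -> dict:
    """Top-level keys whose values differ, mapped to (old, new) pairs."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}

v1 = new_version(
    "bp-v1",
    {"distributions": {"age": "normal"}, "constraints": ["age >= 18"]},
)
# Changing a distribution produces a child version with a new hash.
v2 = new_version(
    "bp-v2",
    {"distributions": {"age": "uniform"}, "constraints": ["age >= 18"]},
    parent=v1,
)

changes = diff_specs(v1.spec, v2.spec)
# Integrity check: recompute the hash and compare with the stored value.
is_intact = compute_blueprint_hash(v2.spec) == v2.blueprint_hash
```

Sorting keys before hashing means two specs with the same content hash identically regardless of key order, which is what makes the stored hash usable for later verification.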
Dataset version registry
Every generated dataset version is stored with its full generation identity: Blueprint ID, generation seed, row count, mode and Blueprint hash.
- generationSeed and blueprintHash per version
- Retention policy management
- Version comparison and export
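As a rough illustration, a generation identity can be modeled as a small immutable record, and version comparison as a field-by-field diff. Only `generationSeed` and `blueprintHash` are names taken from this page; the other field names, the `"standard"` mode value and the hash string are placeholders.

```python
from dataclasses import dataclass, asdict

# Hypothetical record; only generationSeed and blueprintHash come from the page.
@dataclass(frozen=True)
class GenerationIdentity:
    blueprintId: str
    generationSeed: int
    rowCount: int
    mode: str            # placeholder value below; real modes are not listed here
    blueprintHash: str

def identity_diff(a: GenerationIdentity, b: GenerationIdentity) -> dict:
    """Fields that differ between two versions, mapped to (a, b) value pairs."""
    left, right = asdict(a), asdict(b)
    return {k: (left[k], right[k]) for k in left if left[k] != right[k]}

v1 = GenerationIdentity("bp-1", 42, 1000, "standard", "abc123")
v2 = GenerationIdentity("bp-1", 43, 1000, "standard", "abc123")

diff = identity_diff(v1, v2)
```

In this example the two versions share a Blueprint and row count and differ only in `generationSeed`, so `diff` contains that single field.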
Run history
Every run is a permanent record of a generation event. Runs store the full parameter set and link to the resulting dataset version.
- Run status, progress and completion time
- Download links for all export formats
- Distribution summary in run report
