Technology
System architecture
A modular, streaming-first architecture designed for deterministic generation, quality enforcement and reproducibility at scale.
Design principles
Streaming-first. Deterministic by default. Quality-enforced.
LiteSeed is built around three principles: streaming generation for scale without memory constraints, deterministic seeding for reproducibility, and quality enforcement at the row level.
Streaming-first
No full dataset is ever held in memory. Generation, quality scoring and export all operate on chunks of 10,000 rows.
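A minimal sketch of chunked generation, assuming a hypothetical `generateChunks` helper and `makeRow` callback (neither name comes from LiteSeed); only the 10,000-row chunk size is taken from the text above.

```typescript
// Hypothetical row shape; the real row type is not described on this page.
type Row = Record<string, unknown>;

const CHUNK_SIZE = 10000; // chunk size stated above

// Lazily yields chunks of rows, so only one chunk needs to be alive at a time.
function* generateChunks(
  totalRows: number,
  makeRow: (index: number) => Row,
  chunkSize: number = CHUNK_SIZE,
): Generator<Row[]> {
  for (let start = 0; start < totalRows; start += chunkSize) {
    const end = Math.min(start + chunkSize, totalRows);
    const chunk: Row[] = [];
    for (let i = start; i < end; i++) chunk.push(makeRow(i));
    yield chunk; // the consumer handles this chunk before the next one is built
  }
}

// Usage: 25,000 rows arrive as chunks of 10,000, 10,000 and 5,000.
const chunkSizes: number[] = [];
for (const chunk of generateChunks(25000, (i) => ({ id: i }))) {
  chunkSizes.push(chunk.length);
}
```

Because the generator is lazy, a consumer that scores and exports each chunk before requesting the next never needs the full dataset in memory.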
Deterministic by default
The Mulberry32 PRNG ensures the same seed always produces the same output.
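For illustration, this is the widely circulated public-domain form of Mulberry32 written in TypeScript. It is a sketch of the algorithm itself; how LiteSeed wires the seed into its samplers is not shown here.

```typescript
// Mulberry32: a small 32-bit PRNG. Returns a function producing floats in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed | 0; // coerce the seed to a 32-bit integer
  return () => {
    state = (state + 0x6d2b79f5) | 0;
    let t = Math.imul(state ^ (state >>> 15), 1 | state);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Usage: two generators created with the same seed yield the same sequence.
const a = mulberry32(42);
const b = mulberry32(42);
const firstRun = [a(), a(), a()];
const secondRun = [b(), b(), b()];
```

All state lives in one 32-bit integer, so reproducing a run only requires storing the seed.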
Quality-enforced
Every row is scored and validated before export. Hard constraint violations trigger resampling.
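The resampling step can be sketched as a retry loop. The helper name `sampleUntilValid` and the `maxAttempts` cap are assumptions made for this example; the page does not say how retries are bounded.

```typescript
// Draws rows until one satisfies every hard constraint, or gives up.
function sampleUntilValid<T>(
  sample: () => T,
  hardConstraints: Array<(row: T) => boolean>,
  maxAttempts: number = 100, // hypothetical cap, not a documented setting
): T {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const row = sample();
    if (hardConstraints.every((holds) => holds(row))) return row;
  }
  throw new Error(`No valid row after ${maxAttempts} attempts`);
}

// Usage with a scripted sampler so the outcome is easy to follow.
const scriptedAges = [-5, 130, 42];
let draws = 0;
const person = sampleUntilValid(
  () => ({ age: scriptedAges[draws++] }),
  [(row) => row.age >= 0, (row) => row.age <= 120],
);
// person.age is 42 after three draws: -5 and 130 were rejected and resampled.
```

If `sample` draws from a seeded PRNG, rejected attempts consume random numbers in a fixed order, so the accepted row is still reproducible for a given seed.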
Architecture layers
Blueprint layer
The schema definition layer. Blueprints are versioned JSON documents.
- Field definitions: type, distribution, constraints
- Blueprint versioning with parent-child lineage
- Blueprint hash computed at generation time
- Blueprint diff for version comparison
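As an illustration of versioned Blueprints with a content hash, the sketch below assumes a hypothetical `Blueprint` shape and uses FNV-1a as a dependency-free stand-in hash. The actual Blueprint schema and hash algorithm are not specified on this page.

```typescript
// Hypothetical shapes; field and property names are assumptions for this sketch.
interface FieldDef {
  type: string;
  distribution?: string;
  constraints?: Record<string, number | string | boolean>;
}
interface Blueprint {
  version: number;
  parentVersion: number | null; // parent-child lineage
  fields: Record<string, FieldDef>;
}

// Serializes with object keys sorted, so key order does not affect the hash.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map((v) => canonicalJson(v)).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).filter((k) => obj[k] !== undefined).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// FNV-1a (32-bit) over UTF-16 code units, returned as 8 hex digits. Stand-in only.
function fnv1a32(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return ("0000000" + (hash >>> 0).toString(16)).slice(-8);
}

function blueprintHash(blueprint: Blueprint): string {
  return fnv1a32(canonicalJson(blueprint));
}

const v1: Blueprint = {
  version: 1,
  parentVersion: null,
  fields: { age: { type: "integer", distribution: "normal", constraints: { min: 18, max: 90 } } },
};
// Same content, different key order: the hash is unchanged.
const v1Reordered: Blueprint = {
  fields: { age: { constraints: { max: 90, min: 18 }, distribution: "normal", type: "integer" } },
  parentVersion: null,
  version: 1,
};
const sameHash = blueprintHash(v1) === blueprintHash(v1Reordered);
```

A production system would more likely use a cryptographic hash such as SHA-256. The point of the sketch is that the hash is taken over a canonical serialization, so it identifies the Blueprint content and not its formatting.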
Generation layer
The core generation engine. Implements the Mulberry32 PRNG, distribution samplers and constraint enforcement.
- Mulberry32 PRNG seeded with user-defined integer
- 8 distribution types
- Two-tier constraint system: hard + soft
- Computed field dependency resolution
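Dependency resolution for computed fields is commonly done with a topological sort. The sketch below assumes a hypothetical map from each field to the fields it depends on; it is one standard way to order such fields, not necessarily the one LiteSeed uses.

```typescript
// Orders fields so that every computed field comes after the fields it reads.
// Throws if the dependencies form a cycle.
function resolveOrder(deps: Record<string, string[]>): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (field: string): void => {
    const current = state.get(field);
    if (current === "done") return;
    if (current === "visiting") throw new Error(`Circular dependency at "${field}"`);
    state.set(field, "visiting");
    for (const dep of deps[field] ?? []) visit(dep);
    state.set(field, "done");
    order.push(field);
  };

  for (const field of Object.keys(deps)) visit(field);
  return order;
}

// Usage: "total" needs price and quantity; "totalWithTax" needs total.
const fieldOrder = resolveOrder({
  total: ["price", "quantity"],
  price: [],
  quantity: [],
  totalWithTax: ["total"],
});
// fieldOrder is ["price", "quantity", "total", "totalWithTax"]
```

Generating fields in this order guarantees that a computed field's inputs already exist when it is evaluated, and the cycle check turns an unsatisfiable definition into an early error.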
Quality layer
Row-level and dataset-level quality scoring, distribution analysis and gap detection.
- Per-row constraint violation flags
- Composite Quality Score (0–100)
- Distribution match against Blueprint specification
- Gap Analysis with coverage recommendations
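To make per-row flags and a 0–100 score concrete, here is a deliberately simple sketch. The scoring formula (percentage of rows with no violations) is an assumption chosen for illustration; the components and weights of the actual Composite Quality Score are not described on this page.

```typescript
// A named check that a row either passes or violates. Names are hypothetical.
type Check<T> = { name: string; passes: (row: T) => boolean };

// Returns the names of the checks a row violates (empty array means a clean row).
function violationFlags<T>(row: T, checks: Check<T>[]): string[] {
  return checks.filter((check) => !check.passes(row)).map((check) => check.name);
}

// Illustrative score only: percentage of rows with no violations, rounded.
function qualityScore<T>(rows: T[], checks: Check<T>[]): number {
  if (rows.length === 0) return 0; // assumption: an empty dataset scores 0
  const clean = rows.filter((row) => violationFlags(row, checks).length === 0).length;
  return Math.round((100 * clean) / rows.length);
}

// Usage: one of four rows breaks the age range, so the score is 75.
const ageChecks: Check<{ age: number }>[] = [
  { name: "age_min", passes: (row) => row.age >= 0 },
  { name: "age_max", passes: (row) => row.age <= 120 },
];
const sampleRows = [{ age: 30 }, { age: 45 }, { age: 150 }, { age: 61 }];
const flagsForThird = violationFlags(sampleRows[2], ageChecks); // ["age_max"]
const score = qualityScore(sampleRows, ageChecks); // 75
```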
Export layer
Streaming export to CSV, JSONL and Parquet formats.
- CSV: streaming write via Node.js stream
- JSONL: line-by-line write for LLM fine-tuning
- Parquet: row-by-row write via parquetjs-lite
- OpenAI chat format export for fine-tuning
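A sketch of line-by-line JSONL output in the OpenAI chat fine-tuning shape (one JSON object with a `messages` array per line). The row columns `question` and `answer`, the system prompt and the helper names are assumptions for this example.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };
type ChatExample = { messages: ChatMessage[] };
type QaRow = { question: string; answer: string }; // hypothetical row columns

// Maps one generated row to a single chat-format training example.
function toChatExample(row: QaRow, systemPrompt: string): ChatExample {
  return {
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: row.question },
      { role: "assistant", content: row.answer },
    ],
  };
}

// Lazily yields one JSONL line per record; JSON.stringify escapes embedded
// newlines, so each record stays on a single line.
function* toJsonlLines(records: Iterable<unknown>): Generator<string> {
  for (const record of records) yield JSON.stringify(record) + "\n";
}

const qaRows: QaRow[] = [
  { question: "What is a seed?", answer: "A number that fixes the random sequence." },
  { question: "List two formats.", answer: "CSV\nParquet" },
];
const jsonlLines: string[] = [];
const examples = qaRows.map((row) => toChatExample(row, "You are a helpful assistant."));
for (const line of toJsonlLines(examples)) {
  jsonlLines.push(line);
}
```

In Node.js, each yielded line could be passed to a writable stream's `write` method instead of being collected in an array, which keeps the export streaming.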