Resources
Documentation
Technical reference for the LiteSeed platform — Blueprint schema, API reference, generation engine, export formats and integration guides.
Getting started
Quick start: generate your first dataset
Guide
Create a project, upload a seed file, review the extracted Blueprint and generate your first dataset in under 5 minutes.
Blueprint schema v1 reference
Reference
Complete reference for the Blueprint JSON schema: field types, distribution parameters, constraint types and generation policies.
Generation modes: Sandbox vs Training
Concept
When to use Sandbox mode (test data for software development, up to 10k rows) and when to use Training mode (ML/AI training data at scale, up to 1M rows).
Core concepts
Blueprints and versioning
Concept
How Blueprints are structured, versioned and linked to dataset versions. Includes Blueprint hash verification and parent-child lineage.
Statistical distributions
Reference
Reference for all 8 supported distributions: normal, lognormal, gamma, uniform, poisson, categorical, rare_event, mixture.
Constraint system
Reference
Hard and soft constraints, the 6 constraint types (formula, range, regex, date_order, not_null, enum_only) and resampling behaviour.
Deterministic reproducibility
Concept
How the Mulberry32 PRNG, randomSeed and blueprintHash guarantee bit-for-bit reproducibility across runs.
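As an illustration of why a seeded PRNG gives repeatable output, here is a minimal Python port of the publicly known Mulberry32 algorithm. It is a sketch, not LiteSeed's code: the function name `mulberry32` and the way a seed is passed in are assumptions, and how the platform combines randomSeed with blueprintHash is not described on this page.

```python
MASK32 = 0xFFFFFFFF  # keep every intermediate value inside 32 bits


def mulberry32(seed: int):
    """Return a function that yields floats in [0, 1) from a 32-bit seed.

    Hypothetical helper for illustration; mirrors the common JavaScript
    reference version of Mulberry32 using explicit 32-bit masking.
    """
    state = seed & MASK32

    def next_float() -> float:
        nonlocal state
        # Advance the state by a fixed odd constant (mod 2**32).
        state = (state + 0x6D2B79F5) & MASK32
        t = state
        # Two multiply-xorshift mixing rounds, each truncated to 32 bits.
        t = ((t ^ (t >> 15)) * (t | 1)) & MASK32
        t ^= (t + ((t ^ (t >> 7)) * (t | 61))) & MASK32
        # Final xorshift, then scale to the unit interval.
        return ((t ^ (t >> 14)) & MASK32) / 4294967296

    return next_float


# Two generators built from the same seed produce the same sequence.
run_a = mulberry32(12345)
run_b = mulberry32(12345)
seq_a = [run_a() for _ in range(5)]
seq_b = [run_b() for _ in range(5)]
print(seq_a == seq_b)
```

Because the generator's whole state is one 32-bit integer derived from the seed, replaying the same seed against the same generation rules reproduces the same values in the same order.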
Streaming generation architecture
Architecture
How the chunk-based streaming engine (CHUNK_SIZE=10,000) generates 5M+ row datasets with ≤256 MB peak RAM.
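The general idea behind chunk-based streaming can be sketched with a Python generator that holds only one chunk in memory at a time. Only the CHUNK_SIZE value comes from this page. The names `generate_chunks` and `write_csv`, the two-column row layout, and the use of Python's built-in `random.Random` (rather than Mulberry32) are assumptions made for illustration.

```python
import csv
import io
import random
from typing import Dict, Iterator, List

CHUNK_SIZE = 10_000  # value quoted in the card above


def generate_chunks(total_rows: int, seed: int,
                    chunk_size: int = CHUNK_SIZE) -> Iterator[List[Dict[str, object]]]:
    """Yield rows in fixed-size chunks so memory use tracks chunk_size, not total_rows."""
    rng = random.Random(seed)  # stand-in PRNG, seeded for repeatability
    produced = 0
    while produced < total_rows:
        n = min(chunk_size, total_rows - produced)
        # Hypothetical schema: a running id and one normally distributed value.
        yield [
            {"id": produced + i, "amount": round(rng.gauss(100.0, 15.0), 2)}
            for i in range(n)
        ]
        produced += n


def write_csv(stream, total_rows: int, seed: int) -> int:
    """Write each chunk to the stream as it is produced; return rows written."""
    writer = csv.DictWriter(stream, fieldnames=["id", "amount"])
    writer.writeheader()
    written = 0
    for chunk in generate_chunks(total_rows, seed):
        writer.writerows(chunk)  # the chunk can be garbage-collected after this
        written += len(chunk)
    return written


buffer = io.StringIO()
print(write_csv(buffer, 25_000, seed=42))
```

In this sketch peak memory is bounded by the size of one chunk plus the output buffer, which is why the approach scales to row counts far larger than what fits in RAM when the output is a file rather than an in-memory buffer.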
Dataset OS: versioning and lineage
Concept
Dataset versioning, the lineage graph (Seed → Blueprint → Versions), experiment tracking and the Dataset OS primitives.
Export and integration
Export formats
Reference
CSV, JSONL, Parquet, SQLite, ZIP, SQL dump: format specifications, file structure and download API.
Cloud-Assisted Masking
Reference
How PII detection and deterministic masking work, which field types are detected and how to enable or disable the feature.
Experiment tracking API
Reference
How to link dataset versions to model training runs, upload model metrics and retrieve dataset recommendations.
