Resources

Documentation

Technical reference for the LiteSeed platform — Blueprint schema, API reference, generation engine, export formats and integration guides.

Getting started

Quick start: generate your first dataset

Guide

Create a project, upload a seed file, review the extracted Blueprint and generate your first dataset in under 5 minutes.

Blueprint schema v1 reference

Reference

Complete reference for the Blueprint JSON schema — field types, distribution parameters, constraint types and generation policies.


Generation modes: Sandbox vs Training

Concept

When to use Sandbox mode (test data for software development, up to 10k rows) and when to use Training mode (ML/AI training data at scale, up to 1M rows).

Core concepts

Blueprints and versioning

Concept

How Blueprints are structured, versioned and linked to dataset versions. Includes Blueprint hash verification and parent-child lineage.
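
As a rough illustration of Blueprint hash verification, the sketch below hashes a canonical JSON serialisation of a Blueprint and compares it with a stored value. The hash algorithm (SHA-256), the canonicalisation rules and the function names `blueprint_hash` and `verify_blueprint` are assumptions made for this example, not the platform's documented behaviour.

```python
import hashlib
import hmac
import json


def blueprint_hash(blueprint: dict) -> str:
    """Hash a Blueprint via a canonical JSON form (assumed scheme, not LiteSeed's)."""
    # Sorting keys and fixing separators makes the serialisation independent
    # of key order and whitespace, so equal Blueprints hash identically.
    canonical = json.dumps(
        blueprint, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_blueprint(blueprint: dict, expected_hash: str) -> bool:
    """Return True if the Blueprint still matches the hash recorded earlier."""
    return hmac.compare_digest(blueprint_hash(blueprint), expected_hash)
```

Under this scheme any change to a field definition produces a different hash, which is what would let a dataset version be tied to the exact Blueprint that produced it.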

Statistical distributions

Reference

Reference for all 8 supported distributions: normal, lognormal, gamma, uniform, poisson, categorical, rare_event, mixture.


Constraint system

Reference

Hard and soft constraints, the 6 constraint types (formula, range, regex, date_order, not_null, enum_only) and resampling behaviour.

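
One plausible reading of resampling under hard and soft constraints is sketched below: a value is redrawn until it satisfies the constraint; a hard constraint fails once the attempt budget is exhausted, while a soft constraint keeps the last draw. These semantics, the `max_attempts` budget and the function name `sample_with_constraint` are assumptions for illustration. The reference page defines the actual behaviour.

```python
import random
from typing import Callable


def sample_with_constraint(
    draw: Callable[[], float],
    is_valid: Callable[[float], bool],
    hard: bool,
    max_attempts: int = 100,
) -> float:
    """Redraw until the constraint holds (assumed semantics, see lead-in)."""
    value = draw()
    for _ in range(max_attempts - 1):
        if is_valid(value):
            return value
        value = draw()  # resample and try again
    if is_valid(value) or not hard:
        # Soft constraints tolerate a violation once the budget is spent.
        return value
    raise ValueError(f"hard constraint not satisfied after {max_attempts} attempts")


# Example: a range constraint (0 to 100) on a normally distributed field.
rng = random.Random(7)
amount = sample_with_constraint(
    draw=lambda: rng.gauss(50, 20),
    is_valid=lambda v: 0 <= v <= 100,
    hard=True,
)
```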

Deterministic reproducibility

Concept

How the Mulberry32 PRNG, randomSeed and blueprintHash guarantee bit-for-bit reproducibility across runs.

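
For readers unfamiliar with Mulberry32, here is a minimal Python port of the widely circulated public-domain generator, using explicit 32-bit masking in place of JavaScript's integer semantics. It only demonstrates that a fixed seed yields a fixed sequence. How the platform derives its seed from `randomSeed` and `blueprintHash` is not described on this page and is not modelled here.

```python
MASK32 = 0xFFFFFFFF


def mulberry32(seed: int):
    """Return a function producing floats in [0, 1) from a 32-bit seed."""
    state = seed & MASK32

    def next_float() -> float:
        nonlocal state
        # Advance the 32-bit state by the Mulberry32 increment.
        state = (state + 0x6D2B79F5) & MASK32
        t = state
        # The two multiply steps are truncated to 32 bits (Math.imul in JS).
        t = ((t ^ (t >> 15)) * (t | 1)) & MASK32
        t ^= (t + ((t ^ (t >> 7)) * (t | 61))) & MASK32
        return ((t ^ (t >> 14)) & MASK32) / 4294967296

    return next_float


# Two generators built from the same seed produce identical sequences.
first_run = mulberry32(12345)
second_run = mulberry32(12345)
same = [first_run() for _ in range(5)] == [second_run() for _ in range(5)]
```

Because the generator keeps only integer state and uses no platform-dependent operations, the same seed gives the same values on any machine, which is the property a reproducibility guarantee rests on.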

Streaming generation architecture

Architecture

How the chunk-based streaming engine (CHUNK_SIZE=10,000) generates 5M+ row datasets with ≤256 MB peak RAM.

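
The idea behind chunk-based streaming can be sketched with a Python generator that yields at most `CHUNK_SIZE` rows at a time, so a consumer can write each chunk out and discard it before the next one is built. The row shape, the `generate_chunks` name and the use of Python's `random` module in place of the platform's PRNG are assumptions for this example. It makes no claim about the engine's actual memory figures.

```python
import random
from typing import Dict, Iterator, List

CHUNK_SIZE = 10_000  # chunk size quoted in the description above


def generate_chunks(
    total_rows: int, seed: int, chunk_size: int = CHUNK_SIZE
) -> Iterator[List[Dict[str, float]]]:
    """Yield rows in bounded chunks instead of building one large list."""
    rng = random.Random(seed)  # stand-in PRNG for the sketch
    for start in range(0, total_rows, chunk_size):
        stop = min(start + chunk_size, total_rows)
        # Only this chunk is held in memory until the consumer moves on.
        yield [{"id": i, "amount": rng.random()} for i in range(start, stop)]


# Consume the stream chunk by chunk, for example to write each one to disk.
rows_seen = 0
for chunk in generate_chunks(25_000, seed=42):
    rows_seen += len(chunk)
```

Peak memory then depends on the chunk size and row width rather than on the total number of rows requested.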

Dataset OS: versioning and lineage

Concept

Dataset versioning, the lineage graph (Seed → Blueprint → Versions), experiment tracking and the Dataset OS primitives.


Export and integration

Export formats

Reference

Format specifications, file structure and download API for the CSV, JSONL, Parquet, SQLite, ZIP and SQL dump exports.


Cloud-Assisted Masking

Reference

How PII detection and deterministic masking work, which field types are detected and how to enable or disable the feature.

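
Deterministic masking generally means that the same input always maps to the same masked token. The sketch below shows one common way to get that property, a keyed HMAC over each detected value, with a simple regular expression standing in for PII detection. The email pattern, the `masked_` token format and the function names are assumptions for illustration and run locally. They do not describe how the Cloud-Assisted Masking feature is implemented.

```python
import hashlib
import hmac
import re

# Deliberately simple email pattern, used only as a stand-in detector.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_value(value: str, key: bytes) -> str:
    """Map a value to a stable token: same key and value give the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"masked_{digest[:12]}"


def mask_emails(text: str, key: bytes) -> str:
    """Replace every detected email address with its deterministic token."""
    return EMAIL_RE.sub(lambda match: mask_value(match.group(0), key), text)


masked = mask_emails("contact alice@example.com now", b"demo-key")
```

Because the token depends only on the key and the original value, the same address masked in two different tables still joins correctly after masking.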

Experiment tracking API

Reference

How to link dataset versions to model training runs, upload model metrics and retrieve dataset recommendations.