
Research

The technical research and academic foundations behind LiteSeed's approach to synthetic data generation.

Research foundations

Grounded in statistical theory and ML research.

LiteSeed's generation engine, quality scoring and reproducibility system are built on established statistical methods and ML research. These references document the theoretical foundations.

Statistical distributions in synthetic data

The theoretical basis for LiteSeed's 8 distribution types — normal, lognormal, gamma, uniform, poisson, categorical, rare_event and mixture — and their application to ML training data generation.
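As a rough illustration of the eight distribution types named above, the sketch below maps each name to a seeded sampler built on Python's standard `random` module. All parameter values, the `make_samplers` and `sample_poisson` helpers, and the reading of `rare_event` as a low-probability Bernoulli draw and `mixture` as a two-component Gaussian blend are assumptions made for illustration, not LiteSeed's actual configuration or API.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw a Poisson variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def make_samplers(rng):
    """Return one zero-argument sampler per distribution name (parameters are hypothetical)."""
    return {
        "normal": lambda: rng.gauss(0.0, 1.0),
        "lognormal": lambda: rng.lognormvariate(0.0, 0.5),
        "gamma": lambda: rng.gammavariate(2.0, 1.5),
        "uniform": lambda: rng.uniform(0.0, 10.0),
        "poisson": lambda: sample_poisson(rng, 3.0),
        "categorical": lambda: rng.choices(["a", "b", "c"], weights=[0.6, 0.3, 0.1])[0],
        # Assumed meaning: a boolean flag that is True with small probability.
        "rare_event": lambda: rng.random() < 0.01,
        # Assumed meaning: pick a component by weight, then sample from it.
        "mixture": lambda: rng.gauss(0.0, 1.0) if rng.random() < 0.7 else rng.gauss(5.0, 0.5),
    }


def generate_row(seed):
    """Generate one value per distribution type from a fixed seed."""
    samplers = make_samplers(random.Random(seed))
    return {name: draw() for name, draw in samplers.items()}
```

Calling `generate_row(42)` twice returns identical rows, because every draw comes from one generator seeded with the same value. This sketch uses Python's built-in generator rather than Mulberry32.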

Deterministic pseudo-random number generation

The Mulberry32 PRNG algorithm: its design, statistical properties and performance characteristics, and why it was chosen over alternatives for reproducible dataset generation.
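For readers unfamiliar with the algorithm, here is a minimal Python port of the widely circulated public Mulberry32 routine. It is a sketch of the general algorithm, not LiteSeed's own implementation. The `mulberry32` factory name and the `_imul32` helper are illustrative, and explicit masking stands in for the 32-bit integer wraparound the original relies on.

```python
MASK32 = 0xFFFFFFFF


def _imul32(a, b):
    """Low 32 bits of a * b, mirroring 32-bit integer multiplication."""
    return (a * b) & MASK32


def mulberry32(seed):
    """Return a function that yields floats in [0, 1) from a 32-bit seed."""
    state = seed & MASK32

    def next_float():
        nonlocal state
        # Advance the 32-bit state by a fixed odd increment.
        state = (state + 0x6D2B79F5) & MASK32
        t = state
        # Scramble the state with xor-shifts and multiplications.
        t = _imul32(t ^ (t >> 15), t | 1)
        t ^= (t + _imul32(t ^ (t >> 7), t | 61)) & MASK32
        # Final xor-shift, then scale the 32-bit result into [0, 1).
        return ((t ^ (t >> 14)) & MASK32) / 4294967296

    return next_float
```

Two generators created with the same seed produce the same sequence, which is the property that makes a seeded dataset reproducible. The whole state is a single 32-bit integer, so the period is at most 2^32.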

Quality scoring for synthetic datasets

The methodology behind LiteSeed's composite Quality Score — constraint compliance, distribution fidelity and coverage completeness — and its relationship to downstream model performance.
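One plausible way to combine the three components named above is a weighted average of per-component scores in the range 0 to 1. The sketch below is only an assumption about how such a composite could be computed. The function names, the equal default weights, and the use of total variation distance for categorical fidelity are all hypothetical and are not taken from LiteSeed's methodology.

```python
from collections import Counter


def constraint_compliance(rows, constraints):
    """Fraction of rows that satisfy every constraint (each a row -> bool callable)."""
    if not rows:
        return 0.0
    passing = sum(all(check(row) for check in constraints) for row in rows)
    return passing / len(rows)


def distribution_fidelity(observed, target_probs):
    """1 minus the total variation distance between observed and target frequencies."""
    if not observed:
        return 0.0
    counts = Counter(observed)
    total = len(observed)
    categories = set(counts) | set(target_probs)
    tvd = 0.5 * sum(
        abs(counts[c] / total - target_probs.get(c, 0.0)) for c in categories
    )
    return 1.0 - tvd


def coverage_completeness(observed, expected):
    """Fraction of expected categories that appear at least once."""
    expected = set(expected)
    if not expected:
        return 1.0
    return len(expected & set(observed)) / len(expected)


def quality_score(compliance, fidelity, coverage, weights=(1.0, 1.0, 1.0)):
    """Weighted average of the three components (weights are hypothetical)."""
    components = (compliance, fidelity, coverage)
    return sum(w * c for w, c in zip(weights, components)) / sum(weights)
```

For example, a dataset where half the rows pass their constraints, the categorical frequencies match the target exactly, and half of the expected categories appear would score (0.5 + 1.0 + 0.5) / 3 under equal weights.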

Rare event simulation for ML training

The challenge of underrepresented scenarios in real-world training data and the statistical approach to controlled rare event injection.
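Controlled injection can be read as fixing the number of rare rows in advance rather than leaving it to chance. The sketch below assumes that reading: it marks exactly `round(n_rows * rate)` positions, chosen by a seeded shuffle, as rare events. The `inject_rare_events` function and its parameters are hypothetical and only illustrate the idea.

```python
import random


def inject_rare_events(n_rows, rate, seed):
    """Return a list of n_rows booleans with an exact, seeded quota of True values."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)
    n_events = round(n_rows * rate)
    # Choose which row positions carry the rare event, without replacement.
    positions = set(rng.sample(range(n_rows), n_events))
    return [i in positions for i in range(n_rows)]
```

With `n_rows=1000` and `rate=0.02`, exactly 20 rows are flagged on every run with the same seed. Independent per-row Bernoulli draws would only match that rate on average, and in small datasets could produce no rare rows at all.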

Privacy-preserving synthetic data generation

The distinction between data anonymisation and synthetic data generation, and why Blueprint-based generation provides stronger privacy guarantees than anonymisation.
