Research
The technical research and academic foundations behind LiteSeed's approach to synthetic data generation.
Research foundations
Grounded in statistical theory and ML research.
LiteSeed's generation engine, quality scoring and reproducibility system are built on established statistical methods. The references below document the theory behind each of them.
Statistical distributions in synthetic data
The theoretical basis for LiteSeed's 8 distribution types — normal, lognormal, gamma, uniform, poisson, categorical, rare_event and mixture — and their application to ML training data generation.
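As a rough illustration of what sampling from these eight types could look like, here is a minimal sketch using only Python's standard library. The `spec` dictionary layout, the parameter names and the reading of `rare_event` as a low-probability boolean flag are assumptions made for this example, not LiteSeed's actual Blueprint schema or engine.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample(rng, spec):
    """Draw one value from a hypothetical distribution spec."""
    kind = spec["type"]
    if kind == "normal":
        return rng.gauss(spec["mean"], spec["std"])
    if kind == "lognormal":
        return rng.lognormvariate(spec["mu"], spec["sigma"])
    if kind == "gamma":
        # random.gammavariate takes (shape, scale)
        return rng.gammavariate(spec["shape"], spec["scale"])
    if kind == "uniform":
        return rng.uniform(spec["low"], spec["high"])
    if kind == "poisson":
        return sample_poisson(rng, spec["lam"])
    if kind == "categorical":
        return rng.choices(spec["values"], weights=spec["weights"], k=1)[0]
    if kind == "rare_event":
        # Assumed meaning: a flag that is True with a small probability
        return rng.random() < spec["rate"]
    if kind == "mixture":
        # Pick one component by weight, then sample from that component
        component = rng.choices(spec["components"], weights=spec["weights"], k=1)[0]
        return sample(rng, component)
    raise ValueError(f"unknown distribution type: {kind}")


rng = random.Random(42)  # seeded so repeated runs give the same values
income = {"type": "lognormal", "mu": 10.0, "sigma": 0.5}
print([round(sample(rng, income), 2) for _ in range(3)])
```

Passing the generator in explicitly, rather than relying on the module-level `random` state, keeps every column's values tied to a single seed.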
Deterministic pseudo-random number generation
The Mulberry32 PRNG algorithm: design, statistical properties and performance characteristics. Why it was chosen over alternatives for reproducible dataset generation.
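For readers unfamiliar with the algorithm, the sketch below ports the commonly circulated reference form of Mulberry32 to Python. The original is usually written in JavaScript with 32-bit integer operations, so the explicit `& 0xFFFFFFFF` masks here stand in for that wrap-around. This is an illustrative port, not LiteSeed's own source, and the function name `mulberry32` is chosen for the example.

```python
MASK32 = 0xFFFFFFFF


def mulberry32(seed):
    """Return a function that yields floats in [0, 1) from a 32-bit seed."""
    state = seed & MASK32

    def next_float():
        nonlocal state
        # Advance the 32-bit counter by a fixed odd increment
        state = (state + 0x6D2B79F5) & MASK32
        t = state
        # Two multiply-xorshift mixing rounds, each truncated to 32 bits
        t = ((t ^ (t >> 15)) * (t | 1)) & MASK32
        t ^= (t + ((t ^ (t >> 7)) * (t | 61))) & MASK32
        # Final xorshift, then scale the 32-bit result into [0, 1)
        return ((t ^ (t >> 14)) & MASK32) / 4294967296

    return next_float


first = mulberry32(2024)
second = mulberry32(2024)
# Same seed, same sequence: the property reproducible generation relies on
print([first() for _ in range(3)] == [second() for _ in range(3)])
```

The generator's entire state is one 32-bit integer, which is what makes it cheap to store a seed alongside a dataset and regenerate the same values later.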
Quality scoring for synthetic datasets
The methodology behind LiteSeed's composite Quality Score — constraint compliance, distribution fidelity and coverage completeness — and its relationship to downstream model performance.
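To make the three components concrete, here is a minimal sketch of how such a composite might be assembled. The equal default weights, the 0 to 1 scale, the helper names and the choice to pass distribution fidelity in as a precomputed number are all assumptions for illustration; the actual formula behind the Quality Score is not given in this overview.

```python
def constraint_compliance(rows, constraints):
    """Fraction of rows that satisfy every constraint (each a row -> bool callable)."""
    if not rows:
        return 0.0
    passing = sum(1 for row in rows if all(check(row) for check in constraints))
    return passing / len(rows)


def coverage_completeness(rows, column, expected_values):
    """Fraction of the expected values that appear at least once in the column."""
    expected = set(expected_values)
    seen = {row[column] for row in rows}
    return len(seen & expected) / len(expected)


def composite_quality_score(compliance, fidelity, coverage, weights=(1.0, 1.0, 1.0)):
    """Weighted mean of three component scores, each assumed to lie in [0, 1]."""
    components = (compliance, fidelity, coverage)
    return sum(w * c for w, c in zip(weights, components)) / sum(weights)


rows = [
    {"age": 30, "plan": "free"},
    {"age": -1, "plan": "pro"},  # violates the age constraint
]
compliance = constraint_compliance(rows, [lambda r: r["age"] >= 0])
coverage = coverage_completeness(rows, "plan", ["free", "pro", "team"])
# fidelity would come from a goodness-of-fit measure; 1.0 is a placeholder here
print(round(composite_quality_score(compliance, 1.0, coverage), 3))
```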
Rare event simulation for ML training
The challenge of underrepresented scenarios in real-world training data and the statistical approach to controlled rare event injection.
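One simple way to picture controlled injection: instead of flipping a biased coin per row, which leaves the realised count to chance, fix the number of rare rows up front and choose their positions with a seeded generator. The sketch below does this with Python's standard library; the function name and the flag-per-row output are assumptions for the example, not a description of LiteSeed's implementation.

```python
import random


def inject_rare_events(n_rows, rate, seed):
    """Return one boolean per row, with exactly round(n_rows * rate) rows flagged."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)
    n_events = round(n_rows * rate)
    # Sample row positions without replacement so the count is exact
    event_rows = set(rng.sample(range(n_rows), n_events))
    return [i in event_rows for i in range(n_rows)]


flags = inject_rare_events(n_rows=10_000, rate=0.01, seed=123)
print(sum(flags))  # number of rows flagged as rare events
```

Fixing the count makes the rare class size predictable for small datasets, where per-row sampling at a 1% rate could easily produce far fewer or far more events than intended.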
Privacy-preserving synthetic data generation
The distinction between data anonymisation and synthetic data generation, and why Blueprint-based generation provides stronger privacy guarantees than anonymisation.