Resources
Research and benchmarks
Technical research on synthetic data generation, reproducibility, and the Dataset OS concept, including benchmark results and methodology.
Platform benchmarks
Generation throughput
Measured in a single Node.js process running on one core, using a 10-field Blueprint with 2 constraints.
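A throughput figure of this kind can be reproduced with a small timing harness. The sketch below is illustrative only: `measureThroughput` and `tenFieldRow` are hypothetical names, and the row builder is a stand-in for a real 10-field Blueprint (it applies no constraints), since the platform's API is not shown here.

```typescript
// Hypothetical row shape; the real Blueprint output format is not shown in the source.
type Row = Record<string, number | string>;

interface ThroughputResult {
  rows: number;
  fieldsSeen: number;
  elapsedMs: number;
  rowsPerSecond: number;
}

// Times `rowCount` calls to `makeRow` and reports rows per second.
function measureThroughput(
  makeRow: (index: number) => Row,
  rowCount: number
): ThroughputResult {
  const start = Date.now();
  let fieldsSeen = 0; // touch every row so the work is actually performed
  for (let i = 0; i < rowCount; i++) {
    fieldsSeen += Object.keys(makeRow(i)).length;
  }
  // Date.now() has millisecond resolution; clamp to 1 ms to avoid dividing by zero.
  const elapsedMs = Math.max(Date.now() - start, 1);
  return {
    rows: rowCount,
    fieldsSeen,
    elapsedMs,
    rowsPerSecond: (rowCount / elapsedMs) * 1000,
  };
}

// Stand-in for a 10-field Blueprint (assumption: plain numeric fields, no constraints).
function tenFieldRow(index: number): Row {
  const row: Row = {};
  for (let f = 0; f < 10; f++) {
    row[`field${f}`] = index * 10 + f;
  }
  return row;
}

const result = measureThroughput(tenFieldRow, 100_000);
```

For stable numbers, it usually helps to run a warm-up pass first and to use a row count large enough that the elapsed time is well above timer resolution.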
Memory usage
Peak resident set size (RSS), measured during generation with the chunk-based streaming architecture (CHUNK_SIZE=10,000).
Target: ≤ 256 MB peak RSS for runs of 5M+ rows (verified)
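The idea behind chunk-based streaming can be sketched with a generator that builds at most CHUNK_SIZE rows at a time, so memory use depends on the chunk size rather than the total row count. This is a minimal sketch, not the platform's implementation: `makeRow`, `generateChunks`, and `countRows` are hypothetical names, and the consumer here only counts rows where a real one would write each chunk to a file or socket.

```typescript
const CHUNK_SIZE = 10_000; // value quoted in the text

type Row = { id: number; value: number };

// Hypothetical stand-in for the real row generator.
function makeRow(index: number): Row {
  return { id: index, value: (index * 31) % 997 };
}

// Yields rows in arrays of at most `chunkSize`. Only the current chunk is
// referenced, so earlier chunks can be garbage-collected once consumed.
function* generateChunks(
  totalRows: number,
  chunkSize: number = CHUNK_SIZE
): Generator<Row[]> {
  for (let start = 0; start < totalRows; start += chunkSize) {
    const end = Math.min(start + chunkSize, totalRows);
    const chunk: Row[] = [];
    for (let i = start; i < end; i++) {
      chunk.push(makeRow(i));
    }
    yield chunk;
  }
}

// Consumer: processes each chunk and lets it go before requesting the next.
function countRows(totalRows: number) {
  let rows = 0;
  let chunks = 0;
  let largestChunk = 0;
  for (const chunk of generateChunks(totalRows)) {
    rows += chunk.length;
    chunks += 1;
    largestChunk = Math.max(largestChunk, chunk.length);
  }
  return { rows, chunks, largestChunk };
}

const summary = countRows(25_000);
```

With 25,000 rows and the default chunk size, the generator produces two full chunks and one partial chunk, and no chunk exceeds 10,000 rows.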
Reproducibility verification
Same seed → identical output
Generating the same Blueprint with the same seed on two separate runs produces bit-for-bit identical datasets.
Verified across 42 automated tests
Different seed → different output
Generating the same Blueprint with different seeds produces statistically similar but distinct datasets.
Verified: seed 42 ≠ seed 43
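A check of this kind can be sketched by serializing each run and comparing checksums. Everything below is an assumption made for illustration: `seededRng` is a simple 32-bit linear congruential generator standing in for the platform's PRNG, `generateCsv` is a hypothetical two-column generator, and `fnv1a` is a non-cryptographic FNV-1a checksum (a cryptographic hash such as SHA-256 would be the stronger choice for real verification).

```typescript
// Stand-in seeded generator (32-bit LCG), NOT the platform's PRNG.
function seededRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // float in [0, 1)
  };
}

// Hypothetical generator: renders `rowCount` rows as CSV text for a given seed.
function generateCsv(seed: number, rowCount: number): string {
  const rng = seededRng(seed);
  const lines: string[] = ["id,score"];
  for (let i = 0; i < rowCount; i++) {
    lines.push(`${i},${Math.floor(rng() * 100)}`);
  }
  return lines.join("\n");
}

// FNV-1a 32-bit checksum over the string's UTF-16 code units.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

const runA = fnv1a(generateCsv(42, 1000));
const runB = fnv1a(generateCsv(42, 1000));
const runC = fnv1a(generateCsv(43, 1000));

const sameSeedMatches = runA === runB; // same seed, same serialized output
const differentSeedDiffers = runA !== runC; // different seed, different output
```

Comparing checksums rather than full datasets keeps the check cheap and makes it easy to compare runs made on different machines.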
Cross-machine determinism
The Mulberry32 PRNG updates its state using only 32-bit integer arithmetic, so it produces identical output on any machine, with no platform-specific floating-point behaviour.
Verified: Node.js 22 on Linux/macOS
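For reference, the widely circulated public-domain form of Mulberry32 is shown below. The platform's exact implementation is not reproduced here, so treat this as a sketch of the algorithm, not the shipped code. Every step is a 32-bit integer operation (`Math.imul`, shifts, XOR), and the final division by 2^32 is exact in IEEE 754 doubles, which is why the output does not vary between platforms.

```typescript
// Mulberry32: a small seeded PRNG with 32 bits of state.
// Returns a function that yields floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0; // advance the state, wrapping at 32 bits
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // unsigned 32-bit value / 2^32
  };
}

// Draws `count` values from a fresh generator with the given seed.
function sample(seed: number, count: number): number[] {
  const next = mulberry32(seed);
  const values: number[] = [];
  for (let i = 0; i < count; i++) {
    values.push(next());
  }
  return values;
}

const first = sample(42, 5);
const second = sample(42, 5); // same seed: identical sequence
const other = sample(43, 5); // different seed: different sequence
```

Because the generator carries its own state, each Blueprint run can own a separate instance seeded from the run's seed, without depending on `Math.random()`.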
Test coverage
Total tests: 42
Distribution tests: 18
Determinism tests: 5
Constraint tests: 8
Mutation tests: 4
Engine integration tests: 7
