Resources

Research and benchmarks

Technical research on synthetic data generation, reproducibility and the Dataset OS concept — including benchmark results and methodology.

Platform benchmarks

Generation time, measured on a standard single-core Node.js process with a 10-field Blueprint that includes 2 constraints.

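100k rows: ~2 s

500k rows: ~10 s

1M rows: ~20 s

5M rows: ~100 s

Across these runs, generation time scales roughly linearly with row count, at about 50,000 rows per second.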
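A timing measurement of this kind could be sketched as follows. This is a minimal illustration, not the platform's benchmark harness: `timeGeneration` and `makeRow` are hypothetical names, and the 10-field row is a stand-in for a real Blueprint (it applies no constraints).

```javascript
// Hypothetical sketch: times a row-generation loop in a single Node.js process.
// `makeRow` stands in for Blueprint evaluation; it is not the platform's API.
function makeRow(i) {
  // Build a 10-field row to mirror the 10-field Blueprint in the text.
  const row = new Array(10);
  for (let f = 0; f < 10; f++) row[f] = (i * 31 + f) % 1000;
  return row;
}

function timeGeneration(totalRows, rowFactory) {
  const start = process.hrtime.bigint(); // monotonic clock, nanoseconds
  let fieldCount = 0;
  for (let i = 0; i < totalRows; i++) {
    // Use each row so the work cannot be skipped as dead code.
    fieldCount += rowFactory(i).length;
  }
  const seconds = Number(process.hrtime.bigint() - start) / 1e9;
  return { totalRows, fieldCount, seconds, rowsPerSecond: totalRows / seconds };
}

const timing = timeGeneration(100_000, makeRow);
console.log(`${timing.totalRows} rows in ${timing.seconds.toFixed(3)} s`);
```

Absolute numbers from a sketch like this will differ from the figures above, since the stand-in row does far less work than a real Blueprint.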

Peak resident set size (RSS), measured during generation with the chunk-based streaming architecture (CHUNK_SIZE=10,000).

100k rows: ~48 MB

500k rows: ~72 MB

1M rows: ~96 MB

5M rows: ~192 MB

Target: ≤ 256 MB for 5M+ row runs — verified
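The idea behind chunk-based streaming can be sketched as below: rows are produced in fixed-size chunks, each chunk is handed to a consumer and then dropped, and RSS is sampled after every chunk. Only `CHUNK_SIZE` comes from the text; `generateChunks`, `runWithPeakRss`, and the row shape are hypothetical names chosen for illustration, and the real engine may be structured differently.

```javascript
// Hypothetical sketch of chunked generation with peak-RSS sampling.
const CHUNK_SIZE = 10_000; // chunk size quoted in the text

// Yield rows in fixed-size chunks so only one chunk is alive at a time.
function* generateChunks(totalRows, makeRow, chunkSize = CHUNK_SIZE) {
  for (let start = 0; start < totalRows; start += chunkSize) {
    const size = Math.min(chunkSize, totalRows - start);
    const chunk = new Array(size);
    for (let i = 0; i < size; i++) chunk[i] = makeRow(start + i);
    yield chunk;
  }
}

// Consume every chunk and record the highest RSS seen between chunks.
function runWithPeakRss(totalRows, makeRow, onChunk = () => {}) {
  let peakRss = process.memoryUsage().rss;
  let rows = 0;
  let chunks = 0;
  for (const chunk of generateChunks(totalRows, makeRow)) {
    onChunk(chunk); // e.g. serialize and write to an output stream
    rows += chunk.length;
    chunks += 1;
    peakRss = Math.max(peakRss, process.memoryUsage().rss);
  }
  return { rows, chunks, peakRssMb: peakRss / (1024 * 1024) };
}

const stats = runWithPeakRss(25_000, (i) => ({ id: i, value: i % 7 }));
console.log(`${stats.rows} rows in ${stats.chunks} chunks`);
```

Sampling after each chunk only approximates the true peak, which can occur mid-chunk. Operating-system tools give a tighter bound if that matters.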

Reproducibility verification

Generating the same Blueprint with the same seed on two separate runs produces bit-for-bit identical datasets.

Verified across 42 automated tests

Generating the same Blueprint with different seeds produces statistically similar but distinct datasets.

Verified: seed 42 ≠ seed 43

The Mulberry32 PRNG produces identical output on any machine: it relies only on 32-bit integer arithmetic, so there is no platform-specific floating-point behaviour.

Verified: Node.js 22 on Linux/macOS
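For reference, the widely circulated form of Mulberry32 is shown below with a small determinism check. Whether the platform uses exactly this variant, and how it derives per-field values from the stream, is an assumption. `sample` is a hypothetical helper added for illustration.

```javascript
// Mulberry32 in its commonly published form (assumed, not taken from the platform).
// State and mixing use 32-bit integer operations only (Math.imul, shifts, XOR).
function mulberry32(seed) {
  let a = seed | 0;
  return function next() {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    // Convert the unsigned 32-bit result to a float in [0, 1).
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: draw the first `n` values for a given seed.
function sample(seed, n) {
  const next = mulberry32(seed);
  return Array.from({ length: n }, () => next());
}

const runA = sample(42, 5);
const runB = sample(42, 5);
const runC = sample(43, 5);

// Same seed gives the same sequence; a different seed gives a different one.
console.log(runA.every((v, i) => v === runB[i]));
console.log(runA.some((v, i) => v !== runC[i]));
```

The final division by 2^32 is exact in IEEE 754 double precision, so the conversion to a float introduces no platform-dependent rounding.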

Test coverage

Total tests

Distribution tests

Determinism tests

Constraint tests

Mutation tests

Engine integration