LiteSeed


Research and benchmarks

Technical research on synthetic data generation, reproducibility, and the Dataset OS concept — including benchmark results and methodology.

Platform benchmarks

Generation throughput

Measured in a single-threaded Node.js process using a 10-field Blueprint with 2 constraints.

100k rows: ~2s
500k rows: ~10s
1M rows: ~20s
5M rows: ~100s

Memory usage

Peak RSS measured during generation with the chunk-based streaming architecture (CHUNK_SIZE=10,000).

100k rows: ~48 MB
500k rows: ~72 MB
1M rows: ~96 MB
5M rows: ~192 MB

Target: ≤ 256 MB for 5M+ row runs — verified
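The near-flat memory profile comes from generating rows in fixed-size chunks rather than materialising the full dataset at once. A minimal sketch of the idea follows; the `generateChunks` generator and the row shape are illustrative stand-ins, not LiteSeed's actual API:

```javascript
const CHUNK_SIZE = 10_000;

// Illustrative chunked generator: yields arrays of at most CHUNK_SIZE rows,
// so only one chunk is resident in memory at a time regardless of total rows.
function* generateChunks(totalRows, makeRow) {
  for (let start = 0; start < totalRows; start += CHUNK_SIZE) {
    const n = Math.min(CHUNK_SIZE, totalRows - start);
    const chunk = new Array(n);
    for (let i = 0; i < n; i++) chunk[i] = makeRow(start + i);
    yield chunk; // the caller streams the chunk out, then it can be GC'd
  }
}

// Example: walk 1M rows without ever holding them all at once.
let rows = 0;
for (const chunk of generateChunks(1_000_000, (i) => ({ id: i }))) {
  rows += chunk.length;
}
console.log(rows); // 1000000
```

Because each chunk is dropped before the next is built, peak memory scales with `CHUNK_SIZE` and per-row size rather than with the total row count.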

Reproducibility verification

Same seed → identical output

Generating the same Blueprint with the same seed on two separate runs produces bit-for-bit identical datasets.

Verified across 42 automated tests

Different seed → different output

Generating the same Blueprint with different seeds produces statistically similar but distinct datasets.

Verified: seed 42 ≠ seed 43

Cross-machine determinism

The Mulberry32 PRNG uses only 32-bit integer arithmetic, so it produces identical output on any machine, with no platform-specific floating-point behaviour.

Verified: Node.js 22 on Linux/macOS
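LiteSeed's exact integration isn't shown here, but the standard Mulberry32 algorithm itself is only a few lines. Every operation it uses (`Math.imul`, `|`, `^`, `>>>`) is specified on 32-bit integers, which is why the output stream cannot vary between platforms or JS engines:

```javascript
// Standard Mulberry32 seeded PRNG. Returns a function that yields a
// deterministic sequence of floats in [0, 1) for a given 32-bit seed.
function mulberry32(seed) {
  let a = seed >>> 0; // force the seed into a 32-bit unsigned integer
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t = (t + Math.imul(t ^ (t >>> 7), t | 61)) ^ t;
    // Map the 32-bit result into [0, 1). The division is by 2^32,
    // which is exact in IEEE 754, so even this step is deterministic.
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Same seed → identical sequence on every run and machine.
const r1 = mulberry32(42);
const r2 = mulberry32(42);
console.log(r1() === r2()); // true
```

The same property underpins the seed-42-vs-seed-43 check above: two generators seeded identically agree on every draw, while different seeds diverge from the first draw onward.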

Test coverage

Total tests: 42
Distribution tests: 18
Determinism tests: 5
Constraint tests: 8
Mutation tests: 4
Engine integration: 7