LiteSeed

Technology

Blueprint Engine

The schema definition and generation engine at the core of LiteSeed — a versioned, constraint-enforced specification for every dataset.


What is a Blueprint?

A Blueprint is a versioned dataset specification.

A Blueprint defines the complete structure of a dataset: field names, types, statistical distributions, constraints and generation policies. It is the single source of truth for all generation runs.
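To make the definition concrete, here is a minimal sketch of how such a specification could be modelled in Python. The class names (`Blueprint`, `FieldSpec`, `ConstraintSpec`) and all of their attributes are illustrative assumptions, not LiteSeed's actual schema format. Frozen dataclasses stand in for immutable versions: a new version is derived rather than edited in place.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Optional


@dataclass(frozen=True)
class FieldSpec:
    """One field in a hypothetical Blueprint: name, type and distribution."""
    name: str
    type: str                                   # e.g. "numeric", "categorical", "uuid"
    distribution: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class ConstraintSpec:
    """A rule generated rows must satisfy; hard rules block, soft rules only flag."""
    name: str
    kind: str                                   # e.g. "range", "regex", "not_null"
    hard: bool = True
    params: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class Blueprint:
    """A versioned dataset specification read by every generation run (hypothetical)."""
    name: str
    version: int
    fields: tuple[FieldSpec, ...]
    constraints: tuple[ConstraintSpec, ...] = ()
    policies: dict[str, Any] = field(default_factory=dict)  # e.g. row count, seed
    parent_version: Optional[int] = None


customers = Blueprint(
    name="customers",
    version=1,
    fields=(
        FieldSpec("customer_id", "uuid"),
        FieldSpec("age", "numeric", {"dist": "normal", "mean": 40, "std": 12}),
        FieldSpec("tier", "categorical",
                  {"values": ["free", "pro"], "weights": [0.8, 0.2]}),
    ),
    constraints=(ConstraintSpec("adult", "range", params={"field": "age", "min": 18}),),
    policies={"rows": 1000, "seed": 7},
)

# Frozen instances cannot be edited; a child version is derived instead
customers_v2 = replace(customers, version=2, parent_version=customers.version)
```

Because every run reads one object like `customers`, two runs over the same version are driven by the same field, constraint and policy definitions.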

Blueprint capabilities

Field types

10+ field types covering the data structures commonly found in ML/AI training datasets, including:

  • numeric: integer and float with distribution sampling
  • categorical: enum values with configurable weights
  • string: template-based string generation
  • boolean: configurable true/false probability
  • date: date range with configurable format
  • uuid: RFC 4122 UUID generation
  • computed: derived fields with dependency resolution
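The sketch below shows how one row could be generated from field specs like these, with computed fields resolved in dependency order via a topological sort. The spec keys (`type`, `depends_on`, `fn`, `template`, ...) and function names are hypothetical, and numeric fields use a plain uniform draw here for brevity rather than a configurable distribution.

```python
import random
import uuid
from datetime import date, timedelta
from graphlib import TopologicalSorter

# Hypothetical field specs; key names are assumptions, not LiteSeed's format.
FIELDS = {
    "id":       {"type": "uuid"},
    "qty":      {"type": "numeric", "low": 1, "high": 10},
    "price":    {"type": "numeric", "low": 5.0, "high": 50.0, "float": True},
    "channel":  {"type": "categorical", "values": ["web", "store"], "weights": [3, 1]},
    "is_promo": {"type": "boolean", "p_true": 0.1},
    "sku":      {"type": "string", "template": "SKU-{n:05d}"},
    "ordered":  {"type": "date", "start": date(2024, 1, 1), "end": date(2024, 12, 31),
                 "format": "%Y-%m-%d"},
    "total":    {"type": "computed", "depends_on": ["qty", "price"],
                 "fn": lambda row: round(row["qty"] * row["price"], 2)},
}


def generate_value(spec, row, rng):
    kind = spec["type"]
    if kind == "numeric":
        if spec.get("float"):
            return rng.uniform(spec["low"], spec["high"])
        return rng.randint(spec["low"], spec["high"])
    if kind == "categorical":
        return rng.choices(spec["values"], weights=spec["weights"])[0]
    if kind == "string":
        return spec["template"].format(n=rng.randrange(100_000))
    if kind == "boolean":
        return rng.random() < spec["p_true"]
    if kind == "date":
        span = (spec["end"] - spec["start"]).days
        day = spec["start"] + timedelta(days=rng.randint(0, span))
        return day.strftime(spec["format"])
    if kind == "uuid":
        # Version-4 (random) UUID per RFC 4122, drawn from the seeded RNG
        return str(uuid.UUID(int=rng.getrandbits(128), version=4))
    if kind == "computed":
        # Dependencies are already in `row` because of the topological order
        return spec["fn"](row)
    raise ValueError(f"unknown field type: {kind}")


def generate_row(fields, rng):
    # Each field's predecessors are the fields it depends on
    graph = {name: spec.get("depends_on", []) for name, spec in fields.items()}
    row = {}
    for name in TopologicalSorter(graph).static_order():  # raises CycleError on cycles
        row[name] = generate_value(fields[name], row, rng)
    return row
```

Seeding a single `random.Random` instance and drawing every value from it, including UUIDs, keeps a run reproducible for a given seed.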

Statistical distributions

8 distribution types for precise control over statistical properties.

  • normal, lognormal, gamma, uniform, poisson
  • categorical: values + weights
  • rare_event: base_value + rare_value + probability p
  • mixture: blend of distributions with weights
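A minimal sampler for these eight types might look like the following, using Python's standard `random` module. The spec keys (`mean`, `std`, `lam`, `components`, ...) are assumptions; so is the reading of `rare_event` as "emit `rare_value` with probability `p`, otherwise `base_value`". Poisson uses Knuth's multiplication method, which is only practical for small λ.

```python
import math
import random


def sample(spec, rng):
    """Draw one value from a hypothetical distribution spec."""
    dist = spec["dist"]
    if dist == "normal":
        return rng.gauss(spec["mean"], spec["std"])
    if dist == "lognormal":
        return rng.lognormvariate(spec["mu"], spec["sigma"])
    if dist == "gamma":
        return rng.gammavariate(spec["shape"], spec["scale"])  # mean = shape * scale
    if dist == "uniform":
        return rng.uniform(spec["low"], spec["high"])
    if dist == "poisson":
        # Knuth: count uniforms multiplied before the product drops below e^-lam
        limit, k, prod = math.exp(-spec["lam"]), 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        return k
    if dist == "categorical":
        return rng.choices(spec["values"], weights=spec["weights"])[0]
    if dist == "rare_event":
        # Assumed semantics: rare_value with probability p, otherwise base_value
        return spec["rare_value"] if rng.random() < spec["p"] else spec["base_value"]
    if dist == "mixture":
        # Pick a component by weight, then sample from that component
        component = rng.choices(spec["components"], weights=spec["weights"])[0]
        return sample(component, rng)
    raise ValueError(f"unknown distribution: {dist}")


rng = random.Random(0)
fraud = {"dist": "rare_event", "base_value": 0, "rare_value": 1, "p": 0.05}
bimodal = {"dist": "mixture", "weights": [0.5, 0.5], "components": [
    {"dist": "uniform", "low": 0, "high": 1},
    {"dist": "uniform", "low": 10, "high": 11},
]}
flags = [sample(fraud, rng) for _ in range(10_000)]
mixed = [sample(bimodal, rng) for _ in range(1_000)]
```

Because `mixture` recurses into `sample`, a mixture can combine any of the other types, including nested mixtures.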

Constraint system

A two-tier constraint system that enforces business rules and data validity.

  • Hard constraints: reject and resample (up to 50 retries)
  • Soft constraints: flag violations without blocking
  • Constraint types: formula, range, regex, date_order, not_null, enum_only
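The two tiers can be sketched as a reject-and-resample loop: hard constraints trigger a fresh draw, soft constraints only produce flags. All spec keys and helper names here are hypothetical; a Python callable stands in for a `formula` expression, and "up to 50 retries" is read as one initial draw plus at most 50 resamples.

```python
import random
import re
from datetime import date

MAX_RETRIES = 50  # resample budget for hard constraints


def check(constraint, row):
    """Return True if `row` satisfies one constraint (spec keys are hypothetical)."""
    kind = constraint["kind"]
    if kind == "formula":
        # A Python callable stands in for a formula expression here
        return bool(constraint["predicate"](row))
    if kind == "range":
        return constraint["min"] <= row[constraint["field"]] <= constraint["max"]
    if kind == "regex":
        return re.fullmatch(constraint["pattern"], str(row[constraint["field"]])) is not None
    if kind == "date_order":
        return row[constraint["earlier"]] <= row[constraint["later"]]
    if kind == "not_null":
        return row.get(constraint["field"]) is not None
    if kind == "enum_only":
        return row[constraint["field"]] in constraint["values"]
    raise ValueError(f"unknown constraint kind: {kind}")


def generate_valid_row(make_row, constraints, rng):
    """Resample until all hard constraints pass; report soft violations as flags."""
    hard = [c for c in constraints if c.get("hard", True)]
    soft = [c for c in constraints if not c.get("hard", True)]
    for _ in range(1 + MAX_RETRIES):  # initial draw plus up to MAX_RETRIES resamples
        row = make_row(rng)
        if all(check(c, row) for c in hard):
            flags = [c["name"] for c in soft if not check(c, row)]  # never blocks
            return row, flags
    raise RuntimeError("hard constraints still violated after the retry budget")


def make_order(rng):
    return {"start": date(2024, 1, rng.randint(1, 28)),
            "end": date(2024, 1, rng.randint(1, 28)),
            "age": rng.randint(0, 90),
            "email": None if rng.random() < 0.3 else "user@example.com"}


CONSTRAINTS = [
    {"name": "dates_ordered", "kind": "date_order", "earlier": "start", "later": "end"},
    {"name": "adult", "kind": "range", "field": "age", "min": 18, "max": 90},
    {"name": "has_email", "kind": "not_null", "field": "email", "hard": False},
]

row, flags = generate_valid_row(make_order, CONSTRAINTS, random.Random(1))
```

A row missing its email is still returned, with `"has_email"` in `flags`; a hard rule that can never hold exhausts the budget and raises instead of looping forever.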

Blueprint versioning

Blueprints are versioned documents with parent-child lineage and hash-based provenance.

  • Immutable Blueprint versions
  • Parent-child lineage for schema evolution tracking
  • Blueprint hash computed at generation time
  • Version diff for field-level change comparison
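One plausible way to implement these ideas is sketched below: a child version is a deep copy that records its parent, the hash is SHA-256 over a canonical JSON encoding (so key order does not affect it), and the diff compares fields by name. The hashing scheme, the dictionary layout and the function names are assumptions for illustration, not LiteSeed's documented mechanism.

```python
import copy
import hashlib
import json


def blueprint_hash(blueprint: dict) -> str:
    """SHA-256 over canonical JSON: sorted keys, no insignificant whitespace."""
    canonical = json.dumps(blueprint, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def derive(parent: dict, fields: dict) -> dict:
    """Create a child version with new fields; the parent is never modified."""
    child = copy.deepcopy(parent)
    child["version"] = parent["version"] + 1
    child["parent_version"] = parent["version"]  # lineage link
    child["fields"] = copy.deepcopy(fields)
    return child


def diff_fields(old: dict, new: dict) -> dict:
    """Field-level comparison between two Blueprint versions."""
    old_f, new_f = old["fields"], new["fields"]
    return {
        "added": sorted(new_f.keys() - old_f.keys()),
        "removed": sorted(old_f.keys() - new_f.keys()),
        "changed": sorted(k for k in old_f.keys() & new_f.keys() if old_f[k] != new_f[k]),
    }


v1 = {"name": "orders", "version": 1, "parent_version": None, "fields": {
    "qty": {"type": "numeric"},
    "channel": {"type": "categorical", "values": ["web", "store"]},
}}
v2 = derive(v1, {
    "qty": {"type": "numeric"},
    "channel": {"type": "categorical", "values": ["web", "store", "app"]},
    "total": {"type": "computed"},
})
```

Here `diff_fields(v1, v2)` reports `total` as added and `channel` as changed, and recording `blueprint_hash(v2)` alongside a generation run ties that run to the exact specification that produced it.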
