LiteSeed

Understand dataset quality before training

Measure the properties of generated datasets and identify coverage gaps before they affect model performance.

Why it matters

Invisible issues, visible impact

A dataset that looks correct can still produce weak models. Coverage gaps, class imbalances and constraint violations are invisible without measurement. LiteSeed computes quality metrics for every generated dataset version automatically.

Catch issues early

Identify class imbalances, coverage gaps and constraint violations before training.

Quantified quality

Every dataset version has a quality score derived from violation rate, entropy and balance.
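One way a single score could be derived from those three signals (the weights, normalization and function name here are illustrative assumptions, not LiteSeed's actual formula):

```python
def quality_score(violation_rate, entropy, max_entropy, balance):
    """Combine violation rate, label entropy and class balance
    into a single 0-100 score.

    Illustrative only -- weights and normalization are assumptions,
    not LiteSeed's actual computation.
    """
    # Normalize entropy against the maximum possible for the label set,
    # so a perfectly uniform label distribution contributes 1.0.
    norm_entropy = entropy / max_entropy if max_entropy > 0 else 0.0
    score = 100 * (0.4 * (1 - violation_rate)
                   + 0.3 * norm_entropy
                   + 0.3 * balance)
    return round(score, 1)

# A dataset with 2% violations, near-uniform labels and good balance
# scores in the high nineties under these weights.
print(quality_score(violation_rate=0.02, entropy=0.97,
                    max_entropy=1.0, balance=0.95))
```

Whatever the exact weighting, the key property is that the score is monotone: fewer violations, higher entropy and better balance all push it up, so versions become directly comparable.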

Actionable insights

Recommendations are generated automatically based on dataset metrics.

Core capabilities

Automatic dataset metrics

Every generated dataset version is automatically evaluated for quality metrics — no manual configuration required.

  • row_count and constraint_violation_rate
  • class_balance and Shannon entropy (when labeling is enabled)
  • Distribution summary: min, max, mean, stddev per numeric field
  • Value counts for categorical and boolean fields
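The metrics above can be sketched as a single pass over the rows. This is a minimal illustration, not LiteSeed's implementation; the `violates` predicate and field names are assumptions:

```python
import math
from collections import Counter
from statistics import mean, pstdev

def dataset_metrics(rows, label_field=None, violates=lambda row: False):
    """Compute row count, violation rate, label entropy/balance,
    and per-field summaries for a list of row dicts."""
    metrics = {"row_count": len(rows)}
    metrics["constraint_violation_rate"] = (
        sum(violates(r) for r in rows) / len(rows) if rows else 0.0
    )
    if label_field:
        counts = Counter(r[label_field] for r in rows)
        probs = [c / len(rows) for c in counts.values()]
        # Shannon entropy in bits; higher means labels are more evenly spread.
        metrics["shannon_entropy"] = -sum(p * math.log2(p) for p in probs)
        # Ratio of rarest to most common class; 1.0 is perfectly balanced.
        metrics["class_balance"] = min(counts.values()) / max(counts.values())
    # Distribution summary per numeric field, value counts for the rest.
    for field in rows[0]:
        values = [r[field] for r in rows]
        if all(isinstance(v, (int, float)) and not isinstance(v, bool)
               for v in values):
            metrics[field] = {"min": min(values), "max": max(values),
                              "mean": mean(values), "stddev": pstdev(values)}
        else:
            metrics[field] = dict(Counter(values))
    return metrics

rows = [{"amount": 10.0, "label": "fraud"},
        {"amount": 20.0, "label": "ok"},
        {"amount": 30.0, "label": "ok"},
        {"amount": 40.0, "label": "ok"}]
m = dataset_metrics(rows, label_field="label",
                    violates=lambda r: r["amount"] > 35)
print(m["row_count"], m["constraint_violation_rate"], m["class_balance"])
```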

Dataset version comparison

Compare two versions of the same dataset side by side — metrics, distributions and constraint violations.

  • Select any two versions for comparison
  • Field-level distribution diff
  • Constraint violation rate delta
  • Label distribution shift detection
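A side-by-side comparison reduces to diffing the two versions' metric dicts. A minimal sketch, assuming each version's metrics are plain dicts of numbers (a real comparison would also diff per-field distributions and label histograms):

```python
def compare_versions(metrics_a, metrics_b):
    """Return {metric: {v1, v2, delta}} for every numeric metric
    present in both versions."""
    diff = {}
    for key in metrics_a.keys() & metrics_b.keys():
        a, b = metrics_a[key], metrics_b[key]
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            diff[key] = {"v1": a, "v2": b, "delta": b - a}
    return diff

v1 = {"row_count": 1000, "constraint_violation_rate": 0.05,
      "shannon_entropy": 0.72}
v2 = {"row_count": 1000, "constraint_violation_rate": 0.01,
      "shannon_entropy": 0.95}
d = compare_versions(v1, v2)
print(round(d["constraint_violation_rate"]["delta"], 4))
```

A negative violation-rate delta and a positive entropy delta, as here, indicate the newer version improved on both axes.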

Coverage gap detection

Identify which scenarios are underrepresented in a generated dataset and where additional coverage is needed.

  • Rare event detection via distribution summary
  • Class imbalance warnings when entropy is low
  • Recommendations to increase row count or adjust distributions
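The recommendation logic can be sketched as threshold checks over the label distribution. The thresholds, message wording and function name below are assumptions for illustration:

```python
import math
from collections import Counter

def coverage_recommendations(labels, min_entropy=0.9, min_rows=1000):
    """Emit plain-text recommendations when labels look imbalanced
    or the dataset is too small for stable statistics."""
    recs = []
    counts = Counter(labels)
    probs = [c / len(labels) for c in counts.values()]
    entropy = -sum(p * math.log2(p) for p in probs)
    # Normalize against the maximum entropy for this many classes.
    max_entropy = math.log2(len(counts)) if len(counts) > 1 else 1.0
    if entropy / max_entropy < min_entropy:
        rare = min(counts, key=counts.get)
        recs.append(f"Class imbalance: '{rare}' is underrepresented "
                    f"({counts[rare]}/{len(labels)} rows); "
                    f"adjust the label distribution.")
    if len(labels) < min_rows:
        recs.append(f"Only {len(labels)} rows; increase row count "
                    f"for stable statistics.")
    return recs

labels = ["ok"] * 95 + ["fraud"] * 5
for rec in coverage_recommendations(labels):
    print(rec)
```

With a 95/5 split the normalized entropy is about 0.29, well under the threshold, so both an imbalance warning and a row-count recommendation fire.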

Dataset lineage view

Trace every dataset version back to its origin — the seed file, the Blueprint and the run that generated it.

  • Visual lineage graph: Seed → Blueprint → Version
  • Clickable nodes navigate to source records
  • Full version history with timestamps and metrics
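Conceptually the lineage graph is a parent chain from each version back to its seed. A minimal in-memory sketch; the class and attribute names are assumptions, not LiteSeed's schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    kind: str                       # "seed", "blueprint" or "version"
    name: str
    parent: Optional["Node"] = None

def lineage(version):
    """Walk parent links from a dataset version back to its seed
    and render the Seed -> Blueprint -> Version chain."""
    chain = []
    node = version
    while node is not None:
        chain.append(f"{node.kind}:{node.name}")
        node = node.parent
    return " -> ".join(reversed(chain))

seed = Node("seed", "customers.csv")
blueprint = Node("blueprint", "fraud-scenarios", parent=seed)
version = Node("version", "v3", parent=blueprint)
print(lineage(version))
```

In the UI each node in this chain is clickable, so the same parent links drive both the visual graph and navigation to the source records.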