Understand dataset quality before training
Measure the properties of generated datasets and identify coverage gaps before they affect model performance.
Why it matters
Invisible issues, visible impact
A dataset that looks correct can still produce weak models. Coverage gaps, class imbalances and constraint violations are invisible without measurement. LiteSeed computes quality metrics for every generated dataset version automatically.
Catch issues early
Identify class imbalances, coverage gaps and constraint violations before training.
Quantified quality
Every dataset version receives a quality score derived from its constraint violation rate, label entropy and class balance (one way to compose such a score is sketched below).
Actionable insights
Recommendations are generated automatically based on dataset metrics.
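To make the quality score concrete, here is a minimal sketch of how such a composite could be assembled. The weights, the 0-to-1 scaling and the function itself are illustrative assumptions, not LiteSeed's published formula.

```python
def quality_score(violation_rate: float, norm_entropy: float, class_balance: float) -> float:
    """Illustrative composite score in [0, 1]; the weights are assumptions.

    violation_rate: fraction of rows violating a constraint (lower is better)
    norm_entropy:   Shannon entropy of the label distribution divided by its
                    maximum possible value (1.0 = perfectly balanced)
    class_balance:  rarest-class frequency / most-common-class frequency
    """
    return 0.5 * (1.0 - violation_rate) + 0.3 * norm_entropy + 0.2 * class_balance


# Example: 2% violations, labels at 90% of max entropy, 1:2 class ratio
print(quality_score(0.02, 0.90, 0.50))  # 0.86
```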
Core capabilities
Automatic dataset metrics
Every generated dataset version is automatically evaluated against a core set of quality metrics, with no manual configuration required; a sketch of these computations follows the list below.
- row_count and constraint_violation_rate
- class_balance and Shannon entropy (when labeling is enabled)
- Distribution summary: min, max, mean, stddev per numeric field
- Value counts for categorical and boolean fields
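As a rough illustration of these metrics, the following pandas/NumPy sketch computes each one by hand on a toy frame. The column names and the amount <= 200 constraint are hypothetical; this is not LiteSeed's implementation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "label": ["fraud", "ok", "ok", "ok", "fraud", "ok"],
    "amount": [120.0, 35.5, 18.0, 250.0, 99.0, 42.0],
})

# row_count
row_count = len(df)

# constraint_violation_rate for a hypothetical constraint: amount <= 200
violations = (df["amount"] > 200).sum()
constraint_violation_rate = violations / row_count  # 1/6 here

# Shannon entropy (base 2) and class_balance of the label column
proportions = df["label"].value_counts(normalize=True)
shannon_entropy = -(proportions * np.log2(proportions)).sum()  # ~0.918 bits
class_balance = proportions.min() / proportions.max()          # 0.5

# Distribution summary per numeric field (std = sample stddev)
summary = df["amount"].agg(["min", "max", "mean", "std"])

print(row_count, constraint_violation_rate, shannon_entropy, class_balance)
print(summary)
```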
Dataset version comparison
Compare two versions of the same dataset side by side: metrics, distributions and constraint violations (a comparison sketch follows the list below).
- Select any two versions for comparison
- Field-level distribution diff
- Constraint violation rate delta
- Label distribution shift detection
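A minimal sketch of the kind of arithmetic a version comparison involves. Using total variation distance as the shift measure is an assumption on our part, as are the example numbers.

```python
import pandas as pd

def label_shift(a: pd.Series, b: pd.Series) -> float:
    """Total variation distance between two label distributions (0 = identical)."""
    pa = a.value_counts(normalize=True)
    pb = b.value_counts(normalize=True)
    labels = pa.index.union(pb.index)
    return 0.5 * sum(abs(pa.get(lbl, 0.0) - pb.get(lbl, 0.0)) for lbl in labels)

v1_labels = pd.Series(["ok"] * 90 + ["fraud"] * 10)
v2_labels = pd.Series(["ok"] * 70 + ["fraud"] * 30)
print(label_shift(v1_labels, v2_labels))  # 0.2 -- 20% of mass moved between classes

# Constraint violation rate delta, given per-version rates (illustrative numbers)
delta = 0.031 - 0.012  # v2 rate minus v1 rate
print(f"violation rate delta: {delta:+.3f}")
```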
Coverage gap detection
Identify which scenarios are underrepresented in a generated dataset and where additional coverage is needed; an example of the warning rule is sketched after this list.
- Rare event detection via distribution summary
- Class imbalance warnings when entropy is low
- Recommendations to increase row count or adjust distributions
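To show the shape of such a rule, here is a minimal sketch of an entropy-based imbalance warning. The 0.5 threshold and the message wording are assumptions, not LiteSeed's defaults.

```python
import math
from collections import Counter

def coverage_warnings(labels: list[str], threshold: float = 0.5) -> list[str]:
    """Warn when normalized label entropy falls below a threshold (assumed 0.5)."""
    counts = Counter(labels)
    n = len(labels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    max_entropy = math.log2(len(counts)) if len(counts) > 1 else 1.0
    warnings = []
    if entropy / max_entropy < threshold:
        rare = min(counts, key=counts.get)  # least-represented class
        warnings.append(
            f"Low label entropy ({entropy / max_entropy:.2f} of max): "
            f"class '{rare}' is underrepresented; consider increasing row "
            f"count or adjusting the label distribution."
        )
    return warnings

print(coverage_warnings(["ok"] * 97 + ["fraud"] * 3))
```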
Dataset lineage view
Trace every dataset version back to its origin: the seed file, the Blueprint and the run that generated it (one possible data model is sketched after this list).
- Visual lineage graph: Seed → Blueprint → Version
- Clickable nodes navigate to source records
- Full version history with timestamps and metrics
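As an illustration of the relationship the lineage graph renders, a minimal sketch of one possible data model; the class and field names are hypothetical rather than LiteSeed's schema.

```python
from dataclasses import dataclass

@dataclass
class Seed:
    seed_id: str
    filename: str

@dataclass
class Blueprint:
    blueprint_id: str
    seed: Seed  # a Blueprint is derived from one seed file

@dataclass
class DatasetVersion:
    version_id: str
    blueprint: Blueprint  # each version records the Blueprint that produced it
    created_at: str
    quality_score: float

def lineage(v: DatasetVersion) -> str:
    """Render the Seed -> Blueprint -> Version chain for one version."""
    return f"{v.blueprint.seed.filename} -> {v.blueprint.blueprint_id} -> {v.version_id}"

seed = Seed("s1", "transactions.csv")
bp = Blueprint("bp-7", seed)
v3 = DatasetVersion("v3", bp, "2024-05-01T12:00:00Z", 0.94)
print(lineage(v3))  # transactions.csv -> bp-7 -> v3
```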
