Changelog

Version 2.0 (2026-01-25)

Metrics Module Major Optimization

Performance Improvements

🚀 8x Speed Boost: Introduced confusion matrix caching via MetricsCache class
- Previously: 8 redundant confusion matrix calculations
- Now: Single cached calculation shared across all metrics
- Impact: ~800% faster for basic metrics computation

Feature Enhancements

💡 Extended Target Metrics Support: Beyond sensitivity/specificity
- Added: PPV (Precision), NPV, F1-score, Accuracy
- Enables more comprehensive threshold selection criteria
- Example: targets={'sensitivity': 0.91, 'specificity': 0.91, 'ppv': 0.85}
🎯 Fallback Mechanism: Closest threshold finder
- Problem solved: No thresholds meet all targets (e.g., targets too high)
- Solution: Automatically finds "closest" threshold using distance metrics
- Supports: Euclidean, Manhattan, and Max distance
- Prevents data leakage: Training set finds threshold, test set applies it
🧠 Intelligent Threshold Selection: Three strategies
- First: Fast, returns first qualifying threshold
- Youden: Classic, maximizes Sensitivity + Specificity - 1
- Pareto+Youden (Recommended): Finds Pareto-optimal thresholds, selects max Youden
- Handles multi-objective optimization gracefully
📋 Category-Based Filtering: Compute only needed metrics
- Categories: 'basic', 'statistical'
- Enables faster computation when full metric suite unnecessary
- Example: calculate_metrics(y_true, y_pred, y_prob, categories=['basic'])
🔮 Multi-class Preparation: Foundation for multi-class classification
- AUC: Already supports multi-class (One-vs-Rest)
- Basic metrics: Added macro averaging for multi-class
- Future: Per-class and weighted strategies

API Changes

calculate_metrics_at_target
- New parameters: threshold_selection, fallback_to_closest, distance_metric
- New return fields: best_threshold, closest_threshold, combined_results
- Enhanced logging for threshold selection strategy
calculate_metrics
- New parameters: use_cache (default: True), categories
- Backward compatible: All existing calls work unchanged

Technical Debt Resolved

✅ Eliminated redundant confusion matrix calculations (8x performance hit)
✅ Removed hardcoded sensitivity/specificity limitation
✅ Implemented previously unused category parameter
✅ Fixed inefficient F1-score calculation (3 CM calculations → 1)

Testing

Added comprehensive test suite: tests/test_metrics_optimization.py
Coverage: Caching, extended targets, fallback, Pareto selection, categories
All tests passing ✓

Documentation

Added detailed guide: Metrics Module Optimization
Includes: API reference, usage examples, performance comparison
Best practices for training/test threshold management

Backward Compatibility

✅ 100% backward compatible
All existing code works without modification
New features accessible via optional parameters

Known Limitations

PPV/NPV computation: O(n) complexity, slower than sensitivity/specificity
Pareto algorithm: O(n²) worst case (negligible for typical threshold counts)
Multi-class: Basic support, further validation needed

Migration Guide

No migration needed! Existing code continues to work. To use new features:

# Old (still works)
metrics = calculate_metrics(y_true, y_pred, y_prob)

# New (enhanced)
metrics = calculate_metrics(
    y_true, y_pred, y_prob,
    use_cache=True,           # Enable caching
    categories=['basic']       # Faster computation
)

result = calculate_metrics_at_target(
    y_true, y_prob,
    targets={'sensitivity': 0.91, 'specificity': 0.91, 'ppv': 0.85},
    threshold_selection='pareto+youden',  # Intelligent selection
    fallback_to_closest=True              # Fallback mechanism
)

Contributors

HABIT Development Team

---

Bug Fixes

Fixed indentation error in comparison_workflow.py (duplicate code block removal)
Fixed tuple unpacking in _calculate_target_metrics_by_split (L736, L756)
Corrected threshold application logic for test sets

Configuration Improvements

Enhanced model name resolution in ComparisonFileConfig
Added _ensure_unique_model_name safeguard in MultifileEvaluator
Improved Pydantic validation for model comparison configurations

Workflow Enhancements

Test sets now always receive target metrics (no longer empty)
Enhanced logging for threshold selection and fallback mechanisms
Proper train→test threshold application (data leakage prevention)

---

Version 1.x

(Previous versions documented elsewhere)

Future Roadmap

Planned for v2.1

GPU acceleration for confusion matrix computation
Parallel Pareto optimization (multi-threading)
Adaptive threshold selection (auto-strategy)
Pareto frontier visualization

Planned for v3.0

Full multi-class classification support
- Weighted averaging strategies
- Per-class metrics
- Multi-label support
Advanced optimization
- Bayesian threshold optimization
- Cost-sensitive learning integration
Enhanced visualization
- Interactive threshold explorer
- Real-time metrics dashboard

---