Changelog
Version 2.0 (2026-01-25)
Metrics Module Major Optimization
Performance Improvements
🚀 8x Speed Boost: Introduced confusion matrix caching via the MetricsCache class
Previously: 8 redundant confusion matrix calculations
Now: Single cached calculation shared across all metrics
Impact: ~8x faster basic metrics computation
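The caching idea can be sketched as follows. This is a minimal illustration and not the actual MetricsCache implementation: the class name CachedConfusionMatrix and its methods are hypothetical, and binary labels encoded as 0/1 are assumed. The confusion matrix is built on first access and every metric afterwards reads the stored counts.

```python
from functools import cached_property


class CachedConfusionMatrix:
    """Hypothetical stand-in for a confusion-matrix cache (binary labels 0/1)."""

    def __init__(self, y_true, y_pred):
        self.y_true = list(y_true)
        self.y_pred = list(y_pred)
        self.n_computations = 0  # counts how often the matrix is actually built

    @cached_property
    def counts(self):
        """Build (tn, fp, fn, tp) once; later accesses reuse the stored tuple."""
        self.n_computations += 1
        tn = fp = fn = tp = 0
        for t, p in zip(self.y_true, self.y_pred):
            if t == 1 and p == 1:
                tp += 1
            elif t == 1:
                fn += 1
            elif p == 1:
                fp += 1
            else:
                tn += 1
        return tn, fp, fn, tp

    @staticmethod
    def _ratio(num, den):
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    def sensitivity(self):
        tn, fp, fn, tp = self.counts
        return self._ratio(tp, tp + fn)

    def specificity(self):
        tn, fp, fn, tp = self.counts
        return self._ratio(tn, tn + fp)

    def ppv(self):
        tn, fp, fn, tp = self.counts
        return self._ratio(tp, tp + fp)

    def npv(self):
        tn, fp, fn, tp = self.counts
        return self._ratio(tn, tn + fn)

    def accuracy(self):
        tn, fp, fn, tp = self.counts
        return self._ratio(tp + tn, tn + fp + fn + tp)

    def f1(self):
        tn, fp, fn, tp = self.counts
        return self._ratio(2 * tp, 2 * tp + fp + fn)
```

Calling all six metric methods on one instance leaves n_computations at 1, which is the effect the cache is meant to achieve.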
Feature Enhancements
💡 Extended Target Metrics Support: Beyond sensitivity/specificity
Added: PPV (Precision), NPV, F1-score, Accuracy
Enables more comprehensive threshold selection criteria
Example:
targets={'sensitivity': 0.91, 'specificity': 0.91, 'ppv': 0.85}
🎯 Fallback Mechanism: Closest threshold finder
Problem solved: No thresholds meet all targets (e.g., targets too high)
Solution: Automatically finds "closest" threshold using distance metrics
Supports: Euclidean, Manhattan, and Max distance
Prevents data leakage: Training set finds threshold, test set applies it
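One way the closest-threshold fallback could work is sketched below. The function name closest_threshold, the candidate data layout, and the string values "euclidean", "manhattan", and "max" are assumptions made for illustration. Counting only shortfalls below each target as distance is also an assumption; the library may measure distance differently.

```python
import math


def closest_threshold(candidates, targets, distance_metric="euclidean"):
    """Pick the candidate whose metrics fall least short of the targets.

    candidates: list of (threshold, {metric_name: value}) pairs
    targets:    {metric_name: required_minimum}
    """
    def distance(metrics):
        # Assumption: only shortfalls count; exceeding a target adds no distance.
        gaps = [max(0.0, targets[name] - metrics[name]) for name in targets]
        if distance_metric == "euclidean":
            return math.sqrt(sum(g * g for g in gaps))
        if distance_metric == "manhattan":
            return sum(gaps)
        if distance_metric == "max":
            return max(gaps)
        raise ValueError(f"unknown distance_metric: {distance_metric!r}")

    return min(candidates, key=lambda c: distance(c[1]))[0]


# No candidate reaches both targets, so the nearest one is returned instead.
train_candidates = [
    (0.3, {"sensitivity": 0.95, "specificity": 0.70}),
    (0.5, {"sensitivity": 0.88, "specificity": 0.87}),
    (0.7, {"sensitivity": 0.75, "specificity": 0.96}),
]
train_targets = {"sensitivity": 0.91, "specificity": 0.91}
fallback = closest_threshold(train_candidates, train_targets, "euclidean")
```

In this sketch the candidates would come from the training set only, and the returned threshold would then be applied unchanged to the test set.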
🧠 Intelligent Threshold Selection: Three strategies
First: Fast, returns first qualifying threshold
Youden: Classic, maximizes Sensitivity + Specificity - 1
Pareto+Youden (Recommended): Finds Pareto-optimal thresholds, selects max Youden
Handles multi-objective optimization gracefully
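The Pareto+Youden strategy can be illustrated with a short sketch. The helper names pareto_front and select_pareto_youden and the candidate layout are hypothetical, and the code assumes the candidates passed in are the thresholds under consideration (for example, those that already meet the targets). The pairwise dominance check is quadratic in the number of candidates.

```python
def youden(metrics):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return metrics["sensitivity"] + metrics["specificity"] - 1.0


def pareto_front(candidates):
    """Keep candidates that no other candidate dominates.

    candidates: list of (threshold, {metric_name: value}) pairs sharing the same keys.
    A dominates B if A is at least as good on every metric and strictly better on one.
    """
    def dominates(a, b):
        return all(a[k] >= b[k] for k in b) and any(a[k] > b[k] for k in b)

    return [
        c for c in candidates
        if not any(dominates(other[1], c[1]) for other in candidates)
    ]


def select_pareto_youden(candidates):
    """Among Pareto-optimal candidates, return the threshold with the highest Youden index."""
    front = pareto_front(candidates)
    return max(front, key=lambda c: youden(c[1]))[0]


candidates = [
    (0.4, {"sensitivity": 0.95, "specificity": 0.80, "ppv": 0.70}),
    (0.5, {"sensitivity": 0.92, "specificity": 0.90, "ppv": 0.85}),
    (0.6, {"sensitivity": 0.90, "specificity": 0.88, "ppv": 0.84}),  # dominated by 0.5
    (0.7, {"sensitivity": 0.85, "specificity": 0.96, "ppv": 0.93}),
]
chosen = select_pareto_youden(candidates)
```

Here the threshold 0.6 is dropped because 0.5 is better on every metric, and 0.5 is then chosen because it has the highest Youden index on the remaining front.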
📋 Category-Based Filtering: Compute only needed metrics
Categories: 'basic', 'statistical'
Enables faster computation when full metric suite unnecessary
Example:
calculate_metrics(y_true, y_pred, y_prob, categories=['basic'])
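Category filtering could be built on a small registry, as in the sketch below. The registry, the function compute_selected, and the assignment of metrics to categories are hypothetical. In particular, accuracy_se is only a placeholder for a "statistical" metric, since the actual contents of that category are not listed here.

```python
import math


def _accuracy(tn, fp, fn, tp):
    return (tp + tn) / (tn + fp + fn + tp)


def _sensitivity(tn, fp, fn, tp):
    return tp / (tp + fn)


def _specificity(tn, fp, fn, tp):
    return tn / (tn + fp)


def _accuracy_se(tn, fp, fn, tp):
    # Placeholder "statistical" metric: binomial standard error of accuracy.
    acc = _accuracy(tn, fp, fn, tp)
    return math.sqrt(acc * (1.0 - acc) / (tn + fp + fn + tp))


# Hypothetical registry: metric name -> (category, function of the four counts).
REGISTRY = {
    "accuracy": ("basic", _accuracy),
    "sensitivity": ("basic", _sensitivity),
    "specificity": ("basic", _specificity),
    "accuracy_se": ("statistical", _accuracy_se),
}


def compute_selected(counts, categories=None):
    """Compute only metrics whose category is requested; None means all categories."""
    wanted = None if categories is None else set(categories)
    return {
        name: fn(*counts)
        for name, (category, fn) in REGISTRY.items()
        if wanted is None or category in wanted
    }
```

Metrics outside the requested categories are never evaluated, which is where the time saving would come from.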
🔮 Multi-class Preparation: Foundation for multi-class classification
AUC: Already supports multi-class (One-vs-Rest)
Basic metrics: Added macro averaging for multi-class
Future: Per-class and weighted strategies
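Macro averaging for a multi-class metric can be sketched as below, using recall as the example. The function name macro_average_recall is hypothetical and the code is not taken from the module; it only shows that each class contributes equally to the average regardless of its size.

```python
def macro_average_recall(y_true, y_pred):
    """Unweighted mean of per-class recall, treating each class one-vs-rest."""
    classes = sorted(set(y_true))  # every class here has at least one true sample
    recalls = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        support = sum(1 for t in y_true if t == c)
        recalls.append(tp / support)
    return sum(recalls) / len(recalls)


# Class 0 dominates the sample; classes 1 and 2 are each half wrong.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0, 2, 0]
macro_recall = macro_average_recall(y_true, y_pred)
```

With this data the plain accuracy is 0.8, while the macro-averaged recall is (1.0 + 0.5 + 0.5) / 3, which shows how macro averaging exposes weak performance on small classes.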
API Changes
calculate_metrics_at_target
New parameters: threshold_selection, fallback_to_closest, distance_metric
New return fields: best_threshold, closest_threshold, combined_results
Enhanced logging for threshold selection strategy
calculate_metrics
New parameters: use_cache (default: True), categories
Backward compatible: All existing calls work unchanged
Technical Debt Resolved
✅ Eliminated redundant confusion matrix calculations (8x performance hit)
✅ Removed hardcoded sensitivity/specificity limitation
✅ Implemented previously unused category parameter
✅ Fixed inefficient F1-score calculation (3 CM calculations → 1)
Testing
Added comprehensive test suite: tests/test_metrics_optimization.py
Coverage: Caching, extended targets, fallback, Pareto selection, categories
All tests passing ✓
Documentation
Added detailed guide: Metrics Module Optimization
Includes: API reference, usage examples, performance comparison
Best practices for training/test threshold management
Backward Compatibility
✅ 100% backward compatible
All existing code works without modification
New features accessible via optional parameters
Known Limitations
PPV/NPV computation: O(n) complexity, slower than sensitivity/specificity
Pareto algorithm: O(n²) worst case (negligible for typical threshold counts)
Multi-class: Basic support, further validation needed
Migration Guide
No migration needed! Existing code continues to work. To use new features:
# Old (still works)
metrics = calculate_metrics(y_true, y_pred, y_prob)

# New (enhanced)
metrics = calculate_metrics(
    y_true, y_pred, y_prob,
    use_cache=True,        # Enable caching
    categories=['basic']   # Faster computation
)

result = calculate_metrics_at_target(
    y_true, y_prob,
    targets={'sensitivity': 0.91, 'specificity': 0.91, 'ppv': 0.85},
    threshold_selection='pareto+youden',  # Intelligent selection
    fallback_to_closest=True              # Fallback mechanism
)
Contributors
HABIT Development Team
---
Bug Fixes
Fixed indentation error in comparison_workflow.py (duplicate code block removal)
Fixed tuple unpacking in _calculate_target_metrics_by_split (L736, L756)
Corrected threshold application logic for test sets
Configuration Improvements
Enhanced model name resolution in ComparisonFileConfig
Added _ensure_unique_model_name safeguard in MultifileEvaluator
Improved Pydantic validation for model comparison configurations
Workflow Enhancements
Test sets now always receive target metrics (no longer empty)
Enhanced logging for threshold selection and fallback mechanisms
Proper train→test threshold application (data leakage prevention)
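The train→test discipline can be shown with a small sketch. The helper names youden_threshold and apply_threshold are hypothetical, and the code assumes binary 0/1 labels with both classes present in the training split. The point is that the threshold is chosen from training data only and the test labels are never consulted when selecting it.

```python
def youden_threshold(y_true, y_prob):
    """Return the candidate threshold that maximizes sensitivity + specificity - 1."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_t, best_j = None, float("-inf")
    for t in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= t)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t


def apply_threshold(y_prob, threshold):
    """Turn probabilities into 0/1 predictions using a fixed threshold."""
    return [1 if p >= threshold else 0 for p in y_prob]


# Step 1: choose the threshold on the training split only.
train_true = [0, 0, 0, 1, 1, 0, 1, 1]
train_prob = [0.1, 0.2, 0.3, 0.35, 0.45, 0.5, 0.8, 0.9]
threshold = youden_threshold(train_true, train_prob)

# Step 2: apply that same threshold to the test split without re-tuning.
test_prob = [0.2, 0.4, 0.34, 0.9]
test_pred = apply_threshold(test_prob, threshold)
```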
---
Version 1.x
(Previous versions documented elsewhere)
Future Roadmap
Planned for v2.1
GPU acceleration for confusion matrix computation
Parallel Pareto optimization (multi-threading)
Adaptive threshold selection (auto-strategy)
Pareto frontier visualization
Planned for v3.0
Full multi-class classification support
Weighted averaging strategies
Per-class metrics
Multi-label support
Advanced optimization
Bayesian threshold optimization
Cost-sensitive learning integration
Enhanced visualization
Interactive threshold explorer
Real-time metrics dashboard
---
See Also
Metrics Module Optimization - Detailed optimization guide
Testing Guide - Testing guidelines
machine_learning module - Machine learning API reference