common module

Common components shared across HABIT modules.

Configuration System

HABIT uses a strongly typed, YAML-based configuration system.

Configuration utilities for loading, saving, and resolving configurations. This module combines configuration I/O and path resolution capabilities.

habit.core.common.config_loader.load_config(config_path: str, resolve_paths: bool = True) → Dict[str, Any]

Load a configuration file and optionally resolve relative paths.

Parameters:

config_path (str) -- Path to the configuration file; YAML and JSON are supported
resolve_paths (bool) -- Whether to resolve relative paths to absolute paths. Defaults to True.

Returns:

Configuration dictionary

Return type:

Dict[str, Any]

Raises:

FileNotFoundError -- If the configuration file is not found
ValueError -- If the file format is not supported
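The error contract described above can be illustrated with a minimal sketch. It is not the HABIT implementation: `load_config_sketch` is a hypothetical name, it handles JSON only so that it needs nothing beyond the standard library, and a real loader would presumably add a YAML branch and the optional path resolution.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def load_config_sketch(config_path: str) -> Dict[str, Any]:
    """Hypothetical stand-in for load_config; JSON only, no path resolution."""
    path = Path(config_path)
    if not path.exists():
        raise FileNotFoundError(f"Configuration file not found: {path}")
    if path.suffix.lower() != ".json":
        # Assumption: a full loader would also accept YAML via a YAML parser.
        raise ValueError(f"Unsupported configuration format: {path.suffix}")
    with path.open("r", encoding="utf-8") as handle:
        return json.load(handle)


# Write a small configuration to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    cfg_file = Path(tmp) / "config.json"
    cfg_file.write_text(json.dumps({"data_dir": "./data"}), encoding="utf-8")
    loaded = load_config_sketch(str(cfg_file))
```

Checking for the file before checking the extension means a missing file is always reported as FileNotFoundError, whatever its suffix.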

habit.core.common.config_loader.save_config(config: Dict[str, Any], config_path: str) → None

Save a configuration to file.

Parameters:

config (Dict[str, Any]) -- Configuration dictionary
config_path (str) -- Path at which to save the configuration file; YAML and JSON are supported

Raises:

ValueError -- If the file format is not supported

habit.core.common.config_loader.validate_config(config: Dict[str, Any], required_keys: List[str] | None = None) → bool

Validate that a configuration contains the required keys.

Parameters:

config (Dict[str, Any]) -- Configuration dictionary
required_keys (Optional[List[str]]) -- List of required keys

Returns:

Whether the configuration is valid

Return type:

bool

Raises:

ValueError -- If required keys are missing
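A required-key check of this kind might look like the sketch below. The name `validate_config_sketch` and the wording of the error message are assumptions; only the behaviour (return True, or raise ValueError when keys are missing) comes from the description above.

```python
from typing import Any, Dict, List, Optional


def validate_config_sketch(config: Dict[str, Any],
                           required_keys: Optional[List[str]] = None) -> bool:
    """Hypothetical stand-in: raise ValueError naming any missing top-level keys."""
    missing = [key for key in (required_keys or []) if key not in config]
    if missing:
        raise ValueError(f"Missing required configuration keys: {missing}")
    return True


config = {"data_dir": "./data", "out_dir": "./results"}
ok = validate_config_sketch(config, ["data_dir", "out_dir"])

# A missing key is reported through the exception rather than a False return.
try:
    validate_config_sketch(config, ["data_dir", "model"])
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
```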

class PathResolver

Bases: object

A flexible path resolver for configuration files.

Resolves relative paths in configuration dictionaries to absolute paths, using the configuration file's directory as the base.

base_dir

Base directory for resolving relative paths

Type: Path

patterns

Patterns for identifying path fields

Type: Dict

resolved_count

Number of paths resolved in the last operation

Type: int

Example

>>> resolver = PathResolver('/path/to/config.yaml')
>>> resolved_config = resolver.resolve(config_dict)
>>> print(f"Resolved {resolver.resolved_count} paths")

Initialize the PathResolver.

Parameters:

config_path -- Path to the configuration file (used to determine base_dir)
base_dir -- Explicit base directory for resolving paths (overrides config_path)
extra_suffixes -- Additional suffix patterns to match (e.g., ['_location'])
extra_exact -- Additional exact match patterns (e.g., ['my_path_field'])
custom_patterns -- Complete custom patterns dict to replace defaults

Note

Either config_path or base_dir must be provided.

is_path_field(key: str) → bool

Check if a field name represents a path (key-based detection).

Parameters: key -- The field name to check
Returns: True if the field is likely a path field

is_path_value(value: str) → bool

Check if a string value looks like a path (value-based detection).

Detection strategies:

1. Starts with a relative path prefix: ./ .../ ..
2. Ends with a common file extension: .yaml, .nii.gz, .csv, etc.
3. Matches a path-like pattern: contains path separators in a meaningful way

Parameters: value -- The string value to check
Returns: True if the value looks like a path

should_resolve(key: str, value: Any) → bool

Determine if a key-value pair should have its path resolved.

Combines key-based and value-based detection strategies.

Parameters:

key -- The field name
value -- The field value

Returns:

True if this field should be resolved as a path
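The two detection strategies could be combined roughly as follows. Everything in this sketch is illustrative: the suffixes, extensions, and prefixes are assumed values (the real defaults are not listed in this reference), and the choice to accept a field when either strategy matches is also an assumption.

```python
from typing import Any

# Illustrative patterns only; the actual default patterns may differ.
PATH_SUFFIXES = ("_dir", "_path", "_file")
FILE_EXTENSIONS = (".yaml", ".nii.gz", ".csv")
RELATIVE_PREFIXES = ("./", "../")


def looks_like_path_field(key: str) -> bool:
    """Key-based detection: the field name ends with a path-like suffix."""
    return key.endswith(PATH_SUFFIXES)


def looks_like_path_value(value: str) -> bool:
    """Value-based detection: relative prefix or a known file extension."""
    return value.startswith(RELATIVE_PREFIXES) or value.endswith(FILE_EXTENSIONS)


def should_resolve_sketch(key: str, value: Any) -> bool:
    """Only strings can be paths; here either strategy is treated as sufficient."""
    if not isinstance(value, str):
        return False
    return looks_like_path_field(key) or looks_like_path_value(value)
```

With these assumed patterns, `should_resolve_sketch("data_dir", "images")` is accepted through its key, `should_resolve_sketch("mask", "./masks/a.nii.gz")` through its value, and `should_resolve_sketch("method", "kmeans")` is rejected.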

resolve_path(path_value: str) → str

Resolve a single path value.

Parameters: path_value -- The path string to resolve
Returns: Absolute path if the input was relative and exists, otherwise the original path

resolve(config: Dict[str, Any], _path_prefix: str = '') → Dict[str, Any]

Resolve all path fields in a configuration dictionary.

Parameters:

config -- Configuration dictionary to process
_path_prefix -- Internal use for tracking nested paths

Returns:

New dictionary with resolved paths (the original dict is not modified)

get_resolved_fields() → List[str]

Get the list of field paths that were resolved.

Returns: List of field path strings (e.g., ['data_dir', 'input.file_path'])
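A recursive resolution that returns a new dictionary and records dotted field paths can be sketched as below. `MiniResolver` is a hypothetical, simplified class and not the HABIT PathResolver: it detects paths only by a `./` or `../` prefix and, unlike the documented resolve_path, does not check that the target exists.

```python
from pathlib import Path
from typing import Any, Dict, List


class MiniResolver:
    """Hypothetical, simplified resolver used only for illustration."""

    def __init__(self, base_dir: str) -> None:
        self.base_dir = Path(base_dir)
        self.resolved_fields: List[str] = []

    def resolve(self, config: Dict[str, Any], _prefix: str = "") -> Dict[str, Any]:
        if not _prefix:
            self.resolved_fields = []  # reset bookkeeping on the top-level call
        result: Dict[str, Any] = {}
        for key, value in config.items():
            field = f"{_prefix}.{key}" if _prefix else key
            if isinstance(value, dict):
                # Recurse into nested sections, extending the dotted prefix.
                result[key] = self.resolve(value, field)
            elif isinstance(value, str) and value.startswith(("./", "../")):
                result[key] = str((self.base_dir / value).resolve())
                self.resolved_fields.append(field)
            else:
                result[key] = value
        return result


original = {"data_dir": "./data", "input": {"file_path": "../raw/scan.csv"}, "n_clusters": 3}
resolver = MiniResolver("/project/configs")
resolved = resolver.resolve(original)
```

Building a fresh `result` dictionary at every level is what keeps the input untouched, matching the "original dict is not modified" guarantee stated above.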

habit.core.common.config_loader.resolve_config_paths(config: Dict[str, Any], config_path: str | Path, extra_patterns: List[str] | None = None, verbose: bool = False) → Dict[str, Any]

Convenience function to resolve paths in a configuration dictionary.

This is the recommended way to use path resolution in most cases.

Parameters:

config -- Configuration dictionary to process
config_path -- Path to the configuration file (for determining the base directory)
extra_patterns -- Additional patterns for path field detection
verbose -- If True, print information about resolved paths

Returns:

New configuration dictionary with resolved paths

Example

>>> from habit.core.common.config_loader import load_config, resolve_config_paths
>>> config = load_config('demo_data/config.yaml', resolve_paths=False)
>>> config = resolve_config_paths(config, 'demo_data/config.yaml')

habit.core.common.config_loader.load_config_with_paths(config_path: str | Path, extra_patterns: List[str] | None = None, resolve_paths: bool = True) → Dict[str, Any]

Load a configuration file and optionally resolve relative paths.

This is a convenience function that combines load_config and path resolution.

Parameters:

config_path -- Path to the configuration file
extra_patterns -- Additional patterns for path field detection
resolve_paths -- Whether to resolve relative paths (default: True)

Returns:

Configuration dictionary with resolved paths

Example

>>> from habit.core.common.config_loader import load_config_with_paths
>>> config = load_config_with_paths('demo_data/config.yaml')

Configuration validation middleware and utilities.

Provides unified configuration validation and loading across all HABIT modules.

class habit.core.common.config_validator.ConfigValidator

Bases: object

Unified configuration validator and loader.

Provides a single entry point for loading and validating configurations across all HABIT modules.

static validate_and_load(config_path: str | Path, config_class: Type[ConfigType], resolve_paths: bool = True, strict: bool = True) → ConfigType

Load and validate configuration from file.

This is the recommended way to load configurations in HABIT. It provides:

- Automatic path resolution
- Unified error handling
- Type-safe configuration objects

Parameters:

config_path -- Path to the configuration file
config_class -- Configuration class (must inherit from BaseConfig)
resolve_paths -- Whether to resolve relative paths (default: True)
strict -- Whether to raise exceptions on validation errors (default: True)

Returns:

Validated configuration instance

Raises:

FileNotFoundError -- If the configuration file is not found
ConfigValidationError -- If validation fails and strict=True

Example

>>> from habit.core.common.config_validator import ConfigValidator
>>> from habit.core.habitat_analysis.config_schemas import HabitatAnalysisConfig
>>> config = ConfigValidator.validate_and_load(
...     'config.yaml',
...     HabitatAnalysisConfig
... )

static validate_dict(config_dict: Dict[str, Any], config_class: Type[ConfigType], config_path: str | None = None, strict: bool = True) → ConfigType

Validate a configuration dictionary.

Parameters:

config_dict -- Configuration dictionary
config_class -- Configuration class
config_path -- Optional path for error reporting
strict -- Whether to raise exceptions on validation errors

Returns:

Validated configuration instance

Raises:

ConfigValidationError -- If validation fails and strict=True

static safe_validate(config_dict: Dict[str, Any], config_class: Type[ConfigType], default: ConfigType | None = None) → ConfigType | None

Safely validate a configuration: on failure, return the default (None unless specified) instead of raising.

Useful for optional configurations or when you want to handle validation errors gracefully.

Parameters:

config_dict -- Configuration dictionary
config_class -- Configuration class
default -- Default value to return on validation failure

Returns:

Validated configuration instance or default
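The "return a default instead of raising" pattern can be sketched without Pydantic. In this illustration a standard-library dataclass stands in for a BaseConfig subclass; `ClusterConfig` and `safe_validate_sketch` are hypothetical names, and the validation rule is invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ClusterConfig:
    """Hypothetical schema used only for this illustration."""
    n_clusters: int
    method: str = "kmeans"

    def __post_init__(self) -> None:
        if not isinstance(self.n_clusters, int) or self.n_clusters < 1:
            raise ValueError("n_clusters must be a positive integer")


def safe_validate_sketch(config_dict: Dict[str, Any],
                         default: Optional[ClusterConfig] = None) -> Optional[ClusterConfig]:
    """Return a validated instance, or `default` if construction fails."""
    try:
        return ClusterConfig(**config_dict)
    except (TypeError, ValueError):
        # TypeError: unknown or missing fields; ValueError: invalid values.
        return default


good = safe_validate_sketch({"n_clusters": 4})
bad = safe_validate_sketch({"n_clusters": 0})
fallback = safe_validate_sketch({"clusters": 4}, default=ClusterConfig(n_clusters=2))
```

Callers then only need a `None` check (or a sensible default) rather than a try/except around every optional section.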

habit.core.common.config_validator.load_and_validate_config(config_path: str | Path, config_class: Type[ConfigType], resolve_paths: bool = True) → ConfigType

Convenience function for loading and validating configurations.

This is a shorthand for ConfigValidator.validate_and_load().

Parameters:

config_path -- Path to the configuration file
config_class -- Configuration class
resolve_paths -- Whether to resolve relative paths

Returns:

Validated configuration instance

Example

>>> from habit.core.common.config_validator import load_and_validate_config
>>> from habit.core.habitat_analysis.config_schemas import HabitatAnalysisConfig
>>> config = load_and_validate_config('config.yaml', HabitatAnalysisConfig)

Base configuration classes for unified configuration management.

This module provides:

1. BaseConfig: Abstract base class for all configuration schemas
2. ConfigValidationError: Custom exception for configuration validation errors
3. ConfigAccessor: Unified interface for accessing configuration values

exception habit.core.common.config_base.ConfigValidationError(message: str, errors: Dict[str, Any] | None = None, config_path: str | None = None)

Bases: Exception

Custom exception for configuration validation errors.

Provides detailed information about validation failures.

__init__(message: str, errors: Dict[str, Any] | None = None, config_path: str | None = None)

Initialize the configuration validation error.

Parameters:

message -- Error message
errors -- Detailed validation errors from Pydantic
config_path -- Path to the configuration file that failed validation

__str__() → str: Format the error message with details.
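An exception that carries structured details and formats them in `__str__` might be written as in the sketch below. `ValidationErrorSketch` is a hypothetical class that mirrors only the documented constructor arguments; the output layout is an assumption, not the format HABIT actually prints.

```python
from typing import Any, Dict, Optional


class ValidationErrorSketch(Exception):
    """Hypothetical exception mirroring the documented constructor arguments."""

    def __init__(self, message: str,
                 errors: Optional[Dict[str, Any]] = None,
                 config_path: Optional[str] = None) -> None:
        super().__init__(message)
        self.message = message
        self.errors = errors or {}
        self.config_path = config_path

    def __str__(self) -> str:
        # One line for the message, then the file and each field-level detail.
        lines = [self.message]
        if self.config_path:
            lines.append(f"  file: {self.config_path}")
        for field, detail in self.errors.items():
            lines.append(f"  {field}: {detail}")
        return "\n".join(lines)


err = ValidationErrorSketch(
    "Invalid configuration",
    errors={"n_clusters": "must be a positive integer"},
    config_path="config.yaml",
)
formatted = str(err)
```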

class habit.core.common.config_base.BaseConfig(*, config_file: str | None = None, config_version: str | None = None)

Bases: BaseModel, ABC

Abstract base class for all configuration schemas in HABIT.

Provides common functionality:

- Version tracking
- Configuration file path tracking
- Validation hooks
- Accessor methods

All configuration classes should inherit from this base class.

config_file: str | None

config_version: str | None

class Config

Bases: object

Pydantic configuration.

extra = 'forbid'

validate_assignment = True

use_enum_values = True

__init__(**data: Any)

Initialize the configuration with validation.

Parameters: **data -- Configuration data
Raises: ConfigValidationError -- If validation fails

classmethod from_dict(config_dict: Dict[str, Any], config_path: str | None = None) → ConfigType

Create a configuration instance from a dictionary.

Parameters:

config_dict -- Configuration dictionary
config_path -- Optional path to the configuration file (for error reporting)

Returns:

Configuration instance

Raises:

ConfigValidationError -- If validation fails

classmethod from_file(config_path: str | Path) → ConfigType

Load a configuration from file.

Parameters:

config_path -- Path to the configuration file (YAML or JSON)

Returns:

Configuration instance

Raises:

FileNotFoundError -- If the configuration file is not found
ConfigValidationError -- If validation fails

to_dict(exclude_none: bool = False, exclude_unset: bool = False) → Dict[str, Any]

Convert the configuration to a dictionary.

Parameters:

exclude_none -- Whether to exclude None values
exclude_unset -- Whether to exclude unset values

Returns:

Configuration dictionary

get(key: str, default: Any | None = None) → Any

Get a configuration value by key (dictionary-like access).

This method provides backward compatibility with dictionary access patterns. However, direct attribute access (config.field_name) is preferred.

Parameters:

key -- Configuration key (supports dot notation for nested keys)
default -- Default value if the key is not found

Returns:

Configuration value or default

validate() → bool

Validate the configuration (re-validate after modifications).

Returns: True if valid
Raises: ConfigValidationError -- If validation fails

__getitem__(key: str) → Any

Dictionary-like access for backward compatibility.

Prefer direct attribute access: config.field_name

__contains__(key: str) → bool: Check if the configuration contains a key.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_enum_values': True, 'validate_assignment': True}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class habit.core.common.config_base.ConfigAccessor(config: BaseConfig | Dict[str, Any])

Bases: object

Unified interface for accessing configuration values.

Provides a consistent API for accessing configuration regardless of whether it's a Pydantic model or a dictionary.

This class helps transition from dictionary-based config access to strongly typed Pydantic model access.

__init__(config: BaseConfig | Dict[str, Any])

Initialize the config accessor.

Parameters: config -- Configuration object (BaseConfig instance or dict)

get(key: str, default: Any | None = None) → Any

Get a configuration value by key.

Supports:

- Direct attribute access for Pydantic models: config.field_name
- Dot notation for nested access: config.section.subsection.field
- Dictionary access for backward compatibility

Parameters:

key -- Configuration key (supports dot notation)
default -- Default value if the key is not found

Returns:

Configuration value or default

has(key: str) → bool

Check if the configuration contains a key.

Parameters: key -- Configuration key (supports dot notation)
Returns: True if the key exists

get_section(section_name: str) → BaseConfig | Dict[str, Any] | None

Get a configuration section.

Parameters: section_name -- Section name (supports dot notation)
Returns: Configuration section or None

property raw_config: BaseConfig | Dict[str, Any]: Get the raw configuration object.

to_dict() → Dict[str, Any]: Convert the configuration to a dictionary.
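Dot-notation lookup that works the same way on dictionaries and on attribute-style objects can be sketched as follows. `get_by_dotted_key` is a hypothetical helper, and `SimpleNamespace` stands in for a Pydantic model so that the example needs only the standard library.

```python
from types import SimpleNamespace
from typing import Any

# Sentinel so that a stored None is not mistaken for a missing key.
_MISSING = object()


def get_by_dotted_key(config: Any, key: str, default: Any = None) -> Any:
    """Walk `key` segment by segment through dicts or attribute-style objects."""
    current = config
    for part in key.split("."):
        if isinstance(current, dict):
            current = current.get(part, _MISSING)
        else:
            current = getattr(current, part, _MISSING)
        if current is _MISSING:
            return default
    return current


as_dict = {"model": {"params": {"n_clusters": 3}}}
as_object = SimpleNamespace(model=SimpleNamespace(params={"n_clusters": 3}))

from_dict = get_by_dotted_key(as_dict, "model.params.n_clusters")
from_object = get_by_dotted_key(as_object, "model.params.n_clusters")
missing = get_by_dotted_key(as_dict, "model.missing", "n/a")
```

Deciding per segment whether to use item or attribute access is what lets a single key traverse a mix of models and plain dictionaries.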

Data Utilities

DataFrame Utilities

Common utility functions for DataFrame operations. Eliminates code duplication across the codebase.

habit.core.common.dataframe_utils.remove_nan_arrays(*arrays: ndarray) → List[ndarray]

Remove NaN values from multiple arrays simultaneously.

Parameters: *arrays -- Variable number of numpy arrays
Returns: List of arrays with NaN rows removed

Example

>>> import numpy as np
>>> from habit.core.common.dataframe_utils import remove_nan_arrays
>>> y_true = np.array([0, 1, np.nan, 1])
>>> y_pred = np.array([0.2, 0.8, 0.5, np.nan])
>>> clean_true, clean_pred = remove_nan_arrays(y_true, y_pred)
>>> len(clean_true)
2
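The key point in the example above is that a position is dropped from every array as soon as any one array holds NaN there. A minimal sketch of that rule, written for plain Python lists rather than NumPy arrays and using the hypothetical name `remove_nan_rows`, is:

```python
import math
from typing import List, Sequence


def remove_nan_rows(*arrays: Sequence[float]) -> List[List[float]]:
    """Keep only the positions where no input holds NaN (plain-list sketch)."""
    keep = [
        index
        for index, row in enumerate(zip(*arrays))
        if not any(math.isnan(value) for value in row)
    ]
    return [[array[index] for index in keep] for array in arrays]


nan = float("nan")
clean_true, clean_pred = remove_nan_rows([0, 1, nan, 1], [0.2, 0.8, 0.5, nan])
```

Positions 2 and 3 are removed from both lists because each has a NaN in one of them, which leaves two aligned values per list.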

habit.core.common.dataframe_utils.create_prediction_dataframe(y_true: ndarray, y_pred_proba: ndarray, y_pred: ndarray | None = None) → DataFrame

Create a DataFrame for prediction data.

Parameters:

y_true -- True labels array
y_pred_proba -- Predicted probabilities array
y_pred -- Optional predicted labels array

Returns:

DataFrame with columns y_true, y_pred_proba, [y_pred]

Return type:

DataFrame

Example

>>> import numpy as np
>>> from habit.core.common.dataframe_utils import create_prediction_dataframe
>>> df = create_prediction_dataframe(
...     y_true=np.array([0, 1, 0]),
...     y_pred_proba=np.array([0.2, 0.8, 0.3]),
...     y_pred=np.array([0, 1, 0])
... )
>>> df.columns.tolist()
['y_true', 'y_pred_proba', 'y_pred']

habit.core.common.dataframe_utils.clean_prediction_data(y_true, y_pred_proba, y_pred=None)

Clean prediction data by removing NaN values.

Parameters:

y_true -- True labels
y_pred_proba -- Predicted probabilities
y_pred -- Optional predicted labels

Returns:

Tuple of (y_true_clean, y_pred_proba_clean, y_pred_clean)

Example

>>> import numpy as np
>>> from habit.core.common.dataframe_utils import clean_prediction_data
>>> y_true = np.array([0, 1, np.nan, 1])
>>> y_pred_proba = np.array([0.2, 0.8, 0.5, np.nan])
>>> clean_true, clean_prob, clean_pred = clean_prediction_data(y_true, y_pred_proba)
>>> len(clean_true)
2

habit.core.common.dataframe_utils.ensure_dataframe(data: DataFrame | ndarray, columns: List[str] | None = None) → DataFrame

Ensure the input is a DataFrame, converting from numpy if necessary.

Parameters:

data -- Input data (DataFrame or numpy array)
columns -- Optional column names for numpy arrays

Returns:

DataFrame representation of the data

Example

>>> import numpy as np
>>> import pandas as pd
>>> from habit.core.common.dataframe_utils import ensure_dataframe
>>> arr = np.array([[1, 2], [3, 4]])
>>> df = ensure_dataframe(arr, columns=['a', 'b'])
>>> isinstance(df, pd.DataFrame)
True

habit.core.common.dataframe_utils.validate_binary_labels(y: ndarray) → None

Validate that labels are binary (0 or 1).

Parameters: y -- Label array to validate
Raises: ValueError -- If labels are not binary

Example

>>> import numpy as np
>>> from habit.core.common.dataframe_utils import validate_binary_labels
>>> validate_binary_labels(np.array([0, 1, 0, 1]))
>>> validate_binary_labels(np.array([0, 1, 2]))
Traceback (most recent call last):
    ...
ValueError: Labels must be binary (0 or 1)

habit.core.common.dataframe_utils.validate_probabilities(y_pred_proba: ndarray) → None

Validate that predicted probabilities are in the valid range [0, 1].

Parameters: y_pred_proba -- Predicted probabilities array
Raises: ValueError -- If probabilities are outside the [0, 1] range

Example

>>> import numpy as np
>>> from habit.core.common.dataframe_utils import validate_probabilities
>>> validate_probabilities(np.array([0.2, 0.8, 0.5]))
>>> validate_probabilities(np.array([0.2, 1.5, 0.5]))
Traceback (most recent call last):
    ...
ValueError: Probabilities must be in range [0, 1]
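The two validators above follow the same pattern: return nothing on success and raise ValueError otherwise. A minimal sketch of both checks on plain Python sequences is shown below; the function names are hypothetical, and how the real functions treat NaN values is not stated in this reference.

```python
from typing import Iterable


def check_binary_labels(labels: Iterable[float]) -> None:
    """Raise ValueError unless every label is 0 or 1."""
    if not set(labels) <= {0, 1}:
        raise ValueError("Labels must be binary (0 or 1)")


def check_probabilities(probabilities: Iterable[float]) -> None:
    """Raise ValueError if any value lies outside [0, 1]."""
    # Note: NaN passes this check, because comparisons with NaN are False.
    if any(p < 0 or p > 1 for p in probabilities):
        raise ValueError("Probabilities must be in range [0, 1]")


check_binary_labels([0, 1, 0, 1])
check_probabilities([0.2, 0.8, 0.5])

try:
    check_probabilities([0.2, 1.5, 0.5])
    rejected = False
except ValueError:
    rejected = True
```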

habit.core.common.dataframe_utils.normalize_probabilities(y_pred_proba: ndarray) → ndarray

Normalize probabilities to the [0, 1] range using min-max scaling.

Parameters: y_pred_proba -- Predicted probabilities array
Returns: Normalized probabilities in the [0, 1] range

Example

>>> import numpy as np
>>> from habit.core.common.dataframe_utils import normalize_probabilities
>>> probs = np.array([0.1, 0.2, 0.3])
>>> norm_probs = normalize_probabilities(probs)
>>> np.all((norm_probs >= 0) & (norm_probs <= 1))
True
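Min-max scaling maps each value x to (x - min) / (max - min), so the smallest input becomes 0 and the largest becomes 1, even when the inputs already lie inside [0, 1]. The sketch below shows the formula on a plain list; `min_max_normalize` is a hypothetical name, and returning zeros for a constant input is an assumption, since this reference does not say how that case is handled.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values linearly so that min maps to 0 and max maps to 1."""
    low, high = min(values), max(values)
    if high == low:
        # Assumption: a constant input maps to zeros to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


scaled = min_max_normalize([0.1, 0.2, 0.3])
```

Here the endpoints 0.1 and 0.3 map to 0 and 1, and 0.2 lands at approximately 0.5.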