habitat_analysis module
Habitat Analysis module for the HABIT package.
This module provides:
- HabitatAnalysis: main class for habitat clustering analysis
- Configuration schemas: HabitatAnalysisConfig, ResultColumns
- Analyzer classes: HabitatMapAnalyzer (formerly HabitatFeatureExtractor)
- habit.core.habitat_analysis.get_import_errors() [source]
Get dictionary of import errors that occurred during module loading.
- Returns:
Dictionary mapping class names to error messages
- habit.core.habitat_analysis.get_available_classes() [source]
Get dictionary of successfully imported classes.
- Returns:
Dictionary mapping class names to their classes
- habit.core.habitat_analysis.is_class_available(class_name: str) -> bool [source]
Check if a specific class is available.
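The three helpers above follow a common lazy-import guard pattern. Below is a minimal, self-contained sketch of that pattern; the `_AVAILABLE`/`_ERRORS` registries and `_try_import` helper are illustrative, not the package's actual internals — only the public helper signatures mirror the documented API.

```python
# Minimal sketch of the import-guard pattern behind the module's helpers.
# Registry names are hypothetical; only the public function signatures
# (get_import_errors, get_available_classes, is_class_available) mirror
# the documented API.
import importlib
from typing import Any, Dict

_AVAILABLE: Dict[str, Any] = {}
_ERRORS: Dict[str, str] = {}

def _try_import(class_name: str, module_path: str) -> None:
    """Attempt an import; record the class on success, the error on failure."""
    try:
        module = importlib.import_module(module_path)
        _AVAILABLE[class_name] = getattr(module, class_name)
    except Exception as exc:  # broad by design: any failure is recorded
        _ERRORS[class_name] = str(exc)

def get_import_errors() -> Dict[str, str]:
    return dict(_ERRORS)

def get_available_classes() -> Dict[str, Any]:
    return dict(_AVAILABLE)

def is_class_available(class_name: str) -> bool:
    return class_name in _AVAILABLE

_try_import("OrderedDict", "collections")      # succeeds
_try_import("MissingClass", "no_such_module")  # fails; the error is recorded
```

This lets callers degrade gracefully (e.g., skip an analyzer whose optional dependency is missing) instead of failing at import time.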
Core Analysis
HabitatAnalysis is the main entry-point class for performing habitat analysis.
Configuration
These classes define the configuration structure for habitat analysis; understanding them is essential for customizing the analysis workflow.
Configuration schemas for habitat analysis workflows. Uses Pydantic for robust validation and type safety.
- class habit.core.habitat_analysis.config_schemas.HabitatAnalysisConfig(*, config_file: str | None = None, config_version: str | None = None, data_dir: str, out_dir: str, run_mode: Literal['train', 'predict'] = 'train', pipeline_path: str | None = None, FeatureConstruction: FeatureConstructionConfig | None = None, HabitatsSegmention: HabitatsSegmentionConfig | None = None, processes: Annotated[int, Gt(gt=0)] = 2, plot_curves: bool = True, save_images: bool = True, save_results_csv: bool = True, random_state: int = 42, verbose: bool = True, debug: bool = False) [source]
Bases: BaseConfig
Root model for the entire habitat analysis configuration.
- FeatureConstruction: FeatureConstructionConfig | None
- HabitatsSegmention: HabitatsSegmentionConfig | None
- validate_mode_dependent_fields() [source]
Validate that required fields are present based on run_mode.
- In train mode: FeatureConstruction and HabitatsSegmention are required.
- In predict mode: FeatureConstruction is optional, but HabitatsSegmention.clustering_mode is needed.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_enum_values': True, 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
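As a sketch of how the schema's fields fit together, here is a hypothetical train-mode configuration expressed as a plain dict. The path values and the voxel-level `method` value are placeholders; in practice the dict would be validated by HabitatAnalysisConfig, which forbids extra keys (`extra: 'forbid'`).

```python
# Hypothetical train-mode configuration mirroring HabitatAnalysisConfig.
# Field names match the schema above; paths and the voxel-level method
# value are placeholders.
config = {
    "data_dir": "/path/to/images",
    "out_dir": "/path/to/output",
    "run_mode": "train",            # 'train' requires the two sections below
    "FeatureConstruction": {
        "voxel_level": {"method": "raw", "params": {}},  # hypothetical method
    },
    "HabitatsSegmention": {         # spelling follows the schema's field name
        "clustering_mode": "two_step",
    },
    "processes": 2,
    "random_state": 42,
}

# Mimic validate_mode_dependent_fields(): in train mode both sections
# must be present.
if config["run_mode"] == "train":
    missing = [k for k in ("FeatureConstruction", "HabitatsSegmention")
               if config.get(k) is None]
    assert not missing, f"train mode requires: {missing}"
```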
- class habit.core.habitat_analysis.config_schemas.VoxelLevelConfig(*, method: str, params: Dict[str, Any] = <factory>) [source]
Bases: BaseModel
- model_config: ClassVar[ConfigDict] = {}
- class habit.core.habitat_analysis.config_schemas.SupervoxelLevelConfig(*, supervoxel_file_keyword: str = '*_supervoxel.nrrd', method: str = 'mean_voxel_features()', params: Dict[str, Any] = <factory>) [source]
Bases: BaseModel
- model_config: ClassVar[ConfigDict] = {}
- class habit.core.habitat_analysis.config_schemas.PreprocessingMethod(*, method: Literal['winsorize', 'minmax', 'zscore', 'robust', 'log', 'binning', 'variance_filter', 'correlation_filter'], global_normalize: bool = False, winsor_limits: List[float] | None = None, n_bins: int | None = None, bin_strategy: Literal['uniform', 'quantile', 'kmeans'] | None = None, variance_threshold: float | None = None, corr_threshold: float | None = None, corr_method: Literal['pearson', 'spearman', 'kendall'] | None = None) [source]
Bases: BaseModel
- method: Literal['winsorize', 'minmax', 'zscore', 'robust', 'log', 'binning', 'variance_filter', 'correlation_filter']
- model_config: ClassVar[ConfigDict] = {}
- class habit.core.habitat_analysis.config_schemas.PreprocessingConfig(*, methods: List[PreprocessingMethod] = <factory>) [source]
Bases: BaseModel
- methods: List[PreprocessingMethod]
- model_config: ClassVar[ConfigDict] = {}
- class habit.core.habitat_analysis.config_schemas.FeatureConstructionConfig(*, voxel_level: VoxelLevelConfig, supervoxel_level: SupervoxelLevelConfig | None = None, preprocessing_for_subject_level: PreprocessingConfig | None = None, preprocessing_for_group_level: PreprocessingConfig | None = None) [source]
Bases: BaseModel
- voxel_level: VoxelLevelConfig
- supervoxel_level: SupervoxelLevelConfig | None
- preprocessing_for_subject_level: PreprocessingConfig | None
- preprocessing_for_group_level: PreprocessingConfig | None
- model_config: ClassVar[ConfigDict] = {}
- class habit.core.habitat_analysis.config_schemas.OneStepSettings(*, min_clusters: int = 2, max_clusters: int = 10, fixed_n_clusters: int | None = None, selection_method: Literal['silhouette', 'calinski_harabasz', 'davies_bouldin', 'inertia', 'kneedle'] = 'silhouette', plot_validation_curves: bool = True) [source]
Bases: BaseModel
Settings for one-step clustering mode (voxel -> habitat directly).
In one-step mode, each subject is clustered independently. You can either:
1. Specify a fixed number of clusters (fixed_n_clusters)
2. Let the algorithm automatically select the optimal number of clusters (min_clusters/max_clusters + selection_method)
- selection_method: Literal['silhouette', 'calinski_harabasz', 'davies_bouldin', 'inertia', 'kneedle']
- model_config: ClassVar[ConfigDict] = {}
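Automatic selection over [min_clusters, max_clusters] amounts to scoring each candidate k with a cluster-validity index and picking the best one according to selection_method. A dependency-free toy sketch of that selection step (the scores are made up; in a real run silhouette or Calinski-Harabasz values would come from a library such as scikit-learn, and inertia/kneedle need elbow detection rather than a simple min/max):

```python
# Toy sketch of cluster-count selection from precomputed validity scores.
from typing import Dict

def select_n_clusters(scores: Dict[int, float], method: str) -> int:
    """Pick k from {k: score}. Higher is better for silhouette and
    calinski_harabasz; lower is better for davies_bouldin. inertia and
    kneedle require elbow detection and are omitted from this sketch."""
    if method in ("silhouette", "calinski_harabasz"):
        return max(scores, key=scores.get)
    if method == "davies_bouldin":
        return min(scores, key=scores.get)
    raise ValueError(f"unsupported in this sketch: {method}")

# Hypothetical silhouette scores for candidate k = 2..5
silhouette_scores = {2: 0.41, 3: 0.58, 4: 0.52, 5: 0.37}
best_k = select_n_clusters(silhouette_scores, "silhouette")
```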
- class habit.core.habitat_analysis.config_schemas.ConnectedComponentPostprocessConfig(*, enabled: bool = False, min_component_size: Annotated[int, Ge(ge=1)] = 30, connectivity: Literal[1, 2, 3] = 1, reassign_method: Literal['neighbor_vote'] = 'neighbor_vote', max_iterations: Annotated[int, Ge(ge=1)] = 3) [source]
Bases: BaseModel
Connected-component post-processing settings for label-map cleanup.
- model_config: ClassVar[ConfigDict] = {}
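To make the cleanup concrete, here is a toy 2-D sketch of the neighbor_vote idea: connected components smaller than min_component_size are reassigned to the majority label among their neighbors. This is a single-pass illustration on a list-of-lists grid, not the package's implementation (which operates on 3-D label maps and iterates up to max_iterations).

```python
# Toy 2-D sketch of connected-component cleanup with neighbor-vote
# reassignment. Parameter names mirror ConnectedComponentPostprocessConfig;
# the implementation is illustrative only.
from collections import Counter, deque

def _components(grid, label):
    """Yield connected components (sets of (r, c)) of a given label,
    using 4-connectivity (connectivity=1 in 2D)."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != label or (r, c) in seen:
                continue
            comp, queue = set(), deque([(r, c)])
            seen.add((r, c))
            while queue:
                cr, cc = queue.popleft()
                comp.add((cr, cc))
                for nr, nc in ((cr-1, cc), (cr+1, cc), (cr, cc-1), (cr, cc+1)):
                    if 0 <= nr < rows and 0 <= nc < cols \
                            and grid[nr][nc] == label and (nr, nc) not in seen:
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            yield comp

def remove_small_components(grid, min_component_size):
    """Reassign components smaller than min_component_size to the majority
    label among their 4-neighbors (one pass; the real config iterates)."""
    labels = {v for row in grid for v in row}
    for label in labels:
        for comp in list(_components(grid, label)):
            if len(comp) >= min_component_size:
                continue
            votes = Counter()
            for r, c in comp:
                for nr, nc in ((r-1, c), (r+1, c), (r, c-1), (r, c+1)):
                    if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) \
                            and (nr, nc) not in comp:
                        votes[grid[nr][nc]] += 1
            if votes:
                winner = votes.most_common(1)[0][0]
                for r, c in comp:
                    grid[r][c] = winner
    return grid

grid = [
    [1, 1, 1, 1],
    [1, 2, 1, 1],   # the lone 2 is a one-voxel island
    [1, 1, 1, 1],
]
remove_small_components(grid, min_component_size=2)
```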
- class habit.core.habitat_analysis.config_schemas.SupervoxelClusteringConfig(*, algorithm: Literal['kmeans', 'gmm', 'slic'] = 'kmeans', n_clusters: int = 50, random_state: int = 42, max_iter: int = 300, n_init: int = 10, compactness: float = 0.1, sigma: float = 0.0, enforce_connectivity: bool = True, one_step_settings: OneStepSettings = <factory>) [source]
Bases: BaseModel
- one_step_settings: OneStepSettings
- model_config: ClassVar[ConfigDict] = {}
- class habit.core.habitat_analysis.config_schemas.HabitatClusteringConfig(*, algorithm: Literal['kmeans', 'gmm'] = 'kmeans', max_clusters: int = 10, min_clusters: int | None = 2, habitat_cluster_selection_method: str | List[str] = 'inertia', fixed_n_clusters: int | None = None, random_state: int = 42, max_iter: int = 300, n_init: int = 10) [source]
Bases: BaseModel
- model_config: ClassVar[ConfigDict] = {}
- class habit.core.habitat_analysis.config_schemas.HabitatsSegmentionConfig(*, clustering_mode: Literal['one_step', 'two_step', 'direct_pooling'] = 'two_step', supervoxel: SupervoxelClusteringConfig = <factory>, habitat: HabitatClusteringConfig = <factory>, postprocess_supervoxel: ConnectedComponentPostprocessConfig = <factory>, postprocess_habitat: ConnectedComponentPostprocessConfig = <factory>) [source]
Bases: BaseModel
- supervoxel: SupervoxelClusteringConfig
- habitat: HabitatClusteringConfig
- postprocess_supervoxel: ConnectedComponentPostprocessConfig
- postprocess_habitat: ConnectedComponentPostprocessConfig
- model_config: ClassVar[ConfigDict] = {}
- class habit.core.habitat_analysis.config_schemas.ResultColumns [source]
Bases: object
Centralized column name definitions for pipeline outputs.
This avoids magic strings across the codebase and keeps feature/metadata column handling consistent in all pipeline steps and managers.
- SUBJECT = 'Subject'
- SUPERVOXEL = 'Supervoxel'
- COUNT = 'Count'
- HABITATS = 'Habitats'
- ORIGINAL_SUFFIX = '-original'
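Using the shared constants instead of magic strings keeps feature/metadata column handling consistent. A small illustrative sketch — the class is re-declared locally so the snippet is self-contained (in the package you would import ResultColumns from config_schemas), and the rows and the 'firstorder_Mean' feature name are made up:

```python
# Local re-declaration of the documented constants, for a self-contained
# example; in practice import ResultColumns from config_schemas.
class ResultColumns:
    SUBJECT = 'Subject'
    SUPERVOXEL = 'Supervoxel'
    COUNT = 'Count'
    HABITATS = 'Habitats'
    ORIGINAL_SUFFIX = '-original'

# Hypothetical pipeline rows keyed by the shared constants.
rows = [
    {ResultColumns.SUBJECT: "sub-001", ResultColumns.SUPERVOXEL: 1,
     ResultColumns.COUNT: 350, ResultColumns.HABITATS: 2},
    {ResultColumns.SUBJECT: "sub-001", ResultColumns.SUPERVOXEL: 2,
     ResultColumns.COUNT: 120, ResultColumns.HABITATS: 1},
]

# Feature columns can be built from the suffix constant rather than a
# hard-coded string scattered through the codebase:
feature_col = "firstorder_Mean" + ResultColumns.ORIGINAL_SUFFIX

# Metadata lookups use the same constants, e.g. the habitat label of the
# largest supervoxel:
habitat_of_largest = max(rows, key=lambda r: r[ResultColumns.COUNT])[ResultColumns.HABITATS]
```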
- class habit.core.habitat_analysis.config_schemas.FeatureExtractionConfig(*, config_file: str | None = None, config_version: str | None = None, params_file_of_non_habitat: str, params_file_of_habitat: str, raw_img_folder: str, habitats_map_folder: str, out_dir: str, n_processes: int = 4, habitat_pattern: str = '*_habitats.nrrd', feature_types: List[str], n_habitats: int | None = None, debug: bool = False) [source]
Bases: BaseConfig
Configuration for habitat feature extraction workflow.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_enum_values': True, 'validate_assignment': True}
- class habit.core.habitat_analysis.config_schemas.PathsConfig(*, params_file: str, images_folder: str, out_dir: str) [source]
Bases: BaseModel
Paths configuration for radiomics extraction.
- model_config: ClassVar[ConfigDict] = {}
- class habit.core.habitat_analysis.config_schemas.ProcessingConfig(*, n_processes: Annotated[int, Gt(gt=0)] = 2, save_every_n_files: Annotated[int, Gt(gt=0)] = 5, process_image_types: List[str] | None = None, target_labels: List[int] = <factory>) [source]
Bases: BaseModel
Processing configuration for radiomics extraction.
- model_config: ClassVar[ConfigDict] = {}
- class habit.core.habitat_analysis.config_schemas.ExportConfig(*, export_by_image_type: bool = True, export_combined: bool = True, export_format: Literal['csv', 'json', 'pickle'] = 'csv', add_timestamp: bool = True) [source]
Bases: BaseModel
Export configuration for radiomics extraction.
- model_config: ClassVar[ConfigDict] = {}
- class habit.core.habitat_analysis.config_schemas.LoggingConfig(*, level: Literal['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'] = 'INFO', console_output: bool = True, file_output: bool = True) [source]
Bases: BaseModel
Logging configuration for radiomics extraction.
- model_config: ClassVar[ConfigDict] = {}
- class habit.core.habitat_analysis.config_schemas.RadiomicsConfig(*, config_file: str | None = None, config_version: str | None = None, paths: PathsConfig, processing: ProcessingConfig = <factory>, export: ExportConfig = <factory>, logging: LoggingConfig = <factory>, params_file: str | None = None, images_folder: str | None = None, out_dir: str | None = None, n_processes: int | None = None) [source]
Bases: BaseConfig
Configuration for traditional radiomics feature extraction.
- paths: PathsConfig
- processing: ProcessingConfig
- export: ExportConfig
- logging: LoggingConfig
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_enum_values': True, 'validate_assignment': True}
Analysis Strategies
Different strategies determine how habitat features are derived from the ROI.
Two-step strategy: voxel -> supervoxel -> habitat clustering. Refactored to use HabitatPipeline with the template method pattern.
- class habit.core.habitat_analysis.strategies.two_step_strategy.TwoStepStrategy(analysis: HabitatAnalysis) [source]
Two-step clustering strategy using HabitatPipeline.
Flow:
1) Voxel feature extraction (Pipeline Step 1)
2) Subject-level preprocessing (Pipeline Step 2)
3) Individual clustering (voxel -> supervoxel) (Pipeline Step 3)
4) Supervoxel feature extraction (conditional) (Pipeline Step 4)
5) Supervoxel feature aggregation (Pipeline Step 5)
6) Combine supervoxels (Pipeline Step 6) - merge all subjects' supervoxels
7) Group-level preprocessing (Pipeline Step 7)
8) Population clustering (supervoxel -> habitat) (Pipeline Step 8)
Note: This strategy supports parallel processing through HabitatPipeline. Use config.processes to control the number of parallel workers for individual-level steps (Steps 1-5). Group-level steps (6-8) process all subjects together.
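Conceptually, the two stages reduce to: cluster each subject's voxels into supervoxels, summarize each supervoxel by its mean feature vector, then cluster the pooled supervoxel summaries into habitats. A deliberately simplified 1-D sketch of that flow — thresholding stands in for k-means/GMM so the example stays dependency-free, and all data are made up:

```python
# Simplified 1-D two-step sketch: threshold "clustering" stands in for
# k-means/GMM; the data and the 0.5 cut are arbitrary.
from statistics import mean

def cluster_1d(values, cut):
    """Toy clustering: label 0 below the cut, 1 at or above it."""
    return [0 if v < cut else 1 for v in values]

subjects = {                     # hypothetical per-subject voxel intensities
    "sub-001": [0.1, 0.2, 0.8, 0.9],
    "sub-002": [0.15, 0.85, 0.95, 0.05],
}

# Steps 1-3: per-subject clustering (voxel -> supervoxel)
supervoxel_means = []            # steps 4-6: summarize and pool
for subj, voxels in subjects.items():
    labels = cluster_1d(voxels, cut=0.5)
    for lab in set(labels):
        members = [v for v, l in zip(voxels, labels) if l == lab]
        supervoxel_means.append((subj, lab, mean(members)))

# Step 8: population clustering (supervoxel -> habitat) on the pooled means
habitats = cluster_1d([m for _, _, m in supervoxel_means], cut=0.5)
```

The key property illustrated: the final habitat labels are defined at the population level, so supervoxels from different subjects with similar feature profiles land in the same habitat.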
One-step strategy: voxel -> habitat clustering per subject. Refactored to use HabitatPipeline with the template method pattern.
- class habit.core.habitat_analysis.strategies.one_step_strategy.OneStepStrategy(analysis: HabitatAnalysis) [source]
One-step clustering strategy using HabitatPipeline.
Flow:
1) Voxel feature extraction (Pipeline Step 1)
2) Subject-level preprocessing (Pipeline Step 2)
3) Individual clustering (voxel -> habitat per subject) (Pipeline Step 3)
4) Supervoxel aggregation (Pipeline Step 4) - calculates means per habitat
5) Combine supervoxels (Pipeline Step 5) - merge all subjects' results
Note: This strategy supports parallel processing through HabitatPipeline. Use config.processes to control the number of parallel workers.
Direct pooling strategy: concatenate all voxel features across subjects and cluster once. Refactored to use HabitatPipeline with the template method pattern.
- class habit.core.habitat_analysis.strategies.direct_pooling_strategy.DirectPoolingStrategy(analysis: HabitatAnalysis) [source]
Direct pooling strategy using HabitatPipeline.
## Overview
This strategy pools (concatenates) voxel features from ALL subjects into a single feature matrix before clustering. This enables the discovery of population-level tissue patterns that are representative across the entire cohort.
## Workflow
Voxel feature extraction (Pipeline Step 1) - extract features for each subject
Subject-level preprocessing (Pipeline Step 2) - normalize within each subject
Concatenate all voxels (Pipeline Step 3) - merge all subjects' voxels into one matrix
Group-level preprocessing (Pipeline Step 4) - apply population-level transformations
Population clustering (Pipeline Step 5) - cluster all voxels -> discover habitats
## Why Pool All Voxels?
Rationale: By pooling voxels from all subjects, the clustering algorithm can discover tissue patterns that are consistent and reproducible across the entire population. This approach is particularly effective for:
- Discovering common biological phenotypes (e.g., "highly perfused tissue" vs "necrotic tissue")
- Identifying dominant habitat patterns shared by multiple subjects
- Quickly prototyping and exploring population-level tissue heterogeneity
## About Data Leakage
Important: This strategy is NOT equivalent to label leakage in the traditional machine learning sense. Here's why:
Unsupervised Learning: Habitat discovery is an UNSUPERVISED process (no labels involved)
Feature Space Only: Pooling occurs in the FEATURE space (imaging intensities), not the label space (clinical outcomes)
Pre-modeling Step: Habitat segmentation is performed BEFORE building predictive models
Pipeline Isolation: When used in predictive workflows, the clustering model is fitted on training data only and applied to test data via the saved Pipeline
Analogy: It's similar to performing k-means clustering on pooled MRI intensities to discover tissue types—the clustering doesn't "know" which subjects are diseased vs healthy.
## Use Cases
Recommended for:
- Exploratory analysis to discover dominant tissue patterns
- Fast prototyping and hypothesis generation
- Cohorts with moderate inter-subject variability
- Studies focusing on population-level habitat characterization
Not recommended for:
- Extremely heterogeneous cohorts where individual differences dominate
- Small sample sizes (prefer Two-Step or One-Step strategies)
- Studies requiring subject-specific habitat definitions
## Parallel Processing
This strategy supports parallel processing through HabitatPipeline:
- config.processes: controls parallel workers for individual-level steps (Steps 1-2)
- Group-level steps (3-5): process all subjects together (not parallelized)
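The pooling step itself is just a row-wise concatenation of per-subject feature matrices, with a parallel subject index kept so each voxel's habitat label can be mapped back after population clustering. A minimal sketch with made-up data:

```python
# Minimal direct-pooling sketch: merge per-subject voxel feature rows into
# one matrix (Step 3), keeping a parallel subject index so cluster labels
# can be mapped back to subjects after population clustering (Step 5).
per_subject = {                       # hypothetical voxel feature rows
    "sub-001": [[0.1, 1.0], [0.9, 0.2]],
    "sub-002": [[0.2, 0.8], [0.8, 0.1], [0.85, 0.15]],
}

pooled, subject_index = [], []
for subj, rows in per_subject.items():
    pooled.extend(rows)               # one shared feature matrix
    subject_index.extend([subj] * len(rows))

# After clustering `pooled`, labels[i] belongs to subject_index[i].
```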
Base strategy interface for habitat analysis.
- class habit.core.habitat_analysis.strategies.base_strategy.BaseClusteringStrategy(analysis: HabitatAnalysis) [source]
Bases: ABC
Abstract base class for habitat analysis strategies.
Each strategy should implement run() and return a results DataFrame.
- __init__(analysis: HabitatAnalysis) [source]
Initialize the strategy with a HabitatAnalysis instance.
- Parameters:
analysis -- HabitatAnalysis instance with shared utilities and configuration
- run(subjects: List[str] | None = None, save_results_csv: bool | None = None, load_from: str | None = None) -> DataFrame [source]
Template method for executing the strategy.
This method defines the algorithm skeleton. Subclasses can override specific steps if needed, but most will only need to implement strategy-specific logic in hooks.
- Parameters:
subjects -- List of subjects to process (None means all subjects)
save_results_csv -- Whether to save results to CSV (defaults to config.save_results_csv)
load_from -- Optional path to a saved pipeline. If provided, the pipeline is loaded and only transform() is executed.
- Returns:
Results DataFrame
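The template-method layout described for run() can be sketched as a fixed skeleton that calls overridable hooks. The hook names below are illustrative, not the package's actual hook names:

```python
# Illustrative template-method skeleton; hook names are hypothetical.
from abc import ABC, abstractmethod
from typing import List

class BaseStrategySketch(ABC):
    def run(self, subjects: List[str]) -> List[str]:
        """Fixed skeleton: per-subject extraction, then group clustering.
        Subclasses supply only the strategy-specific hooks."""
        trace = []
        for subj in subjects:
            trace.append(self.extract(subj))   # individual-level hook
        trace.append(self.cluster(subjects))   # group-level hook
        return trace

    @abstractmethod
    def extract(self, subject: str) -> str: ...

    @abstractmethod
    def cluster(self, subjects: List[str]) -> str: ...

class ToyTwoStep(BaseStrategySketch):
    def extract(self, subject):
        return f"features:{subject}"

    def cluster(self, subjects):
        return f"habitats:{len(subjects)} subjects"

trace = ToyTwoStep().run(["sub-001", "sub-002"])
```

The design keeps the step ordering in one place (the base class), so a new strategy only implements its hooks.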
Managers
These managers coordinate the concrete analysis steps, such as feature extraction, clustering, and result aggregation.
Feature Manager for Habitat Analysis. Handles all feature extraction and preprocessing logic.
- class habit.core.habitat_analysis.managers.feature_manager.FeatureManager(config: HabitatAnalysisConfig, logger: Logger) [source]
Bases: object
Manages feature extraction and preprocessing for habitat analysis.
- __init__(config: HabitatAnalysisConfig, logger: Logger) [source]
Initialize FeatureManager.
- Parameters:
config -- Habitat analysis configuration
logger -- Logger instance
- extract_voxel_features(subject: str) -> Tuple[str, DataFrame, DataFrame, dict] [source]
Extract voxel-level features for a single subject.
- Parameters:
subject -- Subject ID to process
- Returns:
Tuple of (subject_id, feature_df, raw_df, mask_info)
- extract_supervoxel_features(subject: str) -> Tuple[str, DataFrame | Exception] [source]
Extract supervoxel-level features from supervoxel maps.
- Parameters:
subject -- Subject ID to process
- Returns:
Tuple of (subject_id, features_df or Exception)
- apply_preprocessing(feature_df: DataFrame, level: str) -> DataFrame [source]
Apply preprocessing based on level (user-facing interface).
This method provides a simplified interface for applying preprocessing at different levels.
- Parameters:
feature_df -- DataFrame to preprocess
level -- 'subject' for individual level, 'group' for population level
- Returns:
Preprocessed DataFrame
Note
Group-level preprocessing is typically handled by Pipeline steps automatically. This method is primarily used for subject-level preprocessing.
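The practical difference between the two levels is the scope of the statistics used: subject-level preprocessing normalizes each subject with its own statistics, while group-level preprocessing uses statistics pooled over all subjects. A dependency-free min-max example with made-up intensities:

```python
# Subject-level vs group-level min-max scaling on made-up intensities.
def minmax(values, lo, hi):
    """Scale values into [0, 1] given a chosen min (lo) and max (hi)."""
    return [(v - lo) / (hi - lo) for v in values]

subjects = {"sub-001": [10.0, 20.0], "sub-002": [100.0, 200.0]}

# Subject level: each subject is scaled by its own min/max, so both
# subjects span the full [0, 1] range independently.
subject_scaled = {s: minmax(v, min(v), max(v)) for s, v in subjects.items()}

# Group level: one min/max pooled over everyone, so between-subject
# intensity differences are preserved.
pooled = [v for vals in subjects.values() for v in vals]
group_scaled = {s: minmax(v, min(pooled), max(pooled)) for s, v in subjects.items()}
```

Which level is appropriate depends on whether between-subject intensity differences are signal (keep them: group level) or scanner/protocol nuisance (remove them: subject level).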
- calculate_supervoxel_means(subject: str, feature_df: DataFrame, raw_df: DataFrame, supervoxel_labels: ndarray, n_clusters_supervoxel: int) -> DataFrame [source]
Calculate supervoxel-level features (aggregated from voxel features).
Analyzers & Extractors
Habitat Feature Extraction Tool (refactored version). This tool provides functionality for extracting features from habitat maps:
1. Radiomic features of raw images within different habitats
2. Radiomic features of habitats within the entire ROI
3. Number of disconnected regions and volume percentage for each habitat
4. MSI (Mutual Spatial Integrity) features from habitat maps
5. ITH (Intratumoral Heterogeneity) scores from habitat maps
- class habit.core.habitat_analysis.analyzers.habitat_analyzer.HabitatMapAnalyzer(params_file_of_non_habitat=None, params_file_of_habitat=None, raw_img_folder=None, habitats_map_folder=None, out_dir=None, n_processes=None, habitat_pattern=None, voxel_cutoff=10) [source]
Bases: object
Habitat Map Analyzer class (refactored)
This class provides functionality for extracting various features from habitat maps:
1. Radiomic features of raw images within different habitats
2. Radiomic features of habitats within the entire ROI
3. Number of disconnected regions and volume percentage for each habitat
4. MSI (Mutual Spatial Integrity) features from habitat maps
5. ITH (Intratumoral Heterogeneity) index from habitat maps
- __init__(params_file_of_non_habitat=None, params_file_of_habitat=None, raw_img_folder=None, habitats_map_folder=None, out_dir=None, n_processes=None, habitat_pattern=None, voxel_cutoff=10) [source]
Initialize the habitat feature extractor.
- Parameters:
params_file_of_non_habitat -- Parameter file for extracting radiomic features from raw images
params_file_of_habitat -- Parameter file for extracting radiomic features from habitat images
raw_img_folder -- Root directory of raw images
habitats_map_folder -- Root directory of habitat maps
out_dir -- Output directory
n_processes -- Number of processes to use
habitat_pattern -- Pattern for matching habitat files
voxel_cutoff -- Voxel threshold for filtering small regions in MSI feature calculation
- process_subject(subj, images_paths, habitat_paths, mask_paths=None, feature_types=None) [source]
Process a single subject for habitat feature extraction.
Voxel-level radiomics feature extractor
- class habit.core.habitat_analysis.extractors.voxel_radiomics_extractor.VoxelRadiomicsExtractor(**kwargs) [source]
Bases: BaseClusteringExtractor
Extract voxel-level radiomics features from the image within the mask region using PyRadiomics' voxel-based extraction.
- __init__(**kwargs) [source]
Initialize voxel-level radiomics feature extractor.
- Parameters:
**kwargs -- Additional parameters
- extract_features(image_data: str | Image, mask_data: str | Image, **kwargs) -> DataFrame [source]
Extract voxel-level radiomics features from the image within the mask region.
- Parameters:
image_data -- Path to image file or SimpleITK image object
mask_data -- Path to mask file or SimpleITK mask object
**kwargs -- Additional parameters:
  subj: subject name
  img_name: name of the image to append to feature names
- Returns:
Extracted voxel-level radiomics features
- Return type:
pd.DataFrame
Supervoxel-level radiomics feature extractor
- class habit.core.habitat_analysis.extractors.supervoxel_radiomics_extractor.SupervoxelRadiomicsExtractor(params_file: str | None = None, **kwargs) [source]
Bases: BaseClusteringExtractor
Extract radiomics features for each supervoxel in the supervoxel map.
- __init__(params_file: str | None = None, **kwargs) [source]
Initialize supervoxel radiomics feature extractor.
- Parameters:
params_file -- Path to PyRadiomics parameter file or YAML string containing parameters
**kwargs -- Additional parameters
- extract_features(image_data: str | Image, supervoxel_map: str | Image, config_file: str | None = None, **kwargs) -> DataFrame [source]
Extract radiomics features for each supervoxel in the supervoxel map.
- Parameters:
image_data -- Path to image file or SimpleITK image object
supervoxel_map -- Path to supervoxel map file or SimpleITK image object
config_file -- Path to PyRadiomics parameter file (overrides the one in constructor)
**kwargs -- Additional parameters
- Returns:
DataFrame with radiomics features for each supervoxel
- Return type:
pd.DataFrame