mlvern.data package
Submodules
mlvern.data.fingerprint module
mlvern.data.inspect module
- class mlvern.data.inspect.DataInspector(df: DataFrame, target: str | None = None, mlvern_dir: str = '.')
Bases: object
Comprehensive data profiling and validation framework.
- safe_numeric_profile(min_rows: int = 2) → dict[str, Any]
Check whether the dataset is large enough for numeric profiling.
Returns the analysis or an explicit skip status.
mlvern.data.register module
mlvern.data.risk_check module
- mlvern.data.risk_check.data_drift(baseline: DataFrame, current: DataFrame, cols: List[str] | None = None) → Dict[str, Any]
Check for distribution drift between the baseline and current datasets.
Uses the Kolmogorov-Smirnov (KS) test for numeric columns and the chi-squared test for categorical columns.
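The numeric half of this check rests on the two-sample KS statistic, which is the largest vertical gap between the empirical CDFs of the two samples. The sketch below is a dependency-free illustration of that statistic only; it is not mlvern's implementation, and the helper name `ks_statistic` is hypothetical. In practice `scipy.stats.ks_2samp` computes the same statistic together with a p-value.

```python
from bisect import bisect_right


def ks_statistic(baseline, current):
    """Two-sample KS statistic: max gap between the two empirical CDFs.

    Hypothetical helper for illustration; assumes both samples are non-empty.
    """
    a, b = sorted(baseline), sorted(current)
    # The maximum gap is always attained at one of the observed values,
    # so it is enough to evaluate both ECDFs at every data point.
    return max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        for v in a + b
    )


same = ks_statistic([1, 2, 3, 4], [1, 2, 3, 4])        # identical samples
overlap = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])     # partial overlap
shifted = ks_statistic([1, 2, 3, 4], [11, 12, 13, 14])  # disjoint samples
```

A statistic near 0 means the two distributions look alike, and a value near 1 means they barely overlap.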
- mlvern.data.risk_check.run_risk_checks(df: DataFrame, target: str | None = None, sensitive: List[str] | None = None, baseline: DataFrame | None = None, train: DataFrame | None = None, test: DataFrame | None = None, mlvern_dir: str | None = None) → Dict[str, Any]
- mlvern.data.risk_check.sampling_bias(baseline: DataFrame, current: DataFrame, cols: List[str] | None = None) → Dict[str, Any]
Compare categorical distributions using chi-squared test.
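To make the chi-squared comparison concrete, the following sketch computes the test statistic for one categorical column observed in two samples. It is a minimal, standard-library illustration under stated assumptions: both samples are non-empty, and the name `chi2_statistic` is hypothetical rather than part of mlvern. It returns only the statistic and degrees of freedom; `scipy.stats.chi2_contingency` would also supply a p-value.

```python
from collections import Counter


def chi2_statistic(baseline, current):
    """Chi-squared statistic for a 2 x k table of category counts.

    Hypothetical helper for illustration, not mlvern's code.
    """
    counts = [Counter(baseline), Counter(current)]
    categories = sorted(set(counts[0]) | set(counts[1]))
    row_totals = [len(baseline), len(current)]
    grand_total = sum(row_totals)

    stat = 0.0
    for cat in categories:
        col_total = counts[0][cat] + counts[1][cat]
        for row, row_total in zip(counts, row_totals):
            # Expected count if category frequencies were the same in both samples.
            expected = row_total * col_total / grand_total
            stat += (row[cat] - expected) ** 2 / expected

    dof = len(categories) - 1  # (rows - 1) * (cols - 1) with two rows
    return stat, dof


baseline = ["a"] * 50 + ["b"] * 50
current = ["a"] * 80 + ["b"] * 20
stat_same, dof_same = chi2_statistic(baseline, baseline)
stat_diff, dof_diff = chi2_statistic(baseline, current)
```

Identical samples give a statistic of 0, and the statistic grows as the category proportions diverge.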
- mlvern.data.risk_check.sensitive_attribute_imbalance(df: DataFrame, sensitive_cols: List[str]) → Dict[str, Any]
mlvern.data.statistics module
- mlvern.data.statistics.compute_statistics(df: DataFrame, target: str | None = None, mlvern_dir: str | None = None) → Dict[str, Any]
Collect dataset statistics by combining several statistics functions.
- mlvern.data.statistics.correlations(df: DataFrame, method: str = 'pearson') → DataFrame
Compute correlation matrix (pearson or spearman).
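The difference between the two methods can be illustrated on a single pair of columns: Pearson measures linear association, while Spearman is Pearson applied to ranks and therefore captures any monotonic relationship. The helpers below (`pearson`, `average_ranks`, `spearman`) are hypothetical, standard-library sketches and not mlvern's implementation; whether mlvern delegates to pandas is an assumption, although `DataFrame.corr(method=...)` does accept both names.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear in x
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
```

Here the Spearman coefficient is 1 because `y` increases whenever `x` does, while the Pearson coefficient falls below 1 because the relationship is not a straight line.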
- mlvern.data.statistics.dimensionality_signals(df: DataFrame, n_components: int = 5) → Dict[str, Any]
- mlvern.data.statistics.distribution_shape(df: DataFrame, col: str) → Dict[str, Any]
Assess approximate distribution shape using skewness and kurtosis.
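As a rough illustration of the two quantities involved, the sketch below computes population skewness and excess kurtosis from central moments. The function name `shape_moments` and the returned keys are hypothetical; the thresholds mlvern uses to label a shape are not documented here, so none are assumed. Note that pandas' `Series.skew` and `Series.kurt` apply bias corrections, so their values differ slightly on small samples.

```python
def shape_moments(values):
    """Population skewness and excess kurtosis of a non-constant sequence.

    Hypothetical helper for illustration, not mlvern's code.
    """
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3, and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "skewness": m3 / m2 ** 1.5,        # 0 for symmetric data
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # 0 for a normal distribution
    }


symmetric = shape_moments([1, 2, 3, 4, 5])
right_skewed = shape_moments([1, 1, 1, 1, 10])
```

Positive skewness indicates a longer right tail, and negative excess kurtosis indicates lighter tails than a normal distribution.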
- mlvern.data.statistics.feature_target_association(df: DataFrame, target: str) → Dict[str, Any]
- mlvern.data.statistics.hypothesis_test_two_samples(x: Series, y: Series) → Dict[str, Any]
Perform a two-sample t-test (Welch's t-test).
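Welch's test differs from the classic Student's t-test in that it does not assume equal variances in the two samples. The sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom with the standard library only; the name `welch_t` is hypothetical and this is not mlvern's implementation. For a p-value, `scipy.stats.ttest_ind(x, y, equal_var=False)` performs the same test.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch t statistic and approximate degrees of freedom.

    Hypothetical helper; assumes each sample has at least two values
    and at least one sample has non-zero variance.
    """
    nx, ny = len(x), len(y)
    # Squared standard error contributed by each sample (sample variance / n).
    sx, sy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(sx + sy)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (sx + sy) ** 2 / (sx ** 2 / (nx - 1) + sy ** 2 / (ny - 1))
    return t, dof


t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```

With equal sample sizes and equal variances, as in this example, the degrees of freedom reduce to the familiar `nx + ny - 2`.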
- mlvern.data.statistics.interaction_patterns(df: DataFrame, target: str | None = None, top_n: int = 10) → Dict[str, Any]
Detect interaction patterns via pairwise product correlation.
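One plausible reading of "pairwise product correlation" is: for every pair of numeric features, multiply them element-wise and correlate the product with the target, then keep the strongest pairs. The sketch below follows that reading and should be taken as an assumption, not as mlvern's actual algorithm; in particular, the behavior when `target` is `None` is not documented here and is not modeled. The names `pearson` and `top_interactions` are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def top_interactions(features, target, top_n=10):
    """Rank feature pairs by |corr(feature_a * feature_b, target)|.

    `features` maps column names to equal-length lists of numbers.
    """
    scores = []
    for a, b in combinations(features, 2):
        product = [u * v for u, v in zip(features[a], features[b])]
        scores.append(((a, b), pearson(product, target)))
    # Strongest association first, regardless of sign.
    scores.sort(key=lambda item: abs(item[1]), reverse=True)
    return scores[:top_n]


features = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 1, 4, 3, 6, 5],
    "x3": [5, 3, 6, 1, 2, 4],
}
# Synthetic target driven entirely by the x1 * x2 interaction.
target = [a * b for a, b in zip(features["x1"], features["x2"])]
ranked = top_interactions(features, target, top_n=2)
```

Because the synthetic target is exactly `x1 * x2`, that pair ranks first with a correlation of 1.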
- mlvern.data.statistics.numeric_summary(df: DataFrame, cols: List[str] | None = None) → Dict[str, Dict[str, Any]]
Compute the mean, median, standard deviation, and skewness of numeric columns.