mlvern package
Subpackages
- mlvern.core package
- Submodules
- mlvern.core.forge module
- Forge: Forge.evaluate(), Forge.get_dataset_path(), Forge.get_dataset_report(), Forge.get_project_stats(), Forge.get_run(), Forge.get_run_artifacts(), Forge.get_run_metrics(), Forge.get_run_tags(), Forge.init(), Forge.list_datasets(), Forge.list_models(), Forge.list_runs(), Forge.load_dataset(), Forge.load_dataset_by_hash(), Forge.load_model(), Forge.predict(), Forge.prune_datasets(), Forge.register_dataset(), Forge.register_model(), Forge.remove_run(), Forge.run(), Forge.save_dataset(), Forge.tag_run()
- Module contents
- Forge: Forge.evaluate(), Forge.get_dataset_path(), Forge.get_dataset_report(), Forge.get_project_stats(), Forge.get_run(), Forge.get_run_artifacts(), Forge.get_run_metrics(), Forge.get_run_tags(), Forge.init(), Forge.list_datasets(), Forge.list_models(), Forge.list_runs(), Forge.load_dataset(), Forge.load_dataset_by_hash(), Forge.load_model(), Forge.predict(), Forge.prune_datasets(), Forge.register_dataset(), Forge.register_model(), Forge.remove_run(), Forge.run(), Forge.save_dataset(), Forge.tag_run()
- mlvern.data package
- mlvern.utils package
Module contents
- class mlvern.Forge(project: str, base_dir: str = '.')
Bases: object
- evaluate(run_id_or_model: Any, X_test, y_test, output_dir: str | None = None) → Dict[str, Any]
Evaluate a model and generate evaluation metrics and plots.
- Args:
run_id_or_model: Either a run_id (str) or a model object
X_test: Test features
y_test: Test labels
output_dir: Directory to save evaluation plots
- Returns:
Dict with metrics and paths to generated plots
- get_dataset_path(dataset_hash: str) → str
Get the filesystem path for a dataset by hash.
- Args:
dataset_hash: The dataset hash identifier
- Returns:
Absolute path to the dataset directory
- Raises:
ValueError: If dataset hash not found
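Because `get_dataset_path` raises `ValueError` for an unknown hash, code that only probes for a dataset may prefer a `None` result over an exception. A minimal sketch, assuming `forge` is an already constructed `Forge` instance; the helper name `dataset_path_or_none` is hypothetical and not part of mlvern:

```python
from typing import Any, Optional


def dataset_path_or_none(forge: Any, dataset_hash: str) -> Optional[str]:
    """Return the dataset directory path, or None if the hash is unknown.

    `forge` is expected to expose get_dataset_path(dataset_hash) as documented
    above. This helper is illustrative only and not part of mlvern.
    """
    try:
        return forge.get_dataset_path(dataset_hash)
    except ValueError:
        # Documented failure mode: the hash is not known to the project.
        return None
```

Catching only `ValueError` keeps unrelated failures (for example, permission errors) visible to the caller.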
- get_dataset_report(dataset_hash: str) → Dict[str, Any]
Get aggregated report for a dataset.
Includes inspection, statistics, risk, and EDA reports.
- Args:
dataset_hash: Hash of the dataset
- Returns:
Aggregated report dict
- get_project_stats() → Dict[str, Any]
Get overall project statistics.
- Returns:
Dictionary with dataset count, run count, total size, etc.
- get_run(run_id: str) → Dict[str, Any]
Get run metadata and information by run_id.
- Args:
run_id: The run identifier
- Returns:
Dictionary with run metadata, metrics, and paths
- Raises:
ValueError: If run not found
- get_run_artifacts(run_id: str) → Dict[str, str]
Get paths to all artifacts for a run (model, config, metrics, etc.).
- Args:
run_id: The run identifier
- Returns:
Dictionary mapping artifact names to their filesystem paths
- get_run_metrics(run_id: str) → Dict[str, Any]
Get metrics for a specific run.
- Args:
run_id: The run identifier
- Returns:
Metrics dictionary
- get_run_tags(run_id: str) → Dict[str, Any]
Get tags for a specific run.
- Args:
run_id: The run identifier
- Returns:
Dictionary of tags
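The three per-run accessors (`get_run_metrics`, `get_run_tags`, `get_run_artifacts`) take the same `run_id`, so they combine naturally into one summary. The sketch below assumes `forge` is an existing `Forge` instance; `summarize_run` and the keys of the dictionary it returns are hypothetical choices, not part of mlvern:

```python
from typing import Any, Dict


def summarize_run(forge: Any, run_id: str) -> Dict[str, Any]:
    """Collect metrics, tags, and artifact paths for one run in a single dict.

    Only the documented accessors are called; the layout of the returned
    dictionary is an illustrative choice, not an mlvern format.
    """
    return {
        "run_id": run_id,
        "metrics": forge.get_run_metrics(run_id),
        "tags": forge.get_run_tags(run_id),
        "artifacts": forge.get_run_artifacts(run_id),
    }
```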
- load_dataset(dataset_hash: str) → Dict[str, Any] | None
Load dataset metadata and paths by hash.
- Args:
dataset_hash: The dataset hash identifier
- Returns:
Dictionary containing dataset info and paths to reports/plots
- Raises:
ValueError: If dataset not found
- load_dataset_by_hash(dataset_hash: str) → DataFrame
Load a dataset from storage by hash.
- Args:
dataset_hash: Hash of the dataset
- Returns:
Loaded DataFrame
- Raises:
FileNotFoundError: If dataset not found
- load_model(run_id: str, safe: bool = True) → Any
Load a trained model from a run.
- Args:
run_id: The run identifier
safe: If True, warn about pickle security risks
- Returns:
The loaded model object
- Raises:
ValueError: If run not found or model not found
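The `safe` flag exists because unpickling can execute arbitrary code, so model files should only be loaded from trusted sources. The sketch below shows one way a loader could emit such a warning using only the standard library. It is not mlvern's implementation; `load_pickled_model` and the idea of loading from a single file path are assumptions made for illustration:

```python
import pickle
import warnings
from pathlib import Path
from typing import Any


def load_pickled_model(path: str, safe: bool = True) -> Any:
    """Unpickle an object from `path`, warning first when `safe` is True.

    Hypothetical helper: mirrors the documented `safe` flag and ValueError,
    but does not reproduce mlvern's actual loading logic.
    """
    model_path = Path(path)
    if not model_path.is_file():
        raise ValueError(f"model not found: {path}")
    if safe:
        warnings.warn(
            "Unpickling can execute arbitrary code; only load trusted files.",
            UserWarning,
            stacklevel=2,
        )
    with model_path.open("rb") as fh:
        return pickle.load(fh)
```

The warning does not make loading any safer; it only reminds the caller to check where the file came from.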
- predict(run_id_or_model: Any, X_test) → Any
Make predictions using a model from a run or passed model object.
- Args:
run_id_or_model: Either a run_id (str) or a model object
X_test: Test data for predictions
- Returns:
Predictions array
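Both `predict` and `evaluate` accept either a run ID or a model object. One plausible way to resolve that argument is sketched below: a `str` is treated as a run ID and loaded through `load_model`, and anything else is used as the model directly. This dispatch rule is an assumption, not confirmed mlvern behaviour. The helper names are hypothetical, and the model is assumed to expose a scikit-learn-style `predict(X)` method:

```python
from typing import Any


def resolve_model(forge: Any, run_id_or_model: Any) -> Any:
    """Return a model object given either a run ID (str) or a model."""
    if isinstance(run_id_or_model, str):
        # Assumed rule: strings are run IDs, loaded via the documented method.
        return forge.load_model(run_id_or_model)
    return run_id_or_model


def predict_with(forge: Any, run_id_or_model: Any, X_test: Any) -> Any:
    """Resolve the model, then delegate to its predict(X) method."""
    model = resolve_model(forge, run_id_or_model)
    return model.predict(X_test)
```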
- prune_datasets(older_than_days: int = 30, confirm: bool = False) → List[str]
Remove datasets older than specified number of days.
- Args:
older_than_days: Remove datasets older than this many days
confirm: Must be True to perform deletion (safety check)
- Returns:
List of removed dataset hashes
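The `confirm` flag is a guard against accidental deletion. The standalone sketch below illustrates the pattern on plain directories. It is not mlvern's implementation: `prune_dirs` is hypothetical, age is measured by directory modification time, and returning an empty list when `confirm` is False is an assumption (the entry above does not say what an unconfirmed call returns):

```python
import shutil
import time
from pathlib import Path
from typing import List

SECONDS_PER_DAY = 86400


def prune_dirs(root: str, older_than_days: int = 30, confirm: bool = False) -> List[str]:
    """Delete subdirectories of `root` last modified before the cutoff.

    Nothing is deleted unless `confirm` is True. Returns the names of the
    removed subdirectories (empty when unconfirmed; assumed behaviour).
    """
    if not confirm:
        return []
    cutoff = time.time() - older_than_days * SECONDS_PER_DAY
    removed = []
    for entry in sorted(Path(root).iterdir()):
        if entry.is_dir() and entry.stat().st_mtime < cutoff:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed
```

Defaulting `confirm` to False means a call that omits the flag can never delete anything, which is the same safety property described for `prune_datasets` and `remove_run`.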
- register_model(model: Any, metadata: Dict[str, Any], model_id: str | None = None) → str
Register a model in the model registry.
- Args:
model: The model object (will be saved)
metadata: Metadata dict (should include: model_name, source_run_id, description, hyperparameters, etc.)
model_id: Optional custom model ID; auto-generated if not provided
- Returns:
The model ID
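The metadata keys suggested above (`model_name`, `source_run_id`, `description`, `hyperparameters`) can be filled in when promoting a model from an existing run. A minimal sketch, assuming `forge` is an existing `Forge` instance; `register_from_run` is a hypothetical helper that only chains the documented `load_model` and `register_model` calls:

```python
from typing import Any, Dict, Optional


def register_from_run(
    forge: Any,
    run_id: str,
    model_name: str,
    description: str = "",
    hyperparameters: Optional[Dict[str, Any]] = None,
) -> str:
    """Load the model trained in `run_id` and register it with metadata.

    Returns the model ID reported by register_model; no custom model_id is
    passed, so the ID is auto-generated as documented.
    """
    model = forge.load_model(run_id)
    metadata = {
        "model_name": model_name,
        "source_run_id": run_id,
        "description": description,
        # Copy so later changes to the caller's dict do not alter the record.
        "hyperparameters": dict(hyperparameters or {}),
    }
    return forge.register_model(model, metadata)
```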
- remove_run(run_id: str, confirm: bool = False) → bool
Remove a run and its artifacts.
- Args:
run_id: The run identifier
confirm: Must be True to perform deletion (safety check)
- Returns:
True if removal succeeded, False otherwise
- run(model, X_train, y_train, X_val, y_val, config: dict, dataset_fp)
Train a model and create a run record.
- save_dataset(df: DataFrame, dataset_hash: str, name: str | None = None, tags: Dict[str, Any] | None = None) → Dict[str, Any]
Save a dataframe to an existing dataset directory.
- Args:
df: DataFrame to save
dataset_hash: Hash of the dataset
name: Optional friendly name for the dataset
tags: Optional tags dict
- Returns:
Dict with save info