MLOps Python Package
This repository contains a Python code base with best practices designed to support your MLOps initiatives.
The package leverages several tools and tips to make your MLOps experience as flexible, robust, and productive as possible.
You can use this package as part of your MLOps toolkit or platform (e.g., Model Registry, Experiment Tracking, Realtime Inference, ...).
Related Resources:
- MLOps Coding Course (Learning): Learn how to create, develop, and maintain a state-of-the-art MLOps code base.
- Cookiecutter MLOps Package (Template): Start building and deploying Python packages and Docker images for MLOps tasks.
Table of Contents
- MLOps Python Package
- Table of Contents
- Install
- Usage
- Tools
- Tips
- Resources
Install
This section details the requirements, actions, and next steps to kickstart your MLOps project.
Prerequisites
- Python>=3.12: to benefit from the latest features and performance improvements
- Poetry>=1.8.2: to initialize the project virtual environment and its dependencies
Installation
- Clone this GitHub repository on your computer
# with ssh (recommended)
$ git clone git@github.com:fmind/mlops-python-package
# with https
$ git clone https://github.com/fmind/mlops-python-package
$ cd mlops-python-package/
$ poetry install
- Adapt the code base to your desire
Next Steps
From there, there are dozens of ways to integrate this package into your MLOps platform.
For instance, you can use Databricks or AWS as your compute platform and model registry.
It's up to you to adapt the package code to the solution you target. Good luck champ!
Usage
This section explains how to configure the project code and execute it on your system.
Configuration
You can add or edit config files in the confs/ folder to change the program behavior.
# confs/training.yaml
job:
  KIND: TrainingJob
  inputs:
    KIND: ParquetReader
    path: data/inputs_train.parquet
  targets:
    KIND: ParquetReader
    path: data/targets_train.parquet
This config file instructs the program to start a TrainingJob with 2 parameters:
- inputs: the dataset that contains the model inputs
- targets: the dataset that contains the model targets
You can find all the parameters of your program in the src/[package]/jobs/*.py files.
You can also print the full schema supported by this package using poetry run bikes --schema.
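The program resolves the KIND field in each config block to a concrete class (the package uses Pydantic discriminated unions for this; see the tips below). Here is a minimal stdlib sketch of the same dispatch idea — the class names and registry are illustrative, not the package's actual API:

```python
import dataclasses

# Hypothetical registry mapping a KIND string to the class that implements it.
JOB_KINDS: dict[str, type] = {}

def register(cls: type) -> type:
    """Register a job class under its class name as the KIND key."""
    JOB_KINDS[cls.__name__] = cls
    return cls

@register
@dataclasses.dataclass
class TrainingJob:
    inputs: dict
    targets: dict

def load_job(config: dict):
    """Instantiate the job class named by config['KIND'] with the remaining keys."""
    params = {key: value for key, value in config.items() if key != "KIND"}
    return JOB_KINDS[config["KIND"]](**params)

# Mirrors the confs/training.yaml example above.
config = {
    "KIND": "TrainingJob",
    "inputs": {"KIND": "ParquetReader", "path": "data/inputs_train.parquet"},
    "targets": {"KIND": "ParquetReader", "path": "data/targets_train.parquet"},
}
job = load_job(config)
```

In the actual package, Pydantic performs this selection and also validates each parameter against the job's schema.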
Execution
The project code can be executed with poetry during your development:
$ poetry run [package] confs/tuning.yaml
$ poetry run [package] confs/training.yaml
$ poetry run [package] confs/promotion.yaml
$ poetry run [package] confs/inference.yaml
$ poetry run [package] confs/evaluations.yaml
$ poetry run [package] confs/explanations.yaml
In production, you can build, ship, and run the project as a Python package:
$ poetry build
$ poetry publish # optional
$ python -m pip install [package]
$ [package] confs/inference.yaml
You can also install and use this package as a library for another AI/ML project:
from [package] import jobs
job = jobs.TrainingJob(...)
with job as runner:
    runner.run()
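The snippet above uses the job as a context manager. A hedged stdlib sketch of how such a job might be structured — the class and its internals are illustrative, not the package's real implementation:

```python
class TrainingJob:
    """Illustrative job: __enter__ sets up resources, __exit__ tears them down."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.events: list[str] = []

    def __enter__(self) -> "TrainingJob":
        self.events.append("setup")     # e.g., start an experiment run, attach loggers
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.events.append("teardown")  # e.g., close the run, flush logs

    def run(self) -> None:
        self.events.append("run")       # the actual training logic would go here

job = TrainingJob(name="demo")
with job as runner:
    runner.run()
```

The context-manager protocol guarantees teardown runs even if run() raises, which is why the package exposes jobs this way.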
Additional tips:
- You can pass extra configs from the command line using the --extras flag
- Use it to pass runtime values (e.g., a result from previous job executions)
- You can pass several config files in the command-line to merge them from left to right
- You can define common configurations shared between jobs (e.g., model params)
- The right job task will be selected automatically thanks to Pydantic Discriminated Unions
- This is a great way to run any job supported by the application (training, tuning, ...)
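Merging config files "from left to right" means that values from later files override values from earlier ones, key by key. A stdlib sketch of that merge behavior (the package's actual merge implementation may differ):

```python
def merge(left: dict, right: dict) -> dict:
    """Recursively merge right into left; right-hand values win on conflicts."""
    out = dict(left)
    for key, value in right.items():
        if key in out and isinstance(out[key], dict) and isinstance(value, dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out

# Hypothetical example: a shared base config overridden by a runtime extra.
defaults = {"job": {"KIND": "TrainingJob", "epochs": 10}}
override = {"job": {"epochs": 20}}  # e.g., passed via the --extras flag
config = merge(defaults, override)
```

This is what lets you keep common configurations (e.g., model params) in one file and layer job-specific or runtime values on top.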
Automation
This project includes several automation tasks to easily repeat common actions.
You can invoke these actions from the command line or from the VS Code extension.
# execute the project DAG
$ inv projects
# create a code archive
$ inv packages
# list other actions
$ inv --list
Available tasks:
- checks.all (checks) - Run all check tasks.
- checks.code - Check the code with ruff.
- checks.coverage - Check the coverage with coverage.
- checks.format - Check the formats with ruff.
- checks.poetry - Check poetry config files.
- checks.security - Check the security with bandit.
- checks.test - Check the tests with pytest.
- checks.type - Check the types with mypy.
- cleans.all (cleans) - Run all tools and folders tasks.
- cleans.cache - Clean the cache folder.
- cleans.coverage - Clean the coverage tool.
- cleans.dist - Clean the dist folder.
- cleans.docs - Clean the docs folder.
- cleans.environment - Clean the project environment file.
- cleans.folders - Run all folders tasks.
- cleans.mlruns - Clean the mlruns folder.
- cleans.mypy - Clean the mypy tool.
- cleans.outputs - Clean the outputs folder.
- cleans.poetry - Clean poetry lock file.
- cleans.pytest - Clean the pytest tool.
- cleans.projects - Run all projects tasks.
- cleans.python - Clean python caches and bytecodes.
- cleans.requirements - Clean the project requirements file.
- cleans.reset - Run all tools, folders, and sources tasks.
- cleans.ruff - Clean the ruff tool.
- cleans.sources - Run all sources tasks.
- cleans.tools - Run all tools tasks.
- cleans.venv - Clean the venv folder.
- commits.all (commits) - Run all commit tasks.
- commits.bump - Bump the version of the package.
- commits.commit - Commit all changes with a message.
- commits.info - Print a guide for messages.
- containers.all (containers) - Run all container tasks.
- containers.build - Build the container image with the given tag.
- containers.compose - Start up docker compose.
- containers.run - Run the container image with the given tag.
- docs.all (docs) - Run all docs tasks.
- docs.api - Document the API with pdoc using the given format and output directory.
- docs.serve - Serve the API docs with pdoc using the given format and computer port.
- formats.all (formats) - Run all format tasks.
- formats.imports - Format python imports with ruff.
- formats.sources - Format python sources with ruff.
- installs.all (installs) - Run all install tasks.
- installs.poetry - Install poetry packages.
- installs.pre-commit - Install pre-commit hooks on git.
- mlflow.all (mlflow) - Run all mlflow tasks.
- mlflow.doctor - Run mlflow doctor to diagnose issues.
- mlflow.serve - Start mlflow server with the given host, port, and backend uri.
- packages.all (packages) - Run all package tasks.
- packages.build - Build a python package with the given format.
- projects.all (projects) - Run all project tasks.
- projects.environment - Export the project environment file.
- projects.requirements - Export the project requirements file.
- projects.run - Run an mlflow project from MLproject file.
Workflows
This package supports two GitHub Workflows in .github/workflows:
- check.yml: validate the quality of the package on each Pull Request
- publish.yml: build and publish the docs and packages on code release
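As a rough sketch, the pull-request workflow might look like the following (the job and step names are illustrative; check the actual file in .github/workflows for the real definition):

```yaml
name: Check
on:
  pull_request:
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install poetry invoke
      - run: poetry install
      - run: poetry run inv checks
```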
You can use and extend these workflows to automate repetitive package management tasks.
Tools
This section motivates the use of developer tools to improve your coding experience.
Automation
Pre-defined actions to automate your project development.
Commits: Commitizen
- Motivations:
- Limitations:
- Learning curve for new users
- Alternatives:
- Do It Yourself (DIY)
Git Hooks: Pre-Commit
- Motivations:
- Check your code locally before a commit
- Avoid wasting resources on your CI/CD
- Can perform extra actions (e.g., file cleanup)
- Limitations:
- Add overhead before your commit
- Alternatives:
- Git Hooks: less convenient to use
Tasks: PyInvoke
- Motivations:
- Automate project workflows
- Sane syntax compared to alternatives
- Good trade-off between power/simplicity
- Limitations:
- Not familiar to most developers
- Alternatives:
- Make: most popular, but awful syntax
CI/CD
Execution of automated workflows on code push and releases.
Runner: GitHub Actions
- Motivations:
- Native on GitHub
- Simple workflow syntax
- Lots of configs if needed
- Limitations:
- SaaS Service
- Alternatives:
- GitLab: can be installed on-premise
CLI
Integrations with the Command-Line Interface (CLI) of your system.
Parser: Argparse
- Motivations:
- Provide CLI arguments
- Included in Python runtime
- Sufficient for providing configs
- Limitations:
- More verbose for advanced parsing
- Alternatives:
Logging: Loguru
- Motivations:
- Show progress to the user
- Work fine out of the box
- Saner logging syntax
- Limitations:
- Doesn't let you deviate from the base usage
- Alternatives:
- Logging: available by default, but feels dated
Code
Edition, validation, and versioning of your project source code.
Coverage: Coverage
- Motivations:
- Report code covered by tests
- Identify code path to test
- Show maturity