MLOps Python Package
This repository contains a Python code base with best practices designed to support your MLOps initiatives.
The package leverages several tools and tips to make your MLOps experience as flexible, robust, and productive as possible.
You can use this package as part of your MLOps toolkit or platform (e.g., Model Registry, Experiment Tracking, Realtime Inference, ...).
Related Resources:
- MLOps Coding Course (Learning): Learn how to create, develop, and maintain a state-of-the-art MLOps code base.
- Cookiecutter MLOps Package (Template): Start building and deploying Python packages and Docker images for MLOps tasks.
Table of Contents
- MLOps Python Package
- Table of Contents
- Install
- Usage
- Tools
- Tips
- Resources
Install
This section details the requirements, actions, and next steps to kickstart your MLOps project.
Prerequisites
- Python>=3.12: to benefit from the latest features and performance improvements
- Poetry>=1.8.2: to initialize the project virtual environment and its dependencies
Installation
- Clone this GitHub repository on your computer
# with ssh (recommended)
$ git clone git@github.com:fmind/mlops-python-package
# with https
$ git clone https://github.com/fmind/mlops-python-package
$ cd mlops-python-package/
$ poetry install
- Adapt the code base to your desire
Next Steps
From there, there are dozens of ways to integrate this package into your MLOps platform.
For instance, you can use Databricks or AWS as your compute platform and model registry.
It's up to you to adapt the package code to the solution you target. Good luck champ!
Usage
This section explains how to configure the project code and execute it on your system.
Configuration
You can add or edit config files in the confs/ folder to change the program behavior.
# confs/training.yaml
job:
  KIND: TrainingJob
  inputs:
    KIND: ParquetReader
    path: data/inputs_train.parquet
  targets:
    KIND: ParquetReader
    path: data/targets_train.parquet
This config file instructs the program to start a TrainingJob with 2 parameters:
- inputs: the dataset that contains the model inputs
- targets: the dataset that contains the model targets
You can find all the parameters of your program in the src/[package]/jobs/*.py files.
You can also print the full schema supported by this package using poetry run bikes --schema.
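The program resolves the KIND field in each config block to a concrete class (the package uses Pydantic discriminated unions for this; see the tips below). Here is a minimal stdlib sketch of the same dispatch idea — the class names and registry are illustrative, not the package's actual API:

```python
import dataclasses

# Hypothetical registry mapping a KIND string to the class that implements it.
JOB_KINDS: dict[str, type] = {}

def register(cls: type) -> type:
    """Register a job class under its class name as the KIND key."""
    JOB_KINDS[cls.__name__] = cls
    return cls

@register
@dataclasses.dataclass
class TrainingJob:
    inputs: dict
    targets: dict

def load_job(config: dict):
    """Instantiate the job class named by config['KIND'] with the remaining keys."""
    params = {key: value for key, value in config.items() if key != "KIND"}
    return JOB_KINDS[config["KIND"]](**params)

# Mirrors the confs/training.yaml example above.
config = {
    "KIND": "TrainingJob",
    "inputs": {"KIND": "ParquetReader", "path": "data/inputs_train.parquet"},
    "targets": {"KIND": "ParquetReader", "path": "data/targets_train.parquet"},
}
job = load_job(config)
```

In the actual package, Pydantic performs this selection and also validates each parameter against the job's schema.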
Execution
The project code can be executed with poetry during your development:
$ poetry run [package] confs/tuning.yaml
$ poetry run [package] confs/training.yaml
$ poetry run [package] confs/promotion.yaml
$ poetry run [package] confs/inference.yaml
$ poetry run [package] confs/evaluations.yaml
$ poetry run [package] confs/explanations.yaml
In production, you can build, ship, and run the project as a Python package:
$ poetry build
$ poetry publish # optional
$ python -m pip install [package]
$ [package] confs/inference.yaml
You can also install and use this package as a library for another AI/ML project:
from [package] import jobs
job = jobs.TrainingJob(...)
with job as runner:
    runner.run()
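The snippet above uses the job as a context manager. A hedged stdlib sketch of how such a job might be structured — the class and its internals are illustrative, not the package's real implementation:

```python
class TrainingJob:
    """Illustrative job: __enter__ sets up resources, __exit__ tears them down."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.events: list[str] = []

    def __enter__(self) -> "TrainingJob":
        self.events.append("setup")     # e.g., start an experiment run, attach loggers
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.events.append("teardown")  # e.g., close the run, flush logs

    def run(self) -> None:
        self.events.append("run")       # the actual training logic would go here

job = TrainingJob(name="demo")
with job as runner:
    runner.run()
```

The context-manager protocol guarantees teardown runs even if run() raises, which is why the package exposes jobs this way.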
Additional tips:
- You can pass extra configs from the command line using the --extras flag
- Use it to pass runtime values (e.g., a result from previous job executions)
- You can pass several config files in the command-line to merge them from left to right
- You can define common configurations shared between jobs (e.g., model params)
- The right job task will be selected automatically thanks to Pydantic Discriminated Unions
- This is a great way to run any job supported by the application (training, tuning, ...)
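Merging config files "from left to right" means that values from later files override values from earlier ones, key by key. A stdlib sketch of that merge behavior (the package's actual merge implementation may differ):

```python
def merge(left: dict, right: dict) -> dict:
    """Recursively merge right into left; right-hand values win on conflicts."""
    out = dict(left)
    for key, value in right.items():
        if key in out and isinstance(out[key], dict) and isinstance(value, dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out

# Hypothetical example: a shared base config overridden by a runtime extra.
defaults = {"job": {"KIND": "TrainingJob", "epochs": 10}}
override = {"job": {"epochs": 20}}  # e.g., passed via the --extras flag
config = merge(defaults, override)
```

This is what lets you keep common configurations (e.g., model params) in one file and layer job-specific or runtime values on top.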
Automation
This project includes several automation tasks to easily repeat common actions.
You can invoke these actions from the command line or from the VS Code extension.
# execute the project DAG
$ inv projects
# create a code archive
$ inv packages
# list other actions
$ inv --list
Available tasks:
- checks.all (checks) - Run all check tasks.
- checks.code - Check the code with ruff.
- checks.coverage - Check the coverage with coverage.
- checks.format - Check the formats with ruff.
- checks.poetry - Check poetry config files.
- checks.security - Check the security with bandit.
- checks.test - Check the tests with pytest.
- checks.type - Check the types with mypy.
- cleans.all (cleans) - Run all tools and folders tasks.
- cleans.cache - Clean the cache folder.
- cleans.coverage - Clean the coverage tool.
- cleans.dist - Clean the dist folder.
- cleans.docs - Clean the docs folder.
- cleans.environment - Clean the project environment file.
- cleans.folders - Run all folders tasks.
- cleans.mlruns - Clean the mlruns folder.
- cleans.mypy - Clean the mypy tool.
- cleans.outputs - Clean the outputs folder.
- cleans.poetry - Clean poetry lock file.
- cleans.pytest - Clean the pytest tool.
- cleans.projects - Run all projects tasks.
- cleans.python - Clean python caches and bytecodes.
- cleans.requirements - Clean the project requirements file.
- cleans.reset - Run all tools, folders, and sources tasks.
- cleans.ruff - Clean the ruff tool.
- cleans.sources - Run all sources tasks.
- cleans.tools - Run all tools tasks.
- cleans.venv - Clean the venv folder.
- commits.all (commits) - Run all commit tasks.
- commits.bump - Bump the version of the package.
- commits.commit - Commit all changes with a message.
- commits.info - Print a guide for messages.
- containers.all (containers) - Run all container tasks.
- containers.build - Build the container image with the given tag.
- containers.compose - Start up docker compose.
- containers.run - Run the container image with the given tag.
- docs.all (docs) - Run all docs tasks.
- docs.api - Document the API with pdoc using the given format and output directory.
- docs.serve - Serve the API docs with pdoc using the given format and computer port.
- formats.all (formats) - Run all format tasks.
- formats.imports - Format python imports with ruff.
- formats.sources - Format python sources with ruff.
- installs.all (installs) - Run all install tasks.
- installs.poetry - Install poetry packages.
- installs.pre-commit - Install pre-commit hooks on git.
- mlflow.all (mlflow) - Run all mlflow tasks.
- mlflow.doctor - Run mlflow doctor to diagnose issues.
- mlflow.serve - Start mlflow server with the given host, port, and backend uri.
- packages.all (packages) - Run all package tasks.
- packages.build - Build a python package with the given format.
- projects.all (projects) - Run all project tasks.
- projects.environment - Export the project environment file.
- projects.requirements - Export the project requirements file.
- projects.run - Run an mlflow project from MLproject file.
Workflows
This package supports two GitHub Workflows in .github/workflows:
- check.yml: validate the quality of the package on each Pull Request
- publish.yml: build and publish the docs and packages on code release
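As a rough sketch, the pull-request workflow might look like the following (the job and step names are illustrative; check the actual file in .github/workflows for the real definition):

```yaml
name: Check
on:
  pull_request:
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install poetry invoke
      - run: poetry install
      - run: poetry run inv checks
```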
You can use and extend these workflows to automate repetitive package management tasks.
Tools
This section motivates the use of developer tools to improve your coding experience.
Automation
Pre-defined actions to automate your project development.
Commits: Commitizen
- Motivations:
- Limitations:
- Learning curve for new users
- Alternatives:
- Do It Yourself (DIY)
Git Hooks: Pre-Commit
- Motivations:
- Check your code locally before a commit
- Avoid wasting resources on your CI/CD
- Can perform extra actions (e.g., file cleanup)
- Limitations:
- Add overhead before your commit
- Alternatives:
- Git Hooks: less convenient to use
Tasks: PyInvoke
- Motivations:
- Automate project workflows
- Sane syntax compared to alternatives
- Good trade-off between power/simplicity
- Limitations:
- Not familiar to most developers
- Alternatives:
- Make: most popular, but awful syntax
CI/CD
Execution of automated workflows on code push and releases.
Runner: GitHub Actions
- Motivations:
- Native on GitHub
- Simple workflow syntax
- Lots of configs if needed
- Limitations:
- SaaS Service
- Alternatives:
- GitLab: can be installed on-premise
CLI
Integrations with the Command-Line Interface (CLI) of your system.
Parser: Argparse
- Motivations:
- Provide CLI arguments
- Included in Python runtime
- Sufficient for providing configs
- Limitations:
- More verbose for advanced parsing
- Alternatives:
Logging: Loguru
- Motivations:
- Show progress to the user
- Work fine out of the box
- Saner logging syntax
- Limitations:
- Doesn't let you deviate from the base usage
- Alternatives:
- Logging: available by default, but feels dated
Code
Edition, validation, and versioning of your project source code.
Coverage: Coverage
- Motivations:
- Report code covered by tests
- Identify code path to test
- Show maturity