PI #007: Airflow - One Piece of Infra All ML Pipelines Must Have
The Benefits of Using Airflow. How to Integrate Airflow in Your Python Code.
This newsletter aims to give you weekly insights about designing and productionizing ML systems using MLOps good practices 🔥.
This week I will go over the following:
3 key non-obvious benefits of using an orchestration tool such as Airflow for your ML pipeline.
Integrating Airflow with your Python code in a modular and flexible way.
🎉 Also, I am thrilled to let you know that I recently started working at Metaphysic, a generative AI company.
🚀 Thus, along with my MLE and MLOps content, I will start talking about generative AI from my real-world experience in the field.
3 key non-obvious benefits of using an orchestration tool such as Airflow for your ML pipeline.
Here are the 3 key non-obvious benefits of using an orchestration tool such as Airflow for your ML pipeline. They will help you:
- minimize errors
- maximize the value you add
#1. Glue all your scripts together
Most ML pipelines consist of multiple scripts. In the example from my diagram, you run 6 scripts (yellow blocks) in a specific order and configuration.
Also, you might have decision points that introduce even more complexity: for example, running hyperparameter tuning or using the latest best configuration.
Using an orchestration tool (e.g., Airflow), you can easily connect all these components into a single DAG, which can be run with a button press.
Otherwise, you can easily run the wrong step and lose hours wondering what went wrong.
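Here is what gluing those steps into a single DAG can look like with Airflow's TaskFlow API. This is a minimal sketch: the task names are hypothetical placeholders, not the exact scripts from my diagram.

```
from datetime import datetime

from airflow.decorators import dag, task


@dag(
    schedule=None,  # no automatic schedule; Airflow 2.4+ name (older versions use schedule_interval)
    start_date=datetime(2023, 4, 1),
    catchup=False,
)
def ml_pipeline():
    @task
    def load_data():
        ...  # call your data loading script/module here

    @task
    def train_model():
        ...  # call your training script/module here

    @task
    def batch_predict():
        ...  # call your batch prediction script/module here

    # Chain the steps so they always run in the correct order.
    load_data() >> train_model() >> batch_predict()


ml_pipeline()
```

Once this file sits in Airflow's DAGs folder, the whole pipeline runs with a single trigger from the UI or CLI.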
#2. Never use the wrong versions & parameters
As you can see in the diagram, almost every step of your ML pipeline will generate an artifact with its own version:
- dataset
- configuration
- model
- predictions
Most of the steps within the ML pipeline require the versions of previously generated artifacts as input.
For example, to compute the predictions, you must specify the model version and features you want to use.
By manually setting all these versions, you can quickly introduce bugs. We are humans, and at some point, you will forget to update a version or set the wrong one.
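This is exactly where Airflow helps: a downstream task can receive the version produced by the upstream task through XCom instead of you typing it by hand. A minimal sketch, with hypothetical task and artifact names:

```
from airflow.decorators import task


@task
def create_dataset() -> str:
    # e.g., the version returned by your feature store or data versioning tool
    return "dataset-2023-04-20"


@task
def train_model(dataset_version: str) -> str:
    # The dataset version arrives automatically from the upstream task.
    return f"model-trained-on-{dataset_version}"


@task
def batch_predict(model_version: str) -> None:
    print(f"Generating predictions with {model_version}")


# Inside a @dag-decorated function, you would wire them as:
#   batch_predict(train_model(create_dataset()))
```

No version is ever set by hand, so there is nothing to forget.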
#3. Save time
Instead of doing tedious tasks, such as endlessly running various scripts, you can focus on tasks that add real value, such as:
- improving & scaling the solution,
- building new solutions,
- helping other people, etc.
By running your whole logic with a single call, your attention won't be split between multiple tasks, and you can easily focus on what matters.
Along with these 3 non-obvious benefits, orchestrating your ML pipeline also gives you the obvious ones (see the sketch after this list):
- scheduling jobs
- monitoring the entire pipeline in a single place
- backfilling
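Here is how those map onto a DAG's arguments (a minimal sketch; the daily schedule and start date are illustrative choices):

```
from datetime import datetime

from airflow.decorators import dag


@dag(
    schedule="@daily",                # run the pipeline every day
    start_date=datetime(2023, 1, 1),
    catchup=True,                     # backfill every missed run since start_date
)
def scheduled_ml_pipeline():
    ...


scheduled_ml_pipeline()
```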
To conclude…
By using an orchestration tool, you will:
1. Run your ML pipeline faster.
2. Manage your versions more easily.
3. Save time by delegating the boring tasks.
Which of these 3 is the most helpful for you?
Integrating Airflow with your Python code in a modular and flexible way.
When I initially learned how to use Airflow, the most challenging part was figuring out how to properly install my Python code inside Airflow without having to:
- copy my entire code inside Airflow
- duplicate code
- create weird dependencies
Basically, without creating spaghetti code.
The best solutions are to use either virtual environments or Docker containers.
Let me give you a concrete example of how to quickly do this using Python, Poetry, and venvs.
#1. Build your code
Using Poetry, you can quickly build your module into a wheel file using a single command:
```
poetry build
```
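For `poetry build` to work, your project needs a `pyproject.toml` describing the package. A minimal sketch (the package name, metadata, and Python version below are illustrative):

```
[tool.poetry]
name = "my-ml-pipeline"
version = "0.1.0"
description = "Training and batch prediction pipeline."
authors = ["Your Name <you@example.com>"]
packages = [{ include = "my_ml_pipeline" }]

[tool.poetry.dependencies]
python = "^3.9"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```

The command writes the wheel (and an sdist) into the `dist/` folder.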
#2. Publish your code to a PyPI repository
Again, using Poetry, you can quickly publish your package to a PyPI repository using the following:
```
poetry publish -r <my-pypi-repository>
```
Note that you can host your own PyPI repository or publish your package to the official one.
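If you go the private route, you first register that repository (and its credentials) with Poetry; the URL below is a placeholder for your own index:

```
# Register your private repository under a name Poetry can refer to.
poetry config repositories.my-pypi-repository https://pypi.mycompany.com/
# Store the credentials for it.
poetry config http-basic.my-pypi-repository <username> <password>
# -r points at the repository name configured above.
poetry publish -r my-pypi-repository
```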
#3. Define your task as a Python venv
Using Airflow, you can define your task to use a venv instead of the system's Python interpreter.
You can do this using a Python decorator:
```
@task.virtualenv(
    requirements=["<your_python_module_1>", "<your_python_module_2>", ...],
    ...
)
```
You will specify your newly deployed Python package(s) in the requirements list.
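Putting it together, a task that runs inside its own venv with your freshly published package could look like this (a minimal sketch; the package and module names are the hypothetical ones from the Poetry example above):

```
from airflow.decorators import task


@task.virtualenv(
    requirements=["my-ml-pipeline==0.1.0"],  # your package from the PyPI repository
    system_site_packages=False,
)
def batch_predict():
    # Import inside the function: it executes in the freshly created venv,
    # not in the Airflow worker's own environment.
    from my_ml_pipeline import predict  # hypothetical module from your package

    predict.run()
```

If the package lives on a private index, pip inside that venv must also be pointed at it (for example, through pip's configuration on the worker).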
That's it!
Poetry makes building and publishing a Python package remarkably simple.
You can also adapt this strategy using Docker: instead of installing Python packages into a venv, you pull images from a Docker registry and run each task inside a container.
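With the Docker provider installed, the equivalent task could be defined roughly like this (a sketch; the image name and command are placeholders, and the operator lives inside your DAG definition):

```
from airflow.providers.docker.operators.docker import DockerOperator

batch_predict = DockerOperator(
    task_id="batch_predict",
    image="registry.mycompany.com/ml-pipeline:0.1.0",  # image pushed to your registry
    command="python -m my_ml_pipeline.predict",
)
```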
So...
To deploy your Python code to Airflow, you have to:
- build it using Poetry
- publish it to a PyPI repository using Poetry
- install it inside a venv task
What strategy have you used so far? Leave your thoughts in the comments.
If you want to quickly:
- understand how an orchestration tool such as Airflow is used within an ML system,
- learn how to implement it with hands-on examples (code + learning materials),
then check out my Unlocking MLOps using Airflow: A Comprehensive Guide to ML System Orchestration article.

I know that you are busy people.
But, if you have more time to dive into the field, the Airflow article shared above is part of The Full Stack 7-Steps MLOps Framework course that will teach you how to design, build, deploy, and monitor an ML system using MLOps good practices.
The course is posted for free on Medium's TDI publication and contains the source code + 2.5 hours of reading & video materials.
👉 Check it out: The Full Stack 7-Steps MLOps Framework
See you next week on Thursday at 9:00 am CET.
Have an awesome weekend!
💡 My goal is to help machine learning engineers level up in designing and productionizing ML systems. Follow me on LinkedIn and Medium for more insights!
🔥 If you enjoy reading articles like this and wish to support my writing, consider becoming a Medium member. Using my referral link, you can support me at no extra cost while enjoying unlimited access to Medium's rich collection of stories.
Thank you ✌🏼!