Decoding ML #013: Build a CI/CD Pipeline Using GitHub Actions & Docker in Just a Few Lines of Code.
Build a CI/CD pipeline using GitHub Actions and Docker. 3 Simple Tricks to Evaluate Your Models with Ease.
Hello there, I am Paul Iusztin 👋🏼
Within this newsletter, I will help you decode complex topics about ML & MLOps one week at a time 🔥
This week we will cover:
How you can build a CI/CD pipeline using GitHub Actions and Docker in just a few lines of code.
3 Simple tricks to evaluate your models with ease.
Terraform & Kubernetes crash courses.
But first, a little bit of shameless promotion that might be beneficial for both of us ↓
Looking for a hub where you can learn about ML engineering and MLOps from real-world experience?
I just launched my personal site that serves as a hub for all my MLE & MLOps content and work.
There, I will constantly aggregate my:
- courses
- articles
- talks
...and more
→ Sweet part: Everything will revolve around MLE & MLOps
It is still a work in progress...
But please check it out and let me know what you think.
Your opinion is deeply appreciated 🙏
↳ 🔗 Personal site | MLE & MLOps Hub
#1. How you can build a CI/CD pipeline using GitHub Actions and Docker in just a few lines of code
As an ML/MLOps engineer, you should master serving models by building CI/CD pipelines.
The good news is that GitHub Actions + Docker make building a CI/CD pipeline straightforward.
Why?
- you can easily trigger jobs when merging various branches
- the CI/CD jobs run on GitHub's hosted VMs (free, within GitHub's usage limits)
- easy to implement: copy & paste a pre-made template and add your credentials
For example, this is how you can build a CI pipeline in 3 simple steps:
#1. The CI pipeline is triggered when you merge your new feature branch into the main branch.
#2. You log in to the Docker registry (or any other compatible registry, such as ECR).
#3. You build the image, run your tests (if you have any), and, if the tests pass, push the image to the registry.
To implement them using GitHub Actions, you have to:
- Dockerize your code
- search "CI Template GitHub Actions" on Google
- copy-paste the template
- add your Docker Registry credentials
...and bam... you are done.
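For reference, here is a minimal sketch of what such a CI template could look like. It assumes a Docker Hub registry and repository secrets named DOCKERHUB_USERNAME and DOCKERHUB_TOKEN; the image name and test command are placeholders you would adapt to your project:

```yaml
# .github/workflows/ci.yaml -- minimal CI sketch (image name, secrets & test command are placeholders)
name: CI

on:
  push:
    branches: [main]   # runs when a feature branch is merged into main

jobs:
  build-and-push:
    runs-on: ubuntu-latest   # the job runs on GitHub's hosted VM
    steps:
      - uses: actions/checkout@v4

      # Log in to the registry (Docker Hub here; swap for ECR or another compatible registry)
      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      # Build the image, run the tests inside it, and push it only if they pass
      - run: docker build -t your-user/your-ml-app:${{ github.sha }} .
      - run: docker run --rm your-user/your-ml-app:${{ github.sha }} pytest
      - run: docker push your-user/your-ml-app:${{ github.sha }}
```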
Easy, right? The steps are similar when building your CD pipeline (deploying the new image to production).
If you want to see how I used GitHub Actions to build & deploy an ML system to GCP, check out this article: 🔗 Seamless CI/CD Pipelines with GitHub Actions on GCP
#2. 3 Simple tricks to evaluate your models with ease
When comparing 100+ training experiments, I often got overwhelmed trying to decide which one to pick.
Until I started using these 3 simple tricks ↓
1. Base Model
You need a reference point to compare your model results with.
Otherwise, the computed metrics are hard to interpret.
-> For example, say you are training a time series forecaster.
The base model always predicts the last value. If your "smart" model can't outperform that, you are better off without it.
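As a rough sketch (toy numbers, hypothetical variable names), this is all the base model has to be:

```python
# Minimal sketch: a naive last-value baseline as the reference point for a forecaster.
import numpy as np

def naive_last_value_forecast(y_train: np.ndarray, horizon: int) -> np.ndarray:
    """Base model: always predict the last observed value."""
    return np.full(horizon, y_train[-1])

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(np.abs(y_true - y_pred)))

# Toy data; in practice these come from your train/test split and your model.
y_train = np.array([10.0, 12.0, 11.0, 13.0])
y_test = np.array([12.5, 13.5, 12.0])
model_preds = np.array([12.0, 14.0, 12.5])

baseline_preds = naive_last_value_forecast(y_train, horizon=len(y_test))
print("baseline MAE:", mae(y_test, baseline_preds))
print("model MAE   :", mae(y_test, model_preds))
# If the model's MAE is not clearly below the baseline's, the extra complexity isn't worth it.
```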
2. Slicing
Aggregated metrics (e.g., the mean over all the test samples) are often misleading.
Slicing your testing dataset by features of interest such as gender, age, demographics, etc., can bring to the surface issues such as:
- bias
- weak points
- relationships between inputs & outputs (aka explainability), etc.
-> For example, your model can have extraordinary results in the [18, 30] age range but terrible ones for ages 30+.
The aggregated metrics look great because most data samples are within the [18, 30] range. But in reality, your model fails the minority groups.
Thus, even though the aggregated metrics look great, you may deploy a broken model.
Tools: Snorkel
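As a hedged illustration (toy data, hypothetical column names), slicing can be as simple as a group-by over the feature of interest:

```python
# Minimal sketch: compute a metric per slice of the test set instead of only the aggregate.
import pandas as pd

test_df = pd.DataFrame({
    "age":    [22, 25, 27, 29, 41, 63],
    "y_true": [1, 0, 1, 1, 1, 0],
    "y_pred": [1, 0, 1, 1, 0, 1],
})

# Slice by a feature of interest (here: age buckets) and compute accuracy per slice.
test_df["age_slice"] = pd.cut(test_df["age"], bins=[18, 30, 120], labels=["18-30", "30+"])
per_slice_accuracy = (
    test_df.assign(correct=test_df["y_true"] == test_df["y_pred"])
           .groupby("age_slice", observed=True)["correct"]
           .mean()
)

print(per_slice_accuracy)  # a great overall score can hide a slice where the model fails
```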
3. Experiment Tracker
You just ran 100+ experiments using different models and hyperparameters.
You already have your base model and slicing techniques set in place.
How can you easily compare these experiments?
With an experiment tracker, you can quickly aggregate the results of all your runs into a single dashboard (or a handful of graphs).
Thus, you get the big picture you need to easily pick the best experiment, together with its metadata.
Tools: Comet ML, W&B, MLFlow, Neptune
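For example, with MLflow (one of the trackers above), logging each run's parameters and sliced metrics takes only a few lines; the parameter and metric names below are placeholders:

```python
# Minimal sketch: log each experiment's config & metrics to an experiment tracker (MLflow here),
# so all runs can be compared side by side in one dashboard.
import mlflow

# In practice these come from your training loop; hard-coded here for illustration.
experiments = [
    {"model": "lgbm", "learning_rate": 0.10, "mae_overall": 3.2, "mae_age_30_plus": 6.1},
    {"model": "lgbm", "learning_rate": 0.01, "mae_overall": 2.9, "mae_age_30_plus": 4.8},
]

mlflow.set_experiment("forecasting-evaluation")

for cfg in experiments:
    with mlflow.start_run():
        mlflow.log_params({"model": cfg["model"], "learning_rate": cfg["learning_rate"]})
        mlflow.log_metric("mae_overall", cfg["mae_overall"])
        mlflow.log_metric("mae_age_30_plus", cfg["mae_age_30_plus"])  # log sliced metrics too

# Then open the MLflow UI (`mlflow ui`) to compare all runs in a single view.
```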
To conclude...
To quickly compare many experiments, you need:
- a base model
- to slice your testing split
- an experiment tracker
If you are curious about implementing these strategies, check out my article: 🔗 A Guide to Building Effective Training Pipelines for Maximum Results
Do you recommend other tricks to improve your evaluation process?
#3. Terraform & Kubernetes crash courses
Courses for 2 tools that any MLOps engineer should master.
I finished them recently & I had to share them with you.
1. Terraform
The most popular tool for Infrastructure as Code.
It lets you spin up & tear down entire cloud infrastructures with a single command.
↳ 🔗 Introduction to Terraform course
2. Kubernetes
Probably all of you have heard about Kubernetes.
It is the go-to tool for scaling your applications horizontally.
↳ 🔗 Introduction to K8s course
Both courses provide a theoretical part & hands-on examples that you can follow along with.
That's it for today 👾
See you next Thursday at 9:00 am CET.
Have a fantastic weekend!
Paul
Whenever you're ready, here is how I can help you:
The Full Stack 7-Steps MLOps Framework: a 7-lesson FREE course that will walk you step-by-step through how to design, implement, train, deploy, and monitor an ML batch system using MLOps good practices. It contains the source code + 2.5 hours of reading & video materials on Medium.
Machine Learning & MLOps Blog: here, I approach in-depth topics about designing and productionizing ML systems using MLOps.
Machine Learning & MLOps Hub: a place where I will constantly aggregate all my work (courses, articles, webinars, podcasts, etc.).