DML: How to add real-time monitoring & metrics to your ML System
How to easily add retry policies to your Python code. How to add real-time monitoring & metrics to your ML System.
Hello there, I am Paul Iusztin.
Within this newsletter, I will help you decode complex topics about ML & MLOps one week at a time.
This week's ML & MLOps topics:
How to add real-time monitoring & metrics to your ML System
How to easily add retry policies to your Python code
Storytime: How am I writing code in 2023? I don't.
But first, I have some big news to share with you ↓
→ Want to learn how to fine-tune an LLM, build a streaming pipeline, use a vector DB, build a financial bot, and deploy everything using a serverless solution?
Then you will enjoy this new free course that I am creating together with the author of the RWML newsletter.
↳ The course will teach you how to build an end-to-end LLM solution.
It is structured into 4 modules ↓
Module 1: Learn how to generate a financial Q&A dataset in a semi-automated way using the OpenAI API.
Module 2: Fine-tune the LLM (e.g., Falcon, Llama 2) using Hugging Face & PEFT. We will also show you how to integrate an experiment tracker and a model registry, and how to monitor the prompts, using Comet.
Module 3: Build a streaming pipeline using Bytewax that listens to financial news through a WebSocket, cleans it, embeds it, and loads it into a Qdrant vector database.
Module 4: Wrap the fine-tuned model and vector DB into a financial bot using LangChain and deploy it behind a RESTful API.
But all of this is useless if it isn't deployed.
→ We will use Beam to deploy everything quickly. Beam is a serverless solution that lets you focus on your problem and quickly serve all your ML components. Say goodbye to access policies and network configuration.
Note: This is still a work in progress, but the first 3 modules are almost done.
Curious?
Then, check out the repository and give it a star ↓
↳ Course GitHub Repository
#1. How to add real-time monitoring & metrics to your ML System
Your model is exposed to performance degradation after it is deployed to production.
That is why you need to monitor it constantly.
The most common way to monitor an ML model is to compute its metrics.
But for that, you need the ground truth.
In production, you can automatically access the ground truth in 3 main scenarios:
1. near real-time: you can access it quite quickly
2. delayed: you can access it after a considerable amount of time (e.g., one month)
3. never: you have to label the data manually
.
For use cases 2 and 3, you can quickly build your monitoring pipeline in the following way:
- store the model predictions and ground truth (GT) as soon as they are available (these two will be out of sync, so you can't compute the metrics right away)
- build a DAG (e.g., using Airflow) that extracts the predictions & GT, computes the metrics in batch mode, and loads them into another storage (e.g., GCS)
- use an orchestration tool to run the DAG in the following scenarios:
1. scheduled: if the GT is available in near real-time (e.g., hourly), then it makes sense to run your monitoring pipeline based on the known frequency
2. triggered: if the GT is delayed and you don't know when it may come up, then you can implement a webhook to trigger your monitoring pipeline
- attach a consumer to your storage to use and display the metrics (e.g., trigger alarms and display them in a dashboard)
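The batch metrics task at the heart of such a DAG can be sketched in plain Python. Note that the storages, request IDs, and labels below are made-up placeholders for illustration; in a real Airflow task, you would read the predictions and GT from your actual stores and write the metrics to e.g. GCS:

```python
from datetime import datetime, timezone

# Hypothetical in-memory stand-ins for the two storages. In production, the GT
# arrives later than the predictions and may cover only a subset of them.
predictions = {"req-1": 1, "req-2": 0, "req-3": 1}  # request_id -> predicted label
ground_truth = {"req-1": 1, "req-3": 0}             # delayed and partial GT

def compute_batch_metrics(predictions: dict, ground_truth: dict) -> dict:
    """Join the predictions with the GT that has arrived and compute metrics in batch mode."""
    matched = [(predictions[k], gt) for k, gt in ground_truth.items() if k in predictions]
    if not matched:
        return {"accuracy": None, "n_samples": 0}
    correct = sum(pred == gt for pred, gt in matched)
    return {
        "accuracy": correct / len(matched),
        "n_samples": len(matched),
        "computed_at": datetime.now(timezone.utc).isoformat(),
    }

metrics = compute_batch_metrics(predictions, ground_truth)
# Next step in the DAG: load `metrics` into the metrics storage (e.g., GCS)
# so a consumer can trigger alarms or render a dashboard.
```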
If you want to see how to implement a near real-time monitoring pipeline using Airflow and GCS, check out my article ↓
↳ Ensuring Trustworthy ML Systems With Data Validation and Real-Time Monitoring
#2. How to easily add retry policies to your Python code
One strategy that makes the difference between good code and great code is adding retry policies.
Implementing them manually can get tedious and complicated.
Retry policies are a must when you:
- make calls to an external API
- read from a queue, etc.
.
Using the Tenacity Python package...
you can quickly decorate your functions and add customizable retry policies, such as:
1. Add fixed and random wait times between multiple retries.
2. Add a maximum number of attempts or computation time.
3. Retry only when specific errors are thrown (or not thrown).
... and, as you can see, you can easily compose these policies.
The cherry on top is that you can access the retry statistics of a specific function:

print(raise_my_exception.retry.statistics)
↳ tenacity repository
Storytime: How am I writing code in 2023? I don't
As an engineer, you are paid to think and solve problems. How you do that doesn't matter. Let me explain ↓
.
The truth is that I am lazy.
That is why I am a good engineer.
With the rise of LLMs, my laziness hit all-time highs.
.
Thus, this is how I write my code these days ↓
- 50% Copilot (tab is the new CTRL-C + CTRL-V)
- 30% ChatGPT/Bard
- 10% Stack Overflow (call me insane, but I still use Stack Overflow from time to time)
- 10% Writing my own code
The thing is that I am more productive than ever.
... and that 10% of "writing my own code" is the final step that connects all the dots and brings real value to the table.
.
In reality, as an engineer, you mostly have to:
- ask the right questions
- understand & improve the architecture of the system
- debug code
- understand business requirements
- communicate with other teams
...not to write code.
Writing code as we know it will most probably disappear with the rise of AI (it kind of already has).
.
What do you think? How do you write code these days?
That's it for today.
See you next Thursday at 9:00 am CET.
Have a fantastic weekend!
Paul
Whenever you're ready, here is how I can help you:
The Full Stack 7-Steps MLOps Framework: a 7-lesson FREE course that will walk you step-by-step through how to design, implement, train, deploy, and monitor an ML batch system using MLOps good practices. It contains the source code + 2.5 hours of reading & video materials on Medium.
Machine Learning & MLOps Blog: here, I approach in-depth topics about designing and productionizing ML systems using MLOps.
Machine Learning & MLOps Hub: a place where I will constantly aggregate all my work (courses, articles, webinars, podcasts, etc.).