The difference between development and continuous training ML environments
Looking to become a PRO in LangChain? How to write a streaming retrieval system for RAG on social media data.
Decoding ML Notes
This week's topics:
Looking to become a PRO in LangChain?
The difference between development and continuous training ML environments
How to write a streaming retrieval system for RAG on social media data
First, I want to thank everyone who supported our Hands-on LLMs course repo 🙏
The Hands-on LLMs FREE course passed 2.1k+ ⭐ on GitHub - the place to learn the fundamentals of LLM systems & LLMOps.
The course is the go-to hub for learning the fundamentals of production-ready LLMs & LLMOps.
It will walk you through an end-to-end process...
...from data preparation to deployment & monitoring:
- the 3-pipeline design
- building your custom financial dataset using GPT-4
- a streaming pipeline to ingest financial news in real-time
- fine-tuning an LLM using QLoRA
- building a custom RAG pipeline
- deploying the streaming pipeline to AWS
- deploying the training & inference pipelines to Beam
- using MLOps components: model registries, experiment trackers, prompt monitoring
Check it out ↓
↓↓↓
🔗 Hands-on LLMs Course - Learn to Train and Deploy a Real-Time Financial Advisor
Looking to become a PRO in LangChain?
Then check out this book on hands-on LangChain: from beginner to advanced ↓
→ It's called: Generative AI with LangChain: Build LLM apps with Python, ChatGPT, and other LLMs by Ben Auffarth, published by Packt
Here is a short breakdown:
- It begins with some theoretical chapters on LLMs & LangChain
- It explores the critical components of LangChain: chains, agents, memory, tools
Then, my favorite part...
It jumps directly into hands-on examples - WITH PYTHON CODE ↓
- takes off with beginner-friendly examples of using LangChain with agents, HuggingFace, GCP/VertexAI, Azure, Anthropic, etc.
- shows an end-to-end example of building a customer service application with LangChain & VertexAI
- how to mitigate hallucinations using the LLMCheckerChain class
- how to implement map-reduce pipelines
- how to monitor token usage & costs (a quick sketch follows this list)
- how to extract information from documents such as PDFs
- building a Streamlit interface
- how reasoning works in agents
- building a chatbot like ChatGPT from SCRATCH
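To make one of those topics concrete, here is a minimal sketch of monitoring token usage & costs, assuming an OpenAI chat model and the langchain-openai / langchain-community packages (older LangChain releases expose the same callback under langchain.callbacks):

```python
# Minimal sketch: track tokens & cost for every LLM call made inside the context manager.
from langchain_community.callbacks import get_openai_callback
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo")

with get_openai_callback() as cb:
    llm.invoke("Explain QLoRA in one sentence.")
    print(f"Prompt tokens: {cb.prompt_tokens}")
    print(f"Completion tokens: {cb.completion_tokens}")
    print(f"Total tokens: {cb.total_tokens}")
    print(f"Total cost (USD): {cb.total_cost:.6f}")
```

Everything invoked inside the context manager is accumulated, so you can wrap a whole chain or agent run the same way.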
.
I haven't finished it yet, but I love it so far, and I plan to finish it soon.
.
Who is this for?
If you are starting out in the LLM world, this is a great book to read end-to-end.
Even if you are experienced, I think it is extremely useful to skim it to refresh the fundamentals, learn new details, and see how everything is implemented in LangChain.
Is this for you? 🫵
🔗 Check it out: Generative AI with LangChain [by Ben Auffarth]
The difference between development and continuous training ML environments
They might do the same thing, but their design is entirely different ↓
ML Development Environment
At this point, your main goal is to ingest the raw and preprocessed data through versioned artifacts (or a feature store), analyze it & generate as many experiments as possible to find the best:
- model
- hyperparameters
- augmentations
Based on your business requirements, you must maximize some specific metrics, find the best latency-accuracy trade-offs, etc.
You will use an experiment tracker to compare all these experiments.
After you settle on the best one, the output of your ML development environment will be:
- a new version of the code
- a new version of the configuration artifact
Here is where the research happens. Thus, you need flexibility.
That is why we decouple it from the rest of the ML systems through artifacts (data, config, & code artifacts).
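To make this concrete, here is a minimal sketch of the experimentation loop, using MLflow as a stand-in for whichever experiment tracker you prefer (W&B, Comet, Neptune, etc.); load_features, train_model, and evaluate are hypothetical helpers:

```python
# Minimal sketch of the ML development loop with an experiment tracker.
import itertools
import json

import mlflow

# Hypothetical helpers: pull versioned features, train a model, compute a metric.
X_train, y_train, X_val, y_val = load_features("features:v3")

search_space = {"learning_rate": [1e-3, 1e-4], "batch_size": [32, 64]}

best_accuracy, best_config = 0.0, None
for lr, bs in itertools.product(*search_space.values()):
    with mlflow.start_run():
        mlflow.log_params({"learning_rate": lr, "batch_size": bs})
        model = train_model(X_train, y_train, learning_rate=lr, batch_size=bs)
        accuracy = evaluate(model, X_val, y_val)
        mlflow.log_metric("val_accuracy", accuracy)
        if accuracy > best_accuracy:
            best_accuracy, best_config = accuracy, {"learning_rate": lr, "batch_size": bs}

# The environment's outputs: the code itself (versioned in git) and a config artifact.
with open("best_config.json", "w") as f:
    json.dump(best_config, f)
```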
Continuous Training Environment
Here is where you want to take the data, code, and config artifacts and:
- train the model on all the required data
- output a staging versioned model artifact
- test the staging model artifact
- if the test passes, label it as the new production model artifact
- deploy it to the inference services
A common strategy is to build a CI/CD pipeline (e.g., using GitHub Actions) that does the following (see the sketch after this list):
- builds a Docker image from the code artifact (e.g., triggered manually or when a new artifact version is created)
- starts the training pipeline inside the Docker container, which pulls the feature and config artifacts and outputs the staging model artifact
- lets you manually review the training report -> if everything looks good, you manually trigger the testing pipeline
- lets you manually review the testing report -> if everything works (e.g., the model is better than the previous one), you manually trigger the CD pipeline that deploys the new model to your inference services
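Here is a minimal sketch of the train -> stage -> promote logic behind those steps, assuming a recent MLflow 2.x as the model registry and a scikit-learn-compatible model; load_features, train_model, and test_model are hypothetical helpers:

```python
# Minimal sketch of the continuous-training pipeline's core logic.
import json

import mlflow
from mlflow import MlflowClient

# Inputs produced by the dev environment: the config and feature artifacts.
with open("best_config.json") as f:
    config = json.load(f)
X, y = load_features("features:v3")  # hypothetical helper

with mlflow.start_run():
    model = train_model(X, y, **config)  # hypothetical helper; trains on ALL the required data
    info = mlflow.sklearn.log_model(
        model, "model", registered_model_name="financial_advisor"
    )

client = MlflowClient()
# Label the new version "staging" until the testing pipeline passes.
client.set_registered_model_alias(
    "financial_advisor", "staging", info.registered_model_version
)

if test_model(model):  # hypothetical helper; in practice a separate, manually triggered step
    # Promote: inference services resolve the "production" alias at load time.
    client.set_registered_model_alias(
        "financial_advisor", "production", info.registered_model_version
    )
```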
Note how the model registry helps you decouple all the components.
Also, because training and testing metrics are not always black and white, it is challenging to automate the CI/CD pipeline 100%.
Thus, you need a human in the loop when deploying ML models.
To conclude...
The ML development environment is where you do your research to find better models.
The continuous training environment is used to train & test the production model at scale.
How to write a streaming retrieval system for RAG on social media data
Batch systems are the past. Here is how to write a streaming retrieval system for RAG on social media data ↓
Why streaming over batch?
In environments where data evolves quickly (e.g., social media platforms), the system's response time is critical for your application's user experience.
That is why TikTok is so addictive. Its recommender system adapts in real time based on your interactions with the app.
How would it feel if the recommendations were updated only daily or hourly?
Well, it would still work, but you would probably get bored of the app much faster.
The same applies to RAG on fast-moving data sources...
→ where you must sync your source and vector DB in real time for up-to-date retrievals.
Let's see how it works.
↓↓↓
I wrote an article on how to build a real-time retrieval system for RAG on LinkedIn data in collaboration with Superlinked.
The retrieval system is based on two decoupled components:
- the streaming ingestion pipeline
- the retrieval client
The streaming ingestion pipeline runs 24/7 to keep the vector DB synced with the current raw LinkedIn posts data source.
The retrieval client is used in RAG applications to query the vector DB.
→ These two components are completely decoupled and communicate with each other through the vector DB.
#1. The streaming ingestion pipeline
→ Implemented in Bytewax - a streaming engine built in Rust (speed & reliability) that exposes a Python interface
Main flow (see the sketch after this list):
- uses CDC to add changes from the source DB to a queue
- listens to the queue for new events
- cleans, chunks, and embeds the LinkedIn posts
- loads them to a Qdrant vector DB
and... everything in real time!
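Here is a minimal sketch of such a dataflow, assuming Bytewax's 0.18+ operator API. The cleaning/chunking helpers are toy placeholders, TestingSource stands in for the real CDC queue connector (e.g., RabbitMQ), and a production pipeline would load to Qdrant through a proper Bytewax sink rather than a map step:

```python
# Minimal sketch of the streaming ingestion pipeline.
import uuid

import bytewax.operators as op
from bytewax.dataflow import Dataflow
from bytewax.testing import TestingSource
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model
client = QdrantClient("localhost", port=6333)
client.recreate_collection(
    collection_name="linkedin_posts",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

# Stands in for the CDC queue; in production, a real queue connector feeds this.
raw_events = [{"text": "Raw LinkedIn post about vector DBs..."}]

def clean(event: dict) -> str:
    return event["text"].strip()  # toy placeholder: real cleaning strips markup, emojis, etc.

def chunk(text: str) -> list[str]:
    return [text[i : i + 512] for i in range(0, len(text), 512)]  # naive fixed-size chunks

def embed_and_load(text: str) -> str:
    # Embed the chunk and upsert it into the vector DB.
    client.upsert(
        collection_name="linkedin_posts",
        points=[
            PointStruct(
                id=str(uuid.uuid4()),
                vector=embedder.encode(text).tolist(),
                payload={"text": text},
            )
        ],
    )
    return text

flow = Dataflow("li_ingestion")
stream = op.input("cdc_events", flow, TestingSource(raw_events))
cleaned = op.map("clean", stream, clean)
chunks = op.flat_map("chunk", cleaned, chunk)  # one post -> many chunks
loaded = op.map("embed_and_load", chunks, embed_and_load)
op.inspect("done", loaded)  # run with: python -m bytewax.run this_module:flow
```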
#2. The retrieval client
→ A standard Python module.
The goal is to retrieve similar posts using various query types, such as posts, questions, and sentences.
Main flow (see the sketch after this list):
- preprocess user queries (the same way as they were ingested)
- search the Qdrant vector DB for the most similar results
- use a reranker to improve the retrieval system's accuracy
- visualize the results on a 2D plot using UMAP
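Here is a minimal sketch of such a client, assuming the same Qdrant collection and embedding model as the ingestion sketch above; model names are illustrative:

```python
# Minimal sketch of the retrieval client: embed -> search -> rerank.
from qdrant_client import QdrantClient
from sentence_transformers import CrossEncoder, SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # must match the ingestion model
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
client = QdrantClient("localhost", port=6333)

def retrieve(query: str, top_k: int = 3) -> list[str]:
    # 1. Preprocess/embed the query the same way the posts were ingested.
    query_vector = embedder.encode(query).tolist()
    # 2. Over-fetch candidates from the vector DB...
    hits = client.search(
        collection_name="linkedin_posts",
        query_vector=query_vector,
        limit=top_k * 10,
    )
    # 3. ...then rerank them with a cross-encoder for better accuracy.
    candidates = [hit.payload["text"] for hit in hits]
    scores = reranker.predict([(query, text) for text in candidates])
    ranked = sorted(zip(scores, candidates), reverse=True)
    return [text for _, text in ranked[:top_k]]

print(retrieve("Posts about vector databases"))
```

Over-fetching from the vector DB and then reranking with a cross-encoder is the standard pattern: the bi-encoder is fast but coarse, while the cross-encoder is slow but precise.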
.
You don't believe me? 🫵
Check out the full article & code on Decoding ML ↓
🔗 A Real-time Retrieval System for RAG on Social Media Data
Images
If not otherwise stated, all images are created by the author.