Great write-up on building a production-quality RAG system.
I recently built something similar and thought it might be useful to share another implementation for comparison. The latest version of my project is rag-foundry-universal, which aims to be “universal” in the sense that it can ingest a variety of document types, including entire Python repositories, and then allow you to query them.
It expects that you already have Ollama installed and Docker available (Docker Desktop works well). Once running, it can index a repo and let you ask questions about the codebase. It’s intentionally not agentic—the focus was on keeping the architecture simple and reliable for retrieval and question answering over code.
It does not yet support other languages such as Java.
Repo: https://github.com/sankar-ramamoorthy/rag-foundry-universal
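For anyone curious about the general shape, here is a minimal sketch of the pattern (not the actual rag-foundry-universal code; model names are placeholders, and it assumes the `ollama` Python client with a local Ollama server running):

```python
# Generic shape of "index a Python repo, then ask questions about it" with Ollama.
# Not the actual rag-foundry-universal code; model names are placeholders.
from pathlib import Path
import ollama  # pip install ollama; assumes a local Ollama server is running

def index_repo(root: str) -> list[dict]:
    """Embed every .py file (one chunk per file, for simplicity)."""
    index = []
    for path in Path(root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        emb = ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]
        index.append({"path": str(path), "text": text, "embedding": emb})
    return index

def _cos(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def ask(index: list[dict], question: str, k: int = 3) -> str:
    """Retrieve the k most similar files and answer with them as context."""
    q = ollama.embeddings(model="nomic-embed-text", prompt=question)["embedding"]
    top = sorted(index, key=lambda d: _cos(d["embedding"], q), reverse=True)[:k]
    context = "\n\n".join(f"# {d['path']}\n{d['text']}" for d in top)
    reply = ollama.chat(model="llama3", messages=[
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
    ])
    return reply["message"]["content"]
```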
For comparison, this implementation by Priya includes additional observability and evaluation tooling (for example using Opik), which is an interesting direction for testing and tracing RAG pipelines:
https://github.com/CalvHobbes/rag-101
Curious how others are approaching evaluation and observability as RAG systems move toward more production-oriented setups.
Parsing repos with Ollama is a great use case for RAG! Would you like to write a guest post around this? Would love to see how it works in more detail, and I am sure others will too.
Hi Paul, that would be a great honor. Yes, I will go ahead and work on a guest post and run it by you this week. Sorry about the late reply; I had not logged in to Substack for a week.
yo
so i built this thing. it's basically a layer on top of everything — files, notes, tasks, urls, images, contacts, phone numbers, even remote files.
everything gets an id + uuid.
and anything can have embeddings.
but here's the kicker: no ingestion. no duplicating documents into some rag system. no copying stuff. the originals stay where they are. the layer just knows where they live.
so i can do stuff like: "give me everything related to x" — and it'll return a task, a contact, a screenshot, and a random url i saved two years ago. all from one query. all live, not stale.
intersections? yeah. type + embedding + whatever.
so that's where the power lies.
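rough sketch of the shape, in case it helps (all names here are made up, not the actual code):

```python
from dataclasses import dataclass, field
import uuid

@dataclass
class Item:
    """one entry in the layer -- points at the original, never copies it."""
    id: int
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))
    kind: str = "file"                    # file / note / task / url / contact ...
    location: str = ""                    # where the original actually lives
    embedding: list[float] | None = None  # optional -- anything *can* have one

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def related(items: list[Item], query_vec: list[float],
            kinds: set[str] | None = None, k: int = 10) -> list[Item]:
    """intersection query: optional type filter + embedding similarity."""
    pool = [i for i in items if i.embedding and (kinds is None or i.kind in kinds)]
    return sorted(pool, key=lambda i: cosine(i.embedding, query_vec),
                  reverse=True)[:k]
```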
Sounds very useful. Read a tweet recently that is vaguely related: https://x.com/karpathy/status/2039805659525644595?s=46&t=J-YwiTTVsRaDT8r-yZRTqg
I implemented something similar myself recently! Will write my next article on it.
Fabulous! Look forward to reading it. It addresses a real need for most of us who have tons of stuff stashed away but don't remember where most of it is and can't recall the relevant info when it's needed.
This is the kind of RAG writing I trust most. Not just how to make it work once, but how to build it in a way you can actually understand, debug, and use in the real world.
https://nandigamharikrishna.substack.com/p/7-advanced-rag-techniques-that-deliver?r=8op1j&utm_campaign=post&utm_medium=web
Thank you for the piece. As I am reading through it, one piece of feedback: the code text is white against a light background, which is bad for readability.
Really strong piece. What I particularly liked is that it treats production RAG as a systems problem, not just a prompt-and-vector-store recipe.
That is exactly the gap many teams run into in practice: the hard part is not just getting retrieval to work once, but deciding which retrieval architecture fits the problem, how to compare tradeoffs, and how to operationalize it.
That is one of the reasons I built RAG Orchestration Studio — a browser-based environment to explore and design different RAG patterns such as vector, vectorless, graph, temporal, and hybrid workflows:
https://ragorchestrationstudio.com
Would love your thoughts on whether a tool like this helps teams move from “RAG tutorial” to actual architecture design.
Priya, love the write-up. But why copy the whole file into Postgres? Files have checksums. Just checksum the file, embed the chunks, and store the embeddings with the checksum. Each embedding already carries its chunk of text anyway—that's basically the file's content right there. The embedding can live without the file in the database. So why duplicate? Periodic scans of your filesystem catch changes. The file stays where it is. Local disk, USB drive, remote server, doesn't matter. Index everything in place. No ingestion pipeline copying stuff you already own.
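Roughly, a sketch of what I mean (not Priya's actual code; `embed` stands in for whatever embedding call you use, and the rows would go into whatever store you already have):

```python
import hashlib
from pathlib import Path

def b2_checksum(path: Path) -> str:
    """BLAKE2b digest of the file contents (same family as the b2sum tool)."""
    h = hashlib.blake2b()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def index_in_place(path: Path, embed, chunk_size: int = 1000) -> list[dict]:
    """Embed chunks keyed by checksum + path -- no copy of the file is stored."""
    checksum = b2_checksum(path)
    text = path.read_text(errors="ignore")
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    return [{"checksum": checksum, "path": str(path),
             "chunk": c, "embedding": embed(c)} for c in chunks]
```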
Hi Stacey, thank you for reading. Sorry if the code made it look that way, but the intent is that files stay where they are and chunks carry metadata that includes the path to the file. You're absolutely right. Or... are you referring to storing the actual chunk text along with the embedding? By the way, I'm still learning this stuff, so I'm certainly no expert 🙂
I misunderstood, then. Yes, files should stay where they are; the embeddings in the database relate to chunks of text anyway, so the chunks are already one copy of the file's content. And as you said, each chunk can point back to the file.
Then what to do when files are moved? Maybe they should be checksummed with b2sum or a similar tool; then, if a file is moved, a regularly executed script could find it again and update its reference in the database.
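A sketch of what that regular script could look like, reusing b2_checksum from the sketch above and assuming the index is a plain checksum-to-path mapping:

```python
import os
from pathlib import Path

def relocate_moved_files(db: dict[str, str], roots: list[str]) -> None:
    """db maps checksum -> last known path; repoint entries whose file moved."""
    missing = {c: p for c, p in db.items() if not Path(p).exists()}
    if not missing:
        return
    still_there = {Path(p) for p in db.values() if Path(p).exists()}
    for root in roots:
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                candidate = Path(dirpath) / name
                if candidate in still_there:
                    continue
                checksum = b2_checksum(candidate)
                if checksum in missing:            # same content, new home
                    db[checksum] = str(candidate)  # update the reference
                    missing.pop(checksum)
```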
On my side, I do not use the file system for dealing with files, except for new ones, mostly in the Download directory. Instead, I index every file into a dynamic knowledge repository; consider it a referencing system for files.
Each file gets its own name: it can be derived automatically from the file name, assigned by a curator, or generated by an LLM.
From that moment, I forget about the file's location; I have no idea where it is stored.
Each file is related to some category, some project, some people, and some other files, so I find it by relations, semantics, or similarity.
Take George Orwell: I would find his category within seconds, and all his files are there. Files have types and subtypes, so if what I am searching for is an ID card, I might see PDF as the type and ID document as the subtype. I could just say, "Let me have the ID documents of George Orwell."
People's files? I take an e-mail address or phone number, find the person, and list all files and e-mails related to that person. Some numbers from my repository, for scale:
- Total number of people: 245,318
- People in last week: 20
- People in last month: 71
- Total Hyperdocuments: 95,076
- Hyperdocuments in last week: 130
- Hyperdocuments in last month: 682
Why would I, as a human user, go browsing a file directory to find, say, the CV of Peter Pan? I would need to maintain the same order on the file system and browse through it, and what if the CV has the file name jbnjb1234.pdf? That is what people do. I would not be able to find it without wading through multiple nonsensical file names, opening PDFs one by one until I found the CV.
Instead, I search for Peter Pan, click, and get all the categories, sets, and files listed as a meta-layer; I can search by name, type, subtype, or similarity.
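To illustrate the lookup pattern (all table and column names here are invented for the example, not my actual schema):

```python
import sqlite3

# Toy schema: people(id, name, email, phone), files(id, name, type, subtype),
# person_files(person_id, file_id) linking them.
def files_for_person(conn: sqlite3.Connection, email_or_phone: str,
                     subtype: str | None = None) -> list[tuple]:
    """e.g. files_for_person(conn, 'george@example.org', subtype='ID document')"""
    sql = """SELECT f.name, f.type, f.subtype
             FROM people p
             JOIN person_files pf ON pf.person_id = p.id
             JOIN files f ON f.id = pf.file_id
             WHERE (p.email = ? OR p.phone = ?)"""
    args = [email_or_phone, email_or_phone]
    if subtype:
        sql += " AND f.subtype = ?"
        args.append(subtype)
    return conn.execute(sql, args).fetchall()
```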
Looks like you have a great way to organise your stuff, brilliant!