Welcome to the Advanced RAG Series: Building a Chat with Your Document System
In this first video of our new series on Advanced RAG (Retrieval Augmented Generation), we will explore the architecture of a basic chat-with-your-documents system, i.e., a RAG pipeline. We will then go beyond the basics and discuss more advanced techniques for enhancing the pipeline.
The Basics of a RAG Pipeline
Let’s start with the basics of a RAG pipeline. The architecture consists of the following components:
- Document Loader: This component loads the documents, such as PDF files, that will be used in the chat system.
- Chunking: The documents are split into smaller chunks so that each piece fits within the context window of the large language model (LLM).
- Embeddings: Embeddings are computed for each chunk, which are then stored in a vector store to create a semantic index.
- Retrieval: When a user asks a question, the system computes embeddings for the query and performs a semantic search on the knowledge base (vector store) to retrieve relevant documents or chunks.
- Reranking: The retrieved documents or chunks are reranked based on relevance.
- Generation: The question and the context (reranked documents or chunks) are fed into the LLM to generate responses.
This is a basic overview of how a RAG pipeline works. If you are an absolute beginner, don’t worry! We have a series of videos on creating RAG pipelines that you can check out.
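To make these steps concrete before we move on, here is a minimal, self-contained sketch of the indexing, retrieval, and prompt-building steps in plain Python and NumPy. The embed function is a toy placeholder used purely for illustration (a real pipeline would call a sentence-embedding model), and the final prompt would be sent to an LLM rather than printed:

import numpy as np

def embed(text):
    # Toy embedding: normalized character-frequency vector (illustration only;
    # a real pipeline would use a sentence-embedding model here)
    vec = np.zeros(256)
    for ch in text.lower():
        vec[ord(ch) % 256] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

chunk_texts = [
    "Instruction tuning fine-tunes a model on instruction-response pairs.",
    "A vector store holds one embedding per chunk for semantic search.",
    "BM25 ranks chunks by keyword overlap with the query.",
]

# Indexing: one embedding per chunk
index = np.stack([embed(chunk) for chunk in chunk_texts])

# Retrieval: embed the query and rank chunks by cosine similarity
query = "What is instruction tuning?"
scores = index @ embed(query)
best_chunk = chunk_texts[int(np.argmax(scores))]

# Generation: the retrieved context plus the question would be sent to the LLM
prompt = f"Context: {best_chunk}\n\nQuestion: {query}"
print(prompt)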
Introducing Hybrid Search
In this new series, we will focus on improving the basic RAG pipeline. The first topic we will explore is hybrid search.
Hybrid search involves adding two components to the RAG pipeline:
- Keyword-based Search: In addition to semantic search using embeddings, we also perform traditional keyword-based search on the available chunks.
- Ensemble Retriever: Instead of using only the embedding-based retriever, we combine the results from both the keyword-based search and the embedding-based retriever. This ensemble retriever lets us assign a different weight to each search method, making retrieval more robust (see the sketch right after this list).
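As a preview of how that weighting works, here is a minimal sketch that merges two ranked result lists using weighted reciprocal rank fusion, one common way to combine a keyword ranking with a semantic ranking. The chunk IDs are made up for illustration, and the fusion method used by any particular library may differ:

def weighted_rank_fusion(rankings, weights, k=60):
    # rankings: ranked lists of chunk IDs (best first); weights: one weight per list
    scores = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking):
            # Reciprocal-rank contribution, scaled by this retriever's weight
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

semantic_hits = ["chunk_7", "chunk_2", "chunk_9"]  # from the embedding-based retriever
keyword_hits = ["chunk_2", "chunk_5", "chunk_7"]   # from the keyword-based (BM25) search
print(weighted_rank_fusion([semantic_hits, keyword_hits], weights=[0.7, 0.3]))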
Now that we have an overview of the architecture and the addition of hybrid search, let’s take a look at a code example.
Code Example
Before we dive into the code, make sure you have the necessary packages installed. We will be using rank_bm25 for hybrid search, unstructuredIO for reading PDF files, and chromaDB for creating the vector store. If you are working with scanned documents, additional packages for handling images and performing optical character recognition (OCR) are recommended.
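Assuming the standard PyPI package names (an assumption; double-check the names for your setup), the install step looks something like this:

pip install rank-bm25 unstructured chromadb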
Once the packages are installed, we can load them in our code. Here is an example:
import rank_bm25       # BM25 scoring used by the keyword-based retriever
import unstructuredIO   # PDF loading and text splitting
import chromaDB         # vector store

from lchaintools import ChatPromptTemplate, OutputParser, RunnablePassThrough
from lchaintools import BM25Retriever, EnsembleRetriever
from lchaintools import EmbeddingModelAPI, LLMModelAPI
In this example, we are using the latest version of LChain. We import the necessary packages and modules for our RAG pipeline, including the chat prompt template, output parser, runnable pass-through, the BM25 and ensemble retrievers, and the embedding and LLM model APIs.
Next, we need to load the PDF file and convert it into chunks. Here is an example:
# Load the PDF into a list of documents
pdf_loader = unstructuredIO.PDFLoader()
docs = pdf_loader.load_file('data/example.pdf')

# Split the documents into 800-character chunks with a 100-character overlap
chunk_size = 800
overlap = 100
splitter = unstructuredIO.RecursiveCharTextSplitter(chunk_size, overlap)
chunks = splitter.process(docs)
In this code snippet, we use the PDF loader to load the PDF file. We then define the chunk size (800 characters) and the overlap (100 characters) used to split the documents into chunks; the overlap ensures that information spanning a chunk boundary is not lost. The recursive character text splitter creates the chunks.
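To make the chunk size and overlap concrete, here is a tiny character-level splitter in plain Python. It is a simplified stand-in for the recursive splitter above, which would typically also prefer to break on separators such as paragraph and sentence boundaries:

def split_text(text, chunk_size=800, overlap=100):
    # Slide a window of chunk_size characters, stepping by chunk_size - overlap,
    # so each chunk shares its last `overlap` characters with the next one
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

sample = "x" * 2000
pieces = split_text(sample)
print(len(pieces), [len(p) for p in pieces])  # 3 chunks of 800, 800, and 600 characters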
Now, let’s move on to creating the embeddings and the vector store:
# Embedding model served via the Hugging Face API (replace the token with your own)
embedding_model = EmbeddingModelAPI('hugging_face_token')

# Embed each chunk and store the vectors in the vector store
vector_store = chromaDB.VectorStore(embedding_model)
vector_store.create_vector_store(chunks)
In this example, we create an embedding model using the Hugging Face API. We then use the embedding model and the chunks to create the vector store.
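For reference, here is roughly what indexing and querying chunks looks like when using the chromadb package directly. This is a sketch that assumes Chroma's built-in default embedding function; the wrapper above hides these calls, and the exact API can vary between versions:

import chromadb

client = chromadb.Client()  # in-memory client; use chromadb.PersistentClient(path=...) to persist to disk
collection = client.create_collection("documents")

# Chroma embeds the documents with its default embedding function when none is supplied
chunk_texts = ["first chunk of text", "second chunk of text"]
collection.add(documents=chunk_texts, ids=[f"chunk_{i}" for i in range(len(chunk_texts))])

# Semantic search: embed the query and return the closest chunks
results = collection.query(query_texts=["What is instruction tuning?"], n_results=2)
print(results["documents"])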
Next, we set up the retrievers for both the embedding-based search and the keyword-based search:
# Semantic retriever over the vector store, returning the top 3 chunks per query
embedding_retriever = vector_store.create_retriever(num_chunks=3)

# Keyword-based retriever using BM25 scoring over the same chunks
keyword_retriever = BM25Retriever(chunks, num_chunks=3)

# Combine both retrievers, weighting the semantic results at 0.7 and the keyword results at 0.3
ensemble_retriever = EnsembleRetriever([embedding_retriever, keyword_retriever], [0.7, 0.3])
In this code snippet, we create the retrievers for both the embedding-based search and the keyword-based search. The ensemble retriever then combines the results from both, with a weight of 0.7 on the embedding-based retriever and 0.3 on the keyword-based one.
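The keyword side of the ensemble boils down to plain BM25 scoring. Using the rank_bm25 package directly, that step looks roughly like this (simple whitespace tokenization is assumed; real pipelines usually lowercase and strip punctuation first):

from rank_bm25 import BM25Okapi

chunk_texts = [
    "Instruction tuning fine-tunes a model on instruction-response pairs.",
    "A vector store holds one embedding per chunk.",
    "BM25 ranks chunks by keyword overlap with the query.",
]

# BM25 works on tokenized text; here we simply lowercase and split on whitespace
bm25 = BM25Okapi([text.lower().split() for text in chunk_texts])

query_tokens = "what is instruction tuning".split()
print(bm25.get_scores(query_tokens))                   # one relevance score per chunk
print(bm25.get_top_n(query_tokens, chunk_texts, n=2))  # the two best-matching chunks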
Finally, we set up the LLM model and the prompt template:
llm_model = LLMModelAPI('hugging_face_token')  # LLM served via the Hugging Face API

# The user prompt is a placeholder that gets filled with the actual question at query time
prompt_template = ChatPromptTemplate(
    system_prompt=(
        "You are a helpful AI assistant that follows instructions extremely well. "
        "Use the following context to answer the user's question:"
    ),
    user_prompt="{question}",
    assistant_prompt="",
)

output_parser = OutputParser()

# Wire the prompt, parser, and retriever into a chat chain; the retriever supplies the context
chat_template = llm_model.create_chat_template(
    prompt_template,
    output_parser,
    retriever=ensemble_retriever,
)
In this code snippet, we create the LLM model using the Hugging Face API. We then set up the prompt template for the chat: the system prompt instructs the model to answer from the provided context, the user prompt is a placeholder that is filled with the actual question when the chat is invoked, and the assistant prompt is left empty. The output parser is used to parse the model's response.
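A chain like this ultimately formats the retrieved chunks and the user's question into a single prompt string for the LLM. Here is a plain-Python sketch of that step, meant as an illustration rather than the library's actual implementation:

def build_prompt(system_prompt, retrieved_chunks, question):
    # Join the retrieved chunks into one context block and append the question
    context = "\n\n".join(retrieved_chunks)
    return f"{system_prompt}\n\n{context}\n\nQuestion: {question}\nAnswer:"

prompt = build_prompt(
    "You are a helpful AI assistant. Use the following context to answer the user's question:",
    ["Instruction tuning fine-tunes a model on instruction-response pairs."],
    "What is instruction tuning?",
)
print(prompt)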
With everything set up, we can now run the chat:
user_input = "What is instruction tuning?"
# Retrieve relevant chunks, build the prompt, and generate the answer
response = chat_template.invoke(user_input)
print(response)
In this example, we provide the user's input and call the invoke function on the chat template to get the model's response.
Conclusion
In this video, we explored the architecture of a basic RAG pipeline and introduced the concept of hybrid search. We also walked through a code example showing how to implement hybrid search using the LChain library. By combining keyword-based search and embedding-based search, we can improve retrieval quality and provide more accurate responses.
If you found this video useful, make sure to subscribe to our channel for more videos in this series. If you need assistance with your own projects, we offer consulting services. Check out the links in the description for more information.
Thank you for watching, and see you in the next video!