How to Enable Multiple Clients to Talk to a Common Knowledge Base Using the LocalGPT API
Today we’re going to look at how to enable multiple clients to talk to a common knowledge base using the LocalGPT API. LocalGPT is my own project that lets you chat with your documents locally, in a secure environment. It has over 18,000 stars on GitHub, and you can build real business applications on top of it.
Setting Up LocalGPT
First, let’s walk through the setup step by step. If this is your first time using LocalGPT, you’ll need to clone the repo: click the green “Code” button on the GitHub page to copy the clone URL. Alternatively, you can use a preconfigured virtual machine provided by MK Compute; if you go that route, use the code “promptengineering” to get 50% off.
If you’re running LocalGPT on your own machine, open a terminal and change to the directory where you want to keep the cloned repo. Then use the copied GitHub address to clone the repo into a new folder (I’ll call it “localGPT-API”). Next, create a new virtual environment with “conda create -n localGPT python=3.10” (replace “localGPT” with whatever name you prefer), activate it, and install the requirements with “pip install -r requirements.txt”.
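Putting those steps together, a typical setup session looks something like this (the folder name and environment name are just examples; adjust them to taste):

```bash
# Clone the repo (URL copied from the green "Code" button on GitHub)
git clone https://github.com/PromtEngineer/localGPT.git localGPT-API
cd localGPT-API

# Create and activate a fresh Python 3.10 environment
conda create -n localGPT python=3.10
conda activate localGPT

# Install the project dependencies
pip install -r requirements.txt
```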
Once the setup is complete, you’re ready to move on to the ingestion step.
Ingesting Documents and Creating a Knowledge Base
To enable multiple clients to communicate with a common knowledge base, we first need to ingest the documents and create a vector store. This is done with the command “python ingest.py”. By default it uses cuda as the device type, but if you’re running on Apple silicon you can pass mps instead.
The ingestion process splits the documents into chunks and creates vector embeddings for those chunks. The embeddings are stored in a vector store, which acts as the knowledge base.
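Assuming your documents are already in the repo’s SOURCE_DOCUMENTS folder, the ingestion command looks roughly like this (the “--device_type” flag name reflects the version of the repo I’m using; check “python ingest.py --help” if yours differs):

```bash
# Default: build the vector store on an NVIDIA GPU
python ingest.py

# On Apple silicon, use Metal Performance Shaders instead
python ingest.py --device_type mps

# CPU-only fallback if you have no supported GPU
python ingest.py --device_type cpu
```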
Serving Multiple Clients
Now that the knowledge base is set up, we can start serving multiple clients through an API. LocalGPT uses a Flask API server for this. The server implements a simple queuing mechanism, serving clients in the order their requests arrive. If you have a multi-GPU system, you can run separate LocalGPT instances on different GPUs and route client requests across them, as sketched below.
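For example, on a two-GPU machine you could pin one API server instance to each GPU with CUDA_VISIBLE_DEVICES. Each instance needs its own port, so this sketch assumes the server script accepts a “--port” option; if your version doesn’t, change the port in the app.run call inside run_localGPT_API.py instead:

```bash
# Instance 1 pinned to GPU 0, on the default port
CUDA_VISIBLE_DEVICES=0 python run_localGPT_API.py --port 5110 &

# Instance 2 pinned to GPU 1, on a separate port
CUDA_VISIBLE_DEVICES=1 python run_localGPT_API.py --port 5120 &

# Clients (or a load balancer in front) can then be pointed at either port
```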
To start the API server, run “python run_localGPT_API.py”. This loads the model and starts serving requests.
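Once the model has finished loading, you can sanity-check the server with a single request. The route, form field, and port below are the ones the bundled example UI uses in the version I’m working from; confirm them in run_localGPT_API.py if your copy differs:

```bash
# Ask one question against the knowledge base (the API listens on port 5110 by default)
curl -s -X POST http://localhost:5110/api/prompt_route \
     -F "user_prompt=What is this document collection about?"
```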
Using the LocalGPT UI
LocalGPT ships with an example UI that you can use to interact with the API server. Navigate to the “localGPTUI” folder and run “python localGPTUI.py”. This starts a second Flask application, running on port 5111.
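In other words, with the folder and file names as they appear in the repo I’m using:

```bash
# Run from the root of the cloned repo, in a second terminal
cd localGPTUI
python localGPTUI.py   # serves the example UI at http://localhost:5111
```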
The example UI is just one option: you can build your own by looking at how the LocalGPT API is implemented and how the example UI calls it.
Simulating Multiple Clients
To simulate multiple clients hitting the API at the same time, you can run several instances of the UI, or call the API from different remote machines. In the example shown in the video, three instances of the UI are running, each sending a different prompt to the API server.
The API server processes one prompt at a time, in the order the requests are received; once a response is generated, it moves on to the next prompt.
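You can reproduce that behavior from the command line instead of three browser tabs by firing several requests at the API concurrently and watching them come back one after another (same route and form field as in the earlier sanity check; adjust if your version of run_localGPT_API.py names them differently):

```bash
# Fire three prompts at the API server at (roughly) the same time
for i in 1 2 3; do
  curl -s -X POST http://localhost:5110/api/prompt_route \
       -F "user_prompt=Client $i: summarize the ingested documents" \
       > "response_$i.json" &
done

# Wait for all three background requests to finish.
# The responses arrive sequentially, because the server answers one prompt at a time.
wait
echo "All three clients have received their answers."
```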
Conclusion
In this tutorial, we looked at how to enable multiple clients to talk to a common knowledge base using the LocalGPT API server. This is a basic setup; there are more elegant ways to handle many clients, such as putting a load balancer in front of several server instances and queuing requests.
If you’re interested in contributing to the LocalGPT project, check out the GitHub repo and join the Discord community. Consulting and advisory services for products and startups are also available. Subscribe to the channel for more content on LocalGPT and its features.
Thanks for watching and see you in the next video!