For development and experimentation purposes, the Jupyter notebooks provide guidance for building knowledge-augmented chatbots.
The following Jupyter notebooks are provided with the AI workflow for the default canonical RAG example:
This notebook demonstrates how to use a client to stream responses from an LLM deployed to NVIDIA Triton Inference Server with NVIDIA TensorRT-LLM (TRT-LLM). This deployment format optimizes the model for low-latency, high-throughput inference.
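As a rough illustration of the streaming pattern only: the real notebook uses the Triton/TRT-LLM client libraries, and the JSON chunk format below is an assumption for demonstration. A streaming client typically surfaces partial outputs as they arrive and accumulates them into the full response:

```python
import json

def assemble_stream(chunks):
    """Accumulate a streamed LLM response token by token.

    Each chunk is assumed (hypothetically) to be a JSON string carrying a
    partial 'text_output' field; the real Triton/TRT-LLM client returns
    structured results, so this only shows the accumulation pattern.
    """
    pieces = []
    for raw in chunks:
        token = json.loads(raw)["text_output"]
        print(token, end="", flush=True)  # surface tokens as they arrive
        pieces.append(token)
    return "".join(pieces)

# Simulated stream standing in for responses from the inference server.
simulated = ['{"text_output": "Hello"}', '{"text_output": ", world!"}']
assert assemble_stream(simulated) == "Hello, world!"
```

Streaming matters for chat UIs because the first tokens reach the user long before the full completion finishes.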
This notebook demonstrates how to use LangChain to build a chatbot that references a custom knowledge base. LangChain provides a simple framework for connecting LLMs to your own data sources. The notebook shows how to integrate TensorRT-LLM with LangChain using a custom wrapper.
This notebook demonstrates how to use LlamaIndex to build a chatbot that references a custom knowledge base. It contains the same functionality as the previous notebook, but uses some LlamaIndex components instead of LangChain components. It also shows how the two frameworks can be used together.
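To make the retrieval idea behind these notebooks concrete, here is a minimal, framework-free sketch of retrieval-augmented prompting. The toy bag-of-words `embed` stands in for the real embedding model, and `retrieve`/`build_prompt` are hypothetical helpers for illustration, not LangChain or LlamaIndex APIs:

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a real deployment would call the
    # embedding model served by the workflow instead.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, docs, k=1):
    q = embed(question)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(question, docs, k=2):
    context = "\n".join(retrieve(question, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = [
    "Triton Inference Server serves models over HTTP and gRPC.",
    "FAISS is a library for vector similarity search.",
    "LangChain connects LLMs to external data sources.",
]
print(build_prompt("What does LangChain do?", docs))
```

The frameworks replace each piece here with a production component: a learned embedding model, a vector store such as FAISS, and prompt templates.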
This notebook demonstrates how to use LlamaIndex to build a more complex retrieval for a chatbot. The retrieval method shown in this notebook works well for code documentation; it retrieves more contiguous document blocks that preserve both code snippets and explanations of code.
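A simplified sketch of that chunking idea: keep each fenced code snippet attached to the paragraph that explains it, so a retrieved block carries both. The `chunk_markdown` helper is hypothetical, not the LlamaIndex retriever used in the notebook:

```python
def chunk_markdown(text):
    """Split markdown into retrieval blocks, re-attaching each fenced
    code snippet to the explanation paragraph that precedes it."""
    blocks, current, in_code = [], [], False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("```"):
            if not in_code and not current and blocks:
                # A fence right after a paragraph break: pull the
                # preceding explanation back into this block.
                current = [blocks.pop(), ""]
            current.append(line)
            if in_code:
                blocks.append("\n".join(current))
                current = []
            in_code = not in_code
        elif stripped == "" and not in_code:
            if current:
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks
```

Naive fixed-size chunking can split a code sample from its explanation; keeping them contiguous is what makes retrieval work well for code documentation.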
This notebook demonstrates how to use the REST FastAPI server to upload the knowledge base and then ask a question both with and without the knowledge base.
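A sketch of the two calls the notebook makes. The endpoint paths, port, and field names below are assumptions for illustration; check the FastAPI server's auto-generated API docs for the real routes:

```python
# Hypothetical base URL and endpoint paths -- consult the server's
# generated OpenAPI docs (e.g. http://host-ip:8081/docs) for the real ones.
BASE_URL = "http://localhost:8081"

def upload_request(file_path):
    """Describe the knowledge-base upload call (a multipart file POST)."""
    return {"method": "POST", "url": f"{BASE_URL}/uploadDocument",
            "files": {"file": file_path}}

def generate_request(question, use_knowledge_base):
    """Describe the question call, toggling retrieval on or off."""
    return {"method": "POST", "url": f"{BASE_URL}/generate",
            "json": {"question": question,
                     "use_knowledge_base": use_knowledge_base}}

# Ask the same question without and then with the knowledge base to
# compare a plain LLM answer against a retrieval-augmented one.
baseline = generate_request("What is Triton?", use_knowledge_base=False)
augmented = generate_request("What is Triton?", use_knowledge_base=True)
```

Comparing the two responses is the quickest way to see what retrieval adds.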
- NVIDIA AI Endpoint Integration with LangChain: This notebook demonstrates how to build a Retrieval Augmented Generation (RAG) example using the NVIDIA AI endpoint integrated with LangChain, with FAISS as the vector store.
- RAG with LangChain and a local LLM model from Hugging Face Hub: This notebook demonstrates how to plug in a local LLM from the Hugging Face Hub and build a simple RAG app using LangChain.
- NVIDIA AI Endpoint with LlamaIndex and LangChain: This notebook demonstrates how to plug in the NVIDIA AI Endpoint model mixtral_8x7b and the nvolveqa_40k embedding, and bind them into LlamaIndex with the necessary customizations.
- Locally deployed model from Hugging Face integrated with LlamaIndex and LangChain: This notebook demonstrates how to plug in the local LLM Llama-2-13b-chat-hf and the all-MiniLM-L6-v2 embedding from the Hugging Face Hub, and bind them into LlamaIndex with the necessary customizations.
- LangChain agent with tools plugging in multiple models from NVIDIA AI Endpoints: This notebook demonstrates how to use multiple NVIDIA AI endpoint models such as mixtral_8x7b, Deplot, and Neva.
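The agent-with-tools pattern behind that last notebook can be sketched without any framework: a router picks a tool for each query and returns its result. The `calculator` and `lookup` tools and the rule-based routing below are stand-ins; in the notebook, the NVIDIA AI endpoint models drive tool selection through LangChain instead:

```python
import re

# Stand-in tools; the notebook binds real NVIDIA AI endpoint models
# (mixtral_8x7b for text, Deplot for charts, Neva for images) instead.
def calculator(expression):
    # Restricted eval over arithmetic only (input is pre-screened below).
    return str(eval(expression, {"__builtins__": {}}, {}))

def lookup(term):
    facts = {"FAISS": "a vector similarity search library"}
    return facts.get(term, "unknown")

TOOLS = {"calculator": calculator, "lookup": lookup}

def run_agent(query):
    """Minimal rule-based router: pick a tool from the query text.
    A real LangChain agent lets the LLM choose the tool instead."""
    if re.fullmatch(r"[\d\s+*/().-]+", query):
        return TOOLS["calculator"](query)
    return TOOLS["lookup"](query.strip("?").split()[-1])

print(run_agent("2 + 3 * 4"))        # routed to the calculator tool
print(run_agent("What is FAISS?"))   # routed to the lookup tool
```

The point of an agent framework is replacing the brittle `re.fullmatch` routing with an LLM that decides which tool fits the query.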
If a JupyterLab server needs to be built and started manually for development purposes, run the following commands:
- [Optional] Notebooks 7-9 require GPUs. If you have a GPU and are trying out notebooks 7-9, update the jupyter-server service in the docker-compose.yaml file to use ./notebooks/Dockerfile.gpu_notebook as the Dockerfile:
jupyter-server:
  container_name: notebook-server
  image: notebook-server:latest
  build:
    context: ../../
    dockerfile: ./notebooks/Dockerfile.gpu_notebook
- [Optional] Notebooks 7-9 may need multiple GPUs. Update docker-compose.yaml to use multiple GPU IDs in the device_ids field below, or set count: all
jupyter-server:
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            device_ids: ['0', '1']
            capabilities: [gpu]
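Alternatively, to let the container claim every available GPU, replace the device_ids line with count: all, as mentioned above. A sketch of that variant:

```yaml
jupyter-server:
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
```

Use explicit device_ids when other services on the host need to keep their own GPUs.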
- Build the container
source deploy/compose/compose.env
docker compose -f deploy/compose/docker-compose.yaml build jupyter-server
- Run the container, which starts the notebook server
source deploy/compose/compose.env
docker compose -f deploy/compose/docker-compose.yaml up jupyter-server
- Using a web browser, access the notebooks at the following URL:
http://host-ip:8888