Introduction to Simple RAG Consumption with NVIDIA NIM recipe

The Simple Retrieval Augmented Generation (RAG) Consumption with NVIDIA NIM recipe is based on REST and SOAP APIs. Use the recipe to receive a user query from Slack, search a vector database for relevant context, send the query and context to the NVIDIA NIM Large Language Model (LLM), and return the generated response to the Slack channel.
The process begins by receiving user input from Slack, skipping any messages that are bot responses. It then converts the query into a vector representation using a pre-trained embedding model, translating the text into a numerical format that captures its semantic meaning.
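The bot-message check and the embedding call can be sketched as follows. This is a minimal illustration rather than the recipe's actual implementation: it assumes a Slack Events API payload and an NVIDIA NIM embedding model served through NIM's OpenAI-compatible API, and the endpoint URL and model name are placeholder assumptions.

```python
from openai import OpenAI

# Assumed NIM endpoint and credentials; substitute your own deployment.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key="YOUR_NVIDIA_API_KEY",
)

def embed_query(slack_event: dict) -> list[float] | None:
    # Slack marks bot/app messages with a bot_id or a bot_message subtype;
    # skip them so the recipe does not respond to its own answers.
    if slack_event.get("bot_id") or slack_event.get("subtype") == "bot_message":
        return None
    response = client.embeddings.create(
        model="nvidia/nv-embedqa-e5-v5",      # assumed embedding NIM
        input=[slack_event["text"]],
        extra_body={"input_type": "query"},   # embed as a query, not a passage
    )
    return response.data[0].embedding
```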
The process then uses the vectorized query to search for similar vectors in a vector database that contains representations of various contexts. It retrieves the top K closest matches based on similarity scores; these represent the contexts most similar to the user's query.
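Conceptually, this search reduces to a cosine-similarity ranking. The sketch below assumes the stored vectors and their contexts are already in memory as NumPy arrays; in practice the recipe delegates this step to the vector database's own query API.

```python
import numpy as np

def top_k_matches(query_vec, db_vectors, db_contexts, k=5):
    # Cosine similarity between the query vector and every stored vector.
    q = np.asarray(query_vec, dtype=float)
    db = np.asarray(db_vectors, dtype=float)
    scores = db @ q / (np.linalg.norm(db, axis=1) * np.linalg.norm(q))
    # Indices of the K highest-scoring contexts, best match first.
    top = np.argsort(scores)[::-1][:k]
    return [(db_contexts[i], float(scores[i])) for i in top]
```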
The process then filters the retrieved contexts to keep only those with a similarity score exceeding a specified cutoff parameter, ensuring that only relevant contexts are considered. It combines the filtered contexts into a final context that provides additional information to the model.
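The filtering and combination steps are then straightforward; the cutoff value below is an illustrative placeholder for the recipe's configurable parameter.

```python
def build_context(matches, cutoff=0.75):
    # Keep only matches whose similarity score clears the cutoff, then
    # join them into a single context string to pass to the LLM.
    relevant = [text for text, score in matches if score > cutoff]
    return "\n\n".join(relevant)
```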
Subsequently, the process invokes the LLM with the original user query and the processed context. The LLM uses this information to generate a comprehensive response, which is returned to the user in the same channel as a bot or app response.
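This final step can be sketched as a chat-completion call against NIM's OpenAI-compatible API followed by a Slack chat.postMessage call. The model name, system prompt, and credentials are illustrative assumptions, not values prescribed by the recipe.

```python
from openai import OpenAI
from slack_sdk import WebClient

nim = OpenAI(base_url="https://integrate.api.nvidia.com/v1",
             api_key="YOUR_NVIDIA_API_KEY")
slack = WebClient(token="YOUR_SLACK_BOT_TOKEN")

def answer_in_channel(query: str, context: str, channel: str) -> None:
    # Ask the NIM LLM to answer the query grounded in the retrieved context.
    completion = nim.chat.completions.create(
        model="meta/llama-3.1-8b-instruct",  # assumed chat NIM
        messages=[
            {"role": "system",
             "content": f"Answer the user's question using this context:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    # Post the generated answer back to the originating channel; it appears
    # there as a bot or app response.
    slack.chat_postMessage(channel=channel,
                           text=completion.choices[0].message.content)
```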
This approach ensures that the LLM's response is relevant and enriched by the most contextually appropriate information from the vector database.