NVIDIA RAG Blueprint
Retrieval-Augmented Generation (RAG) combines the reasoning power of large language models (LLMs)
with real-time retrieval from trusted data sources.
It grounds AI responses in enterprise knowledge,
reducing hallucinations and helping improve accuracy, compliance, and freshness.
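To make the retrieve-then-generate idea concrete, here is a minimal, self-contained Python sketch. It is not the blueprint's implementation. It replaces embedding models with a toy term-frequency cosine similarity, and the helper names (`retrieve`, `build_grounded_prompt`) and sample documents are hypothetical. It stops at prompt assembly; in a real pipeline the grounded prompt would be sent to an LLM.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Toy tokenizer (assumption): lowercase alphanumeric runs only.
    # Real RAG pipelines embed text with a retrieval model instead.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Hypothetical retriever: rank documents by similarity to the query, keep the top k.
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_grounded_prompt(query: str, passages: list[str]) -> str:
    # Hypothetical prompt template that restricts the LLM to the retrieved context.
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


# Illustrative sample data, not real enterprise content.
docs = [
    "The travel policy caps hotel spend at 200 USD per night.",
    "Quarterly reports are due on the fifth business day.",
    "Remote employees receive a monthly internet stipend.",
]
query = "What is the hotel spend cap in the travel policy?"
prompt = build_grounded_prompt(query, retrieve(query, docs, k=1))
print(prompt)
```

The instruction to answer only from the supplied context is what "grounding" means in practice: the model is steered toward retrieved enterprise text rather than its parametric memory.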
Overview
The NVIDIA RAG Blueprint is a reference solution and foundational starting point
for building Retrieval-Augmented Generation (RAG) pipelines with NVIDIA NIM microservices.
It enables enterprises to deliver natural language question answering grounded in their own data,
while meeting governance, latency, and scalability requirements.
Designed to be decomposable and configurable, the blueprint integrates GPU-accelerated components with NeMo Retriever models, multimodal and vision language models, and guardrailing services
to provide an enterprise-ready framework.
With a pre-built reference UI, open-source code, and multiple deployment options, including local Docker (with and without NVIDIA-hosted endpoints) and Kubernetes,
it serves as a flexible starting point that developers can adapt and extend to their specific needs.
Key Features
Data Ingestion
- Multimodal content extraction - Documents with text, tables, charts, infographics, and audio. For the full list of supported file types, see [NeMo Retriever Extraction Overview](https://docs.nvidia.com/nemo/retriever/latest/extraction/overview/).
- Custom metadata support
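The following minimal sketch shows how custom metadata can narrow retrieval, assuming each ingested chunk carries a free-form metadata dictionary. The `Chunk` class, the `filter_chunks` helper, and the `department`/`year` fields are hypothetical illustrations, not the blueprint's schema or API; in practice this kind of filtering is typically pushed down into the vector database alongside similarity search.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Chunk:
    text: str
    # Hypothetical per-document metadata supplied at ingestion time.
    metadata: dict[str, Any] = field(default_factory=dict)


def filter_chunks(
    chunks: list[Chunk], predicate: Callable[[dict[str, Any]], bool]
) -> list[Chunk]:
    # Keep only chunks whose metadata satisfies the predicate.
    return [c for c in chunks if predicate(c.metadata)]


# Illustrative sample chunks with made-up metadata.
chunks = [
    Chunk("Q3 revenue grew 12%.", {"department": "finance", "year": 2024}),
    Chunk("New onboarding checklist.", {"department": "hr", "year": 2024}),
    Chunk("Q3 revenue grew 8%.", {"department": "finance", "year": 2023}),
]

finance_2024 = filter_chunks(
    chunks, lambda m: m.get("department") == "finance" and m.get("year") == 2024
)
print([c.text for c in finance_2024])  # ['Q3 revenue grew 12%.']
```

Filtering before ranking keeps semantically similar but out-of-scope chunks (here, the 2023 figure) from reaching the LLM.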
Search and Retrieval
- Multi-collection searchability
- Hybrid search combining dense and sparse retrieval
- Reranking to further improve accuracy
- GPU-accelerated index creation and search
- Pluggable vector database
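Hybrid search needs some way to merge the dense (embedding) and sparse (keyword) result lists. The sketch below uses reciprocal rank fusion (RRF), one common technique, with the conventional constant `k = 60`. This choice is an assumption for illustration: the blueprint's vector database may use a different ranker, such as weighted score fusion. The function name and document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with RRF: score(d) = sum 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; documents missing from a list get no score from it.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


dense_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. from embedding similarity
sparse_hits = ["doc_c", "doc_a", "doc_d"]  # e.g. from keyword (BM25-style) matching

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print([doc_id for doc_id, _ in fused])  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Here `doc_a` and `doc_c` rise to the top because both retrievers return them. Because RRF uses only ranks, it avoids normalizing dense and sparse scores that live on different scales. The same fusion could also merge results from several collections, and the fused top results would then typically be passed to a reranking model for final ordering.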
Query Processing
- Query decomposition
- Dynamic filter expression creati