Most of the time, the knowledge base we want to chat with and reason about using an LLM has strong internal relations.
- In a codebase, classes inherit from one another, and functions are called from many different places.
- In healthcare, a symptom is not just a standalone fact; it’s a node connected to genetics, history, and pharmaceutical research.
- Documents often cross-link to other documents, or to other chapters within a book, creating a logical hierarchy and process flow.
- In research papers, we cite the literature that the current work builds on.
- In law, no law or ruling exists in isolation.
Even the famous PageRank algorithm, which gave Google its early competitive advantage, is based on the quantity and quality of links between websites. The relations within a knowledge base are crucial to fully understanding it. The problem with classical RAG is that it chunks the text, discarding all internal relations. So can we do better?
The classical RAG approach can be summarised as follows:
- chunk the knowledge base
- create embeddings
- create a vector store
- create an index inside of it
- run a similarity search based on the query
- use the retrieved chunks as a context for the LLM
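The pipeline above can be sketched end to end. This is a toy illustration, not a production setup: the bag-of-words "embedding" is a stand-in for a real embedding model, and the `chunks` list is hypothetical example data.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would call an
    # embedding model and store dense vectors in a vector store.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(chunks: list[str], query: str, k: int = 2) -> list[str]:
    # Flat index: every chunk is ranked purely by similarity to the
    # query; no relations between chunks are taken into account.
    index = [(chunk, embed(chunk)) for chunk in chunks]
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

chunks = [
    "Project Lemon is led by Olaf.",
    "The budget for Project Lemon is tracked in Sugar.",
    "Unrelated note about office plants.",
]
# The top-k chunks become the LLM's context.
context = retrieve(chunks, "Who leads Project Lemon?", k=2)
```

The retrieved chunks would then be pasted into the LLM prompt as context. Note how each chunk is scored in isolation, which is exactly the limitation discussed next.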
The approach described above can be extended with multimodal support, sparse, dense, or hybrid retrieval, late interaction, reranking, augmenting the index store with hypothetical queries per chunk, cross-encoder retrieval, and so on.
But there are two main disadvantages of the classical RAG approach that cannot be overcome with the flat index store:
- The system can’t answer general questions about the whole knowledge base. Chunks refer to parts of the document. The system will answer specific questions well, but will struggle with abstract questions about organizational knowledge and summarizations.
- The system does not know the relationship between documents, facts, chunks, and knowledge in general. It will struggle to answer cross-document questions that require relation mapping.
Can we do better? Yes, definitely. The solution is graph-based RAG. Keeping the knowledge structure, instead of squeezing it into a flat index store, allows the system to reason about relations and structure, traverse the graph when needed, and reason about the knowledge base as a whole via high-level graph nodes. Instead of just looking for matching keywords, the system understands the "who, what, and why" of your data.
Overview
Graph RAG breaks data down into Entities (people, places, concepts) and Relationships (how those things interact). Each node contains a high-level or low-level fact extracted from the organizational knowledge. Connections between nodes tell us about how and why two entities are related. Retrieval can be implemented in a similar fashion to the classical RAG, where each node has its vectorised embedding and is chosen or not, based on the similarity to the query embedding. The advantage of keeping the knowledge structure is the possibility of context expansion by graph traversal, from the initially chosen nodes.
Example: If it finds a node for "Project Lemon," it can instantly see it’s connected to "Lead Engineer Olaf" and the "Budget Sugar." Even if your original question didn't mention Olaf or budgets, the graph provides that extra context automatically through graph traversal.
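The traversal step from the example can be sketched in a few lines. The mini-graph below is hypothetical data mirroring the "Project Lemon" example; a real system would store it in a graph database and seed the expansion with nodes matched by embedding similarity.

```python
# Hypothetical mini-graph: node -> set of connected nodes.
graph = {
    "Project Lemon": {"Lead Engineer Olaf", "Budget Sugar"},
    "Lead Engineer Olaf": {"Project Lemon"},
    "Budget Sugar": {"Project Lemon"},
    "Project Lime": {"Lead Engineer Anna"},
    "Lead Engineer Anna": {"Project Lime"},
}

def expand_context(seed_nodes: list[str], hops: int = 1) -> set[str]:
    # Start from the nodes matched by similarity search, then follow
    # edges to pull in related entities the query never mentioned.
    context = set(seed_nodes)
    frontier = set(seed_nodes)
    for _ in range(hops):
        frontier = {nbr for node in frontier
                    for nbr in graph.get(node, ())} - context
        context |= frontier
    return context

# A query matching only "Project Lemon" also surfaces Olaf and the budget:
expanded = expand_context(["Project Lemon"])
```

The `hops` parameter controls how far the context spreads; in practice it is kept small (or weighted by edge relevance) to avoid flooding the prompt with loosely related nodes.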
Graph-based RAGs can use both property graph and RDF graph databases; however, more often, property graphs are chosen, mainly because they are more human-intuitive and allow for easy property storing and relation weighting.
Graph-based RAGs are gaining more and more community attention. The best-known implementations right now are GraphRAG by Microsoft, LightRAG, RAG-Anything, and Graphiti, and the solution space keeps growing.
The process of building a Graph RAG starts with text chunking, fact extraction, and relationship creation. GraphRAG, LightRAG, and RAG-Anything create relations within a chunk based on LLM judgement; cross-chunk relations emerge during deduplication. If several similar entities exist, they are merged into a single node while preserving all existing relations.
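The merge step can be illustrated with a minimal sketch. The edge triples and the alias map are made-up example data; in a real pipeline the aliases would come from embedding similarity or LLM-based entity resolution.

```python
def merge_entities(edges: list[tuple[str, str, str]],
                   aliases: dict[str, str]) -> set[tuple[str, str, str]]:
    # aliases maps each variant name to its canonical entity,
    # e.g. {"Olaf K.": "Olaf"}; unlisted names are already canonical.
    def canon(name: str) -> str:
        return aliases.get(name, name)

    merged = set()
    for src, rel, dst in edges:
        a, b = canon(src), canon(dst)
        if a != b:  # drop self-loops created by the merge
            merged.add((a, rel, b))
    return merged

edges = [
    ("Olaf", "leads", "Project Lemon"),
    ("Olaf K.", "manages", "Budget Sugar"),  # same person, another chunk
]
# Both relations survive, now attached to the single canonical node.
merged = merge_entities(edges, {"Olaf K.": "Olaf"})
```

The key invariant is that merging rewrites edge endpoints rather than dropping edges, so no relation extracted from any chunk is lost.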
In a second step, some solutions use clustering to create a simplified, abstracted layer on top of the graph that can be used to answer general questions. GraphRAG uses the Leiden algorithm to create hierarchical community summaries.
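To make the idea concrete, here is a dependency-free stand-in for community detection: grouping nodes by connected components. This is a deliberate simplification (GraphRAG itself uses Leiden, which finds communities within a connected graph); the point is only to show how node groups become candidates for LLM summarization.

```python
from collections import deque

def communities(graph: dict[str, set[str]]) -> list[set[str]]:
    # Stand-in for a real community-detection algorithm: each
    # connected component becomes one "community" to summarize.
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            if node in comp:
                continue
            comp.add(node)
            queue.extend(graph.get(node, ()))
        seen |= comp
        result.append(comp)
    return result

# Hypothetical two-project graph; each group would then be summarized
# by an LLM into a high-level node used for answering global questions.
graph = {
    "Project Lemon": {"Olaf"}, "Olaf": {"Project Lemon"},
    "Project Lime": {"Anna"}, "Anna": {"Project Lime"},
}
groups = communities(graph)
```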
Challenges
Are there any drawbacks to using graph-based RAG? The main challenges are building the right graph and keeping it clean.
We can distinguish a few main challenges:
- Finding the right granularity for nodes
- Finding the right relationships
- Keeping the graph clean on updates
- Fast and accurate duplicate detection, with a sound merging strategy
- Abstraction creation
Benefits
Published benchmarks show strong benefits from encoding the knowledge structure compared to flat index stores. Authors of graph-based RAG papers propose an LLM-based evaluation: the LLM directly compares two answers from two different RAG setups and chooses a winner based on predefined criteria. The proposed criteria are comprehensiveness, diversity, empowerment, and an overall score, which is the cumulative performance across the three preceding criteria.
For GraphRAG, global approaches achieved comprehensiveness win rates between 72-83% (p < .001) on the Podcast transcripts dataset and 72-80% (p < .001) on the News articles dataset, while diversity win rates ranged from 75-82% (p < .001) and 62-71% (p < .01), respectively.
LightRAG compares its approach against four baselines:
- the flattened RAG (Naive RAG),
- RQ-RAG (Naive RAG with modified querying: an LLM decomposes the input query into multiple sub-queries),
- HyDE (another modification of Naive RAG: the LLM generates a hypothetical document from the query, which is then used for similarity matching against the embeddings in the vector store),
- GraphRAG
In all scenarios, LightRAG comes out ahead. Across four different datasets it achieves win rates of 60-85% against Naive RAG, 60-85% against RQ-RAG, and 57-73% against HyDE, and is roughly comparable to GraphRAG at 49-55%, with its main advantage on the diversity criterion.
Graph-based RAGs show superior performance compared to flat vector-store-based RAGs. By moving from flat vector stores to structured knowledge graphs, AI can finally answer complex, cross-document questions and provide high-level summaries that traditional retrieval methods miss. While more complex to implement, a well-designed graph-based RAG can bring a significant quality boost to the system.