GraphRAG — Knowledge Graph Retrieval
GraphRAG is a next-generation retrieval strategy that builds a knowledge graph of entities and relationships from your documents, then uses that graph — instead of raw text chunks — to answer questions. It was pioneered by Microsoft Research and is especially powerful for multi-document knowledge bases where flat cosine similarity falls short.
Why GraphRAG?
Traditional RAG retrieves the K most similar text chunks by cosine distance. This works well for narrow, keyword-anchored questions but has well-known failure modes:
- Fragmented context — the answer spans many chunks that are individually below the similarity threshold
- Missing relationships — "how does X relate to Y?" requires traversing a relationship, not matching embeddings
- Thematic blindness — "what are the main architectural patterns used here?" has no single matching chunk
GraphRAG addresses all three by maintaining a structured knowledge graph alongside the vector index and offering two complementary search modes — local and global — plus a hybrid mode that runs both.
Architecture
graph TD
A["Raw Documents"] --> B["Text Chunker"]
B --> C["EntityExtractor<br/>(LLM calls per chunk)"]
C -->|"entities"| D["GraphStore<br/>(in-memory knowledge graph)"]
C -->|"relationships"| D
D --> E["CommunityDetector<br/>(Label Propagation)"]
E --> F["CommunitySummarizer<br/>(LLM per community)"]
F --> G["Community Reports"]
H["User Query"] --> I{"Search Mode"}
I -->|"local"| J["Fuzzy Entity Lookup"]
I -->|"global"| K["Report Ranking by relevance"]
I -->|"hybrid"| L["Both in parallel"]
J --> M["BFS Traversal<br/>(K hops)"]
M --> N["Entity + Relationship Context"]
K --> O["Top-K Report Summaries"]
L --> P["Merged Context"]
N --> Q["LLM Synthesis"]
O --> Q
P --> Q
Q --> R["Answer + Sources"]
style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
style D fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
style E fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
style F fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
style G fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
style R fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff
Pipeline stages
- EntityExtractor — the LLM reads each text chunk and returns structured JSON listing entities (with type, description) and directed relationships between them. Results are merged and deduplicated across chunks.
- GraphStore — an in-memory knowledge graph. Entities and relationships are stored with their source document IDs so you always know where a fact came from. It supports BFS traversal and fuzzy name lookup (handles plurals, acronyms).
- CommunityDetector — runs Label Propagation on the graph to cluster tightly connected entities into communities. Oversized communities are recursively split until every community is under maxCommunitySize.
- CommunitySummarizer — the LLM writes a structured report for each community: a title, a narrative summary, and a list of key findings. Reports are the raw material for global search.
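To make the extract-and-merge step concrete, here is a minimal sketch. The interfaces are illustrative only (the real @hazeljs/rag types may differ); the merge keys entities by lower-cased name and concatenates descriptions so per-chunk detail is not lost:

```typescript
// Illustrative shapes only — the real @hazeljs/rag types may differ.
interface ExtractedEntity {
  name: string;
  type: string;        // e.g. 'TECHNOLOGY', 'CONCEPT'
  description: string;
  sourceDocId: string; // provenance: which document the fact came from
}

// Merge per-chunk extraction results, deduplicating by case-insensitive name.
function mergeEntities(chunks: ExtractedEntity[][]): Map<string, ExtractedEntity> {
  const merged = new Map<string, ExtractedEntity>();
  for (const entities of chunks) {
    for (const e of entities) {
      const key = e.name.toLowerCase();
      const existing = merged.get(key);
      if (!existing) {
        merged.set(key, { ...e });
      } else if (!existing.description.includes(e.description)) {
        // Keep both descriptions so no per-chunk detail is lost.
        existing.description += ' ' + e.description;
      }
    }
  }
  return merged;
}
```

Relationships would be merged analogously, for example keyed by source, target, and type.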
Installation
GraphRAG is built into @hazeljs/rag — no additional packages needed beyond what the RAG package already requires:
npm install @hazeljs/rag openai
Quick start
import OpenAI from 'openai';
import { GraphRAGPipeline, TextFileLoader } from '@hazeljs/rag';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// 1. Create the pipeline
const graphRag = new GraphRAGPipeline({
// An async function that takes a prompt string and returns the LLM's text response.
// You can use any LLM — the pipeline is provider-agnostic.
llm: async (prompt) => {
const res = await openai.chat.completions.create({
model: 'gpt-4o-mini',
temperature: 0,
messages: [{ role: 'user', content: prompt }],
});
return res.choices[0].message.content ?? '';
},
});
// 2. Load documents
const docs = await new TextFileLoader({ filePath: './docs/readme.txt' }).load();
// 3. Build the graph (extract → detect communities → summarise)
const stats = await graphRag.build(docs);
console.log(stats);
// {
// documentsProcessed: 1,
// entitiesExtracted: 23,
// relationshipsExtracted: 31,
// communitiesDetected: 4,
// communityReportsGenerated: 4,
// duration: 6240,
// }
// 4. Search
const result = await graphRag.search('What is the main purpose of this project?');
console.log(result.answer);
Configuration
All options except llm are optional — the defaults work well for most use cases:
const graphRag = new GraphRAGPipeline({
llm: myLLMFunction,
// Extraction
extractionChunkSize: 2000, // max chars sent to LLM per extraction call
// smaller = more LLM calls, more precise
// Community detection
generateCommunityReports: true, // false skips summarisation (faster, no global search)
maxCommunitySize: 15, // communities larger than this are recursively split
// Search defaults
localSearchDepth: 2, // BFS hops from seed entities
localSearchTopK: 5, // number of seed entities per query
globalSearchTopK: 5, // community reports used in global context
});
Search modes
Local search — entity-centric
Best for specific questions about named concepts, technologies, people, or processes. Local search:
- Performs a fuzzy text match to find seed entities whose names appear in or near the query
- Runs BFS from each seed entity up to localSearchDepth hops
- Assembles entity descriptions and relationship descriptions as context
- Sends context + query to the LLM for synthesis
const result = await graphRag.search(
'How does the dependency injection system work?',
{ mode: 'local' },
);
console.log(result.answer);
// "The dependency injection system uses constructor injection.
// The IoC container reads TypeScript metadata from @Service() decorators
// to resolve dependencies at startup..."
// Entities found and traversed
result.entities.forEach(e => {
console.log(`${e.name} [${e.type}]: ${e.description}`);
});
// Relationships assembled as evidence
result.relationships.forEach(r => {
console.log(`${r.type}: ${r.description}`);
});
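Internally, the depth-limited BFS traversal could look like the following sketch. The adjacency representation (a Map from entity ID to neighbour IDs) is an assumption for illustration, not GraphStore's actual data structure:

```typescript
// Sketch of a depth-limited BFS, assuming adjacency as Map<id, neighbour ids>.
// Illustrative only — GraphStore's internal traversal may differ.
function bfs(adjacency: Map<string, string[]>, seeds: string[], maxHops: number): Set<string> {
  const visited = new Set<string>(seeds);
  let frontier = seeds;
  for (let hop = 0; hop < maxHops; hop++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const neighbour of adjacency.get(id) ?? []) {
        if (!visited.has(neighbour)) {
          visited.add(neighbour);
          next.push(neighbour);
        }
      }
    }
    frontier = next;
  }
  return visited; // every entity within maxHops of a seed
}
```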
When to use local search: the question contains a specific noun, class name, concept, or technology that is likely to be an entity in the graph.
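As a rough illustration of the fuzzy entity lookup (compare case-insensitively, tolerate a trailing plural "s"), though the library's actual matcher also handles acronyms:

```typescript
// Naive fuzzy name match: case-insensitive, tolerates a trailing plural 's'.
// Purely illustrative — not GraphStore's real lookup logic.
function singular(word: string): string {
  const w = word.toLowerCase();
  return w.endsWith('s') && w.length > 3 ? w.slice(0, -1) : w;
}

function findSeedEntities(query: string, entityNames: string[]): string[] {
  const queryWords = new Set(query.split(/\W+/).map(singular));
  return entityNames.filter(name =>
    name.split(/\s+/).every(part => queryWords.has(singular(part))),
  );
}
```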
Global search — community reports
Best for broad, thematic questions that span many parts of the knowledge base. Global search:
- Scores each community report by query relevance (keyword overlap + entity proximity)
- Selects the top-K reports
- Assembles their titles, summaries, and key findings as context
- Sends context + query to the LLM for synthesis
const result = await graphRag.search(
'What are the main architectural layers of this system?',
{ mode: 'global' },
);
// Community reports used as context
result.communities.forEach(c => {
console.log(`\n=== ${c.title} (importance: ${c.rating}/10) ===`);
console.log(c.summary);
c.findings?.forEach(f => console.log(` • ${f}`));
});
When to use global search: the question asks about patterns, themes, architecture, the overall scope of a domain, or comparisons between many entities.
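The report-ranking step can be approximated with plain keyword overlap. This scorer is hypothetical; the built-in relevance function also weighs entity proximity:

```typescript
// Rank community reports by keyword overlap with the query.
// Illustrative only — the real scorer also considers entity proximity.
function scoreReport(query: string, report: { title: string; summary: string }): number {
  const queryWords = new Set(
    query.toLowerCase().split(/\W+/).filter(w => w.length > 3),
  );
  const text = `${report.title} ${report.summary}`.toLowerCase().split(/\W+/);
  return text.filter(w => queryWords.has(w)).length;
}

function topKReports<T extends { title: string; summary: string }>(
  query: string, reports: T[], k: number,
): T[] {
  return [...reports]
    .sort((a, b) => scoreReport(query, b) - scoreReport(query, a))
    .slice(0, k);
}
```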
Hybrid search — recommended default
Runs local and global in parallel, merges both context windows, and makes a single LLM synthesis call. This is the best default — it covers specific entities and broad themes simultaneously at the cost of slightly higher token usage.
// mode: 'hybrid' is the default when mode is omitted
const result = await graphRag.search(
'What vector stores does this project support and how are they integrated?',
{
mode: 'hybrid', // default
includeGraph: true, // include entities + relationships in result
includeCommunities: true,
},
);
console.log(`Mode: ${result.mode}, Duration: ${result.duration}ms`);
console.log(`Entities: ${result.entities.length}, Communities: ${result.communities.length}`);
console.log(result.answer);
Entity and relationship types
The EntityExtractor prompts the LLM to classify every extracted concept into a canonical type. This makes the graph consistent and enables type-filtered queries in the future.
Entity types
| Type | Description |
|---|---|
| CONCEPT | Abstract ideas, patterns, approaches |
| TECHNOLOGY | Software, frameworks, libraries, languages |
| PERSON | People, roles, personas |
| ORGANIZATION | Companies, teams, projects |
| PROCESS | Workflows, procedures, algorithms |
| FEATURE | Specific capabilities or functions |
| EVENT | Occurrences, releases, incidents |
| LOCATION | Physical or logical locations |
| OTHER | Anything that doesn't fit above |
Relationship types
| Type | Meaning |
|---|---|
| USES | X uses Y |
| IMPLEMENTS | X implements Y (e.g. interface, pattern) |
| CREATED_BY | X was created/authored by Y |
| PART_OF | X is a component of Y |
| DEPENDS_ON | X requires Y |
| RELATED_TO | General association |
| EXTENDS | X extends or inherits from Y |
| CONFIGURES | X configures or controls Y |
| TRIGGERS | X causes or initiates Y |
| PRODUCES | X outputs or generates Y |
| REPLACES | X supersedes Y |
| OTHER | Anything that doesn't fit above |
Community detection
After extraction, the CommunityDetector runs Label Propagation (LPA) on the undirected entity–relationship graph:
- Each entity starts with its own community label
- Each entity adopts the most frequent label among its neighbours
- The adoption step repeats until labels stabilise (convergence)
- Any community larger than maxCommunitySize is split recursively using the same algorithm on the sub-graph
The result is a set of tightly-connected clusters of entities — effectively the "topics" or "modules" of your knowledge base.
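The steps above can be sketched as synchronous Label Propagation. This simplified version omits tie-breaking rules and the recursive splitting of oversized communities:

```typescript
// Simplified synchronous Label Propagation over an undirected graph.
// adjacency maps each entity ID to its neighbours' IDs. Illustrative only.
function labelPropagation(adjacency: Map<string, string[]>, maxIter = 20): Map<string, string> {
  // 1. Every entity starts with its own label.
  const labels = new Map<string, string>(
    [...adjacency.keys()].map(id => [id, id] as [string, string]),
  );
  for (let i = 0; i < maxIter; i++) {
    let changed = false;
    for (const [id, neighbours] of adjacency) {
      if (neighbours.length === 0) continue;
      // 2. Count neighbour labels and adopt the most frequent one.
      const counts = new Map<string, number>();
      for (const n of neighbours) {
        const label = labels.get(n)!;
        counts.set(label, (counts.get(label) ?? 0) + 1);
      }
      const best = [...counts.entries()].sort((a, b) => b[1] - a[1])[0][0];
      if (best !== labels.get(id)) {
        labels.set(id, best);
        changed = true;
      }
    }
    if (!changed) break; // 3. Converged — labels are stable.
  }
  return labels; // entities sharing a label form one community
}
```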
// After build(), inspect the detected communities
const stats = graphRag.getStats();
console.log(stats.communityCount); // total communities
console.log(stats.communityBreakdown); // { 'community_0': 7, 'community_1': 12, ... }
console.log(stats.entityTypeBreakdown); // { TECHNOLOGY: 14, CONCEPT: 12, FEATURE: 9, ... }
Community reports
Once communities are detected, CommunitySummarizer makes one LLM call per community and generates a structured CommunityReport:
interface CommunityReport {
communityId: string;
title: string; // e.g. "HazelJS Core DI and IoC System"
summary: string; // 2–4 sentence narrative
findings: string[]; // bullet-point key facts
rating: number; // 1-10 importance score
entities: string[]; // entity IDs in this community
}
Reports are cached in the GraphStore and reused on every global search call — they are generated once during build().
Incremental updates
Add new documents without rebuilding the entire graph. The pipeline:
- Extracts entities from the new documents
- Merges them into the existing graph (deduplicating overlapping entities)
- Re-runs community detection on the updated graph
- Regenerates community reports
const newDocs = await new WebLoader({ urls: ['https://hazeljs.com/blog/new-post'] }).load();
const updateStats = await graphRag.addDocuments(newDocs);
console.log(`Added ${updateStats.entitiesExtracted} new entities`);
Inspecting the graph
Access the raw graph for visualisation (D3.js, Cytoscape, etc.) or debugging:
const graph = graphRag.getGraph();
// All entities
const entities = [...graph.entities.values()];
console.table(entities.map(e => ({
name: e.name,
type: e.type,
description: e.description.slice(0, 60),
})));
// All relationships
const rels = [...graph.relationships.values()];
console.table(rels.map(r => ({
from: graph.entities.get(r.sourceId)?.name,
type: r.type,
to: graph.entities.get(r.targetId)?.name,
})));
// Community reports
const reports = [...graph.communityReports.values()];
console.log(reports.map(r => `${r.title} (${r.entities.length} entities)`));
// Statistics
const stats = graphRag.getStats();
console.log(stats);
// {
// entityCount: 47,
// relationshipCount: 63,
// communityCount: 8,
// topEntities: [{ name: 'HazelJS', connections: 12 }, ...],
// entityTypeBreakdown: { TECHNOLOGY: 14, CONCEPT: 12, ... },
// communityBreakdown: { community_0: 7, community_1: 6, ... },
// }
HazelJS integration example
A complete HazelJS service using GraphRAGPipeline:
import { Service, OnModuleInit } from '@hazeljs/core';
import OpenAI from 'openai';
import {
GraphRAGPipeline,
DirectoryLoader,
GitHubLoader,
} from '@hazeljs/rag';
@Service()
export class GraphRAGService implements OnModuleInit {
private pipeline: GraphRAGPipeline;
private built = false;
async onModuleInit() {
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
this.pipeline = new GraphRAGPipeline({
llm: async (prompt) => {
const res = await openai.chat.completions.create({
model: process.env.OPENAI_MODEL ?? 'gpt-4o-mini',
temperature: 0,
messages: [{ role: 'user', content: prompt }],
});
return res.choices[0].message.content ?? '';
},
generateCommunityReports: true,
localSearchDepth: 2,
globalSearchTopK: 5,
});
}
async build(dirPath: string) {
const docs = await new DirectoryLoader({ dirPath, recursive: true }).load();
const stats = await this.pipeline.build(docs);
this.built = true;
return stats;
}
async search(query: string, mode: 'local' | 'global' | 'hybrid' = 'hybrid') {
if (!this.built) throw new Error('Call build() first');
return this.pipeline.search(query, { mode });
}
getStats() {
return this.pipeline.getStats();
}
getGraph() {
return this.pipeline.getGraph();
}
}
GraphRAG vs traditional RAG
| Dimension | Traditional RAG | GraphRAG |
|---|---|---|
| Storage | Flat vector index | Knowledge graph + vector index |
| Retrieval unit | Text chunk (K nearest) | Entity + relationships + community reports |
| Cross-document reasoning | Limited | Native via graph traversal |
| Broad thematic questions | Poor | Excellent via community reports |
| Specific entity questions | Good | Excellent via BFS traversal |
| Setup cost | Low | Medium (LLM extraction pass) |
| Token cost per query | Low | Medium (larger context) |
| Incremental updates | Append and re-embed | Append, merge, re-detect communities |
| Hallucination risk | Medium | Lower (structured facts as grounding) |
| Best use case | Single-domain Q&A | Multi-document knowledge bases, wikis, codebases |
Choosing a search mode
Is the question about a specific named concept, technology, or feature?
→ local search
Is the question asking about patterns, architecture, or the big picture?
→ global search
Not sure?
→ hybrid (the default)
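The decision tree above could be automated with a crude heuristic router. This is purely illustrative and not part of the package:

```typescript
// Crude heuristic: broad/thematic wording suggests global search, a known
// entity name suggests local, anything ambiguous falls back to hybrid.
type SearchMode = 'local' | 'global' | 'hybrid';

function pickMode(query: string, knownEntityNames: string[]): SearchMode {
  const q = query.toLowerCase();
  const thematic = /\b(overall|architecture|patterns?|themes?|main|big picture)\b/.test(q);
  const mentionsEntity = knownEntityNames.some(n => q.includes(n.toLowerCase()));
  if (mentionsEntity && !thematic) return 'local';
  if (thematic && !mentionsEntity) return 'global';
  return 'hybrid'; // ambiguous → the safe default
}
```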
Performance tips
- Right-size extractionChunkSize — 1500–2500 chars is usually optimal. Too small means redundant LLM calls; too large risks the LLM missing entities at the edges.
- Set generateCommunityReports: false during development — extraction alone is fast; summarisation adds one LLM call per community. Turn reports on for production builds.
- Use maxCommunitySize: 10–20 — smaller communities produce more focused reports; larger ones are faster to detect.
- Cache the built graph — serialize graphRag.getGraph() to JSON/Redis and reload it on restart so you don't pay the extraction cost on every deploy.
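The caching tip can be implemented by flattening the graph's Maps to JSON. This sketch assumes the Map-based shape from the inspection section; feeding a snapshot back into a live pipeline would require a load API that isn't documented here:

```typescript
// Flatten the graph's Maps to a JSON string for caching (file, Redis, etc.).
// Assumes the Map-based shape shown in the inspection example; restoring
// into a live pipeline would need a matching load API.
interface GraphSnapshot {
  entities: [string, unknown][];
  relationships: [string, unknown][];
  communityReports: [string, unknown][];
}

function snapshotGraph(graph: {
  entities: Map<string, unknown>;
  relationships: Map<string, unknown>;
  communityReports: Map<string, unknown>;
}): string {
  const snapshot: GraphSnapshot = {
    entities: [...graph.entities.entries()],
    relationships: [...graph.relationships.entries()],
    communityReports: [...graph.communityReports.entries()],
  };
  return JSON.stringify(snapshot);
}

function loadSnapshot(json: string) {
  const s = JSON.parse(json) as GraphSnapshot;
  return {
    entities: new Map(s.entities),
    relationships: new Map(s.relationships),
    communityReports: new Map(s.communityReports),
  };
}
```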
Next steps
- Document Loaders Guide — learn all the ways to feed documents into GraphRAG
- RAG Package Reference — complete API for vector stores, embeddings, and splitters
- Agentic RAG Guide — combine GraphRAG with autonomous retrieval strategies
- Agent Package — orchestrate multiple GraphRAG pipelines with AgentGraph