GraphRAG — Knowledge Graph Retrieval

GraphRAG is a next-generation retrieval strategy that builds a knowledge graph of entities and relationships from your documents, then uses that graph — instead of raw text chunks — to answer questions. It was pioneered by Microsoft Research and is especially powerful for multi-document knowledge bases where flat cosine similarity falls short.

Why GraphRAG?

Traditional RAG retrieves the K most similar text chunks by cosine distance. This works well for narrow, keyword-anchored questions but has well-known failure modes:

  • Fragmented context — the answer spans many chunks that are individually below the similarity threshold
  • Missing relationships — "how does X relate to Y?" requires traversing a relationship, not matching embeddings
  • Thematic blindness — "what are the main architectural patterns used here?" has no single matching chunk

GraphRAG addresses all three by maintaining a structured knowledge graph alongside the vector index and offering complementary search modes: local, global, and a hybrid of the two.

Architecture

graph TD
  A["Raw Documents"] --> B["Text Chunker"]
  B --> C["EntityExtractor<br/>(LLM calls per chunk)"]
  C -->|"entities"| D["GraphStore<br/>(in-memory knowledge graph)"]
  C -->|"relationships"| D
  D --> E["CommunityDetector<br/>(Label Propagation)"]
  E --> F["CommunitySummarizer<br/>(LLM per community)"]
  F --> G["Community Reports"]

  H["User Query"] --> I{"Search Mode"}
  I -->|"local"| J["Fuzzy Entity Lookup"]
  I -->|"global"| K["Report Ranking by relevance"]
  I -->|"hybrid"| L["Both in parallel"]

  J --> M["BFS Traversal<br/>(K hops)"]
  M --> N["Entity + Relationship Context"]
  K --> O["Top-K Report Summaries"]
  L --> P["Merged Context"]

  N --> Q["LLM Synthesis"]
  O --> Q
  P --> Q
  Q --> R["Answer + Sources"]

  style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
  style D fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
  style E fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
  style F fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
  style G fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
  style R fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff

Pipeline stages

  1. EntityExtractor — the LLM reads each text chunk and returns structured JSON listing entities (with type, description) and directed relationships between them. Results are merged and deduplicated across chunks.
  2. GraphStore — an in-memory knowledge graph. Entities and relationships are stored with their source document IDs so you always know where a fact came from. It supports BFS traversal and fuzzy name lookup (handles plurals, acronyms).
  3. CommunityDetector — runs Label Propagation on the graph to cluster tightly connected entities into communities. Oversized communities are recursively split until every community is under maxCommunitySize.
  4. CommunitySummarizer — the LLM writes a structured report for each community: a title, a narrative summary, and a list of key findings. Reports are the raw material for global search.
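
The extraction output described in stage 1 can be pictured with shapes like the following. These are illustrative sketches, not the package's exported types, which may differ in field names and detail:

```typescript
// Illustrative shapes only; the actual types exported by @hazeljs/rag may differ.
interface ExtractedEntity {
  id: string;
  name: string;
  type: string;                 // e.g. 'TECHNOLOGY', 'CONCEPT'
  description: string;
  sourceDocumentIds: string[];  // provenance: which documents mentioned this entity
}

interface ExtractedRelationship {
  id: string;
  sourceId: string;             // entity ID of the subject
  targetId: string;             // entity ID of the object
  type: string;                 // e.g. 'USES', 'DEPENDS_ON'
  description: string;
}

// A minimal example record of the entity shape
const exampleEntity: ExtractedEntity = {
  id: 'entity_0',
  name: 'HazelJS',
  type: 'TECHNOLOGY',
  description: 'A TypeScript backend framework',
  sourceDocumentIds: ['doc_0'],
};
```

Keeping source document IDs on every record is what lets the pipeline report provenance in search results.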

Installation

GraphRAG is built into @hazeljs/rag — no additional packages needed beyond what the RAG package already requires:

npm install @hazeljs/rag openai

Quick start

import OpenAI from 'openai';
import { GraphRAGPipeline, TextFileLoader } from '@hazeljs/rag';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// 1. Create the pipeline
const graphRag = new GraphRAGPipeline({
  // An async function that takes a prompt string and returns the LLM's text response.
  // You can use any LLM — the pipeline is provider-agnostic.
  llm: async (prompt) => {
    const res = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      temperature: 0,
      messages: [{ role: 'user', content: prompt }],
    });
    return res.choices[0].message.content ?? '';
  },
});

// 2. Load documents
const docs = await new TextFileLoader({ filePath: './docs/readme.txt' }).load();

// 3. Build the graph (extract → detect communities → summarise)
const stats = await graphRag.build(docs);
console.log(stats);
// {
//   documentsProcessed: 1,
//   entitiesExtracted: 23,
//   relationshipsExtracted: 31,
//   communitiesDetected: 4,
//   communityReportsGenerated: 4,
//   duration: 6240,
// }

// 4. Search
const result = await graphRag.search('What is the main purpose of this project?');
console.log(result.answer);

Configuration

All options are optional — the defaults work well for most use cases:

const graphRag = new GraphRAGPipeline({
  llm: myLLMFunction,

  // Extraction
  extractionChunkSize: 2000,     // max chars sent to LLM per extraction call
                                  // smaller = more LLM calls, more precise

  // Community detection
  generateCommunityReports: true, // false skips summarisation (faster, no global search)
  maxCommunitySize: 15,           // communities larger than this are recursively split

  // Search defaults
  localSearchDepth: 2,            // BFS hops from seed entities
  localSearchTopK: 5,             // number of seed entities per query
  globalSearchTopK: 5,            // community reports used in global context
});

Search modes

Local search — entity-centric

Best for specific questions about named concepts, technologies, people, or processes. Local search:

  1. Performs a fuzzy text match to find seed entities whose names appear in or near the query
  2. Runs BFS from each seed entity up to localSearchDepth hops
  3. Assembles entity descriptions and relationship descriptions as context
  4. Sends context + query to the LLM for synthesis

const result = await graphRag.search(
  'How does the dependency injection system work?',
  { mode: 'local' },
);

console.log(result.answer);
// "The dependency injection system uses constructor injection.
//  The IoC container reads TypeScript metadata from @Service() decorators
//  to resolve dependencies at startup..."

// Entities found and traversed
result.entities.forEach(e => {
  console.log(`${e.name} [${e.type}]: ${e.description}`);
});

// Relationships assembled as evidence
result.relationships.forEach(r => {
  console.log(`${r.type}: ${r.description}`);
});

When to use local search: the question contains a specific noun, class name, concept, or technology that is likely to be an entity in the graph.
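
The depth-bounded traversal in step 2 above can be sketched as a plain BFS over an adjacency list. This is an illustrative stand-in for the real GraphStore traversal, which also collects relationship metadata along the way:

```typescript
// Illustrative depth-bounded BFS; the real GraphStore traversal also gathers
// entity and relationship descriptions for the context window.
function bfsWithinDepth(
  adj: Map<string, string[]>,
  seed: string,
  maxDepth: number,
): Set<string> {
  const visited = new Set<string>([seed]);
  let frontier = [seed];
  for (let depth = 0; depth < maxDepth; depth++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const neighbour of adj.get(node) ?? []) {
        if (!visited.has(neighbour)) {
          visited.add(neighbour);
          next.push(neighbour);
        }
      }
    }
    frontier = next; // only newly discovered nodes expand at the next depth
  }
  return visited;
}

// Example: a chain A-B-C-D with depth 2 from seed A reaches A, B, C but not D
const chain = new Map([
  ['A', ['B']], ['B', ['A', 'C']], ['C', ['B', 'D']], ['D', ['C']],
]);
const reached = bfsWithinDepth(chain, 'A', 2);
```

This is why raising localSearchDepth grows the context window quickly: each extra hop adds a whole frontier of entities.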

Global search — community reports

Best for broad, thematic questions that span many parts of the knowledge base. Global search:

  1. Scores each community report by query relevance (keyword overlap + entity proximity)
  2. Selects the top-K reports
  3. Assembles their titles, summaries, and key findings as context
  4. Sends context + query to the LLM for synthesis

const result = await graphRag.search(
  'What are the main architectural layers of this system?',
  { mode: 'global' },
);

// Community reports used as context
result.communities.forEach(c => {
  console.log(`\n=== ${c.title} (importance: ${c.rating}/10) ===`);
  console.log(c.summary);
  c.findings?.forEach(f => console.log(`  • ${f}`));
});

When to use global search: the question asks about patterns, themes, architecture, the overall scope of a domain, or comparisons between many entities.
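
The report ranking in step 1 can be pictured as a simple keyword-overlap score. This is a deliberately naive sketch; the library's actual ranking also weighs entity proximity, per the step description above:

```typescript
// Naive relevance score: fraction of query terms that appear in the report text.
// Illustrative only; the real ranking also factors in entity proximity.
function scoreReport(
  query: string,
  report: { title: string; summary: string; findings: string[] },
): number {
  const queryTerms = new Set(
    query.toLowerCase().split(/\W+/).filter(Boolean),
  );
  const text = [report.title, report.summary, ...report.findings]
    .join(' ')
    .toLowerCase();
  let hits = 0;
  for (const term of queryTerms) {
    if (text.includes(term)) hits++;
  }
  return hits / queryTerms.size;
}

// Example: one of the two query terms matches the report text
const score = scoreReport('architectural layers', {
  title: 'System Architecture',
  summary: 'Describes the layers of the system',
  findings: [],
});
```

The top-K reports by this kind of score become the global context window.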

Hybrid search — recommended default

Runs local and global in parallel, merges both context windows, and makes a single LLM synthesis call. This is the best default — it covers specific entities and broad themes simultaneously at the cost of slightly higher token usage.

// mode: 'hybrid' is the default when mode is omitted
const result = await graphRag.search(
  'What vector stores does this project support and how are they integrated?',
  {
    mode: 'hybrid',        // default
    includeGraph: true,    // include entities + relationships in result
    includeCommunities: true,
  },
);

console.log(`Mode: ${result.mode}, Duration: ${result.duration}ms`);
console.log(`Entities: ${result.entities.length}, Communities: ${result.communities.length}`);
console.log(result.answer);

Entity and relationship types

The EntityExtractor prompts the LLM to classify every extracted concept into a canonical type. This makes the graph consistent and enables type-filtered queries in the future.

Entity types

Type           Description
CONCEPT        Abstract ideas, patterns, approaches
TECHNOLOGY     Software, frameworks, libraries, languages
PERSON         People, roles, personas
ORGANIZATION   Companies, teams, projects
PROCESS        Workflows, procedures, algorithms
FEATURE        Specific capabilities or functions
EVENT          Occurrences, releases, incidents
LOCATION       Physical or logical locations
OTHER          Anything that doesn't fit above

Relationship types

Type           Meaning
USES           X uses Y
IMPLEMENTS     X implements Y (e.g. interface, pattern)
CREATED_BY     X was created/authored by Y
PART_OF        X is a component of Y
DEPENDS_ON     X requires Y
RELATED_TO     General association
EXTENDS        X extends or inherits from Y
CONFIGURES     X configures or controls Y
TRIGGERS       X causes or initiates Y
PRODUCES       X outputs or generates Y
REPLACES       X supersedes Y
OTHER          Anything that doesn't fit above
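
In TypeScript, these canonical vocabularies can be modelled as string-literal unions. This is a sketch; the package may export its own enums or union types for them:

```typescript
// Sketch only; @hazeljs/rag may export its own enums or unions for these.
const ENTITY_TYPES = [
  'CONCEPT', 'TECHNOLOGY', 'PERSON', 'ORGANIZATION', 'PROCESS',
  'FEATURE', 'EVENT', 'LOCATION', 'OTHER',
] as const;
type EntityType = typeof ENTITY_TYPES[number];

const RELATIONSHIP_TYPES = [
  'USES', 'IMPLEMENTS', 'CREATED_BY', 'PART_OF', 'DEPENDS_ON', 'RELATED_TO',
  'EXTENDS', 'CONFIGURES', 'TRIGGERS', 'PRODUCES', 'REPLACES', 'OTHER',
] as const;
type RelationshipType = typeof RELATIONSHIP_TYPES[number];
```

Giving the extraction prompt a closed vocabulary like this keeps the graph consistent across LLM calls; anything the model can't place lands in OTHER.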

Community detection

After extraction, the CommunityDetector runs Label Propagation (LPA) on the undirected entity–relationship graph:

  1. Each entity starts as its own community label
  2. Each entity adopts the most frequent label among its neighbours
  3. Step 2 repeats until labels stabilise (convergence)
  4. Any community larger than maxCommunitySize is split recursively using the same algorithm on the sub-graph

The result is a set of tightly-connected clusters of entities — effectively the "topics" or "modules" of your knowledge base.
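
The loop above can be sketched as a minimal Label Propagation pass. This is an illustrative stand-in for the CommunityDetector, with deterministic lexicographic tie-breaking added so the example is reproducible:

```typescript
// Minimal Label Propagation sketch; not the library's implementation.
type AdjList = Map<string, string[]>; // node -> neighbours (undirected)

function labelPropagation(graph: AdjList, maxIter = 20): Map<string, string> {
  // 1. Every node starts with its own label
  const labels = new Map<string, string>();
  for (const node of graph.keys()) labels.set(node, node);

  for (let i = 0; i < maxIter; i++) {
    let changed = false;
    for (const [node, neighbours] of graph) {
      if (neighbours.length === 0) continue;
      // 2. Count the labels among this node's neighbours
      const counts = new Map<string, number>();
      for (const n of neighbours) {
        const l = labels.get(n)!;
        counts.set(l, (counts.get(l) ?? 0) + 1);
      }
      // Adopt the most frequent label (ties broken lexicographically)
      const best = [...counts.entries()].sort(
        (a, b) => b[1] - a[1] || a[0].localeCompare(b[0]),
      )[0][0];
      if (best !== labels.get(node)) {
        labels.set(node, best);
        changed = true;
      }
    }
    if (!changed) break; // 3. Converged: labels are stable
  }
  return labels;
}

// Example: two disconnected triangles settle into two communities
const g: AdjList = new Map([
  ['A', ['B', 'C']], ['B', ['A', 'C']], ['C', ['A', 'B']],
  ['D', ['E', 'F']], ['E', ['D', 'F']], ['F', ['D', 'E']],
]);
const labels = labelPropagation(g);
const communityCount = new Set(labels.values()).size;
```

Step 4 (recursive splitting of oversized communities) would rerun the same function on each oversized community's sub-graph.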

// After build(), inspect the detected communities
const stats = graphRag.getStats();
console.log(stats.communityCount);           // total communities
console.log(stats.communityBreakdown);       // { 'community_0': 7, 'community_1': 12, ... }
console.log(stats.entityTypeBreakdown);      // { TECHNOLOGY: 14, CONCEPT: 12, FEATURE: 9, ... }

Community reports

Once communities are detected, CommunitySummarizer makes one LLM call per community and generates a structured CommunityReport:

interface CommunityReport {
  communityId: string;
  title: string;           // e.g. "HazelJS Core DI and IoC System"
  summary: string;         // 2–4 sentence narrative
  findings: string[];      // bullet-point key facts
  rating: number;          // 1–10 importance score
  entities: string[];      // entity IDs in this community
}

Reports are cached in the GraphStore and reused on every global search call — they are generated once during build().


Incremental updates

Add new documents without rebuilding the entire graph. The pipeline:

  1. Extracts entities from the new documents
  2. Merges them into the existing graph (deduplicating overlapping entities)
  3. Re-runs community detection on the updated graph
  4. Regenerates community reports

const newDocs = await new WebLoader({ urls: ['https://hazeljs.com/blog/new-post'] }).load();
const updateStats = await graphRag.addDocuments(newDocs);
console.log(`Added ${updateStats.entitiesExtracted} new entities`);
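
Step 2's merge can be pictured as name-keyed deduplication. This is an illustrative sketch, with a hypothetical SimpleEntity shape; the real merge is fuzzier (it also handles plurals and acronyms, as GraphStore's lookup does):

```typescript
// Illustrative merge: dedupe by normalised name, union provenance,
// keep the richer description. The real pipeline's matching is fuzzier.
interface SimpleEntity {
  name: string;
  description: string;
  sourceDocumentIds: string[];
}

function mergeEntities(
  existing: SimpleEntity[],
  incoming: SimpleEntity[],
): SimpleEntity[] {
  const byName = new Map<string, SimpleEntity>();
  for (const e of [...existing, ...incoming]) {
    const key = e.name.trim().toLowerCase();
    const prev = byName.get(key);
    if (!prev) {
      byName.set(key, { ...e });
    } else {
      // Union provenance and keep whichever description says more
      prev.sourceDocumentIds = [
        ...new Set([...prev.sourceDocumentIds, ...e.sourceDocumentIds]),
      ];
      if (e.description.length > prev.description.length) {
        prev.description = e.description;
      }
    }
  }
  return [...byName.values()];
}

// Example: 'HazelJS' and 'hazeljs' collapse into one entity with merged sources
const merged = mergeEntities(
  [{ name: 'HazelJS', description: 'A framework', sourceDocumentIds: ['doc_0'] }],
  [{ name: 'hazeljs', description: 'A TypeScript backend framework', sourceDocumentIds: ['doc_1'] }],
);
```
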

Inspecting the graph

Access the raw graph for visualisation (D3.js, Cytoscape, etc.) or debugging:

const graph = graphRag.getGraph();

// All entities
const entities = [...graph.entities.values()];
console.table(entities.map(e => ({
  name: e.name,
  type: e.type,
  description: e.description.slice(0, 60),
})));

// All relationships
const rels = [...graph.relationships.values()];
console.table(rels.map(r => ({
  from: graph.entities.get(r.sourceId)?.name,
  type: r.type,
  to: graph.entities.get(r.targetId)?.name,
})));

// Community reports
const reports = [...graph.communityReports.values()];
console.log(reports.map(r => `${r.title} (${r.entities.length} entities)`));

// Statistics
const stats = graphRag.getStats();
console.log(stats);
// {
//   entityCount: 47,
//   relationshipCount: 63,
//   communityCount: 8,
//   topEntities: [{ name: 'HazelJS', connections: 12 }, ...],
//   entityTypeBreakdown: { TECHNOLOGY: 14, CONCEPT: 12, ... },
//   communityBreakdown: { community_0: 7, community_1: 6, ... },
// }

HazelJS integration example

A complete HazelJS service using GraphRAGPipeline:

import { Service, OnModuleInit } from '@hazeljs/core';
import OpenAI from 'openai';
import {
  GraphRAGPipeline,
  DirectoryLoader,
  GitHubLoader,
} from '@hazeljs/rag';

@Service()
export class GraphRAGService implements OnModuleInit {
  private pipeline: GraphRAGPipeline;
  private built = false;

  async onModuleInit() {
    const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

    this.pipeline = new GraphRAGPipeline({
      llm: async (prompt) => {
        const res = await openai.chat.completions.create({
          model: process.env.OPENAI_MODEL ?? 'gpt-4o-mini',
          temperature: 0,
          messages: [{ role: 'user', content: prompt }],
        });
        return res.choices[0].message.content ?? '';
      },
      generateCommunityReports: true,
      localSearchDepth: 2,
      globalSearchTopK: 5,
    });
  }

  async build(dirPath: string) {
    const docs = await new DirectoryLoader({ dirPath, recursive: true }).load();
    const stats = await this.pipeline.build(docs);
    this.built = true;
    return stats;
  }

  async search(query: string, mode: 'local' | 'global' | 'hybrid' = 'hybrid') {
    if (!this.built) throw new Error('Call build() first');
    return this.pipeline.search(query, { mode });
  }

  getStats() {
    return this.pipeline.getStats();
  }

  getGraph() {
    return this.pipeline.getGraph();
  }
}

GraphRAG vs traditional RAG

Dimension                  Traditional RAG           GraphRAG
Storage                    Flat vector index         Knowledge graph + vector index
Retrieval unit             Text chunk (K nearest)    Entity + relationships + community reports
Cross-document reasoning   Limited                   Native via graph traversal
Broad thematic questions   Poor                      Excellent via community reports
Specific entity questions  Good                      Excellent via BFS traversal
Setup cost                 Low                       Medium (LLM extraction pass)
Token cost per query       Low                       Medium (larger context)
Incremental updates        Append and re-embed       Append, merge, re-detect communities
Hallucination risk         Medium                    Lower (structured facts as grounding)
Best use case              Single-domain Q&A         Multi-document knowledge bases, wikis, codebases

Choosing a search mode

Is the question about a specific named concept, technology, or feature?
  → local search

Is the question asking about patterns, architecture, or the big picture?
  → global search

Not sure?
  → hybrid (the default)

Performance tips

  • Right-size extractionChunkSize — 1500–2500 chars is usually optimal. Too small means redundant LLM calls; too large risks the LLM missing entities at the edges.
  • Set generateCommunityReports: false during development — extraction alone is fast; summarisation adds one LLM call per community. Turn reports on for production builds.
  • Use maxCommunitySize: 10–20 — smaller communities produce more focused reports; larger ones are faster to detect.
  • Cache the built graph — serialize graphRag.getGraph() to JSON/Redis and reload it on restart so you don't pay the extraction cost on every deploy.
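
The last tip needs a small amount of care because getGraph() returns Maps, which JSON.stringify serialises as empty objects. A round-trip sketch (hypothetical helpers; the package may or may not ship its own serialization):

```typescript
// Hypothetical helpers: convert the graph's Maps to entry arrays for JSON,
// and rebuild them on load. Assumes the Map-based shape shown in
// "Inspecting the graph"; adapt the fields to whatever getGraph() returns.
type SerializableGraph = {
  entities: Map<string, any>;
  relationships: Map<string, any>;
  communityReports: Map<string, any>;
};

function serializeGraph(graph: SerializableGraph): string {
  return JSON.stringify({
    entities: [...graph.entities.entries()],
    relationships: [...graph.relationships.entries()],
    communityReports: [...graph.communityReports.entries()],
  });
}

function deserializeGraph(json: string): SerializableGraph {
  const raw = JSON.parse(json);
  return {
    entities: new Map(raw.entities),
    relationships: new Map(raw.relationships),
    communityReports: new Map(raw.communityReports),
  };
}

// Round-trip example
const restored = deserializeGraph(serializeGraph({
  entities: new Map([['entity_0', { name: 'HazelJS' }]]),
  relationships: new Map(),
  communityReports: new Map(),
}));
```

The resulting string can go to disk or Redis; on restart, rebuild the Maps and hand the graph back to the pipeline instead of re-running extraction.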

Next steps