Quellcode durchsuchen

docs: Replace search quality loop guide with comprehensive language verification framework

Replaces the focused search quality improvement loop with a complete language verification system. The new guide provides a systematic battery of tests (explore, search, call chains, impact analysis, edge extraction, node completeness, and real-world LLM prompts) to verify CodeGraph fully supports a programming language before marking it as production-ready.
Colby McHenry vor 2 Monaten
Ursprung
Commit
b04ee9f9bb
1 geänderte Dateien mit 429 neuen und 82 gelöschten Zeilen
  1. 429 82
      docs/SEARCH_QUALITY_LOOP.md

+ 429 - 82
docs/SEARCH_QUALITY_LOOP.md

@@ -1,45 +1,183 @@
-# CodeGraph Search Quality Loop
+# CodeGraph Language Verification Guide
 
-You are testing and improving CodeGraph's search quality for a specific language. The user will give you a real-world codebase path to test against.
+You are verifying that CodeGraph fully supports a specific programming language. The user will give you a path to a real-world, popular open-source codebase cloned locally. Your job is to run a battery of realistic prompts against it using CodeGraph's API and verify the results are good enough to say that language is **covered and supported**.
 
-## What You're Fixing
+A language is NOT verified until an LLM can reliably use CodeGraph's MCP tools to navigate that codebase — finding the right symbols, understanding call chains, exploring subsystems, and getting useful context for real tasks.
 
-When an LLM queries CodeGraph via MCP tools (`codegraph_search`, `codegraph_explore`, `codegraph_callees`), the results must be relevant. The main failure mode is: methods with common names (like `run`, `get`, `handle`) flood results and bury the actual target. The fix is usually adding `getReceiverType` to the language extractor so methods include their owner type in the FTS-indexed `qualified_name`.
+## Setup
 
-**Example:** Go's `func (sl *scrapeLoop) run()` was indexed as `scrape.go::scrape.go::run`. After adding `getReceiverType`, it became `scrape.go::scrapeLoop::run` — now FTS can rank it above unrelated `run` methods when the query mentions "scrapeLoop".
+### 1. Build and index
 
-## The Loop
+```bash
+npm run build
+rm -rf <codebase_path>/.codegraph
+node dist/bin/codegraph.js init -iv <codebase_path>
+```
+
+The `-iv` flag gives verbose output showing extraction progress, node/edge counts, and timing.
+
+### 2. Quick sanity check
+
+```bash
+# Verify nodes were extracted with proper qualified names
+sqlite3 <codebase_path>/.codegraph/codegraph.db \
+  "SELECT name, kind, qualified_name FROM nodes WHERE kind = 'method' LIMIT 10;"
+
+# GOOD: file.go::StructName::method_name  (owner type present)
+# BAD:  file.go::file.go::method_name     (owner type missing — needs getReceiverType)
+
+# Check edge counts
+sqlite3 <codebase_path>/.codegraph/codegraph.db \
+  "SELECT kind, COUNT(*) FROM edges GROUP BY kind ORDER BY COUNT(*) DESC;"
+
+# Check node kind distribution
+sqlite3 <codebase_path>/.codegraph/codegraph.db \
+  "SELECT kind, COUNT(*) FROM nodes GROUP BY kind ORDER BY COUNT(*) DESC;"
+```
+
+If methods are missing their owner type in `qualified_name`, fix that first (see [Adding getReceiverType](#adding-getreceivertype)) before proceeding with the full test battery.
+
+## The Test Battery
 
-### 1. Pick a test query
+Run **all** of the following test categories against the codebase. Use the Node.js API directly — the test scripts below are templates. Adapt the queries to match real types, methods, and subsystems in the codebase you're testing.
 
-Choose a query that exercises the language's method-on-type pattern. Good queries mention:
-- A specific type/class/struct name
-- A method on that type
-- A broader topic connecting multiple files
+**Pass criteria for each test:** Does the result give an LLM enough correct information to answer the question or complete the task? Would you trust these results if you were the LLM?
 
-Example for Go: `"scrapeLoop run scrape lifecycle TSDB storage"`
+---
 
-### 2. Index the codebase
+### Test 1: `codegraph_explore` — Deep Exploration (MOST IMPORTANT)
+
+This is the primary tool LLMs use. It must return relevant source code grouped by file, with correct relationships, for a natural language query. Test it with **at least 5 different query types**:
 
 ```bash
-rm -rf <codebase_path>/.codegraph
-node dist/bin/codegraph.js init -iv <codebase_path>
+node -e "
+const { CodeGraph } = require('./dist/index.js');
+async function test() {
+  const cg = await CodeGraph.open('<codebase_path>');
+
+  const queries = [
+    // A. Subsystem exploration — broad topic, should find the right files and key classes
+    'How does the caching system work?',
+
+    // B. Specific class/type deep dive — should return that class, its methods, and related types
+    'CacheBuilder configuration and build process',
+
+    // C. Cross-cutting concern — should find implementations across multiple files
+    'How are errors handled and propagated?',
+
+    // D. Data flow question — should trace through multiple layers
+    'How does data flow from input to storage?',
+
+    // E. Implementation detail — specific method behavior
+    'How does eviction decide which entries to remove?',
+  ];
+
+  for (const query of queries) {
+    console.log(\`\n========================================\`);
+    console.log(\`QUERY: \${query}\`);
+    console.log(\`========================================\`);
+
+    const subgraph = await cg.findRelevantContext(query, {
+      searchLimit: 8, traversalDepth: 3, maxNodes: 80, minScore: 0.2,
+    });
+
+    // Show entry points — these are what the LLM sees first
+    console.log(\`\nEntry points (\${subgraph.roots.length}):\`);
+    for (const rootId of subgraph.roots.slice(0, 8)) {
+      const node = subgraph.nodes.get(rootId);
+      if (node) console.log(\`  \${node.name} (\${node.kind}) — \${node.filePath}:\${node.startLine}\`);
+    }
+
+    // Show file distribution — are the right files surfacing?
+    const fileGroups = new Map();
+    for (const node of subgraph.nodes.values()) {
+      if (!fileGroups.has(node.filePath)) fileGroups.set(node.filePath, []);
+      fileGroups.get(node.filePath).push(node.name);
+    }
+    console.log(\`\nFiles (\${fileGroups.size}):\`);
+    for (const [file, nodes] of [...fileGroups.entries()].sort((a,b) => b[1].length - a[1].length).slice(0, 8)) {
+      console.log(\`  \${file} (\${nodes.length} symbols): \${nodes.slice(0, 6).join(', ')}\`);
+    }
+
+    // Show edge distribution — are relationships being captured?
+    const edgeKinds = new Map();
+    for (const edge of subgraph.edges) {
+      edgeKinds.set(edge.kind, (edgeKinds.get(edge.kind) || 0) + 1);
+    }
+    console.log(\`\nEdges (\${subgraph.edges.length}):\`);
+    for (const [kind, count] of [...edgeKinds.entries()].sort((a,b) => b - a)) {
+      console.log(\`  \${kind}: \${count}\`);
+    }
+
+    console.log(\`\nTotal: \${subgraph.nodes.size} nodes, \${subgraph.edges.length} edges, \${fileGroups.size} files\`);
+  }
+
+  await cg.close();
+}
+test().catch(console.error);
+"
 ```
 
-The `-iv` flag gives verbose output showing extraction progress, node/edge counts, and timing.
+**What to check for each query:**
+- Do the entry points make sense for the question?
+- Are the right files surfacing (not just test files or unrelated code)?
+- Is there a mix of edge types (calls, contains, extends, implements) — not just `contains`?
+- Does the node count feel right? Too few (<5) means search failed. Too many irrelevant ones means noise.
+
+---
 
-### 3. Check what the DB produced
+### Test 2: `codegraph_search` — Symbol Lookup
+
+Test that searching for specific symbols returns the right results ranked correctly.
 
 ```bash
-# Does the method have its owner type in qualified_name?
-sqlite3 <codebase_path>/.codegraph/codegraph.db \
-  "SELECT name, kind, qualified_name FROM nodes WHERE name = '<method>' AND file_path LIKE '%<file>%';"
+node -e "
+const { CodeGraph } = require('./dist/index.js');
+async function test() {
+  const cg = await CodeGraph.open('<codebase_path>');
+
+  const searches = [
+    // A. Class by name
+    { query: 'CacheBuilder', kinds: ['class'], desc: 'Find a specific class' },
+
+    // B. Method on a specific type (the classic disambiguation test)
+    { query: 'CacheBuilder build', kinds: ['method'], desc: 'Method on specific class' },
+
+    // C. Common method name — should still find relevant ones
+    { query: 'get', kinds: ['method'], desc: 'Common method name' },
+
+    // D. Interface/trait
+    { query: 'Cache', kinds: ['interface'], desc: 'Find an interface' },
+
+    // E. Enum
+    { query: 'Strength', kinds: ['enum'], desc: 'Find an enum' },
+  ];
 
-# GOOD: file.rs::StructName::method_name
-# BAD:  file.rs::file.rs::method_name  ← owner type missing, FTS can't find it
+  for (const s of searches) {
+    console.log(\`\n--- \${s.desc}: \"\${s.query}\" (kinds: \${s.kinds}) ---\`);
+    const results = cg.searchNodes(s.query, { limit: 10, kinds: s.kinds });
+    for (const r of results) {
+      console.log(\`  \${r.score.toFixed(1)} | \${r.node.name} (\${r.node.kind}) | \${r.node.qualifiedName}\`);
+    }
+    if (results.length === 0) console.log('  *** NO RESULTS ***');
+  }
+
+  await cg.close();
+}
+test().catch(console.error);
+"
 ```
 
-### 4. Test search ranking
+**What to check:**
+- Does the target symbol rank in the top 3?
+- For common names like `get`, do the results include qualified names that help disambiguate?
+- Are there zero-result queries? That's a bug.
+
+---
+
+### Test 3: `codegraph_callers` / `codegraph_callees` — Call Chain Tracing
+
+Test that call relationships were extracted correctly.
 
 ```bash
 node -e "
@@ -47,41 +185,237 @@ const { CodeGraph } = require('./dist/index.js');
 async function test() {
   const cg = await CodeGraph.open('<codebase_path>');
 
-  // Does the target method rank #1?
-  console.log('=== searchNodes ===');
-  const results = cg.searchNodes('<OwnerType> <method>', { limit: 10, kinds: ['method'] });
-  for (const r of results) {
-    console.log(\`\${r.score.toFixed(2)} | \${r.node.name} (\${r.node.kind}) | \${r.node.filePath}:\${r.node.startLine}\`);
+  // Pick 3-4 important methods and check their call graphs
+  const symbols = ['build', 'get', 'put', 'invalidate'];
+
+  for (const sym of symbols) {
+    // Find the symbol
+    const results = cg.searchNodes(sym, { limit: 5, kinds: ['method'] });
+    if (results.length === 0) { console.log(\`\${sym}: not found\`); continue; }
+
+    const node = results[0].node;
+    console.log(\`\n--- \${node.name} (\${node.qualifiedName}) ---\`);
+
+    // Check callees (what does it call?)
+    const callees = cg.getCallees(node.id);
+    console.log(\`  Callees (\${callees.length}): \${callees.slice(0, 10).map(c => c.node.name).join(', ')}\`);
+
+    // Check callers (what calls it?)
+    const callers = cg.getCallers(node.id);
+    console.log(\`  Callers (\${callers.length}): \${callers.slice(0, 10).map(c => c.node.name).join(', ')}\`);
   }
 
-  // Does explore find the right file?
-  console.log('\n=== findRelevantContext ===');
-  const subgraph = await cg.findRelevantContext('<your natural language query>', {
-    searchLimit: 8, traversalDepth: 3, maxNodes: 80, minScore: 0.2,
-  });
-  const fileGroups = new Map();
-  for (const node of subgraph.nodes.values()) {
-    if (!fileGroups.has(node.filePath)) fileGroups.set(node.filePath, []);
-    fileGroups.get(node.filePath).push(node.name);
+  await cg.close();
+}
+test().catch(console.error);
+"
+```
+
+**What to check:**
+- Do methods have callers AND callees? If a method has 0 of both, edge extraction may be broken.
+- Do the callers/callees make sense? A `build()` method should call constructor-like things, and be called by setup/initialization code.
+- Are the counts reasonable? A core method in a popular codebase should have multiple callers.
+
+---
+
+### Test 4: `codegraph_impact` — Change Impact Analysis
+
+Test that the impact radius correctly identifies affected code.
+
+```bash
+node -e "
+const { CodeGraph } = require('./dist/index.js');
+async function test() {
+  const cg = await CodeGraph.open('<codebase_path>');
+
+  // Pick a core class or interface that many things depend on
+  const results = cg.searchNodes('<CoreClass>', { limit: 1, kinds: ['class', 'interface'] });
+  if (results.length === 0) { console.log('Not found'); return; }
+
+  const node = results[0].node;
+  console.log(\`Impact analysis for: \${node.name} (\${node.kind}) — \${node.filePath}\`);
+
+  const impact = cg.getImpactRadius(node.id, { depth: 2 });
+  console.log(\`\nAffected nodes: \${impact.nodes.size}\`);
+  console.log(\`Affected edges: \${impact.edges.length}\`);
+
+  // Group by file
+  const files = new Map();
+  for (const n of impact.nodes.values()) {
+    if (!files.has(n.filePath)) files.set(n.filePath, []);
+    files.get(n.filePath).push(n.name);
   }
-  console.log('Entry points:');
-  for (const rootId of subgraph.roots.slice(0, 8)) {
-    const node = subgraph.nodes.get(rootId);
-    if (node) console.log(\`  \${node.name} (\${node.kind}) - \${node.filePath}:\${node.startLine}\`);
+  console.log(\`Affected files: \${files.size}\`);
+  for (const [file, nodes] of [...files.entries()].sort((a,b) => b[1].length - a[1].length).slice(0, 10)) {
+    console.log(\`  \${file}: \${nodes.slice(0, 5).join(', ')}\`);
   }
-  console.log('Top files:');
-  for (const [file, nodes] of [...fileGroups.entries()].sort((a,b) => b[1].length - a[1].length).slice(0, 5)) {
-    console.log(\`  \${file} (\${nodes.length}): \${nodes.slice(0, 5).join(', ')}\`);
+
+  await cg.close();
+}
+test().catch(console.error);
+"
+```
+
+**What to check:**
+- Does changing a core interface/class show a wide impact radius?
+- Are the affected files reasonable (things that import/extend/use it)?
+- Is the impact radius non-empty? Zero impact on a core type means edges are missing.
+
+---
+
+### Test 5: Edge Extraction Quality
+
+Directly verify that the major edge types are being extracted for this language.
+
+```bash
+node -e "
+const { CodeGraph } = require('./dist/index.js');
+async function test() {
+  const cg = await CodeGraph.open('<codebase_path>');
+
+  // Check overall edge distribution
+  console.log('=== Edge distribution ===');
+  // (Use sqlite3 query from sanity check above)
+
+  // Find a class that extends another
+  const classes = cg.searchNodes('', { limit: 100, kinds: ['class'] });
+  let foundExtends = false, foundImplements = false;
+  for (const r of classes) {
+    const callees = cg.getCallees(r.node.id);
+    // getCallees returns all outgoing edges, check for extends/implements
+    // Better: use graph traversal
   }
 
-  // Does qualified lookup resolve correctly?
-  console.log('\n=== qualified lookup ===');
-  const qr = cg.searchNodes('<OwnerType>.<method>', { limit: 50 });
-  const exact = qr.filter(r => r.node.qualifiedName.includes('<OwnerType>::<method>'));
-  console.log(\`\${exact.length} match(es) for <OwnerType>.<method>\`);
-  if (exact[0]) {
-    const callees = cg.getCallees(exact[0].node.id);
-    console.log('Callees:', callees.map(c => c.node.name).join(', '));
+  // Verify specific relationship types exist
+  const checks = [
+    { desc: 'contains edges (class → method)', query: 'SELECT COUNT(*) FROM edges WHERE kind = \"contains\"' },
+    { desc: 'calls edges', query: 'SELECT COUNT(*) FROM edges WHERE kind = \"calls\"' },
+    { desc: 'imports edges', query: 'SELECT COUNT(*) FROM edges WHERE kind = \"imports\"' },
+    { desc: 'extends edges', query: 'SELECT COUNT(*) FROM edges WHERE kind = \"extends\"' },
+    { desc: 'implements edges', query: 'SELECT COUNT(*) FROM edges WHERE kind = \"implements\"' },
+  ];
+  // Run these via sqlite3 (shown in sanity check section)
+
+  await cg.close();
+}
+test().catch(console.error);
+"
+```
+
+```bash
+sqlite3 <codebase_path>/.codegraph/codegraph.db "
+  SELECT kind, COUNT(*) as cnt FROM edges GROUP BY kind ORDER BY cnt DESC;
+"
+```
+
+**What to check:**
+- `contains` should be the most common (structural hierarchy).
+- `calls` should be plentiful — if near zero, call extraction is broken for this language.
+- `imports` should exist — if zero, import parsing is broken.
+- `extends` and `implements` should exist if the language has inheritance — if zero, `extractInheritance()` may not handle this language's AST.
+
+---
+
+### Test 6: Node Extraction Completeness
+
+Verify all expected node kinds are being extracted.
+
+```bash
+sqlite3 <codebase_path>/.codegraph/codegraph.db "
+  SELECT kind, COUNT(*) as cnt FROM nodes GROUP BY kind ORDER BY cnt DESC;
+"
+```
+
+**What to check for each language:**
+
+| Node Kind | Expected? | Notes |
+|-----------|-----------|-------|
+| `file` | Always | One per source file |
+| `class` | If language has classes | |
+| `method` | If language has methods | Should include owner type in `qualified_name` |
+| `function` | If language has top-level functions | |
+| `interface` | If language has interfaces/protocols | |
+| `enum` | If language has enums | |
+| `enum_member` | If language has enums | Values inside enums |
+| `import` | Always | One per import statement |
+| `variable` / `field` | Usually | Fields, constants, top-level vars |
+| `struct` | If language has structs | Go, Rust, C, Swift |
+| `trait` | If language has traits | Rust |
+
+If an expected node kind has 0 count, the language extractor is missing that AST type.
+
+---
+
+### Test 7: Real-World LLM Prompts
+
+This is the final and most important test. Simulate the kinds of questions a developer would actually ask an LLM that's using CodeGraph. For each prompt, run `findRelevantContext` (which powers `codegraph_explore`) and evaluate whether the returned context would let an LLM give a correct, complete answer.
+
+**Run at least 5 of these prompt styles, adapted to the actual codebase:**
+
+```bash
+node -e "
+const { CodeGraph } = require('./dist/index.js');
+async function test() {
+  const cg = await CodeGraph.open('<codebase_path>');
+
+  const prompts = [
+    // 1. \"How does X work?\" — subsystem understanding
+    'How does the cache eviction policy work?',
+
+    // 2. \"Where is X implemented?\" — symbol location
+    'Where is the LRU eviction logic implemented?',
+
+    // 3. \"What calls X?\" — usage discovery
+    'What code triggers cache invalidation?',
+
+    // 4. \"I want to change X, what breaks?\" — impact assessment
+    'If I change the Cache interface, what else is affected?',
+
+    // 5. \"How do X and Y interact?\" — cross-component relationships
+    'How does CacheBuilder connect to LocalCache?',
+
+    // 6. \"Show me the flow from A to B\" — data/control flow
+    'What happens when a cache entry expires?',
+
+    // 7. \"What are all the implementations of X?\" — polymorphism
+    'What classes implement the Cache interface?',
+
+    // 8. Bug investigation prompt
+    'Cache entries are not being evicted when they should be — where should I look?',
+  ];
+
+  for (const prompt of prompts) {
+    console.log(\`\n========================================\`);
+    console.log(\`PROMPT: \${prompt}\`);
+    console.log(\`========================================\`);
+
+    const subgraph = await cg.findRelevantContext(prompt, {
+      searchLimit: 8, traversalDepth: 3, maxNodes: 80, minScore: 0.2,
+    });
+
+    console.log(\`Result: \${subgraph.nodes.size} nodes, \${subgraph.edges.length} edges, \${subgraph.roots.length} entry points\`);
+
+    console.log('Entry points:');
+    for (const rootId of subgraph.roots.slice(0, 5)) {
+      const node = subgraph.nodes.get(rootId);
+      if (node) console.log(\`  \${node.name} (\${node.kind}) — \${node.filePath}:\${node.startLine}\`);
+    }
+
+    const fileGroups = new Map();
+    for (const node of subgraph.nodes.values()) {
+      if (!fileGroups.has(node.filePath)) fileGroups.set(node.filePath, []);
+      fileGroups.get(node.filePath).push(node.name);
+    }
+    console.log('Top files:');
+    for (const [file, nodes] of [...fileGroups.entries()].sort((a,b) => b[1].length - a[1].length).slice(0, 5)) {
+      console.log(\`  \${file} (\${nodes.length}): \${nodes.slice(0, 5).join(', ')}\`);
+    }
+
+    // PASS/FAIL judgment
+    const hasEntryPoints = subgraph.roots.length > 0;
+    const hasEdges = subgraph.edges.length > 0;
+    const hasMultipleFiles = fileGroups.size > 1;
+    console.log(\`\\nVERDICT: \${hasEntryPoints && hasEdges && hasMultipleFiles ? 'PASS' : 'FAIL — needs investigation'}\`);
   }
 
   await cg.close();
@@ -90,36 +424,46 @@ test().catch(console.error);
 "
 ```
 
-### 5. If results are bad, diagnose and fix
+**What to check for each prompt:**
+- Does it return entry points? Zero entry points = total failure.
+- Are the entry points **relevant** to the question? (Not just random symbols that happen to share a word.)
+- Does it span multiple files? Most real questions involve cross-file understanding.
+- Are relationships present? An LLM needs to understand how symbols connect, not just a list of names.
+- Would **you** be able to answer the question from this context?
+
+---
 
-| Symptom | Cause | Fix |
-|---------|-------|-----|
-| Target method not in top 10 of `searchNodes` | Owner type missing from `qualified_name` | Add `getReceiverType` to `src/extraction/languages/<lang>.ts` |
-| Explore returns irrelevant files | Common method name flooding exact matches | Check co-location boost in `src/db/queries.ts: findNodesByExactName` |
-| A key term is being dropped from search | It's in the STOP_WORDS list | Edit `src/search/query-utils.ts` |
-| `<OwnerType>.<method>` returns "not found" | `qualified_name` doesn't contain `OwnerType::method` | Fix `getReceiverType` output |
+## Diagnosing Failures
 
-### 6. Rebuild and re-test
+| Symptom | Likely Cause | Where to Fix |
+|---------|-------------|--------------|
+| Method missing owner type in `qualified_name` | Language needs `getReceiverType` | `src/extraction/languages/<lang>.ts` |
+| `codegraph_explore` returns irrelevant files | Common names flooding FTS; co-location boost not helping | `src/db/queries.ts: findNodesByExactName`, `src/context/index.ts` |
+| Zero `calls` edges | `callTypes` missing or wrong AST node type | `src/extraction/languages/<lang>.ts: callTypes` |
+| Zero `extends`/`implements` edges | `extractInheritance()` doesn't handle this language's AST | `src/extraction/tree-sitter.ts: extractInheritance()` |
+| Missing node kinds (no enums, no interfaces) | AST type not listed in extractor | `src/extraction/languages/<lang>.ts: enumTypes`, `interfaceTypes`, etc. |
+| Search term dropped from query | Term is in the stop words list | `src/search/query-utils.ts: STOP_WORDS` |
+| `qualified_name` missing class for nested methods | Extraction not walking parent stack correctly | `src/extraction/tree-sitter.ts: visitNode()` |
+| Import edges missing | `extractImport` returns null for this syntax | `src/extraction/languages/<lang>.ts: extractImport` |
+
+## After Fixing Issues
 
 ```bash
 npm run build
-# If you changed extraction (getReceiverType), must re-index:
 rm -rf <codebase_path>/.codegraph
 node dist/bin/codegraph.js init -iv <codebase_path>
-# Then re-run Step 4
+# Re-run the failing tests from above
 ```
 
-### 7. Run the test suite before finishing
+Always run the full test suite before marking a language as verified:
 
 ```bash
 npm test
 ```
 
-All 378+ tests must pass.
-
-## How to Add `getReceiverType` for a Language
+## Adding `getReceiverType`
 
-**Only needed for languages where methods are top-level or outside their owner type in the AST.** If the language nests methods inside class/struct bodies (Python, Java, TypeScript, C#), the qualified name already includes the parent — verify with Step 3 before adding anything.
+**Only needed for languages where methods are top-level or outside their owner type in the AST.** If the language nests methods inside class/struct bodies (Python, Java, TypeScript, C#), the qualified name already includes the parent — verify with the sanity check before adding anything.
 
 ### 1. Add the hook to the language extractor
 
@@ -129,7 +473,7 @@ In `src/extraction/languages/<lang>.ts`, add `getReceiverType` to the extractor
 getReceiverType: (node, source) => {
   // Extract the owner type name from the method's AST node.
   // Return the type name string, or undefined if not applicable.
-  // 
+  //
   // The core extractMethod() in tree-sitter.ts will use this to set:
   //   qualifiedName = `${filePath}::${receiverType}::${methodName}`
 },
@@ -163,29 +507,32 @@ if (receiverType) {
 
 | File | Role |
 |------|------|
-| `src/extraction/languages/<lang>.ts` | Language extractor — implement `getReceiverType` here |
-| `src/extraction/tree-sitter.ts` | Core extraction — `extractMethod()` uses the hook |
+| `src/extraction/languages/<lang>.ts` | Language extractor — node types, call types, `getReceiverType` |
+| `src/extraction/tree-sitter.ts` | Core extraction — `extractMethod()`, `extractCall()`, `extractInheritance()` |
 | `src/extraction/tree-sitter-types.ts` | `LanguageExtractor` interface definition |
 | `src/search/query-utils.ts` | `STOP_WORDS`, `extractSearchTerms`, `scorePathRelevance` |
-| `src/db/queries.ts` | `searchNodesFTS` (BM25), `findNodesByExactName` (co-location) |
-| `src/context/index.ts` | `findRelevantContext` — hybrid search + co-location boost |
-| `src/mcp/tools.ts` | MCP handlers — `matchesSymbol` uses `qualifiedName.includes("Type::method")` |
+| `src/db/queries.ts` | `searchNodesFTS` (BM25), `findNodesByExactName` (co-location boost) |
+| `src/context/index.ts` | `findRelevantContext` — hybrid search + graph traversal |
+| `src/mcp/tools.ts` | MCP tool handlers — `codegraph_explore` implementation |
 
-## Languages Completed
+## Language Status
+
+### Verified
 
 - [x] **Go** — `getReceiverType` extracts receiver from `func (sl *Type) method()`
-- [x] **Swift** — NOT needed. Tree-sitter parses `extension Type { }` as `class_declaration`, so methods already get owner type in `qualified_name` (e.g., `SimplifyApply.swift::SimplifyApply.swift::ApplyInst::simplify`)
+- [x] **Swift** — NOT needed. Tree-sitter nests methods inside class/extension bodies
+- [x] **Java** — NOT needed. Methods nested in class body. Verified against Guava
 
-## Languages To Do
+### Needs Verification
 
-Check these — only add `getReceiverType` if methods are top-level (not nested inside their owner type in the AST):
+Check these — may need `getReceiverType` if methods are top-level in the AST:
 
 - [ ] Rust — methods in `impl Type { }` blocks
 - [ ] C++ — out-of-class method definitions `Type::method()`
 - [ ] Kotlin — extension functions `fun Type.method()`
 
-Verify these DON'T need it (methods nested in class body → qualified name should already be correct):
-- [ ] Python — verify `qualified_name` includes class name
-- [ ] Java — verify `qualified_name` includes class name
-- [ ] TypeScript — verify `qualified_name` includes class name
-- [ ] C# — verify `qualified_name` includes class name
+Verify these DON'T need `getReceiverType` (methods nested in class body):
+
+- [ ] Python
+- [ ] TypeScript
+- [ ] C#