Преглед на файлове

fix(graph): complete edge sets & correct node limits in traversal (#1086, #1087, #1088, #1089, #1090)

Three root defects in src/graph/traversal.ts (reported by @inth3shadows as #1086–#1090):

- Depth guard returned before visited.add → duplicate callers/callees at maxDepth=1 and getImpact loop disagreement.
- Dedup gate also gated edge collection → traverseBFS dropped a parallel edge; getImpact dropped a direct incoming dependency edge.
- limit checked per-frame not per-add → high-degree node overshot opts.limit in traverseBFS and dfsRecursive.

traverseBFS now collects every distinct edge among kept nodes (deduped on edge identity), enqueues each node once, and caps per-add. getCallers/getCallees/getImpactRecursive mark visited before the depth check; getImpactRecursive records the incoming edge unconditionally and unifies its loops on visited. 7 regression tests in graph.test.ts, each failing on the pre-fix code.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby Mchenry преди 1 ден
родител
ревизия
43a6fa68f6
променени са 3 файла, в които са добавени 204 реда и са изтрити 18 реда
  1. 1 0
      CHANGELOG.md
  2. 127 0
      __tests__/graph.test.ts
  3. 76 18
      src/graph/traversal.ts

+ 1 - 0
CHANGELOG.md

@@ -19,6 +19,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - The same function-name recovery now covers inline macros from common third-party C++ libraries, not just Unreal Engine — including pugixml (`PUGI__FN`, `PUGIXML_FUNCTION`), Godot (`_FORCE_INLINE_`), Boost (`BOOST_FORCEINLINE`), and generic `ALWAYS_INLINE` / `FORCE_INLINE`. Functions decorated with these are now indexed under their real names. On a large Unreal project vendoring these libraries this cleaned up the large majority of remaining function-name garbling. (#1101)
 - C++ function names are now recovered even when decorated with a macro CodeGraph doesn't specifically know about. A function written `SOME_LIBRARY_MACRO ReturnType doWork(...)` previously had the macro or return type absorbed into its name whenever the macro wasn't one CodeGraph recognized; now the real name (`doWork`) is recovered regardless of the macro, so it's findable and its callers link — no per-library configuration needed. The recognized-macro list was also broadened (Qt, Folly, Abseil, LLVM, V8, Eigen, rapidjson) so those additionally capture the return type. This only ever cleans up an already-garbled name and is limited to C and C++, so ordinary names — and languages like Kotlin and Scala where identifiers can legitimately contain spaces — are unaffected. (#1102)
 - The set of C++ libraries whose macros are recognized for full return-type recovery was expanded well beyond Unreal Engine — now spanning Mozilla, Protobuf, {fmt}, nlohmann/json, GLM, Bullet, Skia, OpenCV, EASTL, Cocos2d-x, GLib, SQLite, and the common Windows calling conventions (so `HRESULT WINAPI CreateThing(...)` indexes as `CreateThing` returning `HRESULT`). Functions from libraries not on the list still get their name recovered automatically; being listed additionally recovers the return type. (#1103)
+- Graph traversal and blast-radius results no longer drop or miscount relationships in a handful of edge cases. When a symbol could be reached by more than one path, an impact/blast-radius query could leave out a direct dependency between two symbols that were already linked another way; separately, the lower-level graph traversal used by the library API could keep only one of several relationships between the same pair of symbols (for example a symbol that both calls and references another), count a caller reached through two different call sites twice, or return slightly more results than the requested size limit on a very highly-connected symbol. These were long-standing and mostly masked by later de-duplication, so day-to-day query results were largely unaffected, but the traversal now returns the complete, correctly-bounded set. Thanks @inth3shadows for the precise, individually-traced reports. (#1086, #1087, #1088, #1089, #1090)
 
 
 ## [1.1.6] - 2026-06-30

+ 127 - 0
__tests__/graph.test.ts

@@ -10,6 +10,7 @@ import * as path from 'path';
 import * as os from 'os';
 import CodeGraph from '../src/index';
 import { Node, Edge } from '../src/types';
+import { GraphTraverser } from '../src/graph/traversal';
 
 describe('Graph Queries', () => {
   let testDir: string;
@@ -486,3 +487,129 @@ export { main };
     });
   });
 });
+
+// =============================================================================
+// Traversal edge-completeness & node-limit regressions (#1086–#1090)
+//
+// These drive GraphTraverser directly against an in-memory graph (the same
+// approach the reporter used), so the exact parallel-edge / high-degree
+// topologies can be constructed deterministically without round-tripping
+// through extraction.
+// =============================================================================
+
+/** Minimal Node stub — the traversal code only reads id/kind/name. */
+function tNode(id: string, kind: Node['kind'] = 'function'): Node {
+  return {
+    id,
+    kind,
+    name: id,
+    qualifiedName: id,
+    filePath: `src/${id}.ts`,
+    language: 'typescript',
+    startLine: 1,
+    endLine: 10,
+    startColumn: 0,
+    endColumn: 0,
+  } as unknown as Node;
+}
+
+/** Build a GraphTraverser over a fixed node/edge set, honoring the `kinds` filter. */
+function tGraph(nodes: Node[], edges: Edge[]): GraphTraverser {
+  const byId = new Map(nodes.map((n) => [n.id, n]));
+  const q = {
+    getNodeById: (id: string) => byId.get(id) ?? null,
+    getNodesByIds: (ids: readonly string[]) => {
+      const m = new Map<string, Node>();
+      for (const id of ids) {
+        const n = byId.get(id);
+        if (n) m.set(id, n);
+      }
+      return m;
+    },
+    getOutgoingEdges: (source: string, kinds?: string[]) =>
+      edges.filter((e) => e.source === source && (!kinds || kinds.includes(e.kind))),
+    getIncomingEdges: (target: string, kinds?: string[]) =>
+      edges.filter((e) => e.target === target && (!kinds || kinds.includes(e.kind))),
+  };
+  return new GraphTraverser(q as never);
+}
+
+describe('Traversal edge-completeness & limits (#1086–#1090)', () => {
+  it('traverseBFS keeps every parallel edge to the same target (#1090)', () => {
+    // A reaches B via both `calls` and `references` — two distinct edges.
+    const edges: Edge[] = [
+      { source: 'A', target: 'B', kind: 'calls', line: 1 },
+      { source: 'A', target: 'B', kind: 'references', line: 2 },
+    ];
+    const sub = tGraph([tNode('A'), tNode('B')], edges).traverseBFS('A', { direction: 'outgoing' });
+
+    const ab = sub.edges.filter((e) => e.source === 'A' && e.target === 'B');
+    // Pre-fix: only the higher-priority `calls` edge survived; `references` was dropped.
+    expect(ab.map((e) => e.kind).sort()).toEqual(['calls', 'references']);
+    expect(sub.nodes.has('B')).toBe(true);
+  });
+
+  it('traverseBFS keeps two same-kind edges on different lines (#1090)', () => {
+    const edges: Edge[] = [
+      { source: 'A', target: 'B', kind: 'calls', line: 3 },
+      { source: 'A', target: 'B', kind: 'calls', line: 7 },
+    ];
+    const sub = tGraph([tNode('A'), tNode('B')], edges).traverseBFS('A', { direction: 'outgoing' });
+    expect(sub.edges.filter((e) => e.source === 'A' && e.target === 'B')).toHaveLength(2);
+  });
+
+  it('traverseBFS does not overshoot opts.limit on a high-degree node (#1087)', () => {
+    const neighbors = ['B', 'C', 'D', 'E', 'F'];
+    const nodes = [tNode('A'), ...neighbors.map((n) => tNode(n))];
+    const edges: Edge[] = neighbors.map((n) => ({ source: 'A', target: n, kind: 'calls' as const }));
+    const sub = tGraph(nodes, edges).traverseBFS('A', { limit: 3, direction: 'outgoing' });
+    // Pre-fix: all 5 neighbors were added in one pass → 6 nodes despite limit 3.
+    expect(sub.nodes.size).toBeLessThanOrEqual(3);
+  });
+
+  it('traverseDFS does not overshoot opts.limit on a high-degree node (#1088)', () => {
+    const neighbors = ['B', 'C', 'D', 'E', 'F'];
+    const nodes = [tNode('A'), ...neighbors.map((n) => tNode(n))];
+    const edges: Edge[] = neighbors.map((n) => ({ source: 'A', target: n, kind: 'calls' as const }));
+    const sub = tGraph(nodes, edges).traverseDFS('A', { limit: 2, direction: 'outgoing' });
+    expect(sub.nodes.size).toBeLessThanOrEqual(2);
+  });
+
+  it('getCallers returns each caller once when reached via multiple edges (#1086)', () => {
+    // Y calls X at two sites and also references it — three incoming edges.
+    const edges: Edge[] = [
+      { source: 'Y', target: 'X', kind: 'calls', line: 1 },
+      { source: 'Y', target: 'X', kind: 'calls', line: 2 },
+      { source: 'Y', target: 'X', kind: 'references', line: 3 },
+    ];
+    const callers = tGraph([tNode('X'), tNode('Y')], edges).getCallers('X'); // default maxDepth = 1
+    // Pre-fix: Y appeared three times (depth guard returned before visited.add).
+    expect(callers.map((c) => c.node.id)).toEqual(['Y']);
+  });
+
+  it('getCallees returns each callee once when reached via multiple edges (#1086)', () => {
+    const edges: Edge[] = [
+      { source: 'X', target: 'Y', kind: 'calls', line: 1 },
+      { source: 'X', target: 'Y', kind: 'calls', line: 2 },
+    ];
+    const callees = tGraph([tNode('X'), tNode('Y')], edges).getCallees('X');
+    expect(callees.map((c) => c.node.id)).toEqual(['Y']);
+  });
+
+  it('getImpactRadius keeps a direct edge into a node already collected via another path (#1089)', () => {
+    // Class P contains method M. Q calls both M and P. Reaching M first collects
+    // Q; the pre-fix `!nodes.has()` gate then dropped the direct Q→P edge.
+    const nodes = [tNode('P', 'class'), tNode('M', 'method'), tNode('Q')];
+    const edges: Edge[] = [
+      { source: 'P', target: 'M', kind: 'contains' },
+      { source: 'Q', target: 'M', kind: 'calls', line: 1 },
+      { source: 'Q', target: 'P', kind: 'calls', line: 2 },
+    ];
+    const sub = tGraph(nodes, edges).getImpactRadius('P', 2);
+
+    expect(sub.nodes.has('Q')).toBe(true);
+    expect(sub.edges.some((e) => e.source === 'Q' && e.target === 'M' && e.kind === 'calls')).toBe(true);
+    // The regression: this direct dependency edge used to vanish.
+    expect(sub.edges.some((e) => e.source === 'Q' && e.target === 'P' && e.kind === 'calls')).toBe(true);
+  });
+});

+ 76 - 18
src/graph/traversal.ts

@@ -56,6 +56,19 @@ export class GraphTraverser {
     const nodes = new Map<string, Node>();
     const edges: Edge[] = [];
     const visited = new Set<string>();
+    // Enqueue-once guard, tracked separately from `visited` (which is only set
+    // on dequeue). Guarding the enqueue on `visited` alone let a target
+    // reachable via two edges get queued twice; the second dequeue then hit
+    // `visited.has → continue` and its edge was never recorded, so parallel
+    // edges (A calls AND references B, or two `calls` on different lines — edges
+    // are unique on source+target+kind+line+col) went missing from the result
+    // (#1090). `enqueued` makes each node queued exactly once.
+    const enqueued = new Set<string>([startNode.id]);
+    // Edge-identity dedup so a `direction:'both'` scan — which encounters A→B
+    // from both endpoints — records each edge once.
+    const seenEdges = new Set<string>();
+    const edgeKey = (e: Edge) =>
+      `${e.source}|${e.target}|${e.kind}|${e.line ?? -1}|${e.column ?? -1}`;
     const queue: TraversalStep[] = [{ node: startNode, edge: null, depth: 0 }];
 
     if (opts.includeStart) {
@@ -64,18 +77,13 @@ export class GraphTraverser {
 
     while (queue.length > 0 && nodes.size < opts.limit) {
       const step = queue.shift()!;
-      const { node, edge, depth } = step;
+      const { node, depth } = step;
 
       if (visited.has(node.id)) {
         continue;
       }
       visited.add(node.id);
 
-      // Add edge to result
-      if (edge) {
-        edges.push(edge);
-      }
-
       // Check depth limit
       if (depth >= opts.maxDepth) {
         continue;
@@ -90,25 +98,42 @@ export class GraphTraverser {
         return priority(a) - priority(b);
       });
 
-      // Batch-fetch the unvisited neighbors in one query (was N+1 per BFS step).
+      // Batch-fetch neighbors we might newly enqueue in one query (was N+1 per
+      // BFS step). Already-queued/visited neighbors are already in `nodes`, so
+      // they don't need re-fetching to record an edge back to them.
       const wantIds = adjacentEdges
         .map((e) => (e.source === node.id ? e.target : e.source))
-        .filter((id) => !visited.has(id));
+        .filter((id) => !visited.has(id) && !enqueued.has(id));
       const neighborNodes = wantIds.length > 0 ? this.queries.getNodesByIds(wantIds) : new Map();
 
       for (const adjEdge of adjacentEdges) {
         const nextNodeId = adjEdge.source === node.id ? adjEdge.target : adjEdge.source;
-        if (visited.has(nextNodeId)) continue;
-
-        const nextNode = neighborNodes.get(nextNodeId);
+        const nextNode = neighborNodes.get(nextNodeId) ?? nodes.get(nextNodeId);
         if (!nextNode) continue;
 
         if (opts.nodeKinds && opts.nodeKinds.length > 0 && !opts.nodeKinds.includes(nextNode.kind)) {
           continue;
         }
 
-        nodes.set(nextNode.id, nextNode);
-        queue.push({ node: nextNode, edge: adjEdge, depth: depth + 1 });
+        // Enqueue each neighbor exactly once, and only while under the node
+        // budget — the cap is checked per-add here, not just on the outer
+        // `while`, so one high-degree node can't overshoot `opts.limit` (#1087).
+        if (!visited.has(nextNodeId) && !enqueued.has(nextNodeId)) {
+          if (nodes.size >= opts.limit) continue;
+          enqueued.add(nextNodeId);
+          nodes.set(nextNode.id, nextNode);
+          queue.push({ node: nextNode, edge: adjEdge, depth: depth + 1 });
+        }
+
+        // Record every distinct edge among kept nodes. Collecting on the
+        // adjacency scan (rather than once per dequeue) is what preserves
+        // parallel edges to the same target (#1090); `nextNode` is guaranteed
+        // to be in `nodes` at this point (just added, or already in-set).
+        const ek = edgeKey(adjEdge);
+        if (!seenEdges.has(ek)) {
+          seenEdges.add(ek);
+          edges.push(adjEdge);
+        }
       }
     }
 
@@ -178,6 +203,12 @@ export class GraphTraverser {
     const neighborNodes = wantIds.length > 0 ? this.queries.getNodesByIds(wantIds) : new Map();
 
     for (const edge of adjacentEdges) {
+      // Cap per-add, not just at the top of each frame: the top-of-function
+      // guard only stops the next recursion, so without this every sibling of
+      // the first over-budget child still got inserted, overshooting
+      // `opts.limit` by a node's full fan-out (#1088).
+      if (nodes.size >= opts.limit) break;
+
       const nextNodeId = edge.source === node.id ? edge.target : edge.source;
       if (visited.has(nextNodeId)) continue;
 
@@ -243,10 +274,18 @@ export class GraphTraverser {
     result: Array<{ node: Node; edge: Edge }>,
     visited: Set<string>
   ): void {
-    if (currentDepth >= maxDepth || visited.has(nodeId)) {
+    // Mark visited BEFORE the depth check, not after. Folding both into one
+    // guard meant that when `currentDepth >= maxDepth` fired we returned without
+    // marking the node — so a caller reachable from the same parent via two
+    // edges (two call sites, or calls + references) was pushed once per edge,
+    // duplicating it in `result` at the default `maxDepth=1` (#1086).
+    if (visited.has(nodeId)) {
       return;
     }
     visited.add(nodeId);
+    if (currentDepth >= maxDepth) {
+      return;
+    }
 
     // `instantiates` counts as a caller: constructing a class (`Foo(...)` /
     // `new Foo()`) is calling its constructor, so the instantiation site is a
@@ -293,10 +332,16 @@ export class GraphTraverser {
     result: Array<{ node: Node; edge: Edge }>,
     visited: Set<string>
   ): void {
-    if (currentDepth >= maxDepth || visited.has(nodeId)) {
+    // Mark visited before the depth check — see getCallersRecursive: the merged
+    // guard dropped the `visited.add` at the depth boundary, duplicating a
+    // callee reached from the same node via two edges at `maxDepth=1` (#1086).
+    if (visited.has(nodeId)) {
       return;
     }
     visited.add(nodeId);
+    if (currentDepth >= maxDepth) {
+      return;
+    }
 
     // Symmetric with getCallers: a function that constructs a class
     // (`Foo(...)` / `new Foo()`) has that class as a callee, so callers and
@@ -503,10 +548,17 @@ export class GraphTraverser {
     edges: Edge[],
     visited: Set<string>
   ): void {
-    if (currentDepth >= maxDepth || visited.has(nodeId)) {
+    // Mark visited before the depth check so a node collected at the depth
+    // boundary still lands in `visited`. Otherwise it could sit in `nodes` but
+    // not `visited`, and the two loops below — which used different sets to
+    // gate re-processing — would disagree about it (#1089).
+    if (visited.has(nodeId)) {
       return;
     }
     visited.add(nodeId);
+    if (currentDepth >= maxDepth) {
+      return;
+    }
 
     // For container nodes (classes, interfaces, structs, etc.), also traverse
     // into their children so that callers of contained methods appear in impact
@@ -540,9 +592,15 @@ export class GraphTraverser {
 
     for (const edge of incomingEdges) {
       const sourceNode = sources.get(edge.source);
-      if (sourceNode && !nodes.has(sourceNode.id)) {
+      if (!sourceNode) continue;
+      // Record the dependency edge unconditionally. The gate used to also gate
+      // edge collection (`!nodes.has(...)`), so a second incoming edge into a
+      // node already collected via another path was silently dropped from
+      // `edges` even though it's a real dependency (#1089). Each node's incoming
+      // edges are fetched once (nodes are expanded once), so no edge repeats.
+      edges.push(edge);
+      if (!visited.has(sourceNode.id)) {
         nodes.set(sourceNode.id, sourceNode);
-        edges.push(edge);
         this.getImpactRecursive(sourceNode.id, maxDepth, currentDepth + 1, nodes, edges, visited);
       }
     }