Prechádzať zdrojové kódy

fix(impact): link Python absolute whole-module imports (import a.b.c)

`import conduit.apps.signals` (a dotted absolute module import — common in
package setup and side-effect imports) created an import node but NO edge to the
module's file; only `from x import y` was linked. So a module imported by its
dotted path looked like nothing depended on it.

- Extraction (tree-sitter.ts): a Python `import_statement` now pushes an
  `imports` ref per dotted module (mirroring Go), at module scope.
- Resolution (import-resolver.ts): `resolvePythonAbsoluteModule` maps `a.b.c`
  to a file node ending in `a/b/c.py` or `a/b/c/__init__.py` (suffix-matched, so
  a package rooted under src/ works). Stdlib/external modules don't resolve (no
  `os.py` node), so `import os` creates no edge.

Validated (FAIR file-dependent coverage): flask 88.0% -> 92.0%, Django realworld
63.0% -> 70.4% (covered exceptions + app config); requests neutral; no regression.
1 regression test. Full suite 1170 passed.

NOTE (frontier, deferred): an `import a.b.c` INSIDE a function body (Django
`AppConfig.ready(): import myapp.signals`) is still not linked — Python in-body
import_statements aren't reached by import extraction (only top-level are),
though in-body calls are. Django `signals.py` files remain a residual zero.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby McHenry 2 týždňov pred
rodič
commit
61a993a0ce

+ 1 - 0
CHANGELOG.md

@@ -37,6 +37,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - Go interfaces now connect to their implementations. Go has no `implements` keyword — a type satisfies an interface just by having the right methods — so CodeGraph now infers that link: a struct whose methods cover an interface's method set is treated as implementing it, and a call through the interface (`API.Marshal(...)`) reaches every concrete implementation. This means a type used only via an interface (the common plugin/strategy pattern — e.g. JSON-codec or renderer implementations selected at runtime) is no longer reported as having no callers or no dependents, and impact now flows from an interface method to the implementations behind it. (#584)
 - Go now records cross-package struct creation. A composite literal like `render.XML{...}` or `pkga.Widget{...}` — including ones registered in a package-level `var registry = map[string]R{...}` — now links to the package that defines the type. Cross-package function calls and type references already resolved; this closes struct instantiation, so a package whose types are only *constructed* elsewhere (a common pattern for interface implementations) is no longer reported as having no dependents. Go type conversions such as `(*Wrapped)(x)` now link to the converted-to type as well.
 - Python now follows whole-module imports — `from . import certs` then `certs.where()`, or `from pkg import sub` then `sub.run()`. Calls and attribute access through an imported submodule now resolve to that submodule, and importing a module is recorded as a dependency on it even when the member you use is itself re-exported from a third-party package. This also fixed Python relative-import path resolution generally (`from .sub.mod import x`), so `codegraph affected` and impact see the real module graph of a package.
+- Python now also links a whole-module **absolute** import (`import conduit.apps.signals`) to that module's file, not just `from x import y`. A module imported by its dotted path — common in package setup and side-effect imports — is no longer reported as having no dependents. Standard-library imports (`import os`) correctly create no edge. (Python)
 - The background file watcher no longer exhausts your machine's file-descriptor budget. On macOS it previously kept **one open file handle per watched file**, so on a large project the running MCP server could pile up tens of thousands of handles and blow past the system-wide limit — at which point *unrelated* apps (your shell, editor, Docker, browser) started failing with "too many open files" until the codegraph process was killed. The watcher now uses a single recursive watch on macOS and Windows, and bounded per-directory watches on Linux, so its cost stays flat no matter how large the project is. (#644, #496, #555, #628, #579)
 - Indexing a project with very symbol-dense files (tens of thousands of functions or methods in a single file) no longer runs out of memory. The step that links dynamic call relationships used to load every function and method into memory at once, which could exhaust the heap and abort indexing with "JavaScript heap out of memory" on large or generated codebases; it now streams them, so memory stays flat no matter how many symbols the project has. (#610)
 - Indexing a very large repository no longer aborts during its first sync with a "too many SQL variables" error. (#540)

+ 41 - 0
__tests__/extraction.test.ts

@@ -3856,6 +3856,47 @@ describe('Cross-language type/import gate (RN name collisions)', () => {
   });
 });
 
+describe('Python absolute module import resolution', () => {
+  let tempDir: string;
+  let cg: CodeGraph;
+
+  beforeEach(() => {
+    tempDir = createTempDir();
+  });
+
+  afterEach(() => {
+    if (cg) cg.close();
+    if (fs.existsSync(tempDir)) fs.rmSync(tempDir, { recursive: true, force: true });
+  });
+
+  it('links a bare `import pkg.module` of an internal module to its file', async () => {
+    // `import conduit.apps.signals` (a Django-style side-effect import, and any
+    // dotted absolute module import) had no edge to the module file — only
+    // `from x import y` was linked — so a module imported by its dotted path
+    // looked like nothing depended on it.
+    fs.mkdirSync(path.join(tempDir, 'conduit/apps'), { recursive: true });
+    fs.writeFileSync(path.join(tempDir, 'conduit/__init__.py'), '');
+    fs.writeFileSync(path.join(tempDir, 'conduit/apps/__init__.py'), '');
+    fs.writeFileSync(path.join(tempDir, 'conduit/apps/signals.py'), `def handler():\n    pass\n`);
+    fs.writeFileSync(
+      path.join(tempDir, 'conduit/apps/app.py'),
+      `import conduit.apps.signals\nimport os\n\nVALUE = 1\n`
+    );
+
+    cg = CodeGraph.initSync(tempDir);
+    await cg.indexAll();
+    cg.resolveReferences();
+
+    const signals = cg.getNodesByKind('file').find((n) => n.filePath.endsWith('conduit/apps/signals.py'));
+    expect(signals, 'signals.py indexed').toBeDefined();
+    const deps = [...cg.getImpactRadius(signals!.id, 2).nodes.values()].map((n) => n.filePath ?? '');
+    expect(deps.some((p) => p.endsWith('app.py')), 'importer depends on the module').toBe(true);
+    // `import os` (stdlib) must NOT fabricate an edge — no os.py file in the repo.
+    const osNode = cg.getNodesByKind('file').find((n) => n.filePath.endsWith('/os.py'));
+    expect(osNode, 'no stdlib os.py node').toBeUndefined();
+  });
+});
+
 describe('Same-directory include + KMP import resolution', () => {
   let tempDir: string;
   let cg: CodeGraph;

+ 19 - 0
src/extraction/tree-sitter.ts

@@ -1826,18 +1826,37 @@ export class TreeSitterExtractor {
 
     // Python import_statement: import os, sys (creates one import per module)
     if (this.language === 'python' && node.type === 'import_statement') {
+      const importParentId = this.nodeStack[this.nodeStack.length - 1];
+      // A bare `import a.b.c` of an internal module (the standard Django
+      // `AppConfig.ready(): import myapp.signals` registration pattern, and any
+      // `import pkg.mod` used for its side effects) had no edge to the module
+      // file — only `from x import y` was linked. Push an `imports` ref (like
+      // Go) so the resolver maps the dotted path to its file. Stdlib/external
+      // modules naturally don't resolve (no `os.py` file node in the repo).
+      const pushModuleRef = (dotted: SyntaxNode): void => {
+        if (!importParentId) return;
+        this.unresolvedReferences.push({
+          fromNodeId: importParentId,
+          referenceName: getNodeText(dotted, this.source),
+          referenceKind: 'imports',
+          line: dotted.startPosition.row + 1,
+          column: dotted.startPosition.column,
+        });
+      };
       for (let i = 0; i < node.namedChildCount; i++) {
         const child = node.namedChild(i);
         if (child?.type === 'dotted_name') {
           this.createNode('import', getNodeText(child, this.source), node, {
             signature: importText,
           });
+          pushModuleRef(child);
         } else if (child?.type === 'aliased_import') {
           const dottedName = child.namedChildren.find(c => c.type === 'dotted_name');
           if (dottedName) {
             this.createNode('import', getNodeText(dottedName, this.source), node, {
               signature: importText,
             });
+            pushModuleRef(dottedName);
           }
         }
       }

+ 39 - 0
src/resolution/import-resolver.ts

@@ -1155,6 +1155,11 @@ export function resolveViaImport(
   if (ref.language === 'python') {
     const pyResult = resolvePythonModuleMember(ref, imports, context);
     if (pyResult) return pyResult;
+    // Absolute dotted module import: `import conduit.apps.articles.signals`
+    // (the standard Django AppConfig.ready() signal-registration pattern, and
+    // any side-effect `import pkg.mod`). Map the dotted path to its file.
+    const pyModResult = resolvePythonAbsoluteModule(ref, context);
+    if (pyModResult) return pyModResult;
   }
 
   // Rust qualified path: resolve the module prefix of `crate::m::Item` /
@@ -1336,6 +1341,40 @@ function resolveModuleImportToFile(
   return null;
 }
 
+/**
+ * Resolve a Python ABSOLUTE dotted module import (`import a.b.c`) to its file.
+ * `a.b.c` → a file node ending in `a/b/c.py` (a module) or `a/b/c/__init__.py`
+ * (a package). The suffix match tolerates a package rooted under `src/` etc.
+ * Stdlib/external modules return null (no matching file node in the repo), so
+ * `import os` creates no edge. Covers the Django `AppConfig.ready(): import
+ * myapp.signals` pattern and any side-effect module import.
+ */
+function resolvePythonAbsoluteModule(
+  ref: UnresolvedRef,
+  context: ResolutionContext
+): ResolvedRef | null {
+  if (ref.referenceKind !== 'imports') return null;
+  const mod = ref.referenceName;
+  if (!mod || mod.startsWith('.')) return null; // relative imports handled elsewhere
+  const rel = mod.replace(/\./g, '/');
+  const lastSeg = mod.split('.').pop()!;
+  const wantModule = `${rel}.py`;
+  const wantPkg = `${rel}/__init__.py`;
+  const endsWith = (p: string, want: string): boolean => p === want || p.endsWith('/' + want);
+
+  const moduleFiles = context.getNodesByName(`${lastSeg}.py`).filter((n) => n.kind === 'file');
+  const hitModule = moduleFiles.find((n) => endsWith(n.filePath, wantModule));
+  if (hitModule && hitModule.filePath !== ref.filePath) {
+    return { original: ref, targetNodeId: hitModule.id, confidence: 0.9, resolvedBy: 'import' };
+  }
+  const pkgFiles = context.getNodesByName('__init__.py').filter((n) => n.kind === 'file');
+  const hitPkg = pkgFiles.find((n) => endsWith(n.filePath, wantPkg));
+  if (hitPkg && hitPkg.filePath !== ref.filePath) {
+    return { original: ref, targetNodeId: hitPkg.id, confidence: 0.9, resolvedBy: 'import' };
+  }
+  return null;
+}
+
 /**
  * Resolve a Rust qualified reference `A::B::C` by mapping the MODULE prefix
  * (`A::B`) to a file and finding the leaf symbol (`C`) in it. This is the Rust