Explorar o código

fix(impact): source-aware Python `from pkg import submodule` (FastAPI aggregator)

A router/aggregator's `from app.api.routes import authentication` (FastAPI/Django
pattern) left the imported route module with no dependent. The bare leaf
(`authentication`) is ambiguous — three `authentication.py` may exist across
sibling packages — so it must be resolved through the import's SOURCE package.

- `findPythonModuleFile(mod)` (shared): maps a dotted module path to its file
  (`a/b/c.py` or `a/b/c/__init__.py`, suffix-matched).
- `resolveModuleImportToFile`: for a Python absolute `from a.b import c` where
  resolveImportPath (relative-only) fails, resolve `a.b.c` directly via
  findPythonModuleFile — using the package source to pick the RIGHT submodule.
- `resolvePythonAbsoluteModule` (the `import a.b.c` path from 61a993a) is now
  restricted to DOTTED refs. This corrects a greedy bare-leaf match it was doing:
  it matched a bare `from x import name` leaf to ANY same-basename file, which
  was both wrong (FastAPI picked dependencies/authentication.py over
  routes/authentication.py) and inflated flask/Django coverage with spurious
  same-named matches. Bare leaves now go through the source-aware path only.

FAIR coverage: FastAPI realworld 78.6% -> 98.0% (4 route modules now resolve to
the right files). flask & requests are 100% (honest — the 61a993a 92%/Django
70.4% were partly the spurious greedy match; corrected here, with package
entry/barrel `__init__.py`/`__main__.py`/`docs` excluded per methodology).
No regression; full suite 1171 passed; 1 regression test (disambiguates
same-named submodules across packages).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby McHenry hai 2 semanas
pai
achega
283562305f
Modificáronse 3 ficheiros con 82 adicións e 29 borrados
  1. 1 0
      CHANGELOG.md
  2. 29 0
      __tests__/extraction.test.ts
  3. 52 29
      src/resolution/import-resolver.ts

+ 1 - 0
CHANGELOG.md

@@ -38,6 +38,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - Go now records cross-package struct creation. A composite literal like `render.XML{...}` or `pkga.Widget{...}` — including ones registered in a package-level `var registry = map[string]R{...}` — now links to the package that defines the type. Cross-package function calls and type references already resolved; this closes struct instantiation, so a package whose types are only *constructed* elsewhere (a common pattern for interface implementations) is no longer reported as having no dependents. Go type conversions such as `(*Wrapped)(x)` now link to the converted-to type as well.
 - Python now follows whole-module imports — `from . import certs` then `certs.where()`, or `from pkg import sub` then `sub.run()`. Calls and attribute access through an imported submodule now resolve to that submodule, and importing a module is recorded as a dependency on it even when the member you use is itself re-exported from a third-party package. This also fixed Python relative-import path resolution generally (`from .sub.mod import x`), so `codegraph affected` and impact see the real module graph of a package.
 - Python now also links a whole-module **absolute** import (`import conduit.apps.signals`) to that module's file, not just `from x import y`. A module imported by its dotted path — common in package setup and side-effect imports — is no longer reported as having no dependents. Standard-library imports (`import os`) correctly create no edge. (Python)
+- Python `from package import submodule` now links to that submodule's file, resolved through the import's package so it lands on the right one when same-named modules exist in sibling packages (the FastAPI / Django router-aggregator pattern: `from app.api.routes import authentication` with an unrelated `authentication.py` elsewhere). So a route/handler module pulled in only by an aggregator is no longer reported as having no dependents. (Python)
 - The background file watcher no longer exhausts your machine's file-descriptor budget. On macOS it previously kept **one open file handle per watched file**, so on a large project the running MCP server could pile up tens of thousands of handles and blow past the system-wide limit — at which point *unrelated* apps (your shell, editor, Docker, browser) started failing with "too many open files" until the codegraph process was killed. The watcher now uses a single recursive watch on macOS and Windows, and bounded per-directory watches on Linux, so its cost stays flat no matter how large the project is. (#644, #496, #555, #628, #579)
 - Indexing a project with very symbol-dense files (tens of thousands of functions or methods in a single file) no longer runs out of memory. The step that links dynamic call relationships used to load every function and method into memory at once, which could exhaust the heap and abort indexing with "JavaScript heap out of memory" on large or generated codebases; it now streams them, so memory stays flat no matter how many symbols the project has. (#610)
 - Indexing a very large repository no longer aborts during its first sync with a "too many SQL variables" error. (#540)

+ 29 - 0
__tests__/extraction.test.ts

@@ -3895,6 +3895,35 @@ describe('Python absolute module import resolution', () => {
     const osNode = cg.getNodesByKind('file').find((n) => n.filePath.endsWith('/os.py'));
     expect(osNode, 'no stdlib os.py node').toBeUndefined();
   });
+
+  it('resolves `from pkg import submodule` to the submodule under that package, not a same-named one', async () => {
+    // FastAPI router-aggregator pattern: `from app.api.routes import authentication`
+    // with same-named modules in sibling packages must resolve via the import's
+    // SOURCE (the package), not a coincidental same-basename file elsewhere.
+    fs.mkdirSync(path.join(tempDir, 'app/api/routes'), { recursive: true });
+    fs.mkdirSync(path.join(tempDir, 'app/api/dependencies'), { recursive: true });
+    for (const p of ['app/__init__.py', 'app/api/__init__.py', 'app/api/routes/__init__.py', 'app/api/dependencies/__init__.py']) {
+      fs.writeFileSync(path.join(tempDir, p), '');
+    }
+    fs.writeFileSync(path.join(tempDir, 'app/api/routes/authentication.py'), `def login():\n    pass\n`);
+    fs.writeFileSync(path.join(tempDir, 'app/api/dependencies/authentication.py'), `def get_user():\n    pass\n`);
+    fs.writeFileSync(
+      path.join(tempDir, 'app/api/routes/api.py'),
+      `from app.api.routes import authentication\n\nROUTER = authentication\n`
+    );
+
+    cg = CodeGraph.initSync(tempDir);
+    await cg.indexAll();
+    cg.resolveReferences();
+
+    const routesAuth = cg.getNodesByKind('file').find((n) => n.filePath.endsWith('routes/authentication.py'));
+    const depsAuth = cg.getNodesByKind('file').find((n) => n.filePath.endsWith('dependencies/authentication.py'));
+    expect(routesAuth && depsAuth).toBeTruthy();
+    const routesDeps = [...cg.getImpactRadius(routesAuth!.id, 2).nodes.values()].map((n) => n.filePath ?? '');
+    const depsDeps = [...cg.getImpactRadius(depsAuth!.id, 2).nodes.values()].map((n) => n.filePath ?? '');
+    expect(routesDeps.some((p) => p.endsWith('routes/api.py')), 'submodule under the imported package is the dependent').toBe(true);
+    expect(depsDeps.some((p) => p.endsWith('routes/api.py')), 'same-named module in a sibling package is NOT').toBe(false);
+  });
 });
 
 describe('Same-directory include + KMP import resolution', () => {

+ 52 - 29
src/resolution/import-resolver.ts

@@ -1331,48 +1331,71 @@ function resolveModuleImportToFile(
     }
 
     const resolvedPath = resolveImportPath(modulePath, ref.filePath, ref.language, context);
-    if (!resolvedPath || resolvedPath === ref.filePath) continue;
+    if (resolvedPath && resolvedPath !== ref.filePath) {
+      const fileNode = context.getNodesInFile(resolvedPath).find((n) => n.kind === 'file');
+      if (fileNode) {
+        return { original: ref, targetNodeId: fileNode.id, confidence: 0.9, resolvedBy: 'import' };
+      }
+    }
 
-    const fileNode = context.getNodesInFile(resolvedPath).find((n) => n.kind === 'file');
-    if (fileNode) {
-      return { original: ref, targetNodeId: fileNode.id, confidence: 0.9, resolvedBy: 'import' };
+    // Python absolute `from a.b import submodule` (a FastAPI router aggregator's
+    // `from app.api.routes import authentication`): resolveImportPath only maps
+    // RELATIVE dotted paths to a file, so resolve the absolute dotted module
+    // directly to its file node.
+    if (ref.language === 'python') {
+      const modFile = findPythonModuleFile(modulePath, context, ref.filePath);
+      if (modFile) {
+        return { original: ref, targetNodeId: modFile.id, confidence: 0.9, resolvedBy: 'import' };
+      }
     }
   }
   return null;
 }
 
 /**
- * Resolve a Python ABSOLUTE dotted module import (`import a.b.c`) to its file.
- * `a.b.c` → a file node ending in `a/b/c.py` (a module) or `a/b/c/__init__.py`
- * (a package). The suffix match tolerates a package rooted under `src/` etc.
- * Stdlib/external modules return null (no matching file node in the repo), so
- * `import os` creates no edge. Covers the Django `AppConfig.ready(): import
- * myapp.signals` pattern and any side-effect module import.
+ * Find the file node for a Python dotted module path `a.b.c` — a module file
+ * ending in `a/b/c.py`, or a package `a/b/c/__init__.py` (suffix-matched, so a
+ * package rooted under `src/` etc. still resolves). Returns null for
+ * stdlib/external modules (no matching repo file node), so `import os` creates
+ * no edge. Shared by absolute `import a.b.c` and absolute `from a.b import c`
+ * (where `c` is a submodule) resolution.
  */
-function resolvePythonAbsoluteModule(
-  ref: UnresolvedRef,
-  context: ResolutionContext
-): ResolvedRef | null {
-  if (ref.referenceKind !== 'imports') return null;
-  const mod = ref.referenceName;
+function findPythonModuleFile(
+  mod: string,
+  context: ResolutionContext,
+  excludeFilePath: string
+): Node | null {
   if (!mod || mod.startsWith('.')) return null; // relative imports handled elsewhere
   const rel = mod.replace(/\./g, '/');
   const lastSeg = mod.split('.').pop()!;
-  const wantModule = `${rel}.py`;
-  const wantPkg = `${rel}/__init__.py`;
   const endsWith = (p: string, want: string): boolean => p === want || p.endsWith('/' + want);
+  const moduleFile = context
+    .getNodesByName(`${lastSeg}.py`)
+    .find((n) => n.kind === 'file' && n.filePath !== excludeFilePath && endsWith(n.filePath, `${rel}.py`));
+  if (moduleFile) return moduleFile;
+  const pkgFile = context
+    .getNodesByName('__init__.py')
+    .find((n) => n.kind === 'file' && n.filePath !== excludeFilePath && endsWith(n.filePath, `${rel}/__init__.py`));
+  return pkgFile ?? null;
+}
 
-  const moduleFiles = context.getNodesByName(`${lastSeg}.py`).filter((n) => n.kind === 'file');
-  const hitModule = moduleFiles.find((n) => endsWith(n.filePath, wantModule));
-  if (hitModule && hitModule.filePath !== ref.filePath) {
-    return { original: ref, targetNodeId: hitModule.id, confidence: 0.9, resolvedBy: 'import' };
-  }
-  const pkgFiles = context.getNodesByName('__init__.py').filter((n) => n.kind === 'file');
-  const hitPkg = pkgFiles.find((n) => endsWith(n.filePath, wantPkg));
-  if (hitPkg && hitPkg.filePath !== ref.filePath) {
-    return { original: ref, targetNodeId: hitPkg.id, confidence: 0.9, resolvedBy: 'import' };
-  }
-  return null;
+/**
+ * Resolve a Python ABSOLUTE dotted module import (`import a.b.c`) to its file —
+ * the Django `AppConfig.ready(): import myapp.signals` pattern and any
+ * side-effect module import.
+ */
+function resolvePythonAbsoluteModule(
+  ref: UnresolvedRef,
+  context: ResolutionContext
+): ResolvedRef | null {
+  if (ref.referenceKind !== 'imports') return null;
+  // Only a DOTTED `import a.b.c` ref carries its full module path. A bare leaf
+  // (`from app.api.routes import authentication`) is ambiguous on its own — three
+  // `authentication.py` files may exist — so leave it to resolveModuleImportToFile,
+  // which uses the import's source (`app.api.routes`) to build the full path.
+  if (!ref.referenceName.includes('.')) return null;
+  const hit = findPythonModuleFile(ref.referenceName, context, ref.filePath);
+  return hit ? { original: ref, targetNodeId: hit.id, confidence: 0.9, resolvedBy: 'import' } : null;
 }
 
 /**