
feat(resolution): bridge Sidekiq Worker.perform_async to #perform

Sidekiq decouples a job's enqueue site from the worker's perform method; the
two are linked only by the worker class NAME. DestroyUserWorker.perform_async(id)
has no static edge to DestroyUserWorker#perform, which usually lives in
app/workers/, away from the controller or model that enqueues it.
sidekiqDispatchEdges bridges each Worker.perform_async/_in/_at(...) site to that
worker's instance perform method.
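
A minimal, self-contained sketch of the site-extraction step, assuming dispatch
sites are matched textually with a regex of the same shape as the
SIDEKIQ_DISPATCH_RE added in this commit. `DispatchSite` and
`findDispatchSites` are hypothetical names for illustration; the real
synthesizer additionally strips Ruby comments first and maps each line to its
enclosing method.

```typescript
// Hypothetical sketch: list Worker.perform_async / perform_in / perform_at call
// sites in a Ruby source string. Illustrative only, not the project's API.
type DispatchMethod = 'perform_async' | 'perform_in' | 'perform_at';

interface DispatchSite {
  receiver: string; // e.g. "DestroyUserWorker" or "Comments::NotifyWorker"
  method: DispatchMethod;
  line: number; // 1-based line of the call
}

function findDispatchSites(rubySource: string): DispatchSite[] {
  // A capitalized constant path (optionally ::-qualified), then
  // `.perform_async|in|at` as a whole word, so ActiveJob's perform_later never matches.
  const re = /([A-Z][A-Za-z0-9_]*(?:::[A-Z][A-Za-z0-9_]*)*)\s*\.\s*(perform_(?:async|in|at))\b/g;
  const sites: DispatchSite[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(rubySource)) !== null) {
    // Line number = count of newlines before the match, plus one.
    const line = rubySource.slice(0, m.index).split('\n').length;
    sites.push({ receiver: m[1]!, method: m[2] as DispatchMethod, line });
  }
  return sites;
}

const sample = [
  'class UserService',
  '  def deactivate(user)',
  '    DestroyUserWorker.perform_async(user.id)',
  '    Comments::NotifyWorker.perform_in(5, 1)',
  '    CleanupJob.perform_later',
  '  end',
  'end',
].join('\n');

console.log(findDispatchSites(sample));
```

On this sample the sketch reports two sites (DestroyUserWorker on line 3,
Comments::NotifyWorker on line 4) and skips the ActiveJob call.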

Name-keyed, like Celery: the receiver class must be a Sidekiq worker, gated by
reading `include Sidekiq::Job|Worker` from the class body (the mixin is an
external gem module that forms no resolvable edge). ActiveJob's perform_later/
_now is a different shape and deliberately not matched.
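
Because the mixin is an external gem module with no resolvable edge, the gate
has to look at the class body text itself. A minimal sketch of that check,
assuming a class is described by its 1-based, inclusive line range; `ClassSpan`
and `isSidekiqWorker` are hypothetical names, and the regex has the same shape
as SIDEKIQ_WORKER_RE in this commit.

```typescript
// Hypothetical sketch of the worker gate. Illustrative names only.
interface ClassSpan {
  startLine: number; // 1-based, inclusive
  endLine: number; // 1-based, inclusive
}

// No `g` flag, so .test() carries no lastIndex state between calls.
const WORKER_MIXIN = /\binclude\s+Sidekiq::(?:Job|Worker)\b/;

function isSidekiqWorker(fileText: string, cls: ClassSpan): boolean {
  // Only the class's own lines are inspected, not the whole file.
  const body = fileText.split('\n').slice(cls.startLine - 1, cls.endLine).join('\n');
  return WORKER_MIXIN.test(body);
}

const file = [
  'class SendEmailWorker',
  '  include Sidekiq::Job',
  '  def perform(addr); end',
  'end',
  'class CleanupJob < ApplicationJob',
  '  def perform; end',
  'end',
].join('\n');

console.log(isSidekiqWorker(file, { startLine: 1, endLine: 4 })); // true
console.log(isSidekiqWorker(file, { startLine: 5, endLine: 7 })); // false
```

A class that merely defines `perform` (an ActiveJob, or a plain service) fails
the gate, so it can never become a dispatch target.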

Namespace disambiguation is where validating on a second repo paid off:
loomio's flat worker layout hid a collision bug that forem exposed (four
SendEmailNotificationWorker classes across modules; simple-name resolution
mis-targeted 7/143 edges to the wrong namespace). Fixed by resolving a
namespaced receiver via exact qualified-name lookup first and falling back to
the simple name only when exactly one worker bears it; an ambiguous unqualified
collision bails (precision over recall).
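
The resolution order can be sketched over a plain list of known workers. This
assumes the list is already filtered to classes that pass the worker gate;
`WorkerClass` and `resolveWorker` are hypothetical names, whereas the real
resolver queries the graph by qualified name and by simple name.

```typescript
// Hypothetical sketch of qualified-first, unique-simple-name resolution.
interface WorkerClass {
  qualifiedName: string; // e.g. "Comments::NotifyWorker"
  file: string;
}

function resolveWorker(ref: string, workers: WorkerClass[]): WorkerClass | null {
  // 1. A namespaced reference wins on an exact qualified-name match.
  if (ref.includes('::')) {
    const exact = workers.find((w) => w.qualifiedName === ref);
    if (exact) return exact;
  }
  // 2. Otherwise fall back to the simple name, but only if it is unique.
  const simple = ref.split('::').pop()!;
  const candidates = workers.filter((w) => w.qualifiedName.split('::').pop() === simple);
  return candidates.length === 1 ? candidates[0]! : null; // ambiguous: bail
}

const workers: WorkerClass[] = [
  { qualifiedName: 'Comments::NotifyWorker', file: 'app/workers/comments/notify_worker.rb' },
  { qualifiedName: 'Articles::NotifyWorker', file: 'app/workers/articles/notify_worker.rb' },
  { qualifiedName: 'DestroyUserWorker', file: 'app/workers/destroy_user_worker.rb' },
];

console.log(resolveWorker('Articles::NotifyWorker', workers)?.file); // app/workers/articles/notify_worker.rb
console.log(resolveWorker('NotifyWorker', workers)); // null
console.log(resolveWorker('DestroyUserWorker', workers)?.file); // app/workers/destroy_user_worker.rb
```

Returning null for a bare `NotifyWorker` drops a possibly correct edge rather
than guessing between two namespaces, which is the precision-over-recall
trade-off described above.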

Surfaces as `dynamic: sidekiq dispatch` via the generic synth-edge fallback.

Validated at 100% precision on two grep-confirmed repos: loomio (medium,
Sidekiq::Worker, 47 edges) and forem (large, both include aliases: 131
Sidekiq::Job + 11 Sidekiq::Worker, 142 edges, 0 worker/source false positives,
0 namespace mismatches); 0 edges on the jekyll control. Node-stable (pure edge
synthesis; adds no nodes). Suite: 1621 green.
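
The precision figures reduce to simple ratios. As a worked check, this sketch
assumes the 7/143 figure means 7 wrong-namespace edges among 143 emitted
before the fix with no other errors, which the message implies but does not
state outright. The `precision` helper is illustrative.

```typescript
// Precision = correct edges / emitted edges. Counts are the figures quoted in
// this commit message. With emitted = 0 (as on the jekyll control) the ratio
// is 0 / 0 = NaN: precision is undefined when nothing is emitted.
function precision(correct: number, emitted: number): number {
  return correct / emitted;
}

// forem with simple-name resolution: 7 of 143 edges hit the wrong namespace.
const foremBeforeFix = precision(143 - 7, 143);
// forem after qualified-first resolution: 142 edges, 0 false positives.
const foremAfterFix = precision(142, 142);

console.log((foremBeforeFix * 100).toFixed(1), (foremAfterFix * 100).toFixed(1)); // prints "95.1 100.0"
```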

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby McHenry, 2 days ago
parent
commit
2c522c6254

+ 1 - 0
CHANGELOG.md

@@ -18,6 +18,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - `codegraph_explore` now follows **Celery** task dispatch in Python. A `send_email.delay(...)` or `send_email.apply_async(...)` call now links to the `@shared_task` / `@app.task` function it runs — typically defined in a different module (`tasks.py`) from where it's triggered (a view or service) — so "what actually happens when this is dispatched?" traces from the call site straight into the task body instead of stopping at the `.delay()` line. Both decorator dialects are recognized (bare `@shared_task` and the arg'd `@app.task(bind=True, …)` form), including the module-qualified `tasks.invalidate_cache.apply_async()` call style. It stays precise: a `.delay()` on something that isn't a Celery task is never mislinked, so a project that doesn't use Celery is unaffected.
 - `codegraph_explore` now follows **Spring application events** in Java. A `publishEvent(new OrderShippedEvent(...))` call now links to every `@EventListener` that handles that event — usually in a different class — so "what reacts when this is published?" traces from the publisher straight into each listener method instead of dead-ending at `publishEvent(...)`. The link is by event type, and all the common listener styles are recognized: a `@EventListener` typed on its parameter, the `@EventListener(SomeEvent.class)` form, `@TransactionalEventListener`, and the older `implements ApplicationListener<SomeEvent>`. One event fans out to all its listeners, and a plain Spring app with no event bus is unaffected.
 - `codegraph_explore` now follows **MediatR** request and notification dispatch in C#/.NET. A `_mediator.Send(command)` or `_mediator.Publish(notification)` call now links to the `Handle` method of the matching `IRequestHandler<>` / `INotificationHandler<>` — usually in a different file in a Clean Architecture layout — so "what handles this command?" traces from the controller straight into the handler instead of stopping at the mediator call. The sent type is recognized whether it's constructed inline (`Send(new GetFooQuery())`), built into a local first (`var cmd = new …; Send(cmd)`), or passed in as a parameter, and it's matched by type — so a `MessagingCenter.Send(...)` or a same-named DTO that isn't a request is never mislinked, and a project without MediatR is unaffected.
+- `codegraph_explore` now follows **Sidekiq** background-job dispatch in Ruby. A `DestroyUserWorker.perform_async(id)` (or `.perform_in` / `.perform_at`) call now links to that worker's `perform` method — usually in `app/workers/` away from the controller or model that enqueues it — so "what runs in the background here?" traces from the enqueue straight into the job body. Both the modern `include Sidekiq::Job` and the older `Sidekiq::Worker` are recognized, namespaced workers resolve to the right class even when several share a name (e.g. `Comments::NotifyWorker` vs `Articles::NotifyWorker`), and Rails ActiveJob's `perform_later` — a different mechanism — is intentionally left alone.
 
 - `codegraph_explore` now surfaces the right code in large multi-layer projects. When you ask a backend-flow question in a repo that pairs an API server with a big frontend that mirrors the same domain words — say an `app/` admin UI sitting over an `api/` server — the server-side file that genuinely matches several of your query's terms is no longer pushed out of the results by the larger, more interconnected frontend layer. A file corroborated by two or more distinct query terms is now kept in the answer even when a denser unrelated layer would otherwise crowd it out, so "how does X read items / handle the request" returns the service or handler that does the work instead of a wall of frontend views. Single-layer projects are unaffected; set `CODEGRAPH_RANK_NO_MULTITERM=1` to revert to the previous ranking.
 - Impact and blast-radius analysis for TypeScript, JavaScript, Go, Python, Rust, Ruby, C, Java, C#, PHP, Scala, Kotlin, Swift, Dart, and Pascal/Delphi now understands the readers of a constant. When you change a file-scope, package-level, module-level, or class-level constant — a config object, a lookup table, a shared constant — the other symbols in that file that read it now show up as affected, where before they were invisible (impact only followed calls, imports, and inheritance, so a constant's consumers looked like "nothing depends on this"). This makes `codegraph impact`, and the impact trail in `codegraph_explore`/`codegraph_node`, catch the "change this table, break its readers" class of change. It's on by default and adds no nodes to your graph; bundled/minified files and ambiguously-shadowed names are skipped to keep results precise. Set `CODEGRAPH_VALUE_REFS=0` to turn it off.

+ 128 - 0
__tests__/sidekiq-dispatch-synthesizer.test.ts

@@ -0,0 +1,128 @@
+/**
+ * Sidekiq job-dispatch bridge (Ruby).
+ *
+ * Sidekiq decouples a job enqueue from the worker's `perform`, linked by the WORKER CLASS
+ * NAME: `DestroyUserWorker.perform_async(id)` has no static edge to `DestroyUserWorker#perform`
+ * (usually a different file). This bridges each `Worker.perform_async`/`.perform_in`/`.perform_at`
+ * site to that worker's instance `perform`, gated on the class including `Sidekiq::Job`/`Worker`.
+ * Covers both include aliases, the scheduled forms, namespace disambiguation (two `NotifyWorker`s
+ * in different modules resolve to the right one by qualified name), and the precision boundary: a
+ * non-worker class with a `perform`, and an ActiveJob `perform_later`, both produce no edge.
+ */
+import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+import * as fs from 'node:fs';
+import * as path from 'node:path';
+import * as os from 'node:os';
+import { CodeGraph } from '../src';
+
+describe('sidekiq-dispatch synthesizer', () => {
+  let dir: string;
+  beforeEach(() => { dir = fs.mkdtempSync(path.join(os.tmpdir(), 'sidekiq-dispatch-')); });
+  afterEach(() => { fs.rmSync(dir, { recursive: true, force: true }); });
+
+  const write = (rel: string, body: string) => {
+    const p = path.join(dir, rel);
+    fs.mkdirSync(path.dirname(p), { recursive: true });
+    fs.writeFileSync(p, body);
+  };
+
+  it('bridges perform_async/_in to #perform, disambiguates namespaces, ignores non-workers and ActiveJob', async () => {
+    write('app/workers/destroy_user_worker.rb', `class DestroyUserWorker
+  include Sidekiq::Worker
+  def perform(user_id)
+    User.find(user_id).destroy!
+  end
+end
+`);
+    // Modern Sidekiq::Job alias + the scheduled form.
+    write('app/workers/send_email_worker.rb', `class SendEmailWorker
+  include Sidekiq::Job
+  def perform(addr)
+  end
+end
+`);
+    // Namespace collision: two NotifyWorkers, same simple name, different modules.
+    write('app/workers/comments/notify_worker.rb', `module Comments
+  class NotifyWorker
+    include Sidekiq::Job
+    def perform(id)
+    end
+  end
+end
+`);
+    write('app/workers/articles/notify_worker.rb', `module Articles
+  class NotifyWorker
+    include Sidekiq::Job
+    def perform(id)
+    end
+  end
+end
+`);
+    // A non-worker class that happens to have a `perform` method — never a target.
+    write('app/services/report.rb', `class Report
+  def perform(x)
+  end
+end
+`);
+    // An ActiveJob — dispatched via perform_later, a different shape, not matched.
+    write('app/jobs/cleanup_job.rb', `class CleanupJob < ApplicationJob
+  def perform
+  end
+end
+`);
+    write('app/services/user_service.rb', `class UserService
+  def deactivate(user)
+    DestroyUserWorker.perform_async(user.id)
+    SendEmailWorker.perform_in(5, user.email)
+    Comments::NotifyWorker.perform_async(1)
+    Articles::NotifyWorker.perform_async(2)
+    Report.perform_async(3)
+    CleanupJob.perform_later
+  end
+end
+`);
+
+    const cg = await CodeGraph.init(dir, { silent: true });
+    await cg.indexAll();
+    const db = (cg as any).db.db;
+
+    const edges = db
+      .prepare(
+        `SELECT s.name source, t.name target, t.file_path tf, json_extract(e.metadata,'$.via') via
+         FROM edges e JOIN nodes s ON s.id = e.source JOIN nodes t ON t.id = e.target
+         WHERE json_extract(e.metadata,'$.synthesizedBy') = 'sidekiq-dispatch'`
+      )
+      .all();
+
+    // Four enqueues bridge: both aliases, perform_async + perform_in, two namespaced.
+    expect(edges.map((r: any) => r.via).sort()).toEqual([
+      'Articles::NotifyWorker', 'Comments::NotifyWorker', 'DestroyUserWorker', 'SendEmailWorker',
+    ]);
+    expect(edges.every((r: any) => r.target === 'perform' && r.source === 'deactivate')).toBe(true);
+    // Namespace disambiguation: each NotifyWorker hits its OWN module's file, not the other.
+    expect(edges.find((r: any) => r.via === 'Comments::NotifyWorker').tf).toMatch(/comments[\\/]notify_worker\.rb$/);
+    expect(edges.find((r: any) => r.via === 'Articles::NotifyWorker').tf).toMatch(/articles[\\/]notify_worker\.rb$/);
+    // PRECISION: a non-worker `perform`, and ActiveJob `perform_later`, contribute nothing.
+    expect(edges.some((r: any) => r.via === 'Report')).toBe(false);
+    expect(edges.some((r: any) => /Cleanup/.test(r.via))).toBe(false);
+
+    cg.close?.();
+  });
+
+  it('produces no edges in a Ruby project with no Sidekiq (clean control)', async () => {
+    write('lib/calc.rb', `class Calc
+  def add(a, b)
+    a + b
+  end
+end
+`);
+    const cg = await CodeGraph.init(dir, { silent: true });
+    await cg.indexAll();
+    const db = (cg as any).db.db;
+    const count = db
+      .prepare(`SELECT count(*) c FROM edges WHERE json_extract(metadata,'$.synthesizedBy') = 'sidekiq-dispatch'`)
+      .get();
+    expect(count.c).toBe(0);
+    cg.close?.();
+  });
+});

File diff suppressed because it is too large
+ 0 - 0
docs/design/dispatch-synthesizer-backlog.md


+ 93 - 1
src/resolution/callback-synthesizer.ts

@@ -2415,6 +2415,95 @@ function mediatrDispatchEdges(ctx: ResolutionContext): Edge[] {
   return edges;
 }
 
+// ── Sidekiq job dispatch (Ruby) ───────────────────────────────────────────────
+// Sidekiq decouples a job's enqueue site from the worker's `perform`, linked by the WORKER
+// CLASS NAME:
+//   # app/workers/destroy_user_worker.rb
+//   class DestroyUserWorker
+//     include Sidekiq::Worker          # or Sidekiq::Job (the modern alias)
+//     def perform(user_id) … end
+//   # app/services/… — a DIFFERENT file
+//   DestroyUserWorker.perform_async(user.id)   # also .perform_in(t, …) / .perform_at(t, …)
+// Bridge it: link the enclosing method at each `Worker.perform_async/_in/_at(…)` site → that
+// worker's instance `perform`. Name-keyed (like Celery): the receiver class must be a Sidekiq
+// worker — gated by reading `include Sidekiq::Job|Worker` from the class body, since that mixin
+// is an external gem module that forms no resolvable edge. ActiveJob's `perform_later`/`_now` is
+// a different shape and deliberately not matched, so an ActiveJob-only app yields 0.
+const SIDEKIQ_DISPATCH_RE = /([A-Z][A-Za-z0-9_]*(?:::[A-Z][A-Za-z0-9_]*)*)\s*\.\s*perform_(?:async|in|at)\b/g;
+const SIDEKIQ_WORKER_RE = /\binclude\s+Sidekiq::(?:Job|Worker)\b/;
+const SIDEKIQ_RB_EXT = /\.rb$/;
+const SIDEKIQ_FANOUT_CAP = 80;
+
+function sidekiqDispatchEdges(ctx: ResolutionContext): Edge[] {
+  // class node id → its instance `perform` method (null if the class isn't a Sidekiq worker),
+  // memoized. Reads the class body for the mixin; only consulted for actual dispatch receivers.
+  const performCache = new Map<string, Node | null>();
+  const performOf = (cls: Node): Node | null => {
+    let v = performCache.get(cls.id);
+    if (v !== undefined) return v;
+    v = null;
+    const content = ctx.readFile(cls.filePath);
+    if (content) {
+      const end = cls.endLine ?? cls.startLine;
+      const body = content.split('\n').slice(cls.startLine - 1, end).join('\n');
+      if (SIDEKIQ_WORKER_RE.test(body)) {
+        v = ctx.getNodesInFile(cls.filePath).find(
+          (n) => n.kind === 'method' && n.name === 'perform' && n.startLine >= cls.startLine && n.startLine <= end
+        ) ?? null;
+      }
+    }
+    performCache.set(cls.id, v);
+    return v;
+  };
+
+  // Resolve a (possibly namespaced) worker reference to its `perform`. A namespaced ref is
+  // matched by EXACT qualified name first, so same-named workers in different namespaces
+  // (forem has four `SendEmailNotificationWorker`s) resolve to the right one; an unqualified
+  // ref falls back to the simple name and links only when a single worker bears it — an
+  // ambiguous collision bails (precision over recall).
+  const resolve = (ref: string): Node | null => {
+    if (ref.includes('::')) {
+      const q = ctx.getNodesByQualifiedName(ref).find((n) => n.kind === 'class' && performOf(n));
+      if (q) return performOf(q);
+    }
+    const workers = ctx.getNodesByName(ref.split('::').pop()!).filter((n) => n.kind === 'class' && performOf(n));
+    return workers.length === 1 ? performOf(workers[0]!) : null;
+  };
+
+  const edges: Edge[] = [];
+  const seen = new Set<string>();
+  for (const file of ctx.getAllFiles()) {
+    if (!SIDEKIQ_RB_EXT.test(file)) continue;
+    const content = ctx.readFile(file);
+    if (!content || !/\.perform_(?:async|in|at)\b/.test(content)) continue;
+    const safe = stripCommentsForRegex(content, 'ruby');
+    const nodesInFile = ctx.getNodesInFile(file);
+    SIDEKIQ_DISPATCH_RE.lastIndex = 0;
+    let m: RegExpExecArray | null;
+    let added = 0;
+    while ((m = SIDEKIQ_DISPATCH_RE.exec(safe)) && added < SIDEKIQ_FANOUT_CAP) {
+      const line = safe.slice(0, m.index).split('\n').length;
+      const disp = enclosingFn(nodesInFile, line);
+      if (!disp) continue;
+      const target = resolve(m[1]!);
+      if (!target || target.id === disp.id) continue;
+      const key = `${disp.id}>${target.id}`;
+      if (seen.has(key)) continue;
+      seen.add(key);
+      edges.push({
+        source: disp.id,
+        target: target.id,
+        kind: 'calls',
+        line,
+        provenance: 'heuristic',
+        metadata: { synthesizedBy: 'sidekiq-dispatch', via: m[1]!, registeredAt: `${file}:${line}` },
+      });
+      added++;
+    }
+  }
+  return edges;
+}
+
 /**
  * Synthesize dispatcher→callback edges (field observers + EventEmitters +
  * React re-render + JSX children + Vue templates + SvelteKit load + RN event
@@ -2422,7 +2511,8 @@ function mediatrDispatchEdges(ctx: ResolutionContext): Edge[] {
  * Redux-thunk dispatch chain + object-literal registry dispatch + RTK Query
  * generated-hook → endpoint + Pinia useStore().action() + Vuex string dispatch +
  * Celery task .delay()/.apply_async() → task body + Spring publishEvent → @EventListener +
- * MediatR Send/Publish → IRequestHandler/INotificationHandler).
+ * MediatR Send/Publish → IRequestHandler/INotificationHandler +
+ * Sidekiq Worker.perform_async → #perform).
  * Returns the count added. Never throws into indexing — callers wrap in try/catch.
  */
 export function synthesizeCallbackEdges(queries: QueryBuilder, ctx: ResolutionContext): number {
@@ -2468,6 +2558,7 @@ export function synthesizeCallbackEdges(queries: QueryBuilder, ctx: ResolutionCo
   const celeryEdges = celeryDispatchEdges(ctx);
   const springEdges = springEventEdges(ctx);
   const mediatrEdges = mediatrDispatchEdges(ctx);
+  const sidekiqEdges = sidekiqDispatchEdges(ctx);
 
   const merged: Edge[] = [];
   const seen = new Set<string>();
@@ -2499,6 +2590,7 @@ export function synthesizeCallbackEdges(queries: QueryBuilder, ctx: ResolutionCo
     ...celeryEdges,
     ...springEdges,
     ...mediatrEdges,
+    ...sidekiqEdges,
   ]) {
     const key = `${e.source}>${e.target}`;
     if (seen.has(key)) continue;

Some files were not shown because too many files changed in this commit