feat(extraction): index Metal shader files (.metal) via the C++ grammar (#1121) (#1151)

.metal was absent from EXTENSION_MAP, so Metal Shading Language files were
silently skipped. MSL ≈ C++14, and the C++ grammar extracts its functions,
structs, type aliases, and call edges at parity with plain C++ — except MSL's
post-declarator [[attribute]] annotations, which misparse struct fields into
spurious extends refs from the struct to the field's own type (a wrong
inheritance edge whenever the repo typedefs float3/float4x4 itself, common in
shared ShaderTypes.h). blankMetalAttributes blanks them pre-parse,
offset-preserving, following the blankCppExportMacros pattern (#1061), gated
to .metal files only — in regular C++ the attribute position is legal syntax
the grammar parses natively. The preParse hook gains an optional filePath
param to support the gate.

Validated on llama.cpp's ggml-metal.metal (10.7k lines: 130 kernels vs 113
`kernel void` ground-truth lines, rope_yarn resolves its 4 kernel callers)
and SDL's shaders (PQtoLinear ← GetOutputColor), 0 bogus extends edges.

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Colby Mchenry 1 day ago
parent
commit
cc89146454

+ 1 - 0
CHANGELOG.md

@@ -13,6 +13,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 - The Claude Code context hook now recognizes prompts that describe code in plain words — in any language — by checking the prompt's words against the symbol names actually in your project's index. Asking about "the state machine des commandes" finds `OrderStateMachine` with no keyword involved. Confidence decides how much gets injected: structural questions and prompts naming a real symbol still get full context up front; a plain-words match gets a short pointer to the matching symbols so the agent queries them itself; everything else stays silent, exactly as before.
 - Anonymous usage telemetry now counts how often the context hook injected context, offered a hint, or stayed silent — fixed counter names only; the prompt's content is never stored or sent. This makes the hook's accuracy measurable instead of guessed. The counters record what actually happened, not what was attempted: a lookup that errors or comes back empty counts as a distinct silent outcome, never as delivered context (#1143, thanks @inth3shadows).
+- Metal shader files (`.metal`) are now indexed. Metal Shading Language is close enough to C++ that vertex/fragment/kernel functions, structs, type aliases, and the calls between them all land in the graph — so shader pipelines in Apple-platform projects show up in impact analysis and flow traces instead of being silently skipped. Metal's `[[buffer(0)]]`-style attribute annotations are handled so they can't corrupt what gets extracted. Thanks @FluxKo for the report. (#1121)
 
 ### Fixes
 

+ 2 - 1
README.md

@@ -244,7 +244,7 @@ The reliable, universal payoff is **surgical context and speed**: CodeGraph coll
 | **Full-Text Search** | Find code by name instantly across your entire codebase, powered by FTS5 |
 | **Impact Analysis** | Trace callers, callees, and the full impact radius of any symbol before making changes |
 | **Always Fresh** | File watcher uses native OS events (FSEvents/inotify/ReadDirectoryChangesW) with debounced auto-sync — the graph stays current as you code, zero config |
-| **20+ Languages** | TypeScript, JavaScript, Python, Go, Rust, Java, C#, PHP, Ruby, C, C++, Objective-C, Swift, Kotlin, Scala, Dart, Lua, Luau, R, Svelte, Vue, Astro, Liquid, Pascal/Delphi |
+| **20+ Languages** | TypeScript, JavaScript, Python, Go, Rust, Java, C#, PHP, Ruby, C, C++, Objective-C, Metal, Swift, Kotlin, Scala, Dart, Lua, Luau, R, Svelte, Vue, Astro, Liquid, Pascal/Delphi |
 | **Framework-aware Routes** | Recognizes web-framework routing files and links URL patterns to their handlers across 17 frameworks |
 | **Mixed iOS / React Native / Expo** | Closes cross-language flows that static parsing misses: Swift ↔ ObjC bridging, React Native legacy bridge + TurboModules + Fabric view components, native → JS event emitters, Expo Modules |
 | **100% Local** | No data leaves your machine. No API keys. No external services. SQLite database only |
@@ -702,6 +702,7 @@ is written):
 | C | `.c`, `.h` | Full support |
 | C++ | `.cpp`, `.hpp`, `.cc` | Full support |
 | Objective-C | `.m`, `.mm`, `.h` | Partial support (classes, protocols, methods, `@property`, `#import`, message sends; `.mm` ObjC++ may parse incompletely) |
+| Metal | `.metal` | Full support (vertex/fragment/kernel functions, structs, type aliases, call edges — MSL parses as C++, with `[[attribute]]` annotations handled) |
 | Swift | `.swift` | Full support |
 | Kotlin | `.kt`, `.kts` | Full support |
 | Scala | `.scala`, `.sc` | Full support (classes, traits, methods, type aliases, Scala 3 enums) |

+ 118 - 1
__tests__/extraction.test.ts

@@ -11,7 +11,7 @@ import * as os from 'os';
 import { CodeGraph } from '../src';
 import { extractFromSource, scanDirectory, buildDefaultIgnore, discoverEmbeddedRepoRoots, buildScopeIgnore } from '../src/extraction';
 import { detectLanguage, isLanguageSupported, getSupportedLanguages, initGrammars, loadAllGrammars, isSourceFile } from '../src/extraction/grammars';
-import { stripCppTemplateArgs, blankCppExportMacros, blankCppInlineMacros, recoverMangledCppName } from '../src/extraction/languages/c-cpp';
+import { stripCppTemplateArgs, blankCppExportMacros, blankCppInlineMacros, blankMetalAttributes, recoverMangledCppName } from '../src/extraction/languages/c-cpp';
 import { normalizePath } from '../src/utils';
 
 beforeAll(async () => {
@@ -102,6 +102,11 @@ describe('Language Detection', () => {
     expect(detectLanguage('stdio.h', '#ifndef STDIO_H\nvoid printf();\n#endif\n')).toBe('c');
   });
 
+  it('should detect Metal shader files as C++ (#1121)', () => {
+    expect(detectLanguage('Shaders.metal')).toBe('cpp');
+    expect(isSourceFile('Renderer/Shaders.metal')).toBe(true);
+  });
+
   it('should return unknown for unsupported extensions', () => {
     expect(detectLanguage('styles.css')).toBe('unknown');
     expect(detectLanguage('data.json')).toBe('unknown');
@@ -2811,6 +2816,118 @@ class MYGAME_API UMyComponent : public UActorComponent { };
     });
   });
 
+  describe('Metal shader extraction (#1121)', () => {
+    // Metal Shading Language (≈ C++14) parses with the C++ grammar. MSL puts
+    // `[[attribute]]` annotations AFTER the declarator — a position
+    // tree-sitter-cpp misparses: a struct field with a trailing attribute
+    // emitted a spurious `extends` ref from the struct to the field's own type.
+    // blankMetalAttributes (preParse, `.metal`-gated) blanks them so extraction
+    // matches plain C++.
+    const METAL = `#include <metal_stdlib>
+using namespace metal;
+
+struct VertexIn {
+    float3 position [[attribute(0)]];
+    float2 texCoord [[attribute(1)]];
+};
+
+struct VertexOut {
+    float4 position [[position]];
+    float2 texCoord;
+};
+
+struct Uniforms {
+    float4x4 modelViewProjection;
+};
+
+static float4 applyGamma(float4 color) {
+    return pow(color, float4(1.0 / 2.2));
+}
+
+vertex VertexOut vertexShader(VertexIn in [[stage_in]],
+                              constant Uniforms &uniforms [[buffer(0)]]) {
+    VertexOut out;
+    out.position = uniforms.modelViewProjection * float4(in.position, 1.0);
+    out.texCoord = in.texCoord;
+    return out;
+}
+
+fragment float4 fragmentShader(VertexOut in [[stage_in]],
+                               texture2d<float> colorTexture [[texture(0)]],
+                               sampler textureSampler [[sampler(0)]]) {
+    float4 color = colorTexture.sample(textureSampler, in.texCoord);
+    return applyGamma(color);
+}
+
+kernel void computeBlur(texture2d<float, access::read> inTexture [[texture(0)]],
+                        texture2d<float, access::write> outTexture [[texture(1)]],
+                        uint2 gid [[thread_position_in_grid]]) {
+    float4 color = inTexture.read(gid);
+    outTexture.write(color, gid);
+}
+`;
+
+    it('extracts vertex/fragment/kernel functions, structs, and calls from a .metal file', () => {
+      const result = extractFromSource('Shaders.metal', METAL);
+      expect(result.errors).toHaveLength(0);
+
+      const functions = result.nodes.filter((n) => n.kind === 'function').map((n) => n.name);
+      expect(functions).toEqual(
+        expect.arrayContaining(['applyGamma', 'vertexShader', 'fragmentShader', 'computeBlur'])
+      );
+      const structs = result.nodes.filter((n) => n.kind === 'struct').map((n) => n.name);
+      expect(structs).toEqual(expect.arrayContaining(['VertexIn', 'VertexOut', 'Uniforms']));
+      expect(result.nodes.find((n) => n.kind === 'import')?.name).toBe('metal_stdlib');
+
+      // Attribute blanking is offset-preserving, so positions stay exact.
+      const vertexFn = result.nodes.find((n) => n.name === 'vertexShader')!;
+      expect(vertexFn.startLine).toBe(22);
+
+      // The shader call graph connects: fragmentShader → applyGamma.
+      expect(
+        result.unresolvedReferences.find(
+          (r) => r.referenceKind === 'calls' && r.referenceName === 'applyGamma'
+        )
+      ).toBeTruthy();
+
+      // The regression the blanking fixes: field attributes (`float3 position
+      // [[attribute(0)]];`) misparsed into `extends` refs from the struct to the
+      // field's type — a wrong inheritance edge whenever the repo defines that
+      // type itself (simd typedefs in a shared ShaderTypes.h are common).
+      expect(result.unresolvedReferences.filter((r) => r.referenceKind === 'extends')).toHaveLength(0);
+    });
+
+    it('blankMetalAttributes blanks every attribute form, offset-preserving', () => {
+      const inp = [
+        'float4 position [[position]];',
+        'constant Uniforms &u [[buffer(0)]]',
+        'float2 uv [[user(locn0)]];',
+        'device float *out [[buffer(0), raster_order_group(0)]]',
+      ].join('\n');
+      const out = blankMetalAttributes(inp);
+      expect(out.length).toBe(inp.length); // every byte offset preserved
+      expect(out).not.toContain('[[');
+      // Nothing but the attributes changed: collapsing the blank runs gives the
+      // plain declarations back, newlines untouched.
+      expect(out.split('\n').map((l) => l.replace(/ +/g, ' ').trimEnd())).toEqual([
+        'float4 position ;',
+        'constant Uniforms &u',
+        'float2 uv ;',
+        'device float *out',
+      ]);
+    });
+
+    it('blankMetalAttributes never touches non-attribute [[ sequences', () => {
+      for (const c of [
+        'auto x = arr[[]{ return 0; }()];', // lambda in subscript — the only other [[ in C++-family code
+        'int y = a[b[i]];', // nested subscript
+        'int z = 1;', // no [[ at all — early-return path
+      ]) {
+        expect(blankMetalAttributes(c)).toBe(c);
+      }
+    });
+  });
+
   describe('C++ forward declarations do not mint phantom class nodes (#1093)', () => {
     // `class Foo;` parses as a bodiless class_specifier. Repeated across headers,
     // each forward decl minted a phantom bodiless `class` node that crowded out —

+ 4 - 0
src/extraction/grammars.ts

@@ -108,6 +108,10 @@ export const EXTENSION_MAP: Record<string, Language> = {
   '.luau': 'luau',
   '.m': 'objc',
   '.mm': 'objc',
+  // Metal Shading Language ≈ C++14: the C++ grammar extracts its functions,
+  // structs, and calls. MSL-specific `[[attribute]]` annotations are blanked
+  // pre-parse for `.metal` files (see blankMetalAttributes in c-cpp.ts). (#1121)
+  '.metal': 'cpp',
   // XML: file-level tracking; the MyBatis extractor matches `<mapper namespace="...">`
   // shape and emits SQL-statement nodes (other XML returns empty).
   '.xml': 'xml',
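
The mapping added in this hunk can be pictured with a minimal sketch of extension-based lookup. This is an illustration under stated assumptions, not the project's `detectLanguage`: the names `EXTENSION_MAP_SKETCH` and `detectLanguageSketch` are hypothetical, the map holds only the entries visible in the hunk, and lowercasing the extension is an assumption (the real function also accepts file content, which is ignored here).

```typescript
// Hypothetical, simplified model of extension-based language detection.
// Only the map entries themselves are taken from the hunk above.
type Language = 'cpp' | 'objc' | 'luau' | 'xml' | 'unknown';

const EXTENSION_MAP_SKETCH: Record<string, Language> = {
  '.luau': 'luau',
  '.m': 'objc',
  '.mm': 'objc',
  '.metal': 'cpp', // Metal shaders reuse the C++ language id
  '.xml': 'xml',
};

function detectLanguageSketch(filePath: string): Language {
  // Look only at the last path segment so dots in directory names are ignored.
  const base = filePath.slice(filePath.lastIndexOf('/') + 1);
  const dot = base.lastIndexOf('.');
  if (dot <= 0) return 'unknown'; // no extension, or a dotfile
  const ext = base.slice(dot).toLowerCase(); // assumption: case-insensitive match
  const hit: Language | undefined = EXTENSION_MAP_SKETCH[ext];
  return hit ?? 'unknown';
}

const metalLang = detectLanguageSketch('Renderer/Shaders.metal');
const cssLang = detectLanguageSketch('styles.css');
```

Because `.metal` maps to the existing `cpp` id rather than a new language, nothing downstream can tell the dialects apart by language alone, which is why the gate in `preParseCppSource` keys off the file path.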

+ 35 - 3
src/extraction/languages/c-cpp.ts

@@ -356,10 +356,42 @@ export function recoverMangledCppName(name: string): string {
   return candidate;
 }
 
+/**
+ * Blank Metal Shading Language `[[attribute]]` annotations before parsing.
+ * MSL (≈ C++14) puts attributes AFTER the declarator — `float4 position
+ * [[position]];`, `constant Uniforms &u [[buffer(0)]]` — a position
+ * tree-sitter-cpp can't reconcile: a struct field with a trailing attribute
+ * misparses into a shape that emits a spurious `extends` reference from the
+ * struct to the field's *type* (`VertexIn extends float3`), which becomes a
+ * wrong inheritance edge whenever the repo defines that type itself (simd
+ * typedefs in a shared ShaderTypes.h are common). Replacing the attribute with
+ * equal-length spaces preserves every byte offset and lets fields and
+ * parameters parse as ordinary declarations, mirroring the macro blanks above.
+ *
+ * Matched tightly to the attribute shape — `[[ident]]`, `[[ident(args)]]`, and
+ * comma-separated lists (`[[buffer(0), raster_order_group(0)]]`) — so a
+ * subscripted lambda call (`arr[[]{ … }()]`, the only other way `[[` appears in
+ * C++-family source) can never match: after `[[` a lambda continues with `]`,
+ * never an identifier followed by `]]`. Applied ONLY to `.metal` files — in
+ * regular C++ the pre-declarator attribute position (`[[nodiscard]] int f()`)
+ * is legal syntax the grammar parses natively, and blanking it would be pure
+ * blast radius. (#1121)
+ */
+const METAL_ATTRIBUTE_RE =
+  /\[\[\s*[A-Za-z_]\w*(?:\s*\([^()\n]*\))?(?:\s*,\s*[A-Za-z_]\w*(?:\s*\([^()\n]*\))?)*\s*\]\]/g;
+export function blankMetalAttributes(source: string): string {
+  if (source.indexOf('[[') === -1) return source;
+  return source.replace(METAL_ATTRIBUTE_RE, (m) => ' '.repeat(m.length));
+}
+
 /** C/C++ source pre-processing before tree-sitter: recover both macro-annotated
- * class definitions and macro-prefixed function definitions. Offset-preserving. */
-function preParseCppSource(source: string): string {
-  return blankCppInlineMacros(blankCppExportMacros(source));
+ * class definitions and macro-prefixed function definitions — plus, for `.metal`
+ * shaders (parsed with the C++ grammar), MSL attribute annotations. Offset-preserving. */
+function preParseCppSource(source: string, filePath?: string): string {
+  const blanked = blankCppInlineMacros(blankCppExportMacros(source));
+  return filePath && filePath.toLowerCase().endsWith('.metal')
+    ? blankMetalAttributes(blanked)
+    : blanked;
 }
 
 export const cppExtractor: LanguageExtractor = {
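
To see what "offset-preserving" means in practice, here is a small standalone check. The regex and the function body are copied from the hunk above; the sample strings and variable names are illustrative only.

```typescript
// Regex and function copied from the diff above; the inputs below are illustrative.
const METAL_ATTRIBUTE_RE =
  /\[\[\s*[A-Za-z_]\w*(?:\s*\([^()\n]*\))?(?:\s*,\s*[A-Za-z_]\w*(?:\s*\([^()\n]*\))?)*\s*\]\]/g;

function blankMetalAttributes(source: string): string {
  if (source.indexOf('[[') === -1) return source; // fast path: nothing to blank
  return source.replace(METAL_ATTRIBUTE_RE, (m) => ' '.repeat(m.length));
}

// A struct field with a trailing MSL attribute.
const field = 'float3 position [[attribute(0)]];';
const blanked = blankMetalAttributes(field);

// Same length, and the semicolon sits at the same index before and after,
// so any position computed on the blanked text is valid for the original.
const sameLength = blanked.length === field.length;
const sameSemicolon = blanked.indexOf(';') === field.indexOf(';');

// A subscripted lambda starts with `[[` but has no identifier right after it,
// so the pattern does not match and the text comes back unchanged.
const lambda = 'auto x = arr[[]{ return 0; }()];';
const lambdaUntouched = blankMetalAttributes(lambda) === lambda;

// The argument list is matched as `[^()\n]*`, so an attribute whose arguments
// contain nested parentheses is left as-is by this pattern.
const nested = 'float4 c [[buffer(f(0))]];';
const nestedUntouched = blankMetalAttributes(nested) === nested;
```

Replacing each match with spaces of equal length, rather than deleting it, is what keeps line and column numbers reported by the parser aligned with the file on disk.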

+ 3 - 1
src/extraction/tree-sitter-types.ts

@@ -85,8 +85,10 @@ export interface LanguageExtractor {
    * grammar mis-parses inside enum bodies). MUST preserve byte offsets (replace
    * removed text with spaces, keep newlines) so node positions and getNodeText
    * stay correct; the returned string is used for both parsing and extraction.
+   * `filePath` lets a transform key off the concrete file extension when one
+   * language id serves several dialects (C++ also parses `.metal` shaders).
    */
-  preParse?: (source: string) => string;
+  preParse?: (source: string, filePath?: string) => string;
 
   // --- Node type mappings ---
 

+ 1 - 1
src/extraction/tree-sitter.ts

@@ -423,7 +423,7 @@ export class TreeSitterExtractor {
       // this.source so downstream getNodeText reads the same bytes the parser
       // saw (identical outside the blanked directive lines).
       if (this.extractor?.preParse) {
-        this.source = this.extractor.preParse(this.source);
+        this.source = this.extractor.preParse(this.source, this.filePath);
       }
       this.tree = parser.parse(this.source) ?? null;
       if (!this.tree) {
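
The widened hook signature can be sketched in isolation. Only the `preParse` name and its `(source, filePath?)` shape come from the diff; `ExtractorSketch`, `blankDoubleBrackets`, `prepare`, and the sample inputs are hypothetical, and the stand-in transform is deliberately cruder than the project's regex.

```typescript
// Hypothetical, pared-down model of the hook. Only `preParse` and its
// signature are taken from the diff; everything else is illustrative.
interface ExtractorSketch {
  preParse?: (source: string, filePath?: string) => string;
}

// Stand-in transform: blanks any single-line `[[...]]` run with spaces.
// This is NOT the project's pattern, just enough to show the gate.
function blankDoubleBrackets(source: string): string {
  return source.replace(/\[\[[^\]\n]*\]\]/g, (m) => ' '.repeat(m.length));
}

const cppSketch: ExtractorSketch = {
  // Gate on the concrete extension: one language id, two dialects.
  preParse: (source, filePath) =>
    filePath && filePath.toLowerCase().endsWith('.metal')
      ? blankDoubleBrackets(source)
      : source,
};

// A hook written against the old one-argument shape still type-checks,
// because a callback may declare fewer parameters than its type allows.
const legacySketch: ExtractorSketch = {
  preParse: (source) => source,
};

// Caller side, mirroring the one-line change in tree-sitter.ts.
function prepare(extractor: ExtractorSketch, source: string, filePath: string): string {
  return extractor.preParse ? extractor.preParse(source, filePath) : source;
}

const decl = '[[nodiscard]] int f();';
const asCpp = prepare(cppSketch, decl, 'util.cpp'); // left alone
const asMetal = prepare(cppSketch, 'float4 position [[position]];', 'Shaders.metal');
const asLegacy = prepare(legacySketch, decl, 'Shaders.metal'); // extra argument ignored
```

Making `filePath` optional keeps existing extractors source-compatible while letting the C++ extractor distinguish `.metal` input from ordinary C++.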