
Add codegraph_explore tool and expand exclude patterns

- Add codegraph_explore MCP tool for deep exploration with condensed output
- Expand default exclude patterns for framework build outputs (.next, .nuxt, .expo, etc.)
- Increase node ID hash length from 16 to 32 chars to prevent collisions
- Add feature request detection with UX clarification reminders
- Update README with MCP tools reference and best practices
- Update CLAUDE.md with context usage guidelines

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Colby McHenry 5 months ago
parent
commit
0c168c4d8b
7 changed files with 489 additions and 20 deletions
  1. CLAUDE.md (+29 -0)
  2. README.md (+73 -6)
  3. package-lock.json (+2 -2)
  4. package.json (+1 -1)
  5. src/extraction/tree-sitter.ts (+4 -1)
  6. src/mcp/tools.ts (+264 -3)
  7. src/types.ts (+116 -7)

+ 29 - 0
CLAUDE.md

@@ -120,6 +120,35 @@ codegraph hooks install     # Install git auto-sync
 codegraph serve --mcp       # Start MCP server
 ```
 
+## MCP Tools Best Practices
+
+When using CodeGraph MCP tools, follow these guidelines to minimize context usage:
+
+### For Complex Tasks (features, refactoring, architecture)
+Use `codegraph_explore` - it does intensive exploration internally and returns a condensed brief:
+```
+codegraph_explore(task: "implement user authentication with OAuth")
+```
+This replaces multiple tool calls and keeps your main context clean.
+
+### For Simple Lookups
+Use targeted tools directly:
+- `codegraph_search` - Find symbols by name
+- `codegraph_node` - Get details about one symbol
+- `codegraph_callers/callees` - Understand usage patterns
+
+### Context Usage Comparison
+| Approach | Context Impact |
+|----------|---------------|
+| Multiple `codegraph_*` calls | High - each result stays in context |
+| Single `codegraph_explore` | Low - returns condensed summary |
+
+### Important
+CodeGraph provides **code context**, not product requirements. For new features, still ask the user about:
+- UX preferences and behavior
+- Edge cases and error handling
+- Acceptance criteria
+
 ## Test Structure
 
 Tests are in `__tests__/` directory with files mirroring the module structure:

+ 73 - 6
README.md

@@ -139,8 +139,12 @@ CodeGraph builds a semantic knowledge graph of codebases for better code explora
 
 ### If `.codegraph/` exists in the project
 
-Use the codegraph MCP tools instead of manually searching:
+**For complex tasks (features, refactoring, multi-file changes):**
+Use `codegraph_explore` first - it does deep exploration internally and returns a condensed brief, keeping your context clean:
 
+codegraph_explore(task: "implement bundle product swapping", keywords: "bundle,swap,subscription")
+
+**For simple lookups**, use targeted tools:
 - `codegraph_search` - Find symbols by name
 - `codegraph_context` - Get context for a task/issue
 - `codegraph_callers` - Find what calls a function
@@ -149,11 +153,7 @@ Use the codegraph MCP tools instead of manually searching:
 - `codegraph_node` - Get details about a specific symbol
 - `codegraph_status` - Check index status
 
-Use these tools when:
-- Exploring unfamiliar code
-- Finding where a function is used
-- Understanding dependencies before making changes
-- Building context for bug fixes or features
+**Important:** CodeGraph provides CODE context, not product requirements. For new features, still ask the user about UX preferences, edge cases, and acceptance criteria before implementing.
 
 The index auto-updates via git post-commit hook, so no manual sync needed.
 
@@ -287,6 +287,73 @@ codegraph serve --mcp                    # Start MCP server (stdio)
 codegraph serve --mcp --path /project    # Specify project path
 ```
 
+## 🔌 MCP Tools Reference
+
+When running as an MCP server, CodeGraph exposes these tools to AI assistants:
+
+### `codegraph_explore` ⭐ Recommended for complex tasks
+
+Deep exploration that returns a condensed brief. Use this for features, refactoring, or multi-file changes to keep your main context clean.
+
+```
+codegraph_explore(task: "implement user authentication", keywords: "auth,login,user")
+```
+
+**Returns:** Key files, entry points, types, functions, data flow, and suggested next steps.
+
+### `codegraph_context`
+
+Build context for a specific task. Good for focused queries.
+
+```
+codegraph_context(task: "fix checkout validation bug", maxNodes: 20)
+```
+
+### `codegraph_search`
+
+Quick symbol search by name. Returns locations only.
+
+```
+codegraph_search(query: "UserService", kind: "class", limit: 10)
+```
+
+### `codegraph_callers` / `codegraph_callees`
+
+Find what calls a function, or what a function calls.
+
+```
+codegraph_callers(symbol: "validatePayment", limit: 20)
+codegraph_callees(symbol: "processOrder", limit: 20)
+```
+
+### `codegraph_impact`
+
+Analyze what code would be affected by changing a symbol.
+
+```
+codegraph_impact(symbol: "UserService", depth: 2)
+```
+
+### `codegraph_node`
+
+Get details about a specific symbol. Use `includeCode: true` only when needed.
+
+```
+codegraph_node(symbol: "authenticate", includeCode: true)
+```
+
+### `codegraph_status`
+
+Check index health and statistics.
+
+### Context Usage Best Practices
+
+| Approach | Context Impact | When to Use |
+|----------|----------------|-------------|
+| `codegraph_explore` | **Low** - condensed summary | Complex features, refactoring |
+| Multiple tool calls | **High** - each result accumulates | Avoid for complex tasks |
+| `codegraph_context` | **Medium** | Focused, single-area queries |
+
 ## 📚 Library Usage
 
 CodeGraph can also be used as a library in your Node.js applications:

+ 2 - 2
package-lock.json

@@ -1,12 +1,12 @@
 {
   "name": "codegraph",
-  "version": "0.1.2",
+  "version": "0.1.5",
   "lockfileVersion": 3,
   "requires": true,
   "packages": {
     "": {
       "name": "codegraph",
-      "version": "0.1.2",
+      "version": "0.1.5",
       "license": "MIT",
       "dependencies": {
         "@xenova/transformers": "^2.17.0",

+ 1 - 1
package.json

@@ -1,6 +1,6 @@
 {
   "name": "@colbymchenry/codegraph",
-  "version": "0.1.4",
+  "version": "0.1.5",
   "description": "A local-first code intelligence system that builds a semantic knowledge graph from any codebase",
   "main": "dist/index.js",
   "types": "dist/index.d.ts",

+ 4 - 1
src/extraction/tree-sitter.ts

@@ -19,6 +19,9 @@ import { getParser, detectLanguage, isLanguageSupported } from './grammars';
 
 /**
  * Generate a unique node ID
+ *
+ * Uses a 32-character (128-bit) hash to avoid collisions when indexing
+ * large codebases with many files containing similar symbols.
  */
 export function generateNodeId(
   filePath: string,
@@ -30,7 +33,7 @@ export function generateNodeId(
     .createHash('sha256')
     .update(`${filePath}:${kind}:${name}:${line}`)
     .digest('hex')
-    .substring(0, 16);
+    .substring(0, 32);
   return `${kind}:${hash}`;
 }
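
The effect of the longer prefix can be checked with a small standalone sketch. It mirrors `generateNodeId` from the hunk above; the `hexChars` parameter and the `approxCollisionProbability` helper are hypothetical additions for comparison only, and the probability uses the standard birthday-bound approximation, assuming the SHA-256 prefix behaves like a uniformly random value.

```typescript
import { createHash } from 'node:crypto';

// Mirrors generateNodeId from the diff above. `hexChars` is a hypothetical
// parameter added here only so both lengths can be compared side by side.
function nodeId(
  filePath: string,
  kind: string,
  name: string,
  line: number,
  hexChars: number = 32,
): string {
  const hash = createHash('sha256')
    .update(`${filePath}:${kind}:${name}:${line}`)
    .digest('hex')
    .substring(0, hexChars);
  return `${kind}:${hash}`;
}

// Birthday-bound approximation: P(collision) ~ n(n-1)/2 / 2^bits for n random IDs.
function approxCollisionProbability(n: number, hexChars: number): number {
  const bits = hexChars * 4; // each hex character carries 4 bits
  return (n * (n - 1)) / 2 / Math.pow(2, bits);
}

const oldId = nodeId('src/auth.ts', 'function', 'login', 42, 16);
const newId = nodeId('src/auth.ts', 'function', 'login', 42);

console.log(oldId.length, newId.length); // 25 41 ("function:" plus 16 or 32 hex chars)
console.log(approxCollisionProbability(1e6, 16)); // 64-bit prefix
console.log(approxCollisionProbability(1e6, 32)); // 128-bit prefix
```

Because the ID is derived only from `filePath`, `kind`, `name`, and `line`, two symbols that share all four inputs receive the same ID at any prefix length; the longer prefix only reduces accidental hash collisions between different inputs.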
 

+ 264 - 3
src/mcp/tools.ts

@@ -71,7 +71,7 @@ export const tools: ToolDefinition[] = [
   },
   {
     name: 'codegraph_context',
-    description: 'PRIMARY TOOL: Build comprehensive context for a task. Returns entry points, related symbols, and key code - often enough to understand the codebase without additional tool calls.',
+    description: 'PRIMARY TOOL: Build comprehensive context for a task. Returns entry points, related symbols, and key code - often enough to understand the codebase without additional tool calls. NOTE: This provides CODE context, not product requirements. For new features, still clarify UX/behavior questions with the user before implementing.',
     inputSchema: {
       type: 'object',
       properties: {
@@ -177,6 +177,29 @@ export const tools: ToolDefinition[] = [
       properties: {},
     },
   },
+  {
+    name: 'codegraph_explore',
+    description: 'RECOMMENDED FOR COMPLEX TASKS: Deep exploration that returns a condensed brief. Internally performs multiple symbol searches and call graph analysis, then synthesizes the results into a compact summary. Use this instead of multiple codegraph_* calls to keep your context clean. Returns: key files, entry points, key types and functions, call relationships, and files to read next.',
+    inputSchema: {
+      type: 'object',
+      properties: {
+        task: {
+          type: 'string',
+          description: 'Detailed description of the feature, bug, or task to explore',
+        },
+        focus: {
+          type: 'string',
+          description: 'Optional focus area: "architecture" (structure & patterns), "implementation" (specific code), or "impact" (what would change). Default: auto-detect.',
+          enum: ['architecture', 'implementation', 'impact'],
+        },
+        keywords: {
+          type: 'string',
+          description: 'Optional comma-separated keywords to search for (e.g., "bundle,swap,subscription")',
+        },
+      },
+      required: ['task'],
+    },
+  },
 ];
 
 /**
@@ -205,6 +228,8 @@ export class ToolHandler {
           return await this.handleNode(args);
         case 'codegraph_status':
           return await this.handleStatus();
+        case 'codegraph_explore':
+          return await this.handleExplore(args);
         default:
           return this.errorResult(`Unknown tool: ${toolName}`);
       }
@@ -248,13 +273,47 @@ export class ToolHandler {
       format: 'markdown',
     });
 
+    // Detect if this looks like a feature request (vs bug fix or exploration)
+    const isFeatureQuery = this.looksLikeFeatureRequest(task);
+    const reminder = isFeatureQuery
+      ? '\n\n---\n**Note:** This is code context only. For new features, consider asking the user about UX preferences, edge cases, and acceptance criteria before implementing.'
+      : '';
+
     // buildContext returns string when format is 'markdown'
     if (typeof context === 'string') {
-      return this.textResult(context);
+      return this.textResult(context + reminder);
     }
 
     // If it returns TaskContext, format it
-    return this.textResult(this.formatTaskContext(context));
+    return this.textResult(this.formatTaskContext(context) + reminder);
+  }
+
+  /**
+   * Heuristic to detect if a query looks like a feature request
+   */
+  private looksLikeFeatureRequest(task: string): boolean {
+    const featureKeywords = [
+      'add', 'create', 'implement', 'build', 'enable', 'allow',
+      'new feature', 'support for', 'ability to', 'want to',
+      'should be able', 'need to add', 'swap', 'edit', 'modify'
+    ];
+    const bugKeywords = [
+      'fix', 'bug', 'error', 'broken', 'crash', 'issue', 'problem',
+      'not working', 'fails', 'undefined', 'null'
+    ];
+    const explorationKeywords = [
+      'how does', 'where is', 'what is', 'find', 'show me',
+      'explain', 'understand', 'explore'
+    ];
+
+    const lowerTask = task.toLowerCase();
+
+    // If it's clearly a bug or exploration, not a feature
+    if (bugKeywords.some(k => lowerTask.includes(k))) return false;
+    if (explorationKeywords.some(k => lowerTask.includes(k))) return false;
+
+    // If it matches feature keywords, it's likely a feature request
+    return featureKeywords.some(k => lowerTask.includes(k));
   }
 
   /**
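
The heuristic in the hunk above matches keywords with `String.prototype.includes`, so a keyword also matches inside longer words (`fix` inside "prefix", `edit` inside "credit"). The sketch below reproduces that behavior with abbreviated keyword lists and contrasts it with a word-boundary variant. `byWholeWord`, `hasWord`, and `escapeRegExp` are hypothetical helpers for illustration, not part of the commit.

```typescript
// Keyword lists abbreviated from the diff above.
const featureKeywords: string[] = ['add', 'create', 'implement', 'support for', 'edit'];
const bugKeywords: string[] = ['fix', 'bug', 'error', 'null'];

// Substring matching, as in the committed heuristic.
function bySubstring(task: string): boolean {
  const lower = task.toLowerCase();
  if (bugKeywords.some(k => lower.includes(k))) return false;
  return featureKeywords.some(k => lower.includes(k));
}

// Hypothetical alternative: match whole words or phrases only.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function hasWord(text: string, keyword: string): boolean {
  return new RegExp(`\\b${escapeRegExp(keyword)}\\b`, 'i').test(text);
}

function byWholeWord(task: string): boolean {
  if (bugKeywords.some(k => hasWord(task, k))) return false;
  return featureKeywords.some(k => hasWord(task, k));
}

// "prefix" contains "fix", so the substring version treats this as a bug report.
console.log(bySubstring('implement prefix search')); // false
console.log(byWholeWord('implement prefix search')); // true

// "credit" contains "edit", so the substring version treats this as a feature.
console.log(bySubstring('show credit balance')); // true
console.log(byWholeWord('show credit balance')); // false
```

Whether the stricter matching is preferable depends on how often task descriptions contain such words; the reminder is advisory, so a false positive only costs one extra note in the output.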
@@ -387,6 +446,208 @@ export class ToolHandler {
     return this.textResult(lines.join('\n'));
   }
 
+  /**
+   * Handle codegraph_explore - the "sub-agent" that does intensive exploration
+   * and returns a condensed brief
+   */
+  private async handleExplore(args: Record<string, unknown>): Promise<ToolResult> {
+    const task = args.task as string;
+    const focus = args.focus as string | undefined;
+    const keywordsArg = args.keywords as string | undefined;
+
+    // Phase 1: Extract search terms
+    const keywords = this.extractKeywords(task, keywordsArg);
+
+    // Phase 2: Find relevant symbols (internal, not returned directly)
+    const symbolMap = new Map<string, Node>();
+    const fileSet = new Set<string>();
+
+    for (const keyword of keywords.slice(0, 5)) { // Limit to 5 keywords
+      const results = this.cg.searchNodes(keyword, { limit: 10 });
+      for (const r of results) {
+        if (!symbolMap.has(r.node.id)) {
+          symbolMap.set(r.node.id, r.node);
+          fileSet.add(r.node.filePath);
+        }
+      }
+    }
+
+    // Phase 3: Analyze call relationships for top symbols
+    const callGraphInsights: string[] = [];
+    const topSymbols = Array.from(symbolMap.values())
+      .filter(n => n.kind === 'function' || n.kind === 'method' || n.kind === 'component')
+      .slice(0, 5);
+
+    for (const symbol of topSymbols) {
+      const callers = this.cg.getCallers(symbol.id);
+      const callees = this.cg.getCallees(symbol.id);
+
+      if (callers.length > 0 || callees.length > 0) {
+        const callerNames = callers.slice(0, 3).map(c => c.node.name).join(', ');
+        const calleeNames = callees.slice(0, 3).map(c => c.node.name).join(', ');
+
+        let insight = `**${symbol.name}**`;
+        if (callers.length > 0) insight += ` ← called by: ${callerNames}${callers.length > 3 ? '...' : ''}`;
+        if (callees.length > 0) insight += ` → calls: ${calleeNames}${callees.length > 3 ? '...' : ''}`;
+        callGraphInsights.push(insight);
+      }
+    }
+
+    // Phase 4: Identify key entry points and patterns
+    const components = Array.from(symbolMap.values()).filter(n => n.kind === 'component');
+    const routes = Array.from(symbolMap.values()).filter(n => n.kind === 'route');
+    const interfaces = Array.from(symbolMap.values()).filter(n => n.kind === 'interface' || n.kind === 'type_alias');
+    const functions = Array.from(symbolMap.values()).filter(n => n.kind === 'function' || n.kind === 'method');
+
+    // Phase 5: Build condensed brief
+    const brief = this.buildExploreBrief({
+      task,
+      focus,
+      keywords,
+      files: Array.from(fileSet),
+      components,
+      routes,
+      interfaces,
+      functions,
+      callGraphInsights,
+      totalSymbols: symbolMap.size,
+    });
+
+    // Add feature request reminder if applicable
+    const isFeatureQuery = this.looksLikeFeatureRequest(task);
+    const reminder = isFeatureQuery
+      ? '\n\n---\n**Before implementing:** Clarify with the user: UX preferences, edge cases, error handling, and acceptance criteria.'
+      : '';
+
+    return this.textResult(brief + reminder);
+  }
+
+  /**
+   * Extract keywords from task description
+   */
+  private extractKeywords(task: string, explicitKeywords?: string): string[] {
+    const keywords: string[] = [];
+
+    // Add explicit keywords first
+    if (explicitKeywords) {
+      keywords.push(...explicitKeywords.split(',').map(k => k.trim()).filter(Boolean));
+    }
+
+    // Extract likely code identifiers from task (camelCase, PascalCase, snake_case)
+    const identifierPattern = /\b([A-Z][a-zA-Z0-9]*|[a-z][a-zA-Z0-9]*[A-Z][a-zA-Z0-9]*|[a-z]+_[a-z_]+)\b/g;
+    const matches = task.match(identifierPattern) || [];
+    keywords.push(...matches);
+
+    // Extract quoted terms
+    const quotedPattern = /"([^"]+)"|'([^']+)'/g;
+    let match;
+    while ((match = quotedPattern.exec(task)) !== null) {
+      const quoted = match[1] || match[2];
+      if (quoted) keywords.push(quoted);
+    }
+
+    // Extract remaining words longer than 3 characters, minus common stopwords
+    const commonTerms = task.toLowerCase()
+      .split(/\s+/)
+      .filter(word =>
+        word.length > 3 &&
+        !['this', 'that', 'with', 'from', 'have', 'been', 'will', 'would', 'could', 'should', 'when', 'where', 'what', 'which', 'their', 'there', 'these', 'those', 'about', 'into', 'then', 'than', 'some', 'other', 'after', 'before'].includes(word)
+      );
+    keywords.push(...commonTerms);
+
+    // Deduplicate and return
+    return [...new Set(keywords)];
+  }
+
+  /**
+   * Build a condensed exploration brief
+   */
+  private buildExploreBrief(data: {
+    task: string;
+    focus?: string;
+    keywords: string[];
+    files: string[];
+    components: Node[];
+    routes: Node[];
+    interfaces: Node[];
+    functions: Node[];
+    callGraphInsights: string[];
+    totalSymbols: number;
+  }): string {
+    const lines: string[] = [
+      '## Exploration Brief',
+      '',
+      `**Task:** ${data.task}`,
+      `**Found:** ${data.totalSymbols} relevant symbols across ${data.files.length} files`,
+      '',
+    ];
+
+    // Key files (first 10, in discovery order)
+    if (data.files.length > 0) {
+      lines.push('### Key Files');
+      const topFiles = data.files.slice(0, 10);
+      for (const file of topFiles) {
+        lines.push(`- ${file}`);
+      }
+      if (data.files.length > 10) {
+        lines.push(`- ... and ${data.files.length - 10} more`);
+      }
+      lines.push('');
+    }
+
+    // Entry points
+    const entryPoints: string[] = [];
+    if (data.components.length > 0) {
+      entryPoints.push(`**Components:** ${data.components.slice(0, 5).map(n => `${n.name} (${n.filePath}:${n.startLine})`).join(', ')}`);
+    }
+    if (data.routes.length > 0) {
+      entryPoints.push(`**Routes:** ${data.routes.slice(0, 5).map(n => `${n.name} (${n.filePath}:${n.startLine})`).join(', ')}`);
+    }
+    if (entryPoints.length > 0) {
+      lines.push('### Entry Points');
+      lines.push(...entryPoints);
+      lines.push('');
+    }
+
+    // Key types/interfaces
+    if (data.interfaces.length > 0) {
+      lines.push('### Key Types');
+      for (const iface of data.interfaces.slice(0, 5)) {
+        lines.push(`- **${iface.name}** - ${iface.filePath}:${iface.startLine}`);
+      }
+      lines.push('');
+    }
+
+    // Key functions
+    if (data.functions.length > 0) {
+      lines.push('### Key Functions');
+      for (const fn of data.functions.slice(0, 8)) {
+        const sig = fn.signature ? ` - \`${fn.signature.slice(0, 60)}${fn.signature.length > 60 ? '...' : ''}\`` : '';
+        lines.push(`- **${fn.name}** (${fn.filePath}:${fn.startLine})${sig}`);
+      }
+      lines.push('');
+    }
+
+    // Call graph insights
+    if (data.callGraphInsights.length > 0) {
+      lines.push('### Data Flow');
+      for (const insight of data.callGraphInsights.slice(0, 6)) {
+        lines.push(`- ${insight}`);
+      }
+      lines.push('');
+    }
+
+    // Suggested files to read (actionable)
+    lines.push('### Suggested Next Steps');
+    lines.push('Read these files for implementation details:');
+    const suggestedFiles = data.files.slice(0, 3);
+    for (const [i, file] of suggestedFiles.entries()) {
+      lines.push(`${i + 1}. \`${file}\``);
+    }
+
+    return lines.join('\n');
+  }
+
   // =========================================================================
   // Formatting helpers (compact by default to reduce context usage)
   // =========================================================================
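
The order in which `extractKeywords` collects terms matters, because `handleExplore` searches only the first five. The standalone sketch below traces that order for one task string. It follows the logic in the hunk above but uses an abbreviated stopword list and omits the quoted-term step, so treat it as an illustration rather than the committed code.

```typescript
// Abbreviated stopword list; the diff above has the full one.
const STOPWORDS: string[] = ['this', 'that', 'with', 'from', 'when', 'where', 'what'];

function extractKeywords(task: string, explicitKeywords?: string): string[] {
  const keywords: string[] = [];

  // 1. Explicit comma-separated keywords come first.
  if (explicitKeywords) {
    keywords.push(...explicitKeywords.split(',').map(k => k.trim()).filter(Boolean));
  }

  // 2. Identifier-like tokens (capitalized, camelCase, snake_case).
  const identifierPattern =
    /\b([A-Z][a-zA-Z0-9]*|[a-z][a-zA-Z0-9]*[A-Z][a-zA-Z0-9]*|[a-z]+_[a-z_]+)\b/g;
  keywords.push(...(task.match(identifierPattern) || []));

  // 3. Lowercased words longer than 3 characters that are not stopwords.
  const words = task
    .toLowerCase()
    .split(/\s+/)
    .filter(w => w.length > 3 && !STOPWORDS.includes(w));
  keywords.push(...words);

  // Deduplicate while keeping first-seen order.
  return Array.from(new Set(keywords));
}

const fromTask = extractKeywords('implement user authentication with OAuth');
console.log(fromTask);
// → [ 'OAuth', 'implement', 'user', 'authentication', 'oauth' ]

const withExplicit = extractKeywords('implement user authentication with OAuth', 'auth,login,user');
console.log(withExplicit.slice(0, 5)); // the five terms handleExplore would search
// → [ 'auth', 'login', 'user', 'OAuth', 'implement' ]
```

Explicit keywords occupy the first slots, so passing `keywords` is the most direct way to steer the five searches; without them, generic verbs such as "implement" can take up a slot.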

+ 116 - 7
src/types.ts

@@ -498,21 +498,130 @@ export const DEFAULT_CONFIG: CodeGraphConfig = {
     '**/*.rb',
   ],
   exclude: [
+    // Version control
+    '**/.git/**',
+
+    // Dependencies
     '**/node_modules/**',
+    '**/vendor/**',
+    '**/Pods/**',
+
+    // Generic build outputs
     '**/dist/**',
     '**/build/**',
-    '**/.git/**',
-    '**/vendor/**',
-    '**/__pycache__/**',
+    '**/out/**',
+    '**/bin/**',
+    '**/obj/**',
     '**/target/**',
+
+    // JavaScript/TypeScript
     '**/*.min.js',
     '**/*.bundle.js',
-    '**/Pods/**',
-    '**/.gradle/**',
-    '**/bin/**',
-    '**/obj/**',
+    '**/.next/**',
+    '**/.nuxt/**',
+    '**/.svelte-kit/**',
+    '**/.output/**',
+    '**/.turbo/**',
+    '**/.cache/**',
+    '**/.parcel-cache/**',
+    '**/.vite/**',
+    '**/.astro/**',
+    '**/.docusaurus/**',
+    '**/.gatsby/**',
+    '**/.webpack/**',
+    '**/.nx/**',
+    '**/.yarn/cache/**',
+    '**/.pnpm-store/**',
+    '**/storybook-static/**',
+
+    // React Native / Expo
+    '**/.expo/**',
+    '**/web-build/**',
+    '**/ios/Pods/**',
+    '**/ios/build/**',
+    '**/android/build/**',
+    '**/android/.gradle/**',
+
+    // Python
+    '**/__pycache__/**',
     '**/.venv/**',
     '**/venv/**',
+    '**/.pytest_cache/**',
+    '**/.mypy_cache/**',
+    '**/.ruff_cache/**',
+    '**/.tox/**',
+    '**/.nox/**',
+    '**/*.egg-info/**',
+    '**/.eggs/**',
+
+    // Go
+    '**/go/pkg/mod/**',
+
+    // Rust
+    '**/target/debug/**',
+    '**/target/release/**',
+
+    // Java/Kotlin/Gradle
+    '**/.gradle/**',
+    '**/.m2/**',
+    '**/generated-sources/**',
+    '**/.kotlin/**',
+
+    // C#/.NET
+    '**/.vs/**',
+    '**/.nuget/**',
+    '**/artifacts/**',
+    '**/publish/**',
+
+    // C/C++
+    '**/cmake-build-*/**',
+    '**/CMakeFiles/**',
+    '**/bazel-*/**',
+    '**/vcpkg_installed/**',
+    '**/.conan/**',
+    '**/Debug/**',
+    '**/Release/**',
+    '**/x64/**',
+
+    // Swift/iOS/Xcode
+    '**/DerivedData/**',
+    '**/.build/**',
+    '**/.swiftpm/**',
+    '**/xcuserdata/**',
+    '**/Carthage/Build/**',
+    '**/SourcePackages/**',
+
+    // PHP
+    '**/.composer/**',
+    '**/storage/framework/**',
+    '**/bootstrap/cache/**',
+
+    // Ruby
+    '**/.bundle/**',
+    '**/tmp/cache/**',
+    '**/public/assets/**',
+    '**/public/packs/**',
+    '**/.yardoc/**',
+
+    // Testing/Coverage
+    '**/coverage/**',
+    '**/htmlcov/**',
+    '**/.nyc_output/**',
+    '**/test-results/**',
+    '**/.coverage/**',
+
+    // IDE/Editor
+    '**/.idea/**',
+
+    // Logs and temp
+    '**/logs/**',
+    '**/tmp/**',
+    '**/temp/**',
+
+    // Documentation build output
+    '**/_build/**',
+    '**/docs/_build/**',
+    '**/site/**',
   ],
   languages: [],
   frameworks: [],
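
To see which paths the expanded `exclude` list would skip, the patterns can be tried against sample paths. The matcher CodeGraph actually uses is not shown in this diff, so the sketch below uses a hypothetical minimal converter (`globToRegExp`) that supports only `**`, `*`, and literal text; real glob libraries differ in details such as dotfile handling.

```typescript
// Hypothetical minimal glob matcher, for illustration only.
function globToRegExp(glob: string): RegExp {
  let re = '';
  let i = 0;
  while (i < glob.length) {
    if (glob.startsWith('**/', i)) {
      re += '(?:.*/)?'; // zero or more leading directories
      i += 3;
    } else if (glob.startsWith('**', i)) {
      re += '.*'; // anything, including slashes
      i += 2;
    } else if (glob.charAt(i) === '*') {
      re += '[^/]*'; // anything within a single path segment
      i += 1;
    } else {
      // Escape regex metacharacters in literal text.
      re += glob.charAt(i).replace(/[.+?^${}()|[\]\\]/g, '\\$&');
      i += 1;
    }
  }
  return new RegExp(`^${re}$`);
}

function isExcluded(path: string, patterns: string[]): boolean {
  return patterns.some(p => globToRegExp(p).test(path));
}

// A few patterns taken from the list above.
const patterns: string[] = ['**/.next/**', '**/*.min.js', '**/cmake-build-*/**', '**/site/**'];

console.log(isExcluded('apps/web/.next/cache/x.js', patterns)); // true
console.log(isExcluded('public/app.min.js', patterns)); // true
console.log(isExcluded('cmake-build-debug/CMakeCache.txt', patterns)); // true
console.log(isExcluded('src/next/page.tsx', patterns)); // false
console.log(isExcluded('src/site/header.tsx', patterns)); // true
```

The last case shows that broad directory names such as `site`, `bin`, `tmp`, or `logs` match at any depth, so a source folder with one of those names would also be skipped under these semantics unless the project overrides the defaults.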