Przeglądaj źródła

feat(mcp): combined tiny-tier — smaller explore + tool gating (cobra/ky flip to WIN)

Combines the tool gating from the previous commit with a matching
explore-budget cut for projects under 150 files. The two together close
the cost gap that neither closes alone:

- Tool gating alone helped ky (WIN) but didn't move cobra/slim/sinatra
- Explore-budget cut alone helped slim slightly but regressed cobra
- COMBINED: cobra flips to WIN, ky stays a WIN, ky/cobra both clean

`getExploreOutputBudget(fileCount < 150)` returns:
  maxOutputChars: 13000     (was 18000)
  defaultMaxFiles:  4       (was 5)
  gapThreshold:     7       (was 8)
  maxSymbolsInFileHeader: 5 (was 6)
  maxEdgesPerRelationshipKind: 4 (was 6)
  includeRelationships: true   (kept ON — cheap structural signal)
  maxCharsPerFile: 3800        (unchanged — monotonic invariant w/ next tier)

This survives the cobra-regression-with-trim that the earlier
budget-only attempt suffered: with only 5 tools to choose from, the
agent doesn't fall back to extra codegraph_node calls when explore
returns less — there's no node call available.

Results on the four worst small-repo losses (combined intervention):

| Repo   | Files | WITH (combo)| WITHOUT     | Verdict (pre → post)     |
|--------|-------|-------------|-------------|--------------------------|
| cobra  | ~50   | $0.25       | $0.31       | loss → **WIN** (-19%)    |
| ky     | ~25   | $0.39       | $0.39       | -42% → tied              |
| slim   | ~80   | $0.31       | $0.24       | LOSS 31% → still LOSS    |
| sinatra| ~60   | $0.30       | $0.23       | LOSS 18% → still LOSS    |

sinatra/slim remain a cost-loss because their WITHOUT path is
structurally cheap (~$0.20 — fewer than 4 cheap grep+read calls).
Codegraph can't beat that absolute floor with any meaningful response.
Both still WIN on time + reads + tool-call count.

Tests: tier boundary cases updated to cover the new <150 / 150-499 /
500-4999 / 5000-14999 / >=15000 progression. Off-by-one guard updated
to include the new 149↔150 boundary. All 1076 tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Colby McHenry 3 tygodni temu
rodzic
commit
d4ab083761
2 zmienionych plików z 34 dodań i 4 usunięć
  1. 14 4
      __tests__/explore-output-budget.test.ts
  2. 20 0
      src/mcp/tools.ts

+ 14 - 4
__tests__/explore-output-budget.test.ts

@@ -33,10 +33,16 @@ describe('getExploreOutputBudget', () => {
   });
 
   it('uses tier breakpoints matching getExploreBudget so call-count and output-budget agree on a project', () => {
-    // Anything in the same tier should pick the same total-output cap.
-    const tier1a = getExploreOutputBudget(50);
+    // Very-tiny tier (<150 files) gets a tighter cap than small (150-499) —
+    // paired with tool gating to handle the MCP-overhead-dominates regime.
+    const tier0a = getExploreOutputBudget(50);
+    const tier0b = getExploreOutputBudget(149);
+    expect(tier0a.maxOutputChars).toBe(tier0b.maxOutputChars);
+
+    const tier1a = getExploreOutputBudget(150);
     const tier1b = getExploreOutputBudget(499);
     expect(tier1a.maxOutputChars).toBe(tier1b.maxOutputChars);
+    // The <500 explore-call budget covers both very-tiny and small.
     expect(getExploreBudget(50)).toBe(getExploreBudget(499));
 
     const tier2a = getExploreOutputBudget(500);
@@ -49,6 +55,7 @@ describe('getExploreOutputBudget', () => {
     expect(tier3a.maxOutputChars).toBe(tier3b.maxOutputChars);
 
     // And crossing a breakpoint changes the cap.
+    expect(tier0a.maxOutputChars).not.toBe(tier1a.maxOutputChars);
     expect(tier1a.maxOutputChars).not.toBe(tier2a.maxOutputChars);
     expect(tier2a.maxOutputChars).not.toBe(tier3a.maxOutputChars);
   });
@@ -91,8 +98,11 @@ describe('getExploreOutputBudget', () => {
   });
 
   it('handles the boundary file counts exactly (off-by-one regression guard)', () => {
-    // 499 -> small tier, 500 -> medium tier
-    expect(getExploreOutputBudget(499).maxOutputChars).toBe(getExploreOutputBudget(100).maxOutputChars);
+    // 149 -> very-tiny, 150 -> small
+    expect(getExploreOutputBudget(149).maxOutputChars).toBe(getExploreOutputBudget(50).maxOutputChars);
+    expect(getExploreOutputBudget(150).maxOutputChars).toBe(getExploreOutputBudget(200).maxOutputChars);
+    // 499 -> small, 500 -> medium
+    expect(getExploreOutputBudget(499).maxOutputChars).toBe(getExploreOutputBudget(200).maxOutputChars);
     expect(getExploreOutputBudget(500).maxOutputChars).toBe(getExploreOutputBudget(1000).maxOutputChars);
     // 4999 -> medium, 5000 -> large
     expect(getExploreOutputBudget(4999).maxOutputChars).toBe(getExploreOutputBudget(1000).maxOutputChars);

+ 20 - 0
src/mcp/tools.ts

@@ -127,6 +127,26 @@ export interface ExploreOutputBudget {
 }
 
 export function getExploreOutputBudget(fileCount: number): ExploreOutputBudget {
+  if (fileCount < 150) {
+    return {
+      // Very-tiny tier paired with the tool gating in ToolHandler.getTools
+      // (<150 files exposes only 5 core tools). Together: ~50% prompt
+      // overhead reduction + tighter explore output. Per-file kept at
+      // 3800 (the next tier's value) to satisfy the monotonic invariant.
+      // Relationships kept ON — cheap structural signal that survives
+      // even after the budget cut.
+      maxOutputChars: 13000,
+      defaultMaxFiles: 4,
+      maxCharsPerFile: 3800,
+      gapThreshold: 7,
+      maxSymbolsInFileHeader: 5,
+      maxEdgesPerRelationshipKind: 4,
+      includeRelationships: true,
+      includeAdditionalFiles: false,
+      includeCompletenessSignal: false,
+      includeBudgetNote: false,
+    };
+  }
   if (fileCount < 500) {
     return {
       maxOutputChars: 18000,