gstack/.github/workflows/evals-periodic.yml
Garry Tan 38fd67b67e fix(ci): include bun.lock in image build for deterministic install
All CI eval jobs failed on PR #1363 with:
  error: Could not resolve: "smart-buffer". Maybe you need to "bun install"?
  error: Could not resolve: "ip-address". Maybe you need to "bun install"?
  at /opt/node_modules_cache/socks/build/client/socksclient.js:15

The cached node_modules layer in the pre-baked Docker image had
`socks` (the new dep) but was missing its transitive deps (smart-buffer,
ip-address). The image build copied only package.json into the build
context. Without bun.lock, `bun install` resolved a different tree
than a local `bun install` did and dropped required transitive deps.
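A minimal sketch of what "partially installed" means here: the installed set must contain the full transitive closure of the direct deps, and the cached layer did not. This check is hypothetical (nothing in the workflow runs it). `DepGraph`, `transitiveClosure`, and `missingPackages` are illustrative names, and the graph is a hand-written simplification, not a parse of bun.lock.

```typescript
// Hypothetical check: package name -> names of its direct dependencies.
// Hand-written and simplified; this does not read bun.lock.
type DepGraph = Record<string, string[]>;

// Everything reachable from the root deps (iterative DFS).
function transitiveClosure(graph: DepGraph, roots: string[]): Set<string> {
  const seen = new Set<string>();
  const stack = [...roots];
  while (stack.length > 0) {
    const pkgName = stack.pop()!;
    if (seen.has(pkgName)) continue;
    seen.add(pkgName);
    // Packages with no entry in the graph are treated as leaves.
    for (const dep of graph[pkgName] ?? []) stack.push(dep);
  }
  return seen;
}

// Packages the closure requires but the install does not contain, sorted.
function missingPackages(graph: DepGraph, roots: string[], installed: Set<string>): string[] {
  return [...transitiveClosure(graph, roots)]
    .filter((pkgName) => !installed.has(pkgName))
    .sort();
}

// The failure from this PR, modeled: socks was cached, its deps were not.
const graph: DepGraph = {
  socks: ["ip-address", "smart-buffer"],
  "ip-address": [],
  "smart-buffer": [],
};
const brokenCache = new Set(["socks"]);
console.log(missingPackages(graph, ["socks"], brokenCache)); // lists "ip-address" and "smart-buffer"
```

Run against a lockfile-derived graph at image build time, a check like this would turn a silently incomplete cache into a build failure. That is the same goal `--frozen-lockfile` serves below, reached from the other direction.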

Locally, `bun install` resolves the correct tree (229 packages) whether
or not bun.lock is present, so the failure does not reproduce. Why CI
diverged isn't fully understood (possibly Docker layer cache reuse
across image rebuilds), but the deterministic fix is to include the
lockfile in the image build context and use `--frozen-lockfile`, as
standard CI practice recommends.
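To make the `--frozen-lockfile` contract concrete, here is a deliberately simplified model. It is not how bun implements the flag. `Manifest`, `Lockfile`, and `frozenInstall` are hypothetical, and the versions are illustrative. The property it shows is the one the fix relies on: a frozen install either reproduces the lockfile exactly, transitive deps included, or fails. It never produces a silently different tree.

```typescript
// Simplified, hypothetical model of a frozen-lockfile install. Real
// bun.lock files record far more (integrity hashes, nested resolutions).
type Manifest = Record<string, string>; // package.json deps: name -> range

interface Lockfile {
  specs: Record<string, string>; // direct deps: name -> range at lock time
  packages: Record<string, string>; // every package: name -> exact version
}

function frozenInstall(manifest: Manifest, lock: Lockfile): Record<string, string> {
  // A dep added or changed in package.json would force re-resolution.
  for (const [dep, range] of Object.entries(manifest)) {
    if (lock.specs[dep] !== range) {
      throw new Error(`lockfile out of date for ${dep}@${range}`);
    }
  }
  // A dep removed from package.json would also rewrite the lockfile.
  for (const dep of Object.keys(lock.specs)) {
    if (!(dep in manifest)) {
      throw new Error(`lockfile still lists removed dep ${dep}`);
    }
  }
  // Otherwise install exactly what was pinned, transitive deps included.
  return { ...lock.packages };
}

// Illustrative versions only.
const lockfile: Lockfile = {
  specs: { socks: "^2.8.0" },
  packages: { socks: "2.8.3", "ip-address": "9.0.5", "smart-buffer": "4.2.0" },
};

console.log(Object.keys(frozenInstall({ socks: "^2.8.0" }, lockfile)).length); // 3
try {
  frozenInstall({ socks: "^2.8.0", "left-pad": "^1.3.0" }, lockfile);
} catch (err) {
  console.log((err as Error).message); // lockfile out of date for left-pad@^1.3.0
}
```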

Changes:
- .github/docker/Dockerfile.ci: COPY bun.lock alongside package.json,
  switch `bun install` → `bun install --frozen-lockfile` so any future
  lockfile drift fails loudly during image build instead of producing
  a partially-installed cache that breaks downstream eval jobs.
- .github/workflows/evals.yml: include bun.lock in the image-tag hash
  so adding/removing a dep invalidates the image, AND copy bun.lock
  into the docker context alongside package.json.
- .github/workflows/evals-periodic.yml: same updates.
- .github/workflows/ci-image.yml: rebuild trigger now fires on bun.lock
  changes too; build context includes bun.lock.

Image hash changes → fresh image gets built on next CI run → install
matches the lockfile exactly → no missing transitive deps.
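The chain above is content addressing: hash the build inputs, use the digest as the tag, and build only when that tag is absent from the registry. In this sketch, `imageTag` and `needsBuild` are hypothetical helpers. `imageTag` follows the documented shape of GitHub's `hashFiles` (a SHA-256 per file, then a SHA-256 over those digests) but is not guaranteed to reproduce its exact output. The `Set` stands in for `docker manifest inspect`.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive an image tag from the contents of the build
// inputs (Dockerfile.ci, package.json, bun.lock) by hashing each input and
// then hashing the concatenated per-input digests.
function imageTag(image: string, inputs: string[]): string {
  const combined = createHash("sha256");
  for (const contents of inputs) {
    combined.update(createHash("sha256").update(contents).digest());
  }
  return `${image}:${combined.digest("hex")}`;
}

// Hypothetical stand-in for the "Check if image exists" step: build and
// push only when the registry has nothing under this exact tag.
function needsBuild(tag: string, registry: Set<string>): boolean {
  return !registry.has(tag);
}

// Placeholder contents; only the fact that they differ matters.
const IMAGE = "ghcr.io/OWNER/REPO/ci";
const dockerfile = "# Dockerfile.ci";
const pkg = '{"dependencies":{"socks":"^2.8.0"}}';

const oldTag = imageTag(IMAGE, [dockerfile, pkg, "bun.lock v1"]);
const registry = new Set([oldTag]); // image for the old lockfile was pushed
const newTag = imageTag(IMAGE, [dockerfile, pkg, "bun.lock v2"]);

console.log(needsBuild(oldTag, registry)); // false: reuse the cached image
console.log(needsBuild(newTag, registry)); // true: lockfile changed, rebuild
```

Because bun.lock is one of the hashed inputs, a lockfile-only change produces a new tag, and with it a fresh image.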

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 15:45:04 -07:00

130 lines
3.8 KiB
YAML

name: Periodic Evals
on:
  schedule:
    - cron: '0 6 * * 1' # Monday 6 AM UTC
  workflow_dispatch:
concurrency:
  group: evals-periodic
  cancel-in-progress: true
env:
  IMAGE: ghcr.io/${{ github.repository }}/ci
  EVALS_TIER: periodic
  EVALS_ALL: 1 # Ignore diff; run all periodic tests
jobs:
  build-image:
    runs-on: ubicloud-standard-2
    permissions:
      contents: read
      packages: write
    outputs:
      image-tag: ${{ steps.meta.outputs.tag }}
    steps:
      - uses: actions/checkout@v4
      - id: meta
        run: echo "tag=${{ env.IMAGE }}:${{ hashFiles('.github/docker/Dockerfile.ci', 'package.json', 'bun.lock') }}" >> "$GITHUB_OUTPUT"
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Check if image exists
        id: check
        run: |
          if docker manifest inspect ${{ steps.meta.outputs.tag }} > /dev/null 2>&1; then
            echo "exists=true" >> "$GITHUB_OUTPUT"
          else
            echo "exists=false" >> "$GITHUB_OUTPUT"
          fi
      - if: steps.check.outputs.exists == 'false'
        run: cp package.json bun.lock .github/docker/
      - if: steps.check.outputs.exists == 'false'
        uses: docker/build-push-action@v6
        with:
          context: .github/docker
          file: .github/docker/Dockerfile.ci
          push: true
          tags: |
            ${{ steps.meta.outputs.tag }}
            ${{ env.IMAGE }}:latest
  evals:
    runs-on: ubicloud-standard-2
    needs: build-image
    container:
      image: ${{ needs.build-image.outputs.image-tag }}
      credentials:
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}
      options: --user runner
    timeout-minutes: 25
    strategy:
      fail-fast: false
      matrix:
        suite:
          - name: e2e-plan
            file: test/skill-e2e-plan.test.ts
          - name: e2e-design
            file: test/skill-e2e-design.test.ts
          - name: e2e-qa-bugs
            file: test/skill-e2e-qa-bugs.test.ts
          - name: e2e-qa-workflow
            file: test/skill-e2e-qa-workflow.test.ts
          - name: e2e-review
            file: test/skill-e2e-review.test.ts
          - name: e2e-workflow
            file: test/skill-e2e-workflow.test.ts
          - name: e2e-routing
            file: test/skill-routing-e2e.test.ts
          - name: e2e-codex
            file: test/codex-e2e.test.ts
          - name: e2e-gemini
            file: test/gemini-e2e.test.ts
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Fix bun temp
        run: |
          mkdir -p /home/runner/.cache/bun
          {
            echo "BUN_INSTALL_CACHE_DIR=/home/runner/.cache/bun"
            echo "BUN_TMPDIR=/home/runner/.cache/bun"
            echo "TMPDIR=/home/runner/.cache"
          } >> "$GITHUB_ENV"
      - name: Restore deps
        run: |
          if [ -d /opt/node_modules_cache ] && diff -q /opt/node_modules_cache/.package.json package.json >/dev/null 2>&1; then
            ln -s /opt/node_modules_cache node_modules
          else
            bun install
          fi
      - run: bun run build
      - name: Run ${{ matrix.suite.name }}
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
          EVALS_CONCURRENCY: "40"
          PLAYWRIGHT_BROWSERS_PATH: /opt/playwright-browsers
        run: EVALS=1 bun test --retry 2 --concurrent --max-concurrency 40 ${{ matrix.suite.file }}
      - name: Upload eval results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: eval-periodic-${{ matrix.suite.name }}
          path: ~/.gstack-dev/evals/*.json
          retention-days: 90
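The "Restore deps" step reuses the image's pre-installed node_modules only when the image was built for this exact package.json, and otherwise falls back to a fresh install. Below is a pure-function sketch of that decision. `restoreDeps` and its parameters are hypothetical stand-ins for the shell step's filesystem checks. The step compares package.json alone. Lockfile freshness is presumably covered upstream, because the image tag already hashes bun.lock.

```typescript
// Hypothetical pure-function model of the "Restore deps" step.
//   cacheExists     -> [ -d /opt/node_modules_cache ]
//   cachedManifest  -> /opt/node_modules_cache/.package.json (null if unreadable)
//   currentManifest -> the checked-out package.json
type RestoreAction = "link-cache" | "bun-install";

function restoreDeps(
  cacheExists: boolean,
  cachedManifest: string | null,
  currentManifest: string,
): RestoreAction {
  // `diff -q` succeeds only on identical files; any difference, or a
  // missing/unreadable .package.json, falls through to a fresh install.
  if (cacheExists && cachedManifest === currentManifest) {
    return "link-cache"; // ln -s /opt/node_modules_cache node_modules
  }
  return "bun-install";
}

const pkg = '{"dependencies":{"socks":"^2.8.0"}}';
console.log(restoreDeps(true, pkg, pkg)); // link-cache
console.log(restoreDeps(true, pkg, pkg.replace("2.8.0", "2.9.0"))); // bun-install
console.log(restoreDeps(false, null, pkg)); // bun-install
```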