mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-08 21:49:45 +08:00
CI evals all failed on PR #1363 with:

    error: Could not resolve: "smart-buffer". Maybe you need to "bun install"?
    error: Could not resolve: "ip-address". Maybe you need to "bun install"?
    at /opt/node_modules_cache/socks/build/client/socksclient.js:15

The cached node_modules layer in the pre-baked Docker image had `socks` (the new dep) but was missing its transitive deps (smart-buffer, ip-address). The image build copied only package.json into the build context. Without bun.lock, `bun install` resolved a different tree than local `bun install` did, dropping required transitive deps.

Locally, `bun install` yields 229 packages (correct) whether bun.lock is present or absent, so the failure does not reproduce outside CI. Why CI diverged isn't fully understood (possibly Docker layer cache reuse across image rebuilds), but the deterministic fix is to include the lockfile in the image build context and use `--frozen-lockfile`, matching what every CI doc recommends.

Changes:
- .github/docker/Dockerfile.ci: COPY bun.lock alongside package.json, switch `bun install` → `bun install --frozen-lockfile` so any future lockfile drift fails loudly during image build instead of producing a partially-installed cache that breaks downstream eval jobs.
- .github/workflows/evals.yml: include bun.lock in the image-tag hash so adding/removing a dep invalidates the image, AND copy bun.lock into the docker context alongside package.json.
- .github/workflows/evals-periodic.yml: same updates.
- .github/workflows/ci-image.yml: rebuild trigger now fires on bun.lock changes too; build context includes bun.lock.

Image hash changes → fresh image gets built on next CI run → install matches the lockfile exactly → no missing transitive deps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
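The image-tag part of this change can be illustrated with a small shell sketch. The workflow files themselves are not shown on this page, so everything here is an assumption: `image_tag` is a hypothetical helper, the 12-character truncation is arbitrary, and `sha256sum` is assumed to be available (GNU coreutils). The only point it demonstrates is that a lockfile-only change moves the tag when bun.lock is part of the hashed inputs, and leaves it untouched when it is not.

```shell
#!/bin/sh
set -eu

# Hypothetical helper (not from the repo): hash the files that define the
# image, in a fixed order, and keep the first 12 hex characters as the tag.
image_tag() {
    cat "$@" | sha256sum | cut -c1-12
}

# Throwaway stand-ins for Dockerfile.ci, package.json and bun.lock.
demo_dir="$(mktemp -d)"
printf 'FROM ubuntu:24.04\n' > "$demo_dir/Dockerfile.ci"
printf '{"dependencies":{"socks":"*"}}\n' > "$demo_dir/package.json"
printf 'lockfile revision 1\n' > "$demo_dir/bun.lock"

tag_without_lock="$(image_tag "$demo_dir/Dockerfile.ci" "$demo_dir/package.json")"
tag_before="$(image_tag "$demo_dir/Dockerfile.ci" "$demo_dir/package.json" "$demo_dir/bun.lock")"

# Simulate a lockfile-only change (for example a transitive dep bump).
printf 'lockfile revision 2\n' > "$demo_dir/bun.lock"

tag_without_lock_after="$(image_tag "$demo_dir/Dockerfile.ci" "$demo_dir/package.json")"
tag_after="$(image_tag "$demo_dir/Dockerfile.ci" "$demo_dir/package.json" "$demo_dir/bun.lock")"

echo "lockfile excluded: $tag_without_lock -> $tag_without_lock_after"
echo "lockfile included: $tag_before -> $tag_after"

rm -rf "$demo_dir"
```

With the lockfile excluded, both tags are identical, so a stale image would keep being reused; with it included, the tag changes and a rebuild is forced.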
126 lines
6.5 KiB
Docker
# gstack CI eval runner: pre-baked toolchain + deps
# Rebuild weekly via ci-image.yml, on Dockerfile changes, or on lockfile changes
FROM ubuntu:24.04

ENV DEBIAN_FRONTEND=noninteractive

# Switch apt sources to Hetzner's public mirror.
# Ubicloud runners (Hetzner FSN1-DC21) reliably hit connection timeouts to
# archive.ubuntu.com:80, with 90+ second outages observed on multiple builds.
# Hetzner's mirror is publicly accessible from any cloud and route-local for
# Ubicloud, so this fixes both reliability and latency. Ubuntu 24.04 uses
# the deb822 sources format at /etc/apt/sources.list.d/ubuntu.sources.
#
# Using HTTP (not HTTPS) intentionally: the base ubuntu:24.04 image ships
# without ca-certificates, so HTTPS apt fails with "No system certificates
# available." Apt's security model verifies via GPG-signed Release files,
# not TLS, so HTTP here is no weaker than the upstream defaults.
RUN sed -i \
    -e 's|http://archive.ubuntu.com/ubuntu|http://mirror.hetzner.com/ubuntu/packages|g' \
    -e 's|http://security.ubuntu.com/ubuntu|http://mirror.hetzner.com/ubuntu/packages|g' \
    /etc/apt/sources.list.d/ubuntu.sources

# Also make apt itself resilient: per-package retries + generous timeouts.
# Hetzner's mirror is reliable but individual packages can still blip; the
# retry config means a single failed fetch doesn't nuke the whole build.
RUN printf 'Acquire::Retries "5";\nAcquire::http::Timeout "30";\nAcquire::https::Timeout "30";\n' \
    > /etc/apt/apt.conf.d/80-retries

# System deps (retry apt-get update + install as a unit; even Hetzner can blip).
# Includes xz-utils so the Node.js .tar.xz download below can decompress.
# The last attempt skips the sleep so the loop exits non-zero and the build
# fails here instead of continuing without the packages.
RUN for i in 1 2 3; do \
        apt-get update && apt-get install -y --no-install-recommends \
            git curl unzip xz-utils ca-certificates jq bc gpg && break || \
        (echo "apt retry $i/3 after failure"; [ "$i" -lt 3 ] && sleep 10); \
    done \
    && rm -rf /var/lib/apt/lists/*

# GitHub CLI
# (the last retry exits non-zero so a failed install fails the build)
RUN curl --retry 5 --retry-delay 5 --retry-connrefused -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg \
        | gpg --dearmor -o /usr/share/keyrings/githubcli-archive-keyring.gpg \
    && echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" \
        | tee /etc/apt/sources.list.d/github-cli.list > /dev/null \
    && for i in 1 2 3; do \
        apt-get update && apt-get install -y --no-install-recommends gh && break || \
        (echo "gh install retry $i/3"; [ "$i" -lt 3 ] && sleep 10); \
    done \
    && rm -rf /var/lib/apt/lists/*

# Node.js 22 LTS (needed for claude CLI).
# Install from the official nodejs.org tarball instead of NodeSource's apt setup.
# NodeSource's setup_22.x script runs its own `apt-get update` + `apt-get install gnupg`,
# both of which depend on archive.ubuntu.com / security.ubuntu.com being reachable.
# Ubicloud CI runners frequently can't reach those mirrors (connection timeouts),
# and "gnupg" was renamed to "gpg" on Ubuntu 24.04 anyway, so NodeSource's script
# fails before it can add its own repo. Direct tarball download is network-simpler
# (one host: nodejs.org) and doesn't touch apt at all.
ENV NODE_VERSION=22.20.0
RUN curl --retry 5 --retry-delay 5 --retry-connrefused -fsSL "https://nodejs.org/dist/v${NODE_VERSION}/node-v${NODE_VERSION}-linux-x64.tar.xz" -o /tmp/node.tar.xz \
    && tar -xJ -C /usr/local --strip-components=1 --no-same-owner -f /tmp/node.tar.xz \
    && rm -f /tmp/node.tar.xz \
    && node --version \
    && npm --version

# Bun (install to /usr/local so non-root users can access it)
ENV BUN_INSTALL="/usr/local"
RUN curl --retry 5 --retry-delay 5 --retry-connrefused -fsSL https://bun.sh/install \
    | BUN_VERSION=1.3.10 bash

# Claude CLI
RUN npm i -g @anthropic-ai/claude-code

# Playwright system deps (Chromium), needed for browse E2E tests
RUN npx playwright install-deps chromium

# Linux has neither Helvetica nor Arial. make-pdf's print CSS stacks fall back
# to Liberation Sans (metric-compatible Arial clone, SIL OFL 1.1) so PDFs don't
# render in DejaVu Sans. playwright install-deps happens to pull this in today,
# but the dep is implicit and could change, so install it explicitly so that
# upgrades can't silently regress rendering.
#
# Xvfb is also installed here so the browse --headed integration tests
# (headed-xvfb, headed-orphan-cleanup) can exercise the Linux container
# auto-spawn path on every CI run. Without Xvfb in the image, the most
# common production --headed path goes untested.
#
# The last retry skips the sleep so the loop exits non-zero on failure.
RUN for i in 1 2 3; do \
        apt-get update && apt-get install -y --no-install-recommends fonts-liberation fontconfig xvfb x11-utils && break || \
        (echo "fonts-liberation install retry $i/3"; [ "$i" -lt 3 ] && sleep 10); \
    done \
    && fc-cache -f \
    && rm -rf /var/lib/apt/lists/*

# Pre-install dependencies (cached layer; only rebuilds when package.json or
# bun.lock changes). Copy BOTH so install is deterministic and matches local
# resolution. Without bun.lock here, bun install resolved transitive deps
# differently in CI vs local (observed on v1.28.0.0: socks landed but
# smart-buffer + ip-address didn't make it into the cached node_modules).
COPY package.json bun.lock /workspace/
WORKDIR /workspace
RUN bun install --frozen-lockfile && rm -rf /tmp/*

# Install Playwright Chromium to a shared location accessible by all users
ENV PLAYWRIGHT_BROWSERS_PATH=/opt/playwright-browsers
RUN npx playwright install chromium \
    && chmod -R a+rX /opt/playwright-browsers

# Verify everything works
RUN bun --version && node --version && claude --version && jq --version && gh --version \
    && npx playwright --version \
    && fc-match "Liberation Sans" | grep -qi "Liberation" \
    || (echo "ERROR: fonts-liberation not installed; make-pdf PDFs will render in DejaVu Sans" && exit 1)

# At runtime: checkout overwrites /workspace, but node_modules persists
# if we move it out of the way and symlink back
# Save node_modules + package.json snapshot for cache validation at runtime
RUN mv /workspace/node_modules /opt/node_modules_cache \
    && cp /workspace/package.json /opt/node_modules_cache/.package.json

# Claude CLI refuses --dangerously-skip-permissions as root.
# Create a non-root user for eval runs (GH Actions overrides USER, so
# the workflow must set options.user or use gosu/su-exec at runtime).
RUN useradd -m -s /bin/bash runner \
    && chmod -R a+rX /opt/node_modules_cache \
    && mkdir -p /home/runner/.gstack && chown -R runner:runner /home/runner/.gstack \
    && chmod 1777 /tmp \
    && mkdir -p /home/runner/.bun && chown -R runner:runner /home/runner/.bun
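
The Dockerfile comments mention validating the cached node_modules at runtime and symlinking it back into the checkout, but the workflow step that does this is not shown here. The following is a minimal sketch of how such a step might look, not the repository's actual implementation: `restore_node_modules` and its two arguments are hypothetical, and the demo uses temporary directories in place of /workspace and /opt/node_modules_cache so it can run without bun or the image.

```shell
#!/bin/sh
set -eu

# Hypothetical helper (not from the repo). Reuses the pre-baked node_modules
# only when the checked-out package.json is byte-identical to the snapshot
# saved at image build time. Prints which path was taken.
restore_node_modules() {
    workspace="$1"   # checked-out repository
    cache="$2"       # e.g. /opt/node_modules_cache in the image
    if [ -f "$cache/.package.json" ] \
        && cmp -s "$workspace/package.json" "$cache/.package.json"; then
        ln -sfn "$cache" "$workspace/node_modules"
        echo "cache"
    else
        # A real job would run `bun install --frozen-lockfile` here.
        echo "install"
    fi
}

# Demo with temporary directories standing in for the image paths.
demo="$(mktemp -d)"
mkdir -p "$demo/cache" "$demo/ws_same" "$demo/ws_changed"
printf '{"name":"demo"}\n' > "$demo/cache/.package.json"
cp "$demo/cache/.package.json" "$demo/ws_same/package.json"
printf '{"name":"demo","dependencies":{"socks":"*"}}\n' > "$demo/ws_changed/package.json"

same_result="$(restore_node_modules "$demo/ws_same" "$demo/cache")"
changed_result="$(restore_node_modules "$demo/ws_changed" "$demo/cache")"
echo "unchanged package.json: $same_result"
echo "changed package.json:   $changed_result"
```

The sketch compares only package.json because that is the only snapshot the image saves. A lockfile-only change would therefore not be caught by this check; per the commit message, that case is covered by including bun.lock in the image-tag hash so a new image is built instead.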