hai/everything-claude-code

mirror of https://github.com/affaan-m/everything-claude-code.git synced 2026-05-21 03:40:05 +08:00

Author	SHA1	Message	Date
Jamkris	b068069b9b	fix(ci): cover other widely-cited invisible code points in check-unicode-safety

Extend `isDangerousInvisibleCodePoint` with five additional code points / ranges that are routinely cited in invisible-character smuggling references but were not in the previous denylist:

- U+180E MONGOLIAN VOWEL SEPARATOR. Formerly classified as a space separator (Zs) until Unicode 6.3 reclassified it as Cf (Format control). Renders as zero-width; widely abused for homograph attacks and prompt smuggling.
- U+115F HANGUL CHOSEONG FILLER and U+1160 HANGUL JUNGSEONG FILLER. Zero-width fillers used in Korean text shaping. Both are cited as common LLM-injection vectors in Korean / multilingual threat models.
- U+2061–U+2064 invisible math operators (FUNCTION APPLICATION, INVISIBLE TIMES, INVISIBLE SEPARATOR, INVISIBLE PLUS). Zero-width and only meaningful inside math typesetting. No legitimate Markdown or source code uses them.
- U+3164 HANGUL FILLER. Reported in real-world Discord and Twitter smuggling incidents; not used in legitimate Korean text.

Reproduced before this commit: a file containing any one of these code points passed `check-unicode-safety.js` silently. After this commit each one is reported as `dangerous-invisible U+<HEX>` and `--write` mode strips it.

Verified by writing 8 single-character probe files (`probe-0x180E.md`, `probe-0x115F.md`, …) and confirming exit=1 with each violation listed.

ECC repo self-scan reports only the pre-existing `U+2605` BLACK STAR warnings (unchanged) and exits with the same status (no new in-repo violations introduced).

Existing 5 unicode-safety tests still pass; `yarn lint` clean. Regression coverage for both the previous commit's Tag block fix and this commit's additions lands in the next commit.	2026-05-18 21:20:36 -04:00
Jamkris	e3483fda15	fix(ci): cover Unicode Tag block (U+E0000–U+E007F) in check-unicode-safety

`isDangerousInvisibleCodePoint` enumerated seven ranges of invisible/bidi/variation-selector code points but omitted the Unicode Tag block (U+E0000–U+E007F).

Tag characters were proposed for language tagging in Unicode 3.1 and have been deprecated since Unicode 5.1, so no legitimate text uses them. They are the canonical vector for "ASCII Smuggling" / "Tag Smuggling" LLM prompt injection: an attacker hides instructions inside an ASCII-looking string, the model reads the tag bytes, the human reviewer sees nothing. Demonstrated against multiple LLM assistants during 2024–2025.

`check-unicode-safety.js` is the repo's last line of defence before contributor content reaches agent context; the same script also runs in `--write` auto-sanitize mode on `.md` / `.mdx` / `.txt`. Today it silently passes tag-block characters through unchanged in both detection mode and `--write` mode.

Reproduced before this commit:

    $ mkdir -p /tmp/uni-test && node -e "
      const fs = require('fs');
      const hidden = [...Array(5)].map((_,i) =>
        String.fromCodePoint(0xE0041 + i)).join('');
      fs.writeFileSync('/tmp/uni-test/innocent.md',
        '# Title\\n\\nBenign text' + hidden + ' more.\\n');"
    $ ECC_UNICODE_SCAN_ROOT=/tmp/uni-test \
        node scripts/ci/check-unicode-safety.js
    Unicode safety check passed.
    $ echo $?
    0

Expected: tag-block characters reported as `dangerous-invisible` violations (exit 1) and stripped under `--write`.
Actual: validator passes, `--write` leaves the bytes intact.

Fix: extend the denylist with one new range `(codePoint >= 0xE0000 && codePoint <= 0xE007F)`. The change is purely additive; the existing seven ranges are untouched.

After this commit the same reproduction returns:

    $ ECC_UNICODE_SCAN_ROOT=/tmp/uni-test \
        node scripts/ci/check-unicode-safety.js
    Unicode safety violations detected:
    innocent.md:3:12 dangerous-invisible U+E0041
    innocent.md:3:14 dangerous-invisible U+E0042
    innocent.md:3:16 dangerous-invisible U+E0043
    innocent.md:3:18 dangerous-invisible U+E0044
    innocent.md:3:20 dangerous-invisible U+E0045
    exit=1

`--write` mode also strips the bytes (verified: file length 47 → 42 after sanitize, regex `/[\u{E0000}-\u{E007F}]/u` no longer matches).

Existing 5 unicode-safety tests still pass; `yarn lint` clean. The ECC repo's own self-scan (`node scripts/ci/check-unicode-safety.js` with no `ECC_UNICODE_SCAN_ROOT`) reports the same warnings as before this commit and exits with the same status (no regressions on in-repo content).

A handful of other widely-cited invisible code points are missing from the denylist (`U+180E`, `U+115F`, `U+1160`, `U+2061–U+2064`, `U+3164`); those are addressed in the next commit so each fix remains independently reviewable. Regression coverage for both fixes lands two commits later.	2026-05-18 21:20:36 -04:00
Affaan Mustafa	8aa8c32d2a	feat: add observability readiness gate	2026-05-11 18:33:14 -04:00
Affaan Mustafa	8846210ca2	fix: unblock unicode safety CI lint (#1017) * fix: unblock unicode safety CI lint * fix: unblock shared CI regressions	2026-03-30 01:50:17 -04:00
Affaan Mustafa	7483d646e4	fix: narrow unicode cleanup scope	2026-03-29 21:21:18 -04:00
Affaan Mustafa	866d9ebb53	fix: harden unicode safety checks	2026-03-29 21:21:18 -04:00
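
The two Jamkris commits above describe extending a code-point denylist, reporting hits as `dangerous-invisible U+<HEX>`, and stripping them in `--write` mode. The following is a minimal sketch of that idea, not the repository's actual code. It borrows the name `isDangerousInvisibleCodePoint` from the commit messages, but the function body, the `ADDED_RANGES` table, and the helpers `findViolations` and `stripDangerous` are hypothetical. Only the ranges named in these two commits are listed; the seven pre-existing ranges are not shown in this log and are left out. Columns are assumed to be 1-based and counted in UTF-16 code units, which is consistent with the `3:12`, `3:14`, … positions in the reproduction output.

```javascript
// Hypothetical sketch: covers only the ranges named in the two commits above.
// The repo's seven pre-existing ranges are not listed in this log and are omitted.
const ADDED_RANGES = [
  [0x115f, 0x1160],   // HANGUL CHOSEONG FILLER, HANGUL JUNGSEONG FILLER
  [0x180e, 0x180e],   // MONGOLIAN VOWEL SEPARATOR
  [0x2061, 0x2064],   // invisible math operators
  [0x3164, 0x3164],   // HANGUL FILLER
  [0xe0000, 0xe007f], // Unicode Tag block
];

// True when the code point falls inside any listed range (bounds inclusive).
function isDangerousInvisibleCodePoint(codePoint) {
  return ADDED_RANGES.some(([lo, hi]) => codePoint >= lo && codePoint <= hi);
}

// Scan text and return one entry per denylisted code point.
// Assumption: line and column are 1-based; column counts UTF-16 code units.
function findViolations(text) {
  const violations = [];
  text.split('\n').forEach((line, lineIndex) => {
    let column = 1;
    for (const ch of line) { // for...of iterates by code point, not code unit
      const codePoint = ch.codePointAt(0);
      if (isDangerousInvisibleCodePoint(codePoint)) {
        const hex = codePoint.toString(16).toUpperCase().padStart(4, '0');
        violations.push({
          line: lineIndex + 1,
          column,
          label: `dangerous-invisible U+${hex}`,
        });
      }
      column += ch.length; // astral code points occupy two code units
    }
  });
  return violations;
}

// Return a copy of text with every denylisted code point removed.
function stripDangerous(text) {
  return Array.from(text)
    .filter((ch) => !isDangerousInvisibleCodePoint(ch.codePointAt(0)))
    .join('');
}

// Usage: rebuild the probe string from the reproduction and scan it.
const hidden = [...Array(5)]
  .map((_, i) => String.fromCodePoint(0xe0041 + i))
  .join('');
const sample = '# Title\n\nBenign text' + hidden + ' more.\n';

for (const v of findViolations(sample)) {
  console.log(`${v.line}:${v.column} ${v.label}`);
}
console.log(JSON.stringify(stripDangerous(sample)));
```

Keeping the ranges in a table rather than a chain of `||` comparisons is a design choice of this sketch; it keeps each addition a one-line change, in the spirit of the "purely additive" fix described above.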