Files
everything-claude-code/scripts/ci/check-unicode-safety.js
Jamkris b068069b9b fix(ci): cover other widely-cited invisible code points in check-unicode-safety
Extend `isDangerousInvisibleCodePoint` with five additional code
points / ranges that are routinely cited in invisible-character
smuggling references but were not in the previous denylist:

- **U+180E** MONGOLIAN VOWEL SEPARATOR. Formerly classified as a
  space separator (Zs) until Unicode 6.3 reclassified it as Cf
  (Format control). Renders as zero-width; widely abused for
  homograph attacks and prompt smuggling.

- **U+115F** HANGUL CHOSEONG FILLER and **U+1160** HANGUL JUNGSEONG
  FILLER. Zero-width fillers used in Korean text shaping. Both are
  cited as common LLM-injection vectors in Korean / multilingual
  threat models.

- **U+2061–U+2064** invisible math operators (FUNCTION APPLICATION,
  INVISIBLE TIMES, INVISIBLE SEPARATOR, INVISIBLE PLUS). Zero-width
  and only meaningful inside math typesetting. No legitimate
  Markdown or source code uses them.

- **U+3164** HANGUL FILLER. Reported in real-world Discord and
  Twitter smuggling incidents; not used in legitimate Korean text.

Reproduced before this commit: a file containing any one of these
code points passed `check-unicode-safety.js` silently.

After this commit each one is reported as
`dangerous-invisible U+<HEX>` and `--write` mode strips it.

Verified by writing 8 single-character probe files
(`probe-0x180E.md`, `probe-0x115F.md`, …) and confirming exit=1 with
each violation listed.

ECC repo self-scan reports only the pre-existing `U+2605` BLACK
STAR warnings (unchanged) and exits with the same status (no new
in-repo violations introduced). Existing 5 unicode-safety tests
still pass; `yarn lint` clean.

Regression coverage for both the previous commit's Tag block fix
and this commit's additions lands in the next commit.
2026-05-18 21:20:36 -04:00

7.3 KiB