Unicode Evasion: what it measures and what it cannot prove.
Find invisible characters, bidi controls, compatibility glyphs, mixed-script lookalikes, and other Unicode tricks that can bypass filters or mislead reviewers.
Plain-English method
Unicode evasion scanning is a deterministic input-sanitization check. It inspects codepoints rather than visible glyphs, because two strings can look identical on screen while behaving differently in search, moderation, prompts, or source review.
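To make the codepoint-versus-glyph distinction concrete, here is a minimal Python sketch using only the standard library. The helper name codepoints and the sample strings are illustrative assumptions, not part of scan_unicode_evasion.

```python
import unicodedata


def codepoints(text: str) -> list[str]:
    """Describe each character by codepoint and Unicode name (hypothetical helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


latin = "paypal"
spoofed = "p\u0430ypal"  # second letter is CYRILLIC SMALL LETTER A, not Latin "a"

# The two strings typically render the same in many fonts,
# yet they are different sequences of codepoints.
print(latin == spoofed)
print(codepoints(latin)[1])
print(codepoints(spoofed)[1])
```

Comparing the second entries shows why a filter that matches on the visible word can be bypassed while a codepoint-level scan is not.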
Mechanism and scoring
The score is a weighted count of suspicious codepoints. Bidi controls, zero-width characters, unsafe control characters, and confusables carry high weight; compatibility glyphs and combining marks carry medium weight; unusual whitespace carries low weight. The scan makes no LLM call and no network call, and it makes no claim about authorship.
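A weighted count of this kind could be sketched as follows. This is an illustration under stated assumptions, not the actual scorer: the numeric weights (3, 2, 1), the function names, and the exact character sets are hypothetical, and confusable detection is left out because it needs a lookup table that the Python standard library does not provide.

```python
import unicodedata
from typing import Optional

# Hypothetical weights: the source only says high / medium / low.
WEIGHTS = {"high": 3, "medium": 2, "low": 1}

# Explicit bidi controls (ALM, LRM, RLM, embeddings/overrides, isolates).
BIDI_CONTROLS = {0x061C, 0x200E, 0x200F, *range(0x202A, 0x202F), *range(0x2066, 0x206A)}
# Common zero-width characters.
ZERO_WIDTH = {0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF}


def classify(ch: str) -> Optional[str]:
    """Return 'high', 'medium', 'low', or None for an unremarkable character."""
    cp = ord(ch)
    cat = unicodedata.category(ch)
    if cp in BIDI_CONTROLS or cp in ZERO_WIDTH:
        return "high"
    if cat == "Cc" and ch not in "\t\n\r":  # control chars other than common whitespace
        return "high"
    if cat == "Cf":  # other invisible format characters
        return "high"
    if cat in ("Zs", "Zl", "Zp") and ch != " ":  # unusual whitespace
        return "low"
    if cat.startswith("M"):  # combining marks
        return "medium"
    if unicodedata.decomposition(ch).startswith("<"):  # compatibility glyphs
        return "medium"
    return None


def score(text: str) -> int:
    """Sum the weights of every flagged character in the text."""
    total = 0
    for ch in text:
        level = classify(ch)
        if level is not None:
            total += WEIGHTS[level]
    return total
```

Because the function depends only on the input string and the Unicode character database, the same text yields the same score on every run, which is the deterministic property described above.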
What this catches
- Zero-width and unsafe control characters
- Bidirectional text controls
- Latin/Cyrillic/Greek lookalikes inside one token
- Compatibility glyphs and odd whitespace
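The mixed-script item in the list above can be approximated with a short sketch. Python's standard library exposes no script property, so this example infers the script from the first word of each character's Unicode name; that heuristic, and the names scripts_in and mixed_script_tokens, are assumptions made for illustration only.

```python
import re
import unicodedata

# Scripts of interest, matched against the start of the Unicode character name.
SCRIPTS = ("LATIN", "CYRILLIC", "GREEK")


def scripts_in(token: str) -> set[str]:
    """Collect which of the tracked scripts appear among a token's letters."""
    found = set()
    for ch in token:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        for script in SCRIPTS:
            if name.startswith(script):
                found.add(script)
    return found


def mixed_script_tokens(text: str) -> list[str]:
    """Return tokens that mix two or more tracked scripts."""
    return [tok for tok in re.findall(r"\w+", text) if len(scripts_in(tok)) > 1]


# "p\u0430ypal" hides a Cyrillic letter inside an otherwise Latin word.
print(mixed_script_tokens("log in to p\u0430ypal now"))
```

Checking per token rather than per document keeps ordinary multilingual text, where each word stays in a single script, from being flagged.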
What this misses
- Whether the text is true
- Whether a human or AI wrote it
- Legitimate multilingual context that requires human review
This is one signal in a layered stack.
Single-method detectors are too easy to overtrust. Unicode Evasion is useful when it changes a routing decision: allow, revise, human_review, or reject. Layer it with the other signals (specificity, provenance, pattern pressure, media provenance) and with paid Deep Scan when the decision matters.
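One way a score could drive the four routing outcomes is sketched below. The threshold values and the function name route are hypothetical; the source names the outcomes but does not specify cut-offs, so any real deployment would need to choose its own.

```python
def route(score: int, revise_at: int = 1, review_at: int = 3, reject_at: int = 9) -> str:
    """Map a Unicode-evasion score to a routing decision.

    All thresholds are illustrative assumptions, not documented values.
    """
    if score >= reject_at:
        return "reject"
    if score >= review_at:
        return "human_review"
    if score >= revise_at:
        return "revise"
    return "allow"
```

In a layered stack this result would be one input among several, so a caller might only escalate when another signal agrees.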
scan_unicode_evasion is available through local MCP with no LLM cost, and through the remote MCP endpoint with free, rate-limited unauthenticated access. See all detection methodologies and the dedicated methodology page for the deepest treatment of this signal.