
Unicode Evasion: what it measures and what it cannot prove.

Find invisible characters, bidi controls, compatibility glyphs, mixed-script lookalikes, and other Unicode tricks that can bypass filters or mislead reviewers.


How this works

Plain-English method

Unicode evasion scanning is a deterministic input-sanitation method. It inspects codepoints rather than visible glyphs, because two strings can look identical while behaving differently in search, moderation, prompts, or source review.
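To show why codepoints matter more than glyphs, this short Python sketch compares a Latin string with two spoofed copies. One swaps in U+0430 CYRILLIC SMALL LETTER A and the other hides U+200B ZERO WIDTH SPACE. The `describe` helper is illustrative only and is not part of the scanner. The last line shows that NFKC normalization alone does not fold cross-script lookalikes.

```python
import unicodedata
from typing import List

def describe(s: str) -> List[str]:
    """Return 'U+XXXX NAME' for each codepoint in s (illustrative helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in s]

latin = "paypal"
cyrillic_a = "p\u0430ypal"   # U+0430 CYRILLIC SMALL LETTER A replaces Latin 'a'
zero_width = "pay\u200bpal"  # U+200B ZERO WIDTH SPACE hidden between letters

# Both spoofs usually render like "paypal", yet neither compares equal to it.
print(latin == cyrillic_a, latin == zero_width)  # False False
print(describe(cyrillic_a)[1])                   # U+0430 CYRILLIC SMALL LETTER A
print(len(zero_width))                           # 7

# NFKC folds compatibility forms, not cross-script lookalikes.
print(unicodedata.normalize("NFKC", cyrillic_a) == latin)  # False
```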

Mechanism and scoring

The score is a weighted count: bidi, zero-width, control-character, and confusable markers carry high weight; compatibility glyphs and combining marks carry medium weight; unusual whitespace carries low weight. No LLM call, no network call, no authorship claim.
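A minimal Python sketch of the weighted count follows. It assumes numeric weights of 3, 2, and 1 for the high, medium, and low tiers; the methodology names the tiers but not the numbers. The codepoint sets, the `classify` and `unicode_evasion_score` names, and the rule that any character NFKC changes counts as a compatibility glyph are all illustrative assumptions. Confusable (mixed-script) detection works at the token level and is left out of this per-character sketch.

```python
import unicodedata
from typing import Dict, Optional, Tuple

# Hypothetical numeric weights for the three tiers named in the methodology.
WEIGHTS: Dict[str, int] = {"high": 3, "medium": 2, "low": 1}

# Bidi controls: LRE..RLO, LRI..PDI, plus LRM, RLM and ARABIC LETTER MARK.
BIDI_CONTROLS = set(range(0x202A, 0x202F)) | set(range(0x2066, 0x206A)) | {0x200E, 0x200F, 0x061C}
# Zero-width characters: ZWSP, ZWNJ, ZWJ, WORD JOINER, ZERO WIDTH NO-BREAK SPACE (BOM).
ZERO_WIDTH = {0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF}
# Ordinary control characters that should not count against the text.
ALLOWED_CONTROLS = {"\t", "\n", "\r"}

def classify(ch: str) -> Optional[str]:
    """Return the weight tier for one character, or None if it is unremarkable."""
    cp = ord(ch)
    category = unicodedata.category(ch)
    if cp in BIDI_CONTROLS or cp in ZERO_WIDTH:
        return "high"
    if category == "Cc" and ch not in ALLOWED_CONTROLS:
        return "high"  # unsafe control character
    if category in ("Mn", "Mc", "Me"):
        return "medium"  # combining mark
    if category in ("Zs", "Zl", "Zp") and ch != " ":
        return "low"  # unusual whitespace, checked before NFKC (which maps NBSP to a space)
    if unicodedata.normalize("NFKC", ch) != ch:
        return "medium"  # compatibility glyph, e.g. fullwidth letters or ligatures
    return None

def unicode_evasion_score(text: str) -> Tuple[int, Dict[str, int]]:
    """Weighted count of suspicious codepoints, plus per-tier counts."""
    counts = {"high": 0, "medium": 0, "low": 0}
    for ch in text:
        tier = classify(ch)
        if tier is not None:
            counts[tier] += 1
    score = sum(WEIGHTS[tier] * n for tier, n in counts.items())
    return score, counts

print(unicode_evasion_score("pay\u200bpal"))  # (3, {'high': 1, 'medium': 0, 'low': 0})
print(unicode_evasion_score("\uff50aypal"))   # (2, {'high': 0, 'medium': 1, 'low': 0})
```

Legitimate text can trip these rules: U+200C ZERO WIDTH NON-JOINER is routinely used in Persian, and decomposed accents produce combining marks. That is one reason a score like this feeds routing rather than an automatic verdict.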


What this catches

Zero-width and unsafe control characters
Bidirectional text controls
Latin/Cyrillic/Greek lookalikes inside one token
Compatibility glyphs and odd whitespace
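The mixed-script lookalike check can be approximated with the standard library. Python's `unicodedata` does not expose the Unicode Script property, so this hypothetical sketch uses the codepoint name prefix (`LATIN`, `CYRILLIC`, `GREEK`) as a rough proxy. A production scanner would more likely rely on Script_Extensions data (UAX #24) and the confusables tables from UTS #39. Here a token is flagged when its letters span more than one watched script; the function names are illustrative.

```python
import re
import unicodedata
from typing import List, Set

# Scripts whose letters are commonly swapped for one another in spoofed tokens.
WATCHED_SCRIPTS = ("LATIN", "CYRILLIC", "GREEK")

def scripts_in(token: str) -> Set[str]:
    """Approximate the scripts of a token's letters from their codepoint names."""
    found: Set[str] = set()
    for ch in token:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        for script in WATCHED_SCRIPTS:
            if name.startswith(script + " "):
                found.add(script)
    return found

def mixed_script_tokens(text: str) -> List[str]:
    """Return word tokens whose letters span more than one watched script."""
    return [tok for tok in re.findall(r"\w+", text) if len(scripts_in(tok)) > 1]

print(mixed_script_tokens("p\u0430ypal login"))  # flags only the spoofed token
print(mixed_script_tokens("Москва and Athens"))   # []
```

Single-script words such as Москва pass; only tokens that mix scripts are flagged. Legitimate mixed-script tokens (brand names, transliterations, code identifiers) will still surface, so the flag is a prompt for review rather than proof of evasion.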

What this misses

Whether the text is true
Whether a human or AI wrote it
Whether mixed scripts reflect legitimate multilingual context, which requires human review

How it fits the layered approach

This is one signal in a layered stack.

Single-method detectors are too easy to over-trust. Unicode Evasion earns its place when its result changes routing: allow, revise, human_review, or reject. When the decision matters, it should be layered with specificity, provenance, pattern pressure, Unicode sanitation, media provenance, and paid Deep Scan.
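A hedged sketch of how a score might change routing. The thresholds and function names are hypothetical, and the "most severe layer wins" rule in `combine` is just one simple way to layer signals, not necessarily the rule this stack uses.

```python
from typing import Dict, List, Literal, Tuple

Route = Literal["allow", "revise", "human_review", "reject"]

# Hypothetical cut-offs, checked from most to least severe.
THRESHOLDS: List[Tuple[int, Route]] = [(9, "reject"), (5, "human_review"), (1, "revise")]

# Ordering used to pick the most severe decision across layers.
SEVERITY: Dict[str, int] = {"allow": 0, "revise": 1, "human_review": 2, "reject": 3}

def route_for_score(score: int) -> Route:
    """Map a Unicode-evasion score to a routing decision (hypothetical thresholds)."""
    for minimum, decision in THRESHOLDS:
        if score >= minimum:
            return decision
    return "allow"

def combine(decisions: List[Route]) -> Route:
    """Hypothetical layering rule: the most severe decision from any layer wins."""
    return max(decisions, key=SEVERITY.__getitem__, default="allow")

# Example: this signal alone says "revise", but another layer asks for review.
print(route_for_score(3))                             # revise
print(combine([route_for_score(3), "human_review"]))  # human_review
```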

scan_unicode_evasion is available through local MCP at no LLM cost, and through the remote MCP endpoint with free, rate-limited access that needs no authentication. See all detection methodologies, and the dedicated methodology page for the deepest treatment of this signal.
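MCP tool invocations are JSON-RPC 2.0 `tools/call` requests that carry the tool `name` and an `arguments` object. This Python sketch only builds such a request. The `text` argument key is an assumption, so check the input schema the server advertises via `tools/list`. Transport (typically stdio for a local server, HTTP for a remote endpoint) is out of scope here.

```python
import json

def build_scan_request(text: str, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request for scan_unicode_evasion."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "scan_unicode_evasion",
            # Hypothetical argument key; confirm it against the tool's inputSchema.
            "arguments": {"text": text},
        },
    }
    # ensure_ascii=True (the default) escapes non-ASCII, so invisible
    # characters show up as \uXXXX sequences in logs.
    return json.dumps(request, ensure_ascii=True)

print(build_scan_request("pay\u200bpal"))
```

Keeping ASCII-escaped output has a side benefit for this tool: a hidden U+200B in the payload appears as a visible `\u200b` escape in logs instead of vanishing.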