2026-05-13

Benchmarking AI detectors on the routing decision production teams actually need

Why VeracityAPI will report binary flagging metrics alongside workflow-routing F1, with caveats and reproducibility gates.


Most detector comparisons ask whether a model can label content as AI or human. Production teams usually need a different answer: should this content ship, be revised, go to human review, or be rejected?

Our benchmark program will report conventional binary metrics and a routing-action metric. The routing metric is not a claim that competitors are bad detectors; it measures the workflow contract VeracityAPI is built around.
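To make the difference between the two metric families concrete, here is a minimal sketch. It is not VeracityAPI's published scoring code. It assumes the routing metric is a macro-averaged F1 over the four actions named above; the label strings, function names, and the choice of macro averaging are all illustrative assumptions.

```python
# Hypothetical action labels, taken from the four outcomes named in the text.
ACTIONS = ("ship", "revise", "human_review", "reject")


def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when the class never appears."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def binary_f1(gold, pred):
    """Conventional flagging metric: F1 for the positive (True = flagged) class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g and p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    return f1(tp, fp, fn)


def routing_macro_f1(gold, pred):
    """Assumed routing metric: unweighted mean of per-action F1 scores."""
    pairs = list(zip(gold, pred))
    scores = []
    for action in ACTIONS:
        tp = sum(1 for g, p in pairs if g == action and p == action)
        fp = sum(1 for g, p in pairs if g != action and p == action)
        fn = sum(1 for g, p in pairs if g == action and p != action)
        scores.append(f1(tp, fp, fn))
    return sum(scores) / len(scores)
```

The binary function scores a single yes/no flag, while the routing function scores every action separately. Under macro averaging, a detector that never predicts "revise" gets an F1 of zero for that action, so the gap stays visible even if its binary flags are mostly right.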

We will not publish named competitor numbers until vendor terms are cleared, corpus licensing is settled, and the benchmark artifacts are frozen. The longer-form rationale is in 'Why we don't publish competitor benchmark numbers (yet)'; the short version is that a benchmark done badly is worse than no benchmark.

Required caveat: VeracityAPI is a workflow-routing API, not forensic authorship proof. See /methodology for what we claim and don't claim.

About the author

Bernard Huang · Founder, VeracityAPI

Co-founded Clearscope and bootstrapped it to 7-figure ARR over 10 years of working with editorial and content teams at companies like Nvidia, HubSpot, Adobe, IBM, and Condé Nast. Now building VeracityAPI — content trust infrastructure for autonomous agent workflows.
