2026-05-13

Benchmarking AI detectors on the routing decision production teams actually need

Why VeracityAPI will report binary flagging metrics alongside workflow-routing F1, with caveats and reproducibility gates.


Most detector comparisons ask whether a model can label content as AI or human. Production teams usually need a different answer: should this content ship, be revised, go to human review, or be rejected?

Our benchmark program will report both conventional binary metrics (AI-vs-human flagging) and a routing-action metric scored over the four workflow actions. The routing metric is not a claim that competitors are bad detectors; it measures the workflow contract VeracityAPI is built around.
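As a rough illustration of how a routing-action metric differs from a binary flag, the sketch below scores predicted workflow actions against gold actions with macro-averaged F1 over the four routes. The action names and the scoring choice (macro F1, equal weight per action) are assumptions for illustration, not the published benchmark definition.

```python
from typing import List

# Hypothetical routing actions; the real benchmark may name or weight them differently.
ACTIONS = ["ship", "revise", "human_review", "reject"]

def macro_f1(gold: List[str], pred: List[str], labels: List[str]) -> float:
    """Equal-weight (macro) F1 over routing actions.

    Unlike a binary AI/human flag, every mis-routed item counts against
    the specific action it should have received.
    """
    f1_scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(labels)

# Toy example with invented labels, purely to show the mechanics:
gold = ["ship", "revise", "reject", "ship", "human_review"]
pred = ["ship", "ship", "reject", "ship", "human_review"]
print(macro_f1(gold, pred, ACTIONS))  # -> 0.7
```

Note that collapsing these four actions to a binary flag would hide the "revise" miss above entirely, which is the gap the routing metric is meant to expose.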

We will not publish named competitor numbers until vendor terms, corpus licensing, and frozen evaluation artifacts are in place.

Required caveat: This is not forensic authorship proof. VeracityAPI reports workflow-risk signals and routing actions.