2026-05-15

Why the AI-detection category is splitting

The category that started as 'is this written by AI?' is becoming two categories. Where each one is heading, who they're built for, and why a single product can't do both jobs well.


Three years ago, 'AI detection' was a single category. You uploaded text to a tool, and it returned a probability that the text was generated by an AI. The use cases were narrow but coherent: academic integrity, editorial review, hiring portfolio review, journalism.

Sometime around the second half of 2024, the category started bifurcating. By early 2026 — where we are now — it's clearly become two categories with different buyers, different response shapes, and different success metrics. I want to lay out what's happening, because the bifurcation has implications for how teams should evaluate the products in this space.

Category 1: Authorship likelihood detection. This is where the category started. GPTZero, Originality.ai, Copyleaks' AI module, and several others occupy this space well. The job is to provide a probability score that a human will read and interpret. The product surface is built around that: a probability output, often a confidence interval, and a UI that explains the score in plain language for a human reviewer.

Authorship-likelihood detection is settling into mature category dynamics. The buyers are institutions (universities, school districts, large publishers, hiring platforms). The procurement shape is recognizable: annual contracts, security reviews, dedicated support. The competitive axes are accuracy on adversarial samples, false-positive rates, and brand recognition with institutional buyers.

Category 2: Workflow-routing APIs. This is the category VeracityAPI is in, and it's still being defined. The job is to take content and return a deterministic action that automation can execute. The product surface is built around that: a structured response with an action label, a machine-readable evidence array, and fixes that a rewrite agent can consume as prompts.
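To make that response shape concrete, here is a minimal sketch of what a consumer might parse. Every name in it is an assumption made for illustration: the field names (`action`, `evidence`, `fixes`), the evidence keys, and the three action labels are hypothetical and are not taken from VeracityAPI's documented schema.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical label set; the real API's actions are not specified in this post.
KNOWN_ACTIONS = {"publish", "rewrite", "hold_for_review"}


@dataclass
class Evidence:
    signal: str  # machine-readable identifier for what was detected
    start: int   # character offset where the flagged span begins
    end: int     # character offset where the flagged span ends


@dataclass
class RoutingResponse:
    action: str  # what the pipeline should do next
    evidence: list[Evidence] = field(default_factory=list)
    fixes: list[str] = field(default_factory=list)  # prompts a rewrite agent could consume


def parse_response(payload: dict[str, Any]) -> RoutingResponse:
    """Validate a decoded JSON payload and convert it to a typed response."""
    action = payload["action"]
    if action not in KNOWN_ACTIONS:
        # Fail loudly rather than guess what an unknown label means.
        raise ValueError(f"unknown action: {action!r}")
    evidence = [Evidence(**item) for item in payload.get("evidence", [])]
    return RoutingResponse(action, evidence, list(payload.get("fixes", [])))


example = parse_response({
    "action": "rewrite",
    "evidence": [{"signal": "generic_phrasing", "start": 0, "end": 42}],
    "fixes": ["Replace the opening paragraph with a concrete example."],
})
print(example.action, len(example.evidence), example.fixes[0])
```

The point of the sketch is that nothing in the payload needs human interpretation: the caller validates the label and dispatches on it.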

Workflow-routing buyers look very different. They're developers building autonomous agents, content platforms with programmatic publishing pipelines, and AI infrastructure teams curating training data. The procurement shape is API-tier: pay-per-call, self-serve, no procurement cycle. The competitive axes are routing-action accuracy, integration cost, and multimodal coverage under one response shape.

Why is this happening now? Two reasons, both downstream of the same broader shift.

First, the volume of agent-driven content has grown to the point where workflow-routing is the dominant use case by call volume. Academic integrity is real but bounded — there are only so many student essays in the world. Programmatic content factories, RAG pipelines, and agent workflows generate orders of magnitude more decisions per day. A product designed for the bounded use case can't economically serve the unbounded one without warping its response shape.

Second, the response shapes that serve a human reviewer and the response shapes that serve an autonomous agent are genuinely different. A probability score is the right shape for a human: it leaves the judgment to the reviewer. It is the wrong shape for an agent: it forces the developer to write thresholding code that becomes brittle as the underlying model evolves. The action label is the right shape for the agent; to a human reviewer it would be insulting, because it makes the decision for them.
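Here is a rough sketch of that difference in integration code. The threshold value, the action labels, and the handler names are all assumptions chosen for illustration; they don't come from any particular product.

```python
from typing import Callable

# Probability-style integration: the caller owns the threshold.
# 0.80 is a hypothetical cutoff tuned against one version of a detector.
AI_THRESHOLD = 0.80


def route_by_score(score: float) -> str:
    """Turn a probability into a decision using a caller-maintained cutoff."""
    return "hold_for_review" if score >= AI_THRESHOLD else "publish"


# Action-style integration: the caller only dispatches on a label.
# Labels and handlers here are hypothetical.
HANDLERS: dict[str, Callable[[str], str]] = {
    "publish": lambda doc: f"published: {doc}",
    "rewrite": lambda doc: f"sent to rewrite agent: {doc}",
    "hold_for_review": lambda doc: f"queued for review: {doc}",
}


def route_by_action(action: str, doc: str) -> str:
    """Execute the handler for a label; unknown labels fall back to review."""
    handler = HANDLERS.get(action, HANDLERS["hold_for_review"])
    return handler(doc)


# The same document scored by two hypothetical detector versions:
# a small calibration shift flips the thresholded decision.
print(route_by_score(0.82))  # score under the older version
print(route_by_score(0.78))  # score under the newer version
print(route_by_action("rewrite", "draft-17"))
```

In the first style, every recalibration of the detector means re-tuning `AI_THRESHOLD` on the caller's side. In the second, keeping the label stable across versions is the provider's job, which is why stability becomes the metric that matters.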

You can imagine a product that returns both: a probability AND an action label. Some products do try this. The problem is that the response shape determines the product's center of gravity. If you optimize for probability accuracy on adversarial samples (the authorship-likelihood metric), you don't optimize for routing-action stability across versions. If you optimize for routing-action stability, the probability isn't your headline metric, and authorship-likelihood buyers correctly perceive the product as not built for them.

Most products will pick a side, even if they keep marketing copy that suggests they serve both. I think VeracityAPI is on the workflow-routing side definitively — the response shape, the documentation, the pricing model, the integration patterns are all designed for the second category. I think GPTZero, Originality.ai, and Copyleaks are on the authorship-likelihood side definitively, even though all of them have shipped API products that look superficially similar to VeracityAPI's.

What this means for buyers: pick by the question your workflow is actually asking. If a human reviewer is going to read the score and decide, pick a Category-1 product. If your code is going to read the response and execute, pick a Category-2 product. Many teams legitimately use one of each, in different layers of the same stack.

What this means for the category: the next 18 months will probably see explicit specialization. Category-1 products will get more sophisticated UIs for human reviewers (explanation interfaces, confidence intervals, sample-comparison tools). Category-2 products will get more sophisticated response shapes for autonomous agents (richer evidence arrays, structured rewrite prompts, multimodal routing). The middle won't hold.

The 2026 benchmark program VeracityAPI is publishing will report both binary-flagging F1 (the Category-1 metric) AND routing-action F1 (the Category-2 metric), with frozen artifacts. Not because we expect to dominate the Category-1 leaderboard — we don't — but because the comparison only makes sense if both metrics are visible at the same time. If you only see one number, you can't tell which category the product is optimized for.
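For readers who want to see how the two numbers differ, here is a minimal sketch. It assumes routing-action F1 is macro-averaged over the action labels, which is my assumption for this example and not a statement of the benchmark's methodology. The labels and sample data are made up.

```python
from typing import Hashable, Sequence


def f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable) -> float:
    """F1 for one class, treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label that appears."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1(y_true, y_pred, label) for label in labels) / len(labels)


# Binary flagging: was the item flagged as AI-generated? (made-up data)
flag_true = [True, True, False, False]
flag_pred = [True, False, False, True]
binary_flagging_f1 = f1(flag_true, flag_pred, positive=True)

# Routing: was the right action chosen? (hypothetical labels, made-up data)
action_true = ["publish", "rewrite", "hold_for_review", "publish"]
action_pred = ["publish", "rewrite", "publish", "publish"]
routing_action_f1 = macro_f1(action_true, action_pred)

print(round(binary_flagging_f1, 3), round(routing_action_f1, 3))
```

The two scores are computed over different label spaces, so a product can do well on one and poorly on the other. That is the reason to report them side by side.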

Required caveat: VeracityAPI is a workflow-routing API, not forensic authorship proof. See /methodology for what we claim and don't claim.

About the author

Bernard Huang · Founder, VeracityAPI

Co-founded Clearscope and bootstrapped it to 7-figure ARR over 10 years of working with editorial and content teams at companies like Nvidia, HubSpot, Adobe, IBM, and Condé Nast. Now building VeracityAPI — content trust infrastructure for autonomous agent workflows.
