Not authorship evidence. A specificity and trust gate for agents.
Binary AI detection is brittle. VeracityAPI is built for the more useful workflow question: should this text or image be allowed, revised, reviewed, or rejected before an agent uses it?
What model is doing the scoring?
v0.1 uses a structured LLM scoring pass via Anthropic Haiku, temperature 0, with schema-constrained tool output and deterministic post-processing. It is not a fine-tuned classifier yet.
What does it measure?
Observable signals: specificity gaps, generic phrasing, unsupported or weakly sourced claims, low-value/slop patterns, and synthetic-looking texture. The useful target is workflow risk, not ground-truth authorship.
What changed after early audit?
Public positioning now says content trust/specificity scoring, not AI-content detection. Responses include additive derived fields: content_trust_score, specificity_risk, provenance_weakness, and synthetic_texture_risk.
What should agents do with it?
Use it as a quality gate. Good output: allow. Generic but fixable output: revise. High-risk source/citation/training input: human_review or reject depending on intended_use.
Traffic-light action matrix
- Green = allow
- Yellow = revise
- Orange = human_review
- Red = reject
| risk level | score threshold | workflow route |
|---|---|---|
| low | text < 0.40 / audio < 0.30 | allow for normal workflows |
| medium | text < 0.70 / audio < 0.55 | revise for publish, human_review for cite/train |
| high | text ≥ 0.70 / audio ≥ 0.55 | human_review or reject depending on intended_use |
`risk_level = max(synthetic_risk, slop_risk)`
`recommended_action = policy(risk_level, intended_use)`
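The thresholds in the table and the two formulas above can be sketched as a small routing policy. This is an illustrative sketch, not the real VeracityAPI client: the function signatures and the exact medium-risk branching are assumptions built from the table rows.

```python
# Threshold values taken from the routing table above.
TEXT_THRESHOLDS = {"low": 0.40, "medium": 0.70}   # >= 0.70 is high
AUDIO_THRESHOLDS = {"low": 0.30, "medium": 0.55}  # >= 0.55 is high

def risk_level(synthetic_risk: float, slop_risk: float, media: str = "text") -> str:
    """risk_level = max(synthetic_risk, slop_risk), bucketed per media type."""
    score = max(synthetic_risk, slop_risk)
    t = TEXT_THRESHOLDS if media == "text" else AUDIO_THRESHOLDS
    if score < t["low"]:
        return "low"
    if score < t["medium"]:
        return "medium"
    return "high"

def policy(level: str, intended_use: str) -> str:
    """recommended_action = policy(risk_level, intended_use)."""
    if level == "low":
        return "allow"
    if level == "medium":
        # Medium risk: revise for publishing, human review for cite/train inputs.
        return "revise" if intended_use == "publish" else "human_review"
    # High risk: reject for cite/train, otherwise escalate to a human.
    return "reject" if intended_use in ("cite", "train") else "human_review"
```

For example, a text score of 0.50 destined for publishing routes to `revise`, while the same score on a cite/train path routes to `human_review`.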
Image and audio scoring v0.1
POST /v1/analyze accepts {type:"image",content:"https://..."}. VeracityAPI calls a vision LLM with a constrained visible-artifact rubric and returns synthetic_image_risk (with a synthetic_risk alias), evidence, fixes, a trust score, a risk level, and a recommended action. No image bytes are stored; logging is limited to a URL hash plus hostname. C2PA/EXIF/provenance verification is on the roadmap and is not claimed in v0.1.
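A minimal sketch of the image request body and the privacy-limited log line described above. The SHA-256 hash choice and the log field names are assumptions; the point is that only a URL hash and hostname are retained, never the image bytes or the full URL.

```python
import hashlib
from urllib.parse import urlparse

def build_image_request(url: str) -> dict:
    # Request shape from the docs: {type:"image", content:"https://..."}
    return {"type": "image", "content": url}

def log_fields(url: str) -> dict:
    # Assumed logging scheme: hash the full URL, keep only hash + hostname.
    return {
        "url_hash": hashlib.sha256(url.encode("utf-8")).hexdigest(),
        "hostname": urlparse(url).hostname,
    }
```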
POST /v1/analyze accepts {type:"audio",content:"https://..."}, plus an optional caller-supplied transcript. VeracityAPI sends capped audio bytes to Gemini for strict synthetic-audio workflow triage and returns a Gemini-generated transcript alongside synthetic_audio_risk, workflow_risk, evidence, fixes, a trust score, a risk level, and a recommended action. No audio bytes, base64, or full audio URLs are stored. The result is not proof of AI generation, voice cloning, speaker identity, or a forensic determination.
Known limitations
- Does not prove whether text was written by AI or a human.
- Good AI-assisted writing with concrete details may pass.
- Weak human writing may be flagged; this is intentional when the text is generic or unsupported.
- English-calibrated first; non-English scoring is experimental until evals exist.
- Latency is LLM-bound; use /v1/analyze-batch for pipeline batches and /v1/balance for preflight spend checks.
- Audio scoring is intentionally strict for triage and can produce false positives on compressed, edited, or unusually clean human recordings.
Near-term roadmap
- Publish a labeled calibration set with false-positive slices.
- Add configurable risk tolerance: lenient, standard, strict.
- Add a fast heuristic prefilter before the full evidence pass.
- Add async batch/webhooks after synchronous batch usage is proven.