The implementation, step by step.
This page covers what's running inside the API call: which model, which prompt structure, which post-processing, which thresholds, and what changed in each version. For the epistemology — what VeracityAPI claims to measure and what it doesn't — see /methodology.
The scoring pipeline, end to end
One POST /v1/analyze call goes through the following pipeline. Each step is deterministic given the previous step's output, except the model call (step 3): it runs at temperature=0 and is empirically very stable across reruns, but it is still a model, so its output is not guaranteed to be bit-identical.
- Validation — request schema checked via zod; rejects malformed inputs at the worker boundary with a 400 + structured error.
- Pre-processing — content normalized; for image, the URL is fetched in a sandboxed worker context with strict size and timeout limits.
- Model call — text uses Anthropic Haiku with temperature=0 and schema-constrained tool output. Image uses a vision LLM with a visible-artifact rubric.
- Post-processing — model outputs run through deterministic rules: risk levels mapped from raw scores via fixed thresholds, recommended_action derived from risk_level + intended_use via the policy table, evidence categories normalized.
- Response shaping — the final JSON is assembled with stable field names. Billing metadata is computed and the analysis_id minted (ULID format).
- Persistence — analysis_id, timestamps, billing, and routing decision logged to D1; raw submitted content is NOT stored unless store_content:true was explicitly set.
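The validation step above can be sketched roughly as follows. The production worker uses zod; this version is hand-rolled so that it runs without dependencies. The flat request shape, the function names and the layout of the structured error (code, field, message) are assumptions made for illustration, not the documented wire format.

```typescript
// Hypothetical request shape: the docs name type, content, intended_use and
// store_content, but the exact nesting and the allowed values are assumed here.
type AnalyzeRequest = {
  type: "text" | "image";
  content: string;
  intended_use?: string;
  store_content?: boolean;
};

// Hypothetical structured-error layout for the 400 response.
type ValidationResult =
  | { ok: true; value: AnalyzeRequest }
  | { ok: false; status: 400; error: { code: string; field: string; message: string } };

function fail(field: string, message: string): ValidationResult {
  return { ok: false, status: 400, error: { code: "invalid_request", field, message } };
}

function validateAnalyzeRequest(body: unknown): ValidationResult {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return fail("body", "expected a JSON object");
  }
  const { type, content, intended_use, store_content } = body as Record<string, unknown>;
  if (type !== "text" && type !== "image") {
    return fail("type", 'expected "text" or "image"');
  }
  if (typeof content !== "string" || content.length === 0) {
    return fail("content", "expected a non-empty string");
  }
  if (intended_use !== undefined && typeof intended_use !== "string") {
    return fail("intended_use", "expected a string");
  }
  if (store_content !== undefined && typeof store_content !== "boolean") {
    return fail("store_content", "expected a boolean");
  }
  return {
    ok: true,
    value: {
      type,
      content,
      intended_use: intended_use as string | undefined,
      store_content: store_content as boolean | undefined,
    },
  };
}
```

Rejecting at the boundary keeps every later step free of defensive checks, which is what lets the post-processing rules stay purely deterministic.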
Text scoring (v0.1)
Model: Anthropic Haiku, temperature=0, schema-constrained tool output.
Prompt structure: The model receives the content plus context (format, intended_use, domain) and is asked to return a structured tool call with the evidence array, primary_reason, and individual risk scores. No free-form output.
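As a rough illustration of what a schema-constrained tool call looks like, the sketch below defines a tool in the shape the Anthropic Messages API expects (name, description, input_schema) and forces its use via tool_choice. The tool name, the evidence item fields and the descriptions are hypothetical; only the top-level signal names come from this page, and the real prompt and schema are not published here.

```typescript
// Each risk score is modelled as a plain number. The 0-to-1 range is an
// assumption inferred from the 0.40 / 0.70 thresholds, not a documented fact.
const scoreField = { type: "number", description: "Risk score, assumed 0 to 1" } as const;

// Hypothetical tool definition in the Anthropic tool-use format.
const reportAnalysisTool = {
  name: "report_analysis", // hypothetical name
  description: "Return the structured risk analysis for the submitted content.",
  input_schema: {
    type: "object",
    properties: {
      specificity_risk: scoreField,
      provenance_weakness: scoreField,
      slop_risk: scoreField,
      synthetic_texture_risk: scoreField,
      primary_reason: { type: "string" },
      evidence: {
        type: "array",
        // Hypothetical item shape: the page only says evidence is an array.
        items: {
          type: "object",
          properties: {
            category: { type: "string" },
            note: { type: "string" },
          },
          required: ["category", "note"],
        },
      },
    },
    required: [
      "specificity_risk",
      "provenance_weakness",
      "slop_risk",
      "synthetic_texture_risk",
      "primary_reason",
      "evidence",
    ],
  },
} as const;

// Forcing this tool means the model cannot answer in free-form prose.
const toolChoice = { type: "tool", name: reportAnalysisTool.name } as const;
```

The content and its context (format, intended_use, domain) would travel in the user message; the tool schema only constrains the shape of the reply.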
Signals scored: specificity_risk, provenance_weakness, slop_risk, synthetic_texture_risk. The rollup is risk_level = bucket(max(synthetic_risk, slop_risk)).
Calibration: 0.871 macro F1 on the 500-item seed corpus across human firsthand, dry factual, generic slop, polished AI-with-specifics, and adversarial samples.
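For readers unfamiliar with the metric: macro F1 computes an F1 score per class and averages them without weighting, so a small class counts as much as a large one. The sketch below shows that arithmetic on made-up labels. The label set and the function name are illustrative assumptions; this page does not specify which classes the 0.871 figure was computed over.

```typescript
// Macro F1: unweighted mean of per-class F1 scores.
// F1 for one class = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
function macroF1(actual: string[], predicted: string[]): number {
  if (actual.length === 0 || actual.length !== predicted.length) {
    throw new Error("actual and predicted must be non-empty and the same length");
  }
  // Collect the distinct labels seen on either side.
  const labels: string[] = [];
  for (const label of actual.concat(predicted)) {
    if (labels.indexOf(label) === -1) labels.push(label);
  }
  let sum = 0;
  for (const label of labels) {
    let tp = 0;
    let fp = 0;
    let fn = 0;
    for (let i = 0; i < actual.length; i++) {
      const isActual = actual[i] === label;
      const isPredicted = predicted[i] === label;
      if (isActual && isPredicted) tp++;
      else if (!isActual && isPredicted) fp++;
      else if (isActual && !isPredicted) fn++;
    }
    const denominator = 2 * tp + fp + fn;
    sum += denominator === 0 ? 0 : (2 * tp) / denominator;
  }
  return sum / labels.length;
}

// Made-up example: one "low" item is mispredicted as "high".
const exampleScore = macroF1(
  ["low", "low", "high", "high"],
  ["low", "high", "high", "high"],
);
// Per-class F1 is 2/3 for "low" and 4/5 for "high", so the macro average is 11/15.
```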
Image scoring (v0.1)
Model: A vision LLM with a structured visible-artifact rubric.
Input: Public HTTPS image URL. The image is fetched in a sandboxed context, scored, and discarded — no bytes stored.
Signals scored: synthetic_image_risk (alias: synthetic_risk), plus typed evidence categories (synthetic_texture, geometry_inconsistency, text_artifact, lighting_mismatch).
Known limit: v0.1 does NOT inspect EXIF or C2PA metadata. The signal is visual-only. Provenance verification is on the v0.2 roadmap.
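Callers can save a round trip by checking the image URL before submitting it. The following is a minimal client-side sketch of an HTTPS-only check that also rejects obviously non-public hosts. The function name and the exact rejection rules are assumptions; the service's own sandbox rules are not documented on this page, and this check does not cover the size or timeout limits.

```typescript
// Returns true when the URL is HTTPS and does not point at an obviously
// non-public host. This is a syntactic check only: it does not resolve DNS,
// so a public-looking hostname that resolves to a private address still passes.
function isAcceptableImageUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "https:") return false;

  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return false;

  // IPv6 literals appear bracketed in hostname; this sketch rejects them outright.
  if (host.startsWith("[")) return false;

  // Only apply the private-range test to dotted IPv4 literals, so that a
  // domain such as 10.example.com is not rejected by accident.
  const isIpv4Literal = /^\d{1,3}(\.\d{1,3}){3}$/.test(host);
  const privateV4 = /^(10\.|127\.|169\.254\.|192\.168\.|172\.(1[6-9]|2\d|3[01])\.)/;
  if (isIpv4Literal && privateV4.test(host)) return false;

  return true;
}
```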
Threshold table (the actual numbers)
This is the deterministic mapping from raw scores to risk_level bands. These thresholds may shift across version bumps, so if your code depends on the underlying score thresholds, branch on recommended_action instead.
| Modality | low | medium | high | Notes |
|---|---|---|---|---|
| Text | max(synthetic_risk, slop_risk) < 0.40 | ≥ 0.40 and < 0.70 | ≥ 0.70 | The 'slop or synthetic, whichever is worse' rule. |
| Image | synthetic_image_risk < 0.40 | ≥ 0.40 and < 0.70 | ≥ 0.70 | Vision-rubric output; visual-only at v0.1. |
risk_level = bucket(max(synthetic_risk, slop_risk))
recommended_action = policy(risk_level, intended_use)
The policy function is the table on /methodology — different intended_use values shift the action up or down a band.
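The v0.1 bucketing rule can be written out as a short sketch. The thresholds are the ones in the table above; the function and type names are illustrative, and the guard against non-finite scores is an added assumption. The policy function is not reproduced, because its table lives on /methodology.

```typescript
type RiskLevel = "low" | "medium" | "high";

// Thresholds from the v0.1 table: low below 0.40, medium below 0.70, high otherwise.
function bucket(score: number): RiskLevel {
  if (!Number.isFinite(score)) {
    throw new RangeError("score must be a finite number");
  }
  if (score < 0.4) return "low";
  if (score < 0.7) return "medium";
  return "high";
}

// Text rollup: the worse of the two signals decides the band.
function textRiskLevel(scores: { synthetic_risk: number; slop_risk: number }): RiskLevel {
  return bucket(Math.max(scores.synthetic_risk, scores.slop_risk));
}

// Image rollup: a single signal feeds the same bands.
function imageRiskLevel(scores: { synthetic_image_risk: number }): RiskLevel {
  return bucket(scores.synthetic_image_risk);
}
```

Because the boundaries are inclusive on the upper band, a score of exactly 0.40 is medium and exactly 0.70 is high.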
Version changelog
Major version bumps are documented here; minor calibration changes appear in /changelog.
| Version | Date | What changed |
|---|---|---|
| v0.1 | 2026-Q1 | Initial release. Text scoring on Anthropic Haiku with schema-constrained tool output. Image scoring on vision LLM with visual-artifact rubric. Audio scoring on Gemini with transcript return. Video private beta on Claude Haiku contact-sheet pipeline. |
| v0.2 (planned) | 2026-H2 | Public-source EXIF/C2PA inspection for image scoring. Multilingual text calibration improvements. Async batch endpoints with webhook delivery. Configurable risk-tolerance modes (lenient/standard/strict). |
Known limitations of v0.1
- Does not prove text was AI-written or human-written. The score is workflow risk, not provenance.
- Good AI-assisted writing with concrete details may pass — and should, because the workflow risk is genuinely low.
- Weak human writing may be flagged. That's also working as intended; the signal is helpfulness, not authorship.
- English-first text calibration. Non-English coverage is weaker until the multilingual eval expansion lands.
- Latency is LLM-bound. Use /v1/analyze-batch (1–25 items) and /v1/balance preflight for high-volume workflows.
- Image scoring is visual-only at v0.1; EXIF / C2PA provenance verification is on the v0.2 roadmap.
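Since /v1/analyze-batch takes 1 to 25 items per call, a high-volume caller needs to split its queue into batches first. A minimal sketch of that chunking, with a hypothetical helper name and constant:

```typescript
// Upper bound per batch call, taken from the 1–25 item range stated above.
const MAX_BATCH_SIZE = 25;

// Splits items into consecutive batches of at most `size` elements.
// An empty input yields no batches, so no empty request is ever sent.
function toBatches<T>(items: T[], size: number = MAX_BATCH_SIZE): T[][] {
  if (!Number.isInteger(size) || size < 1 || size > MAX_BATCH_SIZE) {
    throw new RangeError("size must be an integer between 1 and " + MAX_BATCH_SIZE);
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }
  return batches;
}
```

For 60 items this produces batches of 25, 25 and 10, each of which can be sent as one batch request.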
v0.2 roadmap (commitments + maybes)
Committed:
- Public-source EXIF and C2PA inspection for image scoring.
- Multilingual text calibration with published per-language coverage tables.
- Async batch endpoints with webhook delivery for jobs >1000 items.
- Configurable risk-tolerance modes (lenient / standard / strict).
Likely but not committed:
- Fine-tuned classifier replacing the LLM scoring pass for text — pending eval evidence that it improves on the structured-LLM approach.
- Fast heuristic prefilter before the full evidence pass, for cost-sensitive workflows.
- Public-source training-data certification (the dataset behind the multilingual calibration).
Image scoring v0.1
POST /v1/analyze accepts {type:"image",content:"https://..."}, calls a vision LLM with a constrained visible-artifact rubric, and returns synthetic_image_risk (also exposed under the synthetic_risk alias), evidence, fixes, trust score, risk level, and recommended action. VeracityAPI stores no image bytes and logs only a URL hash plus hostname. C2PA/EXIF provenance verification is on the roadmap and is not claimed in v0.1.
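An image request can be assembled as in the sketch below. The base URL is a placeholder and the helper name is hypothetical; authentication headers are left out because this page does not describe them.

```typescript
// Body shape taken from the example above: a type tag plus the image URL.
type ImageAnalyzeBody = { type: "image"; content: string };

type PreparedRequest = {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
};

// Builds the URL and fetch options for an image analysis call.
// baseUrl is supplied by the caller; the host used below is a placeholder.
function buildImageAnalyzeRequest(baseUrl: string, imageUrl: string): PreparedRequest {
  const body: ImageAnalyzeBody = { type: "image", content: imageUrl };
  return {
    url: new URL("/v1/analyze", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

const prepared = buildImageAnalyzeRequest(
  "https://api.example.com", // placeholder host, not the real endpoint
  "https://example.com/photo.jpg",
);
```

The prepared request can then be passed to fetch as `fetch(prepared.url, prepared.init)`, with whatever credentials your account requires added to the headers.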
Known limitations
- Does not prove whether text was written by AI or a human.
- Good AI-assisted writing with concrete details may pass.
- Weak human writing may be flagged — intentionally, if it is generic or unsupported.
- English-calibrated first; non-English scoring is experimental until evals exist.
- Latency is LLM-bound; use /v1/analyze-batch for pipeline batches and /v1/balance for preflight spend checks.
- Audio scoring is intentionally strict for triage and can produce false positives on compressed, edited, or unusually clean human recordings.
Near-term roadmap
- Publish a labeled calibration set with false-positive slices.
- Add configurable risk tolerance: lenient, standard, strict.
- Add a fast heuristic prefilter before the full evidence pass.
- Add async batch/webhooks after synchronous batch usage is proven.