Technical pipeline · how the scoring runs

The implementation, step by step.

This page covers what's running inside the API call: which model, which prompt structure, which post-processing, which thresholds, and what changed in each version. For the epistemology — what VeracityAPI claims to measure and what it doesn't — see /methodology.

Get API key What we claim Evals

The scoring pipeline, end to end

One POST /v1/analyze call goes through the following pipeline. Each step is deterministic given the previous step's output, except for the model call (step 3) which has temperature=0 but is still a model — empirically very stable across reruns, but not bit-identical.

  1. Validation — request schema checked via zod; rejects malformed inputs at the worker boundary with a 400 + structured error.
  2. Pre-processing — content normalized; for image, the URL is fetched in a sandboxed worker context with strict size and timeout limits.
  3. Model call — text uses Anthropic Haiku with temperature=0 and schema-constrained tool output. Image uses a vision LLM with a visible-artifact rubric.
  4. Post-processing — model outputs run through deterministic rules: risk levels mapped from raw scores via fixed thresholds, recommended_action derived from risk_level + intended_use via the policy table, evidence categories normalized.
  5. Response shaping — the final JSON is assembled with stable field names. Billing metadata is computed and the analysis_id minted (ULID format).
  6. Persistence — analysis_id, timestamps, billing, and routing decision logged to D1; raw submitted content is NOT stored unless store_content:true was explicitly set.

Text scoring (v0.1)

Model: Anthropic Haiku, temperature=0, schema-constrained tool output.

Prompt structure: The model receives the content plus context (format, intended_use, domain) and is asked to return a structured tool call with the evidence array, primary_reason, and individual risk scores. No free-form output.

Signals scored: specificity_risk, provenance_weakness, slop_risk, synthetic_texture_risk. The rollup is risk_level = bucket(max(synthetic_risk, slop_risk)).

Calibration: 0.871 macro F1 on the 500-item seed corpus across human firsthand, dry factual, generic slop, polished AI-with-specifics, and adversarial samples.

Image scoring (v0.1)

Model: A vision LLM with a structured visible-artifact rubric.

Input: Public HTTPS image URL. The image is fetched in a sandboxed context, scored, and discarded — no bytes stored.

Signals scored: synthetic_image_risk (alias: synthetic_risk), plus typed evidence categories (synthetic_texture, geometry_inconsistency, text_artifact, lighting_mismatch).

Known limit: v0.1 does NOT inspect EXIF or C2PA metadata. The signal is visual-only. Provenance verification is on the v0.2 roadmap.

Threshold table (the actual numbers)

This is the deterministic mapping from raw scores to risk_level bands. These thresholds may shift across version bumps; if your code is depending on the underlying score thresholds, branch on recommended_action instead.

ModalitylowmediumhighNotes
Textmax(synthetic_risk, slop_risk) < 0.40< 0.70≥ 0.70The 'slop or synthetic, whichever is worse' rule.
Imagesynthetic_image_risk < 0.40< 0.70≥ 0.70Vision-rubric output; visual-only at v0.1.
risk_level = bucket(max(synthetic_risk, slop_risk))
recommended_action = policy(risk_level, intended_use)

The policy function is the table on /methodology — different intended_use values shift the action up or down a band.

Version changelog

Major version bumps are documented here; minor calibration changes appear in /changelog.

VersionDateWhat changed
v0.12026-Q1Initial release. Text scoring on Anthropic Haiku with schema-constrained tool output. Image scoring on vision LLM with visual-artifact rubric. Audio scoring on Gemini with transcript return. Video private beta on Claude Haiku contact-sheet pipeline.
v0.2 (planned)2026-H2Public-source EXIF/C2PA inspection for image scoring. Multilingual text calibration improvements. Async batch endpoints with webhook delivery. Configurable risk-tolerance modes (lenient/standard/strict).

Known limitations of v0.1

  • Does not prove text was AI-written or human-written. The score is workflow risk, not provenance.
  • Good AI-assisted writing with concrete details may pass — and should, because the workflow risk is genuinely low.
  • Weak human writing may be flagged. That's also working as intended; the signal is helpfulness, not authorship.
  • English-first text calibration. Non-English coverage is weaker until the multilingual eval expansion lands.
  • Latency is LLM-bound. Use /v1/analyze-batch (1–25 items) and /v1/balance preflight for high-volume workflows.
  • Image scoring is visual-only at v0.1; EXIF / C2PA provenance verification is on the v0.2 roadmap.

v0.2 roadmap (commitments + maybes)

Committed:

  • Public-source EXIF and C2PA inspection for image scoring.
  • Multilingual text calibration with published per-language coverage tables.
  • Async batch endpoints with webhook delivery for jobs >1000 items.
  • Configurable risk-tolerance modes (lenient / standard / strict).

Likely but not committed:

  • Fine-tuned classifier replacing the LLM scoring pass for text — pending eval evidence that it improves on the structured-LLM approach.
  • Fast heuristic prefilter before the full evidence pass, for cost-sensitive workflows.
  • Public-source training-data certification (the dataset behind the multilingual calibration).

Image scoring v0.1

POST /v1/analyze accepts {type:"image",content:"https://..."}, calls a vision LLM with a constrained visible-artifact rubric, and returns synthetic_image_risk, synthetic_risk alias, evidence, fixes, trust score, risk level, and recommended action. VeracityAPI stores no image bytes and logs only a URL hash plus hostname. C2PA/EXIF/provenance verification is roadmap, not claimed in v0.1.

Known limitations

  • Does not prove whether text was written by AI or a human.
  • Good AI-assisted writing with concrete details may pass.
  • Weak human writing may be flagged — intentionally, if it is generic or unsupported.
  • English-calibrated first; non-English scoring is experimental until evals exist.
  • Latency is LLM-bound; use /v1/analyze-batch for pipeline batches and /v1/balance for preflight spend checks.
  • Audio scoring is intentionally strict for triage and can produce false positives on compressed, edited, or unusually clean human recordings.

Near-term roadmap

  • Publish a labeled calibration set with false-positive slices.
  • Add configurable risk tolerance: lenient, standard, strict.
  • Add a fast heuristic prefilter before the full evidence pass.
  • Add async batch/webhooks after synchronous batch usage is proven.