Pre-indexation health check

SEO helpful-content proxy

Use VeracityAPI as a cheap proxy for the helpful-content question now, rather than waiting eight weeks for Search Console to tell you the answer. Catch the pages that read as generic while they are still cheap to fix.


What we've seen in practice

At Clearscope we tracked thousands of pages across the September 2023 helpful-content update and the March 2024 core update. The pages that lost the most ranking were almost never the ones with technical SEO problems — they were the ones where the content was directionally correct but vague. 'Best travel backpacks for digital nomads' pages that didn't actually name backpacks. 'How to invest in index funds' pages that didn't name index funds. The specificity_risk signal is built around exactly that failure mode, because we watched it cost teams real revenue.

Business value

  • Closes the feedback loop on helpful-content updates. You don't have to wait for Google to demote a page to know it's weak — the signals are visible in the draft.
  • Prioritizes editor time toward the pages most likely to cost you in the next core update, not the pages with the highest keyword volume.
  • Gives the rewrite agent something to chase: specific evidence spans instead of a vague 'add E-E-A-T' note.

Agent job to be done

Be a helpful-content reviewer. For the target query, is this page genuinely useful — concrete examples, original information, evidence, named sources? Or does it summarize what every other page on page one already says?

format: article
intended_use: publish
domain: SEO helpful content / pre-indexation QA

Why FAQ schema is the highest-leverage place to look

Helpful-content drift hides in the FAQ section more than anywhere else. Generators love FAQ schema because it's an easy way to add word count, but the answers are usually the most paraphrased, least-sourced content on the page. If you only have budget to score one section per page, score the FAQ. You'll catch 80% of the drift for 15% of the cost.

When to call VeracityAPI

Before initial publish, and again before pushing a refresh of an already-indexed page. The refresh case matters: most helpful-content damage happens when a generator-driven 'refresh' replaces firsthand sections with paraphrase.

What text to submit

Submit the primary content: title, meta description, H1/H2s, intro, key comparison sections, and conclusion. Exclude global nav, sidebar, footer, author bio, and related-posts blocks. Include FAQ schema text, because that is where slop hides.
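The include/exclude rule above can be sketched as a small assembly step. This is a minimal sketch, not a VeracityAPI schema: the `page` object, its field names (`title`, `metaDescription`, `blocks`, `faq`), and the `role` labels are all hypothetical and stand in for whatever your CMS or scraper produces.

```javascript
// Hypothetical page shape; field names and role labels are illustrative only.
// Blocks whose role is listed here are treated as boilerplate and skipped.
const EXCLUDED_ROLES = new Set([
  "nav",
  "sidebar",
  "footer",
  "author_bio",
  "related_posts",
]);

function buildSubmissionText(page) {
  const parts = [page.title, page.metaDescription];

  for (const block of page.blocks) {
    if (EXCLUDED_ROLES.has(block.role)) continue; // not primary content
    if (block.heading) parts.push(block.heading);
    parts.push(block.text);
  }

  // FAQ schema text is included on purpose.
  for (const item of page.faq ?? []) {
    parts.push(item.question, item.answer);
  }

  // Drop empty or missing pieces and separate the rest with blank lines.
  return parts.filter(Boolean).join("\n\n");
}
```

Keeping the exclusion list in one place makes it easy to audit what the scorer never sees.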

Decision policy

  • allow: low risk AND content_trust_score ≥ 0.65 AND at least one named example per H2.
  • revise: medium risk OR evidence flags thin comparison tables, paraphrased summaries, or unsupported best/safest/cheapest claims.
  • human_review: high risk on YMYL pages (health, money, safety, legal), OR pages targeting commercial-intent queries above $5 CPC. The cost of getting these wrong is too high for autopublish.
  • Section-level rule: if any H2 section scores high risk, treat the page as high risk even if the average is medium.
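The policy above could be encoded roughly as follows. Treat this as a sketch under stated assumptions: only `content_trust_score` and the evidence flag names come from this page, while the input shape (`sections`, `risk`, `evidence[].type`, `namedExamples`, `isYmyl`, `cpcUsd`) is hypothetical. It also takes one reading of the human_review rule (high risk on a page that is YMYL or above $5 CPC) and sends anything that is neither allow nor human_review to revise.

```javascript
// Input field names other than content_trust_score are assumptions.
const RISK_ORDER = { low: 0, medium: 1, high: 2 };
const REVISE_FLAGS = new Set([
  "thin_comparison",
  "paraphrase_summary",
  "unsupported_superlative",
]);

// Section-level rule: the page takes the worst risk of any H2 section,
// so a single high-risk section makes the whole page high risk.
function pageRisk(sections) {
  return sections.reduce(
    (worst, section) =>
      RISK_ORDER[section.risk] > RISK_ORDER[worst] ? section.risk : worst,
    "low"
  );
}

function decide(page) {
  const risk = pageRisk(page.sections);

  // Assumed reading: high risk on a YMYL or high-CPC page goes to a human.
  if (risk === "high" && (page.isYmyl || page.cpcUsd > 5)) {
    return "human_review";
  }

  const flagged = page.sections.some((section) =>
    section.evidence.some((item) => REVISE_FLAGS.has(item.type))
  );
  const everyH2HasExample = page.sections.every(
    (section) => section.namedExamples >= 1
  );

  if (
    risk === "low" &&
    !flagged &&
    page.content_trust_score >= 0.65 &&
    everyH2HasExample
  ) {
    return "allow";
  }

  // Assumed fallback: everything else is sent back for revision.
  return "revise";
}
```

Defaulting to revise rather than allow means a page with missing or unexpected fields is never autopublished by accident.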

Request template

The exact payload shape this use case sends. The sample below uses representative content for this workflow; substitute your own.

curl https://api.veracityapi.com/v1/analyze \
  -H "Authorization: Bearer $VERACITY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"type":"text","content":"The best travel backpacks are durable, affordable, and comfortable. We reviewed top options to help every traveler choose the perfect bag for any trip.","context":{"format":"article","intended_use":"publish","domain":"SEO helpful content / pre-indexation QA"},"store_content":false}'
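For pipelines written in JavaScript, the same request can be sketched with the standard `fetch` API (global in Node 18+ and in browsers). The endpoint, headers, and payload fields mirror the curl sample; the function names and the error handling on non-2xx responses are assumptions added for illustration.

```javascript
const ANALYZE_URL = "https://api.veracityapi.com/v1/analyze";

// Builds the fetch options for one analyze call. Kept separate from the
// network call so the payload can be inspected or logged without sending it.
function buildAnalyzeRequest(content, apiKey) {
  return {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      type: "text",
      content,
      context: {
        format: "article",
        intended_use: "publish",
        domain: "SEO helpful content / pre-indexation QA",
      },
      store_content: false,
    }),
  };
}

// Sends the request and returns the parsed JSON result.
// Assumption: failures are signalled with a non-2xx HTTP status.
async function analyze(content, apiKey) {
  const response = await fetch(ANALYZE_URL, buildAnalyzeRequest(content, apiKey));
  if (!response.ok) {
    throw new Error(`analyze failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

A call would look like `const result = await analyze(sectionText, process.env.VERACITY_API_KEY);`.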

Automation recipe

  • Content agent generates or refreshes the page.
  • Extractor splits by H2 and submits each section separately to /v1/analyze.
  • Worker maps evidence spans back to heading + character offsets so editors can jump to the bad paragraph.
  • Revision agent adds firsthand details (named examples, data tables, screenshots, named sources) before rescoring.
  • Page only enters the publish queue when every section comes back as allow.
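The extractor and offset-mapping steps in this recipe might look like the sketch below. It assumes the page source is Markdown with `## ` headings and that each evidence span carries `start` and `end` character offsets relative to the submitted section text; neither is confirmed by this page, so adapt both to your content format and to the actual response shape.

```javascript
// Splits a Markdown document on H2 headings and records where each section
// starts in the full page. Text before the first H2 (the intro) becomes a
// section with a null heading.
function splitByH2(markdown) {
  const matches = [...markdown.matchAll(/^## +(.*)$/gm)];
  const sections = [];

  const firstStart = matches.length > 0 ? matches[0].index : markdown.length;
  if (firstStart > 0) {
    sections.push({ heading: null, offset: 0, text: markdown.slice(0, firstStart) });
  }

  matches.forEach((match, i) => {
    const end = i + 1 < matches.length ? matches[i + 1].index : markdown.length;
    sections.push({
      heading: match[1].trim(),
      offset: match.index,
      text: markdown.slice(match.index, end),
    });
  });

  return sections;
}

// Translates a section-relative span (assumed shape: { start, end }) into
// page-level character offsets so an editor can jump to the paragraph.
function toPageSpan(section, span) {
  return {
    heading: section.heading,
    start: section.offset + span.start,
    end: section.offset + span.end,
  };
}
```

Offsets here are JavaScript string indices (UTF-16 code units); if the API counts characters differently, the mapping needs a conversion step.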

Evidence spans agents should inspect

  • 'thin_comparison' — comparison tables where rows just list features without differentiating value
  • 'paraphrase_summary' — paragraphs that read like a competitor's H2 reworded
  • 'unsupported_superlative' — best/safest/cheapest without a named comparison or measurement
  • 'missing_firsthand' — entire sections without a named example, screenshot, data point, or original observation

Policy pseudocode

// Handler functions are placeholders for your own pipeline steps.
switch (result.recommended_action) {
  case "allow": continueWorkflow(); break;
  case "revise": rewriteWith(result.evidence, result.recommended_fixes); break;
  case "human_review": queueForHumanReview(result); break;
  case "reject": discardOrRebuild(); break;
  default: queueForHumanReview(result); // unrecognized action: fail closed
}

KPIs to track

  • pre-publish block rate by section
  • % of pages improved (specificity_risk drop) on second pass
  • indexed-page ranking delta for gated vs. ungated cohorts (set up an A/B at the URL-path level)
  • organic traffic retained 90 days after a core update
  • editor queue size — needs to stay sustainable
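Two of these KPIs are simple enough to compute directly from stored results. The sketch below assumes you log one record per scored section with its `heading` and `recommended_action`, and that `specificity_risk` is numeric with lower meaning better; the record shapes and function names are hypothetical.

```javascript
// Pre-publish block rate by section: for each heading, the share of scored
// sections whose recommended action was anything other than "allow".
function blockRateBySection(results) {
  const counts = new Map();
  for (const { heading, recommended_action } of results) {
    const entry = counts.get(heading) ?? { total: 0, blocked: 0 };
    entry.total += 1;
    if (recommended_action !== "allow") entry.blocked += 1;
    counts.set(heading, entry);
  }
  return new Map(
    [...counts].map(([heading, c]) => [heading, c.blocked / c.total])
  );
}

// Share of pages whose specificity_risk dropped between the first and
// second scoring pass. Assumes a numeric score where lower is better.
function secondPassImprovementRate(pages) {
  if (pages.length === 0) return 0;
  const improved = pages.filter(
    (page) => page.secondPass.specificity_risk < page.firstPass.specificity_risk
  );
  return improved.length / pages.length;
}
```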

What can go wrong

  • This is not a direct Google classifier. The signal measures helpfulness, which correlates with how pages fare in helpful-content updates but does not predict those outcomes.
  • Helpful content includes UX, internal linking, reputation, and user-satisfaction signals VeracityAPI can't see.
  • Use evidence spans to fix pages. Chasing the score directly will push you into over-specific, brittle copy.

Cost and latency notes

Analyze only costs $0.005 per 1,000 characters; Analyze + revise (auto_revise=true) costs $0.010 per 1,000 characters. Both are billed in 1,000-character increments, rounded up. Short captions and emails usually cost $0.005; longer pages or chapters scale linearly with length. Current v0.1 latency is LLM-bound, so batch or concurrent orchestration is recommended for high-volume pipelines.
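The pricing arithmetic can be checked with a small estimator. The rates come from the paragraph above; the function name, the mode labels, and the use of micro-dollars (to keep the multiplication in integers) are illustrative choices, and the sketch assumes rounding is applied per request.

```javascript
// Price per 1,000-character block, in millionths of a dollar.
const MICRO_USD_PER_BLOCK = {
  analyze: 5000, // $0.005
  analyze_revise: 10000, // $0.010 with auto_revise=true
};

// Estimated cost in USD for one request of `charCount` characters.
// Partial blocks are rounded up to a full 1,000 characters.
function estimateCostUsd(charCount, mode = "analyze") {
  const blocks = Math.ceil(charCount / 1000);
  return (blocks * MICRO_USD_PER_BLOCK[mode]) / 1e6;
}

console.log(estimateCostUsd(800)); // 0.005
console.log(estimateCostUsd(12400, "analyze_revise")); // 0.13
```

If rounding really is per request, splitting a page into sections costs more than one whole-page call: five 1,200-character sections bill as 10 blocks ($0.05 analyze-only), while the same 6,000 characters in a single request bill as 6 blocks ($0.03). That overhead is the price of section-level evidence.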

Agent evaluation checklist