Community moat protection

UGC moderation triage

When users submit reviews, tips, complaints, scam reports, or community posts, a slop-and-specificity gate sits in the moderation queue and triages them: confident, specific submissions publish; vague, generic ones are held; obvious astroturf goes to the spam pile.


What we've seen in practice

The hardest case in UGC moderation isn't the obvious bot review — it's the LLM-assisted real user. Someone uses a chatbot to 'help me write a review for...' and gets back something plausible but specificity-free. These submissions look human, come from real accounts, and pass most spam classifiers. They fail slop_risk because the chatbot can't write the one thing that makes a review useful: the specific detail the reviewer actually experienced. The gate is built to surface those, because they're the volume problem most marketplaces are quietly facing.

Business value

  • Scales moderation as UGC volume grows without scaling the moderator headcount linearly. The gate handles the obvious bottom and top; humans get the middle.
  • Catches coordinated AI-planted campaigns earlier. A single fake review is hard to spot; ten thousand fake reviews with the same specificity profile are visible from orbit.
  • Preserves the value of the review/tip/report corpus over time. Communities die when bad submissions outnumber good ones; the gate keeps the ratio defensible.

Agent job to be done

Be a frontline moderator with infinite patience and zero ego. Allow low-risk specific submissions. Queue suspicious ones with evidence pinned. Reject obvious AI-generated marketing — but never auto-reject a genuine victim report just because it's poorly written.

format: product_review
intended_use: moderate
domain: UGC moderation / reviews & tips

User-level vs. submission-level scoring

Score individual submissions for the routing decision, but track per-user trends for the campaign-detection job. A user whose last five reviews all scored high on slop_risk, each just barely below the rejection threshold, fits the signature of a fraud farm operating under the per-submission threshold. The aggregation table is where the campaign-detection value lives; the per-submission API call is just the data feeding it.
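A minimal sketch of that per-user check, assuming you store each scored submission's slop_risk as a number between 0 and 1, ordered oldest to newest. The window of five comes from the text above; the function name flagsCampaignPattern and both threshold values are illustrative assumptions, not API fields or recommended settings.

```javascript
// Hypothetical helper: flags a user whose recent submissions all sit in the
// band just under the per-submission rejection threshold.
// Threshold values are illustrative assumptions, not VeracityAPI defaults.
function flagsCampaignPattern(
  scores,
  { windowSize = 5, nearThreshold = 0.6, rejectThreshold = 0.8 } = {}
) {
  if (scores.length < windowSize) return false; // not enough history yet
  const recent = scores.slice(-windowSize);
  // Every recent score is high, yet none would have been rejected on its own.
  return recent.every((s) => s >= nearThreshold && s < rejectThreshold);
}

// Made-up score histories, oldest to newest.
const farmLike = [0.2, 0.71, 0.74, 0.69, 0.77, 0.72];
const organic = [0.1, 0.75, 0.2, 0.3, 0.15];
console.log(flagsCampaignPattern(farmLike)); // true
console.log(flagsCampaignPattern(organic)); // false
```

A flag here is a reason to look closer, not a verdict; combine it with the IP and device signals from your own systems before acting on an account.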

When to call VeracityAPI

On every new UGC submission, edited review, bulk import, or escalated report. Also re-score a user's recent submissions when you aggregate by account to detect campaigns.

What text to submit

These all inform the moderation decision: submission title and body, rating if present, category, target product/place, user-supplied metadata, and the user's moderation history. Of these, submit only the text to the API, and keep identity metadata separately in your own pipeline.
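As a rough illustration of keeping identity data out of the request, the sketch below builds a payload from a submission record. The record's field names (title, body, userId, ipAddress) and the choice to join title and body with a blank line are assumptions; the payload keys follow the request template in this document.

```javascript
// Builds the analyze request body from a submission record.
// The submission field names are hypothetical; only text reaches the payload.
function buildAnalyzePayload(submission) {
  const text = [submission.title, submission.body]
    .filter((part) => typeof part === "string" && part.trim() !== "")
    .join("\n\n");
  return {
    type: "text",
    content: text,
    context: {
      format: "product_review",
      intended_use: "moderate",
      domain: "UGC moderation / reviews & tips",
    },
    store_content: false,
  };
}

const payload = buildAnalyzePayload({
  title: "Airport shuttle",
  body: "Driver arrived 40 minutes late and charged a fee that was not on the booking.",
  userId: "u_123", // stays in your pipeline
  ipAddress: "203.0.113.7", // stays in your pipeline
});
console.log(JSON.stringify(payload).includes("u_123")); // false
```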

Decision policy

  • allow: low risk AND specificity_risk ≤ 0.30. Routes to public publication.
  • Default for medium risk: allow under the moderate policy. Local product policy may override (a marketplace with high fraud exposure should hold medium risk).
  • human_review: high specificity/slop risk on accusatory reviews, safety claims, or promotional/astroturf signals.
  • reject: high risk combined with spam signals (duplicate text across users, link abuse, banned-account history, IP/device clustering).
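The policy above could be sketched as a single routing function. The input names riskLevel, specificityRisk, and hasSpamSignals are assumptions: the first two stand in for whatever the API response exposes, and the third comes from your own spam and reputation systems. The policy does not say what happens to a low-risk submission with specificity_risk above 0.30, so sending it to human review is this sketch's own conservative choice.

```javascript
// Sketch of the decision policy; all input names are assumed, not API fields.
function routeSubmission(
  { riskLevel, specificityRisk, hasSpamSignals },
  { holdMediumRisk = false } = {}
) {
  if (riskLevel === "low") {
    // Low risk publishes only when specificity_risk is at or under 0.30.
    // The human_review fallback is an assumption of this sketch.
    return specificityRisk <= 0.3 ? "allow" : "human_review";
  }
  if (riskLevel === "medium") {
    // Default is allow; high-fraud marketplaces can hold medium risk instead.
    return holdMediumRisk ? "human_review" : "allow";
  }
  if (riskLevel === "high") {
    // Reject only when independent spam signals agree with the score.
    return hasSpamSignals ? "reject" : "human_review";
  }
  return "human_review"; // unknown level: let a person decide
}

console.log(routeSubmission({ riskLevel: "low", specificityRisk: 0.2, hasSpamSignals: false })); // allow
console.log(routeSubmission({ riskLevel: "high", specificityRisk: 0.9, hasSpamSignals: true })); // reject
console.log(routeSubmission({ riskLevel: "medium", specificityRisk: 0.5, hasSpamSignals: false }, { holdMediumRisk: true })); // human_review
```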

Request template

The exact payload shape this use case sends. The sample below uses representative content for this workflow; substitute your own.

curl https://api.veracityapi.com/v1/analyze \
  -H "Authorization: Bearer $VERACITY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "text",
    "content": "This company is amazing and everyone should use it. Best service ever and totally safe. I had a perfect experience and recommend it to all travelers.",
    "context": {
      "format": "product_review",
      "intended_use": "moderate",
      "domain": "UGC moderation / reviews & tips"
    },
    "store_content": false
  }'

Automation recipe

  • User submits review/tip/report. Moderation worker fetches the submission.
  • Score with intended_use=moderate, format matching the submission type.
  • Evidence categories become moderator notes — pre-written context for the human reviewer.
  • Trusted submissions publish. Suspicious submissions enter the review queue. Spam-flagged submissions go to the quarantine queue.
  • Aggregate signals by user, IP, and device fingerprint to detect coordinated campaigns. A user whose last five submissions all scored 'generic_endorsement' is suspect even if no single submission tripped the threshold.

Evidence spans agents should inspect

  • 'generic_endorsement' — 'great service, would recommend' without specifics
  • 'astroturf_phrasing' — language that reads like marketing copy disguised as a user review
  • 'unsupported_accusation' — negative reviews with claims that can't be verified ('they stole my money')
  • 'duplicate_pattern' — text that closely matches submissions on other products/places

Policy pseudocode

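// Route on recommended_action; the handler functions are your own.
if (result.recommended_action === "allow") continueWorkflow();
else if (result.recommended_action === "revise") rewriteWith(result.evidence, result.recommended_fixes);
else if (result.recommended_action === "human_review") queueForHumanReview(result);
else if (result.recommended_action === "reject") discardOrRebuild();
else queueForHumanReview(result); // unrecognized action: fail safe to a human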

KPIs to track

  • moderator queue reduction (auto-allow + auto-reject as a share of total volume)
  • false-positive rate on a manual audit sample (target: under 3%)
  • false-negative rate on flagged campaigns
  • median time-to-publish for legitimate submissions
  • campaign-detection lead time (how fast you spot ten thousand fake reviews)
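Three of these KPIs reduce to simple arithmetic, sketched below. The false-positive definition used here (legitimate audited items that the gate held or rejected, divided by all legitimate audited items) is an assumption, as are the record fields legitimate and flagged; the sample numbers are made up for illustration.

```javascript
// Share of total volume that never reached a human moderator.
function queueReduction({ autoAllowed, autoRejected, total }) {
  return total === 0 ? 0 : (autoAllowed + autoRejected) / total;
}

// Assumed definition: flagged legitimate items / all legitimate items.
function falsePositiveRate(auditSample) {
  const legitimate = auditSample.filter((item) => item.legitimate);
  if (legitimate.length === 0) return 0;
  return legitimate.filter((item) => item.flagged).length / legitimate.length;
}

// Median of time-to-publish values, in any consistent unit.
function median(values) {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

console.log(queueReduction({ autoAllowed: 620, autoRejected: 130, total: 1000 })); // 0.75
console.log(median([4, 12, 7, 30])); // 9.5
```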

What can go wrong

  • The gate is not a complete moderation classifier. Pair with spam, abuse, link-detection, and reputation signals.
  • Do not auto-reject serious victim reports just because they're vaguely written. PTSD, language barriers, embarrassment, and rage all reduce specificity in genuine reports.
  • Aggregate signals matter more than single-submission signals. Tune the gate on user-level features (recent submission slop_risk trend, identical phrasing across submissions).
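The victim-report caution can be enforced with a small guard placed after the routing decision. The submission type labels below are hypothetical names from your own pipeline, and applyVictimReportGuard is an illustrative helper, not part of the API.

```javascript
// Submission types that should never be auto-rejected on vagueness alone.
// These labels are hypothetical; use whatever your pipeline calls them.
const PROTECTED_TYPES = new Set(["scam_report", "complaint", "safety_report"]);

// Downgrades an automatic reject to human review for protected types.
function applyVictimReportGuard(action, submissionType) {
  if (action === "reject" && PROTECTED_TYPES.has(submissionType)) {
    return "human_review";
  }
  return action;
}

console.log(applyVictimReportGuard("reject", "scam_report")); // human_review
console.log(applyVictimReportGuard("reject", "product_review")); // reject
console.log(applyVictimReportGuard("allow", "scam_report")); // allow
```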

Cost and latency notes

Analyze only costs $0.005 per 1,000 characters; analyze plus revise (auto_revise=true) costs $0.010 per 1,000 characters. Both are billed in 1,000-character blocks, rounded up. Short captions and emails usually cost $0.005; longer pages or chapters scale linearly with length. Current v0.1 latency is LLM-bound, so batch or concurrent orchestration is recommended for high-volume pipelines.
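The pricing rule works out as follows in a short sketch. The estimateCostUsd helper and its mode names are illustrative, and it assumes characters are counted the way your own code measures string length; how the service counts characters and whether any minimum charge applies are not stated here.

```javascript
// Prices from the text, in thousandths of a dollar per 1,000-character block,
// so the block arithmetic stays in integers.
const MILLS_PER_BLOCK = { analyze: 5, analyzeAndRevise: 10 };

function estimateCostUsd(charCount, mode = "analyze") {
  if (!Object.keys(MILLS_PER_BLOCK).includes(mode)) {
    throw new Error(`Unknown mode: ${mode}`);
  }
  const blocks = Math.ceil(charCount / 1000); // partial blocks bill as full ones
  return (blocks * MILLS_PER_BLOCK[mode]) / 1000;
}

console.log(estimateCostUsd(180)); // 0.005
console.log(estimateCostUsd(1500)); // 0.01
console.log(estimateCostUsd(1500, "analyzeAndRevise")); // 0.02
```

At this rate a 1,500-character review costs the same as a 2,000-character one, so trimming text only saves money when it crosses a 1,000-character boundary.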

Agent evaluation checklist