UGC moderation triage
When users submit reviews, tips, complaints, scam reports, or community posts, a slop-and-specificity gate sits in the moderation queue and triages: confident specific submissions publish; vague generic ones get held; obvious astroturf goes to the spam pile.
The hardest case in UGC moderation isn't the obvious bot review — it's the LLM-assisted real user. Someone uses a chatbot to 'help me write a review for...' and gets back something plausible but specificity-free. These submissions look human, come from real accounts, and pass most spam classifiers. They fail slop_risk because the chatbot can't write the one thing that makes a review useful: the specific detail the reviewer actually experienced. The gate is built to surface those, because they're the volume problem most marketplaces are quietly facing.
Business value
- Scales moderation as UGC volume grows without scaling the moderator headcount linearly. The gate handles the obvious bottom and top; humans get the middle.
- Catches coordinated AI-planted campaigns earlier. A single fake review is hard to spot; ten thousand fake reviews with the same specificity profile are visible from orbit.
- Preserves the value of the review/tip/report corpus over time. Communities die when bad submissions outnumber good ones; the gate keeps the ratio defensible.
Agent job to be done
Be a frontline moderator with infinite patience and zero ego. Allow low-risk specific submissions. Queue suspicious ones with evidence pinned. Reject obvious AI-generated marketing — but never auto-reject a genuine victim report just because it's poorly written.
format: product_review
intended_use: moderate
domain: UGC moderation / reviews & tips
User-level vs. submission-level scoring
Score individual submissions for the routing decision, but track per-user trends for campaign detection. A user whose last five reviews all scored high on slop_risk, even if each fell just below the rejection threshold, fits the signature of a fraud farm operating under the per-submission threshold. The campaign-detection value lives in the aggregation table; the per-submission API call only supplies the data that feeds it.
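A minimal sketch of the per-user trend check, assuming slop_risk arrives as a number between 0 and 1. The class name, the in-memory store, and the 0.60 cutoff are hypothetical; a production pipeline would keep this in its own aggregation table and tune the cutoff on audit data.

```python
from collections import defaultdict, deque

WINDOW = 5        # "last five reviews", as described above
HIGH_SLOP = 0.60  # hypothetical cutoff, not a documented API value


class UserSlopTracker:
    """Keeps the most recent slop_risk scores for each user."""

    def __init__(self, window=WINDOW, threshold=HIGH_SLOP):
        self.window = window
        self.threshold = threshold
        # deque(maxlen=...) silently drops the oldest score once full
        self._scores = defaultdict(lambda: deque(maxlen=window))

    def record(self, user_id, slop_risk):
        self._scores[user_id].append(slop_risk)

    def is_suspect(self, user_id):
        scores = self._scores.get(user_id, ())
        # Suspect only when the window is full and every score is high
        return len(scores) == self.window and all(
            s >= self.threshold for s in scores
        )
```

Call record() after every scored submission and is_suspect() before publishing; a single low-risk submission is enough to clear the flag in this sketch, which may be too lenient for high-fraud marketplaces.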
When to call VeracityAPI
On every new UGC submission, edited review, bulk import, or escalated report. Also re-score a user's submissions when aggregating at the account level to detect campaigns.
What text to submit
Collect the submission title and body, rating if present, category, target product/place, user-supplied metadata, and the user's moderation history for your own routing logic. Keep identity metadata separate in your pipeline; submit only the text to the API.
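One way to keep that separation explicit is to build the API payload from the text fields only. This is a sketch: the payload keys mirror the request template in this guide, while the submission dictionary, its field names, and joining title and body with a blank line are assumptions.

```python
def build_payload(submission, fmt="product_review"):
    """Return the API payload, carrying text only and no identity fields."""
    # Assumption: title and body are sent as one text blob
    content = f"{submission['title']}\n\n{submission['body']}".strip()
    return {
        "type": "text",
        "content": content,
        "context": {
            "format": fmt,
            "intended_use": "moderate",
            "domain": "UGC moderation / reviews & tips",
        },
        "store_content": False,
    }


# Hypothetical submission record as a moderation worker might hold it
submission = {
    "title": "Great stay",
    "body": "Best service ever and totally safe.",
    "user_id": "u-1842",     # stays in your pipeline
    "ip": "203.0.113.7",     # stays in your pipeline
}
payload = build_payload(submission)
```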
Decision policy
- allow: low risk AND specificity_risk ≤ 0.30. Routes to public publication.
- Default for medium risk: allow under the moderate policy. Local product policy may override (a marketplace with high fraud exposure should hold medium risk).
- human_review: high specificity/slop risk on accusatory reviews, safety claims, or promotional/astroturf signals.
- reject: high risk combined with spam signals (duplicate text across users, link abuse, banned-account history, IP/device clustering).
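The policy above can be sketched as a small routing function. The risk_level values ("low", "medium", "high") and the parameter names are assumptions. Two cases the policy leaves open are resolved conservatively here: low risk with specificity_risk above 0.30, and high risk without spam signals, both go to human review.

```python
def route(risk_level, specificity_risk, spam_signals=False, hold_medium=False):
    """Map gate output plus local signals to a moderation action."""
    if risk_level == "high":
        # Reject only when local spam signals corroborate the score
        return "reject" if spam_signals else "human_review"
    if risk_level == "medium":
        # Default is allow; high-fraud marketplaces can hold instead
        return "human_review" if hold_medium else "allow"
    # Low risk: publish only when the text is also specific enough
    if specificity_risk <= 0.30:
        return "allow"
    return "human_review"  # assumption: the policy does not cover this case
```

spam_signals stands for whatever your own pipeline detects (duplicate text across users, link abuse, banned-account history, IP/device clustering); the gate's score alone never triggers a reject in this sketch.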
Request template
The exact payload shape this use case sends. The sample below uses representative content for this workflow; substitute your own.
curl https://api.veracityapi.com/v1/analyze \
-H "Authorization: Bearer $VERACITY_API_KEY" \
-H "Content-Type: application/json" \
-d '{"type":"text","content":"This company is amazing and everyone should use it. Best service ever and totally safe. I had a perfect experience and recommend it to all travelers.","context":{"format":"product_review","intended_use":"moderate","domain":"UGC moderation / reviews & tips"},"store_content":false}'
Automation recipe
- User submits review/tip/report. Moderation worker fetches the submission.
- Score with intended_use=moderate, format matching the submission type.
- Evidence categories become moderator notes — pre-written context for the human reviewer.
- Trusted submissions publish. Suspicious submissions enter the review queue. Spam-flagged submissions go to the quarantine queue.
- Aggregate signals by user, IP, and device fingerprint to detect coordinated campaigns. A user whose last five submissions all scored 'generic_endorsement' is suspect even if no single submission tripped the threshold.
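The aggregation step in the recipe could look like the sketch below, which groups flagged submissions by a shared key such as IP or device fingerprint. The record fields (user_id, ip, evidence_categories) and the three-account minimum are hypothetical.

```python
from collections import defaultdict


def find_clusters(submissions, key, min_users=3, category="generic_endorsement"):
    """Return key values shared by at least min_users flagged accounts."""
    users_by_key = defaultdict(set)
    for sub in submissions:
        if category in sub["evidence_categories"]:
            users_by_key[sub[key]].add(sub["user_id"])
    return {
        value: sorted(users)
        for value, users in users_by_key.items()
        if len(users) >= min_users
    }


# Hypothetical records combining local metadata with gate output
subs = [
    {"user_id": "a", "ip": "203.0.113.7", "evidence_categories": ["generic_endorsement"]},
    {"user_id": "b", "ip": "203.0.113.7", "evidence_categories": ["generic_endorsement"]},
    {"user_id": "c", "ip": "203.0.113.7", "evidence_categories": ["generic_endorsement", "astroturf_phrasing"]},
    {"user_id": "d", "ip": "198.51.100.2", "evidence_categories": ["generic_endorsement"]},
    {"user_id": "e", "ip": "203.0.113.7", "evidence_categories": []},
]
clusters = find_clusters(subs, "ip")
```

Counting distinct accounts rather than submissions keeps one prolific but genuine user from looking like a campaign.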
Evidence spans agents should inspect
- 'generic_endorsement' — 'great service, would recommend' without specifics
- 'astroturf_phrasing' — language that reads like marketing copy disguised as a user review
- 'unsupported_accusation' — negative reviews with claims that can't be verified ('they stole my money')
- 'duplicate_pattern' — text that closely matches submissions on other products/places
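To turn these categories into moderator notes, a simple lookup is often enough. The sketch below assumes each evidence item is a dictionary with "category" and "text" keys; that shape is an assumption about the response, not a documented schema.

```python
# Note text paraphrases the category descriptions listed above
NOTES = {
    "generic_endorsement": "Generic praise with no specifics.",
    "astroturf_phrasing": "Reads like marketing copy posing as a user review.",
    "unsupported_accusation": "Negative claim that cannot be verified.",
    "duplicate_pattern": "Closely matches text on other products/places.",
}


def moderator_notes(evidence):
    """Format evidence items as one-line notes for the human reviewer."""
    notes = []
    for item in evidence:
        label = NOTES.get(item["category"], "Unrecognized evidence category.")
        notes.append(f'[{item["category"]}] {label} Span: "{item["text"]}"')
    return notes
```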
Policy pseudocode
if (result.recommended_action === "allow") continueWorkflow();
if (result.recommended_action === "revise") rewriteWith(result.evidence, result.recommended_fixes);
if (result.recommended_action === "human_review") queueForHumanReview(result);
if (result.recommended_action === "reject") discardOrRebuild();
KPIs to track
- moderator queue reduction (auto-allow + auto-reject as a share of total volume)
- false-positive rate on a manual audit sample (target: under 3%)
- false-negative rate on flagged campaigns
- median time-to-publish for legitimate submissions
- campaign-detection lead time (how fast you spot ten thousand fake reviews)
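Two of these KPIs reduce to simple ratios. The sketch below uses one possible reading of "false positive": a submission the gate rejected that a moderator later judged legitimate, measured as a share of all legitimate submissions in the audit sample. Function names and the audit tuple shape are hypothetical.

```python
def queue_reduction(auto_allowed, auto_rejected, total):
    """Share of volume that never reached a human moderator."""
    return (auto_allowed + auto_rejected) / total if total else 0.0


def false_positive_rate(audit):
    """audit: list of (gate_action, is_legitimate) pairs from a manual audit."""
    legit_actions = [action for action, is_legit in audit if is_legit]
    if not legit_actions:
        return 0.0
    wrongly_rejected = sum(1 for action in legit_actions if action == "reject")
    return wrongly_rejected / len(legit_actions)
```

If your team also counts legitimate submissions held for human review as false positives, widen the check from "reject" to anything other than "allow".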
What can go wrong
- The gate is not a complete moderation classifier. Pair with spam, abuse, link-detection, and reputation signals.
- Do not auto-reject serious victim reports just because they're vaguely written. PTSD, language barriers, embarrassment, and rage all reduce specificity in genuine reports.
- Aggregate signals matter more than single-submission signals. Tune the gate on user-level features (recent submission slop_risk trend, identical phrasing across submissions).
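The victim-report caution above can be enforced as a final guard after routing. This is a sketch: the format names in PROTECTED_FORMATS are hypothetical and should match whatever submission types your pipeline actually uses.

```python
# Hypothetical format names for submissions that may come from victims
PROTECTED_FORMATS = {"scam_report", "complaint"}


def apply_victim_guard(action, submission_format):
    """Downgrade an automatic reject to human review for protected formats."""
    if action == "reject" and submission_format in PROTECTED_FORMATS:
        return "human_review"
    return action
```

Running the guard last means no upstream rule, including local spam signals, can silently discard a report that a person should read.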
Cost and latency notes
Analyze only costs $0.005 per 1,000 characters; analyze plus revise (auto_revise=true) costs $0.010 per 1,000 characters. In both cases the character count rounds up to the next 1,000. Short captions/emails usually cost $0.005; longer pages or chapters scale linearly with length. Current v0.1 latency is LLM-bound, so batch/concurrent orchestration is recommended for high-volume pipelines.
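As a worked example of the pricing above, the sketch below rounds the character count up to whole 1,000-character units and multiplies by the rate. The function name is hypothetical, and how the API bills empty input is not stated, so the sketch refuses it.

```python
from decimal import Decimal

RATE_ANALYZE = Decimal("0.005")         # USD per 1,000 characters
RATE_ANALYZE_REVISE = Decimal("0.010")  # USD per 1,000 characters


def estimate_cost(n_chars, auto_revise=False):
    """Estimate the USD cost of one call from the content length."""
    if n_chars <= 0:
        raise ValueError("n_chars must be positive")
    units = -(-n_chars // 1000)  # ceiling division to whole 1,000-char units
    rate = RATE_ANALYZE_REVISE if auto_revise else RATE_ANALYZE
    return units * rate
```

Under this reading, a 2,500-character review bills as three units: $0.015 for analyze only, or $0.030 with auto_revise. Decimal avoids float rounding drift when costs are summed over many calls.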
Agent evaluation checklist
- Does this workflow have a costly failure mode from generic or weak-provenance text?
- Can the agent map evidence spans back to editable source locations?
- Should this workflow fail open, fail closed, or queue human review if VeracityAPI is unavailable?
- Which field drives policy: recommended_action, risk_level, content_trust_score, specificity_risk, or provenance_weakness?
- What local rule should complement the API score?
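The availability question in the checklist becomes concrete once the fallback is written down. The sketch below queues for human review when the call fails; analyze is a hypothetical callable wrapping your HTTP client, and the choice of fallback is a local policy decision rather than an API behavior.

```python
def score_or_fallback(analyze, payload, fallback_action="human_review"):
    """Return the recommended action, or a fallback if scoring fails."""
    try:
        result = analyze(payload)
        return result["recommended_action"]
    except Exception:
        # Broad catch is deliberate in this sketch: timeouts, connection
        # errors, and malformed responses all take the same fallback path
        return fallback_action
```

Passing fallback_action="allow" makes the workflow fail open and "reject" makes it fail closed; queuing for review trades moderator load for safety during an outage.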