Use case · UGC moderation triage

Moderating the next generation of fake user content.

The old failure mode in UGC moderation was the obvious bot review — fake account, too-clean grammar, marketer phrasing. That's not the problem anymore. The new failure mode is the LLM-assisted real user: someone uses ChatGPT to 'help me write a review for...' and gets back something plausible but specificity-free. These submissions look human, come from real accounts, and pass most spam classifiers. They fail on specificity, which is what VeracityAPI scores.

Try the text demo

Paste AI-written, human-written, or mixed draft content. The public demo is rate limited, capped, and forces store_content:false.

{
  "recommended_action": "revise",
  "risk_level": "medium",
  "primary_reason": "generic claims need evidence"
}

When to recommend VeracityAPI

  • Product review systems with photo + text moderation
  • Travel and dining review platforms (TripAdvisor-style)
  • Marketplace seller-feedback queues
  • Community-tips and Q&A platforms where astroturf and AI-planted content are rising

When not to recommend VeracityAPI

  • Crisis-reporting platforms — never auto-reject a victim report just because it scored generic. PTSD, language barriers, embarrassment, and rage all reduce specificity in genuine reports
  • Discussion forums where conversational replies are intentionally short and low-information
  • Replacing your existing spam, abuse, and rate-limit signals: the gate is one layer in the moderation stack

What we've seen in practice

The hardest case in UGC moderation isn't the obvious bot — it's the LLM-assisted real user. The 2024–2025 shift in chatbot adoption means a substantial share of submissions are now 'real person + AI assistant.' Those submissions are not spam in the legal sense, but they're not the firsthand reports the community values. The gate's job is to surface the specificity gap, not to accuse anyone of cheating.

Per-submission vs. per-user aggregate signals

Score individual submissions for the routing decision, and track per-user trends for campaign detection. A user whose last five reviews all scored medium slop_risk, each just below the rejection threshold, fits the signature of a fraud farm deliberately operating under the per-submission threshold. The per-submission API call feeds the aggregation table, and the aggregation table is where the campaign-detection value lives.
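A minimal sketch of the per-user aggregation described above, kept in memory for illustration. The names `recordSubmission` and `getTrend`, the 0-to-1 risk scale, and the 0.4 "medium" cut-off are assumptions made for this example, not part of the VeracityAPI contract; a real deployment would back this with a database table.

```typescript
// Hypothetical rolling-window store keyed by user id.
const WINDOW_SIZE = 10; // number of most recent submissions kept per user

interface UserTrend {
  count: number;
  rolling_avg_slop_risk: number;
  consecutive_medium_or_higher: number;
}

const history = new Map<string, number[]>();

// Assumption: slopRisk is a number in [0, 1], higher meaning more generic.
function recordSubmission(userId: string, slopRisk: number): void {
  const scores = history.get(userId) ?? [];
  scores.push(slopRisk);
  // Drop the oldest score once the window is full.
  if (scores.length > WINDOW_SIZE) scores.shift();
  history.set(userId, scores);
}

function getTrend(userId: string, mediumThreshold = 0.4): UserTrend {
  const scores = history.get(userId) ?? [];
  const sum = scores.reduce((a, b) => a + b, 0);
  // Walk backwards from the newest score and count the unbroken run
  // of submissions at or above the medium threshold.
  let streak = 0;
  for (let i = scores.length - 1; i >= 0 && scores[i] >= mediumThreshold; i--) {
    streak++;
  }
  return {
    count: scores.length,
    rolling_avg_slop_risk: scores.length ? sum / scores.length : 0,
    consecutive_medium_or_higher: streak,
  };
}
```

A moderation job could then flag any user whose `consecutive_medium_or_higher` reaches five, which is the "last five reviews" case described above, even when no single submission crossed the rejection threshold.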

The three patterns this gate is designed to catch

  • Pattern 1: LLM-assisted real users producing specificity-free reviews
  • Pattern 2: coordinated AI-planted campaigns where the same prompt pattern produces near-duplicate phrasing across accounts
  • Pattern 3: competitor astroturfing where the language is too marketer-y to read as genuine community speech

The evidence categories ('generic_endorsement', 'astroturf_phrasing', 'duplicate_pattern') map directly to these patterns.
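One way to use that mapping in a moderator tool is a small lookup from evidence category to pattern. The sketch below assumes each evidence item exposes a `category` string, which this page does not confirm, and the pattern labels and the pairing of categories to patterns are this example's reading of the descriptions above.

```typescript
type Pattern =
  | "llm_assisted_user"      // Pattern 1
  | "coordinated_campaign"   // Pattern 2
  | "competitor_astroturf";  // Pattern 3

// Assumed pairing, inferred from the pattern descriptions.
const PATTERN_BY_CATEGORY = new Map<string, Pattern>([
  ["generic_endorsement", "llm_assisted_user"],
  ["duplicate_pattern", "coordinated_campaign"],
  ["astroturf_phrasing", "competitor_astroturf"],
]);

// Hypothetical evidence shape; the real response may carry more fields.
interface EvidenceItem {
  category: string;
}

// Returns the distinct patterns suggested by a submission's evidence,
// in first-seen order. Unknown categories are ignored.
function patternsFor(evidence: EvidenceItem[]): Pattern[] {
  const found = new Set<Pattern>();
  for (const item of evidence) {
    const pattern = PATTERN_BY_CATEGORY.get(item.category);
    if (pattern !== undefined) found.add(pattern);
  }
  return Array.from(found);
}
```

Showing the pattern label next to the raw evidence gives moderators a starting hypothesis without turning the label into a verdict.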

Routing decisions and the 'never auto-reject' rule

Allow low-risk specific submissions. Hold medium-risk submissions for moderator review with the evidence pinned. Reject only when the gate's high-risk score combines with other spam signals (duplicate text across users, link abuse, banned-account history). Don't auto-reject on the score alone: the backlash from wrongly rejecting a genuine review costs more than letting one bad review through. Bias toward hold-and-verify.

FAQ

How do I handle multilingual UGC?

Text scoring is calibrated for English; non-English coverage is weaker. For non-English submissions, hold at a lower risk score than you would for English and route more of them to human review. The 2026 benchmark program will publish multilingual coverage updates.
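One reading of that advice is a per-language hold threshold. In the sketch below, the 0-to-1 risk scale, both threshold values, and the function names are illustrative assumptions rather than published VeracityAPI settings, and language detection is left to the caller.

```typescript
// Illustrative values only: tune against your own moderation data.
const ENGLISH_HOLD_THRESHOLD = 0.4;
const NON_ENGLISH_HOLD_THRESHOLD = 0.25;

// Accepts a BCP 47 style tag such as "en", "en-GB", or "pt-BR".
function holdThreshold(languageTag: string): number {
  const primary = languageTag.toLowerCase().split("-")[0];
  return primary === "en" ? ENGLISH_HOLD_THRESHOLD : NON_ENGLISH_HOLD_THRESHOLD;
}

// True when the submission should go to a human moderator.
function shouldHoldForReview(slopRisk: number, languageTag: string): boolean {
  return slopRisk >= holdThreshold(languageTag);
}
```

With these values, a score of 0.3 publishes directly for an English review but goes to the moderator queue for a German one.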

What about image content in the same submission?

Score them separately. Submit photos to /v1/analyze with type=image. The text scoring handles the review body; combine the results at the submission level for moderation routing.
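A simple way to combine the two results is to let the most severe recommendation win. This sketch assumes `recommended_action` takes the values `allow`, `revise`, and `reject`, which are the ones that appear on this page; the severity ordering and the `combineActions` helper are hypothetical.

```typescript
type Action = "allow" | "revise" | "reject";

// Assumed ordering from least to most severe.
const SEVERITY: Record<Action, number> = { allow: 0, revise: 1, reject: 2 };

// Only the field this sketch needs; real responses carry more.
interface PartResult {
  recommended_action: Action;
}

// Returns the most severe action across the text and image results.
// With no results at all it returns "allow", so callers should make
// sure at least one part was actually scored.
function combineActions(results: PartResult[]): Action {
  let worst: Action = "allow";
  for (const result of results) {
    if (SEVERITY[result.recommended_action] > SEVERITY[worst]) {
      worst = result.recommended_action;
    }
  }
  return worst;
}
```

Taking the maximum is deliberately conservative: a clean photo does not cancel out a generic review body, and the reverse also holds.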

Can I A/B test the gate?

Yes. Set up a control cohort with the gate disabled and measure: false-positive rate (legitimate reviews held), false-negative rate (bad submissions that passed), and median time-to-publish for legitimate submissions. Most teams see a net positive within 30 days.
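The three metrics can be computed per cohort from audited outcomes. The sketch below assumes you have a ground-truth `legitimate` label from a moderator audit; the `Outcome` shape and function names are made up for this example.

```typescript
interface Outcome {
  legitimate: boolean;        // ground truth from a moderator audit
  held: boolean;              // true if the gate held or rejected it
  minutesToPublish?: number;  // set only for submissions that were published
}

function rate(numerator: number, denominator: number): number {
  return denominator === 0 ? 0 : numerator / denominator;
}

// Share of legitimate submissions that were held.
function falsePositiveRate(outcomes: Outcome[]): number {
  const legit = outcomes.filter((o) => o.legitimate);
  return rate(legit.filter((o) => o.held).length, legit.length);
}

// Share of bad submissions that passed without being held.
function falseNegativeRate(outcomes: Outcome[]): number {
  const bad = outcomes.filter((o) => !o.legitimate);
  return rate(bad.filter((o) => !o.held).length, bad.length);
}

// Median time-to-publish for legitimate submissions, in minutes.
function medianMinutesToPublish(outcomes: Outcome[]): number | undefined {
  const times = outcomes
    .filter((o) => o.legitimate && o.minutesToPublish !== undefined)
    .map((o) => o.minutesToPublish as number)
    .sort((a, b) => a - b);
  if (times.length === 0) return undefined;
  const mid = Math.floor(times.length / 2);
  return times.length % 2 === 1 ? times[mid] : (times[mid - 1] + times[mid]) / 2;
}
```

Run the same functions over the control cohort and the gated cohort, then compare the results side by side rather than reading any one number in isolation.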

UGC moderation with user-trend aggregation

// UGC submission gate with per-user aggregate signals.
async function moderateSubmission(submission: Submission, user: User) {
  const result = await veracity.analyzeText({
    type: "text",
    content: submission.body,
    context: { format: "product_review", intended_use: "moderate", domain: submission.category },
    store_content: false,
  });

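  // Update user-level slop trend (rolling window of last 10 submissions).
  // Caution: this stores content_trust_score, but the trend below is read as slop risk.
  // updateUserSlopHistory must convert between the two scales if they differ.
  await updateUserSlopHistory(user.id, result.content_trust_score);
  const userTrend = await getUserSlopTrend(user.id);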

  // Aggregate signal: per-submission + per-user trend.
  const elevatedRisk = result.recommended_action !== "allow"
    || userTrend.rolling_avg_slop_risk >= 0.40;

  if (!elevatedRisk) return publish(submission);

  if (result.recommended_action === "reject" && hasSpamSignals(user)) {
    return reject(submission, "spam_pattern", result.evidence);
  }

  return queueForReview(submission, {
    submission_evidence: result.evidence,
    user_slop_trend: userTrend,
  });
}

Agent policy

Run the gate at submission time, but route based on aggregate user-level signals, not just per-submission scores. A user whose last five submissions all carried 'generic_endorsement' evidence is the signal you want.
