UGC Moderation Policy

Required governance ugc_moderation_policy

Agent Prompt Snippet

Ensure the project has a UGC moderation policy covering content review workflows, filtering rules, appeal processes, and enforcement.

Purpose

A UGC moderation policy is the operational rulebook that governs how user-generated content—custom maps, skins, chat messages, screenshots, workshop items, player-created levels—is reviewed, filtered, escalated, and acted upon before and after it reaches other players. It bridges the aspirational tone of community guidelines with the engineering reality of content pipelines, defining exactly what happens when a player uploads a texture, posts in chat, or shares a replay.

Games that accept UGC carry unique legal exposure. A single unmoderated CSAM image can trigger platform delisting, criminal investigation, and permanent reputational damage. Hate speech or graphic violence in user chat can result in age-rating reclassification, app store removal, or regulatory fines. COPPA violations from collecting personal data in UGC flows involving children under 13 carry civil penalties of more than $50,000 per violation, a ceiling the FTC adjusts annually for inflation. Platform certification requirements from Sony, Microsoft, Nintendo, and Valve all mandate documented moderation processes before approving UGC-enabled titles.

The moderation policy is not a list of banned words. It is a system design document: it specifies content categories and severity tiers, automated filter configurations, human review queue structures, report-to-action SLAs, appeal workflows, creator accountability measures, and the escalation path from a flagged asset to a final enforcement decision. Without it, moderation is ad hoc—some reports are acted on in minutes while others languish for weeks, enforcement is inconsistent across moderators, and the team has no defensible record when a platform holder or regulator asks how harmful content was handled.

Who needs this document

Persona	Why they need it	How they use it
Sam (Indie Dev)	Platform certification requires documented moderation before UGC features ship; cannot launch without it	Writes the policy before implementing UGC upload flows; references it when configuring automated filters and building the report UI
Claude Code (AI Agent)	Must implement content pipelines, filter integrations, and report endpoints that conform to documented moderation rules	Reads the policy to determine filter thresholds, queue routing logic, and enforcement action mappings before writing moderation service code
Priya (Eng Lead)	Responsible for staffing moderation queues, setting SLAs, and ensuring the pipeline handles volume at scale	Uses the policy to size moderation infrastructure, define on-call rotations, and establish escalation procedures for severity-critical content
Jordan (Trust & Safety Lead)	Owns the day-to-day moderation operation and must train moderators against consistent, documented standards	References the policy for moderator onboarding, calibration sessions, and when adjudicating appeals or edge cases

What separates a good version from a bad one

Criterion 1: Content categories and severity levels are explicit

✓ Strong: “Content is classified into four severity tiers: S1 (Critical — CSAM, credible threats, terrorist content; action within 1 hour, permanent ban, law enforcement referral). S2 (High — hate speech, graphic violence, doxxing; action within 4 hours, 30-day suspension). S3 (Medium — harassment, spam, impersonation; action within 24 hours, warning or 7-day mute). S4 (Low — off-topic, mild profanity, naming violations; action within 72 hours, content removal with notice).”

✗ Weak: “Inappropriate content will be removed and the user may be banned.” (No severity distinction means a slur in chat and a CSAM upload receive the same urgency. Moderators have no framework for prioritization.)
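
Explicit tiers translate directly into data that both the review queue and the enforcement code can share. The following is a minimal sketch, assuming hypothetical names (`Tier`, `TIERS`, `action_deadline`, `enqueue`) and reusing the SLA values from the strong example; a real policy would supply its own numbers and action identifiers.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Tier:
    label: str
    sla: timedelta        # maximum time from report to action
    default_action: str   # hypothetical action identifier


# Values taken from the example policy text; illustrative only.
TIERS = {
    "S1": Tier("Critical", timedelta(hours=1), "permanent_ban_and_law_enforcement_referral"),
    "S2": Tier("High", timedelta(hours=4), "30_day_suspension"),
    "S3": Tier("Medium", timedelta(hours=24), "warning_or_7_day_mute"),
    "S4": Tier("Low", timedelta(hours=72), "content_removal_with_notice"),
}


def action_deadline(tier_code, reported_at):
    """Return the latest time by which a report in this tier must be actioned."""
    return reported_at + TIERS[tier_code].sla


def enqueue(queue, tier_code, reported_at, content_id):
    """Push a report so the most severe tier, then the earliest deadline, pops first."""
    # "S1" < "S2" < "S3" < "S4" as strings, so the tier code doubles as the sort key.
    heapq.heappush(queue, (tier_code, action_deadline(tier_code, reported_at), content_id))


queue = []
t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
enqueue(queue, "S3", t0, "map-17")
enqueue(queue, "S1", t0 + timedelta(minutes=30), "skin-42")

first = heapq.heappop(queue)
print(first[0], first[2])  # S1 skin-42
```

The later S1 report is popped before the earlier S3 report, which is the prioritization a single undifferentiated "inappropriate content" rule cannot express.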

Criterion 2: The review pipeline distinguishes automated and human stages

✓ Strong: “All uploaded textures pass through PhotoDNA hash matching (CSAM detection, <200ms) and a machine-learning classifier for nudity and violence (auto-reject confidence threshold 0.85). Content scoring 0.85 or higher is auto-rejected with a generic reason code. Content scoring from 0.60 up to, but not including, 0.85 is routed to the human review queue with classifier output attached. Content scoring below 0.60 is published immediately. Chat messages are filtered through a keyword blocklist (updated weekly) and a toxicity model. Human reviewers process the queue in severity order, not FIFO.”

✗ Weak: “We use AI to filter content before it goes live.” (Which AI? What thresholds? What happens to borderline content? What about content types the model was not trained on? An unspecified pipeline cannot be audited, tuned, or defended.)
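
A pipeline specified at this level of detail can be implemented and audited almost line for line. The sketch below shows one possible routing function for texture uploads, assuming hypothetical names (`Route`, `route_texture`) and treating the hash-match result and classifier score as inputs; the actual hash-matching and classifier calls are not shown. The `ESCALATE_CSAM` route is an assumption standing in for whatever dedicated CSAM protocol the policy defines.

```python
from enum import Enum


class Route(Enum):
    ESCALATE_CSAM = "escalate_csam"   # assumption: hash matches go to a dedicated protocol
    AUTO_REJECT = "auto_reject"
    HUMAN_REVIEW = "human_review"
    PUBLISH = "publish"


REJECT_THRESHOLD = 0.85   # from the example policy
REVIEW_THRESHOLD = 0.60   # from the example policy


def route_texture(hash_match: bool, classifier_score: float) -> Route:
    """Decide what happens to an uploaded texture.

    hash_match: True if the upload matched a known-CSAM hash.
    classifier_score: nudity/violence confidence in [0.0, 1.0].
    """
    if not 0.0 <= classifier_score <= 1.0:
        raise ValueError("classifier_score must be between 0.0 and 1.0")
    if hash_match:
        # A hash match overrides the classifier score entirely.
        return Route.ESCALATE_CSAM
    # A score exactly on a threshold takes the stricter route here;
    # confirm this against the policy wording before relying on it.
    if classifier_score >= REJECT_THRESHOLD:
        return Route.AUTO_REJECT
    if classifier_score >= REVIEW_THRESHOLD:
        return Route.HUMAN_REVIEW
    return Route.PUBLISH


print(route_texture(False, 0.72).value)  # human_review
```

Keeping the thresholds as named constants means a tuning change is a one-line diff that can be reviewed against the policy text.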

Criterion 3: Appeal process has defined steps and timelines

✓ Strong: “Creators may appeal any enforcement action within 14 days via the in-game appeal form. Appeals are reviewed by a moderator who was not involved in the original decision; S1 appeals are reviewed by the Trust & Safety Lead instead. Appeal decisions are returned within 5 business days with a written explanation. If the appeal is upheld, the content is restored and the enforcement action is reversed. A second appeal may be submitted to the Review Board (3-person panel, quarterly meeting) for S1 and S2 actions only.”

✗ Weak: “Users can contact support if they disagree with a moderation decision.” (No timeline, no independent review, no escalation path. This is not an appeal process—it is a suggestion box.)
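
The rules in the strong example are concrete enough to encode as checks. The following is a rough sketch, assuming hypothetical names (`Enforcement`, `can_appeal`, `pick_reviewer`, `second_appeal_allowed`), that the 14-day window includes day 14, and that "S2+" means S1 and S2.

```python
from dataclasses import dataclass
from datetime import date, timedelta

APPEAL_WINDOW = timedelta(days=14)   # from the example policy


@dataclass(frozen=True)
class Enforcement:
    tier: str          # "S1" .. "S4"
    issued_on: date
    decided_by: str    # ID of the moderator who made the original decision


def can_appeal(action: Enforcement, today: date) -> bool:
    """True while the appeal window is still open (day 14 counts as open)."""
    return today - action.issued_on <= APPEAL_WINDOW


def pick_reviewer(action: Enforcement, moderators: list[str], ts_lead: str) -> str:
    """Choose who reviews a first-level appeal."""
    if action.tier == "S1":
        # S1 appeals always go to the Trust & Safety Lead.
        return ts_lead
    # Independence rule: exclude whoever made the original decision.
    eligible = [m for m in moderators if m != action.decided_by]
    if not eligible:
        raise RuntimeError("no independent moderator available")
    return eligible[0]


def second_appeal_allowed(action: Enforcement) -> bool:
    """Second appeals reach the Review Board for S1 and S2 actions only."""
    return action.tier in {"S1", "S2"}


mute = Enforcement("S3", date(2024, 3, 1), "mod_a")
print(can_appeal(mute, date(2024, 3, 15)), pick_reviewer(mute, ["mod_a", "mod_b"], "lead"))
```

Picking the first eligible moderator is a placeholder; a production queue would more likely balance load or randomize the assignment.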

Criterion 4: Creator accountability is proportional and progressive

✓ Strong: “Enforcement follows a progressive discipline model: first S3 violation = warning + content removal; second S3 within 90 days = 7-day upload suspension; third S3 within 180 days = 30-day suspension + review of all published content. S1 violations bypass progressive discipline: immediate permanent ban, content purge, and law enforcement notification where required by law. Creator trust scores are tracked internally; high-trust creators (100+ approved uploads, zero violations at S2 or above in 12 months) receive expedited review.”

✗ Weak: “Repeat offenders will be dealt with more severely.” (Without defined thresholds, two moderators will apply different standards to the same creator history. Inconsistent enforcement erodes creator trust and invites accusations of bias.)
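
Defined thresholds make the discipline ladder a deterministic function of creator history. The sketch below is one possible reading of the strong example, assuming hypothetical function names and assuming the 90-day and 180-day windows are counted backwards from the new violation; the example policy defines a ladder only for S3 and a bypass for S1, so other tiers are left unimplemented.

```python
from datetime import date, timedelta


def s3_action(prior_s3_dates: list[date], violation_date: date) -> str:
    """Return the enforcement action for a new S3 violation."""
    def count_within(days: int) -> int:
        cutoff = violation_date - timedelta(days=days)
        return sum(1 for d in prior_s3_dates if cutoff <= d <= violation_date)

    if count_within(180) >= 2:      # this is at least the third S3 in 180 days
        return "30_day_suspension_and_content_review"
    if count_within(90) >= 1:       # this is the second S3 in 90 days
        return "7_day_upload_suspension"
    return "warning_and_content_removal"


def enforcement_action(tier: str, prior_s3_dates: list[date], violation_date: date) -> str:
    if tier == "S1":
        # S1 bypasses progressive discipline entirely.
        return "permanent_ban_content_purge_and_legal_notification"
    if tier == "S3":
        return s3_action(prior_s3_dates, violation_date)
    raise NotImplementedError(f"the example policy defines no ladder for {tier}")


def is_high_trust(approved_uploads: int, s2_or_worse_last_12_months: int) -> bool:
    """High-trust creators qualify for expedited review."""
    return approved_uploads >= 100 and s2_or_worse_last_12_months == 0


today = date(2024, 6, 1)
print(enforcement_action("S3", [date(2024, 4, 1)], today))  # 7_day_upload_suspension
```

Because the function takes only the violation history and the date, two moderators looking at the same creator get the same answer, which is the consistency the weak version cannot guarantee.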

Common mistakes

Treating text chat and asset uploads as the same moderation problem. Chat is high-volume, ephemeral, and requires sub-second filtering. Asset uploads (maps, skins, models) are lower-volume, persistent, and require visual inspection. A single moderation pipeline that handles both will either over-filter chat (destroying the social experience) or under-filter assets (letting harmful images persist for days). Design separate pipelines with appropriate tooling for each content type.

No CSAM-specific protocol. Generic moderation processes are insufficient for child sexual abuse material. A CSAM protocol needs a reporting path to NCMEC (National Center for Missing & Exploited Children), which U.S. service providers are legally required to notify of apparent CSAM, together with hash matching using a technology such as PhotoDNA and immediate evidence preservation. Failing to have a documented CSAM protocol is not just a policy gap; it is a potential criminal liability. This must be a standalone section in the policy, not buried in general severity tiers.

SLAs that exist on paper but are not measured. Stating “S1 content is acted on within 1 hour” means nothing without monitoring. Track time-to-action per severity tier, measure it weekly, and alert when the SLA is breached. If you cannot measure the SLA, do not publish it.
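
Measuring an SLA can start as a small report over closed cases. The following is a minimal sketch, assuming a hypothetical `sla_report` helper, case records shaped as `(tier, reported_at, actioned_at)` tuples, and the SLA targets from the example severity tiers; the alerting step is reduced to a printed line.

```python
from datetime import datetime, timedelta

# SLA targets from the example severity tiers; replace with the policy's own values.
SLA = {
    "S1": timedelta(hours=1),
    "S2": timedelta(hours=4),
    "S3": timedelta(hours=24),
    "S4": timedelta(hours=72),
}


def sla_report(cases):
    """Summarise time-to-action per tier.

    cases: iterable of (tier, reported_at, actioned_at) tuples.
    Returns {tier: {"total": int, "breached": int, "worst": timedelta}}.
    """
    report = {}
    for tier, reported_at, actioned_at in cases:
        elapsed = actioned_at - reported_at
        stats = report.setdefault(tier, {"total": 0, "breached": 0, "worst": timedelta(0)})
        stats["total"] += 1
        if elapsed > SLA[tier]:
            stats["breached"] += 1
        stats["worst"] = max(stats["worst"], elapsed)
    return report


t0 = datetime(2024, 1, 1, 9, 0)
week = [
    ("S1", t0, t0 + timedelta(minutes=40)),
    ("S1", t0, t0 + timedelta(minutes=95)),   # breaches the 1-hour target
    ("S3", t0, t0 + timedelta(hours=20)),
]
report = sla_report(week)
for tier, stats in sorted(report.items()):
    if stats["breached"]:
        print(f"ALERT {tier}: {stats['breached']}/{stats['total']} over SLA, worst {stats['worst']}")
```

Running this weekly over the moderation log gives the per-tier breach count the paragraph above asks for; open cases that are already past their deadline would need a separate check.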

Ignoring platform-specific certification requirements. Sony, Microsoft, Nintendo, and Valve each have distinct UGC moderation requirements for certification. A policy written for Steam may not satisfy PlayStation Technical Requirements. Document which platform requirements the policy satisfies and maintain a compliance matrix mapping policy sections to platform certification clauses.

No moderator well-being provisions. Human reviewers who process S1 and S2 content are exposed to graphic, disturbing material daily. The policy must include exposure limits (maximum hours per shift reviewing high-severity content), access to mental health support, and rotation schedules that prevent burnout. This is not optional—it is an occupational health obligation.

How to use this document

When to create it

Write the moderation policy before implementing any UGC upload, sharing, or chat system. Platform certification submissions require the policy to be documented and operational before the UGC feature ships. If you are adding UGC to an existing game, write the policy before the feature branch merges to main.

Who owns it

The Trust & Safety lead (or equivalent role) owns the policy. Engineering owns the automated pipeline implementation. Legal reviews COPPA, CSAM, and regional compliance sections. The policy is updated whenever a new content type is introduced, a new platform is targeted, or a moderation incident reveals a gap.

How AI agents should reference it

get_standard_docs(type="video_game", features=["ugc"])
→ ugc_moderation_policy in documents[]
→ agent reads the policy before implementing content upload endpoints or chat filters
→ agent maps severity tiers to enforcement actions in moderation service code
→ agent verifies report-to-action SLAs are reflected in queue processing logic
→ agent flags if a new UGC feature introduces a content type not covered by the policy

The prompt_snippet — “Ensure the project has a UGC moderation policy covering content review workflows, filtering rules, appeal processes, and enforcement” — tells the agent to verify all four pillars are present and operational.

How it connects to other documents

The UGC Moderation Policy operationalizes the Community Guidelines (the guidelines say “no hate speech”; the moderation policy defines how hate speech is detected, reviewed, and acted upon). It informs the Privacy Policy, which must disclose what moderation data is stored about users and for how long. The GDPR Compliance Checklist verifies that moderation data processing has a lawful basis and that users can request access to their moderation history. The Security & Privacy Plan covers the technical controls protecting the moderation pipeline itself: access controls on review queues, encryption of reported content, and audit logging of moderator actions.