Batch Schedule Specification
Agent Prompt Snippet
Define the cron expressions, time zones, dependency chains, SLA deadlines, and alerting rules for every scheduled batch pipeline execution.
Purpose
The batch schedule specification defines the cron expressions, time zones, dependency chains, SLA deadlines, and alerting rules for every scheduled pipeline run.
This is a Required document: every project of this type should have one. Without it, schedules, dependencies, and deadlines live only in people's heads, and the team risks misalignment, rework, or undetected issues that compound over time.
Key Sections to Include
- Cron expressions
- Time zones
- Dependency chains
- SLA deadlines
- Alerting rules
Agent hint: Define the cron expressions, time zones, dependency chains, SLA deadlines, and alerting rules for every scheduled batch pipeline execution.
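As an illustration of how these sections fit together, here is a minimal Python sketch of one schedule entry and a basic validation pass. The `ScheduleEntry` schema, its field names, and the `validate` helper are hypothetical and not part of SpecBase or its template. The check assumes classic five-field cron syntax and does not validate the time zone name.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter


@dataclass
class ScheduleEntry:
    """One scheduled pipeline run (hypothetical schema, for illustration only)."""
    name: str
    cron: str          # five-field cron expression, e.g. "0 2 * * *"
    timezone: str      # IANA zone name the cron expression is evaluated in
    sla_deadline: str  # local wall-clock time "HH:MM" by which the run must finish
    depends_on: list[str] = field(default_factory=list)
    alert_channel: str = "pager"  # assumed name of the alert route


def validate(entries: list[ScheduleEntry]) -> list[str]:
    """Check cron shape and dependencies; return names in a runnable order."""
    names = {e.name for e in entries}
    for e in entries:
        # Assumes five-field cron; some schedulers accept six or seven fields.
        if len(e.cron.split()) != 5:
            raise ValueError(f"{e.name}: expected 5 cron fields, got {e.cron!r}")
        missing = set(e.depends_on) - names
        if missing:
            raise ValueError(f"{e.name}: unknown dependencies {sorted(missing)}")
    # Map each run to the runs that must complete before it starts.
    graph = {e.name: set(e.depends_on) for e in entries}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc


entries = [
    ScheduleEntry("load", "30 2 * * *", "UTC", "04:00", depends_on=["extract"]),
    ScheduleEntry("extract", "0 2 * * *", "UTC", "03:00"),
]
run_order = validate(entries)  # upstream runs come before their dependents
```

Keeping the dependency chain as data lets the specification itself be checked for cycles and dangling references before anything is deployed.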
What Makes It Good vs Bad
A strong version of this document:
- Includes runbooks with step-by-step procedures for common incidents
- Defines SLIs, SLOs, and error budgets with measurable thresholds
- Documents on-call responsibilities and escalation paths
- Covers both steady-state operations and failure recovery
- Is tested regularly through drills or game days
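To make "measurable thresholds" concrete for batch work, the following Python sketch computes an error budget for one possible SLI: the fraction of runs that finish before their SLA deadline. The `error_budget` function, the 99% target, and the run counts are all illustrative assumptions, not values from any real project.

```python
def error_budget(slo_target: float, total_runs: int, late_runs: int) -> dict:
    """Summarize an on-time-completion SLO over one measurement window."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    # The budget is the number of runs allowed to miss the deadline.
    allowed_late = (1.0 - slo_target) * total_runs
    return {
        "sli": (total_runs - late_runs) / total_runs,  # observed on-time fraction
        "allowed_late_runs": allowed_late,
        "budget_remaining": allowed_late - late_runs,  # negative means SLO breached
    }


# Hypothetical window: 300 runs, 2 of them late, against a 99% on-time SLO.
report = error_budget(slo_target=0.99, total_runs=300, late_runs=2)
```

With these assumed numbers the budget allows roughly three late runs in the window and about one remains. A negative `budget_remaining` would indicate the SLO has been breached.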
Warning signs of a weak version:
- No runbooks — relies on tribal knowledge for incident response
- Monitoring defined but no clear thresholds or alerting rules
- Missing capacity planning or scaling procedures
- Disaster recovery plan that has never been tested
- No distinction between informational alerts and actionable pages
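The distinction between informational alerts and actionable pages can be written down as an explicit rule rather than left to judgment. The Python sketch below shows one possible policy for an SLA deadline. The `Severity` levels, the `classify` function, and the 30-minute warning margin are assumptions chosen for illustration, and all datetimes are expected to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class Severity(Enum):
    OK = "ok"
    INFO = "info"  # recorded or posted to a channel; nobody is woken up
    PAGE = "page"  # actionable: pages the on-call engineer


def classify(
    now: datetime,
    deadline: datetime,
    finished_at: Optional[datetime] = None,
    warn_margin: timedelta = timedelta(minutes=30),
) -> Severity:
    """Map a run's state relative to its SLA deadline to an alert severity."""
    if finished_at is not None:
        # A finished run needs no action; a late finish is only recorded.
        return Severity.OK if finished_at <= deadline else Severity.INFO
    if now > deadline:
        return Severity.PAGE  # still running or never started, deadline passed
    if deadline - now <= warn_margin:
        return Severity.INFO  # deadline is close, give an early warning
    return Severity.OK


# Hypothetical deadline of 06:00 UTC, checked ten minutes beforehand.
deadline = datetime(2024, 1, 15, 6, 0, tzinfo=timezone.utc)
status = classify(now=deadline - timedelta(minutes=10), deadline=deadline)
```

Under this policy only an unfinished run past its deadline pages someone. A different team might reasonably page on late finishes as well, which is exactly the kind of decision the alerting rules should record.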
Common Mistakes
- Writing runbooks that assume expert knowledge of the system
- Defining SLOs without buy-in from product and engineering teams
- Not testing disaster recovery procedures until an actual disaster occurs
- Alerting on everything rather than focusing on user-impacting symptoms
How to Use This Document
Write runbooks as if the person reading them is stressed, sleep-deprived, and unfamiliar with the system — because during an incident, they might be. Use numbered steps, include expected output for each command, and clearly mark decision points. Test runbooks regularly through game days or tabletop exercises.
For AI agents: Reference operations documents when assisting with incident response, capacity planning, or deployment procedures. Verify that proposed infrastructure changes align with documented SLOs and operational constraints.
Starter Template
SpecBase includes a ready-to-use template for this document: kb/templates/operations/batch_schedule_spec.md.tmpl. Use the SpecBase CLI or MCP integration to generate it pre-filled for your project.
# Generate stubs via CLI
specbase init <archetype> --features <features> --dir ./docs
Recommended Reading
- Site Reliability Engineering by Betsy Beyer, Chris Jones, Jennifer Petoff & Niall Richard Murphy (Google) — The foundational text on SRE practices including SLOs, error budgets, and incident management.
- Release It! Design and Deploy Production-Ready Software by Michael T. Nygard — Practical patterns for building systems that survive real-world production conditions.
- The Phoenix Project by Gene Kim, Kevin Behr & George Spafford — Narrative introduction to DevOps principles and the flow of work through IT organizations.