Rewind Reels Docs
contact@purposeforce.org

Incident Response Plan

Last updated March 2026

On this page

  1. Severity Levels
  2. Escalation Path
  3. Response Time SLAs
  4. Common Incidents
  5. Recovery Procedures
  6. Post-Incident Review
  7. Contact

Severity Levels

All incidents are classified into one of four severity levels based on user impact and service availability.

| Level | Name | Description | Examples |
|-------|------|-------------|----------|
| P1 | Critical | Service is completely down or unusable for all users | Backend API unreachable, all video renders failing, Salesforce callback endpoint returning 500 |
| P2 | Degraded | Service is operational but significantly impaired | Render times 3x slower than normal, intermittent API timeouts, Stripe webhook processing delayed |
| P3 | Minor | A non-critical feature is broken or a small subset of users affected | Single preset type failing, analytics not updating, email notifications not sending |
| P4 | Cosmetic | Visual or UX issue with no functional impact | Styling glitch on player page, typo in narration prompt, minor layout shift in dashboard |

Escalation Path

Incidents follow this escalation chain. Escalate immediately if the current responder cannot resolve within the SLA window.

  1. First contact: Email hello@purposeforce.org — monitored during business hours (Mon-Fri, 9am-6pm CT)
  2. On-call engineer: If no response within 30 minutes during business hours, or for P1 incidents reported outside business hours, the on-call engineer is paged automatically
  3. Engineering lead: If the on-call engineer cannot resolve within the SLA window, or for any P1 lasting longer than 2 hours, escalate to the engineering lead

Important: P1 incidents should always be reported via email to hello@purposeforce.org with "P1" in the subject line. This triggers immediate alerting.
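
The escalation rules above can be sketched as a small decision function. This is only an illustration: the type names, the `IncidentState` fields, and the choice of the `America/Chicago` time zone for "CT" are assumptions, not part of the Rewind codebase.

```typescript
type Severity = "P1" | "P2" | "P3" | "P4";
type Responder = "first-contact" | "on-call" | "engineering-lead";

// Hypothetical shape describing where an incident currently stands.
interface IncidentState {
  severity: Severity;
  reportedAt: Date;
  now: Date;
  acknowledged: boolean;    // has first contact responded yet?
  slaWindowMissed: boolean; // on-call could not resolve within the SLA window
}

// True when `at` falls within Mon-Fri, 9am-6pm Central Time.
// Assumption: "CT" maps to the America/Chicago zone.
function isBusinessHours(at: Date): boolean {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone: "America/Chicago",
    weekday: "short",
    hour: "numeric",
    hour12: false,
  }).formatToParts(at);
  const weekday = parts.find((p) => p.type === "weekday")?.value ?? "";
  const hour = Number(parts.find((p) => p.type === "hour")?.value ?? "-1");
  const isWeekday = ["Mon", "Tue", "Wed", "Thu", "Fri"].includes(weekday);
  return isWeekday && hour >= 9 && hour < 18;
}

// Returns who should own the incident right now under the chain above.
function currentResponder(s: IncidentState): Responder {
  const elapsedMinutes = (s.now.getTime() - s.reportedAt.getTime()) / 60000;
  // Step 3: SLA window missed, or any P1 lasting longer than 2 hours.
  if (s.slaWindowMissed || (s.severity === "P1" && elapsedMinutes > 120)) {
    return "engineering-lead";
  }
  // Step 2: P1 reported outside business hours pages on-call right away.
  if (s.severity === "P1" && !isBusinessHours(s.reportedAt)) {
    return "on-call";
  }
  // Step 2: no response within 30 minutes during business hours.
  if (isBusinessHours(s.reportedAt) && !s.acknowledged && elapsedMinutes >= 30) {
    return "on-call";
  }
  return "first-contact";
}
```

Passing `now` explicitly rather than reading the clock inside the function keeps the rules easy to check against specific timestamps.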

Response Time SLAs

Response time is measured from when the incident is reported to when the first meaningful acknowledgment or action is taken.

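| Severity | Initial Response | Status Update Cadence | Target Resolution |
|----------|------------------|-----------------------|-------------------|
| P1 (Critical) | 1 hour | Every 30 minutes | 4 hours |
| P2 (Degraded) | 4 hours | Every 2 hours | 24 hours |
| P3 (Minor) | 1 business day | Daily | 1 week |
| P4 (Cosmetic) | Next sprint | Sprint review | Next release |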
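
As a worked example, the P1 and P2 rows can be turned into concrete deadlines from the report time. The sketch below is illustrative only; `SLA_MINUTES` and `slaDeadlines` are hypothetical names, and P3 and P4 are left out because their windows depend on business-day and sprint calendars rather than fixed durations.

```typescript
type TimedSeverity = "P1" | "P2";

interface SlaWindow {
  initialResponse: number;  // minutes until first acknowledgment or action
  updateCadence: number;    // minutes between status updates
  targetResolution: number; // minutes until target resolution
}

// Values copied from the SLA table, converted to minutes.
const SLA_MINUTES: Record<TimedSeverity, SlaWindow> = {
  P1: { initialResponse: 60, updateCadence: 30, targetResolution: 240 },
  P2: { initialResponse: 240, updateCadence: 120, targetResolution: 1440 },
};

// Computes the response and resolution deadlines for a reported incident.
function slaDeadlines(severity: TimedSeverity, reportedAt: Date) {
  const sla = SLA_MINUTES[severity];
  const addMinutes = (min: number) => new Date(reportedAt.getTime() + min * 60000);
  return {
    respondBy: addMinutes(sla.initialResponse),
    resolveBy: addMinutes(sla.targetResolution),
    updateEveryMinutes: sla.updateCadence,
  };
}
```

For instance, a P1 reported at 16:00 UTC would need a first response by 17:00 UTC and has a target resolution of 20:00 UTC.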

Common Incidents

Video rendering fails

Symptoms: Videos stuck in "Generating" status indefinitely, or moving to "Failed" status shortly after generation starts.

  • Check Vercel function logs — Look for errors in the /api/pipeline/generate route. Common issues: Claude API failures, timeout, or malformed data.
  • Verify API keys — Confirm the x-rewind-api-key header value matches the Api_Key__c on the org's Rewind_License__c record.
  • Check render count — If Renders_Used_This_Month__c has reached the tier limit, new generations will be rejected.
  • Verify callback connectivity — Ensure the backend can reach the Salesforce org's REST endpoint at /services/apexrest/rewind/callback. Check JWT auth configuration.
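
The API key and render count checks above can be run together against a license record that has already been fetched. This is a minimal sketch: `diagnoseLicense` is a hypothetical helper, and `tierLimit` is passed in as a parameter because this page does not say where the tier limit is stored.

```typescript
// Only the two fields named in the checklist are modeled here.
interface RewindLicense {
  Api_Key__c: string;
  Renders_Used_This_Month__c: number;
}

type LicenseDiagnosis = "ok" | "api-key-mismatch" | "render-limit-reached";

// Mirrors the "Verify API keys" and "Check render count" steps.
function diagnoseLicense(
  headerKey: string, // value sent in the x-rewind-api-key header
  license: RewindLicense,
  tierLimit: number, // assumption: supplied by the caller
): LicenseDiagnosis {
  if (headerKey !== license.Api_Key__c) {
    return "api-key-mismatch";
  }
  // Reaching the limit is enough for new generations to be rejected.
  if (license.Renders_Used_This_Month__c >= tierLimit) {
    return "render-limit-reached";
  }
  return "ok";
}
```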

Stripe webhooks failing

Symptoms: License tier not updating after payment, subscription changes not reflected in Salesforce.

  • Check webhook secret — Verify the STRIPE_WEBHOOK_SECRET environment variable in Vercel matches the signing secret in the Stripe Dashboard under Developers > Webhooks.
  • Verify endpoint URL — The webhook endpoint should be https://rewind.purposeforce.org/api/billing/webhook. Check for typos or outdated URLs.
  • Review Stripe event logs — In Stripe Dashboard, check the webhook event delivery attempts for HTTP status codes and error messages.
  • Check Vercel function logs — Look for parsing errors or authentication failures in the webhook handler.
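
When a secret mismatch is suspected, it can help to reproduce the signature check by hand. The sketch below follows Stripe's documented scheme, where the `Stripe-Signature` header carries `t=<timestamp>` and one or more `v1=<hex HMAC-SHA256>` values computed over `<timestamp>.<raw body>`. The function names are hypothetical, and the actual handler may well rely on the official library's `stripe.webhooks.constructEvent` instead.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hex HMAC-SHA256 of "<timestamp>.<rawBody>" using the signing secret.
function computeSignature(rawBody: string, timestamp: string, secret: string): string {
  return createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`, "utf8")
    .digest("hex");
}

// Returns true when the header holds a fresh, matching v1 signature.
function verifyStripeSignature(
  rawBody: string,
  header: string,
  secret: string,
  nowSeconds: number,
  toleranceSeconds = 300, // assumption: a 5-minute replay window
): boolean {
  let timestamp = "";
  const signatures: string[] = [];
  for (const item of header.split(",")) {
    const idx = item.indexOf("=");
    if (idx === -1) continue;
    const key = item.slice(0, idx).trim();
    const value = item.slice(idx + 1).trim();
    if (key === "t") timestamp = value;
    if (key === "v1") signatures.push(value);
  }
  const ts = Number(timestamp);
  if (timestamp === "" || !Number.isFinite(ts) || signatures.length === 0) return false;
  if (Math.abs(nowSeconds - ts) > toleranceSeconds) return false;

  const expected = Buffer.from(computeSignature(rawBody, timestamp, secret), "hex");
  return signatures.some((sig) => {
    const candidate = Buffer.from(sig, "hex");
    // timingSafeEqual throws on length mismatch, so check length first.
    return candidate.length === expected.length && timingSafeEqual(candidate, expected);
  });
}
```

The check must run against the raw request body. If the body is parsed and re-serialized before verification, the signature will not match even when the secret is correct.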

Salesforce callback failing

Symptoms: Generation completes on the backend but narration JSON never arrives in Salesforce. Rewind_Video__c record stays in "Generating" status.

  • Check Named Credential — Verify the Named Credential used for callbacks is correctly configured and the authentication provider has a valid refresh token.
  • Verify access token — The JWT Bearer or password auth flow in src/lib/salesforce-auth.ts may have an expired or revoked token. Check Vercel logs for 401 responses.
  • Check org connectivity — Ensure the Salesforce org is accessible (not in maintenance, sandbox not suspended).
  • Verify callback payload — The generate pipeline stores narration JSON and theme data on the Rewind_Video__c record via the Salesforce REST API.

Rate limit exceeded

Symptoms: API returns 429 status code. Users see "Too many requests" errors.

  • Check current usage — The rate limiter allows 30 requests per minute per API key. Verify if a burst of legitimate requests caused the limit.
  • Check Rewind_License__c render count — Monthly render limits are enforced separately from the per-minute rate limit. Verify Renders_Used_This_Month__c vs. the tier limit.
  • Review for abuse — Check Vercel logs for repeated requests from a single source that may indicate unauthorized usage or a misconfigured scheduled job.
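
To reason about whether a legitimate burst could trip the limit, here is a minimal fixed-window counter at 30 requests per minute per API key. This page does not describe how the production limiter is implemented, so the fixed-window strategy and the `FixedWindowLimiter` class are assumptions for illustration.

```typescript
interface WindowState {
  start: number; // window start, in milliseconds
  count: number; // requests seen in this window
}

class FixedWindowLimiter {
  private readonly windows = new Map<string, WindowState>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 30, windowMs = 60000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed, false if it should get a 429.
  allow(apiKey: string, nowMs: number): boolean {
    const current = this.windows.get(apiKey);
    // Start a fresh window on first use or once the old one has expired.
    if (current === undefined || nowMs - current.start >= this.windowMs) {
      this.windows.set(apiKey, { start: nowMs, count: 1 });
      return true;
    }
    if (current.count >= this.limit) {
      return false;
    }
    current.count += 1;
    return true;
  }
}
```

Under this model, a scheduled job that fires 31 requests in the same minute with one key would see its last request rejected, while other keys are unaffected.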

Claude API errors

Symptoms: Video generation starts but fails during narration generation. Errors reference Anthropic or Claude in logs.

  • Check Anthropic API key — Verify the ANTHROPIC_API_KEY environment variable in Vercel is valid and has not been rotated.
  • Check Anthropic rate limits — The Claude API has its own rate limits. If multiple orgs are generating videos simultaneously, the shared key may be throttled. Check status.anthropic.com for service issues.
  • Review prompt size — Very large Salesforce datasets can produce prompts that exceed the context window. Check if the failing org has unusually large query results.

Recovery Procedures

Restarting the backend

  1. Open the Vercel dashboard for the Rewind project
  2. Navigate to Deployments and redeploy the latest production commit
  3. Monitor the function logs for the first few minutes to confirm healthy operation
  4. Verify a test render completes successfully from the dev org

Rotating API keys

  1. Generate a new API key value
  2. Update the Api_Key__c field on the affected Rewind_License__c record in Salesforce
  3. If the backend API key environment variable needs rotation, update it in Vercel and redeploy
  4. Verify connectivity with a test render
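
For step 1, one way to generate a new key value is with a cryptographically secure random source. The 32-byte hex format below is an assumption, since this page does not specify the key format, and `generateApiKey` is a hypothetical helper.

```typescript
import { randomBytes } from "node:crypto";

// Produces a 64-character hex string from 32 random bytes.
// Assumption: the Api_Key__c field accepts a value of this length and format.
function generateApiKey(): string {
  return randomBytes(32).toString("hex");
}
```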

Recovering stuck renders

  1. Query for stale Rewind_Video__c records with status "Rendering" older than 60 minutes
  2. Update their status to "Failed" so users can retry
  3. Investigate root cause in Vercel logs before allowing retries
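
The query in step 1 might look like the SOQL built below. This is a sketch under assumptions: `Status__c` is a hypothetical field name for the record status, and the standard `CreatedDate` field is used as a stand-in for the record's age. Adjust both to match the actual Rewind_Video__c schema.

```typescript
// Builds a SOQL query for records stuck in "Rendering" longer than maxAgeMinutes.
function staleRenderQuery(now: Date, maxAgeMinutes = 60): string {
  const cutoff = new Date(now.getTime() - maxAgeMinutes * 60000);
  // SOQL dateTime literals are unquoted ISO 8601; drop the milliseconds.
  const cutoffLiteral = cutoff.toISOString().replace(/\.\d{3}Z$/, "Z");
  return (
    "SELECT Id, CreatedDate FROM Rewind_Video__c " +
    "WHERE Status__c = 'Rendering' " + // Status__c is an assumed field name
    `AND CreatedDate < ${cutoffLiteral}`
  );
}
```

Reviewing the returned records before running the update in step 2 helps avoid failing a render that is slow but still in progress.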

Post-Incident Review Template

After any P1 or P2 incident is resolved, complete a post-incident review within 48 hours. Use the following template.

Post-Incident Review

  • Incident title: Brief description
  • Severity: P1 / P2
  • Date & time detected: YYYY-MM-DD HH:MM CT
  • Date & time resolved: YYYY-MM-DD HH:MM CT
  • Duration: Total time from detection to resolution
  • Affected users: Number and description of impacted users/orgs
  • Summary: What happened, in plain language
  • Root cause: The underlying technical cause
  • Timeline: Chronological list of key events and actions taken
  • What went well: Things that helped resolve the incident quickly
  • What could be improved: Gaps in monitoring, communication, or process
  • Action items: Specific follow-up tasks with owners and deadlines

Tip: Store completed post-incident reviews in a shared location accessible to the entire team. Review trends quarterly to identify systemic improvements.

Contact

For all incident reports and operational questions, email hello@purposeforce.org and include the severity level in the subject line.