Operations Overview

P1 Incident · Investigating · FOR-1282 · Opened 4 minutes ago by Datadog Monitor

Payments API 503s — checkout flow degraded

The error rate on payments-svc climbed from a 0.4% baseline to 4.2% at 09:14 UTC, correlating with deploy v2.184.0 (PR #1284).

Forge: Likely root cause identified

94% confidence

The new rate limiter introduced in PR #1284 uses a token bucket sized for nominal traffic and does not account for the checkout-burst pattern. Stripe webhook retries are now exhausting the bucket, and the rejected requests receive 503 responses.
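To show why a bucket sized for nominal traffic fails under a burst, here is a minimal token-bucket sketch. It is not the code from PR #1284. The sizing (100 rps refill, burst capacity of 20) and the traffic shapes are hypothetical and chosen only to illustrate the mechanism: a steady stream at the refill rate is fully admitted, while the same bucket rejects most of a short burst at four times that rate.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Token bucket: holds at most `capacity` tokens, refilled at `rate` tokens per second."""
    rate: float      # sustained requests per second the bucket admits
    capacity: float  # largest burst admitted at once
    tokens: float = field(init=False)
    last: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start full
        self.last = 0.0

    def allow(self, now: float) -> bool:
        """Refill for the time elapsed since the last call, then try to take one token."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def simulate(bucket: TokenBucket, arrivals: list[float]) -> list[int]:
    """Map each arrival time (seconds) to 200 if admitted, or 503 if the bucket is empty."""
    return [200 if bucket.allow(t) else 503 for t in arrivals]


# Hypothetical sizing: 100 rps sustained, burst capacity of 20.
# Nominal traffic: 100 requests spread evenly over 1 s (100 rps).
nominal = [i * 0.01 for i in range(100)]
# Checkout burst: 200 requests packed into 0.5 s (400 rps), e.g. webhook retries.
burst = [i * 0.0025 for i in range(200)]

nominal_codes = simulate(TokenBucket(rate=100, capacity=20), nominal)
burst_codes = simulate(TokenBucket(rate=100, capacity=20), burst)

print("nominal 503s:", nominal_codes.count(503))
print("burst 503s:", burst_codes.count(503))
```

Over the 0.5 s burst the bucket can hand out at most its initial 20 tokens plus about 50 refilled ones, so well over half of the 200 requests are rejected even though the nominal case sees no 503s at all.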

Recommended

Increase the bucket to 240 rps and add a bypass for Stripe webhooks.
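A sketch of the recommended change, under stated assumptions: the 240 rps refill rate comes from the recommendation, but the burst capacity of 240, the route `/webhooks/stripe`, and the `RateLimiter` wrapper are hypothetical, since the incident does not give the real config or paths. Requests whose path matches a bypass prefix skip the bucket entirely, so webhook retries can no longer drain the tokens checkout depends on.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Token bucket: at most `capacity` tokens, refilled at `rate` tokens per second."""
    rate: float
    capacity: float
    tokens: float = field(init=False)
    last: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start full
        self.last = 0.0

    def allow(self, now: float) -> bool:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class RateLimiter:
    """Hypothetical limiter: one token bucket plus path prefixes that bypass it."""

    def __init__(self, rate: float, capacity: float,
                 bypass_prefixes: tuple[str, ...] = ()) -> None:
        self.bucket = TokenBucket(rate, capacity)
        self.bypass_prefixes = bypass_prefixes

    def status_for(self, path: str, now: float) -> int:
        # Bypassed routes never consume tokens, so webhook retries cannot drain
        # the budget that checkout requests depend on.
        if path.startswith(self.bypass_prefixes):
            return 200
        return 200 if self.bucket.allow(now) else 503


# Hypothetical route; the real Stripe webhook path is not given in the incident.
STRIPE_WEBHOOK_PREFIX = "/webhooks/stripe"

# Mixed burst: 400 requests in 0.5 s, alternating checkout calls and webhook retries.
traffic = [
    (f"{STRIPE_WEBHOOK_PREFIX}/events" if i % 2 else "/checkout", i * 0.00125)
    for i in range(400)
]

old = RateLimiter(rate=100, capacity=20)  # hypothetical pre-fix sizing, no bypass
new = RateLimiter(rate=240, capacity=240, bypass_prefixes=(STRIPE_WEBHOOK_PREFIX,))

old_codes = [old.status_for(path, t) for path, t in traffic]
new_codes = [new.status_for(path, t) for path, t in traffic]
print("old 503s:", old_codes.count(503))
print("new 503s:", new_codes.count(503))
```

A full bypass removes all limiting from the webhook route. If that is too permissive, giving webhooks their own separately sized bucket keeps some protection while still isolating them from checkout traffic.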

Blast radius

~1.4% of paying customers affected in the last 4 minutes; impact confined to checkout.

Linked

Similar pattern in INC-1872 (resolved by rollback).

  • Forge Agent (just now): Analyzed logs, telemetry, and recent deploys. Identified PR #1284 as the likely cause.
  • Priya Shah (3m ago): "I'm taking the on-call seat; pulling logs now."
  • Datadog (4m ago): Alert fired, HighErrorRate on payments-svc.
  • System (4m ago): SLA timer started · breach in 34 minutes.
  • CI/CD (12m ago): Deploy v2.184.0 rolled out to prod.