Operations Overview

Incidents

Active incidents with blast-radius context and AI-identified root causes.

Active: 3
P1 / P2: 3
Auto-resolved today: 6
Avg MTTR (7d): 22m
Active incidents (3 open)

P1 · Investigating · INC-2841 · 26m ago

Payments API 503s — checkout flow degraded

Services: payments-svc, checkout-api
Assignee: Priya Shah
Correlation: Deploy v2.184.0 (15m before alert)
Blast radius: 3 svcs · 14 upstreams
SLA timer: 26m / 60m

Rate-limiter in PR #1284 doesn't handle checkout-burst. Rollback v2.184.0 recommended.

AI confidence: 94%
P2 · Investigating · INC-2843 · 47m ago

Email delivery delays > 5 minutes — all regions

Services: email-svc, notification-svc
Assignee: Marcus Lin
Correlation: Config change email-worker-prod (2h ago)
Blast radius: 2 svcs · 5 upstreams
SLA timer: 47m / 120m

SMTP relay queue buildup. Pattern matches INC-1992 — resolved by L3 restart.

AI confidence: 88%
P2 · Triaging · INC-2844 · 8m ago

VPN outage — Bangalore office

Services: vpn-svc
Assignee: Unassigned
Correlation: Firewall change at 14:30 UTC
Blast radius: 1 svc · 3 upstreams
SLA timer: 8m / 60m

A firewall rule change at 14:30 UTC is affecting the Bangalore subnet. Rolling back the firewall change should resolve the outage.

AI confidence: 87%
Recently resolved

INC-2840 · Prometheus scrape timeout — dev-k8s · Resolved by SRE Agent · MTTR 18m · 42m ago
INC-2838 · Memory spike — notification-worker · Resolved by Resolution Agent · MTTR 6m · 2h ago
INC-2835 · Slow DB queries — billing (staging) · Resolved by Priya Shah + Agent · MTTR 31m · 4h ago