Switch basketball-api to RollingUpdate to prevent webhook delivery failures during deploys #346
Reference
forgejo_admin/basketball-api#346
Type
Bug
Lineage
Root cause discovered from forgejo_admin/basketball-api#343 investigation. The funnel works, but the Recreate strategy causes downtime that breaks Stripe webhook delivery.
Repo
ldraney/pal-e-deployments (kustomize overlay for basketball-api)
What Broke
basketball-api uses strategy: Recreate in its Deployment. Every CI build takes the API completely offline for 30-60 seconds. During that window, Stripe webhook retries get connection refused. With frequent deployments (10+ replicasets observed), multiple retry windows are missed, leaving checkout.session.completed events with pending_webhooks=1.
This caused 6 jersey payments ($780) to go unrecorded in the database between March 25 and April 5. Payments were manually synced from Stripe on 2026-04-05.
Stripe retries webhooks with exponential backoff over 72 hours. Repeated deploy-time outages can exhaust all retry attempts.
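For illustration, a Recreate override like the one described above might look roughly like the following kustomize patch fragment. The actual overlay contents are not shown in this ticket, so the exact fields (in particular the rollingUpdate: null line) are an assumption about how such an override is commonly written.

```yaml
# Hypothetical sketch of a strategic-merge patch that forces Recreate.
# The real patch file in pal-e-deployments may differ.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: basketball-api
spec:
  strategy:
    type: Recreate
    # Assumed: clears any rollingUpdate block inherited from the base,
    # since rollingUpdate settings are not valid alongside type: Recreate.
    rollingUpdate: null
```

With Recreate, Kubernetes terminates all existing pods before creating new ones, which is consistent with the offline window reported above.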
Repro Steps
- [...] Recreate strategy)
- [...] pending_webhooks counter stays at 1
Expected Behavior
Zero-downtime deploys via the RollingUpdate strategy: the new pod comes up and passes readiness checks before the old pod is terminated. Stripe webhook delivery succeeds at all times.
Environment
- [...] Recreate
- [...] RollingUpdate with maxUnavailable: 0, maxSurge: 1
Acceptance Criteria
- [...] Recreate to RollingUpdate
- [...] maxUnavailable: 0 ensures zero-downtime during deploys
Related
- project-westside-basketball: project this affects
- forgejo_admin/basketball-api#343: parent investigation
- forgejo_admin/basketball-api#340: original symptom (Daniel's "load failed")
- pal-e-deployments/basketball-api/deployment-patch.yaml or equivalent kustomize overlay
Scope Review: READY
Review note: review-840-2026-04-04
Ticket is well-scoped. Single 3-line YAML deletion from overlays/basketball-api/prod/deployment-patch.yaml; the base template already provides the desired RollingUpdate with maxUnavailable: 0, maxSurge: 1. All 4 AC are testable. No blockers, no decomposition needed.
Minor recommendations (non-blocking):
- [BODY] Fix file path reference: basketball-api/deployment-patch.yaml → overlays/basketball-api/prod/deployment-patch.yaml
- [SCOPE] Create architecture note arch-basketball-api (platform-wide gap, not specific to this ticket)
Blast radius note: mcd-tracker and gcal-scheduler also use Recreate overrides; consider follow-up tickets if they need zero-downtime deploys.
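As a rough sketch of the target state, the strategy that the base template is said to provide could look like the fragment below. Only RollingUpdate, maxUnavailable: 0 and maxSurge: 1 come from the ticket; the container name, probe path and port are hypothetical placeholders, and the base template itself is not shown here.

```yaml
# Illustrative Deployment fragment, not the actual base template.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: basketball-api
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never drop below the desired replica count
      maxSurge: 1         # allow one extra pod while the new version starts
  template:
    spec:
      containers:
        - name: basketball-api        # hypothetical container name
          readinessProbe:
            httpGet:
              path: /healthz          # hypothetical health endpoint
              port: 8000              # hypothetical container port
```

With maxUnavailable: 0, the old pod is only removed once the new pod reports Ready, so the zero-downtime guarantee depends on a readiness probe that reflects whether the API can actually accept webhook requests.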