Clean up basketball-api stale ReplicaSets + ImagePullBackOff pod #370

Open
opened 2026-05-19 02:17:55 +00:00 by ldraney · 0 comments
Owner

Type

Bug

Lineage

Standalone — discovered during platform health audit 2026-05-18.

Repo

ldraney/pal-e-platform

What Broke

basketball-api has a stuck pod (basketball-api-864568b874-pqcrf) in ImagePullBackOff for 13+ hours. Trying to pull non-existent image: harbor.tail5b443a.ts.net/basketball-api/api:c4a95c678286642f86e69fdeb922a0621182bce2. Service is healthy on a newer ReplicaSet (5954dcfccb, 1/1 Running). 14 total ReplicaSets in namespace, 12 scaled to 0 and stale (30-35 days old).

Repro Steps

  1. kubectl get pods -n basketball-api — observe ImagePullBackOff pod
  2. kubectl get rs -n basketball-api — observe 12 stale ReplicaSets at 0 desired
  3. The stuck pod generates TargetDown alerts in Prometheus

Expected Behavior

Only healthy ReplicaSets should exist. Failed rollouts should be cleaned up.

Environment

  • Cluster/namespace: prod / basketball-api
  • Related alerts: TargetDown (basketball-api 10.42.0.59:8000)

Acceptance Criteria

  • Stuck pod basketball-api-864568b874-pqcrf deleted
  • Stale ReplicaSets (12, scaled to 0) cleaned up
  • Healthy pod (5954dcfccb) remains Running
  • TargetDown alert clears
  • project-pal-e-platform — platform health
### Type Bug ### Lineage Standalone — discovered during platform health audit 2026-05-18. ### Repo `ldraney/pal-e-platform` ### What Broke basketball-api has a stuck pod (`basketball-api-864568b874-pqcrf`) in ImagePullBackOff for 13+ hours. Trying to pull non-existent image: `harbor.tail5b443a.ts.net/basketball-api/api:c4a95c678286642f86e69fdeb922a0621182bce2`. Service is healthy on a newer ReplicaSet (`5954dcfccb`, 1/1 Running). 14 total ReplicaSets in namespace, 12 scaled to 0 and stale (30-35 days old). ### Repro Steps 1. `kubectl get pods -n basketball-api` — observe ImagePullBackOff pod 2. `kubectl get rs -n basketball-api` — observe 12 stale ReplicaSets at 0 desired 3. The stuck pod generates TargetDown alerts in Prometheus ### Expected Behavior Only healthy ReplicaSets should exist. Failed rollouts should be cleaned up. ### Environment - Cluster/namespace: prod / basketball-api - Related alerts: TargetDown (basketball-api 10.42.0.59:8000) ### Acceptance Criteria - [ ] Stuck pod `basketball-api-864568b874-pqcrf` deleted - [ ] Stale ReplicaSets (12, scaled to 0) cleaned up - [ ] Healthy pod (`5954dcfccb`) remains Running - [ ] TargetDown alert clears ### Related - `project-pal-e-platform` — platform health
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
ldraney/pal-e-platform#370
No description provided.