[CRITICAL] Migration 044 streamlit_ro_role requires WESTSIDE_STREAMLIT_RO_PASSWORD env var that's not in pod env — basketball-api still CrashLoopBackOff #449

Closed
opened 2026-04-11 20:21:09 +00:00 by forgejo_admin · 1 comment
Contributor

Type

Bug

Lineage

Discovered 2026-04-11 immediately after PR #444 merged. The 041 collision fix unblocked alembic enough to run migration 044, which then crashed because the original PR #5/#435 introduced the migration without injecting the env var it requires. Related: #441 (040 fix), #443 (041 fix), PR #5/#435 (origin).

Repo

forgejo_admin/pal-e-deployments (the actual fix lives here, not basketball-api — the migration code is correct, only the deploy environment is wrong)

What Broke

Migration 044_add_westside_streamlit_ro_role.py line 44 raises RuntimeError because WESTSIDE_STREAMLIT_RO_PASSWORD is not in the basketball-api pod's environment.

File "/app/alembic/versions/044_add_westside_streamlit_ro_role.py", line 44, in upgrade
  raise RuntimeError(
RuntimeError: WESTSIDE_STREAMLIT_RO_PASSWORD environment variable is required to run this migration.
Source ~/secrets/pal-e-services/westside-streamlit.env before running alembic upgrade.

basketball-api-secrets currently holds 4 keys (postgres-password, stripe-api-key, stripe-webhook-secret, keycloak-admin-password) but NOT WESTSIDE_STREAMLIT_RO_PASSWORD, and the basketball-api deployment-patch.yaml does NOT inject this env var into the container. Result: no basketball-api pod started since the original PR #5/#435 merge could have passed migration 044. The failure stayed hidden for 20+ hours because a separate dual-revision-041 collision crashed alembic BEFORE it reached the body of either 041, and because RollingUpdate maxUnavailable=0 kept the old pod alive.
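For readers who have not opened the migration: the traceback is consistent with a fail-fast guard at the top of upgrade(). The sketch below is a hypothetical reconstruction of that pattern, not the actual contents of 044_add_westside_streamlit_ro_role.py; the function name require_ro_password and the injectable environ parameter are assumptions made for illustration.

```python
import os

ENV_VAR = "WESTSIDE_STREAMLIT_RO_PASSWORD"


def require_ro_password(environ=os.environ):
    """Return the role password, or fail loudly when the variable is absent.

    Hypothetical helper: mirrors the RuntimeError seen in the pod logs.
    """
    password = environ.get(ENV_VAR)
    if not password:
        raise RuntimeError(
            f"{ENV_VAR} environment variable is required to run this migration."
        )
    return password
```

With a guard like this the migration is behaving as designed, which is why the fix belongs in the deploy overlay and not in the Python code.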

Repro Steps

  1. PR #444 merged (resolves the 041 collision)
  2. ArgoCD rolls out new image: harbor.tail5b443a.ts.net/basketball-api/api:1bd8c301bae18d9008deeabc3a85e77b2e2e267f
  3. New pod attempts alembic upgrade head
  4. kubectl -n basketball-api logs basketball-api-585cbf95d6-rcmrb --tail=30 → shows the RuntimeError above
  5. kubectl -n basketball-api get deploy basketball-api -o jsonpath='{.spec.template.spec.containers[0].env}' | grep -i streamlit → empty (env var not defined)
  6. kubectl -n basketball-api get secret basketball-api-secrets -o jsonpath='{.data}' | grep -i streamlit → empty (key not in secret)
  7. ls ~/secrets/pal-e-services/westside-streamlit.env → exists on host (mode 0600, ldraney owner) — this is the source of truth for the password

Expected Behavior

basketball-api pod starts cleanly. alembic upgrade head reaches revision 044 and creates the westside_streamlit_ro postgres role using the password from a Kubernetes Secret. alembic_version then records 044, or 043 jersey_public_orders if the revision chain places that revision after 044 (the final head depends on the chain order).

Environment

  • Cluster: pal-e k3s
  • Namespace: basketball-api
  • Crashing pod: basketball-api-585cbf95d6-rcmrb running image 1bd8c301bae18d9008deeabc3a85e77b2e2e267f (post PR #444 merge)
  • Healthy pod (still serving via RollingUpdate): basketball-api-5c4b9bcc-vvfsx (alembic_version=042, no jersey_public_orders table, no westside_streamlit_ro role)
  • ArgoCD application: Synced / Progressing (cannot complete rollout)
  • Source of password (host file): ~/secrets/pal-e-services/westside-streamlit.env
  • Existing SOPS pattern in this overlay: pal-e-deployments/overlays/basketball-api/prod/harbor-creds.enc.yaml
  • kustomization.yaml currently lists harbor-creds.enc.yaml as a resource
  • basketball-api-secrets is externally managed (NOT in pal-e-deployments, NOT in pal-e-platform terraform — created via kubectl out-of-band). Do NOT modify it. Use a NEW separate Secret to avoid fighting the external management.

File Targets

Files to create:

  • pal-e-deployments/overlays/basketball-api/prod/westside-streamlit-secret.enc.yaml — new SOPS-encrypted Kubernetes Secret named westside-streamlit-secret containing key WESTSIDE_STREAMLIT_RO_PASSWORD sourced from ~/secrets/pal-e-services/westside-streamlit.env. Encrypt with the same SOPS recipient/age key used by harbor-creds.enc.yaml in this overlay (read that file's metadata to learn the recipient).
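
One possible way to produce the plaintext manifest that SOPS then encrypts is sketched below. It assumes the host file is dotenv-style (a WESTSIDE_STREAMLIT_RO_PASSWORD=... line, optionally quoted or prefixed with export), which this issue does not confirm, and it assumes the Secret is namespaced to basketball-api and uses stringData. The helper names read_env_value and build_secret_manifest are hypothetical.

```python
import json

KEY = "WESTSIDE_STREAMLIT_RO_PASSWORD"


def read_env_value(env_text, key=KEY):
    """Extract key=value from dotenv-style text (assumed file format)."""
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            value = value.strip()
            # Drop one pair of matching surrounding quotes, if present.
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
                value = value[1:-1]
            return value
    raise KeyError(f"{key} not found in env file")


def build_secret_manifest(password, namespace="basketball-api"):
    """Render the plaintext Secret manifest that would be handed to SOPS."""
    return "\n".join([
        "apiVersion: v1",
        "kind: Secret",
        "metadata:",
        "  name: westside-streamlit-secret",
        f"  namespace: {namespace}",
        "type: Opaque",
        "stringData:",
        # json.dumps yields a double-quoted scalar that YAML also accepts.
        f"  {KEY}: {json.dumps(password)}",
        "",
    ])
```

The rendered manifest should only ever exist outside the repository (for example in a temporary file) until it has been encrypted; only the .enc.yaml output belongs in git.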

Files to modify:

  • pal-e-deployments/overlays/basketball-api/prod/kustomization.yaml — add westside-streamlit-secret.enc.yaml to the resources: list (next to harbor-creds.enc.yaml)
  • pal-e-deployments/overlays/basketball-api/prod/deployment-patch.yaml — append a new env var to the app container's env: list:
    - name: WESTSIDE_STREAMLIT_RO_PASSWORD
      valueFrom:
        secretKeyRef:
          name: westside-streamlit-secret
          key: WESTSIDE_STREAMLIT_RO_PASSWORD
    

Files the agent should NOT touch:

  • basketball-api repo entirely — the migration is correct, no Python changes needed
  • basketball-api-secrets k8s Secret — externally managed, do not modify
  • Any other overlay (only basketball-api/prod gets the new env var)
  • The harbor-creds.enc.yaml file itself (read for SOPS recipient pattern only)

Acceptance Criteria

  • New file westside-streamlit-secret.enc.yaml exists in pal-e-deployments/overlays/basketball-api/prod/
  • The SOPS-encrypted file decrypts cleanly with the same key used by harbor-creds.enc.yaml
  • Decrypted content is a valid Kubernetes Secret manifest with metadata.name = westside-streamlit-secret and a single key WESTSIDE_STREAMLIT_RO_PASSWORD whose value matches the password in ~/secrets/pal-e-services/westside-streamlit.env
  • kustomization.yaml lists the new file in resources:
  • deployment-patch.yaml includes the new env var pointing at westside-streamlit-secret
  • kustomize build pal-e-deployments/overlays/basketball-api/prod/ | kubectl apply --dry-run=client -f - succeeds
  • After merge + ArgoCD sync, kubectl -n basketball-api get secret westside-streamlit-secret returns the secret
  • After rollout, kubectl -n basketball-api get deploy basketball-api -o jsonpath='{.spec.template.spec.containers[0].env[*].name}' includes WESTSIDE_STREAMLIT_RO_PASSWORD
  • After rollout, basketball-api pod is in Running 1/1 status (no CrashLoopBackOff)
  • After rollout, kubectl -n basketball-api exec postgres-9b5b87b5-5nccx -- psql -tc "SELECT 1 FROM pg_roles WHERE rolname='westside_streamlit_ro';" returns 1
  • After rollout, alembic_version table reaches at least 044 (and 043 jersey_public_orders if the chain order resolves that way)
  • After rollout, ArgoCD basketball-api application is Synced / Healthy
  • No password value committed to git in plain text
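
The last criterion can be spot-checked before committing. The sketch below is a heuristic only: it assumes SOPS's usual YAML output, where encrypted scalars are written as ENC[...] and a top-level sops: metadata mapping is appended to the file. The function name looks_sops_encrypted is hypothetical, and a passing result does not replace reviewing the diff.

```python
import re

KEY = "WESTSIDE_STREAMLIT_RO_PASSWORD"


def looks_sops_encrypted(text, key=KEY):
    """Heuristic: True when the file has SOPS metadata and an ENC[...] value."""
    # A top-level "sops:" mapping is where SOPS stores recipients and the MAC.
    has_metadata = re.search(r"^sops:\s*$", text, flags=re.MULTILINE) is not None
    # The password entry must be indented under data/stringData and encrypted.
    match = re.search(
        rf"^\s+{re.escape(key)}:\s*(\S.*)$", text, flags=re.MULTILINE
    )
    value_encrypted = match is not None and match.group(1).startswith("ENC[")
    return has_metadata and value_encrypted
```

Running this against the text of westside-streamlit-secret.enc.yaml should return True; a plaintext manifest returns False.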

Test Expectations

  • sops -d westside-streamlit-secret.enc.yaml decrypts and shows valid YAML
  • kustomize build succeeds
  • Manual k8s apply dry-run succeeds
  • Post-rollout pod logs show no alembic errors and a successful "running migrations" line
  • Post-rollout DB query shows the role exists
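
The post-rollout env var check from the acceptance criteria can also be scripted against the Deployment JSON, which confirms the secretKeyRef target as well as the variable name. This is a minimal sketch, assuming the app container is at index 0 as in the jsonpath used in this issue; env_secret_ref is a hypothetical helper name.

```python
def env_secret_ref(deploy, var_name, container_index=0):
    """Return (secret_name, key) for var_name in a Deployment dict, or None.

    `deploy` is the parsed output of `kubectl get deploy ... -o json`.
    """
    containers = deploy["spec"]["template"]["spec"]["containers"]
    for entry in containers[container_index].get("env", []):
        if entry.get("name") == var_name:
            ref = entry.get("valueFrom", {}).get("secretKeyRef", {})
            return ref.get("name"), ref.get("key")
    return None
```

Fed with json.load over the output of kubectl -n basketball-api get deploy basketball-api -o json, the expected result for WESTSIDE_STREAMLIT_RO_PASSWORD is the pair (westside-streamlit-secret, WESTSIDE_STREAMLIT_RO_PASSWORD).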

Constraints

  • Use SOPS encryption matching the existing harbor-creds.enc.yaml recipient: read the recipient from the SOPS metadata block at the bottom of that file, and confirm your key can decrypt it with sops --decrypt harbor-creds.enc.yaml > /dev/null
  • Do NOT commit the plaintext password under any circumstance
  • Do NOT modify basketball-api-secrets (externally managed)
  • Do NOT modify the migration file or any basketball-api code
  • Use a NEW separate Secret (westside-streamlit-secret) — do not append to basketball-api-secrets
  • Hot-fix profile — minimal scope, no refactoring, no cleanup of unrelated drift
  • Do NOT change deployment-patch.yaml's strategy, replicas, resources, or any other field — only add the one env var entry

Related

  • pal-e-platform — project tracking (this is platform-level deploy infra)
  • westside-basketball — affected product (jersey System B production rollout still blocked by this)
  • forgejo_admin/basketball-api#441 — sister bug 040 collision (fixed by PR #442)
  • forgejo_admin/basketball-api#443 — sister bug 041 collision (fixed by PR #444)
  • PR #5 / #435 — origin of the env-var-requiring migration
  • feedback_funnel_requires_auth.md — adjacent secrets-management theme
  • Process gap: PR template should require "if your migration reads an env var, the same PR must update the kustomize overlay to inject it" — separate follow-up
Author
Contributor

Scope Review: APPROVED

Review note: review-967-2026-04-10

Emergency hot-fix scope is tight and correct. All file targets verified on disk, SOPS age recipient derivable from harbor-creds.enc.yaml, story:WS-S31 verified on project-westside-basketball, WESTSIDE_STREAMLIT_RO_PASSWORD confirmed absent from deployment-patch.yaml, password source file exists at ~/secrets/pal-e-services/westside-streamlit.env. 1 new file + 2 modified files, single-directory, well under the 5-minute rule. No decomposition needed. Ready for dev dispatch — advance backlog → todo → next_up.

Reference
ldraney/basketball-api#449