revert: 30-day expires_at on Stripe Checkout Sessions (#490) #495

Merged
forgejo_admin merged 1 commit from 493-revert-490-ttl-code into main 2026-04-17 22:23:23 +00:00

Closes #493

Summary

Reverts PR #490 (squash commit 223c5da) which added expires_at = now + 30 days to all 8 Stripe Checkout Session call sites. Stripe caps expires_at at 24 hours, so every checkout flow returned InvalidRequestError in prod for ~60 minutes on 2026-04-17. The deployment overlay was already rolled back via pal-e-deployments#131; this PR removes the broken code from basketball-api main so the next merge does not rebuild and re-deploy the broken image. Scope is the revert only — re-implementation is tracked under spike #489 and follow-up #494.

Changes

  • Reverts squash merge 223c5da ("fix: set 30-day expires_at on all Stripe Checkout Sessions (#488) (#490)") via git revert --no-edit
  • Removes CHECKOUT_SESSION_TTL_SECONDS from src/basketball_api/config.py
  • Drops expires_at= kwarg from all 8 stripe.checkout.Session.create(...) call sites in routes/admin.py, routes/checkout.py, routes/jersey.py, routes/register.py, services/tournament_checkout.py
  • Deletes tests/test_checkout_session_ttl.py (536 lines of mock-based assertions that gave false confidence — mocks do not enforce Stripe's server-side 24h cap)
  • Reverts docs/tournament-billing-runbook.md TTL section
  • Net diff: 8 files, +24 / -668

Test Plan

  • ruff check . passes
  • ruff format --check . passes (123 files)
  • Full test suite: 1025 passed. The 6 failures observed (test_jersey_reminder.py x4, test_westside_streamlit_ro_role.py::test_migration_file_exists, test_first_payment_email.py::test_send_first_payment_email) also fail on main at 223c5da — confirmed pre-existing, not introduced by this revert
  • git diff main...HEAD confirms every +line in 223c5da is removed and every -line is restored
  • CI green on PR
  • Post-merge: update-kustomize-tag step in pal-e-deployments bumps to the new SHA; verify deployed pod no longer has CHECKOUT_SESSION_TTL_SECONDS set and checkout flows work against live Stripe

Review Checklist

  • Revert is code-only — no deployment-layer changes (pal-e-deployments#131 already restored prod at the overlay)
  • No re-implementation of the fix in this PR (spike #489 owns the architecture)
  • #488 closure is acknowledged as misleading; reopening or supersession is out-of-scope here and tracked in #493's acceptance criteria
  • Once merged, the next image build from main will no longer contain the broken expires_at code, so the update-kustomize-tag step can no longer overwrite the revert in pal-e-deployments
  • forgejo_admin/basketball-api#488 — original ticket (closure misleading; root problem unsolved)
  • forgejo_admin/basketball-api#489 — architecture spike (Payment Links vs lazy-mint); load-bearing
  • forgejo_admin/basketball-api#490 — the PR being reverted (squash commit 223c5da)
  • forgejo_admin/basketball-api#493 — this revert ticket
  • forgejo_admin/basketball-api#494 — follow-up re-implementation (scope belongs there, not here)
  • forgejo_admin/pal-e-deployments#130, #131 — deployment-layer revert that restored prod
  • Memory: feedback_retrieve_before_theorize.md — the missing sanity check that allowed a mock-only test to gate the merge
Closes #493 ## Summary Reverts PR #490 (squash commit 223c5da) which added `expires_at = now + 30 days` to all 8 Stripe Checkout Session call sites. Stripe caps `expires_at` at 24 hours, so every checkout flow returned `InvalidRequestError` in prod for ~60 minutes on 2026-04-17. The deployment overlay was already rolled back via pal-e-deployments#131; this PR removes the broken code from basketball-api main so the next merge does not rebuild and re-deploy the broken image. Scope is the revert only — re-implementation is tracked under spike #489 and follow-up #494. ## Changes - Reverts squash merge 223c5da ("fix: set 30-day expires_at on all Stripe Checkout Sessions (#488) (#490)") via `git revert --no-edit` - Removes `CHECKOUT_SESSION_TTL_SECONDS` from `src/basketball_api/config.py` - Drops `expires_at=` kwarg from all 8 `stripe.checkout.Session.create(...)` call sites in `routes/admin.py`, `routes/checkout.py`, `routes/jersey.py`, `routes/register.py`, `services/tournament_checkout.py` - Deletes `tests/test_checkout_session_ttl.py` (536 lines of mock-based assertions that gave false confidence — mocks do not enforce Stripe's server-side 24h cap) - Reverts `docs/tournament-billing-runbook.md` TTL section - Net diff: 8 files, +24 / -668 ## Test Plan - [x] `ruff check .` passes - [x] `ruff format --check .` passes (123 files) - [x] Full test suite: 1025 passed. The 6 failures observed (`test_jersey_reminder.py` x4, `test_westside_streamlit_ro_role.py::test_migration_file_exists`, `test_first_payment_email.py::test_send_first_payment_email`) also fail on `main` at 223c5da — confirmed pre-existing, not introduced by this revert - [x] `git diff main...HEAD` confirms every +line in 223c5da is removed and every -line is restored - [ ] CI green on PR - [ ] Post-merge: `update-kustomize-tag` step in `pal-e-deployments` bumps to the new SHA; verify deployed pod no longer has `CHECKOUT_SESSION_TTL_SECONDS` set and checkout flows work against live Stripe ## Review Checklist - [ ] Revert is code-only — no deployment-layer changes (pal-e-deployments#131 already restored prod at the overlay) - [ ] No re-implementation of the fix in this PR (spike #489 owns the architecture) - [ ] #488 closure is acknowledged as misleading; reopening or supersession is out-of-scope here and tracked in #493's acceptance criteria - [ ] Once merged, the next image build from main will no longer contain the broken `expires_at` code, so the update-kustomize-tag step can no longer overwrite the revert in pal-e-deployments ## Related Notes - `forgejo_admin/basketball-api#488` — original ticket (closure misleading; root problem unsolved) - `forgejo_admin/basketball-api#489` — architecture spike (Payment Links vs lazy-mint); load-bearing - `forgejo_admin/basketball-api#490` — the PR being reverted (squash commit 223c5da) - `forgejo_admin/basketball-api#493` — this revert ticket - `forgejo_admin/basketball-api#494` — follow-up re-implementation (scope belongs there, not here) - `forgejo_admin/pal-e-deployments#130, #131` — deployment-layer revert that restored prod - Memory: `feedback_retrieve_before_theorize.md` — the missing sanity check that allowed a mock-only test to gate the merge
Revert "fix: set 30-day expires_at on all Stripe Checkout Sessions (#488) (#490)"
Some checks failed
ci/woodpecker/pr/woodpecker Pipeline failed
7c9f5c2c23
This reverts commit 223c5da173.
Author
Owner

PR #495 Review

DOMAIN REVIEW

Python / FastAPI / Stripe. Pure revert of squash commit 223c5da (PR #490). Verified against the expected inverse:

  • All 8 call sites restored cleanlyexpires_at=int(time.time()) + CHECKOUT_SESSION_TTL_SECONDS removed from:
    • services/tournament_checkout.py:95 (blessed helper)
    • routes/jersey.py:309
    • routes/checkout.py:264, 428, 566
    • routes/register.py:1397, 1443
    • routes/admin.py:2025
  • import time removed from all 5 modules that added it for #490 (admin.py, checkout.py, jersey.py, register.py, services/tournament_checkout.py). No dangling imports — confirmed by checking local basketball-api checkout at main pre-#490: none of these modules use time for anything else (the one time reference in checkout.py:364 is a comment, not a symbol).
  • CHECKOUT_SESSION_TTL_SECONDS constant deleted from src/basketball_api/config.py along with its 15-line docblock. All 5 import sites also cleaned up.
  • tests/test_checkout_session_ttl.py fully deleted (536 lines). This was the right call — Ava-tier lesson: mock-based assertions against Stripe-server-side constraints give false confidence.
  • Runbook reverted to pre-#488 narrative: restores the "5 other sanctioned call sites" table (pre-#488 had 5 non-tournament sites; #490 expanded to 7 to include the helper + admin recovery), removes the "Checkout session TTL" section, removes the "Historical hypothesis (corrected)" subsection, and restores the original metadata/order_id root-cause narrative for Utah Invitational. Header reverts issues #484 + #488 back to issue #484. Exact textual inverse of #490's runbook changes.
  • Net diff is clean: 8 files, +24 / -668 = exact inverse of #490's +668 / -24.

No accidental extras. No re-implementation. No drive-by edits.

BLOCKERS

None.

NITS

None. This is the right shape for an emergency revert: scope-restricted to the inverse of the broken PR, re-implementation explicitly deferred to spike #489 and follow-up #494. PR body cleanly distinguishes "restore prod" (already done via pal-e-deployments#131) from "prevent rebuild-and-redeploy" (this PR's job).

SOP COMPLIANCE

  • Branch named after issue (493-revert-490-ttl-code)
  • PR body follows template (Summary / Changes / Test Plan / Related Notes)
  • Related section references parent ticket #493, the PR being reverted #490, the spike #489, the follow-up #494, the deployment-layer reverts pal-e-deployments#130/#131, and the memory lesson
  • No secrets committed
  • No deployment-layer changes (correctly scoped to code-only; overlay already reverted upstream)
  • Pre-existing test failures disclosed and verified against main@223c5da (not introduced by this revert)
  • ruff check + ruff format --check pass per PR body

PROCESS OBSERVATIONS

  • MTTR: deployment-layer revert restored prod (~60min outage closed via pal-e-deployments#131), code-layer revert prevents the next image build from re-deploying the broken code. This is the correct two-step pattern for an incident where the overlay is the fast path and the code is the durability path.
  • Change failure risk: near-zero. Inverse of a known diff. No novel logic.
  • Documentation: the follow-up spike #489 and re-implementation ticket #494 carry the unsolved-problem forward; the runbook's "what broke" narrative temporarily regresses to the less-accurate metadata-drop hypothesis, but this is intentional — the accurate TTL diagnosis lives in #489's architecture work and will be re-added when the real fix lands. Not a blocker; it's the correct trade-off for a clean revert.
  • Memory reference: feedback_retrieve_before_theorize.md is the right lesson to link. Mock tests against a Stripe server-side constraint (24h expires_at cap) gave false-positive coverage. Future regression tests for Stripe integration constraints need a real Stripe test-mode call or a typed SDK assertion, not MagicMock.call_args.

VERDICT: APPROVED

## PR #495 Review ### DOMAIN REVIEW Python / FastAPI / Stripe. Pure revert of squash commit 223c5da (PR #490). Verified against the expected inverse: - **All 8 call sites restored cleanly** — `expires_at=int(time.time()) + CHECKOUT_SESSION_TTL_SECONDS` removed from: - `services/tournament_checkout.py:95` (blessed helper) - `routes/jersey.py:309` - `routes/checkout.py:264, 428, 566` - `routes/register.py:1397, 1443` - `routes/admin.py:2025` - **`import time` removed** from all 5 modules that added it for #490 (`admin.py`, `checkout.py`, `jersey.py`, `register.py`, `services/tournament_checkout.py`). No dangling imports — confirmed by checking local basketball-api checkout at main pre-#490: none of these modules use `time` for anything else (the one `time` reference in `checkout.py:364` is a comment, not a symbol). - **`CHECKOUT_SESSION_TTL_SECONDS` constant deleted** from `src/basketball_api/config.py` along with its 15-line docblock. All 5 import sites also cleaned up. - **`tests/test_checkout_session_ttl.py` fully deleted** (536 lines). This was the right call — Ava-tier lesson: mock-based assertions against Stripe-server-side constraints give false confidence. - **Runbook reverted to pre-#488 narrative**: restores the "5 other sanctioned call sites" table (pre-#488 had 5 non-tournament sites; #490 expanded to 7 to include the helper + admin recovery), removes the "Checkout session TTL" section, removes the "Historical hypothesis (corrected)" subsection, and restores the original metadata/order_id root-cause narrative for Utah Invitational. Header reverts `issues #484 + #488` back to `issue #484`. Exact textual inverse of #490's runbook changes. - **Net diff is clean**: 8 files, +24 / -668 = exact inverse of #490's +668 / -24. No accidental extras. No re-implementation. No drive-by edits. ### BLOCKERS None. ### NITS None. This is the right shape for an emergency revert: scope-restricted to the inverse of the broken PR, re-implementation explicitly deferred to spike #489 and follow-up #494. PR body cleanly distinguishes "restore prod" (already done via pal-e-deployments#131) from "prevent rebuild-and-redeploy" (this PR's job). ### SOP COMPLIANCE - [x] Branch named after issue (`493-revert-490-ttl-code`) - [x] PR body follows template (Summary / Changes / Test Plan / Related Notes) - [x] Related section references parent ticket #493, the PR being reverted #490, the spike #489, the follow-up #494, the deployment-layer reverts pal-e-deployments#130/#131, and the memory lesson - [x] No secrets committed - [x] No deployment-layer changes (correctly scoped to code-only; overlay already reverted upstream) - [x] Pre-existing test failures disclosed and verified against main@223c5da (not introduced by this revert) - [x] `ruff check` + `ruff format --check` pass per PR body ### PROCESS OBSERVATIONS - **MTTR**: deployment-layer revert restored prod (~60min outage closed via pal-e-deployments#131), code-layer revert prevents the next image build from re-deploying the broken code. This is the correct two-step pattern for an incident where the overlay is the fast path and the code is the durability path. - **Change failure risk**: near-zero. Inverse of a known diff. No novel logic. - **Documentation**: the follow-up spike #489 and re-implementation ticket #494 carry the unsolved-problem forward; the runbook's "what broke" narrative temporarily regresses to the less-accurate metadata-drop hypothesis, but this is intentional — the accurate TTL diagnosis lives in #489's architecture work and will be re-added when the real fix lands. Not a blocker; it's the correct trade-off for a clean revert. - **Memory reference**: `feedback_retrieve_before_theorize.md` is the right lesson to link. Mock tests against a Stripe server-side constraint (24h `expires_at` cap) gave false-positive coverage. Future regression tests for Stripe integration constraints need a real Stripe test-mode call or a typed SDK assertion, not `MagicMock.call_args`. ### VERDICT: APPROVED
forgejo_admin deleted branch 493-revert-490-ttl-code 2026-04-17 22:23:23 +00:00
Sign in to join this conversation.
No description provided.