Stripe Checkout Session links expire in 24h — email blasts lose 78% of payers #488

Closed
opened 2026-04-17 16:32:59 +00:00 by forgejo_admin · 1 comment

Type

Bug

Lineage

Standalone — discovered 2026-04-17 investigating stranded Utah Invitational orders (#486 for recovery). Root cause of the 78% blast failure rate. Affects every Stripe checkout link in basketball-api.

Refinement applied after /review-ticket pass 1 (review note review-1023-2026-04-17):

  • Actual call-site count is 8, not 6 (added routes/checkout.py shared helper and routes/admin.py regenerate handler).
  • Line numbers corrected against origin/main @ a4047d2.
  • Runbook scope expanded to include the sanctioned-call-sites table (lines 23–29).
  • CHECKOUT_SESSION_TTL_SECONDS constant made a required element, not optional.

Repo

forgejo_admin/basketball-api

What Broke

Every stripe.checkout.Session.create call in the codebase omits the expires_at parameter. Stripe defaults expires_at = created + 86400s (24 hours). All email-delivered checkout links die 24h after mint, regardless of whether the parent has opened the email.

Evidence gathered via Stripe API retrieve on 2026-04-17:

  • 18 Utah Invitational pending orders → all status=expired (created 2026-04-12 or 2026-04-15, TTL=86400s exactly)
  • 5 Utah Invitational paid orders → all status=complete (clicked within the 24h window)
  • Monthly-fee cohort: 13 "canceled" orders are retry artifacts. Parents 111, 118, 127 each had 2–4 canceled sessions before paying; canceled = "session expired, superseded by a fresh order."

Stranded revenue right now: $985 tournament + $610 monthly = $1,595. Trust damage unquantified but substantial (parent Daniel Niyitanga reported the link as "broken," the first complaint to surface this).
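
As a quick sanity check on the headline figures, the sketch below recomputes the failure rate and the stranded revenue from the numbers quoted above. It assumes the 78% in the title is derived from the 18 expired and 5 paid Utah Invitational orders; the function and variable names are illustrative and do not exist in basketball-api.

```python
def failure_rate(expired: int, paid: int) -> float:
    """Fraction of minted sessions that expired before the parent paid."""
    return expired / (expired + paid)


# Counts and dollar amounts copied from the evidence list in this ticket.
utah_expired, utah_paid = 18, 5
stranded_tournament, stranded_monthly = 985, 610

rate = failure_rate(utah_expired, utah_paid)
stranded_total = stranded_tournament + stranded_monthly

print(f"blast failure rate: {rate:.1%}")      # 18 / 23
print(f"stranded revenue: ${stranded_total:,}")
```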

Affected call sites — all 8 (verified against origin/main @ a4047d2):

  • src/basketball_api/services/tournament_checkout.py:78 — blessed tournament helper
  • src/basketball_api/routes/jersey.py:291 — legacy jersey purchase
  • src/basketball_api/routes/checkout.py:247 — generic /create-session endpoint
  • src/basketball_api/routes/checkout.py:410 — first monthly payment (prorated)
  • src/basketball_api/routes/checkout.py:553 — shared-metadata helper path
  • src/basketball_api/routes/register.py:1378 — tryout registration card path
  • src/basketball_api/routes/register.py:1423 — tryout registration fallback path
  • src/basketball_api/routes/admin.py:2005 — admin "regenerate session" handler

Repro Steps

  1. Call any stripe.checkout.Session.create(...) site without passing expires_at
  2. Wait >24 hours
  3. Load the resulting session URL in a browser
  4. Observe: Stripe shows "This Checkout Session has expired"

Or reproduce via Stripe API:

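import stripe  # assumes stripe.api_key is already configured

session = stripe.checkout.Session.retrieve('cs_live_a1o5wZCQD5p6kjUeS7u5TL1eyuBDM3ohMjpNyHvPOlXbSvyFuFi1q2poAH')
# session.status == 'expired', session.url is None, session.expires_at - session.created == 86400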
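
To check many sessions at once, the TTL comparison can be pulled into a small function. This is a sketch that works on plain dicts shaped like the session payload (`created`, `expires_at`, `status`); the function names are hypothetical, and feeding it objects returned by the Stripe library directly is an untested assumption.

```python
DEFAULT_TTL_SECONDS = 86400  # the 24h default observed in this ticket


def session_ttl_seconds(session):
    """Lifetime of a session in seconds, from its own timestamps."""
    return session["expires_at"] - session["created"]


def describe_session(session):
    """Summarise whether a session expired on the default 24h TTL."""
    ttl = session_ttl_seconds(session)
    return {
        "status": session["status"],
        "ttl_seconds": ttl,
        "is_default_ttl": ttl == DEFAULT_TTL_SECONDS,
    }


# Example payload with made-up timestamps exactly 24h apart.
example = {"created": 1_000, "expires_at": 87_400, "status": "expired"}
print(describe_session(example))
```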

Expected Behavior

Checkout sessions minted for email blasts should remain valid long enough for real parent open/click latency — at minimum 30 days (Stripe's maximum expires_at window for payment mode). Emitting session URLs into emails should not silently fail after 24h.

Fix approach — required pattern:

Introduce a module-level constant in a shared location (recommend src/basketball_api/config.py or src/basketball_api/constants.py — pick whichever already hosts similar constants):

CHECKOUT_SESSION_TTL_SECONDS = 30 * 24 * 3600  # Stripe max for payment mode (2,592,000s)

Import and use at each of the 8 call sites:

import time
from basketball_api.config import CHECKOUT_SESSION_TTL_SECONDS
...
stripe.checkout.Session.create(
    ...
    expires_at=int(time.time()) + CHECKOUT_SESSION_TTL_SECONDS,
    ...
)

The constant is required (not optional). It makes #489's future cutover to Payment Links a single-line change and prevents the "new call site forgets to set expires_at" regression.
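
A further option, beyond what this ticket requires, is to wrap the timestamp arithmetic in one function so call sites cannot get it subtly wrong. The sketch below is hypothetical: `checkout_expires_at` is not an existing helper in basketball-api, and the 30-day value is taken from this ticket. Whether Stripe accepts an `expires_at` that far in the future should be confirmed against the current API reference before relying on it.

```python
import time

# Value proposed in this ticket (30 days, in seconds).
CHECKOUT_SESSION_TTL_SECONDS = 30 * 24 * 3600


def checkout_expires_at(now=None):
    """Hypothetical helper: epoch seconds at which a new session should expire.

    `now` is injectable so tests can pass a fixed clock value.
    """
    if now is None:
        now = time.time()
    return int(now) + CHECKOUT_SESSION_TTL_SECONDS
```

A call site would then pass `expires_at=checkout_expires_at()` instead of repeating the `int(time.time()) + ...` expression.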

Runbook updates (docs/tournament-billing-runbook.md):

  1. "What broke" section (lines 37-52): correct the narrative — the real root cause was TTL expiry, not missing order_id metadata. Every verified session (paid and pending) in the Apr 2026 incident carried proper order_id. Move the metadata concern to a "Historical hypothesis (corrected)" subsection.
  2. Sanctioned-call-sites table (lines 23-29): update to reflect the 8 actual call sites with correct line numbers. The current table has 5 entries and 4 of those line numbers are stale.

Environment

  • Cluster/namespace: prod, basketball-api
  • Service version/commit: current main as of 2026-04-17 (HEAD includes fd081e0 fix: require amount match when reusing fresh pending Stripe session (#480) — review pinned against a4047d2)
  • Related alerts: none fired (the observability gap is tracked in siblings #487 + pal-e-platform #295)
  • Affected prod data: 17 tournament orders + 6 monthly orders currently have expired sessions (excludes order 93 Westside Admin test)

Acceptance Criteria

  • Module-level CHECKOUT_SESSION_TTL_SECONDS = 30 * 24 * 3600 constant declared in a shared config/constants module
  • All 8 call sites pass expires_at=int(time.time()) + CHECKOUT_SESSION_TTL_SECONDS
  • Regression test: tests/test_checkout_session_ttl.py parametrizes across all 8 routes/helpers and asserts each passes expires_at within 29–30 days of now
  • Test file does not yet exist on main — safe to create
  • pytest tests/ green in CI
  • ruff check clean
  • docs/tournament-billing-runbook.md "What broke" narrative updated: TTL is the documented primary root cause; metadata hypothesis demoted to historical subsection
  • docs/tournament-billing-runbook.md sanctioned-call-sites table updated: all 8 sites listed, correct line numbers, replaces the stale 5-entry table
  • No regression in non-tournament flows (jersey, tryout registration, generic checkout, first monthly, admin regen) — confirmed via existing tests
  • Bug no longer reproduces: newly minted session has expires_at - created ≈ 2592000s
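
The regression-test criterion above could be sketched roughly as follows. This is an illustration under stated assumptions rather than the contents of `tests/test_checkout_session_ttl.py`: `mint_session` is a hypothetical stand-in for one of the 8 call sites, and a mock replaces `stripe.checkout.Session.create` so nothing touches the network. A real test would parametrize over the actual routes and helpers.

```python
import time
from unittest import mock

DAY_SECONDS = 24 * 3600


def expires_at_in_window(expires_at, now, min_days=29, max_days=30):
    """True if expires_at falls 29 to 30 days after now (bounds inclusive)."""
    delta = expires_at - now
    return min_days * DAY_SECONDS <= delta <= max_days * DAY_SECONDS


def mint_session(create):
    """Hypothetical call site; `create` plays the role of Session.create."""
    return create(mode="payment", expires_at=int(time.time()) + 30 * DAY_SECONDS)


def test_call_site_sets_expiry():
    fake_create = mock.Mock()
    frozen_now = 1_700_000_000
    # Freeze the clock so the assertion cannot flake on a second boundary.
    with mock.patch("time.time", return_value=float(frozen_now)):
        mint_session(fake_create)
    expires_at = fake_create.call_args.kwargs["expires_at"]
    assert expires_at_in_window(expires_at, now=frozen_now)
```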

Related

  • project-pal-e-platform
  • forgejo_admin/basketball-api #486 — recovery ticket for the 17 stranded orders. Blocked by this patch.
  • forgejo_admin/basketball-api #487 — expired-session metric (observability follow-up)
  • forgejo_admin/pal-e-platform #295 — alert rule consuming that metric
  • forgejo_admin/basketball-api #489 — spike on Payment Links vs lazy-mint (independent; this patch is the stopgap regardless of spike outcome)

Scope Review: NEEDS_REFINEMENT

Review note: review-1023-2026-04-17

Audited against origin/main @ a4047d2. Scope is solid in intent and the runbook critique is correct, but the call-site inventory is wrong.

Blocking [BODY] refinements:

  • Count is off: grep of src/ finds 8 stripe.checkout.Session.create call sites, not 6. Missing from the ticket: src/basketball_api/routes/checkout.py:553 and src/basketball_api/routes/admin.py:2005 (admin "regenerate session" handler). Title, "Affected call sites (all 6)" header, AC-1, and AC-2 all need "6" → "8".
  • 4 of the 6 listed line numbers are stale on current main:
    • routes/checkout.py:246 → actual call at 247
    • routes/checkout.py:398 → actual call at 410 (line 398 is order = Order(...))
    • routes/register.py:1373 → actual call at 1378
    • routes/register.py:1418 → actual call at 1423
  • AC-5 should explicitly call out that docs/tournament-billing-runbook.md's sanctioned-call-sites table (lines 23-29) also needs a rewrite — it shares the same stale line numbers and omits checkout.py:553 + admin.py:2005. Not just the "What broke" narrative.
  • (Optional hardening) Add AC: single CHECKOUT_SESSION_TTL_SECONDS = 30 * 24 * 3600 constant in one module, imported at all 8 sites. Makes the #489 spike's cutover a one-line change.

Non-blocking [SCOPE] items (can ship as follow-up board items, do not hold this Bug):

  • No arch-stripe-checkout note exists in pal-e-docs. Worth creating — siblings #486 and #487 both need it too.
  • story:payment-reliability label does not match project-westside-basketball's WS-S{N} taxonomy (WS-S11 is the nearest existing fit). Same drift affects siblings #486 (story:payment-recovery) and #487 (story:observability). Project-wide reconciliation needed.

Correctly ordered vs. sibling #486: This ticket must merge and deploy before #486's regen script runs; otherwise the recovery sessions inherit the 24h TTL instead of 30d. This is called out in the dependencies section of the review note.

Stripe TTL value confirmed: 30 × 24 × 3600 = 2,592,000s, which is Stripe's published maximum for payment mode. Correct.

Decomposition: Not needed. 8 mechanical edits + 1 runbook + 1 new test file, single invariant, ~5 min Dev agent pass.

Full audit in review-1023-2026-04-17 (pal-e-docs).

forgejo_admin 2026-04-17 19:50:12 +00:00