Spike — right primitive for email-blast checkout: Payment Links vs lazy-mint vs 30d TTL #489

Open
opened 2026-04-17 16:33:27 +00:00 by forgejo_admin · 0 comments

Type

Spike

Lineage

Standalone — emerged from the 2026-04-17 Utah Invitational stranded-orders investigation. The 30-day-TTL patch (sibling Bug issue) stops the immediate bleed; this spike picks the permanent architecture so we don't hit a 30-day cliff on the next late-opening parent.

Repo

forgejo_admin/basketball-api (primary — this is where the decision will be implemented; touches westside-app only if lazy-mint wins)

Question

Which payment primitive should basketball-api use for email-blast checkout flows (tournament fees, monthly fees, future campaigns): Stripe Payment Links, lazy-mint /pay/{order_token}, or keep Checkout Sessions with 30d TTL?

What to Explore

  1. Stripe Payment Links (stripe.PaymentLink.create):

    • Do they actually never expire? Verify in Stripe docs + test with a live link aged past 30 days.
    • Can they carry per-order metadata (order_id, player_id, product_id) such that the current webhook handler at routes/webhooks.py::_handle_generic_order_completed still matches?
    • Do they support the same success/cancel URL redirect we use today?
    • Any gotchas with tax collection, subscription mode, or line-item pricing?
    • Stripe fee difference vs Checkout Session?
  2. Lazy-mint /pay/{order_token}:

    • Email carries an opaque per-order token (e.g. a signed JWT or a DB-backed token row).
    • On click, backend route mints a fresh stripe.checkout.Session with 24h TTL and redirects to session.url.
    • Session is always fresh; TTL becomes irrelevant.
    • Where does the route live? basketball-api (backend redirect) or westside-app (page that calls basketball-api)?
    • Failure modes: what does the parent see if our service is down when they click?
    • Token forgery: how do we bind token → order immutably?
  3. Keep Checkout Sessions with 30d TTL (the patch from the Bug issue):

    • Covers ~99% of real parent open latency.
    • Hard ceiling: parent opens email on day 31 → same broken experience Daniel had.
    • Is that acceptable given expected email re-engagement behavior?

For each option, evaluate on:

  • Parent UX: number of clicks, perceived reliability
  • Operational complexity: new routes/primitives to maintain
  • Fulfillment tracking: webhook match rate stays 100%?
  • Cost: Stripe fees + engineering maintenance
  • Failure modes: what breaks when backend/Stripe/network is down
  • Migration path: how do we cut over without regressing paid/pending orders

Reference existing code:

  • src/basketball_api/services/tournament_checkout.py — blessed helper
  • src/basketball_api/routes/webhooks.py::_handle_generic_order_completed — webhook matcher
  • docs/tournament-billing-runbook.md — current (incorrect) framing
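
When checking whether an option keeps the webhook match rate at 100%, it may help to state the matching contract as a small pure function. The sketch below is an assumption about what the matcher needs, not a copy of `_handle_generic_order_completed`: the helper name `order_keys_from_event` is hypothetical, and it only assumes the standard Stripe event shape in which a `checkout.session.completed` event exposes the session's metadata at `data.object.metadata`.

```python
from typing import Optional

# Metadata keys named in this issue as the ones the webhook matcher relies on.
REQUIRED_KEYS = ("order_id", "player_id", "product_id")


def order_keys_from_event(event: dict) -> Optional[dict]:
    """Return the order-matching metadata from a completed-checkout event.

    Returns None when the event is a different type or any required key is
    missing or empty, which is the case that would strand an order.
    """
    if event.get("type") != "checkout.session.completed":
        return None
    session = event.get("data", {}).get("object", {})
    metadata = session.get("metadata") or {}
    if any(not metadata.get(key) for key in REQUIRED_KEYS):
        return None
    return {key: metadata[key] for key in REQUIRED_KEYS}
```

Running each candidate primitive's test-mode events through a check like this would give evidence for the fulfillment-tracking dimension: any option whose completed events return `None` here cannot be matched by metadata alone.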

Also consider the monthly-fee retry cohort (parents 111, 118, 127 each had 2–4 canceled sessions before succeeding). The right architecture eliminates those retries.

Success Criteria

  • ADR committed at docs/adr-payment-blast-pattern.md
  • All three options analyzed on each dimension above with evidence, not opinion
  • Decisive recommendation (no "it depends")
  • Migration checklist with specific file touches per affected call site
  • Follow-up Feature issue filed for the implementation (if recommendation is option 1 or 2)
  • Or: "no further action — keep 30d TTL" conclusion with reasoning documented
  • Lucas approves recommendation before any implementation tickets land

Time-box

Maximum time to spend: 1 session (~2-3 hours). If time-box expires without a decisive recommendation, close spike with documented findings and escalate to Lucas for direction. Rabbit-hole risk: getting lost comparing every Stripe product. Anchor on "what ships this quarter" when stuck.

Related

  • project-pal-e-platform
  • forgejo_admin/basketball-api #486 — stranded-orders recovery (the incident that surfaced this)
  • forgejo_admin/basketball-api #487 — expired-session metric
  • forgejo_admin/pal-e-platform #295 — alert rule
  • Bug issue on the 30-day expires_at patch (filed in parallel — the patch proceeds regardless of this spike's outcome)