Stripe webhooks unreachable — basketball-api Tailscale Funnel not routing public traffic #343

Closed
opened 2026-04-05 18:51:48 +00:00 by forgejo_admin · 2 comments

Type

Bug

Lineage

Discovered while investigating forgejo_admin/basketball-api#340 (jersey "load failed"). The original issue was transient, but investigation revealed this systemic webhook delivery failure.

Repo

ldraney/pal-e-services (funnel infra) and forgejo_admin/basketball-api (webhook endpoint)

What Broke

The Tailscale Funnel for basketball-api.tail5b443a.ts.net exists in k8s (basketball-api-funnel ingress with tailscale.com/funnel: "true"), but public internet traffic does not reach the endpoint. The hostname resolves to 100.117.254.123 (Tailscale internal IP) from within the Tailnet, but Stripe's servers on the public internet cannot connect.

Every checkout.session.completed event since ~March 25 has pending_webhooks=1. Parents pay via Stripe but the database never updates — jersey orders stuck at pending with no option/size/number recorded.

Confirmed affected: 6 jersey payments ($780 total) were manually synced on 2026-04-05 after discovery. 14 older tryout registration payments ($375+) from March 6-13 also unrecorded.
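
A minimal sketch of how affected events could be enumerated, assuming the events have already been fetched from Stripe's Events API as plain dictionaries. `type` and `pending_webhooks` are fields of Stripe's Event object. The helper name `undelivered_checkouts` and the sample data are hypothetical and do not come from this incident.

```python
from typing import Iterable, List


def undelivered_checkouts(events: Iterable[dict]) -> List[str]:
    """Return IDs of checkout.session.completed events still awaiting delivery."""
    return [
        event["id"]
        for event in events
        if event.get("type") == "checkout.session.completed"
        and event.get("pending_webhooks", 0) > 0
    ]


# Hypothetical sample data for illustration only.
sample_events = [
    {"id": "evt_a", "type": "checkout.session.completed", "pending_webhooks": 1},
    {"id": "evt_b", "type": "checkout.session.completed", "pending_webhooks": 0},
    {"id": "evt_c", "type": "payment_intent.succeeded", "pending_webhooks": 1},
]

print(undelivered_checkouts(sample_events))
```

Each returned ID is a payment whose database row probably needs a manual sync until the funnel is reachable again.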

The webhook handler at POST /webhooks/stripe works correctly when reachable (returns 400 on missing signature, 200 with valid events). The problem is purely network/funnel routing.
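
For reference, the 400/200 behavior described above matches Stripe's documented signature scheme: the `Stripe-Signature` header carries `t=<timestamp>` and one or more `v1=<hex digest>` values, and the digest is an HMAC-SHA256 of `<timestamp>.<raw body>` keyed with the endpoint secret. The sketch below is a standard-library illustration of that check, not the code in `webhooks.py`. The function name `webhook_status`, the 300-second tolerance, and the placeholder secret are assumptions.

```python
import hashlib
import hmac
import time
from typing import Dict, List, Optional


def webhook_status(payload: bytes, sig_header: Optional[str], secret: str,
                   tolerance: int = 300, now: Optional[float] = None) -> int:
    """Return the HTTP status a handler would send: 200 if valid, else 400."""
    if not sig_header:
        return 400  # missing Stripe-Signature header

    # Parse "t=...,v1=...,v1=..." into {key: [values]}.
    fields: Dict[str, List[str]] = {}
    for item in sig_header.split(","):
        key, _, value = item.strip().partition("=")
        fields.setdefault(key, []).append(value)

    timestamps = fields.get("t", [])
    candidates = fields.get("v1", [])
    if not timestamps or not candidates or not timestamps[0].isdigit():
        return 400

    timestamp = int(timestamps[0])
    current = time.time() if now is None else now
    if abs(current - timestamp) > tolerance:
        return 400  # too old or too far in the future (replay protection)

    signed_payload = f"{timestamp}.".encode() + payload
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    valid = any(
        hmac.compare_digest(expected.encode(), candidate.encode())
        for candidate in candidates
    )
    return 200 if valid else 400


# Hypothetical secret and body for illustration only.
secret = "whsec_placeholder"
body = b'{"type": "checkout.session.completed"}'
ts = 1_700_000_000
digest = hmac.new(secret.encode(), f"{ts}.".encode() + body, hashlib.sha256).hexdigest()

print(webhook_status(body, f"t={ts},v1={digest}", secret, now=ts))
print(webhook_status(body, None, secret))
```

Because the check needs only the raw body and the header, it can be exercised locally without the funnel, which is consistent with the handler working while delivery fails.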

Repro Steps

  1. From any machine NOT on the Tailnet, try to reach basketball-api.tail5b443a.ts.net
  2. DNS resolves to Tailscale internal IP — no public route
  3. Stripe webhook delivery fails (retries, then marks pending_webhooks=1)
  4. Verify: kubectl get ingress -n basketball-api basketball-api-funnel shows funnel exists with correct annotations
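
Steps 1 and 2 can be scripted. Tailscale assigns node addresses from the CGNAT range `100.64.0.0/10`, so a hostname that resolves only into that range has no public route. The sketch below is a hedged illustration. The helper names are hypothetical, and `resolve` has to run on a machine outside the tailnet for its result to mean anything.

```python
import ipaddress
import socket
from typing import List

# Tailscale node addresses come from the CGNAT block 100.64.0.0/10.
TAILSCALE_CGNAT = ipaddress.ip_network("100.64.0.0/10")


def is_tailnet_only(address: str) -> bool:
    """True if the address lies in the CGNAT range used by Tailscale."""
    return ipaddress.ip_address(address) in TAILSCALE_CGNAT


def resolve(hostname: str) -> List[str]:
    """Resolve a hostname to its IPv4 addresses using the local resolver."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


# The address reported in this issue falls inside the CGNAT range.
print(is_tailnet_only("100.117.254.123"))
```

Running `resolve("basketball-api.tail5b443a.ts.net")` from an off-tailnet host and passing each result to `is_tailnet_only` would show whether a public address is being served for the funnel.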

Expected Behavior

Public internet traffic should reach basketball-api.tail5b443a.ts.net/webhooks/stripe so Stripe can deliver webhook events. The funnel should route external requests through to the basketball-api k8s service, like westsidekingsandqueens.tail5b443a.ts.net does for the frontend.

Environment

  • Cluster/namespace: prod / basketball-api
  • Webhook endpoint: https://basketball-api.tail5b443a.ts.net/webhooks/stripe
  • Stripe webhook ID: we_1T9I5sR9SdzWqVXM1WBWMDBv
  • Funnel k8s ingress: basketball-api-funnel (exists, tailscale.com/funnel: "true")
  • Comparison: westsidekingsandqueens.tail5b443a.ts.net funnel works correctly

Investigation targets

  1. Tailscale operator logs — check if the operator is advertising the funnel route publicly
  2. Tailscale ACL nodeAttrs — verify funnel permission is granted for this hostname
  3. Operator pod status: check kubectl logs -n tailscale deploy/operator for errors
  4. Compare working vs broken — diff the westsidekingsandqueens funnel config against basketball-api funnel config
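
Target 4 could be approached by exporting both ingresses (for example with `kubectl get ingress ... -o json`) and diffing the parsed objects. The sketch below assumes the two manifests are already loaded as dictionaries. `diff_configs` is a hypothetical helper, and the sample values are placeholders, not the actual cluster configuration.

```python
import difflib
import json
from typing import List


def diff_configs(working: dict, broken: dict) -> List[str]:
    """Return a unified diff of two configs after stable JSON serialization."""
    left = json.dumps(working, indent=2, sort_keys=True).splitlines()
    right = json.dumps(broken, indent=2, sort_keys=True).splitlines()
    return list(difflib.unified_diff(left, right, "working", "broken", lineterm=""))


# Placeholder manifests: only the funnel annotation is taken from the issue.
working_cfg = {
    "metadata": {"annotations": {"tailscale.com/funnel": "true"}},
    "spec": {"example_field": "value-a"},
}
broken_cfg = {
    "metadata": {"annotations": {"tailscale.com/funnel": "true"}},
    "spec": {"example_field": "value-b"},
}

for line in diff_configs(working_cfg, broken_cfg):
    print(line)
```

Sorting keys before serializing keeps the diff limited to real differences rather than key ordering. Hostnames and other fields that are expected to differ can be removed from both dictionaries first.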

Acceptance Criteria

  • Root cause identified (ACL, operator, or funnel config issue)
  • Public internet traffic reaches basketball-api.tail5b443a.ts.net
  • Stripe webhook test event delivered successfully to /webhooks/stripe
  • New jersey payment triggers checkout.session.completed with pending_webhooks=0

Related

  • project-westside-basketball — project this affects
  • forgejo_admin/basketball-api#340 — original symptom (Daniel's "load failed")
  • Key files: pal-e-services/terraform/services.tf (~line 177, funnel config), pal-e-services/k3s.tfvars (~line 178), basketball-api/src/basketball_api/routes/webhooks.py (webhook handler)
Author
Owner

Scope Review: NEEDS_REFINEMENT

Review note: review-836-2026-04-04

The Tailscale Funnel for basketball-api already exists and has been deployed for 40 days. The k8s ingress basketball-api-funnel is present in the cluster with tailscale.com/funnel: "true" annotation and assigned hostname basketball-api.tail5b443a.ts.net. The issue's core assumption is incorrect.

Issues found:

  • Wrong file targets: terraform/modules/networking/main.tf (pal-e-platform) manages platform funnels. Service funnels live in pal-e-services/terraform/services.tf where basketball-api already has funnel = true.
  • Wrong repo: Issue filed on basketball-api but the fix will land in pal-e-platform or pal-e-services (infra repos).
  • AC1 already satisfied: "Tailscale Funnel configured" is true. Rewrite to: "Public internet traffic reaches basketball-api.tail5b443a.ts.net (verified via curl from off-tailnet host)."
  • AC4 is a data remediation task (Yussuf Duro id=116), not an infra fix. Split into separate ticket.
  • Missing arch note: No arch-basketball-api note exists in pal-e-docs.
  • Investigation pivot needed: The real question is why the existing funnel isn't serving public internet traffic (operator behavior, ACL permissions, Funnel vs Ingress semantics).
forgejo_admin changed title from Stripe webhooks unreachable — basketball-api needs Tailscale Funnel to Stripe webhooks unreachable — basketball-api Tailscale Funnel not routing public traffic 2026-04-05 18:56:56 +00:00
Author
Owner

Scope Review: APPROVED

Review note: review-836-2026-04-04-r2

Re-review after refinement — all 5 [BODY] recommendations from the previous review addressed. Problem statement correctly describes funnel-exists-but-unreachable. File targets verified against codebase. Repo placement corrected to identify both pal-e-services and basketball-api. AC1 rewritten to test actual reachability. Yussuf Duro data remediation split out.

Remaining discovered scope (non-blocking): arch-basketball-api architecture note does not exist in pal-e-docs.
