Gmail OAuth auto-reauth cron not running — westside email broken 2+ days #322

Open
opened 2026-05-02 14:50:31 +00:00 by forgejo_admin · 0 comments
Contributor

Type

Bug

Lineage

Standalone — discovered 2026-05-01 during alert-state audit. Related to forgejo_admin/pal-e-platform #222 (the PR that was supposed to ship this automation).

Repo

forgejo_admin/pal-e-platform

What Broke

Westside email delivery is broken because the Gmail OAuth token for westsidebasketball@gmail.com is 52 days old, well past its 7-day lifetime (Google policy: refresh tokens for apps in Testing publishing status expire after 7 days). Two alerts are firing:

  • GmailOAuthTokenExpired (critical, namespace=basketball-api) since 2026-04-29 01:57 UTC
  • GmailOAuthTokenExpiringSoon (warning, same namespace) since 2026-04-29 01:57 UTC

The auto-reauth automation that PR #222 was supposed to introduce does not exist in the cluster. Verified via:

$ kubectl get cronjob -A | grep -iE 'gmail|oauth|reauth'
(no output)

Only scripts/gmail-reauth.sh exists — that's a manual operator script, not a CronJob.

Repro Steps

  1. kubectl get secret -n basketball-api gmail-oauth-token -o jsonpath='{.metadata.creationTimestamp}' → 2026-03-10T04:48:21Z
  2. kubectl get cronjob -A | grep gmail → no results
  3. Trigger an outbound westside email (e.g. via basketball-api /test-email endpoint) → fails with 401/invalid_grant from Google
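
The token age implied by step 1 can be computed rather than eyeballed. A minimal sketch, assuming GNU date is available; age_days is a hypothetical helper name, not something that exists in the repo:

```shell
#!/usr/bin/env bash
# age_days CREATED [NOW]: whole days elapsed between two ISO-8601 UTC timestamps.
# Hypothetical helper; relies on GNU date's -d flag.
age_days() {
  local created="$1"
  local now="${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  local created_s now_s
  created_s=$(date -u -d "$created" +%s) || return 1
  now_s=$(date -u -d "$now" +%s) || return 1
  echo $(( (now_s - created_s) / 86400 ))
}

# Usage against the live secret (same command as step 1):
#   age_days "$(kubectl get secret -n basketball-api gmail-oauth-token \
#     -o jsonpath='{.metadata.creationTimestamp}')"
```

For the timestamp in step 1, age_days 2026-03-10T04:48:21Z 2026-05-01T12:00:00Z prints 52, matching the age reported above (the noon time on the discovery date is arbitrary). The Secret's creation timestamp is only a proxy for token age; this assumes the Secret is recreated on each reauth.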

Expected Behavior

  • A CronJob runs the reauth flow on a daily schedule and refreshes the token before it expires (target: refresh on day 5–6 of token age).
  • The cron itself has its own alert (GmailOAuthReauthCronFailed) so that a silent failure like this one is detected if it recurs.
  • GmailOAuthTokenExpired clears within 24h of the cron's first successful run.
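
The "refresh on day 5–6" target reduces to a threshold check that a daily job can run before invoking the reauth flow. This is a sketch under assumptions: needs_refresh and render_reauth_cronjob are hypothetical names, the CronJob name, image, schedule, and container path are placeholders, and it presumes the flow in scripts/gmail-reauth.sh can run unattended, which this issue does not confirm.

```shell
#!/usr/bin/env bash
# needs_refresh AGE_DAYS [THRESHOLD]: succeed (exit 0) once the token is at
# least THRESHOLD days old. The default of 5 leaves margin before day 7.
needs_refresh() {
  local age="$1" threshold="${2:-5}"
  [ "$age" -ge "$threshold" ]
}

# render_reauth_cronjob: print a CronJob manifest without applying it.
# Name, image, schedule, and script path are placeholders (assumptions).
render_reauth_cronjob() {
  kubectl create cronjob gmail-oauth-reauth \
    --namespace=basketball-api \
    --image="example.invalid/gmail-reauth:placeholder" \
    --schedule="0 3 * * *" \
    --dry-run=client -o yaml \
    -- /bin/sh -c "/scripts/gmail-reauth.sh"
}
```

Running daily and refreshing only when needs_refresh succeeds means a failed run on day 5 still leaves another attempt on day 6 before the token expires.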

Environment

  • Cluster: pal-e (k3s), namespace basketball-api
  • Service: basketball-api 6fd588f9f8-jcknx
  • Secret: gmail-oauth-token, created 2026-03-10
  • Related alerts: GmailOAuthTokenExpired, GmailOAuthTokenExpiringSoon
  • Memory: per feedback_gmail_oauth_testing_mode, root-cause permanent fix is to publish the Google app from Testing → Production mode

Acceptance Criteria

  • CronJob exists in cluster running the reauth flow on schedule
  • CronJob has its own alert GmailOAuthReauthCronFailed
  • Token refreshes; GmailOAuthTokenExpired clears
  • Test send via basketball-api → westsidebasketball@gmail.com succeeds
  • sop-gmail-oauth updated with the cron's identity, schedule, and verification command
  • No regression to existing manual scripts/gmail-reauth.sh (still works as fallback)
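
The CronJob criterion can be checked from the command line once the job has had a chance to run. A sketch, assuming the CronJob is named gmail-oauth-reauth (hypothetical; this issue does not fix a name) and the cluster serves batch/v1 CronJobs, whose status carries lastSuccessfulTime:

```shell
#!/usr/bin/env bash
# cron_last_success NAME [NAMESPACE]: print the CronJob's last successful run
# time, or fail if the CronJob is missing or has never succeeded.
cron_last_success() {
  local name="$1" ns="${2:-basketball-api}" ts
  ts=$(kubectl get cronjob "$name" -n "$ns" \
        -o jsonpath='{.status.lastSuccessfulTime}') || return 1
  [ -n "$ts" ] || return 1
  echo "$ts"
}

# Usage:
#   cron_last_success gmail-oauth-reauth && echo "cron has run successfully"
```

A command along these lines is also a candidate for the verification command that sop-gmail-oauth is meant to record.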

Related

  • pal-e-platform — project
  • forgejo_admin/pal-e-platform #222 — PR that should have shipped this
  • alert-report-2026-05-01 — alert snapshot identifying this as P1
  • sop-gmail-oauth — runbook