feat: Gmail OAuth token persistence in Postgres #163
No reviewers
Labels
No labels
domain:backend
domain:devops
domain:frontend
status:approved
status:in-progress
status:needs-fix
status:qa
type:bug
type:devops
type:feature
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
forgejo_admin/basketball-api!163
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "130-spike-gmail-oauth-token-persistence"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Gmail OAuth tokens were stored on the filesystem via gmail-sdk, which breaks in k8s with read-only secret mounts. Access tokens expire ~60min and refreshed tokens were lost on pod restart, causing a ~29-hour undetected email outage (2026-03-19 to 2026-03-21). This adds a Postgres-backed token store that loads, refreshes, and persists tokens via an
oauth_tokenstable, eliminating the writable filesystem dependency.Changes
src/basketball_api/models.py-- AddedOAuthTokenmodel (provider, account, token_data JSONB, expires_at, timestamps) with unique constraint on (provider, account)src/basketball_api/services/token_store.py-- New Postgres-backed token store withload_token,save_token,get_access_token(refresh + persist flow), andseed_from_file(one-time migration)src/basketball_api/services/email.py-- Updatedget_gmail_client()to accept optionaldbparam; when provided, loads token from DB via token store with fallback to file-based. Updated all 7 email send functions to passdbthrough.src/basketball_api/routes/password_reset.py-- Passdbtosend_password_reset_emailcallalembic/env.py-- ImportOAuthTokenfor migration awarenessalembic/versions/019_add_oauth_tokens_table.py-- New migration creatingoauth_tokenstablescripts/seed_oauth_token.py-- New one-time seed script to import file-based tokens into DBtests/test_token_store.py-- New 16 tests covering save/load, upsert, refresh flow, expiry detection, refresh token preservation, seed from file, and get_gmail_client integrationTest Plan
Review Checklist
Spike findings
gmail-sdk token storage:
AuthMixin._load_and_refresh_token()reads from file, refreshes via Google if within 300s of expiry, writes back. No pluggable interface. Solution: bypass by passingaccess_token=directly toGmailClient().Schema: New
oauth_tokenstable rather than JSONB ontenants-- cleaner separation, supports multiple providers. No encryption at rest -- Postgres auth + k8s network policy matches current k8s secret mount security posture.Refresh flow: Proactive 5-min buffer, last-write-wins concurrency (single replica), graceful fallback on DB write failure.
Migration path: (1) Deploy this PR, (2) Run
python scripts/seed_oauth_token.py marcus, (3) Remove init container + volume mounts from pal-e-deployments.Cross-repo: Follow-up needed in
pal-e-deploymentsto removecopy-gmail-oauthinit container andgmail-oauth-tokensecret volume.Related
PR #163 Review
DOMAIN REVIEW
Tech stack: Python / FastAPI / SQLAlchemy / Alembic / httpx / gmail-sdk
This PR solves a real production incident (29-hour email outage from lost OAuth tokens on pod restart) with a clean Postgres-backed token store. The architecture is sound: bypass gmail-sdk's file-based storage by refreshing tokens in-process via Google's token endpoint and passing
access_token=directly toGmailClient().Positive observations:
OAuthTokenmodel is its own table rather than JSONB on tenants. Correct choice for multi-provider extensibility.db: Session | None = Noneonget_gmail_clientandsend_password_reset_emailmeans all existing call sites (tests, routes that don't passdb) continue working via file-based fallback.token_store.pycorrectly preserve the existing refresh_token when Google doesn't return a new one (standard Google behavior).get_gmail_clientintegration wiring.Domain-specific findings:
db.commit()+db.refresh()insave_tokenis the standard pattern. JSONB column typed properly withMapped[dict].token_store.pyis thorough.resp.raise_for_status()in_refresh_access_tokencorrectly bubbles HTTP errors. The caller inget_access_tokencatcheshttpx.HTTPStatusError, re-raises after logging.BLOCKERS
1. Migration chain conflict (BLOCKER)
Migration
019hasdown_revision = "018", but migration020_add_custom_notes_to_player.pyALSO hasdown_revision = "018". This creates two Alembic heads. Runningalembic upgrade headwill fail with:Fix: Migration 020 must be updated to
down_revision = "019"so the chain is018 -> 019 -> 020. This is a merge ordering issue -- PR #165 (custom_notes) apparently merged before this PR.2.
alembic/env.pymissing other model imports (BLOCKER)The current
alembic/env.pyon main imports onlyCoach, Outbox, Parent, Player, Registration, Tenant. The PR addsOAuthTokenbut does NOT add many other models that exist inmodels.py:Team, EmailLog, Product, Order, PasswordResetToken, Outbox. While this is a pre-existing issue (not introduced by this PR), the migration chain conflict in item 1 means this file will need to be touched anyway, and the PR should not make the problem worse. TheOAuthTokenimport is correct for this PR's scope.Downgrading item 2 to NIT -- the missing imports are a pre-existing issue. The PR correctly adds its own model import. However, item 1 (migration chain conflict) remains a BLOCKER.
NITS
Lazy import in
get_gmail_client(line 27 of diff foremail.py):from basketball_api.services.token_store import get_access_tokenis imported inside the function body. This avoids a circular import, which is fine, but a module-level comment explaining why would help future readers. Minor._load_credentialserror handling: Ifcredentials.jsonis malformed JSON,json.load()will raisejson.JSONDecodeErrorwith a generic message. A more descriptive error wrapping the path would help debugging in production. Minor.seed_oauth_token.pyhas no error handling for DB connection: IfBASKETBALL_DATABASE_URLis misconfigured, the error will be a raw SQLAlchemy exception. Since this is a one-time migration script, acceptable but could be friendlier.alembic/env.pymodel imports are incomplete (pre-existing):Team, EmailLog, Product, Order, PasswordResetTokenare missing from the import list. Not introduced by this PR, but worth a follow-up cleanup issue.Test coverage for
_refresh_access_tokenand_load_credentials: These are tested indirectly through mocks, which is fine for unit tests. However, the_load_credentialsfunction handles bothinstalledandwebcredential formats -- only the mock path is exercised. A unit test for_load_credentialswith both formats and the error case would strengthen coverage. Non-blocking.SOP COMPLIANCE
130-spike-gmail-oauth-token-persistencematches issue #130ya29.valid-access-token,1//valid-refresh-token)PROCESS OBSERVATIONS
pal-e-deploymentsto remove the init container and volume mount. This should be tracked as a Forgejo issue.019/020collision must be resolved before merge. This is likely a rebase issue from PR #165 merging first.VERDICT: NOT APPROVED
One blocker: the Alembic migration chain conflict (
019and020both claimdown_revision = "018"). Fix this by rebasing and updating020to depend on019, or renumber accordingly. Once the migration chain is linear, this PR is ready to merge.de2ef2b0e16876db7f59PR #163 Re-Review
Re-review after previous blocker: migration chain conflict (019 and existing 020 both had
down_revision="018"). Fix was to rebase onto main and renumber migration to 021 withdown_revision="020".DOMAIN REVIEW
Stack: Python / FastAPI / SQLAlchemy / Alembic / httpx
Migration chain verification (primary focus of re-review):
The PR migration
021_add_oauth_tokens_table.pynow hasrevision = "021"anddown_revision = "020". Verified against main's migration chain:down_revision = "018")down_revision = "018")Result: The original blocker (019/020 collision with the old 019 oauth migration) is RESOLVED. Migration 021 chains off 020 correctly.
Pre-existing issue on main (not introduced by this PR): Both 019 and 020 on main have
down_revision = "018", creating a forked Alembic history with two heads. This meansalembic upgrade headwill fail until a merge migration is created. This is a pre-existing problem from PRs #164 and #165 merging without chain coordination. This PR does NOT introduce or worsen this issue -- it simply extends one of the two existing heads. A separate issue should track the merge migration fix.Token store (
token_store.py): Clean implementation. Proactive 5-minute refresh buffer matches gmail-sdk behavior. Graceful degradation on DB write failure after successful Google refresh. Last-write-wins concurrency is appropriate for single-replica Recreate strategy.Email integration (
email.py): All 7send_*functions updated to passdbthrough toget_gmail_client. Backwards-compatible --db=Nonefalls through to file-based tokens.send_password_reset_emailaddsdb: Session | None = Noneas optional param; route caller updated.Model (
models.py):OAuthTokenuses proper mapped columns, JSONB for token_data, UniqueConstraint on (provider, account), server_default timestamps.Alembic env.py:
OAuthTokenimport added for autogenerate awareness. Correct.Seed script (
scripts/seed_oauth_token.py): One-time migration utility. Proper error handling (missing args, file not found). UsesSessionLocaldirectly (appropriate for scripts outside request lifecycle).BLOCKERS
None. The migration chain blocker from the previous review is resolved. Migration 021 correctly descends from 020.
NITS
DB write failure path untested: The
except Exceptionblock inget_access_token(lines ~155-161 of token_store.py) that logs but continues after a DB write failure is not covered by tests. Low risk since it is a defensive fallback, but a test would confirm the access_token is still returned whensave_tokenraises.Pre-existing Alembic fork: Main has two heads (019 and 020 both descend from 018). A merge migration is needed. This should be tracked as a separate issue -- it predates this PR and is not within scope.
Inline import:
from basketball_api.services.token_store import get_access_tokenis imported insideget_gmail_clientrather than at module top. This avoids circular imports, which is a valid reason, but a comment explaining the deferred import would help future readers.SOP COMPLIANCE
130-spike-gmail-oauth-token-persistencematches issue #130PROCESS OBSERVATIONS
pal-e-deploymentsneeds init container + volume mount removal after seed script runs (documented in PR).VERDICT: APPROVED