Spike: Gmail OAuth token persistence in Postgres #130

Closed
opened 2026-03-21 14:50:48 +00:00 by forgejo_admin · 2 comments

Type

Spike

Lineage

plan-wkq → Phase 11 → discovered scope (email pipeline broke during login investigation)

Repo

forgejo_admin/basketball-api

Question

How should basketball-api persist Gmail OAuth tokens in Postgres so that token refresh survives pod restarts without writable filesystem mounts?

Currently gmail-sdk reads/writes tokens from a JSON file mounted as a read-only k8s secret. Access token expires ~60 min, SDK refreshes via Google but can't write back. Email silently breaks. Caused ~29-hour undetected outage (2026-03-19 to 2026-03-21).

What to Explore

1. gmail-sdk token storage mechanism

  • Read gmail_sdk.GmailClient.__init__ — how does it load tokens?
  • Is there a pluggable token store interface (callback, class override)?
  • If not, what's the minimal wrapper to intercept file read/write?
  • Check: does the SDK refresh in-memory and write back, or does it re-read from file each time?

2. Database schema options

  • Option A: New oauth_tokens table (type, account, token_json, expires_at, updated_at)
  • Option B: JSONB column on existing tenants table (token is per-tenant anyway)
  • Evaluate: encryption at rest? (refresh_token grants persistent Gmail access to westsidebasketball@gmail.com)
  • Consider pgcrypto vs app-level Fernet vs "Postgres auth + network policy is sufficient"

3. Token refresh flow

  • How to wire: read from DB on client init, write back after refresh
  • Concurrency: multiple requests hit expired token simultaneously — mutex? last-write-wins? single-refresh pattern?
  • Error handling: what if DB write fails after successful Google refresh?

4. Migration path

  • Seed DB from current k8s secret (one-time migration)
  • Remove gmail-oauth-token secret volume mount from pal-e-deployments kustomize
  • Fallback strategy: hard fail if DB token missing, or graceful degradation?

5. Cross-repo impact

  • basketball-api — app code changes (email service, config, models, migration)
  • pal-e-deployments — remove secret volume mount, remove secret resources
  • Any other services consuming gmail-oauth secrets?

Success Criteria

  • gmail-sdk token storage mechanism documented (pluggable or not)
  • Schema recommendation with encryption decision and reasoning
  • Refresh flow design with concurrency handling
  • Complete file change list across all repos
  • Implementation effort estimate (story points)
  • Follow-up feature ticket created with full scope

Time-box

2 hours investigation, written findings

  • Westside Basketball — project
  • basketball-api #129 — Enterprise login (depends on working email)
  • pal-e-platform #122 — Remove SendGrid
  • Blocks: girls jersey follow-up email (can't send if tokens expire)
### Type Spike ### Lineage `plan-wkq` → Phase 11 → discovered scope (email pipeline broke during login investigation) ### Repo `forgejo_admin/basketball-api` ### Question How should basketball-api persist Gmail OAuth tokens in Postgres so that token refresh survives pod restarts without writable filesystem mounts? Currently gmail-sdk reads/writes tokens from a JSON file mounted as a read-only k8s secret. Access token expires ~60 min, SDK refreshes via Google but can't write back. Email silently breaks. Caused ~29-hour undetected outage (2026-03-19 to 2026-03-21). ### What to Explore **1. gmail-sdk token storage mechanism** - Read `gmail_sdk.GmailClient.__init__` — how does it load tokens? - Is there a pluggable token store interface (callback, class override)? - If not, what's the minimal wrapper to intercept file read/write? - Check: does the SDK refresh in-memory and write back, or does it re-read from file each time? **2. Database schema options** - Option A: New `oauth_tokens` table (type, account, token_json, expires_at, updated_at) - Option B: JSONB column on existing `tenants` table (token is per-tenant anyway) - Evaluate: encryption at rest? (refresh_token grants persistent Gmail access to westsidebasketball@gmail.com) - Consider pgcrypto vs app-level Fernet vs "Postgres auth + network policy is sufficient" **3. Token refresh flow** - How to wire: read from DB on client init, write back after refresh - Concurrency: multiple requests hit expired token simultaneously — mutex? last-write-wins? single-refresh pattern? - Error handling: what if DB write fails after successful Google refresh? **4. Migration path** - Seed DB from current k8s secret (one-time migration) - Remove `gmail-oauth-token` secret volume mount from pal-e-deployments kustomize - Fallback strategy: hard fail if DB token missing, or graceful degradation? **5. Cross-repo impact** - `basketball-api` — app code changes (email service, config, models, migration) - `pal-e-deployments` — remove secret volume mount, remove secret resources - Any other services consuming gmail-oauth secrets? ### Success Criteria - [ ] gmail-sdk token storage mechanism documented (pluggable or not) - [ ] Schema recommendation with encryption decision and reasoning - [ ] Refresh flow design with concurrency handling - [ ] Complete file change list across all repos - [ ] Implementation effort estimate (story points) - [ ] Follow-up feature ticket created with full scope ### Time-box 2 hours investigation, written findings ### Related - `Westside Basketball` — project - basketball-api #129 — Enterprise login (depends on working email) - pal-e-platform #122 — Remove SendGrid - Blocks: girls jersey follow-up email (can't send if tokens expire)
Author
Owner

Spike Findings: Gmail OAuth Token Persistence in Postgres

1. gmail-sdk Token Storage Mechanism

Package: gmail-sdk-ldraney==0.1.3 (Lucas's own SDK, published to Forgejo PyPI)

Token load flow (auth.py -> _load_and_refresh_token):

  1. GmailClient.__init__ calls _load_and_refresh_token(account, secrets_dir) when access_token is not provided
  2. _load_token(account, secrets_dir) reads {secrets_dir}/gmail-{account}.json
  3. If expires_at < time.time() + 300 (5-minute buffer), it refreshes:
    • Reads {secrets_dir}/credentials.json for client_id / client_secret
    • Calls Google's token endpoint with the refresh_token
    • Writes updated token back to disk via _save_token() using os.open(..., O_WRONLY | O_CREAT | O_TRUNC, 0o600)
  4. Returns the access_token string, which is set on the httpx.Client headers

Token file structure (confirmed from live container):

Keys: access_token, expires_in, refresh_token, scope, token_type, refresh_token_expires_in, expires_at

Critical finding -- the root cause is confirmed:

  • K8s secret is mounted read-only (readOnly: true in deployment-patch.yaml)
  • The SDK's _save_token() call throws OSError: [Errno 30] Read-only file system
  • This means: first request after pod start works (token is fresh from secret), but after ~55 minutes the access token expires, SDK tries to refresh + write back, and the write fails with an unhandled exception, causing email sends to error out entirely

Pluggable token store: NO.
The SDK uses static methods _load_token / _save_token with hardcoded file I/O. There is no callback, protocol, or dependency injection for custom storage. The three options are:

  1. Subclass AuthMixin and override _load_token / _save_token / _load_and_refresh_token (they're static methods, so this requires overriding _load_and_refresh_token as classmethod)
  2. Monkey-patch AuthMixin._load_token and AuthMixin._save_token
  3. Upgrade gmail-sdk-ldraney to add a pluggable TokenStore protocol (since Lucas owns the SDK)

Recommendation: Option 3 -- add a TokenStore protocol to gmail-sdk-ldraney. This is the clean enterprise path. The SDK is Lucas's own package, and a TokenStore protocol is a 30-minute change. Then basketball-api implements a PostgresTokenStore that reads/writes from the DB.

2. Current Usage in basketball-api

get_gmail_client(tenant) in services/email.py (line 19):

  • Creates a new GmailClient per call (no singleton, no caching)
  • Uses tenant.gmail_account as the account name (currently "westsidebasketball")
  • Uses settings.gmail_secrets_dir which resolves to /secrets/google-oauth in prod

Call sites (4 email functions, each creates a fresh client):

  • send_confirmation_email() -- called from registration service + Stripe webhook
  • send_profile_reminder_email() -- admin endpoint
  • send_roster_export_email() -- admin endpoint
  • send_tryout_announcement_email() -- admin endpoint

Concurrency model: FastAPI with sync routes -> threadpool. Multiple requests can hit expired token simultaneously. Each creates its own GmailClient, each would independently attempt refresh.

3. Schema Recommendation

Recommendation: JSONB column on tenants table.

Reasoning:

  • Token is 1:1 with tenant (one gmail account per tenant)
  • tenants table already has gmail_account field (the account alias)
  • Avoids a new table + foreign key for what is essentially a tenant attribute
  • JSONB gives flexibility for the full token payload without schema rigidity

Schema change:

# On Tenant model
gmail_token: Mapped[dict | None] = mapped_column(JSONB, nullable=True)
gmail_token_updated_at: Mapped[datetime | None] = mapped_column(DateTime, nullable=True)

Encryption decision: NOT NEEDED at the application layer.

Reasoning:

  • The refresh_token is sensitive, but:
    • Postgres is internal to the cluster (no external access)
    • Database access requires the POSTGRES_PASSWORD from k8s secrets
    • Network policy will restrict access to basketball-api pods only
    • The current k8s secret mount has the same threat model (anyone with kubectl access can read it)
  • pgcrypto or Fernet adds operational complexity (key management, rotation) with no real security improvement for this threat model
  • If we later need encryption: add it as a follow-up when we have a proper secrets management system

Design: Single-refresh with DB persistence + last-write-wins.

Request arrives -> get_gmail_client(tenant) ->
  1. Read tenant.gmail_token from DB
  2. If expired (expires_at < now + 300s):
     a. Call Google token refresh endpoint
     b. UPDATE tenants SET gmail_token = new_token, gmail_token_updated_at = now
        WHERE id = tenant_id
     c. If DB write fails: log error, proceed with the fresh access_token anyway
        (it's valid for ~60 min)
  3. Return GmailClient(access_token=token["access_token"])

Concurrency handling: Last-write-wins is sufficient.

  • Multiple concurrent requests refreshing simultaneously will all get valid tokens from Google
  • Last one to write to DB wins -- this is fine because all refreshed tokens are equally valid
  • The refresh_token doesn't rotate on Google's side (it stays the same unless revoked)
  • No mutex needed -- the worst case is 2-3 redundant refresh calls, which is harmless

Error handling:

  • Google refresh fails -> raise, email send fails, gets logged (existing behavior)
  • DB write fails after successful refresh -> log warning, return the access_token anyway (it's valid for this request; next request will retry the write)
  • DB read fails -> raise, hard fail (no email without tokens)

5. Implementation Plan -- Two Repos

Phase A: gmail-sdk-ldraney (new feature)

Add a TokenStore protocol so the client accepts custom storage backends:

class TokenStore(Protocol):
    def load(self, account: str) -> dict | None: ...
    def save(self, account: str, token_data: dict) -> None: ...
    def load_credentials(self) -> dict[str, str]: ...

Then GmailClient.__init__ accepts token_store: TokenStore | None = None. If provided, uses it instead of file I/O.

Files changed:

  • gmail_sdk/auth.py -- add TokenStore protocol, modify _load_and_refresh_token to use it
  • gmail_sdk/__init__.py -- export TokenStore
  • tests/test_auth.py -- test custom token store
  • Bump version to 0.2.0

Phase B: basketball-api

Files changed:

  1. src/basketball_api/models.py -- add gmail_token JSONB + gmail_token_updated_at to Tenant
  2. alembic/versions/014_add_gmail_token_to_tenants.py -- migration
  3. src/basketball_api/services/email.py -- rewrite get_gmail_client() to:
    • Read tenant.gmail_token from DB
    • Implement PostgresTokenStore class
    • Pass to GmailClient(token_store=pg_store)
  4. src/basketball_api/config.py -- keep gmail_secrets_dir as fallback during transition
  5. tests/ -- update email service tests for DB-backed tokens
  6. One-time migration script (or alembic data migration) to seed DB from current k8s secret

Phase C: pal-e-deployments

Files changed:

  1. overlays/basketball-api/prod/deployment-patch.yaml -- remove gmail-oauth volume mount + volume
  2. Keep BASKETBALL_GMAIL_SECRETS_DIR env var temporarily as fallback, remove in follow-up

Phase D: Cleanup

  • Delete gmail-oauth-token k8s secret
  • Delete gmail-oauth-westsidebasketball k8s secret (there are 2 in the namespace)
  • Remove gmail_secrets_dir from config.py

6. Story Point Estimate

Phase Estimate Notes
A: gmail-sdk TokenStore protocol 2 pts Small SDK change, Lucas owns it
B: basketball-api DB persistence 3 pts Migration + service rewrite + tests
C: pal-e-deployments cleanup 1 pt Remove volume mount
D: Secret cleanup 0.5 pts kubectl delete
Total 6.5 pts ~2-3 dev sessions

7. Surprises and Blockers

  1. Two gmail-oauth secrets exist in the basketball-api namespace: gmail-oauth-token (11d old) and gmail-oauth-westsidebasketball (2d13h old). Only gmail-oauth-token is mounted. The second one appears to be from a manual update attempt. Should be cleaned up.

  2. The SDK exception is unhandled. When _save_token() fails on the read-only filesystem, it raises OSError. This propagates up through GmailClient.__init__, which means the client never gets created, and the email send fails entirely. This is worse than "can't persist" -- it means zero emails send after the first token expiry (not just "tokens don't persist").

  3. Token is currently NOT expired (expires_at: 1774108514 = 2026-03-19 + offset from refresh). This is because someone manually refreshed the k8s secret recently (the 2d13h old second secret suggests this). The 29-hour outage was real but was manually patched.

  4. gmail-sdk-ldraney has only httpx as a dependency. Adding a Protocol for TokenStore requires zero new dependencies. Clean.

  5. No other services consume gmail-oauth secrets. Only basketball-api namespace has gmail secrets. No cross-service impact.

  6. The SDK creates a new httpx.Client per GmailClient instance. Since basketball-api creates a new client per email send, this means a new TCP connection per email. Not a blocker, but worth noting for future connection pooling optimization.

Next Step

Create a feature issue with the full scope from this spike, linking to this spike issue.

## Spike Findings: Gmail OAuth Token Persistence in Postgres ### 1. gmail-sdk Token Storage Mechanism **Package:** `gmail-sdk-ldraney==0.1.3` (Lucas's own SDK, published to Forgejo PyPI) **Token load flow** (`auth.py` -> `_load_and_refresh_token`): 1. `GmailClient.__init__` calls `_load_and_refresh_token(account, secrets_dir)` when `access_token` is not provided 2. `_load_token(account, secrets_dir)` reads `{secrets_dir}/gmail-{account}.json` 3. If `expires_at < time.time() + 300` (5-minute buffer), it refreshes: - Reads `{secrets_dir}/credentials.json` for `client_id` / `client_secret` - Calls Google's token endpoint with the `refresh_token` - **Writes updated token back to disk** via `_save_token()` using `os.open(..., O_WRONLY | O_CREAT | O_TRUNC, 0o600)` 4. Returns the `access_token` string, which is set on the `httpx.Client` headers **Token file structure** (confirmed from live container): ``` Keys: access_token, expires_in, refresh_token, scope, token_type, refresh_token_expires_in, expires_at ``` **Critical finding -- the root cause is confirmed:** - K8s secret is mounted read-only (`readOnly: true` in deployment-patch.yaml) - The SDK's `_save_token()` call throws `OSError: [Errno 30] Read-only file system` - This means: first request after pod start works (token is fresh from secret), but after ~55 minutes the access token expires, SDK tries to refresh + write back, and **the write fails with an unhandled exception**, causing email sends to error out entirely **Pluggable token store: NO.** The SDK uses static methods `_load_token` / `_save_token` with hardcoded file I/O. There is no callback, protocol, or dependency injection for custom storage. The three options are: 1. Subclass `AuthMixin` and override `_load_token` / `_save_token` / `_load_and_refresh_token` (they're static methods, so this requires overriding `_load_and_refresh_token` as `classmethod`) 2. Monkey-patch `AuthMixin._load_token` and `AuthMixin._save_token` 3. **Upgrade gmail-sdk-ldraney** to add a pluggable `TokenStore` protocol (since Lucas owns the SDK) **Recommendation:** Option 3 -- add a `TokenStore` protocol to gmail-sdk-ldraney. This is the clean enterprise path. The SDK is Lucas's own package, and a `TokenStore` protocol is a 30-minute change. Then basketball-api implements a `PostgresTokenStore` that reads/writes from the DB. ### 2. Current Usage in basketball-api **`get_gmail_client(tenant)`** in `services/email.py` (line 19): - Creates a **new `GmailClient` per call** (no singleton, no caching) - Uses `tenant.gmail_account` as the account name (currently `"westsidebasketball"`) - Uses `settings.gmail_secrets_dir` which resolves to `/secrets/google-oauth` in prod **Call sites** (4 email functions, each creates a fresh client): - `send_confirmation_email()` -- called from registration service + Stripe webhook - `send_profile_reminder_email()` -- admin endpoint - `send_roster_export_email()` -- admin endpoint - `send_tryout_announcement_email()` -- admin endpoint **Concurrency model:** FastAPI with sync routes -> threadpool. Multiple requests can hit expired token simultaneously. Each creates its own GmailClient, each would independently attempt refresh. ### 3. Schema Recommendation **Recommendation: JSONB column on `tenants` table.** Reasoning: - Token is 1:1 with tenant (one gmail account per tenant) - `tenants` table already has `gmail_account` field (the account alias) - Avoids a new table + foreign key for what is essentially a tenant attribute - JSONB gives flexibility for the full token payload without schema rigidity **Schema change:** ```python # On Tenant model gmail_token: Mapped[dict | None] = mapped_column(JSONB, nullable=True) gmail_token_updated_at: Mapped[datetime | None] = mapped_column(DateTime, nullable=True) ``` **Encryption decision: NOT NEEDED at the application layer.** Reasoning: - The refresh_token is sensitive, but: - Postgres is internal to the cluster (no external access) - Database access requires the `POSTGRES_PASSWORD` from k8s secrets - Network policy will restrict access to basketball-api pods only - The current k8s secret mount has the same threat model (anyone with kubectl access can read it) - pgcrypto or Fernet adds operational complexity (key management, rotation) with no real security improvement for this threat model - If we later need encryption: add it as a follow-up when we have a proper secrets management system ### 4. Recommended Refresh Flow **Design: Single-refresh with DB persistence + last-write-wins.** ``` Request arrives -> get_gmail_client(tenant) -> 1. Read tenant.gmail_token from DB 2. If expired (expires_at < now + 300s): a. Call Google token refresh endpoint b. UPDATE tenants SET gmail_token = new_token, gmail_token_updated_at = now WHERE id = tenant_id c. If DB write fails: log error, proceed with the fresh access_token anyway (it's valid for ~60 min) 3. Return GmailClient(access_token=token["access_token"]) ``` **Concurrency handling: Last-write-wins is sufficient.** - Multiple concurrent requests refreshing simultaneously will all get valid tokens from Google - Last one to write to DB wins -- this is fine because all refreshed tokens are equally valid - The refresh_token doesn't rotate on Google's side (it stays the same unless revoked) - No mutex needed -- the worst case is 2-3 redundant refresh calls, which is harmless **Error handling:** - Google refresh fails -> raise, email send fails, gets logged (existing behavior) - DB write fails after successful refresh -> log warning, return the access_token anyway (it's valid for this request; next request will retry the write) - DB read fails -> raise, hard fail (no email without tokens) ### 5. Implementation Plan -- Two Repos **Phase A: gmail-sdk-ldraney (new feature)** Add a `TokenStore` protocol so the client accepts custom storage backends: ```python class TokenStore(Protocol): def load(self, account: str) -> dict | None: ... def save(self, account: str, token_data: dict) -> None: ... def load_credentials(self) -> dict[str, str]: ... ``` Then `GmailClient.__init__` accepts `token_store: TokenStore | None = None`. If provided, uses it instead of file I/O. Files changed: - `gmail_sdk/auth.py` -- add `TokenStore` protocol, modify `_load_and_refresh_token` to use it - `gmail_sdk/__init__.py` -- export `TokenStore` - `tests/test_auth.py` -- test custom token store - Bump version to `0.2.0` **Phase B: basketball-api** Files changed: 1. `src/basketball_api/models.py` -- add `gmail_token` JSONB + `gmail_token_updated_at` to Tenant 2. `alembic/versions/014_add_gmail_token_to_tenants.py` -- migration 3. `src/basketball_api/services/email.py` -- rewrite `get_gmail_client()` to: - Read `tenant.gmail_token` from DB - Implement `PostgresTokenStore` class - Pass to `GmailClient(token_store=pg_store)` 4. `src/basketball_api/config.py` -- keep `gmail_secrets_dir` as fallback during transition 5. `tests/` -- update email service tests for DB-backed tokens 6. One-time migration script (or alembic data migration) to seed DB from current k8s secret **Phase C: pal-e-deployments** Files changed: 1. `overlays/basketball-api/prod/deployment-patch.yaml` -- remove `gmail-oauth` volume mount + volume 2. Keep `BASKETBALL_GMAIL_SECRETS_DIR` env var temporarily as fallback, remove in follow-up **Phase D: Cleanup** - Delete `gmail-oauth-token` k8s secret - Delete `gmail-oauth-westsidebasketball` k8s secret (there are 2 in the namespace) - Remove `gmail_secrets_dir` from config.py ### 6. Story Point Estimate | Phase | Estimate | Notes | |-------|----------|-------| | A: gmail-sdk TokenStore protocol | 2 pts | Small SDK change, Lucas owns it | | B: basketball-api DB persistence | 3 pts | Migration + service rewrite + tests | | C: pal-e-deployments cleanup | 1 pt | Remove volume mount | | D: Secret cleanup | 0.5 pts | kubectl delete | | **Total** | **6.5 pts** | ~2-3 dev sessions | ### 7. Surprises and Blockers 1. **Two gmail-oauth secrets exist** in the basketball-api namespace: `gmail-oauth-token` (11d old) and `gmail-oauth-westsidebasketball` (2d13h old). Only `gmail-oauth-token` is mounted. The second one appears to be from a manual update attempt. Should be cleaned up. 2. **The SDK exception is unhandled.** When `_save_token()` fails on the read-only filesystem, it raises `OSError`. This propagates up through `GmailClient.__init__`, which means the client never gets created, and the email send fails entirely. This is worse than "can't persist" -- it means **zero emails send after the first token expiry** (not just "tokens don't persist"). 3. **Token is currently NOT expired** (`expires_at: 1774108514` = 2026-03-19 + offset from refresh). This is because someone manually refreshed the k8s secret recently (the 2d13h old second secret suggests this). The 29-hour outage was real but was manually patched. 4. **`gmail-sdk-ldraney` has only `httpx` as a dependency.** Adding a `Protocol` for `TokenStore` requires zero new dependencies. Clean. 5. **No other services consume gmail-oauth secrets.** Only basketball-api namespace has gmail secrets. No cross-service impact. 6. **The SDK creates a new `httpx.Client` per `GmailClient` instance.** Since basketball-api creates a new client per email send, this means a new TCP connection per email. Not a blocker, but worth noting for future connection pooling optimization. ### Next Step Create a feature issue with the full scope from this spike, linking to this spike issue.
Author
Owner

Spike Findings: Gmail OAuth Token Persistence in Postgres

1. Current Token Loading Mechanism

gmail-sdk v0.1.3 uses a purely file-based token storage model. The flow:

  1. GmailClient.__init__ receives account and secrets_dir params
  2. Calls AuthMixin._load_and_refresh_token(account, secrets_dir) which:
    • Reads {secrets_dir}/gmail-{account}.json via _load_token() (JSON file on disk)
    • Checks expires_at — if token expires within 300 seconds, refreshes via Google
    • Loads credentials.json from secrets_dir to get client_id/client_secret for refresh
    • Writes refreshed token back to disk via _save_token() (uses os.open with O_WRONLY | O_CREAT | O_TRUNC, mode 0o600)
  3. Sets self.access_token on the httpx client — one-shot at init time, no mid-request re-auth

Critical detail: The SDK refreshes at init time only. Once GmailClient is constructed, the access token is baked into the httpx Authorization header. There is no retry-on-401 or lazy refresh. If the token expires mid-session (after ~60 min), subsequent API calls fail with 401.

basketball-api creates a new GmailClient per email send (get_gmail_client() in services/email.py), so each send triggers a fresh token load + potential refresh. This is actually safe for token expiry — the problem is that _save_token() fails silently or throws on read-only filesystems.

pal-e-mail caches GmailClient instances in a module-level _clients dict, so it only refreshes on first use per sender per process lifetime. This is more vulnerable to token expiry.

Current k8s workaround (basketball-api): An init container copies the read-only k8s secret to an emptyDir volume so _save_token() can write. But emptyDir is ephemeral — refreshed tokens are lost on pod restart, and the original k8s secret still has the old (possibly expired) access token. The refresh_token survives (Google doesn't revoke it), so the SDK can always re-refresh, but the cycle repeats every restart.

pal-e-mail uses a PVC (gmail-oauth PersistentVolumeClaim), which is more durable but still filesystem-dependent and creates operational complexity.

2. Does gmail-sdk Support Pluggable Token Stores?

No. The SDK has no abstraction layer for token storage. The relevant methods are all @staticmethod on AuthMixin:

  • _load_token(account, secrets_dir) — hardcoded Path(secrets_dir) / f"gmail-{account}.json" file read
  • _save_token(account, secrets_dir, token_data) — hardcoded file write
  • _load_and_refresh_token(account, secrets_dir) — orchestrates load/refresh/save, returns access_token string
  • _load_credentials(secrets_dir) — reads credentials.json for client_id/client_secret

All four are static methods with no hook points, no callback interface, no class-level override pattern. The GmailClient.__init__ calls _load_and_refresh_token directly.

Option A (Recommended): Add a TokenStore protocol to gmail-sdk

Add an abstract TokenStore interface to gmail-sdk, with a default FileTokenStore that preserves current behavior:

# gmail_sdk/token_store.py
from typing import Protocol, Any

class TokenStore(Protocol):
    def load(self, account: str) -> dict[str, Any] | None: ...
    def save(self, account: str, token_data: dict[str, Any]) -> None: ...

class FileTokenStore:
    """Default file-based store (current behavior)."""
    def __init__(self, secrets_dir: str): ...
    def load(self, account: str) -> dict[str, Any] | None: ...
    def save(self, account: str, token_data: dict[str, Any]) -> None: ...

Then GmailClient.__init__ accepts an optional token_store: TokenStore | None param. If provided, uses it instead of file-based methods. If not, creates FileTokenStore(secrets_dir) — fully backwards compatible.

basketball-api then implements PostgresTokenStore:

class PostgresTokenStore:
    def __init__(self, session_factory):
        self._session_factory = session_factory

    def load(self, account: str) -> dict[str, Any] | None:
        with self._session_factory() as db:
            row = db.query(OAuthToken).filter_by(account=account).first()
            return json.loads(row.token_json) if row else None

    def save(self, account: str, token_data: dict[str, Any]) -> None:
        with self._session_factory() as db:
            row = db.query(OAuthToken).filter_by(account=account).first()
            if row:
                row.token_json = json.dumps(token_data)
                row.updated_at = func.now()
            else:
                row = OAuthToken(account=account, token_json=json.dumps(token_data))
                db.add(row)
            db.commit()

Why Option A over wrapping at the app level: Intercepting _save_token from outside the SDK is fragile (monkeypatching static methods). The SDK owns the refresh-then-save flow, so the abstraction belongs in the SDK. This also benefits pal-e-mail, which has the same problem.

Option B (Alternative): Wrapper in basketball-api only, no SDK changes

Subclass GmailClient and override the static methods. This works because Python allows overriding static methods on subclasses, but it is brittle — _load_and_refresh_token calls cls._load_token and cls._save_token, so overriding on a subclass does work via cls dispatch. However, this couples basketball-api to SDK internals and breaks if the SDK refactors.

Recommendation: Option A. The SDK change is small (one new file, ~40 lines, plus ~10 lines changed in auth.py and client.py), backwards compatible, and fixes the problem for all consumers.

4. Database Schema

New oauth_tokens table (not a JSONB column on tenants):

CREATE TABLE oauth_tokens (
    id SERIAL PRIMARY KEY,
    account VARCHAR(100) UNIQUE NOT NULL,  -- e.g. "marcus"
    token_json TEXT NOT NULL,              -- full token dict as JSON
    updated_at TIMESTAMP DEFAULT now(),
    created_at TIMESTAMP DEFAULT now()
);

Why a separate table, not a column on tenants:

  • Separation of concerns — OAuth tokens are infrastructure, not business data
  • The account key (e.g. "marcus") is the gmail-sdk concept, not the tenant slug
  • Multiple tenants could share a gmail account, or a tenant could have zero
  • Easier to reason about encryption, rotation, and access control independently

Encryption decision: Not needed at this stage. Reasoning:

  • The refresh_token is the sensitive value. It grants persistent Gmail access.
  • Postgres is behind k8s network policy (only basketball-api namespace can reach it). No public exposure.
  • The current k8s secret (which stores the same token in plaintext JSON) has the same security posture.
  • Adding pgcrypto or Fernet adds key management complexity (where does the encryption key live? another k8s secret — turtles all the way down).
  • If encryption becomes a requirement later, it is a clean additive change (encrypt token_json column, add decrypt in PostgresTokenStore.load).

5. Token Refresh Flow and Concurrency

Concurrency handling: Last-write-wins is sufficient.

  • basketball-api creates a new GmailClient per request (no caching). Two concurrent requests hitting an expired token will both refresh with Google. Google handles this gracefully — both get valid access tokens and the refresh_token is not rotated.
  • Both writes to oauth_tokens will contain valid tokens. Whichever writes last wins. No mutex needed.
  • The SDK's _load_and_refresh_token is synchronous and scoped to the request. No shared state.

Error handling if DB write fails after successful Google refresh:

  • The access token is already set on the GmailClient instance — the email still sends.
  • The refreshed token is lost — next request will re-refresh from DB (which still has the old access_token but a valid refresh_token).
  • Log a warning, do not fail the email send. Token refresh is best-effort persistence.

6. Migration Path

One-time seed: Read current token file from k8s secret, INSERT into oauth_tokens table via alembic data migration or a management command.

Deployment sequence:

  1. Merge gmail-sdk TokenStore protocol (bump to v0.1.4)
  2. Add oauth_tokens model + alembic migration to basketball-api
  3. Implement PostgresTokenStore in basketball-api
  4. Seed DB from current k8s secret value
  5. Deploy basketball-api with new code — it reads/writes tokens from Postgres
  6. Remove gmail-oauth-token secret, copy-gmail-oauth init container, and gmail-oauth-writable emptyDir from pal-e-deployments/overlays/basketball-api/
  7. (Separate ticket) Migrate pal-e-mail to use the same pattern, remove its PVC

7. Complete File Change List

gmail-sdk (new minor version 0.1.4):

  • src/gmail_sdk/token_store.py — NEW: TokenStore protocol + FileTokenStore default
  • src/gmail_sdk/auth.py — MODIFY: refactor _load_token, _save_token, _load_and_refresh_token to delegate to token store
  • src/gmail_sdk/client.py — MODIFY: accept optional token_store param in __init__
  • src/gmail_sdk/__init__.py — MODIFY: export TokenStore, FileTokenStore
  • tests/test_token_store.py — NEW: unit tests for protocol + file store
  • pyproject.toml — MODIFY: bump version

basketball-api:

  • src/basketball_api/models.py — MODIFY: add OAuthToken model
  • src/basketball_api/token_store.py — NEW: PostgresTokenStore implementation
  • src/basketball_api/services/email.py — MODIFY: get_gmail_client() passes PostgresTokenStore to GmailClient
  • alembic/versions/019_add_oauth_tokens_table.py — NEW: migration
  • pyproject.toml — MODIFY: bump gmail-sdk dependency to >=0.1.4

pal-e-deployments:

  • overlays/basketball-api/prod/deployment-patch.yaml — MODIFY: remove init container, gmail-oauth volumes, gmail-oauth-writable emptyDir

8. Effort Estimate

Work Item Estimate
gmail-sdk TokenStore protocol + FileTokenStore + tests 2 points
basketball-api OAuthToken model + migration + PostgresTokenStore 2 points
Wire up in email service + integration test 1 point
pal-e-deployments cleanup (remove volumes/init container) 1 point
Seed DB from current secret + deploy validation 1 point
Total 7 points

Split across 2-3 PRs: gmail-sdk first, then basketball-api + pal-e-deployments together.

9. Follow-up Ticket Scope

A feature ticket should cover:

  • All file changes listed above
  • pal-e-mail migration (separate ticket, same pattern, lower priority since it uses a PVC that works)
  • Acceptance criteria: email sends survive pod restarts without manual token re-seeding
## Spike Findings: Gmail OAuth Token Persistence in Postgres ### 1. Current Token Loading Mechanism **gmail-sdk v0.1.3** uses a purely file-based token storage model. The flow: 1. `GmailClient.__init__` receives `account` and `secrets_dir` params 2. Calls `AuthMixin._load_and_refresh_token(account, secrets_dir)` which: - Reads `{secrets_dir}/gmail-{account}.json` via `_load_token()` (JSON file on disk) - Checks `expires_at` — if token expires within 300 seconds, refreshes via Google - Loads `credentials.json` from `secrets_dir` to get `client_id`/`client_secret` for refresh - **Writes refreshed token back to disk** via `_save_token()` (uses `os.open` with `O_WRONLY | O_CREAT | O_TRUNC`, mode 0o600) 3. Sets `self.access_token` on the httpx client — **one-shot at init time, no mid-request re-auth** **Critical detail**: The SDK refreshes at init time only. Once `GmailClient` is constructed, the access token is baked into the httpx `Authorization` header. There is no retry-on-401 or lazy refresh. If the token expires mid-session (after ~60 min), subsequent API calls fail with 401. **basketball-api** creates a new `GmailClient` per email send (`get_gmail_client()` in `services/email.py`), so each send triggers a fresh token load + potential refresh. This is actually safe for token expiry — the problem is that `_save_token()` fails silently or throws on read-only filesystems. **pal-e-mail** caches `GmailClient` instances in a module-level `_clients` dict, so it only refreshes on first use per sender per process lifetime. This is more vulnerable to token expiry. **Current k8s workaround** (basketball-api): An init container copies the read-only k8s secret to an `emptyDir` volume so `_save_token()` can write. But `emptyDir` is ephemeral — refreshed tokens are lost on pod restart, and the original k8s secret still has the old (possibly expired) access token. The refresh_token survives (Google doesn't revoke it), so the SDK can always re-refresh, but the cycle repeats every restart. **pal-e-mail** uses a PVC (`gmail-oauth` PersistentVolumeClaim), which is more durable but still filesystem-dependent and creates operational complexity. ### 2. Does gmail-sdk Support Pluggable Token Stores? **No.** The SDK has no abstraction layer for token storage. The relevant methods are all `@staticmethod` on `AuthMixin`: - `_load_token(account, secrets_dir)` — hardcoded `Path(secrets_dir) / f"gmail-{account}.json"` file read - `_save_token(account, secrets_dir, token_data)` — hardcoded file write - `_load_and_refresh_token(account, secrets_dir)` — orchestrates load/refresh/save, returns `access_token` string - `_load_credentials(secrets_dir)` — reads `credentials.json` for client_id/client_secret All four are static methods with no hook points, no callback interface, no class-level override pattern. The `GmailClient.__init__` calls `_load_and_refresh_token` directly. ### 3. Recommended Approach **Option A (Recommended): Add a `TokenStore` protocol to gmail-sdk** Add an abstract `TokenStore` interface to gmail-sdk, with a default `FileTokenStore` that preserves current behavior: ```python # gmail_sdk/token_store.py from typing import Protocol, Any class TokenStore(Protocol): def load(self, account: str) -> dict[str, Any] | None: ... def save(self, account: str, token_data: dict[str, Any]) -> None: ... class FileTokenStore: """Default file-based store (current behavior).""" def __init__(self, secrets_dir: str): ... def load(self, account: str) -> dict[str, Any] | None: ... def save(self, account: str, token_data: dict[str, Any]) -> None: ... ``` Then `GmailClient.__init__` accepts an optional `token_store: TokenStore | None` param. If provided, uses it instead of file-based methods. If not, creates `FileTokenStore(secrets_dir)` — fully backwards compatible. basketball-api then implements `PostgresTokenStore`: ```python class PostgresTokenStore: def __init__(self, session_factory): self._session_factory = session_factory def load(self, account: str) -> dict[str, Any] | None: with self._session_factory() as db: row = db.query(OAuthToken).filter_by(account=account).first() return json.loads(row.token_json) if row else None def save(self, account: str, token_data: dict[str, Any]) -> None: with self._session_factory() as db: row = db.query(OAuthToken).filter_by(account=account).first() if row: row.token_json = json.dumps(token_data) row.updated_at = func.now() else: row = OAuthToken(account=account, token_json=json.dumps(token_data)) db.add(row) db.commit() ``` **Why Option A over wrapping at the app level**: Intercepting `_save_token` from outside the SDK is fragile (monkeypatching static methods). The SDK owns the refresh-then-save flow, so the abstraction belongs in the SDK. This also benefits pal-e-mail, which has the same problem. **Option B (Alternative): Wrapper in basketball-api only, no SDK changes** Subclass `GmailClient` and override the static methods. This works because Python allows overriding static methods on subclasses, but it is brittle — `_load_and_refresh_token` calls `cls._load_token` and `cls._save_token`, so overriding on a subclass does work via `cls` dispatch. However, this couples basketball-api to SDK internals and breaks if the SDK refactors. **Recommendation: Option A.** The SDK change is small (one new file, ~40 lines, plus ~10 lines changed in `auth.py` and `client.py`), backwards compatible, and fixes the problem for all consumers. ### 4. Database Schema **New `oauth_tokens` table** (not a JSONB column on `tenants`): ```sql CREATE TABLE oauth_tokens ( id SERIAL PRIMARY KEY, account VARCHAR(100) UNIQUE NOT NULL, -- e.g. "marcus" token_json TEXT NOT NULL, -- full token dict as JSON updated_at TIMESTAMP DEFAULT now(), created_at TIMESTAMP DEFAULT now() ); ``` **Why a separate table, not a column on tenants**: - Separation of concerns — OAuth tokens are infrastructure, not business data - The `account` key (e.g. "marcus") is the gmail-sdk concept, not the tenant slug - Multiple tenants could share a gmail account, or a tenant could have zero - Easier to reason about encryption, rotation, and access control independently **Encryption decision: Not needed at this stage.** Reasoning: - The refresh_token is the sensitive value. It grants persistent Gmail access. - Postgres is behind k8s network policy (only basketball-api namespace can reach it). No public exposure. - The current k8s secret (which stores the same token in plaintext JSON) has the same security posture. - Adding pgcrypto or Fernet adds key management complexity (where does the encryption key live? another k8s secret — turtles all the way down). - If encryption becomes a requirement later, it is a clean additive change (encrypt `token_json` column, add decrypt in `PostgresTokenStore.load`). ### 5. Token Refresh Flow and Concurrency **Concurrency handling: Last-write-wins is sufficient.** - basketball-api creates a new `GmailClient` per request (no caching). Two concurrent requests hitting an expired token will both refresh with Google. Google handles this gracefully — both get valid access tokens and the refresh_token is not rotated. - Both writes to `oauth_tokens` will contain valid tokens. Whichever writes last wins. No mutex needed. - The SDK's `_load_and_refresh_token` is synchronous and scoped to the request. No shared state. **Error handling if DB write fails after successful Google refresh**: - The access token is already set on the `GmailClient` instance — the email still sends. - The refreshed token is lost — next request will re-refresh from DB (which still has the old access_token but a valid refresh_token). - Log a warning, do not fail the email send. Token refresh is best-effort persistence. ### 6. Migration Path **One-time seed**: Read current token file from k8s secret, INSERT into `oauth_tokens` table via alembic data migration or a management command. **Deployment sequence**: 1. Merge gmail-sdk `TokenStore` protocol (bump to v0.1.4) 2. Add `oauth_tokens` model + alembic migration to basketball-api 3. Implement `PostgresTokenStore` in basketball-api 4. Seed DB from current k8s secret value 5. Deploy basketball-api with new code — it reads/writes tokens from Postgres 6. Remove `gmail-oauth-token` secret, `copy-gmail-oauth` init container, and `gmail-oauth-writable` emptyDir from `pal-e-deployments/overlays/basketball-api/` 7. (Separate ticket) Migrate pal-e-mail to use the same pattern, remove its PVC ### 7. Complete File Change List **gmail-sdk** (new minor version 0.1.4): - `src/gmail_sdk/token_store.py` — NEW: `TokenStore` protocol + `FileTokenStore` default - `src/gmail_sdk/auth.py` — MODIFY: refactor `_load_token`, `_save_token`, `_load_and_refresh_token` to delegate to token store - `src/gmail_sdk/client.py` — MODIFY: accept optional `token_store` param in `__init__` - `src/gmail_sdk/__init__.py` — MODIFY: export `TokenStore`, `FileTokenStore` - `tests/test_token_store.py` — NEW: unit tests for protocol + file store - `pyproject.toml` — MODIFY: bump version **basketball-api**: - `src/basketball_api/models.py` — MODIFY: add `OAuthToken` model - `src/basketball_api/token_store.py` — NEW: `PostgresTokenStore` implementation - `src/basketball_api/services/email.py` — MODIFY: `get_gmail_client()` passes `PostgresTokenStore` to `GmailClient` - `alembic/versions/019_add_oauth_tokens_table.py` — NEW: migration - `pyproject.toml` — MODIFY: bump gmail-sdk dependency to >=0.1.4 **pal-e-deployments**: - `overlays/basketball-api/prod/deployment-patch.yaml` — MODIFY: remove init container, gmail-oauth volumes, gmail-oauth-writable emptyDir ### 8. Effort Estimate | Work Item | Estimate | |-----------|----------| | gmail-sdk TokenStore protocol + FileTokenStore + tests | 2 points | | basketball-api OAuthToken model + migration + PostgresTokenStore | 2 points | | Wire up in email service + integration test | 1 point | | pal-e-deployments cleanup (remove volumes/init container) | 1 point | | Seed DB from current secret + deploy validation | 1 point | | **Total** | **7 points** | Split across 2-3 PRs: gmail-sdk first, then basketball-api + pal-e-deployments together. ### 9. Follow-up Ticket Scope A feature ticket should cover: - All file changes listed above - pal-e-mail migration (separate ticket, same pattern, lower priority since it uses a PVC that works) - Acceptance criteria: email sends survive pod restarts without manual token re-seeding
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
forgejo_admin/basketball-api#130
No description provided.