feat: automate Gmail OAuth reauth lifecycle (7-day token expiry) #222
No reviewers
Labels
No labels
domain:backend
domain:devops
domain:frontend
status:approved
status:in-progress
status:needs-fix
status:qa
type:bug
type:devops
type:feature
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
forgejo_admin/pal-e-platform!222
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "162-gmail-oauth-reauth-automation"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Changes
scripts/gmail-reauth.sh(new): end-to-end orchestration script that invokes gmail-mcp reauth CLI, validates 4 required scopes, updates k8s secretgmail-oauth-tokenin basketball-api namespace, deletes stalegmail-oauth-westsidebasketballsecret, and triggers rollout restart. Supports--dry-run. Bridges local filename typo transparently.salt/pillar/secrets_registry.sls: addsgmail_oauth_tokenentry underplatform:withrotation_days: 7, originexternal, providerGoogle OAuth (westsidebasketball@gmail.com)terraform/modules/monitoring/main.tf: addsgmail-oauth-expiryPrometheusRule with day-6 warning (GmailOAuthTokenExpiringSoon) and day-7 critical (GmailOAuthTokenExpired) alerts usingkube_secret_createdmetrictofu plan Output
Cannot run
tofu planin worktree (no.terraform/init state).tofu fmt -recursivepassed clean. Syntax consistent with existing PrometheusRule resources (blackbox_alerts,embedding_alerts).Test Plan
bash -n scripts/gmail-reauth.shpasses (verified)scripts/gmail-reauth.sh --dry-runto verify flag parsing and outputscripts/gmail-reauth.shagainst live basketball-api namespacekubectl get secret gmail-oauth-token -n basketball-apireturns updated tokengmail-oauth-westsidebasketballsecret is deletedtofu planon archbox to verify PrometheusRule resource is validReview Checklist
tofu fmt -recursivecleanbash -n)Related Notes
forgejo_admin/gmail-mcp#6(SSH reauth tool, done)pal-e-platform-- the project this work belongs toQA Review -- PR #222
Issues Found and Fixed (commit
2b3f436)Dry-run mode crash on missing token file --
validate_token()calleddieif the token file was missing, even in dry-run mode. Since dry-run skips the actual reauth, the file may not exist yet. Fixed: returns early with a warning in dry-run mode.update_k8s_secret()base64 encoding before dry-run guard --base64 -w0 < $LOCAL_TOKEN_FILEran before the dry-run check, so--dry-runwould crash on a missing file. Fixed: moved encoding after the guard, added file-existence check in dry-run output.Errant
2>&1redirect on reauth invocation --python3 -m gmail_mcp.reauth 2>&1merged stderr into stdout but did not capture either. This masks error messages in the terminal output and could hide diagnostic information. Fixed: removed the redirect so stderr flows naturally.Verified Correct
gmail-westsidebasktball.json(missing 'e') mapped to k8s keygmail-westsidebasketball.json(correct spelling)kubectl create --dry-run=client | kubectl applyis the correct idempotent patternblackbox_alerts,embedding_alerts)kube_secret_createdmetric is correct for kube-state-metricsplatform:section)set -euo pipefail, color output, usage function -- matches issue constraintsVERDICT: APPROVE
All issues found during review have been fixed in commit
2b3f436. The implementation matches all acceptance criteria from issue #162.PR #222 Review
DOMAIN REVIEW
Tech stack: Bash orchestration script, Salt pillar (YAML), Terraform/HCL (PrometheusRule via kubernetes_manifest). Domain checklists applied: Terraform/k8s/Salt.
scripts/gmail-reauth.sh (311 lines, new file)
The script is well-structured with clear separation of concerns (prereqs, reauth, validation, k8s update, cleanup, restart). Good use of
set -euo pipefail,--dry-runsupport, colored output, and prerequisite checks.Observations:
kube_secret_createdvskubectl applymismatch (correctness bug): The PrometheusRule alerts usekube_secret_created(the k8smetadata.creationTimestamp) to measure secret age. However, the script useskubectl create secret ... --dry-run=client -o yaml | kubectl apply -f -to update the secret.kubectl applypatches an existing resource in place -- it does NOT resetmetadata.creationTimestamp. After the first 7-day cycle, the alert will fire and never clear, because every subsequentapplypreserves the original creation timestamp. The fix is either: (a)kubectl delete secretthenkubectl create secret(so creationTimestamp resets), or (b) use an annotation likelast-rotatedthat the script sets explicitly, and query that annotation in the PromQL expression instead. This is a functional correctness issue that will cause alert fatigue from day 8 onward.Unused variable
token_b64: Line computestoken_b64=$(base64 -w0 < "${LOCAL_TOKEN_FILE}")for verification, then the verification step reads back viakubectl get secret ... -o jsonpath. The comparisonstored_b64vstoken_b64is correct logic, butbase64 -w0is a GNU coreutils flag. On macOS (which is referenced as a potential host given the Mac build agent work),base64does not support-w0-- it usesbase64with no line-wrap by default orbase64 -b0. Thestatcommand already has cross-platform handling (GNU--format=%swith macOS fallback-f%z), butbase64does not. Since this script runs on archbox (Linux), this is a nit rather than a blocker, but worth noting for portability.jsonpath dot-escaping fallback: The secret key
gmail-westsidebasketball.jsoncontains a dot, which breaks jsonpath's default field selector. The script tries plain jsonpath first, then falls back to bracket notation with escaped dot. This is a reasonable approach but the fallback syntax'{.data.gmail-westsidebasketball\.json}'may not work in all kubectl versions. Consider using-o json | jqfor the verification step, since jq is already a prerequisite.salt/pillar/secrets_registry.sls
Clean addition. Follows the established pattern (origin, description, provider, created, rotation_days, notes). Documents the filename typo transparently. Placement under
platform:is correct. No secrets committed -- only metadata.terraform/modules/monitoring/main.tf
Follows the exact pattern of existing PrometheusRule resources (
blackbox_alerts,embedding_alerts): same labels, samedepends_on, same structure. Thefor = "0m"on the critical alert matches thePodRestartStormprecedent at line 122. Thefor = "1h"on the warning gives appropriate debounce.The
humanizeDurationtemplate function is correct for formatting seconds as human-readable duration.BLOCKERS
kube_secret_created(the k8s object creation timestamp) is immutable once the object exists.kubectl applydoes not reset it. After the first 7-day cycle, both alerts will fire permanently regardless of token freshness. The alerting strategy and the secret update strategy are incompatible. Either the script must delete+recreate the secret (socreationTimestampresets), or the alert must use a different signal (e.g., a custom annotation or a recording rule based on secret data hash changes). This is a correctness blocker -- deploying this as-is will produce permanent false-positive critical alerts.NITS
base64 -w0portability: GNU-only flag. Not blocking since archbox is Linux, but if the script is ever run from the Mac agent, it will fail. Considerbase64 | tr -d '\n'as a portable alternative.jsonpath verification could use jq: Since jq is already a prerequisite, the verification step for reading back the secret could use
kubectl get secret ... -o json | jq -r '.data["gmail-westsidebasketball.json"]'instead of relying on jsonpath dot-escaping quirks.tofu plannot included: PR body explains this cannot be run in the worktree (no.terraform/init state). The PR conventions say "Includetofu planoutput for any Terraform changes." This is understandable given worktree limitations but worth noting. The new resource follows existing patterns closely, reducing risk.Script has no shebang portability note: The script uses
#!/bin/bashwhich is fine, but theset -euo pipefailplus array syntax (REQUIRED_SCOPES=(...)) means this genuinely requires bash 4+. Worth a comment since some systems default to bash 3.SOP COMPLIANCE
162-gmail-oauth-reauth-automationmatches issue #162)pal-e-platform)tofu planoutput missing (explained in PR body -- worktree limitation)PROCESS OBSERVATIONS
kube_secret_createdmismatch means the alerting component will produce false positives. The script itself is sound -- only the alert/script integration has the bug.tofu applyrequired for the PrometheusRule. Script is a standalone ops tool.VERDICT: NOT APPROVED
One blocker: the PrometheusRule uses
kube_secret_created(immutable after object creation) but the script updates the secret viakubectl apply(which preserves creation timestamp). After the first rotation cycle, both alerts will fire permanently. Either the script must delete+recreate the secret so the timestamp resets, or the PromQL must use a different signal. Fix this integration mismatch and this is ready to merge.PR #222 Review (Re-review)
Previous blocker:
kubectl applypreservesmetadata.creationTimestamp, causing PrometheusRule false positives after first rotation cycle. Fix needed: delete+recreate pattern.BLOCKER RESOLUTION
RESOLVED. The script now uses explicit delete+recreate at lines 199-210 of
scripts/gmail-reauth.sh:The comment explaining why delete+recreate is used (rather than apply) is excellent -- this prevents future refactors from regressing to the broken pattern.
DOMAIN REVIEW
Tech stack: Bash ops script + Terraform/HCL (PrometheusRule kubernetes_manifest) + Salt pillar (YAML).
Terraform/k8s:
blackbox_alertsandembedding_alerts(same metadata labels,depends_on, namespace reference viakubernetes_namespace_v1.monitoring.metadata[0].name). Consistent.for = "0m"on the critical alert is consistent with the existingPodRestartStormrule at line 122 of the same file.for = "1h"on the warning alert is appropriate -- avoids flapping near the 6-day boundary.kube_secret_createdmetric which is exposed by kube-state-metrics (included in kube-prometheus-stack). Valid metric source.$value | humanizeDurationtemplate function is correct Prometheus alerting syntax.Bash script:
set -euo pipefailat the top -- correct strict mode.stat --format=%swith|| stat -f%zfallback handles both GNU and BSD stat -- good cross-platform consideration..scopes(array) and.scope(space-delimited string) field formats. Defensive.--dry-runflag is respected in every mutating function (update_k8s_secret,delete_stale_secret,restart_deployment,run_reauth). Complete coverage.Salt pillar:
gmail_oauth_tokenentry follows the exact structure of adjacent entries (origin, description, provider, created, rotation_days, notes).rotation_days: 7matches the Google policy documented in the notes.gmail-westsidebasktball.jsonmissing 'e') is documented in both the script header and the pillar notes -- transparent and consistent.BLOCKERS
None. Previous blocker is resolved.
NITS
Unused
token_b64variable: Line 196-197 computestoken_b64=$(base64 -w0 < "${LOCAL_TOKEN_FILE}")but since the create now uses--from-fileinstead of constructing a manifest, thetoken_b64value at line 197 is only used for verification comparison at line 229. However, thestored_b64fetched from k8s via jsonpath is already base64-encoded by Kubernetes, so this comparison is correct. The variable name could be clearer (e.g.,local_b64) but this is cosmetic.Brief downtime window: Between
kubectl delete secretandkubectl create secret generic, there is a brief window where the secret does not exist. If basketball-api pods restart during this window, they would fail to mount the secret. The window is milliseconds for a scripted operation, but worth documenting as a known limitation. Not a blocker for an operator-initiated script.for = "0m"on critical alert: While consistent with the existingPodRestartStormpattern, Prometheus treatsfor: 0mas "fire immediately on first evaluation." For a 7-day threshold this is fine (if the secret is >7 days old, it is definitively expired), so this is correct. Just noting for completeness.No
tofu planoutput: PR body explains this is due to worktree lacking.terraform/init state, and notestofu fmt -recursivepassed clean. The Terraform change is a new resource (pure addition), so risk of state breakage is minimal. Acceptable given the constraint.SOP COMPLIANCE
162-gmail-oauth-reauth-automationreferences issue #162tofu fmtclean (stated in PR body)PROCESS OBSERVATIONS
VERDICT: APPROVED
PR #222 Review
DOMAIN REVIEW
Tech stack identified: Bash scripting (ops automation), OpenTofu/Terraform (Helm provider,
kubernetes_manifest), Salt pillar (YAML metadata registry).Bash/Ops script (
scripts/gmail-reauth.sh):set -euo pipefail, clean function decomposition, color-coded output,--dry-runsupport, prerequisite checks.kubectl applywould break the PrometheusRule'skube_secret_createdmetric. Good engineering judgment.statfallback (GNU--format=%sthen BSD-f%z) at line 150 is a nice touch, though archbox is always Linux..scopes[]vs.scope(space-delimited) is thorough.gmail-westsidebasktball.jsonis documented in 3 places (script header, constant comment, secrets registry). Well-handled.Terraform (
terraform/modules/monitoring/main.tf):gmail_oauth_expiry_alertresource follows the exact same pattern asblackbox_alertsandembedding_alerts(same labels, samedepends_on). Consistent.kube_secret_createdis a standard kube-state-metrics metric, and kube-state-metrics is running with default collectors (no filter). The metric will be available.for = "0m"on the critical alert is consistent with existingPodRestartStormat line 122. Pattern is: when the condition IS the event, fire immediately.Salt pillar (
salt/pillar/secrets_registry.sls):gmail_oauth_tokenentry underplatform:follows the exact schema of every other entry (origin, description, provider, created, rotation_days, notes). Clean.rotation_days: 7is the shortest rotation in the registry -- appropriately documented as Google policy-driven.BLOCKERS
None.
~/secrets/at runtime, never embeds values.--dry-run/--helpflags with strictcasematching; unknown args causedie.bash -nsyntax check and--dry-runmode serve as the validation mechanism. No unit tests needed for a kubectl orchestration script.NITS
Dead-code jsonpath attempt (line 214-216): The first verification attempt uses
jsonpath="{.data.${K8S_SECRET_KEY}}"which expands to{.data.gmail-westsidebasketball.json}. The unescaped dot beforejsonmeans kubectl interprets this as nested key access (.data.gmail-westsidebasketball->.json), which will always fail. The fallback at line 222 with the escaped dot (gmail-westsidebasketball\.json) is the one that actually works. Consider removing the first attempt and going straight to the escaped version to avoid confusion:Alternatively, use
-o go-templateorjqon the JSON output to avoid jsonpath escaping entirely.Hardcoded secret key in jsonpath fallback (line 222): The fallback hardcodes
gmail-westsidebasketball\.jsoninstead of deriving it fromK8S_SECRET_KEY. If the constant ever changes, this line would silently break. Minor since the constant is 5 lines above, but worth noting.python3 -m gmail_mcp.reauthprerequisite not checked (line 125): Thecheck_prerequisitesfunction validateskubectlandjqbut not whethergmail_mcpis importable. If the module is missing, the user gets a Python traceback instead of a cleandiemessage. Consider adding a check:python3 -c "import gmail_mcp.reauth" 2>/dev/null || die "gmail_mcp not installed".for = "0m"on GmailOAuthTokenExpired: While consistent with existing patterns, Prometheus documentation notes thatfor: 0mand omittingforare equivalent. Omitting it would be marginally cleaner, but this matches the existing codebase convention, so no strong opinion.PR body notes "Cannot run
tofu plan": This is understood given worktree isolation, but the PR convention says "Includetofu planoutput for any Terraform changes." The syntax is consistent with existing PrometheusRule resources, so the risk is low. Should be validated duringtofu applyper issue #224.SOP COMPLIANCE
162-gmail-oauth-reauth-automationmatches issue #162pal-e-platform) and closes #162~/secrets/at runtimetofu planoutput not included (worktree limitation acknowledged in PR body)PROCESS OBSERVATIONS
DORA impact: This PR directly improves MTTR for Gmail OAuth token expiry (currently a multi-step manual process every 7 days). The proactive alerting at day 6 converts a reactive outage into a scheduled 1-minute ops task. Strong MTTR improvement.
Change failure risk: Low. The PrometheusRule is additive (no existing resources modified). The script is operator-facing (not automated CI), and
--dry-runprovides a safety net. The secrets registry change is metadata-only.Validation gap: The
tofu planwas not run. The alert resource should be validated during the nexttofu applycycle (tracked by #224). Thekube_secret_createdmetric availability should be confirmed in Prometheus after apply.VERDICT: APPROVED