Bug: pal-e-mail ServiceMonitor scraping nonexistent /metrics endpoint #169
Reference
ldraney/pal-e-platform#169
Type
Bug
Lineage
standalone — discovered during AlertManager triage 2026-03-26
Repo
forgejo_admin/pal-e-platform (ServiceMonitor config) and/or forgejo_admin/pal-e-mail (if adding metrics endpoint)
What Broke
ServiceMonitor deployed with pal-e-mail scrapes /metrics on port http, but the FastAPI app returns 404 on that path. Prometheus scrape fails, causing two TargetDown alerts (warning) firing since 2026-03-22 (4 days). The service itself is healthy: /healthz returns 200.
Logs show repeated: GET /metrics HTTP/1.1" 404 Not Found from the Prometheus scraper IP.
Repro Steps
kubectl logs -n pal-e-mail -l app=pal-e-mail --tail=20
/metrics returning 404
TargetDown alerts for pal-e-mail
Expected Behavior
Either: (a) ServiceMonitor removed so Prometheus doesn't scrape a nonexistent endpoint, or (b) /metrics endpoint exists and returns Prometheus metrics.
Environment
pal-e-mail-548bcf86f8-rpxcx
TargetDown x2 (warning), firing since 2026-03-22
Acceptance Criteria
TargetDown alerts clear for pal-e-mail
/metrics returns 200 with valid Prometheus metrics
Related
project-pal-e-platform - project (observability concern)
story:superuser-observe - user story
arch:servicemonitor - architecture component
Scope Review: NEEDS_REFINEMENT
Review note: review-386-2026-03-26
Repo mismatch: the ServiceMonitor is deployed via pal-e-deployments (kustomize overlay), not pal-e-platform. The Forgejo issue is filed on the wrong repo.
pal-e-deployments/overlays/pal-e-mail/prod/kustomization.yaml + bases/servicemonitor/servicemonitor.yaml, not in pal-e-platform/terraform/main.tf
Scope Correction (post-review)
Per review review-386-2026-03-26, correcting repo and fix direction.
Repo Correction
Wrong repo in original scope. The ServiceMonitor is NOT in pal-e-platform terraform. It's deployed via pal-e-deployments kustomize overlay.
Correct repo: forgejo_admin/pal-e-deployments
File Targets
pal-e-deployments/overlays/pal-e-mail/prod/kustomization.yaml lines 43-53 (the target: kind: ServiceMonitor patch block)
pal-e-deployments/bases/servicemonitor/servicemonitor.yaml - shared by basketball-api, mcd-tracker, pal-e-docs (all have working /metrics)
Fix Direction (decided)
Remove the ServiceMonitor from pal-e-mail's overlay. Three sibling services use the same base and have working /metrics endpoints. pal-e-mail doesn't expose /metrics. Adding metrics can be a separate feature ticket later.
Blast Radius
Acceptance Criteria (updated)
TargetDown alerts clear for pal-e-mail
Scope Review: NEEDS_REFINEMENT
Review note: review-386-2026-03-26-v2
Second review pass (post scope correction). Three items remain before READY:
- ../../../bases/servicemonitor). Without removing both, an orphaned ServiceMonitor named "app" still deploys into the namespace.
kustomize build overlays/pal-e-mail/prod/ to verify no ServiceMonitor in output. Post-merge: kubectl get servicemonitor -n pal-e-mail should return empty.
Refinement: incomplete file targets + test expectations
Per review review-386-2026-03-26-v2:
Additional File Target
pal-e-deployments/overlays/pal-e-mail/prod/kustomization.yaml line 5: remove the - ../../../bases/servicemonitor base reference. If only the patch block (lines 43-53) is removed, an orphaned ServiceMonitor named "app" with selector "app: app" still deploys.
Test Expectations (added)
kustomize build overlays/pal-e-mail/prod/ should produce no ServiceMonitor resource
kubectl get servicemonitor -n pal-e-mail should return empty
Cross-Repo PR Note
This issue is filed on pal-e-platform but the PR must be opened on pal-e-deployments. Agent must branch and PR against forgejo_admin/pal-e-deployments.
Scope Review: READY
Review note: review-386-2026-03-26-v3
Third review pass (re-review after both scope corrections). All three items from the v2 review are now addressed: line 5 base reference documented, test expectations added, cross-repo PR note included. File targets verified against live codebase: both removal points (line 5 + lines 43-53) confirmed. Blast radius clear: 3 sibling services unaffected. Scope is complete and agent-executable.
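The pre-merge test expectation (kustomize build should produce no ServiceMonitor resource) could be scripted roughly as follows. This is a minimal sketch, not part of the reviewed scope: the function names count_servicemonitors and assert_no_servicemonitor are hypothetical, and it assumes the rendered manifests place each resource's kind: field at the start of a line, as kustomize build output normally does.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the repo).

# Count top-level "kind: ServiceMonitor" lines in a manifest stream on stdin.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# function's exit status clean while still printing "0".
count_servicemonitors() {
  grep -c '^kind:[[:space:]]*ServiceMonitor[[:space:]]*$' || true
}

# Fail (return 1) if the manifest stream on stdin contains any ServiceMonitor.
assert_no_servicemonitor() {
  count=$(count_servicemonitors)
  if [ "$count" -ne 0 ]; then
    echo "found $count ServiceMonitor resource(s)" >&2
    return 1
  fi
  echo "no ServiceMonitor resources"
}

# Intended usage, run from a pal-e-deployments checkout (not executed here):
#   kustomize build overlays/pal-e-mail/prod/ | assert_no_servicemonitor
```

The post-merge expectation (kubectl get servicemonitor -n pal-e-mail returning empty) still needs to be checked against the live cluster; this sketch only covers the rendered output.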