Bug: pal-e-mail ServiceMonitor scraping nonexistent /metrics endpoint #169

Closed
opened 2026-03-26 15:22:35 +00:00 by forgejo_admin · 5 comments
Contributor

Type

Bug

Lineage

standalone — discovered during AlertManager triage 2026-03-26

Repo

forgejo_admin/pal-e-platform (ServiceMonitor config) and/or forgejo_admin/pal-e-mail (if adding metrics endpoint)

What Broke

ServiceMonitor deployed with pal-e-mail scrapes /metrics on port http, but the FastAPI app returns 404 on that path. Prometheus scrape fails, causing two TargetDown alerts (warning) firing since 2026-03-22 (4 days). The service itself is healthy — /healthz returns 200.

Logs show repeated "GET /metrics HTTP/1.1" 404 Not Found entries from the Prometheus scraper IP.

Repro Steps

  1. Check pal-e-mail logs: kubectl logs -n pal-e-mail -l app=pal-e-mail --tail=20
  2. Observe: /metrics returning 404
  3. Check AlertManager: two TargetDown alerts for pal-e-mail
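
The log check in steps 1 and 2 can be scripted. This is a minimal sketch, assuming the access-log lines look like the entry quoted under What Broke; the function name `count_metrics_404s` is hypothetical and is not part of pal-e-mail or any existing tooling.

```python
import re

# Matches access-log entries such as: GET /metrics HTTP/1.1" 404 Not Found
# (assumed format, taken from the log line quoted in this issue)
METRICS_404 = re.compile(r'GET /metrics HTTP/1\.[01]" 404\b')


def count_metrics_404s(log_text: str) -> int:
    """Count log lines that record a 404 response to GET /metrics."""
    return sum(1 for line in log_text.splitlines() if METRICS_404.search(line))
```

One way to use it is to pass in the text captured from `kubectl logs -n pal-e-mail -l app=pal-e-mail --tail=20`; a nonzero count confirms that the scraper is still hitting the missing endpoint.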

Expected Behavior

Either: (a) ServiceMonitor removed so Prometheus doesn't scrape a nonexistent endpoint, or (b) /metrics endpoint exists and returns Prometheus metrics.

Environment

  • Cluster/namespace: pal-e-mail (service), monitoring (Prometheus scraper)
  • Service version: current deployment pal-e-mail-548bcf86f8-rpxcx
  • Related alerts: TargetDown x2 (warning), firing since 2026-03-22

Acceptance Criteria

  • TargetDown alerts clear for pal-e-mail
  • Either ServiceMonitor removed OR /metrics returns 200 with valid Prometheus metrics
  • No regression on pal-e-mail health

Related

  • project-pal-e-platform — project (observability concern)
  • story:superuser-observe — user story
  • arch:servicemonitor — architecture component
Author
Contributor

Scope Review: NEEDS_REFINEMENT

Review note: review-386-2026-03-26

Repo mismatch: the ServiceMonitor is deployed via pal-e-deployments (kustomize overlay), not pal-e-platform. The Forgejo issue is filed on the wrong repo.

  • Repo field wrong: ServiceMonitor lives in pal-e-deployments/overlays/pal-e-mail/prod/kustomization.yaml + bases/servicemonitor/servicemonitor.yaml, not in pal-e-platform/terraform/main.tf
  • File Targets missing: no specific file paths for the agent to modify
  • Pick one fix approach: ticket offers two options (remove ServiceMonitor vs add /metrics) but should recommend one — agent needs a clear directive
  • Blast radius clear: basketball-api, mcd-tracker, pal-e-docs all use same ServiceMonitor base but all have working /metrics endpoints
Author
Contributor

Scope Correction (post-review)

Per review review-386-2026-03-26, correcting repo and fix direction.

Repo Correction

Wrong repo in original scope. The ServiceMonitor is NOT in pal-e-platform terraform. It's deployed via pal-e-deployments kustomize overlay.

Correct repo: forgejo_admin/pal-e-deployments

File Targets

  • Remove ServiceMonitor inclusion: pal-e-deployments/overlays/pal-e-mail/prod/kustomization.yaml lines 43-53 (the target: kind: ServiceMonitor patch block)
  • Base (do NOT modify): pal-e-deployments/bases/servicemonitor/servicemonitor.yaml — shared by basketball-api, mcd-tracker, pal-e-docs (all have working /metrics)

Fix Direction (decided)

Remove the ServiceMonitor from pal-e-mail's overlay. Three sibling services use the same base and have working /metrics endpoints. pal-e-mail doesn't expose /metrics. Adding metrics can be a separate feature ticket later.

Blast Radius

  • basketball-api, mcd-tracker, pal-e-docs ServiceMonitors unaffected (they include the base independently)
  • pal-e-mail health unaffected (healthz still works, just no Prometheus metrics)

Acceptance Criteria (updated)

  • ServiceMonitor reference removed from pal-e-mail kustomization.yaml
  • TargetDown alerts clear for pal-e-mail
  • ArgoCD syncs successfully after change
  • Other 3 service ServiceMonitors still functional
Author
Contributor

Scope Review: NEEDS_REFINEMENT

Review note: review-386-2026-03-26-v2
Second review pass (post scope correction). Three items remain before READY:

  • Incomplete file targets: Scope correction says remove patch block (lines 43-53) but omits removing the resource reference at line 5 (- ../../../bases/servicemonitor). Without removing both, an orphaned ServiceMonitor named "app" still deploys into the namespace.
  • Missing test expectations: Agent needs to run kustomize build overlays/pal-e-mail/prod/ and verify that no ServiceMonitor appears in the output. Post-merge: kubectl get servicemonitor -n pal-e-mail should return no resources.
  • Cross-repo PR target: Issue is on pal-e-platform but PR must target pal-e-deployments. Add explicit note so agent creates PR on correct repo.
Author
Contributor

Refinement: incomplete file targets + test expectations

Per review review-386-2026-03-26-v2:

Additional File Target

  • pal-e-deployments/overlays/pal-e-mail/prod/kustomization.yaml line 5: remove the - ../../../bases/servicemonitor base reference. If only the patch block (lines 43-53) is removed, the unpatched base still renders, so an orphaned ServiceMonitor named "app" with selector "app: app" still deploys.

Test Expectations (added)

  • Pre-merge: kustomize build overlays/pal-e-mail/prod/ should produce no ServiceMonitor resource
  • Post-merge: kubectl get servicemonitor -n pal-e-mail should return empty
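
The pre-merge expectation above can be checked mechanically. This is a minimal sketch, assuming the rendered output of `kustomize build` is passed in as text; `find_kinds` and `has_servicemonitor` are hypothetical helper names, and the scan looks only at top-level `kind:` lines instead of doing a full YAML parse.

```python
import re

# A top-level "kind:" key starts at column 0 in rendered manifests.
# Indented occurrences (for example inside a patch target) are ignored.
KIND_LINE = re.compile(r"^kind:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def find_kinds(manifest_text: str) -> list[str]:
    """Return the kind of every resource in a multi-document manifest."""
    return KIND_LINE.findall(manifest_text)


def has_servicemonitor(manifest_text: str) -> bool:
    """True if any rendered resource is a ServiceMonitor."""
    return "ServiceMonitor" in find_kinds(manifest_text)
```

The manifest text could come from `subprocess.run(["kustomize", "build", "overlays/pal-e-mail/prod/"], capture_output=True, text=True, check=True).stdout`. After both removals (line 5 and lines 43-53), `has_servicemonitor` is expected to return `False` for that output.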

Cross-Repo PR Note

This issue is filed on pal-e-platform but the PR must be opened on pal-e-deployments. Agent must branch and PR against forgejo_admin/pal-e-deployments.

Author
Contributor

Scope Review: READY

Review note: review-386-2026-03-26-v3
Third review pass (re-review after both scope corrections). All three items from v2 review are now addressed: line 5 base reference documented, test expectations added, cross-repo PR note included. File targets verified against live codebase -- both removal points (line 5 + lines 43-53) confirmed. Blast radius clear: 3 sibling services unaffected. Scope is complete and agent-executable.

Reference
ldraney/pal-e-platform#169