Fix Telegram chat_id type and dora-exporter OOM #53

Closed
opened 2026-03-14 18:19:24 +00:00 by forgejo_admin · 0 comments

Lineage

bug-alert-noise-broken-services — relates to plan-pal-e-platform → Phase 3 (alerting)

Repo

forgejo_admin/pal-e-platform

User Story

As a platform operator
I want the PrometheusOperator to sync successfully and the dora-exporter to stop OOM-killing
So that the monitoring stack is healthy and alerts are meaningful

Context

Two TF configuration issues in the monitoring stack:

  1. PrometheusOperatorSyncFailed — The Telegram chat_id Helm value at line 317 uses type = "string", which forces the value to be quoted. The Alertmanager config parser expects an int64 for chat_id, so the sync fails with: yaml: unmarshal errors: line 31: cannot unmarshal !!str '-520096...' into int64. This was flagged as a QA nit on PR #43.

  2. OOMKilled (dora-exporter) — The dora-exporter container at line 1030 has a 128Mi memory limit and is being OOM-killed repeatedly (exit code 137). The OOMKilled alert currently has critical severity.
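
For illustration, the chat_id change in item 1 might look roughly like the sketch below. This is not the actual code at line 317. The resource name, the chart value path, and the variable name are assumptions, and the block-style `set` syntax assumes a 2.x Helm provider. The only part taken from this issue is the switch from `type = "string"` to `type = "auto"`.

```hcl
# Hypothetical variable; the real source of the chat ID is not shown in this issue.
variable "telegram_chat_id" {
  type      = string
  sensitive = true
}

# Hypothetical release; name and value path are assumptions for illustration only.
resource "helm_release" "kube_prometheus_stack" {
  name       = "kube-prometheus-stack"
  repository = "https://prometheus-community.github.io/helm-charts"
  chart      = "kube-prometheus-stack"

  set {
    name  = "alertmanager.config.receivers[0].telegram_configs[0].chat_id"
    value = var.telegram_chat_id
    # "string" makes Helm emit a quoted YAML string, which Alertmanager rejects.
    # "auto" lets Helm infer the type, so a numeric ID is rendered as an integer.
    type = "auto"
  }
}
```

With `type = "auto"`, the value stays a string on the Terraform side, but Helm renders a numeric-looking value as an unquoted YAML integer, which is what the int64 field expects.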

File Targets

Files the agent should modify:

  • terraform/main.tf line 317 — change type = "string" to type = "auto" for chat_id
  • terraform/main.tf line 1030 — change memory limit from "128Mi" to "256Mi"
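
As a rough sketch of the second change, assuming the dora-exporter is defined as a `kubernetes_deployment_v1` resource (the issue does not say how it is declared), the resources block would end up with the request unchanged and only the limit raised. The resource name, labels, and image below are placeholders, not values from the repository.

```hcl
# Hypothetical deployment; only the requests/limits values come from this issue.
resource "kubernetes_deployment_v1" "dora_exporter" {
  metadata {
    name   = "dora-exporter"
    labels = { app = "dora-exporter" }
  }

  spec {
    replicas = 1

    selector {
      match_labels = { app = "dora-exporter" }
    }

    template {
      metadata {
        labels = { app = "dora-exporter" }
      }

      spec {
        container {
          name  = "dora-exporter"
          image = "registry.example.com/dora-exporter:latest" # placeholder image

          resources {
            requests = {
              memory = "32Mi" # unchanged, per the constraints
            }
            limits = {
              memory = "256Mi" # raised from "128Mi"
            }
          }
        }
      }
    }
  }
}
```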

Files the agent should NOT touch:

  • .woodpecker.yaml — CI pipeline, no changes needed
  • Any other resources in main.tf

Acceptance Criteria

  • Telegram chat_id Helm set uses type = "auto" instead of type = "string"
  • dora-exporter memory limit increased to 256Mi
  • tofu validate passes
  • tofu fmt produces no changes

Test Expectations

  • tofu validate passes
  • tofu fmt check produces no diff
  • After apply: PrometheusOperatorSyncFailed alert resolves
  • After apply: dora-exporter pod starts without OOMKilled

Constraints

  • Only change the two specific values identified — no other refactoring
  • Keep dora-exporter requests at 32Mi (only bump the limit)

Checklist

  • PR opened
  • tofu validate passes
  • tofu fmt clean
  • No unrelated changes
  • Closes this issue

Related

  • bug-alert-noise-broken-services — pal-e-docs bug note
  • PR #43 QA nit — chat_id validation
  • PR #35 QA nit — OOMKilled gauge
  • plan-pal-e-platform — Platform Hardening plan
forgejo_admin 2026-03-14 18:25:10 +00:00