## Scope Review: NEEDS_REFINEMENT

Review note: `review-70-2026-03-27`

Critical repo placement issue and several template gaps:

- **WRONG REPO:** Issue filed on basketball-api but all work is in **pal-e-platform**. PrometheusRule CRDs live in `terraform/modules/monitoring/main.tf` (existing pattern at lines 387+ and 684+). Should be re-filed or repo field updated.
- **Missing `### Type` header:** Should be `Feature`.
- **Vague file targets:** Replace with `terraform/modules/monitoring/main.tf` -- add PrometheusRule resource following existing `blackbox_alerts` pattern.
- **Stale info:** "notification channel TBD" -- Telegram is already configured via Alertmanager.
- **Potential duplicate:** kube-prometheus-stack ships with `KubePodCrashLooping` alert by default. Verify whether it's already active and just not routing, before creating a custom rule.
- **Scope is platform-wide:** A CrashLoopBackOff alert rule would fire for ALL pods, not just basketball-api. This is better, but the ticket should acknowledge it.
- **Board labels missing:** Needs type:feature, arch:monitoring, repo:pal-e-platform.

forgejo_admin commented

2026-03-27 22:14:10 +00:00

Issue body updated per scope review corrections.

forgejo_admin commented

2026-03-28 19:09:37 +00:00

## Superseded -- Closing

This issue requested alerting rules for CrashLoopBackOff and downtime. All requested alerting already exists in `pal-e-platform/terraform/modules/monitoring/main.tf`:

- **`PodRestartStorm`** (line 120) -- Fires on >3 restarts in 15 minutes. Covers the CrashLoopBackOff detection use case.
- **`OOMKilled`** (line 131) -- Fires when a container is OOMKilled.
- **`EndpointDown`** (line 405) -- Blackbox probe fires on `probe_success == 0` for >2 minutes. Covers the downtime detection use case.
- **`EndpointSlowResponse`** -- Fires on `probe_duration_seconds > 5s` for >5 minutes.
- **`KubePodCrashLooping`** -- Built-in kube-prometheus-stack rule (`defaultRules.create = true` by default).

Notification routing is fully configured: Telegram (primary) + Slack (secondary) via Alertmanager.

The scope review (`review-70-2026-03-27`) also flagged this as:

- **WRONG REPO** -- All monitoring work lives in pal-e-platform, not basketball-api.
- Potential duplicate of built-in `KubePodCrashLooping` rule.
- Issue body corrupted by the `$NEW_BODY` bug (session 2026-03-28).

**Action:** Removing board item #70 from board-westside-basketball and closing this issue as superseded.
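
For readers without access to pal-e-platform, here is a rough sketch of how rules with the thresholds listed above might be declared as a PrometheusRule through Terraform. It is not copied from `main.tf`. The resource name, namespace, `release` label, severities, and the exact PromQL for `PodRestartStorm` are assumptions made for illustration. Only the alert names and thresholds come from this comment.

```hcl
# Illustrative sketch only; the real definitions live in
# pal-e-platform/terraform/modules/monitoring/main.tf and may differ.
resource "kubernetes_manifest" "alert_rules_sketch" { # hypothetical resource name
  manifest = {
    apiVersion = "monitoring.coreos.com/v1"
    kind       = "PrometheusRule"
    metadata = {
      name      = "alert-rules-sketch" # assumed
      namespace = "monitoring"         # assumed
      labels = {
        # Assumed: must match whatever ruleSelector the Prometheus instance uses.
        release = "kube-prometheus-stack"
      }
    }
    spec = {
      groups = [
        {
          name = "pod-health-sketch"
          rules = [
            {
              # ">3 restarts in 15 minutes"; the expression is an assumed
              # way to write that using the kube-state-metrics restart counter.
              alert  = "PodRestartStorm"
              expr   = "increase(kube_pod_container_status_restarts_total[15m]) > 3"
              labels = { severity = "warning" } # assumed severity
              annotations = {
                summary = "Restart storm in {{ $labels.namespace }}/{{ $labels.pod }}"
              }
            },
          ]
        },
        {
          name = "blackbox-sketch"
          rules = [
            {
              # "probe_success == 0 for >2 minutes"
              alert  = "EndpointDown"
              expr   = "probe_success == 0"
              "for"  = "2m"
              labels = { severity = "critical" } # assumed severity
            },
            {
              # "probe_duration_seconds > 5s for >5 minutes"
              alert  = "EndpointSlowResponse"
              expr   = "probe_duration_seconds > 5"
              "for"  = "5m"
              labels = { severity = "warning" } # assumed severity
            },
          ]
        },
      ]
    }
  }
}
```

None of these expressions filter by namespace or pod, so they apply to every workload Prometheus scrapes, which matches the scope review's note that such a rule is platform-wide rather than specific to basketball-api.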

forgejo_admin closed this issue

2026-03-28 19:09:48 +00:00