Fix Grafana CrashLoopBackOff — duplicate default datasource #18

Closed
opened 2026-03-06 12:24:02 +00:00 by forgejo_admin · 1 comment

Plan

todo-fix-grafana-duplicate-default-datasource -- Bug fix (no plan phase)

Repo

forgejo_admin/pal-e-platform

User Story

As a platform operator
I want Grafana to start successfully
So that I can use dashboards and alerting

Context

Grafana has been in CrashLoopBackOff for ~8 days. The error is:

Datasource provisioning error: datasource.yaml config is invalid.
Only one datasource per organization can be marked as default

Three ConfigMaps with label grafana_datasource: "1" exist in the monitoring namespace. Two set isDefault: true:

  1. kube-prometheus-stack-grafana-datasource (Helm) — Prometheus isDefault: true
  2. loki-stack (loki-stack Helm chart auto-generated) — Loki isDefault: true
  3. grafana-loki-datasource (custom Terraform ConfigMap) — Loki isDefault: false

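ConfigMap #3 is the intended Loki datasource, but #2 (auto-generated by the loki-stack chart) is still present, so Grafana loads two defaults (#1 and #2) and provisioning fails. The fix is to disable the loki-stack chart's auto-generated datasource ConfigMap through its sidecar datasource setting.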
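
For orientation, a non-default Loki datasource ConfigMap like #3 might look roughly like the sketch below. This is an illustration only, not the contents of main.tf: the resource name, the data key, and the Loki URL are assumptions, and only the namespace, the grafana_datasource label, and isDefault: false come from this issue.

```hcl
# Hypothetical sketch of ConfigMap #3; names and URL are assumed, not taken from the repo.
resource "kubernetes_config_map" "grafana_loki_datasource" {
  metadata {
    name      = "grafana-loki-datasource"
    namespace = "monitoring"

    # The label the Grafana sidecar watches for datasource ConfigMaps.
    labels = {
      grafana_datasource = "1"
    }
  }

  data = {
    # Assumed file name; the sidecar loads any key in a labelled ConfigMap.
    "loki-datasource.yaml" = yamlencode({
      apiVersion = 1
      datasources = [
        {
          name      = "Loki"
          type      = "loki"
          access    = "proxy"
          url       = "http://loki-stack:3100" # assumed in-cluster service address
          isDefault = false                    # Prometheus stays the only default
        }
      ]
    })
  }
}
```

With isDefault = false here and the chart-generated ConfigMap gone, Prometheus would be the only datasource marked as default.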

File Targets

Files the agent should modify:

  • terraform/main.tf -- loki-stack Helm values (~line 192): add sidecar = { datasources = { enabled = false } } inside the existing grafana block
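
A minimal sketch of the intended change, assuming the loki-stack release passes its values through yamlencode. The resource name, repository URL, and the grafana.enabled line are assumptions for illustration; only the sidecar = { datasources = { enabled = false } } addition comes from this issue.

```hcl
# Hypothetical shape of the loki-stack release; match the real block in main.tf.
resource "helm_release" "loki_stack" {
  name       = "loki-stack"
  repository = "https://grafana.github.io/helm-charts" # assumed chart repository
  chart      = "loki-stack"
  namespace  = "monitoring"

  values = [
    yamlencode({
      grafana = {
        # Assumed existing setting: Grafana itself comes from kube-prometheus-stack.
        enabled = false

        # The fix: stop the chart from generating its own datasource ConfigMap,
        # which marks Loki as isDefault: true.
        sidecar = {
          datasources = {
            enabled = false
          }
        }
      }
    })
  ]
}
```

If the release sets values another way (for example with set blocks), the equivalent key path would be grafana.sidecar.datasources.enabled.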

Files the agent should NOT touch:

  • The grafana-loki-datasource ConfigMap resource (~line 222) -- this is correct as-is
  • The kube_prometheus_stack Helm release -- no changes needed there

Acceptance Criteria

  • loki-stack Helm values disable the sidecar datasource provisioning
  • tofu fmt passes
  • tofu validate passes
  • tofu plan shows only the loki-stack Helm release updating

Test Expectations

  • Run tofu fmt -check -- no formatting issues
  • Run tofu validate -- valid configuration
  • Run tofu plan -- only loki-stack release changes, no unexpected drift
  • Run command: tofu fmt -check && tofu validate && tofu plan

Constraints

  • Match existing HCL style in main.tf
  • Only change the loki-stack Helm values block, nothing else
  • Do NOT run tofu apply -- leave that for manual verification

Checklist

  • PR opened
  • Tests pass
  • No unrelated changes

Related

  • project-pal-e-platform -- platform project
  • todo-fix-grafana-duplicate-default-datasource -- pal-e-docs TODO
  • deployment-lessons -- documented under "Hard Shutdown Survival"

Closing — resolved by PR #21 (merged). The Closes #18 was missing from the PR body, so this stayed open. pal-e-docs TODO bug-grafana-crashloop already marked done.