Fix Grafana CrashLoopBackOff — duplicate default datasource #18
Plan
todo-fix-grafana-duplicate-default-datasource -- Bug fix (no plan phase)
Repo
forgejo_admin/pal-e-platform
User Story
As a platform operator
I want Grafana to start successfully
So that I can use dashboards and alerting
Context
Grafana has been in CrashLoopBackOff for ~8 days. The error is:
Three ConfigMaps with label grafana_datasource: "1" exist in the monitoring namespace. Two set isDefault: true:
1. kube-prometheus-stack-grafana-datasource (Helm) -- Prometheus, isDefault: true
2. loki-stack (auto-generated by the loki-stack Helm chart) -- Loki, isDefault: true
3. grafana-loki-datasource (custom Terraform ConfigMap) -- Loki, isDefault: false
ConfigMap #3 was the correct approach, but #2 (auto-generated by the loki-stack chart) is still present and competing. The fix is to disable the loki-stack chart's auto-generated datasource sidecar.
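A minimal sketch of what the change might look like, assuming the loki-stack release is defined as a helm_release resource whose values are passed through yamlencode. The resource name, repository URL, and the grafana.enabled setting are assumptions for illustration; only the sidecar setting comes from this issue.

```hcl
# Hypothetical sketch: resource name, repository, and surrounding values
# are assumed. Only the sidecar block is taken from this issue.
resource "helm_release" "loki_stack" {
  name       = "loki-stack"
  namespace  = "monitoring"
  repository = "https://grafana.github.io/helm-charts"
  chart      = "loki-stack"

  values = [yamlencode({
    grafana = {
      # Assumed: Grafana itself is deployed by kube-prometheus-stack,
      # so the bundled Grafana subchart stays disabled.
      enabled = false

      # Stop the chart from rendering its own datasource ConfigMap
      # (the one labelled grafana_datasource: "1" with isDefault: true).
      sidecar = {
        datasources = {
          enabled = false
        }
      }
    }
  })]
}
```

With this in place, the custom grafana-loki-datasource ConfigMap would be the only Loki datasource the Grafana sidecar picks up, leaving Prometheus as the single default.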
File Targets
Files the agent should modify:
- terraform/main.tf -- loki-stack Helm values (~line 192): add sidecar = { datasources = { enabled = false } } inside the existing grafana block
Files the agent should NOT touch:
- grafana-loki-datasource ConfigMap resource (~line 222) -- this is correct as-is
- kube_prometheus_stack Helm release -- no changes needed there
Acceptance Criteria
- tofu fmt passes
- tofu validate passes
- tofu plan shows only the loki-stack Helm release updating
Test Expectations
- tofu fmt -check -- no formatting issues
- tofu validate -- valid configuration
- tofu plan -- only loki-stack release changes, no unexpected drift
- Run: tofu fmt -check && tofu validate && tofu plan
Constraints
- Do not run tofu apply -- leave that for manual verification
Checklist
Related
- project-pal-e-platform -- platform project
- todo-fix-grafana-duplicate-default-datasource -- pal-e-docs TODO
- deployment-lessons -- documented under "Hard Shutdown Survival"

Closing: resolved by PR #21 (merged). The "Closes #18" line was missing from the PR body, so this issue stayed open. The pal-e-docs TODO bug-grafana-crashloop is already marked done.