Fix Telegram chat_id type and dora-exporter OOM #54

Merged
forgejo_admin merged 1 commit from 53-fix-telegram-chat-id-type-and-dora-expor into main 2026-03-14 18:30:42 +00:00

Summary

Fixes two production issues: Telegram alerting broken by chat_id being passed as a string instead of an integer, and dora-exporter pods being OOMKilled at the 128Mi memory limit.

Changes

  • terraform/main.tf (line 317): Changed the telegram_chat_id set_sensitive type from "string" to "auto" so Helm passes it as an integer (the Telegram API requires a numeric chat_id)
  • terraform/main.tf (line 1030): Bumped dora-exporter memory limit from 128Mi to 256Mi to prevent OOMKill
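The two changes can be sketched as follows. This is a sketch, not a copy of terraform/main.tf: the set_sensitive name path and the var.telegram_chat_id variable name are assumptions; the resource names and values match the plan output below.

```hcl
# Sketch of the two changed values. The set_sensitive "name" path shown
# here is illustrative -- only the "type" attribute actually changed.

resource "helm_release" "kube_prometheus_stack" {
  # ... unchanged attributes ...

  set_sensitive {
    name  = "alertmanager.config.receivers[1].telegram_configs[0].chat_id" # assumed path
    value = var.telegram_chat_id                                           # assumed variable name
    type  = "auto" # was "string"; "auto" lets Helm coerce the numeric value to an integer
  }
}

resource "kubernetes_deployment_v1" "dora_exporter" {
  spec {
    template {
      spec {
        container {
          # ... unchanged attributes ...
          resources {
            limits = {
              memory = "256Mi" # was "128Mi"
            }
          }
        }
      }
    }
  }
}
```

With type = "string", the Helm provider quotes the value in the rendered values, which the Telegram API rejects; "auto" lets Helm infer the scalar type from the value itself.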

Test Plan

  • tofu fmt -- no changes needed
  • tofu validate -- passes
  • After merge: verify tofu plan shows only the two expected changes, then tofu apply
  • Confirm Telegram alerts fire successfully (chat_id now numeric)
  • Confirm dora-exporter pod stays running without OOMKill

Review Checklist

  • [x] Only two values changed -- no other modifications
  • [x] tofu fmt produces no diff
  • [x] tofu validate passes
  • [x] PR body contains Closes #53

Related

  • Plan: plan-pal-e-platform
  • Forgejo issue: #53

Closes #53

Fix Telegram chat_id type and dora-exporter OOM
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
ci/woodpecker/pr/woodpecker Pipeline was successful
ci/woodpecker/pull_request_closed/woodpecker Pipeline was successful
caaf649322
- Change telegram_chat_id set_sensitive type from "string" to "auto"
  so Helm receives it as an integer (Telegram API requires numeric chat_id)
- Bump dora-exporter memory limit from 128Mi to 256Mi to prevent OOMKill

Closes #53

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Review: LGTM

Diff is exactly 2 lines changed in 1 file, scoped precisely to the issue:

  1. telegram_chat_id type "string" -> "auto" -- Helm will now pass it as an integer, which is what the Telegram API expects.
  2. dora-exporter memory limit 128Mi -> 256Mi -- doubles the ceiling to prevent OOMKill.

No unrelated changes. tofu fmt clean, tofu validate passes. Ready to merge.
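For context, the value ultimately lands in the rendered Alertmanager configuration. After this change the telegram receiver should carry a numeric chat_id rather than a quoted string. A sketch with the bot token redacted; only the type of chat_id changes:

```yaml
# Rendered Alertmanager receiver after the fix (sketch; token redacted).
receivers:
  - name: telegram
    telegram_configs:
      - bot_token: "<redacted>"
        chat_id: -5200965094   # integer, previously the string "-5200965094"
        parse_mode: HTML
        send_resolved: true
```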


Tofu Plan Output

tailscale_acl.this: Refreshing state... [id=acl]
helm_release.nvidia_device_plugin: Refreshing state... [id=nvidia-device-plugin]
kubernetes_namespace_v1.forgejo: Refreshing state... [id=forgejo]
kubernetes_namespace_v1.postgres: Refreshing state... [id=postgres]
data.kubernetes_namespace_v1.pal_e_docs: Reading...
kubernetes_namespace_v1.woodpecker: Refreshing state... [id=woodpecker]
kubernetes_namespace_v1.ollama: Refreshing state... [id=ollama]
kubernetes_namespace_v1.minio: Refreshing state... [id=minio]
kubernetes_namespace_v1.keycloak: Refreshing state... [id=keycloak]
kubernetes_namespace_v1.tailscale: Refreshing state... [id=tailscale]
data.kubernetes_namespace_v1.pal_e_docs: Read complete after 0s [id=pal-e-docs]
kubernetes_namespace_v1.cnpg_system: Refreshing state... [id=cnpg-system]
kubernetes_namespace_v1.harbor: Refreshing state... [id=harbor]
data.kubernetes_namespace_v1.tofu_state: Reading...
kubernetes_namespace_v1.monitoring: Refreshing state... [id=monitoring]
kubernetes_secret_v1.paledocs_db_url: Refreshing state... [id=pal-e-docs/paledocs-db-url]
data.kubernetes_namespace_v1.tofu_state: Read complete after 0s [id=tofu-state]
helm_release.forgejo: Refreshing state... [id=forgejo]
kubernetes_persistent_volume_claim_v1.keycloak_data: Refreshing state... [id=keycloak/keycloak-data]
kubernetes_secret_v1.keycloak_admin: Refreshing state... [id=keycloak/keycloak-admin]
helm_release.tailscale_operator: Refreshing state... [id=tailscale-operator]
kubernetes_service_v1.keycloak: Refreshing state... [id=keycloak/keycloak]
kubernetes_role_v1.tf_backup: Refreshing state... [id=tofu-state/tf-state-backup]
kubernetes_service_account_v1.tf_backup: Refreshing state... [id=tofu-state/tf-state-backup]
helm_release.cnpg: Refreshing state... [id=cnpg]
helm_release.loki_stack: Refreshing state... [id=loki-stack]
kubernetes_secret_v1.dora_exporter: Refreshing state... [id=monitoring/dora-exporter]
kubernetes_service_v1.dora_exporter: Refreshing state... [id=monitoring/dora-exporter]
helm_release.kube_prometheus_stack: Refreshing state... [id=kube-prometheus-stack]
kubernetes_role_binding_v1.tf_backup: Refreshing state... [id=tofu-state/tf-state-backup]
kubernetes_deployment_v1.keycloak: Refreshing state... [id=keycloak/keycloak]
helm_release.ollama: Refreshing state... [id=ollama]
helm_release.woodpecker: Refreshing state... [id=woodpecker]
kubernetes_ingress_v1.keycloak_funnel: Refreshing state... [id=keycloak/keycloak-funnel]
kubernetes_ingress_v1.forgejo_funnel: Refreshing state... [id=forgejo/forgejo-funnel]
kubernetes_ingress_v1.grafana_funnel: Refreshing state... [id=monitoring/grafana-funnel]
helm_release.harbor: Refreshing state... [id=harbor]
kubernetes_config_map_v1.grafana_loki_datasource: Refreshing state... [id=monitoring/grafana-loki-datasource]
helm_release.minio: Refreshing state... [id=minio]
kubernetes_ingress_v1.alertmanager_funnel: Refreshing state... [id=monitoring/alertmanager-funnel]
kubernetes_config_map_v1.dora_dashboard: Refreshing state... [id=monitoring/dora-dashboard]
kubernetes_deployment_v1.dora_exporter: Refreshing state... [id=monitoring/dora-exporter]
kubernetes_ingress_v1.woodpecker_funnel: Refreshing state... [id=woodpecker/woodpecker-funnel]
kubernetes_manifest.dora_exporter_service_monitor: Refreshing state...
minio_iam_user.cnpg: Refreshing state... [id=cnpg]
minio_s3_bucket.postgres_wal: Refreshing state... [id=postgres-wal]
minio_iam_user.tf_backup: Refreshing state... [id=tf-backup]
minio_iam_policy.tf_backup: Refreshing state... [id=tf-backup]
minio_s3_bucket.tf_state_backups: Refreshing state... [id=tf-state-backups]
minio_s3_bucket.assets: Refreshing state... [id=assets]
minio_iam_policy.cnpg_wal: Refreshing state... [id=cnpg-wal]
kubernetes_ingress_v1.minio_api_funnel: Refreshing state... [id=minio/minio-api-funnel]
kubernetes_ingress_v1.minio_funnel: Refreshing state... [id=minio/minio-funnel]
minio_iam_user_policy_attachment.cnpg: Refreshing state... [id=cnpg-20260302210642491000000001]
minio_iam_user_policy_attachment.tf_backup: Refreshing state... [id=tf-backup-20260314163610110100000001]
kubernetes_secret_v1.cnpg_s3_creds: Refreshing state... [id=postgres/cnpg-s3-creds]
kubernetes_secret_v1.tf_backup_s3_creds: Refreshing state... [id=tofu-state/tf-backup-s3-creds]
kubernetes_cron_job_v1.tf_state_backup: Refreshing state... [id=tofu-state/tf-state-backup]
kubernetes_ingress_v1.harbor_funnel: Refreshing state... [id=harbor/harbor-funnel]

OpenTofu used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  ~ update in-place

OpenTofu will perform the following actions:

  # helm_release.kube_prometheus_stack will be updated in-place
  ~ resource "helm_release" "kube_prometheus_stack" {
        id                         = "kube-prometheus-stack"
      ~ metadata                   = [
          - {
              - app_version    = "v0.89.0"
              - chart          = "kube-prometheus-stack"
              - first_deployed = 1771560679
              - last_deployed  = 1773506154
              - name           = "kube-prometheus-stack"
              - namespace      = "monitoring"
              - notes          = <<-EOT
                    kube-prometheus-stack has been installed. Check its status by running:
                      kubectl --namespace monitoring get pods -l "release=kube-prometheus-stack"
                    
                    Get Grafana 'admin' user password by running:
                    
                      kubectl --namespace monitoring get secrets kube-prometheus-stack-grafana -o jsonpath="{.data.admin-password}" | base64 -d ; echo
                    
                    Access Grafana local instance:
                    
                      export POD_NAME=$(kubectl --namespace monitoring get pod -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=kube-prometheus-stack" -oname)
                      kubectl --namespace monitoring port-forward $POD_NAME 3000
                    
                    Get your grafana admin user password by running:
                    
                      kubectl get secret --namespace monitoring -l app.kubernetes.io/component=admin-secret -o jsonpath="{.items[0].data.admin-password}" | base64 --decode ; echo
                    
                    
                    Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
                    
                    1. Get your 'admin' user password by running:
                    
                       kubectl get secret --namespace monitoring kube-prometheus-stack-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
                    
                    
                    2. The Grafana server can be accessed via port 80 on the following DNS name from within your cluster:
                    
                       kube-prometheus-stack-grafana.monitoring.svc.cluster.local
                    
                       Get the Grafana URL to visit by running these commands in the same shell:
                         export POD_NAME=$(kubectl get pods --namespace monitoring -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=kube-prometheus-stack" -o jsonpath="{.items[0].metadata.name}")
                         kubectl --namespace monitoring port-forward $POD_NAME 3000
                    
                    3. Login with the password from step 1 and the username: admin
                    
                    1. Get the application URL by running these commands:
                      export POD_NAME=$(kubectl get pods --namespace monitoring -l "app.kubernetes.io/name=prometheus-node-exporter,app.kubernetes.io/instance=kube-prometheus-stack" -o jsonpath="{.items[0].metadata.name}")
                      echo "Visit http://127.0.0.1:9100 to use your application"
                      kubectl port-forward --namespace monitoring $POD_NAME 9100
                    kube-state-metrics is a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects.
                    The exposed metrics can be found here:
                    https://github.com/kubernetes/kube-state-metrics/blob/master/docs/README.md#exposed-metrics
                    
                    The metrics are exported on the HTTP endpoint /metrics on the listening port.
                    In your case, kube-prometheus-stack-kube-state-metrics.monitoring.svc.cluster.local:8080/metrics
                    
                    They are served either as plaintext or protobuf depending on the Accept header.
                    They are designed to be consumed either by Prometheus itself or by a scraper that is compatible with scraping a Prometheus client endpoint.
                EOT
              - revision       = 12
              - values         = jsonencode(
                    {
                      - additionalPrometheusRules = [
                          - {
                              - groups = [
                                  - {
                                      - name  = "pod-health"
                                      - rules = [
                                          - {
                                              - alert       = "PodRestartStorm"
                                              - annotations = {
                                                  - description = "Pod {{ $labels.namespace }}/{{ $labels.pod }} has restarted {{ $value }} times in the last 15 minutes."
                                                  - summary     = "Pod {{ $labels.namespace }}/{{ $labels.pod }} restarting frequently"
                                                }
                                              - expr        = "increase(kube_pod_container_status_restarts_total[15m]) > 3"
                                              - for         = "0m"
                                              - labels      = {
                                                  - severity = "warning"
                                                }
                                            },
                                          - {
                                              - alert       = "OOMKilled"
                                              - annotations = {
                                                  - description = "Container {{ $labels.container }} in pod {{ $labels.namespace }}/{{ $labels.pod }} was OOMKilled."
                                                  - summary     = "Pod {{ $labels.namespace }}/{{ $labels.pod }} OOMKilled"
                                                }
                                              - expr        = "kube_pod_container_status_last_terminated_reason{reason=\"OOMKilled\"} > 0"
                                              - for         = "0m"
                                              - labels      = {
                                                  - severity = "critical"
                                                }
                                            },
                                        ]
                                    },
                                  - {
                                      - name  = "node-health"
                                      - rules = [
                                          - {
                                              - alert       = "DiskPressure"
                                              - annotations = {
                                                  - description = "Filesystem {{ $labels.mountpoint }} on {{ $labels.instance }} has only {{ $value | printf \"%.1f\" }}% space remaining."
                                                  - summary     = "Disk pressure on {{ $labels.instance }}"
                                                }
                                              - expr        = "(node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 < 15"
                                              - for         = "5m"
                                              - labels      = {
                                                  - severity = "critical"
                                                }
                                            },
                                        ]
                                    },
                                  - {
                                      - name  = "target-health"
                                      - rules = [
                                          - {
                                              - alert       = "TargetDown"
                                              - annotations = {
                                                  - description = "Target {{ $labels.job }}/{{ $labels.instance }} has been down for more than 5 minutes."
                                                  - summary     = "Target {{ $labels.instance }} is down"
                                                }
                                              - expr        = "up == 0"
                                              - for         = "5m"
                                              - labels      = {
                                                  - severity = "warning"
                                                }
                                            },
                                        ]
                                    },
                                ]
                              - name   = "platform-alerts"
                            },
                        ]
                      - alertmanager              = {
                          - alertmanagerSpec = {
                              - resources = {
                                  - limits   = {
                                      - memory = "128Mi"
                                    }
                                  - requests = {
                                      - cpu    = "10m"
                                      - memory = "64Mi"
                                    }
                                }
                              - storage   = {
                                  - volumeClaimTemplate = {
                                      - spec = {
                                          - accessModes      = [
                                              - "ReadWriteOnce",
                                            ]
                                          - resources        = {
                                              - requests = {
                                                  - storage = "1Gi"
                                                }
                                            }
                                          - storageClassName = "local-path"
                                        }
                                    }
                                }
                            }
                          - config           = {
                              - global    = {
                                  - resolve_timeout = "5m"
                                }
                              - receivers = [
                                  - {
                                      - name = "default"
                                    },
                                  - {
                                      - name             = "telegram"
                                      - telegram_configs = [
                                          - {
                                              - bot_token     = "8256326037:AAEZ-LlhhkyaDs8TtWhGqm9dUzYj_7hkpiE"
                                              - chat_id       = "-5200965094"
                                              - parse_mode    = "HTML"
                                              - send_resolved = true
                                            },
                                        ]
                                    },
                                ]
                              - route     = {
                                  - group_by        = [
                                      - "alertname",
                                      - "namespace",
                                    ]
                                  - group_interval  = "5m"
                                  - group_wait      = "30s"
                                  - receiver        = "telegram"
                                  - repeat_interval = "12h"
                                  - routes          = []
                                }
                            }
                        }
                      - grafana                   = {
                          - adminPassword = "(sensitive value)"
                          - persistence   = {
                              - enabled          = true
                              - size             = "2Gi"
                              - storageClassName = "local-path"
                            }
                          - resources     = {
                              - limits   = {
                                  - memory = "256Mi"
                                }
                              - requests = {
                                  - cpu    = "50m"
                                  - memory = "128Mi"
                                }
                            }
                          - sidecar       = {
                              - dashboards  = {
                                  - enabled         = true
                                  - searchNamespace = "ALL"
                                }
                              - datasources = {
                                  - enabled         = true
                                  - searchNamespace = "ALL"
                                }
                            }
                        }
                      - kube-state-metrics        = {
                          - resources = {
                              - limits   = {
                                  - memory = "128Mi"
                                }
                              - requests = {
                                  - cpu    = "10m"
                                  - memory = "32Mi"
                                }
                            }
                        }
                      - kubeControllerManager     = {
                          - enabled = false
                        }
                      - kubeEtcd                  = {
                          - enabled = false
                        }
                      - kubeProxy                 = {
                          - enabled = false
                        }
                      - kubeScheduler             = {
                          - enabled = false
                        }
                      - nodeExporter              = {
                          - resources = {
                              - limits   = {
                                  - memory = "64Mi"
                                }
                              - requests = {
                                  - cpu    = "20m"
                                  - memory = "32Mi"
                                }
                            }
                        }
                      - prometheus                = {
                          - prometheusSpec = {
                              - podMonitorSelectorNilUsesHelmValues     = false
                              - resources                               = {
                                  - limits   = {
                                      - memory = "1Gi"
                                    }
                                  - requests = {
                                      - cpu    = "200m"
                                      - memory = "512Mi"
                                    }
                                }
                              - retention                               = "15d"
                              - retentionSize                           = "10GB"
                              - ruleSelectorNilUsesHelmValues           = false
                              - serviceMonitorSelectorNilUsesHelmValues = false
                              - storageSpec                             = {
                                  - volumeClaimTemplate = {
                                      - spec = {
                                          - accessModes      = [
                                              - "ReadWriteOnce",
                                            ]
                                          - resources        = {
                                              - requests = {
                                                  - storage = "15Gi"
                                                }
                                            }
                                          - storageClassName = "local-path"
                                        }
                                    }
                                }
                            }
                        }
                    }
                )
              - version        = "82.0.0"
            },
        ] -> (known after apply)
        name                       = "kube-prometheus-stack"
        # (28 unchanged attributes hidden)

      - set_sensitive {
          # At least one attribute in this block is (or was) sensitive,
          # so its contents will not be displayed.
        }
      + set_sensitive {
          # At least one attribute in this block is (or was) sensitive,
          # so its contents will not be displayed.
        }

        # (2 unchanged blocks hidden)
    }

  # kubernetes_deployment_v1.dora_exporter will be updated in-place
  ~ resource "kubernetes_deployment_v1" "dora_exporter" {
        id               = "monitoring/dora-exporter"
        # (1 unchanged attribute hidden)

      ~ spec {
            # (5 unchanged attributes hidden)

          ~ template {
              ~ spec {
                    # (12 unchanged attributes hidden)

                  ~ container {
                        name                       = "dora-exporter"
                        # (9 unchanged attributes hidden)

                      ~ resources {
                          ~ limits   = {
                              ~ "memory" = "128Mi" -> "256Mi"
                            }
                            # (1 unchanged attribute hidden)
                        }

                        # (4 unchanged blocks hidden)
                    }
                }

                # (1 unchanged block hidden)
            }

            # (2 unchanged blocks hidden)
        }

        # (1 unchanged block hidden)
    }

Plan: 0 to add, 2 to change, 0 to destroy.

─────────────────────────────────────────────────────────────────────────────

Note: You didn't use the -out option to save this plan, so OpenTofu can't
guarantee to take exactly these actions if you run "tofu apply" now.
"50m" - memory = "128Mi" } } - sidecar = { - dashboards = { - enabled = true - searchNamespace = "ALL" } - datasources = { - enabled = true - searchNamespace = "ALL" } } } - kube-state-metrics = { - resources = { - limits = { - memory = "128Mi" } - requests = { - cpu = "10m" - memory = "32Mi" } } } - kubeControllerManager = { - enabled = false } - kubeEtcd = { - enabled = false } - kubeProxy = { - enabled = false } - kubeScheduler = { - enabled = false } - nodeExporter = { - resources = { - limits = { - memory = "64Mi" } - requests = { - cpu = "20m" - memory = "32Mi" } } } - prometheus = { - prometheusSpec = { - podMonitorSelectorNilUsesHelmValues = false - resources = { - limits = { - memory = "1Gi" } - requests = { - cpu = "200m" - memory = "512Mi" } } - retention = "15d" - retentionSize = "10GB" - ruleSelectorNilUsesHelmValues = false - serviceMonitorSelectorNilUsesHelmValues = false - storageSpec = { - volumeClaimTemplate = { - spec = { - accessModes = [ - "ReadWriteOnce", ] - resources = { - requests = { - storage = "15Gi" } } - storageClassName = "local-path" } } } } } } ) - version = "82.0.0" }, ] -> (known after apply) name = "kube-prometheus-stack" # (28 unchanged attributes hidden) - set_sensitive { # At least one attribute in this block is (or was) sensitive, # so its contents will not be displayed. } + set_sensitive { # At least one attribute in this block is (or was) sensitive, # so its contents will not be displayed. 
} # (2 unchanged blocks hidden) } # kubernetes_deployment_v1.dora_exporter will be updated in-place ~ resource "kubernetes_deployment_v1" "dora_exporter" { id = "monitoring/dora-exporter" # (1 unchanged attribute hidden) ~ spec { # (5 unchanged attributes hidden) ~ template { ~ spec { # (12 unchanged attributes hidden) ~ container { name = "dora-exporter" # (9 unchanged attributes hidden) ~ resources { ~ limits = { ~ "memory" = "128Mi" -> "256Mi" } # (1 unchanged attribute hidden) } # (4 unchanged blocks hidden) } } # (1 unchanged block hidden) } # (2 unchanged blocks hidden) } # (1 unchanged block hidden) } Plan: 0 to add, 2 to change, 0 to destroy. ───────────────────────────────────────────────────────────────────────────── Note: You didn't use the -out option to save this plan, so OpenTofu can't guarantee to take exactly these actions if you run "tofu apply" now. ```
## PR #54 Review

### BLOCKERS

None.

### NITS

None.

### SOP COMPLIANCE

- [x] Branch named after issue (`53-fix-telegram-chat-id-type-and-dora-expor` references #53)
- [x] PR body follows template (Summary, Changes, Test Plan, Related all present)
- [x] Related references plan slug (`plan-pal-e-platform`)
- [x] `Closes #53` present in PR body
- [x] `tofu fmt` and `tofu validate` reported as passing
- [x] No secrets committed
- [x] No scope creep (exactly two line changes in one file, both matching the issue description)

### CODE REVIEW

1. **`telegram_chat_id` type change** (line 317): `type = "string"` to `type = "auto"` -- correct. The Telegram API requires `chat_id` to be numeric, and the Helm provider's `set_sensitive` with `type = "auto"` passes the value through as its native type instead of coercing it to a string.
2. **`dora-exporter` memory limit** (line 1030): `128Mi` to `256Mi` -- a reasonable doubling to address the OOMKill. The request stays at `32Mi`, so there is no over-reservation: the pod gets burst headroom without claiming extra guaranteed memory.

Both changes are minimal, targeted, and match the issue description exactly.
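
In `main.tf` terms, the two hunks can be sketched roughly as follows. This is an illustration only: the `set_sensitive` attribute path and the variable name are assumptions inferred from the plan output above, not the actual contents of `terraform/main.tf`.

```hcl
# Sketch of terraform/main.tf around line 317. The "name" path and the
# variable name are hypothetical; only the type change is confirmed by the PR.
set_sensitive {
  name  = "alertmanager.config.receivers[1].telegram_configs[0].chat_id"
  value = var.telegram_chat_id
  type  = "auto" # was "string"; "auto" lets the Helm provider keep the value numeric
}

# Sketch of terraform/main.tf around line 1030. The request value is taken
# from the review above; only the limit changed.
resources {
  limits = {
    memory = "256Mi" # was "128Mi"
  }
  requests = {
    memory = "32Mi" # unchanged: burst headroom without raising the guarantee
  }
}
```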

### VERDICT: APPROVED

forgejo_admin deleted branch 53-fix-telegram-chat-id-type-and-dora-expor 2026-03-14 18:30:42 +00:00