feat: Ollama hostPath volume + embedding alerting

forgejo_admin commented

2026-03-16 02:55:11 +00:00

Owner

Summary

Replaces the Ollama PVC with a hostPath volume so models survive Kubernetes lifecycle events such as PVC recreation and namespace deletion. Adds Prometheus scraping and alert rules for the embedding worker so that failures are detected within 10 minutes.
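
For orientation, the Ollama change might look roughly like the sketch below. It is reconstructed from the values shown in the plan output, not copied from terraform/main.tf. The `ollama_chart_repository` variable is hypothetical (the chart repository is not shown in this PR), and the literal namespace string is an assumption.

```hcl
# Hypothetical variable: the chart repository URL is not shown in this PR.
variable "ollama_chart_repository" {
  type        = string
  description = "Helm repository that hosts the ollama chart"
}

resource "helm_release" "ollama" {
  name       = "ollama"
  namespace  = "ollama" # assumption: the real code may reference the namespace resource
  repository = var.ollama_chart_repository
  chart      = "ollama"
  version    = "1.49.0"

  values = [yamlencode({
    ollama = {
      gpu    = { enabled = true, number = 1 }
      models = { pull = ["qwen3-embedding:4b"] }
    }
    # The PVC is switched off; model data lives on the node instead.
    persistentVolume = { enabled = false }
    resources = {
      limits   = { memory = "6Gi" }
      requests = { cpu = "100m", memory = "1Gi" }
    }
    runtimeClassName = "nvidia"
    volumeMounts     = [{ name = "ollama-data", mountPath = "/root/.ollama" }]
    volumes = [{
      name     = "ollama-data"
      hostPath = { path = "/var/lib/ollama", type = "DirectoryOrCreate" }
    }]
  })]
}
```

A hostPath volume is local to the node it lives on, so this keeps the models only as long as the pod is scheduled onto the same node.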

Changes

terraform/main.tf — Ollama helm release: disabled PVC, added hostPath volume (/var/lib/ollama on host -> /root/.ollama in container) via chart's volumes/volumeMounts values
terraform/main.tf — New kubernetes_service_v1.embedding_worker_metrics: ClusterIP Service in pal-e-docs namespace exposing port 8001 for the embedding worker pod
terraform/main.tf — New kubernetes_manifest.embedding_worker_service_monitor: ServiceMonitor in monitoring namespace that scrapes pal-e-docs namespace embedding worker every 30s
terraform/main.tf — New kubernetes_manifest.embedding_alerts: PrometheusRule with two alert rules:
- EmbeddingErrorRate (warning): rate(embedding_errors_total[5m]) > 0 for 5m
- EmbeddingPipelineDown (critical): zero embedding_total increase + errors increasing for 10m
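
The two alert rules above could be declared roughly as follows. This is a sketch rebuilt from the manifest in the plan output, with annotations abridged; the exact layout in terraform/main.tf may differ.

```hcl
resource "kubernetes_manifest" "embedding_alerts" {
  manifest = {
    apiVersion = "monitoring.coreos.com/v1"
    kind       = "PrometheusRule"
    metadata = {
      name      = "embedding-alerts"
      namespace = "monitoring"
      labels = {
        "app.kubernetes.io/part-of" = "kube-prometheus-stack"
        release                     = "kube-prometheus-stack"
      }
    }
    spec = {
      groups = [{
        name = "embedding-health"
        rules = [
          {
            alert = "EmbeddingErrorRate"
            expr  = "rate(embedding_errors_total[5m]) > 0"
            # Key is quoted so it cannot be mistaken for the start of an HCL for expression.
            "for"  = "5m"
            labels = { severity = "warning" }
            annotations = {
              summary = "Embedding worker is producing errors"
            }
          },
          {
            alert  = "EmbeddingPipelineDown"
            expr   = "increase(embedding_total[10m]) == 0 and increase(embedding_errors_total[10m]) > 0"
            "for"  = "10m"
            labels = { severity = "critical" }
            annotations = {
              # Summary annotation omitted here; see the full plan output.
              description = "No successful embeddings in 10 minutes while errors are increasing. Semantic search is degraded."
            }
          },
        ]
      }]
    }
  }
}
```

The critical rule pairs a 10m lookback window with a 10m `for` clause, so it needs the condition to hold across consecutive evaluations before firing; the warning rule is the faster of the two signals.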

tofu plan -lock=false Output

Plan: 3 to add, 2 to change, 0 to destroy.

# helm_release.ollama will be updated in-place
  ~ values = [
        "persistentVolume":
      -   "enabled": true
      -   "size": "10Gi"
      -   "storageClass": "local-path"
      +   "enabled": false
      + "volumeMounts":
      + - "mountPath": "/root/.ollama"
      +   "name": "ollama-data"
      + "volumes":
      + - "hostPath":
      +     "path": "/var/lib/ollama"
      +     "type": "DirectoryOrCreate"
      +   "name": "ollama-data"
    ]

# kubernetes_manifest.embedding_alerts will be created
  + PrometheusRule "embedding-alerts" in monitoring namespace
    - EmbeddingErrorRate (warning): rate(embedding_errors_total[5m]) > 0, for 5m
    - EmbeddingPipelineDown (critical): increase(embedding_total[10m]) == 0
      and increase(embedding_errors_total[10m]) > 0, for 10m

# kubernetes_manifest.embedding_worker_service_monitor will be created
  + ServiceMonitor "embedding-worker" in monitoring namespace
    - Scrapes pal-e-docs namespace, app=pal-e-docs-embedding-worker,
      port "metrics", /metrics, 30s interval

# kubernetes_service_v1.embedding_worker_metrics will be created
  + Service "embedding-worker-metrics" in pal-e-docs namespace
    - ClusterIP, port 8001 -> 8001, selector app=pal-e-docs-embedding-worker

# helm_release.woodpecker will be updated in-place
  (no functional change -- set_sensitive block reordering only)
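
For readers who prefer source over plan output, the Service and ServiceMonitor pair might look roughly like this. It is a sketch inferred from the plan. The Service port name `metrics` and the Service's `app` label are assumptions, derived from the ServiceMonitor's `port` and `selector` fields rather than shown directly.

```hcl
resource "kubernetes_service_v1" "embedding_worker_metrics" {
  metadata {
    name      = "embedding-worker-metrics"
    namespace = "pal-e-docs"
    labels = {
      # Assumption: needed so the ServiceMonitor's matchLabels selects this Service.
      app = "pal-e-docs-embedding-worker"
    }
  }

  spec {
    type = "ClusterIP"
    selector = {
      app = "pal-e-docs-embedding-worker"
    }

    port {
      name        = "metrics" # assumption: matches the ServiceMonitor endpoint port name
      port        = 8001
      target_port = 8001
    }
  }
}

resource "kubernetes_manifest" "embedding_worker_service_monitor" {
  manifest = {
    apiVersion = "monitoring.coreos.com/v1"
    kind       = "ServiceMonitor"
    metadata = {
      name      = "embedding-worker"
      namespace = "monitoring"
      labels    = { app = "pal-e-docs-embedding-worker" }
    }
    spec = {
      # The monitor lives in "monitoring" but selects Services in "pal-e-docs".
      namespaceSelector = { matchNames = ["pal-e-docs"] }
      selector = {
        matchLabels = { app = "pal-e-docs-embedding-worker" }
      }
      endpoints = [{
        port     = "metrics"
        path     = "/metrics"
        interval = "30s"
      }]
    }
  }
}
```

A ServiceMonitor endpoint refers to a Service port by name, which is why the port name matters more than the number here.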

Test Plan

[x] tofu fmt -- passes (no diff)
[x] tofu validate -- passes
[x] tofu plan -lock=false -- shows 3 to add, 2 to change, 0 to destroy
[ ] After apply: kubectl delete pod -n ollama <pod> -> pod restarts -> ollama list shows qwen3-embedding:4b
[ ] After apply: kubectl port-forward -n pal-e-docs svc/embedding-worker-metrics 8001:8001 -> curl localhost:8001/metrics returns Prometheus metrics
[ ] After apply: verify the ServiceMonitor target appears in Prometheus targets and the PrometheusRule alerts appear in Prometheus rules

Review Checklist

[x] Passed automated review-fix loop
[x] No secrets committed
[x] No unnecessary file changes
[x] Commit messages are descriptive

Post-Deploy Steps (manual)

1. Reset error blocks: UPDATE blocks SET embedding_status = 'pending' WHERE embedding_status = 'error' (152 blocks)
2. Verify embedding worker picks up pending blocks

Discovered Scope

The pal-e-docs NetworkPolicy (podSelector: app=pal-e-docs) only covers the main API pod. The embedding worker (app=pal-e-docs-embedding-worker) is not selected by that policy, so all ingress to it is allowed by default. This works for now (Prometheus can reach it), but a dedicated policy should be added in pal-e-deployments to explicitly scope embedding worker ingress.
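
As a starting point for that follow-up, a policy scoping worker ingress to Prometheus scrapes might look like the sketch below. It is written in the `kubernetes_manifest` style of this repo's other `netpol_*` resources, although the real policy would live in pal-e-deployments, whose format is not shown here. The resource and policy names are hypothetical, and it assumes the worker needs no ingress other than scrapes on port 8001.

```hcl
# Hypothetical resource and policy names; not part of this PR.
resource "kubernetes_manifest" "netpol_embedding_worker" {
  manifest = {
    apiVersion = "networking.k8s.io/v1"
    kind       = "NetworkPolicy"
    metadata = {
      name      = "embedding-worker-ingress"
      namespace = "pal-e-docs"
    }
    spec = {
      podSelector = {
        matchLabels = { app = "pal-e-docs-embedding-worker" }
      }
      policyTypes = ["Ingress"]
      ingress = [{
        # Allow only pods from the monitoring namespace, selected via the
        # namespace name label that Kubernetes sets automatically.
        from = [{
          namespaceSelector = {
            matchLabels = { "kubernetes.io/metadata.name" = "monitoring" }
          }
        }]
        ports = [{
          protocol = "TCP"
          port     = 8001
        }]
      }]
    }
  }
}
```

Once a policy selects the worker pod, any ingress not listed is dropped, so other callers of the worker (if any exist) would need their own rules.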

Related

Closes #89
plan-pal-e-docs -- Phase F12

forgejo_admin added 1 commit

2026-03-16 02:55:11 +00:00

feat: swap Ollama PVC for hostPath volume + add embedding alerting

ci/woodpecker/push/woodpecker Pipeline was successful

ci/woodpecker/pr/woodpecker Pipeline was successful

ci/woodpecker/pull_request_closed/woodpecker Pipeline was successful

19a5723b6c

Ollama models were lost during PVC recreation, causing 6+ days of
semantic search downtime. This replaces the PVC with a hostPath
volume (/var/lib/ollama) so models survive any k8s lifecycle event.

Adds Prometheus scraping and alert rules for the embedding worker:
- Service + ServiceMonitor for embedding worker metrics on :8001
- Warning alert: embedding_errors_total rate > 0 for 5 minutes
- Critical alert: zero embeddings + errors increasing for 10 minutes

Closes #89

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

forgejo_admin commented

2026-03-16 02:55:53 +00:00

Author

Owner

Tofu Plan Output

data.kubernetes_namespace_v1.pal_e_docs: Reading...
data.kubernetes_namespace_v1.tofu_state: Reading...
kubernetes_namespace_v1.woodpecker: Refreshing state... [id=woodpecker]
kubernetes_namespace_v1.ollama: Refreshing state... [id=ollama]
kubernetes_namespace_v1.forgejo: Refreshing state... [id=forgejo]
kubernetes_namespace_v1.cnpg_system: Refreshing state... [id=cnpg-system]
kubernetes_namespace_v1.tailscale: Refreshing state... [id=tailscale]
kubernetes_namespace_v1.postgres: Refreshing state... [id=postgres]
kubernetes_namespace_v1.keycloak: Refreshing state... [id=keycloak]
helm_release.nvidia_device_plugin: Refreshing state... [id=nvidia-device-plugin]
data.kubernetes_namespace_v1.pal_e_docs: Read complete after 0s [id=pal-e-docs]
tailscale_acl.this: Refreshing state... [id=acl]
data.kubernetes_namespace_v1.tofu_state: Read complete after 0s [id=tofu-state]
kubernetes_namespace_v1.harbor: Refreshing state... [id=harbor]
kubernetes_namespace_v1.minio: Refreshing state... [id=minio]
kubernetes_namespace_v1.monitoring: Refreshing state... [id=monitoring]
kubernetes_secret_v1.paledocs_db_url: Refreshing state... [id=pal-e-docs/paledocs-db-url]
kubernetes_role_v1.tf_backup: Refreshing state... [id=tofu-state/tf-state-backup]
kubernetes_service_account_v1.tf_backup: Refreshing state... [id=tofu-state/tf-state-backup]
helm_release.forgejo: Refreshing state... [id=forgejo]
kubernetes_secret_v1.woodpecker_db_credentials: Refreshing state... [id=woodpecker/woodpecker-db-credentials]
kubernetes_persistent_volume_claim_v1.keycloak_data: Refreshing state... [id=keycloak/keycloak-data]
helm_release.tailscale_operator: Refreshing state... [id=tailscale-operator]
helm_release.cnpg: Refreshing state... [id=cnpg]
kubernetes_service_v1.keycloak: Refreshing state... [id=keycloak/keycloak]
kubernetes_secret_v1.keycloak_admin: Refreshing state... [id=keycloak/keycloak-admin]
kubernetes_manifest.netpol_ollama: Refreshing state...
kubernetes_manifest.netpol_forgejo: Refreshing state...
kubernetes_manifest.netpol_woodpecker: Refreshing state...
kubernetes_manifest.netpol_cnpg_system: Refreshing state...
kubernetes_manifest.netpol_postgres: Refreshing state...
kubernetes_manifest.netpol_keycloak: Refreshing state...
kubernetes_manifest.netpol_minio: Refreshing state...
kubernetes_service_v1.dora_exporter: Refreshing state... [id=monitoring/dora-exporter]
kubernetes_secret_v1.dora_exporter: Refreshing state... [id=monitoring/dora-exporter]
kubernetes_config_map_v1.uptime_dashboard: Refreshing state... [id=monitoring/uptime-dashboard]
kubernetes_role_binding_v1.tf_backup: Refreshing state... [id=tofu-state/tf-state-backup]
kubernetes_manifest.netpol_harbor: Refreshing state...
helm_release.loki_stack: Refreshing state... [id=loki-stack]
helm_release.kube_prometheus_stack: Refreshing state... [id=kube-prometheus-stack]
kubernetes_manifest.netpol_monitoring: Refreshing state...
helm_release.ollama: Refreshing state... [id=ollama]
kubernetes_deployment_v1.keycloak: Refreshing state... [id=keycloak/keycloak]
kubernetes_ingress_v1.forgejo_funnel: Refreshing state... [id=forgejo/forgejo-funnel]
kubernetes_ingress_v1.keycloak_funnel: Refreshing state... [id=keycloak/keycloak-funnel]
kubernetes_config_map_v1.grafana_loki_datasource: Refreshing state... [id=monitoring/grafana-loki-datasource]
kubernetes_ingress_v1.alertmanager_funnel: Refreshing state... [id=monitoring/alertmanager-funnel]
helm_release.blackbox_exporter: Refreshing state... [id=blackbox-exporter]
kubernetes_config_map_v1.pal_e_docs_dashboard: Refreshing state... [id=monitoring/pal-e-docs-dashboard]
kubernetes_config_map_v1.dora_dashboard: Refreshing state... [id=monitoring/dora-dashboard]
helm_release.minio: Refreshing state... [id=minio]
kubernetes_deployment_v1.dora_exporter: Refreshing state... [id=monitoring/dora-exporter]
kubernetes_ingress_v1.grafana_funnel: Refreshing state... [id=monitoring/grafana-funnel]
helm_release.harbor: Refreshing state... [id=harbor]
kubernetes_manifest.blackbox_alerts: Refreshing state...
kubernetes_manifest.dora_exporter_service_monitor: Refreshing state...
minio_s3_bucket.assets: Refreshing state... [id=assets]
minio_iam_policy.cnpg_wal: Refreshing state... [id=cnpg-wal]
minio_iam_policy.tf_backup: Refreshing state... [id=tf-backup]
kubernetes_ingress_v1.minio_api_funnel: Refreshing state... [id=minio/minio-api-funnel]
minio_s3_bucket.postgres_wal: Refreshing state... [id=postgres-wal]
minio_iam_user.cnpg: Refreshing state... [id=cnpg]
minio_iam_user.tf_backup: Refreshing state... [id=tf-backup]
minio_s3_bucket.tf_state_backups: Refreshing state... [id=tf-state-backups]
kubernetes_ingress_v1.minio_funnel: Refreshing state... [id=minio/minio-funnel]
minio_iam_user_policy_attachment.cnpg: Refreshing state... [id=cnpg-20260302210642491000000001]
minio_iam_user_policy_attachment.tf_backup: Refreshing state... [id=tf-backup-20260314163610110100000001]
kubernetes_secret_v1.cnpg_s3_creds: Refreshing state... [id=postgres/cnpg-s3-creds]
kubernetes_secret_v1.tf_backup_s3_creds: Refreshing state... [id=tofu-state/tf-backup-s3-creds]
kubernetes_secret_v1.woodpecker_cnpg_s3_creds: Refreshing state... [id=woodpecker/cnpg-s3-creds]
kubernetes_cron_job_v1.tf_state_backup: Refreshing state... [id=tofu-state/tf-state-backup]
kubernetes_cron_job_v1.cnpg_backup_verify: Refreshing state... [id=postgres/cnpg-backup-verify]
kubernetes_manifest.woodpecker_postgres: Refreshing state...
kubernetes_ingress_v1.harbor_funnel: Refreshing state... [id=harbor/harbor-funnel]
helm_release.woodpecker: Refreshing state... [id=woodpecker]
kubernetes_manifest.woodpecker_postgres_scheduled_backup: Refreshing state...
kubernetes_ingress_v1.woodpecker_funnel: Refreshing state... [id=woodpecker/woodpecker-funnel]

OpenTofu used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create
  ~ update in-place

OpenTofu will perform the following actions:

  # helm_release.ollama will be updated in-place
  ~ resource "helm_release" "ollama" {
        id                         = "ollama"
      ~ metadata                   = [
          - {
              - app_version    = "0.17.6"
              - chart          = "ollama"
              - first_deployed = 1773025100
              - last_deployed  = 1773025100
              - name           = "ollama"
              - namespace      = "ollama"
              - notes          = <<-EOT
                    1. Get the application URL by running these commands:
                      export POD_NAME=$(kubectl get pods --namespace ollama -l "app.kubernetes.io/name=ollama,app.kubernetes.io/instance=ollama" -o jsonpath="{.items[0].metadata.name}")
                      export CONTAINER_PORT=$(kubectl get pod --namespace ollama $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
                      echo "Visit http://127.0.0.1:8080 to use your application"
                      kubectl --namespace ollama port-forward $POD_NAME 8080:$CONTAINER_PORT
                EOT
              - revision       = 1
              - values         = jsonencode(
                    {
                      - ollama           = {
                          - gpu    = {
                              - enabled = true
                              - number  = 1
                            }
                          - models = {
                              - pull = [
                                  - "qwen3-embedding:4b",
                                ]
                            }
                        }
                      - persistentVolume = {
                          - enabled      = true
                          - size         = "10Gi"
                          - storageClass = "local-path"
                        }
                      - resources        = {
                          - limits   = {
                              - memory = "6Gi"
                            }
                          - requests = {
                              - cpu    = "100m"
                              - memory = "1Gi"
                            }
                        }
                      - runtimeClassName = "nvidia"
                    }
                )
              - version        = "1.49.0"
            },
        ] -> (known after apply)
        name                       = "ollama"
      ~ values                     = [
          - <<-EOT
                "ollama":
                  "gpu":
                    "enabled": true
                    "number": 1
                  "models":
                    "pull":
                    - "qwen3-embedding:4b"
                "persistentVolume":
                  "enabled": true
                  "size": "10Gi"
                  "storageClass": "local-path"
                "resources":
                  "limits":
                    "memory": "6Gi"
                  "requests":
                    "cpu": "100m"
                    "memory": "1Gi"
                "runtimeClassName": "nvidia"
            EOT,
          + <<-EOT
                "ollama":
                  "gpu":
                    "enabled": true
                    "number": 1
                  "models":
                    "pull":
                    - "qwen3-embedding:4b"
                "persistentVolume":
                  "enabled": false
                "resources":
                  "limits":
                    "memory": "6Gi"
                  "requests":
                    "cpu": "100m"
                    "memory": "1Gi"
                "runtimeClassName": "nvidia"
                "volumeMounts":
                - "mountPath": "/root/.ollama"
                  "name": "ollama-data"
                "volumes":
                - "hostPath":
                    "path": "/var/lib/ollama"
                    "type": "DirectoryOrCreate"
                  "name": "ollama-data"
            EOT,
        ]
        # (26 unchanged attributes hidden)
    }

  # kubernetes_manifest.embedding_alerts will be created
  + resource "kubernetes_manifest" "embedding_alerts" {
      + manifest = {
          + apiVersion = "monitoring.coreos.com/v1"
          + kind       = "PrometheusRule"
          + metadata   = {
              + labels    = {
                  + "app.kubernetes.io/part-of" = "kube-prometheus-stack"
                  + release                     = "kube-prometheus-stack"
                }
              + name      = "embedding-alerts"
              + namespace = "monitoring"
            }
          + spec       = {
              + groups = [
                  + {
                      + name  = "embedding-health"
                      + rules = [
                          + {
                              + alert       = "EmbeddingErrorRate"
                              + annotations = {
                                  + description = "embedding_errors_total has been increasing for 5 minutes. Rate: {{ $value | printf \"%.2f\" }}/s."
                                  + summary     = "Embedding worker is producing errors"
                                }
                              + expr        = "rate(embedding_errors_total[5m]) > 0"
                              + for         = "5m"
                              + labels      = {
                                  + severity = "warning"
                                }
                            },
                          + {
                              + alert       = "EmbeddingPipelineDown"
                              + annotations = {
                                  + description = "No successful embeddings in 10 minutes while errors are increasing. Semantic search is degraded."
                                  + summary     = "Embedding pipeline is down — errors with no successful embeddings"
                                }
                              + expr        = "increase(embedding_total[10m]) == 0 and increase(embedding_errors_total[10m]) > 0"
                              + for         = "10m"
                              + labels      = {
                                  + severity = "critical"
                                }
                            },
                        ]
                    },
                ]
            }
        }
      + object   = {
          + apiVersion = "monitoring.coreos.com/v1"
          + kind       = "PrometheusRule"
          + metadata   = {
              + annotations                = (known after apply)
              + creationTimestamp          = (known after apply)
              + deletionGracePeriodSeconds = (known after apply)
              + deletionTimestamp          = (known after apply)
              + finalizers                 = (known after apply)
              + generateName               = (known after apply)
              + generation                 = (known after apply)
              + labels                     = (known after apply)
              + managedFields              = (known after apply)
              + name                       = "embedding-alerts"
              + namespace                  = "monitoring"
              + ownerReferences            = (known after apply)
              + resourceVersion            = (known after apply)
              + selfLink                   = (known after apply)
              + uid                        = (known after apply)
            }
          + spec       = {
              + groups = [
                  + {
                      + interval                  = (known after apply)
                      + labels                    = (known after apply)
                      + limit                     = (known after apply)
                      + name                      = "embedding-health"
                      + partial_response_strategy = (known after apply)
                      + query_offset              = (known after apply)
                      + rules                     = [
                          + {
                              + alert           = "EmbeddingErrorRate"
                              + annotations     = {
                                  + description = "embedding_errors_total has been increasing for 5 minutes. Rate: {{ $value | printf \"%.2f\" }}/s."
                                  + summary     = "Embedding worker is producing errors"
                                }
                              + expr            = "rate(embedding_errors_total[5m]) > 0"
                              + for             = "5m"
                              + keep_firing_for = (known after apply)
                              + labels          = {
                                  + severity = "warning"
                                }
                              + record          = (known after apply)
                            },
                          + {
                              + alert           = "EmbeddingPipelineDown"
                              + annotations     = {
                                  + description = "No successful embeddings in 10 minutes while errors are increasing. Semantic search is degraded."
                                  + summary     = "Embedding pipeline is down — errors with no successful embeddings"
                                }
                              + expr            = "increase(embedding_total[10m]) == 0 and increase(embedding_errors_total[10m]) > 0"
                              + for             = "10m"
                              + keep_firing_for = (known after apply)
                              + labels          = {
                                  + severity = "critical"
                                }
                              + record          = (known after apply)
                            },
                        ]
                    },
                ]
            }
        }
    }

  # kubernetes_manifest.embedding_worker_service_monitor will be created
  + resource "kubernetes_manifest" "embedding_worker_service_monitor" {
      + manifest = {
          + apiVersion = "monitoring.coreos.com/v1"
          + kind       = "ServiceMonitor"
          + metadata   = {
              + labels    = {
                  + app = "pal-e-docs-embedding-worker"
                }
              + name      = "embedding-worker"
              + namespace = "monitoring"
            }
          + spec       = {
              + endpoints         = [
                  + {
                      + interval = "30s"
                      + path     = "/metrics"
                      + port     = "metrics"
                    },
                ]
              + namespaceSelector = {
                  + matchNames = [
                      + "pal-e-docs",
                    ]
                }
              + selector          = {
                  + matchLabels = {
                      + app = "pal-e-docs-embedding-worker"
                    }
                }
            }
        }
      + object   = {
          + apiVersion = "monitoring.coreos.com/v1"
          + kind       = "ServiceMonitor"
          + metadata   = {
              + annotations                = (known after apply)
              + creationTimestamp          = (known after apply)
              + deletionGracePeriodSeconds = (known after apply)
              + deletionTimestamp          = (known after apply)
              + finalizers                 = (known after apply)
              + generateName               = (known after apply)
              + generation                 = (known after apply)
              + labels                     = (known after apply)
              + managedFields              = (known after apply)
              + name                       = "embedding-worker"
              + namespace                  = "monitoring"
              + ownerReferences            = (known after apply)
              + resourceVersion            = (known after apply)
              + selfLink                   = (known after apply)
              + uid                        = (known after apply)
            }
          + spec       = {
              + attachMetadata                 = {
                  + node = (known after apply)
                }
              + bodySizeLimit                  = (known after apply)
              + convertClassicHistogramsToNHCB = (known after apply)
              + endpoints                      = [
                  + {
                      + authorization            = {
                          + credentials = {
                              + key      = (known after apply)
                              + name     = (known after apply)
                              + optional = (known after apply)
                            }
                          + type        = (known after apply)
                        }
                      + basicAuth                = {
                          + password = {
                              + key      = (known after apply)
                              + name     = (known after apply)
                              + optional = (known after apply)
                            }
                          + username = {
                              + key      = (known after apply)
                              + name     = (known after apply)
                              + optional = (known after apply)
                            }
                        }
                      + bearerTokenFile          = (known after apply)
                      + bearerTokenSecret        = {
                          + key      = (known after apply)
                          + name     = (known after apply)
                          + optional = (known after apply)
                        }
                      + enableHttp2              = (known after apply)
                      + filterRunning            = (known after apply)
                      + followRedirects          = (known after apply)
                      + honorLabels              = (known after apply)
                      + honorTimestamps          = (known after apply)
                      + interval                 = "30s"
                      + metricRelabelings        = (known after apply)
                      + noProxy                  = (known after apply)
                      + oauth2                   = {
                          + clientId             = {
                              + configMap = {
                                  + key      = (known after apply)
                                  + name     = (known after apply)
                                  + optional = (known after apply)
                                }
                              + secret    = {
                                  + key      = (known after apply)
                                  + name     = (known after apply)
                                  + optional = (known after apply)
                                }
                            }
                          + clientSecret         = {
                              + key      = (known after apply)
                              + name     = (known after apply)
                              + optional = (known after apply)
                            }
                          + endpointParams       = (known after apply)
                          + noProxy              = (known after apply)
                          + proxyConnectHeader   = (known after apply)
                          + proxyFromEnvironment = (known after apply)
                          + proxyUrl             = (known after apply)
                          + scopes               = (known after apply)
                          + tlsConfig            = {
                              + ca                 = {
                                  + configMap = {
                                      + key      = (known after apply)
                                      + name     = (known after apply)
                                      + optional = (known after apply)
                                    }
                                  + secret    = {
                                      + key      = (known after apply)
                                      + name     = (known after apply)
                                      + optional = (known after apply)
                                    }
                                }
                              + cert               = {
                                  + configMap = {
                                      + key      = (known after apply)
                                      + name     = (known after apply)
                                      + optional = (known after apply)
                                    }
                                  + secret    = {
                                      + key      = (known after apply)
                                      + name     = (known after apply)
                                      + optional = (known after apply)
                                    }
                                }
                              + insecureSkipVerify = (known after apply)
                              + keySecret          = {
                                  + key      = (known after apply)
                                  + name     = (known after apply)
                                  + optional = (known after apply)
                                }
                              + maxVersion         = (known after apply)
                              + minVersion         = (known after apply)
                              + serverName         = (known after apply)
                            }
                          + tokenUrl             = (known after apply)
                        }
                      + params                   = (known after apply)
                      + path                     = "/metrics"
                      + port                     = "metrics"
                      + proxyConnectHeader       = (known after apply)
                      + proxyFromEnvironment     = (known after apply)
                      + proxyUrl                 = (known after apply)
                      + relabelings              = (known after apply)
                      + scheme                   = (known after apply)
                      + scrapeTimeout            = (known after apply)
                      + targetPort               = (known after apply)
                      + tlsConfig                = {
                          + ca                 = {
                              + configMap = {
                                  + key      = (known after apply)
                                  + name     = (known after apply)
                                  + optional = (known after apply)
                                }
                              + secret    = {
                                  + key      = (known after apply)
                                  + name     = (known after apply)
                                  + optional = (known after apply)
                                }
                            }
                          + caFile             = (known after apply)
                          + cert               = {
                              + configMap = {
                                  + key      = (known after apply)
                                  + name     = (known after apply)
                                  + optional = (known after apply)
                                }
                              + secret    = {
                                  + key      = (known after apply)
                                  + name     = (known after apply)
                                  + optional = (known after apply)
                                }
                            }
                          + certFile           = (known after apply)
                          + insecureSkipVerify = (known after apply)
                          + keyFile            = (known after apply)
                          + keySecret          = {
                              + key      = (known after apply)
                              + name     = (known after apply)
                              + optional = (known after apply)
                            }
                          + maxVersion         = (known after apply)
                          + minVersion         = (known after apply)
                          + serverName         = (known after apply)
                        }
                      + trackTimestampsStaleness = (known after apply)
                    },
                ]
              + fallbackScrapeProtocol         = (known after apply)
              + jobLabel                       = (known after apply)
              + keepDroppedTargets             = (known after apply)
              + labelLimit                     = (known after apply)
              + labelNameLengthLimit           = (known after apply)
              + labelValueLengthLimit          = (known after apply)
              + namespaceSelector              = {
                  + any        = (known after apply)
                  + matchNames = [
                      + "pal-e-docs",
                    ]
                }
              + nativeHistogramBucketLimit     = (known after apply)
              + nativeHistogramMinBucketFactor = (known after apply)
              + podTargetLabels                = (known after apply)
              + sampleLimit                    = (known after apply)
              + scrapeClass                    = (known after apply)
              + scrapeClassicHistograms        = (known after apply)
              + scrapeNativeHistograms         = (known after apply)
              + scrapeProtocols                = (known after apply)
              + selector                       = {
                  + matchExpressions = (known after apply)
                  + matchLabels      = {
                      + app = "pal-e-docs-embedding-worker"
                    }
                }
              + selectorMechanism              = (known after apply)
              + serviceDiscoveryRole           = (known after apply)
              + targetLabels                   = (known after apply)
              + targetLimit                    = (known after apply)
            }
        }
    }

  # kubernetes_service_v1.embedding_worker_metrics will be created
  + resource "kubernetes_service_v1" "embedding_worker_metrics" {
      + id                     = (known after apply)
      + status                 = (known after apply)
      + wait_for_load_balancer = true

      + metadata {
          + generation       = (known after apply)
          + labels           = {
              + "app" = "pal-e-docs-embedding-worker"
            }
          + name             = "embedding-worker-metrics"
          + namespace        = "pal-e-docs"
          + resource_version = (known after apply)
          + uid              = (known after apply)
        }

      + spec {
          + allocate_load_balancer_node_ports = true
          + cluster_ip                        = (known after apply)
          + cluster_ips                       = (known after apply)
          + external_traffic_policy           = (known after apply)
          + health_check_node_port            = (known after apply)
          + internal_traffic_policy           = (known after apply)
          + ip_families                       = (known after apply)
          + ip_family_policy                  = (known after apply)
          + publish_not_ready_addresses       = false
          + selector                          = {
              + "app" = "pal-e-docs-embedding-worker"
            }
          + session_affinity                  = "None"
          + type                              = "ClusterIP"

          + port {
              + name        = "metrics"
              + node_port   = (known after apply)
              + port        = 8001
              + protocol    = "TCP"
              + target_port = "8001"
            }

          + session_affinity_config (known after apply)
        }
    }

Plan: 3 to add, 1 to change, 0 to destroy.

─────────────────────────────────────────────────────────────────────────────

Note: You didn't use the -out option to save this plan, so OpenTofu can't
guarantee to take exactly these actions if you run "tofu apply" now.


forgejo_admin commented

2026-03-16 02:55:59 +00:00

Author

Owner

Self-Review

Acceptance criteria vs implementation:

| Criteria | Status |
|----------|--------|
| Ollama hostPath volume (`/var/lib/ollama`), PVC removed | Done |
| `tofu plan -lock=false` shows expected changes | Done (3 add, 2 change) |
| Prometheus scrapes embedding worker on `:8001` | Done (Service + ServiceMonitor) |
| Alert: `rate(embedding_errors_total[5m]) > 0` warning | Done |
| Alert: zero embeddings + errors for 10m critical | Done |
| `tofu fmt` passes | Done |
| `tofu validate` passes | Done |

Design decisions:

1. hostPath via helm chart values -- Used the chart's `volumes`/`volumeMounts` values rather than raw k8s resources. This keeps the Ollama deployment as a single helm release. `DirectoryOrCreate` type ensures the directory is created on first deploy.
2. Service in pal-e-docs namespace -- The embedding worker has no existing k8s Service (only the main API pod does). Created `embedding-worker-metrics` Service to enable ServiceMonitor discovery. The Service lives in `pal-e-docs` namespace alongside the worker pod.
3. ServiceMonitor in monitoring namespace -- Follows the existing pattern (see `dora_exporter_service_monitor`). Uses `namespaceSelector.matchNames` to reach across to `pal-e-docs` namespace.
4. PrometheusRule labels -- Includes `app.kubernetes.io/part-of: kube-prometheus-stack` and `release: kube-prometheus-stack` to match the existing pattern (see `blackbox_alerts`).
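
For reference, the first design decision could look roughly like the following in `terraform/main.tf`. This is a minimal sketch reconstructed from the plan output, not the actual source: the chart repository is omitted, the GPU/model/resource values are left out, and the use of `yamlencode` is an assumption about how the values are built.

```hcl
# Sketch only: reconstructed from the plan output above, not copied from main.tf.
resource "helm_release" "ollama" {
  name      = "ollama"
  namespace = "ollama"
  chart     = "ollama"
  version   = "1.49.0"
  # The chart repository is not shown in the plan, so it is omitted here.

  values = [yamlencode({
    # Turn off the chart-managed PVC; model storage comes from the host instead.
    persistentVolume = {
      enabled = false
    }

    # Host directory that outlives PVC recreation and namespace deletion.
    volumes = [{
      name = "ollama-data"
      hostPath = {
        path = "/var/lib/ollama"
        type = "DirectoryOrCreate" # created on first deploy if missing
      }
    }]

    # Mount it where Ollama keeps its models inside the container.
    volumeMounts = [{
      name      = "ollama-data"
      mountPath = "/root/.ollama"
    }]
  })]
}
```

Keeping the volume in the chart values means the release stays the single owner of the pod spec, which is the point of the first design decision.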

No issues found. Ready for human review.


forgejo_admin commented

2026-03-16 02:58:00 +00:00

Author

Owner

PR #90 Review

DOMAIN REVIEW

Tech stack: Terraform (OpenTofu) / Kubernetes / Helm / Prometheus Operator CRDs

Ollama hostPath migration:

The PVC-to-hostPath swap is well-reasoned. The safety argument is sound: gpu.enabled = true + gpu.number = 1 in the Ollama chart values translates to an nvidia.com/gpu: 1 resource request, so the pod can only be scheduled onto a node that advertises that resource, which keeps it on the GPU node where the host directory lives. DirectoryOrCreate is the correct hostPath.type -- it avoids requiring manual pre-provisioning. The host path /var/lib/ollama follows FHS conventions. The comment block at lines 1596-1602 clearly documents the rationale and the safety invariant.

One consideration: hostPath volumes are writable by the container as root, which is inherent to this pattern and acceptable here since Ollama needs write access to store models.

Embedding Worker Metrics Service (lines 1662-1682):

Clean. Uses data.kubernetes_namespace_v1.pal_e_docs for the namespace reference (consistent with being a data source -- the namespace is managed by pal-e-services, not this repo). Selector label app = "pal-e-docs-embedding-worker" must match the actual pod labels in the pal-e-docs deployment. Port 8001 for metrics is a reasonable non-conflicting port.
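
A rough sketch of the Service described here, based on the attributes visible in the plan output. The exact namespace expression (`metadata[0].name` on the data source) is an assumption about how the reference is written in the file.

```hcl
# Sketch only: attribute values come from the plan; the namespace expression is assumed.
resource "kubernetes_service_v1" "embedding_worker_metrics" {
  metadata {
    name = "embedding-worker-metrics"
    # Namespace is read from a data source because it is managed elsewhere.
    namespace = data.kubernetes_namespace_v1.pal_e_docs.metadata[0].name
    labels = {
      app = "pal-e-docs-embedding-worker" # matched by the ServiceMonitor selector
    }
  }

  spec {
    type = "ClusterIP"

    # Must match the labels on the embedding worker pod.
    selector = {
      app = "pal-e-docs-embedding-worker"
    }

    port {
      name        = "metrics" # referenced by name from the ServiceMonitor endpoint
      port        = 8001
      target_port = "8001"
      protocol    = "TCP"
    }
  }
}
```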

ServiceMonitor (lines 1684-1719):

Correctly placed in the monitoring namespace with a namespaceSelector.matchNames pointing to pal-e-docs. This is the right pattern for cross-namespace scraping (contrasted with the DORA exporter ServiceMonitor at line 1240, which omits namespaceSelector because both resources are in monitoring). depends_on = [helm_release.kube_prometheus_stack] ensures the CRD exists before the manifest is applied. 30-second scrape interval is appropriate for an operational metric.
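
The cross-namespace pattern described above can be sketched as follows. Field values are taken from the plan output and the `depends_on` from this review; the layout is illustrative rather than a copy of the source.

```hcl
# Sketch only: values from the plan output, layout illustrative.
resource "kubernetes_manifest" "embedding_worker_service_monitor" {
  manifest = {
    apiVersion = "monitoring.coreos.com/v1"
    kind       = "ServiceMonitor"
    metadata = {
      name      = "embedding-worker"
      namespace = "monitoring" # lives next to Prometheus
      labels = {
        app = "pal-e-docs-embedding-worker"
      }
    }
    spec = {
      # Reach across to the namespace where the Service actually lives.
      namespaceSelector = {
        matchNames = ["pal-e-docs"]
      }
      # Select the metrics Service by label.
      selector = {
        matchLabels = {
          app = "pal-e-docs-embedding-worker"
        }
      }
      endpoints = [{
        port     = "metrics" # named port on the Service
        path     = "/metrics"
        interval = "30s"
      }]
    }
  }

  # The ServiceMonitor CRD is installed by the kube-prometheus-stack release.
  depends_on = [helm_release.kube_prometheus_stack]
}
```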

PrometheusRule (lines 1726-1774):

Labels (app.kubernetes.io/part-of = "kube-prometheus-stack" and release = "kube-prometheus-stack") match the existing blackbox_alerts resource at lines 488-490 -- consistent pattern for Prometheus rule discovery.

Alert logic review:

EmbeddingErrorRate: rate(embedding_errors_total[5m]) > 0 for 5m. Fires on any sustained error rate. This is correct for a warning-level alert -- even a low error rate in an embedding pipeline should be investigated.
EmbeddingPipelineDown: increase(embedding_total[10m]) == 0 and increase(embedding_errors_total[10m]) > 0 for 10m. This correctly identifies a stuck pipeline (errors happening but no successes). The expression only becomes true once a full 10-minute window contains no successes, and for: 10m then requires it to stay true for another 10 minutes, so the alert fires roughly 20 minutes after the last successful embedding. This avoids false positives during transient hiccups. Good severity escalation (critical vs warning).

Terraform style:

Follows existing patterns in the file (resource naming, depends_on, namespace references, comment blocks)
No tofu fmt issues (confirmed in PR body)
tofu validate passes (confirmed in PR body)
tofu plan output is included per repo convention

BLOCKERS

None.

No secrets or credentials in the diff.
No unvalidated user input (infrastructure-only change).
No DRY violations -- the PrometheusRule and ServiceMonitor follow the same pattern as existing resources but are for a distinct service (no duplicated auth/security logic).
This is an infrastructure change, not application code -- "test coverage" applies as post-deploy validation, which is documented in the Test Plan (port-forward metrics check, model persistence check, Prometheus targets check).

NITS

EmbeddingPipelineDown exact-zero comparison: increase(embedding_total[10m]) == 0 compares a float against exactly zero. In Prometheus, increase() extrapolates to the edges of the range window, so it can return fractional values even though the counter itself only moves in whole numbers. Consider increase(embedding_total[10m]) < 1 instead. This is unlikely to cause issues in practice (an unchanged counter should still yield exactly 0, and resets are rare), but it is the more defensive pattern.
NetworkPolicy gap (acknowledged): The Discovered Scope section correctly notes that the embedding worker pod lacks a NetworkPolicy. This is properly deferred -- just confirming it is tracked.
Woodpecker set_sensitive reorder: The tofu plan output notes a no-op change to helm_release.woodpecker from set_sensitive block reordering. This is harmless (Terraform internal ordering) but worth confirming it produces no actual diff on apply.
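To see how the two comparisons in the first nit behave, here is a rough sketch of counter-increase arithmetic with reset handling. It is a simplified model under stated assumptions: raw_increase mirrors only the reset correction (a drop in value is treated as a restart from zero), and the 1.25 factor is a made-up stand-in for window extrapolation, not a value Prometheus would necessarily produce.

```python
from typing import Sequence


def raw_increase(samples: Sequence[float]) -> float:
    """Sum the per-step increases, treating any decrease as a counter reset."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so the whole
        # current value counts as new increase.
        total += (cur - prev) if cur >= prev else cur
    return total


def is_zero_exact(value: float) -> bool:
    """The comparison as written in the rule: == 0."""
    return value == 0


def is_zero_defensive(value: float) -> bool:
    """The suggested alternative: < 1."""
    return value < 1


idle = raw_increase([7, 7, 7])             # no new embeddings
reset_idle = raw_increase([7, 0, 0])       # process restarted, still no new work
one_success = raw_increase([7, 8]) * 1.25  # hypothetical extrapolation factor

for value in (idle, reset_idle, one_success):
    print(value, is_zero_exact(value), is_zero_defensive(value))
```

In this model both comparisons agree on all three cases, which is consistent with the nit being a defensive preference rather than a correctness fix.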

SOP COMPLIANCE

Branch named after issue: 89-ollama-hostpath-embedding-alerting references issue #89
PR body follows template: Summary, Changes, Test Plan, Related all present
Related references plan slug: plan-pal-e-docs -- Phase F12
tofu plan -lock=false output included (per CLAUDE.md convention)
tofu fmt and tofu validate confirmed passing
No secrets committed
No unnecessary file changes (single file, all changes scoped to issue)
Commit messages are descriptive
Discovered scope documented and deferred (NetworkPolicy gap)
Post-deploy steps documented (error block reset)

PROCESS OBSERVATIONS

Deployment frequency: Single focused PR, clean scope. No risk of deployment bottleneck.
Change failure risk: Low. The hostPath change replaces an existing storage mechanism with a more durable one. The alerting additions are purely additive (new ServiceMonitor + PrometheusRule). The tofu plan shows 3 add / 2 change / 0 destroy -- no destructive operations.
Mean time to recovery: The embedding alerts directly improve MTTR for the embedding pipeline. The two-tier alert design (warning once errors persist for 5m, critical once the stuck-pipeline condition has held for 10m) gives operators time to respond before impact escalates to users.
Test plan quality: Pre-deploy checks (fmt, validate, plan) are done. Post-deploy checks are documented but unchecked -- these need manual verification after apply.
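For the unchecked post-deploy metrics check, one possible sketch is a small parser that confirms both counters used by the alert rules appear in the worker's metrics output. The metric names come from the alert expressions; the sample text, its label and values, and the function names are hypothetical.

```python
from typing import Set

# Counters referenced by the EmbeddingErrorRate and EmbeddingPipelineDown rules.
REQUIRED = {"embedding_total", "embedding_errors_total"}


def metric_names(exposition_text: str) -> Set[str]:
    """Collect metric names from Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The name ends at the first '{' (labels) or at whitespace (value).
        parts = line.split("{", 1)[0].split()
        if parts:
            names.add(parts[0])
    return names


def missing_metrics(exposition_text: str) -> Set[str]:
    """Return the required counters that are absent from the output."""
    return REQUIRED - metric_names(exposition_text)


# Hypothetical sample of what the worker might expose.
SAMPLE = """\
# HELP embedding_total Embeddings completed.
# TYPE embedding_total counter
embedding_total{model="example"} 42
# TYPE embedding_errors_total counter
embedding_errors_total 0
"""

print(missing_metrics(SAMPLE))
```

In practice the text would come from the port-forwarded worker on port 8001; the exact URL path is an assumption and should be taken from the Test Plan.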

VERDICT: APPROVED
