fix: switch 4 blackbox probes to internal service URLs #178
Reference
forgejo_admin/pal-e-platform!178
Summary
Blackbox exporter pods inside the cluster get TLS handshake failures (unexpected eof while reading) when probing external Tailscale funnel URLs due to hairpin routing. This switches all 4 affected probes to internal k8s service URLs, eliminating the TLS hairpin path entirely. Same fix pattern as Keycloak probe in #117 / commit 4213fde.
Changes
terraform/main.tf -- switched 4 blackbox probe targets from external funnel URLs to internal service URLs:
- pal-e-docs: https://pal-e-docs.tail5b443a.ts.net/healthz -> http://pal-e-docs.pal-e-docs.svc.cluster.local:8000/healthz
- pal-e-app: https://pal-e-app.tail5b443a.ts.net -> http://pal-e-app.pal-e-app.svc.cluster.local:3000
- westside-app: https://westsidekingsandqueens.tail5b443a.ts.net -> http://westside-app.westsidekingsandqueens.svc.cluster.local:3000
- westside-dev: https://westside-dev.tail5b443a.ts.net -> http://westside-dev.westsidekingsandqueens.svc.cluster.local:80
tofu plan Output
Plan: 0 to add, 1 to change (blackbox_exporter), 0 to destroy. Two other unrelated changes in plan (woodpecker pending-upgrade status, harbor netpol argocd ingress) are pre-existing drift, not introduced by this PR.
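The "0 to add, 0 to destroy" claim above can also be checked mechanically rather than by eye. The following is a minimal sketch, assuming the plan text contains the standard summary line that tofu plan prints (without the parenthetical annotation added in this PR description); the function names are hypothetical and not part of this repository.

```python
import re

# Matches the counts in the summary line printed by `tofu plan`, e.g.
# "Plan: 0 to add, 1 to change, 0 to destroy."
PLAN_COUNTS = re.compile(r"(\d+) to add, (\d+) to change, (\d+) to destroy")


def parse_plan_summary(plan_text: str) -> dict:
    """Extract the add/change/destroy counts from plan output."""
    match = PLAN_COUNTS.search(plan_text)
    if match is None:
        raise ValueError("no plan summary line found")
    add, change, destroy = (int(n) for n in match.groups())
    return {"add": add, "change": change, "destroy": destroy}


def is_in_place_only(summary: dict) -> bool:
    """True when the plan creates and destroys nothing."""
    return summary["add"] == 0 and summary["destroy"] == 0


if __name__ == "__main__":
    summary = parse_plan_summary("Plan: 0 to add, 1 to change, 0 to destroy.")
    print(summary, is_in_place_only(summary))
```

A check like this only confirms that nothing is created or destroyed; it does not distinguish the blackbox_exporter change from the pre-existing drift noted above, which still needs a manual read of the plan.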
Test Plan
- tofu fmt clean (no changes)
- tofu plan -lock=false shows only the 4 probe URL changes on helm_release.blackbox_exporter
- probe_success == 1 via PromQL
- EndpointDown alert for pal-e-app clears in AlertManager
Review Checklist
Related
4213fde (Keycloak probe hairpin fix)
Self-Review: APPROVED
Diff: 5 lines changed in 1 file (terraform/main.tf)
Verification
- westsidekingsandqueens namespace, not westside-app/westside-dev
- tofu fmt clean -- no formatting changes needed
- tofu plan -lock=false confirms only helm_release.blackbox_exporter is affected by this change (4 URL swaps)
No issues found. Ready for merge.
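The namespace check above matters because the namespace is part of the internal hostname. As a rough illustration, the four targets from this PR can be rebuilt from their service, namespace, and port. The helper names below are hypothetical; the values are taken from the Changes list.

```python
from urllib.parse import urlsplit


def internal_service_url(service: str, namespace: str, port: int, path: str = "") -> str:
    """Build a cluster-internal URL of the form
    http://<service>.<namespace>.svc.cluster.local:<port><path>."""
    return f"http://{service}.{namespace}.svc.cluster.local:{port}{path}"


# Probe targets as listed in this PR. Both westside apps live in the
# westsidekingsandqueens namespace, not in namespaces named after the apps.
PROBE_TARGETS = {
    "pal-e-docs": internal_service_url("pal-e-docs", "pal-e-docs", 8000, "/healthz"),
    "pal-e-app": internal_service_url("pal-e-app", "pal-e-app", 3000),
    "westside-app": internal_service_url("westside-app", "westsidekingsandqueens", 3000),
    "westside-dev": internal_service_url("westside-dev", "westsidekingsandqueens", 80),
}


def is_internal(url: str) -> bool:
    """True for plain-HTTP URLs whose host ends in .svc.cluster.local."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return parts.scheme == "http" and host.endswith(".svc.cluster.local")


if __name__ == "__main__":
    for name, url in PROBE_TARGETS.items():
        print(name, url, is_internal(url))
```

A guard such as is_internal could sit in a pre-merge check to flag any probe target that still points at a ts.net funnel hostname.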
PR #178 Review
DOMAIN REVIEW
Tech stack: OpenTofu / Helm / k8s (Terraform HCL managing blackbox exporter probe targets via Helm values).
Internal URL correctness -- All 4 URLs verified against pal-e-deployments kustomize overlays and base service definitions:
- pal-e-docs: namespace pal-e-docs
- pal-e-app: namespace pal-e-app (overlays/pal-e-app/prod/kustomization.yaml)
- westside-app: namespace westsidekingsandqueens (overlays/westsidekingsandqueens/prod/kustomization.yaml)
- westside-dev: namespace westsidekingsandqueens (overlays/westsidekingsandqueens/dev/service.yaml)
All 4 internal URLs are correctly formed following the http://<service>.<namespace>.svc.cluster.local:<port> pattern.
Blackbox module compatibility -- The helm chart uses the default http_2xx module for all targets. Switching from https:// external to http:// internal requires no module config change. All existing platform probes (forgejo, woodpecker, grafana, alertmanager, harbor, argocd, keycloak, minio) already use this same http:// internal pattern.
Comment accuracy -- The updated comment ("internal URLs -- avoids TLS hairpin through Tailscale funnel") is now accurate. The previous comment ("external URLs -- validates full funnel path") was already partially wrong since basketball-api and platform-validation were already on internal URLs within the same block.
Precedent -- PR references #117 / commit 4213fde (Keycloak hairpin fix) as the established pattern. Consistent approach.
No state-breaking changes -- tofu plan output confirms 0 add, 1 in-place update (blackbox_exporter helm release), 0 destroy. Safe change.
BLOCKERS
None.
This is a pure configuration change (4 URL string swaps in Helm values). No new functionality requiring tests. No user input. No secrets. No auth logic.
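Although the change needs no unit tests, the Test Plan's post-apply check of probe_success == 1 could be scripted against the Prometheus HTTP query API. The sketch below is illustrative only: the base URL is a placeholder, and it assumes each probe_success sample carries the probed target in its instance label, which depends on how the scrape relabeling is configured.

```python
import json
from urllib.parse import urlencode


def build_query_url(prometheus_base: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{prometheus_base.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def failing_instances(response_body: str) -> list:
    """Return the instance labels whose probe_success value is not 1.

    Expects the JSON body of an instant query for `probe_success`.
    """
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError("Prometheus query did not succeed")
    failing = []
    for sample in payload["data"]["result"]:
        _timestamp, value = sample["value"]  # value arrives as a string
        if float(value) != 1.0:
            failing.append(sample["metric"].get("instance", "<unknown>"))
    return failing


if __name__ == "__main__":
    # Placeholder base URL; substitute the real Prometheus address.
    print(build_query_url("http://prometheus.example:9090", "probe_success"))
```

An empty list from failing_instances corresponds to every probe reporting success; fetching the URL is left to whatever HTTP client is already in use.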
NITS
pal-e-docs healthz path: The probe keeps /healthz in the internal URL (http://pal-e-docs.pal-e-docs.svc.cluster.local:8000/healthz), which is correct -- pal-e-docs exposes a health endpoint there. The other probes hit root or no path, which is fine for static SvelteKit apps. No action needed, just noting the intentional asymmetry.
Minor: funnel-path monitoring gap: By moving all probes to internal URLs, there is no longer any synthetic monitoring of the Tailscale funnel TLS path itself. This is a known and accepted tradeoff (the funnel path was causing false-positive EndpointDown alerts due to hairpin failures). If external endpoint monitoring becomes desired later, a separate probe set with a longer "for" duration could be added. Not blocking -- just noting for future consideration.
SOP COMPLIANCE
- 168-fix-blackbox-hairpin-probes references issue #168
- tofu plan output included (PR convention for Terraform changes)
- tofu fmt confirmed clean (Test Plan checkbox)
- -lock=false used in plan (per memory: feedback_tofu_lock_false.md)
- .env files in diff)
PROCESS OBSERVATIONS
- EndpointDown alerts. When the alert fires now, it means the service is actually down, not that the Tailscale funnel hairpin failed. Signal-to-noise improvement for on-call response.
- tofu apply reverting the URLs.
- svc.cluster.local URLs. Zero external funnel probes remain. Consistent, predictable configuration.
VERDICT: APPROVED
e6519746949eb107cc68