Phase 8a: Platform namespace NetworkPolicies in Terraform #77
No reviewers
Labels
No labels
domain:backend
domain:devops
domain:frontend
status:approved
status:in-progress
status:needs-fix
status:qa
type:bug
type:devops
type:feature
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
forgejo_admin/pal-e-platform!77
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "76-phase-8a-platform-namespace-networkpolic"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
kubernetes_namespace_v1resourcespodSelector = {}(all pods) withpolicyTypes = ["Ingress"]only -- no egress restrictionsChanges
terraform/network-policies.tf(new): 9kubernetes_manifestresources for NetworkPoliciestofu plan Output
The 9 additions are the 9 NetworkPolicy resources. The 2 changes are pre-existing drift (dora_exporter secret write-only attribute, unrelated to this PR).
tofu fmt-- no changes needed (already formatted).tofu validate-- Success.Test Plan
tofu fmt network-policies.tfpasses cleantofu validatereturns Successtofu planshows 9 to add, 0 to destroykubectl get netpol -Ashowsdefault-deny-ingressin all 9 namespacesReview Checklist
Related
plan-pal-e-platformTofu Plan Output
PR #77 Review
DOMAIN REVIEW
Tech stack: Terraform (OpenTofu) / Kubernetes NetworkPolicy / k8s security hardening
This PR adds 9
kubernetes_manifestresources in a newterraform/network-policies.tffile, one default-deny-ingress NetworkPolicy per platform namespace. The approach is sound:podSelector = {}selects all pods,policyTypes = ["Ingress"]only restricts ingress (egress unrestricted), and each policy explicitly whitelists ingress sources vianamespaceSelectorusing the well-knownkubernetes.io/metadata.namelabel.Terraform style: Clean. All resources use
depends_onto reference the parent namespace. Namespace references use thekubernetes_namespace_v1.*.metadata[0].namepattern consistently (no hardcoded namespace strings). Resource naming followsnetpol_{namespace}convention. Comment header explains the design rationale (ArgoCD skipped, egress unrestricted).I verified every policy's ingress rules against the actual cross-namespace traffic flows declared in
main.tf. The following issues were found by tracing service URLs, ServiceMonitor/PodMonitor declarations, and Blackbox Exporter probe targets.BLOCKERS
B1. forgejo policy is missing
monitoringingress (will break Blackbox Exporter probes + DORA exporter)The Blackbox Exporter (deployed in
monitoring) probes Forgejo via internal URLhttp://forgejo-http.forgejo.svc.cluster.local:80(main.tf line 415). The DORA exporter (also inmonitoring) calls Forgejo athttp://forgejo-http.forgejo.svc.cluster.local:80(main.tf line 1141). Both are ingress TO forgejo FROM monitoring.Current policy only allows:
tailscale,woodpecker.Must also allow:
monitoring.After apply, Blackbox will report Forgejo as DOWN and DORA metrics collection will fail.
B2. woodpecker policy is missing
monitoringandcnpg-systemingresshttp://woodpecker-server.woodpecker.svc.cluster.local:80(main.tf line 420). DORA exporter hitshttp://woodpecker-server.woodpecker.svc.cluster.local:80(main.tf line 1139). Both are ingress FROM monitoring.cnpg-system) manages thewoodpecker-dbCluster CR in thewoodpeckernamespace (main.tf line 1462). The operator needs to connect to the Postgres pods it manages. This is ingress TO woodpecker FROM cnpg-system.enablePodMonitor = true, main.tf line 1527) -- ingress FROM monitoring.Current policy only allows:
tailscale,woodpecker(self).Must also allow:
monitoring,cnpg-system.After apply, Blackbox will report Woodpecker as DOWN, DORA metrics will fail, CNPG operator will lose management of woodpecker-db (backup scheduling, failover, health checks), and Prometheus will fail to scrape woodpecker-db metrics.
B3. minio policy is missing
monitoringandtofu-stateingresshttp://minio.minio.svc.cluster.local:9000/minio/health/live(main.tf line 450). Ingress FROM monitoring.serviceMonitor.enabled = true, main.tf line 1041). Ingress FROM monitoring.tofu-statenamespace and connects tohttp://minio.minio.svc.cluster.local:9000(main.tf line 2028). Ingress FROM tofu-state.Current policy only allows:
tailscale,postgres,woodpecker.Must also allow:
monitoring,tofu-state.After apply, Blackbox will report MinIO as DOWN, Prometheus MinIO metrics will stop, and daily TF state backups will fail silently.
B4. harbor policy is missing
woodpeckeringress (will break CI image pulls)Woodpecker CI pipeline pods (in
woodpeckernamespace) pull container images from Harbor during builds (kanikopushes to Harbor, pipeline pods pull base images). This is ingress TO harbor FROM woodpecker.Current policy only allows:
tailscale,harbor(self),monitoring.Must also allow:
woodpecker.After apply, CI pipelines that pull from Harbor will fail.
B5. cnpg-system policy is missing
monitoringingress (will break operator metrics scraping)The CNPG operator has
podMonitorEnabled = true(main.tf line 1358), meaning Prometheus scrapes the operator's metrics endpoint. This is ingress FROM monitoring TO cnpg-system.Current policy only allows:
kube-system.Must also allow:
monitoring.After apply, CNPG operator metrics will stop being collected by Prometheus.
NITS
N1. Consider a
localsblock for repeated namespace selector patternsThe
{ namespaceSelector = { matchLabels = { "kubernetes.io/metadata.name" = "tailscale" } } }pattern appears in every policy. Alocalsblock with common selectors (e.g.,local.from_tailscale,local.from_monitoring) would reduce repetition and make policies easier to read:N2. postgres policy may want
monitoringif Prometheus scrapes CNPG cluster metrics directlyCNPG clusters expose metrics via PodMonitor. If the CNPG Cluster in the
postgresnamespace has monitoring enabled (it does not appear to in this repo -- it is managed by pal-e-services), this would need monitoring ingress. Not a blocker for this PR since thepostgresnamespace CNPG cluster is defined elsewhere, but worth noting for the follow-up app-namespace phase.N3. Comment at top could document the traffic map
A brief traffic map comment (like the one in the issue body) at the top of the file would help future maintainers understand WHY each namespace has specific ingress sources without needing to cross-reference main.tf.
SOP COMPLIANCE
76-phase-8a-platform-namespace-networkpolic(matches issue #76)plan-pal-e-platformCloses #76in Related sectiontofu fmtandtofu validatepassed per PR bodytofu planoutput included (9 to add, 2 to change unrelated drift)PROCESS OBSERVATIONS
Change failure risk: HIGH. This PR as-is will break monitoring (Blackbox probes, DORA metrics, Prometheus scraping) and CI (Harbor image pulls) across multiple namespaces the moment it is applied. The blast radius covers most of the platform's observability stack and CI pipeline. The fixes are straightforward -- adding missing
monitoring,cnpg-system,tofu-state, andwoodpeckernamespace selectors to the affected policies -- but the current state would cause a multi-service outage.MTTR consideration: NetworkPolicy issues are notoriously hard to debug because failures manifest as silent timeouts, not explicit errors. Blackbox probes returning DOWN, CI pipelines hanging on image pull, and CNPG operator losing contact with managed databases would all appear as unrelated incidents. Including the traffic map as a comment in the file (N3) would significantly reduce future MTTR.
Positive notes: The design is architecturally sound. Default-deny with explicit allow-lists is the correct approach. Using
kubernetes.io/metadata.namelabels is the right selector strategy. Skipping ArgoCD (Helm-managed namespace) is a reasonable scoping decision. The PR body is thorough and the test plan is sensible.VERDICT: NOT APPROVED
Five blockers must be fixed: missing
monitoringingress on forgejo, woodpecker, minio, and cnpg-system; missingcnpg-systemon woodpecker; missingtofu-stateon minio; missingwoodpeckeron harbor. All are missing ingress rules that will break active cross-namespace traffic flows verified againstmain.tf.Tofu Plan Output
PR #77 Re-Review
DOMAIN REVIEW
Tech stack: Terraform (OpenTofu) / Kubernetes NetworkPolicies / k3s
Single new file:
terraform/network-policies.tf(195 lines). Creates 9kubernetes_manifestNetworkPolicy resources, one per platform namespace managed bykubernetes_namespace_v1. Each policy is default-deny ingress with explicit allow-list of source namespaces.Previous review found 5 blockers -- all 5 concerned missing ingress rules that would break real traffic flows. Verified each against the updated diff:
netpol_forgejoallows frommonitoringnetpol_woodpeckerallows frommonitoringandcnpg-systemnetpol_minioallows frommonitoringandtofu-statenetpol_harborallows fromwoodpeckernetpol_cnpg_systemallows frommonitoringTerraform patterns verified:
kubernetes_manifest(appropriate for raw k8s manifests in Terraform)depends_onreferencing the parent namespace resource (prevents race conditions)kubernetes_namespace_v1.*.metadata[0].name(no hardcoded strings)podSelector = {}correctly selects all pods in each namespacepolicyTypes = ["Ingress"]only -- egress unrestricted (documented in header comment)tofu fmtandtofu validateconfirmed clean per PR bodyCross-reference verification of all 9 policies against
main.tf:pal-e-postgres-rw.postgres.svc) + cnpg-system (operator). Correct.BLOCKERS
None. All 5 previous blockers are resolved.
NITS
Tailscale namespace excluded without comment: 10
kubernetes_namespace_v1resources exist but only 9 get policies. The PR body says "9 platform namespaces" without explaining why tailscale is excluded. Adding a one-line comment in the header (e.g., "Tailscale namespace excluded -- it handles ingress proxying for all funnels and must accept arbitrary inbound traffic") would prevent future confusion about whether it was forgotten.ArgoCD exclusion rationale: The header comment explains ArgoCD is skipped because its namespace is Helm-created, not standalone. This is accurate but worth noting as a future gap -- ArgoCD should eventually get a NetworkPolicy too (perhaps in a follow-up issue).
SOP COMPLIANCE
76-phase-8a-platform-namespace-networkpolic)plan-pal-e-platformCloses #76in Related sectiontofu fmtandtofu validateconfirmed cleantofu planoutput documented (9 to add, 2 pre-existing drift changes)PROCESS OBSERVATIONS
VERDICT: APPROVED