feat: tofu apply to provision notion-mcp-remote #296

Open
opened 2026-04-21 01:28:26 +00:00 by forgejo_admin · 3 comments
Contributor

Type

Feature

Lineage

Standalone — scoped from project-notion-mcp-remote. Executes service-onboarding-sop step 5 after the upstream PRs on pal-e-services (var.services) and pal-e-deployments (overlay) have landed.

Repo

forgejo_admin/pal-e-platform

User Story

As an operator
I want tofu apply to materialise the notion-mcp-remote service in the cluster
So that ArgoCD begins syncing and a public Tailscale Funnel URL becomes available for claude.ai to consume.

Context

This is the bring-up apply for notion-mcp-remote. After pal-e-services #var.services entry and pal-e-deployments #overlay land, running tofu plan/apply on pal-e-platform materialises:

  • Namespace notion-mcp-remote
  • ArgoCD Application pointed at overlays/notion-mcp-remote/prod
  • Harbor project + robot accounts + image pull secret
  • Tailscale Funnel ingress → public URL notion-mcp-remote.tail5b443a.ts.net

Lucas-approval gated. Always use -lock=false to avoid blocking CI. First bring-up requires full apply (not -target) — port/ingress resources don't reconcile under targeted applies.

File Targets

Expected: no file changes in this repo. This ticket is the apply operation and the recordkeeping around it.

If tofu plan surfaces missing plumbing (e.g. a missing netpol row or a service module tweak), open follow-up tickets rather than fixing it inline here.

Files NOT to touch:

  • network-policies.tf — not needed (service only egresses to api.notion.com)

Acceptance Criteria

  • tofu plan -lock=false on pal-e-platform shows the expected diff: namespace, ArgoCD Application, Harbor project + robots, image pull secret, Tailscale Funnel ingress
  • No unexpected destructive changes (no resource deletions on unrelated services)
  • Lucas approves the plan before apply
  • tofu apply -lock=false completes successfully
  • kubectl get ns notion-mcp-remote returns the namespace
  • ArgoCD Application visible and reports Healthy + Synced after first image lands
  • notion-mcp-remote.tail5b443a.ts.net resolves and returns a response from the pod
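The last three criteria could be checked with a small script along these lines. This is a sketch only: the `argocd` namespace and the Application name `notion-mcp-remote` are assumptions not stated in the ticket, and probing the root path of the Funnel URL is an illustrative choice.

```shell
#!/usr/bin/env bash
# Sketch: post-apply checks mirroring the acceptance criteria above.

NS="notion-mcp-remote"
FUNNEL_HOST="notion-mcp-remote.tail5b443a.ts.net"
ARGOCD_NS="${ARGOCD_NS:-argocd}"            # assumption: ArgoCD's namespace
APP_NAME="${APP_NAME:-notion-mcp-remote}"   # assumption: Application name

# Pure helper: pass only when the two ArgoCD status strings are
# "Synced" and "Healthy".
argocd_ok() {
  [ "$1" = "Synced" ] && [ "$2" = "Healthy" ]
}

verify_bringup() {
  local sync health code

  # Criterion: the namespace exists.
  kubectl get ns "$NS" || return 1

  # Criterion: the Application reports Healthy + Synced.
  sync="$(kubectl -n "$ARGOCD_NS" get applications.argoproj.io "$APP_NAME" \
    -o jsonpath='{.status.sync.status}')" || return 1
  health="$(kubectl -n "$ARGOCD_NS" get applications.argoproj.io "$APP_NAME" \
    -o jsonpath='{.status.health.status}')" || return 1
  if ! argocd_ok "$sync" "$health"; then
    echo "ArgoCD not ready: sync=$sync health=$health" >&2
    return 1
  fi

  # Criterion: the Funnel hostname resolves and answers over HTTPS.
  # curl fails on DNS or connection errors; the HTTP code is printed so a
  # human can judge whether the response really came from the pod.
  code="$(curl -sS -o /dev/null -w '%{http_code}' "https://$FUNNEL_HOST/")" || return 1
  echo "Funnel responded with HTTP $code"
}
```

Running `verify_bringup` after the first image lands would exercise all three checks in order; the block only defines functions, so sourcing it has no side effects.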

Test Expectations

  • tofu fmt -check clean (no formatting changes required)
  • tofu validate passes
  • Plan output reviewed by Lucas and agent before apply
  • Run command: cd terraform && tofu plan -lock=false -var-file=k3s.tfvars
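One way to run these checks in order, and to make sure the apply executes exactly the plan that was reviewed, is to save the plan to a file. The sketch below assumes a plan file name (`notion-mcp-remote.tfplan`) and a `DRY_RUN` switch, neither of which comes from the ticket; the tofu flags themselves are the ones named above plus `-out`.

```shell
#!/usr/bin/env bash
# Sketch: pre-apply checks in the order listed above.
# DRY_RUN=1 prints each command instead of running it (hypothetical helper).

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

preflight() {
  # Assumed plan file name; not part of the ticket.
  local plan_file="${1:-notion-mcp-remote.tfplan}"
  run tofu fmt -check || return 1
  run tofu validate || return 1
  run tofu plan -lock=false -var-file=k3s.tfvars -out="$plan_file" || return 1
}

apply_reviewed_plan() {
  local plan_file="${1:-notion-mcp-remote.tfplan}"
  # Applying a saved plan file executes the reviewed plan and nothing else.
  run tofu apply -lock=false "$plan_file"
}
```

From the `terraform` directory, `DRY_RUN=1 preflight` prints the three commands without touching state; `apply_reviewed_plan` is meant to be run only after the approval is captured in a ticket comment.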

Constraints

  • Do NOT apply before the Pre-Deploy Validation Checklist is 100% green (see Related)
  • Application secrets (NOTION_OAUTH_CLIENT_*, SESSION_SECRET, ONBOARD_SECRET, BASE_URL) must already exist in the notion-mcp-remote namespace, created via kubectl create secret generic; otherwise ArgoCD will overwrite them. This is a sibling todo on the notion-mcp-remote board.
  • Always use -lock=false
  • Do not use -target — first deploys need full apply
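The secrets constraint could be checked before sync with a helper like the one below. It is a sketch under assumptions: the secret name is hypothetical (shown as `<secret-name>`), and only the three keys the ticket names in full are checked, since the `NOTION_OAUTH_CLIENT_*` key names are not spelled out.

```shell
#!/usr/bin/env bash
# Required keys named explicitly in the ticket. The NOTION_OAUTH_CLIENT_* keys
# are omitted because the ticket does not give their full names.
REQUIRED_KEYS="SESSION_SECRET ONBOARD_SECRET BASE_URL"

# Reads the keys present in a secret (one per line) on stdin and prints every
# required key that is missing. Returns 1 if any key is missing, else 0.
missing_secret_keys() {
  local present key missing=0
  present="$(cat)"
  for key in $REQUIRED_KEYS; do
    if ! printf '%s\n' "$present" | grep -qxF -- "$key"; then
      echo "$key"
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (secret name is a placeholder, not from the ticket):
#   kubectl -n notion-mcp-remote get secret <secret-name> \
#     -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}' \
#     | missing_secret_keys
```

An empty output with exit status 0 means all three named keys are present; any printed key name is one that still has to be created.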

Checklist

  • Pre-Deploy Validation Checklist all green
  • tofu plan reviewed
  • Lucas approval captured in ticket comment
  • tofu apply executed
  • Namespace + ArgoCD Application verified
  • Funnel URL reachable

Related

  • project-notion-mcp-remote
  • arch-deployment-notion-mcp-remote
  • service-onboarding-sop (Pre-Deploy Validation Checklist)
  • story-notion-mcp-remote-ops-deploy-gitops
  • Sibling tickets: pal-e-services #var.services entry, pal-e-deployments #overlay, notion-mcp-remote #Harbor URL fix + #secrets todo
Author
Contributor

Scope Review: NEEDS_REFINEMENT

Review note: review-1045-2026-04-21

Template, dependencies, acceptance criteria, and guardrails are all solid. One traceability gap to resolve:

  • [LABEL] or [SCOPE] — the arch:argocd label on this board item has no backing arch-argocd note in pal-e-docs. The ticket body correctly anchors on arch-deployment-notion-mcp-remote (which exists). Recommend either:
    1. Preferred: swap the board label to arch:deployment-notion-mcp-remote so the label matches the real backing note, or
    2. Create a shared arch-argocd platform-component note (also reusable by pal-e-services #57 which carries the same label).

No decomposition needed (single apply, ~3 min agent work, Lucas-gated). Apply-ready once pal-e-services #57 (var.services entry) and pal-e-deployments #132 (overlay) merge and the pre-deploy validation checklist is 100% green.

Author
Contributor

Scope Review: APPROVED (re-review)

Review note: review-1045-2026-04-21-v2

Prior review (review-1045-2026-04-21) flagged a single gap: the arch:argocd label had no backing note in pal-e-docs. Main session relabeled board item 1045 to arch:deployment-notion-mcp-remote, which has a backing active architecture note (arch-deployment-notion-mcp-remote, id 1548) — and matches the architectural anchor already referenced in this ticket's body.

Traceability triangle complete. Template complete. AC agent-verifiable. Dependencies (#57, #132, notion-mcp-remote #7) documented and sequenced. Lucas approval gate explicit. No decomposition needed.

Ticket is apply-ready once pal-e-services #57 and pal-e-deployments #132 land and the pre-deploy validation checklist is green.

Author
Contributor

Status — 2026-05-03 (mid-cascade pause for operator review)

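Both upstream PRs landed:

  • forgejo_admin/pal-e-services#72 (var.services entry): MERGED
  • forgejo_admin/pal-e-deployments#138 (kustomize overlay): MERGED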

Local clones reconciled:

  • ~/pal-e-services: origin was misconfigured (pointed at GitHub mirror); fetched + fast-forward-pulled via forgejo remote. HEAD now at b671634. Working tree clean. New notion-mcp-remote block mirrored from k3s.tfvars.example into live terraform/k3s.tfvars (gitignored, operator-only file).
  • ~/pal-e-deployments: NOT touched. On feature branch 75-rename-pal-e-production with 1 unpushed commit (f9c850e revert: basketball-api to pre-merge SHA (PR #490 broke checkout)). Operator handles when convenient; not blocking for tofu apply since this repo is read by ArgoCD, not the local checkout.

tofu plan output — escalation required

tofu plan -lock=false -var-file=k3s.tfvars ran successfully.

Plan: 7 to add, 14 to change, 0 to destroy.

The 7 adds (expected, match PR #72 test plan exactly)

  • kubernetes_namespace_v1.service["notion-mcp-remote"]
  • harbor_project.service["notion-mcp-remote"]
  • 2 robot accounts (CI + image puller)
  • kubernetes_secret_v1.harbor_creds["notion-mcp-remote"]
  • argocd_application for notion-mcp-remote (points at overlays/notion-mcp-remote/prod)
  • Tailscale Funnel ingress for notion-mcp-remote.tail5b443a.ts.net

Plus output additions:

+ ci_robot_usernames.notion-mcp-remote = (known after apply)
+ service_urls.notion-mcp-remote       = "https://notion-mcp-remote.tail5b443a.ts.net"

The 14 unrelated changes — pre-existing drift, NOT from PR #72

Per PR #72 test plan: "If unrelated drift appears, STOP and escalate (precedent: review-1064-2026-04-20)." Stopping here.

Two distinct drift patterns visible:

  1. Stale ArgoCD label removal across 7 namespaces (platform-validation, playme2k, westside-admin, westside-ai-assistant, westsidekingsandqueens, plus 2 more not in tail output): each harbor_creds secret has an argocd.argoproj.io/instance label that isn't in the .tf source — terraform wants to strip them. Likely benign (ArgoCD re-applies its own labels on next sync) but unverified.
  2. Write-only attribute migration: binary_data_wo and data_wo on the same secrets. Kubernetes provider upgrade artifact, no actual data change.
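The stop-and-escalate rule quoted above could be enforced mechanically by parsing the plan summary line. The following is a sketch, not part of the SOP: the function names are hypothetical, and it assumes the plan was run with `-no-color` so the summary line is free of ANSI escape codes.

```shell
#!/usr/bin/env bash
# Prints "add change destroy" from a tofu plan summary line read on stdin.
# Prints nothing if no summary line in the expected form is found.
plan_counts() {
  sed -n 's/^Plan: \([0-9][0-9]*\) to add, \([0-9][0-9]*\) to change, \([0-9][0-9]*\) to destroy\.$/\1 \2 \3/p' \
    | head -n 1
}

# Succeeds only if the plan has exactly the expected number of adds and no
# changes or destroys. Returns 1 on drift, 2 if no summary line was found.
plan_gate() {
  local expected_add="$1" counts add change destroy
  counts="$(plan_counts)"
  if [ -z "$counts" ]; then
    echo "no plan summary line found" >&2
    return 2
  fi
  read -r add change destroy <<<"$counts"
  if [ "$add" -ne "$expected_add" ] || [ "$change" -ne 0 ] || [ "$destroy" -ne 0 ]; then
    echo "STOP: plan is $add add / $change change / $destroy destroy; expected $expected_add / 0 / 0" >&2
    return 1
  fi
}

# Example usage:
#   tofu plan -lock=false -var-file=k3s.tfvars -no-color | plan_gate 7
```

Fed the summary from this run (7 to add, 14 to change, 0 to destroy), `plan_gate 7` fails because of the 14 changes, which matches the decision to stop here.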

Operator decision needed

| Option | Action | Tradeoff |
|---|---|---|
| A | Targeted apply (-target=... for the 7 new resources only) | Safest. Mid-cascade isolation. Drift becomes its own ticket. |
| B | Investigate drift first, file separate cleanup PR | Cleanest audit. Slowest. |
| C | Apply everything in one tofu apply | Fastest. Mixes concerns; cause/effect harder to debug if anything breaks. |

Ava's recommendation: A — preserves the kanban discipline of one-ticket-one-change while we're mid-cascade, drift cleanup gets its own platform-board ticket so it's properly traced.
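For reference, Option A could be assembled roughly as follows. This is a sketch: only the three resource addresses quoted in the plan summary are listed, and the addresses for the robot accounts, the ArgoCD Application, and the Funnel ingress would have to be copied from the real plan output. Note that the ticket's own constraints advise against `-target` on first bring-up, so this illustrates the option rather than endorsing it.

```shell
#!/usr/bin/env bash
# Reads resource addresses (one per line) on stdin and prints one
# -target=<address> flag per non-empty line.
target_flags() {
  local addr
  while IFS= read -r addr; do
    if [ -n "$addr" ]; then
      printf -- '-target=%s\n' "$addr"
    fi
  done
  return 0
}

# Addresses quoted verbatim in the plan summary. The remaining four new
# resources are intentionally not guessed here.
KNOWN_TARGETS='kubernetes_namespace_v1.service["notion-mcp-remote"]
harbor_project.service["notion-mcp-remote"]
kubernetes_secret_v1.harbor_creds["notion-mcp-remote"]'

# Example usage (bash 4+ for mapfile):
#   mapfile -t flags < <(printf '%s\n' "$KNOWN_TARGETS" | target_flags)
#   tofu plan -lock=false -var-file=k3s.tfvars "${flags[@]}"
```

Keeping each address on its own line and passing the flags as an array avoids shell quoting problems with the brackets and double quotes in the resource keys.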

What blocks closure of this ticket

  1. Operator decision on A/B/C above
  2. tofu apply runs successfully
  3. Validation: ArgoCD Application appears, syncs, all 4 overlay resources land in notion-mcp-remote namespace, Funnel hostname resolves over HTTPS
  4. Then this ticket → done, downstream cascade unblocks (#1046 already operator-runnable in parallel; #1047, #1048, #1049 sequential after this)

Related

  • Project page: project-notion-mcp-remote (Status section reflects this state)
  • Architecture: arch-deployment-notion-mcp-remote
  • SOP: sop-platform-tf-changes
  • Discovered scope spike: forgejo_admin/notion-mcp-remote#11
Reference
ldraney/pal-e-platform#296