docs: fix CLAUDE.md, restructure README, verify docs/ (#426) #431

Merged
ldraney merged 2 commits from 426-docs-audit into main 2026-06-13 19:45:35 +00:00
Owner

Summary

Fixes stale CLAUDE.md, restructures README as a documentation TOC, and creates five new docs/ files covering architecture, monitoring, database, networking, and secrets.

Changes

  • CLAUDE.md -- rewrote from scratch. Old version had wrong state backend ("local" vs kubernetes), missing providers (only mentioned 3 of 5), no key paths, no run commands, stale agent dispatch section
  • README.md -- restructured as TOC pointing to docs/. Fixed module count (10/11 -> 12, missing admin), dashboard count (4 -> 8), repo URLs (github.com -> forgejo), removed inline content moved to docs/
  • docs/architecture.md -- new: module inventory, provider table, dependency graph, state management, network policy overview, moved blocks explanation
  • docs/monitoring.md -- new: Prometheus/Grafana/Loki stack, all 8 dashboards, DORA exporter, alerting config, ServiceMonitors
  • docs/database.md -- new: CNPG operator vs cluster distinction, backup strategy, shared database model, network policies
  • docs/networking.md -- new: Tailscale funnel inventory, full network policy matrix, custom domain architecture
  • docs/secrets.md -- new: Salt GPG pipeline, complete 26-variable secret inventory, validation flow, how to add new secrets

Test Plan

  • Verify all doc links in README resolve
  • Confirm module count (12) matches ls terraform/modules/
  • Confirm dashboard count (8) matches ls terraform/dashboards/
  • Confirm provider count (5) matches terraform/versions.tf
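
The two `ls` checks above could be scripted along these lines. This is a minimal sketch, not something that exists in the repo: the helper names `count_entries` and `mismatches` are hypothetical, and it assumes every module is a directory directly under `terraform/modules/` and every dashboard is a `.json` file directly under `terraform/dashboards/`. The provider check is left out because it would need real HCL parsing of `terraform/versions.tf`.

```python
from pathlib import Path

# Expected counts, taken from the PR description above.
EXPECTED = {"modules": 12, "dashboards": 8}


def count_entries(repo_root: Path) -> dict:
    """Count module directories and dashboard JSON files under terraform/.

    Hidden entries are skipped so the result mirrors a plain `ls`.
    """
    modules_dir = repo_root / "terraform" / "modules"
    dashboards_dir = repo_root / "terraform" / "dashboards"
    modules = [
        p for p in modules_dir.iterdir()
        if p.is_dir() and not p.name.startswith(".")
    ]
    dashboards = [
        p for p in dashboards_dir.glob("*.json")
        if p.is_file() and not p.name.startswith(".")
    ]
    return {"modules": len(modules), "dashboards": len(dashboards)}


def mismatches(actual: dict, expected: dict = EXPECTED) -> dict:
    """Return {name: (actual, expected)} for every count that differs."""
    return {
        name: (actual.get(name), want)
        for name, want in expected.items()
        if actual.get(name) != want
    }
```

Run from the repo root, `mismatches(count_entries(Path(".")))` returns an empty dict when both counts agree with the docs.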

Review Checklist

  • CLAUDE.md reflects actual tech stack and paths
  • README module table matches terraform/modules/ directory
  • All new docs cross-referenced from README TOC
  • Existing docs (hetzner-edge, keycloak-smtp, spikes/) preserved and linked
  • No inline content duplicated between README and docs/

Related Notes

None -- docs-only change, no pal-e-docs notes affected.

Related

Closes #426

docs: fix CLAUDE.md, restructure README as TOC, create docs/ (#426)
All checks were successful
ci/woodpecker/push/terraform Pipeline was successful
ci/woodpecker/pr/terraform Pipeline was successful
d391d5b9ec
CLAUDE.md was stale (wrong state backend, missing providers/paths).
README had wrong module count (10/11 instead of 12), wrong dashboard
count (4 instead of 8), missing admin module, github.com URLs instead
of forgejo. Restructured README as documentation TOC. Created five new
docs: architecture, monitoring, database, networking, secrets.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author
Owner

PR #431 Review

DOMAIN REVIEW

Tech stack: OpenTofu IaC, Kubernetes (k3s), Helm, Tailscale, SaltStack, CNPG. This is a docs-only PR -- no infrastructure changes. Review focused on factual accuracy of documentation against the actual repo contents.

Verification method: Cross-referenced every numeric claim (module count, dashboard count, provider count, secret count, funnel count, namespace count) against the actual files on main.

Fact-check results:

| Claim | Source File | Actual | Verdict |
|-------|-------------|--------|---------|
| 12 modules | terraform/modules/ | 12 directories (admin, ci, database, forgejo, harbor, hetzner-edge, keycloak, monitoring, networking, ops, staging, storage) | Correct |
| 8 dashboards | terraform/dashboards/ | 8 JSON files | Correct |
| 5 providers | terraform/versions.tf | kubernetes, helm, tailscale, minio, hcloud | Correct |
| 26 TF_SECRET_VARS | Makefile | 26 entries | Correct |
| 9 funnels | terraform/modules/networking/main.tf | 9 kubernetes_ingress_v1 resources | Correct |
| 10 namespace network policies | terraform/network-policies.tf | 10 policies (monitoring, forgejo, woodpecker, harbor, minio, keycloak, postgres, basketball-api, staging, cnpg-system) | Correct |
| 5 depends_on relationships | terraform/main.tf | ci->forgejo+database, harbor->monitoring, storage->monitoring, ops->storage, admin->keycloak | Correct |

Provider version constraints in docs/architecture.md match terraform/versions.tf exactly.

Network policy matrix in docs/networking.md matches terraform/network-policies.tf exactly -- every namespace and every allowed ingress source verified line by line.

Secret inventory in docs/secrets.md matches Makefile TF_SECRET_VARS exactly -- all 26 variables present with correct usage descriptions.
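
A check like this one could be repeated mechanically. The sketch below is illustrative only and makes assumptions about file layout that this review does not confirm: that the Makefile assigns `TF_SECRET_VARS` as a whitespace-separated list (optionally split across backslash-continued lines), and that `docs/secrets.md` mentions each variable by its exact name. The function names `parse_make_list` and `missing_from_doc` are hypothetical.

```python
import re


def parse_make_list(makefile_text: str, name: str) -> list:
    """Return the whitespace-separated values assigned to a Make variable.

    Handles `=`, `:=`, `?=` and `+=` plus backslash line continuations.
    Only the first matching assignment is read.
    """
    # Join continued lines so the whole assignment sits on one line.
    joined = re.sub(r"\\\r?\n", " ", makefile_text)
    pattern = re.compile(rf"\s*{re.escape(name)}\s*[:?+]?=\s*(.*)$")
    for line in joined.splitlines():
        match = pattern.match(line)
        if match:
            # Drop a trailing comment, then split on whitespace.
            return match.group(1).split("#", 1)[0].split()
    return []


def missing_from_doc(names: list, doc_text: str) -> list:
    """Return the names that never appear as a whole word in doc_text."""
    return [
        name for name in names
        if not re.search(rf"\b{re.escape(name)}\b", doc_text)
    ]
```

An empty result from `missing_from_doc(parse_make_list(makefile, "TF_SECRET_VARS"), secrets_md)` would mean every pipeline variable is named somewhere in the doc. It does not validate the usage descriptions, which still need a human read.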

Dashboard inventory in docs/monitoring.md matches terraform/dashboards/ filenames exactly.

Content coverage: The old README's inline sections (Key Design Decisions, GitOps Pipeline, Observability, Tech Stack) are fully covered by the new docs:

  • "Tailscale for ingress" -> docs/networking.md
  • "Salt for secrets" -> docs/secrets.md
  • "In-cluster state" -> docs/architecture.md State Management section
  • "DORA from day one" -> docs/monitoring.md DORA Exporter section
  • GitOps pipeline flow -> docs/architecture.md dependency graph
  • Observability dashboard count -> docs/monitoring.md (corrected from 4 to 8)
  • Tech Stack listing -> CLAUDE.md Tech Stack section

Old stale claims fixed:

  • Module count: old README said 10 (body) and 11 (layout). Actual is 12 (missing admin). Fixed.
  • Dashboard count: old README said 4. Actual is 8. Fixed.
  • Secret count: old README said 15. Actual is 26. Fixed.
  • Repo URLs: old README pointed to github.com. Fixed to forgejo.tail5b443a.ts.net.
  • CLAUDE.md state backend: old said "Local terraform state". Actual is Kubernetes backend. Fixed.

BLOCKERS

None.

NITS

  1. Dependency graph in docs/architecture.md is slightly misleading. The ASCII graph implies networking is independent (top of the graph), but networking actually takes namespace outputs from 7 other modules (monitoring, forgejo, ci, harbor, storage, keycloak, admin). These are implicit data dependencies through variable references, not explicit depends_on, but the graph gives the impression that networking has no dependencies. Consider adding a note that networking has implicit dependencies via namespace inputs.

  2. docs/superpowers/ not linked from README. The docs/ directory contains 7 plans/specs under docs/superpowers/ (plans and design specs from March-May 2026). The README Documentation table links only to the 5 new docs + 3 existing docs. The superpowers/ directory is unlisted. Not a blocker since those are historical spike/plan artifacts, but a brief mention or link would make the TOC complete.

  3. minio_oidc_client_secret undocumented. terraform/variables.tf defines a minio_oidc_client_secret variable (sensitive, default = ""). It is not in the Makefile's TF_SECRET_VARS list, which is why docs/secrets.md correctly omits it from the pipeline inventory. However, it exists as an optional variable that could be set. A brief mention in docs/secrets.md about optional secrets (those with defaults that are not part of the validation pipeline) would prevent future confusion when someone finds it in variables.tf.

  4. docs/networking.md funnel table omits hostname column. Each funnel creates a specific *.tail5b443a.ts.net hostname. The table lists Service and Namespace Source but not the actual hostname. Adding the hostname (e.g., grafana.tail5b443a.ts.net) would make the inventory more operationally useful.

  5. docs/architecture.md module inventory: ops module description. The table says the ops module manages "NVIDIA device plugin (Helm), embedding worker metrics service, tofu state backup CronJob". This omits Ollama, which the README module table correctly includes ("NVIDIA GPU plugin, Ollama, embedding worker, tofu state backup CronJobs"). The architecture doc should match.
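
To make nit 1 concrete, here is a small sketch of how the picture changes once implicit edges are counted. The edge data is copied from this review (the depends_on row of the fact-check table and the seven namespace inputs named in nit 1). The helper names `merge` and `independent` are hypothetical, and other modules may have implicit input dependencies that this review did not enumerate.

```python
# All 12 modules, as listed in the fact-check table.
MODULES = [
    "admin", "ci", "database", "forgejo", "harbor", "hetzner-edge",
    "keycloak", "monitoring", "networking", "ops", "staging", "storage",
]

# Explicit depends_on edges: module -> modules it depends on.
EXPLICIT = {
    "ci": {"forgejo", "database"},
    "harbor": {"monitoring"},
    "storage": {"monitoring"},
    "ops": {"storage"},
    "admin": {"keycloak"},
}

# Implicit edges from nit 1: networking reads namespace outputs
# from these seven modules through variable references.
IMPLICIT = {
    "networking": {
        "monitoring", "forgejo", "ci", "harbor",
        "storage", "keycloak", "admin",
    },
}


def merge(*graphs: dict) -> dict:
    """Union several module -> dependencies maps into one."""
    merged = {}
    for graph in graphs:
        for module, deps in graph.items():
            merged.setdefault(module, set()).update(deps)
    return merged


def independent(modules: list, graph: dict) -> list:
    """Return the modules that have no dependencies in the given graph."""
    return sorted(m for m in modules if not graph.get(m))
```

With only `EXPLICIT`, `independent(MODULES, EXPLICIT)` includes `networking`, which is the impression the ASCII graph gives. With `merge(EXPLICIT, IMPLICIT)` it drops out of the list.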

SOP COMPLIANCE

  • PR body follows template (Summary, Changes, Test Plan, Related sections present)
  • No secrets committed
  • No unnecessary file changes (7 files, all docs-related to #426)
  • Commit messages -- single commit expected for docs-only PR
  • Repo URLs corrected from GitHub to Forgejo throughout
  • Existing docs (hetzner-edge.md, keycloak-smtp.md, spikes/) preserved and linked

PROCESS OBSERVATIONS

  • This PR corrects significant documentation drift. The old README had 3 wrong counts (modules, dashboards, secrets), a wrong state backend claim, and pointed to a different git host entirely. Good catch.
  • The docs/ decomposition follows the landscaping-assistant reference pattern: README as TOC, focused topic docs, thin CLAUDE.md with dev-only content.
  • The PR body's test plan includes 4 verifiable checks -- all pass against the actual repo.

VERDICT: APPROVED

docs: fix review nits in architecture.md
All checks were successful
ci/woodpecker/push/terraform Pipeline was successful
ci/woodpecker/pr/terraform Pipeline was successful
ci/woodpecker/pull_request_closed/terraform Pipeline was successful
167da53f34
Clarify dependency graph to distinguish explicit depends_on from
implicit input-based dependencies. Add Ollama to ops module description.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author
Owner

PR #431 Review

DOMAIN REVIEW

Tech stack: Terraform/OpenTofu IaC (docs-only change -- no HCL modified). 7 files changed: CLAUDE.md, README.md, and 5 new docs/ files (architecture.md, database.md, monitoring.md, networking.md, secrets.md).

Verification against live repo state:

| Claim | Verified |
|-------|----------|
| 12 modules in terraform/modules/ | Yes -- admin, ci, database, forgejo, harbor, hetzner-edge, keycloak, monitoring, networking, ops, staging, storage |
| 8 dashboards in terraform/dashboards/ | Yes -- dora, uptime, pal-e-app, basketball-api, landscaping-assistant, believers-elite, playme2k, mac-agent |
| 5 providers (kubernetes, helm, tailscale, minio, hcloud) | Yes -- versions.tf lines 10-31, providers.tf lines 1-25 |
| 26 TF_SECRET_VARS in Makefile | Yes -- all 26 match the secrets.md inventory table |
| depends_on graph (ci->forgejo+database, harbor->monitoring, storage->monitoring, ops->storage, admin->keycloak) | Yes -- main.tf confirmed |
| Network policy matrix (10 namespaces) | Yes -- network-policies.tf confirmed, namespace lists match docs/networking.md |
| Kubernetes backend (tofu-state namespace, pal-e-platform suffix) | Yes -- versions.tf backend block confirmed |
| Existing docs preserved (hetzner-edge.md, keycloak-smtp.md, spikes/) | Yes -- README TOC links to all existing and new docs |

CLAUDE.md assessment: Properly thin and dev-focused. Contains: tech stack (5 providers, state backend, secrets pipeline, Salt), key paths (12 entries covering root TF files, modules, salt, Makefile, docs), running commands (make targets for tofu and salt), PR conventions. No agent dispatch boilerplate, no roadmap, no inline content that belongs in docs/. Old version had wrong state backend ("local" vs kubernetes), only 3 providers, and stale agent dispatch section -- all fixed.

README assessment: Successfully restructured as TOC. Module table updated from 10/11 to 12 (added admin). Dashboard count corrected from 4 to 8. Repo URLs corrected from github.com to forgejo.tail5b443a.ts.net. Inline sections (Key Design Decisions, GitOps Pipeline, Observability, detailed Tech Stack) properly migrated to docs/ files. Documentation table links all 8 doc files.

BLOCKERS

None.

NITS

  1. Dependency graph diagram incomplete (docs/architecture.md): The ASCII diagram shows forgejo --> ci but omits the database --> ci arrow. The parenthetical annotation "(depends_on: forgejo, database)" is correct, but the visual graph is misleading -- a reader scanning the diagram would miss that database is also a dependency of ci. Suggest adding a database node with an arrow to ci.

  2. .claude-no-enforce present in working tree: Confirmed this file exists at repo root but is NOT in the PR diff, so it is not being committed. Just flagging for awareness -- ensure .gitignore covers it if it should never be committed.
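
One way to avoid the omission in nit 1 would be to derive the arrows from the depends_on data rather than drawing them by hand. The following is only a sketch of that idea. The edge data is copied from the verification table above, the `edge_lines` helper is hypothetical, and the `dep --> module` arrow format is borrowed from the `forgejo --> ci` example in the nit.

```python
# Explicit depends_on edges: module -> modules it depends on.
DEPENDS_ON = {
    "ci": ["forgejo", "database"],
    "harbor": ["monitoring"],
    "storage": ["monitoring"],
    "ops": ["storage"],
    "admin": ["keycloak"],
}


def edge_lines(graph: dict) -> list:
    """Render one 'dependency --> module' line per edge, in sorted order."""
    return [
        f"{dep} --> {module}"
        for module in sorted(graph)
        for dep in sorted(graph[module])
    ]
```

Because every edge becomes its own line, `database --> ci` appears next to `forgejo --> ci` and cannot be dropped by accident.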

SOP COMPLIANCE

  • PR body has ## Summary
  • PR body has ## Changes (detailed per-file breakdown)
  • PR body has ## Test Plan (4 verification items with checkboxes)
  • PR body has ## Related (Closes #426)
  • No secrets committed (docs-only, no .tfvars or credentials)
  • No unnecessary file changes (all 7 files are docs-related, scoped to #426)
  • Commit message is descriptive

PROCESS OBSERVATIONS

  • Deployment frequency: No impact -- docs-only change, no infrastructure modification.
  • Change failure risk: Zero -- no HCL changes, no state impact.
  • Documentation quality: Strong improvement. Old README was monolithic with stale counts. New structure follows the landscaping-assistant pattern (README as TOC, docs/ for depth). All numeric claims (12 modules, 8 dashboards, 5 providers, 26 secrets, 10 network policies) verified against live repo state.
  • Cross-reference integrity: All README doc links point to files that exist in this PR or already exist on main (hetzner-edge.md, keycloak-smtp.md, spikes/). No broken links.
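
The cross-reference check could be automated roughly as follows. This is a hedged sketch: it assumes the README uses inline markdown links of the form `[text](target)` with relative paths, which this review does not state, and the `broken_links` helper is hypothetical. Reference-style links and HTML anchors would be missed.

```python
import re
from pathlib import Path

# Inline markdown links: [text](target). The target stops at whitespace or ')'.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Anything starting with a URL scheme (https:, mailto:, ...) is external.
SCHEME_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_links(readme: Path) -> list:
    """Return relative link targets in the README that do not exist on disk."""
    root = readme.parent
    broken = []
    for target in LINK_RE.findall(readme.read_text(encoding="utf-8")):
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue  # skip external URLs and in-page anchors
        path = target.split("#", 1)[0]  # drop any fragment after the path
        if not (root / path).exists():
            broken.append(target)
    return broken
```

An empty list from `broken_links(Path("README.md"))` corresponds to the "No broken links" finding for local targets. Links to directories such as `spikes/` pass as long as the directory exists.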

VERDICT: APPROVED

ldraney deleted branch 426-docs-audit 2026-06-13 19:45:35 +00:00