Update CI pipeline docs with Kaniko registry architecture decision #100
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "docs/update-infra-pipeline-post-incident"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
.woodpecker.yamlexample to match current config (clone, bundle cache, insecure-registry)build_argsoverrideChanges
docs/infrastructure-and-pipeline.md: rewrote CI Pipeline section, added Kaniko architecture table, updated mermaid diagram, updated kustomization exampleTest Plan
git show main:.woodpecker.yaml)Review Checklist
Related Notes
ldraney/landscaping-assistant #82— the incident that motivated thisPR #100 Review
DOMAIN REVIEW
Domain: Infrastructure documentation (Woodpecker CI / Kaniko / ArgoCD / Kustomize). Reviewed against the actual
.woodpecker.yamlon main and theDockerfileto verify accuracy of documented config and architecture explanations.Kaniko Registry Architecture section: The pull/push path table and the
build_argsfootgun warning are accurate and well-written. The Dockerfile confirmsARG REGISTRY=harbor.tail5b443a.ts.netas the default (line 4 and line 19), which matches the documented explanation for why nobuild_argsoverride is needed. The incident history (PRs #76-#79, #91) is correctly referenced. This section adds genuine operational value.ArgoCD Image Updater kustomization entry: The added
imagesentry and explanation of thenewest-buildSHA tag strategy are consistent with the ArgoCD Image Updater configuration described in thevar.servicessection of this same doc.BLOCKERS
None. This is a docs-only PR with no code changes, so the BLOCKER criteria (test coverage, input validation, secrets, auth DRY) do not apply.
NITS
1. Documented
.woodpecker.yamlomits several fields present in the actual config.I compared the YAML example in the PR against the actual
.woodpecker.yamlon main. The following are omitted from the documented version:when: - event: [push, pull_request]blockwhenclauses on individual steps partially convey this, but the top-level trigger is a meaningful config detailsleep 2in clone stepechodebug lines in restore-bundle-cachewhenclauses (e.g.,when: - event: pull_request - event: push)environmentblock on lint step (BUNDLE_DEPLOYMENT,BUNDLE_WITHOUT)dockerfile: Dockerfileandcontext: .in build-and-push2. The
save-bundle-cachestep is missing from the YAML example.The mermaid diagram correctly shows a "Save bundle cache" node, but the YAML example below it does not include the
save-bundle-cachestep at all (actual file lines 87-117). This is a notable inconsistency -- the diagram promises a step the YAML does not show. This step has meaningful config (failure: ignore, hostPath volume, checksum logic) that a reader would want to see.3. Minor Dockerfile representation drift.
The Dockerfile section earlier in this same doc (lines 86-104, not touched by this PR) does not show the
ARG REGISTRY=.../FROM ${REGISTRY}/...pattern that the new Kaniko section explicitly depends on. The existing doc shows hardcodedFROM harbor.tail5b443a.ts.net/.... This is pre-existing drift (not introduced by this PR), but since the new Kaniko section says "The Dockerfile usesARG REGISTRY=harbor.tail5b443a.ts.netas the default", a reader checking the Dockerfile snippet above would see a contradiction. Consider updating the Dockerfile snippet in a follow-up or noting the discrepancy.SOP COMPLIANCE
docs/update-infra-pipeline-post-incident-- this is adocs/prefix branch rather than{issue-number}-{kebab-case}convention (would be99-update-infra-pipeline-post-incident), but acceptable for docs-only changesPROCESS OBSERVATIONS
build_argsfootgun warning after a multi-week incident (PRs #76-#79, #91) prevents knowledge loss. DORA: reduces change failure risk for future CI modifications..woodpecker.yamlfor the full config" would set correct expectations. If the intent is accuracy, the missingsave-bundle-cachestep and per-stepwhenclauses should be added.VERDICT: APPROVED
The Kaniko registry architecture section, the
build_argswarning, and the ArgoCD Image Updater kustomization entry are all accurate and valuable additions. The nits about YAML simplification and the missingsave-bundle-cachestep are real but non-blocking -- they are documentation completeness issues, not correctness errors in the new content. The core new content (registry architecture table, footgun warning) is factually correct and operationally important.7f5f59b7439f6d660e40PR #100 Review
DOMAIN REVIEW
This is a docs-only PR updating
docs/infrastructure-and-pipeline.md. The domain is CI/CD pipeline documentation (Woodpecker, Kaniko, ArgoCD). The core review question: does the documented YAML match the actual pipeline on main?I compared the documented
.woodpecker.yamlin this PR against the actual file atmain:.woodpecker.yamlline by line. There are significant discrepancies.BLOCKERS
1. Documented pipeline does not match actual
.woodpecker.yamlon mainThe stated purpose of this PR is to update stale docs to match the current config. The PR body's test plan says "Examples match actual config on main (
git show main:.woodpecker.yaml)." That test plan item is not passing. Specific mismatches:harbor.tail5b443a.ts.net/library/ruby-rails-build:latestruby:3.4.9-slimbundle config set --local path vendor/bundlebundle config set --local path vendor/bundlethenbundle exec rubocopapt-get install,gem install bundler,bundle install --jobs=4,bundle exec rubocopbundle config set --local path vendor/bundlethenrails db:createthenrspecapt-get install,gem install bundler,bundle install --jobs=4,rails db:create,rspecBUNDLE_DEPLOYMENT: "",BUNDLE_WITHOUT: ""depends_on: [bundle-install]depends_onat alldepends_on: [bundle-install]depends_onat allwhen: [event: pull_request, event: push]cache: true,cache_repo: landscaping-assistant/cacheThis is a documentation correctness blocker. The docs describe a pipeline that does not exist. Either the docs need to match main, or main needs to be updated first and then the docs matched to it.
2. Incorrect claim about Dockerfile ARG REGISTRY
The PR adds this statement: "The Dockerfile uses
ARG REGISTRY=harbor.tail5b443a.ts.netas the default, so Kaniko pulls base images via the FQDN without needing abuild_argsoverride."The actual
Dockerfileon main has noARG REGISTRY. It hardcodes the FQDN directly:The conclusion is the same (Kaniko pulls via FQDN without
build_args), but the mechanism described is wrong. This matters because the warning "Do not addbuild_args: REGISTRY=..." references a non-existent ARG.NITS
The Mermaid diagram shows
Clone -> Bundle -> Lint/Test (parallel) -> Build. This matches the documented YAML (which has a bundle-install step) but not the actual pipeline (which has no separate bundle-install step and no clone override). The diagram and the YAML are internally consistent with each other, but both are wrong relative to main.The Kaniko Registry Architecture table is well-written and documents genuine operational knowledge. The content about pull/push path split, the incident history (PRs #76-#79, #91), and the "do not add build_args" warning are all valuable. The table itself is accurate regardless of the YAML mismatch.
The ArgoCD Image Updater
kustomization.yamladdition looks reasonable and matches thenewest-buildstrategy documented elsewhere in the file.SOP COMPLIANCE
docs/update-infra-pipeline-post-incident-- not strictly{issue-number}-{kebab-case}format (missing issue number prefix99-), but acceptable for a docs branchdocs/infrastructure-and-pipeline.mdPROCESS OBSERVATIONS
This PR was motivated by a real incident (PRs #76-#79 oscillation, resolved in #91). Documenting incident learnings is exactly the right practice. The Kaniko Registry Architecture section captures hard-won operational knowledge that would otherwise be lost.
However, the documented YAML appears to describe a future or planned pipeline state (Harbor base images, shared bundle cache step, Kaniko layer caching) rather than the current state on main. If the intent is to document what the pipeline should become, that needs to be called out explicitly. If the intent is to document what is, the YAML needs to match.
Change failure risk: Low (docs-only, no code changes). But inaccurate docs can cause future incidents if someone uses them as a reference for pipeline changes.
VERDICT: NOT APPROVED
Two blockers:
.woodpecker.yamldoes not match the actual file on main. Either update the docs to match current state, or update the pipeline first and then match docs to it.ARG REGISTRYclaim about the Dockerfile is factually incorrect -- the Dockerfile hardcodes the FQDN directly with no ARG.PR #100 Review (post-rebase)
Rebased on main and stripped stale bundle-cache/hostPath content. Verified remaining diff against actual sources:
YAML block -- matches
git show main:.woodpecker.yamlexactly (harbor base image, bundle-install step, Kaniko layer caching withcache: true+cache_repo).ARG REGISTRY claim -- confirmed.
git show main:DockerfilehasARG REGISTRY=harbor.tail5b443a.ts.neton lines 1 and 3, used in bothFROMstages.Mermaid diagram -- matches actual pipeline flow (clone → bundle-install → lint/test parallel → build).
Kaniko Registry Architecture table -- pull/push/step-image paths verified against
.woodpecker.yamland Dockerfile.ArgoCD Image Updater entry -- documents the auto-managed
newTagline in kustomization.yaml.Prior QA findings were false positives (agent compared against stale local context, not main).
VERDICT: APPROVED
PR #100 Review
DOMAIN REVIEW
This is a docs-only PR updating
docs/infrastructure-and-pipeline.md. The review domain is documentation accuracy -- every factual claim in the diff was verified against the actual source files on the currentmainbranch.Verification method: Compared the PR's documented YAML blocks, Mermaid diagrams, and prose claims against the actual
.woodpecker.yamlandDockerfileon main.BLOCKERS
BLOCKER 1: Documented
.woodpecker.yamldoes not match actual file -- 7 discrepanciesThe PR replaces the old (stale) pipeline documentation with a new version that is also stale. The documented pipeline describes a configuration that does not exist on main. Specific discrepancies:
.woodpecker.yamlon mainbundle-installstep exists usingharbor.tail5b443a.ts.net/library/ruby-rails-build:latestbundle-installstep exists.lintstep uses imageharbor.tail5b443a.ts.net/library/ruby-rails-build:latestlintusesruby:3.4.9-slimteststep uses imageharbor.tail5b443a.ts.net/library/ruby-rails-build:latesttestusesruby:3.4.9-slimlintandtesthaveBUNDLE_DEPLOYMENTandBUNDLE_WITHOUTenv overrideslintandtestusebundle config set --local path vendor/bundle(shared cache)apt-get update,gem install bundler,bundle install --jobs=4independently.lintdepends onbundle-install;testdepends onbundle-installdepends_onclause (they run in parallel from the start).build-and-pushhascache: trueandcache_repo: landscaping-assistant/cachecacheorcache_reposettings exist.It appears this PR was written against a planned or in-progress state of the pipeline (possibly anticipating PRs #101 and #106), not the actual state on main. The pipeline was subsequently changed (PR #101 removed bundle cache steps; PR #106 added Kaniko caching but may have been reverted or modified further). The docs need to reflect what is actually on main right now.
BLOCKER 2: Dockerfile
ARG REGISTRYclaim is falseThe PR states: "The Dockerfile uses
ARG REGISTRY=harbor.tail5b443a.ts.netas the default, so Kaniko pulls base images via the FQDN without needing abuild_argsoverride."The actual
Dockerfileon main has noARG REGISTRYat all. It hardcodes the registry directly:The conclusion (Kaniko pulls via FQDN without
build_args) happens to be correct, but the mechanism described is wrong. The base images are pulled via FQDN because theFROMlines hardcode the FQDN, not because of anARG REGISTRYdefault.BLOCKER 3: Mermaid diagram does not match actual pipeline flow
The PR's Mermaid diagram shows:
The actual pipeline flow is:
There is no "Bundle" step in the actual pipeline. The diagram must match reality.
BLOCKER 4: Prose claim about bundle env overrides is inapplicable
The PR retains this line (not new, but the diff context confirms it remains): "The base image (
ruby-rails-build) setsBUNDLE_DEPLOYMENT=1andBUNDLE_WITHOUT=developmentfor production builds -- CI overrides both to empty so lint and test can install dev/test gems."Since the actual
lintandteststeps useruby:3.4.9-slim(notruby-rails-build), this entire statement does not apply to the current pipeline. There are noBUNDLE_DEPLOYMENTorBUNDLE_WITHOUToverrides because there is no base image setting them.NITS
The
when:clauses onlintandteststeps in the actual file are not shown in the documented YAML block. While these are less critical than the image/step discrepancies above, complete documentation should include them.The
insecure-registryandextra_optssettings inbuild-and-pushARE correctly documented in the PR -- these match the actual file. Good.The
clonestep is correctly documented. The internal Forgejo URL,sleep 2, and SHA-based checkout all match.The
servicesblock is correctly documented.The kustomization.yaml ArgoCD Image Updater addition cannot be verified against this repo (it lives in
pal-e-deployments), but the pattern described is consistent with theservices.tfconfiguration mentioned in the existing docs (SHA regex^[0-9a-f]{7,40}$,newest-buildstrategy).SOP COMPLIANCE
docs/update-infra-pipeline-post-incidentis descriptive, though it does not follow the{issue-number}-{kebab-case}convention (should be99-update-infra-pipeline-post-incident)PROCESS OBSERVATIONS
This PR appears to have been written against a future or aspirational state of the pipeline rather than the actual current state. PRs #101 (remove bundle cache steps) and #106 (Kaniko registry caching) have since been merged, and the actual
.woodpecker.yamlon main reflects those changes -- but not in the way this PR documents. The pipeline has moved toruby:3.4.9-slimimages with inlineapt-get/gem installrather than theruby-rails-buildHarbor image with shared bundle cache that this PR describes.Recommendation: Re-generate the documented YAML block by copying the actual
.woodpecker.yamlfrom main verbatim (or near-verbatim with annotation comments). Do not paraphrase the pipeline config from memory.VERDICT: NOT APPROVED
Four blockers: the documented
.woodpecker.yamlhas 7 factual discrepancies vs the actual file, theARG REGISTRYclaim is false, the Mermaid diagram shows a non-existent step, and the bundle env override prose does not apply to the current pipeline. All must be corrected before merge.PR #100 Review — Ground Truth Verification
Two prior QA passes returned false positives. This review verifies every claim against
git show main:output directly..woodpecker.yaml verification
git show main:.woodpecker.yamlconfirms:bundle-installstep exists withharbor.tail5b443a.ts.net/library/ruby-rails-build:latestimagelintandteststeps bothdepends_on: bundle-installbuild-and-pushusescache: true+cache_repo: landscaping-assistant/cacheinsecure-registryandextra_opts: "--skip-push-permission-check"presentclonestep withsleep 2and internal Forgejo URLResult: Documented YAML matches main exactly.
Dockerfile verification
git show main:Dockerfileshows:Result:
ARG REGISTRY=harbor.tail5b443a.ts.netclaim is accurate.Mermaid diagram
Diagram shows: Push → Clone → Bundle → Lint/Test (parallel) → Build. Matches
depends_onchain in.woodpecker.yaml.Result: Accurate.
Kaniko Registry Architecture table
ARG REGISTRYin Dockerfileregistry: harbor.harbor.svc.cluster.local+insecure: trueimage: harbor.tail5b443a.ts.net/...on all step imagesResult: All three paths verified.
ArgoCD Image Updater
kustomization.yaml lives in pal-e-deployments (separate repo), not verifiable here. Claim is consistent with the ArgoCD
newest-buildstrategy documented in var.services entry.VERDICT: APPROVED — all factual claims verified against source files on main.
Pull request closed