Pipeline optimization: single-stage Dockerfile + gem cache #188

Merged
ldraney merged 2 commits from 185-pipeline-optimization into main 2026-06-10 01:40:10 +00:00
Owner

Summary

Cut pipeline time by eliminating the multi-stage runtime rootfs unpack in the Kaniko build (~1m30s) and adding a persistent gem cache for CI (~60-80s on a warm cache). Restructure DORA docs into a directory for future expansion.

Closes #187

Changes

  • Dockerfile: Single-stage build from ruby-rails-build. Eliminates the second FROM ruby-rails-runtime stage and its 1m30s rootfs unpack. Non-root rails user created inline. Image is ~200MB larger (includes build tools) but builds faster.
  • .woodpecker.yaml: bundle-install step now copies gems to/from a PVC-backed /cache/bundle volume. On warm cache, bundle install drops from ~90s to ~10s. PVC creation instructions in inline comment.
  • docs/dora.md → docs/dora/pipeline-timing.md: DORA is broader than pipeline speed — restructure into a directory. Updated optimization table to reflect implemented changes.
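
The cache flow described for the bundle-install step can be sketched as below. This is a minimal illustration, not the actual .woodpecker.yaml: the function names `restore_cache`, `save_cache`, and `fake_bundle_install` are hypothetical, temporary directories stand in for `/cache/bundle` and `vendor/bundle`, the `mkdir -p` is added only so the sketch runs on its own, and the real step runs `bundle install` where the stand-in only creates a marker file.

```shell
#!/bin/sh
# Sketch of the restore -> install -> save flow, under the assumptions stated above.

# Copy cached gems into the working tree; a missing cache is not an error.
restore_cache() {  # $1 = cache dir, $2 = bundle dir
  mkdir -p "$2"
  cp -a "$1/." "$2/" 2>/dev/null || true
}

# Copy installed gems back to the cache; a missing cache is not an error.
save_cache() {     # $1 = bundle dir, $2 = cache dir
  cp -a "$1/." "$2/" 2>/dev/null || true
}

# Hypothetical stand-in for `bundle install`: only "installs" when the gem is absent.
fake_bundle_install() {  # $1 = bundle dir
  if [ -f "$1/example.gem" ]; then
    echo "cache hit"
  else
    echo "cache miss"
    echo "gem" > "$1/example.gem"
  fi
}

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/cache"            # stands in for the mounted PVC

# First run: cold cache, so the stand-in installer has to do the work.
restore_cache "$root/cache" "$root/run1"
first=$(fake_bundle_install "$root/run1")
save_cache "$root/run1" "$root/cache"

# Second run: warm cache, so the restored gems are already in place.
restore_cache "$root/cache" "$root/run2"
second=$(fake_bundle_install "$root/run2")

echo "run 1: $first, run 2: $second"
```

The second run reports a cache hit because the first run wrote its gems back to the cache directory, which is the effect the PVC is meant to provide across pipelines.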

Test Plan

  • [ ] Kaniko build succeeds in CI (this PR's pipeline validates the Dockerfile)
  • [ ] App boots and serves traffic with the single-stage image
  • [ ] Gem cache step is no-op until PVC is created (graceful fallback via 2>/dev/null || true)
  • [ ] docs/dora/ directory renders in Forgejo

Review Checklist

  • [x] Non-root user preserved (USER 1000:1000)
  • [x] No secrets in Dockerfile
  • [x] Gem cache fails gracefully without PVC
  • [x] .dockerignore unchanged
  • [ ] Feature flag: N/A

Related Notes

  • #185 / PR #186 -- DORA metrics analysis that identified these bottlenecks
  • docs/dora/pipeline-timing.md -- timing data and optimization table
Pipeline optimization: single-stage Dockerfile, gem cache, docs restructure
Some checks failed
CI / scan_ruby (pull_request) Waiting to run
CI / scan_js (pull_request) Waiting to run
CI / lint (pull_request) Waiting to run
ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pr/woodpecker Pipeline failed
c58309a8a7
- Dockerfile: eliminate multi-stage build. One rootfs unpack instead of
  two saves ~1m30s on every Kaniko cache miss. Tradeoff: ~200MB larger
  prod image (includes build tools). Non-root rails user preserved.

- .woodpecker.yaml: add PVC-backed gem cache for bundle-install step.
  Copies vendor/bundle to/from /cache/bundle persistent volume. Saves
  ~60-80s per pipeline when cache is warm. PVC creation documented in
  inline comment (requires pal-e-platform infra work).

- docs/dora.md → docs/dora/pipeline-timing.md: DORA is broader than
  pipeline speed. Restructure into directory for future metrics docs.
  Update optimization table to reflect implemented changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author
Owner

PR #188 Review

DOMAIN REVIEW

Tech stack: Dockerfile (container build), Woodpecker CI (k8s backend), Markdown docs.

Dockerfile (single-stage conversion)

The conversion from multi-stage to single-stage is sound. The key changes:

  • Removes AS build alias and the second FROM ruby-rails-runtime stage
  • Eliminates two COPY --from=build layers that triggered a full rootfs unpack in Kaniko
  • Creates the rails user inline with groupadd/useradd (previously provided by the runtime base image)
  • Sets USER 1000:1000 explicitly -- good security practice

The tradeoff (build tools remain in the final image, ~200MB larger) is properly documented in the inline comment. This is an acceptable tradeoff for a private deployment where image size is less critical than build speed.

One observation: the chown -R rails:rails db log storage tmp runs as root before the USER switch, which is correct. The order of operations is sound.

Woodpecker gem cache

The backend_options.kubernetes.volumeMounts syntax is correct for Woodpecker's k8s backend. The inline PVC creation instructions and the WOODPECKER_BACKEND_K8S_VOLUMES server config reference are accurate and helpful.

The volume name bundle-cache is consistent between the volumeMounts declaration and the server config JSON in the comments.

.dockerignore: Already contains /vendor/bundle (line 54), which prevents CI-installed gems from being copied into the Docker build context. This is important and unchanged -- no issue here.

BLOCKERS

1. Gem cache write step lacks graceful fallback

- cp -a /cache/bundle/. vendor/bundle/ 2>/dev/null || true   # READ: has fallback
- bundle config set --local path vendor/bundle
- bundle install --jobs=4
- cp -a vendor/bundle/. /cache/bundle/                        # WRITE: NO fallback

The read step (cp -a /cache/bundle/. vendor/bundle/) correctly degrades with 2>/dev/null || true when the PVC is absent. However, the write step (cp -a vendor/bundle/. /cache/bundle/) has no such guard. If the PVC is not mounted:

  • /cache/bundle/ won't exist as a directory
  • cp -a will fail with a non-zero exit code
  • The pipeline step will fail

This breaks the stated design goal: "Gem cache step is no-op until PVC is created (graceful fallback)."

Fix: add 2>/dev/null || true to the write step as well:

cp -a vendor/bundle/. /cache/bundle/ 2>/dev/null || true
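
The difference between the guarded and unguarded write can be reproduced locally. The sketch below is only an illustration under stated assumptions: a temporary directory stands in for the workspace, and `$missing` is a hypothetical stand-in for `/cache/bundle` when the PVC is not mounted (neither the directory nor its parent exists).

```shell
#!/bin/sh
# Sketch: exit status of the cache write-back when the cache path is absent.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/vendor/bundle"
echo "gem" > "$work/vendor/bundle/example.gem"
missing="$work/cache/bundle"   # stand-in for /cache/bundle; never created

# Unguarded write: cp exits non-zero, which would fail a CI step.
if cp -a "$work/vendor/bundle/." "$missing/" 2>/dev/null; then
  unguarded=0
else
  unguarded=$?
fi

# Guarded write: `|| true` turns the failure into a success status.
cp -a "$work/vendor/bundle/." "$missing/" 2>/dev/null || true
guarded=$?

echo "unguarded=$unguarded guarded=$guarded"
```

The unguarded status is non-zero and the guarded status is zero, which matches the behavior described for the read and write steps.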

NITS

1. Branch naming mismatch (minor SOP deviation)

Branch is 185-pipeline-optimization but the parent issue is #187. Issue #185 is the closed DORA metrics analysis issue. The branch should reference the issue it closes. This is not blocking since the PR body correctly states Closes #187, but it creates confusion in branch-to-issue traceability.

2. docs/dora/ directory has only one file

Renaming docs/dora.md to docs/dora/pipeline-timing.md is forward-looking (the PR body says "DORA is broader than pipeline speed -- restructure into a directory"). However, a single-file directory looks premature. Consider adding a docs/dora/README.md or index file. Non-blocking -- this is a structural preference.

3. Timing table accuracy

The updated build-and-push row shows ~3m50s (single-stage, one unpack) / ~1m30s (fully cached). The PR summary claims "~1m30s savings" from eliminating the second rootfs unpack. These numbers are consistent, but the "fully cached" scenario (~1m30s) represents a separate optimization (Kaniko layer cache hit) that is not part of this PR. Consider clarifying that the ~1m30s figure is aspirational/pre-existing, not delivered by this change. Non-blocking.

SOP COMPLIANCE

  • [ ] Branch named after parent issue -- branch is 185-pipeline-optimization but parent issue is #187
  • [x] PR body follows template -- has Summary, Changes, Test Plan, Review Checklist, Related Notes
  • [x] Closes #187 present in PR body
  • [x] Related Notes references prior work (#185 / PR #186)
  • [x] No secrets committed -- Harbor credentials use from_secret, no plaintext secrets
  • [x] No unnecessary file changes -- all 3 files are directly related to the optimization goal
  • [x] .dockerignore unchanged and correct

PROCESS OBSERVATIONS

  • This PR directly targets the two largest bottlenecks identified in the DORA analysis (#185): Kaniko rootfs unpack and CI gem installation. Good data-driven approach.
  • The PVC-based gem cache introduces an operational dependency (PVC must be created manually). The inline documentation is good, but this is a manual step that could be forgotten. Consider tracking PVC creation as a follow-up task or checklist item.
  • The ReadWriteOnce access mode on the PVC means concurrent pipeline runs on different nodes will fail to mount the volume. For a single-node setup this is fine, but worth noting if the cluster scales. ReadWriteMany would be needed for multi-node, but requires a different storage class.

VERDICT: NOT APPROVED

One blocker: the gem cache write step will break the pipeline when the PVC is absent. Add the 2>/dev/null || true fallback to the write-side cp command to match the read-side behavior. After that fix, this is ready to merge.

Fix gem cache write fallback for missing PVC
Some checks failed
ci/woodpecker/push/woodpecker Pipeline was successful
ci/woodpecker/pr/woodpecker Pipeline was successful
CI / scan_ruby (pull_request) Has been cancelled
CI / scan_js (pull_request) Has been cancelled
CI / lint (pull_request) Has been cancelled
27839534ba
Without the fallback, cp fails and breaks bundle-install when
the PVC is not yet mounted.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author
Owner

PR #188 Review (Re-review)

DOMAIN REVIEW

Tech stack: Dockerfile, Woodpecker CI (Kubernetes backend), Markdown docs. Applying container security, CI/CD pipeline, and k8s volume patterns.

Dockerfile (single-stage refactor)

  • Eliminates the second FROM ruby-rails-runtime stage and its COPY --from=build rootfs unpack. Sound optimization -- the prior two-stage pattern doubled the image unpack work in Kaniko.
  • Non-root user created inline via groupadd/useradd with explicit UID/GID 1000. USER 1000:1000 set before ENTRYPOINT. Correct security posture.
  • chown -R rails:rails db log storage tmp ensures the app directories are writable by the non-root user. Correct.
  • Build comment documents the tradeoff (image ~200MB larger, build ~1m30s faster). Good operational context.
  • AS build alias removed since there is no second stage to reference it. Clean.

Woodpecker CI (.woodpecker.yaml)

  • PVC-backed gem cache with backend_options.kubernetes.volumeMounts referencing /cache/bundle. Standard Woodpecker k8s volume pattern.
  • Read side: cp -a /cache/bundle/. vendor/bundle/ 2>/dev/null || true -- graceful fallback when PVC is empty or missing. Correct.
  • Write side: cp -a vendor/bundle/. /cache/bundle/ 2>/dev/null || true -- previous blocker is fixed. The 2>/dev/null || true fallback is now present on both sides.
  • Inline PVC creation instructions in YAML comments are helpful for operators. The ReadWriteOnce access mode is appropriate for a single pipeline runner. If pipelines run in parallel on different nodes, this would need ReadWriteMany, but that is a future concern and documented implicitly by the access mode choice.
  • 1Gi PVC size is reasonable for a Ruby gem cache (~153 gems).
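
The `2>/dev/null || true` guard on both sides also hides failures unrelated to a missing PVC, such as a full volume. One possible alternative, shown purely as a sketch and not what this PR implements, is to test for the cache directory first and let real copy errors surface. `sync_to_cache` is a hypothetical helper name, and temporary directories stand in for `vendor/bundle` and `/cache/bundle`.

```shell
#!/bin/sh
# Sketch: skip the copy when the cache directory is absent, but report real copy failures.
sync_to_cache() {  # $1 = source dir, $2 = cache dir
  if [ ! -d "$2" ]; then
    echo "cache not mounted, skipping"
    return 0
  fi
  cp -a "$1/." "$2/"
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/vendor/bundle" "$work/mounted"
echo "gem" > "$work/vendor/bundle/example.gem"

# Cache absent: the helper succeeds without copying anything.
skipped=$(sync_to_cache "$work/vendor/bundle" "$work/absent")

# Cache present: the helper copies, and its exit status is cp's exit status.
sync_to_cache "$work/vendor/bundle" "$work/mounted"
copied=$?

echo "absent: $skipped; present: exit $copied"
```

Whether the extra verbosity is worth it depends on how much a silently stale cache would matter; the one-line guard used in the PR is simpler and keeps the step short.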

Docs rename (dora.md to dora/pipeline-timing.md)

  • Clean rename with similarity index 95%. Only content changes are to the optimization table, updating rows to reflect implemented optimizations vs. planned ones.
  • Updated timing estimates are consistent with the Dockerfile and CI changes in this PR.
  • The "fully cached" vs "single-stage, one unpack" distinction in the build-and-push row is informative.

BLOCKERS

None. The previous blocker (missing 2>/dev/null || true on the write-side cp) is resolved.

NITS

  1. Branch name mismatch: Branch is 185-pipeline-optimization but the parent issue is #187. Convention is {issue-number}-{kebab-case-purpose}, so the branch should be 187-pipeline-optimization. Not blocking since the PR body correctly references Closes #187, but worth noting for traceability.

  2. PVC access mode: The inline comment specifies ReadWriteOnce. If parallel pipelines are ever needed across nodes, this would need to become ReadWriteMany (or use a cache-per-node strategy). A brief note in the comment about this constraint would help future operators.

  3. Volume name coupling: The volumeMounts entry references name: bundle-cache, which must match the volume name configured in WOODPECKER_BACKEND_K8S_VOLUMES. The inline comment shows the PVC name as woodpecker-bundle-cache and the volume name in the JSON as bundle-cache. This mapping is correct, but the indirection could confuse someone. Consider adding a note that the name field in volumeMounts must match the name in the server-side volume config, not the PVC name.

SOP COMPLIANCE

  • [ ] Branch named after issue (185-pipeline-optimization references #185, not parent issue #187)
  • [x] PR body follows template (Summary, Changes, Test Plan, Related present)
  • [x] Related section references prior work (#185 / PR #186)
  • [x] No secrets committed (.dockerignore excludes .env*, credential keys; no hardcoded secrets in diff)
  • [x] No unnecessary file changes (all 3 files directly serve the optimization goal)
  • [x] Commit messages are descriptive (PR title clear)

PROCESS OBSERVATIONS

  • Lead time impact: Positive. Cutting ~1m30s from each Kaniko cache-miss build and ~60-80s from gem install on a warm cache directly shortens lead time for changes (a DORA metric). The docs update tracking these gains is good practice.
  • Change failure risk: Low. The Dockerfile change is a simplification (removing a stage), not adding complexity. The gem cache is additive with graceful degradation -- pipelines work identically without the PVC.
  • Documentation: The DORA docs restructure into docs/dora/ sets up a clean directory for future DORA content (deployment frequency, change failure rate, etc.).
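
The two savings quoted in this PR combine as simple arithmetic. The sketch below adds the figures from the PR description (about 1m30s from the removed rootfs unpack, and 60-80s from a warm gem cache). The variable names and the `fmt` helper are illustrative, and the result is an estimate for a pipeline where both savings apply, not a measured value.

```shell
#!/bin/sh
# Sketch: combine the per-pipeline savings quoted in the PR (all values in seconds).
unpack_saving=90      # ~1m30s from dropping the second rootfs unpack
gem_saving_low=60     # lower bound of the warm gem cache saving
gem_saving_high=80    # upper bound of the warm gem cache saving

total_low=$((unpack_saving + gem_saving_low))
total_high=$((unpack_saving + gem_saving_high))

# Format a number of seconds as XmYYs.
fmt() { printf '%dm%02ds' $(($1 / 60)) $(($1 % 60)); }

echo "estimated saving: $(fmt "$total_low") to $(fmt "$total_high")"
# prints "estimated saving: 2m30s to 2m50s"
```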

VERDICT: APPROVED

Previous blocker is resolved. All three changes are clean, well-documented, and correctly scoped. The branch naming mismatch is a nit, not a blocker.

ldraney deleted branch 185-pipeline-optimization 2026-06-10 01:40:10 +00:00
Reference
ldraney/landscaping-assistant!188