Fix pgvector extension ownership -- platform-provides pattern #140

Merged
forgejo_admin merged 2 commits from 126-fix-pgvector-extension-ownership into main 2026-03-09 22:58:44 +00:00

Summary

Replace CREATE EXTENSION vector (requires superuser) in the Alembic migration with a runtime existence check that raises an informative error if the extension is missing. Add a CNPG Cluster CRD manifest that provisions the pgvector extension automatically on cluster bootstrap via postInitSQL. This follows the platform-provides/app-consumes pattern.

Changes

  • alembic/versions/l2g3h4i5j6k7_add_vector_embeddings.py -- Replaced CREATE EXTENSION IF NOT EXISTS vector with a pg_extension existence check that raises RuntimeError with a clear message if the extension is not installed. Removed DROP EXTENSION IF EXISTS vector from downgrade() since extension lifecycle is now platform-owned.
  • k8s/cnpg-cluster.yaml (new) -- CNPG Cluster CRD manifest codifying the running pal-e-postgres cluster config with bootstrap.initdb.postInitSQL: [CREATE EXTENSION IF NOT EXISTS vector] added. This ensures fresh cluster bootstraps provision pgvector automatically as superuser.
  • k8s/kustomization.yaml -- Added cnpg-cluster.yaml to the resource list.
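For reference, the migration's runtime check can be sketched roughly as below. The function name, the scalar-fetch wrapper, and the error message are illustrative assumptions, not the actual file contents; in the real migration the callable would wrap op.get_bind().execute(...).scalar():

```python
# Illustrative sketch of the migration's pg_extension existence check.
PGVECTOR_CHECK_SQL = "SELECT 1 FROM pg_extension WHERE extname = 'vector'"

def ensure_pgvector_installed(fetch_scalar):
    """Raise if the pgvector extension is absent from pg_extension.

    fetch_scalar: callable that runs a SQL string against the database
    and returns the first column of the first row, or None if no rows
    matched. In the real migration this wraps the Alembic connection.
    """
    if fetch_scalar(PGVECTOR_CHECK_SQL) is None:
        # Hypothetical wording; the actual message lives in the migration.
        raise RuntimeError(
            "pgvector extension 'vector' is not installed. The platform "
            "provisions it via CNPG postInitSQL; install it before "
            "running this migration."
        )
```

In upgrade(), a check like this would run before any table or column creation that uses vector types; downgrade() no longer drops the extension, since its lifecycle is platform-owned.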

Test Plan

  • ruff check passes on the migration file
  • pytest tests/ -v -- all 614 tests pass
  • Verify migration is idempotent on the running cluster (extension already installed, check passes)
  • Verify a fresh CNPG cluster bootstrap would provision the extension via postInitSQL

Review Checklist

  • No unrelated changes included
  • Migration revision ID and chain are unchanged
  • Tests pass (614/614)
  • Lint passes (ruff check)
  • CNPG CRD matches running cluster spec (minus postInitSQL addition)

Related

  • Plan: plan-2026-02-26-tf-modularize-postgres
  • Phase: phase-postgres-6-vector-search (sub-phase 6b-1)
  • Forgejo issue: #126

Closes #126

Fix pgvector extension ownership -- platform-provides pattern
Some checks failed
ci/woodpecker/pr/woodpecker Pipeline failed
f7c6cba6c8
Replace CREATE EXTENSION vector (requires superuser) in the Alembic
migration with an existence check that raises an informative error if
the extension is missing.  Remove DROP EXTENSION from downgrade since
the extension lifecycle is now platform-owned.

Add CNPG Cluster CRD manifest (k8s/cnpg-cluster.yaml) with
bootstrap.initdb.postInitSQL to provision the pgvector extension
automatically on cluster bootstrap as superuser.

Closes #126

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

PR #140 Review

BLOCKERS

1. Cross-namespace CNPG Cluster CRD will fail or misplace the resource

The new k8s/cnpg-cluster.yaml hardcodes namespace: postgres in its metadata. However, every other manifest in k8s/ omits namespace: and relies on ArgoCD to place resources into the pal-e-docs namespace. This creates two possible failure modes:

  • If ArgoCD respects the hardcoded namespace: The Cluster CRD is created in the postgres namespace. But ArgoCD's Application is scoped to the pal-e-docs namespace. ArgoCD may lack RBAC to create resources in postgres, or the resource may be created but not tracked/synced properly by ArgoCD.
  • If ArgoCD overrides the namespace: The Cluster CRD ends up in pal-e-docs, which is wrong -- the running pal-e-postgres cluster is in the postgres namespace.

Either way, this is a problem. The CNPG Cluster CRD is a platform-level resource living in the postgres namespace, managed alongside platform infrastructure (the CNPG operator, S3 creds, etc. are all in terraform/main.tf). Placing it in the app repo's k8s/ directory alongside app-level Deployments and Services conflates platform and application concerns.

Recommendation: This CRD belongs in pal-e-platform (either as a Terraform kubernetes_manifest or a separate platform k8s directory), not in pal-e-docs/k8s/. The pal-e-docs repo should only contain the migration fix (the existence check). The CNPG CRD is a platform-provides concern -- it should live where the CNPG operator, namespace, and S3 creds already live.

2. The CRD's bootstrap.initdb codifies initial setup; the intent needs documenting

The bootstrap.initdb section in the CRD describes the initial database setup. Re-applying this manifest to the running CNPG cluster is safe only because the cluster already exists and CNPG ignores the bootstrap section on updates. If the manifest were ever used to recreate the cluster (disaster recovery, migration), the postInitSQL would correctly provision the extension. The real concern is intent documentation: this file codifies the running state rather than serving as a live-apply manifest, and while the PR body says as much, the manifest itself carries no comment clarifying it.

NITS

1. Downgrade comment references "other databases"

The downgrade comment says the extension "may be used by other databases or future migrations." Currently there is only one database (paledocs) on this cluster. While the reasoning is correct (platform owns extension lifecycle), the justification could be more precise: the extension is platform-owned, full stop. No need to speculate about other databases.

2. RuntimeError vs. Alembic-idiomatic error

Using RuntimeError works but Alembic users might expect alembic.util.CommandError for migration failures. This is non-blocking since the error message is clear and actionable, but CommandError would integrate better with Alembic's error handling.

3. CNPG CRD has no comments explaining its purpose

A brief comment at the top of cnpg-cluster.yaml explaining that this codifies the running cluster spec and that postInitSQL is the key addition would help future readers understand why this file exists and what it adds over the original manual creation.

4. The archive_mode: "on" and archive_timeout: 5min parameters

These are fine for the running cluster, but archive_timeout: 5min is a string without quotes in the YAML while other string values like max_connections use explicit quotes. Consistency would be cleaner, though CNPG likely handles both.
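Combining the suggestions above, the top of cnpg-cluster.yaml could look roughly like this. The spec values (instance count, parameter values) are illustrative assumptions, not the running cluster's actual configuration; only the apiVersion, kind, names, and postInitSQL entry are taken from the PR:

```yaml
# k8s/cnpg-cluster.yaml
# Codifies the running pal-e-postgres cluster spec. This is a record of
# bootstrap-time configuration, not a live-apply manifest: CNPG ignores
# bootstrap.initdb on an existing cluster. The key addition over the
# original manual creation is postInitSQL, which provisions pgvector
# as superuser on a fresh bootstrap.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pal-e-postgres
  namespace: postgres
spec:
  instances: 2                   # illustrative; match the running cluster
  bootstrap:
    initdb:
      database: paledocs
      postInitSQL:
        - CREATE EXTENSION IF NOT EXISTS vector
  postgresql:
    parameters:
      max_connections: "100"     # illustrative value
      archive_mode: "on"
      archive_timeout: "5min"    # quoted for consistency (nit 4)
```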

SOP COMPLIANCE

  • Branch named after issue: 126-fix-pgvector-extension-ownership matches issue #126
  • PR body has: Summary, Changes, Test Plan, Related sections
  • Related section references plan slug: plan-2026-02-26-tf-modularize-postgres
  • Closes #126 present in PR body
  • No secrets, .env files, or credentials committed
  • Commit messages are descriptive (title is clear)
  • No application code touched (scope is correct for migration + CRD)
  • Scope concern: CNPG Cluster CRD placement in app repo vs. platform repo (see Blocker #1)
  • Migration revision ID and chain unchanged
  • Tests reported passing (614/614)

VERDICT: NOT APPROVED

The migration change itself (existence check replacing CREATE EXTENSION) is clean and correct. The downgrade change (removing DROP EXTENSION) is also correct. However, Blocker #1 -- the CNPG Cluster CRD being placed in the app repo's k8s/ directory when it is a platform-level resource -- is a placement/architecture issue that needs resolution before merge. The platform repo (pal-e-platform) already manages the CNPG operator, the postgres namespace, and the S3 credentials for this cluster. The Cluster CRD should live there too.

Recommended fix path:

  1. Remove cnpg-cluster.yaml and the kustomization change from this PR (keep them for a separate platform PR)
  2. Merge the migration-only change in this PR
  3. Open a new issue on pal-e-platform to add the CNPG Cluster CRD with postInitSQL to the platform repo
forgejo_admin force-pushed 126-fix-pgvector-extension-ownership from f7c6cba6c8 to c9f7639ba2 (checks failed: ci/woodpecker/pr/woodpecker) 2026-03-09 22:53:09 +00:00
forgejo_admin deleted branch 126-fix-pgvector-extension-ownership 2026-03-09 22:58:44 +00:00