Fix pgvector extension ownership -- platform-provides pattern #140
Summary
Replace `CREATE EXTENSION vector` (requires superuser) in the Alembic migration with a runtime existence check that raises an informative error if the extension is missing. Add a CNPG Cluster CRD manifest that provisions the pgvector extension automatically on cluster bootstrap via `postInitSQL`. This follows the platform-provides/app-consumes pattern.

Changes

- `alembic/versions/l2g3h4i5j6k7_add_vector_embeddings.py` -- Replaced `CREATE EXTENSION IF NOT EXISTS vector` with a `pg_extension` existence check that raises `RuntimeError` with a clear message if the extension is not installed. Removed `DROP EXTENSION IF EXISTS vector` from `downgrade()` since the extension lifecycle is now platform-owned.
- `k8s/cnpg-cluster.yaml` (new) -- CNPG Cluster CRD manifest codifying the running `pal-e-postgres` cluster config, with `bootstrap.initdb.postInitSQL: [CREATE EXTENSION IF NOT EXISTS vector]` added. This ensures fresh cluster bootstraps provision pgvector automatically as superuser.
- `k8s/kustomization.yaml` -- Added `cnpg-cluster.yaml` to the resource list.

Test Plan
- `ruff check` passes on the migration file
- `pytest tests/ -v` -- all 614 tests pass
- `postInitSQL`

Review Checklist
Related
`plan-2026-02-26-tf-modularize-postgres`, phase `postgres-6-vector-search` (sub-phase 6b-1). Closes #126
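For illustration, the existence-check pattern the PR describes can be sketched as follows. This is a hypothetical reconstruction, not the actual migration file; the helper name `ensure_pgvector` and its wiring are assumptions. In a real migration, `execute_scalar` would wrap `op.get_bind().execute(sa.text(...)).scalar()`.

```python
def ensure_pgvector(execute_scalar) -> None:
    """Raise if the pgvector extension is absent from the database.

    `execute_scalar` runs a SQL string and returns a scalar result.
    Injecting it keeps this sketch testable without a live database.
    """
    installed = execute_scalar(
        "SELECT count(*) FROM pg_extension WHERE extname = 'vector'"
    )
    if not installed:
        # Raising instead of running CREATE EXTENSION keeps the app role
        # non-superuser; the platform (CNPG postInitSQL) owns the
        # extension lifecycle.
        raise RuntimeError(
            "pgvector extension is not installed; it must be provisioned "
            "by the platform (CNPG postInitSQL) before this migration runs."
        )
```

The same shape works if `RuntimeError` is swapped for `alembic.util.CommandError`, which integrates with Alembic's own error reporting.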
PR #140 Review
BLOCKERS
1. Cross-namespace CNPG Cluster CRD will fail or misplace the resource

The new `k8s/cnpg-cluster.yaml` hardcodes `namespace: postgres` in its metadata. However, every other manifest in `k8s/` omits `namespace:` and relies on ArgoCD to place resources into the `pal-e-docs` namespace. This creates two possible failure modes:

- The manifest's hardcoded namespace wins and the resource lands in the `postgres` namespace. But ArgoCD's Application is scoped to the `pal-e-docs` namespace, so ArgoCD may lack RBAC to create resources in `postgres`, or the resource may be created but not tracked/synced properly by ArgoCD.
- ArgoCD's destination namespace wins and the resource lands in `pal-e-docs`, which is wrong -- the running `pal-e-postgres` cluster is in the `postgres` namespace.

Either way, this is a problem. The CNPG Cluster CRD is a platform-level resource living in the `postgres` namespace, managed alongside platform infrastructure (the CNPG operator, S3 creds, etc. are all in `terraform/main.tf`). Placing it in the app repo's `k8s/` directory alongside app-level Deployments and Services conflates platform and application concerns.

Recommendation: This CRD belongs in `pal-e-platform` (either as a Terraform `kubernetes_manifest` or a separate platform k8s directory), not in `pal-e-docs/k8s/`. The pal-e-docs repo should only contain the migration fix (the existence check). The CNPG CRD is a platform-provides concern -- it should live where the CNPG operator, namespace, and S3 creds already live.

2. Applying this CRD to a running cluster will re-bootstrap the database
The `bootstrap.initdb` section in the CRD describes the initial database setup. On a running CNPG cluster, re-applying this manifest with `initdb` present is safe only if the cluster already exists and CNPG ignores the bootstrap section on updates. However, if this manifest were ever used to recreate the cluster (disaster recovery, migration), the `postInitSQL` would correctly provision the extension. The concern is more about intent documentation: this is a codification of the running state, not a live-apply manifest. The PR body mentions this, but the manifest itself has no comment clarifying it.

NITS
1. Downgrade comment references "other databases"
The downgrade comment says the extension "may be used by other databases or future migrations." Currently there is only one database (`paledocs`) on this cluster. While the reasoning is correct (the platform owns the extension lifecycle), the justification could be more precise: the extension is platform-owned, full stop. No need to speculate about other databases.

2. RuntimeError vs. Alembic-idiomatic error
Using `RuntimeError` works, but Alembic users might expect `alembic.util.CommandError` for migration failures. This is non-blocking since the error message is clear and actionable, but `CommandError` would integrate better with Alembic's error handling.

3. CNPG CRD has no comments explaining its purpose
A brief comment at the top of `cnpg-cluster.yaml` explaining that this codifies the running cluster spec, and that `postInitSQL` is the key addition, would help future readers understand why this file exists and what it adds over the original manual creation.

4. The `archive_mode: "on"` and `archive_timeout: 5min` parameters
archive_mode: "on"andarchive_timeout: 5minparametersThese are fine for the running cluster, but
archive_timeout: 5minis a string without quotes in the YAML while other string values likemax_connectionsuse explicit quotes. Consistency would be cleaner, though CNPG likely handles both.SOP COMPLIANCE
- Branch `126-fix-pgvector-extension-ownership` matches issue #126
- Plan reference: `plan-2026-02-26-tf-modularize-postgres`
- `Closes #126` present in PR body

VERDICT: NOT APPROVED
The migration change itself (the existence check replacing `CREATE EXTENSION`) is clean and correct. The downgrade change (removing `DROP EXTENSION`) is also correct. However, Blocker #1 -- the CNPG Cluster CRD being placed in the app repo's `k8s/` directory when it is a platform-level resource -- is a placement/architecture issue that needs resolution before merge. The platform repo (`pal-e-platform`) already manages the CNPG operator, the `postgres` namespace, and the S3 credentials for this cluster. The Cluster CRD should live there too.

Recommended fix path:

1. Remove `cnpg-cluster.yaml` and the kustomization change from this PR (keep them for a separate platform PR)
2. Open a follow-up against `pal-e-platform` to add the CNPG Cluster CRD with `postInitSQL` to the platform repo

f7c6cba6c8c9f7639ba2
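For reference, the platform-side stanza this review wants moved to `pal-e-platform` might look roughly like the following. This is a hedged sketch based on the CNPG Cluster CRD schema and the fields the review mentions; the instance count and other values are placeholders, not the actual manifest.

```yaml
# Sketch only: codifies the running pal-e-postgres cluster spec.
# The key addition over the original manual creation is
# bootstrap.initdb.postInitSQL, which provisions pgvector as superuser.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pal-e-postgres
  namespace: postgres      # platform namespace, per the review
spec:
  instances: 1             # placeholder value
  bootstrap:
    initdb:
      # CNPG only honors this section when the cluster is first
      # bootstrapped; it is ignored on updates to a running cluster.
      postInitSQL:
        - CREATE EXTENSION IF NOT EXISTS vector
```

Note that CNPG runs `postInitSQL` against the default `postgres` database; if the extension must exist in the application database, CNPG also offers `postInitApplicationSQL`, which may be worth checking when the platform PR is opened.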