Migrate pal-e-docs from SQLite to Postgres (CNPG) #76

Closed
opened 2026-03-06 17:13:25 +00:00 by forgejo_admin · 2 comments
Contributor

Plan

plan-2026-02-26-tf-modularize-postgres -- Phase 3, Parts B/C/D (traceability only)

Repo

forgejo_admin/pal-e-docs

User Story

As a platform operator
I want pal-e-docs running on the shared CNPG Postgres cluster
So that migrations are safe (no SQLite auto-commit DDL crashes) and we unlock future search capabilities

Context

pal-e-docs currently uses SQLite with Litestream for backup/replication. A CNPG Postgres cluster is already running:

  • Cluster: pal-e-postgres in postgres namespace (healthy, Postgres 17.4)
  • Database: paledocs, user: paledocs
  • Endpoint: pal-e-postgres-rw.postgres.svc.cluster.local:5432
  • Connection string will be provided via k8s secret paledocs-db-url key DATABASE_URL in the pal-e-docs namespace (created by Terraform — pal-e-platform issue #22, applied before this work deploys)

ArgoCD auto-syncs from pal-e-docs/k8s/ on main with prune + self-heal enabled. When PVC and ConfigMap are removed from kustomization.yaml, ArgoCD will delete them. This is intentional.

This is a two-PR delivery (see Deployment Sequence below).

SQLite-isms in Existing Migrations

The following patterns in alembic/versions/ are SQLite-specific and will fail on Postgres:

  1. sa.text('(CURRENT_TIMESTAMP)') — files: 54714bc57fbe, a1b2c3d4e5f6, b2c3d4e5f6a7. Postgres wants CURRENT_TIMESTAMP (no parens) or now().
  2. server_default=sa.text('1') for Boolean — file: b2c3d4e5f6a7 (users table is_approved). Postgres wants true.
  3. f6a7b8c9d0e1 (sprint schema expansion) — upgrade is pass (no-op on SQLite). On Postgres with native_enum=False, CHECK constraints for expanded enum values may need handling.

Recommended approach: Since we're running against an empty Postgres database, the cleanest path is:

  • Option A: Fix each migration to be dialect-aware (add if context.get_x_argument... or use sa.func.now() instead of sa.text)
  • Option B: Generate a single fresh baseline migration from the current models for Postgres, squashing the chain
  • Preference: Option A — fix in-place so the migration chain works on both SQLite (local dev) and Postgres (production). Replace sa.text('(CURRENT_TIMESTAMP)') with sa.func.now(), replace sa.text('1') with sa.text('true') guarded by dialect check or just use sa.true_().

File Targets

PR 1 — Code changes (Postgres support, backward compatible):

  • src/pal_e_docs/config.py -- add database_url: str | None = None setting
  • src/pal_e_docs/database.py -- use database_url when set, fall back to SQLite. Conditionally attach SQLite pragma listener.
  • alembic/env.py -- use settings.database_url when available
  • alembic/versions/54714bc57fbe_initial_schema.py -- fix (CURRENT_TIMESTAMP)sa.func.now()
  • alembic/versions/a1b2c3d4e5f6_add_repos_table.py -- same fix
  • alembic/versions/b2c3d4e5f6a7_add_users_table.py -- fix timestamp + boolean default
  • alembic/versions/c3d4e5f6a7b8_add_project_is_public_and_page_note_fk.py -- fix boolean default if present
  • alembic/versions/f6a7b8c9d0e1_sprint_schema_expansion.py -- add Postgres CHECK constraint handling
  • pyproject.toml -- add psycopg2-binary dependency (or psycopg[binary])
  • Dockerfile -- if psycopg2-binary needs system deps, add them

Files NOT to touch:

  • k8s/* -- that's PR 2
  • tests/ -- existing tests should still pass against SQLite

PR 2 — Manifest changes (switch to Postgres, remove Litestream):

  • k8s/deployment.yaml -- replace PALDOCS_DATABASE_PATH env with PALDOCS_DATABASE_URL from secret, remove Litestream init + sidecar containers, remove data volume + litestream-config volume
  • k8s/kustomization.yaml -- remove pvc.yaml and litestream-configmap.yaml
  • k8s/pvc.yaml -- DELETE this file
  • k8s/litestream-configmap.yaml -- DELETE this file

Data migration script (can live in repo root or scripts/):

  • scripts/migrate_sqlite_to_postgres.py -- dump SQLite → insert into Postgres, reset PK sequences

Acceptance Criteria

  • App starts successfully with PALDOCS_DATABASE_URL=postgresql://... env var
  • App still starts with PALDOCS_DATABASE_PATH=/data/foo.db (backward compat for local dev)
  • alembic upgrade head runs cleanly against an empty Postgres database
  • All existing API endpoints return correct data after migration
  • MCP tools (list_notes, get_note, create_note, etc.) work against Postgres
  • Litestream containers removed from deployment
  • PVC and ConfigMap removed from kustomization

Test Expectations

  • Existing tests pass (they use SQLite — no change needed)
  • Manual verification: run alembic upgrade head against a local Postgres
  • Manual verification: run data migration script against test data
  • Run: pytest tests/ (should pass unchanged for PR 1)

Deployment Sequence (CRITICAL)

1. Merge PR 1 (code) → ArgoCD deploys. App still uses SQLite. Nothing changes.
2. Lucas runs `tofu apply` on pal-e-platform (creates DB secret)
3. Lucas runs `alembic upgrade head` against Postgres (creates schema)
4. Lucas runs `scripts/migrate_sqlite_to_postgres.py` (loads data)
5. Lucas verifies data in Postgres
6. Merge PR 2 (manifests) → ArgoCD deploys. App switches to Postgres. Litestream removed.
7. Verify app is healthy, API responds, data intact.

Steps 2-5 are manual — coordinated by Betty Sue during a short maintenance window. The dev agent delivers PR 1, PR 2, and the migration script. Lucas handles the operational steps.

Constraints

  • database.py engine creation must be at module level (current pattern) — don't change to lazy init
  • Keep config.py using pydantic-settings with env_prefix = "PALDOCS_"
  • The PALDOCS_DATABASE_URL env var name must match the config field name (database_url) with the prefix
  • psycopg2-binary is fine for now (avoids libpq build dep). Can switch to psycopg[binary] (psycopg3) if preferred.
  • Do NOT run alembic upgrade head against production — that's Lucas's job during the maintenance window
  • PR 1 and PR 2 should be separate branches and separate PRs

Checklist

  • PR 1 opened (code changes)
  • PR 2 opened (manifest changes)
  • Migration script included
  • Tests pass
  • No unrelated changes
  • pal-e-platform project -- issue #22 (Terraform secret) must be applied before PR 2 merges
  • plan-2026-02-25-platform-observability -- Litestream removal affects backup strategy
### Plan `plan-2026-02-26-tf-modularize-postgres` -- Phase 3, Parts B/C/D (traceability only) ### Repo `forgejo_admin/pal-e-docs` ### User Story As a platform operator I want pal-e-docs running on the shared CNPG Postgres cluster So that migrations are safe (no SQLite auto-commit DDL crashes) and we unlock future search capabilities ### Context pal-e-docs currently uses SQLite with Litestream for backup/replication. A CNPG Postgres cluster is already running: - Cluster: `pal-e-postgres` in `postgres` namespace (healthy, Postgres 17.4) - Database: `paledocs`, user: `paledocs` - Endpoint: `pal-e-postgres-rw.postgres.svc.cluster.local:5432` - Connection string will be provided via k8s secret `paledocs-db-url` key `DATABASE_URL` in the `pal-e-docs` namespace (created by Terraform — pal-e-platform issue #22, applied before this work deploys) ArgoCD auto-syncs from `pal-e-docs/k8s/` on main with prune + self-heal enabled. When PVC and ConfigMap are removed from kustomization.yaml, ArgoCD will delete them. This is intentional. **This is a two-PR delivery (see Deployment Sequence below).** ### SQLite-isms in Existing Migrations The following patterns in `alembic/versions/` are SQLite-specific and will fail on Postgres: 1. **`sa.text('(CURRENT_TIMESTAMP)')`** — files: `54714bc57fbe`, `a1b2c3d4e5f6`, `b2c3d4e5f6a7`. Postgres wants `CURRENT_TIMESTAMP` (no parens) or `now()`. 2. **`server_default=sa.text('1')` for Boolean** — file: `b2c3d4e5f6a7` (users table `is_approved`). Postgres wants `true`. 3. **`f6a7b8c9d0e1` (sprint schema expansion)** — upgrade is `pass` (no-op on SQLite). On Postgres with `native_enum=False`, CHECK constraints for expanded enum values may need handling. **Recommended approach:** Since we're running against an empty Postgres database, the cleanest path is: - Option A: Fix each migration to be dialect-aware (add `if context.get_x_argument...` or use `sa.func.now()` instead of `sa.text`) - Option B: Generate a single fresh baseline migration from the current models for Postgres, squashing the chain - **Preference: Option A** — fix in-place so the migration chain works on both SQLite (local dev) and Postgres (production). Replace `sa.text('(CURRENT_TIMESTAMP)')` with `sa.func.now()`, replace `sa.text('1')` with `sa.text('true')` guarded by dialect check or just use `sa.true_()`. ### File Targets **PR 1 — Code changes (Postgres support, backward compatible):** - `src/pal_e_docs/config.py` -- add `database_url: str | None = None` setting - `src/pal_e_docs/database.py` -- use `database_url` when set, fall back to SQLite. Conditionally attach SQLite pragma listener. - `alembic/env.py` -- use `settings.database_url` when available - `alembic/versions/54714bc57fbe_initial_schema.py` -- fix `(CURRENT_TIMESTAMP)` → `sa.func.now()` - `alembic/versions/a1b2c3d4e5f6_add_repos_table.py` -- same fix - `alembic/versions/b2c3d4e5f6a7_add_users_table.py` -- fix timestamp + boolean default - `alembic/versions/c3d4e5f6a7b8_add_project_is_public_and_page_note_fk.py` -- fix boolean default if present - `alembic/versions/f6a7b8c9d0e1_sprint_schema_expansion.py` -- add Postgres CHECK constraint handling - `pyproject.toml` -- add `psycopg2-binary` dependency (or `psycopg[binary]`) - `Dockerfile` -- if psycopg2-binary needs system deps, add them Files NOT to touch: - `k8s/*` -- that's PR 2 - `tests/` -- existing tests should still pass against SQLite **PR 2 — Manifest changes (switch to Postgres, remove Litestream):** - `k8s/deployment.yaml` -- replace `PALDOCS_DATABASE_PATH` env with `PALDOCS_DATABASE_URL` from secret, remove Litestream init + sidecar containers, remove data volume + litestream-config volume - `k8s/kustomization.yaml` -- remove `pvc.yaml` and `litestream-configmap.yaml` - `k8s/pvc.yaml` -- DELETE this file - `k8s/litestream-configmap.yaml` -- DELETE this file **Data migration script (can live in repo root or scripts/):** - `scripts/migrate_sqlite_to_postgres.py` -- dump SQLite → insert into Postgres, reset PK sequences ### Acceptance Criteria - [ ] App starts successfully with `PALDOCS_DATABASE_URL=postgresql://...` env var - [ ] App still starts with `PALDOCS_DATABASE_PATH=/data/foo.db` (backward compat for local dev) - [ ] `alembic upgrade head` runs cleanly against an empty Postgres database - [ ] All existing API endpoints return correct data after migration - [ ] MCP tools (list_notes, get_note, create_note, etc.) work against Postgres - [ ] Litestream containers removed from deployment - [ ] PVC and ConfigMap removed from kustomization ### Test Expectations - [ ] Existing tests pass (they use SQLite — no change needed) - [ ] Manual verification: run `alembic upgrade head` against a local Postgres - [ ] Manual verification: run data migration script against test data - Run: `pytest tests/` (should pass unchanged for PR 1) ### Deployment Sequence (CRITICAL) ``` 1. Merge PR 1 (code) → ArgoCD deploys. App still uses SQLite. Nothing changes. 2. Lucas runs `tofu apply` on pal-e-platform (creates DB secret) 3. Lucas runs `alembic upgrade head` against Postgres (creates schema) 4. Lucas runs `scripts/migrate_sqlite_to_postgres.py` (loads data) 5. Lucas verifies data in Postgres 6. Merge PR 2 (manifests) → ArgoCD deploys. App switches to Postgres. Litestream removed. 7. Verify app is healthy, API responds, data intact. ``` Steps 2-5 are manual — coordinated by Betty Sue during a short maintenance window. The dev agent delivers PR 1, PR 2, and the migration script. Lucas handles the operational steps. ### Constraints - `database.py` engine creation must be at module level (current pattern) — don't change to lazy init - Keep `config.py` using pydantic-settings with `env_prefix = "PALDOCS_"` - The `PALDOCS_DATABASE_URL` env var name must match the config field name (`database_url`) with the prefix - `psycopg2-binary` is fine for now (avoids libpq build dep). Can switch to `psycopg[binary]` (psycopg3) if preferred. - Do NOT run `alembic upgrade head` against production — that's Lucas's job during the maintenance window - PR 1 and PR 2 should be separate branches and separate PRs ### Checklist - [ ] PR 1 opened (code changes) - [ ] PR 2 opened (manifest changes) - [ ] Migration script included - [ ] Tests pass - [ ] No unrelated changes ### Related - `pal-e-platform` project -- issue #22 (Terraform secret) must be applied before PR 2 merges - `plan-2026-02-25-platform-observability` -- Litestream removal affects backup strategy
Author
Contributor

PR #78 Review

BLOCKERS

None.

NITS

None. The diff is clean and minimal -- 84 lines removed, 5 added. Exactly the right scope for an infrastructure swap.

KEY VERIFICATIONS (per review request)

  • Secret name paledocs-db-url with key DATABASE_URL -- Confirmed. Matches Terraform resource at main.tf:1108-1117 (kubernetes_secret_v1.paledocs_db_url). Name: paledocs-db-url, key: DATABASE_URL.
  • Env var PALDOCS_DATABASE_URL -- Confirmed. Matches PR #77's config.py field database_url (pydantic-settings maps PALDOCS_ prefix + field name to env var PALDOCS_DATABASE_URL).
  • Litestream init container fully removed -- Confirmed. The litestream-restore init container (image, args, envFrom, volumeMounts, resources) is entirely gone.
  • Litestream sidecar container fully removed -- Confirmed. The litestream-replicate sidecar container is entirely gone.
  • PVC volume and all volume mounts removed -- Confirmed. The volumes section (PVC pal-e-docs-data + configmap litestream-config) and the main container's volumeMounts for /data are all removed.
  • pvc.yaml deleted -- Confirmed. File deleted in diff.
  • litestream-configmap.yaml deleted -- Confirmed. File deleted in diff.
  • kustomization.yaml updated -- Confirmed. References to both pvc.yaml and litestream-configmap.yaml removed.
  • No orphaned volume mounts or references remain -- Confirmed. The resulting deployment has zero volumes, volumeMounts, or initContainers sections.

SOP COMPLIANCE

  • Branch named after issue (76-postgres-manifest-switch references issue #76)
  • PR body follows template (Summary, Changes, Test Plan, Review Checklist, Related Notes)
  • Related references plan slug (plan-2026-02-26-tf-modularize-postgres)
  • No secrets committed (only secretKeyRef references, no values)
  • No unnecessary file changes (4 files, all directly in scope)
  • Scope is tight -- pure removal of SQLite/Litestream infra + addition of one env var from secret

VERDICT: APPROVED

The manifest changes are correct, complete, and well-scoped. All three layers (Terraform secret, k8s deployment, Python config) agree on naming. The deploy ordering documented in the PR body and test plan is appropriate -- this must be merged after PR #77, Terraform apply, alembic migration, and data migration.

## PR #78 Review ### BLOCKERS None. ### NITS None. The diff is clean and minimal -- 84 lines removed, 5 added. Exactly the right scope for an infrastructure swap. ### KEY VERIFICATIONS (per review request) - **Secret name `paledocs-db-url` with key `DATABASE_URL`** -- Confirmed. Matches Terraform resource at `main.tf:1108-1117` (`kubernetes_secret_v1.paledocs_db_url`). Name: `paledocs-db-url`, key: `DATABASE_URL`. - **Env var `PALDOCS_DATABASE_URL`** -- Confirmed. Matches PR #77's `config.py` field `database_url` (pydantic-settings maps `PALDOCS_` prefix + field name to env var `PALDOCS_DATABASE_URL`). - **Litestream init container fully removed** -- Confirmed. The `litestream-restore` init container (image, args, envFrom, volumeMounts, resources) is entirely gone. - **Litestream sidecar container fully removed** -- Confirmed. The `litestream-replicate` sidecar container is entirely gone. - **PVC volume and all volume mounts removed** -- Confirmed. The `volumes` section (PVC `pal-e-docs-data` + configmap `litestream-config`) and the main container's `volumeMounts` for `/data` are all removed. - **`pvc.yaml` deleted** -- Confirmed. File deleted in diff. - **`litestream-configmap.yaml` deleted** -- Confirmed. File deleted in diff. - **`kustomization.yaml` updated** -- Confirmed. References to both `pvc.yaml` and `litestream-configmap.yaml` removed. - **No orphaned volume mounts or references remain** -- Confirmed. The resulting deployment has zero `volumes`, `volumeMounts`, or `initContainers` sections. ### SOP COMPLIANCE - [x] Branch named after issue (`76-postgres-manifest-switch` references issue #76) - [x] PR body follows template (Summary, Changes, Test Plan, Review Checklist, Related Notes) - [x] Related references plan slug (`plan-2026-02-26-tf-modularize-postgres`) - [x] No secrets committed (only secretKeyRef references, no values) - [x] No unnecessary file changes (4 files, all directly in scope) - [x] Scope is tight -- pure removal of SQLite/Litestream infra + addition of one env var from secret ### VERDICT: APPROVED The manifest changes are correct, complete, and well-scoped. All three layers (Terraform secret, k8s deployment, Python config) agree on naming. The deploy ordering documented in the PR body and test plan is appropriate -- this must be merged after PR #77, Terraform apply, alembic migration, and data migration.
Author
Contributor

PR #77 Review

BLOCKERS

None found. The code is correct and well-scoped.

NITS

  1. Unused import in migration script (scripts/migrate_sqlite_to_postgres.py line 15): from sqlalchemy.orm import Session # noqa: F401 is imported but never used. The noqa suppresses the lint warning, but the import should just be removed entirely.

  2. psycopg2-binary vs psycopg2: Using psycopg2-binary is fine for containerized deployments, but worth noting that some teams prefer the source-built psycopg2 for production. Not a blocker -- just a conscious tradeoff to be aware of.

  3. alembic/env.py model imports: The diff shows Sprint and SprintItem models are not imported in the updated alembic/env.py. The current main branch also lacks them (it only imports Note, NoteLink, NoteRevision, NoteTag, Project, Repo, Tag, User). This is a pre-existing gap, not introduced by this PR, but worth noting since those models need to be imported for autogenerate to detect them. Since the sprint migration files were hand-written, this has not caused problems yet.

OBSERVATIONS (Positive)

  • All SQLite-isms fixed: Every migration file that used sa.text('(CURRENT_TIMESTAMP)') or sa.text('1') has been updated. The later migrations (d4e5f6a7b8c9 through h8c9d0e1f2g3) already used portable constructs -- no gaps.
  • database.py rewrite is clean: The _build_engine() function correctly branches on settings.database_url, and the SQLite pragma listener is conditionally registered only when the dialect is sqlite. No pragma leakage to Postgres.
  • config.py change is minimal and correct: database_url: str | None = None maps to PALDOCS_DATABASE_URL via the PALDOCS_ env prefix. When unset, falls back to SQLite. Zero behavior change for existing deployments.
  • alembic/env.py correctly prefers Postgres URL: Same logic as database.py -- settings.database_url when set, SQLite path otherwise.
  • Migration script handles FK order correctly: TABLE_ORDER lists parents before children (projects before notes, notes before note_tags/note_links/note_revisions, sprints before sprint_items). All 10 tables accounted for.
  • Sequence reset is correct: SEQUENCE_TABLES correctly excludes note_tags and note_links (composite PKs, no auto-increment sequences). The setval() call with false as the third argument means the next nextval() returns the set value, which is MAX(id) + 1 -- correct.
  • Credential masking in migration script: pg_url.split('@')[-1] strips user:pass before printing. Good security practice.
  • Scope is tight: 9 files changed, 160 additions, 18 deletions. No scope creep. This is purely code-side Postgres support with no infrastructure changes (that is deferred to PR #78).

SOP COMPLIANCE

  • Branch named after issue: 76-postgres-code-changes references issue #76
  • PR body has: Summary, Changes, Test Plan, Related -- all present and detailed
  • Related section references plan slug: plan-2026-02-26-tf-modularize-postgres
  • No secrets, .env files, or credentials committed
  • No unnecessary file changes (scope is clean)
  • Commit messages: PR title is descriptive
  • 290 tests pass per PR description (SQLite path exercised)

VERDICT: APPROVED

## PR #77 Review ### BLOCKERS None found. The code is correct and well-scoped. ### NITS 1. **Unused import in migration script** (`scripts/migrate_sqlite_to_postgres.py` line 15): `from sqlalchemy.orm import Session # noqa: F401` is imported but never used. The `noqa` suppresses the lint warning, but the import should just be removed entirely. 2. **`psycopg2-binary` vs `psycopg2`**: Using `psycopg2-binary` is fine for containerized deployments, but worth noting that some teams prefer the source-built `psycopg2` for production. Not a blocker -- just a conscious tradeoff to be aware of. 3. **`alembic/env.py` model imports**: The diff shows Sprint and SprintItem models are not imported in the updated `alembic/env.py`. The current main branch also lacks them (it only imports Note, NoteLink, NoteRevision, NoteTag, Project, Repo, Tag, User). This is a pre-existing gap, not introduced by this PR, but worth noting since those models need to be imported for autogenerate to detect them. Since the sprint migration files were hand-written, this has not caused problems yet. ### OBSERVATIONS (Positive) - **All SQLite-isms fixed**: Every migration file that used `sa.text('(CURRENT_TIMESTAMP)')` or `sa.text('1')` has been updated. The later migrations (`d4e5f6a7b8c9` through `h8c9d0e1f2g3`) already used portable constructs -- no gaps. - **`database.py` rewrite is clean**: The `_build_engine()` function correctly branches on `settings.database_url`, and the SQLite pragma listener is conditionally registered only when the dialect is sqlite. No pragma leakage to Postgres. - **`config.py` change is minimal and correct**: `database_url: str | None = None` maps to `PALDOCS_DATABASE_URL` via the `PALDOCS_` env prefix. When unset, falls back to SQLite. Zero behavior change for existing deployments. - **`alembic/env.py` correctly prefers Postgres URL**: Same logic as database.py -- `settings.database_url` when set, SQLite path otherwise. - **Migration script handles FK order correctly**: TABLE_ORDER lists parents before children (projects before notes, notes before note_tags/note_links/note_revisions, sprints before sprint_items). All 10 tables accounted for. - **Sequence reset is correct**: SEQUENCE_TABLES correctly excludes `note_tags` and `note_links` (composite PKs, no auto-increment sequences). The `setval()` call with `false` as the third argument means the next `nextval()` returns the set value, which is `MAX(id) + 1` -- correct. - **Credential masking in migration script**: `pg_url.split('@')[-1]` strips user:pass before printing. Good security practice. - **Scope is tight**: 9 files changed, 160 additions, 18 deletions. No scope creep. This is purely code-side Postgres support with no infrastructure changes (that is deferred to PR #78). ### SOP COMPLIANCE - [x] Branch named after issue: `76-postgres-code-changes` references issue #76 - [x] PR body has: Summary, Changes, Test Plan, Related -- all present and detailed - [x] Related section references plan slug: `plan-2026-02-26-tf-modularize-postgres` - [x] No secrets, .env files, or credentials committed - [x] No unnecessary file changes (scope is clean) - [x] Commit messages: PR title is descriptive - [x] 290 tests pass per PR description (SQLite path exercised) ### VERDICT: APPROVED
Commenting is not possible because the repository is archived.
No milestone
No project
No assignees
1 participant
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
ldraney/pal-e-api#76
No description provided.