fix: bump pal-e-docs memory limit to prevent OOMKilled #182

Closed
opened 2026-03-15 16:04:26 +00:00 by forgejo_admin · 1 comment

Lineage

plan-pal-e-platform → Phase 16 → 16b (memory limits — pal-e-docs)

Repo

forgejo_admin/pal-e-docs

User Story

As a platform operator
I want pal-e-docs to have sufficient memory
So that the container stops getting OOMKilled and restarting

Context

The pal-e-docs container is being OOMKilled under its current 128Mi memory limit and has restarted twice in the last 12 hours. The app runs FastAPI + SQLAlchemy + pgvector embedding operations — 128Mi is insufficient for normal operation, especially when concurrent MCP tool calls trigger embedding computations.

File Targets

Files the agent should modify:

  • k8s/deployment.yaml — bump the pal-e-docs container resource limits:
    • requests.memory: 32Mi → 64Mi
    • limits.memory: 128Mi → 256Mi

Files the agent should NOT touch:

  • Any other files — this is a single resource limit change
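For reference, the relevant stanza in k8s/deployment.yaml should end up looking roughly like this. This is a sketch only: the container name and surrounding fields are assumed from context, and only the two memory values change.

```yaml
# Sketch of the expected resources stanza after the change.
# Container name and field layout are assumptions, not verified.
containers:
  - name: pal-e-docs
    resources:
      requests:
        cpu: 10m          # unchanged
        memory: 64Mi      # was 32Mi
      limits:
        memory: 256Mi     # was 128Mi
```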

Acceptance Criteria

  • Memory requests set to 64Mi in deployment.yaml
  • Memory limits set to 256Mi in deployment.yaml
  • No other changes to the deployment spec

Test Expectations

  • Verify the YAML is valid: python3 -c "import yaml; yaml.safe_load(open('k8s/deployment.yaml'))"
  • If kustomize is used: kubectl kustomize k8s/ succeeds
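As a hedged illustration of the first check, the sketch below parses an inline copy of the expected stanza and asserts the new values. The embedded manifest snippet and container name are assumptions for illustration; against the real file you would `yaml.safe_load(open('k8s/deployment.yaml'))` instead.

```python
# Minimal sketch: validate the memory values with PyYAML
# (the same library the one-liner above relies on).
import yaml

# Hypothetical stand-in for k8s/deployment.yaml — structure assumed.
manifest = """
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: pal-e-docs
          resources:
            requests:
              cpu: 10m
              memory: 64Mi
            limits:
              memory: 256Mi
"""

doc = yaml.safe_load(manifest)
container = doc["spec"]["template"]["spec"]["containers"][0]
res = container["resources"]
assert res["requests"]["memory"] == "64Mi"
assert res["limits"]["memory"] == "256Mi"
print("memory request/limit values look correct")
```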

Constraints

  • Only change the memory request and limit values — nothing else
  • Do not change CPU requests or limits

Checklist

  • PR opened
  • YAML is valid
  • No unrelated changes
Related

  • phase-platform-16-alert-tuning — parent phase
  • pal-e-platform — project
Author
Owner

PR #183 Review

DOMAIN REVIEW

Tech stack: Kubernetes deployment manifest (YAML). Single-file change to resource requests/limits.

Diff verified against full file (k8s/deployment.yaml, 73 lines):

  • requests.memory: 32Mi -> 64Mi (changed)
  • limits.memory: 128Mi -> 256Mi (changed)
  • requests.cpu: 10m (unchanged, line 70)
  • No CPU limit set (unchanged -- correct for non-CPU-bound workloads)
  • No other spec changes: replicas, strategy, probes, env vars, image tag all identical

Reasonableness: FastAPI + SQLAlchemy + pgvector is memory-hungry. pgvector loads embedding vectors into memory for similarity search. 128Mi was demonstrably too tight (OOMKilled, 2 restarts in 12h). 256Mi limit with 64Mi request gives a 4:1 burst ratio, which is appropriate for this workload profile. The 2x bump is conservative and measured.

BLOCKERS

None.

NITS

None.

SOP COMPLIANCE

  • Branch 182-fix-bump-pal-e-docs-memory-limit-to-prev named after issue #182
  • PR body follows template (Summary, Changes, Test Plan, Related)
  • Related references plan-pal-e-platform
  • Closes #182 present
  • No secrets committed
  • No unnecessary file changes (1 file, 2 additions, 2 deletions)
  • Commit message is descriptive

PROCESS OBSERVATIONS

Clean operational fix. Low change failure risk -- memory limit bumps are safe, reversible changes. ArgoCD will auto-sync after merge. Post-merge validation items in the Test Plan (kubectl top pod monitoring) are appropriate.

VERDICT: APPROVED
