fix: bump pal-e-docs memory limit to prevent OOMKilled #182
Reference
forgejo_admin/pal-e-api#182
Lineage
plan-pal-e-platform → Phase 16 → 16b (memory limits - pal-e-docs)
Repo
forgejo_admin/pal-e-docs
User Story
As a platform operator
I want pal-e-docs to have sufficient memory
So that the container stops getting OOMKilled and restarting
Context
The pal-e-docs container is getting OOMKilled with the current 128Mi memory limit. It has restarted 2 times in the last 12 hours. The app runs FastAPI + SQLAlchemy + pgvector embedding operations — 128Mi is insufficient for normal operation, especially when handling concurrent MCP tool calls that trigger embedding computations.
File Targets
Files the agent should modify:
k8s/deployment.yaml: bump the pal-e-docs container resource limits:
requests.memory: 32Mi → 64Mi
limits.memory: 128Mi → 256Mi
Files the agent should NOT touch:
Acceptance Criteria
Test Expectations
python3 -c "import yaml; yaml.safe_load(open('k8s/deployment.yaml'))"
kubectl kustomize k8s/ succeeds
Constraints
Checklist
Related
phase-platform-16-alert-tuning: parent phase
pal-e-platform: project
PR #183 Review
DOMAIN REVIEW
Tech stack: Kubernetes deployment manifest (YAML). Single-file change to resource requests/limits.
Diff verified against full file (k8s/deployment.yaml, 73 lines):
requests.memory: 32Mi -> 64Mi (changed)
limits.memory: 128Mi -> 256Mi (changed)
requests.cpu: 10m (unchanged, line 70)
Reasonableness: FastAPI + SQLAlchemy + pgvector is memory-hungry. pgvector loads embedding vectors into memory for similarity search. 128Mi was demonstrably too tight (OOMKilled, 2 restarts in 12h). 256Mi limit with 64Mi request gives a 4:1 burst ratio, which is appropriate for this workload profile. The 2x bump is conservative and measured.
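The burst-ratio arithmetic above can be checked with a short sketch. This is an illustration only: the helper names `parse_memory` and `burst_ratio` are hypothetical, and the parser handles just the binary suffixes (Ki, Mi, Gi) that appear in this change, not the full Kubernetes quantity grammar.

```python
import re

# Binary suffixes used by Kubernetes memory quantities (powers of 1024).
# Assumption: only these three suffixes are needed for this manifest.
_BINARY_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '256Mi' to bytes (binary suffixes only)."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi)", quantity)
    if match is None:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    value, unit = match.groups()
    return int(value) * _BINARY_UNITS[unit]


def burst_ratio(request: str, limit: str) -> float:
    """Return limit / request: how far a container may burst above its request."""
    return parse_memory(limit) / parse_memory(request)


# Values taken from the diff: before and after the change.
old_ratio = burst_ratio("32Mi", "128Mi")
new_ratio = burst_ratio("64Mi", "256Mi")
print(old_ratio, new_ratio)  # 4.0 4.0
```

Both the request and the limit double, so the 4:1 ratio is unchanged by the PR; only the absolute headroom grows.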
BLOCKERS
None.
NITS
None.
SOP COMPLIANCE
Branch 182-fix-bump-pal-e-docs-memory-limit-to-prev named after issue #182
plan-pal-e-platform
Closes #182 present
PROCESS OBSERVATIONS
Clean operational fix. Low change failure risk -- memory limit bumps are safe, reversible changes. ArgoCD will auto-sync after merge. Post-merge validation items in the Test Plan (kubectl top pod monitoring) are appropriate.
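For the post-merge monitoring step, one possible way to read `kubectl top pod` output is sketched below. The sample text, pod name, and usage figure are made up for illustration, and `memory_usage_fraction` is a hypothetical helper; it assumes the usual three-column layout (NAME, CPU(cores), MEMORY(bytes)) with a header row.

```python
# Hypothetical sample of `kubectl top pod` output; the pod name and
# numbers are invented for illustration, not taken from the cluster.
SAMPLE = """\
NAME                          CPU(cores)   MEMORY(bytes)
pal-e-docs-6d9f7c5b8-x2k4q    4m           143Mi
"""

# Assumption: memory values carry a binary suffix (Ki, Mi, or Gi).
UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def to_bytes(quantity: str) -> int:
    """Convert a value such as '143Mi' to bytes."""
    return int(quantity[:-2]) * UNITS[quantity[-2:]]


def memory_usage_fraction(top_output: str, limit: str) -> dict[str, float]:
    """Map each pod name to its memory usage as a fraction of the limit."""
    fractions = {}
    for line in top_output.strip().splitlines()[1:]:  # skip the header row
        name, _cpu, memory = line.split()
        fractions[name] = to_bytes(memory) / to_bytes(limit)
    return fractions


usage = memory_usage_fraction(SAMPLE, "256Mi")
for pod, fraction in usage.items():
    print(f"{pod}: {fraction:.0%} of limit")
```

A pod that stays close to 100% of the new 256Mi limit would suggest the bump is still too small; what threshold counts as "close" is a judgment call this sketch leaves open.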
VERDICT: APPROVED