Debug pal-e-docs container OOMKilled (chronic) #275
Reference
ldraney/pal-e-api#275
Type
Bug
Lineage
Standalone — discovered 2026-05-01 during alert-state audit.
Repo
forgejo_admin/pal-e-api
What Broke
The pal-e-docs container is being OOMKilled regularly, driving a chronic OOMKilled critical alert for 2+ days. Currently affects pod pal-e-docs-6c7fdd96d7-fll8h in namespace pal-e-docs. Note: the container/pod name still says pal-e-docs (legacy name), but the codebase repo was renamed to pal-e-api.
Repro Steps
kubectl describe pod -n pal-e-docs pal-e-docs-6c7fdd96d7-fll8h → look for Last State: Terminated, Reason: OOMKilled
kubectl get pod -n pal-e-docs -l app=pal-e-docs -o jsonpath='{.items[*].status.containerStatuses[*].restartCount}' → restart count
kubectl exec -n monitoring prometheus-... -- wget -qO- 'http://localhost:9090/api/v1/query?query=container_memory_working_set_bytes{namespace="pal-e-docs",container="pal-e-docs"}' → recent peak vs limit
Expected Behavior
Container runs at steady-state memory usage well below limit.
OOMKilled alert does not fire.
Environment
pal-e-docs
pal-e-deployments (most likely)
Acceptance Criteria
OOMKilled alert clears for pal-e-docs container
Related
pal-e-platform: alerting rule lives there
alert-report-2026-05-01: alert snapshot
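Sketch for the third repro step (peak vs limit): a minimal Python helper that reads the JSON body returned by the Prometheus /api/v1/query call and reports the largest working-set sample as a fraction of the memory limit. The function name peak_memory_ratio is hypothetical, and the 512 MiB limit and the sample value below are placeholders, not measurements from this pod; the real limit has to be read from the deployment spec. An instant query returns only the current sample per series, so this is a rough check rather than a true historical peak.

```python
import json


def peak_memory_ratio(response_text: str, limit_bytes: int) -> float:
    """Return the largest returned working-set sample divided by the limit.

    response_text is the raw JSON body of a Prometheus instant query
    (/api/v1/query). limit_bytes is the container memory limit in bytes.
    """
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")

    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")

    series_list = payload["data"]["result"]
    if not series_list:
        raise ValueError("query returned no series")

    # Each instant-vector sample is [unix_timestamp, "value-as-string"].
    peak_bytes = max(float(series["value"][1]) for series in series_list)
    return peak_bytes / limit_bytes


if __name__ == "__main__":
    # Placeholder response shaped like a Prometheus instant-query result.
    sample_response = json.dumps({
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [
                {
                    "metric": {"namespace": "pal-e-docs", "container": "pal-e-docs"},
                    "value": [1777593600, "503316480"],  # placeholder: 480 MiB
                }
            ],
        },
    })
    assumed_limit = 512 * 1024 * 1024  # placeholder limit, not from the issue
    print(peak_memory_ratio(sample_response, assumed_limit))
```

A ratio that sits close to 1.0 shortly before each restart would point at the limit being too tight or at a leak; a low ratio would suggest looking at short spikes that an instant query cannot see.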