pal-e-docs
A knowledge platform where your docs, your kanban board, and your search index are the same database.
Why This Exists
Most teams run three systems that don't talk to each other: a knowledge base (Notion, Confluence), a work tracker (Jira, Linear), and whatever search they can bolt on. When you ask "what's blocking the auth migration?" you have to check the board for status, the docs for context, and hope your search covers both.
pal-e-docs collapses all three into a single Postgres store. Notes, kanban boards, and vector embeddings share one schema. A query can filter by project and status (structured), then rank by semantic similarity (unstructured) — in one call. No glue code, no sync jobs between tools.
The primary interface isn't a web UI — it's a Model Context Protocol server with 45+ tools. AI agents get native read/write access to everything: docs, boards, search, projects, links, and revisions. The system was built to be an AI agent's working memory, not retrofitted for it.
How It Works
Block-Level Content
Notes are composed of ordered blocks — headings, paragraphs, code, tables, diagrams. Each block is independently addressable by anchor ID. You can read or update a single section without loading the full document, and each block gets its own vector embedding.
This matters because semantic search returns the specific section that answers your question, not a 5,000-word document you have to scan.
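Here is a minimal in-memory sketch of anchor-level addressing. It assumes that a note is an ordered list of blocks, that each block has an anchor ID unique within its note, and that a "section" is a heading plus the blocks that follow it up to the next heading. The `Block` and `Note` classes are hypothetical. `get_section` and `update_block` borrow the MCP tool names, but their signatures and semantics here are assumptions and do not describe the server's API.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    anchor_id: str   # unique within the note, used to address the block directly
    block_type: str  # e.g. "heading", "paragraph", "code", "table"
    content: str
    position: int    # order of the block within the note


@dataclass
class Note:
    slug: str
    blocks: list[Block] = field(default_factory=list)

    def get_block(self, anchor_id: str) -> Block:
        for block in self.blocks:
            if block.anchor_id == anchor_id:
                return block
        raise KeyError(anchor_id)

    def get_section(self, anchor_id: str) -> list[Block]:
        # Return the heading with this anchor plus the blocks up to the next heading
        ordered = sorted(self.blocks, key=lambda b: b.position)
        starts = [i for i, b in enumerate(ordered) if b.anchor_id == anchor_id]
        if not starts:
            raise KeyError(anchor_id)
        section = [ordered[starts[0]]]
        for block in ordered[starts[0] + 1:]:
            if block.block_type == "heading":
                break
            section.append(block)
        return section

    def update_block(self, anchor_id: str, content: str) -> Block:
        # Only this block changes; the rest of the note is untouched
        block = self.get_block(anchor_id)
        block.content = content
        return block


note = Note("auth-migration", [
    Block("overview", "heading", "Overview", 0),
    Block("overview-body", "paragraph", "Move all services to OIDC.", 1),
    Block("blockers", "heading", "Blockers", 2),
    Block("blockers-body", "paragraph", "Waiting on the Keycloak realm.", 3),
])
print([b.anchor_id for b in note.get_section("blockers")])  # ['blockers', 'blockers-body']
note.update_block("blockers-body", "Realm created; waiting on client config.")
```

Because each block is its own unit, a write touches one row. In the real system, that same granularity is what allows each block to carry its own embedding.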
On-Write Embedding Pipeline
When a block is created or its content changes, a PostgreSQL trigger fires NOTIFY to an async embedding worker. The worker calls Ollama, stores the 2560-dimensional vector, and the block becomes searchable within seconds.
No batch jobs. No stale indexes. The search index is as fresh as the last write.
```text
Block INSERT/UPDATE → Postgres trigger → NOTIFY embedding_queue
                                              ↓
                                     Async worker (LISTEN)
                                              ↓
                                 Ollama embed → pgvector store
```
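The following sketch simulates this pipeline in a single process so that it runs standalone. An `asyncio.Queue` stands in for Postgres LISTEN/NOTIFY. A hypothetical `fake_embed` (a hash, not a semantic model) stands in for the Ollama call, and a dict stands in for the pgvector column. The sketch assumes that the notification payload is only a block ID. The real trigger's payload is not documented here.

```python
import asyncio
import hashlib

EMBED_DIM = 8  # the real pipeline stores 2,560-dim vectors; 8 keeps the sketch small


def fake_embed(text: str) -> list[float]:
    # HYPOTHETICAL stand-in for the Ollama call: deterministic, not semantic
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:EMBED_DIM]]


def write_block(queue: asyncio.Queue, blocks: dict, block_id: str, content: str) -> None:
    # Plays the role of INSERT/UPDATE plus the trigger's NOTIFY
    blocks[block_id] = content
    queue.put_nowait(block_id)


async def embedding_worker(queue: asyncio.Queue, blocks: dict, vectors: dict) -> None:
    # Plays the role of the worker LISTENing on embedding_queue
    while True:
        block_id = await queue.get()
        if block_id is None:  # sentinel used here to stop the sketch
            return
        # Embed the block's current content and store the vector
        vectors[block_id] = fake_embed(blocks[block_id])


async def main() -> tuple[dict, dict]:
    queue: asyncio.Queue = asyncio.Queue()
    blocks: dict[str, str] = {}
    vectors: dict[str, list[float]] = {}
    worker = asyncio.create_task(embedding_worker(queue, blocks, vectors))
    write_block(queue, blocks, "b1", "Keycloak realm setup")
    write_block(queue, blocks, "b2", "Pull the qwen3-embedding model")
    write_block(queue, blocks, "b1", "Keycloak realm setup (revised)")
    queue.put_nowait(None)
    await worker
    return blocks, vectors


blocks, vectors = asyncio.run(main())
```

The worker reads a block's current content when it dequeues the ID, not content captured at write time. As a result, a burst of edits to one block still ends with a vector for the latest text.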
Hybrid Search (RRF)
One endpoint, three modes:
| Mode | Engine | Best For |
|---|---|---|
| Keyword | PostgreSQL tsvector full-text search | Exact terms, slugs, names |
| Semantic | pgvector cosine similarity on block embeddings | Fuzzy concepts, "things like X" |
| Hybrid | Reciprocal Rank Fusion combining both | General queries (default) |
All three modes support SQL-level metadata filtering — project, status, note type, tags — applied as WHERE clauses before ranking. "Show me completed items in project X about auth" is one query, not three API calls stitched together.
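The filter-then-rank order can be sketched in plain Python. In Postgres this would typically be a WHERE clause followed by an ORDER BY on pgvector's cosine-distance operator (`<=>`). The `BlockRow` fields, status strings, and `semantic_search` helper below are hypothetical. They only illustrate that the metadata filter shrinks the candidate set before any similarity is computed.

```python
import math
from dataclasses import dataclass


@dataclass
class BlockRow:
    block_id: str
    project: str
    status: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(rows, query_vec, *, project=None, status=None, limit=10):
    # Step 1: structured filter (the WHERE clause) narrows the candidate set
    candidates = [
        r for r in rows
        if (project is None or r.project == project)
        and (status is None or r.status == status)
    ]
    # Step 2: rank only the survivors by similarity to the query embedding
    candidates.sort(key=lambda r: cosine(r.embedding, query_vec), reverse=True)
    return [r.block_id for r in candidates[:limit]]


rows = [
    BlockRow("a", "auth", "completed", [1.0, 0.0]),
    BlockRow("b", "auth", "in_progress", [1.0, 0.1]),
    BlockRow("c", "auth", "completed", [0.0, 1.0]),
    BlockRow("d", "billing", "completed", [1.0, 0.0]),
]
print(semantic_search(rows, [1.0, 0.0], project="auth", status="completed"))  # ['a', 'c']
```

Row `b` is semantically close but fails the status filter, and row `d` fails the project filter. Neither is ever scored.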
The RRF alpha parameter (0.0 = pure keyword, 1.0 = pure semantic) lets callers tune the blend per query.
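The sketch below shows one way alpha could weight Reciprocal Rank Fusion. Standard RRF scores each result as the sum of 1/(k + rank) across the ranked lists. In this sketch, the keyword term is scaled by (1 - alpha) and the semantic term by alpha, with k = 60, a common default. This weighting scheme and the value of k are assumptions, not a description of the project's implementation.

```python
def rrf_fuse(keyword_ranked, semantic_ranked, alpha=0.5, k=60):
    """Weighted Reciprocal Rank Fusion of two best-first lists of result IDs.

    ASSUMED weighting: keyword terms scaled by (1 - alpha), semantic terms by alpha,
    so alpha=0.0 is pure keyword and alpha=1.0 is pure semantic.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    scores: dict[str, float] = {}
    for rank, doc_id in enumerate(keyword_ranked, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) / (k + rank)
    for rank, doc_id in enumerate(semantic_ranked, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha / (k + rank)
    # Highest fused score first (ties broken by ID); drop results with zero weight
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return [d for d in ranked if scores[d] > 0]


keyword = ["a", "b", "c"]   # e.g. tsvector order
semantic = ["c", "a", "d"]  # e.g. cosine-similarity order
print(rrf_fuse(keyword, semantic))             # ['a', 'c', 'b', 'd']
print(rrf_fuse(keyword, semantic, alpha=0.0))  # ['a', 'b', 'c']
print(rrf_fuse(keyword, semantic, alpha=1.0))  # ['c', 'a', 'd']
```

RRF uses only ranks, so full-text relevance scores and cosine similarities never have to be put on a common scale. Dropping zero-score results makes the endpoints behave as described: alpha = 0.0 returns keyword hits only, and alpha = 1.0 returns semantic hits only.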
Boards Are Notes
Kanban boards aren't a separate subsystem — a board is a note with note_type="board". Board items reference other notes by slug and flow through eight columns:
backlog → todo → next_up → in_progress → qa → needs_approval → validation → done
When a phase note's status changes, its board item moves automatically. The board is a view over note state, not a parallel data store that can drift.
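One way to read "the board is a view over note state" is to derive each item's column from its note's status instead of storing the column separately. The eight columns below come from the list above. The `STATUS_TO_COLUMN` mapping, the `"phase"` note-type string, and the dict-shaped notes are hypothetical, because the actual status-to-column rules are not documented here. The sketch also omits board items as separate rows.

```python
from enum import Enum


class Column(str, Enum):
    # The eight board columns, in workflow order
    BACKLOG = "backlog"
    TODO = "todo"
    NEXT_UP = "next_up"
    IN_PROGRESS = "in_progress"
    QA = "qa"
    NEEDS_APPROVAL = "needs_approval"
    VALIDATION = "validation"
    DONE = "done"


# HYPOTHETICAL mapping from a phase note's status to a board column
STATUS_TO_COLUMN = {
    "planned": Column.TODO,
    "in_progress": Column.IN_PROGRESS,
    "completed": Column.DONE,
}


def board_view(notes: list[dict]) -> dict[Column, list[str]]:
    """Group note slugs by column, computed from note status on every read."""
    columns: dict[Column, list[str]] = {col: [] for col in Column}
    for note in notes:
        if note["note_type"] == "board":
            continue  # the board is itself a note, not an item on it
        # Unknown statuses fall back to the backlog (an assumed default)
        column = STATUS_TO_COLUMN.get(note["status"], Column.BACKLOG)
        columns[column].append(note["slug"])
    return columns


notes = [
    {"slug": "auth-board", "note_type": "board", "status": "in_progress"},
    {"slug": "phase-1", "note_type": "phase", "status": "completed"},
    {"slug": "phase-2", "note_type": "phase", "status": "in_progress"},
]
print(board_view(notes)[Column.IN_PROGRESS])  # ['phase-2']

notes[2]["status"] = "completed"  # the status change is the only write
print(board_view(notes)[Column.DONE])  # ['phase-1', 'phase-2']
```

The column is recomputed from status on every read, so the board and the note can never disagree.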
Note Graph
Notes link bidirectionally (zettelkasten-style), form parent-child hierarchies, carry typed metadata (14 note types, lifecycle statuses), and track full revision history with user attribution. Tags handle topic/domain classification. Projects scope everything.
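A common way to get bidirectional links without a sync problem is to store each link once and derive backlinks at read time. The sketch below uses the standard-library `sqlite3` module, matching the SQLite local-dev path. The `note_links` table and its columns are a hypothetical schema, not the project's actual migration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE note_links ("
    " source_slug TEXT NOT NULL,"
    " target_slug TEXT NOT NULL,"
    " PRIMARY KEY (source_slug, target_slug))"
)


def link(source: str, target: str) -> None:
    # One row per link; the reverse direction is derived at read time
    conn.execute("INSERT OR IGNORE INTO note_links VALUES (?, ?)", (source, target))


def links_for(slug: str) -> dict[str, list[str]]:
    outgoing = [row[0] for row in conn.execute(
        "SELECT target_slug FROM note_links WHERE source_slug = ? ORDER BY target_slug",
        (slug,),
    )]
    backlinks = [row[0] for row in conn.execute(
        "SELECT source_slug FROM note_links WHERE target_slug = ? ORDER BY source_slug",
        (slug,),
    )]
    return {"outgoing": outgoing, "backlinks": backlinks}


link("auth-migration", "keycloak-setup")
link("sso-rollout", "keycloak-setup")
link("keycloak-setup", "k3s-cluster")
print(links_for("keycloak-setup"))
# {'outgoing': ['k3s-cluster'], 'backlinks': ['auth-migration', 'sso-rollout']}
```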
MCP-Native
The MCP server exposes the full API surface as discrete tools:
| Category | Tools | Examples |
|---|---|---|
| Notes | 10 | get_note, search_notes, semantic_search, create_note_from_template |
| Blocks | 8 | get_section, update_block, get_note_toc |
| Boards | 10 | list_board_items, sync_board, bulk_move_board_items |
| Projects | 5 | CRUD + nested resource listing |
| Links, Repos, Tags | 9 | Bidirectional links, repo registry, tag queries |
An AI agent connected via MCP can search your knowledge base, check the board for blockers, update a doc section, and move a card to done — all without leaving the protocol.
The Stack
| Layer | Technology |
|---|---|
| API | Python 3.12, FastAPI, Uvicorn |
| Database | PostgreSQL 16 + pgvector |
| ORM | SQLAlchemy 2.0 + Alembic |
| Embeddings | Ollama (qwen3-embedding, 2560-dim) |
| Search | tsvector + pgvector + RRF |
| Auth | Keycloak OIDC (JWT) |
| Frontend | SvelteKit (separate repo) |
| MCP | FastMCP (separate server) |
| CI/CD | Woodpecker CI → Harbor → ArgoCD |
| Infra | k3s, Tailscale, CNPG |
By the Numbers
| Metric | Value |
|---|---|
| API Endpoints | 48 |
| MCP Tools | 45+ |
| Tests | 709+ |
| SQLAlchemy Models | 11 tables |
| Alembic Migrations | 24 |
| Embedding Dimensions | 2,560 |
| Kanban Columns | 8 |
| Note Types | 14 |
| Board Item Types | 6 |
Documentation
| Doc | Description |
|---|---|
| Architecture | FastAPI app structure, data model, auth, request lifecycle |
| Database | Dual-path engine (SQLite local, Postgres prod), shared CNPG cluster |
| API Endpoints | All 48 endpoints with request/response shapes |
| Embedding Pipeline | pgvector setup, Ollama integration, RRF hybrid search |
| Deployment | Woodpecker CI, Harbor registry, ArgoCD GitOps |
Quick Start
```bash
pip install -e ".[dev]"

# SQLite (local dev)
PALDOCS_DATABASE_PATH=./local.db alembic upgrade head
PALDOCS_DATABASE_PATH=./local.db python -m pal_e_docs.main

# Tests
pytest

# Lint
ruff check src/ tests/
ruff format src/ tests/
```
Related Repositories
| Repository | Role |
|---|---|
| pal-e-app | SvelteKit frontend consuming this API |
| pal-e-docs-sdk | Python SDK for programmatic API access |
| pal-e-mcp | MCP server exposing 45+ tools to AI agents |
| pal-e-platform | Infrastructure bootstrap (k3s, Tailscale, CI/CD) |
| pal-e-deployments | Kustomize overlays for ArgoCD GitOps |