# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Primary docs to read first

- `README.md` (runtime commands, env model, API examples)
- `DESIGN.md` (architecture decisions, module boundaries, request lifecycle)
- `.env.example` (authoritative env var reference)

No Cursor/Copilot rule files were found in this repo (`.cursorrules`, `.cursor/rules/`, `.github/copilot-instructions.md`).

## Common development commands

### Start locally

```bash
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8317
```

### Start with Docker Compose

```bash
cp .env.example .env
mkdir -p data secrets
docker compose up -d --build
docker compose logs -f
```

### Run tests

```bash
# current focused suite
python3 -m unittest tests/test_tool_call_bridge.py

# discover all unittest tests under tests/
python3 -m unittest discover -s tests -p "test_*.py"

# run a single test method
python3 -m unittest tests.test_tool_call_bridge.ToolCallBridgeTests.test_openai_non_stream_bridges_tool_calls
```

### Smoke-check running gateway

```bash
API_KEY=$(grep '^API_KEYS=' .env | cut -d= -f2 | cut -d, -f1)
curl -s http://127.0.0.1:8317/healthz
curl -s http://127.0.0.1:8317/v1/models -H "Authorization: Bearer $API_KEY"
```

### Linting/type-checking status

- There is currently no repo-configured lint/type command (no `ruff`/`flake8`/`mypy` config found).
- Do not invent tooling commands; if linting is needed, add tooling in a dedicated change first.

## Architecture (big picture)

### What this service is

A FastAPI gateway that fronts Lingma and exposes:

- OpenAI-compatible API (`/v1/models`, `/v1/chat/completions`)
- Anthropic Messages-compatible API (`/v1/messages`, `/v1/messages/count_tokens`)

Both protocols share the same backend pool, backpressure guard, stats, and session reuse logic.

### Request lifecycle (important for most changes)

1. Authenticate request (`app/auth.py`)
2. Normalize inbound protocol payload to internal message shape (`openai_schema.py` / `anthropic_schema.py`)
3. Session-cache lookup (`app/session_cache.py`) for prefix-based reuse
4. Pick backend instance (`app/lingma_pool.py`) with affinity + least-in-flight
5. Acquire concurrency ticket (`app/concurrency.py`)
6. Call Lingma via websocket/LSP client (`app/lingma_client.py`)
7. Map upstream result/stream back to wire protocol in `app/main.py`
8. Record stats and release ticket (including stream-finally paths)

### Core module boundaries

- `app/main.py`: API entrypoint + orchestration + wire-format adapters
- `app/lingma_pool.py`: multi-instance lifecycle, selection, health-aware fallback
- `app/lingma_client.py`: subprocess + LSP-over-WebSocket transport to Lingma
- `app/session_cache.py`: LRU+TTL cache of conversation-prefix -> upstream session id (+ instance binding)
- `app/concurrency.py`: in-flight guard and queue timeout/backpressure behavior
- `app/stats.py`: usage counters and Prometheus text

### Protocol-specific notes

- Anthropic and OpenAI endpoints are separate adapters over shared internals.
- Response-side tool bridge is implemented: upstream Lingma tool events are surfaced as:
  - OpenAI: `tool_calls` (stream + non-stream)
  - Anthropic: `tool_use` / `tool_result` blocks (stream + non-stream)
- Request-side `tools` / `tool_choice` are accepted by schemas but not forwarded to Lingma.

### Operational invariants to preserve

- One request must stay on one Lingma instance for session continuity.
- Session cache entries include instance identity; invalidate on unhealthy instance mismatch.
- Streaming paths must always release in-flight tickets in `finally`.
- Multi-instance mode must use isolated workdirs per instance.

### Deployment/runtime model

- Container startup runs `python /app/app/bootstrap_lingma.py` before uvicorn.
- Compose mounts:
  - `./data -> /app/data` (persistent Lingma binary/cache/workdirs)
  - `./secrets -> /secrets:ro` (session bundles, secrets)
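
To make the ticket-release invariant listed under "Operational invariants to preserve" concrete, here is a minimal sketch of the pattern: acquire a concurrency ticket with a queue timeout, stream chunks, and release the ticket in `finally`. It does not reproduce the code in `app/concurrency.py`. The names `InFlightGuard`, `QueueTimeout`, `stream_with_ticket`, `max_in_flight`, and `queue_timeout` are hypothetical and exist only for illustration.

```python
import asyncio
from typing import AsyncIterator


class QueueTimeout(Exception):
    """Raised when no ticket frees up within the queue timeout (hypothetical name)."""


class InFlightGuard:
    """Hypothetical stand-in for an in-flight guard; not the repo's actual class."""

    def __init__(self, max_in_flight: int, queue_timeout: float) -> None:
        self._sem = asyncio.Semaphore(max_in_flight)
        self._queue_timeout = queue_timeout
        self.in_flight = 0

    async def acquire(self) -> None:
        # Wait for a free ticket, but give up after queue_timeout seconds.
        try:
            await asyncio.wait_for(self._sem.acquire(), timeout=self._queue_timeout)
        except asyncio.TimeoutError:
            raise QueueTimeout("no free ticket within queue timeout") from None
        self.in_flight += 1

    def release(self) -> None:
        self.in_flight -= 1
        self._sem.release()


async def stream_with_ticket(
    guard: InFlightGuard, upstream: AsyncIterator[str]
) -> AsyncIterator[str]:
    """Relay upstream chunks while holding exactly one ticket."""
    await guard.acquire()
    try:
        async for chunk in upstream:
            yield chunk
    finally:
        # Runs on normal completion, on an upstream error, and when the
        # consumer closes the generator early.
        guard.release()
```

If the consumer stops reading before the stream ends, it should close the generator explicitly (for example with `await gen.aclose()`), so that the `finally` block runs promptly and not whenever the generator is garbage-collected.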