Add structured tool event propagation from Lingma stream/finish metadata and map it to OpenAI tool_calls and Anthropic tool_use/tool_result in both streaming and non-streaming responses. Add focused bridge tests and update docs/design notes to match current behavior. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
3.9 KiB
3.9 KiB
CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Primary docs to read first
README.md(runtime commands, env model, API examples)DESIGN.md(architecture decisions, module boundaries, request lifecycle).env.example(authoritative env var reference)
No Cursor/Copilot rule files were found in this repo (.cursorrules, .cursor/rules/, .github/copilot-instructions.md).
Common development commands
Start locally
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8317
Start with Docker Compose
cp .env.example .env
mkdir -p data secrets
docker compose up -d --build
docker compose logs -f
Run tests
# current focused suite
python3 -m unittest tests/test_tool_call_bridge.py
# discover all unittest tests under tests/
python3 -m unittest discover -s tests -p "test_*.py"
# run a single test method
python3 -m unittest tests.test_tool_call_bridge.ToolCallBridgeTests.test_openai_non_stream_bridges_tool_calls
Smoke-check running gateway
API_KEY=$(grep '^API_KEYS=' .env | cut -d= -f2 | cut -d, -f1)
curl -s http://127.0.0.1:8317/healthz
curl -s http://127.0.0.1:8317/v1/models -H "Authorization: Bearer $API_KEY"
Linting/type-checking status
- There is currently no repo-configured lint/type command (no
ruff/flake8/mypyconfig found). - Do not invent tooling commands; if linting is needed, add tooling in a dedicated change first.
Architecture (big picture)
What this service is
A FastAPI gateway that fronts Lingma and exposes:
- OpenAI-compatible API (
/v1/models,/v1/chat/completions) - Anthropic Messages-compatible API (
/v1/messages,/v1/messages/count_tokens)
Both protocols share the same backend pool, backpressure guard, stats, and session reuse logic.
Request lifecycle (important for most changes)
- Authenticate request (
app/auth.py) - Normalize inbound protocol payload to internal message shape (
openai_schema.py/anthropic_schema.py) - Session-cache lookup (
app/session_cache.py) for prefix-based reuse - Pick backend instance (
app/lingma_pool.py) with affinity + least-in-flight - Acquire concurrency ticket (
app/concurrency.py) - Call Lingma via websocket/LSP client (
app/lingma_client.py) - Map upstream result/stream back to wire protocol in
app/main.py - Record stats and release ticket (including stream-finally paths)
Core module boundaries
app/main.py: API entrypoint + orchestration + wire-format adaptersapp/lingma_pool.py: multi-instance lifecycle, selection, health-aware fallbackapp/lingma_client.py: subprocess + LSP-over-WebSocket transport to Lingmaapp/session_cache.py: LRU+TTL cache of conversation-prefix -> upstream session id (+ instance binding)app/concurrency.py: in-flight guard and queue timeout/backpressure behaviorapp/stats.py: usage counters and Prometheus text
Protocol-specific notes
- Anthropic and OpenAI endpoints are separate adapters over shared internals.
- Response-side tool bridge is implemented: upstream Lingma tool events are surfaced as:
- OpenAI:
tool_calls(stream + non-stream) - Anthropic:
tool_use/tool_resultblocks (stream + non-stream)
- OpenAI:
- Request-side
tools/tool_choiceare accepted by schemas but not forwarded to Lingma.
Operational invariants to preserve
- One request must stay on one Lingma instance for session continuity.
- Session cache entries include instance identity; invalidate on unhealthy instance mismatch.
- Streaming paths must always release in-flight tickets in
finally. - Multi-instance mode must use isolated workdirs per instance.
Deployment/runtime model
- Container startup runs
python /app/app/bootstrap_lingma.pybefore uvicorn. - Compose mounts:
./data -> /app/data(persistent Lingma binary/cache/workdirs)./secrets -> /secrets:ro(session bundles, secrets)