CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Primary docs to read first
- README.md (runtime commands, env model, API examples)
- DESIGN.md (architecture decisions, module boundaries, request lifecycle)
- .env.example (authoritative env var reference)
No Cursor/Copilot rule files were found in this repo (.cursorrules, .cursor/rules/, .github/copilot-instructions.md).
Common development commands
Start locally
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8317
Start with Docker Compose
cp .env.example .env
mkdir -p data secrets
docker compose up -d --build
docker compose logs -f
Run tests
# current focused suite
python3 -m unittest tests/test_tool_call_bridge.py
# discover all unittest tests under tests/
python3 -m unittest discover -s tests -p "test_*.py"
# run a single test method
python3 -m unittest tests.test_tool_call_bridge.ToolCallBridgeTests.test_openai_non_stream_bridges_tool_calls
Smoke-check running gateway
API_KEY=$(grep '^API_KEYS=' .env | cut -d= -f2 | cut -d, -f1)
curl -s http://127.0.0.1:8317/healthz
curl -s http://127.0.0.1:8317/v1/models -H "Authorization: Bearer $API_KEY"
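For scripted checks, the same smoke test can be sketched in Python. This is only an illustration: first_api_key and models_request are hypothetical helper names, not part of this repo, and the port is taken from the uvicorn command above. Unlike the cut pipeline, the split on the first = keeps keys that themselves contain an = character.

```python
from typing import Optional
from urllib.request import Request


def first_api_key(env_text: str) -> Optional[str]:
    """Return the first comma-separated value of the API_KEYS= line, or None."""
    for line in env_text.splitlines():
        if line.startswith("API_KEYS="):
            value = line.split("=", 1)[1]
            first = value.split(",")[0].strip()
            return first or None
    return None


def models_request(api_key: str, base_url: str = "http://127.0.0.1:8317") -> Request:
    """Build (but do not send) the authenticated GET /v1/models request."""
    return Request(
        f"{base_url}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

With the gateway running, urllib.request.urlopen(models_request(key)) would send the request; the helpers themselves do no network I/O.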
Linting/type-checking status
- There is currently no repo-configured lint/type command (no ruff/flake8/mypy config found).
- Do not invent tooling commands; if linting is needed, add tooling in a dedicated change first.
Architecture (big picture)
What this service is
A FastAPI gateway that fronts Lingma and exposes:
- OpenAI-compatible API (/v1/models, /v1/chat/completions)
- Anthropic Messages-compatible API (/v1/messages, /v1/messages/count_tokens)
Both protocols share the same backend pool, backpressure guard, stats, and session reuse logic.
Request lifecycle (important for most changes)
- Authenticate request (app/auth.py)
- Normalize inbound protocol payload to internal message shape (openai_schema.py / anthropic_schema.py)
- Session-cache lookup (app/session_cache.py) for prefix-based reuse
- Pick backend instance (app/lingma_pool.py) with affinity + least-in-flight
- Acquire concurrency ticket (app/concurrency.py)
- Call Lingma via websocket/LSP client (app/lingma_client.py)
- Map upstream result/stream back to wire protocol in app/main.py
- Record stats and release ticket (including stream-finally paths)
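The middle of this lifecycle (cache lookup, instance selection, ticket, upstream call, release) can be sketched in a few lines. This is a simplified illustration, not the code in app/main.py: Instance, pick_instance, handle, and the call_upstream callback are hypothetical names, messages are plain strings, and the concurrency ticket is reduced to a counter.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Instance:
    """Hypothetical stand-in for one Lingma backend."""
    name: str
    healthy: bool = True
    in_flight: int = 0


def pick_instance(instances: List[Instance], preferred: Optional[str] = None) -> Instance:
    """Prefer the instance a cached session is bound to, else the least-loaded healthy one."""
    healthy = [i for i in instances if i.healthy]
    if not healthy:
        raise RuntimeError("no healthy instance available")
    for inst in healthy:
        if inst.name == preferred:
            return inst
    return min(healthy, key=lambda i: i.in_flight)


Cache = Dict[Tuple[str, ...], Tuple[str, str]]  # prefix -> (instance name, session id)
Upstream = Callable[[Instance, Optional[str], List[str]], Tuple[str, str]]


def handle(messages: List[str], instances: List[Instance],
           cache: Cache, call_upstream: Upstream) -> str:
    cached = cache.get(tuple(messages[:-1]))        # session-cache lookup by prefix
    inst = pick_instance(instances, cached[0] if cached else None)
    # Reuse the session only if we landed on the instance that owns it.
    session_id = cached[1] if cached and cached[0] == inst.name else None
    inst.in_flight += 1                             # acquire ticket (simplified)
    try:
        reply, new_session = call_upstream(inst, session_id, messages)
        # Remember the session under the prefix the next turn will present.
        cache[tuple(messages) + (reply,)] = (inst.name, new_session)
        return reply
    finally:
        inst.in_flight -= 1                         # release even if upstream raised
```

The point of the sketch is the ordering: affinity is checked before load, and the release sits in finally so a failed upstream call cannot leak a ticket.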
Core module boundaries
- app/main.py: API entrypoint + orchestration + wire-format adapters
- app/lingma_pool.py: multi-instance lifecycle, selection, health-aware fallback
- app/lingma_client.py: subprocess + LSP-over-WebSocket transport to Lingma
- app/session_cache.py: LRU+TTL cache of conversation-prefix -> upstream session id (+ instance binding)
- app/concurrency.py: in-flight guard and queue timeout/backpressure behavior
- app/stats.py: usage counters and Prometheus text
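As a rough illustration of what "LRU+TTL cache with instance binding" means, here is a minimal sketch built on collections.OrderedDict. The class name, method names, defaults, and the injectable clock are all assumptions made for this example; the real app/session_cache.py may be structured differently.

```python
import time
from collections import OrderedDict
from typing import Callable, Hashable, Optional, Tuple


class SessionCache:
    """Hypothetical LRU+TTL map: conversation prefix -> (instance name, session id)."""

    def __init__(self, max_entries: int = 1024, ttl_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: "OrderedDict[Hashable, Tuple[float, Tuple[str, str]]]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Tuple[str, str]]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:      # TTL: drop stale entries on read
            del self._entries[key]
            return None
        self._entries.move_to_end(key)       # LRU: mark as most recently used
        return value

    def put(self, key: Hashable, instance: str, session_id: str) -> None:
        self._entries[key] = (self._clock() + self._ttl, (instance, session_id))
        self._entries.move_to_end(key)
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)  # evict least recently used

    def invalidate_instance(self, instance: str) -> None:
        """Drop every entry bound to an instance, e.g. once it is marked unhealthy."""
        stale = [k for k, (_, v) in self._entries.items() if v[0] == instance]
        for k in stale:
            del self._entries[k]
```

Storing the instance name next to the session id is what lets a caller detect that a cached session belongs to a backend it can no longer use.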
Protocol-specific notes
- Anthropic and OpenAI endpoints are separate adapters over shared internals.
- Response-side tool bridge is implemented: upstream Lingma tool events are surfaced as:
  - OpenAI: tool_calls (stream + non-stream)
  - Anthropic: tool_use / tool_result blocks (stream + non-stream)
- Request-side tools / tool_choice are accepted by schemas but not forwarded to Lingma.
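To make the two response shapes concrete, the sketch below maps one tool event to an OpenAI tool_calls entry and to an Anthropic tool_use block. The output shapes follow the public wire formats; the input event layout (id, name, arguments) and both function names are assumptions for illustration, since the actual Lingma event format is not described here.

```python
import json
from typing import Any, Dict


def to_openai_tool_call(event: Dict[str, Any]) -> Dict[str, Any]:
    """One entry for choices[].message.tool_calls; arguments is a JSON string."""
    return {
        "id": event["id"],
        "type": "function",
        "function": {
            "name": event["name"],
            "arguments": json.dumps(event["arguments"]),
        },
    }


def to_anthropic_tool_use(event: Dict[str, Any]) -> Dict[str, Any]:
    """One content block of type tool_use; input stays a JSON object."""
    return {
        "type": "tool_use",
        "id": event["id"],
        "name": event["name"],
        "input": event["arguments"],
    }


# Hypothetical upstream event, for illustration only.
example_event = {"id": "call_1", "name": "read_file", "arguments": {"path": "README.md"}}
```

The main asymmetry to keep in mind is that OpenAI carries arguments as a serialized string while Anthropic carries them as a structured object.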
Operational invariants to preserve
- One request must stay on one Lingma instance for session continuity.
- Session cache entries include instance identity; invalidate on unhealthy instance mismatch.
- Streaming paths must always release in-flight tickets in finally.
- Multi-instance mode must use isolated workdirs per instance.
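The streaming invariant is the easiest one to break, so here is a minimal sketch of the pattern. It assumes the ticket can be modeled as an asyncio.Semaphore and uses a hypothetical stream_with_ticket wrapper; the real guard in app/concurrency.py may expose a different interface.

```python
import asyncio
from typing import AsyncIterator


async def stream_with_ticket(semaphore: asyncio.Semaphore,
                             upstream: AsyncIterator[str]) -> AsyncIterator[str]:
    """Yield upstream chunks while holding one in-flight ticket."""
    await semaphore.acquire()
    try:
        async for chunk in upstream:
            yield chunk
    finally:
        # Runs on normal completion, on an upstream error, and when the
        # generator is closed early (for example after a client disconnect).
        semaphore.release()


async def demo() -> bool:
    async def chunks() -> AsyncIterator[str]:
        for c in ("a", "b", "c"):
            yield c

    sem = asyncio.Semaphore(1)
    gen = stream_with_ticket(sem, chunks())
    await gen.__anext__()          # consume one chunk; the ticket is now held
    held = sem.locked()
    await gen.aclose()             # simulate the consumer going away early
    return held and not sem.locked()
```

Note that an abandoned async generator only runs its finally block once it is closed, which is why the demo calls aclose() explicitly rather than relying on garbage collection.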
Deployment/runtime model
- Container startup runs python /app/app/bootstrap_lingma.py before uvicorn.
- Compose mounts:
  - ./data -> /app/data (persistent Lingma binary/cache/workdirs)
  - ./secrets -> /secrets:ro (session bundles, secrets)
CLAUDE.md
Behavioral guidelines to reduce common LLM coding mistakes. Merge with project-specific instructions as needed.
Tradeoff: These guidelines bias toward caution over speed. For trivial tasks, use judgment.
1. Think Before Coding
Don't assume. Don't hide confusion. Surface tradeoffs.
Before implementing:
- State your assumptions explicitly. If uncertain, ask.
- If multiple interpretations exist, present them - don't pick silently.
- If a simpler approach exists, say so. Push back when warranted.
- If something is unclear, stop. Name what's confusing. Ask.
2. Simplicity First
Minimum code that solves the problem. Nothing speculative.
- No features beyond what was asked.
- No abstractions for single-use code.
- No "flexibility" or "configurability" that wasn't requested.
- No error handling for impossible scenarios.
- If you write 200 lines and it could be 50, rewrite it.
Ask yourself: "Would a senior engineer say this is overcomplicated?" If yes, simplify.
3. Surgical Changes
Touch only what you must. Clean up only your own mess.
When editing existing code:
- Don't "improve" adjacent code, comments, or formatting.
- Don't refactor things that aren't broken.
- Match existing style, even if you'd do it differently.
- If you notice unrelated dead code, mention it - don't delete it.
When your changes create orphans:
- Remove imports/variables/functions that YOUR changes made unused.
- Don't remove pre-existing dead code unless asked.
The test: Every changed line should trace directly to the user's request.
4. Goal-Driven Execution
Define success criteria. Loop until verified.
Transform tasks into verifiable goals:
- "Add validation" → "Write tests for invalid inputs, then make them pass"
- "Fix the bug" → "Write a test that reproduces it, then make it pass"
- "Refactor X" → "Ensure tests pass before and after"
For multi-step tasks, state a brief plan:
1. [Step] → verify: [check]
2. [Step] → verify: [check]
3. [Step] → verify: [check]
Strong success criteria let you loop independently. Weak criteria ("make it work") require constant clarification.
These guidelines are working if: fewer unnecessary changes in diffs, fewer rewrites due to overcomplication, and clarifying questions come before implementation rather than after mistakes.
GitNexus — Code Intelligence
This project is indexed by GitNexus as lingma-openai-gateway (1093 symbols, 2685 relationships, 97 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
If any GitNexus tool warns the index is stale, run npx gitnexus analyze in terminal first.
Always Do
- MUST run impact analysis before editing any symbol. Before modifying a function, class, or method, run gitnexus_impact({target: "symbolName", direction: "upstream"}) and report the blast radius (direct callers, affected processes, risk level) to the user.
- MUST run gitnexus_detect_changes() before committing to verify your changes only affect expected symbols and execution flows.
- MUST warn the user if impact analysis returns HIGH or CRITICAL risk before proceeding with edits.
- When exploring unfamiliar code, use gitnexus_query({query: "concept"}) to find execution flows instead of grepping. It returns process-grouped results ranked by relevance.
- When you need full context on a specific symbol (callers, callees, which execution flows it participates in), use gitnexus_context({name: "symbolName"}).
Never Do
- NEVER edit a function, class, or method without first running gitnexus_impact on it.
- NEVER ignore HIGH or CRITICAL risk warnings from impact analysis.
- NEVER rename symbols with find-and-replace; use gitnexus_rename, which understands the call graph.
- NEVER commit changes without running gitnexus_detect_changes() to check affected scope.
Resources
| Resource | Use for |
|---|---|
| gitnexus://repo/lingma-openai-gateway/context | Codebase overview, check index freshness |
| gitnexus://repo/lingma-openai-gateway/clusters | All functional areas |
| gitnexus://repo/lingma-openai-gateway/processes | All execution flows |
| gitnexus://repo/lingma-openai-gateway/process/{name} | Step-by-step execution trace |
CLI
| Task | Read this skill file |
|---|---|
| Understand architecture / "How does X work?" | .claude/skills/gitnexus/gitnexus-exploring/SKILL.md |
| Blast radius / "What breaks if I change X?" | .claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md |
| Trace bugs / "Why is X failing?" | .claude/skills/gitnexus/gitnexus-debugging/SKILL.md |
| Rename / extract / split / refactor | .claude/skills/gitnexus/gitnexus-refactoring/SKILL.md |
| Tools, resources, schema reference | .claude/skills/gitnexus/gitnexus-guide/SKILL.md |
| Index, status, clean, wiki CLI commands | .claude/skills/gitnexus/gitnexus-cli/SKILL.md |