mmc d0df089282 fix: harden responses streaming and tool-call fallback

Ensure /v1/responses streams always terminate with response.completed and normalize Lingma tool_code fallbacks into structured tool calls, including single-argument forms.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-04-20 19:24:02 +08:00

11 KiB

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Primary docs to read first

README.md (runtime commands, env model, API examples)
DESIGN.md (architecture decisions, module boundaries, request lifecycle)
.env.example (authoritative env var reference)

No Cursor/Copilot rule files were found in this repo (.cursorrules, .cursor/rules/, .github/copilot-instructions.md).

Common development commands

Start locally

pip install -r requirements.txt
uvicorn app.main:app --reload --port 8317

Start with Docker Compose

cp .env.example .env
mkdir -p data secrets
docker compose up -d --build
docker compose logs -f

Run tests

# current focused suite
python3 -m unittest tests/test_tool_call_bridge.py

# discover all unittest tests under tests/
python3 -m unittest discover -s tests -p "test_*.py"

# run a single test method
python3 -m unittest tests.test_tool_call_bridge.ToolCallBridgeTests.test_openai_non_stream_bridges_tool_calls

Smoke-check running gateway

API_KEY=$(grep '^API_KEYS=' .env | cut -d= -f2- | cut -d, -f1)
curl -s http://127.0.0.1:8317/healthz
curl -s http://127.0.0.1:8317/v1/models -H "Authorization: Bearer $API_KEY"

Linting/type-checking status

There is currently no repo-configured lint/type command (no ruff/flake8/mypy config found).
Do not invent tooling commands; if linting is needed, add tooling in a dedicated change first.

Architecture (big picture)

What this service is

A FastAPI gateway that fronts Lingma and exposes:

OpenAI-compatible API (/v1/models, /v1/chat/completions)
Anthropic Messages-compatible API (/v1/messages, /v1/messages/count_tokens)

Both protocols share the same backend pool, backpressure guard, stats, and session reuse logic.

Request lifecycle (important for most changes)

1. Authenticate request (app/auth.py)
2. Normalize inbound protocol payload to internal message shape (openai_schema.py / anthropic_schema.py)
3. Session-cache lookup (app/session_cache.py) for prefix-based reuse
4. Pick backend instance (app/lingma_pool.py) with affinity + least-in-flight
5. Acquire concurrency ticket (app/concurrency.py)
6. Call Lingma via websocket/LSP client (app/lingma_client.py)
7. Map upstream result/stream back to wire protocol in app/main.py
8. Record stats and release ticket (including stream-finally paths)
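
The "affinity + least-in-flight" selection step can be pictured with a minimal sketch. This is an illustration only, not the code in app/lingma_pool.py: the `Instance` dataclass, its fields, and `pick_instance` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    # Hypothetical stand-in for one pooled Lingma backend.
    name: str
    in_flight: int = 0
    healthy: bool = True


def pick_instance(instances: list[Instance], preferred: Optional[str] = None) -> Instance:
    """Prefer the session-bound instance; otherwise take the least-loaded healthy one."""
    healthy = [inst for inst in instances if inst.healthy]
    if not healthy:
        raise RuntimeError("no healthy Lingma instance available")
    if preferred is not None:
        for inst in healthy:
            if inst.name == preferred:
                return inst  # affinity hit: keep the session on its instance
    # Fallback: fewest in-flight requests; ties go to the earliest entry.
    return min(healthy, key=lambda inst: inst.in_flight)
```

Here `preferred` would come from the session-cache lookup in the previous step; when that instance is unhealthy or unknown, the sketch falls back to load-based selection.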

Core module boundaries

app/main.py: API entrypoint + orchestration + wire-format adapters
app/lingma_pool.py: multi-instance lifecycle, selection, health-aware fallback
app/lingma_client.py: subprocess + LSP-over-WebSocket transport to Lingma
app/session_cache.py: LRU+TTL cache of conversation-prefix -> upstream session id (+ instance binding)
app/concurrency.py: in-flight guard and queue timeout/backpressure behavior
app/stats.py: usage counters and Prometheus text
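
As a rough picture of what an LRU+TTL cache keyed by conversation prefix looks like, here is a small sketch. It is not the implementation in app/session_cache.py: the class name, method names, defaults, and the injectable `clock` parameter are all assumptions made for illustration, and the prefix key is treated as an opaque string.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class SessionCache:
    """LRU + TTL map: conversation-prefix key -> (session_id, instance_name)."""

    def __init__(self, max_entries: int = 1024, ttl_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps key -> (expires_at, session_id, instance); order tracks recency.
        self._entries = OrderedDict()

    def put(self, prefix_key: str, session_id: str, instance: str) -> None:
        self._entries[prefix_key] = (self._clock() + self._ttl, session_id, instance)
        self._entries.move_to_end(prefix_key)
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)  # evict the least recently used entry

    def get(self, prefix_key: str) -> Optional[tuple[str, str]]:
        entry = self._entries.get(prefix_key)
        if entry is None:
            return None
        expires_at, session_id, instance = entry
        if self._clock() >= expires_at:
            del self._entries[prefix_key]  # expired: drop it and report a miss
            return None
        self._entries.move_to_end(prefix_key)  # refresh recency on a hit
        return session_id, instance

    def invalidate(self, prefix_key: str) -> None:
        self._entries.pop(prefix_key, None)
```

Storing the instance name next to the session id is what lets a caller notice that the bound instance is no longer healthy and call `invalidate` instead of reusing the session.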

Protocol-specific notes

Anthropic and OpenAI endpoints are separate adapters over shared internals.
The response-side tool bridge is implemented; upstream Lingma tool events are surfaced as:
- OpenAI: tool_calls (stream + non-stream)
- Anthropic: tool_use / tool_result blocks (stream + non-stream)
Request-side tools / tool_choice are accepted by schemas but not forwarded to Lingma.
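
To make the two response shapes concrete, the sketch below maps one tool event to an OpenAI `tool_calls` entry and to an Anthropic `tool_use` block. The input event shape (`id`, `name`, `arguments`) and both function names are hypothetical; the actual Lingma event format and the adapter code in app/main.py may differ.

```python
import json
from typing import Any


def to_openai_tool_call(event: dict[str, Any]) -> dict[str, Any]:
    """Shape one tool event as an entry of an OpenAI `tool_calls` array."""
    return {
        "id": event["id"],
        "type": "function",
        "function": {
            "name": event["name"],
            # OpenAI carries arguments as a JSON-encoded string.
            "arguments": json.dumps(event["arguments"]),
        },
    }


def to_anthropic_tool_use(event: dict[str, Any]) -> dict[str, Any]:
    """Shape the same event as an Anthropic `tool_use` content block."""
    return {
        "type": "tool_use",
        "id": event["id"],
        "name": event["name"],
        # Anthropic carries the input as a JSON object, not a string.
        "input": event["arguments"],
    }
```

The main difference to keep in mind when touching either adapter is the argument encoding: a JSON string on the OpenAI side, a JSON object on the Anthropic side.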

Operational invariants to preserve

One request must stay on one Lingma instance for session continuity.
Session cache entries include instance identity; invalidate on unhealthy instance mismatch.
Streaming paths must always release in-flight tickets in finally.
Multi-instance mode must use isolated workdirs per instance.
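
The "release in finally" invariant for streaming paths can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in app/concurrency.py: `InFlightGuard`, its `queue_timeout` parameter, and `stream_with_ticket` are hypothetical names.

```python
import asyncio
from typing import AsyncIterator


class InFlightGuard:
    """Hypothetical in-flight limiter with a queue timeout."""

    def __init__(self, limit: int, queue_timeout: float) -> None:
        self._sem = asyncio.Semaphore(limit)
        self._timeout = queue_timeout
        self.in_flight = 0

    async def acquire(self) -> None:
        # Raises asyncio.TimeoutError when the wait exceeds the queue timeout.
        await asyncio.wait_for(self._sem.acquire(), timeout=self._timeout)
        self.in_flight += 1

    def release(self) -> None:
        self.in_flight -= 1
        self._sem.release()


async def stream_with_ticket(guard: InFlightGuard,
                             upstream: AsyncIterator[str]) -> AsyncIterator[str]:
    """Relay upstream chunks while holding exactly one in-flight ticket."""
    await guard.acquire()
    try:
        async for chunk in upstream:
            yield chunk
    finally:
        # Runs on normal completion, upstream errors, and generator close.
        guard.release()
```

Because the release sits in `finally`, an upstream failure halfway through a stream still returns the ticket, so one broken stream cannot permanently shrink the available concurrency.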

Deployment/runtime model

Container startup runs python /app/app/bootstrap_lingma.py before uvicorn.
Compose mounts:
- ./data -> /app/data (persistent Lingma binary/cache/workdirs)
- ./secrets -> /secrets:ro (session bundles, secrets)

CLAUDE.md

Behavioral guidelines to reduce common LLM coding mistakes. Merge with project-specific instructions as needed.

Tradeoff: These guidelines bias toward caution over speed. For trivial tasks, use judgment.

1. Think Before Coding

Don't assume. Don't hide confusion. Surface tradeoffs.

Before implementing:

State your assumptions explicitly. If uncertain, ask.
If multiple interpretations exist, present them - don't pick silently.
If a simpler approach exists, say so. Push back when warranted.
If something is unclear, stop. Name what's confusing. Ask.

2. Simplicity First

Minimum code that solves the problem. Nothing speculative.

No features beyond what was asked.
No abstractions for single-use code.
No "flexibility" or "configurability" that wasn't requested.
No error handling for impossible scenarios.
If you write 200 lines and it could be 50, rewrite it.

Ask yourself: "Would a senior engineer say this is overcomplicated?" If yes, simplify.

3. Surgical Changes

Touch only what you must. Clean up only your own mess.

When editing existing code:

Don't "improve" adjacent code, comments, or formatting.
Don't refactor things that aren't broken.
Match existing style, even if you'd do it differently.
If you notice unrelated dead code, mention it - don't delete it.

When your changes create orphans:

Remove imports/variables/functions that YOUR changes made unused.
Don't remove pre-existing dead code unless asked.

The test: Every changed line should trace directly to the user's request.

4. Goal-Driven Execution

Define success criteria. Loop until verified.

Transform tasks into verifiable goals:

"Add validation" → "Write tests for invalid inputs, then make them pass"
"Fix the bug" → "Write a test that reproduces it, then make it pass"
"Refactor X" → "Ensure tests pass before and after"

For multi-step tasks, state a brief plan:

1. [Step] → verify: [check]
2. [Step] → verify: [check]
3. [Step] → verify: [check]

Strong success criteria let you loop independently. Weak criteria ("make it work") require constant clarification.

These guidelines are working if: fewer unnecessary changes in diffs, fewer rewrites due to overcomplication, and clarifying questions come before implementation rather than after mistakes.

GitNexus — Code Intelligence

This project is indexed by GitNexus as lingma-openai-gateway (1093 symbols, 2685 relationships, 97 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.

If any GitNexus tool warns the index is stale, run npx gitnexus analyze in terminal first.

Always Do

MUST run impact analysis before editing any symbol. Before modifying a function, class, or method, run gitnexus_impact({target: "symbolName", direction: "upstream"}) and report the blast radius (direct callers, affected processes, risk level) to the user.
MUST run gitnexus_detect_changes() before committing to verify your changes only affect expected symbols and execution flows.
MUST warn the user if impact analysis returns HIGH or CRITICAL risk before proceeding with edits.
When exploring unfamiliar code, use gitnexus_query({query: "concept"}) to find execution flows instead of grepping. It returns process-grouped results ranked by relevance.
When you need full context on a specific symbol — callers, callees, which execution flows it participates in — use gitnexus_context({name: "symbolName"}).

Never Do

NEVER edit a function, class, or method without first running gitnexus_impact on it.
NEVER ignore HIGH or CRITICAL risk warnings from impact analysis.
NEVER rename symbols with find-and-replace — use gitnexus_rename which understands the call graph.
NEVER commit changes without running gitnexus_detect_changes() to check affected scope.

Resources

| Resource | Use for |
| --- | --- |
| `gitnexus://repo/lingma-openai-gateway/context` | Codebase overview, check index freshness |
| `gitnexus://repo/lingma-openai-gateway/clusters` | All functional areas |
| `gitnexus://repo/lingma-openai-gateway/processes` | All execution flows |
| `gitnexus://repo/lingma-openai-gateway/process/{name}` | Step-by-step execution trace |

CLI

| Task | Read this skill file |
| --- | --- |
| Understand architecture / "How does X work?" | `.claude/skills/gitnexus/gitnexus-exploring/SKILL.md` |
| Blast radius / "What breaks if I change X?" | `.claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md` |
| Trace bugs / "Why is X failing?" | `.claude/skills/gitnexus/gitnexus-debugging/SKILL.md` |
| Rename / extract / split / refactor | `.claude/skills/gitnexus/gitnexus-refactoring/SKILL.md` |
| Tools, resources, schema reference | `.claude/skills/gitnexus/gitnexus-guide/SKILL.md` |
| Index, status, clean, wiki CLI commands | `.claude/skills/gitnexus/gitnexus-cli/SKILL.md` |
