45 Commits

Author SHA1 Message Date
GitHub Actions
e3d3a63492 refactor: extract OpenAI Responses route wrapper
Keep app.main.v1_responses as the compatibility entrypoint while moving the Responses wrapper and SSE bridge into a dedicated module. This reduces app/main.py without changing the existing Responses behavior or test patch points.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 10:13:49 +08:00
GitHub Actions
b479294af4 refactor: share streaming tool event normalization
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 08:07:44 +08:00
GitHub Actions
aac6e2785d refactor: share non-stream tool event normalization
Deduplicate allowlist filtering and forced-tool fallback parsing across the OpenAI and Anthropic non-stream bridge paths while preserving existing wire behavior.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 07:53:26 +08:00
GitHub Actions
5a7553b35b refactor: share execution prep for tool-call phase
Keep the current tool-call bridge contract stable while extracting shared
execution setup and tightening Anthropic forwarding regressions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 07:39:33 +08:00
mmc
4748432501 fix: run bootstrap via module to avoid stdlib http shadowing
Switch container startup from file execution to module execution so
urllib can import stdlib http.client reliably.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 13:57:44 +08:00
mmc
83d69097c9 fix: enable tool forwarding by default and add config regression tests
Switch TOOL_FORWARD_ENABLED default to true in runtime config and .env.example,
and add regression tests covering default-on and explicit false behavior.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 13:41:41 +08:00
GitHub Actions
0e146e60d9 refactor: extract Phase 1 gateway helpers
Move tool bridge and responses adapter helpers out of app.main so the main entrypoint can shrink without changing route orchestration behavior.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 08:05:09 +08:00
mmc
d0df089282 fix: harden responses streaming and tool-call fallback
Ensure /v1/responses streams always terminate with response.completed and normalize Lingma tool_code fallbacks into structured tool calls, including single-argument forms.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-20 19:24:02 +08:00
mmc
866a212573 fix: restore proper SSE frame delimiters
Emit real newline-delimited SSE frames for /v1/responses so clients can parse response.completed before the stream closes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-20 15:08:16 +08:00
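Note on 866a212573: an SSE frame is complete only once a blank line follows its fields, so a client cannot parse `response.completed` until the trailing `\n\n` arrives. The sketch below illustrates that framing rule and is not the gateway's code: `sse_frame` and `split_frames` are hypothetical helpers, and the payload shapes are placeholders.

```python
import json


def sse_frame(event: str, payload: dict) -> str:
    """Serialize one SSE frame; the blank line (second newline) terminates it."""
    data = json.dumps(payload, separators=(",", ":"))
    return f"event: {event}\ndata: {data}\n\n"


def split_frames(stream: str) -> list[str]:
    """Split buffered SSE text on the blank-line delimiter, as a client parser would."""
    return [frame for frame in stream.split("\n\n") if frame]


# Two frames back to back: a text delta followed by the terminal event.
stream = sse_frame("response.output_text.delta", {"delta": "hi"}) + sse_frame(
    "response.completed", {"status": "completed"}
)
frames = split_frames(stream)
```

If the delimiter were emitted as the two literal characters `\` and `n` instead of a real newline, `split_frames` would see a single unterminated frame, which is the kind of failure the commit message describes.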
mmc
5e6c1c1a63 fix: harden responses stream termination
Ensure /v1/responses streaming always emits completion frames on upstream EOF, errors, and cancellation, and add targeted diagnostics for interrupted Lingma streams.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-20 14:55:32 +08:00
GitHub Actions
12a4d9584e feat: harden cache reuse semantics and expand protocol regressions
Stabilize cross-protocol ask-mode/streaming behavior and reduce session-reuse branch collisions, then add focused docs/tests for multimodal normalization and pool/stats/config paths.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-20 14:26:11 +08:00
GitHub Actions
b96b91e5b7 test: add baseline gateway regression suites
Add focused unittest coverage for auth/concurrency, schema normalization, and session-cache tooling behavior, and ignore local .gitnexus index artifacts.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-20 13:25:36 +08:00
GitHub Actions
c08dea89a2 fix: ensure responses stream always completes
Emit a fallback response.completed and [DONE] when upstream SSE closes early so OpenAI /v1/responses clients do not fail on incomplete streams.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-20 13:23:43 +08:00
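Note on c08dea89a2: the fallback can be pictured as a wrapper around the upstream frame iterator that appends the terminal frames when upstream stops early. This is a minimal sketch under assumptions, not the actual bridge: `ensure_terminated`, `collect`, and the frame contents are hypothetical, and error and cancellation handling (covered by the later 5e6c1c1a63) is left out.

```python
import asyncio
from typing import AsyncIterator

# Placeholder terminal frames; the real payloads carry more fields.
COMPLETED = 'event: response.completed\ndata: {"type":"response.completed"}\n\n'
DONE = "data: [DONE]\n\n"


async def ensure_terminated(upstream: AsyncIterator[str]) -> AsyncIterator[str]:
    """Pass frames through, then guarantee response.completed followed by [DONE]."""
    saw_completed = False
    async for frame in upstream:
        if frame == DONE:
            continue  # re-emitted below so it always comes last
        if frame.startswith("event: response.completed"):
            saw_completed = True
        yield frame
    if not saw_completed:
        yield COMPLETED  # upstream closed early: synthesize the completion
    yield DONE


async def collect(frames: AsyncIterator[str]) -> list:
    """Drain an async iterator into a list (test and demo helper)."""
    return [frame async for frame in frames]


async def truncated_upstream() -> AsyncIterator[str]:
    """Simulated upstream that closes after one delta, with no terminal frame."""
    yield 'event: response.output_text.delta\ndata: {"delta":"hi"}\n\n'


output = asyncio.run(collect(ensure_terminated(truncated_upstream())))
```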
GitHub Actions
c9bd71f727 feat: add OpenAI /v1/responses adapter via chat flow
Implement a thin responses layer that reuses existing chat/completions execution so auth, pooling, streaming, tool passthrough, and error semantics stay aligned across APIs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-20 13:11:00 +08:00
GitHub Actions
56c57a4901 docs: sync DESIGN with current tooling behavior
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-20 08:31:45 +08:00
GitHub Actions
df80a86310 docs: refocus README on quickstart and runbook flow
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-20 08:11:00 +08:00
GitHub Actions
15cd5e8770 fix: close forced tool-choice with structured fallback
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-20 07:18:01 +08:00
GitHub Actions
63583712a8 fix: fallback agent payload source to numeric value
Keep Lingma chat/ask payload source as numeric 1 for agent mode A/B validation against remote upstream timeout behavior.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-20 06:36:07 +08:00
GitHub Actions
c67a9c3d61 fix: align agent payload semantics with VSCode tool flow
Force OpenAI tooling-context requests into agent mode and align Lingma ask payload fields for agent requests so server-side tool path matches VSCode semantics.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-19 23:19:52 +08:00
GitHub Actions
e208025f35 fix: emit Lingma tool approve/invoke roundtrip
Forward tool/call/sync and tool/invoke events to Lingma with auto-approve and invokeResult so tool calls can complete end-to-end.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-19 21:35:05 +08:00
GitHub Actions
3498b81fa2 fix: enable anthropic agent mode for tooling requests
Use agent ask_mode for Anthropic messages with tooling context so tool/write flows are executed, and add regression coverage plus docs/env updates for TOOL_FORWARD_ENABLED.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-19 20:15:14 +08:00
GitHub Actions
e600bae27c fix: harden tooling session reuse and event routing
Ensure session reuse is disabled for tooling contexts, include tool config in cache keys, and stabilize tool event merge/routing with expanded bridge tests.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
v0.1.1
2026-04-19 19:29:30 +08:00
GitHub Actions
5aa7fbfae5 fix: align Lingma tool event lifecycle handling
Handle tool/invokeResult and richer tool/call/sync payloads in the client,
and document/retest the verified VSCode monitoring workflow for tool events.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-19 09:49:01 +08:00
GitHub Actions
1c7b86e2c0 feat: bridge Lingma tool events to OpenAI/Anthropic responses
Add structured tool event propagation from Lingma stream/finish metadata and map it to OpenAI tool_calls and Anthropic tool_use/tool_result in both streaming and non-streaming responses. Add focused bridge tests and update docs/design notes to match current behavior.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 22:34:43 +08:00
GitHub Actions
b3fd8800f7 fix: align Anthropic endpoints for Claude Code compatibility
Add /v1/messages/count_tokens and switch /v1/models to Anthropic-style key auth so Claude Code probes succeed consistently.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
v0.1.0
2026-04-18 20:05:24 +08:00
GitHub Actions
0b08dc6573 feat: Anthropic Messages API compat (/v1/messages)
Add a wire-compatible Anthropic endpoint alongside the existing OpenAI one
so Claude Code / anthropic-sdk / Cursor Agent can hit Lingma directly.

- app/anthropic_schema.py (new): request model + content-block flattener
  + internal-messages adapter + affinity key helper. Handles text / image /
  tool_use / tool_result blocks; unknown types degrade gracefully.
- app/auth.py: add require_anthropic_key (x-api-key, Bearer fallback)
  and AnthropicAuthError so auth failures render in Anthropic's error
  envelope instead of FastAPI's {detail:...} wrapper.
- app/main.py: POST /v1/messages. Shares LingmaPool / SessionCache /
  InFlightGuard / StatsCollector with the OpenAI path — same api_key +
  same conversation prefix hits the same upstream sessionId across both
  protocols (KV cache carries over). Streaming emits the named Anthropic
  event sequence (message_start / content_block_start / content_block_delta
  / content_block_stop / message_delta / message_stop). No claude-*
  model mapping table: resolve_model's default fallback handles it.
- README.md / DESIGN.md: document the new endpoint, add decision 5.12,
  iteration history M5, and a 4.3b streaming flow diagram.
- Bump FastAPI app version to 0.4.0.

Made-with: Cursor
2026-04-18 15:40:43 +08:00
GitHub Actions
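d9dffbb8ba docs: restructure README + add DESIGN.md (white-box handbook for secondary development)
Rewrite the README into a layered structure: architecture overview + quick start +
configuration tables grouped by topic + API reference + common scenarios + upgrade notes +
troubleshooting + entry points for secondary development. Compared with the old version:
easier to navigate, breaking changes are explicitly annotated with upgrade paths, and
troubleshooting covers the common production pitfalls.

DESIGN.md is a brand-new engineering handbook covering: project goals/non-goals, component
data flow, a module responsibility table, ASCII diagrams of the 6 core flows (startup,
non-streaming/streaming chat, subprocess + LSP, bundle, auto-login, shutdown), 11 key design
decisions (each with problem / solution / trade-offs / why the alternatives were rejected),
an extension guide (common requirement → which files to change), known issues / TODO, the
full iteration history (M1 to M4, plus the root cause of the M3 performance bug), and a
Lingma LSP protocol quick reference.

Goal: a new team member, or yourself a few months from now, can get a clear picture of the
whole project within a day.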

Made-with: Cursor
2026-04-18 10:36:17 +08:00
GitHub Actions
2febc37c2c prod hardening: admin/metrics authz split, subprocess lifecycle, parallel pool start, HEALTHCHECK
- authz: new ADMIN_TOKEN gates /internal/*; METRICS_PUBLIC=false by default, so
  /metrics returns 503 when neither METRICS_TOKEN nor API_KEYS is set
  (previously leaked pool topology). Startup logs loudly if API_KEYS is empty
  or admin falls back to chat keys.
- lingma_client: keep a Popen handle instead of orphaning Lingma with
  start_new_session, drain stderr to logger at DEBUG, SIGTERM -> 5s grace ->
  SIGKILL on shutdown. Fixes the zombie-process leak on container reload.
- pool: asyncio.gather to start N instances concurrently; N=2 pool shaves
  ~startup_timeout seconds off boot.
- Dockerfile: HEALTHCHECK hits /healthz and greps for pool_ready>0 so Docker
  / compose orchestrators see "stuck on login" as unhealthy.

Made-with: Cursor
2026-04-18 10:22:13 +08:00
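Note on 2febc37c2c: the shutdown sequence it describes (SIGTERM, a 5 s grace period, then SIGKILL) maps directly onto the standard `subprocess.Popen` API. The sketch below is an illustration only. `stop_process` is a hypothetical name, the child here is a sleeping Python process rather than Lingma, and the stderr draining mentioned in the commit is omitted.

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace_sec: float = 5.0) -> int:
    """Ask the child to exit, wait up to grace_sec, then force-kill it."""
    if proc.poll() is None:          # still running
        proc.terminate()             # SIGTERM on POSIX
        try:
            proc.wait(timeout=grace_sec)
        except subprocess.TimeoutExpired:
            proc.kill()              # SIGKILL on POSIX
            proc.wait()              # reap it so no zombie is left behind
    return proc.returncode


# Demo: a child that would otherwise sleep for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_process(child, grace_sec=5.0)
```

Keeping the `Popen` handle is what makes the final `wait()` possible; without it the parent has nothing to reap.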
GitHub Actions
3130533888 chore: wire read-only secrets/ volume for session bundles
Mounts ./secrets to /secrets:ro so LINGMA_SESSION_BUNDLE_FILE can point
at a host-managed file without the bundle ever being baked into the
image or committed to git. secrets/ is git-ignored except for .gitkeep
so the directory exists on fresh clones.

Made-with: Cursor
2026-04-18 09:47:03 +08:00
GitHub Actions
4e08d1af36 feat: session bundle import/export to skip Playwright auto-login
Adds a lightweight way to pre-seed a Lingma workDir with an existing
logged-in session:

- New module session_bundle.py packs/unpacks only the four cache files
  that make up a Lingma login (id, user, quota, config.json). Everything
  else (db, logs, index, diagnosis) stays local so bundles stay tiny
  and never leak session-specific artefacts.
- Safety: path-traversal/symlink members are rejected; size is capped;
  refuses to export from a workDir that isn't actually logged in;
  sensitive cache/user is chmod'd 0600 on restore.
- LingmaAccount gains optional session_bundle_b64 / session_bundle_file;
  LINGMA_SESSION_BUNDLE[_FILE] env provide the singleton fallback.
  Credentials become optional when a bundle is supplied.
- LingmaPool.start() restores the bundle into each instance workDir
  only if it isn't already logged in, so persistent volumes aren't
  clobbered and a corrupt bundle falls back to Playwright gracefully.
- POST /internal/session/export returns the bundle as base64; ?instance=
  selects a specific pool instance. Requires an authed, already-logged-in
  instance to prevent exporting empties.
- README + .env.example document the end-to-end flow.

Made-with: Cursor
2026-04-18 09:39:58 +08:00
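Note on 4e08d1af36: the safety checks listed above (path traversal, symlinks, size cap) could look roughly like the validator below. Everything here is an assumption made for illustration: the commit does not state the archive format, so tar is assumed, and `rejection_reason` and `MAX_MEMBER_BYTES` are hypothetical names, not identifiers from session_bundle.py.

```python
import tarfile
from pathlib import PurePosixPath
from typing import Optional

MAX_MEMBER_BYTES = 1 << 20  # hypothetical cap: 1 MiB per member


def rejection_reason(member: tarfile.TarInfo,
                     max_bytes: int = MAX_MEMBER_BYTES) -> Optional[str]:
    """Return why a bundle member is unsafe to extract, or None if it is acceptable."""
    path = PurePosixPath(member.name)
    if path.is_absolute() or ".." in path.parts:
        return "path escapes the target directory"
    if member.issym() or member.islnk():
        return "symlinks and hard links are not allowed"
    if not member.isfile():
        return "only regular files are allowed"
    if member.size > max_bytes:
        return "member exceeds the size cap"
    return None


# Three illustrative members: one acceptable, one traversal attempt, one symlink.
good = tarfile.TarInfo("cache/user")
good.size = 128
traversal = tarfile.TarInfo("../outside")
link = tarfile.TarInfo("cache/id")
link.type = tarfile.SYMTYPE
link.linkname = "/etc/passwd"

reasons = [rejection_reason(m) for m in (good, traversal, link)]
```

A restore routine would run this check on every member before writing anything, so a single bad entry rejects the whole bundle.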
GitHub Actions
ba865f3be0 feat: expose /internal/models/raw for authoritative model metadata
Lets callers see Lingma's raw config/queryModels response, so the
official per-key displayName/description is discoverable without
reverse-engineering the VSIX. Falls back to the pool's pick() unless
a specific instance is requested.

Made-with: Cursor
2026-04-18 09:29:11 +08:00
GitHub Actions
dfdb7087dc perf: session reuse for multi-turn latency
- Add SessionCache (LRU + TTL, per-API-key scoped) mapping
  conversation-prefix hash -> upstream Lingma sessionId.
- Hash only user/system/developer turns so client-side
  assistant reformatting doesn't invalidate the key.
- On cache hit: reuse sessionId, send only the latest user
  message with isReply=true, and stick the request to the
  instance that originally served it.
- LingmaGatewayClient.chat_complete/chat_stream accept
  session_id/is_reply and report the real finish.sessionId
  via out_meta so we persist what Lingma actually allocated.
- Invalidate cache on non-stream failure; skip writes on
  cancelled/partial streams.
- Expose cache stats in /internal/stats and /metrics.
- Configurable via SESSION_REUSE_ENABLED / SESSION_CACHE_MAX_ENTRIES
  / SESSION_CACHE_TTL_SEC (documented in README + .env.example).

Made-with: Cursor
2026-04-18 08:10:39 +08:00
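Note on dfdb7087dc: the two ideas in this commit, hashing only user/system/developer turns and keeping an LRU + TTL map from that hash to a sessionId, can be sketched as follows. This is a simplified illustration, not the project's `SessionCache`: the method names, the hash construction, and the injectable `clock` are assumptions.

```python
import hashlib
import json
import time
from collections import OrderedDict
from typing import Callable, Optional

HASHED_ROLES = {"user", "system", "developer"}


def prefix_key(api_key: str, messages: list) -> str:
    """Hash the conversation prefix, ignoring assistant turns, scoped per API key."""
    turns = [(m["role"], m.get("content")) for m in messages if m["role"] in HASHED_ROLES]
    blob = json.dumps([api_key, turns], ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


class SessionCache:
    """Minimal LRU + TTL map: prefix hash -> upstream sessionId."""

    def __init__(self, max_entries: int = 1024, ttl_sec: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._entries: "OrderedDict[str, tuple]" = OrderedDict()
        self._max = max_entries
        self._ttl = ttl_sec
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        session_id, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]       # expired: treat as a miss
            return None
        self._entries.move_to_end(key)   # mark as most recently used
        return session_id

    def put(self, key: str, session_id: str) -> None:
        self._entries[key] = (session_id, self._clock() + self._ttl)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)  # evict least recently used
```

Because assistant turns are excluded from the hash, a client that re-sends an assistant message with different whitespace or casing still maps to the same key.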
GitHub Actions
d209d8ac0b perf: stop blocking on chat/ask RPC timeout (fixes ~30s TTFB)
Lingma streams answers via chat/answer + chat/finish notifications and
never sends a JSON-RPC response for chat/ask. The old code awaited
rpc.request("chat/ask") and swallowed the TimeoutError, so every chat
was forced to wait the full rpc_timeout (default 30s) before draining
the stream queue - even though the first token was already present in
the queue within ~2s.

Effect:
- non-stream TTFB dropped from ~30s to actual upstream latency (~2-3s).
- stream first-chunk dropped from ~30s to upstream first-token latency.
- consume_stream idle timeout decoupled from rpc_timeout so shortening
  rpc_timeout no longer starves long completions.

Switch chat/ask to rpc.notify (fire-and-forget) and rely entirely on the
existing chat/answer + chat/finish handlers for result delivery.

Made-with: Cursor
2026-04-18 07:54:45 +08:00
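Note on d209d8ac0b: the fix amounts to sending `chat/ask` as a notification and reading results only from the notification queue, with an idle timeout that is independent of the RPC timeout. The sketch below reproduces that shape with a stand-in client. `FakeRpc`, `chat_complete`, and `main` are hypothetical, and the simulated delays are arbitrary.

```python
import asyncio


class FakeRpc:
    """Stand-in JSON-RPC client: chat/ask never gets a response; answers
    arrive later as chat/answer and chat/finish notifications on a queue."""

    def __init__(self) -> None:
        self.queue: asyncio.Queue = asyncio.Queue()

    async def notify(self, method: str, params: dict) -> None:
        # Fire-and-forget: return immediately, simulate upstream notifications.
        loop = asyncio.get_running_loop()
        loop.call_later(0.01, self.queue.put_nowait, ("chat/answer", "hello"))
        loop.call_later(0.02, self.queue.put_nowait, ("chat/finish", None))


async def chat_complete(rpc: FakeRpc, prompt: str, idle_timeout: float = 1.0) -> str:
    """Send chat/ask without awaiting a response, then drain the stream queue."""
    await rpc.notify("chat/ask", {"text": prompt})
    parts = []
    while True:
        # idle_timeout bounds the gap between notifications, not the whole chat.
        kind, text = await asyncio.wait_for(rpc.queue.get(), timeout=idle_timeout)
        if kind == "chat/finish":
            return "".join(parts)
        parts.append(text)


async def main() -> str:
    return await chat_complete(FakeRpc(), "hi")


answer = asyncio.run(main())
```

With a request/response call in place of `notify`, the coroutine would sit in the RPC wait until its timeout fired even though the first `chat/answer` was already queued.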
GitHub Actions
707acc9005 feat: M1+M2 gateway hardening and multi-instance pool
Behavior hardening (M1):
- Fix `_chat_streams` memory leak: pop_stream on completion, error, and
  client disconnect.
- Add WebSocket reconnect with state machine (stopped/starting/ready/
  reconnecting/failed/closed) and exponential backoff, so a Lingma
  restart no longer requires restarting the gateway.
- Lazy initialization: startup failure is non-fatal, first real request
  triggers retry, `/healthz` reflects readiness.
- Migrate FastAPI on_event to lifespan.
- Structured JSON logging with request_id ContextVar; `x-request-id`
  propagated to responses.
- SSE now sets `Cache-Control: no-cache`, `X-Accel-Buffering: no` to
  defeat proxy buffering.
- OpenAI schema compatibility: `content` accepts str | list[parts] | None,
  added `developer`/`function` roles, `tools/tool_choice/stream_options/
  user/max_tokens` fields, and `stream_options.include_usage` emits final
  usage chunk.
- `require_bearer` uses `hmac.compare_digest`; `/metrics` now requires
  Bearer when `METRICS_TOKEN` or `API_KEYS` are set.
- Python 3.10/3.11 `TimeoutError` vs `asyncio.TimeoutError` unified.
- Error responses no longer leak `auto_login.status()` details.

Backpressure (M2 / A2):
- New `InFlightGuard` with per-request ticket, queue + rejection
  accounting, `BackpressureRejected` raises 429 + `Retry-After` once
  `GATEWAY_QUEUE_TIMEOUT_SEC` elapses.
- Streaming ticket ownership transfers to the generator so CancelledError
  from client disconnect still releases the slot.
- `/internal/stats.concurrency` and `/metrics` expose in_flight/queued/
  accepted_total/rejected_total/max_in_flight.

Multi-instance pool (M2 / A1 + B3):
- New `LingmaPool` with N processes, each with its own workDir, socket
  port (dynamic when N>1), and `AutoLoginManager`.
- Account parser supports CSV (`u1:p1,u2:p2`) and JSON formats via
  `LINGMA_ACCOUNTS`; falls back to `LINGMA_USERNAME/LINGMA_PASSWORD` for
  backwards compatibility (N=1 keeps legacy paths/ports).
- Routing: sticky affinity by `user` / system-prompt hash, then
  least-in-flight, finally round-robin fallback for unhealthy pool.
- `/healthz` reports per-instance state and ready count.
- `/internal/stats.pool` and `/metrics` expose per-instance
  `gateway_pool_instance_in_flight{name}` / `gateway_pool_instance_ready{name}`.
- `/internal/auto-login/start?instance=inst-N` targets a specific instance;
  `/internal/auto-login/status` lists all instances.

Compat notes:
- `.env.example` adds `METRICS_TOKEN`, `LOG_LEVEL`, `GATEWAY_MAX_IN_FLIGHT`,
  `GATEWAY_QUEUE_TIMEOUT_SEC`, `LINGMA_ACCOUNTS`, `LINGMA_INSTANCE_COUNT`.
- `.gitignore` cleaned up data/ duplication.
- Existing single-instance deployments keep working without config change.

Made-with: Cursor
2026-04-18 07:40:32 +08:00
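Note on 707acc9005 (backpressure): the queue-then-reject behaviour can be approximated with a semaphore and a bounded wait. This is a sketch under assumptions rather than the project's `InFlightGuard`: the constructor arguments, the `retry_after` value, and the `demo` coroutine are hypothetical, and ticket hand-off to streaming generators is not modelled.

```python
import asyncio


class BackpressureRejected(Exception):
    """Raised when a request waited too long for a slot; maps to HTTP 429."""

    def __init__(self, retry_after: float) -> None:
        super().__init__("gateway is saturated")
        self.retry_after = retry_after


class InFlightGuard:
    def __init__(self, max_in_flight: int, queue_timeout_sec: float) -> None:
        self._sem = asyncio.Semaphore(max_in_flight)
        self._timeout = queue_timeout_sec
        self.in_flight = 0
        self.queued = 0
        self.accepted_total = 0
        self.rejected_total = 0

    async def acquire(self) -> None:
        self.queued += 1
        try:
            await asyncio.wait_for(self._sem.acquire(), timeout=self._timeout)
        except asyncio.TimeoutError:
            self.rejected_total += 1
            raise BackpressureRejected(retry_after=self._timeout) from None
        finally:
            self.queued -= 1
        self.in_flight += 1
        self.accepted_total += 1

    def release(self) -> None:
        self.in_flight -= 1
        self._sem.release()


async def demo() -> tuple:
    guard = InFlightGuard(max_in_flight=1, queue_timeout_sec=0.05)
    await guard.acquire()          # takes the only slot
    status = 200
    try:
        await guard.acquire()      # queues for 0.05 s, then is rejected
    except BackpressureRejected:
        status = 429
    guard.release()
    return status, guard.rejected_total, guard.in_flight


result = asyncio.run(demo())
```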
root
6114c66aed chore: remove Gitea CI workflow
2026-04-17 17:38:03 +08:00
root
640bf3d6b4 ci: use self-hosted runner and simplify python steps
2026-04-17 17:36:49 +08:00
root
4e5f451489 chore: simplify build to fixed Tencent pip mirror
2026-04-17 16:29:18 +08:00
root
d2995b2c48 chore: add auto pip mirror selection by region
2026-04-17 16:20:32 +08:00
root
c1e261aa14 refactor: move runtime state under project data directory
2026-04-17 15:57:51 +08:00
root
e41ee8bcc8 fix: treat 200 login API response as provisional success
2026-04-17 15:43:57 +08:00
root
d12668201f fix: capture login API via response event listener
2026-04-17 15:40:25 +08:00
root
0c9fdd53c9 fix: verify /users/ajax/login success in auto login flow
2026-04-17 15:31:02 +08:00
root
5f0c1866a6 fix: harden auto-login selectors and poll auth status
2026-04-17 14:40:28 +08:00
root
b621c4aca7 feat: bootstrap Lingma from latest marketplace VSIX
2026-04-17 10:44:37 +08:00
root
5526779e98 chore: initialize clean history without secrets
2026-04-17 09:56:08 +08:00