feat: M1+M2 gateway hardening and multi-instance pool

Behavior hardening (M1):
- Fix `_chat_streams` memory leak: pop_stream on completion, error, and
  client disconnect.
- Add WebSocket reconnect with state machine (stopped/starting/ready/
  reconnecting/failed/closed) and exponential backoff, so a Lingma
  restart no longer requires restarting the gateway.
- Lazy initialization: startup failure is non-fatal, first real request
  triggers retry, `/healthz` reflects readiness.
- Migrate FastAPI on_event to lifespan.
- Structured JSON logging with request_id ContextVar; `x-request-id`
  propagated to responses.
- SSE now sets `Cache-Control: no-cache`, `X-Accel-Buffering: no` to
  defeat proxy buffering.
- OpenAI schema compatibility: `content` accepts str | list[parts] | None,
  added `developer`/`function` roles, `tools/tool_choice/stream_options/
  user/max_tokens` fields, and `stream_options.include_usage` emits final
  usage chunk.
- `require_bearer` uses `hmac.compare_digest`; `/metrics` now requires
  Bearer when `METRICS_TOKEN` or `API_KEYS` is set.
- Catch both `TimeoutError` and `asyncio.TimeoutError`: they are distinct
  classes on Python 3.10 and the same class on 3.11+.
- Error responses no longer leak `auto_login.status()` details.
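
The reconnect backoff mentioned in the list above can be sketched as
follows. This is an illustration, not the gateway's code: `backoff_delays`
is a hypothetical helper, and the 1 s initial delay, doubling factor, and
30 s cap mirror the constants in `_reconnect_loop` from this commit's diff.

```python
from typing import Iterator


def backoff_delays(
    attempts: int, initial: float = 1.0, factor: float = 2.0, cap: float = 30.0
) -> Iterator[float]:
    """Yield the sleep before each reconnect attempt (hypothetical helper)."""
    delay = initial
    for _ in range(attempts):
        yield delay
        # Grow the delay after every failed attempt, but never beyond the cap.
        delay = min(delay * factor, cap)


print(list(backoff_delays(7)))
```

With 20 attempts, as in the diff, the delays add up to 481 seconds before
the client gives up and moves to the `failed` state.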

Backpressure (M2 / A2):
- New `InFlightGuard` with per-request ticket and queue + rejection
  accounting; a request that waits longer than
  `GATEWAY_QUEUE_TIMEOUT_SEC` raises `BackpressureRejected`, which is
  returned as 429 + `Retry-After`.
- Streaming ticket ownership transfers to the generator so CancelledError
  from client disconnect still releases the slot.
- `/internal/stats.concurrency` and `/metrics` expose in_flight/queued/
  accepted_total/rejected_total/max_in_flight.
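
The queue-then-reject behavior above can be sketched with a plain
`asyncio.Semaphore`. This is a simplified illustration under stated
assumptions: `Guard`, `Rejected`, and `rejection_response` are hypothetical
names (the real classes are in `app/concurrency.py`), and rounding
`Retry-After` up to whole seconds is an assumption about the HTTP mapping.

```python
import asyncio
import math


class Rejected(Exception):
    """Hypothetical stand-in for the commit's BackpressureRejected."""

    def __init__(self, retry_after: float):
        super().__init__(f"rejected, retry_after={retry_after}")
        self.retry_after = retry_after


class Guard:
    """Bounded-concurrency guard: wait up to `queue_timeout` seconds for a slot."""

    def __init__(self, max_in_flight: int, queue_timeout: float):
        self._sem = asyncio.Semaphore(max_in_flight)
        self.queue_timeout = queue_timeout
        self.in_flight = 0
        self.rejected_total = 0

    async def acquire(self) -> None:
        try:
            await asyncio.wait_for(self._sem.acquire(), timeout=self.queue_timeout)
        except (asyncio.TimeoutError, TimeoutError):
            self.rejected_total += 1
            raise Rejected(self.queue_timeout)
        self.in_flight += 1

    def release(self) -> None:
        self.in_flight -= 1
        self._sem.release()


def rejection_response(exc: Rejected) -> tuple[int, dict[str, str]]:
    # Assumed mapping: HTTP 429 with Retry-After rounded up to whole seconds.
    return 429, {"Retry-After": str(math.ceil(exc.retry_after))}


async def demo() -> tuple[int, dict[str, str], int]:
    guard = Guard(max_in_flight=1, queue_timeout=0.05)
    await guard.acquire()  # takes the only slot
    try:
        await guard.acquire()  # queues, then times out
    except Rejected as exc:
        status, headers = rejection_response(exc)
    guard.release()
    return status, headers, guard.rejected_total


print(asyncio.run(demo()))
```

The second `acquire()` waits for the full queue timeout and is then
rejected, so a saturated gateway answers within a bounded time instead of
holding connections open indefinitely.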

Multi-instance pool (M2 / A1 + B3):
- New `LingmaPool` with N processes, each with its own workDir, socket
  port (dynamic when N>1), and `AutoLoginManager`.
- Account parser supports CSV (`u1:p1,u2:p2`) and JSON formats via
  `LINGMA_ACCOUNTS`; falls back to `LINGMA_USERNAME/LINGMA_PASSWORD` for
  backwards compatibility (N=1 keeps legacy paths/ports).
- Routing: sticky affinity by `user` / system-prompt hash, then
  least-in-flight, finally round-robin fallback for unhealthy pool.
- `/healthz` reports per-instance state and ready count.
- `/internal/stats.pool` and `/metrics` expose per-instance
  `gateway_pool_instance_in_flight{name}` / `gateway_pool_instance_ready{name}`.
- `/internal/auto-login/start?instance=inst-N` targets a specific instance;
  `/internal/auto-login/status` lists all instances.
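
The routing order described above (sticky affinity, then least-in-flight,
then round-robin when nothing is ready) can be sketched as follows. The
`LingmaPool` code is not part of this excerpt, so `Instance`,
`pick_instance`, and the use of SHA-256 for the affinity hash are all
hypothetical choices made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Instance:
    name: str
    ready: bool = True
    in_flight: int = 0


def pick_instance(
    instances: Sequence[Instance],
    affinity_key: Optional[str] = None,
    rr_counter: int = 0,
) -> Instance:
    ready = [inst for inst in instances if inst.ready]
    if not ready:
        # Unhealthy pool: fall back to plain round-robin over every instance.
        return instances[rr_counter % len(instances)]
    if affinity_key:
        # Stable hash so the same key maps to the same ready instance.
        digest = hashlib.sha256(affinity_key.encode("utf-8")).digest()
        return ready[int.from_bytes(digest[:8], "big") % len(ready)]
    # No affinity key: choose the ready instance with the fewest in-flight requests.
    return min(ready, key=lambda inst: inst.in_flight)


pool = [
    Instance("inst-0", in_flight=3),
    Instance("inst-1", in_flight=1),
    Instance("inst-2", ready=False),
]
print(pick_instance(pool).name)
```

In this sketch the sticky mapping depends on the current set of ready
instances, so a key can move to a different instance when one drops out or
comes back.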

Compat notes:
- `.env.example` adds `METRICS_TOKEN`, `LOG_LEVEL`, `GATEWAY_MAX_IN_FLIGHT`,
  `GATEWAY_QUEUE_TIMEOUT_SEC`, `LINGMA_ACCOUNTS`, `LINGMA_INSTANCE_COUNT`.
- `.gitignore`: removed the duplicated `data/` rules.
- Existing single-instance deployments keep working without config change.

Made-with: Cursor
GitHub Actions
2026-04-18 07:40:32 +08:00
parent 6114c66aed
commit 707acc9005
11 changed files with 1360 additions and 222 deletions


@@ -4,6 +4,15 @@ HOST=0.0.0.0
 PORT=8317
 # API keys (multiple keys may be configured, comma-separated)
 API_KEYS=sk-your-api-key
+# Dedicated auth token for /metrics (if left empty, it falls back to accepting API_KEYS; if API_KEYS is not configured either, /metrics is public)
+METRICS_TOKEN=
+# Log level: DEBUG / INFO / WARNING / ERROR
+LOG_LEVEL=INFO
+# Concurrency limit for /v1/chat/completions (<=0 disables limiting)
+GATEWAY_MAX_IN_FLIGHT=4
+# Queue wait timeout in seconds; requests that wait longer get 429 + Retry-After
+GATEWAY_QUEUE_TIMEOUT_SEC=30

 # Path to the Lingma binary inside the container
 LINGMA_BIN=/app/data/bin/Lingma
@@ -45,7 +54,16 @@ AUTO_LOGIN_TIMEOUT=180
 # Number of auto-login retries
 AUTO_LOGIN_MAX_RETRY=2
-# Lingma login username
+# Lingma login username (used only when LINGMA_ACCOUNTS is empty; single-instance mode)
 LINGMA_USERNAME=
-# Lingma login password
+# Lingma login password (used only when LINGMA_ACCOUNTS is empty)
 LINGMA_PASSWORD=
+
+# ==== Multi-instance pool (option B: multiple accounts) ====
+# Account list; two formats are supported:
+#   CSV: user1:pass1,user2:pass2
+#   JSON: [{"username":"u1","password":"p1"},{"username":"u2","password":"p2"}]
+# Once configured, each account gets its own Lingma instance (separate workDir and separate auto-login)
+LINGMA_ACCOUNTS=
+# Instance count: defaults to the number of LINGMA_ACCOUNTS entries; if set explicitly and there are fewer accounts, accounts are reused cyclically and a warning is logged
+LINGMA_INSTANCE_COUNT=

.gitignore

@@ -3,7 +3,5 @@ __pycache__/
 *.pyc
 bin/
 runtime-bin/
-data/
-!data/
 data/*
 !data/.gitkeep


@@ -64,6 +64,12 @@ cp .env.example .env
 - `AUTO_LOGIN_MAX_RETRY`: number of auto-login retries
 - `LINGMA_USERNAME`: Lingma login username
 - `LINGMA_PASSWORD`: Lingma login password
+- `METRICS_TOKEN`: dedicated auth token for `/metrics` (if left empty, `API_KEYS` also grants access; if both are empty, `/metrics` is public)
+- `LOG_LEVEL`: log level (default `INFO`; output is structured JSON that includes `request_id`)
+- `GATEWAY_MAX_IN_FLIGHT`: concurrency limit for `/v1/chat/completions` (default 4; `<=0` disables limiting)
+- `GATEWAY_QUEUE_TIMEOUT_SEC`: queue wait timeout in seconds (default 30; after that the request is rejected with 429 + `Retry-After`)
+- `LINGMA_ACCOUNTS`: multi-account instance pool, written as `u1:p1,u2:p2` or as a JSON array; once configured, each account starts its own Lingma subprocess
+- `LINGMA_INSTANCE_COUNT`: number of instances (defaults to the number of accounts; if set explicitly and there are fewer accounts, accounts are reused cyclically)

 ### Minimal required `.env` example
@@ -85,7 +91,18 @@ DEDICATED_DOMAIN_URL=
 - All persistent data for this project lives under `./data`
 - `data/bin/Lingma`: the automatically extracted Lingma binary
-- `data/.lingma/...`: Lingma login state, cache, and logs
+- `data/.lingma/...`: Lingma login state, cache, and logs (single-instance mode)
+- `data/.lingma/pool/inst-<i>/...`: per-instance login state and cache in multi-instance mode
+
+### Multi-instance pool (option B: multiple accounts)
+
+To enable it, set `LINGMA_ACCOUNTS=u1:p1,u2:p2` in `.env` and restart the container.
+
+- Each account maps to its own Lingma subprocess, with a separate login and a separate workDir.
+- Routing: requests with the same `user` field or the same system prompt stick to the same instance; all other requests are assigned by least-in-flight.
+- An instance that crashes or disconnects does not affect the rest; `/healthz` reports the `pool_ready` count.
+- `/internal/stats.pool` exposes per-instance state, and `/metrics` adds `gateway_pool_instance_in_flight{name}` / `gateway_pool_instance_ready{name}`
+- Without `LINGMA_ACCOUNTS`, the gateway falls back to single-instance mode (using `LINGMA_USERNAME/LINGMA_PASSWORD`), so it stays backward compatible.

 ## 3. Running with Docker
@@ -163,13 +180,16 @@ curl -s http://127.0.0.1:8317/internal/stats \
 ```

 ```bash
-curl -s http://127.0.0.1:8317/metrics
+curl -s http://127.0.0.1:8317/metrics \
+  -H "Authorization: Bearer ${METRICS_TOKEN:-sk-your-api-key}"
 ```

 Notes:
 - `usage.prompt_tokens/completion_tokens` are estimates (approximated from byte counts).
 - Non-streaming responses include a `usage` field.
+- Streaming requests can pass `stream_options: {"include_usage": true}` so that the final frame returns `usage`
+- `/metrics` requires Bearer auth by default: `METRICS_TOKEN` is matched first, otherwise any entry in `API_KEYS` is accepted; when neither is configured, it stays public.

 ## 6. Automatic login inside the container


@@ -1,12 +1,11 @@
 from __future__ import annotations

+import hmac
+
 from fastapi import HTTPException, Request, status


-def require_bearer(request: Request, api_keys: list[str]) -> None:
-    if not api_keys:
-        return
-
+def _extract_bearer(request: Request) -> str:
     auth = request.headers.get("authorization", "")
     if not auth.startswith("Bearer "):
         raise HTTPException(
@@ -19,9 +18,22 @@ def require_bearer(request: Request, api_keys: list[str]) -> None:
                 }
             },
         )
+    return auth[len("Bearer ") :].strip()

-    token = auth[len("Bearer ") :].strip()
-    if token not in api_keys:
+
+def _match_any(token: str, candidates: list[str]) -> bool:
+    for c in candidates:
+        if c and hmac.compare_digest(token, c):
+            return True
+    return False
+
+
+def require_bearer(request: Request, api_keys: list[str]) -> None:
+    # Empty api_keys means auth is disabled (keeps the old behavior).
+    if not api_keys:
+        return
+    token = _extract_bearer(request)
+    if not _match_any(token, api_keys):
         raise HTTPException(
             status_code=status.HTTP_401_UNAUTHORIZED,
             detail={
@@ -32,3 +44,31 @@ def require_bearer(request: Request, api_keys: list[str]) -> None:
                 }
             },
         )
+
+
+def require_metrics_access(
+    request: Request, api_keys: list[str], metrics_token: str
+) -> None:
+    """Allow metrics if any of: METRICS_TOKEN matches, or any API_KEYS match.
+
+    If neither METRICS_TOKEN nor API_KEYS are configured, metrics is public
+    (backwards compatible default).
+    """
+    accepted: list[str] = []
+    if metrics_token:
+        accepted.append(metrics_token)
+    accepted.extend(api_keys)
+    if not accepted:
+        return
+    token = _extract_bearer(request)
+    if not _match_any(token, accepted):
+        raise HTTPException(
+            status_code=status.HTTP_401_UNAUTHORIZED,
+            detail={
+                "error": {
+                    "message": "Invalid metrics token",
+                    "type": "invalid_request_error",
+                    "code": "invalid_api_key",
+                }
+            },
+        )

app/concurrency.py

@@ -0,0 +1,121 @@
from __future__ import annotations

import asyncio

from .logging_config import get_logger

logger = get_logger("lingma_gateway.concurrency")


class BackpressureRejected(Exception):
    """Raised when a request cannot acquire an in-flight slot before timeout."""

    def __init__(self, retry_after: float):
        super().__init__(f"backpressure rejected, retry_after={retry_after:.1f}s")
        self.retry_after = retry_after


class InFlightTicket:
    """Reference-counted handle for a single in-flight slot.

    Release is idempotent so callers can defensively `release()` from multiple
    cleanup paths (stream finally + outer exception handler) without worrying.
    """

    __slots__ = ("_parent", "_released")

    def __init__(self, parent: "InFlightGuard | None"):
        self._parent = parent
        self._released = False

    def release(self) -> None:
        if self._released or self._parent is None:
            self._released = True
            return
        self._released = True
        self._parent._on_release()

    async def __aenter__(self) -> "InFlightTicket":
        return self

    async def __aexit__(self, *_exc) -> None:
        self.release()


class InFlightGuard:
    """Async semaphore wrapper with queue/reject accounting and Prometheus hooks.

    - `max_in_flight <= 0` disables limiting (back-compat, unlimited).
    - `queue_timeout_sec` bounds how long a request may wait for a slot. On
      timeout, `try_acquire()` raises `BackpressureRejected`.
    """

    def __init__(self, max_in_flight: int, queue_timeout_sec: float):
        self.max = max(0, int(max_in_flight))
        self.queue_timeout = max(0.0, float(queue_timeout_sec))
        self._sem: asyncio.Semaphore | None = (
            asyncio.Semaphore(self.max) if self.max > 0 else None
        )
        self.in_flight = 0
        self.queued = 0
        self.accepted_total = 0
        self.rejected_total = 0

    async def try_acquire(self) -> InFlightTicket:
        if self._sem is None:
            self.in_flight += 1
            self.accepted_total += 1
            return InFlightTicket(parent=self)

        self.queued += 1
        try:
            if self.queue_timeout <= 0:
                await self._sem.acquire()
            else:
                try:
                    await asyncio.wait_for(self._sem.acquire(), timeout=self.queue_timeout)
                except (asyncio.TimeoutError, TimeoutError):
                    self.rejected_total += 1
                    logger.warning(
                        "backpressure rejected: in_flight=%d queued=%d max=%d",
                        self.in_flight,
                        self.queued - 1,
                        self.max,
                    )
                    raise BackpressureRejected(retry_after=self.queue_timeout)
        finally:
            self.queued -= 1

        self.in_flight += 1
        self.accepted_total += 1
        return InFlightTicket(parent=self)

    def _on_release(self) -> None:
        self.in_flight -= 1
        if self._sem is not None:
            self._sem.release()

    def stats(self) -> dict:
        return {
            "max_in_flight": self.max,
            "in_flight": self.in_flight,
            "queued": self.queued,
            "accepted_total": self.accepted_total,
            "rejected_total": self.rejected_total,
            "queue_timeout_sec": self.queue_timeout,
        }

    def prometheus_lines(self) -> list[str]:
        return [
            "# TYPE gateway_in_flight gauge",
            f"gateway_in_flight {self.in_flight}",
            "# TYPE gateway_queued gauge",
            f"gateway_queued {self.queued}",
            "# TYPE gateway_max_in_flight gauge",
            f"gateway_max_in_flight {self.max}",
            "# TYPE gateway_accepted_total counter",
            f"gateway_accepted_total {self.accepted_total}",
            "# TYPE gateway_rejected_total counter",
            f"gateway_rejected_total {self.rejected_total}",
        ]


@@ -1,8 +1,14 @@
 from __future__ import annotations

+import json
 import os
-from dataclasses import dataclass
-from pathlib import Path
+from dataclasses import dataclass, field
+
+
+@dataclass
+class LingmaAccount:
+    username: str
+    password: str


 @dataclass
@@ -10,6 +16,10 @@ class Settings:
     host: str
     port: int
     api_keys: list[str]
+    metrics_token: str
+    log_level: str
+    gateway_max_in_flight: int
+    gateway_queue_timeout_sec: float
     lingma_bin: str
     lingma_work_dir: str
     lingma_socket_port: int
@@ -22,8 +32,57 @@ class Settings:
     auto_login_headless: bool
     auto_login_timeout: int
     auto_login_max_retry: int
-    lingma_username: str
-    lingma_password: str
+    accounts: list[LingmaAccount] = field(default_factory=list)
+    instance_count: int = 1
+
+
+def _bool_env(name: str, default: bool) -> bool:
+    raw = os.getenv(name)
+    if raw is None:
+        return default
+    return raw.strip().lower() in {"1", "true", "yes", "on"}
+
+
+def _parse_accounts(raw: str) -> list[LingmaAccount]:
+    """Parse LINGMA_ACCOUNTS.
+
+    Accepted formats:
+    - JSON array: `[{"username":"u1","password":"p1"},{"username":"u2","password":"p2"}]`
+    - CSV: `u1:p1,u2:p2`
+    - Newlines: `u1:p1\nu2:p2`
+
+    Whitespace around entries is trimmed. Empty entries are ignored.
+    Passwords containing ':' are supported (only the first ':' is the separator).
+    """
+    raw = (raw or "").strip()
+    if not raw:
+        return []
+
+    if raw.startswith("["):
+        try:
+            data = json.loads(raw)
+        except Exception:
+            return []
+        out: list[LingmaAccount] = []
+        if isinstance(data, list):
+            for item in data:
+                if isinstance(item, dict):
+                    u = str(item.get("username", "")).strip()
+                    p = str(item.get("password", "")).strip()
+                    if u and p:
+                        out.append(LingmaAccount(u, p))
+        return out
+
+    out: list[LingmaAccount] = []
+    for entry in raw.replace("\n", ",").split(","):
+        entry = entry.strip()
+        if not entry or ":" not in entry:
+            continue
+        u, p = entry.split(":", 1)
+        u, p = u.strip(), p.strip()
+        if u and p:
+            out.append(LingmaAccount(u, p))
+    return out


 def load_settings() -> Settings:
@@ -33,10 +92,31 @@ def load_settings() -> Settings:
         "LINGMA_WORK_DIR",
         "/app/data/.lingma/vscode/sharedClientCache",
     )
+
+    accounts = _parse_accounts(os.getenv("LINGMA_ACCOUNTS", ""))
+    if not accounts:
+        u = os.getenv("LINGMA_USERNAME", "").strip()
+        p = os.getenv("LINGMA_PASSWORD", "").strip()
+        if u and p:
+            accounts.append(LingmaAccount(u, p))
+
+    explicit_count = os.getenv("LINGMA_INSTANCE_COUNT", "").strip()
+    if explicit_count:
+        try:
+            instance_count = max(1, int(explicit_count))
+        except ValueError:
+            instance_count = len(accounts) or 1
+    else:
+        instance_count = max(1, len(accounts)) if accounts else 1
+
     return Settings(
         host=os.getenv("HOST", "0.0.0.0"),
         port=int(os.getenv("PORT", "8317")),
         api_keys=api_keys,
+        metrics_token=os.getenv("METRICS_TOKEN", "").strip(),
+        log_level=os.getenv("LOG_LEVEL", "INFO").strip() or "INFO",
+        gateway_max_in_flight=int(os.getenv("GATEWAY_MAX_IN_FLIGHT", "4")),
+        gateway_queue_timeout_sec=float(os.getenv("GATEWAY_QUEUE_TIMEOUT_SEC", "30")),
         lingma_bin=os.getenv("LINGMA_BIN", "/app/data/bin/Lingma"),
         lingma_work_dir=work_dir,
         lingma_socket_port=int(os.getenv("LINGMA_SOCKET_PORT", "36510")),
@@ -45,10 +125,10 @@ def load_settings() -> Settings:
         default_model=os.getenv("DEFAULT_MODEL", "org_auto"),
         default_ask_mode=os.getenv("DEFAULT_ASK_MODE", "chat"),
         dedicated_domain_url=os.getenv("DEDICATED_DOMAIN_URL", "").strip(),
-        auto_login_enabled=os.getenv("AUTO_LOGIN_ENABLED", "true").lower() in {"1", "true", "yes", "on"},
-        auto_login_headless=os.getenv("AUTO_LOGIN_HEADLESS", "true").lower() in {"1", "true", "yes", "on"},
+        auto_login_enabled=_bool_env("AUTO_LOGIN_ENABLED", True),
+        auto_login_headless=_bool_env("AUTO_LOGIN_HEADLESS", True),
         auto_login_timeout=int(os.getenv("AUTO_LOGIN_TIMEOUT", "180")),
         auto_login_max_retry=int(os.getenv("AUTO_LOGIN_MAX_RETRY", "2")),
-        lingma_username=os.getenv("LINGMA_USERNAME", "").strip(),
-        lingma_password=os.getenv("LINGMA_PASSWORD", "").strip(),
+        accounts=accounts,
+        instance_count=instance_count,
     )


@@ -9,10 +9,23 @@ import subprocess
import time import time
import uuid import uuid
from pathlib import Path from pathlib import Path
from typing import AsyncIterator from typing import AsyncIterator, Callable, Optional
import websockets import websockets
from .logging_config import get_logger
logger = get_logger("lingma_gateway.client")
# Some callers live on Python 3.10 where asyncio.TimeoutError is a distinct class,
# while 3.11+ unifies it with the builtin TimeoutError. Always catch both.
TIMEOUT_EXCEPTIONS: tuple[type[BaseException], ...] = (
asyncio.TimeoutError,
TimeoutError,
)
def _is_port_open(host: str, port: int, timeout_sec: float = 0.5) -> bool: def _is_port_open(host: str, port: int, timeout_sec: float = 0.5) -> bool:
try: try:
@@ -79,23 +92,37 @@ def _parse_lsp_frames(buf: bytes):
class LspWsRpcClient: class LspWsRpcClient:
def __init__(self, ws): def __init__(self, ws, on_disconnect: Optional[Callable[[BaseException], None]] = None):
self.ws = ws self.ws = ws
self._id = 1 self._id = 1
self._pending: dict[int, asyncio.Future] = {} self._pending: dict[int, asyncio.Future] = {}
self._send_lock = asyncio.Lock() self._send_lock = asyncio.Lock()
self._reader_task = None self._reader_task: asyncio.Task | None = None
self._rx_buffer = b"" self._rx_buffer = b""
self._chat_streams: dict[str, dict] = {} self._chat_streams: dict[str, dict] = {}
self._on_disconnect = on_disconnect
self._closed = False
async def start(self): async def start(self):
self._reader_task = asyncio.create_task(self._reader_loop()) self._reader_task = asyncio.create_task(self._reader_loop())
async def close(self): async def close(self):
self._closed = True
if self._reader_task: if self._reader_task:
self._reader_task.cancel() self._reader_task.cancel()
with contextlib.suppress(Exception): with contextlib.suppress(Exception):
await self._reader_task await self._reader_task
# Abort any pending futures so callers fail fast instead of hanging.
for fut in self._pending.values():
if not fut.done():
fut.set_exception(ConnectionError("lingma client closed"))
self._pending.clear()
# Signal open streams to terminate.
for stream in self._chat_streams.values():
if not stream["done"].is_set():
stream["done"].set()
stream["chunks"].put_nowait(None)
self._chat_streams.clear()
async def _send(self, payload: dict): async def _send(self, payload: dict):
async with self._send_lock: async with self._send_lock:
@@ -127,10 +154,23 @@ class LspWsRpcClient:
except asyncio.CancelledError: except asyncio.CancelledError:
pass pass
except Exception as exc: except Exception as exc:
if not self._closed:
logger.warning("lingma reader loop terminated: %s", exc)
# Propagate failure to anyone waiting on an RPC.
for fut in self._pending.values(): for fut in self._pending.values():
if not fut.done(): if not fut.done():
fut.set_exception(exc) fut.set_exception(exc)
self._pending.clear() self._pending.clear()
# Also unblock any in-flight chat streams so consumers exit.
for stream in self._chat_streams.values():
if not stream["done"].is_set():
stream["done"].set()
stream["chunks"].put_nowait(None)
if not self._closed and self._on_disconnect is not None:
try:
self._on_disconnect(exc)
except Exception:
logger.exception("on_disconnect callback failed")
async def _handle_server_message(self, msg: dict): async def _handle_server_message(self, msg: dict):
method = msg.get("method") method = msg.get("method")
@@ -168,7 +208,7 @@ class LspWsRpcClient:
await self._send(payload) await self._send(payload)
try: try:
msg = await asyncio.wait_for(fut, timeout=timeout) msg = await asyncio.wait_for(fut, timeout=timeout)
except TimeoutError: except TIMEOUT_EXCEPTIONS:
self._pending.pop(rid, None) self._pending.pop(rid, None)
raise TimeoutError(f"RPC timeout: {method}") raise TimeoutError(f"RPC timeout: {method}")
if "error" in msg: if "error" in msg:
@@ -189,8 +229,20 @@ class LspWsRpcClient:
"finish_at": None, "finish_at": None,
} }
def pop_stream(self, request_id: str) -> None:
stream = self._chat_streams.pop(request_id, None)
if stream is None:
return
# Drain queue so no stray future gets stuck if the consumer bailed early.
if not stream["done"].is_set():
stream["done"].set()
with contextlib.suppress(Exception):
stream["chunks"].put_nowait(None)
async def consume_stream(self, request_id: str, timeout: float) -> AsyncIterator[str]: async def consume_stream(self, request_id: str, timeout: float) -> AsyncIterator[str]:
stream = self._chat_streams[request_id] stream = self._chat_streams.get(request_id)
if stream is None:
return
start = time.monotonic() start = time.monotonic()
while True: while True:
remain = timeout - (time.monotonic() - start) remain = timeout - (time.monotonic() - start)
@@ -218,6 +270,19 @@ class LspWsRpcClient:
class LingmaGatewayClient: class LingmaGatewayClient:
"""Owns the Lingma subprocess and the LSP-over-WS connection.
Adds a small state machine + reconnect loop so the gateway can survive Lingma
restarts and slow cold starts without bringing down the FastAPI app.
"""
STATE_STOPPED = "stopped"
STATE_STARTING = "starting"
STATE_READY = "ready"
STATE_RECONNECTING = "reconnecting"
STATE_FAILED = "failed"
STATE_CLOSED = "closed"
def __init__( def __init__(
self, self,
lingma_bin: str, lingma_bin: str,
@@ -227,7 +292,11 @@ class LingmaGatewayClient:
rpc_timeout: int, rpc_timeout: int,
default_model: str, default_model: str,
default_ask_mode: str, default_ask_mode: str,
*,
name: str = "lingma",
extra_info_paths: list[Path] | None = None,
): ):
self.name = name
self.lingma_bin = Path(lingma_bin) self.lingma_bin = Path(lingma_bin)
self.work_dir = Path(work_dir) self.work_dir = Path(work_dir)
self.socket_port = socket_port self.socket_port = socket_port
@@ -235,19 +304,115 @@ class LingmaGatewayClient:
self.rpc_timeout = rpc_timeout self.rpc_timeout = rpc_timeout
self.default_model = default_model self.default_model = default_model
self.default_ask_mode = default_ask_mode self.default_ask_mode = default_ask_mode
# Each pool instance should only look at its own workDir .info to avoid
# cross-instance clobbering via the shared ~/.lingma/.info path.
if extra_info_paths is None:
extra_info_paths = [Path.home() / ".lingma" / ".info"]
self._extra_info_paths = list(extra_info_paths)
self._rpc: LspWsRpcClient | None = None self._rpc: LspWsRpcClient | None = None
self._ws = None self._ws = None
self._state = self.STATE_STOPPED
self._state_lock = asyncio.Lock()
self._ready_event = asyncio.Event()
self._reconnect_task: asyncio.Task | None = None
self._last_error: str = ""
# ------------------------------------------------------------------ state
@property
def state(self) -> str:
return self._state
@property
def last_error(self) -> str:
return self._last_error
def _set_state(self, state: str, err: str = "") -> None:
if state != self._state:
logger.info("lingma client state %s -> %s", self._state, state, extra={"ctx_new_state": state})
self._state = state
if err:
self._last_error = err
if state == self.STATE_READY:
self._ready_event.set()
else:
self._ready_event.clear()
# -------------------------------------------------------------- lifecycle
async def start(self) -> None:
"""Initial start. Failure is non-fatal: ensure_ready() will retry later."""
try:
await self._connect(initial=True)
except Exception as exc:
self._set_state(self.STATE_FAILED, err=str(exc))
logger.exception("initial lingma start failed; will retry on demand")
async def close(self) -> None:
self._set_state(self.STATE_CLOSED)
if self._reconnect_task and not self._reconnect_task.done():
self._reconnect_task.cancel()
with contextlib.suppress(Exception):
await self._reconnect_task
if self._rpc:
await self._rpc.close()
if self._ws:
with contextlib.suppress(Exception):
await self._ws.close()
async def ensure_ready(self, timeout: float | None = None) -> None:
"""Block until the RPC connection is usable, (re)connecting on demand."""
if self._state == self.STATE_CLOSED:
raise RuntimeError("lingma client is closed")
if self._state == self.STATE_READY and self._ws is not None:
return
async with self._state_lock:
if self._state == self.STATE_READY and self._ws is not None:
return
if self._state in (self.STATE_STOPPED, self.STATE_FAILED):
try:
await self._connect(initial=False)
return
except Exception as exc:
self._set_state(self.STATE_FAILED, err=str(exc))
raise
wait_timeout = timeout if timeout is not None else max(
30.0, float(self.startup_timeout) + 10.0
)
try:
await asyncio.wait_for(self._ready_event.wait(), timeout=wait_timeout)
except TIMEOUT_EXCEPTIONS:
raise RuntimeError(f"lingma not ready (state={self._state}, err={self._last_error})")
# --------------------------------------------------------------- connect
async def _connect(self, *, initial: bool) -> None:
self._set_state(self.STATE_STARTING)
async def start(self):
if not self.lingma_bin.exists(): if not self.lingma_bin.exists():
raise FileNotFoundError(f"Lingma not found: {self.lingma_bin}") raise FileNotFoundError(f"Lingma not found: {self.lingma_bin}")
if not _is_port_open("127.0.0.1", self.socket_port):
info_paths = [self.work_dir / ".info", *self._extra_info_paths]
# socket_port <= 0 is the pool-friendly "always spawn and read .info" mode.
port_prewarmed = self.socket_port > 0 and _is_port_open(
"127.0.0.1", self.socket_port
)
if not port_prewarmed:
self.work_dir.mkdir(parents=True, exist_ok=True) self.work_dir.mkdir(parents=True, exist_ok=True)
# Remove stale info files from host-mounted workspace before boot. # Remove stale info files from host-mounted workspace before boot.
for p in [self.work_dir / ".info", Path.home() / ".lingma" / ".info"]: for p in info_paths:
with contextlib.suppress(Exception): with contextlib.suppress(Exception):
if p.exists(): if p.exists():
p.unlink() p.unlink()
logger.info(
"[%s] spawning lingma: %s start --workDir %s",
self.name,
self.lingma_bin,
self.work_dir,
)
subprocess.Popen( subprocess.Popen(
[str(self.lingma_bin), "start", "--workDir", str(self.work_dir)], [str(self.lingma_bin), "start", "--workDir", str(self.work_dir)],
cwd=str(self.lingma_bin.parent), cwd=str(self.lingma_bin.parent),
@@ -255,13 +420,9 @@ class LingmaGatewayClient:
stderr=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
start_new_session=True, start_new_session=True,
) )
info, _, _ = _wait_info_any( info, _, _ = _wait_info_any(info_paths, timeout_sec=self.startup_timeout)
[self.work_dir / ".info", Path.home() / ".lingma" / ".info"],
timeout_sec=self.startup_timeout,
)
self.socket_port = info self.socket_port = info
# Wait for socket to actually become connectable.
deadline = time.time() + self.startup_timeout deadline = time.time() + self.startup_timeout
while time.time() < deadline: while time.time() < deadline:
if _is_port_open("127.0.0.1", self.socket_port, timeout_sec=0.3): if _is_port_open("127.0.0.1", self.socket_port, timeout_sec=0.3):
@@ -270,9 +431,19 @@ class LingmaGatewayClient:
else: else:
raise TimeoutError(f"Lingma socket not open on port {self.socket_port}") raise TimeoutError(f"Lingma socket not open on port {self.socket_port}")
# Close any stale ws/rpc before creating fresh ones (reconnect path).
if self._rpc is not None:
with contextlib.suppress(Exception):
await self._rpc.close()
self._rpc = None
if self._ws is not None:
with contextlib.suppress(Exception):
await self._ws.close()
self._ws = None
ws_url = f"ws://127.0.0.1:{self.socket_port}" ws_url = f"ws://127.0.0.1:{self.socket_port}"
self._ws = await websockets.connect(ws_url, max_size=10 * 1024 * 1024) self._ws = await websockets.connect(ws_url, max_size=10 * 1024 * 1024)
self._rpc = LspWsRpcClient(self._ws) self._rpc = LspWsRpcClient(self._ws, on_disconnect=self._on_disconnect)
await self._rpc.start() await self._rpc.start()
await self._rpc.request( await self._rpc.request(
"initialize", "initialize",
@@ -286,32 +457,73 @@ class LingmaGatewayClient:
timeout=self.rpc_timeout, timeout=self.rpc_timeout,
) )
await self._rpc.notify("initialized", {}) await self._rpc.notify("initialized", {})
self._set_state(self.STATE_READY)
logger.info(
"[%s] lingma ready on port %d (initial=%s)",
self.name,
self.socket_port,
initial,
)
async def close(self): def _on_disconnect(self, exc: BaseException) -> None:
if self._rpc: if self._state == self.STATE_CLOSED:
await self._rpc.close() return
if self._ws: self._set_state(self.STATE_RECONNECTING, err=str(exc))
await self._ws.close() if self._reconnect_task and not self._reconnect_task.done():
return
try:
loop = asyncio.get_running_loop()
except RuntimeError:
return
self._reconnect_task = loop.create_task(self._reconnect_loop())
async def _reconnect_loop(self) -> None:
backoff = 1.0
max_backoff = 30.0
max_attempts = 20
for attempt in range(1, max_attempts + 1):
if self._state == self.STATE_CLOSED:
return
await asyncio.sleep(backoff)
try:
async with self._state_lock:
await self._connect(initial=False)
logger.info("lingma reconnected after %d attempt(s)", attempt)
return
except Exception as exc:
self._last_error = str(exc)
logger.warning("lingma reconnect attempt %d failed: %s", attempt, exc)
backoff = min(backoff * 2, max_backoff)
self._set_state(self.STATE_FAILED, err="reconnect exhausted")
# ------------------------------------------------------------------ RPC
@property @property
def rpc(self) -> LspWsRpcClient: def rpc(self) -> LspWsRpcClient:
if self._rpc is None: if self._rpc is None:
raise RuntimeError("Lingma RPC not initialized") raise RuntimeError(f"Lingma RPC not initialized (state={self._state})")
return self._rpc return self._rpc
async def auth_status(self): async def auth_status(self):
await self.ensure_ready()
return await self.rpc.request("auth/status", {}, timeout=self.rpc_timeout) return await self.rpc.request("auth/status", {}, timeout=self.rpc_timeout)
async def query_models(self): async def query_models(self):
await self.ensure_ready()
return await self.rpc.request("config/queryModels", {}, timeout=self.rpc_timeout) return await self.rpc.request("config/queryModels", {}, timeout=self.rpc_timeout)
async def get_endpoint(self): async def get_endpoint(self):
await self.ensure_ready()
return await self.rpc.request("config/getEndpoint", {}, timeout=self.rpc_timeout) return await self.rpc.request("config/getEndpoint", {}, timeout=self.rpc_timeout)
async def update_endpoint(self, endpoint: str): async def update_endpoint(self, endpoint: str):
return await self.rpc.request("config/updateEndpoint", {"endpoint": endpoint}, timeout=self.rpc_timeout) await self.ensure_ready()
return await self.rpc.request(
"config/updateEndpoint", {"endpoint": endpoint}, timeout=self.rpc_timeout
)
async def generate_login_url(self): async def generate_login_url(self):
await self.ensure_ready()
result = await self.rpc.request("login/generateUrl", {}, timeout=self.rpc_timeout) result = await self.rpc.request("login/generateUrl", {}, timeout=self.rpc_timeout)
if isinstance(result, str): if isinstance(result, str):
return result, {"raw": result} return result, {"raw": result}
@@ -322,6 +534,8 @@ class LingmaGatewayClient:
return "", result return "", result
return "", {"raw": result} return "", {"raw": result}
# ------------------------------------------------------------------ chat
def _build_payload(self, prompt: str, model_key: str, ask_mode: str, session_id: str, request_id: str): def _build_payload(self, prompt: str, model_key: str, ask_mode: str, session_id: str, request_id: str):
session_type = "developer" if ask_mode == "agent" else "chat" session_type = "developer" if ask_mode == "agent" else "chat"
return { return {
@@ -355,17 +569,24 @@ class LingmaGatewayClient:
         }
 
     async def chat_complete(self, prompt: str, model_key: str, ask_mode: str) -> dict:
+        await self.ensure_ready()
         request_id = str(uuid.uuid4())
         session_id = str(uuid.uuid4())
         payload = self._build_payload(prompt, model_key, ask_mode, session_id, request_id)
         self.rpc.create_stream(request_id)
         try:
-            await self.rpc.request("chat/ask", payload, timeout=self.rpc_timeout)
-        except (TimeoutError, asyncio.TimeoutError):
-            pass
-        async for _ in self.rpc.consume_stream(request_id, timeout=max(20.0, self.rpc_timeout + 20.0)):
-            pass
-        result = self.rpc.get_stream_result(request_id)
+            try:
+                await self.rpc.request("chat/ask", payload, timeout=self.rpc_timeout)
+            except TIMEOUT_EXCEPTIONS:
+                # chat/ask often returns nothing until chat/finish arrives; tolerate.
+                pass
+            async for _ in self.rpc.consume_stream(
+                request_id, timeout=max(20.0, self.rpc_timeout + 20.0)
+            ):
+                pass
+            result = self.rpc.get_stream_result(request_id)
+        finally:
+            self.rpc.pop_stream(request_id)
         finish = result.get("finish") or {}
         result["requestId"] = request_id
         result["sessionId"] = finish.get("sessionId") or session_id
@@ -374,13 +595,20 @@ class LingmaGatewayClient:
         return result
 
     async def chat_stream(self, prompt: str, model_key: str, ask_mode: str) -> AsyncIterator[str]:
+        await self.ensure_ready()
         request_id = str(uuid.uuid4())
         session_id = str(uuid.uuid4())
         payload = self._build_payload(prompt, model_key, ask_mode, session_id, request_id)
         self.rpc.create_stream(request_id)
         try:
-            await self.rpc.request("chat/ask", payload, timeout=self.rpc_timeout)
-        except (TimeoutError, asyncio.TimeoutError):
-            pass
-        async for chunk in self.rpc.consume_stream(request_id, timeout=max(20.0, self.rpc_timeout + 40.0)):
-            yield chunk
+            try:
+                await self.rpc.request("chat/ask", payload, timeout=self.rpc_timeout)
+            except TIMEOUT_EXCEPTIONS:
+                pass
+            async for chunk in self.rpc.consume_stream(
+                request_id, timeout=max(20.0, self.rpc_timeout + 40.0)
+            ):
+                yield chunk
+        finally:
+            # Runs on normal completion, exception, or consumer GeneratorExit (client disconnect).
+            self.rpc.pop_stream(request_id)
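
Review note: the `finally` in `chat_stream` relies on the fact that closing an async generator raises `GeneratorExit` at the suspended `yield`, so cleanup runs even when the consumer stops early. The following is a minimal standalone sketch of that behavior, not the gateway's code; the `streams` dict is a hypothetical stand-in for the RPC layer's stream registry.

```python
import asyncio

# Hypothetical stand-in for the per-request stream registry.
streams: dict[str, list[str]] = {}


async def chat_stream(request_id: str):
    streams[request_id] = ["a", "b", "c"]  # plays the role of create_stream()
    try:
        for chunk in streams[request_id]:
            yield chunk
            await asyncio.sleep(0)
    finally:
        # Runs on normal completion, on error, and on aclose() after an early exit.
        streams.pop(request_id, None)


async def consume_partially() -> bool:
    gen = chat_stream("req-1")
    async for _ in gen:
        break  # simulate a client that disconnects after the first chunk
    await gen.aclose()  # raises GeneratorExit inside the generator
    return "req-1" in streams


leaked = asyncio.run(consume_partially())
print("leaked:", leaked)
```

Without the `try/finally`, the registry entry would stay behind whenever the consumer stopped iterating before the stream finished.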

app/lingma_pool.py Normal file

@@ -0,0 +1,275 @@
+from __future__ import annotations
+
+import asyncio
+from dataclasses import dataclass
+from pathlib import Path
+
+from .auto_login import AutoLoginManager
+from .config import LingmaAccount
+from .lingma_client import LingmaGatewayClient
+from .logging_config import get_logger
+
+logger = get_logger("lingma_gateway.pool")
+
+
+@dataclass
+class InstanceConfig:
+    index: int
+    name: str
+    work_dir: str
+    socket_port: int
+    account: LingmaAccount
+
+
+class PoolInstance:
+    """A single Lingma process + its auto_login + in-flight counter."""
+
+    __slots__ = ("cfg", "client", "auto_login", "in_flight")
+
+    def __init__(
+        self,
+        cfg: InstanceConfig,
+        client: LingmaGatewayClient,
+        auto_login: AutoLoginManager,
+    ):
+        self.cfg = cfg
+        self.client = client
+        self.auto_login = auto_login
+        self.in_flight = 0
+
+    @property
+    def name(self) -> str:
+        return self.cfg.name
+
+    @property
+    def healthy(self) -> bool:
+        return self.client.state == LingmaGatewayClient.STATE_READY
+
+
+class LingmaPool:
+    """N-Lingma process pool with least-in-flight + affinity routing.
+
+    For N=1 this degenerates into the original single-client setup, preserving
+    backwards compatibility with `LINGMA_USERNAME/LINGMA_PASSWORD`-only deploys.
+    """
+
+    def __init__(self, instances: list[PoolInstance]):
+        if not instances:
+            raise RuntimeError("LingmaPool requires at least 1 instance")
+        self._instances: list[PoolInstance] = instances
+        self._rr_counter = 0
+
+    @classmethod
+    def build(
+        cls,
+        *,
+        lingma_bin: str,
+        base_work_dir: str,
+        legacy_socket_port: int,
+        startup_timeout: int,
+        rpc_timeout: int,
+        default_model: str,
+        default_ask_mode: str,
+        accounts: list[LingmaAccount],
+        instance_count: int,
+        auto_login_headless: bool,
+        auto_login_timeout: int,
+        auto_login_max_retry: int,
+        verify_timeout_sec: int | None = None,
+    ) -> "LingmaPool":
+        """Materialize N PoolInstances.
+
+        Single-instance (N=1) uses the legacy workDir and LINGMA_SOCKET_PORT so
+        existing deployments keep their state after upgrade. N>1 derives per-instance
+        workDirs under `<base_work_dir>/../pool/inst-<i>` and uses dynamic ports.
+        """
+        if instance_count < 1:
+            instance_count = 1
+
+        resolved_accounts: list[LingmaAccount] = []
+        for i in range(instance_count):
+            if accounts:
+                resolved_accounts.append(accounts[i % len(accounts)])
+            else:
+                resolved_accounts.append(LingmaAccount(username="", password=""))
+
+        if instance_count > len(accounts) and accounts:
+            logger.warning(
+                "instance_count=%d exceeds unique accounts=%d; accounts will be reused",
+                instance_count,
+                len(accounts),
+            )
+
+        base_dir = Path(base_work_dir)
+        # Put per-instance workDirs under `<data>/.lingma/pool/inst-<i>`.
+        # Walk up past the vscode/sharedClientCache layout if present.
+        pool_root = base_dir
+        for _ in range(3):
+            if pool_root.name == ".lingma":
+                break
+            if pool_root.parent == pool_root:
+                break
+            pool_root = pool_root.parent
+        pool_root = pool_root / "pool"
+
+        instances: list[PoolInstance] = []
+        for i, acc in enumerate(resolved_accounts):
+            if instance_count == 1:
+                work_dir = str(base_dir)
+                socket_port = legacy_socket_port
+                extra_info: list[Path] | None = None
+            else:
+                work_dir = str(pool_root / f"inst-{i}")
+                socket_port = 0
+                # In pool mode each instance reads only its own workDir .info to
+                # avoid the shared ~/.lingma/.info race between instances.
+                extra_info = []
+
+            name = f"inst-{i}"
+            client = LingmaGatewayClient(
+                lingma_bin=lingma_bin,
+                work_dir=work_dir,
+                socket_port=socket_port,
+                startup_timeout=startup_timeout,
+                rpc_timeout=rpc_timeout,
+                default_model=default_model,
+                default_ask_mode=default_ask_mode,
+                name=name,
+                extra_info_paths=extra_info,
+            )
+
+            def _make_verify(_client: LingmaGatewayClient):
+                async def _verify() -> bool:
+                    try:
+                        st = await _client.auth_status()
+                    except Exception:
+                        return False
+                    return bool(st and st.get("id"))
+
+                return _verify
+
+            auto_login = AutoLoginManager(
+                username=acc.username,
+                password=acc.password,
+                headless=auto_login_headless,
+                timeout_sec=auto_login_timeout,
+                max_retry=auto_login_max_retry,
+                verify_logged_in=_make_verify(client),
+                verify_timeout_sec=verify_timeout_sec
+                or max(30, min(180, auto_login_timeout)),
+                debug_dir=f"/tmp/lingma-auto-login/{name}",
+            )
+
+            cfg = InstanceConfig(
+                index=i,
+                name=name,
+                work_dir=work_dir,
+                socket_port=socket_port,
+                account=acc,
+            )
+            instances.append(PoolInstance(cfg, client, auto_login))
+
+        return cls(instances)
+
+    # -------------------------------------------------------------- lifecycle
+
+    async def start(self) -> None:
+        """Start all instances sequentially.
+
+        Sequential startup avoids racing on the shared ~/.lingma/.info file (for
+        pool-mode we skip it anyway, but Lingma may still write there internally)
+        and keeps docker logs readable. Failures are non-fatal; per-instance
+        reconnect loops will take over.
+        """
+        for inst in self._instances:
+            logger.info(
+                "pool starting %s (workDir=%s port=%d account=%s)",
+                inst.name,
+                inst.cfg.work_dir,
+                inst.cfg.socket_port,
+                inst.cfg.account.username or "<empty>",
+            )
+            try:
+                await inst.client.start()
+            except Exception as exc:
+                logger.warning("pool start %s failed: %s", inst.name, exc)
+
+    async def close(self) -> None:
+        tasks = [asyncio.create_task(inst.client.close()) for inst in self._instances]
+        for t in tasks:
+            try:
+                await t
+            except Exception:
+                pass
+
+    # -------------------------------------------------------------- inspection
+
+    @property
+    def instances(self) -> list[PoolInstance]:
+        return list(self._instances)
+
+    def size(self) -> int:
+        return len(self._instances)
+
+    def stats(self) -> list[dict]:
+        return [
+            {
+                "index": inst.cfg.index,
+                "name": inst.name,
+                "state": inst.client.state,
+                "last_error": inst.client.last_error,
+                "in_flight": inst.in_flight,
+                "work_dir": inst.cfg.work_dir,
+                "socket_port": inst.cfg.socket_port,
+                "username": inst.cfg.account.username,
+                "auto_login": inst.auto_login.status(),
+            }
+            for inst in self._instances
+        ]
+
+    def prometheus_lines(self) -> list[str]:
+        lines: list[str] = [
+            "# TYPE gateway_pool_instance_in_flight gauge",
+            "# TYPE gateway_pool_instance_ready gauge",
+        ]
+        for inst in self._instances:
+            lbl = f'name="{inst.name}",idx="{inst.cfg.index}"'
+            lines.append(f"gateway_pool_instance_in_flight{{{lbl}}} {inst.in_flight}")
+            lines.append(
+                f"gateway_pool_instance_ready{{{lbl}}} {1 if inst.healthy else 0}"
+            )
+        return lines
+
+    # -------------------------------------------------------------- selection
+
+    def pick(self, affinity_key: str | None = None) -> PoolInstance:
+        """Pick an instance for a request.
+
+        Preference order:
+          1. Sticky affinity if `affinity_key` is provided and the bucket is healthy.
+          2. Least-in-flight among healthy instances.
+          3. Round-robin fallback when nothing is healthy (lazy-start will kick in).
+        """
+        if not self._instances:
+            raise RuntimeError("lingma pool is empty")
+
+        healthy = [i for i in self._instances if i.healthy]
+
+        if affinity_key:
+            bucket = self._instances[
+                abs(hash(affinity_key)) % len(self._instances)
+            ]
+            if bucket.healthy:
+                return bucket
+
+        if healthy:
+            return min(healthy, key=lambda x: (x.in_flight, x.cfg.index))
+
+        # Nothing healthy. Fall back to round-robin so every instance gets a
+        # chance to reconnect via ensure_ready().
+        idx = self._rr_counter % len(self._instances)
+        self._rr_counter += 1
+        return self._instances[idx]

app/logging_config.py Normal file

@@ -0,0 +1,56 @@
+from __future__ import annotations
+
+import contextvars
+import json
+import logging
+import sys
+import time
+
+request_id_var: contextvars.ContextVar[str] = contextvars.ContextVar(
+    "request_id", default="-"
+)
+
+
+class _JsonFormatter(logging.Formatter):
+    def format(self, record: logging.LogRecord) -> str:
+        ts = time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime(record.created))
+        data: dict = {
+            "ts": f"{ts}.{int(record.msecs):03d}Z",
+            "level": record.levelname,
+            "logger": record.name,
+            "msg": record.getMessage(),
+            "request_id": request_id_var.get(),
+        }
+        if record.exc_info:
+            data["exc"] = self.formatException(record.exc_info)
+        for key, val in record.__dict__.items():
+            if key.startswith("ctx_"):
+                data[key[4:]] = val
+        return json.dumps(data, ensure_ascii=False)
+
+
+def configure_logging(level: str = "INFO") -> None:
+    level = (level or "INFO").upper()
+    handler = logging.StreamHandler(sys.stdout)
+    handler.setFormatter(_JsonFormatter())
+
+    root = logging.getLogger()
+    root.handlers.clear()
+    root.addHandler(handler)
+    root.setLevel(level)
+
+    # Align uvicorn access/error logs with our JSON formatter.
+    for name in ("uvicorn", "uvicorn.error", "uvicorn.access", "fastapi"):
+        lg = logging.getLogger(name)
+        lg.handlers.clear()
+        lg.propagate = True
+        lg.setLevel(level)
+
+    # Trim noisy libraries.
+    logging.getLogger("websockets").setLevel("WARNING")
+    logging.getLogger("websockets.client").setLevel("WARNING")
+
+
+def get_logger(name: str = "lingma_gateway") -> logging.Logger:
+    return logging.getLogger(name)
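
Review note: the formatter promotes any `extra={"ctx_*": ...}` attribute to a top-level JSON field and reads the request id from a `ContextVar`. The sketch below reproduces that mechanism in a reduced form (timestamp and exception fields omitted); the logger name `demo.json` and the in-memory buffer are illustration-only assumptions.

```python
import contextvars
import io
import json
import logging

request_id_var = contextvars.ContextVar("request_id", default="-")


class JsonFormatter(logging.Formatter):
    """Reduced version of the formatter above, for illustration only."""

    def format(self, record: logging.LogRecord) -> str:
        data = {
            "level": record.levelname,
            "msg": record.getMessage(),
            "request_id": request_id_var.get(),
        }
        # Attributes passed via extra= land on the record; strip the ctx_ prefix.
        for key, val in record.__dict__.items():
            if key.startswith("ctx_"):
                data[key[4:]] = val
        return json.dumps(data, ensure_ascii=False)


buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("demo.json")
log.handlers.clear()
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False

token = request_id_var.set("req-abc123")
try:
    log.info("http %s %s", "GET", "/healthz", extra={"ctx_status": 200})
finally:
    request_id_var.reset(token)

line = json.loads(buf.getvalue())
print(line)
```

Because the request id comes from a `ContextVar`, any log call made while handling that request picks it up without the id being passed around explicitly.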


@@ -1,16 +1,20 @@
 from __future__ import annotations
 
+import asyncio
+import hashlib
 import json
 import time
 import uuid
+from contextlib import asynccontextmanager
 
 from fastapi import Depends, FastAPI, HTTPException, Request
 from fastapi.responses import JSONResponse, StreamingResponse
 
-from .auto_login import AutoLoginManager
-from .auth import require_bearer
+from .auth import require_bearer, require_metrics_access
+from .concurrency import BackpressureRejected, InFlightGuard
 from .config import Settings, load_settings
-from .lingma_client import LingmaGatewayClient
+from .lingma_pool import LingmaPool, PoolInstance
+from .logging_config import configure_logging, get_logger, request_id_var
 from .model_map import build_model_name_map, flatten_model_keys, resolve_model
 from .openai_schema import (
     ChatCompletionChoice,
@@ -18,107 +22,219 @@ from .openai_schema import (
     ChatCompletionsRequest,
     ModelData,
     ModelsResponse,
+    flatten_content,
 )
 from .stats import StatsCollector, estimate_tokens
 
-app = FastAPI(title="Lingma OpenAI Gateway", version="0.1.0")
 settings: Settings = load_settings()
-lingma: LingmaGatewayClient | None = None
-auto_login: AutoLoginManager | None = None
+configure_logging(settings.log_level)
+logger = get_logger("lingma_gateway")
+
+pool: LingmaPool | None = None
 stats_collector = StatsCollector()
+chat_guard = InFlightGuard(
+    max_in_flight=settings.gateway_max_in_flight,
+    queue_timeout_sec=settings.gateway_queue_timeout_sec,
+)
+
+
+def _require_pool() -> LingmaPool:
+    if pool is None:
+        raise HTTPException(
+            status_code=503,
+            detail={"error": {"message": "pool not initialized", "type": "service_unavailable"}},
+        )
+    return pool
+
+
+@asynccontextmanager
+async def lifespan(_app: FastAPI):
+    global pool
+    pool = LingmaPool.build(
+        lingma_bin=settings.lingma_bin,
+        base_work_dir=settings.lingma_work_dir,
+        legacy_socket_port=settings.lingma_socket_port,
+        startup_timeout=settings.lingma_startup_timeout,
+        rpc_timeout=settings.lingma_rpc_timeout,
+        default_model=settings.default_model,
+        default_ask_mode=settings.default_ask_mode,
+        accounts=settings.accounts,
+        instance_count=settings.instance_count,
+        auto_login_headless=settings.auto_login_headless,
+        auto_login_timeout=settings.auto_login_timeout,
+        auto_login_max_retry=settings.auto_login_max_retry,
+    )
+    logger.info(
+        "gateway startup: pool_size=%d max_in_flight=%d",
+        pool.size(),
+        settings.gateway_max_in_flight,
+    )
+    await pool.start()
+    try:
+        yield
+    finally:
+        if pool is not None:
+            await pool.close()
+
+
+app = FastAPI(title="Lingma OpenAI Gateway", version="0.3.0", lifespan=lifespan)
+
+
+@app.middleware("http")
+async def request_id_middleware(request: Request, call_next):
+    req_id = request.headers.get("x-request-id") or f"req-{uuid.uuid4().hex[:12]}"
+    token = request_id_var.set(req_id)
+    start = time.monotonic()
+    status_code = 500
+    try:
+        response = await call_next(request)
+        status_code = response.status_code
+        response.headers["x-request-id"] = req_id
+        return response
+    finally:
+        elapsed_ms = int((time.monotonic() - start) * 1000)
+        logger.info(
+            "http %s %s -> %s in %dms",
+            request.method,
+            request.url.path,
+            status_code,
+            elapsed_ms,
+            extra={
+                "ctx_method": request.method,
+                "ctx_path": request.url.path,
+                "ctx_status": status_code,
+                "ctx_elapsed_ms": elapsed_ms,
+            },
+        )
+        request_id_var.reset(token)
 
 
 def auth_guard(request: Request):
     require_bearer(request, settings.api_keys)
 
 
-async def _is_logged_in() -> bool:
-    assert lingma is not None
-    st = await lingma.auth_status()
-    return bool(st and st.get("id"))
-
-
-@app.on_event("startup")
-async def on_startup():
-    global lingma, auto_login
-    lingma = LingmaGatewayClient(
-        lingma_bin=settings.lingma_bin,
-        work_dir=settings.lingma_work_dir,
-        socket_port=settings.lingma_socket_port,
-        startup_timeout=settings.lingma_startup_timeout,
-        rpc_timeout=settings.lingma_rpc_timeout,
-        default_model=settings.default_model,
-        default_ask_mode=settings.default_ask_mode,
-    )
-    await lingma.start()
-    auto_login = AutoLoginManager(
-        username=settings.lingma_username,
-        password=settings.lingma_password,
-        headless=settings.auto_login_headless,
-        timeout_sec=settings.auto_login_timeout,
-        max_retry=settings.auto_login_max_retry,
-        verify_logged_in=_is_logged_in,
-        verify_timeout_sec=max(30, min(180, settings.auto_login_timeout)),
-    )
-
-
-@app.on_event("shutdown")
-async def on_shutdown():
-    if lingma:
-        await lingma.close()
+def metrics_auth_guard(request: Request):
+    require_metrics_access(request, settings.api_keys, settings.metrics_token)
 
 
 @app.get("/healthz")
 async def healthz():
-    return {"ok": True, "time": int(time.time())}
+    if pool is None:
+        return {"ok": False, "time": int(time.time()), "reason": "pool uninitialized"}
+    insts = pool.stats()
+    ready = sum(1 for i in insts if i["state"] == "ready")
+    return {
+        "ok": ready > 0,
+        "time": int(time.time()),
+        "pool_size": len(insts),
+        "pool_ready": ready,
+        "instances": [
+            {"name": i["name"], "state": i["state"], "in_flight": i["in_flight"]}
+            for i in insts
+        ],
+    }
 
 
-async def _ensure_logged_in_or_auto_login() -> dict:
-    assert lingma is not None
-    status = await lingma.auth_status()
+async def _ensure_instance_logged_in(inst: PoolInstance) -> dict:
+    client = inst.client
+    auto_login = inst.auto_login
+    try:
+        status = await client.auth_status()
+    except Exception as exc:
+        logger.warning("[%s] auth_status failed before chat: %s", inst.name, exc)
+        raise HTTPException(
+            status_code=503,
+            detail={"error": {"message": "Lingma is not ready", "type": "service_unavailable"}},
+        )
     if status and status.get("id"):
         return status
 
     if not settings.auto_login_enabled:
-        raise HTTPException(status_code=401, detail={"error": {"message": "Lingma not logged in"}})
-
-    if settings.dedicated_domain_url:
-        current = await lingma.get_endpoint()
-        current_ep = (current or {}).get("endpoint", "") if isinstance(current, dict) else ""
-        if current_ep != settings.dedicated_domain_url:
-            await lingma.update_endpoint(settings.dedicated_domain_url)
-
-    login_url, login_raw = await lingma.generate_login_url()
-    if not login_url:
         raise HTTPException(
-            status_code=500,
-            detail={"error": {"message": f"generate login url failed: {login_raw}"}},
+            status_code=401,
+            detail={"error": {"message": "Lingma not logged in", "type": "invalid_request_error"}},
+        )
+
+    if settings.dedicated_domain_url:
+        try:
+            current = await client.get_endpoint()
+            current_ep = (current or {}).get("endpoint", "") if isinstance(current, dict) else ""
+            if current_ep != settings.dedicated_domain_url:
+                await client.update_endpoint(settings.dedicated_domain_url)
+        except Exception as exc:
+            logger.warning("[%s] switch dedicated endpoint failed: %s", inst.name, exc)
+
+    try:
+        login_url, _login_raw = await client.generate_login_url()
+    except Exception as exc:
+        logger.warning("[%s] generate_login_url failed: %s", inst.name, exc)
+        raise HTTPException(
+            status_code=502,
+            detail={"error": {"message": "generate login url failed", "type": "upstream_error"}},
+        )
+    if not login_url:
+        raise HTTPException(
+            status_code=502,
+            detail={"error": {"message": "generate login url failed", "type": "upstream_error"}},
         )
 
-    assert auto_login is not None
     await auto_login.ensure_started(login_url)
     try:
         await auto_login.wait_done(timeout=settings.auto_login_timeout + 20)
-    except Exception:
-        pass
+    except Exception as exc:
+        logger.warning("[%s] auto_login wait_done failed: %s", inst.name, exc)
+
+    try:
+        status = await client.auth_status()
+    except Exception as exc:
+        logger.warning("[%s] post-login auth_status failed: %s", inst.name, exc)
+        status = None
 
-    status = await lingma.auth_status()
     if status and status.get("id"):
         return status
 
+    logger.warning(
+        "[%s] auto login did not result in a logged-in session: %s",
+        inst.name,
+        auto_login.status(),
+    )
     raise HTTPException(
         status_code=401,
-        detail={"error": {"message": "Lingma auto login failed", "auto_login": auto_login.status()}},
+        detail={"error": {"message": "Lingma auto login failed", "type": "invalid_request_error"}},
     )
+
+
+def _affinity_key_for(req: ChatCompletionsRequest) -> str | None:
+    """Derive a stable affinity key so that follow-ups go to the same instance.
+
+    Priority: explicit `user` > hash of the first/system message.
+    """
+    if req.user:
+        return req.user.strip() or None
+    for m in req.messages:
+        if m.role == "system":
+            text = flatten_content(m.content)
+            if text:
+                return "sys:" + hashlib.sha1(text.encode("utf-8")).hexdigest()[:16]
+    if req.messages:
+        first = req.messages[0]
+        text = flatten_content(first.content)
+        if text:
+            return "first:" + hashlib.sha1(text.encode("utf-8")).hexdigest()[:16]
+    return None
 
 
 @app.get("/v1/models", dependencies=[Depends(auth_guard)])
 async def v1_models():
-    assert lingma is not None
-    await _ensure_logged_in_or_auto_login()
+    p = _require_pool()
+    inst = p.pick()
+    await _ensure_instance_logged_in(inst)
     await stats_collector.inc_models()
-    models = await lingma.query_models()
+    models = await inst.client.query_models()
     keys = flatten_model_keys(models)
     name_map = build_model_name_map(models)
     resp = ModelsResponse(data=[ModelData(id=k, name=name_map.get(k)) for k in keys])
@@ -126,20 +242,32 @@ async def v1_models():
 def _messages_to_prompt(messages: list[dict]) -> str:
-    parts = []
+    parts: list[str] = []
     for m in messages:
         role = m.get("role", "user")
-        content = m.get("content", "")
-        parts.append(f"[{role}] {content}")
+        text = flatten_content(m.get("content"))
+        if not text and m.get("tool_calls"):
+            text = f"[tool_calls] {json.dumps(m['tool_calls'], ensure_ascii=False)}"
+        if not text:
+            continue
+        parts.append(f"[{role}] {text}")
     return "\n".join(parts).strip()
 
 
+def _include_usage(stream_options: dict | None) -> bool:
+    if not isinstance(stream_options, dict):
+        return False
+    return bool(stream_options.get("include_usage"))
+
+
 @app.post("/v1/chat/completions", dependencies=[Depends(auth_guard)])
 async def v1_chat_completions(req: ChatCompletionsRequest):
-    assert lingma is not None
-    await _ensure_logged_in_or_auto_login()
+    p = _require_pool()
+    affinity = _affinity_key_for(req)
+    inst = p.pick(affinity_key=affinity)
+    await _ensure_instance_logged_in(inst)
 
-    models = await lingma.query_models()
+    models = await inst.client.query_models()
     available = flatten_model_keys(models)
     name_map = build_model_name_map(models)
     model = resolve_model(req.model, available, settings.default_model, name_map)
@@ -150,142 +278,270 @@ async def v1_chat_completions(req: ChatCompletionsRequest):
     prompt = _messages_to_prompt([m.model_dump() for m in req.messages])
     if not prompt:
-        raise HTTPException(status_code=400, detail={"error": {"message": "messages is empty"}})
+        raise HTTPException(
+            status_code=400,
+            detail={"error": {"message": "messages is empty", "type": "invalid_request_error"}},
+        )
 
     prompt_tokens = estimate_tokens(prompt)
+    include_usage = _include_usage(req.stream_options)
 
-    if req.stream:
-        created = int(time.time())
-        completion_id = f"chatcmpl-{uuid.uuid4().hex}"
-        completion_tokens_holder = {"n": 0}
+    # Backpressure: acquire a slot *after* the cheap validation but before any
+    # upstream call. This ensures we reject quickly when saturated.
+    try:
+        ticket = await chat_guard.try_acquire()
+    except BackpressureRejected as exc:
+        retry_after = max(1, int(exc.retry_after))
+        logger.warning("chat rejected by backpressure, retry_after=%ds", retry_after)
+        raise HTTPException(
+            status_code=429,
+            detail={
+                "error": {
+                    "message": "Too many in-flight requests, please retry later",
+                    "type": "rate_limit_error",
+                    "code": "backpressure",
+                }
+            },
+            headers={"Retry-After": str(retry_after)},
+        )
 
-        async def event_stream():
-            success = False
-            try:
-                async for chunk in lingma.chat_stream(prompt, model, ask_mode):
-                    completion_tokens_holder["n"] += estimate_tokens(chunk)
-                    payload = {
-                        "id": completion_id,
-                        "object": "chat.completion.chunk",
-                        "created": created,
-                        "model": model,
-                        "choices": [
-                            {
-                                "index": 0,
-                                "delta": {"content": chunk},
-                                "finish_reason": None,
-                            }
-                        ],
-                    }
-                    yield f"data: {json.dumps(payload, ensure_ascii=False)}\n\n"
-
-                done_payload = {
-                    "id": completion_id,
-                    "object": "chat.completion.chunk",
-                    "created": created,
-                    "model": model,
-                    "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
-                }
-                yield f"data: {json.dumps(done_payload, ensure_ascii=False)}\n\n"
-                yield "data: [DONE]\n\n"
-                success = True
-            finally:
-                await stats_collector.record_chat(
-                    stream=True,
-                    success=success,
-                    prompt_tokens=prompt_tokens,
-                    completion_tokens=completion_tokens_holder["n"],
-                )
-
-        return StreamingResponse(event_stream(), media_type="text/event-stream")
+    inst.in_flight += 1
+    logger.info(
+        "chat.start inst=%s model=%s ask_mode=%s stream=%s prompt_tokens~%d",
+        inst.name,
+        model,
+        ask_mode,
+        req.stream,
+        prompt_tokens,
+        extra={
+            "ctx_instance": inst.name,
+            "ctx_model": model,
+            "ctx_ask_mode": ask_mode,
+            "ctx_stream": req.stream,
+            "ctx_prompt_tokens": prompt_tokens,
+            "ctx_in_flight": chat_guard.in_flight,
+            "ctx_affinity": affinity,
+        },
+    )
+
+    ticket_transferred = False
+    try:
+        if req.stream:
+            created = int(time.time())
+            completion_id = f"chatcmpl-{uuid.uuid4().hex}"
+            completion_tokens_holder = {"n": 0}
+
+            async def event_stream(_ticket=ticket, _inst=inst):
+                success = False
+                try:
+                    async for chunk in _inst.client.chat_stream(prompt, model, ask_mode):
+                        completion_tokens_holder["n"] += estimate_tokens(chunk)
+                        payload = {
+                            "id": completion_id,
+                            "object": "chat.completion.chunk",
+                            "created": created,
+                            "model": model,
+                            "choices": [
+                                {
+                                    "index": 0,
+                                    "delta": {"content": chunk},
+                                    "finish_reason": None,
+                                }
+                            ],
+                        }
+                        yield f"data: {json.dumps(payload, ensure_ascii=False)}\n\n"
+
+                    done_payload = {
+                        "id": completion_id,
+                        "object": "chat.completion.chunk",
+                        "created": created,
+                        "model": model,
+                        "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
+                    }
+                    yield f"data: {json.dumps(done_payload, ensure_ascii=False)}\n\n"
+
+                    if include_usage:
+                        usage_payload = {
+                            "id": completion_id,
+                            "object": "chat.completion.chunk",
+                            "created": created,
+                            "model": model,
+                            "choices": [],
+                            "usage": {
+                                "prompt_tokens": prompt_tokens,
+                                "completion_tokens": completion_tokens_holder["n"],
+                                "total_tokens": prompt_tokens + completion_tokens_holder["n"],
+                            },
+                        }
+                        yield f"data: {json.dumps(usage_payload, ensure_ascii=False)}\n\n"
+
+                    yield "data: [DONE]\n\n"
+                    success = True
+                except asyncio.CancelledError:
+                    logger.info("chat.stream cancelled by client (inst=%s)", _inst.name)
+                    raise
+                except Exception as exc:
+                    logger.warning("chat.stream error (inst=%s): %s", _inst.name, exc)
+                finally:
+                    await stats_collector.record_chat(
+                        stream=True,
+                        success=success,
+                        prompt_tokens=prompt_tokens,
+                        completion_tokens=completion_tokens_holder["n"],
+                    )
+                    _inst.in_flight = max(0, _inst.in_flight - 1)
+                    _ticket.release()
 
-    try:
-        result = await lingma.chat_complete(prompt, model, ask_mode)
-    except Exception:
-        await stats_collector.record_chat(
-            stream=False,
-            success=False,
-            prompt_tokens=prompt_tokens,
-            completion_tokens=0,
-        )
-        raise
-
-    completion_tokens = estimate_tokens(result.get("text") or "")
-    await stats_collector.record_chat(
-        stream=False,
-        success=True,
-        prompt_tokens=prompt_tokens,
-        completion_tokens=completion_tokens,
-    )
-    response = ChatCompletionResponse(
-        id=f"chatcmpl-{uuid.uuid4().hex}",
-        created=int(time.time()),
-        model=model,
-        choices=[
-            ChatCompletionChoice(
-                index=0,
-                finish_reason="stop",
-                message={"role": "assistant", "content": result.get("text") or ""},
-            )
-        ],
-    )
-    data = response.model_dump()
-    data["latency"] = {
-        "first_token_ms": result.get("firstTokenLatencyMs"),
-        "total_ms": result.get("totalLatencyMs"),
-    }
-    data["usage"] = {
-        "prompt_tokens": prompt_tokens,
-        "completion_tokens": completion_tokens,
-        "total_tokens": prompt_tokens + completion_tokens,
-    }
-    return JSONResponse(content=data)
+            ticket_transferred = True
+            return StreamingResponse(
+                event_stream(),
+                media_type="text/event-stream",
+                headers={
+                    "Cache-Control": "no-cache, no-transform",
+                    "X-Accel-Buffering": "no",
+                    "Connection": "keep-alive",
+                },
+            )
+
+        try:
+            result = await inst.client.chat_complete(prompt, model, ask_mode)
+        except Exception as exc:
+            logger.warning("chat.complete error (inst=%s): %s", inst.name, exc)
+            await stats_collector.record_chat(
+                stream=False,
+                success=False,
+                prompt_tokens=prompt_tokens,
+                completion_tokens=0,
+            )
+            raise HTTPException(
+                status_code=502,
+                detail={"error": {"message": "upstream lingma error", "type": "upstream_error"}},
+            )
+
+        completion_tokens = estimate_tokens(result.get("text") or "")
+        await stats_collector.record_chat(
+            stream=False,
+            success=True,
+            prompt_tokens=prompt_tokens,
+            completion_tokens=completion_tokens,
+        )
+        response = ChatCompletionResponse(
+            id=f"chatcmpl-{uuid.uuid4().hex}",
+            created=int(time.time()),
+            model=model,
+            choices=[
+                ChatCompletionChoice(
+                    index=0,
+                    finish_reason="stop",
+                    message={"role": "assistant", "content": result.get("text") or ""},
+                )
+            ],
+        )
+        data = response.model_dump()
+        data["latency"] = {
+            "first_token_ms": result.get("firstTokenLatencyMs"),
+            "total_ms": result.get("totalLatencyMs"),
+        }
+        data["usage"] = {
+            "prompt_tokens": prompt_tokens,
+            "completion_tokens": completion_tokens,
+            "total_tokens": prompt_tokens + completion_tokens,
+        }
+        data["served_by"] = inst.name
+        return JSONResponse(content=data)
+    finally:
+        if not ticket_transferred:
+            inst.in_flight = max(0, inst.in_flight - 1)
+            ticket.release()
 
 
 @app.post("/internal/auto-login/start", dependencies=[Depends(auth_guard)])
-async def internal_auto_login_start():
-    assert lingma is not None
-    assert auto_login is not None
+async def internal_auto_login_start(instance: str | None = None):
+    p = _require_pool()
+    target = None
+    if instance:
+        for inst in p.instances:
+            if inst.name == instance:
+                target = inst
+                break
+        if target is None:
+            raise HTTPException(
+                status_code=404,
+                detail={"error": {"message": f"instance not found: {instance}"}},
+            )
+    else:
+        target = p.pick()
 
-    status = await lingma.auth_status()
+    client = target.client
+    auto_login = target.auto_login
+
+    status = await client.auth_status()
     if status and status.get("id"):
-        return {"ok": True, "state": "already_logged_in", "auth": status}
+        return {"ok": True, "state": "already_logged_in", "instance": target.name, "auth": status}
 
     if settings.dedicated_domain_url:
-        current = await lingma.get_endpoint()
-        current_ep = (current or {}).get("endpoint", "") if isinstance(current, dict) else ""
-        if current_ep != settings.dedicated_domain_url:
-            await lingma.update_endpoint(settings.dedicated_domain_url)
+        try:
+            current = await client.get_endpoint()
+            current_ep = (current or {}).get("endpoint", "") if isinstance(current, dict) else ""
+            if current_ep != settings.dedicated_domain_url:
+                await client.update_endpoint(settings.dedicated_domain_url)
+        except Exception as exc:
+            logger.warning("[%s] switch dedicated endpoint failed: %s", target.name, exc)
+
+    try:
+        login_url, _login_raw = await client.generate_login_url()
+    except Exception as exc:
+        logger.warning("[%s] generate_login_url failed: %s", target.name, exc)
+        raise HTTPException(status_code=502, detail={"error": {"message": "generate login url failed"}})
 
-    login_url, login_raw = await lingma.generate_login_url()
     if not login_url:
-        raise HTTPException(status_code=500, detail={"error": {"message": "generate login url failed", "raw": login_raw}})
+        raise HTTPException(status_code=502, detail={"error": {"message": "generate login url failed"}})
 
     started = await auto_login.ensure_started(login_url)
     return {
         "ok": True,
         "state": "running" if started else "already_running",
-        "loginUrl": login_url,
+        "instance": target.name,
         "auto_login": auto_login.status(),
     }
 
 
 @app.get("/internal/auto-login/status", dependencies=[Depends(auth_guard)])
 async def internal_auto_login_status():
-    assert auto_login is not None
-    assert lingma is not None
-    return {
-        "ok": True,
-        "auto_login": auto_login.status(),
-        "auth": await lingma.auth_status(),
-    }
+    p = _require_pool()
+    out = []
+    for inst in p.instances:
+        try:
+            auth = await inst.client.auth_status()
+        except Exception as exc:
+            auth = {"error": str(exc)}
+        out.append(
+            {
+                "instance": inst.name,
+                "auto_login": inst.auto_login.status(),
+                "auth": auth,
+                "state": inst.client.state,
+            }
+        )
+    return {"ok": True, "instances": out}
@app.get("/internal/stats", dependencies=[Depends(auth_guard)]) @app.get("/internal/stats", dependencies=[Depends(auth_guard)])
async def internal_stats(): async def internal_stats():
return {"ok": True, "stats": await stats_collector.snapshot()} p = _require_pool()
return {
"ok": True,
"stats": await stats_collector.snapshot(),
"concurrency": chat_guard.stats(),
"pool": p.stats(),
}
@app.get("/metrics") @app.get("/metrics", dependencies=[Depends(metrics_auth_guard)])
async def metrics(): async def metrics():
text = await stats_collector.prometheus_text() base = await stats_collector.prometheus_text()
return StreamingResponse(iter([text]), media_type="text/plain; version=0.0.4") lines = list(chat_guard.prometheus_lines())
if pool is not None:
lines.extend(pool.prometheus_lines())
extra = "\n".join(lines) + "\n"
return StreamingResponse(iter([base + extra]), media_type="text/plain; version=0.0.4")
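
A minimal sketch of how the extra sample lines might be merged into the exposition text. `render_metrics` and the metric names below are hypothetical and not part of this change. It assumes the base text may or may not end with a newline, and it keeps every sample on its own line, as the Prometheus text format expects.

```python
def render_metrics(base: str, extra_lines: list[str]) -> str:
    """Join base exposition text with extra sample lines (hypothetical helper)."""
    # Make sure the base text is newline-terminated before appending.
    if base and not base.endswith("\n"):
        base += "\n"
    if not extra_lines:
        return base
    return base + "\n".join(extra_lines) + "\n"


# Hypothetical metric names, for illustration only.
text = render_metrics(
    "gateway_requests_total 12",
    ["gateway_in_flight 2", "gateway_queued 0"],
)
print(text, end="")
```

If `stats_collector.prometheus_text()` already returns newline-terminated text, the plain `base + extra` concatenation in the handler behaves the same way.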


@@ -1,13 +1,22 @@
from __future__ import annotations from __future__ import annotations
from typing import Literal from typing import Any, Literal
from pydantic import BaseModel, Field from pydantic import BaseModel, Field
# Keep permissive: OpenAI clients routinely send list-of-parts (multi-modal) or None
# (for tool calls). We flatten to plain text downstream.
MessageContent = str | list[dict[str, Any]] | None
class ChatMessage(BaseModel): class ChatMessage(BaseModel):
role: Literal["system", "user", "assistant", "tool"] # OpenAI supports "developer" on newer API versions in addition to the classic set.
content: str role: Literal["system", "user", "assistant", "tool", "developer", "function"]
content: MessageContent = None
name: str | None = None
tool_call_id: str | None = None
tool_calls: list[dict[str, Any]] | None = None
class ChatCompletionsRequest(BaseModel): class ChatCompletionsRequest(BaseModel):
@@ -16,6 +25,11 @@ class ChatCompletionsRequest(BaseModel):
stream: bool = False stream: bool = False
temperature: float | None = None temperature: float | None = None
top_p: float | None = None top_p: float | None = None
max_tokens: int | None = None
user: str | None = None
stream_options: dict[str, Any] | None = None
tools: list[dict[str, Any]] | None = None
tool_choice: Any | None = None
class ModelData(BaseModel): class ModelData(BaseModel):
@@ -35,6 +49,7 @@ class ChatCompletionChoice(BaseModel):
index: int = 0 index: int = 0
finish_reason: str | None = "stop" finish_reason: str | None = "stop"
message: dict = Field(default_factory=dict) message: dict = Field(default_factory=dict)
logprobs: Any | None = None
class ChatCompletionResponse(BaseModel): class ChatCompletionResponse(BaseModel):
@@ -43,3 +58,34 @@ class ChatCompletionResponse(BaseModel):
created: int created: int
model: str model: str
choices: list[ChatCompletionChoice] choices: list[ChatCompletionChoice]
system_fingerprint: str | None = None
def flatten_content(content: MessageContent) -> str:
"""Reduce OpenAI multi-part content to a plain string prompt for Lingma."""
if content is None:
return ""
if isinstance(content, str):
return content
if isinstance(content, list):
parts: list[str] = []
for item in content:
if not isinstance(item, dict):
parts.append(str(item))
continue
t = item.get("type")
if t == "text":
text = item.get("text") or ""
if text:
parts.append(text)
elif t in ("image_url", "input_image"):
# Lingma does not support multimodal input; degrade to a placeholder to keep the semantic signal
parts.append("[image]")
elif t == "input_audio":
parts.append("[audio]")
else:
text = item.get("text") or item.get("content")
if isinstance(text, str) and text:
parts.append(text)
return "\n".join(p for p in parts if p)
return str(content)
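
A short usage sketch for `flatten_content`, assuming the function behaves as written in this diff. The function body is reproduced here (with the parameter typed as `Any`) only so the example runs on its own; the sample messages and the URL are made up for illustration.

```python
from typing import Any


def flatten_content(content: Any) -> str:
    """Reduce OpenAI multi-part content to a plain string prompt."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts: list[str] = []
        for item in content:
            if not isinstance(item, dict):
                parts.append(str(item))
                continue
            t = item.get("type")
            if t == "text":
                text = item.get("text") or ""
                if text:
                    parts.append(text)
            elif t in ("image_url", "input_image"):
                # Non-text parts degrade to placeholders.
                parts.append("[image]")
            elif t == "input_audio":
                parts.append("[audio]")
            else:
                text = item.get("text") or item.get("content")
                if isinstance(text, str) and text:
                    parts.append(text)
        return "\n".join(p for p in parts if p)
    return str(content)


# Illustrative inputs (not taken from the PR).
multimodal = [
    {"type": "text", "text": "Describe this:"},
    {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
]
print(flatten_content(multimodal))   # text part, then the "[image]" placeholder
print(repr(flatten_content(None)))   # tool-call messages with no content
```

Empty text parts and unknown part types without a string `text` or `content` field are dropped, so the joined prompt never contains blank lines from them.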