Behavior hardening (M1):
- Fix `_chat_streams` memory leak: pop_stream on completion, error, and
client disconnect.
- Add WebSocket reconnect with state machine (stopped/starting/ready/
reconnecting/failed/closed) and exponential backoff, so a Lingma
restart no longer requires restarting the gateway.
- Lazy initialization: startup failure is non-fatal, first real request
triggers retry, `/healthz` reflects readiness.
- Migrate FastAPI on_event to lifespan.
- Structured JSON logging with request_id ContextVar; `x-request-id`
propagated to responses.
- SSE now sets `Cache-Control: no-cache`, `X-Accel-Buffering: no` to
defeat proxy buffering.
- OpenAI schema compatibility: `content` accepts str | list[parts] | None,
added `developer`/`function` roles, `tools/tool_choice/stream_options/
user/max_tokens` fields, and `stream_options.include_usage` emits final
usage chunk.
- `require_bearer` uses `hmac.compare_digest`; `/metrics` now requires
Bearer when `METRICS_TOKEN` or `API_KEYS` are set.
- Unify handling of `TimeoutError` and `asyncio.TimeoutError`, which are
distinct classes on Python 3.10 and aliases from 3.11 onward.
- Error responses no longer leak `auto_login.status()` details.
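For reviewers, the reconnect item above can be pictured roughly as follows. This is a minimal sketch, not the gateway's code: only the six state names come from this message, while `ConnState`, `backoff_delay`, `connect_with_retry`, the transitions, and the base/cap values are hypothetical.

```python
import asyncio
import enum
import random


class ConnState(enum.Enum):
    # State names come from the commit message; the transitions are assumed.
    STOPPED = "stopped"
    STARTING = "starting"
    READY = "ready"
    RECONNECTING = "reconnecting"
    FAILED = "failed"
    CLOSED = "closed"


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


async def connect_with_retry(connect, max_attempts: int = 5, sleep=asyncio.sleep):
    """Await `connect()` until it succeeds or the attempts run out.

    Returns the final state and the list of states visited.
    """
    state = ConnState.STARTING
    history = [state]
    for attempt in range(max_attempts):
        try:
            await connect()
        # asyncio.TimeoutError is a separate class before Python 3.11,
        # so it is listed explicitly next to OSError.
        except (OSError, asyncio.TimeoutError):
            state = ConnState.RECONNECTING
            history.append(state)
            await sleep(backoff_delay(attempt))
        else:
            state = ConnState.READY
            history.append(state)
            return state, history
    state = ConnState.FAILED
    history.append(state)
    return state, history
```

Jitter is included so that several instances restarting together do not retry in lockstep; whether the real implementation uses jitter is not stated here.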
Backpressure (M2 / A2):
- New `InFlightGuard` with a per-request ticket plus queue and rejection
accounting; `BackpressureRejected` maps to 429 + `Retry-After` once
`GATEWAY_QUEUE_TIMEOUT_SEC` elapses without a free slot.
- Streaming ticket ownership transfers to the generator so CancelledError
from client disconnect still releases the slot.
- `/internal/stats.concurrency` and `/metrics` expose in_flight/queued/
accepted_total/rejected_total/max_in_flight.
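A rough sketch of the ticket idea above, assuming a plain `asyncio.Semaphore` underneath. `InFlightGuard` and `BackpressureRejected` are named in this message, but the fields, the `Ticket` class, `stream_body`, and the `Retry-After` rounding are hypothetical, and the unlimited (`<=0`) mode is left out.

```python
import asyncio


class BackpressureRejected(Exception):
    """Raised when no slot frees up in time; a handler would map it to HTTP 429."""

    def __init__(self, retry_after: int):
        super().__init__(f"queue timeout, retry after {retry_after}s")
        self.retry_after = retry_after


class Ticket:
    """One in-flight slot; release() is idempotent, so calling it twice is safe."""

    def __init__(self, guard: "InFlightGuard"):
        self._guard = guard
        self._released = False

    def release(self) -> None:
        if not self._released:
            self._released = True
            self._guard.in_flight -= 1
            self._guard._sem.release()


class InFlightGuard:
    def __init__(self, max_in_flight: int, queue_timeout: float):
        self._sem = asyncio.Semaphore(max_in_flight)
        self._timeout = queue_timeout
        self.in_flight = 0
        self.queued = 0
        self.accepted_total = 0
        self.rejected_total = 0

    async def acquire(self) -> Ticket:
        self.queued += 1
        try:
            await asyncio.wait_for(self._sem.acquire(), timeout=self._timeout)
        except asyncio.TimeoutError:
            self.rejected_total += 1
            raise BackpressureRejected(max(1, int(self._timeout))) from None
        finally:
            self.queued -= 1
        self.in_flight += 1
        self.accepted_total += 1
        return Ticket(self)


async def stream_body(ticket: Ticket, chunks):
    """The generator owns the ticket: `finally` runs on normal completion,
    on error, and when the consumer closes or cancels the generator."""
    try:
        for chunk in chunks:
            yield chunk
    finally:
        ticket.release()
```

Handing the ticket to the generator means the request handler must not release it as well; the idempotent `release()` is a guard against that mistake.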
Multi-instance pool (M2 / A1 + B3):
- New `LingmaPool` with N processes, each with its own workDir, socket
port (dynamic when N>1), and `AutoLoginManager`.
- Account parser supports CSV (`u1:p1,u2:p2`) and JSON formats via
`LINGMA_ACCOUNTS`; falls back to `LINGMA_USERNAME/LINGMA_PASSWORD` for
backwards compatibility (N=1 keeps legacy paths/ports).
- Routing: sticky affinity by `user` / system-prompt hash, then
least-in-flight, finally round-robin fallback for unhealthy pool.
- `/healthz` reports per-instance state and ready count.
- `/internal/stats.pool` and `/metrics` expose per-instance
`gateway_pool_instance_in_flight{name}` / `gateway_pool_instance_ready{name}`.
- `/internal/auto-login/start?instance=inst-N` targets a specific instance;
`/internal/auto-login/status` lists all instances.
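The routing order above (sticky affinity, then least-in-flight, then round-robin) might look like the sketch below. `Instance`, `Router`, and `pick` are hypothetical names, and treating "unhealthy pool" as "no instance is ready" is an assumption.

```python
import hashlib
import itertools
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    name: str
    ready: bool = True
    in_flight: int = 0


class Router:
    def __init__(self, instances: List[Instance]):
        self._instances = instances
        self._rr = itertools.cycle(range(len(instances)))

    def pick(self, user: Optional[str] = None,
             system_prompt: Optional[str] = None) -> Instance:
        key = user or system_prompt
        if key:
            # 1. Sticky affinity: a stable hash of the key selects an instance.
            #    hashlib is used because built-in hash() is salted per process.
            digest = hashlib.sha256(key.encode("utf-8")).digest()
            index = int.from_bytes(digest[:8], "big") % len(self._instances)
            preferred = self._instances[index]
            if preferred.ready:
                return preferred
        # 2. Least-in-flight among the instances that are ready.
        ready = [inst for inst in self._instances if inst.ready]
        if ready:
            return min(ready, key=lambda inst: inst.in_flight)
        # 3. Nothing is ready: round-robin so retries spread across instances.
        return self._instances[next(self._rr)]
```

A modulo mapping reshuffles most keys when the instance count changes; that is acceptable for a sketch but worth noting when reasoning about affinity after a resize.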
Compat notes:
- `.env.example` adds `METRICS_TOKEN`, `LOG_LEVEL`, `GATEWAY_MAX_IN_FLIGHT`,
`GATEWAY_QUEUE_TIMEOUT_SEC`, `LINGMA_ACCOUNTS`, `LINGMA_INSTANCE_COUNT`.
- `.gitignore`: removed duplicate `data/` entries.
- Existing single-instance deployments keep working without config change.
Made-with: Cursor
# Gateway listen address
HOST=0.0.0.0
# Gateway listen port
PORT=8317
# API key(s); multiple keys may be configured, comma-separated
API_KEYS=sk-your-api-key
# Dedicated auth token for /metrics (if empty, API_KEYS are accepted instead; if API_KEYS is also unset, /metrics is public)
METRICS_TOKEN=
# Log level (DEBUG / INFO / WARNING / ERROR)
LOG_LEVEL=INFO

# Concurrency limit for /v1/chat/completions (<=0 disables limiting)
GATEWAY_MAX_IN_FLIGHT=4
# Queue wait timeout in seconds; once exceeded, respond with 429 + Retry-After
GATEWAY_QUEUE_TIMEOUT_SEC=30

# Path to the Lingma binary inside the container
LINGMA_BIN=/app/data/bin/Lingma
# How Lingma is obtained: marketplace or vsix
LINGMA_SOURCE_TYPE=marketplace
# Marketplace publisher
LINGMA_MARKETPLACE_PUBLISHER=Alibaba-Cloud
# Marketplace extension name
LINGMA_MARKETPLACE_EXTENSION=tongyi-lingma
# VSIX download URL (latest preferred)
LINGMA_VSIX_URL=https://tongyi-code.oss-cn-hangzhou.aliyuncs.com/vscode/tongyi-lingma-latest.vsix
# Always try to refresh the binary from the VSIX at startup
LINGMA_BOOTSTRAP_ALWAYS=true
# Force refresh (when true, ignore the local cache)
LINGMA_FORCE_REFRESH=false
# Lingma working directory (login/session data)
LINGMA_WORK_DIR=/app/data/.lingma/vscode/sharedClientCache
# Lingma WebSocket port
LINGMA_SOCKET_PORT=36510
# Seconds to wait for Lingma to start
LINGMA_STARTUP_TIMEOUT=40
# Timeout in seconds for a single RPC
LINGMA_RPC_TIMEOUT=30

# Default model (used when a requested model cannot be mapped)
DEFAULT_MODEL=org_auto
# Default mode: chat or agent
DEFAULT_ASK_MODE=chat

# Dedicated domain (optional)
DEDICATED_DOMAIN_URL=

# Whether to log in automatically when not logged in
AUTO_LOGIN_ENABLED=true
# Whether auto-login uses a headless browser
AUTO_LOGIN_HEADLESS=true
# Auto-login timeout in seconds
AUTO_LOGIN_TIMEOUT=180
# Number of auto-login retries
AUTO_LOGIN_MAX_RETRY=2

# Lingma login username (only used when LINGMA_ACCOUNTS is empty; single-instance mode)
LINGMA_USERNAME=
# Lingma login password (only used when LINGMA_ACCOUNTS is empty)
LINGMA_PASSWORD=

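# ==== Multi-instance pool (Option B: multiple accounts) ====
# Account list; two formats are supported:
# CSV: user1:pass1,user2:pass2
# JSON: [{"username":"u1","password":"p1"},{"username":"u2","password":"p2"}]
# When set, each account gets its own Lingma instance (separate workDir + separate auto-login)
LINGMA_ACCOUNTS=
# Number of instances: defaults to the number of LINGMA_ACCOUNTS entries; if set explicitly and there are too few accounts, accounts are reused cyclically and a warning is logged
LINGMA_INSTANCE_COUNT=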
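
The two `LINGMA_ACCOUNTS` formats and the cycling rule for `LINGMA_INSTANCE_COUNT` described in the comments above could be handled along these lines. This is a sketch under assumptions: `parse_accounts` and `assign_accounts` are hypothetical names, CSV passwords are assumed not to contain commas, and using only the first N accounts when the count is smaller than the account list is a guess.

```python
import json
import warnings
from typing import List, Optional, Tuple

Account = Tuple[str, str]  # (username, password)


def parse_accounts(raw: str) -> List[Account]:
    """Parse either the CSV form (u1:p1,u2:p2) or the JSON list form."""
    raw = raw.strip()
    if not raw:
        return []
    if raw.startswith("["):
        return [(item["username"], item["password"]) for item in json.loads(raw)]
    accounts: List[Account] = []
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the first colon only, so a password may contain ':'.
        username, sep, password = entry.partition(":")
        if not sep or not username:
            # The entry itself is not echoed, to avoid logging credentials.
            raise ValueError("malformed account entry, expected user:password")
        accounts.append((username, password))
    return accounts


def assign_accounts(accounts: List[Account],
                    instance_count: Optional[int] = None) -> List[Account]:
    """Return one account per instance, cycling when accounts run short."""
    if not accounts:
        return []
    count = instance_count or len(accounts)
    if count > len(accounts):
        warnings.warn(
            f"{count} instances requested but only {len(accounts)} accounts; "
            "accounts will be reused"
        )
    return [accounts[i % len(accounts)] for i in range(count)]
```

For example, `assign_accounts(parse_accounts("u1:p1,u2:p2"), 3)` gives the third instance the first account again and emits a warning.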