lingma-openai-gateway/.env.example
GitHub Actions 707acc9005 feat: M1+M2 gateway hardening and multi-instance pool
Behavior hardening (M1):
- Fix `_chat_streams` memory leak: call `pop_stream` on completion, error,
  and client disconnect.
- Add WebSocket reconnect with state machine (stopped/starting/ready/
  reconnecting/failed/closed) and exponential backoff, so a Lingma
  restart no longer requires restarting the gateway.
- Lazy initialization: startup failure is non-fatal; the first real request
  triggers a retry, and `/healthz` reflects readiness.
- Migrate FastAPI `on_event` handlers to `lifespan`.
- Structured JSON logging with request_id ContextVar; `x-request-id`
  propagated to responses.
- SSE now sets `Cache-Control: no-cache` and `X-Accel-Buffering: no` to
  defeat proxy buffering.
- OpenAI schema compatibility: `content` accepts `str | list[parts] | None`;
  added `developer`/`function` roles and the `tools`/`tool_choice`/
  `stream_options`/`user`/`max_tokens` fields; `stream_options.include_usage`
  emits a final usage chunk.
- `require_bearer` uses `hmac.compare_digest`; `/metrics` now requires
  Bearer when `METRICS_TOKEN` or `API_KEYS` are set.
- Unify handling of `TimeoutError` vs `asyncio.TimeoutError` across
  Python 3.10/3.11.
- Error responses no longer leak `auto_login.status()` details.
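
A minimal sketch of the Bearer check described above, for illustration only. It assumes a framework-free helper named `check_bearer` (hypothetical, standing in for `require_bearer`) and a hypothetical `metrics_allowed` policy that follows the `.env.example` comment: `METRICS_TOKEN` if set, else `API_KEYS`, else public. Treating a non-empty `METRICS_TOKEN` as replacing `API_KEYS` for `/metrics` is an assumption, not confirmed by the source.

```python
import hmac
from typing import Iterable, Optional


def parse_keys(raw: str) -> list[str]:
    """Split a comma-separated list such as API_KEYS, dropping blank entries."""
    return [k.strip() for k in raw.split(",") if k.strip()]


def check_bearer(authorization: Optional[str], allowed: Iterable[str]) -> bool:
    """Hypothetical stand-in for require_bearer: True if the request may proceed."""
    allowed = list(allowed)
    if not allowed:
        return True  # nothing configured: the endpoint stays public
    if not authorization:
        return False
    scheme, _, token = authorization.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return False
    ok = False
    # Check every key without short-circuiting, and compare bytes so that
    # non-ASCII input cannot make hmac.compare_digest raise TypeError.
    for key in allowed:
        ok |= hmac.compare_digest(token.encode(), key.encode())
    return ok


def metrics_allowed(authorization: Optional[str], metrics_token: str, api_keys: str) -> bool:
    """/metrics policy sketch: METRICS_TOKEN if set, else API_KEYS, else public.

    Assumption: a non-empty METRICS_TOKEN replaces API_KEYS for /metrics.
    """
    token = metrics_token.strip()
    allowed = [token] if token else parse_keys(api_keys)
    return check_bearer(authorization, allowed)
```

`hmac.compare_digest` only hides timing within one comparison; looping over all keys keeps the total work independent of which key matched.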

Backpressure (M2 / A2):
- New `InFlightGuard` with per-request tickets and queue/rejection
  accounting; once `GATEWAY_QUEUE_TIMEOUT_SEC` elapses it raises
  `BackpressureRejected`, which is returned as 429 with `Retry-After`.
- For streaming, ticket ownership transfers to the generator, so a
  `CancelledError` from a client disconnect still releases the slot.
- `/internal/stats.concurrency` and `/metrics` expose in_flight/queued/
  accepted_total/rejected_total/max_in_flight.
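
The guard and the streaming ticket hand-off above can be sketched with an `asyncio.Semaphore` plus `asyncio.wait_for`. This is an illustrative sketch, not the gateway's code: the `Ticket` class, `stream_with_ticket` helper, and the rule `Retry-After = ceil(queue timeout)` are assumptions. The key point is that the generator's `finally` runs on normal completion, errors, `aclose()`, and cancellation alike.

```python
import asyncio
import math
from typing import AsyncIterator, Optional


class BackpressureRejected(Exception):
    """Raised when a request waited longer than the queue timeout (maps to 429)."""

    def __init__(self, retry_after: int) -> None:
        super().__init__(f"queue wait timed out; retry after {retry_after}s")
        self.retry_after = retry_after


class Ticket:
    """One admitted request; release() is idempotent so every exit path may call it."""

    def __init__(self, guard: "InFlightGuard") -> None:
        self._guard = guard
        self._released = False

    def release(self) -> None:
        if self._released:
            return
        self._released = True
        self._guard.in_flight -= 1
        if self._guard._sem is not None:
            self._guard._sem.release()


class InFlightGuard:
    def __init__(self, max_in_flight: int, queue_timeout: float) -> None:
        self.max_in_flight = max_in_flight
        self.queue_timeout = queue_timeout
        # <= 0 means "no limit", matching the GATEWAY_MAX_IN_FLIGHT comment.
        self._sem: Optional[asyncio.Semaphore] = (
            asyncio.Semaphore(max_in_flight) if max_in_flight > 0 else None
        )
        self.in_flight = 0
        self.queued = 0
        self.accepted_total = 0
        self.rejected_total = 0

    async def acquire(self) -> Ticket:
        if self._sem is not None:
            self.queued += 1
            try:
                await asyncio.wait_for(self._sem.acquire(), self.queue_timeout)
            except asyncio.TimeoutError:  # builtin TimeoutError is an alias on 3.11+
                self.rejected_total += 1
                raise BackpressureRejected(max(1, math.ceil(self.queue_timeout))) from None
            finally:
                self.queued -= 1
        self.in_flight += 1
        self.accepted_total += 1
        return Ticket(self)


async def stream_with_ticket(ticket: Ticket, chunks: AsyncIterator[str]) -> AsyncIterator[str]:
    """The generator owns the ticket; finally runs on normal end, error,
    aclose(), and CancelledError from a client disconnect."""
    try:
        async for chunk in chunks:
            yield chunk
    finally:
        ticket.release()


async def demo() -> tuple:
    guard = InFlightGuard(max_in_flight=1, queue_timeout=0.05)
    first = await guard.acquire()
    try:
        await guard.acquire()  # the only slot is taken, so this times out
        retry_after = None
    except BackpressureRejected as exc:
        retry_after = exc.retry_after
    first.release()

    async def source() -> AsyncIterator[str]:
        for part in ("a", "b", "c"):
            yield part

    stream = stream_with_ticket(await guard.acquire(), source())
    head = await stream.__anext__()
    await stream.aclose()  # simulates the client going away mid-stream
    return retry_after, head, guard.in_flight, guard.accepted_total, guard.rejected_total
```

Running `asyncio.run(demo())` shows one rejection, one accepted stream, and the slot returned after the stream is closed early.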

Multi-instance pool (M2 / A1 + B3):
- New `LingmaPool` with N processes, each with its own workDir, socket
  port (dynamic when N>1), and `AutoLoginManager`.
- Account parser supports CSV (`u1:p1,u2:p2`) and JSON formats via
  `LINGMA_ACCOUNTS`; falls back to `LINGMA_USERNAME/LINGMA_PASSWORD` for
  backwards compatibility (N=1 keeps legacy paths/ports).
- Routing: sticky affinity by `user` / system-prompt hash, then
  least-in-flight, and finally a round-robin fallback when no instance is
  healthy.
- `/healthz` reports per-instance state and ready count.
- `/internal/stats.pool` and `/metrics` expose per-instance
  `gateway_pool_instance_in_flight{name}` / `gateway_pool_instance_ready{name}`.
- `/internal/auto-login/start?instance=inst-N` targets a specific instance;
  `/internal/auto-login/status` lists all instances.
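
The routing order above (sticky affinity, then least-in-flight, then round-robin) can be sketched as below. The `Instance` and `Router` names are hypothetical, and two details are assumptions rather than documented behaviour: the affinity hash is taken over the full instance list so the mapping stays stable when one instance goes down, and a sticky pick that is not ready falls through to least-in-flight.

```python
import hashlib
import itertools
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    name: str
    ready: bool = True
    in_flight: int = 0


def _stable_hash(text: str) -> int:
    # Python's built-in hash() is salted per process, so use a digest instead.
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")


class Router:
    def __init__(self, instances: List[Instance]) -> None:
        self.instances = instances
        self._rr = itertools.count()

    def pick(self, user: Optional[str] = None, system_prompt: Optional[str] = None) -> Instance:
        ready = [inst for inst in self.instances if inst.ready]
        if not ready:
            # Nothing healthy: rotate through all instances so each one
            # eventually receives a request that can trigger its lazy retry.
            return self.instances[next(self._rr) % len(self.instances)]

        # 1) Sticky affinity: same user (or, failing that, same system prompt)
        #    maps to the same slot over the full list.
        key = f"user:{user}" if user else (f"sys:{system_prompt}" if system_prompt else None)
        if key is not None:
            preferred = self.instances[_stable_hash(key) % len(self.instances)]
            if preferred.ready:
                return preferred

        # 2) Least in-flight among ready instances (ties go to the first listed).
        return min(ready, key=lambda inst: inst.in_flight)
```

Hashing the `user` field first keeps one caller on one account even when its system prompt changes between requests.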

Compat notes:
- `.env.example` adds `METRICS_TOKEN`, `LOG_LEVEL`, `GATEWAY_MAX_IN_FLIGHT`,
  `GATEWAY_QUEUE_TIMEOUT_SEC`, `LINGMA_ACCOUNTS`, `LINGMA_INSTANCE_COUNT`.
- Removed duplicate `data/` entries from `.gitignore`.
- Existing single-instance deployments keep working without config change.

Made-with: Cursor
2026-04-18 07:40:32 +08:00

70 lines
2.3 KiB

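# Gateway listen address
HOST=0.0.0.0
# Gateway listen port
PORT=8317
# API keys (multiple keys allowed, comma-separated)
API_KEYS=sk-your-api-key
# Dedicated auth token for /metrics (if left empty, /metrics falls back to accepting API_KEYS; if API_KEYS is also unset, /metrics is public)
METRICS_TOKEN=
# Log level: DEBUG / INFO / WARNING / ERROR
LOG_LEVEL=INFO
# Concurrency limit for /v1/chat/completions (<=0 disables the limit)
GATEWAY_MAX_IN_FLIGHT=4
# Queue wait timeout in seconds; once exceeded, the gateway returns 429 + Retry-After
GATEWAY_QUEUE_TIMEOUT_SEC=30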
# Path to the Lingma binary inside the container
LINGMA_BIN=/app/data/bin/Lingma
# How to obtain Lingma: marketplace or vsix
LINGMA_SOURCE_TYPE=marketplace
# Marketplace publisher
LINGMA_MARKETPLACE_PUBLISHER=Alibaba-Cloud
# Marketplace extension name
LINGMA_MARKETPLACE_EXTENSION=tongyi-lingma
# VSIX download URL (latest version preferred)
LINGMA_VSIX_URL=https://tongyi-code.oss-cn-hangzhou.aliyuncs.com/vscode/tongyi-lingma-latest.vsix
# Always try to refresh the binary from the VSIX at startup
LINGMA_BOOTSTRAP_ALWAYS=true
# Force refresh (when true, the local cache is ignored)
LINGMA_FORCE_REFRESH=false
# Lingma working directory (login/session data)
LINGMA_WORK_DIR=/app/data/.lingma/vscode/sharedClientCache
# Lingma WebSocket port
LINGMA_SOCKET_PORT=36510
# Seconds to wait for Lingma to start
LINGMA_STARTUP_TIMEOUT=40
# Timeout per RPC call, in seconds
LINGMA_RPC_TIMEOUT=30
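# Default model (used when the requested model cannot be mapped)
DEFAULT_MODEL=org_auto
# Default mode: chat or agent
DEFAULT_ASK_MODE=chat
# Dedicated domain (optional)
DEDICATED_DOMAIN_URL=
# Log in automatically when not logged in
AUTO_LOGIN_ENABLED=true
# Run auto-login in a headless browser
AUTO_LOGIN_HEADLESS=true
# Auto-login timeout in seconds
AUTO_LOGIN_TIMEOUT=180
# Number of auto-login retries
AUTO_LOGIN_MAX_RETRY=2
# Lingma login username (only used when LINGMA_ACCOUNTS is empty; single-instance mode)
LINGMA_USERNAME=
# Lingma login password (only used when LINGMA_ACCOUNTS is empty)
LINGMA_PASSWORD=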
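# ==== Multi-instance pool (Option B: multiple accounts) ====
# Account list; two formats are supported:
# CSV: user1:pass1,user2:pass2
# JSON: [{"username":"u1","password":"p1"},{"username":"u2","password":"p2"}]
# When set, each account gets its own Lingma instance (separate workDir + separate auto-login)
LINGMA_ACCOUNTS=
# Number of instances: defaults to the number of accounts in LINGMA_ACCOUNTS; if set explicitly and there are too few accounts, accounts are reused cyclically and a warning is logged
LINGMA_INSTANCE_COUNT=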
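
How `LINGMA_ACCOUNTS`, `LINGMA_INSTANCE_COUNT`, and the legacy `LINGMA_USERNAME`/`LINGMA_PASSWORD` pair might combine can be sketched as follows. The function names are hypothetical, and three behaviours are assumptions: CSV entries split on the first colon only, an explicit count smaller than the account list truncates it, and legacy mode always yields exactly one instance.

```python
import json
import logging
from typing import List, Tuple

log = logging.getLogger("gateway.pool")  # logger name is illustrative

Account = Tuple[str, str]


def parse_accounts(raw: str) -> List[Account]:
    """Parse LINGMA_ACCOUNTS in either CSV (u1:p1,u2:p2) or JSON form."""
    raw = raw.strip()
    if not raw:
        return []
    if raw.startswith("["):
        return [(item["username"], item["password"]) for item in json.loads(raw)]
    accounts: List[Account] = []
    for index, pair in enumerate(raw.split(",")):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first colon only, so a password may contain ':' (but not ',').
        user, sep, password = pair.partition(":")
        if not sep or not user:
            # Do not echo the entry itself: it may contain a password.
            raise ValueError(f"LINGMA_ACCOUNTS entry #{index} must be user:pass")
        accounts.append((user, password))
    return accounts


def resolve_accounts(accounts_raw: str, count_raw: str,
                     legacy_user: str, legacy_password: str) -> List[Account]:
    """Return one (username, password) pair per instance to start."""
    accounts = parse_accounts(accounts_raw)
    if not accounts:
        # Legacy single-instance mode: LINGMA_USERNAME / LINGMA_PASSWORD, N = 1.
        return [(legacy_user, legacy_password)]
    if not count_raw.strip():
        return accounts  # default: one instance per account
    count = int(count_raw)
    if count < 1:
        raise ValueError("LINGMA_INSTANCE_COUNT must be >= 1")
    if count > len(accounts):
        log.warning("LINGMA_INSTANCE_COUNT=%d but only %d accounts; reusing accounts cyclically",
                    count, len(accounts))
    # Cycle through accounts when short; truncate when there are more accounts than instances.
    return [accounts[i % len(accounts)] for i in range(count)]
```

For example, `LINGMA_ACCOUNTS=a:1,b:2` with `LINGMA_INSTANCE_COUNT=3` would start three instances logged in as `a`, `b`, `a`.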