feat: M1+M2 gateway hardening and multi-instance pool
Behavior hardening (M1):
- Fix `_chat_streams` memory leak: pop_stream on completion, error, and
client disconnect.
- Add WebSocket reconnect with state machine (stopped/starting/ready/
reconnecting/failed/closed) and exponential backoff, so a Lingma
restart no longer requires restarting the gateway.
- Lazy initialization: startup failure is non-fatal, first real request
triggers retry, `/healthz` reflects readiness.
- Migrate FastAPI on_event to lifespan.
- Structured JSON logging with request_id ContextVar; `x-request-id`
propagated to responses.
- SSE now sets `Cache-Control: no-cache`, `X-Accel-Buffering: no` to
defeat proxy buffering.
- OpenAI schema compatibility: `content` accepts str | list[parts] | None;
adds the `developer` and `function` roles and the `tools`, `tool_choice`,
`stream_options`, `user`, and `max_tokens` fields;
`stream_options.include_usage` emits a final usage chunk.
- `require_bearer` uses `hmac.compare_digest`; `/metrics` now requires
Bearer when `METRICS_TOKEN` or `API_KEYS` is set.
- Python 3.10/3.11 `TimeoutError` vs `asyncio.TimeoutError` unified.
- Error responses no longer leak `auto_login.status()` details.
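
The `/metrics` auth rule above can be sketched roughly as follows. This is a minimal illustration, not the gateway's actual `require_bearer`: the helper names `token_matches` and `metrics_allowed` are hypothetical, and the precedence (when `METRICS_TOKEN` is set, only it is accepted; otherwise any `API_KEYS` entry) is an assumption about how the two settings interact.

```python
import hmac
from typing import Optional, Sequence


def token_matches(presented: str, allowed: Sequence[str]) -> bool:
    """Compare a presented token against each allowed token in constant time."""
    presented_b = presented.encode("utf-8")
    ok = False
    for candidate in allowed:
        # No early return, so timing does not reveal which entry matched.
        if hmac.compare_digest(presented_b, candidate.encode("utf-8")):
            ok = True
    return ok


def metrics_allowed(
    auth_header: Optional[str],
    metrics_token: str,
    api_keys: Sequence[str],
) -> bool:
    """Decide whether a request may read /metrics (hypothetical helper)."""
    if not metrics_token and not api_keys:
        return True  # nothing configured: endpoint stays public
    if not auth_header or not auth_header.startswith("Bearer "):
        return False
    presented = auth_header[len("Bearer "):].strip()
    # ASSUMPTION: a configured METRICS_TOKEN takes precedence over API_KEYS.
    allowed = [metrics_token] if metrics_token else list(api_keys)
    return token_matches(presented, allowed)
```

Encoding to bytes before calling `hmac.compare_digest` avoids the `TypeError` it raises for non-ASCII `str` arguments.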
Backpressure (M2 / A2):
- New `InFlightGuard` with per-request ticket and queue + rejection
accounting; once `GATEWAY_QUEUE_TIMEOUT_SEC` elapses, the request fails
with `BackpressureRejected`, returned as 429 + `Retry-After`.
- Streaming ticket ownership transfers to the generator so CancelledError
from client disconnect still releases the slot.
- `/internal/stats.concurrency` and `/metrics` expose in_flight/queued/
accepted_total/rejected_total/max_in_flight.
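
A minimal sketch of the backpressure idea described above, assuming an `asyncio.Semaphore` underneath. Only the names `InFlightGuard` and `BackpressureRejected` come from this commit; the method names, the `retry_after` attribute, and the `stream_with_ticket` helper are hypothetical, and the `<=0` "unlimited" mode is left out.

```python
import asyncio
from typing import AsyncIterator


class BackpressureRejected(Exception):
    """Raised when a request waited longer than the queue timeout."""

    def __init__(self, retry_after: int) -> None:
        super().__init__("gateway is saturated")
        self.retry_after = retry_after


class InFlightGuard:
    def __init__(self, max_in_flight: int, queue_timeout_sec: float) -> None:
        self._sem = asyncio.Semaphore(max_in_flight)
        self._timeout = queue_timeout_sec
        self.in_flight = 0
        self.queued = 0
        self.accepted_total = 0
        self.rejected_total = 0

    async def acquire(self) -> None:
        """Wait for a slot; reject once the queue timeout elapses."""
        self.queued += 1
        try:
            await asyncio.wait_for(self._sem.acquire(), self._timeout)
        except asyncio.TimeoutError:
            self.rejected_total += 1
            # The caller would map this to HTTP 429 with a Retry-After header.
            raise BackpressureRejected(max(1, int(self._timeout))) from None
        finally:
            self.queued -= 1
        self.in_flight += 1
        self.accepted_total += 1

    def release(self) -> None:
        self.in_flight -= 1
        self._sem.release()


async def stream_with_ticket(
    guard: InFlightGuard, chunks: AsyncIterator[str]
) -> AsyncIterator[str]:
    """Own the ticket for the lifetime of the stream."""
    try:
        async for chunk in chunks:
            yield chunk
    finally:
        # Runs on normal completion, on errors, and when the generator is
        # closed or cancelled after a client disconnect.
        guard.release()
```

One caveat of this design: a generator that is never iterated never reaches its `finally`, so a real implementation would also need to release the ticket on that path.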
Multi-instance pool (M2 / A1 + B3):
- New `LingmaPool` with N processes, each with its own workDir, socket
port (dynamic when N>1), and `AutoLoginManager`.
- Account parser supports CSV (`u1:p1,u2:p2`) and JSON formats via
`LINGMA_ACCOUNTS`; falls back to `LINGMA_USERNAME/LINGMA_PASSWORD` for
backwards compatibility (N=1 keeps legacy paths/ports).
- Routing: sticky affinity by `user` / system-prompt hash, then
least-in-flight, finally round-robin fallback for unhealthy pool.
- `/healthz` reports per-instance state and ready count.
- `/internal/stats.pool` and `/metrics` expose per-instance
`gateway_pool_instance_in_flight{name}` / `gateway_pool_instance_ready{name}`.
- `/internal/auto-login/start?instance=inst-N` targets a specific instance;
`/internal/auto-login/status` lists all instances.
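
The routing order above (sticky affinity, then least-in-flight, then round-robin for an unhealthy pool) might look like the sketch below. The `Instance` dataclass and `pick_instance` function are hypothetical stand-ins, not the actual `LingmaPool` API, and hashing with SHA-256 modulo the ready count is an assumed affinity scheme.

```python
import hashlib
from dataclasses import dataclass
from itertools import count
from typing import List, Optional


@dataclass
class Instance:
    name: str
    ready: bool = True
    in_flight: int = 0


_round_robin = count()  # shared counter for the unhealthy-pool fallback


def pick_instance(
    instances: List[Instance],
    user: Optional[str] = None,
    system_prompt: Optional[str] = None,
) -> Instance:
    """Choose an instance for one request (illustrative only)."""
    ready = [inst for inst in instances if inst.ready]
    if not ready:
        # Unhealthy pool: rotate over every instance so that retries are
        # spread out instead of hammering a single one.
        return instances[next(_round_robin) % len(instances)]
    key = user or system_prompt
    if key:
        # Sticky affinity: the same key hashes to the same ready instance.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return ready[int.from_bytes(digest[:8], "big") % len(ready)]
    # No affinity key: send the request to the least-loaded ready instance.
    return min(ready, key=lambda inst: inst.in_flight)
```

With plain modulo hashing, affinity shifts whenever the set of ready instances changes; a consistent-hashing ring would reduce that churn at the cost of more code.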
Compat notes:
- `.env.example` adds `METRICS_TOKEN`, `LOG_LEVEL`, `GATEWAY_MAX_IN_FLIGHT`,
`GATEWAY_QUEUE_TIMEOUT_SEC`, `LINGMA_ACCOUNTS`, `LINGMA_INSTANCE_COUNT`.
- `.gitignore`: remove duplicated `data/` entries.
- Existing single-instance deployments keep working without config change.
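
For illustration, the `LINGMA_ACCOUNTS` and `LINGMA_INSTANCE_COUNT` settings could be interpreted as in the sketch below. The function names are hypothetical, and the JSON shape (an array of objects with `username` and `password` keys) is an assumption, since the commit only says "JSON formats".

```python
import json
from typing import List, Optional, Tuple

Account = Tuple[str, str]


def parse_accounts(raw: str) -> List[Account]:
    """Parse `u1:p1,u2:p2` or a JSON array into (username, password) pairs."""
    raw = raw.strip()
    if not raw:
        return []
    if raw.startswith("["):
        # ASSUMPTION: JSON entries look like {"username": ..., "password": ...}.
        return [(str(item["username"]), str(item["password"])) for item in json.loads(raw)]
    accounts: List[Account] = []
    for index, entry in enumerate(raw.split(",")):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the first colon only, so passwords may contain colons.
        user, sep, password = entry.partition(":")
        if not sep or not user:
            # Report the position, not the entry, to keep secrets out of logs.
            raise ValueError(f"account #{index} is not in user:password form")
        accounts.append((user.strip(), password))
    return accounts


def assign_accounts(accounts: List[Account], instance_count: Optional[int] = None) -> List[Account]:
    """Return one account per instance, cycling when there are too few."""
    if not accounts:
        return []
    n = instance_count if instance_count and instance_count > 0 else len(accounts)
    return [accounts[i % len(accounts)] for i in range(n)]
```

In this sketch a password containing a comma cannot be expressed in the CSV form, so the JSON form is the fallback for such credentials.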
Made-with: Cursor
README.md: 24 lines changed
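@@ -64,6 +64,12 @@ cp .env.example .env
 - `AUTO_LOGIN_MAX_RETRY`: number of automatic login retries
 - `LINGMA_USERNAME`: Lingma login username
 - `LINGMA_PASSWORD`: Lingma login password
+- `METRICS_TOKEN`: dedicated auth token for `/metrics` (if left empty, any `API_KEYS` entry is accepted instead; if both are empty, `/metrics` is public)
+- `LOG_LEVEL`: log level (default `INFO`; output is structured JSON and includes `request_id`)
+- `GATEWAY_MAX_IN_FLIGHT`: concurrency limit for `/v1/chat/completions` (default 4; `<=0` disables limiting)
+- `GATEWAY_QUEUE_TIMEOUT_SEC`: queue wait timeout in seconds (default 30; once exceeded, the request is rejected with 429 + `Retry-After`)
+- `LINGMA_ACCOUNTS`: multi-account instance pool, formatted as `u1:p1,u2:p2` or a JSON array; when set, each account gets its own Lingma subprocess
+- `LINGMA_INSTANCE_COUNT`: number of instances (defaults to the number of accounts; if set explicitly and there are too few accounts, accounts are reused cyclically)
 
 ### Minimal required `.env` example
 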
@@ -85,7 +91,18 @@ DEDICATED_DOMAIN_URL=
 
 - All persistent data for this project lives in `./data`:
 - `data/bin/Lingma`: the automatically extracted Lingma binary
-- `data/.lingma/...`: Lingma login state, cache, and logs
+- `data/.lingma/...`: Lingma login state, cache, and logs (single-instance mode)
+- `data/.lingma/pool/inst-<i>/...`: per-instance login state and cache in multi-instance mode
+
+### Multi-instance pool (Option B: multiple accounts)
+
+To enable it, set `LINGMA_ACCOUNTS=u1:p1,u2:p2` in `.env` and restart the container.
+
+- Each account maps to its own Lingma subprocess, with its own login and its own workDir.
+- Routing: requests with the same `user` field or the same system prompt are routed stickily to the same instance; all others go to the least-in-flight instance.
+- An instance that crashes or disconnects does not take down the pool; `/healthz` reports the `pool_ready` count.
+- `/internal/stats.pool` exposes per-instance state, and `/metrics` adds `gateway_pool_instance_in_flight{name}` / `gateway_pool_instance_ready{name}`.
+- When `LINGMA_ACCOUNTS` is not set, the gateway falls back to single-instance mode (still using `LINGMA_USERNAME/LINGMA_PASSWORD`), so it stays backward compatible.
 
 ## 3. Running with Docker
 
@@ -163,13 +180,16 @@ curl -s http://127.0.0.1:8317/internal/stats \
 ```
 
 ```bash
-curl -s http://127.0.0.1:8317/metrics
+curl -s http://127.0.0.1:8317/metrics \
+  -H "Authorization: Bearer ${METRICS_TOKEN:-sk-your-api-key}"
 ```
 
 Notes:
 
 - `usage.prompt_tokens/completion_tokens` are estimates (approximated from byte counts).
 - Non-streaming responses include a `usage` field.
+- Streaming requests can pass `stream_options: {"include_usage": true}` so that the final frame returns `usage`.
+- `/metrics` requires Bearer auth by default: it matches `METRICS_TOKEN` first, otherwise it accepts any key in `API_KEYS`; when neither is configured it stays public.
 
 ## 6. Automatic login inside the container
 