Behavior hardening (M1):
- Fix `_chat_streams` memory leak: call `pop_stream` on completion, error,
and client disconnect.
- Add WebSocket reconnect with state machine (stopped/starting/ready/
reconnecting/failed/closed) and exponential backoff, so a Lingma
restart no longer requires restarting the gateway.
- Lazy initialization: startup failure is non-fatal, first real request
triggers retry, `/healthz` reflects readiness.
- Migrate FastAPI on_event to lifespan.
- Structured JSON logging with request_id ContextVar; `x-request-id`
propagated to responses.
- SSE now sets `Cache-Control: no-cache` and `X-Accel-Buffering: no` to
defeat proxy buffering.
- OpenAI schema compatibility: `content` accepts str | list[parts] | None;
added `developer`/`function` roles and the `tools`, `tool_choice`,
`stream_options`, `user`, and `max_tokens` fields;
`stream_options.include_usage` emits a final usage chunk.
- `require_bearer` uses `hmac.compare_digest`; `/metrics` now requires a
Bearer token when `METRICS_TOKEN` or `API_KEYS` is set.
- Unify handling of `TimeoutError` and `asyncio.TimeoutError`, which are
distinct classes on Python 3.10 and aliases from 3.11 on.
- Error responses no longer leak `auto_login.status()` details.
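
The reconnect and lazy-initialization bullets above can be pictured roughly as follows. This is a minimal sketch, not the gateway's code: the six state names come from the list above, while `ReconnectingClient`, `backoff_delay`, `connect`, `max_attempts`, `base`, and `cap` are hypothetical names chosen for illustration, and the delay formula is an assumed plain doubling with a cap.

```python
from __future__ import annotations

import asyncio
import enum
from typing import Awaitable, Callable


class State(enum.Enum):
    STOPPED = "stopped"
    STARTING = "starting"
    READY = "ready"
    RECONNECTING = "reconnecting"
    FAILED = "failed"
    CLOSED = "closed"


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


class ReconnectingClient:
    """Hypothetical wrapper; `connect` is any coroutine that raises OSError on failure."""

    def __init__(
        self,
        connect: Callable[[], Awaitable[None]],
        max_attempts: int = 5,
        base: float = 0.5,
        cap: float = 30.0,
    ) -> None:
        self._connect = connect
        self._max_attempts = max_attempts
        self._base = base
        self._cap = cap
        self.state = State.STOPPED

    async def ensure_ready(self) -> bool:
        """Try to reach READY; return False (state FAILED) after max_attempts."""
        if self.state is State.READY:
            return True
        if self.state is State.CLOSED:
            return False
        # A cold start is STARTING; any later attempt (including after FAILED)
        # counts as RECONNECTING, so a new request can trigger a fresh retry.
        self.state = (
            State.STARTING if self.state is State.STOPPED else State.RECONNECTING
        )
        for attempt in range(self._max_attempts):
            try:
                await self._connect()
            except OSError:
                self.state = State.RECONNECTING
                await asyncio.sleep(backoff_delay(attempt, self._base, self._cap))
            else:
                self.state = State.READY
                return True
        self.state = State.FAILED
        return False
```

In this sketch a `/healthz` handler would only need to report `client.state.value`, and a request handler would call `ensure_ready()` before forwarding, which is one way a startup failure can stay non-fatal.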
Backpressure (M2 / A2):
- New `InFlightGuard` with per-request tickets and queue + rejection
accounting; `BackpressureRejected` maps to 429 + `Retry-After` once a
request has waited `GATEWAY_QUEUE_TIMEOUT_SEC` in the queue.
- Streaming ticket ownership transfers to the generator so CancelledError
from client disconnect still releases the slot.
- `/internal/stats.concurrency` and `/metrics` expose in_flight/queued/
accepted_total/rejected_total/max_in_flight.
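
A rough sketch of the backpressure idea above, assuming a semaphore-based guard. Only the names `InFlightGuard` and `BackpressureRejected` and the counter names come from this message; the `Ticket` class, the `acquire`/`release` methods, the constructor parameters, and `stream_with_ticket` are hypothetical and may differ from the real implementation.

```python
from __future__ import annotations

import asyncio
from typing import AsyncIterator


class BackpressureRejected(Exception):
    """Raised when a request waited too long for a slot (maps to HTTP 429)."""

    def __init__(self, retry_after: float) -> None:
        super().__init__("gateway is at capacity")
        self.retry_after = retry_after


class InFlightGuard:
    def __init__(self, max_in_flight: int, queue_timeout: float) -> None:
        self._sem = asyncio.Semaphore(max_in_flight)
        self._queue_timeout = queue_timeout
        self.in_flight = 0
        self.queued = 0
        self.accepted_total = 0
        self.rejected_total = 0

    async def acquire(self) -> Ticket:
        """Wait up to queue_timeout for a slot, else raise BackpressureRejected."""
        self.queued += 1
        try:
            await asyncio.wait_for(self._sem.acquire(), timeout=self._queue_timeout)
        except asyncio.TimeoutError:
            self.rejected_total += 1
            raise BackpressureRejected(self._queue_timeout) from None
        finally:
            self.queued -= 1
        self.in_flight += 1
        self.accepted_total += 1
        return Ticket(self)

    def _release(self) -> None:
        self.in_flight -= 1
        self._sem.release()


class Ticket:
    """One slot; release() is idempotent so double release cannot leak capacity."""

    def __init__(self, guard: InFlightGuard) -> None:
        self._guard = guard
        self._released = False

    def release(self) -> None:
        if not self._released:
            self._released = True
            self._guard._release()


async def stream_with_ticket(
    ticket: Ticket, chunks: AsyncIterator[str]
) -> AsyncIterator[str]:
    """The generator owns the ticket: `finally` runs on normal completion, on
    error, and when a started generator is closed or cancelled."""
    try:
        async for chunk in chunks:
            yield chunk
    finally:
        ticket.release()
```

A handler would catch `BackpressureRejected` and answer with status 429 and a `Retry-After` header derived from `retry_after`. One caveat of this sketch: a generator that is never iterated never reaches its `finally`, so the caller should release the ticket itself if it fails before streaming starts.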
Multi-instance pool (M2 / A1 + B3):
- New `LingmaPool` with N processes, each with its own workDir, socket
port (dynamic when N>1), and `AutoLoginManager`.
- Account parser supports CSV (`u1:p1,u2:p2`) and JSON formats via
`LINGMA_ACCOUNTS`; falls back to `LINGMA_USERNAME/LINGMA_PASSWORD` for
backwards compatibility (N=1 keeps legacy paths/ports).
- Routing: sticky affinity by `user` / system-prompt hash, then
least-in-flight, with a round-robin fallback when the pool is unhealthy.
- `/healthz` reports per-instance state and ready count.
- `/internal/stats.pool` and `/metrics` expose per-instance
`gateway_pool_instance_in_flight{name}` / `gateway_pool_instance_ready{name}`.
- `/internal/auto-login/start?instance=inst-N` targets a specific instance;
`/internal/auto-login/status` lists all instances.
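
The account-parsing bullet above could look roughly like this. It is a sketch under stated assumptions: the environment variable names and the `u1:p1,u2:p2` CSV form come from this message, but the JSON shape (a list of objects with `username` and `password` keys) is assumed, and `Account` and `parse_accounts` are hypothetical names.

```python
from __future__ import annotations

import json
from typing import Mapping, NamedTuple


class Account(NamedTuple):
    username: str
    password: str


def parse_accounts(env: Mapping[str, str]) -> list[Account]:
    """Read accounts from LINGMA_ACCOUNTS, else the legacy single-account vars."""
    raw = env.get("LINGMA_ACCOUNTS", "").strip()
    if raw.startswith("["):
        # ASSUMPTION: JSON form is [{"username": "...", "password": "..."}, ...].
        return [
            Account(str(item["username"]), str(item["password"]))
            for item in json.loads(raw)
        ]
    if raw:
        accounts: list[Account] = []
        for pair in raw.split(","):
            pair = pair.strip()
            if not pair:
                continue
            # Split on the first colon only, so passwords may contain ':'.
            user, sep, password = pair.partition(":")
            if not sep or not user:
                raise ValueError("expected 'user:password' entries in LINGMA_ACCOUNTS")
            accounts.append(Account(user, password))
        return accounts
    # Backwards-compatible fallback: one account from the legacy variables.
    user = env.get("LINGMA_USERNAME", "")
    password = env.get("LINGMA_PASSWORD", "")
    return [Account(user, password)] if user and password else []
```

Calling `parse_accounts(os.environ)` would then yield one entry per pool instance. In this sketch the CSV form cannot carry a password that contains a comma; the JSON form can.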
Compat notes:
- `.env.example` adds `METRICS_TOKEN`, `LOG_LEVEL`, `GATEWAY_MAX_IN_FLIGHT`,
`GATEWAY_QUEUE_TIMEOUT_SEC`, `LINGMA_ACCOUNTS`, `LINGMA_INSTANCE_COUNT`.
- `.gitignore`: removed duplicate `data/` entries.
- Existing single-instance deployments keep working without config change.
Made-with: Cursor
75 lines
2.2 KiB
Python
from __future__ import annotations

import hmac

from fastapi import HTTPException, Request, status


def _extract_bearer(request: Request) -> str:
    auth = request.headers.get("authorization", "")
    if not auth.startswith("Bearer "):
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail={
                "error": {
                    "message": "Missing or invalid Authorization header",
                    "type": "invalid_request_error",
                    "code": "invalid_api_key",
                }
            },
        )
    return auth[len("Bearer ") :].strip()


def _match_any(token: str, candidates: list[str]) -> bool:
    for c in candidates:
        if c and hmac.compare_digest(token, c):
            return True
    return False


def require_bearer(request: Request, api_keys: list[str]) -> None:
    # Empty api_keys means auth is disabled (keeps the old behavior).
    if not api_keys:
        return
    token = _extract_bearer(request)
    if not _match_any(token, api_keys):
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail={
                "error": {
                    "message": "Invalid API key",
                    "type": "invalid_request_error",
                    "code": "invalid_api_key",
                }
            },
        )


def require_metrics_access(
    request: Request, api_keys: list[str], metrics_token: str
) -> None:
    """Allow metrics if any of: METRICS_TOKEN matches, or any API_KEYS match.

    If neither METRICS_TOKEN nor API_KEYS are configured, metrics is public
    (backwards compatible default).
    """
    accepted: list[str] = []
    if metrics_token:
        accepted.append(metrics_token)
    accepted.extend(api_keys)
    if not accepted:
        return
    token = _extract_bearer(request)
    if not _match_any(token, accepted):
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail={
                "error": {
                    "message": "Invalid metrics token",
                    "type": "invalid_request_error",
                    "code": "invalid_api_key",
                }
            },
        )
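
One caveat with the token comparison in this file: `hmac.compare_digest` accepts `str` arguments only when both are ASCII, so a bearer token containing non-ASCII characters would raise `TypeError` instead of producing a 401. A possible hardening is sketched below, assuming keys are compared as UTF-8 bytes; `match_any_bytes` is a hypothetical name and not part of this file.

```python
from __future__ import annotations

import hmac


def match_any_bytes(token: str, candidates: list[str]) -> bool:
    """Constant-time match of `token` against each non-empty candidate.

    Both sides are encoded to bytes first, because compare_digest rejects
    non-ASCII str values with TypeError.
    """
    token_bytes = token.encode("utf-8")
    matched = False
    for candidate in candidates:
        # No early return: every candidate is compared, so the loop length
        # does not depend on which key (if any) matched.
        if candidate and hmac.compare_digest(token_bytes, candidate.encode("utf-8")):
            matched = True
    return matched
```

Empty candidates are still skipped, as in `_match_any`, so an empty configured key can never authorize a request.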