prod hardening: admin/metrics authz split, subprocess lifecycle, parallel pool start, HEALTHCHECK
- authz: new ADMIN_TOKEN gates /internal/*. METRICS_PUBLIC=false by default, so /metrics returns 503 when neither METRICS_TOKEN nor API_KEYS is set (previously it leaked pool topology). Startup logs loudly if API_KEYS is empty or admin falls back to chat keys.
- lingma_client: keep a Popen handle instead of orphaning Lingma with start_new_session, drain stderr to the logger at DEBUG, SIGTERM -> 5s grace -> SIGKILL on shutdown. Fixes the zombie-process leak on container reload.
- pool: asyncio.gather starts N instances concurrently; an N=2 pool shaves up to ~startup_timeout seconds off boot.
- Dockerfile: HEALTHCHECK hits /healthz and checks `ok` (true only when at least one Lingma instance is ready) so Docker / compose orchestrators see "stuck on login" as unhealthy.

Made-with: Cursor
@@ -2,10 +2,14 @@
|
|||||||
HOST=0.0.0.0
|
HOST=0.0.0.0
|
||||||
# Gateway listen port
|
# Gateway listen port
|
||||||
PORT=8317
|
PORT=8317
|
||||||
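# API keys; multiple keys may be configured (comma-separated)
|
# API keys; multiple keys may be configured (comma-separated). Empty = auth disabled (a warning is logged at startup); local dev only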
|
||||||
API_KEYS=sk-your-api-key
|
API_KEYS=sk-your-api-key
|
||||||
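# Dedicated auth token for /metrics (if empty, API_KEYS are accepted instead; if API_KEYS is not set either, /metrics is public)
|
# Dedicated auth token for /metrics (if empty, API_KEYS are accepted instead; if API_KEYS is empty too, /metrics returns 503 by default)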
|
||||||
METRICS_TOKEN=
|
METRICS_TOKEN=
|
||||||
|
# Explicitly make /metrics public (use only with a scraper on a private network)
|
||||||
|
METRICS_PUBLIC=false
|
||||||
|
# Dedicated admin token for /internal/* (if empty, falls back to API_KEYS); setting it separately in production is strongly recommended
|
||||||
|
ADMIN_TOKEN=
|
||||||
# Log level (DEBUG / INFO / WARNING / ERROR)
|
# Log level (DEBUG / INFO / WARNING / ERROR)
|
||||||
LOG_LEVEL=INFO
|
LOG_LEVEL=INFO
|
||||||
|
|
||||||
|
|||||||
11
Dockerfile
@@ -17,4 +17,15 @@ COPY app /app/app
|
|||||||
|
|
||||||
EXPOSE 8317
|
EXPOSE 8317
|
||||||
|
|
||||||
# Container-level health signal. Docker Compose / orchestrators rely on this
# to stop sending traffic when the pool is wedged, restart unhealthy replicas,
# and drive rolling deploys. /healthz returns ok=true only when at least one
# Lingma instance is in state=ready, so it catches the "stuck on login" case
# that a raw TCP probe would miss.
HEALTHCHECK --interval=30s --timeout=5s --start-period=60s --retries=3 \
    CMD python -c "import os,json,urllib.request,sys; \
        port=os.environ.get('PORT','8317'); \
        r=urllib.request.urlopen(f'http://127.0.0.1:{port}/healthz', timeout=3); \
        sys.exit(0 if json.load(r).get('ok') else 1)" || exit 1
|
|
||||||
CMD ["sh", "-c", "python /app/app/bootstrap_lingma.py && uvicorn app.main:app --host ${HOST:-0.0.0.0} --port ${PORT:-8317}"]
|
CMD ["sh", "-c", "python /app/app/bootstrap_lingma.py && uvicorn app.main:app --host ${HOST:-0.0.0.0} --port ${PORT:-8317}"]
|
||||||
|
|||||||
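Review note: the inline `python -c` probe in the HEALTHCHECK above is hard to test in isolation. A minimal sketch of the same check as a standalone function follows, assuming `/healthz` returns a JSON object with an `ok` field as the Dockerfile comment describes. The name `probe_healthz` is hypothetical and not part of this commit.

```python
import json
import urllib.request


def probe_healthz(port: int, timeout: float = 3.0) -> int:
    """Return 0 when /healthz reports ok, else 1 (same exit codes as the HEALTHCHECK)."""
    url = f"http://127.0.0.1:{port}/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
        return 0 if payload.get("ok") else 1
    except (OSError, ValueError, AttributeError):
        # OSError covers connection refused, timeouts and HTTP errors;
        # ValueError covers invalid JSON; AttributeError covers a non-object body.
        return 1
```

Unlike the inline version, this sketch maps connection errors and malformed bodies to exit code 1 explicitly instead of relying on an uncaught traceback.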
@@ -64,7 +64,9 @@ cp .env.example .env
|
|||||||
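- `AUTO_LOGIN_MAX_RETRY`: number of auto-login retries
|
- `AUTO_LOGIN_MAX_RETRY`: number of auto-login retries
|
||||||
- `LINGMA_USERNAME`: Lingma login username
|
- `LINGMA_USERNAME`: Lingma login username
|
||||||
- `LINGMA_PASSWORD`: Lingma login password
|
- `LINGMA_PASSWORD`: Lingma login password
|
||||||
- `METRICS_TOKEN`: dedicated auth token for `/metrics` (if empty, `API_KEYS` are accepted too; if both are empty, `/metrics` is public)
|
- `METRICS_TOKEN`: dedicated auth token for `/metrics` (if empty, `API_KEYS` are accepted too; if both are empty, `/metrics` returns 503 by default unless `METRICS_PUBLIC=true` is set explicitly)
|
||||||
|
- `METRICS_PUBLIC`: explicitly makes `/metrics` public; use only with a scraper on a private network (default `false`)
|
||||||
|
- `ADMIN_TOKEN`: dedicated auth token for the `/internal/*` admin endpoints (if empty, falls back to `API_KEYS`). Setting it separately is recommended in production, so that rotating `API_KEYS` does not require re-issuing session bundle export access
|
||||||
- `LOG_LEVEL`: log level (default `INFO`; emits structured JSON that includes `request_id`)
|
- `LOG_LEVEL`: log level (default `INFO`; emits structured JSON that includes `request_id`)
|
||||||
- `GATEWAY_MAX_IN_FLIGHT`: concurrency limit for `/v1/chat/completions` (default 4; `<=0` disables limiting)
|
- `GATEWAY_MAX_IN_FLIGHT`: concurrency limit for `/v1/chat/completions` (default 4; `<=0` disables limiting)
|
||||||
- `GATEWAY_QUEUE_TIMEOUT_SEC`: queue wait timeout in seconds (default 30; once exceeded, the request gets 429 + `Retry-After`)
|
- `GATEWAY_QUEUE_TIMEOUT_SEC`: queue wait timeout in seconds (default 30; once exceeded, the request gets 429 + `Retry-After`)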
|
||||||
@@ -236,7 +238,8 @@ curl -s http://127.0.0.1:8317/metrics \
|
|||||||
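- `usage.prompt_tokens/completion_tokens` are estimates (approximated from byte counts).
|
- `usage.prompt_tokens/completion_tokens` are estimates (approximated from byte counts).
|
||||||
- Non-streaming responses include a `usage` field.
|
- Non-streaming responses include a `usage` field.
|
||||||
- Streaming requests can pass `stream_options: {"include_usage": true}` so that the final frame returns `usage`.
|
- Streaming requests can pass `stream_options: {"include_usage": true}` so that the final frame returns `usage`.
|
||||||
- `/metrics` requires Bearer auth by default: `METRICS_TOKEN` is matched first, otherwise any key in `API_KEYS` is accepted; if neither is configured, it stays public.
|
- `/metrics` requires Bearer auth by default: `METRICS_TOKEN` is matched first, otherwise any key in `API_KEYS` is accepted; if neither is configured, it returns 503 and becomes public only with an explicit `METRICS_PUBLIC=true`.
|
||||||
|
- `/internal/*` admin endpoints (auto-login, session export, models/raw, stats) use `ADMIN_TOKEN` by default and fall back to `API_KEYS` when it is not configured; if neither is configured, they return 503.
|
||||||
|
|
||||||
## 6. Auto-login inside the container
|
## 6. Auto-login inside the container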
|
||||||
|
|
||||||
|
|||||||
85
app/auth.py
@@ -29,7 +29,8 @@ def _match_any(token: str, candidates: list[str]) -> bool:
|
|||||||
|
|
||||||
|
|
||||||
def require_bearer(request: Request, api_keys: list[str]) -> None:
|
def require_bearer(request: Request, api_keys: list[str]) -> None:
|
||||||
# Empty api_keys means auth is disabled (keeps the old behavior).
|
# Empty api_keys means auth is disabled (kept for local dev). The startup
|
||||||
|
# logger warns loudly in that case so it can't go unnoticed in prod.
|
||||||
if not api_keys:
|
if not api_keys:
|
||||||
return
|
return
|
||||||
token = _extract_bearer(request)
|
token = _extract_bearer(request)
|
||||||
@@ -47,19 +48,42 @@ def require_bearer(request: Request, api_keys: list[str]) -> None:
|
|||||||
|
|
||||||
|
|
||||||
def require_metrics_access(
|
def require_metrics_access(
|
||||||
request: Request, api_keys: list[str], metrics_token: str
|
request: Request,
|
||||||
|
api_keys: list[str],
|
||||||
|
metrics_token: str,
|
||||||
|
*,
|
||||||
|
public: bool = False,
|
||||||
) -> None:
|
) -> None:
|
||||||
"""Allow metrics if any of: METRICS_TOKEN matches, or any API_KEYS match.
|
"""Gate /metrics.
|
||||||
|
|
||||||
If neither METRICS_TOKEN nor API_KEYS are configured, metrics is public
|
Resolution order:
|
||||||
(backwards compatible default).
|
1. `public=True` (METRICS_PUBLIC) — wide open, explicit opt-in for
|
||||||
|
sidecar scrapers on a private network.
|
||||||
|
2. `METRICS_TOKEN` configured — must match.
|
||||||
|
3. `API_KEYS` configured — any configured API key works.
|
||||||
|
4. Nothing configured at all — 503 (scraping disabled) so we don't
|
||||||
|
silently leak the pool topology on an un-hardened deployment.
|
||||||
"""
|
"""
|
||||||
|
if public:
|
||||||
|
return
|
||||||
accepted: list[str] = []
|
accepted: list[str] = []
|
||||||
if metrics_token:
|
if metrics_token:
|
||||||
accepted.append(metrics_token)
|
accepted.append(metrics_token)
|
||||||
accepted.extend(api_keys)
|
accepted.extend(api_keys)
|
||||||
if not accepted:
|
if not accepted:
|
||||||
return
|
raise HTTPException(
|
||||||
|
status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
|
||||||
|
detail={
|
||||||
|
"error": {
|
||||||
|
"message": (
|
||||||
|
"metrics scraping is disabled: set METRICS_TOKEN, "
|
||||||
|
"API_KEYS, or METRICS_PUBLIC=true"
|
||||||
|
),
|
||||||
|
"type": "service_unavailable",
|
||||||
|
"code": "metrics_disabled",
|
||||||
|
}
|
||||||
|
},
|
||||||
|
)
|
||||||
token = _extract_bearer(request)
|
token = _extract_bearer(request)
|
||||||
if not _match_any(token, accepted):
|
if not _match_any(token, accepted):
|
||||||
raise HTTPException(
|
raise HTTPException(
|
||||||
@@ -72,3 +96,52 @@ def require_metrics_access(
|
|||||||
}
|
}
|
||||||
},
|
},
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
def require_admin_access(
    request: Request,
    api_keys: list[str],
    admin_token: str,
) -> None:
    """Gate /internal/* admin endpoints.

    Resolution order:
    1. `ADMIN_TOKEN` configured: must match exactly.
    2. Otherwise fall back to the regular API_KEYS (single-tenant deploys).
    3. If nothing is configured: 503 so we never expose auto-login /
       session-export on an unauthenticated gateway.

    Backwards compat: existing deployments that only set `API_KEYS` keep
    working; add ADMIN_TOKEN in .env when you want a dedicated split.
    """
    accepted: list[str] = []
    if admin_token:
        accepted.append(admin_token)
    else:
        accepted.extend(api_keys)
    if not accepted:
        raise HTTPException(
            status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
            detail={
                "error": {
                    "message": (
                        "admin endpoints disabled: configure ADMIN_TOKEN "
                        "(recommended) or API_KEYS"
                    ),
                    "type": "service_unavailable",
                    "code": "admin_disabled",
                }
            },
        )
    token = _extract_bearer(request)
    if not _match_any(token, accepted):
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail={
                "error": {
                    "message": "Invalid admin token",
                    "type": "invalid_request_error",
                    "code": "invalid_api_key",
                }
            },
        )
|
|||||||
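Review note: the two resolution orders in `app/auth.py` are easier to compare as pure functions that return the HTTP status each gate would produce. The sketch below is an illustration, not the project's code. The names `metrics_status`, `admin_status`, and `_matches` are hypothetical, the constant-time comparison stands in for `_match_any` (whose body is not shown in this diff), and the 401 for a wrong metrics token is an assumption because the hunk is cut off before that status code.

```python
import hmac
from typing import Sequence


def _matches(token: str, candidates: Sequence[str]) -> bool:
    # Assumption: constant-time comparison; the real `_match_any` is not shown.
    return any(hmac.compare_digest(token.encode(), c.encode()) for c in candidates)


def metrics_status(
    token: str, api_keys: Sequence[str], metrics_token: str, public: bool = False
) -> int:
    """Status the /metrics gate would produce: 200, 401 (assumed) or 503."""
    if public:
        return 200  # METRICS_PUBLIC=true short-circuits every other check
    accepted = ([metrics_token] if metrics_token else []) + list(api_keys)
    if not accepted:
        return 503  # nothing configured: scraping disabled
    return 200 if _matches(token, accepted) else 401


def admin_status(token: str, api_keys: Sequence[str], admin_token: str) -> int:
    """Status the /internal/* gate would produce: 200, 401 or 503."""
    # Unlike /metrics, API keys stop working once ADMIN_TOKEN is set.
    accepted = [admin_token] if admin_token else list(api_keys)
    if not accepted:
        return 503
    return 200 if _matches(token, accepted) else 401
```

The behavioral difference to note is that `/metrics` accepts `METRICS_TOKEN` and `API_KEYS` together, while `/internal/*` accepts `API_KEYS` only as a fallback when `ADMIN_TOKEN` is empty.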
@@ -22,6 +22,8 @@ class Settings:
|
|||||||
port: int
|
port: int
|
||||||
api_keys: list[str]
|
api_keys: list[str]
|
||||||
metrics_token: str
|
metrics_token: str
|
||||||
|
admin_token: str
|
||||||
|
metrics_public: bool
|
||||||
log_level: str
|
log_level: str
|
||||||
gateway_max_in_flight: int
|
gateway_max_in_flight: int
|
||||||
gateway_queue_timeout_sec: float
|
gateway_queue_timeout_sec: float
|
||||||
@@ -151,6 +153,8 @@ def load_settings() -> Settings:
|
|||||||
port=int(os.getenv("PORT", "8317")),
|
port=int(os.getenv("PORT", "8317")),
|
||||||
api_keys=api_keys,
|
api_keys=api_keys,
|
||||||
metrics_token=os.getenv("METRICS_TOKEN", "").strip(),
|
metrics_token=os.getenv("METRICS_TOKEN", "").strip(),
|
||||||
|
admin_token=os.getenv("ADMIN_TOKEN", "").strip(),
|
||||||
|
metrics_public=_bool_env("METRICS_PUBLIC", False),
|
||||||
log_level=os.getenv("LOG_LEVEL", "INFO").strip() or "INFO",
|
log_level=os.getenv("LOG_LEVEL", "INFO").strip() or "INFO",
|
||||||
gateway_max_in_flight=int(os.getenv("GATEWAY_MAX_IN_FLIGHT", "4")),
|
gateway_max_in_flight=int(os.getenv("GATEWAY_MAX_IN_FLIGHT", "4")),
|
||||||
gateway_queue_timeout_sec=float(os.getenv("GATEWAY_QUEUE_TIMEOUT_SEC", "30")),
|
gateway_queue_timeout_sec=float(os.getenv("GATEWAY_QUEUE_TIMEOUT_SEC", "30")),
|
||||||
|
|||||||
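Review note: `load_settings()` reads `METRICS_PUBLIC` through `_bool_env`, whose body is not part of this diff. A minimal sketch of what such a helper might look like follows, assuming the usual truthy spellings. The name `bool_env_sketch` and the accepted values are assumptions, not the project's actual implementation.

```python
import os

# Assumption: these spellings count as "true"; the real helper may differ.
_TRUTHY = {"1", "true", "yes", "on"}


def bool_env_sketch(name: str, default: bool = False) -> bool:
    """Parse an environment variable as a boolean, falling back to `default`."""
    raw = os.getenv(name)
    if raw is None or not raw.strip():
        return default  # unset or blank keeps the safe default
    return raw.strip().lower() in _TRUTHY
```

With a parser of this shape, an empty `METRICS_PUBLIC=` line in `.env` keeps `/metrics` closed, and any unrecognized value is treated as false rather than raising.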
@@ -316,6 +316,11 @@ class LingmaGatewayClient:
|
|||||||
self._ready_event = asyncio.Event()
|
self._ready_event = asyncio.Event()
|
||||||
self._reconnect_task: asyncio.Task | None = None
|
self._reconnect_task: asyncio.Task | None = None
|
||||||
self._last_error: str = ""
|
self._last_error: str = ""
|
||||||
|
# Lingma subprocess handle. Kept so we can reap on shutdown and read
|
||||||
|
# stderr for debugging (pre-v0.4 we forked with DEVNULL + new_session
|
||||||
|
# which orphaned the process and hid crash logs).
|
||||||
|
self._proc: subprocess.Popen | None = None
|
||||||
|
self._stderr_task: asyncio.Task | None = None
|
||||||
|
|
||||||
# ------------------------------------------------------------------ state
|
# ------------------------------------------------------------------ state
|
||||||
|
|
||||||
@@ -359,6 +364,76 @@ class LingmaGatewayClient:
|
|||||||
if self._ws:
|
if self._ws:
|
||||||
with contextlib.suppress(Exception):
|
with contextlib.suppress(Exception):
|
||||||
await self._ws.close()
|
await self._ws.close()
|
||||||
|
await self._terminate_proc()
|
||||||
|
if self._stderr_task and not self._stderr_task.done():
|
||||||
|
self._stderr_task.cancel()
|
||||||
|
with contextlib.suppress(Exception):
|
||||||
|
await self._stderr_task
|
||||||
|
|
||||||
    async def _drain_stderr(self, proc: subprocess.Popen) -> None:
        """Mirror Lingma stderr to the logger at DEBUG level.

        Running in a worker thread (readline is blocking) and dumping lines
        through logger.debug means crashes like native-module load failures
        are visible when LOG_LEVEL=DEBUG but don't spam production logs.
        """
        if proc.stderr is None:
            return

        name = self.name

        def reader() -> None:
            try:
                for line in iter(proc.stderr.readline, b""):
                    if not line:
                        break
                    text = line.decode("utf-8", errors="replace").rstrip()
                    if text:
                        logger.debug("[%s] lingma stderr: %s", name, text)
            except Exception as exc:  # pragma: no cover -- defensive
                logger.debug("[%s] stderr drain aborted: %s", name, exc)

        try:
            await asyncio.to_thread(reader)
        except asyncio.CancelledError:
            pass

    async def _terminate_proc(self) -> None:
        """Reap the Lingma subprocess we spawned.

        SIGTERM first with a short grace period, then SIGKILL. Blocking waits
        are off-loaded to a thread so they don't stall the FastAPI shutdown
        event loop. Idempotent: safe to call even if nothing was spawned.
        """
        proc = self._proc
        if proc is None:
            return
        self._proc = None
        try:
            if proc.poll() is None:
                try:
                    proc.terminate()
                except Exception as exc:
                    logger.warning("[%s] proc.terminate failed: %s", self.name, exc)
                try:
                    await asyncio.wait_for(asyncio.to_thread(proc.wait), timeout=5.0)
                except TIMEOUT_EXCEPTIONS:
                    logger.warning(
                        "[%s] lingma (pid=%s) didn't exit in 5s, sending SIGKILL",
                        self.name,
                        proc.pid,
                    )
                    with contextlib.suppress(Exception):
                        proc.kill()
                    with contextlib.suppress(Exception):
                        await asyncio.wait_for(
                            asyncio.to_thread(proc.wait), timeout=3.0
                        )
        finally:
            # Close stderr pipe so the drain thread can exit cleanly.
            if proc.stderr is not None:
                with contextlib.suppress(Exception):
                    proc.stderr.close()
|
|
||||||
async def ensure_ready(self, timeout: float | None = None) -> None:
|
async def ensure_ready(self, timeout: float | None = None) -> None:
|
||||||
"""Block until the RPC connection is usable, (re)connecting on demand."""
|
"""Block until the RPC connection is usable, (re)connecting on demand."""
|
||||||
@@ -413,12 +488,26 @@ class LingmaGatewayClient:
|
|||||||
self.lingma_bin,
|
self.lingma_bin,
|
||||||
self.work_dir,
|
self.work_dir,
|
||||||
)
|
)
|
||||||
subprocess.Popen(
|
# Reap any old proc from a previous connect attempt before spawning
|
||||||
|
# a fresh one so we never accumulate zombie Lingma instances.
|
||||||
|
await self._terminate_proc()
|
||||||
|
if self._stderr_task and not self._stderr_task.done():
|
||||||
|
self._stderr_task.cancel()
|
||||||
|
with contextlib.suppress(Exception):
|
||||||
|
await self._stderr_task
|
||||||
|
self._stderr_task = None
|
||||||
|
|
||||||
|
self._proc = subprocess.Popen(
|
||||||
[str(self.lingma_bin), "start", "--workDir", str(self.work_dir)],
|
[str(self.lingma_bin), "start", "--workDir", str(self.work_dir)],
|
||||||
cwd=str(self.lingma_bin.parent),
|
cwd=str(self.lingma_bin.parent),
|
||||||
stdout=subprocess.DEVNULL,
|
stdout=subprocess.DEVNULL,
|
||||||
stderr=subprocess.DEVNULL,
|
stderr=subprocess.PIPE,
|
||||||
start_new_session=True,
|
)
|
||||||
|
logger.info(
|
||||||
|
"[%s] lingma spawned (pid=%s)", self.name, self._proc.pid
|
||||||
|
)
|
||||||
|
self._stderr_task = asyncio.create_task(
|
||||||
|
self._drain_stderr(self._proc)
|
||||||
)
|
)
|
||||||
info, _, _ = _wait_info_any(info_paths, timeout_sec=self.startup_timeout)
|
info, _, _ = _wait_info_any(info_paths, timeout_sec=self.startup_timeout)
|
||||||
self.socket_port = info
|
self.socket_port = info
|
||||||
|
|||||||
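Review note: the SIGTERM -> grace -> SIGKILL sequence in `_terminate_proc` can be exercised outside the gateway. The sketch below reproduces the same idea against any `subprocess.Popen` handle, assuming Python 3.9+ for `asyncio.to_thread`. The function name `terminate_with_grace` and the `grace` parameter are hypothetical.

```python
import asyncio
import subprocess
import sys


async def terminate_with_grace(proc: subprocess.Popen, grace: float = 5.0) -> int:
    """Ask the child to exit, wait up to `grace` seconds, then force-kill it."""
    if proc.poll() is None:
        proc.terminate()  # SIGTERM on POSIX
        try:
            # proc.wait blocks, so run it in a worker thread.
            await asyncio.wait_for(asyncio.to_thread(proc.wait), timeout=grace)
        except asyncio.TimeoutError:
            proc.kill()  # SIGKILL on POSIX
            await asyncio.to_thread(proc.wait)
    return proc.returncode


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit code:", asyncio.run(terminate_with_grace(child)))
```

Keeping the `Popen` handle is what makes the final `wait` possible. Without it the parent cannot reap the child, which is the zombie leak the commit message describes.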
@@ -183,16 +183,14 @@ class LingmaPool:
|
|||||||
# -------------------------------------------------------------- lifecycle
|
# -------------------------------------------------------------- lifecycle
|
||||||
|
|
||||||
async def start(self) -> None:
|
async def start(self) -> None:
|
||||||
"""Start all instances sequentially.
|
"""Boot every pool instance in parallel.
|
||||||
|
|
||||||
Sequential startup avoids racing on the shared ~/.lingma/.info file (for
|
Bundle restore is still sequential (cheap, filesystem-level) and logged
|
||||||
pool-mode we skip it anyway, but Lingma may still write there internally)
|
per instance; only the expensive `client.start()` path — which waits on
|
||||||
and keeps docker logs readable. Failures are non-fatal; per-instance
|
the Lingma socket and an LSP initialize round-trip — runs concurrently.
|
||||||
reconnect loops will take over.
|
|
||||||
|
|
||||||
Before spawning each Lingma process we optionally restore a pre-captured
|
Any one instance failing is non-fatal: per-instance reconnect loops
|
||||||
session bundle into the workDir, which lets us skip Playwright login
|
take over once their first `ensure_ready()` fires.
|
||||||
entirely on a fresh volume.
|
|
||||||
"""
|
"""
|
||||||
for inst in self._instances:
|
for inst in self._instances:
|
||||||
self._maybe_apply_session_bundle(inst)
|
self._maybe_apply_session_bundle(inst)
|
||||||
@@ -208,11 +206,18 @@ class LingmaPool:
|
|||||||
),
|
),
|
||||||
is_logged_in_workdir(inst.cfg.work_dir),
|
is_logged_in_workdir(inst.cfg.work_dir),
|
||||||
)
|
)
|
||||||
|
|
||||||
|
async def _start_one(inst: PoolInstance) -> None:
|
||||||
try:
|
try:
|
||||||
await inst.client.start()
|
await inst.client.start()
|
||||||
except Exception as exc:
|
except Exception as exc:
|
||||||
logger.warning("pool start %s failed: %s", inst.name, exc)
|
logger.warning("pool start %s failed: %s", inst.name, exc)
|
||||||
|
|
||||||
|
await asyncio.gather(
|
||||||
|
*(_start_one(inst) for inst in self._instances),
|
||||||
|
return_exceptions=False,
|
||||||
|
)
|
||||||
|
|
||||||
@staticmethod
|
@staticmethod
|
||||||
def _maybe_apply_session_bundle(inst: "PoolInstance") -> None:
|
def _maybe_apply_session_bundle(inst: "PoolInstance") -> None:
|
||||||
"""Restore an exported Lingma session into inst.work_dir, if needed.
|
"""Restore an exported Lingma session into inst.work_dir, if needed.
|
||||||
|
|||||||
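Review note: the boot-time saving from `asyncio.gather` in `LingmaPool.start` can be illustrated with a toy model. In the sketch below, `start_one` and `start_all` are hypothetical stand-ins, and `asyncio.sleep` simulates the wait inside `client.start()`. As in the diff, each task catches its own exception so one failure does not abort the others.

```python
import asyncio
import time


async def start_one(name: str, delay: float, fail: bool = False) -> bool:
    """Simulate one instance booting; failures are logged, not raised."""
    try:
        await asyncio.sleep(delay)  # stands in for the socket / initialize wait
        if fail:
            raise RuntimeError("simulated startup failure")
        return True
    except Exception as exc:
        print(f"pool start {name} failed: {exc}")
        return False


async def start_all(delays: list[float], failing: int = -1) -> tuple[list[bool], float]:
    """Start every simulated instance concurrently and time the whole boot."""
    t0 = time.monotonic()
    results = await asyncio.gather(
        *(start_one(f"inst-{i}", d, fail=(i == failing)) for i, d in enumerate(delays))
    )
    return list(results), time.monotonic() - t0


if __name__ == "__main__":
    ok, elapsed = asyncio.run(start_all([0.2, 0.2, 0.2], failing=1))
    print(ok, f"{elapsed:.2f}s")
```

The waits overlap only if each start path actually yields to the event loop. The `lingma_client` hunk shows `_wait_info_any(...)` called without `await`; if that call blocks, it would need to be off-loaded (for example with `asyncio.to_thread`) for the instances to start in parallel.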
45
app/main.py
@@ -10,7 +10,7 @@ from contextlib import asynccontextmanager
|
|||||||
from fastapi import Depends, FastAPI, HTTPException, Request
|
from fastapi import Depends, FastAPI, HTTPException, Request
|
||||||
from fastapi.responses import JSONResponse, StreamingResponse
|
from fastapi.responses import JSONResponse, StreamingResponse
|
||||||
|
|
||||||
from .auth import require_bearer, require_metrics_access
|
from .auth import require_admin_access, require_bearer, require_metrics_access
|
||||||
from .concurrency import BackpressureRejected, InFlightGuard
|
from .concurrency import BackpressureRejected, InFlightGuard
|
||||||
from .config import Settings, load_settings
|
from .config import Settings, load_settings
|
||||||
from .lingma_pool import LingmaPool, PoolInstance
|
from .lingma_pool import LingmaPool, PoolInstance
|
||||||
@@ -76,6 +76,7 @@ async def lifespan(_app: FastAPI):
|
|||||||
pool.size(),
|
pool.size(),
|
||||||
settings.gateway_max_in_flight,
|
settings.gateway_max_in_flight,
|
||||||
)
|
)
|
||||||
|
_log_auth_posture()
|
||||||
await pool.start()
|
await pool.start()
|
||||||
try:
|
try:
|
||||||
yield
|
yield
|
||||||
@@ -121,7 +122,37 @@ def auth_guard(request: Request):
|
|||||||
|
|
||||||
|
|
||||||
def metrics_auth_guard(request: Request):
|
def metrics_auth_guard(request: Request):
|
||||||
require_metrics_access(request, settings.api_keys, settings.metrics_token)
|
require_metrics_access(
|
||||||
|
request,
|
||||||
|
settings.api_keys,
|
||||||
|
settings.metrics_token,
|
||||||
|
public=settings.metrics_public,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def admin_auth_guard(request: Request):
|
||||||
|
require_admin_access(request, settings.api_keys, settings.admin_token)
|
||||||
|
|
||||||
|
|
||||||
def _log_auth_posture() -> None:
    """Loud warnings on misconfigured auth so ops can't miss them."""
    if not settings.api_keys:
        logger.warning(
            "AUTH DISABLED: API_KEYS is empty, /v1/* is wide open. "
            "Set API_KEYS before exposing this gateway to anything "
            "other than localhost."
        )
    if not settings.admin_token:
        logger.warning(
            "ADMIN_TOKEN not set: /internal/* reuses API_KEYS for auth. "
            "For production set a dedicated ADMIN_TOKEN so rotating chat "
            "keys doesn't require exporting the session bundle."
        )
    if settings.metrics_public:
        logger.warning(
            "METRICS_PUBLIC=true: /metrics is open. Only enable this "
            "when the gateway is behind a private-network scraper."
        )
|
|
||||||
|
|
||||||
@app.get("/healthz")
|
@app.get("/healthz")
|
||||||
@@ -563,7 +594,7 @@ async def v1_chat_completions(req: ChatCompletionsRequest, request: Request):
|
|||||||
ticket.release()
|
ticket.release()
|
||||||
|
|
||||||
|
|
||||||
@app.post("/internal/auto-login/start", dependencies=[Depends(auth_guard)])
|
@app.post("/internal/auto-login/start", dependencies=[Depends(admin_auth_guard)])
|
||||||
async def internal_auto_login_start(instance: str | None = None):
|
async def internal_auto_login_start(instance: str | None = None):
|
||||||
p = _require_pool()
|
p = _require_pool()
|
||||||
target = None
|
target = None
|
||||||
@@ -614,7 +645,7 @@ async def internal_auto_login_start(instance: str | None = None):
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
@app.get("/internal/auto-login/status", dependencies=[Depends(auth_guard)])
|
@app.get("/internal/auto-login/status", dependencies=[Depends(admin_auth_guard)])
|
||||||
async def internal_auto_login_status():
|
async def internal_auto_login_status():
|
||||||
p = _require_pool()
|
p = _require_pool()
|
||||||
out = []
|
out = []
|
||||||
@@ -634,7 +665,7 @@ async def internal_auto_login_status():
|
|||||||
return {"ok": True, "instances": out}
|
return {"ok": True, "instances": out}
|
||||||
|
|
||||||
|
|
||||||
@app.post("/internal/session/export", dependencies=[Depends(auth_guard)])
|
@app.post("/internal/session/export", dependencies=[Depends(admin_auth_guard)])
|
||||||
async def internal_session_export(instance: str | None = None):
|
async def internal_session_export(instance: str | None = None):
|
||||||
"""Export a logged-in Lingma session as a base64 tar.gz bundle.
|
"""Export a logged-in Lingma session as a base64 tar.gz bundle.
|
||||||
|
|
||||||
@@ -693,7 +724,7 @@ async def internal_session_export(instance: str | None = None):
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
@app.get("/internal/models/raw", dependencies=[Depends(auth_guard)])
|
@app.get("/internal/models/raw", dependencies=[Depends(admin_auth_guard)])
|
||||||
async def internal_models_raw(instance: str | None = None):
|
async def internal_models_raw(instance: str | None = None):
|
||||||
"""Return the raw `config/queryModels` response from Lingma.
|
"""Return the raw `config/queryModels` response from Lingma.
|
||||||
|
|
||||||
@@ -723,7 +754,7 @@ async def internal_models_raw(instance: str | None = None):
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
@app.get("/internal/stats", dependencies=[Depends(auth_guard)])
|
@app.get("/internal/stats", dependencies=[Depends(admin_auth_guard)])
|
||||||
async def internal_stats():
|
async def internal_stats():
|
||||||
p = _require_pool()
|
p = _require_pool()
|
||||||
return {
|
return {
|
||||||
|
|||||||