prod hardening: admin/metrics authz split, subprocess lifecycle, parallel pool start, HEALTHCHECK

- authz: new ADMIN_TOKEN gates /internal/*; METRICS_PUBLIC defaults to false,
  so /metrics returns 503 when neither METRICS_TOKEN nor API_KEYS is set
  (this case previously leaked pool topology). Startup logs loudly if API_KEYS
  is empty or admin falls back to chat keys.
- lingma_client: keep a Popen handle instead of orphaning Lingma with
  start_new_session, drain stderr to logger at DEBUG, SIGTERM -> 5s grace ->
  SIGKILL on shutdown. Fixes the zombie-process leak on container reload.
- pool: start the N instances concurrently with asyncio.gather instead of one
  after another; with N=2 this shaves up to ~startup_timeout seconds off boot.
- Dockerfile: HEALTHCHECK hits /healthz and greps for pool_ready>0 so Docker
  / compose orchestrators see "stuck on login" as unhealthy.
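
Illustrative sketch of the SIGTERM -> grace -> SIGKILL sequence described in the lingma_client bullet. This is not the code from this commit: the helper name stop_process and its grace_sec parameter are hypothetical, and only standard-library subprocess calls are used.

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace_sec: float = 5.0) -> int:
    """Stop a child process politely, escalating to a hard kill if needed.

    Hypothetical helper: the name and signature are assumptions for
    illustration, not taken from the commit.
    """
    if proc.poll() is None:  # child is still running
        proc.terminate()  # sends SIGTERM on POSIX
        try:
            proc.wait(timeout=grace_sec)  # grace period for a clean exit
        except subprocess.TimeoutExpired:
            proc.kill()  # sends SIGKILL on POSIX
            proc.wait()  # reap the child so no zombie is left behind
    return proc.returncode


if __name__ == "__main__":
    # Stand-in child that would otherwise sleep for a minute.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    print("exit status:", stop_process(child, grace_sec=2.0))
```

Keeping the Popen handle is what makes the final wait() possible; without it the parent has nothing to reap, which is one way zombie processes accumulate.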

Made-with: Cursor
GitHub Actions
2026-04-18 10:22:13 +08:00
parent 3130533888
commit 2febc37c2c
8 changed files with 248 additions and 28 deletions

@@ -22,6 +22,8 @@ class Settings:
     port: int
     api_keys: list[str]
     metrics_token: str
+    admin_token: str
+    metrics_public: bool
     log_level: str
     gateway_max_in_flight: int
     gateway_queue_timeout_sec: float
@@ -151,6 +153,8 @@ def load_settings() -> Settings:
         port=int(os.getenv("PORT", "8317")),
         api_keys=api_keys,
         metrics_token=os.getenv("METRICS_TOKEN", "").strip(),
+        admin_token=os.getenv("ADMIN_TOKEN", "").strip(),
+        metrics_public=_bool_env("METRICS_PUBLIC", False),
         log_level=os.getenv("LOG_LEVEL", "INFO").strip() or "INFO",
         gateway_max_in_flight=int(os.getenv("GATEWAY_MAX_IN_FLIGHT", "4")),
         gateway_queue_timeout_sec=float(os.getenv("GATEWAY_QUEUE_TIMEOUT_SEC", "30")),
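
A minimal sketch of how the new metrics_token, metrics_public and api_keys settings might combine into a /metrics access decision, following the commit message (503 when nothing is configured). The MetricsAuth class, the metrics_status function, the 401 for a wrong token, and the rule that chat keys are accepted only when no METRICS_TOKEN is set are all assumptions for illustration; the route handler itself is not shown in this diff.

```python
import hmac
from dataclasses import dataclass, field


@dataclass
class MetricsAuth:
    """Hypothetical subset of Settings relevant to /metrics."""

    metrics_token: str = ""
    metrics_public: bool = False
    api_keys: list[str] = field(default_factory=list)


def metrics_status(cfg: MetricsAuth, presented: str) -> int:
    """Return the HTTP status /metrics would answer with (assumed policy)."""
    if cfg.metrics_public:
        return 200  # explicitly opted in to unauthenticated metrics
    # Assumption: a dedicated metrics token wins; otherwise chat keys apply.
    accepted = [cfg.metrics_token] if cfg.metrics_token else list(cfg.api_keys)
    if not accepted:
        return 503  # nothing configured: refuse rather than leak topology
    # Constant-time comparison avoids leaking token contents via timing.
    ok = any(
        hmac.compare_digest(presented.encode(), token.encode())
        for token in accepted
    )
    return 200 if ok else 401


if __name__ == "__main__":
    print(metrics_status(MetricsAuth(), ""))
    print(metrics_status(MetricsAuth(metrics_token="m"), "m"))
```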