prod hardening: admin/metrics authz split, subprocess lifecycle, parallel pool start, HEALTHCHECK

- authz: new ADMIN_TOKEN gates /internal/*; METRICS_PUBLIC=false by default, so /metrics returns 503 when neither METRICS_TOKEN nor API_KEYS is set (previously leaked pool topology). Startup logs loudly if API_KEYS is empty or admin falls back to chat keys.
- lingma_client: keep a Popen handle instead of orphaning Lingma with start_new_session, drain stderr to logger at DEBUG, SIGTERM -> 5s grace -> SIGKILL on shutdown. Fixes the zombie-process leak on container reload.
- pool: asyncio.gather to start N instances concurrently; N=2 pool shaves ~startup_timeout seconds off boot.
- Dockerfile: HEALTHCHECK hits /healthz and greps for pool_ready>0 so Docker / compose orchestrators see "stuck on login" as unhealthy.

Made-with: Cursor
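
For readers unfamiliar with the SIGTERM -> grace -> SIGKILL pattern named in the lingma_client item, here is a minimal sketch. It is not the code from this commit: the function name `stop_process` and the parameter `grace_sec` are hypothetical, and only the 5-second default is taken from the message above.

```python
import subprocess


def stop_process(proc: subprocess.Popen, grace_sec: float = 5.0) -> int:
    """Stop a child process politely, then forcefully. Returns its exit code.

    Hypothetical helper for illustration; the real lingma_client may differ.
    """
    # Already exited: poll() has reaped it, nothing more to do.
    if proc.poll() is not None:
        return proc.returncode

    proc.terminate()  # sends SIGTERM on POSIX
    try:
        # Give the child a chance to shut down cleanly.
        return proc.wait(timeout=grace_sec)
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX
        # wait() reaps the child so it does not linger as a zombie.
        return proc.wait()
```

Keeping the `Popen` handle is what makes the final `wait()` possible: without it the parent has no handle through which to reap the child.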
2026-04-18 10:22:13 +08:00
parent 3130533888
commit 2febc37c2c
8 changed files with 248 additions and 28 deletions
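
The /metrics gating described in the commit message can be pictured roughly as below. This is a sketch under assumptions, not the handler from this commit: the function name `metrics_status` is hypothetical, the 401 for a wrong token is an assumption, and so is the precedence (METRICS_TOKEN used alone when set, otherwise API_KEYS). Only the 503-when-nothing-is-configured behaviour and the METRICS_PUBLIC override come from the message.

```python
import hmac
from typing import Optional


def metrics_status(
    presented: Optional[str],
    *,
    metrics_public: bool,
    metrics_token: str,
    api_keys: list[str],
) -> int:
    """Return the HTTP status /metrics would answer with (illustrative only)."""
    # METRICS_PUBLIC=true opts out of authentication entirely.
    if metrics_public:
        return 200

    # Assumed precedence: a dedicated metrics token wins over chat API keys.
    accepted = [metrics_token] if metrics_token else list(api_keys)

    # Nothing configured: refuse rather than expose pool topology.
    if not accepted:
        return 503

    # Constant-time comparison avoids leaking token contents via timing.
    if presented and any(
        hmac.compare_digest(presented.encode(), key.encode()) for key in accepted
    ):
        return 200
    return 401  # assumed status for a missing or wrong token
```

With the defaults added in the diff below (`metrics_public=False`, empty `metrics_token`) and no API keys, this sketch returns 503 for every request.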
--- a/app/config.py
+++ b/app/config.py
@@ -22,6 +22,8 @@ class Settings:
    port: int
    api_keys: list[str]
    metrics_token: str
+    admin_token: str
+    metrics_public: bool
    log_level: str
    gateway_max_in_flight: int
    gateway_queue_timeout_sec: float
@@ -151,6 +153,8 @@ def load_settings() -> Settings:
        port=int(os.getenv("PORT", "8317")),
        api_keys=api_keys,
        metrics_token=os.getenv("METRICS_TOKEN", "").strip(),
+        admin_token=os.getenv("ADMIN_TOKEN", "").strip(),
+        metrics_public=_bool_env("METRICS_PUBLIC", False),
        log_level=os.getenv("LOG_LEVEL", "INFO").strip() or "INFO",
        gateway_max_in_flight=int(os.getenv("GATEWAY_MAX_IN_FLIGHT", "4")),
        gateway_queue_timeout_sec=float(os.getenv("GATEWAY_QUEUE_TIMEOUT_SEC", "30")),