prod hardening: admin/metrics authz split, subprocess lifecycle, parallel pool start, HEALTHCHECK

- authz: new ADMIN_TOKEN gates /internal/*; METRICS_PUBLIC=false by default,
  so /metrics returns 503 when neither METRICS_TOKEN nor API_KEYS is set
  (previously leaked pool topology). Startup logs loudly if API_KEYS is
  empty or admin falls back to chat keys.
- lingma_client: keep a Popen handle instead of orphaning Lingma with
  start_new_session, drain stderr to logger at DEBUG, SIGTERM -> 5s grace
  -> SIGKILL on shutdown. Fixes the zombie-process leak on container reload.
- pool: asyncio.gather to start N instances concurrently; N=2 pool shaves
  ~startup_timeout seconds off boot.
- Dockerfile: HEALTHCHECK hits /healthz and greps for pool_ready>0 so
  Docker / compose orchestrators see "stuck on login" as unhealthy.

Made-with: Cursor
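
For reviewers: the lingma_client change is not part of the hunks shown here, so the following is only a minimal sketch of the "SIGTERM -> 5s grace -> SIGKILL" sequence named in the message. The helper name `stop_process` and the `grace` parameter are hypothetical, not taken from the repository, and stderr draining is left out.

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace: float = 5.0) -> int:
    """Ask proc to exit with SIGTERM, escalating to SIGKILL after `grace` seconds.

    Hypothetical helper for illustration; the real lingma_client code may differ.
    """
    if proc.poll() is not None:
        return proc.returncode  # already exited, nothing to signal
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()  # reap the child so no zombie is left behind


if __name__ == "__main__":
    # Stand-in child process that would otherwise sleep for a minute.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit code:", stop_process(child))
```

Keeping the `Popen` handle is what makes the final `wait()` possible; a process detached with `start_new_session` and then forgotten cannot be reaped this way.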
@@ -183,16 +183,14 @@ class LingmaPool:
     # -------------------------------------------------------------- lifecycle

     async def start(self) -> None:
-        """Start all instances sequentially.
+        """Boot every pool instance in parallel.

-        Sequential startup avoids racing on the shared ~/.lingma/.info file (for
-        pool-mode we skip it anyway, but Lingma may still write there internally)
-        and keeps docker logs readable. Failures are non-fatal; per-instance
-        reconnect loops will take over.
+        Bundle restore is still sequential (cheap, filesystem-level) and logged
+        per instance; only the expensive `client.start()` path — which waits on
+        the Lingma socket and an LSP initialize round-trip — runs concurrently.

-        Before spawning each Lingma process we optionally restore a pre-captured
-        session bundle into the workDir, which lets us skip Playwright login
-        entirely on a fresh volume.
+        Any one instance failing is non-fatal: per-instance reconnect loops
+        take over once their first `ensure_ready()` fires.
         """
         for inst in self._instances:
             self._maybe_apply_session_bundle(inst)
@@ -208,11 +206,18 @@ class LingmaPool:
                 ),
                 is_logged_in_workdir(inst.cfg.work_dir),
             )
+
+        async def _start_one(inst: PoolInstance) -> None:
             try:
                 await inst.client.start()
             except Exception as exc:
                 logger.warning("pool start %s failed: %s", inst.name, exc)

+        await asyncio.gather(
+            *(_start_one(inst) for inst in self._instances),
+            return_exceptions=False,
+        )
+
     @staticmethod
     def _maybe_apply_session_bundle(inst: "PoolInstance") -> None:
         """Restore an exported Lingma session into inst.work_dir, if needed.
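
The `_start_one` / `asyncio.gather` pattern from the second hunk can be exercised in isolation. The sketch below assumes a stand-in `FakeClient` that sleeps instead of launching Lingma; `FakeClient`, `start_all`, and `demo` are hypothetical names used only for illustration.

```python
import asyncio
import logging
import time

logger = logging.getLogger("pool-sketch")


class FakeClient:
    """Hypothetical stand-in for a pool client; sleeps instead of booting Lingma."""

    def __init__(self, delay: float, fail: bool = False) -> None:
        self.delay = delay
        self.fail = fail
        self.started = False

    async def start(self) -> None:
        await asyncio.sleep(self.delay)
        if self.fail:
            raise RuntimeError("simulated startup failure")
        self.started = True


async def start_all(clients: dict[str, FakeClient]) -> None:
    async def _start_one(name: str, client: FakeClient) -> None:
        try:
            await client.start()
        except Exception as exc:
            # Caught here, so one failing instance never aborts the others.
            logger.warning("pool start %s failed: %s", name, exc)

    # All starts run concurrently; total time is roughly the slowest start.
    await asyncio.gather(*(_start_one(n, c) for n, c in clients.items()))


def demo() -> tuple[float, dict[str, bool]]:
    clients = {
        "a": FakeClient(0.3),
        "b": FakeClient(0.3, fail=True),
        "c": FakeClient(0.3),
    }
    t0 = time.monotonic()
    asyncio.run(start_all(clients))
    return time.monotonic() - t0, {n: c.started for n, c in clients.items()}


if __name__ == "__main__":
    print(demo())
```

Because `_start_one` already catches `Exception`, `return_exceptions=False` in the diff only affects what escapes that handler, such as `asyncio.CancelledError`, which is a `BaseException` on Python 3.8 and later.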