Files
lingma-openai-gateway/app/lingma_client.py
GitHub Actions d209d8ac0b perf: stop blocking on chat/ask RPC timeout (fixes ~30s TTFB)
Lingma streams answers via chat/answer + chat/finish notifications and
never sends a JSON-RPC response for chat/ask. The old code awaited
rpc.request("chat/ask") and swallowed the TimeoutError, so every chat
was forced to wait the full rpc_timeout (default 30s) before draining
the stream queue - even though the first token was already present in
the queue within ~2s.

Effect:
- non-stream TTFB dropped from ~30s to actual upstream latency (~2-3s).
- stream first-chunk dropped from ~30s to upstream first-token latency.
- consume_stream idle timeout decoupled from rpc_timeout so shortening
  rpc_timeout no longer starves long completions.

Switch chat/ask to rpc.notify (fire-and-forget) and rely entirely on the
existing chat/answer + chat/finish handlers for result delivery.

Made-with: Cursor
2026-04-18 07:54:45 +08:00

23 KiB