Commit Graph

10 Commits

Author SHA1 Message Date
GitHub Actions
dfdb7087dc perf: session reuse for multi-turn latency
- Add SessionCache (LRU + TTL, per-API-key scoped) mapping
  conversation-prefix hash -> upstream Lingma sessionId.
- Hash only user/system/developer turns so client-side
  assistant reformatting doesn't invalidate the key.
- On cache hit: reuse sessionId, send only the latest user
  message with isReply=true, and stick the request to the
  instance that originally served it.
- LingmaGatewayClient.chat_complete/chat_stream accept
  session_id/is_reply and report the real finish.sessionId
  via out_meta so we persist what Lingma actually allocated.
- Invalidate cache on non-stream failure; skip writes on
  cancelled/partial streams.
- Expose cache stats in /internal/stats and /metrics.
- Configurable via SESSION_REUSE_ENABLED / SESSION_CACHE_MAX_ENTRIES
  / SESSION_CACHE_TTL_SEC (documented in README + .env.example).

Made-with: Cursor
2026-04-18 08:10:39 +08:00
GitHub Actions
d209d8ac0b perf: stop blocking on chat/ask RPC timeout (fixes ~30s TTFB)
Lingma streams answers via chat/answer + chat/finish notifications and
never sends a JSON-RPC response for chat/ask. The old code awaited
rpc.request("chat/ask") and swallowed the TimeoutError, so every chat
was forced to wait the full rpc_timeout (default 30s) before draining
the stream queue - even though the first token was already present in
the queue within ~2s.

Effect:
- non-stream TTFB dropped from ~30s to actual upstream latency (~2-3s).
- stream first-chunk dropped from ~30s to upstream first-token latency.
- consume_stream idle timeout decoupled from rpc_timeout so shortening
  rpc_timeout no longer starves long completions.

Switch chat/ask to rpc.notify (fire-and-forget) and rely entirely on the
existing chat/answer + chat/finish handlers for result delivery.

Made-with: Cursor
2026-04-18 07:54:45 +08:00
GitHub Actions
707acc9005 feat: M1+M2 gateway hardening and multi-instance pool
Behavior hardening (M1):
- Fix `_chat_streams` memory leak: pop_stream on completion, error, and
  client disconnect.
- Add WebSocket reconnect with state machine (stopped/starting/ready/
  reconnecting/failed/closed) and exponential backoff, so a Lingma
  restart no longer requires restarting the gateway.
- Lazy initialization: startup failure is non-fatal, first real request
  triggers retry, `/healthz` reflects readiness.
- Migrate FastAPI on_event to lifespan.
- Structured JSON logging with request_id ContextVar; `x-request-id`
  propagated to responses.
- SSE now sets `Cache-Control: no-cache`, `X-Accel-Buffering: no` to
  defeat proxy buffering.
- OpenAI schema compatibility: `content` accepts str | list[parts] | None,
  added `developer`/`function` roles, `tools/tool_choice/stream_options/
  user/max_tokens` fields, and `stream_options.include_usage` emits final
  usage chunk.
- `require_bearer` uses `hmac.compare_digest`; `/metrics` now requires
  Bearer when `METRICS_TOKEN` or `API_KEYS` are set.
- Python 3.10/3.11 `TimeoutError` vs `asyncio.TimeoutError` unified.
- Error responses no longer leak `auto_login.status()` details.

Backpressure (M2 / A2):
- New `InFlightGuard` with per-request ticket, queue + rejection
  accounting, `BackpressureRejected` raises 429 + `Retry-After` once
  `GATEWAY_QUEUE_TIMEOUT_SEC` elapses.
- Streaming ticket ownership transfers to the generator so CancelledError
  from client disconnect still releases the slot.
- `/internal/stats.concurrency` and `/metrics` expose in_flight/queued/
  accepted_total/rejected_total/max_in_flight.

Multi-instance pool (M2 / A1 + B3):
- New `LingmaPool` with N processes, each with its own workDir, socket
  port (dynamic when N>1), and `AutoLoginManager`.
- Account parser supports CSV (`u1:p1,u2:p2`) and JSON formats via
  `LINGMA_ACCOUNTS`; falls back to `LINGMA_USERNAME/LINGMA_PASSWORD` for
  backwards compatibility (N=1 keeps legacy paths/ports).
- Routing: sticky affinity by `user` / system-prompt hash, then
  least-in-flight, finally round-robin fallback for unhealthy pool.
- `/healthz` reports per-instance state and ready count.
- `/internal/stats.pool` and `/metrics` expose per-instance
  `gateway_pool_instance_in_flight{name}` / `gateway_pool_instance_ready{name}`.
- `/internal/auto-login/start?instance=inst-N` targets a specific instance;
  `/internal/auto-login/status` lists all instances.

Compat notes:
- `.env.example` adds `METRICS_TOKEN`, `LOG_LEVEL`, `GATEWAY_MAX_IN_FLIGHT`,
  `GATEWAY_QUEUE_TIMEOUT_SEC`, `LINGMA_ACCOUNTS`, `LINGMA_INSTANCE_COUNT`.
- `.gitignore` cleaned up data/ duplication.
- Existing single-instance deployments keep working without config change.

Made-with: Cursor
2026-04-18 07:40:32 +08:00
root
c1e261aa14 refactor: move runtime state under project data directory
Some checks failed
CI / lint-and-compile (push) Has been cancelled
CI / lint-and-compile (pull_request) Has been cancelled
2026-04-17 15:57:51 +08:00
root
e41ee8bcc8 fix: treat 200 login API response as provisional success
Some checks failed
CI / lint-and-compile (push) Has been cancelled
CI / lint-and-compile (pull_request) Has been cancelled
2026-04-17 15:43:57 +08:00
root
d12668201f fix: capture login API via response event listener
Some checks failed
CI / lint-and-compile (push) Has been cancelled
CI / lint-and-compile (pull_request) Has been cancelled
2026-04-17 15:40:25 +08:00
root
0c9fdd53c9 fix: verify /users/ajax/login success in auto login flow
Some checks failed
CI / lint-and-compile (push) Has been cancelled
CI / lint-and-compile (pull_request) Has been cancelled
2026-04-17 15:31:02 +08:00
root
5f0c1866a6 fix: harden auto-login selectors and poll auth status
Some checks failed
CI / lint-and-compile (push) Has been cancelled
CI / lint-and-compile (pull_request) Has been cancelled
2026-04-17 14:40:28 +08:00
root
b621c4aca7 feat: bootstrap Lingma from latest marketplace VSIX
Some checks failed
CI / lint-and-compile (push) Has been cancelled
CI / lint-and-compile (pull_request) Has been cancelled
2026-04-17 10:44:37 +08:00
root
5526779e98 chore: initialize clean history without secrets
Some checks failed
CI / lint-and-compile (push) Has been cancelled
2026-04-17 09:56:08 +08:00