feat: harden cache reuse semantics and expand protocol regressions

Stabilize cross-protocol ask-mode/streaming behavior and reduce session-reuse branch collisions, then add focused docs/tests for multimodal normalization and pool/stats/config paths. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-20 14:26:11 +08:00
parent b96b91e5b7
commit 12a4d9584e
9 changed files with 441 additions and 55 deletions
--- a/DESIGN.md
+++ b/DESIGN.md
@@ -518,7 +518,7 @@ FastAPI `lifespan` 退出 → `pool.close()` → 每个 `client.close()` → 进
 ### 5.3 session cache 只哈希 user/system/developer 消息

 - **问题**：OpenAI 客户端常常会规范化 / 裁剪 assistant 消息（例如 trim 末尾空白、去掉思考内容），导致下一轮的 `messages[:-1]` 跟上一轮的 `messages` 不完全字节相等。
- **方案**：`hash_user_context` 只对 `system / user / developer` 三种 role 做 SHA1；assistant/tool 不参与。只要**用户输入路径**稳定，哈希就稳定。
+- **方案**：`hash_user_context` 只对 `system / user / developer` 三种 role 做 SHA1；assistant/tool 不参与。只要**用户输入路径**稳定，哈希就稳定。多模态会先在归一化阶段降级为占位符（如 `[image]` / `[audio]`）再参与哈希，因此会保留“模态存在”信号但不保留原始媒体内容。
 - **权衡**：理论上客户端篡改 assistant 语义（比如把模型的回答改成相反的）时，cache 依然命中，但 Lingma 侧自己持有 session 原版历史，下一轮还是按原版继续。对用户意图的偏离不可见。这是 OK 的——客户端本来就不该篡改 assistant 内容。

 ### 5.4 session cache 写入用 `write_key = hash(messages)`，查询用 `lookup_key = hash(messages[:-1])`