docs(tool-emulation): 添加工具调用模拟实现清单与方法论文档

- 创建英文版工具模拟实现清单，涵盖13个核心实现面 - 添加中文版工具模拟实现清单，详细说明各项验收标准 - 编写英文版工具模拟方法论文档，阐述核心实现模式 - 补充中文版方法论文档，包括多轮调用与重试策略指导 - 实现HTTP API服务器测试，验证工具历史保持功能 - 新增工具模拟核心模块，包含工具定义提取与注入功能 - 添加拒绝检测、动作块解析等关键工具模拟组件
2026-03-30 15:35:23 +08:00
parent eded45eb5d
commit df69105329
6 changed files with 1706 additions and 0 deletions
--- a/docs/tool-emulation-methodology.md
+++ b/docs/tool-emulation-methodology.md
@@ -0,0 +1,131 @@
+# Methodology: Simulating Tool Calls over a Plain Chat API
+
+This document describes a practical pattern for supporting tool calling when the upstream model only exposes a plain chat API.
+
+The core idea is:
+
+1. Convert downstream tool definitions into a prompt-level contract.
+2. Ask the model to emit structured action text.
+3. Parse that action text in the proxy.
+4. Re-encode it back into standard protocol fields such as OpenAI `tool_calls` or Anthropic `tool_use`.
+
+## Core Pattern
+
+When the upstream model does not support native tool calls, do not rely on blindly forwarding `tools`.
+
+Instead:
+
+- treat the model as a text generator
+- define a stable action DSL
+- keep the proxy responsible for state, retries, parsing, and protocol mapping
+
+In this project the action DSL is a fenced block:
+
+```text
+```json action
+{"tool":"NAME","parameters":{"key":"value"}}
+```
+```
+
+## What the Proxy Must Do
+
+The proxy is not a passive transport anymore. Once tool emulation is enabled, it should:
+
+- inject tool definitions into the prompt
+- preserve tool history across turns
+- project historical tool calls back into action text
+- wrap tool results into a continuation prompt
+- detect refusal patterns such as “I don't have tools”
+- retry with a stronger instruction when a tool call was expected but missing
+- map parsed actions back into downstream protocol fields
+
+## Multi-turn Tool Calling
+
+Single-turn emulation is not enough. A useful agent loop looks like this:
+
+1. model emits a tool call
+2. external executor runs the tool
+3. tool result is fed back into the conversation
+4. model decides whether to call another tool or finish
+
+To make this stable:
+
+- do not feed tool results back as raw text only
+- wrap them in a continuation message that clearly asks for the next action
+- keep emulation active even when later turns do not repeat the original `tools` field
+
+That last point matters. Many clients send `tools` only on the first turn. The proxy should still keep the conversation in emulation mode when it sees tool history.
+
+## Few-shot Guidance
+
+The minimum few-shot should teach the model the output shape.
+
+A better few-shot also teaches state transitions:
+
+- when to call a tool
+- when to wait for the tool result
+- when to call another tool
+- when to answer normally
+
+For complex agent loops, a multi-step example with:
+
+- user request
+- assistant tool call
+- tool result
+- assistant next action
+
+is usually more effective than a single static action example.
+
+## Retry Guidance
+
+Retry is useful when:
+
+- a tool call was expected but no action block was produced
+- the model says tools are unavailable
+- the request forces tool usage
+
+A retry prompt should be explicit and procedural, for example:
+
+```text
+Your last response did not include any ```json action``` block.
+You must respond with at least one valid action block now.
+Do not explain. Output the action block directly.
+```
+
+Retries should be bounded. A small retry budget plus stronger instructions per retry is usually enough.
+
+## Protocol Mapping
+
+OpenAI side:
+
+- input may contain `tools`, `tool_choice`, `assistant.tool_calls`, and `tool`
+- output should map back into `message.tool_calls` and `finish_reason = "tool_calls"`
+
+Anthropic side:
+
+- input may contain `tools`, `tool_choice`, `tool_use`, and `tool_result`
+- output should map back into `content[].tool_use` and `stop_reason = "tool_use"`
+
+## Common Failure Modes
+
+- only supporting the first tool turn
+- losing emulation state on later turns
+- not projecting historical tool calls back into text
+- feeding back raw tool results without continuation instructions
+- missing refusal detection
+- using a parser that is too brittle for real model output
+
+## In This Repository
+
+The implementation here follows exactly this pattern:
+
+- downstream tool schemas are rewritten into prompt instructions
+- the model emits `json action` blocks
+- the proxy parses them
+- the proxy re-encodes them as OpenAI or Anthropic tool protocol fields
+- later turns can continue from tool history even when `tools` are not repeated
+
+Implementation checklist:
+
+- [tool-emulation-checklist.md](./tool-emulation-checklist.md)
+