Files
lingma-proxy-compose/docs/tool-emulation-checklist.md
coolxll df69105329 docs(tool-emulation): 添加工具调用模拟实现清单与方法论文档
- 创建英文版工具模拟实现清单,涵盖13个核心实现面
- 添加中文版工具模拟实现清单,详细说明各项验收标准
- 编写英文版工具模拟方法论文档,阐述核心实现模式
- 补充中文版方法论文档,包括多轮调用与重试策略指导
- 实现HTTP API服务器测试,验证工具历史保持功能
- 新增工具模拟核心模块,包含工具定义提取与注入功能
- 添加拒绝检测、动作块解析等关键工具模拟组件
2026-03-30 15:35:23 +08:00

195 lines
4.9 KiB
Markdown

# Tool Emulation Checklist
This checklist is for implementation work.
It is not meant to explain the theory again. It breaks plain-chat tool emulation into concrete surfaces that can be implemented and validated incrementally.
## 1. Prompt Contract
- tell the model that tools are available
- list tool names, short descriptions, and schema summaries
- define a fixed action format
- define multi-turn rules
- encode `tool_choice` constraints
- include at least one valid action example
- ideally include one example where a tool result arrives and the model decides what to do next
Acceptance:
- the first turn reliably emits a valid action block
- later turns do not collapse into plain explanation after a tool result
## 2. Request Normalization
- OpenAI:
- parse `tools`
- parse `tool_choice`
- parse `assistant.tool_calls`
- parse `tool`
- Anthropic:
- parse `tools`
- parse `tool_choice`
- parse `tool_use`
- parse `tool_result`
- normalize everything into one internal structure
- detect tool history even when the current turn does not repeat `tools`
Acceptance:
- emulation stays active on later turns without repeated tool definitions
## 3. Tool History Projection
- project historical assistant tool calls back into action text
- do not pass downstream protocol-specific history directly to the upstream model
- preserve tool name, arguments, and call id where useful
Acceptance:
- the model can “see” its own previous actions in later turns
## 4. Tool Result Continuation
- do not feed raw tool output back without framing
- wrap tool results into an explicit continuation message
- handle empty, partial, and error outputs consistently
Acceptance:
- after a tool result, the model can either call another tool or finish naturally
## 5. Parser Contract
- recognize both ` ```json action ` and plain ` ```json `
- tolerate smart quotes, trailing commas, and stringified argument JSON
- extract `tool`, `name`, `parameters`, `arguments`, or `input`
- support multiple blocks in one reply
- strip action blocks from normal assistant text
Acceptance:
- multiple action blocks can be parsed reliably
## 6. Retry Policy
- trigger when:
- a tool call was expected but no action block was produced
- refusal language is detected
- `tool_choice=any`
- `tool_choice=tool`
- retry with a stricter message
- bound retry count
- log retry reason
Acceptance:
- refusal-style replies can be corrected without infinite loops
## 7. Refusal Detection
- maintain a refusal phrase set
- detect both hard refusals and soft “environment limitation” answers
- distinguish between:
- a legitimate no-tool answer
- a failed tool-use turn
Acceptance:
- common “tools are unavailable” replies trigger retry when appropriate
## 8. Response Re-encoding
- OpenAI:
- emit `message.tool_calls`
- set `finish_reason = tool_calls`
- Anthropic:
- emit `content[].tool_use`
- set `stop_reason = tool_use`
- preserve normal text when no tool call is present
Acceptance:
- downstream clients remain unaware that the upstream lacks native tools
## 9. Streaming Strategy
- OpenAI:
- role chunk
- text deltas
- tool call deltas
- Anthropic:
- `message_start`
- `content_block_start`
- `content_block_delta`
- `content_block_stop`
- `message_delta`
- `message_stop`
- document clearly when streaming is synthesized from a completed non-stream result
Acceptance:
- downstream stream consumers receive protocol-valid event sequences
## 10. Multi-turn State Machine
- distinguish at least:
- first decision
- tool call emitted
- waiting for tool result
- tool result received, next decision pending
- final answer
- derive state from message history, not only the current payload
- do not confuse “tool history exists” with “another tool call is mandatory”
Acceptance:
- agent loops remain stable across more than one turn
## 11. Observability
- log:
- whether emulation is active
- how many tool calls were parsed
- whether retry fired
- which refusal signal matched
- ideally log whether:
- the prompt contract was injected
- tool history was detected
Acceptance:
- failures can be localized to prompt, parser, retry, or state management
## 12. Test Matrix
- OpenAI:
- single-turn tool call
- multi-turn tool result continuation
- later turn without repeated `tools`
- forced tool
- `tool_choice=any`
- Anthropic:
- single-turn `tool_use`
- multi-turn `tool_result` continuation
- later turn without repeated `tools`
- streaming `tool_use`
- error cases:
- refusal
- invalid JSON
- multiple action blocks
- plain-text final answer
Acceptance:
- both “first tool turn” and “second-turn continuation” are covered
## 13. Recommended Next Priorities
If the system already works, the highest-value next improvements are:
1. stronger few-shot for “tool result arrives, then call another tool”
2. better history-aware retry policy
3. finer refusal categories
4. stronger parser tolerance
5. richer streaming behavior