Files

coolxll df69105329 docs(tool-emulation): 添加工具调用模拟实现清单与方法论文档

- 创建英文版工具模拟实现清单，涵盖13个核心实现面
- 添加中文版工具模拟实现清单，详细说明各项验收标准
- 编写英文版工具模拟方法论文档，阐述核心实现模式
- 补充中文版方法论文档，包括多轮调用与重试策略指导
- 实现HTTP API服务器测试，验证工具历史保持功能
- 新增工具模拟核心模块，包含工具定义提取与注入功能
- 添加拒绝检测、动作块解析等关键工具模拟组件

2026-03-30 15:35:23 +08:00

4.9 KiB

Raw Blame History

Tool Emulation Checklist

This checklist is for implementation work.

It is not meant to explain the theory again. It breaks plain-chat tool emulation into concrete surfaces that can be implemented and validated incrementally.

1. Prompt Contract

tell the model that tools are available
list tool names, short descriptions, and schema summaries
define a fixed action format
define multi-turn rules
encode tool_choice constraints
include at least one valid action example
ideally include one example where a tool result arrives and the model decides what to do next

Acceptance:

the first turn reliably emits a valid action block
later turns do not collapse into plain explanation after a tool result

2. Request Normalization

OpenAI:
- parse tools
- parse tool_choice
- parse assistant.tool_calls
- parse tool
Anthropic:
- parse tools
- parse tool_choice
- parse tool_use
- parse tool_result
normalize everything into one internal structure
detect tool history even when the current turn does not repeat tools

Acceptance:

emulation stays active on later turns without repeated tool definitions

3. Tool History Projection

project historical assistant tool calls back into action text
do not pass downstream protocol-specific history directly to the upstream model
preserve tool name, arguments, and call id where useful

Acceptance:

the model can “see” its own previous actions in later turns

4. Tool Result Continuation

do not feed raw tool output back without framing
wrap tool results into an explicit continuation message
handle empty, partial, and error outputs consistently

Acceptance:

after a tool result, the model can either call another tool or finish naturally

5. Parser Contract

recognize both ```json action and plain ```json
tolerate smart quotes, trailing commas, and stringified argument JSON
extract tool, name, parameters, arguments, or input
support multiple blocks in one reply
strip action blocks from normal assistant text

Acceptance:

multiple action blocks can be parsed reliably

6. Retry Policy

trigger when:
- a tool call was expected but no action block was produced
- refusal language is detected
- tool_choice=any
- tool_choice=tool
retry with a stricter message
bound retry count
log retry reason

Acceptance:

refusal-style replies can be corrected without infinite loops

7. Refusal Detection

maintain a refusal phrase set
detect both hard refusals and soft “environment limitation” answers
distinguish between:
- a legitimate no-tool answer
- a failed tool-use turn

Acceptance:

common “tools are unavailable” replies trigger retry when appropriate

8. Response Re-encoding

OpenAI:
- emit message.tool_calls
- set finish_reason = tool_calls
Anthropic:
- emit content[].tool_use
- set stop_reason = tool_use
preserve normal text when no tool call is present

Acceptance:

downstream clients remain unaware that the upstream lacks native tools

9. Streaming Strategy

OpenAI:
- role chunk
- text deltas
- tool call deltas
Anthropic:
- message_start
- content_block_start
- content_block_delta
- content_block_stop
- message_delta
- message_stop
document clearly when streaming is synthesized from a completed non-stream result

Acceptance:

downstream stream consumers receive protocol-valid event sequences

10. Multi-turn State Machine

distinguish at least:
- first decision
- tool call emitted
- waiting for tool result
- tool result received, next decision pending
- final answer
derive state from message history, not only the current payload
do not confuse “tool history exists” with “another tool call is mandatory”

Acceptance:

agent loops remain stable across more than one turn

11. Observability

log:
- whether emulation is active
- how many tool calls were parsed
- whether retry fired
- which refusal signal matched
ideally log whether:
- the prompt contract was injected
- tool history was detected

Acceptance:

failures can be localized to prompt, parser, retry, or state management

12. Test Matrix

OpenAI:
- single-turn tool call
- multi-turn tool result continuation
- later turn without repeated tools
- forced tool
- tool_choice=any
Anthropic:
- single-turn tool_use
- multi-turn tool_result continuation
- later turn without repeated tools
- streaming tool_use
error cases:
- refusal
- invalid JSON
- multiple action blocks
- plain-text final answer

Acceptance:

both “first tool turn” and “second-turn continuation” are covered

13. Recommended Next Priorities

If the system already works, the highest-value next improvements are:

stronger few-shot for “tool result arrives, then call another tool”
better history-aware retry policy
finer refusal categories
stronger parser tolerance
richer streaming behavior

4.9 KiB Raw Blame History

Tool Emulation Checklist

1. Prompt Contract

2. Request Normalization

3. Tool History Projection

4. Tool Result Continuation

5. Parser Contract

6. Retry Policy

7. Refusal Detection

8. Response Re-encoding

9. Streaming Strategy

10. Multi-turn State Machine

11. Observability

12. Test Matrix

13. Recommended Next Priorities

4.9 KiB

Raw Blame History