Files
lingma-proxy-compose/docs/tool-emulation-checklist.md
coolxll df69105329 docs(tool-emulation): 添加工具调用模拟实现清单与方法论文档
- 创建英文版工具模拟实现清单,涵盖13个核心实现面
- 添加中文版工具模拟实现清单,详细说明各项验收标准
- 编写英文版工具模拟方法论文档,阐述核心实现模式
- 补充中文版方法论文档,包括多轮调用与重试策略指导
- 实现HTTP API服务器测试,验证工具历史保持功能
- 新增工具模拟核心模块,包含工具定义提取与注入功能
- 添加拒绝检测、动作块解析等关键工具模拟组件
2026-03-30 15:35:23 +08:00

4.9 KiB

Tool Emulation Checklist

This checklist is for implementation work.

It is not meant to explain the theory again. It breaks plain-chat tool emulation into concrete surfaces that can be implemented and validated incrementally.

1. Prompt Contract

  • tell the model that tools are available
  • list tool names, short descriptions, and schema summaries
  • define a fixed action format
  • define multi-turn rules
  • encode tool_choice constraints
  • include at least one valid action example
  • ideally include one example where a tool result arrives and the model decides what to do next

Acceptance:

  • the first turn reliably emits a valid action block
  • later turns do not collapse into plain explanation after a tool result

2. Request Normalization

  • OpenAI:
    • parse tools
    • parse tool_choice
    • parse assistant.tool_calls
    • parse tool
  • Anthropic:
    • parse tools
    • parse tool_choice
    • parse tool_use
    • parse tool_result
  • normalize everything into one internal structure
  • detect tool history even when the current turn does not repeat tools

Acceptance:

  • emulation stays active on later turns without repeated tool definitions

3. Tool History Projection

  • project historical assistant tool calls back into action text
  • do not pass downstream protocol-specific history directly to the upstream model
  • preserve tool name, arguments, and call id where useful

Acceptance:

  • the model can “see” its own previous actions in later turns

4. Tool Result Continuation

  • do not feed raw tool output back without framing
  • wrap tool results into an explicit continuation message
  • handle empty, partial, and error outputs consistently

Acceptance:

  • after a tool result, the model can either call another tool or finish naturally

5. Parser Contract

  • recognize both ```json action and plain ```json
  • tolerate smart quotes, trailing commas, and stringified argument JSON
  • extract tool, name, parameters, arguments, or input
  • support multiple blocks in one reply
  • strip action blocks from normal assistant text

Acceptance:

  • multiple action blocks can be parsed reliably

6. Retry Policy

  • trigger when:
    • a tool call was expected but no action block was produced
    • refusal language is detected
    • tool_choice=any
    • tool_choice=tool
  • retry with a stricter message
  • bound retry count
  • log retry reason

Acceptance:

  • refusal-style replies can be corrected without infinite loops

7. Refusal Detection

  • maintain a refusal phrase set
  • detect both hard refusals and soft “environment limitation” answers
  • distinguish between:
    • a legitimate no-tool answer
    • a failed tool-use turn

Acceptance:

  • common “tools are unavailable” replies trigger retry when appropriate

8. Response Re-encoding

  • OpenAI:
    • emit message.tool_calls
    • set finish_reason = tool_calls
  • Anthropic:
    • emit content[].tool_use
    • set stop_reason = tool_use
  • preserve normal text when no tool call is present

Acceptance:

  • downstream clients remain unaware that the upstream lacks native tools

9. Streaming Strategy

  • OpenAI:
    • role chunk
    • text deltas
    • tool call deltas
  • Anthropic:
    • message_start
    • content_block_start
    • content_block_delta
    • content_block_stop
    • message_delta
    • message_stop
  • document clearly when streaming is synthesized from a completed non-stream result

Acceptance:

  • downstream stream consumers receive protocol-valid event sequences

10. Multi-turn State Machine

  • distinguish at least:
    • first decision
    • tool call emitted
    • waiting for tool result
    • tool result received, next decision pending
    • final answer
  • derive state from message history, not only the current payload
  • do not confuse “tool history exists” with “another tool call is mandatory”

Acceptance:

  • agent loops remain stable across more than one turn

11. Observability

  • log:
    • whether emulation is active
    • how many tool calls were parsed
    • whether retry fired
    • which refusal signal matched
  • ideally log whether:
    • the prompt contract was injected
    • tool history was detected

Acceptance:

  • failures can be localized to prompt, parser, retry, or state management

12. Test Matrix

  • OpenAI:
    • single-turn tool call
    • multi-turn tool result continuation
    • later turn without repeated tools
    • forced tool
    • tool_choice=any
  • Anthropic:
    • single-turn tool_use
    • multi-turn tool_result continuation
    • later turn without repeated tools
    • streaming tool_use
  • error cases:
    • refusal
    • invalid JSON
    • multiple action blocks
    • plain-text final answer

Acceptance:

  • both “first tool turn” and “second-turn continuation” are covered

If the system already works, the highest-value next improvements are:

  1. stronger few-shot for “tool result arrives, then call another tool”
  2. better history-aware retry policy
  3. finer refusal categories
  4. stronger parser tolerance
  5. richer streaming behavior