docs(tool-emulation): 添加工具调用模拟实现清单与方法论文档
- 创建英文版工具模拟实现清单,涵盖13个核心实现面 - 添加中文版工具模拟实现清单,详细说明各项验收标准 - 编写英文版工具模拟方法论文档,阐述核心实现模式 - 补充中文版方法论文档,包括多轮调用与重试策略指导 - 实现HTTP API服务器测试,验证工具历史保持功能 - 新增工具模拟核心模块,包含工具定义提取与注入功能 - 添加拒绝检测、动作块解析等关键工具模拟组件
This commit is contained in:
194
docs/tool-emulation-checklist.md
Normal file
194
docs/tool-emulation-checklist.md
Normal file
@@ -0,0 +1,194 @@
|
||||
# Tool Emulation Checklist
|
||||
|
||||
This checklist is for implementation work.
|
||||
|
||||
It is not meant to explain the theory again. It breaks plain-chat tool emulation into concrete surfaces that can be implemented and validated incrementally.
|
||||
|
||||
## 1. Prompt Contract
|
||||
|
||||
- tell the model that tools are available
|
||||
- list tool names, short descriptions, and schema summaries
|
||||
- define a fixed action format
|
||||
- define multi-turn rules
|
||||
- encode `tool_choice` constraints
|
||||
- include at least one valid action example
|
||||
- ideally include one example where a tool result arrives and the model decides what to do next
|
||||
|
||||
Acceptance:
|
||||
|
||||
- the first turn reliably emits a valid action block
|
||||
- later turns do not collapse into plain explanation after a tool result
|
||||
|
||||
## 2. Request Normalization
|
||||
|
||||
- OpenAI:
|
||||
- parse `tools`
|
||||
- parse `tool_choice`
|
||||
- parse `assistant.tool_calls`
|
||||
- parse `tool`
|
||||
- Anthropic:
|
||||
- parse `tools`
|
||||
- parse `tool_choice`
|
||||
- parse `tool_use`
|
||||
- parse `tool_result`
|
||||
- normalize everything into one internal structure
|
||||
- detect tool history even when the current turn does not repeat `tools`
|
||||
|
||||
Acceptance:
|
||||
|
||||
- emulation stays active on later turns without repeated tool definitions
|
||||
|
||||
## 3. Tool History Projection
|
||||
|
||||
- project historical assistant tool calls back into action text
|
||||
- do not pass downstream protocol-specific history directly to the upstream model
|
||||
- preserve tool name, arguments, and call id where useful
|
||||
|
||||
Acceptance:
|
||||
|
||||
- the model can “see” its own previous actions in later turns
|
||||
|
||||
## 4. Tool Result Continuation
|
||||
|
||||
- do not feed raw tool output back without framing
|
||||
- wrap tool results into an explicit continuation message
|
||||
- handle empty, partial, and error outputs consistently
|
||||
|
||||
Acceptance:
|
||||
|
||||
- after a tool result, the model can either call another tool or finish naturally
|
||||
|
||||
## 5. Parser Contract
|
||||
|
||||
- recognize both ` ```json action ` and plain ` ```json `
|
||||
- tolerate smart quotes, trailing commas, and stringified argument JSON
|
||||
- extract `tool`, `name`, `parameters`, `arguments`, or `input`
|
||||
- support multiple blocks in one reply
|
||||
- strip action blocks from normal assistant text
|
||||
|
||||
Acceptance:
|
||||
|
||||
- multiple action blocks can be parsed reliably
|
||||
|
||||
## 6. Retry Policy
|
||||
|
||||
- trigger when:
|
||||
- a tool call was expected but no action block was produced
|
||||
- refusal language is detected
|
||||
- `tool_choice=any`
|
||||
- `tool_choice=tool`
|
||||
- retry with a stricter message
|
||||
- bound retry count
|
||||
- log retry reason
|
||||
|
||||
Acceptance:
|
||||
|
||||
- refusal-style replies can be corrected without infinite loops
|
||||
|
||||
## 7. Refusal Detection
|
||||
|
||||
- maintain a refusal phrase set
|
||||
- detect both hard refusals and soft “environment limitation” answers
|
||||
- distinguish between:
|
||||
- a legitimate no-tool answer
|
||||
- a failed tool-use turn
|
||||
|
||||
Acceptance:
|
||||
|
||||
- common “tools are unavailable” replies trigger retry when appropriate
|
||||
|
||||
## 8. Response Re-encoding
|
||||
|
||||
- OpenAI:
|
||||
- emit `message.tool_calls`
|
||||
- set `finish_reason = tool_calls`
|
||||
- Anthropic:
|
||||
- emit `content[].tool_use`
|
||||
- set `stop_reason = tool_use`
|
||||
- preserve normal text when no tool call is present
|
||||
|
||||
Acceptance:
|
||||
|
||||
- downstream clients remain unaware that the upstream lacks native tools
|
||||
|
||||
## 9. Streaming Strategy
|
||||
|
||||
- OpenAI:
|
||||
- role chunk
|
||||
- text deltas
|
||||
- tool call deltas
|
||||
- Anthropic:
|
||||
- `message_start`
|
||||
- `content_block_start`
|
||||
- `content_block_delta`
|
||||
- `content_block_stop`
|
||||
- `message_delta`
|
||||
- `message_stop`
|
||||
- document clearly when streaming is synthesized from a completed non-stream result
|
||||
|
||||
Acceptance:
|
||||
|
||||
- downstream stream consumers receive protocol-valid event sequences
|
||||
|
||||
## 10. Multi-turn State Machine
|
||||
|
||||
- distinguish at least:
|
||||
- first decision
|
||||
- tool call emitted
|
||||
- waiting for tool result
|
||||
- tool result received, next decision pending
|
||||
- final answer
|
||||
- derive state from message history, not only the current payload
|
||||
- do not confuse “tool history exists” with “another tool call is mandatory”
|
||||
|
||||
Acceptance:
|
||||
|
||||
- agent loops remain stable across more than one turn
|
||||
|
||||
## 11. Observability
|
||||
|
||||
- log:
|
||||
- whether emulation is active
|
||||
- how many tool calls were parsed
|
||||
- whether retry fired
|
||||
- which refusal signal matched
|
||||
- ideally log whether:
|
||||
- the prompt contract was injected
|
||||
- tool history was detected
|
||||
|
||||
Acceptance:
|
||||
|
||||
- failures can be localized to prompt, parser, retry, or state management
|
||||
|
||||
## 12. Test Matrix
|
||||
|
||||
- OpenAI:
|
||||
- single-turn tool call
|
||||
- multi-turn tool result continuation
|
||||
- later turn without repeated `tools`
|
||||
- forced tool
|
||||
- `tool_choice=any`
|
||||
- Anthropic:
|
||||
- single-turn `tool_use`
|
||||
- multi-turn `tool_result` continuation
|
||||
- later turn without repeated `tools`
|
||||
- streaming `tool_use`
|
||||
- error cases:
|
||||
- refusal
|
||||
- invalid JSON
|
||||
- multiple action blocks
|
||||
- plain-text final answer
|
||||
|
||||
Acceptance:
|
||||
|
||||
- both “first tool turn” and “second-turn continuation” are covered
|
||||
|
||||
## 13. Recommended Next Priorities
|
||||
|
||||
If the system already works, the highest-value next improvements are:
|
||||
|
||||
1. stronger few-shot for “tool result arrives, then call another tool”
|
||||
2. better history-aware retry policy
|
||||
3. finer refusal categories
|
||||
4. stronger parser tolerance
|
||||
5. richer streaming behavior
|
||||
241
docs/tool-emulation-checklist.zh-CN.md
Normal file
241
docs/tool-emulation-checklist.zh-CN.md
Normal file
@@ -0,0 +1,241 @@
|
||||
# Tool Emulation 实现清单
|
||||
|
||||
这份清单是给后续迭代用的。
|
||||
|
||||
目标不是解释原理,而是把“纯聊天 API 模拟 tools 调用”拆成可逐项完成、可逐项验证的实现面。
|
||||
|
||||
## 1. Prompt Contract
|
||||
|
||||
- 明确告诉模型当前有可用工具,不要声称“工具不可用”
|
||||
- 列出全部工具:
|
||||
- 名称
|
||||
- 简短描述
|
||||
- 参数 schema 摘要
|
||||
- 固定动作输出格式:
|
||||
- ` ```json action ... ``` `
|
||||
- 明确多轮规则:
|
||||
- 独立动作可并行
|
||||
- 依赖动作要等 tool result
|
||||
- 无需工具时才输出普通文本
|
||||
- 明确 `tool_choice` 约束:
|
||||
- `any`
|
||||
- 指定 tool
|
||||
- 给至少一个合法 action block 示例
|
||||
- 最好再给一个“tool result 回来后继续决策”的 few-shot
|
||||
|
||||
验收标准:
|
||||
|
||||
- 模型第一轮能稳定输出合法 action block
|
||||
- 第二轮收到 tool result 后,不会轻易掉回普通解释文本
|
||||
|
||||
## 2. Request Normalization
|
||||
|
||||
- OpenAI:
|
||||
- 解析 `tools`
|
||||
- 解析 `tool_choice`
|
||||
- 解析 `assistant.tool_calls`
|
||||
- 解析 `tool`
|
||||
- Anthropic:
|
||||
- 解析 `tools`
|
||||
- 解析 `tool_choice`
|
||||
- 解析 `tool_use`
|
||||
- 解析 `tool_result`
|
||||
- 统一归一化成内部结构:
|
||||
- tools
|
||||
- choice
|
||||
- messages
|
||||
- history state
|
||||
- 识别“当前轮没带 tools,但历史里已有 tool 调用”的场景
|
||||
|
||||
验收标准:
|
||||
|
||||
- 第二轮即使不重复传 `tools`,也能继续走 emulation
|
||||
|
||||
## 3. Tool History Projection
|
||||
|
||||
- 把历史 assistant 工具调用重投影成 action text
|
||||
- 不要把结构化历史原样丢给上游模型
|
||||
- 保留:
|
||||
- tool name
|
||||
- arguments
|
||||
- call id
|
||||
- 投影结果应和真实 action block 尽量一致
|
||||
|
||||
验收标准:
|
||||
|
||||
- 模型在多轮中能“看到”自己之前做过什么动作
|
||||
|
||||
## 4. Tool Result Continuation
|
||||
|
||||
- tool result 不要裸塞回去
|
||||
- 包装成明确续写指令:
|
||||
- 当前哪个 call 的结果回来了
|
||||
- 基于结果继续下一步动作
|
||||
- 对空结果、错误结果、部分结果做统一包装
|
||||
|
||||
验收标准:
|
||||
|
||||
- 模型收到 tool result 后能继续:
|
||||
- 再发起新工具调用
|
||||
- 或输出最终答案
|
||||
|
||||
## 5. Parser Contract
|
||||
|
||||
- 识别:
|
||||
- ` ```json action `
|
||||
- 普通 ` ```json `
|
||||
- 容忍:
|
||||
- 智能引号
|
||||
- 尾逗号
|
||||
- 参数是字符串化 JSON
|
||||
- 支持提取:
|
||||
- `tool`
|
||||
- `name`
|
||||
- `parameters`
|
||||
- `arguments`
|
||||
- `input`
|
||||
- 能从正文里剥离 action block
|
||||
- 支持多 block
|
||||
|
||||
验收标准:
|
||||
|
||||
- 同一回复里多个 action block 都能被解析
|
||||
- 正文和动作块可以正确拆分
|
||||
|
||||
## 6. Retry Policy
|
||||
|
||||
- 触发条件:
|
||||
- 明确要求工具调用但没产出 action block
|
||||
- 命中 refusal 文本
|
||||
- `tool_choice=any`
|
||||
- `tool_choice=tool`
|
||||
- retry 消息要更强约束:
|
||||
- 必须输出 action block
|
||||
- 不要解释
|
||||
- 必要时必须调用指定工具
|
||||
- 控制 retry 次数
|
||||
- 记录 retry 原因
|
||||
|
||||
验收标准:
|
||||
|
||||
- refusal 回复能被纠偏
|
||||
- retry 不会无限循环
|
||||
|
||||
## 7. Refusal Detection
|
||||
|
||||
- 维护 refusal 关键词表:
|
||||
- `I don't have tools`
|
||||
- `tools are unavailable`
|
||||
- `没有可用的工具`
|
||||
- `无法调用工具`
|
||||
- 识别“软拒答”:
|
||||
- 只解释、不行动
|
||||
- 强调环境限制
|
||||
- 区分:
|
||||
- 真正不该调用工具
|
||||
- 本该调用工具却在推脱
|
||||
|
||||
验收标准:
|
||||
|
||||
- 常见“我没有工具”类回复能稳定触发 retry
|
||||
|
||||
## 8. Response Re-encoding
|
||||
|
||||
- OpenAI:
|
||||
- `message.tool_calls`
|
||||
- `finish_reason = tool_calls`
|
||||
- Anthropic:
|
||||
- `content[].tool_use`
|
||||
- `stop_reason = tool_use`
|
||||
- 无工具时回普通文本
|
||||
- 文本和工具调用共存时保持协议兼容
|
||||
|
||||
验收标准:
|
||||
|
||||
- 下游客户端无需知道上游其实不支持 native tools
|
||||
|
||||
## 9. Streaming Strategy
|
||||
|
||||
- OpenAI stream:
|
||||
- 先发 role chunk
|
||||
- 再发 text delta
|
||||
- 再发 tool_calls delta
|
||||
- Anthropic stream:
|
||||
- `message_start`
|
||||
- `content_block_start`
|
||||
- `content_block_delta`
|
||||
- `content_block_stop`
|
||||
- `message_delta`
|
||||
- `message_stop`
|
||||
- 如果当前实现是“先完整拿结果再合成流”,文档里要明确说明
|
||||
|
||||
验收标准:
|
||||
|
||||
- 下游看到的流式协议字段合法
|
||||
|
||||
## 10. Multi-turn State Machine
|
||||
|
||||
- 状态至少区分:
|
||||
- 等待模型首次决策
|
||||
- 已发起工具调用
|
||||
- 等待 tool result
|
||||
- 收到 tool result,等待下一轮决策
|
||||
- 最终回答完成
|
||||
- 状态切换依据应来自消息历史,而不是只看本轮字段
|
||||
- 不要把“工具历史存在”误判成“必须再调工具”
|
||||
|
||||
验收标准:
|
||||
|
||||
- 一轮以上的 agent loop 稳定
|
||||
|
||||
## 11. Observability
|
||||
|
||||
- 打日志:
|
||||
- 是否进入 emulation
|
||||
- 解析到几个 tool calls
|
||||
- 是否触发 retry
|
||||
- refusal 命中原因
|
||||
- 最好记录:
|
||||
- prompt contract 是否注入
|
||||
- tool history 是否被识别
|
||||
|
||||
验收标准:
|
||||
|
||||
- 出问题时能判断是:
|
||||
- prompt 不够强
|
||||
- parser 失败
|
||||
- retry 没触发
|
||||
- 状态机断了
|
||||
|
||||
## 12. 测试矩阵
|
||||
|
||||
- OpenAI:
|
||||
- 单轮 tool call
|
||||
- 多轮 tool result 回灌
|
||||
- 第二轮不重复传 `tools`
|
||||
- 指定 tool
|
||||
- `tool_choice=any`
|
||||
- Anthropic:
|
||||
- 单轮 tool_use
|
||||
- 多轮 tool_result 回灌
|
||||
- 第二轮不重复传 `tools`
|
||||
- 流式 tool_use
|
||||
- 异常场景:
|
||||
- refusal
|
||||
- 无效 JSON
|
||||
- 多 action block
|
||||
- 普通文本结束
|
||||
|
||||
验收标准:
|
||||
|
||||
- 至少覆盖“第一轮调用工具”和“第二轮继续决策”两大关键场景
|
||||
|
||||
## 13. 下一步优先级
|
||||
|
||||
如果当前系统已经能跑,最值得优先继续做的是:
|
||||
|
||||
1. 多轮再次发起新工具调用的 few-shot
|
||||
2. 基于历史状态的 retry 强化
|
||||
3. 更细的 refusal 分类
|
||||
4. parser 容错增强
|
||||
5. 流式工具事件细化
|
||||
131
docs/tool-emulation-methodology.md
Normal file
131
docs/tool-emulation-methodology.md
Normal file
@@ -0,0 +1,131 @@
|
||||
# Methodology: Simulating Tool Calls over a Plain Chat API
|
||||
|
||||
This document describes a practical pattern for supporting tool calling when the upstream model only exposes a plain chat API.
|
||||
|
||||
The core idea is:
|
||||
|
||||
1. Convert downstream tool definitions into a prompt-level contract.
|
||||
2. Ask the model to emit structured action text.
|
||||
3. Parse that action text in the proxy.
|
||||
4. Re-encode it back into standard protocol fields such as OpenAI `tool_calls` or Anthropic `tool_use`.
|
||||
|
||||
## Core Pattern
|
||||
|
||||
When the upstream model does not support native tool calls, do not rely on blindly forwarding `tools`.
|
||||
|
||||
Instead:
|
||||
|
||||
- treat the model as a text generator
|
||||
- define a stable action DSL
|
||||
- keep the proxy responsible for state, retries, parsing, and protocol mapping
|
||||
|
||||
In this project the action DSL is a fenced block:
|
||||
|
||||
```text
|
||||
```json action
|
||||
{"tool":"NAME","parameters":{"key":"value"}}
|
||||
```
|
||||
```
|
||||
|
||||
## What the Proxy Must Do
|
||||
|
||||
The proxy is not a passive transport anymore. Once tool emulation is enabled, it should:
|
||||
|
||||
- inject tool definitions into the prompt
|
||||
- preserve tool history across turns
|
||||
- project historical tool calls back into action text
|
||||
- wrap tool results into a continuation prompt
|
||||
- detect refusal patterns such as “I don't have tools”
|
||||
- retry with a stronger instruction when a tool call was expected but missing
|
||||
- map parsed actions back into downstream protocol fields
|
||||
|
||||
## Multi-turn Tool Calling
|
||||
|
||||
Single-turn emulation is not enough. A useful agent loop looks like this:
|
||||
|
||||
1. model emits a tool call
|
||||
2. external executor runs the tool
|
||||
3. tool result is fed back into the conversation
|
||||
4. model decides whether to call another tool or finish
|
||||
|
||||
To make this stable:
|
||||
|
||||
- do not feed tool results back as raw text only
|
||||
- wrap them in a continuation message that clearly asks for the next action
|
||||
- keep emulation active even when later turns do not repeat the original `tools` field
|
||||
|
||||
That last point matters. Many clients send `tools` only on the first turn. The proxy should still keep the conversation in emulation mode when it sees tool history.
|
||||
|
||||
## Few-shot Guidance
|
||||
|
||||
The minimum few-shot should teach the model the output shape.
|
||||
|
||||
A better few-shot also teaches state transitions:
|
||||
|
||||
- when to call a tool
|
||||
- when to wait for the tool result
|
||||
- when to call another tool
|
||||
- when to answer normally
|
||||
|
||||
For complex agent loops, a multi-step example with:
|
||||
|
||||
- user request
|
||||
- assistant tool call
|
||||
- tool result
|
||||
- assistant next action
|
||||
|
||||
is usually more effective than a single static action example.
|
||||
|
||||
## Retry Guidance
|
||||
|
||||
Retry is useful when:
|
||||
|
||||
- a tool call was expected but no action block was produced
|
||||
- the model says tools are unavailable
|
||||
- the request forces tool usage
|
||||
|
||||
A retry prompt should be explicit and procedural, for example:
|
||||
|
||||
```text
|
||||
Your last response did not include any ```json action``` block.
|
||||
You must respond with at least one valid action block now.
|
||||
Do not explain. Output the action block directly.
|
||||
```
|
||||
|
||||
Retries should be bounded. A small retry budget plus stronger instructions per retry is usually enough.
|
||||
|
||||
## Protocol Mapping
|
||||
|
||||
OpenAI side:
|
||||
|
||||
- input may contain `tools`, `tool_choice`, `assistant.tool_calls`, and `tool`
|
||||
- output should map back into `message.tool_calls` and `finish_reason = "tool_calls"`
|
||||
|
||||
Anthropic side:
|
||||
|
||||
- input may contain `tools`, `tool_choice`, `tool_use`, and `tool_result`
|
||||
- output should map back into `content[].tool_use` and `stop_reason = "tool_use"`
|
||||
|
||||
## Common Failure Modes
|
||||
|
||||
- only supporting the first tool turn
|
||||
- losing emulation state on later turns
|
||||
- not projecting historical tool calls back into text
|
||||
- feeding back raw tool results without continuation instructions
|
||||
- missing refusal detection
|
||||
- using a parser that is too brittle for real model output
|
||||
|
||||
## In This Repository
|
||||
|
||||
The implementation here follows exactly this pattern:
|
||||
|
||||
- downstream tool schemas are rewritten into prompt instructions
|
||||
- the model emits `json action` blocks
|
||||
- the proxy parses them
|
||||
- the proxy re-encodes them as OpenAI or Anthropic tool protocol fields
|
||||
- later turns can continue from tool history even when `tools` are not repeated
|
||||
|
||||
Implementation checklist:
|
||||
|
||||
- [tool-emulation-checklist.md](./tool-emulation-checklist.md)
|
||||
|
||||
378
docs/tool-emulation-methodology.zh-CN.md
Normal file
378
docs/tool-emulation-methodology.zh-CN.md
Normal file
@@ -0,0 +1,378 @@
|
||||
# 纯聊天 API 模拟 Tools 调用的方法论
|
||||
|
||||
这份文档总结的是一种通用做法:
|
||||
|
||||
- 上游模型只有普通聊天接口
|
||||
- 不原生支持 `tools` / `tool_calls` / `tool_use`
|
||||
- 但下游调用方希望继续走 OpenAI 或 Anthropic 风格的工具调用协议
|
||||
|
||||
核心思路不是“骗上游说自己支持 tools”,而是:
|
||||
|
||||
1. 在代理层把工具定义改写成一套稳定的提示词契约
|
||||
2. 让模型用约定的结构化文本输出动作
|
||||
3. 再由代理把结构化文本还原成标准协议里的 `tool_calls` 或 `tool_use`
|
||||
|
||||
## 核心原则
|
||||
|
||||
### 1. 不依赖上游原生能力
|
||||
|
||||
如果上游不支持原生工具调用,最稳的路线不是继续透传 `tools` 字段,而是把工具定义下沉成提示词层协议。
|
||||
|
||||
换句话说:
|
||||
|
||||
- 对模型来说,它看到的是“你有这些动作,可以按某种格式发起调用”
|
||||
- 对客户端来说,它看到的仍然是标准 OpenAI / Anthropic 工具协议
|
||||
|
||||
代理层负责做两次映射。
|
||||
|
||||
### 2. 工具调用必须降维成可解析文本
|
||||
|
||||
一个可落地的格式必须满足:
|
||||
|
||||
- 模型容易学会
|
||||
- 人容易读
|
||||
- 代理容易解析
|
||||
- 多轮场景里不容易歧义
|
||||
|
||||
本项目采用的是 fenced block:
|
||||
|
||||
```text
|
||||
```json action
|
||||
{"tool":"NAME","parameters":{"key":"value"}}
|
||||
```
|
||||
```
|
||||
|
||||
这个格式比“自然语言里自己说我要调用某个工具”稳定很多。
|
||||
|
||||
### 3. 代理是状态机,不只是转发器
|
||||
|
||||
一旦进入 emulation 模式,代理就不能再只是简单透传。
|
||||
|
||||
它至少要承担这些职责:
|
||||
|
||||
- 注入工具说明
|
||||
- 把历史工具调用改写回上下文
|
||||
- 把工具结果回灌成下一轮提示
|
||||
- 识别拒答和跑偏
|
||||
- 必要时做 retry
|
||||
- 把文本动作重新编码成标准工具协议
|
||||
|
||||
## 一条完整链路
|
||||
|
||||
### 输入侧
|
||||
|
||||
客户端发来:
|
||||
|
||||
- OpenAI `tools` / `tool_choice`
|
||||
- 或 Anthropic `tools` / `tool_choice`
|
||||
|
||||
代理做三件事:
|
||||
|
||||
1. 抽取工具名称、描述、参数 schema
|
||||
2. 归一化 tool choice
|
||||
3. 判断是否进入 emulation 模式
|
||||
|
||||
进入 emulation 后,不再把原始 `tools` 直接交给上游,而是改写系统提示词。
|
||||
|
||||
### 提示词侧
|
||||
|
||||
提示词里至少要包含:
|
||||
|
||||
- 你有工具可用,不要声称“工具不可用”
|
||||
- 工具列表
|
||||
- 固定动作格式
|
||||
- 多轮规则
|
||||
- `tool_choice` 约束
|
||||
- 一个有效示例
|
||||
|
||||
建议的约束重点:
|
||||
|
||||
- 需要工具时必须输出 `json action`
|
||||
- 独立动作可以一次输出多个 block
|
||||
- 依赖动作必须等工具结果回来再继续
|
||||
- 不需要工具时才允许输出普通文本
|
||||
- 不要解释“为什么不能调用工具”
|
||||
|
||||
### 输出侧
|
||||
|
||||
模型回复后,代理扫描 `json action` block:
|
||||
|
||||
- 解析出 `tool`
|
||||
- 解析出 `parameters`
|
||||
- 从正文里剥离 action block
|
||||
|
||||
然后映射回:
|
||||
|
||||
- OpenAI `message.tool_calls`
|
||||
- Anthropic `content[].tool_use`
|
||||
|
||||
如果没有解析到动作,就把剩余文本当普通 assistant 回复。
|
||||
|
||||
## 多轮工具调用
|
||||
|
||||
这是最容易做坏的部分。
|
||||
|
||||
### 单轮模拟并不够
|
||||
|
||||
只做第一轮 `tool_calls` 很容易,但这还不是真正的 agent loop。
|
||||
|
||||
真正有用的是:
|
||||
|
||||
1. 第一轮模型发起工具调用
|
||||
2. 外部执行工具
|
||||
3. 把工具结果回灌
|
||||
4. 模型继续决策
|
||||
5. 可能再次发起工具调用
|
||||
6. 或输出最终回答
|
||||
|
||||
### 回灌工具结果时,不要只塞原始结果
|
||||
|
||||
稳定做法是把工具结果包装成明确的续写指令,而不是只把结果裸塞回去。
|
||||
|
||||
例如:
|
||||
|
||||
```text
|
||||
Tool result for call_1:
|
||||
pong
|
||||
|
||||
Based on the tool result above, continue with the next appropriate action using the structured format.
|
||||
```
|
||||
|
||||
这样模型更清楚当前处于“继续 agent loop”的阶段,而不是另起一轮普通问答。
|
||||
|
||||
### 第二轮不应强依赖重复传 tools
|
||||
|
||||
复杂客户端并不一定会在每一轮都重复把 `tools` 发回来。
|
||||
|
||||
因此代理应把这些历史也视作“仍处于 emulation 会话中”的信号:
|
||||
|
||||
- OpenAI:
|
||||
- assistant 消息里已有 `tool_calls`
|
||||
- 后续有 `tool` 角色消息
|
||||
- Anthropic:
|
||||
- 历史里已有 `tool_use`
|
||||
- 后续有 `tool_result`
|
||||
|
||||
只要这些历史存在,即使当前轮未重新传 `tools`,代理也应继续以 emulation 方式处理。
|
||||
|
||||
### 历史里的工具调用要重新投影成动作文本
|
||||
|
||||
模型并不理解 OpenAI / Anthropic 的结构化历史字段。
|
||||
|
||||
因此代理要把历史里的:
|
||||
|
||||
- `assistant.tool_calls`
|
||||
- `assistant tool_use`
|
||||
|
||||
重新投影成:
|
||||
|
||||
```text
|
||||
```json action
|
||||
{
|
||||
"tool": "ping",
|
||||
"parameters": {
|
||||
"value": "123"
|
||||
}
|
||||
}
|
||||
```
|
||||
```
|
||||
|
||||
这样模型才能在多轮里看到自己“之前做过什么动作”。
|
||||
|
||||
## Few-shot 怎么设计
|
||||
|
||||
### 最小 few-shot
|
||||
|
||||
至少给一个合法动作示例:
|
||||
|
||||
```text
|
||||
```json action
|
||||
{
|
||||
"tool": "read_file",
|
||||
"parameters": {
|
||||
"path": "README.md"
|
||||
}
|
||||
}
|
||||
```
|
||||
```
|
||||
|
||||
它的作用不是示范业务逻辑,而是强制模型学会“输出形状”。
|
||||
|
||||
### 更稳的 few-shot
|
||||
|
||||
如果目标是复杂 agent loop,推荐再补一个“工具结果回来后再次决策”的 few-shot。
|
||||
|
||||
例如三段式:
|
||||
|
||||
1. 用户请求
|
||||
2. assistant 发起工具调用
|
||||
3. user 提供 tool result
|
||||
4. assistant 再次发起新工具调用或结束
|
||||
|
||||
这个 few-shot 能显著减少模型在第二轮以后掉回普通文本解释。
|
||||
|
||||
### few-shot 要突出状态转换
|
||||
|
||||
最重要的不是工具本身,而是让模型明确以下三种状态:
|
||||
|
||||
- 该调用工具
|
||||
- 该等待工具结果
|
||||
- 该输出最终回答
|
||||
|
||||
复杂 loop 不稳,通常就是状态转换没教明白。
|
||||
|
||||
## Retry 怎么设计
|
||||
|
||||
### Retry 的触发条件
|
||||
|
||||
比较实用的触发条件:
|
||||
|
||||
- 本轮本应调用工具,但没有解析出 action block
|
||||
- 模型回复了“没有工具”“工具不可用”“我无法调用”
|
||||
- `tool_choice=any`
|
||||
- `tool_choice=tool`
|
||||
|
||||
### Retry 的方式
|
||||
|
||||
不要只重发原请求。应显式补一条纠偏消息,例如:
|
||||
|
||||
```text
|
||||
Your last response did not include any ```json action``` block.
|
||||
You must respond with at least one valid action block now.
|
||||
Do not explain. Output the action block directly.
|
||||
```
|
||||
|
||||
如果是强制指定某个工具,再额外加:
|
||||
|
||||
```text
|
||||
You must call "ping".
|
||||
```
|
||||
|
||||
### Retry 不要无限循环
|
||||
|
||||
建议设置:
|
||||
|
||||
- 小次数重试
|
||||
- 每次 retry 都更强约束
|
||||
- 只在明确需要工具调用时触发
|
||||
|
||||
否则很容易把普通自然回复误判成失败。
|
||||
|
||||
## 协议映射建议
|
||||
|
||||
### OpenAI
|
||||
|
||||
输入:
|
||||
|
||||
- `tools`
|
||||
- `tool_choice`
|
||||
- `assistant.tool_calls`
|
||||
- `tool`
|
||||
|
||||
输出:
|
||||
|
||||
- `finish_reason = "tool_calls"`
|
||||
- `message.tool_calls`
|
||||
|
||||
### Anthropic
|
||||
|
||||
输入:
|
||||
|
||||
- `tools`
|
||||
- `tool_choice`
|
||||
- `content[].tool_use`
|
||||
- `content[].tool_result`
|
||||
|
||||
输出:
|
||||
|
||||
- `stop_reason = "tool_use"`
|
||||
- `content[].tool_use`
|
||||
|
||||
流式时,再映射成对应的 SSE 事件。
|
||||
|
||||
## 常见坑
|
||||
|
||||
### 1. 只做第一轮
|
||||
|
||||
这会让你看起来“支持 tools”,但一进入 agent loop 就断掉。
|
||||
|
||||
### 2. 历史工具调用没有重投影
|
||||
|
||||
模型看不到自己的历史动作,多轮就不稳。
|
||||
|
||||
### 3. 工具结果回灌过于裸
|
||||
|
||||
只把 `pong` 塞回去,模型不一定知道自己该继续决策。
|
||||
|
||||
### 4. 没有 refusal 检测
|
||||
|
||||
很多模型会下意识说:
|
||||
|
||||
- 我没有工具
|
||||
- 当前环境无法调用
|
||||
- 我只能提供建议
|
||||
|
||||
不识别这类模式,就不会进入纠偏 retry。
|
||||
|
||||
### 5. 文本解析规则太脆弱
|
||||
|
||||
解析器至少要容忍:
|
||||
|
||||
- ` ```json action ` 或普通 ` ```json `
|
||||
- 智能引号
|
||||
- 末尾逗号
|
||||
- 参数对象有时是字符串化 JSON
|
||||
|
||||
## 推荐的最小实现
|
||||
|
||||
如果要做一个最小可用版,建议先只做:
|
||||
|
||||
1. 工具定义注入
|
||||
2. `json action` 解析
|
||||
3. refusal 检测
|
||||
4. 一次 retry
|
||||
5. OpenAI 非流式返回
|
||||
|
||||
然后再逐步补:
|
||||
|
||||
1. Anthropic 非流式
|
||||
2. OpenAI 流式
|
||||
3. Anthropic 流式
|
||||
4. 多轮 tool history 投影
|
||||
5. 更强 few-shot
|
||||
|
||||
## 适用边界
|
||||
|
||||
这套方法适合:
|
||||
|
||||
- 上游不支持原生 tools
|
||||
- 你又必须对外兼容标准工具协议
|
||||
- 目标任务以工程类、文件类、检索类工具为主
|
||||
|
||||
它不适合:
|
||||
|
||||
- 对工具调用正确率极高要求的强生产场景
|
||||
- 上游已经支持原生 tools,但你还硬要绕一层文本模拟
|
||||
|
||||
如果上游能原生支持工具调用,优先使用原生协议。
|
||||
|
||||
## 本项目里的落地经验
|
||||
|
||||
在 `lingma-ipc-proxy` 里,这套方法最终证明了两点:
|
||||
|
||||
1. 只靠透传 `tools` 给 Lingma 不够,模型会继续说“没有可用工具”
|
||||
2. 代理层做 emulation 后,可以稳定还原出:
|
||||
- OpenAI `tool_calls`
|
||||
- Anthropic `tool_use`
|
||||
- 多轮 tool result 回灌后的继续决策
|
||||
|
||||
进一步要增强稳定性,最值得继续打磨的是:
|
||||
|
||||
- 多轮再次发起新工具调用的 few-shot
|
||||
- 基于历史状态的更细 retry 策略
|
||||
- 不同工具类别的专用示例
|
||||
|
||||
配套实现清单:
|
||||
|
||||
- [tool-emulation-checklist.zh-CN.md](./tool-emulation-checklist.zh-CN.md)
|
||||
|
||||
Reference in New Issue
Block a user