Files
lingma-proxy-compose/docs/tool-emulation-methodology.md
coolxll df69105329 docs(tool-emulation): 添加工具调用模拟实现清单与方法论文档
- 创建英文版工具模拟实现清单,涵盖13个核心实现面
- 添加中文版工具模拟实现清单,详细说明各项验收标准
- 编写英文版工具模拟方法论文档,阐述核心实现模式
- 补充中文版方法论文档,包括多轮调用与重试策略指导
- 实现HTTP API服务器测试,验证工具历史保持功能
- 新增工具模拟核心模块,包含工具定义提取与注入功能
- 添加拒绝检测、动作块解析等关键工具模拟组件
2026-03-30 15:35:23 +08:00

4.0 KiB

Methodology: Simulating Tool Calls over a Plain Chat API

This document describes a practical pattern for supporting tool calling when the upstream model only exposes a plain chat API.

The core idea is:

  1. Convert downstream tool definitions into a prompt-level contract.
  2. Ask the model to emit structured action text.
  3. Parse that action text in the proxy.
  4. Re-encode it back into standard protocol fields such as OpenAI tool_calls or Anthropic tool_use.

Core Pattern

When the upstream model does not support native tool calls, do not rely on blindly forwarding tools.

Instead:

  • treat the model as a text generator
  • define a stable action DSL
  • keep the proxy responsible for state, retries, parsing, and protocol mapping

In this project the action DSL is a fenced block:

```json action
{"tool":"NAME","parameters":{"key":"value"}}

## What the Proxy Must Do

The proxy is not a passive transport anymore. Once tool emulation is enabled, it should:

- inject tool definitions into the prompt
- preserve tool history across turns
- project historical tool calls back into action text
- wrap tool results into a continuation prompt
- detect refusal patterns such as “I don't have tools”
- retry with a stronger instruction when a tool call was expected but missing
- map parsed actions back into downstream protocol fields

## Multi-turn Tool Calling

Single-turn emulation is not enough. A useful agent loop looks like this:

1. model emits a tool call
2. external executor runs the tool
3. tool result is fed back into the conversation
4. model decides whether to call another tool or finish

To make this stable:

- do not feed tool results back as raw text only
- wrap them in a continuation message that clearly asks for the next action
- keep emulation active even when later turns do not repeat the original `tools` field

That last point matters. Many clients send `tools` only on the first turn. The proxy should still keep the conversation in emulation mode when it sees tool history.

## Few-shot Guidance

The minimum few-shot should teach the model the output shape.

A better few-shot also teaches state transitions:

- when to call a tool
- when to wait for the tool result
- when to call another tool
- when to answer normally

For complex agent loops, a multi-step example with:

- user request
- assistant tool call
- tool result
- assistant next action

is usually more effective than a single static action example.

## Retry Guidance

Retry is useful when:

- a tool call was expected but no action block was produced
- the model says tools are unavailable
- the request forces tool usage

A retry prompt should be explicit and procedural, for example:

```text
Your last response did not include any ```json action``` block.
You must respond with at least one valid action block now.
Do not explain. Output the action block directly.

Retries should be bounded. A small retry budget plus stronger instructions per retry is usually enough.

Protocol Mapping

OpenAI side:

  • input may contain tools, tool_choice, assistant.tool_calls, and tool
  • output should map back into message.tool_calls and finish_reason = "tool_calls"

Anthropic side:

  • input may contain tools, tool_choice, tool_use, and tool_result
  • output should map back into content[].tool_use and stop_reason = "tool_use"

Common Failure Modes

  • only supporting the first tool turn
  • losing emulation state on later turns
  • not projecting historical tool calls back into text
  • feeding back raw tool results without continuation instructions
  • missing refusal detection
  • using a parser that is too brittle for real model output

In This Repository

The implementation here follows exactly this pattern:

  • downstream tool schemas are rewritten into prompt instructions
  • the model emits json action blocks
  • the proxy parses them
  • the proxy re-encodes them as OpenAI or Anthropic tool protocol fields
  • later turns can continue from tool history even when tools are not repeated

Implementation checklist: