mmc/lingma-proxy-compose

Fork 0

Files

lutc5 92c8735bfc feat: add desktop app release packaging

2026-04-29 18:45:25 +08:00

5.1 KiB

Raw Blame History

Tool Calling Implementation Checklist

This checklist covers the complete implementation of OpenAI / Anthropic compatible tool calling over a plain chat API.

It breaks the work into concrete surfaces that can be implemented and validated incrementally.

1. Prompt Contract

tell the model that tools are available
list tool names, short descriptions, and schema summaries
define a fixed action format
define multi-turn rules
encode tool_choice constraints
include at least one valid action example
ideally include one example where a tool result arrives and the model decides what to do next

Acceptance:

the first turn reliably emits a valid action block
later turns do not collapse into plain explanation after a tool result

2. Request Normalization

OpenAI:
- parse tools
- parse tool_choice
- parse assistant.tool_calls
- parse tool
Anthropic:
- parse tools
- parse tool_choice
- parse tool_use
- parse tool_result
normalize everything into one internal structure
detect tool history even when the current turn does not repeat tools

Acceptance:

tool calling stays active on later turns without repeated tool definitions

3. Tool History Projection

project historical assistant tool calls back into action text
do not pass downstream protocol-specific history directly to Lingma
preserve tool name, arguments, and call id where useful

Acceptance:

the model can “see” its own previous actions in later turns

4. Tool Result Continuation

do not feed raw tool output back without framing
wrap tool results into an explicit continuation message
handle empty, partial, and error outputs consistently

Acceptance:

after a tool result, the model can either call another tool or finish naturally

5. Parser Contract

recognize both ```json action and plain ```json
tolerate smart quotes, trailing commas, and stringified argument JSON
extract tool, name, parameters, arguments, or input
support multiple blocks in one reply
strip action blocks from normal assistant text

Acceptance:

multiple action blocks can be parsed reliably

6. Retry Policy

trigger when:
- a tool call was expected but no action block was produced
- refusal language is detected
- tool_choice=any
- tool_choice=tool
retry with a stricter message
bound retry count
log retry reason

Acceptance:

refusal-style replies can be corrected without infinite loops

7. Refusal Detection

maintain a refusal phrase set
detect both hard refusals and soft “environment limitation” answers
distinguish between:
- a legitimate no-tool answer
- a failed tool-use turn

Acceptance:

common “tools are unavailable” replies trigger retry when appropriate

8. Response Re-encoding

OpenAI:
- emit message.tool_calls
- set finish_reason = tool_calls
Anthropic:
- emit content[].tool_use
- set stop_reason = tool_use
preserve normal text when no tool call is present

Acceptance:

downstream clients remain unaware that Lingma does not expose native tools

9. Streaming Strategy

OpenAI:
- role chunk
- text deltas
- tool call deltas
Anthropic:
- message_start
- content_block_start
- content_block_delta
- content_block_stop
- message_delta
- message_stop
document clearly when streaming is synthesized from a completed non-stream result

Acceptance:

downstream stream consumers receive protocol-valid event sequences

10. Multi-turn State Machine

distinguish at least:
- first decision
- tool call emitted
- waiting for tool result
- tool result received, next decision pending
- final answer
derive state from message history, not only the current payload
do not confuse “tool history exists” with “another tool call is mandatory”

Acceptance:

agent loops remain stable across more than one turn

11. Observability

log:
- whether tool calling is active
- how many tool calls were parsed
- whether retry fired
- which refusal signal matched
ideally log whether:
- the prompt contract was injected
- tool history was detected

Acceptance:

failures can be localized to prompt, parser, retry, or state management

12. Test Matrix

OpenAI:
- single-turn tool call
- multi-turn tool result continuation
- later turn without repeated tools
- forced tool
- tool_choice=any
- tool_choice=none
- parallel_tool_calls=false
Anthropic:
- single-turn tool_use
- multi-turn tool_result continuation
- later turn without repeated tools
- streaming tool_use
- tool_choice=any / tool_choice=none
error cases:
- refusal
- invalid JSON
- multiple action blocks
- plain-text final answer

Acceptance:

both “first tool turn” and “second-turn continuation” are covered

13. Recommended Next Priorities

If the system already works, the highest-value next improvements are:

stronger few-shot for “tool result arrives, then call another tool”
better history-aware retry policy
finer refusal categories
stronger parser tolerance
richer streaming behavior

5.1 KiB Raw Blame History

Tool Calling Implementation Checklist

1. Prompt Contract

2. Request Normalization

3. Tool History Projection

4. Tool Result Continuation

5. Parser Contract

6. Retry Policy

7. Refusal Detection

8. Response Re-encoding

9. Streaming Strategy

10. Multi-turn State Machine

11. Observability

12. Test Matrix

13. Recommended Next Priorities

5.1 KiB

Raw Blame History