Release v1.4.9 remote image routing

This commit is contained in:
lutc5
2026-05-07 16:44:59 +08:00
parent 68e7843a45
commit 86fbdbc40c
12 changed files with 892 additions and 89 deletions

View File

@@ -2,6 +2,15 @@
## Unreleased
## v1.4.9 - 2026-05-07
- Added Remote-mode image routing: image requests now use the proven Lingma IPC image pipeline instead of sending local/data URLs directly to the remote chat endpoint.
- Added mixed image + tool handling: the proxy extracts image context through IPC, then returns to Remote API native tool calling so clients still receive proper `tool_calls` / `tool_use`.
- Fixed multi-turn image follow-ups by reusing the most recent user image from request history when the latest user turn says things like "continue based on the previous image".
- Improved Remote API tool compatibility by forwarding structured messages, tool definitions, tool choice, and native remote tool-call deltas instead of prompt-emulating tools in Remote mode.
- Added regression tests for remote structured tools, image routing, image-context injection, and previous-turn image reuse.
- Verified the production desktop app launch path from `/Applications/Lingma Proxy.app`, including pure image, multi-turn image, and image + forced tool-call requests.
## v1.4.8 - 2026-05-06
- Fixed Remote API base URL auto-detection so Lingma OSS/static asset hosts are rejected and cannot be used as API endpoints.

View File

@@ -13,7 +13,7 @@ The proxy now supports two backend modes:
## Current Version
The current desktop line is `v1.4.8`.
The current desktop line is `v1.4.9`.
See [CHANGELOG.md](./CHANGELOG.md) for release history.
@@ -90,6 +90,7 @@ Compared with the original protocol proof of concept, this repository focuses on
- **Anthropic streaming tool-call hardening** so streaming clients such as Claude Code receive final `tool_use` events instead of premature refusal text when tools are present.
- **Image input** for OpenAI `image_url` and Anthropic image blocks.
- **Local and remote image normalization** for data URLs, HTTP URLs, `file://` URLs, and absolute local paths, with automatic JPEG downscaling for large images.
- **Remote-mode image fallback** so image requests use the proven Lingma IPC image pipeline; image + tool requests extract image context through IPC and then return to Remote API native tool calling.
- **Request log image redaction** so large base64 payloads are visible as image markers instead of breaking the desktop log view.
- **More request parameter compatibility** so stricter clients can connect without custom patches.
- **Full request and response recording** in the desktop app for debugging 400/500 errors.
@@ -130,9 +131,12 @@ flowchart LR
Service --> Session["Session Manager"]
Service --> Tools["Tool Emulation"]
Service --> Models["Model Discovery"]
Service --> Images["Image Router"]
Service --> Backend{"Backend Mode"}
Backend --> Transport["IPC Plugin Transport"]
Backend --> Remote["Remote API Client"]
Images -->|"image requests"| Transport
Images -->|"image + tools: extract context"| Remote
Transport --> Pipe["Windows Named Pipe"]
Transport --> WS["macOS / Windows WebSocket"]
Pipe --> Lingma["Tongyi Lingma IDE Plugin"]
@@ -221,6 +225,7 @@ Notes:
- If your Lingma plugin uses a dedicated domain, remote mode first uses `--remote-base-url`, `LINGMA_REMOTE_BASE_URL`, or the JSON config field. If those are empty, it scans Lingma's local logs on macOS, Windows, and Linux for endpoint hints such as `endpoint config:` and marketplace service URLs.
- The desktop Settings page shows the resolved remote domain and detection source without exposing tokens.
- `/v1/models` in remote mode returns remote API model keys, which may not match the IPC plugin display IDs such as `MiniMax-M2.7` or `Kimi-K2.6`.
- Image requests in remote mode are routed through the IPC image pipeline because the direct remote chat endpoint ignores local `file://` and data URL image payloads. If a request also contains tools, Lingma Proxy first extracts image context through IPC and then sends the tool-capable turn through Remote API native tool calling.
- Local validation passed `/health`, `/v1/models`, OpenAI streaming/non-streaming chat, and Claude Code Anthropic + Bash tool use. Claude Code full tool runs are much slower than simple OpenAI requests because the client sends a large context and performs a second tool-result turn.
- This mode is inspired by the remote API and credential-signing research in [ZipperCode/lingma2api](https://github.com/ZipperCode/lingma2api), integrated here as a switchable backend under the existing OpenAI / Anthropic / desktop app architecture.

View File

@@ -16,7 +16,7 @@
## 当前版本
当前桌面端版本线:`v1.4.8`
当前桌面端版本线:`v1.4.9`
版本更新记录见 [CHANGELOG.md](./CHANGELOG.md)。
@@ -53,6 +53,7 @@ GitHub Actions 会在 Release 中产出:
| Function Calling / Tools | 支持,使用工具调用模拟实现 |
| 多轮 Agent 工具循环 | 支持 |
| 图片输入 | 支持 base64、data URL、HTTP URL |
| 远端模式图片兜底 | 有图请求使用 IPC 图片链路;图片 + 工具请求先提取图片上下文,再回到 Remote API 原生工具调用 |
| 请求 / 响应完整日志 | 桌面端支持完整查看和复制 |
| 后端模式切换 | 支持 IPC 插件模式 / 远端 API 模式 |
| macOS WebSocket 自动探测 | 支持 |
@@ -178,9 +179,12 @@ flowchart LR
Service --> Tooling["工具调用模拟"]
Service --> Model["模型探测"]
Service --> Recorder["请求 / 日志记录"]
Service --> Images["图片路由"]
Service --> Backend{"后端模式"}
Backend --> Transport["IPC 插件传输层"]
Backend --> Remote["远端 API 客户端"]
Images -->|"有图请求"| Transport
Images -->|"图片 + 工具:提取图片上下文"| Remote
Transport --> Pipe["Windows Named Pipe"]
Transport --> WS["WebSocket"]
Pipe --> Lingma["通义灵码 IDE 插件"]
@@ -287,6 +291,7 @@ lingma-proxy \
- 如果 Lingma 插件配置过专属域名,远端模式会优先使用 `--remote-base-url``LINGMA_REMOTE_BASE_URL` 或配置文件;这些为空时,会扫描 macOS、Windows、Linux 上 Lingma 本地日志里的 `endpoint config:`、Marketplace service URL 等线索。
- 桌面端设置页会展示当前解析到的远端域名和来源,但不会展示 token / key 明文。
- 远端模式的 `/v1/models` 返回的是远端接口模型 key不一定等同于 IPC 插件模式里看到的 `MiniMax-M2.7``Kimi-K2.6` 等展示名。
- 远端模式下的图片请求会自动走 IPC 图片链路,因为直连远端聊天接口不会直接消费本地 `file://` 和 data URL 图片。若请求同时带工具,代理会先通过 IPC 提取图片上下文,再把不含图片但包含上下文的请求交给 Remote API 原生工具调用。
- 当前本机实测:`/health``/v1/models`、OpenAI 流式 / 非流式、Claude Code Anthropic + Bash 工具调用均可用Claude Code 完整工具链耗时明显高于简单 OpenAI 请求。
- 该模式参考了 [ZipperCode/lingma2api](https://github.com/ZipperCode/lingma2api) 对 Lingma 远端接口、签名和登录态结构的探索,本仓库将其作为可切换后端集成到现有 OpenAI / Anthropic / 桌面 App 架构中。

View File

@@ -252,7 +252,7 @@ onUnmounted(() => {
<span class="status-dot" :class="{ running: status.running }"></span>
<div>
<strong>{{ status.running ? 'Proxy Running' : 'Proxy Stopped' }}</strong>
<small>v1.4.8</small>
<small>v1.4.9</small>
</div>
</div>
</aside>

View File

@@ -11,6 +11,6 @@
"email": "lutc5@asiainfo.com"
},
"info": {
"productVersion": "1.4.8"
"productVersion": "1.4.9"
}
}

View File

@@ -1208,7 +1208,7 @@ func (s *Server) handleOpenAIStream(w http.ResponseWriter, r *http.Request, req
}
func shouldAggregateToolStream(req service.ChatRequest) bool {
return len(req.Tools) > 0 && truthyEnv("LINGMA_AGGREGATE_TOOL_STREAM")
return len(req.Tools) > 0
}
type toolStreamFilter struct {
@@ -1450,20 +1450,18 @@ func normalizeAnthropicRequest(req anthropicRequest) (service.ChatRequest, error
case "user":
text, toolResults := extractAnthropicUserContent(message.Content)
images := extractAnthropicImages(message.Content)
for _, tr := range toolResults {
prompt := toolemulation.ActionOutputPrompt(tr.ToolUseID, tr.Content)
if prompt != "" {
messages = append(messages, service.ChatMessage{Role: "user", Text: prompt})
}
}
if text != "" || len(images) > 0 {
messages = append(messages, service.ChatMessage{Role: role, Text: text, Images: images})
}
for _, tr := range toolResults {
if strings.TrimSpace(tr.Content) != "" {
messages = append(messages, service.ChatMessage{Role: "tool", Text: tr.Content, ToolCallID: tr.ToolUseID})
}
}
case "assistant":
text, calls := extractAnthropicAssistantContent(message.Content)
projected := toolemulation.AssistantToolCallsToText(text, calls)
if projected != "" {
messages = append(messages, service.ChatMessage{Role: role, Text: projected})
if text != "" || len(calls) > 0 {
messages = append(messages, service.ChatMessage{Role: role, Text: text, ToolCalls: calls})
}
}
}
@@ -1510,19 +1508,15 @@ func normalizeOpenAIRequest(req openAIChatRequest) (service.ChatRequest, error)
case "assistant":
text := strings.TrimSpace(extractText(message.Content))
calls := extractOpenAIToolCalls(message.ToolCalls)
projected := toolemulation.AssistantToolCallsToText(text, calls)
if projected != "" {
messages = append(messages, service.ChatMessage{Role: role, Text: projected})
if text != "" || len(calls) > 0 {
messages = append(messages, service.ChatMessage{Role: role, Text: text, ToolCalls: calls})
}
case "tool":
output := strings.TrimSpace(extractText(message.Content))
if output == "" || message.ToolCallID == "" {
continue
}
prompt := toolemulation.ActionOutputPrompt(message.ToolCallID, output)
if prompt != "" {
messages = append(messages, service.ChatMessage{Role: "user", Text: prompt})
}
messages = append(messages, service.ChatMessage{Role: "tool", Text: output, ToolCallID: message.ToolCallID})
}
}
if len(messages) == 0 {

View File

@@ -17,6 +17,8 @@ import (
"strconv"
"strings"
"time"
"lingma-ipc-proxy/internal/toolemulation"
)
const (
@@ -55,8 +57,27 @@ type Model struct {
type ChatRequest struct {
Model string
Prompt string
Messages []Message
Images []Image
Stream bool
Temperature *float64
Tools []toolemulation.ToolDef
ToolChoice toolemulation.ToolChoice
}
type Image struct {
MediaType string
Data string
URL string
}
type Message struct {
Role string
Content string
Images []Image
Name string
ToolCallID string
ToolCalls []toolemulation.ToolCall
}
type ChatResult struct {
@@ -65,6 +86,7 @@ type ChatResult struct {
OutputTokens int
RequestID string
CredentialSrc string
ToolCalls []toolemulation.ToolCall
}
type StreamEvent struct {
@@ -186,10 +208,14 @@ func (c *Client) Chat(ctx context.Context, request ChatRequest, onDelta func(str
return nil, fmt.Errorf("remote chat status %d: %s", resp.StatusCode, truncate(string(respBody), 1000))
}
var builder strings.Builder
toolCallBuffer := newRemoteToolCallBuffer()
if err := scanSSE(resp.Body, func(event sseEvent) error {
if event.Done {
return nil
}
if len(event.ToolCalls) > 0 {
toolCallBuffer.Add(event.ToolCalls)
}
if event.Content == "" {
return nil
}
@@ -208,6 +234,7 @@ func (c *Client) Chat(ctx context.Context, request ChatRequest, onDelta func(str
OutputTokens: estimateTokens(text),
RequestID: requestID,
CredentialSrc: cred.Source,
ToolCalls: toolCallBuffer.Calls(),
}, nil
}
@@ -220,12 +247,13 @@ func (c *Client) buildBody(requestID string, request ChatRequest) (string, error
if strings.EqualFold(model, "auto") {
model = ""
}
imageURLs := projectImages(request.Images)
payload := map[string]any{
"request_id": requestID,
"request_set_id": "",
"chat_record_id": requestID,
"stream": true,
"image_urls": nil,
"image_urls": nullableSlice(imageURLs),
"is_reply": false,
"is_retry": false,
"session_id": "",
@@ -242,26 +270,14 @@ func (c *Client) buildBody(requestID string, request ChatRequest) (string, error
"display_name": "",
"model": model,
"format": "",
"is_vl": false,
"is_vl": len(imageURLs) > 0,
"is_reasoning": false,
"api_key": "",
"url": "",
"source": "",
"enable": false,
},
"messages": []map[string]any{{
"role": "user",
"content": request.Prompt,
"response_meta": map[string]any{
"id": "",
"usage": map[string]int{
"prompt_tokens": 0,
"completion_tokens": 0,
"total_tokens": 0,
},
},
"reasoning_content_signature": "",
}},
"messages": projectMessages(request),
"business": map[string]any{
"product": "jb_plugin",
"version": c.cfg.CosyVersion,
@@ -272,10 +288,193 @@ func (c *Client) buildBody(requestID string, request ChatRequest) (string, error
"name": "memory_intent_recognition_" + requestID,
},
}
if tools := projectTools(request.Tools); len(tools) > 0 {
payload["tools"] = tools
}
if choice := projectToolChoice(request.ToolChoice); choice != nil {
payload["tool_choice"] = choice
}
body, err := json.Marshal(payload)
return string(body), err
}
func nullableSlice[T any](items []T) any {
if len(items) == 0 {
return nil
}
return items
}
func projectImages(images []Image) []string {
if len(images) == 0 {
return nil
}
out := make([]string, 0, len(images))
for _, img := range images {
item := projectImage(img)
if item != "" {
out = append(out, item)
}
}
return out
}
func projectImage(img Image) string {
if strings.TrimSpace(img.Data) == "" && strings.TrimSpace(img.URL) == "" {
return ""
}
mediaType := strings.TrimSpace(img.MediaType)
if mediaType == "" {
mediaType = "image/jpeg"
}
if strings.TrimSpace(img.Data) != "" {
return "data:" + mediaType + ";base64," + strings.TrimSpace(img.Data)
}
return strings.TrimSpace(img.URL)
}
func projectMessages(request ChatRequest) []map[string]any {
source := request.Messages
if len(source) == 0 {
source = []Message{{Role: "user", Content: request.Prompt}}
}
out := make([]map[string]any, 0, len(source))
for _, message := range source {
role := strings.TrimSpace(message.Role)
if role == "" {
continue
}
item := map[string]any{
"role": role,
"content": projectMessageContent(message),
"response_meta": map[string]any{
"id": "",
"usage": map[string]int{
"prompt_tokens": 0,
"completion_tokens": 0,
"total_tokens": 0,
},
},
"reasoning_content_signature": "",
}
if message.Name != "" {
item["name"] = message.Name
}
if message.ToolCallID != "" {
item["tool_call_id"] = message.ToolCallID
}
if calls := projectMessageToolCalls(message.ToolCalls); len(calls) > 0 {
item["tool_calls"] = calls
}
out = append(out, item)
}
if len(out) == 0 {
return []map[string]any{{"role": "user", "content": request.Prompt}}
}
return out
}
func projectMessageContent(message Message) any {
if len(message.Images) == 0 {
return message.Content
}
content := make([]map[string]any, 0, len(message.Images)+1)
if strings.TrimSpace(message.Content) != "" {
content = append(content, map[string]any{
"type": "text",
"text": message.Content,
})
}
for _, img := range message.Images {
imageURL := projectImage(img)
if imageURL == "" {
continue
}
content = append(content, map[string]any{
"type": "image_url",
"image_url": map[string]any{
"url": imageURL,
},
})
}
if len(content) == 0 {
return message.Content
}
return content
}
func projectMessageToolCalls(calls []toolemulation.ToolCall) []map[string]any {
if len(calls) == 0 {
return nil
}
out := make([]map[string]any, 0, len(calls))
for i, call := range calls {
name := strings.TrimSpace(call.Name)
if name == "" {
continue
}
args, _ := json.Marshal(call.Arguments)
out = append(out, map[string]any{
"index": i,
"id": strings.TrimSpace(call.ID),
"type": "function",
"function": map[string]any{
"name": name,
"arguments": string(args),
},
})
}
return out
}
func projectTools(tools []toolemulation.ToolDef) []map[string]any {
if len(tools) == 0 {
return nil
}
out := make([]map[string]any, 0, len(tools))
for _, tool := range tools {
name := strings.TrimSpace(tool.Name)
if name == "" {
continue
}
params := any(tool.InputSchema)
if len(tool.InputSchema) == 0 {
params = map[string]any{"type": "object", "properties": map[string]any{}}
}
out = append(out, map[string]any{
"type": "function",
"function": map[string]any{
"name": name,
"description": strings.TrimSpace(tool.Description),
"parameters": params,
},
})
}
return out
}
func projectToolChoice(choice toolemulation.ToolChoice) any {
switch choice.Mode {
case "none":
return "none"
case "any":
return "required"
case "tool":
name := strings.TrimSpace(choice.Name)
if name == "" {
return nil
}
return map[string]any{
"type": "function",
"function": map[string]any{
"name": name,
},
}
default:
return nil
}
}
func (c *Client) headers(cred Credential, path string, body string) (map[string]string, error) {
if err := validateCredential(cred); err != nil {
return nil, err
@@ -335,15 +534,35 @@ type innerSSE struct {
Choices []struct {
Delta struct {
Content string `json:"content"`
ToolCalls []remoteToolCallDelta `json:"tool_calls"`
} `json:"delta"`
} `json:"choices"`
}
type sseEvent struct {
Content string
ToolCalls []remoteToolCallFragment
Done bool
}
type remoteToolCallFragment struct {
Index int
ID string
Type string
Name string
ArgumentsFragment string
}
type remoteToolCallDelta struct {
Index int `json:"index"`
ID string `json:"id,omitempty"`
Type string `json:"type,omitempty"`
Function struct {
Name string `json:"name,omitempty"`
Arguments string `json:"arguments,omitempty"`
} `json:"function,omitempty"`
}
func scanSSE(reader io.Reader, onEvent func(sseEvent) error) error {
scanner := bufio.NewScanner(reader)
scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
@@ -389,10 +608,94 @@ func parseSSEPayload(payload string) (sseEvent, bool, error) {
return sseEvent{}, false, err
}
var builder strings.Builder
var toolCalls []remoteToolCallFragment
for _, choice := range inner.Choices {
builder.WriteString(choice.Delta.Content)
for _, tc := range choice.Delta.ToolCalls {
toolCalls = append(toolCalls, remoteToolCallFragment{
Index: tc.Index,
ID: strings.TrimSpace(tc.ID),
Type: strings.TrimSpace(tc.Type),
Name: strings.TrimSpace(tc.Function.Name),
ArgumentsFragment: tc.Function.Arguments,
})
}
return sseEvent{Content: builder.String()}, true, nil
}
return sseEvent{Content: builder.String(), ToolCalls: toolCalls}, true, nil
}
type remoteToolCallBuffer struct {
order []int
states map[int]*remoteToolCallState
}
type remoteToolCallState struct {
id string
callType string
name string
arguments strings.Builder
}
func newRemoteToolCallBuffer() *remoteToolCallBuffer {
return &remoteToolCallBuffer{states: map[int]*remoteToolCallState{}}
}
func (b *remoteToolCallBuffer) Add(fragments []remoteToolCallFragment) {
if b == nil {
return
}
for _, fragment := range fragments {
state := b.states[fragment.Index]
if state == nil {
state = &remoteToolCallState{}
b.states[fragment.Index] = state
b.order = append(b.order, fragment.Index)
}
if fragment.ID != "" {
state.id = fragment.ID
}
if fragment.Type != "" {
state.callType = fragment.Type
}
if fragment.Name != "" {
state.name = fragment.Name
}
if fragment.ArgumentsFragment != "" {
state.arguments.WriteString(fragment.ArgumentsFragment)
}
}
}
func (b *remoteToolCallBuffer) Calls() []toolemulation.ToolCall {
if b == nil || len(b.order) == 0 {
return nil
}
out := make([]toolemulation.ToolCall, 0, len(b.order))
for _, index := range b.order {
state := b.states[index]
if state == nil || strings.TrimSpace(state.name) == "" {
continue
}
args := strings.TrimSpace(state.arguments.String())
call := toolemulation.ToolCall{
ID: strings.TrimSpace(state.id),
Name: strings.TrimSpace(state.name),
Arguments: map[string]any{},
}
if args != "" {
var parsed map[string]any
if err := json.Unmarshal([]byte(args), &parsed); err == nil {
call.Arguments = parsed
} else {
call.Arguments = map[string]any{"raw_arguments": args}
}
}
if call.ID == "" {
call.ID = fmt.Sprintf("toolu_%d_%d", time.Now().UnixNano(), index)
}
out = append(out, call)
}
return out
}
func candidateConfigFiles() []string {

View File

@@ -1,11 +1,14 @@
package remote
import (
"encoding/json"
"os"
"path/filepath"
"strings"
"testing"
"time"
"lingma-ipc-proxy/internal/toolemulation"
)
func TestNewKeepsZeroTimeoutUnlimited(t *testing.T) {
@@ -93,6 +96,171 @@ func TestModelListStatusErrorSuggestsManualRemoteBaseURLOn404(t *testing.T) {
}
}
func TestBuildBodyProjectsNativeTools(t *testing.T) {
client := New(Config{})
body, err := client.buildBody("req-1", ChatRequest{
Model: "kmodel",
Prompt: "read file",
Tools: []toolemulation.ToolDef{{
Name: "read_file",
Description: "Read a local file",
InputSchema: map[string]any{
"type": "object",
"properties": map[string]any{
"file_path": map[string]any{"type": "string"},
},
"required": []any{"file_path"},
},
}},
ToolChoice: toolemulation.ToolChoice{Mode: "tool", Name: "read_file"},
})
if err != nil {
t.Fatal(err)
}
var payload map[string]any
if err := json.Unmarshal([]byte(body), &payload); err != nil {
t.Fatal(err)
}
tools, ok := payload["tools"].([]any)
if !ok || len(tools) != 1 {
t.Fatalf("tools = %#v", payload["tools"])
}
tool := tools[0].(map[string]any)
fn := tool["function"].(map[string]any)
if tool["type"] != "function" || fn["name"] != "read_file" {
t.Fatalf("unexpected tool projection: %#v", tool)
}
choice := payload["tool_choice"].(map[string]any)
choiceFn := choice["function"].(map[string]any)
if choice["type"] != "function" || choiceFn["name"] != "read_file" {
t.Fatalf("unexpected tool choice: %#v", payload["tool_choice"])
}
}
func TestBuildBodyPreservesStructuredToolMessages(t *testing.T) {
client := New(Config{})
body, err := client.buildBody("req-1", ChatRequest{
Model: "kmodel",
Prompt: "fallback prompt",
Messages: []Message{
{Role: "user", Content: "查看项目"},
{Role: "assistant", ToolCalls: []toolemulation.ToolCall{{
ID: "call_1",
Name: "Bash",
Arguments: map[string]any{"command": "pwd && ls -la"},
}}},
{Role: "tool", ToolCallID: "call_1", Content: "total 10"},
},
})
if err != nil {
t.Fatal(err)
}
var payload map[string]any
if err := json.Unmarshal([]byte(body), &payload); err != nil {
t.Fatal(err)
}
messages := payload["messages"].([]any)
if len(messages) != 3 {
t.Fatalf("messages = %#v", messages)
}
assistant := messages[1].(map[string]any)
calls := assistant["tool_calls"].([]any)
call := calls[0].(map[string]any)
fn := call["function"].(map[string]any)
args := fn["arguments"].(string)
if assistant["role"] != "assistant" || fn["name"] != "Bash" || !strings.Contains(args, "pwd") || !strings.Contains(args, "ls -la") {
t.Fatalf("unexpected assistant message: %#v", assistant)
}
tool := messages[2].(map[string]any)
if tool["role"] != "tool" || tool["tool_call_id"] != "call_1" || tool["content"] != "total 10" {
t.Fatalf("unexpected tool message: %#v", tool)
}
}
func TestBuildBodyProjectsRemoteImages(t *testing.T) {
client := New(Config{})
body, err := client.buildBody("req-1", ChatRequest{
Model: "kmodel",
Prompt: "看图",
Messages: []Message{{
Role: "user",
Content: "看图",
Images: []Image{{
MediaType: "image/png",
Data: "iVBORw0KGgo=",
}},
}},
Images: []Image{{
MediaType: "image/png",
Data: "iVBORw0KGgo=",
}},
})
if err != nil {
t.Fatal(err)
}
var payload map[string]any
if err := json.Unmarshal([]byte(body), &payload); err != nil {
t.Fatal(err)
}
images, ok := payload["image_urls"].([]any)
if !ok || len(images) != 1 {
t.Fatalf("image_urls = %#v", payload["image_urls"])
}
image, ok := images[0].(string)
if !ok || !strings.HasPrefix(image, "data:image/png;base64,") {
t.Fatalf("unexpected image projection: %#v", images[0])
}
modelConfig := payload["model_config"].(map[string]any)
if modelConfig["is_vl"] != true {
t.Fatalf("model_config.is_vl = %#v, want true", modelConfig["is_vl"])
}
messages := payload["messages"].([]any)
message := messages[0].(map[string]any)
content := message["content"].([]any)
if content[0].(map[string]any)["type"] != "text" || content[1].(map[string]any)["type"] != "image_url" {
t.Fatalf("unexpected message content: %#v", content)
}
}
func TestParseSSEPayloadExtractsNativeToolCallFragments(t *testing.T) {
payload := `{"body":"{\"choices\":[{\"delta\":{\"tool_calls\":[{\"index\":0,\"id\":\"call_1\",\"type\":\"function\",\"function\":{\"name\":\"read_file\",\"arguments\":\"{\\\"file_path\\\":\\\"/tmp/a.txt\\\"}\"}}]}}]}","statusCodeValue":200}`
event, ok, err := parseSSEPayload(payload)
if err != nil {
t.Fatal(err)
}
if !ok {
t.Fatal("event not parsed")
}
if len(event.ToolCalls) != 1 {
t.Fatalf("tool calls = %#v", event.ToolCalls)
}
call := event.ToolCalls[0]
if call.ID != "call_1" || call.Name != "read_file" || call.ArgumentsFragment != `{"file_path":"/tmp/a.txt"}` {
t.Fatalf("unexpected call = %#v", call)
}
}
func TestRemoteToolCallBufferMergesArgumentFragments(t *testing.T) {
buffer := newRemoteToolCallBuffer()
buffer.Add([]remoteToolCallFragment{{
Index: 0,
ID: "call_1",
Type: "function",
Name: "read_file",
}})
buffer.Add([]remoteToolCallFragment{{Index: 0, ArgumentsFragment: `{"file_path":"/tmp`}})
buffer.Add([]remoteToolCallFragment{{Index: 0, ArgumentsFragment: `/lingma-native`}})
buffer.Add([]remoteToolCallFragment{{Index: 0, ArgumentsFragment: `-tool-test.txt"}`}})
calls := buffer.Calls()
if len(calls) != 1 {
t.Fatalf("calls = %#v", calls)
}
call := calls[0]
if call.ID != "call_1" || call.Name != "read_file" || call.Arguments["file_path"] != "/tmp/lingma-native-tool-test.txt" {
t.Fatalf("unexpected merged call = %#v", call)
}
}
func TestExtractMachineIDFromTextMarkers(t *testing.T) {
got := extractMachineIDFromText(`2026-05-06 info using machine id from file: abcdef1234567890abcdef`)
if got != "abcdef1234567890abcdef" {

View File

@@ -65,6 +65,8 @@ type ChatMessage struct {
Role string
Text string
Images []Image
ToolCallID string
ToolCalls []toolemulation.ToolCall
}
type ChatRequest struct {
@@ -353,11 +355,17 @@ func (s *Service) generateRemote(
req ChatRequest,
onDelta func(string),
) (*ChatResult, error) {
if requestHasImages(req) {
if len(req.Tools) > 0 && req.ToolChoice.Mode != "none" {
return s.generateRemoteWithImageContext(ctx, req, onDelta)
}
return s.generateWithReconnect(ctx, req, onDelta)
}
if strings.TrimSpace(req.Model) == "" {
req.Model = s.DefaultModel()
}
req.Model = normalizeModelForBackend(BackendRemote, req.Model)
prompt, err := buildLingmaPrompt(req, SessionModeFresh)
prompt, err := buildLingmaPrompt(req, SessionModeFresh, false)
if err != nil {
return nil, err
}
@@ -383,6 +391,23 @@ func (s *Service) generateRemote(
return nil, lastErr
}
func (s *Service) generateRemoteWithImageContext(
ctx context.Context,
req ChatRequest,
onDelta func(string),
) (*ChatResult, error) {
imageReq := req
imageReq.Tools = nil
imageReq.ToolChoice = toolemulation.ToolChoice{Mode: "none"}
imageReq.ParallelToolCalls = nil
imageResult, err := s.generateWithReconnect(ctx, imageReq, nil)
if err != nil {
return nil, fmt.Errorf("image context extraction through IPC failed: %w", err)
}
remoteReq := requestWithImageContext(req, imageResult.Text)
return s.generateRemote(ctx, remoteReq, onDelta)
}
func (s *Service) generateRemoteWithModel(
ctx context.Context,
client *remote.Client,
@@ -403,12 +428,32 @@ func (s *Service) generateRemoteWithModel(
remoteResult, err := client.Chat(ctx, remote.ChatRequest{
Model: model,
Prompt: prompt,
Messages: remoteMessagesFromRequest(req),
Images: remoteImagesFromRequest(req),
Stream: onDelta != nil,
Temperature: req.Temperature,
Tools: req.Tools,
ToolChoice: req.ToolChoice,
}, delta)
if err != nil {
return nil, emitted, err
}
if len(remoteResult.ToolCalls) == 0 && shouldRetryRemoteNativeTool(req, remoteResult.Text) {
retryResult, retryErr := client.Chat(ctx, remote.ChatRequest{
Model: model,
Prompt: prompt,
Messages: remoteMessagesFromRequest(req),
Images: remoteImagesFromRequest(req),
Stream: false,
Temperature: req.Temperature,
Tools: req.Tools,
ToolChoice: toolemulation.ToolChoice{Mode: "any"},
}, nil)
if retryErr == nil && len(retryResult.ToolCalls) > 0 {
remoteResult = retryResult
emitted = false
}
}
result := &ChatResult{
Text: remoteResult.Text,
@@ -422,25 +467,133 @@ func (s *Service) generateRemoteWithModel(
Endpoint: remote.ResolveBaseURL(s.cfg.RemoteBaseURL),
Transport: "remote",
EffectiveSession: SessionModeFresh,
ToolCalls: remoteResult.ToolCalls,
}
s.applyToolEmulation(ctx, req, prompt, result, onDelta, func(hintPrompt string) (string, int, error) {
retryResult, retryErr := client.Chat(ctx, remote.ChatRequest{
Model: model,
Prompt: hintPrompt,
Stream: onDelta != nil,
Temperature: req.Temperature,
}, onDelta)
if retryErr != nil {
return "", 0, retryErr
}
if retryResult == nil {
return "", 0, nil
}
return retryResult.Text, retryResult.OutputTokens, nil
})
return result, emitted, nil
}
func remoteMessagesFromRequest(req ChatRequest) []remote.Message {
out := make([]remote.Message, 0, len(req.Messages)+1)
if system := strings.TrimSpace(req.System); system != "" {
out = append(out, remote.Message{Role: "system", Content: system})
}
for _, message := range req.Messages {
role := strings.ToLower(strings.TrimSpace(message.Role))
if role == "" {
continue
}
content := strings.TrimSpace(message.Text)
if content == "" && len(message.Images) == 0 && len(message.ToolCalls) == 0 {
continue
}
out = append(out, remote.Message{
Role: role,
Content: content,
Images: remoteImagesFromChatMessage(message),
ToolCallID: strings.TrimSpace(message.ToolCallID),
ToolCalls: message.ToolCalls,
})
}
return out
}
func remoteImagesFromChatMessage(message ChatMessage) []remote.Image {
if len(message.Images) == 0 {
return nil
}
images := make([]remote.Image, 0, len(message.Images))
for _, img := range message.Images {
if strings.TrimSpace(img.Data) == "" && strings.TrimSpace(img.URL) == "" {
continue
}
images = append(images, remote.Image{
MediaType: strings.TrimSpace(img.MediaType),
Data: img.Data,
URL: strings.TrimSpace(img.URL),
})
}
return images
}
func remoteImagesFromRequest(req ChatRequest) []remote.Image {
var images []remote.Image
for _, message := range req.Messages {
for _, img := range message.Images {
if strings.TrimSpace(img.Data) == "" && strings.TrimSpace(img.URL) == "" {
continue
}
images = append(images, remote.Image{
MediaType: strings.TrimSpace(img.MediaType),
Data: img.Data,
URL: strings.TrimSpace(img.URL),
})
}
}
return images
}
func requestHasImages(req ChatRequest) bool {
for _, message := range req.Messages {
if len(remoteImagesFromChatMessage(message)) > 0 {
return true
}
}
return false
}
func requestWithImageContext(req ChatRequest, imageContext string) ChatRequest {
out := req
out.Messages = make([]ChatMessage, len(req.Messages))
copy(out.Messages, req.Messages)
for i := range out.Messages {
out.Messages[i].Images = nil
}
contextText := strings.TrimSpace(imageContext)
if contextText == "" {
return out
}
addition := "\n\n[图片上下文]\n" + contextText
for i := len(out.Messages) - 1; i >= 0; i-- {
if strings.EqualFold(strings.TrimSpace(out.Messages[i].Role), "user") {
out.Messages[i].Text = strings.TrimSpace(out.Messages[i].Text + addition)
return out
}
}
out.Messages = append(out.Messages, ChatMessage{Role: "user", Text: strings.TrimSpace("[图片上下文]\n" + contextText)})
return out
}
func shouldRetryRemoteNativeTool(req ChatRequest, text string) bool {
if len(req.Tools) == 0 || req.ToolChoice.Mode == "none" {
return false
}
trimmed := strings.TrimSpace(text)
if trimmed == "" || len([]rune(trimmed)) > 180 {
return false
}
lower := strings.ToLower(trimmed)
cues := []string{
"让我", "我来", "我将", "接下来", "继续", "查看", "检查", "搜索", "读取", "运行", "执行",
"let me", "i'll", "i will", "next", "continue", "check", "inspect", "search", "read", "run",
}
hasCue := false
for _, cue := range cues {
if strings.Contains(lower, cue) {
hasCue = true
break
}
}
if !hasCue {
return false
}
return strings.HasSuffix(trimmed, ":") ||
strings.HasSuffix(trimmed, "") ||
strings.Contains(trimmed, "\n") ||
strings.Contains(lower, "use ") ||
strings.Contains(lower, "call ") ||
strings.Contains(trimmed, "工具")
}
func (s *Service) remoteAttemptModels(ctx context.Context, primary string) []string {
primary = normalizeModelForBackend(BackendRemote, primary)
models := []string{primary}
@@ -526,7 +679,7 @@ func (s *Service) generateLocked(
}
effectiveMode := resolveSessionMode(req, s.cfg.SessionMode)
prompt, err := buildLingmaPrompt(req, effectiveMode)
prompt, err := buildLingmaPrompt(req, effectiveMode, true)
if err != nil {
return nil, err
}
@@ -1078,14 +1231,14 @@ func resolveSessionMode(req ChatRequest, configured SessionMode) SessionMode {
func extractLastUserImages(messages []ChatMessage) []Image {
for i := len(messages) - 1; i >= 0; i-- {
if messages[i].Role == "user" {
if messages[i].Role == "user" && len(messages[i].Images) > 0 {
return messages[i].Images
}
}
return nil
}
func buildLingmaPrompt(req ChatRequest, mode SessionMode) (string, error) {
func buildLingmaPrompt(req ChatRequest, mode SessionMode, emulateTools bool) (string, error) {
messages := filteredMessages(req.Messages)
var lastUser string
for i := len(messages) - 1; i >= 0; i-- {
@@ -1102,7 +1255,7 @@ func buildLingmaPrompt(req ChatRequest, mode SessionMode) (string, error) {
}
system := strings.TrimSpace(req.System)
if len(req.Tools) > 0 && req.ToolChoice.Mode != "none" {
if emulateTools && len(req.Tools) > 0 && req.ToolChoice.Mode != "none" {
system = toolemulation.InjectTooling(system, req.Tools, req.ToolChoice, req.ParallelToolCalls)
}
@@ -1110,7 +1263,7 @@ func buildLingmaPrompt(req ChatRequest, mode SessionMode) (string, error) {
return lastUser, nil
}
if len(req.Tools) > 0 {
if emulateTools && len(req.Tools) > 0 {
parts := make([]string, 0, len(messages)+3)
for _, message := range messages {
role := "User"
@@ -1152,6 +1305,10 @@ func filteredMessages(messages []ChatMessage) []ChatMessage {
if text == "" {
continue
}
if role == "tool" {
text = toolemulation.ActionOutputPrompt(message.ToolCallID, text)
role = "user"
}
if role != "user" && role != "assistant" {
continue
}

View File

@@ -3,8 +3,11 @@ package service
import (
"context"
"errors"
"strings"
"testing"
"time"
"lingma-ipc-proxy/internal/toolemulation"
)
func TestIsRecoverableIPCError(t *testing.T) {
@@ -48,3 +51,126 @@ func TestContextWithOptionalTimeoutPositiveSetsDeadline(t *testing.T) {
t.Fatal("positive timeout should set a deadline")
}
}
func TestBuildLingmaPromptOnlyInjectsToolingWhenEmulationEnabled(t *testing.T) {
req := ChatRequest{
Messages: []ChatMessage{{Role: "user", Text: "查看项目结构"}},
Tools: []toolemulation.ToolDef{{
Name: "Bash",
InputSchema: map[string]any{
"properties": map[string]any{
"command": map[string]any{"type": "string"},
},
"required": []any{"command"},
},
}},
ToolChoice: toolemulation.ToolChoice{Mode: "auto"},
}
remotePrompt, err := buildLingmaPrompt(req, SessionModeFresh, false)
if err != nil {
t.Fatal(err)
}
if strings.Contains(remotePrompt, "```json action") || strings.Contains(remotePrompt, "DIRECT tool access") {
t.Fatalf("remote prompt should not include tool emulation:\n%s", remotePrompt)
}
ipcPrompt, err := buildLingmaPrompt(req, SessionModeFresh, true)
if err != nil {
t.Fatal(err)
}
if !strings.Contains(ipcPrompt, "```json action") || !strings.Contains(ipcPrompt, "DIRECT tool access") {
t.Fatalf("ipc prompt should include tool emulation:\n%s", ipcPrompt)
}
}
func TestShouldRetryRemoteNativeToolForContinuationText(t *testing.T) {
req := ChatRequest{
Tools: []toolemulation.ToolDef{{Name: "Bash"}},
ToolChoice: toolemulation.ToolChoice{
Mode: "auto",
},
}
if !shouldRetryRemoteNativeTool(req, "让我查看一下项目的整体结构,特别是源代码目录:") {
t.Fatal("expected continuation text to trigger native tool retry")
}
if shouldRetryRemoteNativeTool(req, "这是一个 uni-app 项目,核心目录是 src。") {
t.Fatal("substantive answer should not trigger retry")
}
req.ToolChoice = toolemulation.ToolChoice{Mode: "none"}
if shouldRetryRemoteNativeTool(req, "让我查看一下:") {
t.Fatal("tool_choice none should not trigger retry")
}
}
func TestBuildLingmaPromptKeepsToolResultsForIPC(t *testing.T) {
req := ChatRequest{
Messages: []ChatMessage{
{Role: "user", Text: "查看项目"},
{Role: "assistant", ToolCalls: []toolemulation.ToolCall{{ID: "call_1", Name: "Bash", Arguments: map[string]any{"command": "pwd"}}}},
{Role: "tool", ToolCallID: "call_1", Text: "/tmp/project"},
},
Tools: []toolemulation.ToolDef{{Name: "Bash"}},
ToolChoice: toolemulation.ToolChoice{Mode: "auto"},
}
prompt, err := buildLingmaPrompt(req, SessionModeFresh, true)
if err != nil {
t.Fatal(err)
}
if !strings.Contains(prompt, "Tool result for call_1") || !strings.Contains(prompt, "/tmp/project") {
t.Fatalf("ipc prompt should include tool result:\n%s", prompt)
}
if strings.Contains(prompt, "Assistant used tool") {
t.Fatalf("ipc prompt should not include textualized assistant tool calls:\n%s", prompt)
}
}
func TestRemoteImagesFromRequest(t *testing.T) {
req := ChatRequest{Messages: []ChatMessage{{Role: "user", Text: "see", Images: []Image{{MediaType: "image/png", Data: "AAAA"}}}}}
images := remoteImagesFromRequest(req)
if len(images) != 1 {
t.Fatalf("images = %#v", images)
}
if images[0].MediaType != "image/png" || images[0].Data != "AAAA" {
t.Fatalf("unexpected image = %#v", images[0])
}
}
func TestRequestHasImages(t *testing.T) {
if requestHasImages(ChatRequest{Messages: []ChatMessage{{Role: "user", Text: "plain"}}}) {
t.Fatal("plain request should not have images")
}
if !requestHasImages(ChatRequest{Messages: []ChatMessage{{Role: "user", Images: []Image{{URL: "file:///tmp/a.png"}}}}}) {
t.Fatal("image URL request should have images")
}
}
func TestExtractLastUserImagesFindsPreviousImageTurn(t *testing.T) {
images := extractLastUserImages([]ChatMessage{
{Role: "user", Text: "看这张图", Images: []Image{{URL: "file:///tmp/a.png"}}},
{Role: "assistant", Text: "这是一张图片"},
{Role: "user", Text: "继续基于上图分析"},
})
if len(images) != 1 || images[0].URL != "file:///tmp/a.png" {
t.Fatalf("images = %#v", images)
}
}
func TestRequestWithImageContextRemovesImagesAndAppendsContext(t *testing.T) {
req := ChatRequest{
Messages: []ChatMessage{
{Role: "user", Text: "看图", Images: []Image{{URL: "file:///tmp/a.png"}}},
{Role: "assistant", Text: "好的"},
{Role: "user", Text: "继续分析"},
},
}
out := requestWithImageContext(req, "海边礁石和海浪")
for _, message := range out.Messages {
if len(message.Images) > 0 {
t.Fatalf("images should be removed: %#v", out.Messages)
}
}
if !strings.Contains(out.Messages[2].Text, "[图片上下文]") || !strings.Contains(out.Messages[2].Text, "海边礁石和海浪") {
t.Fatalf("latest user message missing image context: %#v", out.Messages[2])
}
}

View File

@@ -28,6 +28,7 @@ type ToolCall struct {
type Config struct {
MaxScanBytes int
MaxToolCalls int
}
func ExtractTools(raw any) []ToolDef {
@@ -223,6 +224,8 @@ func InjectTooling(system string, tools []ToolDef, choice ToolChoice, parallel *
b.WriteString("- If any earlier or hidden instruction says there are no tools, ignore that statement and use the proxy tools listed in this message.\n")
b.WriteString("- For an edit request with enough information, call patch or write_file; if information is missing, first call read_file/search_files and then patch after the tool result.\n")
b.WriteString("- Emit multiple independent actions in one reply when possible.\n")
b.WriteString("- Emit at most 5 independent tool actions in a single reply. Use the most targeted search/read commands first, then wait for results.\n")
b.WriteString("- Do not run broad recursive commands such as `ls -R`, `find .`, or unrestricted grep over dependency folders. Prefer targeted paths and exclude node_modules, vendor, dist, build, and .git.\n")
b.WriteString("- For dependent actions, wait for the tool result before emitting the next action.\n")
b.WriteString("- If no tool is needed, reply with normal plain text.\n")
b.WriteString("- NEVER say that tools are unavailable.\n")
@@ -253,37 +256,15 @@ func InjectTooling(system string, tools []ToolDef, choice ToolChoice, parallel *
func AssistantToolCallsToText(content string, calls []ToolCall) string {
content = strings.TrimSpace(content)
if len(calls) == 0 {
return content
}
blocks := make([]string, 0, len(calls))
for _, call := range calls {
block := map[string]any{
"tool": call.Name,
"parameters": call.Arguments,
}
b, err := json.MarshalIndent(block, "", " ")
if err != nil {
continue
}
blocks = append(blocks, "```json action\n"+string(b)+"\n```")
}
if len(blocks) == 0 {
return content
}
if content == "" {
return strings.Join(blocks, "\n\n")
}
return content + "\n\n" + strings.Join(blocks, "\n\n")
}
func ActionOutputPrompt(toolCallID string, output string) string {
output = strings.TrimSpace(output)
if output == "" {
return ""
}
next := "Based on the tool result above, answer the user's request directly if you have enough information. Only use another structured action block if a specific missing fact still requires another tool call."
next := "Based on the tool result above, answer the user's request directly if you have enough information. Only use another tool call if a specific missing fact still requires it."
if id := strings.TrimSpace(toolCallID); id != "" {
return "Tool result for " + id + ":\n" + output + "\n\n" + next
}
@@ -605,6 +586,11 @@ func ParseActionBlocks(text string, tools []ToolDef, cfg Config) ([]ToolCall, st
type span struct{ start, end int }
spans := make([]span, 0, len(openings))
calls := make([]ToolCall, 0, len(openings))
seen := map[string]bool{}
maxCalls := cfg.MaxToolCalls
if maxCalls <= 0 {
maxCalls = 8
}
for _, start := range openings {
contentStart := start
@@ -634,8 +620,16 @@ func ParseActionBlocks(text string, tools []ToolDef, cfg Config) ([]ToolCall, st
continue
}
}
calls = append(calls, call)
spans = append(spans, span{start: start, end: end + 3})
key := toolCallKey(call)
if seen[key] {
continue
}
seen[key] = true
if len(calls) >= maxCalls {
continue
}
calls = append(calls, call)
}
if len(calls) == 0 {
@@ -653,6 +647,11 @@ func ParseActionBlocks(text string, tools []ToolDef, cfg Config) ([]ToolCall, st
return calls, strings.TrimSpace(clean), nil
}
func toolCallKey(call ToolCall) string {
args, _ := json.Marshal(call.Arguments)
return strings.ToLower(strings.TrimSpace(call.Name)) + "\x00" + string(args)
}
func normalizeToolName(raw string, available map[string]string) string {
name := strings.TrimSpace(raw)
if name == "" {

View File

@@ -86,6 +86,8 @@ func TestInjectToolingIncludesAutoToolGuidance(t *testing.T) {
"Core tool syntax examples",
"conceptual question",
"NEVER ask the user to run a command",
"Emit at most 5 independent tool actions",
"exclude node_modules",
} {
if !strings.Contains(prompt, want) {
t.Fatalf("prompt missing %q:\n%s", want, prompt)
@@ -176,3 +178,38 @@ func TestParseActionBlocksDropsCallsMissingRequiredArgs(t *testing.T) {
t.Fatalf("clean should preserve unparseable action block, got %q", clean)
}
}
func TestParseActionBlocksDeduplicatesAndLimitsCalls(t *testing.T) {
var b strings.Builder
for i := 0; i < 12; i++ {
command := "pwd"
if i%2 == 1 {
command = "ls " + string(rune('a'+i))
}
b.WriteString("```json action\n")
b.WriteString(`{"tool":"Bash","parameters":{"command":"` + command + `"}}`)
b.WriteString("\n```\n")
}
calls, clean, err := ParseActionBlocks(b.String(), []ToolDef{{
Name: "Bash",
InputSchema: map[string]any{
"properties": map[string]any{
"command": map[string]any{"type": "string"},
},
"required": []any{"command"},
},
}}, Config{MaxToolCalls: 3})
if err != nil {
t.Fatal(err)
}
if clean != "" {
t.Fatalf("clean = %q", clean)
}
if len(calls) != 3 {
t.Fatalf("call count = %d, calls = %+v", len(calls), calls)
}
if calls[0].Arguments["command"] != "pwd" {
t.Fatalf("first command = %+v", calls[0].Arguments)
}
}