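How to Tackle the High Cost and Risk of a Single Model: Spring AI Multi-Model Routing in Practice, Cutting Costs by Up to 70% and Improving Availability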

Date: 2026-06-02 16:30:01 Editor: 袖梨 Source: 一聚教程网

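In AI application development, a single-model architecture tends to run into two problems at once: cost and availability. This article shows how to build model routing with Spring AI so that operating cost drops while the system becomes more resilient to model outages.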

Many AI applications start out with a very simple architecture:

User request -> GPT-4o -> response

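As traffic grows, this design exposes three pain points:

Simple queries consume a premium model, which wastes money
A failure of the single model takes the whole feature down
Models at different price points cannot be combined into quality tiers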

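The fix is a routing layer that assigns each request to a model according to task complexity.

The multi-model routing scheme implemented in this article consists of:

SIMPLE: a local Ollama model handles FAQs and short Q&A
MEDIUM: GPT-4o-mini handles mid-level tasks such as summarizing and rewriting
COMPLEX: GPT-4o handles demanding work such as code generation
On failure, requests degrade in the order COMPLEX -> MEDIUM -> SIMPLE
When the budget is exceeded, requests switch to SIMPLE automatically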

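1. Architecture

Spring AI lets you create multiple ChatClient instances, and the official documentation covers working with multiple chat models for different use cases. The core architecture looks like this: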

Request
  -> Complexity Assessor
  -> Model Router
      -> SIMPLE  -> Ollama(qwen3:4b)
      -> MEDIUM  -> GPT-4o-mini
      -> COMPLEX -> GPT-4o
  -> Fallback Chain
  -> Cost Monitor

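The router reasons about what the task needs, not about which vendor serves it, so the business Controller is unaffected when the underlying models change.

2. Defining task complexity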

public enum TaskComplexity {
    SIMPLE,   // FAQ, short Q&A, knowledge lookup
    MEDIUM,   // summarizing, rewriting, routine copywriting
    COMPLEX   // code generation, long-document analysis, multi-step reasoning
}

Rule-based classification is enough to start with; it can be replaced by a lightweight classifier model later.
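To keep that later upgrade cheap, the classification step can sit behind a small interface. The following is a minimal sketch, not part of Spring AI: `ComplexityAssessor` and `KeywordComplexityAssessor` are hypothetical names, and the length thresholds simply mirror the ones used in the router in section 4.

```java
import java.util.List;

// Repeated here only so the snippet compiles on its own.
enum TaskComplexity { SIMPLE, MEDIUM, COMPLEX }

// Hypothetical abstraction: a rule-based or model-based assessor can be swapped in behind it.
interface ComplexityAssessor {
    TaskComplexity assess(String input);
}

// Rule-based implementation driven by configurable, lowercase keyword lists.
final class KeywordComplexityAssessor implements ComplexityAssessor {
    private final List<String> simpleHints;
    private final List<String> complexHints;

    KeywordComplexityAssessor(List<String> simpleHints, List<String> complexHints) {
        this.simpleHints = List.copyOf(simpleHints);
        this.complexHints = List.copyOf(complexHints);
    }

    @Override
    public TaskComplexity assess(String input) {
        String text = input == null ? "" : input.toLowerCase();
        // Short question with an FAQ-style hint: cheapest tier.
        if (text.length() < 50 && containsAny(text, simpleHints)) {
            return TaskComplexity.SIMPLE;
        }
        // Long input or a "heavy" keyword: strongest tier.
        if (text.length() > 200 || containsAny(text, complexHints)) {
            return TaskComplexity.COMPLEX;
        }
        return TaskComplexity.MEDIUM;
    }

    private static boolean containsAny(String text, List<String> hints) {
        return hints.stream().anyMatch(text::contains);
    }
}
```

A router that depends only on `ComplexityAssessor` would not need to change when the keyword rules are replaced by a classifier.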

3. Configuring multiple ChatClients

Example application.yml:

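spring:
  ai:
    chat:
      client:
        enabled: false   # disable the auto-configured ChatClient.Builder; clients are built manually below
    openai:
      api-key: ${OPENAI_API_KEY}
    ollama:
      base-url:          # URL lost in the source; Ollama's default endpoint is http://localhost:11434
ai:
  budget:
    monthly: 1000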

The local model has to be pulled in advance:

ollama pull qwen3:4b

The multi-model configuration class:

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.ai.ollama.api.OllamaChatOptions;
import org.springframework.ai.openai.OpenAiChatModel;
import org.springframework.ai.openai.OpenAiChatOptions;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MultiModelConfig {

    // SIMPLE tier: local Ollama model
    @Bean("simpleClient")
    public ChatClient simpleClient(OllamaChatModel ollamaChatModel) {
        return ChatClient.builder(ollamaChatModel)
            .defaultOptions(OllamaChatOptions.builder()
                .model("qwen3:4b")
                .temperature(0.2)
                .build())
            .build();
    }

    // MEDIUM tier: GPT-4o-mini
    @Bean("mediumClient")
    public ChatClient mediumClient(OpenAiChatModel openAiChatModel) {
        return ChatClient.builder(openAiChatModel)
            .defaultOptions(OpenAiChatOptions.builder()
                .model("gpt-4o-mini")
                .temperature(0.3)
                .build())
            .build();
    }

    // COMPLEX tier: GPT-4o
    @Bean("complexClient")
    public ChatClient complexClient(OpenAiChatModel openAiChatModel) {
        return ChatClient.builder(openAiChatModel)
            .defaultOptions(OpenAiChatOptions.builder()
                .model("gpt-4o")
                .temperature(0.2)
                .build())
            .build();
    }
}

4. Implementing the router

import java.util.List;

import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Service;

@Service
@Slf4j
public class ModelRouter {

    private final ChatClient simpleClient;
    private final ChatClient mediumClient;
    private final ChatClient complexClient;

    public ModelRouter(
            @Qualifier("simpleClient") ChatClient simpleClient,
            @Qualifier("mediumClient") ChatClient mediumClient,
            @Qualifier("complexClient") ChatClient complexClient) {
        this.simpleClient = simpleClient;
        this.mediumClient = mediumClient;
        this.complexClient = complexClient;
    }

    // Classify the input, then route it through the matching fallback chain.
    public String route(String userInput) {
        TaskComplexity complexity = assessComplexity(userInput);
        return routeTo(complexity, userInput);
    }

    public String routeTo(TaskComplexity complexity, String userInput) {
        return executeWithFallback(fallbackChain(complexity), userInput);
    }

    // Used by the budget guard: always goes to the local model.
    public String routeToSimple(String userInput) {
        return executeWithFallback(List.of(simpleClient), userInput);
    }

    // Rule-based classification; the keywords target Chinese user input.
    public TaskComplexity assessComplexity(String input) {
        int length = input.length();
        String lower = input.toLowerCase();

        // Short FAQ-style question: "how", "what is", "may I ask"
        if (length < 50 &&
            (lower.contains("怎么") || lower.contains("什么是") || lower.contains("请问"))) {
            return TaskComplexity.SIMPLE;
        }

        // Long input, or: "implement", "write", "analyze", "compare", "architecture", "performance tuning"
        if (length > 200 ||
            lower.contains("实现") || lower.contains("编写") ||
            lower.contains("分析") || lower.contains("比较") ||
            lower.contains("架构") || lower.contains("性能优化")) {
            return TaskComplexity.COMPLEX;
        }

        return TaskComplexity.MEDIUM;
    }

    // Chains only ever move toward cheaper tiers.
    private List<ChatClient> fallbackChain(TaskComplexity complexity) {
        return switch (complexity) {
            case COMPLEX -> List.of(complexClient, mediumClient, simpleClient);
            case MEDIUM -> List.of(mediumClient, simpleClient);
            case SIMPLE -> List.of(simpleClient);
        };
    }

    // Try each client in order; return the first successful answer.
    private String executeWithFallback(List<ChatClient> clients, String input) {
        for (int i = 0; i < clients.size(); i++) {
            try {
                return clients.get(i)
                    .prompt()
                    .system("You are an enterprise AI assistant. Answer accurately and concisely; say so when you are unsure.")
                    .user(input)
                    .call()
                    .content();
            } catch (Exception e) {
                log.warn("Model call failed, index={}, reason={}", i, e.getMessage());
            }
        }

        throw new IllegalStateException("All model calls failed; the AI service is temporarily unavailable");
    }
}

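Key design principle: a fallback chain must flow in one direction only. With no loops or jumps back to an earlier model, every request is guaranteed to terminate, either with an answer or with a final error.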
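One way to make the one-directional property hold by construction, rather than by convention, is to derive each chain from the enum order. This is only a sketch under that assumption: `FallbackChains` is a hypothetical helper, and it relies on `TaskComplexity` being declared from cheapest to strongest tier, as it is in section 2.

```java
import java.util.ArrayList;
import java.util.List;

// Repeated here only so the snippet compiles on its own.
enum TaskComplexity { SIMPLE, MEDIUM, COMPLEX }

// Hypothetical helper: builds a chain that starts at the requested tier
// and only ever steps down to cheaper tiers, so it cannot loop.
final class FallbackChains {

    private FallbackChains() {}

    static List<TaskComplexity> descendingFrom(TaskComplexity start) {
        TaskComplexity[] tiers = TaskComplexity.values();
        List<TaskComplexity> chain = new ArrayList<>();
        // Walk ordinals downward: each tier appears at most once.
        for (int i = start.ordinal(); i >= 0; i--) {
            chain.add(tiers[i]);
        }
        return List.copyOf(chain);
    }
}
```

The three chains it produces match the `switch` in `fallbackChain` above; mapping each tier to its ChatClient would be a separate lookup.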

5. Adding budget-based degradation

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

@Service
@Slf4j
public class CostMonitoredRouter {

    // Rough per-call cost estimate for each tier
    private static final Map<TaskComplexity, Double> ESTIMATED_COST = Map.of(
        TaskComplexity.SIMPLE, 0.0,
        TaskComplexity.MEDIUM, 0.0003,
        TaskComplexity.COMPLEX, 0.003
    );

    private final ModelRouter router;
    private final double monthlyBudget;
    // In-memory call counters per tier
    private final Map<TaskComplexity, AtomicLong> callCounts = new ConcurrentHashMap<>();

    public CostMonitoredRouter(ModelRouter router,
                               @Value("${ai.budget.monthly:1000}") double monthlyBudget) {
        this.router = router;
        this.monthlyBudget = monthlyBudget;
    }

    public String routeWithBudgetCheck(String userInput) {
        TaskComplexity complexity = router.assessComplexity(userInput);
        double nextCost = ESTIMATED_COST.getOrDefault(complexity, 0.001);

        // If this call would push spend over budget, answer with the local model instead.
        if (currentMonthlySpend() + nextCost > monthlyBudget) {
            log.warn("Budget about to be exceeded, degrading to simpleClient");
            record(TaskComplexity.SIMPLE);
            return router.routeToSimple(userInput);
        }

        String response = router.routeTo(complexity, userInput);
        record(complexity);
        return response;
    }

    private void record(TaskComplexity complexity) {
        callCounts.computeIfAbsent(complexity, key -> new AtomicLong()).incrementAndGet();
    }

    // Estimated spend = sum over tiers of (call count * estimated cost per call)
    private double currentMonthlySpend() {
        return callCounts.entrySet().stream()
            .mapToDouble(entry -> entry.getValue().get()
                * ESTIMATED_COST.getOrDefault(entry.getKey(), 0.001))
            .sum();
    }
}

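In production, usage data should be written to persistent storage so that it survives restarts and can be used for cost analysis and auditing.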
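A related detail: the counters in `CostMonitoredRouter` are never reset, so they accumulate for the lifetime of the process rather than per calendar month. Below is a minimal in-memory sketch of month-bucketed accounting. `MonthlySpendLedger` is a hypothetical class, and the map here merely stands in for whatever database table a real deployment would use.

```java
import java.time.YearMonth;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.DoubleAdder;
import java.util.function.Supplier;

// Hypothetical ledger: estimated spend is bucketed by calendar month,
// so the "current month" total starts from zero after a rollover.
final class MonthlySpendLedger {
    private final Map<YearMonth, DoubleAdder> spendByMonth = new ConcurrentHashMap<>();
    private final Supplier<YearMonth> currentMonth;

    // The month supplier is injected so tests can control time;
    // production code could pass YearMonth::now.
    MonthlySpendLedger(Supplier<YearMonth> currentMonth) {
        this.currentMonth = currentMonth;
    }

    // Add the estimated cost of one call to the current month's bucket.
    void record(double estimatedCost) {
        spendByMonth.computeIfAbsent(currentMonth.get(), m -> new DoubleAdder())
                    .add(estimatedCost);
    }

    double currentMonthSpend() {
        return spendFor(currentMonth.get());
    }

    // Past months stay available for cost analysis.
    double spendFor(YearMonth month) {
        DoubleAdder adder = spendByMonth.get(month);
        return adder == null ? 0.0 : adder.sum();
    }
}
```

Persisting each bucket (month, tier, call count, estimated cost) is what would make the data usable for the audit and analysis mentioned above.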

6. Wiring up the Controller

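import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/ai")
public class AiController {

    private final ModelRouter router;
    private final CostMonitoredRouter costRouter;

    public AiController(ModelRouter router, CostMonitoredRouter costRouter) {
        this.router = router;
        this.costRouter = costRouter;
    }

    // Complexity-based routing only
    @GetMapping("/chat")
    public ResponseEntity<String> chat(@RequestParam String message) {
        return ResponseEntity.ok(router.route(message));
    }

    // Complexity-based routing plus the budget guard
    @GetMapping("/chat-with-budget")
    public ResponseEntity<String> chatWithBudget(@RequestParam String message) {
        return ResponseEntity.ok(costRouter.routeWithBudgetCheck(message));
    }
}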

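Constructor injection is used here, in line with Spring best practice.

7. Estimating the cost benefit

A typical task mix:

60% simple FAQ / short Q&A
30% mid-level tasks such as summarizing and rewriting
10% complex code analysis

Once the simple tasks move to a local model, monthly cost can fall by 50%-70%. The actual saving has to be calculated from your own traffic data.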
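The arithmetic can be sketched as follows, reusing the per-call estimates from `ESTIMATED_COST` in section 5 and taking "every request goes to GPT-4o" as the baseline. `RoutingCostEstimate` is a hypothetical class, and the result is an upper bound: it prices local inference at zero and assumes perfect routing with no fallback traffic.

```java
// Hypothetical calculator for the blended cost of a routed workload.
final class RoutingCostEstimate {

    // Per-call estimates copied from ESTIMATED_COST in section 5.
    static final double COST_SIMPLE = 0.0;
    static final double COST_MEDIUM = 0.0003;
    static final double COST_COMPLEX = 0.003;

    private RoutingCostEstimate() {}

    // Expected cost of one request given the share of traffic in each tier (shares sum to 1).
    static double blendedCostPerRequest(double simpleShare, double mediumShare, double complexShare) {
        return simpleShare * COST_SIMPLE
             + mediumShare * COST_MEDIUM
             + complexShare * COST_COMPLEX;
    }

    // Fraction of the baseline cost that routing removes.
    static double savingsVersus(double baselineCostPerRequest, double routedCostPerRequest) {
        return 1.0 - routedCostPerRequest / baselineCostPerRequest;
    }

    public static void main(String[] args) {
        double routed = blendedCostPerRequest(0.60, 0.30, 0.10);
        double saving = savingsVersus(COST_COMPLEX, routed);
        System.out.printf("routed=%.5f per request, saving=%.0f%%%n", routed, 100 * saving);
    }
}
```

With the 60/30/10 mix above, the blended cost is 0.00039 per request against 0.003 for the all-GPT-4o baseline, a saving of about 87% under these idealized assumptions. Hardware cost for the local model, misrouted requests, and fallback calls all pull the real figure down, which is consistent with the more conservative 50%-70% range.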

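Before going live, verify the following in particular:

Routing accuracy: complex tasks must not be sent to a weaker tier by mistake
Fallback success rate: the backup models must actually take over
Cost per request: tracked at a fine-grained, per-business-line level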
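Routing accuracy can be measured offline against a hand-labeled sample set. The sketch below is one possible harness, not an established tool: `RoutingEvaluation`, `LabeledSample`, and `Report` are hypothetical names, and the assessor is passed in as a plain function so that any classification method can be plugged in.

```java
import java.util.List;
import java.util.function.Function;

// Repeated here only so the snippet compiles on its own.
enum TaskComplexity { SIMPLE, MEDIUM, COMPLEX }

// Hypothetical offline check of a complexity assessor against labeled inputs.
final class RoutingEvaluation {

    record LabeledSample(String input, TaskComplexity expected) {}

    // underRouted counts samples sent to a weaker tier than labeled,
    // which is the costly kind of mistake for answer quality.
    record Report(double accuracy, long underRouted) {}

    private RoutingEvaluation() {}

    static Report evaluate(Function<String, TaskComplexity> assessor, List<LabeledSample> samples) {
        if (samples.isEmpty()) {
            throw new IllegalArgumentException("samples must not be empty");
        }
        long correct = 0;
        long underRouted = 0;
        for (LabeledSample sample : samples) {
            TaskComplexity actual = assessor.apply(sample.input());
            if (actual == sample.expected()) {
                correct++;
            } else if (actual.ordinal() < sample.expected().ordinal()) {
                underRouted++;
            }
        }
        return new Report((double) correct / samples.size(), underRouted);
    }
}
```

Over-routing only costs money, while under-routing degrades answers, so it is worth tracking the two separately rather than as a single error rate.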

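8. Pitfalls

First, take differences in model capability seriously

A local model is not suitable for highly sensitive work such as legal or financial questions; complex scenarios still need a strong model or human review.

Second, exception handling needs proper monitoring

Record every step along the degradation path so that incidents can be traced and answer quality can be analyzed.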
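One way to capture that path is to return it together with the answer instead of only logging it. The following is a minimal sketch with plain functions standing in for the ChatClient calls; `FallbackExecutor`, `Step`, and `Result` are hypothetical names, not Spring AI types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: runs named steps in order and keeps a trace of every failure.
final class FallbackExecutor {

    // A named model call; in the router this would wrap a ChatClient invocation.
    record Step(String name, Function<String, String> call) {}

    // The answer, the step that produced it, and the failures that preceded it.
    record Result(String content, String servedBy, List<String> failures) {}

    private FallbackExecutor() {}

    static Result execute(List<Step> chain, String input) {
        List<String> failures = new ArrayList<>();
        // Single forward pass: no step is ever revisited.
        for (Step step : chain) {
            try {
                String content = step.call().apply(input);
                return new Result(content, step.name(), List.copyOf(failures));
            } catch (RuntimeException e) {
                failures.add(step.name() + ": " + e.getMessage());
            }
        }
        throw new IllegalStateException("All models failed: " + failures);
    }
}
```

With `servedBy` available to the caller, the cost monitor could also record the tier that actually answered rather than the tier that was requested.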

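Third, degradation must keep the feature working

When the budget is exceeded, the user should still get a usable answer, not just a status message.

Fourth, verify version compatibility

Spring AI changes frequently; before deploying to production, confirm that the configuration keys and class names match the version you are running.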

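References

Spring AI ChatClient documentation
Spring AI OpenAI Chat documentation
Spring AI Ollama Chat documentation

Routing requests to the cheapest model that can handle them lowers operating cost substantially and, through the fallback chain, improves overall availability. This kind of architecture is a key step in taking an enterprise AI application from prototype to production.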
