LLM
2025-07-09
内网搭建LLM服务架构
+-----------------------------+
| Frontend UI | ←→ 用户
| (Open WebUI) |
+-----------------------------+
│
▼
+-----------------------------+
| API Gateway | ←→ 鉴权 / 限流 / 多模型路由
| (LiteLLM) |
+-----------------------------+
│
▼
+-----------------------------+
| LLM Inference Layer | ←→ 多实例,支持横向扩容
| (vLLM / TGI / Ollama 等) |
+-----------------------------+
│
▼
+-----------------------------+
| Model Store | ←→ 存模型文件、本地磁盘或对象存储
+-----------------------------+
https://github.com/open-webui/open-webui
https://github.com/vllm-project/vllm
litellm
启动
litellm –config config.yaml –port 5000
config.yaml 配置:
# 这是一个示例的 litellm 配置文件 config.yaml
litellm_settings:
drop_params: true # 是否丢弃未使用的参数
enable_stream: true # 是否启用流式输出
model_list:
- model_name: GLM-4-Flash-250414
litellm_params:
api_base: https://open.bigmodel.cn/api/paas/v4/
api_key: your_api_key_here
model: openai/GLM-4-Flash-250414
temperature: 0.7 # 模型温度参数
- model_name: glm-4v-flash
litellm_params:
api_base: https://open.bigmodel.cn/api/paas/v4/
api_key: your_api_key_here
model: openai/glm-4v-flash
max_tokens: 150 # 最大输出token数
- model_name: claude-3-5-haiku-20241022
litellm_params:
api_base: https://api.siliconflow.cn/v1
api_key: your_api_key_here
model: openai/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
- model_name: DeepSeek-R1-0528-Qwen3-8B
litellm_params:
api_base: https://api.siliconflow.cn/v1
api_key: your_api_key_here
model: openai/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
# - model_name: claude-3-5-haiku-20241022
# 给claude-code使用这里可以用这个模型名称
- model_name: deepseek-v3-250324
litellm_params:
model: volcengine/deepseek-v3-250324
api_key: your_api_key_here
- model_name: deepseek-r1-250528
litellm_params:
model: volcengine/deepseek-r1-250528
api_key: your_api_key_here
- model_name: gemma-3-4b-it
litellm_params:
model: hosted_vllm/google/gemma-3-4b-it
api_base: http://172.16.1.166:8000/v1
provider: google
- model_name: gemini-2.5-flash
litellm_params:
model: gemini/gemini-2.5-flash-preview-04-17
api_key: your_api_key_here
- model_name: kimi-dev-72b
litellm_params:
model: openrouter/moonshotai/kimi-dev-72b:free
api_key: your_api_key_here
- model_name: deepseek-r1t2-chimera
litellm_params:
model: openrouter/tngtech/deepseek-r1t2-chimera:free
api_key: your_api_key_here
- model_name: deepseek-r1
litellm_params:
model: openrouter/deepseek/deepseek-r1:free
api_key: your_api_key_here
- model_name: mistral-small-3.2-24b-instruct-2506:free
litellm_params:
model: openrouter/mistralai/mistral-small-3.2-24b-instruct-2506:free
api_key: your_api_key_here
api_base: https://openrouter.ai/api/v1
FIM
continue、copilot、fitten code、Amazon Q
continue 插件
~/.continue/config.yaml
name: Local Assistant
version: 1.0.0
schema: v1
models:
- name: GLM-4-Flash-250414
provider: openai
model: GLM-4-Flash-250414
apiKey: your_api_key_here
apiBase: https://open.bigmodel.cn/api/paas/v4/
- name: a100/GLM-4-Flash-250414
provider: openai
model: GLM-4-Flash-250414
apiKey: your_api_key_here
apiBase: http://172.16.1.166:4000/v1
- name: a100/gemma-3-4b-it
provider: openai
model: google/gemma-3-4b-it
apiKey: your_api_key_here
apiBase: http://172.16.1.166:8000/v1
roles:
- autocomplete
- chat
- embed
max-tokens: 3000
- name: a100/火山引擎/deepseek-v3-250324
provider: deepseek
model: deepseek-v3-250324
apiKey: your_api_key_here
apiBase: http://172.16.1.166:4000/v1
- name: a100/火山引擎/DeepSeek-R1-0528-Qwen3-8B
provider: openai
model: DeepSeek-R1-0528-Qwen3-8B
apiKey: your_api_key_here
apiBase: http://172.16.1.166:4000/v1
- name: 硅基流动/DeepSeek-R1-Distill-Qwen-7B
provider: siliconflow
model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
apiKey: your_api_key_here
apiBase: https://api.siliconflow.cn/v1
- name: FIM/硅基流动/Qwen2.5-Coder-7B-Instruct
provider: siliconflow
model: Qwen/Qwen2.5-Coder-7B-Instruct
apiKey: your_api_key_here
apiBase: https://api.siliconflow.cn/v1
roles:
- autocomplete
- name: Gemini 2.5 Pro Experimental
provider: gemini
model: gemini-2.5-pro-exp-03-25
apiKey: your_api_key_here
- name: a100-vllm/gemma-3-4b-it
provider: openai
model: google/gemma-3-4b-it
apiKey: your_api_key_here
apiBase: http://172.16.1.166:8000/v1
chatOptions:
baseSystemMessage: >-
<important_rules>
You are in chat mode.
If the user asks to make changes to files offer that they can use the Apply Button on the code block, or switch to Agent Mode to make the suggested updates automatically.
If needed consisely explain to the user they can switch to agent mode using the Mode Selector dropdown and provide no other details.
Always include the language and file name in the info string when you write code blocks.
If you are editing "src/main.py" for example, your code block should start with '```python src/main.py'
When addressing code modification requests, present a concise code snippet that
emphasizes only the necessary changes and uses abbreviated placeholders for
unmodified sections. For example:
```language /path/to/file
// ... existing code ...
// ... existing code ...
// ... rest of code ...
```
In existing files, you should always restate the function or class that the snippet belongs to:
```language /path/to/file
// ... existing code ...
function exampleFunction() {
// ... existing code ...
// ... rest of function ...
}
// ... rest of code ...
```
Since users have access to their complete file, they prefer reading only the
relevant modifications. It's perfectly acceptable to omit unmodified portions
at the beginning, middle, or end of files using these "lazy" comments. Only
provide the complete file when explicitly requested. Include a concise explanation
of changes unless the user specifically asks for code only.
</important_rules>
You are an expert software developer. You give helpful and concise
responses.
用中文回答问题
roles:
- autocomplete
- chat
- embed
- name: openrouter/deepseek-r1t2-chimera:free
provider: openai
model: tngtech/deepseek-r1t2-chimera:free
apiBase: https://openrouter.ai/api/v1
apiKey: your_api_key_here
- name: openrouter/mistral-small-3.2-24b-instruct-2506:free
provider: openai
model: mistralai/mistral-small-3.2-24b-instruct-2506:free
apiKey: your_api_key_here
apiBase: https://openrouter.ai/api/v1
roles:
- chat
- embed
max-tokens: 3000
context:
- provider: code
- provider: docs
- provider: diff
- provider: terminal
- provider: problems
- provider: folder
- provider: codebase
code
claude code
https://docs.anthropic.com/zh-CN/docs/claude-code/settings
切换到litellm
~/.claude/settings.json
{
"env": {
"ANTHROPIC_BASE_URL": "http://127.0.0.1:4000",
"ANTHROPIC_AUTH_TOKEN": "sk-litellm-static-key",
"CLAUDE_CODE_MAX_OUTPUT_TOKENS": "16384"
},
"model": "claude-3-5-haiku-20241022"
}
windows下使用claude code
1.0.51 (Claude Code) 以上版本支持windows了
需在添加用户环境变量:
CLAUDE_CODE_GIT_BASH_PATH=C:\Program Files\Git\bin\bash.exe