Vox MCP: Multi-Model AI Gateway
Access 8+ AI providers from any MCP client — pure passthrough with no system prompt injection. Created by Xule Lin.
Vox MCP is a multi-model AI gateway that lets you access any AI provider directly from Claude Code, Claude Desktop, Cursor, or any MCP client. Unlike other multi-model tools, Vox uses a pure passthrough design — prompts go to providers unmodified, responses come back unmodified. No system prompt injection, no response formatting, no behavioral directives.
Info
Created by: Xule Lin
GitHub: linxule/vox-mcp
Runtime: Python / uv
Design philosophy: Minimal intervention. The only value Vox adds is routing and conversation memory — everything else is pure passthrough.
Why Vox?
When you're working in Claude Code and want a second opinion from Gemini, GPT, or DeepSeek, you'd normally have to switch applications. Vox lets you query any model without leaving your current workflow.
Key difference from alternatives: Most multi-model tools inject their own system prompts or modify your messages. Vox doesn't. What you send is what the model receives.
Supported Providers
| Provider | Env Variable | Example Models |
|---|---|---|
| Google Gemini | GEMINI_API_KEY | gemini-2.5-pro |
| OpenAI | OPENAI_API_KEY | gpt-5.1, gpt-5, o3, o4-mini |
| Anthropic | ANTHROPIC_API_KEY | claude-4-opus, claude-4-sonnet |
| xAI | XAI_API_KEY | grok-3, grok-3-fast |
| DeepSeek | DEEPSEEK_API_KEY | deepseek-chat, deepseek-reasoner |
| Moonshot (Kimi) | MOONSHOT_API_KEY | kimi-k2-thinking-turbo, kimi-k2.5 |
| OpenRouter | OPENROUTER_API_KEY | Any OpenRouter model |
| Custom/Local | CUSTOM_API_URL | Ollama, vLLM, LM Studio |
You only need API keys for providers you want to use. Vox works with any subset.
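For example, a minimal .env that enables two hosted providers plus a local endpoint might look like this (keys are placeholders, and the Ollama URL assumes its OpenAI-compatible endpoint):

```bash
# Enable only the providers you plan to use
GEMINI_API_KEY=your-gemini-key
DEEPSEEK_API_KEY=your-deepseek-key
# Local models served through an OpenAI-compatible API (e.g., Ollama)
CUSTOM_API_URL=http://localhost:11434/v1
```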
Core Tools
Vox provides three tools through the MCP protocol:
chat
Send prompts to any supported model with optional file or image attachments.
"Use vox chat with gemini-2.5-pro:
Compare these two theoretical frameworks and identify tensions..."listmodels
Show all available models, aliases, and capabilities across your configured providers.
dump_threads
Export conversation threads as JSON or Markdown — useful for documenting multi-model analysis.
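As with chat, the other tools are invoked in natural language from your MCP client; the phrasing below is illustrative, not a fixed syntax:

"Use vox listmodels to show which providers and models are available."

"Use vox dump_threads to export this conversation as Markdown."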
Multi-Turn Conversations
Vox supports persistent threads via continuation_id. This means you can:
- Start a conversation with Gemini about a theoretical framework
- Continue the same thread with follow-up questions
- Switch to DeepSeek mid-conversation to get a different perspective
- Export the entire multi-model dialogue
Threads are shadow-persisted to disk as JSONL for durability and can be exported as Markdown.
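As a rough sketch of what one persisted turn could look like (the field names here are hypothetical, not Vox's actual schema):

```json
{"thread_id": "abc123", "turn": 2, "model": "gemini-2.5-pro", "role": "assistant", "content": "..."}
```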
Research Workflows
Compare perspectives on the same research question:
Ask the same analytical question to 3-4 models and compare their responses. Each model brings different strengths — Claude for nuanced interpretation, Gemini for large-context synthesis, DeepSeek for cost-effective exploration.
This is particularly valuable for:
- Theory development (different models foreground different tensions)
- Literature gap identification
- Methodological critique
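A comparison session might look like this (the prompt phrasing is illustrative):

"Use vox chat to ask gemini-2.5-pro, deepseek-reasoner, and grok-3 the same question:
What are the main unresolved tensions in this framework?
Then summarize where their answers diverge."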
Setup
Clone and install
```bash
git clone https://github.com/linxule/vox-mcp.git
cd vox-mcp
uv sync
```
Configure API keys
```bash
cp .env.example .env
# Edit .env — add at least one provider API key
```
Test the server
```bash
uv run python server.py
```
Add to your MCP client
See the configuration examples below for your specific client.
MCP Client Configuration
Vox runs as a stdio MCP server. Replace /path/to/vox-mcp with the absolute path to your cloned repo.
Via CLI:
```bash
claude mcp add vox-mcp \
  -e GEMINI_API_KEY=your-key-here \
  -- uv run --directory /path/to/vox-mcp python server.py
```
Or add to .mcp.json in your project root:
```json
{
  "mcpServers": {
    "vox-mcp": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/vox-mcp", "python", "server.py"],
      "env": {
        "GEMINI_API_KEY": "your-key-here"
      }
    }
  }
}
```
Tip
API keys can live in either the MCP client config or the .env file inside the vox-mcp directory (loaded automatically). If both are set and conflict, add VOX_FORCE_ENV_OVERRIDE=true to .env to prefer your local values.
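For Claude Desktop, the same mcpServers entry goes in claude_desktop_config.json (its location varies by OS). A minimal sketch, assuming the schema shown above carries over unchanged:

```json
{
  "mcpServers": {
    "vox-mcp": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/vox-mcp", "python", "server.py"],
      "env": { "GEMINI_API_KEY": "your-key-here" }
    }
  }
}
```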
Configuration Options
Beyond API keys, Vox supports several configuration options in .env:
DEFAULT_MODEL
Set to auto (default) to let the agent pick the best model, or specify a model name like gemini-2.5-pro to always route to that model.
CONVERSATION_TIMEOUT_HOURS
How long conversation threads stay alive. Default: 24 hours. Threads expire after this period of inactivity.
MAX_CONVERSATION_TURNS
Maximum number of turns per conversation thread. Default: 100. Prevents runaway threads from consuming memory.
Model Restrictions
Per-provider allowlists like GOOGLE_ALLOWED_MODELS, OPENAI_ALLOWED_MODELS, etc. Restrict which models are available to prevent accidental use of expensive models.
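Put together, a tuned .env might look like this (values are illustrative):

```bash
DEFAULT_MODEL=auto                    # or pin a model, e.g. gemini-2.5-pro
CONVERSATION_TIMEOUT_HOURS=24         # expire idle threads after a day
MAX_CONVERSATION_TURNS=100            # cap thread length
GOOGLE_ALLOWED_MODELS=gemini-2.5-pro  # per-provider allowlists
OPENAI_ALLOWED_MODELS=o4-mini
```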
Note
See .env.example in the repository for the full reference of all configuration options.
Part of the Research Memex Ecosystem
Vox integrates naturally with other Research Memex tools:
- Interpretive Orchestration Plugin — Multi-model triangulation during qualitative analysis
- Claude Code Setup Guide — Your primary research environment
- AI Model Reference Guide — Understanding which models to query for what