Vox MCP is a multi-model AI gateway that lets you access any AI provider directly from Claude Code, Claude Desktop, Cursor, or any MCP client. Unlike other multi-model tools, Vox uses a pure passthrough design — prompts go to providers unmodified, responses come back unmodified. No system prompt injection, no response formatting, no behavioral directives.
Created by: Xule Lin
GitHub: linxule/vox-mcp
Runtime: Python / uv
Design philosophy: Minimal intervention. The only value Vox adds is routing and conversation memory — everything else is pure passthrough.

Why Vox?

When you’re working in Claude Code and want a second opinion from Gemini, GPT, or DeepSeek, you’d normally have to switch applications. Vox lets you query any model without leaving your current workflow.

Key difference from alternatives: most multi-model tools inject their own system prompts or modify your messages. Vox doesn’t. What you send is what the model receives.

Supported Providers

| Provider | Env Variable | Example Models |
| --- | --- | --- |
| Google Gemini | GEMINI_API_KEY | gemini-2.5-pro |
| OpenAI | OPENAI_API_KEY | gpt-5.1, gpt-5, o3, o4-mini |
| Anthropic | ANTHROPIC_API_KEY | claude-4-opus, claude-4-sonnet |
| xAI | XAI_API_KEY | grok-3, grok-3-fast |
| DeepSeek | DEEPSEEK_API_KEY | deepseek-chat, deepseek-reasoner |
| Moonshot (Kimi) | MOONSHOT_API_KEY | kimi-k2-thinking-turbo, kimi-k2.5 |
| OpenRouter | OPENROUTER_API_KEY | Any OpenRouter model |
| Custom/Local | CUSTOM_API_URL | Ollama, vLLM, LM Studio |
You only need API keys for providers you want to use. Vox works with any subset.
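For example, a hypothetical .env pointing the Custom/Local provider at a local Ollama instance might look like the following (the exact URL path and any companion variables depend on your local server; check .env.example for the authoritative names):

```
# One cloud provider plus a local Ollama endpoint (URL is illustrative)
GEMINI_API_KEY=your-key-here
CUSTOM_API_URL=http://localhost:11434/v1
```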

Core Tools

Vox provides three tools through the MCP protocol:

chat

Send prompts to any supported model with optional file or image attachments.
"Use vox chat with gemini-2.5-pro:
Compare these two theoretical frameworks and identify tensions..."
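Under the hood, the MCP client turns that request into a tool call roughly like the following sketch. The argument names here (model, prompt) are illustrative assumptions; use listmodels and the server's published tool schema for the exact parameters:

```json
{
  "name": "chat",
  "arguments": {
    "model": "gemini-2.5-pro",
    "prompt": "Compare these two theoretical frameworks and identify tensions...",
    "continuation_id": null
  }
}
```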

listmodels

Show all available models, aliases, and capabilities across your configured providers.

dump_threads

Export conversation threads as JSON or Markdown — useful for documenting multi-model analysis.

Multi-Turn Conversations

Vox supports persistent threads via continuation_id. This means you can:
  1. Start a conversation with Gemini about a theoretical framework
  2. Continue the same thread with follow-up questions
  3. Switch to DeepSeek mid-conversation to get a different perspective
  4. Export the entire multi-model dialogue
Threads are shadow-persisted to disk as JSONL for durability and can be exported as Markdown.
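As a sketch of what working with a shadow-persisted thread could look like, the snippet below parses a JSONL transcript and renders it as Markdown. The field names (role, model, content) are illustrative assumptions, not Vox's actual on-disk schema — consult the vox-mcp source or use dump_threads for real exports:

```python
import json

# Hypothetical JSONL thread: one JSON object per conversation turn.
# Field names are illustrative; the real schema may differ.
thread_jsonl = """\
{"role": "user", "model": "gemini-2.5-pro", "content": "Summarize the framework."}
{"role": "assistant", "model": "gemini-2.5-pro", "content": "The framework centers on..."}
{"role": "user", "model": "deepseek-chat", "content": "What would a critic say?"}
"""

def jsonl_to_markdown(jsonl_text: str) -> str:
    """Render a JSONL conversation thread as a Markdown transcript."""
    parts = []
    for raw in jsonl_text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        turn = json.loads(raw)
        parts.append(f"**{turn['role']}** ({turn['model']}):\n{turn['content']}\n")
    return "\n".join(parts)

print(jsonl_to_markdown(thread_jsonl))
```

Note the model field changing mid-thread: that is the provider switch the numbered steps describe, recorded turn by turn.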

Research Workflows

Compare perspectives on the same research question: ask the same analytical question to 3-4 models and compare their responses. Each model brings different strengths — Claude for nuanced interpretation, Gemini for large-context synthesis, DeepSeek for cost-effective exploration.

This is particularly valuable for:
  • Theory development (different models foreground different tensions)
  • Literature gap identification
  • Methodological critique
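Concretely, a comparison pass might look like the following (model names come from the provider table; the prompt is illustrative):

```
Use vox chat with gemini-2.5-pro: "What tensions exist between these two frameworks?"
Use vox chat with deepseek-reasoner: "What tensions exist between these two frameworks?"
Use vox chat with grok-3: "What tensions exist between these two frameworks?"
```

Then compare the three responses side by side, or export them with dump_threads for documentation.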

Setup

1. Clone and install

git clone https://github.com/linxule/vox-mcp.git
cd vox-mcp
uv sync
2. Configure API keys

cp .env.example .env
# Edit .env — add at least one provider API key
3. Test the server

uv run python server.py
4. Add to your MCP client

See the configuration tabs below for your specific client.

MCP Client Configuration

Vox runs as a stdio MCP server. Replace /path/to/vox-mcp with the absolute path to your cloned repo.
Via CLI:
claude mcp add vox-mcp \
  -e GEMINI_API_KEY=your-key-here \
  -- uv run --directory /path/to/vox-mcp python server.py
Or add to .mcp.json in your project root:
{
  "mcpServers": {
    "vox-mcp": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/vox-mcp", "python", "server.py"],
      "env": {
        "GEMINI_API_KEY": "your-key-here"
      }
    }
  }
}
API keys can live in either the MCP client config or the .env file inside the vox-mcp directory (loaded automatically). If both are set and conflict, add VOX_FORCE_ENV_OVERRIDE=true to .env to prefer your local values.

Configuration Options

Beyond API keys, Vox supports several configuration options in .env:

DEFAULT_MODEL

Set to auto (default) to let the agent pick the best model, or specify a model name like gemini-2.5-pro to always route to that model.

CONVERSATION_TIMEOUT_HOURS

How long conversation threads stay alive. Default: 24 hours. Threads expire after this period of inactivity.

MAX_CONVERSATION_TURNS

Maximum number of turns per conversation thread. Default: 100. Prevents runaway threads from consuming memory.
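Put together, a .env using the defaults described above would read:

```
DEFAULT_MODEL=auto
CONVERSATION_TIMEOUT_HOURS=24
MAX_CONVERSATION_TURNS=100
```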

Model Restrictions

Per-provider allowlists like GOOGLE_ALLOWED_MODELS, OPENAI_ALLOWED_MODELS, etc. Restrict which models are available to prevent accidental use of expensive models.
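For instance, to keep only a couple of models available per provider (the comma-separated format shown is an assumption; confirm it against .env.example):

```
# Allowlists restrict which models each provider exposes
GOOGLE_ALLOWED_MODELS=gemini-2.5-pro
OPENAI_ALLOWED_MODELS=o4-mini,gpt-5
```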
See .env.example in the repository for the full reference of all configuration options.

Part of the Research Memex Ecosystem

Vox integrates naturally with other Research Memex tools: