DeepThonk: OpenDeepThink for Agents

A TypeScript CLI and MCP server that wraps hard tasks in OpenDeepThink-style candidate generation, pairwise judging, mutation, and ranking.

When one answer is too brittle, make the model argue with alternatives.

DeepThonk implements the OpenDeepThink algorithm as a provider-neutral CLI and MCP server. It runs a population of candidate answers through pairwise judging, Bradley-Terry ranking, critique-guided mutation, elite preservation, and a final dense ranking pass. The point is not speed. It is spending controlled test-time compute on tasks where breadth plus judgment can beat one expensive single shot.
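
The Bradley-Terry step named above turns pairwise judge verdicts into one strength score per candidate. As a rough illustration only (this is not DeepThonk's source; the function names, the `prior` smoothing constant, and the iteration count are assumptions made for this sketch), the standard minorization-maximization fit could look like this in TypeScript:

```typescript
// One pairwise judgment: the judge preferred `winner` over `loser`.
// Both fields are indices into the candidate list.
type Comparison = { winner: number; loser: number };

// Fit Bradley-Terry strengths with the classic minorization-maximization update.
// ASSUMPTION: `prior` adds a small symmetric pseudo-count to every pair so that
// candidates with no wins (or no comparisons) still get a finite, nonzero strength.
function bradleyTerry(
  n: number,
  comparisons: Comparison[],
  iterations = 200,
  prior = 0.1,
): number[] {
  if (n === 0) return [];
  if (n === 1) return [1];

  // wins[i][j] = (pseudo-)count of times candidate i beat candidate j.
  const wins: number[][] = Array.from({ length: n }, (_, i) =>
    Array.from({ length: n }, (_, j) => (i === j ? 0 : prior)),
  );
  for (const { winner, loser } of comparisons) {
    wins[winner][loser] += 1;
  }

  // Start from a uniform guess and iterate the MM update.
  let strength: number[] = Array.from({ length: n }, () => 1 / n);
  for (let iter = 0; iter < iterations; iter++) {
    const next = strength.map((_, i) => {
      let totalWins = 0;
      let denom = 0;
      for (let j = 0; j < n; j++) {
        if (j === i) continue;
        totalWins += wins[i][j];
        denom += (wins[i][j] + wins[j][i]) / (strength[i] + strength[j]);
      }
      return totalWins / denom;
    });
    // Normalize so the strengths sum to 1.
    const sum = next.reduce((a, b) => a + b, 0);
    strength = next.map((s) => s / sum);
  }
  return strength;
}

// Return candidate indices ordered from strongest to weakest.
function rankCandidates(strength: number[]): number[] {
  return strength.map((_, i) => i).sort((a, b) => strength[b] - strength[a]);
}

// Example: candidate 0 beats 1 and 2; candidate 1 beats 2.
const demoStrength = bradleyTerry(3, [
  { winner: 0, loser: 1 },
  { winner: 0, loser: 2 },
  { winner: 1, loser: 2 },
]);
console.log(rankCandidates(demoStrength));
```

In a typical evolutionary loop, the top few indices from `rankCandidates` would be carried forward unchanged (elite preservation) while lower-ranked candidates are rewritten using the judge's critique. How DeepThonk itself sizes those groups is not stated here.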

Info

Created by: Xule Lin

GitHub: linxule/deepthonk

Runtime: TypeScript / Node.js

Package: deepthonk on npm, with dt as the short CLI alias

Algorithm source: OpenDeepThink, Zhou et al. (arXiv:2605.15177)


Host Support

DeepThonk exposes the same engine through a shell CLI and an MCP server. Use the CLI when you want a visible run directory. Use MCP when you want an agent to plan, start, poll, inspect, and export runs from inside its own workspace.

Host | Status | How to add
Claude Code (CLI / Desktop) | Full | claude mcp add deepthonk -- npx -y deepthonk serve-mcp --transport stdio
Codex CLI / Codex Desktop | Full | codex mcp add deepthonk -- npx -y deepthonk serve-mcp --transport stdio
Claude Desktop (chat) | Full | Add an npx -y deepthonk serve-mcp --transport stdio server to claude_desktop_config.json
Cursor / VS Code / Windsurf | Full | Standard MCP stdio config
Any shell | Full CLI | npx -y deepthonk ... or npm install -g deepthonk

Provider API keys come from the host process environment. DeepSeek is first-class; OpenAI-compatible endpoints and OpenRouter are configurable.
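
For the Claude Desktop row, the entry belongs under the `mcpServers` key of claude_desktop_config.json. The sketch below builds that entry and prints it as JSON to paste in. The helper `addServer` and the type names are hypothetical, written only for this illustration, and other hosts may use a different top-level key, so check each host's own MCP documentation.

```typescript
// Shape of one stdio server entry in an MCP client config.
// `env` is optional; keys can also come from the host process environment.
type StdioServer = { command: string; args: string[]; env?: Record<string, string> };
type HostConfig = { mcpServers?: Record<string, StdioServer> };

// The command and arguments are the ones listed in the host table.
const deepthonkServer: StdioServer = {
  command: "npx",
  args: ["-y", "deepthonk", "serve-mcp", "--transport", "stdio"],
};

// Hypothetical helper: merge one entry into an existing config object
// without dropping servers that are already configured.
function addServer(
  config: HostConfig,
  name: string,
  server: StdioServer,
): Required<HostConfig> {
  return { ...config, mcpServers: { ...(config.mcpServers ?? {}), [name]: server } };
}

// Print the resulting config as pretty-printed JSON.
console.log(JSON.stringify(addServer({}, "deepthonk", deepthonkServer), null, 2));
```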


When to Use DeepThonk

  • Hard reasoning tasks where a single response collapses too early.
  • Coding and debugging plans that benefit from multiple candidate approaches before implementation.
  • Literature synthesis where you want alternative framings ranked against an explicit rubric.
  • Methodology design where critique-guided mutation can surface better versions of an initial design.

Avoid it for tiny questions, subjective taste calls, or tasks where judge noise is likely to dominate. DeepThonk spends many model calls by design; plan the budget before running.


Agent-Readable Surface

The MCP server is intentionally inspectable. An agent can use the following tools and resources:

Surface | Purpose
deepthonk.plan | Estimate calls and rounds before spending money
deepthonk.start / deepthonk.status / deepthonk.result | Run long jobs asynchronously
deepthonk.run | Blocking convenience for shorter jobs
deepthonk.rank | Rank your own candidate set without generation
deepthonk.mutate | Improve one candidate with critique
deepthonk.export | Export run summaries or full traces
deepthonk://runs/... resources | Inspect config, candidates, comparisons, scores, usage, and traces

That traceability matters for research work. If a synthesis improves, you can inspect the candidates and judgments that got it there instead of treating the final answer as magic.
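
The start / status / result tools suggest a start-then-poll pattern for long jobs. The following is a minimal sketch of that pattern under loud assumptions: only the three tool names come from the list above. The argument names (`task`, `runId`), the `state` field and its values, and the injected `callTool` function are all hypothetical stand-ins for whatever MCP client and payload shapes are actually in use.

```typescript
// HYPOTHETICAL shapes: the real tool arguments and results may differ.
type RunStatus = { state: "running" | "done" | "failed" };
type CallTool = (name: string, args: Record<string, unknown>) => Promise<unknown>;

type PollOptions = {
  intervalMs?: number; // delay between status checks
  maxPolls?: number; // give up after this many checks
  sleep?: (ms: number) => Promise<void>; // injectable for tests
};

// Start a run, poll its status until it finishes, then fetch the result.
async function runToCompletion(
  callTool: CallTool,
  task: string,
  opts: PollOptions = {},
): Promise<unknown> {
  const intervalMs = opts.intervalMs ?? 5000;
  const maxPolls = opts.maxPolls ?? 120;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  // Assumed: start returns an identifier that status and result accept.
  const started = (await callTool("deepthonk.start", { task })) as { runId: string };

  for (let i = 0; i < maxPolls; i++) {
    const status = (await callTool("deepthonk.status", { runId: started.runId })) as RunStatus;
    if (status.state === "done") {
      return callTool("deepthonk.result", { runId: started.runId });
    }
    if (status.state === "failed") {
      throw new Error(`run ${started.runId} failed`);
    }
    await sleep(intervalMs);
  }
  throw new Error(`run ${started.runId} did not finish after ${maxPolls} polls`);
}
```

Passing `callTool` in as a parameter keeps the loop independent of any particular MCP client library, and the `maxPolls` cap keeps an agent from waiting forever on a stalled run.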


Quick Start

Run without installing:

npx -y deepthonk plan --profile paper
npx -y deepthonk run --provider fake --profile quick \
  --task "Find the smallest positive integer divisible by 3, 4, and 5." \
  --out runs/test-quick
npx -y deepthonk inspect runs/test-quick

For paid providers, configure a reusable profile first:

deepthonk setup \
  --provider deepseek \
  --api-key-env DEEPSEEK_API_KEY \
  --fast-model deepseek-v4-flash \
  --judge-model deepseek-v4-pro

Then plan before you run:

deepthonk plan --config ~/.config/deepthonk/config.yaml
deepthonk run --task task.md --config ~/.config/deepthonk/config.yaml --profile quick --dry-run

Relationship to the Stack

DeepThonk sits next to Sequential Thinking, not underneath it. Sequential Thinking gives a single model a structured scratchpad. DeepThonk creates and ranks many attempts. Use Sequential Thinking when you want one transparent reasoning chain; use DeepThonk when you want search, mutation, judging, and a trace of alternatives.

It also pairs naturally with Vox: Vox gives you access to many models; DeepThonk gives you a repeatable protocol for spending more compute on a hard task.

Carrel status: DeepThonk is not installed by Carrel today. Add it manually when a project needs OpenDeepThink-style test-time compute.