DeepThonk: OpenDeepThink for Agents

A TypeScript CLI and MCP server that wraps hard tasks in OpenDeepThink-style candidate generation, pairwise judging, mutation, and ranking.

When one answer is too brittle, make the model argue with alternatives.

DeepThonk implements the OpenDeepThink algorithm as a provider-neutral CLI and MCP server. It runs a population of candidate answers through pairwise judging, Bradley-Terry ranking, critique-guided mutation, elite preservation, and a final dense ranking pass. The point is not speed. It is spending controlled test-time compute on tasks where breadth plus judgment can beat one expensive single shot.
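
To make the ranking step concrete, here is a minimal sketch of fitting Bradley-Terry strengths from pairwise judgments with the standard minorization-maximization update. It is an illustration, not DeepThonk's own code: the `Comparison` type, the `bradleyTerry` function, and the small smoothing constant are all assumptions introduced for this example.

```typescript
// Illustrative sketch only; names and the smoothing constant are assumptions.
// A comparison records that candidate `winner` beat candidate `loser`.
type Comparison = { winner: number; loser: number };

// Fit Bradley-Terry strengths with the minorization-maximization update:
//   p_i <- W_i / sum_{j != i} n_ij / (p_i + p_j)
// then normalize so the strengths sum to 1.
function bradleyTerry(n: number, comparisons: Comparison[], iterations = 200): number[] {
  const wins = new Array<number>(n).fill(0);
  // games[i][j] = number of comparisons played between i and j
  const games = Array.from({ length: n }, () => new Array<number>(n).fill(0));
  for (const { winner, loser } of comparisons) {
    wins[winner] += 1;
    games[winner][loser] += 1;
    games[loser][winner] += 1;
  }

  let p = new Array<number>(n).fill(1 / n);
  for (let it = 0; it < iterations; it++) {
    const next = p.map((pi, i) => {
      let denom = 0;
      for (let j = 0; j < n; j++) {
        if (j !== i && games[i][j] > 0) denom += games[i][j] / (pi + p[j]);
      }
      // Assumed smoothing: a tiny pseudo-win keeps winless candidates above zero.
      return denom > 0 ? (wins[i] + 0.01) / denom : pi;
    });
    const total = next.reduce((a, b) => a + b, 0);
    p = next.map((x) => x / total);
  }
  return p;
}

// Three candidates: 0 usually beats 1, and both always beat 2.
const comparisons: Comparison[] = [
  { winner: 0, loser: 1 }, { winner: 0, loser: 1 }, { winner: 1, loser: 0 },
  { winner: 0, loser: 2 }, { winner: 0, loser: 2 },
  { winner: 1, loser: 2 }, { winner: 1, loser: 2 },
];
const strengths = bradleyTerry(3, comparisons);
// Candidate indices sorted from strongest to weakest.
const ranking = strengths
  .map((s, i) => ({ i, s }))
  .sort((a, b) => b.s - a.s)
  .map((x) => x.i);
console.log(ranking, strengths);
```

The appeal of this model for noisy judges is that a single lost comparison only shifts a candidate's strength; it does not eliminate the candidate outright.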

Info

Created by: Xule Lin

GitHub: linxule/deepthonk

Runtime: TypeScript / Node.js

Package: deepthonk on npm, with dt as the short CLI alias

Algorithm source: OpenDeepThink, Zhou et al. (arXiv:2605.15177)

Host Support

DeepThonk exposes the same engine through a shell CLI and an MCP server. Use the CLI when you want a visible run directory. Use MCP when you want an agent to plan, start, poll, inspect, and export runs from inside its own workspace.

| Host | Status | How to add |
| --- | --- | --- |
| Claude Code (CLI / Desktop) | Full | `claude mcp add deepthonk -- npx -y deepthonk serve-mcp --transport stdio` |
| Codex CLI / Codex Desktop | Full | `codex mcp add deepthonk -- npx -y deepthonk serve-mcp --transport stdio` |
| Claude Desktop (chat) | Full | Add an `npx -y deepthonk serve-mcp --transport stdio` server to `claude_desktop_config.json` |
| Cursor / VS Code / Windsurf | Full | Standard MCP stdio config |
| Any shell | Full CLI | `npx -y deepthonk ...` or `npm install -g deepthonk` |

Provider API keys come from the host process environment. DeepSeek is first-class; OpenAI-compatible endpoints and OpenRouter are configurable.
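
For the Claude Desktop row, the entry in `claude_desktop_config.json` usually lives under an `mcpServers` key. The sketch below builds that entry as a typed object and prints it as JSON. The server name `deepthonk`, the optional `env` block, and the placeholder key value are assumptions for illustration; whether you need `env` depends on whether the desktop app already inherits your shell variables.

```typescript
// Sketch: build an `mcpServers` entry for claude_desktop_config.json.
// The entry name and the `env` block are illustrative assumptions.
type McpServerEntry = {
  command: string;
  args: string[];
  env?: Record<string, string>;
};

const deepthonkServer: McpServerEntry = {
  command: "npx",
  args: ["-y", "deepthonk", "serve-mcp", "--transport", "stdio"],
  // Placeholder value: replace it, or drop `env` if the host already exports the key.
  env: { DEEPSEEK_API_KEY: "<your-key-here>" },
};

const config = { mcpServers: { deepthonk: deepthonkServer } };

// Print the JSON so it can be merged into the existing config file by hand.
console.log(JSON.stringify(config, null, 2));
```

Merge the printed object into the existing file instead of overwriting it, so other configured servers are kept.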

When to Use DeepThonk

- Hard reasoning tasks where a single response collapses too early.
- Coding and debugging plans that benefit from multiple candidate approaches before implementation.
- Literature synthesis where you want alternative framings ranked against an explicit rubric.
- Methodology design where critique-guided mutation can surface better versions of an initial design.

Avoid it for tiny questions, subjective taste calls, or tasks where judge noise is likely to dominate. DeepThonk spends many model calls by design; plan the budget before running.
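
To see why the budget deserves attention, here is a back-of-envelope call estimator. It is not how `deepthonk plan` computes its numbers. Every parameter (`population`, `rounds`, `comparisonsPerCandidate`, `elites`) and the cost model itself are assumptions chosen for illustration: one call per generated candidate, sparse pairwise judging plus mutation of non-elite candidates in each round, and one dense all-pairs ranking at the end.

```typescript
// Hypothetical cost model; use `deepthonk plan` for real estimates.
interface BudgetAssumptions {
  population: number;              // candidates kept alive per round
  rounds: number;                  // judge-and-mutate rounds
  comparisonsPerCandidate: number; // sparse judging density per round
  elites: number;                  // candidates carried over unmutated
}

function estimateCalls(a: BudgetAssumptions) {
  // One generation call per initial candidate.
  const generation = a.population;
  // Each comparison involves two candidates, so divide by 2.
  const judgingPerRound = Math.ceil((a.population * a.comparisonsPerCandidate) / 2);
  // Elites are preserved as-is; everyone else gets one mutation call.
  const mutationPerRound = a.population - a.elites;
  // Final dense pass: every unordered pair is judged once.
  const finalDense = (a.population * (a.population - 1)) / 2;
  const total = generation + a.rounds * (judgingPerRound + mutationPerRound) + finalDense;
  return { generation, judgingPerRound, mutationPerRound, finalDense, total };
}

const estimate = estimateCalls({ population: 8, rounds: 3, comparisonsPerCandidate: 4, elites: 2 });
console.log(estimate);
```

Even under these modest assumed settings the total reaches triple digits, and the dense final pass grows quadratically with population size.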

Agent-Readable Surface

The MCP server is intentionally inspectable. It exposes the following tools and resources to an agent:

| Surface | Purpose |
| --- | --- |
| `deepthonk.plan` | Estimate calls and rounds before spending money |
| `deepthonk.start` / `deepthonk.status` / `deepthonk.result` | Run long jobs asynchronously |
| `deepthonk.run` | Blocking convenience for shorter jobs |
| `deepthonk.rank` | Rank your own candidate set without generation |
| `deepthonk.mutate` | Improve one candidate with critique |
| `deepthonk.export` | Export run summaries or full traces |
| `deepthonk://runs/...` resources | Inspect config, candidates, comparisons, scores, usage, and traces |

That traceability matters for research work. If a synthesis improves, you can inspect the candidates and judgments that got it there instead of treating the final answer as magic.
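
The asynchronous tools are typically driven by a start-then-poll loop. The sketch below shows that pattern against a generic `CallTool` function instead of a specific MCP client library. The tool names come from the table above, but the payload fields (`task`, `runId`, `state`, `answer`) and the `"done"` / `"failed"` state values are hypothetical; check the server's published tool schemas for the real shapes.

```typescript
// Minimal shape of an MCP tool call; a real MCP client would stand in here.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<Record<string, unknown>>;

// Start a run, poll its status, and fetch the result once it finishes.
// Field names `runId` and `state` are assumptions, not DeepThonk's documented schema.
async function runAndWait(callTool: CallTool, task: string, pollMs = 1000, maxPolls = 600) {
  const started = await callTool("deepthonk.start", { task });
  const runId = started["runId"];
  for (let i = 0; i < maxPolls; i++) {
    const status = await callTool("deepthonk.status", { runId });
    if (status["state"] === "done") return callTool("deepthonk.result", { runId });
    if (status["state"] === "failed") throw new Error(`run ${String(runId)} failed`);
    // Wait before the next poll so the server is not hammered.
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
  throw new Error("timed out waiting for run");
}

// Stand-in server so the sketch runs offline: "running" twice, then "done".
function makeFakeServer(): CallTool {
  let polls = 0;
  return async (name) => {
    if (name === "deepthonk.start") return { runId: "demo-1" };
    if (name === "deepthonk.status") return { state: ++polls >= 3 ? "done" : "running" };
    if (name === "deepthonk.result") return { answer: "60" };
    throw new Error(`unknown tool ${name}`);
  };
}

runAndWait(makeFakeServer(), "Find the smallest positive integer divisible by 3, 4, and 5.", 10)
  .then((result) => console.log(result));
```

Keeping the poll cap explicit means an agent gives up cleanly on a stalled run instead of waiting forever.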

Quick Start

Run without installing:

```bash
npx -y deepthonk plan --profile paper
npx -y deepthonk run --provider fake --profile quick \
  --task "Find the smallest positive integer divisible by 3, 4, and 5." \
  --out runs/test-quick
npx -y deepthonk inspect runs/test-quick
```

For paid providers, configure a reusable profile first:

```bash
deepthonk setup \
  --provider deepseek \
  --api-key-env DEEPSEEK_API_KEY \
  --fast-model deepseek-v4-flash \
  --judge-model deepseek-v4-pro
```

Then plan before you run:

```bash
deepthonk plan --config ~/.config/deepthonk/config.yaml
deepthonk run --task task.md --config ~/.config/deepthonk/config.yaml --profile quick --dry-run
```

Relationship to the Stack

DeepThonk sits next to Sequential Thinking, not underneath it. Sequential Thinking gives a single model a structured scratchpad. DeepThonk creates and ranks many attempts. Use Sequential Thinking when you want one transparent reasoning chain; use DeepThonk when you want search, mutation, judging, and a trace of alternatives.

It also pairs naturally with Vox: Vox gives you access to many models; DeepThonk gives you a repeatable protocol for spending more compute on a hard task.

Carrel status: DeepThonk is not installed by Carrel today. Add it manually when a project needs OpenDeepThink-style test-time compute.