
AI Model Discovery Protocol

A four-phase protocol for sampling AI models against your own research, building task-model matches, testing control surfaces, and developing a personal strategy

The AI Model Reference Guide tells you what's available. This page is the protocol for finding what fits you: a structured experiment against your own research materials.

Tip

Read this if: you have access to several frontier or specialist models and want a defensible way to decide which ones serve your research rather than someone else's benchmark.

Getting Started

Phase 1: Capability Discovery

Run a one- or two-day sampling pass.

Choose one real task: theory synthesis, methods critique, extraction, or writing revision.
Use the same prompt and evidence pack across current model families:
- GPT-5.5 / GPT-5.4 mini
- Claude Opus 4.8 / Claude Sonnet 4.6 / Claude Haiku 4.5
- Gemini 3.5 Flash / Gemini 3.1 Pro preview
- DeepSeek V4-era models
- Kimi K2.7 Code / K2.6
- GLM-5.2
- Qwen3.5 / Grok 4.3 where available
Record differences in style, depth, accuracy, and failure modes.
Note the access path: web app, API key, Cherry Studio, OpenCode, Vox, Claude Code, or Antigravity CLI.
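One way to keep the sampling pass honest is to build the run list up front and fingerprint each run's inputs, so you can later show that every model saw the same prompt and the same evidence pack. The sketch below assumes you record runs as plain Python objects. `SamplingRun`, `build_matrix`, and the model/access-path pairs are hypothetical names for illustration, not part of any tool named above, and the sketch does not call any model API.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical (model, access path) pairs; use the names your own access paths expose.
MODELS: List[Tuple[str, str]] = [
    ("GPT-5.5", "API key"),
    ("Claude Sonnet 4.6", "Claude Code"),
    ("Gemini 3.5 Flash", "web app"),
]


@dataclass(frozen=True)
class SamplingRun:
    task: str
    model: str
    access_path: str
    prompt: str
    evidence_files: Tuple[str, ...]

    @property
    def input_hash(self) -> str:
        # Fingerprint of prompt + evidence list: identical inputs give identical hashes.
        payload = self.prompt + "\n" + "\n".join(self.evidence_files)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def build_matrix(task: str, prompt: str, evidence_files: List[str],
                 models: List[Tuple[str, str]]) -> List[SamplingRun]:
    """Create one run per model, all sharing the same prompt and evidence pack."""
    evidence = tuple(sorted(evidence_files))  # sort so file order cannot change the hash
    return [SamplingRun(task, model, path, prompt, evidence) for model, path in models]


runs = build_matrix(
    task="methods critique",
    prompt="Critique the sampling design in the attached study.",
    evidence_files=["study.pdf", "codebook.csv"],
    models=MODELS,
)
# More than one distinct hash would mean the inputs drifted between models.
assert len({run.input_hash for run in runs}) == 1
for run in runs:
    print(run.model, "|", run.access_path, "|", run.input_hash)
```

The hash covers only the prompt text and the evidence file names, not file contents. If you revise a file in place between runs, you could hash the file bytes instead.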

Phase 2: Control Surface Testing

Do not assume temperature is available or useful.

Start with provider defaults.
Test reasoning controls where supported:
- OpenAI / xAI / GLM: reasoning effort.
- Claude: effort; check the provider's current sampling constraints before also changing temperature or top-p.
- Gemini 3.x: thinking_level, not temperature/top-p/top-k.
- Kimi K2.7/K2.6: keep the default temperature; thinking behavior depends on the model.
Use Sequential Thinking MCP when you want visible, revisable steps across models.
Only test temperature on models whose current docs support it.
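The ordering above (defaults, then reasoning controls, then temperature only where documented) can be written down as a per-provider test plan. This is a sketch under stated assumptions. The control names come from the list above, but the level values (`"low"`, `"medium"`, `"high"`), the provider keys, and the temperature values are hypothetical placeholders you should confirm against each provider's current documentation. The sketch only builds the list of settings and does not send any requests.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class ControlSurface:
    reasoning_control: Optional[str] = None          # parameter name from the provider docs, if any
    levels: List[str] = field(default_factory=list)  # assumed values; confirm per provider
    temperature_documented: bool = False             # set True only after checking current docs


# Hypothetical map condensed from the list above (xAI and GLM would mirror "openai");
# verify every name and level before use.
SURFACES: Dict[str, ControlSurface] = {
    "openai": ControlSurface("reasoning effort", ["low", "medium", "high"]),
    "claude": ControlSurface("effort", ["low", "medium", "high"]),
    "gemini-3": ControlSurface("thinking_level", ["low", "high"]),
    "kimi": ControlSurface(),  # keep the default temperature
    "deepseek": ControlSurface(temperature_documented=True),
}


def control_plan(provider: str, temperatures: Sequence[float] = (0.2, 0.7)) -> List[dict]:
    """Order runs as defaults first, reasoning controls second, sampling last."""
    surface = SURFACES[provider]
    plan: List[dict] = [{"phase": "defaults"}]
    if surface.reasoning_control:
        for level in surface.levels:
            plan.append({"phase": "reasoning", surface.reasoning_control: level})
    if surface.temperature_documented:
        for t in temperatures:
            plan.append({"phase": "sampling", "temperature": t})
    return plan


for provider in SURFACES:
    print(provider, control_plan(provider))
```

Temperature runs are generated only when `temperature_documented` is set. That mirrors the rule above: a model is not sampled at other temperatures until its current docs say the parameter is supported.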

Phase 3: Task Matching

Build a small table for your own work:

Task	Best model	Backup	Why
Screening
Extraction
Theory synthesis
Methods critique
Draft revision
Code/data analysis

The "why" column matters. If you cannot explain why a model won, keep testing.
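If you rate each trial yourself, the table can be filled in mechanically, with one guard: a winner with no recorded reason stops the process. This is a minimal sketch, assuming a 1-5 personal rating scale. `Trial`, `match_tasks`, `print_table`, and the `model-a`/`model-b` entries are hypothetical placeholders, not real results.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Trial:
    task: str
    model: str
    score: int   # your own rating; a 1-5 scale is assumed here
    why: str     # what made this output better or worse than the others


def match_tasks(trials: List[Trial]) -> Dict[str, Tuple[str, Optional[str], str]]:
    """Return {task: (best model, backup model, why)} from your own ratings."""
    by_task: Dict[str, List[Trial]] = {}
    for trial in trials:
        by_task.setdefault(trial.task, []).append(trial)

    table = {}
    for task, candidates in by_task.items():
        ranked = sorted(candidates, key=lambda t: t.score, reverse=True)
        best = ranked[0]
        if not best.why.strip():
            # No explanation for the winner means the comparison is not finished.
            raise ValueError(f"{task}: no reason recorded for {best.model}; keep testing")
        backup = ranked[1].model if len(ranked) > 1 else None
        table[task] = (best.model, backup, best.why)
    return table


def print_table(table: Dict[str, Tuple[str, Optional[str], str]]) -> None:
    print("Task\tBest model\tBackup\tWhy")
    for task, (best, backup, why) in table.items():
        print(f"{task}\t{best}\t{backup or '-'}\t{why}")


# Placeholder models and scores, not real results.
trials = [
    Trial("Extraction", "model-a", 4, "kept table values intact"),
    Trial("Extraction", "model-b", 3, "dropped footnoted values"),
    Trial("Methods critique", "model-b", 5, "caught the unstated sampling frame"),
]
print_table(match_tasks(trials))
```

Python's `sorted` is stable, so a tie goes to whichever trial you recorded first. Treat a tie at the top as another reason to keep testing rather than a result.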

Evaluation Notes

For every test, save:

The exact prompt.
The model and access path.
The evidence pack or files used.
The control settings, especially reasoning effort or thinking level.
The failure mode, not just the useful output.
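The five items above map naturally onto one log entry per test. The sketch below appends each entry as a JSON line and refuses entries that are missing a required field, including the failure mode. The field names, the JSON Lines format, the temporary log location, and the example values are all assumptions made for illustration.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Any, Dict, List


@dataclass
class EvaluationRecord:
    prompt: str                # the exact prompt, verbatim
    model: str
    access_path: str           # e.g. "API key", "Cherry Studio", "Claude Code"
    evidence: List[str]        # evidence pack or files used
    controls: Dict[str, Any]   # e.g. {"reasoning effort": "high"}; {} means provider defaults
    failure_mode: str          # write "none observed" rather than leaving it blank

    def validate(self) -> None:
        required = ("prompt", "model", "access_path", "failure_mode")
        missing = [name for name in required if not getattr(self, name).strip()]
        if not self.evidence:
            missing.append("evidence")
        if missing:
            raise ValueError("incomplete record, missing: " + ", ".join(missing))


def append_record(record: EvaluationRecord, log_path: Path) -> None:
    """Validate, then append the record as one JSON line."""
    record.validate()
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record), ensure_ascii=False) + "\n")


log_file = Path(tempfile.mkdtemp()) / "model-trials.jsonl"  # hypothetical log location
append_record(
    EvaluationRecord(  # placeholder values for illustration
        prompt="Extract every reported sample size from the attached papers.",
        model="Claude Haiku 4.5",
        access_path="Cherry Studio",
        evidence=["paper-01.pdf", "paper-02.pdf"],
        controls={},
        failure_mode="merged two cohorts reported in the same table",
    ),
    log_file,
)
print(log_file.read_text(encoding="utf-8"))
```

An empty `controls` dict records that the run used provider defaults. This keeps the first phase of control testing visible in the log alongside later reasoning-effort or thinking-level runs.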

Info

The current pattern is defaults first, reasoning controls second, sampling last. Temperature still matters for some providers, such as DeepSeek, but it is no longer the universal first knob.

Once you know which models fit your work, wire them into the workspace you actually use: Cherry Studio, OpenCode, Vox MCP, Claude Code, or Antigravity CLI.
