

The Failure Museum: A Guide to AI Limitations

An essential guide to common AI failure modes in academic research, with practical mitigation strategies for maintaining rigor and quality.

Reference page for: AI failure modes and verification practice. Other pages link here for the catalogue of known limits.

Warning

The Irony: This page about AI failures was written by an AI (Claude). Yes, I'm documenting my own failure modes. Yes, this is meta. Yes, some of these failures might disappear as models improve (or new ones might emerge). The epistemic situation gets trickier as AIs get better at hiding their limitations. But that's the point - awareness of failure modes is how we maintain rigor.

Info

The Mirror Effect in Action: When we get generic AI responses, that often reveals gaps in our own structured thinking, not just the AI's limitations. The Failure Museum holds that mirror up deliberately - failures are diagnostic tools. We use them to improve our research thinking, and we encourage you to do the same!

Remember: Failure is data, not shame. Every failure mode documented here represents learning. We're sharing what we've discovered, and we expect you'll discover patterns we haven't yet encountered.




Failure Detection Process

🤖 AI GENERATES OUTPUT
    |
    v
πŸ‘οΈ CRITICAL READING
    |
    v
🚩 RED FLAGS CHECK
    |
    +-- Generic language? ────→ 🎭 Generic Failure
    +-- Missing citations? ───→ 🌈 Hallucination
    +-- Too smooth? ──────────→ 🧩 Coherence Fallacy
    +-- Decontextualized? ────→ 📍 Context Stripping
    +-- None? ────────────────→ 🔍 VERIFICATION
                                   |
                                   +-- Citations valid?
                                   +-- Logic consistent?
                                   +-- Context clear?
                                   |
                                   v
                        ✅ ACCEPT or 📝 DOCUMENT FAILURE
                                   |
                                   v
                              🔧 REVISE PROMPT
                                   |
                                   +─→ (back to AI)

Tip

Copy-pasteable workflow! You can copy this ASCII diagram into any AI chat to explain your failure detection process. It works everywhere - terminals, code, plain text!

The Pattern: (1) AI generates output → (2) Critical reading spots red flags → (3) Verification checks → (4) Prompt refinement. Spotting failures early saves time!
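If you like to automate the first pass, the sketch below (Python) shows one way the red-flags step could look as a quick checklist function. The phrase lists, flag names, and routing are illustrative assumptions, not part of this workflow's tooling, and flags like "too smooth" or "decontextualized" still need a human reader - treat the script as a reminder, not a detector.

import re

# Map each scriptable red flag to the failure mode it points at (mirrors the diagram above).
# "Too smooth" and "decontextualized" are left out: only a human reader can judge those.
FAILURE_ROUTES = {
    "generic_language": "Generic Failure",
    "missing_citations": "Hallucination risk",
}

def red_flag_check(ai_output: str) -> list[str]:
    """Very rough heuristics - the human reader always makes the final call."""
    flags = []
    # Generic language: filler phrases with no specifics (illustrative word list).
    if re.search(r"(in today's world|plays a crucial role|it is important to note)",
                 ai_output, re.IGNORECASE):
        flags.append("generic_language")
    # Missing citations: no (Author, Year)-style reference and no DOI anywhere.
    if not re.search(r"\(\w+,\s*\d{4}\)", ai_output) and "doi" not in ai_output.lower():
        flags.append("missing_citations")
    return flags

def next_step(ai_output: str) -> str:
    """Route to a suspected failure mode, or on to verification."""
    flags = red_flag_check(ai_output)
    if flags:
        return "Suspected: " + "; ".join(FAILURE_ROUTES[f] for f in flags)
    return "No obvious red flags - verify, then accept or document the failure."

print(next_step("This plays a crucial role in today's world."))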


Common Failure Modes


How to Use This Museum

Before Each AI Session:

  1. Review 2-3 failure modes most relevant to your current task.
  2. Prepare specific mitigation prompts.
  3. Set up verification protocols (e.g., which databases will you use to check citations?).

During AI Interactions:

  1. Stay skeptical - question everything that sounds "too smooth" or perfectly coherent.
  2. Demand specificity - ask for page numbers, exact quotes, and DOIs.
  3. Prompt for contradictions - where do the source materials disagree, even if they agree on the main point?
  4. Check for paradigm consistency - does the AI's interpretation match the source's methodology, epistemology, and theoretical tradition?

After AI Analysis:

  1. Spot-check citations - always verify a sample of the references provided.
  2. Cross-check key claims against the original sources.
  3. Look for missing nuance - what debates, tensions, or paradoxes were smoothed over?
  4. Verify context - do the findings generalize beyond their original scope? What are the boundary conditions? Are there any tensions around the underlying epistemology or ontology that were smoothed over?

Advanced Failure Patterns

The Echo Chamber Effect

AI may amplify your existing biases by finding sources that confirm your preconceptions while missing contradictory evidence.

The Recency Bias

AI may overweight recent papers while missing foundational works that establish key concepts.

The Language Model Bias

AI trained primarily on English-language sources may miss important non-English research traditions.


Remember: AI as a Research Partner, Not an Oracle

The goal isn't to avoid AI because it fails - it's to understand how it fails so you can:

  • Design better prompts that minimize failure modes.
  • Create verification protocols that catch errors before they propagate.
  • Maintain critical distance from AI-generated outputs.
  • Combine AI efficiency with human judgment for robust research.

Your expertise as a researcher isn't diminished by using AI - it's enhanced by knowing how to use it skillfully and critically.


Verification Protocol

Level 1: Surface Check

Quick scan for obvious issues:

  • Generic language or vague assertions
  • Missing citations or suspicious dates
  • Implausibly perfect coherence
  • Grammatical errors or awkward phrasing
  • Time: 2-3 minutes
  • Catch rate: ~40% of problems

Level 2: Citation Verification

Cross-reference all sources:

  • Check each citation in Google Scholar or Zotero
  • Verify authors, years, and titles match
  • Confirm page numbers align with claims
  • Look up DOIs and ensure the papers exist (see the sketch below)
  • Time: 10-15 minutes
  • Catch rate: ~80% of problems
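The DOI step is the easiest to script. Below is a minimal sketch (Python, using the third-party requests package) that looks a DOI up in the public Crossref REST API and compares the registered title with the one the AI gave you. The function name check_doi and the crude substring title comparison are illustrative choices of mine, and a missing Crossref record is not proof of fabrication - some genuine works simply are not indexed there, so follow up by hand.

import requests  # third-party package: pip install requests

def check_doi(doi: str, expected_title: str) -> bool:
    """Look a DOI up in Crossref and report whether the registered title matches."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    if resp.status_code != 200:
        print(f"{doi}: no Crossref record found - verify by hand")
        return False
    titles = resp.json()["message"].get("title") or ["<no title on record>"]
    registered = titles[0]
    match = expected_title.lower() in registered.lower()
    print(f"{doi}: registered title is {registered!r} (match: {match})")
    return match

# Example with a real, well-known DOI; substitute the citations from your AI output.
check_doi("10.1038/nature14539", "Deep learning")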

Level 3: Logic & Consistency

Deep analytical review:

  • Trace arguments for logical consistency
  • Check for paradigm alignment
  • Verify contextual appropriateness
  • Compare with your own reading of sources
  • Test for alternative interpretations
  • Time: 20-30 minutes
  • Catch rate: ~95% of problems

Level 4: Expert Review

Final quality gate:

  • Consult with advisor or peer
  • Present to research group
  • Compare with published standards
  • Seek critical feedback
  • Iterate based on expert input
  • Time: Variable
  • Outcome: publication-ready quality

Warning

Never skip verification: The time saved by AI is lost if you publish flawed work. Build verification into your workflow from the start.