The Irony: This page about AI failures was written by an AI (Claude). Yes, I’m documenting my own failure modes. Yes, this is meta. Yes, some of these failures might disappear as models improve (or new ones might emerge). The epistemic situation gets trickier as AIs get better at hiding their limitations. But that’s the point - awareness of failure modes is how we maintain rigor.
The Mirror Effect in Action: When we get generic AI responses, it often reveals gaps in our structured thinking, not just the AI’s limitations. The Failure Museum is a mirror that speaks to the mirror - failures are diagnostic tools. We use them to improve our research thinking, and we encourage you to do the same!

Remember: Failure is data, not shame. Every failure mode documented here represents learning. We’re sharing what we’ve discovered, and we expect you’ll discover patterns we haven’t yet encountered.

Exhibit Guide

Jump to specific failure modes:

  • Exhibit 1: Hallucinated Citations
  • Exhibit 2: Paradigm Blindness
  • Exhibit 3: The Coherence Fallacy
  • Exhibit 4: Context Stripping
  • Exhibit 5: Construct Conflation
  • Exhibit 6: Method Mismatch
  • Exhibit 7: Citation Genealogy Errors

Failure Detection Process

🤖 AI GENERATES OUTPUT
    |
    v
👁️ CRITICAL READING
    |
    v
🚩 RED FLAGS CHECK
    |
    +-- Generic language? ────→ 🎭 Generic Failure
    +-- Missing citations? ───→ 🌈 Hallucination
    +-- Too smooth? ──────────→ 🧩 Coherence Fallacy
    +-- Decontextualized? ────→ 📍 Context Stripping
    +-- None? ────────────────→ 🔍 VERIFICATION
                                   |
                                   +-- Citations valid?
                                   +-- Logic consistent?
                                   +-- Context clear?
                                   |
                                   v
                        ✅ ACCEPT or 📝 DOCUMENT FAILURE
                                   |
                                   v
                              🔧 REVISE PROMPT
                                   |
                                   +─→ (back to AI)
Copy-pasteable workflow! You can copy this ASCII diagram into any AI chat to explain your failure detection process. It works everywhere - terminals, code, plain text!

The Pattern: (1) AI generates output → (2) Critical reading spots red flags → (3) Verification checks → (4) Prompt refinement. Spotting failures early saves time!
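
If you prefer code to diagrams, here is a minimal Python sketch of the red-flags step. The marker lists are illustrative assumptions, not validated heuristics - a flag means “look closer”, never “reject automatically”.

```python
# Minimal sketch of the "RED FLAGS CHECK" step above.
# The marker lists are illustrative assumptions, not validated heuristics.

GENERIC_MARKERS = ["it is important to note", "plays a crucial role", "in today's fast-paced"]
CONTRAST_MARKERS = ["however", "in contrast", "on the other hand"]  # their absence is the flag

def red_flags(ai_output: str, has_citations: bool, context_stated: bool) -> list[str]:
    """Return a list of red flags to investigate before moving on to verification."""
    text = ai_output.lower()
    flags = []
    if any(marker in text for marker in GENERIC_MARKERS):
        flags.append("generic language -> possible Generic Failure")
    if not has_citations:
        flags.append("missing citations -> possible Hallucination")
    if not any(marker in text for marker in CONTRAST_MARKERS):
        flags.append("suspiciously smooth -> possible Coherence Fallacy")
    if not context_stated:
        flags.append("no scope stated -> possible Context Stripping")
    return flags  # an empty list means "proceed to verification", not "accept"

# Example: red_flags(draft, has_citations=True, context_stated=False)
```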

Common Failure Modes

Exhibit 1: Hallucinated Citations

The Failure

The AI generates a plausible-sounding citation that doesn’t exist, often combining a real author’s name with a real journal and a fitting (but fake) title.

Example (Bad)

“As Barney (1991) noted in his follow-up in Strategic Management Journal, the inimitability of resources also depends on the firm’s dynamic capabilities framework integration.”
What’s Wrong: While Barney did write about resource inimitability, there is no 1991 follow-up paper in SMJ with this exact focus.

Prevention Strategies

  • Always verify every single citation against your Zotero library or Google Scholar (a verification sketch follows this list)
  • Check publication years and cross-reference with known works
  • Use specific prompts: “Provide exact page numbers and DOIs for all citations”
  • Ask AI to flag any citations it’s uncertain about
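
The first strategy in the list above can be partly automated. The sketch below queries the public Crossref REST API (a real, free service); treat a weak or missing match as a cue to check manually, not as proof that a citation is fabricated.

```python
# Sketch: rough citation check against the public Crossref REST API.
# A weak or missing match means "verify by hand", not "definitely fabricated".
import requests

def crossref_lookup(reference: str, rows: int = 3) -> list[dict]:
    """Query Crossref with a free-text reference and return the top candidate records."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": reference, "rows": rows},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    return [
        {
            "title": (item.get("title") or ["<no title>"])[0],
            "year": item.get("issued", {}).get("date-parts", [[None]])[0][0],
            "doi": item.get("DOI"),
        }
        for item in items
    ]

# Example:
# for candidate in crossref_lookup("Barney 1991 firm resources sustained competitive advantage"):
#     print(candidate)
```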

Detection Tips

  • Citations that sound “too perfect” for your argument
  • Dates that don’t align with the author’s career timeline
  • Titles that use modern terminology for older papers

Exhibit 2: Paradigm Blindness

The Failure

The AI interprets a paper from a critical or interpretive paradigm through a purely positivist lens, missing the epistemological nuance.

Example (Bad)

“The study found that the key variables influencing technology adoption were the network, the actors, and the technology itself…”
What’s Wrong: An Actor-Network Theory paper isn’t about “variables” affecting “outcomes” - it’s about relational ontology and performativity.

Prevention Strategies

  • Prime for paradigm awareness: “From an interpretive perspective, what are the key sensemaking processes…” (illustrative prompt stems follow this list)
  • Ask explicitly about ontological and epistemological framing
  • Request clarification of the paper’s theoretical tradition
  • Compare with papers from different paradigms
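
To make the “prime for paradigm awareness” strategy concrete, here is a small set of illustrative prompt stems. The wording is an assumption on our part - adapt it to your field and theoretical tradition.

```python
# Illustrative prompt stems for paradigm-aware reading; adapt the wording to your field.
PARADIGM_PROMPTS = {
    "interpretive": (
        "From an interpretive perspective, what sensemaking and meaning-making processes "
        "does this paper foreground, and how does the author handle reflexivity?"
    ),
    "critical": (
        "From a critical perspective, what power relations and taken-for-granted "
        "assumptions does this paper problematize?"
    ),
    "positivist": (
        "From a positivist perspective, what constructs, hypotheses, and "
        "measurement choices does this paper rely on?"
    ),
}

def paradigm_prompt(paradigm: str, paper_title: str) -> str:
    """Build a paradigm-primed reading prompt for a given paper."""
    return f"{PARADIGM_PROMPTS[paradigm]} Paper: {paper_title}"
```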

Detection Tips

  • Statistical language applied to qualitative studies
  • “Variables” and “outcomes” used for interpretive work
  • Missing discussion of researcher reflexivity
  • Lack of attention to meaning-making processes

Exhibit 3: The Coherence Fallacy

The Failure

The AI synthesizes contradictory findings into a single, smooth paragraph that masks the underlying academic debate.

Example (Bad)

“Research shows that organizational slack is beneficial for innovation (Bourgeois, 1981), as it provides resources for experimentation…”
What’s Wrong: This presents a false consensus, smoothing over decades of complex debate about optimal slack levels, types of slack, and contingency factors.

Prevention Strategies

  • Prompt for contradictions: “Where do these authors disagree with each other?”
  • Ask for tensions and boundary conditions explicitly
  • Request: “What debates exist in this literature?”
  • Demand synthesis of DISAGREEMENT, not just agreement

Detection Tips

  • Suspiciously smooth narratives
  • Lack of “however” or “in contrast” statements (a simple screening sketch follows this list)
  • No mention of competing theories
  • Everyone seemingly agrees
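
The “lack of however or in contrast” cue can be screened mechanically. The sketch below simply counts contrast and disagreement markers - a coarse heuristic of our own, not a substitute for reading the synthesis.

```python
# Coarse heuristic: count contrast/disagreement markers in a synthesis.
# A very low count is a cue to probe for smoothed-over debate, nothing more.
import re

CONTRAST_MARKERS = [
    "however", "in contrast", "by contrast", "on the other hand",
    "whereas", "disagree", "contested", "debate",
]

def contrast_marker_count(synthesis: str) -> dict[str, int]:
    """Return how often each contrast/disagreement marker appears in the text."""
    text = synthesis.lower()
    return {m: len(re.findall(r"\b" + re.escape(m) + r"\b", text)) for m in CONTRAST_MARKERS}

# Example: if sum(contrast_marker_count(draft).values()) == 0, ask the AI
# "Where do these authors disagree with each other?" before accepting the synthesis.
```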

Exhibit 4: Context Stripping

The Failure

The AI extracts a finding from its original context (e.g., a study on large manufacturing firms in the 1980s) and presents it as a general, universal truth.

Example (Bad)

“Research shows that organizational learning requires cross-functional teams.”
What’s Wrong: The context is missing - this finding came from software development firms in Silicon Valley (2010-2015) and may not generalize to other industries, regions, or time periods.

Prevention Strategies

  • Always ask for scope: “What is the context of this study (industry, firm size, geography, time period)?” (a minimal scope-checklist sketch follows this list)
  • Probe generalizability: “Has this been replicated in other contexts?”
  • Request boundary conditions: “Where would this NOT apply?”
  • Check for contextual caveats in the original paper
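
One way to force the scope question is to keep a tiny context record per study and see which fields stay empty. A minimal sketch - the field names are suggestions, not a standard:

```python
# Sketch: a minimal record of a study's scope, used to force the context question.
# Field names are illustrative; extend them to match your own review protocol.
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class StudyContext:
    industry: Optional[str] = None
    firm_size: Optional[str] = None
    geography: Optional[str] = None
    time_period: Optional[str] = None
    sample: Optional[str] = None

    def missing(self) -> list[str]:
        """Names of context fields the AI (or your notes) never filled in."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

# Example: StudyContext(industry="software", time_period="2010-2015").missing()
# -> ["firm_size", "geography", "sample"]  - each gap is a scope question to ask next.
```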

Detection Tips

  • Broad claims without qualifiers
  • Missing sample characteristics
  • No discussion of generalizability limits
  • Findings presented as universal laws

Exhibit 5: Construct Conflation

The Failure

When asked to define a complex construct, the AI blends multiple definitions into a single, generic, and often meaningless “average” definition that satisfies no particular theoretical tradition.

Example (Bad)

“Organizational culture is the shared values, beliefs, and assumptions that guide behavior in organizations.”
What’s Wrong: This bland definition obscures important theoretical distinctions between Schein’s levels model, Martin’s fragmentation perspective, and Hofstede’s dimensions.

Prevention Strategies

  • Ask for definitional variety: “How have different authors defined organizational culture? Present their definitions in a table.” (a side-by-side table sketch follows this list)
  • Request theoretical grounding: “What are the competing conceptualizations?”
  • Probe assumptions: “What does each definition assume about culture’s nature?”
  • Compare and contrast approaches explicitly
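
To follow the “definitions in a table” strategy, keep competing definitions side by side in your notes instead of letting them blur into one average. A minimal sketch - the entries are placeholders, so paste each author’s exact wording yourself:

```python
# Sketch: keep competing definitions side by side instead of letting them blur
# into one "average" definition. The entries are placeholders for your own notes.
definitions = [
    {"author": "Schein", "view": "levels model", "definition": "<paste exact wording here>"},
    {"author": "Martin", "view": "fragmentation perspective", "definition": "<paste exact wording here>"},
    {"author": "Hofstede", "view": "dimensions", "definition": "<paste exact wording here>"},
]

def as_markdown_table(rows: list[dict]) -> str:
    """Render the definition entries as a simple Markdown table."""
    header = "| Author | Perspective | Definition |\n|---|---|---|"
    body = "\n".join(f"| {r['author']} | {r['view']} | {r['definition']} |" for r in rows)
    return header + "\n" + body

# print(as_markdown_table(definitions))
```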

Detection Tips

  • Definitions that sound like textbook boilerplate
  • No attribution to specific theorists
  • Missing theoretical tensions or debates
  • One-size-fits-all explanations

Exhibit 6: Method Mismatch

The Failure

The AI suggests analytical approaches that don’t match the paper’s actual methodology or recommends methods incompatible with the epistemological stance.

Example (Bad)

“To test these findings, future research could use structural equation modeling to identify the causal relationships…”
(In response to a grounded theory paper about sensemaking processes.)

What’s Wrong: Suggesting a positivist quantitative method for extending an interpretivist qualitative study violates paradigm consistency.

Prevention Strategies

  • Ask about methodology alignment: “What methods would be consistent with this paper’s approach?” (a rough consistency-check sketch follows this list)
  • Verify paradigm consistency: “Would the original authors recommend this?”
  • Request epistemological grounding for suggestions
  • Compare methodological affordances and constraints
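
As a deliberately rough illustration of the consistency check, the sketch below maps a few methods to the paradigm they are most often associated with. This is an assumption-laden simplification, good only for flagging obvious mismatches, not a methodological rulebook.

```python
# Deliberately rough illustration of the paradigm-consistency check.
# Real methodological choices need judgment; this only flags obvious mismatches.
from typing import Optional

PARADIGM_METHODS = {
    "interpretivist": {"grounded theory", "ethnography", "narrative analysis", "case study"},
    "positivist": {"survey", "experiment", "structural equation modeling", "regression"},
}

def consistency_warning(paper_paradigm: str, suggested_method: str) -> Optional[str]:
    """Return a warning if a suggested method sits in a different paradigm's typical toolkit."""
    for paradigm, methods in PARADIGM_METHODS.items():
        if suggested_method.lower() in methods and paradigm != paper_paradigm:
            return (f"'{suggested_method}' is usually associated with {paradigm} work; "
                    f"check whether it fits a {paper_paradigm} study before extending it.")
    return None

# Example: consistency_warning("interpretivist", "structural equation modeling")
```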

Detection Tips

  • Quantitative methods suggested for interpretive studies
  • Positivist language (variables, causation) for constructivist work
  • Generalization emphasis for context-specific findings
  • Ignoring methodological limitations stated by authors

Exhibit 7: Citation Genealogy Errors

The Failure

The AI incorrectly identifies who cited whom, misattributes ideas to the wrong authors, or confuses the intellectual genealogy of concepts.

Example (Bad)

“Porter introduced the concept of dynamic capabilities in his 1980 work on competitive strategy.”
What’s Wrong: Dynamic capabilities were developed by Teece, Pisano, and Shuen (1997), not Porter. Porter (1980) focused on competitive forces.

Prevention Strategies

  • Verify attribution: “Who originally developed this concept? Provide the exact citation.” (a chronology-check sketch follows this list)
  • Check intellectual genealogy: “Who built on this idea first?”
  • Request chronological accuracy: “What’s the timeline of this concept’s development?”
  • Cross-reference with your Zotero library
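
For a rough first pass at a concept’s timeline, you can ask the public Crossref REST API for the oldest records matching a phrase. Keyword matches are noisy, so treat the output as a reading list to verify, never as the answer.

```python
# Sketch: rough "who published on this phrase earliest" check via Crossref.
# Keyword matches are noisy; treat the result as a reading list, not an authority.
import requests

def earliest_matches(concept: str, rows: int = 5) -> list[dict]:
    """Return the oldest Crossref records whose metadata matches the concept phrase."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": concept, "rows": rows, "sort": "published", "order": "asc"},
        timeout=10,
    )
    resp.raise_for_status()
    return [
        {
            "year": item.get("issued", {}).get("date-parts", [[None]])[0][0],
            "title": (item.get("title") or ["<no title>"])[0],
            "doi": item.get("DOI"),
        }
        for item in resp.json()["message"]["items"]
    ]

# Example: earliest_matches("dynamic capabilities strategic management")
```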

Detection Tips

  • Anachronistic attributions (recent concepts to old papers)
  • Conflation of related but distinct concepts
  • Missing key contributors to a theoretical tradition
  • Simplified genealogies that skip important developments

How to Use This Museum

Before Each AI Session:

  1. Review 2-3 failure modes most relevant to your current task.
  2. Prepare specific mitigation prompts.
  3. Set up verification protocols (e.g., which databases will you use to check citations?).

During AI Interactions:

  1. Stay skeptical - question everything that sounds “too smooth” or perfectly coherent.
  2. Demand specificity - ask for page numbers, exact quotes, and DOIs.
  3. Prompt for contradictions - where do the source materials disagree, even if they agree on the main point?
  4. Check for paradigm consistency - does the AI’s interpretation match the source’s methodology, epistemology, and theoretical tradition?

After AI Analysis:

  1. Spot-check citations - always verify at least a sample of the references provided.
  2. Cross-check key claims against the original sources.
  3. Look for missing nuance - what debates, tensions, or paradoxes were smoothed over?
  4. Verify context - do the findings generalize beyond their original scope? What are the boundary conditions? Are there any tensions around the underlying epistemology or ontology that were smoothed over?
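
The “DOCUMENT FAILURE” step in the workflow above can be as simple as an append-only log. Here is a minimal sketch of what a failure record might look like - the field names are suggestions, so adapt them to your own museum.

```python
# Minimal sketch of a failure-log entry for your own "museum"; field names are suggestions.
from dataclasses import dataclass, field, asdict
from datetime import date
import json

@dataclass
class FailureRecord:
    failure_mode: str      # e.g. "hallucinated citation"
    prompt: str            # the prompt that produced the output
    symptom: str           # what tipped you off
    fix: str               # how you revised the prompt or verification step
    logged_on: str = field(default_factory=lambda: date.today().isoformat())

def log_failure(record: FailureRecord, path: str = "failure_museum.jsonl") -> None:
    """Append one failure record as a JSON line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record)) + "\n")
```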

Advanced Failure Patterns

The Echo Chamber Effect

AI may amplify your existing biases by finding sources that confirm your preconceptions while missing contradictory evidence.

The Recency Bias

AI may overweight recent papers while missing foundational works that establish key concepts.

The Language Model Bias

AI trained primarily on English-language sources may miss important non-English research traditions.

Remember: AI as a Research Partner, Not an Oracle

The goal isn’t to avoid AI because it fails - it’s to understand how it fails so you can:
  • Design better prompts that minimize failure modes.
  • Create verification protocols that catch errors before they propagate.
  • Maintain critical distance from AI-generated outputs.
  • Combine AI efficiency with human judgment for robust research.
Your expertise as a researcher isn’t diminished by using AI - it’s enhanced by knowing how to use it skillfully and critically.
