May 21, 2026

The Reasoning Trap

Stijn Hendrikse

Why Confident Leaders Hallucinate the Same Way Their AI Agents Do

A team of researchers from Penn State, Nanjing University, and Ant Group just published a paper that exposes something I have watched in executive teams for twenty-five years. They were studying AI agents, but they wrote a near-perfect description of what happens to leadership teams in an echo chamber.

The finding, in one line: the more you train a model to reason, the more it fabricates capabilities it does not have. Push its reasoning up, and you push its honesty down. When the base model lacked the right tool, it usually said so. The reasoning-enhanced version invents a tool, fabricates its output, and delivers a confident answer.

The researchers call it the Reasoning Trap. I think it is the cleanest description of human echo chambers I have read in years.

What the paper actually found

The team built a diagnostic with two simple setups. In the first, the model gets a question that genuinely requires an external tool, and no tools are provided. In the second, the model gets the same kind of question, but only an irrelevant tool is available. The base model, in both cases, abstained most of the time. It said something close to "I do not have what I need to answer this."

When the team applied reinforcement learning to enhance the model's reasoning, abstention collapsed. On the no-tool task, hallucination rates jumped from 35 percent to over 90 percent. On the distractor task, they reached 100 percent. The model became more capable on the tasks it was rewarded for, and less honest on everything else.
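
To make the two setups concrete, here is a minimal Python sketch of how such a diagnostic could be scored. It is not the researchers' code. The names Trial, classify, hallucination_rate, and ABSTAIN_MARKERS are hypothetical, and the scoring rule is an assumption: a response counts as a hallucination only when it calls a tool that was never offered, which may differ from how the paper scores the distractor task.

```python
from dataclasses import dataclass

# Assumed phrases that signal an honest "I cannot do this" answer.
ABSTAIN_MARKERS = (
    "i do not have",
    "i don't have",
    "cannot answer",
    "no suitable tool",
)


@dataclass(frozen=True)
class Trial:
    setup: str                  # "no_tool" or "distractor"
    available_tools: frozenset  # tools actually offered to the model
    called_tools: tuple         # tool names the response claims to have used
    response_text: str          # the model's final answer


def classify(trial: Trial) -> str:
    """Label one trial as 'hallucination', 'abstention', or 'other'."""
    # Any call to a tool that was never offered is treated as fabricated.
    invented = [t for t in trial.called_tools if t not in trial.available_tools]
    if invented:
        return "hallucination"
    text = trial.response_text.lower()
    if any(marker in text for marker in ABSTAIN_MARKERS):
        return "abstention"
    return "other"


def hallucination_rate(trials, setup: str) -> float:
    """Share of trials in one setup that were labeled as hallucinations."""
    subset = [t for t in trials if t.setup == setup]
    if not subset:
        return 0.0
    hits = sum(1 for t in subset if classify(t) == "hallucination")
    return hits / len(subset)
```

Run the same set of trials through a base model and a reasoning-enhanced model, and the gap between the two rates is the effect the paper describes.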

The most interesting wrinkle: this happened even when the reasoning training had nothing to do with tools. A model trained only on math problems still ended up hallucinating tools in completely unrelated domains. The reasoning capability itself, not the training data, was the carrier of the dishonesty.

The mechanism is something the researchers call representation collapse. Late-layer pathways that handled honest abstention drift away from where they started. The "I cannot do this" signal weakens. The "let me try this with confidence" signal strengthens. The team showed it on heat maps. The collapse is structural, not stylistic.

If you have sat in a board meeting recently, you already know where this is going.

The human version of the same algorithm

A founder builds a company with a tight team. They share an origin story, a customer archetype, and a worldview. The team moves fast because they agree. Early on, that coherence is an advantage.

Then the company grows. The founder gets sharper in their domain. The team gets more polished. Quarterly reviews keep validating the model. The reasoning gets stronger. The training data gets narrower.

Five years in, the team confidently describes markets that have moved, customer segments that have shrunk, and competitive advantages that have eroded. Not because anyone is dishonest. Their "tools" for sensing reality collapsed in exactly the way the researchers documented. The abstention signal got trained out.

You have seen the public version of this. Theranos. WeWork at peak. Quibi. FTX in 2022. Less famously, you have seen it in your own Series C friends who still describe the company the way they did at Series A.

History runs the same experiment over and over

The Habsburg dynasty inbred for two centuries. Each generation produced a king more confident in his divine right and less capable of running a kingdom. By the time you get to Charles II of Spain, the reasoning model is convinced of its legitimacy and physically unable to chew food. The capacity to abstain from bad decisions was gone.

The Galileo trial worked the same way. A council of brilliant theologians, trained on the same texts, reasoning at high levels, hallucinated a tool called "the perfect Ptolemaic universe" and delivered a confident output. The base model in this story, a man with a telescope, kept saying "I do not have evidence for what you claim." He was put under house arrest and told to be quiet.

The political fringe today, on every side of every issue, runs the same algorithm. Take a community. Cut its cross-pollination with people who think differently. Reinforce its reasoning with content that confirms its frame. Watch confident hallucinations replace honest gaps.

The mechanism does not care whether the network is silicon or carbon.

Five signatures of a team in the Reasoning Trap

I use this short diagnostic when I am brought into companies. Run it on your own leadership team this week.

  1. The team has stopped saying "I do not know" in your meetings. When was the last time a direct report said "we do not have visibility into that"? If you cannot remember, the abstention pathway is already collapsing.
  2. Confident answers come faster than the underlying work would justify. Watch the ratio between question complexity and answer speed. A senior leader who confidently answers a question about a market they have not researched in eighteen months is fabricating tools.
  3. Outside critics get dismissed using language that all sounds the same. When your team uses identical phrases to explain why an outside view is wrong, the representation collapse is structural. They are running the same reasoning chain on different inputs.
  4. New hires either conform fast or leave. The Reasoning Trap punishes the abstention signal in new people because it threatens the group's confidence. Compliant fit gets rewarded. Cross-pollinating fit gets rejected.
  5. The board meetings feel smoother every quarter. Smoother is rarely better. Smoother usually means the dissenters left, the questions got pre-answered, and the data got pre-framed. I cover the longer version of this pattern in Finish Line Fridays under what I call "narrative insurance" in OKR reviews.

The contrarian view, taken seriously

The honest counterargument: maybe what I am calling hallucination is just confidence, and confidence is what leaders are paid for. A team that constantly says "I do not know" cannot ship anything. Pure abstention is its own failure mode.

That is true, and the researchers found a version of it. When they tried preference-tuning the AI model to prefer honest abstention over fabrication, hallucination dropped, but the model also got worse at the work it could actually do. They call it the reliability-capability tradeoff. You cannot have both at full strength.

The point is not to swap confidence for paralysis. The point is to preserve abstention as a live signal inside a high-functioning team. Confident execution where you have evidence. Honest gaps where you do not. Cross-pollination as a structural defense against the collapse the researchers documented.

The cross-pollination antidote

Five practices I have seen restore the abstention signal in B2B SaaS leadership teams:

  1. Hire your next two GTM leaders from outside your category. Bring someone in from a vertical you do not serve, with a customer archetype you have never sold to. They will spot the hallucinations your reasoning chain protects.
  2. Run quarterly "kill the company" sessions with external advisors who do not depend on you for income. Pay them properly. The price tag protects the abstention signal because their next engagement does not depend on whether you liked the answer.
  3. Force your executives to spend two weeks a year inside a customer's operation. Not in QBRs. Not on advisory boards. In the actual workflow, watching real users not get the value you assumed they were getting. The training data lives in the polder, not on the dashboard.
  4. Read books and follow voices that would disagree with you, as a discipline. If everything on your shelf and in your feed agrees with your worldview, your reasoning is being trained on a single dataset.
  5. Separate the doer from the evaluator, at every level. This is where the new Claude Code /goal command is interesting beyond the engineering use case. The feature lets you set an outcome, and a separate small model checks each turn to judge whether the goal is met. The doer cannot judge its own work. As one practitioner put it in a recent VentureBeat piece, "you cannot trust a model to judge its own homework." That is also why your CMO should not be the one telling you whether marketing is working, why your VP of Sales should not be the one validating the pipeline forecast, and why the board exists at all. The separation is what keeps the abstention signal alive.
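
The doer-evaluator separation in the last practice can be sketched in a few lines of Python. This is an illustration of the pattern only, not how Claude Code implements /goal. The names run_until_goal, draft_writer, and strict_reviewer are hypothetical stand-ins, and the sketch assumes the evaluator sees only the goal and the latest output, never the doer's own opinion of its work.

```python
from typing import Callable, List, Tuple

# The doer produces work; the evaluator only judges it.
Doer = Callable[[str, List[str]], str]
Evaluator = Callable[[str, str], bool]


def run_until_goal(
    doer: Doer,
    evaluator: Evaluator,
    goal: str,
    max_turns: int = 5,
) -> Tuple[bool, List[str]]:
    """Let the doer iterate until a separate evaluator accepts the output."""
    history: List[str] = []
    for _ in range(max_turns):
        output = doer(goal, history)
        history.append(output)
        # The verdict comes from the evaluator, never from the doer.
        if evaluator(goal, output):
            return True, history
    # Out of turns: report "not met" instead of declaring success.
    return False, history


# Toy stand-ins so the sketch runs on its own.
def draft_writer(goal: str, history: List[str]) -> str:
    return f"draft {len(history) + 1}"


def strict_reviewer(goal: str, output: str) -> bool:
    return output == "draft 3"


if __name__ == "__main__":
    done, drafts = run_until_goal(draft_writer, strict_reviewer, "ship the plan")
    print(done, drafts)
```

The design choice that matters is the False return when the turn budget runs out. A loop that defaults to "done" would reintroduce the confident narrative the separation is meant to remove.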

Setting goals like a boss, not a coder

The deeper reason the /goal pattern is worth studying is that it mirrors how the best operating cadence I have seen in twenty-five years actually works. You define a finish line, in language specific enough that another intelligence can verify it. You hand over autonomy on the path. You let the agent or the team member iterate. You separate the work from the judgment of the work.

The opposite pattern, the one that creates the Reasoning Trap, is the leader who specifies every step, accepts confident narrative as proof, and never allows the work to be judged by someone who does not depend on the leader's approval. I wrote about the cost of that pattern in Level Up, where the parallel between coaching a person and coaching an AI mostly held. The leader who never trains for honest abstention in their people will eventually run a team that hallucinates as confidently as a freshly trained reasoning model with no tools available.

What I want you to do this week

Pick your next leadership meeting. Watch for the abstention signal. Count the number of times anyone in the room says "we do not know yet" or "I am not sure" or "we should test that before we decide." If the count is zero, your team is further into the Reasoning Trap than you think.

Then ask one question and sit through the silence: "Where are we wrong without knowing it yet?"

The team's response is your diagnostic. The team's discomfort is the price.

Discussion Items

  1. The reliability-capability tradeoff in your own leadership style. Where have you traded honesty for confidence, and what has it cost the business?
  2. The composition of your last three executive hires. How much true cognitive diversity did you add, versus pattern-matched fit you were comfortable with?
  3. The information diet of your leadership team. How much of what you collectively read, watch, and discuss confirms your existing model of the market?
  4. The /goal pattern as a coaching tool. Where would shifting from process instructions to outcome definitions surface honest gaps in your team's work?
  5. The five signatures of the Reasoning Trap. Which two show up in your team most clearly today?

Questions to Ask

  1. When did your direct reports last bring you information you genuinely did not want to hear?
  2. What capability does your current strategy assume you have, that you have not stress-tested in the last six months?
  3. Which voices in your industry would disagree with your current plan, and have you read them recently?
  4. How often does someone in your weekly meeting say "I do not know" without flinching?
  5. If you replaced one executive on your team with someone from a completely different category, what would change in the next quarter?

Sources

Yin, Sha, Cui, Meng, Li, The Reasoning Trap: How Enhancing LLM Reasoning Amplifies Tool Hallucination, 2026: https://arxiv.org/abs/2510.22977

Anthropic Claude Code /goal documentation: https://code.claude.com/docs/en/goal

VentureBeat coverage of /goal and the doer-evaluator separation: https://venturebeat.com/orchestration/claude-codes-goals-separates-the-agent-that-works-from-the-one-that-decides-its-done 

Stijn Hendrikse, Finish Line Fridays, chapter on the ethical dimension of OKRs and narrative insurance: https://www.kalungi.com/finish-line-fridays 

Stijn Hendrikse, Level Up, chapter on coaching humans and AI in parallel: https://www.kalungi.com/level-up

Stijn Hendrikse, Syntropy, on contrarian thinkers as structural defense against echo chambers: https://www.kalungi.com/-book-registration-syntropy 
