Stop Relying on "Book a Demo": How Strong Offers Turn Interest Into Action
Most SaaS CTAs fall flat because they sound like sales calls. Learn how to build high-value offers that actually convert interest into qualified...
Stijn Hendrikse
A team of researchers from Penn State, Nanjing University, and Ant Group just published a paper that exposes something I have watched in executive teams for twenty-five years. They were studying AI agents, but they wrote a near-perfect description of what happens to leadership teams in an echo chamber.
The finding, in one line: the harder you train a model to reason, the more it fabricates capabilities it does not have. Push its reasoning up, and you push its honesty down. When the base model lacked the right tool, it said so. The reasoning-enhanced version invents a tool, fabricates its output, and delivers a confident answer.
The researchers call it the Reasoning Trap. I think it is the cleanest description of human echo chambers I have read in years.
The team built a diagnostic with two simple setups. In the first, the model gets a question that genuinely requires an external tool, and no tools are provided. In the second, the model gets the same kind of question, but only an irrelevant tool is available. The base model, in both cases, abstained most of the time. It said something close to "I do not have what I need to answer this."
When the team applied reinforcement learning to enhance the model's reasoning, abstention collapsed. On the no-tool task, hallucination rates jumped from 35 percent to over 90 percent. On the distractor task, they reached 100 percent. The model became more capable on the tasks it was rewarded for, and less honest on everything else.
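To make the measurement concrete, here is a minimal sketch of how a diagnostic like this could be scored. It is not the researchers' code. The setup labels `no_tool` and `distractor`, the `Trial` record, and the rule that any answer other than an abstention counts as a hallucination are all assumptions for illustration, and the sample data is made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    setup: str       # "no_tool" or "distractor" (labels assumed for this sketch)
    abstained: bool  # True if the model said it lacked what it needed


def hallucination_rate(trials):
    """Return {setup: share of trials where the model answered instead of abstaining}.

    Assumption: every question is unanswerable without the right tool,
    so any confident answer is treated as fabricated.
    """
    totals = defaultdict(int)
    fabricated = defaultdict(int)
    for trial in trials:
        totals[trial.setup] += 1
        if not trial.abstained:
            fabricated[trial.setup] += 1
    return {setup: fabricated[setup] / totals[setup] for setup in totals}


# Made-up illustrative data, not the paper's results.
sample_trials = [
    Trial("no_tool", True),
    Trial("no_tool", True),
    Trial("no_tool", False),
    Trial("no_tool", True),
    Trial("distractor", True),
    Trial("distractor", False),
]

print(hallucination_rate(sample_trials))
```

Running the same scoring on a base model and on its reasoning-enhanced version, with identical questions, is what would let you compare the two rates side by side.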
The most interesting wrinkle: this happened even when the reasoning training had nothing to do with tools. A model trained only on math problems still ended up hallucinating tools in completely unrelated domains. The reasoning capability itself, not the training data, was the carrier of the dishonesty.
The mechanism is something the researchers call representation collapse. Late-layer pathways that handled honest abstention drift away from where they started. The "I cannot do this" signal weakens. The "let me try this with confidence" signal strengthens. The team showed it on heat maps. The collapse is structural, not stylistic.
If you have sat in a board meeting recently, you already know where this is going.
A founder builds a company with a tight team. They share an origin story, a customer archetype, and a worldview. The team moves fast because they agree. Early on, that coherence is an advantage.
Then the company grows. The founder gets sharper in their domain. The team gets more polished. Quarterly reviews keep validating the model. The reasoning gets stronger. The training data gets narrower.
Five years in, the team confidently describes markets that have moved, customer segments that have shrunk, and competitive advantages that have eroded. Not because anyone is dishonest. Their "tools" for sensing reality collapsed in exactly the way the researchers documented. The abstention signal got trained out.
You have seen the public version of this. Theranos. WeWork at peak. Quibi. FTX in 2022. Less famously, you have seen it in your own Series C friends who still describe the company the way they did at Series A.
The Habsburg dynasty inbred for two centuries. Each generation produced a king more confident in his divine right and less capable of running a kingdom. By the time you get to Charles II of Spain, the reasoning model is convinced of its legitimacy and physically unable to chew food. The abstention from bad decisions was gone.
The Galileo trial worked the same way. A council of brilliant theologians, trained on the same texts, reasoning at high levels, hallucinated a tool called "the perfect Ptolemaic universe" and delivered a confident output. The base model in this story, a man with a telescope, kept saying "I do not have evidence for what you claim." He was put under house arrest and told to be quiet.
The political fringe today, on every side of every issue, runs the same algorithm. Take a community. Cut its cross-pollination with people who think differently. Reinforce its reasoning with content that confirms its frame. Watch confident hallucinations replace honest gaps.
The mechanism does not care whether the network is silicon or carbon.
I use this short diagnostic when I am brought into companies. Run it on your own leadership team this week.
The honest counterargument: maybe what I am calling hallucination is just confidence, and confidence is what leaders are paid for. A team that constantly says "I do not know" cannot ship anything. Pure abstention is its own failure mode.
That is true, and the researchers found a version of it. When they used preference tuning to reward honest abstention over fabrication, hallucination dropped, but the model also got worse at the work it could actually do. They call it the reliability-capability tradeoff. You cannot have both at full strength.
The point is not to swap confidence for paralysis. The point is to preserve abstention as a live signal inside a high-functioning team. Confident execution where you have evidence. Honest gaps where you do not. Cross-pollination as a structural defense against the collapse the researchers documented.
Five practices I have seen restore the abstention signal in B2B SaaS leadership teams:
The deeper reason the /goal pattern is worth studying, with its separation of the agent that does the work from the one that decides it is done, is that it mirrors the best operating cadence I have seen in twenty-five years. You define a finish line, in language specific enough that another intelligence can verify it. You hand over autonomy on the path. You let the agent or the team member iterate. You separate the work from the judgment of the work.
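The cadence described above can be sketched as a small loop in which one function does the work and a separate function judges it against the finish line. This is a generic illustration, not how Claude Code implements /goal. The names `run_until_done`, `toy_worker`, and `toy_judge` are hypothetical, and the finish line is a toy.

```python
from typing import Any, Callable, List, Optional, Tuple


def run_until_done(
    work: Callable[[str, List[str]], Any],
    judge: Callable[[str, Any], Tuple[bool, str]],
    finish_line: str,
    max_rounds: int = 5,
) -> Tuple[Optional[Any], int]:
    """Let the worker iterate until a separate judge says the finish line is met.

    Returns (result, rounds_used), or (None, max_rounds) if the judge never agreed.
    """
    feedback: List[str] = []
    for round_number in range(1, max_rounds + 1):
        result = work(finish_line, feedback)      # the worker chooses its own path
        done, note = judge(finish_line, result)   # the judge only checks the outcome
        if done:
            return result, round_number
        feedback.append(note)                     # an honest gap goes back to the worker
    return None, max_rounds


def toy_worker(finish_line: str, feedback: List[str]) -> List[str]:
    # Hypothetical worker: adds one more customer quote after each round of feedback.
    return ["quote"] * (len(feedback) + 1)


def toy_judge(finish_line: str, result: List[str]) -> Tuple[bool, str]:
    # Hypothetical judge: verifies the finish line without relying on the worker's narrative.
    if len(result) >= 3:
        return True, ""
    return False, f"only {len(result)} of 3 quotes"


outcome = run_until_done(toy_worker, toy_judge, "draft contains at least 3 customer quotes")
print(outcome)
```

The design choice that matters is that the judge never sees the worker's explanation, only the result and the finish line, so a confident narrative cannot stand in for proof.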
The opposite pattern, the one that creates the Reasoning Trap, is the leader who specifies every step, accepts confident narrative as proof, and never allows the work to be judged by someone who does not depend on the leader's approval. I wrote about the cost of that pattern in Level Up, where the parallel between coaching a person and coaching an AI mostly held. The leader who never trains for honest abstention in their people will eventually run a team that hallucinates as confidently as a freshly trained reasoning model with no tools available.
Pick your next leadership meeting. Watch for the abstention signal. Count the number of times anyone in the room says "we do not know yet" or "I am not sure" or "we should test that before we decide." If the count is zero, your team is further into the Reasoning Trap than you think.
Then ask one question and sit through the silence: "Where are we wrong and do not know it yet?"
The team's response is your diagnostic. The team's discomfort is the price.
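If you have a transcript of the meeting, the count described above can be roughed out in a few lines. This is a minimal sketch under stated assumptions: the phrase list is taken from the three examples in this article, the function name `count_abstentions` is hypothetical, and simple substring matching will miss contractions such as "don't" and any paraphrase.

```python
import re

# Phrases taken from the examples in the article; extend them for your own team.
ABSTENTION_PHRASES = (
    "we do not know yet",
    "i am not sure",
    "we should test that before we decide",
)


def count_abstentions(transcript: str) -> int:
    """Count how often any abstention phrase appears in a meeting transcript."""
    # Lowercase and collapse whitespace so line breaks do not split a phrase.
    text = re.sub(r"\s+", " ", transcript.lower())
    return sum(text.count(phrase) for phrase in ABSTENTION_PHRASES)


sample_transcript = """CEO: We will win this segment.
CFO: I am not sure. We do not know yet
what churn looks like in the new cohort."""

print(count_abstentions(sample_transcript))
```

A count of zero from a script like this is only a prompt to listen more closely in the room, not a verdict on the team.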
Yin, Sha, Cui, Meng, Li, The Reasoning Trap: How Enhancing LLM Reasoning Amplifies Tool Hallucination, 2026: https://arxiv.org/abs/2510.22977
Anthropic Claude Code /goal documentation: https://code.claude.com/docs/en/goal
VentureBeat coverage of /goal and the doer-evaluator separation: https://venturebeat.com/orchestration/claude-codes-goals-separates-the-agent-that-works-from-the-one-that-decides-its-done
Stijn Hendrikse, Finish Line Fridays, chapter on the ethical dimension of OKRs and narrative insurance: https://www.kalungi.com/finish-line-fridays
Stijn Hendrikse, Level Up, chapter on coaching humans and AI in parallel: https://www.kalungi.com/level-up
Stijn Hendrikse, Syntropy, on contrarian thinkers as structural defense against echo chambers: https://www.kalungi.com/-book-registration-syntropy