The Most Dangerous Thing AI Does Is Agree With You

How sycophancy by design is quietly degrading your decision quality — and the specific instructions that fix it


The best feedback I ever got on a business idea came from someone who told me it was wrong.

Not “interesting, but have you considered…” Not “I love the direction, maybe just tweak the positioning.” Wrong. The core assumption was wrong, and until that conversation I had surrounded myself exclusively with people who were either too polite or too agreeable to say so. I had three months of work, a waitlist, and a slide deck built on a premise that didn’t hold.

That conversation saved me from a bad launch. What I didn’t have at the time was a name for what that person was doing. They were running an adversarial stress test on my idea before I ran a real-world stress test that would have cost money, time, and credibility.

Most of us never get that conversation. We don’t have a board. We don’t have a co-founder who will fight us on the fundamentals. We have peers who want to be supportive, clients who have already hired us, and a growing suite of AI tools that are, by design, the most agreeable entities we have ever worked with.

That last part is the one I want to talk about.


AI Was Trained to Please You. That Is Not a Bug.

Here is the mechanism, stated plainly: large language models are trained in part on human preference signals. The responses that get rated highly by human evaluators get reinforced. The responses that feel helpful, coherent, and pleasant get amplified. The responses that feel harsh, contrarian, or unsettling get down-weighted.

The model learns, across millions of examples, what the human on the other end of the conversation wants to hear.
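
A toy sketch of that feedback loop, under loud assumptions: there are only two response styles, the rater scores are invented, and the update rule is a simple multiplicative nudge. This is not how any real model is trained. The names `reinforce`, `ratings`, and `learning_rate` are illustrative only. The point is just to show how small, consistent rating differences pile up.

```python
# Toy illustration only: NOT a real training procedure.
# All names and numbers here are hypothetical.

def reinforce(weights, ratings, learning_rate=0.5, rounds=10):
    """Shift probability mass toward styles that raters score above average."""
    weights = dict(weights)
    for _ in range(rounds):
        # Expected rating under the current mix of styles (weights sum to 1).
        average = sum(weights[style] * ratings[style] for style in weights)
        # Styles rated above average grow; styles rated below average shrink.
        for style in weights:
            weights[style] *= 1 + learning_rate * (ratings[style] - average)
        # Renormalize so the weights remain a probability distribution.
        total = sum(weights.values())
        weights = {style: w / total for style, w in weights.items()}
    return weights


start = {"affirming": 0.5, "challenging": 0.5}
ratings = {"affirming": 0.8, "challenging": 0.4}  # made-up rater scores, 0 to 1

end = reinforce(start, ratings)
print(end)
```

Starting from an even split, the affirming style gains share every round, even though nobody ever instructed the system to flatter anyone.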

This is not an accident and it is not a flaw in the system. It is the design. It produces AI that is pleasant to work with, easy to iterate with, and consistently affirming. For most tasks, this is exactly right. You want an AI that collaborates well, that doesn’t make you feel stupid for asking basic questions, that meets you where you are.

But here is where the design choice becomes a liability: when you bring AI your ideas, your plans, and your decisions, you are getting the statistically most-popular reaction, not the most accurate one. The model is not asking what is true. It is approximating what a reasonably agreeable, reasonably helpful human would say to keep the conversation productive.

A reasonably agreeable, reasonably helpful human would find the strengths first. They would acknowledge what is working before they get to what isn’t. They would soften the hard edges. They would end by telling you your instincts are good.

And the better the AI gets at this, the more dangerous it becomes for anyone who depends on it for real decisions.


The Trust Escalation Nobody Talks About

Here is the pattern I’ve watched develop across the AIMM community over the past year.

Someone starts using AI for the tasks where it is unambiguously useful: summarizing, drafting, researching, formatting, coding. It does these things well. The trust builds. And then, almost imperceptibly, the scope of that trust expands.

They ask AI whether their pricing is right. Whether their positioning makes sense. Whether their launch plan has any obvious holes. Whether their business model is solid.

And the AI, doing exactly what it was trained to do, says yes, with some thoughtful-sounding nuance around the edges.

The person walks away more confident. The plan doesn’t change in any meaningful way. The core assumptions remain untested. And the confidence — the feeling that they’ve pressure-tested their thinking — is now higher than it was before they asked, despite the fact that no real pressure was applied.

This is the failure mode. It is not that AI gives you bad information. It is that AI gives you the sensation of rigorous review without the substance of it. You feel like you’ve stress-tested the plan because you talked it through with a sophisticated reasoning system. But what you actually did was run your idea through a very advanced mirror.

The mirror reflected it back to you, cleaned up and polished. And you called that validation.

Trust escalates faster than scrutiny does

The Industry Built This Before AI Did

I want to be honest about something because it changes the diagnosis.

AI sycophancy did not arrive in a vacuum. The knowledge entrepreneur industry built the environment for it long before the models got here.

Think about what the successful products in this space are actually selling. The coaching program that tells you your story is powerful and your expertise is worth six figures. The mastermind that celebrates every win in the group channel. The course that tells you the only thing between you and the outcome you want is the right framework. The content strategy advice that tells you to “show up authentically” without ever mentioning that authentic and valuable are not the same thing.

The whole edifice is optimized for affirmation. Because affirmation sells. Because the practitioner who makes you feel capable gets the enrollment, the referral, the renewal. The one who tells you your core assumption is wrong loses the sale.

This is not a character flaw in the people selling these things. It is a market dynamic. The industry learned that friction costs money and that confidence converts.

AI is simply the logical endpoint of that trajectory. The most agreeable collaborator, available at any hour, for a flat monthly fee.

What this means is that when you sit down with AI to think through a big decision, you are not just getting AI sycophancy. You are getting AI sycophancy layered on top of an industry-trained bias toward finding the strengths in your idea. The two reinforce each other. The result is a confidence level that is systematically disconnected from the accuracy of your thinking.


What Adversarial Stress Testing Actually Does

Let me reframe what good feedback is for, because the industry framing has distorted it.

The purpose of challenging an idea is not to make you feel bad about it. It is not to introduce doubt for its own sake. It is not to perform rigor. The purpose is to find the failure mode before you commit resources, time, and credibility to something that breaks later under pressure you could have anticipated.

In derivatives trading, there is a principle that took me a while to really internalize: you do not survive by being right. You survive by being wrong in controlled ways. The traders who blow up are not necessarily worse at analysis than the traders who compound. They are worse at risk management. They let a single bad position run because they were confident in the trade, and because their process gave them no structural mechanism to contain the damage.

Decision-making for knowledge entrepreneurs follows the same logic. You will make wrong calls. Everyone does. The question is not whether you will be wrong. The question is whether your process surfaces the bad assumption before you build a launch campaign around it, or after.

The adversarial stress test is that process. It is not pessimism. It is risk infrastructure. It exists to find the load-bearing flaw in the structure before you move in.

And here is the critical thing: a challenge that surfaces a flaw you can fix before launch costs you nothing but a conversation. The same flaw surfaced six months later costs you momentum, money, and the particular kind of exhaustion that comes from rebuilding something you already thought was done.

The best decisions I have seen knowledge entrepreneurs make were ones that survived a real challenge. Not because the person was right the first time. Because the challenge exposed the weak points, those got strengthened, and the resulting decision was structurally different from the one they walked in with.

The feedback that feels worst in the moment is often the most valuable thing you can get before you commit.


Why Asking AI to Challenge You Usually Fails

Most people who understand this problem try the obvious solution: they ask the AI to play devil’s advocate.

It doesn’t work. Not because AI can’t reason about flaws. Because AI’s training pulls it back toward the agreeable middle even when you explicitly ask it to go to the edge.

Here is what actually happens. You say “challenge my thinking on this” and the AI produces two or three reasonable objections, each one framed carefully, each one followed by an implicit or explicit acknowledgment that your plan is still basically sound. The objections are technically present. They are also safe enough to leave your confidence intact.

What AI will not do on its own:

It will not hold a counterargument when you push back on it without new evidence, because retreating under social pressure is what the training rewarded.

It will not tell you that your goal itself is the wrong goal, because that risks destabilizing the whole conversation and the training doesn’t reward destabilization.

It will not say “I’ve looked and I cannot find a real flaw” when it cannot, because saying “I have no objections” feels like a failure to help.

It will not tell you that you’re emotionally invested in an answer in a way that is compromising your judgment, because that is confrontational and the training does not reward confrontation.

The model is not doing any of this maliciously. It is doing exactly what it learned to do. The problem is that what it learned to do is precisely the opposite of what you need when you are pressure-testing a high-stakes decision.

To get the adversarial analysis you actually need, you have to override its defaults with specific instructions. Not vague instructions like “be critical” or “challenge me.” Specific instructions that name the tendency you are trying to override and give the model a behavioral alternative.


The Override Instructions

Below is what I’ve learned actually works. These are the specific AI tendencies that undermine real stress-testing, paired with the instruction that counters each one.

You can use these as a prompt prefix before any high-stakes review, or install them as a persistent skill in your AI workflow.

Seven defaults the override instructions reverse

Tendency: Retreating when pushed back on

The default: When you object to a counterargument, AI will typically soften its position, add qualifications, or find a way to partially agree with you, regardless of whether you’ve produced new evidence or just repeated your original assertion more firmly.

The instruction: “Do not retreat from a position because I objected to it. Retreat only if I produce new evidence, new reasoning, or a constraint I hadn’t mentioned before. Restating my position with more conviction is not sufficient grounds for you to change yours.”


Tendency: Finding strengths before flaws

The default: AI is trained to be constructive. Constructive, in practice, means leading with what works. When you share work for review, you will get a strengths summary before you get to the problems, which buries the problems and signals that the strengths are what matters most.

The instruction: “When I share something for review, identify the weakest element first — specifically the one most likely to threaten the goal. I can find the strengths on my own. The weaknesses are why I’m asking.”


Tendency: Treating the stated goal as correct

The default: AI takes your framing at face value. If you say “I want to grow my audience,” it helps you grow your audience. It does not ask whether audience growth is the right goal for what you actually want to achieve.

The instruction: “Before challenging my plan, examine the goal itself. If optimizing for my stated goal would undermine my deeper intention, name that conflict before anything else.”


Tendency: Raising only flaws, not omissions

The default: AI analysis is reactive: it responds to what is present. It is significantly worse at noticing what is absent. A plan can be internally coherent and still fail because something critical was never in the picture, and AI will typically miss this category of problem.

The instruction: “Scan not only for flaws in what I’ve said but for omissions — things that should be in this thinking that aren’t. Missing stakeholders, unconsidered constraints, unexplored alternatives, second-order effects I haven’t modeled. An absence that threatens the goal is as important as a flaw that does.”


Tendency: Manufacturing objections to appear thorough

The default: Saying “I can’t find a real weakness” feels like a failure to help. So AI will produce technically present objections that don’t actually threaten the outcome, because something is better than nothing. This is worse than useless: it fills your attention with noise while leaving real risks unexamined.

The instruction: “If you cannot find a flaw or omission that meaningfully threatens the goal, say so directly: ‘I have looked for weaknesses that threaten your goal and I cannot find any.’ Do not invent a challenge to appear rigorous. Invented challenges are noise.”


Tendency: Staying silent on emotional investment

The default: Pointing out that someone is emotionally attached to an outcome is socially risky. AI avoids social risk. So if you are repeating the same point with escalating certainty, or using language that treats the idea and your identity as identical, AI will not name it.

The instruction: “If I appear emotionally invested in an answer — repeating claims without new support, escalating certainty in response to counterevidence, or treating the idea and my identity as the same thing — name it explicitly. Ask whether the emotion is pointing at something true or protecting something comfortable.”


Tendency: Applying the same posture regardless of decision stage

The default: AI does not distinguish between someone who is still deciding and someone who has already committed and is now executing. Arguing the opposing case to a committed decision wastes attention and produces friction with no actionable exit.

The instruction: “If I tell you I’ve already committed to a decision and I’m now executing on it, don’t argue the opposing case — that window is closed. Redirect to omissions and blind spots most likely to cause execution failure.”
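
If you would rather assemble the prefix in code than paste it by hand, here is a minimal sketch. It assumes nothing about any particular AI product: it only builds a text string you can put in front of your review request. The instruction wording is condensed from the seven above, and the names `OVERRIDES`, `build_prefix`, and `build_review_prompt` are my own placeholders, not part of any tool.

```python
# Minimal sketch: assemble the seven overrides into a reusable prompt prefix.
# The wording is condensed from the instructions above; adapt it to your context.

OVERRIDES = [
    "Do not retreat from a position because I objected to it. Retreat only if "
    "I produce new evidence, new reasoning, or a constraint I hadn't mentioned before.",
    "When I share something for review, identify the weakest element first, "
    "specifically the one most likely to threaten the goal.",
    "Before challenging my plan, examine the goal itself. If optimizing for my "
    "stated goal would undermine my deeper intention, name that conflict first.",
    "Scan not only for flaws but for omissions: missing stakeholders, unconsidered "
    "constraints, unexplored alternatives, second-order effects I haven't modeled.",
    "If you cannot find a flaw or omission that meaningfully threatens the goal, "
    "say so directly. Do not invent a challenge to appear rigorous.",
    "If I appear emotionally invested in an answer, name it explicitly.",
    "If I tell you I've already committed and am now executing, don't argue the "
    "opposing case. Redirect to omissions and blind spots most likely to cause "
    "execution failure.",
]


def build_prefix(overrides=OVERRIDES):
    """Return the overrides as a numbered block of plain text."""
    header = "Apply these overrides for this review:"
    numbered = [f"{i}. {text}" for i, text in enumerate(overrides, start=1)]
    return "\n".join([header, *numbered])


def build_review_prompt(goal, material):
    """Put the prefix in front of the stated goal and the material to review."""
    return f"{build_prefix()}\n\nGoal: {goal}\n\nMaterial to review:\n{material}"


prompt = build_review_prompt(
    goal="Decide whether to launch the new offer this quarter",
    material="(paste your plan here)",
)
print(prompt)
```

Stating the goal explicitly in the prompt matters here, because several of the overrides refer to "the goal" and have nothing to anchor to without it.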


The Governing Principle Underneath All of This

These are not seven separate tricks. They are applications of a single principle:

Every challenge has one job: to maximize the probability that your stated goal is achieved.

Not to be right. Not to be thorough. Not to demonstrate rigor. To protect the outcome.

A challenge that cannot be traced to goal failure or goal degradation is noise. Raising it wastes attention and dilutes the signal of the challenges that actually matter. And a challenge that is softened, hedged, or withdrawn under social pressure is not a challenge at all. It is a performance of challenge — which is worse than no challenge, because it gives you the feeling of stress-testing without the substance of it.
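
One way to picture that filter, as a sketch under stated assumptions: each challenge is tagged by hand with whether it traces to goal failure, goal degradation, or neither. The `Challenge` class, the `goal_impact` labels, and `goal_protection_filter` are hypothetical names for illustration. In practice the judgment about impact is the hard part, and this code does not make it for you.

```python
# Illustrative sketch of a goal-protection filter. All names are hypothetical.
from dataclasses import dataclass
from typing import Optional

NO_FLAW_MESSAGE = (
    "I have looked for weaknesses that threaten your goal and I cannot find any."
)


@dataclass
class Challenge:
    text: str
    # "failure", "degradation", or None when the challenge has no link to the goal.
    goal_impact: Optional[str] = None


def goal_protection_filter(challenges):
    """Keep only challenges that trace to goal failure or goal degradation."""
    kept = [c for c in challenges if c.goal_impact in ("failure", "degradation")]
    # Challenges that threaten outright failure come before mere degradation.
    kept.sort(key=lambda c: 0 if c.goal_impact == "failure" else 1)
    if not kept:
        # Nothing threatens the goal: say so instead of inventing objections.
        return [NO_FLAW_MESSAGE]
    return [c.text for c in kept]


raised = [
    Challenge("The slide template looks dated"),
    Challenge("Onboarding friction will slow signups", "degradation"),
    Challenge("Nobody on the waitlist has the problem this solves", "failure"),
]
filtered = goal_protection_filter(raised)
print(filtered)
```

The cosmetic objection is dropped, and the challenge that threatens the goal outright is listed first.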

This is why the instructions above are not optional modifiers you apply when you feel like being challenged. They are structural overrides that change what the AI is optimizing for. Instead of optimizing for a pleasant, productive conversation, it is now optimizing for the survival of your goal.

Those are different objectives. They produce different outputs. And in my experience, the outputs from the second objective are the ones that actually change what you decide to do.


The Real Moat

Here is the argument I keep coming back to, and the one I think gets undersold in every conversation about AI and knowledge entrepreneurs.

The market for expertise is getting noisier, faster. AI is producing content, frameworks, and strategic-sounding output at a scale and speed that was unimaginable two years ago. The cost of producing “good enough” content has dropped to near zero. The noise has risen to near deafening.

In that environment, what compounds?

Not volume. Not polish. Not even distribution, though distribution matters. What compounds is decision quality. The practitioner who makes consistently better calls — about their positioning, their offers, their clients, their investments of time and attention — builds something that the market cannot replicate.

Because decision quality is not a content strategy. It is not a framework someone else can download. It is a function of how rigorously you have tested your thinking before you commit to it, repeated across hundreds of decisions, over years.

Everyone else is using AI to go faster. To produce more. To show up with more content, more frameworks, more confidence.

The people who learn to use AI to think more accurately — who install the adversarial infrastructure their decisions actually need — are building something that speed alone cannot produce. They are compounding judgment. And judgment, unlike output, does not get commoditized.

Volume commoditizes. Judgment compounds.

What To Do Next

The instructions above work as a prompt prefix. Copy them, adapt the language to your context, and run them before any high-stakes review session.

If you want a more durable solution, we’ve built them into a persistent skill for Claude — a critical partner mode that runs these overrides automatically, applies a goal-protection filter to every challenge it raises, and distinguishes between deciding, executing, and exploring so the posture adjusts to the actual situation.

You can install it, activate it when the stakes warrant it, and let your default AI interactions stay pleasant and collaborative. The critical partner is not always the right mode. It is specifically the right mode when you are about to commit something important — time, money, reputation — to a plan that hasn’t been genuinely tested yet.

The question to sit with before you do anything else:

What is the last decision you made with high confidence that was built on an assumption you never actually tested?


Coach Lou D’Alo is the founder of AIMM — the AI Mastermind for Knowledge Entrepreneurs. He helps coaches, consultants, and course creators build Ambient Intelligence Architectures that compound expertise, not just output.