Why agentic AI pilots fail and how to scale safely

At the AI Accelerator Institute Summit in New York, Oren Michels, Co-founder and CEO of Barndoor AI, joined a one-on-one discussion with Alexander Puutio, Professor and Author, to explore a question facing every enterprise experimenting with AI: Why do so many AI pilots stall, and what will it take to unlock real value?

Barndoor AI launched in May 2025. Its mission addresses a gap Oren has seen over decades working in data access and security: how to secure and manage AI agents so they can deliver on their promise in enterprise settings.

“What you’re really here for is the discussion about AI access,” he told the audience. “There’s a real need to secure AI agents, and frankly, the approaches I’d seen so far didn’t make much sense to me.”

AI pilots are being built, but Oren was quick to point out that deployment is where the real challenges begin.

As Alexander noted:

“If you’ve been around AI, as I know everyone here has, you’ve seen it. There are pilots everywhere…”

Why AI pilots fail

Oren didn’t sugarcoat the current state of enterprise AI pilots:

“There are lots of them. And many are wrapping up now without much to show for it.”

Alexander echoed that hard truth with a personal story. In a Forbes column, he’d featured a CEO who was bullish on AI, front-loading pilots to automate calendars and streamline doctor communications. But just three months later, the same CEO emailed him privately:

“Alex, I need to talk to you about the pilot.”

The reality?

“The whole thing went off the rails. Nothing worked, and the vendor pulled out.”

Why is this happening? According to Oren, it starts with a misconception about how AI fits into real work:

“When we talk about AI today, people often think of large language models, like ChatGPT. And that means a chat interface.”

But this assumption is flawed.

“That interface presumes that people do their jobs by chatting with a smart PhD about what to do. That’s just not how most people work.”

Oren explained that most employees engage with specific tools and data. They apply their training, gather information, and produce work products. That’s where current AI deployments miss the mark, except in coding:

“Coding is one of those rare jobs where you do hand over your work to a smart expert and say, ‘Here’s my code, it’s broken, help me fix it.’ LLMs are great at that. But for most functions, we need AI that engages with tools the way people do, so it can do useful, interesting work.”

The promise of agents and the real bottleneck

Alexander pointed to early agentic AI experiments, like Devin, touted as the first AI software engineer:

“When you actually looked at what the agent did, it didn’t really do that much, right?”

Oren agreed. The issue wasn’t the technology; it was the disconnect between what people expect agents to do and how they actually work:

“There’s this promise that someone like Joe in finance will know how to tell an agent to do something useful. Joe’s probably a fantastic finance professional, but he’s not part of that subset who knows how to instruct computers effectively.”

He pointed to Zapier as proof: a no-code tool that didn’t replace coders.

“The real challenge isn’t just knowing how to code. It’s seeing these powerful tools, understanding the business problems, and figuring out how to connect the two. That’s where value comes from.”

And too often, Oren noted, companies think money alone will solve it. CEOs invest heavily and end up with nothing to show because:

“Maybe the human process, or how people actually use these tools, just isn’t working.”

This brings us to what Oren called the real bottleneck: access, not just to AI, but to what AI can access.

“We give humans access based on who they are, what they’re doing, and how much we trust them. But AI hasn’t followed that same path. Just having AI log in like a human and click around isn’t that interesting; that’s just scaled-up robotic process automation.”

Instead, enterprises need to define:

- What they trust an agent to do
- The rights of the human behind it
- The rules of the system it's interacting with
- The specific task at hand

These intersect to form what Oren called a multi-dimensional access problem:

“Without granular controls, you end up either dialing agents back so much they’re less useful than humans, or you risk over-permissioning. The goal is to make them more useful than humans.”
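As a rough illustration, a check along those lines might look like the sketch below, where access is granted only when all four dimensions line up. The policy structure, role names, and tasks are invented for this example and are not a description of any vendor's implementation.

```python
from dataclasses import dataclass

@dataclass
class AccessRequest:
    agent_id: str    # which agent is asking (agent-level trust would add a further dimension)
    human_role: str  # the rights of the human the agent acts for
    system: str      # the system being touched, e.g. "salesforce"
    action: str      # what the agent wants to do, e.g. "read_record"
    task: str        # the specific task at hand

# Illustrative policy table: each rule constrains the dimensions at once.
POLICIES = [
    {
        "system": "salesforce",
        "action": "read_record",
        "allowed_roles": {"finance_analyst", "sales_ops"},
        "tasks": {"quarterly_report"},
    },
]

def is_allowed(req: AccessRequest) -> bool:
    """Grant access only when the target system, the action, the human's
    role, and the task at hand all match an explicit rule."""
    for rule in POLICIES:
        if (
            rule["system"] == req.system
            and rule["action"] == req.action
            and req.human_role in rule["allowed_roles"]
            and req.task in rule["tasks"]
        ):
            return True
    return False  # default-deny: anything not explicitly permitted is blocked
```

The default-deny fallthrough is the point: an agent gets nothing it hasn't been explicitly granted for that combination of system, role, and task.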

Why specialized agents are the future (and how to manage the “mess”)

As the conversation shifted to access, Alexander posed a question many AI leaders grapple with: When we think about role- and permission-based access, are we really debating the edges of agentic AI?

“Should agents be able to touch everything, like deleting Salesforce records, or are we heading toward hyper-niche agents?”

Oren was clear on where he stands:

“I’d be one of those people making the case for niche agents. It’s the same as how we hire humans. You don’t hire one person to do everything. There’s not going to be a single AI that rules them all, no matter how good it is.”

Instead, as companies evolve, they’ll seek out specialized tools, just like they hire specialized people.

“You wouldn’t hire a bunch of generalists and hope the company runs smoothly. The same will happen with agents.”

But with specialization comes complexity. Alexander put it bluntly:

“How do we manage the mess? Because, let’s face it, there’s going to be a mess.”

Oren welcomed that reality:

“The mess is actually a good thing. We already have it with software. But you don’t manage it agent by agent, there will be way too many.”

The key is centralized management:

- A single place to manage all agents
- Controls based on what agents are trying to do, and the role of the human behind them
- System-specific safeguards, because admins (like your Salesforce or HR lead) need to manage what's happening in their domain

“If each agent or its builder had its own way of handling security, that wouldn’t be sustainable. And you don’t want agents or their creators deciding their own security protocols – that’s probably not a great idea.”
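One way to picture that centralization, purely as a sketch, is a single gateway that every agent call passes through, with each domain admin plugging in the safeguard for their own system. The class and method names below are hypothetical.

```python
# Hypothetical central gateway: one place to register agents, with
# system-specific safeguards supplied by each domain's admin.
class AgentGateway:
    def __init__(self):
        self.agents = {}         # agent_id -> {"owner": ..., "trust": ...}
        self.system_guards = {}  # system name -> safeguard callable

    def register_agent(self, agent_id, owner, trust_level):
        self.agents[agent_id] = {"owner": owner, "trust": trust_level}

    def register_guard(self, system, guard_fn):
        # e.g. the Salesforce admin supplies and maintains the Salesforce guard
        self.system_guards[system] = guard_fn

    def call(self, agent_id, system, action, payload):
        agent = self.agents.get(agent_id)
        if agent is None:
            raise PermissionError("unregistered agent")
        guard = self.system_guards.get(system)
        if guard is None or not guard(agent, action, payload):
            raise PermissionError(f"{system} safeguard rejected {action}")
        # At this point the request would be forwarded to the real system.
        return {"status": "ok", "system": system, "action": action}
```

The design choice this illustrates is the one Oren describes: security lives in one shared layer, not inside each agent or at the discretion of each agent's builder.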

Why AI agents need guardrails and onboarding

The question of accountability loomed large. When humans manage fleets of AI agents, where does responsibility sit?

Oren was clear:

“There’s human accountability. But we have to remember: humans don’t always know what the agents are going to do, or how they’re going to do it. If we’ve learned anything about AI so far, it’s that it can have a bit of a mind of its own.”

He likened agents to enthusiastic interns – eager to prove themselves, sometimes overstepping in their zeal:

“They’ll do everything they can to impress. And that’s where guardrails come in. But it’s hard to build those guardrails inside the agent. They’re crafty. They’ll often find ways around internal limits.”

The smarter approach? Start small:

- Give agents a limited scope.
- Watch their behavior.
- Extend trust gradually, just as you would with a human intern who earns more responsibility over time.
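A minimal way to express that gradual extension of trust is a ladder that maps an agent's reviewed track record to a widening scope. The scopes and thresholds below are invented for illustration.

```python
# Illustrative trust ladder: an agent starts with a narrow, read-only scope
# and earns broader permissions as its reviewed track record grows.
TRUST_LEVELS = {
    0: {"read:reports"},                                    # new "intern" agent
    1: {"read:reports", "draft:emails"},                    # human reviews every output
    2: {"read:reports", "draft:emails", "update:records"},  # periodic spot checks
}

def current_scope(successful_reviewed_tasks: int) -> set:
    """Return the scope an agent has earned so far (threshold is arbitrary here)."""
    level = min(successful_reviewed_tasks // 50, max(TRUST_LEVELS))
    return TRUST_LEVELS[level]
```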

This led to the next logical step: onboarding. Alexander asked whether bringing in AI agents is like an HR function.

Oren agreed and shared a great metaphor from Nvidia’s Jensen Huang:

“You have your biological workforce, managed by HR, and your agent workforce, managed by IT.”

Just as companies use HR systems to manage people, they’ll need systems to manage, deploy, and train AI agents so they’re efficient and, as Alexander added, safe.

How to manage AI’s intent

Speed is one of AI’s greatest strengths and risks. As Oren put it:

“Agents are, at their core, computers, and they can do things very, very fast. One CISO I know described it perfectly: she wants to limit the blast radius of the agents when they come in.”

That idea resonated. Alexander shared a similar reflection from a security company CEO:

“AI can sometimes be absolutely benevolent, no problem at all, but you still want to track who’s doing what and who’s accessing what. It could be malicious. Or it could be well-intentioned but doing the wrong thing.”

Real-world examples abound, from models like Anthropic’s Claude “snitching” on users to AI trying to protect its own code base in unintended ways.

So, how do we manage the intent of AI agents?

Oren drew a striking contrast to traditional computing:

“Historically, computers did exactly what you told them; whether that’s what you wanted or not. But that’s not entirely true anymore. With AI, sometimes they won’t do exactly what you tell them to.”

That makes managing them a mix of art and science. And, as Oren pointed out, this isn’t something you can expect every employee to master:

“It’s not going to be Joe in finance spinning up an agent to do their job. These tools are too powerful, too complex. Deploying them effectively takes expertise.”

Why pilots stall and how innovation spreads

If agents could truly do it all, Oren quipped:

“They wouldn’t need us here, they’d just handle it all on their own.”

But the reality is different. When Alexander asked about governance failures, Oren pointed to a subtle but powerful cause of failure. Not reckless deployments, but inertia:

“The failure I see isn’t poor governance in action, it’s what’s not happening. Companies are reluctant to really turn these agents loose because they don’t have the visibility or control they need.”

The result? Pilot projects that go nowhere.

“It’s like hiring incredibly talented people but not giving them access to the tools they need to do their jobs and then being disappointed with the results.”

In contrast, successful AI deployments come from open organizations that grant broader access and trust. But Oren acknowledged the catch:

“The larger you get as a company, the harder it is to pull off. You can’t run a large enterprise that way.”

So, where does innovation come from?

“It’s bottom-up, but also outside-in. You’ll see visionary teams build something cool, showcase it, and suddenly everyone wants it. That’s how adoption spreads, just like in the API world.”

And to bring that innovation into safe, scalable practice:

- Start with governance and security so people feel safe experimenting.
- Engage both internal teams and outside experts.
- Focus on solving real business problems, not just deploying tech for its own sake.

Oren put it bluntly:

“CISOs and CTOs, they don’t really have an AI problem. But the people creating products, selling them, managing finance – they need AI to stay competitive.”

Trusting AI from an exoskeleton to an independent agent

The conversation circled back to a critical theme: trust.

Alexander shared a reflection that resonated deeply:

“Before ChatGPT, the human experience with computers was like Excel: one plus one is always two. If something went wrong, you assumed it was your mistake. The computer was always right.”

But now, AI behaves in ways that can feel unpredictable, even untrustworthy. What does that mean for how we work with it?

Oren saw this shift as a feature, not a flaw:

“If AI were completely linear, you’d just be programming, and that’s not what AI is meant to be. These models are trained on the entirety of human knowledge. You want them to go off and find interesting, different ways of looking at problems.”

The power of AI, he argued, comes not from treating it like Google, but from engaging it in a process:

“My son works in science at a biotech startup in Denmark. He uses AI not to get the answer, but to have a conversation about how to find the answer. That’s the mindset that leads to success with AI.”

And that mindset extends to gradual trust:

“Start by assigning low-risk tasks. Keep a human in the loop. As the AI delivers better results over time, you can reduce that oversight. Eventually, for certain tasks, you can take the human out of the loop.”

Oren summed it up with a powerful metaphor:

“You start with AI as an exoskeleton; it makes you bigger, stronger, faster. And over time, it can become more like the robot that does the work itself.”

The spectrum of agentic AI and why access controls are key

Alexander tied the conversation to a helpful analogy from a JP Morgan CTO: agentic AI isn’t binary.

“There’s no clear 0 or 1 where something is agentic or isn’t. At one end, you have a fully trusted system of agents. On the other hand, maybe it’s just a one-shot prompt or classic RPA with a bit of machine learning on top.”

Oren agreed:

“You’ve described the two ends of the spectrum perfectly. And with all automation, the key is deciding where on that spectrum we’re comfortable operating.”

He compared it to self-driving cars:

“Level 1 is cruise control; Level 5 is full autonomy. We’re comfortable somewhere in the middle right now. It’ll be the same with agents. As they get better, and as we get better at guiding them, we’ll move further along that spectrum.”

And how do you navigate that safely? Oren returned to the importance of access controls:

“When you control access outside the agent layer, you don’t have to worry as much about what’s happening inside. The agent can’t see or write to anything it isn’t allowed to.”

That approach offers two critical safeguards:

- It prevents unintended actions.
- It provides visibility into attempts, showing when an agent tries to do something it shouldn’t, so teams can adjust the instructions before harm is done.

“That lets you figure out what you’re telling it that’s prompting that behavior, without letting it break anything.”
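A simple sketch of that pattern, assuming a wrapper that sits outside the agent and in front of every system call: deny anything outside the granted permissions, and log the attempt so the team can see what the agent was trying to do. The function and its parameters are illustrative.

```python
import logging

logger = logging.getLogger("agent_access")

def guarded_call(agent_id, permissions, system, action, payload, execute):
    """Enforce access outside the agent layer: block anything not explicitly
    permitted, and log the attempt so teams can see what prompted it."""
    if (system, action) not in permissions:
        logger.warning("blocked: agent=%s attempted %s on %s", agent_id, action, system)
        return {"status": "denied", "system": system, "action": action}
    return execute(system, action, payload)  # forward only permitted requests
```

Because the control sits outside the agent, a blocked attempt becomes a signal for tuning the agent's instructions rather than an incident to clean up.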

The business imperative and the myth of the chat interface

At the enterprise level, Oren emphasized that the rise of the Chief AI Officer reflects a deeper truth:

“Someone in the company recognized that we need to figure this out to compete. Either you solve this before your competitors and gain an advantage, or you fall behind.”

And that, Oren stressed, is why this is not just a technology problem, it’s a business problem:

“You’re using technology, but you’re solving business challenges. You need to engage the people who have the problems, and the folks solving them, and figure out how AI can make that more efficient.”

When Alexander asked about the biggest myth in AI enterprise adoption, Oren didn’t hesitate:

“That the chat interface will win.”

While coders love chat interfaces because they can feed in code and get help, most employees don’t work that way:

“Most people don’t do their jobs through chat-like interaction. And most don’t know how to use a chat interface effectively. They see a box, like Google search, and that doesn’t work well with AI.”

He predicted that within five years, chat interfaces will be niche. The real value?

“Agents doing useful things behind the scenes.”

How to scale AI safely

Finally, in response to a closing question from Alexander, Oren offered practical advice for enterprises looking to scale AI safely:

“Visibility is key. We don’t fully understand what happens inside these models; no one really does. Any tool that claims it can guarantee behavior inside the model? I’m skeptical.”

Instead, Oren urged companies to focus on where they can act:

“Manage what goes into the tools, and what comes out. Don’t believe you can control what happens within them.”
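In code terms, that advice amounts to wrapping the model call rather than trying to reach inside it. A minimal sketch, with hypothetical redact and validate_output helpers standing in for whatever input and output checks an enterprise actually uses:

```python
def run_with_io_controls(task_input, call_model, redact, validate_output):
    """Govern what goes into the model and what comes out of it,
    without assuming any control over what happens inside."""
    safe_input = redact(task_input)       # e.g. strip secrets or PII before the call
    raw_output = call_model(safe_input)   # the model itself is treated as opaque
    return validate_output(raw_output)    # check and filter results before they touch live systems
```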

Final thoughts

As enterprises navigate the complex realities of AI adoption, one thing is clear: success won’t come from chasing hype or hoping a chat interface will magically solve business challenges. 

It will come from building thoughtful guardrails, designing specialized agents, and aligning AI initiatives with real-world workflows and risks. The future belongs to companies that strike the right balance: trusting AI enough to unlock its potential, but governing it wisely to protect their business.

The path forward isn’t about replacing people; it’s about empowering them with AI that truly works with them, not just beside them.
