The Great Reskill
Part II: The Agent Revolution

Organizing for AI Innovation

In early 2024, Klarna’s CEO Sebastian Siemiatkowski announced a triumph. The Swedish fintech’s AI assistant, built with OpenAI, had taken over two-thirds of customer service chats, roughly 2.3 million conversations in its first month across 35 languages, doing the work of about 700 full-time agents. The math looked irresistible: faster resolution times, lower costs, AI handling the volume that humans once did.

Within a year, Siemiatkowski was publicly walking it back. Customer satisfaction had dropped. The AI handled routine queries well enough, but edge cases overwhelmed it: emotionally charged disputes, multi-step billing problems, the kind of thing that required someone who actually understood how Klarna’s products worked. A system trained on generic patterns couldn’t absorb that complexity. The company began rehiring human agents. They framed it as a “dual-track approach” rather than a retreat. The lesson wasn’t that AI failed. It was that a centralized AI system, built far from the business domains it was supposed to serve, couldn’t handle the complexity of real customer problems at scale.

Klarna’s reversal reflects a much larger pattern. BCG reported in 2024 that 74 percent of companies struggle to achieve and scale value from AI. McKinsey’s 2025 State of AI survey found that while 88 percent of organizations use AI in at least one function, only 39 percent see measurable impact on earnings. The most striking finding: roughly 70 percent of the barriers to scaling AI are about people and processes. Org charts. Workflows. Incentive structures that reward hoarding data rather than sharing it. The bottleneck is almost never the model.

By 2025, MIT data showed that 95 percent of enterprise GenAI pilots failed to move beyond the proof-of-concept stage. The number is dramatic but shouldn’t be surprising. Every major platform shift follows this adoption curve. Plenty of Fortune 500 CEOs dismissed the internet as a fad in the mid-1990s. Some of those companies no longer exist. In the 2000s, enterprises thought an iPhone app was a mobile strategy. In the 2010s, cloud migration was an endless cycle of proofs of concept before genuine transformation. In each case, the technology worked. What didn’t work was the organizational capacity to absorb it.

The pattern is consistent: the lag is not a bug in the adoption process. It is the default setting. And the companies that survive the lag are not the ones with the best technology but the ones willing to restructure around what the technology makes possible.

The centralized AI lab and its discontents

Most large organizations, when they commit to AI, follow a predictable playbook. They hire a Chief AI Officer. They assemble a centralized team of data scientists, ML engineers, prompt engineers, and LLM specialists, housing them in a dedicated unit sometimes branded as an “AI Center of Excellence.” They invest in a shared platform: a vector database, an orchestration layer, a model gateway. They task this central team with serving the entire enterprise.

There’s a reasonable logic to this. Large language models require specialized expertise. Centralizing it avoids redundancy. A single platform prevents fragmentation. The central team can enforce responsible AI practices, share best prompts and fine-tuning approaches, and direct resources where they’ll have the most impact.

This works at small scale. A handful of proofs of concept, a flagship chatbot, an internal knowledge assistant. Morgan Stanley’s central AI team built an internal GPT-4-powered assistant that made hundreds of thousands of pages of investment research instantly searchable for financial advisors. Adoption hit 98 percent. But it was a specific kind of success: the tool served a single, well-defined domain (wealth management research retrieval) and was built by people who understood that domain intimately.

The problems show up when the central AI team is expected to serve everyone. Marketing wants an AI agent that generates campaign copy in the brand’s voice. Supply chain wants a forecasting model trained on their logistics data. Legal wants a contract review agent that understands their jurisdiction’s regulations. Customer service wants an AI assistant that actually knows the product.

Each request enters the central team’s backlog. Each requires domain-specific data the central team doesn’t own or fully understand. Each demands context: the unwritten rules, the edge cases, the institutional knowledge that can’t be captured in a requirements document. The central team becomes a translation layer between the people who understand the business problem and the people who understand the technology. That translation is where meaning gets lost.

Melvin Conway identified this dynamic in 1968: the structure of a system mirrors the communication structure of the organization that builds it. A centralized AI team produces centralized AI systems. Those systems reflect the central team’s understanding, which is inevitably a flattened, simplified version of the domains they serve. The AI agent built by a team that doesn’t deeply understand customer service will handle customer service generically. The forecasting model built by engineers who have never worked in logistics will miss patterns a supply chain manager would catch immediately.

Why the pipeline model breaks down

The deeper structural problem is how centralized AI teams decompose their work. They tend to organize by technical function: one group handles data engineering, another does model training, another manages deployment, another monitors performance. Each group optimizes its own stage of the pipeline.

But this decomposition runs perpendicular to how value actually gets delivered. When the supply chain team needs an AI agent that predicts warehouse stockouts, that need cuts across every stage. New data must be ingested. A model must be trained on domain-specific patterns. The agent must be deployed with access to the right inventory systems. Monitoring must track supply chain metrics, not generic model accuracy.

Delivering that outcome requires coordinating across every sub-team. Priorities must align. Handoffs must be managed. The person training the model must understand what a stockout actually means in this company’s operations, which warehouses matter most, which products are seasonal, what lead times look like. That understanding lives in the supply chain team.

This is the organizational version of what Carl Benedikt Frey documents across centuries of economic history in The Technology Trap. Enabling technologies, the kind that augment what people can do rather than simply replacing them, deliver their value only when the surrounding structures allow it. The spinning jenny existed for decades before factory organization caught up. The electric motor was available for thirty years before manufacturers figured out that you had to redesign the factory floor, not just swap out the power source. The technology arrives. Whether it produces transformation or expensive disappointment depends on the organizational architecture.

For large language models, this question is particularly acute. These are not narrow tools that slot into existing workflows. They are general-purpose capabilities that can, in theory, reshape how every part of an organization operates. Whether they actually do depends less on the technology than on whether the people who understand each part are empowered to shape how AI is applied in their domain. That’s an organizational design problem, and we are still in the early stages of figuring out what good solutions look like.

Pushing AI to the domains

The alternative is to put AI ownership in the business domains where the problems and the expertise actually live.

Software went through an analogous transformation over the past two decades. Monolithic applications gave way to microservices. Centralized IT departments gave way to product teams that own their services end to end. The principle Eric Evans articulated in Domain-Driven Design (2003), that complex organizations are best served by multiple bounded contexts, each owned by the team closest to that reality, became the foundation of modern software architecture.

The same principle applies to AI. Each business domain, whether marketing, operations, customer experience, finance, or legal, takes ownership of its own AI capabilities. The domain team doesn’t file a request with a central AI lab and wait. It owns the data, defines the problem, builds or configures the AI agents, deploys them, and is accountable for the outcomes.

Walmart’s evolution shows what this looks like in practice. By mid-2025, the company had consolidated its sprawling AI initiatives into four domain-specific “super agents,” each aligned with a distinct constituency: one for customers (a shopping assistant that answers detailed product questions), one for store associates (handling scheduling, HR queries, and support tickets), one for suppliers and advertisers (order management and campaign optimization), and one for developers (an internal agent that navigates Walmart’s engineering systems). Each agent is owned by the team that understands its domain. The customer-facing agent is shaped by people who understand shopping behavior. The associate agent is shaped by people who understand store operations. The technology platform underneath is shared, but the intelligence layer, the prompts, the data, the domain logic, the evaluation criteria, belongs to the domain.

That’s where the difference actually matters. You can centralize infrastructure. You can’t centralize intelligence, at least not without flattening it into something generic. An LLM is a general-purpose reasoning engine. What makes it useful in a specific domain is the context wrapped around it: the retrieval sources, the system prompts, the tools it can call, the guardrails that keep it on track in domain-specific ways. That context is domain knowledge. Domain knowledge lives in domain teams.
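To make the "centralize infrastructure, distribute intelligence" split concrete, here is a minimal sketch of what a domain-owned intelligence layer might look like as configuration. Everything here is illustrative: the `DomainAgentContext` class, the source and tool names, and the request shape are hypothetical, not a real platform API.

```python
from dataclasses import dataclass, field

@dataclass
class DomainAgentContext:
    """Everything a domain team wraps around a shared, general-purpose model."""
    domain: str
    system_prompt: str                                           # domain voice and policies
    retrieval_sources: list[str] = field(default_factory=list)   # approved knowledge bases
    tools: list[str] = field(default_factory=list)               # capabilities the agent may call
    guardrails: list[str] = field(default_factory=list)          # domain-specific checks

    def to_request(self, user_message: str) -> dict:
        """Assemble a model-agnostic request: the platform supplies the model,
        the domain supplies the intelligence layer."""
        return {
            "system": self.system_prompt,
            "retrieval": self.retrieval_sources,
            "tools": self.tools,
            "guardrails": self.guardrails,
            "messages": [{"role": "user", "content": user_message}],
        }

# The supply chain team owns this object; the platform team never edits it.
supply_chain_agent = DomainAgentContext(
    domain="supply-chain",
    system_prompt="You are a stockout-forecasting assistant for warehouse operations.",
    retrieval_sources=["inventory-db", "lead-time-history"],
    tools=["query_inventory", "create_reorder_draft"],
    guardrails=["no-auto-purchase-over-threshold"],
)

request = supply_chain_agent.to_request("Which SKUs risk stockout next week?")
```

The point of the sketch is the ownership boundary: swapping the underlying model is a platform decision, while every field above belongs to the domain team.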

The new team composition

Distributing AI to the domains demands a different kind of team. The traditional split, “business people” on one side, “technical people” on the other, doesn’t hold when AI becomes embedded in the business function itself.

In April 2025, Shopify CEO Tobi Lutke circulated an internal memo that became something of a manifesto. “Reflexive AI usage is now a baseline expectation at Shopify,” he wrote. Teams would need to demonstrate why they couldn’t accomplish a goal using AI before requesting additional headcount. AI competency would become part of performance reviews. Every team was asked to imagine autonomous AI agents as members of the team, not tools handed to the team by a separate department, but capabilities native to how the team works.

The memo was polarizing. Critics saw it as cost-cutting dressed in innovation language. But the organizational insight underneath was sound: AI can’t be a service that one team provides to another. It has to be a capability that every team possesses.

What does this look like concretely? A domain team building AI capabilities needs at least three kinds of expertise working together. Domain specialists, the people who understand the business problem, the data, the edge cases, and what “good” looks like: customer service managers, supply chain analysts, underwriters, merchandisers. AI engineers, people who can configure LLM agents, build retrieval pipelines, design evaluation frameworks, and wire up tool integrations. And product thinkers, people accountable for whether the capability actually delivers value to its users, who define success metrics and iterate based on feedback.

These roles often blend in practice. A supply chain analyst who learns to configure an AI agent becomes both the domain specialist and the AI builder. An ML engineer embedded in the legal team for six months absorbs enough context to anticipate edge cases that a visiting consultant never would. The boundaries between “business” and “technical” become permeable, which is exactly what mature AI adoption requires.

The old model, where a central AI team builds something and “throws it over the wall” to a business unit, fails for the same reason the old model of software development failed. Software engineers learned decades ago that you can’t build a good product without being embedded in the problem. The same is true for AI, perhaps more so, because AI systems are more sensitive to context. A retrieval-augmented generation system that works beautifully in one domain can hallucinate confidently in another, and the difference often comes down to domain-specific tuning that only an insider would think to apply.

The platform that makes distribution possible

If every domain team must build its own LLM infrastructure from scratch, configuring its own vector databases, managing its own model deployments, building its own evaluation pipelines, solving its own security and compliance challenges, the cost is prohibitive and the fragmentation severe. Distributing AI capabilities to domains only works if a shared platform handles the undifferentiated heavy lifting underneath.

That’s the second organizational principle: a self-serve AI platform, built by a dedicated platform team whose job is to make domain teams fast and autonomous.

Amazon’s internal approach shows this at work. Their Agent Spaces framework provides centralized governance (access control, security policies, approved tool integrations) while letting individual teams build and deploy their own AI agents within those guardrails. Amazon Bedrock AgentCore Gateway centralizes tool management so that multiple agents across different teams can discover and invoke shared capabilities without duplicating code. The infrastructure is centralized. The decisions about what to build and how to apply it are distributed.

Brynjolfsson and McAfee made a similar observation about general-purpose technologies in The Second Machine Age. The steam engine’s full economic impact took decades to materialize, not because the technology was immature, but because the organizational infrastructure needed to exploit it, the factory design, the supply chains, the management practices, had to be invented alongside it. The electric motor followed the same path. AI is following it now. The models are powerful. What most organizations lack is the internal platform that lets domain teams actually use that power without needing a PhD in machine learning.

The platform team’s success isn’t measured by features shipped. It’s measured by what it enables: how quickly a domain team can go from an idea to a deployed AI agent, how few specialists that team needs, how little time they spend on infrastructure versus the domain problem they’re actually trying to solve.

Guardrails without gatekeepers

Distribution without coordination produces chaos. If every domain team makes independent decisions about which models to use, how to handle sensitive data, what safety evaluations to run, and how to monitor for hallucinations, the result is an ungovernable sprawl of AI systems with inconsistent quality and unpredictable risk.

The answer isn’t to recentralize control. It’s to encode governance into the platform itself, what we might call computational governance.

Define the rules once, enforce them everywhere, automatically. A domain team building a customer-facing AI agent shouldn’t have to submit a request to a governance board and wait for approval. Instead, when the team deploys the agent, the platform runs the required checks. Does the agent’s system prompt comply with the company’s responsible AI policy? Are the retrieval sources approved for this use case? Has the agent been evaluated against the standard bias and safety benchmarks? Are the access controls properly configured?

Policies expressed as code, tested like code, enforced at deployment time. The domain team retains full autonomy over what they build and how. The platform ensures that whatever they build meets the organization’s non-negotiable standards for safety, privacy, and quality. Governance shifts from a committee that reviews proposals to a set of automated checks that run on every deployment.
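A deployment-time check of this kind can be sketched in a few lines. The policy functions, the approved-source list, and the manifest fields below are all assumptions made for illustration; a real platform would draw them from its own governance catalog.

```python
from typing import Callable, Optional

# A policy inspects a deployment manifest and returns a violation message,
# or None if the manifest passes.
Policy = Callable[[dict], Optional[str]]

def prompt_policy(manifest: dict) -> Optional[str]:
    # Illustrative check: forbid known jailbreak-style phrasing in system prompts.
    banned = {"ignore previous instructions"}
    prompt = manifest.get("system_prompt", "").lower()
    if any(phrase in prompt for phrase in banned):
        return "system prompt violates responsible-AI policy"
    return None

def retrieval_policy(manifest: dict) -> Optional[str]:
    # Only sources from an approved catalog may back a customer-facing agent.
    approved = {"inventory-db", "product-catalog", "support-kb"}
    bad = [s for s in manifest.get("retrieval_sources", []) if s not in approved]
    return f"unapproved retrieval sources: {bad}" if bad else None

def eval_policy(manifest: dict) -> Optional[str]:
    # Standard benchmarks must have been run before deployment.
    required = {"bias-benchmark", "safety-benchmark"}
    missing = required - set(manifest.get("evaluations_passed", []))
    return f"missing evaluations: {sorted(missing)}" if missing else None

POLICIES: list[Policy] = [prompt_policy, retrieval_policy, eval_policy]

def check_deployment(manifest: dict) -> list[str]:
    """Run every policy; an empty list means the agent may deploy."""
    return [v for v in (p(manifest) for p in POLICIES) if v]

# A compliant manifest sails through with no human gatekeeper involved.
good = {
    "system_prompt": "You help store associates with scheduling questions.",
    "retrieval_sources": ["support-kb"],
    "evaluations_passed": ["bias-benchmark", "safety-benchmark"],
}
violations = check_deployment(good)   # [] means cleared to deploy
```

Because the policies are ordinary code, they can be versioned, reviewed, and tested like any other artifact, which is what lets governance scale without a committee in the loop.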

For AI agents, this matters more than it did for traditional software. An agent that can call tools, access databases, and take actions in the real world introduces risks that static software doesn’t. An agent that drafts customer emails needs different guardrails than one that executes financial transactions. The governance framework must be granular enough to express these distinctions and automated enough to enforce them without becoming the bottleneck that the old centralized model was.

What the early movers got right

The companies that have navigated this transition most effectively didn’t reorganize everything at once. They picked one or two domains with the strongest need and the most willing teams, embedded AI capability there, demonstrated results, and let success pull other domains in. Morgan Stanley started with wealth management research, a domain with a clear use case, clean data, and motivated users, before expanding to institutional securities with AskResearchGPT. Walmart started with a developer-facing agent before extending to customer, associate, and supplier domains. The pattern is consistent: start narrow, prove value, expand.

What happened to the central AI team in these companies is telling. It didn’t dissolve; it evolved into the platform team. The ML engineers who once built models for every business unit started building the platform that lets business units build their own. The prompt engineers who once handled every LLM integration started writing the templates, evaluations, and guardrails that domain teams use. The expertise redistributed rather than disappeared.

The measurement question also separated the successful from the struggling. Klarna measured AI adoption by output metrics (headcount reduction, chat volume) and missed the decline in customer satisfaction until it showed up in revenue. The companies that fared better asked whether the business domain actually improved: did customer resolution rates go up? Did supply chain disruptions go down? Did the sales team close more effectively? The metric has to be the outcome, not the activity.

There’s a sequencing problem that tripped up the rest: they tried to distribute autonomy before investing in the platform. Domain teams given responsibility without supporting infrastructure simply reinvent the same solutions independently, often poorly. The platform has to exist, not in its final form, but in a usable form, before domain teams can be productive.

The force multiplier gap

The urgency behind all of this restructuring becomes clearer when you look at what happens to organizations that get it right versus those that don’t. AI agents act as a force multiplier — they amplify what teams can accomplish per person. But the amplification is not distributed evenly. It compounds. Organizations that integrate AI deeply ship faster, learn faster, iterate faster, and capture opportunities that slower organizations cannot reach. The gap between the two widens with each cycle.

This is showing up most visibly in the startup ecosystem. A ten-person startup in 2026, with AI agents embedded in engineering, marketing, sales, and operations, can match the output of a fifty-person company from three years earlier. Software project timelines are compressing. Tasks that required dedicated employees — research, analysis, monitoring, lead qualification — can now be handled by agents at marginal cost. The unit of work that a single agent can accomplish keeps expanding, from answering a question to executing a multi-step workflow to managing a portfolio of tasks autonomously.

For large enterprises, the implication is uncomfortable. The traditional advantages of scale — headcount, coverage, accumulated process — are eroding. A startup with the right organizational architecture can move faster than a Fortune 500 company with a thousand times the resources, because the startup doesn’t have to route AI initiatives through a centralized team, wait for governance approval, or navigate a data architecture designed before anyone imagined autonomous agents. The remaining advantages of scale — brand, distribution, regulatory relationships, data moats — are real, but they are defensive. They don’t generate the velocity that the current environment rewards.

The force multiplier also has a darker edge. AI agents amplify what an organization is already good at, but they equally amplify dysfunction. Sloppy processes get executed at scale. Unclear goals produce more wasted output, faster. Organizations with poor data hygiene don’t just produce bad results — they produce bad results confidently and at volume. The multiplier is agnostic. It works on whatever it’s pointed at.

This means organizational quality — clean data, clear goals, well-defined processes, genuine domain expertise — becomes the binding constraint. Before AI agents, a mediocre process might have been tolerable because it moved slowly enough for humans to catch errors. With agents executing that process a hundred times faster, every flaw compounds. The companies that benefit most from AI force multiplication are the ones that had their organizational house in order before AI arrived. The ones that didn’t are discovering that AI is an unusually honest mirror.

When the front door is locked

There is another organizational failure mode that doesn’t involve centralized teams or governance committees. It involves workers who adopt AI faster than their organizations can sanction it.

The pattern has played out across industries. A sales rep uploads sensitive customer data into an unsanctioned online tool because it formats spreadsheets faster than the approved workflow. Someone in HR summarizes confidential exit interviews using a free-tier AI model because the official tools haven’t been approved yet. A developer pastes proprietary code into a public chatbot to debug a production issue at two in the morning because that’s what works.

None of these people are malicious. They are productive, impatient, and operating in organizations that move too slowly. The result is shadow AI: the unsanctioned, unmonitored, and often insecure use of AI tools by employees who have discovered that AI makes their work better and have no intention of waiting for permission.

The phenomenon is structurally identical to the shadow IT problem of the 2010s, when employees adopted Dropbox, Slack, and personal Gmail accounts because corporate IT couldn’t provision cloud tools fast enough. The difference is that shadow AI involves sending sensitive data — customer records, personnel files, proprietary source code — to third-party models whose data retention and training policies may be opaque. The risk is not that employees are using AI. The risk is that they’re using it without any organizational visibility or control.

The instinct to respond with bans is understandable but counterproductive. When organizations block the front door, the data leaves through back windows: personal phones on cellular networks, personal email accounts, browser-based tools that bypass corporate firewalls. Curiosity cannot be firewalled. Neither can the competitive pressure that makes AI-using teams visibly faster than AI-abstaining teams.

The organizations handling this most effectively have shifted from prohibition to what might be called radical enablement. They provide controlled sandboxes where employees can use AI tools with approved data. They invest in data triage training — teaching workers which classifications of data are safe for AI consumption rather than issuing blanket bans. They build observability into AI usage patterns, monitoring what tools people reach for rather than blocking ports. And critically, they accelerate approval cycles, reducing the time from “I want to use this tool” to “it’s approved” from months to days.
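Data triage of this sort can itself be encoded as a pre-flight check rather than left to training alone. The sketch below is purely illustrative: the classification labels, patterns, and routing decisions are assumptions, not a real DLP product.

```python
import re

# Hypothetical patterns flagging data classes that should not reach an
# external model: SSN-shaped identifiers and credential-like strings.
PATTERNS = {
    "customer-pii": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credential":   re.compile(r"(?i)api[_-]?key\s*[:=]"),
}

def triage(text: str) -> str:
    """Route an outbound AI request: 'block', 'sandbox-only', or 'allow'.

    Credentials are blocked outright; PII is diverted to the controlled
    sandbox; everything else goes through, with the event logged for
    observability rather than silently dropped.
    """
    for label, pattern in PATTERNS.items():
        if pattern.search(text):
            return "block" if label == "credential" else "sandbox-only"
    return "allow"
```

The design choice worth noting is the middle tier: a binary allow/block rule recreates the locked front door, while a sandbox route keeps the employee productive and the data visible.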

The underlying principle is simple: if the organization doesn’t provide tools that are good enough, employees will find their own, and the organization won’t like the terms of service. The choice is not between AI adoption and no AI adoption. The choice is between visible, governed AI adoption and invisible, ungoverned AI adoption.

The organizational technology trap

Carl Benedikt Frey documented how, throughout history, labor-replacing technologies were blocked when displaced workers had the political power to resist them. Guilds suppressed mechanization for centuries. The Luddites smashed looms. The pattern he calls the “technology trap,” where a society has access to transformative technology but its organizational structure prevents adoption, applies with uncomfortable precision to AI in large enterprises.

The contemporary version of the trap isn’t workers smashing machines. It’s organizational structures that prevent AI from reaching the people who could use it. A central AI team that gatekeeps all AI development. A governance process that turns every AI initiative into a six-month review cycle. A data architecture that locks domain-specific knowledge behind pipelines controlled by a team that doesn’t understand the domain.

Nobody designed these bottlenecks to block AI. They emerged from reasonable instincts: centralize the scarce expertise, control the risky technology, ensure consistency. But the result is the same as what the guilds achieved: the technology exists, and the organization can’t deploy it where it matters.

Escaping the trap requires the same thing it always has: restructuring so that the people closest to the work can use the technology. In the Industrial Revolution, that meant shifting power from guilds to factory owners who would deploy machinery. In the AI era, it means shifting AI capability from central labs to domain teams who understand the problems AI can solve.

What comes next

The organizational structure determines what kind of AI innovation is possible. A central AI lab can produce impressive demonstrations. Distributed domain teams, supported by a shared platform and governed by automated standards, can produce AI that transforms how the organization actually operates.

But structure alone isn’t enough, and we should be honest about why. The people being asked to adopt AI tools are often the same people whose roles are being reshaped by those tools. There’s a tension in the Shopify memo that we shouldn’t gloss over: asking every team to prove they can’t do the work with AI before hiring is also, implicitly, asking every employee to consider whether AI can do their job. Getting the org chart right is the easier problem. The harder one is what happens when the people inside those boxes feel the ground shifting under them, a dynamic that, as Karl Polanyi showed nearly a century ago, tends to provoke powerful counter-movements from the very institutions that are supposed to be leading the change.