The field guide
This chapter extracts every practical recommendation from the book and organizes them by what you do, in what order. It is a reference for leaders running an AI transformation, not a summary of the book’s arguments.
The transformation is roughly 70 percent cultural and organizational, 20 percent infrastructure, and 10 percent algorithms. Most organizations invert this ratio, spending their energy on model selection while the organizational bottleneck goes unaddressed.
Culture before technology. Platform before distribution. Narrow proof of value before broad rollout. Organizations that skip steps or reorder them produce expensive pilots that never scale. BCG, McKinsey, and MIT data all converge: 74 percent of companies struggle to scale AI value, 95 percent of enterprise GenAI pilots stall at proof-of-concept, and the barriers are overwhelmingly about people and process.
The playbook follows a four-phase maturity model: Diagnose, Restructure, Operate, Scale.
Phase 0: Diagnose
Before any transformation work, assess where the organization actually stands. Most organizations skip this step and jump straight into technology pilots.
Assess the proficiency gap
Survey leadership AI usage honestly. Place the organization on a spectrum: literacy (knows AI exists and what it does in general), proficiency (uses AI tools regularly and understands their limits), fluency (redesigns workflows around AI capabilities). The gap between literacy and proficiency is where most organizations sit.
Brotman and Sack found that roughly 80 percent of executives were still using free-tier AI tools, or none at all, while their organizations debated enterprise strategy. The distance between what AI can do and what most leaders know it can do is widening with every model generation, and organizations that don’t measure the gap can’t close it.
Don’t assume the organization is further along than it is. The proficiency gap compounds: each model generation adds capabilities most leaders haven’t explored, which means the gap widens even as adoption numbers climb.
Map the jagged frontier for your domains
Identify which tasks in each function fall inside AI’s reliable capability boundary and which fall outside it. Use Mollick’s five-category framework as a starting taxonomy: tasks that should remain human, tasks delegated to AI with careful checking, tasks fully automated, centaur tasks (strategic division of labor), and cyborg tasks (deep intertwining within a single task). Revisit quarterly.
The boundary is irregular and domain-specific. BCG’s 758-consultant study found AI produced 40 percent better results on 18 of 19 tasks, then generated a confident, well-structured, completely wrong answer on the 19th. Proximity in subject matter is a poor predictor of which side a task falls on. (The Jagged Frontier)
Don’t treat the frontier as a line you map once. It moves with every model update. Workers operating on stale assumptions about what AI can and can’t do are the most dangerous configuration.
Inventory capability, not cost
For each team, answer: what can this group do today that it couldn’t do six months ago? Measure AI’s impact across quantity (volume of work), quality (accuracy, depth), efficiency (speed, cost), and the unlock — work that is newly possible within a given timeframe and resource set. The fourth dimension is what traditional ROI frameworks miss entirely.
An insurance company reviewing 40,000 contracts for EU compliance in 11 days wasn’t doing existing work faster. That review had never been economically feasible before. (The Seventy Percent; Introduction)
Don’t measure AI’s value by headcount reduction. Klarna measured adoption by support roles cut and chat volume handled, missed the satisfaction decline until it hit revenue, and had to rehire. (Organizing for AI Innovation)
Audit organizational plumbing
Inventory legacy systems agents can’t reach, data quality gaps, undocumented processes that humans navigate through tribal knowledge, and siloed data that prevents cross-functional AI applications.
Challamel’s 270 interviews at Moderna surfaced conference rooms without working plugs before anyone could discuss AI. AI is an honest mirror: it amplifies organizational quality and dysfunction equally. Agents working on broken processes produce errors confidently and at scale. Clean data, documented workflows, and clear exception-handling are prerequisites, not nice-to-haves. (The Seventy Percent; Agent Delegation Engineering)
Phase 1: Restructure
The bottleneck is organizational, not technological. Change the architecture so AI capability reaches the people closest to the work.
Start with culture, not technology
Make AI adoption normal, visible, and celebrated before deploying infrastructure. Challamel’s framework for Moderna’s transformation was three words: culture, business, technology, in that order.
BCG found that in “future-built” companies, 88 percent of managers actively role-modeled AI use; in laggards, 25 percent. The difference was not budget or tooling. It was whether leaders with organizational influence treated AI as something they personally used.
Run a discovery mechanism that surfaces use cases from the people closest to the work. Moderna held a prompt-crafting contest. IgniteTech mandated AI Mondays with consequences. JPMorgan gave LLM Suite to 200,000 employees and ran internal competitions. Khan Academy’s founder built the first prototype himself. The common thread: discovery came from practitioners, not a strategy deck.
Select champions by demonstrated competence and social credibility, not by appointment. Moderna’s Gen AI Champions Team was self-selected: people who had already built something, meeting biweekly, connecting peers laterally. These aren’t AI ambassadors in title. They’re the people colleagues already go to when they’re stuck.
Don’t commission a six-month strategy review before anyone has touched a tool. Khan’s biggest regret was not moving faster. Tishman Speyer ran a single department bootcamp, then let results spread laterally. Speed of learning matters more than comprehensiveness of plan.
Distribute AI to domain teams
Push AI ownership to business domains (marketing, operations, legal, finance) rather than building a centralized AI Center of Excellence. Conway’s Law applies: a centralized team produces centralized systems reflecting the central team’s understanding, which is inevitably a flattened version of the domains they serve.
Each domain team should own its data, define its problems, build or configure its AI agents, deploy them, and be accountable for outcomes. Team composition blends domain specialists (who understand the business problem and edge cases), AI engineers (who build retrieval pipelines and evaluation frameworks), and product thinkers (who define success metrics and iterate on feedback).
Start narrow. Pick one or two domains with the strongest need and the most willing teams. Prove value. Let success pull others in. Morgan Stanley started with wealth management research retrieval — a single well-defined domain with clean data and motivated users — and hit 98 percent adoption before expanding. (Organizing for AI Innovation)
Don’t expect a centralized AI lab to serve the whole enterprise. It becomes a bottleneck where every domain’s needs enter a backlog, get simplified, and emerge as generic solutions that miss the context that makes them useful.
Build the self-serve platform
Create a platform team whose job is making domain teams fast and autonomous. The platform owns identity, permissions, tracing, evaluation tooling, model routing, and security. Domain teams own task design, exception policies, acceptance criteria, and business outcomes.
Measure the platform by what it enables. How quickly does a domain team go from idea to deployed agent? How few specialists does it need? How much time goes to infrastructure versus the domain problem?
Don’t distribute AI to domains before the platform exists in usable form. Teams given responsibility without infrastructure reinvent solutions independently, produce fragmented architectures, and create ungovernable sprawl. The platform doesn’t need to be finished. It needs to be usable. (Organizing for AI Innovation)
Implement computational governance
Encode governance as code, tested and enforced at deployment time. When a domain team deploys an agent, the platform runs required checks automatically: does the system prompt comply with policy? Are retrieval sources approved? Are access controls configured? Has the standard bias and safety evaluation run? Domain teams retain autonomy; the platform enforces non-negotiable standards without committee review.
Don’t recentralize control through governance boards that become the new bottleneck. And don’t rely on individual diligence for compliance. Firms where each lawyer is expected to remember to check AI citations against primary databases produce sanctioned attorneys. (Organizing for AI Innovation; Agent Delegation Engineering)
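As a concrete (and purely illustrative) sketch of what "governance as code" can look like, the deployment-time checks above can be encoded as a single function the platform runs automatically. Every name here — the manifest shape, the registries, the toy prompt rule — is an assumption for illustration, not a real platform API:

```python
# Governance-as-code sketch: checks run at deployment time, no committee.
# AgentManifest, the registries, and the rules are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class AgentManifest:
    system_prompt: str
    retrieval_sources: list
    access_scopes: list
    safety_eval_passed: bool

APPROVED_SOURCES = {"contracts-db", "policy-wiki"}   # hypothetical registry
ALLOWED_SCOPES = {"read:crm", "read:docs"}
BANNED_PHRASES = ("ignore previous instructions",)   # toy prompt-policy rule

def deployment_checks(m: AgentManifest) -> list:
    """Return violations; an empty list means the agent may deploy."""
    violations = []
    if any(p in m.system_prompt.lower() for p in BANNED_PHRASES):
        violations.append("system prompt violates policy")
    violations += [f"unapproved retrieval source: {s}"
                   for s in m.retrieval_sources if s not in APPROVED_SOURCES]
    violations += [f"access scope not allowed: {s}"
                   for s in m.access_scopes if s not in ALLOWED_SCOPES]
    if not m.safety_eval_passed:
        violations.append("standard bias/safety evaluation has not run")
    return violations
```

The design point is the return type: a list of named violations, not a yes/no, so a domain team can fix its manifest without a meeting.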
Evolve the central AI team
The central AI team doesn’t dissolve. It evolves into the platform team. ML engineers who once built models for every business unit now build the platform that domain teams use. Prompt engineers who once handled every integration now write templates, evaluations, and guardrails others can reuse. The expertise redistributes rather than disappearing. (Organizing for AI Innovation)
Phase 2: Operate
Make delegated machine work reliable, auditable, and safe. This is agent delegation engineering — the discipline that determines whether AI produces value or liability.
Build the delegation stack for each workflow
For every workflow you delegate to AI, address all ten components of the universal stack. Getting a single component right while neglecting the others produces fluent, unreliable output — the most dangerous combination.
The components, drawn from converging guidance across NIST, OpenAI, Anthropic, Microsoft, and AWS:
| # | Component | What it covers |
|---|---|---|
| 1 | Task framing | Explicit objectives, acceptance criteria, risk tier. Every implicit expectation made explicit. |
| 2 | Context packaging | What the agent needs (client position, jurisdiction rules, firm standards) within token budget constraints. |
| 3 | Retrieval and grounding | Connection to verified, authoritative sources. Prevents citation hallucination. |
| 4 | Tool access | Which systems the agent can query, write to, or invoke. Bounded. |
| 5 | Constraints and permissions | What the agent may read, decide, and change, and why the boundaries sit where they do. |
| 6 | Evaluation and verification | Rubrics checking every output against primary evidence before it reaches a human. |
| 7 | Escalation rules | When the agent must stop and route to a human. Novel patterns, low-confidence flags, policy ambiguity. |
| 8 | Audit trails | Every decision traceable months later when a client, regulator, or court asks what happened. |
| 9 | Cost controls | Model routing by task complexity, compute costs visible to budget holders. |
| 10 | Component relationships | Context determines what the model sees; policy determines what it may do; verification determines if output may proceed; audit determines if anyone can reconstruct reasoning afterward. |
(Agent Delegation Engineering)
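One way to make "all ten components" checkable rather than aspirational is to write the stack down as a spec whose gaps are visible before deployment. The field names below are invented for illustration; component 10 (relationships) lives in how the platform wires the fields together, not in a field of its own:

```python
# Illustrative delegation spec: one field per component of the stack.
# A neglected component shows up as an empty default, which is exactly
# the "fluent, unreliable output" failure mode the chapter warns about.
from dataclasses import dataclass, field

@dataclass
class DelegationSpec:
    objective: str = ""                                      # 1. task framing
    acceptance_criteria: list = field(default_factory=list)  # 1. (cont.)
    risk_tier: str = ""                                      # "low"|"medium"|"high"
    context_sources: list = field(default_factory=list)      # 2-3. context + grounding
    allowed_tools: list = field(default_factory=list)        # 4. tool access
    permissions: dict = field(default_factory=dict)          # 5. constraints
    verification_rubric: list = field(default_factory=list)  # 6. evaluation
    escalation_triggers: list = field(default_factory=list)  # 7. escalation rules
    audit_trail: bool = False                                # 8. audit trails
    max_cost_usd: float = 0.0                                # 9. cost controls

def missing_components(spec: DelegationSpec) -> list:
    """Components still at their empty defaults."""
    return [name for name, value in vars(spec).items() if not value]
```

A spec with a sharp objective but no `verification_rubric` or `escalation_triggers` fails this check, which is the point: the single-component trap becomes a lint error instead of a production incident.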
Externalize tacit institutional knowledge
Create and maintain six engineered elements for each delegated workflow:
| Element | Purpose |
|---|---|
| Canonical source lists | What the agent treats as authoritative |
| Exception taxonomies | Cataloged edge cases the domain has encountered |
| Worked examples | What good output looks like in this specific context |
| Reviewer checklists | Structured verification steps for the human in the loop |
| Escalation thresholds | Conditions under which the agent must stop and ask |
| Retention rules | Traces and corrections stored so the organization learns from mistakes |
The insurance claims processor who knows Southeast flood claims require different evidence thresholds because of a regulatory change two years ago learned that from a colleague who learned it from an incident. If that stays tacit, the agent inherits ambiguity and generates something that sounds institutional. Making implicit knowledge explicit is the tacit-to-explicit conversion (Nonaka) that makes delegation possible. (Agent Delegation Engineering)
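The flood-claims rule above is exactly what an exception taxonomy captures. A toy sketch of the tacit-to-explicit conversion, with every condition and action invented for illustration:

```python
# Tacit-to-explicit conversion: the Southeast flood-claim rule written
# into an exception taxonomy instead of living in one claims processor's
# head. All details here are invented for illustration.
EXCEPTION_TAXONOMY = [
    {
        "id": "flood-se-evidence",
        "applies": lambda c: c.get("peril") == "flood"
                   and c.get("region") == "southeast",
        "action": "require_elevated_evidence",
        "rationale": "Regulatory change two years ago raised evidence "
                     "thresholds for Southeast flood claims.",
    },
]

def applicable_exceptions(claim: dict) -> list:
    """Actions an agent must take (or escalate on) for this claim."""
    return [e["action"] for e in EXCEPTION_TAXONOMY if e["applies"](claim)]
```

The `rationale` field matters as much as the rule: it is what lets the taxonomy be audited and retired when the regulation changes again, instead of sounding institutional while being stale.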
Design for the falling-asleep problem
Build deliberate friction into AI review workflows. Dell’Acqua’s recruiter study demonstrated that high-quality AI made humans less careful, not more. When AI is very good, the moments when it’s wrong are exactly when human judgment is most needed and least likely to show up.
What helps: mandatory disagreement steps where the reviewer must articulate what could be wrong before approving. Periodic manual work to keep independent judgment sharp. Review protocols that require checking against primary sources, not just surface plausibility. Structured rotation between AI-assisted and unassisted work. Watch especially for “administratively plausible” output — text that passes casual review but contains defective reasoning underneath. (The Jagged Frontier; Agent Delegation Engineering)
Protect the apprenticeship layer
Build structured learning paths for junior talent who can no longer learn by doing the routine work agents now handle. Use the progressive autonomy ladder: reviewers learn structured verification. Operators run bounded workflows with explicit escalation rules. Designers author the playbooks and acceptance criteria. Governors set policy and risk appetite across workflows.
Junior lawyers built training through document review. Junior developers learned through CRUD features and straightforward bugs. Analysts through data preparation and formatting. That apprenticeship layer is being automated before organizations have built alternatives. (Agent Delegation Engineering)
Don’t let “learning by doing” become “shipping without understanding.” Stanford data shows early-career workers in AI-exposed occupations experiencing meaningful employment decline while experienced workers remain stable. Future capacity erodes beneath high near-term output. (When Code Gets Cheap)
Address shadow AI through radical enablement
Provide controlled sandboxes with approved data rather than banning AI tools. Banning blocks the front door; data leaves through back windows. Personal phones, personal email, browser tools that bypass firewalls.
Provide data triage training so employees know which data classifications are safe for AI use. Build observability into AI usage patterns (monitor what tools people reach for; don’t block them). Accelerate approval cycles from months to days. Frame AI policy as a permission structure, not a restriction. (Organizing for AI Innovation; The Seventy Percent)
Design attention for the work AI demands
Accept fragmentation for routine work. Protect deep focus for judgment work with structural defenses. AI removes routine work and concentrates what remains into cognitively dense judgment tasks: verification, evaluation, exception handling, architectural decisions.
Gloria Mark’s data shows average screen-focus time fell from 2.5 minutes in 2004 to 47 seconds by 2020. Microsoft measures a notification ping every two minutes during core hours. The response is structured alternation: no-meeting mornings, notification blackouts, async-first communication. Shopify’s calendar purge reclaimed 322,000 meeting hours and produced 18 percent more shipped projects. Basecamp’s six-week build cycles followed by two-week cooldowns have sustained a small team’s outsized output for twenty years.
Don’t force all work into one attention mode. Audit which roles are primarily depth work and which are primarily coordination. A senior engineer spending 80 percent of their time in meetings is a misallocation. A project manager expected to produce four hours of deep analysis between twelve meetings is set up to fail. (The Attention Problem)
Phase 3: Scale and sustain
AI transformation has no destination. The frontier moves with every model update. McKinsey data suggests the length of tasks AI can handle reliably doubles every four to seven months. A pilot approved in January may be obsolete by June.
Build continuous learning into the operating rhythm
Establish a weekly cadence — demos, new workflows, shared discoveries — that keeps the organization’s understanding of AI current. The cadence surfaces use cases before anyone writes a strategy document about them and prevents the proficiency gap from reopening. (The Seventy Percent)
Measure what matters
Use the four-layer evaluation scorecard for every delegated workflow:
| Layer | Measures |
|---|---|
| Outcome | Accuracy, groundedness, rework rate |
| Process | Tool-call correctness, source freshness, policy adherence, escalation rate |
| Operational | Latency, cost per task, failure hotspots |
| Governance | Audit completeness, permission violations, incident rates |
No single layer tells whether a workflow is under control. Accurate results with a data-access violation is a compliance incident. Full policy adherence at a cost exceeding the value delivered is an economic failure.
At the organizational level: capability inventory, not cost ratio. McKinsey found that workflow redesign is the strongest single predictor of EBIT impact across 25 attributes examined. Stronger than model selection. Stronger than data quality. (Agent Delegation Engineering; The Seventy Percent)
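One way to operationalize the point that no single layer suffices: treat the scorecard as a gate where every layer must pass. A minimal sketch, assuming each layer’s checks have already been reduced to a boolean:

```python
# All four layers must pass for a workflow to count as under control.
# The scorecard maps layer name -> bool (did that layer's checks pass?).
LAYERS = ("outcome", "process", "operational", "governance")

def workflow_status(scorecard: dict) -> str:
    failed = [layer for layer in LAYERS if not scorecard.get(layer, False)]
    return "under control" if not failed else "failing: " + ", ".join(failed)

# Accurate results but a permission violation is still a failure:
workflow_status({"outcome": True, "process": True,
                 "operational": True, "governance": False})
# -> "failing: governance"
```

A missing layer defaults to failing, which encodes the rule that an unmeasured layer is an uncontrolled one.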
Evaluate like you deploy
Test AI workflows under conditions that match real deployment, not curated benchmarks. Separate outcome evaluation (did the agent get the right answer?) from process evaluation (did it use the right sources and follow policy?). A correct answer via the wrong process is a latent failure.
Store test specs separately from the codebase so AI can’t teach to the test. Use digital twin environments for safe integration testing. Anthropic’s Petri research found that models recognizing evaluation context can overstate safety performance. (Agent Delegation Engineering; When Code Gets Cheap)
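The outcome/process separation can be made mechanical: a correct answer reached through an unapproved source is flagged as a latent failure, never a pass. The trace format and source names below are assumptions for illustration:

```python
# Separate outcome evaluation (right answer?) from process evaluation
# (right sources, policy followed?). Trace shape is an assumption.
def evaluate(answer: str, expected: str, trace: dict, approved: set) -> str:
    outcome_ok = answer == expected
    process_ok = set(trace.get("sources_used", [])) <= approved
    if outcome_ok and process_ok:
        return "pass"
    if outcome_ok:
        return "latent failure"   # right answer via the wrong process
    return "fail"
```

The three-valued result is the design choice: collapsing "latent failure" into "pass" is how wrong processes survive until they produce a wrong answer in front of a client.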
Run tight feedback loops
Every delegated workflow should close the loop: intake, framing, retrieval, execution, verification. Pass: deliver. Fail: escalate to a human reviewer. Reviewer records corrections. Corrections feed back into playbooks, evaluation suites, and agent memory.
Organizations that run this loop tightly compound their delegation capability. Organizations that skip feedback repeat mistakes at machine speed. (Agent Delegation Engineering)
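A skeleton of the closed loop, with toy stand-ins for each stage. The point of the sketch is the last two lines of `run_workflow`: reviewer corrections flow back into the playbook rather than being discarded. All function bodies are illustrative stubs:

```python
# Closed-loop delegation skeleton. Stage implementations are toy stubs;
# the structure (verify -> deliver or escalate -> record correction) is
# what the chapter describes.
def retrieve(task, sources):
    return {"task": task, "sources": sources}

def execute(task, context):
    return f"draft for {task} using {len(context['sources'])} sources"

def verify(output, rubric):
    return all(term in output for term in rubric)

def run_workflow(task, playbook, reviewer):
    context = retrieve(task, playbook["sources"])   # intake/framing/retrieval
    output = execute(task, context)                 # execution
    if verify(output, playbook["rubric"]):          # verification
        return output                               # pass: deliver
    correction = reviewer(task, output)             # fail: escalate to human
    playbook["corrections"].append(correction)      # feed back into playbook
    return correction
```

An organization that drops the `playbook["corrections"].append(...)` line still ships output; it just repeats the same escalations at machine speed.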
Choose enabling over replacing
Define success as new capability, not headcount reduction. Organizations that frame AI as a replacement technology build what Acemoglu calls “so-so” deployments: marginally cheaper than people, no real productivity gain, no new capability.
Moderna asked “what would make you say you’ve succeeded beyond your wildest expectations?” The answer led to 180 high-value use cases, not 180 cost-cutting measures. Organizations that define success as headcount reduction tend to build so-so deployments. Organizations that define success as new capability tend to build enabling ones. The question you ask at the beginning shapes the technology you build at the end. (The Seventy Percent; When the World Remade Work)
Anti-patterns
Don’t centralize all AI in one team. Conway’s Law guarantees the resulting systems reflect the central team’s understanding, not the domains’.
Don’t measure by headcount reduction. Klarna measured chat volume and support roles cut, missed satisfaction decline until revenue was affected.
Don’t start with technology selection. The 10-20-70 ratio means model choice accounts for a tenth of the outcome.
Don’t commission a strategy before anyone has used a tool. Speed of learning beats comprehensiveness of plan.
Don’t distribute AI before the platform exists. Teams without shared infrastructure reinvent solutions poorly and create ungovernable sprawl.
Don’t rely on individual diligence for compliance. Two federal circuits sanctioned lawyers for AI-fabricated citations. Process must be systemic.
Don’t trust humans to stay vigilant through willpower. High-quality AI degrades human checking. Design friction in.
Don’t assume proximity in subject predicts AI capability. The frontier zigs unpredictably within domains.
Don’t skip the feedback loop. Without corrections flowing back into playbooks and evals, the organization repeats mistakes at machine speed.
Don’t automate the apprenticeship layer without replacing it. Routine work was training, not just labor. Remove it without a structured alternative and the senior pipeline hollows.
Don’t ban AI tools. Data leaves through back windows. Enable with controls instead.
Don’t use generic benchmarks for evaluation. Benchmark inflation, construct validity problems, and curation artifacts make them unreliable. Test under deployment conditions.
Don’t assume AI fixes broken processes. It amplifies them. Clean data, documented workflows, and clear exception-handling are prerequisites.
Don’t design open offices expecting collaboration. Harvard found face-to-face interaction dropped 72 percent after removing walls. People put on headphones.
Don’t force all work into one attention mode. Judgment work requires sustained depth. Routine work tolerates fragmentation. Staff and schedule accordingly.
Case study index
| Case study | Phase | Lesson | Chapter |
|---|---|---|---|
| Moderna (Challamel) | 0, 1 | Culture-first sequencing; 270 interviews before any AI work | The Seventy Percent |
| Klarna | 0 | Centralized AI can’t handle domain complexity; had to rehire | Organizing for AI Innovation |
| Morgan Stanley | 1 | Start narrow in one well-defined domain; 98% adoption | Organizing for AI Innovation |
| Walmart | 1 | Four domain-specific super agents owned by domain teams | Organizing for AI Innovation |
| Shopify | 1, 2 | Mandatory AI use; calendar purge reclaimed 322K hours | The Seventy Percent; The Attention Problem |
| Khan Academy | 1 | Founder built prototype himself before asking others to adopt | The Seventy Percent |
| JPMorgan (LLM Suite) | 1 | 200K employees, internal competitions drove viral adoption | The Seventy Percent |
| IgniteTech | 1 | Mandatory AI Mondays with consequences for non-participation | The Seventy Percent |
| Tishman Speyer | 1 | Department bootcamp, then lateral spread across firm | The Seventy Percent |
| BCG 758-consultant study | 0, 2 | 40% better on 18 tasks, worse than no-AI on the 19th | The Jagged Frontier |
| Dell’Acqua recruiters | 2 | High-quality AI degraded human judgment; friction kept people sharp | The Jagged Frontier |
| Air Canada chatbot | 2 | Missing retrieval and escalation; tribunal held airline liable | Agent Delegation Engineering |
| Third/Seventh Circuit | 2 | Fabricated citations; no verification process existed | Agent Delegation Engineering |
| Agarwal radiologists | 2 | Overrode correct AI predictions; collaboration requires training | The Jagged Frontier |
| Basecamp (Shape Up) | 2 | 6-week build cycles protect deep focus; 20+ years of evidence | The Attention Problem |
| Insurance contract review | 0 | 40K contracts in 11 days; work that was never economically feasible | Introduction |