The field guide
This chapter extracts every practical recommendation from the book and organizes them by what you do, in what order. It is a reference for leaders running an AI transformation, not a summary of the book’s arguments.
The transformation is roughly 70 percent cultural and organizational, 20 percent infrastructure, and 10 percent algorithms. Most organizations invert this ratio, spending their energy on model selection while the organizational bottleneck goes unaddressed.
Culture before technology. Platform before distribution. Narrow proof of value before broad rollout. Organizations that skip steps or reorder them produce expensive pilots that never scale. BCG, McKinsey, and MIT data all converge: 74 percent of companies struggle to scale AI value, 95 percent of enterprise GenAI pilots stall at proof-of-concept, and the barriers are overwhelmingly about people and process.
The playbook follows a four-phase maturity model: Diagnose, Restructure, Operate, Scale.
Phase 0: Diagnose
Before any transformation work, assess where the organization actually stands. Most organizations skip this step and jump straight into technology pilots.
Assess the proficiency gap
Survey leadership AI usage honestly. Place the organization on a spectrum: literacy (knows AI exists and what it does in general), proficiency (uses AI tools regularly and understands their limits), fluency (redesigns workflows around AI capabilities). The gap between literacy and proficiency is where most organizations sit.
Brotman and Sack found that roughly 80 percent of executives were still using free-tier AI tools, or none at all, while their organizations debated enterprise strategy. The distance between what AI can do and what most leaders know it can do is widening with every model generation, and organizations that don’t measure the gap can’t close it.
Don’t assume the organization is further along than it is. The proficiency gap compounds: each model generation adds capabilities most leaders haven’t explored, which means the gap widens even as adoption numbers climb.
Map the jagged frontier for your domains
Identify which tasks in each function fall inside AI’s reliable capability boundary and which fall outside it. Use Mollick’s five-category framework as a starting taxonomy: tasks that should remain human, tasks delegated to AI with careful checking, tasks fully automated, centaur tasks (strategic division of labor), and cyborg tasks (deep intertwining within a single task). Revisit quarterly.
The boundary is irregular and domain-specific. BCG’s 758-consultant study found AI produced 40 percent better results on 18 of 19 tasks, then generated a confident, well-structured, completely wrong answer on the 19th. Proximity in subject matter is a poor predictor of which side a task falls on. (The Jagged Frontier)
Don’t treat the frontier as a line you map once. It moves with every model update. Workers operating on stale assumptions about what AI can and can’t do are the most dangerous configuration.
Inventory capability, not cost
For each team, answer: what can this group do today that it couldn’t do six months ago? Measure AI’s impact across quantity (volume of work), quality (accuracy, depth), efficiency (speed, cost), and the unlock — work that is newly possible within a given timeframe and resource set. The fourth dimension is what traditional ROI frameworks miss entirely.
An insurance company reviewing 40,000 contracts for EU compliance in 11 days wasn’t doing existing work faster. That review had never been economically feasible before. (The Seventy Percent; Introduction)
Don’t measure AI’s value by headcount reduction. Klarna measured adoption by support roles cut and chat volume handled, missed the satisfaction decline until it hit revenue, and had to rehire. (Organizing for AI Innovation)
Audit organizational plumbing
Inventory legacy systems agents can’t reach, data quality gaps, undocumented processes that humans navigate through tribal knowledge, and siloed data that prevents cross-functional AI applications.
Challamel’s 270 interviews at Moderna surfaced conference rooms without working plugs before anyone could discuss AI. AI is an honest mirror: it amplifies organizational quality and dysfunction equally. Agents working on broken processes produce errors confidently and at scale. Clean data, documented workflows, and clear exception-handling are prerequisites, not nice-to-haves. (The Seventy Percent; Agent Delegation Engineering)
Phase 1: Restructure
The bottleneck is organizational, not technological. Change the architecture so AI capability reaches the people closest to the work.
Start with culture, not technology
Make AI adoption normal, visible, and celebrated before deploying infrastructure. Challamel’s framework for Moderna’s transformation was three words: culture, business, technology, in that order.
BCG found that in “future-built” companies, 88 percent of managers actively role-modeled AI use; in laggards, 25 percent. The difference was not budget or tooling. It was whether leaders with organizational influence treated AI as something they personally used.
Run a discovery mechanism that surfaces use cases from the people closest to the work. Moderna held a prompt-crafting contest. IgniteTech mandated AI Mondays with consequences. JPMorgan gave LLM Suite to 200,000 employees and ran internal competitions. Khan Academy’s founder built the first prototype himself. The common thread: discovery came from practitioners, not a strategy deck.
Select champions by demonstrated competence and social credibility, not by appointment. Moderna’s Gen AI Champions Team was self-selected: people who had already built something, meeting biweekly, connecting peers laterally. These aren’t AI ambassadors in title. They’re the people colleagues already go to when they’re stuck.
Don’t commission a six-month strategy review before anyone has touched a tool. Khan’s biggest regret was not moving faster. Tishman Speyer ran a single department bootcamp, then let results spread laterally. Speed of learning matters more than comprehensiveness of plan.
Distribute AI to domain teams
Push AI ownership to business domains (marketing, operations, legal, finance) rather than building a centralized AI Center of Excellence. Conway’s Law applies: a centralized team produces centralized systems reflecting the central team’s understanding, which is inevitably a flattened version of the domains they serve.
Each domain team should own its data, define its problems, build or configure its AI agents, deploy them, and be accountable for outcomes. Team composition blends domain specialists (who understand the business problem and edge cases), AI engineers (who build retrieval pipelines and evaluation frameworks), and product thinkers (who define success metrics and iterate on feedback).
Start narrow. Pick one or two domains with the strongest need and the most willing teams. Prove value. Let success pull others in. Morgan Stanley started with wealth management research retrieval — a single well-defined domain with clean data and motivated users — and hit 98 percent adoption before expanding. (Organizing for AI Innovation)
Don’t expect a centralized AI lab to serve the whole enterprise. It becomes a bottleneck where every domain’s needs enter a backlog, get simplified, and emerge as generic solutions that miss the context that makes them useful.
Build the self-serve platform
Create a platform team whose job is making domain teams fast and autonomous. The platform owns identity, permissions, tracing, evaluation tooling, model routing, and security. Domain teams own task design, exception policies, acceptance criteria, and business outcomes.
Measure the platform by what it enables. How quickly does a domain team go from idea to deployed agent? How few specialists does it need? How much time goes to infrastructure versus the domain problem?
Don’t distribute AI to domains before the platform exists in usable form. Teams given responsibility without infrastructure reinvent solutions independently, produce fragmented architectures, and create ungovernable sprawl. The platform doesn’t need to be finished. It needs to be usable. (Organizing for AI Innovation)
Implement computational governance
Encode governance as code, tested and enforced at deployment time. When a domain team deploys an agent, the platform runs required checks automatically: does the system prompt comply with policy? Are retrieval sources approved? Are access controls configured? Has the standard bias and safety evaluation run? Domain teams retain autonomy; the platform enforces non-negotiable standards without committee review.
Don’t recentralize control through governance boards that become the new bottleneck. And don’t rely on individual diligence for compliance. Firms where each lawyer is expected to remember to check AI citations against primary databases produce sanctioned attorneys. (Organizing for AI Innovation; Agent Delegation Engineering)
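As a concrete (and purely illustrative) sketch of what "governance as code" can look like, the deployment-time checks above can be encoded as a single function the platform runs automatically. Every name here — the manifest shape, the registries, the toy prompt rule — is an assumption for illustration, not a real platform API:

```python
# Governance-as-code sketch: checks run at deployment time, no committee.
# AgentManifest, the registries, and the rules are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class AgentManifest:
    system_prompt: str
    retrieval_sources: list
    access_scopes: list
    safety_eval_passed: bool

APPROVED_SOURCES = {"contracts-db", "policy-wiki"}   # hypothetical registry
ALLOWED_SCOPES = {"read:crm", "read:docs"}
BANNED_PHRASES = ("ignore previous instructions",)   # toy prompt-policy rule

def deployment_checks(m: AgentManifest) -> list:
    """Return violations; an empty list means the agent may deploy."""
    violations = []
    if any(p in m.system_prompt.lower() for p in BANNED_PHRASES):
        violations.append("system prompt violates policy")
    violations += [f"unapproved retrieval source: {s}"
                   for s in m.retrieval_sources if s not in APPROVED_SOURCES]
    violations += [f"access scope not allowed: {s}"
                   for s in m.access_scopes if s not in ALLOWED_SCOPES]
    if not m.safety_eval_passed:
        violations.append("standard bias/safety evaluation has not run")
    return violations
```

The design point is the return type: a list of named violations, not a yes/no, so a domain team can fix its manifest without a meeting.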
Evolve the central AI team
The central AI team doesn’t dissolve. It evolves into the platform team. ML engineers who once built models for every business unit now build the platform that domain teams use. Prompt engineers who once handled every integration now write templates, evaluations, and guardrails others can reuse. The expertise redistributes rather than disappearing. (Organizing for AI Innovation)
Phase 2: Operate
Make delegated machine work reliable, auditable, and safe. This is agent delegation engineering — the discipline that determines whether AI produces value or liability.
Build the delegation stack for each workflow
For every workflow you delegate to AI, address all ten components of the universal stack. Getting a single component right while neglecting the others produces fluent, unreliable output — the most dangerous combination.
The components, drawn from converging guidance across NIST, OpenAI, Anthropic, Microsoft, and AWS:
| # | Component | What it covers |
|---|---|---|
| 1 | Task framing | Explicit objectives, acceptance criteria, risk tier. Every implicit expectation made explicit. |
| 2 | Context packaging | What the agent needs (client position, jurisdiction rules, firm standards) within token budget constraints. |
| 3 | Retrieval and grounding | Connection to verified, authoritative sources. Prevents citation hallucination. |
| 4 | Tool access | Which systems the agent can query, write to, or invoke. Bounded. |
| 5 | Constraints and permissions | What the agent may read, decide, and change, and why the boundaries sit where they do. |
| 6 | Evaluation and verification | Rubrics checking every output against primary evidence before it reaches a human. |
| 7 | Escalation rules | When the agent must stop and route to a human. Novel patterns, low-confidence flags, policy ambiguity. |
| 8 | Audit trails | Every decision traceable months later when a client, regulator, or court asks what happened. |
| 9 | Cost controls | Model routing by task complexity, compute costs visible to budget holders. |
| 10 | Component relationships | Context determines what the model sees; policy determines what it may do; verification determines if output may proceed; audit determines if anyone can reconstruct reasoning afterward. |
(Agent Delegation Engineering)
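One way to make "all ten components" checkable rather than aspirational is to write the stack down as a spec whose gaps are visible before deployment. The field names below are invented for illustration; component 10 (relationships) lives in how the platform wires the fields together, not in a field of its own:

```python
# Illustrative delegation spec: one field per component of the stack.
# A neglected component shows up as an empty default, which is exactly
# the "fluent, unreliable output" failure mode the chapter warns about.
from dataclasses import dataclass, field

@dataclass
class DelegationSpec:
    objective: str = ""                                      # 1. task framing
    acceptance_criteria: list = field(default_factory=list)  # 1. (cont.)
    risk_tier: str = ""                                      # "low"|"medium"|"high"
    context_sources: list = field(default_factory=list)      # 2-3. context + grounding
    allowed_tools: list = field(default_factory=list)        # 4. tool access
    permissions: dict = field(default_factory=dict)          # 5. constraints
    verification_rubric: list = field(default_factory=list)  # 6. evaluation
    escalation_triggers: list = field(default_factory=list)  # 7. escalation rules
    audit_trail: bool = False                                # 8. audit trails
    max_cost_usd: float = 0.0                                # 9. cost controls

def missing_components(spec: DelegationSpec) -> list:
    """Components still at their empty defaults."""
    return [name for name, value in vars(spec).items() if not value]
```

A spec with a sharp objective but no `verification_rubric` or `escalation_triggers` fails this check, which is the point: the single-component trap becomes a lint error instead of a production incident.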
Externalize tacit institutional knowledge
Create and maintain six engineered elements for each delegated workflow:
| Element | Purpose |
|---|---|
| Canonical source lists | What the agent treats as authoritative |
| Exception taxonomies | Cataloged edge cases the domain has encountered |
| Worked examples | What good output looks like in this specific context |
| Reviewer checklists | Structured verification steps for the human in the loop |
| Escalation thresholds | Conditions under which the agent must stop and ask |
| Retention rules | Traces and corrections stored so the organization learns from mistakes |
The insurance claims processor who knows Southeast flood claims require different evidence thresholds because of a regulatory change two years ago learned that from a colleague who learned it from an incident. If that stays tacit, the agent inherits ambiguity and generates something that sounds institutional. Making implicit knowledge explicit is the tacit-to-explicit conversion (Nonaka) that makes delegation possible. (Agent Delegation Engineering)
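The flood-claims rule above is exactly what an exception taxonomy captures. A toy sketch of the tacit-to-explicit conversion, with every condition and action invented for illustration:

```python
# Tacit-to-explicit conversion: the Southeast flood-claim rule written
# into an exception taxonomy instead of living in one claims processor's
# head. All details here are invented for illustration.
EXCEPTION_TAXONOMY = [
    {
        "id": "flood-se-evidence",
        "applies": lambda c: c.get("peril") == "flood"
                   and c.get("region") == "southeast",
        "action": "require_elevated_evidence",
        "rationale": "Regulatory change two years ago raised evidence "
                     "thresholds for Southeast flood claims.",
    },
]

def applicable_exceptions(claim: dict) -> list:
    """Actions an agent must take (or escalate on) for this claim."""
    return [e["action"] for e in EXCEPTION_TAXONOMY if e["applies"](claim)]
```

The `rationale` field matters as much as the rule: it is what lets the taxonomy be audited and retired when the regulation changes again, instead of sounding institutional while being stale.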
Design for the falling-asleep problem
Build deliberate friction into AI review workflows. Dell’Acqua’s recruiter study demonstrated that high-quality AI made humans less careful, not more. When AI is very good, the moments when it’s wrong are exactly when human judgment is most needed and least likely to show up.
What helps: mandatory disagreement steps where the reviewer must articulate what could be wrong before approving. Periodic manual work to keep independent judgment sharp. Review protocols that require checking against primary sources, not just surface plausibility. Structured rotation between AI-assisted and unassisted work. Watch especially for “administratively plausible” output — text that passes casual review but contains defective reasoning underneath. (The Jagged Frontier; Agent Delegation Engineering)
Protect the apprenticeship layer
Build structured learning paths for junior talent who can no longer learn by doing the routine work agents now handle. Use the progressive autonomy ladder: reviewers learn structured verification. Operators run bounded workflows with explicit escalation rules. Designers author the playbooks and acceptance criteria. Governors set policy and risk appetite across workflows.
Junior lawyers built training through document review. Junior developers learned through CRUD features and straightforward bugs. Analysts through data preparation and formatting. That apprenticeship layer is being automated before organizations have built alternatives. (Agent Delegation Engineering)
Don’t let “learning by doing” become “shipping without understanding.” Stanford data shows early-career workers in AI-exposed occupations experiencing meaningful employment decline while experienced workers remain stable. Future capacity erodes beneath high near-term output. (When Code Gets Cheap)
Address shadow AI through radical enablement
Provide controlled sandboxes with approved data rather than banning AI tools. Banning blocks the front door; data leaves through back windows. Personal phones, personal email, browser tools that bypass firewalls.
Provide data triage training so employees know which data classifications are safe for AI use. Build observability into AI usage patterns (monitor what tools people reach for; don’t block them). Accelerate approval cycles from months to days. Frame AI policy as a permission structure, not a restriction. (Organizing for AI Innovation; The Seventy Percent)
Design attention for the work AI demands
Accept fragmentation for routine work. Protect deep focus for judgment work with structural defenses. AI removes routine work and concentrates what remains into cognitively dense judgment tasks: verification, evaluation, exception handling, architectural decisions.
Gloria Mark’s data shows average screen-focus time fell from 2.5 minutes in 2004 to 47 seconds by 2020. Microsoft measures a notification ping every two minutes during core hours. The response is structured alternation: no-meeting mornings, notification blackouts, async-first communication. Shopify’s calendar purge reclaimed 322,000 meeting hours and produced 18 percent more shipped projects. Basecamp’s six-week build cycles followed by two-week cooldowns have sustained a small team’s outsized output for twenty years.
Don’t force all work into one attention mode. Audit which roles are primarily depth work and which are primarily coordination. A senior engineer spending 80 percent of their time in meetings is a misallocation. A project manager expected to produce four hours of deep analysis between twelve meetings is set up to fail. (The Attention Problem)
Phase 3: Scale and sustain
AI transformation has no destination. The frontier moves with every model update. McKinsey data suggests the length of tasks AI can handle reliably doubles every four to seven months. A pilot approved in January may be obsolete by June.
Build continuous learning into the operating rhythm
Establish a weekly cadence — demos, new workflows, shared discoveries — that keeps the organization’s understanding of AI current. The cadence surfaces use cases before anyone writes a strategy document about them and prevents the proficiency gap from reopening. (The Seventy Percent)
Measure what matters
Use the four-layer evaluation scorecard for every delegated workflow:
| Layer | Measures |
|---|---|
| Outcome | Accuracy, groundedness, rework rate |
| Process | Tool-call correctness, source freshness, policy adherence, escalation rate |
| Operational | Latency, cost per task, failure hotspots |
| Governance | Audit completeness, permission violations, incident rates |
No single layer tells whether a workflow is under control. Accurate results with a data-access violation is a compliance incident. Full policy adherence at a cost exceeding the value delivered is an economic failure.
At the organizational level: capability inventory, not cost ratio. McKinsey found that workflow redesign is the strongest single predictor of EBIT impact across 25 attributes examined. Stronger than model selection. Stronger than data quality. (Agent Delegation Engineering; The Seventy Percent)
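One way to operationalize the point that no single layer suffices: treat the scorecard as a gate where every layer must pass. A minimal sketch, assuming each layer’s checks have already been reduced to a boolean:

```python
# All four layers must pass for a workflow to count as under control.
# The scorecard maps layer name -> bool (did that layer's checks pass?).
LAYERS = ("outcome", "process", "operational", "governance")

def workflow_status(scorecard: dict) -> str:
    failed = [layer for layer in LAYERS if not scorecard.get(layer, False)]
    return "under control" if not failed else "failing: " + ", ".join(failed)

# Accurate results but a permission violation is still a failure:
workflow_status({"outcome": True, "process": True,
                 "operational": True, "governance": False})
# -> "failing: governance"
```

A missing layer defaults to failing, which encodes the rule that an unmeasured layer is an uncontrolled one.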
Evaluate like you deploy
Test AI workflows under conditions that match real deployment, not curated benchmarks. Separate outcome evaluation (did the agent get the right answer?) from process evaluation (did it use the right sources and follow policy?). A correct answer via the wrong process is a latent failure.
Store test specs separately from the codebase so AI can’t teach to the test. Use digital twin environments for safe integration testing. Anthropic’s Petri research found that models recognizing evaluation context can overstate safety performance. (Agent Delegation Engineering; When Code Gets Cheap)
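The outcome/process separation can be made mechanical: a correct answer reached through an unapproved source is flagged as a latent failure, never a pass. The trace format and source names below are assumptions for illustration:

```python
# Separate outcome evaluation (right answer?) from process evaluation
# (right sources, policy followed?). Trace shape is an assumption.
def evaluate(answer: str, expected: str, trace: dict, approved: set) -> str:
    outcome_ok = answer == expected
    process_ok = set(trace.get("sources_used", [])) <= approved
    if outcome_ok and process_ok:
        return "pass"
    if outcome_ok:
        return "latent failure"   # right answer via the wrong process
    return "fail"
```

The three-valued result is the design choice: collapsing "latent failure" into "pass" is how wrong processes survive until they produce a wrong answer in front of a client.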
Run tight feedback loops
Every delegated workflow should close the loop: intake, framing, retrieval, execution, verification. Pass: deliver. Fail: escalate to a human reviewer. Reviewer records corrections. Corrections feed back into playbooks, evaluation suites, and agent memory.
Organizations that run this loop tightly compound their delegation capability. Organizations that skip feedback repeat mistakes at machine speed. (Agent Delegation Engineering)
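A skeleton of the closed loop, with toy stand-ins for each stage. The point of the sketch is the last two lines of `run_workflow`: reviewer corrections flow back into the playbook rather than being discarded. All function bodies are illustrative stubs:

```python
# Closed-loop delegation skeleton. Stage implementations are toy stubs;
# the structure (verify -> deliver or escalate -> record correction) is
# what the chapter describes.
def retrieve(task, sources):
    return {"task": task, "sources": sources}

def execute(task, context):
    return f"draft for {task} using {len(context['sources'])} sources"

def verify(output, rubric):
    return all(term in output for term in rubric)

def run_workflow(task, playbook, reviewer):
    context = retrieve(task, playbook["sources"])   # intake/framing/retrieval
    output = execute(task, context)                 # execution
    if verify(output, playbook["rubric"]):          # verification
        return output                               # pass: deliver
    correction = reviewer(task, output)             # fail: escalate to human
    playbook["corrections"].append(correction)      # feed back into playbook
    return correction
```

An organization that drops the `playbook["corrections"].append(...)` line still ships output; it just repeats the same escalations at machine speed.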
Choose enabling over replacing
Define success as new capability, not headcount reduction. Organizations that frame AI as a replacement technology build what Acemoglu calls “so-so” deployments: marginally cheaper than people, no real productivity gain, no new capability.
Moderna asked “what would make you say you’ve succeeded beyond your wildest expectations?” The answer led to 180 high-value use cases, not 180 cost-cutting measures. Organizations that define success as headcount reduction tend to build so-so deployments. Organizations that define success as new capability tend to build enabling ones. The question you ask at the beginning shapes the technology you build at the end. (The Seventy Percent; When the World Remade Work)
Anti-patterns
Don’t centralize all AI in one team. Conway’s Law guarantees the resulting systems reflect the central team’s understanding, not the domains’.
Don’t measure by headcount reduction. Klarna measured chat volume and support roles cut, missed satisfaction decline until revenue was affected.
Don’t start with technology selection. The 10-20-70 ratio means model choice accounts for a tenth of the outcome.
Don’t commission a strategy before anyone has used a tool. Speed of learning beats comprehensiveness of plan.
Don’t distribute AI before the platform exists. Teams without shared infrastructure reinvent solutions poorly and create ungovernable sprawl.
Don’t rely on individual diligence for compliance. Two federal circuits sanctioned lawyers for AI-fabricated citations. Process must be systemic.
Don’t trust humans to stay vigilant through willpower. High-quality AI degrades human checking. Design friction in.
Don’t assume proximity in subject predicts AI capability. The frontier zigs unpredictably within domains.
Don’t skip the feedback loop. Without corrections flowing back into playbooks and evals, the organization repeats mistakes at machine speed.
Don’t automate the apprenticeship layer without replacing it. Routine work was training, not just labor. Remove it without a structured alternative and the senior pipeline hollows.
Don’t ban AI tools. Data leaves through back windows. Enable with controls instead.
Don’t use generic benchmarks for evaluation. Benchmark inflation, construct validity problems, and curation artifacts make them unreliable. Test under deployment conditions.
Don’t assume AI fixes broken processes. It amplifies them. Clean data, documented workflows, and clear exception-handling are prerequisites.
Don’t design open offices expecting collaboration. Harvard found face-to-face interaction dropped 72 percent after removing walls. People put on headphones.
Don’t force all work into one attention mode. Judgment work requires sustained depth. Routine work tolerates fragmentation. Staff and schedule accordingly.
Case study index
| Case study | Phase | Lesson | Chapter |
|---|---|---|---|
| Moderna (Challamel) | 0, 1 | Culture-first sequencing; 270 interviews before any AI work | The Seventy Percent |
| Klarna | 0 | Centralized AI can’t handle domain complexity; had to rehire | Organizing for AI Innovation |
| Morgan Stanley | 1 | Start narrow in one well-defined domain; 98% adoption | Organizing for AI Innovation |
| Walmart | 1 | Four domain-specific super agents owned by domain teams | Organizing for AI Innovation |
| Shopify | 1, 2 | Mandatory AI use; calendar purge reclaimed 322K hours | The Seventy Percent; The Attention Problem |
| Khan Academy | 1 | Founder built prototype himself before asking others to adopt | The Seventy Percent |
| JPMorgan (LLM Suite) | 1 | 200K employees, internal competitions drove viral adoption | The Seventy Percent |
| IgniteTech | 1 | Mandatory AI Mondays with consequences for non-participation | The Seventy Percent |
| Tishman Speyer | 1 | Department bootcamp, then lateral spread across firm | The Seventy Percent |
| BCG 758-consultant study | 0, 2 | 40% better on 18 tasks, worse than no-AI on the 19th | The Jagged Frontier |
| Dell’Acqua recruiters | 2 | High-quality AI degraded human judgment; friction kept people sharp | The Jagged Frontier |
| Air Canada chatbot | 2 | Missing retrieval and escalation; tribunal held airline liable | Agent Delegation Engineering |
| Third/Seventh Circuit | 2 | Fabricated citations; no verification process existed | Agent Delegation Engineering |
| Agarwal radiologists | 2 | Overrode correct AI predictions; collaboration requires training | The Jagged Frontier |
| Basecamp (Shape Up) | 2 | 6-week build cycles protect deep focus; 20+ years of evidence | The Attention Problem |
| Insurance contract review | 0 | 40K contracts in 11 days; work that was never economically feasible | Introduction |