The Jagged Frontier
She was one of the best consultants in the room. In a randomized experiment at Boston Consulting Group, 758 consultants were split into two groups: one working with GPT-4, one without. They performed eighteen tasks designed by BCG to resemble the real thing — creative ideation, data analysis, persuasive writing, strategic recommendation. Across 118 separate analyses, the AI-powered consultants worked faster, and their output was more creative, better written, and more analytically rigorous than that of their unaided peers. The gap was not subtle. It was roughly forty percent.
Then came the nineteenth task.
BCG had designed one problem to sit outside what AI could reliably handle — a question combining a tricky statistical issue with deliberately misleading data. Without AI, consultants got it right eighty-four percent of the time. With AI, that number dropped to somewhere between sixty and seventy percent. The AI had produced a confident, well-structured, completely wrong answer. And the consultants, who had just spent hours watching it outperform them on everything else, trusted it.
That gap, between the eighteen tasks where AI is spectacular and the nineteenth where it’s catastrophically wrong, is what the researchers call the jagged frontier. It’s the defining challenge of working with AI right now. It’s also temporary. Every transformative technology had a version of this boundary, a zone where it worked brilliantly and a zone where it failed in ways nobody anticipated. We learned to navigate each one. The interesting question is how fast we’ll learn to navigate this one.
The shape of the boundary
We’re used to technologies that degrade gracefully. A car with a flat tire is slower but still drivable. A dull knife still cuts. AI doesn’t work like this. Inside the boundary of its capability, it produces work that matches or exceeds skilled professionals. Outside that boundary, it fails with absolute confidence. Nothing in the output flickers or hesitates. It looks exactly the same whether it’s right or wrong.
Ethan Mollick, who helped design the BCG study with researchers from Harvard, MIT, and Warwick Business School, documented this pattern in detail. The consultants weren’t given toy problems. These were realistic consulting tasks — the kind BCG charges millions of dollars to perform. And the results held no matter how you measured them. Human graders, AI graders, different skill levels of consultant: the forty percent advantage was consistent across every analysis.
What makes the frontier jagged, rather than simply limited, is that the boundary doesn’t follow intuitive lines. A model might write excellent marketing copy but stumble on a closely related persuasion task. It might analyze financial data with precision, then misinterpret the same data when the format changes slightly. It can generate working code for a complex feature and then fail on an edge case that any junior developer would catch. It can summarize a sixty-page legal brief with accuracy and then hallucinate a case citation that doesn’t exist. Closeness in subject matter tells you almost nothing about whether a specific task falls inside or outside the boundary. The frontier zigs and zags unpredictably across every kind of cognitive work.
The irregularity is what makes the frontier a genuine learning challenge. A tool with consistent limitations is easy to work with. You learn what it can’t do and route around the gaps. A tool that switches between expert-level performance and confident fabrication, with no signal distinguishing the two, requires a fundamentally different kind of vigilance. You can’t learn a simple rule like “AI is good at writing but bad at math.” It’s good at some math and bad at other math. Good at some writing and bad at other writing. The map has to be redrawn for every specific task, and it changes every time the model updates.
Mollick’s separate analysis of 1,016 professions reinforces how broad AI’s reach already is. Almost every profession overlaps with AI capabilities to some degree. Only thirty-six job categories had zero overlap (dancers, athletes, pile driver operators, roofers), all requiring embodied physical movement. The surprise is where overlap is heaviest: the most highly compensated, most creative, and most educated jobs have the highest AI overlap. This is the reverse of every previous automation wave, which started with repetitive, dangerous, low-paid work and moved upward. AI started at the top. The automation playbook that worked for predicting the last fifty years of labour market change doesn’t apply here.
The implication is uncomfortable, and it catches most people off guard. The workers whose jobs overlap most with AI aren’t factory workers or data-entry clerks. They’re consultants, lawyers, analysts, engineers, writers, designers, researchers. The people who built their careers on cognitive skill are the ones most exposed to an uneven technology that’s brilliant on some of their tasks and dangerously wrong on others. And the line between the two shifts every few months as models improve.
Falling asleep at the wheel
The BCG result raises an obvious question: why did the consultants trust the wrong answer? These are smart, experienced professionals. They know that analytical tools can be wrong. They know that checking your work matters. They checked their work on the other eighteen tasks. Why not the nineteenth?
Fabrizio Dell’Acqua, one of the BCG study’s lead researchers, ran a separate experiment to answer exactly this question. He gave 181 professional recruiters a screening task: evaluate forty-four job applications based on mathematical ability. The math scores came from an international test, and they were not obvious from the resumes — this was genuinely a judgment call, the kind of task that requires careful evaluation, not pattern-matching.
Dell’Acqua split the recruiters into three groups. One received high-quality AI assistance. One received deliberately low-quality AI. One received no AI at all. The high-quality AI group should have done best. It had the most powerful tool.
They did worst.
Recruiters with high-quality AI spent less time on each resume, followed the AI’s recommendations without scrutiny, and showed no improvement over the course of the study. The AI was good enough, often enough, that they stopped thinking. Recruiters with low-quality AI had a different experience. The AI’s mistakes were frequent enough to keep them alert. They questioned recommendations, applied their own judgment, and actually got better at the task as it progressed. The friction made them sharper.
Dell’Acqua calls this “falling asleep at the wheel.” When the AI is very good, humans disengage. They stop applying their own judgment. And this is the failure mode that matters, because the moments when AI is wrong are the moments when human judgment is most needed and least likely to show up.
In a typical workday, an analyst reviews an AI-generated market report. The first three sections are excellent, better than she would have written. She catches a minor formatting issue in the fourth section and fixes it. By the fifth section, she’s skimming. By the seventh, she’s barely reading. She sends it to her manager, who knows it was AI-assisted and also skims it. The report contains a subtle error in section six — a misinterpretation of a data trend that reverses the conclusion for one product line. Neither the analyst nor the manager catches it, because the sections surrounding it were so good that neither was reading carefully enough when they hit the wrong one. This isn’t a hypothetical. It’s the dynamic Dell’Acqua documented in controlled settings, and it plays out in offices every day.
A 2025 study by Microsoft and Carnegie Mellon found the same pattern with 319 knowledge workers: the more confident people were in AI’s ability to complete a task, the more completely they disengaged their own thinking. A separate systematic review in Smart Learning Environments documented how sustained reliance on AI dialogue systems progressively impairs critical thinking and analytical reasoning. The mechanism isn’t mysterious. Skills atrophy when you stop using them. If you delegate all your analysis to AI for six months, your capacity for independent analysis declines. The AI isn’t the problem. You stopped practicing.
The previous chapter discussed attention fragmentation — the challenge of sustained focus in environments that constantly interrupt it. Falling asleep at the wheel is a different kind of cognitive threat. Fragmentation scatters your attention across too many demands. Disengagement switches it off entirely because the AI seems to have things handled. One is a problem of overload; the other is a problem of underload. But they converge on the same outcome: a human who isn’t doing the careful, critical evaluation that the situation demands. And in most workplaces, both operate simultaneously. A knowledge worker checking AI output is also fielding Slack messages, scanning emails, and context-switching between projects. She’s fragmented and prone to disengagement, a combination that makes the nineteenth-task error harder to catch. But Dell’Acqua’s own data contains the solution. The low-quality AI group improved precisely because the friction kept them thinking. Deliberate friction — review protocols, mandatory disagreement steps, periodic manual work — counteracts disengagement. This is a design problem, and it has design solutions.
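What that friction looks like in practice will vary by team, but the mechanism is simple enough to sketch in code. The snippet below is a minimal illustration, not anything from Dell’Acqua’s study: a hypothetical review gate that routes a random sample of AI outputs for a blind manual redo and refuses sign-off on the rest until the reviewer records one way the draft could be wrong. The function names, the redo rate, and the callable interfaces are all invented for the example.

```python
import random

# Hypothetical review gate illustrating "deliberate friction."
# The redo rate, the objection rule, and the callable interfaces are
# assumptions made for this sketch, not part of any cited study.

REDO_RATE = 0.15  # fraction of items the reviewer redoes without seeing the AI draft

def review(items, ai_draft, human_draft, get_objection, record):
    for item in items:
        draft = ai_draft(item)

        if random.random() < REDO_RATE:
            # Friction 1: a blind manual redo on a random sample keeps the
            # reviewer's independent skill in practice and surfaces
            # disagreements between human and AI answers.
            independent = human_draft(item)
            record(item, draft, flagged=(independent != draft))
            continue

        # Friction 2: mandatory disagreement -- sign-off is blocked until
        # the reviewer names one plausible way the draft could be wrong.
        objection = get_objection(item, draft)
        if not objection:
            raise ValueError("Sign-off blocked: no objection recorded.")
        record(item, draft, flagged=False, noted_risk=objection)
```

The particulars matter less than the principle: engagement is enforced by the workflow rather than left to willpower, which is the same mechanism that kept Dell’Acqua’s low-quality-AI recruiters thinking.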
The equalizer and its risks
The BCG study’s most important finding was not the nineteenth-task failure. It was the equalizer effect. The lowest-performing consultants gained the most from AI assistance. The gap between bottom-half and top-half performers compressed significantly. The pattern was consistent across the study’s 118 analyses.
This wasn’t unique to BCG. At Stanford and MIT, Brynjolfsson, Li, and Raymond ran what may be the largest study of AI-augmented work to date: 5,179 customer service agents at a large firm, tracked over months. The AI tool monitored conversations in real time and suggested responses based on what had worked for the company’s best agents. It distilled the tacit knowledge of top performers into suggestions that anyone could use.
The average productivity gain was fourteen percent. But averages conceal the distribution. Novice agents, those in their first months on the job, saw a thirty-five percent improvement. Experienced agents, the ones whose knowledge the AI had been trained on, gained almost nothing. The AI was giving novices access to the collective expertise of the company’s best performers, knowledge that normally took months or years to accumulate through trial and error.
The learning curve effect was starker. Novices with AI reached the performance level of experienced agents in about three months. Without AI, the same benchmark took ten months. The tool compressed the learning curve by seventy percent. And it did so without the novices necessarily understanding why the suggested responses worked — they just followed the AI’s lead and got experienced-agent results. Whether they were developing genuine expertise or merely executing someone else’s expertise through a machine is an open question, and it matters for what happens when the tool encounters a situation it hasn’t seen before.
Noy and Zhang at MIT found the same in professional writing. ChatGPT reduced task time by forty percent. The biggest quality improvements went to the least effective writers, who leapt to median quality. The best writers barely moved.
Peng and colleagues at Microsoft measured GitHub Copilot’s effect on coding tasks: fifty-six percent faster completion. Across every study, in every domain, the pattern repeats. AI helps the least skilled the most. The performance distribution compresses.
This has labour-market implications that run counter to the dominant narrative. For the past four decades, technology has widened the skill premium. Computers made elite professionals more productive while automating the middle-skill jobs that supported a broad middle class. The rich got richer and the gap widened. If AI reverses this, if it compresses rather than stretches the skill distribution, then the technology could reduce wage inequality rather than increase it. David Autor, the MIT economist whose work on labour-market polarization defined the field, has argued exactly this. AI, he suggests, could be an “inversion technology” that reverses the direction of the previous transition.
The equalizer effect carries its most transformative implications outside the economies where it was studied. Every experiment cited here — BCG, Brynjolfsson, Noy and Zhang, Peng — was conducted in the United States or Western Europe, with workers who already had access to sophisticated training systems. A customer service agent in Nairobi, a paralegal in Manila, an analyst in Bangalore — workers with foundational education but no access to BCG-level mentorship or MIT-quality instruction — stand to gain far more. If AI can compress the distance between a novice and an expert in Boston, the compression is even more dramatic where the distance is wider and the training infrastructure thinner. This is speculative; the studies haven’t been run in those markets. But it follows directly from the equalizer’s own logic. The technology’s greatest impact may land furthest from where it was invented.
But Autor is careful to frame this as possibility, not prediction. “My thesis is not a forecast but a claim about what is attainable.” And the BCG study reveals the equalizer’s dark side. The workers who gain the most from AI are also the ones with the least independent expertise to deploy when the AI crosses the frontier boundary. A novice customer service agent who has been operating at experienced-agent level for three months, courtesy of AI, has not actually developed ten months of experience. They’ve developed three months of experience augmented by a tool. Take the tool away, or push the tool past its boundary, and they’re still a novice.
The equalizer effect and the falling-asleep effect are two sides of the same dynamic. AI compresses the performance distribution, which is good. But it does so by supplementing judgment rather than building it, which creates fragility. The workers AI helps the most are the ones least equipped to catch its failures. Structured foundational training addresses this gap — and those programs are already beginning to emerge.
Two ways to work
So how do effective workers actually navigate this boundary? Mollick identified two distinct patterns, drawing from the BCG data and broader observation.
The first he calls centaur work, after the mythical creature with a human torso and a horse’s body. In centaur mode, the division between human and machine is clear. The worker decides which tasks fall inside the frontier and hands those to AI, then identifies which fall outside and does those herself. A data analyst might choose the statistical approach herself, design the interpretation framework, then let AI generate the visualizations and first-draft summaries. In the BCG study, the consultants who performed best in centaur mode were the ones who did their strongest work themselves and delegated specifically the tasks where AI had proven reliable.
The second pattern is cyborg work. Here the boundary blurs. Human and AI efforts intertwine at a granular level. The worker starts a sentence, AI finishes it, the worker edits the ending, feeds it back, iterates. The work product is neither purely human nor purely AI. It’s a deeply integrated hybrid, and the division happens within tasks rather than between them.
A financial analyst doing centaur work might spend the morning reading earnings reports and building an investment thesis herself, reasoning through the logic independently. In the afternoon she hands the thesis to AI and asks it to stress-test the assumptions, generate counter-arguments, and produce the visualizations. She evaluates the output with the confidence of someone who did the hard thinking first. A different analyst, working in cyborg mode, might start with a half-formed intuition about a company, prompt AI to pull relevant data, read the first results, refine her thesis based on what she sees, prompt again with the refined question, edit the AI’s analysis paragraph by paragraph, and arrive at a finished product through dozens of small iterations. Neither approach is inherently better. The centaur maintains clearer ownership of the judgment calls. The cyborg produces a tighter integration of human intuition and machine capability. The right choice depends on the task, the stakes, and the analyst’s confidence in her ability to evaluate AI output in real time.
These aren’t personality types. Skilled practitioners switch between modes depending on the task. Centaur mode works well when you can clearly separate tasks by capability and when the human has strong independent views to protect. Cyborg mode works when the work is too intertwined to divide cleanly — writing, for instance, or exploratory analysis where the thinking and the output develop together.
Mollick proposes a five-category framework for sorting work relative to AI. Some tasks should remain entirely human: creative work with personal voice, value-laden decisions, ethical judgment calls. Some should be delegated to AI with human review (expense reports, paper summarization, scheduling). Some can be fully automated. And then there are the two collaboration modes: centaur for strategic division, cyborg for deep intertwining.
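To make the framework concrete, here is a deliberately toy sketch of how a team might encode that triage. The category labels paraphrase Mollick’s framework; the decision questions and field names are assumptions invented for illustration, not his formulation.

```python
from enum import Enum

class Mode(Enum):
    JUST_ME = "entirely human"            # personal voice, values, ethics
    DELEGATED = "AI does it, human reviews"
    AUTOMATED = "AI does it end to end"
    CENTAUR = "divide tasks between human and AI"
    CYBORG = "intertwine human and AI within the task"

def triage(task: dict) -> Mode:
    """Toy triage of a work item against the five categories.

    The field names (value_laden, inside_frontier, ...) are invented
    for this sketch; a real team would define its own criteria.
    """
    if task.get("value_laden") or task.get("personal_voice"):
        return Mode.JUST_ME
    if task.get("inside_frontier"):
        # Routine, verifiable work migrates toward delegation or automation.
        return Mode.AUTOMATED if task.get("low_stakes") else Mode.DELEGATED
    # Work that straddles the frontier calls for collaboration: split it
    # cleanly if you can, interleave human and AI effort if you can't.
    return Mode.CENTAUR if task.get("cleanly_divisible") else Mode.CYBORG
```

A real version of this would be a living document rather than code, because the answers to those questions change as the frontier moves.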
The categories aren’t static. As models improve, tasks migrate. Something that was firmly in “just me” territory six months ago may now be centaur-eligible. Delegated tasks become automatable. Workers operating on stale assumptions about what AI can and can’t do are the ones most likely to hand off the wrong task or refuse to hand off the right one.
The skills required for each mode are different, which matters for training and hiring. Centaur work demands meta-cognitive judgment: knowing what you’re good at, knowing what AI’s good at, and making the allocation decision well. This is the “foundational expertise” that Autor argues workers need — enough domain knowledge to recognize when AI output is trustworthy and when it isn’t. Cyborg work demands something newer: fluid integration skills, the ability to prompt effectively, edit AI output critically, and iterate at speed. There’s no clean pre-AI analogue for this. The closest comparison might be the relationship between a skilled editor and a writer, compressed into a single person operating in real time.
Both modes require understanding the frontier. Without that understanding, centaurs hand off tasks the AI can’t handle, and cyborgs accept output that sounds right but isn’t. The nineteen-task BCG study is, in essence, a story of consultants who failed to map the frontier and paid for it on the task that fell outside.
The expertise question
If the critical skill is knowing where the frontier falls, the question becomes: what kind of expertise lets a worker map it? Autor’s historical analysis provides a useful frame.
He traces three distinct expertise regimes across economic history. In the pre-industrial era, expertise was artisanal — procedural skill combined with expert judgment, acquired through years of apprenticeship. A master wheelwright didn’t just follow blueprints; he read the grain of the wood, knew which species bent well for felloes and which split under stress, adjusted every joint to the specific timber he was working with. That judgment couldn’t be written down in a manual. It lived in the craftsman’s hands and accumulated over a decade of practice. Blacksmiths, tailors, coopers, tanners: the entire pre-industrial economy ran on this kind of embedded expertise, and it took a generation of apprenticeship to transmit.
Mass production destroyed the regime. A weaver who had spent years mastering a hand loom found that a factory girl, fourteen years old with a few days of training, could produce more cloth per hour on a power loom than he could in a day. The factory didn’t need artisanal judgment; it needed workers who could follow instructions, operate machines, and read. Literacy and numeracy replaced apprenticeship as the ticket to the middle class. This was the era of mass expertise, and it created the largest middle class in history. What mattered was reliability, not mastery. Show up, follow the process, handle the paperwork.
Then computers arrived, and they were surgically precise about which work they automated. They excelled at anything with clear rules and structured information — precisely the mass-expertise tier. Bank tellers, travel agents, switchboard operators, bookkeepers, production-line supervisors: these jobs were built on following procedures, and computers followed procedures faster and cheaper. What computers couldn’t do was anything requiring non-routine judgment, creativity, or tacit knowledge: diagnosing a patient, arguing a legal case, designing a building, managing a team through a crisis. The people who could do that work were almost exclusively college-educated professionals. Everyone else was pushed downward. The sixty percent of American adults without a bachelor’s degree saw their mid-tier jobs evaporate. They didn’t move into better jobs. They moved into low-paid service work (food preparation, retail, home health aides) because the only remaining positions that computers couldn’t touch were either highly cognitive or highly physical. The middle hollowed out over four decades, and the polarization reshaped politics, geography, and social identity along the way.
Autor’s argument is that AI can reverse this. Computerization concentrated expert judgment among a credentialed elite because computers could handle routine procedures but not tacit knowledge. AI breaks through that limitation, and the way it does so matters for everything that follows.
Michael Polanyi observed in 1966 that “we can know more than we can tell” — that much of what experts know is tacit, learned through experience, resistant to codification. A master diagnostician looks at a patient and knows something is wrong before she can articulate why. An experienced trial lawyer reads a jury and adjusts her argument in real time based on cues she couldn’t list if you asked. This knowledge is real, consequential, and impossible to encode in traditional software. You can’t write an if-then rule for “the jury isn’t buying it.” Traditional software was bound by Polanyi’s Paradox: it could only automate what could be explicitly specified.
AI doesn’t work that way. It learns from examples rather than following hard-coded rules, which means it can approximate patterns that nobody has articulated. Autor’s analogy is precise: a traditional program is a classical musician playing the notes on the page, exactly as written; AI is a jazz musician, improvising on melodies and adapting in real time. The jazz musician can’t always explain why a particular riff works in a particular moment. But it works.
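The difference is easy to see in miniature. In the hedged sketch below, the first function is Polanyi-bound: it can route only what someone explicitly wrote down as a rule. The second is fit to a handful of labeled examples and can route messages whose patterns nobody articulated. The routing task, the example data, and the model choice are all invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def rule_based_routing(message: str) -> str:
    # The classical musician: the program does only what was written down.
    text = message.lower()
    if "refund" in text or "charged" in text:
        return "billing"
    if "password" in text or "log in" in text:
        return "account"
    return "unknown"

# The jazz musician: a model fit to examples, able to route messages whose
# patterns were never articulated as explicit rules. (Toy data, invented
# for this sketch.)
examples = [
    ("I was charged twice for the same order", "billing"),
    ("Can't log in after resetting my password", "account"),
    ("The app crashes whenever I export a report", "technical"),
    ("Why did my invoice go up this month?", "billing"),
    ("My account got locked out this morning", "account"),
    ("Exported files come out corrupted", "technical"),
]
texts, labels = zip(*examples)
learned_routing = make_pipeline(TfidfVectorizer(), LogisticRegression())
learned_routing.fit(texts, labels)

print(learned_routing.predict(["the export crashes on large files"]))  # likely "technical"
```

Scale that idea up from a toy classifier to a large language model and you have the capability Autor is describing: pattern approximation without explicit specification.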
This means AI can extend expert judgment to workers who have foundational training but not elite credentials. The nurse practitioner is instructive. NPs now perform diagnostic and prescriptive tasks that were once reserved for physicians. What made this possible wasn’t just information technology (electronic medical records, diagnostic databases) but institutional change: new training programs, certification regimes, scope-of-practice regulations. The technology and the institutions developed together over decades. Autor argues AI could accelerate this pattern across professions, from contract law to calculus instruction to catheterization.
But the analogy comes with a warning that connects directly back to the frontier. A pneumatic nail gun is indispensable for a professional roofer and a looming impalement hazard for a home hobbyist. YouTube how-to videos help electricians learn new techniques and would be dangerous for untrained homeowners rewiring a fuse box. The more powerful the tool, the higher the stakes, and the more foundational expertise matters. As Autor puts it: “AI can extend the reach of expertise by building stories atop a good foundation and sound structure. Absent this footing, it is a structural hazard.”
There is a structural paradox here that deserves naming. The reason you turn to AI for help is that a task exceeds your current expertise. But evaluating whether the AI did the task correctly requires the very expertise you lack. You use AI because you can’t do the diagnosis; but if you can’t do the diagnosis, how do you know the AI’s diagnosis is right? Every worker using AI on tasks at the edge of their competence faces this daily. The paradox resolves the same way it resolved for nurse practitioners: you don’t need the full depth of a physician’s training to learn when AI diagnostic predictions tend to fail, what the common error patterns look like, and when to escalate. Targeted evaluation skills are learnable. They require less training than the original expertise, and they can be taught systematically. The measurement problem is real, but it is a training problem with a known shape.
A study by Agarwal and colleagues in 2023 illustrates both the paradox and its resolution. They gave radiologists access to AI diagnostic predictions that matched or exceeded the accuracy of two-thirds of the doctors in the study. The AI should have improved performance across the board. It didn’t. Radiologists overrode correct AI predictions with their own inferior ones, and deferred to uncertain AI predictions when their own independent judgment was actually better. Access to a good tool, without training in how to use that specific tool, produced worse outcomes than no tool at all. The tool didn’t fail. The collaboration failed, because it was the first generation of a new workflow. Nobody had trained the radiologists in how to integrate AI predictions with their own judgment, when to defer and when to override. The second generation of that workflow will be better, because we now know what to train for. The frontier isn’t just about AI capability; it’s about the human’s ability to work the boundary, and that ability improves with practice and instruction.
The frontier smooths out
Every general-purpose technology had its own jagged frontier. As the previous chapter documented, factory owners in 1900 had been living with electric motors for two decades, and less than five percent of mechanical drive in American factories was electric. The technology wasn’t the problem. The organizational knowledge was. Owners wired centralized dynamos into shaft-and-belt layouts designed for steam, and the results were mediocre. Electricity’s frontier was jagged in exactly the way AI’s is: brilliant for some applications (lighting, small tools), useless or worse for others (powering a factory designed around steam’s physics). The boundary wasn’t in the motor. It was in the organizational imagination. By the 1920s, a new generation of designers had discovered unit drive, single-story factories, production-sequence layouts, and the frontier had smoothed completely. Not because the motors improved, but because the humans caught up.
Computers followed the same arc. Mainframes spent twenty-five years as faster clerks, automating payroll and billing while leaving business processes untouched. In 1987 Robert Solow could see computers everywhere except in the productivity statistics. He was staring at a jagged frontier and concluding the technology was limited. Then Walmart built real-time inventory tracking, Dell built build-to-order manufacturing, Amazon rebuilt retail around a database, and the frontier smoothed. The breakthrough wasn’t faster hardware. It was a new generation that designed workflows native to the technology rather than shoehorning it into old ones.
AI is following this pattern, but the timeline is compressed. Electricity took forty years from commercial availability to organizational breakthrough. Computers took roughly twenty-five. AI’s feedback loops are tighter, the experimentation is more distributed, and millions of workers are mapping the frontier simultaneously rather than a handful of factory designers or corporate IT departments. The BCG study of 2023 is the equivalent of a 1900 factory audit that found the dynamo disappointing. We are three years into a process that historically took decades. The frontier is already visibly narrower than it was when that study was conducted.
Reframing the reskilling question
If expertise democratization is real, if AI can extend the reach of professional judgment to a broader set of workers, then the critical policy question changes. The standard framing asks: how do we retrain displaced workers for entirely new careers? Autor’s argument reframes it: how do we give workers enough foundational expertise that AI can extend their capabilities into higher-value work?
Each expertise regime required a different kind of training. The artisanal era required apprenticeships lasting years. The mass-expertise era required high school diplomas, universal and standardized, emphasizing literacy and numeracy over craft. The elite-expertise era required college degrees and often graduate training. The AI era may require something different again, shorter and more focused than graduate training, but deeper than anything a weekend boot camp could deliver. Structured foundational programs that give workers enough domain knowledge to use AI-assisted judgment effectively. Enough to know when the AI is probably right and when to push back. In practice, this means teaching the measurement skill: how to evaluate AI output in a specific domain, what the common failure modes look like, and when to escalate. Enough to be a competent centaur or cyborg rather than a passive recipient of AI output.
What might these programs look like? Shorter than a four-year degree, deeper than a weekend boot camp. Something closer to the NP model: a focused curriculum that teaches enough anatomy, pharmacology, and diagnostic reasoning that the graduate can work effectively with AI diagnostic tools, recognizing when the tool’s suggestion makes clinical sense and when it doesn’t. In legal work, it might mean a paralegal program that covers contract structure, regulatory frameworks, and common failure modes in enough depth that an AI-assisted paralegal can review contracts with genuine comprehension rather than surface pattern-matching. The common thread is domain fluency: enough understanding to exercise judgment about AI output in a specific field, without requiring the full depth of traditional professional training.
A hopeful frame, but Autor is explicit that nothing about it is automatic. “AI will not decide how AI is used. The constructive and destructive applications are boundless.” Whether expertise democratization actually happens depends on institutional choices: training programs, certification regimes, labour law, corporate incentive structures. The nurse practitioner occupation required decades of institutional struggle against the American Medical Association before nurses were permitted to perform tasks that physicians monopolized. The AMA argued that patient safety required physician oversight; nurses argued that training, not credentials, determined competence. The fight took from the 1960s to the 2010s, and scope-of-practice restrictions still vary by state. AI-enabled expertise extension will face similar resistance from incumbent gatekeepers. Lawyers, accountants, financial advisors, architects: every credentialed profession has an institutional apparatus designed to prevent the kind of access expansion Autor describes. But the NP precedent is telling. Nurses fought for decades, and they won. The evidence that trained non-physicians could deliver safe care ultimately outweighed the credentialing arguments. AI-enabled access expansion will face the same opposition and, if the pattern holds, overcome it faster, because the evidence accumulates faster.
What organizations are actually seeing
What’s happening outside controlled experiments?
In an April 2026 tweet, Aaron Levie, the CEO of Box, shared takeaways from meeting with IT and AI leaders from large enterprises across banking, media, retail, healthcare, consulting, tech, and sports. His observations track the empirical findings closely.
The first thing he noticed was what enterprises weren’t talking about. “Most companies are not talking about replacing jobs due to agents. The major use-cases for agents are things that the company wasn’t able to do before or couldn’t prioritize.” Software upgrades, automating back-office processes that constrained other workflows, processing large volumes of documents for new business insights. AI enabling new work rather than replacing existing work, just as the expanding frontier argument would predict.
Everyone was also, paradoxically, working harder. “Unanimous sense that everyone is working more than ever before. AI is not causing anyone to do less work right now.” The BCG study would predict exactly this. If AI makes people forty percent more productive on certain tasks, the organizational response isn’t to give them forty percent more free time. It’s to redirect that capacity toward newly possible work.
Engineering jobs were the most badly misjudged. “Everyone’s estimation of engineering jobs is totally off. Engineers may not be ‘writing’ software, but they will certainly be the ones to setup and operate the systems that actually automate most work in the enterprise.” Centaur work at the organizational level: engineers shifting from the tasks AI handles well (writing boilerplate code) to the tasks it can’t (system design, architecture, integration, oversight).
But the hardest problem wasn’t technical. “Change management still will remain one of the biggest topics for enterprises. Most workflows aren’t set up to just drop agents directly in.” One company had created a head of AI in every business unit, reporting to a central team, just to coordinate adoption across functions. The organizational lag that Perez documented across five technological revolutions is operating in real time.
Most enterprises are also dealing with decades of legacy systems, on-premise databases and applications moved to the cloud but never modernized, that agents can’t tap into in any unified way. Before AI can expand the frontier, companies have to fix the plumbing. Levie also noted a phenomenon he called “tokenmaxxing”: companies operating with strict annual OpEx budgets are going through real trade-off discussions about how to budget for AI compute. One company had pitched a “shark tank” format for internal teams competing for token budgets. Others were developing hierarchies of use cases to ration compute to the highest-value applications. The economics of AI aren’t just about model capability; they’re about who gets access to how much compute, and which work justifies the spend.
Levie also noted that “headless software” dominated his conversations — enterprises are realizing their software must interoperate with AI agents from multiple vendors, and the transition from human-user to agent-user architecture has barely begun.
Separately, Levie observed that cybersecurity is “about to have its Jevons paradox moment.” Better AI tooling for security will increase demand for security talent, he argued, because autonomous vulnerability discovery “automates the proving step, but it doesn’t automate the response. More real findings surfaced faster means more triage, more remediation, more architectural decisions that need human judgment.” AI generates a hundred times more code. That generates a hundred times more security surface. AI can triage the threats, but a human expert is still needed to manage the process. The frontier expands; the human role shifts but doesn’t disappear.
One organizational dynamic is harder to observe from the executive suite. Mollick documents it from the worker level: people who discover effective AI workflows often keep them secret. Three incentives converge. Companies that ban or restrict AI push workers to personal devices and shadow tools. The ban doesn’t stop usage; it stops disclosure. AI-generated work is judged differently when people know AI produced it, so workers who reveal their methods may find their output devalued. And a worker who automates ninety percent of a task and tells management may watch ninety percent of the department get downsized. Silence is rational self-preservation.
The result is that organizations lose access to their most valuable AI innovations because the incentive structure punishes disclosure. The workers with the deepest centaur and cyborg experience, the ones who have actually mapped the frontier, are the ones least likely to share what they’ve learned. Multiply every worker who has quietly figured out how to use AI effectively across every organization, and you begin to see the scale of collective knowledge being left on the table.
The nineteenth task
That BCG consultant wasn’t a bad analyst. She was, by the standards of the study, an excellent one. Her performance on the first eighteen tasks proved it — AI made her forty percent better. On the nineteenth task, she didn’t get lazy or careless. She did exactly what the previous eighteen tasks had trained her to do: take the AI’s output, review it briefly, integrate it into her recommendation. The AI’s answer looked like all the other answers. Same confident tone, same structured reasoning, same plausible conclusions. The only difference was that this one was wrong.
That is the jagged frontier in a single anecdote. The technology doesn’t announce when it crosses the line from competence to confident error. It doesn’t blink or hedge or qualify itself. The answer that’s forty percent better than what you could produce alone looks identical to the answer that’s twenty-four percent worse. The only way to tell the difference is to bring your own judgment to the evaluation, which is the thing that high-quality AI trains you to stop doing.
The research points in two directions at once, and both are real. The gains: forty percent on consulting tasks, fifty-six percent on coding, fourteen percent on customer service, with the largest benefits going to the workers who need them most. The risks: the better the AI, the less carefully humans check its work, and the workers who gain most are the ones least equipped to catch failures. Anyone who tells you only one side of this is selling something.
But the balance between those two sides is shifting, and it’s shifting toward the gains. Centaur and cyborg collaboration models describe what effective practitioners already do. The measurement paradox resolves the same way it resolved for nurse practitioners: structured foundational training that teaches evaluation, not the full depth of the original expertise. Dell’Acqua’s own data shows the mechanism for solving the falling-asleep problem: deliberate friction that keeps human judgment engaged.
The equalizer effect matters more than any other finding here. AI’s default direction, absent deliberate interference, compresses skill distributions rather than stretching them. This is the opposite of what computerization did. If institutional choices support it, if training programs emerge, if credentialing barriers soften, the result could be the broadest expansion of professional capability since mass education. That is not a guarantee. Autor is careful to frame it as possibility. “AI will not decide how AI is used.” But the evidence from the field is accumulating on the optimistic side. Nobody in Levie’s enterprise meetings is cutting jobs. Everyone is working more. The frontier is expanding.
And the frontier itself is smoothing. The electricity frontier smoothed as organizational knowledge caught up with the technology. The computer frontier smoothed as a new generation designed workflows native to the tool. AI’s frontier is smoothing faster than either, because the feedback loops are tighter and the experimentation is more distributed. Mollick’s 1,016-profession analysis shows where the boundary falls today. A similar analysis a year from now will show it has shifted. Two years from now, shifted again. The boundary is not a wall. It is receding.
The BCG consultant who trusted the nineteenth-task answer was not a cautionary tale about a broken technology. She was a snapshot of the learning curve’s first year. Three years later, practitioners have already developed centaur and cyborg workflows that would have caught her error. Three years from now, the models will be better, the evaluation tools will be better, and the organizational knowledge will be deeper. The question is not whether we learn to navigate the frontier. Every previous technology says we will. The question is how fast, and how broadly the gains are shared.
The connection to what comes next
Software engineering is where the jagged frontier is being mapped in real time, by millions of workers every day. Developers are the largest single population of people actively collaborating with AI on professional tasks. Some are centaurs, using AI for boilerplate while writing the architecture themselves. Some are cyborgs, prompting and editing in tight loops, producing code that is neither fully human nor fully machine. Some have stopped writing code entirely, directing AI agents through natural language and reviewing what comes back. They call it “vibe coding,” and the name captures both the promise and the danger: you can build software by feel, without understanding the mechanics, and it works beautifully right up until it doesn’t. They are living every dynamic this chapter has described: the equalizer effect, the falling-asleep risk, the secret automation, the frontier that shifts every time the model updates. No other profession is generating this much data about what human-AI collaboration actually looks like in practice. The next chapter goes inside it.