The Jagged Frontier
The Shape of AI Capability
AI does not degrade gracefully. Within the boundary of its capability, it produces work that matches or exceeds that of skilled professionals. Outside that boundary, it fails confidently and catastrophically. The boundary between these zones is jagged: irregular, hard to predict, and different for every task domain.
Ethan Mollick and a team of researchers from Harvard, MIT, and Warwick Business School tested this empirically with 758 Boston Consulting Group consultants. Consultants were randomized into groups, some working with GPT-4 and some without. They performed eighteen tasks, designed with BCG to resemble real consulting work: creative, analytical, writing, and persuasion tasks.
Inside the Frontier: +40% Performance
The results inside the frontier were unambiguous. Across 118 different analyses, AI-powered consultants were faster, and their output was rated more creative, better written, and more analytical than their peers'. The effect held regardless of how results were measured: by human graders, by AI graders, and at every consultant skill level.
The performance gain was not subtle. Across the eighteen tasks, consultants using AI finished 12.2% more tasks, completed them 25.1% more quickly, and produced work rated more than 40% higher in quality than the control group.
Outside the Frontier: Worse Than No AI
BCG designed one additional task specifically to fall outside AI's capability boundary: a problem combining a tricky statistical issue with misleading data. Human consultants without AI got it right 84% of the time. Consultants using AI dropped to 60-70% accuracy.
The AI produced a confident, plausible, wrong answer. Consultants trusted it. The technology that made them 40% better on eighteen tasks made them measurably worse on the nineteenth.
Falling Asleep at the Wheel
Fabrizio Dell’Acqua’s complementary study explains the mechanism. He gave 181 professional recruiters a task: evaluate 44 job applications on the applicants’ math ability, which was measured by scores on an international test and was not obvious from the resumes themselves. Some recruiters received high-quality AI assistance, some low-quality assistance, and some none at all.
The counterintuitive finding: recruiters with higher-quality AI performed worse than those with lower-quality AI. High-quality AI caused recruiters to spend less time on each resume, follow AI recommendations blindly, and fail to improve over time. Low-quality AI kept recruiters alert, critical, and independent — they improved their judgment through the friction.
Dell’Acqua calls this “falling asleep at the wheel.” When the AI is very good, humans disengage. They stop applying their own judgment. This is precisely the failure mode that matters most: the better the AI, the more dangerous the moments when it’s wrong, because users have stopped checking.
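The danger is easy to state arithmetically. A toy calculation, using illustrative numbers that are assumptions rather than the study's data, shows how combined human-AI accuracy can fall even as the AI improves, so long as vigilance falls faster:

```python
# Toy model of falling asleep at the wheel. P(final answer correct) =
# P(AI right) + P(AI wrong) * P(human checks) * P(human catches the error).
# All numbers are illustrative assumptions, not Dell'Acqua's data.

def combined_accuracy(ai_quality: float, vigilance: float,
                      catch_rate: float = 0.95) -> float:
    return ai_quality + (1 - ai_quality) * vigilance * catch_rate

# Mediocre AI keeps the human alert: she checks 90% of recommendations.
print(combined_accuracy(ai_quality=0.60, vigilance=0.90))  # 0.942
# Better AI lulls her: she checks only 20% of recommendations.
print(combined_accuracy(ai_quality=0.80, vigilance=0.20))  # 0.838
```

The model that is twenty points more accurate produces a worse combined result, which is the qualitative pattern Dell'Acqua observed.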
Cognitive Surrender describes the same reinforcement loop: accepting AI output without critical evaluation, leading to atrophy of independent judgment. Dell’Acqua provides the empirical mechanism for what the cognitive surrender literature describes at the individual level.
The Equalizer and Its Risks
The BCG study confirmed the same pattern found in Brynjolfsson’s customer service study and Noy & Zhang’s writing experiment: AI as equalizer. The lowest-performing consultants gained the most from AI assistance. The gap between bottom and top performers compressed.
But the equalizer effect has a dark side. The workers who gain the most are also the most vulnerable to falling asleep at the wheel — they have the least independent expertise to deploy when the AI crosses the frontier boundary. The very workers AI helps most are the ones least equipped to catch its failures. See Expertise Democratization for Autor’s analysis of why foundational training remains critical even as AI extends capabilities.
Mapping the Frontier
Mollick’s practical recommendation: always invite AI to the table, and learn the shape of the frontier empirically. The frontier is task-specific, domain-specific, and changes as models improve. There is no substitute for direct experimentation.
The frontier’s jaggedness means that proximity in task-space is a poor predictor of AI capability. A model might write excellent marketing copy but fail at a closely related persuasion task. It might analyze financial data accurately but misinterpret the same data when presented in a slightly different format. The boundary does not follow intuitive lines.
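Because adjacent tasks can sit on opposite sides of the boundary, mapping has to happen task by task. A minimal sketch of such a harness follows, in which `run_model`, `grade`, and the task inventory are hypothetical placeholders to swap for a real model call, a real rubric, and real work tasks:

```python
from collections import defaultdict
from typing import Callable

# Hypothetical task inventory. Closely related tasks are benchmarked
# separately, since proximity in task-space predicts little.
TASKS: dict[str, list[str]] = {
    "marketing_copy":    ["prompt A", "prompt B"],
    "persuasion_memo":   ["prompt A", "prompt B"],
    "statistical_check": ["prompt A", "prompt B"],
}

def map_frontier(run_model: Callable[[str], str],
                 grade: Callable[[str, str], bool]) -> dict[str, float]:
    """Run every prompt through the model and record per-task pass rates.

    High pass rates mark tasks inside the frontier, low rates mark
    tasks outside it; sharp gaps between related tasks are the
    jaggedness made visible.
    """
    results: dict[str, list[bool]] = defaultdict(list)
    for task, prompts in TASKS.items():
        for prompt in prompts:
            results[task].append(grade(task, run_model(prompt)))
    return {task: sum(oks) / len(oks) for task, oks in results.items()}
```

Since the frontier moves as models improve, the harness is worth re-running on every model update, not just once.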
For organizations, this means blanket policies — “use AI for X, don’t use it for Y” — will be wrong at the edges. The workers closest to the actual tasks are the ones best positioned to map where the frontier falls in their specific work. This is one reason shadow AI use persists even in organizations that try to control it.
Related
- Cognitive Surrender — The psychological mechanism behind falling asleep at the wheel
- Expertise Democratization — Why foundational expertise is needed to work safely near the frontier
- Centaur and Cyborg Work — The two models for staying effective across the frontier
- AI Agents and Job Redefinition — Job tasks constantly shift relative to the frontier
- Judgement vs Knowledge in the AI Era — Judgment is what catches failures outside the frontier