Context Engineering

Definition

Context engineering is the emerging discipline of structuring information, instructions, and tool access to maximize AI agent performance on domain-specific tasks. It encompasses what to include in prompts, how to decompose tasks across agents and subagents, what data to retrieve, how much compute to allocate, and how to evaluate results.

Levie frames the challenge: “To build AI agents, in theory, it should be as simple as having a super powerful model, giving it a set of tools, having a really good system prompt, and giving it access to data. But in practice, to make agents that work today, you’re dealing with a delicate balance.”

Why context engineering is hard

AI agents have a fundamental constraint that human workers do not: the context window. They can only process a fixed amount of information at once, and performance degrades as that window fills (“context rot”). This creates a series of non-obvious engineering tradeoffs:

| Dimension | Tradeoff |
| --- | --- |
| Global vs. subagent scope | What belongs in the master agent’s context vs. delegated to specialized subagents? |
| Agentic vs. deterministic | Which steps need AI reasoning vs. a hard-coded tool call? |
| Context window budget | How much of the limited window goes to instructions, data, history, and tools? |
| Speed vs. quality | Fast inference may sacrifice quality; thorough reasoning may be too slow |
| Breadth vs. depth of retrieval | More retrieved context helps coverage but risks overwhelming the model |

Levie observes: “So far there’s no one right answer for any of this, and there are meaningful tradeoffs for any given approach you take.”
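
The context window budget tradeoff can be made concrete with a small sketch. It is illustrative only, not a method described by Levie: the `Section` type, the `fit_to_budget` function, the priority scheme, and the four-characters-per-token estimate are all assumptions introduced here. It keeps the most important sections first and skips any section that no longer fits.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Section:
    """One candidate block of context (hypothetical type for this sketch)."""
    name: str
    text: str
    priority: int  # lower number = more important, considered first


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    # A real system would use the model's own tokenizer.
    return max(1, len(text) // 4)


def fit_to_budget(sections: list[Section], budget: int) -> list[Section]:
    """Greedily keep sections in priority order while they fit the budget."""
    kept: list[Section] = []
    used = 0
    for section in sorted(sections, key=lambda s: s.priority):
        cost = estimate_tokens(section.text)
        if used + cost <= budget:
            kept.append(section)
            used += cost
    return kept
```

With a budget of 400 estimated tokens, a 100-token instruction block, a 50-token tool list, and 200 tokens of history would fit, while a 500-token data dump would be skipped. A greedy policy like this is only one option; summarizing or truncating the oversized section is an equally plausible choice with its own tradeoffs.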

Domain expertise as the bottleneck

Getting context engineering right “requires a deep understanding of the domain you’re solving the problem for. Handling this problem in AI coding is different from law, which is different from healthcare.”

Context engineering is not purely a technical discipline. It requires someone who understands the domain workflow (every step a human expert would take), the information architecture (what data exists, where, and how to retrieve the right subset), and the failure modes (what goes wrong when the agent has too little context, too much, or the wrong kind).

Levie argues that AI agent product managers need a different skill set than traditional PMs: “Deeply understand the domain that you’re building agents for. In an ideal world you actually have worked in that space.”

The subagent paradigm

A central architectural pattern in context engineering is decomposition into subagents. Levie calls this pattern “super interesting”: rather than one monolithic agent with a bloated context window, work is distributed across specialized subagents, each with a focused context.

The analogy: “There was probably a time in the past where nearly all software tasks could be handled by a single large application. But as the complexity and breadth of tasks grew, there became need for applications to then be broken up across specialized functions.”

This creates a new coordination problem: if you have “100X more AI agents than people doing work in a company, it’s actually more critical than ever that the work is well orchestrated.” The management challenge shifts from managing humans to orchestrating agent workflows.
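
A minimal sketch of the subagent pattern, under stated assumptions, might look like the following. The `Subagent` and `Orchestrator` classes, the `ModelFn` signature, and `stub_model` are hypothetical names invented for illustration. The stub stands in for a real model call so the example runs offline. The point it shows is that each subagent receives only its own prompt and its own slice of context, and the orchestrator retains only the results.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Hypothetical model interface: (system_prompt, task_context) -> text result.
ModelFn = Callable[[str, str], str]


@dataclass
class Subagent:
    name: str
    system_prompt: str
    model: ModelFn

    def run(self, task_context: str) -> str:
        # The subagent sees only its own prompt and its slice of context.
        return self.model(self.system_prompt, task_context)


@dataclass
class Orchestrator:
    subagents: dict[str, Subagent] = field(default_factory=dict)

    def register(self, agent: Subagent) -> None:
        self.subagents[agent.name] = agent

    def run_plan(self, plan: list[tuple[str, str]]) -> dict[str, str]:
        """Run (subagent_name, task_context) steps and keep only the results."""
        results: dict[str, str] = {}
        for name, task_context in plan:
            results[name] = self.subagents[name].run(task_context)
        return results


def stub_model(system_prompt: str, task_context: str) -> str:
    # Stand-in for a real model call so the sketch runs without a network.
    return f"{system_prompt}: processed {len(task_context)} chars"
```

In this arrangement the orchestrator's own context grows with the size of the results, not with the size of the raw material each subagent read, which is the benefit the decomposition is meant to deliver.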

Information retrieval as the foundation

Levie makes a reductive but clarifying claim: “Many AI agent problems are really just information retrieval problems. If the agent can find the right information for a given task, they’d nail the task.”

This frames context engineering as a variant of the classic knowledge management problem — getting the right information to the right agent at the right time — now operating at machine speed and machine scale.
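
To illustrate the retrieval framing, here is a deliberately simple sketch. It ranks documents by keyword overlap with the query and then packs the best matches into a character limit. The `retrieve` function, the overlap score, and the `max_chars` limit are assumptions made for this example; production systems typically use embedding search or a search index instead.

```python
from __future__ import annotations

import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: dict[str, str], max_chars: int) -> list[str]:
    """Return ids of the best-matching documents that fit within max_chars."""
    query_terms = tokenize(query)
    scored: list[tuple[int, str]] = []
    for doc_id, text in documents.items():
        overlap = len(query_terms & tokenize(text))
        if overlap:  # ignore documents that share no terms with the query
            scored.append((overlap, doc_id))
    # Highest overlap first; ties are broken by id so output is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))

    chosen: list[str] = []
    used = 0
    for _, doc_id in scored:
        size = len(documents[doc_id])
        if used + size <= max_chars:
            chosen.append(doc_id)
            used += size
    return chosen
```

The size limit is where retrieval meets the context window: a larger `max_chars` improves coverage, while a smaller one keeps the agent's context focused.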

The eval problem

Context engineering produces non-deterministic systems. Shreya Shekhar identifies the core challenge: today’s agentic systems introduce “stochasticity, the action-intent gap, and subjectivity of results.” Traditional software can be engineered to give strict guarantees around consistency and fault tolerance. AI agents offer none of these by default.

Levie confirms this from practice: “We are barely scratching the surface on evals. A significant portion of knowledge work is going to move to AI agents, and we have essentially no infrastructure for understanding how they’re performing at a fine-grained level.”

The eval gap means organizations cannot yet reliably measure whether changes to an agent’s context actually improve outcomes, which leaves the build-measure-learn cycle with a broken “measure” step.
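
One common way to handle stochastic output is to run the same task many times and compare pass rates rather than single results. The sketch below is an assumption-laden illustration, not an established eval framework: `pass_rate`, `compare`, and `make_flaky_agent` are hypothetical names, and the flaky agent is a seeded random stand-in for a real model.

```python
import random
from typing import Callable

Agent = Callable[[str], str]
Check = Callable[[str], bool]


def pass_rate(agent: Agent, task: str, check: Check, trials: int = 20) -> float:
    """Fraction of repeated runs whose output passes the check."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    passes = sum(1 for _ in range(trials) if check(agent(task)))
    return passes / trials


def compare(baseline: Agent, candidate: Agent, task: str, check: Check,
            trials: int = 200) -> float:
    """Pass-rate difference; positive means the candidate passed more often."""
    return (pass_rate(candidate, task, check, trials)
            - pass_rate(baseline, task, check, trials))


def make_flaky_agent(success_probability: float, seed: int) -> Agent:
    # Hypothetical stand-in for a stochastic agent: it gives the expected
    # answer only some of the time, controlled by a seeded random generator.
    rng = random.Random(seed)

    def agent(task: str) -> str:
        return "approved" if rng.random() < success_probability else "unsure"

    return agent
```

A check function like `lambda output: output == "approved"` only works when correctness is objective. The subjectivity of results that Shekhar describes is exactly the part this sketch does not address.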

The agent operator as context engineer

Context engineering is not only a design discipline; it is becoming a job function. Levie projects 500,000 to 1 million new “agent operator” roles in Fortune 1000 companies (see AI Agents and Job Redefinition). These operators must understand MCPs, CLIs, agents.md files, and how to write skills, but the core of their work is context engineering applied to specific business domains.

The challenge in enterprise settings goes beyond prompt design. Agent operators must navigate fragmented data estates where “half your data estate is not even ready to work with the agent” because it sits on network file shares or legacy document management systems. They must restructure workflows so agents receive properly curated context, understand process boundaries, and produce outputs that fit organizational compliance requirements. And when a new model drops, “your workflow probably breaks because the way you prompt that agent now is different” — requiring continuous context re-engineering.

This makes context engineering a perpetual discipline rather than a one-time design task. The feedback loop — build workflow, deploy agent, new model ships, workflow breaks, re-engineer context — demands ongoing technical and domain expertise that justifies a dedicated role.

Connection to the knowledge-judgement shift

Context engineering operationalizes the shift described in Judgement vs Knowledge in the AI Era. The agent supplies knowledge (retrievable, encodable); the human supplies judgement about which knowledge matters and how to structure the agent’s context. The human’s value shifts from knowing things to knowing how to frame problems for AI consumption.