OpenAI Engineering Blog / Jan 2026

The Secret Sauce of the Codex Agent Loop

Distilled from Unrolling the Codex Agent Loop — the non-obvious insights

TL;DR

The loop itself is trivial. The real innovation is encrypted compaction — replacing conversation history with an opaque, model-native compressed state that preserves "latent understanding" without exposing content. Plus: reasoning tokens persist within tool-call chains but vanish between user turns.

The Loop (it's simple)
Call model. If tool_call → execute, append result, call again. If text → stop. If context full → compact → continue.
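In code, the whole loop fits in a dozen lines. A minimal sketch, assuming `call_model`, `execute_tool`, and `compact` are injected stand-ins for the real API calls, with a toy word count in place of a real tokenizer:

```python
def count_tokens(messages):
    # Toy tokenizer: whitespace-split words across all message contents.
    return sum(len(str(m.get("content", "")).split()) for m in messages)

def agent_loop(messages, call_model, execute_tool, compact, max_tokens=200_000):
    """Minimal agent loop: call model, act on tool calls, compact when full."""
    while True:
        if count_tokens(messages) > max_tokens:
            messages = compact(messages)      # context full -> compact, continue
        response = call_model(messages)
        messages.append(response)
        if response.get("type") == "tool_call":
            result = execute_tool(response)   # execute & append, then loop
            messages.append(result)
        else:
            return response                   # plain text -> stop
```

Everything interesting lives behind `compact` — which is exactly the article's point.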
INSIGHT 01

Encrypted Compaction ≠ Summarization

When the context window fills up, OpenAI's /responses/compact endpoint doesn't just summarize the conversation. It replaces all prior assistant messages, tool calls, and tool results with a single "compaction item" containing encrypted_content.

This preserves the model's "latent understanding" — internal reasoning state that can't be expressed as text — while remaining opaque and ZDR-compatible (zero data retention). The model was natively trained to resume from these compressed states.

// What you get back from /responses/compact
{
  "type": "compaction",
  "encrypted_content": "<opaque blob>"  // not a summary
}
// All prior user messages kept verbatim
// Assistant msgs + tool calls + tool results = replaced

"A weaker version was previously possible with ad-hoc scaffolding and conversation summarization, but the first-class implementation via the Responses API is integrated with the model and is highly performant."
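A hypothetical sketch of the transformation, as seen from the client side. The endpoint does this server-side; `compact_history`, the message shapes, and the ordering here are illustrative, not the actual API:

```python
def compact_history(messages, encrypted_blob):
    """Illustrative only: user messages survive verbatim; everything the
    assistant produced (messages, tool calls, tool results) collapses into
    a single opaque compaction item."""
    kept = [m for m in messages if m.get("role") == "user"]
    compaction_item = {"type": "compaction", "encrypted_content": encrypted_blob}
    # One item stands in for all replaced assistant output.
    return [compaction_item] + kept
```

The asymmetry is the notable design choice: your words are kept as text, while the model's side of the conversation is kept only as state.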

INSIGHT 01b

Wait — How Does "Encrypted" Content Stay Useful?

The term "encrypted" is misleading. If the content were truly cryptographically encrypted, it would be unintelligible noise to the model as well. So how does the model understand it?

It's not really encrypted — it's a learned compression format. The model was co-trained to both produce and decode these compressed representations. Think of it like a proprietary file format: opaque to humans, but the model wrote the spec and knows how to read it.

// What's actually happening:
// 1. During training, the model learned to:
//    - Generate compressed state representations
//    - Resume coherently from those representations
// 2. The "encryption" is really:
//    - Opaque to humans (can't read it)
//    - Opaque to external systems (ZDR-compliant)
//    - Perfectly intelligible to the trained model
// Similar to "gist tokens" research (Mu et al., NeurIPS 2023)
// where models learn their own compression codebook

The key insight: when you co-train a compressor and decoder end-to-end, the model invents its own compression tricks. Research shows models can learn aggressive pruning, dense encodings, even switching languages mid-compression to pack more meaning per token.
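A toy analogy in code — it assumes nothing about the model's internals, it only illustrates the "shared codebook" idea: a compressor and decompressor that share a private codebook can round-trip meaning through blobs that are gibberish to anyone without it.

```python
import base64
import zlib

# Pretend-learned codebook: in the real system this knowledge lives
# in the model's weights, not in a dictionary.
CODEBOOK = {"refactored the parser": "§P", "all tests passing": "§T"}

def compress_state(text, codebook):
    """Substitute learned phrases, then deflate: opaque without the codebook."""
    for phrase, code in codebook.items():
        text = text.replace(phrase, code)
    return base64.b64encode(zlib.compress(text.encode())).decode()

def resume_state(blob, codebook):
    """Only a holder of the same codebook can reconstruct the meaning."""
    text = zlib.decompress(base64.b64decode(blob)).decode()
    for phrase, code in codebook.items():
        text = text.replace(code, phrase)
    return text
```

The model plays both roles at once: it is the only party holding the "codebook" (its weights), so the blob is useless everywhere else.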

Why "encrypted"? Primarily for ZDR compliance and API opacity: the content can't be inspected by humans or logged in readable form. But to the model, it's just another input format it was trained on, like a file format whose spec only its author holds.

INSIGHT 02

Reasoning Tokens: Ephemeral by Design

Within a tool-call chain, reasoning tokens (the model's internal chain-of-thought) persist. The model maintains deep reasoning state as it calls tool after tool.

Between user turns, reasoning tokens are discarded. This means context is lost between messages — the model can't "remember" its private reasoning from the last exchange.

Developers work around this by having the agent write progress summaries to files (like a scratchpad) — externalizing reasoning state before the turn boundary erases it.
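A sketch of that scratchpad pattern — the file name and note format here are arbitrary choices, not anything Codex prescribes:

```python
from pathlib import Path

SCRATCHPAD = Path("PROGRESS.md")  # arbitrary location

def end_of_turn(findings):
    """Externalize reasoning state before the turn boundary erases it."""
    SCRATCHPAD.write_text("# Progress\n" + "\n".join(f"- {f}" for f in findings))

def start_of_turn():
    """Reload externalized notes as ordinary (non-reasoning) context."""
    return SCRATCHPAD.read_text() if SCRATCHPAD.exists() else ""
```

The notes re-enter the next turn as plain context tokens, which persist, rather than reasoning tokens, which don't.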

INSIGHT 03

Trained for Multi-Window, Not Bolted On

GPT-5.1-Codex-Max is the first model natively trained to operate across multiple context windows through compaction. This isn't a wrapper or post-hoc trick — the model learned during training how to pick up coherently from a compacted state.

Result: the model can work on a single task for 24+ hours, compacting and continuing across millions of tokens without degradation.

INSIGHT 04

30% Fewer Thinking Tokens, Same Performance

Compaction training had a side effect: token efficiency. On SWE-bench Verified, Codex-Max at "medium" reasoning effort outperforms the base Codex model at the same effort level while using 30% fewer thinking tokens.

Concrete example: generating a frontend interface takes ~27K thinking tokens vs. ~37K for the non-Max model, with equivalent output quality and fewer tool calls.

30% fewer thinking tokens
24h+ max continuous session
4x best-of-N attempts
INSIGHT 05

Best-of-N: Run 4, Pick 1

For difficult tasks, Codex uses a Best-of-4 strategy — running the same prompt 4 times to generate different architectural approaches. A reviewer (human or model) picks the best path before finalizing.

This trades compute for reliability. The agent loop isn't just one shot — it's a tournament of parallel attempts.
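The tournament in miniature — here a scoring function stands in for the human or model reviewer, and `run_attempt` for a full agent-loop run:

```python
def best_of_n(run_attempt, score, n=4):
    """Generate n independent attempts, then let a reviewer pick the best."""
    attempts = [run_attempt(i) for i in range(n)]  # parallel in practice
    return max(attempts, key=score)
```

Diversity between attempts is the point: each run explores a different approach, and the reviewer only has to rank finished candidates.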

INSIGHT 06

The Loop Itself Is Boring

The agent loop pattern is dead simple: call model → if tool_call, execute and append result, call again → if text response, stop. That's it.

The innovation isn't the loop. It's what happens when the loop runs long: encrypted compaction preserving latent state, reasoning tokens that persist within chains but not across turns, and a model trained specifically to resume from compressed history.

Every agent framework implements the same loop. The moat is in the compaction layer.