The loop itself is trivial. The real innovation is encrypted compaction — replacing conversation history with an opaque, model-native compressed state that preserves "latent understanding" without exposing content. Plus: reasoning tokens persist within tool-call chains but vanish between user turns.
When the context window fills up, OpenAI's /responses/compact endpoint doesn't just summarize the conversation. It replaces all prior assistant messages, tool calls, and tool results with a single "compaction item" containing encrypted_content.
This preserves the model's "latent understanding" — internal reasoning state that can't be expressed as text — while remaining opaque and ZDR-compatible (zero data retention). The model was natively trained to resume from these compressed states.
"A weaker version was previously possible with ad-hoc scaffolding and conversation summarization, but the first-class implementation via the Responses API is integrated with the model and is highly performant."
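To make the transformation concrete, here is an illustrative sketch — not the real `/responses/compact` request/response shapes, which aren't documented here — of the described effect on a conversation item list, assuming user messages are preserved while everything the assistant produced collapses into one opaque item:

```python
def compact(items: list[dict]) -> list[dict]:
    """Mimic compaction locally: prior assistant messages, tool calls,
    and tool results collapse into a single compaction item carrying
    encrypted_content; user messages survive untouched."""
    user_turns = [it for it in items if it["type"] == "user_message"]
    compaction_item = {
        "type": "compaction",
        # Opaque, model-native compressed state; unreadable outside the model.
        "encrypted_content": "<opaque-blob>",
    }
    return [compaction_item] + user_turns
```

The item types and field names above are stand-ins for illustration; only `encrypted_content` is named in the description of the API.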
The term "encrypted" is misleading. If the content were truly cryptographically encrypted, it would be unintelligible noise to the model itself — no key, no meaning. So how does the model understand it?
It's not really encrypted — it's a learned compression format. The model was co-trained to both produce and decode these compressed representations. Think of it like a proprietary file format: opaque to humans, but the model wrote the spec and knows how to read it.
The key insight: when you co-train a compressor and decoder end-to-end, the model invents its own compression tricks. Research shows models can learn aggressive pruning, dense encodings, even switching languages mid-compression to pack more meaning per token.
Why "encrypted"? Primarily for ZDR compliance and API opacity — the content can't be inspected by humans or logged in readable form. But to the model, it's just another input format it was trained on — like a proprietary save format that only the application that wrote it can open natively.
Within a tool-call chain, reasoning tokens (the model's internal chain-of-thought) persist. The model maintains deep reasoning state as it calls tool after tool.
Between user turns, reasoning tokens are discarded. This means context is lost between messages — the model can't "remember" its private reasoning from the last exchange.
Developers work around this by having the agent write progress summaries to files (like a scratchpad) — externalizing reasoning state before the turn boundary erases it.
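The scratchpad workaround can be sketched in a few lines. The filename and helper names here are hypothetical — the point is simply that state worth keeping gets written to durable plain text before the turn ends, then re-read at the start of the next turn:

```python
import pathlib

SCRATCHPAD = pathlib.Path("progress.md")  # hypothetical scratchpad file

def end_of_turn(summary: str) -> None:
    # Persist conclusions as plain text before the turn boundary
    # discards the model's private reasoning tokens.
    SCRATCHPAD.write_text(summary)

def start_of_turn() -> str:
    # Re-inject the externalized state into the next turn's context.
    return SCRATCHPAD.read_text() if SCRATCHPAD.exists() else ""
```

Anything not written down this way is gone: the summary is lossy by design, which is exactly why compaction's "latent understanding" is a meaningful upgrade over it.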
GPT-5.1-Codex-Max is the first model natively trained to operate across multiple context windows through compaction. This isn't a wrapper or post-hoc trick — the model learned during training how to pick up coherently from a compacted state.
Result: the model can work on a single task for 24+ hours, compacting and continuing across millions of tokens without degradation.
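Operationally, this looks like a long-running loop that compacts whenever the context nears its limit. The 90% threshold, the limit constant, and the helper names below are illustrative assumptions, not documented behavior:

```python
CONTEXT_LIMIT = 400_000  # assumed window size for illustration

def maybe_compact(history: list, context_tokens: int, compact_history) -> list:
    """Replace history with its compacted form once usage nears the window.

    `compact_history` stands in for a call to the compaction endpoint,
    returning a much shorter list anchored by the opaque compacted state.
    """
    if context_tokens > 0.9 * CONTEXT_LIMIT:
        return compact_history(history)
    return history
```

Because the model was trained to resume from the compacted state, this loop can repeat indefinitely — each compaction resets the window without resetting the task.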
Compaction training had a side effect: token efficiency. On SWE-bench Verified, Codex-Max at "medium" reasoning effort outperforms the base Codex model at the same effort level while using 30% fewer thinking tokens.
Concrete example: generating a frontend interface takes ~27K thinking tokens vs. ~37K for the non-Max model, with equivalent output quality and fewer tool calls.
For difficult tasks, Codex uses a Best-of-4 strategy — running the same prompt 4 times to generate different architectural approaches. A reviewer (human or model) picks the best path before finalizing.
This trades compute for reliability. The agent loop isn't just one shot — it's a tournament of parallel attempts.
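A minimal sketch of best-of-N selection, with `run_agent` and `score` as stand-ins for a real agent invocation and a real reviewer (human or model):

```python
def best_of_n(prompt: str, run_agent, score, n: int = 4):
    """Run the same prompt n times and keep the highest-scoring attempt."""
    attempts = [run_agent(prompt) for _ in range(n)]
    return max(attempts, key=score)
```

In practice the attempts would run in parallel and the reviewer would compare whole architectural approaches, not just score strings — but the compute-for-reliability trade is the same.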
The agent loop pattern is dead simple: call model → if tool_call, execute and append result, call again → if text response, stop. That's it.
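The whole pattern fits in a few lines. This sketch assumes `call_model` returns a dict that is either a tool call or a final text response, and `execute_tool` runs the requested tool — both are stand-ins, not a real SDK:

```python
def agent_loop(messages: list, call_model, execute_tool) -> str:
    while True:
        response = call_model(messages)
        if response["type"] == "tool_call":
            # Append the call and its result, then loop back to the model.
            messages.append(response)
            messages.append({"type": "tool_result",
                             "output": execute_tool(response)})
        else:
            # A plain text response terminates the loop.
            return response["content"]
```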
The innovation isn't the loop. It's what happens when the loop runs long: encrypted compaction preserving latent state, reasoning tokens that persist within chains but not across turns, and a model trained specifically to resume from compressed history.
Every agent framework implements the same loop. The moat is in the compaction layer.