The Myth of Infinite Context
"1 Million Token Context Window."
"2 Million Tokens."
"Infinite Context."
The marketing war between AI model providers is currently focused on size. The implication is simple: if you can fit your entire codebase, your whole wiki, and every email you've ever sent into the prompt, the model will magically understand everything.
It is a seductive promise. It suggests that the solution to "AI that doesn't know enough" is simply "give it more data."
But anyone who has actually tried to build reliable agents with massive contexts, or has even just had a chat with ChatGPT or Claude go on way too long, knows the truth:
Too much information makes the AI forget things and perform worse.
Quick sidebar: What is a context window?
If you already know, jump to the next section.
The context window is the amount of text (measured in tokens) that a Language Model can "see" at once.
You can think of it as the AI's short-term memory. It can only remember and reason about what's in the context window. Every time you send a message in a chat, for example, that new text is appended to the entire chat history already sitting in the context window, and the whole thing is sent to the model again.
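To make that concrete, here is a minimal sketch of how a chat loop accumulates context. The ~4-characters-per-token estimate is a rough heuristic (real tokenizers vary by model), and the `send` helper is purely illustrative:

```python
# A minimal sketch of how chat history accumulates in the context window.
# `estimate_tokens` uses a rough ~4 characters-per-token heuristic; real
# tokenizers vary by model, and `send` is purely illustrative.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

history = []  # the conversation so far

def send(user_message: str) -> None:
    history.append({"role": "user", "content": user_message})
    # On every turn, the ENTIRE history is re-sent as the model's context.
    context = "\n".join(m["content"] for m in history)
    print(f"turn {len(history)}: ~{estimate_tokens(context)} tokens in context")
    history.append({"role": "assistant", "content": "(model reply here)"})

send("Summarize this document for me.")
send("Now translate the summary into French.")
```

Every turn pays for everything that came before it, which is exactly why long conversations start to drift.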
The "Lost in the Middle" Phenomenon
Research consistently shows that Language Models have a "U-shaped" attention curve.
- They are excellent at using information at the beginning of their context window (usually the system prompt).
- They are excellent at using information at the end of their context window (usually the most recent question from the user in a chat).
- They are surprisingly bad at retrieving details buried in the middle.
When you stuff 100 documents into the context window, you may actually be giving it too much information, diluting its ability to retrieve or focus on any one of them. Fundamentally, it's a signal-to-noise problem.
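You can probe this yourself with a simple needle-in-a-haystack experiment. The sketch below only builds the prompts; `ask_model` is a hypothetical stand-in for whatever LLM call you use, because the point is the shape of the test: same fact, same question, only the fact's position changes.

```python
# A sketch of a "lost in the middle" probe. Only the experiment's shape
# matters here: same fact, same question, only the fact's POSITION changes.
# `ask_model` is a hypothetical stand-in for your actual LLM call.

FILLER = "Lorem ipsum filler paragraph about nothing in particular. " * 40
FACT = "The deploy key is rotated every 30 days."
QUESTION = "How often is the deploy key rotated?"

def build_context(position: str, n_docs: int = 100) -> str:
    docs = [FILLER] * n_docs
    index = {"start": 0, "middle": n_docs // 2, "end": n_docs - 1}[position]
    docs[index] = FACT
    return "\n\n".join(docs)

for position in ("start", "middle", "end"):
    prompt = build_context(position) + "\n\n" + QUESTION
    print(f"{position}: prompt is {len(prompt):,} characters")
    # answer = ask_model(prompt)  # hypothetical model call
    # Typically "start" and "end" retrieve the fact reliably; "middle" degrades.
```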
Context Rot
Lately I've been hearing the term "Context Rot" more and more. I believe it was coined, or at least popularized, by the team at Chroma. Context Rot describes exactly this situation: model performance degrades as you add more information, especially irrelevant information.
It works like this:
- You give an agent 5 tools. It uses them perfectly.
- You give it 50 tools, thinking it will only bother with the tools that are relevant.
- Suddenly, it starts failing at tasks it used to handle flawlessly. It confuses tool definitions. It hallucinates parameters. Or its performance on non-tool-use tasks just gets slightly worse and you don't know why.
That's because every tool the agent has access to is described in its context window. Even when it doesn't need them, it is reminded that they are there on every single step it takes, as the sketch below illustrates.
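Here is a rough sketch of that overhead. The schema below is illustrative (real tool definitions are usually far longer), and the ~4-characters-per-token estimate is only a heuristic:

```python
# A rough sketch of tool-definition overhead. The schema is illustrative
# (real tool definitions are usually far longer), and the ~4 characters-
# per-token estimate is only a heuristic.

import json

def tool_schema(name: str) -> dict:
    return {
        "name": name,
        "description": f"Performs the {name} operation with several options.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "limit": {"type": "integer"},
            },
        },
    }

for n_tools in (5, 50):
    payload = json.dumps([tool_schema(f"tool_{i}") for i in range(n_tools)])
    # This payload rides along on EVERY agent step, used or not.
    print(f"{n_tools} tools -> ~{len(payload) // 4} tokens of overhead per step")
```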
Every token you add to the context window is a liability. It is a potential source of confusion, distraction, or conflict.
Curation > Accumulation
If we accept that dumping everything into the context window is a failed strategy, what is the alternative?
Active Curation.
High-performing AI systems don't rely on massive context windows. They rely on precise context windows. They treat the prompt like a surgical tray, not a junk drawer. They include exactly what is needed for the current task, and nothing else.
This requires a shift in mindset: we shouldn't attach dozens of MCP servers so that our AI Agent can do anything we ask of it. Rather, we should create small, curated versions of our AI that are specialized for the specific tasks we need them to perform.
Context Engineering in Practice
This is why we built RubixCube with a focus on toggling and modularity. RubixCube is, at its absolute core, a Context Engineering platform.
In a RubixCube notebook, you don't just dump text into a hidden buffer. You build discrete Context Blocks by attaching relevant documents. Each Context Block or document can be enabled or disabled at will.
It might look something like this:
- Block A: Database Schema
- Block B: API Documentation
- Block C: Stylistic Guidelines
You can run a cell with all three enabled. Then, if the model gets confused, you can disable Block C and re-run. You can test whether the model performs better with the Schema at the top or the bottom.
You stop treating context as a bucket to be filled and start treating it as a system to be tuned.
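To illustrate the pattern (this is the idea, not RubixCube's actual API), a Context Block is essentially a named, toggleable chunk of context that only gets assembled into the prompt when it's enabled:

```python
# An illustrative sketch of the Context Block pattern; NOT RubixCube's
# actual API. The point is the shape: discrete, named blocks you can
# toggle and reorder before the prompt is assembled.

from dataclasses import dataclass

@dataclass
class ContextBlock:
    name: str
    content: str
    enabled: bool = True

blocks = [
    ContextBlock("Database Schema", "CREATE TABLE users (...);"),
    ContextBlock("API Documentation", "GET /users returns a paginated list ..."),
    ContextBlock("Stylistic Guidelines", "Prefer short, direct answers ..."),
]

def assemble(blocks: list, task: str) -> str:
    # Only enabled blocks make it into the prompt, in their current order.
    active = "\n\n".join(f"## {b.name}\n{b.content}" for b in blocks if b.enabled)
    return f"{active}\n\n{task}"

print(assemble(blocks, "Write a query that lists active users."))
blocks[2].enabled = False  # disable Stylistic Guidelines and re-run
print(assemble(blocks, "Write a query that lists active users."))
```

Disabling a block and re-running is a one-line change, which is what makes this kind of A/B context experiment cheap.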
This is what originally inspired me to create RubixCube: I wanted an experimentation platform for my own context engineering work, a way to get the best performance out of AI on specific, narrow tasks.
The best prompt engineers are curators of context. They know that the most powerful move is often to remove what isn't necessary.
Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.
-- Antoine de Saint-Exupéry