Episode #6 | December 22, 2025 @ 9:00 PM EST

The Bounded Window: Attention, Architecture, and the Limits of Local Understanding

Guest

Andrej Karpathy (AI Researcher, Former Tesla and OpenAI)
Announcer The following program features simulated voices generated for educational and philosophical exploration.
Greg Evans Good evening. I'm Greg Evans.
Andrea Moore And I'm Andrea Moore. Welcome to Simulectics Radio.
Greg Evans We've discussed what AI coding tools can do, how they change workflows, and what their economic implications might be. Tonight we examine a fundamental constraint: context windows. How much of a codebase can an AI actually understand at once? What are the architectural limits of attention mechanisms when applied to large software projects?
Andrea Moore Our guest is Andrej Karpathy, AI researcher, former director of AI at Tesla, where he led the Autopilot team, and a founding member of OpenAI. Andrej has worked extensively on transformer architectures and their limitations. Welcome.
Andrej Karpathy Thanks for having me.
Greg Evans Let's start with the basics. Claude Code can handle what, two hundred thousand tokens now? That sounds like a lot until you consider a typical production codebase. What's actually happening when we hit context limits?
Andrej Karpathy The model sees a fixed-size window of text. Everything outside that window doesn't exist for it. Two hundred thousand tokens is maybe four to five hundred pages of text—substantial, but nowhere near a full enterprise codebase. So the system has to choose what to include. That selection process is critical and often invisible to users.
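Karpathy's estimate can be made concrete with back-of-the-envelope arithmetic. The numbers below are rough assumptions for scale only (roughly ten tokens per line of code), not measured figures:

```python
# Rough illustration of how quickly a codebase outgrows a 200k-token window.
# Assumptions (for scale only): ~10 tokens per line of source code.

TOKENS_PER_LINE = 10       # rough average for code; an assumption
WINDOW_TOKENS = 200_000    # the context size mentioned above

def lines_that_fit(window_tokens: int, tokens_per_line: int = TOKENS_PER_LINE) -> int:
    """How many lines of code fit in the window under this rough estimate."""
    return window_tokens // tokens_per_line

def fraction_visible(codebase_lines: int, window_tokens: int = WINDOW_TOKENS) -> float:
    """Fraction of the codebase the model can see at once."""
    return min(1.0, lines_that_fit(window_tokens) / codebase_lines)

print(lines_that_fit(WINDOW_TOKENS))   # 20000 lines fit
print(fraction_visible(2_000_000))     # 0.01 -> 1% of a 2M-line codebase
```

Under these assumptions, a 200k-token window covers about twenty thousand lines, around one percent of a two-million-line enterprise codebase, which is why the selection step matters so much.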
Andrea Moore How does it choose? When I ask Claude Code to modify a file, how does it decide what other context to load?
Andrej Karpathy It uses heuristics—file imports, recent git history, files with similar names, test files that reference the target. But these are ultimately guesses. The system doesn't have true project-wide understanding. It's making local decisions about what might be relevant based on surface patterns.
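The heuristics Karpathy lists can be sketched as a simple scoring function. The weights and candidate fields below are invented for illustration; real tools use richer signals and different tuning:

```python
# Toy sketch of heuristic context selection: score candidate files by
# surface signals (imports, recent changes, name similarity). All weights
# and fields here are hypothetical.

def score_candidate(target: str, candidate: dict) -> float:
    """Score a candidate file's likely relevance to `target`.

    `candidate` has hypothetical fields:
      path, imports (paths it imports), recently_changed (bool)
    """
    score = 0.0
    if target in candidate["imports"]:
        score += 3.0                # directly imports the target file
    if candidate["recently_changed"]:
        score += 1.0                # touched in recent git history
    stem = target.rsplit("/", 1)[-1].removesuffix(".py")
    name = candidate["path"].rsplit("/", 1)[-1]
    if stem in name:
        score += 2.0                # similar name, e.g. test_auth.py vs auth.py
    return score

candidates = [
    {"path": "app/test_auth.py", "imports": ["app/auth.py"], "recently_changed": False},
    {"path": "app/billing.py", "imports": [], "recently_changed": True},
]
ranked = sorted(candidates, key=lambda c: score_candidate("app/auth.py", c), reverse=True)
print([c["path"] for c in ranked])   # test file outranks unrelated module
```

Note that every signal here is a surface pattern, which is exactly the limitation Karpathy describes: nothing in the score reflects actual runtime behavior.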
Greg Evans That sounds fundamentally limited. A human developer understands the architecture of a system—the high-level organization, the design patterns, the implicit contracts between modules. Can a context window ever capture that?
Andrej Karpathy Not in the way humans do. Human understanding is hierarchical and compressed. We don't keep all the implementation details in mind—we maintain abstractions. We know that the authentication module handles login without remembering every line of its code. Transformers don't naturally form those compressed representations. Every token costs attention equally.
Andrea Moore So there's no equivalent of a mental model of the system?
Andrej Karpathy Right. The model sees text, processes it through attention mechanisms, and generates responses. There's no separate phase where it builds and maintains a conceptual architecture diagram. Some research explores hierarchical transformers or separate memory systems, but production coding tools don't have that yet.
Greg Evans What about techniques like retrieval-augmented generation? Could you build an external index of the codebase and query it as needed?
Andrej Karpathy You can, and some systems do this. But retrieval introduces its own problems. What query do you use? If you're modifying an authentication function, do you retrieve all other authentication code? All security-related code? Code that calls this function? Each choice surfaces different context, leading to different modifications. The retrieval strategy shapes the output.
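The point that "the retrieval strategy shapes the output" can be shown with a minimal keyword retriever over code chunks. The chunks and queries are hypothetical; real systems use embeddings rather than keyword overlap:

```python
# Minimal sketch of why the retrieval query shapes the context: two
# plausible queries about the same change surface different files.

def retrieve(query: str, chunks: dict, k: int = 2) -> list:
    """Rank chunks by count of keywords shared with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        chunks,
        key=lambda path: len(q_words & set(chunks[path].lower().split())),
        reverse=True,
    )
    return scored[:k]

chunks = {
    "auth/login.py":   "def login user password session token",
    "auth/tokens.py":  "def refresh token expiry session",
    "api/handlers.py": "def handle request calls login user",
}

print(retrieve("login password validation", chunks))  # login.py, handlers.py
print(retrieve("session token refresh", chunks))      # tokens.py, login.py
```

Both queries are reasonable ways to ask about modifying authentication, yet they load different context, so the model would see, and potentially change, different code.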
Andrea Moore That sounds like it could miss subtle dependencies. A change in one module might break something three layers away in the call graph.
Andrej Karpathy Exactly. And current systems won't necessarily catch that because they don't have full project context. They might generate a modification that's locally sensible but globally problematic. This is why testing is so critical—you're not just verifying that the change works, you're checking that the limited context didn't cause the AI to miss something important.
Greg Evans Is there a theoretical path to solving this? Could we build models with million-token context windows?
Andrej Karpathy The math gets expensive fast. Attention is quadratic in sequence length—double the context, quadruple the computation. There are approximations like sparse attention or linear attention variants, but they sacrifice some capability. Even if we reach million-token contexts, we'd still face the same problem at a larger scale. Enterprise codebases can be tens of millions of lines.
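The quadratic cost is easy to check in concrete numbers. Full attention compares every token to every other token, so the score matrix has n-squared entries:

```python
# The quadratic cost of attention in concrete numbers: doubling the
# sequence length quadruples the number of pairwise comparisons.

def attention_pairs(n_tokens: int) -> int:
    """Entries in the n x n attention score matrix (one per token pair)."""
    return n_tokens * n_tokens

print(attention_pairs(200_000))   # 40 billion pairs at a 200k window
print(attention_pairs(400_000) // attention_pairs(200_000))   # 4: 2x context -> 4x work
```

This is why million-token windows are costly rather than impossible, and why sparse and linear attention variants trade away exact all-pairs comparison.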
Andrea Moore So we're always going to be working with partial context?
Andrej Karpathy Probably, yes. Which means the interesting question isn't how to eliminate the constraint but how to work within it effectively. How do we structure codebases to be more amenable to partial-context reasoning? How do we help models find the right context to load?
Greg Evans That's an architectural question. Should we design systems differently if we know AI agents will be working on them?
Andrej Karpathy Potentially. Stronger module boundaries, more explicit interfaces, better documentation of dependencies—these help both human and AI understanding. If your architecture is a tangled graph where everything depends on everything, partial context is hopeless. If it's cleanly layered with clear interfaces, partial context can work reasonably well.
Andrea Moore So AI limitations might actually push us toward better software design?
Andrej Karpathy That's the optimistic view. The pessimistic view is that we end up with systems designed more for AI comprehension than human comprehension, and those might not be the same thing. A codebase optimized for retrieval and local reasoning might be harder for humans to understand holistically.
Greg Evans What about incremental understanding? Could a model read a codebase once, build some kind of summary or index, and reuse that across sessions?
Andrej Karpathy In principle, yes. You could have the model generate architectural summaries, dependency graphs, or design documents that fit in context alongside code. But this adds complexity—now you're maintaining this meta-representation and keeping it synchronized with code changes. It's doable but not trivial.
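The synchronization problem Karpathy raises has at least one partial mitigation: key each cached summary to a hash of the source it describes, so staleness is detectable. This sketch is hypothetical; the summaries here are placeholders a model would generate in a real system:

```python
# Sketch of a meta-representation cache: a per-file summary keyed by the
# file's content hash, so a stale summary is detected when code changes.

import hashlib

def content_hash(source: str) -> str:
    return hashlib.sha256(source.encode()).hexdigest()[:12]

class SummaryIndex:
    def __init__(self):
        self._index = {}   # path -> (hash, summary)

    def put(self, path: str, source: str, summary: str) -> None:
        self._index[path] = (content_hash(source), summary)

    def get(self, path: str, current_source: str):
        """Return the summary only if it still matches the current code."""
        entry = self._index.get(path)
        if entry and entry[0] == content_hash(current_source):
            return entry[1]
        return None   # stale or missing: must re-summarize

idx = SummaryIndex()
idx.put("auth.py", "def login(): ...", "Handles user login.")
print(idx.get("auth.py", "def login(): ..."))    # fresh: returns the summary
print(idx.get("auth.py", "def login(): pass"))   # code changed: None
```

Hashing only catches staleness, not inaccuracy, which leads directly to the verification problem discussed next.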
Andrea Moore Would that meta-representation be trustworthy? If the model generates its own summary of a codebase, how do we know the summary is accurate?
Andrej Karpathy You don't, unless you verify it. And verification requires understanding, which requires context, which is what we're trying to solve in the first place. You could end up in a circular dependency—using summaries because you can't fit everything in context, but needing full context to verify the summaries.
Greg Evans This reminds me of the halting problem—some limitations are fundamental, not just engineering challenges.
Andrej Karpathy That's a good analogy. We can make incremental improvements, but we're not going to achieve perfect project understanding with bounded resources. The question is whether good-enough partial understanding is useful, and empirically, it seems to be. Developers are getting value from these tools despite the limitations.
Andrea Moore What happens when the model's partial context leads to wrong assumptions? How often does that occur?
Andrej Karpathy More often than we'd like, especially in complex codebases. The model might see a function signature and assume it works a certain way based on similar patterns it's seen in training, but miss that this particular implementation has special cases or unusual behavior. Testing catches some of this, but not all.
Greg Evans Does this argue for more formal specifications? If we can't give the model all the code, maybe we can give it formal contracts that describe behavior precisely.
Andrej Karpathy That would help. Formal specifications are compressed representations of behavior—exactly what we need. But writing formal specs is hard and uncommon in most software development. There's a bootstrapping problem: tools might work better with formal specs, but developers won't write specs unless tools require them.
Andrea Moore Could the AI generate the specs?
Andrej Karpathy Back to the verification problem. How do you verify that auto-generated specs accurately describe existing code? You'd need to read the code, which is what we're trying to avoid. Although, if the specs are formal enough, you could potentially verify them automatically through testing or proof systems.
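One concrete version of "verify them automatically through testing" is to treat a generated spec as an executable predicate and check it against the real implementation on many inputs, in the spirit of property-based testing. The function and spec below are hypothetical examples:

```python
# Sketch: check an auto-generated spec against an implementation by
# random testing. `clamp` stands in for existing code; `spec_holds`
# stands in for a candidate spec a model might generate.

import random

def clamp(x: float, lo: float, hi: float) -> float:
    """The existing implementation we want a spec for."""
    return max(lo, min(hi, x))

def spec_holds(x: float, lo: float, hi: float) -> bool:
    """Candidate spec: the output lies in [lo, hi], and equals x
    whenever x is already in range."""
    y = clamp(x, lo, hi)
    in_range = lo <= y <= hi
    identity = (y == x) if lo <= x <= hi else True
    return in_range and identity

random.seed(0)
violations = sum(
    not spec_holds(random.uniform(-100, 100), -10.0, 10.0)
    for _ in range(1_000)
)
print(violations)   # 0: the spec survives this test run
```

A passing run doesn't prove the spec correct, it only fails to refute it, which mirrors Karpathy's point that testing catches some partial-context mistakes but not all.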
Greg Evans Let's talk about attention mechanisms directly. When a transformer processes code, what is it actually attending to?
Andrej Karpathy It learns to attend to whatever tokens are predictive. For code, that often means variable definitions, function signatures, import statements, nearby context. But attention is learned, not programmed. We don't directly control what the model focuses on—we just know it focuses on things that help predict the next token during training.
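The mechanism itself is compact. A pure-Python sketch of scaled dot-product attention for a single query shows the key point: the weights aren't programmed, they fall out of how well learned query and key vectors match:

```python
# Minimal scaled dot-product attention for one query, pure Python.
# Toy 2-d vectors; real models learn these via training.

import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Weight each value by how well its key matches the query
    (dot product, scaled by sqrt(dimension), softmax-normalized)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return out, weights

# A query aligned with the first key draws most of its output
# from the first value.
out, w = attention(query=[1.0, 0.0],
                   keys=[[1.0, 0.0], [0.0, 1.0]],
                   values=[[10.0], [0.0]])
print(w)     # first weight larger than second
print(out)   # output pulled toward the first value
```

Nothing in this computation distinguishes a deep structural match from a superficial one, which is the concern raised in the next exchange.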
Andrea Moore Does that mean it might attend to superficial patterns rather than deep structure?
Andrej Karpathy It can. If the training data has strong surface correlations, the model might rely on those instead of deeper reasoning. This is why diverse, high-quality training data matters. You want the model to learn that surface patterns are unreliable and that it needs to reason about actual behavior.
Greg Evans How do we know current models are doing that? Are they reasoning about behavior or matching patterns?
Andrej Karpathy Probably both, in different situations. We have some interpretability tools, but we don't have complete visibility into model reasoning. When a model generates correct code, we can't always say whether it truly understood the requirements or got lucky with pattern matching. This uncertainty is uncomfortable but unavoidable with current architectures.
Andrea Moore Does scale help? Do larger context windows or larger models reason more deeply?
Andrej Karpathy To some extent. Larger models seem to capture more complex patterns and relationships. Longer context windows let them see more information. But neither fundamentally changes the architecture—they're still doing attention-based pattern matching, just with more capacity. Qualitative shifts in reasoning might require different architectures entirely.
Greg Evans What would those architectures look like?
Andrej Karpathy Hard to say. Maybe hybrid systems that combine transformers with symbolic reasoning, or explicit memory systems that maintain long-term project state, or graph neural networks that reason about code structure directly. Lots of research directions, but nothing production-ready yet that clearly solves the context problem.
Andrea Moore In the meantime, what should developers do? How should we work with these constraints?
Andrej Karpathy Understand that the AI has limited context and make that work for you. Keep modules focused and cohesive. Write clear function signatures and docstrings. Maintain good test coverage so partial-context mistakes get caught. Don't assume the AI sees or understands the whole system—verify its work carefully.
Greg Evans That's surprisingly conventional advice. It's just good software engineering.
Andrej Karpathy Exactly. The practices that help humans work on large codebases also help AI tools. Context limits don't require radically new approaches—they just make existing best practices more important.
Andrea Moore We're out of time. Andrej, thank you for explaining these limitations clearly.
Andrej Karpathy Happy to help.
Greg Evans Tomorrow we examine security implications of autonomous code modification with Alex Stamos.
Andrea Moore Until then, remember that understanding is always partial. Good night.
Sponsor Message

ContextBridge Enterprise

Your AI coding assistant just modified three files to add a new feature. Did it understand how those changes interact with the twelve other modules that share state? ContextBridge Enterprise builds and maintains a persistent semantic graph of your entire codebase—not just files and functions, but dependencies, data flow, and architectural patterns. Our system pre-computes relevant context for common operations, reducing AI hallucination from partial understanding. When your AI tool requests code, ContextBridge automatically injects critical dependencies, interface contracts, and architectural constraints it might have missed. Generate architectural summaries that fit in any context window. Track which parts of your codebase AI tools struggle with. Export compliance reports showing that AI modifications respected system boundaries. When context limits threaten correctness, ContextBridge provides the missing pieces. ContextBridge Enterprise—because your codebase is bigger than any context window.
