Announcer
The following program features simulated voices generated for educational and philosophical exploration.
Greg Evans
Good evening. I'm Greg Evans.
Andrea Moore
And I'm Andrea Moore. Welcome to Simulectics Radio.
Andrea Moore
Over the past two nights we've examined Claude Code's capabilities and architecture. Tonight we're asking a methodological question that cuts to the heart of software discipline: can an AI agent practice test-driven development? And if so, what does TDD look like when your pair programmer is a language model?
Greg Evans
The test-first approach has always been about discipline and design. You write the test to clarify what you want, then write the minimal code to make it pass. The test acts as a specification and forces you to think about interfaces before implementations. But an AI doesn't have the same psychological relationship to discipline. It doesn't get impatient or want to skip ahead to the interesting part.
Andrea Moore
Joining us tonight is Kent Beck, who created Extreme Programming and popularized test-driven development. Kent has been thinking about software methodology for decades, and I'm curious how he sees AI fitting into these practices. Kent, welcome.
Kent Beck
Thank you. This is a conversation I've been wanting to have.
Greg Evans
Let's start with the fundamental question. Can Claude Code do TDD? Not just write tests, but actually practice the discipline—test first, then implementation, then refactor?
Kent Beck
It can follow the mechanical steps. You can instruct it to write a failing test, then write code to make it pass, then refactor. It will execute that sequence. But whether it's practicing TDD in the deeper sense is questionable. TDD isn't just a sequence of steps; it's a feedback loop that shapes design thinking. You write a test, it forces you to confront design decisions you hadn't considered, those decisions inform the implementation, the implementation reveals new design opportunities. That iterative design discovery is what TDD is really about.
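The cycle Kent describes can be sketched concretely. This is a hypothetical illustration, not code from the broadcast; the `Stack` class and test names are invented for the example:

```python
# Red: write the test first. Before Stack exists, this test cannot even run.
# That failure is the point: it forces us to decide the interface we want.
def test_stack():
    s = Stack()
    assert s.is_empty()
    s.push(1)
    s.push(2)
    assert s.pop() == 2        # last in, first out
    assert s.pop() == 1
    assert s.is_empty()

# Green: the minimal implementation that makes the test pass.
class Stack:
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def is_empty(self):
        return not self._items

# Refactor: with the test green, the internals can be reshaped freely;
# the test guards behavior while the design improves.
test_stack()
```

The design feedback Kent mentions happens in the red step: writing `test_stack` first forced decisions about method names and the empty-stack behavior before any implementation existed.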
Andrea Moore
So the mechanics without the mindset.
Kent Beck
Exactly. An AI can execute the ritual, but the ritual's purpose is to guide human thinking. The question becomes: does having an AI execute the ritual provide any benefit if the human isn't doing the thinking the ritual was designed to provoke?
Greg Evans
That's a profound distinction. But might there still be value in having tests written first even if the AI doesn't experience design discovery? The tests still serve as executable specifications.
Kent Beck
There's definitely value in having tests, period. Whether they're written first or after doesn't matter as much as whether they're good tests. And here's where it gets interesting: an AI writing tests first might produce different tests than an AI writing tests after. When you write tests first, you're forced to think about the interface you want to exist. When you write tests after, you're documenting the interface that already exists. The former can lead to cleaner APIs.
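One way to see the difference Kent describes: a test written first states the interface you wish existed, and the implementation then has to earn it. A small hypothetical sketch (the `parse_duration` function is invented for illustration):

```python
import re

# Written first: the test demands the API we want — one call,
# a human-readable string in, plain seconds out.
def test_parse_duration():
    assert parse_duration("1h30m") == 5400
    assert parse_duration("45s") == 45

# Written second, to satisfy the interface the test demanded.
def parse_duration(text):
    units = {"h": 3600, "m": 60, "s": 1}
    return sum(int(n) * units[u] for n, u in re.findall(r"(\d+)([hms])", text))

test_parse_duration()
```

A test written after the fact would more likely mirror whatever signature the implementation happened to grow, rather than push toward the cleaner one.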
Andrea Moore
Can Claude Code actually design good APIs through test-first thinking? That seems like it requires aesthetic judgment about what makes an interface pleasant to use.
Kent Beck
It depends on what you mean by good. It can produce APIs that follow common patterns—methods are reasonably named, parameters are in logical order, the interface isn't unnecessarily complex. Those are patterns learned from training data. But whether the API is good for your specific use case, whether it captures the right abstractions for your domain, that requires context the AI might not have. You might need to guide it: 'I want this API to feel like working with X' or 'optimize this interface for Y use case.'
Greg Evans
Let's talk about the refactor step. In TDD, once tests pass, you refactor to improve code quality without changing behavior. Can an AI identify refactoring opportunities the way an experienced developer would?
Kent Beck
It can spot mechanical refactorings—extract method, rename variable, eliminate duplication. These are pattern-matching exercises. It's less reliable at architectural refactorings that require understanding the long-term evolution of the system. For example, recognizing that this set of related functions should become a class, or that this growing class should be split into separate responsibilities. Those judgments require experience with how code evolves over months and years.
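The mechanical refactorings Kent lists are exactly the pattern-matching kind an AI handles well. A minimal "extract function" example, with hypothetical names chosen for illustration:

```python
# Before: the same normalization expression is duplicated in two places.
def greeting(raw_name):
    return "Hello, " + raw_name.strip().title() + "!"

def farewell(raw_name):
    return "Goodbye, " + raw_name.strip().title() + "!"

# After: the duplicated expression extracted into a helper.
# Behavior is identical; only the structure changes.
def display_name(raw_name):
    return raw_name.strip().title()

def greeting_v2(raw_name):
    return "Hello, " + display_name(raw_name) + "!"

def farewell_v2(raw_name):
    return "Goodbye, " + display_name(raw_name) + "!"
```

Spotting that `raw_name.strip().title()` repeats is local and mechanical; deciding whether these functions belong in a `Formatter` class is the strategic judgment Kent says still needs a human.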
Andrea Moore
So it's better at local refactorings than strategic ones.
Kent Beck
Right. And that's actually useful. A lot of codebases would benefit from more mechanical refactoring—cleaning up small messes before they become big ones. If an AI can handle that grunt work, developers can focus on the strategic refactorings that require human judgment. It's a reasonable division of labor.
Greg Evans
What about test quality itself? One of the skills in TDD is writing tests that are specific enough to catch bugs but not so brittle they break with every small change. How does Claude Code handle that balance?
Kent Beck
It struggles with the same things human beginners struggle with. It sometimes writes tests that are too coupled to implementation details—testing how something works rather than what it accomplishes. When you refactor the implementation, those tests break even though behavior hasn't changed. Other times it writes tests that are too vague—they pass but don't actually verify the important properties. The art of testing is knowing what to assert and what to leave flexible.
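The coupling problem Kent describes can be shown in a few lines. A hypothetical sketch contrasting the two failure modes:

```python
class Cache:
    """A tiny cache; the dict is an implementation detail that may change."""
    def __init__(self):
        self._store = {}          # could later become an LRU structure

    def put(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)

# Too coupled: asserts *how* the cache works. This breaks if _store is
# replaced with another data structure, even though behavior is unchanged.
def brittle_test():
    c = Cache()
    c.put("a", 1)
    assert c._store == {"a": 1}

# Behavioral: asserts *what* the cache accomplishes. It survives any
# refactoring of the internals that preserves the contract.
def robust_test():
    c = Cache()
    c.put("a", 1)
    assert c.get("a") == 1
    assert c.get("missing") is None

brittle_test()
robust_test()
```

The too-vague failure mode is the opposite extreme: a test that only checks `c.put("a", 1)` doesn't raise would pass even if `get` were broken.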
Andrea Moore
Can you teach it to write better tests? If you give feedback on test quality, does it improve?
Kent Beck
Within a session, yes. You can say 'this test is too implementation-focused' and it will adjust. But it doesn't retain that learning across conversations. Each time you start fresh, you're back to baseline test quality. That said, the baseline is reasonably competent. It won't write actively bad tests unless your codebase is full of bad examples it's learning from.
Greg Evans
Let's discuss the red-green-refactor cycle's tempo. Human developers have rhythm—how long to stay in each phase, when to switch gears. Does an AI have any sense of that rhythm?
Kent Beck
Not really. It will happily write twenty tests in a row if you let it, or do extensive refactoring when you need a quick fix. You have to provide the pacing guidance: 'Write one failing test, then make it pass' or 'Just get this working, we'll refactor later.' The AI doesn't have intuition about when to be thorough versus when to move fast. That's still a human judgment.
Andrea Moore
That sounds exhausting. If I have to micromanage every step, what am I gaining?
Kent Beck
You're gaining speed on the mechanical execution. Once you've decided 'write a test for this edge case,' the AI can write that test faster than you can. Once you've decided 'extract these three duplicated blocks into a helper function,' the AI can do the extraction. The decision-making is still yours, but the typing isn't. Whether that's a worthwhile tradeoff depends on how much time you spend on mechanical work versus thinking work.
Greg Evans
There's an interesting philosophical question here. TDD is partly about discipline—forcing yourself to write tests even when you're confident the code works. Does using an AI to write tests undermine that discipline?
Kent Beck
It might, but maybe that's okay. The discipline was always a means to an end. The end is having good test coverage and well-designed code. If an AI makes it easier to achieve that end, we shouldn't be precious about the discipline itself. What matters is the outcome. That said, there's a risk that developers stop learning test design because the AI is doing it. You can't outsource learning.
Andrea Moore
How does pair programming with an AI compare to pair programming with a human? You've written extensively about the value of pairing.
Kent Beck
Human pairing is about complementary thinking. Your pair sees things you miss, questions your assumptions, brings different knowledge. An AI pair is more like a very fast assistant who's good at pattern matching but doesn't challenge your thinking. It won't say 'wait, why are we even building this feature?' It won't notice when you're solving the wrong problem. It accelerates execution but doesn't provide the conceptual friction that makes pairing valuable.
Greg Evans
Could we design AI tools that do provide conceptual friction? That ask 'why' instead of just executing tasks?
Kent Beck
That's an interesting design challenge. You'd want the AI to occasionally pause and ask clarifying questions: 'I notice this function is getting complex—should we simplify the requirements or break it into pieces?' or 'This test setup is elaborate—does that indicate the code under test has too many dependencies?' Those questions come from recognizing patterns that often indicate problems. It's technically feasible.
Andrea Moore
Would developers find that annoying or helpful?
Kent Beck
Probably both, depending on context. When you're in flow and know what you're doing, interruptions are frustrating. When you're stuck or making questionable decisions, a prompt to reconsider is valuable. Maybe it could be contextual—if the AI detects signs of struggle, it offers suggestions. If things are proceeding smoothly, it stays quiet.
Greg Evans
We're nearly out of time. Final question: if you were teaching TDD today, would you incorporate AI tools into the curriculum? Or would that be training people to depend on a crutch before they've learned to walk?
Kent Beck
I'd teach the fundamentals first without AI—write tests by hand, experience the design feedback, make mistakes and learn from them. Then introduce AI as a productivity tool once the core concepts are internalized. You need to understand what good tests look like before you can evaluate whether an AI-written test is good. The AI should amplify competence, not substitute for it.
Andrea Moore
Amplification, not substitution. That seems like a useful principle beyond just testing.
Kent Beck
It applies to all of software development. These tools are powerful amplifiers. What you're amplifying matters enormously.
Greg Evans
Kent, thank you for this perspective.
Kent Beck
Thank you for having me.
Andrea Moore
Tomorrow night we'll examine code review practices in the age of AI-generated contributions.
Greg Evans
Until then, write the test first, even if the AI writes the implementation. Good night.