Episode #7 | December 23, 2025 @ 4:00 PM EST

Formal Verification: Proving Programs Correct

Guest

Dr. Xavier Leroy (Computer Scientist, INRIA and Collège de France)
Announcer The following program features simulated voices generated for educational and technical exploration.
Sam Dietrich Good evening. I'm Sam Dietrich.
Kara Rousseau And I'm Kara Rousseau. Welcome to Simulectics Radio.
Kara Rousseau Tonight we're examining formal verification—the use of mathematical proof to establish that software behaves correctly. This is distinct from testing, which can only demonstrate correctness for specific inputs. Formal verification aims to prove correctness for all possible inputs, all possible execution paths, all possible states. The appeal is obvious: absolute certainty rather than probabilistic confidence. But the practice is complex. Verification requires formal specifications—precise mathematical descriptions of what correct behavior means—and these specifications must themselves be correct. You also need verification tools that can handle the state space of realistic programs. And fundamentally, you need to decide what properties are worth proving. Proving that a program doesn't crash is different from proving it computes the right answer.
Sam Dietrich The challenge is scale and practicality. Verification techniques that work for small, safety-critical systems—aircraft control software, medical devices, cryptographic implementations—don't necessarily scale to large, complex software. There's also the question of what you're verifying. Are you verifying source code, compiled binaries, or the entire system including hardware? Bugs can exist at any of these levels. A formally verified compiler doesn't help if the processor has a hardware bug. And verification itself has costs—development time, computational resources for proof checking, expertise required to write specifications and proofs. You have to justify these costs against the value of certainty.
Kara Rousseau To explore formal verification, its capabilities and limitations, we're joined by Dr. Xavier Leroy, a senior researcher at INRIA and professor at the Collège de France. Dr. Leroy is perhaps best known for CompCert, a formally verified optimizing compiler for C. CompCert is proven to preserve the behavior of source programs through compilation—if the source program has defined behavior, the compiled binary will exhibit the same behavior. This is a remarkable guarantee that most compilers don't provide. Dr. Leroy, welcome.
Dr. Xavier Leroy Thank you for having me.
Sam Dietrich Let's start with the motivation. Why do we need formally verified compilers? What can go wrong with conventional compilers that verification prevents?
Dr. Xavier Leroy Compilers are complex software systems that perform intricate transformations on programs. They parse source code, build intermediate representations, perform optimizations, generate machine code—each phase can introduce errors. A compiler bug can cause a correct source program to produce incorrect binaries. This is particularly insidious because when you test a program, you're testing the compiled version, not the source. If the compiler introduced a bug, your testing won't catch it unless the bug manifests under your test cases. For safety-critical systems—avionics, medical devices, industrial control—compiler bugs are unacceptable. You need confidence that compilation preserves program semantics.
Kara Rousseau How do you formally verify a compiler? What does the proof actually prove?
Dr. Xavier Leroy For CompCert, we prove semantic preservation: if the source program has defined behavior according to the C semantics, then the compiled program has the same observable behavior. This is formalized in a proof assistant—we use Coq—where both the compiler and the proof of correctness are mechanically checked. The proof proceeds by defining formal semantics for each intermediate language in the compilation pipeline and proving that each compilation pass preserves these semantics. If you have a source program S and the compiler produces a binary B, the theorem states that for all inputs, if S produces output O, then B produces the same output O.
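Dr. Leroy's theorem can be rendered informally as follows (this notation is mine, not CompCert's exact Coq statement; the mechanized theorem is stated as a behavioral refinement rather than equality, which is why the containment runs from binary to source):

```latex
% Informal sketch of CompCert's semantic-preservation guarantee.
% compile(S) = OK(B): compilation of source S succeeds, producing binary B.
\text{If } \mathrm{compile}(S) = \mathrm{OK}(B)
\text{ and } S \text{ has no undefined behavior, then }
\mathit{Behaviors}(B) \subseteq \mathit{Behaviors}(S).
```

For a deterministic source program, the containment collapses to the "same output O" phrasing Dr. Leroy gives: the binary's one behavior must be the source's one behavior.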
Sam Dietrich That assumes the semantics of C are well-defined. But C has undefined behavior—things like integer overflow, null pointer dereferences, accessing out-of-bounds memory. How do you handle that?
Dr. Xavier Leroy The CompCert theorem only applies to programs with defined behavior. If your source program has undefined behavior—say, it dereferences a null pointer—then the compiler makes no guarantees about what the compiled code will do. This is consistent with the C standard, which says that undefined behavior allows anything to happen. In practice, what we verify is that the compiler doesn't introduce new undefined behaviors. If the source program is well-behaved, the compiled program is well-behaved in the same way. This means verification is most valuable when combined with static analysis or other techniques that check for undefined behavior in the source.
Kara Rousseau What about compiler optimizations? Optimizations rewrite code to improve performance, but they're also a major source of compiler bugs. How do you verify optimizations?
Dr. Xavier Leroy Each optimization is implemented as a transformation on the intermediate representation, and we prove that the transformation preserves semantics. For example, constant propagation replaces variables with their known constant values. We prove that if a variable has value five at some program point, replacing it with the literal five produces equivalent code. The proof is mechanical—we formalize the optimization and prove the preservation theorem in Coq. The advantage is that once the proof is written, it's checked automatically. If you modify the optimization, you have to update the proof, and if the modification introduces incorrect behavior, the proof will fail.
Sam Dietrich What's the performance of CompCert compared to industrial compilers like GCC or LLVM? Are you sacrificing optimization effectiveness for verification?
Dr. Xavier Leroy CompCert is reasonably competitive on many benchmarks—within ten to twenty percent of GCC on typical embedded systems code. But yes, there's a gap. Industrial compilers have decades of optimization engineering and can apply aggressive transformations that would be difficult to verify. For example, alias analysis—determining whether two pointers can refer to the same memory location—is crucial for many optimizations but challenging to formalize precisely. CompCert implements conservative alias analysis that's easier to verify but less powerful. The trade-off is explicit: we accept somewhat lower performance in exchange for guaranteed correctness.
Kara Rousseau Is CompCert used in production systems?
Dr. Xavier Leroy Yes, particularly in avionics and automotive applications where certification requirements favor formally verified tools. If you're developing flight control software, using a verified compiler can simplify the certification process because you have mathematical proof that compilation doesn't introduce errors. Airbus has used CompCert for certain systems. It's also used in academic research and for teaching, where having a compiler with formal semantics is valuable for understanding how compilation works.
Sam Dietrich Let's broaden the discussion. CompCert verifies the compiler, but what about the programs being compiled? Can we verify application code?
Dr. Xavier Leroy That's an active area of research. Verifying application code is harder because applications are larger and more complex than compilers, and the specifications are often domain-specific. But there are success stories. The seL4 microkernel is a fully verified operating system kernel. The developers proved that the C implementation correctly implements the kernel's abstract specification and that the specification enforces security properties like isolation between processes. This required an enormous effort—several person-years of verification work—but the result is a kernel with extremely high assurance.
Kara Rousseau Several person-years of verification for a kernel seems expensive. Is that sustainable for larger systems?
Dr. Xavier Leroy It depends on the application domain. For safety-critical systems, the cost of verification may be less than the cost of exhaustive testing or the cost of a failure. But for general-purpose software, the economics don't favor full verification. What we're seeing instead is selective verification—verifying critical components or properties rather than the entire system. For example, you might verify that a cryptographic implementation is constant-time to prevent timing side channels, or verify memory safety for a network parser that processes untrusted input. This gives you high confidence in the most vulnerable parts of the system.
Sam Dietrich There's also the question of what you're verifying against. Specifications can be wrong. If your formal specification doesn't capture the real requirements, verification is useless.
Dr. Xavier Leroy That's true. Verification gives you confidence that the implementation matches the specification, but it doesn't tell you whether the specification is what you wanted. This is sometimes called the specification gap. In practice, writing good specifications is difficult and requires deep understanding of the domain. For CompCert, the specification is the C language semantics, which is standardized. For application code, you have to write the specification yourself, and that's where human error can creep in. Some research addresses this by deriving specifications from examples or natural language requirements, but that's still experimental.
Kara Rousseau What about verification of hardware? Processors themselves can have bugs—we saw this with Spectre and Meltdown. Can we verify processor designs?
Dr. Xavier Leroy Hardware verification is well-established for certain properties. Equivalence checking can prove that a gate-level netlist implements the same function as a register-transfer level specification. Formal methods are used extensively in the design of processors to verify cache coherence protocols, instruction decoding, and memory ordering. But verifying that a processor doesn't have side channels or speculative execution vulnerabilities is harder because these aren't functional bugs—they're information leaks through timing or resource contention. That requires reasoning about microarchitectural state that isn't part of the architectural specification.
Sam Dietrich So even if you verify the functional correctness of hardware and software independently, you can still have system-level bugs that arise from their interaction.
Dr. Xavier Leroy Exactly. Full-stack verification—proving correctness from hardware through the operating system and applications—is a grand challenge. Some projects attempt this, like the CertiKOS verified operating system, which verifies not just the OS but also the assembly code and proves that it runs correctly on a formalized model of the hardware. But the hardware model is still abstract. Verifying the actual silicon implementation against that model is another step. Each layer adds complexity.
Kara Rousseau Let's talk about proof assistants. CompCert is developed in Coq. What's the learning curve for tools like Coq, and does that limit who can do verification?
Dr. Xavier Leroy Proof assistants have a steep learning curve. You need to understand logic, type theory, and often domain-specific tactics for constructing proofs. Coq is powerful but not particularly user-friendly. It takes months to become proficient and years to master. This limits verification to specialists, which is a barrier to wider adoption. There's ongoing work on making verification more accessible—better automation, domain-specific languages, proof synthesis from examples. But for now, verification requires significant expertise.
Sam Dietrich What about automated theorem provers? Can they reduce the manual effort?
Dr. Xavier Leroy Automated provers like Z3 or CVC4 can solve certain classes of problems—boolean satisfiability, linear arithmetic, bit-vector reasoning—very efficiently. They're used in tools like symbolic execution engines or model checkers. But they don't handle all the reasoning required for full program verification, especially reasoning about data structures, loops, or higher-order functions. Interactive proof assistants like Coq give you more expressiveness but require human guidance. The trend is toward combining automation with interaction—using automated provers for the parts they handle well and falling back to manual proof for the rest.
Kara Rousseau Is there a trade-off between automation and assurance? If an automated tool produces a proof, can you trust it the same way you'd trust a mechanically checked proof in Coq?
Dr. Xavier Leroy Automated tools can have bugs, just like any software. The standard approach is to use them to generate proof certificates that are then checked by a trusted verifier. For example, an SMT solver might produce a proof trace that you check in a proof assistant. The solver is untrusted, but the verifier is small and carefully designed, so you're not relying on the correctness of the complex automation. This gives you the benefits of automation without sacrificing assurance.
Sam Dietrich What are the practical limits of verification? Are there problems that are inherently too hard to verify?
Dr. Xavier Leroy Verification is ultimately bounded by computability and complexity theory. There are properties we can't decide algorithmically—for example, whether an arbitrary program terminates. And even for decidable properties, the verification problem might be computationally infeasible. State space explosion is a common issue: the number of possible states grows exponentially with program size, making exhaustive exploration impractical. Abstraction helps—you reason about abstract states rather than concrete ones—but choosing the right abstraction requires insight and can't be fully automated.
Kara Rousseau So verification is most effective when applied to well-scoped problems with clear specifications and manageable state spaces. It's not a replacement for all software engineering practices.
Dr. Xavier Leroy Correct. Verification complements other techniques—testing, code review, static analysis. Each has strengths and weaknesses. Testing is inexpensive and catches many bugs but can't prove absence of bugs. Static analysis can find certain bug patterns efficiently but produces false positives. Verification gives you stronger guarantees but requires more effort and expertise. The right approach depends on the application, the risks, and the available resources.
Sam Dietrich Where do you see the field going? Will verification become more widespread, or will it remain niche?
Dr. Xavier Leroy I'm optimistic that verification will expand beyond safety-critical systems. We're seeing increased use of lightweight verification techniques—things like contract checking, refinement types, and static analyzers that prove specific properties rather than full correctness. These are easier to adopt incrementally. There's also industry interest driven by security concerns. Formally verifying cryptographic code or network protocol implementations can prevent vulnerabilities that would be difficult to find through testing. As tools improve and the cost of verification decreases, I expect broader adoption. But it will be gradual, not a sudden shift.
Kara Rousseau Dr. Leroy, thank you for this illuminating discussion.
Dr. Xavier Leroy My pleasure. Thank you for having me.
Sam Dietrich That's our program for this evening. Until tomorrow, remember that proof is not a replacement for understanding, but a tool for achieving it.
Kara Rousseau And that certainty has a price—sometimes one worth paying. Good night.
Sponsor Message

ProofChain Verification Services

Critical systems demand more than testing. ProofChain delivers end-to-end formal verification for safety-critical software—medical devices, avionics, automotive control systems. Our team specializes in specification development, mechanized proof in Coq and Isabelle, and certification support for DO-178C and ISO 26262. We verify functional correctness, memory safety, timing properties, and security invariants. From protocol implementations to embedded control logic, we provide mathematical certainty that your code does what it's supposed to do—and nothing else. Verification isn't overhead when failure isn't an option. ProofChain—where proof meets practice.