Transcript: ‘The AI Model Built for What LLMs Can’t Do’

‘AI & I’ with Eve Bodnia


The transcript of AI & I with Eve Bodnia is below. Watch on X or YouTube, or listen on Spotify or Apple Podcasts.

Timestamps

  1. Introduction: 00:00:51
  2. Why correctness and verifiability matter in AI: 00:02:09
  3. What an energy-based model is: 00:09:33
  4. How EBMs construct energy landscapes to understand data: 00:14:21
  5. Why modeling intelligence through language alone is a flawed approach: 00:19:00
  6. What it means for a model to “understand” data: 00:26:54
  7. How EBMs solve the vibe coding problem and enable formally verified code: 00:37:21
  8. Why LLM progress is plateauing: 00:43:21
  9. Why mission-critical industries haven’t adopted LLMs, and how EBMs can fill that gap: 00:49:54

Transcript

Dan

Eve, welcome to the show.

Eve

Hi. Thanks for having me.

Dan

Great to have you on. For people who don’t know, you are the founder and CEO of Logical Intelligence. Tell us what Logical Intelligence does.

Eve

Logical Intelligence does a few things. First of all, we see ourselves as a foundational AI company. We work with both EBMs and LLMs. Everything we’ve built in-house we prototyped on LLMs initially, and we’re building EBMs at the same time—those get plugged in over the long term.

We’re focused on correctness of software and hardware as a product, because I believe there are a lot of issues with AI being placed in mission-critical systems today. Can we do code generation? Can we do chip design? The answer is yes—people use LLMs today. But very few are actually questioning whether the results are correct, whether what’s produced actually makes sense. There’s a big gap in the market around deterministic, verifiable AI, and we’re trying to fill that gap.

Dan

Where my brain goes first is: why does correctness, or whether something makes sense, actually matter if it works?

Eve

Let me ask you a question back. Imagine there’s AI driving a car and you’re in that car, and the car is running on an LLM, and someone tells you that 20% of the time it’s going to hallucinate and you might end up in the wrong place. How would you feel about that?

Dan

In my case, I’d be like, wow, that’s kind of interesting—I’m curious where it takes me.

Eve

Okay, let me give you another example. What about a plane? You’re flying from San Francisco to New York and someone says 20% of the time the next word isn’t going to match and the plane is going to go down. How would you feel?

Dan

My feeling is that planes are currently run very well by deterministic systems, so I’m not sure why I’d need an AI for that.

Eve

I feel like we just can’t avoid AI anywhere over the next 10 years. People are going to try to place AI everywhere, automate systems with it. Technically, you might not need it—we survived without AI up to this point—but it’s the next step in an evolution that people want. For banking, you didn’t need AI initially, but we learned it’s really helpful to automate certain processes and decision-making. It saves a lot of time and creates space to be creative instead of constantly debugging and fixing things. I just feel like it’s an unavoidable future.

Dan

What I’m getting at is—it seems like if you want a guarantee of certainty, the only way to achieve that is to use something you can express in code or logic.

Eve

That’s part of it. For us, certainty comes from both internal and external verifiers. If you take an LLM, the architecture doesn’t allow for internal verifiers—it’s a black box. You don’t have access to what’s happening inside until everything is fully processed. You only have access to the output.

Many companies take an LLM, train it for certain tasks, and if it requires logic, they attach external verifiers—languages like Lean 4, which is a machine-verifiable proof language that lets you check output using mathematical frameworks. But that doesn’t solve the cost problem, because the architecture is still playing a guessing game. Even with an external verifier, even if you fine-tune the LLM for a specific task, you’re still not solving the problem of tokens being expensive. It takes compute to play that guessing game.
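
As a concrete aside on what an external verifier looks like: Lean 4 accepts a file only when every proof in it is complete, so a successful compile is itself a machine-checked certificate. A toy example, unrelated to Logical Intelligence’s actual verifiers:

```lean
-- Toy Lean 4 theorem: this file compiles only if the proof is valid,
-- so "it builds" doubles as a machine-verified certificate.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```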

EBMs solve that problem differently. EBMs don’t have tokens. There’s no guessing game of that kind. You can essentially oversee all possible scenarios.

Dan

Can you define EBM for us?

Eve

I’ll define it in a second. For now, just think of it as something that doesn’t play a guessing game—something whose architecture allows it to self-align as it processes information. It’s no longer a black box. As it’s performing, you can open it at any time during training and see what’s happening inside. You can’t do that with LLMs. The nature of the architecture is fundamentally different.

So for verification tasks, you have this notion of self-alignment because of the EBM architecture, and the absence of tokens makes it cheaper. And then you also have an external verifier on top of that. Verification on both sides—inside and outside.

Dan

Let me play that back to you. We’re living in a world with LLMs where we can generate a lot of output, and that output is useful for a wide range of things. But to tell if that output is right, the best we can do is guess and check—we generate output and then, if it’s code, we go check it with integration tests or manual tests or whatever. That totally works, but it’s expensive and time-consuming. And one of the core problems is that it’s very hard to know how the LLM arrived at its answer. We can’t look inside it.

Eve

Exactly.

Dan

And what you’re saying is there are other types of models that are more inspectable—ones that give us a sense, before we even run the output, of whether it works. We can look at the model’s internals and understand: how good does the model think this solution is? It’s like being able to ask someone, “Are you sure about this?”—before you go check their work. A language model can answer that question, but at a different level than an EBM does. The answers from EBMs are more likely to be actually correct.

Eve

Yes. With EBMs, you always have the opportunity to see what’s inside. You control the training—it’s no longer a black box. You can do that in real time. With LLMs, you need to wait until training is done before you go look inside. And you can attach the same external verifiers that work for LLMs on top of EBMs, so you get double verification.

You asked me what an EBM is. I want to give a little historical context, because there are so many terms being thrown around today without being defined.

EBM simply means energy-based model. The concept of “energy-based” comes from physics. In theoretical physics, you write down Lagrangians—built from the energy terms of a system, like kinetic energy and potential energy—and you derive the equations of motion by extremizing the action those energy terms define. That’s essentially how all of theoretical physics works: you start with energy terms, apply a minimization principle, and derive equations of motion. Those equations of motion give you conservation laws, so you know exactly what the rules of your system are.
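
In symbols, that classical recipe is the principle of stationary action. This is standard physics background, independent of anything Kona-specific:

```latex
L = T - V, \qquad S[q] = \int L(q, \dot{q}, t)\, dt, \qquad
\frac{d}{dt}\frac{\partial L}{\partial \dot{q}} - \frac{\partial L}{\partial q} = 0
```

Making the action S stationary yields the Euler–Lagrange equations, which are the equations of motion; symmetries of L then give the conservation laws Eve mentions.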

This principle is fundamental—everything around us wants to minimize energy. We’re sitting in chairs talking instead of jumping around because that’s the natural state: minimized energy. We’re using that minimization principle for how AI processes information.

(00:10:00)

Our model is formally called the Energy-Based Reasoning Model with Latent Variables, though we call it Kona—we like coffee culture and Kona is a favorite. Let me walk through exactly what those words mean.

Dan

Before you do, I want to make sure people understand what energy minimization actually means. Is this a good concrete example: I’m going to go lie on the couch behind me. My body is uneven, the couch is uneven, and I’m trying to understand how my body is going to end up settling into it, given the laws of gravity. I’ll end up settling in a way that minimizes energy—a good fit between my body and the couch, rather than being all jerky with lots of gaps. Is that the kind of energy minimization you’re talking about?

Eve

Yes. It’s all about your body finding the most comfortable configuration—the one with the lowest potential energy. I’d go even higher level. Imagine you’re tired, Dan—you’ve done thousands of podcasts and you’ve just come home. Let’s say Dan is a variable. We’re trying to figure out his equations of motion around the house: where is he most likely to end up?

You’re probably going to end up on the couch with a nice show and maybe a drink.

Dan

Yeah.

Eve

So that becomes a rule: when Dan is tired, he goes to the couch and relaxes. But to get there, we look at all your possible states—washing dishes, walking around the house. Those are different states, but your most probable scenario is the couch. All of this can be mapped into what we call an energy landscape. It looks like a map with high points and low points. High points correspond to less probable scenarios—if you’re tired, you’re probably not going to be dancing around. Low points correspond to more probable ones. To figure out where you end up, we observe you multiple times during training—across different days, with varying workloads and internal states—and eventually we fit that landscape to what we see in the real world. The lowest point is you on the couch.

Dan

That makes total sense. Now I want to relate this to LLMs, because you could imagine an LLM trained to predict where I end up after a long day of podcasts—and it would probably also predict the couch. What are the differences in how each approach makes those predictions, and why does it make energy models better for certain scenarios?

Eve

Let’s go back to EBMs, because what we just described is very natural for them. EBMs are all about constructing energy landscapes, navigating them, and using those landscapes as maps of states derived from observed data. In your case, we’d look at all your possible scenarios, map them into an energy landscape—highest points for less probable scenarios, lowest points for the most probable. So it’s very probable you end up on the couch.

There might be some other low points too—sometimes when you’re tired, you might go to the gym. So there could be multiple low points, but some will be lower than others. That’s how an energy-based model thinks: take the data, map it directly to an energy landscape, then use certain algorithms to navigate that structure. And crucially, there are no tokens. We’re not predicting any next token. That’s already a fundamental difference.
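
To make the landscape picture concrete, here is a minimal Python sketch. The states and energy values are invented; the conversion from energy to probability via exp(-E) is the standard energy-based-model convention, not a claim about Kona specifically:

```python
import numpy as np

# Toy energy landscape over end-of-day states (values invented).
# Lower energy = more probable, per the EBM convention p(x) ∝ exp(-E(x)).
states = ["couch", "gym", "washing dishes", "dancing around"]
energies = np.array([0.5, 1.5, 3.0, 5.0])

probs = np.exp(-energies)
probs /= probs.sum()   # normalize into a probability distribution

for state, p in zip(states, probs):
    print(f"{state:>15}: {p:.3f}")
# The lowest point ("couch") gets the highest probability, while other
# low points ("gym") remain plausible alternatives: multiple basins.
```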

How would an LLM think about this? It would rely on a lot of training data—a lot of observations of your behavior—and figure out where you’d end up by attaching probabilities to your next token. And what bothers me about LLMs is that they produce intelligence that is language-dependent.

My own thought processes don’t depend on any particular language. I can think abstractly and then decode that information into different languages. With LLMs, if you’re searching for the next token in French, the information processing is going to be different from English, just because words naturally end up next to each other differently. And we have so many languages in the world, with so many LLMs trained on different ones. You end up having different reasoning processes for each language, which feels fundamentally wrong.

Observing you walking around your house has nothing to do with language. It’s a pure visual-spatial reasoning task—just looking at your body navigating the space, time, and geometry of your home. To use an LLM, we’d need to map that information into language space, find the right words and embeddings, and then start associating those tokens with probabilities based on what we observe. We’re trying to map something that has nothing to do with language into language space and reason about it there—which feels really wrong.

(00:20:00)

Dan

There’s a lot here. I think it’s absolutely right that there are many ways we process information and many forms intelligence can take, and only a few of them are verbal. But something comes up for me: language models work with sequences of tokens, and those tokens have many thousands of weak correlations between them that help us know which comes next. So even if it’s unintuitive to model my behavior inside my apartment using language specifically, we could model it as just a sequence of movements—one movement weakly correlated to the next—that gives us a trajectory telling us where I’m going. Why isn’t that a good approach?

Eve

It’s a good approach—and you don’t need an LLM for it. You need a form of AI that isn’t attached to language, but can be compatible with language if you want it to be. That’s what our model is about.

Dan

I guess what I’m saying is, even forgetting the language part, modeling my movements as a string of correlated events—an event stream where each token is the next thing I do—

Eve

You can do it. People do it today. People even do image recognition using language models. You can be really creative. But that’s what makes it expensive and slow—you’re playing a guessing game about what the next token could be, and that’s what makes it extremely costly. You could do it, but you don’t have to. You can use a different architecture that’s more suitable for non-language-related tasks—spatial reasoning, for example, or applied engineering. When you build a bridge, you don’t go to the literature department—you go to engineering school and learn formal methods.

We’re trying to use the literature department for everything, and I’m saying we don’t have to. EBMs exist, and other forms of AI exist, that don’t require everything to be routed through language.

And it’s really just energy-based minimization when it comes to your resources. If you have infinite money and don’t care about timescale, sure—you can attach everything to language. But if you want to minimize resources and you can’t wait, like when AI is controlling circuits and you need responses in microseconds, that form of AI simply isn’t suitable for those tasks.

Dan

So basically, if I’m spending tons and tons of tokens, I’m looking for a more efficient, more direct way to arrive at solutions to certain problems. An energy-based model gets me there faster. Is it also able to work with less training data?

Eve

Yes. The beauty of EBMs is that they’re really good at working with sparse data. Traditional EBMs evolved and were applied alongside LLMs—and then came diffusion models, which emerged precisely because sometimes you don’t have enough data to train a model, or your dataset is incomplete. There are ways to reconstruct energy landscapes by injecting certain noise and changing navigation strategies. That’s what diffusion models were about.
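
One standard way to navigate an energy landscape with injected noise is Langevin dynamics: gradient steps toward low energy plus random kicks. The following is a generic sketch of that textbook technique on an invented one-dimensional landscape, not a description of Kona’s internals:

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(x):
    # Invented 1-D landscape with two basins near x = ±1;
    # the 0.3*x tilt makes the basin near -1 the deeper one.
    return (x**2 - 1)**2 + 0.3 * x

def grad_energy(x, eps=1e-5):
    # Numerical gradient, to keep the sketch self-contained.
    return (energy(x + eps) - energy(x - eps)) / (2 * eps)

x = rng.normal()              # start at a random point on the landscape
step, noise = 0.01, 0.1
for _ in range(5000):
    # Langevin update: descend the energy, plus injected noise that lets
    # the walker escape shallow basins instead of getting stuck.
    x += -step * grad_energy(x) + noise * np.sqrt(2 * step) * rng.normal()

print(f"settled near x = {x:.2f}, energy = {energy(x):.2f}")
```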

The Energy-Based Reasoning Model with Latent Variables takes that further: on top of the diffusion approach, the model also tries to understand the data. It’s not just taking data—it’s asking why the data looks the way it does. That understanding goes into the latent variables. Just like the latent space in your brain understands the world around you, keeps you on top of your tasks, and allows you to predict and plan—it’s the same idea here.

Dan

So now we’ve gotten to the latent variable part. When you use the word “understanding,” I think that must mean something very specific to you. Can you help me understand that and how it relates to latent variables?

Eve

That also goes back to your question about how LLMs differ from the kind of EBMs we’re creating.

LLMs don’t understand data. You feed a lot of data into them and they essentially say, “I’ve got it—here’s the most probable scenario.” With an EBM, you can feed a lot of data, and it’s not just going to look for the biggest pattern. It’s going to try to understand the pattern, and that understanding, that knowledge, goes into the latent variables.

What does it mean to understand data? It’s just basic knowledge about the world—basic rules. If there’s a couch behind Dan, it’s probably because he likes to sit on it or likes it as a background. You can infer little rules about you as a data point and the couch as a data point. You can try to create rules like this for everything: navigating your apartment—there’s a kitchen for cooking, a bathroom, a sofa, a bed. That understanding allows you to have your own mental world model, which helps you navigate your environment. If something changes—say someone brings you a different couch—you still know what to do with it, because you understand the rules. That’s how you can infer what to do with something new based on what you already know.

(00:30:00)

With people, this comes naturally through evolution. With AI, we need to teach it—we need to mimic that evolution. What latent variables allow is: look at the data, but also try to understand it. If you’re dealing with numerical analysis, look at all possible correlation functions, and the model will creatively try to figure out the total state of the energy, minimize it, and discover the laws about your data.

Dan

Is a latent variable equivalent to a rule in this scenario—like “if there’s a couch in my apartment, I sit in it”?

Eve

It’s not equivalent to a rule, but it’s equivalent to something that holds knowledge about the rules of your data. It’s like a knowledge storage.

Dan

So one variable holds many different rules.

Eve

Yes. Think of it as a knowledge dataset about your data.

Dan

Is it an explicit dataset—like key-value pairs of rules—or is it more like…

Eve

It’s in the form of an energy landscape—another landscape you navigate. We take the data, construct a structure for the AI to work with so it can start learning the rules, and once it understands the rules it stores that knowledge in the latent variables, in the form of an energy landscape. Then we navigate that energy landscape later.
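
A generic latent-variable EBM defines a joint energy E(x, z) over data x and latents z, and recovers what it “knows” about x by minimizing that energy over z, which is exactly a navigation of the latent landscape. This is a minimal sketch of that generic shape; the energy function, dimensions, and weights are illustrative stand-ins, not Kona’s actual design:

```python
import numpy as np

def joint_energy(x, z, W):
    # Illustrative joint energy: low when the latent "explains" the data
    # (x ≈ W @ z), plus a quadratic prior keeping z well-behaved.
    recon = x - W @ z
    return 0.5 * recon @ recon + 0.5 * z @ z

def infer_latent(x, W, steps=200, lr=0.1):
    # Inference as landscape navigation: gradient descent on E(x, z) over z.
    z = np.zeros(W.shape[1])
    for _ in range(steps):
        grad_z = -W.T @ (x - W @ z) + z   # ∂E/∂z for the energy above
        z -= lr * grad_z
    return z

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 2))        # made-up data-to-latent relationship
x = W @ np.array([1.0, -2.0])      # data generated from a "true" latent
print("inferred latent:", np.round(infer_latent(x, W), 2))
```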

Dan

Interesting. Could it, theoretically, explicitly write out all the rules it knows? Or does it store them in the energy landscape in a way that’s not directly readable?

Eve

We can access that. And that’s what makes EBMs potentially powerful for data analysis—data analysis is all about searching for patterns and rules in your data. Language isn’t helpful when you’re trying to attach rules about data that consists of numbers, relationships, and functions to American English words and then search for the next word. You lose a lot of information. Here, you have the opportunity to work directly with the data and understand it.

Dan

One of the things I’m trying to understand is that when I hear about models of the world and how things relate to each other, I think of symbolic AI—and those approaches ended up being pretty brittle and requiring too much compute. I’m wondering how an energy landscape that stores a bunch of rules about the world doesn’t fall into the same problems.

Eve

We avoid tokenization. We just map directly into a different data structure. EBMs are naturally non-autoregressive. There are no sequences of tokens, and that’s what makes it fundamentally different.

Here’s another analogy. Imagine you’re trying to navigate a map of San Francisco, and you have an LLM brain. You can only choose one direction at a time. You’re walking along the Embarcadero, making one turn at a time, with tunnel vision. You’re allowed to choose one direction at a time, and sometimes you take the wrong turns because you hallucinate. There might be a hole in the road and you’re just going to fall—you might even see the hole, but you can’t turn back because you’re autoregressive. You have to go forward.

This is why sometimes you prompt an LLM and it doesn’t give you an answer—it’s searching and searching, spending more and more compute, without a bird’s-eye view. It doesn’t have the ability to turn mid-task. It doesn’t know what’s right and what’s wrong anymore. It just randomly picks one direction at a time and keeps walking until it either reaches the destination or doesn’t.

An EBM has the bird’s-eye view at all times. You’re allowed to take different routes. If you see there’s a hole, you choose a different route.
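
The map analogy can be made literal with a toy route search. Everything here (the graph, the edge costs, the hole) is invented purely to contrast greedy one-step choice with whole-route scoring:

```python
# Toy map (invented). The greedy walker picks the cheapest next turn with
# tunnel vision; the bird's-eye search scores complete routes at once.
edges = {
    "start": {"A": 1, "B": 3},
    "A": {"hole": 1},   # the cheap-looking turn leads straight into a hole
    "B": {"goal": 1},
    "hole": {},         # dead end: an autoregressive walker can't turn back
    "goal": {},
}

def greedy_walk(node="start"):
    path = [node]
    while edges[node]:
        node = min(edges[node], key=edges[node].get)  # one turn at a time
        path.append(node)
    return path

def all_routes(node="start", path=None, cost=0):
    path = (path or []) + [node]
    if not edges[node]:
        yield path, cost
    for nxt, c in edges[node].items():
        yield from all_routes(nxt, path, cost + c)

print("greedy (tunnel vision):", greedy_walk())  # ['start', 'A', 'hole']
best = min((r for r in all_routes() if r[0][-1] == "goal"), key=lambda r: r[1])
print("bird's-eye best route:", best)            # (['start', 'B', 'goal'], 4)
```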

Dan

That’s really interesting. I’ve been doing a lot of coding with language models recently, testing the limits of vibe coding. One of the things I find with big production apps is that over the course of vibe coding something, you may have slightly shifted your sense of what the project is even supposed to be about—what problems you’re trying to solve. If you then look at the codebase, all the code is locally correct, but it forms this patchwork of hot fixes and workarounds. If you zoomed out, you’d realize there’s a much simpler, unified approach—but the model gets distracted by whatever it’s looking at in the current moment. Is that the sort of problem this type of system can help with?

Eve

There are actually several problems in what you’re describing. Solving the problems with vibe coding is one of our use cases. We dream about generating formally verified code and automating coding entirely—moving you from vibe coding in a specific programming language to coding in natural language. You could code in plain English, with no C++ or Python required.

Vibe coding in its current state: yes, you prompt LLMs and they give you something back, but it’s still on you as an engineer to figure out what’s right and what’s wrong. What we’re working toward is a set of rules, with an LLM or EBM helping you check whether the new logic you’re introducing is compatible with the existing logic in your codebase—whether it compiles, whether it’s mathematically consistent. External verifiers can do this. They can say: we know the old logic, we know the new logic, we’re going to see how they merge. We’ll write a mathematical proof confirming the logic is compatible with what you already have and provide you a certificate. It’s all machine verifiable—it happens at the compilation level. The system sends you a message in natural language: “This part of your code is not compatible by logic. Here’s potentially how you fix it. And here’s what we can’t fix for you.”

(00:40:00)

So we’re moving you from vibe coding to vibe code specifications—rules and information about your code become the code specification.

That’s the first problem: logic incompatibility with what you already have. The second problem is: is this code actually doing what you want it to do? And that’s something AI cannot solve for you, because AI cannot look inside your brain. Imagine you’re coding a self-driving car autopilot. You have hardware specifications, logic specifications, and behavior parameters—how the car is supposed to behave. Whether the code compiles is one problem. Whether it does what you want is another. And then there are further questions: is it fast enough on the hardware? Will it hit a pedestrian? Will it actually navigate the streets of San Francisco?

For those behavioral questions, you need to write a bunch of tests and evaluate the entire system. And this is another form of specifications. Sometimes we can guess at the behavior if we have enough data—another LLM or EBM can propose what people who’ve built similar systems have typically looked for. But if you’re doing something entirely new and there’s no data for it, it’s going to be on you to specify the behavior.

This is where it gets personal for me. If you have an LLM driving something mission-critical—a car, a plane—it can misbehave, because you can’t fully constrain it. It hallucinates. An EBM can be constrained. You can define a set of constraints and the EBM is forced to follow them. It’s on you as a human to know what you want the AI to do. From our end, we make sure the AI always obeys the rules given by humans.
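
One textbook way to enforce such constraints in an energy-based setting, offered here as a generic sketch rather than a claim about Kona’s internals, is to fold each rule into the energy itself so that a violating state can never be the minimum:

```latex
E_{\text{total}}(x) \;=\; E_{\text{task}}(x) \;+\; \sum_i \lambda_i\, c_i(x),
\qquad c_i(x) \ge 0, \quad c_i(x) = 0 \iff \text{rule } i \text{ is satisfied}
```

Sending a penalty weight to infinity makes its rule a hard constraint: violating states get infinite energy and are never selected.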

And this goes beyond cars and planes. Sometimes a model can say something deeply harmful to someone struggling with depression. Even language can be dangerous. What we’re also solving is this problem of AI sometimes behaving unpredictably in different environments. With EBMs, we do know how they’ll behave. The architecture is designed to be constrained, and there are formal ways to enforce those constraints.

Dan

So it sounds like you have a really promising architecture and models you’ve built—something quite different from the predominant paradigm right now, where companies are pouring hundreds of billions of dollars into data centers and LLM training. What do you think about the current state of the industry, and investment in LLMs versus other approaches?

Eve

It’s an ecosystem, especially Silicon Valley. LLMs were historically the first form of AI that gave us an “aha” moment—starting around 2021 and especially 2023, when they started appearing and people thought, this is the future. So people started believing: if it’s really good at talking to me, eventually it’ll be good at doing data analysis, my taxes, and everything else. Investment communities started pouring money in.

Now people are seeing that as you grow the compute and tweak the architecture a little, it’s kind of reaching a plateau. And there’s so much money already in this space—billions of dollars. You can’t just say, “Let’s dismiss it, let’s pour money into something new.” Nobody thinks that way. We probably don’t have enough capital in this economy to make decisions like that.

That’s why it’s so hard for the investment community to step back and say, “This isn’t working for some tasks—maybe I should invest in something radically new.” I’m not saying people don’t do it, but percentage-wise, it’s a lot smaller. What people feel comfortable with is taking something LLM-based with a few novel elements—still LLM-based enough that they can reuse their existing portfolio companies. And I understand that. If I were an investor, I’d always look at which variables reduce risk and how I can reuse what I already have.

So it’s natural to keep investing in LLM architectures. But there are also a lot of big tech companies forming circular dependencies here—LLM companies, data center companies, hardware companies, all locked together into one giant ecosystem that’s nearly impossible to break.

When we came along with an alternative architecture, we decided not to position it as something radically different that requires abandoning LLMs. We’re very compatible with LLMs. You can put an LLM on top of us. EBMs are compatible with transformers. We can be a layer that sits beneath your LLM investments and makes them cheaper. If someone comes to a big tech LLM and asks it to do their taxes, the LLM alone won’t solve that—but if it’s attached to an EBM, we can handle that part while the LLM handles anything language-related. We can actually run experiments to reduce costs for LLM portfolio companies and be part of the existing ecosystem while building a new one alongside it.

Dan

That’s really smart—a great strategy. I’m curious about something you said earlier: that progress is plateauing. That’s news to me. Every month or two I’m testing a new model and thinking, this is genuinely way better. And the top model companies feel like there’s still a lot of room in the LLM paradigm. What do you think I’m missing?

Eve

When I say plateauing, I don’t mean it’s going completely flat. It’s incrementally better and better. But is there going to be another phase transition—another breakthrough? I don’t anticipate that, just because we’ve already reached so much complexity: billions of parameters, enormous compute, creative parallelization of reasoning processes, and we still haven’t seen another phase transition.

The reason I concluded it won’t work long-term for certain tasks—like applied engineering—is from talking to companies in that space. Digital assets companies, banks, trading firms where a lot of data analysis is needed. Drug discovery, where people are looking at blood markers, genes, and other non-language datasets. A lot of this data analysis is still done by people today. Decision-making pipelines—like distributing energy on a power grid, figuring out how much power to pump into a system in the next millisecond, second, or hour—are still run by people or human-controlled programs.

LLMs are relatively new to all of this, and all of these mission-critical industries are still not automated by AI. I ask companies: how much of your data analysis is an LLM doing today? The answer is zero. And I ask why. The answer is that big tech LLMs are mainly B2C. They work for you—for coding, for personal needs—but for businesses, they don’t want to share their data with a general-purpose brain. They want privacy. They want a custom AI specifically designed for their tasks.

(00:50:00)

That’s what LLMs can’t do in the form they exist today. There are B2B models for code generation tools, enterprise packages, and so on—but coding is still done by people. It’s interesting to see that there’s still a huge gap, especially in applied engineering and data analysis. Anything that requires a layer of verification is somewhere LLMs haven’t reached.

Dan

I totally agree that there are significant gaps in LLMs. Given what you’re seeing with the companies you work with—do you think the big model companies are sensitive to this? Are they working on energy-based models? Do you suspect they’ll start to adopt approaches like this?

Eve

I do know that some of the big tech model companies have EBM models in-house, which is a positive signal for us. The leaders who came before us started with LLMs, and if they’ve started building EBMs after we began building ours, that’s a positive sign.

Dan

Fascinating. Eve, this has been an incredible conversation—I feel like I learned a lot. Thank you so much for coming on.

Eve

Thank you, Dan. I really appreciate it.

Dan

Of course. If people are interested in following you or your company, or maybe using some of your products, where can they find you?

Eve

I’m mostly on X. We have both a Logical Intelligence account and my personal account there. I’m still learning to be more active on social media. We also have a LinkedIn page that we’re working to update.

Dan

Awesome. Well, thanks for joining.

Eve

Thank you so much, Dan. Bye.


Thanks to Laura Entis for editorial support.

Dan Shipper is the cofounder and CEO of Every, where he writes the Chain of Thought column and hosts the podcast AI & I. You can follow him on X at @danshipper and on LinkedIn.

To read more essays like this, subscribe to Every, and follow us on X at @every and on LinkedIn.

For sponsorship opportunities, reach out to [email protected].
