Blog

Dr. Ben Goertzel’s Conversation with Dr. Gary Marcus at the World AI Summit 2025

In this riveting conversation between Dr. Ben Goertzel and Dr. Gary Marcus, they debate the best paths to AGI and emphasize the need for ethically and technically sound AI systems. 

Both Ben and Gary feel that current large language models (LLMs) are completely inadequate for the task, and that there is zero chance of them getting us to AGI, mainly because of their reliance on data distribution and their general lack of deep comprehension. Other highlights include Ben outlining his work on a scalable symbolic AI system capable of handling trillions of logical nodes, and both Ben and Gary stressing the importance of neuro-symbolic AI and explicit knowledge representation.

They also debate the feasibility of achieving AGI by 2029: Gary is more skeptical, citing the complexity and current limitations of AI systems, while Ben is more optimistic and believes that human-level AGI should be right on schedule for 2029, or perhaps even sooner.

[Full transcript]

BG: I’m Ben Goertzel. I’m an AGI researcher and mathematician. I’m the CEO of SingularityNET, the ASI Alliance, TrueAGI, and a bunch of other organizations, and I’m psyched to be here talking to my friend Gary about what’s going on with the path to AGI.

GM: I’m Gary Marcus. I’m a cognitive scientist and AI researcher. I founded Geometric Intelligence, which I sold to Uber. I’ve written six books, most recently “Taming Silicon Valley,” and I’m often described as an AI skeptic. But I want AI to succeed, despite what people may say about me, and I’m as interested as Ben is in how we might actually get there.

BG: So we’ve got a few different topics we’re gonna touch on here. We’re gonna talk about AI ethics and governance, and who should be controlling AI as it advances, potentially toward AGI and even beyond. We’re also going to talk about reasoning and thinking. And you know, what will it take to make AIs actually reason, as opposed to just faking it, right?

GM: I have a phrase I like, which is that AI, current AI, is morally and technically inadequate. You could imagine an AI that would be much more technically solid than what we have. And you could imagine a regime, I think, where the people who were running it were doing it more for the benefit of mankind.

BG: I think you’ve inspired a title I’ll use for a future album in my band, Desdemona’s Dream: “Morally and Technically Inadequate.”

GM: I mean, you know, I want a cut of the royalties, but have at it.

BG: I think that that certainly describes the human condition very well, as well as AI systems.

GM: It does. I mean, one of the reasons I’ve always been interested in AGI is because I do think humans are technically inadequate. I wrote a book, which you may or may not have read, called “Kluge.” You know, the engineer’s term for a duct tape and rubber bands solution. And the point is, the human mind is a “kluge.” I do think we could do better.

BG: We can do way better. Well, that’s right, and I’ve said that regarding ethics as well. 

GM: Absolutely.

BG: I mean, we should be able to make AI systems that are both more generally intelligent than we are, and more consistently and coherently compassionate and ethical than we are. I mean, the bar isn’t that high.

Limitations of Current AI Systems

GM: It should be totally doable. I think the fact of the matter is, we have not met that bar. Current AI is not reliable. You can’t give it simple constraints to follow, like “don’t tell people how to make biological weapons,” and expect that it will actually follow them, but surely we ought to be able to do that kind of thing.

BG: Yeah, what I find frustrating with the debate and discussion about large language models and associated chat systems and so on is, on the one hand, people go way overboard and say, well, these are already AGIs because, I mean, ChatGPT 4.5 passed the Turing test in some senses, and they’re smarter than people. And they are, in a way. I mean, so is an Excel macro or a pocket calculator.

GM: Intelligence is multidimensional. They’re smarter in some dimensions.

BG: On the other hand, I also don’t think these are sort of silly parlor tricks. Like, I think they’re really, really interesting and useful systems, and they do have some profound and surprising emergent behaviors within them. 

And yet, I still don’t think they’re the direct path to AGI. Like, I do think there’s something to be learned from them for AGI, and some use of them within AGI systems. I think we agree on a lot here, although not everything. I mean, we agree you cannot take an LLM, bolt some other stuff onto it, scale it up bigger, and get a human-level AGI.

GM: Hasn’t worked so far. That’s for sure. 

BG: The fact that it hasn’t worked so far, to me, is not a good argument, because they have gotten smarter and smarter.

GM: There are other arguments too. I would start with the fact that they don’t even, in principle, really represent world models, in the sense of “here are some persistent entities in the world. These are their properties. This is what I expect of them.” They just don’t do that.

BG: Yeah.

GM: They’re opaque, so that makes them difficult to debug. They don’t really reason well outside of distribution. You know, fundamentally, they’re mimics, and they’re better than just a simple memorizing system. They kind of do a little bit of analogy. But they don’t really have a deep comprehension of what they’re talking about, and their reasoning is always limited, for that reason, to things that are in some sense (it gets complicated these days) similar to the things they’ve seen before. And if things are different in important ways, they often miss that.

BG: Yeah. It is quite subtle. But when you use these systems a lot, you can see these phenomena very clearly.

GM: There are some examples that aren’t even that subtle. So for example, large language models nowadays are trained on enormous amounts of data, and that includes the game of chess, right? They’re trained on millions of games of chess because they’re in the public domain. They’re trained on the rules of chess. They’re trained on books about how to play chess, and they still make illegal moves. That’s not subtle. That is a failure to adopt an explicit set of rules and be constrained by them. And it turns out that maybe the more subtle part is, the further you go from standard games, the more they break, and the more quickly they break.
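To make the point concrete, here is a minimal sketch of the kind of explicit constraint Gary is describing: a rules engine (here the python-chess library) sits outside the model, so a proposed move is checked against the actual rules of chess before it is played. The ask_llm_for_move helper is a hypothetical stand-in for whatever model call you would actually make.

```python
import chess

def ask_llm_for_move(board: chess.Board) -> str:
    # Hypothetical: query a language model for a move in UCI notation, e.g. "g1f3".
    raise NotImplementedError

def play_one_llm_move(board: chess.Board, max_retries: int = 3) -> chess.Move:
    """Ask the model for a move, but only accept one the rules engine says is legal."""
    for _ in range(max_retries):
        proposal = ask_llm_for_move(board)
        try:
            move = chess.Move.from_uci(proposal.strip())
        except ValueError:
            continue  # not even well-formed notation; ask again
        if move in board.legal_moves:  # the explicit rule check the bare LLM does not honor
            board.push(move)
            return move
    raise RuntimeError("Model kept proposing illegal moves")
```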

BG: Yeah.

GM: That’s the subtle and important part.

BG: What I’ve found, like, I use reasoning LLMs quite a lot as research assistants. So, I use them for mathematics, and I use them for programming. And in math, I’ve found, as long as you’re within a recognized field of math that there’s a lot of literature on, they do quite well. I would say as well as a mediocre math PhD student, which is pretty good, right?

GM: It’s not nothing.

BG: Yeah. Yeah. Yeah.

GM: Better than I would do. 

BG: I mean, so you can do category theory. You can do topology. You can do type theory. You can certainly do all sorts of calculus. Information geometry. It’s really quite good. Every now and then, like less than 1% of the time, there’s some weird mistake, so you have to check, but they’re very, very good. For programming also. I mean, they’ll write code, and it makes a mistake, but then modern LLM programming environments will catch and fix a lot of those mistakes.

But what I did find is, if you’re in a branch of math where there’s not a lot of publications. Like, let’s say paraconsistent logic, which is being able to reason where a statement can be both true and false, or something, right? Like, that’s a real branch of math. There are dozens of papers on it, not thousands. LLMs become just super dumb, and no amount of few-shot learning helps. And with programming, we’ve made our own AGI programming language called MeTTa (whitepaper), M-e-T-T-a, for Meta Type Talk, right? And I’ve not yet gotten LLMs to be able to program in that language, which is annoying, because I would like to have that kind of programming assistance.

GM: I have someone who just emailed me yesterday, saying that if you try to have it program in 6502 assembly language, which is the language that I actually learned on, that it will make up instructions and things like that. It’s the same symptom. The same problem. Like, there’s nothing intrinsically more complicated about 6502 than Python. In fact, it’s a lot simpler, but there’s a lot of data on Python. There’s not a lot of data on 6502 in modern form.

BG: iPhone and JavaScript. And it has gotten me to go back to Python, which I don’t even like that much, but LLMs are just so good at it.

GM: Because there’s so much data. It’s always, you know, in proportion to the data.

BG: I think, however, this sort of thinking that they’re doing is something that our brains do. It’s just not all of what our brains do.

GM: I mean, I like the Kahneman System One, System Two analogy. You know, the fast, automatic, reflexive stuff, “System One,” they kind of do pretty well. The more deliberative, abstractive and so forth, that Kahneman calls “System Two,” they don’t do very well.

BG: The subtlety is, though, that they can do category theory using System One. Humans cannot do category theory that way.

GM: I’m sure that you can find places where they break.

BG: One guy, Mike Stay, he can do category theory in System One, right? But I’ve never met anyone else with that particular capability. So, because of the way they ingest data, they can ingest every category theory proof ever done, and so they can deal with this in, like, a statistical, reflexive way, whereas people need to think really hard, right?

GM: I’m sure you could find places where they break. I mean, did you see Terence Tao’s interview the other day? 

BG: No. 

GM: I mean, he said, “Look, a lot of the time these things are great, and then they do something that’s stupid.”

BG: I mean, people break too. And, of course, I mean, as you would know,

GM: I’m not that fond, by the way, of the “people break too” thing. 

BG: They break in a different pattern.

GM: They do break in different patterns. But like, we don’t want a calculator that breaks when we do certain high-precision, high-safety kinds of things, and I think alignment should be one of them. We really want systems that we can count on in the way that we can count on calculators. It would be no excuse to say, “Ah, well, you know, it forgot to carry the one, so I guess it’s okay if my calculator fails.”

BG: This brings us to deeper points. Because, I mean, it doesn’t have to be the same AI system that’s doing cutting edge research that is operating the weapons infrastructure, right? 

GM: Sure.

BG: I mean, just like you tend not to hire the same people for those things either, because different people have different characteristics, right? So, I mean, it could be fine that the system doing cutting edge research does make some weird mistakes.

GM: It partly depends on its relation to the person too, and also how much autonomy we give it. So an AI system that makes a bunch of mistakes but works closely with a person, like for coding, and that person knows how to debug. That’s fine. 

BG: In that case, what you do is: you couple the AI program generator with a debugger, right? It finds the mistakes and submits a new prompt. And then you iterate.
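Here is a minimal sketch of the loop Ben describes, with hypothetical generate_code and run_tests stand-ins for the model call and the test harness; the point is just that the concrete failure gets fed back into the next prompt.

```python
def generate_code(prompt: str) -> str:
    raise NotImplementedError  # call your LLM of choice here

def run_tests(source: str) -> tuple[bool, str]:
    raise NotImplementedError  # compile/run the code and return (passed, error_log)

def code_with_feedback(task: str, max_iterations: int = 5) -> str:
    """Generate code, test it, and fold any failure back into the next prompt."""
    prompt = task
    for _ in range(max_iterations):
        source = generate_code(prompt)
        passed, error_log = run_tests(source)
        if passed:
            return source
        prompt = f"{task}\n\nYour previous attempt failed with:\n{error_log}\nPlease fix it."
    raise RuntimeError("No passing solution within the iteration budget")
```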

GM: I mean, you need humans in the loop there, and I worry a lot about the security and maintainability of the code that’s being written.

BG: Yeah, it’s like, for computer music, AI code generation just works. Works great.

GM: Yeah, you mentioned that. I don’t quite know what your projects are doing, but you mentioned it.

BG: Well, so an example is: I’m doing AI generated music in 19 tones instead of the 12 tone scale, right? 

GM: The human ear doesn’t have a good sense of what counts as correct in a 19-tone system.

BG: Mine does. And people that are into microtonal music do. But it can develop an alternate music theory, and it can generate harmonies there. We’re also doing AI singing generation with a robot. Then you use AI to modulate the emotional expression of the robot singing.

But the point is, in that domain, you really can go from text prompt to working code. And indeed, the cost of failure, if the output sounds bad, is relatively low. So, I mean, that’s a good use case, but I think we sort of see eye to eye on the limitations of the current systems. 
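For readers curious about the tuning math Ben mentions: 19-tone equal temperament divides the octave into 19 equal steps instead of 12, so each step is a frequency ratio of 2^(1/19). A small illustration follows; the 440 Hz reference pitch is a conventional choice, not something from the conversation.

```python
def edo19_frequency(steps_from_reference: int, reference_hz: float = 440.0) -> float:
    """Frequency of a pitch `steps_from_reference` 19-EDO steps away from the reference."""
    return reference_hz * 2 ** (steps_from_reference / 19)

# Steps 0 through 19: the reference pitch plus one full octave of the 19-tone scale.
scale = [round(edo19_frequency(n), 2) for n in range(20)]
print(scale)  # 440.0 ... 880.0, in 19 equal steps
```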

GM: I mean, we didn’t quite articulate that for others though. I mean, so we talked about the distribution stuff. We should mention at least briefly, the Apple paper.

BG: Yeah. Yeah. Yeah. 

GM: Which has been…

BG: very funny, 

GM: But I think basically it’s saying the same thing that we’re saying, right? They said, “You can get these systems to run algorithms, but if you do them on larger problems, they’re breaking down.” And I think this is another symptom of the problem of distribution.

BG: Well, what’s funny is that they’ll solve a puzzle like the Towers of Hanoi puzzle or something. They will articulate what is the correct algorithm for solving it. And they’re right, of course, but that’s in the literature already.

GM: That’s right. And then they’ll fail to actually execute it.

BG: They will execute that in some cases.

GM: Well, I’ve been playing around…

BG: Somehow, they will execute it wrong on other cases, which is whacked, right?

GM: Yeah, I’ve had the same experience with chess, right? And I’ve done variations of tic-tac-toe, which is even easier. I did a variation of tic-tac-toe the other day with Grok. Actually, it suggested the variation. It said, you can only play on the edges. You can only win on the edges. And I said, “Great.” And so we went back and forth. And then for my next move (I’m gonna write this up in my Substack), I said, “What should my next move be?” And it gives me three different answers, none of which are winning plays. But there’s a simple play for me to play in the middle on the left, and then it’s three in a row. And it just completely didn’t see that. It doesn’t really have the concept of three in a row in tic-tac-toe, despite, you know, manifest literature on it.
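The “concept of three in a row” that Gary says the model lacks is, in explicit rule-based form, only a few lines of ordinary code. Here is a minimal sketch for standard tic-tac-toe (not for Grok’s edge-only variant, whose exact rules aren’t spelled out in the conversation).

```python
# The eight winning lines of a 3x3 board, as (row, column) coordinates.
LINES = [
    [(0, 0), (0, 1), (0, 2)], [(1, 0), (1, 1), (1, 2)], [(2, 0), (2, 1), (2, 2)],  # rows
    [(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1), (2, 1)], [(0, 2), (1, 2), (2, 2)],  # columns
    [(0, 0), (1, 1), (2, 2)], [(0, 2), (1, 1), (2, 0)],                            # diagonals
]

def winning_moves(board: list[list[str]], player: str) -> list[tuple[int, int]]:
    """Return every empty cell that would complete three in a row for `player`."""
    moves = []
    for line in LINES:
        values = [board[r][c] for r, c in line]
        if values.count(player) == 2 and values.count("") == 1:
            moves.append(line[values.index("")])
    return moves
```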

Knowledge Representation and Reasoning

BG: And I think, to my mind, the reason for this, fundamentally, is fairly clear and has to do with knowledge. It has to do with knowledge representation, which is not a fashionable concept in modern neural net AI.

GM: It’s not a well represented concept, right? I mean, I think for me, you need a world model of entities and their properties, stable over time, and that is tied closely together with knowledge representation – and everything else is a shortcut. And the shortcuts work like 80% of the time, but they don’t work 100%.

BG: More than that, in many domains. 

GM: Maybe in some domains. Yeah. I mean, as a guideline, we’ll call it 80% and like, you wouldn’t build a GPS navigation system that way. You shouldn’t build a domestic robot that way. Like there’s some domains you need more than 80%. For brainstorming, you don’t.

BG: I mean, what I find is that within well charted areas of math, it’s more like 99% of the time. So it works, actually very well.

GM: I’ve never personally experienced 99% correct in any of the things that I’ve tried…

BG: I would say math and programming are formal systems that there’s a lot of data about,

GM: and there’s a lot of augmentation. So, I mean, especially with the models like o3, I believe, although I don’t know this for sure, but I believe that they’re doing very heavy data augmentation, 

BG: Probably.

GM: based on what they’re expecting.

BG: Anyway, I mean, however good, these are being made as tools in particular useful domains. I mean, that’s important in many ways. It’s important for economics. It’s important…

GM: Yeah, absolutely. Labor. 

BG: It’s important for what jobs might be obsolete. But in terms of the AGI path, even if you can bolster it to be 99.9% good at math and programming, or puppy juggling, or whatever the specific domain is, still, they’re representing knowledge as a sort of huge weighted catalog of special cases. And they’re using a very nice, clever way to sort of synthesize items from this catalog in a, you know, custom weighted way for each response. This is just very different than how real cognitive systems represent knowledge, which involves…

GM: I absolutely agree.

BG: involves something more in the direction of explicit abstraction. And you… Of course, you can do that within a formal neural network, or a biological neural network, or different kinds of things. It’s not so much that the apparatus of formal neurons and weights and activations couldn’t do it. It’s just a…

GM: See, you can build a Turing machine and classic databases and so forth out of nodes, if you want to. It’s just that people choose not to. They’ve chosen to pursue this kind of model-free and knowledge-representation-free system, right? I mean, some kind of knowledge emerges. People call it representation, but not in the classical, symbolic AI sense, where we have explicit knowledge.

BG: It’s interesting.

GM: That’s a choice. And I think it’s a wrong choice. 

BG: I think, I mean, there is, of course, knowledge representation in a transformer, but…

GM: It’s all black box, and it’s never really…

BG: You can map it out with probes, but…

GM: It does mapping things. Like, there’s been noble work to try to do it. But like, if you’re just thinking of a sentence. Like, they get the nouns. They don’t get the verbs. Like they’re very limited in what piece of language they really abstract.

BG: You can learn a matrix that extracts, from the state of the transformer, the full syntactic parse tree of a sentence with all the parts of speech, at least when it’s English.

GM: You might be able to do that.

BG: Yeah. I mean, there’s papers on it.

GM: But you’re not going to be able to get good representation of actions and things like that.

BG: The thing is, even though you could milk the parse tree or a semantic parse out of the activation states, it is not manipulating those abstract representations.

GM: It’s not.

BG: In a useful way in answering questions, although with a supervised learning approach, you can open that out, right? So…

GM: I mean, not well enough that it will play Chess, right? I mean, chess should be easy, and then there are other examples that are more complicated, but Chess should be relatively easy. You should be able to say, “what is the representation here of a Queen” and so forth, but it will actually do things like have a Queen jump over a Knight, which is completely illegal in the rules of Chess. Yes, you can do this interpretability stuff, and you could say, “here’s a representation of a Queen or whatever,” but it doesn’t really work properly.

BG: No. So, I think then we agree on the reason for the limitations in reasoning that we see, and the reason why LLMs are only kind of clever when they are close to their training distribution. I mean, this has to do with the way they’re representing knowledge internally, which has to do with the way they’re learning, right?

GM: Yeah, they’re learning contextually related things about words, basically, but they’re not learning the underlying concepts. So in Chess, they’re not learning what a Queen is. They’re learning how people talk about Queens, and they even learn some of the Chess openings, because they see that Pawn to King four is often followed by Knight to Bishop three. But you go out of that regime where they have the stored representations, and then they just don’t know what to do anymore.

BG: So I have my own thoughts about how to make AI systems that can actually reason. But what is your orientation in that regard these days?

GM: It has to start with neuro-symbolic AI, which I think we agree about, right? There have to be some places in the system where you can have all the formal representations that we know and love from computer science: databases, knowledge graphs, explicit operations over variables. These things have to be first class citizens in, I think, any AGI system.

That doesn’t mean that there’s no room to do the kind of distributional analysis that LLMs do, but it means that you can’t do things just that way. You do have to have these things as first class citizens. I mean, we didn’t talk about hallucinations, but that’s another example, right? You should be able to read it off of your knowledge graph. Like, you know Harry Shearer, probably, right? He’s a friend of mine, and he sent me a bio of him the other day, and it said that his character in Spinal Tap was, like, David Smith or something like that, when you probably know it was Derek Smalls. You should be able to just look up what that character name is. It’s there on the first page of Wikipedia. It’s very easy, you know, low-hanging fruit, but the systems don’t have the knowledge graph to do that.
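Here is a minimal sketch of what “just look it up” means in practice: facts stored as explicit subject-predicate-object triples, and the question answered by retrieval rather than generation. The tiny triple store below is purely illustrative.

```python
# A toy triple store: facts are explicit (subject, predicate) -> object entries.
TRIPLES = {
    ("Harry Shearer", "played_in_Spinal_Tap"): "Derek Smalls",
    ("Harry Shearer", "occupation"): "actor",
}

def lookup(subject: str, predicate: str) -> str | None:
    """Return the stored object for (subject, predicate), or None if the graph has no entry."""
    return TRIPLES.get((subject, predicate))

answer = lookup("Harry Shearer", "played_in_Spinal_Tap")
print(answer)  # "Derek Smalls" -- retrieved from the graph; None would mean "I don't know"
```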

BG: So actually, it’s interesting. I disagree with your statement, but yet what you describe is the way I’m doing things. So I don’t think that explicit logic representations or databases or knowledge graphs actually need to be first class citizens to get a human level AGI. 

GM: As an empirical guess, but I would say that everything…

BG: I don’t think they are first class citizens in the human brain. On the other hand, the AGI system I’m putting time into building, the Hyperon system, does, in fact, have a huge, like, weighted, labeled knowledge metagraph as the main component. And we’re even putting the neural nets inside that metagraph alongside symbolic stuff. So I think that, given that we have this symbolic computing infrastructure of von Neumann machines to work with…

GM: It’d be stupid to squander them.

BG: Yeah, and then they’re so good at doing symbolic manipulation. I mean, it makes a lot of sense to put explicit symbolic manipulation and couple that with neural systems. So I think that is a good approach. On the other hand, if you look at how the human brain gets at abstract symbolic reasoning through cortical, hippocampal interaction and so on, I don’t necessarily think quantifier bindings are first class citizens in the human brain.

GM: I think quantifier bindings, maybe. I think you can make an argument, like, there are limits on humans such that they don’t have location-addressable memory. This was the argument in my book Kluge, and that puts some upper bounds on what they can do. But I think they certainly do short-term variable binding. We do short-term variable binding every time we parse a sentence, for example. And I think that we do have semantic networks that, you know, allow us to represent properties, relations, taxonomies and so forth.

So I think there are some boundaries on what humans can do. They’re mostly limited by memory factors and interference, which, I think, you know, makes it look like we don’t. But, you know, we end up using cue-addressable memory instead of location-addressable memory as a kind of substitute. So there are some boundaries on what people can do, but I think, to a first approximation, we can do these things. We can have, you know, recursive representations.

BG: I guess my thinking is that the reptile brain is a first class citizen. The mammal brain is a second class citizen, then the stuff that popped up later in evolution. 

GM: Well, I think that’s using… That’s using the term in a different way.

BG: Maybe

GM: I think all animals, or many animals, can do some variable binding. I think Randy Gallistel’s work on ant navigation, for example, his beautiful work, makes a really strong case. The animals store the values of variables. They update them over time, and they have to…

BG: Ant colonies are pretty smart. Yeah.

GM: Well, even an individual ant can do the dead reckoning back to the colony, never mind the whole colony. And so I think, actually, most animals…

BG: That’s a really specialized system.

GM: It is. But, you know, another example is bees calculating the solar azimuth function. They can generalize beyond the data they’ve seen to lighting conditions they’ve never seen before. All of this stuff, I think, shows that some level of variable binding is actually fairly widespread through the animal world.

BG: Sure.

GM: Doing taxonomy, taxonomic reasoning. You know, “you’re a person, a person is a mammal, blah, blah, blah.”  That kind of inference. I’m not sure how much other animals can do that.

BG: I mean, birds can do it, right? Yeah.

GM: I don’t know any good study on birds and taxonomy, but I’m prepared to believe it, but I don’t know offhand evidence for that. Certainly some birds can do a bunch of variable binding and stuff like that. 

BG: Yeah.

GM: So, you know, the distribution of these things across the animal kingdom is not equal, but I think some of it is there. Whereas these neural networks only have a kind of hack to get at variable binding, and try to kind of do it without it. The pointers and stuff like that actually do some of it.

BG: I think that you could get variable binding, and this abstract reasoning, from neural nets as a sort of emergent phenomenon, and that’s probably how the human brain does it.

GM: People have been telling me that for 30 years. 

BG: On the other hand, 

GM: 30 years people have been telling me that.

BG: Well, sure. I mean, people we’re talking about…people were talking about flying into space for a long time before it was finally done also.

GM: Yeah. But, I mean, they did it in different ways. 

BG: Cyrano de Bergerac wrote about it, right? I mean…so…

GM: Yeah, but I mean, it required the development of new technologies to actually make it work.

BG: Yeah. I guess…

GM: The abstract idea that we’re going to have, you know, intelligent robots is not new.

BG: No, the ancient Chinese had them 10,000 years ago.

Control and Compliance

GM: We were supposed to talk a little bit about control. 

BG: Yeah, yeah. 

GM: I think there’s a deep connection here. 

BG: Let’s dig into that.

GM: Which is, I think if you don’t have explicit knowledge as part of your system, as first class citizens, you’re limited in control. And I think we see that with LLMs.

BG: The other side of it is, if you do, you’re also limited in control. So you’re limited in control either way.

GM: Well, there may be different orders. 

BG: Yeah. Yeah.

GM: So with LLMs, people do things like put stuff in a system prompt, and they can’t get the systems to follow it. So the Apple system prompt says, “Don’t hallucinate.” Some of them say, “Don’t use copyrighted materials.” Some of them say, “Don’t tell people how to build biological weapons.” And yet the systems do that anyway.

BG: Well people use prompt injection attacks.

GM: And people use prompt injection attacks, and you can say, “don’t allow a prompt injection.” It doesn’t work right? The system’s level of comprehension of the words that are put in those prompts is poor in some sense, and they’re not compliant with the constraints we put on them. 

I would say a minimal requirement for control, not a sufficient one, but a necessary one, is that we should be able to state, let’s say, negative prohibitions. You know, we should be able to state those constraints and expect the systems to follow them in the same way we do in classical coding. Like, you know, in a for-next loop or something, you are stating a constraint about what the system does. Or think about Prolog, which is all, you know, constraints.
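A minimal sketch of a negative prohibition enforced in ordinary code rather than requested in a system prompt: the check is deterministic, so it holds whether or not the model “understood” the instruction. The forbidden-phrase list and the generate_reply helper are hypothetical placeholders, not a real safety system.

```python
FORBIDDEN_PHRASES = ["synthesize the toxin", "build the weapon"]  # illustrative only

def generate_reply(prompt: str) -> str:
    raise NotImplementedError  # call the underlying model here

def constrained_reply(prompt: str) -> str:
    """Return the model's reply, unless it violates an explicitly coded prohibition."""
    reply = generate_reply(prompt)
    if any(phrase in reply.lower() for phrase in FORBIDDEN_PHRASES):
        return "I can't help with that."  # enforced in code, not merely requested in a prompt
    return reply
```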

BG: Yeah.

GM: I think if you don’t have that as some part of your system, you’re in real trouble. Which is not to say that, if you do, you solve the problem, right? If you specify your constraints in Prolog, but you specify the wrong constraints, you’re still in trouble. But if you can’t even get into that game, I think you’re in deep trouble. I think that’s where we are now.

BG: A few issues. So one is, I mean, there’s a couple reasons the system might not obey your constraint. One is that it has its own autonomy and goals and thinking.

GM: I don’t think that’s the situation we’re in now. 

BG: No, no, and then…

GM: That is a potential situation.

BG: The other one is they just are too dumb to understand what your constraint means, right?

GM: That’s the one we’re in now.

BG: Yeah. Yeah. Yeah.

GM: Right? And we want to get to one where they’re smart enough to follow the constraints, but not, you know, resisting what we tell them. And, I mean, that’s, you know, threading a needle, right?

BG: Well, yeah. So, as you know, my oldest son is named Zarathustra, and one of the things Zarathustra said to those who were trying to follow him was “go away from me and resist Zarathustra.” Like he didn’t want his followers to actually follow him in detail. He wanted them to think for themselves. 

GM: Yeah.

BG: I sort of feel that way.

GM: There’s an epidemic in society of people not thinking for themselves.

BG: I sort of feel that way about AGIs. Like, I don’t think we want AGIs to align exactly to our current human values, which are kind of fucked. On the other hand, I agree. We don’t yet have AI systems for which that’s even the issue, right?

GM: Yeah. 

BG: Now the issue is…

GM: These are two separate issues.

BG: Now the issue is they’re not understanding the concepts that you’re conveying, because they’re just recognizing statistical patterns in the terms of discourse, right? 

GM: Yeah. 

BG: Once that problem is solved… 

GM: Then we will have a new problem. 

BG: Then you have another much more interesting problem, right?

GM: Yeah. I think that people try to duck the second harder problem, and they say “well, we can never do it because humans don’t agree on values.” And there’s some truth in that. There’s some things we’re never gonna get humans to agree on. Like, what is the definition of life with respect to decisions about abortion? Like, there’s always gonna be disagreement. But I think that there’s a lot of sort of “human compatible,” to borrow Stuart Russell’s term, a lot of “human compatible” values that we can actually agree on. Like, “we don’t want machines to kill people or deliberately cause them to come to harm.” As, you know, as Asimov wrote all those years ago.

BG: Oh, I spent many years working with a lot of people who don’t agree with that idea, in Washington, DC, people actually engaged with this sort of work, right? So that is not…

GM: I think it should be a baseline value. I mean, you could even admit sort of defeasible logic and other things being equal and so forth, but I think it should be a baseline value. You don’t kill people, at least not without a good reason. 

BG: Well, sure, but what’s the good reason? Yeah.

GM: There’s work to be done, but there’s, you know, reasonable consensus in society to get started on it.

BG: It’s not safe for democracy.

GM: Yeah. I mean, neither of us are maybe totally happy with the current state of affairs. But…

BG: I mean, the other point is, not only isn’t there a coherent ethos across humanity now, but what we take for granted now as common ground would have largely seemed outrageous 500 years ago.

GM: Yeah, and, I mean, it raises the question about the possibility of moral progress. And, you know, do you want AI to be able to make moral progress? Like, if you took the Constitution of the United States and the thinking from, you know, when the United States was formed. Like those people thought it was okay to own slaves, and we don’t really think that’s okay now. 

We probably have similar dumb ideas, but one, the most obvious one, is we think it’s okay to eat meat, and probably we should stop thinking that, especially when we can have substitutes for meat that you know are tasty and nutritious and so forth, which isn’t so far away, right? We should probably revise our values about it. 

BG: Do you want to go grab a burger?

GM: Yeah, exactly. We’ll talk about this over a burger. With bacon. Bacon burger, right? But so you want to allow…or maybe you don’t. I mean, with complicated questions, do you allow the machine to make moral progress with human consultation? Without? I mean, there’s a whole bunch of hard questions.

BG: That’s not even an interesting question, because it will make moral progress.

GM: Well, no.

BG: I mean, I don’t think we can stop it. 

GM: Well, I don’t, I don’t agree with that. I think that we could hard code things and make them fixed. I mean, there’s all these questions about…

BG: Well you could, but then someone else won’t. I mean…

GM: And then there’s a set of political questions: Who controls the box?

BG: I agree if you had a sufficiently powerful global fascist dictatorship controlling AI development, you could perhaps constrain AGI in that way.

GM: Let us hope that that’s not the implementation strategy. 

BG: With the current situation, it seems likely there’s going to be a lot of different AGIs popping up all over the place, once we get there.

GM: I think that that’s probably true.

BG: And some, some folks will constrain them in some ways. Others will not constrain them. 

GM: Yeah, just like we have some people that are constrained by the laws and values of the land and some that are not, right? 

BG: And sometimes that’s very good, which is how progress is made. And sometimes…

GM: It’s less good. Yeah, exactly. We’re not going to run away from that.

BG and GM (at the same time): Yeah.

GM: From a profusion of values and machines doing different things, we have to think about how we’re gonna constrain them as best we can. We should think about what the good guys should be doing here. We should think about what the bad guys are gonna be doing here.

BG: Actually, what worries me most in terms of the ethics and social impact of AI may be a little bit different than most of what you worry about. Because I’m an optimist that we’re going to get to human level AGI in the next few years, and you’re more pessimistic about that. So since I think we’re probably going to get to human level AGI, roughly on Kurzweil’s timeline of 2029…

GM: I don’t really see that.

BG: Maybe even a couple years before that. But since I do feel that way, what worries me is the gap between when we get to the first human level AGI and when it has spawned a superintelligence. If that gap is, say, 2 years, 5 years, 10 years… Like, what happens in the world during that interval? Because you’ll have a bunch of different AGIs, many of them controlled by current national leaders who are maybe not the wisest people on the planet.

GM: Yeah.

BG: Then you will have early stage AGIs eliminating many categories of human work, including in the developing world, right?

GM: We don’t need AGI to eliminate some human work. I mean, LLMs, which I think we can agree are not AGI, are already eliminating some human work.

BG: Absolutely so. But then, in Africa or Latin America, when middle class jobs are automated away and agricultural jobs are increasingly automated away by, you know, robot harvesters and so forth, becoming cheap. Like, what actually happens? We’re seeing a decreasing appetite for foreign assistance from the developed world, so that period between human level AGI and superintelligence could hold a lot of chaos.

GM: A lot of scrambling for power.

BG: I tend to be… I’m both an optimist that we can get to human level AGI rapidly, and I’m, for largely non-rational reasons, an optimist that superintelligence will be beneficial and compassionate, but the period in between…

GM: Yeah, that’s an optimism that needs to be based in a plan, which we don’t have. 

BG: And we can’t know either. 

GM: I think we could do more to prepare than we have. 

BG: We can do more to prepare for human level AGI.

GM: And for how to make it have a good outcome.

BG: Preparing for superintelligence, there’s… I mean, the hope is that once we have the human level AGI, with its help, we can prepare for superintelligence better than we’re able to now, but it’s hard enough…

GM: You know, AGI is not going to be magic, right?

BG: No?

GM: Like, I think that a lot of people treat it as magic. But, for example, in biology, you still have to do empirical work. Like, I read a lot of the AGI literature, and it sounds like the minute we have it, everything changes. And the reality is we have lots of smart people working on lots of problems that are really hard. And even if AGI sort of is like having a million smart people working 24/7, that doesn’t magically solve all the problems or take away the need for empirical data.

BG: It will be more than that, though, right? Because it’ll be a million people with also transhuman characteristics. Because, I mean, a human can’t suck all biological data into their brain at once, right? 

GM: Yeah, some of that.

BG: As soon as you get a human level AGI, you will obviously have an AGI that’s transhuman in many ways, because it will, it will have a mind meld with all sorts of other narrow AI tools, right? So you… But certainly it’s not magic.

GM: It’s not magic or instantaneous. 

BG: We haven’t been good at predicting how hard different problems will be, right? So probably, some things will be easier for an AGI than we’re thinking, and some things will be harder than we’re thinking. 

GM: There’s a long history of that in AI.

BG: We’ve been quite poor at…

GM: Calibrating that.

BG: At guessing that, right.

GM: Yes.

BG: And I think it’s certainly true. I mean, you can see, though, some problems, like food distribution to solve world hunger. Like, a low-cost drone army could solve that pretty straightforwardly. It’s not too hard to see. It’s like, just deliver…

GM: Well, I mean, there are political problems about that.

BG: There are. 

GM: It’s not so much a physical distribution problem.

BG: It’s both.

GM: Yeah. 

BG: Yeah. Yeah. So there’s… There’ll be many, many, many complexities. 

So, I’m curious. Why do you think it’s implausible that AGI will come about on Kurzweil’s timeline? I mean, granted, the LLMs are not it, right?

GM: Well, so, I mean, you start with that LLMs are not it.

BG: They’re not it.

GM: There’s nothing else that we have that’s in production, and production takes a long time. So like, think about driverless cars. We’ve been working on them for 40 years. I actually took one this morning.

BG: Well, that’s hardware devices, so production of software doesn’t necessarily take as long.

GM: Well, I mean the hardware for the Waymo cars has been around, but, you know, I can’t even take it to the airport when I’m here in San Francisco. That’s actually a software problem, not a hardware problem, unless it has to do with highways.

BG: I understand.

GM: You should be able to take surface roads to the airport.

BG: But testing that sort of thing requires human safety testing.

GM: Well, that’s gonna be true for AGI too. Or at least we hope that people care about safety. I mean, I’m not sure they will, but in general.

BG: I think there are reasons that developing medical therapies or big hardware devices that risk human life is going to be slower than just pure software development though, right?

GM: Sure, but I would argue that the Waymo thing, it is actually a software problem.

BG: It’s software. But to test each version of the software, you need to run it on hardware involving humans.

GM: There are going to be parallels to that with AGI right? Like it’s going to give medical advice, and you need to test it. 

BG: Yeah. Just looking at my own AGI R&D, I don’t see slow things like that in the loop, actually. I mean, because you can test it on hypothesis generation. You can test it on mathematics. You can test it on actually a lot of things.

GM: Every time we do a benchmark, it turns out somebody’s gamed the benchmark.

BG: I don’t care about benchmarks. I’m not interested in benchmarks.

GM: So, you’re gonna…

BG: I’m just saying I think you can do the work to get to an AGI without involving these things that are slow for human safety or large physical object reasons. 

GM: I mean, God help us if you do it without any regard to AI safety. I mean, to safety. Human safety.

BG: That’s a separate point. But, I mean, when you’re dealing directly with human life, with hulking metal objects involving human bodies…

GM: Did you know that Sam Altman just said he’s going to use OpenAI’s AI to drive cars? This was yesterday, right? And that’s an example. Like, if you have AGI, not that he does, as far as I know, people will do things like use it to drive cars. They will use it to drive tractors.

BG: I understand. But my feeling is, you can get to AGI without doing these dangerous and time consuming things.

GM: I mean, look, in the abstract, sure. 

BG: I mean, in the actual project I’m working on, we are trying to get to AGI without doing those expensive and dangerous things en route, right?

GM: At some point, if it is released into the wider world, you need to do those things.

BG: Yeah. I mean, we’re building our own little robots. They’re just not robots that are driving people around, that are big enough to squash people, right? So, I mean…

GM: Sure, you can start to test them on robots that aren’t, you know, big enough to hurt people.

BG: Right. The question is, if you can get to AGI that way, then you can do the safety testing…

GM: After.

BG: Afterwards, right. So I’m just saying, I don’t think the slowness of testing things like self-driving cars or medical therapies is actually a constraining factor in getting to AGI. 

GM: I think that the part that you’re not fully taking on that is important to me is that it’s often easy to build demos and hard to get things reliable.

BG: I mean, building a home service robot…

GM: Yeah.

BG: doesn’t have the same safety issues as a driverless car. 

GM: No. But you’re still gonna find, for example, there’s a wide range of edge cases. Have you done them well enough? And you can constrain a particular commercial problem, you know. So, if you have a home robot, you don’t allow it to be used in Victorian houses, only in, you know, nursing homes or retirement homes that have a particular constraint.

BG: That’s a different point. Yeah. I’m just saying I don’t think the practical safety concerns you have with self-driving cars or medical applications are a constraint.

GM: I understand. I’m granting that point.

BG: Okay. 

GM: I’m concerned that people might do things without doing the requisite safety, but I’m granting that point. But I’m still saying that most stuff, we build a prototype, for example, and then it’s really hard to make it work reliably enough, and it’s possible…

BG: Then the objection is just “AGI is hard” basically.

GM: Well, in some sense, that is part of it. But more broadly, engineering complex systems is hard. And every time we have tried to do it, whether it’s the Denver Airport baggage thing or the SDI Star Wars defense thing, or…

BG: These are all examples where life and limb of humans are involved each step of the way, though, and those are going to be slower.

GM: Yeah, but they’re also, you’re getting sidetracked on it. That is true. I grant that point. But there’s also the case that complex software just usually doesn’t work. You know, it usually has problems. It needs to be debugged. We’re not that good as a society engineering complex software.

BG: So the software that we’re building is all based on very nice functional programming and math. So it does work.

GM: I know. I hope it works.

BG: I mean, there are many decades of functional programming and logic programming theory aimed at exactly this point of making, like, formally verifiable software, right?

GM: I see the appeal. And I would, you know, rather write things in Rust or Haskell or something like that than write them in, you know, shitty C code with a lot of buffers.

BG: Yeah. Yeah. Yeah. 

GM: Like that will help.

BG: Right. Right. Right.  

GM: There’s probably going to be a lot of moving parts. 

BG: The commercial world is not using these languages because they’re hard for the average human programmer’s brain. But on the other hand, if you’re hiring people smart enough to build an AGI, they can learn. 

GM: They can, they can handle it.

BG: Right, right. Yeah. Yeah.

BG: So that is true: building complex systems can take a long time.

GM: So let me give one more argument, and then maybe we should wrap up, which is: the cognitive prerequisites that I’ve spent most of my career thinking about, I don’t really see solved in the popular tradition. I understand you have some avenues towards them.

BG: Right.

GM: But in general, we don’t have good solutions now to maintaining world models over time for generalizing or for abstraction. For distinguishing types and tokens. Lots of basic stuff that the LLM world has just not confronted.

BG: Yeah. Yeah.

GM: I can see why a neuro-symbolic approach might do better, but I think that no neuro-symbolic approach has seen the kind of productionizing that we need, except in very narrow domains.

BG: True.

GM: So AlphaFold, for example, is a neuro-symbolic system.

BG: Yeah.

GM: But it’s very narrowly constrained. The world has not yet seen – although I know you’re trying to work on it…

BG: So was DeepMind’s solution to StarCraft; it was neuro-symbolic.

GM: Yeah, but in the end, it wasn’t that great, is my understanding. I don’t know the final verdict on it.

BG: It’s good at playing StarCraft. 

GM: Even there, I think there were some issues, but I’m not sure. But in general, they’ve been narrowly engineered, right? The best neuro-symbolic systems have been narrowly engineered.

BG: No, we’re not there yet. 

GM: So that’s the gap, to me, is you need to be able to address all the things…

BG: So, what I’ve been spending the last two years on is building the first massively scalable substrate for symbolic AI.

GM: I think that’s what you need to do.

BG: I can get a trillion logical nodes and links in RAM on one machine, and then efficiently run things on them on multiple cores, with concurrency, right? And so now we’re just starting to experiment with cognitive architectures on this.
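For readers who want a concrete picture, here is a purely illustrative sketch of a weighted, labeled metagraph of the general kind Ben is describing: atoms are either nodes or links, and links can point at other links as well as at nodes, which is what makes it a metagraph rather than an ordinary graph. This is a toy, not Hyperon’s actual data structure or storage format.

```python
from dataclasses import dataclass, field

@dataclass
class Atom:
    """One atom in a toy weighted, labeled metagraph (illustrative only)."""
    label: str                                            # e.g. "Concept:cat" or "Inheritance"
    weight: float = 1.0                                    # e.g. a truth or attention value
    targets: list["Atom"] = field(default_factory=list)   # empty for nodes, non-empty for links

cat = Atom("Concept:cat")
animal = Atom("Concept:animal")
inherits = Atom("Inheritance", weight=0.95, targets=[cat, animal])          # a link between nodes
evaluated = Atom("EvaluatedBy:reasoner-1", weight=0.8, targets=[inherits])  # a link about a link
```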

GM: So I can’t personally vouch for the work, because I haven’t gone through it.

BG: Yeah. Yeah. 

GM: But to me, that sounds like an important step, but also, that there are multiple important steps.

BG: There’s a lot of research.

GM: How do you reason over partial information, for example? And you know, the kinds of things you’re doing are valuable towards that.

BG: Yeah.

GM: But they’re not answers to that. And my view is like, there are dozens of things that need to be answered.

BG: Yeah.

GM: And usually answering a question actually takes a long time. But take transformers, right? We had them in 2017, but really doing something with them took, depending on how you want to count it, five or six or seven years. They still don’t quite do the things we want. So to my mind, there are like hundreds of puzzles we haven’t solved.

BG: Yeah.

GM: And it would be miraculous or horrible, depending on the outcome, but improbable, shall we say, if we were to solve all of those in five years.

BG: My thinking is, a lot of us have actually solved a lot of these problems conceptually and tested them on small prototypes over the last few decades. When we have a scalable system to try them out on, things may come together quite nicely.

GM: I won’t logically rule that out.

BG: Yeah?

GM: But I’ll also say I’ve seen a lot of demos that you know took longer to be realized.

BG: It’s certainly true. But we haven’t seen this amount of resources going into the problem ever before, either, right? Which is quite, quite different.

GM: That’s part of why I hedge my bets.

BG: Yeah. Yeah.

GM: And like, I think the chance that it’s gonna come out of LLMs is zero.

BG: Yeah.

GM: But the chance that someone else, could be you, could be someone else, has a different architecture that works is a little bit higher. Like, maybe. Like, I don’t think there’s enough societal support for looking at alternatives. You’ve got some money to do it. But in general, we don’t have the same amount of, sort of, person-hours on these alternatives.

BG: Well, that’s right. Yeah. I’ve been there. Investors are herd animals. 

GM: Investors are herd animals.

BG: For science as well as for business.

GM: I think when AGI does arrive, maybe it’s 2029, maybe it’s 2039, maybe it’s 2049. When it arrives, people are going to look back at the early 2020s and say, “Why did they spend so much effort over here, when the answer is over there?”

BG: We can look back at all of human history, and say “What the fuck were these people doing?”

GM: And, on that note. 

BG: Yeah. Yeah. Yeah. No. I think uh…

GM: It’s great. 

BG: It’s a good conversation.

GM: Always a pleasure, Ben.

BG: Yeah, I think we were really, really close together in perspective compared to the vast majority of, I think, the AI people today, and it’ll be quite interesting.

GM: And someday, we will both be proven right. Right now, people have their own ideas about LLMs.

BG: For sure.

GM: But someday, I think people will see.

BG: It’s interesting. With things like this Apple paper, people are somehow taken aback. When…

GM: You shouldn’t be surprised.

BG: This is, like, insanely, insanely obvious from the get-go. So yeah, there’s going to be a lot more things like that, right?

GM: Indeed. All right, well.

BG: All right.

GM: Thanks a lot. 

BG: Thanks. Yeah.