The Anthropomorphic Interface

Well, here we are, the future has arrived. We are living in the age of AI, our heads spinning with the possibilities and dangers, trying to understand what it all means for art, society, our jobs, our lives. These are big questions, and this is a humble programming blog (at least it was originally intended to be a programming blog, although in fact you will find little code here). But like everyone else, I have to try to get my head around it all and adjust to this latest upheaval.

While the world of technology is in constant flux, as anyone trying to keep up with the latest generation of JavaScript frameworks can attest, there are still certain aspects of how we interact with computers that have remained fairly stable over years and even decades, certain technologies and patterns of use that are the foundations on which the latest thing is built. But today we are seeing something new emerge, something that deserves to be considered a new foundation on which new ways of using computers can be erected.

Until recently, the two primary modalities of interaction with computers were the command line and the graphical user interface, the CLI and the GUI. The command line is for developers, sysadmins and power users. The GUI is for them and also for everyone else. On the command line, you can get the machine to do whatever it is you want, but only if you know the exact incantation to write, the details of which might be spread across ten different man pages, and there may be innumerable false starts and unexpected outcomes to work through before you arrive at just the right combination of commands. The GUI, on the other hand, lets you accomplish all kinds of things using a small vocabulary of interactions: point and click, drag and drop, make a selection and press the button. But a GUI only provides the set of functions that its developers foresaw the need for, without the CLI’s ability to combine commands in arbitrary ways to accomplish whatever goals you have, and so it trades flexibility for ease of use.

And that was how it was, for decades. Then along came LLMs, ChatGPT and the New Wave of Artificial Intelligence, and what was once a sci-fi trope has become a reality. And with it came a new way to interact with your computer: the “prompt”, or natural language interface, the wishes that the genie in the machine will grant you if you word them with enough care. As with the creation of a new genre of music, we can anticipate that it will be rejected vehemently by some, embraced enthusiastically by others, and will spur an explosion of experiments as its potentialities are explored.

The Natural Language Interface

It is well known that when ChatGPT came out it saw wide and rapid adoption, and it is not hard, after using it even for a few minutes, to see why. Natural language is the most natural interface imaginable, at least for a large number of things that you might want to do with a computer. With the natural language interface, computers now simply understand what you mean, for at least some value of “understand”. Instead of being required to spell everything out in excruciating detail, you can rely on it to get the gist of what you are saying, to not fall apart at the first misspelled word or missing punctuation mark. You don’t always get what you are looking for, but if you keep your expectations reasonable, you get some amazingly relevant results.

Given this level of apparent understanding on the part of our computer compatriots, it is natural that we would be tempted to extend to them the same “theory of mind” that we do to other people. However, I don’t think that, even with the impressive capabilities of AI models today, it is appropriate to anthropomorphize an LLM. Even with their creative capabilities, their facility at producing unique text, images, audio and video at the behest of the prompter, we don’t yet have what you could call the “anthropomorphic interface”, the interface that allows you to apply all of your intuitions about human interaction to your silicon counterparts. Here I discuss my reasoning for this assertion.

Minds and Models

John Searle’s “Chinese Room” is a widely known thought experiment meant to show that computers are not capable of understanding in the same way that people are, based on the idea that mere symbol manipulation does not constitute understanding. It is an influential meme, often appearing in discussions of the prospects for Artificial General Intelligence (AGI), likely because it allows you to concretely visualize the process of manipulating symbols with no understanding of what they are meant to symbolize. The limitation of this thought experiment is its assumption that computers are only capable of symbol manipulation. What it is missing is the concept of a model, of a representation of some domain in a medium like neurons or silicon. I don’t know enough neuroscience to speak with any authority about how the brain models the world, but as a brain user I will put forward the assertion that our understanding of the world is a product of how it is modeled in the medium of neurons, not just of the set of individual things one can know about but of the vast system of connections between them. It is not the mere ability to manipulate a set of symbols, but the application of that ability to a rich network of relationships between symbols, that gives rise to understanding.

The OG of AI, John McCarthy, said somewhere that a thermostat could be understood to have beliefs, although limited to three: “it is too hot”, “it is too cold”, and “it is the right temperature”. We could consider this a joke or a mere metaphor, but this way of looking at things does get at the concept of a model: the thermostat has a representation of the temperature and the set point within its makeup, an extremely simple model of a very specific part of the world. It is light-years away from the complex representations that humans carry within themselves of the world they inhabit, and yet it would be possible to place these two at points along a spectrum, and then put LLMs somewhere in between, as embodying a much more sophisticated set of representations and connections than the thermostat. As the richness of the set of representations increases, the comparison between it and what we could think of as a mind becomes less of an analogy and tends more towards identity.
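
To make that concrete (my own toy illustration, not McCarthy’s), the thermostat’s entire model of the world, and the three beliefs it supports, could be written down in a few lines of code:

    # A toy rendering of McCarthy's thermostat: its whole "model" of the
    # world is one reading and one set point, and its three possible
    # "beliefs" are just comparisons between them.
    class Thermostat:
        def __init__(self, set_point: float, tolerance: float = 0.5):
            self.set_point = set_point
            self.tolerance = tolerance

        def belief(self, temperature: float) -> str:
            if temperature > self.set_point + self.tolerance:
                return "it is too hot"
            if temperature < self.set_point - self.tolerance:
                return "it is too cold"
            return "it is the right temperature"

    thermostat = Thermostat(set_point=20.0)
    print(thermostat.belief(23.5))  # "it is too hot"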

The model of the world that we carry around with us is what allows us to engage with it, to recognize patterns, plan and reason about it with varying degrees of effectiveness. While I lack the scientific knowledge to explain how that works in detail, I think it’s clear that any artifact that we would be justified in calling intelligent must include a model with a certain level of sophistication. And that is something that can be said of LLMs.

Specifically, LLMs are very complex models of language, and because language is a way of modeling the world, this provides them with a complex model of the world. The statistical regularities of language provide them with a model of the world that language is about. And that is why they can do things that we take for granted but which computers were previously incapable of doing, despite their superhuman calculating abilities. The relationships between words, or the tokens that words can be broken down into, capture aspects of the world in terms of probabilities. The world is stable enough that if you simply rely on the probability that one token follows the previous string of tokens, you get results that are relevant and useful. And of course, because the map is not the territory, and no model, natural or artificial, is a complete representation of the world, you get some amount of nonsense being spit out by your chatbot, what we now call hallucinations.
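
To give a crude sense of what “relying on the probability that one token follows the previous string of tokens” amounts to, here is a toy bigram model in a few lines of Python. A real LLM conditions on a long context with billions of learned weights rather than a handful of word counts, but the basic move, sampling the next token from learned probabilities, is the same in spirit:

    import random
    from collections import defaultdict

    # Toy bigram "language model": count how often each word follows another,
    # then sample the next word in proportion to those counts.
    corpus = "the cat sat on the mat and the dog sat on the rug".split()

    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1

    def next_word(prev):
        followers = counts.get(prev)
        if not followers:
            return None
        words = list(followers)
        weights = [followers[w] for w in words]
        return random.choices(words, weights=weights)[0]

    # Generate a short continuation from a seed word.
    word, generated = "the", ["the"]
    for _ in range(6):
        word = next_word(word)
        if word is None:
            break
        generated.append(word)
    print(" ".join(generated))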

It is as if you put all of the ingredients in the world into a big blender and then could magically reconstitute any meal you wanted out of it. But that is not really what is going on, since an LLM does not model language as a uniform sludge of tokens but as a complex set of weighted relationships between them, which is what makes it a model rather than simply a big bucket of data.

Naturally this brings up all kinds of philosophical questions: what is language? how is language related to the world? what do we mean by “the world”? All interesting and fruitful avenues of inquiry, but here we will not attempt to delve too deeply into such questions, and will rely on our everyday intuitions about what such things mean.

So LLMs contain very sophisticated models of language, in some respects much more sophisticated than any individual’s, if we just consider their ability to access an immense number of facts embedded within their vectors. But just as intelligence is not simply symbol manipulation, at least intelligence as we know it, we can consider other aspects of our mental world that are missing from that of an LLM.

Intentional Systems

The philosopher Daniel Dennett has proposed that there are three “stances” that we can take towards things in our environment, which determine how we interact with them:

  1. The physical stance, which applies to anything we consider a simple object, with no more complex interaction possible than that provided by its physical characteristics.
  2. The design stance, in which we understand that an artifact is the product of a design process and provides certain affordances based on that design.
  3. The intentional stance, in which we understand the behavior of an “intentional system”, e.g. a person, in terms of their motivations and beliefs.

For example, we understand a waterfall only in terms of its physical properties, a shovel as a tool that has a specific purpose, so that its physical aspects are a function of that purpose, and the content of an email from a co-worker as a representation of their intentions and motivations for writing it. We can also apply all of these stances to the same thing: a pet can be considered a physical entity by a vet checking on its health, the product of millions of years of evolutionary design, and an individual with its own way of understanding the world.

Similarly, we can consider a computer system as a physical object or as the product of the extremely complex design processes that produced the hardware, operating system, drivers and application software. The question is then, now that AI is a thing, can we apply the intentional stance to our machines?

I don’t believe so. LLMs have an amazing facility with language because they are extremely complex models of language, but any intentions, motivations, beliefs, or other properties that we would ascribe to an intentional system are not part of that model. That doesn’t mean that I don’t believe that such a thing is possible, just that, no matter how sophisticated the current generation of AI models gets, no matter how much of what was previously believed could only be done by humans is now routinely handled by machines, the current AI technologies do not include intention in their models, and so cannot be considered intentional systems. That will only happen when it is actually possible to represent intention within a machine in such a way that it enables independent interaction with the world. Then we will be able to apply the intentional stance towards these artifacts, and will have the much-sought-after Artificial General Intelligence, the Philosopher’s Stone of AI alchemy.

Clippy Reborn

What we have today could be considered pseudo-intentional systems, which appear to exhibit intentional behavior, but the impetus is all on our side, with the machines good enough at pattern matching to appear to be following along. It is interesting to note that Clippy (technically Clippit), and the Microsoft Office Assistant suite that it was part of, was considered an AI application in its day, using Bayesian networks to attempt to guess at the user’s intent (“It looks like you’re writing a letter”). And this attempt famously failed, with Clippy earning ignominy as a nuisance rather than an assistant.
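
For a flavor of what guessing at intent looks like in Bayesian terms, here is a heavily simplified sketch, my own illustration rather than anything resembling the Office Assistant’s actual model: priors over a few possible intents, per-intent word likelihoods, and Bayes’ rule applied to what the user has typed so far.

    # A much-simplified sketch of Bayesian intent guessing (not the actual
    # Office Assistant model): combine a prior over intents with per-intent
    # word likelihoods, then normalize to get a posterior.
    priors = {"writing_a_letter": 0.2, "writing_code": 0.1, "other": 0.7}
    likelihoods = {
        "writing_a_letter": {"dear": 0.05, "sincerely": 0.04, "def": 0.0001},
        "writing_code": {"dear": 0.0001, "sincerely": 0.0001, "def": 0.05},
        "other": {"dear": 0.001, "sincerely": 0.001, "def": 0.001},
    }

    def posterior(observed_words):
        scores = {}
        for intent, prior in priors.items():
            score = prior
            for word in observed_words:
                score *= likelihoods[intent].get(word, 0.001)
            scores[intent] = score
        total = sum(scores.values())
        return {intent: score / total for intent, score in scores.items()}

    print(posterior(["dear", "sincerely"]))  # "writing_a_letter" dominates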

It looks like Clippy’s modern-day descendants are ready to pick up where it left off, with all kinds of applications gaining AI assistants, although now they are generally identified by name alone, with no cute mascot avatar to represent them. The computer assistant is the kind of idea that is compelling enough to survive any number of failures until technology up to the challenge comes along.

Clippy was a failure because understanding intention is a very difficult problem that does not yield to simple pattern matching. How would you encode an intention in silicon? I personally have no idea, but that is really no matter, since until ChatGPT came around I had no idea about transformers and the attention mechanism that made the current advances in AI possible, and they did not need to wait on me. One obvious marker of an intentional system is self-knowledge, and certainly ChatGPT can talk about itself. The question is, does it care about itself? It can produce words that could give that impression, if prompted correctly, but that is a different thing from how we would understand self-knowledge in an individual, who is not simply completing input but is driven by motivations, simple and complex.

Today we don’t just have chatbots but agents, and the concept of agentic AI, tool-using AI, certainly does seem to imply independent agency and intention. I am still getting a feel for using these AI assistants, getting my feet wet with Claude Code in my terminal and reading up on the subject. As of right now I don’t think that this is radically different: AI agents still work by matching an input with an output, although the output in this case might be the result of calling a command-line tool, making an HTTP request, or querying a database. But more research is needed.
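
For what it’s worth, my current mental model of that loop looks something like the sketch below. The call_model function is a stand-in for whatever LLM API is actually in use, and the tool set is invented for illustration; treat this as a rough sketch of the shape of the thing, not as a description of how any particular agent framework works.

    import subprocess

    # A bare-bones sketch of an agent loop: the model's output is either a
    # final answer or a request to run a tool, and tool results are fed back
    # in as further input.
    def call_model(messages):
        # Placeholder for a real LLM API call; imagined to return a dict that
        # is either {"content": ...} or {"tool": ..., "argument": ...}.
        raise NotImplementedError("stand-in for an actual LLM API call")

    def run_tool(name, argument):
        # Hypothetical tool set, for illustration only.
        if name == "shell":
            return subprocess.run(argument, shell=True,
                                  capture_output=True, text=True).stdout
        raise ValueError(f"unknown tool: {name}")

    def agent(task, max_steps=10):
        messages = [{"role": "user", "content": task}]
        for _ in range(max_steps):
            reply = call_model(messages)   # match input with an output...
            if reply.get("tool"):          # ...which may be a tool call
                result = run_tool(reply["tool"], reply["argument"])
                messages.append({"role": "tool", "content": result})
            else:                          # ...or a final answer
                return reply["content"]
        return "step limit reached"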

This doesn’t mean that people won’t take the intentional stance towards AI chatbots. With any system complex enough that you cannot see any of its working parts, it is natural, even parsimonious, to assume that it works according to its own set of intentions. People have been anthropomorphizing computers for as long as they have been with us. It is well documented that when the ELIZA program came out, people were willing to attribute intelligence to this simple program, so much so that its inventor had to write a book about why that is a bad idea. It demonstrates clearly that we are primed to assume that any language we encounter has intelligence guiding it, and no surprise, since that has been true for as long as language has existed. The idea that language could be broken down into its constituent parts and reassembled at your command into useful information, of language as a kind of database, speech without a speaker, is something that no generation before ours has had to fathom.