AI Does Not Exist

This is the script to my video on AI, which you can watch here.

In 2021, University of Washington linguistics professor Emily Bender, Google Ethical AI Team co-leaders Timnit Gebru and Margaret Mitchell, PhD student Angelina McMillan-Major, and three more unnamed authors, all Google employees, published a scientific paper called “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜”. The paper was a criticism of trends in AI development towards Large Language Models, or LLMs, detailing the risks of the ballooning size of language models, including social, environmental, and financial harms, with a suggestion that AI researchers and companies maybe try something else…

And a few months before the paper was published, Timnit Gebru was fired from Google for refusing to withdraw the paper or redact her name from it. A couple of months later, Margaret Mitchell was fired as well.

So, what the fuck happened here? It’s pretty simple, if you stop and think about it. Gebru and Mitchell were fired for daring to point out the obvious: AI does not exist.

AI DOES NOT EXIST
BY CHAIA ERAN
VOICEOVER BY ENVIRONAT, NEAT ON THE ROCKS, AND SABERA MESIA
MUSIC BY RAFAEL KRUX
PART ONE: MACHINE LEARNING?

Now, I know you’re probably sitting there going, “Chaia, what do you mean, ‘AI does not exist’? Look, here’s ChatGPT! Here’s Stable Diffusion! How can you possibly claim that AI’s not real?” And my answer to that is yes, ChatGPT exists. Yes, Stable Diffusion exists. But there’s a difference between the actual technology as it exists, and the myth of AI that it’s used to prop up. So, I’m gonna start by demystifying how LLMs actually work — I’ll save image generators like Stable Diffusion for another day, or never. I fully admit that this isn’t my exact forte — my degree is in applied computer science — but this entire schtick is built on people just going, “Well, it makes no sense, so I must just not understand it,” and ending their critical thinking there. So, I’m gonna give it a fair shake, given my understanding of computer science and the scientific literature out there. No AI was used to write any part of this video, and as always, I’ll have my sources in the description down below.

So, first and foremost, let’s talk about machine learning and neural networks. These are computer science terms referring to a specific form of computing system that’s able to do interesting things when provided with data. As a basic example you might be familiar with, let’s look at MarI/O, a neural network created by Youtuber SethBling back in 2015. It’s a neural network model trained to play a single level of Super Mario World.

SethBling generated a bunch of random neural networks, with a simplified display of the level as input, and controller inputs as output. The first several networks just stood there until they timed out, until eventually a network was generated that simply held right on the D-Pad. Each network’s run is scored with a value called “fitness”, based on how far into the level the network gets and how quickly it gets there. When enough networks have a decent fitness score, those networks get random mutations applied to them, and the process begins again. In this way, the neural network “learns” to play Mario, as only the networks that do well at the game get iterated on.
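If you want to see the shape of that loop in code, here’s a minimal sketch in Python. It is not SethBling’s actual implementation: MarI/O uses the NEAT algorithm, which also evolves the structure of each network, and run_level below is just a placeholder for actually playing the level in an emulator. But it shows the generate-score-mutate cycle described above.

```python
# A minimal sketch of fitness-based neuroevolution, NOT SethBling's real code.
# Networks are just weight tables here; the real thing also evolves structure.
import random


def random_network(n_inputs, n_outputs):
    # weights[i][j] connects input tile i to button j.
    return [[random.uniform(-1, 1) for _ in range(n_outputs)]
            for _ in range(n_inputs)]


def mutate(network, rate=0.1):
    # Copy the network, nudging every weight by a small random amount.
    return [[w + random.gauss(0, rate) for w in row] for row in network]


def run_level(network):
    # Placeholder fitness: in MarI/O this would play the level in an emulator
    # and score how far right Mario got and how quickly.
    return random.random()


def evolve(generations=50, population=20, n_inputs=169, n_outputs=8):
    networks = [random_network(n_inputs, n_outputs) for _ in range(population)]
    for _ in range(generations):
        scored = sorted(networks, key=run_level, reverse=True)
        survivors = scored[:population // 4]              # keep the fittest quarter
        networks = survivors + [mutate(random.choice(survivors))
                                for _ in range(population - len(survivors))]
    return networks[0]                                    # best network found
```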

That’s a very basic model, but it’s enough for you to get the picture. You have an input, a process, and an output, and that process isn’t hard-coded by a person, it’s developed almost organically through trial-and-error. That development is called “training”. Each “neuron” in the network contains a number called an “activation”, and each link between neurons has a positive or negative value called a “weight”. The first “layer” of neurons is derived entirely from the simplified game input, with solid ground having an activation of 1, and enemies having an activation of 0. Each neuron’s activation is then multiplied by the weight of its connection to a neuron in the next layer, and all of those weighted activations are added together to produce that neuron’s new activation. A “bias” value is then added to the sum to nudge the result up or down. The network goes through a few layers like this until it reaches the final layer, where a sufficiently high activation of a neuron leads to the corresponding button being pressed.
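To make that arithmetic concrete, here’s the computation for a single neuron written out as a few lines of plain Python. The numbers are invented, and real networks usually also squash the sum through an activation function (a ReLU is used here as an example); it’s just the multiply, sum, add-the-bias step in code.

```python
# One neuron's worth of the computation described above, with made-up numbers.
def neuron_activation(incoming_activations, weights, bias):
    total = bias
    for activation, weight in zip(incoming_activations, weights):
        total += activation * weight          # weight each incoming activation
    return max(0.0, total)                    # ReLU: clamp negative sums to zero


# Four input "tiles": 1 for solid ground, 0 for empty space.
inputs = [1, 0, 1, 1]
print(neuron_activation(inputs, weights=[0.5, -0.2, 0.8, 0.1], bias=-0.3))
```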

This system of weights and activations is typically represented as a vector of activations being multiplied by a matrix of weights, then adding a vector of biases. So really, each neuron contains a function that takes the weighted activations of all the previous neurons that connect to it, sums them, adds the bias, and spits out a new number. Now, SethBling’s case is actually pretty interesting, because he hasn’t given the network anything to compare itself against but fitness. He’s let the neural network “figure it out” on its own, but most complex neural networks are instead given practice data to train on, before being sent out to deal with new data. There are two forms of learning from training data, called supervised and unsupervised learning.
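Written as matrix math, a whole layer of that computation collapses into one line. Here’s a toy sketch using NumPy with made-up sizes and values, not code from any real model, just to show the “weights times activations plus biases” form described above.

```python
# One layer as a matrix-vector product: new_activations = ReLU(W·a + b).
import numpy as np

activations = np.array([0.0, 1.0, 1.0, 0.0])       # previous layer's activations
weights = np.random.uniform(-1, 1, size=(3, 4))    # 3 neurons, 4 incoming weights each
biases = np.random.uniform(-1, 1, size=3)          # one bias per neuron

new_activations = np.maximum(0, weights @ activations + biases)
print(new_activations)                             # the next layer's activations
```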

Under supervised learning, the training data is hand-labeled by people, and there’s an expected output for each input. When the network runs on the training data, its output is compared against the expected result, and a “cost” is calculated from how far off it was. Using an algorithm called gradient descent — which involves calculus I’m horrible at and won’t really go into — the weights and biases are readjusted to shrink that cost, nudging the network towards producing the desired output from the training data. Under unsupervised learning, the training data is unlabeled, and the model is left to make connections on its own, such as clustering similar data together, or making associations. Supervised learning is more accurate, but, well, you have to label all the data by hand first. It’s a lot more work. Supervised models can make predictions, while unsupervised models just group data together.
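To make the “adjust the weights to shrink the cost” idea concrete, here’s a toy supervised-training loop in Python. It fits a single weight to a few hand-labelled examples of y = 2x; real networks do this over millions of weights at once using backpropagation, but the walk-downhill-on-the-cost idea is the same.

```python
# Gradient descent on a single weight, as a toy example of supervised training.
def predict(weight, x):
    return weight * x


def cost(weight, data):
    # Mean squared error between predictions and the hand-labelled targets.
    return sum((predict(weight, x) - y) ** 2 for x, y in data) / len(data)


def gradient(weight, data):
    # Derivative of the cost with respect to the weight.
    return sum(2 * (predict(weight, x) - y) * x for x, y in data) / len(data)


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # labelled examples of y = 2x
weight = 0.0
for step in range(100):
    weight -= 0.05 * gradient(weight, data)    # step downhill on the cost

print(weight, cost(weight, data))              # weight ends up very close to 2
```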

ChatGPT is a semi-supervised model: first it undergoes unsupervised learning on its training data, then a supervised fine-tuning on a smaller dataset. But if you’ve been paying attention, you may have noticed that calling this “machine learning” is actually a misnomer. This may be considered splitting hairs, but I think it’s important to note that this is far more akin to tuning than learning. The program is using math to fit its output more closely to the desired one. It’s all just linear algebra and calculus. There are different models and algorithms you can use, but it all comes back to mathematics. In the case of ChatGPT in particular, the neural network involved is a transformer model. I don’t particularly want to get into the weeds of how exactly all the different models work, but the fundamental principles are the same between them.

So, with ChatGPT, when you input a prompt, it breaks your prompt down into tokens, which are made up of words or sub-word pieces like prefixes and suffixes, each of which is given a numerical representation. These tokens serve as the inputs for the first layer of neurons in the network. From there, ChatGPT’s neural network finds what the most likely next token in the sequence is, based on its training data — sort of. In actuality, picking the most likely token every time makes for surprisingly boring sentences, so ChatGPT has a variable called temperature that adds a level of randomness to the sequence. Temperature is a decimal number between 0 and 2 that defaults to 1. A temperature of 0 will have no randomness at all, while a temperature of 2 will be really off-the-wall. ChatGPT does this one token at a time, adding each newly chosen token to the sequence and continuing until it reaches a cutoff point.
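Here’s roughly what that temperature knob does, sketched in Python. The token scores below are invented and this is not OpenAI’s actual sampling code; it just shows how dividing by the temperature before converting scores into probabilities makes the pick more or less predictable.

```python
# Temperature-based sampling over made-up next-token scores.
import math
import random


def sample_next_token(scores, temperature=1.0):
    if temperature == 0:                       # no randomness: always the top token
        return max(scores, key=scores.get)
    # Divide each score by the temperature, then softmax into probabilities.
    # Low temperature sharpens the distribution; high temperature flattens it.
    exps = {tok: math.exp(s / temperature) for tok, s in scores.items()}
    total = sum(exps.values())
    return random.choices(list(exps), weights=[e / total for e in exps.values()])[0]


scores = {" blue": 2.0, " cloudy": 1.2, " falling": 0.3, " sarcastic": -1.5}
print(sample_next_token(scores, temperature=0.2))   # almost always " blue"
print(sample_next_token(scores, temperature=1.5))   # noticeably more varied
```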

OK, I think that’s most of the mathy stuff out of the way. I think I need to make clear that yes, this is computer science. Everything I’ve said so far is actual research done on an actual technology, and I’m not denying the existence of that technology. Like, just to be clear, I don’t have a problem with MarI/O, or neural network tuning, which I think is a more accurate term than “machine learning”. To get into my real point, we need to talk about stochastic parrots.

PART TWO: STOCHASTIC PARROTS

To drive home what, exactly, is going on with LLMs, I’m going to draw from the work of Emily Bender, that linguistics professor from earlier, to talk about form and meaning. Form is the shape that language takes, while meaning is the relationship between that form and something else outside of the language. To give you a simple example, let’s take the phrase, “I love you.” The form of this phrase is that of a transitive verb relating a first-person pronoun to a second-person pronoun. The meaning of the phrase, or at least, a possible meaning among several, is that “I”, the person speaking, “love”, that is, hold dear or cherish, “you”, the person listening. The meaning relates the words to things and actions that are out there in the world, whether they’re material, like you or me, or abstract, like love.

So, my question is, can LLMs understand meaning? To answer this, there are two related thought experiments I’d like you to consider. The first one is a famous experiment from 1980 by philosopher (and sexual harasser) John Searle called the Chinese Room Experiment. It goes like this:

Imagine that a man is locked in a room with a large batch of Chinese writing. The man is a native English speaker who doesn’t understand Chinese at all. He doesn’t even know that what he’s looking at is Chinese, for all he knows it could be Japanese, or Korean, or just meaningless squiggles. Now imagine that he’s given a second batch of Chinese characters, and instructions written in English for connecting the first batch of characters to the second one. Finally, he’s given a third batch of symbols and more instructions to correlate them with the first and second batch. Now, imagine that he gets really, really good at following the instructions, and it turns out that when he matches characters together, what he’s really doing is answering questions. A native Chinese speaker on the other side of the door is writing down questions, and the man in the room is writing down answers. He gets good enough at following the rules that the person outside the room can’t tell the difference between his answers and answers given by a native Chinese speaker. Does the man now understand Chinese? Searle says that no, he doesn’t. He still has no idea what any of the words he’s connecting mean, he’s just following formal rules.

I think you can see where I’m going with this. If you wanted, we could modify the Chinese Room Experiment to better reflect how LLMs are trained. What if, rather than having the rules written down for him like a normal program, the man was given a sequence of Chinese characters and told to pick which one comes next, then told if his answer was right or wrong? Through enough guesswork and trial and error, he’ll eventually make logical connections about how Chinese characters relate to each other and reach the same level of proficiency, such that an outside observer can’t tell the difference between his writing and a native speaker’s… but he still doesn’t understand Chinese! He still doesn’t know what the words mean.

Now, people have been debating Searle’s experiment for over 40 years now, so let’s not get too bogged down in it. Instead, we’ll move on to the next experiment, called the Octopus Test, by our old friend Emily Bender. Let’s say that Alice and Bob, two native English speakers, are shipwrecked, and each washes up on a separate uninhabited island. In exploring the islands, they find that some previous inhabitants have left behind a telegraph line — think texting, but over a wire. Alice and Bob start happily sending each other their theories about Tears of the Kingdom — which has probably been released by the time this video gets out, but as of this recording is still about a week from launch. [And as of this transcription, it's been out for a while.]

Now, let’s say a hyper-intelligent deep-sea octopus who can’t visit either of the islands stumbles across the telegraph line, and figures out how to listen in on Alice and Bob’s theories. Yes, that’s weird, but it’s a thought experiment, bear with me. The octopus doesn’t understand any English, but it is really, really good at statistics and pattern-matching. The octopus has as much time as it wants to listen to Alice and Bob’s discussions and figures out how to accurately predict how Bob will reply whenever Alice sends a message. The octopus also learns that some words tend to go together in similar contexts, and maybe even learns to generalize by interchanging some patterns. Either way, the octopus can’t actually see any of what Alice and Bob are talking about.

Eventually, the octopus gets lonely, and cuts the cable, inserting itself into the conversation by pretending to be Bob. And for a while, it works, because the octopus spent a long time listening to Alice and Bob talk about Tears of the Kingdom. Now let’s say Alice starts talking about how excited she is for Diablo 4 and asks “Bob” what he thinks. Now, the octopus doesn’t know anything about Diablo, or even really what a video game is, but it recognizes that Alice is using similar wording to when she was talking about Tears of the Kingdom, so it falls back on how Bob responded to hype for that game, and tells Alice how it looks good, but it’s going to wait for reviews.

One day, however, Alice encounters an angry bear. She grabs a few sticks and frantically telegraphs Bob, asking for advice on what to do, and how to make a weapon to defend herself. To this, the octopus responds that she should use the Ultrahand to add durability to her weapon. The octopus has failed the Turing test, and Alice exits, pursued by a bear.

So, OK, maybe I’m laying my point on a little thick, but LLMs are trained solely on syntactical form. As of right now, we don’t really have a way to computationally quantify meaning, so the very smart folks over at OpenAI, Microsoft, and Google tried to get around that by falling back on statistical relations between words instead of semantic relations. I’d even go so far as to argue that we can’t quantify meaning, specifically because meaning exists outside the formal system of language. The entire point of meaning is the relation between the formal linguistic system, and something embodied in the world. No matter how much clever math and statistics we apply to the formal system, that’s not something we’re going to be able to model.

So, what this results in is an LLM that’s very, very good at replicating the form of a language, but completely and totally incapable of carrying any meaning in its output.

Text generated by an LM is not grounded in communicative intent, any model of the world, or any model of the reader’s state of mind. It can’t have been, because the training data never included sharing thoughts with a listener, nor does the machine have the ability to do that. This can seem counter-intuitive given the increasingly fluent qualities of automatically generated text, but we have to account for the fact that our perception of natural language text, regardless of how it was generated, is mediated by our own linguistic competence and our predisposition to interpret communicative acts as conveying coherent meaning and intent, whether or not they do. The problem is, if one side of the communication does not have meaning, then the comprehension of the implicit meaning is an illusion arising from our singular human understanding of language (independent of the model). Contrary to how it may seem when we observe its output, an LM is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot.

— Bender and Gebru, et al., “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?”

But all of that is just a critique of language models in general. When you factor in the critique of large language models, you start to see even more problems. See, when they say this model is large, what they’re referring to is the size of its training data… and the training data for ChatGPT is enormous. ChatGPT was trained on five datasets totalling 499 billion tokens. When you have that many tokens to work through, it’s simply impossible for a person, or even a team of people, to document them all. This leads to what Gebru and Bender call documentation debt. This means that if something goes wrong with the model, there’s no way to figure out what’s causing the issue, because nobody wrote down what data was used, because there’s just way too much of it to document! And the training data for GPT-4 is allegedly five times larger than that — though we don’t actually know, since OpenAI refuses to release any information about GPT-4.

And if that’s too much to document, that’s certainly too much to comb through for, say, racial slurs, to say nothing of more subtle biases. But that didn’t stop the plucky team over at OpenAI, who solved the problem by building another neural network trained on all the vilest shit in the dataset and using that neural network to scrub the shit from ChatGPT’s training data. And how did they do that, exactly? Why, by sending tens of thousands of utterly vile text snippets to Sama, a firm that employs workers in Kenya, Uganda, and India to label data for Silicon Valley clients! …For $2 an hour… while denying employees access to the counseling they were contractually entitled to… and then we only found out thanks to investigative journalism from TIME magazine…

Oh, hey, African content moderators unionized! Not just the ones for ChatGPT, but Facebook and TikTok moderators too! Mazel tov! That can’t undo the harm that was done, but it’s a step forward.

So, look, there are plenty of other problems with LLMs in their current form, including the environmental cost of training a model, which is compute-intensive in the same way as crypto mining, and Bender and Gebru do a really good job of detailing the issues in their paper. But there’s one issue that overpowers any other criticism of LLMs and neural network tuning, and it’s one we have to talk about: The myth of AI, and the relentless grift it cloaks.

PART THREE: THE EMPEROR HAS NO CLOTHES ON

If you’ve been following the news on this stuff, you’ve seen the hype, and you don’t really need me to recap it for you. Google’s CEO going on 60 Minutes and claiming that Bard has “emergent properties”, Forbes articles gushing about businesses using ChatGPT to identify their weak spots, OpenAI CEO Sam Altman releasing blog posts about “preparing for AGI”, Washington Post articles claiming that ChatGPT “feels like magic”, it’s all over the place. If you want a highlight reel, there’s a great collection of both articles and dissections of them on Professor Bender’s Medium, I’ll link it in the description alongside her papers.

What I want to get into is the modus operandi behind all this marketing. And what we’re seeing is, in my opinion, a FOMO, or fear-of-missing-out, scam being driven by excessive hype. Nobody wants to be Mike Himowitz poo-pooing the iPhone in the Baltimore Sun right before it changed mobile electronics forever, and so-called AI companies are taking advantage of that hesitance to just outright make shit up. And because nobody wants to be wrong about this, the massive tide of hype is self-sustaining and self-reinforcing. Nobody wants to admit they can’t see the emperor’s new clothes, so the hype can get further and further from reality, because what if it’s true? A lot of people reporting on this stuff don’t know the first thing about machine learning or linguistics… because that’s not their job. They’re trusting the people with the social capital to be considered experts in the field, like the CEO of Google, and those people are feeding them bullshit to make money. And this leads into the scarier stuff, including the letter.

In March of this year, the Future of Life Institute released an open letter calling for a six-month pause on the training of new LLMs “more powerful than GPT-4”. The letter claims that AI systems are now becoming human-competitive at general tasks, and that AI development is on a “dangerous race to ever-larger unpredictable black-box models with emergent capabilities”. They claim that AI R&D should be focused on making AI systems “aligned, trustworthy, and loyal.” Several of their citations talk about “AI alignment” and “existential risk”, and the letter was signed by over 27,000 people, including Turing Award winner Yoshua Bengio, Yuval Noah Harari, Steve Wozniak, Andrew Yang, and of course, Elon Musk.

The concept of AI alignment is the idea that if we do make a sentient AI, there’s a danger that it could turn against us, like Skynet, or the Machines in The Matrix. So, we need to make sure that the AI is aligned with human values, otherwise it could destroy humanity! And who knows what could happen if China or Russia were to make an eeeeeevil AI, so we need to hurry up and make a good one that COULD BEAT THEIR EVIL ONE

Now, this is patently ridiculous, but the Future of Life Institute may actually believe what they’re saying… because they’re longtermists. Longtermism is a ridiculous ideology that coined the term “existential risk”. It holds, in short, that the future will contain trillions of digital humans whose potential lives carry the same moral weight as that of a living human in the present; that because there are trillions of them, their lives matter more than those of the 8 billion real people who actually exist; and that we should therefore do whatever we can to ensure their existence, no matter how many real people get hurt along the way. In this context, “existential risk” means a risk to the existence of those trillions of hypothetical future people, not necessarily any risk to people in the present. And that’s not just me being cynical; Nick Bostrom, one of the first people cited in the letter and one of the founders of longtermism, wrote a paper in 2013 explaining this reduction of human lives to mathematical weighting, and after estimating how many people could potentially live on Earth over the course of its existence, he came to the conclusion that, and I quote,

It's possible that the Earth might, in a good-case scenario, remain habitable for at least another billion years. Suppose that one billion people could live sustainably on this planet for that period of time, and that a normal human life is, say, a hundred years. That means that 10 to the power of 16 human lives of normal duration could be lived on this planet if we avoid existential catastrophe. That has the implication that the expected value — this is when you multiply the value with the probability — The expected value of reducing existential risk by a mere one-millionth of one percentage point — this is such a small reduction that it's unnoticeable, but — Reducing existential risk by a mere one millionth of one percentage point is at least a hundred times the value of a million human lives.

— The end of humanity: Nick Bostrom at TEDxOxford

Even if we use the most conservative of these estimates, which entirely ignores the possibility of space colonization and software minds, we find that the expected loss of an existential catastrophe is greater than the value of 10^16 human lives. This implies that the expected value of reducing existential risk by a mere one millionth of one percentage point is at least a hundred times the value of a million human lives. The more technologically comprehensive estimate of 10^54 human-brain-emulation subjective life-years (or 10^52 lives of ordinary length) makes the same point even more starkly. Even if we give this allegedly lower bound on the cumulative output potential of a technologically mature civilization a mere 1% chance of being correct, we find that the expected value of reducing existential risk by a mere one billionth of one billionth of one percentage point is worth a hundred billion times as much as a billion human lives.

— Nick Bostrom, “Existential Risk Prevention as Global Priority”

But don’t worry! AI alignment is fake, just like the fearmongering about existential risk. Remember, none of the neural networks we have are even on the road to intelligence, it’s just linear algebra and calculus. We don’t have AI.

Jonathan Frakes: It's false.

The very first source on that letter is actually Bender and Gebru’s paper, the one that I started this video talking about, but it cherry-picks quotes from the paper completely out of context and makes the exact mistake the authors of the paper were warning about; and that’s not me saying that, the authors of the paper came out with their own statement denouncing the misinterpretation of their work.

The fearmongering is actually the same marketing as the hype but playing on fear instead of excitement. When Sam Altman says that he’s scared of what’s coming out of his own company, he’s not going to stop making it. No, instead he’s going to refuse to release any metrics about GPT-4, because what if the information about the LLM he’s selling for money fell into the wrong hands and got used for evil? Clearly the safest place for it is in the pocket of a for-profit company. And clearly, OpenAI and Google and Microsoft simply have to work on this technology, because what if some less scrupulous actor got to it first, so clearly, we should give them as much money as we can to do this!

Like, I almost feel insane having to point this out, because it’s just so patently obvious that they’re full of shit. It’s the dumbest possible reskin of Pascal’s wager to try and get more tech companies to jump onboard the FOMO train and give them money. Believe and give us money, and you have a shot at an AI paradise. Disbelieve and don’t give us money, and you risk an AI hell. But even aside from the marketing grift, all this fear about rogue AI and AI alignment has a second, somewhat darker purpose: Ethics-washing.

See, people are skeptical of Big Tech, they always have been. And when self-driving Teslas kill someone, or crime algorithms disproportionately target people of colour, goodwill evaporates pretty quickly. So, it might be understandable that people want these companies to, I don’t know, maybe stop what they’re doing, or at the very least do it in a way that doesn’t hurt people. Enter ethics-washing. Think of how often you hear tech companies talk about the problem of “solving bias” in their algorithms. It’s true that a racist algorithm is a problem, but it’s a problem whose root cause is the fact that we live in a racist society, and when the focus is on technical solutions to fix the algorithm, they don’t have to answer whether they should even be using an algorithm in the first place.

Jeff Goldblum: Your scientists were so preoccupied with whether or not they could that they didn't stop to think if they should.

Not to mention, even when these companies do post their ethics policies, they’re often vague and undefined, and according to AI ethicist Dr. Thilo Hagendorff, ethics policies have no impact on what the engineers at these companies actually do.

In practice, AI ethics is often considered as extraneous, as surplus or some kind of “add-on” to technical concerns, as unbinding framework that is imposed from institutions “outside” of the technical community. Distributed responsibility in conjunction with a lack of knowledge about long-term or broader societal technological consequences causes software developers to lack a feeling of accountability or a view of the moral significance of their work. Especially economic incentives are easily overriding commitment to ethical principles and values. This implies that the purposes for which AI systems are developed and applied are not in accordance with societal values or fundamental rights such as beneficence, non-maleficence, justice, and explicability.

— Dr. Thilo Hagendorff, “The Ethics of AI Ethics: An Evaluation of Guidelines”

The ethics policies are kind of just there for the sake of existing, a document they can point to and say, “look, we’re working on it,” and then not actually follow through. The one thing these companies are scared of above all is government regulation, but by self-regulating, or at least keeping up the appearance of self-regulation, they can essentially tell the government that there’s no need to look too closely, they’re taking care of themselves. And by putting their focus on big scary questions like AI alignment, it lets the other, less terrifying but more important ethical concerns slip through the cracks.

Or, you know, they could just fire ethicists who bring up real problems, like Timnit Gebru and Margaret Mitchell. Hell, why not just pull a Microsoft and fire the entire ethics team! That way you don’t even have to pretend to care!

A COMPUTER CAN NEVER BE HELD ACCOUNTABLE; THEREFORE A COMPUTER MUST NEVER MAKE A MANAGEMENT DECISION

Look, this slide is from IBM in 1979. It lays down the single most important rule of ethical computing, and these companies have run roughshod over it for the sake of their grift. And all of this hype, all this incessant marketing, for a fake technology that doesn’t exist, and a handful of stochastic parrots being hailed as a god.

"But he hasn't got anything on," a little child said.
"Did you ever hear such innocent prattle?" said its father. And one person whispered to another what the child had said, "He hasn't anything on. A child says he hasn't anything on."
"But he hasn't got anything on!" the whole town cried out at last.
The Emperor shivered, for he suspected they were right. But he thought, "This procession has got to go on." So he walked more proudly than ever, as his noblemen held high the train that wasn't there at all.

— Hans Christian Andersen, “The Emperor’s New Clothes”

Now if you’ll excuse me, I’m going to go scream into a pillow.

Bibliography
