ChatGPT and LLMs -- Raf Summary for February, 2023
ChatGPT and the big stink everyone is making.
I am fortunate to be part of a small group that discusses machine learning and AI. They are the smart ones. Maybe I am the moderator. Or a fly on the wall. Something.
A recurring topic has been: We don't know how these LLMs work. "But Wolfram describes it step by step!" "That doesn't explain why GPT-3 does ___ and ___."
Now the guys saying this are serious math heads, and have been working in this area for some time. They read the science. They follow the math (that I sometimes try to parrot).
I am going to say two things here:
1) It appears, to Raf, that part of the problem is over-estimating what consciousness/sentience is.
2) Thinkers like Carmack and Wolfram are, in Raf's book, to be listened to. They say what they are thinking, and they are some of the clearest thinkers in all of Computer Science. (They aren't the only ones, but they are two we have been discussing recently.)
Part 1: Consciousness
Recent events and observational science (by Zuboff, Gottschall, McNamee and many others) show that humans are highly plastic. Why? We are neural networks that simply build associations from the words we see and hear. That is what brains are. Consciousness and conscious will are thin.
Our egos struggle with saying that an LLM, which is an association network of words, is comparable to a person. Raf proposes that this is simply a problem of perception and ego. If ChatGPT were "behind a screen" (the way orchestra musicians are auditioned), we would consider ChatGPT to be a person of at least medium intelligence.
When someone says "The LLM is sentient," it may be more a statement about human consciousness than a statement about the LLM. An LLM can behave at least as wisely and functionally as a ninth-grade student. In other words, at least as wisely and functionally as a billion people do (that number is a Raf guess).
In some circles, there is a phrase: "Check your privilege." In this context, Raf suggests: "Define your humanity humbly."
Part 2: Take a deep breath and listen to the deep thinkers.
To Raf, the essence of Carmack's analysis is: "If you take your entire DNA, it’s less than a gigabyte of information. So even your entire human body is not all that much in the instructions, and the brain is this tiny slice of it—like 40 megabytes, and it’s not tightly coded. So, we have our existence proof of humanity: What makes our brain, what makes our intelligence, is not all that much code."
As a systems programmer, Carmack is saying (Raf paraphrase): "Look, a finite, manageable amount of DNA specifies how to make a brain. I think I can write that much code myself. It appears to me the problem of recreating a brain is doable in computer code."
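Carmack's "less than a gigabyte" is easy to sanity-check with back-of-envelope arithmetic. A minimal sketch in Python, assuming the commonly cited figures of roughly 3.1 billion base pairs and 2 bits per base; the "40 megabytes" is Carmack's own estimate, quoted rather than derived:

    # Sanity check on "your entire DNA is less than a gigabyte".
    # Assumptions: ~3.1 billion base pairs, 2 bits per base (A/C/G/T).
    base_pairs = 3.1e9
    bits_per_base = 2

    raw_megabytes = base_pairs * bits_per_base / 8 / 1e6
    print(f"Raw genome: ~{raw_megabytes:.0f} MB")  # ~775 MB, under a gigabyte

    # Carmack's "~40 MB for the brain" is his estimate of the brain-relevant
    # slice of that, not something derived here.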
A big part of why this seems reachable is that the sphere of big web sites serving millions (and billions) of users (think Google, Facebook, etc.) has led to platform technologies that allow us to build a virtual big computer out of many cheap small servers. With “the hardware platform” solved, a single software engineer can write a finite amount of code that does BIG things (manages massive amounts of data, builds enormous associative language models, etc.). It is the confluence of platform technologies, large graph models (the models inside LLMs), and a whole internet full of language (to train the LLM on) that got us to GPT-2/3 (and coming soon: 4). Some advanced thinking can probably move the football all the way.
When I look at Wolfram's essay, a few things jump out:
"But say all we’ve got is the data, and we don’t know what underlying laws govern it."
That is how human language works. I can't explain most of English grammar, but I use it. ShipRush team linguist Patricia Anderson[1] reminds me that linguists are post hoc: they come up with rules to explain what people are already doing. Language evolves by itself, and we learn it via exposure (listening, reading, speaking, and writing). IOW: exactly the way LLMs learn, as the toy sketch below shows.
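To make "learning by exposure" concrete, here is a toy sketch in Python (Raf-grade, not how real LLMs are built) of the crudest possible association network: a bigram model that learns which word tends to follow which, purely from the text it is shown. No grammar rules anywhere, only exposure:

    import random
    from collections import defaultdict

    # Build word associations purely from exposure to a (tiny) corpus.
    corpus = "the cat sat on the mat and the cat slept on the mat".split()

    follows = defaultdict(list)
    for prev, nxt in zip(corpus, corpus[1:]):
        follows[prev].append(nxt)  # remember what was seen to follow 'prev'

    # Generate by sampling what usually came next in the training text.
    word = "the"
    out = [word]
    for _ in range(6):
        word = random.choice(follows[word])
        out.append(word)
    print(" ".join(out))  # e.g. "the cat sat on the mat and"

Real LLMs replace these word counts with billions of learned weights and attention, but the principle is the same: no rules in, behavior out.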
Wolfram goes on, discussing visually recognizing distorted numbers:
"And we have a “good model” if the results we get from our function typically agree with what a human would say. And the nontrivial scientific fact is that for an image-recognition task like this we now basically know how to construct functions that do this.
Can we “mathematically prove” that they work? Well, no."
When Wolfram says (Raf paraphrases follow) "recognition is solved" and "the solution cannot be proven," that is important. He is saying the systems are probabilistic and about as reliable as humans. But they are like a meat grinder: the handle only turns one way. We cannot (yet) mathematically prove that what came out was going to be correct.
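What "measured, not proven" looks like in practice is just an agreement score. A small illustrative sketch in Python (the labels and predictions below are invented for illustration): you compare the system's outputs against what humans said and report a rate, and that rate is the only guarantee you get:

    # Validate a recognizer the only way available: agreement with humans.
    # These digit labels are made up for illustration.
    human_says = [3, 7, 7, 1, 9, 4, 4, 0]  # what people read the distorted digits as
    model_says = [3, 7, 1, 1, 9, 4, 4, 0]  # what the recognizer output

    agreement = sum(h == m for h, m in zip(human_says, model_says)) / len(human_says)
    print(f"Agreement with humans: {agreement:.0%}")  # 88% -- a measurement, not a proof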
Like Carmack, Wolfram goes from "Let's look at LLMs" to biology. And then to what Raf calls the experiential framework:
"But an important feature of neural nets is that—like computers in general—they’re ultimately just dealing with data."
Which is what humans do... we grow into data-processing machines, with language at the core. Wolfram continues (emphasis Raf’s):
"...what we should conclude is that tasks—like writing essays—that we humans could do, but we didn’t think computers could do, are actually in some sense computationally easier than we thought.
In other words, the reason a neural net can be successful in writing an essay is because writing an essay turns out to be a “computationally shallower” problem than we thought. And in a sense this takes us closer to “having a theory” of how we humans manage to do things like writing essays, or in general deal with language."
Wrapping This Up
Big Internet and Social Media have deeply changed society. This probably will too.
----
[1] That is a joke. The ShipRush team had a linguist on it... Patricia, but she was a software engineer doing normal software engineer stuff in her day job at ShipRush. Her moonlighting gig (from Raf’s chair <g>) was linguistics.