The following is from Benedict Evans’ newsletter, which comes in a free version and a more detailed paid version. Both are available HERE.
Mr. Evans is considered by many to have “top shelf knowledge and opinions” on tech advances and the tech business. His take below is useful and raises good questions. Others who are “in the know” have somewhat different takes, and ask different questions or reach different answers.
Which only makes sense, given that what we have today in this field is hard to understand. But what we will have tomorrow looms as much, much harder to understand, as machine learning writes algorithms we find impenetrable. (See a following post, where an AI interface programmer discusses what he knows about how “Prompt Engineering” works today, and where it may be going.)
ChatGPT and AI creation
2m people have now signed up to use ChatGPT, and a lot of people in tech are more than excited, and somehow much more excited than they were about using the same tech to make images a few weeks ago. How does this generalise? What kinds of things might turn into a generative ML problem? What does it mean for search (and why didn’t Google ship this)? Can it write code? Journalism? Analysis? And yet, conversely, it’s very easy to break it – to get it to say stuff that’s clearly wrong. The last wave of enthusiasm around chatbots largely fizzled out as people realised their limitations, with Amazon slashing the Alexa team last month. How should we think about this?
The conceptual breakthrough of machine learning, it seems to me, was to take a class of problem that is ‘easy for people to do, but hard for people to describe’ and turn it from a logic problem into a statistics problem. Instead of trying to write a series of logical tests to tell a photo of a cat from a photo of a dog, which sounded easy but never really worked, we give the computer a million samples of each and let it do the work to infer patterns in each set. This works tremendously well, but comes with the inherent limitation that such systems have no structural understanding of the question – they don’t necessarily have any concept of eyes or legs, let alone ‘cats’.
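A minimal sketch of that ‘statistics, not logic’ point, in Python. The data here is synthetic (random vectors standing in for image features) and the model is a simple scikit-learn classifier, so this shows the shape of the approach rather than how a real vision model is built.

```python
# A toy version of 'give the computer samples and let it infer the pattern':
# no hand-written rules about eyes or legs, just labelled examples and a
# learned statistical boundary. The features are synthetic stand-ins for
# real image data, purely for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Pretend each photo is summarised by 64 numbers.
cats = rng.normal(loc=0.0, scale=1.0, size=(1000, 64))
dogs = rng.normal(loc=0.5, scale=1.0, size=(1000, 64))

X = np.vstack([cats, dogs])
y = np.array([0] * 1000 + [1] * 1000)  # 0 = cat, 1 = dog

# The classifier infers a pattern separating the two sets; nowhere do we
# describe what a cat is, which is exactly the structural blind spot.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.score(X, y))
```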
To simplify hugely, generative networks run this in reverse – once you’ve identified a pattern, you can make something new that seems to fit that pattern. So you can make more pictures of ‘cats’ or ‘dogs’. To begin with, these tended to have ten legs and fifteen eyes, but as the models have got better the images have become very, very convincing. But they’re still not working from a canonical concept of ‘dog’ as we do (or at least, as we think we do) – they’re matching or recreating or remixing a pattern.
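To make ‘run it in reverse’ concrete, here is a deliberately tiny analogue in the same vein as the sketch above: estimate the statistics of one set, then draw new samples that fit them. Real image generators are vastly more sophisticated, but the logic (match a pattern, then produce more of it) is the same.

```python
# Toy analogue of generation: learn the pattern of the 'cat' set, then
# sample new points that fit it. The samples match the statistics without
# any concept of 'cat', which is how early models gave cats ten legs.
import numpy as np

rng = np.random.default_rng(1)
cats = rng.normal(loc=0.0, scale=1.0, size=(1000, 64))

mean = cats.mean(axis=0)
cov = np.cov(cats, rowvar=False)

new_cats = rng.multivariate_normal(mean, cov, size=5)
print(new_cats.shape)  # five new 'cats' that fit the learned pattern
```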
I think this is why, when I ask ChatGPT to ‘write a bio of Benedict Evans’, it says I work at Andreessen Horowitz (I left), went to Oxford (no), founded a company (no), and am a published author (not yet). Lots of people have posted similar examples of ‘false facts’ asserted by ChatGPT. It often looks like an undergraduate confidently answering a question they didn’t attend any of the lectures for. It looks like a confident bullshitter.
But I don’t think that’s quite right. Looking at that bio again, it’s an extremely accurate depiction of the kind of thing that bios of people like me tend to say. It’s matching a pattern very well. This is a probabilistic model, but we perceive the accuracy of probabilistic answers differently depending on the domain. If I ask for ‘the chest-burster scene in Alien as directed by Wes Anderson’ and get a 92% accurate output, no-one will complain that Sigourney Weaver had a different hairstyle. But if I ask for some JavaScript, or a contract, I might get a 98% accurate result that looks a LOT like the JavaScript I asked for, but that 2% might break the whole thing. To put this another way, some kinds of request don’t really have wrong answers, some can be roughly right, and some can only be precisely right or wrong.
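A toy illustration of that last distinction (Evans uses JavaScript; Python here for consistency). The function is hypothetical: it looks almost exactly like a correct implementation, but one token is wrong, and in this domain ‘almost right’ simply fails.

```python
# 98% of the characters match the code you asked for; the other 2% breaks it.
def average(xs):
    return sum(xs) / (len(xs) + 1)  # subtle bug: should divide by len(xs)

# Unlike a slightly-off image, code has a hard pass/fail:
assert average([2, 4, 6]) == 4  # raises AssertionError
```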
So, the basic use-case question for machine learning was “what can we turn into image recognition?” or “what can we turn into pattern recognition?” The equivalent question for generative ML might be “what can we turn into pattern generation?” and “what use cases have what kinds of tolerance for the error range or artefacts that come with this?” How many Google queries are searches for something specific, and how many are actually requests for an answer that could be generated dynamically, and with what kinds of precision?
There’s a second set of questions, though: how much can this create, as opposed to, well, remix?
It seems to be inherent that these systems make things based on patterns that they already have. They can be used to create something original, but the originality is in the prompt, just as a camera takes the photo you choose. But if the advance from chatbots to ChatGPT is in automating the answers, can we automate the questions as well? Can we automate the prompt engineering?
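One reading of ‘automating the questions’ is a search loop over candidate prompts, scored by automated feedback. A skeleton of that idea follows, with generate() and score() as hypothetical placeholders (no real API is assumed); as the next paragraph argues, the scoring function is the part that is genuinely hard to automate.

```python
# Skeleton of automated prompt search: propose prompts, score the outputs,
# keep the best. Both functions below are placeholders for illustration.
import random

def generate(prompt: str) -> str:
    # Stand-in for a model call.
    return f"output for: {prompt}"

def score(output: str) -> float:
    # Stand-in for automated feedback. Without a real version of this,
    # the loop is just monkeys at typewriters.
    return random.random()

candidates = [f"write a punk song, variant {i}" for i in range(20)]
best_prompt = max(candidates, key=lambda p: score(generate(p)))
print(best_prompt)
```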
It might be useful here to contrast AlphaGo with the old saying that a million monkeys with typewriters would, in time, generate the complete works of Shakespeare. AlphaGo generated moves and strategies that Go experts found original and valuable, and it did that by generating huge numbers of moves and seeing which ones worked – which ones were good. This was possible because it could play Go and see what was good. It had feedback – automated, scalable feedback. Conversely, the monkeys could create a billion plays, some gibberish and some better than Shakespeare, but they would have no way to know which was which, and we could never read them all to see. Borges’s library is full of masterpieces no human has ever seen, but how can you find them?
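The difference that feedback makes can be shown in a few lines, in the spirit of Dawkins’s ‘weasel’ program (my example, not Evans’s): random generation converges quickly when there is an automated score, and never would without one.

```python
# Monkeys plus automated feedback: random mutation converges on the target
# in a few hundred generations because score() can tell good from bad.
# Remove score() and you have Borges's library: the text is in there,
# but nothing can find it.
import random
import string

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "

def score(text: str) -> int:
    # Automated, scalable feedback: characters matching the target.
    return sum(a == b for a, b in zip(text, TARGET))

def mutate(text: str, rate: float = 0.05) -> str:
    return "".join(random.choice(ALPHABET) if random.random() < rate else c
                   for c in text)

best = "".join(random.choice(ALPHABET) for _ in range(len(TARGET)))
while score(best) < len(TARGET):
    best = max([mutate(best) for _ in range(100)] + [best], key=score)
print(best)
```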
Hence, a generative ML system could make lots more music ‘like disco’, and it could make punk if you described it specifically enough (again, prompt engineering), but it wouldn’t know it was time for a change and it wouldn’t know that punk would express that need. So, can you automate that? Or add humans to the loop? Where, at what point of leverage, and in what domains? This is really a much more general machine learning question – what are the domains that are deep enough that machines can find or create things that people could never see, but narrow enough that we can tell a machine what to look for?