Update All The Way

In 2017, I looked into a glass orb and saw the future. Well, specifically two glass orbs – the Fresnel lenses found in the first-generation Oculus Rift. I had already spent plenty of hours playing with my Samsung Gear VR headset, but the limited interaction and range of motion had me convinced that VR was simply a toy and not anything significant. However, one of my engineering friends at my university had gotten his hands on the department’s Oculus Rift headset and offered to let me test it out. Within ten seconds of wearing the headset and playing a silly little demo where you shoot a bubble gun and let a butterfly land on your finger, I realized that the world was never going to be the same – the Matrix instantly became inevitable. I ascertained correctly that one day, regardless of how long it takes, virtual reality will be one of the most important and pervasive technologies mankind has ever created.

Despite this future being clear and obvious to me, you can – to this day – find countless people on the internet who will tell you with a puffed-out chest and a determined squint that VR is a gimmick and will never go anywhere. This confused me at first. How was everyone missing what seemed perfectly obvious? Even some of the more enthusiastic VR YouTubers didn’t seem to quite get it at first. Even to this day, with the Apple Vision Pro capturing the imaginations of all those unwashed masses who could never afford it in the first place, it seems that the average person has no conception of how virtual reality will end up impacting their lives. In spite of his awkwardness and inability to relate to regular human beings, Zuckerberg seems to be one of the only other people who truly “get it” when it comes to VR. He was off-base with trying to create the metaverse in the early 2020s, riddled with NFTs and whatnot, but his ultimate vision of a shared virtual space where everyone is essentially a mini-god is more than likely correct – if perhaps a little early.

I bring this up to illustrate a point, as I once again see this exact same pattern playing out with generative AI. Just yesterday, OpenAI announced their first video model: Sora. When I looked at the outputs from this model, I felt a similar sense of wonder and awe that I felt back in April of 2022, when DALL-E 2 was revealed to the public. However, unlike the DALL-E 2 release, this was not nearly as surprising to me. I had expected video generation of this quality somewhere around late July or early August of this year, 2024. Sora is simply a few months ahead of schedule, well within the margin of error when it comes to these sorts of long-term predictions. Nothing about this has changed my expectations of when the video problem will be “solved”, but that’s precisely the point. My predictions haven’t changed – everyone else’s have.

Over the last 24 hours, I have seen nothing but people freaking out on all the major social media websites. “Oh my god!” they say. “This came out of nowhere! I thought AI couldn’t do hands! How is it suddenly able to make videos?” If you’ve spoken to me at all in the last year or two, we’ve probably discussed earlier video generation models, such as Gen-2 or Stable Video. As such, you would understand that this didn’t come out of nowhere, but is simply the latest step in the long march toward perfection. However, it seems that the average person, at least on February 14th of 2024, believed that AI-generated art simply had not progressed since DALL-E 2. To them, the idea of AI being able to create videos is as sudden and shocking as if they suddenly learned their pinky toe was conscious and had quite a bit to say.

To borrow part of an earlier post on this website: “The concept of AI-generated films is certainly not a new one for me. I had conceived of the idea as far back as 2014, when generative adversarial networks (GANs) first reared their heads. I remember a conversation I had with a friend at the time where we discussed the possibility of a future where court cases were no longer able to rely on video footage of crimes being committed, since the video could easily have been faked by a third party. Of course, we expected this future to be half a century away – not possible by the end of the next decade.”

In other words, the first time I experienced something even mildly related to the sorts of technology we’re seeing today, I was able to update my predictions all the way to the end. I didn’t update a little and then wait for the next development before updating a bit further; I was able to fully look past the next few years and see where everything was inevitably going. While these developments are exciting to me and help confirm that my predictions were correct, I am unsurprised. I’ve spent the last several years knowing what was coming, because I managed to update all the way.

You can think of it like two men wanting to walk across a street, but seeing a car heading toward them. The first man says, “Just because the car has moved 50 feet in the time since I first looked at it doesn’t mean it’s going to reach the crosswalk! It would have to go at least ten times further than that to get there! That seems unlikely to me, as I am a very smart person and know that cars are heavy and hard to move. It will probably stop after another two or three feet, so I’m all good to walk into the street.”

The second man says, “Oh wow, that car sure covered that 50 feet rather quickly. If it keeps that pace up, it’ll be here in ten seconds or so – even faster if it speeds up. I think it’s probably smart to not step into the street.” And so, the second man stays on the sidewalk while the first man proudly and confidently steps out onto the street, only to be struck dead by the speeding car. His inability to see the issue at hand and update all the way turned out to be a fatal mistake. Unfortunately, it seems to me as though most people would find themselves squarely in the position of the first man – unable to see the inevitable outcome.
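The second man’s reasoning is nothing more than linear extrapolation. As a toy illustration of the arithmetic (the 50 feet and the roughly ten-second estimate come from the analogy above; the two-second gap between glances is my own assumption, chosen so the numbers line up):

```python
# Toy linear extrapolation, mirroring the car analogy.
# The car covered 50 feet between two glances, and the crosswalk
# is ten times that distance (250 more feet) away.

def time_to_arrival(distance_covered, elapsed, distance_remaining):
    """Estimate seconds until arrival, assuming constant speed."""
    speed = distance_covered / elapsed  # feet per second
    return distance_remaining / speed

# Assume the two glances were 2 seconds apart: 50 ft / 2 s = 25 ft/s.
eta = time_to_arrival(distance_covered=50, elapsed=2, distance_remaining=250)
print(eta)  # 10.0 seconds -- and even sooner if the car is accelerating
```

The first man, by contrast, implicitly assumes the speed will collapse to zero for no stated reason – which is exactly the shape of the “it will stop here” argument.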

People are currently telling themselves lies such as, “AI videos will never replace filmmakers, they’re not creative! They’ll never replace VFX artists, they’re not smart! They’ll never do this and they’ll never do that, because I say so.” Of course, anyone with long-term memory will recall this exact same set of arguments being lodged against image models two years ago. Just as the critics were predictably wrong back then, they are similarly wrong today. AI video will not stop at this arbitrary point, relegated to being a tool magically useful only to those who already get paid to do the jobs it threatens. It will not only be able to generate background actors or B-roll footage. It won’t even be constrained to simply creating movies. We are looking down the barrel of a loaded gun, primed to completely destroy our ability to tell truth from fiction. As untrustworthy as the internet already is today, we are living in perhaps the final year where you can trust video evidence of anything. There is little time left before anything you can imagine, from incredible cinematic space operas to security footage of crimes being committed, will be easily and freely generated in mere seconds. Anything your eyes are capable of processing will be able to be synthesized and fabricated. There will be no more digital truth.

If any of this sounds ridiculous to you, I would recommend asking yourself this question: “What are the odds that this magic technology that somehow came out of nowhere with nobody knowing it was even possible, what are the odds that it stops here? Right when I suddenly become aware of it, it just stops?” Of course, the answer is that it won’t stop. And in six months when someone improves on it, the public will once again freak out and loudly proclaim how this is both magical and doomed to fail. Meanwhile, I will simply make a little note on my calendar: My predictions haven’t changed – everyone else’s have.

2/16/2024