"Synthetic Media and the Future of Film"

The concept of AI-generated films is certainly not a new one for me. I conceived of the idea as far back as 2014, when generative adversarial networks (GANs) first reared their heads. I remember a conversation with a friend at the time in which we discussed a future where court cases could no longer rely on video footage of crimes being committed, since the video could easily have been faked by a third party. Of course, we expected this future to be half a century away – not something possible by the end of the next decade.

If you know me at all, it should come as no surprise that I’ve written previously on the advancement of this technology in the years since. One of my best posts, at least in my opinion, is an article I wrote on the 24th of May, 2022. OpenAI’s DALL·E 2 had been unveiled to the public only one month prior, and I felt the need to sit down and theorize about where the technology was going next. (At the time, the names “synthetic media” and “generative media” were both floating around. Unfortunately, I seem to have backed the wrong horse on this one.)

I have truncated my original 2022 post, cutting the pieces that were needed as groundwork two years ago but not so much today. I have also brought some of the references up to date. If you want to read the entire original post, you can find it here.

---

It was long believed that the first jobs to be obsoleted by AI would be those of lawyers and accountants, as they seemed the prime targets. After all, creativity has hardly been the forte of computers for the past half-century, being almost exclusively the product of human effort. However, in recent years, something has begun to change significantly. Widely introduced to the public via OpenAI's original DALL·E model, text-to-image generation has captured the imaginations of countless individuals who were under the impression that such advancements were still decades away. As even more advanced models emerge, such as Stable Diffusion XL, Midjourney V6 and DALL·E 3, we can clearly see that the quality of images is increasing at an incredibly rapid pace. While initial image models were notoriously bad at certain tasks, such as the generation of hands, almost all major problems have been solved. The quality gap between generated images and real photographs has quickly narrowed, leaving only the finest details to be perfected.

If we assume that the current rate of progress continues, there is only a short amount of time left before the “image problem” is solved entirely, leaving us with no human-viable methods by which to tell generated images from real ones. The lessons and methods learned in the creation of image models have been quickly spreading to other domains, but one in particular has seen its own incredible rate of growth: video.

There are already models that are beginning to tackle text-to-video generation, such as Runway’s Gen-2 and Stable Video. While the current models are all making what I call “pan and zoom” videos, where the motion is lacking and the result hardly feels like a real video at all, this is clearly a solvable problem on the same level as “Ha-ha, look at those stupid image generators, they can’t even do hands!” This is but a brief moment in the long story that is soon to play out.

If we assume a rate of progress identical to that of the text-to-image models, we should expect to see the “video problem” solved within the next several years. Even if some roadblock appears and slows progress down, the end point is inescapable: we will eventually reach the point where a person is able to sit down at their computer and manufacture a feature-length film in as much time as it would take to watch it, complete with a compelling plot and interesting score. When this day arrives, a notable barrier in creativity will have been broken. What the digital camera did to the photography industry, namely increase access and decrease the skill level needed to enter the field, AI will do to the film industry.

At the moment, while one man can easily sit down and create a phenomenal work of art in Photoshop, it is nearly impossible for that same man to create a feature-length film on his own. He would have to find and coordinate actors, a composer, and a full film crew. All of this takes time and money, enough of both to make most aspiring filmmakers abandon their aspirations. This is why Hollywood, as corrupt and condemnable as it might be, is so successful. Most people would probably rather make their own films than watch whatever Netflix felt like financing, but they are simply unable to. However, just as we currently see image models making both the time and money needed for professional illustration work approach zero, feature-length films will soon be the domain of the individual creator, pursuing his or her dreams with endless possibilities and boundless creativity.

The shift will be gradual, but as it is now with photos, the writing will be on the wall. If you were able to insert yourself into movies, wouldn't that interest you? Instead of training to be an actor, moving to California and hoping to get lucky, what if you were simply able to tell an AI model to swap you in for the role of Luke Skywalker? Or perhaps into something entirely original?

The time for speculating about this potential future is nearing its end. Just as we would now be foolish to bury our heads in the sand and ignore text-to-image models, we will soon be equally foolish to dismiss text-to-video models. Hollywood is not yet in any danger, and its vise grip on blockbuster films will remain firm for at least a few more years, but it won't last forever. Just as the silver screen was the death of the live theater, AI will be the death of the movie theater.

1/2/2024