In the 90s, before the internet became our internet, there was a long lead-in of (bear with us, gen Z’ers) broadcast ‘news stories’ on ‘television sets’ telling us about the wondrous possibilities offered by the online world. For anyone without the internet, i.e. the majority of us, it was both exciting and entirely speculative. It wasn’t until actual internet access and, say, finding a website for The Lost World: Jurassic Park that the possibilities became palpable. And, even then, there were years of loud modems and agonizingly slow loading before it felt essential to day-to-day life.
It feels like we’re at that loud-modem stage with generative AI. There’s some cool stuff happening, but lots of the commentary is overblown, and some of it impenetrable. And, unlike the 90s, we have the prism of social media to further exaggerate and distort reactions to developments. So, for example, on X (formerly Twitter), for every piece of interesting AI content, there are a few hundred useless AI productivity-hack threads from crypto-bro evacuees.
So, for our first edition, and for many of our early posts, we’re going to concentrate on the basics. Today: video and generative AI.
First, a quick explainer for newbies
Broadly speaking, there are two types of AI video generators:
Business-focused providers that, essentially, make it easier and cheaper to produce training videos. Synthesia, Inpicture, Flexclip and at least a dozen other companies are vying for that space. Some of the services include human-like avatars giving, say, mandated HR training. Some just add more automation to video editing work. These startups are attracting plenty of backing (Synthesia brought in $90 million in a recent funding round), but they’re not behind the weird stuff you see shared on social.
Creative-focused providers are the tools that are inspiring, or appalling, anyone who dreams of being the next Scorsese. Runway, Pika Labs, FullJourney, Modelscope and Zeroscope are proving the most popular tools so far. All have their flaws; think of them as the video equivalent of very early prototypes of text-to-image models like Stable Diffusion or DALL-E. Where they are being used in professional video production, they likely involve the same amount of manual editing work, or more, to make the output high quality. This sounds counter-intuitive, but at present the desire to shout “look! It’s made by AI” generally supersedes ease-of-use requirements. These tools are very much behind the weird stuff you see on social.
On top of that, there are tools that help with non-visual elements of the process, such as ElevenLabs for text-to-voice. There are also a bunch of cool developments around more technical aspects of video production, and AI tools to help with VFX and animation, that we’ll get into in future editions.
What’s actually good?
In April of this year, a German filmmaker posted on X a movie trailer he had made using AI tools. It was exactly the fever dream you would expect to have after watching The Great Gatsby and eating bad shellfish. And it was also a good, coherent video made using the tools AI filmmakers have to hand in 2023. But it became something of a Rorschach test for how you feel about AI-assisted creativity. A viral post sharing the video and declaring “the end of Hollywood” became a perfect opportunity to dunk on all the hype surrounding the medium.
Social media, and this is a major scoop straight out of the gate from Explainable, did not allow for a nuanced take on creative AI video content! It was either an obvious grift or The Future, Now! In truth, it’s a massive leap forward in what we can create, but one with major flaws at present, often leading to comically, or horrifically, bad content.
And because this is all so new, there’s an audience for both the good and the bad, which muddies the picture for those just trying to find the good. An AI video thrown together in minutes, one that highlights the multi-fingered, Venom-teethed horror-show flaws in AI text-to-video, can get plenty of traction, while a painstakingly built video that achieves basic competence can be ignored. We’re not currently in a meritocracy when it comes to AI-generated video. Also, of course, AI video is a really exciting thing for geeky teenagers, so trending AI videos on Reddit tend to be dominated by clips that look like they were cut for time from an Adult Swim sketch show.
So, what works so far? Being mindful of limitations, for one. The strongest creative efforts tend to have a clear hook, and they use plenty of AI-generated imagery with some basic video editing to animate the elements. The fake movie trailer format has worked well because it lends itself to using plenty of close-to-static images, voiceovers, and dramatic music.
The Harry Potter by Balenciaga video is currently the biggest AI video success story from an independent creator, at least in terms of YouTube views. But it's essentially a slightly animated series of AI images. The creator, a Berlin-based photographer going by the alias Demon Flying Fox, used Midjourney for the images, ElevenLabs for the voiceover and D-ID, an app that makes AI video from photos. If he had entered the same prompt into a text-to-video generator, the results might have been funny, or horrific, but they would have been unlikely to reach 10 million YouTube views.
And talk of video metrics points to another obstacle to assessing the landscape: virality through adherence to online trends like, say, Balenciaga or Barbenheimer tends to dominate the conversation. What about the videos being produced by those happy souls who, like Cillian Murphy, don’t know what a meme is?
So we’re not yet at the Lumière brothers stage of AI-generated video; we’re at the names-from-the-History-of-Film-Wikipedia-page-that-are-too-obscure-to-reference-in-a-newsletter stage. But the next big milestones are probably only weeks or months from bursting into view.
Important Explainable rule of thumb
Lots of generative AI content looks OK… until you talk to an expert. A copywriter can explain why the marketing blurb from Bard is a hack job. A graphic designer can tell you why an illustration from DALL-E is the devil's work. We’ll be talking to a lot of these experts over the coming year and hearing how they use these tools and why human-led creativity is probably safe for some time to come.
What do the filmmakers think?
We got in contact with Karpi, who freaked out the internet this month with his Heidi reimagining, and Caleb from Curious Refuge, who has had recent viral success with AI-powered interpretations of Star Wars and Lord of the Rings in the style of Wes Anderson.
What kind of reaction have you got to your video?
Karpi: Overwhelmingly positive. Even the negative reactions are filled with sarcasm and fun. Most people liked the ‘nightmarish’ quality, the ‘body-horror’ and the sheer weirdness of it all. One funny reply was ‘AI will never replace filmmakers, except David Lynch’. Or ‘these are not the Swiss Alps - this is the uncanny valley’.
Did it take long to make?
Karpi: Less than a day. I’ve never got so much attention for so little work.
Generative AI: an exciting new era for creators or a new way for creators to get screwed out of money?
Karpi: Both. The dangers of generative AI are the dangers of capitalism.
Looking at videos produced using AI input, most creators seem to be using FullJourney, Pika, Runway Gen-2 and Modelscope. Do you have a favorite generator? And, if so, why?
Karpi: I’ve tried most of them. I like using Gen-2 because of the weirdness and the cinematic quality. The upscaled resolution is also quite usable. They all have their interesting faults and quirks.
Caleb: Midjourney is the best at creating creative and inspiring images. The results that I get from Midjourney are truly breathtaking and realistic (in most cases). However, Midjourney only produces images at this point, so if you want to make those images move you need to use an image-to-video generator like Pika or Runway Gen 2. Pika is giving me slightly better results at this moment in time and it accepts direction better while staying true to the reference images. However, Runway Gen 2 has a better interface that doesn’t get bogged down by other folks' submissions. Generally, I’d say start with Midjourney images and take them to Runway if you want to make them move. It’s probably the easiest workflow at the moment.
What other tools should creators be across?
Karpi: I love to experiment with music and speech synthesis. ElevenLabs is mind-blowing. And I’ve started to use RVC to transfer the voices of singers. The possibilities (and dangers) are endless.
Caleb: ChatGPT-4 - I’ve tested every LLM out there; ChatGPT is still the most creative and produces the best results.
Midjourney - It creates photorealistic and impressive images that are better than anything I could imagine.
Pika & Runway Gen 2 - Both tools create impressive text-to-video and image-to-video results. The quality is a bit bad, but when paired with an AI upscaling tool, the results become much better.
ElevenLabs - It’s a text-to-voice tool that creates the most realistic AI voices in the world. I am shocked every time I use the tool that the voices aren’t performed by a real actor.
D-ID - This tool animates faces from images in a realistic way. It’s perfect if you want your actors to voice lines or if you just want the subject to move like a real human.
What limitations do you come up against at the moment and how do you see those limitations changing over the coming years?
Karpi: Compared to generative images, generative music and video are not very usable at the moment. But this will change fast. I believe the limitations in the coming years will be almost exclusively of a moral and ethical nature.
Caleb: There are many different limitations that we encounter when working on a project. Some of them include consistency between shots, poor text-to-video quality, and the challenges we face when asking AI to generate certain prompts related to cinematography. However, those limitations are going to be removed in the coming months and years. Very soon we’ll be able to type in a prompt and not only get a great, consistent shot, but also see an entertaining piece of content with multiple shots, a story, voices, expression, movement, and more. It’s going to be wild.
What advice do you have for anyone trying to create video using AI tools?
Karpi: Do it. Try everything. Get a feel for it. Play around.
Caleb: Just start creating. It can be tricky to navigate all of these tools, so don’t be afraid to seek out some direction on which tools to use and the exact workflows.
What AI filmmakers do you admire at the moment?
Karpi: I honestly don’t know that many. Personally, I like funny stuff, and most of the AI filmmakers I encountered try to be serious or dramatic - and end up producing kitsch in the process.
Caleb: This entire field is so new, so there are very few AI artists creating super awesome stuff. However, there are a few folks I really admire like Cornel Swoboda and the Corridor Digital team who are pumping out some very cool experiments using the tools.