For today’s Explainable I spoke with Josh Rubin and Stephen Parker from Waymark, the AI video creation company. They made headlines earlier this year for their DALL-E 2-generated short film The Frost, and they are currently working on a follow-up.
The interview was transcribed using Castmagic (two thumbs up, would recommend) and has been edited for brevity.
Explainable: First off, can you tell us a little bit about your own background and how you got involved in Waymark and started The Frost project?
Stephen Parker (Creative Director, Waymark): We are a technology company, essentially building software to help people make their own commercials. And we did that for several years, largely in a templated capacity. And then in the last two years, we've become heavily invested in AI. You could think of the AI as sort of a mock user on our part, helping to further enable easy access to the platform by essentially using the creative tools we've already produced to build a video on behalf of the user.
We’ve become expert practitioners at the level where we need to say to an AI “this is what the best prompt should look like on behalf of our user in this situation to yield X or Y”. The Frost for us was a chance to sort of run creatively, see how far we could go in that effort.
Josh Rubin (Executive Producer, Waymark): So I guess I'm a filmmaker by trade. I was an editor and producer, editing trailers in Los Angeles for almost 15 years. And I'm also a documentarian; I've made my own documentary.
Stephen: So we started messing with the image generation first. We were lucky to get early researcher access to DALL-E 2. And I really just kind of got into it right away, started generating all kinds of images, but was most immediately drawn to cinematic sort of portraiture. So I would go away on the weekends, generate a bundle of images, send it to Josh, be like, “hey, what do you think about this?”
We started doing some tests and those tests included a lot of different things: After Effects, film grain, some motion, moderate puppetry, just trying to figure out how we could bring still images to life and sort of convert them into video. The Frost was kind of born out of a weekend effort where I was sort of doing Color Out of Space meets At the Mountains of Madness. And Josh really liked the vibe, the world-building aspect of it was really cool. But he was like “you know, it needs a story. They have to have a reason to go up the mountain. They have to have something driving them. We need a base camp, we have to start somewhere, we have to end somewhere, we need to focus on a few characters”.
We thought it might just be a two or three-minute piece, but it ended up being quite a long run. And because of that, we ended up enlisting more members of our team to help with this.
I taught people how to use DALL-E and sort of use a prompting framework that I'd set up, which was giving me consistent results. And then Josh started writing a script, breaking that out into storyboard actionables. He brought on a second editor and there's a motion person as well that works with him to sort of turn those stills into full-on animation.
Explainable: So it sounds like you're talking about a lot of the fundamentals of creating a film, just using AI tools. It's not as if everything else gets thrown out in terms of storyboarding.
Stephen: Well, ours is kind of a live storyboard. It really, truly is the film, right? Because those images are the images we pulled, created, and assembled with DALL-E. And then if Josh wants to make a change or he wants another image, we just insert a spot and put the new image in. We work with those images in sequence in the storyboard, but they're the real film that you shoot in the end. So it's a much more iterative, live process.
Explainable: What tools, if any, have kind of blown your mind over the last twelve months or longer? And what tools, when it comes to video in particular, have you thought, “okay, we're not quite there yet”?
Stephen: With a project as long as The Frost, where you're going to go for, say, three months, it's very possible that the tool is outdated by the time you get to the end of the process. And I think that is something every invested artist, technician, and practitioner is going to experience in the very near future. It's like I stopped off at a rest stop somewhere to enjoy this tool for a bit, and by the time I became proficient, there was another version of the tool.
I would say continuity is maybe the biggest [failing]. Once you find a character, a scene, or something, just hanging on to it is definitely not something that these tools do well just yet, although they are always improving. Midjourney has things like blend and image-to-image that will allow you to create more consistent characters today, certainly, than we could with a zero-shot approach with DALL-E when we first began. But we would love to see consistency of character.
And we love the text-to-video as well. Josh has been using that of late. But the early problems we've seen there are a lack of motion, and a lack of attention to what really makes film film and how that works.
Josh: I mean, the hottest tool for me right now, I think, is Runway. We're just now ramping up production on The Frost: Part Two. And The Frost: Part One was quite laborious in that we had to animate every single frame. [For The Frost: Part Two] we use all text-to-video prompts and the Runway Gen-2 model, and it's not without road bumps, but in terms of motion, it was night and day. We had to kind of jump through hoops in order to make characters talk, and to make characters walk, and to make landscapes seem believable. There was just a lot of cinematic labor that went into each and every frame that kind of comes with each 18-second output.
But every time we look around, the technology is, like, lapping us. So it's been an interesting journey to just try and keep up with it and use it. Not just marvel at it, but like, okay, now how can we use this? So that's kind of what we're dealing with, with this new The Frost: Part Two piece.
Stephen: It still seems to me that a lot of this stuff is trained on stock images and footage. We haven't had a moment where Hollywood is like, “okay, guys, here are all the renders for all films. Now we're going to break this up into a bunch of text and image frame generations, and we're going to go and send it to the biggest computers in the world”. That hasn't happened yet. We're still very much at stock video, plus a lot of other ensemble hacks and tricks, in order to get to the end products that we're seeing today.
Explainable: At the moment, it feels like you have to lean into that aesthetic that comes from a lot of this, which is slightly unnerving. Just that kind of uncanny valley-type stuff. Rather than trying to stop it from happening at all, which seems like a fool's errand, you have to lean into it with the storyline as well.
Stephen: Yeah. I mean, Josh really championed that; early DALL-E was, and continues to be, full of weird little artifacts. I sort of fought for photographic accuracy for a long time, but Josh was really like, “no, the emotion is here in this shot. The composition is here. This is what I want”. It would be the fuzzier, weirder ones versus the more staid, photographic ones. And so we did end up leaning into it as an aesthetic. And video is a little different. The text-to-video stuff has a very morphic quality, where one thing sort of transitions into another. And I've done a few little studies with that. It can be quite fun to lean into, I would say. It is very much its own thing and a new thing, but unless we're making some kind of, like, cosmic monster scene or something, it's not super useful in a typical production sense.
Josh: We've talked a bunch of times about how DALL-E is another artist at the table, and how to respect it as such. And it's like, if that's the aesthetic that DALL-E is bringing, then let's go with it. And there were times where it didn't work, where it wasn't appropriate: where it was either too far off base or too photorealistic, or the shot type wasn't there. There's a certain amount of wrestling with the AI that we had to do, but at a certain point, you can't wrestle with it too much. You kind of have to let it be what it is. And, yeah, that evolved into that uncanny aesthetic that you're talking about. I kind of look at it like a graphic novel come to life. That's what it looks like to me.
Explainable: In advertising, in all industries really, there's a huge amount of excitement and perhaps hype around generative AI and also, simultaneously, a lot of wariness. Are you finding that? That people aren't quite sure how to use these tools, or how far they want to go with them yet.
Stephen: Yeah, I would say it's still undiscovered country and everybody is kind of trying to feel it out and figure out what the best course is. It's true for us as well. We make use of these tools in an ensemble scenario where they still very much command, or take advantage of, a creative language that we have authored definitively as humans. So in terms of Waymark, the videos people are seeing, that is the result of a lot of human blood, sweat, and tears that we put in earlier in the process. And I think we're still listening and sort of waiting to see what the world thinks and what the world decides. And I don't know that we imagine ourselves as anything more than people trying to be conscientious and observant at this point. At the same time, we do really want to know about the tech. We do want to know everything that's possible.
So for us, it's a very fun sort of second medium or second workflow to take advantage of. And I personally am somebody who thinks there's probably still room for all of us when all the dust settles here. Certainly, things will change, roles and productions will definitely be transformed by AI. But I really hope to see some new art forms as a result of this and some new participation.
Explainable: Final question, for anyone starting out in the industry, what advice do you have for them in terms of what they need to learn?
Stephen: Yeah, I mean, writing and art history are definitely not going away. Being a good writer is about editing, about storytelling, and about communicating effectively with these machines. Art history is really about understanding the history of human art. And when you have that context in your back pocket, it gives you something great to communicate with in terms of your interaction with these machines and other people. And I think those two things are not going away.
Josh: I mean, if you're just interested in, like, “hey, let's make some cool music videos with DALL-E or Midjourney”, people are doing that all over and getting amazing results. But the thing that excites me is: where is the narrative? How are AI and the narrative cinemascape going to advance?
That's what I would recommend for any young person coming into the world of AI. You can't just type in “make me a movie about robots taking over Arizona” or something. You have to write. You have to know what a three-act structure is. You have to know the building blocks of how to make these things.
Expert Interview is a monthly feature where I talk to creatives about their use of generative AI tools: their wins, their frustrations, their tips, and their concerns. Want to talk about your own use of generative AI for a future issue? Get in touch at contact@explainable.online.