We’ve had brief introductions to text, image and video AI production, and now it’s the turn of music. The spectre of AI-generated music feels a little more dystopian. There are plenty of perfectly functional videos, graphics and pieces of text out there, created by people who happily admit they’re not filmmakers or artists or writers. But music feels different. The promise that anyone can create music at the touch of a button hits an AI uncanny valley, as likely to repulse as to impress.
It’s still a well-served area of gen-AI, though. Most text-to-music providers frame their tools as a step up from synthesizers or mixing desks: just the next tool that talented artists will benefit from, rather than something that could flatten the gloriously rolling world of music. Meta’s MusicGen and Google’s MusicLM are the heavy hitters. Meta follows its open-sourcing strategy, which means the tool is most readily available from Hugging Face. The Google tool is in beta in the Google AI Test Kitchen.
With my journalist hat on, I got access to the Test Kitchen after a one-day wait. Students, tech workers and artists are also included in the application drop-down options, so it doesn’t look like a particularly strict door policy. While MusicLM is what’s being served up in the kitchen at the moment, the tools change pretty regularly, so it’s a good way to get a taste of what’s coming down the line.
Meta and Google’s tools both nail an approximation of the type of lo-fi music many of us like to play when we don’t want to be distracted by music. But once you get beyond music-as-inoffensive-wallpaper and ask for some variety, say a crescendo, the flaws become clearer. At this point, the tools feel like they could serve one of two people: someone who hates music and wants to feel nothing from it, or someone who is deeply knowledgeable about music and wants to experiment with something new. The rest of us in the middle probably have just enough knowledge to make something that sounds awful. Though the same goes for most of us with real-world musical equipment, I suppose.
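If you do want to poke at MusicGen directly, the Hugging Face route is only a few lines. Here’s a minimal sketch using the transformers text-to-audio pipeline and the small checkpoint; the prompt is my own invention, so treat it as a starting point rather than a recipe.

```python
# A rough sketch: generate a short lo-fi clip with MusicGen's small checkpoint.
# Requires: pip install transformers scipy (and a reasonably patient machine).
from transformers import pipeline
import scipy.io.wavfile

synth = pipeline("text-to-audio", model="facebook/musicgen-small")

# The prompt is mine, not a magic incantation; swap in whatever wallpaper you like.
music = synth(
    "gentle lo-fi beat for a sunny morning, unobtrusive, no crescendos",
    forward_params={"do_sample": True},
)

# Write the generated audio out as a WAV file you can loop in an editor.
scipy.io.wavfile.write("lofi_loop.wav", rate=music["sampling_rate"], data=music["audio"])
```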
Boomy and the money promise
The other tool that crops up around music and AI is Boomy. An earlier arrival to the AI music space, Boomy grabbed headlines before generative AI became a mainstream concept. The company pitch includes the prospect of money-making, encouraging users to submit their songs to streaming platforms.
Any actual musician will tell you exactly how difficult it is to make money from music, particularly via streaming payments. What people are buying from Boomy, even if the company isn’t doing the hard sell, is the prospect that AI can be a gateway to immediate earnings.
Meanwhile, the music industry wrestles with the new threats and opportunities AI presents as the technology grows apace. Spotify recently removed songs not because they were AI-generated, but because their listeners may have been AI-generated. Google and Universal Music are reportedly in talks on how to license the work of artists when it is used in AI-generated music. In short, and as with all creative industries, this could leave artists with more tools, but industry gatekeepers with more control.
There’s an episode of Parks and Rec where Ben Wyatt is briefly unemployed and spends three weeks making a terrible, mercifully short, claymation video. It’s both funny and haunting for anyone who has ever got excited about a project that turned out to be, well, nothing.
The terrible video above was made in less than 30 minutes. The audio is from Boomy and involved no creative spark on my part, just clicking three buttons, one labeled “lo-fi” and another “morning sun”. It was then run through Meta’s MusicGen and looped in iMovie. The video is from Gen-2 by Runway, and that was built from an image generated in Midjourney. That prompt: “a robot playing decks in the style of artwork for a nineties eurodance single”. It’s important to emphasize that this is not meant to be good; it’s just demonstrating that the bar for an aimless person’s cry for help has been raised.
Staying on music, TextFX from Google Lab Sessions dropped last week. It’s a glossy collaboration between rapper Lupe Fiasco and Google. It’s the kind of thing we’ll see a lot of as marketing departments look to make AI developments resonate with wider audiences.
The experiment uses an LLM to “explore creative possibilities with text and language”. It’s essentially a slicker version of that old writer staple of sticking-words-into-thesaurus.com.
So if you want to include a good simile in your dope rap song about, say, King Charles’ sausage hands, you type it in and TextFX gives you: “The king’s sausage fingers looked like giant hot dogs poking out of his silk gloves.”
Or, to give another random example, if you want to build a chain of semantically related items about, say, King Charles’ sausage hands, then TextFX pops back with “sausage, breakfast, pancake, syrup, sugar, sweet, cake, frosting”, which is perhaps too American-breakfast-related for a rap about a British monarch’s hands but could still be useful.
The possibilities are endless and, it’s important to note, are not solely restricted to lyrics about King Charles’ sausage hands.
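TextFX itself is a polished web app, but the trick underneath is just a well-framed prompt to a language model. Here’s a rough sketch of that idea using an open instruction-tuned model on Hugging Face; the model choice and prompt wording are my own guesses, not Google’s actual setup, and the results will be a lot less slick than Lupe Fiasco’s.

```python
# A homemade approximation of TextFX's simile feature: ask an instruction-tuned
# model for a simile about a topic. Model and prompt are my own choices.
from transformers import pipeline

generate = pipeline("text2text-generation", model="google/flan-t5-base")

prompt = "Write a vivid simile about King Charles' sausage hands."
print(generate(prompt, max_new_tokens=40)[0]["generated_text"])
```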
My father-in-law once injured his hand stabbing a cabbage for the BBC sound effects department. A stabbed cabbage is a much more HR-friendly way to simulate the sound of an actual stabbing. But the days of cabbage stabbing may be over. Meta also released its AudioGen model last week, which can generate audio effects such as sirens passing or a man whistling. It has a less user-friendly placement on Hugging Face, but it could be a game changer for amateur true crime podcasters. Is that good news though?
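Either way, if you’d rather stab a virtual cabbage than a real one, the Hugging Face route to AudioGen is only a few lines. This is a sketch following the pattern in Meta’s audiocraft README, with example prompts of my own; it has not been through BBC-grade testing.

```python
# A rough sketch of generating sound effects with AudioGen via Meta's audiocraft
# library (pip install audiocraft). Prompts are my own examples.
from audiocraft.models import AudioGen
from audiocraft.data.audio import audio_write

model = AudioGen.get_pretrained("facebook/audiogen-medium")
model.set_generation_params(duration=5)  # five seconds per clip

descriptions = ["a siren passing by", "a man whistling a tune"]
wavs = model.generate(descriptions)  # one waveform per description

for idx, wav in enumerate(wavs):
    # Saves effect_0.wav, effect_1.wav, etc., with loudness normalisation.
    audio_write(f"effect_{idx}", wav.cpu(), model.sample_rate, strategy="loudness")
```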