Text-to-image has a problem. Well, it has lots of problems, but I’m going to call this particular one the Elephant in the Room Problem. The Elephant in the Room Problem presents like this:
Hey ChatGPT-4 draw me a picture of a room with no elephant in it:
You can see the problem. It’s quite a big one. Maybe you even peered closely at your screen to confirm that there is, indeed, an elephant in the room and it’s not just staring in ominously from the garden.
Maybe it’s just a quirk of DALL-E 2. Let’s pop over to Midjourney and give the same prompt.
That’s even worse. The DALL-E 2 elephant at least attempted to not make a scene. Maybe it’s just a weird quirk coming from an unusual request. Let’s try something more straightforward.
Hey ChatGPT, draw a picture of a man without a beard.
Cool, cool. We’re seeing a pattern here. We can double-check with Midjourney.
Yep.
This isn’t a brand new problem. It has always been there, but it’s getting renewed attention this week. It’s a good example of what drives discourse around AI on social media. For the cynics, the above demonstrates why the AI hype is all a bit silly and that we are nowhere close to human intelligence.
But it helps the AI hype merchants too, because they’re not merely hyping AI; they are also hyping their unique ability to harness AI. So the Elephant in the Room Problem becomes an obstacle only the clued-in can overcome.
In truth, most people fooling around with text-to-image have probably come up with workarounds for the odd little ways AI tools stumble on seemingly straightforward asks. And most of us have made our peace with the fact that these tools are not perfect, but not too shabby either.
The AI researcher Jeremy Nguyen offers the following solution to The Elephant in the Room Problem, “If you’re frustrated [with text-to-image] and it keeps doing the wrong thing—remember: Don't tell it not to do something. That's like telling someone not to think of a pink elephant”.
So the Elephant in the Room Problem can be solved, as any emotionally repressed person can tell you, by simply never mentioning it.
Small Bits #1: Can you find Waldo/Wally?
Sticking with the dunking-on-text-to-image-tools theme of today’s Explainable, this was the honest-to-God image I got from DALL-E 3 this morning when asking for a Where’s Waldo (Wally for UK and Irish readers) image. Leaving aside the copyright concerns, there is another almost elephant-sized problem with this result.
Small Bits #2: Don’t do free work
I spotted this last week on LinkedIn. It’s the natural consequence of the (legitimate) hype around AI and machine learning jobs and their salary expectations. Obviously, no one should volunteer to help a software company. Or any company for that matter. But here we are.
Small Bits #3: An AI murder mystery
Someone made a murder mystery using DALL-E and posted it to Reddit. Because it’s AI, the victim is in lingerie for no particular reason. But the full image allows for zooming in on tiny details, and the uploader shared ten images with further detail. The solution is posted, with spoiler warnings, in the Reddit thread and the creator can be found here.