AI: 5 Things Dall-E or Midjourney Don’t Do Right (Yet)


Dall-E, Stable Diffusion, Midjourney… image generation through artificial intelligence has only just begun. These three AIs can do things that just over a year ago would have seemed like witchcraft. Over the last weeks and months, the Internet has been flooded with examples of what these networks can do. Today, however, we are going to shift the focus and discuss the weak points these artificial intelligences have in common.

Letters, words and texts

A widely used trick in the world of video games is to fill the scenery with posters written in unintelligible languages: fully synthetic alphabets that look real and form words that don’t exist. Programmers use this trick to save time, because then they do not have to localize the assets for each region in which the games are sold. Well, if you ask an AI to generate a poster with text on it, the result will be a heading or a paragraph made of entirely invented characters, very similar to what we see in video games.

Sometimes the AI will manage to produce characters we recognize, but it will put the letters in the wrong order or even repeat some of them.

Eyes


As a general rule, AIs struggle with eyes. Software such as Midjourney or Stable Diffusion can generate practically perfect human or animal faces. However, it takes several attempts to get eyes that look coherent.

It is quite normal to get red eyes, totally black eyeballs, or eyes that lack symmetry. Among the more acceptable results, there are also cases in which the artificial intelligence does not fully separate the white of the eye from the iris and the pupil. Luckily, there are other artificial intelligences, like GFPGAN, which are capable of repairing images that have weird faces or poorly resolved eyes.

Hands


How many fingers does a hand have? None of these AIs is quite sure. Artificial intelligences have a hard time understanding that the five fingers of a human hand are different from one another. You might get an image of a hand that only has two fingers. Or just the opposite: a whole catalog of index and ring fingers. This problem is quite common in Dall-E, Stable Diffusion and Midjourney.

Lateral thinking and context

[Image: a context problem in Midjourney]

On this point, the three main AIs each have their pros and cons, but once again there are problems they all share. If you push these AIs outside their comfort zone, you will get bad results. Do you want an image of a person with three eyes? Or one of a nine-tailed fox? Well, you may have a hard time, because the AI will sometimes not understand what you are asking of it. They are quite rigid, and have been trained in such a way that they don’t want you to break their molds.

Along the same lines, we have context analysis. Dall-E 2 takes the gold medal in this respect, but that does not mean you can skip explaining very carefully what you want the AI to paint for you. For the AI, an egg is one thing, and a fried egg is another. You have to describe the image as if you were explaining it to an alien. Otherwise, you will get a result that will make you laugh out loud, just as happened to me with the image I have given as an example. We will talk a little more about context in the final block of this article, as it is closely related to the last point.

Application of Censorship


When generative AIs began to show the world their full potential, we quickly learned that censorship was going to be our daily bread. The topic deserves a lengthy article of its own, but the problem here is not censorship itself, but the way it is applied.

We fully understand that an AI should prevent you from generating a pornographic image or one that encourages self-harm. But it makes no sense for something that calls itself “artificial intelligence” to work with a list of banned words.


In English (which is how you have to interact with the AI), the same word can easily have ten meanings. If even one of those meanings is on the list, you won’t be able to use the word. And we’re not talking about outlandish terms, but about ordinary words that we use on a daily basis. I tried to generate a texture of a leaf with many branches in Midjourney. I received a warning because you can’t paint ‘veins’ on that AI. I tried to create a giant Maine Coon cat that merged with the clouds (I have an image made with that same prompt in SD, and it gave me no trouble). The AI wouldn’t let me: after checking the Collins dictionary, I discovered that the term ‘Coon’ can be used with racist connotations. I wanted to generate a painting of a Renaissance woman cutting onions, but I couldn’t; the verb ‘cut’ is also censored.
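To see why a plain word list behaves this way, here is a minimal Python sketch of a context-blind filter. It is only an illustration: the function name, the list contents (limited to the three terms mentioned above) and the sample prompts are assumptions, and nothing here is taken from how Midjourney or Dall-E 2 actually implement their filters.

```python
import re
from typing import List

# Hypothetical blocklist: only the three terms reported as blocked in this article.
BANNED_WORDS = {"veins", "coon", "cut"}


def blocked_terms(prompt: str) -> List[str]:
    """Return every banned word found in the prompt, ignoring its context."""
    words = re.findall(r"[a-z]+", prompt.lower())
    return [word for word in words if word in BANNED_WORDS]


# Sample prompts paraphrased for illustration; they are not the exact prompts used.
prompts = [
    "texture of a leaf with many veins",
    "giant Maine Coon cat merging with the clouds",
    "Renaissance woman about to cut onions",
    "a dog wearing sunglasses",
]

for prompt in prompts:
    hits = blocked_terms(prompt)
    status = "BLOCKED " + str(hits) if hits else "allowed"
    print(prompt, "->", status)
```

The filter only looks at individual words, so it cannot tell a cat breed from a slur or a kitchen task from self-harm. Telling them apart would require looking at the whole prompt, which is exactly what a word list does not do.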

Censorship is the weak point of both Dall-E 2 and Midjourney. In Stable Diffusion, censorship can be bypassed by running the software on your own computer. It was obvious that these systems were going to be censored, but the program itself should have tools to determine what is malicious and what is not. Fine, don’t let me generate a photo of Lady Gaga, but don’t stop me from generating a dog wearing Lady Gaga sunglasses. AIs still have a long way to go on this point, as the censorship they are subjected to at the moment makes no sense.