26 Sep, 22

Neural networks: how did they learn to create pictures?

Let’s find out why neural networks learn so quickly, in which professions they will replace us, and why digital progress can be frightening.

By Qrius

How did neural networks learn to draw?

In just one month, graphical neural networks have transformed from entertainment for geeks into a working tool for illustrators. The main principle of creating images has also changed: now, it doesn’t matter whether you can wield a brush or a stylus; juggling words matters much more. The revolution happened almost overnight, and there are several reasons for that.

The first neural network capable of generating high-quality pictures from text descriptions in English was DALL-E from OpenAI (co-founded by Elon Musk). The developers unveiled it on Jan. 5, 2021, but the program was available to almost no one. The second version, DALL-E 2, which appeared in April of this year, was already capable of creating photorealistic images, one of which was featured on the cover of Cosmopolitan. Now neural networks illustrate games like teen patti real cash, and even books. However, there were a lot of restrictions on working with the neural network: the resulting images could not be used for commercial purposes, it was forbidden to generate people’s faces, etc. And it would have gone on like that if not for competitors.

We’ll find out:

How the AI began to draw;
How it is dangerous for us;
What will happen next.

The legend

The leading competitor was the legendary David Holz, who wrote his Ph.D. in hydromechanics at NASA and the Max Planck Society. First, he founded Leap Motion, a company developing revolutionary 3D controllers for gesture-based interfaces, and 12 years later, Midjourney, which employs fewer than ten people. While testing a prototype of the Midjourney neural network last September, Holz discovered an interesting thing: most people don’t know what they want. The AI asks, “What do you want?” and gets the answer, “A dog.” “What kind of dog?” “A pink one.” Then the user sees a pink dog in the picture, and that is enough for them.

But if you put people together in a group, someone will add something like, “I want a space dog,” and another will add, “An Aztec space dog.” It’s already a game of imagination: people like to create together. Eventually, Holz decided to make Midjourney a social app: to get in, you must sign up for Discord, the instant messaging system initially favored by esports players. Midjourney now has a gigantic community on Discord: a million people who come up with new images together. “Every time you ask an AI to draw an illustration, Midjourney doesn’t remember anything it’s done before,” says David Holz. “It has no will, goals, intention, or ability to tell stories. Will, intention, and stories are us. The neural net is just an engine for imagination. The machine has nowhere to go, but people do. It’s a collective human mind, equipped with modern technology.”

Converging Images

The human brain is fascinating: on the one hand, it subconsciously looks for something familiar in any picture as a sign of safety, and on the other hand, it is invigorated by novelty. These two principles always guide the brain. When you show a person so-called divergent patterns, images in which the brain cannot find ordinary meanings, they become uncomfortable. Artists use this method intuitively, and it takes them decades to arrive at it. In neural networks, “convergence/divergence” can be changed manually. Correctly balancing a neural network is an art: if you over-twist it, the picture will become uninteresting; if you under-twist it, your brain will boil.

Midjourney solved this problem beautifully: people write requests in Discord, and the AI generates images that all participants see and evaluate. Thus, Midjourney users (a million of them!) who leave likes and comments act as free data labelers. And the neural network learns from their reactions: these pictures are good and those are not so good, so the weights need to be adjusted so that there are more pictures of the first type and fewer of the second.
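The article does not say how Midjourney actually uses these reactions. Purely as an illustration of the general idea of adjusting weights from likes and dislikes, here is a toy sketch using a logistic-regression-style update. The two image features, the function names, and the learning rate are all hypothetical.

```python
import math


def score(weights, features):
    """Predicted probability that users will like an image."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def update(weights, features, liked, lr=0.5):
    """One gradient step of logistic regression on a single reaction."""
    error = (1.0 if liked else 0.0) - score(weights, features)
    return [w + lr * error * x for w, x in zip(weights, features)]


# Hypothetical two-number description of each image: [colorful, blurry].
feedback = [
    ([1.0, 0.0], True),   # colorful image, liked
    ([0.0, 1.0], False),  # blurry image, disliked
    ([1.0, 0.0], True),
    ([0.0, 1.0], False),
]

weights = [0.0, 0.0]
for _ in range(20):  # replay the same reactions several times
    for features, liked in feedback:
        weights = update(weights, features, liked)

# The colorful image should now score higher than the blurry one.
print(score(weights, [1.0, 0.0]) > score(weights, [0.0, 1.0]))
```

After training, the weight for the liked feature is positive and the weight for the disliked one is negative, so images of the first type are scored higher.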

Specialization

DALL-E 2 is considered the best graphical neural network today: it has fantastic quality, a powerful language model, and a massive database of images. In a word, if you are a fan of photorealism, this is the place for you. Midjourney, on the other hand, is the hottest project and everyone’s favorite. It draws on data from across the Internet, both pictures and text descriptions. And most importantly, it does not strive for realistic illustrations.

In 1970, Masahiro Mori, a robotics professor, noticed that robots that look too humanlike provoke dislike, fear, or disgust in people. This phenomenon was called the “uncanny valley” effect. Our brain involuntarily registers small differences, creating a persistent feeling of inconsistency with reality: “something is wrong here.” That’s why a photorealistic image has to be of very high quality, especially the eyes. Midjourney moved away from this concept from the start and generated “art.” But not abstract art. “The world needs more beautiful things, so we want everything to look beautiful and artistic,” says David Holz.

In the language of machines

With the advent of AI, the job market has also changed. For example, a new profession has been born in the last few months: the prompt designer, who formulates requests for a neural network so that the pictures turn out beautiful. Whereas previously an artist had to learn how to draw, now they have to learn to speak a particular language. During the testing of DALL-E, an interesting feature was revealed: if you add the phrase “Unreal Engine” to the text prompt, the final picture becomes brighter and higher in contrast. It turned out that the AI’s training data included screenshots from Unreal Engine that were bright and contrasty. And then the imagination kicked in: if you put in “ArtStation” (an online marketplace for professional artists, designers, and illustrators), the image gets more artistic. Even adding the word “wow” would make the picture better.

By the way, plain description is not the only way to talk to a neural network. The language of photography is also suitable: just set the camera position, focal length, film sensitivity, and shutter speed, and specify the brand and type of lens. Or choose a style: “Make it like Helmut Newton.” If you know a lot about art, order impressionism, realism, or a picture in Van Gogh colors. If computer graphics is closer to you, write in Maya terms.
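As a rough illustration of this way of writing prompts, the sketch below assembles a text prompt from a subject plus optional style, camera, and “booster” terms. The function `build_prompt` and its parameters are hypothetical and not part of any real image-generation API; the resulting string would simply be typed into whichever tool you use.

```python
def build_prompt(subject, style=None, camera=(), boosters=()):
    """Join a subject with optional style, camera, and booster terms.

    All names here are illustrative assumptions, not a real API.
    """
    parts = [subject.strip()]
    if style:
        parts.append(f"in the style of {style}")
    # Camera settings are written as plain words, e.g. "50mm lens".
    parts.extend(camera)
    # Boosters are phrases such as "Unreal Engine" or "ArtStation".
    parts.extend(boosters)
    # Drop empty fragments and separate the rest with commas.
    return ", ".join(p for p in parts if p)


prompt = build_prompt(
    "an Aztec space dog",
    style="Helmut Newton",
    camera=["low camera angle", "50mm lens", "ISO 400"],
    boosters=["Unreal Engine", "ArtStation"],
)
print(prompt)
```

Keeping the subject, style, and technical terms separate makes it easy to swap one of them and compare the resulting pictures.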

Which professions will disappear?

Illustrators of children’s books will be hit hardest. Why? Because in books for pre-schoolers the vocabulary is limited, the semantic load is light, and the pictures should not be complicated, so it will be very easy for AI to match this style. Here we leave the question of our children’s aesthetic education out of the picture. Still, looking at modern publications, we would say that this battle was lost even without the participation of artificial intelligence. However, neural networks already cope not only with simple stories but also, for example, with illustrations of Pasternak’s poems. If progress continues, we will soon see books that are not only illustrated but also written by neural networks.

Disclaimer:

As per the Public Gambling Act of 1867, all Indian states, except Goa, Daman and Sikkim, prohibit gambling
Land-based casinos are legalized, with certain guidelines, in Goa and Daman, as per the Goa, Daman and Diu Public Gambling Act 1976
Land-based casinos, Online gambling and E-gaming (games of chance) are legalized in Sikkim under the Sikkim Online Gaming (Regulation) Rules 2009
Only some Indian states have legalized online/regular lotteries as per and subject to the conditions laid down by state laws. Kindly refer to the same here
Horse racing and betting on horse racing, including online betting, is permitted only in a licensed premise in select states. Kindly refer to the 1996 Judgement by the Supreme Court Of India here and for more information
This article does not endorse or express the views of Qrius and/or any of its staff.
