OpenAI Releases An Artificial Intelligence Tool That Can Produce a Full Image From Text
Nov 22, 2022
Aftab Alam

OpenAI, an artificial intelligence company affiliated with Microsoft, has announced an AI system that can take a text description of an object and generate a highly realistic image of it on its own. The system also lets a person edit an image using simple tools and text instructions, without traditional Photoshop or digital art expertise.

"We hope that tools like these democratize people's capacity to build whatever they want," said Alex Nichol, an OpenAI researcher on the project. He believes the tool could be valuable for product designers, magazine cover designers, and artists, either as a source of inspiration or to generate final works. He also suggested that game developers might want to use it to create scenes and characters, even though the software only creates still images, not animation or video.

The software could make it easier to generate racist memes, fake images for propaganda or disinformation, or even pornography. OpenAI says it has taken precautions to limit the software's capabilities in these areas: it removed such images from the AI's training dataset, and it applies rule-based filters and human content review to the images the AI generates.

Similarly, OpenAI is aiming to maintain strict control over the distribution of the new AI, which it refers to as a "research project" rather than a commercial service. According to the company, it is only distributing the software to a "limited and selected number of beta testers." In the past, OpenAI's natural-language processing advancements have routinely made their way into commercial products within 18 months.

OpenAI's program is called DALL-E 2, and it is an upgraded version of DALL-E, which OpenAI first launched in early 2021. The name is confusing, but it is meant as a mashup of WALL-E, the robot from the Pixar film, and Dalí, as in Salvador Dalí, the surrealist artist, which makes sense given the system's strange character.

The original DALL-E could only generate images in a cartoonish style, frequently on a plain background. The new DALL-E 2 can produce high-resolution, photo-quality images with rich backdrops, depth-of-field effects, realistic shadows, shading, and reflections.

Previously, images this realistic were possible with computer-rendered graphics, but creating them required a high level of creative expertise. Now a user can simply type "a Shiba Inu wearing a beret and a white shirt" and the AI will generate thousands of lifelike variations on that description.

DALL-E is a 12-billion-parameter version of GPT-3 that was trained on a dataset of text-image pairs to generate images from text descriptions. OpenAI reports that it can, among other things, produce anthropomorphized representations of animals and objects, combine unrelated concepts in plausible ways, render text within images, and make adjustments to existing images.

GPT-3 demonstrated that language can be used to instruct a large neural network to perform a range of text generation tasks. Image GPT demonstrated that the same type of neural network can also be used to generate high-fidelity images. DALL-E extends these findings to show that manipulating visual concepts through language is now possible.

DALL-E is a transformer language model, similar to GPT-3. It receives the text and the image as a single stream of data of up to 1280 tokens, and it is trained using maximum likelihood to generate all of the tokens one after another. This training procedure lets DALL-E not only create an image from scratch but also regenerate any rectangular region of an existing image that extends to the bottom-right corner, in a way that is consistent with the text prompt.
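To make the single-stream idea concrete, here is a minimal Python sketch of how such a stream could be laid out and partially regenerated. It is an illustration under stated assumptions, not OpenAI's implementation. The split into 256 text tokens followed by a 32 x 32 grid of 1024 image tokens comes from OpenAI's published description of the original DALL-E, not from this article, which gives only the 1280 total. The function names and the `sample_next` callable, which stands in for the trained model, are hypothetical.

```python
import math

# Assumption: the 256 + 1024 split comes from OpenAI's published description
# of the original DALL-E; the article itself only gives the 1280 total.
TEXT_TOKENS = 256
IMAGE_GRID = 32                                # image tokens form a 32 x 32 grid
IMAGE_TOKENS = IMAGE_GRID * IMAGE_GRID         # 1024
STREAM_LENGTH = TEXT_TOKENS + IMAGE_TOKENS     # 1280


def image_position(row, col):
    """Stream index of the image token at grid cell (row, col), in raster order."""
    return TEXT_TOKENS + row * IMAGE_GRID + col


def positions_to_regenerate(top, left):
    """Stream indices of the rectangle from (top, left) to the bottom-right corner."""
    return [
        image_position(row, col)
        for row in range(top, IMAGE_GRID)
        for col in range(left, IMAGE_GRID)
    ]


def negative_log_likelihood(token_probs):
    """Maximum-likelihood training minimizes this sum over the stream.

    token_probs[i] is the probability the model assigned to the correct
    token i, given all of the tokens before it.
    """
    return -sum(math.log(p) for p in token_probs)


def complete_stream(stream, regenerate, sample_next):
    """Re-sample the selected positions left to right, keeping all other tokens.

    sample_next is a hypothetical stand-in for the model: it receives the
    tokens before a position and returns a token for that position.
    """
    out = list(stream)
    for pos in sorted(regenerate):
        out[pos] = sample_next(out[:pos])
    return out


if __name__ == "__main__":
    stream = [0] * STREAM_LENGTH                        # placeholder tokens
    region = positions_to_regenerate(top=16, left=16)   # bottom-right quadrant
    # Toy "model": emits the length of the prefix it was shown.
    result = complete_stream(stream, region, sample_next=lambda prefix: len(prefix))
    print(len(region), result[image_position(31, 31)])
```

One plausible reading of the bottom-right restriction follows from this layout: if image tokens are generated in raster order, everything above and to the left of such a region already sits earlier in the stream, so the model can condition on it when filling the region in.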

OpenAI acknowledges that work involving generative models could have far-reaching societal implications.

The company intends to investigate how models like DALL-E relate to social issues such as the economic impact on specific work processes and professions, the potential for bias in model outputs, and the longer-term ethical challenges posed by this technology.

DALL-E 2 also makes image editing easier. The user simply draws a box around the part of the image that needs to be altered and enters natural-language instructions describing the change. For example, to change the Shiba Inu's beret, draw a box around the beret and type "make the beret red"; the beret will turn red while the rest of the image stays the same. DALL-E 2 can also generate the same image in a variety of styles that the user specifies in plain text.

DALL-E 2 is an important step toward OpenAI's goal of creating artificial general intelligence (AGI). According to Ilya Sutskever, cofounder and chief scientist of OpenAI, AGI is a single piece of software capable of attaining human-level performance or better across a wide range of different tasks. He added that AGI would require "multimodal" conceptual understanding, or the ability to associate a word with an image or set of images and vice versa.

OpenAI has previously pursued AGI through natural-language processing. The company's only commercial product is a programming interface that allows other companies to use GPT-3, a vast natural-language processing system that can compose long passages of novel text and perform a variety of other natural-language tasks, such as translation and summarization.

However, DALL-E 2 is not perfect. In complex scenes, it sometimes fails to render details, and its lighting and shadow effects can contain flaws. It is also weaker than some other multimodal AI systems at understanding "binding properties," meaning which attribute belongs to which object. If you ask it for "a red cube on top of a blue cube," it will occasionally produce an image with the red cube below the blue cube.
