What is MidJourney?

MidJourney is an AI-powered system that creates images from user prompts. On their website, they describe themselves as: “An independent research lab. Exploring new mediums of thought. Expanding the imaginative powers of the human species.”

How does it work?

Like OpenAI’s DALL-E and Google’s Imagen, MidJourney is a text-to-image generation tool: it converts a text prompt into an image using a diffusion model.

The model is trained on a dataset of images paired with text descriptions. The goal of training is to learn a mapping from text descriptions to images, so that generated images match their descriptions as closely as possible.

During training, the model learns to produce images that resemble the real images in the dataset. Once trained, it can start from random input, such as noise, and turn it into new images.
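In diffusion models, "turning noise into an image" usually means repeatedly asking a trained network to estimate the noise in the current image and removing a little of it. The sketch below is a generic DDPM-style sampling loop, not MidJourney's (unpublished) implementation. Everything in it is an assumption for illustration: images are short lists of floats, the noise schedule is the linear one from the original DDPM paper, and `make_oracle` is a hypothetical stand-in for a trained network. It works only because the toy "dataset" is a single image, so the ideal noise estimate can be computed exactly and the loop can be checked.

```python
import math
import random

def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances; an assumed linear schedule."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for s <= t: how much original signal survives."""
    out, running = [], 1.0
    for b in betas:
        running *= 1.0 - b
        out.append(running)
    return out

def sample(predict_noise, betas, size, rng):
    """DDPM-style ancestral sampling: start from pure noise, denoise one step at a time."""
    alpha_bars = cumulative_alphas(betas)
    x = [rng.gauss(0.0, 1.0) for _ in range(size)]  # the random starting input
    for t in reversed(range(len(betas))):
        eps_hat = predict_noise(x, t)  # network's estimate of the noise in x at step t
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        scale = 1.0 / math.sqrt(1.0 - betas[t])
        mean = [scale * (xi - coef * ei) for xi, ei in zip(x, eps_hat)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # re-inject a little noise on all but the last step
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x

# Hypothetical stand-in for a trained network. If the whole "dataset" were a
# single image, the ideal noise estimate could be computed exactly from it.
TARGET = [0.5, -0.25, 1.0, 0.75]

def make_oracle(target, betas):
    alpha_bars = cumulative_alphas(betas)
    def predict_noise(x, t):
        s, n = math.sqrt(alpha_bars[t]), math.sqrt(1.0 - alpha_bars[t])
        return [(xi - s * ti) / n for xi, ti in zip(x, target)]
    return predict_noise

betas = linear_betas()
image = sample(make_oracle(TARGET, betas), betas, len(TARGET), random.Random(0))
print([round(v, 3) for v in image])
```

With a real network, `predict_noise` would also receive the text prompt, and the result would be a new image rather than a copy of `TARGET`; the oracle is there only to make the loop verifiable.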

Diffusion models work by corrupting the training data with Gaussian noise, added step by step until the details are gone and only pure noise remains, and then training a neural network to reverse this corruption process. Running the learned reverse process synthesises new data from pure noise, removing a little noise at each step until a clean sample remains.
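The corruption half of this process has a convenient closed form in DDPM-style models, so a training image can be pushed to any noise level in one step. The sketch below assumes the linear schedule from the original DDPM paper (1,000 steps, variances from 1e-4 to 0.02) and treats an "image" as a short list of floats; the helper names are illustrative, not taken from any library. It prints how the weight on the original signal shrinks towards zero while the weight on the noise grows towards one.

```python
import math
import random

def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances (an assumed linear schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for s <= t."""
    out, running = [], 1.0
    for b in betas:
        running *= 1.0 - b
        out.append(running)
    return out

def add_noise(x0, t, alpha_bars, rng):
    """Jump straight to step t: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    s, n = math.sqrt(alpha_bars[t]), math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    return [s * xi + n * ei for xi, ei in zip(x0, noise)], noise

alpha_bars = cumulative_alphas(linear_betas())
for t in (0, 250, 500, 999):
    print(f"t={t:3d}  signal weight {math.sqrt(alpha_bars[t]):.4f}  "
          f"noise weight {math.sqrt(1.0 - alpha_bars[t]):.4f}")

# Knowing the exact noise is enough to undo the corruption, which is one
# reason DDPM-style models train the network to predict that noise.
image = [0.5, -0.25, 1.0, 0.75]
t = 999
noisy, noise = add_noise(image, t, alpha_bars, random.Random(0))
recovered = [(xi - math.sqrt(1.0 - alpha_bars[t]) * ei) / math.sqrt(alpha_bars[t])
             for xi, ei in zip(noisy, noise)]
```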

The model was trained on a set of annotated images (images paired with text descriptions) to which noise was gradually added.
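One common way to turn annotated images into a training signal is the noise-prediction objective popularised by DDPM: corrupt an image to a random noise level, show the network the noisy image and its caption, and penalise the difference between its noise estimate and the noise actually added. MidJourney has not published its objective, so treat this as a generic sketch. The schedule, the toy image and caption, and both `untrained_model` and `cheating_model` are hypothetical; the latter "knows" the clean image and exists only to show that a perfect estimate drives the loss to zero.

```python
import math
import random

def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for s <= t."""
    out, running = [], 1.0
    for b in betas:
        running *= 1.0 - b
        out.append(running)
    return out

def training_loss(predict_noise, image, caption, alpha_bars, rng):
    """One DDPM-style training example: corrupt a captioned image, score the noise estimate."""
    t = rng.randrange(len(alpha_bars))            # pick a random corruption level
    noise = [rng.gauss(0.0, 1.0) for _ in image]
    s, n = math.sqrt(alpha_bars[t]), math.sqrt(1.0 - alpha_bars[t])
    noisy = [s * xi + n * ei for xi, ei in zip(image, noise)]
    predicted = predict_noise(noisy, t, caption)  # the caption is the model's guide
    # Mean squared error between the predicted and the actual noise.
    return sum((p - e) ** 2 for p, e in zip(predicted, noise)) / len(noise)

# Assumed linear schedule over 1,000 steps.
alpha_bars = cumulative_alphas([1e-4 + i * (0.02 - 1e-4) / 999 for i in range(1000)])
image = [0.5, -0.25, 1.0, 0.75]     # toy "image"
caption = "a lighthouse at dusk"    # toy annotation

def untrained_model(noisy, t, caption):
    """Hypothetical placeholder network that always predicts zero noise."""
    return [0.0] * len(noisy)

def cheating_model(noisy, t, caption):
    """Hypothetical 'perfect' model that knows the clean image, used to check the loss."""
    s, n = math.sqrt(alpha_bars[t]), math.sqrt(1.0 - alpha_bars[t])
    return [(xi - s * x0) / n for xi, x0 in zip(noisy, image)]

print(training_loss(untrained_model, image, caption, alpha_bars, random.Random(0)))
print(training_loss(cheating_model, image, caption, alpha_bars, random.Random(0)))
```

A real training run would average this loss over many images, captions and noise levels, and update the network's weights to reduce it.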

The diffusion model’s goal is to reverse this process and remove the noise from the image, using the image’s text annotation as a guide. The result is a model that can hallucinate fantastic images from pure noise.
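The text above does not say exactly how the annotation steers the denoising. One widely used technique in text-to-image diffusion models is classifier-free guidance: the network estimates the noise twice at each step, once with the prompt and once without, and the final estimate is pushed further in the direction the prompt suggests. Whether MidJourney uses it is not stated here. The noise estimates below are made-up numbers, and a guidance scale of 7.5 is just an illustrative value.

```python
def guided_noise(eps_cond, eps_uncond, guidance_scale):
    """Classifier-free guidance: extrapolate from the prompt-free noise estimate
    towards the prompt-conditioned one. A scale of 1.0 gives the plain conditional
    estimate; larger values follow the text more strongly."""
    return [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]

# Hypothetical outputs of one network evaluated twice at the same step:
# once with the prompt and once with an empty prompt.
eps_with_prompt = [0.2, -0.1, 0.4]
eps_without_prompt = [0.0, 0.1, 0.4]
guided = guided_noise(eps_with_prompt, eps_without_prompt, guidance_scale=7.5)
print(guided)
```

Where the two estimates agree (the last value), guidance changes nothing; where they differ, the difference is amplified, which is how the prompt gets a stronger say in what emerges from the noise.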
