What is Conditioning?

Created by NovArch Studio, Modified on Wed, 25 Oct, 2023 at 7:33 PM by NovArch Studio

Our understanding is still incomplete: where does the text prompt enter the picture? Without it, NovArch AI is not a text-to-image model. You would get an image of a building or a house, with no way to control the result.

This is where conditioning comes in. The purpose of conditioning is to steer the noise predictor so that subtracting the predicted noise from the image gives us the result we want.
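To make the idea of steering concrete, here is a toy sketch in Python. It is not how NovArch AI is implemented: the functions `predict_noise` and `denoise_step`, the `step_size` parameter, and the use of a short list of numbers as an "image" are all assumptions made for illustration. A real noise predictor is a trained neural network. The toy only shows that the same noisy input moves toward a different result once a conditioning signal is supplied.

```python
def predict_noise(noisy, conditioning=None):
    """Hypothetical stand-in for a learned noise predictor.

    The toy treats everything that differs from a target as noise.
    With no conditioning, the target is a fixed default (all zeros).
    With conditioning, the conditioning itself is the target.
    """
    target = conditioning if conditioning is not None else [0.0] * len(noisy)
    return [n - t for n, t in zip(noisy, target)]


def denoise_step(noisy, conditioning=None, step_size=0.5):
    """Subtract a fraction of the predicted noise from the image."""
    noise = predict_noise(noisy, conditioning)
    return [n - step_size * e for n, e in zip(noisy, noise)]


noisy_image = [1.0, 1.0]

# Unconditioned: the result drifts toward the default target.
print(denoise_step(noisy_image))              # [0.5, 0.5]

# Conditioned: the same input is steered toward a different target.
print(denoise_step(noisy_image, [0.0, 2.0]))  # [0.5, 1.5]
```

The only difference between the two calls is the conditioning argument, which is the role the text prompt plays in a text-to-image model.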

Text conditioning (text-to-image)

Below is an overview of how a text prompt is processed and fed into the noise predictor.

The tokenizer first converts each word in the prompt to a number called a token. Each token is then converted to a 768-value vector called an embedding. The embeddings are then processed by the text transformer, after which they are ready to be consumed by the noise predictor.

How the text prompt is processed and fed into the noise predictor to steer image generation.
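The three stages described above (tokenizer, embedding, text transformer) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not NovArch AI's actual code: the three-word vocabulary, the random embedding table, and every function name are hypothetical, and `text_transformer` is a simple averaging placeholder rather than a real transformer. Production tokenizers also commonly split words into sub-word pieces instead of mapping whole words. Only the 768-value embedding size comes from the text.

```python
import random

EMBED_DIM = 768  # embedding size stated in the article


def tokenize(prompt, vocab):
    """Toy word-level tokenizer: map each word to its integer token."""
    return [vocab[word] for word in prompt.lower().split()]


def make_embedding_table(vocab_size, dim=EMBED_DIM, seed=0):
    """One vector per token. Random here; learned in a real model."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(vocab_size)]


def embed(tokens, table):
    """Look up the embedding vector for each token."""
    return [table[t] for t in tokens]


def text_transformer(embeddings):
    """Placeholder, NOT a real transformer: blend each vector with the
    mean of all vectors so every position carries some prompt context."""
    n = len(embeddings)
    dim = len(embeddings[0])
    mean = [sum(vec[i] for vec in embeddings) / n for i in range(dim)]
    return [[0.5 * v + 0.5 * m for v, m in zip(vec, mean)]
            for vec in embeddings]


vocab = {"a": 0, "modern": 1, "house": 2}  # hypothetical vocabulary
tokens = tokenize("A modern house", vocab)
table = make_embedding_table(len(vocab))
conditioning = text_transformer(embed(tokens, table))

print(tokens)                                 # [0, 1, 2]
print(len(conditioning), len(conditioning[0]))  # 3 768
```

The output keeps one 768-value vector per token, and this sequence of vectors is what the noise predictor would receive as its conditioning input.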