What is Conditioning?

Created by NovArch Studio, Modified on Wed, 25 Oct, 2023 at 7:33 PM by NovArch Studio

So far the picture is incomplete: where does the text prompt come in? Without a prompt, NovArch AI is not a text-to-image model. It would simply produce an image of some building or house, with no way to control the result.


This is where conditioning comes in. The purpose of conditioning is to steer the noise predictor so that, once the predicted noise is subtracted from the noisy image, what remains is the image we want.
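
To make the idea of steering concrete, here is a toy sketch in Python. It is not NovArch AI's actual model: `predict_noise`, `denoise`, the step size, and the four-value "images" are all assumptions made up for illustration. The stand-in predictor simply treats the conditioning vector as the clean image, which is enough to show that the same starting noise ends up in different places depending on the conditioning.

```python
import random


def predict_noise(noisy_image, conditioning):
    # Toy stand-in for a trained noise predictor (an assumption, not the real
    # network): it treats the conditioning vector as the clean image and
    # reports everything that differs from it as noise.
    return [pixel - target for pixel, target in zip(noisy_image, conditioning)]


def denoise(noisy_image, conditioning, steps=20, step_size=0.5):
    # Repeatedly subtract a fraction of the predicted noise from the image.
    image = list(noisy_image)
    for _ in range(steps):
        noise = predict_noise(image, conditioning)
        image = [pixel - step_size * n for pixel, n in zip(image, noise)]
    return image


random.seed(0)
start = [random.gauss(0.0, 1.0) for _ in range(4)]  # the same random noise for both runs

cond_a = [1.0, 1.0, 0.0, 0.0]  # hypothetical conditioning for one prompt
cond_b = [0.0, 0.0, 1.0, 1.0]  # hypothetical conditioning for another prompt

result_a = denoise(start, cond_a)
result_b = denoise(start, cond_b)
```

Both runs start from identical noise, yet `result_a` ends up close to `cond_a` and `result_b` close to `cond_b`. Only the conditioning changed.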


Text conditioning (text-to-image)


Below is an overview of how a text prompt is processed and fed into the noise predictor. 


The tokenizer first converts each word in the prompt to a number called a token. Each token is then converted to a 768-value vector called an embedding. The text transformer then processes the embeddings, and its output is ready to be consumed by the noise predictor.

How the text prompt is processed and fed into the noise predictor to steer image generation.
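
The three stages described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not NovArch AI's implementation: the word-level `tokenize`, the pseudo-random `embed`, and the mean-blending `text_transformer` are hypothetical placeholders for the trained components. Only the 768-value embedding size comes from the article.

```python
import random

EMBEDDING_SIZE = 768  # from the article: each token becomes a 768-value vector


def tokenize(prompt, vocabulary):
    # Hypothetical word-level tokenizer: each new word gets the next free id.
    tokens = []
    for word in prompt.lower().split():
        if word not in vocabulary:
            vocabulary[word] = len(vocabulary)
        tokens.append(vocabulary[word])
    return tokens


def embed(token):
    # Stand-in for a learned embedding table: a deterministic pseudo-random
    # vector per token id, so the same token always maps to the same vector.
    rng = random.Random(token)
    return [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_SIZE)]


def text_transformer(embeddings):
    # Placeholder for the real text transformer: blend each embedding with
    # the prompt-wide mean so every vector carries some context.
    count = len(embeddings)
    mean = [sum(column) / count for column in zip(*embeddings)]
    return [[0.5 * v + 0.5 * m for v, m in zip(vector, mean)]
            for vector in embeddings]


vocabulary = {}
tokens = tokenize("modern house with glass facade", vocabulary)
embeddings = [embed(token) for token in tokens]
conditioning = text_transformer(embeddings)  # what the noise predictor would consume
```

For this five-word prompt, `tokens` holds five numbers and `conditioning` holds five vectors of 768 values each, one per token.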
