You can read more about the inner mechanics of NovArch AI in the FAQ section; here, let’s go through some examples of what happens under the hood.
Text-to-image
In text-to-image, you give NovArch AI a text prompt, and it returns an image.
Step 1. NovArch AI generates a random tensor in the latent space. You control this tensor by setting the seed of the random number generator. If you set the seed to a certain value, you will always get the same random tensor. This is your image in latent space. But it is all noise for now.
A random tensor is generated in latent space.
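As an illustration of Step 1, here is a minimal sketch (using PyTorch) of generating a seeded random latent tensor. The function name and the 4x64x64 shape follow the latent size mentioned in Step 2; this is an assumption for illustration, not NovArch AI's actual API.

```python
import torch

def make_initial_latent(seed: int, batch_size: int = 1) -> torch.Tensor:
    """Create the starting latent tensor (pure noise) for one image.

    The 4x64x64 shape matches the latent tensor described in Step 2.
    Reusing the same seed always reproduces the same noise tensor.
    """
    generator = torch.Generator().manual_seed(seed)
    return torch.randn(batch_size, 4, 64, 64, generator=generator)

# The same seed gives the same starting noise, and therefore the same final image.
assert torch.equal(make_initial_latent(42), make_initial_latent(42))
```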
Step 2. The noise predictor (a U-Net) takes the noisy latent image and the text prompt as input and predicts the noise, also in latent space (a 4x64x64 tensor).
Step 3. Subtract the latent noise from the latent image. This becomes your new latent image.
Steps 2 and 3 are repeated for a certain number of sampling steps, for example, 20 times.
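To make Steps 2 and 3 concrete, here is a hedged sketch of the sampling loop. The `unet`, `scheduler`, and `text_embeddings` objects are assumptions patterned after the Hugging Face diffusers API, standing in for NovArch AI's internals; real samplers scale the predicted noise at each step rather than subtracting it outright.

```python
def sample(unet, scheduler, text_embeddings, latent, num_steps: int = 20):
    """Run the denoising loop: predict noise, remove it, repeat."""
    scheduler.set_timesteps(num_steps)
    for t in scheduler.timesteps:
        # Step 2: the U-Net predicts the noise in the latent image,
        # conditioned on the text prompt (its embeddings).
        noise_pred = unet(latent, t, encoder_hidden_states=text_embeddings).sample
        # Step 3: remove the predicted noise to obtain the new latent image.
        # The scheduler handles the per-step scaling of this subtraction.
        latent = scheduler.step(noise_pred, t, latent).prev_sample
    return latent
```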
Step 4. Finally, the decoder of the VAE converts the latent image back to pixel space. This is the image you get after running NovArch AI.
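For Step 4, a sketch of the final decode, again assuming a diffusers-style VAE. The scaling factor is the one commonly used by latent diffusion VAEs and is an assumption here, not a confirmed NovArch AI value.

```python
import torch

def decode_latent(vae, latent: torch.Tensor) -> torch.Tensor:
    """Convert the finished latent image back to pixel space with the VAE decoder."""
    with torch.no_grad():
        # Latent diffusion models typically rescale latents before decoding;
        # 0.18215 is the factor used by common Stable Diffusion VAEs (an assumption here).
        image = vae.decode(latent / 0.18215).sample
    # Map pixel values from [-1, 1] to [0, 1] for display or saving.
    return (image / 2 + 0.5).clamp(0, 1)
```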
Here’s how the image evolves at each sampling step.
Image at each sampling step.