How is training done?

Created by NovArch Studio, Modified on Wed, 25 Oct, 2023 at 7:32 PM by NovArch Studio

The idea of reverse diffusion is undoubtedly clever and elegant. But the million-dollar question is, “How can it be done?”

To reverse the diffusion, we need to know how much noise was added to an image. The answer is to train a neural network to predict that noise. In NovArch AI this network is called the noise predictor, and it is a U-Net model. Training goes as follows.

Pick a training image, like a photo of a building.
Generate a random noise image.
Corrupt the training image by adding this noise image over a certain number of steps.
Teach the noise predictor to tell us how much noise was added. This is done by showing it the correct answer and tuning its weights to reduce the prediction error.
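
The four steps above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions, not NovArch AI's actual training code: the function names (`corrupt`, `make_training_example`, `noise_prediction_loss`), the 1000-step count, and the linear blending schedule are all hypothetical, and the U-Net itself is left out.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_STEPS = 1000  # assumed number of noising steps, for illustration only


def corrupt(image, noise, step, num_steps=NUM_STEPS):
    """Blend a clean image with noise; a higher step leaves less of the image."""
    # Hypothetical linear schedule: the share of the image kept falls from 1 to 0.
    keep = 1.0 - step / num_steps
    return np.sqrt(keep) * image + np.sqrt(1.0 - keep) * noise


def make_training_example(image, num_steps=NUM_STEPS):
    """Return (noisy image, step, true noise) for one training image."""
    noise = rng.standard_normal(image.shape)        # generate a random noise image
    step = int(rng.integers(1, num_steps + 1))      # pick how far to corrupt
    noisy = corrupt(image, noise, step, num_steps)  # corrupt the training image
    return noisy, step, noise


def noise_prediction_loss(predicted_noise, true_noise):
    """Mean squared error between the predicted and the true noise."""
    return float(np.mean((predicted_noise - true_noise) ** 2))
```

A training loop would pass `noisy` and `step` to the noise predictor, compare its output with `noise` using the loss, and adjust the weights to make that loss smaller.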

Noise is sequentially added at each step. The noise predictor estimates the total noise added up to each step.

After training, we have a noise predictor capable of estimating the noise added to an image.

Reverse diffusion

Now we have the noise predictor. How do we use it?

We first generate a completely random image and ask the noise predictor to estimate the noise in it. We then subtract this estimated noise from that image. Repeat this process a few times, each time starting from the result of the previous step, and you will get an image of either a building or a house.

Reverse diffusion works by subtracting the predicted noise from the image successively.
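
The loop described above can be sketched as follows. This is a deliberately simplified illustration, not the sampler NovArch AI uses: `reverse_diffusion`, `toy_noise_predictor`, and the rule of removing `1 / step` of the estimate are hypothetical choices made so the example runs without a trained U-Net. Real samplers use a schedule-dependent update instead.

```python
import numpy as np


def reverse_diffusion(predict_noise, shape, num_steps=20, seed=0):
    """Start from pure noise and repeatedly subtract the predicted noise."""
    rng = np.random.default_rng(seed)
    image = rng.standard_normal(shape)    # completely random starting image
    for step in range(num_steps, 0, -1):  # count down from the noisiest step
        estimated_noise = predict_noise(image, step)
        # Simplified update (an assumption): remove a growing share of the
        # estimate, so the final step removes whatever noise remains.
        image = image - estimated_noise / step
    return image


# Stand-in for a trained noise predictor: it "knows" the clean image and
# reports everything else as noise. A real predictor is a trained U-Net.
clean_image = np.full((4, 4), 0.5)


def toy_noise_predictor(image, step):
    return image - clean_image


result = reverse_diffusion(toy_noise_predictor, clean_image.shape)
```

With this toy predictor, `result` ends up close to `clean_image`. A trained model has no stored target, so what emerges depends on what the predictor learned from its training images.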

You may notice we have no control over whether a building or a house is generated. We will address this when we talk about conditioning. For now, image generation is unconditioned.

You can read more about reverse diffusion sampling and samplers in this article.