What is a Diffusion model?

Created by NovArch Studio, Modified on Wed, 25 Oct, 2023 at 7:32 PM by NovArch Studio

NovArch belongs to a class of deep learning models called diffusion models. They are generative models, meaning they are designed to generate new data similar to what they have seen in training. In the case of NovArch, the data are images.

Why is it called a diffusion model? Because its math looks very much like diffusion in physics. Let’s go through the idea.

Let’s say I trained a diffusion model with only two kinds of images: buildings and houses. In the figure below, the two peaks on the left represent the groups of building and house images.

Forward diffusion turns a photo into noise.

A forward diffusion process adds noise to a training image step by step, gradually turning it into a featureless noise image. The forward process will turn any building or house image into a noise image. Eventually, you won’t be able to tell whether it was originally a building or a house. (This is important.)

It’s like a drop of ink falling into a glass of water. The ink drop diffuses in the water. After a few minutes, it distributes itself randomly throughout the water. You can no longer tell whether it initially fell at the center or near the rim.
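To make the idea concrete, here is a minimal sketch of forward diffusion on a toy problem. It is not NovArch's actual code. Instead of real images, each "image" is a single number: buildings start at -2 and houses start at +2. The noise schedule (`BETA`, `STEPS`), the starting values, and the function name `forward_diffuse` are all assumptions made for illustration.

```python
import math
import random

BETA = 0.02    # fraction of noise mixed in at each step (assumed value)
STEPS = 500    # number of forward steps (assumed value)

KEEP = math.sqrt(1.0 - BETA)   # how much of the current value survives a step
NOISE = math.sqrt(BETA)        # how much fresh noise is added in a step


def forward_diffuse(x0, rng):
    """Gradually turn the starting value x0 into noise, one small step at a time."""
    x = x0
    for _ in range(STEPS):
        x = KEEP * x + NOISE * rng.gauss(0.0, 1.0)
    return x


rng = random.Random(0)

# Toy stand-ins for images: "buildings" start at -2, "houses" start at +2.
from_buildings = [forward_diffuse(-2.0, rng) for _ in range(1000)]
from_houses = [forward_diffuse(+2.0, rng) for _ in range(1000)]

building_mean = sum(from_buildings) / len(from_buildings)
house_mean = sum(from_houses) / len(from_houses)

# Both averages end up close to 0: the starting point has been forgotten.
print(building_mean, house_mean)
```

After enough steps, both groups look like the same bell-shaped noise centered at zero, just as the ink ends up spread evenly no matter where it was dropped.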

Below is an example of an image undergoing forward diffusion. The building image turns to random noise.

Reverse diffusion

Now comes the exciting part. What if we could reverse the diffusion? It would be like playing a video backward, going backward in time. We would see where the ink drop was initially added.

The reverse diffusion process recovers an image.

Starting from a noisy, meaningless image, reverse diffusion recovers a building image. This is the main idea.

Technically, every diffusion process has two parts: (1) drift, a directed pull, and (2) random motion. The reverse diffusion drifts towards either building OR house images but nothing in between. That’s why the result is always either a building or a house.
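The drift and random motion can be sketched on a toy problem where each "image" is a single number: buildings sit near -2 and houses near +2. This is only an illustration under stated assumptions, not NovArch's actual code. A real diffusion model learns the drift from training data with a neural network. Here the data is simple enough that the drift has a closed-form formula (the `score` function), so no training is needed. All names and constants (`PEAK`, `PEAK_STD`, `BETA`, `STEPS`, `score`, `reverse_diffuse`) are assumptions for this example.

```python
import math
import random

PEAK = 2.0       # toy "houses" sit near +2, toy "buildings" near -2 (assumed)
PEAK_STD = 0.2   # spread of each group (assumed)
BETA = 0.02      # noise added per forward step (assumed)
STEPS = 500      # number of diffusion steps (assumed)


def score(x, t):
    """Direction of increasing likelihood for the two-peak data after t noising steps."""
    alpha_bar = (1.0 - BETA) ** t                 # how much of the original signal is left
    m = PEAK * math.sqrt(alpha_bar)               # where the two peaks have moved to
    v = alpha_bar * PEAK_STD ** 2 + (1.0 - alpha_bar)  # how wide each peak has become
    return (m * math.tanh(m * x / v) - x) / v


def reverse_diffuse(x, rng):
    """Walk from pure noise back toward one of the two peaks."""
    for t in range(STEPS, 0, -1):
        # (1) Drift: a pull toward whichever peak is more plausible.
        drift = (x + BETA * score(x, t)) / math.sqrt(1.0 - BETA)
        # (2) Random motion: a small random kick (skipped on the final step).
        kick = math.sqrt(BETA) * rng.gauss(0.0, 1.0) if t > 1 else 0.0
        x = drift + kick
    return x


rng = random.Random(0)
samples = [reverse_diffuse(rng.gauss(0.0, 1.0), rng) for _ in range(500)]

houses = sum(1 for x in samples if x > 1.0)
buildings = sum(1 for x in samples if x < -1.0)
in_between = len(samples) - houses - buildings
print(houses, buildings, in_between)
```

Every sample starts as random noise, yet almost all of them should finish close to one of the two peaks, with very few landing in between. Which peak a sample reaches depends on its starting noise and the random kicks along the way.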