To understand what it is, we first need to touch on its predecessor, classifier guidance.
Classifier guidance
Classifier guidance is a way to incorporate image labels in diffusion models. You can use a label to guide the diffusion process. For example, the label “house” steers the reverse diffusion process to generate photos of houses.
The classifier guidance scale is a parameter that controls how closely the diffusion process should follow the label.
Suppose there are three groups of images with the labels “house”, “building”, and “car”. If the diffusion is unguided, the model draws samples from each group’s whole population, and it may sometimes produce an image that fits two labels, e.g. a house with a car parked in front.
Classifier guidance. Left: unguided. Middle: small guidance scale. Right: large guidance scale.
With high classifier guidance, the images produced by the diffusion model are biased toward extreme or unambiguous examples. If you ask the model for a house, it returns an image that is unambiguously a house and nothing else.
The classifier guidance scale controls how closely the guidance is followed. In the figure above, the sampling on the right uses a higher classifier guidance scale than the one in the middle. In practice, the scale is simply a multiplier on the drift term that pulls the sample toward images with that label.
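To make this concrete, here is a minimal sketch (in PyTorch) of how the scaled classifier gradient is added to the unconditional score. The `uncond_score_fn` and `classifier` names are hypothetical stand-ins, not part of any particular implementation:

```python
import torch

def classifier_guided_score(x, y, uncond_score_fn, classifier, guidance_scale):
    # Gradient of log p(y | x) with respect to the noisy image x,
    # taken from a (hypothetical) classifier trained on noisy images.
    x = x.detach().requires_grad_(True)
    log_probs = torch.log_softmax(classifier(x), dim=-1)
    class_grad = torch.autograd.grad(log_probs[:, y].sum(), x)[0]
    # The classifier guidance scale multiplies this drift term, pulling
    # the sample harder toward images the classifier assigns to label y.
    return uncond_score_fn(x) + guidance_scale * class_grad
```

A larger `guidance_scale` pushes samples toward the unambiguous examples of the class, which is the effect shown on the right of the figure above.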
Classifier-free guidance (CFG)
Although classifier guidance achieved record-breaking performance, it needs an extra classifier model to provide the guidance, which complicates training.
Classifier-free guidance, in its authors’ terms, is a way to achieve “classifier guidance without a classifier”. Instead of using class labels and a separate model for guidance, they proposed to use image captions and train a conditional diffusion model, exactly like the one we have in text-to-image.
They built the classifier’s role into the conditioning of the noise-predictor U-Net, achieving so-called “classifier-free” guidance (i.e. guidance without a separate image classifier) in image generation.
The text prompt provides this guidance in text-to-image.
Classifier-free guidance scale
Now we have a classifier-free diffusion process using conditioning. How do we control how much the AI-generated images should follow the guidance?
Classifier-free guidance scale (CFG scale) is a value that controls how much the text prompt steers the diffusion process. When the CFG scale is set to 0, the image generation is unconditioned (i.e. the prompt is ignored); a higher CFG scale steers the diffusion toward the prompt.
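As a rough sketch of how this plays out inside the sampler (the names here are illustrative; real implementations differ in detail), the noise predictor is run twice per step, once with the prompt and once without, and the CFG scale blends the two predictions:

```python
def cfg_noise_prediction(unet, x_t, t, prompt_embedding, null_embedding, cfg_scale):
    # Conditional prediction (with the prompt) and unconditional prediction
    # (with an "empty" prompt embedding).
    eps_cond = unet(x_t, t, prompt_embedding)
    eps_uncond = unet(x_t, t, null_embedding)
    # cfg_scale = 0 reproduces the unconditional prediction (prompt ignored);
    # larger values push the result further toward the prompt.
    return eps_uncond + cfg_scale * (eps_cond - eps_uncond)
```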
In NovArch AI, the recommended CFG scale for architectural images is between 6 and 8.