
Guide: Stable Diffusion InPaint Masked Content Options Explained

While researching my Ultimate Guide to All Inpaint Settings, I noticed quite a lot of misinformation about what the different Masked Content options do in Stable Diffusion’s InPaint UI. To help clear things up, I’ve put together these visual aids to show what Stable Diffusion does when you select each option, and how you can use them to get the AI-generated images you want.

What are the InPaint Masked Content options?

Masked Content options can be found under the InPaint tab of the Stable Diffusion Web UI beneath the area where you can add your input image. The default value is “original”.

Masked Content Options located in the InPaint menu in the Stable Diffusion Web UI

These options determine what Stable Diffusion will use at the beginning of its iterative image generation process, which will in turn affect the output result. Here is a quick summary of what these options do:

  • Fill: The InPaint result will be generated from an extremely blurred version of the input image.
  • Original: The result will be generated from the original content of the sections of the image designated for alteration. This is what you will want most of the time.
  • Latent Noise: Select this option if you want the inpainted output to be very different from the original image, since the designated area will be inpainted from noise produced by the seed number. This is essentially starting from a blank slate.
  • Latent Nothing: With this option, Stable Diffusion will fill in the designated area with a single solid color that is a blend of the colors from the surrounding pixels. Select this option if you want the InPaint to be extremely different from the original image but still maintain a vestige of its color palette.
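To make the summary above more concrete, here is a toy sketch of the four starting points on a small grayscale image stored as nested lists. It follows the descriptions in this guide, not the Web UI's actual code: the function names are hypothetical, the blur is a crude repeated box blur, the "latent" options are imitated in pixel space rather than in Stable Diffusion's latent space, and "latent nothing" is approximated by the average of all unmasked pixels.

```python
import random


def box_blur(img, passes=8):
    """Repeated 3x3 box blur with edge clamping (a crude stand-in for a heavy blur)."""
    h, w = len(img), len(img[0])
    for _ in range(passes):
        img = [
            [
                sum(
                    img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                    for dy in (-1, 0, 1)
                    for dx in (-1, 0, 1)
                ) / 9.0
                for x in range(w)
            ]
            for y in range(h)
        ]
    return img


def init_masked_content(img, mask, mode, seed=0):
    """Return the toy starting image for one Masked Content option.

    img:  rows of floats in [0, 1] (grayscale)
    mask: rows of booleans, True where inpainting should happen
    mode: "original", "fill", "latent noise", or "latent nothing"
    """
    h, w = len(img), len(img[0])
    if mode == "original":
        source = img  # start from the untouched input
    elif mode == "fill":
        source = box_blur(img)  # start from a heavily blurred copy
    elif mode == "latent noise":
        rng = random.Random(seed)  # noise is reproducible from the seed
        source = [[rng.random() for _ in range(w)] for _ in range(h)]
    elif mode == "latent nothing":
        # Assumption: one flat value, the mean of the unmasked pixels
        unmasked = [img[y][x] for y in range(h) for x in range(w) if not mask[y][x]]
        flat = sum(unmasked) / len(unmasked)
        source = [[flat] * w for _ in range(h)]
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Only masked pixels take the new starting content
    return [
        [source[y][x] if mask[y][x] else img[y][x] for x in range(w)]
        for y in range(h)
    ]
```

In every mode the unmasked pixels come back unchanged; only the content that the sampler would start from inside the mask differs.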

To get a sense of what Stable Diffusion is doing for each of these options, we can look at the following visualizations, which were produced by inpainting Emma Watson’s face with a Denoising Strength of 0.0, a CFG Scale of 10, Seed 3946908895, and the Euler A Sampling Method. For this study, we also turned off the Restore Faces option in InPaint.

InPaint Masked Content Options after only 1 Sampling Step
Seed initialization appears to be the same regardless of which masked content option is selected in InPaint
InPaint Masked Content Options after only 2 Sampling Steps
Masked Content options begin to influence Stable Diffusion’s iterative generation process once the sampling step count reaches 2

When sampling steps are set to just 1, we can see that Stable Diffusion begins to initialize the generation process from the seed in a similar manner regardless of which Masked Content option is selected. At the second sampling step, Stable Diffusion then applies the masked content.

Effect of Masked Content Options on InPaint Output Images

With the above, you hopefully now have a good idea of what the Masked Content options are in Stable Diffusion. The next logical question then becomes: how do I use Masked Content to get the AI-generated images I want? To answer that, let’s use all of the same settings as the images above, but increase the sampling steps to 80. I also entered a text prompt to transform Emma Watson’s face into a 2D black and white checker pattern. Since the human mind is very good at spotting irregularities in both human faces and simple repeating geometric patterns, these two inputs seemed like a logical way to discern the differences in color and composition between the output images.

Study showing InPaint Masked Content options vs Denoising Strength

To help interpret the above, we must remember that Denoising Strength is the amount of noise that Stable Diffusion adds to an input image before iteratively resolving it into an output image that matches the text prompt. Higher Denoising Strength increases variation and reduces the influence of your input image on your output image, which makes high values useful for significant modifications like turning Emma Watson’s face into a 2D black and white checker pattern.
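As a rough illustration of that trade-off, the sketch below mixes a list of pixel values with seeded Gaussian noise in proportion to the Denoising Strength. This is a simplification under stated assumptions, not Stable Diffusion's actual noise schedule: the function names are hypothetical, the square-root weighting is one simple way to keep the overall variance steady, and the step-scaling rule is the one some img2img implementations use, assumed here for illustration.

```python
import math
import random


def add_noise(pixels, denoising_strength, seed=0):
    """Blend pixel values with seeded Gaussian noise.

    At 0.0 the input passes through unchanged; at 1.0 nothing of the
    input survives and the result is pure noise.
    """
    rng = random.Random(seed)
    keep = math.sqrt(1.0 - denoising_strength)  # weight on the input image
    mix = math.sqrt(denoising_strength)         # weight on the noise
    return [keep * p + mix * rng.gauss(0.0, 1.0) for p in pixels]


def effective_steps(sampling_steps, denoising_strength):
    """Assumed rule: only a strength-sized share of the steps is actually run."""
    return min(int(sampling_steps * denoising_strength), sampling_steps)
```

Under this assumed rule, `effective_steps(80, 0.5)` gives 40, which is one way to picture why low Denoising Strength leaves the input image largely intact: less noise is added and fewer steps are spent reshaping it.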

From the above X/Y plot that charts different Mask Options against increasing Denoising Strength, we can observe the following general trends:

  • Original is best for minor modifications that won’t change the input image’s composition. For significant changes, using “original” might not get you the result you want with InPainting, even with high Denoising Strength.
  • Fill retains a faint trace of the input image even at high Denoising Strength. This setting could be used if you want to give SD significant freedom while still retaining some vestige of the input image.
  • Latent noise is what should be used if you want to completely blow out masked areas so that the output has little relation to the input image’s composition or color.
  • Latent nothing is what should be used if you want to completely blow out the inpainted sections so that the output has no relation to the input’s composition but has some minor vestige of the masked area’s color palette. That statement is a little less obvious in this particular study using Emma Watson, but has been my anecdotal observation from other image generations.
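If you want to reproduce a study like this through a script rather than the UI, the following sketch builds one request body per cell of the X/Y plot. It assumes the AUTOMATIC1111 web UI's img2img API and its field names (`inpainting_fill`, `denoising_strength`, and so on) along with the index order of the Masked Content dropdown; check these against the API docs of your own install before relying on them. The function name and the example strengths are hypothetical, since the exact Denoising Strength values in the plot are not listed here.

```python
from itertools import product

# Assumed index order of the Masked Content dropdown in the web UI
MASKED_CONTENT = {"fill": 0, "original": 1, "latent noise": 2, "latent nothing": 3}


def build_grid(strengths, prompt, init_image_b64, mask_b64):
    """Return one img2img payload per (masked content, denoising strength) pair."""
    payloads = []
    for (name, index), strength in product(MASKED_CONTENT.items(), strengths):
        payloads.append({
            "prompt": prompt,
            "init_images": [init_image_b64],  # base64-encoded input image
            "mask": mask_b64,                 # base64-encoded mask image
            "inpainting_fill": index,         # which Masked Content option
            "denoising_strength": strength,
            "steps": 80,                      # settings used in this study
            "cfg_scale": 10,
            "seed": 3946908895,
            "sampler_name": "Euler a",
            "restore_faces": False,
        })
    return payloads


# Hypothetical strengths; substitute the values you want to compare
grid = build_grid(
    strengths=[0.25, 0.5, 0.75, 1.0],
    prompt="2D black and white checker pattern",
    init_image_b64="<base64 image>",
    mask_b64="<base64 mask>",
)
```

Each payload could then be sent to the img2img endpoint, keeping the seed fixed so that differences between cells come only from the Masked Content option and the Denoising Strength.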

Start Playing With InPaint Masked Content Options in Stable Diffusion

If you want to see how adjusting the masked content options will improve your AI-generated images in Stable Diffusion, here are a few options to get you started quickly without having to download and install it yourself:

Stable Diffusion Tutorial: How to In Paint
What are Sampling Steps and How to Reduce Them in Stable Diffusion
Stable Diffusion Denoising Strength Explained
How to Train a Custom Embedding in Stable Diffusion Tutorial
What are Seeds and How to Use Them