Ultimate Guide to Upscale Images with AI in Stable Diffusion

You don’t need to be an AI image generating wizard to want to upscale images. You may simply want to enlarge an image to make a high quality print, or you may have older digital photos that were taken at a lower resolution and want to increase their size for modern use. For those who do generate images with AI, many models work best at smaller sizes such as 512×512 or 768×768, and your hardware may struggle to generate images at larger sizes. Whatever your reason, you have a digital image that you want to make bigger. Here is a step-by-step guide on how you can do it in Stable Diffusion for all levels of users, and get better image quality than other free and even paid upscaling options.

For the purposes of this guide, I’ll be upscaling the following two images:

Photograph to be upscaled in Stable Diffusion
Illustration of Hayao Miyazaki's Cat Bus in My Neighbor Totoro to be upscaled in Stable Diffusion

What is Upscaling, and why is AI better at it?

Upscaling is the process of taking an image of a lower resolution and converting it to a higher resolution. You probably have been upscaling images already, and just didn’t know it, since upscaling is often referred to as “enlarging” or “resizing.” But what does that mean?

Let’s say you have a 1024×1024 image and you want to double its size. To do that, you need to double both its width and its height to make it 2048×2048 pixels, which means the image ends up with four times as many pixels in total. If you want that original image to fill a 4K monitor or TV, you would have to scale it by 4x to 4096×4096 pixels, or sixteen times the original pixel count.
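
To make the arithmetic concrete, here is a small Python sketch. The function name `scaled_size` is my own and purely illustrative. It shows that the scale factor applies to each side of the image, so the total pixel count grows by the square of that factor.

```python
def scaled_size(width, height, factor):
    """Return the new width, new height, and how many times more pixels result."""
    new_w, new_h = round(width * factor), round(height * factor)
    pixel_ratio = (new_w * new_h) / (width * height)
    return new_w, new_h, pixel_ratio


print(scaled_size(1024, 1024, 2))  # (2048, 2048, 4.0)
print(scaled_size(1024, 1024, 4))  # (4096, 4096, 16.0)
```

In other words, a 2x upscale has to come up with three new pixels for every original one, and a 4x upscale has to come up with fifteen.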

In order to increase the size of an image, an app or computer program will have to add pixels to it. This process is called image resampling. There are many ways resampling can be done. Before AI, these were the most common:

  • Nearest neighbor interpolation: Sometimes referred to as basic resampling. Pixels from the lower resolution image are copied and repeated to fill out all the pixels at the higher resolution. Because this looks pretty terrible, sometimes filters are applied after interpolation by apps to smooth out the jagged edges that are created this way. If you’re curious, this article does a great job of visualizing nearest neighbor interpolation.
  • Bilinear and bicubic interpolation: These algorithms take basic resampling to the next level by examining more of a pixel’s nearest neighbors and calculating a weighted average to produce a higher resolution output. Bilinear interpolation looks at 4 neighbors, whereas bicubic interpolation looks at 16 neighbors. Because they look at more neighbors, these resampling methods tend to produce smoother gradients and transitions in upscaled images.
  • Lanczos filtering: I will leave it to Wikipedia to provide a mathematical understanding of how Lanczos filtering works. For the purposes of image upscaling, it produces outputs that visually look very similar to bicubic interpolation.
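
If you would like to see the difference between the first two methods in code, here is a minimal pure-Python sketch that works on a tiny grayscale image stored as a list of rows. The function names are my own, and real image libraries handle edges, color channels and pixel alignment in their own ways, so treat this as an illustration of the idea rather than how any particular app implements it.

```python
def upscale_nearest(pixels, factor):
    """Nearest neighbor: repeat every source pixel factor x factor times."""
    out = []
    for row in pixels:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def upscale_bilinear(pixels, factor):
    """Bilinear: blend the 4 surrounding source pixels, weighted by distance."""
    src_h, src_w = len(pixels), len(pixels[0])
    out = []
    for y in range(src_h * factor):
        # Map the centre of the output pixel back into source coordinates,
        # clamping so that edge pixels never read outside the image.
        sy = min(max((y + 0.5) / factor - 0.5, 0.0), float(src_h - 1))
        y0 = int(sy)
        y1 = min(y0 + 1, src_h - 1)
        fy = sy - y0
        row = []
        for x in range(src_w * factor):
            sx = min(max((x + 0.5) / factor - 0.5, 0.0), float(src_w - 1))
            x0 = int(sx)
            x1 = min(x0 + 1, src_w - 1)
            fx = sx - x0
            top = pixels[y0][x0] * (1 - fx) + pixels[y0][x1] * fx
            bottom = pixels[y1][x0] * (1 - fx) + pixels[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


tiny = [[0, 100]]  # one dark pixel next to one bright pixel
print(upscale_nearest(tiny, 2)[0])   # [0, 0, 100, 100]
print(upscale_bilinear(tiny, 2)[0])  # [0.0, 25.0, 75.0, 100.0]
```

Nearest neighbor produces a hard step from 0 to 100, which is where the jagged edges come from, while bilinear produces a ramp through 25 and 75, which is why it looks smoother but also softer.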

So how does AI improve upon these? Well, all of the above resampling methods only look at the data provided within the lower resolution input image to generate a higher resolution output. AI upscalers can also draw on what they learned from the image sets they were trained on to fill in the missing pixels during the upscaling process. This means they can actually add new detail to the higher resolution image, rather than simply stretching the detail already present in the lower resolution image. I will demonstrate how this translates into a higher quality upscaled image in the sections below.

How to Start Upscaling in Stable Diffusion

The first step is to get access to Stable Diffusion. If you don’t already have it, then you have a few options for getting it:

Option 1: You can demo Stable Diffusion for free on websites such as StableDiffusion.fr.

Option 2: Use a pre-made template of Stable Diffusion WebUI on a configurable online service. I’ve written an article comparing different services and the advantages of using Stable Diffusion AUTOMATIC1111 v1.5 and v2.1 on RunPod.io. The screenshots in this guide were taken on RunPod.io, but the process is the same if you run the SD WebUI on your local hardware.

Option 3: Download AUTOMATIC1111’s Stable Diffusion WebUI by following the instructions for your GPU and platform below:

Once you have Stable Diffusion open, click on the Extras tab at the top. The upscaling menu will be the first one to appear.

Upscale menu under Extras in Stable Diffusion

If you’re just upscaling a Single Image, then you can stay on the first tab. If you have multiple images you want to upscale, then you can go to Batch Process to drag and upload multiple images into the Stable Diffusion Web UI, or you can go to the Batch from Directory tab to specify the folder containing the images to upscale and the folder you want Stable Diffusion to save the output images to. Note that if you produced an AI generated image within this session of Stable Diffusion, you can upscale it without saving and re-uploading it by hitting the “Send to extras” button in the lower right corner of the txt2img and img2img tabs.

The Resize setting under the Scale by subtab is the multiplier you want to upscale by. For example, if you want to increase a 512×512 image to 1024×1024, the multiplier you want is 2. The default setting to upscale in Stable Diffusion is 4. If you want to upscale your image to a specific size, then you can click on the Scale to subtab to enter the specific width and height you want to upscale to. If the aspect ratio of the target size is different from that of your input image, then you should check or uncheck Crop to Fit depending on how you want Stable Diffusion to handle that difference.
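
To illustrate what happens when the aspect ratios don’t match, here is a rough Python sketch of one plausible way to handle it: scale the image evenly until it covers the whole target area, then optionally cut a centered window out of it. The function name `scale_to_plan` and the exact rounding are my own assumptions, so the WebUI’s actual behavior may differ in the details.

```python
def scale_to_plan(width, height, target_w, target_h, crop_to_fit=True):
    """Return (output_size, crop_box) for a 'scale to' style resize.

    crop_box is (left, top, right, bottom) on the resized image,
    or None when no cropping is requested.
    """
    # Scale both sides by the same factor until the target area is covered.
    factor = max(target_w / width, target_h / height)
    resized_w, resized_h = round(width * factor), round(height * factor)
    if not crop_to_fit:
        # Keep the whole picture; one side may overshoot the target.
        return (resized_w, resized_h), None
    # Cut a centered window of exactly the requested size.
    left = (resized_w - target_w) // 2
    top = (resized_h - target_h) // 2
    return (target_w, target_h), (left, top, left + target_w, top + target_h)


# A 512x768 portrait image scaled to a square 1024x1024 target:
print(scale_to_plan(512, 768, 1024, 1024))
# ((1024, 1024), (0, 256, 1024, 1280))
print(scale_to_plan(512, 768, 1024, 1024, crop_to_fit=False))
# ((1024, 1536), None)
```

With cropping on, you get exactly the size you asked for but lose the top and bottom of the picture. With cropping off, you keep the whole picture but the output is taller than the size you entered.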

The Scale To subtab in the Upscale menu in Stable Diffusion

Once you’ve specified how much you want to upscale your image, you need to tell Stable Diffusion which model to use. For the purposes of this guide, we’ll just be using Upscaler 1. In some Stable Diffusion WebUIs this setting appears as a drop-down menu, whereas in others it appears as a list of radio buttons. Either way, select the model you want to use and then hit the Generate button to upscale your image.
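
If you would rather script these steps than click through them, AUTOMATIC1111’s WebUI can expose an HTTP API when it is launched with the `--api` flag. The sketch below is only a starting point under several assumptions: that your WebUI is reachable at `http://127.0.0.1:7860`, that it offers the `/sdapi/v1/extra-single-image` endpoint, and that the field names shown match your version. The helper names are my own. Check the `/docs` page of your running WebUI to confirm the fields before relying on this.

```python
import base64
import json
import urllib.request


def build_upscale_payload(image_bytes, upscaler="ESRGAN_4x", resize=2):
    """Build the JSON body for a single-image upscale request (assumed fields)."""
    return {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "resize_mode": 0,            # assumed: 0 means "scale by" a multiplier
        "upscaling_resize": resize,  # the Resize multiplier
        "upscaler_1": upscaler,      # must match a name in your Upscaler 1 list
    }


def upscale_via_api(image_bytes, base_url="http://127.0.0.1:7860", **options):
    """Send the image to the WebUI and return the upscaled image bytes."""
    body = json.dumps(build_upscale_payload(image_bytes, **options)).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/sdapi/v1/extra-single-image",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        result = json.loads(response.read())
    # Assumed: the response carries the result as a base64 string under "image".
    return base64.b64decode(result["image"])


# Example usage (requires a running WebUI, so it is left commented out):
# with open("kitty.png", "rb") as f:
#     upscaled = upscale_via_api(f.read(), upscaler="ESRGAN_4x", resize=2)
# with open("kitty_2x.png", "wb") as f:
#     f.write(upscaled)
```

The file names `kitty.png` and `kitty_2x.png` are placeholders. The upscaler name has to match one of the entries in your own Upscaler 1 list exactly.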

What are the Best Upscale Models in Stable Diffusion?

If you’re anything like me when I first started using Stable Diffusion, you probably looked at the list of upscale models with incomprehensible names and acronyms and weren’t sure what to pick. The list of available upscale models varies across different Stable Diffusion WebUI versions and templates, so I will be comparing the most popular AI models against traditional resampling methods. Up first, we have the photograph of a kitty:

As expected, the nearest neighbor interpolation method on the far left looks the most pixelated. Given its huge popularity as an image editor, I also tried Adobe Photoshop’s resampling methods (which can be found under Image > Image Size and then selected from the Resample drop-down menu) and found that Bicubic Smoother worked the best for our kitty photo. It smooths out the jagged edges in the kitty’s fur, but looks blurry under close inspection. The Lanczos option in Stable Diffusion looks extremely similar, which is not surprising since bicubic interpolation is often treated as a computationally cheaper approximation of Lanczos resampling.

The next two images in my comparison demonstrate how machine learning can produce extremely high quality upscaled images. ESRGAN_4x is based on the Enhanced Super-Resolution Generative Adversarial Network, and you can see in the side-by-side comparison why it took first place in the PIRM2018-SR Challenge. The upscaled image it produced is crisp, and it added some hair fibers to fill in areas that were lacking in detail.

LDSR, which is the Latent Diffusion Super Resolution upscaler that was included with the release of Stable Diffusion 1.4, also creates a decently crisp image, but it has a lot of drawbacks. The biggest one is that LDSR is extremely slow. The LDSR image took about 20 times as long to process as the other upscale models! It also noticeably altered the color profile of the kitty photograph, making the whole thing greener. For these two reasons, I do not recommend using LDSR for upscaling images in Stable Diffusion.

But what if we want to upscale an illustration, cartoon or anime instead of a photograph? Well that is where the image of Hayao Miyazaki’s Cat Bus comes in:

Here we can see that models good at upscaling photographs are not necessarily the best for upscaling illustrations and other forms of art in which crisp lines and solid colors are desirable. ESRGAN_4x introduces a substantial amount of noise to the upscaled illustration. The SwinIR model does a decent job, but R-ESRGAN 4x+ Anime6B is clearly the best at upscaling the Cat Bus’ whiskers.

For additional comparisons of more upscaler models, you can check out this comprehensive post on Reddit written by /u/Locke_Moghan/.

These comparisons demonstrate an important concept: the best AI image upscaler models to use are often the ones specifically trained on the type of art you want to upscale. However, many Stable Diffusion Web UI versions only have a specific set of models. How do you know what they were trained on, and if they’re not good at the type of image you’re trying to upscale, how can we find and use different ones? That is what we will cover in the next section.

How to Find and Use Custom Upscale Models

If you want to use upscale models that are not included in the default set provided in the version of Stable Diffusion you have, there are a few places you can look for them:

  • For custom trained models, the best place to start is this Custom Model Database, a community wiki listing many different custom models trained on different art styles and subjects. It includes links to where to download the upscale model files, recommendations on what multiplier they are optimized for, and what they’re trained to do. In addition to models focused on photo restoration and manga, there are also models specialized for upscaling skin, faces and foliage. And yes, there is a custom trained upscale model for cats.
  • For models officially released by their original developers, like new versions of ESRGAN and SwinIR, you can find them listed under the Official Model Database.
  • Hugging Face also has a repository of upscale models available for download, but documentation on these is rather sparse.

Note that the upscale model files have the *.pth file extension.

Once you’ve downloaded the upscale model you want to use, you need to add it to Stable Diffusion. Open up the models directory inside your Stable Diffusion directory. Inside the models directory are subfolders with names corresponding to the official models. You will want to put your model file ending in *.pth in the folder of the official model architecture it was trained with. For example, most of the models in the Custom Model Database were trained with the ESRGAN architecture. Therefore, you will want to put those in the ESRGAN subfolder in Stable Diffusion’s models directory, like the NMKD-Superscale model file shown in the image below.
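
If you manage a lot of model files, the same copy step can be scripted. This is a minimal sketch, assuming the folder layout described above (a `models` directory with one subfolder per architecture). The function name `install_upscaler` and the example paths are hypothetical, so adjust them to wherever your WebUI actually lives.

```python
import shutil
from pathlib import Path


def install_upscaler(model_file, webui_dir, architecture="ESRGAN"):
    """Copy a downloaded *.pth upscale model into models/<architecture>/."""
    model_file = Path(model_file)
    if model_file.suffix != ".pth":
        raise ValueError(f"Expected a .pth model file, got {model_file.name}")
    target_dir = Path(webui_dir) / "models" / architecture
    # Create the subfolder if this WebUI install doesn't have it yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / model_file.name
    shutil.copy2(model_file, destination)
    return destination


# Example usage with placeholder paths:
# install_upscaler("Downloads/my_upscaler.pth", "stable-diffusion-webui")
```

Copying rather than moving keeps your original download intact in case you put the file in the wrong subfolder the first time.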

A custom upscale model file in the ESRGAN subfolder in the models directory of Stable Diffusion

Once you’ve added the file to the appropriate directory, reload your Stable Diffusion UI in your browser. If you’re using a template in a web service like RunPod.io, you can also do this by going to the Settings tab and hitting the Reload UI button. Once the UI has reloaded, the upscale model you just added should appear as a selectable option in the Upscaler 1 setting under the Extras tab in Stable Diffusion.

With this, you should now know everything you need to start upscaling images!
