When messing around with various LoRAs, I found a few that tended to create additional unwanted figures within the image. After some testing, I traced it to my habit of creating images at 1024×1024. When the width was halved, the extra people vanished. But I want 1024×1024 images. So what do I do?

I could reopen the images in Photoshop and make them bigger, but why should I have to, when the image generator supports the size I want? Well, Stable Diffusion supports upscaling in a variety of modes, but in my testing, directly resizing resulted in poor image quality. There are a couple of ways of upscaling, and so I ran some tests. Those tests made my computer cry. After looking into what was going on, I found some of the workflows were asking the machine to generate images at resolutions of 4096×4096 up to 8192×8192. This was because a number of nodes scaled by a factor instead of to a target value. My monitor is only 2560×1440, so images that big don’t do me any good. There was no point in making the machine do all that work.
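To put numbers on it, here’s a trivial sketch of the difference between scaling by a factor and scaling to a target value (the 4x and 2x factors are just examples, not the settings from any particular node):

```python
# Why "scale by factor" nodes blow up: each one multiplies whatever it is fed.
# The 4x and 2x factors below are illustrative, not from a specific workflow.
base = 1024

by_factor_once = base * 4               # 4096 x 4096
by_factor_chained = by_factor_once * 2  # 8192 x 8192 if a second factor node follows

# Scaling *to* a value stays put no matter what comes in.
target = 2560                           # roughly my monitor's width
print(by_factor_once, by_factor_chained, target)
```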

So, I refined the workflow to try a variety of methods to upscale the images without going too big. I ended up with a workflow that looked something like this:

This is an incomplete example

It got convoluted as I tried to cover all of the combinations of options, which rapidly multiply into a large number. So I’ll go over the various upscaling options I tried and what they were doing. Of those I got to work, there were three: “Upscale Image”, “Upscale Image using Model”, and “Upscale Latent”. Being the most obvious, I started with “Upscale Image”. It takes an image as input and produces an image as output. Within ComfyUI, you have images either from loading them directly or off of a “VAE Decode” (not counting the various upscaling nodes we’re testing). There may be more esoteric nodes which produce an image output, but I’m not digging for special cases just yet.

I needed an image to work with, so I used the same prompts as in the previous VAE article, but set the size to 512 and used different sampler values. The image produced was thus:

The vacant expressions really sell it.

So when I tested “Upscale Image” all by its lonesome, I fed it off the “VAE Decode” after the sampler. It has four options – method, width, height, and crop. Width and height are self-explanatory. Crop is disabled by default; its only other option is center. So I assume that if the node gets an image bigger than its defined dimensions, it will clip the edges to fit when set to center. I did not test this. The upscale methods are the same types you get when changing an image size in Photoshop, and I tested them all. The results I got were similar to Photoshop, but worse. That’s disappointing.
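If you want to run that same kind of comparison outside of ComfyUI, Pillow will do it in a few lines. This is a minimal sketch; the filter names are Pillow’s and the filenames are placeholders, so they only roughly correspond to the method options on the “Upscale Image” node:

```python
# Minimal sketch: run the same source image through several resampling filters.
# Filter names are Pillow's; they roughly correspond to the "Upscale Image"
# node's method options, but the node's implementation may differ.
from PIL import Image

METHODS = {
    "nearest": Image.Resampling.NEAREST,
    "bilinear": Image.Resampling.BILINEAR,
    "bicubic": Image.Resampling.BICUBIC,
    "lanczos": Image.Resampling.LANCZOS,
}

src = Image.open("sample_512.png")  # hypothetical 512x512 test image

for name, method in METHODS.items():
    upscaled = src.resize((1024, 1024), resample=method)
    upscaled.save(f"upscaled_{name}.png")
```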

The next node to try was “Upscale Image using Model”. I haven’t dug into upscaling models that much, but they are also separate files that can be acquired and added to your collection, like LoRAs, VAEs, and checkpoints. I have four, some of which came with the software. The results of upscaling via model were all slightly blurry. I was not happy with the results I was getting so far.
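For reference, the wiring for this test is just a model loader feeding an apply node. Here’s a rough sketch in ComfyUI’s API-style JSON, written as a Python dict; the node IDs, the model filename, and the assumption that node “8” is the VAE Decode are all placeholders, so check an export from your own install rather than trusting these values:

```python
# Rough sketch of the model-upscale branch in ComfyUI API-style JSON.
# Node IDs, the model filename, and exact wiring are assumptions, not an
# export from my actual workflow.
model_upscale_branch = {
    "10": {  # load the upscaling model file (lives alongside LoRAs/VAEs/checkpoints)
        "class_type": "UpscaleModelLoader",
        "inputs": {"model_name": "4x_example_upscaler.pth"},  # hypothetical filename
    },
    "11": {  # apply it to the image coming off the VAE Decode
        "class_type": "ImageUpscaleWithModel",
        "inputs": {
            "upscale_model": ["10", 0],  # [source node id, output index]
            "image": ["8", 0],           # "8" = the VAE Decode node, assumed id
        },
    },
}
```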

The last node was “Upscale Latent”. It had the exact same options as “Upscale Image”. Unlike the first two, this node worked against the latent image channel. But if you remember from the first article, the latent channel in front of the sampler is just a brown field. Upscaling that is pointless, as we can just ask for a larger blank latent. However, the sampler node puts out its own updated latent image. So we’d have to wedge this one between the sampler and the VAE Decoder. Let’s see what the results are…

That doesn’t look right.

What in the unholy corrosion is that?

So are we out of options? Do we resign ourselves to upscaling in Photoshop? Not a chance. We don’t have to accept the raw upscaled latent as the final image. We can feed it into a second sampler and use that funky image to make a new image at the higher scale. So, it’s time to lay out a second sampler. I cleaned up my workflow to have a more straightforward layout. Right now we’re working with this:

I tend to move disconnected nodes out of the way; pay them no heed.

On the top left, we have removed all LoRAs, giving just a checkpoint feeding a positive and negative prompt and the first sampler, which gets a blank latent image. The prompt is the same one I used for the VAE testing, but the seed is different. It feeds a VAE Decoder that is using the old reliable orangemix and goes to a Preview Image node. Preview Image is a save image node that doesn’t write to your output directory. It allows viewing the work product without ending up with scads of failed attempts in your output. Off of the prompt and sampler nodes, you will notice additional output lines feeding the latent upscaling node and putting the prompts into a second sampler node. Since outputs can feed multiple inputs (but inputs can only take one source), I attached another VAE Decoder and a preview node off the latent upscaler so we could see the garbled image that came off of it.
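For anyone who would rather read the wiring than squint at the screenshot, here is a rough sketch of that layout in ComfyUI’s API-style JSON, written as a Python dict. The node IDs, filenames, seed, and sampler settings are placeholders rather than the exact values I used:

```python
# Rough sketch of the resampling workflow described above, in ComfyUI
# API-style JSON (as a Python dict). Node IDs, filenames, seeds, and
# sampler settings are placeholders; a real graph export will differ.
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "first_checkpoint.safetensors"}},       # hypothetical
    "2": {"class_type": "CLIPTextEncode",
          "inputs": {"clip": ["1", 1], "text": "positive prompt here"}},  # placeholder
    "3": {"class_type": "CLIPTextEncode",
          "inputs": {"clip": ["1", 1], "text": "negative prompt here"}},  # placeholder
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 512, "height": 512, "batch_size": 1}},
    "5": {"class_type": "KSampler",    # first sampler: the normal 512x512 generation
          "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                     "latent_image": ["4", 0], "seed": 1234, "steps": 20,
                     "cfg": 7.0, "sampler_name": "euler", "scheduler": "normal",
                     "denoise": 1.0}},
    "6": {"class_type": "VAELoader",
          "inputs": {"vae_name": "orangemix.vae.pt"}},                    # hypothetical filename
    "7": {"class_type": "VAEDecode",   # decode + preview of the original image
          "inputs": {"samples": ["5", 0], "vae": ["6", 0]}},
    "8": {"class_type": "PreviewImage", "inputs": {"images": ["7", 0]}},
    "9": {"class_type": "LatentUpscale",  # upscale the sampler's latent to 1024
          "inputs": {"samples": ["5", 0], "upscale_method": "bilinear",
                     "width": 1024, "height": 1024, "crop": "disabled"}},
    "10": {"class_type": "VAEDecode",  # side branch: peek at the garbled upscaled latent
           "inputs": {"samples": ["9", 0], "vae": ["6", 0]}},
    "11": {"class_type": "PreviewImage", "inputs": {"images": ["10", 0]}},
    "12": {"class_type": "KSampler",   # second sampler: resample the upscaled latent
           "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                      "latent_image": ["9", 0], "seed": 1234, "steps": 20,
                      "cfg": 7.0, "sampler_name": "euler", "scheduler": "normal",
                      "denoise": 0.7}},  # denoise tuned down, discussed next
    "13": {"class_type": "VAEDecode",
           "inputs": {"samples": ["12", 0], "vae": ["6", 0]}},
    "14": {"class_type": "PreviewImage", "inputs": {"images": ["13", 0]}},
}
```

The actual screenshot also has a second checkpoint loader feeding the second sampler instead of reusing node “1”; that change comes up a little further down.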

This is where we finally start paying attention to the ‘denoise’ setting on the sampler. We want the output to resemble the input, so we can’t give the sampler a free hand. If we left the denoise setting at 1, the only thing the second sampler would pay attention to would be the dimensions of the incoming latent signal. The lower we tune it, the more the sampler sticks to the latent. Now, the latent we have here is pretty messy, with a lot of noise and artifacting as a result of the upscaling, so I ended up settling on a 0.7 denoise value. Since we are feeding the same text prompts to both samplers, the second sampler will still interpret the image in the same manner as the one that produced it.

So, what did we get?

I like how it added a lamp where the original just had a glow.

That’s not the same image. But we shouldn’t have expected the same image. We ran it through a second sampler, and gave that sampler permission to fill in the details that had been damaged by the upscale latent node. By resampling, we got additional details added into the image that a straight upscale would never be able to do. It does mean that the people have different faces and hairstyles, along with some new tassets for the lady’s gown. But there’s another thing we can do with this workflow. If you’re paying very close attention, you might notice that I’m using a second checkpoint loader for the second sampler. I said in the first article that you couldn’t use two different checkpoints. Well, that’s partially true. A given sampler can only use one checkpoint. This is a whole second sampler, so it’s an entirely different run through the engine. It doesn’t have to use the same checkpoint as the first.

So I changed the second checkpoint loader to use a different checkpoint from the first. Eagle-eyed Glibs will note that the example workflow already shows this version. What this allows us to do is change the entire art style of the piece. I had a few examples that didn’t really show off this feature, as the results stayed within the margin of error of the original feed, so I dug out checkpoints for comic book styles and old pulp covers. From the same sampler settings, I got very different resampling results.

Teleported into the void.

They made him a villain. Is she a hostage? That expression says maybe.

Since that pulp cover checkpoint has been a pain to work with, I’m not surprised that it wrecked the background and added stuff that I can’t make heads or tails of. I’ve not used the comic book style one much, but it did mess up the arms in between the two figures.

Modification of existing images is a deep rabbit hole we’ve barely touched. We’ll have to explore a few of its branches in future installments.