In my previous article on AI Art, I admitted to not knowing what the VAE does. I did find that it stands for Variational Autoencoder; the rest was just a magic black box. The articles I tried to reference threw a lot of math at me, so I decided to approach the question experimentally. Along the way, I found out what some of the other magic from the previous article was. I had admitted to borrowing keywords from other people to speed up getting somewhere. I’ve since been digging into what those keywords actually do, and uncovered an aspect I’d previously overlooked.

Automatic1111 and ComfyUI have different prompt syntaxes, which disguised the fact that some of those magic words are triggers for another type of module: Embeddings. Embeddings are much like LoRAs, but are triggered exclusively from within the prompts. The Automatic1111 syntax requires nothing but the Embedding’s name to trigger it, while ComfyUI differentiates them with an embedding: prefix added before the name. This can be combined with a weight value, so something like embedding:name:1.5 is valid. A number of these are specifically designed for the negative prompt. I’ve seen a number of names for that type, but the naming isn’t entirely consistent. They work by bundling up all the nightmare fuel you don’t want; including them in the negative prompt nudges the engine towards the positive.
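To keep the ComfyUI syntax straight, here’s a toy helper that builds an embedding trigger with or without a weight. The function name is mine, not part of either UI — just a sketch of the string format:

```python
def embedding_token(name, weight=None):
    """Build a ComfyUI-style embedding trigger for a prompt string.

    Automatic1111 would use the bare name; ComfyUI wants the
    embedding: prefix, optionally followed by :weight.
    """
    token = f"embedding:{name}"
    if weight is not None:
        token += f":{weight}"
    return token

print(embedding_token("EasyNegativeV2"))    # embedding:EasyNegativeV2
print(embedding_token("bad-hands-5", 1.5))  # embedding:bad-hands-5:1.5
```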

So, I’ve updated the prompts I’m using to make use of these embeddings. I also got lectured about representation by a fictitious source, so I updated the prompt to add a lady alongside the knight.

The new positive prompt is “knight with lady in gown in ballroom, masterpiece, best quality, sleek, highly detailed, digital painting, realistic digital painting, detailed digital painting, smooth gradients, caucasian, knight, armor, futuristic, cyberpunk, sleek, highly detailed, anime, indoors, ballroom, soft indoor lighting, day, digital painting, 1man:1.5, 1woman:1.5, muscular male, shapely female, duo, couple, redhead, green eyes, looking at viewer, armor, realistic, highres, smooth gradients, detailed face, realistic skin tone, youthful, strong, fantasy art, youthful, strong, full body”

The negative has been trimmed down a bit to “NSFW, (worst quality:2), (low quality:2), (normal quality:2), text, signature, lowres, watermark, embedding:EasyNegativeV2, embedding:HDA_BadHands_neg-neg, (embedding:bad-hands-5:1.5), embedding:BadDream, embedding:UnrealisticDream, (extra fingers, deformed hands, polydactyl:1.5),”

So many numbers

Now, in order to see the impact of the VAE on the images, I need to cut down on the other variables. So, it’s time to look at some of the options in the sampler. We’re mostly interested in the first two: “seed” and “control_after_generate”. Since computers don’t do true randomness, the seed value determines what the rest of the system ends up producing. All other values being equal, the same seed will generate the same image. You can either enter a seed manually or let the application come up with one. “control_after_generate” gives the options ‘randomize’, ‘increment’, ‘decrement’, or ‘fixed’; randomize is the default on installation. The key thing to bear in mind is that ‘after_generate’ is literal. When set to randomize, it will generate an image and then produce a new seed value. So on randomize, the seed value displayed will not be the one that produced the image just saved; that image was made from the previous seed value. I’ve found that when switching from fixed to randomize, hitting “queue prompt” will produce a new seed but not an image. So you can toggle to randomize, generate a new seed, then switch back to fixed before creating an image, in case you want to save the seed value.
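The seed behavior can be sketched with plain Python’s pseudo-random generator. The generate and queue_prompt functions below are stand-ins I made up to illustrate the idea, not ComfyUI’s actual API:

```python
import random

def generate(seed):
    # Stand-in for sampling: a seeded PRNG always yields the same "latent noise"
    rng = random.Random(seed)
    return [rng.random() for _ in range(4)]

# All other settings equal, the same seed gives the same image
assert generate(1234) == generate(1234)

def queue_prompt(seed, control_after_generate="randomize"):
    image = generate(seed)
    # 'after_generate' is literal: the new seed appears AFTER the image,
    # so the seed displayed next is not the one that made this image
    if control_after_generate == "randomize":
        seed = random.randrange(2**64)
    return image, seed
```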

That is what I did to come up with my test image for the VAE. Once I had something to work with, I left it on fixed. This kept everything consistent between images, so that the only difference would be the VAE’s effects. To do the test, I went and acquired some more VAE modules so I’d have a wider range of data to compare. Counting the default VAE, I ended up with a total of nine. How convenient for making a grid. So I fed the same settings through all nine and here’s what came out.
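Nine tiles drop neatly into a 3×3 contact sheet. Here’s a minimal sketch of the layout arithmetic (the 512-pixel tile size is an assumption, and I’m not tied to any particular imaging library):

```python
def grid_positions(n_tiles, cols, tile_w, tile_h):
    # Top-left pixel offset of each tile, row-major, for pasting into one sheet
    return [((i % cols) * tile_w, (i // cols) * tile_h) for i in range(n_tiles)]

# Nine VAE outputs, three per row
positions = grid_positions(9, 3, 512, 512)
print(positions[0])  # (0, 0)
print(positions[8])  # (1024, 1024)
```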

So, that’s more of a hallway than a ballroom, but we’re looking at the visual effect of the VAE. And wow, two of those really scrambled the image. After poking around, I found the most probable reason: it has to do with the checkpoint being used. The two VAEs that produced the distorted orange images were SDXL versions, and the checkpoint I used was not. SDXL is a variation of Stable Diffusion which may or may not be an improvement. I have a checkpoint which is set up for the XL version. I came up with a lengthy analogy to NTSC and PAL formats, but realized it might not have clarified the issue. If I switch to that checkpoint, we get a very different grid.
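In principle the mismatch is easy to guard against: an SDXL VAE decodes a different latent space than a base Stable Diffusion checkpoint produces, so if the two families disagree, you get garbage. A hypothetical sanity check (the family labels are mine, not real metadata fields):

```python
def check_vae_match(checkpoint_family, vae_family):
    # e.g. "sd15" vs "sdxl"; a VAE only decodes latents from its own family
    if checkpoint_family != vae_family:
        raise ValueError(
            f"VAE built for {vae_family} cannot decode {checkpoint_family} latents"
        )
    return True
```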

So much TV static

Fun.

All things considered, I think I’m still going to default to orangemix.

Now I still have two big topics to cover that I know of. Both involve changing an existing image in some manner. But we’ll save those for later articles.