I don’t like failing to solve a puzzle, so when I ended the Inpainting article on a defeat, I had to figure out a way to get it done. So I did some research and delved into features I don’t fully understand. I just knew there were ways of making the engine use a sketch as a guide to image generation. So where did we leave off? We had an armored guy whose arm just refused to bend to the side and pretend to hold a banner pole. This guy:

Our starting image.

I had a large inpainting mask covering the whole banner and the upper corner of the image. This created some fairly random results. So, how do we control the output better? It involves a feature I don’t fully understand called, appropriately enough, a Control Net. By default, the ComfyUI install doesn’t come with any control net models. I went to the guys who made Stable Diffusion and found two: control_scribble and control_openpose, as each of these might do what I wanted. I used the large mask image as a guide for where to draw my crude sketch of a banner in hand (there’s a rough code sketch of what that guide amounts to after the screenshot below). With these, I set up the new workflow. Here’s the section that’s been modified.

Am I contractually obligated to include these?
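For the curious, the scribble guide itself is nothing fancy: a black canvas the same size as the render, with the banner-and-arm idea drawn on it in white. Here’s a hypothetical Python/Pillow version of what I actually did by hand in Photoshop; the canvas size and every coordinate are made up for illustration.

```python
from PIL import Image, ImageDraw

# Hypothetical stand-in for drawing the guide by hand in an image editor:
# a black canvas the same size as the render, with the banner-and-arm
# scribble drawn in white. The coordinates below are invented; in practice
# I just traced over the exported mask in Photoshop.
WIDTH, HEIGHT = 1024, 1024  # assumed render size

guide = Image.new("RGB", (WIDTH, HEIGHT), "black")
draw = ImageDraw.Draw(guide)

draw.line([(430, 520), (620, 470)], fill="white", width=12)        # arm held out
draw.ellipse([(600, 450), (650, 500)], outline="white", width=8)   # fist
draw.line([(625, 475), (625, 150)], fill="white", width=10)        # banner pole

guide.save("scribble_guide.png")  # load this with a Load Image node
```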

As the mask had been exported from the previous workflow into an image file, I decided to use an image-to-mask node, which lets me make adjustments to the mask in a friendlier editor (Photoshop) and load it back in. The one setting on the “Convert Image to Mask” node picks which color channel of the image to use as the mask. Since the image is either full black or full white, any channel other than Alpha works, so I left it on ‘Red’. This still feeds the VAE Encode (for Inpainting) node, and that latent goes into the sampler.
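As best I can tell, the node is doing nothing more exotic than pulling one channel out of the image and calling it the mask. A small numpy/Pillow sketch of that idea (the filename is just a placeholder):

```python
import numpy as np
from PIL import Image

# Rough sketch of what I understand "Convert Image to Mask" to be doing:
# take one channel of the image and treat it as the mask. Since my exported
# mask is pure black and pure white, red, green, or blue all give the same
# result; only alpha (which the file may not even have) would differ.
CHANNELS = {"red": 0, "green": 1, "blue": 2, "alpha": 3}

def image_to_mask(path: str, channel: str = "red") -> np.ndarray:
    rgba = np.array(Image.open(path).convert("RGBA"), dtype=np.float32) / 255.0
    return rgba[..., CHANNELS[channel]]  # 2-D array: 0.0 = keep, 1.0 = repaint

mask = image_to_mask("banner_mask.png", channel="red")  # hypothetical filename
```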

At the top, we have all the new stuff. The Load Image node should be familiar. Load ControlNet Model is just another basic loader, picking something from the ‘controlnet’ directory to feed the “Apply ControlNet” node. The advanced node has a few more options and works on the positive and negative conditioning channels separately; the basic version works on a single conditioning channel, which might require a second node. I stuck with the advanced version and put it between the prompts and the sampler. The only option I adjusted was the “strength”, which appears to work like any other weighting option, adjusting the influence of the control net. At 1.00, I got a stone wall with my sketch drawn on it as a set of inscribed channels, so for most runs it ranged between 0.5 and 0.7.
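I won’t pretend to know exactly what ComfyUI does with that number, but based on how ControlNet is usually described, I picture strength as a simple multiplier on the control net’s contribution before it gets mixed into the model’s own features. Something like this purely conceptual sketch, which is not the actual implementation:

```python
# Not ComfyUI's actual code -- just how I picture "strength" behaving: the
# control net turns the scribble into extra feature maps, and strength scales
# them before they're added to the main model's features. At 1.0 my scribble
# got carved into the stone wall; at 0.5-0.7 it only nudged the composition.
def apply_controlnet(unet_features, control_features, strength=0.6):
    return [u + strength * c for u, c in zip(unet_features, control_features)]
```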

This did not get me what I wanted; one of the results from the openpose control net got me an audience instead.

Now we have an audience.

I appreciate that the crowd is wearing consistent uniforms and going to some lengths to stand on the architecture in some semblance of the pattern of my sketch. But this is not what I wanted. After some rethinking, I remembered the lesson from the outpainting article: the results are more reliable the smaller the area to be changed. As the image had a banner in it already, I drew a new mask which included only the current arm and the area I wanted the arm to move to. Then I ran through scads of iterations, adjusting settings and even adding more prompt elements. The system stubbornly didn’t want to hold the arm out to the side; it insisted on keeping the arm down along his side. But finally, I stumbled onto a combination of weights that gave me this:

AI hands are notorious.

That’s a messed-up arm, but it’s in the ballpark. So I shrank the mask again to just the region of the new arm and ran the inpainting again. The engine promptly decided to try to move the arm back down, even amputating the hand because it stuck outside the mask region. This frustrated me for some time, but I again went back to the lessons learned: the engine can’t move the arm if I don’t give it room to change the parts that are already good. I started drawing tiny masks and nudging the arm into shape, even running one pass with a prompt of “desert, rocks” simply to erase some of the excess arm. I ran into the same reluctance to leave the arm out to the side when I tried to redraw the hand. The first attempts resulted in the hand vanishing so that it could move back down, outside the inpainting mask. I had to use a prompt of “hand, glove, fist, holding pole” to keep the hand in place, though the first iteration was a bit off, giving him three fingers and a fat thumb. A little more tweaking, and he had all his fingers back, but still a fat thumb.

Finally! …wait

Sure, this means the flag is slightly small, but he has a hand within the margin of error. But I’ve noticed a problem: for whatever reason, all the inpaint runs darkened the main image and made it blurry. I’m not sure why anything is being done to those pixels at all, but I want to clean it up. I could merge the two in Photoshop, but I already have everything I need in ComfyUI, so I’m going to do something that may sound odd but is perfectly valid: disconnect the latent feed from the sampler. Without the latent, the sampler won’t run. Then I’m going to pull in another image loader and feed a new node.

No resampling

In the middle there is an “ImageCompositeMasked” node. I couldn’t find any good documentation on this, but made some educated guesses. Since both the source and destination are the same size, I didn’t need to turn on resizing for the source. The mask I used was the second inpainting mask I created that covered the original and intended arm position. So the only question was which image was supposed to be the source and which was the destination. Based upon the behavior of other nodes, the source is what gets placed inside the masked area, and the destination is the background it gets placed over. So the original, brighter image is the destination, and the faded, inpainted image is the source. I didn’t bother to feed the image output to any of the more sophisticated nodes, just a preview node from which I could save the result if I liked it.
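If my educated guesses are right, the node boils down to a per-pixel blend: wherever the mask is white, use the source; wherever it’s black, keep the destination. A minimal numpy sketch of that assumption:

```python
import numpy as np

# My educated guess at what ImageCompositeMasked does, per pixel: where the
# mask is white (1.0), take the source (the faded, inpainted image); where it
# is black (0.0), keep the destination (the original, brighter image). All
# three arrays are assumed to be the same size, which is why I could leave
# source resizing turned off.
def composite_masked(destination: np.ndarray, source: np.ndarray,
                     mask: np.ndarray) -> np.ndarray:
    m = mask[..., None]  # broadcast the 2-D mask across the color channels
    return destination * (1.0 - m) + source * m
```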

After all that work, here is the result:

Finally?

I’m going to claim success.

It’s about time.

We’ve now finished a review of the basic user experience…

… the basic…

… basic …

… …

Dammit.