Stable Diffusion AI Image Generator

I tried it... ran out of memory every.single.time...
Seems to work for me. Except that there's an error depending on the model I use, so I need to check what's cutting with that.
I assume it may be something to do with the baked VAE in the SD1.5 model I have, but I'm not sure.

However, here's an image I've made that I'd never be able to make without the extension.
768x4096 image on a 6GB GTX1060 and without using --medvram. No OOM issues.

00023-630488691.png
 
Default settings? There's such a myriad of settings and dropdowns, and very minimal documentation. Come to think of it, I think I may have generated a super wide nature image with this once a couple of months ago but it was just a test. Whenever I really wanted to use it for something more deliberate... Out of VRAM errors.
 
Scroll to the bottom.

Not default settings.

The Tiled VAE settings seemed a bit conservative to me.
-I set Encoder to 1024 (Default: 960) and Decoder to 160 (Default: 64).
-Set "Move VAE to GPU" to True

For Tiled Diffusion.
-I set latent height and width to 128 (Default: 96).
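For anyone wondering what those latent numbers mean in pixels, here's a rough sketch. It assumes the usual SD 1.x VAE factor of 8 pixels per latent unit, and the function name is made up for illustration, so treat it as back-of-the-envelope maths rather than anything taken from the extension itself.

```python
# Assumption: the SD 1.x VAE shrinks each image side by a factor of 8,
# so one latent unit corresponds to 8 pixels.
VAE_SCALE = 8


def latent_to_pixels(latent_size: int) -> int:
    """Convert a latent tile edge length to its size in pixels (hypothetical helper)."""
    return latent_size * VAE_SCALE


# Tiled Diffusion latent tile sizes mentioned above: default 96, raised to 128.
for latent in (96, 128):
    print(f"latent tile {latent} -> {latent_to_pixels(latent)} px")
```

So under that assumption the default 96 works out to 768 px tiles, and 128 to 1024 px tiles.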
 
Ok, so I see that it works for me in TXT2IMG when I use it to make images with large resolutions (1920x1080), where previously I couldn't go much larger than 1280 or so. It doesn't really seem to work for me when I try to upscale something in IMG2IMG, like a previously generated or existing image. It seems like it can only upscale to about 1500 on the long edge before I get OOM, and I can sometimes get that resolution just by doing a normal latent upscale pass without using this extension, so it doesn't seem particularly useful.
As far as upscaling is concerned, I've managed to get super detailed 16 000 x 9 000 images using the Ultimate Upscale script with the ControlNet tile model.
 
You're going to have to be more specific about what you're actually doing. What settings are you using? A screenshot would be quicker, so I know what upscale you're trying to do.
 
1. Generate a 512x512 of anything, 20-25 steps, any sampler
2. Send it to IMG2IMG, upscale that x2 using latent upscale, 0.5 denoise, 20-25 steps, any sampler
3. Take the upscaled 1024x1024 image to IMG2IMG, see if you can upscale that again 4x using the Tiled VAE extension using any settings - default or whatever worked for you before. No matter what I do, I cannot upscale it larger than 1500x1500 using the Tiled VAE extension.
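To put numbers on those steps, here's a quick sketch of the resolution chain. The helper names are made up for illustration; it's plain arithmetic on the sizes in the steps above and doesn't call the webui at all.

```python
def upscale_chain(start_side, factors):
    """Return the square image side length after each successive upscale factor."""
    sizes = [start_side]
    for factor in factors:
        sizes.append(round(sizes[-1] * factor))
    return sizes


def pixel_ratio(side_a, side_b):
    """How many times more pixels a side_b square has than a side_a square."""
    return (side_b * side_b) / (side_a * side_a)


chain = upscale_chain(512, [2, 4])
print(chain)                             # [512, 1024, 4096]
print(pixel_ratio(chain[1], chain[2]))   # 16.0
```

In other words, the last step asks for 16 times the pixels of the 1024x1024 input, which is why it's the step that stresses VRAM.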
 
I assume that's the "Just resize (latent upscale)" option.

My guess is that your Tiled VAE encoder is too big so the extension ignores it and you end up doing a normal "untiled" latent upscale. Make sure you're reading your cmd log.

To do an upscaled 1024x1024 image (in my case, to upscale it to 2560x2560), I set my decoder to 512 so it was within the size. Play with sizes between that and 1024. I just did it quickly as proof-of-work.
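Here's a rough way to picture that guess. This is only an illustration with a made-up helper, not the extension's real logic: it treats the tile size and the image size as being in the same units and counts how many tiles are needed to cover the image. If the answer is a single tile, tiling isn't buying you anything.

```python
import math


def tile_grid(width, height, tile, overlap=0):
    """Tiles per axis needed to cover width x height with square tiles.

    Hypothetical helper: assumes tile and image sizes share the same units
    and that neighbouring tiles advance by (tile - overlap).
    """
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile size")
    stride = tile - overlap

    def count(length):
        # One tile is enough when the tile already spans the whole side.
        if length <= tile:
            return 1
        return math.ceil((length - tile) / stride) + 1

    return count(width), count(height)


print(tile_grid(1024, 1024, 1024))  # (1, 1) -> effectively untiled
print(tile_grid(1024, 1024, 512))   # (2, 2)
print(tile_grid(2560, 2560, 512))   # (5, 5)
```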

I only did a 2.5x upscale on the second round. 4x was going to take me 30 minutes lmao
But the point is that it works when upscaling to resolutions higher than 1500x1500.

I also lowered my Tiled VAE Decoder to 128, from 160 as I previously said. Didn't want to risk it OOM'ing after waiting 15 minutes.

Did as you said with steps and denoise. Generated an image. Used the "send to img2img" button. Kept the same prompt. Set up the Tiled VAE and Tiled Diffusion stuff. Then did a "Just resize (latent upscale)" to 1024x1024. Then sent it back to do a further 2.5x.

512x512

00024-3877830417.png

1024x1024

00004-3175804923.png

2560x2560 (converted to Q=90% jpg to reduce size due to forum limit, but the forum resizes it to 1600x1600 anyways) ¯\_(ツ)_/¯

00008-2944320415.jpg

Here's proof.

1685045741897.png

My model hates grass it seems.
 
Eh... ok... seems quite fickle, then. I'll check it out and see whether it produces any better results compared to the ultimate upscale script, which seems like it can do upscaling a bit faster and with less hassle.
 
Did a test and managed to get a 4096 render from Tiled VAE. I got the same size from Ultimate Upscale last night, faster.
The results seem... identical to Ultimate Upscale, both in quality and in the need to do cleanups afterwards. Both methods show visible seams from tiling in most cases, and both generate random glitches, which are also a result of the tiling, the highish denoising strength and SD trying to create "something" from the noise. I had hoped that Tiled VAE might produce better, cleaner results, but it just seems like another way to do the same thing with very similar results. Even if I used a lower denoising strength, I don't really see any benefit in using Tiled VAE over Ultimate Upscale, as the results would be the same and the latter does it faster.
closer.jpg
Compare TVAE vs UUS.jpg
 
You're missing the point of the extension, especially since you're labelling the left image as Tiled VAE when it's not.

It comes in 2 parts. Tiled VAE to reduce VRAM OOM in general, and the Tiled Diffusion Upscaler which is what you're criticising now.
The Tiled Diffuser also allows for per-tile prompts on top of being able to use ControlNet.

It's not just an upscaler. Read the wiki I linked you.

I'll look into Ultimate Upscale though. Seems interesting. Like a much faster LDSR.
 
Very nice thread @OnlyOneKenobi .

Do you have some guidelines for me I can apply in Architecture / Building Industry / BIM?

I have only used Midjourney a bit, and only on a basic level. Nothing that really blew my mind or gave me more than a bit of inspiration.
 
Sigh. I used shorthand to label the different sections of the screenshot to compare the two quickly. If I was going to be anal about it, I would have called it "The Tiled Diffusion with Tiled VAE manipulations extension used in conjunction with the Tile Model control_v11f1e_sd15_tile [a371b31b] (for ControlNet V1.1.19)". You seem to really like it, so if it works for you, great. As far as I'm concerned, it performs its most interesting function, upscaling, similarly to existing upscalers and doesn't add anything or make any significant difference to my life. If I'm using it wrong, okay... maybe one day I'll see someone do something miraculous with it that piques my interest, and then I'll go and read all the wikis to make sense of it and try to use it similarly.
 
OK ¯\_(ツ)_/¯
 
One of the main drawbacks of any image generator right now is that it can only produce low resolution images, even if you use a super powerful GPU. This makes it hard to use them for things like architecture where you need very accurate and detailed renders. You can find some good upscalers that can make your images super high res, but they also change your original image a bit every time you upscale it. This might affect how you use AI in that field, depending on what you want to do. If you just want to make some mockups or something like that and you don't care too much about precision, then SD with ControlNet and MLSD, Depth, Canny and the new "Reference" model will be awesome for "remixing" scenes.
 
@RedViking to give you an idea of what SD can do, here's an image downloaded from Unsplash :

tiago-b-4RZs7nStWZ4-unsplash.jpg

You can use the reference model to generate different but similar scenes :

Ex01.jpg

Or you can use something like MLSD which uses the straight lines from the image as the basis for creating new images.

Ex03.jpg
Add to that the fact that you can use multiple ControlNets together, different checkpoint models, LoRA models and a huge variety of styles in your prompting, and the potential is enormous. You could make virtually anything with a little time and effort, and even enhance or improve the generated images further in Photoshop and so on.
 
Alright, I'll admit I was a dum@ss as far as this multidiffusion extension is concerned. I suppose that is for a few reasons... Everyone seems to be "selling" it as an upscaler first and foremost, but having spent some time with it today I see the real value lies in the regional prompt control part of it. I wasn't expecting much of that component because similar extensions like Latent Couple never really did what I expected of them. I also seem to recall that it came out around the same time as Latent Couple, and Regional Prompt Control didn't seem to work as well back then as it does now, so I guess it's improved with time. I suppose that to get the most out of it, it has to be used with ControlNet or Regional Prompt Control and consistent prompts, otherwise you might get some incoherent, weird tiled images stitched together.

All in all, it works well when it works. I managed to get this pretty decent looking 1080p render from it without too much hassle. I think I'll hang on to it for a while and see what else I can do with it.

Tropical.jpg
 
So there's a new extension for Auto1111 called Inpaint Anything that seems useful, but be cautious if you also use the Dynamic Prompts extension: the former uses a different version of "Jinja", which breaks Dynamic Prompts if you install it. To fix the issue, you'll have to remove the Inpaint Anything extension and then uninstall and reinstall Jinja to get DP working again.

pip uninstall jinja2
pip install jinja2

Personally, I'd rather keep using DP instead of IA. Maybe they can co-exist in a future update.
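If you want to see which Jinja version you actually end up with before and after installing an extension, a small check like this might help. It's a sketch using Python's standard importlib.metadata; the helper name is made up, and it assumes you run it with the same Python environment (for example the webui's venv) that the extensions use.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it isn't installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Run this once before installing the extension and once after, then compare.
print("jinja2:", installed_version("jinja2"))
```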
 
It's been a long, long time since I've found any "style" models worth sharing, but if you like a "comic booky" animated style then this one is worth downloading and trying out :


It works very well with all the custom LoRAs I've made and also with most of the ones I've downloaded. It captures the likeness of a person that was trained on photorealistic images very well in that particular animation style.

Here's a quick render I did of Riaan in this animation style.

RiaanAnimated.jpg
 