Stable Diffusion AI Image Generator

NGL I'm super chuffed that SDXL works with my 1060 6GB.
Especially now since I'm using SD to generate ideas to be modelled. SDXL makes things much easier on the prompt side.

I mean look at this.

With a simple prompt:
Pokémon, fire elemental horse, realistic, 4k, bokeh

1690494691042.jpeg
1690494722011.jpeg
 
I have SDXL running smoothly now in a clean copy of Auto1111. And it's hella fast with "downgraded" drivers: it generated 3 x 1024 pics in a minute or so. It is also a lot more accurate in giving you precisely what you ask for.

prompt : breathtaking watercolour painting, a dragon sipping on a cup of coffee, in an english tea garden, vivid, imaginitive, fantasy art, uplifting, colourful, 32k, art by George Caleb Bingham, Arthur Burdett Frost, Abraham de Vries, trending on ArtStation, masterpiece

negative : worst quality, low resolution, bad art,compression, artifacts

00030-1210082146.png
 
A few days later and SDXL is running better than ever.

This extension simplifies running the base model and then the refiner in Auto1111. Now you only tick the "enable refiner" box and you don't have to do the extra img2img \ refiner pass anymore.

This is the "fixed" SDXL vae which allows it to be run in fp16 precision without generating NaNs.
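For anyone wondering why fp16 produces NaNs in the first place: half precision tops out at 65504, so an oversized intermediate value overflows to infinity, and subtracting two infinities gives NaN. As far as I understand it, the fixed VAE was fine-tuned to keep its internal activations small enough to avoid this. Below is a rough pure-Python sketch of the overflow itself, not of the VAE; `to_fp16` and `square_fp16` are helper names I made up, and the inputs 300 and 37.5 are arbitrary.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite value in IEEE 754 half precision


def to_fp16(x: float) -> float:
    """Round x to half precision, overflowing to +/-inf like fp16 hardware."""
    if math.isnan(x) or math.isinf(x):
        return x
    try:
        # struct's "e" format is IEEE 754 binary16
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses out-of-range values; emulate the overflow to infinity
        return math.copysign(math.inf, x)


def square_fp16(x: float) -> float:
    """Square a value with both the input and the result stored in fp16."""
    return to_fp16(to_fp16(x) * to_fp16(x))


big = square_fp16(300.0)    # 90000 is above FP16_MAX, so this overflows
diff = big - big            # inf - inf has no defined value
small = square_fp16(37.5)   # same value scaled down by 8 stays finite

print(big, diff, small)
```

The scaled-down input survives only because its square stays under the fp16 ceiling, which is the general idea behind keeping activations small.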

There are already a bunch of LoRAs and models out for SDXL, but what I really want is ControlNet Support... bring it on :)

Gotta love those "it gave me exactly what I wanted" vibes one gets from SDXL prompts as well :

00015-4009651006-landscape photo, eerie cottage in a haunted forest, ghostly trees, horror, ma...png
 
How do I use this Stable Diffusion thing? My dad wants to try this out!
 
Running SDXL base + refiner through Comfy in Google Colab.

This is how much RAM and VRAM it uses when it is free to use as many resources as it wants.

Assigned an Nvidia Tesla T4.

1691356883197.png
 
I really like this new LoRA control extension for Auto1111. In a nutshell, it allows you to use various LoRAs at different strengths at different steps in the image generation. It's like "alternating prompts" but more powerful because it allows you to blend many different styles, subjects, concepts into one style.

In the below image, I started off with a World of Warcraft style render of an Orc, then applied a body horror LoRA to it to make it uglier (Warcraft Orcs are quite cartoony) and then another horror LoRA to make the overall picture more sinister.

The prompt for the below was "a green skinned, brutal Orc holding his axes on the battlefield, fantasy, horror, dread,world of warcraft, bodyhorror <lora:WoWSketcherAlpha:0.8@0,[email protected],0@0> <lora:bodhor:0@0,[email protected],0@1> <lora:Dread:0@0,[email protected],0.3@1>"

00018-056-2657931507-a green skinned, brutal Orc holding his axes on the battlefield, fantasy,...png
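To make the `weight@position` idea a bit more concrete, here is a minimal sketch of how such a schedule could be evaluated. It assumes the position is a fraction of the total steps and that the weight is linearly interpolated between keyframes; I have not checked that against the extension's source. `parse_schedule` and `weight_at` are names I made up, and the keyframe values are illustrative, not the ones from my prompt.

```python
def parse_schedule(spec: str) -> list[tuple[float, float]]:
    """Turn "0.8@0,0.4@0.5,0@1" into (position, weight) pairs sorted by position."""
    points = []
    for item in spec.split(","):
        weight, _, position = item.partition("@")
        points.append((float(position), float(weight)))
    return sorted(points)


def weight_at(points: list[tuple[float, float]], progress: float) -> float:
    """Linearly interpolate the LoRA weight at progress (0.0 = first step, 1.0 = last)."""
    if progress <= points[0][0]:
        return points[0][1]
    for (p0, w0), (p1, w1) in zip(points, points[1:]):
        if progress <= p1:
            if p1 == p0:
                return w1
            return w0 + (w1 - w0) * (progress - p0) / (p1 - p0)
    # Past the last keyframe: hold the final weight
    return points[-1][1]


# Hypothetical schedule: start strong, fade out completely by the last step
schedule = parse_schedule("0.8@0,0.4@0.5,0@1")
for step in range(0, 21, 5):
    print(step, round(weight_at(schedule, step / 20), 3))
```

Stacking several LoRAs then just means evaluating one schedule per LoRA at every step, which is how one can fade out while another fades in.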
 
For those who have been wanting a less intimidating install and simpler interface - there is a new SD frontend named Fooocus, which you can get from here

And here's a quick video tut explaining the app and its basic settings : https://www.youtube.com/watch?v=8krykSwOz3E

This software uses the latest model, Stable Diffusion XL, so you should get very good results right off the bat without advanced prompting, add-ons, ControlNet or much else.
 
There are some nice new features in the latest version of Automatic1111 WebUI :


Repository: AUTOMATIC1111/stable-diffusion-webui · Tag: v1.6.0 · Commit: 5ef669d · Released by: AUTOMATIC1111

Features:

  • refiner support #12371
  • add NV option for Random number generator source setting, which allows to generate same pictures on CPU/AMD/Mac as on NVidia videocards
  • add style editor dialog
  • hires fix: add an option to use a different checkpoint for second pass (#12181)
  • option to keep multiple loaded models in memory (#12227)
  • new samplers: Restart, DPM++ 2M SDE Exponential, DPM++ 2M SDE Heun, DPM++ 2M SDE Heun Karras, DPM++ 2M SDE Heun Exponential, DPM++ 3M SDE, DPM++ 3M SDE Karras, DPM++ 3M SDE Exponential (#12300, #12519, #12542)
  • rework DDIM, PLMS, UniPC to use CFG denoiser same as in k-diffusion samplers:
    • makes all of them work with img2img
    • makes prompt composition possible (AND)
    • makes them available for SDXL
  • always show extra networks tabs in the UI (#11808)
  • use less RAM when creating models (#11958, #12599)
  • textual inversion inference support for SDXL
  • extra networks UI: show metadata for SD checkpoints
  • checkpoint merger: add metadata support
  • prompt editing and attention: add support for whitespace after the number ([ red : green : 0.5 ]) (seed breaking change) (#12177)
  • VAE: allow selecting own VAE for each checkpoint (in user metadata editor)
  • VAE: add selected VAE to infotext
  • options in main UI: add own separate setting for txt2img and img2img, correctly read values from pasted infotext, add setting for column count (#12551)
  • add resize handle to txt2img and img2img tabs, allowing to change the amount of horizontal space given to generation parameters and resulting image gallery (#12687, #12723)
  • change default behavior for batching cond/uncond -- now it's on by default, and is disabled by an UI setting (Optimizations -> Batch cond/uncond) - if you are on lowvram/medvram and are getting OOM exceptions, you will need to enable it
  • show current position in queue and make it so that requests are processed in the order of arrival (#12707)
  • add --medvram-sdxl flag that only enables --medvram for SDXL models
  • prompt editing timeline has separate range for first pass and hires-fix pass (seed breaking change) (#12457)
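The prompt editing entry above is easier to follow with an example. This is a minimal sketch of how a single `[from:to:when]` edit could be resolved at a given step, tolerating whitespace around the parts. It is not the WebUI's actual parser: it ignores nesting and the shorter `[to:when]` form, and it assumes a `when` below 1 is a fraction of the total steps while anything else is an absolute step number. `prompt_at_step` is a made-up name.

```python
import re

# One non-nested edit of the form [from:to:when], with optional spaces
EDIT = re.compile(r"\[\s*([^\[\]:]*?)\s*:\s*([^\[\]:]*?)\s*:\s*(\d*\.?\d+)\s*\]")


def prompt_at_step(prompt: str, step: int, total_steps: int) -> str:
    """Return the prompt text that would apply at `step` (1-based)."""

    def pick(match: re.Match) -> str:
        before, after = match.group(1), match.group(2)
        when = float(match.group(3))
        # Assumption: values below 1 are a fraction of the run, others a step number
        switch_step = when * total_steps if when < 1 else when
        return before if step <= switch_step else after

    return EDIT.sub(pick, prompt)


print(prompt_at_step("a [ red : green : 0.5 ] apple", 5, 20))
print(prompt_at_step("a [ red : green : 0.5 ] apple", 15, 20))
```

With 20 steps and a switch point of 0.5, the first half of the run sees "red" and the second half sees "green", so the image starts forming as one thing and finishes as another.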

 
So while SDXL is impressive, the older SD 1.5 models should not be dismissed easily either. I made an 8k wallpaper (downscaled to share here) using Deliberate and Dreamshaper, with ControlNet QR Monster to create the hypnotic spiral effect in this forest landscape.

Originally I just created it with a 1:1 aspect ratio and then used Photoshop's new generative expand to make it 16:9 so that it can be used as a wallpaper.
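If you want to work out how much canvas has to be added for that kind of expand, the arithmetic is simple. A minimal sketch, assuming the height stays fixed and the new canvas is split evenly left and right; `expand_to_ratio` is a made-up helper and the 4320 px square is just an example size, not necessarily what I used.

```python
import math
from fractions import Fraction


def expand_to_ratio(width: int, height: int, ratio: Fraction = Fraction(16, 9)):
    """Return (target_width, left_pad, right_pad) to widen an image to `ratio`."""
    target_width = math.ceil(height * ratio)
    if target_width < width:
        raise ValueError("image is already wider than the target ratio")
    extra = target_width - width
    left = extra // 2
    # Any odd leftover pixel goes on the right
    return target_width, left, extra - left


print(expand_to_ratio(4320, 4320))
print(expand_to_ratio(1024, 1024))
```

For a square image, roughly 44% of the final 16:9 frame ends up being generated fill.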

Quite impressed with SD and Photoshop (Firefly) working together so nicely.

Hypnotic Forest.jpg
 
This is a nice extension for Auto1111 to explore more styles built into SD XL and SD1.5 base models.

 
having fun pasting dwarf fortress creature descriptions into dall-e 3

digital art, omo scaldedattacks. a towerering feathered spider, it has large mandibles and it has a bloated body. its mauge feathers are long and sparse.
omo.jpg

da vinci illustration of afe confusedfated, a huge three eyed mosquito, it has large mandibles and it undulates rhythmically. its dark chestnut exoskeleton is wrinkled.



afe confusedfated.jpg

has some difficulty counting though.


digital art, underworld spire the bastion of gripes. a gateway between worlds
location.jpg

just playing around with the prompts.
Respect a biker 5.jpg
death knight 2.jpg


Midjourney is in trouble.
 
Are you using Dall E 3 through Bing?
 