Here's how to get Stability AI's new model, Stable Cascade, running locally in Automatic1111 or Forge:
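If you'd rather poke at the model from plain Python before wrestling with WebUI extensions, a minimal two-stage diffusers sketch looks something like this. The model IDs and step counts are the commonly published defaults for the public Stable Cascade release, so treat them as a starting point rather than gospel:

```python
import torch
from diffusers import StableCascadePriorPipeline, StableCascadeDecoderPipeline

# Stable Cascade is a two-stage model: the prior (stage C) turns the prompt
# into image embeddings, and the decoder (stages B/A) turns those into pixels.
prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16
).to("cuda")
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade", torch_dtype=torch.float16
).to("cuda")

prompt = "an astronaut riding a horse, cinematic lighting"

prior_out = prior(prompt=prompt, num_inference_steps=20, guidance_scale=4.0)
image = decoder(
    image_embeddings=prior_out.image_embeddings.to(torch.float16),
    prompt=prompt,
    num_inference_steps=10,
    guidance_scale=0.0,
).images[0]
image.save("cascade_test.png")
```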

 
I've been trying out Forge and it's pretty darn great - it has all of the features and functions of Auto1111 and more, plus it is better optimised and runs quite a bit faster. It can even do "high res fix" and upscale base images 3x in one go.

I'd recommend you do the manual install instead of the 1-click installer. Here's the guide to get it up and running:

 
Stable Diffusion 3 announced:
  • Stable Diffusion 3: Stability AI announces the preview release of Stable Diffusion 3, which shows significantly improved overall generation quality in early demos.
  • Improved performance: Specifically, Stability AI promises improved performance on multi-part, complex prompts, image quality, and text writing capabilities. The model is not yet generally available, but there is a waiting list that you can sign up for.
  • Safety precautions: Stability AI says it has taken numerous safety precautions to prevent the model from being misused by malicious actors, starting with training and continuing through testing, evaluation, and deployment.
  • Other models: Stability AI has recently released several new models, including Stable Cascade, a very fast text-to-image model, Stable Video Diffusion, a generative video model, and Stable Zero123, a model for text-to-3D applications.
 
Another super useful IP Adapter has been released - this one references the composition of an image and then generates similar images based on it. As explained in the video below, this is different from existing ControlNets such as Canny or OpenPose, which copy an exact pose or outline - the composition IP Adapter essentially makes a new image that replicates the placement and lighting of your subjects.
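In A1111/Forge you'd load it through the ControlNet/IP-Adapter extension, but for the curious, here's a rough diffusers sketch of the same idea. The repo name, weight filename and which CLIP vision encoder it expects are my assumptions - check the adapter's model card before copying this:

```python
import torch
from transformers import CLIPVisionModelWithProjection
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image

# Assumption: the composition adapter uses the ViT-H image encoder shipped
# with the main IP-Adapter repo. Verify against the adapter's model card.
image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter", subfolder="models/image_encoder", torch_dtype=torch.float16
)
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    image_encoder=image_encoder,
    torch_dtype=torch.float16,
).to("cuda")

# Repo and weight filename are assumptions - point these at wherever you
# downloaded the composition IP-Adapter weights from.
pipe.load_ip_adapter(
    "ostris/ip-composition-adapter",
    subfolder="",
    weight_name="ip_plus_composition_sdxl.safetensors",
    image_encoder_folder=None,  # we supplied the encoder above
)
pipe.set_ip_adapter_scale(0.6)  # lower = looser composition match

reference = load_image("composition_reference.jpg")
image = pipe(
    prompt="a lone hiker on a mountain ridge at sunset, photo",
    ip_adapter_image=reference,
    num_inference_steps=30,
).images[0]
image.save("composition_copy.png")
```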

 
I really didn't want to dig too deep into ComfyUI as it looks daunting to learn, but it turns out that the Forge repo is essentially being discontinued, and it seems everyone and their dog is using ComfyUI for stable diffusion these days. I guess I'll share my experience with Comfy if all goes well.
 
Forge has a second branch that is apparently still getting updates. I'm also put off by the Comfy noodle monster, so I'm still using A1111. I also have StableSwarm installed but haven't used it yet. Apparently it will be SD3-ready at launch in two days' time.
 
Here's a screenshot.
f3d8574891b91d4f07d3c703eea8d1ff.jpg
 
After an hour or two, I think I've gotten to grips with the basics of Comfy. I will still keep my Forge setup as it is familiar and easier to use when trying to do controlnet and inpaint related stuff, but the potential for ComfyUI is huge, especially for video.
 
Running inside Krita (free drawing / image editing app)

Been using it for a day. The developer added region support just yesterday. Pretty good. I can't see why professional artists wouldn't start using this going forward. It's a very good aid.
There are some UI issues that need to be worked out, but it's very good. And the LCM "live preview" is, at least on a GTX 1060, pretty quick. I can imagine it being almost instant on newer cards.

This was started with simple line work for the road and buildings, the character had a little more in order to establish position, arm placement and so on.

1718174035216.jpeg


Then over time, around 3 hours last night, I slowly changed and added detail, which it then tries to interpret. The hair, for example, was drawn by me to that shape with the AI filling and adding. As you can see, it's not perfect. Trying to add an Alice band required a redraw of the hair around that part, which you can see has affected the background, but that's due to unclean layering, which can be quickly fixed. Colour changes were also added manually, and it understands those well.

1718174054288.jpeg
 
I have Stable Diffusion 3 up and running in ComfyUI. First impressions are that it is somewhat better at adhering to the specifics of prompts (though it did ignore my negative prompt for watermarks), but image-quality-wise, there are countless finetuned 1.5 and SDXL models that look way better than what SD3 is outputting. Maybe it's a matter of updating my prompting style and getting to grips with ComfyUI a bit more, but so far, I'd give it a solid 5 out of 10. The model is pretty darn huge at 15.8 GB for SD3 Medium with the CLIP models included.
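For reference, the same model can be driven from plain Python too. A minimal diffusers sketch, assuming you've accepted the licence for the gated stabilityai/stable-diffusion-3-medium-diffusers repo (the bundled text encoders are most of that download size):

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Gated repo: you need to accept the licence and be logged in to Hugging Face.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a red vintage tram crossing a rainy city street at dusk",
    negative_prompt="watermark, text overlay",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image.save("sd3_test.png")
```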

ComfyUI_temp_uyllo_00007_.jpg
 
I trained a new LoRA last night, mainly because it's been a while since I last made one and to see whether the workflow I used before still works. It seems like it does. You can grab it here if you want some Zombies in various styles for your generated images... here's one made with the "cartoon me" model.
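If you want to use a LoRA like this outside a WebUI, loading it with diffusers is only a couple of lines. The base checkpoint, folder and filename below are placeholders - swap in whatever you actually trained against and downloaded:

```python
import torch
from diffusers import StableDiffusionPipeline

# Placeholder base model - use whichever 1.5 checkpoint the LoRA was trained on.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Placeholder folder/filename for the downloaded LoRA file.
pipe.load_lora_weights("./loras", weight_name="zombies_v1.safetensors")

image = pipe(
    prompt="zombie portrait, cartoon style",  # add the LoRA's trigger word if it has one
    num_inference_steps=30,
    cross_attention_kwargs={"scale": 0.8},  # LoRA strength
).images[0]
image.save("zombie_lora_test.png")
```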

00030-743221606.png
 
I found a fantastic ComfyUI workflow that functions similarly to "Magnific" - in some situations it will probably change the look of the subject slightly, but depending on your requirements, it's just amazing for upscaling old low-res photos and adding detail to them. Unlike most of the other creative upscalers I've encountered, this one seems to preserve the look of the subject more closely. I've already tested it on some old landscape photos taken with a 2000s-era digital camera (640x480 resolution). The results are impressive and the upscale is almost 10x. It does introduce some noise and slight weirdness in areas, but it's nothing that can't be cleared up with a bit of Photoshopping.
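The workflow itself is a ComfyUI graph, but the basic idea - a naive resize followed by a low-denoise img2img pass that invents the missing detail - can be sketched in a few lines of diffusers. The model, the 2x factor and the strength value below are my assumptions, not the workflow's actual settings:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionXLImg2ImgPipeline

pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Naive 2x resize first, then a low-denoise img2img pass to hallucinate detail.
# (A full 8-10x upscale would normally be done in tiles to keep VRAM in check.)
low_res = Image.open("old_photo_640x480.jpg").convert("RGB")
resized = low_res.resize((low_res.width * 2, low_res.height * 2), Image.LANCZOS)

detailed = pipe(
    prompt="sharp, detailed photograph, natural colours",
    image=resized,
    strength=0.3,  # keep this low so the subject isn't changed too much
    num_inference_steps=30,
).images[0]
detailed.save("upscaled_detailed.png")
```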

 
I'm quite happy with my progress in ComfyUI. I've made a (complicated) text-to-image workflow that handles photorealism very well. It is a 2x high-res pass process with optional nodes for Style and Face copying, as well as a ControlNet on the high-res passes to ensure the base image isn't changed too dramatically during the second and third passes, even if the denoising parameters are set high.
Image output example:
HistoryRepeatsItself.jpg

The workflow:
ComfyUI.jpg
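For anyone who doesn't want to rebuild the graph, the core trick - pinning the high-res passes with a tile ControlNet so a high denoise can't drift from the base image - can be roughly sketched in diffusers. This is not the workflow above; the SD 1.5 tile ControlNet, model names and strength value are my own stand-ins:

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

# The tile ControlNet is fed the base image, so even a fairly high denoising
# strength can't wander far from the original composition.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1e_sd15_tile", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

base = Image.open("base_pass.png").convert("RGB")
upscaled = base.resize((base.width * 2, base.height * 2), Image.LANCZOS)

refined = pipe(
    prompt="photorealistic, detailed skin texture, film grain",
    image=upscaled,          # img2img input
    control_image=upscaled,  # ControlNet conditioning keeps the layout pinned
    strength=0.6,            # higher denoise is tolerable because of the ControlNet
    num_inference_steps=30,
).images[0]
refined.save("hires_pass.png")
```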
 
Phew, the new "Flux" stuff is moving fast. There is already a quantised model available for people who don't have the most powerful GPUs (I tested it on my 3060 and got a 50-60% speed improvement), a realism LoRA which makes photos generated with Flux look more photorealistic, and a ControlNet Canny model which I haven't quite figured out yet (but will soon).

The quantised model seems to be just as good as, or perhaps better than, the previous Flux Schnell model I'd tried out earlier. It's much faster - it can output a 1366x768 image in 1.5 minutes, which is comparable to an SDXL model with 2 upscale passes, instead of the 5 or so minutes it took using the original model. It seems to handle text better too, as it hasn't made any spelling errors or messed up the words on any of the images I've generated so far.
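The quantised checkpoints are a ComfyUI thing (loaded through a GGUF loader node), so they're not shown here, but if you want to try Flux plus a realism LoRA from plain Python, a diffusers sketch looks roughly like this. The LoRA path and filename are placeholders for whichever file you downloaded, and FLUX.1-dev is a gated download:

```python
import torch
from diffusers import FluxPipeline

# FLUX.1-dev is a gated download; CPU offload lets it squeeze onto a 12 GB card
# at the cost of speed. The GGUF-quantised checkpoints discussed above are
# loaded through a ComfyUI custom node instead and aren't shown here.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

# Placeholder path/filename - use whichever realism LoRA file you downloaded.
pipe.load_lora_weights("./loras", weight_name="flux_realism_lora.safetensors")

image = pipe(
    prompt='street photo of a woman holding a sign that reads "OPEN LATE"',
    height=768,
    width=1360,  # Flux wants dimensions divisible by 16
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_realism_test.png")
```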

RoboFlux.jpg
 
Been a while since I've posted here. Anyway, I've been doing some photo and AI bashing in a custom ComfyUI workflow to make some interesting, non-typical creature designs for various purposes. Quite happy with how it's turning out so far; I'll share some of the other results later (those will feature in other projects first), but this one works as a stand-alone piece. I think SDXL can still give Flux and Midjourney a run for their money when one puts some effort into it.

Cyclops Creature Resized.jpg
 
Some people have said that Flux doesn't do photorealism, landscapes or photos of "normal" people very well. Well, I'd say I've managed to come up with some workflows that handle these sorts of things pretty well. I might be speaking for myself, but if I hadn't made them, I'd be hard pressed to tell whether these pics were photos or AI-rendered images.

IMG_20240607_161229_0000.jpg

IMG_20231201_052245_0004.jpg

IMG_20230124_031211_0000.jpg

IMG_20230429_144936_0002.jpg

IMG_20220516_150944_0001.jpg
 
View attachment 1782288
The guy with the Toys R Us colostomy bag in the back is the giveaway.
 