Replies: 6 comments 16 replies
-
Mixing upscaled latents into a larger latent image to maintain composition information will artificially reduce the amount of gaussian noise in the image faster than the noise schedule accounts for. This is most likely going to get you blurry images. It's also not very different from your typical "highres" workflow, where the desired composition is given by re-noising an upscaled image to add the finer details, except that there the amount of noise stays in accordance with the noise schedule.
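A toy check of the noise-level point (not from any actual workflow): bilinear upscaling averages neighbouring values, and averaging independent gaussian noise shrinks its standard deviation, so a naive latent mix ends up less noisy than the schedule expects.

```python
import torch
import torch.nn.functional as F

noise_hi = torch.randn(1, 4, 64, 64)   # "full-res" gaussian noise, std ~ 1.0
noise_lo = torch.randn(1, 4, 32, 32)   # "low-res" gaussian noise, std ~ 1.0
noise_up = F.interpolate(noise_lo, scale_factor=2, mode="bilinear", align_corners=False)

mixed = 0.5 * noise_hi + 0.5 * noise_up  # naive 50/50 latent mix

print(noise_hi.std().item())  # ~1.00
print(noise_up.std().item())  # < 1.00: upscaling already averaged the noise
print(mixed.std().item())     # lower still, so the sampler "sees" less noise
                              # than the schedule assumes at that step
```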
-
The blur explanation sounds plausible, though not exhaustive. First, couldn't the schedule be modified to account for the mixing? Second, if using ancestral samplers / those that add noise, I think it shouldn't be too big of a problem.

The typical hires fix I find extremely underwhelming, since you have to "balance" (more like trade off, with both sides being pretty bad) between too little denoise, where you don't really get an appreciable enhancement of detail, and too much, where you get distortion/artifacts almost as if you had started with a high-res image. I have tried some odd flows to get better high-res results, and I think they manage somewhat better results (though it's easy to become biased here, and I'm not sure how objectively measurable it is). I'm also not looking at how similar the upscaled image is to the original; I'm more interested in a good high-res image that decently follows the prompt, with the low-res image serving as a "guiding" intermediate step.

So one flow is: upscale the latent, then unsample it (I tried lots of settings, but euler, cfg 1, and a decent number of steps [for now I settled on 40] seems to give the best results later on), then go from the unsampler to a KSampler (advanced), don't add noise, use an "appropriate" cfg (not sure about that, it might depend on the resulting image size and other things), a different sampler, almost certainly ancestral, and a different number of steps, ~20 (in general I like ancestral samplers much more than non-ancestral ones, as they need far fewer steps and maybe just give better results overall). And that's it; it's possible to get quite high-res results that way, though still not great, as it still suffers from a lack of correlation across the image.

But all of that still suffers because there is no "integration over scale", and I think such "scale averaging sampling" could solve it, again with appropriate weighting of the factors: "large scale" denoising being important at the start, and then almost not at all near the end of sampling. The unsample-resample flow is what makes me think this should work much better than the regular or the above-mentioned flows, since it is clear that the seed/initial noise has an extremely strong impact on the result. When you do a regular hires pass, the noise introduced is unrelated to anything done previously, so it clashes with whatever "preimage" the original noise had. Whereas if you have just one initial noise and operate on it across multiple scales via averaging, there should be much less of this "clash of preimages" producing weird artifacts. Of course it would need tuning and figuring out good drop-offs for the factors and so on. If the sampler code were a bit less cryptic I could probably attempt it myself :). Well, I might be interested enough to try to figure it out as is, though it seemed very complex.
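A minimal sketch of the kind of drop-off curve described here: the coarse (averaged) prediction dominates early and fades to almost nothing by the end of sampling. The curve shape and the `power` parameter are arbitrary placeholders that would need tuning, not anything tested.

```python
def coarse_weight(step: int, total_steps: int, power: float = 2.0) -> float:
    """Weight of the coarse-scale prediction at a given step (1.0 at the start, 0.0 at the end)."""
    t = step / max(total_steps - 1, 1)  # normalized progress through sampling
    return (1.0 - t) ** power           # higher power -> coarse influence fades sooner

# Example: 20-step run, coarse weight every 4 steps
for s in range(0, 20, 4):
    print(s, round(coarse_weight(s, 20), 3))
```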
-
Hm, yeah, scale invariance can end up being tricky, as it might have weird unintended consequences. The unsampler I only use with the same prompt; I just change the sampler name/cfg and steps, and generally the results seem quite robust. I think I actually might sort of create a workflow replicating this using your nodes. Maybe... One thing I do not understand: it seems that "image" and "noise" are stored as separate things in the sampler? If so, that possibly complicates things. Still, with slerp and inject noise I could probably have a go at cobbling something together, by doing single steps with KSampler (advanced) and then fiddling with inject/slerp/upscale and so on. Though when I tried slerping for hires stuff, I think it mostly did not go well. Thanks for taking the time to answer my weird ideas!
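For reference, "slerp" here means spherical linear interpolation between two noise/latent tensors. A generic minimal version looks roughly like the following; the existing inject/slerp nodes may implement it differently.

```python
import torch

def slerp(a: torch.Tensor, b: torch.Tensor, t: float) -> torch.Tensor:
    """Spherically interpolate between two tensors of the same shape, treated as flat vectors."""
    a_flat, b_flat = a.reshape(-1), b.reshape(-1)
    a_unit = a_flat / a_flat.norm()
    b_unit = b_flat / b_flat.norm()
    omega = torch.acos((a_unit * b_unit).sum().clamp(-1.0, 1.0))  # angle between the two vectors
    if omega.abs() < 1e-6:
        return (1.0 - t) * a + t * b  # nearly parallel: fall back to plain lerp
    so = torch.sin(omega)
    out = (torch.sin((1.0 - t) * omega) / so) * a_flat + (torch.sin(t * omega) / so) * b_flat
    return out.reshape(a.shape)

# Example: blend two noise tensors 30% of the way from a to b
noise_a = torch.randn(1, 4, 64, 64)
noise_b = torch.randn(1, 4, 64, 64)
blended = slerp(noise_a, noise_b, 0.3)
```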
-
So, in case anyone wants an update: I think I have reworked the original implementation into a 2x-res sampler that works decently, and although I'm no longer as certain that it is significantly better than regular hires, I'll probably create a custom node repo sometime this week. Some things are not explored with it yet, e.g. using it instead of the regular sampler in a hires fix workflow; I tried that just a little, and it can probably be made to work with certain params. Not sure yet, and overall the params need to be tweaked based on prompt/model, which I guess is not out of the ordinary / to be expected.
-
Hey guys, so I finally published the repo with what I was experimenting on: https://github.com/morphles/xSamplers The showcase is a bit lacking (since I'm rather tired from testing all kinds of stuff, but hopefully it will be interesting enough to at least someone), and I'll probably revisit the mixing curves a bit. It would be nice if you could check whether it works at all for you, and maybe help with the progress bar, as it currently does not work. I intend to post it on reddit later today as well.
-
Well, it seems interest here has waned :). Frankly, a bit for me too, though I still see decent possibilities in this method; even bilinear scaling works, and works well with different curve params. So there are more avenues to explore, but if no one shows interest even after the reddit post, I'll likely not pursue it much further.
-
This is probably way out there, and maybe just because I don't know enough, but still... (I also tried looking at the code, but Python is not my first language, and I have particular problems with imperative code, which here is also highly complex and unfamiliar, so I did not get far for now [if ever].) So I need someone smarter/more knowledgeable to at least comment on whether I'm talking complete nonsense or whether this could be feasible somehow.
So, as far as I understand how SD roughly works: it gets noise, tries to predict "counter noise" that would move the noise towards the picture requested in the prompt, doesn't get it right on the first step, so it applies what it guessed and tries again. Then there is that scheduling business, which as far as I was able to find is basically how much noise it tries to eliminate at a given step of the process. Whatever. I want high-res pictures, and SD 1.5, the one with the huge pile of community models, is not great at that, and the usual hires fix flow I find underwhelming (I've probably spent way too much time over the past couple of weeks trying to build something better in ComfyUI, not really succeeding much, but maybe in some cases getting somewhat better pictures than usual).

In any case, thinking about the hires fix, how samplers might work, and noise, and playing with BlenderNeko's unsampler etc. got me wondering: couldn't a sampler be written so that it "predicts across multiple scales"? (I know the latent is not strictly pixels, but this is the easiest way to explain it.) Suppose we have 8x8 noise pixels and we need to predict how to move them from noise towards a picture. The model predicts (I'm somewhat guessing) "counter noise" for each pixel; fine, we have an 8x8 counter-noise grid. But now we also average our noise in 2x2 cells to get a 4x4 noise grid and run the same prediction on it. Then we need to combine this averaged 4x4 prediction with the full-scale 8x8 one, and we can just average them (for each pixel of the 8x8 predicted noise, average its value with the one from the 4x4 predicted noise, obviously using the same 4x4 noise value for each group of four 8x8 pixels). Obviously we can't do this all the way through, since in the end we want fine details. So instead of straight averaging, in the early steps the coarse/averaged predicted noise should weigh more, and as we approach the end (or maybe even the middle, or after just a few steps), it should mostly use the fine/usual prediction.
Not the cleanest explanation, but hopefully it can be understood. My idea is that one of the biggest problems with high res is the detachment of different areas, but if we can somehow "introduce correlations" via this averaged noise/prediction, it should work "like a hires fix" just in one go; actually I would think it should work much, much better. Also it doesn't need to be just one level of averaging (which would limit it to roughly 2x the usual rendering size [per dimension]). Of course this will introduce a significant slowdown, but again, it might be worth it if it works.
So can someone who knows the technicalities better than me explain whether this is complete nonsense that can't work for some reason, or whether there's a chance it could somehow work?
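A rough sketch of what a single step of this multi-scale idea could look like, assuming a hypothetical `predict_noise(latent, sigma)` callable standing in for the model. This is not how ComfyUI's samplers are actually structured, and the linear weight ramp is just a placeholder.

```python
import torch
import torch.nn.functional as F

def multiscale_prediction(latent, sigma, step, total_steps, predict_noise):
    # Full-resolution "counter noise" prediction (the 8x8 grid in the example).
    fine = predict_noise(latent, sigma)

    # Average the latent in 2x2 cells (the 4x4 grid), predict on it, then
    # replicate each coarse value back to its 2x2 block so it can be mixed per pixel.
    coarse_latent = F.avg_pool2d(latent, kernel_size=2)
    coarse = predict_noise(coarse_latent, sigma)
    coarse_up = F.interpolate(coarse, scale_factor=2, mode="nearest")

    # Coarse prediction weighs more at the start and fades out towards the end;
    # the linear ramp is an arbitrary choice that would need tuning.
    t = step / max(total_steps - 1, 1)
    w = 1.0 - t
    return w * coarse_up + (1.0 - w) * fine
```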