
Performance issues with scanning models on page load #60

Open
sansmoraxz opened this issue Nov 23, 2024 · 14 comments

@sansmoraxz
Contributor

The scanner is a huge bottleneck, especially when there's a large number of models. I have to wait around 15 seconds for the page to become responsive. This needs a refactor.

@sansmoraxz
Contributor Author

This is ugly; maybe globs would be better:

files = search_files(base_dirname)
files = folder_paths.filter_files_content_types(files, ["image"])
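
A rough sketch of what a glob-based, single-listing approach could look like (the function name and extension list are illustrative assumptions; ComfyUI's filter_files_content_types may do more than extension matching, which this does not replicate):

import glob
import os

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}

def list_dir_images(base_dirname: str) -> list[str]:
    # One non-recursive directory listing, filtered by extension only
    return [
        os.path.basename(path)
        for path in glob.glob(os.path.join(base_dirname, "*"))
        if os.path.splitext(path)[1].lower() in IMAGE_EXTS
    ]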

@Ainaemaet

Ainaemaet commented Nov 23, 2024

I agree, it's too slow.

@hayden-fr Personally, I think it would be better not to scan for changes at all during startup, and rely on the user to decide manually when they want to do that.

As well, I believe having the ability to choose where to scan is a must. As it is, I started a scan, left it running for 2 hours (I have many hundreds of models), and when I thought it would be finished I saw it was still scanning through all my custom_node folders, getting hung up on .json files that have nothing to do with model info. I think it's great to have the option to scan there, but for those of us who know where our models are, it just hangs things up.

I believe you will find a treasure trove of help by looking at the 'Civitai Browser +' extension (https://github.com/BlafKing/sd-civitai-browser-plus), as it is performant at both downloading/updating and displaying, and very feature-rich.

This model manager is going to be quite important for Comfy, I believe - definitely one of the more important features to get nailed down right! :)

@hayden-fr
Owner

> As well, I believe having the ability to choose where to scan is a must.

Good idea.

@hayden-fr
Owner

At first I used an asynchronous scanning method, but during testing I found that its feedback was not clear enough: it was hard for users to tell that it was scanning silently in the background, so I changed it to a synchronous method.

Model scanning is not a commonly used function, so I think it is acceptable to wait a long time once in a while. What is needed now is to speed up the scanning process.

@hayden-fr
Owner

Most of the time in the current scan is spent calculating the hash value of each model file.
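
For context, hashing a model means reading the whole file; a sketch of what that typically looks like (the helper name is illustrative, not this extension's actual code):

import hashlib

def calculate_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    # Reads the entire file in 1 MiB chunks - I/O bound for multi-GB checkpoints
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest()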

@clayne

clayne commented Nov 24, 2024

You really need to use an mtime-based cache, and it should be driven by when one is actively using the model manager interface, not just at node startup time. It took me forever to figure out which recently updated node was hanging up my Comfy startup, and it ended up being this one (after the compatibility fix for the preview changes was merged in).

I do appreciate that the other change was made, as I also use civitai-helper in a1111 to get model info, but I think the way this node handles large numbers of models and/or LoRAs needs to be rethought in general. It certainly shouldn't be doing anything potentially heavyweight or time-consuming silently.
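
A minimal sketch of an mtime-based hash cache along those lines (the cache file location, key format, and hashing helper are assumptions for illustration, not the node's actual implementation):

import hashlib
import json
import os

CACHE_PATH = "hash_cache.json"  # hypothetical location

def _sha256(path: str) -> str:
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha.update(chunk)
    return sha.hexdigest()

def cached_model_hash(path: str) -> str:
    try:
        with open(CACHE_PATH, "r", encoding="utf-8") as f:
            cache = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        cache = {}
    stat = os.stat(path)
    key = f"{path}:{stat.st_mtime_ns}:{stat.st_size}"  # re-hash only if the file changed
    if key not in cache:
        cache[key] = _sha256(path)
        with open(CACHE_PATH, "w", encoding="utf-8") as f:
            json.dump(cache, f)
    return cache[key]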

@sansmoraxz
Contributor Author

I believe the main bottleneck is the repeated syscalls. For the images I have made a quick and dirty band-aid that maintains a global cache. Problems may arise elsewhere, but for now it's doing the job well enough.

@sansmoraxz
Contributor Author

sansmoraxz commented Nov 24, 2024

Here's my workaround if anyone is interested.

diff --git a/py/download.py b/py/download.py
index 84f61d2..cd8b7cc 100644
--- a/py/download.py
+++ b/py/download.py
@@ -130,7 +130,7 @@ def get_task_status(task_id: str):
         )
 
         download_model_task_status[task_id] = task_status
-
+    utils.clear_dir_images_cache()
     return task_status
 
 
diff --git a/py/services.py b/py/services.py
index 184f7c5..1682620 100644
--- a/py/services.py
+++ b/py/services.py
@@ -51,7 +51,7 @@ def scan_models():
                 }
 
                 result.append(model_info)
-
+    utils.clear_dir_images_cache()
     return result
 
 
@@ -107,6 +107,7 @@ def remove_model(model_path: str):
     model_descriptions = utils.get_model_all_descriptions(model_path)
     for description in model_descriptions:
         os.remove(utils.join_path(model_dirname, description))
+    utils.clear_dir_images_cache()
 
 
 async def create_model_download_task(task_data, request):
@@ -206,7 +207,7 @@ async def download_model_info(scan_mode: str):
                     utils.print_error(
                         f"Failed to download model info for {abs_model_path}: {e}"
                     )
-
+    utils.clear_dir_images_cache()
     utils.print_debug("Completed scan model information.")
 
 
@@ -245,6 +246,7 @@ async def migrate_legacy_information():
                     utils.print_info(f"Migrate preview image from {fullname}")
                     with Image.open(preview_path) as image:
                         image.save(new_preview_path, format="WEBP")
+                    os.remove(preview_path)
 
                 description_path = f"{base_file_name}.md"
 
@@ -262,14 +264,18 @@ async def migrate_legacy_information():
 
                 json_info_path = f"{base_file_name}.json"
                 if os.path.isfile(json_info_path):
-                    with open(json_info_path, "r", encoding="utf-8") as f:
-                        version = json.load(f)
-                        metadata_info.update(
-                            {
-                                "baseModel": version.get("baseModel"),
-                                "preview": [i["url"] for i in version["images"]],
-                            }
-                        )
+                    try:
+                        with open(json_info_path, "r", encoding="utf-8") as f:
+                            version = json.load(f)
+                            if "images" in version and "baseModel" in version:
+                                metadata_info.update(
+                                    {
+                                        "baseModel": version.get("baseModel"),
+                                        "preview": [i["url"] for i in version["images"]],
+                                    }
+                                )
+                    except Exception as e:
+                        utils.print_error(f"Failed to load {json_info_path}: {e}")
 
                 description_parts: list[str] = [
                     "---",
@@ -289,6 +295,8 @@ async def migrate_legacy_information():
                     utils.print_info(f"Migrate description from {fullname}")
                     with open(description_path, "w", encoding="utf-8", newline="") as f:
                         f.write("\n".join(description_parts))
-
+                        # delete old txt file
+                        os.remove(text_info_path)
+    utils.clear_dir_images_cache()
 
     utils.print_debug("Completed migrate model information.")
diff --git a/py/utils.py b/py/utils.py
index d9997f0..072bccf 100644
--- a/py/utils.py
+++ b/py/utils.py
@@ -35,6 +35,11 @@ def _matches(predicate: dict):
 
     return _filter
 
+dir_images_cache = {}
+
+def clear_dir_images_cache():
+    global dir_images_cache
+    dir_images_cache.clear()
 
 def filter_with(list: list, predicate):
     if isinstance(predicate, dict):
@@ -204,8 +209,13 @@ def get_model_metadata(filename: str):
 
 def get_model_all_images(model_path: str):
     base_dirname = os.path.dirname(model_path)
-    files = search_files(base_dirname)
-    files = folder_paths.filter_files_content_types(files, ["image"])
+    global dir_images_cache
+    if base_dirname in dir_images_cache:
+        files = dir_images_cache[base_dirname]
+    else:
+        files = search_files(base_dirname)
+        files = folder_paths.filter_files_content_types(files, ["image"])
+        dir_images_cache[base_dirname] = files
 
     basename = os.path.splitext(os.path.basename(model_path))[0]
     output: list[str] = []
@@ -219,6 +229,7 @@ def get_model_all_images(model_path: str):
 
 
 def get_model_preview_name(model_path: str):
+    basepath = os.path.dirname(model_path)
     images = get_model_all_images(model_path)
     basename = os.path.splitext(os.path.basename(model_path))[0]
 
@@ -226,9 +237,9 @@ def get_model_preview_name(model_path: str):
         image_name = os.path.splitext(image)[0]
         image_ext = os.path.splitext(image)[1]
         if image_name == basename and image_ext.lower() == ".webp":
-            return image
+            return join_path(basepath, image)
 
-    return images[0] if len(images) > 0 else "no-preview.png"
+    return join_path(basepath, images[0]) if len(images) > 0 else "no-preview.png"
 
 
 from PIL import Image

Just apply this patch file.

@clayne

clayne commented Nov 24, 2024

> delete old txt file

Did the txt file originate from this node? If not, it shouldn't be deleted. What originally created the file at text_info_path?

@sansmoraxz
Contributor Author

Ideally it should be up to the user, maybe behind some flag like clean_old_artifacts.

The txt file, AFAIK, was kept alongside the model; I can't say if any other tool uses it. The diff is really just to make my use case work. Someone else may clean this up and make a proper PR. The entire thing is just a workaround.

@darebgd

darebgd commented Dec 6, 2024

Right now, the performance is a killer for large model collections.
I saw you said that you tried asynchronous scanning, but what about lazy loading, where you load one page at a time as it scrolls into view? This might be a dirtier approach than the band-aid we have, but in its current form the node is nearly unusable.
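
A very rough sketch of what a paginated listing endpoint for such lazy loading might look like (the aiohttp route, directory path, and response shape are assumptions for illustration, not the extension's actual API):

import os
from aiohttp import web

MODELS_DIR = "models/checkpoints"  # illustrative path

async def list_models_page(request: web.Request) -> web.Response:
    page = int(request.query.get("page", 0))
    page_size = int(request.query.get("page_size", 50))
    # Cheap listing up front; names only, no hashing or preview lookup
    names = sorted(
        f for f in os.listdir(MODELS_DIR)
        if f.endswith((".safetensors", ".ckpt"))
    )
    start = page * page_size
    return web.json_response({
        "total": len(names),
        "items": names[start:start + page_size],
    })

The expensive per-model work (hashing, preview lookup) would then only run for the slice the UI is actually showing.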

@darebgd

darebgd commented Dec 6, 2024

> Here's my workaround if anyone is interested.
>
> diff --git a/py/download.py b/py/download.py...
> ...

BTW, I have applied the workaround and it seems to work; at least loading is blazing fast. I can't tell if all the features work as expected because I haven't tested it thoroughly.

The status of the original software right now: I do not have to uninstall it, since the loading is deferred until the manager is opened, but I can't use it because it takes hours to load my huge collection after each startup. I am OK waiting for it once, but not every time, so I'll have to live with the patch.

Warning: Off-topic!

One more thing: I keep reading how devs are busy and can't really dedicate time... When will you kids learn about commitment? You published something and gave up in the middle of development? (Not you specifically, but I have heard you complain before.) That's all software nowadays: companies and individuals sell all kinds of unfinished crap under the mask of CI/CD, when what they sell is actually just vaporware.
Not saying you are selling, but it seems you're building your soft skills to really thrive in a SW company.
Sorry for the rant.

@hayden-fr
Owner

Update to version 2.1.4: a new method of obtaining the information is now used, which should be an improvement.

Looking forward to usage feedback.

@darebgd

darebgd commented Dec 6, 2024

Damn... I sounded like a spoiled customer just now... Sorry, job-related trauma...

All I wanted to say is... don't leave the project unfinished.
It's extremely useful, I mean ComfyUI Manager-level useful. Nothing comes even close to this.
