You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Describe the bug
Any import made to a component is causing unnecessary packages to be loaded and increasing load times significantly
The design of the __init__.py files in the component directories are not optimized because importing a single component will bring all the others in even when using the full path to file. The users of this framework have no control over this
Here's a simple example.
from haystack.components.routers.conditional_router import ConditionalRouter
The above line of code will actually still import all the other components defined in the router __init__.py which includes TransformersZeroShotTextRouter and that loads in ML libraries transformers and sklearn. This is because of how python works where parent packages will be initialized completely when a child submodule is referenced
This is problematic because load times are much higher due to components & packages that are not being used. In a complete RAG pipeline, I have seen load times spike up all the way 5-7 seconds with several ML libraries such as torch being loaded from the /utils/init.py which on its own takes a few seconds
Expected behavior
The expected behavior is that the imports are lazily loaded to only when they are accessed and init.py should not automatically load everything. This will significantly improve load times
I have put together a pull request with suggested changes for how to lazily import efficiently while still maintain type checking for IDEs. Please prioritize reviewing this as soon as possible as the performance is an issue for our users
Describe the bug
Any import made to a component is causing unnecessary packages to be loaded and increasing load times significantly
The design of the
__init__.py
files in the component directories are not optimized because importing a single component will bring all the others in even when using the full path to file. The users of this framework have no control over thisHere's a simple example.
from haystack.components.routers.conditional_router import ConditionalRouter
The above line of code will actually still import all the other components defined in the router
__init__.py
which includesTransformersZeroShotTextRouter
and that loads in ML librariestransformers
andsklearn
. This is because of how python works where parent packages will be initialized completely when a child submodule is referencedThis is problematic because load times are much higher due to components & packages that are not being used. In a complete RAG pipeline, I have seen load times spike up all the way 5-7 seconds with several ML libraries such as
torch
being loaded from the/utils/init.py
which on its own takes a few secondsExpected behavior
The expected behavior is that the imports are lazily loaded to only when they are accessed and init.py should not automatically load everything. This will significantly improve load times
I have put together a pull request with suggested changes for how to lazily import efficiently while still maintain type checking for IDEs. Please prioritize reviewing this as soon as possible as the performance is an issue for our users
Other related issues/PRS
#8650
#8706
#8655
FAQ Check
The text was updated successfully, but these errors were encountered: