Schedule Generation Performance Optimization #357
Merged
A small but notable performance improvement to the _tryholidays() function.
In testing, I found that generating market calendars and date ranges made up a significant portion of my code's run time. While most of that time is spent in pandas' date_range function, I did find that the _tryholidays function was abnormally slow.
It turns out that some calendars, such as NYSE, use Holiday Calendars for special opens/closes that behave more like ad-hoc days than holidays. This is a format that Exchange Calendars follows as well. Currently, when making a schedule, these dates are retrieved through pandas' Holiday Calendar system, which is designed around recurring holidays, making the date retrieval far slower than it needs to be.
I added a check that manually retrieves all the single-occurrence holiday dates when it can, and defers to pandas' Holiday Calendar system when it can't. This is a fairly niche optimization, since it is most noticeable when generating a lot of schedules, but since I found it I figured I'd share it.
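The idea can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper name fast_holiday_dates and the sample dates are hypothetical, and it assumes each rule is a pandas Holiday whose year, month, day, offset, observance, and days_of_week attributes describe it fully.

```python
import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday


def fast_holiday_dates(calendar, start, end):
    """Return the calendar's dates in [start, end] as a DatetimeIndex.

    Hypothetical helper: takes a shortcut when every rule is a fixed,
    single-occurrence date, otherwise defers to pandas.
    """
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    rules = calendar.rules

    # A rule is "single occurrence" if it pins a specific year and has no
    # offset, observance, or weekday filter that could move the date.
    single_occurrence = all(
        r.year is not None
        and r.offset is None
        and r.observance is None
        and r.days_of_week is None
        for r in rules
    )
    if not single_occurrence:
        # Recurring rules need pandas' full holiday machinery.
        return calendar.holidays(start, end)

    # Fast path: build the dates directly and filter to the requested range.
    dates = (pd.Timestamp(year=r.year, month=r.month, day=r.day) for r in rules)
    return pd.DatetimeIndex(sorted(d for d in dates if start <= d <= end))


# Usage with made-up one-off dates (not real exchange closures).
adhoc = AbstractHolidayCalendar(
    rules=[
        Holiday("one-off A", year=2025, month=1, day=9),
        Holiday("one-off B", year=2025, month=3, day=3),
    ]
)
adhoc_dates = fast_holiday_dates(adhoc, "2025-01-01", "2025-02-01")
```

The fallback keeps behaviour unchanged for calendars that mix in recurring rules; only calendars made up entirely of fixed dates take the shortcut.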
Before the Optimization:
After the Optimization:
Code Executed during profiling:
import pandas_market_calendars as mcal
from timeit import timeit

NYSE = mcal.get_calendar("NYSE")

# Look up one month of special market opens; report the mean time per call over 6000 runs.
special_dates = lambda: NYSE.special_dates("market_open", "2025-01-01", "2025-02-01")
print(f"{timeit(special_dates, number=6000)/6000 = }")