[CT-3361] Improve Docs Parsing Performance #9037
Comments
Is there any progress on this issue? Our dbt docs are about 1 MB, and a full project parse (…
@aranke Here is the issue mentioned during the dbt meetup today about slow documentation parsing. There is also a closed PR that proposed a fix for this. I hope you will be able to prioritize it 🙏🤩
Here is a flame graph of a full parse of our dbt project (~2300 models). Our documentation Markdown file is just shy of 1 MB. As you can see, if we empty out our Markdown docs file and remove all …
I have replicated the changes in #9045 in a new PR for … This change reduces …
@fredriv Thank you for keeping us focused on this issue. Because I wanted to make some further tweaks to the implementation, I opened a separate PR, which will be merged soon. I gave you a shout-out in the PR notes.
We've received a complaint that dbt-core's parsing performance is surprisingly slow for large docs files. On an M1 Mac, files of around 500 KB can take over a minute to parse, and parse time appears to increase super-linearly with file size. The critically slow step is the call to extract_toplevel_blocks() on the file contents. The extraction of top-level Jinja blocks could likely be made much faster, but this is extremely critical code and we need to preserve existing behavior.
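One way to put a number on "appears to increase super-linearly" is to time the parse step on inputs of doubling size and compare successive timings: a ratio near 2x points to linear growth, a ratio near 4x to quadratic growth. The following is a minimal, generic sketch of that measurement, not code from dbt-core. `measure_scaling`, `growth_ratios`, and `quadratic_toy` are hypothetical names, and the toy workload only stands in for extract_toplevel_blocks().

```python
import time
from typing import Callable, List, Tuple


def measure_scaling(
    parse: Callable[[str], object],
    make_input: Callable[[int], str],
    sizes: List[int],
) -> List[Tuple[int, float]]:
    """Time `parse` on the text built by `make_input(n)` for each n in `sizes`.

    Returns (input length in characters, elapsed seconds) pairs.
    """
    results = []
    for n in sizes:
        text = make_input(n)
        start = time.perf_counter()
        parse(text)
        elapsed = time.perf_counter() - start
        results.append((len(text), elapsed))
    return results


def growth_ratios(results: List[Tuple[int, float]]) -> List[float]:
    """Ratio of each timing to the previous one.

    When the input size doubles between steps, ~2x suggests linear growth
    and ~4x suggests quadratic growth.
    """
    return [
        later / earlier if earlier > 0 else float("inf")
        for (_, earlier), (_, later) in zip(results, results[1:])
    ]


if __name__ == "__main__":
    # Hypothetical stand-in workload; swap in a wrapper around the real parser.
    def quadratic_toy(text: str) -> int:
        # Deliberately O(n^2): re-counts the whole prefix at every position.
        return sum(text.count("{", 0, i) for i in range(len(text)))

    timings = measure_scaling(quadratic_toy, lambda n: "{% x %}" * n, [250, 500, 1000])
    for (length, seconds), ratio in zip(timings[1:], growth_ratios(timings)):
        print(f"{length:>8} chars: {seconds:.4f}s ({ratio:.1f}x previous)")
```

Passing a wrapper around the real parser as `parse` and a docs-file generator as `make_input` would give the same ratios for the actual code path. Wall-clock timings are noisy, so it is worth repeating runs before drawing conclusions.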
This does not appear to be a regression, but current performance is embarrassingly bad.
To generate a file that reproduces the performance problem, repeat the following snippet a few thousand times in a text file with the .md (Markdown) extension, then either add it to a dbt project or call extract_toplevel_blocks() on it directly.
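The snippet referred to above is not preserved in this copy of the issue. As an assumed stand-in, the sketch below builds a Markdown file out of repeated dbt `{% docs %}` blocks with unique names. `DOCS_BLOCK`, `build_docs_markdown`, and `write_docs_file` are hypothetical helpers, and the block text is filler rather than the original snippet.

```python
from pathlib import Path

# Hypothetical stand-in for the snippet referenced in the issue: one dbt docs block.
# Doubled braces survive str.format() as literal "{" and "}".
DOCS_BLOCK = """{{% docs {name} %}}
## {name}

This column stores the identifier for the record. It is documented here so
that the docs file grows large enough to expose slow parsing.

{{% enddocs %}}

"""


def build_docs_markdown(repetitions: int) -> str:
    """Concatenate `repetitions` uniquely named docs blocks into one string."""
    return "".join(DOCS_BLOCK.format(name=f"doc_block_{i}") for i in range(repetitions))


def write_docs_file(path: Path, repetitions: int) -> int:
    """Write the generated Markdown to `path` (use a .md extension).

    Returns the size of the written file in bytes.
    """
    text = build_docs_markdown(repetitions)
    path.write_text(text, encoding="utf-8")
    return len(text.encode("utf-8"))
```

For example, `write_docs_file(Path("models/large_docs.md"), 3000)` writes a file of a few hundred kilobytes (the path is an assumption; any directory dbt scans for docs should work). Raising `repetitions` makes it easy to check how parse time grows with file size.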
Impact on other teams
None
Needs backport?
Unsure