High memory usage with incremental #1710
Comments
I've also changed these settings, but the benchmark stays around the same.
@Deninc are you able to post one item from your JSON file? Disabling deduplication should make runs faster and decrease the memory usage. Is it possible that "event_date" is not very granular, i.e. you have millions of records with the same date? Btw, do batching for better performance: https://dlthub.com/docs/reference/performance#yield-pages-instead-of-rows
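For context, the batching advice linked above amounts to yielding lists of records instead of single rows. A minimal sketch, assuming a plain jsonl reader; the file name and page size are illustrative, not taken from this issue:

```python
import json
from itertools import islice

import dlt


# Illustrative "yield pages instead of rows" pattern: read the jsonl file in
# chunks and hand dlt a list of records per yield instead of one dict at a time.
@dlt.resource(name="events")
def events_paged():
    with open("test.jsonl", "r", encoding="utf-8") as f:
        while True:
            page = [json.loads(line) for line in islice(f, 10_000)]
            if not page:
                break
            yield page  # dlt accepts lists of records, which cuts per-item overhead
```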
Hi @rudolfix, yes, basically for this dataset all the records share the same event_date.
Here it actually increases the memory usage significantly, and I'm not sure why.
@rudolfix I can confirm using
Updated: the above benchmark was wrong (I used …). The correct benchmark is here.
@Deninc I think we'll disable boundary deduplication by default in the next major release.
dlt version
0.5.3
Describe the problem
I've found that the extraction phase is hogging the memory if I enable `dlt.sources.incremental` and `primary_key=()`.
Expected behavior
I'm not sure if this is a bug. Is there a way I can limit the memory usage?
Steps to reproduce
My test is with a `test.jsonl` file of 2.76 million rows, around 3.66 GB in size.

In the first case the memory usage is low (179.00 MB), but it takes forever to run (rate: 33.07/s).
After that I add `primary_key=()` to disable deduplication. It runs much faster (rate: 20345.09/s), but now the memory usage is too high (12208.89 MB).
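The repro script itself did not survive the copy above, so the following is only a sketch of the shape it likely had, assuming `event_date` is the incremental cursor (as discussed in the comments) and duckdb as a stand-in destination:

```python
import json

import dlt


# Hypothetical reconstruction of the fast-but-memory-hungry case:
# incremental extraction over test.jsonl with primary_key=() to disable
# boundary deduplication.
@dlt.resource(name="events", primary_key=())
def events(event_date=dlt.sources.incremental("event_date")):
    with open("test.jsonl", "r", encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)


pipeline = dlt.pipeline(
    pipeline_name="incremental_memory_repro",
    destination="duckdb",
    dataset_name="events",
)
print(pipeline.run(events()))
```

Removing `primary_key=()` restores deduplication, i.e. the slow but low-memory first case.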
Operating system
macOS
Runtime environment
Local
Python version
3.11
dlt data source
No response
dlt destination
No response
Other deployment details
No response
Additional information
No response