docs: document that `parallelized=True` resources with `add_limit(x)` usually yield `x-1` #2142

joscha · 2024-12-12T10:56:10Z

Most of the time (roughly 80%), parallelized resources with a limit yield one item fewer than the limit.

netlify · 2024-12-12T10:56:29Z

✅ Deploy Preview for dlt-hub-docs ready!

Name	Link
🔨 Latest commit	`b458594`
🔍 Latest deploy log	https://app.netlify.com/sites/dlt-hub-docs/deploys/675ac14e103de0000861f505
😎 Deploy Preview	https://deploy-preview-2142--dlt-hub-docs.netlify.app

joscha · 2024-12-12T14:21:02Z

tests/pipeline/test_resources_evaluation.py

+
+    limit = 5
+    result = list(sync_resource1().add_limit(limit))
+    allowed_result_range = range(limit - int(parallelized), limit + 1)


This is not ideal.
Ideally the parallel yields would be exact. I tried for a few hours to achieve that, to no avail. The next best option would be to run this test x times when parallelized=True and then check that the average number of yielded items lies below 4.5.
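
For illustration, the tolerance in the diff above and the repeated-run idea can be sketched in plain Python. This is only a sketch under stated assumptions: `allowed_counts` and `mean_yield_count` are hypothetical helper names that do not appear in the PR, and a plain generator cut off with `itertools.islice` stands in for the limited resource, so the example says nothing about how `add_limit` actually behaves.

```python
from itertools import islice
from typing import Callable, Iterable


def allowed_counts(limit: int, parallelized: bool) -> range:
    """Hypothetical helper: same expression as `allowed_result_range` in the diff."""
    # int(True) == 1, so a parallelized resource may come up one item short.
    return range(limit - int(parallelized), limit + 1)


def mean_yield_count(make_items: Callable[[], Iterable[object]], runs: int = 100) -> float:
    """Hypothetical helper: consume a fresh iterable `runs` times, return the mean item count."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    total = sum(sum(1 for _ in make_items()) for _ in range(runs))
    return total / runs


limit = 5

# Stand-in for the limited resource (assumption). In the real test the factory
# would be something like `lambda: sync_resource1().add_limit(limit)`.
average = mean_yield_count(lambda: islice(range(1, 10), limit), runs=20)

print(list(allowed_counts(limit, parallelized=True)))   # accepted counts when parallelized
print(list(allowed_counts(limit, parallelized=False)))  # accepted counts otherwise
print(average)
```

Averaging over many runs turns a flaky per-run assertion into a single comparison against a threshold, at the cost of a slower test.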

sh-rp · 2024-12-13T10:13:24Z

@joscha thanks for this PR. If you look in the PR list, you can see that I have changed the add_limit implementation. You could check out that branch and see whether you are still seeing that problem there, or maybe even add your test there. I think it might actually be resolved.

joscha · 2024-12-13T10:20:34Z

Will give your branch a try!

joscha · 2024-12-13T14:11:00Z

> @joscha thanks for this PR. If you look in the PR list, you can see that I have changed the add_limit implementation. You could check out that branch and see whether you are still seeing that problem there, or maybe even add your test there. I think it might actually be resolved.

Yes, the current code of #2131 (3738c29) reliably produces exactly `limit` results for both parallelized and non-parallelized resources.

sh-rp · 2024-12-13T14:30:25Z

@joscha amazing, your helping out on this is very much appreciated :)

joscha · 2024-12-13T15:12:04Z

> @joscha amazing, your helping out on this is very much appreciated :)

To be honest, I am glad it now produces exactly `limit` results. I was quite stumped for a bit and thought the REST API I was using had an issue. It might make sense to merge the test in this pull request after merging #2131, so that we pick up regressions. As far as I can tell, this is currently not asserted anywhere. With your changes in 3738c29, the code in this PR becomes:

import dlt
import pytest


@pytest.mark.parametrize("parallelized", [False, True])
def test_limit_sync_resource(parallelized: bool) -> None:
    @dlt.resource(parallelized=parallelized)
    def sync_resource1():
        # Yields 9 items, more than the limit applied below.
        for i in range(1, 10):
            yield i

    limit = 5
    result = list(sync_resource1().add_limit(limit))
    # Both variants must yield exactly `limit` items.
    assert len(result) == limit

which is precise enough to keep.

docs: document that parallelized resources usually produce one less than the limit (b458594)

joscha mentioned this pull request Dec 12, 2024

convert add_limit to pipe step based limiting #2131

Open


joscha changed the title ~~docs: document that parallelized resources usually produce one less than the limit~~ docs: document that parallelized=True resources with add_limit(x) usually yield x-1 Dec 12, 2024
