Only first 20 linked pages get downloaded #405

Open
tal-amir opened this issue Jan 5, 2025 · 4 comments


@tal-amir

tal-amir commented Jan 5, 2025

Hi there,

Is there a hard-coded limit to the number of downloaded linked pages? (i.e. breadth, not depth)

I'm observing it when trying to capture a webpage, including all depth-1 links.
It seems that WebScrapBook only downloads the first 20 links that match the inclusion criteria. The remaining links keep pointing to their original web address.
I verified this by changing the setting "Included URLs for capturing linked pages". This led to a different subset of links being downloaded, but the subset still had exactly 20 links.

Is it possible to change that limit to an arbitrary number or disable it altogether?

Thanks!

@danny0838
Owner

There should be no such limit. Please provide the webpage URL and the capture config for further investigation (copy it from Capture As > Advanced).

@tal-amir
Author

tal-amir commented Jan 5, 2025

Can I send them to you by email? I prefer not to share them publicly.

@danny0838
Owner

You can, but that would make the issue much harder to track. Whenever possible, mask the sensitive information instead.

@tal-amir
Author

tal-amir commented Jan 5, 2025

I don't think this is feasible, but the problem does seem to be specific to that website.
I'll try to find another solution. Thanks
