Account network delays during sleep #12

Open
chfoo opened this issue Oct 9, 2014 · 0 comments

chfoo commented Oct 9, 2014

In tinyback, rate limiting was a bit more precise and optimized (in my opinion):
There was a rate limit tuple, defined here: https://github.com/ArchiveTeam/tinyback/blob/master/tinyback/services.py#L48
The implementation was here: https://github.com/ArchiveTeam/tinyback/blob/master/tinyback/__init__.py#L132
The thing is, if I take is.gd as an example, you can scrape 60 URLs per minute, so with terroroftinytown-client-grab the delay is implemented as a 1 s sleep.
Now think about a 1-day timeframe: with tinyback you could scrape 86,400 URLs per day.
With terroroftinytown-client-grab, you call sleep(1) after every request, so the RTT of each URL request is added on top of the sleep. Instead of 86,400 URLs per day, you may only scrape 80,000 to 85,000.
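
One way to account for the request time is to schedule each request against a clock instead of sleeping for a fixed interval. The following is only a minimal sketch under stated assumptions, not tinyback's or terroroftinytown's actual implementation: the `RateLimiter` class, its `wait()` method, and the `(count, period)` arguments are hypothetical names chosen to mirror the "60 URLs per 60 seconds" example above.

```python
import time


class RateLimiter:
    """Hypothetical helper: allow at most `count` requests per `period` seconds.

    Instead of always sleeping a fixed interval, it sleeps only for the
    part of the interval that the previous request did not already use.
    """

    def __init__(self, count, period, clock=time.monotonic, sleep=time.sleep):
        self.interval = period / count  # e.g. 60 / 60 = 1.0 second per request
        self.clock = clock              # injectable so the logic can be tested
        self.sleep = sleep
        self.next_time = None           # earliest time the next request may start

    def wait(self):
        """Block until the next request slot; return the time actually slept."""
        now = self.clock()
        if self.next_time is None:
            self.next_time = now        # the first request may start immediately
        delay = self.next_time - now
        if delay > 0:
            self.sleep(delay)
        # Schedule the following slot. Using max() avoids a burst of
        # catch-up requests after one request was slower than the interval.
        self.next_time = max(self.next_time, now) + self.interval
        return max(delay, 0.0)
```

A caller would create `RateLimiter(60, 60)` once and call `wait()` before each URL request. If a request took 0.2 s, the next `wait()` sleeps only about 0.8 s, so requests still start roughly once per second. For comparison, with a fixed `sleep(1)` and an assumed RTT of 0.05 s, each cycle takes 1.05 s, which works out to about 82,000 requests per day instead of 86,400.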
