Stuck on Downloading for a long time #3
Comments
Requests run concurrently in scrapy in the sense that they won't block the main twisted event loop. Stock scrapy requests will therefore go through concurrently even if an unfinished webdriver request is downloading something. However, all webdriver requests are attached to a specific webdriver instance (which itself needs to enforce sequential access for obvious reasons), and I haven't got around to implementing support for multiple webdriver instances yet, so in practice only one webdriver request may be performed at a time.
Ah I see, so we basically want support for multiple webdrivers as a new feature. Thanks again for your detailed reply. Helps me a lot!
On Tue, May 14, 2013 at 9:58 PM, Nicolas Cadou wrote:
You got that exactly right, support for multiple webdriver instances would be a new feature for scrapy-webdriver. And no worries about being stupid, you have no idea how much head-banging my desk had to suffer when I was trying to make sense of twisted and scrapy. :) As for the obvious reasons, a webdriver instance is basically like a browser with just one tab. So trying to download two things at the same time would not work at all. And then, the state of that browser and its currently loaded page need to be left untouched until the parser method in the scrapy spider has finished working with it.
Ok, I may give this feature a try if you don't mind. Gives me a reason to learn more about Twisted, Scrapy and Selenium. May take some time though, and I'm not even sure I will finish at all, since I've got a lot of other stuff going on too. I'm amazed so few are using this btw.
I would certainly not mind contributions. As for the low usage, this project is still very young, so I'm not surprised.
Fixes brandicted#3 stuck on downloading for a long time
@ncadou Do you think it would be feasible to allow for parallel scrapy-webdriver requests using multiple tabs or windows in a single webdriver instance instead of extending to multiple webdriver instances (to avoid overhead)?
There are ways with webdriver to create tabs and windows, and switch between them, so it should be possible to implement that support in scrapy-webdriver.
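To make the tab idea a bit more concrete, here is a minimal sketch of opening and closing an extra tab through the Selenium Python bindings. The helper names `open_in_new_tab` and `close_tab` are hypothetical and not part of scrapy-webdriver. The sketch assumes the browser honours `window.open` from injected JavaScript, which may not hold for every driver. Note that commands are still sent to the driver one at a time, so tabs mainly let several pages keep their state side by side.

```python
def open_in_new_tab(driver, url):
    """Open `url` in a new tab and return that tab's window handle.

    `driver` is assumed to be a Selenium WebDriver instance.
    """
    before = set(driver.window_handles)
    # Ask the browser itself to open a blank tab/window.
    driver.execute_script("window.open('about:blank');")
    new_handles = [h for h in driver.window_handles if h not in before]
    if not new_handles:
        raise RuntimeError("the browser did not open a new window")
    handle = new_handles[0]
    # All following commands target the new tab until we switch again.
    driver.switch_to.window(handle)
    driver.get(url)
    return handle


def close_tab(driver, handle, return_to):
    """Close the tab `handle`, then focus the tab `return_to`."""
    driver.switch_to.window(handle)
    driver.close()
    driver.switch_to.window(return_to)
```

A caller would keep the returned handle next to the request it belongs to, and switch back to that handle before the spider callback touches the page.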
@ncadou Could you add a feature to use multiple webdrivers, controlled by a setting such as 'CONCURRENT_REQUESTS'? How long until this feature is available?
@IIIypuk09 multiple webdriver instances are planned down the line, and your suggestion about using settings makes total sense, but unfortunately I don't know when I'll have the opportunity to implement that feature.
I'm currently seeing that it's stuck on downloading for a long time. Could it be that the request timed out, so it won't continue? Are requests currently not concurrent because of the queues? Does it only take requests out of the queue one by one?
Feature description:
Add the ability to spawn multiple webdrivers so we can process scrapy requests concurrently.
For this we need an extra option, a maximum number of webdrivers, as the pool shouldn't grow indefinitely.
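One way to picture the capped pool described above is the sketch below. It is only an illustration under stated assumptions: `WebdriverPool`, `factory` and `max_size` are hypothetical names, not existing scrapy-webdriver API, and `factory` stands for any callable that returns a new webdriver. The blocking `acquire` is meant to show the bookkeeping only; blocking like this inside the twisted reactor thread would stall everything, so a real integration would need a non-blocking variant.

```python
import queue
import threading


class WebdriverPool:
    """Lazily creates up to `max_size` webdrivers and hands them out."""

    def __init__(self, factory, max_size):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._factory = factory      # hypothetical: callable returning a driver
        self._max_size = max_size
        self._idle = queue.Queue()   # drivers currently not in use
        self._created = 0
        self._lock = threading.Lock()

    def acquire(self, timeout=None):
        """Return an idle driver, create one if allowed, else wait."""
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            pass
        with self._lock:
            may_create = self._created < self._max_size
            if may_create:
                self._created += 1
        if may_create:
            try:
                return self._factory()
            except Exception:
                # Creation failed, so give the slot back.
                with self._lock:
                    self._created -= 1
                raise
        # Pool is at its cap: wait until someone releases a driver.
        return self._idle.get(timeout=timeout)

    def release(self, driver):
        """Hand a driver back once the spider callback is done with it."""
        self._idle.put(driver)
```

The cap could plausibly be read from a setting, as suggested elsewhere in this thread, but which setting name to use is left open here.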
The reason that it got stuck on downloading is probably because PhantomJS crashed:
So we may also need a way to check whether PhantomJS is still responding and, if it is not, automatically restart the webdriver/PhantomJS.
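A rough sketch of such a check, assuming a Selenium-style driver object: reading `current_url` forces a round trip to the browser, so a crashed process should surface as an exception. The helper name `ensure_alive` and the `factory` callable are hypothetical. This only detects a browser that errors out; a browser that hangs without failing would additionally need a timeout.

```python
def ensure_alive(driver, factory):
    """Return `driver` if it still responds, otherwise a fresh one.

    `factory` is a hypothetical callable that starts a new webdriver.
    """
    try:
        # Cheap command that requires an answer from the browser.
        driver.current_url
        return driver
    except Exception:
        # The browser is presumably gone; clean up as best we can.
        try:
            driver.quit()
        except Exception:
            pass
        return factory()
```

Calling this right before each webdriver request would replace a dead instance transparently, at the cost of one extra round trip per request.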