I saw that the request is replaced with dont_filter=True; if I remove that, the spider just stops when it gets to the same URL again.
I need to use the offsite middleware, though, so any thoughts?
I will do some hacking on a total rewrite where there is no need for the spider middleware, only a DownloaderMiddleware or a normal downloader. Starting to understand this stuff a little, hehe.
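For what it's worth, here is a minimal sketch of what a downloader-middleware-only approach could look like. Everything in it is an assumption made for illustration (the class name, the single shared Firefox driver, the lifecycle handling), not this project's code:

```python
# Hypothetical sketch: rendering pages in a downloader middleware instead of
# a spider middleware. Names and lifecycle handling are illustrative only.
from scrapy.http import HtmlResponse
from selenium import webdriver


class SeleniumDownloaderMiddleware:
    def __init__(self):
        # One shared browser instance; real code would close it on
        # spider_closed and possibly pool several instances.
        self.driver = webdriver.Firefox()

    def process_request(self, request, spider):
        # Fetch and render the page in the browser, then return a ready-made
        # response so Scrapy skips its own download for this request.
        self.driver.get(request.url)
        return HtmlResponse(
            url=self.driver.current_url,
            body=self.driver.page_source,
            encoding='utf-8',
            request=request,
        )
```

The catch, as discussed below, is that a downloader middleware alone gives the parse callbacks no way to interact with the live webdriver instance.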
If I remember correctly, dont_filter=True comes from an earlier experiment where requests were not queued up in the spider middleware. They would be rescheduled in the Scrapy queue and then dropped by the offsite middleware. I'm not sure why it would still be needed, though. Do you have an idea where exactly the spider stops?
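To make that concrete (illustrative only, not this project's code): a request that has already been crawled would normally be discarded by the scheduler's duplicate filter, and an off-domain URL would be discarded by the offsite middleware; marking the re-scheduled copy with dont_filter=True makes both of them let it through.

```python
# Illustrative only: re-issuing a request that was already crawled.
# replace() returns a copy of the request; dont_filter=True tells the
# duplicate filter (and the offsite middleware) not to drop that copy.
def reschedule(request):
    return request.replace(dont_filter=True)
```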
Another reason for needing WebdriverSpiderMiddleware is that we need to keep track of when a spider parse method finishes working with the webdriver instance it was assigned: until the parsing is finished, the webdriver instance should not be changed by any other spider activity. We could have the spider parse method explicitly release the webdriver instance, but that looks error-prone and generally not very clean to me. My concern here is ease of use, by making WebdriverRequest as much of a drop-in replacement for the stock Request as possible.
The spider middleware layer ended up being the best place to do that accounting and, in the future, the management of multiple webdriver instances.
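As a rough illustration of that accounting (the manager object and the webdriver attribute on the request are assumptions made for this sketch, not necessarily the project's actual API):

```python
# Hypothetical sketch of per-callback webdriver accounting in a spider
# middleware; `manager` and its release() method are invented for
# illustration.
class WebdriverSpiderMiddlewareSketch:
    def __init__(self, manager):
        self.manager = manager

    def process_spider_output(self, response, result, spider):
        # Pass through everything the parse callback produced; only once this
        # iterator is exhausted do we know the callback is done with the
        # webdriver instance attached to its request.
        for item_or_request in result:
            yield item_or_request
        driver = getattr(response.request, 'webdriver', None)
        if driver is not None:
            self.manager.release(driver)
```

That way the parse method never has to release anything explicitly, which is what keeps WebdriverRequest close to a drop-in replacement for Request.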
Yeah, I noticed the same thing. No idea why yet; I've been looking at the related code without much success.
I see, yeah, we need the webdriver if people still want to use it in the spider. Couldn't we just pass a deep copy? I guess not, because it would still be interacting with the same remote webdriver.
You're right, I think: for using the webdriver in the spider, the spider middleware seems like a nice solution. I am mostly using this for rendering the page with JavaScript, so I haven't gotten to that part yet.
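For that JavaScript-rendering use case, usage looks roughly like the following; the import path for WebdriverRequest is an assumption here and may differ:

```python
# Sketch of the JS-rendering use case; the WebdriverRequest import path is
# assumed and may need adjusting.
import scrapy
from scrapy_webdriver.http import WebdriverRequest


class RenderedPageSpider(scrapy.Spider):
    name = 'rendered_page'

    def start_requests(self):
        # WebdriverRequest is meant as a drop-in replacement for Request;
        # the page is fetched and rendered by the browser before parse() runs.
        yield WebdriverRequest('http://example.com/', callback=self.parse)

    def parse(self, response):
        # The rendered HTML can be parsed like any ordinary response.
        for title in response.css('h1::text').getall():
            yield {'title': title}
```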