After PR 596, the spiders for Afghanistan, Colombia, Costa Rica, Dominican Republic, Malta, Zambia (and others) will filter the URLs in their base lists by date.
Should we create a new spider class? They all have a list of available URLs in common, and they can have a method like this:
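from datetime import datetime

def parse_dates_to_filter(self, date):
    """Return True if the date should be filtered out (skipped)."""
    # If no date range was set, nothing is filtered out.
    if not (self.from_date and self.until_date):
        return False
    # Assumes `date` is a string in self.date_format.
    date = datetime.strptime(date, self.date_format)
    if self.date_format == '%Y':
        return not (self.from_date.year <= date.year <= self.until_date.year)
    elif self.date_format == '%Y-%m':
        # Compare (year, month) pairs so ranges that cross a year boundary work.
        return not ((self.from_date.year, self.from_date.month)
                    <= (date.year, date.month)
                    <= (self.until_date.year, self.until_date.month))
    else:
        return not (self.from_date <= date <= self.until_date)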
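The '%Y-%m' case is the subtle one: a range such as 2019-11 to 2020-02 crosses a year boundary, so years and months can't be checked independently (12 is not between 11 and 2). A minimal standalone sketch of one way to handle it, comparing (year, month) tuples; the function name `month_in_range` is hypothetical and not part of the codebase:

```python
from datetime import datetime


def month_in_range(date, from_date, until_date):
    """Return True if date's (year, month) lies within the inclusive range."""
    # Tuples compare element by element, so the year is checked first
    # and the month only breaks ties.
    return ((from_date.year, from_date.month)
            <= (date.year, date.month)
            <= (until_date.year, until_date.month))


from_date = datetime(2019, 11, 1)
until_date = datetime(2020, 2, 29)

# December 2019 is inside the range, even though 12 > 2.
inside = month_in_range(datetime(2019, 12, 1), from_date, until_date)
# March 2020 is after the range.
outside = month_in_range(datetime(2020, 3, 1), from_date, until_date)
```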
We have the PeriodicSpider class but it doesn't work with those scrapers.
Yes, the new class sounds good to me as well. Instead of the suggested method, though, the class could have date_format and date_pattern attributes, and a start_requests method with a callback to a build_urls function, which filters the returned URL list according to date_format and date_pattern and yields the matching URLs.
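To make the idea concrete, here is a rough sketch of the filtering part only. It leaves out Scrapy entirely (in the real class, start_requests would request the list and pass the response to build_urls), and the class name, constructor and example URLs are all made up for illustration. It also assumes from_date and until_date were parsed with the same date_format as the URLs:

```python
import re
from datetime import datetime


class DateFilteredListSpider:
    """Hypothetical base class; a real one would extend the project's base spider."""

    date_format = '%Y-%m'          # how dates appear in the URLs
    date_pattern = r'\d{4}-\d{2}'  # regular expression that extracts the date

    def __init__(self, from_date=None, until_date=None):
        # Assumption: both bounds were parsed with date_format, so they
        # have the same precision as the dates found in the URLs.
        self.from_date = from_date
        self.until_date = until_date

    def build_urls(self, urls):
        """Yield only the URLs whose date falls within the requested range."""
        for url in urls:
            match = re.search(self.date_pattern, url)
            if not match:
                continue  # assumption: URLs without a date are skipped
            date = datetime.strptime(match.group(), self.date_format)
            if self.from_date and date < self.from_date:
                continue
            if self.until_date and date > self.until_date:
                continue
            yield url


spider = DateFilteredListSpider(datetime(2019, 11, 1), datetime(2020, 2, 1))
urls = [
    'https://example.com/releases/2019-10.json',
    'https://example.com/releases/2019-12.json',
    'https://example.com/releases/2020-02.json',
    'https://example.com/releases/2020-03.json',
]
selected = list(spider.build_urls(urls))
```

With this shape, a subclass would only need to set date_format and date_pattern; parsing every date with strptime before comparing avoids separate branches per format.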
jpmckinney changed the title from "New spider class?" to "New class for spiders that filter lists of URLs by date" on Feb 1, 2021