Releases: algolia/docsearch-scraper
feat(meta): handle comma-separated version
This release makes it possible to use comma-separated tokens in the docsearch:version meta tag.
The docsearch:version meta tag now behaves similarly to the keywords meta tag defined in the HTML5 spec.
The docsearch:version tag can be a set of comma-separated tokens, each of which is a version relevant to the page. These tokens must be compliant with the SemVer specification or only contain alphanumeric characters (e.g. latest, next, etc.). As facet filters, these version tokens are case-insensitive.
For example, all records extracted from a page with the following meta tag:
<meta name="docsearch:version" content="2.0.0-alpha.62,latest">
will be tagged with the version:
version:["2.0.0-alpha.62", "latest"]
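The token rules above can be illustrated with a minimal Python sketch. It is not the scraper's actual implementation: the function name parse_version_tokens and the simplified SemVer pattern are assumptions made for illustration only.

```python
import re

# Simplified SemVer pattern (assumption: looser than the official grammar,
# but it accepts MAJOR.MINOR.PATCH with optional pre-release and build parts).
SEMVER_RE = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?"
    r"(?:\+[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?$"
)
# Plain alphanumeric tokens such as "latest" or "next".
ALNUM_RE = re.compile(r"^[0-9A-Za-z]+$")


def parse_version_tokens(content):
    """Split a docsearch:version content string into validated tokens.

    Hypothetical helper: splits on commas, trims whitespace, drops empty
    entries, and rejects tokens that are neither SemVer nor alphanumeric.
    """
    tokens = [token.strip() for token in content.split(",")]
    tokens = [token for token in tokens if token]
    for token in tokens:
        if not (SEMVER_RE.match(token) or ALNUM_RE.match(token)):
            raise ValueError("invalid version token: %r" % token)
    return tokens


versions = parse_version_tokens("2.0.0-alpha.62,latest")
```

With the meta tag from the example, versions holds the two tokens "2.0.0-alpha.62" and "latest", in that order.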
deps: upgrade Scrapy + Chrome to stable 84
This PR upgrades Scrapy to its latest version (2.2.1) and removes the unnecessary use of CustomContextFactory. It also upgrades Chrome to the latest available stable version, v84.
Upgrading Scrapy introduces many benefits such as:
- File extensions that LinkExtractor ignores by default now also include 7z, 7zip, apk, bz2, cdr, dmg, ico, iso, tar, tar.gz, webm, and xz
- Twisted is upgraded to its latest version, which is required to mitigate CVE-2020-10109
- Better logging system
- A new DNS_RESOLVER setting allows enabling IPv6 support
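As an illustration of the DNS_RESOLVER item, here is a sketch of how a Scrapy project could opt in to the IPv6-capable resolver. The variable name custom_settings is only a placeholder, and this release does not state that the scraper itself changes this setting.

```python
# Sketch of a Scrapy settings fragment (assumption: not taken from the
# scraper's own configuration). Scrapy's default resolver is
# scrapy.resolver.CachingThreadedResolver, which does not support IPv6;
# CachingHostnameResolver does.
custom_settings = {
    "DNS_RESOLVER": "scrapy.resolver.CachingHostnameResolver",
}
```

The same key can be set in a project's settings.py instead of a per-spider dictionary.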
feat: update chrome to 83.0.4103.61
v1.10.0 feat: update chrome to 83.0.4103.61
feat(meta): do not jsonized version meta
v1.9.0 feat(meta): do not jsonized version meta
feat(analytics): define a consistent ObjectID
- define a consistent ObjectID
feat(headless_chrome): use google chrome 78
- use google chrome 78
Before porting to v3
v1.3 before python v3 porting
Update Selenium dependencies. Enable usage of Chrome 73
v1.0 First steady version correctly released
Merge pull request #417 from algolia/fix/build_base Fix/build base