Releases: openzim/zimit

1.5.1

18 Sep 09:13
2be5562

Changed

  • Using browsertrix-crawler 0.11.0
  • Scraper stat file is not created empty (#211)
  • Crawler statistics are not available anymore (#213)
  • Using warc2zim 1.5.4

1.5.0

23 Aug 16:40
12dab25

Added

  • --long-description param

1.4.1

23 Aug 12:18
951241d

Changed

  • Using browsertrix-crawler 0.10.4
  • Using warc2zim 1.5.3

1.4.0

02 Aug 14:46
cbaaa77

Added

  • --title to set ZIM title
  • --description to set ZIM description
  • New crawler options: --maxPageLimit, --delay, --diskUtilization
  • --zim-lang param to set warc2zim's --lang (ISO-639-3)

Changed

  • Using browsertrix-crawler 0.10.2
  • Default and accepted values for --waitUntil changed, following the crawler's update
  • Using warc2zim 1.5.2
  • Disabled Chrome updates to prevent incidental inclusion of update data in WARC/ZIM (#172)
  • --failOnFailedSeed now used unconditionally
  • --lang now passed to crawler (ISO-639-1)

Removed

  • --newContext, following the crawler's update
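
The 1.4.0 notes distinguish two language options: --lang goes to the crawler as an ISO-639-1 code, while --zim-lang sets warc2zim's --lang as an ISO-639-3 code. A minimal sketch of assembling such a command line follows. The helper name `build_zimit_args`, the small mapping table, and the use of `--url` and `--name` are assumptions for illustration, not part of these release notes.

```python
import shlex

# Illustrative subset only: ISO-639-1 (two-letter) to ISO-639-3 (three-letter).
ISO_639_1_TO_3 = {"en": "eng", "fr": "fra", "de": "deu"}


def build_zimit_args(url, name, title, description, lang):
    """Build an argv list using the options named in the 1.4.0 notes.

    Hypothetical helper: --url and --name are assumed here, the other
    flags come from the release notes.
    """
    args = [
        "zimit",
        "--url", url,
        "--name", name,
        "--title", title,
        "--description", description,
        "--lang", lang,  # passed to the crawler (ISO-639-1)
    ]
    zim_lang = ISO_639_1_TO_3.get(lang)
    if zim_lang is not None:
        # Sets warc2zim's --lang (ISO-639-3)
        args += ["--zim-lang", zim_lang]
    return args


if __name__ == "__main__":
    argv = build_zimit_args(
        "https://example.com", "example", "Example", "An example site", "fr"
    )
    # Print a shell-quoted command line rather than running anything.
    print(shlex.join(argv))
```

Unknown two-letter codes are simply left without a --zim-lang value in this sketch; a real wrapper might prefer to fail loudly instead.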

1.3.1

06 Feb 11:04

Changed

  • Using browsertrix-crawler 0.8.0
  • Using warc2zim version 1.5.1 with wabac.js 2.15.2

1.3.0

02 Feb 16:49

Added

  • Initial URL check normalizes homepage redirects to standard ports – 80/443 (#137)

Changed

  • Using warc2zim version 1.5.0 with scope conflict fix and videos fix
  • Using browsertrix-crawler 0.8.0-beta.1
  • Fixed --allowHashUrls being a boolean param
  • Increased check_url timeout to 12s to connect and 27s to read (previously 10s)
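
The port normalization noted under Added for 1.3.0 can be pictured as dropping an explicit :80 or :443 when it matches the URL's scheme. The sketch below is one way to do that with the standard library. The function name `strip_default_port` is hypothetical, and this is not taken from zimit's own check_url code.

```python
from urllib.parse import urlsplit, urlunsplit

# Standard ports per scheme, as named in the release note.
DEFAULT_PORTS = {"http": 80, "https": 443}


def strip_default_port(url: str) -> str:
    """Return url without an explicit port when it is the scheme's default."""
    parts = urlsplit(url)
    if parts.port is None or parts.port != DEFAULT_PORTS.get(parts.scheme):
        # No port given, or a non-standard one: leave the URL untouched.
        return url
    # Keep any "user:password@" prefix as it was.
    userinfo, at, _ = parts.netloc.rpartition("@")
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literals need their brackets restored.
        host = f"[{host}]"
    return urlunsplit(parts._replace(netloc=f"{userinfo}{at}{host}"))


if __name__ == "__main__":
    for candidate in (
        "https://example.com:443/path",
        "http://example.com:80/",
        "http://example.com:8080/",
    ):
        print(candidate, "->", strip_default_port(candidate))
```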

1.2.0

21 Jun 17:14

Added

  • --urlFile browsertrix crawler parameter
  • --depth browsertrix crawler parameter
  • --extraHops parameter
  • --collection browsertrix crawler parameter
  • --allowHashUrls browsertrix crawler parameter
  • --userAgentSuffix browsertrix crawler parameter
  • --behaviors parameter
  • --behaviorTimeout browsertrix crawler parameter
  • --profile browsertrix crawler parameter
  • --sizeLimit browsertrix crawler parameter
  • --timeLimit browsertrix crawler parameter
  • --healthCheckPort parameter
  • --overwrite parameter

Changed

  • Using browsertrix-crawler 0.6.0 and warc2zim 1.4.2
  • Default WARC location after crawl changed
    from collections/capture-*/archive/ to collections/crawl-*/archive/
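
Scripts that pick up WARC files after a crawl may need to follow the location change in 1.2.0. A minimal sketch, assuming the collections/ directory sits under some build directory: it looks in the new layout first and falls back to the old one. `find_warc_dirs` and `build_dir` are hypothetical names used only for this example.

```python
import tempfile
from pathlib import Path


def find_warc_dirs(build_dir):
    """Return archive directories, preferring the post-1.2.0 layout."""
    base = Path(build_dir) / "collections"
    current = sorted(base.glob("crawl-*/archive"))
    if current:
        return current
    # Fall back to the layout used before 1.2.0.
    return sorted(base.glob("capture-*/archive"))


if __name__ == "__main__":
    # Demo against a throwaway directory tree.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "collections" / "crawl-demo" / "archive").mkdir(parents=True)
        for archive_dir in find_warc_dirs(tmp):
            print(archive_dir.relative_to(tmp))
```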

Removed

  • --scroll browsertrix crawler parameter (see --behaviors)
  • --scope browsertrix crawler parameter (see --scopeType, --include and --exclude)

1.1.5

11 Jun 09:47
  • Using crawler 0.3.2 and warc2zim 1.3.6