Commit

Fixed #191: --lang to crawler, --zim-lang to warc2zim
rgaudin committed Aug 2, 2023
1 parent 1f1ccdf commit a965ad6
Showing 2 changed files with 19 additions and 0 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -12,6 +12,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- `--title` to set ZIM title
- `--description` to set ZIM description
- New crawler options: `--maxPageLimit`, `--delay`, `--diskUtilization`
- `--zim-lang` param to set warc2zim's `--lang` (ISO-639-3)

### Changed

@@ -20,6 +21,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Using `main` warc2zim ⚠️ change before releasing!
- Disabled Chrome updates to prevent incidental inclusion of update data in WARC/ZIM (#172)
- `--failOnFailedSeed` used unconditionally
- `--lang` now passed to crawler (ISO-639-1)

### Removed

17 changes: 17 additions & 0 deletions zimit.py
@@ -205,6 +205,18 @@ def zimit(args=None):
        action="store_true",
    )

    parser.add_argument(
        "--lang",
        help="if set, sets the language used by the browser, should be ISO 639 language[-country] code",
    )

    parser.add_argument(
        "--zim-lang",
        help="Language metadata of ZIM "
        "(warc2zim --lang param). ISO-639-3 code. "
        "Retrieved from homepage if found, fallback to `eng`",
    )

    parser.add_argument(
        "--mobileDevice",
        help="Emulate mobile device by name from "
@@ -348,6 +360,10 @@ def zimit(args=None):
        warc2zim_args.append("--description")
        warc2zim_args.append(zimit_args.description)

    if zimit_args.zim_lang:
        warc2zim_args.append("--lang")
        warc2zim_args.append(zimit_args.zim_lang)

    print("----------")
    print("Testing warc2zim args")
    print("Running: warc2zim " + " ".join(warc2zim_args), flush=True)
@@ -482,6 +498,7 @@ def get_node_cmd_line(args):
        "exclude",
        "collection",
        "allowHashUrls",
        "lang",
        "mobileDevice",
        "userAgent",
        "useSitemap",
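
For readers skimming the hunks above, the net effect of the change can be sketched roughly as follows: `--lang` is forwarded to the crawler command line, while `--zim-lang` is renamed to `--lang` on the warc2zim side. This is a simplified illustration and not the actual zimit code; the helper name `build_commands` and the reduced argument set are assumptions made for the example.

```python
import argparse


def build_commands(argv):
    """Hypothetical helper: split zimit-style args into two command lines."""
    parser = argparse.ArgumentParser(prog="zimit-sketch")
    parser.add_argument("--url", required=True)
    # Browser language, forwarded unchanged to the crawler (ISO-639-1).
    parser.add_argument("--lang")
    # ZIM language metadata, passed to warc2zim as --lang (ISO-639-3).
    parser.add_argument("--zim-lang")
    args = parser.parse_args(argv)

    crawler_args = ["crawl", "--url", args.url]
    if args.lang:
        crawler_args += ["--lang", args.lang]

    warc2zim_args = ["warc2zim"]
    if args.zim_lang:
        # argparse exposes --zim-lang as the attribute zim_lang.
        warc2zim_args += ["--lang", args.zim_lang]

    return crawler_args, warc2zim_args


if __name__ == "__main__":
    crawler, warc2zim = build_commands(
        ["--url", "https://example.com", "--lang", "fr", "--zim-lang", "fra"]
    )
    print("Running: " + " ".join(crawler))
    print("Running: " + " ".join(warc2zim))
```

Keeping the two options separate means the browser locale and the ZIM metadata language can differ, and omitting `--zim-lang` leaves warc2zim to apply its own detection and fallback as described in the help text.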