Scripts for parallelized extraction of plain text from WARC archives, aiming at a common and reproducible extraction approach.

hplt-project/warc2text-runner
