HPLT - High Performance Language Technologies
A space that combines petabytes of natural language data with large-scale model training
Repositories
- cc-download Public
- release2_inspection Public
- bitextor-mt-models Public
- OpusCleaner Public
  OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.
- warc2text-runner Public
  Scripts for parallelized extraction of plain text from WARC archives, aiming at a common and reproducible extraction approach.
- monolingual-multilingual-instruction-tuning Public
  Monolingual or Multilingual Instruction Tuning: Which Makes a Better Alpaca