@hplt-project

HPLT - High Performance Language Technologies

A space that combines petabytes of natural language data with large-scale model training

Pinned

  1. OpusCleaner Public

    OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.

    Python 49 13

  2. OpusTrainer Public

    Curriculum training

    Python 16 5

Repositories

Showing 10 of 21 repositories
  • data-analytics-tool Public

    Data Analytics Tool

    JavaScript 10 1 0 0 Updated Dec 22, 2024
  • cc-download Public
    Shell 0 0 0 0 Updated Dec 22, 2024
  • release2_inspection
    Jupyter Notebook 1 6 1 0 Updated Dec 20, 2024
  • bitextor-mt-models
    Shell 1 0 3 0 Updated Dec 19, 2024
  • OpusCleaner Public

    OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.

    Python 49 13 56 (4 issues need help) 1 Updated Dec 17, 2024
  • OpusPocus Public

    Marian machine translation training pipeline for thousands of models

    Python 2 0 19 (3 issues need help) 0 Updated Dec 9, 2024
  • warc2text-runner Public

    Scripts for parallelized extraction of plain text from WARC archives, aiming at a common and reproducible extraction approach.

    HTML 3 0 5 0 Updated Dec 3, 2024
  • bitextor-slurm Public Forked from paracrawl/cirrus-scripts

    Scripts for running bitextor jobs

    Shell 0 1 1 0 Updated Dec 3, 2024
  • monotextor-slurm Public

    Set of scripts to run a monotextor-like pipeline on Slurm HPC clusters

    Rust 2 GPL-3.0 0 0 0 Updated Nov 4, 2024
  • monolingual-multilingual-instruction-tuning Public

    Monolingual or Multilingual Instruction Tuning: Which Makes a Better Alpaca

    Python 9 0 0 0 Updated Nov 2, 2024
