This is a list of some wonderful open-source projects & applications integrated with Hugging Face libraries.
First-party cool stuff made with ❤️ by 🤗 Hugging Face.
- transformers - State-of-the-art natural language processing for Jax, PyTorch and TensorFlow.
- datasets - The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools.
- tokenizers - Fast state-of-the-art tokenizers optimized for research and production.
- knockknock - Get notified when your training ends with only two additional lines of code.
- accelerate - A simple way to train and use PyTorch models with multi-GPU, TPU, and mixed-precision support.
- autonlp - Train state-of-the-art natural language processing models and deploy them in a scalable environment automatically.
- nn_pruning - Prune a model while fine-tuning or training.
- huggingface_hub - Client library to download and publish models and other files on the huggingface.co hub.
- tune - A benchmark for comparing Transformer-based models.
Learn how to use Hugging Face toolkits, step-by-step.
- Official Course (from Hugging Face) - The official course series provided by 🤗 Hugging Face.
- transformers-tutorials (by @nielsrogge) - Tutorials for applying multiple models on real-world datasets.
NLP toolkits built upon Transformers. Swiss Army knives!
- AllenNLP (from AI2) - An open-source NLP research library.
- Graph4NLP - Enabling easy use of Graph Neural Networks for NLP.
- Lightning Transformers - Transformers with PyTorch Lightning interface.
- Adapter Transformers - Extension to the Transformers library, integrating adapters into state-of-the-art language models.
- Obsei - A low-code AI workflow automation tool that performs various NLP tasks in the workflow pipeline.
- Trapper (from OBSS) - State-of-the-art NLP through transformer models in a modular design and consistent APIs.
- Flair - A very simple framework for state-of-the-art NLP.
Converting a sentence to a vector.
- Sentence Transformers (from UKPLab) - Widely used encoders computing dense vector representations for sentences, paragraphs, and images.
- WhiteningBERT (from Microsoft) - An easy unsupervised sentence embedding approach with whitening.
- SimCSE (from Princeton) - State-of-the-art sentence embedding with contrastive learning.
- DensePhrases (from Princeton) - Learning dense representations of phrases at scale.
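The encoders above map text to a fixed-size vector. Many Sentence Transformers models, though not every encoder listed here, follow a common recipe: mean-pool the token embeddings while skipping padding, then compare sentences by cosine similarity. The sketch below uses toy, hand-written token vectors in place of a real encoder's output. `mean_pool` and `cosine_similarity` are illustrative helpers, not any library's API.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(token_embeddings: List[Vector], attention_mask: List[int]) -> Vector:
    """Average the token vectors whose mask entry is 1 (padding tokens are ignored)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, x in enumerate(vec):
                summed[i] += x
            count += 1
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [s / count for s in summed]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Dot product divided by the product of the two vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "token embeddings"; a real encoder would produce these.
# The third token of sent_a is padding (mask 0), so it does not affect the mean.
sent_a = mean_pool([[1.0, 0.0, 1.0], [3.0, 2.0, 1.0], [9.0, 9.0, 9.0]], [1, 1, 0])
sent_b = mean_pool([[2.0, 1.0, 1.0]], [1])
print(sent_a)  # [2.0, 1.0, 1.0]
print(cosine_similarity(sent_a, sent_b))
```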
Highly optimized inference engines implementing Transformers-compatible APIs.
- TurboTransformers (from Tencent) - An inference engine for transformers with a fast C++ API.
- FasterTransformer (from Nvidia) - A script and recipe to run the highly optimized transformer-based encoder and decoder component on NVIDIA GPUs.
- lightseq (from ByteDance) - A high performance inference library for sequence processing and generation implemented in CUDA.
- FastSeq (from Microsoft) - Efficient implementation of popular sequence models (e.g., Bart, ProphetNet) for text generation, summarization, translation tasks etc.
Parallelizing models across multiple GPUs.
- Parallelformers (from TUNiB) - A library for model parallel deployment.
- OSLO (from TUNiB) - A library that supports various features to help you train large-scale models.
- Deepspeed (from Microsoft) - DeepSpeed ZeRO scales training to models of any size with little to no change to the model. Integrated with HF Trainer.
- fairscale (from Facebook) - Also implements the ZeRO protocol. Integrated with HF Trainer.
- ColossalAI (from Hpcaitech) - A Unified Deep Learning System for Large-Scale Parallel Training (1D, 2D, 2.5D, 3D and sequence parallelism, and ZeRO protocol).
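Deepspeed, fairscale and ColossalAI all mention the ZeRO protocol. ZeRO saves memory by partitioning model states across data-parallel GPUs instead of replicating them on each one. The back-of-the-envelope sketch below assumes mixed-precision Adam with 16 bytes of model state per parameter: 2 for fp16 weights, 2 for fp16 gradients, and 12 for the fp32 weight copy, momentum and variance. It ignores activations and buffers. `zero_bytes_per_gpu` is a hypothetical helper, not part of any of these libraries.

```python
def zero_bytes_per_gpu(num_params: float, num_gpus: int, stage: int, k: int = 12) -> float:
    """Approximate per-GPU bytes of model state under mixed-precision Adam.

    Assumes 2 bytes of fp16 weights + 2 bytes of fp16 gradients + k bytes of
    optimizer state per parameter (k = 12: fp32 weight copy, momentum, variance).
    Activations, temporary buffers and fragmentation are ignored.
    """
    params = 2 * num_params
    grads = 2 * num_params
    optim = k * num_params
    if stage == 0:  # plain data parallelism: every GPU holds a full replica
        return params + grads + optim
    if stage == 1:  # optimizer states partitioned across GPUs
        return params + grads + optim / num_gpus
    if stage == 2:  # optimizer states and gradients partitioned
        return params + (grads + optim) / num_gpus
    if stage == 3:  # optimizer states, gradients and weights partitioned
        return (params + grads + optim) / num_gpus
    raise ValueError("stage must be 0, 1, 2 or 3")


# Example: a 7.5B-parameter model trained data-parallel on 64 GPUs.
for stage in range(4):
    gb = zero_bytes_per_gpu(7.5e9, 64, stage) / 1e9
    print(f"stage {stage}: {gb:.1f} GB per GPU")
```

With these assumptions the loop prints about 120.0, 31.4, 16.6 and 1.9 GB for stages 0 to 3. Real usage will be higher once activations and working buffers are included.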
Compressing or accelerating models for improved inference speed.
- torchdistill - PyTorch-based modular, configuration-driven framework for knowledge distillation.
- TextBrewer (from HFL) - State-of-the-art distillation methods to compress language models.
- BERT-of-Theseus (from Microsoft) - Compressing BERT by progressively replacing the components of the original BERT.
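torchdistill and TextBrewer both build on knowledge distillation, where a small student model is trained to match a larger teacher. The minimal sketch below shows the classic soft-target loss. It takes the KL divergence between temperature-softened teacher and student distributions, scales it by T^2, and blends it with ordinary cross-entropy on the hard label. This is the generic formulation, not the API of either library, and the `alpha` and `temperature` defaults are illustrative assumptions.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() numerically stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0)
    return kl * temperature ** 2


def total_loss(student_logits: List[float], teacher_logits: List[float], label: int,
               alpha: float = 0.5, temperature: float = 2.0) -> float:
    """Blend the soft (teacher) loss with cross-entropy on the hard label."""
    hard = -math.log(softmax(student_logits)[label])
    soft = distillation_loss(student_logits, teacher_logits, temperature)
    return alpha * soft + (1 - alpha) * hard


print(total_loss([1.0, 0.5, -1.0], [2.0, 0.3, -1.5], label=0))
```

Scaling the soft term by T^2 keeps its gradient magnitude comparable to the hard-label term as the temperature changes.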
Conducting adversarial attack to test model robustness.
- TextAttack (from UVa) - A Python framework for adversarial attacks, data augmentation, and model training in NLP.
- TextFlint (from Fudan) - A unified multilingual robustness evaluation toolkit for NLP.
- OpenAttack (from THU) - An open-source textual adversarial attack toolkit.
Transfer the style of text! Now you know why it's called a transformer!
- Styleformer - A neural language style transfer framework to transfer text smoothly between styles.
- ConSERT - A contrastive framework for self-supervised sentence representation transfer.
Analyzing the sentiment and emotions of human beings.
- conv-emotion - Implementation of different architectures for emotion recognition in conversations.
You made a typo! Let me correct it.
- Gramformer - A framework for detecting, highlighting and correcting grammatical errors in natural language text.
Translating between different languages.
- dl-translate - A deep learning-based translation library based on HF Transformers.
- EasyNMT (from UKPLab) - Easy-to-use, state-of-the-art translation library and Docker images based on HF Transformers.
Learning knowledge, mining entities, connecting the world.
- PURE (from Princeton) - Entity and relation extraction from text.
Speech processing powered by HF libraries. Need for speech!
- s3prl - A self-supervised speech pre-training and representation learning toolkit.
- speechbrain - A PyTorch-based speech toolkit.
Understanding the world from different modalities.
- ViLT (from Kakao) - A vision-and-language transformer without convolution or region supervision.
Combining RL magic with NLP!
- trl - Fine-tune transformers using Proximal Policy Optimization (PPO) to align with human preferences.
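trl fine-tunes a language model with PPO. The core of PPO is the clipped surrogate objective, which limits how far the probability ratio between the new and old policy can push a single update. The sketch below computes only that policy term from per-token log-probabilities and advantages. Libraries such as trl combine it with further terms, for example a value-function loss and a KL penalty toward a reference model. `clip_eps = 0.2` is an assumed default, and `ppo_clip_loss` is a hypothetical helper.

```python
import math
from typing import Sequence


def ppo_clip_loss(logp_new: Sequence[float], logp_old: Sequence[float],
                  advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Negative PPO clipped surrogate objective, averaged over samples (to be minimized)."""
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
        # Taking the minimum makes the objective a pessimistic (lower) bound.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


batch_loss = ppo_clip_loss(
    logp_new=[-0.9, -1.2, -0.3],
    logp_old=[-1.0, -1.0, -0.5],
    advantages=[0.5, -0.2, 1.0],
)
print(batch_loss)
```

The clipping means that once the ratio leaves `[1 - clip_eps, 1 + clip_eps]` in the direction the advantage favors, the objective stops rewarding further movement.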
Searching for answers? Transformers to the rescue!
- Haystack (from deepset) - End-to-end framework for developing and deploying question-answering systems in the wild.
I think this is just right for you!
- Transformers4Rec (from Nvidia) - A flexible and efficient library powered by Transformers for sequential and session-based recommendations.
Evaluating model outputs and data quality powered by HF datasets!
- Jury (from OBSS) - Easy-to-use tool for evaluating NLP model outputs, specifically for NLG (Natural Language Generation), offering various automated text-to-text metrics.
- Spotlight - Interactively explore your HF dataset with one line of code. Use model results (e.g. embeddings, predictions) to understand critical data segments and model failure modes.
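Jury reports automated text-to-text metrics for generated outputs. To illustrate what such a metric computes, here is a simplified unigram F1 between a prediction and a reference. It follows the spirit of the SQuAD token-F1 score but omits its punctuation and article normalization. This is a sketch, not Jury's implementation or API; `token_f1` and `corpus_score` are hypothetical helpers.

```python
from collections import Counter
from typing import List


def token_f1(prediction: str, reference: str) -> float:
    """Unigram F1 between lower-cased, whitespace-tokenized strings."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    common = Counter(pred) & Counter(ref)  # multiset intersection of tokens
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def corpus_score(predictions: List[str], references: List[str]) -> float:
    """Macro-average of per-pair token F1 over the corpus."""
    scores = [token_f1(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores)


print(token_f1("the cat sat", "the cat sat down"))
```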
Search, but with the power of neural networks!
- Jina Integration - Jina integration of Hugging Face Accelerated API.
- Weaviate Integration (text2vec) (QA) - Weaviate integration of Hugging Face Transformers.
- ColBERT (from Stanford) - A fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.
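ColBERT's speed comes from late interaction. The query and the document are encoded independently into one vector per token. Relevance is then the sum, over query tokens, of each token's maximum similarity with any document token (MaxSim). The sketch below scores and ranks documents this way, using dot products over toy, hand-written token vectors. In ColBERT itself the vectors come from a BERT encoder and document vectors are precomputed and indexed, which this sketch omits. `maxsim_score` and `rank` are hypothetical helpers.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late interaction: each query token contributes its best-matching document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def rank(query_tokens: List[Vector], docs: Dict[str, List[Vector]]) -> List[str]:
    """Return document ids sorted by descending MaxSim score."""
    return sorted(docs, key=lambda doc_id: maxsim_score(query_tokens, docs[doc_id]),
                  reverse=True)


# Toy 2-dimensional token vectors standing in for encoder output.
q = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "a": [[1.0, 0.0], [0.5, 0.5]],
    "b": [[0.0, 1.0], [0.9, 0.1]],
}
print(rank(q, docs))
```

Because document token vectors do not depend on the query, they can be computed once offline. At query time only the query needs to be encoded.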
Cloud makes your life easy!
- Amazon SageMaker - Making it easier than ever to train Hugging Face Transformer models in Amazon SageMaker.
The infrastructure enabling the magic to happen.