From 888af1959f66f85c00a4b9dd7f0d672f4e452cf7 Mon Sep 17 00:00:00 2001
From: Cemberk
Date: Thu, 14 Nov 2024 00:32:26 -0600
Subject: [PATCH] Automated PR: Downstream develop rebase new changes (#69)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Added mamba.py backend (#30139) * Update README.md * tests: forward ok * backward test done * done testing * removed check. scripts * Update README.md * added use_mambapy arg * fixed typo in warning * protected imports w/ mambapy package * delete pscan.py + raise rather than assert * Update import_utils.py * fix whitespaces and unused import * trailing whitespace + import block unformatted * Update modeling_mamba.py * transpose before pscan * shape comment * ran make style * use_mambapy=False by default Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * ran make fix-copies --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Rename Phi-3 rope scaling type (#31436) * renamed phi3 rope_scaling type * fixed trailing whitespaces * fixed test * added warning * fixed format * Revert "Incorrect Whisper long-form decoding timestamps " (#32148) Revert "Incorrect Whisper long-form decoding timestamps (#32003)" This reverts commit cd48553fc8375e1a28d4d82cfe231dedf6a23af8. * Fix typing to be compatible with later py versions (#32155) * feat(cache): StaticCache uses index_copy_ to avoid useless copy (#31857) * feat(cache): StaticCache uses index_copy_ to avoid useless copy Using index_copy_ allows for explicit in-place change of the tensor. Some backends (XLA) will otherwise copy the tensor, making the code slower and using more memory. Proposed implementation will end up using less memory and on XLA will result in less compilation, but the change is also quite generic, making no change whatsoever on CUDA or CPU backend. * feat(cache): SlidingWindowCache uses index_copy_ to avoid useless copy Applying the same change done in StaticCache. * fix(cache): fallback of index_copy_ when not implemented * fix(cache): in index_copy_ ensure tensors are on same device * [run slow] llama * fix(cache): add move of cache_position to same device in SlidingWindowCache * Revert "[run slow] llama" This reverts commit 02608dd14253ccd464e31c108e0cd94364f0e8b9. * Added additional kwarg for successful running of optuna hyperparameter search (#31924) Update integration_utils.py Added additional kwarg * Enhancing SFT Training Efficiency Using Packing and FlashAttention2 with Position IDs (#31629) * add DataCollatorBatchFlattening * Update data_collator.py * change name * new FA2 flow if position_ids is provided * add comments * minor fix * minor fix data collator * add test cases for models * add test case for data collator * remove extra code * formatting for ruff check and check_repo.py * ruff format ruff format tests src utils * custom_init_isort.py * Updated `ruff` to the latest version (#31926) * Updated ruff version and fixed the required code according to the latest version. * Updated ruff version and fixed the required code according to the latest version. * Added noqa directive to ignore 1 error shown by ruff * Dev version: v4.44.0.dev0 * Llama 3.1 conversion Co-authored-by: Arthur Zucker * fix (#32162) * fix: Fixed an if condition that is always evaluating to true (#32160) Fixed an if condition always evaluating to true.
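For context on the `index_copy_` cache change described in the StaticCache/SlidingWindowCache entries above (#31857), a minimal sketch of the idea; the function and tensor names here are illustrative, not the exact code from the PR:

```python
import torch

def update_static_cache(key_cache: torch.Tensor, key_states: torch.Tensor,
                        cache_position: torch.Tensor) -> torch.Tensor:
    """Write new key states into a pre-allocated static cache, in place.

    key_cache:      (batch, num_heads, max_seq_len, head_dim), pre-allocated buffer
    key_states:     (batch, num_heads, num_new_tokens, head_dim)
    cache_position: (num_new_tokens,) positions along the sequence axis
    """
    try:
        # index_copy_ is an explicit in-place write; on XLA this avoids
        # materialising a full copy of the cache tensor on every step.
        key_cache.index_copy_(2, cache_position.to(key_cache.device), key_states)
    except NotImplementedError:
        # Fallback for backends that do not implement index_copy_ (e.g. MPS).
        key_cache[:, :, cache_position] = key_states
    return key_cache

# Toy usage: batch=1, 4 heads, cache capacity 16, head_dim 8, writing 2 new tokens.
cache = torch.zeros(1, 4, 16, 8)
new_keys = torch.randn(1, 4, 2, 8)
update_static_cache(cache, new_keys, torch.tensor([0, 1]))
```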
* [docs] change temperature to a positive value (#32077) fix * adds: extra_repr() to MambaRMSNorm to include hidden size / size of weights in the layer (#32171) * adds: extra_repr() to MambaRMSNorm to include the hidden size of the layer * style fix with ruff: * fix: default value reflects the runtime environment variables rather than the ones present at import time. (#32153) * fix: default value reflects the runtime environment variables rather than the ones present at import time. * Fix: Change `deterministic` to None by default; use env var if None * Update qwen2.md (#32108) * Update qwen2.md outdated description * Update qwen2.md amended * Update qwen2.md Update * Update qwen2.md fix wrong version code, now good to go * Remove conversational pipeline tests (#32099) Remove conversation pipeline tests * RoPE: relaxed rope validation (#32182) * relaxed rope check * lets also accept rope_type=None, defaulting to the original implementation * type and rope_type can coexist * let's not warn when someone is running a forward (#32176) * let's not warn when someone is running a foward without cache + self.training * more models * fixup * Fix resize embedding with Deepspeed (#32192) fix resize when deepspeed * Fix float8_e4m3fn in modeling_utils (#32193) * Fix float8_e4m3fn in modeling_utils * style * fix * comment * Support dequantizing GGUF FP16 format (#31783) * support gguf fp16 * support gguf bf16 with pytorch * add gguf f16 test * remove bf16 * :rotating_light: No more default chat templates (#31733) * No more default chat templates * Add the template to the GPT-SW3 tests since it's not available by default now * Fix GPT2 test * Fix Bloom test * Fix Bloom test * Remove default templates again * fix: Replaced deprecated `unittest method` with the correct one (#32198) Replaced deprecated unittest method with the correct one. * [whisper] fix short-form output type (#32178) * [whisper] fix short-form output type * add test * make style * update long-form tests * fixes * last fix * finalise test * remove unnecessary guard code related with pytorch versions 1.4.2 ~ 1.7.0 (#32210) remove unnecessary guard code related with pytorch versions 1.4.2 ~ 1.7.0 * Update question_answering.py (#32208) * [BigBird Pegasus] set _supports_param_buffer_assignment to False (#32222) set _supports_param_buffer_assignment to False * [warnings] fix E721 warnings (#32223) fix E721 warnings * Follow up for #31973 (#32025) * fix * [test_all] trigger full CI --------- Co-authored-by: ydshieh * translate philosophy.md to chinese (#32177) * translate philosophy.md to chinese * add the missing link * Allow a specific microphone to be used by the ffmpeg audio pipeline utility functions. 
Default to using the currently active microphone on Mac (#31846) * use currently active microphone on mac for ffmpeg_microphone * Allow ffmpeg_microphone device to be specified Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Fix code snippet for Grounding DINO (#32229) Fix code snippet for grounding-dino * Generation: stop at `eos` for assisted decoding (#31301) * fix * move changes to prompt lookup * add test * set eos in assistant model * style * fix flakiness * changes for new `main` * Update tests/generation/test_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/generation/test_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add comment to explain --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Llava: generate without images (#32183) * llava w/o images * tests * Resize embeds with DeepSpeed (#32214) * fix resize when deepspeed * deepsped uses new embeds * we needed this * don't log base model architecture in wandb if log model is false (#32143) * don't log base model architecture in wandb is log model is false * Update src/transformers/integrations/integration_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * convert log model setting into an enum * fix formatting --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Refactor: Removed un-necessary `object` base class (#32230) * Refactored to remove un-necessary object base class. * small fix. * Adds: extra_repr for RMSNorm layers in most models (#32204) * adds: extra_repr() to RMSNorm layers in multiple models * adds: extra_repr for deprecated models as well * formatting as per style guide * Add check for `target_sizes is None` in `post_process_image_guided_detection` for owlv2 (#31934) * Add check for target_sizes is None in post_process_image_guided_detection * Make sure Owlvit and Owlv2 in sync * Fix incorrect indentation; add check for correct size of target_sizes * [tests] fix `static` cache implementation is not compatible with `attn_implementation==flash_attention_2` (#32039) * add flash attention check * fix * fix * Flash-Attn: fix generation when no attention mask or no pading (#32241) * fix * fix prev test (half of failures) * [run-slow] llama, gemma2 * [run-slow] llama, gemma2 * More flexible trigger condition (#32251) update Co-authored-by: ydshieh * Llama 3.1: replace for loop by tensor ops at inv_freq initialization (#32244) * replace for loop by tensor ops * rm assert; readability * ๐Ÿšจ Bloom support for cache class (#31445) * bloom dynamic cache * bloom follows standard cache format * no skips for bloom anymore * use cache position when possible * clean up * codestyle * Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * pr comments * isinstance fix * address comments * make musicgen test happy * [run-slow] bloom --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Upload new model failure report to Hub (#32264) upload Co-authored-by: 
ydshieh * Optimize t5 tokenize logic to avoid redundant calls (#32270) * Optimize t5 tokenize logic to avoid redundant calls * fix and overwrite copies * fix: Fixed wrong argument passed to `convert_blip_checkpoint` function call (#32262) Removed one wrong argument passed to convert_blip_checkpoint function call. * Repo: remove exceptions in `check_docstrings` (#32259) remove exceptions * make `p_mask` a numpy array before passing to `select_starts_ends` (#32076) * fix * bug fix * refine * fix * fix(docs): Fixed a link in docs (#32274) Fixed a link in docs. * Generate: end-to-end compilation (#30788) * mvp * added test (a few models need fixes) * fix a few test cases * test nits * harder test 😈 * revert changes in stablelm * test with improved condition * add todo * tmp commit * merged with main * nits * add todo * final corrections * add docs for generation compilation * docs nits * add tip * PR suggestions * add more details to the compilation docs * fix cache positions * cache is now init in generate; update docs * tag test as flaky * docs * post rebase make fixup and other nits * remove unintended changes * whisper (encoder-decoder) not supported * move token default updates to ; add tests for token defaults * push changes * manual rebase * chameleon doesn't support this * fix test_static_cache_mha_mqa_gqa (broken in another PR) * docs: dynamic is better with end-to-end compilation * Whisper tokenizer word level timestamps (#32197) * fix _fix_key in PreTrainedModel * fix _find_longest_common_sequence * add test * remove result.json * nit * update test * [pipeline] fix padding for 1-d tensors (#31776) * [pipeline] fix padding for 1-d tensors * add test * make style * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py Co-authored-by: Kamil Akesbi <45195979+kamilakesbi@users.noreply.github.com> * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py --------- Co-authored-by: Kamil Akesbi <45195979+kamilakesbi@users.noreply.github.com> * Make static cache compatible with torch.export (#32168) * Add stream messages from agent run for gradio chatbot (#32142) * Add stream_to_gradio method for running agent in gradio demo * use torch 2.4 in 2 CI jobs (#32302) Co-authored-by: ydshieh * Docs: fix GaLore optimizer code example (#32249) Docs: fix GaLore optimizer example Fix incorrect usage of GaLore optimizer in Transformers trainer code example. The GaLore optimizer uses low-rank gradient updates to reduce memory usage. GaLore is quite popular and is implemented by the authors in [https://github.com/jiaweizzhao/GaLore](https://github.com/jiaweizzhao/GaLore). A few months ago GaLore was added to the HuggingFace Transformers library in https://github.com/huggingface/transformers/pull/29588. Documentation of the Trainer module includes a few code examples of how to use GaLore. However, the `optim_target_modules` argument to the `TrainingArguments` function is incorrect, as discussed in https://github.com/huggingface/transformers/pull/29588#issuecomment-2006289512. This pull request fixes this issue.
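As a companion to the GaLore docs fix above (#32249), a minimal, hedged sketch of selecting GaLore through `TrainingArguments`; the target-module patterns and hyperparameters are illustrative rather than the exact values from the corrected example, and the `galore-torch` package is assumed to be installed before `Trainer` builds the optimizer:

```python
from transformers import TrainingArguments

# Illustrative GaLore setup: "galore_adamw" is one of transformers' built-in optimizer
# names, while optim_target_modules restricts the low-rank projection to modules whose
# names match these (assumed) patterns.
args = TrainingArguments(
    output_dir="./galore-test",
    max_steps=100,
    per_device_train_batch_size=2,
    optim="galore_adamw",
    optim_target_modules=["attn", "mlp"],
)
print(args.optim, args.optim_target_modules)
```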
* Fix GGUF dequantize for `gguf==0.9.1` (#32298) * fix gguf dequantize for gguf==0.9.1 * fix old version * make style * Cast epochs_trained to int when resuming training (#32286) * fix epochs_trained as int when resuming training * refactor --------- Co-authored-by: teddyferdinan * feat(ci): set `fetch-depth: 0` in trufflehog checkout step (#31663) * Fix M4T for ASR pipeline (#32296) * tentative fix * do the same for M4T * Docs: formatting nits (#32247) * doc formatting nits * ignore non-autodocs * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/esm/modeling_esm.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/esm/modeling_esm.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * make fixup --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Alternative agent plan (#32295) * new agent plan * plan type assertion * style corrections * better prompt naming * make fixup * fix: Added missing raise keyword for few exceptions (#32333) Fixed raising of few exceptions. * fixes to properly shard FSDP across cpu and meta for cpu_efficient_loading for prequantized 4bit (#32276) * fixes #32329 : The Torch code is correct - to get an average of 10% oโ€ฆ (#32335) fixes #32329 : The Torch code is correct - to get an average of 10% of the total, we want to take 50% of the remainder after we've already masked 80% with [MASK] in the previous step. * Repo checks: skip docstring checks if not in the diff (#32328) * tmp * skip files not in the diff * use git.Repo instead of an external subprocess * add tiny change to confirm that the diff is working on pushed changes * add make quality task * more profesh main commit reference * Fix slow GemmaTokenizer and improve SPM slow -> fast conversion process (#32191) * Remove user-defined tokens which can be obtained through merges * Remove debug line * formatting * Refactor spm slow -> fast converter * revert unnecessary refactor * set comprehension * remove test files * Use `vocab_scores` * Always replace spiece underline with space in decode * we no longer need token filtering * Add save fast load slow unit test * Remove tokenizers version check * Remove duplicate code * Make `` and `` special tokens * Bias merge priority with length if score is the same * Add unit test for merge priority * CI * LLaVA-NeXT: fix anyres shapes (#32314) fix * Gemma2 and flash-attention (#32188) * enable flash-attn & static cache * this works, not the prev * fix for sliding window layers * not needed anymore * Llama 3.1: Fix incorrect `inv_freq` assignment (#32330) fix ๐Ÿ’ฉ * [Idefics2] - Fix FA2 call for Perceiver layer (#32275) * Fix FA2 call for Perciever layer * [run_slow] idefics2 * [run_slow] idefics2 * [run_slow] idefics2 * Fix up * [run_slow] idefics2 * [run_slow] idefics2 * [run_slow] idefics2 * Gemma 2: support assisted generation (#32357) * Fix error when streaming to gradio with non-string tool arguments (#32360) Fix error when streaming agent run to gradio with non-string tool arguments * >3-5x faster torch.compile forward compilation for autoregressive decoder models (#32227) * draft * apply changes to all relevant archs * rerun ci - check_docstrings.py failing? 
* fix docstring * move 2D->4D mask creation to modeling file * repo consistency * fix the batch size = 1 case - calling contiguous is not enough * nit * style * propagate to gemma/gemma-2 * prepare inputs for gemma generation * implement test and tiny fix in gemma2 * Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix copies * ci pass * fix gemma's test_compile_static_cache tests * flacky * retrigger ci --------- Co-authored-by: sanchit-gandhi Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix: Removed unnecessary `@staticmethod` decorator (#32361) * Fixed staticmethods with self as first argument. * Fixed staticmethods with self as first argument. * Fixed staticmethods with self as first argument. * Fixed staticmethods with self as first argument. * fix: warmup_steps check for training_args (#32236) * LLaVa: add cache class attribute (#32278) cache class flag * [enc-dec cache] fix bug in indexing (#32370) * [whisper] compile compatibility with long-form decoding (#31772) * [whisper] compile compatibility with long-form decoding * clarify comment * fix after rebase * finalise * fix bsz * fix cache split * remove contiguous * style * finish * update doc * prevent cuda graph trace * Remove size check between attn_weights and kv_seq_len for phi3 (#32339) * Remove size check between attn_weights and kv_seq_len * add unit tests * add missing attribute _supports_param_buffer_assignment for gpt-j. (#32359) Co-authored-by: Guoming Zhang <37257613+nv-guomingz@users.noreply.github.com> * Check device map for saving tokenizer config on TPU (fix for issue #31971) (#32043) * Remove TPU device map for saving tokenizer config * Update tokenization_utils_base.py * Fix error msg when passing non-string device into tokenizer * Fix error message for non-string tokenizer device * Print out tokenizer device type in error msg * Update tokenization_utils_base.py * update clean_up_tokenization_spaces warning (#32371) * Empty list in defaults for LLaMA special tokens during weights conversion (#32342) empty list in defaults * Fix conflicting key in init kwargs in PreTrainedTokenizerBase (#31233) * Fix conflicting key in init kwargs in PreTrainedTokenizerBase * Update code to check for callable key in save_pretrained * Apply PR suggestions * Invoke CI * Updates based on PR suggestion * Offloaded KV Cache (#31325) * Initial implementation of OffloadedCache * enable usage via cache_implementation * Address feedback, add tests, remove legacy methods. * Remove flash-attn, discover synchronization bugs, fix bugs * Prevent usage in CPU only mode * Add a section about offloaded KV cache to the docs * Fix typos in docs * Clarifications and better explanation of streams * Docker: add `speech` dep to the consistency docker image (#32374) * Fixed Hybrid Cache Shape Initialization. (#32163) * fixed hybrid cache init, added test * Fix Test Typo --------- Co-authored-by: Aaron Haag * Yell at the user if zero-3 init wasn't performed, but expected to have been done (#32299) * Test this zach * Test for improper init w/o zero3 * Move back * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Get rid of stars in warning * Make private * Make clear --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs (#32368) nits * RoPE: Add numerical tests โœจ (#32380) tests! 
:D * [generate] only require an attention mask for mps with torch<2.4 (#32367) * up * style * stopping * fix: (issue #32124) Exception raised when running `transformers/examples/flax/language-modeling/t5_tokenizer_model.py`. (#32157) fix: Exception raised when running . * MixtralFlashAttention2: put "plus 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input. (#31500) * Mixtral: remove unnecessary plus 1 when calculating rotary_seq_len, allowing position_ids=None (no auto position_ids generation could be unsafe) * fix typo [:-1] to [:, -1] * to meet formatting requirement * to meet formatting requirement * remove white space * MixtralFlashAttention2: put "+ 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input. Fix format/style issue. * propagate to startcoder2, phi3, mixtral and qwen2 * update qwen2_moe * Bump keras from 2.8.0 to 2.13.1 in /examples/research_projects/decision_transformer (#32393) Bump keras in /examples/research_projects/decision_transformer Bumps [keras](https://github.com/keras-team/keras) from 2.8.0 to 2.13.1. - [Release notes](https://github.com/keras-team/keras/releases) - [Commits](https://github.com/keras-team/keras/compare/v2.8.0...v2.13.1) --- updated-dependencies: - dependency-name: keras dependency-type: direct:production ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * fix: SeamlessM4TFeatureExtractor stride remainder (#32088) * fix: SeamlessM4TFeatureExtractor stride remainder * Added attention mask size test * Reran ruff for style correction * Phi3 tests: fix typing for Python 3.8 (#32388) fix phi * #32184 save total_vocab_size (#32240) * save total_vocab_size = vocab_size + user added tokens to speed up operation * updating length when added_tokens_decoder is set * add test len(tokenizer) * add values for neftune (#32399) I always forget what typical values are, and I have to look at the paper everytime. This will be a helpful reminder. * Fix documentation references to google/bit-50 model (#32407) * Persist embedding type of BART and mBART models after resize (#32242) * fix: persist embedding type of MBartConditonalGeneration after resize * fix: persist embedding type of BartConditonalGeneration after resize * fix: Updated `test_embeded_special_tokens` for luke and mluke models (#32413) Fixed tokenizertests for luke, mluke models. * Respect the config's attn_implementation if set (#32383) * Respect the config's attn if set * Update test - can override in from_config * Fix * Fix documentation links and code reference to model llava-next (#32434) * Cache: create docs (#32150) * draft * updates * works? * try adding python example in hidden section * another try * hwo do i render python * format as html code? * Update docs/source/en/kv_cache.md Co-authored-by: Joao Gante * Update docs/source/en/kv_cache.md Co-authored-by: Joao Gante * Update docs/source/en/kv_cache.md Co-authored-by: Joao Gante * Update docs/source/en/kv_cache.md Co-authored-by: Joao Gante * Update docs/source/en/kv_cache.md Co-authored-by: Joao Gante * one more small update * should render hidden secrtion now * add outputs * fix links * check links * update all links * update with offloaded cache * all cache is importable, so they appear in docs * fix copies * docstring... 
--------- Co-authored-by: Joao Gante * Llava: fix checkpoint_doc (#32458) fix: add new llava like model bug * add the missing flash attention test marker (#32419) * add flash attention check * fix * fix * add the missing marker * bug fix * add one more * remove order * add one more * Update kwargs validation for `preprocess` with decorator (#32024) * BLIP preprocess * BIT preprocess * BRIDGETOWER preprocess * CHAMELEON preprocess * CHINESE_CLIP preprocess * CONVNEXT preprocess * DEIT preprocess * DONUT preprocess * DPT preprocess * FLAVA preprocess * EFFICIENTNET preprocess * FUYU preprocess * GLPN preprocess * IMAGEGPT preprocess * INTRUCTBLIPVIDEO preprocess * VIVIT preprocess * ZOEDEPTH preprocess * VITMATTE preprocess * VIT preprocess * VILT preprocess * VIDEOMAE preprocess * VIDEOLLAVA * TVP processing * TVP fixup * SWIN2SR preprocess * SIGLIP preprocess * SAM preprocess * RT-DETR preprocess * PVT preprocess * POOLFORMER preprocess * PERCEIVER preprocess * OWLVIT preprocess * OWLV2 preprocess * NOUGAT preprocess * MOBILEVIT preprocess * MOBILENETV2 preprocess * MOBILENETV1 preprocess * LEVIT preprocess * LAYOUTLMV2 preprocess * LAYOUTLMV3 preprocess * Add test * Update tests * Fix get large model config for Switch Transformer encoder only tester (#32438) * Dependencies: fix typo (#32389) deps_2 * Add Nemotron HF Support (#31699) * Add nemotron support * fix inference * add unit test * add layernorm1p as a class to avoid meta device mismatch * test fixed * Add copied_from statements * remove pretraining_tp args * remove nemotronlayernorm * force LN computation done in FP32 * remove nemotrontokenizer and use llamatokenizer * license update * add option for kv_channels for minitron8b * remove assert * o_proj fixed * o_proj reshape * add gated_proj option * typo * remove todos * fix broken test after merging latest main * remove nezha/nat after meging main * chnage default config to 15b model * add nemo conversion script * rename conversion script * remove gate_proj option * pr comment resolved * fix unit test * rename kv_channels to head_dim * resolve PR issue * add nemotron md * fix broken tests * refactor rope for nemotron * test fix * remove linearscaling * whitespace and import * fix some copied-from * code style fix * reformatted * add position_embedding to nemotronattention * rope refactor to only use config, copied-from fix * format * Run make fix-copies * nemotron md with autodoc * doc fix * fix order * pass check_config_docstrings.py * fix config_attributes * remove all llama BC related code * Use PreTrainedTokenizerFast * ruff check examples * conversion script update * add nemotron to toctree * Generate: fix end to end compilation (#32465) * Add codestral mamba2 (#32080) * add new model like * draft cuda forward - mismatched keys (sharding on conv1) * match keys successfully * fix split * get generation/forward running (wrong gens, norm?) * :update * some refactoring * fixes * works up until copy to cache * fix * update * NON WORKING VERSION * version that work? 
* nit * fix config * fix conversion script * working cuda forward * nit * update * simplifcation * make mamba slow simple work * no einops * todo * fix style * no einops * update fix no einsum * nit * remove einops * bug: scan_output differs strongly * add rms norm option * fix fast + slow generation with and w/o cache :heavy_check_mark: * draft integration tests * remove a big chunk of the einsum * fix slow, fast generations, without any einsum * fix copies * fix structure * fix up modeling and tests * fix tests * clamping is indeed worse * recover mamba2 cache test * fix copies * no cache position (yet) * fix tf tests * fix matmul for generate * fixup * skip cache tests for now * [run-slow]mamba2 * tune out hidden states for padding * test batched generation * propagate attention mask changes * fix past length * fix integration test * style * address comments * update readme * add mamba2 version check * fix tests * [run-slow]mamba2 * skip edge tests * [run-slow]mamba2 * last fixup * [run-slow]mamba2 * update README --------- Co-authored-by: Arthur Zucker * Migrate import checks not need accelerate, and be more clear on min versions (#32292) * Migrate import checks to secondary accelerate calls * better errs too * Revert, just keep the import checks + remove accelerate-specific things * Rm extra' * Empty commit for ci * Small nits * Final * Documentation: BOS token_id deprecation change for NLLB (#32443) Update nllb.md * dev version 4.45.0 * `is_torchdynamo_compiling` -- cast a wide exception net (#32476) * cast a wide net * make fix-copies with a few manual changes * add copied from * Revert "fixes to properly shard FSDP across cpu and meta for cpu_effcient_loading for prequantized 4bit (#32276)" (#32477) * Revert "fixes to properly shard FSDP across cpu and meta for cpu_efficient_loading for prequantized 4bit (#32276)" This reverts commit 62c60a30181a65e1a3a7f19c3055a240a6a21335. We uncovered an issue with this change that caused our training runs to hang. 
* `is_torchdynamo_compiling` -- cast a wide exception net (#32476) * cast a wide net * make fix-copies with a few manual changes * add copied from --------- Co-authored-by: Joao Gante * ๐ŸŒ [i18n-KO] Translated `mask_generation.md` to Korean (#32257) * docs: ko: tasks/mask_generation.md * feat: nmt draft * fix : toc local * fix : manual edits * fix : ko-toctree * fix: resolve suggestions Co-authored-by: boyunJang Co-authored-by: Chaewon Song * fix: resolve suggestions Co-authored-by: boyunJang Co-authored-by: Chaewon Song * fix: resolve suggestions * fix: resolve suggestions * fix: resolve suggestions --------- Co-authored-by: boyunJang Co-authored-by: Chaewon Song * ๐ŸŒ [i18n-KO] Translated `idefics.md` to Korean (#32258) * docs: ko: tasks/idefics.md * feat: nmt draft * fix: manual edits * fix: resolve suggestions Co-authored-by: Chaewon Song Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com> Co-authored-by: timdalxx <48753785+jeongiin@users.noreply.github.com> --------- Co-authored-by: Chaewon Song Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com> Co-authored-by: timdalxx <48753785+jeongiin@users.noreply.github.com> * ๐ŸŒ [i18n-KO] Translated `image_to_image.md` to Korean (#32327) * docs: ko: tasks/image_to_image.md * feat: nmt draft * fix: manual edits * fix: resolve suggestions Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com> Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com> * fix: handle remaining suggestions Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com> --------- Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com> Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com> * Cache: new Cache format in decoder-only models (#31421) * draft bart with new cache * add cache for decoder-only models * revert utils * modify docstring * revert bart * minor fixes * fix copies (not related) * revert tests * remove enc-dec related code * remove bloom * remove opt (enc-dec) * update docstring * git, codegen, gpt_neo, gpt_neox, gpj * clean up * copied from statements * revert * tmp * update warning msg * forgot git * add more flags * run-slow git,codegen,gpt_neo,gpt_neox,gpj * add cache flag to VLMs * remove files * style * video LLMs also need a flag * style * llava will go in another PR * style * [run-slow] codegen, falcon, git, gpt_neo, gpt_neox, gptj, idefics * Update src/transformers/models/gpt_neo/modeling_gpt_neo.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * copy from * deprecate until v4.45 and warn if not training * nit * fix test * test static cache * add more tests and fix models * fix copies * return sliding window mask * run slow tests & fix + codestyle * one more falcon fix for alibi --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Gemma2: add cache warning (#32279) * gemma2 fallback to dynamic cache * Update src/transformers/models/gemma2/modeling_gemma2.py Co-authored-by: Joao Gante * Update src/transformers/models/gemma2/modeling_gemma2.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * raise error and dont fallback to dynamic cache * prev will break most forward calls/tests * Update src/transformers/models/gemma2/modeling_gemma2.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * update * fix copies --------- Co-authored-by: Joao Gante Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * 
enable xla fsdp (#32048) * enable xla fsdp * add acceleration version check for xla fsdp * Fix typo in tokenization_utils_base.py (#32484) * Agents use grammar (#31735) * Allow optional use of grammars to constrain generation * fix broken link in docs (#32491) `https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.TextGenerationPipeline.__call__` `generate_kwargs (dict, optional) — Additional keyword arguments to pass along to the generate method of the model (see the generate method corresponding to your framework here).` link in "here" doesn't work * Docs: alert for the possibility of manipulating logits (#32467) * logits * words * 🌐 [i18n-KO] Translated `gptq.md` to Korean (#32293) * fix: manual edits * fix: manual edits2 * fix: delete files * fix: resolve suggestions Co-authored-by: Sungmin Oh Co-authored-by: SeungYoun Lee <84276596+win2dvp21@users.noreply.github.com> Co-authored-by: 김준재 <55151385+junejae@users.noreply.github.com> * fix: resolve suggestions Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Sungmin Oh Co-authored-by: SeungYoun Lee <84276596+win2dvp21@users.noreply.github.com> Co-authored-by: 김준재 <55151385+junejae@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * 🌐 [i18n-KO] Translated `prompting.md` to Korean (#32294) * docs: ko: tasks/prompting.md * feat: nmt-draft * fix: update translation in prompting.md * fix: update toctree.yml * fix: manual edits * fix: toctree edits * fix: resolve suggestions Co-authored-by: boyunJang Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com> Co-authored-by: timdalxx <48753785+jeongiin@users.noreply.github.com> --------- Co-authored-by: boyunJang Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com> Co-authored-by: timdalxx <48753785+jeongiin@users.noreply.github.com> * 🌐 [i18n-KO] Translated `quantization/quanto.md` to Korean (#32281) * docs: ko: quantization/quanto.md * feat: nmt draft * fix: resolve suggestions Co-authored-by: SeungYoun Lee <84276596+win2dvp21@users.noreply.github.com> Co-authored-by: Minki Kim <100768622+1kmmk1@users.noreply.github.com> Co-authored-by: 김준재 <55151385+junejae@users.noreply.github.com> * fix: resolve suggestions Co-authored-by: SeungYoun Lee <84276596+win2dvp21@users.noreply.github.com> --------- Co-authored-by: SeungYoun Lee <84276596+win2dvp21@users.noreply.github.com> Co-authored-by: Minki Kim <100768622+1kmmk1@users.noreply.github.com> Co-authored-by: 김준재 <55151385+junejae@users.noreply.github.com> * 🌐 [i18n-KO] Translated `image_feature_extraction.md` to Korean (#32239) * docs: ko: tasks/images_feature_extraction.md * feat: nmt draft * fix: manual edits * fix: manual edits * fix: manual edits * fix: manual edits * feat: manual edits * Update docs/source/ko/tasks/image_feature_extraction.md Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com> * Update docs/source/ko/tasks/image_feature_extraction.md Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com> * fix: manual edits --------- Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com> * Fix references to model google mt5 small (#32497) * Docs: Fixed WhisperModel.forward's docstring link (#32498) Fixed WhisperModel.forward's docstring link.
* ๐ŸŒ [i18n-KO] Translated `chat_templating.md` to Korean (#32362) * docs: ko: chat_templating.md * feat: nmt draft * fix: manual edits * Update docs/source/ko/chat_templating.md Co-authored-by: Sungmin Oh * Update docs/source/ko/chat_templating.md Co-authored-by: Sungmin Oh * fix: apply suggestions from code review - anchor Co-authored-by: Sungmin Oh * fix: manual edits Co-authored-by: SeungYoun Lee <84276596+win2dvp21@users.noreply.github.com> Co-authored-by: Minki Kim <100768622+1kmmk1@users.noreply.github.com> * fix: manual edits * fix: delete 'default template' section --------- Co-authored-by: Sungmin Oh Co-authored-by: SeungYoun Lee <84276596+win2dvp21@users.noreply.github.com> Co-authored-by: Minki Kim <100768622+1kmmk1@users.noreply.github.com> * Fix link to autoclass_tutorial.md in i18n.md (#32501) * Fix typo: depracted -> deprecated (#32489) Hello! ## Pull Request overview * Fix typo ## Details This should speak for itself. cc @itazap @ArthurZucker - Tom Aarsen * Fix issue #32518: Update llm_tutorial.md (#32523) Update llm_tutorial.md remove comma re: issue 32518 https://github.com/huggingface/transformers/issues/32518 * Change Phi3 `_supports_sdpa` to True (#32457) * Change `_supports_sdpa` to True * add phi3 to sdpa support list * Uniformize kwargs for processors - GroundingDINO (#31964) * fix typo * uniform kwargs * make style * add comments * remove return_tensors * remove common_kwargs from processor since it propagates * make style * return_token_type_ids to True * revert the default imagekwargs since does not accept any value in the image processro * revert processing_utils.py * make style * add molbap's commit * fix typo * fix common processor * remain * Revert "add molbap's commit" This reverts commit a476c6ee88318ce40d73ea31e2dc2d4faa8ae410. 
* add unsync PR * revert * make CI happy * nit * import annotationformat * Fix add-new-model-like (#31773) * handle (processor_class, None) returned by ModelPatterns * handle (slow, fast) image processors in add model * handle old image processor case * Add Qwen2-Audio (#32137) * add qwen2audio * Update check_repo.py * fix style * fix test * fix style * add model size * Qwen2AudioEncoderModel->Qwen2AudioEncoder; add copy info * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * switch the attention_mask and the feature_attention_mask * add to PRIVATE_MODELS in check_repo.py; add to MODEL_NAMES_TO_IGNORE in check_table.py * fix initialization * update chat_template * fix consistency issue after copy * add docstrings to _merge_input_ids_with_audio_features * add copied from to prepare_inputs_for_generation * add more details to docs * rm comment * add init_std * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * update * Update docs/source/en/model_doc/qwen2_audio.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * update tests * rm ignore_index * update processor * rm ffmpeg_read * Update tests/models/qwen2_audio/test_modeling_qwen2_audio.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_audio.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_audio.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_audio.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * update * typo * [run_slow] qwen2_audio * [run_slow] qwen2_audio * [run_slow] qwen2_audio * fix quality * [run_slow] qwen2_audio * [run_slow] qwen2_audio * [run_slow] qwen2_audio * add official model --------- Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * filter flash_attn optional imports loading remote code (#30954) * filter flash_attn optional imports loading remote code * improve pattern * fix code style * Update src/transformers/dynamic_module_utils.py Co-authored-by: Matt --------- Co-authored-by: Matt * ๐ŸŒ [i18n-KO] Translated `ko-llm_tutorial_optimization.md` to Korean (#32372) * docs: ko: llm_tutorial_optimization.md * feat: nmt draft * fix: manual edits * Update docs/source/ko/llm_tutorial_optimization.md Co-authored-by: Chaewon Song * Update docs/source/ko/llm_tutorial_optimization.md Co-authored-by: Chaewon Song * fix: resolve suggestions - 1 Co-authored-by: Chaewon Song 
Co-authored-by: timdalxx <48753785+jeongiin@users.noreply.github.com> Co-authored-by: boyunJang * fix: resolve suggestions - 2 Co-authored-by: boyunJang Co-authored-by: Chaewon Song Co-authored-by: timdalxx <48753785+jeongiin@users.noreply.github.com> --------- Co-authored-by: Chaewon Song Co-authored-by: timdalxx <48753785+jeongiin@users.noreply.github.com> Co-authored-by: boyunJang * ๐ŸŒ [i18n-KO] Translated `trainer.md` to Korean (#32260) * docs: ko: ko-trainer * feat: nmt draft * fix: manual edits * fix: manual edits * fix: glossary * fix: glossary * Apply suggestions from code review Co-authored-by: Jinuk <45095330+JinukHong@users.noreply.github.com> Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com> --------- Co-authored-by: Jinuk <45095330+JinukHong@users.noreply.github.com> Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com> * ๐ŸŒ [i18n-KO] Translated `eetq.md` to Korean (#32352) * docs: ko: quantization/eetq.md * feat: nmt draft * fix docs: ko: quantization/eetq.md * fix docs: ko: quantization/eetq.md * fix: resolve suggestions Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com> * fix: resolve suggestions * fix: resolve suggsetions --------- Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com> * ๐ŸŒ [i18n-KO] Translated `fsdp.md` to Korean (#32261) * docs: ko: fsdp.md * feat: nmt draft * fix: manual edits * Apply suggestions from code review Co-authored-by: ๊น€์ค€์žฌ <55151385+junejae@users.noreply.github.com> Co-authored-by: Minki Kim <100768622+1kmmk1@users.noreply.github.com> * fix: resolve suggestions * Update docs/source/ko/fsdp.md Co-authored-by: ๊น€์ค€์žฌ <55151385+junejae@users.noreply.github.com> * Update docs/source/ko/fsdp.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: ๊น€์ค€์žฌ <55151385+junejae@users.noreply.github.com> Co-authored-by: Minki Kim <100768622+1kmmk1@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * ๐ŸŒ [i18n-KO] Translated `bitsandbytes.md` to Korean (#32408) * docs: ko: quantization/bitsandbytes.md * feat: nmt draft * fix: minor typos * fix: manual edits * fix: manual edits * fix: resolve suggestions Co-authored-by: wony617 <49024958+Jwaminju@users.noreply.github.com> Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * fix: resolve suggestions Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: wony617 <49024958+Jwaminju@users.noreply.github.com> Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Fix generate with `inputs_embeds` as input (#32493) * I think inputs_embeds has ndim == 3 * fix sequence length catch * add generate test * [run-slow]olmo, persimmon, gemma, gemma2, qwen2, llama * skip whisper * fix bart test * more fixes * Fixed test `test_static_cache_exportability` with torch 2.4.0 (#32516) Workaround the export issue in torch 2.4 Co-authored-by: Guang Yang * Fix code example to load bigcode starcoder2 7b (#32474) * [docs] Translation 
guide (#32547) clarify * Gemma2: fix FA2 generation (#32553) fix FA2 * Fix a bug in Qwen2Audio (#32552) fix _update_model_kwargs_for_generation * fix slow integration gemma2 test (#32534) no empty revision * fix non contiguous tensor value error in save_pretrained (#32422) Signed-off-by: duzhanwei Co-authored-by: duzhanwei * ๐ŸŒ [i18n-KO] Translated `agent.md` to Korean (#32351) * docs: ko: main_classes/agent * feat: chatgpt draft * fix: manual edits * fix: resolve suggestions Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> Co-authored-by: thsamaji <60818655+thsamajiki@users.noreply.github.com> Co-authored-by: SeungAhSon * fix: resolve suggestions * fix: resolve code line number --------- Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> Co-authored-by: thsamaji <60818655+thsamajiki@users.noreply.github.com> Co-authored-by: SeungAhSon * Add new model (#32615) * v1 - working version * fix * fix * fix * fix * rename to correct name * fix title * fixup * rename files * fix * add copied from on tests * rename to `FalconMamba` everywhere and fix bugs * fix quantization + accelerate * fix copies * add `torch.compile` support * fix tests * fix tests and add slow tests * copies on config * merge the latest changes * fix tests * add few lines about instruct * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix * fix tests --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Fix: FA2 with packed training (#32487) * fix check * add tests * [run-slow] llama, gemma2 * oops, whisper actually runs but needed some special treatment * Fix sliding window attention used in Gemma2FlashAttention2 (#32522) * fix sliding window attention (flash2) in gemma2 model * [run-slow] gemma * fix slicing attention_mask for flash_attn2 * fix slicing attention_mask when flash_attn is used * add missing comment * slice the last seq_len tokens in the key, value states * revert code of slicing key, value states * fix: Fixed conditional check for `encodec` model names (#32581) * Fixed conditional check for encodec model names. * Reformatted conditional check. * Fix `.push_to_hub(..., create_pr=True, revision="my-branch")` when creating PR on not-owned repo (#32094) Fix create_pr aagainst existing revision * Bump aiohttp from 3.9.4 to 3.10.2 in /examples/research_projects/decision_transformer (#32569) Bump aiohttp in /examples/research_projects/decision_transformer Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.9.4 to 3.10.2. - [Release notes](https://github.com/aio-libs/aiohttp/releases) - [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst) - [Commits](https://github.com/aio-libs/aiohttp/compare/v3.9.4...v3.10.2) --- updated-dependencies: - dependency-name: aiohttp dependency-type: direct:production ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump torch from 1.13.1 to 2.2.0 in /examples/research_projects/visual_bert (#32220) Bump torch in /examples/research_projects/visual_bert Bumps [torch](https://github.com/pytorch/pytorch) from 1.13.1 to 2.2.0. - [Release notes](https://github.com/pytorch/pytorch/releases) - [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md) - [Commits](https://github.com/pytorch/pytorch/compare/v1.13.1...v2.2.0) --- updated-dependencies: - dependency-name: torch dependency-type: direct:production ... 
Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Cleanup tool calling documentation and rename doc (#32337) * Rename "Templates for Chat Models" doc to "Chat Templates" * Small formatting fix * Small formatting fix * Small formatting fix * Cleanup tool calling docs as well * Remove unneeded 'revision' * Move tip to below main code example * Little bonus section on template editing * 🌐 [i18n-KO] Translated `deepspeed.md` to Korean (#32431) * Update _toctree.yml * docs: ko: deepspeed.md * Apply suggestions from code review Co-authored-by: wony617 <49024958+Jwaminju@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: wony617 <49024958+Jwaminju@users.noreply.github.com> * Update docs/source/ko/_toctree.yml Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ko/deepspeed.md * Update docs/source/ko/deepspeed.md Co-authored-by: SeungAhSon * Apply suggestions from code review Co-authored-by: wony617 <49024958+Jwaminju@users.noreply.github.com> * Update docs/source/ko/_toctree.yml --------- Co-authored-by: wony617 <49024958+Jwaminju@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: SeungAhSon * 🌐 [i18n-KO] Translated `awq.md` to Korean (#32324) * fix: manual edits * Apply suggestions from code review Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com> Co-authored-by: Chulhwa (Evan) Han * fix: manual edits - moved the translated file that had been created at the wrong path * Delete docs/source/ko/tasks/awq.md * Update docs/source/ko/_toctree.yml Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com> Co-authored-by: Chulhwa (Evan) Han Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * fix: Fixed failing `test_find_base_model_checkpoint` (#32638) Fixed failing test_find_base_model_checkpoint. * Bump tensorflow from 2.11.1 to 2.12.1 in /examples/research_projects/decision_transformer (#32341) Bump tensorflow in /examples/research_projects/decision_transformer Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 2.11.1 to 2.12.1. - [Release notes](https://github.com/tensorflow/tensorflow/releases) - [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md) - [Commits](https://github.com/tensorflow/tensorflow/compare/v2.11.1...v2.12.1) --- updated-dependencies: - dependency-name: tensorflow dependency-type: direct:production ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * "to be not" -> "not to be" (#32636) * "to be not" -> "not to be" * Update sam.md * Update trainer.py * Update modeling_utils.py * Update test_modeling_utils.py * Update test_modeling_utils.py * fix: Updated the `is_torch_mps_available()` function to include `min_version` argument (#32545) * Fixed wrong argument in is_torch_mps_available() function call. * Fixed wrong argument in is_torch_mps_available() function call. * sorted the import. * Fixed wrong argument in is_torch_mps_available() function call. * Fixed wrong argument in is_torch_mps_available() function call. * Update src/transformers/utils/import_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * removed extra space. * Added type hint for the min_version parameter. * Added missing import.
--------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Expand inputs in processors for VLMs (#30962) * let it be * draft * should not have changed * add warnings * fix & add tests * fix tests * ipnuts embeds cannot be passed with pixels * more updates * paligemma ready! * minor typos * update blip-2 * fix tests & raise error * docstring * add blip2 test * tmp * add image seq length to config * update docstring * delete * fix tests * fix blip * fix paligemma * out-of-place scatter * add llava-next-video * Update src/transformers/models/blip_2/modeling_blip_2.py Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> * remove tmp * codestyle * nits * more nits * remove overriding in tests * comprehension when merging video * fix-copies * revert changes for embeds test * fix tests after making comprehension * Update src/transformers/models/blip_2/processing_blip_2.py Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> * Update src/transformers/models/blip_2/processing_blip_2.py Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> * more updates * fix tests --------- Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> * Automatically add `transformers` tag to the modelcard (#32623) * Automatically add `transformers` tag to the modelcard * Specify library_name and test * Fix tests (#32649) * skip failing tests * [no-filter] * [no-filter] * fix wording catch in FA2 test * [no-filter] * trigger normal CI without filtering * fix tensors on different devices in `WhisperGenerationMixin` (#32316) * fix * enable on xpu * no manual remove * move to device * remove to * add move to * Add support for GrokAdamW optimizer (#32521) * add grokadamw * reformat * code review feedback, unit test * reformat * reformat * Add Depth Anything V2 Metric models (#32126) * add checkpoint and repo names * adapt head to support metric depth estimation * add max_depth output scaling * add expected logits * improve docs * fix docstring * add checkpoint and repo names * adapt head to support metric depth estimation * add max_depth output scaling * add expected logits * improve docs * fix docstring * rename depth_estimation to depth_estimation_type * add integration test * Refactored tests to include metric depth model inference test * Integration test pass when the timm backbone lines are commented (L220-L227) * address feedback * replace model path to use organization path * formatting * delete deprecated TODO * address feedback * [run_slow] depth_anything * Fix: Fixed directory path for utils folder in `test_tokenization_utils.py` (#32601) * Removed un-necessary expressions. 
* Fixed directory path for utils folder in test_tokenization_utils.py * Modify ProcessorTesterMixin for better generalization (#32637) * Add padding="max_length" to tokenizer kwargs and change crop_size to size for image_processor kwargs * remove crop_size argument in align processor tests to be coherent with base tests * Add pad_token when loading tokenizer if needed, change test override tokenizer kwargs, remove unnecessary test overwrites in grounding dino * TF_Deberta supporting mixed precision (#32618) * Update modeling_tf_deberta.py Corrected some codes which do not support mixed precision * Update modeling_tf_deberta_v2.py Corrected some codes which do not support mixed precision * Update modeling_tf_deberta_v2.py * Update modeling_tf_deberta.py * Add files via upload * Add files via upload * Fix tests recurrent (#32651) * add fix for recurrentgemma * [no-filter] * trigger-ci * [no-filter] * [no-filter] * attempt to fix mysterious zip error * [no-filter] * fix lookup error * [no-filter] * remove summarization hack * [no-filter] * Support MUSA (Moore Threads GPU) backend in transformers (#31913) Add accelerate version check, needs accelerate>=0.33.0 * fix: Fixed failing tests in `tests/utils/test_add_new_model_like.py` (#32678) * Fixed failing tests in tests/utils/test_add_new_model_like.py * Fixed formatting using ruff. * Small nit. * Update translation docs review (#32662) update list of people to tag * Add TorchAOHfQuantizer (#32306) * Add TorchAOHfQuantizer Summary: Enable loading torchao quantized model in huggingface. Test Plan: local test Reviewers: Subscribers: Tasks: Tags: * Fix a few issues * style * Added tests and addressed some comments about dtype conversion * fix torch_dtype warning message * fix tests * style * TorchAOConfig -> TorchAoConfig * enable offload + fix memory with multi-gpu * update torchao version requirement to 0.4.0 * better comments * add torch.compile to torchao README, add perf number link --------- Co-authored-by: Marc Sun * Fix `JetMoeIntegrationTest` (#32332) JetMoeIntegrationTest Co-authored-by: ydshieh * Update the distributed CPU training on Kubernetes documentation (#32669) * Update the Kubernetes CPU training example * Add namespace arg Signed-off-by: Dina Suehiro Jones --------- Signed-off-by: Dina Suehiro Jones * fix: Fixed unknown pytest config option `doctest_glob` (#32475) Fixed unknown config option doctest_glob. * Unpin deepspeed in Docker image/tests (#32572) Unpin deepspeed * Updated workflows to the latest versions (#32405) Updated few workflows to the latest versions. * reopen: llava-next fails to consider padding_side during Training (#32679) restore #32386 * fix: Corrected ` falcon-mamba-7b` model checkpoint name (#32837) Corrected the model checkpoint. 
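For the `TorchAOHfQuantizer` entry above (#32306), a rough usage sketch of loading a model with torchao weight-only quantization; the checkpoint name is a placeholder, torchao >= 0.4.0 is assumed, and the exact `TorchAoConfig` arguments should be checked against the released API:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TorchAoConfig

model_id = "meta-llama/Meta-Llama-3-8B"  # placeholder checkpoint, not taken from the PR
# int4 weight-only quantization applied at load time via torchao.
quantization_config = TorchAoConfig("int4_weight_only", group_size=128)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    quantization_config=quantization_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=10)[0]))
```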
* fix: update doc link for runhouse in README.md (#32664) * VLMs: small clean-up for cache class (#32417) * fix beam search in video llava * [run-slow] video_llava * add back the position ids (#32554) * add back the position ids * fix failing test * Use head_dim if in config for RoPE (#32495) * use head_dim if in config for RoPE * typo * simplify with getattr * Generate: unify `LogitsWarper` and `LogitsProcessor` (#32626) * [tests] make test_sdpa_equivalence device-agnostic (#32520) * fix on xpu * [run_all] * Cache: use `batch_size` instead of `max_batch_size` (#32657) * more precise name * better docstrings * Update src/transformers/cache_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Fix AutoConfig and AutoModel support for Llava-Next-Video (#32844) * Fix: fix all model_type of Llava-Next-Video to llava_next_video * Fix doc for llava_next_video * * Fix formatting issues * Change llava-next-video.md file name into llava_next_video.md to make it compatible with implementation * Fix docs TOC for llava-next-video * improve _get_is_as_tensor_fns (#32596) * improve _get_is_as_tensor_fns * format * Revert PR 32299, flag users when Zero-3 was missed (#32851) Revert PR 32299 * fix multi-gpu with static cache (#32543) * Reduce the error log when using core models that need their weights renamed, and provide a step forward (#32656) * Fin * Modify msg * Finish up nits * Make beam_constraints.Constraint.advance() docstring more accurate (#32674) * Fix beam_constraints.Constraint.advance() docstring * Update src/transformers/generation/beam_constraints.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Joao Gante Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * generate: missing `to` in DoLa body, causing exceptions in multi-gpu generation (#32856) * Add Flax Dinov2 (#31960) * tfmsenv restored in main * installed flax * forward pass done and all tests passed * make fix-copies and cleaning the scripts * fixup attempt 1 * fixup attempt 2 * fixup third attempt * fixup attempt 4 * fixup attempt 5 * dinov2 doc fixed * FlaxDinov2Model + ForImageClassification added to OBJECTS_TO_IGNORE * external pos_encoding layer removed * fixup attempt 6 * fixed integration test values * fixup attempt 7 * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update 
src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * comments removed * comment removed from the test * fixup * Update src/transformers/models/dinov2/modeling_flax_dinov2.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * new fixes 1 * interpolate_pos_encoding function removed * droppath rng fixed, pretrained beit copied-from still not working * modeling_flax_dinov2.py reformatted * Update tests/models/dinov2/test_modeling_flax_dinov2.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * added Copied from, to the tests * copied from statements removed from tests * fixed copied from statements in the tests * [run_slow] dinov2 --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Add Descript-Audio-Codec model (#31494) * dac model * original dac works * add dac model * dac can be instatiated * add forward pass * load weights * all weights are used * convert checkpoint script ready * test * add feature extractor * up * make style * apply cookicutter * fix tests * iterate on FeatureExtractor * nit * update dac doc * replace nn.Sequential with nn.ModuleList * nit * apply review suggestions 1/2 * Update src/transformers/models/dac/modeling_dac.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * up * apply review suggestions 2/2 * update padding in FeatureExtractor * apply review suggestions * iterate on design and tests * add integration tests * feature extractor tests * make style * all tests pass * make style * fixup * apply review suggestions * fix-copies * apply review suggestions * apply review suggestions * Update docs/source/en/model_doc/dac.md Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Update docs/source/en/model_doc/dac.md Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * anticipate transfer weights to descript * up * make style * apply review suggestions * update slow test values * update slow tests * update test values * update with CI values * update with vorace values * update test with slice * make style --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * support torch-speech (#32537) * [tests] make `test_sdpa_can_compile_dynamic` device-agnostic 
(#32519) * enable * fix * Add __repr__ for Conv1D (#32425) * Add representation for Conv1D, for better output info. * code format for Conv1D * We add a __repr__ func for Conv1D, this allows the print (or output) of the model's info has a better description for Conv1D. * Support save/load ckpt for XLA FSDP (#32311) * Support save/load ckpt for XLA FSDP * Fix bug for save * Fix style * reserve sharded ckpt and better file naming * minor fix Co-authored-by: Zach Mueller * add is_fsdp_xla_v1_enabled --------- Co-authored-by: Zach Mueller * RT-DETR parameterized batchnorm freezing (#32631) * fix: Parameterized norm freezing For the R18 model, the authors don't freeze norms in the backbone. * Update src/transformers/models/rt_detr/configuration_rt_detr.py Co-authored-by: Pavel Iakubovskii --------- Co-authored-by: Pavel Iakubovskii * Fix incorrect vocab size retrieval in GGUF config (#32551) * fix gguf config vocab size * minor fix * link issue * Mamba / FalconMamba: Fix mamba left padding (#32677) * fix mamba left padding * Apply suggestions from code review Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> * fix copies * test with `inputs_embeds` * Update src/transformers/models/falcon_mamba/modeling_falcon_mamba.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * copies * clairfy * fix last comments * remove --------- Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Fix: Mamba2 generation mismatch between input_ids and inputs_embeds (#32694) * fix cache when using input embeddings * simplify check, we can always add input ids seq len since its 0 in first pass * Docs: Fixed `whisper-large-v2` model link in docs (#32871) Fixed whisper-large-v2 model link in docs. * Add tip to clarify tool calling (#32883) * Allow-head-dim (#32857) * support head dim * fix the doc * fixup * add oproj Co-authored-by: Suhara > * update Co-authored-by: bzantium * Co-authored-by: suhara * Update Co-authored-by: Yoshi Suhara --------- Co-authored-by: bzantium Co-authored-by: Yoshi Suhara * 🚨🚨🚨 Update min version of accelerate to 0.26.0 (#32627) * Update min version of accelerate to 0.26.0 * dev-ci * update min version in import * remove useless check * dev-ci * style * dev-ci * dev-ci * Fix repr for conv (#32897) add nx * fix: jamba cache fails to use torch.nn.module (#32894) Co-authored-by: Gal Cohen * Fix: Mamba2 `norm_before_gate` usage (#32686) * mamba2 uses norm_before_gate=False * small nit * remove norm_before_gate flag and follow False path only * Bump nltk from 3.7 to 3.9 in /examples/research_projects/decision_transformer (#32903) Bump nltk in /examples/research_projects/decision_transformer Bumps [nltk](https://github.com/nltk/nltk) from 3.7 to 3.9. - [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog) - [Commits](https://github.com/nltk/nltk/compare/3.7...3.9) --- updated-dependencies: - dependency-name: nltk dependency-type: direct:production ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Replace `tensor.norm()` with decomposed version for CLIP executorch export (#32887) * Replace .norm() with decomposed version for executorch export * [run_slow] clip * link for optimizer names (#32400) * link for optimizer names Add a note and link to where the user can find more optimizer names easily because there are many more optimizers than are mentioned in the docstring. 
* make fixup * [i18n-ar] add README_ar.md to README.md (#32583) * Update README.md * Update README.md * Add README_ar.md to i18n/README_de.md * Add README_ar.md to i18n/README_es.md * Add README_ar.md to i18n/README_fr.md * Add README_ar.md to i18n/README_hd.md * Add README_ar.md to i18n/README_ja.md * Add README_ar.md to i18n/README_ko.md * Add README_ar.md to i18n/README_pt-br.md * Add README_ar.md to i18n/README_ru.md * Add README_ar.md to i18n/README_te.md * Add README_ar.md to i18n/README_vi.md * Add README_ar.md to i18n/README_vi.md * Add README_ar.md to i18n/README_zh-hans.md * Add README_ar.md to i18n/README_zh-hant.md * Create README_ar.md * fix: [whisper] don't overwrite GenerationConfig's `return_timestamps` when `return_timestamps` is not passed to `generate` function (#31296) [whisper] don't overwrite return_timestamps when not passed to generate * Update docker image building (#32918) commit * Jamba: update integration tests (#32250) * try test updates * a few more changes * a few more changes * a few more changes * [run slow] jamba * skip logits checks on older gpus * [run slow] jamba * oops * [run slow] jamba * Update tests/models/jamba/test_modeling_jamba.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/jamba/test_modeling_jamba.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix: Added missing `huggingface_hub` installation to workflows (#32891) Added missing huggingface_hub installation to workflows. * fix: no need to dtype A in jamba (#32924) Co-authored-by: Gal Cohen * FEAT / Trainer: Add adamw 4bit optimizer (#31865) * add 4bit optimizer * style * fix msg * style * add qgalore * Revert "add qgalore" This reverts commit 25278e805f24d5d48eaa0638abb48de1b783a3fb. * style * version check * CI: separate step to download nltk files (#32935) * separate step to download nltk files * duplicated * rm comma * FIX / Hub: Also catch for `exceptions.ConnectionError` (#31469) * Update hub.py * Update errors * Apply suggestions from code review Co-authored-by: Lucain --------- Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Lucain * Add SynCode to llm_tutorial (#32884) * Fix benchmark script (#32635) * fix * >= 0.3.0 --------- Co-authored-by: ydshieh * Improve greedy search memory usage (#32895) Do not call torch.repeat_interleave if expand_size is 1 * Add chat_template for tokenizer extracted from GGUF model (#32908) * add chat_template to gguf tokenizer * add template through tokenizer config * fix: (issue #32689) `AttributeError` raised when using `Trainer` with `eval_on_start=True` in Jupyter Notebook. (#32849) fix: `AttributeError` raised when using `Trainer` with `eval_on_start=True` in Jupyter Notebook. 
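To make the greedy-search memory change above concrete: the idea in #32895 is simply to skip `torch.repeat_interleave` when `expand_size` is 1, since the call would only allocate a needless copy. A minimal sketch follows; the helper name below is hypothetical, not the exact function touched in the generation code.

```python
# Hedged sketch of the optimization: skip repeat_interleave when expand_size == 1.
import torch

def expand_inputs_for_generation(input_ids: torch.Tensor, expand_size: int = 1) -> torch.Tensor:
    if expand_size == 1:
        return input_ids  # repeating by a factor of 1 would just copy the tensor
    return input_ids.repeat_interleave(expand_size, dim=0)
```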
* Gemma2: eager attention by default (#32865) * [run_slow] idefics2 (#32840) * Fix regression on `Processor.save_pretrained` caused by #31691 (#32921) fix save_pretrained * 🌐 [i18n-KO] Translated `knowledge_distillation_for_image_classification.md` to Korean (#32334) * docs: ko: tasks/knowledge_distillation_for_image_classification.md * feat: nmt draft * fix: manual edits * Apply suggestions from code review Co-authored-by: Chulhwa (Evan) Han * Apply suggestions from code review Co-authored-by: Chulhwa (Evan) Han * Apply suggestions from code review Co-authored-by: Ahnjj_DEV * Apply suggestions from code review Co-authored-by: Ahnjj_DEV * Apply suggestions from code review Co-authored-by: Ahnjj_DEV * Apply suggestions from code review Co-authored-by: Chulhwa (Evan) Han * Apply suggestions from code review Co-authored-by: Chulhwa (Evan) Han * Apply suggestions from code review Co-authored-by: Chulhwa (Evan) Han * Apply suggestions from code review * Apply suggestions from code review * Apply suggestions from code review * Apply suggestions from code review --------- Co-authored-by: Chulhwa (Evan) Han Co-authored-by: Ahnjj_DEV * Generate: Deprecate returning legacy cache by default; Handle `use_cache=False` (#32863) * docs: fix outdated link to TF32 explanation (#32947) fix outdated link * conflict updates 11/14/2024 --------- Signed-off-by: dependabot[bot] Signed-off-by: duzhanwei Signed-off-by: Dina Suehiro Jones Co-authored-by: Alexandre TL Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Amit Garg Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Alvaro Moran <6949769+tengomucho@users.noreply.github.com> Co-authored-by: Deep Gandhi <97520292+DeF0017@users.noreply.github.com> Co-authored-by: RhuiDih <166782544+RhuiDih@users.noreply.github.com> Co-authored-by: Sai-Suraj-27 Co-authored-by: Lysandre Co-authored-by: Lysandre Co-authored-by: Arthur Zucker Co-authored-by: Joao Gante Co-authored-by: Fanli Lin Co-authored-by: Rohit Dwivedula <25080952+rohitdwivedula@users.noreply.github.com> Co-authored-by: 조준래 Co-authored-by: Dr. 
Artificial曾小健 <875100501@qq.com> Co-authored-by: Raushan Turganbay Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Penut Chen <94501378+PenutChen@users.noreply.github.com> Co-authored-by: Matt Co-authored-by: Huazhong Ji Co-authored-by: Austin <31086824+avlewis@users.noreply.github.com> Co-authored-by: Kashif Rasul Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Co-authored-by: ydshieh Co-authored-by: jrhe <4038905+jrhe@users.noreply.github.com> Co-authored-by: Pavel Iakubovskii Co-authored-by: João Nadkarni <38245862+joaonadkarni@users.noreply.github.com> Co-authored-by: Connor Anderson Co-authored-by: leejet Co-authored-by: Kamil Akesbi <45195979+kamilakesbi@users.noreply.github.com> Co-authored-by: Guang Yang <42389959+guangy10@users.noreply.github.com> Co-authored-by: Aymeric Roucher <69208727+aymeric-roucher@users.noreply.github.com> Co-authored-by: Gilad Turok <36947659+gil2rok@users.noreply.github.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Teddy Ferdinan <64476430+teddy-f-47@users.noreply.github.com> Co-authored-by: teddyferdinan Co-authored-by: Luc Georges Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> Co-authored-by: plaggy <35706832+plaggy@users.noreply.github.com> Co-authored-by: Wing Lian Co-authored-by: fkrasnov2 Co-authored-by: Joshua Lochner Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com> Co-authored-by: sanchit-gandhi Co-authored-by: Ricardo Co-authored-by: Lunwen He Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: Guoming Zhang <37257613+nv-guomingz@users.noreply.github.com> Co-authored-by: Hanna Yukhymenko <49597980+ayukh@users.noreply.github.com> Co-authored-by: Ita Zaporozhets <31893021+itazap@users.noreply.github.com> Co-authored-by: Viktor Scherbakov Co-authored-by: Omar Salman Co-authored-by: Nikos Karampatziakis Co-authored-by: OsamaS99 <62110783+OsamaS99@users.noreply.github.com> Co-authored-by: Aaron Haag Co-authored-by: Zach Mueller Co-authored-by: Shaopeng Fu Co-authored-by: Xueshen Liu Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: TechInterMezzo Co-authored-by: Nicholas Broad Co-authored-by: Francisco Kurucz Co-authored-by: Abdi <48970896+AbdiHaryadi@users.noreply.github.com> Co-authored-by: Prakarsh Kaushik <66624139+RUFFY-369@users.noreply.github.com> Co-authored-by: Fanli Lin Co-authored-by: Ao Tang Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: Chris Toukmaji <51040574+christoukmaji@users.noreply.github.com> Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com> Co-authored-by: timdalxx <48753785+jeongiin@users.noreply.github.com> Co-authored-by: boyunJang Co-authored-by: Chaewon Song Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com> Co-authored-by: HyunJi Shin <74661937+shinhyunji36@users.noreply.github.com> Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com> Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com> Co-authored-by: append-only Co-authored-by: Bill Zhou <20598803+blubitz@users.noreply.github.com> Co-authored-by: Jonathan Rahn Co-authored-by: Minki Kim <100768622+1kmmk1@users.noreply.github.com> Co-authored-by: Sungmin Oh Co-authored-by: SeungYoun Lee <84276596+win2dvp21@users.noreply.github.com> Co-authored-by: 김준재 <55151385+junejae@users.noreply.github.com> Co-authored-by: 
Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Jiyoon <62553866+enchantee00@users.noreply.github.com> Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com> Co-authored-by: doomdagadiggiedahdah <77366355+doomdagadiggiedahdah@users.noreply.github.com> Co-authored-by: Wonseok Lee (Jack) <10275397+pocca2048@users.noreply.github.com> Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> Co-authored-by: Yunfei Chu Co-authored-by: Ekaterina Aidova Co-authored-by: 010kim Co-authored-by: Chulhwa (Evan) Han Co-authored-by: Jinuk <45095330+JinukHong@users.noreply.github.com> Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com> Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com> Co-authored-by: SeungAhSon Co-authored-by: wony617 <49024958+Jwaminju@users.noreply.github.com> Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> Co-authored-by: Guang Yang Co-authored-by: zhanweidu Co-authored-by: duzhanwei Co-authored-by: thsamaji <60818655+thsamajiki@users.noreply.github.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Chaehong Jeong Co-authored-by: Lucain Co-authored-by: Ahnjj_DEV Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Lysandre Debut Co-authored-by: Eric Hartford Co-authored-by: Bertrand Thia <56003053+bt2513@users.noreply.github.com> Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> Co-authored-by: Seungwoo Lee <49184578+pinesnow72@users.noreply.github.com> Co-authored-by: fmo-mt <118255577+fmo-mt@users.noreply.github.com> Co-authored-by: Jerry Zhang Co-authored-by: Marc Sun Co-authored-by: Dina Suehiro Jones Co-authored-by: jp Co-authored-by: muddlebee Co-authored-by: Ao Tang Co-authored-by: Yangshen⚡Deng Co-authored-by: Zhan Rongrui <46243324+zrr1999@users.noreply.github.com> Co-authored-by: Alex Calderwood Co-authored-by: MAHIR DAIYAN Co-authored-by: Aaron Chung <35474496+AaronZLT@users.noreply.github.com> Co-authored-by: Yitong Huang Co-authored-by: Alan-Blanchet Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com> Co-authored-by: bzantium Co-authored-by: Yoshi Suhara Co-authored-by: Gal Cohen (galco) Co-authored-by: Gal Cohen Co-authored-by: Nicholas Broad Co-authored-by: Ahmed Almaghz <53489256+AhmedAlmaghz@users.noreply.github.com> Co-authored-by: Ruilin Huang Co-authored-by: Shubham Ugare Co-authored-by: regisss <15324346+regisss@users.noreply.github.com> Co-authored-by: Andrés Marafioti Co-authored-by: Franz Louis Cesista Co-authored-by: Stefano Fiorucci Co-authored-by: github-actions[bot] --- .circleci/TROUBLESHOOT.md | 2 +- .circleci/config.yml | 71 +- .circleci/create_circleci_config.py | 315 +- .circleci/parse_test_outputs.py | 70 + .github/ISSUE_TEMPLATE/bug-report.yml | 53 +- .github/ISSUE_TEMPLATE/feature-request.yml | 4 +- .github/ISSUE_TEMPLATE/i18n.md | 2 +- .github/PULL_REQUEST_TEMPLATE.md | 14 +- .github/workflows/TROUBLESHOOT.md | 2 +- .github/workflows/add-model-like.yml | 6 +- .github/workflows/benchmark.yml | 42 + .github/workflows/build-ci-docker-images.yml | 77 + .github/workflows/build-docker-images.yml | 317 +- .../build-nightly-ci-docker-images.yml | 28 +- .../workflows/build-past-ci-docker-images.yml | 10 +- .github/workflows/build_documentation.yml | 1 + .github/workflows/build_pr_documentation.yml | 1 + 
.github/workflows/check_tiny_models.yml | 16 +- .github/workflows/doctest_job.yml | 82 + .github/workflows/doctests.yml | 93 +- .github/workflows/model-templates.yml | 81 - .github/workflows/model_jobs.yml | 121 + .github/workflows/push-important-models.yml | 142 + .github/workflows/release-conda.yml | 2 +- .github/workflows/self-nightly-caller.yml | 43 + .../workflows/self-nightly-past-ci-caller.yml | 99 +- .github/workflows/self-nightly-scheduled.yml | 289 -- .github/workflows/self-past-caller.yml | 40 + .github/workflows/self-past.yml | 356 -- .github/workflows/self-pr-slow-ci.yml | 135 + .../workflows/self-push-amd-mi300-caller.yml | 25 + .github/workflows/self-push-amd.yml | 40 +- .github/workflows/self-push-caller.yml | 4 +- .github/workflows/self-push.yml | 41 +- .../self-scheduled-amd-mi210-caller.yml | 1 + .../self-scheduled-amd-mi250-caller.yml | 1 + .../self-scheduled-amd-mi300-caller.yml | 21 + .github/workflows/self-scheduled-amd.yml | 110 +- .github/workflows/self-scheduled-caller.yml | 78 + .github/workflows/self-scheduled.yml | 415 +- .github/workflows/slack-report.yml | 101 + .github/workflows/ssh-runner.yml | 63 + .github/workflows/stale.yml | 4 +- .github/workflows/trufflehog.yml | 18 + .github/workflows/update_metdata.yml | 2 +- CONTRIBUTING.md | 44 +- Makefile | 22 +- README.md | 334 +- README_es.md | 545 -- README_hd.md | 519 -- README_ja.md | 579 --- README_ko.md | 493 -- README_pt-br.md | 566 --- README_ru.md | 553 -- README_te.md | 558 --- README_zh-hans.md | 518 -- README_zh-hant.md | 530 -- SECURITY.md | 36 +- awesome-transformers.md | 12 +- {tests/models/deta => benchmark}/__init__.py | 0 benchmark/benchmark.py | 326 ++ benchmark/config/generation.yaml | 57 + benchmark/optimum_benchmark_wrapper.py | 16 + conftest.py | 60 +- docker/consistency.dockerfile | 16 + docker/custom-tokenizers.dockerfile | 26 + docker/examples-tf.dockerfile | 12 + docker/examples-torch.dockerfile | 11 + docker/exotic-models.dockerfile | 17 + docker/jax-light.dockerfile | 10 + docker/pipeline-tf.dockerfile | 10 + docker/pipeline-torch.dockerfile | 11 + docker/quality.dockerfile | 9 + docker/tf-light.dockerfile | 12 + docker/torch-jax-light.dockerfile | 16 + docker/torch-light.dockerfile | 11 + docker/torch-tf-light.dockerfile | 19 + docker/transformers-all-latest-gpu/Dockerfile | 53 +- docker/transformers-doc-builder/Dockerfile | 2 +- .../transformers-pytorch-amd-gpu/Dockerfile | 18 +- .../Dockerfile | 5 +- .../Dockerfile | 10 +- docker/transformers-pytorch-gpu/Dockerfile | 2 +- .../Dockerfile | 66 + docker/transformers-tensorflow-gpu/Dockerfile | 2 +- docs/README.md | 4 +- docs/TRANSLATING.md | 2 +- docs/source/_config.py | 2 +- docs/source/de/_config.py | 2 +- docs/source/de/_toctree.yml | 4 +- docs/source/de/add_new_model.md | 30 +- docs/source/de/add_new_pipeline.md | 18 +- docs/source/de/add_tensorflow_model.md | 356 -- docs/source/de/autoclass_tutorial.md | 12 +- docs/source/de/contributing.md | 334 ++ docs/source/de/index.md | 4 +- docs/source/de/installation.md | 20 +- docs/source/de/llm_tutorial.md | 6 +- docs/source/de/model_sharing.md | 2 +- docs/source/de/peft.md | 4 +- docs/source/de/pipeline_tutorial.md | 6 +- docs/source/de/preprocessing.md | 8 +- docs/source/de/quicktour.md | 6 +- docs/source/de/run_scripts.md | 32 +- docs/source/de/testing.md | 34 +- docs/source/de/training.md | 18 +- docs/source/en/_config.py | 2 +- docs/source/en/_redirects.yml | 2 + docs/source/en/_toctree.yml | 157 +- docs/source/en/add_new_model.md | 110 +- docs/source/en/add_new_pipeline.md | 10 +- 
docs/source/en/add_tensorflow_model.md | 356 -- docs/source/en/agents.md | 564 +++ docs/source/en/attention.md | 6 +- docs/source/en/autoclass_tutorial.md | 75 +- docs/source/en/benchmarks.md | 38 +- docs/source/en/big_models.md | 192 +- docs/source/en/chat_templating.md | 591 ++- docs/source/en/community.md | 8 +- docs/source/en/conversations.md | 290 ++ docs/source/en/create_a_model.md | 111 +- docs/source/en/custom_models.md | 12 +- docs/source/en/custom_tools.md | 789 --- docs/source/en/debugging.md | 70 + docs/source/en/deepspeed.md | 1222 +++++ docs/source/en/generation_strategies.md | 161 +- docs/source/en/gguf.md | 97 + docs/source/en/glossary.md | 16 +- docs/source/en/index.md | 51 +- docs/source/en/installation.md | 14 +- docs/source/en/internal/generation_utils.md | 113 +- docs/source/en/kv_cache.md | 346 ++ docs/source/en/llm_optims.md | 410 ++ docs/source/en/llm_tutorial.md | 20 +- docs/source/en/llm_tutorial_optimization.md | 4 +- docs/source/en/main_classes/agent.md | 79 +- docs/source/en/main_classes/backbones.md | 121 +- docs/source/en/main_classes/callback.md | 2 +- docs/source/en/main_classes/data_collator.md | 5 + docs/source/en/main_classes/deepspeed.md | 2290 +-------- .../source/en/main_classes/image_processor.md | 5 + docs/source/en/main_classes/model.md | 100 +- .../en/main_classes/optimizer_schedules.md | 2 + docs/source/en/main_classes/output.md | 4 +- docs/source/en/main_classes/pipelines.md | 21 +- docs/source/en/main_classes/quantization.md | 32 +- .../source/en/main_classes/text_generation.md | 12 +- .../audio-spectrogram-transformer.md | 28 + docs/source/en/model_doc/auto.md | 6 +- docs/source/en/model_doc/bert-generation.md | 6 +- docs/source/en/model_doc/bert.md | 49 +- docs/source/en/model_doc/blip.md | 2 + docs/source/en/model_doc/camembert.md | 6 +- docs/source/en/model_doc/chameleon.md | 192 + docs/source/en/model_doc/clip.md | 124 +- docs/source/en/model_doc/code_llama.md | 22 +- docs/source/en/model_doc/codegen.md | 1 + docs/source/en/model_doc/cohere.md | 141 + docs/source/en/model_doc/conditional_detr.md | 3 +- docs/source/en/model_doc/dac.md | 80 + docs/source/en/model_doc/dbrx.md | 119 + docs/source/en/model_doc/deformable_detr.md | 1 + docs/source/en/model_doc/deit.md | 28 + docs/source/en/model_doc/depth_anything.md | 119 + docs/source/en/model_doc/depth_anything_v2.md | 115 + docs/source/en/model_doc/deta.md | 11 +- docs/source/en/model_doc/detr.md | 5 +- docs/source/en/model_doc/dinov2.md | 22 +- docs/source/en/model_doc/distilbert.md | 6 +- docs/source/en/model_doc/efficientformer.md | 38 +- docs/source/en/model_doc/encoder-decoder.md | 8 +- docs/source/en/model_doc/ernie_m.md | 8 + docs/source/en/model_doc/falcon_mamba.md | 116 + .../en/model_doc/fastspeech2_conformer.md | 134 + docs/source/en/model_doc/fuyu.md | 2 +- docs/source/en/model_doc/gemma.md | 76 + docs/source/en/model_doc/gemma2.md | 64 + docs/source/en/model_doc/gpt-sw3.md | 6 +- docs/source/en/model_doc/gpt2.md | 125 + docs/source/en/model_doc/gpt_bigcode.md | 2 +- docs/source/en/model_doc/gpt_neox.md | 62 + docs/source/en/model_doc/gptsan-japanese.md | 8 + docs/source/en/model_doc/graphormer.md | 12 +- docs/source/en/model_doc/grounding-dino.md | 118 + docs/source/en/model_doc/hiera.md | 62 + docs/source/en/model_doc/hubert.md | 36 + docs/source/en/model_doc/idefics.md | 10 + docs/source/en/model_doc/idefics2.md | 218 + docs/source/en/model_doc/informer.md | 2 +- docs/source/en/model_doc/instructblip.md | 1 + docs/source/en/model_doc/instructblipvideo.md | 74 + 
docs/source/en/model_doc/jamba.md | 122 + docs/source/en/model_doc/jetmoe.md | 49 + docs/source/en/model_doc/jukebox.md | 12 +- docs/source/en/model_doc/layoutlmv2.md | 2 +- docs/source/en/model_doc/lilt.md | 2 +- docs/source/en/model_doc/llama.md | 10 + docs/source/en/model_doc/llama3.md | 81 + docs/source/en/model_doc/llava.md | 54 +- docs/source/en/model_doc/llava_next.md | 282 ++ docs/source/en/model_doc/llava_next_video.md | 266 + docs/source/en/model_doc/m2m_100.md | 42 + docs/source/en/model_doc/mamba.md | 104 + docs/source/en/model_doc/mamba2.md | 106 + docs/source/en/model_doc/marian.md | 2 +- docs/source/en/model_doc/markuplm.md | 2 +- docs/source/en/model_doc/mask2former.md | 1 + docs/source/en/model_doc/maskformer.md | 3 +- docs/source/en/model_doc/mega.md | 18 +- docs/source/en/model_doc/mgp-str.md | 2 +- docs/source/en/model_doc/mistral.md | 163 +- docs/source/en/model_doc/mixtral.md | 144 +- docs/source/en/model_doc/mms.md | 2 +- docs/source/en/model_doc/mt5.md | 4 + docs/source/en/model_doc/musicgen.md | 2 +- docs/source/en/model_doc/musicgen_melody.md | 288 ++ docs/source/en/model_doc/nat.md | 8 + docs/source/en/model_doc/nemotron.md | 148 + docs/source/en/model_doc/nezha.md | 14 +- docs/source/en/model_doc/nllb.md | 47 +- docs/source/en/model_doc/olmo.md | 45 + docs/source/en/model_doc/owlv2.md | 4 +- docs/source/en/model_doc/paligemma.md | 78 + docs/source/en/model_doc/patchtsmixer.md | 10 +- docs/source/en/model_doc/patchtst.md | 3 + docs/source/en/model_doc/pegasus_x.md | 2 +- docs/source/en/model_doc/persimmon.md | 5 + docs/source/en/model_doc/phi.md | 54 +- docs/source/en/model_doc/phi3.md | 95 + docs/source/en/model_doc/pix2struct.md | 2 +- docs/source/en/model_doc/pop2piano.md | 2 +- docs/source/en/model_doc/prophetnet.md | 2 +- docs/source/en/model_doc/pvt.md | 2 +- docs/source/en/model_doc/pvt_v2.md | 110 + docs/source/en/model_doc/qdqbert.md | 10 +- docs/source/en/model_doc/qwen2.md | 87 + docs/source/en/model_doc/qwen2_audio.md | 198 + docs/source/en/model_doc/qwen2_moe.md | 82 + docs/source/en/model_doc/realm.md | 10 +- docs/source/en/model_doc/recurrent_gemma.md | 48 + docs/source/en/model_doc/reformer.md | 2 +- docs/source/en/model_doc/roberta.md | 24 +- docs/source/en/model_doc/rt_detr.md | 111 + docs/source/en/model_doc/rwkv.md | 6 +- docs/source/en/model_doc/sam.md | 53 +- docs/source/en/model_doc/segformer.md | 6 +- docs/source/en/model_doc/seggpt.md | 91 + docs/source/en/model_doc/siglip.md | 243 + .../en/model_doc/speech-encoder-decoder.md | 4 +- docs/source/en/model_doc/speech_to_text_2.md | 8 + docs/source/en/model_doc/stablelm.md | 111 + docs/source/en/model_doc/starcoder2.md | 73 + docs/source/en/model_doc/superpoint.md | 131 + docs/source/en/model_doc/swiftformer.md | 12 +- docs/source/en/model_doc/t5.md | 43 +- docs/source/en/model_doc/transfo-xl.md | 4 +- docs/source/en/model_doc/tvlt.md | 10 +- docs/source/en/model_doc/udop.md | 113 + docs/source/en/model_doc/umt5.md | 7 +- docs/source/en/model_doc/unispeech-sat.md | 2 +- docs/source/en/model_doc/van.md | 2 +- docs/source/en/model_doc/video_llava.md | 199 + docs/source/en/model_doc/videomae.md | 28 + docs/source/en/model_doc/vipllava.md | 48 +- .../en/model_doc/vision-encoder-decoder.md | 6 +- docs/source/en/model_doc/visual_bert.md | 2 +- docs/source/en/model_doc/vit.md | 35 +- docs/source/en/model_doc/vit_hybrid.md | 36 + docs/source/en/model_doc/vit_mae.md | 28 + docs/source/en/model_doc/vit_msn.md | 28 + docs/source/en/model_doc/wav2vec2-bert.md | 90 + 
.../source/en/model_doc/wav2vec2-conformer.md | 2 + docs/source/en/model_doc/wav2vec2.md | 40 +- docs/source/en/model_doc/wavlm.md | 2 +- docs/source/en/model_doc/whisper.md | 55 +- docs/source/en/model_doc/xclip.md | 2 +- docs/source/en/model_doc/xlm-prophetnet.md | 8 + docs/source/en/model_doc/xlsr_wav2vec2.md | 2 + docs/source/en/model_doc/yolos.md | 29 + docs/source/en/model_doc/zoedepth.md | 108 + docs/source/en/model_memory_anatomy.md | 6 +- docs/source/en/model_sharing.md | 4 +- docs/source/en/model_summary.md | 4 +- docs/source/en/multilingual.md | 34 +- docs/source/en/peft.md | 19 +- docs/source/en/perf_hardware.md | 8 +- docs/source/en/perf_infer_cpu.md | 2 +- docs/source/en/perf_infer_gpu_one.md | 112 +- docs/source/en/perf_torch_compile.md | 4 +- docs/source/en/perf_train_cpu.md | 31 +- docs/source/en/perf_train_cpu_many.md | 106 +- docs/source/en/perf_train_gpu_many.md | 35 +- docs/source/en/perf_train_gpu_one.md | 81 +- docs/source/en/perf_train_special.md | 2 +- docs/source/en/perplexity.md | 2 +- docs/source/en/pipeline_tutorial.md | 47 +- docs/source/en/pipeline_webserver.md | 2 +- docs/source/en/pr_checks.md | 2 +- docs/source/en/preprocessing.md | 14 +- docs/source/en/quantization.md | 586 --- docs/source/en/quantization/aqlm.md | 57 + docs/source/en/quantization/awq.md | 232 + docs/source/en/quantization/bitsandbytes.md | 308 ++ docs/source/en/quantization/contribute.md | 69 + docs/source/en/quantization/eetq.md | 47 + docs/source/en/quantization/fbgemm_fp8.md | 58 + docs/source/en/quantization/gptq.md | 120 + docs/source/en/quantization/hqq.md | 69 + docs/source/en/quantization/optimum.md | 19 + docs/source/en/quantization/overview.md | 59 + docs/source/en/quantization/quanto.md | 66 + docs/source/en/quantization/torchao.md | 45 + docs/source/en/quicktour.md | 20 +- docs/source/en/run_scripts.md | 28 +- docs/source/en/serialization.md | 8 +- docs/source/en/task_summary.md | 4 +- docs/source/en/tasks/asr.md | 9 +- docs/source/en/tasks/audio_classification.md | 9 +- .../en/tasks/document_question_answering.md | 10 +- docs/source/en/tasks/idefics.md | 4 +- docs/source/en/tasks/image_captioning.md | 4 +- docs/source/en/tasks/image_classification.md | 11 +- .../en/tasks/image_feature_extraction.md | 134 + docs/source/en/tasks/image_text_to_text.md | 232 + ...e_distillation_for_image_classification.md | 2 +- docs/source/en/tasks/language_modeling.md | 97 +- docs/source/en/tasks/mask_generation.md | 238 + .../en/tasks/masked_language_modeling.md | 96 +- .../en/tasks/monocular_depth_estimation.md | 142 +- docs/source/en/tasks/multiple_choice.md | 21 +- docs/source/en/tasks/object_detection.md | 1502 ++++-- docs/source/en/tasks/prompting.md | 8 +- docs/source/en/tasks/question_answering.md | 21 +- docs/source/en/tasks/semantic_segmentation.md | 207 +- .../en/tasks/sequence_classification.md | 20 +- docs/source/en/tasks/summarization.md | 15 +- docs/source/en/tasks/text-to-speech.md | 10 +- docs/source/en/tasks/token_classification.md | 17 +- docs/source/en/tasks/translation.md | 22 +- docs/source/en/tasks/video_classification.md | 36 +- .../tasks/zero_shot_image_classification.md | 4 +- .../en/tasks/zero_shot_object_detection.md | 10 +- docs/source/en/tasks_explained.md | 2 +- docs/source/en/testing.md | 57 +- docs/source/en/tf_xla.md | 16 +- docs/source/en/tflite.md | 4 +- docs/source/en/tokenizer_summary.md | 14 +- docs/source/en/torchscript.md | 4 +- docs/source/en/trainer.md | 239 +- docs/source/en/training.md | 15 +- docs/source/en/transformers_agents.md | 323 -- 
docs/source/en/troubleshooting.md | 6 +- docs/source/es/_config.py | 2 +- docs/source/es/_toctree.yml | 107 +- docs/source/es/add_new_pipeline.md | 10 +- docs/source/es/attention.md | 41 + docs/source/es/autoclass_tutorial.md | 12 +- docs/source/es/chat_templating.md | 393 ++ docs/source/es/community.md | 6 +- .../source/es/converting_tensorflow_models.md | 2 +- docs/source/es/create_a_model.md | 22 +- docs/source/es/glossary.md | 12 +- docs/source/es/index.md | 4 +- docs/source/es/installation.md | 18 +- docs/source/es/model_memory_anatomy.md | 239 + docs/source/es/model_sharing.md | 2 +- docs/source/es/multilingual.md | 34 +- docs/source/es/performance.md | 61 + docs/source/es/perplexity.md | 2 +- docs/source/es/pipeline_tutorial.md | 289 +- docs/source/es/pipeline_webserver.md | 128 + docs/source/es/preprocessing.md | 4 +- docs/source/es/run_scripts.md | 28 +- docs/source/es/serialization.md | 28 +- docs/source/es/task_summary.md | 340 ++ docs/source/es/tasks/asr.md | 2 +- docs/source/es/tasks/image_captioning.md | 266 + docs/source/es/tasks/image_classification.md | 2 +- docs/source/es/tasks/language_modeling.md | 24 +- docs/source/es/tasks/multiple_choice.md | 10 +- docs/source/es/tasks/question_answering.md | 10 +- docs/source/es/tasks/summarization.md | 10 +- docs/source/es/tasks_explained.md | 295 ++ docs/source/es/tokenizer_summary.md | 175 + docs/source/es/torchscript.md | 167 + docs/source/es/trainer.md | 409 ++ docs/source/es/training.md | 12 +- docs/source/fr/_config.py | 2 +- docs/source/fr/_toctree.yml | 4 +- docs/source/fr/autoclass_tutorial.md | 56 +- docs/source/fr/index.md | 8 +- docs/source/fr/installation.md | 14 +- docs/source/fr/quicktour.md | 16 +- docs/source/fr/run_scripts_fr.md | 355 ++ docs/source/fr/tutoriel_pipeline.md | 313 ++ docs/source/hi/pipeline_tutorial.md | 16 +- docs/source/it/_config.py | 2 +- docs/source/it/add_new_model.md | 13 +- docs/source/it/add_new_pipeline.md | 10 +- docs/source/it/autoclass_tutorial.md | 12 +- docs/source/it/big_models.md | 2 +- docs/source/it/community.md | 6 +- .../source/it/converting_tensorflow_models.md | 26 +- docs/source/it/create_a_model.md | 22 +- docs/source/it/index.md | 4 +- docs/source/it/installation.md | 14 +- docs/source/it/migration.md | 469 +- docs/source/it/model_sharing.md | 2 +- docs/source/it/multilingual.md | 34 +- docs/source/it/perf_hardware.md | 8 +- docs/source/it/perf_infer_gpu_one.md | 10 +- docs/source/it/perf_train_cpu.md | 2 +- docs/source/it/perf_train_cpu_many.md | 4 +- docs/source/it/pipeline_tutorial.md | 4 +- docs/source/it/preprocessing.md | 4 +- docs/source/it/run_scripts.md | 28 +- docs/source/it/serialization.md | 33 +- docs/source/it/training.md | 12 +- docs/source/ja/_toctree.yml | 14 +- docs/source/ja/add_new_model.md | 23 +- docs/source/ja/add_tensorflow_model.md | 296 -- docs/source/ja/attention.md | 6 +- docs/source/ja/autoclass_tutorial.md | 12 +- docs/source/ja/benchmarks.md | 38 +- docs/source/ja/big_models.md | 2 +- docs/source/ja/chat_templating.md | 18 +- docs/source/ja/community.md | 8 +- docs/source/ja/create_a_model.md | 22 +- docs/source/ja/custom_tools.md | 744 +-- docs/source/ja/generation_strategies.md | 24 +- docs/source/ja/glossary.md | 20 +- docs/source/ja/index.md | 4 +- docs/source/ja/installation.md | 14 +- docs/source/ja/internal/generation_utils.md | 44 +- .../ja/internal/image_processing_utils.md | 2 +- docs/source/ja/internal/trainer_utils.md | 2 +- docs/source/ja/main_classes/agent.md | 87 +- docs/source/ja/main_classes/callback.md | 2 +- 
docs/source/ja/main_classes/deepspeed.md | 27 +- docs/source/ja/main_classes/output.md | 4 +- docs/source/ja/main_classes/pipelines.md | 18 +- docs/source/ja/main_classes/quantization.md | 8 +- .../source/ja/main_classes/text_generation.md | 7 - docs/source/ja/main_classes/trainer.md | 9 +- docs/source/ja/model_doc/auto.md | 2 +- docs/source/ja/model_doc/bart.md | 2 +- docs/source/ja/model_doc/bert-generation.md | 6 +- docs/source/ja/model_doc/bert.md | 2 +- docs/source/ja/model_doc/bridgetower.md | 2 +- docs/source/ja/model_doc/clip.md | 2 +- docs/source/ja/model_doc/code_llama.md | 11 +- docs/source/ja/model_doc/cpm.md | 2 +- docs/source/ja/model_doc/ctrl.md | 6 +- docs/source/ja/model_doc/deberta-v2.md | 3 +- docs/source/ja/model_doc/deit.md | 148 + docs/source/ja/model_doc/deplot.md | 65 + docs/source/ja/model_doc/deta.md | 64 + docs/source/ja/model_doc/detr.md | 217 + docs/source/ja/model_doc/dialogpt.md | 57 + docs/source/ja/model_doc/dinat.md | 93 + docs/source/ja/model_memory_anatomy.md | 6 +- docs/source/ja/model_sharing.md | 2 +- docs/source/ja/multilingual.md | 34 +- docs/source/ja/pad_truncation.md | 2 +- docs/source/ja/peft.md | 4 +- docs/source/ja/perf_hardware.md | 6 +- docs/source/ja/perf_infer_gpu_one.md | 10 +- docs/source/ja/perf_torch_compile.md | 4 +- docs/source/ja/perf_train_cpu.md | 4 +- docs/source/ja/perf_train_cpu_many.md | 10 +- docs/source/ja/perf_train_gpu_many.md | 8 +- docs/source/ja/perf_train_gpu_one.md | 12 +- docs/source/ja/perplexity.md | 2 +- docs/source/ja/pipeline_tutorial.md | 10 +- docs/source/ja/pipeline_webserver.md | 2 +- docs/source/ja/preprocessing.md | 2 +- docs/source/ja/quicktour.md | 16 +- docs/source/ja/run_scripts.md | 26 +- docs/source/ja/serialization.md | 8 +- docs/source/ja/task_summary.md | 4 +- docs/source/ja/tasks/asr.md | 9 +- docs/source/ja/tasks/audio_classification.md | 11 +- .../ja/tasks/document_question_answering.md | 11 +- docs/source/ja/tasks/idefics.md | 2 +- docs/source/ja/tasks/image_captioning.md | 4 +- docs/source/ja/tasks/image_classification.md | 9 +- ...e_distillation_for_image_classification.md | 6 +- docs/source/ja/tasks/language_modeling.md | 21 +- .../ja/tasks/masked_language_modeling.md | 18 +- .../ja/tasks/monocular_depth_estimation.md | 7 +- docs/source/ja/tasks/multiple_choice.md | 21 +- docs/source/ja/tasks/object_detection.md | 7 +- docs/source/ja/tasks/prompting.md | 4 +- docs/source/ja/tasks/question_answering.md | 19 +- docs/source/ja/tasks/semantic_segmentation.md | 19 +- .../ja/tasks/sequence_classification.md | 21 +- docs/source/ja/tasks/summarization.md | 15 +- docs/source/ja/tasks/text-to-speech.md | 2 +- docs/source/ja/tasks/token_classification.md | 16 +- docs/source/ja/tasks/translation.md | 20 +- docs/source/ja/tasks/video_classification.md | 11 +- docs/source/ja/testing.md | 15 +- docs/source/ja/tf_xla.md | 12 +- docs/source/ja/tflite.md | 4 +- docs/source/ja/tokenizer_summary.md | 4 +- docs/source/ja/torchscript.md | 4 +- docs/source/ja/training.md | 14 +- docs/source/ja/troubleshooting.md | 7 +- docs/source/ko/_config.py | 2 +- docs/source/ko/_toctree.yml | 205 +- docs/source/ko/add_new_model.md | 20 +- docs/source/ko/add_new_pipeline.md | 10 +- docs/source/ko/add_tensorflow_model.md | 262 - docs/source/ko/attention.md | 6 +- docs/source/ko/autoclass_tutorial.md | 12 +- docs/source/ko/big_models.md | 2 +- docs/source/ko/chat_templating.md | 720 +++ docs/source/ko/community.md | 4 +- docs/source/ko/contributing.md | 22 +- docs/source/ko/create_a_model.md | 22 +- 
docs/source/ko/custom_tools.md | 748 --- docs/source/ko/deepspeed.md | 1220 +++++ docs/source/ko/fsdp.md | 138 + docs/source/ko/generation_strategies.md | 337 ++ docs/source/ko/index.md | 4 +- docs/source/ko/installation.md | 16 +- docs/source/ko/llm_tutorial_optimization.md | 759 +++ docs/source/ko/main_classes/agent.md | 134 + docs/source/ko/model_memory_anatomy.md | 6 +- docs/source/ko/model_sharing.md | 2 +- docs/source/ko/multilingual.md | 34 +- docs/source/ko/pad_truncation.md | 2 +- docs/source/ko/peft.md | 4 +- docs/source/ko/perf_hardware.md | 8 +- docs/source/ko/perf_infer_gpu_many.md | 27 - docs/source/ko/perf_infer_gpu_one.md | 10 +- docs/source/ko/perf_train_cpu.md | 4 +- docs/source/ko/perf_train_cpu_many.md | 10 +- docs/source/ko/perf_train_gpu_many.md | 8 +- docs/source/ko/perplexity.md | 2 +- docs/source/ko/pipeline_tutorial.md | 2 +- docs/source/ko/pipeline_webserver.md | 2 +- docs/source/ko/preprocessing.md | 2 +- docs/source/ko/quantization/awq.md | 233 + docs/source/ko/quantization/bitsandbytes.md | 307 ++ docs/source/ko/quantization/eetq.md | 47 + docs/source/ko/quantization/gptq.md | 120 + docs/source/ko/quantization/quanto.md | 67 + docs/source/ko/quicktour.md | 16 +- docs/source/ko/run_scripts.md | 26 +- docs/source/ko/serialization.md | 8 +- docs/source/ko/task_summary.md | 2 +- docs/source/ko/tasks/asr.md | 9 +- docs/source/ko/tasks/audio_classification.md | 9 +- .../ko/tasks/document_question_answering.md | 10 +- docs/source/ko/tasks/idefics.md | 391 ++ docs/source/ko/tasks/image_captioning.md | 4 +- docs/source/ko/tasks/image_classification.md | 8 +- .../ko/tasks/image_feature_extraction.md | 136 + docs/source/ko/tasks/image_to_image.md | 132 + ...e_distillation_for_image_classification.md | 193 + docs/source/ko/tasks/language_modeling.md | 20 +- docs/source/ko/tasks/mask_generation.md | 228 + .../ko/tasks/masked_language_modeling.md | 19 +- .../ko/tasks/monocular_depth_estimation.md | 7 +- docs/source/ko/tasks/multiple_choice.md | 21 +- docs/source/ko/tasks/object_detection.md | 7 +- docs/source/ko/tasks/prompting.md | 384 ++ docs/source/ko/tasks/question_answering.md | 18 +- docs/source/ko/tasks/semantic_segmentation.md | 18 +- .../ko/tasks/sequence_classification.md | 18 +- docs/source/ko/tasks/summarization.md | 45 +- docs/source/ko/tasks/token_classification.md | 17 +- docs/source/ko/tasks/translation.md | 22 +- docs/source/ko/tasks/video_classification.md | 12 +- docs/source/ko/testing.md | 192 +- docs/source/ko/tf_xla.md | 12 +- docs/source/ko/tflite.md | 4 +- docs/source/ko/tokenizer_summary.md | 4 +- docs/source/ko/torchscript.md | 4 +- docs/source/ko/trainer.md | 596 +++ docs/source/ko/training.md | 14 +- docs/source/ko/troubleshooting.md | 6 +- docs/source/ms/_toctree.yml | 2 - docs/source/ms/index.md | 4 +- docs/source/pt/_config.py | 2 +- .../source/pt/converting_tensorflow_models.md | 6 +- docs/source/pt/create_a_model.md | 22 +- docs/source/pt/index.md | 4 +- docs/source/pt/installation.md | 14 +- docs/source/pt/multilingual.md | 34 +- docs/source/pt/pipeline_tutorial.md | 6 +- docs/source/pt/quicktour.md | 4 +- docs/source/pt/run_scripts.md | 28 +- docs/source/pt/serialization.md | 24 +- .../pt/tasks/sequence_classification.md | 8 +- docs/source/pt/tasks/token_classification.md | 10 +- docs/source/pt/training.md | 12 +- docs/source/te/quicktour.md | 18 +- docs/source/zh/_toctree.yml | 19 + docs/source/zh/add_new_pipeline.md | 238 + docs/source/zh/autoclass_tutorial.md | 14 +- docs/source/zh/big_models.md | 4 +- docs/source/zh/chat_templating.md 
| 434 ++ docs/source/zh/contributing.md | 331 ++ docs/source/zh/create_a_model.md | 22 +- docs/source/zh/fsdp.md | 161 + docs/source/zh/index.md | 613 +-- docs/source/zh/installation.md | 16 +- docs/source/zh/internal/generation_utils.md | 46 +- docs/source/zh/llm_tutorial.md | 2 +- docs/source/zh/main_classes/agent.md | 81 +- docs/source/zh/main_classes/callback.md | 2 +- docs/source/zh/main_classes/deepspeed.md | 25 +- docs/source/zh/main_classes/output.md | 4 +- docs/source/zh/main_classes/pipelines.md | 18 +- docs/source/zh/main_classes/quantization.md | 8 +- .../source/zh/main_classes/text_generation.md | 7 - docs/source/zh/main_classes/trainer.md | 6 +- docs/source/zh/model_sharing.md | 2 +- docs/source/zh/multilingual.md | 34 +- docs/source/zh/peft.md | 4 +- docs/source/zh/perf_hardware.md | 6 +- docs/source/zh/perf_torch_compile.md | 2 +- docs/source/zh/philosophy.md | 67 + docs/source/zh/pipeline_tutorial.md | 8 +- docs/source/zh/preprocessing.md | 2 +- docs/source/zh/quicktour.md | 16 +- docs/source/zh/run_scripts.md | 28 +- docs/source/zh/serialization.md | 8 +- docs/source/zh/task_summary.md | 8 +- docs/source/zh/tasks/asr.md | 392 ++ docs/source/zh/tf_xla.md | 12 +- docs/source/zh/tflite.md | 4 +- docs/source/zh/tokenizer_summary.md | 4 +- docs/source/zh/torchscript.md | 197 + docs/source/zh/training.md | 14 +- examples/README.md | 18 +- examples/diff-conversion/README.md | 20 + examples/diff-conversion/convert_examples.sh | 10 + examples/diff-conversion/diff_dummy.py | 44 + examples/diff-conversion/diff_my_new_model.py | 14 + .../diff-conversion/diff_my_new_model2.py | 31 + examples/diff-conversion/diff_new_model.py | 30 + examples/diff-conversion/diff_super.py | 38 + examples/flax/_tests_requirements.txt | 6 +- examples/flax/conftest.py | 2 +- examples/flax/image-captioning/README.md | 2 +- .../run_image_captioning_flax.py | 40 +- examples/flax/language-modeling/README.md | 70 +- .../language-modeling/run_bart_dlm_flax.py | 49 +- .../flax/language-modeling/run_clm_flax.py | 47 +- .../flax/language-modeling/run_mlm_flax.py | 45 +- .../flax/language-modeling/run_t5_mlm_flax.py | 49 +- .../language-modeling/t5_tokenizer_model.py | 8 +- examples/flax/question-answering/README.md | 4 +- examples/flax/question-answering/run_qa.py | 51 +- examples/flax/question-answering/utils_qa.py | 1 + .../run_flax_speech_recognition_seq2seq.py | 33 +- .../summarization/run_summarization_flax.py | 40 +- examples/flax/test_flax_examples.py | 15 +- examples/flax/text-classification/README.md | 2 +- .../flax/text-classification/run_flax_glue.py | 28 +- examples/flax/token-classification/README.md | 2 +- .../flax/token-classification/run_flax_ner.py | 29 +- examples/flax/vision/requirements.txt | 2 +- .../flax/vision/run_image_classification.py | 35 +- examples/legacy/benchmarking/README.md | 4 +- examples/legacy/benchmarking/plot_csv_file.py | 12 +- examples/legacy/benchmarking/run_benchmark.py | 2 +- .../multiple_choice/run_multiple_choice.py | 3 +- .../multiple_choice/utils_multiple_choice.py | 3 +- .../legacy/pytorch-lightning/requirements.txt | 2 +- examples/legacy/question-answering/README.md | 10 +- .../legacy/question-answering/run_squad.py | 9 +- .../question-answering/run_squad_trainer.py | 3 +- examples/legacy/run_camembert.py | 4 +- examples/legacy/run_language_modeling.py | 1 - examples/legacy/run_openai_gpt.py | 5 +- examples/legacy/run_swag.py | 9 +- examples/legacy/run_transfo_xl.py | 11 +- examples/legacy/seq2seq/README.md | 10 +- examples/legacy/seq2seq/finetune.sh | 2 +- 
examples/legacy/seq2seq/finetune_tpu.sh | 2 +- examples/legacy/seq2seq/finetune_trainer.py | 2 +- examples/legacy/seq2seq/old_test_datasets.py | 2 +- examples/legacy/seq2seq/pack_dataset.py | 2 +- examples/legacy/seq2seq/requirements.txt | 2 +- .../legacy/seq2seq/run_distributed_eval.py | 4 +- examples/legacy/seq2seq/run_eval.py | 4 +- examples/legacy/seq2seq/seq2seq_trainer.py | 6 +- .../legacy/seq2seq/seq2seq_training_args.py | 4 +- .../seq2seq/train_distil_marian_enro.sh | 2 +- .../seq2seq/train_distil_marian_enro_tpu.sh | 2 +- .../legacy/seq2seq/train_distilbart_cnn.sh | 2 +- .../legacy/seq2seq/train_mbart_cc25_enro.sh | 2 +- examples/legacy/seq2seq/xla_spawn.py | 1 - .../run_tf_text_classification.py | 313 -- .../legacy/token-classification/README.md | 8 +- .../legacy/token-classification/run_ner.py | 3 +- .../legacy/token-classification/run_tf_ner.py | 310 -- .../legacy/token-classification/utils_ner.py | 5 +- examples/pytorch/README.md | 20 +- examples/pytorch/_tests_requirements.txt | 9 +- .../pytorch/audio-classification/README.md | 4 +- .../run_audio_classification.py | 27 +- examples/pytorch/conftest.py | 2 +- .../pytorch/contrastive-image-text/README.md | 4 +- .../contrastive-image-text/run_clip.py | 41 +- .../pytorch/image-classification/README.md | 9 +- .../image-classification/requirements.txt | 2 +- .../run_image_classification.py | 74 +- .../run_image_classification_no_trainer.py | 96 +- examples/pytorch/image-pretraining/README.md | 8 +- examples/pytorch/image-pretraining/run_mae.py | 29 +- examples/pytorch/image-pretraining/run_mim.py | 25 +- .../image-pretraining/run_mim_no_trainer.py | 58 +- .../pytorch/instance-segmentation/README.md | 235 + .../instance-segmentation/requirements.txt | 5 + .../run_instance_segmentation.py | 480 ++ .../run_instance_segmentation_no_trainer.py | 744 +++ examples/pytorch/language-modeling/README.md | 73 +- .../language-modeling/requirements.txt | 2 +- examples/pytorch/language-modeling/run_clm.py | 37 +- .../language-modeling/run_clm_no_trainer.py | 52 +- examples/pytorch/language-modeling/run_fim.py | 864 ++++ .../language-modeling/run_fim_no_trainer.py | 916 ++++ examples/pytorch/language-modeling/run_mlm.py | 54 +- .../language-modeling/run_mlm_no_trainer.py | 52 +- examples/pytorch/language-modeling/run_plm.py | 36 +- examples/pytorch/multiple-choice/README.md | 6 +- examples/pytorch/multiple-choice/run_swag.py | 23 +- .../multiple-choice/run_swag_no_trainer.py | 50 +- examples/pytorch/object-detection/README.md | 233 + .../pytorch/object-detection/requirements.txt | 5 + .../object-detection/run_object_detection.py | 523 ++ .../run_object_detection_no_trainer.py | 782 +++ examples/pytorch/old_test_xla_examples.py | 6 +- examples/pytorch/question-answering/README.md | 12 +- examples/pytorch/question-answering/run_qa.py | 45 +- .../question-answering/run_qa_beam_search.py | 49 +- .../run_qa_beam_search_no_trainer.py | 72 +- .../question-answering/run_qa_no_trainer.py | 62 +- .../question-answering/run_seq2seq_qa.py | 31 +- .../pytorch/question-answering/trainer_qa.py | 5 +- .../question-answering/trainer_seq2seq_qa.py | 5 +- .../pytorch/question-answering/utils_qa.py | 1 + .../pytorch/semantic-segmentation/README.md | 9 +- .../semantic-segmentation/requirements.txt | 6 +- .../run_semantic_segmentation.py | 249 +- .../run_semantic_segmentation_no_trainer.py | 278 +- .../run_wav2vec2_pretraining_no_trainer.py | 43 +- examples/pytorch/speech-recognition/README.md | 33 +- .../run_speech_recognition_ctc.py | 72 +- 
.../run_speech_recognition_ctc_adapter.py | 31 +- .../run_speech_recognition_seq2seq.py | 71 +- examples/pytorch/summarization/README.md | 12 +- .../summarization/run_summarization.py | 37 +- .../run_summarization_no_trainer.py | 57 +- examples/pytorch/test_accelerate_examples.py | 65 +- examples/pytorch/test_pytorch_examples.py | 89 +- .../pytorch/text-classification/README.md | 16 +- .../text-classification/run_classification.py | 58 +- .../pytorch/text-classification/run_glue.py | 37 +- .../run_glue_no_trainer.py | 37 +- .../pytorch/text-classification/run_xnli.py | 26 +- examples/pytorch/text-generation/README.md | 4 +- .../pytorch/text-generation/run_generation.py | 4 +- .../run_generation_contrastive_search.py | 5 +- .../pytorch/token-classification/README.md | 22 +- .../pytorch/token-classification/run_ner.py | 33 +- .../run_ner_no_trainer.py | 43 +- examples/pytorch/translation/README.md | 8 +- .../pytorch/translation/run_translation.py | 50 +- .../translation/run_translation_no_trainer.py | 44 +- examples/pytorch/xla_spawn.py | 1 - examples/research_projects/README.md | 2 +- .../adversarial/requirements.txt | 2 +- .../research_projects/adversarial/run_hans.py | 2 +- .../bert-loses-patience/README.md | 2 +- .../pabee/modeling_pabee_albert.py | 6 +- .../pabee/modeling_pabee_bert.py | 7 +- .../bert-loses-patience/requirements.txt | 2 +- .../run_glue_with_pabee.py | 9 +- .../test_run_glue_with_pabee.py | 2 +- examples/research_projects/bertabs/README.md | 2 +- .../bertabs/configuration_bertabs.py | 3 +- ...ert_bertabs_original_pytorch_checkpoint.py | 4 +- .../bertabs/modeling_bertabs.py | 16 +- .../bertabs/requirements.txt | 2 +- .../bertabs/run_summarization.py | 2 +- .../bertology/requirements.txt | 2 +- .../bertology/run_bertology.py | 13 +- .../bertology/run_prune_gpt.py | 2 +- .../research_projects/codeparrot/README.md | 8 +- .../codeparrot/examples/requirements.txt | 4 +- .../examples/train_complexity_predictor.py | 2 +- .../codeparrot/requirements.txt | 4 +- .../codeparrot/scripts/arguments.py | 6 +- .../codeparrot/scripts/preprocessing.py | 2 +- .../decision_transformer/requirements.txt | 48 +- examples/research_projects/deebert/README.md | 2 +- .../deebert/requirements.txt | 2 +- .../deebert/run_glue_deebert.py | 6 +- .../deebert/test_glue_deebert.py | 12 +- .../research_projects/distillation/README.md | 2 +- .../distillation/distiller.py | 5 +- .../distillation/grouped_batch_sampler.py | 6 +- .../distillation/lm_seqs_dataset.py | 5 +- .../distillation/requirements.txt | 2 +- .../distillation/run_squad_w_distillation.py | 8 +- .../distillation/scripts/binarized_data.py | 1 + .../distillation/scripts/extract.py | 1 + .../scripts/extract_distilbert.py | 1 + .../distillation/scripts/token_counts.py | 1 + .../research_projects/distillation/train.py | 1 + .../research_projects/distillation/utils.py | 5 +- .../fsner/src/fsner/tokenizer_utils.py | 2 +- .../information-gain-filtration/README.md | 4 +- .../information-gain-filtration/igf/igf.py | 4 +- .../run_clm_igf.py | 25 +- .../research_projects/jax-projects/README.md | 42 +- .../jax-projects/big_bird/README.md | 2 +- .../jax-projects/big_bird/evaluate.py | 1 - .../jax-projects/dataset-streaming/README.md | 12 +- .../dataset-streaming/run_mlm_flax_stream.py | 1 + .../jax-projects/hybrid_clip/README.md | 16 +- .../hybrid_clip/modeling_hybrid_clip.py | 6 +- .../jax-projects/hybrid_clip/requirements.txt | 2 +- .../hybrid_clip/run_hybrid_clip.py | 3 - .../jax-projects/model_parallel/README.md | 4 +- 
.../jax-projects/model_parallel/run_clm_mp.py | 6 +- .../jax-projects/wav2vec2/README.md | 6 +- .../wav2vec2/run_wav2vec2_pretrain_flax.py | 2 +- .../research_projects/layoutlmv3/README.md | 4 +- .../research_projects/longform-qa/eli5_app.py | 2 +- .../luke/run_luke_ner_no_trainer.py | 3 +- .../lxmert/modeling_frcnn.py | 41 +- .../lxmert/processing_image.py | 33 +- .../research_projects/lxmert/requirements.txt | 18 +- examples/research_projects/lxmert/utils.py | 32 +- .../lxmert/visualizing_image.py | 27 +- examples/research_projects/mlm_wwm/README.md | 6 +- .../research_projects/mlm_wwm/run_mlm_wwm.py | 3 +- examples/research_projects/mm-imdb/README.md | 4 +- .../research_projects/mm-imdb/run_mmimdb.py | 9 +- .../movement-pruning/README.md | 10 +- .../movement-pruning/counts_parameters.py | 1 + .../emmental/configuration_bert_masked.py | 3 +- .../emmental/modeling_bert_masked.py | 1 - .../emmental/modules/binarizer.py | 2 +- .../movement-pruning/masked_run_glue.py | 4 +- .../movement-pruning/masked_run_squad.py | 5 +- .../onnx/summarization/run_onnx_exporter.py | 3 +- .../research_projects/performer/README.md | 4 +- .../modeling_flax_performer_utils.py | 4 +- .../performer/run_mlm_performer.py | 4 +- .../research_projects/pplm/requirements.txt | 2 +- examples/research_projects/pplm/run_pplm.py | 8 +- .../pplm/run_pplm_discrim_train.py | 9 +- .../quantization-qdqbert/README.md | 56 +- .../evaluate-hf-trt-qa.py | 5 +- .../quantization-qdqbert/quant_trainer.py | 1 + .../quantization-qdqbert/run_quant_qa.py | 4 +- .../quantization-qdqbert/trainer_quant_qa.py | 4 +- .../quantization-qdqbert/utils_qa.py | 1 + .../rag-end2end-retriever/eval_rag.py | 2 +- .../rag-end2end-retriever/lightning_base.py | 2 +- examples/research_projects/rag/README.md | 2 +- examples/research_projects/rag/eval_rag.py | 2 +- .../robust-speech-event/README.md | 8 +- .../run_speech_recognition_ctc_bnb.py | 2 +- .../run_speech_recognition_ctc_streaming.py | 2 +- .../README.md | 2 +- .../finetuning.py | 10 +- .../self-training-text-classification/run.sh | 2 +- .../selftraining.py | 8 +- .../seq2seq-distillation/README.md | 2 +- .../_test_seq2seq_examples.py | 4 +- .../seq2seq-distillation/run_eval.py | 2 +- examples/research_projects/tapex/README.md | 12 +- .../research_projects/tapex/wikisql_utils.py | 8 +- .../research_projects/token-healing/README.md | 40 + .../token-healing/run_token_healing.py | 62 + .../visual_bert/modeling_frcnn.py | 41 +- .../visual_bert/processing_image.py | 33 +- .../visual_bert/requirements.txt | 18 +- .../research_projects/visual_bert/utils.py | 32 +- .../visual_bert/visualizing_image.py | 27 +- .../research_projects/vqgan-clip/README.md | 6 +- .../vqgan-clip/requirements.txt | 2 +- .../wav2vec2/FINE_TUNE_XLSR_WAV2VEC2.md | 12 +- examples/research_projects/wav2vec2/README.md | 20 +- .../wav2vec2/finetune_base_100.sh | 2 +- .../wav2vec2/finetune_base_timit_asr.sh | 2 +- .../wav2vec2/finetune_large_lv60_100.sh | 2 +- .../wav2vec2/finetune_large_lv60_timit_asr.sh | 2 +- ...tune_large_xlsr_53_arabic_speech_corpus.sh | 2 +- .../finetune_wav2vec2_xlsr_turkish.sh | 2 +- .../wav2vec2/run_common_voice.py | 4 +- .../wav2vec2/test_wav2vec2_deepspeed.py | 2 +- examples/research_projects/xtreme-s/README.md | 4 +- .../xtreme-s/run_xtreme_s.py | 2 +- .../zero-shot-distillation/README.md | 2 +- examples/tensorflow/README.md | 8 +- examples/tensorflow/_tests_requirements.txt | 2 +- examples/tensorflow/benchmarking/README.md | 4 +- .../tensorflow/benchmarking/plot_csv_file.py | 12 +- 
.../benchmarking/run_benchmark_tf.py | 2 +- .../contrastive-image-text/README.md | 2 +- .../contrastive-image-text/run_clip.py | 33 +- .../tensorflow/image-classification/README.md | 6 +- .../run_image_classification.py | 37 +- .../prepare_tfrecord_shards.py | 13 +- .../language-modeling-tpu/requirements.txt | 2 +- .../language-modeling-tpu/run_mlm.py | 18 +- .../language-modeling-tpu/train_unigram.py | 15 +- .../tensorflow/language-modeling/README.md | 16 +- .../tensorflow/language-modeling/run_clm.py | 25 +- .../tensorflow/language-modeling/run_mlm.py | 28 +- examples/tensorflow/multiple-choice/README.md | 2 +- .../tensorflow/multiple-choice/run_swag.py | 25 +- .../tensorflow/question-answering/README.md | 18 +- .../tensorflow/question-answering/run_qa.py | 50 +- .../tensorflow/question-answering/utils_qa.py | 1 + .../summarization/run_summarization.py | 39 +- .../tensorflow/test_tensorflow_examples.py | 31 +- .../tensorflow/text-classification/README.md | 15 +- .../text-classification/run_glue.py | 30 +- .../run_text_classification.py | 40 +- .../tensorflow/token-classification/README.md | 4 +- .../token-classification/run_ner.py | 28 +- examples/tensorflow/translation/README.md | 4 +- .../tensorflow/translation/run_translation.py | 29 +- hubconf.py | 28 +- i18n/README_ar.md | 317 ++ i18n/README_de.md | 317 ++ i18n/README_es.md | 295 ++ i18n/README_fr.md | 314 ++ i18n/README_hd.md | 269 + i18n/README_ja.md | 329 ++ i18n/README_ko.md | 244 + i18n/README_pt-br.md | 326 ++ i18n/README_ru.md | 316 ++ i18n/README_te.md | 315 ++ i18n/README_vi.md | 317 ++ i18n/README_zh-hans.md | 268 + i18n/README_zh-hant.md | 280 ++ notebooks/README.md | 2 +- pyproject.toml | 13 +- scripts/benchmark/trainer-benchmark.py | 4 +- scripts/check_tokenizers.py | 2 +- scripts/tatoeba/README.md | 2 +- setup.py | 87 +- src/transformers/__init__.py | 2920 +++++------ src/transformers/activations_tf.py | 31 +- .../{tools => agents}/__init__.py | 24 +- .../{tools => agents}/agent_types.py | 108 +- src/transformers/agents/agents.py | 1101 ++++ src/transformers/agents/default_tools.py | 188 + .../document_question_answering.py | 25 +- .../{tools => agents}/evaluate_agent.py | 306 +- .../image_question_answering.py | 21 +- src/transformers/agents/llm_engine.py | 104 + src/transformers/agents/monitoring.py | 75 + src/transformers/agents/prompts.py | 785 +++ src/transformers/agents/python_interpreter.py | 912 ++++ .../{tools => agents}/speech_to_text.py | 20 +- .../{tools => agents}/text_to_speech.py | 18 +- .../{tools/base.py => agents/tools.py} | 402 +- .../{tools => agents}/translation.py | 26 +- src/transformers/audio_utils.py | 410 +- src/transformers/benchmark/benchmark.py | 3 +- src/transformers/benchmark/benchmark_args.py | 18 +- .../benchmark/benchmark_args_utils.py | 2 +- src/transformers/benchmark/benchmark_tf.py | 3 +- src/transformers/benchmark/benchmark_utils.py | 1 - src/transformers/cache_utils.py | 1438 +++++- src/transformers/commands/add_new_model.py | 259 - .../commands/add_new_model_like.py | 63 +- src/transformers/commands/env.py | 11 +- src/transformers/commands/pt_to_tf.py | 9 +- src/transformers/commands/train.py | 2 +- src/transformers/commands/transformers_cli.py | 2 - src/transformers/commands/user.py | 2 +- src/transformers/configuration_utils.py | 127 +- src/transformers/convert_graph_to_onnx.py | 50 +- .../convert_pytorch_checkpoint_to_tf2.py | 75 +- src/transformers/convert_slow_tokenizer.py | 415 +- ...ert_slow_tokenizers_checkpoints_to_fast.py | 2 +- 
...nvert_tf_hub_seq_to_seq_bert_to_pytorch.py | 3 +- src/transformers/data/__init__.py | 1 + src/transformers/data/data_collator.py | 138 +- .../data/metrics/squad_metrics.py | 1 - src/transformers/data/processors/glue.py | 2 +- src/transformers/data/processors/xnli.py | 15 +- src/transformers/deepspeed.py | 1 + src/transformers/dependency_versions_table.py | 33 +- src/transformers/dynamic_module_utils.py | 51 +- .../feature_extraction_sequence_utils.py | 3 +- src/transformers/feature_extraction_utils.py | 48 +- src/transformers/file_utils.py | 3 +- src/transformers/generation/__init__.py | 40 +- .../generation/beam_constraints.py | 17 +- src/transformers/generation/beam_search.py | 40 +- .../generation/candidate_generator.py | 218 +- .../generation/configuration_utils.py | 532 +- .../generation/flax_logits_process.py | 87 + src/transformers/generation/flax_utils.py | 8 +- src/transformers/generation/logits_process.py | 807 ++- .../generation/stopping_criteria.py | 404 +- src/transformers/generation/streamers.py | 8 +- src/transformers/generation/tf_utils.py | 78 +- src/transformers/generation/utils.py | 4458 ++++++++--------- src/transformers/generation/watermarking.py | 239 + src/transformers/generation_flax_utils.py | 28 - src/transformers/generation_tf_utils.py | 28 - src/transformers/generation_utils.py | 28 - src/transformers/hf_argparser.py | 11 +- src/transformers/image_processing_base.py | 554 ++ src/transformers/image_processing_utils.py | 579 +-- .../image_processing_utils_fast.py | 68 + src/transformers/image_transforms.py | 67 +- src/transformers/image_utils.py | 131 +- src/transformers/integrations/__init__.py | 42 +- src/transformers/integrations/aqlm.py | 100 + src/transformers/integrations/awq.py | 173 +- src/transformers/integrations/bitsandbytes.py | 159 +- src/transformers/integrations/deepspeed.py | 77 +- src/transformers/integrations/eetq.py | 121 + src/transformers/integrations/fbgemm_fp8.py | 161 + src/transformers/integrations/ggml.py | 716 +++ src/transformers/integrations/hqq.py | 121 + .../integrations/integration_utils.py | 610 ++- src/transformers/integrations/peft.py | 24 +- src/transformers/integrations/quanto.py | 94 + src/transformers/integrations/tpu.py | 36 + src/transformers/keras_callbacks.py | 6 +- .../cuda/ms_deform_attn_cuda.cu | 4 +- .../cuda/ms_deform_attn_cuda.cuh | 4 +- .../cuda/ms_deform_attn_cuda.h | 17 + .../kernels/deta/cpu/ms_deform_attn_cpu.cpp | 40 + .../kernels/deta/cpu/ms_deform_attn_cpu.h | 32 + .../kernels/deta/cuda/ms_deform_attn_cuda.cu | 156 + .../kernels/deta/cuda/ms_deform_attn_cuda.cuh | 1467 ++++++ .../kernels/deta/cuda/ms_deform_attn_cuda.h | 29 + .../deta/cuda/ms_deform_im2col_cuda.cuh | 1327 +++++ .../kernels/deta/ms_deform_attn.h | 61 + src/transformers/kernels/deta/vision.cpp | 16 + src/transformers/modelcard.py | 16 +- src/transformers/modeling_attn_mask_utils.py | 233 +- .../modeling_flash_attention_utils.py | 300 ++ .../modeling_flax_pytorch_utils.py | 80 +- src/transformers/modeling_flax_utils.py | 35 +- .../modeling_gguf_pytorch_utils.py | 193 + src/transformers/modeling_outputs.py | 226 +- src/transformers/modeling_rope_utils.py | 559 +++ src/transformers/modeling_tf_pytorch_utils.py | 156 +- src/transformers/modeling_tf_utils.py | 467 +- src/transformers/modeling_utils.py | 1583 +++--- src/transformers/models/__init__.py | 54 +- src/transformers/models/albert/__init__.py | 8 +- .../models/albert/configuration_albert.py | 17 +- ...lbert_original_tf_checkpoint_to_pytorch.py | 1 - .../models/albert/modeling_albert.py | 
32 +- .../models/albert/modeling_flax_albert.py | 6 +- .../models/albert/modeling_tf_albert.py | 110 +- .../models/albert/tokenization_albert.py | 28 +- .../models/albert/tokenization_albert_fast.py | 38 +- src/transformers/models/align/__init__.py | 4 - .../models/align/configuration_align.py | 12 +- .../models/align/convert_align_tf_to_hf.py | 2 +- .../models/align/modeling_align.py | 28 +- .../models/align/processing_align.py | 92 +- src/transformers/models/altclip/__init__.py | 4 - .../models/altclip/configuration_altclip.py | 22 +- .../models/altclip/modeling_altclip.py | 33 +- .../models/altclip/processing_altclip.py | 4 +- .../audio_spectrogram_transformer/__init__.py | 8 +- ...iguration_audio_spectrogram_transformer.py | 9 +- ...trogram_transformer_original_to_pytorch.py | 4 +- .../modeling_audio_spectrogram_transformer.py | 58 +- src/transformers/models/auto/__init__.py | 10 +- src/transformers/models/auto/auto_factory.py | 39 +- .../models/auto/configuration_auto.py | 350 +- .../models/auto/feature_extraction_auto.py | 30 +- .../models/auto/image_processing_auto.py | 360 +- src/transformers/models/auto/modeling_auto.py | 195 +- .../models/auto/modeling_flax_auto.py | 13 +- .../models/auto/modeling_tf_auto.py | 14 +- .../models/auto/processing_auto.py | 64 +- .../models/auto/tokenization_auto.py | 158 +- .../models/autoformer/__init__.py | 8 +- .../autoformer/configuration_autoformer.py | 10 +- .../models/autoformer/modeling_autoformer.py | 55 +- src/transformers/models/bark/__init__.py | 4 - .../models/bark/configuration_bark.py | 7 +- .../models/bark/convert_suno_to_hf.py | 1 + .../bark/generation_configuration_bark.py | 10 +- src/transformers/models/bark/modeling_bark.py | 164 +- .../models/bark/processing_bark.py | 10 +- src/transformers/models/bart/__init__.py | 6 +- .../models/bart/configuration_bart.py | 8 +- ..._original_pytorch_checkpoint_to_pytorch.py | 1 - src/transformers/models/bart/modeling_bart.py | 182 +- .../models/bart/modeling_flax_bart.py | 4 +- .../models/bart/modeling_tf_bart.py | 86 +- .../models/bart/tokenization_bart.py | 29 - .../models/bart/tokenization_bart_fast.py | 37 - .../models/barthez/tokenization_barthez.py | 21 +- .../barthez/tokenization_barthez_fast.py | 27 +- .../models/bartpho/tokenization_bartpho.py | 16 +- src/transformers/models/beit/__init__.py | 6 +- .../models/beit/configuration_beit.py | 17 +- .../beit/convert_beit_unilm_to_pytorch.py | 3 +- .../models/beit/image_processing_beit.py | 69 +- src/transformers/models/beit/modeling_beit.py | 225 +- src/transformers/models/bert/__init__.py | 8 +- .../models/bert/configuration_bert.py | 53 +- ...bert_original_tf2_checkpoint_to_pytorch.py | 1 + ..._bert_original_tf_checkpoint_to_pytorch.py | 1 - ..._bert_pytorch_checkpoint_to_original_tf.py | 4 +- ...ping_original_tf2_checkpoint_to_pytorch.py | 1 + src/transformers/models/bert/modeling_bert.py | 231 +- .../models/bert/modeling_flax_bert.py | 10 +- .../models/bert/modeling_tf_bert.py | 140 +- .../models/bert/tokenization_bert.py | 93 +- .../models/bert/tokenization_bert_fast.py | 132 - .../models/bert/tokenization_bert_tf.py | 13 +- .../configuration_bert_generation.py | 2 +- .../modeling_bert_generation.py | 20 +- .../tokenization_bert_generation.py | 15 +- .../tokenization_bert_japanese.py | 76 +- .../models/bertweet/tokenization_bertweet.py | 18 +- src/transformers/models/big_bird/__init__.py | 6 +- .../models/big_bird/configuration_big_bird.py | 12 +- ...gbird_original_tf_checkpoint_to_pytorch.py | 1 - 
.../models/big_bird/modeling_big_bird.py | 31 +- .../models/big_bird/tokenization_big_bird.py | 22 +- .../big_bird/tokenization_big_bird_fast.py | 34 +- .../models/bigbird_pegasus/__init__.py | 4 - .../configuration_bigbird_pegasus.py | 15 +- .../modeling_bigbird_pegasus.py | 55 +- src/transformers/models/biogpt/__init__.py | 6 +- .../models/biogpt/configuration_biogpt.py | 9 +- .../models/biogpt/modeling_biogpt.py | 34 +- .../models/biogpt/tokenization_biogpt.py | 14 +- src/transformers/models/bit/__init__.py | 6 +- .../models/bit/configuration_bit.py | 6 +- .../models/bit/convert_bit_to_pytorch.py | 1 - .../models/bit/image_processing_bit.py | 28 +- src/transformers/models/bit/modeling_bit.py | 19 +- .../models/blenderbot/__init__.py | 4 - .../blenderbot/configuration_blenderbot.py | 7 +- .../models/blenderbot/modeling_blenderbot.py | 43 +- .../blenderbot/modeling_flax_blenderbot.py | 2 +- .../blenderbot/modeling_tf_blenderbot.py | 76 +- .../blenderbot/tokenization_blenderbot.py | 32 - .../tokenization_blenderbot_fast.py | 34 +- .../models/blenderbot_small/__init__.py | 4 - .../configuration_blenderbot_small.py | 7 +- .../modeling_blenderbot_small.py | 9 +- .../modeling_flax_blenderbot_small.py | 3 +- .../modeling_tf_blenderbot_small.py | 80 +- .../tokenization_blenderbot_small.py | 39 - .../tokenization_blenderbot_small_fast.py | 42 +- src/transformers/models/blip/__init__.py | 7 +- .../models/blip/configuration_blip.py | 43 +- .../convert_blip_original_pytorch_to_hf.py | 4 +- .../models/blip/image_processing_blip.py | 24 +- src/transformers/models/blip/modeling_blip.py | 175 +- .../models/blip/modeling_blip_text.py | 8 +- .../models/blip/modeling_tf_blip.py | 102 +- .../models/blip/modeling_tf_blip_text.py | 73 +- .../models/blip/processing_blip.py | 3 +- src/transformers/models/blip_2/__init__.py | 4 - .../models/blip_2/configuration_blip_2.py | 21 +- .../models/blip_2/modeling_blip_2.py | 153 +- .../models/blip_2/processing_blip_2.py | 58 +- src/transformers/models/bloom/__init__.py | 6 +- .../models/bloom/configuration_bloom.py | 12 +- ...rt_bloom_original_checkpoint_to_pytorch.py | 1 - .../models/bloom/modeling_bloom.py | 458 +- .../models/bloom/tokenization_bloom_fast.py | 28 - .../models/bridgetower/__init__.py | 4 - .../bridgetower/configuration_bridgetower.py | 9 +- .../image_processing_bridgetower.py | 56 +- .../bridgetower/modeling_bridgetower.py | 22 +- src/transformers/models/bros/__init__.py | 6 +- .../models/bros/configuration_bros.py | 7 +- src/transformers/models/bros/modeling_bros.py | 8 +- ..._byt5_original_tf_checkpoint_to_pytorch.py | 1 - .../models/byt5/tokenization_byt5.py | 3 +- src/transformers/models/camembert/__init__.py | 8 +- .../camembert/configuration_camembert.py | 18 +- .../models/camembert/modeling_camembert.py | 33 +- .../models/camembert/modeling_tf_camembert.py | 86 +- .../camembert/tokenization_camembert.py | 14 +- .../camembert/tokenization_camembert_fast.py | 17 +- src/transformers/models/canine/__init__.py | 6 +- .../models/canine/configuration_canine.py | 9 +- ...anine_original_tf_checkpoint_to_pytorch.py | 1 - .../models/canine/modeling_canine.py | 10 +- .../models/canine/tokenization_canine.py | 6 - src/transformers/models/chameleon/__init__.py | 83 + .../chameleon/configuration_chameleon.py | 276 + .../convert_chameleon_weights_to_hf.py | 476 ++ .../chameleon/image_processing_chameleon.py | 370 ++ .../models/chameleon/modeling_chameleon.py | 1682 +++++++ .../models/chameleon/processing_chameleon.py | 162 + .../models/chinese_clip/__init__.py 
| 4 - .../configuration_chinese_clip.py | 23 +- .../image_processing_chinese_clip.py | 31 +- .../chinese_clip/modeling_chinese_clip.py | 25 +- .../chinese_clip/processing_chinese_clip.py | 3 +- src/transformers/models/clap/__init__.py | 4 - .../models/clap/configuration_clap.py | 13 +- .../models/clap/feature_extraction_clap.py | 1 - src/transformers/models/clap/modeling_clap.py | 57 +- .../models/clap/processing_clap.py | 2 +- src/transformers/models/clip/__init__.py | 8 +- .../models/clip/configuration_clip.py | 21 +- .../convert_clip_original_pytorch_to_hf.py | 14 +- .../models/clip/image_processing_clip.py | 52 +- src/transformers/models/clip/modeling_clip.py | 358 +- .../models/clip/modeling_flax_clip.py | 4 +- .../models/clip/modeling_tf_clip.py | 77 +- .../models/clip/processing_clip.py | 13 +- .../models/clip/tokenization_clip.py | 22 +- .../models/clip/tokenization_clip_fast.py | 28 +- src/transformers/models/clipseg/__init__.py | 4 - .../models/clipseg/configuration_clipseg.py | 18 +- .../models/clipseg/modeling_clipseg.py | 27 +- .../models/clipseg/processing_clipseg.py | 3 +- src/transformers/models/clvp/__init__.py | 4 - .../models/clvp/configuration_clvp.py | 11 +- .../models/clvp/feature_extraction_clvp.py | 2 +- src/transformers/models/clvp/modeling_clvp.py | 19 +- .../models/clvp/number_normalizer.py | 1 - .../models/clvp/processing_clvp.py | 1 - .../models/clvp/tokenization_clvp.py | 15 - .../code_llama/tokenization_code_llama.py | 77 +- .../tokenization_code_llama_fast.py | 63 +- src/transformers/models/codegen/__init__.py | 6 +- .../models/codegen/configuration_codegen.py | 19 +- .../models/codegen/modeling_codegen.py | 446 +- .../models/codegen/tokenization_codegen.py | 57 +- .../codegen/tokenization_codegen_fast.py | 59 +- src/transformers/models/cohere/__init__.py | 77 + .../models/cohere/configuration_cohere.py | 157 + .../models/cohere/modeling_cohere.py | 1171 +++++ .../models/cohere/tokenization_cohere_fast.py | 512 ++ .../models/conditional_detr/__init__.py | 4 - .../configuration_conditional_detr.py | 45 +- ..._original_pytorch_checkpoint_to_pytorch.py | 1 - .../image_processing_conditional_detr.py | 378 +- .../modeling_conditional_detr.py | 205 +- src/transformers/models/convbert/__init__.py | 8 +- .../models/convbert/configuration_convbert.py | 13 +- .../models/convbert/modeling_convbert.py | 145 +- .../models/convbert/modeling_tf_convbert.py | 97 +- .../models/convbert/tokenization_convbert.py | 31 +- .../convbert/tokenization_convbert_fast.py | 27 +- src/transformers/models/convnext/__init__.py | 8 +- .../models/convnext/configuration_convnext.py | 11 +- .../convnext/convert_convnext_to_pytorch.py | 1 - .../convnext/image_processing_convnext.py | 26 +- .../models/convnext/modeling_convnext.py | 9 +- .../models/convnext/modeling_tf_convnext.py | 50 +- .../models/convnextv2/__init__.py | 10 +- .../convnextv2/configuration_convnextv2.py | 11 +- .../models/convnextv2/modeling_convnextv2.py | 9 +- .../convnextv2/modeling_tf_convnextv2.py | 69 +- .../models/cpm/tokenization_cpm.py | 8 +- .../models/cpm/tokenization_cpm_fast.py | 10 +- src/transformers/models/cpmant/__init__.py | 6 +- .../models/cpmant/configuration_cpmant.py | 9 +- .../models/cpmant/modeling_cpmant.py | 8 +- .../models/cpmant/tokenization_cpmant.py | 15 +- src/transformers/models/ctrl/__init__.py | 8 +- .../models/ctrl/configuration_ctrl.py | 6 +- src/transformers/models/ctrl/modeling_ctrl.py | 15 +- .../models/ctrl/modeling_tf_ctrl.py | 52 +- .../models/ctrl/tokenization_ctrl.py | 11 - 
src/transformers/models/cvt/__init__.py | 8 +- .../models/cvt/configuration_cvt.py | 7 +- ..._original_pytorch_checkpoint_to_pytorch.py | 6 +- src/transformers/models/cvt/modeling_cvt.py | 17 +- .../models/cvt/modeling_tf_cvt.py | 94 +- src/transformers/models/dac/__init__.py | 60 + .../models/dac/configuration_dac.py | 111 + .../models/dac/convert_dac_checkpoint.py | 261 + .../models/dac/feature_extraction_dac.py | 170 + src/transformers/models/dac/modeling_dac.py | 717 +++ src/transformers/models/data2vec/__init__.py | 14 +- .../data2vec/configuration_data2vec_audio.py | 7 +- .../data2vec/configuration_data2vec_text.py | 7 +- .../data2vec/configuration_data2vec_vision.py | 9 +- ..._original_pytorch_checkpoint_to_pytorch.py | 3 +- ..._original_pytorch_checkpoint_to_pytorch.py | 1 - .../data2vec/modeling_data2vec_audio.py | 323 +- .../models/data2vec/modeling_data2vec_text.py | 21 +- .../data2vec/modeling_data2vec_vision.py | 216 +- .../data2vec/modeling_tf_data2vec_vision.py | 119 +- src/transformers/models/dbrx/__init__.py | 51 + .../models/dbrx/configuration_dbrx.py | 258 + src/transformers/models/dbrx/modeling_dbrx.py | 1442 ++++++ src/transformers/models/deberta/__init__.py | 8 +- .../models/deberta/configuration_deberta.py | 14 +- .../models/deberta/modeling_deberta.py | 36 +- .../models/deberta/modeling_tf_deberta.py | 119 +- .../models/deberta/tokenization_deberta.py | 41 +- .../deberta/tokenization_deberta_fast.py | 41 +- .../models/deberta_v2/__init__.py | 7 +- .../deberta_v2/configuration_deberta_v2.py | 18 +- .../models/deberta_v2/modeling_deberta_v2.py | 33 +- .../deberta_v2/modeling_tf_deberta_v2.py | 133 +- .../deberta_v2/tokenization_deberta_v2.py | 33 +- .../tokenization_deberta_v2_fast.py | 30 - .../models/decision_transformer/__init__.py | 8 +- .../configuration_decision_transformer.py | 9 +- .../modeling_decision_transformer.py | 17 +- .../models/deformable_detr/__init__.py | 6 +- .../configuration_deformable_detr.py | 44 +- .../convert_deformable_detr_to_pytorch.py | 5 +- .../image_processing_deformable_detr.py | 378 +- .../models/deformable_detr/load_custom.py | 3 +- .../modeling_deformable_detr.py | 256 +- src/transformers/models/deit/__init__.py | 8 +- .../models/deit/configuration_deit.py | 11 +- .../deit/convert_deit_timm_to_pytorch.py | 1 - .../models/deit/image_processing_deit.py | 30 +- src/transformers/models/deit/modeling_deit.py | 133 +- .../models/deit/modeling_tf_deit.py | 162 +- ...original_gluonnlp_checkpoint_to_pytorch.py | 3 +- .../models/{ => deprecated}/deta/__init__.py | 8 +- .../deta/configuration_deta.py | 55 +- .../deta/convert_deta_resnet_to_pytorch.py | 5 +- .../deta/convert_deta_swin_to_pytorch.py | 5 +- .../deta/image_processing_deta.py | 362 +- .../{ => deprecated}/deta/modeling_deta.py | 323 +- .../efficientformer/__init__.py | 15 +- .../configuration_efficientformer.py | 12 +- ..._original_pytorch_checkpoint_to_pytorch.py | 0 .../image_processing_efficientformer.py | 50 +- .../modeling_efficientformer.py | 19 +- .../modeling_tf_efficientformer.py | 127 +- .../{ => deprecated}/ernie_m/__init__.py | 8 +- .../ernie_m/configuration_ernie_m.py | 27 +- .../ernie_m/modeling_ernie_m.py | 26 +- .../ernie_m/tokenization_ernie_m.py | 28 +- .../gptsan_japanese/__init__.py | 8 +- .../configuration_gptsan_japanese.py | 13 +- ...convert_gptsan_tf_checkpoint_to_pytorch.py | 0 .../modeling_gptsan_japanese.py | 31 +- .../tokenization_gptsan_japanese.py | 61 +- .../{ => deprecated}/graphormer/__init__.py | 8 +- .../graphormer/algos_graphormer.pyx | 0 
.../graphormer/collating_graphormer.py | 2 +- .../graphormer/configuration_graphormer.py | 12 +- .../graphormer/modeling_graphormer.py | 17 +- .../{ => deprecated}/jukebox/__init__.py | 6 +- .../jukebox/configuration_jukebox.py | 10 +- .../jukebox/convert_jukebox.py | 0 .../jukebox/modeling_jukebox.py | 14 +- .../jukebox/tokenization_jukebox.py | 27 +- .../models/deprecated/mctct/__init__.py | 7 +- .../deprecated/mctct/configuration_mctct.py | 7 +- .../models/deprecated/mctct/modeling_mctct.py | 14 +- .../deprecated/mctct/processing_mctct.py | 1 + .../models/{ => deprecated}/mega/__init__.py | 8 +- .../mega/configuration_mega.py | 13 +- ..._original_pytorch_checkpoint_to_pytorch.py | 1 + .../{ => deprecated}/mega/modeling_mega.py | 20 +- .../deprecated/mmbt/configuration_mmbt.py | 4 +- .../models/deprecated/mmbt/modeling_mmbt.py | 5 +- .../models/{ => deprecated}/nat/__init__.py | 8 +- .../{ => deprecated}/nat/configuration_nat.py | 13 +- .../{ => deprecated}/nat/modeling_nat.py | 40 +- .../models/{ => deprecated}/nezha/__init__.py | 8 +- .../nezha/configuration_nezha.py | 8 +- .../{ => deprecated}/nezha/modeling_nezha.py | 36 +- .../models/deprecated/open_llama/__init__.py | 4 +- .../open_llama/configuration_open_llama.py | 14 +- .../open_llama/modeling_open_llama.py | 27 +- .../{ => deprecated}/qdqbert/__init__.py | 8 +- .../qdqbert/configuration_qdqbert.py | 19 +- .../qdqbert/modeling_qdqbert.py | 39 +- .../models/{ => deprecated}/realm/__init__.py | 8 +- .../realm/configuration_realm.py | 28 +- .../{ => deprecated}/realm/modeling_realm.py | 57 +- .../{ => deprecated}/realm/retrieval_realm.py | 4 +- .../realm/tokenization_realm.py | 56 +- .../realm/tokenization_realm_fast.py | 78 +- .../models/deprecated/retribert/__init__.py | 6 +- .../retribert/configuration_retribert.py | 9 +- .../retribert/modeling_retribert.py | 6 - .../retribert/tokenization_retribert.py | 40 +- .../retribert/tokenization_retribert_fast.py | 29 - .../speech_to_text_2/__init__.py | 8 +- .../configuration_speech_to_text_2.py | 13 +- .../modeling_speech_to_text_2.py | 25 +- .../processing_speech_to_text_2.py | 3 +- .../tokenization_speech_to_text_2.py | 24 +- .../deprecated/tapex/tokenization_tapex.py | 20 - .../trajectory_transformer/__init__.py | 8 +- .../configuration_trajectory_transformer.py | 9 +- ..._original_pytorch_checkpoint_to_pytorch.py | 2 +- .../modeling_trajectory_transformer.py | 7 +- .../models/deprecated/transfo_xl/__init__.py | 8 +- .../transfo_xl/configuration_transfo_xl.py | 8 +- ...fo_xl_original_tf_checkpoint_to_pytorch.py | 1 - .../transfo_xl/modeling_tf_transfo_xl.py | 54 +- .../modeling_tf_transfo_xl_utilities.py | 6 +- .../transfo_xl/modeling_transfo_xl.py | 18 +- .../modeling_transfo_xl_utilities.py | 3 +- .../transfo_xl/tokenization_transfo_xl.py | 24 +- .../models/{ => deprecated}/tvlt/__init__.py | 8 +- .../tvlt/configuration_tvlt.py | 12 +- .../tvlt/feature_extraction_tvlt.py | 6 +- .../tvlt/image_processing_tvlt.py | 54 +- .../{ => deprecated}/tvlt/modeling_tvlt.py | 37 +- .../{ => deprecated}/tvlt/processing_tvlt.py | 2 +- .../models/deprecated/van/__init__.py | 6 +- .../deprecated/van/configuration_van.py | 8 +- .../deprecated/van/convert_van_to_pytorch.py | 1 - .../models/deprecated/van/modeling_van.py | 9 +- .../{ => deprecated}/vit_hybrid/__init__.py | 8 +- .../vit_hybrid/configuration_vit_hybrid.py | 46 +- .../convert_vit_hybrid_timm_to_pytorch.py | 1 - .../vit_hybrid/image_processing_vit_hybrid.py | 54 +- .../vit_hybrid/modeling_vit_hybrid.py | 80 +- 
.../xlm_prophetnet/__init__.py | 8 +- .../configuration_xlm_prophetnet.py | 13 +- .../xlm_prophetnet/modeling_xlm_prophetnet.py | 45 +- .../tokenization_xlm_prophetnet.py | 22 +- .../models/depth_anything/__init__.py | 52 + .../configuration_depth_anything.py | 165 + .../convert_depth_anything_to_hf.py | 368 ++ .../depth_anything/modeling_depth_anything.py | 467 ++ src/transformers/models/detr/__init__.py | 6 +- .../models/detr/configuration_detr.py | 48 +- ..._original_pytorch_checkpoint_to_pytorch.py | 1 - .../models/detr/convert_detr_to_pytorch.py | 1 - .../models/detr/image_processing_detr.py | 357 +- src/transformers/models/detr/modeling_detr.py | 197 +- src/transformers/models/dinat/__init__.py | 6 +- .../models/dinat/configuration_dinat.py | 7 +- .../models/dinat/modeling_dinat.py | 41 +- src/transformers/models/dinov2/__init__.py | 33 +- .../models/dinov2/configuration_dinov2.py | 6 +- .../models/dinov2/convert_dinov2_to_hf.py | 6 +- .../models/dinov2/modeling_dinov2.py | 25 +- .../models/dinov2/modeling_flax_dinov2.py | 795 +++ .../models/distilbert/__init__.py | 6 - .../distilbert/configuration_distilbert.py | 21 +- .../models/distilbert/modeling_distilbert.py | 157 +- .../distilbert/modeling_flax_distilbert.py | 4 +- .../distilbert/modeling_tf_distilbert.py | 80 +- .../distilbert/tokenization_distilbert.py | 43 +- .../tokenization_distilbert_fast.py | 55 - .../dit/convert_dit_unilm_to_pytorch.py | 1 - src/transformers/models/donut/__init__.py | 6 +- .../models/donut/configuration_donut_swin.py | 7 +- .../models/donut/convert_donut_to_pytorch.py | 26 +- .../models/donut/image_processing_donut.py | 29 +- .../models/donut/modeling_donut_swin.py | 100 +- .../models/donut/processing_donut.py | 5 +- src/transformers/models/dpr/__init__.py | 16 +- .../models/dpr/configuration_dpr.py | 23 +- ...vert_dpr_original_checkpoint_to_pytorch.py | 6 +- src/transformers/models/dpr/modeling_dpr.py | 30 +- .../models/dpr/modeling_tf_dpr.py | 41 +- .../models/dpr/tokenization_dpr.py | 92 - .../models/dpr/tokenization_dpr_fast.py | 92 - src/transformers/models/dpt/__init__.py | 6 +- .../models/dpt/configuration_dpt.py | 77 +- .../models/dpt/convert_dinov2_depth_to_hf.py | 3 +- .../models/dpt/convert_dpt_beit_to_hf.py | 1 - .../dpt/convert_dpt_hybrid_to_pytorch.py | 7 +- .../models/dpt/convert_dpt_swinv2_to_hf.py | 1 - .../models/dpt/convert_dpt_to_pytorch.py | 5 +- .../models/dpt/image_processing_dpt.py | 36 +- src/transformers/models/dpt/modeling_dpt.py | 78 +- .../models/efficientnet/__init__.py | 4 - .../configuration_efficientnet.py | 6 +- .../image_processing_efficientnet.py | 30 +- .../efficientnet/modeling_efficientnet.py | 9 +- src/transformers/models/electra/__init__.py | 8 +- .../models/electra/configuration_electra.py | 17 +- ...ectra_original_tf_checkpoint_to_pytorch.py | 1 - .../models/electra/modeling_electra.py | 30 +- .../models/electra/modeling_tf_electra.py | 96 +- .../models/electra/tokenization_electra.py | 47 +- .../electra/tokenization_electra_fast.py | 62 - src/transformers/models/encodec/__init__.py | 8 +- .../models/encodec/configuration_encodec.py | 8 +- .../convert_encodec_checkpoint_to_pytorch.py | 2 +- .../models/encodec/modeling_encodec.py | 63 +- .../configuration_encoder_decoder.py | 4 +- .../modeling_encoder_decoder.py | 29 +- .../modeling_flax_encoder_decoder.py | 21 +- .../modeling_tf_encoder_decoder.py | 20 +- src/transformers/models/ernie/__init__.py | 6 +- .../models/ernie/configuration_ernie.py | 20 +- .../models/ernie/modeling_ernie.py | 37 +- 
src/transformers/models/esm/__init__.py | 8 +- .../models/esm/configuration_esm.py | 6 +- src/transformers/models/esm/convert_esm.py | 1 - src/transformers/models/esm/modeling_esm.py | 15 +- .../models/esm/modeling_esmfold.py | 2 +- .../models/esm/modeling_tf_esm.py | 101 +- .../models/esm/openfold_utils/chunk_utils.py | 6 +- .../models/esm/openfold_utils/feats.py | 6 +- .../models/esm/openfold_utils/protein.py | 1 + .../esm/openfold_utils/residue_constants.py | 2 +- .../models/esm/openfold_utils/rigid_utils.py | 4 +- .../models/esm/openfold_utils/tensor_utils.py | 14 +- .../models/esm/tokenization_esm.py | 30 +- src/transformers/models/falcon/__init__.py | 6 +- .../models/falcon/configuration_falcon.py | 29 +- .../models/falcon/modeling_falcon.py | 747 +-- .../models/falcon_mamba/__init__.py | 58 + .../configuration_falcon_mamba.py | 158 + .../falcon_mamba/modeling_falcon_mamba.py | 853 ++++ .../models/fastspeech2_conformer/__init__.py | 69 + .../configuration_fastspeech2_conformer.py | 475 ++ ..._original_pytorch_checkpoint_to_pytorch.py | 210 + .../fastspeech2_conformer/convert_hifigan.py | 134 + .../convert_model_with_hifigan.py | 102 + .../modeling_fastspeech2_conformer.py | 1681 +++++++ .../tokenization_fastspeech2_conformer.py | 185 + src/transformers/models/flaubert/__init__.py | 8 +- .../models/flaubert/configuration_flaubert.py | 10 +- .../models/flaubert/modeling_flaubert.py | 22 +- .../models/flaubert/modeling_tf_flaubert.py | 49 +- .../models/flaubert/tokenization_flaubert.py | 46 +- src/transformers/models/flava/__init__.py | 4 - .../models/flava/configuration_flava.py | 34 +- .../models/flava/image_processing_flava.py | 24 +- .../models/flava/modeling_flava.py | 38 +- src/transformers/models/fnet/__init__.py | 6 +- .../models/fnet/configuration_fnet.py | 10 +- ...net_original_flax_checkpoint_to_pytorch.py | 1 - src/transformers/models/fnet/modeling_fnet.py | 22 +- .../models/fnet/tokenization_fnet.py | 16 +- .../models/fnet/tokenization_fnet_fast.py | 20 +- src/transformers/models/focalnet/__init__.py | 6 +- .../models/focalnet/configuration_focalnet.py | 6 +- .../models/focalnet/modeling_focalnet.py | 10 +- src/transformers/models/fsmt/__init__.py | 4 +- .../models/fsmt/configuration_fsmt.py | 5 +- src/transformers/models/fsmt/modeling_fsmt.py | 11 +- .../models/fsmt/tokenization_fsmt.py | 26 +- src/transformers/models/funnel/__init__.py | 8 +- .../models/funnel/configuration_funnel.py | 19 +- ...unnel_original_tf_checkpoint_to_pytorch.py | 1 - .../models/funnel/modeling_funnel.py | 32 +- .../models/funnel/modeling_tf_funnel.py | 92 +- .../models/funnel/tokenization_funnel.py | 34 +- .../models/funnel/tokenization_funnel_fast.py | 54 +- src/transformers/models/fuyu/__init__.py | 4 +- .../models/fuyu/configuration_fuyu.py | 31 +- .../models/fuyu/image_processing_fuyu.py | 28 +- src/transformers/models/fuyu/modeling_fuyu.py | 49 +- .../models/fuyu/processing_fuyu.py | 7 +- src/transformers/models/gemma/__init__.py | 123 + .../models/gemma/configuration_gemma.py | 145 + .../gemma/convert_gemma_weights_to_hf.py | 206 + src/transformers/models/gemma/diff_gemma.py | 617 +++ .../models/gemma/modeling_flax_gemma.py | 774 +++ .../models/gemma/modeling_gemma.py | 1389 +++++ .../models/gemma/tokenization_gemma.py | 327 ++ .../models/gemma/tokenization_gemma_fast.py | 199 + src/transformers/models/gemma2/__init__.py | 61 + .../models/gemma2/configuration_gemma2.py | 152 + .../gemma2/convert_gemma2_weights_to_hf.py | 239 + src/transformers/models/gemma2/diff_gemma2.py | 578 +++ 
.../models/gemma2/modeling_gemma2.py | 1351 +++++ src/transformers/models/git/__init__.py | 6 +- .../models/git/configuration_git.py | 6 +- .../models/git/convert_git_to_pytorch.py | 5 +- src/transformers/models/git/modeling_git.py | 172 +- src/transformers/models/git/processing_git.py | 13 +- src/transformers/models/glpn/__init__.py | 6 +- .../models/glpn/configuration_glpn.py | 7 +- .../models/glpn/convert_glpn_to_pytorch.py | 1 - .../models/glpn/image_processing_glpn.py | 21 +- src/transformers/models/glpn/modeling_glpn.py | 9 +- src/transformers/models/gpt2/__init__.py | 8 +- .../models/gpt2/configuration_gpt2.py | 13 +- ..._gpt2_original_tf_checkpoint_to_pytorch.py | 1 - .../models/gpt2/modeling_flax_gpt2.py | 2 +- src/transformers/models/gpt2/modeling_gpt2.py | 369 +- .../models/gpt2/modeling_tf_gpt2.py | 52 +- .../models/gpt2/tokenization_gpt2.py | 43 +- .../models/gpt2/tokenization_gpt2_fast.py | 51 +- .../models/gpt2/tokenization_gpt2_tf.py | 7 +- .../models/gpt_bigcode/__init__.py | 6 +- .../gpt_bigcode/configuration_gpt_bigcode.py | 6 +- .../gpt_bigcode/modeling_gpt_bigcode.py | 209 +- src/transformers/models/gpt_neo/__init__.py | 6 +- .../models/gpt_neo/configuration_gpt_neo.py | 9 +- .../convert_gpt_neo_mesh_tf_to_pytorch.py | 1 - .../models/gpt_neo/modeling_gpt_neo.py | 536 +- src/transformers/models/gpt_neox/__init__.py | 6 +- .../models/gpt_neox/configuration_gpt_neox.py | 12 +- .../models/gpt_neox/modeling_gpt_neox.py | 799 +-- .../gpt_neox/tokenization_gpt_neox_fast.py | 140 +- .../models/gpt_neox_japanese/__init__.py | 6 +- .../configuration_gpt_neox_japanese.py | 6 +- .../modeling_gpt_neox_japanese.py | 20 +- .../tokenization_gpt_neox_japanese.py | 36 +- .../gpt_sw3/convert_megatron_to_pytorch.py | 2 +- .../models/gpt_sw3/tokenization_gpt_sw3.py | 46 +- src/transformers/models/gptj/__init__.py | 6 +- .../models/gptj/configuration_gptj.py | 8 +- src/transformers/models/gptj/modeling_gptj.py | 603 ++- .../models/gptj/modeling_tf_gptj.py | 55 +- .../models/grounding_dino/__init__.py | 75 + .../configuration_grounding_dino.py | 295 ++ .../convert_grounding_dino_to_hf.py | 491 ++ .../image_processing_grounding_dino.py | 1588 ++++++ .../grounding_dino/modeling_grounding_dino.py | 3145 ++++++++++++ .../processing_grounding_dino.py | 245 + src/transformers/models/groupvit/__init__.py | 6 - .../models/groupvit/configuration_groupvit.py | 22 +- .../models/groupvit/modeling_groupvit.py | 17 +- .../models/groupvit/modeling_tf_groupvit.py | 140 +- .../models/herbert/tokenization_herbert.py | 17 +- .../herbert/tokenization_herbert_fast.py | 15 - src/transformers/models/hiera/__init__.py | 59 + .../models/hiera/configuration_hiera.py | 191 + .../models/hiera/convert_hiera_to_hf.py | 369 ++ .../models/hiera/modeling_hiera.py | 1567 ++++++ src/transformers/models/hubert/__init__.py | 8 +- .../models/hubert/configuration_hubert.py | 9 +- ...rt_original_s3prl_checkpoint_to_pytorch.py | 1 - ..._original_pytorch_checkpoint_to_pytorch.py | 1 - ...rt_original_s3prl_checkpoint_to_pytorch.py | 1 - .../models/hubert/modeling_hubert.py | 322 +- .../models/hubert/modeling_tf_hubert.py | 148 +- src/transformers/models/ibert/__init__.py | 6 +- .../models/ibert/configuration_ibert.py | 11 +- .../models/ibert/modeling_ibert.py | 19 +- src/transformers/models/idefics/__init__.py | 36 +- .../models/idefics/configuration_idefics.py | 11 +- .../idefics/image_processing_idefics.py | 6 +- .../models/idefics/modeling_idefics.py | 374 +- .../models/idefics/modeling_tf_idefics.py | 1812 +++++++ 
src/transformers/models/idefics/perceiver.py | 1 + .../models/idefics/perceiver_tf.py | 195 + .../models/idefics/processing_idefics.py | 171 +- src/transformers/models/idefics/vision.py | 13 +- src/transformers/models/idefics/vision_tf.py | 572 +++ src/transformers/models/idefics2/__init__.py | 72 + .../models/idefics2/configuration_idefics2.py | 262 + .../convert_idefics2_weights_to_hf.py | 185 + .../idefics2/image_processing_idefics2.py | 596 +++ .../models/idefics2/modeling_idefics2.py | 1708 +++++++ .../models/idefics2/processing_idefics2.py | 253 + src/transformers/models/imagegpt/__init__.py | 8 +- .../models/imagegpt/configuration_imagegpt.py | 8 +- ...onvert_imagegpt_original_tf2_to_pytorch.py | 1 - .../imagegpt/image_processing_imagegpt.py | 14 +- .../models/imagegpt/modeling_imagegpt.py | 18 +- src/transformers/models/informer/__init__.py | 9 +- .../models/informer/configuration_informer.py | 7 - .../models/informer/modeling_informer.py | 10 +- .../models/instructblip/__init__.py | 4 - .../configuration_instructblip.py | 23 +- ...onvert_instructblip_original_to_pytorch.py | 2 +- .../instructblip/modeling_instructblip.py | 148 +- .../instructblip/processing_instructblip.py | 69 +- .../models/instructblipvideo/__init__.py | 83 + .../configuration_instructblipvideo.py | 375 ++ ...t_instructblipvideo_original_to_pytorch.py | 305 ++ .../diff_instructblipvideo.py | 460 ++ .../image_processing_instructblipvideo.py | 345 ++ .../modeling_instructblipvideo.py | 1696 +++++++ .../processing_instructblipvideo.py | 219 + src/transformers/models/jamba/__init__.py | 58 + .../models/jamba/configuration_jamba.py | 224 + .../models/jamba/modeling_jamba.py | 1717 +++++++ src/transformers/models/jetmoe/__init__.py | 56 + .../models/jetmoe/configuration_jetmoe.py | 149 + .../models/jetmoe/modeling_jetmoe.py | 1500 ++++++ src/transformers/models/kosmos2/__init__.py | 6 +- .../models/kosmos2/configuration_kosmos2.py | 11 +- .../models/kosmos2/modeling_kosmos2.py | 20 +- .../models/kosmos2/processing_kosmos2.py | 5 +- src/transformers/models/layoutlm/__init__.py | 8 +- .../models/layoutlm/configuration_layoutlm.py | 12 +- .../models/layoutlm/modeling_layoutlm.py | 26 +- .../models/layoutlm/modeling_tf_layoutlm.py | 97 +- .../models/layoutlm/tokenization_layoutlm.py | 30 +- .../layoutlm/tokenization_layoutlm_fast.py | 34 +- .../models/layoutlmv2/__init__.py | 6 +- .../layoutlmv2/configuration_layoutlmv2.py | 11 +- .../layoutlmv2/image_processing_layoutlmv2.py | 20 +- .../models/layoutlmv2/modeling_layoutlmv2.py | 60 +- .../layoutlmv2/tokenization_layoutlmv2.py | 30 +- .../tokenization_layoutlmv2_fast.py | 24 - .../models/layoutlmv3/__init__.py | 6 - .../layoutlmv3/configuration_layoutlmv3.py | 8 +- .../layoutlmv3/image_processing_layoutlmv3.py | 32 +- .../models/layoutlmv3/modeling_layoutlmv3.py | 41 +- .../layoutlmv3/modeling_tf_layoutlmv3.py | 111 +- .../layoutlmv3/tokenization_layoutlmv3.py | 18 - .../tokenization_layoutlmv3_fast.py | 18 - .../models/layoutxlm/processing_layoutxlm.py | 1 + .../layoutxlm/tokenization_layoutxlm.py | 7 +- .../layoutxlm/tokenization_layoutxlm_fast.py | 12 +- src/transformers/models/led/__init__.py | 6 +- .../models/led/configuration_led.py | 7 +- src/transformers/models/led/modeling_led.py | 63 +- .../models/led/modeling_tf_led.py | 132 +- .../models/led/tokenization_led.py | 17 - .../models/led/tokenization_led_fast.py | 18 - src/transformers/models/levit/__init__.py | 6 +- .../models/levit/configuration_levit.py | 7 +- .../levit/convert_levit_timm_to_pytorch.py | 1 - 
.../models/levit/image_processing_levit.py | 31 +- .../models/levit/modeling_levit.py | 8 +- src/transformers/models/lilt/__init__.py | 6 +- .../models/lilt/configuration_lilt.py | 8 +- src/transformers/models/lilt/modeling_lilt.py | 13 +- src/transformers/models/llama/__init__.py | 15 +- .../models/llama/configuration_llama.py | 97 +- .../llama/convert_llama_weights_to_hf.py | 254 +- .../models/llama/modeling_flax_llama.py | 24 +- .../models/llama/modeling_llama.py | 1046 ++-- .../models/llama/tokenization_llama.py | 120 +- .../models/llama/tokenization_llama_fast.py | 116 +- src/transformers/models/llava/__init__.py | 13 +- .../models/llava/configuration_llava.py | 53 +- .../llava/convert_llava_weights_to_hf.py | 116 +- .../models/llava/modeling_llava.py | 230 +- .../models/llava/processing_llava.py | 83 +- .../models/llava_next/__init__.py | 72 + .../llava_next/configuration_llava_next.py | 144 + .../convert_llava_next_weights_to_hf.py | 397 ++ .../llava_next/image_processing_llava_next.py | 754 +++ .../models/llava_next/modeling_llava_next.py | 959 ++++ .../llava_next/processing_llava_next.py | 241 + .../models/llava_next_video/__init__.py | 70 + .../configuration_llava_next_video.py | 167 + .../convert_llava_next_video_weights_to_hf.py | 276 + .../llava_next_video/diff_llava_next_video.py | 591 +++ .../image_processing_llava_next_video.py | 421 ++ .../modeling_llava_next_video.py | 1115 +++++ .../processing_llava_next_video.py | 238 + .../models/longformer/__init__.py | 6 - .../longformer/configuration_longformer.py | 17 +- ...r_original_pytorch_lightning_to_pytorch.py | 1 - .../models/longformer/modeling_longformer.py | 55 +- .../longformer/modeling_tf_longformer.py | 135 +- .../longformer/tokenization_longformer.py | 45 +- .../tokenization_longformer_fast.py | 65 +- src/transformers/models/longt5/__init__.py | 6 +- .../models/longt5/configuration_longt5.py | 10 +- .../convert_longt5x_checkpoint_to_flax.py | 6 +- .../models/longt5/modeling_flax_longt5.py | 13 +- .../models/longt5/modeling_longt5.py | 11 +- src/transformers/models/luke/__init__.py | 6 +- .../models/luke/configuration_luke.py | 7 +- src/transformers/models/luke/modeling_luke.py | 56 +- .../models/luke/tokenization_luke.py | 23 +- src/transformers/models/lxmert/__init__.py | 6 +- .../models/lxmert/configuration_lxmert.py | 7 +- ...xmert_original_tf_checkpoint_to_pytorch.py | 1 - .../models/lxmert/modeling_lxmert.py | 24 +- .../models/lxmert/modeling_tf_lxmert.py | 114 +- .../models/lxmert/tokenization_lxmert.py | 21 +- .../models/lxmert/tokenization_lxmert_fast.py | 22 - src/transformers/models/m2m_100/__init__.py | 6 +- .../models/m2m_100/configuration_m2m_100.py | 8 +- .../models/m2m_100/modeling_m2m_100.py | 192 +- .../models/m2m_100/tokenization_m2m_100.py | 21 +- src/transformers/models/mamba/__init__.py | 58 + .../models/mamba/configuration_mamba.py | 157 + ...convert_mamba_ssm_checkpoint_to_pytorch.py | 153 + .../models/mamba/modeling_mamba.py | 808 +++ src/transformers/models/mamba2/__init__.py | 58 + .../models/mamba2/configuration_mamba2.py | 180 + ...onvert_mamba2_ssm_checkpoint_to_pytorch.py | 69 + .../models/mamba2/modeling_mamba2.py | 1081 ++++ src/transformers/models/marian/__init__.py | 6 +- .../models/marian/configuration_marian.py | 8 +- .../convert_marian_tatoeba_to_pytorch.py | 7 +- .../marian/convert_marian_to_pytorch.py | 8 +- .../models/marian/modeling_flax_marian.py | 6 +- .../models/marian/modeling_marian.py | 15 +- .../models/marian/modeling_tf_marian.py | 72 +- 
.../models/marian/tokenization_marian.py | 22 - src/transformers/models/markuplm/__init__.py | 6 +- .../models/markuplm/configuration_markuplm.py | 7 +- .../markuplm/feature_extraction_markuplm.py | 2 +- .../models/markuplm/modeling_markuplm.py | 24 +- .../models/markuplm/processing_markuplm.py | 1 + .../models/markuplm/tokenization_markuplm.py | 19 - .../markuplm/tokenization_markuplm_fast.py | 19 - .../models/mask2former/__init__.py | 9 +- .../mask2former/configuration_mask2former.py | 45 +- .../image_processing_mask2former.py | 159 +- .../mask2former/modeling_mask2former.py | 62 +- .../models/maskformer/__init__.py | 6 +- .../maskformer/configuration_maskformer.py | 45 +- .../configuration_maskformer_swin.py | 2 +- .../convert_maskformer_resnet_to_pytorch.py | 5 +- .../convert_maskformer_swin_to_pytorch.py | 5 +- .../maskformer/image_processing_maskformer.py | 174 +- .../models/maskformer/modeling_maskformer.py | 160 +- .../maskformer/modeling_maskformer_swin.py | 52 +- src/transformers/models/mbart/__init__.py | 6 +- .../models/mbart/configuration_mbart.py | 8 +- .../models/mbart/modeling_flax_mbart.py | 4 +- .../models/mbart/modeling_mbart.py | 172 +- .../models/mbart/modeling_tf_mbart.py | 80 +- .../models/mbart/tokenization_mbart.py | 17 - .../models/mbart/tokenization_mbart_fast.py | 21 - .../models/mbart50/tokenization_mbart50.py | 14 +- .../mbart50/tokenization_mbart50_fast.py | 18 - .../models/megatron_bert/__init__.py | 6 +- .../configuration_megatron_bert.py | 10 +- .../megatron_bert/modeling_megatron_bert.py | 16 +- ...eckpoint_reshaping_and_interoperability.py | 2 +- .../convert_megatron_gpt2_checkpoint.py | 4 +- src/transformers/models/mgp_str/__init__.py | 6 +- .../models/mgp_str/configuration_mgp_str.py | 6 +- .../models/mgp_str/modeling_mgp_str.py | 8 +- .../models/mgp_str/processing_mgp_str.py | 4 +- .../models/mgp_str/tokenization_mgp_str.py | 10 - src/transformers/models/mistral/__init__.py | 58 +- .../models/mistral/configuration_mistral.py | 13 +- .../mistral/convert_mistral_weights_to_hf.py | 30 +- .../models/mistral/modeling_flax_mistral.py | 742 +++ .../models/mistral/modeling_mistral.py | 780 ++- .../models/mistral/modeling_tf_mistral.py | 1055 ++++ src/transformers/models/mixtral/__init__.py | 6 +- .../models/mixtral/configuration_mixtral.py | 14 +- .../models/mixtral/modeling_mixtral.py | 746 +-- .../models/mluke/tokenization_mluke.py | 22 +- .../models/mobilebert/__init__.py | 6 - .../mobilebert/configuration_mobilebert.py | 11 +- .../models/mobilebert/modeling_mobilebert.py | 16 +- .../mobilebert/modeling_tf_mobilebert.py | 107 +- .../mobilebert/tokenization_mobilebert.py | 17 +- .../tokenization_mobilebert_fast.py | 15 - .../models/mobilenet_v1/__init__.py | 4 - .../configuration_mobilenet_v1.py | 8 +- ...nvert_original_tf_checkpoint_to_pytorch.py | 1 - .../image_processing_mobilenet_v1.py | 29 +- .../mobilenet_v1/modeling_mobilenet_v1.py | 11 +- .../models/mobilenet_v2/__init__.py | 4 - .../configuration_mobilenet_v2.py | 10 +- ...nvert_original_tf_checkpoint_to_pytorch.py | 1 - .../image_processing_mobilenet_v2.py | 30 +- .../mobilenet_v2/modeling_mobilenet_v2.py | 55 +- src/transformers/models/mobilevit/__init__.py | 8 +- .../mobilevit/configuration_mobilevit.py | 20 +- .../mobilevit/convert_mlcvnets_to_pytorch.py | 1 - .../mobilevit/image_processing_mobilevit.py | 225 +- .../models/mobilevit/modeling_mobilevit.py | 46 +- .../models/mobilevit/modeling_tf_mobilevit.py | 113 +- .../models/mobilevitv2/__init__.py | 4 - 
.../mobilevitv2/configuration_mobilevitv2.py | 6 +- .../convert_mlcvnets_to_pytorch.py | 1 - .../mobilevitv2/modeling_mobilevitv2.py | 28 +- src/transformers/models/mpnet/__init__.py | 8 +- .../models/mpnet/configuration_mpnet.py | 6 +- .../models/mpnet/modeling_mpnet.py | 11 +- .../models/mpnet/modeling_tf_mpnet.py | 82 +- .../models/mpnet/tokenization_mpnet.py | 21 +- .../models/mpnet/tokenization_mpnet_fast.py | 20 - src/transformers/models/mpt/__init__.py | 6 +- .../models/mpt/configuration_mpt.py | 7 +- src/transformers/models/mpt/modeling_mpt.py | 25 +- src/transformers/models/mra/__init__.py | 6 +- .../models/mra/configuration_mra.py | 8 +- src/transformers/models/mra/modeling_mra.py | 51 +- src/transformers/models/mt5/__init__.py | 2 + .../models/mt5/configuration_mt5.py | 33 +- .../models/mt5/modeling_flax_mt5.py | 2 +- src/transformers/models/mt5/modeling_mt5.py | 109 +- .../models/mt5/modeling_tf_mt5.py | 2 +- src/transformers/models/musicgen/__init__.py | 4 - .../models/musicgen/configuration_musicgen.py | 24 +- .../musicgen/convert_musicgen_transformers.py | 19 +- .../models/musicgen/modeling_musicgen.py | 718 ++- .../models/musicgen/processing_musicgen.py | 1 + .../models/musicgen_melody/__init__.py | 86 + .../configuration_musicgen_melody.py | 269 + .../convert_musicgen_melody_transformers.py | 267 + .../feature_extraction_musicgen_melody.py | 331 ++ .../modeling_musicgen_melody.py | 2577 ++++++++++ .../processing_musicgen_melody.py | 175 + src/transformers/models/mvp/__init__.py | 6 +- .../models/mvp/configuration_mvp.py | 7 +- src/transformers/models/mvp/modeling_mvp.py | 22 +- .../models/mvp/tokenization_mvp.py | 17 - .../models/mvp/tokenization_mvp_fast.py | 20 - src/transformers/models/nemotron/__init__.py | 68 + .../models/nemotron/configuration_nemotron.py | 153 + .../nemotron/convert_nemotron_nemo_to_hf.py | 346 ++ .../models/nemotron/modeling_nemotron.py | 1484 ++++++ .../models/nllb/tokenization_nllb.py | 59 +- .../models/nllb/tokenization_nllb_fast.py | 42 +- src/transformers/models/nllb_moe/__init__.py | 10 +- .../models/nllb_moe/configuration_nllb_moe.py | 7 +- .../models/nllb_moe/modeling_nllb_moe.py | 44 +- .../models/nougat/convert_nougat_to_hf.py | 24 +- .../models/nougat/image_processing_nougat.py | 29 +- .../models/nougat/tokenization_nougat_fast.py | 10 +- .../models/nystromformer/__init__.py | 6 +- .../configuration_nystromformer.py | 9 +- .../nystromformer/modeling_nystromformer.py | 12 +- src/transformers/models/olmo/__init__.py | 59 + .../models/olmo/configuration_olmo.py | 181 + .../models/olmo/convert_olmo_weights_to_hf.py | 248 + src/transformers/models/olmo/modeling_olmo.py | 1215 +++++ src/transformers/models/oneformer/__init__.py | 6 +- .../oneformer/configuration_oneformer.py | 40 +- .../oneformer/image_processing_oneformer.py | 189 +- .../models/oneformer/modeling_oneformer.py | 38 +- .../models/oneformer/processing_oneformer.py | 3 +- src/transformers/models/openai/__init__.py | 8 +- .../models/openai/configuration_openai.py | 6 +- ...penai_original_tf_checkpoint_to_pytorch.py | 1 - .../models/openai/modeling_openai.py | 14 +- .../models/openai/modeling_tf_openai.py | 40 +- .../models/openai/tokenization_openai.py | 14 +- .../models/openai/tokenization_openai_fast.py | 13 - src/transformers/models/opt/__init__.py | 6 +- .../models/opt/configuration_opt.py | 12 +- ..._original_pytorch_checkpoint_to_pytorch.py | 1 - .../models/opt/modeling_flax_opt.py | 2 +- src/transformers/models/opt/modeling_opt.py | 176 +- .../models/opt/modeling_tf_opt.py 
| 44 +- src/transformers/models/owlv2/__init__.py | 4 - .../models/owlv2/configuration_owlv2.py | 12 +- .../models/owlv2/image_processing_owlv2.py | 49 +- .../models/owlv2/modeling_owlv2.py | 147 +- .../models/owlv2/processing_owlv2.py | 3 +- src/transformers/models/owlvit/__init__.py | 4 - .../models/owlvit/configuration_owlvit.py | 14 +- .../models/owlvit/image_processing_owlvit.py | 45 +- .../models/owlvit/modeling_owlvit.py | 82 +- .../models/owlvit/processing_owlvit.py | 3 +- src/transformers/models/paligemma/__init__.py | 54 + .../paligemma/configuration_paligemma.py | 160 + .../convert_paligemma_weights_to_hf.py | 347 ++ .../models/paligemma/modeling_paligemma.py | 528 ++ .../models/paligemma/processing_paligemma.py | 308 ++ .../models/patchtsmixer/__init__.py | 8 +- .../configuration_patchtsmixer.py | 16 +- .../patchtsmixer/modeling_patchtsmixer.py | 84 +- src/transformers/models/patchtst/__init__.py | 9 +- .../models/patchtst/configuration_patchtst.py | 9 - .../models/patchtst/modeling_patchtst.py | 28 +- src/transformers/models/pegasus/__init__.py | 6 +- .../models/pegasus/configuration_pegasus.py | 7 +- .../models/pegasus/modeling_flax_pegasus.py | 7 +- .../models/pegasus/modeling_pegasus.py | 29 +- .../models/pegasus/modeling_tf_pegasus.py | 76 +- .../models/pegasus/tokenization_pegasus.py | 10 - .../pegasus/tokenization_pegasus_fast.py | 16 +- src/transformers/models/pegasus_x/__init__.py | 6 +- .../pegasus_x/configuration_pegasus_x.py | 8 +- .../models/pegasus_x/modeling_pegasus_x.py | 49 +- src/transformers/models/perceiver/__init__.py | 6 +- .../perceiver/configuration_perceiver.py | 7 +- .../convert_perceiver_haiku_to_pytorch.py | 1 - .../perceiver/image_processing_perceiver.py | 29 +- .../models/perceiver/modeling_perceiver.py | 106 +- .../perceiver/tokenization_perceiver.py | 3 +- src/transformers/models/persimmon/__init__.py | 6 +- .../persimmon/configuration_persimmon.py | 10 +- .../models/persimmon/modeling_persimmon.py | 427 +- src/transformers/models/phi/__init__.py | 6 +- .../models/phi/configuration_phi.py | 32 +- .../models/phi/convert_phi_weights_to_hf.py | 92 +- src/transformers/models/phi/modeling_phi.py | 717 ++- src/transformers/models/phi3/__init__.py | 67 + .../models/phi3/configuration_phi3.py | 221 + src/transformers/models/phi3/modeling_phi3.py | 1571 ++++++ .../models/phobert/tokenization_phobert.py | 21 +- .../models/pix2struct/__init__.py | 4 - .../pix2struct/configuration_pix2struct.py | 14 +- .../pix2struct/image_processing_pix2struct.py | 16 +- .../models/pix2struct/modeling_pix2struct.py | 24 +- src/transformers/models/plbart/__init__.py | 6 +- .../models/plbart/configuration_plbart.py | 8 +- .../models/plbart/modeling_plbart.py | 43 +- .../models/plbart/tokenization_plbart.py | 59 - .../models/poolformer/__init__.py | 4 - .../poolformer/configuration_poolformer.py | 8 +- .../poolformer/image_processing_poolformer.py | 29 +- .../models/poolformer/modeling_poolformer.py | 9 +- src/transformers/models/pop2piano/__init__.py | 6 +- .../pop2piano/configuration_pop2piano.py | 7 +- .../convert_pop2piano_weights_to_hf.py | 4 +- .../pop2piano/feature_extraction_pop2piano.py | 2 +- .../models/pop2piano/modeling_pop2piano.py | 18 +- .../models/pop2piano/processing_pop2piano.py | 2 +- .../pop2piano/tokenization_pop2piano.py | 17 +- .../models/prophetnet/__init__.py | 6 +- .../prophetnet/configuration_prophetnet.py | 8 +- ..._original_pytorch_checkpoint_to_pytorch.py | 1 - .../models/prophetnet/modeling_prophetnet.py | 11 +- 
.../prophetnet/tokenization_prophetnet.py | 23 +- src/transformers/models/pvt/__init__.py | 6 +- .../models/pvt/configuration_pvt.py | 7 +- .../models/pvt/convert_pvt_to_pytorch.py | 1 - .../models/pvt/image_processing_pvt.py | 21 +- src/transformers/models/pvt/modeling_pvt.py | 8 +- src/transformers/models/pvt_v2/__init__.py | 64 + .../models/pvt_v2/configuration_pvt_v2.py | 153 + .../pvt_v2/convert_pvt_v2_to_pytorch.py | 295 ++ .../models/pvt_v2/modeling_pvt_v2.py | 700 +++ src/transformers/models/qwen2/__init__.py | 82 + .../models/qwen2/configuration_qwen2.py | 140 + .../models/qwen2/modeling_qwen2.py | 1423 ++++++ .../models/qwen2/tokenization_qwen2.py | 339 ++ .../models/qwen2/tokenization_qwen2_fast.py | 134 + .../models/qwen2_audio/__init__.py | 57 + .../qwen2_audio/configuration_qwen2_audio.py | 199 + .../qwen2_audio/modeling_qwen2_audio.py | 1375 +++++ .../qwen2_audio/processing_qwen2_audio.py | 177 + src/transformers/models/qwen2_moe/__init__.py | 64 + .../qwen2_moe/configuration_qwen2_moe.py | 177 + .../models/qwen2_moe/modeling_qwen2_moe.py | 1620 ++++++ .../models/rag/configuration_rag.py | 5 +- src/transformers/models/rag/modeling_rag.py | 71 +- .../models/rag/modeling_tf_rag.py | 42 +- src/transformers/models/rag/retrieval_rag.py | 12 +- .../models/rag/tokenization_rag.py | 1 + .../models/recurrent_gemma/__init__.py | 59 + .../configuration_recurrent_gemma.py | 158 + .../convert_recurrent_gemma_to_hf.py | 222 + .../modeling_recurrent_gemma.py | 948 ++++ src/transformers/models/reformer/__init__.py | 6 +- .../models/reformer/configuration_reformer.py | 9 +- ...ert_reformer_trax_checkpoint_to_pytorch.py | 1 - .../models/reformer/modeling_reformer.py | 18 +- .../models/reformer/tokenization_reformer.py | 17 +- .../reformer/tokenization_reformer_fast.py | 22 +- src/transformers/models/regnet/__init__.py | 8 +- .../models/regnet/configuration_regnet.py | 6 +- .../convert_regnet_seer_10b_to_pytorch.py | 4 +- .../regnet/convert_regnet_to_pytorch.py | 5 +- .../models/regnet/modeling_regnet.py | 18 +- .../models/regnet/modeling_tf_regnet.py | 66 +- src/transformers/models/rembert/__init__.py | 10 +- .../models/rembert/configuration_rembert.py | 10 +- ...onvert_rembert_tf_checkpoint_to_pytorch.py | 1 - .../models/rembert/modeling_rembert.py | 8 +- .../models/rembert/modeling_tf_rembert.py | 95 +- .../models/rembert/tokenization_rembert.py | 13 - .../rembert/tokenization_rembert_fast.py | 17 +- src/transformers/models/resnet/__init__.py | 10 +- .../models/resnet/configuration_resnet.py | 6 +- .../resnet/convert_resnet_to_pytorch.py | 1 - .../models/resnet/modeling_resnet.py | 16 +- .../models/resnet/modeling_tf_resnet.py | 57 +- src/transformers/models/roberta/__init__.py | 8 +- .../models/roberta/configuration_roberta.py | 14 +- ..._original_pytorch_checkpoint_to_pytorch.py | 1 - .../models/roberta/modeling_flax_roberta.py | 2 +- .../models/roberta/modeling_roberta.py | 35 +- .../models/roberta/modeling_tf_roberta.py | 90 +- .../models/roberta/tokenization_roberta.py | 36 +- .../roberta/tokenization_roberta_fast.py | 49 +- .../models/roberta_prelayernorm/__init__.py | 6 - .../configuration_roberta_prelayernorm.py | 11 +- ..._original_pytorch_checkpoint_to_pytorch.py | 1 - .../modeling_flax_roberta_prelayernorm.py | 3 +- .../modeling_roberta_prelayernorm.py | 16 +- .../modeling_tf_roberta_prelayernorm.py | 94 +- src/transformers/models/roc_bert/__init__.py | 6 +- .../models/roc_bert/configuration_roc_bert.py | 8 +- .../models/roc_bert/modeling_roc_bert.py | 28 +- 
.../models/roc_bert/tokenization_roc_bert.py | 29 +- src/transformers/models/roformer/__init__.py | 10 +- .../models/roformer/configuration_roformer.py | 22 +- ...ormer_original_tf_checkpoint_to_pytorch.py | 1 - .../models/roformer/modeling_flax_roformer.py | 12 +- .../models/roformer/modeling_roformer.py | 18 +- .../models/roformer/modeling_tf_roformer.py | 102 +- .../models/roformer/tokenization_roformer.py | 45 +- .../roformer/tokenization_roformer_fast.py | 60 +- src/transformers/models/rt_detr/__init__.py | 78 + .../models/rt_detr/configuration_rt_detr.py | 361 ++ .../rt_detr/configuration_rt_detr_resnet.py | 111 + ..._detr_original_pytorch_checkpoint_to_hf.py | 782 +++ .../rt_detr/image_processing_rt_detr.py | 1098 ++++ .../models/rt_detr/modeling_rt_detr.py | 2699 ++++++++++ .../models/rt_detr/modeling_rt_detr_resnet.py | 434 ++ src/transformers/models/rwkv/__init__.py | 6 +- .../models/rwkv/configuration_rwkv.py | 17 +- .../rwkv/convert_rwkv_checkpoint_to_hf.py | 1 - src/transformers/models/rwkv/modeling_rwkv.py | 29 +- src/transformers/models/sam/__init__.py | 8 +- .../models/sam/configuration_sam.py | 9 +- ...l_to_hf_format.py => convert_sam_to_hf.py} | 137 +- .../models/sam/image_processing_sam.py | 305 +- src/transformers/models/sam/modeling_sam.py | 23 +- .../models/sam/modeling_tf_sam.py | 114 +- src/transformers/models/sam/processing_sam.py | 3 + .../models/seamless_m4t/__init__.py | 6 +- .../configuration_seamless_m4t.py | 9 +- .../seamless_m4t/convert_fairseq2_to_hf.py | 3 +- .../feature_extraction_seamless_m4t.py | 17 +- .../seamless_m4t/modeling_seamless_m4t.py | 84 +- .../seamless_m4t/tokenization_seamless_m4t.py | 35 +- .../tokenization_seamless_m4t_fast.py | 16 +- .../models/seamless_m4t_v2/__init__.py | 6 +- .../configuration_seamless_m4t_v2.py | 10 +- .../seamless_m4t_v2/convert_fairseq2_to_hf.py | 3 +- .../modeling_seamless_m4t_v2.py | 68 +- src/transformers/models/segformer/__init__.py | 10 +- .../segformer/configuration_segformer.py | 9 +- .../convert_segformer_original_to_pytorch.py | 1 - .../segformer/image_processing_segformer.py | 61 +- .../models/segformer/modeling_segformer.py | 13 +- .../models/segformer/modeling_tf_segformer.py | 104 +- src/transformers/models/seggpt/__init__.py | 67 + .../models/seggpt/configuration_seggpt.py | 140 + .../models/seggpt/convert_seggpt_to_hf.py | 221 + .../models/seggpt/image_processing_seggpt.py | 615 +++ .../models/seggpt/modeling_seggpt.py | 1021 ++++ src/transformers/models/sew/__init__.py | 6 +- .../models/sew/configuration_sew.py | 7 +- ..._original_pytorch_checkpoint_to_pytorch.py | 1 - src/transformers/models/sew/modeling_sew.py | 335 +- src/transformers/models/sew_d/__init__.py | 6 +- .../models/sew_d/configuration_sew_d.py | 12 +- ..._original_pytorch_checkpoint_to_pytorch.py | 1 - .../models/sew_d/modeling_sew_d.py | 54 +- src/transformers/models/siglip/__init__.py | 108 + .../models/siglip/configuration_siglip.py | 298 ++ .../models/siglip/convert_siglip_to_hf.py | 412 ++ .../models/siglip/image_processing_siglip.py | 241 + .../models/siglip/modeling_siglip.py | 1568 ++++++ .../models/siglip/processing_siglip.py | 142 + .../models/siglip/tokenization_siglip.py | 375 ++ .../configuration_speech_encoder_decoder.py | 2 +- ...rt_wav2vec2_seq2seq_original_to_pytorch.py | 1 - ...xt_wav2vec2_seq2seq_original_to_pytorch.py | 1 - .../modeling_flax_speech_encoder_decoder.py | 6 +- .../modeling_speech_encoder_decoder.py | 14 +- .../models/speech_to_text/__init__.py | 8 +- .../configuration_speech_to_text.py | 9 +- 
.../feature_extraction_speech_to_text.py | 2 +- .../speech_to_text/modeling_speech_to_text.py | 12 +- .../modeling_tf_speech_to_text.py | 76 +- .../processing_speech_to_text.py | 1 + .../tokenization_speech_to_text.py | 15 +- src/transformers/models/speecht5/__init__.py | 6 - .../models/speecht5/configuration_speecht5.py | 12 +- .../models/speecht5/modeling_speecht5.py | 78 +- .../models/speecht5/tokenization_speecht5.py | 23 +- src/transformers/models/splinter/__init__.py | 6 +- .../models/splinter/configuration_splinter.py | 12 +- .../models/splinter/modeling_splinter.py | 22 +- .../models/splinter/tokenization_splinter.py | 30 +- .../splinter/tokenization_splinter_fast.py | 26 - .../models/squeezebert/__init__.py | 4 - .../squeezebert/configuration_squeezebert.py | 17 +- .../squeezebert/modeling_squeezebert.py | 13 +- .../squeezebert/tokenization_squeezebert.py | 32 +- .../tokenization_squeezebert_fast.py | 39 - src/transformers/models/stablelm/__init__.py | 64 + .../models/stablelm/configuration_stablelm.py | 185 + .../models/stablelm/modeling_stablelm.py | 1519 ++++++ .../models/starcoder2/__init__.py | 64 + .../starcoder2/configuration_starcoder2.py | 145 + .../models/starcoder2/modeling_starcoder2.py | 1400 ++++++ .../models/superpoint/__init__.py | 69 + .../superpoint/configuration_superpoint.py | 87 + .../convert_superpoint_to_pytorch.py | 175 + .../superpoint/image_processing_superpoint.py | 272 + .../models/superpoint/modeling_superpoint.py | 499 ++ .../models/swiftformer/__init__.py | 28 +- .../swiftformer/configuration_swiftformer.py | 18 +- .../convert_swiftformer_original_to_hf.py | 1 - .../swiftformer/modeling_swiftformer.py | 37 +- .../swiftformer/modeling_tf_swiftformer.py | 863 ++++ src/transformers/models/swin/__init__.py | 8 +- .../models/swin/configuration_swin.py | 9 +- .../swin/convert_swin_simmim_to_pytorch.py | 12 +- .../swin/convert_swin_timm_to_pytorch.py | 12 +- src/transformers/models/swin/modeling_swin.py | 123 +- .../models/swin/modeling_tf_swin.py | 121 +- src/transformers/models/swin2sr/__init__.py | 6 +- .../models/swin2sr/configuration_swin2sr.py | 8 +- .../convert_swin2sr_original_to_pytorch.py | 24 +- .../swin2sr/image_processing_swin2sr.py | 14 +- .../models/swin2sr/modeling_swin2sr.py | 25 +- src/transformers/models/swinv2/__init__.py | 6 +- .../models/swinv2/configuration_swinv2.py | 8 +- .../swinv2/convert_swinv2_timm_to_pytorch.py | 18 +- .../models/swinv2/modeling_swinv2.py | 108 +- .../models/switch_transformers/__init__.py | 4 - .../configuration_switch_transformers.py | 7 +- .../switch_transformers/convert_big_switch.py | 2 +- .../modeling_switch_transformers.py | 34 +- src/transformers/models/t5/__init__.py | 10 +- .../models/t5/configuration_t5.py | 13 +- ...rt_t5_original_tf_checkpoint_to_pytorch.py | 1 - .../t5/convert_t5x_checkpoint_to_flax.py | 134 +- .../models/t5/modeling_flax_t5.py | 25 +- src/transformers/models/t5/modeling_t5.py | 124 +- src/transformers/models/t5/modeling_tf_t5.py | 120 +- src/transformers/models/t5/tokenization_t5.py | 50 +- .../models/t5/tokenization_t5_fast.py | 40 +- .../models/table_transformer/__init__.py | 4 - .../configuration_table_transformer.py | 52 +- .../convert_table_transformer_to_hf.py | 1 - ...convert_table_transformer_to_hf_no_timm.py | 1 - .../modeling_table_transformer.py | 139 +- src/transformers/models/tapas/__init__.py | 8 +- .../models/tapas/configuration_tapas.py | 17 - ...tapas_original_tf_checkpoint_to_pytorch.py | 1 - .../models/tapas/modeling_tapas.py | 41 +- 
.../models/tapas/modeling_tf_tapas.py | 134 +- .../models/tapas/tokenization_tapas.py | 111 +- .../time_series_transformer/__init__.py | 8 +- .../configuration_time_series_transformer.py | 9 +- .../modeling_time_series_transformer.py | 8 +- .../models/timesformer/__init__.py | 6 +- .../timesformer/configuration_timesformer.py | 8 +- .../timesformer/modeling_timesformer.py | 9 +- .../configuration_timm_backbone.py | 4 +- .../timm_backbone/modeling_timm_backbone.py | 17 +- src/transformers/models/trocr/__init__.py | 7 +- .../models/trocr/configuration_trocr.py | 9 +- .../trocr/convert_trocr_unilm_to_pytorch.py | 3 +- .../models/trocr/modeling_trocr.py | 38 +- .../models/trocr/processing_trocr.py | 1 + src/transformers/models/tvp/__init__.py | 8 +- .../models/tvp/configuration_tvp.py | 41 +- .../models/tvp/image_processing_tvp.py | 32 +- src/transformers/models/tvp/modeling_tvp.py | 155 +- src/transformers/models/tvp/processing_tvp.py | 1 - src/transformers/models/udop/__init__.py | 96 + .../models/udop/configuration_udop.py | 157 + .../models/udop/convert_udop_to_hf.py | 224 + src/transformers/models/udop/modeling_udop.py | 2041 ++++++++ .../models/udop/processing_udop.py | 204 + .../models/udop/tokenization_udop.py | 1464 ++++++ .../models/udop/tokenization_udop_fast.py | 1012 ++++ src/transformers/models/umt5/__init__.py | 2 + .../models/umt5/configuration_umt5.py | 38 +- .../convert_umt5_checkpoint_to_pytorch.py | 6 +- src/transformers/models/umt5/modeling_umt5.py | 86 +- src/transformers/models/unispeech/__init__.py | 6 +- .../unispeech/configuration_unispeech.py | 11 +- ..._original_pytorch_checkpoint_to_pytorch.py | 1 - .../models/unispeech/modeling_unispeech.py | 322 +- .../models/unispeech_sat/__init__.py | 6 +- .../configuration_unispeech_sat.py | 11 +- ...ch_original_s3prl_checkpoint_to_pytorch.py | 1 - ..._original_pytorch_checkpoint_to_pytorch.py | 1 - .../unispeech_sat/modeling_unispeech_sat.py | 344 +- src/transformers/models/univnet/__init__.py | 8 +- .../models/univnet/configuration_univnet.py | 7 +- .../models/univnet/modeling_univnet.py | 7 +- .../models/upernet/configuration_upernet.py | 35 +- .../models/upernet/modeling_upernet.py | 29 +- .../models/video_llava/__init__.py | 71 + .../video_llava/configuration_video_llava.py | 134 + .../convert_video_llava_weights_to_hf.py | 159 + .../image_processing_video_llava.py | 404 ++ .../video_llava/modeling_video_llava.py | 698 +++ .../video_llava/processing_video_llava.py | 209 + src/transformers/models/videomae/__init__.py | 6 +- .../models/videomae/configuration_videomae.py | 8 +- .../videomae/image_processing_videomae.py | 28 +- .../models/videomae/modeling_videomae.py | 58 +- src/transformers/models/vilt/__init__.py | 6 +- .../models/vilt/configuration_vilt.py | 8 +- .../vilt/convert_vilt_original_to_pytorch.py | 3 +- .../models/vilt/image_processing_vilt.py | 31 +- src/transformers/models/vilt/modeling_vilt.py | 24 +- src/transformers/models/vipllava/__init__.py | 6 +- .../models/vipllava/configuration_vipllava.py | 28 +- .../convert_vipllava_weights_to_hf.py | 6 +- .../models/vipllava/modeling_vipllava.py | 213 +- .../configuration_vision_encoder_decoder.py | 2 +- .../modeling_flax_vision_encoder_decoder.py | 15 +- .../modeling_tf_vision_encoder_decoder.py | 19 +- .../modeling_vision_encoder_decoder.py | 30 +- .../configuration_vision_text_dual_encoder.py | 24 +- .../modeling_flax_vision_text_dual_encoder.py | 13 +- .../modeling_tf_vision_text_dual_encoder.py | 22 +- .../modeling_vision_text_dual_encoder.py | 19 +- 
.../processing_vision_text_dual_encoder.py | 3 +- .../models/visual_bert/__init__.py | 6 +- .../visual_bert/configuration_visual_bert.py | 23 +- ..._original_pytorch_checkpoint_to_pytorch.py | 1 - .../visual_bert/modeling_visual_bert.py | 56 +- src/transformers/models/vit/__init__.py | 24 +- .../models/vit/configuration_vit.py | 7 +- .../models/vit/convert_dino_to_pytorch.py | 1 - .../models/vit/convert_vit_timm_to_pytorch.py | 1 - .../models/vit/image_processing_vit.py | 21 +- .../models/vit/image_processing_vit_fast.py | 288 ++ .../models/vit/modeling_tf_vit.py | 68 +- src/transformers/models/vit/modeling_vit.py | 55 +- src/transformers/models/vit_mae/__init__.py | 6 +- .../models/vit_mae/configuration_vit_mae.py | 9 +- .../models/vit_mae/modeling_tf_vit_mae.py | 232 +- .../models/vit_mae/modeling_vit_mae.py | 229 +- src/transformers/models/vit_msn/__init__.py | 6 +- .../models/vit_msn/configuration_vit_msn.py | 8 +- .../models/vit_msn/modeling_vit_msn.py | 57 +- src/transformers/models/vitdet/__init__.py | 6 +- .../models/vitdet/configuration_vitdet.py | 7 +- .../models/vitdet/modeling_vitdet.py | 46 +- src/transformers/models/vitmatte/__init__.py | 6 +- .../models/vitmatte/configuration_vitmatte.py | 37 +- .../vitmatte/image_processing_vitmatte.py | 38 +- .../models/vitmatte/modeling_vitmatte.py | 28 +- src/transformers/models/vits/__init__.py | 8 +- .../models/vits/configuration_vits.py | 7 +- src/transformers/models/vits/modeling_vits.py | 17 +- .../models/vits/tokenization_vits.py | 14 - src/transformers/models/vivit/__init__.py | 6 +- .../models/vivit/configuration_vivit.py | 9 +- .../vivit/convert_vivit_flax_to_pytorch.py | 1 + .../models/vivit/image_processing_vivit.py | 28 +- .../models/vivit/modeling_vivit.py | 84 +- src/transformers/models/wav2vec2/__init__.py | 8 +- .../models/wav2vec2/configuration_wav2vec2.py | 11 +- ..._original_pytorch_checkpoint_to_pytorch.py | 22 +- ...c2_original_s3prl_checkpoint_to_pytorch.py | 1 - .../wav2vec2/feature_extraction_wav2vec2.py | 8 +- .../models/wav2vec2/modeling_flax_wav2vec2.py | 2 +- .../models/wav2vec2/modeling_tf_wav2vec2.py | 158 +- .../models/wav2vec2/modeling_wav2vec2.py | 362 +- .../models/wav2vec2/processing_wav2vec2.py | 1 + .../models/wav2vec2/tokenization_wav2vec2.py | 25 +- .../models/wav2vec2_bert/__init__.py | 64 + .../configuration_wav2vec2_bert.py | 310 ++ .../convert_wav2vec2_seamless_checkpoint.py | 217 + .../wav2vec2_bert/modeling_wav2vec2_bert.py | 1667 ++++++ .../wav2vec2_bert/processing_wav2vec2_bert.py | 146 + .../models/wav2vec2_conformer/__init__.py | 8 +- .../configuration_wav2vec2_conformer.py | 16 +- ..._original_pytorch_checkpoint_to_pytorch.py | 1 - .../modeling_wav2vec2_conformer.py | 60 +- .../tokenization_wav2vec2_phoneme.py | 17 - .../processing_wav2vec2_with_lm.py | 32 +- src/transformers/models/wavlm/__init__.py | 6 +- .../models/wavlm/configuration_wavlm.py | 7 +- ..._original_pytorch_checkpoint_to_pytorch.py | 1 - ...lm_original_s3prl_checkpoint_to_pytorch.py | 1 - .../models/wavlm/modeling_wavlm.py | 63 +- src/transformers/models/whisper/__init__.py | 8 +- .../models/whisper/configuration_whisper.py | 11 +- .../models/whisper/convert_openai_to_hf.py | 14 +- .../whisper/feature_extraction_whisper.py | 85 +- .../models/whisper/generation_whisper.py | 1823 +++++++ .../models/whisper/modeling_flax_whisper.py | 4 +- .../models/whisper/modeling_tf_whisper.py | 85 +- .../models/whisper/modeling_whisper.py | 1635 ++---- .../models/whisper/processing_whisper.py | 1 - 
.../models/whisper/tokenization_whisper.py | 130 +- .../whisper/tokenization_whisper_fast.py | 139 +- src/transformers/models/x_clip/__init__.py | 4 - .../models/x_clip/configuration_x_clip.py | 18 +- .../models/x_clip/modeling_x_clip.py | 18 +- src/transformers/models/xglm/__init__.py | 9 +- .../models/xglm/configuration_xglm.py | 7 +- .../models/xglm/modeling_flax_xglm.py | 3 +- .../models/xglm/modeling_tf_xglm.py | 48 +- src/transformers/models/xglm/modeling_xglm.py | 34 +- .../models/xglm/tokenization_xglm.py | 13 +- .../models/xglm/tokenization_xglm_fast.py | 15 - src/transformers/models/xlm/__init__.py | 8 +- .../models/xlm/configuration_xlm.py | 18 +- ..._original_pytorch_checkpoint_to_pytorch.py | 1 - .../models/xlm/modeling_tf_xlm.py | 68 +- src/transformers/models/xlm/modeling_xlm.py | 34 +- .../models/xlm/tokenization_xlm.py | 403 +- .../models/xlm_roberta/__init__.py | 8 - .../xlm_roberta/configuration_xlm_roberta.py | 26 +- .../xlm_roberta/modeling_flax_xlm_roberta.py | 8 +- .../xlm_roberta/modeling_tf_xlm_roberta.py | 89 +- .../xlm_roberta/modeling_xlm_roberta.py | 35 +- .../xlm_roberta/tokenization_xlm_roberta.py | 33 +- .../tokenization_xlm_roberta_fast.py | 49 +- .../models/xlm_roberta_xl/__init__.py | 4 - .../configuration_xlm_roberta_xl.py | 12 +- .../xlm_roberta_xl/modeling_xlm_roberta_xl.py | 49 +- src/transformers/models/xlnet/__init__.py | 8 +- .../models/xlnet/configuration_xlnet.py | 9 +- ...xlnet_original_tf_checkpoint_to_pytorch.py | 1 - .../models/xlnet/modeling_tf_xlnet.py | 76 +- .../models/xlnet/modeling_xlnet.py | 57 +- .../models/xlnet/tokenization_xlnet.py | 16 +- .../models/xlnet/tokenization_xlnet_fast.py | 20 +- src/transformers/models/xmod/__init__.py | 5 +- .../models/xmod/configuration_xmod.py | 15 +- src/transformers/models/xmod/modeling_xmod.py | 19 +- src/transformers/models/yolos/__init__.py | 6 +- .../models/yolos/configuration_yolos.py | 9 +- .../models/yolos/convert_yolos_to_pytorch.py | 1 - .../models/yolos/image_processing_yolos.py | 381 +- .../models/yolos/modeling_yolos.py | 89 +- src/transformers/models/yoso/__init__.py | 6 +- .../models/yoso/configuration_yoso.py | 9 +- src/transformers/models/yoso/modeling_yoso.py | 62 +- src/transformers/models/zoedepth/__init__.py | 67 + .../models/zoedepth/configuration_zoedepth.py | 234 + .../models/zoedepth/convert_zoedepth_to_hf.py | 426 ++ .../zoedepth/image_processing_zoedepth.py | 444 ++ .../models/zoedepth/modeling_zoedepth.py | 1403 ++++++ src/transformers/onnx/__main__.py | 2 +- src/transformers/onnx/convert.py | 54 +- src/transformers/optimization.py | 166 +- src/transformers/optimization_tf.py | 26 +- src/transformers/pipelines/__init__.py | 97 +- .../pipelines/audio_classification.py | 6 +- src/transformers/pipelines/audio_utils.py | 75 +- .../pipelines/automatic_speech_recognition.py | 166 +- src/transformers/pipelines/base.py | 194 +- src/transformers/pipelines/conversational.py | 324 -- .../pipelines/depth_estimation.py | 17 +- .../pipelines/document_question_answering.py | 15 +- .../pipelines/feature_extraction.py | 46 +- src/transformers/pipelines/fill_mask.py | 14 +- .../pipelines/image_classification.py | 111 +- .../pipelines/image_feature_extraction.py | 112 + .../pipelines/image_segmentation.py | 8 +- src/transformers/pipelines/image_to_image.py | 6 +- src/transformers/pipelines/image_to_text.py | 50 +- src/transformers/pipelines/mask_generation.py | 33 +- .../pipelines/object_detection.py | 10 +- src/transformers/pipelines/pt_utils.py | 13 +- .../pipelines/question_answering.py 
| 4 +- .../pipelines/table_question_answering.py | 17 +- .../pipelines/text2text_generation.py | 18 +- .../pipelines/text_classification.py | 25 +- src/transformers/pipelines/text_generation.py | 128 +- src/transformers/pipelines/text_to_audio.py | 5 +- .../pipelines/token_classification.py | 7 +- .../pipelines/video_classification.py | 42 +- .../pipelines/visual_question_answering.py | 53 +- .../zero_shot_audio_classification.py | 14 +- .../pipelines/zero_shot_classification.py | 4 +- .../zero_shot_image_classification.py | 20 +- .../pipelines/zero_shot_object_detection.py | 6 +- src/transformers/processing_utils.py | 733 ++- src/transformers/pytorch_utils.py | 24 +- src/transformers/quantizers/__init__.py | 15 + src/transformers/quantizers/auto.py | 180 + src/transformers/quantizers/base.py | 226 + src/transformers/quantizers/quantizer_aqlm.py | 98 + src/transformers/quantizers/quantizer_awq.py | 126 + .../quantizers/quantizer_bnb_4bit.py | 331 ++ .../quantizers/quantizer_bnb_8bit.py | 300 ++ src/transformers/quantizers/quantizer_eetq.py | 170 + .../quantizers/quantizer_fbgemm_fp8.py | 205 + src/transformers/quantizers/quantizer_gptq.py | 94 + src/transformers/quantizers/quantizer_hqq.py | 200 + .../quantizers/quantizer_quanto.py | 200 + .../quantizers/quantizer_torchao.py | 172 + .../quantizers/quantizers_utils.py | 26 + src/transformers/safetensors_conversion.py | 48 +- src/transformers/testing_utils.py | 242 +- src/transformers/tf_utils.py | 39 + src/transformers/time_series_utils.py | 1 + src/transformers/tokenization_utils.py | 112 +- src/transformers/tokenization_utils_base.py | 602 ++- src/transformers/tokenization_utils_fast.py | 34 +- src/transformers/tools/agents.py | 771 --- src/transformers/tools/image_captioning.py | 51 - src/transformers/tools/image_segmentation.py | 58 - src/transformers/tools/prompts.py | 48 - src/transformers/tools/python_interpreter.py | 253 - src/transformers/tools/text_classification.py | 70 - .../tools/text_question_answering.py | 52 - src/transformers/tools/text_summarization.py | 52 - src/transformers/trainer.py | 1426 +++++- src/transformers/trainer_callback.py | 145 +- src/transformers/trainer_pt_utils.py | 285 +- src/transformers/trainer_seq2seq.py | 52 +- src/transformers/trainer_tf.py | 801 --- src/transformers/trainer_utils.py | 110 +- src/transformers/training_args.py | 616 ++- src/transformers/training_args_seq2seq.py | 3 +- src/transformers/training_args_tf.py | 12 +- src/transformers/utils/__init__.py | 32 +- src/transformers/utils/backbone_utils.py | 101 +- src/transformers/utils/chat_template_utils.py | 316 ++ src/transformers/utils/deprecation.py | 169 + src/transformers/utils/doc.py | 22 +- .../utils/dummy_detectron2_objects.py | 3 - src/transformers/utils/dummy_flax_objects.py | 66 +- src/transformers/utils/dummy_pt_objects.py | 2267 ++++++--- .../utils/dummy_sentencepiece_objects.py | 25 +- src/transformers/utils/dummy_tf_objects.py | 267 +- .../utils/dummy_tokenizers_objects.py | 30 +- .../utils/dummy_torchaudio_objects.py | 16 + .../utils/dummy_torchvision_objects.py | 16 + .../utils/dummy_vision_objects.py | 119 +- src/transformers/utils/fx.py | 396 +- src/transformers/utils/generic.py | 220 +- src/transformers/utils/hub.py | 196 +- src/transformers/utils/import_utils.py | 396 +- src/transformers/utils/logging.py | 20 +- src/transformers/utils/notebook.py | 8 +- src/transformers/utils/peft_utils.py | 8 +- src/transformers/utils/quantization_config.py | 528 +- .../utils/sentencepiece_model_pb2_new.py | 1 + 
src/transformers/utils/versions.py | 2 +- .../adding_a_new_example_script/README.md | 4 +- .../run_{{cookiecutter.example_shortcut}}.py | 30 +- .../ADD_NEW_MODEL_PROPOSAL_TEMPLATE.md | 8 +- templates/adding_a_new_model/README.md | 257 +- .../__init__.py | 286 -- .../configuration.json | 11 - ...on_{{cookiecutter.lowercase_modelname}}.py | 240 - ...ax_{{cookiecutter.lowercase_modelname}}.py | 3240 ------------ ...tf_{{cookiecutter.lowercase_modelname}}.py | 2823 ----------- ...ng_{{cookiecutter.lowercase_modelname}}.py | 3274 ------------ ...ax_{{cookiecutter.lowercase_modelname}}.py | 669 --- ...tf_{{cookiecutter.lowercase_modelname}}.py | 971 ---- ...ng_{{cookiecutter.lowercase_modelname}}.py | 1070 ---- ...ce_{{cookiecutter.lowercase_modelname}}.py | 472 -- ...st_{{cookiecutter.lowercase_modelname}}.py | 201 - ...on_{{cookiecutter.lowercase_modelname}}.py | 332 -- .../{{cookiecutter.lowercase_modelname}}.md | 234 - .../adding_a_new_model/cookiecutter.json | 19 - .../open_model_proposals/ADD_BIG_BIRD.md | 8 +- .../tests/encoder-bert-tokenizer.json | 11 - .../tests/flax-encoder-bert-tokenizer.json | 11 - .../tests/flax-seq-2-seq-bart-tokenizer.json | 11 - .../tests/pt-encoder-bert-tokenizer.json | 11 - .../tests/pt-seq-2-seq-bart-tokenizer.json | 11 - .../adding_a_new_model/tests/standalone.json | 11 - .../tests/tf-encoder-bert-tokenizer.json | 11 - .../tests/tf-seq-2-seq-bart-tokenizer.json | 11 - .../efficientformer => agents}/__init__.py | 0 tests/{tools => agents}/test_agent_types.py | 2 +- tests/agents/test_agents.py | 237 + .../test_document_question_answering.py | 15 - tests/agents/test_final_answer.py | 71 + .../test_image_question_answering.py | 11 - tests/agents/test_python_interpreter.py | 827 +++ .../{tools => agents}/test_speech_to_text.py | 16 +- .../{tools => agents}/test_text_to_speech.py | 16 +- tests/agents/test_tools_common.py | 107 + tests/agents/test_translation.py | 68 + tests/deepspeed/ds_config_zero3.json | 11 +- tests/deepspeed/test_deepspeed.py | 193 +- tests/deepspeed/test_model_zoo.py | 23 +- tests/extended/test_trainer_ext.py | 15 +- .../tests_samples/COCO/000000004016.png | Bin 0 -> 636411 bytes tests/fsdp/test_fsdp.py | 59 +- tests/generation/test_configuration_utils.py | 205 +- tests/generation/test_flax_logits_process.py | 45 +- tests/generation/test_framework_agnostic.py | 16 +- tests/generation/test_logits_process.py | 249 +- tests/generation/test_stopping_criteria.py | 191 +- tests/generation/test_streamers.py | 4 +- tests/generation/test_tf_utils.py | 104 +- tests/generation/test_utils.py | 2749 +++++----- tests/models/albert/test_modeling_albert.py | 9 +- .../albert/test_modeling_flax_albert.py | 4 +- .../models/albert/test_modeling_tf_albert.py | 9 +- .../models/albert/test_tokenization_albert.py | 5 +- tests/models/align/test_modeling_align.py | 67 +- tests/models/align/test_processor_align.py | 9 +- tests/models/altclip/test_modeling_altclip.py | 29 +- ...xtraction_audio_spectrogram_transformer.py | 2 +- ..._modeling_audio_spectrogram_transformer.py | 16 +- tests/models/auto/test_configuration_auto.py | 2 +- .../models/auto/test_image_processing_auto.py | 21 +- tests/models/auto/test_modeling_auto.py | 219 +- tests/models/auto/test_modeling_flax_auto.py | 8 +- tests/models/auto/test_modeling_tf_auto.py | 98 +- tests/models/auto/test_modeling_tf_pytorch.py | 139 +- tests/models/auto/test_processor_auto.py | 256 +- tests/models/auto/test_tokenization_auto.py | 93 +- .../autoformer/test_modeling_autoformer.py | 8 +- 
tests/models/bark/test_modeling_bark.py | 80 +- tests/models/bart/test_modeling_bart.py | 30 +- tests/models/bart/test_modeling_tf_bart.py | 5 - tests/models/bart/test_tokenization_bart.py | 2 + .../barthez/test_tokenization_barthez.py | 3 +- .../bartpho/test_tokenization_bartpho.py | 1 + .../models/beit/test_image_processing_beit.py | 21 +- tests/models/beit/test_modeling_beit.py | 67 +- tests/models/bert/test_modeling_bert.py | 134 +- tests/models/bert/test_modeling_flax_bert.py | 2 +- tests/models/bert/test_tokenization_bert.py | 25 +- .../models/bert/test_tokenization_bert_tf.py | 9 +- .../test_tokenization_bert_generation.py | 1 + .../test_tokenization_bert_japanese.py | 81 +- .../bertweet/test_tokenization_bertweet.py | 1 + .../models/big_bird/test_modeling_big_bird.py | 12 +- .../big_bird/test_tokenization_big_bird.py | 3 +- .../test_modeling_bigbird_pegasus.py | 27 +- tests/models/biogpt/test_modeling_biogpt.py | 13 +- .../models/biogpt/test_tokenization_biogpt.py | 4 +- tests/models/bit/test_modeling_bit.py | 35 +- .../blenderbot/test_modeling_blenderbot.py | 9 +- .../blenderbot/test_modeling_tf_blenderbot.py | 1 - .../test_tokenization_blenderbot.py | 1 + .../test_modeling_blenderbot_small.py | 11 +- .../test_modeling_flax_blenderbot_small.py | 2 +- .../test_modeling_tf_blenderbot_small.py | 3 +- .../test_tokenization_blenderbot_small.py | 2 + .../models/blip/test_image_processing_blip.py | 11 +- tests/models/blip/test_modeling_blip.py | 144 +- tests/models/blip/test_modeling_blip_text.py | 12 +- tests/models/blip/test_modeling_tf_blip.py | 39 +- .../models/blip/test_modeling_tf_blip_text.py | 10 +- tests/models/blip_2/test_modeling_blip_2.py | 92 +- tests/models/bloom/test_modeling_bloom.py | 11 +- tests/models/bloom/test_tokenization_bloom.py | 13 +- .../test_image_processing_bridgetower.py | 6 + .../bridgetower/test_modeling_bridgetower.py | 15 +- tests/models/bros/test_modeling_bros.py | 14 +- tests/models/byt5/test_tokenization_byt5.py | 16 +- .../camembert/test_modeling_camembert.py | 2 +- .../camembert/test_tokenization_camembert.py | 15 +- tests/models/canine/test_modeling_canine.py | 22 +- .../models/canine/test_tokenization_canine.py | 24 +- .../models/{ernie_m => chameleon}/__init__.py | 0 .../test_image_processing_chameleon.py | 206 + .../chameleon/test_modeling_chameleon.py | 459 ++ .../test_image_processing_chinese_clip.py | 21 +- .../test_modeling_chinese_clip.py | 32 +- tests/models/clap/test_modeling_clap.py | 45 +- .../models/clip/test_image_processing_clip.py | 1 + tests/models/clip/test_modeling_clip.py | 461 +- tests/models/clip/test_modeling_tf_clip.py | 43 +- tests/models/clip/test_processor_clip.py | 6 +- tests/models/clip/test_tokenization_clip.py | 18 +- tests/models/clipseg/test_modeling_clipseg.py | 46 +- tests/models/clvp/test_modeling_clvp.py | 13 +- tests/models/clvp/test_tokenization_clvp.py | 3 +- .../test_tokenization_code_llama.py | 24 +- tests/models/codegen/test_modeling_codegen.py | 8 +- .../codegen/test_tokenization_codegen.py | 57 +- .../{gptsan_japanese => cohere}/__init__.py | 0 tests/models/cohere/test_modeling_cohere.py | 417 ++ .../models/cohere/test_tokenization_cohere.py | 288 ++ .../test_image_processing_conditional_detr.py | 346 ++ .../test_modeling_conditional_detr.py | 79 +- .../models/convbert/test_modeling_convbert.py | 12 +- .../convbert/test_modeling_tf_convbert.py | 3 +- .../test_image_processing_convnext.py | 2 + .../models/convnext/test_modeling_convnext.py | 33 +- .../convnext/test_modeling_tf_convnext.py | 2 +- 
.../convnextv2/test_modeling_convnextv2.py | 37 +- .../convnextv2/test_modeling_tf_convnextv2.py | 2 +- tests/models/cpmant/test_modeling_cpmant.py | 13 +- .../models/cpmant/test_tokenization_cpmant.py | 1 + tests/models/ctrl/test_modeling_ctrl.py | 11 +- tests/models/ctrl/test_modeling_tf_ctrl.py | 16 +- tests/models/ctrl/test_tokenization_ctrl.py | 1 + tests/models/cvt/test_modeling_cvt.py | 37 +- tests/models/cvt/test_modeling_tf_cvt.py | 21 +- tests/models/{graphormer => dac}/__init__.py | 0 .../models/dac/test_feature_extraction_dac.py | 216 + tests/models/dac/test_modeling_dac.py | 749 +++ .../data2vec/test_modeling_data2vec_audio.py | 17 +- .../data2vec/test_modeling_data2vec_text.py | 15 +- .../data2vec/test_modeling_data2vec_vision.py | 54 +- .../test_modeling_tf_data2vec_vision.py | 18 +- tests/models/{jukebox => dbrx}/__init__.py | 0 tests/models/dbrx/test_modeling_dbrx.py | 397 ++ tests/models/deberta/test_modeling_deberta.py | 9 +- .../deberta/test_tokenization_deberta.py | 1 + .../deberta_v2/test_modeling_deberta_v2.py | 9 +- .../test_tokenization_deberta_v2.py | 31 +- .../test_modeling_decision_transformer.py | 17 +- .../test_image_processing_deformable_detr.py | 346 ++ .../test_modeling_deformable_detr.py | 120 +- .../models/deit/test_image_processing_deit.py | 2 + tests/models/deit/test_modeling_deit.py | 73 +- tests/models/deit/test_modeling_tf_deit.py | 35 +- .../{mega => depth_anything}/__init__.py | 0 .../test_modeling_depth_anything.py | 292 ++ .../models/deta/test_image_processing_deta.py | 246 - tests/models/deta/test_modeling_deta.py | 586 --- .../models/detr/test_image_processing_detr.py | 344 +- tests/models/detr/test_modeling_detr.py | 80 +- tests/models/dinat/test_modeling_dinat.py | 30 +- tests/models/dinov2/test_modeling_dinov2.py | 14 +- .../dinov2/test_modeling_flax_dinov2.py | 263 + .../distilbert/test_modeling_distilbert.py | 25 +- .../distilbert/test_modeling_tf_distilbert.py | 7 +- .../test_tokenization_distilbert.py | 1 + .../donut/test_image_processing_donut.py | 2 + .../models/donut/test_modeling_donut_swin.py | 34 +- tests/models/donut/test_processing_donut.py | 11 +- tests/models/dpr/test_modeling_dpr.py | 29 +- tests/models/dpr/test_modeling_tf_dpr.py | 27 +- tests/models/dpr/test_tokenization_dpr.py | 7 +- tests/models/dpt/test_image_processing_dpt.py | 2 + tests/models/dpt/test_modeling_dpt.py | 50 +- .../dpt/test_modeling_dpt_auto_backbone.py | 21 +- tests/models/dpt/test_modeling_dpt_hybrid.py | 27 +- .../test_image_processing_efficientformer.py | 99 - .../test_modeling_efficientformer.py | 481 -- .../test_modeling_tf_efficientformer.py | 411 -- .../test_image_processing_efficientnet.py | 2 + .../test_modeling_efficientnet.py | 43 +- tests/models/electra/test_modeling_electra.py | 7 +- .../electra/test_modeling_tf_electra.py | 2 +- .../electra/test_tokenization_electra.py | 23 +- .../test_feature_extraction_encodec.py | 20 +- tests/models/encodec/test_modeling_encodec.py | 98 +- .../test_modeling_encoder_decoder.py | 42 +- .../test_modeling_flax_encoder_decoder.py | 20 +- .../test_modeling_tf_encoder_decoder.py | 34 +- tests/models/ernie/test_modeling_ernie.py | 10 +- tests/models/ernie_m/test_modeling_ernie_m.py | 325 -- .../ernie_m/test_tokenization_ernie_m.py | 142 - tests/models/esm/test_modeling_esm.py | 40 +- tests/models/esm/test_modeling_esmfold.py | 43 +- tests/models/esm/test_modeling_tf_esm.py | 10 +- tests/models/esm/test_tokenization_esm.py | 22 + tests/models/falcon/test_modeling_falcon.py | 130 +- .../models/{nat => 
falcon_mamba}/__init__.py | 0 .../test_modeling_falcon_mamba.py | 526 ++ .../__init__.py | 0 .../test_modeling_fastspeech2_conformer.py | 807 +++ ...test_tokenization_fastspeech2_conformer.py | 191 + .../models/flaubert/test_modeling_flaubert.py | 24 +- .../flaubert/test_modeling_tf_flaubert.py | 7 +- .../flaubert/test_tokenization_flaubert.py | 75 + .../flava/test_image_processing_flava.py | 2 + tests/models/flava/test_modeling_flava.py | 164 +- tests/models/fnet/test_modeling_fnet.py | 14 +- tests/models/fnet/test_tokenization_fnet.py | 5 +- .../models/focalnet/test_modeling_focalnet.py | 32 +- tests/models/fsmt/test_modeling_fsmt.py | 21 +- tests/models/fsmt/test_tokenization_fsmt.py | 5 +- .../models/funnel/test_tokenization_funnel.py | 9 +- tests/models/fuyu/test_modeling_fuyu.py | 8 +- tests/models/{qdqbert => gemma}/__init__.py | 0 .../models/gemma/test_modeling_flax_gemma.py | 266 + tests/models/gemma/test_modeling_gemma.py | 856 ++++ tests/models/gemma/test_tokenization_gemma.py | 544 ++ tests/models/{realm => gemma2}/__init__.py | 0 tests/models/gemma2/test_modeling_gemma2.py | 208 + tests/models/git/test_modeling_git.py | 23 +- .../models/glpn/test_image_processing_glpn.py | 4 + tests/models/glpn/test_modeling_glpn.py | 32 +- tests/models/gpt2/test_modeling_flax_gpt2.py | 6 +- tests/models/gpt2/test_modeling_gpt2.py | 116 +- tests/models/gpt2/test_modeling_tf_gpt2.py | 43 +- tests/models/gpt2/test_tokenization_gpt2.py | 9 +- .../models/gpt2/test_tokenization_gpt2_tf.py | 7 +- .../gpt_bigcode/test_modeling_gpt_bigcode.py | 12 +- .../gpt_neo/test_modeling_flax_gpt_neo.py | 4 +- tests/models/gpt_neo/test_modeling_gpt_neo.py | 10 +- .../models/gpt_neox/test_modeling_gpt_neox.py | 179 +- .../test_modeling_gpt_neox_japanese.py | 3 +- .../test_tokenization_gpt_neox_japanese.py | 4 +- .../gpt_sw3/test_tokenization_gpt_sw3.py | 14 +- tests/models/gptj/test_modeling_flax_gptj.py | 4 +- tests/models/gptj/test_modeling_gptj.py | 59 +- .../test_modeling_gptsan_japanese.py | 464 -- .../test_tokenization_gptsan_japanese.py | 217 - .../graphormer/test_modeling_graphormer.py | 1302 ----- .../__init__.py | 0 .../test_image_processing_grounding_dino.py | 634 +++ .../test_modeling_grounding_dino.py | 743 +++ .../test_processor_grounding_dino.py | 265 + .../models/groupvit/test_modeling_groupvit.py | 33 +- .../groupvit/test_modeling_tf_groupvit.py | 43 +- .../herbert/test_tokenization_herbert.py | 6 +- tests/models/{tvlt => hiera}/__init__.py | 0 tests/models/hiera/test_modeling_hiera.py | 631 +++ tests/models/hubert/test_modeling_hubert.py | 33 +- tests/models/ibert/test_modeling_ibert.py | 24 +- .../idefics/test_image_processing_idefics.py | 19 +- tests/models/idefics/test_modeling_idefics.py | 33 +- .../idefics/test_modeling_tf_idefics.py | 565 +++ .../models/idefics/test_processor_idefics.py | 48 +- .../{vit_hybrid => idefics2}/__init__.py | 0 .../test_image_processing_idefics2.py | 311 ++ .../models/idefics2/test_modeling_idefics2.py | 575 +++ .../idefics2/test_processing_idefics2.py | 235 + .../test_image_processing_imagegpt.py | 49 +- .../models/imagegpt/test_modeling_imagegpt.py | 44 +- .../models/informer/test_modeling_informer.py | 27 +- .../test_modeling_instructblip.py | 91 +- .../test_processor_instructblip.py | 2 +- .../__init__.py | 0 ...test_image_processing_instrictblipvideo.py | 192 + .../test_modeling_instructblipvideo.py | 615 +++ tests/{tools => models/jamba}/__init__.py | 0 tests/models/jamba/test_modeling_jamba.py | 761 +++ tests/models/jetmoe/__init__.py | 0 
tests/models/jetmoe/test_modeling_jetmoe.py | 536 ++ tests/models/jukebox/test_modeling_jukebox.py | 407 -- .../jukebox/test_tokenization_jukebox.py | 209 - tests/models/kosmos2/test_modeling_kosmos2.py | 23 +- .../models/kosmos2/test_processor_kosmos2.py | 10 + .../layoutlm/test_modeling_tf_layoutlm.py | 7 +- .../layoutlm/test_tokenization_layoutlm.py | 6 +- .../test_image_processing_layoutlmv2.py | 9 +- .../layoutlmv2/test_modeling_layoutlmv2.py | 71 +- .../layoutlmv2/test_processor_layoutlmv2.py | 4 +- .../test_tokenization_layoutlmv2.py | 71 +- .../test_image_processing_layoutlmv3.py | 8 +- .../layoutlmv3/test_modeling_layoutlmv3.py | 9 +- .../layoutlmv3/test_modeling_tf_layoutlmv3.py | 9 +- .../layoutlmv3/test_processor_layoutlmv3.py | 2 +- .../test_tokenization_layoutlmv3.py | 57 +- .../layoutxlm/test_processor_layoutxlm.py | 4 +- .../layoutxlm/test_tokenization_layoutxlm.py | 110 +- tests/models/led/test_modeling_led.py | 20 +- tests/models/led/test_modeling_tf_led.py | 1 - tests/models/led/test_tokenization_led.py | 2 + .../levit/test_image_processing_levit.py | 2 + tests/models/levit/test_modeling_levit.py | 54 +- tests/models/lilt/test_modeling_lilt.py | 7 +- tests/models/llama/test_modeling_llama.py | 766 ++- tests/models/llama/test_tokenization_llama.py | 147 +- tests/models/llava/test_modeling_llava.py | 242 +- tests/models/llava/test_processor_llava.py | 47 + tests/models/llava_next/__init__.py | 0 .../test_image_processing_llava_next.py | 222 + .../llava_next/test_modeling_llava_next.py | 598 +++ .../llava_next/test_processor_llava_next.py | 41 + tests/models/llava_next_video/__init__.py | 0 .../test_image_processing_llava_next_video.py | 218 + .../test_modeling_llava_next_video.py | 558 +++ .../longformer/test_modeling_longformer.py | 6 +- .../test_tokenization_longformer.py | 7 +- tests/models/longt5/test_modeling_longt5.py | 37 +- tests/models/luke/test_modeling_luke.py | 10 +- tests/models/luke/test_tokenization_luke.py | 8 +- tests/models/lxmert/test_modeling_lxmert.py | 29 +- .../models/lxmert/test_modeling_tf_lxmert.py | 2 +- .../models/lxmert/test_tokenization_lxmert.py | 7 +- tests/models/m2m_100/test_modeling_m2m_100.py | 59 +- .../m2m_100/test_tokenization_m2m_100.py | 5 +- tests/models/mamba/__init__.py | 0 tests/models/mamba/test_modeling_mamba.py | 546 ++ tests/models/mamba2/__init__.py | 0 tests/models/mamba2/test_modeling_mamba2.py | 393 ++ .../marian/test_modeling_flax_marian.py | 4 - tests/models/marian/test_modeling_marian.py | 28 +- .../models/marian/test_modeling_tf_marian.py | 5 - .../models/marian/test_tokenization_marian.py | 1 + .../markuplm/test_tokenization_markuplm.py | 57 +- .../test_image_processing_mask2former.py | 32 +- .../mask2former/test_modeling_mask2former.py | 43 +- .../test_image_processing_maskformer.py | 32 +- .../maskformer/test_modeling_maskformer.py | 122 +- .../test_modeling_maskformer_swin.py | 27 +- tests/models/mbart/test_modeling_mbart.py | 30 +- tests/models/mbart/test_modeling_tf_mbart.py | 1 - tests/models/mbart/test_tokenization_mbart.py | 5 +- .../mbart50/test_tokenization_mbart50.py | 3 +- tests/models/mega/test_modeling_mega.py | 732 --- .../test_modeling_megatron_bert.py | 5 +- .../test_modeling_megatron_gpt2.py | 2 +- tests/models/mgp_str/test_modeling_mgp_str.py | 12 +- .../models/mgp_str/test_processor_mgp_str.py | 2 +- .../mgp_str/test_tokenization_mgp_str.py | 7 +- .../mistral/test_modeling_flax_mistral.py | 243 + tests/models/mistral/test_modeling_mistral.py | 325 +- .../mistral/test_modeling_tf_mistral.py | 367 
++ tests/models/mixtral/test_modeling_mixtral.py | 180 +- tests/models/mluke/test_tokenization_mluke.py | 8 +- .../mobilebert/test_modeling_mobilebert.py | 2 +- .../mobilebert/test_modeling_tf_mobilebert.py | 4 +- .../test_tokenization_mobilebert.py | 28 +- .../test_image_processing_mobilenet_v1.py | 2 + .../test_modeling_mobilenet_v1.py | 20 +- .../test_image_processing_mobilenet_v2.py | 2 + .../test_modeling_mobilenet_v2.py | 20 +- .../test_image_processing_mobilevit.py | 137 +- .../mobilevit/test_modeling_mobilevit.py | 14 +- .../mobilevit/test_modeling_tf_mobilevit.py | 10 +- .../mobilevitv2/test_modeling_mobilevitv2.py | 14 +- tests/models/mpnet/test_modeling_mpnet.py | 2 +- tests/models/mpnet/test_tokenization_mpnet.py | 5 +- tests/models/mpt/test_modeling_mpt.py | 29 +- tests/models/mra/test_modeling_mra.py | 16 +- tests/models/mt5/test_modeling_mt5.py | 1040 +++- .../models/musicgen/test_modeling_musicgen.py | 1719 +++++-- .../musicgen/test_processing_musicgen.py | 5 +- tests/models/musicgen_melody/__init__.py | 0 ...test_feature_extraction_musicgen_melody.py | 231 + .../test_modeling_musicgen_melody.py | 2588 ++++++++++ .../test_processor_musicgen_melody.py | 179 + tests/models/mvp/test_modeling_mvp.py | 10 +- tests/models/mvp/test_tokenization_mvp.py | 2 + tests/models/nat/test_modeling_nat.py | 383 -- tests/models/nemotron/__init__.py | 0 .../models/nemotron/test_modeling_nemotron.py | 246 + tests/models/nezha/test_modeling_nezha.py | 490 -- tests/models/nllb/test_tokenization_nllb.py | 57 +- .../models/nllb_moe/test_modeling_nllb_moe.py | 42 +- .../nougat/test_image_processing_nougat.py | 2 + .../models/nougat/test_tokenization_nougat.py | 9 +- .../test_modeling_nystromformer.py | 10 +- tests/models/olmo/__init__.py | 0 tests/models/olmo/test_modeling_olmo.py | 451 ++ .../test_image_processing_oneformer.py | 74 +- .../oneformer/test_modeling_oneformer.py | 39 +- .../oneformer/test_processor_oneformer.py | 22 +- tests/models/openai/test_modeling_openai.py | 9 +- .../models/openai/test_modeling_tf_openai.py | 9 +- .../models/openai/test_tokenization_openai.py | 3 +- tests/models/opt/test_modeling_opt.py | 5 +- .../owlv2/test_image_processing_owlv2.py | 174 + tests/models/owlv2/test_modeling_owlv2.py | 53 +- .../owlvit/test_image_processing_owlvit.py | 2 + tests/models/owlvit/test_modeling_owlvit.py | 45 +- tests/models/paligemma/__init__.py | 0 .../paligemma/test_modeling_paligemma.py | 573 +++ .../test_modeling_patchtsmixer.py | 129 +- .../models/patchtst/test_modeling_patchtst.py | 17 +- tests/models/pegasus/test_modeling_pegasus.py | 9 +- .../pegasus/test_modeling_tf_pegasus.py | 1 - .../pegasus/test_tokenization_pegasus.py | 10 +- .../pegasus_x/test_modeling_pegasus_x.py | 6 +- .../perceiver/test_modeling_perceiver.py | 114 +- .../perceiver/test_tokenization_perceiver.py | 18 +- .../persimmon/test_modeling_persimmon.py | 100 +- tests/models/phi/test_modeling_phi.py | 150 +- tests/models/phi3/__init__.py | 0 tests/models/phi3/test_modeling_phi3.py | 569 +++ .../phobert/test_tokenization_phobert.py | 1 + .../test_image_processing_pix2struct.py | 26 +- .../pix2struct/test_modeling_pix2struct.py | 40 +- .../pix2struct/test_processor_pix2struct.py | 13 +- tests/models/plbart/test_modeling_plbart.py | 18 +- .../models/plbart/test_tokenization_plbart.py | 1 + .../test_image_processing_poolformer.py | 2 + .../poolformer/test_modeling_poolformer.py | 20 +- .../test_feature_extraction_pop2piano.py | 6 +- .../pop2piano/test_modeling_pop2piano.py | 17 +- 
.../pop2piano/test_tokenization_pop2piano.py | 6 +- .../prophetnet/test_modeling_prophetnet.py | 11 +- .../test_tokenization_prophetnet.py | 17 +- tests/models/pvt/test_image_processing_pvt.py | 2 + tests/models/pvt/test_modeling_pvt.py | 26 +- tests/models/pvt_v2/__init__.py | 0 tests/models/pvt_v2/test_modeling_pvt_v2.py | 442 ++ tests/models/qdqbert/test_modeling_qdqbert.py | 575 --- tests/models/qwen2/__init__.py | 0 tests/models/qwen2/test_modeling_qwen2.py | 635 +++ tests/models/qwen2/test_tokenization_qwen2.py | 224 + tests/models/qwen2_audio/__init__.py | 0 .../qwen2_audio/test_modeling_qwen2_audio.py | 379 ++ .../qwen2_audio/test_processor_qwen2_audio.py | 114 + tests/models/qwen2_moe/__init__.py | 0 .../qwen2_moe/test_modeling_qwen2_moe.py | 699 +++ tests/models/rag/test_modeling_rag.py | 15 +- tests/models/rag/test_modeling_tf_rag.py | 17 +- tests/models/realm/test_modeling_realm.py | 554 -- tests/models/realm/test_retrieval_realm.py | 187 - tests/models/realm/test_tokenization_realm.py | 321 -- tests/models/recurrent_gemma/__init__.py | 0 .../test_modeling_recurrent_gemma.py | 506 ++ .../models/reformer/test_modeling_reformer.py | 41 +- .../reformer/test_tokenization_reformer.py | 5 +- tests/models/regnet/test_modeling_regnet.py | 40 +- .../models/regnet/test_modeling_tf_regnet.py | 18 +- tests/models/rembert/test_modeling_rembert.py | 10 +- .../rembert/test_tokenization_rembert.py | 17 +- tests/models/resnet/test_modeling_resnet.py | 40 +- .../models/resnet/test_modeling_tf_resnet.py | 18 +- .../roberta/test_modeling_flax_roberta.py | 2 +- tests/models/roberta/test_modeling_roberta.py | 19 +- .../roberta/test_modeling_tf_roberta.py | 13 +- .../roberta/test_tokenization_roberta.py | 4 +- ...test_modeling_flax_roberta_prelayernorm.py | 2 +- .../test_modeling_roberta_prelayernorm.py | 14 +- .../test_modeling_tf_roberta_prelayernorm.py | 9 +- .../models/roc_bert/test_modeling_roc_bert.py | 9 +- .../roc_bert/test_tokenization_roc_bert.py | 13 +- .../models/roformer/test_modeling_roformer.py | 10 +- .../roformer/test_tokenization_roformer.py | 18 +- tests/models/rt_detr/__init__.py | 0 .../rt_detr/test_image_processing_rt_detr.py | 365 ++ tests/models/rt_detr/test_modeling_rt_detr.py | 768 +++ .../rt_detr/test_modeling_rt_detr_resnet.py | 130 + tests/models/rwkv/test_modeling_rwkv.py | 55 +- tests/models/sam/test_modeling_sam.py | 16 +- tests/models/sam/test_modeling_tf_sam.py | 8 +- tests/models/sam/test_processor_sam.py | 42 +- .../test_feature_extraction_seamless_m4t.py | 93 + .../test_modeling_seamless_m4t.py | 47 +- .../test_tokenization_seamless_m4t.py | 27 +- .../test_modeling_seamless_m4t_v2.py | 38 +- .../test_image_processing_segformer.py | 22 +- .../segformer/test_modeling_segformer.py | 25 +- .../segformer/test_modeling_tf_segformer.py | 9 +- tests/models/seggpt/__init__.py | 0 .../seggpt/test_image_processing_seggpt.py | 311 ++ tests/models/seggpt/test_modeling_seggpt.py | 464 ++ tests/models/sew/test_modeling_sew.py | 28 +- tests/models/sew_d/test_modeling_sew_d.py | 28 +- tests/models/siglip/__init__.py | 0 .../test_image_processing_siglip.py} | 56 +- tests/models/siglip/test_modeling_siglip.py | 995 ++++ .../models/siglip/test_tokenization_siglip.py | 455 ++ ...st_modeling_flax_speech_encoder_decoder.py | 4 +- .../test_modeling_speech_encoder_decoder.py | 58 +- .../test_feature_extraction_speech_to_text.py | 2 +- .../test_modeling_speech_to_text.py | 34 +- .../test_modeling_tf_speech_to_text.py | 2 +- .../test_tokenization_speech_to_text.py | 1 + 
.../test_modeling_speech_to_text_2.py | 216 - .../test_tokenization_speech_to_text_2.py | 97 - .../test_feature_extraction_speecht5.py | 4 +- .../models/speecht5/test_modeling_speecht5.py | 362 +- .../speecht5/test_tokenization_speecht5.py | 17 + .../models/splinter/test_modeling_splinter.py | 9 +- .../squeezebert/test_modeling_squeezebert.py | 9 +- .../test_tokenization_squeezebert.py | 1 + tests/models/stablelm/__init__.py | 0 .../models/stablelm/test_modeling_stablelm.py | 620 +++ tests/models/starcoder2/__init__.py | 0 .../starcoder2/test_modeling_starcoder2.py | 569 +++ tests/models/superpoint/__init__.py | 0 .../test_image_processing_superpoint.py | 112 + .../superpoint/test_modeling_superpoint.py | 307 ++ .../swiftformer/test_modeling_swiftformer.py | 14 +- .../test_modeling_tf_swiftformer.py | 271 + tests/models/swin/test_modeling_swin.py | 52 +- tests/models/swin/test_modeling_tf_swin.py | 15 +- .../swin2sr/test_image_processing_swin2sr.py | 6 +- tests/models/swin2sr/test_modeling_swin2sr.py | 50 +- tests/models/swinv2/test_modeling_swinv2.py | 69 +- .../test_modeling_switch_transformers.py | 32 +- tests/models/t5/test_modeling_flax_t5.py | 16 +- tests/models/t5/test_modeling_t5.py | 94 +- tests/models/t5/test_modeling_tf_t5.py | 49 +- tests/models/t5/test_tokenization_t5.py | 88 +- .../test_modeling_table_transformer.py | 72 +- tests/models/tapas/test_modeling_tapas.py | 7 +- tests/models/tapas/test_modeling_tf_tapas.py | 4 + tests/models/tapas/test_tokenization_tapas.py | 61 +- .../test_modeling_time_series_transformer.py | 10 +- .../timesformer/test_modeling_timesformer.py | 14 +- .../test_modeling_timm_backbone.py | 61 +- tests/models/trocr/test_modeling_trocr.py | 12 +- .../tvlt/test_feature_extraction_tvlt.py | 182 - .../models/tvlt/test_image_processor_tvlt.py | 294 -- tests/models/tvlt/test_modeling_tvlt.py | 626 --- tests/models/tvlt/test_processor_tvlt.py | 116 - tests/models/tvp/test_image_processing_tvp.py | 2 + tests/models/tvp/test_modeling_tvp.py | 83 +- tests/models/udop/__init__.py | 0 tests/models/udop/test_modeling_udop.py | 577 +++ tests/models/udop/test_processor_udop.py | 513 ++ tests/models/udop/test_tokenization_udop.py | 1972 ++++++++ tests/models/umt5/test_modeling_umt5.py | 194 +- .../unispeech/test_modeling_unispeech.py | 13 +- .../test_modeling_unispeech_sat.py | 32 +- tests/models/univnet/test_modeling_univnet.py | 56 +- tests/models/upernet/test_modeling_upernet.py | 68 +- tests/models/video_llava/__init__.py | 0 .../test_image_processing_video_llava.py | 328 ++ .../video_llava/test_modeling_video_llava.py | 624 +++ .../test_image_processing_videomae.py | 2 + .../models/videomae/test_modeling_videomae.py | 20 +- .../models/vilt/test_image_processing_vilt.py | 6 + tests/models/vilt/test_modeling_vilt.py | 41 +- .../models/vipllava/test_modeling_vipllava.py | 123 +- .../vipllava/test_processor_vipllava.py | 41 + ...st_modeling_flax_vision_encoder_decoder.py | 4 +- ...test_modeling_tf_vision_encoder_decoder.py | 31 +- .../test_modeling_vision_encoder_decoder.py | 134 +- ..._modeling_flax_vision_text_dual_encoder.py | 3 +- ...st_modeling_tf_vision_text_dual_encoder.py | 3 +- .../test_modeling_vision_text_dual_encoder.py | 5 +- .../visual_bert/test_modeling_visual_bert.py | 9 +- tests/models/vit/test_image_processing_vit.py | 30 +- tests/models/vit/test_modeling_flax_vit.py | 3 + tests/models/vit/test_modeling_tf_vit.py | 11 +- tests/models/vit/test_modeling_vit.py | 28 +- .../vit_hybrid/test_modeling_vit_hybrid.py | 277 - 
.../vit_mae/test_modeling_tf_vit_mae.py | 54 +- tests/models/vit_mae/test_modeling_vit_mae.py | 60 +- tests/models/vit_msn/test_modeling_vit_msn.py | 17 +- tests/models/vitdet/test_modeling_vitdet.py | 13 +- .../test_image_processing_vitmatte.py | 24 + .../models/vitmatte/test_modeling_vitmatte.py | 62 +- tests/models/vits/test_modeling_vits.py | 25 +- tests/models/vits/test_tokenization_vits.py | 12 +- .../vivit/test_image_processing_vivit.py | 2 + tests/models/vivit/test_modeling_vivit.py | 35 +- .../test_feature_extraction_wav2vec2.py | 14 +- .../wav2vec2/test_modeling_flax_wav2vec2.py | 6 +- .../wav2vec2/test_modeling_tf_wav2vec2.py | 2 +- .../models/wav2vec2/test_modeling_wav2vec2.py | 108 +- .../wav2vec2/test_tokenization_wav2vec2.py | 35 +- tests/models/wav2vec2_bert/__init__.py | 0 .../test_modeling_wav2vec2_bert.py | 914 ++++ .../test_processor_wav2vec2_bert.py | 156 + .../test_modeling_wav2vec2_conformer.py | 20 +- .../test_tokenization_wav2vec2_phoneme.py | 20 +- .../test_processor_wav2vec2_with_lm.py | 35 +- tests/models/wavlm/test_modeling_wavlm.py | 31 +- .../test_feature_extraction_whisper.py | 52 +- .../whisper/test_modeling_flax_whisper.py | 2 +- .../whisper/test_modeling_tf_whisper.py | 4 +- tests/models/whisper/test_modeling_whisper.py | 1435 +++++- .../whisper/test_tokenization_whisper.py | 113 +- tests/models/x_clip/test_modeling_x_clip.py | 33 +- tests/models/xglm/test_modeling_tf_xglm.py | 7 +- tests/models/xglm/test_modeling_xglm.py | 10 +- tests/models/xglm/test_tokenization_xglm.py | 3 +- tests/models/xlm/test_modeling_tf_xlm.py | 9 +- tests/models/xlm/test_modeling_xlm.py | 18 +- tests/models/xlm/test_tokenization_xlm.py | 3 +- .../test_modeling_xlm_prophetnet.py | 150 - .../test_tokenization_xlm_prophetnet.py | 153 - .../test_modeling_flax_xlm_roberta.py | 4 +- .../xlm_roberta/test_modeling_xlm_roberta.py | 4 +- .../test_tokenization_xlm_roberta.py | 9 +- .../test_modeling_xlm_roberta_xl.py | 8 +- tests/models/xlnet/test_modeling_tf_xlnet.py | 9 +- tests/models/xlnet/test_modeling_xlnet.py | 17 +- tests/models/xlnet/test_tokenization_xlnet.py | 7 +- tests/models/xmod/test_modeling_xmod.py | 8 +- .../yolos/test_image_processing_yolos.py | 322 +- tests/models/yolos/test_modeling_yolos.py | 21 +- tests/models/yoso/test_modeling_yoso.py | 11 +- tests/models/zoedepth/__init__.py | 0 .../test_image_processing_zoedepth.py | 188 + .../models/zoedepth/test_modeling_zoedepth.py | 257 + tests/optimization/test_optimization.py | 31 +- .../peft_integration/test_peft_integration.py | 12 +- .../test_pipelines_audio_classification.py | 10 +- ..._pipelines_automatic_speech_recognition.py | 481 +- tests/pipelines/test_pipelines_common.py | 110 +- .../test_pipelines_conversational.py | 439 -- .../test_pipelines_depth_estimation.py | 8 +- ...t_pipelines_document_question_answering.py | 40 +- .../test_pipelines_feature_extraction.py | 18 +- tests/pipelines/test_pipelines_fill_mask.py | 10 +- .../test_pipelines_image_classification.py | 80 +- ...test_pipelines_image_feature_extraction.py | 191 + .../test_pipelines_image_segmentation.py | 10 +- .../test_pipelines_image_to_image.py | 10 +- .../pipelines/test_pipelines_image_to_text.py | 69 +- .../test_pipelines_mask_generation.py | 8 +- .../test_pipelines_object_detection.py | 6 +- .../test_pipelines_question_answering.py | 4 +- .../pipelines/test_pipelines_summarization.py | 4 +- ...test_pipelines_table_question_answering.py | 40 +- .../test_pipelines_text2text_generation.py | 4 +- .../test_pipelines_text_classification.py | 46 +- 
.../test_pipelines_text_generation.py | 181 +- .../pipelines/test_pipelines_text_to_audio.py | 25 +- .../test_pipelines_token_classification.py | 6 +- tests/pipelines/test_pipelines_translation.py | 8 +- .../test_pipelines_video_classification.py | 11 +- ...est_pipelines_visual_question_answering.py | 71 +- tests/pipelines/test_pipelines_zero_shot.py | 12 +- ...ipelines_zero_shot_audio_classification.py | 30 +- ...ipelines_zero_shot_image_classification.py | 44 +- ...st_pipelines_zero_shot_object_detection.py | 10 +- .../quantization/aqlm_integration/__init__.py | 0 .../aqlm_integration/test_aqlm.py | 256 + tests/quantization/autoawq/test_awq.py | 142 +- tests/quantization/bnb/README.md | 8 +- tests/quantization/bnb/test_4bit.py | 59 +- tests/quantization/bnb/test_mixed_int8.py | 88 +- .../quantization/eetq_integration/__init__.py | 0 .../eetq_integration/test_eetq.py | 171 + tests/quantization/fbgemm_fp8/__init__.py | 0 .../fbgemm_fp8/test_fbgemm_fp8.py | 270 + tests/quantization/ggml/__init__.py | 0 tests/quantization/ggml/test_ggml.py | 266 + tests/quantization/gptq/test_gptq.py | 28 +- tests/quantization/hqq/test_hqq.py | 167 + .../quanto_integration/__init__.py | 0 .../quanto_integration/test_quanto.py | 469 ++ .../torchao_integration/__init__.py | 0 .../torchao_integration/test_torchao.py | 213 + .../pytorch/run_glue_model_parallelism.py | 6 +- tests/sagemaker/scripts/tensorflow/run_tf.py | 22 +- .../scripts/tensorflow/run_tf_dist.py | 9 +- .../test_multi_node_data_parallel.py | 6 +- .../test_multi_node_model_parallel.py | 4 +- tests/sagemaker/test_single_node_gpu.py | 4 +- tests/test_cache_utils.py | 231 - tests/test_configuration_common.py | 4 +- tests/test_feature_extraction_utils.py | 144 - tests/test_image_processing_common.py | 433 +- tests/test_image_processing_utils.py | 154 - tests/test_image_transforms.py | 4 + tests/test_modeling_common.py | 1834 +++++-- tests/test_modeling_flax_common.py | 8 +- tests/test_modeling_tf_common.py | 47 +- tests/test_pipeline_mixin.py | 229 +- tests/test_processing_common.py | 362 ++ tests/test_tokenization_common.py | 569 ++- tests/test_tokenization_utils.py | 280 -- tests/tokenization/test_tokenization_fast.py | 19 +- tests/tokenization/test_tokenization_utils.py | 37 +- tests/tools/test_image_captioning.py | 53 - tests/tools/test_image_segmentation.py | 53 - tests/tools/test_python_interpreter.py | 131 - tests/tools/test_text_classification.py | 43 - tests/tools/test_text_question_answering.py | 52 - tests/tools/test_text_summarization.py | 64 - tests/tools/test_tools_common.py | 133 - tests/tools/test_translation.py | 86 - tests/trainer/test_data_collator.py | 1219 +++++ tests/trainer/test_trainer.py | 1642 +++++- tests/trainer/test_trainer_callback.py | 261 +- tests/trainer/test_trainer_distributed.py | 17 +- tests/trainer/test_trainer_seq2seq.py | 43 +- tests/trainer/test_trainer_utils.py | 90 + tests/utils/test_add_new_model_like.py | 63 +- tests/utils/test_audio_utils.py | 983 ++++ tests/utils/test_backbone_utils.py | 170 +- tests/utils/test_cache_utils.py | 598 +++ tests/utils/test_chat_template_utils.py | 476 ++ tests/{ => utils}/test_configuration_utils.py | 185 +- tests/utils/test_deprecation.py | 170 + tests/utils/test_doc_samples.py | 2 +- tests/utils/test_feature_extraction_utils.py | 150 + tests/utils/test_generic.py | 73 + tests/utils/test_hf_argparser.py | 71 +- tests/utils/test_hub_utils.py | 25 +- tests/utils/test_image_processing_utils.py | 144 +- tests/utils/test_image_utils.py | 58 +- tests/utils/test_model_card.py | 7 
+- tests/utils/test_model_output.py | 43 +- tests/{ => utils}/test_modeling_flax_utils.py | 253 +- tests/utils/test_modeling_rope_utils.py | 439 ++ tests/utils/test_modeling_tf_core.py | 25 +- tests/{ => utils}/test_modeling_tf_utils.py | 362 +- tests/{ => utils}/test_modeling_utils.py | 1193 +++-- tests/utils/test_offline.py | 103 +- tests/utils/test_tokenization_utils.py | 336 ++ tests/utils/test_versions_utils.py | 2 +- tests/utils/tiny_model_summary.json | 44 - utils/add_pipeline_model_mapping_to_test.py | 1 - utils/check_config_attributes.py | 33 +- utils/check_config_docstrings.py | 4 +- utils/check_copies.py | 240 +- utils/check_doc_toc.py | 1 - utils/check_docstrings.py | 294 +- utils/check_doctest_list.py | 1 + utils/check_dummies.py | 1 + utils/check_inits.py | 1 + utils/check_repo.py | 29 +- utils/check_support_list.py | 3 +- utils/check_table.py | 4 +- utils/check_task_guides.py | 168 - utils/create_dummy_models.py | 2 +- utils/custom_init_isort.py | 1 + utils/deprecate_models.py | 378 ++ utils/diff_model_converter.py | 602 +++ utils/download_glue_data.py | 2 +- utils/extract_warnings.py | 2 +- utils/get_ci_error_statistics.py | 26 + utils/get_previous_daily_ci.py | 7 +- utils/important_models.txt | 4 + utils/models_to_deprecate.py | 200 + utils/not_doctested.txt | 97 +- utils/notification_service.py | 265 +- utils/notification_service_doc_tests.py | 164 +- utils/notification_service_quantization.py | 274 + utils/patch_helper.py | 92 + utils/pr_slow_ci_models.py | 145 + utils/release.py | 36 +- utils/set_cuda_devices_for_ci.py | 26 + utils/slow_documentation_tests.txt | 7 +- utils/sort_auto_mappings.py | 1 + utils/split_doctest_jobs.py | 91 + utils/split_model_tests.py | 65 + utils/tests_fetcher.py | 45 +- utils/update_metadata.py | 32 + utils/update_tiny_models.py | 3 +- 3288 files changed, 270083 insertions(+), 92355 deletions(-) create mode 100644 .circleci/parse_test_outputs.py create mode 100644 .github/workflows/benchmark.yml create mode 100644 .github/workflows/build-ci-docker-images.yml create mode 100644 .github/workflows/doctest_job.yml delete mode 100644 .github/workflows/model-templates.yml create mode 100644 .github/workflows/model_jobs.yml create mode 100644 .github/workflows/push-important-models.yml create mode 100644 .github/workflows/self-nightly-caller.yml delete mode 100644 .github/workflows/self-nightly-scheduled.yml create mode 100644 .github/workflows/self-past-caller.yml delete mode 100644 .github/workflows/self-past.yml create mode 100644 .github/workflows/self-pr-slow-ci.yml create mode 100644 .github/workflows/self-push-amd-mi300-caller.yml create mode 100644 .github/workflows/self-scheduled-amd-mi300-caller.yml create mode 100644 .github/workflows/self-scheduled-caller.yml create mode 100644 .github/workflows/slack-report.yml create mode 100644 .github/workflows/ssh-runner.yml create mode 100644 .github/workflows/trufflehog.yml delete mode 100644 README_es.md delete mode 100644 README_hd.md delete mode 100644 README_ja.md delete mode 100644 README_ko.md delete mode 100644 README_pt-br.md delete mode 100644 README_ru.md delete mode 100644 README_te.md delete mode 100644 README_zh-hans.md delete mode 100644 README_zh-hant.md rename {tests/models/deta => benchmark}/__init__.py (100%) create mode 100644 benchmark/benchmark.py create mode 100644 benchmark/config/generation.yaml create mode 100644 benchmark/optimum_benchmark_wrapper.py create mode 100644 docker/consistency.dockerfile create mode 100644 docker/custom-tokenizers.dockerfile create mode 100644 
docker/examples-tf.dockerfile create mode 100644 docker/examples-torch.dockerfile create mode 100644 docker/exotic-models.dockerfile create mode 100644 docker/jax-light.dockerfile create mode 100644 docker/pipeline-tf.dockerfile create mode 100644 docker/pipeline-torch.dockerfile create mode 100644 docker/quality.dockerfile create mode 100644 docker/tf-light.dockerfile create mode 100644 docker/torch-jax-light.dockerfile create mode 100644 docker/torch-light.dockerfile create mode 100644 docker/torch-tf-light.dockerfile create mode 100755 docker/transformers-quantization-latest-gpu/Dockerfile delete mode 100644 docs/source/de/add_tensorflow_model.md create mode 100644 docs/source/de/contributing.md delete mode 100644 docs/source/en/add_tensorflow_model.md create mode 100644 docs/source/en/agents.md create mode 100644 docs/source/en/conversations.md delete mode 100644 docs/source/en/custom_tools.md create mode 100644 docs/source/en/deepspeed.md create mode 100644 docs/source/en/gguf.md create mode 100644 docs/source/en/kv_cache.md create mode 100644 docs/source/en/llm_optims.md mode change 100644 => 100755 docs/source/en/main_classes/quantization.md create mode 100644 docs/source/en/model_doc/chameleon.md create mode 100644 docs/source/en/model_doc/cohere.md create mode 100644 docs/source/en/model_doc/dac.md create mode 100644 docs/source/en/model_doc/dbrx.md create mode 100644 docs/source/en/model_doc/depth_anything.md create mode 100644 docs/source/en/model_doc/depth_anything_v2.md create mode 100644 docs/source/en/model_doc/falcon_mamba.md create mode 100644 docs/source/en/model_doc/fastspeech2_conformer.md create mode 100644 docs/source/en/model_doc/gemma.md create mode 100644 docs/source/en/model_doc/gemma2.md create mode 100644 docs/source/en/model_doc/grounding-dino.md create mode 100644 docs/source/en/model_doc/hiera.md create mode 100644 docs/source/en/model_doc/idefics2.md create mode 100644 docs/source/en/model_doc/instructblipvideo.md create mode 100644 docs/source/en/model_doc/jamba.md create mode 100644 docs/source/en/model_doc/jetmoe.md create mode 100644 docs/source/en/model_doc/llama3.md create mode 100644 docs/source/en/model_doc/llava_next.md create mode 100644 docs/source/en/model_doc/llava_next_video.md create mode 100644 docs/source/en/model_doc/mamba.md create mode 100644 docs/source/en/model_doc/mamba2.md create mode 100644 docs/source/en/model_doc/musicgen_melody.md create mode 100644 docs/source/en/model_doc/nemotron.md create mode 100644 docs/source/en/model_doc/olmo.md create mode 100644 docs/source/en/model_doc/paligemma.md create mode 100644 docs/source/en/model_doc/phi3.md create mode 100644 docs/source/en/model_doc/pvt_v2.md create mode 100644 docs/source/en/model_doc/qwen2.md create mode 100644 docs/source/en/model_doc/qwen2_audio.md create mode 100644 docs/source/en/model_doc/qwen2_moe.md create mode 100644 docs/source/en/model_doc/recurrent_gemma.md create mode 100644 docs/source/en/model_doc/rt_detr.md create mode 100644 docs/source/en/model_doc/seggpt.md create mode 100644 docs/source/en/model_doc/siglip.md create mode 100644 docs/source/en/model_doc/stablelm.md create mode 100644 docs/source/en/model_doc/starcoder2.md create mode 100644 docs/source/en/model_doc/superpoint.md create mode 100644 docs/source/en/model_doc/udop.md create mode 100644 docs/source/en/model_doc/video_llava.md create mode 100644 docs/source/en/model_doc/wav2vec2-bert.md create mode 100644 docs/source/en/model_doc/zoedepth.md delete mode 100644 docs/source/en/quantization.md 
create mode 100644 docs/source/en/quantization/aqlm.md create mode 100644 docs/source/en/quantization/awq.md create mode 100644 docs/source/en/quantization/bitsandbytes.md create mode 100644 docs/source/en/quantization/contribute.md create mode 100644 docs/source/en/quantization/eetq.md create mode 100644 docs/source/en/quantization/fbgemm_fp8.md create mode 100644 docs/source/en/quantization/gptq.md create mode 100644 docs/source/en/quantization/hqq.md create mode 100644 docs/source/en/quantization/optimum.md create mode 100644 docs/source/en/quantization/overview.md create mode 100644 docs/source/en/quantization/quanto.md create mode 100644 docs/source/en/quantization/torchao.md mode change 100644 => 100755 docs/source/en/quicktour.md create mode 100644 docs/source/en/tasks/image_feature_extraction.md create mode 100644 docs/source/en/tasks/image_text_to_text.md create mode 100644 docs/source/en/tasks/mask_generation.md delete mode 100644 docs/source/en/transformers_agents.md create mode 100644 docs/source/es/attention.md create mode 100644 docs/source/es/chat_templating.md create mode 100644 docs/source/es/model_memory_anatomy.md create mode 100644 docs/source/es/performance.md create mode 100644 docs/source/es/pipeline_webserver.md create mode 100644 docs/source/es/task_summary.md create mode 100644 docs/source/es/tasks/image_captioning.md create mode 100644 docs/source/es/tasks_explained.md create mode 100644 docs/source/es/tokenizer_summary.md create mode 100644 docs/source/es/torchscript.md create mode 100644 docs/source/es/trainer.md create mode 100644 docs/source/fr/run_scripts_fr.md create mode 100644 docs/source/fr/tutoriel_pipeline.md delete mode 100644 docs/source/ja/add_tensorflow_model.md create mode 100644 docs/source/ja/model_doc/deit.md create mode 100644 docs/source/ja/model_doc/deplot.md create mode 100644 docs/source/ja/model_doc/deta.md create mode 100644 docs/source/ja/model_doc/detr.md create mode 100644 docs/source/ja/model_doc/dialogpt.md create mode 100644 docs/source/ja/model_doc/dinat.md delete mode 100644 docs/source/ko/add_tensorflow_model.md create mode 100644 docs/source/ko/chat_templating.md delete mode 100644 docs/source/ko/custom_tools.md create mode 100644 docs/source/ko/deepspeed.md create mode 100644 docs/source/ko/fsdp.md create mode 100644 docs/source/ko/generation_strategies.md create mode 100644 docs/source/ko/llm_tutorial_optimization.md create mode 100644 docs/source/ko/main_classes/agent.md delete mode 100644 docs/source/ko/perf_infer_gpu_many.md create mode 100644 docs/source/ko/quantization/awq.md create mode 100644 docs/source/ko/quantization/bitsandbytes.md create mode 100644 docs/source/ko/quantization/eetq.md create mode 100644 docs/source/ko/quantization/gptq.md create mode 100644 docs/source/ko/quantization/quanto.md create mode 100644 docs/source/ko/tasks/idefics.md create mode 100644 docs/source/ko/tasks/image_feature_extraction.md create mode 100644 docs/source/ko/tasks/image_to_image.md create mode 100644 docs/source/ko/tasks/knowledge_distillation_for_image_classification.md create mode 100644 docs/source/ko/tasks/mask_generation.md create mode 100644 docs/source/ko/tasks/prompting.md create mode 100644 docs/source/ko/trainer.md create mode 100644 docs/source/zh/add_new_pipeline.md create mode 100644 docs/source/zh/chat_templating.md create mode 100644 docs/source/zh/contributing.md create mode 100644 docs/source/zh/fsdp.md create mode 100644 docs/source/zh/philosophy.md create mode 100644 docs/source/zh/tasks/asr.md create mode 
100644 docs/source/zh/torchscript.md create mode 100644 examples/diff-conversion/README.md create mode 100644 examples/diff-conversion/convert_examples.sh create mode 100644 examples/diff-conversion/diff_dummy.py create mode 100644 examples/diff-conversion/diff_my_new_model.py create mode 100644 examples/diff-conversion/diff_my_new_model2.py create mode 100644 examples/diff-conversion/diff_new_model.py create mode 100644 examples/diff-conversion/diff_super.py delete mode 100755 examples/legacy/text-classification/run_tf_text_classification.py delete mode 100755 examples/legacy/token-classification/run_tf_ner.py create mode 100644 examples/pytorch/instance-segmentation/README.md create mode 100644 examples/pytorch/instance-segmentation/requirements.txt create mode 100644 examples/pytorch/instance-segmentation/run_instance_segmentation.py create mode 100644 examples/pytorch/instance-segmentation/run_instance_segmentation_no_trainer.py create mode 100644 examples/pytorch/language-modeling/run_fim.py create mode 100644 examples/pytorch/language-modeling/run_fim_no_trainer.py create mode 100644 examples/pytorch/object-detection/README.md create mode 100644 examples/pytorch/object-detection/requirements.txt create mode 100644 examples/pytorch/object-detection/run_object_detection.py create mode 100644 examples/pytorch/object-detection/run_object_detection_no_trainer.py create mode 100644 examples/research_projects/token-healing/README.md create mode 100644 examples/research_projects/token-healing/run_token_healing.py create mode 100644 i18n/README_ar.md create mode 100644 i18n/README_de.md create mode 100644 i18n/README_es.md create mode 100644 i18n/README_fr.md create mode 100644 i18n/README_hd.md create mode 100644 i18n/README_ja.md create mode 100644 i18n/README_ko.md create mode 100644 i18n/README_pt-br.md create mode 100644 i18n/README_ru.md create mode 100644 i18n/README_te.md create mode 100644 i18n/README_vi.md create mode 100644 i18n/README_zh-hans.md create mode 100644 i18n/README_zh-hant.md mode change 100644 => 100755 src/transformers/__init__.py rename src/transformers/{tools => agents}/__init__.py (65%) rename src/transformers/{tools => agents}/agent_types.py (68%) create mode 100644 src/transformers/agents/agents.py create mode 100644 src/transformers/agents/default_tools.py rename src/transformers/{tools => agents}/document_question_answering.py (80%) rename src/transformers/{tools => agents}/evaluate_agent.py (56%) rename src/transformers/{tools => agents}/image_question_answering.py (81%) create mode 100644 src/transformers/agents/llm_engine.py create mode 100644 src/transformers/agents/monitoring.py create mode 100644 src/transformers/agents/prompts.py create mode 100644 src/transformers/agents/python_interpreter.py rename src/transformers/{tools => agents}/speech_to_text.py (67%) rename src/transformers/{tools => agents}/text_to_speech.py (81%) rename src/transformers/{tools/base.py => agents/tools.py} (70%) rename src/transformers/{tools => agents}/translation.py (91%) delete mode 100644 src/transformers/commands/add_new_model.py create mode 100644 src/transformers/generation/watermarking.py delete mode 100644 src/transformers/generation_flax_utils.py delete mode 100644 src/transformers/generation_tf_utils.py delete mode 100644 src/transformers/generation_utils.py create mode 100644 src/transformers/image_processing_base.py create mode 100644 src/transformers/image_processing_utils_fast.py mode change 100644 => 100755 src/transformers/integrations/__init__.py create mode 
100644 src/transformers/integrations/aqlm.py create mode 100644 src/transformers/integrations/eetq.py create mode 100644 src/transformers/integrations/fbgemm_fp8.py create mode 100644 src/transformers/integrations/ggml.py create mode 100755 src/transformers/integrations/hqq.py mode change 100644 => 100755 src/transformers/integrations/integration_utils.py create mode 100644 src/transformers/integrations/quanto.py create mode 100644 src/transformers/integrations/tpu.py create mode 100644 src/transformers/kernels/deta/cpu/ms_deform_attn_cpu.cpp create mode 100644 src/transformers/kernels/deta/cpu/ms_deform_attn_cpu.h create mode 100644 src/transformers/kernels/deta/cuda/ms_deform_attn_cuda.cu create mode 100644 src/transformers/kernels/deta/cuda/ms_deform_attn_cuda.cuh create mode 100644 src/transformers/kernels/deta/cuda/ms_deform_attn_cuda.h create mode 100644 src/transformers/kernels/deta/cuda/ms_deform_im2col_cuda.cuh create mode 100644 src/transformers/kernels/deta/ms_deform_attn.h create mode 100644 src/transformers/kernels/deta/vision.cpp create mode 100644 src/transformers/modeling_flash_attention_utils.py create mode 100644 src/transformers/modeling_gguf_pytorch_utils.py create mode 100644 src/transformers/modeling_rope_utils.py mode change 100644 => 100755 src/transformers/modeling_utils.py mode change 100755 => 100644 src/transformers/models/auto/configuration_auto.py mode change 100755 => 100644 src/transformers/models/auto/modeling_auto.py create mode 100644 src/transformers/models/chameleon/__init__.py create mode 100644 src/transformers/models/chameleon/configuration_chameleon.py create mode 100644 src/transformers/models/chameleon/convert_chameleon_weights_to_hf.py create mode 100644 src/transformers/models/chameleon/image_processing_chameleon.py create mode 100644 src/transformers/models/chameleon/modeling_chameleon.py create mode 100644 src/transformers/models/chameleon/processing_chameleon.py create mode 100644 src/transformers/models/cohere/__init__.py create mode 100644 src/transformers/models/cohere/configuration_cohere.py create mode 100644 src/transformers/models/cohere/modeling_cohere.py create mode 100644 src/transformers/models/cohere/tokenization_cohere_fast.py create mode 100644 src/transformers/models/dac/__init__.py create mode 100644 src/transformers/models/dac/configuration_dac.py create mode 100644 src/transformers/models/dac/convert_dac_checkpoint.py create mode 100644 src/transformers/models/dac/feature_extraction_dac.py create mode 100644 src/transformers/models/dac/modeling_dac.py create mode 100644 src/transformers/models/dbrx/__init__.py create mode 100644 src/transformers/models/dbrx/configuration_dbrx.py create mode 100644 src/transformers/models/dbrx/modeling_dbrx.py rename src/transformers/models/{ => deprecated}/deta/__init__.py (83%) rename src/transformers/models/{ => deprecated}/deta/configuration_deta.py (80%) rename src/transformers/models/{ => deprecated}/deta/convert_deta_resnet_to_pytorch.py (98%) rename src/transformers/models/{ => deprecated}/deta/convert_deta_swin_to_pytorch.py (99%) rename src/transformers/models/{ => deprecated}/deta/image_processing_deta.py (75%) rename src/transformers/models/{ => deprecated}/deta/modeling_deta.py (93%) rename src/transformers/models/{ => deprecated}/efficientformer/__init__.py (84%) rename src/transformers/models/{ => deprecated}/efficientformer/configuration_efficientformer.py (95%) rename src/transformers/models/{ => 
deprecated}/efficientformer/convert_efficientformer_original_pytorch_checkpoint_to_pytorch.py (100%) rename src/transformers/models/{ => deprecated}/efficientformer/image_processing_efficientformer.py (92%) rename src/transformers/models/{ => deprecated}/efficientformer/modeling_efficientformer.py (98%) rename src/transformers/models/{ => deprecated}/efficientformer/modeling_tf_efficientformer.py (91%) rename src/transformers/models/{ => deprecated}/ernie_m/__init__.py (85%) rename src/transformers/models/{ => deprecated}/ernie_m/configuration_ernie_m.py (90%) rename src/transformers/models/{ => deprecated}/ernie_m/modeling_ernie_m.py (97%) rename src/transformers/models/{ => deprecated}/ernie_m/tokenization_ernie_m.py (93%) rename src/transformers/models/{ => deprecated}/gptsan_japanese/__init__.py (84%) rename src/transformers/models/{ => deprecated}/gptsan_japanese/configuration_gptsan_japanese.py (95%) rename src/transformers/models/{ => deprecated}/gptsan_japanese/convert_gptsan_tf_checkpoint_to_pytorch.py (100%) rename src/transformers/models/{ => deprecated}/gptsan_japanese/modeling_gptsan_japanese.py (96%) rename src/transformers/models/{ => deprecated}/gptsan_japanese/tokenization_gptsan_japanese.py (89%) rename src/transformers/models/{ => deprecated}/graphormer/__init__.py (77%) rename src/transformers/models/{ => deprecated}/graphormer/algos_graphormer.pyx (100%) rename src/transformers/models/{ => deprecated}/graphormer/collating_graphormer.py (98%) rename src/transformers/models/{ => deprecated}/graphormer/configuration_graphormer.py (96%) rename src/transformers/models/{ => deprecated}/graphormer/modeling_graphormer.py (98%) rename src/transformers/models/{ => deprecated}/jukebox/__init__.py (86%) rename src/transformers/models/{ => deprecated}/jukebox/configuration_jukebox.py (98%) rename src/transformers/models/{ => deprecated}/jukebox/convert_jukebox.py (100%) rename src/transformers/models/{ => deprecated}/jukebox/modeling_jukebox.py (99%) rename src/transformers/models/{ => deprecated}/jukebox/tokenization_jukebox.py (95%) rename src/transformers/models/{ => deprecated}/mega/__init__.py (85%) rename src/transformers/models/{ => deprecated}/mega/configuration_mega.py (97%) rename src/transformers/models/{ => deprecated}/mega/convert_mega_original_pytorch_checkpoint_to_pytorch.py (99%) rename src/transformers/models/{ => deprecated}/mega/modeling_mega.py (99%) rename src/transformers/models/{ => deprecated}/nat/__init__.py (80%) rename src/transformers/models/{ => deprecated}/nat/configuration_nat.py (93%) rename src/transformers/models/{ => deprecated}/nat/modeling_nat.py (97%) rename src/transformers/models/{ => deprecated}/nezha/__init__.py (83%) rename src/transformers/models/{ => deprecated}/nezha/configuration_nezha.py (95%) rename src/transformers/models/{ => deprecated}/nezha/modeling_nezha.py (98%) rename src/transformers/models/{ => deprecated}/qdqbert/__init__.py (84%) rename src/transformers/models/{ => deprecated}/qdqbert/configuration_qdqbert.py (87%) rename src/transformers/models/{ => deprecated}/qdqbert/modeling_qdqbert.py (98%) rename src/transformers/models/{ => deprecated}/realm/__init__.py (85%) rename src/transformers/models/{ => deprecated}/realm/configuration_realm.py (83%) rename src/transformers/models/{ => deprecated}/realm/modeling_realm.py (97%) rename src/transformers/models/{ => deprecated}/realm/retrieval_realm.py (99%) rename src/transformers/models/{ => deprecated}/realm/tokenization_realm.py (89%) rename src/transformers/models/{ => 
deprecated}/realm/tokenization_realm_fast.py (73%) rename src/transformers/models/{ => deprecated}/speech_to_text_2/__init__.py (83%) rename src/transformers/models/{ => deprecated}/speech_to_text_2/configuration_speech_to_text_2.py (93%) rename src/transformers/models/{ => deprecated}/speech_to_text_2/modeling_speech_to_text_2.py (97%) rename src/transformers/models/{ => deprecated}/speech_to_text_2/processing_speech_to_text_2.py (98%) rename src/transformers/models/{ => deprecated}/speech_to_text_2/tokenization_speech_to_text_2.py (90%) rename src/transformers/models/{ => deprecated}/tvlt/__init__.py (89%) rename src/transformers/models/{ => deprecated}/tvlt/configuration_tvlt.py (95%) rename src/transformers/models/{ => deprecated}/tvlt/feature_extraction_tvlt.py (98%) rename src/transformers/models/{ => deprecated}/tvlt/image_processing_tvlt.py (93%) rename src/transformers/models/{ => deprecated}/tvlt/modeling_tvlt.py (97%) rename src/transformers/models/{ => deprecated}/tvlt/processing_tvlt.py (98%) rename src/transformers/models/{ => deprecated}/vit_hybrid/__init__.py (81%) rename src/transformers/models/{ => deprecated}/vit_hybrid/configuration_vit_hybrid.py (75%) rename src/transformers/models/{ => deprecated}/vit_hybrid/convert_vit_hybrid_timm_to_pytorch.py (99%) rename src/transformers/models/{ => deprecated}/vit_hybrid/image_processing_vit_hybrid.py (92%) rename src/transformers/models/{ => deprecated}/vit_hybrid/modeling_vit_hybrid.py (93%) rename src/transformers/models/{ => deprecated}/xlm_prophetnet/__init__.py (82%) rename src/transformers/models/{ => deprecated}/xlm_prophetnet/configuration_xlm_prophetnet.py (96%) rename src/transformers/models/{ => deprecated}/xlm_prophetnet/modeling_xlm_prophetnet.py (96%) rename src/transformers/models/{ => deprecated}/xlm_prophetnet/tokenization_xlm_prophetnet.py (95%) create mode 100644 src/transformers/models/depth_anything/__init__.py create mode 100644 src/transformers/models/depth_anything/configuration_depth_anything.py create mode 100644 src/transformers/models/depth_anything/convert_depth_anything_to_hf.py create mode 100644 src/transformers/models/depth_anything/modeling_depth_anything.py create mode 100644 src/transformers/models/dinov2/modeling_flax_dinov2.py create mode 100644 src/transformers/models/falcon_mamba/__init__.py create mode 100644 src/transformers/models/falcon_mamba/configuration_falcon_mamba.py create mode 100644 src/transformers/models/falcon_mamba/modeling_falcon_mamba.py create mode 100644 src/transformers/models/fastspeech2_conformer/__init__.py create mode 100644 src/transformers/models/fastspeech2_conformer/configuration_fastspeech2_conformer.py create mode 100644 src/transformers/models/fastspeech2_conformer/convert_fastspeech2_conformer_original_pytorch_checkpoint_to_pytorch.py create mode 100644 src/transformers/models/fastspeech2_conformer/convert_hifigan.py create mode 100644 src/transformers/models/fastspeech2_conformer/convert_model_with_hifigan.py create mode 100644 src/transformers/models/fastspeech2_conformer/modeling_fastspeech2_conformer.py create mode 100644 src/transformers/models/fastspeech2_conformer/tokenization_fastspeech2_conformer.py create mode 100644 src/transformers/models/gemma/__init__.py create mode 100644 src/transformers/models/gemma/configuration_gemma.py create mode 100644 src/transformers/models/gemma/convert_gemma_weights_to_hf.py create mode 100644 src/transformers/models/gemma/diff_gemma.py create mode 100644 src/transformers/models/gemma/modeling_flax_gemma.py create 
mode 100644 src/transformers/models/gemma/modeling_gemma.py create mode 100644 src/transformers/models/gemma/tokenization_gemma.py create mode 100644 src/transformers/models/gemma/tokenization_gemma_fast.py create mode 100644 src/transformers/models/gemma2/__init__.py create mode 100644 src/transformers/models/gemma2/configuration_gemma2.py create mode 100644 src/transformers/models/gemma2/convert_gemma2_weights_to_hf.py create mode 100644 src/transformers/models/gemma2/diff_gemma2.py create mode 100644 src/transformers/models/gemma2/modeling_gemma2.py create mode 100644 src/transformers/models/grounding_dino/__init__.py create mode 100644 src/transformers/models/grounding_dino/configuration_grounding_dino.py create mode 100644 src/transformers/models/grounding_dino/convert_grounding_dino_to_hf.py create mode 100644 src/transformers/models/grounding_dino/image_processing_grounding_dino.py create mode 100644 src/transformers/models/grounding_dino/modeling_grounding_dino.py create mode 100644 src/transformers/models/grounding_dino/processing_grounding_dino.py create mode 100644 src/transformers/models/hiera/__init__.py create mode 100644 src/transformers/models/hiera/configuration_hiera.py create mode 100644 src/transformers/models/hiera/convert_hiera_to_hf.py create mode 100644 src/transformers/models/hiera/modeling_hiera.py create mode 100644 src/transformers/models/idefics/modeling_tf_idefics.py create mode 100644 src/transformers/models/idefics/perceiver_tf.py create mode 100644 src/transformers/models/idefics/vision_tf.py create mode 100644 src/transformers/models/idefics2/__init__.py create mode 100644 src/transformers/models/idefics2/configuration_idefics2.py create mode 100644 src/transformers/models/idefics2/convert_idefics2_weights_to_hf.py create mode 100644 src/transformers/models/idefics2/image_processing_idefics2.py create mode 100644 src/transformers/models/idefics2/modeling_idefics2.py create mode 100644 src/transformers/models/idefics2/processing_idefics2.py create mode 100644 src/transformers/models/instructblipvideo/__init__.py create mode 100644 src/transformers/models/instructblipvideo/configuration_instructblipvideo.py create mode 100644 src/transformers/models/instructblipvideo/convert_instructblipvideo_original_to_pytorch.py create mode 100644 src/transformers/models/instructblipvideo/diff_instructblipvideo.py create mode 100644 src/transformers/models/instructblipvideo/image_processing_instructblipvideo.py create mode 100644 src/transformers/models/instructblipvideo/modeling_instructblipvideo.py create mode 100644 src/transformers/models/instructblipvideo/processing_instructblipvideo.py create mode 100644 src/transformers/models/jamba/__init__.py create mode 100644 src/transformers/models/jamba/configuration_jamba.py create mode 100755 src/transformers/models/jamba/modeling_jamba.py create mode 100644 src/transformers/models/jetmoe/__init__.py create mode 100644 src/transformers/models/jetmoe/configuration_jetmoe.py create mode 100644 src/transformers/models/jetmoe/modeling_jetmoe.py create mode 100644 src/transformers/models/llava_next/__init__.py create mode 100644 src/transformers/models/llava_next/configuration_llava_next.py create mode 100644 src/transformers/models/llava_next/convert_llava_next_weights_to_hf.py create mode 100644 src/transformers/models/llava_next/image_processing_llava_next.py create mode 100644 src/transformers/models/llava_next/modeling_llava_next.py create mode 100644 src/transformers/models/llava_next/processing_llava_next.py create mode 
100644 src/transformers/models/llava_next_video/__init__.py create mode 100644 src/transformers/models/llava_next_video/configuration_llava_next_video.py create mode 100644 src/transformers/models/llava_next_video/convert_llava_next_video_weights_to_hf.py create mode 100644 src/transformers/models/llava_next_video/diff_llava_next_video.py create mode 100644 src/transformers/models/llava_next_video/image_processing_llava_next_video.py create mode 100644 src/transformers/models/llava_next_video/modeling_llava_next_video.py create mode 100644 src/transformers/models/llava_next_video/processing_llava_next_video.py create mode 100644 src/transformers/models/mamba/__init__.py create mode 100644 src/transformers/models/mamba/configuration_mamba.py create mode 100644 src/transformers/models/mamba/convert_mamba_ssm_checkpoint_to_pytorch.py create mode 100644 src/transformers/models/mamba/modeling_mamba.py create mode 100644 src/transformers/models/mamba2/__init__.py create mode 100644 src/transformers/models/mamba2/configuration_mamba2.py create mode 100644 src/transformers/models/mamba2/convert_mamba2_ssm_checkpoint_to_pytorch.py create mode 100644 src/transformers/models/mamba2/modeling_mamba2.py create mode 100644 src/transformers/models/mistral/modeling_flax_mistral.py create mode 100644 src/transformers/models/mistral/modeling_tf_mistral.py create mode 100644 src/transformers/models/musicgen_melody/__init__.py create mode 100644 src/transformers/models/musicgen_melody/configuration_musicgen_melody.py create mode 100644 src/transformers/models/musicgen_melody/convert_musicgen_melody_transformers.py create mode 100644 src/transformers/models/musicgen_melody/feature_extraction_musicgen_melody.py create mode 100644 src/transformers/models/musicgen_melody/modeling_musicgen_melody.py create mode 100644 src/transformers/models/musicgen_melody/processing_musicgen_melody.py create mode 100644 src/transformers/models/nemotron/__init__.py create mode 100644 src/transformers/models/nemotron/configuration_nemotron.py create mode 100644 src/transformers/models/nemotron/convert_nemotron_nemo_to_hf.py create mode 100644 src/transformers/models/nemotron/modeling_nemotron.py create mode 100644 src/transformers/models/olmo/__init__.py create mode 100644 src/transformers/models/olmo/configuration_olmo.py create mode 100644 src/transformers/models/olmo/convert_olmo_weights_to_hf.py create mode 100644 src/transformers/models/olmo/modeling_olmo.py create mode 100644 src/transformers/models/paligemma/__init__.py create mode 100644 src/transformers/models/paligemma/configuration_paligemma.py create mode 100644 src/transformers/models/paligemma/convert_paligemma_weights_to_hf.py create mode 100644 src/transformers/models/paligemma/modeling_paligemma.py create mode 100644 src/transformers/models/paligemma/processing_paligemma.py create mode 100644 src/transformers/models/phi3/__init__.py create mode 100644 src/transformers/models/phi3/configuration_phi3.py create mode 100644 src/transformers/models/phi3/modeling_phi3.py create mode 100644 src/transformers/models/pvt_v2/__init__.py create mode 100644 src/transformers/models/pvt_v2/configuration_pvt_v2.py create mode 100644 src/transformers/models/pvt_v2/convert_pvt_v2_to_pytorch.py create mode 100644 src/transformers/models/pvt_v2/modeling_pvt_v2.py create mode 100644 src/transformers/models/qwen2/__init__.py create mode 100644 src/transformers/models/qwen2/configuration_qwen2.py create mode 100644 src/transformers/models/qwen2/modeling_qwen2.py create mode 100644 
src/transformers/models/qwen2/tokenization_qwen2.py create mode 100644 src/transformers/models/qwen2/tokenization_qwen2_fast.py create mode 100644 src/transformers/models/qwen2_audio/__init__.py create mode 100644 src/transformers/models/qwen2_audio/configuration_qwen2_audio.py create mode 100644 src/transformers/models/qwen2_audio/modeling_qwen2_audio.py create mode 100644 src/transformers/models/qwen2_audio/processing_qwen2_audio.py create mode 100644 src/transformers/models/qwen2_moe/__init__.py create mode 100644 src/transformers/models/qwen2_moe/configuration_qwen2_moe.py create mode 100644 src/transformers/models/qwen2_moe/modeling_qwen2_moe.py create mode 100644 src/transformers/models/recurrent_gemma/__init__.py create mode 100644 src/transformers/models/recurrent_gemma/configuration_recurrent_gemma.py create mode 100644 src/transformers/models/recurrent_gemma/convert_recurrent_gemma_to_hf.py create mode 100644 src/transformers/models/recurrent_gemma/modeling_recurrent_gemma.py create mode 100644 src/transformers/models/rt_detr/__init__.py create mode 100644 src/transformers/models/rt_detr/configuration_rt_detr.py create mode 100644 src/transformers/models/rt_detr/configuration_rt_detr_resnet.py create mode 100644 src/transformers/models/rt_detr/convert_rt_detr_original_pytorch_checkpoint_to_hf.py create mode 100644 src/transformers/models/rt_detr/image_processing_rt_detr.py create mode 100644 src/transformers/models/rt_detr/modeling_rt_detr.py create mode 100644 src/transformers/models/rt_detr/modeling_rt_detr_resnet.py rename src/transformers/models/sam/{convert_sam_original_to_hf_format.py => convert_sam_to_hf.py} (69%) create mode 100644 src/transformers/models/seggpt/__init__.py create mode 100644 src/transformers/models/seggpt/configuration_seggpt.py create mode 100644 src/transformers/models/seggpt/convert_seggpt_to_hf.py create mode 100644 src/transformers/models/seggpt/image_processing_seggpt.py create mode 100644 src/transformers/models/seggpt/modeling_seggpt.py create mode 100644 src/transformers/models/siglip/__init__.py create mode 100644 src/transformers/models/siglip/configuration_siglip.py create mode 100644 src/transformers/models/siglip/convert_siglip_to_hf.py create mode 100644 src/transformers/models/siglip/image_processing_siglip.py create mode 100644 src/transformers/models/siglip/modeling_siglip.py create mode 100644 src/transformers/models/siglip/processing_siglip.py create mode 100644 src/transformers/models/siglip/tokenization_siglip.py create mode 100644 src/transformers/models/stablelm/__init__.py create mode 100644 src/transformers/models/stablelm/configuration_stablelm.py create mode 100755 src/transformers/models/stablelm/modeling_stablelm.py create mode 100644 src/transformers/models/starcoder2/__init__.py create mode 100644 src/transformers/models/starcoder2/configuration_starcoder2.py create mode 100644 src/transformers/models/starcoder2/modeling_starcoder2.py create mode 100644 src/transformers/models/superpoint/__init__.py create mode 100644 src/transformers/models/superpoint/configuration_superpoint.py create mode 100644 src/transformers/models/superpoint/convert_superpoint_to_pytorch.py create mode 100644 src/transformers/models/superpoint/image_processing_superpoint.py create mode 100644 src/transformers/models/superpoint/modeling_superpoint.py create mode 100644 src/transformers/models/swiftformer/modeling_tf_swiftformer.py create mode 100644 src/transformers/models/udop/__init__.py create mode 100644 
src/transformers/models/udop/configuration_udop.py create mode 100644 src/transformers/models/udop/convert_udop_to_hf.py create mode 100644 src/transformers/models/udop/modeling_udop.py create mode 100644 src/transformers/models/udop/processing_udop.py create mode 100644 src/transformers/models/udop/tokenization_udop.py create mode 100644 src/transformers/models/udop/tokenization_udop_fast.py create mode 100644 src/transformers/models/video_llava/__init__.py create mode 100644 src/transformers/models/video_llava/configuration_video_llava.py create mode 100644 src/transformers/models/video_llava/convert_video_llava_weights_to_hf.py create mode 100644 src/transformers/models/video_llava/image_processing_video_llava.py create mode 100644 src/transformers/models/video_llava/modeling_video_llava.py create mode 100644 src/transformers/models/video_llava/processing_video_llava.py mode change 100644 => 100755 src/transformers/models/videomae/modeling_videomae.py create mode 100644 src/transformers/models/vit/image_processing_vit_fast.py create mode 100644 src/transformers/models/wav2vec2_bert/__init__.py create mode 100644 src/transformers/models/wav2vec2_bert/configuration_wav2vec2_bert.py create mode 100644 src/transformers/models/wav2vec2_bert/convert_wav2vec2_seamless_checkpoint.py create mode 100644 src/transformers/models/wav2vec2_bert/modeling_wav2vec2_bert.py create mode 100644 src/transformers/models/wav2vec2_bert/processing_wav2vec2_bert.py create mode 100644 src/transformers/models/whisper/generation_whisper.py create mode 100644 src/transformers/models/zoedepth/__init__.py create mode 100644 src/transformers/models/zoedepth/configuration_zoedepth.py create mode 100644 src/transformers/models/zoedepth/convert_zoedepth_to_hf.py create mode 100644 src/transformers/models/zoedepth/image_processing_zoedepth.py create mode 100644 src/transformers/models/zoedepth/modeling_zoedepth.py delete mode 100644 src/transformers/pipelines/conversational.py create mode 100644 src/transformers/pipelines/image_feature_extraction.py create mode 100755 src/transformers/quantizers/__init__.py create mode 100755 src/transformers/quantizers/auto.py create mode 100644 src/transformers/quantizers/base.py create mode 100644 src/transformers/quantizers/quantizer_aqlm.py create mode 100644 src/transformers/quantizers/quantizer_awq.py create mode 100644 src/transformers/quantizers/quantizer_bnb_4bit.py create mode 100644 src/transformers/quantizers/quantizer_bnb_8bit.py create mode 100644 src/transformers/quantizers/quantizer_eetq.py create mode 100644 src/transformers/quantizers/quantizer_fbgemm_fp8.py create mode 100644 src/transformers/quantizers/quantizer_gptq.py create mode 100755 src/transformers/quantizers/quantizer_hqq.py create mode 100644 src/transformers/quantizers/quantizer_quanto.py create mode 100644 src/transformers/quantizers/quantizer_torchao.py create mode 100644 src/transformers/quantizers/quantizers_utils.py delete mode 100644 src/transformers/tools/agents.py delete mode 100644 src/transformers/tools/image_captioning.py delete mode 100644 src/transformers/tools/image_segmentation.py delete mode 100644 src/transformers/tools/prompts.py delete mode 100644 src/transformers/tools/python_interpreter.py delete mode 100644 src/transformers/tools/text_classification.py delete mode 100644 src/transformers/tools/text_question_answering.py delete mode 100644 src/transformers/tools/text_summarization.py delete mode 100644 src/transformers/trainer_tf.py mode change 100644 => 100755 
src/transformers/utils/__init__.py create mode 100644 src/transformers/utils/chat_template_utils.py create mode 100644 src/transformers/utils/deprecation.py create mode 100644 src/transformers/utils/dummy_torchaudio_objects.py create mode 100644 src/transformers/utils/dummy_torchvision_objects.py mode change 100644 => 100755 src/transformers/utils/import_utils.py mode change 100644 => 100755 src/transformers/utils/quantization_config.py delete mode 100644 templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/__init__.py delete mode 100644 templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/configuration.json delete mode 100644 templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/configuration_{{cookiecutter.lowercase_modelname}}.py delete mode 100644 templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/modeling_flax_{{cookiecutter.lowercase_modelname}}.py delete mode 100644 templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/modeling_tf_{{cookiecutter.lowercase_modelname}}.py delete mode 100755 templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/modeling_{{cookiecutter.lowercase_modelname}}.py delete mode 100644 templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/test_modeling_flax_{{cookiecutter.lowercase_modelname}}.py delete mode 100644 templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/test_modeling_tf_{{cookiecutter.lowercase_modelname}}.py delete mode 100644 templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/test_modeling_{{cookiecutter.lowercase_modelname}}.py delete mode 100644 templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/to_replace_{{cookiecutter.lowercase_modelname}}.py delete mode 100644 templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/tokenization_fast_{{cookiecutter.lowercase_modelname}}.py delete mode 100644 templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/tokenization_{{cookiecutter.lowercase_modelname}}.py delete mode 100644 templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/{{cookiecutter.lowercase_modelname}}.md delete mode 100644 templates/adding_a_new_model/cookiecutter.json delete mode 100644 templates/adding_a_new_model/tests/encoder-bert-tokenizer.json delete mode 100644 templates/adding_a_new_model/tests/flax-encoder-bert-tokenizer.json delete mode 100644 templates/adding_a_new_model/tests/flax-seq-2-seq-bart-tokenizer.json delete mode 100644 templates/adding_a_new_model/tests/pt-encoder-bert-tokenizer.json delete mode 100644 templates/adding_a_new_model/tests/pt-seq-2-seq-bart-tokenizer.json delete mode 100644 templates/adding_a_new_model/tests/standalone.json delete mode 100644 templates/adding_a_new_model/tests/tf-encoder-bert-tokenizer.json delete mode 100644 templates/adding_a_new_model/tests/tf-seq-2-seq-bart-tokenizer.json rename tests/{models/efficientformer => agents}/__init__.py (100%) rename tests/{tools => agents}/test_agent_types.py (98%) create mode 100644 tests/agents/test_agents.py rename tests/{tools => agents}/test_document_question_answering.py (67%) create mode 100644 tests/agents/test_final_answer.py rename tests/{tools => agents}/test_image_question_answering.py (71%) create mode 100644 tests/agents/test_python_interpreter.py rename tests/{tools => agents}/test_speech_to_text.py (76%) rename 
tests/{tools => agents}/test_text_to_speech.py (75%) create mode 100644 tests/agents/test_tools_common.py create mode 100644 tests/agents/test_translation.py create mode 100644 tests/fixtures/tests_samples/COCO/000000004016.png rename tests/models/{ernie_m => chameleon}/__init__.py (100%) create mode 100644 tests/models/chameleon/test_image_processing_chameleon.py create mode 100644 tests/models/chameleon/test_modeling_chameleon.py rename tests/models/{gptsan_japanese => cohere}/__init__.py (100%) create mode 100644 tests/models/cohere/test_modeling_cohere.py create mode 100644 tests/models/cohere/test_tokenization_cohere.py rename tests/models/{graphormer => dac}/__init__.py (100%) create mode 100644 tests/models/dac/test_feature_extraction_dac.py create mode 100644 tests/models/dac/test_modeling_dac.py rename tests/models/{jukebox => dbrx}/__init__.py (100%) create mode 100644 tests/models/dbrx/test_modeling_dbrx.py rename tests/models/{mega => depth_anything}/__init__.py (100%) create mode 100644 tests/models/depth_anything/test_modeling_depth_anything.py delete mode 100644 tests/models/deta/test_image_processing_deta.py delete mode 100644 tests/models/deta/test_modeling_deta.py create mode 100644 tests/models/dinov2/test_modeling_flax_dinov2.py delete mode 100644 tests/models/efficientformer/test_image_processing_efficientformer.py delete mode 100644 tests/models/efficientformer/test_modeling_efficientformer.py delete mode 100644 tests/models/efficientformer/test_modeling_tf_efficientformer.py delete mode 100644 tests/models/ernie_m/test_modeling_ernie_m.py delete mode 100644 tests/models/ernie_m/test_tokenization_ernie_m.py rename tests/models/{nat => falcon_mamba}/__init__.py (100%) create mode 100644 tests/models/falcon_mamba/test_modeling_falcon_mamba.py rename tests/models/{nezha => fastspeech2_conformer}/__init__.py (100%) create mode 100644 tests/models/fastspeech2_conformer/test_modeling_fastspeech2_conformer.py create mode 100644 tests/models/fastspeech2_conformer/test_tokenization_fastspeech2_conformer.py create mode 100644 tests/models/flaubert/test_tokenization_flaubert.py rename tests/models/{qdqbert => gemma}/__init__.py (100%) create mode 100644 tests/models/gemma/test_modeling_flax_gemma.py create mode 100644 tests/models/gemma/test_modeling_gemma.py create mode 100644 tests/models/gemma/test_tokenization_gemma.py rename tests/models/{realm => gemma2}/__init__.py (100%) create mode 100644 tests/models/gemma2/test_modeling_gemma2.py delete mode 100644 tests/models/gptsan_japanese/test_modeling_gptsan_japanese.py delete mode 100644 tests/models/gptsan_japanese/test_tokenization_gptsan_japanese.py delete mode 100644 tests/models/graphormer/test_modeling_graphormer.py rename tests/models/{speech_to_text_2 => grounding_dino}/__init__.py (100%) create mode 100644 tests/models/grounding_dino/test_image_processing_grounding_dino.py create mode 100644 tests/models/grounding_dino/test_modeling_grounding_dino.py create mode 100644 tests/models/grounding_dino/test_processor_grounding_dino.py rename tests/models/{tvlt => hiera}/__init__.py (100%) create mode 100644 tests/models/hiera/test_modeling_hiera.py create mode 100644 tests/models/idefics/test_modeling_tf_idefics.py rename tests/models/{vit_hybrid => idefics2}/__init__.py (100%) create mode 100644 tests/models/idefics2/test_image_processing_idefics2.py create mode 100644 tests/models/idefics2/test_modeling_idefics2.py create mode 100644 tests/models/idefics2/test_processing_idefics2.py rename tests/models/{xlm_prophetnet => 
instructblipvideo}/__init__.py (100%) create mode 100644 tests/models/instructblipvideo/test_image_processing_instrictblipvideo.py create mode 100644 tests/models/instructblipvideo/test_modeling_instructblipvideo.py rename tests/{tools => models/jamba}/__init__.py (100%) create mode 100644 tests/models/jamba/test_modeling_jamba.py create mode 100644 tests/models/jetmoe/__init__.py create mode 100644 tests/models/jetmoe/test_modeling_jetmoe.py delete mode 100644 tests/models/jukebox/test_modeling_jukebox.py delete mode 100644 tests/models/jukebox/test_tokenization_jukebox.py create mode 100644 tests/models/llava/test_processor_llava.py create mode 100644 tests/models/llava_next/__init__.py create mode 100644 tests/models/llava_next/test_image_processing_llava_next.py create mode 100644 tests/models/llava_next/test_modeling_llava_next.py create mode 100644 tests/models/llava_next/test_processor_llava_next.py create mode 100644 tests/models/llava_next_video/__init__.py create mode 100644 tests/models/llava_next_video/test_image_processing_llava_next_video.py create mode 100644 tests/models/llava_next_video/test_modeling_llava_next_video.py create mode 100644 tests/models/mamba/__init__.py create mode 100644 tests/models/mamba/test_modeling_mamba.py create mode 100644 tests/models/mamba2/__init__.py create mode 100644 tests/models/mamba2/test_modeling_mamba2.py delete mode 100644 tests/models/mega/test_modeling_mega.py create mode 100644 tests/models/mistral/test_modeling_flax_mistral.py create mode 100644 tests/models/mistral/test_modeling_tf_mistral.py create mode 100644 tests/models/musicgen_melody/__init__.py create mode 100644 tests/models/musicgen_melody/test_feature_extraction_musicgen_melody.py create mode 100644 tests/models/musicgen_melody/test_modeling_musicgen_melody.py create mode 100644 tests/models/musicgen_melody/test_processor_musicgen_melody.py delete mode 100644 tests/models/nat/test_modeling_nat.py create mode 100644 tests/models/nemotron/__init__.py create mode 100644 tests/models/nemotron/test_modeling_nemotron.py delete mode 100644 tests/models/nezha/test_modeling_nezha.py create mode 100644 tests/models/olmo/__init__.py create mode 100644 tests/models/olmo/test_modeling_olmo.py create mode 100644 tests/models/owlv2/test_image_processing_owlv2.py create mode 100644 tests/models/paligemma/__init__.py create mode 100644 tests/models/paligemma/test_modeling_paligemma.py create mode 100644 tests/models/phi3/__init__.py create mode 100644 tests/models/phi3/test_modeling_phi3.py create mode 100644 tests/models/pvt_v2/__init__.py create mode 100644 tests/models/pvt_v2/test_modeling_pvt_v2.py delete mode 100644 tests/models/qdqbert/test_modeling_qdqbert.py create mode 100644 tests/models/qwen2/__init__.py create mode 100644 tests/models/qwen2/test_modeling_qwen2.py create mode 100644 tests/models/qwen2/test_tokenization_qwen2.py create mode 100644 tests/models/qwen2_audio/__init__.py create mode 100644 tests/models/qwen2_audio/test_modeling_qwen2_audio.py create mode 100644 tests/models/qwen2_audio/test_processor_qwen2_audio.py create mode 100644 tests/models/qwen2_moe/__init__.py create mode 100644 tests/models/qwen2_moe/test_modeling_qwen2_moe.py delete mode 100644 tests/models/realm/test_modeling_realm.py delete mode 100644 tests/models/realm/test_retrieval_realm.py delete mode 100644 tests/models/realm/test_tokenization_realm.py create mode 100644 tests/models/recurrent_gemma/__init__.py create mode 100644 tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py create 
mode 100644 tests/models/rt_detr/__init__.py create mode 100644 tests/models/rt_detr/test_image_processing_rt_detr.py create mode 100644 tests/models/rt_detr/test_modeling_rt_detr.py create mode 100644 tests/models/rt_detr/test_modeling_rt_detr_resnet.py create mode 100644 tests/models/seggpt/__init__.py create mode 100644 tests/models/seggpt/test_image_processing_seggpt.py create mode 100644 tests/models/seggpt/test_modeling_seggpt.py create mode 100644 tests/models/siglip/__init__.py rename tests/models/{owlv2/test_image_processor_owlv2.py => siglip/test_image_processing_siglip.py} (70%) create mode 100644 tests/models/siglip/test_modeling_siglip.py create mode 100644 tests/models/siglip/test_tokenization_siglip.py delete mode 100644 tests/models/speech_to_text_2/test_modeling_speech_to_text_2.py delete mode 100644 tests/models/speech_to_text_2/test_tokenization_speech_to_text_2.py create mode 100644 tests/models/stablelm/__init__.py create mode 100644 tests/models/stablelm/test_modeling_stablelm.py create mode 100644 tests/models/starcoder2/__init__.py create mode 100644 tests/models/starcoder2/test_modeling_starcoder2.py create mode 100644 tests/models/superpoint/__init__.py create mode 100644 tests/models/superpoint/test_image_processing_superpoint.py create mode 100644 tests/models/superpoint/test_modeling_superpoint.py create mode 100644 tests/models/swiftformer/test_modeling_tf_swiftformer.py delete mode 100644 tests/models/tvlt/test_feature_extraction_tvlt.py delete mode 100644 tests/models/tvlt/test_image_processor_tvlt.py delete mode 100644 tests/models/tvlt/test_modeling_tvlt.py delete mode 100644 tests/models/tvlt/test_processor_tvlt.py create mode 100644 tests/models/udop/__init__.py create mode 100644 tests/models/udop/test_modeling_udop.py create mode 100644 tests/models/udop/test_processor_udop.py create mode 100644 tests/models/udop/test_tokenization_udop.py create mode 100644 tests/models/video_llava/__init__.py create mode 100644 tests/models/video_llava/test_image_processing_video_llava.py create mode 100644 tests/models/video_llava/test_modeling_video_llava.py create mode 100644 tests/models/vipllava/test_processor_vipllava.py delete mode 100644 tests/models/vit_hybrid/test_modeling_vit_hybrid.py create mode 100644 tests/models/wav2vec2_bert/__init__.py create mode 100644 tests/models/wav2vec2_bert/test_modeling_wav2vec2_bert.py create mode 100644 tests/models/wav2vec2_bert/test_processor_wav2vec2_bert.py delete mode 100644 tests/models/xlm_prophetnet/test_modeling_xlm_prophetnet.py delete mode 100644 tests/models/xlm_prophetnet/test_tokenization_xlm_prophetnet.py create mode 100644 tests/models/zoedepth/__init__.py create mode 100644 tests/models/zoedepth/test_image_processing_zoedepth.py create mode 100644 tests/models/zoedepth/test_modeling_zoedepth.py delete mode 100644 tests/pipelines/test_pipelines_conversational.py create mode 100644 tests/pipelines/test_pipelines_image_feature_extraction.py create mode 100644 tests/quantization/aqlm_integration/__init__.py create mode 100644 tests/quantization/aqlm_integration/test_aqlm.py create mode 100644 tests/quantization/eetq_integration/__init__.py create mode 100644 tests/quantization/eetq_integration/test_eetq.py create mode 100644 tests/quantization/fbgemm_fp8/__init__.py create mode 100644 tests/quantization/fbgemm_fp8/test_fbgemm_fp8.py create mode 100644 tests/quantization/ggml/__init__.py create mode 100644 tests/quantization/ggml/test_ggml.py create mode 100755 tests/quantization/hqq/test_hqq.py create mode 
100644 tests/quantization/quanto_integration/__init__.py create mode 100644 tests/quantization/quanto_integration/test_quanto.py create mode 100644 tests/quantization/torchao_integration/__init__.py create mode 100644 tests/quantization/torchao_integration/test_torchao.py delete mode 100644 tests/test_cache_utils.py delete mode 100644 tests/test_feature_extraction_utils.py delete mode 100644 tests/test_image_processing_utils.py create mode 100644 tests/test_processing_common.py delete mode 100644 tests/test_tokenization_utils.py delete mode 100644 tests/tools/test_image_captioning.py delete mode 100644 tests/tools/test_image_segmentation.py delete mode 100644 tests/tools/test_python_interpreter.py delete mode 100644 tests/tools/test_text_classification.py delete mode 100644 tests/tools/test_text_question_answering.py delete mode 100644 tests/tools/test_text_summarization.py delete mode 100644 tests/tools/test_tools_common.py delete mode 100644 tests/tools/test_translation.py create mode 100644 tests/utils/test_cache_utils.py create mode 100644 tests/utils/test_chat_template_utils.py rename tests/{ => utils}/test_configuration_utils.py (60%) create mode 100644 tests/utils/test_deprecation.py create mode 100644 tests/utils/test_feature_extraction_utils.py rename tests/{ => utils}/test_modeling_flax_utils.py (62%) create mode 100644 tests/utils/test_modeling_rope_utils.py rename tests/{ => utils}/test_modeling_tf_utils.py (68%) rename tests/{ => utils}/test_modeling_utils.py (63%) mode change 100755 => 100644 create mode 100644 tests/utils/test_tokenization_utils.py delete mode 100644 utils/check_task_guides.py create mode 100644 utils/deprecate_models.py create mode 100644 utils/diff_model_converter.py create mode 100644 utils/important_models.txt create mode 100644 utils/models_to_deprecate.py create mode 100644 utils/notification_service_quantization.py create mode 100644 utils/patch_helper.py create mode 100644 utils/pr_slow_ci_models.py create mode 100644 utils/set_cuda_devices_for_ci.py create mode 100644 utils/split_doctest_jobs.py create mode 100644 utils/split_model_tests.py mode change 100644 => 100755 utils/update_metadata.py diff --git a/.circleci/TROUBLESHOOT.md b/.circleci/TROUBLESHOOT.md index c662a921ba56f3..484d62b46a87f4 100644 --- a/.circleci/TROUBLESHOOT.md +++ b/.circleci/TROUBLESHOOT.md @@ -1,6 +1,6 @@ # Troubleshooting -This is a document explaining how to deal with various issues on Circle-CI. The entries may include actually solutions or pointers to Issues that cover those. +This is a document explaining how to deal with various issues on Circle-CI. The entries may include actual solutions or pointers to Issues that cover those. ## Circle CI diff --git a/.circleci/config.yml b/.circleci/config.yml index 44d50547804f04..6558dc1454b273 100644 --- a/.circleci/config.yml +++ b/.circleci/config.yml @@ -12,7 +12,7 @@ jobs: # Ensure running with CircleCI/huggingface check_circleci_user: docker: - - image: cimg/python:3.8.12 + - image: python:3.10-slim parallelism: 1 steps: - run: echo $CIRCLE_PROJECT_USERNAME @@ -26,13 +26,12 @@ jobs: fetch_tests: working_directory: ~/transformers docker: - - image: cimg/python:3.8.12 + - image: huggingface/transformers-quality parallelism: 1 steps: - checkout - - run: pip install --upgrade --upgrade-strategy eager pip - - run: pip install -U --upgrade-strategy eager GitPython - - run: pip install -U --upgrade-strategy eager . + - run: uv pip install -U -e . 
+ - run: echo 'export "GIT_COMMIT_MESSAGE=$(git show -s --format=%s)"' >> "$BASH_ENV" && source "$BASH_ENV" - run: mkdir -p test_preparation - run: python utils/tests_fetcher.py | tee tests_fetched_summary.txt - store_artifacts: @@ -82,31 +81,28 @@ jobs: path: ~/transformers/test_preparation/filtered_test_list.txt - store_artifacts: path: test_preparation/examples_test_list.txt - - run: python .circleci/create_circleci_config.py --fetcher_folder test_preparation + - run: export "GIT_COMMIT_MESSAGE=$(git show -s --format=%s)" && echo $GIT_COMMIT_MESSAGE && python .circleci/create_circleci_config.py --fetcher_folder test_preparation - run: | if [ ! -s test_preparation/generated_config.yml ]; then echo "No tests to run, exiting early!" circleci-agent step halt fi - - run: cp test_preparation/generated_config.yml test_preparation/generated_config.txt - store_artifacts: - path: test_preparation/generated_config.txt + path: test_preparation/generated_config.yml - store_artifacts: - path: test_preparation/filtered_test_list_cross_tests.txt + path: test_preparation/filtered_test_list_cross_tests.txt - continuation/continue: - configuration_path: test_preparation/generated_config.yml + configuration_path: test_preparation/generated_config.yml # To run all tests for the nightly build fetch_all_tests: working_directory: ~/transformers docker: - - image: cimg/python:3.8.12 + - image: huggingface/transformers-quality parallelism: 1 steps: - checkout - - run: pip install --upgrade --upgrade-strategy eager pip - - run: pip install -U --upgrade-strategy eager GitPython - - run: pip install -U --upgrade-strategy eager . + - run: uv pip install -e . - run: | mkdir test_preparation echo -n "tests" > test_preparation/test_list.txt @@ -126,7 +122,7 @@ jobs: check_code_quality: working_directory: ~/transformers docker: - - image: cimg/python:3.8.12 + - image: huggingface/transformers-quality resource_class: large environment: TRANSFORMERS_IS_CI: yes @@ -134,39 +130,24 @@ parallelism: 1 steps: - checkout - - restore_cache: - keys: - - v0.7-code_quality-pip-{{ checksum "setup.py" }} - - v0.7-code-quality-pip - - restore_cache: - keys: - - v0.7-code_quality-site-packages-{{ checksum "setup.py" }} - - v0.7-code-quality-site-packages - - run: pip install --upgrade --upgrade-strategy eager pip - - run: pip install -U --upgrade-strategy eager .[all,quality] - - save_cache: - key: v0.7-code_quality-pip-{{ checksum "setup.py" }} - paths: - - '~/.cache/pip' - - save_cache: - key: v0.7-code_quality-site-packages-{{ checksum "setup.py" }} - paths: - - '~/.pyenv/versions/' + - run: uv pip install -e . - run: name: Show installed libraries and their versions command: pip freeze | tee installed.txt - store_artifacts: path: ~/transformers/installed.txt + - run: python -c "from transformers import *" || (echo '🚨 import failed, this means you introduced unprotected imports!
🚨'; exit 1) - run: ruff check examples tests src utils - run: ruff format tests src utils --check - run: python utils/custom_init_isort.py --check_only - run: python utils/sort_auto_mappings.py --check_only - run: python utils/check_doc_toc.py + - run: python utils/check_docstrings.py --check_all check_repository_consistency: working_directory: ~/transformers docker: - - image: cimg/python:3.8.12 + - image: huggingface/transformers-consistency resource_class: large environment: TRANSFORMERS_IS_CI: yes @@ -174,24 +155,7 @@ parallelism: 1 steps: - checkout - - restore_cache: - keys: - - v0.7-repository_consistency-pip-{{ checksum "setup.py" }} - - v0.7-repository_consistency-pip - - restore_cache: - keys: - - v0.7-repository_consistency-site-packages-{{ checksum "setup.py" }} - - v0.7-repository_consistency-site-packages - - run: pip install --upgrade --upgrade-strategy eager pip - - run: pip install -U --upgrade-strategy eager .[all,quality] - - save_cache: - key: v0.7-repository_consistency-pip-{{ checksum "setup.py" }} - paths: - - '~/.cache/pip' - - save_cache: - key: v0.7-repository_consistency-site-packages-{{ checksum "setup.py" }} - paths: - - '~/.pyenv/versions/' + - run: uv pip install -e . - run: name: Show installed libraries and their versions command: pip freeze | tee installed.txt @@ -207,7 +171,6 @@ - run: python utils/check_doctest_list.py - run: make deps_table_check_updated - run: python utils/update_metadata.py --check-only - - run: python utils/check_task_guides.py - run: python utils/check_docstrings.py - run: python utils/check_support_list.py @@ -228,4 +191,4 @@ workflows: - check_circleci_user - check_code_quality - check_repository_consistency - - fetch_all_tests \ No newline at end of file + - fetch_all_tests diff --git a/.circleci/create_circleci_config.py b/.circleci/create_circleci_config.py index 41e83d87438ea0..a7dd366389dc8f 100644 --- a/.circleci/create_circleci_config.py +++ b/.circleci/create_circleci_config.py @@ -19,7 +19,7 @@ import random from dataclasses import dataclass from typing import Any, Dict, List, Optional - +import glob import yaml @@ -32,7 +32,7 @@ "RUN_PT_FLAX_CROSS_TESTS": False, } # Disable the use of {"s": None} as the output is way too long, causing the navigation on CircleCI impractical -COMMON_PYTEST_OPTIONS = {"max-worker-restart": 0, "dist": "loadfile"} +COMMON_PYTEST_OPTIONS = {"max-worker-restart": 0, "dist": "loadfile", "v": None} DEFAULT_DOCKER_IMAGE = [{"image": "cimg/python:3.8.12"}] @@ -41,7 +41,6 @@ class EmptyJob: def to_dict(self): return { - "working_directory": "~/transformers", "docker": copy.deepcopy(DEFAULT_DOCKER_IMAGE), "steps":["checkout"], } @@ -52,16 +51,15 @@ class CircleCIJob: name: str additional_env: Dict[str, Any] = None cache_name: str = None - cache_version: str = "0.7" + cache_version: str = "0.8.2" docker_image: List[Dict[str, str]] = None install_steps: List[str] = None marker: Optional[str] = None parallelism: Optional[int] = 1 - pytest_num_workers: int = 8 + pytest_num_workers: int = 12 pytest_options: Dict[str, Any] = None - resource_class: Optional[str] = "xlarge" + resource_class: Optional[str] = "2xlarge" tests_to_run: Optional[List[str]] = None - working_directory: str = "~/transformers" # This should be only used for doctest job! command_timeout: Optional[int] = None @@ -74,6 +72,12 @@ def __post_init__(self): if self.docker_image is None: # Let's avoid changing the default list and make a copy.
self.docker_image = copy.deepcopy(DEFAULT_DOCKER_IMAGE) + else: + # BIG HACK WILL REMOVE ONCE FETCHER IS UPDATED + print(os.environ.get("GIT_COMMIT_MESSAGE")) + if "[build-ci-image]" in os.environ.get("GIT_COMMIT_MESSAGE", "") or os.environ.get("GIT_COMMIT_MESSAGE", "") == "dev-ci": + self.docker_image[0]["image"] = f"{self.docker_image[0]['image']}:dev" + print(f"Using {self.docker_image} docker image") if self.install_steps is None: self.install_steps = [] if self.pytest_options is None: @@ -92,7 +96,6 @@ def to_dict(self): cache_branch_prefix = "pull" job = { - "working_directory": self.working_directory, "docker": self.docker_image, "environment": env, } @@ -102,50 +105,14 @@ def to_dict(self): job["parallelism"] = self.parallelism steps = [ "checkout", - {"attach_workspace": {"at": "~/transformers/test_preparation"}}, - { - "restore_cache": { - "keys": [ - # check the fully-matched cache first - f"v{self.cache_version}-{self.cache_name}-{cache_branch_prefix}-pip-" + '{{ checksum "setup.py" }}', - # try the partially-matched cache from `main` - f"v{self.cache_version}-{self.cache_name}-main-pip-", - # try the general partially-matched cache - f"v{self.cache_version}-{self.cache_name}-{cache_branch_prefix}-pip-", - ] - } - }, - { - "restore_cache": { - "keys": [ - f"v{self.cache_version}-{self.cache_name}-{cache_branch_prefix}-site-packages-" + '{{ checksum "setup.py" }}', - f"v{self.cache_version}-{self.cache_name}-main-site-packages-", - f"v{self.cache_version}-{self.cache_name}-{cache_branch_prefix}-site-packages-", - ] - } - }, + {"attach_workspace": {"at": "test_preparation"}}, ] steps.extend([{"run": l} for l in self.install_steps]) - steps.extend([{"run": 'pip install "fsspec>=2023.5.0,<2023.10.0"'}]) - steps.extend([{"run": "pip install pytest-subtests"}]) - steps.append( - { - "save_cache": { - "key": f"v{self.cache_version}-{self.cache_name}-{cache_branch_prefix}-pip-" + '{{ checksum "setup.py" }}', - "paths": ["~/.cache/pip"], - } - } - ) - steps.append( - { - "save_cache": { - "key": f"v{self.cache_version}-{self.cache_name}-{cache_branch_prefix}-site-packages-" + '{{ checksum "setup.py" }}', - "paths": ["~/.pyenv/versions/"], - } - } - ) - steps.append({"run": {"name": "Show installed libraries and their versions", "command": "pip freeze | tee installed.txt"}}) - steps.append({"store_artifacts": {"path": "~/transformers/installed.txt"}}) + steps.append({"run": {"name": "Show installed libraries and their size", "command": """du -h -d 1 "$(pip -V | cut -d ' ' -f 4 | sed 's/pip//g')" | grep -vE "dist-info|_distutils_hack|__pycache__" | sort -h | tee installed.txt || true"""}}) + steps.append({"run": {"name": "Show installed libraries and their versions", "command": """pip list --format=freeze | tee installed.txt || true"""}}) + + steps.append({"run":{"name":"Show biggest libraries","command":"""dpkg-query --show --showformat='${Installed-Size}\t${Package}\n' | sort -rh | head -25 | sort -h | awk '{ package=$2; sub(".*/", "", package); printf("%.5f GB %s\n", $1/1024/1024, package)}' || true"""}}) + steps.append({"store_artifacts": {"path": "installed.txt"}}) all_options = {**COMMON_PYTEST_OPTIONS, **self.pytest_options} pytest_flags = [f"--{key}={value}" if (value is not None or key in ["doctest-modules"]) else f"-{key}" for key, value in all_options.items()] @@ -155,10 +122,15 @@ def to_dict(self): steps.append({"run": {"name": "Create `test-results` directory", "command": "mkdir test-results"}}) + # Examples special case: we need to download NLTK files in advance to avoid 
cuncurrency issues + if "examples" in self.name: + steps.append({"run": {"name": "Download NLTK files", "command": """python -c "import nltk; nltk.download('punkt', quiet=True)" """}}) + test_command = "" if self.command_timeout: test_command = f"timeout {self.command_timeout} " - test_command += f"python -m pytest --junitxml=test-results/junit.xml -n {self.pytest_num_workers} " + " ".join(pytest_flags) + # junit familiy xunit1 is necessary to support splitting on test name or class name with circleci split + test_command += f"python3 -m pytest -rsfE -p no:warnings --tb=short -o junit_family=xunit1 --junitxml=test-results/junit.xml -n {self.pytest_num_workers} " + " ".join(pytest_flags) if self.parallelism == 1: if self.tests_to_run is None: @@ -171,7 +143,7 @@ def to_dict(self): if tests is None: folder = os.environ["test_preparation_dir"] test_file = os.path.join(folder, "filtered_test_list.txt") - if os.path.exists(test_file): + if os.path.exists(test_file): # We take this job's tests from the filtered test_list.txt with open(test_file) as f: tests = f.read().split(" ") @@ -183,17 +155,26 @@ def to_dict(self): if test.endswith(".py"): expanded_tests.append(test) elif test == "tests/models": - expanded_tests.extend([os.path.join(test, x) for x in os.listdir(test)]) + if "tokenization" in self.name: + expanded_tests.extend(glob.glob("tests/models/**/test_tokenization*.py", recursive=True)) + elif self.name in ["flax","torch","tf"]: + name = self.name if self.name != "torch" else "" + if self.name == "torch": + all_tests = glob.glob(f"tests/models/**/test_modeling_{name}*.py", recursive=True) + filtered = [k for k in all_tests if ("_tf_") not in k and "_flax_" not in k] + expanded_tests.extend(filtered) + else: + expanded_tests.extend(glob.glob(f"tests/models/**/test_modeling_{name}*.py", recursive=True)) + else: + expanded_tests.extend(glob.glob("tests/models/**/test_modeling*.py", recursive=True)) elif test == "tests/pipelines": - expanded_tests.extend([os.path.join(test, x) for x in os.listdir(test)]) + expanded_tests.extend(glob.glob("tests/models/**/test_modeling*.py", recursive=True)) else: expanded_tests.append(test) - # Avoid long tests always being collected together - random.shuffle(expanded_tests) tests = " ".join(expanded_tests) # Each executor to run ~10 tests - n_executors = max(len(tests) // 10, 1) + n_executors = max(len(expanded_tests) // 10, 1) # Avoid empty test list on some executor(s) or launching too many executors if n_executors > self.parallelism: n_executors = self.parallelism @@ -206,13 +187,9 @@ def to_dict(self): command = 'TESTS=$(circleci tests split tests.txt) && echo $TESTS > splitted_tests.txt' steps.append({"run": {"name": "Split tests", "command": command}}) - steps.append({"store_artifacts": {"path": "~/transformers/tests.txt"}}) - steps.append({"store_artifacts": {"path": "~/transformers/splitted_tests.txt"}}) + steps.append({"store_artifacts": {"path": "tests.txt"}}) + steps.append({"store_artifacts": {"path": "splitted_tests.txt"}}) - test_command = "" - if self.timeout: - test_command = f"timeout {self.timeout} " - test_command += f"python -m pytest -n {self.pytest_num_workers} " + " ".join(pytest_flags) test_command += " $(cat splitted_tests.txt)" if self.marker is not None: test_command += f" -m {self.marker}" @@ -227,43 +204,18 @@ def to_dict(self): # failure. 
test_command = f"({test_command}) || true" else: - test_command += " || true" + test_command = f"({test_command} | tee tests_output.txt)" steps.append({"run": {"name": "Run tests", "command": test_command}}) - # Deal with errors - check_test_command = f'if [ -s reports/{self.job_name}/errors.txt ]; ' - check_test_command += 'then echo "Some tests errored out!"; echo ""; ' - check_test_command += f'cat reports/{self.job_name}/errors.txt; ' - check_test_command += 'echo ""; echo ""; ' - - py_command = f'import os; fp = open("reports/{self.job_name}/summary_short.txt"); failed = os.linesep.join([x for x in fp.read().split(os.linesep) if x.startswith("ERROR ")]); fp.close(); fp = open("summary_short.txt", "w"); fp.write(failed); fp.close()' - check_test_command += f"$(python3 -c '{py_command}'); " - check_test_command += 'cat summary_short.txt; echo ""; exit -1; ' - - # Deeal with failed tests - check_test_command += f'elif [ -s reports/{self.job_name}/failures_short.txt ]; ' - check_test_command += 'then echo "Some tests failed!"; echo ""; ' - check_test_command += f'cat reports/{self.job_name}/failures_short.txt; ' - check_test_command += 'echo ""; echo ""; ' - - py_command = f'import os; fp = open("reports/{self.job_name}/summary_short.txt"); failed = os.linesep.join([x for x in fp.read().split(os.linesep) if x.startswith("FAILED ")]); fp.close(); fp = open("summary_short.txt", "w"); fp.write(failed); fp.close()' - check_test_command += f"$(python3 -c '{py_command}'); " - check_test_command += 'cat summary_short.txt; echo ""; exit -1; ' - - check_test_command += f'elif [ -s reports/{self.job_name}/stats.txt ]; then echo "All tests pass!"; ' - - # return code `124` means the previous (pytest run) step is timeout - if self.name == "pr_documentation_tests": - check_test_command += 'elif [ -f 124.txt ]; then echo "doctest timeout!"; ' - - check_test_command += 'else echo "other fatal error"; echo ""; exit -1; fi;' - - steps.append({"run": {"name": "Check test results", "command": check_test_command}}) + steps.append({"run": {"name": "Skipped tests", "when": "always", "command": f"python3 .circleci/parse_test_outputs.py --file tests_output.txt --skip"}}) + steps.append({"run": {"name": "Failed tests", "when": "always", "command": f"python3 .circleci/parse_test_outputs.py --file tests_output.txt --fail"}}) + steps.append({"run": {"name": "Errors", "when": "always", "command": f"python3 .circleci/parse_test_outputs.py --file tests_output.txt --errors"}}) steps.append({"store_test_results": {"path": "test-results"}}) + steps.append({"store_artifacts": {"path": "tests_output.txt"}}) + steps.append({"store_artifacts": {"path": "test-results/junit.xml"}}) + steps.append({"store_artifacts": {"path": "reports"}}) - steps.append({"store_artifacts": {"path": "~/transformers/tests_output.txt"}}) - steps.append({"store_artifacts": {"path": "~/transformers/reports"}}) job["steps"] = steps return job @@ -275,15 +227,9 @@ def job_name(self): # JOBS torch_and_tf_job = CircleCIJob( "torch_and_tf", + docker_image=[{"image":"huggingface/transformers-torch-tf-light"}], + install_steps=["uv venv && uv pip install ."], additional_env={"RUN_PT_TF_CROSS_TESTS": True}, - install_steps=[ - "sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng git-lfs cmake", - "git lfs install", - "pip install --upgrade --upgrade-strategy eager pip", - "pip install -U --upgrade-strategy eager .[sklearn,tf-cpu,torch,testing,sentencepiece,torch-speech,vision]", - "pip install -U --upgrade-strategy eager 
tensorflow_probability", - "pip install -U --upgrade-strategy eager -e git+https://github.com/huggingface/accelerate@main#egg=accelerate", - ], marker="is_pt_tf_cross_test", pytest_options={"rA": None, "durations": 0}, ) @@ -292,75 +238,61 @@ def job_name(self): torch_and_flax_job = CircleCIJob( "torch_and_flax", additional_env={"RUN_PT_FLAX_CROSS_TESTS": True}, - install_steps=[ - "sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng", - "pip install -U --upgrade-strategy eager --upgrade pip", - "pip install -U --upgrade-strategy eager .[sklearn,flax,torch,testing,sentencepiece,torch-speech,vision]", - "pip install -U --upgrade-strategy eager -e git+https://github.com/huggingface/accelerate@main#egg=accelerate", - ], + docker_image=[{"image":"huggingface/transformers-torch-jax-light"}], + install_steps=["uv venv && uv pip install ."], marker="is_pt_flax_cross_test", pytest_options={"rA": None, "durations": 0}, ) - torch_job = CircleCIJob( "torch", - install_steps=[ - "sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng time", - "pip install --upgrade --upgrade-strategy eager pip", - "pip install -U --upgrade-strategy eager .[sklearn,torch,testing,sentencepiece,torch-speech,vision,timm]", - "pip install -U --upgrade-strategy eager -e git+https://github.com/huggingface/accelerate@main#egg=accelerate", - ], - parallelism=1, - pytest_num_workers=6, + docker_image=[{"image": "huggingface/transformers-torch-light"}], + install_steps=["uv venv && uv pip install ."], + parallelism=6, + pytest_num_workers=4 +) + +tokenization_job = CircleCIJob( + "tokenization", + docker_image=[{"image": "huggingface/transformers-torch-light"}], + install_steps=["uv venv && uv pip install ."], + parallelism=6, + pytest_num_workers=4 ) tf_job = CircleCIJob( "tf", - install_steps=[ - "sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng cmake", - "pip install --upgrade --upgrade-strategy eager pip", - "pip install -U --upgrade-strategy eager .[sklearn,tf-cpu,testing,sentencepiece,tf-speech,vision]", - "pip install -U --upgrade-strategy eager tensorflow_probability", - ], - parallelism=1, + docker_image=[{"image":"huggingface/transformers-tf-light"}], + install_steps=["uv venv", "uv pip install -e."], + parallelism=6, + pytest_num_workers=4, ) flax_job = CircleCIJob( "flax", - install_steps=[ - "sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng", - "pip install --upgrade --upgrade-strategy eager pip", - "pip install -U --upgrade-strategy eager .[flax,testing,sentencepiece,flax-speech,vision]", - ], - parallelism=1, + docker_image=[{"image":"huggingface/transformers-jax-light"}], + install_steps=["uv venv && uv pip install ."], + parallelism=6, + pytest_num_workers=4 ) pipelines_torch_job = CircleCIJob( "pipelines_torch", additional_env={"RUN_PIPELINE_TESTS": True}, - install_steps=[ - "sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng", - "pip install --upgrade --upgrade-strategy eager pip", - "pip install -U --upgrade-strategy eager .[sklearn,torch,testing,sentencepiece,torch-speech,vision,timm,video]", - ], + docker_image=[{"image":"huggingface/transformers-torch-light"}], + install_steps=["uv venv && uv pip install ."], marker="is_pipeline_test", - pytest_num_workers=6, ) pipelines_tf_job = CircleCIJob( "pipelines_tf", additional_env={"RUN_PIPELINE_TESTS": True}, - install_steps=[ - "sudo apt-get -y update && sudo apt-get install -y cmake", - "pip install --upgrade --upgrade-strategy eager 
pip", - "pip install -U --upgrade-strategy eager .[sklearn,tf-cpu,testing,sentencepiece,vision]", - "pip install -U --upgrade-strategy eager tensorflow_probability", - ], + docker_image=[{"image":"huggingface/transformers-tf-light"}], + install_steps=["uv venv && uv pip install ."], marker="is_pipeline_test", ) @@ -368,22 +300,8 @@ def job_name(self): custom_tokenizers_job = CircleCIJob( "custom_tokenizers", additional_env={"RUN_CUSTOM_TOKENIZERS": True}, - install_steps=[ - "sudo apt-get -y update && sudo apt-get install -y cmake", - { - "name": "install jumanpp", - "command": - "wget https://github.com/ku-nlp/jumanpp/releases/download/v2.0.0-rc3/jumanpp-2.0.0-rc3.tar.xz\n" - "tar xvf jumanpp-2.0.0-rc3.tar.xz\n" - "mkdir jumanpp-2.0.0-rc3/bld\n" - "cd jumanpp-2.0.0-rc3/bld\n" - "sudo cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr/local\n" - "sudo make install\n", - }, - "pip install --upgrade --upgrade-strategy eager pip", - "pip install -U --upgrade-strategy eager .[ja,testing,sentencepiece,jieba,spacy,ftfy,rjieba]", - "python -m unidic download", - ], + docker_image=[{"image": "huggingface/transformers-custom-tokenizers"}], + install_steps=["uv venv","uv pip install -e ."], parallelism=None, resource_class=None, tests_to_run=[ @@ -398,13 +316,9 @@ def job_name(self): "examples_torch", additional_env={"OMP_NUM_THREADS": 8}, cache_name="torch_examples", - install_steps=[ - "sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng", - "pip install --upgrade --upgrade-strategy eager pip", - "pip install -U --upgrade-strategy eager .[sklearn,torch,sentencepiece,testing,torch-speech]", - "pip install -U --upgrade-strategy eager -r examples/pytorch/_tests_requirements.txt", - "pip install -U --upgrade-strategy eager -e git+https://github.com/huggingface/accelerate@main#egg=accelerate", - ], + docker_image=[{"image":"huggingface/transformers-examples-torch"}], + # TODO @ArthurZucker remove this once docker is easier to build + install_steps=["uv venv && uv pip install . && uv pip install -r examples/pytorch/_tests_requirements.txt"], pytest_num_workers=1, ) @@ -412,35 +326,20 @@ def job_name(self): examples_tensorflow_job = CircleCIJob( "examples_tensorflow", cache_name="tensorflow_examples", - install_steps=[ - "sudo apt-get -y update && sudo apt-get install -y cmake", - "pip install --upgrade --upgrade-strategy eager pip", - "pip install -U --upgrade-strategy eager .[sklearn,tensorflow,sentencepiece,testing]", - "pip install -U --upgrade-strategy eager -r examples/tensorflow/_tests_requirements.txt", - ], -) - - -examples_flax_job = CircleCIJob( - "examples_flax", - cache_name="flax_examples", - install_steps=[ - "pip install --upgrade --upgrade-strategy eager pip", - "pip install -U --upgrade-strategy eager .[flax,testing,sentencepiece]", - "pip install -U --upgrade-strategy eager -r examples/flax/_tests_requirements.txt", - ], + docker_image=[{"image":"huggingface/transformers-examples-tf"}], + install_steps=["uv venv && uv pip install . 
&& uv pip install -r examples/tensorflow/_tests_requirements.txt"], + parallelism=8 ) hub_job = CircleCIJob( "hub", additional_env={"HUGGINGFACE_CO_STAGING": True}, + docker_image=[{"image":"huggingface/transformers-torch-light"}], install_steps=[ - "sudo apt-get -y update && sudo apt-get install git-lfs", + "uv venv && uv pip install .", 'git config --global user.email "ci@dummy.com"', 'git config --global user.name "ci"', - "pip install --upgrade --upgrade-strategy eager pip", - "pip install -U --upgrade-strategy eager .[torch,sentencepiece,testing,vision]", ], marker="is_staging_test", pytest_num_workers=1, @@ -449,10 +348,11 @@ def job_name(self): onnx_job = CircleCIJob( "onnx", + docker_image=[{"image":"huggingface/transformers-torch-tf-light"}], install_steps=[ - "sudo apt-get -y update && sudo apt-get install -y cmake", - "pip install --upgrade --upgrade-strategy eager pip", - "pip install -U --upgrade-strategy eager .[torch,tf,testing,sentencepiece,onnxruntime,vision,rjieba]", + "uv venv && uv pip install .", + "uv pip install --upgrade eager pip", + "uv pip install .[torch,tf,testing,sentencepiece,onnxruntime,vision,rjieba]", ], pytest_options={"k onnx": None}, pytest_num_workers=1, @@ -461,37 +361,25 @@ def job_name(self): exotic_models_job = CircleCIJob( "exotic_models", - install_steps=[ - "sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev", - "pip install --upgrade --upgrade-strategy eager pip", - "pip install -U --upgrade-strategy eager .[torch,testing,vision]", - "pip install -U --upgrade-strategy eager torchvision", - "pip install -U --upgrade-strategy eager scipy", - "pip install -U --upgrade-strategy eager 'git+https://github.com/facebookresearch/detectron2.git'", - "sudo apt install tesseract-ocr", - "pip install -U --upgrade-strategy eager pytesseract", - "pip install -U --upgrade-strategy eager natten", - "pip install -U --upgrade-strategy eager python-Levenshtein", - "pip install -U --upgrade-strategy eager opencv-python", - "pip install -U --upgrade-strategy eager nltk", - ], + install_steps=["uv venv && uv pip install ."], + docker_image=[{"image":"huggingface/transformers-exotic-models"}], tests_to_run=[ "tests/models/*layoutlmv*", "tests/models/*nat", "tests/models/deta", + "tests/models/udop", "tests/models/nougat", ], - pytest_num_workers=1, + pytest_num_workers=12, + parallelism=4, pytest_options={"durations": 100}, ) repo_utils_job = CircleCIJob( "repo_utils", - install_steps=[ - "pip install --upgrade --upgrade-strategy eager pip", - "pip install -U --upgrade-strategy eager .[quality,testing,torch]", - ], + docker_image=[{"image":"huggingface/transformers-consistency"}], + install_steps=["uv venv && uv pip install ."], parallelism=None, pytest_num_workers=1, resource_class="large", @@ -507,16 +395,9 @@ def job_name(self): command = f'echo "{py_command}" > pr_documentation_tests_temp.txt' doc_test_job = CircleCIJob( "pr_documentation_tests", + docker_image=[{"image":"huggingface/transformers-consistency"}], additional_env={"TRANSFORMERS_VERBOSITY": "error", "DATASETS_VERBOSITY": "error", "SKIP_CUDA_DOCTEST": "1"}, install_steps=[ - "sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng time ffmpeg", - "pip install --upgrade --upgrade-strategy eager pip", - "pip install -U --upgrade-strategy eager -e .[dev]", - "pip install -U --upgrade-strategy eager -e git+https://github.com/huggingface/accelerate@main#egg=accelerate", - "pip install --upgrade --upgrade-strategy eager pytest pytest-sugar", - "pip install -U 
--upgrade-strategy eager natten", - "find -name __pycache__ -delete", - "find . -name \*.pyc -delete", # Add an empty file to keep the test step running correctly even no file is selected to be tested. "touch dummy.py", { @@ -550,11 +431,11 @@ def job_name(self): hub_job, onnx_job, exotic_models_job, + tokenization_job ] EXAMPLES_TESTS = [ examples_torch_job, examples_tensorflow_job, - examples_flax_job, ] PIPELINE_TESTS = [ pipelines_torch_job, diff --git a/.circleci/parse_test_outputs.py b/.circleci/parse_test_outputs.py new file mode 100644 index 00000000000000..b80ce8513a1f91 --- /dev/null +++ b/.circleci/parse_test_outputs.py @@ -0,0 +1,70 @@ +import re +import argparse + +def parse_pytest_output(file_path): + skipped_tests = {} + skipped_count = 0 + with open(file_path, 'r') as file: + for line in file: + match = re.match(r'^SKIPPED \[(\d+)\] (tests/.*): (.*)$', line) + if match: + skipped_count += 1 + test_file, test_line, reason = match.groups() + skipped_tests[reason] = skipped_tests.get(reason, []) + [(test_file, test_line)] + for k,v in sorted(skipped_tests.items(), key=lambda x:len(x[1])): + print(f"{len(v):4} skipped because: {k}") + print("Number of skipped tests:", skipped_count) + +def parse_pytest_failure_output(file_path): + failed_tests = {} + failed_count = 0 + with open(file_path, 'r') as file: + for line in file: + match = re.match(r'^FAILED (tests/.*) - (.*): (.*)$', line) + if match: + failed_count += 1 + _, error, reason = match.groups() + failed_tests[reason] = failed_tests.get(reason, []) + [error] + for k,v in sorted(failed_tests.items(), key=lambda x:len(x[1])): + print(f"{len(v):4} failed because `{v[0]}` -> {k}") + print("Number of failed tests:", failed_count) + if failed_count>0: + exit(1) + +def parse_pytest_errors_output(file_path): + print(file_path) + error_tests = {} + error_count = 0 + with open(file_path, 'r') as file: + for line in file: + match = re.match(r'^ERROR (tests/.*) - (.*): (.*)$', line) + if match: + error_count += 1 + _, test_error, reason = match.groups() + error_tests[reason] = error_tests.get(reason, []) + [test_error] + for k,v in sorted(error_tests.items(), key=lambda x:len(x[1])): + print(f"{len(v):4} errored out because of `{v[0]}` -> {k}") + print("Number of errors:", error_count) + if error_count>0: + exit(1) + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("--file", help="file to parse") + parser.add_argument("--skip", action="store_true", help="show skipped reasons") + parser.add_argument("--fail", action="store_true", help="show failed tests") + parser.add_argument("--errors", action="store_true", help="show failed tests") + args = parser.parse_args() + + if args.skip: + parse_pytest_output(args.file) + + if args.fail: + parse_pytest_failure_output(args.file) + + if args.errors: + parse_pytest_errors_output(args.file) + + +if __name__ == "__main__": + main() \ No newline at end of file diff --git a/.github/ISSUE_TEMPLATE/bug-report.yml b/.github/ISSUE_TEMPLATE/bug-report.yml index 1ec76462acfdff..7415ca71d46640 100644 --- a/.github/ISSUE_TEMPLATE/bug-report.yml +++ b/.github/ISSUE_TEMPLATE/bug-report.yml @@ -1,6 +1,17 @@ name: "\U0001F41B Bug Report" description: Submit a bug report to help us improve transformers +labels: [ "bug" ] body: + - type: markdown + attributes: + value: | + Thanks for taking the time to fill out this bug report! 
๐Ÿค— + + Before you submit your bug report: + + - If it is your first time submitting, be sure to check our [bug report guidelines](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#did-you-find-a-bug) + - Try our [docs bot](https://huggingface.co/spaces/huggingchat/hf-docs-chat) -- it might be able to help you with your issue + - type: textarea id: system-info attributes: @@ -17,50 +28,50 @@ body: description: | Your issue will be replied to more quickly if you can figure out the right person to tag with @ If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**. - + All issues are read by one of the core maintainers, so if you don't know who to tag, just leave this blank and a core maintainer will ping the right person. - + Please tag fewer than 3 people. - + Models: - - text models: @ArthurZucker and @younesbelkada + - text models: @ArthurZucker - vision models: @amyeroberts - speech models: @sanchit-gandhi - graph models: @clefourrier - + Library: - + - flax: @sanchit-gandhi - - generate: @gante + - generate: @zucchini-nlp (visual-language models) or @gante (all others) - pipelines: @Narsil - tensorflow: @gante and @Rocketknight1 - tokenizers: @ArthurZucker - - trainer: @muellerzr and @pacman100 - + - trainer: @muellerzr @SunMarc + Integrations: - - - deepspeed: HF Trainer/Accelerate: @pacman100 + + - deepspeed: HF Trainer/Accelerate: @muellerzr - ray/raytune: @richardliaw, @amogkam - Big Model Inference: @SunMarc - - quantization (bitsandbytes, autogpt): @SunMarc and @younesbelkada - - Documentation: @stevhliu and @MKhalusova - + - quantization (bitsandbytes, autogpt): @SunMarc + + Documentation: @stevhliu + Model hub: - for issues with a model, report at https://discuss.huggingface.co/ and tag the model's creator. - + HF projects: - + - accelerate: [different repo](https://github.com/huggingface/accelerate) - datasets: [different repo](https://github.com/huggingface/datasets) - diffusers: [different repo](https://github.com/huggingface/diffusers) - rust tokenizers: [different repo](https://github.com/huggingface/tokenizers) - + Maintained examples (not research project or legacy): - + - Flax: @sanchit-gandhi - PyTorch: See Models above and tag the person corresponding to the modality of the example. - TensorFlow: @Rocketknight1 @@ -101,11 +112,11 @@ body: placeholder: | Steps to reproduce the behavior: - + 1. 2. 3. - + - type: textarea id: expected-behavior diff --git a/.github/ISSUE_TEMPLATE/feature-request.yml b/.github/ISSUE_TEMPLATE/feature-request.yml index 318dc1f9b288c2..ff0d452a807f6e 100644 --- a/.github/ISSUE_TEMPLATE/feature-request.yml +++ b/.github/ISSUE_TEMPLATE/feature-request.yml @@ -1,6 +1,6 @@ name: "\U0001F680 Feature request" description: Submit a proposal/request for a new transformers feature -labels: [ "feature" ] +labels: [ "Feature request" ] body: - type: textarea id: feature-request @@ -19,7 +19,7 @@ body: label: Motivation description: | Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too. 
- + - type: textarea id: contribution diff --git a/.github/ISSUE_TEMPLATE/i18n.md b/.github/ISSUE_TEMPLATE/i18n.md index 52667f930508a6..5b91427d55b73c 100644 --- a/.github/ISSUE_TEMPLATE/i18n.md +++ b/.github/ISSUE_TEMPLATE/i18n.md @@ -34,7 +34,7 @@ Some notes: ## Tutorial section - [ ] [pipeline_tutorial.md](https://github.com/huggingface/transformers/blob/main/docs/source/en/pipeline_tutorial.md) -- [ ] [autoclass_tutorial.md](https://github.com/huggingface/transformers/blob/master/docs/source/autoclass_tutorial.md) +- [ ] [autoclass_tutorial.md](https://github.com/huggingface/transformers/blob/main/docs/source/en/autoclass_tutorial.md) - [ ] [preprocessing.md](https://github.com/huggingface/transformers/blob/main/docs/source/en/preprocessing.md) - [ ] [training.md](https://github.com/huggingface/transformers/blob/main/docs/source/en/training.md) - [ ] [accelerate.md](https://github.com/huggingface/transformers/blob/main/docs/source/en/accelerate.md) diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index d9e6b15f00fd25..cf638dc5925544 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -17,7 +17,7 @@ Fixes # (issue) ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). -- [ ] Did you read the [contributor guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests), +- [ ] Did you read the [contributor guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#create-a-pull-request), Pull Request section? - [ ] Was this discussed/approved via a Github issue or the [forum](https://discuss.huggingface.co/)? Please add a link to it if that's the case. @@ -39,7 +39,7 @@ members/contributors who may be interested in your PR. Models: -- text models: @ArthurZucker and @younesbelkada +- text models: @ArthurZucker - vision models: @amyeroberts - speech models: @sanchit-gandhi - graph models: @clefourrier @@ -47,20 +47,20 @@ Models: Library: - flax: @sanchit-gandhi -- generate: @gante +- generate: @zucchini-nlp (visual-language models) or @gante (all others) - pipelines: @Narsil - tensorflow: @gante and @Rocketknight1 - tokenizers: @ArthurZucker -- trainer: @muellerzr and @pacman100 +- trainer: @muellerzr and @SunMarc Integrations: -- deepspeed: HF Trainer/Accelerate: @pacman100 +- deepspeed: HF Trainer/Accelerate: @muellerzr - ray/raytune: @richardliaw, @amogkam - Big Model Inference: @SunMarc -- quantization (bitsandbytes, autogpt): @SunMarc and @younesbelkada +- quantization (bitsandbytes, autogpt): @SunMarc -Documentation: @stevhliu and @MKhalusova +Documentation: @stevhliu HF projects: diff --git a/.github/workflows/TROUBLESHOOT.md b/.github/workflows/TROUBLESHOOT.md index 616ba8e55bd208..f6101e6d70b59a 100644 --- a/.github/workflows/TROUBLESHOOT.md +++ b/.github/workflows/TROUBLESHOOT.md @@ -1,6 +1,6 @@ # Troubleshooting -This is a document explaining how to deal with various issues on github-actions self-hosted CI. The entries may include actually solutions or pointers to Issues that cover those. +This is a document explaining how to deal with various issues on github-actions self-hosted CI. The entries may include actual solutions or pointers to Issues that cover those. 
## GitHub Actions (self-hosted CI) diff --git a/.github/workflows/add-model-like.yml b/.github/workflows/add-model-like.yml index 8bdd66e4466d62..cd676831784406 100644 --- a/.github/workflows/add-model-like.yml +++ b/.github/workflows/add-model-like.yml @@ -16,14 +16,14 @@ jobs: name: "Add new model like template tests" runs-on: ubuntu-22.04 steps: - - uses: actions/checkout@v3 + - uses: actions/checkout@v4 - name: Install dependencies run: | sudo apt -y update && sudo apt install -y libsndfile1-dev - name: Load cached virtual environment - uses: actions/cache@v2 + uses: actions/cache@v4 id: cache with: path: ~/venv/ @@ -74,7 +74,7 @@ jobs: - name: Test suite reports artifacts if: ${{ always() }} - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: name: run_all_tests_new_models_test_reports path: reports/tests_new_models diff --git a/.github/workflows/benchmark.yml b/.github/workflows/benchmark.yml new file mode 100644 index 00000000000000..cb9a3d7b7974aa --- /dev/null +++ b/.github/workflows/benchmark.yml @@ -0,0 +1,42 @@ +name: Self-hosted runner (benchmark) + +on: + schedule: + - cron: "17 2 * * *" + workflow_call: + +env: + HF_HOME: /mnt/cache + TF_FORCE_GPU_ALLOW_GROWTH: true + + +jobs: + benchmark: + name: Benchmark + runs-on: [single-gpu, nvidia-gpu, a10, ci] + container: + image: huggingface/transformers-all-latest-gpu + options: --gpus all --privileged --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ + steps: + - name: Update clone + working-directory: /transformers + run: | + git fetch && git checkout ${{ github.sha }} + + - name: Reinstall transformers in edit mode (remove the one installed during docker image build) + working-directory: /transformers + run: python3 -m pip uninstall -y transformers && python3 -m pip install -e . 
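+      # Note: the GPU image above ships with a transformers release baked in at build
+      # time, so the editable reinstall ensures the benchmark steps below exercise the
+      # exact commit checked out for this run rather than the pre-installed version.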
+ + - name: Benchmark (daily) + if: github.event_name == 'schedule' + working-directory: /transformers + run: | + python3 -m pip install optimum-benchmark>=0.3.0 + HF_TOKEN=${{ secrets.TRANSFORMERS_BENCHMARK_TOKEN }} python3 benchmark/benchmark.py --repo_id hf-internal-testing/benchmark_results --path_in_repo $(date +'%Y-%m-%d') --config-dir benchmark/config --config-name generation --commit=${{ github.sha }} backend.model=google/gemma-2b backend.cache_implementation=null,static backend.torch_compile=false,true --multirun + + - name: Benchmark (merged to main event) + if: github.event_name == 'push' && github.ref_name == 'main' + working-directory: /transformers + run: | + python3 -m pip install optimum-benchmark>=0.3.0 + HF_TOKEN=${{ secrets.TRANSFORMERS_BENCHMARK_TOKEN }} python3 benchmark/benchmark.py --repo_id hf-internal-testing/benchmark_results_merge_event --path_in_repo $(date +'%Y-%m-%d') --config-dir benchmark/config --config-name generation --commit=${{ github.sha }} backend.model=google/gemma-2b backend.cache_implementation=null,static backend.torch_compile=false,true --multirun diff --git a/.github/workflows/build-ci-docker-images.yml b/.github/workflows/build-ci-docker-images.yml new file mode 100644 index 00000000000000..9d947684ee867e --- /dev/null +++ b/.github/workflows/build-ci-docker-images.yml @@ -0,0 +1,77 @@ +name: Build pr ci-docker + +on: + push: + branches: + - push-ci-image # for now let's only build on this branch + repository_dispatch: + workflow_call: + inputs: + image_postfix: + required: true + type: string + schedule: + - cron: "6 0 * * *" + + +concurrency: + group: ${{ github.workflow }} + cancel-in-progress: true + +jobs: + build: + runs-on: ubuntu-22.04 + + if: ${{ contains(github.event.head_commit.message, '[build-ci-image]') || contains(github.event.head_commit.message, '[push-ci-image]') && '!cancelled()' || github.event_name == 'schedule' }} + + strategy: + matrix: + file: ["quality", "consistency", "custom-tokenizers", "torch-light", "tf-light", "exotic-models", "torch-tf-light", "torch-jax-light", "jax-light", "examples-torch", "examples-tf"] + continue-on-error: true + + steps: + - + name: Set tag + run: | + if ${{contains(github.event.head_commit.message, '[build-ci-image]')}}; then + echo "TAG=huggingface/transformers-${{ matrix.file }}:dev" >> "$GITHUB_ENV" + echo "setting it to DEV!" 
+ else + echo "TAG=huggingface/transformers-${{ matrix.file }}" >> "$GITHUB_ENV" + + fi + - + name: Set up Docker Buildx + uses: docker/setup-buildx-action@v3 + - + name: Check out code + uses: actions/checkout@v4 + - + name: Login to DockerHub + uses: docker/login-action@v3 + with: + username: ${{ secrets.DOCKERHUB_USERNAME }} + password: ${{ secrets.DOCKERHUB_PASSWORD }} + - + name: Build ${{ matrix.file }}.dockerfile + uses: docker/build-push-action@v5 + with: + context: ./docker + build-args: | + REF=${{ github.sha }} + file: "./docker/${{ matrix.file }}.dockerfile" + push: ${{ contains(github.event.head_commit.message, 'ci-image]') || github.event_name == 'schedule' }} + tags: ${{ env.TAG }} + + notify: + runs-on: ubuntu-22.04 + if: ${{ contains(github.event.head_commit.message, '[build-ci-image]') || contains(github.event.head_commit.message, '[push-ci-image]') && '!cancelled()' || github.event_name == 'schedule' }} + steps: + - name: Post to Slack + if: ${{ contains(github.event.head_commit.message, '[push-ci-image]') && github.event_name != 'schedule' }} + uses: huggingface/hf-workflows/.github/actions/post-slack@main + with: + slack_channel: "#transformers-ci-circleci-images" + title: ๐Ÿค— New docker images for CircleCI are pushed. + status: ${{ job.status }} + slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }} diff --git a/.github/workflows/build-docker-images.yml b/.github/workflows/build-docker-images.yml index be070a95d3a94f..df772db773e262 100644 --- a/.github/workflows/build-docker-images.yml +++ b/.github/workflows/build-docker-images.yml @@ -20,24 +20,14 @@ concurrency: jobs: latest-docker: name: "Latest PyTorch + TensorFlow [dev]" - runs-on: ubuntu-22.04 + runs-on: [intel-cpu, 8-cpu, ci] steps: - - name: Cleanup disk - run: | - sudo ls -l /usr/local/lib/ - sudo ls -l /usr/share/ - sudo du -sh /usr/local/lib/ - sudo du -sh /usr/share/ - sudo rm -rf /usr/local/lib/android - sudo rm -rf /usr/share/dotnet - sudo du -sh /usr/local/lib/ - sudo du -sh /usr/share/ - name: Set up Docker Buildx uses: docker/setup-buildx-action@v3 - name: Check out code - uses: actions/checkout@v3 + uses: actions/checkout@v4 - name: Login to DockerHub uses: docker/login-action@v3 @@ -67,26 +57,25 @@ jobs: push: true tags: huggingface/transformers-all-latest-gpu-push-ci + - name: Post to Slack + if: always() + uses: huggingface/hf-workflows/.github/actions/post-slack@main + with: + slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }} + title: ๐Ÿค— Results of the transformers-all-latest-gpu-push-ci docker build + status: ${{ job.status }} + slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }} + latest-torch-deepspeed-docker: name: "Latest PyTorch + DeepSpeed" - runs-on: ubuntu-22.04 + runs-on: [intel-cpu, 8-cpu, ci] steps: - - name: Cleanup disk - run: | - sudo ls -l /usr/local/lib/ - sudo ls -l /usr/share/ - sudo du -sh /usr/local/lib/ - sudo du -sh /usr/share/ - sudo rm -rf /usr/local/lib/android - sudo rm -rf /usr/share/dotnet - sudo du -sh /usr/local/lib/ - sudo du -sh /usr/share/ - name: Set up Docker Buildx uses: docker/setup-buildx-action@v3 - name: Check out code - uses: actions/checkout@v3 + uses: actions/checkout@v4 - name: Login to DockerHub uses: docker/login-action@v3 @@ -103,27 +92,26 @@ jobs: push: true tags: huggingface/transformers-pytorch-deepspeed-latest-gpu${{ inputs.image_postfix }} + - name: Post to Slack + if: always() + uses: huggingface/hf-workflows/.github/actions/post-slack@main + with: + slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER}} + title: ๐Ÿค— Results of 
the transformers-pytorch-deepspeed-latest-gpu docker build + status: ${{ job.status }} + slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }} + # Can't build 2 images in a single job `latest-torch-deepspeed-docker` (for `nvcr.io/nvidia`) latest-torch-deepspeed-docker-for-push-ci-daily-build: name: "Latest PyTorch + DeepSpeed (Push CI - Daily Build)" - runs-on: ubuntu-22.04 + runs-on: [intel-cpu, 8-cpu, ci] steps: - - name: Cleanup disk - run: | - sudo ls -l /usr/local/lib/ - sudo ls -l /usr/share/ - sudo du -sh /usr/local/lib/ - sudo du -sh /usr/share/ - sudo rm -rf /usr/local/lib/android - sudo rm -rf /usr/share/dotnet - sudo du -sh /usr/local/lib/ - sudo du -sh /usr/share/ - name: Set up Docker Buildx uses: docker/setup-buildx-action@v3 - name: Check out code - uses: actions/checkout@v3 + uses: actions/checkout@v4 - name: Login to DockerHub uses: docker/login-action@v3 @@ -144,18 +132,27 @@ jobs: push: true tags: huggingface/transformers-pytorch-deepspeed-latest-gpu-push-ci + - name: Post to Slack + if: always() + uses: huggingface/hf-workflows/.github/actions/post-slack@main + with: + slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }} + title: ๐Ÿค— Results of the transformers-pytorch-deepspeed-latest-gpu-push-ci docker build + status: ${{ job.status }} + slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }} + doc-builder: name: "Doc builder" # Push CI doesn't need this image if: inputs.image_postfix != '-push-ci' - runs-on: ubuntu-22.04 + runs-on: [intel-cpu, 8-cpu, ci] steps: - name: Set up Docker Buildx uses: docker/setup-buildx-action@v3 - name: Check out code - uses: actions/checkout@v3 + uses: actions/checkout@v4 - name: Login to DockerHub uses: docker/login-action@v3 @@ -170,28 +167,27 @@ jobs: push: true tags: huggingface/transformers-doc-builder + - name: Post to Slack + if: always() + uses: huggingface/hf-workflows/.github/actions/post-slack@main + with: + slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }} + title: ๐Ÿค— Results of the huggingface/transformers-doc-builder docker build + status: ${{ job.status }} + slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }} + latest-pytorch: name: "Latest PyTorch [dev]" # Push CI doesn't need this image if: inputs.image_postfix != '-push-ci' - runs-on: ubuntu-22.04 + runs-on: [intel-cpu, 8-cpu, ci] steps: - - name: Cleanup disk - run: | - sudo ls -l /usr/local/lib/ - sudo ls -l /usr/share/ - sudo du -sh /usr/local/lib/ - sudo du -sh /usr/share/ - sudo rm -rf /usr/local/lib/android - sudo rm -rf /usr/share/dotnet - sudo du -sh /usr/local/lib/ - sudo du -sh /usr/share/ - name: Set up Docker Buildx uses: docker/setup-buildx-action@v3 - name: Check out code - uses: actions/checkout@v3 + uses: actions/checkout@v4 - name: Login to DockerHub uses: docker/login-action@v3 @@ -208,54 +204,75 @@ jobs: push: true tags: huggingface/transformers-pytorch-gpu -# Need to be fixed with the help from Guillaume. 
-# latest-pytorch-amd: -# name: "Latest PyTorch (AMD) [dev]" -# runs-on: [self-hosted, docker-gpu, amd-gpu, single-gpu, mi210] -# steps: -# - name: Set up Docker Buildx -# uses: docker/setup-buildx-action@v3 -# - name: Check out code -# uses: actions/checkout@v3 -# - name: Login to DockerHub -# uses: docker/login-action@v3 -# with: -# username: ${{ secrets.DOCKERHUB_USERNAME }} -# password: ${{ secrets.DOCKERHUB_PASSWORD }} -# - name: Build and push -# uses: docker/build-push-action@v5 -# with: -# context: ./docker/transformers-pytorch-amd-gpu -# build-args: | -# REF=main -# push: true -# tags: huggingface/transformers-pytorch-amd-gpu${{ inputs.image_postfix }} -# # Push CI images still need to be re-built daily -# - -# name: Build and push (for Push CI) in a daily basis -# # This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`. -# # The later case is useful for manual image building for debugging purpose. Use another tag in this case! -# if: inputs.image_postfix != '-push-ci' -# uses: docker/build-push-action@v5 -# with: -# context: ./docker/transformers-pytorch-amd-gpu -# build-args: | -# REF=main -# push: true -# tags: huggingface/transformers-pytorch-amd-gpu-push-ci + - name: Post to Slack + if: always() + uses: huggingface/hf-workflows/.github/actions/post-slack@main + with: + slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }} + title: ๐Ÿค— Results of the huggingface/transformers-pytorch-gpudocker build + status: ${{ job.status }} + slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }} + + latest-pytorch-amd: + name: "Latest PyTorch (AMD) [dev]" + runs-on: [intel-cpu, 8-cpu, ci] + steps: + - + name: Set up Docker Buildx + uses: docker/setup-buildx-action@v3 + - + name: Check out code + uses: actions/checkout@v4 + - + name: Login to DockerHub + uses: docker/login-action@v3 + with: + username: ${{ secrets.DOCKERHUB_USERNAME }} + password: ${{ secrets.DOCKERHUB_PASSWORD }} + - + name: Build and push + uses: docker/build-push-action@v5 + with: + context: ./docker/transformers-pytorch-amd-gpu + build-args: | + REF=main + push: true + tags: huggingface/transformers-pytorch-amd-gpu${{ inputs.image_postfix }} + # Push CI images still need to be re-built daily + - + name: Build and push (for Push CI) in a daily basis + # This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`. + # The later case is useful for manual image building for debugging purpose. Use another tag in this case! 
+ if: inputs.image_postfix != '-push-ci' + uses: docker/build-push-action@v5 + with: + context: ./docker/transformers-pytorch-amd-gpu + build-args: | + REF=main + push: true + tags: huggingface/transformers-pytorch-amd-gpu-push-ci + + - name: Post to Slack + if: always() + uses: huggingface/hf-workflows/.github/actions/post-slack@main + with: + slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }} + title: ๐Ÿค— Results of the huggingface/transformers-pytorch-amd-gpu-push-ci build + status: ${{ job.status }} + slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }} latest-tensorflow: name: "Latest TensorFlow [dev]" # Push CI doesn't need this image if: inputs.image_postfix != '-push-ci' - runs-on: ubuntu-22.04 + runs-on: [intel-cpu, 8-cpu, ci] steps: - name: Set up Docker Buildx uses: docker/setup-buildx-action@v3 - name: Check out code - uses: actions/checkout@v3 + uses: actions/checkout@v4 - name: Login to DockerHub uses: docker/login-action@v3 @@ -272,38 +289,96 @@ jobs: push: true tags: huggingface/transformers-tensorflow-gpu - # latest-pytorch-deepspeed-amd: - # name: "PyTorch + DeepSpeed (AMD) [dev]" + - name: Post to Slack + if: always() + uses: huggingface/hf-workflows/.github/actions/post-slack@main + with: + slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }} + title: ๐Ÿค— Results of the huggingface/transformers-tensorflow-gpu build + status: ${{ job.status }} + slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }} + + latest-pytorch-deepspeed-amd: + name: "PyTorch + DeepSpeed (AMD) [dev]" + runs-on: [intel-cpu, 8-cpu, ci] + steps: + - + name: Set up Docker Buildx + uses: docker/setup-buildx-action@v3 + - + name: Check out code + uses: actions/checkout@v4 + - + name: Login to DockerHub + uses: docker/login-action@v3 + with: + username: ${{ secrets.DOCKERHUB_USERNAME }} + password: ${{ secrets.DOCKERHUB_PASSWORD }} + - + name: Build and push + uses: docker/build-push-action@v5 + with: + context: ./docker/transformers-pytorch-deepspeed-amd-gpu + build-args: | + REF=main + push: true + tags: huggingface/transformers-pytorch-deepspeed-amd-gpu${{ inputs.image_postfix }} + # Push CI images still need to be re-built daily + - + name: Build and push (for Push CI) in a daily basis + # This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`. + # The later case is useful for manual image building for debugging purpose. Use another tag in this case! 
+ if: inputs.image_postfix != '-push-ci' + uses: docker/build-push-action@v5 + with: + context: ./docker/transformers-pytorch-deepspeed-amd-gpu + build-args: | + REF=main + push: true + tags: huggingface/transformers-pytorch-deepspeed-amd-gpu-push-ci + + - name: Post to Slack + if: always() + uses: huggingface/hf-workflows/.github/actions/post-slack@main + with: + slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }} + title: ๐Ÿค— Results of the transformers-pytorch-deepspeed-amd-gpu build + status: ${{ job.status }} + slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }} - # runs-on: [self-hosted, docker-gpu, amd-gpu, single-gpu, mi210] - # steps: - # - name: Set up Docker Buildx - # uses: docker/setup-buildx-action@v3 - # - name: Check out code - # uses: actions/checkout@v3 - # - name: Login to DockerHub - # uses: docker/login-action@v3 - # with: - # username: ${{ secrets.DOCKERHUB_USERNAME }} - # password: ${{ secrets.DOCKERHUB_PASSWORD }} - # - name: Build and push - # uses: docker/build-push-action@v5 - # with: - # context: ./docker/transformers-pytorch-deepspeed-amd-gpu - # build-args: | - # REF=main - # push: true - # tags: huggingface/transformers-pytorch-deepspeed-amd-gpu${{ inputs.image_postfix }} - # # Push CI images still need to be re-built daily - # - - # name: Build and push (for Push CI) in a daily basis - # # This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`. - # # The later case is useful for manual image building for debugging purpose. Use another tag in this case! - # if: inputs.image_postfix != '-push-ci' - # uses: docker/build-push-action@v5 - # with: - # context: ./docker/transformers-pytorch-deepspeed-amd-gpu - # build-args: | - # REF=main - # push: true - # tags: huggingface/transformers-pytorch-deepspeed-amd-gpu-push-ci + latest-quantization-torch-docker: + name: "Latest Pytorch + Quantization [dev]" + # Push CI doesn't need this image + if: inputs.image_postfix != '-push-ci' + runs-on: [intel-cpu, 8-cpu, ci] + steps: + - + name: Set up Docker Buildx + uses: docker/setup-buildx-action@v3 + - + name: Check out code + uses: actions/checkout@v4 + - + name: Login to DockerHub + uses: docker/login-action@v3 + with: + username: ${{ secrets.DOCKERHUB_USERNAME }} + password: ${{ secrets.DOCKERHUB_PASSWORD }} + - + name: Build and push + uses: docker/build-push-action@v5 + with: + context: ./docker/transformers-quantization-latest-gpu + build-args: | + REF=main + push: true + tags: huggingface/transformers-quantization-latest-gpu${{ inputs.image_postfix }} + + - name: Post to Slack + if: always() + uses: huggingface/hf-workflows/.github/actions/post-slack@main + with: + slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }} + title: ๐Ÿค— Results of the transformers-quantization-latest-gpu build + status: ${{ job.status }} + slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }} diff --git a/.github/workflows/build-nightly-ci-docker-images.yml b/.github/workflows/build-nightly-ci-docker-images.yml index 63bc7daa743425..0b1b7df5f8a2ed 100644 --- a/.github/workflows/build-nightly-ci-docker-images.yml +++ b/.github/workflows/build-nightly-ci-docker-images.yml @@ -13,24 +13,14 @@ concurrency: jobs: latest-with-torch-nightly-docker: name: "Nightly PyTorch + Stable TensorFlow" - runs-on: ubuntu-22.04 + runs-on: [intel-cpu, 8-cpu, ci] steps: - - name: Cleanup disk - run: | - sudo ls -l /usr/local/lib/ - sudo ls -l /usr/share/ - sudo du -sh /usr/local/lib/ - sudo du -sh /usr/share/ - sudo rm -rf /usr/local/lib/android - 
sudo rm -rf /usr/share/dotnet - sudo du -sh /usr/local/lib/ - sudo du -sh /usr/share/ - name: Set up Docker Buildx uses: docker/setup-buildx-action@v2 - name: Check out code - uses: actions/checkout@v3 + uses: actions/checkout@v4 - name: Login to DockerHub uses: docker/login-action@v2 @@ -50,24 +40,14 @@ jobs: nightly-torch-deepspeed-docker: name: "Nightly PyTorch + DeepSpeed" - runs-on: ubuntu-22.04 + runs-on: [intel-cpu, 8-cpu, ci] steps: - - name: Cleanup disk - run: | - sudo ls -l /usr/local/lib/ - sudo ls -l /usr/share/ - sudo du -sh /usr/local/lib/ - sudo du -sh /usr/share/ - sudo rm -rf /usr/local/lib/android - sudo rm -rf /usr/share/dotnet - sudo du -sh /usr/local/lib/ - sudo du -sh /usr/share/ - name: Set up Docker Buildx uses: docker/setup-buildx-action@v2 - name: Check out code - uses: actions/checkout@v3 + uses: actions/checkout@v4 - name: Login to DockerHub uses: docker/login-action@v2 diff --git a/.github/workflows/build-past-ci-docker-images.yml b/.github/workflows/build-past-ci-docker-images.yml index 21028568c963eb..6ee60b8a6b60f2 100644 --- a/.github/workflows/build-past-ci-docker-images.yml +++ b/.github/workflows/build-past-ci-docker-images.yml @@ -15,15 +15,15 @@ jobs: strategy: fail-fast: false matrix: - version: ["1.13", "1.12", "1.11", "1.10"] - runs-on: ubuntu-22.04 + version: ["1.13", "1.12", "1.11"] + runs-on: [intel-cpu, 8-cpu, ci] steps: - name: Set up Docker Buildx uses: docker/setup-buildx-action@v2 - name: Check out code - uses: actions/checkout@v3 + uses: actions/checkout@v4 - id: get-base-image name: Get Base Image @@ -60,14 +60,14 @@ jobs: fail-fast: false matrix: version: ["2.11", "2.10", "2.9", "2.8", "2.7", "2.6", "2.5"] - runs-on: ubuntu-22.04 + runs-on: [intel-cpu, 8-cpu, ci] steps: - name: Set up Docker Buildx uses: docker/setup-buildx-action@v2 - name: Check out code - uses: actions/checkout@v3 + uses: actions/checkout@v4 - id: get-base-image name: Get Base Image diff --git a/.github/workflows/build_documentation.yml b/.github/workflows/build_documentation.yml index 99f0f15230a017..e3e3b5f2df37f1 100644 --- a/.github/workflows/build_documentation.yml +++ b/.github/workflows/build_documentation.yml @@ -16,6 +16,7 @@ jobs: package: transformers notebook_folder: transformers_doc languages: de en es fr hi it ko pt tr zh ja te + custom_container: huggingface/transformers-doc-builder secrets: token: ${{ secrets.HUGGINGFACE_PUSH }} hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }} diff --git a/.github/workflows/build_pr_documentation.yml b/.github/workflows/build_pr_documentation.yml index f6fa4c8d537cc6..c8d073ea34688f 100644 --- a/.github/workflows/build_pr_documentation.yml +++ b/.github/workflows/build_pr_documentation.yml @@ -15,3 +15,4 @@ jobs: pr_number: ${{ github.event.number }} package: transformers languages: de en es fr hi it ko pt tr zh ja te + custom_container: huggingface/transformers-doc-builder diff --git a/.github/workflows/check_tiny_models.yml b/.github/workflows/check_tiny_models.yml index 898e441a4234c6..a2b4846051a054 100644 --- a/.github/workflows/check_tiny_models.yml +++ b/.github/workflows/check_tiny_models.yml @@ -17,13 +17,13 @@ jobs: runs-on: ubuntu-22.04 steps: - name: Checkout transformers - uses: actions/checkout@v3 + uses: actions/checkout@v4 with: fetch-depth: 2 - - uses: actions/checkout@v3 + - uses: actions/checkout@v4 - name: Set up Python 3.8 - uses: actions/setup-python@v4 + uses: actions/setup-python@v5 with: # Semantic version range syntax or exact version of a Python version python-version: '3.8' @@ -36,7 +36,7 @@ jobs: 
pip install --upgrade pip python -m pip install -U .[sklearn,torch,testing,sentencepiece,torch-speech,vision,timm,video,tf-cpu] pip install tensorflow_probability - python -m pip install -U natten + python -m pip install -U 'natten<0.15.0' - name: Create all tiny models (locally) run: | @@ -44,7 +44,7 @@ jobs: - name: Local tiny model reports artifacts if: ${{ always() }} - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: name: tiny_local_model_creation_reports path: tiny_local_models/reports @@ -56,13 +56,13 @@ jobs: - name: Test suite reports artifacts if: ${{ always() }} - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: name: tiny_local_model_creation_reports path: reports/tests_pipelines - name: Create + Upload tiny models for new model architecture(s) - run: | + run: | python utils/update_tiny_models.py --num_workers 2 - name: Full report @@ -76,7 +76,7 @@ jobs: - name: New tiny model creation reports artifacts if: ${{ always() }} - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: name: tiny_model_creation_reports path: tiny_models/reports diff --git a/.github/workflows/doctest_job.yml b/.github/workflows/doctest_job.yml new file mode 100644 index 00000000000000..98be985292e3e0 --- /dev/null +++ b/.github/workflows/doctest_job.yml @@ -0,0 +1,82 @@ +name: Doctest job + +on: + workflow_call: + inputs: + job_splits: + required: true + type: string + split_keys: + required: true + type: string + +env: + HF_HOME: /mnt/cache + TRANSFORMERS_IS_CI: yes + RUN_SLOW: yes + OMP_NUM_THREADS: 16 + MKL_NUM_THREADS: 16 + SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }} + TF_FORCE_GPU_ALLOW_GROWTH: true + +jobs: + run_doctests: + name: " " + strategy: + max-parallel: 8 # 8 jobs at a time + fail-fast: false + matrix: + split_keys: ${{ fromJson(inputs.split_keys) }} + runs-on: [single-gpu, nvidia-gpu, t4, ci] + container: + image: huggingface/transformers-all-latest-gpu + options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ + steps: + - name: Update clone + working-directory: /transformers + run: git fetch && git checkout ${{ github.sha }} + + - name: Reinstall transformers in edit mode (remove the one installed during docker image build) + working-directory: /transformers + run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .[flax] + + - name: GPU visibility + working-directory: /transformers + run: | + python3 utils/print_env.py + + - name: Show installed libraries and their versions + run: pip freeze + + - name: Get doctest files + working-directory: /transformers + run: | + echo "${{ toJson(fromJson(inputs.job_splits)[matrix.split_keys]) }}" > doc_tests.txt + cat doc_tests.txt + + - name: Set `split_keys` + shell: bash + run: | + echo "${{ matrix.split_keys }}" + split_keys=${{ matrix.split_keys }} + split_keys=${split_keys//'/'/'_'} + echo "split_keys" + echo "split_keys=$split_keys" >> $GITHUB_ENV + + - name: Run doctests + working-directory: /transformers + run: | + cat doc_tests.txt + python3 -m pytest -v --make-reports doc_tests_gpu_${{ env.split_keys }} --doctest-modules $(cat doc_tests.txt) -sv --doctest-continue-on-failure --doctest-glob="*.md" + + - name: Failure short reports + if: ${{ failure() }} + continue-on-error: true + run: cat /transformers/reports/doc_tests_gpu_${{ env.split_keys }}/failures_short.txt + + - name: "Test suite reports artifacts: doc_tests_gpu_test_reports_${{ env.split_keys }}" + if: ${{ always() }} + uses: 
actions/upload-artifact@v4 + with: + name: doc_tests_gpu_test_reports_${{ env.split_keys }} + path: /transformers/reports/doc_tests_gpu_${{ env.split_keys }} diff --git a/.github/workflows/doctests.yml b/.github/workflows/doctests.yml index 0384144ceac741..4b515c741a3a72 100644 --- a/.github/workflows/doctests.yml +++ b/.github/workflows/doctests.yml @@ -3,81 +3,86 @@ name: Doctests on: push: branches: - - doctest* + - run_doctest* repository_dispatch: schedule: - cron: "17 2 * * *" - env: - HF_HOME: /mnt/cache - TRANSFORMERS_IS_CI: yes - RUN_SLOW: yes - OMP_NUM_THREADS: 16 - MKL_NUM_THREADS: 16 - SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }} - TF_FORCE_GPU_ALLOW_GROWTH: true + NUM_SLICES: 3 jobs: - run_doctests: + setup: + name: Setup runs-on: [single-gpu, nvidia-gpu, t4, ci] container: image: huggingface/transformers-all-latest-gpu options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ + outputs: + job_splits: ${{ steps.set-matrix.outputs.job_splits }} + split_keys: ${{ steps.set-matrix.outputs.split_keys }} steps: - - name: uninstall transformers (installed during docker image build) - run: python3 -m pip uninstall -y transformers - - - uses: actions/checkout@v3 - - name: NVIDIA-SMI + - name: Update clone + working-directory: /transformers run: | - nvidia-smi - - - name: Install transformers in edit mode - run: python3 -m pip install -e .[flax] + git fetch && git checkout ${{ github.sha }} - - name: GPU visibility - run: | - python3 utils/print_env.py + - name: Reinstall transformers in edit mode (remove the one installed during docker image build) + working-directory: /transformers + run: python3 -m pip uninstall -y transformers && python3 -m pip install -e . - name: Show installed libraries and their versions + working-directory: /transformers run: pip freeze - - name: Get doctest files + - name: Check values for matrix + working-directory: /transformers run: | - $(python3 -c 'from utils.tests_fetcher import get_all_doctest_files; to_test = get_all_doctest_files(); to_test = " ".join(to_test); fp = open("doc_tests.txt", "w"); fp.write(to_test); fp.close()') + python3 utils/split_doctest_jobs.py + python3 utils/split_doctest_jobs.py --only_return_keys --num_splits ${{ env.NUM_SLICES }} - - name: Run doctests + - id: set-matrix + working-directory: /transformers + name: Set values for matrix run: | - python3 -m pytest -v --make-reports doc_tests_gpu --doctest-modules $(cat doc_tests.txt) -sv --doctest-continue-on-failure --doctest-glob="*.md" - - - name: Failure short reports - if: ${{ failure() }} - continue-on-error: true - run: cat reports/doc_tests_gpu/failures_short.txt - - - name: Test suite reports artifacts - if: ${{ always() }} - uses: actions/upload-artifact@v3 - with: - name: doc_tests_gpu_test_reports - path: reports/doc_tests_gpu + echo "job_splits=$(python3 utils/split_doctest_jobs.py)" >> $GITHUB_OUTPUT + echo "split_keys=$(python3 utils/split_doctest_jobs.py --only_return_keys --num_splits ${{ env.NUM_SLICES }})" >> $GITHUB_OUTPUT + call_doctest_job: + name: "Call doctest jobs" + needs: setup + strategy: + max-parallel: 1 # 1 split at a time (in `doctest_job.yml`, we set `8` to run 8 jobs at the same time) + fail-fast: false + matrix: + split_keys: ${{ fromJson(needs.setup.outputs.split_keys) }} + uses: ./.github/workflows/doctest_job.yml + with: + job_splits: ${{ needs.setup.outputs.job_splits }} + split_keys: ${{ toJson(matrix.split_keys) }} + secrets: inherit send_results: name: Send results to webhook runs-on: ubuntu-22.04 if: 
always() - needs: [run_doctests] + needs: [call_doctest_job] steps: - - uses: actions/checkout@v3 - - uses: actions/download-artifact@v3 + - uses: actions/checkout@v4 + - uses: actions/download-artifact@v4 - name: Send message to Slack env: CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }} - CI_SLACK_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY_DOCS }} - CI_SLACK_CHANNEL_ID_DAILY: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY_DOCS }} - CI_SLACK_CHANNEL_DUMMY_TESTS: ${{ secrets.CI_SLACK_CHANNEL_DUMMY_TESTS }} + ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }} + # Use `CI_SLACK_CHANNEL_DUMMY_TESTS` when doing experimentation + SLACK_REPORT_CHANNEL: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY_DOCS }} run: | pip install slack_sdk python utils/notification_service_doc_tests.py + + - name: "Upload results" + if: ${{ always() }} + uses: actions/upload-artifact@v4 + with: + name: doc_test_results + path: doc_test_results \ No newline at end of file diff --git a/.github/workflows/model-templates.yml b/.github/workflows/model-templates.yml deleted file mode 100644 index eb77d9dcbe1e64..00000000000000 --- a/.github/workflows/model-templates.yml +++ /dev/null @@ -1,81 +0,0 @@ -name: Model templates runner - -on: - repository_dispatch: - schedule: - - cron: "0 2 * * *" - -jobs: - run_tests_templates: - runs-on: ubuntu-22.04 - steps: - - name: Checkout repository - uses: actions/checkout@v3 - - - name: Install dependencies - run: | - sudo apt -y update && sudo apt install -y libsndfile1-dev - - - name: Load cached virtual environment - uses: actions/cache@v2 - id: cache - with: - path: ~/venv/ - key: v4-tests_templates-${{ hashFiles('setup.py') }} - - - name: Create virtual environment on cache miss - if: steps.cache.outputs.cache-hit != 'true' - run: | - python -m venv ~/venv && . ~/venv/bin/activate - pip install --upgrade pip!=21.3 - pip install -e .[dev] - - - name: Check transformers location - # make `transformers` available as package (required since we use `-e` flag) and check it's indeed from the repo. - run: | - . ~/venv/bin/activate - python setup.py develop - transformer_loc=$(pip show transformers | grep "Location: " | cut -c11-) - transformer_repo_loc=$(pwd .) - if [ "$transformer_loc" != "$transformer_repo_loc/src" ]; then - echo "transformers is from $transformer_loc but it shoud be from $transformer_repo_loc/src." - echo "A fix is required. Stop testing." - exit 1 - fi - - - name: Create model files - run: | - . 
~/venv/bin/activate - transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/encoder-bert-tokenizer.json --path=templates/adding_a_new_model - transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/pt-encoder-bert-tokenizer.json --path=templates/adding_a_new_model - transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/standalone.json --path=templates/adding_a_new_model - transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/tf-encoder-bert-tokenizer.json --path=templates/adding_a_new_model - transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/tf-seq-2-seq-bart-tokenizer.json --path=templates/adding_a_new_model - transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/pt-seq-2-seq-bart-tokenizer.json --path=templates/adding_a_new_model - transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/flax-encoder-bert-tokenizer.json --path=templates/adding_a_new_model - transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/flax-seq-2-seq-bart-tokenizer.json --path=templates/adding_a_new_model - make style - python utils/check_table.py --fix_and_overwrite - python utils/check_dummies.py --fix_and_overwrite - python utils/check_copies.py --fix_and_overwrite - - - name: Run all non-slow tests - run: | - . ~/venv/bin/activate - python -m pytest -n 2 --dist=loadfile -s --make-reports=tests_templates tests/*template* - - - name: Run style changes - run: | - . ~/venv/bin/activate - make style && make quality && make repo-consistency - - - name: Failure short reports - if: ${{ always() }} - run: cat reports/tests_templates/failures_short.txt - - - name: Test suite reports artifacts - if: ${{ always() }} - uses: actions/upload-artifact@v3 - with: - name: run_all_tests_templates_test_reports - path: reports/tests_templates diff --git a/.github/workflows/model_jobs.yml b/.github/workflows/model_jobs.yml new file mode 100644 index 00000000000000..454d03f4245681 --- /dev/null +++ b/.github/workflows/model_jobs.yml @@ -0,0 +1,121 @@ +name: model jobs + +on: + workflow_call: + inputs: + folder_slices: + required: true + type: string + machine_type: + required: true + type: string + slice_id: + required: true + type: number + runner: + required: true + type: string + docker: + required: true + type: string + +env: + HF_HOME: /mnt/cache + TRANSFORMERS_IS_CI: yes + OMP_NUM_THREADS: 8 + MKL_NUM_THREADS: 8 + RUN_SLOW: yes + # For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access. + # This token is created under the bot `hf-transformers-bot`. 
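The comment above refers to the read token exported just below as `HF_HUB_READ_TOKEN`: it is created under the `hf-transformers-bot` account, which is expected to have accepted the gating terms on the Hub, so slow tests inside the container can pull gated checkpoints. A hedged sketch of how test code might pick it up (the model id is only an example):

```python
import os

from transformers import AutoTokenizer

# CI exports HF_HUB_READ_TOKEN for gated repositories; locally this falls back
# to whatever token (if any) is already cached by `huggingface-cli login`.
token = os.environ.get("HF_HUB_READ_TOKEN")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B", token=token)
print(tokenizer("gated checkpoint reachable from CI").input_ids)
```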
+ HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }} + SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }} + TF_FORCE_GPU_ALLOW_GROWTH: true + RUN_PT_TF_CROSS_TESTS: 1 + CUDA_VISIBLE_DEVICES: 0,1 + +jobs: + run_models_gpu: + name: " " + strategy: + max-parallel: 8 + fail-fast: false + matrix: + folders: ${{ fromJson(inputs.folder_slices)[inputs.slice_id] }} + runs-on: ['${{ inputs.machine_type }}', nvidia-gpu, t4, '${{ inputs.runner }}'] + container: + image: ${{ inputs.docker }} + options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ + steps: + - name: Echo input and matrix info + shell: bash + run: | + echo "${{ inputs.folder_slices }}" + echo "${{ matrix.folders }}" + echo "${{ toJson(fromJson(inputs.folder_slices)[inputs.slice_id]) }}" + + - name: Echo folder ${{ matrix.folders }} + shell: bash + # For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to + # set the artifact folder names (because the character `/` is not allowed). + run: | + echo "${{ matrix.folders }}" + matrix_folders=${{ matrix.folders }} + matrix_folders=${matrix_folders/'models/'/'models_'} + echo "$matrix_folders" + echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV + + - name: Update clone + working-directory: /transformers + run: git fetch && git checkout ${{ github.sha }} + + - name: Reinstall transformers in edit mode (remove the one installed during docker image build) + working-directory: /transformers + run: python3 -m pip uninstall -y transformers && python3 -m pip install -e . + + - name: Update / Install some packages (for Past CI) + if: ${{ contains(inputs.docker, '-past-') }} + working-directory: /transformers + run: | + python3 -m pip install -U datasets + + - name: Update / Install some packages (for Past CI) + if: ${{ contains(inputs.docker, '-past-') && contains(inputs.docker, '-pytorch-') }} + working-directory: /transformers + run: | + python3 -m pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate + + - name: NVIDIA-SMI + run: | + nvidia-smi + + - name: Environment + working-directory: /transformers + run: | + python3 utils/print_env.py + + - name: Show installed libraries and their versions + working-directory: /transformers + run: pip freeze + + - name: Run all tests on GPU + working-directory: /transformers + run: python3 -m pytest -rsfE -v --make-reports=${{ inputs.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }} + + - name: Failure short reports + if: ${{ failure() }} + continue-on-error: true + run: cat /transformers/reports/${{ inputs.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports/failures_short.txt + + - name: Run test + shell: bash + run: | + mkdir -p /transformers/reports/${{ inputs.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports + echo "hello" > /transformers/reports/${{ inputs.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports/hello.txt + echo "${{ inputs.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports" + + - name: "Test suite reports artifacts: ${{ inputs.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports" + if: ${{ always() }} + uses: actions/upload-artifact@v4 + with: + name: ${{ inputs.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports + path: /transformers/reports/${{ inputs.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports diff --git a/.github/workflows/push-important-models.yml 
b/.github/workflows/push-important-models.yml new file mode 100644 index 00000000000000..41bcd43fcc6fc2 --- /dev/null +++ b/.github/workflows/push-important-models.yml @@ -0,0 +1,142 @@ +name: Slow tests on important models (on Push - A10) + +on: + push: + branches: [ main ] + +env: + OUTPUT_SLACK_CHANNEL_ID: "C06L2SGMEEA" + HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }} + HF_HOME: /mnt/cache + TRANSFORMERS_IS_CI: yes + OMP_NUM_THREADS: 8 + MKL_NUM_THREADS: 8 + RUN_SLOW: yes # For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access. # This token is created under the bot `hf-transformers-bot`. + SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }} + TF_FORCE_GPU_ALLOW_GROWTH: true + RUN_PT_TF_CROSS_TESTS: 1 + +jobs: + get_modified_models: + name: "Get all modified files" + runs-on: ubuntu-latest + outputs: + matrix: ${{ steps.set-matrix.outputs.matrix }} + steps: + - name: Check out code + uses: actions/checkout@v4 + + - name: Get changed files + id: changed-files + uses: tj-actions/changed-files@3f54ebb830831fc121d3263c1857cfbdc310cdb9 #v42 + with: + files: src/transformers/models/** + + - name: Run step if only the files listed above change + if: steps.changed-files.outputs.any_changed == 'true' + id: set-matrix + env: + ALL_CHANGED_FILES: ${{ steps.changed-files.outputs.all_changed_files }} + run: | + model_arrays=() + for file in $ALL_CHANGED_FILES; do + model_path="${file#*models/}" + model_path="models/${model_path%%/*}" + if grep -qFx "$model_path" utils/important_models.txt; then + # Append the file to the matrix string + model_arrays+=("$model_path") + fi + done + matrix_string=$(printf '"%s", ' "${model_arrays[@]}" | sed 's/, $//') + echo "matrix=[$matrix_string]" >> $GITHUB_OUTPUT + test_modified_files: + needs: get_modified_models + name: Slow & FA2 tests + runs-on: [single-gpu, nvidia-gpu, a10, ci] + container: + image: huggingface/transformers-all-latest-gpu + options: --gpus all --privileged --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ + if: ${{ needs.get_modified_models.outputs.matrix != '[]' && needs.get_modified_models.outputs.matrix != '' && fromJson(needs.get_modified_models.outputs.matrix)[0] != null }} + strategy: + fail-fast: false + matrix: + model-name: ${{ fromJson(needs.get_modified_models.outputs.matrix) }} + + steps: + - name: Check out code + uses: actions/checkout@v4 + + - name: Install locally transformers & other libs + run: | + apt install sudo + sudo -H pip install --upgrade pip + sudo -H pip uninstall -y transformers + sudo -H pip install -U -e ".[testing]" + MAX_JOBS=4 pip install flash-attn --no-build-isolation + pip install bitsandbytes + + - name: NVIDIA-SMI + run: | + nvidia-smi + + - name: Show installed libraries and their versions + run: pip freeze + + - name: Run FA2 tests + id: run_fa2_tests + run: + pytest -rsfE -m "flash_attn_test" --make-reports=${{ matrix.model-name }}_fa2_tests/ tests/${{ matrix.model-name }}/test_modeling_* + + - name: "Test suite reports artifacts: ${{ matrix.model-name }}_fa2_tests" + if: ${{ always() }} + uses: actions/upload-artifact@v4 + with: + name: ${{ matrix.model-name }}_fa2_tests + path: /transformers/reports/${{ matrix.model-name }}_fa2_tests + + - name: Post to Slack + if: always() + uses: huggingface/hf-workflows/.github/actions/post-slack@main + with: + slack_channel: ${{ env.OUTPUT_SLACK_CHANNEL_ID }} + title: ๐Ÿค— Results of the FA2 tests - ${{ matrix.model-name }} + status: ${{ steps.run_fa2_tests.conclusion}} + slack_token: ${{ 
secrets.CI_SLACK_BOT_TOKEN }} + + - name: Run integration tests + id: run_integration_tests + if: always() + run: + pytest -rsfE -k "IntegrationTest" --make-reports=tests_integration_${{ matrix.model-name }} tests/${{ matrix.model-name }}/test_modeling_* + + - name: "Test suite reports artifacts: tests_integration_${{ matrix.model-name }}" + if: ${{ always() }} + uses: actions/upload-artifact@v4 + with: + name: tests_integration_${{ matrix.model-name }} + path: /transformers/reports/tests_integration_${{ matrix.model-name }} + + - name: Post to Slack + if: always() + uses: huggingface/hf-workflows/.github/actions/post-slack@main + with: + slack_channel: ${{ env.OUTPUT_SLACK_CHANNEL_ID }} + title: ๐Ÿค— Results of the Integration tests - ${{ matrix.model-name }} + status: ${{ steps.run_integration_tests.conclusion}} + slack_token: ${{ secrets.CI_SLACK_BOT_TOKEN }} + + - name: Tailscale # In order to be able to SSH when a test fails + if: ${{ runner.debug == '1'}} + uses: huggingface/tailscale-action@v1 + with: + authkey: ${{ secrets.TAILSCALE_SSH_AUTHKEY }} + slackChannel: ${{ secrets.SLACK_CIFEEDBACK_CHANNEL }} + slackToken: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }} + waitForSSH: true + + benchmark: + name: Benchmark workflow + needs: get_modified_models + if: ${{ needs.get_modified_models.outputs.matrix != '[]' && needs.get_modified_models.outputs.matrix != '' && fromJson(needs.get_modified_models.outputs.matrix)[0] != null }} + uses: ./.github/workflows/benchmark.yml + secrets: inherit diff --git a/.github/workflows/release-conda.yml b/.github/workflows/release-conda.yml index 7a1990eec6b3d7..c0e28d7a510d7f 100644 --- a/.github/workflows/release-conda.yml +++ b/.github/workflows/release-conda.yml @@ -19,7 +19,7 @@ jobs: steps: - name: Checkout repository - uses: actions/checkout@v1 + uses: actions/checkout@v4 - name: Install miniconda uses: conda-incubator/setup-miniconda@v2 diff --git a/.github/workflows/self-nightly-caller.yml b/.github/workflows/self-nightly-caller.yml new file mode 100644 index 00000000000000..5538e2d56e7490 --- /dev/null +++ b/.github/workflows/self-nightly-caller.yml @@ -0,0 +1,43 @@ +name: Self-hosted runner (nightly-ci) + + +on: + repository_dispatch: + schedule: + - cron: "17 2 * * *" + push: + branches: + - run_nightly_ci* + +jobs: + build_nightly_ci_images: + name: Build Nightly CI Docker Images + if: (github.event_name == 'schedule') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_nightly_ci')) + uses: ./.github/workflows/build-nightly-ci-docker-images.yml + secrets: inherit + + model-ci: + name: Model CI + needs: [build_nightly_ci_images] + uses: ./.github/workflows/self-scheduled.yml + with: + job: run_models_gpu + slack_report_channel: "#transformers-ci-past-future" + runner: ci + docker: huggingface/transformers-all-latest-torch-nightly-gpu + ci_event: Nightly CI + secrets: inherit + + deepspeed-ci: + name: DeepSpeed CI + needs: [build_nightly_ci_images] + uses: ./.github/workflows/self-scheduled.yml + with: + job: run_torch_cuda_extensions_gpu + slack_report_channel: "#transformers-ci-past-future" + runner: ci + # test deepspeed nightly build with the latest release torch + docker: huggingface/transformers-pytorch-deepspeed-latest-gpu + ci_event: Nightly CI + working-directory-prefix: /workspace + secrets: inherit diff --git a/.github/workflows/self-nightly-past-ci-caller.yml b/.github/workflows/self-nightly-past-ci-caller.yml index dfc258e5be856a..142399a6366ce6 100644 --- a/.github/workflows/self-nightly-past-ci-caller.yml +++ 
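The `get_modified_models` job in the new push-important-models workflow above turns the changed files under `src/transformers/models/**` into a JSON matrix, keeping only the entries listed in `utils/important_models.txt`. The same filtering, sketched in Python rather than bash (file names and the important-models set are illustrative):

```python
import json


def build_model_matrix(changed_files, important_models):
    """Map a changed file such as 'src/transformers/models/llama/modeling_llama.py'
    to 'models/llama' and keep only the models flagged as important."""
    matrix = []
    for path in changed_files:
        if "models/" not in path:
            continue
        model = "models/" + path.split("models/", 1)[1].split("/", 1)[0]
        if model in important_models and model not in matrix:
            matrix.append(model)
    return matrix


changed = [
    "src/transformers/models/llama/modeling_llama.py",
    "src/transformers/models/bert/tokenization_bert.py",
]
important = {"models/llama", "models/gemma", "models/whisper"}
print(json.dumps(build_model_matrix(changed, important)))  # ["models/llama"]
```

The downstream `test_modified_files` and `benchmark` jobs only fire when this matrix is non-empty, which is what the `fromJson(...)[0] != null` guard checks.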
b/.github/workflows/self-nightly-past-ci-caller.yml @@ -2,32 +2,30 @@ name: Self-hosted runner (nightly-past-ci-caller) on: schedule: - # 2:17 am on each Sunday and Thursday - - - cron: "17 2 * * 0,4" + - cron: "17 2,14 * * *" push: branches: - - run_nightly_ci* - run_past_ci* jobs: - build_nightly_ci_images: - name: Build Nightly CI Docker Images - if: (github.event_name == 'schedule') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_nightly_ci')) - uses: ./.github/workflows/build-nightly-ci-docker-images.yml - secrets: inherit - - run_nightly_ci: - name: Nightly CI - needs: [build_nightly_ci_images] - uses: ./.github/workflows/self-nightly-scheduled.yml - secrets: inherit + get_number: + name: Get number + runs-on: ubuntu-22.04 + outputs: + run_number: ${{ steps.get_number.outputs.run_number }} + steps: + - name: Get number + id: get_number + run: | + echo "${{ github.run_number }}" + echo "$(python3 -c 'print(int(${{ github.run_number }}) % 10)')" + echo "run_number=$(python3 -c 'print(int(${{ github.run_number }}) % 10)')" >> $GITHUB_OUTPUT run_past_ci_pytorch_1-13: name: PyTorch 1.13 - if: (cancelled() != true) && ((github.event_name == 'schedule') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))) - needs: [run_nightly_ci] - uses: ./.github/workflows/self-past.yml + needs: get_number + if: needs.get_number.outputs.run_number == 0 && (cancelled() != true) && ((github.event_name == 'schedule') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))) + uses: ./.github/workflows/self-past-caller.yml with: framework: pytorch version: "1.13" @@ -36,9 +34,9 @@ jobs: run_past_ci_pytorch_1-12: name: PyTorch 1.12 - if: (cancelled() != true) && ((github.event_name == 'schedule') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))) - needs: [run_past_ci_pytorch_1-13] - uses: ./.github/workflows/self-past.yml + needs: get_number + if: needs.get_number.outputs.run_number == 1 && (cancelled() != true) && ((github.event_name == 'schedule') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))) + uses: ./.github/workflows/self-past-caller.yml with: framework: pytorch version: "1.12" @@ -47,31 +45,20 @@ jobs: run_past_ci_pytorch_1-11: name: PyTorch 1.11 - if: (cancelled() != true) && ((github.event_name == 'schedule') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))) - needs: [run_past_ci_pytorch_1-12] - uses: ./.github/workflows/self-past.yml + needs: get_number + if: needs.get_number.outputs.run_number == 2 && (cancelled() != true) && ((github.event_name == 'schedule') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))) + uses: ./.github/workflows/self-past-caller.yml with: framework: pytorch version: "1.11" sha: ${{ github.sha }} secrets: inherit - run_past_ci_pytorch_1-10: - name: PyTorch 1.10 - if: (cancelled() != true) && ((github.event_name == 'schedule') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))) - needs: [run_past_ci_pytorch_1-11] - uses: ./.github/workflows/self-past.yml - with: - framework: pytorch - version: "1.10" - sha: ${{ github.sha }} - secrets: inherit - run_past_ci_tensorflow_2-11: name: TensorFlow 2.11 - if: (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci')) - needs: [run_past_ci_pytorch_1-10] - uses: ./.github/workflows/self-past.yml + needs: get_number + if: needs.get_number.outputs.run_number == 3 
&& (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci')) + uses: ./.github/workflows/self-past-caller.yml with: framework: tensorflow version: "2.11" @@ -80,9 +67,9 @@ jobs: run_past_ci_tensorflow_2-10: name: TensorFlow 2.10 - if: (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci')) - needs: [run_past_ci_tensorflow_2-11] - uses: ./.github/workflows/self-past.yml + needs: get_number + if: needs.get_number.outputs.run_number == 4 && (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci')) + uses: ./.github/workflows/self-past-caller.yml with: framework: tensorflow version: "2.10" @@ -91,9 +78,9 @@ jobs: run_past_ci_tensorflow_2-9: name: TensorFlow 2.9 - if: (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci')) - needs: [run_past_ci_tensorflow_2-10] - uses: ./.github/workflows/self-past.yml + needs: get_number + if: needs.get_number.outputs.run_number == 5 && (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci')) + uses: ./.github/workflows/self-past-caller.yml with: framework: tensorflow version: "2.9" @@ -102,9 +89,9 @@ jobs: run_past_ci_tensorflow_2-8: name: TensorFlow 2.8 - if: (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci')) - needs: [run_past_ci_tensorflow_2-9] - uses: ./.github/workflows/self-past.yml + needs: get_number + if: needs.get_number.outputs.run_number == 6 && (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci')) + uses: ./.github/workflows/self-past-caller.yml with: framework: tensorflow version: "2.8" @@ -113,9 +100,9 @@ jobs: run_past_ci_tensorflow_2-7: name: TensorFlow 2.7 - if: (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci')) - needs: [run_past_ci_tensorflow_2-8] - uses: ./.github/workflows/self-past.yml + needs: get_number + if: needs.get_number.outputs.run_number == 7 && (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci')) + uses: ./.github/workflows/self-past-caller.yml with: framework: tensorflow version: "2.7" @@ -124,9 +111,9 @@ jobs: run_past_ci_tensorflow_2-6: name: TensorFlow 2.6 - if: (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci')) - needs: [run_past_ci_tensorflow_2-7] - uses: ./.github/workflows/self-past.yml + needs: get_number + if: needs.get_number.outputs.run_number == 8 && (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci')) + uses: ./.github/workflows/self-past-caller.yml with: framework: tensorflow version: "2.6" @@ -135,9 +122,9 @@ jobs: run_past_ci_tensorflow_2-5: name: TensorFlow 2.5 - if: (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci')) - needs: [run_past_ci_tensorflow_2-6] - uses: ./.github/workflows/self-past.yml + needs: get_number + if: needs.get_number.outputs.run_number == 9 && (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci')) + uses: ./.github/workflows/self-past-caller.yml with: framework: tensorflow version: "2.5" diff --git a/.github/workflows/self-nightly-scheduled.yml b/.github/workflows/self-nightly-scheduled.yml deleted file mode 100644 index 37dc98f340a16d..00000000000000 --- 
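The past-CI caller above replaces the old chain of `needs:` dependencies with a rotation: a `get_number` job computes `github.run_number % 10`, and each framework/version job only runs when its slot comes up, so at most one configuration is exercised per workflow run. A sketch of the same rotation table, taken from the `if:` conditions above:

```python
# Slot -> configuration, mirroring the `run_number == N` guards in the caller.
PAST_CI_SLOTS = [
    ("pytorch", "1.13"),
    ("pytorch", "1.12"),
    ("pytorch", "1.11"),
    ("tensorflow", "2.11"),
    ("tensorflow", "2.10"),
    ("tensorflow", "2.9"),
    ("tensorflow", "2.8"),
    ("tensorflow", "2.7"),
    ("tensorflow", "2.6"),
    ("tensorflow", "2.5"),
]


def slot_for_run(run_number: int) -> tuple:
    """Pick the framework/version a given workflow run would test,
    as the `get_number` job does with `int(github.run_number) % 10`."""
    return PAST_CI_SLOTS[run_number % 10]


for run in range(140, 144):
    print(run, slot_for_run(run))
```

Note that the TensorFlow slots additionally require a push to a `run_past_ci*` branch, so scheduled runs only ever hit the PyTorch slots.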
a/.github/workflows/self-nightly-scheduled.yml +++ /dev/null @@ -1,289 +0,0 @@ -name: Self-hosted runner (nightly-ci) - -# Note that each job's dependencies go into a corresponding docker file. -# -# For example for `run_all_tests_torch_cuda_extensions_gpu` the docker image is -# `huggingface/transformers-pytorch-deepspeed-latest-gpu`, which can be found at -# `docker/transformers-pytorch-deepspeed-latest-gpu/Dockerfile` - -on: - repository_dispatch: - workflow_call: - -env: - HF_HOME: /mnt/cache - TRANSFORMERS_IS_CI: yes - OMP_NUM_THREADS: 8 - MKL_NUM_THREADS: 8 - RUN_SLOW: yes - SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }} - TF_FORCE_GPU_ALLOW_GROWTH: true - RUN_PT_TF_CROSS_TESTS: 1 - CUDA_VISIBLE_DEVICES: 0,1 - -jobs: - setup: - name: Setup - strategy: - matrix: - machine_type: [single-gpu, multi-gpu] - runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, past-ci] - container: - image: huggingface/transformers-all-latest-torch-nightly-gpu - options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ - outputs: - matrix: ${{ steps.set-matrix.outputs.matrix }} - steps: - - name: Update clone - working-directory: /transformers - run: | - git fetch && git checkout ${{ github.sha }} - - - name: Cleanup - working-directory: /transformers - run: | - rm -rf tests/__pycache__ - rm -rf tests/models/__pycache__ - rm -rf reports - - - name: Show installed libraries and their versions - working-directory: /transformers - run: pip freeze - - - id: set-matrix - name: Identify models to test - working-directory: /transformers/tests - run: | - echo "matrix=$(python3 -c 'import os; tests = os.getcwd(); model_tests = os.listdir(os.path.join(tests, "models")); d1 = sorted(list(filter(os.path.isdir, os.listdir(tests)))); d2 = sorted(list(filter(os.path.isdir, [f"models/{x}" for x in model_tests]))); d1.remove("models"); d = d2 + d1; print(d)')" >> $GITHUB_OUTPUT - - - name: NVIDIA-SMI - run: | - nvidia-smi - - run_tests_single_gpu: - name: Model tests - strategy: - fail-fast: false - matrix: - folders: ${{ fromJson(needs.setup.outputs.matrix) }} - machine_type: [single-gpu] - runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, past-ci] - container: - image: huggingface/transformers-all-latest-torch-nightly-gpu - options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ - needs: setup - steps: - - name: Echo folder ${{ matrix.folders }} - shell: bash - # For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to - # set the artifact folder names (because the character `/` is not allowed). - run: | - echo "${{ matrix.folders }}" - matrix_folders=${{ matrix.folders }} - matrix_folders=${matrix_folders/'models/'/'models_'} - echo "$matrix_folders" - echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV - - - name: Update clone - working-directory: /transformers - run: git fetch && git checkout ${{ github.sha }} - - - name: Reinstall transformers in edit mode (remove the one installed during docker image build) - working-directory: /transformers - run: python3 -m pip uninstall -y transformers && python3 -m pip install -e . 
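The `set-matrix` step of the workflow being removed here builds the folder matrix with a dense Python one-liner: every `tests/models/<name>` directory first, then the remaining top-level test directories with `models` itself dropped. Unrolled for readability (a sketch, not a drop-in replacement for the one-liner):

```python
import os


def collect_test_folders(tests_dir="tests"):
    """List 'models/<name>' test folders first, then the other top-level test
    folders, matching what the inline `python3 -c ...` matrix builder prints."""
    model_dirs = sorted(
        f"models/{name}"
        for name in os.listdir(os.path.join(tests_dir, "models"))
        if os.path.isdir(os.path.join(tests_dir, "models", name))
    )
    top_level = sorted(
        name
        for name in os.listdir(tests_dir)
        if os.path.isdir(os.path.join(tests_dir, name)) and name != "models"
    )
    return model_dirs + top_level


if __name__ == "__main__":
    print(collect_test_folders())
```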
- - - name: NVIDIA-SMI - run: | - nvidia-smi - - - name: Environment - working-directory: /transformers - run: | - python3 utils/print_env.py - - - name: Show installed libraries and their versions - working-directory: /transformers - run: pip freeze - - - name: Run all tests on GPU - working-directory: /transformers - run: python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} tests/${{ matrix.folders }} - - - name: Failure short reports - if: ${{ failure() }} - continue-on-error: true - run: cat /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt - - - name: Test suite reports artifacts - if: ${{ always() }} - uses: actions/upload-artifact@v3 - with: - name: ${{ matrix.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports_postfix_nightly - path: /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} - - run_tests_multi_gpu: - name: Model tests - strategy: - fail-fast: false - matrix: - folders: ${{ fromJson(needs.setup.outputs.matrix) }} - machine_type: [multi-gpu] - runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, past-ci] - container: - image: huggingface/transformers-all-latest-torch-nightly-gpu - options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ - needs: setup - steps: - - name: Echo folder ${{ matrix.folders }} - shell: bash - # For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to - # set the artifact folder names (because the character `/` is not allowed). - run: | - echo "${{ matrix.folders }}" - matrix_folders=${{ matrix.folders }} - matrix_folders=${matrix_folders/'models/'/'models_'} - echo "$matrix_folders" - echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV - - - name: Update clone - working-directory: /transformers - run: git fetch && git checkout ${{ github.sha }} - - - name: Reinstall transformers in edit mode (remove the one installed during docker image build) - working-directory: /transformers - run: python3 -m pip uninstall -y transformers && python3 -m pip install -e . 
- - - name: NVIDIA-SMI - run: | - nvidia-smi - - - name: Environment - working-directory: /transformers - run: | - python3 utils/print_env.py - - - name: Show installed libraries and their versions - working-directory: /transformers - run: pip freeze - - - name: Run all tests on GPU - working-directory: /transformers - run: python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} tests/${{ matrix.folders }} - - - name: Failure short reports - if: ${{ failure() }} - continue-on-error: true - run: cat /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt - - - name: Test suite reports artifacts - if: ${{ always() }} - uses: actions/upload-artifact@v3 - with: - name: ${{ matrix.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports_postfix_nightly - path: /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} - - run_all_tests_torch_cuda_extensions_gpu: - name: Torch CUDA extension tests - strategy: - fail-fast: false - matrix: - machine_type: [single-gpu, multi-gpu] - runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, past-ci] - needs: setup - container: - image: huggingface/transformers-pytorch-deepspeed-nightly-gpu - options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ - steps: - - name: Update clone - working-directory: /workspace/transformers - run: git fetch && git checkout ${{ github.sha }} - - - name: Reinstall transformers in edit mode (remove the one installed during docker image build) - working-directory: /workspace/transformers - run: python3 -m pip uninstall -y transformers && python3 -m pip install -e . - - - name: Remove cached torch extensions - run: rm -rf /github/home/.cache/torch_extensions/ - - # To avoid unknown test failures - - name: Pre build DeepSpeed *again* - working-directory: /workspace - run: | - python3 -m pip uninstall -y deepspeed - rm -rf DeepSpeed - git clone https://github.com/microsoft/DeepSpeed && cd DeepSpeed && rm -rf build - DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install . 
--global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check - - - name: NVIDIA-SMI - run: | - nvidia-smi - - - name: Environment - working-directory: /workspace/transformers - run: | - python utils/print_env.py - - - name: Show installed libraries and their versions - working-directory: /workspace/transformers - run: pip freeze - - - name: Run all tests on GPU - working-directory: /workspace/transformers - run: | - python -m pytest -v --make-reports=${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu tests/deepspeed tests/extended - - - name: Failure short reports - if: ${{ failure() }} - continue-on-error: true - run: cat /workspace/transformers/reports/${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu/failures_short.txt - - - name: Test suite reports artifacts - if: ${{ always() }} - uses: actions/upload-artifact@v3 - with: - name: ${{ matrix.machine_type }}_run_tests_torch_cuda_extensions_gpu_test_reports_postfix_nightly - path: /workspace/transformers/reports/${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu - - send_results: - name: Send results to webhook - runs-on: ubuntu-22.04 - if: always() - needs: [ - setup, - run_tests_single_gpu, - run_tests_multi_gpu, - run_all_tests_torch_cuda_extensions_gpu - ] - steps: - - name: Preliminary job status - shell: bash - # For the meaning of these environment variables, see the job `Setup` - run: | - echo "Setup status: ${{ needs.setup.result }}" - - - uses: actions/checkout@v3 - - uses: actions/download-artifact@v3 - - name: Send message to Slack - env: - CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }} - CI_SLACK_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }} - CI_SLACK_CHANNEL_ID_DAILY: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY }} - CI_SLACK_CHANNEL_DUMMY_TESTS: ${{ secrets.CI_SLACK_CHANNEL_DUMMY_TESTS }} - CI_SLACK_REPORT_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID_PAST_FUTURE }} - ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }} - CI_EVENT: Nightly CI - SETUP_STATUS: ${{ needs.setup.result }} - # We pass `needs.setup.outputs.matrix` as the argument. A processing in `notification_service.py` to change - # `models/bert` to `models_bert` is required, as the artifact names use `_` instead of `/`. 
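As in the other scheduled workflows, the `send_results` job here installs `slack_sdk` and hands the collected reports to a notification script together with the setup matrix. A stripped-down sketch of the posting step only; the channel variable names match the env block above, but the summary text is a placeholder, not what `notification_service.py` actually builds:

```python
import os

from slack_sdk import WebClient

# Token and channel come from the workflow env (CI_SLACK_BOT_TOKEN,
# CI_SLACK_REPORT_CHANNEL_ID); the message text below is a placeholder.
client = WebClient(token=os.environ["CI_SLACK_BOT_TOKEN"])
client.chat_postMessage(
    channel=os.environ["CI_SLACK_REPORT_CHANNEL_ID"],
    text="Nightly CI finished - see the uploaded report artifacts for details.",
)
```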
- run: | - pip install slack_sdk - pip show slack_sdk - python utils/notification_service.py "${{ needs.setup.outputs.matrix }}" - - - # delete-artifact - - uses: geekyeggo/delete-artifact@v2 - with: - name: | - single-* - multi-* diff --git a/.github/workflows/self-past-caller.yml b/.github/workflows/self-past-caller.yml new file mode 100644 index 00000000000000..1929a01c34d947 --- /dev/null +++ b/.github/workflows/self-past-caller.yml @@ -0,0 +1,40 @@ +name: Self-hosted runner (past-ci) + + +on: + workflow_call: + inputs: + framework: + required: true + type: string + version: + required: true + type: string + # Use this to control the commit to test against + sha: + default: 'main' + required: false + type: string + +jobs: + model-ci: + name: Model CI + uses: ./.github/workflows/self-scheduled.yml + with: + job: run_models_gpu + slack_report_channel: "#transformers-ci-past-future" + runner: past-ci + docker: huggingface/transformers-${{ inputs.framework }}-past-${{ inputs.version }}-gpu + ci_event: Past CI - ${{ inputs.framework }}-${{ inputs.version }} + secrets: inherit + + deepspeed-ci: + name: DeepSpeed CI + uses: ./.github/workflows/self-scheduled.yml + with: + job: run_torch_cuda_extensions_gpu + slack_report_channel: "#transformers-ci-past-future" + runner: past-ci + docker: huggingface/transformers-${{ inputs.framework }}-past-${{ inputs.version }}-gpu + ci_event: Past CI - ${{ inputs.framework }}-${{ inputs.version }} + secrets: inherit diff --git a/.github/workflows/self-past.yml b/.github/workflows/self-past.yml deleted file mode 100644 index ed60c92f6745a8..00000000000000 --- a/.github/workflows/self-past.yml +++ /dev/null @@ -1,356 +0,0 @@ -name: Self-hosted runner (past-ci) - -# Note that each job's dependencies go into a corresponding docker file. 
-# -# For example for `run_all_tests_torch_cuda_extensions_gpu` the docker image is -# `huggingface/transformers-pytorch-deepspeed-latest-gpu`, which can be found at -# `docker/transformers-pytorch-deepspeed-latest-gpu/Dockerfile` - -on: - workflow_call: - inputs: - framework: - required: true - type: string - version: - required: true - type: string - # Use this to control the commit to test against - sha: - default: 'main' - required: false - type: string - -env: - HF_HOME: /mnt/cache - TRANSFORMERS_IS_CI: yes - OMP_NUM_THREADS: 8 - MKL_NUM_THREADS: 8 - RUN_SLOW: yes - SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }} - TF_FORCE_GPU_ALLOW_GROWTH: true - RUN_PT_TF_CROSS_TESTS: 1 - CUDA_VISIBLE_DEVICES: 0,1 - -jobs: - setup: - name: Setup - strategy: - matrix: - machine_type: [single-gpu, multi-gpu] - runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, past-ci] - container: - image: huggingface/transformers-${{ inputs.framework }}-past-${{ inputs.version }}-gpu - options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ - outputs: - matrix: ${{ steps.set-matrix.outputs.matrix }} - steps: - - name: Update clone - working-directory: /transformers - run: git fetch && git checkout ${{ inputs.sha }} - - - name: Cleanup - working-directory: /transformers - run: | - rm -rf tests/__pycache__ - rm -rf tests/models/__pycache__ - rm -rf reports - - - name: Show installed libraries and their versions - working-directory: /transformers - run: pip freeze - - - id: set-matrix - working-directory: /transformers - name: Identify models to test - run: | - cd tests - echo "matrix=$(python3 -c 'import os; tests = os.getcwd(); model_tests = os.listdir(os.path.join(tests, "models")); d1 = sorted(list(filter(os.path.isdir, os.listdir(tests)))); d2 = sorted(list(filter(os.path.isdir, [f"models/{x}" for x in model_tests]))); d1.remove("models"); d = d2 + d1; print(d)')" >> $GITHUB_OUTPUT - - run_tests_single_gpu: - name: Model tests - strategy: - fail-fast: false - matrix: - folders: ${{ fromJson(needs.setup.outputs.matrix) }} - machine_type: [single-gpu] - runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, past-ci] - container: - image: huggingface/transformers-${{ inputs.framework }}-past-${{ inputs.version }}-gpu - options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ - needs: setup - steps: - - name: Update clone - working-directory: /transformers - run: git fetch && git checkout ${{ inputs.sha }} - - - name: Reinstall transformers in edit mode (remove the one installed during docker image build) - working-directory: /transformers - run: python3 -m pip uninstall -y transformers && python3 -m pip install -e . - - - name: Update some packages - working-directory: /transformers - run: python3 -m pip install -U datasets - - - name: Echo folder ${{ matrix.folders }} - shell: bash - # For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to - # set the artifact folder names (because the character `/` is not allowed). 
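The comment above is repeated in every model-test job: the matrix entry `models/bert` has to become `models_bert` before it can be used in an artifact name, and the `Save job name` steps later apply the reverse substitution when rebuilding a human-readable job title. Both directions, expressed in Python terms:

```python
def to_artifact_name(folder: str) -> str:
    """'models/bert' -> 'models_bert', the shell substitution
    ${matrix_folders/'models/'/'models_'} used in these workflows."""
    return folder.replace("models/", "models_", 1)


def to_folder(artifact_name: str) -> str:
    """Reverse mapping, 'models_bert' -> 'models/bert', as in the
    'Save job name' steps."""
    return artifact_name.replace("models_", "models/", 1)


assert to_artifact_name("models/bert") == "models_bert"
assert to_folder("models_bert") == "models/bert"
assert to_artifact_name("generation") == "generation"  # non-model folders are untouched
```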
- run: | - echo "${{ matrix.folders }}" - matrix_folders=${{ matrix.folders }} - matrix_folders=${matrix_folders/'models/'/'models_'} - echo "$matrix_folders" - echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV - - - name: NVIDIA-SMI - run: | - nvidia-smi - - - name: Install - if: inputs.framework == 'pytorch' - working-directory: /transformers - run: | - python3 -m pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate - - - name: Environment - working-directory: /transformers - run: | - python3 utils/print_env.py - - - name: Show installed libraries and their versions - working-directory: /transformers - run: pip freeze - - - name: Run all tests on GPU - working-directory: /transformers - run: python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} tests/${{ matrix.folders }} - - - name: Failure short reports - if: ${{ failure() }} - continue-on-error: true - run: cat /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt - - - name: Save job name - if: ${{ always() }} - shell: bash - run: | - matrix_folders=${matrix_folders/'models_'/'models/'} - job_name="Model tests ($matrix_folders, ${{ matrix.machine_type }})" - echo "$job_name" - echo "$job_name" > /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}/job_name.txt - - - name: Test suite reports artifacts - if: ${{ always() }} - uses: actions/upload-artifact@v3 - with: - name: ${{ matrix.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports_postfix_${{ inputs.framework }}-${{ inputs.version }} - path: /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} - - run_tests_multi_gpu: - name: Model tests - strategy: - fail-fast: false - matrix: - folders: ${{ fromJson(needs.setup.outputs.matrix) }} - machine_type: [multi-gpu] - runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, past-ci] - container: - image: huggingface/transformers-${{ inputs.framework }}-past-${{ inputs.version }}-gpu - options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ - needs: setup - steps: - - name: Update clone - working-directory: /transformers - run: git fetch && git checkout ${{ inputs.sha }} - - - name: Reinstall transformers in edit mode (remove the one installed during docker image build) - working-directory: /transformers - run: python3 -m pip uninstall -y transformers && python3 -m pip install -e . - - - name: Update some packages - working-directory: /transformers - run: python3 -m pip install -U datasets - - - name: Echo folder ${{ matrix.folders }} - shell: bash - # For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to - # set the artifact folder names (because the character `/` is not allowed). 
- run: | - echo "${{ matrix.folders }}" - matrix_folders=${{ matrix.folders }} - matrix_folders=${matrix_folders/'models/'/'models_'} - echo "$matrix_folders" - echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV - - - name: NVIDIA-SMI - run: | - nvidia-smi - - - name: Install - if: inputs.framework == 'pytorch' - working-directory: /transformers - run: | - python3 -m pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate - - - name: Environment - working-directory: /transformers - run: | - python3 utils/print_env.py - - - name: Show installed libraries and their versions - working-directory: /transformers - run: pip freeze - - - name: Run all tests on GPU - working-directory: /transformers - run: python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} tests/${{ matrix.folders }} - - - name: Failure short reports - if: ${{ failure() }} - continue-on-error: true - run: cat /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt - - - name: Save job name - if: ${{ always() }} - shell: bash - run: | - matrix_folders=${matrix_folders/'models_'/'models/'} - job_name="Model tests ($matrix_folders, ${{ matrix.machine_type }})" - echo "$job_name" - echo "$job_name" > /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}/job_name.txt - - - name: Test suite reports artifacts - if: ${{ always() }} - uses: actions/upload-artifact@v3 - with: - name: ${{ matrix.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports_postfix_${{ inputs.framework }}-${{ inputs.version }} - path: /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} - - run_all_tests_torch_cuda_extensions_gpu: - name: Torch CUDA extension tests - if: inputs.framework == 'pytorch' - strategy: - fail-fast: false - matrix: - machine_type: [single-gpu, multi-gpu] - runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, past-ci] - needs: setup - container: - image: huggingface/transformers-${{ inputs.framework }}-past-${{ inputs.version }}-gpu - options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ - steps: - - name: Update clone - working-directory: /transformers - run: git fetch && git checkout ${{ github.sha }} - - - name: Reinstall transformers in edit mode (remove the one installed during docker image build) - working-directory: /transformers - run: python3 -m pip uninstall -y transformers && python3 -m pip install -e . - - - name: Update some packages - working-directory: /transformers - run: python3 -m pip install -U datasets - - - name: Install - working-directory: /transformers - run: | - python3 -m pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate - - - name: Remove cached torch extensions - run: rm -rf /github/home/.cache/torch_extensions/ - - # To avoid unknown test failures - - name: Pre build DeepSpeed *again* - working-directory: / - run: | - python3 -m pip uninstall -y deepspeed - rm -rf DeepSpeed - git clone https://github.com/microsoft/DeepSpeed && cd DeepSpeed && rm -rf build - DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install . 
--global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check - - - name: NVIDIA-SMI - run: | - nvidia-smi - - - name: Environment - working-directory: /transformers - run: | - python3 utils/print_env.py - - - name: Show installed libraries and their versions - working-directory: /transformers - run: pip freeze - - - name: Run all tests on GPU - working-directory: /transformers - run: | - python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu tests/deepspeed tests/extended - - - name: Failure short reports - if: ${{ failure() }} - continue-on-error: true - run: cat /transformers/reports/${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu/failures_short.txt - - - name: Test suite reports artifacts - if: ${{ always() }} - uses: actions/upload-artifact@v3 - with: - name: ${{ matrix.machine_type }}_run_tests_torch_cuda_extensions_gpu_test_reports_postfix_${{ inputs.framework }}-${{ inputs.version }} - path: /transformers/reports/${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu - - send_results: - name: Send results to webhook - runs-on: ubuntu-22.04 - if: always() - needs: [ - setup, - run_tests_single_gpu, - run_tests_multi_gpu, - run_all_tests_torch_cuda_extensions_gpu - ] - steps: - - name: Preliminary job status - shell: bash - # For the meaning of these environment variables, see the job `Setup` - run: | - echo "Setup status: ${{ needs.setup.result }}" - - - uses: actions/checkout@v3 - - uses: actions/download-artifact@v3 - - # Create a directory to store test failure tables in the next step - - name: Create directory - run: mkdir test_failure_tables - - - name: Send message to Slack - env: - CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }} - CI_SLACK_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }} - CI_SLACK_CHANNEL_ID_DAILY: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY }} - CI_SLACK_CHANNEL_DUMMY_TESTS: ${{ secrets.CI_SLACK_CHANNEL_DUMMY_TESTS }} - CI_SLACK_REPORT_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID_PAST_FUTURE }} - ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }} - CI_EVENT: Past CI - ${{ inputs.framework }}-${{ inputs.version }} - SETUP_STATUS: ${{ needs.setup.result }} - # We pass `needs.setup.outputs.matrix` as the argument. A processing in `notification_service.py` to change - # `models/bert` to `models_bert` is required, as the artifact names use `_` instead of `/`. - run: | - pip install slack_sdk - pip show slack_sdk - python utils/notification_service.py "${{ needs.setup.outputs.matrix }}" - - # Upload complete failure tables, as they might be big and only truncated versions could be sent to Slack. 
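The comment above explains why the complete failure tables are uploaded as artifacts: Slack messages have to stay short, so only truncated versions are posted. A sketch of that kind of truncation; the character limit and the wording of the trailer are illustrative, not what the notification code actually uses:

```python
def truncate_for_slack(table: str, max_chars: int = 3000) -> str:
    """Keep the head of a long failure table and say how much was cut,
    leaving the full version to the uploaded artifact."""
    if len(table) <= max_chars:
        return table
    kept = table[:max_chars].rsplit("\n", 1)[0]
    dropped = table.count("\n") - kept.count("\n")
    return f"{kept}\n... ({dropped} more lines in the test_failure_tables artifact)"
```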
- - name: Failure table artifacts - if: ${{ always() }} - uses: actions/upload-artifact@v3 - with: - name: test_failure_tables_${{ inputs.framework }}-${{ inputs.version }} - path: test_failure_tables - - # delete-artifact - - uses: geekyeggo/delete-artifact@v2 - with: - name: | - single-* - multi-* diff --git a/.github/workflows/self-pr-slow-ci.yml b/.github/workflows/self-pr-slow-ci.yml new file mode 100644 index 00000000000000..2287b5e3f31587 --- /dev/null +++ b/.github/workflows/self-pr-slow-ci.yml @@ -0,0 +1,135 @@ +name: PR slow CI + +on: + pull_request: + paths: + - "src/transformers/models/*/modeling_*.py" + - "tests/**/test_*.py" + +concurrency: + group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }} + cancel-in-progress: true + +env: + HF_HOME: /mnt/cache + TRANSFORMERS_IS_CI: yes + OMP_NUM_THREADS: 8 + MKL_NUM_THREADS: 8 + RUN_SLOW: yes + # For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access. + # This token is created under the bot `hf-transformers-bot`. + HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }} + SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }} + TF_FORCE_GPU_ALLOW_GROWTH: true + RUN_PT_TF_CROSS_TESTS: 1 + CUDA_VISIBLE_DEVICES: 0,1 + +jobs: + find_models_to_run: + runs-on: ubuntu-22.04 + name: Find models to run slow tests + # Triggered only if the required label `run-slow` is added + if: ${{ contains(github.event.pull_request.labels.*.name, 'run-slow') }} + outputs: + models: ${{ steps.models_to_run.outputs.models }} + steps: + - uses: actions/checkout@v4 + with: + fetch-depth: "0" + ref: ${{ github.event.pull_request.head.sha }} + + - name: Get commit message + run: | + echo "commit_message=$(git show -s --format=%s)" >> $GITHUB_ENV + + - name: Get models to run slow tests + run: | + echo "${{ env.commit_message }}" + python -m pip install GitPython + python utils/pr_slow_ci_models.py --commit_message "${{ env.commit_message }}" | tee output.txt + echo "models=$(tail -n 1 output.txt)" >> $GITHUB_ENV + + - name: Models to run slow tests + id: models_to_run + run: | + echo "${{ env.models }}" + echo "models=${{ env.models }}" >> $GITHUB_OUTPUT + + run_models_gpu: + name: Run all tests for the model + # Triggered only `find_models_to_run` is triggered (label `run-slow` is added) which gives the models to run + # (either a new model PR or via a commit message) + if: ${{ needs.find_models_to_run.outputs.models != '[]' }} + needs: find_models_to_run + strategy: + fail-fast: false + matrix: + folders: ${{ fromJson(needs.find_models_to_run.outputs.models) }} + machine_type: [single-gpu, multi-gpu] + runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, ci] + container: + image: huggingface/transformers-all-latest-gpu + options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ + steps: + - name: Echo input and matrix info + shell: bash + run: | + echo "${{ matrix.folders }}" + + - name: Echo folder ${{ matrix.folders }} + shell: bash + # For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to + # set the artifact folder names (because the character `/` is not allowed). 
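In the new PR slow-CI workflow, `find_models_to_run` reads the head commit message and lets `utils/pr_slow_ci_models.py` decide which model folders to test. A rough sketch of that idea, assuming a commit-message convention along the lines of `[run-slow] bert, llama` (the marker and format are assumptions here; the real parsing lives in the utility):

```python
import json
import re


def models_from_commit_message(message: str) -> list:
    """Extract 'models/<name>' entries from a '[run-slow] a, b' style message.
    Marker and separator are assumed for illustration only."""
    match = re.search(r"\[run[-_ ]slow\]\s*(.*)", message, flags=re.IGNORECASE)
    if not match:
        return []
    names = [name.strip() for name in match.group(1).split(",") if name.strip()]
    return [f"models/{name}" for name in names]


print(json.dumps(models_from_commit_message("[run-slow] bert, llama")))
# prints ["models/bert", "models/llama"], the shape the job writes to $GITHUB_OUTPUT
```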
+ run: | + echo "${{ matrix.folders }}" + matrix_folders=${{ matrix.folders }} + matrix_folders=${matrix_folders/'models/'/'models_'} + echo "$matrix_folders" + echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV + + - name: Update clone + working-directory: /transformers + run: git fetch && git fetch origin pull/${{ github.event.pull_request.number }}/head:pull/${{ github.event.pull_request.number }}/merge && git checkout pull/${{ github.event.pull_request.number }}/merge + + - name: Reinstall transformers in edit mode (remove the one installed during docker image build) + working-directory: /transformers + run: python3 -m pip uninstall -y transformers && python3 -m pip install -e . + + - name: NVIDIA-SMI + run: | + nvidia-smi + + - name: Environment + working-directory: /transformers + run: | + python3 utils/print_env.py + + - name: Show installed libraries and their versions + working-directory: /transformers + run: pip freeze + + - name: Run all tests on GPU + working-directory: /transformers + run: | + export CUDA_VISIBLE_DEVICES="$(python3 utils/set_cuda_devices_for_ci.py --test_folder ${{ matrix.folders }})" + echo $CUDA_VISIBLE_DEVICES + python3 -m pytest -v -rsfE --make-reports=${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }} + + - name: Failure short reports + if: ${{ failure() }} + continue-on-error: true + run: cat /transformers/reports/${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports/failures_short.txt + + - name: Make sure report directory exists + shell: bash + run: | + mkdir -p /transformers/reports/${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports + echo "hello" > /transformers/reports/${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports/hello.txt + echo "${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports" + + - name: "Test suite reports artifacts: ${{ matrix.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports" + if: ${{ always() }} + uses: actions/upload-artifact@v4 + with: + name: ${{ matrix.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports + path: /transformers/reports/${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports diff --git a/.github/workflows/self-push-amd-mi300-caller.yml b/.github/workflows/self-push-amd-mi300-caller.yml new file mode 100644 index 00000000000000..a8ee4e540ecf3f --- /dev/null +++ b/.github/workflows/self-push-amd-mi300-caller.yml @@ -0,0 +1,25 @@ +name: Self-hosted runner (AMD mi300 CI caller) + +on: + workflow_run: + workflows: ["Self-hosted runner (push-caller)"] + branches: ["main"] + types: [completed] + push: + branches: + - run_amd_push_ci_caller* + paths: + - "src/**" + - "tests/**" + - ".github/**" + - "templates/**" + - "utils/**" + +jobs: + run_amd_ci: + name: AMD mi300 + if: (cancelled() != true) && ((github.event_name == 'workflow_run') || ((github.event_name == 'push') && (startsWith(github.ref_name, 'run_amd_push_ci_caller') || startsWith(github.ref_name, 'mi300-ci')))) + uses: ./.github/workflows/self-push-amd.yml + with: + gpu_flavor: mi300 + secrets: inherit diff --git a/.github/workflows/self-push-amd.yml b/.github/workflows/self-push-amd.yml index 313f3b85a63d41..ce6b9fe91caa2b 100644 --- a/.github/workflows/self-push-amd.yml +++ b/.github/workflows/self-push-amd.yml @@ -15,14 +15,27 @@ env: PYTEST_TIMEOUT: 60 TF_FORCE_GPU_ALLOW_GROWTH: true RUN_PT_TF_CROSS_TESTS: 1 + HF_HUB_READ_TOKEN: ${{ 
secrets.HF_HUB_READ_TOKEN }} jobs: + check_runner_status: + name: Check Runner Status + runs-on: ubuntu-22.04 + steps: + - name: Checkout transformers + uses: actions/checkout@v4 + with: + fetch-depth: 2 + + - name: Check Runner Status + run: python utils/check_self_hosted_runner.py --target_runners amd-mi210-single-gpu-ci-runner-docker --token ${{ secrets.ACCESS_REPO_INFO_TOKEN }} + check_runners: name: Check Runners strategy: matrix: machine_type: [single-gpu, multi-gpu] - runs-on: rocm + runs-on: [rocm, self-hosted, amd-gpu, '${{ matrix.machine_type }}', '${{ inputs.gpu_flavor }}'] container: image: huggingface/transformers-pytorch-amd-gpu-push-ci # <--- We test only for PyTorch for now options: --device /dev/kfd --device /dev/dri --env HIP_VISIBLE_DEVICES --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ @@ -41,7 +54,7 @@ jobs: strategy: matrix: machine_type: [single-gpu, multi-gpu] - runs-on: rocm + runs-on: [rocm, self-hosted, amd-gpu, '${{ matrix.machine_type }}', '${{ inputs.gpu_flavor }}'] container: image: huggingface/transformers-pytorch-amd-gpu-push-ci # <--- We test only for PyTorch for now options: --device /dev/kfd --device /dev/dri --env HIP_VISIBLE_DEVICES --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ @@ -112,7 +125,7 @@ jobs: python3 utils/tests_fetcher.py --diff_with_last_commit | tee test_preparation.txt - name: Report fetched tests - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: name: test_fetched path: /transformers/test_preparation.txt @@ -136,7 +149,7 @@ jobs: echo "matrix=$keys" >> $GITHUB_OUTPUT echo "test_map=$test_map" >> $GITHUB_OUTPUT - run_tests_amdgpu: + run_models_gpu: name: Model tests needs: setup_gpu # `dummy` means there is no test to run @@ -146,7 +159,7 @@ jobs: matrix: folders: ${{ fromJson(needs.setup_gpu.outputs.matrix) }} machine_type: [single-gpu, multi-gpu] - runs-on: rocm + runs-on: [rocm, self-hosted, amd-gpu, '${{ matrix.machine_type }}', '${{ inputs.gpu_flavor }}'] container: image: huggingface/transformers-pytorch-amd-gpu-push-ci # <--- We test only for PyTorch for now options: --device /dev/kfd --device /dev/dri --env HIP_VISIBLE_DEVICES --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ @@ -227,19 +240,19 @@ jobs: - name: Run all non-slow selected tests on GPU working-directory: /transformers run: | - python3 -m pytest -n 2 --dist=loadfile -v --make-reports=${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} ${{ fromJson(needs.setup_gpu.outputs.test_map)[matrix.folders] }} + python3 -m pytest -n 2 --dist=loadfile -v --make-reports=${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports ${{ fromJson(needs.setup_gpu.outputs.test_map)[matrix.folders] }} -m "not not_device_test" - name: Failure short reports if: ${{ failure() }} continue-on-error: true - run: cat /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt + run: cat /transformers/reports/${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports/failures_short.txt - - name: Test suite reports artifacts + - name: "Test suite reports artifacts: ${{ matrix.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports" if: ${{ always() }} - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: - name: ${{ matrix.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports - 
path: /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} + name: ${{ matrix.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports + path: /transformers/reports/${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports send_results: name: Send results to webhook @@ -249,6 +262,9 @@ jobs: check_runners, setup_gpu, run_tests_amdgpu + run_models_gpu, +# run_tests_torch_cuda_extensions_single_gpu, +# run_tests_torch_cuda_extensions_multi_gpu ] steps: - name: Preliminary job status @@ -281,7 +297,7 @@ jobs: echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}" echo "env.CI_SHA = ${{ env.CI_SHA }}" - - uses: actions/checkout@v3 + - uses: actions/checkout@v4 # To avoid failure when multiple commits are merged into `main` in a short period of time. # Checking out to an old commit beyond the fetch depth will get an error `fatal: reference is not a tree: ... # (Only required for `workflow_run` event, where we get the latest HEAD on `main` instead of the event commit) diff --git a/.github/workflows/self-push-caller.yml b/.github/workflows/self-push-caller.yml index 9247848b89ec6d..59adde4c54e077 100644 --- a/.github/workflows/self-push-caller.yml +++ b/.github/workflows/self-push-caller.yml @@ -19,13 +19,13 @@ jobs: outputs: changed: ${{ steps.was_changed.outputs.changed }} steps: - - uses: actions/checkout@v3 + - uses: actions/checkout@v4 with: fetch-depth: "2" - name: Get changed files id: changed-files - uses: tj-actions/changed-files@v22.2 + uses: tj-actions/changed-files@v41 - name: Was setup changed id: was_changed diff --git a/.github/workflows/self-push.yml b/.github/workflows/self-push.yml index e6f1f3b3050f7a..31f68c291b5a0f 100644 --- a/.github/workflows/self-push.yml +++ b/.github/workflows/self-push.yml @@ -97,7 +97,7 @@ jobs: python3 utils/tests_fetcher.py --diff_with_last_commit | tee test_preparation.txt - name: Report fetched tests - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: name: test_fetched path: /transformers/test_preparation.txt @@ -207,9 +207,9 @@ jobs: continue-on-error: true run: cat /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt - - name: Test suite reports artifacts + - name: "Test suite reports artifacts: ${{ matrix.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports" if: ${{ always() }} - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: name: ${{ matrix.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports path: /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} @@ -302,9 +302,9 @@ jobs: continue-on-error: true run: cat /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt - - name: Test suite reports artifacts + - name: "Test suite reports artifacts: ${{ matrix.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports" if: ${{ always() }} - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: name: ${{ matrix.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports path: /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} @@ -385,19 +385,19 @@ jobs: working-directory: /workspace/transformers # TODO: Here we pass all tests in the 2 folders for simplicity. It's better to pass only the identified tests. 
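The TODO above, and the `tests_fetcher.py --diff_with_last_commit` step earlier in these push workflows, are about the same trade-off: ideally only the tests impacted by a commit run on the GPU runners. A deliberately naive sketch of the file-to-test mapping; the real fetcher also walks the dependency graph between modules and emits a full test map:

```python
def tests_for_changed_file(path):
    """Very reduced mapping from a changed file to the test folder to run."""
    if path.startswith("src/transformers/models/"):
        model = path.split("/")[3]
        return f"tests/models/{model}"
    if path.startswith("tests/"):
        return "/".join(path.split("/")[:2])
    return None  # e.g. docs-only changes select no GPU tests here


print(tests_for_changed_file("src/transformers/models/bert/modeling_bert.py"))  # tests/models/bert
print(tests_for_changed_file("tests/generation/test_utils.py"))                 # tests/generation
```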
run: | - python -m pytest -n 1 --dist=loadfile -v --make-reports=${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu tests/deepspeed tests/extended + python -m pytest -n 1 --dist=loadfile -v --make-reports=${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports tests/deepspeed tests/extended - name: Failure short reports if: ${{ failure() }} continue-on-error: true - run: cat /workspace/transformers/reports/${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu/failures_short.txt + run: cat /workspace/transformers/reports/${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports/failures_short.txt - - name: Test suite reports artifacts + - name: "Test suite reports artifacts: ${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports" if: ${{ always() }} - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: - name: ${{ matrix.machine_type }}_run_tests_torch_cuda_extensions_gpu_test_reports - path: /workspace/transformers/reports/${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu + name: ${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports + path: /workspace/transformers/reports/${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports run_tests_torch_cuda_extensions_multi_gpu: name: Torch CUDA extension tests @@ -475,19 +475,19 @@ jobs: working-directory: /workspace/transformers # TODO: Here we pass all tests in the 2 folders for simplicity. It's better to pass only the identified tests. run: | - python -m pytest -n 1 --dist=loadfile -v --make-reports=${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu tests/deepspeed tests/extended + python -m pytest -n 1 --dist=loadfile -v --make-reports=${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports tests/deepspeed tests/extended - name: Failure short reports if: ${{ failure() }} continue-on-error: true - run: cat /workspace/transformers/reports/${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu/failures_short.txt + run: cat /workspace/transformers/reports/${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports/failures_short.txt - - name: Test suite reports artifacts + - name: "Test suite reports artifacts: ${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports" if: ${{ always() }} - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: - name: ${{ matrix.machine_type }}_run_tests_torch_cuda_extensions_gpu_test_reports - path: /workspace/transformers/reports/${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu + name: ${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports + path: /workspace/transformers/reports/${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports send_results: name: Send results to webhook @@ -530,7 +530,7 @@ jobs: echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}" echo "env.CI_SHA = ${{ env.CI_SHA }}" - - uses: actions/checkout@v3 + - uses: actions/checkout@v4 # To avoid failure when multiple commits are merged into `main` in a short period of time. # Checking out to an old commit beyond the fetch depth will get an error `fatal: reference is not a tree: ... 
# (Only required for `workflow_run` event, where we get the latest HEAD on `main` instead of the event commit) @@ -545,7 +545,7 @@ jobs: git checkout ${{ env.CI_SHA }} echo "log = $(git log -n 1)" - - uses: actions/download-artifact@v3 + - uses: actions/download-artifact@v4 - name: Send message to Slack env: CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }} @@ -563,6 +563,7 @@ jobs: # We pass `needs.setup.outputs.matrix` as the argument. A processing in `notification_service.py` to change # `models/bert` to `models_bert` is required, as the artifact names use `_` instead of `/`. run: | - pip install slack_sdk + pip install huggingface_hub + pip install slack_sdk pip show slack_sdk python utils/notification_service.py "${{ needs.setup.outputs.matrix }}" diff --git a/.github/workflows/self-scheduled-amd-mi210-caller.yml b/.github/workflows/self-scheduled-amd-mi210-caller.yml index cdb968901058b6..6abba6894aaffa 100644 --- a/.github/workflows/self-scheduled-amd-mi210-caller.yml +++ b/.github/workflows/self-scheduled-amd-mi210-caller.yml @@ -16,4 +16,5 @@ jobs: uses: ./.github/workflows/self-scheduled-amd.yml with: gpu_flavor: mi210 + slack_report_channel: "#transformers-ci-daily-amd" secrets: inherit diff --git a/.github/workflows/self-scheduled-amd-mi250-caller.yml b/.github/workflows/self-scheduled-amd-mi250-caller.yml index dc7d12f173935e..36365d4a67f1e2 100644 --- a/.github/workflows/self-scheduled-amd-mi250-caller.yml +++ b/.github/workflows/self-scheduled-amd-mi250-caller.yml @@ -16,4 +16,5 @@ jobs: uses: ./.github/workflows/self-scheduled-amd.yml with: gpu_flavor: mi250 + slack_report_channel: "#transformers-ci-daily-amd" secrets: inherit diff --git a/.github/workflows/self-scheduled-amd-mi300-caller.yml b/.github/workflows/self-scheduled-amd-mi300-caller.yml new file mode 100644 index 00000000000000..a9e7b934c34b77 --- /dev/null +++ b/.github/workflows/self-scheduled-amd-mi300-caller.yml @@ -0,0 +1,21 @@ +name: Self-hosted runner (AMD mi300 scheduled CI caller) + +on: + workflow_run: + workflows: ["Self-hosted runner (AMD scheduled CI caller)"] + branches: ["main"] + types: [completed] + push: + branches: + - run_amd_scheduled_ci_caller* + +jobs: + run_amd_ci: + name: AMD mi300 + needs: build-docker-containers + if: (cancelled() != true) && ((github.event_name == 'workflow_run') || ((github.event_name == 'push') && (startsWith(github.ref_name, 'run_amd_push_ci_caller') || startsWith(github.ref_name, 'mi300-ci')))) + uses: ./.github/workflows/self-scheduled-amd.yml + with: + gpu_flavor: mi300 + slack_report_channel: "#transformers-ci-daily-amd" + secrets: inherit diff --git a/.github/workflows/self-scheduled-amd.yml b/.github/workflows/self-scheduled-amd.yml index 3d41a3b95e6c50..f3b17bfbffb022 100644 --- a/.github/workflows/self-scheduled-amd.yml +++ b/.github/workflows/self-scheduled-amd.yml @@ -16,6 +16,7 @@ env: OMP_NUM_THREADS: 8 MKL_NUM_THREADS: 8 RUN_SLOW: yes + HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }} SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }} @@ -28,12 +29,12 @@ jobs: runs-on: ubuntu-22.04 steps: - name: Checkout transformers - uses: actions/checkout@v3 + uses: actions/checkout@v4 with: fetch-depth: 2 - name: Check Runner Status - run: python utils/check_self_hosted_runner.py --target_runners hf-amd-mi210-ci-1gpu-1,hf-amd-mi250-ci-1gpu-1 --token ${{ secrets.ACCESS_REPO_INFO_TOKEN }} + run: python utils/check_self_hosted_runner.py --target_runners hf-amd-mi210-ci-1gpu-1,hf-amd-mi250-ci-1gpu-1,hf-amd-mi300-ci-1gpu-1 --token ${{ 
secrets.ACCESS_REPO_INFO_TOKEN }} check_runners: name: Check Runners @@ -41,7 +42,7 @@ jobs: strategy: matrix: machine_type: [single-gpu, multi-gpu] - runs-on: [self-hosted, docker-gpu, amd-gpu, '${{ matrix.machine_type }}', '${{ inputs.gpu_flavor }}'] + runs-on: [self-hosted, amd-gpu, '${{ matrix.machine_type }}', '${{ inputs.gpu_flavor }}'] container: image: huggingface/transformers-pytorch-amd-gpu options: --device /dev/kfd --device /dev/dri --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ @@ -62,7 +63,7 @@ jobs: strategy: matrix: machine_type: [single-gpu, multi-gpu] - runs-on: [self-hosted, docker-gpu, amd-gpu, '${{ matrix.machine_type }}', '${{ inputs.gpu_flavor }}'] + runs-on: [self-hosted, amd-gpu, '${{ matrix.machine_type }}', '${{ inputs.gpu_flavor }}'] container: image: huggingface/transformers-pytorch-amd-gpu options: --device /dev/kfd --device /dev/dri --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ @@ -107,7 +108,7 @@ jobs: run: | python3 utils/print_env.py - run_tests_single_gpu: + run_models_gpu_single_gpu: name: Single GPU tests strategy: max-parallel: 1 # For now, not to parallelize. Can change later if it works well. @@ -115,7 +116,7 @@ jobs: matrix: folders: ${{ fromJson(needs.setup.outputs.matrix) }} machine_type: [single-gpu] - runs-on: [self-hosted, docker-gpu, amd-gpu, '${{ matrix.machine_type }}', '${{ inputs.gpu_flavor }}'] + runs-on: [self-hosted, amd-gpu, '${{ matrix.machine_type }}', '${{ inputs.gpu_flavor }}'] container: image: huggingface/transformers-pytorch-amd-gpu options: --device /dev/kfd --device /dev/dri --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ @@ -161,21 +162,21 @@ jobs: - name: Run all tests on GPU working-directory: /transformers - run: python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} tests/${{ matrix.folders }} + run: python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }} -m "not not_device_test" - name: Failure short reports if: ${{ failure() }} continue-on-error: true - run: cat /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt + run: cat /transformers/reports/${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports/failures_short.txt - - name: Test suite reports artifacts + - name: "Test suite reports artifacts: ${{ matrix.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports" if: ${{ always() }} - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: - name: ${{ matrix.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports - path: /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} + name: ${{ matrix.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports + path: /transformers/reports/${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports - run_tests_multi_gpu: + run_models_gpu_multi_gpu: name: Multi GPU tests strategy: max-parallel: 1 @@ -183,7 +184,7 @@ jobs: matrix: folders: ${{ fromJson(needs.setup.outputs.matrix) }} machine_type: [multi-gpu] - runs-on: [self-hosted, docker-gpu, amd-gpu, '${{ matrix.machine_type }}', '${{ inputs.gpu_flavor }}'] + runs-on: [self-hosted, amd-gpu, '${{ matrix.machine_type }}', '${{ inputs.gpu_flavor }}'] container: image: 
huggingface/transformers-pytorch-amd-gpu options: --device /dev/kfd --device /dev/dri --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ @@ -229,19 +230,19 @@ jobs: - name: Run all tests on GPU working-directory: /transformers - run: python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} tests/${{ matrix.folders }} + run: python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }} -m "not not_device_test" - name: Failure short reports if: ${{ failure() }} continue-on-error: true - run: cat /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt + run: cat /transformers/reports/${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports/failures_short.txt - - name: Test suite reports artifacts + - name: "Test suite reports artifacts: ${{ matrix.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports" if: ${{ always() }} - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: - name: ${{ matrix.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports - path: /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} + name: ${{ matrix.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports + path: /transformers/reports/${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports run_examples_gpu: name: Examples tests @@ -249,7 +250,7 @@ jobs: fail-fast: false matrix: machine_type: [single-gpu] - runs-on: [self-hosted, docker-gpu, amd-gpu, '${{ matrix.machine_type }}', '${{ inputs.gpu_flavor }}'] + runs-on: [self-hosted, amd-gpu, '${{ matrix.machine_type }}', '${{ inputs.gpu_flavor }}'] container: image: huggingface/transformers-pytorch-amd-gpu options: --device /dev/kfd --device /dev/dri --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ @@ -286,19 +287,19 @@ jobs: working-directory: /transformers run: | pip install -r examples/pytorch/_tests_requirements.txt - python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_examples_gpu examples/pytorch + python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_run_examples_gpu_test_reports examples/pytorch -m "not not_device_test" - name: Failure short reports if: ${{ failure() }} continue-on-error: true - run: cat /transformers/reports/${{ matrix.machine_type }}_examples_gpu/failures_short.txt + run: cat /transformers/reports/${{ matrix.machine_type }}_run_examples_gpu_test_reports/failures_short.txt - - name: Test suite reports artifacts + - name: "Test suite reports artifacts: ${{ matrix.machine_type }}_run_examples_gpu_test_reports" if: ${{ always() }} - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: - name: ${{ matrix.machine_type }}_run_examples_gpu - path: /transformers/reports/${{ matrix.machine_type }}_examples_gpu + name: ${{ matrix.machine_type }}_run_examples_gpu_test_reports + path: /transformers/reports/${{ matrix.machine_type }}_run_examples_gpu_test_reports run_pipelines_torch_gpu: name: PyTorch pipelines tests @@ -306,7 +307,7 @@ jobs: fail-fast: false matrix: machine_type: [single-gpu, multi-gpu] - runs-on: [self-hosted, docker-gpu, amd-gpu, '${{ matrix.machine_type }}', '${{ inputs.gpu_flavor }}'] + runs-on: [self-hosted, amd-gpu, '${{ matrix.machine_type }}', '${{ inputs.gpu_flavor }}'] container: image: 
huggingface/transformers-pytorch-amd-gpu options: --device /dev/kfd --device /dev/dri --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ @@ -342,28 +343,28 @@ jobs: - name: Run all pipeline tests on GPU working-directory: /transformers run: | - python3 -m pytest -n 1 -v --dist=loadfile --make-reports=${{ matrix.machine_type }}_tests_torch_pipeline_gpu tests/pipelines + python3 -m pytest -n 1 -v --dist=loadfile --make-reports=${{ matrix.machine_type }}_run_pipelines_torch_gpu_test_reports tests/pipelines -m "not not_device_test" - name: Failure short reports if: ${{ failure() }} continue-on-error: true - run: cat /transformers/reports/${{ matrix.machine_type }}_tests_torch_pipeline_gpu/failures_short.txt + run: cat /transformers/reports/${{ matrix.machine_type }}_run_pipelines_torch_gpu_test_reports/failures_short.txt - - name: Test suite reports artifacts + - name: "Test suite reports artifacts: ${{ matrix.machine_type }}_run_pipelines_torch_gpu_test_reports" if: ${{ always() }} - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: - name: ${{ matrix.machine_type }}_run_tests_torch_pipeline_gpu - path: /transformers/reports/${{ matrix.machine_type }}_tests_torch_pipeline_gpu + name: ${{ matrix.machine_type }}_run_pipelines_torch_gpu_test_reports + path: /transformers/reports/${{ matrix.machine_type }}_run_pipelines_torch_gpu_test_reports - run_tests_torch_deepspeed_gpu: + run_torch_cuda_extensions_gpu: name: Torch ROCm deepspeed tests strategy: fail-fast: false matrix: machine_type: [single-gpu, multi-gpu] - runs-on: [self-hosted, docker-gpu, amd-gpu, '${{ matrix.machine_type }}', '${{ inputs.gpu_flavor }}'] + runs-on: [self-hosted, amd-gpu, '${{ matrix.machine_type }}', '${{ inputs.gpu_flavor }}'] needs: setup container: image: huggingface/transformers-pytorch-deepspeed-amd-gpu @@ -399,19 +400,19 @@ jobs: - name: Run all tests on GPU working-directory: /transformers - run: python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_tests_torch_deepspeed_gpu tests/deepspeed tests/extended + run: python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports tests/deepspeed tests/extended -m "not not_device_test" - name: Failure short reports if: ${{ failure() }} continue-on-error: true - run: cat /transformers/reports/${{ matrix.machine_type }}_tests_torch_deepspeed_gpu/failures_short.txt + run: cat /transformers/reports/${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports/failures_short.txt - - name: Test suite reports artifacts + - name: "Test suite reports artifacts: ${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports" if: ${{ always() }} - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: - name: ${{ matrix.machine_type }}_run_tests_torch_deepspeed_gpu_test_reports - path: /transformers/reports/${{ matrix.machine_type }}_tests_torch_deepspeed_gpu + name: ${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports + path: /transformers/reports/${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports run_extract_warnings: name: Extract warnings in CI artifacts @@ -421,15 +422,15 @@ jobs: check_runner_status, check_runners, setup, - run_tests_single_gpu, - run_tests_multi_gpu, + run_models_gpu_single_gpu, + run_models_gpu_multi_gpu, run_examples_gpu, run_pipelines_torch_gpu, - run_tests_torch_deepspeed_gpu + run_torch_cuda_extensions_gpu ] steps: - name: Checkout transformers - uses: 
actions/checkout@v3 + uses: actions/checkout@v4 with: fetch-depth: 2 @@ -442,7 +443,7 @@ jobs: - name: Create output directory run: mkdir warnings_in_ci - - uses: actions/download-artifact@v3 + - uses: actions/download-artifact@v4 with: path: warnings_in_ci @@ -457,7 +458,7 @@ jobs: - name: Upload artifact if: ${{ always() }} - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: name: warnings_in_ci path: warnings_in_ci/selected_warnings.json @@ -470,11 +471,11 @@ jobs: check_runner_status, check_runners, setup, - run_tests_single_gpu, - run_tests_multi_gpu, + run_models_gpu_single_gpu, + run_models_gpu_multi_gpu, run_examples_gpu, run_pipelines_torch_gpu, - run_tests_torch_deepspeed_gpu, + run_torch_cuda_extensions_gpu, run_extract_warnings ] steps: @@ -486,8 +487,8 @@ jobs: echo "Runner status: ${{ needs.check_runners.result }}" echo "Setup status: ${{ needs.setup.result }}" - - uses: actions/checkout@v3 - - uses: actions/download-artifact@v3 + - uses: actions/checkout@v4 + - uses: actions/download-artifact@v4 - name: Send message to Slack env: CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }} @@ -505,6 +506,7 @@ jobs: # `models/bert` to `models_bert` is required, as the artifact names use `_` instead of `/`. run: | sudo apt-get install -y curl + pip install huggingface_hub pip install slack_sdk pip show slack_sdk python utils/notification_service.py "${{ needs.setup.outputs.matrix }}" @@ -512,7 +514,7 @@ jobs: # Upload complete failure tables, as they might be big and only truncated versions could be sent to Slack. - name: Failure table artifacts if: ${{ always() }} - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: name: test_failure_tables path: test_failure_tables diff --git a/.github/workflows/self-scheduled-caller.yml b/.github/workflows/self-scheduled-caller.yml new file mode 100644 index 00000000000000..75ea3bb24bc7fa --- /dev/null +++ b/.github/workflows/self-scheduled-caller.yml @@ -0,0 +1,78 @@ +name: Self-hosted runner (scheduled) + + +on: + repository_dispatch: + schedule: + - cron: "17 2 * * *" + push: + branches: + - run_scheduled_ci* + +jobs: + model-ci: + name: Model CI + uses: ./.github/workflows/self-scheduled.yml + with: + job: run_models_gpu + slack_report_channel: "#transformers-ci-daily-models" + runner: daily-ci + docker: huggingface/transformers-all-latest-gpu + ci_event: Daily CI + secrets: inherit + + torch-pipeline: + name: Torch pipeline CI + uses: ./.github/workflows/self-scheduled.yml + with: + job: run_pipelines_torch_gpu + slack_report_channel: "#transformers-ci-daily-pipeline-torch" + runner: daily-ci + docker: huggingface/transformers-pytorch-gpu + ci_event: Daily CI + secrets: inherit + + tf-pipeline: + name: TF pipeline CI + uses: ./.github/workflows/self-scheduled.yml + with: + job: run_pipelines_tf_gpu + slack_report_channel: "#transformers-ci-daily-pipeline-tf" + runner: daily-ci + docker: huggingface/transformers-tensorflow-gpu + ci_event: Daily CI + secrets: inherit + + example-ci: + name: Example CI + uses: ./.github/workflows/self-scheduled.yml + with: + job: run_examples_gpu + slack_report_channel: "#transformers-ci-daily-examples" + runner: daily-ci + docker: huggingface/transformers-all-latest-gpu + ci_event: Daily CI + secrets: inherit + + deepspeed-ci: + name: DeepSpeed CI + uses: ./.github/workflows/self-scheduled.yml + with: + job: run_torch_cuda_extensions_gpu + slack_report_channel: "#transformers-ci-daily-deepspeed" + runner: daily-ci + docker: 
huggingface/transformers-pytorch-deepspeed-latest-gpu + ci_event: Daily CI + working-directory-prefix: /workspace + secrets: inherit + + quantization-ci: + name: Quantization CI + uses: ./.github/workflows/self-scheduled.yml + with: + job: run_quantization_torch_gpu + slack_report_channel: "#transformers-ci-daily-quantization" + runner: daily-ci + docker: huggingface/transformers-quantization-latest-gpu + ci_event: Daily CI + secrets: inherit diff --git a/.github/workflows/self-scheduled.yml b/.github/workflows/self-scheduled.yml index 995df2e07880ac..b056759aa77379 100644 --- a/.github/workflows/self-scheduled.yml +++ b/.github/workflows/self-scheduled.yml @@ -2,17 +2,32 @@ name: Self-hosted runner (scheduled) # Note that each job's dependencies go into a corresponding docker file. # -# For example for `run_all_tests_torch_cuda_extensions_gpu` the docker image is +# For example for `run_torch_cuda_extensions_gpu` the docker image is # `huggingface/transformers-pytorch-deepspeed-latest-gpu`, which can be found at # `docker/transformers-pytorch-deepspeed-latest-gpu/Dockerfile` on: - repository_dispatch: - schedule: - - cron: "17 2 * * *" - push: - branches: - - run_scheduled_ci* + workflow_call: + inputs: + job: + required: true + type: string + slack_report_channel: + required: true + type: string + runner: + required: true + type: string + docker: + required: true + type: string + ci_event: + required: true + type: string + working-directory-prefix: + default: '' + required: false + type: string env: HF_HOME: /mnt/cache @@ -20,23 +35,30 @@ env: OMP_NUM_THREADS: 8 MKL_NUM_THREADS: 8 RUN_SLOW: yes + # For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access. + # This token is created under the bot `hf-transformers-bot`. 
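The two comment lines above explain why the scheduled workflow now exports `HF_HUB_READ_TOKEN` (added on the next line): tests that download gated checkpoints need an authenticated Hub client. A minimal sketch of what that looks like from inside a test; the repository id is a placeholder, not something this patch references:

```python
# Sketch only: authenticate Hub downloads with the token the workflow exports.
import os
from huggingface_hub import snapshot_download

token = os.environ.get("HF_HUB_READ_TOKEN")  # set from secrets.HF_HUB_READ_TOKEN in CI
# "some-org/some-gated-model" is a placeholder; any gated repo the CI bot has agreed to
# access would be downloaded the same way.
local_dir = snapshot_download("some-org/some-gated-model", token=token)
print(local_dir)
```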
+ HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }} SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }} TF_FORCE_GPU_ALLOW_GROWTH: true RUN_PT_TF_CROSS_TESTS: 1 CUDA_VISIBLE_DEVICES: 0,1 + NUM_SLICES: 2 jobs: setup: + if: contains(fromJSON('["run_models_gpu", "run_quantization_torch_gpu"]'), inputs.job) name: Setup strategy: matrix: machine_type: [single-gpu, multi-gpu] - runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, daily-ci] + runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, '${{ inputs.runner }}'] container: image: huggingface/transformers-all-latest-gpu options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ outputs: - matrix: ${{ steps.set-matrix.outputs.matrix }} + folder_slices: ${{ steps.set-matrix.outputs.folder_slices }} + slice_ids: ${{ steps.set-matrix.outputs.slice_ids }} + quantization_matrix: ${{ steps.set-matrix-quantization.outputs.quantization_matrix }} steps: - name: Update clone working-directory: /transformers @@ -55,39 +77,54 @@ jobs: run: pip freeze - id: set-matrix + if: ${{ inputs.job == 'run_models_gpu' }} name: Identify models to test working-directory: /transformers/tests run: | - echo "matrix=$(python3 -c 'import os; tests = os.getcwd(); model_tests = os.listdir(os.path.join(tests, "models")); d1 = sorted(list(filter(os.path.isdir, os.listdir(tests)))); d2 = sorted(list(filter(os.path.isdir, [f"models/{x}" for x in model_tests]))); d1.remove("models"); d = d2 + d1; print(d)')" >> $GITHUB_OUTPUT + echo "folder_slices=$(python3 ../utils/split_model_tests.py --num_splits ${{ env.NUM_SLICES }})" >> $GITHUB_OUTPUT + echo "slice_ids=$(python3 -c 'd = list(range(${{ env.NUM_SLICES }})); print(d)')" >> $GITHUB_OUTPUT + + - id: set-matrix-quantization + if: ${{ inputs.job == 'run_quantization_torch_gpu' }} + name: Identify quantization method to test + working-directory: /transformers/tests + run: | + echo "quantization_matrix=$(python3 -c 'import os; tests = os.getcwd(); quantization_tests = os.listdir(os.path.join(tests, "quantization")); d = sorted(list(filter(os.path.isdir, [f"quantization/{x}" for x in quantization_tests]))) ; print(d)')" >> $GITHUB_OUTPUT - name: NVIDIA-SMI run: | nvidia-smi - run_tests_single_gpu: - name: Model tests + run_models_gpu: + if: ${{ inputs.job == 'run_models_gpu' }} + name: " " + needs: setup strategy: fail-fast: false matrix: - folders: ${{ fromJson(needs.setup.outputs.matrix) }} - machine_type: [single-gpu] - runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, daily-ci] + machine_type: [single-gpu, multi-gpu] + slice_id: ${{ fromJSON(needs.setup.outputs.slice_ids) }} + uses: ./.github/workflows/model_jobs.yml + with: + folder_slices: ${{ needs.setup.outputs.folder_slices }} + machine_type: ${{ matrix.machine_type }} + slice_id: ${{ matrix.slice_id }} + runner: ${{ inputs.runner }} + docker: ${{ inputs.docker }} + secrets: inherit + + run_pipelines_torch_gpu: + if: ${{ inputs.job == 'run_pipelines_torch_gpu' }} + name: PyTorch pipelines + strategy: + fail-fast: false + matrix: + machine_type: [single-gpu, multi-gpu] + runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, '${{ inputs.runner }}'] container: - image: huggingface/transformers-all-latest-gpu - options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ - needs: setup + image: huggingface/transformers-pytorch-gpu + options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ steps: - - name: Echo folder ${{ matrix.folders }} - shell: bash - # For 
folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to - # set the artifact folder names (because the character `/` is not allowed). - run: | - echo "${{ matrix.folders }}" - matrix_folders=${{ matrix.folders }} - matrix_folders=${matrix_folders/'models/'/'models_'} - echo "$matrix_folders" - echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV - - name: Update clone working-directory: /transformers run: git fetch && git checkout ${{ github.sha }} @@ -109,49 +146,39 @@ jobs: working-directory: /transformers run: pip freeze - - name: Run all tests on GPU + - name: Run all pipeline tests on GPU working-directory: /transformers - run: python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} tests/${{ matrix.folders }} + run: | + python3 -m pytest -n 1 -v --dist=loadfile --make-reports=${{ matrix.machine_type }}_run_pipelines_torch_gpu_test_reports tests/pipelines - name: Failure short reports if: ${{ failure() }} continue-on-error: true - run: cat /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt + run: cat /transformers/reports/${{ matrix.machine_type }}_run_pipelines_torch_gpu_test_reports/failures_short.txt - - name: Test suite reports artifacts + - name: "Test suite reports artifacts: ${{ matrix.machine_type }}_run_pipelines_torch_gpu_test_reports" if: ${{ always() }} - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: - name: ${{ matrix.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports - path: /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} + name: ${{ matrix.machine_type }}_run_pipelines_torch_gpu_test_reports + path: /transformers/reports/${{ matrix.machine_type }}_run_pipelines_torch_gpu_test_reports - run_tests_multi_gpu: - name: Model tests + run_pipelines_tf_gpu: + if: ${{ inputs.job == 'run_pipelines_tf_gpu' }} + name: TensorFlow pipelines strategy: fail-fast: false matrix: - folders: ${{ fromJson(needs.setup.outputs.matrix) }} - machine_type: [multi-gpu] - runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, daily-ci] + machine_type: [single-gpu, multi-gpu] + runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, '${{ inputs.runner }}'] container: - image: huggingface/transformers-all-latest-gpu + image: huggingface/transformers-tensorflow-gpu options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ - needs: setup steps: - - name: Echo folder ${{ matrix.folders }} - shell: bash - # For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to - # set the artifact folder names (because the character `/` is not allowed). 
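Two pieces of the new `setup` output fit together here: `utils/split_model_tests.py` produces `folder_slices` for `NUM_SLICES` shards indexed by `slice_id`, and folder names are rewritten with `_` instead of `/` before they become artifact names (the point of the comment above and of the `${matrix_folders/'models/'/'models_'}` substitution). The sketch below only illustrates the idea, assuming it runs from the repository root; it is not the actual utility script.

```python
# Illustrative sketch, not utils/split_model_tests.py itself.
import os

NUM_SLICES = 2  # mirrors the NUM_SLICES env var introduced by this workflow


def folder_slices(tests_dir: str = "tests", num_splits: int = NUM_SLICES):
    """Split the models/* test folders into `num_splits` groups for the slice_id matrix."""
    models_dir = os.path.join(tests_dir, "models")
    folders = sorted(
        f"models/{d}"
        for d in os.listdir(models_dir)
        if os.path.isdir(os.path.join(models_dir, d))
    )
    return [folders[i::num_splits] for i in range(num_splits)]


def artifact_safe(folder: str) -> str:
    """GitHub artifact names cannot contain '/', so models/bert becomes models_bert."""
    return folder.replace("/", "_")


if __name__ == "__main__":
    slices = folder_slices()
    slice_ids = list(range(NUM_SLICES))  # what the setup job writes to $GITHUB_OUTPUT
    print(slice_ids, [artifact_safe(f) for f in slices[0][:3]])
```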
- run: | - echo "${{ matrix.folders }}" - matrix_folders=${{ matrix.folders }} - matrix_folders=${matrix_folders/'models/'/'models_'} - echo "$matrix_folders" - echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV - - name: Update clone working-directory: /transformers - run: git fetch && git checkout ${{ github.sha }} + run: | + git fetch && git checkout ${{ github.sha }} - name: Reinstall transformers in edit mode (remove the one installed during docker image build) working-directory: /transformers @@ -170,33 +197,34 @@ jobs: working-directory: /transformers run: pip freeze - - name: Run all tests on GPU + - name: Run all pipeline tests on GPU working-directory: /transformers - run: python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} tests/${{ matrix.folders }} + run: | + python3 -m pytest -n 1 -v --dist=loadfile --make-reports=${{ matrix.machine_type }}_run_pipelines_tf_gpu_test_reports tests/pipelines - name: Failure short reports - if: ${{ failure() }} - continue-on-error: true - run: cat /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt + if: ${{ always() }} + run: | + cat /transformers/reports/${{ matrix.machine_type }}_run_pipelines_tf_gpu_test_reports/failures_short.txt - - name: Test suite reports artifacts + - name: "Test suite reports artifacts: ${{ matrix.machine_type }}_run_pipelines_tf_gpu_test_reports" if: ${{ always() }} - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: - name: ${{ matrix.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports - path: /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} + name: ${{ matrix.machine_type }}_run_pipelines_tf_gpu_test_reports + path: /transformers/reports/${{ matrix.machine_type }}_run_pipelines_tf_gpu_test_reports run_examples_gpu: + if: ${{ inputs.job == 'run_examples_gpu' }} name: Examples directory strategy: fail-fast: false matrix: machine_type: [single-gpu] - runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, daily-ci] + runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, '${{ inputs.runner }}'] container: image: huggingface/transformers-all-latest-gpu options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ - needs: setup steps: - name: Update clone working-directory: /transformers @@ -223,197 +251,169 @@ jobs: working-directory: /transformers run: | pip install -r examples/pytorch/_tests_requirements.txt - python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_examples_gpu examples/pytorch + python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_run_examples_gpu_test_reports examples/pytorch - name: Failure short reports if: ${{ failure() }} continue-on-error: true - run: cat /transformers/reports/${{ matrix.machine_type }}_examples_gpu/failures_short.txt + run: cat /transformers/reports/${{ matrix.machine_type }}_run_examples_gpu_test_reports/failures_short.txt - - name: Test suite reports artifacts + - name: "Test suite reports artifacts: ${{ matrix.machine_type }}_run_examples_gpu_test_reports" if: ${{ always() }} - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: - name: ${{ matrix.machine_type }}_run_examples_gpu - path: /transformers/reports/${{ matrix.machine_type }}_examples_gpu + name: ${{ matrix.machine_type }}_run_examples_gpu_test_reports + path: /transformers/reports/${{ matrix.machine_type }}_run_examples_gpu_test_reports - run_pipelines_torch_gpu: - name: 
PyTorch pipelines + run_torch_cuda_extensions_gpu: + if: ${{ inputs.job == 'run_torch_cuda_extensions_gpu' }} + name: Torch CUDA extension tests strategy: fail-fast: false matrix: machine_type: [single-gpu, multi-gpu] - runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, daily-ci] + runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, '${{ inputs.runner }}'] container: - image: huggingface/transformers-pytorch-gpu + image: ${{ inputs.docker }} options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ - needs: setup steps: - name: Update clone - working-directory: /transformers + working-directory: ${{ inputs.working-directory-prefix }}/transformers run: git fetch && git checkout ${{ github.sha }} - name: Reinstall transformers in edit mode (remove the one installed during docker image build) - working-directory: /transformers + working-directory: ${{ inputs.working-directory-prefix }}/transformers run: python3 -m pip uninstall -y transformers && python3 -m pip install -e . - - name: NVIDIA-SMI - run: | - nvidia-smi - - - name: Environment - working-directory: /transformers + - name: Update / Install some packages (for Past CI) + if: ${{ contains(inputs.docker, '-past-') && contains(inputs.docker, '-pytorch-') }} + working-directory: ${{ inputs.working-directory-prefix }}/transformers run: | - python3 utils/print_env.py + python3 -m pip install -U datasets + python3 -m pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate - - name: Show installed libraries and their versions - working-directory: /transformers - run: pip freeze + - name: Remove cached torch extensions + run: rm -rf /github/home/.cache/torch_extensions/ - - name: Run all pipeline tests on GPU - working-directory: /transformers + # To avoid unknown test failures + - name: Pre build DeepSpeed *again* (for daily CI) + if: ${{ contains(inputs.ci_event, 'Daily CI') }} + working-directory: ${{ inputs.working-directory-prefix }}/ run: | - python3 -m pytest -n 1 -v --dist=loadfile --make-reports=${{ matrix.machine_type }}_tests_torch_pipeline_gpu tests/pipelines - - - name: Failure short reports - if: ${{ failure() }} - continue-on-error: true - run: cat /transformers/reports/${{ matrix.machine_type }}_tests_torch_pipeline_gpu/failures_short.txt - - - name: Test suite reports artifacts - if: ${{ always() }} - uses: actions/upload-artifact@v3 - with: - name: ${{ matrix.machine_type }}_run_tests_torch_pipeline_gpu - path: /transformers/reports/${{ matrix.machine_type }}_tests_torch_pipeline_gpu + python3 -m pip uninstall -y deepspeed + DS_DISABLE_NINJA=1 DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install deepspeed --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check - run_pipelines_tf_gpu: - name: TensorFlow pipelines - strategy: - fail-fast: false - matrix: - machine_type: [single-gpu, multi-gpu] - runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, daily-ci] - container: - image: huggingface/transformers-tensorflow-gpu - options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ - needs: setup - steps: - - name: Update clone - working-directory: /transformers + # To avoid unknown test failures + - name: Pre build DeepSpeed *again* (for nightly & Past CI) + if: ${{ contains(inputs.ci_event, 'Nightly CI') || contains(inputs.ci_event, 'Past CI') }} + working-directory: ${{ inputs.working-directory-prefix }}/ run: | - git fetch && git checkout ${{ github.sha }} - - - 
name: Reinstall transformers in edit mode (remove the one installed during docker image build) - working-directory: /transformers - run: python3 -m pip uninstall -y transformers && python3 -m pip install -e . + python3 -m pip uninstall -y deepspeed + rm -rf DeepSpeed + git clone https://github.com/microsoft/DeepSpeed && cd DeepSpeed && rm -rf build + DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install . --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check - name: NVIDIA-SMI run: | nvidia-smi - name: Environment - working-directory: /transformers + working-directory: ${{ inputs.working-directory-prefix }}/transformers run: | python3 utils/print_env.py - name: Show installed libraries and their versions - working-directory: /transformers + working-directory: ${{ inputs.working-directory-prefix }}/transformers run: pip freeze - - name: Run all pipeline tests on GPU - working-directory: /transformers + - name: Run all tests on GPU + working-directory: ${{ inputs.working-directory-prefix }}/transformers run: | - python3 -m pytest -n 1 -v --dist=loadfile --make-reports=${{ matrix.machine_type }}_tests_tf_pipeline_gpu tests/pipelines + python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports tests/deepspeed tests/extended - name: Failure short reports - if: ${{ always() }} - run: | - cat /transformers/reports/${{ matrix.machine_type }}_tests_tf_pipeline_gpu/failures_short.txt + if: ${{ failure() }} + continue-on-error: true + run: cat ${{ inputs.working-directory-prefix }}/transformers/reports/${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports/failures_short.txt - - name: Test suite reports artifacts + - name: "Test suite reports artifacts: ${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports" if: ${{ always() }} - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: - name: ${{ matrix.machine_type }}_run_tests_tf_pipeline_gpu - path: /transformers/reports/${{ matrix.machine_type }}_tests_tf_pipeline_gpu + name: ${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports + path: ${{ inputs.working-directory-prefix }}/transformers/reports/${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports - run_all_tests_torch_cuda_extensions_gpu: - name: Torch CUDA extension tests + run_quantization_torch_gpu: + if: ${{ inputs.job == 'run_quantization_torch_gpu' }} + name: " " + needs: setup strategy: + max-parallel: 4 fail-fast: false matrix: + folders: ${{ fromJson(needs.setup.outputs.quantization_matrix) }} machine_type: [single-gpu, multi-gpu] - runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, daily-ci] - needs: setup + runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, '${{ inputs.runner }}'] container: - image: huggingface/transformers-pytorch-deepspeed-latest-gpu + image: huggingface/transformers-quantization-latest-gpu options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ steps: + - name: Echo folder ${{ matrix.folders }} + shell: bash + run: | + echo "${{ matrix.folders }}" + matrix_folders=${{ matrix.folders }} + matrix_folders=${matrix_folders/'quantization/'/'quantization_'} + echo "$matrix_folders" + echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV + - name: Update clone - working-directory: /workspace/transformers + working-directory: /transformers run: git fetch && git checkout ${{ github.sha }} - name: Reinstall transformers in edit mode (remove the one 
installed during docker image build) - working-directory: /workspace/transformers + working-directory: /transformers run: python3 -m pip uninstall -y transformers && python3 -m pip install -e . - - name: Remove cached torch extensions - run: rm -rf /github/home/.cache/torch_extensions/ - - # To avoid unknown test failures - - name: Pre build DeepSpeed *again* - working-directory: /workspace - run: | - python3 -m pip uninstall -y deepspeed - DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install deepspeed --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check - - name: NVIDIA-SMI run: | nvidia-smi - name: Environment - working-directory: /workspace/transformers + working-directory: /transformers run: | - python utils/print_env.py + python3 utils/print_env.py - name: Show installed libraries and their versions - working-directory: /workspace/transformers + working-directory: /transformers run: pip freeze - - name: Run all tests on GPU - working-directory: /workspace/transformers + - name: Run quantization tests on GPU + working-directory: /transformers run: | - python -m pytest -v --make-reports=${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu tests/deepspeed tests/extended + python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_run_quantization_torch_gpu_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }} - name: Failure short reports if: ${{ failure() }} continue-on-error: true - run: cat /workspace/transformers/reports/${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu/failures_short.txt + run: cat /transformers/reports/${{ matrix.machine_type }}_run_quantization_torch_gpu_${{ matrix.folders }}_test_reports/failures_short.txt - - name: Test suite reports artifacts + - name: "Test suite reports artifacts: ${{ matrix.machine_type }}_run_quantization_torch_gpu_${{ env.matrix_folders }}_test_reports" if: ${{ always() }} - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: - name: ${{ matrix.machine_type }}_run_tests_torch_cuda_extensions_gpu_test_reports - path: /workspace/transformers/reports/${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu + name: ${{ matrix.machine_type }}_run_quantization_torch_gpu_${{ env.matrix_folders }}_test_reports + path: /transformers/reports/${{ matrix.machine_type }}_run_quantization_torch_gpu_${{ matrix.folders }}_test_reports run_extract_warnings: + # Let's only do this for the job `run_models_gpu` to simplify the (already complex) logic. 
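The `run_extract_warnings` job above is now gated on `run_models_gpu` only. Its script is not shown in this hunk, so the following is just a rough sketch of the kind of post-processing it performs: scan the downloaded report artifacts for warnings and write `warnings_in_ci/selected_warnings.json`, the file uploaded at the end of the job. The `warnings.txt` report name is an assumption.

```python
# Rough sketch only; the real extraction script is not part of this hunk.
import json
from pathlib import Path


def collect_warnings(artifacts_dir: str = "warnings_in_ci") -> list[str]:
    """Gather warning lines from any warnings.txt (name assumed) in the downloaded artifacts."""
    selected = set()
    for report in Path(artifacts_dir).rglob("warnings.txt"):
        for line in report.read_text(errors="ignore").splitlines():
            if "warning" in line.lower():
                selected.add(line.strip())
    return sorted(selected)


if __name__ == "__main__":
    out = Path("warnings_in_ci") / "selected_warnings.json"
    out.write_text(json.dumps(collect_warnings(), indent=2))
```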
+ if: ${{ always() && inputs.job == 'run_models_gpu' }} name: Extract warnings in CI artifacts runs-on: ubuntu-22.04 - if: always() - needs: [ - setup, - run_tests_single_gpu, - run_tests_multi_gpu, - run_examples_gpu, - run_pipelines_tf_gpu, - run_pipelines_torch_gpu, - run_all_tests_torch_cuda_extensions_gpu - ] + needs: [setup, run_models_gpu] steps: - name: Checkout transformers - uses: actions/checkout@v3 + uses: actions/checkout@v4 with: fetch-depth: 2 @@ -426,7 +426,7 @@ jobs: - name: Create output directory run: mkdir warnings_in_ci - - uses: actions/download-artifact@v3 + - uses: actions/download-artifact@v4 with: path: warnings_in_ci @@ -441,58 +441,33 @@ jobs: - name: Upload artifact if: ${{ always() }} - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: name: warnings_in_ci path: warnings_in_ci/selected_warnings.json send_results: - name: Send results to webhook - runs-on: ubuntu-22.04 - if: always() + name: Slack Report needs: [ setup, - run_tests_single_gpu, - run_tests_multi_gpu, - run_examples_gpu, - run_pipelines_tf_gpu, + run_models_gpu, run_pipelines_torch_gpu, - run_all_tests_torch_cuda_extensions_gpu, + run_pipelines_tf_gpu, + run_examples_gpu, + run_torch_cuda_extensions_gpu, + run_quantization_torch_gpu, run_extract_warnings ] - steps: - - name: Preliminary job status - shell: bash - # For the meaning of these environment variables, see the job `Setup` - run: | - echo "Setup status: ${{ needs.setup.result }}" - - - uses: actions/checkout@v3 - - uses: actions/download-artifact@v3 - - name: Send message to Slack - env: - CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }} - CI_SLACK_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }} - CI_SLACK_CHANNEL_ID_DAILY: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY }} - CI_SLACK_CHANNEL_DUMMY_TESTS: ${{ secrets.CI_SLACK_CHANNEL_DUMMY_TESTS }} - CI_SLACK_REPORT_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY }} - ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }} - CI_EVENT: scheduled - CI_SHA: ${{ github.sha }} - CI_WORKFLOW_REF: ${{ github.workflow_ref }} - SETUP_STATUS: ${{ needs.setup.result }} - # We pass `needs.setup.outputs.matrix` as the argument. A processing in `notification_service.py` to change - # `models/bert` to `models_bert` is required, as the artifact names use `_` instead of `/`. - run: | - sudo apt-get install -y curl - pip install slack_sdk - pip show slack_sdk - python utils/notification_service.py "${{ needs.setup.outputs.matrix }}" - - # Upload complete failure tables, as they might be big and only truncated versions could be sent to Slack. - - name: Failure table artifacts - if: ${{ always() }} - uses: actions/upload-artifact@v3 - with: - name: prev_ci_results - path: prev_ci_results + if: ${{ always() }} + uses: ./.github/workflows/slack-report.yml + with: + job: ${{ inputs.job }} + # This would be `skipped` if `setup` is skipped. + setup_status: ${{ needs.setup.result }} + slack_report_channel: ${{ inputs.slack_report_channel }} + # This would be an empty string if `setup` is skipped. 
+      folder_slices: ${{ needs.setup.outputs.folder_slices }}
+      quantization_matrix: ${{ needs.setup.outputs.quantization_matrix }}
+      ci_event: ${{ inputs.ci_event }}
+
+    secrets: inherit
diff --git a/.github/workflows/slack-report.yml b/.github/workflows/slack-report.yml
new file mode 100644
index 00000000000000..ee2962ba89c37f
--- /dev/null
+++ b/.github/workflows/slack-report.yml
@@ -0,0 +1,101 @@
+name: CI slack report
+
+on:
+  workflow_call:
+    inputs:
+      job:
+        required: true
+        type: string
+      slack_report_channel:
+        required: true
+        type: string
+      setup_status:
+        required: true
+        type: string
+      folder_slices:
+        required: true
+        type: string
+      quantization_matrix:
+        required: true
+        type: string
+      ci_event:
+        required: true
+        type: string
+
+env:
+  TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN: ${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}
+
+jobs:
+  send_results:
+    name: Send results to webhook
+    runs-on: ubuntu-22.04
+    if: always()
+    steps:
+      - name: Preliminary job status
+        shell: bash
+        # For the meaning of these environment variables, see the job `Setup`
+        run: |
+          echo "Setup status: ${{ inputs.setup_status }}"
+
+      - uses: actions/checkout@v4
+      - uses: actions/download-artifact@v4
+      - name: Send message to Slack
+        if: ${{ inputs.job != 'run_quantization_torch_gpu' }}
+        env:
+          CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }}
+          CI_SLACK_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }}
+          CI_SLACK_CHANNEL_ID_DAILY: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY }}
+          CI_SLACK_CHANNEL_DUMMY_TESTS: ${{ secrets.CI_SLACK_CHANNEL_DUMMY_TESTS }}
+          SLACK_REPORT_CHANNEL: ${{ inputs.slack_report_channel }}
+          ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
+          CI_EVENT: ${{ inputs.ci_event }}
+          CI_SHA: ${{ github.sha }}
+          CI_WORKFLOW_REF: ${{ github.workflow_ref }}
+          CI_TEST_JOB: ${{ inputs.job }}
+          SETUP_STATUS: ${{ inputs.setup_status }}
+        # We pass `needs.setup.outputs.matrix` as the argument. A processing in `notification_service.py` to change
+        # `models/bert` to `models_bert` is required, as the artifact names use `_` instead of `/`.
+        # For a job that doesn't depend on (i.e. `needs`) `setup`, the value for `inputs.folder_slices` would be an
+        # empty string, and the called script still gets one argument (which is the empty string).
+        run: |
+          sudo apt-get install -y curl
+          pip install huggingface_hub
+          pip install slack_sdk
+          pip show slack_sdk
+          python utils/notification_service.py "${{ inputs.folder_slices }}"
+
+      # Upload complete failure tables, as they might be big and only truncated versions could be sent to Slack.
+      - name: Failure table artifacts
+        uses: actions/upload-artifact@v4
+        with:
+          name: ci_results_${{ inputs.job }}
+          path: ci_results_${{ inputs.job }}
+
+      - uses: actions/checkout@v4
+      - uses: actions/download-artifact@v4
+      - name: Send message to Slack for quantization workflow
+        if: ${{ inputs.job == 'run_quantization_torch_gpu' }}
+        env:
+          CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }}
+          ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
+          SLACK_REPORT_CHANNEL: ${{ inputs.slack_report_channel }}
+          CI_EVENT: ${{ inputs.ci_event }}
+          CI_SHA: ${{ github.sha }}
+          CI_TEST_JOB: ${{ inputs.job }}
+          SETUP_STATUS: ${{ inputs.setup_status }}
+        # We pass `needs.setup.outputs.quantization_matrix` as the argument. A processing in `notification_service_quantization.py` to change
+        # `quantization/bnb` to `quantization_bnb` is required, as the artifact names use `_` instead of `/`.
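The comments in the two Slack steps spell out the contract for the reporting scripts: they receive the matrix as a single command-line argument, that argument may be an empty string when `setup` was skipped, and folder names must be mapped from `/` to `_` to match artifact names. A minimal sketch of that argument handling; the function and its structure are illustrative, not the actual `notification_service*.py` code:

```python
# Illustrative sketch of the CLI contract described in the comments above.
import ast
import sys


def parse_matrix(arg: str) -> list[str]:
    """The setup job prints a Python list (e.g. "['quantization/bnb', ...]").
    An empty string means setup was skipped, so there is nothing to report per folder."""
    return ast.literal_eval(arg) if arg.strip() else []


if __name__ == "__main__":
    folders = parse_matrix(sys.argv[1] if len(sys.argv) > 1 else "")
    # Artifact names use "_" instead of "/", e.g. quantization/bnb -> quantization_bnb.
    print([f.replace("/", "_") for f in folders])
```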
+ run: | + sudo apt-get install -y curl + pip install huggingface_hub + pip install slack_sdk + pip show slack_sdk + python utils/notification_service_quantization.py "${{ inputs.quantization_matrix }}" + + # Upload complete failure tables, as they might be big and only truncated versions could be sent to Slack. + - name: Failure table artifacts + if: ${{ inputs.job == 'run_quantization_torch_gpu' }} + uses: actions/upload-artifact@v4 + with: + name: ci_results_${{ inputs.job }} + path: ci_results_${{ inputs.job }} \ No newline at end of file diff --git a/.github/workflows/ssh-runner.yml b/.github/workflows/ssh-runner.yml new file mode 100644 index 00000000000000..7b47c0f437fa85 --- /dev/null +++ b/.github/workflows/ssh-runner.yml @@ -0,0 +1,63 @@ +name: SSH into our runners + +on: + workflow_dispatch: + inputs: + runner_type: + description: 'Type of runner to test (a10 or t4)' + required: true + docker_image: + description: 'Name of the Docker image' + required: true + num_gpus: + description: 'Type of the number of gpus to use (`single` or `multi`)' + required: true + +env: + HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }} + HF_HOME: /mnt/cache + TRANSFORMERS_IS_CI: yes + OMP_NUM_THREADS: 8 + MKL_NUM_THREADS: 8 + RUN_SLOW: yes # For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access. # This token is created under the bot `hf-transformers-bot`. + SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }} + TF_FORCE_GPU_ALLOW_GROWTH: true + CUDA_VISIBLE_DEVICES: 0,1 + RUN_PT_TF_CROSS_TESTS: 1 + +jobs: + ssh_runner: + name: "SSH" + runs-on: ["${{ github.event.inputs.num_gpus }}-gpu", nvidia-gpu, "${{ github.event.inputs.runner_type }}", ci] + container: + image: ${{ github.event.inputs.docker_image }} + options: --gpus all --privileged --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ + + steps: + - name: Update clone + working-directory: /transformers + run: | + git fetch && git checkout ${{ github.sha }} + + - name: Cleanup + working-directory: /transformers + run: | + rm -rf tests/__pycache__ + rm -rf tests/models/__pycache__ + rm -rf reports + + - name: Show installed libraries and their versions + working-directory: /transformers + run: pip freeze + + - name: NVIDIA-SMI + run: | + nvidia-smi + + - name: Tailscale # In order to be able to SSH when a test fails + uses: huggingface/tailscale-action@main + with: + authkey: ${{ secrets.TAILSCALE_SSH_AUTHKEY }} + slackChannel: ${{ secrets.SLACK_CIFEEDBACK_CHANNEL }} + slackToken: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }} + waitForSSH: true diff --git a/.github/workflows/stale.yml b/.github/workflows/stale.yml index 4a7e94bac429db..d0dfeb8b4b7129 100644 --- a/.github/workflows/stale.yml +++ b/.github/workflows/stale.yml @@ -12,10 +12,10 @@ jobs: env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} steps: - - uses: actions/checkout@v3 + - uses: actions/checkout@v4 - name: Setup Python - uses: actions/setup-python@v4 + uses: actions/setup-python@v5 with: python-version: 3.8 diff --git a/.github/workflows/trufflehog.yml b/.github/workflows/trufflehog.yml new file mode 100644 index 00000000000000..29a11e9354dbb1 --- /dev/null +++ b/.github/workflows/trufflehog.yml @@ -0,0 +1,18 @@ +on: + push: + +name: Secret Leaks + +permissions: + contents: read + +jobs: + trufflehog: + runs-on: ubuntu-latest + steps: + - name: Checkout code + uses: actions/checkout@v4 + with: + fetch-depth: 0 + - name: Secret Scanning + uses: trufflesecurity/trufflehog@main diff --git a/.github/workflows/update_metdata.yml 
b/.github/workflows/update_metdata.yml index a2269e32e4d3cd..90cd73077ac0bb 100644 --- a/.github/workflows/update_metdata.yml +++ b/.github/workflows/update_metdata.yml @@ -14,7 +14,7 @@ jobs: shell: bash -l {0} steps: - - uses: actions/checkout@v3 + - uses: actions/checkout@v4 - name: Setup environment run: | diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 9ccfc46c2c148c..4d62a44ab250d5 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -40,8 +40,7 @@ There are several ways you can contribute to ๐Ÿค— Transformers: If you don't know where to start, there is a special [Good First Issue](https://github.com/huggingface/transformers/contribute) listing. It will give you a list of -open issues that are beginner-friendly and help you start contributing to open-source. Just comment on the issue that you'd like to work -on. +open issues that are beginner-friendly and help you start contributing to open-source. The best way to do that is to open a Pull Request and link it to the issue that you'd like to work on. We try to give priority to opened PRs as we can easily track the progress of the fix, and if the contributor does not have time anymore, someone else can take the PR over. For something slightly more challenging, you can also take a look at the [Good Second Issue](https://github.com/huggingface/transformers/labels/Good%20Second%20Issue) list. In general though, if you feel like you know what you're doing, go for it and we'll help you get there! ๐Ÿš€ @@ -49,7 +48,7 @@ For something slightly more challenging, you can also take a look at the [Good S ## Fixing outstanding issues -If you notice an issue with the existing code and have a fix in mind, feel free to [start contributing](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md/#create-a-pull-request) and open a Pull Request! +If you notice an issue with the existing code and have a fix in mind, feel free to [start contributing](#create-a-pull-request) and open a Pull Request! ## Submitting a bug-related issue or feature request @@ -62,7 +61,10 @@ feedback. The ๐Ÿค— Transformers library is robust and reliable thanks to users who report the problems they encounter. Before you report an issue, we would really appreciate it if you could **make sure the bug was not -already reported** (use the search bar on GitHub under Issues). Your issue should also be related to bugs in the library itself, and not your code. If you're unsure whether the bug is in your code or the library, please ask in the [forum](https://discuss.huggingface.co/) first. This helps us respond quicker to fixing issues related to the library versus general questions. +already reported** (use the search bar on GitHub under Issues). Your issue should also be related to bugs in the library itself, and not your code. If you're unsure whether the bug is in your code or the library, please ask in the [forum](https://discuss.huggingface.co/) or on our [discord](https://discord.com/invite/hugging-face-879548962464493619) first. This helps us respond quicker to fixing issues related to the library versus general questions. + +> [!TIP] +> We have a [docs bot](https://huggingface.co/spaces/huggingchat/hf-docs-chat), and we highly encourage you to ask all your questions there. 
There is always a chance your bug can be fixed with a simple flag ๐Ÿ‘พ๐Ÿ”ซ Once you've confirmed the bug hasn't already been reported, please include the following information in your issue so we can quickly resolve it: @@ -103,7 +105,7 @@ We have added [templates](https://github.com/huggingface/transformers/tree/main/ ## Do you want to implement a new model? -New models are constantly released and if you want to implement a new model, please provide the following information +New models are constantly released and if you want to implement a new model, please provide the following information: * A short description of the model and a link to the paper. * Link to the implementation if it is open-sourced. @@ -111,7 +113,7 @@ New models are constantly released and if you want to implement a new model, ple If you are willing to contribute the model yourself, let us know so we can help you add it to ๐Ÿค— Transformers! -We have added a [detailed guide and templates](https://github.com/huggingface/transformers/tree/main/templates) to help you get started with adding a new model, and we also have a more technical guide for [how to add a model to ๐Ÿค— Transformers](https://huggingface.co/docs/transformers/add_new_model). +We have a technical guide for [how to add a model to ๐Ÿค— Transformers](https://huggingface.co/docs/transformers/add_new_model). ## Do you want to add documentation? @@ -130,7 +132,7 @@ You will need basic `git` proficiency to contribute to manual. Type `git --help` in a shell and enjoy! If you prefer books, [Pro Git](https://git-scm.com/book/en/v2) is a very good reference. -You'll need **[Python 3.8]((https://github.com/huggingface/transformers/blob/main/setup.py#L426))** or above to contribute to ๐Ÿค— Transformers. Follow the steps below to start contributing: +You'll need **[Python 3.8](https://github.com/huggingface/transformers/blob/main/setup.py#L449)** or above to contribute to ๐Ÿค— Transformers. Follow the steps below to start contributing: 1. Fork the [repository](https://github.com/huggingface/transformers) by clicking on the **[Fork](https://github.com/huggingface/transformers/fork)** button on the repository's page. This creates a copy of the code @@ -161,7 +163,7 @@ You'll need **[Python 3.8]((https://github.com/huggingface/transformers/blob/mai If ๐Ÿค— Transformers was already installed in the virtual environment, remove it with `pip uninstall transformers` before reinstalling it in editable mode with the `-e` flag. - + Depending on your OS, and since the number of optional dependencies of Transformers is growing, you might get a failure with this command. If that's the case make sure to install the Deep Learning framework you are working with (PyTorch, TensorFlow and/or Flax) then do: @@ -220,7 +222,7 @@ You'll need **[Python 3.8]((https://github.com/huggingface/transformers/blob/mai If you're modifying documents under the `docs/source` directory, make sure the documentation can still be built. This check will also run in the CI when you open a pull request. To run a local check make sure you install the documentation builder: - + ```bash pip install ".[docs]" ``` @@ -261,7 +263,7 @@ You'll need **[Python 3.8]((https://github.com/huggingface/transformers/blob/mai If you've already opened a pull request, you'll need to force push with the `--force` flag. Otherwise, if the pull request hasn't been opened yet, you can just push your changes normally. -6. Now you can go to your fork of the repository on GitHub and click on **Pull Request** to open a pull request. 
Make sure you tick off all the boxes on our [checklist](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md/#pull-request-checklist) below. When you're ready, you can send your changes to the project maintainers for review. +6. Now you can go to your fork of the repository on GitHub and click on **Pull Request** to open a pull request. Make sure you tick off all the boxes on our [checklist](#pull-request-checklist) below. When you're ready, you can send your changes to the project maintainers for review. 7. It's ok if maintainers request changes, it happens to our core contributors too! So everyone can see the changes in the pull request, work in your local @@ -295,7 +297,7 @@ repository such as [`hf-internal-testing`](https://huggingface.co/hf-internal-te to host these files and reference them by URL. We recommend placing documentation related images in the following repository: [huggingface/documentation-images](https://huggingface.co/datasets/huggingface/documentation-images). -You can open a PR on this dataset repostitory and ask a Hugging Face member to merge it. +You can open a PR on this dataset repository and ask a Hugging Face member to merge it. For more information about the checks run on a pull request, take a look at our [Checks on a Pull Request](https://huggingface.co/docs/transformers/pr_checks) guide. @@ -306,7 +308,7 @@ the [tests](https://github.com/huggingface/transformers/tree/main/tests) folder [examples](https://github.com/huggingface/transformers/tree/main/examples) folder. We like `pytest` and `pytest-xdist` because it's faster. From the root of the -repository, specify a *path to a subfolder or a test file* to run the test. +repository, specify a *path to a subfolder or a test file* to run the test: ```bash python -m pytest -n auto --dist=loadfile -s -v ./tests/models/my_new_model @@ -339,12 +341,12 @@ RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./tests/models/my_ne RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./examples/pytorch/text-classification ``` -Like the slow tests, there are other environment variables available which not enabled by default during testing: +Like the slow tests, there are other environment variables available which are not enabled by default during testing: - `RUN_CUSTOM_TOKENIZERS`: Enables tests for custom tokenizers. - `RUN_PT_FLAX_CROSS_TESTS`: Enables tests for PyTorch + Flax integration. - `RUN_PT_TF_CROSS_TESTS`: Enables tests for TensorFlow + PyTorch integration. -More environment variables and additional information can be found in the [testing_utils.py](src/transformers/testing_utils.py). +More environment variables and additional information can be found in the [testing_utils.py](https://github.com/huggingface/transformers/blob/main/src/transformers/testing_utils.py). ๐Ÿค— Transformers uses `pytest` as a test runner only. It doesn't use any `pytest`-specific features in the test suite itself. @@ -378,7 +380,7 @@ One way to run the `make` command on Windows is with MSYS2: 3. Run in the shell: `pacman -Syu` and install `make` with `pacman -S make`. 4. Add `C:\msys64\usr\bin` to your PATH environment variable. -You can now use `make` from any terminal (Powershell, cmd.exe, etc.)! ๐ŸŽ‰ +You can now use `make` from any terminal (PowerShell, cmd.exe, etc.)! ๐ŸŽ‰ ### Sync a forked repository with upstream main (the Hugging Face repository) @@ -387,9 +389,9 @@ When updating the main branch of a forked repository, please follow these steps 1. 
When possible, avoid syncing with the upstream using a branch and PR on the forked repository. Instead, merge directly into the forked main. 2. If a PR is absolutely necessary, use the following steps after checking out your branch: -```bash -git checkout -b your-branch-for-syncing -git pull --squash --no-commit upstream main -git commit -m '' -git push --set-upstream origin your-branch-for-syncing -``` + ```bash + git checkout -b your-branch-for-syncing + git pull --squash --no-commit upstream main + git commit -m '' + git push --set-upstream origin your-branch-for-syncing + ``` diff --git a/Makefile b/Makefile index f8589089c2bfc5..cfa40b7bd6ee6e 100644 --- a/Makefile +++ b/Makefile @@ -1,16 +1,18 @@ -.PHONY: deps_table_update modified_only_fixup extra_style_checks quality style fixup fix-copies test test-examples +.PHONY: deps_table_update modified_only_fixup extra_style_checks quality style fixup fix-copies test test-examples benchmark # make sure to test the local checkout in scripts and not the pre-installed one (don't use quotes!) export PYTHONPATH = src check_dirs := examples tests src utils +exclude_folders := "" + modified_only_fixup: $(eval modified_py_files := $(shell python utils/get_modified_files.py $(check_dirs))) @if test -n "$(modified_py_files)"; then \ echo "Checking/fixing $(modified_py_files)"; \ - ruff check $(modified_py_files) --fix; \ - ruff format $(modified_py_files);\ + ruff check $(modified_py_files) --fix --exclude $(exclude_folders); \ + ruff format $(modified_py_files) --exclude $(exclude_folders);\ else \ echo "No library .py files were modified"; \ fi @@ -42,18 +44,20 @@ repo-consistency: python utils/check_config_attributes.py python utils/check_doctest_list.py python utils/update_metadata.py --check-only - python utils/check_task_guides.py python utils/check_docstrings.py python utils/check_support_list.py # this target runs checks on all files quality: + @python -c "from transformers import *" || (echo '๐Ÿšจ import failed, this means you introduced unprotected imports! 
๐Ÿšจ'; exit 1) ruff check $(check_dirs) setup.py conftest.py ruff format --check $(check_dirs) setup.py conftest.py python utils/custom_init_isort.py --check_only python utils/sort_auto_mappings.py --check_only python utils/check_doc_toc.py + python utils/check_docstrings.py --check_all + # Format source code automatically and check is there are any problems left that need manual fixing @@ -65,8 +69,8 @@ extra_style_checks: # this target runs checks on all files and potentially modifies some of them style: - ruff check $(check_dirs) setup.py conftest.py --fix - ruff format $(check_dirs) setup.py conftest.py + ruff check $(check_dirs) setup.py conftest.py --fix --exclude $(exclude_folders) + ruff format $(check_dirs) setup.py conftest.py --exclude $(exclude_folders) ${MAKE} autogenerate_code ${MAKE} extra_style_checks @@ -81,7 +85,6 @@ fix-copies: python utils/check_table.py --fix_and_overwrite python utils/check_dummies.py --fix_and_overwrite python utils/check_doctest_list.py --fix_and_overwrite - python utils/check_task_guides.py --fix_and_overwrite python utils/check_docstrings.py --fix_and_overwrite # Run tests for the library @@ -94,6 +97,11 @@ test: test-examples: python -m pytest -n auto --dist=loadfile -s -v ./examples/pytorch/ +# Run benchmark + +benchmark: + python3 benchmark/benchmark.py --config-dir benchmark/config --config-name generation --commit=diff backend.model=google/gemma-2b backend.cache_implementation=null,static backend.torch_compile=false,true --multirun + # Run tests for SageMaker DLC release test-sagemaker: # install sagemaker dependencies in advance with pip install .[sagemaker] diff --git a/README.md b/README.md index 0a45a99fd6bf7d..f80835534496e3 100644 --- a/README.md +++ b/README.md @@ -25,36 +25,30 @@ limitations under the License.
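The Makefile hunks above add an `exclude_folders` variable to the ruff invocations, an unprotected-import check to `quality`, and a new `benchmark` target. A minimal sketch of the resulting workflow from the repository root, assuming an editable install with the development extras (e.g. `pip install -e ".[dev]"`):

```bash
# Reformat and lint the tree with ruff (exclude_folders is empty by default)
make style

# Full lint/consistency suite, including the new check that `from transformers import *` still works
make quality

# Regenerate checked-in artifacts (dummies, doc tables, docstrings, doctest list)
make fix-copies

# New target: run the generation benchmark using benchmark/config and google/gemma-2b
make benchmark
```

The `benchmark` target only wraps the `benchmark/benchmark.py` multirun shown in the hunk, so it likely needs the optional benchmarking dependencies on top of the base install.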

- - Build - - - GitHub - - - Documentation - - - GitHub release - - - Contributor Covenant - + Build + GitHub + Documentation + GitHub release + Contributor Covenant DOI

English | - 简体中文 | - 繁體中文 | - 한국어 | - Español | - 日本語 | - हिन्दी | - Русский | - Português | - తెలుగు | + 简体中文 | + 繁體中文 | + 한국어 | + Español | + 日本語 | + हिन्दी | + Русский | + Português | + తెలుగు | + Français | + Deutsch | + Tiếng Việt | + العربية |

@@ -86,35 +80,39 @@ You can test most of our models directly on their pages from the [model hub](htt Here are a few examples: - In Natural Language Processing: -- [Masked word completion with BERT](https://huggingface.co/bert-base-uncased?text=Paris+is+the+%5BMASK%5D+of+France) -- [Name Entity Recognition with Electra](https://huggingface.co/dbmdz/electra-large-discriminator-finetuned-conll03-english?text=My+name+is+Sarah+and+I+live+in+London+city) -- [Text generation with GPT-2](https://huggingface.co/gpt2?text=A+long+time+ago%2C+) -- [Natural Language Inference with RoBERTa](https://huggingface.co/roberta-large-mnli?text=The+dog+was+lost.+Nobody+lost+any+animal) +In Natural Language Processing: +- [Masked word completion with BERT](https://huggingface.co/google-bert/bert-base-uncased?text=Paris+is+the+%5BMASK%5D+of+France) +- [Named Entity Recognition with Electra](https://huggingface.co/dbmdz/electra-large-discriminator-finetuned-conll03-english?text=My+name+is+Sarah+and+I+live+in+London+city) +- [Text generation with Mistral](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) +- [Natural Language Inference with RoBERTa](https://huggingface.co/FacebookAI/roberta-large-mnli?text=The+dog+was+lost.+Nobody+lost+any+animal) - [Summarization with BART](https://huggingface.co/facebook/bart-large-cnn?text=The+tower+is+324+metres+%281%2C063+ft%29+tall%2C+about+the+same+height+as+an+81-storey+building%2C+and+the+tallest+structure+in+Paris.+Its+base+is+square%2C+measuring+125+metres+%28410+ft%29+on+each+side.+During+its+construction%2C+the+Eiffel+Tower+surpassed+the+Washington+Monument+to+become+the+tallest+man-made+structure+in+the+world%2C+a+title+it+held+for+41+years+until+the+Chrysler+Building+in+New+York+City+was+finished+in+1930.+It+was+the+first+structure+to+reach+a+height+of+300+metres.+Due+to+the+addition+of+a+broadcasting+aerial+at+the+top+of+the+tower+in+1957%2C+it+is+now+taller+than+the+Chrysler+Building+by+5.2+metres+%2817+ft%29.+Excluding+transmitters%2C+the+Eiffel+Tower+is+the+second+tallest+free-standing+structure+in+France+after+the+Millau+Viaduct) -- [Question answering with DistilBERT](https://huggingface.co/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species) -- [Translation with 
T5](https://huggingface.co/t5-base?text=My+name+is+Wolfgang+and+I+live+in+Berlin) +- [Question answering with DistilBERT](https://huggingface.co/distilbert/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species) +- [Translation with T5](https://huggingface.co/google-t5/t5-base?text=My+name+is+Wolfgang+and+I+live+in+Berlin) In Computer Vision: - [Image classification with ViT](https://huggingface.co/google/vit-base-patch16-224) - [Object Detection with DETR](https://huggingface.co/facebook/detr-resnet-50) - [Semantic Segmentation with SegFormer](https://huggingface.co/nvidia/segformer-b0-finetuned-ade-512-512) -- [Panoptic Segmentation with MaskFormer](https://huggingface.co/facebook/maskformer-swin-small-coco) -- [Depth Estimation with DPT](https://huggingface.co/docs/transformers/model_doc/dpt) +- [Panoptic Segmentation with Mask2Former](https://huggingface.co/facebook/mask2former-swin-large-coco-panoptic) +- [Depth Estimation with Depth Anything](https://huggingface.co/docs/transformers/main/model_doc/depth_anything) - [Video Classification with VideoMAE](https://huggingface.co/docs/transformers/model_doc/videomae) - [Universal Segmentation with OneFormer](https://huggingface.co/shi-labs/oneformer_ade20k_dinat_large) In Audio: -- [Automatic Speech Recognition with Wav2Vec2](https://huggingface.co/facebook/wav2vec2-base-960h) +- [Automatic Speech Recognition with Whisper](https://huggingface.co/openai/whisper-large-v3) - [Keyword Spotting with Wav2Vec2](https://huggingface.co/superb/wav2vec2-base-superb-ks) - [Audio Classification with Audio Spectrogram Transformer](https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593) In Multimodal tasks: - [Table Question Answering with TAPAS](https://huggingface.co/google/tapas-base-finetuned-wtq) - [Visual Question Answering with ViLT](https://huggingface.co/dandelin/vilt-b32-finetuned-vqa) -- [Zero-shot Image Classification with CLIP](https://huggingface.co/openai/clip-vit-large-patch14) +- [Image captioning with LLaVa](https://huggingface.co/llava-hf/llava-1.5-7b-hf) +- [Zero-shot Image Classification with SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384) - [Document Question Answering with LayoutLM](https://huggingface.co/impira/layoutlm-document-qa) - [Zero-shot Video Classification with 
X-CLIP](https://huggingface.co/docs/transformers/model_doc/xclip) +- [Zero-shot Object Detection with OWLv2](https://huggingface.co/docs/transformers/en/model_doc/owlv2) +- [Zero-shot Image Segmentation with CLIPSeg](https://huggingface.co/docs/transformers/model_doc/clipseg) +- [Automatic Mask Generation with SAM](https://huggingface.co/docs/transformers/model_doc/sam) ## 100 projects using Transformers @@ -195,8 +193,8 @@ In addition to `pipeline`, to download and use any of the pretrained models on y ```python >>> from transformers import AutoTokenizer, AutoModel ->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased") ->>> model = AutoModel.from_pretrained("bert-base-uncased") +>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased") +>>> model = AutoModel.from_pretrained("google-bert/bert-base-uncased") >>> inputs = tokenizer("Hello world!", return_tensors="pt") >>> outputs = model(**inputs) @@ -206,8 +204,8 @@ And here is the equivalent code for TensorFlow: ```python >>> from transformers import AutoTokenizer, TFAutoModel ->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased") ->>> model = TFAutoModel.from_pretrained("bert-base-uncased") +>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased") +>>> model = TFAutoModel.from_pretrained("google-bert/bert-base-uncased") >>> inputs = tokenizer("Hello world!", return_tensors="tf") >>> outputs = model(**inputs) @@ -228,7 +226,7 @@ The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/sta 1. Lower compute costs, smaller carbon footprint: - Researchers can share trained models instead of always retraining. - Practitioners can reduce compute time and production costs. - - Dozens of architectures with over 60,000 pretrained models across all modalities. + - Dozens of architectures with over 400,000 pretrained models across all modalities. 1. Choose the right framework for every part of a model's lifetime: - Train state-of-the-art models in 3 lines of code. @@ -250,7 +248,7 @@ The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/sta ### With pip -This repository is tested on Python 3.8+, Flax 0.4.1+, PyTorch 1.10+, and TensorFlow 2.6+. +This repository is tested on Python 3.8+, Flax 0.4.1+, PyTorch 1.11+, and TensorFlow 2.6+. You should install ๐Ÿค— Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're unfamiliar with Python virtual environments, check out the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/). @@ -269,14 +267,14 @@ If you'd like to play with the examples or need the bleeding edge of the code an ### With conda -Since Transformers version v4.0.0, we now have a conda channel: `huggingface`. - ๐Ÿค— Transformers can be installed using conda as follows: ```shell script -conda install -c huggingface transformers +conda install conda-forge::transformers ``` +> **_NOTE:_** Installing `transformers` from the `huggingface` channel is deprecated. + Follow the installation pages of Flax, PyTorch or TensorFlow to see how to install them with conda. > **_NOTE:_** On Windows, you may be prompted to activate Developer Mode in order to benefit from caching. If this is not an option for you, please let us know in [this issue](https://github.com/huggingface/huggingface_hub/issues/1062). 
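To round out the quick-start snippets above, here is a minimal sketch of the higher-level `pipeline` API the README references; the default checkpoint is resolved automatically and the score shown is only illustrative:

```python
>>> from transformers import pipeline

>>> # Allocate a pipeline for sentiment analysis; a default checkpoint is downloaded on first use
>>> classifier = pipeline("sentiment-analysis")
>>> classifier("We are very happy to introduce pipeline to the transformers repository.")
[{'label': 'POSITIVE', 'score': 0.9997}]
```

The same factory accepts other task strings (e.g. `"text-classification"`, `"automatic-speech-recognition"`) and an optional `model=` argument to pin a specific checkpoint instead of the task default.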
@@ -287,253 +285,7 @@ Follow the installation pages of Flax, PyTorch or TensorFlow to see how to insta Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen) -๐Ÿค— Transformers currently provides the following architectures (see [here](https://huggingface.co/docs/transformers/model_summary) for a high-level summary of each them): - -1. **[ALBERT](https://huggingface.co/docs/transformers/model_doc/albert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. -1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (from Google Research) released with the paper [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918) by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig. -1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (from BAAI) released with the paper [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell. -1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (from MIT) released with the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Yuan Gong, Yu-An Chung, James Glass. -1. **[Autoformer](https://huggingface.co/docs/transformers/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long. -1. **[Bark](https://huggingface.co/docs/transformers/model_doc/bark)** (from Suno) released in the repository [suno-ai/bark](https://github.com/suno-ai/bark) by Suno AI team. -1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. -1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (from ร‰cole polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis. -1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen. -1. **[BEiT](https://huggingface.co/docs/transformers/model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong, Furu Wei. -1. 
**[BERT](https://huggingface.co/docs/transformers/model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. -1. **[BERT For Sequence Generation](https://huggingface.co/docs/transformers/model_doc/bert-generation)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn. -1. **[BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen. -1. **[BigBird-Pegasus](https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. -1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. -1. **[BioGpt](https://huggingface.co/docs/transformers/model_doc/biogpt)** (from Microsoft Research AI4Science) released with the paper [BioGPT: generative pre-trained transformer for biomedical text generation and mining](https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbac409/6713511?guestAccessKey=a66d9b5d-4f83-4017-bb52-405815c907b9) by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu. -1. **[BiT](https://huggingface.co/docs/transformers/model_doc/bit)** (from Google AI) released with the paper [Big Transfer (BiT): General Visual Representation Learning](https://arxiv.org/abs/1912.11370) by Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby. -1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston. -1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston. -1. **[BLIP](https://huggingface.co/docs/transformers/model_doc/blip)** (from Salesforce) released with the paper [BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation](https://arxiv.org/abs/2201.12086) by Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi. -1. 
**[BLIP-2](https://huggingface.co/docs/transformers/model_doc/blip-2)** (from Salesforce) released with the paper [BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models](https://arxiv.org/abs/2301.12597) by Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi. -1. **[BLOOM](https://huggingface.co/docs/transformers/model_doc/bloom)** (from BigScience workshop) released by the [BigScience Workshop](https://bigscience.huggingface.co/). -1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry. -1. **[BridgeTower](https://huggingface.co/docs/transformers/model_doc/bridgetower)** (from Harbin Institute of Technology/Microsoft Research Asia/Intel Labs) released with the paper [BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning](https://arxiv.org/abs/2206.08657) by Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan. -1. **[BROS](https://huggingface.co/docs/transformers/model_doc/bros)** (from NAVER CLOVA) released with the paper [BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents](https://arxiv.org/abs/2108.04539) by Teakgyu Hong, Donghyun Kim, Mingi Ji, Wonseok Hwang, Daehyun Nam, Sungrae Park. -1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel. -1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suรกrez*, Yoann Dupont, Laurent Romary, ร‰ric Villemonte de la Clergerie, Djamรฉ Seddah and Benoรฎt Sagot. -1. **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting. -1. **[Chinese-CLIP](https://huggingface.co/docs/transformers/model_doc/chinese_clip)** (from OFA-Sys) released with the paper [Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese](https://arxiv.org/abs/2211.01335) by An Yang, Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou, Chang Zhou. -1. **[CLAP](https://huggingface.co/docs/transformers/model_doc/clap)** (from LAION-AI) released with the paper [Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation](https://arxiv.org/abs/2211.06687) by Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov. -1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever. -1. 
**[CLIPSeg](https://huggingface.co/docs/transformers/model_doc/clipseg)** (from University of Gรถttingen) released with the paper [Image Segmentation Using Text and Image Prompts](https://arxiv.org/abs/2112.10003) by Timo Lรผddecke and Alexander Ecker. -1. **[CLVP](https://huggingface.co/docs/transformers/model_doc/clvp)** released with the paper [Better speech synthesis through scaling](https://arxiv.org/abs/2305.07243) by James Betker. -1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong. -1. **[CodeLlama](https://huggingface.co/docs/transformers/model_doc/llama_code)** (from MetaAI) released with the paper [Code Llama: Open Foundation Models for Code](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/) by Baptiste Roziรจre, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jรฉrรฉmy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Dรฉfossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve. -1. **[Conditional DETR](https://huggingface.co/docs/transformers/model_doc/conditional_detr)** (from Microsoft Research Asia) released with the paper [Conditional DETR for Fast Training Convergence](https://arxiv.org/abs/2108.06152) by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang. -1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan. -1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie. -1. **[ConvNeXTV2](https://huggingface.co/docs/transformers/model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie. -1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun. -1. **[CPM-Ant](https://huggingface.co/docs/transformers/model_doc/cpmant)** (from OpenBMB) released by the [OpenBMB](https://www.openbmb.org/). -1. 
**[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher. -1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang. -1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli. -1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. -1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. -1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch. -1. **[Deformable DETR](https://huggingface.co/docs/transformers/model_doc/deformable_detr)** (from SenseTime Research) released with the paper [Deformable DETR: Deformable Transformers for End-to-End Object Detection](https://arxiv.org/abs/2010.04159) by Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai. -1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervรฉ Jรฉgou. -1. **[DePlot](https://huggingface.co/docs/transformers/model_doc/deplot)** (from Google AI) released with the paper [DePlot: One-shot visual language reasoning by plot-to-table translation](https://arxiv.org/abs/2212.10505) by Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun. -1. **[DETA](https://huggingface.co/docs/transformers/model_doc/deta)** (from The University of Texas at Austin) released with the paper [NMS Strikes Back](https://arxiv.org/abs/2212.06137) by Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krรคhenbรผhl. -1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko. -1. 
**[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan. -1. **[DiNAT](https://huggingface.co/docs/transformers/model_doc/dinat)** (from SHI Labs) released with the paper [Dilated Neighborhood Attention Transformer](https://arxiv.org/abs/2209.15001) by Ali Hassani and Humphrey Shi. -1. **[DINOv2](https://huggingface.co/docs/transformers/model_doc/dinov2)** (from Meta AI) released with the paper [DINOv2: Learning Robust Visual Features without Supervision](https://arxiv.org/abs/2304.07193) by Maxime Oquab, Timothรฉe Darcet, Thรฉo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervรฉ Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski. -1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation) and a German version of DistilBERT. -1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei. -1. **[Donut](https://huggingface.co/docs/transformers/model_doc/donut)** (from NAVER), released together with the paper [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park. -1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas OฤŸuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. -1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by Renรฉ Ranftl, Alexey Bochkovskiy, Vladlen Koltun. -1. **[EfficientFormer](https://huggingface.co/docs/transformers/model_doc/efficientformer)** (from Snap Research) released with the paper [EfficientFormer: Vision Transformers at MobileNetSpeed](https://arxiv.org/abs/2206.01191) by Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren. -1. 
**[EfficientNet](https://huggingface.co/docs/transformers/model_doc/efficientnet)** (from Google Brain) released with the paper [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946) by Mingxing Tan, Quoc V. Le. -1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning. -1. **[EnCodec](https://huggingface.co/docs/transformers/model_doc/encodec)** (from Meta AI) released with the paper [High Fidelity Neural Audio Compression](https://arxiv.org/abs/2210.13438) by Alexandre Dรฉfossez, Jade Copet, Gabriel Synnaeve, Yossi Adi. -1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn. -1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu. -1. **[ErnieM](https://huggingface.co/docs/transformers/model_doc/ernie_m)** (from Baidu) released with the paper [ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora](https://arxiv.org/abs/2012.15674) by Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang. -1. **[ESM](https://huggingface.co/docs/transformers/model_doc/esm)** (from Meta AI) are transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. **ESM-1v** was released with the paper [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648) by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. **ESM-2 and ESMFold** were released with the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives. -1. **[Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)** (from Technology Innovation Institute) by Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme. -1. 
**[FLAN-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5)** (from Google AI) released in the repository [google-research/t5x](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints) by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei -1. **[FLAN-UL2](https://huggingface.co/docs/transformers/model_doc/flan-ul2)** (from Google AI) released in the repository [google-research/t5x](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-ul2-checkpoints) by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei -1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loรฏc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoรฎt Crabbรฉ, Laurent Besacier, Didier Schwab. -1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela. -1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon. -1. **[FocalNet](https://huggingface.co/docs/transformers/model_doc/focalnet)** (from Microsoft Research) released with the paper [Focal Modulation Networks](https://arxiv.org/abs/2203.11926) by Jianwei Yang, Chunyuan Li, Xiyang Dai, Lu Yuan, Jianfeng Gao. -1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le. -1. **[Fuyu](https://huggingface.co/docs/transformers/model_doc/fuyu)** (from ADEPT) Rohan Bavishi, Erich Elsen, Curtis Hawthorne, Maxwell Nye, Augustus Odena, Arushi Somani, SaฤŸnak TaลŸฤฑrlar. Released with the paper [blog post](https://www.adept.ai/blog/fuyu-8b) -1. **[GIT](https://huggingface.co/docs/transformers/model_doc/git)** (from Microsoft Research) released with the paper [GIT: A Generative Image-to-text Transformer for Vision and Language](https://arxiv.org/abs/2205.14100) by Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang. -1. 
**[GLPN](https://huggingface.co/docs/transformers/model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim. -1. **[GPT](https://huggingface.co/docs/transformers/model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://openai.com/research/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. -1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy. -1. **[GPT NeoX](https://huggingface.co/docs/transformers/model_doc/gpt_neox)** (from EleutherAI) released with the paper [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://arxiv.org/abs/2204.06745) by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach -1. **[GPT NeoX Japanese](https://huggingface.co/docs/transformers/model_doc/gpt_neox_japanese)** (from ABEJA) released by Shinya Otani, Takayoshi Makabe, Anuj Arora, and Kyo Hattori. -1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://openai.com/research/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**. -1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki. -1. **[GPT-Sw3](https://huggingface.co/docs/transformers/model_doc/gpt-sw3)** (from AI-Sweden) released with the paper [Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish](http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.376.pdf) by Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey ร–hman, Fredrik Carlsson, Magnus Sahlgren. -1. **[GPTBigCode](https://huggingface.co/docs/transformers/model_doc/gpt_bigcode)** (from BigCode) released with the paper [SantaCoder: don't reach for the stars!](https://arxiv.org/abs/2301.03988) by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo Garcรญa del Rรญo, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra. -1. 
**[GPTSAN-japanese](https://huggingface.co/docs/transformers/model_doc/gptsan-japanese)** released in the repository [tanreinama/GPTSAN](https://github.com/tanreinama/GPTSAN/blob/main/report/model.md) by Toshiyuki Sakamoto(tanreinama). -1. **[Graphormer](https://huggingface.co/docs/transformers/model_doc/graphormer)** (from Microsoft) released with the paper [Do Transformers Really Perform Bad for Graph Representation?](https://arxiv.org/abs/2106.05234) by Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu. -1. **[GroupViT](https://huggingface.co/docs/transformers/model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang. -1. **[HerBERT](https://huggingface.co/docs/transformers/model_doc/herbert)** (from Allegro.pl, AGH University of Science and Technology) released with the paper [KLEJ: Comprehensive Benchmark for Polish Language Understanding](https://www.aclweb.org/anthology/2020.acl-main.111.pdf) by Piotr Rybak, Robert Mroczkowski, Janusz Tracz, Ireneusz Gawlik. -1. **[Hubert](https://huggingface.co/docs/transformers/model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed. -1. **[I-BERT](https://huggingface.co/docs/transformers/model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer. -1. **[IDEFICS](https://huggingface.co/docs/transformers/model_doc/idefics)** (from HuggingFace) released with the paper [OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents](https://huggingface.co/papers/2306.16527) by Hugo Laurenรงon, Lucile Saulnier, Lรฉo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh. -1. **[ImageGPT](https://huggingface.co/docs/transformers/model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever. -1. **[Informer](https://huggingface.co/docs/transformers/model_doc/informer)** (from Beihang University, UC Berkeley, Rutgers University, SEDD Company) released with the paper [Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting](https://arxiv.org/abs/2012.07436) by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. -1. **[InstructBLIP](https://huggingface.co/docs/transformers/model_doc/instructblip)** (from Salesforce) released with the paper [InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning](https://arxiv.org/abs/2305.06500) by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi. -1. 
**[Jukebox](https://huggingface.co/docs/transformers/model_doc/jukebox)** (from OpenAI) released with the paper [Jukebox: A Generative Model for Music](https://arxiv.org/pdf/2005.00341.pdf) by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever. -1. **[KOSMOS-2](https://huggingface.co/docs/transformers/model_doc/kosmos-2)** (from Microsoft Research Asia) released with the paper [Kosmos-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/abs/2306.14824) by Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei. -1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou. -1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou. -1. **[LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)** (from Microsoft Research Asia) released with the paper [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei. -1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutxlm)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei. -1. **[LED](https://huggingface.co/docs/transformers/model_doc/led)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan. -1. **[LeViT](https://huggingface.co/docs/transformers/model_doc/levit)** (from Meta AI) released with the paper [LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference](https://arxiv.org/abs/2104.01136) by Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervรฉ Jรฉgou, Matthijs Douze. -1. **[LiLT](https://huggingface.co/docs/transformers/model_doc/lilt)** (from South China University of Technology) released with the paper [LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding](https://arxiv.org/abs/2202.13669) by Jiapeng Wang, Lianwen Jin, Kai Ding. -1. **[LLaMA](https://huggingface.co/docs/transformers/model_doc/llama)** (from The FAIR team of Meta AI) released with the paper [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971) by Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothรฉe Lacroix, Baptiste Roziรจre, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample. -1. 
**[Llama2](https://huggingface.co/docs/transformers/model_doc/llama2)** (from The FAIR team of Meta AI) released with the paper [Llama2: Open Foundation and Fine-Tuned Chat Models](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/) by Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushka rMishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing EllenTan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom. -1. **[LLaVa](https://huggingface.co/docs/transformers/model_doc/llava)** (from Microsoft Research & University of Wisconsin-Madison) released with the paper [Visual Instruction Tuning](https://arxiv.org/abs/2304.08485) by Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee. -1. **[Longformer](https://huggingface.co/docs/transformers/model_doc/longformer)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan. -1. **[LongT5](https://huggingface.co/docs/transformers/model_doc/longt5)** (from Google AI) released with the paper [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916) by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang. -1. **[LUKE](https://huggingface.co/docs/transformers/model_doc/luke)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto. -1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal. -1. **[M-CTC-T](https://huggingface.co/docs/transformers/model_doc/mctct)** (from Facebook) released with the paper [Pseudo-Labeling For Massively Multilingual Speech Recognition](https://arxiv.org/abs/2111.00161) by Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert. -1. **[M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin. -1. 
**[MADLAD-400](https://huggingface.co/docs/transformers/model_doc/madlad-400)** (from Google) released with the paper [MADLAD-400: A Multilingual And Document-Level Large Audited Dataset](https://arxiv.org/abs/2309.04662) by Sneha Kudugunta, Isaac Caswell, Biao Zhang, Xavier Garcia, Christopher A. Choquette-Choo, Katherine Lee, Derrick Xin, Aditya Kusupati, Romi Stella, Ankur Bapna, Orhan Firat. -1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jรถrg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team. -1. **[MarkupLM](https://huggingface.co/docs/transformers/model_doc/markuplm)** (from Microsoft Research Asia) released with the paper [MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding](https://arxiv.org/abs/2110.08518) by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei. -1. **[Mask2Former](https://huggingface.co/docs/transformers/model_doc/mask2former)** (from FAIR and UIUC) released with the paper [Masked-attention Mask Transformer for Universal Image Segmentation](https://arxiv.org/abs/2112.01527) by Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar. -1. **[MaskFormer](https://huggingface.co/docs/transformers/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov. -1. **[MatCha](https://huggingface.co/docs/transformers/model_doc/matcha)** (from Google AI) released with the paper [MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering](https://arxiv.org/abs/2212.09662) by Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, Julian Martin Eisenschlos. -1. **[mBART](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer. -1. **[mBART-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan. -1. **[MEGA](https://huggingface.co/docs/transformers/model_doc/mega)** (from Meta/USC/CMU/SJTU) released with the paper [Mega: Moving Average Equipped Gated Attention](https://arxiv.org/abs/2209.10655) by Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, and Luke Zettlemoyer. -1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron-bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro. -1. 
**[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro. -1. **[MGP-STR](https://huggingface.co/docs/transformers/model_doc/mgp-str)** (from Alibaba Research) released with the paper [Multi-Granularity Prediction for Scene Text Recognition](https://arxiv.org/abs/2209.03592) by Peng Wang, Cheng Da, and Cong Yao. -1. **[Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lรฉlio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothรฉe Lacroix, William El Sayed. -1. **[Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lรฉlio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothรฉe Lacroix, William El Sayed. -1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka. -1. **[MMS](https://huggingface.co/docs/transformers/model_doc/mms)** (from Facebook) released with the paper [Scaling Speech Technology to 1,000+ Languages](https://arxiv.org/abs/2305.13516) by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli. -1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou. -1. **[MobileNetV1](https://huggingface.co/docs/transformers/model_doc/mobilenet_v1)** (from Google Inc.) released with the paper [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861) by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam. -1. **[MobileNetV2](https://huggingface.co/docs/transformers/model_doc/mobilenet_v2)** (from Google Inc.) released with the paper [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381) by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen. -1. **[MobileViT](https://huggingface.co/docs/transformers/model_doc/mobilevit)** (from Apple) released with the paper [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/abs/2110.02178) by Sachin Mehta and Mohammad Rastegari. -1. 
**[MobileViTV2](https://huggingface.co/docs/transformers/model_doc/mobilevitv2)** (from Apple) released with the paper [Separable Self-attention for Mobile Vision Transformers](https://arxiv.org/abs/2206.02680) by Sachin Mehta and Mohammad Rastegari. -1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu. -1. **[MPT](https://huggingface.co/docs/transformers/model_doc/mpt)** (from MosaiML) released with the repository [llm-foundry](https://github.com/mosaicml/llm-foundry/) by the MosaicML NLP Team. -1. **[MRA](https://huggingface.co/docs/transformers/model_doc/mra)** (from the University of Wisconsin - Madison) released with the paper [Multi Resolution Analysis (MRA) for Approximate Self-Attention](https://arxiv.org/abs/2207.10284) by Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, Vikas Singh. -1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel. -1. **[MusicGen](https://huggingface.co/docs/transformers/model_doc/musicgen)** (from Meta) released with the paper [Simple and Controllable Music Generation](https://arxiv.org/abs/2306.05284) by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Dรฉfossez. -1. **[MVP](https://huggingface.co/docs/transformers/model_doc/mvp)** (from RUC AI Box) released with the paper [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https://arxiv.org/abs/2206.12131) by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen. -1. **[NAT](https://huggingface.co/docs/transformers/model_doc/nat)** (from SHI Labs) released with the paper [Neighborhood Attention Transformer](https://arxiv.org/abs/2204.07143) by Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi. -1. **[Nezha](https://huggingface.co/docs/transformers/model_doc/nezha)** (from Huawei Noahโ€™s Ark Lab) released with the paper [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) by Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu. -1. **[NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team. -1. **[NLLB-MOE](https://huggingface.co/docs/transformers/model_doc/nllb-moe)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team. -1. **[Nougat](https://huggingface.co/docs/transformers/model_doc/nougat)** (from Meta AI) released with the paper [Nougat: Neural Optical Understanding for Academic Documents](https://arxiv.org/abs/2308.13418) by Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic. -1. 
**[Nystrรถmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nystrรถmformer: A Nystrรถm-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh. -1. **[OneFormer](https://huggingface.co/docs/transformers/model_doc/oneformer)** (from SHI Labs) released with the paper [OneFormer: One Transformer to Rule Universal Image Segmentation](https://arxiv.org/abs/2211.06220) by Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi. -1. **[OpenLlama](https://huggingface.co/docs/transformers/model_doc/open-llama)** (from [s-JoL](https://huggingface.co/s-JoL)) released on GitHub (now removed). -1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al. -1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby. -1. **[OWLv2](https://huggingface.co/docs/transformers/model_doc/owlv2)** (from Google AI) released with the paper [Scaling Open-Vocabulary Object Detection](https://arxiv.org/abs/2306.09683) by Matthias Minderer, Alexey Gritsenko, Neil Houlsby. -1. **[PatchTSMixer](https://huggingface.co/docs/transformers/model_doc/patchtsmixer)** (from IBM Research) released with the paper [TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting](https://arxiv.org/pdf/2306.09364.pdf) by Vijay Ekambaram, Arindam Jati, Nam Nguyen, Phanwadee Sinthong, Jayant Kalagnanam. -1. **[PatchTST](https://huggingface.co/docs/transformers/model_doc/patchtst)** (from IBM) released with the paper [A Time Series is Worth 64 Words: Long-term Forecasting with Transformers](https://arxiv.org/abs/2211.14730) by Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, Jayant Kalagnanam. -1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu. -1. **[PEGASUS-X](https://huggingface.co/docs/transformers/model_doc/pegasus_x)** (from Google) released with the paper [Investigating Efficiently Extending Transformers for Long Input Summarization](https://arxiv.org/abs/2208.04347) by Jason Phang, Yao Zhao, and Peter J. Liu. -1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hรฉnaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, Joรฃo Carreira. -1. 
**[Persimmon](https://huggingface.co/docs/transformers/model_doc/persimmon)** (from ADEPT) released in a [blog post](https://www.adept.ai/blog/persimmon-8b) by Erich Elsen, Augustus Odena, Maxwell Nye, SaฤŸnak TaลŸฤฑrlar, Tri Dao, Curtis Hawthorne, Deepak Moparthi, Arushi Somani. -1. **[Phi](https://huggingface.co/docs/transformers/model_doc/phi)** (from Microsoft) released with the papers - [Textbooks Are All You Need](https://arxiv.org/abs/2306.11644) by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio Cรฉsar Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sรฉbastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li, [Textbooks Are All You Need II: phi-1.5 technical report](https://arxiv.org/abs/2309.05463) by Yuanzhi Li, Sรฉbastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee. -1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen. -1. **[Pix2Struct](https://huggingface.co/docs/transformers/model_doc/pix2struct)** (from Google) released with the paper [Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding](https://arxiv.org/abs/2210.03347) by Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova. -1. **[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang. -1. **[PoolFormer](https://huggingface.co/docs/transformers/model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng. -1. **[Pop2Piano](https://huggingface.co/docs/transformers/model_doc/pop2piano)** released with the paper [Pop2Piano : Pop Audio-based Piano Cover Generation](https://arxiv.org/abs/2211.00895) by Jongho Choi and Kyogu Lee. -1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou. -1. **[PVT](https://huggingface.co/docs/transformers/model_doc/pvt)** (from Nanjing University, The University of Hong Kong etc.) released with the paper [Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions](https://arxiv.org/pdf/2102.12122.pdf) by Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao. -1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius. -1. 
**[RAG](https://huggingface.co/docs/transformers/model_doc/rag)** (from Facebook) released with the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Kรผttler, Mike Lewis, Wen-tau Yih, Tim Rocktรคschel, Sebastian Riedel, Douwe Kiela. -1. **[REALM](https://huggingface.co/docs/transformers/model_doc/realm.html)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang. -1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, ลukasz Kaiser, Anselm Levskaya. -1. **[RegNet](https://huggingface.co/docs/transformers/model_doc/regnet)** (from META Platforms) released with the paper [Designing Network Design Space](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollรกr. -1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/abs/2010.12821) by Hyung Won Chung, Thibault Fรฉvry, Henry Tsai, M. Johnson, Sebastian Ruder. -1. **[ResNet](https://huggingface.co/docs/transformers/model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. -1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. -1. **[RoBERTa-PreLayerNorm](https://huggingface.co/docs/transformers/model_doc/roberta-prelayernorm)** (from Facebook) released with the paper [fairseq: A Fast, Extensible Toolkit for Sequence Modeling](https://arxiv.org/abs/1904.01038) by Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli. -1. **[RoCBert](https://huggingface.co/docs/transformers/model_doc/roc_bert)** (from WeChatAI) released with the paper [RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining](https://aclanthology.org/2022.acl-long.65.pdf) by HuiSu, WeiweiShi, XiaoyuShen, XiaoZhou, TuoJi, JiaruiFang, JieZhou. -1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu. -1. **[RWKV](https://huggingface.co/docs/transformers/model_doc/rwkv)** (from Bo Peng), released on [this repo](https://github.com/BlinkDL/RWKV-LM) by Bo Peng. -1. **[SeamlessM4T](https://huggingface.co/docs/transformers/model_doc/seamless_m4t)** (from Meta AI) released with the paper [SeamlessM4T โ€” Massively Multilingual & Multimodal Machine Translation](https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf) by the Seamless Communication team. 
-1. **[SeamlessM4Tv2](https://huggingface.co/docs/transformers/model_doc/seamless_m4t_v2)** (from Meta AI) released with the paper [Seamless: Multilingual Expressive and Streaming Speech Translation](https://ai.meta.com/research/publications/seamless-multilingual-expressive-and-streaming-speech-translation/) by the Seamless Communication team. -1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo. -1. **[Segment Anything](https://huggingface.co/docs/transformers/model_doc/sam)** (from Meta AI) released with the paper [Segment Anything](https://arxiv.org/pdf/2304.02643v1.pdf) by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick. -1. **[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi. -1. **[SEW-D](https://huggingface.co/docs/transformers/model_doc/sew_d)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi. -1. **[SpeechT5](https://huggingface.co/docs/transformers/model_doc/speecht5)** (from Microsoft Research) released with the paper [SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing](https://arxiv.org/abs/2110.07205) by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei. -1. **[SpeechToTextTransformer](https://huggingface.co/docs/transformers/model_doc/speech_to_text)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino. -1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (from Facebook), released together with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau. -1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (from Tel Aviv University), released together with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy. -1. **[SqueezeBERT](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer. -1. 
**[SwiftFormer](https://huggingface.co/docs/transformers/model_doc/swiftformer)** (from MBZUAI) released with the paper [SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications](https://arxiv.org/abs/2303.15446) by Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan. -1. **[Swin Transformer](https://huggingface.co/docs/transformers/model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo. -1. **[Swin Transformer V2](https://huggingface.co/docs/transformers/model_doc/swinv2)** (from Microsoft) released with the paper [Swin Transformer V2: Scaling Up Capacity and Resolution](https://arxiv.org/abs/2111.09883) by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo. -1. **[Swin2SR](https://huggingface.co/docs/transformers/model_doc/swin2sr)** (from University of Wรผrzburg) released with the paper [Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration](https://arxiv.org/abs/2209.11345) by Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte. -1. **[SwitchTransformers](https://huggingface.co/docs/transformers/model_doc/switch_transformers)** (from Google) released with the paper [Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity](https://arxiv.org/abs/2101.03961) by William Fedus, Barret Zoph, Noam Shazeer. -1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu. -1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu. -1. **[Table Transformer](https://huggingface.co/docs/transformers/model_doc/table-transformer)** (from Microsoft Research) released with the paper [PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents](https://arxiv.org/abs/2110.00061) by Brandon Smock, Rohith Pesala, Robin Abraham. -1. **[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweล‚ Krzysztof Nowak, Thomas Mรผller, Francesco Piccinno and Julian Martin Eisenschlos. -1. **[TAPEX](https://huggingface.co/docs/transformers/model_doc/tapex)** (from Microsoft Research) released with the paper [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. -1. **[Time Series Transformer](https://huggingface.co/docs/transformers/model_doc/time_series_transformer)** (from HuggingFace). -1. 
**[TimeSformer](https://huggingface.co/docs/transformers/model_doc/timesformer)** (from Facebook) released with the paper [Is Space-Time Attention All You Need for Video Understanding?](https://arxiv.org/abs/2102.05095) by Gedas Bertasius, Heng Wang, Lorenzo Torresani. -1. **[Trajectory Transformer](https://huggingface.co/docs/transformers/model_doc/trajectory_transformers)** (from the University of California at Berkeley) released with the paper [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https://arxiv.org/abs/2106.02039) by Michael Janner, Qiyang Li, Sergey Levine -1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov. -1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei. -1. **[TVLT](https://huggingface.co/docs/transformers/model_doc/tvlt)** (from UNC Chapel Hill) released with the paper [TVLT: Textless Vision-Language Transformer](https://arxiv.org/abs/2209.14156) by Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal. -1. **[TVP](https://huggingface.co/docs/transformers/model_doc/tvp)** (from Intel) released with the paper [Text-Visual Prompting for Efficient 2D Temporal Video Grounding](https://arxiv.org/abs/2303.04995) by Yimeng Zhang, Xin Chen, Jinghan Jia, Sijia Liu, Ke Ding. -1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler -1. **[UMT5](https://huggingface.co/docs/transformers/model_doc/umt5)** (from Google Research) released with the paper [UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining](https://openreview.net/forum?id=kXwdL1cWOAi) by Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant. -1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang. -1. **[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu. -1. 
**[UnivNet](https://huggingface.co/docs/transformers/model_doc/univnet)** (from Kakao Corporation) released with the paper [UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation](https://arxiv.org/abs/2106.07889) by Won Jang, Dan Lim, Jaesam Yoon, Bongwan Kim, and Juntae Kim. -1. **[UPerNet](https://huggingface.co/docs/transformers/model_doc/upernet)** (from Peking University) released with the paper [Unified Perceptual Parsing for Scene Understanding](https://arxiv.org/abs/1807.10221) by Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun. -1. **[VAN](https://huggingface.co/docs/transformers/model_doc/van)** (from Tsinghua University and Nankai University) released with the paper [Visual Attention Network](https://arxiv.org/abs/2202.09741) by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu. -1. **[VideoMAE](https://huggingface.co/docs/transformers/model_doc/videomae)** (from Multimedia Computing Group, Nanjing University) released with the paper [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) by Zhan Tong, Yibing Song, Jue Wang, Limin Wang. -1. **[ViLT](https://huggingface.co/docs/transformers/model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim. -1. **[VipLlava](https://huggingface.co/docs/transformers/model_doc/vipllava)** (from University of Wisconsinโ€“Madison) released with the paper [Making Large Multimodal Models Understand Arbitrary Visual Prompts](https://arxiv.org/abs/2312.00784) by Mu Cai, Haotian Liu, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Dennis Park, Yong Jae Lee. -1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby. -1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang. -1. **[ViT Hybrid](https://huggingface.co/docs/transformers/model_doc/vit_hybrid)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby. -1. **[VitDet](https://huggingface.co/docs/transformers/model_doc/vitdet)** (from Meta AI) released with the paper [Exploring Plain Vision Transformer Backbones for Object Detection](https://arxiv.org/abs/2203.16527) by Yanghao Li, Hanzi Mao, Ross Girshick, Kaiming He. -1. 
**[ViTMAE](https://huggingface.co/docs/transformers/model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollรกr, Ross Girshick. -1. **[ViTMatte](https://huggingface.co/docs/transformers/model_doc/vitmatte)** (from HUST-VL) released with the paper [ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers](https://arxiv.org/abs/2305.15272) by Jingfeng Yao, Xinggang Wang, Shusheng Yang, Baoyuan Wang. -1. **[ViTMSN](https://huggingface.co/docs/transformers/model_doc/vit_msn)** (from Meta AI) released with the paper [Masked Siamese Networks for Label-Efficient Learning](https://arxiv.org/abs/2204.07141) by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas. -1. **[VITS](https://huggingface.co/docs/transformers/model_doc/vits)** (from Kakao Enterprise) released with the paper [Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech](https://arxiv.org/abs/2106.06103) by Jaehyeon Kim, Jungil Kong, Juhee Son. -1. **[ViViT](https://huggingface.co/docs/transformers/model_doc/vivit)** (from Google Research) released with the paper [ViViT: A Video Vision Transformer](https://arxiv.org/abs/2103.15691) by Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Luฤiฤ‡, Cordelia Schmid. -1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli. -1. **[Wav2Vec2-Conformer](https://huggingface.co/docs/transformers/model_doc/wav2vec2-conformer)** (from Facebook AI) released with the paper [FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino. -1. **[Wav2Vec2Phoneme](https://huggingface.co/docs/transformers/model_doc/wav2vec2_phoneme)** (from Facebook AI) released with the paper [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https://arxiv.org/abs/2109.11680) by Qiantong Xu, Alexei Baevski, Michael Auli. -1. **[WavLM](https://huggingface.co/docs/transformers/model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei. -1. **[Whisper](https://huggingface.co/docs/transformers/model_doc/whisper)** (from OpenAI) released with the paper [Robust Speech Recognition via Large-Scale Weak Supervision](https://cdn.openai.com/papers/whisper.pdf) by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever. -1. **[X-CLIP](https://huggingface.co/docs/transformers/model_doc/xclip)** (from Microsoft Research) released with the paper [Expanding Language-Image Pretrained Models for General Video Recognition](https://arxiv.org/abs/2208.02816) by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling. -1. 
**[X-MOD](https://huggingface.co/docs/transformers/model_doc/xmod)** (from Meta AI) released with the paper [Lifting the Curse of Multilinguality by Pre-training Modular Transformers](http://dx.doi.org/10.18653/v1/2022.naacl-main.255) by Jonas Pfeiffer, Naman Goyal, Xi Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe. -1. **[XGLM](https://huggingface.co/docs/transformers/model_doc/xglm)** (From Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li. -1. **[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau. -1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou. -1. **[XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlm-roberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmรกn, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. -1. **[XLM-RoBERTa-XL](https://huggingface.co/docs/transformers/model_doc/xlm-roberta-xl)** (from Facebook AI), released together with the paper [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau. -1. **[XLM-V](https://huggingface.co/docs/transformers/model_doc/xlm-v)** (from Meta AI) released with the paper [XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models](https://arxiv.org/abs/2301.10472) by Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa. -1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (from Google/CMU) released with the paper [โ€‹XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le. -1. **[XLS-R](https://huggingface.co/docs/transformers/model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli. -1. 
**[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli. -1. **[YOLOS](https://huggingface.co/docs/transformers/model_doc/yolos)** (from Huazhong University of Science & Technology) released with the paper [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu. -1. **[YOSO](https://huggingface.co/docs/transformers/model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh. -1. Want to contribute a new model? We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedback before starting your PR. +🤗 Transformers currently provides the following architectures: see [here](https://huggingface.co/docs/transformers/model_summary) for a high-level summary of each of them. To check if each model has an implementation in Flax, PyTorch or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to [this table](https://huggingface.co/docs/transformers/index#supported-frameworks). diff --git a/README_es.md b/README_es.md deleted file mode 100644 index 2fe82606b928c3..00000000000000 --- a/README_es.md +++ /dev/null @@ -1,545 +0,0 @@ - -

-[Badges: Build | GitHub | Documentation | GitHub release | Contributor Covenant | DOI]
-[Language selector: English | 简体中文 | 繁體中文 | 한국어 | Español | 日本語 | हिन्दी | తెలుగు]
-The latest in Machine Learning for JAX, PyTorch and TensorFlow

- -🤗 Transformers provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio. - -These models can be applied to: - -* 📝 Text, for tasks like text classification, information extraction, question answering, summarization, translation, and text generation, in over 100 languages. -* 🖼️ Images, for tasks like image classification, object detection, and segmentation. -* 🗣️ Audio, for tasks like speech recognition and audio classification. - -Transformer models can also perform tasks on **several combined modalities**, such as question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering. - -🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets, and share them with the community on our [model hub](https://huggingface.co/models). At the same time, each Python module defining an architecture is fully standalone and can be modified to enable quick research experiments. - -🤗 Transformers is backed by the three most popular deep learning libraries, [Jax](https://jax.readthedocs.io/en/latest/), [PyTorch](https://pytorch.org/) and [TensorFlow](https://www.tensorflow.org/), with seamless integration between them. It is straightforward to train your models with one before loading them for inference with the other. - -## Online demos - -You can test most of our models directly on their pages from the [model hub](https://huggingface.co/models). We also offer [private model hosting, versioning, and an inference API](https://huggingface.co/pricing) for public and private models. 
- -Aquรญ hay algunos ejemplos: - - En procesamiento del lenguaje natural: -- [Terminaciรณn de palabras enmascaradas con BERT](https://huggingface.co/bert-base-uncased?text=Paris+is+the+%5BMASK%5D+of+France) -- [Reconocimiento del nombre de la entidad con Electra](https://huggingface.co/dbmdz/electra-large-discriminator-finetuned-conll03-english?text=My+name+is+Sarah+and+I+live+in+London+city) -- [Generaciรณn de texto con GPT-2](https://huggingface.co/gpt2?text=A+long+time+ago%2C+) -- [Inferencia del lenguaje natural con RoBERTa](https://huggingface.co/roberta-large-mnli?text=The+dog+was+lost.+Nobody+lost+any+animal) -- [Resumen con BART](https://huggingface.co/facebook/bart-large-cnn?text=The+tower+is+324+metres+%281%2C063+ft%29+tall%2C+about+the+same+height+as+an+81-storey+building%2C+and+the+tallest+structure+in+Paris.+Its+base+is+square%2C+measuring+125+metres+%28410+ft%29+on+each+side.+During+its+construction%2C+the+Eiffel+Tower+surpassed+the+Washington+Monument+to+become+the+tallest+man-made+structure+in+the+world%2C+a+title+it+held+for+41+years+until+the+Chrysler+Building+in+New+York+City+was+finished+in+1930.+It+was+the+first+structure+to+reach+a+height+of+300+metres.+Due+to+the+addition+of+a+broadcasting+aerial+at+the+top+of+the+tower+in+1957%2C+it+is+now+taller+than+the+Chrysler+Building+by+5.2+metres+%2817+ft%29.+Excluding+transmitters%2C+the+Eiffel+Tower+is+the+second+tallest+free-standing+structure+in+France+after+the+Millau+Viaduct) -- [Responder a preguntas con DistilBERT](https://huggingface.co/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species) -- [Traducciรณn con T5](https://huggingface.co/t5-base?text=My+name+is+Wolfgang+and+I+live+in+Berlin) - -En visiรณn de ordenador: -- [Clasificaciรณn de imรกgenes con ViT](https://huggingface.co/google/vit-base-patch16-224) -- [Detecciรณn de objetos con DETR](https://huggingface.co/facebook/detr-resnet-50) -- [Segmentaciรณn semรกntica con SegFormer](https://huggingface.co/nvidia/segformer-b0-finetuned-ade-512-512) -- [Segmentaciรณn panรณptica con DETR](https://huggingface.co/facebook/detr-resnet-50-panoptic) -- [Segmentaciรณn Universal con OneFormer (Segmentaciรณn Semรกntica, de Instancia y Panรณptica con un solo modelo)](https://huggingface.co/shi-labs/oneformer_ade20k_dinat_large) - -En Audio: -- [Reconocimiento de 
voz automรกtico con Wav2Vec2](https://huggingface.co/facebook/wav2vec2-base-960h) -- [Detecciรณn de palabras clave con Wav2Vec2](https://huggingface.co/superb/wav2vec2-base-superb-ks) - -En tareas multimodales: -- [Respuesta visual a preguntas con ViLT](https://huggingface.co/dandelin/vilt-b32-finetuned-vqa) - -**[Escribe con Transformer](https://transformer.huggingface.co)**, construido por el equipo de Hugging Face, es la demostraciรณn oficial de las capacidades de generaciรณn de texto de este repositorio. - -## Si estรก buscando soporte personalizado del equipo de Hugging Face - - - HuggingFace Expert Acceleration Program -
- -## Quick tour - -To immediately use a model on a given input (text, image, audio, ...), we provide the `pipeline` API. Pipelines group a pretrained model with the preprocessing that was used during that model's training. Here is how to quickly use a pipeline to classify positive versus negative texts: - -```python ->>> from transformers import pipeline - -# Allocate a pipeline for sentiment-analysis ->>> classifier = pipeline('sentiment-analysis') ->>> classifier('We are very happy to introduce pipeline to the transformers repository.') -[{'label': 'POSITIVE', 'score': 0.9996980428695679}] -``` - -The second line of code downloads and caches the pretrained model used by the pipeline, while the third evaluates it on the given text. Here the answer is "positive" with a confidence of 99.97%. - -Many tasks have a ready-to-go pretrained `pipeline`, in NLP but also in computer vision and speech. For example, we can easily extract the objects detected in an image: - -``` python ->>> import requests ->>> from PIL import Image ->>> from transformers import pipeline - -# Download an image with cute cats ->>> url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/coco_sample.png" ->>> image_data = requests.get(url, stream=True).raw ->>> image = Image.open(image_data) - -# Allocate a pipeline for object detection ->>> object_detector = pipeline('object-detection') ->>> object_detector(image) -[{'score': 0.9982201457023621, - 'label': 'remote', - 'box': {'xmin': 40, 'ymin': 70, 'xmax': 175, 'ymax': 117}}, - {'score': 0.9960021376609802, - 'label': 'remote', - 'box': {'xmin': 333, 'ymin': 72, 'xmax': 368, 'ymax': 187}}, - {'score': 0.9954745173454285, - 'label': 'couch', - 'box': {'xmin': 0, 'ymin': 1, 'xmax': 639, 'ymax': 473}}, - {'score': 0.9988006353378296, - 'label': 'cat', - 'box': {'xmin': 13, 'ymin': 52, 'xmax': 314, 'ymax': 470}}, - {'score': 0.9986783862113953, - 'label': 'cat', - 'box': {'xmin': 345, 'ymin': 23, 'xmax': 640, 'ymax': 368}}] -``` - -Here we get a list of objects detected in the image, with a box around each object and a confidence score. Here is the original image on the right, with the predictions displayed on the left: -

- [Images: the sample photo used above, shown alongside the same image annotated with the detected objects]
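The same `pipeline` API also covers speech. Here is a minimal sketch, reusing the `facebook/wav2vec2-base-960h` checkpoint linked in the audio demos above; the audio path is only a placeholder, and decoding local audio files relies on ffmpeg being available:

```python
>>> from transformers import pipeline

# Allocate a pipeline for automatic speech recognition
>>> transcriber = pipeline('automatic-speech-recognition', model='facebook/wav2vec2-base-960h')

# 'sample.flac' is a placeholder path; any local audio file (or a URL) works
>>> transcriber('sample.flac')
{'text': '...'}
```

The returned dictionary carries the transcription under the `text` key; the actual string naturally depends on the audio supplied.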

- -You can learn more about the tasks supported by the `pipeline` API in [this tutorial](https://huggingface.co/docs/transformers/task_summary). - -In addition to `pipeline`, to download and use any of the pretrained models on your given task, all it takes is three lines of code. Here is the PyTorch version: -```python ->>> from transformers import AutoTokenizer, AutoModel - ->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased") ->>> model = AutoModel.from_pretrained("bert-base-uncased") - ->>> inputs = tokenizer("Hello world!", return_tensors="pt") ->>> outputs = model(**inputs) -``` - -And here is the equivalent code for TensorFlow: -```python ->>> from transformers import AutoTokenizer, TFAutoModel - ->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased") ->>> model = TFAutoModel.from_pretrained("bert-base-uncased") - ->>> inputs = tokenizer("Hello world!", return_tensors="tf") ->>> outputs = model(**inputs) -``` - -The tokenizer is responsible for all the preprocessing the pretrained model expects and can be called directly on a single string (as in the examples above) or on a list. It will output a dictionary that you can use in downstream code or simply pass directly to your model using the ** argument-unpacking operator (a short sketch of the batched case follows below). - -The model itself is a regular [PyTorch `nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) or a [TensorFlow `tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) (depending on your backend) which you can use as usual. [This tutorial](https://huggingface.co/docs/transformers/training) explains how to integrate such a model into a classic PyTorch or TensorFlow training loop, or how to use our `Trainer` API to quickly fine-tune it on a new dataset.
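As a small sketch of that batched case, reusing the `bert-base-uncased` checkpoint and the PyTorch backend from the snippet above (the example sentences and the shape comment are illustrative only):

```python
>>> import torch
>>> from transformers import AutoTokenizer, AutoModel

>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = AutoModel.from_pretrained("bert-base-uncased")

# Calling the tokenizer on a list returns a dict of padded, batched tensors
>>> inputs = tokenizer(["Hello world!", "Pipelines are great."], padding=True, return_tensors="pt")

# The ** operator unpacks that dict straight into the model's forward pass
>>> with torch.no_grad():
...     outputs = model(**inputs)
>>> outputs.last_hidden_state.shape  # (batch size, sequence length, hidden size)
```

The same call works for the TensorFlow model by passing `return_tensors="tf"` instead of `"pt"`.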
- -## Why should I use transformers? - -1. Easy-to-use state-of-the-art models: - - High performance on natural language understanding & generation, computer vision, and audio tasks. - - Low barrier to entry for educators and practitioners. - - Few user-facing abstractions with just three classes to learn. - - A unified API for using all our pretrained models. - -1. Lower compute costs, smaller carbon footprint: - - Researchers can share trained models instead of always retraining. - - Practitioners can reduce compute time and production costs. - - Dozens of architectures with over 60,000 pretrained models across all modalities. - -1. Choose the right framework for every part of a model's lifetime: - - Train state-of-the-art models in 3 lines of code. - - Move a single model between TF2.0/PyTorch/JAX frameworks at will. - - Seamlessly pick the right framework for training, evaluation, and production. - -1. Easily customize a model or an example to your needs: - - We provide examples for each architecture to reproduce the results published by its original authors. - - Model internals are exposed as consistently as possible. - - Model files can be used independently of the library for quick experiments. - -## Why shouldn't I use transformers? - -- This library is not a modular toolbox of building blocks for neural networks. The code in the model files is not refactored with additional abstractions on purpose, so that researchers can quickly iterate on each of the models without diving into additional abstractions/files. -- The training API is not intended to work on any model but is optimized to work with the models provided by the library. For generic machine learning loops, you should use another library (possibly, [Accelerate](https://huggingface.co/docs/accelerate)). -- While we strive to present as many use cases as possible, the scripts in our [examples folder](https://github.com/huggingface/transformers/tree/main/examples) are just that: examples. They are not expected to work out of the box on your specific problem, and you will need to change a few lines of code to adapt them to your needs. - -## Installation - -### With pip - -This repository is tested on Python 3.8+, Flax 0.4.1+, PyTorch 1.10+ and TensorFlow 2.6+. - -You should install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're unfamiliar with Python virtual environments, check out the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/). - -First, create a virtual environment with the version of Python you're going to use and activate it. - -Then, you will need to install at least one of Flax, PyTorch or TensorFlow. -Please refer to the [TensorFlow installation page](https://www.tensorflow.org/install/), the [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) and/or the installation pages of [Flax](https://github.com/google/flax#quick-install) and [Jax](https://github.com/google/jax#installation) for the installation command specific to your platform. - -When one of those backends has been installed, 🤗 Transformers can be installed using pip as follows: - -```bash -pip install transformers -``` - -If you'd like to play with the examples, or need the bleeding edge of the code and can't wait for a new release, you have to [install the library from source](https://huggingface.co/docs/transformers/installation#installing-from-source). - -### With conda - -Since Transformers version v4.0.0, we now have a conda channel: `huggingface`. - -🤗 Transformers can be installed using conda as follows: - -```shell script -conda install -c huggingface transformers -``` - -Follow the installation pages of Flax, PyTorch or TensorFlow to see how to install them with conda. - -> **_NOTE:_** On Windows, you may be prompted to activate Developer Mode in order to benefit from caching. If this is not an option for you, please let us know in [this issue](https://github.com/huggingface/huggingface_hub/issues/1062). - -## Model architectures - -**[All the model checkpoints](https://huggingface.co/models)** provided by 🤗 Transformers are seamlessly integrated from the huggingface.co [model hub](https://huggingface.co), where they are uploaded directly by [users](https://huggingface.co/users) and [organizations](https://huggingface.co/organizations). 
- -## Model architectures - -**[All the model checkpoints](https://huggingface.co/models)** provided by 🤗 Transformers are seamlessly integrated from the huggingface.co [model hub](https://huggingface.co), where they are uploaded directly by [users](https://huggingface.co/users) and [organizations](https://huggingface.co/organizations). - -Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen) - -🤗 Transformers currently provides the following architectures (see [here](https://huggingface.co/docs/transformers/model_summary) for a high-level summary of each of them): - -1. **[ALBERT](https://huggingface.co/docs/transformers/model_doc/albert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. -1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (from Google Research) released with the paper [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918) by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig. -1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (from BAAI) released with the paper [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell. -1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (from MIT) released with the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Yuan Gong, Yu-An Chung, James Glass. -1. **[Autoformer](https://huggingface.co/docs/transformers/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long. -1. **[Bark](https://huggingface.co/docs/transformers/model_doc/bark)** (from Suno) released in the repository [suno-ai/bark](https://github.com/suno-ai/bark) by Suno AI team. -1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer. -1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis. -1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen. -1. **[BEiT](https://huggingface.co/docs/transformers/model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong, Furu Wei. -1. 
**[BERT](https://huggingface.co/docs/transformers/model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. -1. **[BERT For Sequence Generation](https://huggingface.co/docs/transformers/model_doc/bert-generation)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn. -1. **[BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen. -1. **[BigBird-Pegasus](https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. -1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. -1. **[BioGpt](https://huggingface.co/docs/transformers/model_doc/biogpt)** (from Microsoft Research AI4Science) released with the paper [BioGPT: generative pre-trained transformer for biomedical text generation and mining](https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbac409/6713511?guestAccessKey=a66d9b5d-4f83-4017-bb52-405815c907b9) by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu. -1. **[BiT](https://huggingface.co/docs/transformers/model_doc/bit)** (from Google AI) released with the paper [Big Transfer (BiT): General Visual Representation Learning](https://arxiv.org/abs/1912.11370) by Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby. -1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston. -1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston. -1. **[BLIP](https://huggingface.co/docs/transformers/model_doc/blip)** (from Salesforce) released with the paper [BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation](https://arxiv.org/abs/2201.12086) by Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi. -1. 
**[BLIP-2](https://huggingface.co/docs/transformers/model_doc/blip-2)** (from Salesforce) released with the paper [BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models](https://arxiv.org/abs/2301.12597) by Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi. -1. **[BLOOM](https://huggingface.co/docs/transformers/model_doc/bloom)** (from BigScience workshop) released by the [BigScience Workshop](https://bigscience.huggingface.co/). -1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry. -1. **[BridgeTower](https://huggingface.co/docs/transformers/model_doc/bridgetower)** (from Harbin Institute of Technology/Microsoft Research Asia/Intel Labs) released with the paper [BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning](https://arxiv.org/abs/2206.08657) by Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan. -1. **[BROS](https://huggingface.co/docs/transformers/model_doc/bros)** (from NAVER CLOVA) released with the paper [BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents](https://arxiv.org/abs/2108.04539) by Teakgyu Hong, Donghyun Kim, Mingi Ji, Wonseok Hwang, Daehyun Nam, Sungrae Park. -1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel. -1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suรกrez*, Yoann Dupont, Laurent Romary, ร‰ric Villemonte de la Clergerie, Djamรฉ Seddah and Benoรฎt Sagot. -1. **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting. -1. **[Chinese-CLIP](https://huggingface.co/docs/transformers/model_doc/chinese_clip)** (from OFA-Sys) released with the paper [Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese](https://arxiv.org/abs/2211.01335) by An Yang, Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou, Chang Zhou. -1. **[CLAP](https://huggingface.co/docs/transformers/model_doc/clap)** (from LAION-AI) released with the paper [Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation](https://arxiv.org/abs/2211.06687) by Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov. -1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever. -1. 
**[CLIPSeg](https://huggingface.co/docs/transformers/model_doc/clipseg)** (from University of Gรถttingen) released with the paper [Image Segmentation Using Text and Image Prompts](https://arxiv.org/abs/2112.10003) by Timo Lรผddecke and Alexander Ecker. -1. **[CLVP](https://huggingface.co/docs/transformers/model_doc/clvp)** released with the paper [Better speech synthesis through scaling](https://arxiv.org/abs/2305.07243) by James Betker. -1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong. -1. **[CodeLlama](https://huggingface.co/docs/transformers/model_doc/llama_code)** (from MetaAI) released with the paper [Code Llama: Open Foundation Models for Code](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/) by Baptiste Roziรจre, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jรฉrรฉmy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Dรฉfossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve. -1. **[Conditional DETR](https://huggingface.co/docs/transformers/model_doc/conditional_detr)** (from Microsoft Research Asia) released with the paper [Conditional DETR for Fast Training Convergence](https://arxiv.org/abs/2108.06152) by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang. -1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan. -1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie. -1. **[ConvNeXTV2](https://huggingface.co/docs/transformers/model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie. -1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun. -1. **[CPM-Ant](https://huggingface.co/docs/transformers/model_doc/cpmant)** (from OpenBMB) released by the [OpenBMB](https://www.openbmb.org/). -1. 
**[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher. -1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang. -1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli. -1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. -1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. -1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch. -1. **[Deformable DETR](https://huggingface.co/docs/transformers/model_doc/deformable_detr)** (from SenseTime Research) released with the paper [Deformable DETR: Deformable Transformers for End-to-End Object Detection](https://arxiv.org/abs/2010.04159) by Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai. -1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervรฉ Jรฉgou. -1. **[DePlot](https://huggingface.co/docs/transformers/model_doc/deplot)** (from Google AI) released with the paper [DePlot: One-shot visual language reasoning by plot-to-table translation](https://arxiv.org/abs/2212.10505) by Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun. -1. **[DETA](https://huggingface.co/docs/transformers/model_doc/deta)** (from The University of Texas at Austin) released with the paper [NMS Strikes Back](https://arxiv.org/abs/2212.06137) by Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krรคhenbรผhl. -1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko. -1. 
**[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan. -1. **[DiNAT](https://huggingface.co/docs/transformers/model_doc/dinat)** (from SHI Labs) released with the paper [Dilated Neighborhood Attention Transformer](https://arxiv.org/abs/2209.15001) by Ali Hassani and Humphrey Shi. -1. **[DINOv2](https://huggingface.co/docs/transformers/model_doc/dinov2)** (from Meta AI) released with the paper [DINOv2: Learning Robust Visual Features without Supervision](https://arxiv.org/abs/2304.07193) by Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski. -1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation) and a German version of DistilBERT. -1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei. -1. **[Donut](https://huggingface.co/docs/transformers/model_doc/donut)** (from NAVER), released together with the paper [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park. -1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. -1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun. -1. **[EfficientFormer](https://huggingface.co/docs/transformers/model_doc/efficientformer)** (from Snap Research) released with the paper [EfficientFormer: Vision Transformers at MobileNet Speed](https://arxiv.org/abs/2206.01191) by Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren. -1. 
**[EfficientNet](https://huggingface.co/docs/transformers/model_doc/efficientnet)** (from Google Brain) released with the paper [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946) by Mingxing Tan, Quoc V. Le. -1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning. -1. **[EnCodec](https://huggingface.co/docs/transformers/model_doc/encodec)** (from Meta AI) released with the paper [High Fidelity Neural Audio Compression](https://arxiv.org/abs/2210.13438) by Alexandre Dรฉfossez, Jade Copet, Gabriel Synnaeve, Yossi Adi. -1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn. -1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu. -1. **[ErnieM](https://huggingface.co/docs/transformers/model_doc/ernie_m)** (from Baidu) released with the paper [ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora](https://arxiv.org/abs/2012.15674) by Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang. -1. **[ESM](https://huggingface.co/docs/transformers/model_doc/esm)** (from Meta AI) are transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. **ESM-1v** was released with the paper [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648) by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. **ESM-2** was released with the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives. -1. **[Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)** (from Technology Innovation Institute) by Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme. -1. 
**[FLAN-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5)** (from Google AI) released in the repository [google-research/t5x](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints) by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei -1. **[FLAN-UL2](https://huggingface.co/docs/transformers/model_doc/flan-ul2)** (from Google AI) released in the repository [google-research/t5x](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-ul2-checkpoints) by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei -1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loรฏc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoรฎt Crabbรฉ, Laurent Besacier, Didier Schwab. -1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela. -1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon. -1. **[FocalNet](https://huggingface.co/docs/transformers/model_doc/focalnet)** (from Microsoft Research) released with the paper [Focal Modulation Networks](https://arxiv.org/abs/2203.11926) by Jianwei Yang, Chunyuan Li, Xiyang Dai, Lu Yuan, Jianfeng Gao. -1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le. -1. **[Fuyu](https://huggingface.co/docs/transformers/model_doc/fuyu)** (from ADEPT) Rohan Bavishi, Erich Elsen, Curtis Hawthorne, Maxwell Nye, Augustus Odena, Arushi Somani, SaฤŸnak TaลŸฤฑrlar. Released with the paper [blog post](https://www.adept.ai/blog/fuyu-8b) -1. **[GIT](https://huggingface.co/docs/transformers/model_doc/git)** (from Microsoft Research) released with the paper [GIT: A Generative Image-to-text Transformer for Vision and Language](https://arxiv.org/abs/2205.14100) by Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang. -1. 
**[GLPN](https://huggingface.co/docs/transformers/model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim. -1. **[GPT](https://huggingface.co/docs/transformers/model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. -1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy. -1. **[GPT NeoX](https://huggingface.co/docs/transformers/model_doc/gpt_neox)** (from EleutherAI) released with the paper [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://arxiv.org/abs/2204.06745) by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach -1. **[GPT NeoX Japanese](https://huggingface.co/docs/transformers/model_doc/gpt_neox_japanese)** (from ABEJA) released by Shinya Otani, Takayoshi Makabe, Anuj Arora, and Kyo Hattori. -1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**. -1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki. -1. **[GPT-Sw3](https://huggingface.co/docs/transformers/model_doc/gpt-sw3)** (from AI-Sweden) released with the paper [Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish](http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.376.pdf) by Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey ร–hman, Fredrik Carlsson, Magnus Sahlgren. -1. **[GPTBigCode](https://huggingface.co/docs/transformers/model_doc/gpt_bigcode)** (from BigCode) released with the paper [SantaCoder: don't reach for the stars!](https://arxiv.org/abs/2301.03988) by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo Garcรญa del Rรญo, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra. -1. 
**[GPTSAN-japanese](https://huggingface.co/docs/transformers/model_doc/gptsan-japanese)** released in the repository [tanreinama/GPTSAN](https://github.com/tanreinama/GPTSAN/blob/main/report/model.md) by Toshiyuki Sakamoto(tanreinama). -1. **[Graphormer](https://huggingface.co/docs/transformers/model_doc/graphormer)** (from Microsoft) released with the paper [Do Transformers Really Perform Bad for Graph Representation?](https://arxiv.org/abs/2106.05234) by Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu. -1. **[GroupViT](https://huggingface.co/docs/transformers/model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang. -1. **[HerBERT](https://huggingface.co/docs/transformers/model_doc/herbert)** (from Allegro.pl, AGH University of Science and Technology) released with the paper [KLEJ: Comprehensive Benchmark for Polish Language Understanding](https://www.aclweb.org/anthology/2020.acl-main.111.pdf) by Piotr Rybak, Robert Mroczkowski, Janusz Tracz, Ireneusz Gawlik. -1. **[Hubert](https://huggingface.co/docs/transformers/model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed. -1. **[I-BERT](https://huggingface.co/docs/transformers/model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer. -1. **[IDEFICS](https://huggingface.co/docs/transformers/model_doc/idefics)** (from HuggingFace) released with the paper [OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents](https://huggingface.co/papers/2306.16527) by Hugo Laurenรงon, Lucile Saulnier, Lรฉo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh. -1. **[ImageGPT](https://huggingface.co/docs/transformers/model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever. -1. **[Informer](https://huggingface.co/docs/transformers/model_doc/informer)** (from Beihang University, UC Berkeley, Rutgers University, SEDD Company) released with the paper [Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting](https://arxiv.org/abs/2012.07436) by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. -1. **[InstructBLIP](https://huggingface.co/docs/transformers/model_doc/instructblip)** (from Salesforce) released with the paper [InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning](https://arxiv.org/abs/2305.06500) by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi. -1. 
**[Jukebox](https://huggingface.co/docs/transformers/model_doc/jukebox)** (from OpenAI) released with the paper [Jukebox: A Generative Model for Music](https://arxiv.org/pdf/2005.00341.pdf) by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever. -1. **[KOSMOS-2](https://huggingface.co/docs/transformers/model_doc/kosmos-2)** (from Microsoft Research Asia) released with the paper [Kosmos-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/abs/2306.14824) by Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei. -1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou. -1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou. -1. **[LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)** (from Microsoft Research Asia) released with the paper [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei. -1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutxlm)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei. -1. **[LED](https://huggingface.co/docs/transformers/model_doc/led)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan. -1. **[LeViT](https://huggingface.co/docs/transformers/model_doc/levit)** (from Meta AI) released with the paper [LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference](https://arxiv.org/abs/2104.01136) by Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervรฉ Jรฉgou, Matthijs Douze. -1. **[LiLT](https://huggingface.co/docs/transformers/model_doc/lilt)** (from South China University of Technology) released with the paper [LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding](https://arxiv.org/abs/2202.13669) by Jiapeng Wang, Lianwen Jin, Kai Ding. -1. **[LLaMA](https://huggingface.co/docs/transformers/model_doc/llama)** (from The FAIR team of Meta AI) released with the paper [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971) by Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothรฉe Lacroix, Baptiste Roziรจre, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample. -1. 
**[Llama2](https://huggingface.co/docs/transformers/model_doc/llama2)** (from The FAIR team of Meta AI) released with the paper [Llama2: Open Foundation and Fine-Tuned Chat Models](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/XXX) by Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom. -1. **[LLaVa](https://huggingface.co/docs/transformers/model_doc/llava)** (from Microsoft Research & University of Wisconsin-Madison) released with the paper [Visual Instruction Tuning](https://arxiv.org/abs/2304.08485) by Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee. -1. **[Longformer](https://huggingface.co/docs/transformers/model_doc/longformer)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan. -1. **[LongT5](https://huggingface.co/docs/transformers/model_doc/longt5)** (from Google AI) released with the paper [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916) by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang. -1. **[LUKE](https://huggingface.co/docs/transformers/model_doc/luke)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto. -1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal. -1. **[M-CTC-T](https://huggingface.co/docs/transformers/model_doc/mctct)** (from Facebook) released with the paper [Pseudo-Labeling For Massively Multilingual Speech Recognition](https://arxiv.org/abs/2111.00161) by Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert. -1. **[M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin. -1. 
**[MADLAD-400](https://huggingface.co/docs/transformers/model_doc/madlad-400)** (from Google) released with the paper [MADLAD-400: A Multilingual And Document-Level Large Audited Dataset](https://arxiv.org/abs/2309.04662) by Sneha Kudugunta, Isaac Caswell, Biao Zhang, Xavier Garcia, Christopher A. Choquette-Choo, Katherine Lee, Derrick Xin, Aditya Kusupati, Romi Stella, Ankur Bapna, Orhan Firat. -1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jรถrg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team. -1. **[MarkupLM](https://huggingface.co/docs/transformers/model_doc/markuplm)** (from Microsoft Research Asia) released with the paper [MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding](https://arxiv.org/abs/2110.08518) by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei. -1. **[Mask2Former](https://huggingface.co/docs/transformers/model_doc/mask2former)** (from FAIR and UIUC) released with the paper [Masked-attention Mask Transformer for Universal Image Segmentation](https://arxiv.org/abs/2112.01527) by Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar. -1. **[MaskFormer](https://huggingface.co/docs/transformers/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov. -1. **[MatCha](https://huggingface.co/docs/transformers/model_doc/matcha)** (from Google AI) released with the paper [MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering](https://arxiv.org/abs/2212.09662) by Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, Julian Martin Eisenschlos. -1. **[mBART](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer. -1. **[mBART-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan. -1. **[MEGA](https://huggingface.co/docs/transformers/model_doc/mega)** (from Facebook) released with the paper [Mega: Moving Average Equipped Gated Attention](https://arxiv.org/abs/2209.10655) by Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, and Luke Zettlemoyer. -1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron-bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro. -1. 
**[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro. -1. **[MGP-STR](https://huggingface.co/docs/transformers/model_doc/mgp-str)** (from Alibaba Research) released with the paper [Multi-Granularity Prediction for Scene Text Recognition](https://arxiv.org/abs/2209.03592) by Peng Wang, Cheng Da, and Cong Yao. -1. **[Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)** (from Mistral AI) by The Mistral AI team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lรฉlio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothรฉe Lacroix, William El Sayed.. -1. **[Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lรฉlio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothรฉe Lacroix, William El Sayed. -1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka. -1. **[MMS](https://huggingface.co/docs/transformers/model_doc/mms)** (from Facebook) released with the paper [Scaling Speech Technology to 1,000+ Languages](https://arxiv.org/abs/2305.13516) by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli. -1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou. -1. **[MobileNetV1](https://huggingface.co/docs/transformers/model_doc/mobilenet_v1)** (from Google Inc.) released with the paper [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861) by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam. -1. **[MobileNetV2](https://huggingface.co/docs/transformers/model_doc/mobilenet_v2)** (from Google Inc.) released with the paper [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381) by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen. -1. **[MobileViT](https://huggingface.co/docs/transformers/model_doc/mobilevit)** (from Apple) released with the paper [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/abs/2110.02178) by Sachin Mehta and Mohammad Rastegari. -1. 
**[MobileViTV2](https://huggingface.co/docs/transformers/model_doc/mobilevitv2)** (from Apple) released with the paper [Separable Self-attention for Mobile Vision Transformers](https://arxiv.org/abs/2206.02680) by Sachin Mehta and Mohammad Rastegari. -1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu. -1. **[MPT](https://huggingface.co/docs/transformers/model_doc/mpt)** (from MosaiML) released with the repository [llm-foundry](https://github.com/mosaicml/llm-foundry/) by the MosaicML NLP Team. -1. **[MRA](https://huggingface.co/docs/transformers/model_doc/mra)** (from the University of Wisconsin - Madison) released with the paper [Multi Resolution Analysis (MRA)](https://arxiv.org/abs/2207.10284) by Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, Vikas Singh. -1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel. -1. **[MusicGen](https://huggingface.co/docs/transformers/model_doc/musicgen)** (from Meta) released with the paper [Simple and Controllable Music Generation](https://arxiv.org/abs/2306.05284) by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Dรฉfossez. -1. **[MVP](https://huggingface.co/docs/transformers/model_doc/mvp)** (from RUC AI Box) released with the paper [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https://arxiv.org/abs/2206.12131) by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen. -1. **[NAT](https://huggingface.co/docs/transformers/model_doc/nat)** (from SHI Labs) released with the paper [Neighborhood Attention Transformer](https://arxiv.org/abs/2204.07143) by Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi. -1. **[Nezha](https://huggingface.co/docs/transformers/model_doc/nezha)** (from Huawei Noahโ€™s Ark Lab) released with the paper [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) by Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu. -1. **[NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team. -1. **[NLLB-MOE](https://huggingface.co/docs/transformers/model_doc/nllb-moe)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team. -1. **[Nougat](https://huggingface.co/docs/transformers/model_doc/nougat)** (from Meta AI) released with the paper [Nougat: Neural Optical Understanding for Academic Documents](https://arxiv.org/abs/2308.13418) by Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic. -1. 
**[Nystrรถmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nystrรถmformer: A Nystrรถm-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh. -1. **[OneFormer](https://huggingface.co/docs/transformers/model_doc/oneformer)** (from SHI Labs) released with the paper [OneFormer: One Transformer to Rule Universal Image Segmentation](https://arxiv.org/abs/2211.06220) by Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi. -1. **[OpenLlama](https://huggingface.co/docs/transformers/model_doc/open-llama)** (from [s-JoL](https://huggingface.co/s-JoL)) released on GitHub (now removed). -1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al. -1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby. -1. **[OWLv2](https://huggingface.co/docs/transformers/model_doc/owlv2)** (from Google AI) released with the paper [Scaling Open-Vocabulary Object Detection](https://arxiv.org/abs/2306.09683) by Matthias Minderer, Alexey Gritsenko, Neil Houlsby. -1. **[PatchTSMixer](https://huggingface.co/docs/transformers/model_doc/patchtsmixer)** (from IBM Research) released with the paper [TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting](https://arxiv.org/pdf/2306.09364.pdf) by Vijay Ekambaram, Arindam Jati, Nam Nguyen, Phanwadee Sinthong, Jayant Kalagnanam. -1. **[PatchTST](https://huggingface.co/docs/transformers/model_doc/patchtst)** (from IBM) released with the paper [A Time Series is Worth 64 Words: Long-term Forecasting with Transformers](https://arxiv.org/pdf/2211.14730.pdf) by Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, Jayant Kalagnanam. -1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu. -1. **[PEGASUS-X](https://huggingface.co/docs/transformers/model_doc/pegasus_x)** (from Google) released with the paper [Investigating Efficiently Extending Transformers for Long Input Summarization](https://arxiv.org/abs/2208.04347) by Jason Phang, Yao Zhao, and Peter J. Liu. -1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hรฉnaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, Joรฃo Carreira. -1. 
**[Persimmon](https://huggingface.co/docs/transformers/model_doc/persimmon)** (from ADEPT) released with the paper [blog post](https://www.adept.ai/blog/persimmon-8b) by Erich Elsen, Augustus Odena, Maxwell Nye, SaฤŸnak TaลŸฤฑrlar, Tri Dao, Curtis Hawthorne, Deepak Moparthi, Arushi Somani. -1. **[Phi](https://huggingface.co/docs/transformers/model_doc/phi)** (from Microsoft) released with the papers - [Textbooks Are All You Need](https://arxiv.org/abs/2306.11644) by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio Cรฉsar Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sรฉbastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li, [Textbooks Are All You Need II: phi-1.5 technical report](https://arxiv.org/abs/2309.05463) by Yuanzhi Li, Sรฉbastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee. -1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen. -1. **[Pix2Struct](https://huggingface.co/docs/transformers/model_doc/pix2struct)** (from Google) released with the paper [Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding](https://arxiv.org/abs/2210.03347) by Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova. -1. **[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang. -1. **[PoolFormer](https://huggingface.co/docs/transformers/model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng. -1. **[Pop2Piano](https://huggingface.co/docs/transformers/model_doc/pop2piano)** released with the paper [Pop2Piano : Pop Audio-based Piano Cover Generation](https://arxiv.org/abs/2211.00895) by Jongho Choi, Kyogu Lee. -1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou. -1. **[PVT](https://huggingface.co/docs/transformers/model_doc/pvt)** (from Nanjing University, The University of Hong Kong etc.) released with the paper [Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions](https://arxiv.org/pdf/2102.12122.pdf) by Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao. -1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius. -1. 
**[RAG](https://huggingface.co/docs/transformers/model_doc/rag)** (from Facebook) released with the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Kรผttler, Mike Lewis, Wen-tau Yih, Tim Rocktรคschel, Sebastian Riedel, Douwe Kiela. -1. **[REALM](https://huggingface.co/docs/transformers/model_doc/realm.html)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang. -1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, ลukasz Kaiser, Anselm Levskaya. -1. **[RegNet](https://huggingface.co/docs/transformers/model_doc/regnet)** (from META Platforms) released with the paper [Designing Network Design Space](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollรกr. -1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/abs/2010.12821) by Hyung Won Chung, Thibault Fรฉvry, Henry Tsai, M. Johnson, Sebastian Ruder. -1. **[ResNet](https://huggingface.co/docs/transformers/model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. -1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. -1. **[RoBERTa-PreLayerNorm](https://huggingface.co/docs/transformers/model_doc/roberta-prelayernorm)** (from Facebook) released with the paper [fairseq: A Fast, Extensible Toolkit for Sequence Modeling](https://arxiv.org/abs/1904.01038) by Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli. -1. **[RoCBert](https://huggingface.co/docs/transformers/model_doc/roc_bert)** (from WeChatAI) released with the paper [RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining](https://aclanthology.org/2022.acl-long.65.pdf) by HuiSu, WeiweiShi, XiaoyuShen, XiaoZhou, TuoJi, JiaruiFang, JieZhou. -1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu. -1. **[RWKV](https://huggingface.co/docs/transformers/model_doc/rwkv)** (from Bo Peng) released with the paper [this repo](https://github.com/BlinkDL/RWKV-LM) by Bo Peng. -1. 
**[SeamlessM4T](https://huggingface.co/docs/transformers/model_doc/seamless_m4t)** (from Meta AI) released with the paper [SeamlessM4T โ€” Massively Multilingual & Multimodal Machine Translation](https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf) by the Seamless Communication team. -1. **[SeamlessM4Tv2](https://huggingface.co/docs/transformers/model_doc/seamless_m4t_v2)** (from Meta AI) released with the paper [Seamless: Multilingual Expressive and Streaming Speech Translation](https://ai.meta.com/research/publications/seamless-multilingual-expressive-and-streaming-speech-translation/) by the Seamless Communication team. -1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo. -1. **[Segment Anything](https://huggingface.co/docs/transformers/model_doc/sam)** (from Meta AI) released with the paper [Segment Anything](https://arxiv.org/pdf/2304.02643v1.pdf) by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick. -1. **[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi. -1. **[SEW-D](https://huggingface.co/docs/transformers/model_doc/sew_d)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi. -1. **[SpeechT5](https://huggingface.co/docs/transformers/model_doc/speecht5)** (from Microsoft Research) released with the paper [SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing](https://arxiv.org/abs/2110.07205) by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei. -1. **[SpeechToTextTransformer](https://huggingface.co/docs/transformers/model_doc/speech_to_text)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino. -1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (from Facebook), released together with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau. -1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (from Tel Aviv University), released together with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy. -1. 
**[SqueezeBERT](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer. -1. **[SwiftFormer](https://huggingface.co/docs/transformers/model_doc/swiftformer)** (from MBZUAI) released with the paper [SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications](https://arxiv.org/abs/2303.15446) by Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan. -1. **[Swin Transformer](https://huggingface.co/docs/transformers/model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo. -1. **[Swin Transformer V2](https://huggingface.co/docs/transformers/model_doc/swinv2)** (from Microsoft) released with the paper [Swin Transformer V2: Scaling Up Capacity and Resolution](https://arxiv.org/abs/2111.09883) by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo. -1. **[Swin2SR](https://huggingface.co/docs/transformers/model_doc/swin2sr)** (from University of Wรผrzburg) released with the paper [Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration](https://arxiv.org/abs/2209.11345) by Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte. -1. **[SwitchTransformers](https://huggingface.co/docs/transformers/model_doc/switch_transformers)** (from Google) released with the paper [Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity](https://arxiv.org/abs/2101.03961) by William Fedus, Barret Zoph, Noam Shazeer. -1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu. -1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu. -1. **[Table Transformer](https://huggingface.co/docs/transformers/model_doc/table-transformer)** (from Microsoft Research) released with the paper [PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents](https://arxiv.org/abs/2110.00061) by Brandon Smock, Rohith Pesala, Robin Abraham. -1. **[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweล‚ Krzysztof Nowak, Thomas Mรผller, Francesco Piccinno and Julian Martin Eisenschlos. -1. 
**[TAPEX](https://huggingface.co/docs/transformers/model_doc/tapex)** (from Microsoft Research) released with the paper [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. -1. **[Time Series Transformer](https://huggingface.co/docs/transformers/model_doc/time_series_transformer)** (from HuggingFace). -1. **[TimeSformer](https://huggingface.co/docs/transformers/model_doc/timesformer)** (from Facebook) released with the paper [Is Space-Time Attention All You Need for Video Understanding?](https://arxiv.org/abs/2102.05095) by Gedas Bertasius, Heng Wang, Lorenzo Torresani. -1. **[Trajectory Transformer](https://huggingface.co/docs/transformers/model_doc/trajectory_transformers)** (from the University of California at Berkeley) released with the paper [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https://arxiv.org/abs/2106.02039) by Michael Janner, Qiyang Li, Sergey Levine -1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov. -1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei. -1. **[TVLT](https://huggingface.co/docs/transformers/model_doc/tvlt)** (from UNC Chapel Hill) released with the paper [TVLT: Textless Vision-Language Transformer](https://arxiv.org/abs/2209.14156) by Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal. -1. **[TVP](https://huggingface.co/docs/transformers/model_doc/tvp)** (from Intel) released with the paper [Text-Visual Prompting for Efficient 2D Temporal Video Grounding](https://arxiv.org/abs/2303.04995) by Yimeng Zhang, Xin Chen, Jinghan Jia, Sijia Liu, Ke Ding. -1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler -1. **[UMT5](https://huggingface.co/docs/transformers/model_doc/umt5)** (from Google Research) released with the paper [UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining](https://openreview.net/forum?id=kXwdL1cWOAi) by Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant. -1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang. -1. 
**[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu. -1. **[UnivNet](https://huggingface.co/docs/transformers/model_doc/univnet)** (from Kakao Corporation) released with the paper [UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation](https://arxiv.org/abs/2106.07889) by Won Jang, Dan Lim, Jaesam Yoon, Bongwan Kim, and Juntae Kim. -1. **[UPerNet](https://huggingface.co/docs/transformers/model_doc/upernet)** (from Peking University) released with the paper [Unified Perceptual Parsing for Scene Understanding](https://arxiv.org/abs/1807.10221) by Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun. -1. **[VAN](https://huggingface.co/docs/transformers/model_doc/van)** (from Tsinghua University and Nankai University) released with the paper [Visual Attention Network](https://arxiv.org/abs/2202.09741) by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu. -1. **[VideoMAE](https://huggingface.co/docs/transformers/model_doc/videomae)** (from Multimedia Computing Group, Nanjing University) released with the paper [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) by Zhan Tong, Yibing Song, Jue Wang, Limin Wang. -1. **[ViLT](https://huggingface.co/docs/transformers/model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim. -1. **[VipLlava](https://huggingface.co/docs/transformers/model_doc/vipllava)** (from University of Wisconsinโ€“Madison) released with the paper [Making Large Multimodal Models Understand Arbitrary Visual Prompts](https://arxiv.org/abs/2312.00784) by Mu Cai, Haotian Liu, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Dennis Park, Yong Jae Lee. -1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby. -1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang. -1. **[ViT Hybrid](https://huggingface.co/docs/transformers/model_doc/vit_hybrid)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby. -1. 
**[VitDet](https://huggingface.co/docs/transformers/model_doc/vitdet)** (from Meta AI) released with the paper [Exploring Plain Vision Transformer Backbones for Object Detection](https://arxiv.org/abs/2203.16527) by Yanghao Li, Hanzi Mao, Ross Girshick, Kaiming He. -1. **[ViTMAE](https://huggingface.co/docs/transformers/model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollรกr, Ross Girshick. -1. **[ViTMatte](https://huggingface.co/docs/transformers/model_doc/vitmatte)** (from HUST-VL) released with the paper [ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers](https://arxiv.org/abs/2305.15272) by Jingfeng Yao, Xinggang Wang, Shusheng Yang, Baoyuan Wang. -1. **[ViTMSN](https://huggingface.co/docs/transformers/model_doc/vit_msn)** (from Meta AI) released with the paper [Masked Siamese Networks for Label-Efficient Learning](https://arxiv.org/abs/2204.07141) by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas. -1. **[VITS](https://huggingface.co/docs/transformers/model_doc/vits)** (from Kakao Enterprise) released with the paper [Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech](https://arxiv.org/abs/2106.06103) by Jaehyeon Kim, Jungil Kong, Juhee Son. -1. **[ViViT](https://huggingface.co/docs/transformers/model_doc/vivit)** (from Google Research) released with the paper [ViViT: A Video Vision Transformer](https://arxiv.org/abs/2103.15691) by Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Luฤiฤ‡, Cordelia Schmid. -1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli. -1. **[Wav2Vec2-Conformer](https://huggingface.co/docs/transformers/model_doc/wav2vec2-conformer)** (from Facebook AI) released with the paper [FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino. -1. **[Wav2Vec2Phoneme](https://huggingface.co/docs/transformers/model_doc/wav2vec2_phoneme)** (from Facebook AI) released with the paper [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https://arxiv.org/abs/2109.11680) by Qiantong Xu, Alexei Baevski, Michael Auli. -1. **[WavLM](https://huggingface.co/docs/transformers/model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei. -1. **[Whisper](https://huggingface.co/docs/transformers/model_doc/whisper)** (from OpenAI) released with the paper [Robust Speech Recognition via Large-Scale Weak Supervision](https://cdn.openai.com/papers/whisper.pdf) by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever. -1. 
**[X-CLIP](https://huggingface.co/docs/transformers/model_doc/xclip)** (from Microsoft Research) released with the paper [Expanding Language-Image Pretrained Models for General Video Recognition](https://arxiv.org/abs/2208.02816) by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling. -1. **[X-MOD](https://huggingface.co/docs/transformers/model_doc/xmod)** (from Meta AI) released with the paper [Lifting the Curse of Multilinguality by Pre-training Modular Transformers](http://dx.doi.org/10.18653/v1/2022.naacl-main.255) by Jonas Pfeiffer, Naman Goyal, Xi Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe. -1. **[XGLM](https://huggingface.co/docs/transformers/model_doc/xglm)** (From Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li. -1. **[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau. -1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou. -1. **[XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlm-roberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmรกn, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. -1. **[XLM-RoBERTa-XL](https://huggingface.co/docs/transformers/model_doc/xlm-roberta-xl)** (from Facebook AI), released together with the paper [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau. -1. **[XLM-V](https://huggingface.co/docs/transformers/model_doc/xlm-v)** (from Meta AI) released with the paper [XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models](https://arxiv.org/abs/2301.10472) by Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa. -1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (from Google/CMU) released with the paper [โ€‹XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le. -1. 
**[XLS-R](https://huggingface.co/docs/transformers/model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
-1. **[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
-1. **[YOLOS](https://huggingface.co/docs/transformers/model_doc/yolos)** (from Huazhong University of Science & Technology) released with the paper [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu.
-1. **[YOSO](https://huggingface.co/docs/transformers/model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
-1. Want to contribute a new model? We have added a **detailed guide and templates** to walk you through the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contribution guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to gather feedback before starting your PR.
-
-To check whether each model has an implementation in Flax, PyTorch or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to [this table](https://huggingface.co/docs/transformers/index#supported-frameworks).
-
-These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations. You can find more details on performance in the Examples section of the [documentation](https://github.com/huggingface/transformers/tree/main/examples).
-
-
-## Learn more
-
-| Section | Description |
-|-|-|
-| [Documentation](https://huggingface.co/docs/transformers/) | Full API documentation and tutorials |
-| [Task summary](https://huggingface.co/docs/transformers/task_summary) | Tasks supported by 🤗 Transformers |
-| [Preprocessing tutorial](https://huggingface.co/docs/transformers/preprocessing) | Using the `Tokenizer` class to prepare data for the models |
-| [Training and fine-tuning](https://huggingface.co/docs/transformers/training) | Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop and with the `Trainer` API |
-| [Quick tour: fine-tuning/usage scripts](https://github.com/huggingface/transformers/tree/main/examples) | Example scripts for fine-tuning models on a wide range of tasks |
-| [Model sharing and uploading](https://huggingface.co/docs/transformers/model_sharing) | Upload and share your fine-tuned models with the community |
-| [Migration](https://huggingface.co/docs/transformers/migration) | Migrate to 🤗 Transformers from `pytorch-transformers` or `pytorch-pretrained-bert` |
-
-## Citation
-
-We now have a [paper](https://www.aclweb.org/anthology/2020.emnlp-demos.6/) you can cite for the 🤗 Transformers library:
-```bibtex
-@inproceedings{wolf-etal-2020-transformers,
-    title = "Transformers: State-of-the-Art Natural Language Processing",
-    author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush",
-    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
-    month = oct,
-    year = "2020",
-    address = "Online",
-    publisher = "Association for Computational Linguistics",
-    url = "https://www.aclweb.org/anthology/2020.emnlp-demos.6",
-    pages = "38--45"
-}
-```
diff --git a/README_hd.md b/README_hd.md
deleted file mode 100644
index 35e21548e6063f..00000000000000
--- a/README_hd.md
+++ /dev/null
@@ -1,519 +0,0 @@
-
-
-

-
-[README header: Hugging Face logo and badges (Build | GitHub | Documentation | GitHub release | Contributor Covenant | DOI)]
-
-English | 简体中文 | 繁體中文 | 한국어 | Español | 日本語 | हिन्दी | తెలుగు
-
-State-of-the-art Machine Learning for Jax, PyTorch and TensorFlow
-
-
-🤗 Transformers provides thousands of pretrained models to perform text classification, information extraction, question answering, summarization, translation and text generation in more than 100 languages. Its aim is to make state-of-the-art NLP accessible to everyone.
-
-🤗 Transformers provides an API to quickly download and use a pretrained model on a given text, fine-tune it on your own dataset and share it with the community through the [model hub](https://huggingface.co/models). At the same time, each Python module defining an architecture is fully standalone, which makes it convenient to modify and to run quick research experiments.
-
-🤗 Transformers is backed by the three most popular deep learning libraries, [Jax](https://jax.readthedocs.io/en/latest/), [PyTorch](https://pytorch.org/) and [TensorFlow](https://www.tensorflow.org/), with seamless integration between them. You can train your models directly with one framework and load them for inference with the other.
-
-## Online demos
-
-You can test most of our models directly on their pages on the [model hub](https://huggingface.co/models). We also offer [private model hosting, model versioning, and an inference API](https://huggingface.co/pricing).
-
-Here are a few examples:
-- [Masked word completion with BERT](https://huggingface.co/bert-base-uncased?text=Paris+is+the+%5BMASK%5D+of+France)
-- [Named entity recognition with Electra](https://huggingface.co/dbmdz/electra-large-discriminator-finetuned-conll03-english?text=My+name+is+Sarah+and+I+live+in+London+city)
-- [Text generation with GPT-2](https://huggingface.co/gpt2?text=A+long+time+ago%2C+)
-- [Natural language inference with RoBERTa](https://huggingface.co/roberta-large-mnli?text=The+dog+was+lost.+Nobody+lost+any+animal)
-- [Summarization with BART](https://huggingface.co/facebook/bart-large-cnn?text=The+tower+is+324+metres+%281%2C063+ft%29+tall%2C+about+the+same+height+as+an+81-storey+building%2C+and+the+tallest+structure+in+Paris.+Its+base+is+square%2C+measuring+125+metres+%28410+ft%29+on+each+side.+During+its+construction%2C+the+Eiffel+Tower+surpassed+the+Washington+Monument+to+become+the+tallest+man-made+structure+in+the+world%2C+a+title+it+held+for+41+years+until+the+Chrysler+Building+in+New+York+City+was+finished+in+1930.+It+was+the+first+structure+to+reach+a+height+of+300+metres.+Due+to+the+addition+of+a+broadcasting+aerial+at+the+top+of+the+tower+in+1957%2C+it+is+now+taller+than+the+Chrysler+Building+by+5.2+metres+%2817+ft%29.+Excluding+transmitters%2C+the+Eiffel+Tower+is+the+second+tallest+free-standing+structure+in+France+after+the+Millau+Viaduct)
-- [Question answering with DistilBERT](https://huggingface.co/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species)
-- [Translation with T5](https://huggingface.co/t5-base?text=My+name+is+Wolfgang+and+I+live+in+Berlin)
-
-**[Write With Transformer](https://transformer.huggingface.co)**, built by the Hugging Face team, is the official text generation demo.
-
-## If you are looking for bespoke support from the Hugging Face team
-
-HuggingFace Expert Acceleration Program
-
-## Quick tour
-
-We provide the `pipeline` API for quickly using models. A pipeline bundles a pretrained model with the corresponding text preprocessing. Here is a quick example of using a pipeline to classify positive versus negative sentiment:
-
-```python
->>> from transformers import pipeline
-
-# Using the sentiment analysis pipeline
->>> classifier = pipeline('sentiment-analysis')
->>> classifier('We are very happy to introduce pipeline to the transformers repository.')
-[{'label': 'POSITIVE', 'score': 0.9996980428695679}]
-```
-
-The second line of code downloads and caches the pretrained model used by the pipeline, while the third line evaluates it on the given text. Here the answer is "POSITIVE" with a confidence level of 99%.
-
-Many NLP tasks have a pretrained pipeline available out of the box. For example, we can easily extract the answer to a question from a given text:
-
-```python
->>> from transformers import pipeline
-
-# Using the question answering pipeline
->>> question_answerer = pipeline('question-answering')
->>> question_answerer({
-...     'question': 'What is the name of the repository ?',
-...     'context': 'Pipeline has been included in the huggingface/transformers repository'
-... })
-{'score': 0.30970096588134766, 'start': 34, 'end': 58, 'answer': 'huggingface/transformers'}
-
-```
-
-In addition to the answer, the pretrained model returns the corresponding confidence score, together with the start and end positions of the answer in the tokenized text. You can learn more about the tasks supported by the pipeline API in [this tutorial](https://huggingface.co/docs/transformers/task_summary).
-
-Downloading and using any pretrained model on your own task is just as simple and takes only three lines of code. Here is an example for the PyTorch version:
-```python
->>> from transformers import AutoTokenizer, AutoModel
-
->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
->>> model = AutoModel.from_pretrained("bert-base-uncased")
-
->>> inputs = tokenizer("Hello world!", return_tensors="pt")
->>> outputs = model(**inputs)
-```
-And here is the equivalent TensorFlow code:
-```python
->>> from transformers import AutoTokenizer, TFAutoModel
-
->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
->>> model = TFAutoModel.from_pretrained("bert-base-uncased")
-
->>> inputs = tokenizer("Hello world!", return_tensors="tf")
->>> outputs = model(**inputs)
-```
-
-The tokenizer provides the preprocessing for all pretrained models and can be called directly on a string (as in the examples above) or on a list. It outputs a dictionary (dict) that you can use in downstream code or pass directly to the model using the `**` argument unpacking expression.
-
-The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) or [TensorFlow `tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) (depending on your backend) which you can use as usual. [This tutorial](https://huggingface.co/transformers/training.html) explains how to integrate such a model into a classic PyTorch or TensorFlow training loop, or how to use our `Trainer` API to quickly fine-tune it on a new dataset.
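-
-As a minimal sketch of that same flow with a task-specific head (the checkpoint name and label lookup below are only illustrative choices, not something the quick tour prescribes), the dictionary returned by the tokenizer can be unpacked straight into a sequence classification model:
-
-```python
->>> import torch
->>> from transformers import AutoTokenizer, AutoModelForSequenceClassification
-
->>> # Example checkpoint only; any sequence classification checkpoint would do.
->>> checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
->>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
->>> model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
-
->>> # The tokenizer output is a dict of tensors, unpacked into the model with **.
->>> inputs = tokenizer("We are very happy to introduce pipeline to the transformers repository.", return_tensors="pt")
->>> with torch.no_grad():
-...     logits = model(**inputs).logits
->>> model.config.id2label[logits.argmax(-1).item()]
-'POSITIVE'
-```
-
-The same dictionary could be fed to the TensorFlow counterpart (`TFAutoModelForSequenceClassification`) by requesting `return_tensors="tf"` instead.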
-
-## Why should I use transformers?
-
-1. Easy-to-use state-of-the-art models:
-    - High performance on NLU and NLG tasks
-    - Low barrier to entry, friendly for teaching and practice
-    - User-facing abstractions: only three classes to learn
-    - A unified API for all models
-
-1. Lower compute costs and a smaller carbon footprint:
-    - Researchers can share trained models instead of retraining from scratch every time
-    - Practitioners can reduce compute time and production overhead
-    - Dozens of model architectures, more than 2,000 pretrained models, support for over 100 languages
-
-1. Covers every part of the model lifecycle:
-    - Train state-of-the-art models in just 3 lines of code
-    - Move a model between different deep learning frameworks at will
-    - Seamlessly pick the most suitable framework for training, evaluation and production
-
-1. Easily customize a model or an example to your needs:
-    - We provide multiple use cases for every model architecture to reproduce the original paper results
-    - The internal structure of the models stays transparent and consistent
-    - Model files can be used independently, which is convenient for modification and quick experiments
-
-## When should I not use transformers?
-
-- This library is not a modular toolbox of building blocks for neural networks. The code in the model files is deliberately kept bare-bones, without extra abstraction layers, so that researchers can iterate quickly without wading through abstractions and file jumping.
-- The `Trainer` API is not compatible with arbitrary models; it is optimized for the models of this library. If you are looking for a generic training loop implementation for machine learning, look elsewhere.
-- Despite our best efforts, the scripts in the [examples directory](https://github.com/huggingface/transformers/tree/main/examples) are only use cases. They will not necessarily work out of the box on your specific problem, and you may need to adapt a few lines of code.
-
-## Installation
-
-### With pip
-
-This repository is tested on Python 3.8+, Flax 0.4.1+, PyTorch 1.10+ and TensorFlow 2.6+.
-
-You can install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you are not yet familiar with Python virtual environments, please read the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
-
-First, create a virtual environment with the version of Python you plan to use and activate it.
-
-Then you will need to install at least one of Flax, PyTorch or TensorFlow. To install these frameworks on your platform, see the [TensorFlow installation page](https://www.tensorflow.org/install/), the [PyTorch installation page](https://pytorch.org/get-started/locally) and/or the
[Flax installation page](https://github.com/google/flax#quick-install).
-
-When one of these backends has been installed, 🤗 Transformers can be installed as follows:
-
-```bash
-pip install transformers
-```
-
-If you want to play with the use cases or need the latest in-development code before an official release, you have to [install from source](https://huggingface.co/docs/transformers/installation#installing-from-source).
-
-### With conda
-
-Since Transformers version 4.0.0, we have a conda channel: `huggingface`.
-
-🤗 Transformers can be installed with conda as follows:
-
-```shell script
-conda install -c huggingface transformers
-```
-
-To install Flax, PyTorch or TensorFlow with conda, see their respective installation pages for instructions.
-
-## Model architectures
-
-[**All model checkpoints**](https://huggingface.co/models) provided by 🤗 Transformers and uploaded by [users](https://huggingface.co/users) and [organizations](https://huggingface.co/organizations) are seamlessly integrated with the huggingface.co [model hub](https://huggingface.co).
-
-Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen)
-
-🤗 Transformers currently provides the following architectures (see [here](https://huggingface.co/docs/transformers/model_summary) for a high-level overview of each of them):
-
-1. **[ALBERT](https://huggingface.co/docs/transformers/model_doc/albert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942) by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
-1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (from Google Research) released with the paper [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918) by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.
-1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (from BAAI) released with the paper [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell.
-1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (from MIT) released with the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Yuan Gong, Yu-An Chung, James Glass.
-1. **[Autoformer](https://huggingface.co/docs/transformers/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long.
-1. **[Bark](https://huggingface.co/docs/transformers/model_doc/bark)** (from Suno) released in the repository [suno-ai/bark](https://github.com/suno-ai/bark) by Suno AI team.
-1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
-1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
-1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
-1. **[BEiT](https://huggingface.co/docs/transformers/model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong, Furu Wei.
-1. **[BERT](https://huggingface.co/docs/transformers/model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
-1. **[BERT For Sequence Generation](https://huggingface.co/docs/transformers/model_doc/bert-generation)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
-1. **[BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
-1. **[BigBird-Pegasus](https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
-1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
-1. **[BioGpt](https://huggingface.co/docs/transformers/model_doc/biogpt)** (from Microsoft Research AI4Science) released with the paper [BioGPT: generative pre-trained transformer for biomedical text generation and mining](https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbac409/6713511?guestAccessKey=a66d9b5d-4f83-4017-bb52-405815c907b9) by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu.
-1. **[BiT](https://huggingface.co/docs/transformers/model_doc/bit)** (from Google AI) released with the paper [Big Transfer (BiT): General Visual Representation Learning](https://arxiv.org/abs/1912.11370) by Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby.
-1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
-1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
-1. **[BLIP](https://huggingface.co/docs/transformers/model_doc/blip)** (from Salesforce) released with the paper [BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation](https://arxiv.org/abs/2201.12086) by Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi.
-1. **[BLIP-2](https://huggingface.co/docs/transformers/model_doc/blip-2)** (from Salesforce) released with the paper [BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models](https://arxiv.org/abs/2301.12597) by Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi.
-1. **[BLOOM](https://huggingface.co/docs/transformers/model_doc/bloom)** (from BigScience workshop) released by the [BigScience Workshop](https://bigscience.huggingface.co/).
-1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
-1. **[BridgeTower](https://huggingface.co/docs/transformers/model_doc/bridgetower)** (from Harbin Institute of Technology/Microsoft Research Asia/Intel Labs) released with the paper [BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning](https://arxiv.org/abs/2206.08657) by Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan.
-1. **[BROS](https://huggingface.co/docs/transformers/model_doc/bros)** (from NAVER CLOVA) released with the paper [BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents](https://arxiv.org/abs/2108.04539) by Teakgyu Hong, Donghyun Kim, Mingi Ji, Wonseok Hwang, Daehyun Nam, Sungrae Park.
-1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
-1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
-1. **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
-1. **[Chinese-CLIP](https://huggingface.co/docs/transformers/model_doc/chinese_clip)** (from OFA-Sys) released with the paper [Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese](https://arxiv.org/abs/2211.01335) by An Yang, Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou, Chang Zhou.
-1. **[CLAP](https://huggingface.co/docs/transformers/model_doc/clap)** (from LAION-AI) by Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov.
เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation](https://arxiv.org/abs/2211.06687) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (OpenAI เคธเฅ‡) เคธเคพเคฅ เคตเคพเคฒเคพ เคชเฅ‡เคชเคฐ [เคฒเคฐเฅเคจเคฟเค‚เค— เคŸเฅเคฐเคพเค‚เคธเคซเคฐเฅ‡เคฌเคฒ เคตเคฟเคœเฅเค…เคฒ เคฎเฅ‰เคกเคฒ เคซเฅเคฐเฅ‰เคฎ เคจเฅ‡เคšเฅเคฐเคฒ เคฒเฅˆเค‚เค—เฅเคตเฅ‡เคœ เคธเฅเคชเคฐเคตเคฟเคœเคจ](https://arxiv.org /abs/2103.00020) เคเคฒเฅ‡เค• เคฐเฅˆเคกเคซเฅ‹เคฐเฅเคก, เคœเฅ‹เค‚เค— เคตเฅ‚เค• เค•เคฟเคฎ, เค•เฅเคฐเคฟเคธ เคนเฅˆเคฒเคพเคธเฅ€, เค†เคฆเคฟเคคเฅเคฏ เคฐเคฎเฅ‡เคถ, เค—เฅ‡เคฌเฅเคฐเคฟเคฏเคฒ เค—เฅ‹เคน, เคธเค‚เคงเฅเคฏเคพ เค…เค—เฅเคฐเคตเคพเคฒ, เค—เคฟเคฐเฅ€เคถ เคถเคพเคธเฅเคคเฅเคฐเฅ€, เค…เคฎเคพเค‚เคกเคพ เคเคธเฅเค•เฅ‡เคฒ, เคชเคพเคฎเฅ‡เคฒเคพ เคฎเคฟเคถเฅเค•เคฟเคจ, เคœเฅˆเค• เค•เฅเคฒเคพเคฐเฅเค•, เค—เฅเคฐเฅ‡เคšเฅ‡เคจ เค•เฅเคฐเฅเคเค—เคฐ, เค‡เคฒเฅเคฏเคพ เคธเฅเคคเฅเคธเฅเค•เฅ‡เคตเคฐ เคฆเฅเคตเคพเคฐเคพเฅค -1. **[CLIPSeg](https://huggingface.co/docs/transformers/model_doc/clipseg)** (from University of Gรถttingen) released with the paper [Image Segmentation Using Text and Image Prompts](https://arxiv.org/abs/2112.10003) by Timo Lรผddecke and Alexander Ecker. -1. **[CLVP](https://huggingface.co/docs/transformers/model_doc/clvp)** released with the paper [Better speech synthesis through scaling](https://arxiv.org/abs/2305.07243) by James Betker. -1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (เคธเฅ‡เคฒเฅเคธเคซเฅ‹เคฐเฅเคธ เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เคชเฅ‡เคชเคฐ [เคชเฅเคฐเฅ‹เค—เฅเคฐเคพเคฎ เคธเคฟเค‚เคฅเฅ‡เคธเคฟเคธ เค•เฅ‡ เคฒเคฟเค เคเค• เคธเค‚เคตเคพเคฆเคพเคคเฅเคฎเค• เคชเฅเคฐเคคเคฟเคฎเคพเคจ](https://arxiv.org/abs/2203.13474) เคเคฐเคฟเค• เคจเคฟเคœเค•เฅˆเค‚เคช, เคฌเฅ‹ เคชเฅˆเค‚เค—, เคนเคฟเคฐเฅ‹เค†เค•เฅ€ เคนเคฏเคพเคถเฅ€, เคฒเคฟเคซเฅ‚ เคคเฅ‚, เคนเฅเค†เคจ เคตเคพเค‚เค—, เคฏเคฟเค‚เค—เคฌเฅ‹ เคเฅ‹เค‰, เคธเคฟเคฒเฅเคตเคฟเคฏเฅ‹ เคธเคพเคตเคฐเฅ‡เคธ, เค•เฅˆเคฎเคฟเค‚เค— เคœเคฟเค“เค‚เค— เคฐเคฟเคฒเฅ€เคœเฅค -1. **[CodeLlama](https://huggingface.co/docs/transformers/model_doc/llama_code)** (MetaAI เคธเฅ‡) Baptiste Roziรจre, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jรฉrรฉmy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Dรฉfossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve. เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [Code Llama: Open Foundation Models for Code](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. **[Conditional DETR](https://huggingface.co/docs/transformers/model_doc/conditional_detr)** (เคฎเคพเค‡เค•เฅเคฐเฅ‹เคธเฅ‰เคซเฅเคŸ เคฐเคฟเคธเคฐเฅเคš เคเคถเคฟเคฏเคพ เคธเฅ‡) เค•เคพเค—เคœ เค•เฅ‡ เคธเคพเคฅ [เคซเคพเคธเฅเคŸ เคŸเฅเคฐเฅ‡เคจเคฟเค‚เค— เค•เคจเฅเคตเคฐเฅเคœเฅ‡เค‚เคธ เค•เฅ‡ เคฒเคฟเค เคธเคถเคฐเฅเคค เคกเฅ€เคˆเคŸเฅ€เค†เคฐ](https://arxiv. org/abs/2108.06152) เคกเฅ‡เคชเฅ‚ เคฎเฅ‡เค‚เค—, เคœเคผเคฟเคฏเคพเค“เค•เคพเค‚เค— เคšเฅ‡เคจ, เคœเคผเฅ‡เคœเคฟเคฏเคพ เคซเฅˆเคจ, เค—เฅˆเค‚เค— เคœเคผเฅ‡เค‚เค—, เคนเฅ‹เค‰เค•เคฟเคฏเคพเค‚เค— เคฒเฅ€, เคฏเฅเคนเฅเคˆ เคฏเฅเค†เคจ, เคฒเฅ‡เคˆ เคธเคจ, เคœเคฟเค‚เค—เคกเฅ‹เค‚เค— เคตเคพเค‚เค— เคฆเฅเคตเคพเคฐเคพเฅค -1. 
-1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
-1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
-1. **[ConvNeXTV2](https://huggingface.co/docs/transformers/model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
-1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
-1. **[CPM-Ant](https://huggingface.co/docs/transformers/model_doc/cpmant)** (from OpenBMB) released by the [OpenBMB](https://www.openbmb.org/).
-1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
-1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
-1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
-1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
-1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
-1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
-1. **[Deformable DETR](https://huggingface.co/docs/transformers/model_doc/deformable_detr)** (from SenseTime Research) released with the paper [Deformable DETR: Deformable Transformers for End-to-End Object Detection](https://arxiv.org/abs/2010.04159) by Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai.
-1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
-1. **[DePlot](https://huggingface.co/docs/transformers/model_doc/deplot)** (from Google AI) released with the paper [DePlot: One-shot visual language reasoning by plot-to-table translation](https://arxiv.org/abs/2212.10505) by Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun.
-1. **[DETA](https://huggingface.co/docs/transformers/model_doc/deta)** (from The University of Texas at Austin) released with the paper [NMS Strikes Back](https://arxiv.org/abs/2212.06137) by Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl.
-1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
-1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
-1. **[DiNAT](https://huggingface.co/docs/transformers/model_doc/dinat)** (from SHI Labs) released with the paper [Dilated Neighborhood Attention Transformer](https://arxiv.org/abs/2209.15001) by Ali Hassani and Humphrey Shi.
-1. **[DINOv2](https://huggingface.co/docs/transformers/model_doc/dinov2)** (from Meta AI) by Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski.
เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [DINOv2: Learning Robust Visual Features without Supervision](https://arxiv.org/abs/2304.07193) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (เคนเค—เคฟเค‚เค—เคซเฅ‡เคธ เคธเฅ‡), เคธเคพเคฅ เคฎเฅ‡เค‚ เค•เคพเค—เคœ [เคกเคฟเคธเฅเคŸเคฟเคฒเคฌเคฐเฅเคŸ, เคฌเฅ€เคˆเค†เคฐเคŸเฅ€ เค•เคพ เคกเคฟเคธเฅเคŸเคฟเคฒเฅเคก เคตเคฐเฅเคœเคจ: เค›เฅ‹เคŸเคพ, เคคเฅ‡เคœ, เคธเคธเฅเคคเคพ เค”เคฐ เคนเคฒเฅเค•เคพ] (https://arxiv.org/abs/1910.01108) เคตเคฟเค•เฅเคŸเคฐ เคธเคจเคน, เคฒเคฟเคธเคพเค‚เคกเฅเคฐเฅ‡ เคกเฅ‡เคฌเฅเคฏเฅ‚ เค”เคฐ เคฅเฅ‰เคฎเคธ เคตเฅเคฒเฅเคซ เคฆเฅเคตเคพเคฐเคพ เคชเฅ‹เคธเฅเคŸ เค•เคฟเคฏเคพ เค—เคฏเคพเฅค เคฏเคนเฅ€ เคคเคฐเฅ€เค•เคพ GPT-2 เค•เฅ‹ [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/distillation), RoBERta เคธเฅ‡ [DistilRoBERta](https://github.com) เคชเคฐ เค•เค‚เคชเฅเคฐเฅ‡เคธ เค•เคฐเคจเฅ‡ เค•เฅ‡ เคฒเคฟเค เคญเฅ€ เคฒเคพเค—เฅ‚ เค•เคฟเคฏเคพ เคœเคพเคคเคพ เคนเฅˆเฅค / เคนเค—เคฟเค‚เค—เคซเฅ‡เคธ/เคŸเฅเคฐเคพเค‚เคธเคซเฅ‰เคฐเฅเคฎเคฐเฅเคธ/เคŸเฅเคฐเฅ€/เคฎเฅ‡เคจ/เค‰เคฆเคพเคนเคฐเคฃ/เคกเคฟเคธเฅเคŸเคฟเคฒเฅ‡เคถเคจ), เคฌเคนเฅเคญเคพเคทเฅ€ BERT เคธเฅ‡ [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/distillation) เค”เคฐ เคกเคฟเคธเฅเคŸเคฟเคฒเคฌเคฐเฅเคŸ เค•เคพ เคœเคฐเฅเคฎเคจ เคธเค‚เคธเฅเค•เคฐเคฃเฅค -1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (เคฎเคพเค‡เค•เฅเคฐเฅ‹เคธเฅ‰เคซเฅเคŸ เคฐเคฟเคธเคฐเฅเคš เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เคชเฅ‡เคชเคฐ [DiT: เคธเฅ‡เคฒเฅเคซ เคธเฅเคชเคฐเคตเคพเค‡เคœเฅเคก เคชเฅเคฐเฅ€-เคŸเฅเคฐเฅ‡เคจเคฟเค‚เค— เคซเฅ‰เคฐ เคกเฅ‰เค•เฅเคฏเฅ‚เคฎเฅ‡เค‚เคŸ เค‡เคฎเฅ‡เคœ เคŸเฅเคฐเคพเค‚เคธเคซเฅ‰เคฐเฅเคฎเคฐ](https://arxiv.org/abs/2203.02378) เคœเฅเคจเคฒเฅ‰เคจเฅเค— เคฒเฅ€, เคฏเคฟเคนเฅ‡เค‚เค— เคœเฅ‚, เคŸเฅ‡เค‚เค—เคšเคพเค“ เคฒเคต, เคฒเฅ‡เคˆ เค•เฅเคˆ, เคšเคพ เคเคพเค‚เค— เคฆเฅเคตเคพเคฐเคพ เคซเฅเคฐเฅ เคตเฅ‡เคˆ เคฆเฅเคตเคพเคฐเคพ เคชเฅ‹เคธเฅเคŸ เค•เคฟเคฏเคพ เค—เคฏเคพเฅค -1. **[Donut](https://huggingface.co/docs/transformers/model_doc/donut)** (NAVER เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เค•เคพเค—เคœ [OCR-เคฎเฅเค•เฅเคค เคกเฅ‰เค•เฅเคฏเฅ‚เคฎเฅ‡เค‚เคŸ เค…เค‚เคกเคฐเคธเฅเคŸเฅˆเค‚เคกเคฟเค‚เค— เคŸเฅเคฐเคพเค‚เคธเคซเฅ‰เคฐเฅเคฎเคฐ](https://arxiv.org/abs /2111.15664) เค—เฅ€เคตเฅ‚เค• เค•เคฟเคฎ, เคŸเฅ€เค•เค—เฅเคฏเฅ‚ เคนเฅ‹เค‚เค—, เคฎเฅ‚เคจเคฌเคฟเคจ เคฏเคฟเคฎ, เคœเคฟเคฏเฅ‹เค‚เค—เฅเคฏเฅ‹เคจ เคจเคพเคฎ, เคœเคฟเคจเคฏเฅ‰เคจเฅเค— เคชเคพเคฐเฅเค•, เคœเคฟเคจเคฏเฅ‰เคจเฅเค— เคฏเคฟเคฎ, เคตเฅ‹เคจเคธเฅ‡เค“เค• เคนเฅเคตเคพเค‚เค—, เคธเคพเค‚เค—เคกเฅ‚ เคฏเฅ‚เค‚, เคกเฅ‹เค‚เค—เคฏเฅ‚เคจ เคนเคพเคจ, เคธเฅ‡เค‰เค‚เค—เฅเคฏเฅเคจ เคชเคพเคฐเฅเค• เคฆเฅเคตเคพเคฐเคพเฅค -1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (เคซเฅ‡เคธเคฌเฅเค• เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เคชเฅ‡เคชเคฐ [เค“เคชเคจ-เคกเฅ‹เคฎเฅ‡เคจ เค•เฅเคตเฅ‡เคถเฅเคšเคจ เค†เค‚เคธเคฐเคฟเค‚เค— เค•เฅ‡ เคฒเคฟเค เคกเฅ‡เค‚เคธ เคชเฅˆเคธเฅ‡เคœ เคฐเคฟเคŸเฅเคฐเฅ€เคตเคฒ](https://arxiv. org/abs/2004.04906) เคตเฅเคฒเคพเคฆเคฟเคฎเฅ€เคฐ เค•เคฐเคชเฅเค–เคฟเคจ, เคฌเคฐเคฒเคพเคธ เค“เคœเคผเฅเคœเคผ, เคธเฅ‡เคตเคจ เคฎเคฟเคจ, เคชเฅˆเคŸเฅเคฐเคฟเค• เคฒเฅเคˆเคธ, เคฒเฅ‡เคกเฅ‡เคฒ เคตเฅ‚, เคธเคฐเฅเค—เฅ‡เคˆ เคเคกเฅเคจเฅ‹เคต, เคกเฅˆเคจเค•เฅ€ เคšเฅ‡เคจ, เค”เคฐ เคตเฅ‡เคจ-เคคเคพเคŠ เคฏเคฟเคน เคฆเฅเคตเคพเคฐเคพเฅค -1. 
-1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
-1. **[EfficientFormer](https://huggingface.co/docs/transformers/model_doc/efficientformer)** (from Snap Research) released with the paper [EfficientFormer: Vision Transformers at MobileNet Speed](https://arxiv.org/abs/2206.01191) by Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren.
-1. **[EfficientNet](https://huggingface.co/docs/transformers/model_doc/efficientnet)** (from Google Brain) released with the paper [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946) by Mingxing Tan, Quoc V. Le.
-1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
-1. **[EnCodec](https://huggingface.co/docs/transformers/model_doc/encodec)** (from Meta AI) released with the paper [High Fidelity Neural Audio Compression](https://arxiv.org/abs/2210.13438) by Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi.
-1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
-1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
-1. **[ErnieM](https://huggingface.co/docs/transformers/model_doc/ernie_m)** (from Baidu) by Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang.
เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora](https://arxiv.org/abs/2012.15674) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. **[ESM](https://huggingface.co/docs/transformers/model_doc/esm)** (เคฎเฅ‡เคŸเคพ AI เคธเฅ‡) เคŸเฅเคฐเคพเค‚เคธเคซเฅ‰เคฐเฅเคฎเคฐ เคชเฅเคฐเฅ‹เคŸเฅ€เคจ เคญเคพเคทเคพ เคฎเฅ‰เคกเคฒ เคนเฅˆเค‚เฅค **ESM-1b** เคชเฅ‡เคชเคฐ เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ เคฅเคพ [ เค…เคฒเฅ‡เค•เฅเคœเฅ‡เค‚เคกเคฐ เคฐเคพเค‡เคตเฅเคธ, เคœเฅ‹เคถเฅเค† เคฎเฅ‡เคฏเคฐ, เคŸเฅ‰เคฎ เคธเคฐเฅเค•เฅ, เคธเคฟเคฆเฅเคงเคพเคฐเฅเคฅ เค—เฅ‹เคฏเคฒ, เคœเคผเฅ‡เคฎเคฟเค‚เค— เคฒเคฟเคจ เคฆเฅเคตเคพเคฐเคพ เคœเฅˆเคตเคฟเค• เคธเค‚เคฐเคšเคจเคพ เค”เคฐ เค•เคพเคฐเฅเคฏ เค…เคธเฅเคฐเค•เฅเคทเคฟเคค เคธเฅ€เค–เคจเฅ‡ เค•เฅ‹ 250 เคฎเคฟเคฒเคฟเคฏเคจ เคชเฅเคฐเฅ‹เคŸเฅ€เคจ เค…เคจเฅเค•เฅเคฐเคฎเฅ‹เค‚ เคคเค• เคธเฅเค•เฅ‡เคฒ เค•เคฐเคจเฅ‡ เคธเฅ‡ เค‰เคญเคฐเคคเคพ เคนเฅˆ] (https://www.pnas.org/content/118/15/e2016239118) เคœเฅ‡เคธเคจ เคฒเคฟเคฏเฅ‚, เคกเฅ‡เคฎเฅ€ เค—เฅเค“, เคฎเคพเคฏเคฒ เค“เคŸ, เคธเฅ€. เคฒเฅ‰เคฐเฅ‡เค‚เคธ เคœเคผเคฟเคŸเคจเคฟเค•, เคœเฅ‡เคฐเฅ€ เคฎเคพ เค”เคฐ เคฐเฅ‰เคฌ เคซเคฐเฅเค—เคธเฅค **ESM-1v** เค•เฅ‹ เคชเฅ‡เคชเคฐ เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ เคฅเคพ [เคญเคพเคทเคพ เคฎเฅ‰เคกเคฒ เคชเฅเคฐเฅ‹เคŸเฅ€เคจ เคซเคผเค‚เค•เฅเคถเคจ เคชเคฐ เค‰เคคเฅเคชเคฐเคฟเคตเคฐเฅเคคเคจ เค•เฅ‡ เคชเฅเคฐเคญเคพเคตเฅ‹เค‚ เค•เฅ€ เคถเฅ‚เคจเฅเคฏ-เคถเฅ‰เคŸ เคญเคตเคฟเคทเฅเคฏเคตเคพเคฃเฅ€ เค•เฅ‹ เคธเค•เฅเคทเคฎ เค•เคฐเคคเฅ‡ เคนเฅˆเค‚] (https://doi.org/10.1101/2021.07.09.450648) เคœเฅ‹เคถเฅเค† เคฎเฅ‡เคฏเคฐ, เคฐเฅ‹เคถเคจ เคฐเคพเคต, เคฐเฅ‰เคฌเคฐเฅเคŸ เคตเฅ‡เคฐเค•เฅเค‡เคฒ, เคœเฅ‡เคธเคจ เคฒเคฟเคฏเฅ‚, เคŸเฅ‰เคฎ เคธเคฐเฅเค•เฅ เค”เคฐ เค…เคฒเฅ‡เค•เฅเคœเฅ‡เค‚เคกเคฐ เคฐเคพเค‡เคตเฅเคธ เคฆเฅเคตเคพเคฐเคพเฅค **ESM-2** เค•เฅ‹ เคชเฅ‡เคชเคฐ เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ เคฅเคพ [เคญเคพเคทเคพ เคฎเฅ‰เคกเคฒ เคตเคฟเค•เคพเคธ เค•เฅ‡ เคชเฅˆเคฎเคพเคจเฅ‡ เคชเคฐ เคชเฅเคฐเฅ‹เคŸเฅ€เคจ เค…เคจเฅเค•เฅเคฐเคฎ เคธเคŸเฅ€เค• เคธเค‚เคฐเคšเคจเคพ เคญเคตเคฟเคทเฅเคฏเคตเคพเคฃเฅ€ เค•เฅ‹ เคธเค•เฅเคทเคฎ เค•เคฐเคคเฅ‡ เคนเฅˆเค‚](https://doi.org/10.1101/2022.07.20.500902) เคœเคผเฅ‡เคฎเคฟเค‚เค— เคฒเคฟเคจ, เคนเคฒเฅ€เคฒ เค…เค•เคฟเคจ, เคฐเฅ‹เคถเคจ เคฐเคพเคต, เคฌเฅเคฐเคพเคฏเคจ เคนเฅ€, เคเฅ‹เค‚เค—เค•เคพเคˆ เคเฅ‚, เคตเฅ‡เค‚เคŸเคฟเค‚เค— เคฒเฅ‚, เค เคฆเฅเคตเคพเคฐเคพ เคฒเคพเคจ เคกเฅ‰เคธ เคธเฅˆเค‚เคŸเฅ‹เคธ เค•เฅ‹เคธเฅเคŸเคพ, เคฎเคฐเคฟเคฏเคฎ เคซเคผเคœเคผเคฒ-เคœเคผเคฐเค‚เคกเฅ€, เคŸเฅ‰เคฎ เคธเคฐเฅเค•เฅ‚, เคธเคพเคฒ เค•เฅˆเค‚เคกเคฟเคกเฅ‹, เค…เคฒเฅ‡เค•เฅเคœเฅ‡เค‚เคกเคฐ เคฐเคพเค‡เคตเฅเคธเฅค -1. **[Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)** (from Technology Innovation Institute) by Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme. -1. 
-1. **[FLAN-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5)** (from Google AI) released in the repository [google-research/t5x](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints) by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei
-1. **[FLAN-UL2](https://huggingface.co/docs/transformers/model_doc/flan-ul2)** (from Google AI) released in the repository [google-research/t5x](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-ul2-checkpoints) by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei
-1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
-1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
-1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
-1. **[FocalNet](https://huggingface.co/docs/transformers/model_doc/focalnet)** (from Microsoft Research) released with the paper [Focal Modulation Networks](https://arxiv.org/abs/2203.11926) by Jianwei Yang, Chunyuan Li, Xiyang Dai, Lu Yuan, Jianfeng Gao.
-1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
-1. **[Fuyu](https://huggingface.co/docs/transformers/model_doc/fuyu)** (from ADEPT) released by Rohan Bavishi, Erich Elsen, Curtis Hawthorne, Maxwell Nye, Augustus Odena, Arushi Somani, Sağnak Taşırlar in a [blog post](https://www.adept.ai/blog/fuyu-8b).
-1. **[GIT](https://huggingface.co/docs/transformers/model_doc/git)** (from Microsoft Research) released with the paper [GIT: A Generative Image-to-text Transformer for Vision and Language](https://arxiv.org/abs/2205.14100) by Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang.
-1. **[GLPN](https://huggingface.co/docs/transformers/model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
-1. **[GPT](https://huggingface.co/docs/transformers/model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
-1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
-1. **[GPT NeoX](https://huggingface.co/docs/transformers/model_doc/gpt_neox)** (from EleutherAI) released with the paper [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://arxiv.org/abs/2204.06745) by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach.
-1. **[GPT NeoX Japanese](https://huggingface.co/docs/transformers/model_doc/gpt_neox_japanese)** (from ABEJA) released by Shinya Otani, Takayoshi Makabe, Anuj Arora, Kyo Hattori.
-1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei* and Ilya Sutskever**.
-1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
-1. **[GPT-Sw3](https://huggingface.co/docs/transformers/model_doc/gpt-sw3)** (from AI-Sweden) released with the paper [Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish](http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.376.pdf) by Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey Öhman, Fredrik Carlsson, Magnus Sahlgren.
-1. **[GPTBigCode](https://huggingface.co/docs/transformers/model_doc/gpt_bigcode)** (from BigCode) released with the paper [SantaCoder: don't reach for the stars!](https://arxiv.org/abs/2301.03988) by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra.
-1. **[GPTSAN-japanese](https://huggingface.co/docs/transformers/model_doc/gptsan-japanese)** released in the repository [tanreinama/GPTSAN](https://github.com/tanreinama/GPTSAN/blob/main/report/model.md) by Toshiyuki Sakamoto (tanreinama).
-1. **[Graphormer](https://huggingface.co/docs/transformers/model_doc/graphormer)** (from Microsoft) released with the paper [Do Transformers Really Perform Bad for Graph Representation?](https://arxiv.org/abs/2106.05234) by Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu.
-1. **[GroupViT](https://huggingface.co/docs/transformers/model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
-1. **[HerBERT](https://huggingface.co/docs/transformers/model_doc/herbert)** (from Allegro.pl, AGH University of Science and Technology) released with the paper [KLEJ: Comprehensive Benchmark for Polish Language Understanding](https://www.aclweb.org/anthology/2020.acl-main.111.pdf) by Piotr Rybak, Robert Mroczkowski, Janusz Tracz, Ireneusz Gawlik.
-1. **[Hubert](https://huggingface.co/docs/transformers/model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
-1. **[I-BERT](https://huggingface.co/docs/transformers/model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
-1. **[IDEFICS](https://huggingface.co/docs/transformers/model_doc/idefics)** (from HuggingFace) released with the paper [OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents](https://huggingface.co/papers/2306.16527) by Hugo Laurençon, Lucile Saulnier, Léo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh.
-1. **[ImageGPT](https://huggingface.co/docs/transformers/model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
-1. **[Informer](https://huggingface.co/docs/transformers/model_doc/informer)** (from Beihang University, UC Berkeley, Rutgers University, SEDD Company) released with the paper [Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting](https://arxiv.org/abs/2012.07436) by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang.
-1. **[InstructBLIP](https://huggingface.co/docs/transformers/model_doc/instructblip)** (from Salesforce) released with the paper [InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning](https://arxiv.org/abs/2305.06500) by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi.
-1. **[Jukebox](https://huggingface.co/docs/transformers/model_doc/jukebox)** (from OpenAI) released with the paper [Jukebox: A Generative Model for Music](https://arxiv.org/pdf/2005.00341.pdf) by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever.
-1. **[KOSMOS-2](https://huggingface.co/docs/transformers/model_doc/kosmos-2)** (from Microsoft Research Asia) released with the paper [Kosmos-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/abs/2306.14824) by Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei.
-1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
-1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
-1. **[LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)** (from Microsoft Research Asia) released with the paper [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
-1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutxlm)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
-1. **[LED](https://huggingface.co/docs/transformers/model_doc/led)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
-1. **[LeViT](https://huggingface.co/docs/transformers/model_doc/levit)** (from Meta AI) released with the paper [LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference](https://arxiv.org/abs/2104.01136) by Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze.
-1. **[LiLT](https://huggingface.co/docs/transformers/model_doc/lilt)** (from South China University of Technology) released with the paper [LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding](https://arxiv.org/abs/2202.13669) by Jiapeng Wang, Lianwen Jin, Kai Ding.
-1. **[LLaMA](https://huggingface.co/docs/transformers/model_doc/llama)** (from The FAIR team of Meta AI) released with the paper [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971) by Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample.
-1. **[Llama2](https://huggingface.co/docs/transformers/model_doc/llama2)** (from The FAIR team of Meta AI) released with the paper [Llama2: Open Foundation and Fine-Tuned Chat Models](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/) by Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom.
-1. **[LLaVa](https://huggingface.co/docs/transformers/model_doc/llava)** (from Microsoft Research & University of Wisconsin-Madison) released with the paper [Visual Instruction Tuning](https://arxiv.org/abs/2304.08485) by Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee.
-1. **[Longformer](https://huggingface.co/docs/transformers/model_doc/longformer)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
-1. **[LongT5](https://huggingface.co/docs/transformers/model_doc/longt5)** released by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.
-1. **[LUKE](https://huggingface.co/docs/transformers/model_doc/luke)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
-1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
-1. **[M-CTC-T](https://huggingface.co/docs/transformers/model_doc/mctct)** (from Facebook) released with the paper [Pseudo-Labeling For Massively Multilingual Speech Recognition](https://arxiv.org/abs/2111.00161) by Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert.
-1. **[M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
-1. **[MADLAD-400](https://huggingface.co/docs/transformers/model_doc/madlad-400)** (from Google) released with the paper [MADLAD-400: A Multilingual And Document-Level Large Audited Dataset](https://arxiv.org/abs/2309.04662) by Sneha Kudugunta, Isaac Caswell, Biao Zhang, Xavier Garcia, Christopher A. Choquette-Choo, Katherine Lee, Derrick Xin, Aditya Kusupati, Romi Stella, Ankur Bapna, Orhan Firat.
-1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** Machine translation models trained on [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
-1. **[MarkupLM](https://huggingface.co/docs/transformers/model_doc/markuplm)** (from Microsoft Research Asia) released with the paper [MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding](https://arxiv.org/abs/2110.08518) by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei.
-1. **[Mask2Former](https://huggingface.co/docs/transformers/model_doc/mask2former)** (from FAIR and UIUC) released with the paper [Masked-attention Mask Transformer for Universal Image Segmentation](https://arxiv.org/abs/2112.01527) by Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar.
-1. **[MaskFormer](https://huggingface.co/docs/transformers/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
-1. **[MatCha](https://huggingface.co/docs/transformers/model_doc/matcha)** (from Google AI) released with the paper [MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering](https://arxiv.org/abs/2212.09662) by Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, Julian Martin Eisenschlos.
-1. **[mBART](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
**[mBART-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (เคซเฅ‡เคธเคฌเฅเค• เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เคชเฅ‡เคชเคฐ [เคเค•เฅเคธเคŸเฅ‡เค‚เคธเคฟเคฌเคฒ เคฌเคนเฅเคญเคพเคทเฅ€ เคชเฅเคฐเฅ€เคŸเฅเคฐเฅ‡เคจเคฟเค‚เค— เค”เคฐ เคซเคพเค‡เคจเคŸเฅเคฏเฅ‚เคจเคฟเค‚เค— เค•เฅ‡ เคธเคพเคฅ เคฌเคนเฅเคญเคพเคทเฅ€ เค…เคจเฅเคตเคพเคฆ](https://arxiv.org/abs/2008.00401) เคฏเฅเค•เคฟเค‚เค— เคŸเฅˆเค‚เค—, เคšเคพเค‰ เคŸเฅเคฐเคพเคจ, เคœเคฟเคฏเคพเคจ เคฒเฅ€, เคชเฅ‡เค‚เค—-เคœเฅ‡เคจ เคšเฅ‡เคจ, เคจเคฎเคจ เค—เฅ‹เคฏเคฒ, เคตเคฟเคถเฅเคฐเคต เคšเฅŒเคงเคฐเฅ€, เคœเคฟเคฏเคพเคคเคพเค“ เค—เฅ, เคเค‚เคœเฅ‡เคฒเคพ เคซเฅˆเคจ เคฆเฅเคตเคพเคฐเคพเฅค -1. **[MEGA](https://huggingface.co/docs/transformers/model_doc/mega)** (Facebook เคธเฅ‡) Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, and Luke Zettlemoyer. เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [Mega: Moving Average Equipped Gated Attention](https://arxiv.org/abs/2209.10655) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron-bert)** (NVIDIA เคธเฅ‡) เค•เคพเค—เคœ เค•เฅ‡ เคธเคพเคฅ [Megatron-LM: เคฎเฅ‰เคกเคฒ เคชเฅˆเคฐเฅ‡เคฒเคฒเคฟเคœเคผเฅเคฎ เค•เคพ เค‰เคชเคฏเฅ‹เค— เค•เคฐเค•เฅ‡ เคฌเคนเฅ-เค…เคฐเคฌ เคชเฅˆเคฐเคพเคฎเฅ€เคŸเคฐ เคญเคพเคทเคพ เคฎเฅ‰เคกเคฒ เค•เคพ เคชเฅเคฐเคถเคฟเค•เฅเคทเคฃ](https://arxiv.org/abs/1909.08053) เคฎเฅ‹เคนเคฎเฅเคฎเคฆ เคถเฅ‹เคเคฌเฅ€, เคฎเฅ‹เคธเฅเคŸเฅ‹เคซเคพ เคชเคŸเคตเคพเคฐเฅ€, เคฐเคพเค‰เคฒ เคชเฅเคฐเฅ€, เคชเฅˆเคŸเฅเคฐเคฟเค• เคฒเฅ‡เค—เฅเคฐเฅ‡เคธเฅเคฒเฅ‡, เคœเฅ‡เคฐเฅ‡เคก เค•เฅˆเคธเฅเคชเคฐ เค”เคฐ เคฌเฅเคฐเคพเคฏเคจ เค•เฅˆเคŸเคพเคจเคœเคผเคพเคฐเฅ‹ เคฆเฅเคตเคพเคฐเคพเฅค -1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (NVIDIA เคธเฅ‡) เคธเคพเคฅ เคตเคพเคฒเคพ เคชเฅ‡เคชเคฐ [Megatron-LM: เคŸเฅเคฐเฅ‡เคจเคฟเค‚เค— เคฎเคฒเฅเคŸเฅ€-เคฌเคฟเคฒเคฟเคฏเคจ เคชเฅˆเคฐเคพเคฎเฅ€เคŸเคฐ เคฒเฅˆเค‚เค—เฅเคตเฅ‡เคœ เคฎเฅ‰เคกเคฒเฅเคธ เคฏเฅ‚เคœเคฟเค‚เค— เคฎเฅ‰เคกเคฒ เคชเฅˆเคฐเฅ‡เคฒเคฒเคฟเคœเคผเฅเคฎ](https://arxiv.org/abs/1909.08053) เคฎเฅ‹เคนเคฎเฅเคฎเคฆ เคถเฅ‹เคเคฌเฅ€, เคฎเฅ‹เคธเฅเคŸเฅ‹เคซเคพ เคชเคŸเคตเคพเคฐเฅ€, เคฐเคพเค‰เคฒ เคชเฅเคฐเฅ€, เคชเฅˆเคŸเฅเคฐเคฟเค• เคฒเฅ‡เค—เฅเคฐเฅ‡เคธเฅเคฒเฅ‡, เคœเฅ‡เคฐเฅ‡เคก เค•เฅˆเคธเฅเคชเคฐ เค”เคฐ เคฌเฅเคฐเคพเคฏเคจ เค•เฅˆเคŸเคพเคจเคœเคผเคพเคฐเฅ‹ เคฆเฅเคตเคพเคฐเคพ เคชเฅ‹เคธเฅเคŸ เค•เคฟเคฏเคพ เค—เคฏเคพเฅค -1. **[MGP-STR](https://huggingface.co/docs/transformers/model_doc/mgp-str)** (Alibaba Research เคธเฅ‡) Peng Wang, Cheng Da, and Cong Yao. เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [Multi-Granularity Prediction for Scene Text Recognition](https://arxiv.org/abs/2209.03592) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. **[Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)** (from Mistral AI) by The Mistral AI team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lรฉlio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothรฉe Lacroix, William El Sayed. -1. 
**[Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lรฉlio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothรฉe Lacroix, William El Sayed. -1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (เคซเฅเคฐเฅ‰เคฎ Studio Ousia) เคธเคพเคฅ เคฎเฅ‡เค‚ เคชเฅ‡เคชเคฐ [mLUKE: เคฆ เคชเคพเคตเคฐ เค‘เคซ เคเค‚เคŸเคฟเคŸเฅ€ เคฐเคฟเคชเฅเคฐเฅ‡เคœเฅ‡เค‚เคŸเฅ‡เคถเคจ เค‡เคจ เคฎเคฒเฅเคŸเฅ€เคฒเคฟเค‚เค—เฅเค…เคฒ เคชเฅเคฐเฅ€เคŸเฅเคฐเฅ‡เคจเฅเคก เคฒเฅˆเค‚เค—เฅเคตเฅ‡เคœ เคฎเฅ‰เคกเคฒเฅเคธ](https://arxiv.org/abs/2110.08151) เคฐเคฏเฅ‹เค•เคจ เคฐเฅ€, เค‡เค•เฅเคฏเคพ เคฏเคพเคฎเคพเคกเคพ, เค”เคฐ เคฏเฅ‹เคถเคฟเคฎเคพเคธเคพ เคคเฅเคธเฅเคฐเฅ‹เค•เคพ เคฆเฅเคตเคพเคฐเคพเฅค -1. **[MMS](https://huggingface.co/docs/transformers/model_doc/mms)** (Facebook เคธเฅ‡) Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli. เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [Scaling Speech Technology to 1,000+ Languages](https://arxiv.org/abs/2305.13516) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (เคธเฅ€เคเคฎเคฏเฅ‚/เค—เฅ‚เค—เคฒ เคฌเฅเคฐเฅ‡เคจ เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เค•เคพเค—เคœ [เคฎเฅ‹เคฌเคพเค‡เคฒเคฌเคฐเฅเคŸ: เคธเค‚เคธเคพเคงเคจ-เคธเฅ€เคฎเคฟเคค เค‰เคชเค•เคฐเคฃเฅ‹เค‚ เค•เฅ‡ เคฒเคฟเค เคเค• เค•เฅ‰เคฎเฅเคชเฅˆเค•เฅเคŸ เคŸเคพเคธเฅเค•-เค…เคœเฅเคžเฅ‡เคฏ เคฌเฅ€เคˆเค†เคฐเคŸเฅ€] (https://arxiv.org/abs/2004.02984) Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, เค”เคฐ Denny Zhou เคฆเฅเคตเคพเคฐเคพ เคชเฅ‹เคธเฅเคŸ เค•เคฟเคฏเคพ เค—เคฏเคพเฅค -1. **[MobileNetV1](https://huggingface.co/docs/transformers/model_doc/mobilenet_v1)** (from Google Inc.) released with the paper [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861) by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam. -1. **[MobileNetV2](https://huggingface.co/docs/transformers/model_doc/mobilenet_v2)** (from Google Inc.) released with the paper [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381) by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen. -1. **[MobileViT](https://huggingface.co/docs/transformers/model_doc/mobilevit)** (Apple เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เค•เคพเค—เคœ [MobileViT: เคฒเคพเค‡เคŸ-เคตเฅ‡เคŸ, เคœเคจเคฐเคฒ-เคชเคฐเฅเคชเคธ, เค”เคฐ เคฎเฅ‹เคฌเคพเค‡เคฒ-เคซเฅเคฐเฅ‡เค‚เคกเคฒเฅ€ เคตเคฟเคœเคจ เคŸเฅเคฐเคพเค‚เคธเคซเฅ‰เคฐเฅเคฎเคฐ] (https://arxiv.org/abs/2110.02178) เคธเคšเคฟเคจ เคฎเฅ‡เคนเคคเคพ เค”เคฐ เคฎเฅ‹เคนเคฎเฅเคฎเคฆ เคฐเคธเฅเคคเค—เคฐเฅ€ เคฆเฅเคตเคพเคฐเคพ เคชเฅ‹เคธเฅเคŸ เค•เคฟเคฏเคพ เค—เคฏเคพเฅค -1. **[MobileViTV2](https://huggingface.co/docs/transformers/model_doc/mobilevitv2)** (Apple เคธเฅ‡) Sachin Mehta and Mohammad Rastegari. เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [Separable Self-attention for Mobile Vision Transformers](https://arxiv.org/abs/2206.02680) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. 
**[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu. -1. **[MPT](https://huggingface.co/docs/transformers/model_doc/mpt)** (MosaiML เคธเฅ‡) the MosaicML NLP Team. เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [llm-foundry](https://github.com/mosaicml/llm-foundry/) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. **[MRA](https://huggingface.co/docs/transformers/model_doc/mra)** (the University of Wisconsin - Madison เคธเฅ‡) Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, Vikas Singh. เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [Multi Resolution Analysis (MRA)](https://arxiv.org/abs/2207.10284) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (Google AI เคธเฅ‡) เคธเคพเคฅ เคตเคพเคฒเคพ เคชเฅ‡เคชเคฐ [mT5: เคเค• เคตเฅเคฏเคพเคชเค• เคฌเคนเฅเคญเคพเคทเฅ€ เคชเฅ‚เคฐเฅเคต-เคชเฅเคฐเคถเคฟเค•เฅเคทเคฟเคค เคŸเฅ‡เค•เฅเคธเฅเคŸ-เคŸเฅ‚-เคŸเฅ‡เค•เฅเคธเฅเคŸ เคŸเฅเคฐเคพเค‚เคธเคซเฅ‰เคฐเฅเคฎเคฐ]( https://arxiv.org/abs/2010.11934) เคฒเคฟเค‚เคŸเคฟเค‚เค— เคœเคผเฅ‚, เคจเฅ‹เค† เค•เฅ‰เคจเฅเคธเคŸเฅ‡เค‚เคŸ, เคเคกเคฎ เคฐเฅ‰เคฌเคฐเฅเคŸเฅเคธ, เคฎเคฟเคนเคฟเคฐ เค•เคพเคฒเฅ‡, เคฐเคพเคฎเฅ€ เค…เคฒ-เคฐเคซเฅ‚, เค†เคฆเคฟเคคเฅเคฏ เคธเคฟเคฆเฅเคงเคพเค‚เคค, เค†เคฆเคฟเคคเฅเคฏ เคฌเคฐเฅเค†, เค•เฅ‰เคฒเคฟเคจ เคฐเฅˆเคซเฅ‡เคฒ เคฆเฅเคตเคพเคฐเคพ เคชเฅ‹เคธเฅเคŸ เค•เคฟเคฏเคพ เค—เคฏเคพเฅค -1. **[MusicGen](https://huggingface.co/docs/transformers/model_doc/musicgen)** (from Meta) released with the paper [Simple and Controllable Music Generation](https://arxiv.org/abs/2306.05284) by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Dรฉfossez. -1. **[MVP](https://huggingface.co/docs/transformers/model_doc/mvp)** (from RUC AI Box) released with the paper [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https://arxiv.org/abs/2206.12131) by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen. -1. **[NAT](https://huggingface.co/docs/transformers/model_doc/nat)** (from SHI Labs) released with the paper [Neighborhood Attention Transformer](https://arxiv.org/abs/2204.07143) by Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi. -1. **[Nezha](https://huggingface.co/docs/transformers/model_doc/nezha)** (เคนเฅเค†เคตเฅ‡เคˆ เคจเฅ‚เคน เค•เฅ‡ เค†เคฐเฅเค• เคฒเฅˆเคฌ เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เค•เคพเค—เคœเคผ [NEZHA: เคšเฅ€เคจเฅ€ เคญเคพเคทเคพ เคธเคฎเค เค•เฅ‡ เคฒเคฟเค เคคเค‚เคคเฅเคฐเคฟเค•เคพ เคชเฅเคฐเคพเคธเค‚เค—เคฟเค• เคชเฅเคฐเคคเคฟเคจเคฟเคงเคฟเคคเฅเคต](https :/ /arxiv.org/abs/1909.00204) เคœเฅเคจเฅเค•เคฟเค‰ เคตเฅ‡เคˆ, เคœเคผเคฟเคฏเคพเค“เคœเคผเฅ‡ เคฐเฅ‡เคจ, เคœเคผเคฟเค†เค“เค—เฅเค†เค‚เค— เคฒเฅ€, เคตเฅ‡เคจเคฏเฅ‹เค‚เค— เคนเฅเค†เค‚เค—, เคฏเฅ€ เคฒเคฟเคฏเคพเค“, เคฏเคพเคถเฅ‡เค‚เค— เคตเคพเค‚เค—, เคœเคฟเคฏเคพเคถเฅ‚ เคฒเคฟเคจ, เคถเคฟเคจ เคœเคฟเคฏเคพเค‚เค—, เคœเคฟเค“ เคšเฅ‡เคจ เค”เคฐ เค•เฅเคจ เคฒเคฟเคฏเฅ‚ เคฆเฅเคตเคพเคฐเคพเฅค -1. 
**[NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)** (เคซเฅเคฐเฅ‰เคฎ เคฎเฅ‡เคŸเคพ) เคธเคพเคฅ เคฎเฅ‡เค‚ เคชเฅ‡เคชเคฐ [เคจเฅ‹ เคฒเฅˆเค‚เค—เฅเคตเฅ‡เคœ เคฒเฅ‡เคซเฅเคŸ เคฌเคฟเคนเคพเค‡เค‚เคก: เคธเฅเค•เฅ‡เคฒเคฟเค‚เค— เคนเฅเคฏเฅ‚เคฎเคจ-เคธเฅ‡เค‚เคŸเฅ‡เคก เคฎเคถเฅ€เคจ เคŸเฅเคฐเคพเค‚เคธเคฒเฅ‡เคถเคจ] (https://arxiv.org/abs/2207.04672) เคเคจเคเคฒเคเคฒเคฌเฅ€ เคŸเฅ€เคฎ เคฆเฅเคตเคพเคฐเคพ เคชเฅเคฐเค•เคพเคถเคฟเคคเฅค -1. **[NLLB-MOE](https://huggingface.co/docs/transformers/model_doc/nllb-moe)** (Meta เคธเฅ‡) the NLLB team. เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. **[Nougat](https://huggingface.co/docs/transformers/model_doc/nougat)** (Meta AI เคธเฅ‡) Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic. เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [Nougat: Neural Optical Understanding for Academic Documents](https://arxiv.org/abs/2308.13418) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. **[Nystrรถmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (เคตเคฟเคธเฅเค•เฅ‰เคจเฅเคธเคฟเคจ เคตเคฟเคถเฅเคตเคตเคฟเคฆเฅเคฏเคพเคฒเคฏ - เคฎเฅˆเคกเคฟเคธเคจ เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เค•เคพเค—เคœ [Nystrรถmformer: A Nystrรถm- เค†เคงเคพเคฐเคฟเคค เคเคฒเฅเค—เฅ‹เคฐเคฟเคฅเคฎ เค†เคคเฅเคฎ-เคงเฅเคฏเคพเคจ เค•เคพ เค…เคจเฅเคฎเคพเคจ เคฒเค—เคพเคจเฅ‡ เค•เฅ‡ เคฒเคฟเค ](https://arxiv.org/abs/2102.03902) เคฏเฅเคจเคฏเคพเค‚เค— เคœเคผเคฟเค“เค‚เค—, เคเคพเคจเคชเฅ‡เค‚เค— เคœเคผเฅ‡เค‚เค—, เคฐเฅเคฆเฅเคฐเคธเคฟเคธ เคšเค•เฅเคฐเคตเคฐเฅเคคเฅ€, เคฎเคฟเค‚เค—เค•เฅเคธเคฟเค‚เค— เคŸเฅˆเคจ, เค—เฅเคฒเฅ‡เคจ เคซเค‚เค—, เคฏเคฟเคจ เคฒเฅ€, เคตเคฟเค•เคพเคธ เคธเคฟเค‚เคน เคฆเฅเคตเคพเคฐเคพ เคชเฅ‹เคธเฅเคŸ เค•เคฟเคฏเคพ เค—เคฏเคพเฅค -1. **[OneFormer](https://huggingface.co/docs/transformers/model_doc/oneformer)** (SHI Labs เคธเฅ‡) เคชเฅ‡เคชเคฐ [OneFormer: One Transformer to Rule Universal Image Segmentation](https://arxiv.org/abs/2211.06220) เคœเคฟเคคเฅ‡เคถ เคœเฅˆเคจ, เคœเคฟเค†เคšเฅ‡เคจ เคฒเฅ€, เคฎเคพเค‚เค—เคŸเคฟเค• เคšเคฟเค‰, เค…เคฒเฅ€ เคนเคธเคจเฅ€, เคจเคฟเค•เคฟเคคเคพ เค“เคฐเคฒเฅ‹เคต, เคนเคฎเฅเคซเฅเคฐเฅ€ เคถเคฟ เค•เฅ‡ เคฆเฅเคตเคพเคฐเคพ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ เคนเฅˆเฅค -1. **[OpenLlama](https://huggingface.co/docs/transformers/model_doc/open-llama)** (from [s-JoL](https://huggingface.co/s-JoL)) released on GitHub (now removed). -1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al. -1. 
**[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (Google AI เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เค•เคพเค—เคœ [เคตเคฟเคœเคผเคจ เคŸเฅเคฐเคพเค‚เคธเคซเฅ‰เคฐเฅเคฎเคฐเฅเคธ เค•เฅ‡ เคธเคพเคฅ เคธเคฟเค‚เคชเคฒ เค“เคชเคจ-เคตเฅ‹เค•เฅˆเคฌเฅเคฒเคฐเฅ€ เค‘เคฌเฅเคœเฅ‡เค•เฅเคŸ เคกเคฟเคŸเฅ‡เค•เฅเคถเคจ](https:/ /arxiv.org/abs/2205.06230) เคฎเฅˆเคฅเคฟเคฏเคพเคธ เคฎเคฟเค‚เคกเคฐเคฐ, เคเคฒเฅ‡เค•เฅเคธเฅ€ เค—เฅเคฐเคฟเคŸเฅเคธเฅ‡เค‚เค•เฅ‹, เค‘เคธเฅเคŸเคฟเคจ เคธเฅเคŸเฅ‹เคจ, เคฎเฅˆเค•เฅเคธเคฟเคฎ เคจเฅเคฏเฅ‚เคฎเฅˆเคจ, เคกเคฟเคฐเฅเค• เคตเฅ€เคธเฅ‡เคจเคฌเฅ‹เคฐเฅเคจ, เคเคฒเฅ‡เค•เฅเคธเฅ€ เคกเฅ‹เคธเฅ‹เคตเคฟเคคเฅเคธเฅเค•เฅ€, เค…เคฐเคตเคฟเค‚เคฆ เคฎเคนเฅ‡เค‚เคฆเฅเคฐเคจ, เค…เคจเฅเคฐเคพเค— เค…เคฐเฅเคจเคฌ, เคฎเฅเคธเฅเคคเคซเคพ เคฆเฅ‡เคนเค˜เคพเคจเฅ€, เคœเคผเฅเค“เคฐเคจ เคถเฅ‡เคจ, เคœเคฟเค“ เคตเคพเค‚เค—, เคœเคผเคฟเคฏเคพเค“เคนเฅเค† เคเคพเคˆ, เคฅเฅ‰เคฎเคธ เค•เคฟเคซเคผ, เค”เคฐ เคจเฅ€เคฒ เคนเฅ‰เคฒเฅเคธเคฌเฅ€ เคฆเฅเคตเคพเคฐเคพ เคชเฅ‹เคธเฅเคŸ เค•เคฟเคฏเคพ เค—เคฏเคพเฅค -1. **[OWLv2](https://huggingface.co/docs/transformers/model_doc/owlv2)** (Google AI เคธเฅ‡) Matthias Minderer, Alexey Gritsenko, Neil Houlsby. เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [Scaling Open-Vocabulary Object Detection](https://arxiv.org/abs/2306.09683) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. **[PatchTSMixer](https://huggingface.co/docs/transformers/model_doc/patchtsmixer)** ( IBM Research เคธเฅ‡) Vijay Ekambaram, Arindam Jati, Nam Nguyen, Phanwadee Sinthong, Jayant Kalagnanam. เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting](https://arxiv.org/pdf/2306.09364.pdf) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. **[PatchTST](https://huggingface.co/docs/transformers/model_doc/patchtst)** (IBM เคธเฅ‡) Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, Jayant Kalagnanam. เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [A Time Series is Worth 64 Words: Long-term Forecasting with Transformers](https://arxiv.org/pdf/2211.14730.pdf) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu. -1. **[PEGASUS-X](https://huggingface.co/docs/transformers/model_doc/pegasus_x)** (Google เค•เฅ€ เค“เคฐ เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เคฆเคฟเคฏเคพ เค—เคฏเคพ เคชเฅ‡เคชเคฐ [เคฒเค‚เคฌเฅ‡ เค‡เคจเคชเฅเคŸ เคธเคพเคฐเคพเค‚เคถ เค•เฅ‡ เคฒเคฟเค เคŸเฅเคฐเคพเค‚เคธเคซเคผเฅ‰เคฐเฅเคฎเคฐเฅ‹เค‚ เค•เฅ‹ เคฌเฅ‡เคนเคคเคฐ เคคเคฐเฅ€เค•เฅ‡ เคธเฅ‡ เคเค•เฅเคธเคŸเฅ‡เค‚เคก เค•เคฐเคจเคพ](https://arxiv .org/abs/2208.04347) เคœเฅ‡เคธเคจ เคซเคพเค‚เค—, เคฏเคพเค“ เคเคพเค“, เคชเฅ€เคŸเคฐ เคœเฅ‡ เคฒเคฟเคฏเฅ‚ เคฆเฅเคตเคพเคฐเคพเฅค -1. 
**[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (เคฆเฅ€เคชเคฎเคพเค‡เค‚เคก เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เคชเฅ‡เคชเคฐ [เคชเคฐเฅเคธเฅ€เคตเคฐ เค†เคˆเค“: เคธเค‚เคฐเคšเคฟเคค เค‡เคจเคชเฅเคŸ เค”เคฐ เค†เค‰เคŸเคชเฅเคŸ เค•เฅ‡ เคฒเคฟเค เคเค• เคธเคพเคฎเคพเคจเฅเคฏ เคตเคพเคธเฅเคคเฅเค•เคฒเคพ] (https://arxiv.org/abs/2107.14795) เคเค‚เคกเฅเคฐเคฏเฅ‚ เคœเฅ‡เค—เคฒ, เคธเฅ‡เคฌเฅ‡เคธเฅเคŸเคฟเคฏเคจ เคฌเฅ‹เคฐเค—เฅเคฏเฅ‚เคก, เคœเฅ€เคจ-เคฌเฅˆเคชเฅเคŸเคฟเคธเฅเคŸ เค…เคฒเคพเคฏเคฐเคพเค•, เค•เคพเคฐเฅเคฒ เคกเฅ‹เคฐเฅเคถ, เค•เฅˆเคŸเคฒเคฟเคจ เค‡เค“เคจเฅ‡เคธเฅเค•เฅ, เคกเฅ‡เคตเคฟเคก เคฆเฅเคตเคพเคฐเคพ เคกเคฟเค‚เค—, เคธเฅเค•เค‚เคฆ เค•เฅ‹เคชเฅเคชเฅเคฒเคพ, เคกเฅˆเคจเคฟเคฏเคฒ เคœเคผเฅ‹เคฐเคพเคจ, เคเค‚เคกเฅเคฐเคฏเฅ‚ เคฌเฅเคฐเฅ‰เค•, เค‡เคตเคพเคจ เคถเฅ‡เคฒเคนเฅˆเคฎเคฐ, เค“เคฒเคฟเคตเคฟเคฏเคฐ เคนเฅ‡เคจเคพเคซ, เคฎเฅˆเคฅเฅเคฏเฅ‚ เคเคฎเฅค เคฌเฅ‹เคŸเฅเคตเคฟเคจเคฟเค•, เคเค‚เคกเฅเคฐเคฏเฅ‚ เคœเคผเคฟเคธเคฐเคฎเฅˆเคจ, เค“เคฐเคฟเค“เคฒ เคตเคฟเคจเคฟเคฏเคฒเฅเคธ, เคœเฅ‹เค†เค“ เค•เฅˆเคฐเฅ‡เคฐเคพ เคฆเฅเคตเคพเคฐเคพ เคชเฅ‹เคธเฅเคŸ เค•เคฟเคฏเคพ เค—เคฏเคพเฅค -1. **[Persimmon](https://huggingface.co/docs/transformers/model_doc/persimmon)** (ADEPT เคธเฅ‡) Erich Elsen, Augustus Odena, Maxwell Nye, SaฤŸnak TaลŸฤฑrlar, Tri Dao, Curtis Hawthorne, Deepak Moparthi, Arushi Somani. เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [blog post](https://www.adept.ai/blog/persimmon-8b) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. **[Phi](https://huggingface.co/docs/transformers/model_doc/phi)** (from Microsoft) released with the papers - [Textbooks Are All You Need](https://arxiv.org/abs/2306.11644) by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio Cรฉsar Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sรฉbastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li, [Textbooks Are All You Need II: phi-1.5 technical report](https://arxiv.org/abs/2309.05463) by Yuanzhi Li, Sรฉbastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee. -1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (VinAI Research เคธเฅ‡) เค•เคพเค—เคœ เค•เฅ‡ เคธเคพเคฅ [PhoBERT: เคตเคฟเคฏเคคเคจเคพเคฎเฅ€ เค•เฅ‡ เคฒเคฟเค เคชเฅ‚เคฐเฅเคต-เคชเฅเคฐเคถเคฟเค•เฅเคทเคฟเคค เคญเคพเคทเคพ เคฎเฅ‰เคกเคฒ](https://www .aclweb.org/anthology/2020.findings-emnlp.92/) เคกเฅˆเคŸ เค•เฅเคตเฅ‹เค• เค—เฅเคฏเฅ‡เคจ เค”เคฐ เค…เคจเฅเคน เคคเฅเค†เคจ เค—เฅเคฏเฅ‡เคจ เคฆเฅเคตเคพเคฐเคพ เคชเฅ‹เคธเฅเคŸ เค•เคฟเคฏเคพ เค—เคฏเคพเฅค -1. **[Pix2Struct](https://huggingface.co/docs/transformers/model_doc/pix2struct)** (Google เคธเฅ‡) Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova. เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding](https://arxiv.org/abs/2210.03347) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. 
**[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (UCLA NLP เคธเฅ‡) เคธเคพเคฅ เคตเคพเคฒเคพ เคชเฅ‡เคชเคฐ [เคชเฅเคฐเฅ‹เค—เฅเคฐเคพเคฎ เค…เค‚เคกเคฐเคธเฅเคŸเฅˆเค‚เคกเคฟเค‚เค— เคเค‚เคก เคœเฅ‡เคจเคฐเฅ‡เคถเคจ เค•เฅ‡ เคฒเคฟเค เคฏเฅ‚เคจเคฟเคซเคพเค‡เคก เคชเฅเคฐเฅ€-เคŸเฅเคฐเฅ‡เคจเคฟเค‚เค—](https://arxiv .org/abs/2103.06333) เคตเคธเฅ€ เค‰เคฆเฅเคฆเฅ€เคจ เค…เคนเคฎเคฆ, เคธเฅˆเค•เคค เคšเค•เฅเคฐเคตเคฐเฅเคคเฅ€, เคฌเฅˆเคถเคพเค–เฅ€ เคฐเฅ‡, เค•เคพเคˆ-เคตเฅ‡เคˆ เคšเคพเค‚เค— เคฆเฅเคตเคพเคฐเคพเฅค -1. **[PoolFormer](https://huggingface.co/docs/transformers/model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng. -1. **[Pop2Piano](https://huggingface.co/docs/transformers/model_doc/pop2piano)** released with the paper [Pop2Piano : Pop Audio-based Piano Cover Generation](https://arxiv.org/abs/2211.00895) by Jongho Choi, Kyogu Lee. -1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (เคฎเคพเค‡เค•เฅเคฐเฅ‹เคธเฅ‰เคซเฅเคŸ เคฐเคฟเคธเคฐเฅเคš เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เคชเฅ‡เคชเคฐ [ProphetNet: เคชเฅเคฐเฅ‡เคกเคฟเค•เฅเคŸเคฟเค‚เค— เคซเฅเคฏเฅ‚เคšเคฐ เคเคจ-เค—เฅเคฐเคพเคฎ เคซเฅ‰เคฐ เคธเฅ€เค•เฅเคตเฅ‡เค‚เคธ-เคŸเฅ‚-เคธเฅ€เค•เฅเคตเฅ‡เค‚เคธ เคชเฅเคฐเฅ€-เคŸเฅเคฐเฅ‡เคจเคฟเค‚เค— ](https://arxiv.org/abs/2001.04063) เคฏเฅ‚ เคฏเคพเคจ, เคตเฅ€เคœเคผเฅ‡เคจ เค•เฅเคฏเฅ‚เคˆ, เคฏเฅ‡เคฏเฅเคจ เค—เฅ‹เค‚เค—, เคฆเคฏเคพเคนเฅ‡เค‚เค— เคฒเคฟเคฏเฅ‚, เคจเคพเคจ เคกเฅเค†เคจ, เคœเคฟเค‰เคถเฅ‡เค‚เค— เคšเฅ‡เคจ, เคฐเฅเค“เคซเคผเฅ‡เคˆ เคเคพเค‚เค— เค”เคฐ เคฎเคฟเค‚เค— เคเฅ‹เค‰ เคฆเฅเคตเคพเคฐเคพ เคชเฅ‹เคธเฅเคŸ เค•เคฟเคฏเคพ เค—เคฏเคพเฅค -1. **[PVT](https://huggingface.co/docs/transformers/model_doc/pvt)** (Nanjing University, The University of Hong Kong etc. เคธเฅ‡) Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao. เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions](https://arxiv.org/pdf/2102.12122.pdf) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (NVIDIA เคธเฅ‡) เคธเคพเคฅ เคตเคพเคฒเคพ เคชเฅ‡เคชเคฐ [เคกเฅ€เคช เคฒเคฐเฅเคจเคฟเค‚เค— เค‡เค‚เคซเคผเฅ‡เค•เฅเคถเคจ เค•เฅ‡ เคฒเคฟเค เค‡เค‚เคŸเฅ€เคœเคฐ เค•เฅเคตเคพเค‚เคŸเคฟเคœเคผเฅ‡เคถเคจ: เคชเฅเคฐเคฟเค‚เคธเคฟเคชเคฒเฅเคธ เคเค‚เคก เคเคฎเฅเคชเคฟเคฐเคฟเค•เคฒ เค‡เคตเฅˆเคฒเฅเคฏเฅ‚เคเคถเคจ](https:// arxiv.org/abs/2004.09602) เคนเคพเค“ เคตเฅ‚, เคชเฅˆเคŸเฅเคฐเคฟเค• เคœเฅเคก, เคœเคฟเค†เค“เคœเฅ€ เคเคพเค‚เค—, เคฎเคฟเค–เคพเค‡เคฒ เค‡เคธเฅ‡เคต เค”เคฐ เคชเฅ‰เคฒเคฟเคฏเคธ เคฎเคพเค‡เค•เฅ‡เคตเคฟเคธเคฟเคฏเคธ เคฆเฅเคตเคพเคฐเคพเฅค -1. 
**[RAG](https://huggingface.co/docs/transformers/model_doc/rag)** (เคซเฅ‡เคธเคฌเฅเค• เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เค•เคพเค—เคœ [เคฐเคฟเคŸเฅเคฐเฅ€เคตเคฒ-เค‘เค—เคฎเฅ‡เค‚เคŸเฅ‡เคก เคœเฅ‡เคจเคฐเฅ‡เคถเคจ เคซเฅ‰เคฐ เคจเฅ‰เคฒเฅ‡เคœ-เค‡เค‚เคŸเฅ‡เค‚เคธเคฟเคต เคเคจเคเคฒเคชเฅ€ เคŸเคพเคธเฅเค•](https://arxiv.org/abs/2005.11401) เคชเฅˆเคŸเฅเคฐเคฟเค• เคฒเฅเคˆเคธ, เคเคฅเคจ เคชเฅ‡เคฐเฅ‡เคœเคผ, เค…เคฒเฅ‡เค•เฅเคœเฅ‡เค‚เคกเฅเคฐเคพ เคชเคฟเค•เฅเคŸเคธ, เคซเฅˆเคฌเคฟเคฏเฅ‹ เคชเฅ‡เคŸเฅเคฐเฅ‹เคจเฅ€, เคตเฅเคฒเคพเคฆเคฟเคฎเฅ€เคฐ เค•เคพเคฐเคชเฅเค–เคฟเคจ, เคจเคฎเคจ เค—เฅ‹เคฏเคฒ, เคนเฅ‡เคจเคฐเคฟเค• เค•เฅเคŸเคฒเคฐ, เคฎเคพเค‡เค• เคฒเฅเคˆเคธ, เคตเฅ‡เคจ-เคคเคพเค‰ เคฏเคฟเคน, เคŸเคฟเคฎ เคฐเฅ‰เค•เคŸเคพเคถเฅ‡เคฒ, เคธเฅ‡เคฌเคธเฅเคŸเคฟเคฏเคจ เคฐเคฟเคกเฅ‡เคฒ, เคกเฅŒเคตเฅ‡ เค•เฅ€เคฒเคพ เคฆเฅเคตเคพเคฐเคพเฅค -1. **[REALM](https://huggingface.co/docs/transformers/model_doc/realm.html)** (Google เค…เคจเฅเคธเค‚เคงเคพเคจ เคธเฅ‡) เค•เฅ‡เคฒเฅเคตเคฟเคจ เค—เฅ, เค•เฅ‡เค‚เคŸเคจ เคฒเฅ€, เคœเคผเฅ‹เคฐเคพ เคคเฅเค‚เค—, เคชเคพเคจเฅเคชเฅ‹เค‚เค— เคชเคธเฅเคชเคค เค”เคฐ เคฎเคฟเค‚เค—-เคตเฅ‡เคˆ เคšเคพเค‚เค— เคฆเฅเคตเคพเคฐเคพ เคธเคพเคฅ เคฎเฅ‡เค‚ เคฆเคฟเคฏเคพ เค—เคฏเคพ เคชเฅ‡เคชเคฐ [REALM: เคฐเคฟเคŸเฅเคฐเฅ€เคตเคฒ-เค‘เค—เคฎเฅ‡เค‚เคŸเฅ‡เคก เคฒเฅˆเค‚เค—เฅเคตเฅ‡เคœ เคฎเฅ‰เคกเคฒ เคชเฅเคฐเฅ€-เคŸเฅเคฐเฅ‡เคจเคฟเค‚เค—](https://arxiv.org/abs/2002.08909)เฅค -1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, ลukasz Kaiser, Anselm Levskaya. -1. **[RegNet](https://huggingface.co/docs/transformers/model_doc/regnet)** (META เคฐเคฟเคธเคฐเฅเคš เคธเฅ‡) [เคกเคฟเคœเคผเคพเค‡เคจเคฟเค‚เค— เคจเฅ‡เคŸเคตเคฐเฅเค• เคกเคฟเคœเคผเคพเค‡เคจ เคธเฅเคชเฅ‡เคธ](https://arxiv.org/abs/2003.13678) เคชเฅ‡เคชเคฐ เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ เค‡เคฒเคฟเคœเคพ เคฐเคพเคกเฅ‹เคธเคพเคตเฅ‹เคตเคฟเค•, เคฐเคพเคœ เคชเฅเคฐเคคเฅ€เค• เค•เฅ‹เคธเคพเคฐเคพเคœเฅ‚, เคฐเฅ‰เคธ เค—เคฟเคฐเฅเคถเคฟเค•, เค•เฅˆเคฎเคฟเค‚เค— เคนเฅ€, เคชเคฟเค“เคŸเคฐ เคกเฅ‰เคฒเคฐ เคฆเฅเคตเคพเคฐเคพเฅค -1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (เค—เฅ‚เค—เคฒ เคฐเคฟเคธเคฐเฅเคš เคธเฅ‡) เคธเคพเคฅ เคตเคพเคฒเคพ เคชเฅ‡เคชเคฐ [เคชเฅ‚เคฐเฅเคต-เคชเฅเคฐเคถเคฟเค•เฅเคทเคฟเคค เคญเคพเคทเคพ เคฎเฅ‰เคกเคฒ เคฎเฅ‡เค‚ เคเคฎเฅเคฌเฅ‡เคกเคฟเค‚เค— เค•เคชเคฒเคฟเค‚เค— เคชเคฐ เคชเฅเคจเคฐเฅเคตเคฟเคšเคพเคฐ](https://arxiv.org/pdf/2010.12821.pdf) เคนเฅเคฏเฅเค‚เค— เคตเฅ‹เคจ เคšเฅเค‚เค—, เคฅเคฟเคฌเฅ‰เคฒเฅเคŸ เคซเคผเฅ‡เคตเคฐเฅ€, เคนเฅ‡เคจเคฐเฅ€ เคคเฅเคธเคพเคˆ, เคเคฎ. เคœเฅ‰เคจเคธเคจ, เคธเฅ‡เคฌเฅ‡เคธเฅเคŸเคฟเคฏเคจ เคฐเฅเคกเคฐ เคฆเฅเคตเคพเคฐเคพเฅค -1. **[ResNet](https://huggingface.co/docs/transformers/model_doc/resnet)** (เคฎเคพเค‡เค•เฅเคฐเฅ‹เคธเฅ‰เคซเฅเคŸ เคฐเคฟเคธเคฐเฅเคš เคธเฅ‡) [เคกเฅ€เคช เคฐเฅ‡เคธเคฟเคกเฅเค…เคฒ เคฒเคฐเฅเคจเคฟเค‚เค— เคซเฅ‰เคฐ เค‡เคฎเฅ‡เคœ เคฐเคฟเค•เค—เฅเคจเคฟเคถเคจ](https://arxiv.org/abs/1512.03385) เค•เฅˆเคฎเคฟเค‚เค— เคนเฅ‡, เคœเคฟเคฏเคพเค‚เค—เฅเคฏเฅ เคเคพเค‚เค—, เคถเคพเค“เค•เคฟเค‚เค— เคฐเฅ‡เคจ, เคœเคฟเคฏเคพเคจ เคธเคจ เคฆเฅเคตเคพเคฐเคพเฅค -1. 
**[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (เคซเฅ‡เคธเคฌเฅเค• เคธเฅ‡), เคธเคพเคฅ เคฎเฅ‡เค‚ เค•เคพเค—เคœ [เคฎเคœเคฌเฅ‚เคค เคฐเฅ‚เคช เคธเฅ‡ เค…เคจเฅเค•เฅ‚เคฒเคฟเคค BERT เคชเฅเคฐเฅ€เคŸเฅเคฐเฅ‡เคจเคฟเค‚เค— เคฆเฅƒเคทเฅเคŸเคฟเค•เฅ‹เคฃ](https://arxiv.org/abs /1907.11692) เคฏเคฟเคจเคนเคพเคจ เคฒเคฟเคฏเฅ‚, เคฎเคพเคฏเคฒ เค“เคŸ, เคจเคฎเคจ เค—เฅ‹เคฏเคฒ, เคœเคฟเค‚เค—เคซเฅ‡เคˆ เคกเฅ‚, เคฎเค‚เคฆเคพเคฐ เคœเฅ‹เคถเฅ€, เคกเฅˆเคจเค•เฅ€ เคšเฅ‡เคจ, เค“เคฎเคฐ เคฒเฅ‡เคตเฅ€, เคฎเคพเค‡เค• เคฒเฅเคˆเคธ, เคฒเฅเคฏเฅ‚เค• เคœเคผเฅ‡เคŸเคฒเคฎเฅ‰เคฏเคฐ, เคตเฅ‡เคธเฅ‡เคฒเคฟเคจ เคธเฅเคŸเฅ‹เคฏเคพเคจเฅ‹เคต เคฆเฅเคตเคพเคฐเคพเฅค -1. **[RoBERTa-PreLayerNorm](https://huggingface.co/docs/transformers/model_doc/roberta-prelayernorm)** (from Facebook) released with the paper [fairseq: A Fast, Extensible Toolkit for Sequence Modeling](https://arxiv.org/abs/1904.01038) by Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli. -1. **[RoCBert](https://huggingface.co/docs/transformers/model_doc/roc_bert)** (from WeChatAI) released with the paper [RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining](https://aclanthology.org/2022.acl-long.65.pdf) by HuiSu, WeiweiShi, XiaoyuShen, XiaoZhou, TuoJi, JiaruiFang, JieZhou. -1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (เคเฅเคˆเคˆ เคŸเฅ‡เค•เฅเคจเฅ‹เคฒเฅ‰เคœเฅ€ เคธเฅ‡), เคธเคพเคฅ เคฎเฅ‡เค‚ เคชเฅ‡เคชเคฐ [เคฐเฅ‹เคซเฅ‰เคฐเฅเคฎเคฐ: เคฐเฅ‹เคŸเคฐเฅ€ เคชเฅ‹เคœเคฟเคถเคจ เคเค‚เคฌเฅ‡เคกเคฟเค‚เค— เค•เฅ‡ เคธเคพเคฅ เคเคจเฅเคนเคพเค‚เคธเฅเคก เคŸเฅเคฐเคพเค‚เคธเคซเฅ‰เคฐเฅเคฎเคฐ] (https://arxiv.org/pdf/2104.09864v1.pdf) เคœเคฟเคฏเคพเคจเคฒเคฟเคจ เคธเฅ เค”เคฐ เคฏเฅ‚ เคฒเฅ‚ เค”เคฐ เคถเฅ‡เค‚เค—เคซเฅ‡เค‚เค— เคชเฅˆเคจ เค”เคฐ เคฌเฅ‹ เคตเฅ‡เคจ เค”เคฐ เคฏเฅเคจเคซเฅ‡เค‚เค— เคฒเคฟเคฏเฅ‚ เคฆเฅเคตเคพเคฐเคพ เคชเฅเคฐเค•เคพเคถเคฟเคคเฅค -1. **[RWKV](https://huggingface.co/docs/transformers/model_doc/rwkv)** (Bo Peng เคธเฅ‡) Bo Peng. เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [this repo](https://github.com/BlinkDL/RWKV-LM) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. **[SeamlessM4T](https://huggingface.co/docs/transformers/model_doc/seamless_m4t)** (from Meta AI) released with the paper [SeamlessM4T โ€” Massively Multilingual & Multimodal Machine Translation](https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf) by the Seamless Communication team. -1. **[SeamlessM4Tv2](https://huggingface.co/docs/transformers/model_doc/seamless_m4t_v2)** (from Meta AI) released with the paper [Seamless: Multilingual Expressive and Streaming Speech Translation](https://ai.meta.com/research/publications/seamless-multilingual-expressive-and-streaming-speech-translation/) by the Seamless Communication team. -1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo. -1. **[Segment Anything](https://huggingface.co/docs/transformers/model_doc/sam)** (Meta AI เคธเฅ‡) Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick. เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [Segment Anything](https://arxiv.org/pdf/2304.02643v1.pdf) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. 
**[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (ASAPP เคธเฅ‡) เคธเคพเคฅ เคฆเฅ‡เคจเฅ‡ เคตเคพเคฒเคพ เคชเฅ‡เคชเคฐ [เคญเคพเคทเคฃ เคชเคนเคšเคพเคจ เค•เฅ‡ เคฒเคฟเค เค…เคจเคธเฅเคชเคฐเคตเคพเค‡เคœเฅเคก เคชเฅเคฐเฅ€-เคŸเฅเคฐเฅ‡เคจเคฟเค‚เค— เคฎเฅ‡เค‚ เคชเคฐเคซเฅ‰เคฐเฅเคฎเฅ‡เค‚เคธ-เคเคซเคฟเคถเคฟเคเค‚เคธเฅ€ เคŸเฅเคฐเฅ‡เคก-เค‘เคซเฅเคธ](https ://arxiv.org/abs/2109.06870) เคซเฅ‡เคฒเคฟเค•เฅเคธ เคตเฅ‚, เค•เฅเคตเคพเค‚เค—เคฏเฅเคจ เค•เคฟเคฎ, เคœเคฟเค‚เค— เคชเฅˆเคจ, เค•เฅเคฏเฅ‚ เคนเคพเคจ, เค•เคฟเคฒเคฟเคฏเคจ เค•เฅเคฏเฅ‚. เคตเฅ‡เคจเคฌเคฐเฅเค—เคฐ, เคฏเฅ‹เคต เค†เคฐเฅเคŸเคœเคผเฅ€ เคฆเฅเคตเคพเคฐเคพเฅค -1. **[SEW-D](https://huggingface.co/docs/transformers/model_doc/sew_d)** (ASAPP เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เคชเฅ‡เคชเคฐ [เคญเคพเคทเคฃ เคชเคนเคšเคพเคจ เค•เฅ‡ เคฒเคฟเค เค…เคจเคธเฅเคชเคฐเคตเคพเค‡เคœเฅเคก เคชเฅเคฐเฅ€-เคŸเฅเคฐเฅ‡เคจเคฟเค‚เค— เคฎเฅ‡เค‚ เคชเคฐเคซเฅ‰เคฐเฅเคฎเฅ‡เค‚เคธ-เคเคซเคฟเคถเคฟเคเค‚เคธเฅ€ เคŸเฅเคฐเฅ‡เคก-เค‘เคซเฅเคธ] (https://arxiv.org/abs/2109.06870) เคซเฅ‡เคฒเคฟเค•เฅเคธ เคตเฅ‚, เค•เฅเคตเคพเค‚เค—เคฏเฅเคจ เค•เคฟเคฎ, เคœเคฟเค‚เค— เคชเฅˆเคจ, เค•เฅเคฏเฅ‚ เคนเคพเคจ, เค•เคฟเคฒเคฟเคฏเคจ เค•เฅเคฏเฅ‚. เคตเฅ‡เคจเคฌเคฐเฅเค—เคฐ, เคฏเฅ‹เค†เคต เค†เคฐเฅเคŸเคœเคผเฅ€ เคฆเฅเคตเคพเคฐเคพ เคชเฅ‹เคธเฅเคŸ เค•เคฟเคฏเคพ เค—เคฏเคพเฅค -1. **[SpeechT5](https://huggingface.co/docs/transformers/model_doc/speecht5)** (from Microsoft Research) released with the paper [SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing](https://arxiv.org/abs/2110.07205) by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei. -1. **[SpeechToTextTransformer](https://huggingface.co/docs/transformers/model_doc/speech_to_text)** (เคซเฅ‡เคธเคฌเฅเค• เคธเฅ‡), เคธเคพเคฅ เคฎเฅ‡เค‚ เคชเฅ‡เคชเคฐ [เคซเฅ‡เคฏเคฐเคธเฅ‡เค• S2T: เคซเคพเคธเฅเคŸ เคธเฅเคชเฅ€เคš-เคŸเฅ‚-เคŸเฅ‡เค•เฅเคธเฅเคŸ เคฎเฅ‰เคกเคฒเคฟเค‚เค— เคตเคฟเคฆ เคซเฅ‡เคฏเคฐเคธเฅ‡เค•](https: //arxiv.org/abs/2010.05171) เคšเคพเค‚เค—เคนเคพเคจ เคตเคพเค‚เค—, เคฏเฅ‚เค‚ เคคเคพเค‚เค—, เคœเฅเคคเคพเคˆ เคฎเคพ, เคเคจเฅ€ เคตเฅ‚, เคฆเคฟเคฎเคฟเคคเฅเคฐเฅ‹ เค“เค–เฅ‹เคจเค•เฅ‹, เคœเฅเค†เคจ เคชเคฟเคจเฅ‹ เคฆเฅเคตเคพเคฐเคพ เคชเฅ‹เคธเฅเคŸ เค•เคฟเคฏเคพ เค—เคฏเคพใ€‚ -1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (เคซเฅ‡เคธเคฌเฅเค• เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เคชเฅ‡เคชเคฐ [เคฒเคพเคฐเฅเคœ-เคธเฅเค•เฅ‡เคฒ เคธเฅ‡เคฒเฅเคซ- เคเค‚เคก เคธเฅ‡เคฎเฅ€-เคธเฅเคชเคฐเคตเคพเค‡เคœเฅเคก เคฒเคฐเฅเคจเคฟเค‚เค— เคซเฅ‰เคฐ เคธเฅเคชเฅ€เคš เคŸเฅเคฐเคพเค‚เคธเคฒเฅ‡เคถเคจ](https://arxiv.org/abs/2104.06678) เคšเคพเค‚เค—เคนเคพเคจ เคตเคพเค‚เค—, เคเคจเฅ€ เคตเฅ‚, เคœเฅเค†เคจ เคชเคฟเคจเฅ‹, เคเคฒเฅ‡เค•เฅเคธเฅ€ เคฌเฅ‡เคตเคธเฅเค•เฅ€, เคฎเคพเค‡เค•เคฒ เค”เคฒเฅ€, เคเคฒเฅ‡เค•เฅเคธเคฟเคธ เคฆเฅเคตเคพเคฐเคพ Conneau เคฆเฅเคตเคพเคฐเคพ เคชเฅ‹เคธเฅเคŸ เค•เคฟเคฏเคพ เค—เคฏเคพเฅค -1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (เคคเฅ‡เคฒ เค…เคตเฅ€เคต เคฏเฅ‚เคจเคฟเคตเคฐเฅเคธเคฟเคŸเฅ€ เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เคชเฅ‡เคชเคฐ [เคธเฅเคชเฅˆเคจ เคธเคฟเคฒเฅ‡เค•เฅเคถเคจ เค•เฅ‹ เคชเฅเคฐเฅ€-เคŸเฅเคฐเฅ‡เคจเคฟเค‚เค— เค•เคฐเค•เฅ‡ เค•เฅเค›-เคถเฅ‰เคŸ เค•เฅเคตเฅ‡เคถเฅเคšเคจ เค†เค‚เคธเคฐเคฟเค‚เค—](https:// arxiv.org/abs/2101.00438) เค“เคฐเคฟ เคฐเคพเคฎ, เคฏเฅเคตเคฒ เค•เคฐเฅเคธเฅเคŸเคจ, เคœเฅ‹เคจเคพเคฅเคจ เคฌเฅ‡เคฐเฅ‡เค‚เคŸ, เค…เคฎเฅ€เคฐ เค—เฅเคฒเฅ‹เคฌเคฐเฅเคธเคจ, เค“เคฎเคฐ เคฒเฅ‡เคตเฅ€ เคฆเฅเคตเคพเคฐเคพเฅค -1. 
**[SqueezeBERT](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (เคฌเคฐเฅเค•เคฒเฅ‡ เคธเฅ‡) เค•เคพเค—เคœ เค•เฅ‡ เคธเคพเคฅ [SqueezeBERT: เค•เฅเคถเคฒ เคคเค‚เคคเฅเคฐเคฟเค•เคพ เคจเฅ‡เคŸเคตเคฐเฅเค• เค•เฅ‡ เคฌเคพเคฐเฅ‡ เคฎเฅ‡เค‚ NLP เค•เฅ‹ เค•เค‚เคชเฅเคฏเฅ‚เคŸเคฐ เคตเคฟเคœเคผเคจ เค•เฅเคฏเคพ เคธเคฟเค–เคพ เคธเค•เคคเคพ เคนเฅˆ?](https://arxiv.org/abs/2006.11316) เคซเฅ‰เคฐเฅ‡เคธเฅเคŸ เคเคจ. เค‡เคจเคกเฅ‹เคฒเคพ, เค…เคฒเฅเคฌเคฐเฅเคŸ เคˆ. เคถเฅ‰, เคฐเคตเคฟ เค•เฅƒเคทเฅเคฃเคพ, เค”เคฐ เค•เคฐเฅเคŸ เคกเคฌเฅเคฒเฅเคฏเฅ‚. เค•เฅ‡เคŸเคœเคผเคฐ เคฆเฅเคตเคพเคฐเคพเฅค -1. **[SwiftFormer](https://huggingface.co/docs/transformers/model_doc/swiftformer)** (MBZUAI เคธเฅ‡) Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan. เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications](https://arxiv.org/abs/2303.15446) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. **[Swin Transformer](https://huggingface.co/docs/transformers/model_doc/swin)** (เคฎเคพเค‡เค•เฅเคฐเฅ‹เคธเฅ‰เคซเฅเคŸ เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เค•เคพเค—เคœ [เคธเฅเคตเคพเค‡เคจ เคŸเฅเคฐเคพเค‚เคธเคซเฅ‰เคฐเฅเคฎเคฐ: เคถเคฟเคซเฅเคŸเฅ‡เคก เคตเคฟเค‚เคกเฅ‹เคœ เค•เคพ เค‰เคชเคฏเฅ‹เค— เค•เคฐ เคชเคฆเคพเคจเฅเค•เฅเคฐเคฎเคฟเคค เคตเคฟเคœเคจ เคŸเฅเคฐเคพเค‚เคธเคซเฅ‰เคฐเฅเคฎเคฐ](https://arxiv.org/abs/2103.14030) เคœเคผเฅ€ เคฒเคฟเคฏเฅ‚, เคฏเฅเคŸเฅ‹เค‚เค— เคฒเคฟเคจ, เคฏเฅ‚ เค•เคพเค“, เคนเคพเคจ เคนเฅ‚, เคฏเคฟเค•เฅเคธเฅเค†เคจ เคตเฅ‡เคˆ, เคเฅ‡เค‚เค— เคเคพเค‚เค—, เคธเฅเคŸเฅ€เคซเคจ เคฒเคฟเคจ, เคฌเฅˆเคจเคฟเค‚เค— เค—เฅเค“ เคฆเฅเคตเคพเคฐเคพเฅค -1. **[Swin Transformer V2](https://huggingface.co/docs/transformers/model_doc/swinv2)** (Microsoft เคธเฅ‡) เคธเคพเคฅ เคตเคพเคฒเคพ เคชเฅ‡เคชเคฐ [Swin Transformer V2: เคธเฅเค•เฅ‡เคฒเคฟเค‚เค— เค…เคช เค•เฅˆเคชเฅ‡เคธเคฟเคŸเฅ€ เคเค‚เคก เคฐเฅ‡เคœเฅ‹เคฒเฅเคฏเฅ‚เคถเคจ](https://arxiv.org/abs/2111.09883) เคœเคผเฅ€ เคฒเคฟเคฏเฅ‚, เคนเคพเคจ เคนเฅ‚, เคฏเฅเคŸเฅ‹เค‚เค— เคฒเคฟเคจ, เคœเคผเฅเคฒเคฟเค†เค‚เค— เคฏเคพเค“, เคœเคผเฅ‡เค‚เคกเคพ เคœเคผเฅ€, เคฏเคฟเค•เฅเคธเฅเค†เคจ เคตเฅ‡เคˆ, เคœเคฟเคฏเคพ เคจเคฟเค‚เค—, เคฏเฅ‚ เค•เคพเค“, เคเฅ‡เค‚เค— เคเคพเค‚เค—, เคฒเฅ€ เคกเฅ‹เค‚เค—, เคซเฅเคฐเฅ เคตเฅ‡เคˆ, เคฌเฅˆเคจเคฟเค‚เค— เค—เฅเค“ เคฆเฅเคตเคพเคฐเคพเฅค -1. **[Swin2SR](https://huggingface.co/docs/transformers/model_doc/swin2sr)** (from University of Wรผrzburg) released with the paper [Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration](https://arxiv.org/abs/2209.11345) by Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte. -1. **[SwitchTransformers](https://huggingface.co/docs/transformers/model_doc/switch_transformers)** (from Google) released with the paper [Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity](https://arxiv.org/abs/2101.03961) by William Fedus, Barret Zoph, Noam Shazeer. -1. 
**[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (Google AI เคธเฅ‡) เค•เฅ‰เคฒเคฟเคจ เคฐเฅˆเคซเฅ‡เคฒ เค”เคฐ เคจเฅ‹เคฎ เคถเคœเคผเฅ€เคฐ เค”เคฐ เคเคกเคฎ เคฐเฅ‰เคฌเคฐเฅเคŸเฅเคธ เค”เคฐ เค•เฅˆเคฅเคฐเฅ€เคจ เคฒเฅ€ เค”เคฐ เคถเคฐเคฃ เคจเคพเคฐเค‚เค— เค”เคฐ เคฎเคพเค‡เค•เคฒ เคฎเคŸเฅ‡เคจเคพ เคฆเฅเคตเคพเคฐเคพ เคธเคพเคฅ เคฎเฅ‡เค‚ เคชเฅ‡เคชเคฐ [เคเค• เคเค•เฅ€เค•เฅƒเคค เคŸเฅ‡เค•เฅเคธเฅเคŸ-เคŸเฅ‚-เคŸเฅ‡เค•เฅเคธเฅเคŸ เคŸเฅเคฐเคพเค‚เคธเคซเฅ‰เคฐเฅเคฎเคฐ เค•เฅ‡ เคธเคพเคฅ เคธเฅเคฅเคพเคจเคพเค‚เคคเคฐเคฃ เคธเฅ€เค–เคจเฅ‡ เค•เฅ€ เคธเฅ€เคฎเคพ เค•เฅ€ เค–เฅ‹เคœ](https://arxiv.org/abs/1910.10683) เค”เคฐ เคฏเคพเค‚เค•เฅ€ เคเฅ‹เค‰ เค”เคฐ เคตเฅ‡เคˆ เคฒเฅ€ เค”เคฐ เคชเฅ€เคŸเคฐ เคœเฅ‡ เคฒเคฟเคฏเฅ‚เฅค -1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (Google AI เคธเฅ‡) เคธเคพเคฅ เคตเคพเคฒเคพ เคชเฅ‡เคชเคฐ [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) เค•เฅ‰เคฒเคฟเคจ เคฐเฅˆเคซเฅ‡เคฒ เค”เคฐ เคจเฅ‹เคฎ เคถเคœเคผเฅ€เคฐ เค”เคฐ เคเคกเคฎ เคฐเฅ‰เคฌเคฐเฅเคŸเฅเคธ เค”เคฐ เค•เฅˆเคฅเคฐเฅ€เคจ เคฒเฅ€ เค”เคฐ เคถเคฐเคฃ เคจเคพเคฐเค‚เค— เคฆเฅเคตเคพเคฐเคพ เค”เคฐ เคฎเคพเค‡เค•เคฒ เคฎเคŸเฅ‡เคจเคพ เค”เคฐ เคฏเคพเค‚เค•เฅ€ เคเฅ‹เค‰ เค”เคฐ เคตเฅ‡เคˆ เคฒเฅ€ เค”เคฐ เคชเฅ€เคŸเคฐ เคœเฅ‡ เคฒเคฟเคฏเฅ‚เฅค -1. **[Table Transformer](https://huggingface.co/docs/transformers/model_doc/table-transformer)** (เคฎเคพเค‡เค•เฅเคฐเฅ‹เคธเฅ‰เคซเฅเคŸ เคฐเคฟเคธเคฐเฅเคš เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เคชเฅ‡เคชเคฐ [เคชเคฌเคŸเฅ‡เคฌเคฒเฅเคธ-1เคเคฎ: เคŸเฅ‚เคตเคฐเฅเคกเฅเคธ เค•เฅ‰เคฎเฅเคชเฅเคฐเคฟเคนเฅ‡เค‚เคธเคฟเคต เคŸเฅ‡เคฌเคฒ เคเค•เฅเคธเคŸเฅเคฐเฅˆเค•เฅเคถเคจ เคซเฅเคฐเฅ‰เคฎ เค…เคจเคธเฅเคŸเฅเคฐเค•เฅเคšเคฐเฅเคก เคกเฅ‰เค•เฅเคฏเฅ‚เคฎเฅ‡เค‚เคŸเฅเคธ](https://arxiv.org/abs/2110.00061) เคฌเฅเคฐเฅˆเค‚เคกเคจ เคธเฅเคฎเฅ‰เค•, เคฐเฅ‹เคนเคฟเคค เคชเฅ‡เคธเคพเคฒเคพ, เคฐเฅ‰เคฌเคฟเคจ เค…เคฌเฅเคฐเคพเคนเคฎ เคฆเฅเคตเคพเคฐเคพ เคชเฅ‹เคธเฅเคŸ เค•เคฟเคฏเคพ เค—เคฏเคพเฅค -1. **[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (Google AI เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เค•เคพเค—เคœ [TAPAS: เคชเฅ‚เคฐเฅเคต-เคชเฅเคฐเคถเคฟเค•เฅเคทเคฃ เค•เฅ‡ เคฎเคพเคงเฅเคฏเคฎ เคธเฅ‡ เค•เคฎเคœเฅ‹เคฐ เคชเคฐเฅเคฏเคตเฅ‡เค•เฅเคทเคฃ เคคเคพเคฒเคฟเค•เคพ เคชเคพเคฐเฅเคธเคฟเค‚เค—](https://arxiv.org/abs/2004.02349) เคœเฅ‹เคจเคพเคฅเคจ เคนเคฐเฅเคœเคผเคฟเค—, เคชเคพเคตเฅ‡เคฒ เค•เฅเคฐเคฟเคœเคผเคฟเคธเฅเคคเฅ‹เคซเคผ เคจเฅ‹เคตเคพเค•, เคฅเฅ‰เคฎเคธ เคฎเฅเคฒเคฐ, เคซเฅเคฐเคพเค‚เคธเฅ‡เคธเฅเค•เฅ‹ เคชเคฟเค•เคฟเคจเฅเคจเฅ‹ เค”เคฐ เคœเฅ‚เคฒเคฟเคฏเคจ เคฎเคพเคฐเฅเคŸเคฟเคจ เคˆเคธเฅ‡เคจเฅเคšเฅเคฒเฅ‹เคธ เคฆเฅเคตเคพเคฐเคพเฅค -1. **[TAPEX](https://huggingface.co/docs/transformers/model_doc/tapex)** (เคฎเคพเค‡เค•เฅเคฐเฅ‹เคธเฅ‰เคซเฅเคŸ เคฐเคฟเคธเคฐเฅเคš เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เคชเฅ‡เคชเคฐ [TAPEX: เคŸเฅ‡เคฌเคฒ เคชเฅเคฐเฅ€-เคŸเฅเคฐเฅ‡เคจเคฟเค‚เค— เคฅเฅเคฐเฅ‚ เคฒเคฐเฅเคจเคฟเค‚เค— เค… เคจเฅเคฏเฅ‚เคฐเคฒ SQL เคเค•เฅเคœเคผเฅ€เค•เฅเคฏเฅ‚เคŸเคฐ](https://arxiv.org/abs/2107.07653) เค•เคฟเคฏเคพเคจ เคฒเคฟเคฏเฅ‚, เคฌเฅ‡เคˆ เคšเฅ‡เคจ, เคœเคฟเคฏเคพเค•เฅ€ เค—เฅเค“, เคฎเฅ‹เคฐเฅเคŸเฅ‡เคœเคผเคพ เคœเคผเคฟเคฏเคพเคฆเฅ€, เคœเคผเฅ‡เค•เฅ€ เคฒเคฟเคจ, เคตเฅ€เคœเคผเฅ‚ เคšเฅ‡เคจ, เคœเคฟเคฏเคพเคจ-เค—เฅเค†เค‚เค— เคฒเฅ‚ เคฆเฅเคตเคพเคฐเคพ เคชเฅ‹เคธเฅเคŸ เค•เคฟเคฏเคพ เค—เคฏเคพเฅค -1. **[Time Series Transformer](https://huggingface.co/docs/transformers/model_doc/time_series_transformer)** (from HuggingFace). -1. 
**[TimeSformer](https://huggingface.co/docs/transformers/model_doc/timesformer)** (from Facebook) released with the paper [Is Space-Time Attention All You Need for Video Understanding?](https://arxiv.org/abs/2102.05095) by Gedas Bertasius, Heng Wang, Lorenzo Torresani. -1. **[Trajectory Transformer](https://huggingface.co/docs/transformers/model_doc/trajectory_transformers)** (from the University of California at Berkeley) released with the paper [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https://arxiv.org/abs/2106.02039) by Michael Janner, Qiyang Li, Sergey Levine -1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transfo-xl)** (Google/CMU เค•เฅ€ เค“เคฐ เคธเฅ‡) เค•เคพเค—เคœ เค•เฅ‡ เคธเคพเคฅ [Transformer-XL: เคซเคฟเค•เฅเคธเฅเคก-เคฒเฅ‡เค‚เคฅ เค•เฅ‰เคจเฅเคŸเฅ‡เค•เฅเคธเฅเคŸ เคธเฅ‡ เคชเคฐเฅ‡ เค…เคŸเฅ‡เค‚เคŸเคฟเคต เคฒเฅˆเค‚เค—เฅเคตเฅ‡เคœ เคฎเฅ‰เคกเคฒ](https://arxiv.org/abs/1901.02860) เค•เฅเคตเฅ‹เค•เฅ‹เค• เคตเฅ€. เคฒเฅ‡, เคฐเฅเคธเฅเคฒเฅˆเคจ เคธเคฒเคพเค–เฅเคคเคฆเฅ€เคจเฅ‹เคต เคฆเฅเคตเคพเคฐเคพเฅค -1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft) released with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei. -1. **[TVLT](https://huggingface.co/docs/transformers/model_doc/tvlt)** (from UNC Chapel Hill) released with the paper [TVLT: Textless Vision-Language Transformer](https://arxiv.org/abs/2209.14156) by Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal. -1. **[TVP](https://huggingface.co/docs/transformers/model_doc/tvp)** (from Intel) released with the paper [Text-Visual Prompting for Efficient 2D Temporal Video Grounding](https://arxiv.org/abs/2303.04995) by Yimeng Zhang, Xin Chen, Jinghan Jia, Sijia Liu, Ke Ding. -1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler -1. **[UMT5](https://huggingface.co/docs/transformers/model_doc/umt5)** (Google Research เคธเฅ‡) Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant. เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining](https://openreview.net/forum?id=kXwdL1cWOAi) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (เคฎเคพเค‡เค•เฅเคฐเฅ‹เคธเฅ‰เคซเฅเคŸ เคฐเคฟเคธเคฐเฅเคš เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เคฆเคฟเคฏเคพ เค—เคฏเคพ เคชเฅ‡เคชเคฐ [UniSpeech: เคฏเฅ‚เคจเคฟเคซเคพเค‡เคก เคธเฅเคชเฅ€เคš เคฐเคฟเคชเฅเคฐเฅ‡เคœเฅ‡เค‚เคŸเฅ‡เคถเคจ เคฒเคฐเฅเคจเคฟเค‚เค— เคตเคฟเคฆ เคฒเฅ‡เคฌเคฒเฅ‡เคก เคเค‚เคก เค…เคจเคฒเฅ‡เคฌเคฒเฅเคก เคกเฅ‡เคŸเคพ](https://arxiv.org/abs/2101.07597) เคšเฅ‡เค‚เค—เคˆ เคตเคพเค‚เค—, เคฏเฅ‚ เคตเฅ‚, เคฏเคพเค“ เค•เคฟเคฏเคพเคจ, เค•เฅ‡เคจเคฟเคšเฅ€ เค•เฅเคฎเคพเคคเคพเคจเฅ€, เคถเฅเคœเฅ€ เคฒเคฟเคฏเฅ‚, เคซเฅเคฐเฅ เคตเฅ‡เคˆ, เคฎเคพเค‡เค•เคฒ เคœเคผเฅ‡เค‚เค—, เคœเคผเฅเคเคฆเฅ‹เค‚เค— เคนเฅเค†เค‚เค— เคฆเฅเคตเคพเคฐเคพเฅค -1. 
**[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech-sat)** (เคฎเคพเค‡เค•เฅเคฐเฅ‹เคธเฅ‰เคซเฅเคŸ เคฐเคฟเคธเคฐเฅเคš เคธเฅ‡) เค•เคพเค—เคœ เค•เฅ‡ เคธเคพเคฅ [UNISPEECH-SAT: เคฏเฅ‚เคจเคฟเคตเคฐเฅเคธเคฒ เคธเฅเคชเฅ€เคš เคฐเคฟเคชเฅเคฐเฅ‡เคœเฅ‡เค‚เคŸเฅ‡เคถเคจ เคฒเคฐเฅเคจเคฟเค‚เค— เคตเคฟเคฆ เคธเฅเคชเฅ€เค•เคฐ เค…เคตเฅ‡เคฏเคฐ เคชเฅเคฐเฅ€-เคŸเฅเคฐเฅ‡เคจเคฟเค‚เค— ](https://arxiv.org/abs/2110.05752) เคธเคพเคจเคฏเฅเค†เคจ เคšเฅ‡เคจ, เคฏเฅ‚ เคตเฅ‚, เคšเฅ‡เค‚เค—เฅเคฏเฅ€ เคตเคพเค‚เค—, เคเฅ‡เค‚เค—เคฏเคพเค‚เค— เคšเฅ‡เคจ, เคเฅ‚เค“ เคšเฅ‡เคจ, เคถเฅเคœเฅ€ เคฒเคฟเคฏเฅ‚, เคœเคฟเคฏเคพเคจ เคตเฅ‚, เคฏเคพเค“ เค•เคฟเคฏเคพเคจ, เคซเฅเคฐเฅ เคตเฅ‡เคˆ, เคœเคฟเคจเฅเคฏเฅ เคฒเฅ€, เคœเคฟเคฏเคพเค‚เค—เคœเคผเคพเคจ เคฏเฅ‚ เคฆเฅเคตเคพเคฐเคพ เคชเฅ‹เคธเฅเคŸ เค•เคฟเคฏเคพ เค—เคฏเคพเฅค -1. **[UnivNet](https://huggingface.co/docs/transformers/model_doc/univnet)** (from Kakao Corporation) released with the paper [UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation](https://arxiv.org/abs/2106.07889) by Won Jang, Dan Lim, Jaesam Yoon, Bongwan Kim, and Juntae Kim. -1. **[UPerNet](https://huggingface.co/docs/transformers/model_doc/upernet)** (from Peking University) released with the paper [Unified Perceptual Parsing for Scene Understanding](https://arxiv.org/abs/1807.10221) by Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun. -1. **[VAN](https://huggingface.co/docs/transformers/model_doc/van)** (เคธเคฟเค‚เค˜เฅเค† เคฏเฅ‚เคจเคฟเคตเคฐเฅเคธเคฟเคŸเฅ€ เค”เคฐ เคจเคจเค•เคพเคˆ เคฏเฅ‚เคจเคฟเคตเคฐเฅเคธเคฟเคŸเฅ€ เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เคชเฅ‡เคชเคฐ [เคตเคฟเคœเฅเค…เคฒ เค…เคŸเฅ‡เค‚เคถเคจ เคจเฅ‡เคŸเคตเคฐเฅเค•](https://arxiv.org/ pdf/2202.09741.pdf) เคฎเฅ‡เค‚เค—-เคนเคพเค“ เค—เฅเค“, เคšเฅ‡เค‚เค—-เคœเคผเฅ‡ เคฒเฅ‚, เคเฅ‡เค‚เค—-เคจเคฟเค‚เค— เคฒเคฟเคฏเฅ‚, เคฎเคฟเค‚เค—-เคฎเคฟเค‚เค— เคšเฅ‡เค‚เค—, เคถเคฟ-เคฎเคฟเคจ เคนเฅ‚ เคฆเฅเคตเคพเคฐเคพเฅค -1. **[VideoMAE](https://huggingface.co/docs/transformers/model_doc/videomae)** (เคฎเคฒเฅเคŸเฅ€เคฎเฅ€เคกเคฟเคฏเคพ เค•เคฎเฅเคชเฅเคฏเฅ‚เคŸเคฟเค‚เค— เค—เฅเคฐเฅเคช, เคจเคพเคจเคœเคฟเค‚เค— เคฏเฅ‚เคจเคฟเคตเคฐเฅเคธเคฟเคŸเฅ€ เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เคชเฅ‡เคชเคฐ [เคตเฅ€เคกเคฟเคฏเฅ‹เคเคฎเคเคˆ: เคฎเคพเคธเฅเค•เฅเคก เค‘เคŸเฅ‹เคเคจเฅเค•เฅ‹เคกเคฐ เคธเฅเคต-เคชเคฐเฅเคฏเคตเฅ‡เค•เฅเคทเคฟเคค เคตเฅ€เคกเคฟเคฏเฅ‹ เคชเฅเคฐเฅ€-เคŸเฅเคฐเฅ‡เคจเคฟเค‚เค— เค•เฅ‡ เคฒเคฟเค เคกเฅ‡เคŸเคพ-เค•เฅเคถเคฒ เคธเฅ€เค–เคจเฅ‡ เคตเคพเคฒเฅ‡ เคนเฅˆเค‚] (https://arxiv.org/abs/2203.12602) เคœเคผเคพเคจ เคŸเฅ‹เค‚เค—, เคฏเคฟเคฌเคฟเค‚เค— เคธเฅ‰เคจเฅเค—, เคœเฅเค เคฆเฅเคตเคพเคฐเคพ เคตเคพเค‚เค—, เคฒเคฟเคฎเคฟเคจ เคตเคพเค‚เค— เคฆเฅเคตเคพเคฐเคพ เคชเฅ‹เคธเฅเคŸ เค•เคฟเคฏเคพ เค—เคฏเคพเฅค -1. **[ViLT](https://huggingface.co/docs/transformers/model_doc/vilt)** (NAVER AI Lab/Kakao Enterprise/Kakao Brain เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เค•เคพเค—เคœ [ViLT: Vision-and-Language Transformer เคฌเคฟเคจเคพ เค•เคจเคตเคฒเฅเคถเคจ เคฏเคพ เคฐเฅ€เคœเคจ เคธเฅเคชเคฐเคตเคฟเคœเคจ](https://arxiv.org/abs/2102.03334) เคตเฅ‹เคจเคœเฅ‡ เค•เคฟเคฎ, เคฌเฅ‹เค•เฅเคฏเฅ‚เค‚เค— เคธเฅ‹เคจ, เค‡เคฒเฅเคกเฅ‚ เค•เคฟเคฎ เคฆเฅเคตเคพเคฐเคพ เคชเฅ‹เคธเฅเคŸ เค•เคฟเคฏเคพ เค—เคฏเคพเฅค -1. **[VipLlava](https://huggingface.co/docs/transformers/model_doc/vipllava)** (University of Wisconsinโ€“Madison เคธเฅ‡) Mu Cai, Haotian Liu, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Dennis Park, Yong Jae Lee. 
เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [Making Large Multimodal Models Understand Arbitrary Visual Prompts](https://arxiv.org/abs/2312.00784) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (เค—เฅ‚เค—เคฒ เคเค†เคˆ เคธเฅ‡) เค•เคพเค—เคœ เค•เฅ‡ เคธเคพเคฅ [เคเค• เค‡เคฎเฅ‡เคœ เค‡เคœเคผ เคตเคฐเฅเคฅ 16x16 เคตเคฐเฅเคกเฅเคธ: เคŸเฅเคฐเคพเค‚เคธเคซเฅ‰เคฐเฅเคฎเคฐเฅเคธ เคซเฅ‰เคฐ เค‡เคฎเฅ‡เคœ เคฐเคฟเค•เฅ‰เค—เฅเคจเคฟเคถเคจ เคเคŸ เคธเฅเค•เฅ‡เคฒ](https://arxiv.org/abs/2010.11929) เคเคฒเฅ‡เค•เฅเคธเฅ€ เคกเฅ‹เคธเฅ‹เคตเคฟเคคเฅเคธเฅเค•เฅ€, เคฒเฅเค•เคพเคธ เคฌเฅ‡เคฏเคฐ, เค…เคฒเฅ‡เค•เฅเคœเฅ‡เค‚เคกเคฐ เค•เฅ‹เคฒเฅ‡เคธเคจเคฟเค•เฅ‹เคต, เคกเคฟเคฐเฅเค• เคตเฅ€เคธเฅ‡เคจเคฌเฅ‹เคฐเฅเคจ, เคถเคฟเคฏเคพเค“เคนเฅเค† เคเคพเคˆ, เคฅเฅ‰เคฎเคธ เค…เคจเคŸเคฐเคฅเคฟเคจเคฐ, เคฎเฅเคธเฅเคคเคซเคพ เคฆเฅ‡เคนเค˜เคพเคจเฅ€, เคฎเฅˆเคฅเคฟเคฏเคพเคธ เคฎเคฟเค‚เคกเคฐเคฐ, เคœเฅ‰เคฐเฅเคœ เคนเฅ‡เค—เฅ‹เคฒเฅเคก, เคธเคฟเคฒเฅเคตเฅ‡เคจ เค—เฅ‡เคฒเฅ€, เคœเฅˆเค•เคฌ เค‰เคธเฅเคœเคผเค•เฅ‹เคฐเฅ‡เค‡เคŸ เคฆเฅเคตเคพเคฐเคพ เคนเฅ‰เคฒเฅเคธเคฌเฅ€ เคฆเฅเคตเคพเคฐเคพ เคชเฅ‹เคธเฅเคŸ เค•เคฟเคฏเคพ เค—เคฏเคพเฅค -1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (UCLA NLP เคธเฅ‡) เคธเคพเคฅ เคตเคพเคฒเคพ เคชเฅ‡เคชเคฐ [VisualBERT: A Simple and Performant Baseline for Vision and Language](https:/ /arxiv.org/pdf/1908.03557) เคฒเคฟเคฏเฅเคจเคฟเคฏเคจ เคนเฅ‡เคฐเฅ‹เคฒเฅเคก เคฒเฅ€, เคฎเคพเคฐเฅเค• เคฏเคพเคคเฅเคธเฅเค•เคฐ, เคฆเคพ เคฏเคฟเคจ, เคšเฅ‹-เคœเฅเคˆ เคนเคธเฅ€เคน, เค•เคพเคˆ-เคตเฅ‡เคˆ เคšเคพเค‚เค— เคฆเฅเคตเคพเคฐเคพเฅค -1. **[ViT Hybrid](https://huggingface.co/docs/transformers/model_doc/vit_hybrid)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby. -1. **[VitDet](https://huggingface.co/docs/transformers/model_doc/vitdet)** (Meta AI เคธเฅ‡) Yanghao Li, Hanzi Mao, Ross Girshick, Kaiming He. เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [Exploring Plain Vision Transformer Backbones for Object Detection](https://arxiv.org/abs/2203.16527) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. **[ViTMAE](https://huggingface.co/docs/transformers/model_doc/vit_mae)** (เคฎเฅ‡เคŸเคพ เคเค†เคˆ เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เค•เคพเค—เคœ [เคฎเคพเคธเฅเค•เคก เค‘เคŸเฅ‹เคเคจเฅเค•เฅ‹เคกเคฐ เคธเฅเค•เฅ‡เคฒเฅ‡เคฌเคฒ เคตเคฟเคœเคจ เคฒเคฐเฅเคจเคฐเฅเคธ เคนเฅˆเค‚](https://arxiv.org/ เคเคฌเฅเคธ/2111.06377) เค•เฅˆเคฎเคฟเค‚เค— เคนเฅ‡, เคœเคผเคฟเคจเฅ‡เคฒเฅ€ เคšเฅ‡เคจ, เคธเฅ‡เคจเคฟเค‚เค— เคœเคผเฅ€, เคฏเคพเค‚เค—เคนเฅ‹ เคฒเฅ€, เคชเคฟเค“เคŸเฅเคฐ เคกเฅ‰เคฒเคฐ, เคฐเฅ‰เคธ เค—เคฟเคฐเฅเคถเคฟเค• เคฆเฅเคตเคพเคฐเคพเฅค -1. **[ViTMatte](https://huggingface.co/docs/transformers/model_doc/vitmatte)** (HUST-VL เคธเฅ‡) Jingfeng Yao, Xinggang Wang, Shusheng Yang, Baoyuan Wang. เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers](https://arxiv.org/abs/2305.15272) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. 
**[ViTMSN](https://huggingface.co/docs/transformers/model_doc/vit_msn)** (เคฎเฅ‡เคŸเคพ เคเค†เคˆ เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เค•เคพเค—เคœ [เคฒเฅ‡เคฌเคฒ-เค•เฅเคถเคฒ เคธเฅ€เค–เคจเฅ‡ เค•เฅ‡ เคฒเคฟเค เคฎเคพเคธเฅเค•เฅเคก เคธเฅเคฏเคพเคฎ เคฆเฅ‡เคถ เค•เฅ‡ เคจเฅ‡เคŸเคตเคฐเฅเค•](https://arxiv. org/abs/2204.07141) เคฎเคนเคฎเฅ‚เคฆ เค…เคธเคฐเคพเคจ, เคฎเคฅเคฟเคฒเฅเคกเฅ‡ เค•เฅˆเคฐเคจ, เคˆเคถเคพเคจ เคฎเคฟเคถเฅเคฐเคพ, เคชเคฟเคฏเฅ‹เคŸเฅเคฐ เคฌเฅ‹เคœเคพเคจเฅ‹เคตเคธเฅเค•เฅ€, เคซเฅเคฒเฅ‹เคฐเคฟเคฏเคจ เคฌเฅ‹เคฐเฅเคกเฅ‡เคธ, เคชเคพเคธเฅเค•เคฒ เคตเคฟเค‚เคธเฅ‡เค‚เคŸ, เค†เคฐเฅเคฎเค‚เคก เคœเฅŒเคฒเคฟเคจ, เคฎเคพเค‡เค•เคฒ เคฐเคฌเฅเคฌเคค, เคจเคฟเค•เฅ‹เคฒเคธ เคฌเคฒเฅเคฒเคพเคธ เคฆเฅเคตเคพเคฐเคพเฅค -1. **[VITS](https://huggingface.co/docs/transformers/model_doc/vits)** (Kakao Enterprise เคธเฅ‡) Jaehyeon Kim, Jungil Kong, Juhee Son. เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech](https://arxiv.org/abs/2106.06103) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. **[ViViT](https://huggingface.co/docs/transformers/model_doc/vivit)** (from Google Research) released with the paper [ViViT: A Video Vision Transformer](https://arxiv.org/abs/2103.15691) by Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Luฤiฤ‡, Cordelia Schmid. -1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (เคซเฅ‡เคธเคฌเฅเค• เคเค†เคˆ เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เคชเฅ‡เคชเคฐ [wav2vec 2.0: เค เคซเฅเคฐเฅ‡เคฎเคตเคฐเฅเค• เคซเฅ‰เคฐ เคธเฅ‡เคฒเฅเคซ-เคธเฅเคชเคฐเคตเคพเค‡เคœเฅเคก เคฒเคฐเฅเคจเคฟเค‚เค— เค‘เคซ เคธเฅเคชเฅ€เคš เคฐเคฟเคชเฅเคฐเฅ‡เคœเฅ‡เค‚เคŸเฅ‡เคถเคจ](https://arxiv.org/abs/2006.11477) เคเคฒเฅ‡เค•เฅเคธเฅ€ เคฌเฅ‡เคตเคธเฅเค•เฅ€, เคนเฅ‡เคจเคฐเฅ€ เคเฅ‹เค‰, เค…เคฌเฅเคฆเฅ‡เคฒเคฐเคนเคฎเคพเคจ เคฎเฅ‹เคนเคฎเฅเคฎเคฆ, เคฎเคพเค‡เค•เคฒ เค”เคฒเฅ€ เคฆเฅเคตเคพเคฐเคพเฅค -1. **[Wav2Vec2-Conformer](https://huggingface.co/docs/transformers/model_doc/wav2vec2-conformer)** (Facebook AI เคธเฅ‡) เคธเคพเคฅ เคตเคพเคฒเคพ เคชเฅ‡เคชเคฐ [FAIRSEQ S2T: FAIRSEQ เค•เฅ‡ เคธเคพเคฅ เคซเคพเคธเฅเคŸ เคธเฅเคชเฅ€เคš-เคŸเฅ‚-เคŸเฅ‡เค•เฅเคธเฅเคŸ เคฎเฅ‰เคกเคฒเคฟเค‚เค— ](https://arxiv.org/abs/2010.05171) เคšเคพเค‚เค—เคนเคพเคจ เคตเคพเค‚เค—, เคฏเฅ‚เค‚ เคคเคพเค‚เค—, เคœเฅเคคเคพเคˆ เคฎเคพ, เคเคจเฅ€ เคตเฅ‚, เคธเคฐเคตเฅเคฏเคพ เคชเฅ‹เคชเฅเคฐเฅ€, เคฆเคฟเคฎเคฟเคคเฅเคฐเฅ‹ เค“เค–เฅ‹เคจเค•เฅ‹, เคœเฅเค†เคจ เคชเคฟเคจเฅ‹ เคฆเฅเคตเคพเคฐเคพ เคชเฅ‹เคธเฅเคŸ เค•เคฟเคฏเคพ เค—เคฏเคพเฅค -1. **[Wav2Vec2Phoneme](https://huggingface.co/docs/transformers/model_doc/wav2vec2_phoneme)** (Facebook AI เคธเฅ‡) เคธเคพเคฅ เคตเคพเคฒเคพ เคชเฅ‡เคชเคฐ [เคธเคฐเคฒ เค”เคฐ เคชเฅเคฐเคญเคพเคตเฅ€ เคœเฅ€เคฐเฅ‹-เคถเฅ‰เคŸ เค•เฅเคฐเฅ‰เคธ-เคฒเคฟเค‚เค—เฅเค…เคฒ เคซเฅ‹เคจเฅ‡เคฎ เคฐเคฟเค•เฅ‰เค—เฅเคจเคฟเคถเคจ](https://arxiv.org/abs/2109.11680) เค•เคฟเคฏเคพเคจเคŸเฅ‹เค‚เค— เคœเฅ‚, เคเคฒเฅ‡เค•เฅเคธเฅ€ เคฌเคพเคเคตเฅเคธเฅเค•เฅ€, เคฎเคพเค‡เค•เคฒ เค”เคฒเฅ€ เคฆเฅเคตเคพเคฐเคพเฅค -1. 
**[WavLM](https://huggingface.co/docs/transformers/model_doc/wavlm)** (เคฎเคพเค‡เค•เฅเคฐเฅ‹เคธเฅ‰เคซเฅเคŸ เคฐเคฟเคธเคฐเฅเคš เคธเฅ‡) เคชเฅ‡เคชเคฐ เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ [WavLM: เคซเฅเคฒ เคธเฅเคŸเฅˆเค• เค•เฅ‡ เคฒเคฟเค เคฌเคกเคผเฅ‡ เคชเฅˆเคฎเคพเคจเฅ‡ เคชเคฐ เคธเฅเคต-เคชเคฐเฅเคฏเคตเฅ‡เค•เฅเคทเคฟเคค เคชเฅ‚เคฐเฅเคต-เคชเฅเคฐเคถเคฟเค•เฅเคทเคฃ เคธเฅเคชเฅ€เคš เคชเฅเคฐเฅ‹เคธเฅ‡เคธเคฟเค‚เค—](https://arxiv.org/abs/2110.13900) เคธเคพเคจเคฏเฅเค†เคจ เคšเฅ‡เคจ, เคšเฅ‡เค‚เค—เคฏเฅ€ เคตเคพเค‚เค—, เคเฅ‡เค‚เค—เคฏเคพเค‚เค— เคšเฅ‡เคจ, เคฏเฅ‚ เคตเฅ‚, เคถเฅเคœเฅ€ เคฒเคฟเคฏเฅ‚, เคœเคผเฅเค“ เคšเฅ‡เคจ, เคœเคฟเคจเฅเคฏเฅ เคฒเฅ€, เคจเคพเค“เคฏเฅเค•เฅ€ เค•เคพเค‚เคกเคพ, เคคเคพเค•เฅเคฏเคพ เคฏเฅ‹เคถเคฟเคฏเฅ‹เค•เคพ, เคœเคผเคฟเค“เค‚เค— เคœเคฟเค“, เคœเคฟเคฏเคพเคจ เคตเฅ‚, เคฒเฅ‰เคจเฅเค— เคเฅ‹เค‰, เคถเฅเค“ เคฐเฅ‡เคจ, เคฏเคพเคจเคฎเคฟเคจ เค•เคฟเคฏเคพเคจ, เคฏเคพเค“ เค•เคฟเคฏเคพเคจ, เคœเคฟเคฏเคพเคจ เคตเฅ‚, เคฎเคพเค‡เค•เคฒ เคœเคผเฅ‡เค‚เค—, เคซเฅเคฐเฅ เคตเฅ‡เคˆเฅค -1. **[Whisper](https://huggingface.co/docs/transformers/model_doc/whisper)** (OpenAI เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เค•เคพเค—เคœ [เคฌเคกเคผเฅ‡ เคชเฅˆเคฎเคพเคจเฅ‡ เคชเคฐ เค•เคฎเคœเฅ‹เคฐ เคชเคฐเฅเคฏเคตเฅ‡เค•เฅเคทเคฃ เค•เฅ‡ เคฎเคพเคงเฅเคฏเคฎ เคธเฅ‡ เคฎเคœเคฌเฅ‚เคค เคญเคพเคทเคฃ เคชเคนเคšเคพเคจ](https://cdn. openai.com/papers/whisper.pdf) เคเคฒเฅ‡เค• เคฐเฅˆเคกเคซเฅ‹เคฐเฅเคก, เคœเฅ‹เค‚เค— เคตเฅ‚เค• เค•เคฟเคฎ, เคคเคพเค“ เคœเฅ‚, เค—เฅเคฐเฅ‡เค— เคฌเฅเคฐเฅ‰เค•เคฎเฅˆเคจ, เค•เฅเคฐเคฟเคธเฅเคŸเฅ€เคจ เคฎเฅˆเค•เคฒเฅ€เคตเฅ‡, เค‡เคฒเฅเคฏเคพ เคธเฅเคคเฅเคธเฅเค•เฅ‡เคตเคฐ เคฆเฅเคตเคพเคฐเคพเฅค -1. **[X-CLIP](https://huggingface.co/docs/transformers/model_doc/xclip)** (เคฎเคพเค‡เค•เฅเคฐเฅ‹เคธเฅ‰เคซเฅเคŸ เคฐเคฟเคธเคฐเฅเคš เคธเฅ‡) เค•เคพเค—เคœ เค•เฅ‡ เคธเคพเคฅ [เคเค•เฅเคธเคชเฅˆเค‚เคกเคฟเค‚เค— เคฒเฅˆเค‚เค—เฅเคตเฅ‡เคœ-เค‡เคฎเฅ‡เคœ เคชเฅเคฐเฅ€เคŸเฅเคฐเฅ‡เคจเฅเคก เคฎเฅ‰เคกเคฒ เคซเฅ‰เคฐ เคœเคจเคฐเคฒ เคตเฅ€เคกเคฟเคฏเฅ‹ เคฐเคฟเค•เค—เฅเคจเคฟเคถเคจ](https://arxiv.org/abs/2208.02816) เคฌเฅ‹เคฒเคฟเคจ เคจเฅ€, เคนเฅ‹เค‰เคตเฅ‡เคจ เคชเฅ‡เค‚เค—, เคฎเคฟเค‚เค—เคพเค“ เคšเฅ‡เคจ, เคธเฅ‹เค‚เค—เคฏเคพเค‚เค— เคเคพเค‚เค—, เค—เคพเค“เคซเฅ‡เค‚เค— เคฎเฅ‡เค‚เค—, เคœเคฟเคฏเคพเคจเคฒเฅ‹เค‚เค— เคซเฅ‚, เคถเคฟเคฎเคฟเค‚เค— เคœเคฟเคฏเคพเค‚เค—, เคนเฅˆเคฌเคฟเคจ เคฒเคฟเค‚เค— เคฆเฅเคตเคพเคฐเคพเฅค -1. **[X-MOD](https://huggingface.co/docs/transformers/model_doc/xmod)** (Meta AI เคธเฅ‡) Jonas Pfeiffer, Naman Goyal, Xi Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe. เคฆเฅเคตเคพเคฐเคพเค…เคจเฅเคธเค‚เคงเคพเคจ เคชเคคเฅเคฐ [Lifting the Curse of Multilinguality by Pre-training Modular Transformers](http://dx.doi.org/10.18653/v1/2022.naacl-main.255) เค•เฅ‡ เคธเคพเคฅ เคœเคพเคฐเฅ€ เค•เคฟเคฏเคพ เค—เคฏเคพ -1. **[XGLM](https://huggingface.co/docs/transformers/model_doc/xglm)** (From Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li. -1. 
**[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (เคซเฅ‡เคธเคฌเฅเค• เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เคชเฅ‡เคชเคฐ [เค•เฅเคฐเฅ‰เคธ-เคฒเคฟเค‚เค—เฅเค…เคฒ เคฒเฅˆเค‚เค—เฅเคตเฅ‡เคœ เคฎเฅ‰เคกเคฒ เคชเฅเคฐเฅ€เคŸเฅเคฐเฅ‡เคจเคฟเค‚เค—](https://arxiv.org/abs/1901.07291) เค—เคฟเคฒเคพเค‰เคฎ เคฒเฅˆเคฎเฅเคชเคฒ เค”เคฐ เคเคฒเฅ‡เค•เฅเคธเคฟเคธ เค•เฅ‹เคจเฅ‹ เคฆเฅเคตเคพเคฐเคพเฅค -1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet)** (เคฎเคพเค‡เค•เฅเคฐเฅ‹เคธเฅ‰เคซเฅเคŸ เคฐเคฟเคธเคฐเฅเคš เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เค•เคพเค—เคœ [ProphetNet: เคชเฅเคฐเฅ‡เคกเคฟเค•เฅเคŸเคฟเค‚เค— เคซเฅเคฏเฅ‚เคšเคฐ เคเคจ-เค—เฅเคฐเคพเคฎ เคซเฅ‰เคฐ เคธเฅ€เค•เฅเคตเฅ‡เค‚เคธ-เคŸเฅ‚-เคธเฅ€เค•เฅเคตเฅ‡เค‚เคธ เคชเฅเคฐเฅ€-เคŸเฅเคฐเฅ‡เคจเคฟเค‚เค—](https://arxiv.org/abs/2001.04063) เคฏเฅ‚ เคฏเคพเคจ, เคตเฅ€เคœเคผเฅ‡เคจ เค•เฅเคฏเฅ‚เคˆ, เคฏเฅ‡เคฏเฅเคจ เค—เฅ‹เค‚เค—, เคฆเคฏเคพเคนเฅ‡เค‚เค— เคฒเคฟเคฏเฅ‚, เคจเคพเคจ เคกเฅเค†เคจ, เคœเคฟเค‰เคถเฅ‡เค‚เค— เคšเฅ‡เคจ, เคฐเฅเค“เคซเคผเฅ‡เคˆ เคเคพเค‚เค— เค”เคฐ เคฎเคฟเค‚เค— เคเฅ‹เค‰ เคฆเฅเคตเคพเคฐเคพเฅค -1. **[XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlm-roberta)** (เคซเฅ‡เคธเคฌเฅเค• เคเค†เคˆ เคธเฅ‡), เคธเคพเคฅ เคฎเฅ‡เค‚ เคชเฅ‡เคชเคฐ [เค…เคจเคธเฅเคชเคฐเคตเคพเค‡เคœเฅเคก เค•เฅเคฐเฅ‰เคธ-เคฒเคฟเค‚เค—เฅเค…เคฒ เคฐเคฟเคชเฅเคฐเฅ‡เคœเฅ‡เค‚เคŸเฅ‡เคถเคจ เคฒเคฐเฅเคจเคฟเค‚เค— เคเคŸ เคธเฅเค•เฅ‡เคฒ](https://arxiv.org/abs/1911.02116) เคเคฒเฅ‡เค•เฅเคธเคฟเคธ เค•เฅ‹เคจเฅเคฏเฅ‚*, เค•เคพเคฐเฅเคคเคฟเค•เฅ‡เคฏ เค–เค‚เคกเฅ‡เคฒเคตเคพเคฒ*, เคจเคฎเคจ เค—เฅ‹เคฏเคฒ, เคตเคฟเคถเฅเคฐเคต เคšเฅŒเคงเคฐเฅ€, เค—เคฟเคฒเคพเค‰เคฎ เคตเฅ‡เคจเคœเคผเฅ‡เค•, เคซเฅเคฐเคพเค‚เคธเคฟเคธเฅเค•เฅ‹ เค—เฅเคœเคผเคฎเฅˆเคจ, เคเคกเฅŒเคฐเฅเคก เค—เฅเคฐเฅ‡เคต, เคฎเคพเคฏเคฒ เค“เคŸ, เคฒเฅเคฏเฅ‚เค• เคœเคผเฅ‡เคŸเคฒเคฎเฅ‰เคฏเคฐ เค”เคฐ เคตเฅ‡เคธเฅ‡เคฒเคฟเคจ เคธเฅเคŸเฅ‹เคฏเคพเคจเฅ‹เคต เคฆเฅเคตเคพเคฐเคพเฅค -1. **[XLM-RoBERTa-XL](https://huggingface.co/docs/transformers/model_doc/xlm-roberta-xl)** (Facebook AI เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เค•เคพเค—เคœ [เคฌเคนเฅเคญเคพเคทเฅ€ เคจเค•เคพเคฌเคชเฅ‹เคถ เคญเคพเคทเคพ เคฎเฅ‰เคกเคฒเคฟเค‚เค— เค•เฅ‡ เคฒเคฟเค เคฌเคกเคผเฅ‡ เคชเฅˆเคฎเคพเคจเฅ‡ เคชเคฐ เคŸเฅเคฐเคพเค‚เคธเคซเฅ‰เคฐเฅเคฎเคฐ](https://arxiv.org/abs/2105.00572) เคจเคฎเคจ เค—เฅ‹เคฏเคฒ, เคœเคฟเค‚เค—เคซเฅ‡เคˆ เคกเฅ‚, เคฎเคพเคฏเคฒ เค“เคŸ, เค—เคฟเคฐเคฟ เค…เคจเค‚เคคเคฐเคพเคฎเคจ, เคเคฒเฅ‡เค•เฅเคธเคฟเคธ เค•เฅ‹เคจเฅ‹ เคฆเฅเคตเคพเคฐเคพ เคชเฅ‹เคธเฅเคŸ เค•เคฟเคฏเคพ เค—เคฏเคพเฅค -1. **[XLM-V](https://huggingface.co/docs/transformers/model_doc/xlm-v)** (from Meta AI) released with the paper [XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models](https://arxiv.org/abs/2301.10472) by Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa. -1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (Google/CMU เคธเฅ‡) เคธเคพเคฅ เคตเคพเคฒเคพ เคชเฅ‡เคชเคฐ [XLNet: เคœเคจเคฐเคฒเคพเค‡เคœเฅเคก เค‘เคŸเฅ‹เคฐเฅ‡เค—เฅเคฐเฅ‡เคธเคฟเคต เคชเฅเคฐเฅ€เคŸเฅเคฐเฅ‡เคจเคฟเค‚เค— เคซเฅ‰เคฐ เคฒเฅˆเค‚เค—เฅเคตเฅ‡เคœ เค…เค‚เคกเคฐเคธเฅเคŸเฅˆเค‚เคกเคฟเค‚เค—](https://arxiv.org/abs/1906.08237) เคœเคผเฅ€เคฒเคฟเคจ เคฏเคพเค‚เค—*, เคœเคผเคฟเคนเคพเค‚เค— เคฆเคพเคˆ*, เคฏเคฟเคฎเคฟเค‚เค— เคฏเคพเค‚เค—, เคœเฅˆเคฎ เค•เคพเคฐเฅเคฌเฅ‹เคจเฅ‡เคฒ, เคฐเฅเคธเฅเคฒเคพเคจ เคธเคฒเคพเค–เฅเคคเคฆเฅ€เคจเฅ‹เคต, เค•เฅเคตเฅ‹เค• เคตเฅ€. เคฒเฅ‡ เคฆเฅเคตเคพเคฐเคพเฅค -1. 
**[XLS-R](https://huggingface.co/docs/transformers/model_doc/xls_r)** (Facebook AI เคธเฅ‡) เคธเคพเคฅ เคตเคพเคฒเคพ เคชเฅ‡เคชเคฐ [XLS-R: เคธเฅ‡เคฒเฅเคซ เคธเฅเคชเคฐเคตเคพเค‡เคœเฅเคก เค•เฅเคฐเฅ‰เคธ-เคฒเคฟเค‚เค—เฅเค…เคฒ เคธเฅเคชเฅ€เคš เคฐเคฟเคชเฅเคฐเฅ‡เคœเฅ‡เค‚เคŸเฅ‡เคถเคจ เคฒเคฐเฅเคจเคฟเค‚เค— เคเคŸ เคธเฅเค•เฅ‡เคฒ](https://arxiv.org/abs/2111.09296) เค…เคฐเฅเคฃ เคฌเคพเคฌเฅ‚, เคšเคพเค‚เค—เคนเคพเคจ เคตเคพเค‚เค—, เคเค‚เคกเฅเคฐเฅ‹เคธ เคคเคœเค‚เคฆเฅเคฐเคพ, เค•เฅเคถเคพเคฒ เคฒเค–เฅ‹เคŸเคฟเคฏเคพ, เค•เคฟเคฏเคพเคจเคŸเฅ‹เค‚เค— เคœเฅ‚, เคจเคฎเคจ เค—เฅ‹เคฏเคฒ, เค•เฅƒเคคเคฟเค•เคพ เคธเคฟเค‚เคน, เคชเฅˆเคŸเฅเคฐเคฟเค• เคตเฅ‰เคจ เคชเฅเคฒเฅˆเคŸเคจ, เคฏเคพเคฅเคพเคฐเฅเคฅ เคธเคฐเคพเคซ, เคœเฅเค†เคจ เคชเคฟเคจเฅ‹, เคเคฒเฅ‡เค•เฅเคธเฅ€ เคฌเฅ‡เคตเคธเฅเค•เฅ€, เคเคฒเฅ‡เค•เฅเคธเคฟเคธ เค•เฅ‹เคจเฅเคฏเฅ‚, เคฎเคพเค‡เค•เคฒ เค”เคฒเฅ€ เคฆเฅเคตเคพเคฐเคพ เคชเฅ‹เคธเฅเคŸ เค•เคฟเคฏเคพ เค—เคฏเคพเฅค -1. **[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (เคซเฅ‡เคธเคฌเฅเค• เคเค†เคˆ เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เคชเฅ‡เคชเคฐ [เค…เคจเคธเฅเคชเคฐเคตเคพเค‡เคœเฅเคก เค•เฅเคฐเฅ‰เคธ-เคฒเคฟเค‚เค—เฅเค…เคฒ เคฐเคฟเคชเฅเคฐเฅ‡เคœเฅ‡เค‚เคŸเฅ‡เคถเคจ เคฒเคฐเฅเคจเคฟเค‚เค— เคซเฅ‰เคฐ เคธเฅเคชเฅ€เคš เคฐเคฟเค•เค—เฅเคจเคฟเคถเคจ](https://arxiv.org/abs/2006.13979) เคเคฒเฅ‡เค•เฅเคธเคฟเคธ เค•เฅ‹เคจเฅเคฏเฅ‚, เคเคฒเฅ‡เค•เฅเคธเฅ€ เคฌเฅ‡เคตเคธเฅเค•เฅ€, เคฐเฅ‹เคจเคจ เค•เฅ‹เคฒเฅ‹เคฌเคฐเฅเคŸ, เค…เคฌเฅเคฆเฅ‡เคฒเคฐเคนเคฎเคพเคจ เคฎเฅ‹เคนเคฎเฅเคฎเคฆ, เคฎเคพเค‡เค•เคฒ เค”เคฒเฅ€ เคฆเฅเคตเคพเคฐเคพเฅค -1. **[YOLOS](https://huggingface.co/docs/transformers/model_doc/yolos)** (เคนเฅเค†เคเฅ‹เค‚เค— เคฏเฅ‚เคจเคฟเคตเคฐเฅเคธเคฟเคŸเฅ€ เค‘เคซ เคธเคพเค‡เค‚เคธ เคเค‚เคก เคŸเฅ‡เค•เฅเคจเฅ‹เคฒเฅ‰เคœเฅ€ เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เคชเฅ‡เคชเคฐ [เคฏเฅ‚ เค“เคจเคฒเฅ€ เคฒเฅเค• เคเคŸ เคตเคจ เคธเฅ€เค•เฅเคตเฅ‡เค‚เคธ: เคฐเฅ€เคฅเคฟเค‚เค•เคฟเค‚เค— เคŸเฅเคฐเคพเค‚เคธเคซเฅ‰เคฐเฅเคฎเคฐ เค‡เคจ เคตเคฟเคœเคผเคจ เคฅเฅเคฐเฅ‚ เค‘เคฌเฅเคœเฅ‡เค•เฅเคŸ เคกเคฟเคŸเฅ‡เค•เฅเคถเคจ](https://arxiv.org/abs/2106.00666) เคฏเฅเค•เฅเคธเคฟเคจ เคซเฅ‡เค‚เค—, เคฌเฅ‡เคจเคšเฅ‡เค‚เค— เคฒเคฟเคฏเคพเค“, เคœเคฟเค‚เค—เค—เฅˆเค‚เค— เคตเคพเค‚เค—, เคœเฅ‡เคฎเคฟเคจ เคซเฅ‡เค‚เค—, เคœเคฟเคฏเคพเค‚เค— เค•เฅเคฏเฅ‚เคˆ, เคฐเฅเคˆ เคตเฅ‚, เคœเคฟเคฏเคพเคจเคตเฅ‡เคˆ เคจเฅ€เคฏเฅ‚, เคตเฅ‡เคจเฅเคฏเฅ‚ เคฒเคฟเคฏเฅ‚ เคฆเฅเคตเคพเคฐเคพ เคชเฅ‹เคธเฅเคŸ เค•เคฟเคฏเคพ เค—เคฏเคพเฅค -1. **[YOSO](https://huggingface.co/docs/transformers/model_doc/yoso)** (เคตเคฟเคธเฅเค•เฅ‰เคจเฅเคธเคฟเคจ เคตเคฟเคถเฅเคตเคตเคฟเคฆเฅเคฏเคพเคฒเคฏ - เคฎเฅˆเคกเคฟเคธเคจ เคธเฅ‡) เคธเคพเคฅ เคฎเฅ‡เค‚ เคชเฅ‡เคชเคฐ [เคฏเฅ‚ เค“เคจเคฒเฅ€ เคธเฅˆเค‚เคชเคฒ (เคฒเค—เคญเค—) เคตเคจเฅเคธ](https://arxiv.org/abs/2111.09714) เคœเคผเคพเคจเคชเฅ‡เค‚เค— เคœเคผเฅ‡เค‚เค—, เคฏเฅเคจเคฏเคพเค‚เค— เคœเคผเคฟเค“เค‚เค—, เคธเคคเฅเคฏ เคเคจ. เคฐเคตเคฟ, เคถเฅˆเคฒเฅ‡เคถ เค†เคšเคพเคฐเฅเคฏ, เค—เฅเคฒเฅ‡เคจ เคซเค‚เค—, เคตเคฟเค•เคพเคธ เคธเคฟเค‚เคน เคฆเฅเคตเคพเคฐเคพ เคชเฅ‹เคธเฅเคŸ เค•เคฟเคฏเคพ เค—เคฏเคพเฅค -1. เคเค• เคจเค เคฎเฅ‰เคกเคฒ เคฎเฅ‡เค‚ เคฏเฅ‹เค—เคฆเคพเคจ เคฆเฅ‡เคจเคพ เคšเคพเคนเคคเฅ‡ เคนเฅˆเค‚? 
เคจเค เคฎเฅ‰เคกเคฒ เคœเฅ‹เคกเคผเคจเฅ‡ เคฎเฅ‡เค‚ เค†เคชเค•เคพ เคฎเคพเคฐเฅเค—เคฆเคฐเฅเคถเคจ เค•เคฐเคจเฅ‡ เค•เฅ‡ เคฒเคฟเค เคนเคฎเคพเคฐเฅ‡ เคชเคพเคธ เคเค• **เคตเคฟเคธเฅเคคเฅƒเคค เคฎเคพเคฐเฅเค—เคฆเคฐเฅเคถเคฟเค•เคพ เค”เคฐ เคŸเฅ‡เคฎเฅเคชเฅเคฒเฅ‡เคŸ** เคนเฅˆเฅค เค†เคช เค‰เคจเฅเคนเฅ‡เค‚ [`เคŸเฅ‡เคฎเฅเคชเคฒเฅ‡เคŸเฅเคธ`](./templates) เคจเคฟเคฐเฅเคฆเฅ‡เคถเคฟเค•เคพ เคฎเฅ‡เค‚ เคชเคพ เคธเค•เคคเฅ‡ เคนเฅˆเค‚เฅค เคชเฅ€เค†เคฐ เคถเฅเคฐเฅ‚ เค•เคฐเคจเฅ‡ เคธเฅ‡ เคชเคนเคฒเฅ‡ [เคฏเฅ‹เค—เคฆเคพเคจ เคฆเคฟเคถเคพเคจเคฟเคฐเฅเคฆเฅ‡เคถ](./CONTRIBUTING.md) เคฆเฅ‡เค–เคจเคพ เค”เคฐ เค…เคจเฅเคฐเค•เฅเคทเค•เฅ‹เค‚ เคธเฅ‡ เคธเค‚เคชเคฐเฅเค• เค•เคฐเคจเคพ เคฏเคพ เคชเฅเคฐเคคเคฟเค•เฅเคฐเคฟเคฏเคพ เคชเฅเคฐเคพเคชเฅเคค เค•เคฐเคจเฅ‡ เค•เฅ‡ เคฒเคฟเค เคเค• เคจเคฏเคพ เคฎเฅเคฆเฅเคฆเคพ เค–เฅ‹เคฒเคจเคพ เคฏเคพเคฆ เคฐเค–เฅ‡เค‚เฅค - -เคฏเคน เคœเคพเค‚เคšเคจเฅ‡ เค•เฅ‡ เคฒเคฟเค เค•เคฟ เค•เฅเคฏเคพ เค•เคฟเคธเฅ€ เคฎเฅ‰เคกเคฒ เคฎเฅ‡เค‚ เคชเคนเคฒเฅ‡ เคธเฅ‡ เคนเฅ€ Flax, PyTorch เคฏเคพ TensorFlow เค•เคพ เค•เคพเคฐเฅเคฏเคพเคจเฅเคตเคฏเคจ เคนเฅˆ, เคฏเคพ เคฏเคฆเคฟ เค‰เคธเค•เฅ‡ เคชเคพเคธ Tokenizers เคฒเคพเค‡เคฌเฅเคฐเฅ‡เคฐเฅ€ เคฎเฅ‡เค‚ เคธเค‚เคฌเค‚เคงเคฟเคค เคŸเฅ‹เค•เคจ เคนเฅˆ, เคคเฅ‹ [เคฏเคน เคคเคพเคฒเคฟเค•เคพ](https://huggingface.co/docs/transformers/index#supported) เคฆเฅ‡เค–เฅ‡เค‚เฅค -เคซเฅเคฐเฅ‡เคฎเคตเคฐเฅเค•)เฅค - -เค‡เคจ เค•เคพเคฐเฅเคฏเคพเคจเฅเคตเคฏเคจเฅ‹เค‚ เค•เคพ เคชเคฐเฅ€เค•เฅเคทเคฃ เค•เคˆ เคกเฅ‡เคŸเคพเคธเฅ‡เคŸ เคชเคฐ เค•เคฟเคฏเคพ เค—เคฏเคพ เคนเฅˆ (เคฆเฅ‡เค–เฅ‡เค‚ เค•เฅ‡เคธ เคธเฅเค•เฅเคฐเคฟเคชเฅเคŸ เค•เคพ เค‰เคชเคฏเฅ‹เค— เค•เคฐเฅ‡เค‚) เค”เคฐ เคตเฅˆเคจเคฟเคฒเคพ เค•เคพเคฐเฅเคฏเคพเคจเฅเคตเคฏเคจ เค•เฅ‡ เคฒเคฟเค เคคเฅเคฒเคจเคพเคคเฅเคฎเค• เคฐเฅ‚เคช เคธเฅ‡ เคชเฅเคฐเคฆเคฐเฅเคถเคจ เค•เคฐเคจเคพ เคšเคพเคนเคฟเคเฅค เค†เคช เค‰เคชเคฏเฅ‹เค— เค•เฅ‡ เคฎเคพเคฎเคฒเฅ‡ เค•เฅ‡ เคฆเคธเฅเคคเคพเคตเฅ‡เคœเคผ [เค‡เคธ เค…เคจเฅเคญเคพเค—](https://huggingface.co/docs/transformers/examples) เคฎเฅ‡เค‚ เคตเฅเคฏเคตเคนเคพเคฐ เค•เคพ เคตเคฟเคตเคฐเคฃ เคชเคขเคผ เคธเค•เคคเฅ‡ เคนเฅˆเค‚เฅค - - -## เค…เคงเคฟเค• เคธเคฎเคเฅ‡เค‚ - -|เค…เคงเฅเคฏเคพเคฏ | เคตเคฟเคตเคฐเคฃ | -|-|-| -| [เคฆเคธเฅเคคเคพเคตเฅ‡เคœเคผเฅ€เค•เคฐเคฃ](https://huggingface.co/transformers/) | เคชเฅ‚เคฐเคพ เคเคชเฅ€เค†เคˆ เคฆเคธเฅเคคเคพเคตเฅ‡เคœเคผเฅ€เค•เคฐเคฃ เค”เคฐ เคŸเฅเคฏเฅ‚เคŸเฅ‹เคฐเคฟเคฏเคฒ | -| [เค•เคพเคฐเฅเคฏ เคธเคพเคฐเคพเค‚เคถ](https://huggingface.co/docs/transformers/task_summary) | เคŸเฅเคฐเคพเค‚เคธเคซเฅ‰เคฐเฅเคฎเคฐ เคธเคฎเคฐเฅเคฅเคฟเคค เค•เคพเคฐเฅเคฏ | -| [เคชเฅเคฐเฅ€เคชเฅเคฐเฅ‹เคธเฅ‡เคธเคฟเค‚เค— เคŸเฅเคฏเฅ‚เคŸเฅ‹เคฐเคฟเคฏเคฒ](https://huggingface.co/docs/transformers/preprocessing) | เคฎเฅ‰เคกเคฒ เค•เฅ‡ เคฒเคฟเค เคกเฅ‡เคŸเคพ เคคเฅˆเคฏเคพเคฐ เค•เคฐเคจเฅ‡ เค•เฅ‡ เคฒเคฟเค `เคŸเฅ‹เค•เคจเคพเค‡เคœเคผเคฐ` เค•เคพ เค‰เคชเคฏเฅ‹เค— เค•เคฐเคจเคพ | -| [เคชเฅเคฐเคถเคฟเค•เฅเคทเคฃ เค”เคฐ เคซเคพเค‡เคจ-เคŸเฅเคฏเฅ‚เคจเคฟเค‚เค—](https://huggingface.co/docs/transformers/training) | PyTorch/TensorFlow เค•เฅ‡ เคŸเฅเคฐเฅ‡เคจเคฟเค‚เค— เคฒเฅ‚เคช เคฏเคพ `เคŸเฅเคฐเฅ‡เคจเคฐ` API เคฎเฅ‡เค‚ เคŸเฅเคฐเคพเค‚เคธเคซเฅ‰เคฐเฅเคฎเคฐ เคฆเฅเคตเคพเคฐเคพ เคฆเคฟเค เค—เค เคฎเฅ‰เคกเคฒ เค•เคพ เค‰เคชเคฏเฅ‹เค— เค•เคฐเฅ‡เค‚ | -| [เค•เฅเคตเคฟเค• เคธเฅเคŸเคพเคฐเฅเคŸ: เคŸเฅเคตเฅ€เค•เคฟเค‚เค— เคเค‚เคก เคฏเฅ‚เคœเคผ เค•เฅ‡เคธ เคธเฅเค•เฅเคฐเคฟเคชเฅเคŸเฅเคธ](https://github.com/huggingface/transformers/tree/main/examples) | เคตเคฟเคญเคฟเคจเฅเคจ เค•เคพเคฐเฅเคฏเฅ‹เค‚ เค•เฅ‡ เคฒเคฟเค เค•เฅ‡เคธ เคธเฅเค•เฅเคฐเคฟเคชเฅเคŸ เค•เคพ เค‰เคชเคฏเฅ‹เค— เค•เคฐเฅ‡เค‚ | -| [เคฎเฅ‰เคกเคฒ เคธเคพเคเคพ เค•เคฐเคจเคพ เค”เคฐ เค…เคชเคฒเฅ‹เคก เค•เคฐเคจเคพ](https://huggingface.co/docs/transformers/model_sharing) | 
เคธเคฎเฅเคฆเคพเคฏ เค•เฅ‡ เคธเคพเคฅ เค…เคชเคจเฅ‡ เคซเคพเค‡เคจ เคŸเฅ‚เคจเคก เคฎเฅ‰เคกเคฒ เค…เคชเคฒเฅ‹เคก เค”เคฐ เคธเคพเคเคพ เค•เคฐเฅ‡เค‚ | -| [เคฎเคพเค‡เค—เฅเคฐเฅ‡เคถเคจ](https://huggingface.co/docs/transformers/migration) | `เคชเคพเค‡เคŸเฅ‹เคฐเคš-เคŸเฅเคฐเคพเค‚เคธเคซเฅ‰เคฐเฅเคฎเคฐเฅเคธ` เคฏเคพ `เคชเคพเค‡เคŸเฅ‹เคฐเคš-เคชเฅเคฐเฅ€เคŸเฅเคฐเฅ‡เคจเคก-เคฌเคฐเฅเคŸ` เคธเฅ‡ เคŸเฅเคฐเคพเค‚เคธเคซเฅ‰เคฐเฅเคฎเคฐ เคฎเฅ‡เค‚ เคฎเคพเค‡เค—เฅเคฐเฅ‡เคŸ เค•เคฐเคจเคพ | - -## เค‰เคฆเฅเคงเคฐเคฃ - -เคนเคฎเคจเฅ‡ เค†เคงเคฟเค•เคพเคฐเคฟเค• เคคเฅŒเคฐ เคชเคฐ เค‡เคธ เคฒเคพเค‡เคฌเฅเคฐเฅ‡เคฐเฅ€ เค•เคพ [เคชเฅ‡เคชเคฐ](https://www.aclweb.org/anthology/2020.emnlp-demos.6/) เคชเฅเคฐเค•เคพเคถเคฟเคค เค•เคฟเคฏเคพ เคนเฅˆ, เค…เค—เคฐ เค†เคช เคŸเฅเคฐเคพเคจเฅเคธเคซเคผเฅ‰เคฐเฅเคฎเคฐเฅเคธ เคฒเคพเค‡เคฌเฅเคฐเฅ‡เคฐเฅ€ เค•เคพ เค‰เคชเคฏเฅ‹เค— เค•เคฐเคคเฅ‡ เคนเฅˆเค‚, เคคเฅ‹ เค•เฅƒเคชเคฏเคพ เค‰เคฆเฅเคงเฅƒเคค เค•เคฐเฅ‡เค‚: -```bibtex -@inproceedings{wolf-etal-2020-transformers, - title = "Transformers: State-of-the-Art Natural Language Processing", - author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rรฉmi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush", - booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations", - month = oct, - year = "2020", - address = "Online", - publisher = "Association for Computational Linguistics", - url = "https://www.aclweb.org/anthology/2020.emnlp-demos.6", - pages = "38--45" -} -``` diff --git a/README_ja.md b/README_ja.md deleted file mode 100644 index b87767cf37156a..00000000000000 --- a/README_ja.md +++ /dev/null @@ -1,579 +0,0 @@ - - - - -

-
- -
-

-

- - Build - - - GitHub - - - Documentation - - - GitHub release - - - Contributor Covenant - - DOI -

- -

-

- English | - ็ฎ€ไฝ“ไธญๆ–‡ | - ็น้ซ”ไธญๆ–‡ | - ํ•œ๊ตญ์–ด | - Espaรฑol | - ๆ—ฅๆœฌ่ชž | - เคนเคฟเคจเฅเคฆเฅ€ - เฐคเฑ†เฐฒเฑเฐ—เฑ | -

-

- -

-

JAXใ€PyTorchใ€TensorFlowใฎใŸใ‚ใฎๆœ€ๅ…ˆ็ซฏๆฉŸๆขฐๅญฆ็ฟ’

-

- -

- -

- -๐Ÿค—Transformersใฏใ€ใƒ†ใ‚ญใ‚นใƒˆใ€่ฆ–่ฆšใ€้Ÿณๅฃฐใชใฉใฎ็•ฐใชใ‚‹ใƒขใƒ€ใƒชใƒ†ใ‚ฃใซๅฏพใ—ใฆใ‚ฟใ‚นใ‚ฏใ‚’ๅฎŸ่กŒใ™ใ‚‹ใŸใ‚ใซใ€ไบ‹ๅ‰ใซๅญฆ็ฟ’ใ•ใ›ใŸๆ•ฐๅƒใฎใƒขใƒ‡ใƒซใ‚’ๆไพ›ใ—ใพใ™ใ€‚ - -ใ“ใ‚Œใ‚‰ใฎใƒขใƒ‡ใƒซใฏๆฌกใฎใ‚ˆใ†ใชๅ ดๅˆใซ้ฉ็”จใงใใพใ™: - -* ๐Ÿ“ ใƒ†ใ‚ญใ‚นใƒˆใฏใ€ใƒ†ใ‚ญใ‚นใƒˆใฎๅˆ†้กžใ€ๆƒ…ๅ ฑๆŠฝๅ‡บใ€่ณชๅ•ๅฟœ็ญ”ใ€่ฆ็ด„ใ€็ฟป่จณใ€ใƒ†ใ‚ญใ‚นใƒˆ็”Ÿๆˆใชใฉใฎใ‚ฟใ‚นใ‚ฏใฎใŸใ‚ใซใ€100ไปฅไธŠใฎ่จ€่ชžใซๅฏพๅฟœใ—ใฆใ„ใพใ™ใ€‚ -* ๐Ÿ–ผ๏ธ ็”ปๅƒๅˆ†้กžใ€็‰ฉไฝ“ๆคœๅ‡บใ€ใ‚ปใ‚ฐใƒกใƒณใƒ†ใƒผใ‚ทใƒงใƒณใชใฉใฎใ‚ฟใ‚นใ‚ฏใฎใŸใ‚ใฎ็”ปๅƒใ€‚ -* ๐Ÿ—ฃ๏ธ ้Ÿณๅฃฐใฏใ€้Ÿณๅฃฐ่ช่ญ˜ใ‚„้Ÿณๅฃฐๅˆ†้กžใชใฉใฎใ‚ฟใ‚นใ‚ฏใซไฝฟ็”จใ—ใพใ™ใ€‚ - -ใƒˆใƒฉใƒณใ‚นใƒ•ใ‚ฉใƒผใƒžใƒผใƒขใƒ‡ใƒซใฏใ€ใƒ†ใƒผใƒ–ใƒซ่ณชๅ•ๅฟœ็ญ”ใ€ๅ…‰ๅญฆๆ–‡ๅญ—่ช่ญ˜ใ€ใ‚นใ‚ญใƒฃใƒณๆ–‡ๆ›ธใ‹ใ‚‰ใฎๆƒ…ๅ ฑๆŠฝๅ‡บใ€ใƒ“ใƒ‡ใ‚ชๅˆ†้กžใ€่ฆ–่ฆš็š„่ณชๅ•ๅฟœ็ญ”ใชใฉใ€**่ค‡ๆ•ฐใฎใƒขใƒ€ใƒชใƒ†ใ‚ฃใ‚’็ต„ใฟๅˆใ‚ใ›ใŸ**ใ‚ฟใ‚นใ‚ฏใ‚‚ๅฎŸ่กŒๅฏ่ƒฝใงใ™ใ€‚ - -๐Ÿค—Transformersใฏใ€ไธŽใˆใ‚‰ใ‚ŒใŸใƒ†ใ‚ญใ‚นใƒˆใซๅฏพใ—ใฆใใ‚Œใ‚‰ใฎไบ‹ๅ‰ๅญฆ็ฟ’ใ•ใ‚ŒใŸใƒขใƒ‡ใƒซใ‚’็ด ๆ—ฉใใƒ€ใ‚ฆใƒณใƒญใƒผใƒ‰ใ—ใฆไฝฟ็”จใ—ใ€ใ‚ใชใŸ่‡ช่บซใฎใƒ‡ใƒผใ‚ฟใ‚ปใƒƒใƒˆใงใใ‚Œใ‚‰ใ‚’ๅพฎ่ชฟๆ•ดใ—ใ€็งใŸใกใฎ[model hub](https://huggingface.co/models)ใงใ‚ณใƒŸใƒฅใƒ‹ใƒ†ใ‚ฃใจๅ…ฑๆœ‰ใ™ใ‚‹ใŸใ‚ใฎAPIใ‚’ๆไพ›ใ—ใพใ™ใ€‚ๅŒๆ™‚ใซใ€ใ‚ขใƒผใ‚ญใƒ†ใ‚ฏใƒใƒฃใ‚’ๅฎš็พฉใ™ใ‚‹ๅ„Pythonใƒขใ‚ธใƒฅใƒผใƒซใฏๅฎŒๅ…จใซใ‚นใ‚ฟใƒณใƒ‰ใ‚ขใƒญใƒณใงใ‚ใ‚Šใ€่ฟ…้€Ÿใช็ ”็ฉถๅฎŸ้จ“ใ‚’ๅฏ่ƒฝใซใ™ใ‚‹ใŸใ‚ใซๅค‰ๆ›ดใ™ใ‚‹ใ“ใจใŒใงใใพใ™ใ€‚ - -๐Ÿค—Transformersใฏ[Jax](https://jax.readthedocs.io/en/latest/)ใ€[PyTorch](https://pytorch.org/)ใ€[TensorFlow](https://www.tensorflow.org/)ใจใ„ใ†3ๅคงใƒ‡ใ‚ฃใƒผใƒ—ใƒฉใƒผใƒ‹ใƒณใ‚ฐใƒฉใ‚คใƒ–ใƒฉใƒชใƒผใซๆ”ฏใˆใ‚‰ใ‚Œใ€ใใ‚Œใžใ‚Œใฎใƒฉใ‚คใƒ–ใƒฉใƒชใ‚’ใ‚ทใƒผใƒ ใƒฌใ‚นใซ็ตฑๅˆใ—ใฆใ„ใพใ™ใ€‚็‰‡ๆ–นใงใƒขใƒ‡ใƒซใ‚’ๅญฆ็ฟ’ใ—ใฆใ‹ใ‚‰ใ€ใ‚‚ใ†็‰‡ๆ–นใงๆŽจ่ซ–็”จใซใƒญใƒผใƒ‰ใ™ใ‚‹ใฎใฏ็ฐกๅ˜ใชใ“ใจใงใ™ใ€‚ - -## ใ‚ชใƒณใƒฉใ‚คใƒณใƒ‡ใƒข - -[model hub](https://huggingface.co/models)ใ‹ใ‚‰ใ€ใปใจใ‚“ใฉใฎใƒขใƒ‡ใƒซใฎใƒšใƒผใ‚ธใง็›ดๆŽฅใƒ†ใ‚นใƒˆใ™ใ‚‹ใ“ใจใŒใงใใพใ™ใ€‚ใพใŸใ€ใƒ‘ใƒ–ใƒชใƒƒใ‚ฏใƒขใƒ‡ใƒซใ€ใƒ—ใƒฉใ‚คใƒ™ใƒผใƒˆใƒขใƒ‡ใƒซใซๅฏพใ—ใฆใ€[ใƒ—ใƒฉใ‚คใƒ™ใƒผใƒˆใƒขใƒ‡ใƒซใฎใƒ›ใ‚นใƒ†ใ‚ฃใƒณใ‚ฐใ€ใƒใƒผใ‚ธใƒงใƒ‹ใƒณใ‚ฐใ€ๆŽจ่ซ–API](https://huggingface.co/pricing)ใ‚’ๆไพ›ใ—ใฆใ„ใพใ™ใ€‚ - -ไปฅไธ‹ใฏใใฎไธ€ไพ‹ใงใ™: - - ่‡ช็„ถ่จ€่ชžๅ‡ฆ็†ใซใฆ: -- [BERTใซใ‚ˆใ‚‹ใƒžใ‚นใ‚ฏใƒ‰ใƒฏใƒผใƒ‰่ฃœๅฎŒ](https://huggingface.co/bert-base-uncased?text=Paris+is+the+%5BMASK%5D+of+France) -- [Electraใซใ‚ˆใ‚‹ๅๅ‰ๅฎŸไฝ“่ช่ญ˜](https://huggingface.co/dbmdz/electra-large-discriminator-finetuned-conll03-english?text=My+name+is+Sarah+and+I+live+in+London+city) -- [GPT-2ใซใ‚ˆใ‚‹ใƒ†ใ‚ญใ‚นใƒˆ็”Ÿๆˆ](https://huggingface.co/gpt2?text=A+long+time+ago%2C+) -- [RoBERTaใซใ‚ˆใ‚‹่‡ช็„ถ่จ€่ชžๆŽจ่ซ–](https://huggingface.co/roberta-large-mnli?text=The+dog+was+lost.+Nobody+lost+any+animal) -- 
[BARTใซใ‚ˆใ‚‹่ฆ็ด„](https://huggingface.co/facebook/bart-large-cnn?text=The+tower+is+324+metres+%281%2C063+ft%29+tall%2C+about+the+same+height+as+an+81-storey+building%2C+and+the+tallest+structure+in+Paris.+Its+base+is+square%2C+measuring+125+metres+%28410+ft%29+on+each+side.+During+its+construction%2C+the+Eiffel+Tower+surpassed+the+Washington+Monument+to+become+the+tallest+man-made+structure+in+the+world%2C+a+title+it+held+for+41+years+until+the+Chrysler+Building+in+New+York+City+was+finished+in+1930.+It+was+the+first+structure+to+reach+a+height+of+300+metres.+Due+to+the+addition+of+a+broadcasting+aerial+at+the+top+of+the+tower+in+1957%2C+it+is+now+taller+than+the+Chrysler+Building+by+5.2+metres+%2817+ft%29.+Excluding+transmitters%2C+the+Eiffel+Tower+is+the+second+tallest+free-standing+structure+in+France+after+the+Millau+Viaduct) -- [DistilBERTใซใ‚ˆใ‚‹่ณชๅ•ๅฟœ็ญ”](https://huggingface.co/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species) -- [T5ใซใ‚ˆใ‚‹็ฟป่จณ](https://huggingface.co/t5-base?text=My+name+is+Wolfgang+and+I+live+in+Berlin) - -ใ‚ณใƒณใƒ”ใƒฅใƒผใ‚ฟใƒ“ใ‚ธใƒงใƒณใซใฆ: -- [ViTใซใ‚ˆใ‚‹็”ปๅƒๅˆ†้กž](https://huggingface.co/google/vit-base-patch16-224) -- [DETRใซใ‚ˆใ‚‹็‰ฉไฝ“ๆคœๅ‡บ](https://huggingface.co/facebook/detr-resnet-50) -- [SegFormerใซใ‚ˆใ‚‹ใ‚ปใƒžใƒณใƒ†ใ‚ฃใƒƒใ‚ฏใ‚ปใ‚ฐใƒกใƒณใƒ†ใƒผใ‚ทใƒงใƒณ](https://huggingface.co/nvidia/segformer-b0-finetuned-ade-512-512) -- [DETRใซใ‚ˆใ‚‹ใƒ‘ใƒŽใƒ—ใƒ†ใ‚ฃใƒƒใ‚ฏใ‚ปใ‚ฐใƒกใƒณใƒ†ใƒผใ‚ทใƒงใƒณ](https://huggingface.co/facebook/detr-resnet-50-panoptic) - -ใ‚ชใƒผใƒ‡ใ‚ฃใ‚ชใซใฆ: -- [Wav2Vec2ใซใ‚ˆใ‚‹่‡ชๅ‹•้Ÿณๅฃฐ่ช่ญ˜](https://huggingface.co/facebook/wav2vec2-base-960h) -- [Wav2Vec2ใซใ‚ˆใ‚‹ใ‚ญใƒผใƒฏใƒผใƒ‰ๆคœ็ดข](https://huggingface.co/superb/wav2vec2-base-superb-ks) - -ใƒžใƒซใƒใƒขใƒผใƒ€ใƒซใชใ‚ฟใ‚นใ‚ฏใซใฆ: -- [ViLTใซใ‚ˆใ‚‹่ฆ–่ฆš็š„่ณชๅ•ๅฟœ็ญ”](https://huggingface.co/dandelin/vilt-b32-finetuned-vqa) - -Hugging Faceใƒใƒผใƒ ใซใ‚ˆใฃใฆไฝœใ‚‰ใ‚ŒใŸ **[ใƒˆใƒฉใƒณใ‚นใƒ•ใ‚ฉใƒผใƒžใƒผใ‚’ไฝฟใฃใŸๆ›ธใ่พผใฟ](https://transformer.huggingface.co)** ใฏใ€ใ“ใฎใƒชใƒใ‚ธใƒˆใƒชใฎใƒ†ใ‚ญใ‚นใƒˆ็”ŸๆˆๆฉŸ่ƒฝใฎๅ…ฌๅผใƒ‡ใƒขใงใ‚ใ‚‹ใ€‚ - -## Hugging Faceใƒใƒผใƒ ใซใ‚ˆใ‚‹ใ‚ซใ‚นใ‚ฟใƒ ใƒปใ‚ตใƒใƒผใƒˆใ‚’ใ”ๅธŒๆœ›ใฎๅ ดๅˆ - - - HuggingFace Expert Acceleration Program -
- -## ใ‚ฏใ‚คใƒƒใ‚ฏใƒ„ใ‚ขใƒผ - -ไธŽใˆใ‚‰ใ‚ŒใŸๅ…ฅๅŠ›๏ผˆใƒ†ใ‚ญใ‚นใƒˆใ€็”ปๅƒใ€้Ÿณๅฃฐใ€...๏ผ‰ใซๅฏพใ—ใฆใ™ใใซใƒขใƒ‡ใƒซใ‚’ไฝฟใ†ใŸใ‚ใซใ€ๆˆ‘ใ€…ใฏ`pipeline`ใจใ„ใ†APIใ‚’ๆไพ›ใ—ใฆใŠใ‚Šใพใ™ใ€‚pipelineใฏใ€ๅญฆ็ฟ’ๆธˆใฟใฎใƒขใƒ‡ใƒซใจใ€ใใฎใƒขใƒ‡ใƒซใฎๅญฆ็ฟ’ๆ™‚ใซไฝฟ็”จใ•ใ‚ŒใŸๅ‰ๅ‡ฆ็†ใ‚’ใ‚ฐใƒซใƒผใƒ—ๅŒ–ใ—ใŸใ‚‚ใฎใงใ™ใ€‚ไปฅไธ‹ใฏใ€่‚ฏๅฎš็š„ใชใƒ†ใ‚ญใ‚นใƒˆใจๅฆๅฎš็š„ใชใƒ†ใ‚ญใ‚นใƒˆใ‚’ๅˆ†้กžใ™ใ‚‹ใŸใ‚ใซpipelineใ‚’ไฝฟ็”จใ™ใ‚‹ๆ–นๆณ•ใงใ™: - -```python ->>> from transformers import pipeline - -# Allocate a pipeline for sentiment-analysis ->>> classifier = pipeline('sentiment-analysis') ->>> classifier('We are very happy to introduce pipeline to the transformers repository.') -[{'label': 'POSITIVE', 'score': 0.9996980428695679}] -``` - -2่กŒ็›ฎใฎใ‚ณใƒผใƒ‰ใงใฏใ€pipelineใงไฝฟ็”จใ•ใ‚Œใ‚‹ไบ‹ๅ‰ๅญฆ็ฟ’ๆธˆใฟใƒขใƒ‡ใƒซใ‚’ใƒ€ใ‚ฆใƒณใƒญใƒผใƒ‰ใ—ใฆใ‚ญใƒฃใƒƒใ‚ทใƒฅใ—ใ€3่กŒ็›ฎใงใฏไธŽใˆใ‚‰ใ‚ŒใŸใƒ†ใ‚ญใ‚นใƒˆใซๅฏพใ—ใฆใใฎใƒขใƒ‡ใƒซใ‚’่ฉ•ไพกใ—ใพใ™ใ€‚ใ“ใ“ใงใฏใ€็ญ”ใˆใฏ99.97%ใฎไฟก้ ผๅบฆใงใ€Œใƒใ‚ธใƒ†ใ‚ฃใƒ–ใ€ใงใ™ใ€‚ - -่‡ช็„ถ่จ€่ชžๅ‡ฆ็†ใ ใ‘ใงใชใใ€ใ‚ณใƒณใƒ”ใƒฅใƒผใ‚ฟใƒ“ใ‚ธใƒงใƒณใ‚„้Ÿณๅฃฐๅ‡ฆ็†ใซใŠใ„ใฆใ‚‚ใ€ๅคšใใฎใ‚ฟใ‚นใ‚ฏใซใฏใ‚ใ‚‰ใ‹ใ˜ใ‚่จ“็ทดใ•ใ‚ŒใŸ`pipeline`ใŒ็”จๆ„ใ•ใ‚Œใฆใ„ใ‚‹ใ€‚ไพ‹ใˆใฐใ€็”ปๅƒใ‹ใ‚‰ๆคœๅ‡บใ•ใ‚ŒใŸ็‰ฉไฝ“ใ‚’็ฐกๅ˜ใซๆŠฝๅ‡บใ™ใ‚‹ใ“ใจใŒใงใใ‚‹: - -``` python ->>> import requests ->>> from PIL import Image ->>> from transformers import pipeline - -# Download an image with cute cats ->>> url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/coco_sample.png" ->>> image_data = requests.get(url, stream=True).raw ->>> image = Image.open(image_data) - -# Allocate a pipeline for object detection ->>> object_detector = pipeline('object-detection') ->>> object_detector(image) -[{'score': 0.9982201457023621, - 'label': 'remote', - 'box': {'xmin': 40, 'ymin': 70, 'xmax': 175, 'ymax': 117}}, - {'score': 0.9960021376609802, - 'label': 'remote', - 'box': {'xmin': 333, 'ymin': 72, 'xmax': 368, 'ymax': 187}}, - {'score': 0.9954745173454285, - 'label': 'couch', - 'box': {'xmin': 0, 'ymin': 1, 'xmax': 639, 'ymax': 473}}, - {'score': 0.9988006353378296, - 'label': 'cat', - 'box': {'xmin': 13, 'ymin': 52, 'xmax': 314, 'ymax': 470}}, - {'score': 0.9986783862113953, - 'label': 'cat', - 'box': {'xmin': 345, 'ymin': 23, 'xmax': 640, 'ymax': 368}}] -``` - -ใ“ใ“ใงใฏใ€็”ปๅƒใ‹ใ‚‰ๆคœๅ‡บใ•ใ‚ŒใŸใ‚ชใƒ–ใ‚ธใ‚งใ‚ฏใƒˆใฎใƒชใ‚นใƒˆใŒๅพ—ใ‚‰ใ‚Œใ€ใ‚ชใƒ–ใ‚ธใ‚งใ‚ฏใƒˆใ‚’ๅ›ฒใ‚€ใƒœใƒƒใ‚ฏใ‚นใจไฟก้ ผๅบฆใ‚นใ‚ณใ‚ขใŒ่กจ็คบใ•ใ‚Œใพใ™ใ€‚ๅทฆๅดใŒๅ…ƒ็”ปๅƒใ€ๅณๅดใŒไบˆๆธฌ็ตๆžœใ‚’่กจ็คบใ—ใŸใ‚‚ใฎใงใ™: - -

- - -

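To make the quick-tour behaviour above concrete, here is a minimal sketch of pinning `pipeline` to an explicit Hub checkpoint instead of relying on the task default; the checkpoint id below is only an illustration, and any compatible model id from the Hub can be substituted:

```python
>>> from transformers import pipeline

# Pin the pipeline to an explicit checkpoint rather than the task default
# (the model id here is illustrative; any compatible Hub checkpoint works)
>>> classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
>>> classifier("We are very happy to introduce pipeline to the transformers repository.")
```

Passing `model=` explicitly also keeps results reproducible if a task's default checkpoint changes in a later release.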
- -[ใ“ใฎใƒใƒฅใƒผใƒˆใƒชใ‚ขใƒซ](https://huggingface.co/docs/transformers/task_summary)ใงใฏใ€`pipeline`APIใงใ‚ตใƒใƒผใƒˆใ•ใ‚Œใฆใ„ใ‚‹ใ‚ฟใ‚นใ‚ฏใซใคใ„ใฆ่ฉณใ—ใ่ชฌๆ˜Žใ—ใฆใ„ใพใ™ใ€‚ - -`pipeline`ใซๅŠ ใˆใฆใ€ไธŽใˆใ‚‰ใ‚ŒใŸใ‚ฟใ‚นใ‚ฏใซๅญฆ็ฟ’ๆธˆใฟใฎใƒขใƒ‡ใƒซใ‚’ใƒ€ใ‚ฆใƒณใƒญใƒผใƒ‰ใ—ใฆไฝฟ็”จใ™ใ‚‹ใŸใ‚ใซๅฟ…่ฆใชใฎใฏใ€3่กŒใฎใ‚ณใƒผใƒ‰ใ ใ‘ใงใ™ใ€‚ไปฅไธ‹ใฏPyTorchใฎใƒใƒผใ‚ธใƒงใƒณใงใ™: -```python ->>> from transformers import AutoTokenizer, AutoModel - ->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased") ->>> model = AutoModel.from_pretrained("bert-base-uncased") - ->>> inputs = tokenizer("Hello world!", return_tensors="pt") ->>> outputs = model(**inputs) -``` - -ใใ—ใฆใ“ใกใ‚‰ใฏTensorFlowใจๅŒ็ญ‰ใฎใ‚ณใƒผใƒ‰ใจใชใ‚Šใพใ™: -```python ->>> from transformers import AutoTokenizer, TFAutoModel - ->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased") ->>> model = TFAutoModel.from_pretrained("bert-base-uncased") - ->>> inputs = tokenizer("Hello world!", return_tensors="tf") ->>> outputs = model(**inputs) -``` - -ใƒˆใƒผใ‚ฏใƒŠใ‚คใ‚ถใฏๅญฆ็ฟ’ๆธˆใฟใƒขใƒ‡ใƒซใŒๆœŸๅพ…ใ™ใ‚‹ใ™ในใฆใฎๅ‰ๅ‡ฆ็†ใ‚’ๆ‹…ๅฝ“ใ—ใ€ๅ˜ไธ€ใฎๆ–‡ๅญ—ๅˆ— (ไธŠ่จ˜ใฎไพ‹ใฎใ‚ˆใ†ใซ) ใพใŸใฏใƒชใ‚นใƒˆใซๅฏพใ—ใฆ็›ดๆŽฅๅ‘ผใณๅ‡บใ™ใ“ใจใŒใงใใพใ™ใ€‚ใ“ใ‚Œใฏไธ‹ๆตใฎใ‚ณใƒผใƒ‰ใงไฝฟ็”จใงใใ‚‹่พžๆ›ธใ‚’ๅ‡บๅŠ›ใ—ใพใ™ใ€‚ใพใŸใ€ๅ˜็ด”ใซ ** ๅผ•ๆ•ฐๅฑ•้–‹ๆผ”็ฎ—ๅญใ‚’ไฝฟ็”จใ—ใฆใƒขใƒ‡ใƒซใซ็›ดๆŽฅๆธกใ™ใ“ใจใ‚‚ใงใใพใ™ใ€‚ - -ใƒขใƒ‡ใƒซ่‡ชไฝ“ใฏ้€šๅธธใฎ[Pytorch `nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) ใพใŸใฏ [TensorFlow `tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) (ใƒใƒƒใ‚ฏใ‚จใƒณใƒ‰ใซใ‚ˆใฃใฆ็•ฐใชใ‚‹)ใงใ€้€šๅธธ้€šใ‚Šไฝฟ็”จใ™ใ‚‹ใ“ใจใŒๅฏ่ƒฝใงใ™ใ€‚[ใ“ใฎใƒใƒฅใƒผใƒˆใƒชใ‚ขใƒซ](https://huggingface.co/docs/transformers/training)ใงใฏใ€ใ“ใฎใ‚ˆใ†ใชใƒขใƒ‡ใƒซใ‚’ๅพ“ๆฅใฎPyTorchใ‚„TensorFlowใฎๅญฆ็ฟ’ใƒซใƒผใƒ—ใซ็ตฑๅˆใ™ใ‚‹ๆ–นๆณ•ใ‚„ใ€็งใŸใกใฎ`Trainer`APIใ‚’ไฝฟใฃใฆๆ–ฐใ—ใ„ใƒ‡ใƒผใ‚ฟใ‚ปใƒƒใƒˆใง็ด ๆ—ฉใๅพฎ่ชฟๆ•ดใ‚’่กŒใ†ๆ–นๆณ•ใซใคใ„ใฆ่ชฌๆ˜Žใ—ใพใ™ใ€‚ - -## ใชใœtransformersใ‚’ไฝฟใ†ๅฟ…่ฆใŒใ‚ใ‚‹ใฎใงใ—ใ‚‡ใ†ใ‹๏ผŸ - -1. ไฝฟใ„ใ‚„ใ™ใ„ๆœ€ๆ–ฐใƒขใƒ‡ใƒซ: - - ่‡ช็„ถ่จ€่ชž็†่งฃใƒป็”Ÿๆˆใ€ใ‚ณใƒณใƒ”ใƒฅใƒผใ‚ฟใƒ“ใ‚ธใƒงใƒณใ€ใ‚ชใƒผใƒ‡ใ‚ฃใ‚ชใฎๅ„ใ‚ฟใ‚นใ‚ฏใง้ซ˜ใ„ใƒ‘ใƒ•ใ‚ฉใƒผใƒžใƒณใ‚นใ‚’็™บๆฎใ—ใพใ™ใ€‚ - - ๆ•™่‚ฒ่€…ใ€ๅฎŸๅ‹™่€…ใซใจใฃใฆใฎไฝŽใ„ๅ‚ๅ…ฅ้šœๅฃใ€‚ - - ๅญฆ็ฟ’ใ™ใ‚‹ใ‚ฏใƒฉใ‚นใฏ3ใคใ ใ‘ใงใ€ใƒฆใƒผใ‚ถใŒ็›ด้ขใ™ใ‚‹ๆŠฝ่ฑกๅŒ–ใฏใปใจใ‚“ใฉใ‚ใ‚Šใพใ›ใ‚“ใ€‚ - - ๅญฆ็ฟ’ๆธˆใฟใƒขใƒ‡ใƒซใ‚’ๅˆฉ็”จใ™ใ‚‹ใŸใ‚ใฎ็ตฑไธ€ใ•ใ‚ŒใŸAPIใ€‚ - -1. ไฝŽใ„่จˆ็ฎ—ใ‚ณใ‚นใƒˆใ€ๅฐ‘ใชใ„ใ‚ซใƒผใƒœใƒณใƒ•ใƒƒใƒˆใƒ—ใƒชใƒณใƒˆ: - - ็ ”็ฉถ่€…ใฏใ€ๅธธใซๅ†ใƒˆใƒฌใƒผใƒ‹ใƒณใ‚ฐใ‚’่กŒใ†ใฎใงใฏใชใใ€ใƒˆใƒฌใƒผใƒ‹ใƒณใ‚ฐใ•ใ‚ŒใŸใƒขใƒ‡ใƒซใ‚’ๅ…ฑๆœ‰ใ™ใ‚‹ใ“ใจใŒใงใใพใ™ใ€‚ - - ๅฎŸๅ‹™ๅฎถใฏใ€่จˆ็ฎ—ๆ™‚้–“ใ‚„็”Ÿ็”ฃใ‚ณใ‚นใƒˆใ‚’ๅ‰Šๆธ›ใ™ใ‚‹ใ“ใจใŒใงใใพใ™ใ€‚ - - ใ™ในใฆใฎใƒขใƒ€ใƒชใƒ†ใ‚ฃใซใŠใ„ใฆใ€60,000ไปฅไธŠใฎไบ‹ๅ‰ๅญฆ็ฟ’ๆธˆใฟใƒขใƒ‡ใƒซใ‚’ๆŒใคๆ•ฐๅคšใใฎใ‚ขใƒผใ‚ญใƒ†ใ‚ฏใƒใƒฃใ‚’ๆไพ›ใ—ใพใ™ใ€‚ - -1. ใƒขใƒ‡ใƒซใฎใƒฉใ‚คใƒ•ใ‚ฟใ‚คใƒ ใฎใ‚ใ‚‰ใ‚†ใ‚‹้ƒจๅˆ†ใง้ฉๅˆ‡ใชใƒ•ใƒฌใƒผใƒ ใƒฏใƒผใ‚ฏใ‚’้ธๆŠžๅฏ่ƒฝ: - - 3่กŒใฎใ‚ณใƒผใƒ‰ใงๆœ€ๅ…ˆ็ซฏใฎใƒขใƒ‡ใƒซใ‚’ใƒˆใƒฌใƒผใƒ‹ใƒณใ‚ฐใ€‚ - - TF2.0/PyTorch/JAXใƒ•ใƒฌใƒผใƒ ใƒฏใƒผใ‚ฏ้–“ใง1ใคใฎใƒขใƒ‡ใƒซใ‚’่‡ชๅœจใซ็งปๅ‹•ใ•ใ›ใ‚‹ใ€‚ - - ๅญฆ็ฟ’ใ€่ฉ•ไพกใ€็”Ÿ็”ฃใซ้ฉใ—ใŸใƒ•ใƒฌใƒผใƒ ใƒฏใƒผใ‚ฏใ‚’ใ‚ทใƒผใƒ ใƒฌใ‚นใซ้ธๆŠžใงใใพใ™ใ€‚ - -1. 
ใƒขใƒ‡ใƒซใ‚„ใ‚ตใƒณใƒ—ใƒซใ‚’ใƒ‹ใƒผใ‚บใซๅˆใ‚ใ›ใฆ็ฐกๅ˜ใซใ‚ซใ‚นใ‚ฟใƒžใ‚คใ‚บๅฏ่ƒฝ: - - ๅŽŸ่‘—่€…ใŒ็™บ่กจใ—ใŸ็ตๆžœใ‚’ๅ†็พใ™ใ‚‹ใŸใ‚ใซใ€ๅ„ใ‚ขใƒผใ‚ญใƒ†ใ‚ฏใƒใƒฃใฎไพ‹ใ‚’ๆไพ›ใ—ใฆใ„ใพใ™ใ€‚ - - ใƒขใƒ‡ใƒซๅ†…้ƒจใฏๅฏ่ƒฝใช้™ใ‚Šไธ€่ฒซใ—ใฆๅ…ฌ้–‹ใ•ใ‚Œใฆใ„ใพใ™ใ€‚ - - ใƒขใƒ‡ใƒซใƒ•ใ‚กใ‚คใƒซใฏใƒฉใ‚คใƒ–ใƒฉใƒชใจใฏ็‹ฌ็ซ‹ใ—ใฆๅˆฉ็”จใ™ใ‚‹ใ“ใจใŒใงใใ€่ฟ…้€ŸใชๅฎŸ้จ“ใŒๅฏ่ƒฝใงใ™ใ€‚ - -## ใชใœtransformersใ‚’ไฝฟใฃใฆใฏใ„ใ‘ใชใ„ใฎใงใ—ใ‚‡ใ†ใ‹๏ผŸ - -- ใ“ใฎใƒฉใ‚คใƒ–ใƒฉใƒชใฏใ€ใƒ‹ใƒฅใƒผใƒฉใƒซใƒใƒƒใƒˆใฎใŸใ‚ใฎใƒ“ใƒซใƒ‡ใ‚ฃใƒณใ‚ฐใƒ–ใƒญใƒƒใ‚ฏใฎใƒขใ‚ธใƒฅใƒผใƒซๅผใƒ„ใƒผใƒซใƒœใƒƒใ‚ฏใ‚นใงใฏใ‚ใ‚Šใพใ›ใ‚“ใ€‚ใƒขใƒ‡ใƒซใƒ•ใ‚กใ‚คใƒซใฎใ‚ณใƒผใƒ‰ใฏใ€็ ”็ฉถ่€…ใŒ่ฟฝๅŠ ใฎๆŠฝ่ฑกๅŒ–/ใƒ•ใ‚กใ‚คใƒซใซ้ฃ›ใณ่พผใ‚€ใ“ใจใชใใ€ๅ„ใƒขใƒ‡ใƒซใ‚’็ด ๆ—ฉใๅๅพฉใงใใ‚‹ใ‚ˆใ†ใซใ€ๆ„ๅ›ณ็š„ใซ่ฟฝๅŠ ใฎๆŠฝ่ฑกๅŒ–ใงใƒชใƒ•ใ‚กใ‚ฏใ‚ฟใƒชใƒณใ‚ฐใ•ใ‚Œใฆใ„ใพใ›ใ‚“ใ€‚ -- ๅญฆ็ฟ’APIใฏใฉใฎใ‚ˆใ†ใชใƒขใƒ‡ใƒซใงใ‚‚ๅ‹•ไฝœใ™ใ‚‹ใ‚ใ‘ใงใฏใชใใ€ใƒฉใ‚คใƒ–ใƒฉใƒชใŒๆไพ›ใ™ใ‚‹ใƒขใƒ‡ใƒซใงๅ‹•ไฝœใ™ใ‚‹ใ‚ˆใ†ใซๆœ€้ฉๅŒ–ใ•ใ‚Œใฆใ„ใพใ™ใ€‚ไธ€่ˆฌ็š„ใชๆฉŸๆขฐๅญฆ็ฟ’ใฎใƒซใƒผใƒ—ใซใฏใ€ๅˆฅใฎใƒฉใ‚คใƒ–ใƒฉใƒช(ใŠใใ‚‰ใ[Accelerate](https://huggingface.co/docs/accelerate))ใ‚’ไฝฟ็”จใ™ใ‚‹ๅฟ…่ฆใŒใ‚ใ‚Šใพใ™ใ€‚ -- ็งใŸใกใฏใงใใ‚‹ใ ใ‘ๅคšใใฎไฝฟ็”จไพ‹ใ‚’็ดนไป‹ใ™ใ‚‹ใ‚ˆใ†ๅŠชๅŠ›ใ—ใฆใ„ใพใ™ใŒใ€[examples ใƒ•ใ‚ฉใƒซใƒ€](https://github.com/huggingface/transformers/tree/main/examples) ใซใ‚ใ‚‹ใ‚นใ‚ฏใƒชใƒ—ใƒˆใฏใ‚ใใพใงไพ‹ใงใ™ใ€‚ใ‚ใชใŸใฎ็‰นๅฎšใฎๅ•้กŒใซๅฏพใ—ใฆใ™ใใซๅ‹•ไฝœใ™ใ‚‹ใ‚ใ‘ใงใฏใชใใ€ใ‚ใชใŸใฎใƒ‹ใƒผใ‚บใซๅˆใ‚ใ›ใ‚‹ใŸใ‚ใซๆ•ฐ่กŒใฎใ‚ณใƒผใƒ‰ใ‚’ๅค‰ๆ›ดใ™ใ‚‹ๅฟ…่ฆใŒใ‚ใ‚‹ใ“ใจใŒไบˆๆƒณใ•ใ‚Œใพใ™ใ€‚ - -## ใ‚คใƒณใ‚นใƒˆใƒผใƒซ - -### pipใซใฆ - -ใ“ใฎใƒชใƒใ‚ธใƒˆใƒชใฏใ€Python 3.8+, Flax 0.4.1+, PyTorch 1.10+, TensorFlow 2.6+ ใงใƒ†ใ‚นใƒˆใ•ใ‚Œใฆใ„ใพใ™ใ€‚ - -๐Ÿค—Transformersใฏ[ไปฎๆƒณ็’ฐๅขƒ](https://docs.python.org/3/library/venv.html)ใซใ‚คใƒณใ‚นใƒˆใƒผใƒซใ™ใ‚‹ๅฟ…่ฆใŒใ‚ใ‚Šใพใ™ใ€‚Pythonใฎไปฎๆƒณ็’ฐๅขƒใซๆ…ฃใ‚Œใฆใ„ใชใ„ๅ ดๅˆใฏใ€[ใƒฆใƒผใ‚ถใƒผใ‚ฌใ‚คใƒ‰](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/)ใ‚’็ขบ่ชใ—ใฆใใ ใ•ใ„ใ€‚ - -ใพใšใ€ไฝฟ็”จใ™ใ‚‹ใƒใƒผใ‚ธใƒงใƒณใฎPythonใงไปฎๆƒณ็’ฐๅขƒใ‚’ไฝœๆˆใ—ใ€ใ‚ขใ‚ฏใƒ†ใ‚ฃใƒ™ใƒผใƒˆใ—ใพใ™ใ€‚ - -ใใฎๅพŒใ€Flax, PyTorch, TensorFlowใฎใ†ใกๅฐ‘ใชใใจใ‚‚1ใคใ‚’ใ‚คใƒณใ‚นใƒˆใƒผใƒซใ™ใ‚‹ๅฟ…่ฆใŒใ‚ใ‚Šใพใ™ใ€‚ -[TensorFlowใ‚คใƒณใ‚นใƒˆใƒผใƒซใƒšใƒผใ‚ธ](https://www.tensorflow.org/install/)ใ€[PyTorchใ‚คใƒณใ‚นใƒˆใƒผใƒซใƒšใƒผใ‚ธ](https://pytorch.org/get-started/locally/#start-locally)ใ€[Flax](https://github.com/google/flax#quick-install)ใ€[Jax](https://github.com/google/jax#installation)ใ‚คใƒณใ‚นใƒˆใƒผใƒซใƒšใƒผใ‚ธใงใ€ใŠไฝฟใ„ใฎใƒ—ใƒฉใƒƒใƒˆใƒ•ใ‚ฉใƒผใƒ ๅˆฅใฎใ‚คใƒณใ‚นใƒˆใƒผใƒซใ‚ณใƒžใƒณใƒ‰ใ‚’ๅ‚็…งใ—ใฆใใ ใ•ใ„ใ€‚ - -ใ“ใ‚Œใ‚‰ใฎใƒใƒƒใ‚ฏใ‚จใƒณใƒ‰ใฎใ„ใšใ‚Œใ‹ใŒใ‚คใƒณใ‚นใƒˆใƒผใƒซใ•ใ‚Œใฆใ„ใ‚‹ๅ ดๅˆใ€๐Ÿค—Transformersใฏไปฅไธ‹ใฎใ‚ˆใ†ใซpipใ‚’ไฝฟ็”จใ—ใฆใ‚คใƒณใ‚นใƒˆใƒผใƒซใ™ใ‚‹ใ“ใจใŒใงใใพใ™: - -```bash -pip install transformers -``` - -ใ‚‚ใ—ใ‚ตใƒณใƒ—ใƒซใ‚’่ฉฆใ—ใŸใ„ใ€ใพใŸใฏใ‚ณใƒผใƒ‰ใฎๆœ€ๅ…ˆ็ซฏใŒๅฟ…่ฆใงใ€ๆ–ฐใ—ใ„ใƒชใƒชใƒผใ‚นใ‚’ๅพ…ใฆใชใ„ๅ ดๅˆใฏใ€[ใƒฉใ‚คใƒ–ใƒฉใƒชใ‚’ใ‚ฝใƒผใ‚นใ‹ใ‚‰ใ‚คใƒณใ‚นใƒˆใƒผใƒซ](https://huggingface.co/docs/transformers/installation#installing-from-source)ใ™ใ‚‹ๅฟ…่ฆใŒใ‚ใ‚Šใพใ™ใ€‚ - -### condaใซใฆ - -Transformersใƒใƒผใ‚ธใƒงใƒณ4.0.0ใ‹ใ‚‰ใ€condaใƒใƒฃใƒณใƒใƒซใ‚’ๆญ่ผ‰ใ—ใพใ—ใŸ: `huggingface`ใ€‚ - -๐Ÿค—Transformersใฏไปฅไธ‹ใฎใ‚ˆใ†ใซcondaใ‚’ไฝฟใฃใฆ่จญ็ฝฎใ™ใ‚‹ใ“ใจใŒใงใใพใ™: - 
-```shell script -conda install -c huggingface transformers -``` - -Flaxใ€PyTorchใ€TensorFlowใ‚’condaใงใ‚คใƒณใ‚นใƒˆใƒผใƒซใ™ใ‚‹ๆ–นๆณ•ใฏใ€ใใ‚Œใžใ‚Œใฎใ‚คใƒณใ‚นใƒˆใƒผใƒซใƒšใƒผใ‚ธใซๅพ“ใฃใฆใใ ใ•ใ„ใ€‚ - -> **_ๆณจๆ„:_** Windowsใงใฏใ€ใ‚ญใƒฃใƒƒใ‚ทใƒฅใฎๆฉๆตใ‚’ๅ—ใ‘ใ‚‹ใŸใ‚ใซใ€ใƒ‡ใƒ™ใƒญใƒƒใƒ‘ใƒผใƒขใƒผใƒ‰ใ‚’ๆœ‰ๅŠนใซใ™ใ‚‹ใ‚ˆใ†ไฟƒใ•ใ‚Œใ‚‹ใ“ใจใŒใ‚ใ‚Šใพใ™ใ€‚ใ“ใฎใ‚ˆใ†ใชๅ ดๅˆใฏใ€[ใ“ใฎissue](https://github.com/huggingface/huggingface_hub/issues/1062)ใงใŠ็Ÿฅใ‚‰ใ›ใใ ใ•ใ„ใ€‚ - -## ใƒขใƒ‡ใƒซใ‚ขใƒผใ‚ญใƒ†ใ‚ฏใƒใƒฃ - -๐Ÿค—TransformersใŒๆไพ›ใ™ใ‚‹ **[ๅ…จใƒขใƒ‡ใƒซใƒใ‚งใƒƒใ‚ฏใƒใ‚คใƒณใƒˆ](https://huggingface.co/models)** ใฏใ€[ใƒฆใƒผใ‚ถใƒผ](https://huggingface.co/users)ใ‚„[็ต„็น”](https://huggingface.co/organizations)ใซใ‚ˆใฃใฆ็›ดๆŽฅใ‚ขใƒƒใƒ—ใƒญใƒผใƒ‰ใ•ใ‚Œใ‚‹huggingface.co [model hub](https://huggingface.co)ใ‹ใ‚‰ใ‚ทใƒผใƒ ใƒฌใ‚นใซ็ตฑๅˆใ•ใ‚Œใฆใ„ใพใ™ใ€‚ - -็พๅœจใฎใƒใ‚งใƒƒใ‚ฏใƒใ‚คใƒณใƒˆๆ•ฐ: ![](https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen) - -๐Ÿค—Transformersใฏ็พๅœจใ€ไปฅไธ‹ใฎใ‚ขใƒผใ‚ญใƒ†ใ‚ฏใƒใƒฃใ‚’ๆไพ›ใ—ใฆใ„ใพใ™๏ผˆใใ‚Œใžใ‚Œใฎใƒใ‚คใƒฌใƒ™ใƒซใช่ฆ็ด„ใฏ[ใ“ใกใ‚‰](https://huggingface.co/docs/transformers/model_summary)ใ‚’ๅ‚็…งใ—ใฆใใ ใ•ใ„๏ผ‰: - -1. **[ALBERT](https://huggingface.co/docs/transformers/model_doc/albert)** (Google Research and the Toyota Technological Institute at Chicago ใ‹ใ‚‰) Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942) -1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (Google Research ใ‹ใ‚‰) Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918) -1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (BAAI ใ‹ใ‚‰) Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) -1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (MIT ใ‹ใ‚‰) Yuan Gong, Yu-An Chung, James Glass ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) -1. **[Autoformer](https://huggingface.co/docs/transformers/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long. -1. **[Bark](https://huggingface.co/docs/transformers/model_doc/bark)** (from Suno) released in the repository [suno-ai/bark](https://github.com/suno-ai/bark) by Suno AI team. -1. 
**[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (Facebook ใ‹ใ‚‰) Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461) -1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (ร‰cole polytechnique ใ‹ใ‚‰) Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) -1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (VinAI Research ใ‹ใ‚‰) Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) -1. **[BEiT](https://huggingface.co/docs/transformers/model_doc/beit)** (Microsoft ใ‹ใ‚‰) Hangbo Bao, Li Dong, Furu Wei ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) -1. **[BERT](https://huggingface.co/docs/transformers/model_doc/bert)** (Google ใ‹ใ‚‰) Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) -1. **[BERT For Sequence Generation](https://huggingface.co/docs/transformers/model_doc/bert-generation)** (Google ใ‹ใ‚‰) Sascha Rothe, Shashi Narayan, Aliaksei Severyn ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) -1. **[BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet)** (VinAI Research ใ‹ใ‚‰) Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) -1. **[BigBird-Pegasus](https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus)** (Google Research ใ‹ใ‚‰) Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) -1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (Google Research ใ‹ใ‚‰) Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) -1. **[BioGpt](https://huggingface.co/docs/transformers/model_doc/biogpt)** (Microsoft Research AI4Science ใ‹ใ‚‰) Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [BioGPT: generative pre-trained transformer for biomedical text generation and mining](https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbac409/6713511?guestAccessKey=a66d9b5d-4f83-4017-bb52-405815c907b9) -1. 
**[BiT](https://huggingface.co/docs/transformers/model_doc/bit)** (Google AI ใ‹ใ‚‰) Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Big Transfer (BiT)](https://arxiv.org/abs/1912.11370)Houlsby. -1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (Facebook ใ‹ใ‚‰) Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) -1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (Facebook ใ‹ใ‚‰) Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) -1. **[BLIP](https://huggingface.co/docs/transformers/model_doc/blip)** (Salesforce ใ‹ใ‚‰) Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation](https://arxiv.org/abs/2201.12086) -1. **[BLIP-2](https://huggingface.co/docs/transformers/model_doc/blip-2)** (Salesforce ใ‹ใ‚‰) Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models](https://arxiv.org/abs/2301.12597) -1. **[BLOOM](https://huggingface.co/docs/transformers/model_doc/bloom)** (BigScience workshop ใ‹ใ‚‰) [BigScience Workshop](https://bigscience.huggingface.co/) ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚Œใพใ—ใŸ. -1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (Alexa ใ‹ใ‚‰) Adrian de Wynter and Daniel J. Perry ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) -1. **[BridgeTower](https://huggingface.co/docs/transformers/model_doc/bridgetower)** (Harbin Institute of Technology/Microsoft Research Asia/Intel Labs ใ‹ใ‚‰) released with the paper [BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning](https://arxiv.org/abs/2206.08657) by Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan. -1. **[BROS](https://huggingface.co/docs/transformers/model_doc/bros)** (NAVER CLOVA ใ‹ใ‚‰) Teakgyu Hong, Donghyun Kim, Mingi Ji, Wonseok Hwang, Daehyun Nam, Sungrae Park. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents](https://arxiv.org/abs/2108.04539) -1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (Google Research ใ‹ใ‚‰) Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) -1. 
**[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (Inria/Facebook/Sorbonne ใ‹ใ‚‰) Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suรกrez*, Yoann Dupont, Laurent Romary, ร‰ric Villemonte de la Clergerie, Djamรฉ Seddah and Benoรฎt Sagot ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) -1. **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (Google Research ใ‹ใ‚‰) Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) -1. **[Chinese-CLIP](https://huggingface.co/docs/transformers/model_doc/chinese_clip)** (OFA-Sys ใ‹ใ‚‰) An Yang, Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou, Chang Zhou ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese](https://arxiv.org/abs/2211.01335) -1. **[CLAP](https://huggingface.co/docs/transformers/model_doc/clap)** (LAION-AI ใ‹ใ‚‰) Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation](https://arxiv.org/abs/2211.06687) -1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (OpenAI ใ‹ใ‚‰) Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) -1. **[CLIPSeg](https://huggingface.co/docs/transformers/model_doc/clipseg)** (University of Gรถttingen ใ‹ใ‚‰) Timo Lรผddecke and Alexander Ecker ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Image Segmentation Using Text and Image Prompts](https://arxiv.org/abs/2112.10003) -1. **[CLVP](https://huggingface.co/docs/transformers/model_doc/clvp)** released with the paper [Better speech synthesis through scaling](https://arxiv.org/abs/2305.07243) by James Betker. -1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (Salesforce ใ‹ใ‚‰) Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) -1. **[CodeLlama](https://huggingface.co/docs/transformers/model_doc/llama_code)** (MetaAI ใ‹ใ‚‰) Baptiste Roziรจre, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jรฉrรฉmy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Dรฉfossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [Code Llama: Open Foundation Models for Code](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/) -1. 
**[Conditional DETR](https://huggingface.co/docs/transformers/model_doc/conditional_detr)** (Microsoft Research Asia ใ‹ใ‚‰) Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Conditional DETR for Fast Training Convergence](https://arxiv.org/abs/2108.06152) -1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (YituTech ใ‹ใ‚‰) Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) -1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (Facebook AI ใ‹ใ‚‰) Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) -1. **[ConvNeXTV2](https://huggingface.co/docs/transformers/model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie. -1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (Tsinghua University ใ‹ใ‚‰) Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) -1. **[CPM-Ant](https://huggingface.co/docs/transformers/model_doc/cpmant)** (OpenBMB ใ‹ใ‚‰) [OpenBMB](https://www.openbmb.org/) ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚Œใพใ—ใŸ. -1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (Salesforce ใ‹ใ‚‰) Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) -1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (Microsoft ใ‹ใ‚‰) Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) -1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (Facebook ใ‹ใ‚‰) Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) -1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (Microsoft ใ‹ใ‚‰) Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) -1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (Microsoft ใ‹ใ‚‰) Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) -1. 
**[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (Berkeley/Facebook/Google ใ‹ใ‚‰) Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) -1. **[Deformable DETR](https://huggingface.co/docs/transformers/model_doc/deformable_detr)** (SenseTime Research ใ‹ใ‚‰) Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Deformable DETR: Deformable Transformers for End-to-End Object Detection](https://arxiv.org/abs/2010.04159) -1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (Facebook ใ‹ใ‚‰) Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervรฉ Jรฉgou ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) -1. **[DePlot](https://huggingface.co/docs/transformers/model_doc/deplot)** (Google AI ใ‹ใ‚‰) Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [DePlot: One-shot visual language reasoning by plot-to-table translation](https://arxiv.org/abs/2212.10505) -1. **[DETA](https://huggingface.co/docs/transformers/model_doc/deta)** (The University of Texas at Austin ใ‹ใ‚‰) Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krรคhenbรผhl. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [NMS Strikes Back](https://arxiv.org/abs/2212.06137) -1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (Facebook ใ‹ใ‚‰) Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) -1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (Microsoft Research ใ‹ใ‚‰) Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) -1. **[DiNAT](https://huggingface.co/docs/transformers/model_doc/dinat)** (SHI Labs ใ‹ใ‚‰) Ali Hassani and Humphrey Shi ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Dilated Neighborhood Attention Transformer](https://arxiv.org/abs/2209.15001) -1. **[DINOv2](https://huggingface.co/docs/transformers/model_doc/dinov2)** (Meta AI ใ‹ใ‚‰) Maxime Oquab, Timothรฉe Darcet, Thรฉo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervรฉ Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [DINOv2: Learning Robust Visual Features without Supervision](https://arxiv.org/abs/2304.07193) -1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (HuggingFace ใ‹ใ‚‰), Victor Sanh, Lysandre Debut and Thomas Wolf. 
ๅŒใ˜ๆ‰‹ๆณ•ใง GPT2, RoBERTa ใจ Multilingual BERT ใฎๅœง็ธฎใ‚’่กŒใ„ใพใ—ใŸ.ๅœง็ธฎใ•ใ‚ŒใŸใƒขใƒ‡ใƒซใฏใใ‚Œใžใ‚Œ [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation)ใ€[DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation)ใ€[DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation) ใจๅไป˜ใ‘ใ‚‰ใ‚Œใพใ—ใŸ. ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) -1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (Microsoft Research ใ‹ใ‚‰) Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) -1. **[Donut](https://huggingface.co/docs/transformers/model_doc/donut)** (NAVER ใ‹ใ‚‰), Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) -1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (Facebook ใ‹ใ‚‰) Vladimir Karpukhin, Barlas OฤŸuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) -1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (Intel Labs ใ‹ใ‚‰) Renรฉ Ranftl, Alexey Bochkovskiy, Vladlen Koltun ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) -1. **[EfficientFormer](https://huggingface.co/docs/transformers/model_doc/efficientformer)** (Snap Research ใ‹ใ‚‰) Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [EfficientFormer: Vision Transformers at MobileNetSpeed](https://arxiv.org/abs/2206.01191) -1. **[EfficientNet](https://huggingface.co/docs/transformers/model_doc/efficientnet)** (from Google Brain) released with the paper [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946) by Mingxing Tan, Quoc V. Le. -1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (Google Research/Stanford University ใ‹ใ‚‰) Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) -1. **[EnCodec](https://huggingface.co/docs/transformers/model_doc/encodec)** (Meta AI ใ‹ใ‚‰) Alexandre Dรฉfossez, Jade Copet, Gabriel Synnaeve, Yossi Adi. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [High Fidelity Neural Audio Compression](https://arxiv.org/abs/2210.13438) -1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (Google Research ใ‹ใ‚‰) Sascha Rothe, Shashi Narayan, Aliaksei Severyn ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) -1. 
**[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (Baidu ใ‹ใ‚‰) Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) -1. **[ErnieM](https://huggingface.co/docs/transformers/model_doc/ernie_m)** (Baidu ใ‹ใ‚‰) Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora](https://arxiv.org/abs/2012.15674) -1. **[ESM](https://huggingface.co/docs/transformers/model_doc/esm)** (Meta AI ใ‹ใ‚‰) ใฏใƒˆใƒฉใƒณใ‚นใƒ•ใ‚ฉใƒผใƒžใƒผใƒ—ใƒญใƒ†ใ‚คใƒณ่จ€่ชžใƒขใƒ‡ใƒซใงใ™. **ESM-1b** ใฏ Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118). **ESM-1v** ใฏ Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rivesใ€€ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648). **ESM-2** ใจใ€€**ESMFold** ใฏ Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) -1. **[Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)** (from Technology Innovation Institute) by Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme. -1. **[FLAN-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5)** (Google AI ใ‹ใ‚‰) Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸใƒฌใƒใ‚ธใƒˆใƒชใƒผ [google-research/t5x](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints) Le, and Jason Wei -1. **[FLAN-UL2](https://huggingface.co/docs/transformers/model_doc/flan-ul2)** (from Google AI) released in the repository [google-research/t5x](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-ul2-checkpoints) by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. 
Le, and Jason Wei -1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (CNRS ใ‹ใ‚‰) Hang Le, Loรฏc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoรฎt Crabbรฉ, Laurent Besacier, Didier Schwab ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) -1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (Facebook AI ใ‹ใ‚‰) Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) -1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (Google Research ใ‹ใ‚‰) James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) -1. **[FocalNet](https://huggingface.co/docs/transformers/model_doc/focalnet)** (Microsoft Research ใ‹ใ‚‰) Jianwei Yang, Chunyuan Li, Xiyang Dai, Lu Yuan, Jianfeng Gao. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [Focal Modulation Networks](https://arxiv.org/abs/2203.11926) -1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (CMU/Google Brain ใ‹ใ‚‰) Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) -1. **[Fuyu](https://huggingface.co/docs/transformers/model_doc/fuyu)** (ADEPT ใ‹ใ‚‰) Rohan Bavishi, Erich Elsen, Curtis Hawthorne, Maxwell Nye, Augustus Odena, Arushi Somani, SaฤŸnak TaลŸฤฑrlar. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [blog post](https://www.adept.ai/blog/fuyu-8b) -1. **[GIT](https://huggingface.co/docs/transformers/model_doc/git)** (Microsoft Research ใ‹ใ‚‰) Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [GIT: A Generative Image-to-text Transformer for Vision and Language](https://arxiv.org/abs/2205.14100) -1. **[GLPN](https://huggingface.co/docs/transformers/model_doc/glpn)** (KAIST ใ‹ใ‚‰) Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) -1. **[GPT](https://huggingface.co/docs/transformers/model_doc/openai-gpt)** (OpenAI ใ‹ใ‚‰) Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) -1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (EleutherAI ใ‹ใ‚‰) Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸใƒฌใƒใ‚ธใƒˆใƒชใƒผ : [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) -1. 
**[GPT NeoX](https://huggingface.co/docs/transformers/model_doc/gpt_neox)** (EleutherAI ใ‹ใ‚‰) Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://arxiv.org/abs/2204.06745) -1. **[GPT NeoX Japanese](https://huggingface.co/docs/transformers/model_doc/gpt_neox_japanese)** (ABEJA ใ‹ใ‚‰) Shinya Otani, Takayoshi Makabe, Anuj Arora, and Kyo Hattori ใ‹ใ‚‰ใƒชใƒชใƒผใ‚น. -1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (OpenAI ใ‹ใ‚‰) Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever** ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) -1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (EleutherAI ใ‹ใ‚‰) Ben Wang and Aran Komatsuzaki ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸใƒฌใƒใ‚ธใƒˆใƒชใƒผ [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) -1. **[GPT-Sw3](https://huggingface.co/docs/transformers/model_doc/gpt-sw3)** (AI-Sweden ใ‹ใ‚‰) Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey ร–hman, Fredrik Carlsson, Magnus Sahlgren ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish](http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.376.pdf) -1. **[GPTBigCode](https://huggingface.co/docs/transformers/model_doc/gpt_bigcode)** (BigCode ใ‹ใ‚‰) Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo Garcรญa del Rรญo, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [SantaCoder: don't reach for the stars!](https://arxiv.org/abs/2301.03988) -1. **[GPTSAN-japanese](https://huggingface.co/docs/transformers/model_doc/gptsan-japanese)** [tanreinama/GPTSAN](https://github.com/tanreinama/GPTSAN/blob/main/report/model.md) ๅ‚ๆœฌไฟŠไน‹(tanreinama)ใ‹ใ‚‰ใƒชใƒชใƒผใ‚นใ•ใ‚Œใพใ—ใŸ. -1. **[Graphormer](https://huggingface.co/docs/transformers/model_doc/graphormer)** (Microsoft ใ‹ใ‚‰) Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Do Transformers Really Perform Bad for Graph Representation?](https://arxiv.org/abs/2106.05234). -1. **[GroupViT](https://huggingface.co/docs/transformers/model_doc/groupvit)** (UCSD, NVIDIA ใ‹ใ‚‰) Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) -1. 
**[HerBERT](https://huggingface.co/docs/transformers/model_doc/herbert)** (Allegro.pl, AGH University of Science and Technology ใ‹ใ‚‰) Piotr Rybak, Robert Mroczkowski, Janusz Tracz, Ireneusz Gawlik. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [KLEJ: Comprehensive Benchmark for Polish Language Understanding](https://www.aclweb.org/anthology/2020.acl-main.111.pdf) -1. **[Hubert](https://huggingface.co/docs/transformers/model_doc/hubert)** (Facebook ใ‹ใ‚‰) Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) -1. **[I-BERT](https://huggingface.co/docs/transformers/model_doc/ibert)** (Berkeley ใ‹ใ‚‰) Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) -1. **[IDEFICS](https://huggingface.co/docs/transformers/model_doc/idefics)** (from HuggingFace) released with the paper [OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents](https://huggingface.co/papers/2306.16527) by Hugo Laurenรงon, Lucile Saulnier, Lรฉo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh. -1. **[ImageGPT](https://huggingface.co/docs/transformers/model_doc/imagegpt)** (OpenAI ใ‹ใ‚‰) Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) -1. **[Informer](https://huggingface.co/docs/transformers/model_doc/informer)** (from Beihang University, UC Berkeley, Rutgers University, SEDD Company) released with the paper [Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting](https://arxiv.org/abs/2012.07436) by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. -1. **[InstructBLIP](https://huggingface.co/docs/transformers/model_doc/instructblip)** (Salesforce ใ‹ใ‚‰) Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning](https://arxiv.org/abs/2305.06500) -1. **[Jukebox](https://huggingface.co/docs/transformers/model_doc/jukebox)** (OpenAI ใ‹ใ‚‰) Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Jukebox: A Generative Model for Music](https://arxiv.org/pdf/2005.00341.pdf) -1. **[KOSMOS-2](https://huggingface.co/docs/transformers/model_doc/kosmos-2)** (from Microsoft Research Asia) released with the paper [Kosmos-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/abs/2306.14824) by Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei. -1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (Microsoft Research Asia ใ‹ใ‚‰) Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) -1. 
**[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (Microsoft Research Asia ใ‹ใ‚‰) Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) -1. **[LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)** (Microsoft Research Asia ใ‹ใ‚‰) Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) -1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutxlm)** (Microsoft Research Asia ใ‹ใ‚‰) Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) -1. **[LED](https://huggingface.co/docs/transformers/model_doc/led)** (AllenAI ใ‹ใ‚‰) Iz Beltagy, Matthew E. Peters, Arman Cohan ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) -1. **[LeViT](https://huggingface.co/docs/transformers/model_doc/levit)** (Meta AI ใ‹ใ‚‰) Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervรฉ Jรฉgou, Matthijs Douze ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference](https://arxiv.org/abs/2104.01136) -1. **[LiLT](https://huggingface.co/docs/transformers/model_doc/lilt)** (South China University of Technology ใ‹ใ‚‰) Jiapeng Wang, Lianwen Jin, Kai Ding ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding](https://arxiv.org/abs/2202.13669) -1. **[LLaMA](https://huggingface.co/docs/transformers/model_doc/llama)** (The FAIR team of Meta AI ใ‹ใ‚‰) Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothรฉe Lacroix, Baptiste Roziรจre, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971) -1. 
**[Llama2](https://huggingface.co/docs/transformers/model_doc/llama2)** (The FAIR team of Meta AI ใ‹ใ‚‰) Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushka rMishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing EllenTan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom.. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [Llama2: Open Foundation and Fine-Tuned Chat Models](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/XXX) -1. **[LLaVa](https://huggingface.co/docs/transformers/model_doc/llava)** (Microsoft Research & University of Wisconsin-Madison ใ‹ใ‚‰) Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [Visual Instruction Tuning](https://arxiv.org/abs/2304.08485) -1. **[Longformer](https://huggingface.co/docs/transformers/model_doc/longformer)** (AllenAI ใ‹ใ‚‰) Iz Beltagy, Matthew E. Peters, Arman Cohan ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) -1. **[LongT5](https://huggingface.co/docs/transformers/model_doc/longt5)** (Google AI ใ‹ใ‚‰) Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916) -1. **[LUKE](https://huggingface.co/docs/transformers/model_doc/luke)** (Studio Ousia ใ‹ใ‚‰) Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) -1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (UNC Chapel Hill ใ‹ใ‚‰) Hao Tan and Mohit Bansal ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) -1. **[M-CTC-T](https://huggingface.co/docs/transformers/model_doc/mctct)** (Facebook ใ‹ใ‚‰) Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Pseudo-Labeling For Massively Multilingual Speech Recognition](https://arxiv.org/abs/2111.00161) -1. 
**[M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100)** (Facebook ใ‹ใ‚‰) Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) -1. **[MADLAD-400](https://huggingface.co/docs/transformers/model_doc/madlad-400)** (from Google) released with the paper [MADLAD-400: A Multilingual And Document-Level Large Audited Dataset](https://arxiv.org/abs/2309.04662) by Sneha Kudugunta, Isaac Caswell, Biao Zhang, Xavier Garcia, Christopher A. Choquette-Choo, Katherine Lee, Derrick Xin, Aditya Kusupati, Romi Stella, Ankur Bapna, Orhan Firat. -1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** Jรถrg Tiedemann ใ‹ใ‚‰. [OPUS](http://opus.nlpl.eu/) ใ‚’ไฝฟใ„ใชใŒใ‚‰ๅญฆ็ฟ’ใ•ใ‚ŒใŸ "Machine translation" (ใƒžใ‚ทใƒณใƒˆใƒฉใƒณใ‚นใƒฌใƒผใ‚ทใƒงใƒณ) ใƒขใƒ‡ใƒซ. [Marian Framework](https://marian-nmt.github.io/) ใฏMicrosoft Translator Teamใ€€ใŒ็พๅœจ้–‹็™บไธญใงใ™. -1. **[MarkupLM](https://huggingface.co/docs/transformers/model_doc/markuplm)** (Microsoft Research Asia ใ‹ใ‚‰) Junlong Li, Yiheng Xu, Lei Cui, Furu Wei ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding](https://arxiv.org/abs/2110.08518) -1. **[Mask2Former](https://huggingface.co/docs/transformers/model_doc/mask2former)** (FAIR and UIUC ใ‹ใ‚‰) Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [Masked-attention Mask Transformer for Universal Image Segmentation](https://arxiv.org/abs/2112.01527) -1. **[MaskFormer](https://huggingface.co/docs/transformers/model_doc/maskformer)** (Meta and UIUC ใ‹ใ‚‰) Bowen Cheng, Alexander G. Schwing, Alexander Kirillov ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) -1. **[MatCha](https://huggingface.co/docs/transformers/model_doc/matcha)** (Google AI ใ‹ใ‚‰) Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, Julian Martin Eisenschlos. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering](https://arxiv.org/abs/2212.09662) -1. **[mBART](https://huggingface.co/docs/transformers/model_doc/mbart)** (Facebook ใ‹ใ‚‰) Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) -1. **[mBART-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (Facebook ใ‹ใ‚‰) Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) -1. **[MEGA](https://huggingface.co/docs/transformers/model_doc/mega)** (Facebook ใ‹ใ‚‰) Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, and Luke Zettlemoyer. 
ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [Mega: Moving Average Equipped Gated Attention](https://arxiv.org/abs/2209.10655) -1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron-bert)** (NVIDIA ใ‹ใ‚‰) Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) -1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (NVIDIA ใ‹ใ‚‰) Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) -1. **[MGP-STR](https://huggingface.co/docs/transformers/model_doc/mgp-str)** (Alibaba Research ใ‹ใ‚‰) Peng Wang, Cheng Da, and Cong Yao. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [Multi-Granularity Prediction for Scene Text Recognition](https://arxiv.org/abs/2209.03592) -1. **[Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)** (from Mistral AI) by The Mistral AI team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lรฉlio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothรฉe Lacroix, William El Sayed.. -1. **[Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lรฉlio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothรฉe Lacroix, William El Sayed. -1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (Studio Ousia ใ‹ใ‚‰) Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) -1. **[MMS](https://huggingface.co/docs/transformers/model_doc/mms)** (Facebook ใ‹ใ‚‰) Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [Scaling Speech Technology to 1,000+ Languages](https://arxiv.org/abs/2305.13516) -1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (CMU/Google Brain ใ‹ใ‚‰) Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) -1. **[MobileNetV1](https://huggingface.co/docs/transformers/model_doc/mobilenet_v1)** (Google Inc. ใ‹ใ‚‰) Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861) -1. **[MobileNetV2](https://huggingface.co/docs/transformers/model_doc/mobilenet_v2)** (Google Inc. 
ใ‹ใ‚‰) Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381) -1. **[MobileViT](https://huggingface.co/docs/transformers/model_doc/mobilevit)** (Apple ใ‹ใ‚‰) Sachin Mehta and Mohammad Rastegari ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/abs/2110.02178) -1. **[MobileViTV2](https://huggingface.co/docs/transformers/model_doc/mobilevitv2)** (Apple ใ‹ใ‚‰) Sachin Mehta and Mohammad Rastegari. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [Separable Self-attention for Mobile Vision Transformers](https://arxiv.org/abs/2206.02680) -1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (Microsoft Research ใ‹ใ‚‰) Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) -1. **[MPT](https://huggingface.co/docs/transformers/model_doc/mpt)** (MosaiML ใ‹ใ‚‰) the MosaicML NLP Team. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [llm-foundry](https://github.com/mosaicml/llm-foundry/) -1. **[MRA](https://huggingface.co/docs/transformers/model_doc/mra)** (the University of Wisconsin - Madison ใ‹ใ‚‰) Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, Vikas Singh. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [Multi Resolution Analysis (MRA)](https://arxiv.org/abs/2207.10284) -1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (Google AI ใ‹ใ‚‰) Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) -1. **[MusicGen](https://huggingface.co/docs/transformers/model_doc/musicgen)** (from Meta) released with the paper [Simple and Controllable Music Generation](https://arxiv.org/abs/2306.05284) by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Dรฉfossez. -1. **[MVP](https://huggingface.co/docs/transformers/model_doc/mvp)** (RUC AI Box ใ‹ใ‚‰) Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https://arxiv.org/abs/2206.12131) -1. **[NAT](https://huggingface.co/docs/transformers/model_doc/nat)** (SHI Labs ใ‹ใ‚‰) Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Neighborhood Attention Transformer](https://arxiv.org/abs/2204.07143) -1. **[Nezha](https://huggingface.co/docs/transformers/model_doc/nezha)** (Huawei Noahโ€™s Ark Lab ใ‹ใ‚‰) Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) -1. **[NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)** (Meta ใ‹ใ‚‰) the NLLB team ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) -1. **[NLLB-MOE](https://huggingface.co/docs/transformers/model_doc/nllb-moe)** (Meta ใ‹ใ‚‰) the NLLB team. 
ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) -1. **[Nougat](https://huggingface.co/docs/transformers/model_doc/nougat)** (Meta AI ใ‹ใ‚‰) Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [Nougat: Neural Optical Understanding for Academic Documents](https://arxiv.org/abs/2308.13418) -1. **[Nystrรถmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (the University of Wisconsin - Madison ใ‹ใ‚‰) Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Nystrรถmformer: A Nystrรถm-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) -1. **[OneFormer](https://huggingface.co/docs/transformers/model_doc/oneformer)** (SHI Labs ใ‹ใ‚‰) Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [OneFormer: One Transformer to Rule Universal Image Segmentation](https://arxiv.org/abs/2211.06220) -1. **[OpenLlama](https://huggingface.co/docs/transformers/model_doc/open-llama)** (from [s-JoL](https://huggingface.co/s-JoL)) released on GitHub (now removed). -1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (Meta AI ใ‹ใ‚‰) Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) -1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (Google AI ใ‹ใ‚‰) Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) -1. **[OWLv2](https://huggingface.co/docs/transformers/model_doc/owlv2)** (Google AI ใ‹ใ‚‰) Matthias Minderer, Alexey Gritsenko, Neil Houlsby. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [Scaling Open-Vocabulary Object Detection](https://arxiv.org/abs/2306.09683) -1. **[PatchTSMixer](https://huggingface.co/docs/transformers/model_doc/patchtsmixer)** ( IBM Research ใ‹ใ‚‰) Vijay Ekambaram, Arindam Jati, Nam Nguyen, Phanwadee Sinthong, Jayant Kalagnanam. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting](https://arxiv.org/pdf/2306.09364.pdf) -1. **[PatchTST](https://huggingface.co/docs/transformers/model_doc/patchtst)** (IBM ใ‹ใ‚‰) Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, Jayant Kalagnanam. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [A Time Series is Worth 64 Words: Long-term Forecasting with Transformers](https://arxiv.org/pdf/2211.14730.pdf) -1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (Google ใ‹ใ‚‰) Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) -1. **[PEGASUS-X](https://huggingface.co/docs/transformers/model_doc/pegasus_x)** (Google ใ‹ใ‚‰) Jason Phang, Yao Zhao, and Peter J. Liu ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Investigating Efficiently Extending Transformers for Long Input Summarization](https://arxiv.org/abs/2208.04347) -1. 
**[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (Deepmind ใ‹ใ‚‰) Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hรฉnaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, Joรฃo Carreira ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) -1. **[Persimmon](https://huggingface.co/docs/transformers/model_doc/persimmon)** (ADEPT ใ‹ใ‚‰) Erich Elsen, Augustus Odena, Maxwell Nye, SaฤŸnak TaลŸฤฑrlar, Tri Dao, Curtis Hawthorne, Deepak Moparthi, Arushi Somani. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [blog post](https://www.adept.ai/blog/persimmon-8b) -1. **[Phi](https://huggingface.co/docs/transformers/model_doc/phi)** (from Microsoft) released with the papers - [Textbooks Are All You Need](https://arxiv.org/abs/2306.11644) by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio Cรฉsar Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sรฉbastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li, [Textbooks Are All You Need II: phi-1.5 technical report](https://arxiv.org/abs/2309.05463) by Yuanzhi Li, Sรฉbastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee. -1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (VinAI Research ใ‹ใ‚‰) Dat Quoc Nguyen and Anh Tuan Nguyen ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) -1. **[Pix2Struct](https://huggingface.co/docs/transformers/model_doc/pix2struct)** (Google ใ‹ใ‚‰) Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding](https://arxiv.org/abs/2210.03347) -1. **[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (UCLA NLP ใ‹ใ‚‰) Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) -1. **[PoolFormer](https://huggingface.co/docs/transformers/model_doc/poolformer)** (Sea AI Labs ใ‹ใ‚‰) Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) -1. **[Pop2Piano](https://huggingface.co/docs/transformers/model_doc/pop2piano)** released with the paper [Pop2Piano : Pop Audio-based Piano Cover Generation](https://arxiv.org/abs/2211.00895) by Jongho Choi, Kyogu Lee. -1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (Microsoft Research ใ‹ใ‚‰) Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) -1. **[PVT](https://huggingface.co/docs/transformers/model_doc/pvt)** (Nanjing University, The University of Hong Kong etc. 
ใ‹ใ‚‰) Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions](https://arxiv.org/pdf/2102.12122.pdf) -1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (NVIDIA ใ‹ใ‚‰) Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) -1. **[RAG](https://huggingface.co/docs/transformers/model_doc/rag)** (Facebook ใ‹ใ‚‰) Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Kรผttler, Mike Lewis, Wen-tau Yih, Tim Rocktรคschel, Sebastian Riedel, Douwe Kiela ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) -1. **[REALM](https://huggingface.co/docs/transformers/model_doc/realm.html)** (Google Research ใ‹ใ‚‰) Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) -1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (Google Research ใ‹ใ‚‰) Nikita Kitaev, ลukasz Kaiser, Anselm Levskaya ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) -1. **[RegNet](https://huggingface.co/docs/transformers/model_doc/regnet)** (META Platforms ใ‹ใ‚‰) Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollรกr ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Designing Network Design Space](https://arxiv.org/abs/2003.13678) -1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (Google Research ใ‹ใ‚‰) Hyung Won Chung, Thibault Fรฉvry, Henry Tsai, M. Johnson, Sebastian Ruder ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/abs/2010.12821) -1. **[ResNet](https://huggingface.co/docs/transformers/model_doc/resnet)** (Microsoft Research ใ‹ใ‚‰) Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) -1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (Facebook ใ‹ใ‚‰), Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) -1. **[RoBERTa-PreLayerNorm](https://huggingface.co/docs/transformers/model_doc/roberta-prelayernorm)** (Facebook ใ‹ใ‚‰) Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [fairseq: A Fast, Extensible Toolkit for Sequence Modeling](https://arxiv.org/abs/1904.01038) -1. **[RoCBert](https://huggingface.co/docs/transformers/model_doc/roc_bert)** (WeChatAI ใ‹ใ‚‰) HuiSu, WeiweiShi, XiaoyuShen, XiaoZhou, TuoJi, JiaruiFang, JieZhou ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining](https://aclanthology.org/2022.acl-long.65.pdf) -1. 
**[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (ZhuiyiTechnology ใ‹ใ‚‰), Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864) -1. **[RWKV](https://huggingface.co/docs/transformers/model_doc/rwkv)** (Bo Peng ใ‹ใ‚‰) Bo Peng. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [this repo](https://github.com/BlinkDL/RWKV-LM) -1. **[SeamlessM4T](https://huggingface.co/docs/transformers/model_doc/seamless_m4t)** (from Meta AI) released with the paper [SeamlessM4T โ€” Massively Multilingual & Multimodal Machine Translation](https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf) by the Seamless Communication team. -1. **[SeamlessM4Tv2](https://huggingface.co/docs/transformers/model_doc/seamless_m4t_v2)** (from Meta AI) released with the paper [Seamless: Multilingual Expressive and Streaming Speech Translation](https://ai.meta.com/research/publications/seamless-multilingual-expressive-and-streaming-speech-translation/) by the Seamless Communication team. -1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (NVIDIA ใ‹ใ‚‰) Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) -1. **[Segment Anything](https://huggingface.co/docs/transformers/model_doc/sam)** (Meta AI ใ‹ใ‚‰) Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [Segment Anything](https://arxiv.org/pdf/2304.02643v1.pdf) -1. **[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (ASAPP ใ‹ใ‚‰) Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) -1. **[SEW-D](https://huggingface.co/docs/transformers/model_doc/sew_d)** (ASAPP ใ‹ใ‚‰) Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) -1. **[SpeechT5](https://huggingface.co/docs/transformers/model_doc/speecht5)** (Microsoft Research ใ‹ใ‚‰) Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing](https://arxiv.org/abs/2110.07205) -1. **[SpeechToTextTransformer](https://huggingface.co/docs/transformers/model_doc/speech_to_text)** (Facebook ใ‹ใ‚‰), Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) -1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (Facebook ใ‹ใ‚‰), Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) -1. 
**[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (Tel Aviv University ใ‹ใ‚‰), Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) -1. **[SqueezeBERT](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (Berkeley ใ‹ใ‚‰) Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) -1. **[SwiftFormer](https://huggingface.co/docs/transformers/model_doc/swiftformer)** (MBZUAI ใ‹ใ‚‰) Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications](https://arxiv.org/abs/2303.15446) -1. **[Swin Transformer](https://huggingface.co/docs/transformers/model_doc/swin)** (Microsoft ใ‹ใ‚‰) Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) -1. **[Swin Transformer V2](https://huggingface.co/docs/transformers/model_doc/swinv2)** (Microsoft ใ‹ใ‚‰) Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Swin Transformer V2: Scaling Up Capacity and Resolution](https://arxiv.org/abs/2111.09883) -1. **[Swin2SR](https://huggingface.co/docs/transformers/model_doc/swin2sr)** (University of Wรผrzburg ใ‹ใ‚‰) Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration](https://arxiv.org/abs/2209.11345) -1. **[SwitchTransformers](https://huggingface.co/docs/transformers/model_doc/switch_transformers)** (Google ใ‹ใ‚‰) William Fedus, Barret Zoph, Noam Shazeer ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity](https://arxiv.org/abs/2101.03961) -1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (Google AI ใ‹ใ‚‰) Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) -1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (Google AI ใ‹ใ‚‰) Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸใƒฌใƒใ‚ธใƒˆใƒชใƒผ [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) -1. **[Table Transformer](https://huggingface.co/docs/transformers/model_doc/table-transformer)** (Microsoft Research ใ‹ใ‚‰) Brandon Smock, Rohith Pesala, Robin Abraham ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents](https://arxiv.org/abs/2110.00061) -1. 
**[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (Google AI ใ‹ใ‚‰) Jonathan Herzig, Paweล‚ Krzysztof Nowak, Thomas Mรผller, Francesco Piccinno and Julian Martin Eisenschlos ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) -1. **[TAPEX](https://huggingface.co/docs/transformers/model_doc/tapex)** (Microsoft Research ใ‹ใ‚‰) Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) -1. **[Time Series Transformer](https://huggingface.co/docs/transformers/model_doc/time_series_transformer)** (HuggingFace ใ‹ใ‚‰). -1. **[TimeSformer](https://huggingface.co/docs/transformers/model_doc/timesformer)** (Facebook ใ‹ใ‚‰) Gedas Bertasius, Heng Wang, Lorenzo Torresani ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Is Space-Time Attention All You Need for Video Understanding?](https://arxiv.org/abs/2102.05095) -1. **[Trajectory Transformer](https://huggingface.co/docs/transformers/model_doc/trajectory_transformers)** (the University of California at Berkeley ใ‹ใ‚‰) Michael Janner, Qiyang Li, Sergey Levine ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https://arxiv.org/abs/2106.02039) -1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transfo-xl)** (Google/CMU ใ‹ใ‚‰) Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) -1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (Microsoft ใ‹ใ‚‰), Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) -1. **[TVLT](https://huggingface.co/docs/transformers/model_doc/tvlt)** (from UNC Chapel Hill ใ‹ใ‚‰), Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [TVLT: Textless Vision-Language Transformer](https://arxiv.org/abs/2209.14156) -1. **[TVP](https://huggingface.co/docs/transformers/model_doc/tvp)** (Intel ใ‹ใ‚‰), Yimeng Zhang, Xin Chen, Jinghan Jia, Sijia Liu, Ke Ding ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Text-Visual Prompting for Efficient 2D Temporal Video Grounding](https://arxiv.org/abs/2303.04995) -1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (Google Research ใ‹ใ‚‰) Yi Tay, Mostafa Dehghani, Vinh Q ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler -1. **[UMT5](https://huggingface.co/docs/transformers/model_doc/umt5)** (Google Research ใ‹ใ‚‰) Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining](https://openreview.net/forum?id=kXwdL1cWOAi) -1. 
**[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (Microsoft Research ใ‹ใ‚‰) Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) -1. **[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech-sat)** (Microsoft Research ใ‹ใ‚‰) Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) -1. **[UnivNet](https://huggingface.co/docs/transformers/model_doc/univnet)** (from Kakao Corporation) released with the paper [UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation](https://arxiv.org/abs/2106.07889) by Won Jang, Dan Lim, Jaesam Yoon, Bongwan Kim, and Juntae Kim. -1. **[UPerNet](https://huggingface.co/docs/transformers/model_doc/upernet)** (Peking University ใ‹ใ‚‰) Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [Unified Perceptual Parsing for Scene Understanding](https://arxiv.org/abs/1807.10221) -1. **[VAN](https://huggingface.co/docs/transformers/model_doc/van)** (Tsinghua University and Nankai University ใ‹ใ‚‰) Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Visual Attention Network](https://arxiv.org/abs/2202.09741) -1. **[VideoMAE](https://huggingface.co/docs/transformers/model_doc/videomae)** (Multimedia Computing Group, Nanjing University ใ‹ใ‚‰) Zhan Tong, Yibing Song, Jue Wang, Limin Wang ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) -1. **[ViLT](https://huggingface.co/docs/transformers/model_doc/vilt)** (NAVER AI Lab/Kakao Enterprise/Kakao Brain ใ‹ใ‚‰) Wonjae Kim, Bokyung Son, Ildoo Kim ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) -1. **[VipLlava](https://huggingface.co/docs/transformers/model_doc/vipllava)** (University of Wisconsinโ€“Madison ใ‹ใ‚‰) Mu Cai, Haotian Liu, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Dennis Park, Yong Jae Lee. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [Making Large Multimodal Models Understand Arbitrary Visual Prompts](https://arxiv.org/abs/2312.00784) -1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (Google AI ใ‹ใ‚‰) Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) -1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (UCLA NLP ใ‹ใ‚‰) Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) -1. 
**[ViT Hybrid](https://huggingface.co/docs/transformers/model_doc/vit_hybrid)** (Google AI ใ‹ใ‚‰) Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) -1. **[VitDet](https://huggingface.co/docs/transformers/model_doc/vitdet)** (Meta AI ใ‹ใ‚‰) Yanghao Li, Hanzi Mao, Ross Girshick, Kaiming He. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [Exploring Plain Vision Transformer Backbones for Object Detection](https://arxiv.org/abs/2203.16527) -1. **[ViTMAE](https://huggingface.co/docs/transformers/model_doc/vit_mae)** (Meta AI ใ‹ใ‚‰) Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollรกr, Ross Girshick ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) -1. **[ViTMatte](https://huggingface.co/docs/transformers/model_doc/vitmatte)** (HUST-VL ใ‹ใ‚‰) Jingfeng Yao, Xinggang Wang, Shusheng Yang, Baoyuan Wang. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers](https://arxiv.org/abs/2305.15272) -1. **[ViTMSN](https://huggingface.co/docs/transformers/model_doc/vit_msn)** (Meta AI ใ‹ใ‚‰) Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Masked Siamese Networks for Label-Efficient Learning](https://arxiv.org/abs/2204.07141) -1. **[VITS](https://huggingface.co/docs/transformers/model_doc/vits)** (Kakao Enterprise ใ‹ใ‚‰) Jaehyeon Kim, Jungil Kong, Juhee Son. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech](https://arxiv.org/abs/2106.06103) -1. **[ViViT](https://huggingface.co/docs/transformers/model_doc/vivit)** (from Google Research) released with the paper [ViViT: A Video Vision Transformer](https://arxiv.org/abs/2103.15691) by Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Luฤiฤ‡, Cordelia Schmid. -1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (Facebook AI ใ‹ใ‚‰) Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) -1. **[Wav2Vec2-Conformer](https://huggingface.co/docs/transformers/model_doc/wav2vec2-conformer)** (Facebook AI ใ‹ใ‚‰) Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ](https://arxiv.org/abs/2010.05171) -1. **[Wav2Vec2Phoneme](https://huggingface.co/docs/transformers/model_doc/wav2vec2_phoneme)** (Facebook AI ใ‹ใ‚‰) Qiantong Xu, Alexei Baevski, Michael Auli ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https://arxiv.org/abs/2109.11680) -1. 
**[WavLM](https://huggingface.co/docs/transformers/model_doc/wavlm)** (Microsoft Research ใ‹ใ‚‰) Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) -1. **[Whisper](https://huggingface.co/docs/transformers/model_doc/whisper)** (OpenAI ใ‹ใ‚‰) Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Robust Speech Recognition via Large-Scale Weak Supervision](https://cdn.openai.com/papers/whisper.pdf) -1. **[X-CLIP](https://huggingface.co/docs/transformers/model_doc/xclip)** (Microsoft Research ใ‹ใ‚‰) Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Expanding Language-Image Pretrained Models for General Video Recognition](https://arxiv.org/abs/2208.02816) -1. **[X-MOD](https://huggingface.co/docs/transformers/model_doc/xmod)** (Meta AI ใ‹ใ‚‰) Jonas Pfeiffer, Naman Goyal, Xi Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe. ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡ [Lifting the Curse of Multilinguality by Pre-training Modular Transformers](http://dx.doi.org/10.18653/v1/2022.naacl-main.255) -1. **[XGLM](https://huggingface.co/docs/transformers/model_doc/xglm)** (From Facebook AI) Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) -1. **[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (Facebook ใ‹ใ‚‰) Guillaume Lample and Alexis Conneau ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) -1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet)** (Microsoft Research ใ‹ใ‚‰) Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) -1. **[XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlm-roberta)** (Facebook AI ใ‹ใ‚‰), Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmรกn, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) -1. **[XLM-RoBERTa-XL](https://huggingface.co/docs/transformers/model_doc/xlm-roberta-xl)** (Facebook AI ใ‹ใ‚‰), Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) -1. 
**[XLM-V](https://huggingface.co/docs/transformers/model_doc/xlm-v)** (Meta AI ใ‹ใ‚‰) Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models](https://arxiv.org/abs/2301.10472) -1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (Google/CMU ใ‹ใ‚‰) Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [โ€‹XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) -1. **[XLS-R](https://huggingface.co/docs/transformers/model_doc/xls_r)** (Facebook AI ใ‹ใ‚‰) Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) -1. **[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (Facebook AI ใ‹ใ‚‰) Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) -1. **[YOLOS](https://huggingface.co/docs/transformers/model_doc/yolos)** (Huazhong University of Science & Technology ใ‹ใ‚‰) Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) -1. **[YOSO](https://huggingface.co/docs/transformers/model_doc/yoso)** (the University of Wisconsin - Madison ใ‹ใ‚‰) Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh ใ‹ใ‚‰ๅ…ฌ้–‹ใ•ใ‚ŒใŸ็ ”็ฉถ่ซ–ๆ–‡: [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) -1. 
ๆ–ฐใ—ใ„ใƒขใƒ‡ใƒซใ‚’ๆŠ•็จฟใ—ใŸใ„ใงใ™ใ‹๏ผŸๆ–ฐใ—ใ„ใƒขใƒ‡ใƒซใ‚’่ฟฝๅŠ ใ™ใ‚‹ใŸใ‚ใฎใ‚ฌใ‚คใƒ‰ใจใ—ใฆใ€**่ฉณ็ดฐใชใ‚ฌใ‚คใƒ‰ใจใƒ†ใƒณใƒ—ใƒฌใƒผใƒˆ**ใŒ่ฟฝๅŠ ใ•ใ‚Œใพใ—ใŸใ€‚ใ“ใ‚Œใ‚‰ใฏใƒชใƒใ‚ธใƒˆใƒชใฎ[`templates`](./templates)ใƒ•ใ‚ฉใƒซใƒ€ใซใ‚ใ‚Šใพใ™ใ€‚PRใ‚’ๅง‹ใ‚ใ‚‹ๅ‰ใซใ€ๅฟ…ใš[ใ‚ณใƒณใƒˆใƒชใƒ“ใƒฅใƒผใ‚ทใƒงใƒณใ‚ฌใ‚คใƒ‰](./CONTRIBUTING.md)ใ‚’็ขบ่ชใ—ใ€ใƒกใƒณใƒ†ใƒŠใซ้€ฃ็ตกใ™ใ‚‹ใ‹ใ€ใƒ•ใ‚ฃใƒผใƒ‰ใƒใƒƒใ‚ฏใ‚’ๅŽ้›†ใ™ใ‚‹ใŸใ‚ใซissueใ‚’้–‹ใ„ใฆใใ ใ•ใ„ใ€‚ - -ๅ„ใƒขใƒ‡ใƒซใŒFlaxใ€PyTorchใ€TensorFlowใงๅฎŸ่ฃ…ใ•ใ‚Œใฆใ„ใ‚‹ใ‹ใ€๐Ÿค—Tokenizersใƒฉใ‚คใƒ–ใƒฉใƒชใซๆ”ฏใˆใ‚‰ใ‚ŒใŸ้–ข้€ฃใƒˆใƒผใ‚ฏใƒŠใ‚คใ‚ถใ‚’ๆŒใฃใฆใ„ใ‚‹ใ‹ใฏใ€[ใ“ใฎ่กจ](https://huggingface.co/docs/transformers/index#supported-frameworks)ใ‚’ๅ‚็…งใ—ใฆใใ ใ•ใ„ใ€‚ - -ใ“ใ‚Œใ‚‰ใฎๅฎŸ่ฃ…ใฏใ„ใใคใ‹ใฎใƒ‡ใƒผใ‚ฟใ‚ปใƒƒใƒˆใงใƒ†ใ‚นใƒˆใ•ใ‚ŒใฆใŠใ‚Š(ใ‚ตใƒณใƒ—ใƒซใ‚นใ‚ฏใƒชใƒ—ใƒˆใ‚’ๅ‚็…ง)ใ€ใ‚ชใƒชใ‚ธใƒŠใƒซใฎๅฎŸ่ฃ…ใฎๆ€ง่ƒฝใจไธ€่‡ดใ™ใ‚‹ใฏใšใงใ‚ใ‚‹ใ€‚ๆ€ง่ƒฝใฎ่ฉณ็ดฐใฏ[documentation](https://github.com/huggingface/transformers/tree/main/examples)ใฎExamplesใ‚ปใ‚ฏใ‚ทใƒงใƒณใง่ฆ‹ใ‚‹ใ“ใจใŒใงใใพใ™ใ€‚ - - -## ใ•ใ‚‰ใซ่ฉณใ—ใ - -| ใ‚ปใ‚ฏใ‚ทใƒงใƒณ | ๆฆ‚่ฆ | -|-|-| -| [ใƒ‰ใ‚ญใƒฅใƒกใƒณใƒˆ](https://huggingface.co/docs/transformers/) | ๅฎŒๅ…จใชAPIใƒ‰ใ‚ญใƒฅใƒกใƒณใƒˆใจใƒใƒฅใƒผใƒˆใƒชใ‚ขใƒซ | -| [ใ‚ฟใ‚นใ‚ฏๆฆ‚่ฆ](https://huggingface.co/docs/transformers/task_summary) | ๐Ÿค—TransformersใŒใ‚ตใƒใƒผใƒˆใ™ใ‚‹ใ‚ฟใ‚นใ‚ฏ | -| [ๅ‰ๅ‡ฆ็†ใƒใƒฅใƒผใƒˆใƒชใ‚ขใƒซ](https://huggingface.co/docs/transformers/preprocessing) | ใƒขใƒ‡ใƒซ็”จใฎใƒ‡ใƒผใ‚ฟใ‚’ๆบ–ๅ‚™ใ™ใ‚‹ใŸใ‚ใซ`Tokenizer`ใ‚ฏใƒฉใ‚นใ‚’ไฝฟ็”จ | -| [ใƒˆใƒฌใƒผใƒ‹ใƒณใ‚ฐใจๅพฎ่ชฟๆ•ด](https://huggingface.co/docs/transformers/training) | PyTorch/TensorFlowใฎๅญฆ็ฟ’ใƒซใƒผใƒ—ใจ`Trainer`APIใง๐Ÿค—TransformersใŒๆไพ›ใ™ใ‚‹ใƒขใƒ‡ใƒซใ‚’ไฝฟ็”จ | -| [ใ‚ฏใ‚คใƒƒใ‚ฏใƒ„ใ‚ขใƒผ: ๅพฎ่ชฟๆ•ด/ไฝฟ็”จๆ–นๆณ•ใ‚นใ‚ฏใƒชใƒ—ใƒˆ](https://github.com/huggingface/transformers/tree/main/examples) | ๆง˜ใ€…ใชใ‚ฟใ‚นใ‚ฏใงใƒขใƒ‡ใƒซใฎๅพฎ่ชฟๆ•ดใ‚’่กŒใ†ใŸใ‚ใฎใ‚นใ‚ฏใƒชใƒ—ใƒˆไพ‹ | -| [ใƒขใƒ‡ใƒซใฎๅ…ฑๆœ‰ใจใ‚ขใƒƒใƒ—ใƒญใƒผใƒ‰](https://huggingface.co/docs/transformers/model_sharing) | ๅพฎ่ชฟๆ•ดใ—ใŸใƒขใƒ‡ใƒซใ‚’ใ‚ขใƒƒใƒ—ใƒญใƒผใƒ‰ใ—ใฆใ‚ณใƒŸใƒฅใƒ‹ใƒ†ใ‚ฃใงๅ…ฑๆœ‰ใ™ใ‚‹ | -| [ใƒžใ‚คใ‚ฐใƒฌใƒผใ‚ทใƒงใƒณ](https://huggingface.co/docs/transformers/migration) | `pytorch-transformers`ใพใŸใฏ`pytorch-pretrained-bert`ใ‹ใ‚‰๐Ÿค—Transformers ใซ็งป่กŒใ™ใ‚‹ | - -## ๅผ•็”จ - -๐Ÿค— ใƒˆใƒฉใƒณใ‚นใƒ•ใ‚ฉใƒผใƒžใƒผใƒฉใ‚คใƒ–ใƒฉใƒชใซๅผ•็”จใงใใ‚‹[่ซ–ๆ–‡](https://www.aclweb.org/anthology/2020.emnlp-demos.6/)ใŒๅ‡บๆฅใพใ—ใŸ: -```bibtex -@inproceedings{wolf-etal-2020-transformers, - title = "Transformers: State-of-the-Art Natural Language Processing", - author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rรฉmi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. 
Rush", - booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations", - month = oct, - year = "2020", - address = "Online", - publisher = "Association for Computational Linguistics", - url = "https://www.aclweb.org/anthology/2020.emnlp-demos.6", - pages = "38--45" -} -``` diff --git a/README_ko.md b/README_ko.md deleted file mode 100644 index cd71488d1f455b..00000000000000 --- a/README_ko.md +++ /dev/null @@ -1,493 +0,0 @@ - - -

[Deleted README_ko.md header block: project banner, status badges (Build, GitHub, Documentation, GitHub release, Contributor Covenant, DOI), language selector (English | ็ฎ€ไฝ“ไธญๆ–‡ | ็น้ซ”ไธญๆ–‡ | ํ•œ๊ตญ์–ด | Espaรฑol | ๆ—ฅๆœฌ่ชž | เคนเคฟเคจเฅเคฆเฅ€ | เฐคเฑ†เฐฒเฑเฐ—เฑ), and the tagline "State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow".]
- -๐Ÿค— Transformers๋Š” ๋ถ„๋ฅ˜, ์ •๋ณด ์ถ”์ถœ, ์งˆ๋ฌธ ๋‹ต๋ณ€, ์š”์•ฝ, ๋ฒˆ์—ญ, ๋ฌธ์žฅ ์ƒ์„ฑ ๋“ฑ์„ 100๊ฐœ ์ด์ƒ์˜ ์–ธ์–ด๋กœ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ๋Š” ์ˆ˜์ฒœ๊ฐœ์˜ ์‚ฌ์ „ํ•™์Šต๋œ ๋ชจ๋ธ์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. ์šฐ๋ฆฌ์˜ ๋ชฉํ‘œ๋Š” ๋ชจ๋‘๊ฐ€ ์ตœ์ฒจ๋‹จ์˜ NLP ๊ธฐ์ˆ ์„ ์‰ฝ๊ฒŒ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. - -๐Ÿค— Transformers๋Š” ์ด๋Ÿฌํ•œ ์‚ฌ์ „ํ•™์Šต ๋ชจ๋ธ์„ ๋น ๋ฅด๊ฒŒ ๋‹ค์šด๋กœ๋“œํ•ด ํŠน์ • ํ…์ŠคํŠธ์— ์‚ฌ์šฉํ•˜๊ณ , ์›ํ•˜๋Š” ๋ฐ์ดํ„ฐ๋กœ fine-tuningํ•ด ์ปค๋ฎค๋‹ˆํ‹ฐ๋‚˜ ์šฐ๋ฆฌ์˜ [๋ชจ๋ธ ํ—ˆ๋ธŒ](https://huggingface.co/models)์— ๊ณต์œ ํ•  ์ˆ˜ ์žˆ๋„๋ก API๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. ๋˜ํ•œ, ๋ชจ๋ธ ๊ตฌ์กฐ๋ฅผ ์ •์˜ํ•˜๋Š” ๊ฐ ํŒŒ์ด์ฌ ๋ชจ๋“ˆ์€ ์™„์ „ํžˆ ๋…๋ฆฝ์ ์ด์—ฌ์„œ ์—ฐ๊ตฌ ์‹คํ—˜์„ ์œ„ํ•ด ์†์‰ฝ๊ฒŒ ์ˆ˜์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. - -๐Ÿค— Transformers๋Š” ๊ฐ€์žฅ ์œ ๋ช…ํ•œ 3๊ฐœ์˜ ๋”ฅ๋Ÿฌ๋‹ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์ง€์›ํ•ฉ๋‹ˆ๋‹ค. ์ด๋“ค์€ ์„œ๋กœ ์™„๋ฒฝํžˆ ์—ฐ๋™๋ฉ๋‹ˆ๋‹ค โ€” [Jax](https://jax.readthedocs.io/en/latest/), [PyTorch](https://pytorch.org/), [TensorFlow](https://www.tensorflow.org/). ๊ฐ„๋‹จํ•˜๊ฒŒ ์ด ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ์ค‘ ํ•˜๋‚˜๋กœ ๋ชจ๋ธ์„ ํ•™์Šตํ•˜๊ณ , ๋˜ ๋‹ค๋ฅธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋กœ ์ถ”๋ก ์„ ์œ„ํ•ด ๋ชจ๋ธ์„ ๋ถˆ๋Ÿฌ์˜ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. - -## ์˜จ๋ผ์ธ ๋ฐ๋ชจ - -๋Œ€๋ถ€๋ถ„์˜ ๋ชจ๋ธ์„ [๋ชจ๋ธ ํ—ˆ๋ธŒ](https://huggingface.co/models) ํŽ˜์ด์ง€์—์„œ ๋ฐ”๋กœ ํ…Œ์ŠคํŠธํ•ด๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๊ณต๊ฐœ ๋ฐ ๋น„๊ณต๊ฐœ ๋ชจ๋ธ์„ ์œ„ํ•œ [๋น„๊ณต๊ฐœ ๋ชจ๋ธ ํ˜ธ์ŠคํŒ…, ๋ฒ„์ „ ๊ด€๋ฆฌ, ์ถ”๋ก  API](https://huggingface.co/pricing)๋„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. - -์˜ˆ์‹œ: -- [BERT๋กœ ๋งˆ์Šคํ‚น๋œ ๋‹จ์–ด ์™„์„ฑํ•˜๊ธฐ](https://huggingface.co/bert-base-uncased?text=Paris+is+the+%5BMASK%5D+of+France) -- [Electra๋ฅผ ์ด์šฉํ•œ ๊ฐœ์ฒด๋ช… ์ธ์‹](https://huggingface.co/dbmdz/electra-large-discriminator-finetuned-conll03-english?text=My+name+is+Sarah+and+I+live+in+London+city) -- [GPT-2๋กœ ํ…์ŠคํŠธ ์ƒ์„ฑํ•˜๊ธฐ](https://huggingface.co/gpt2?text=A+long+time+ago%2C+) -- [RoBERTa๋กœ ์ž์—ฐ์–ด ์ถ”๋ก ํ•˜๊ธฐ](https://huggingface.co/roberta-large-mnli?text=The+dog+was+lost.+Nobody+lost+any+animal) -- [BART๋ฅผ ์ด์šฉํ•œ ์š”์•ฝ](https://huggingface.co/facebook/bart-large-cnn?text=The+tower+is+324+metres+%281%2C063+ft%29+tall%2C+about+the+same+height+as+an+81-storey+building%2C+and+the+tallest+structure+in+Paris.+Its+base+is+square%2C+measuring+125+metres+%28410+ft%29+on+each+side.+During+its+construction%2C+the+Eiffel+Tower+surpassed+the+Washington+Monument+to+become+the+tallest+man-made+structure+in+the+world%2C+a+title+it+held+for+41+years+until+the+Chrysler+Building+in+New+York+City+was+finished+in+1930.+It+was+the+first+structure+to+reach+a+height+of+300+metres.+Due+to+the+addition+of+a+broadcasting+aerial+at+the+top+of+the+tower+in+1957%2C+it+is+now+taller+than+the+Chrysler+Building+by+5.2+metres+%2817+ft%29.+Excluding+transmitters%2C+the+Eiffel+Tower+is+the+second+tallest+free-standing+structure+in+France+after+the+Millau+Viaduct) -- [DistilBERT๋ฅผ ์ด์šฉํ•œ ์งˆ๋ฌธ 
๋‹ต๋ณ€](https://huggingface.co/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species) -- [T5๋กœ ๋ฒˆ์—ญํ•˜๊ธฐ](https://huggingface.co/t5-base?text=My+name+is+Wolfgang+and+I+live+in+Berlin) - -**[Transformer์™€ ๊ธ€์“ฐ๊ธฐ](https://transformer.huggingface.co)** ๋Š” ์ด ์ €์žฅ์†Œ์˜ ํ…์ŠคํŠธ ์ƒ์„ฑ ๋Šฅ๋ ฅ์— ๊ด€ํ•œ Hugging Face ํŒ€์˜ ๊ณต์‹ ๋ฐ๋ชจ์ž…๋‹ˆ๋‹ค. - -## Hugging Face ํŒ€์˜ ์ปค์Šคํ…€ ์ง€์›์„ ์›ํ•œ๋‹ค๋ฉด - - - HuggingFace Expert Acceleration Program -
- -## ํ€ต ํˆฌ์–ด - -์›ํ•˜๋Š” ํ…์ŠคํŠธ์— ๋ฐ”๋กœ ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก, ์šฐ๋ฆฌ๋Š” `pipeline` API๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. Pipeline์€ ์‚ฌ์ „ํ•™์Šต ๋ชจ๋ธ๊ณผ ๊ทธ ๋ชจ๋ธ์„ ํ•™์Šตํ•  ๋•Œ ์ ์šฉํ•œ ์ „์ฒ˜๋ฆฌ ๋ฐฉ์‹์„ ํ•˜๋‚˜๋กœ ํ•ฉ์นฉ๋‹ˆ๋‹ค. ๋‹ค์Œ์€ ๊ธ์ •์ ์ธ ํ…์ŠคํŠธ์™€ ๋ถ€์ •์ ์ธ ํ…์ŠคํŠธ๋ฅผ ๋ถ„๋ฅ˜ํ•˜๊ธฐ ์œ„ํ•ด pipeline์„ ์‚ฌ์šฉํ•œ ๊ฐ„๋‹จํ•œ ์˜ˆ์‹œ์ž…๋‹ˆ๋‹ค: - -```python ->>> from transformers import pipeline - -# Allocate a pipeline for sentiment-analysis ->>> classifier = pipeline('sentiment-analysis') ->>> classifier('We are very happy to introduce pipeline to the transformers repository.') -[{'label': 'POSITIVE', 'score': 0.9996980428695679}] -``` - -์ฝ”๋“œ์˜ ๋‘๋ฒˆ์งธ ์ค„์€ pipeline์ด ์‚ฌ์šฉํ•˜๋Š” ์‚ฌ์ „ํ•™์Šต ๋ชจ๋ธ์„ ๋‹ค์šด๋กœ๋“œํ•˜๊ณ  ์บ์‹œ๋กœ ์ €์žฅํ•ฉ๋‹ˆ๋‹ค. ์„ธ๋ฒˆ์งธ ์ค„์—์„  ๊ทธ ๋ชจ๋ธ์ด ์ฃผ์–ด์ง„ ํ…์ŠคํŠธ๋ฅผ ํ‰๊ฐ€ํ•ฉ๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ ๋ชจ๋ธ์€ 99.97%์˜ ํ™•๋ฅ ๋กœ ํ…์ŠคํŠธ๊ฐ€ ๊ธ์ •์ ์ด๋ผ๊ณ  ํ‰๊ฐ€ํ–ˆ์Šต๋‹ˆ๋‹ค. - -๋งŽ์€ NLP ๊ณผ์ œ๋“ค์„ `pipeline`์œผ๋กœ ๋ฐ”๋กœ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, ์งˆ๋ฌธ๊ณผ ๋ฌธ๋งฅ์ด ์ฃผ์–ด์ง€๋ฉด ์†์‰ฝ๊ฒŒ ๋‹ต๋ณ€์„ ์ถ”์ถœํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค: - -``` python ->>> from transformers import pipeline - -# Allocate a pipeline for question-answering ->>> question_answerer = pipeline('question-answering') ->>> question_answerer({ -... 'question': 'What is the name of the repository ?', -... 'context': 'Pipeline has been included in the huggingface/transformers repository' -... }) -{'score': 0.30970096588134766, 'start': 34, 'end': 58, 'answer': 'huggingface/transformers'} - -``` - -๋‹ต๋ณ€๋ฟ๋งŒ ์•„๋‹ˆ๋ผ, ์—ฌ๊ธฐ์— ์‚ฌ์šฉ๋œ ์‚ฌ์ „ํ•™์Šต ๋ชจ๋ธ์€ ํ™•์‹ ๋„์™€ ํ† ํฌ๋‚˜์ด์ฆˆ๋œ ๋ฌธ์žฅ ์† ๋‹ต๋ณ€์˜ ์‹œ์ž‘์ , ๋์ ๊นŒ์ง€ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค. [์ด ํŠœํ† ๋ฆฌ์–ผ](https://huggingface.co/docs/transformers/task_summary)์—์„œ `pipeline` API๊ฐ€ ์ง€์›ํ•˜๋Š” ๋‹ค์–‘ํ•œ ๊ณผ์ œ๋ฅผ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. - -์ฝ”๋“œ 3์ค„๋กœ ์›ํ•˜๋Š” ๊ณผ์ œ์— ๋งž๊ฒŒ ์‚ฌ์ „ํ•™์Šต ๋ชจ๋ธ์„ ๋‹ค์šด๋กœ๋“œ ๋ฐ›๊ณ  ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋‹ค์Œ์€ PyTorch ๋ฒ„์ „์ž…๋‹ˆ๋‹ค: -```python ->>> from transformers import AutoTokenizer, AutoModel - ->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased") ->>> model = AutoModel.from_pretrained("bert-base-uncased") - ->>> inputs = tokenizer("Hello world!", return_tensors="pt") ->>> outputs = model(**inputs) -``` -๋‹ค์Œ์€ TensorFlow ๋ฒ„์ „์ž…๋‹ˆ๋‹ค: -```python ->>> from transformers import AutoTokenizer, TFAutoModel - ->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased") ->>> model = TFAutoModel.from_pretrained("bert-base-uncased") - ->>> inputs = tokenizer("Hello world!", return_tensors="tf") ->>> outputs = model(**inputs) -``` - -ํ† ํฌ๋‚˜์ด์ €๋Š” ์‚ฌ์ „ํ•™์Šต ๋ชจ๋ธ์˜ ๋ชจ๋“  ์ „์ฒ˜๋ฆฌ๋ฅผ ์ฑ…์ž„์ง‘๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  (์œ„์˜ ์˜ˆ์‹œ์ฒ˜๋Ÿผ) 1๊ฐœ์˜ ์ŠคํŠธ๋ง์ด๋‚˜ ๋ฆฌ์ŠคํŠธ๋„ ์ฒ˜๋ฆฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ† ํฌ๋‚˜์ด์ €๋Š” ๋”•์…”๋„ˆ๋ฆฌ๋ฅผ ๋ฐ˜ํ™˜ํ•˜๋Š”๋ฐ, ์ด๋Š” ๋‹ค์šด์ŠคํŠธ๋ฆผ ์ฝ”๋“œ์— ์‚ฌ์šฉํ•˜๊ฑฐ๋‚˜ ์–ธํŒจํ‚น ์—ฐ์‚ฐ์ž ** ๋ฅผ ์ด์šฉํ•ด ๋ชจ๋ธ์— ๋ฐ”๋กœ ์ „๋‹ฌํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. - -๋ชจ๋ธ ์ž์ฒด๋Š” ์ผ๋ฐ˜์ ์œผ๋กœ ์‚ฌ์šฉ๋˜๋Š” [Pytorch `nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module)๋‚˜ [TensorFlow `tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model)์ž…๋‹ˆ๋‹ค. 
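
To make the two points above concrete (the tokenizer returning an ordinary dictionary, and the model being a regular PyTorch module), here is a minimal sketch extending the PyTorch example from the quick tour. The dictionary keys and the tensor shape shown in the comments are what `bert-base-uncased` is expected to produce for this input; they are stated as assumptions rather than captured output.

```python
>>> import torch
>>> from transformers import AutoTokenizer, AutoModel

>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = AutoModel.from_pretrained("bert-base-uncased")

# The tokenizer output behaves like a plain dict; for BERT it carries the
# input_ids, token_type_ids and attention_mask tensors.
>>> inputs = tokenizer("Hello world!", return_tensors="pt")
>>> sorted(inputs.keys())
['attention_mask', 'input_ids', 'token_type_ids']

# The model is an ordinary torch.nn.Module, so standard PyTorch idioms apply.
>>> isinstance(model, torch.nn.Module)
True

# The forward pass returns an output object; last_hidden_state is expected to
# have shape (batch_size, sequence_length, hidden_size).
>>> with torch.no_grad():
...     outputs = model(**inputs)
>>> outputs.last_hidden_state.shape
torch.Size([1, 5, 768])
```

An analogous check works for the TensorFlow version above, where the model is a standard `tf.keras.Model`.
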
[์ด ํŠœํ† ๋ฆฌ์–ผ](https://huggingface.co/transformers/training.html)์€ ์ด๋Ÿฌํ•œ ๋ชจ๋ธ์„ ํ‘œ์ค€์ ์ธ PyTorch๋‚˜ TensorFlow ํ•™์Šต ๊ณผ์ •์—์„œ ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•, ๋˜๋Š” ์ƒˆ๋กœ์šด ๋ฐ์ดํ„ฐ๋กœ fine-tuneํ•˜๊ธฐ ์œ„ํ•ด `Trainer` API๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์„ค๋ช…ํ•ด์ค๋‹ˆ๋‹ค. - -## ์™œ transformers๋ฅผ ์‚ฌ์šฉํ•ด์•ผ ํ• ๊นŒ์š”? - -1. ์†์‰ฝ๊ฒŒ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ์ตœ์ฒจ๋‹จ ๋ชจ๋ธ: - - NLU์™€ NLG ๊ณผ์ œ์—์„œ ๋›ฐ์–ด๋‚œ ์„ฑ๋Šฅ์„ ๋ณด์ž…๋‹ˆ๋‹ค. - - ๊ต์œก์ž ์‹ค๋ฌด์ž์—๊ฒŒ ์ง„์ž… ์žฅ๋ฒฝ์ด ๋‚ฎ์Šต๋‹ˆ๋‹ค. - - 3๊ฐœ์˜ ํด๋ž˜์Šค๋งŒ ๋ฐฐ์šฐ๋ฉด ๋ฐ”๋กœ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. - - ํ•˜๋‚˜์˜ API๋กœ ๋ชจ๋“  ์‚ฌ์ „ํ•™์Šต ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. - -1. ๋” ์ ์€ ๊ณ„์‚ฐ ๋น„์šฉ, ๋” ์ ์€ ํƒ„์†Œ ๋ฐœ์ž๊ตญ: - - ์—ฐ๊ตฌ์ž๋“ค์€ ๋ชจ๋ธ์„ ๊ณ„์† ๋‹ค์‹œ ํ•™์Šต์‹œํ‚ค๋Š” ๋Œ€์‹  ํ•™์Šต๋œ ๋ชจ๋ธ์„ ๊ณต์œ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. - - ์‹ค๋ฌด์ž๋“ค์€ ํ•™์Šต์— ํ•„์š”ํ•œ ์‹œ๊ฐ„๊ณผ ๋น„์šฉ์„ ์ ˆ์•ฝํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. - - ์ˆ˜์‹ญ๊ฐœ์˜ ๋ชจ๋ธ ๊ตฌ์กฐ, 2,000๊ฐœ ์ด์ƒ์˜ ์‚ฌ์ „ํ•™์Šต ๋ชจ๋ธ, 100๊ฐœ ์ด์ƒ์˜ ์–ธ์–ด๋กœ ํ•™์Šต๋œ ๋ชจ๋ธ ๋“ฑ. - -1. ๋ชจ๋ธ์˜ ๊ฐ ์ƒ์• ์ฃผ๊ธฐ์— ์ ํ•ฉํ•œ ํ”„๋ ˆ์ž„์›Œํฌ: - - ์ฝ”๋“œ 3์ค„๋กœ ์ตœ์ฒจ๋‹จ ๋ชจ๋ธ์„ ํ•™์Šตํ•˜์„ธ์š”. - - ์ž์œ ๋กญ๊ฒŒ ๋ชจ๋ธ์„ TF2.0๋‚˜ PyTorch ํ”„๋ ˆ์ž„์›Œํฌ๋กœ ๋ณ€ํ™˜ํ•˜์„ธ์š”. - - ํ•™์Šต, ํ‰๊ฐ€, ๊ณต๊ฐœ ๋“ฑ ๊ฐ ๋‹จ๊ณ„์— ๋งž๋Š” ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์›ํ•˜๋Š”๋Œ€๋กœ ์„ ํƒํ•˜์„ธ์š”. - -1. ํ•„์š”ํ•œ ๋Œ€๋กœ ๋ชจ๋ธ์ด๋‚˜ ์˜ˆ์‹œ๋ฅผ ์ปค์Šคํ„ฐ๋งˆ์ด์ฆˆํ•˜์„ธ์š”: - - ์šฐ๋ฆฌ๋Š” ์ €์ž๊ฐ€ ๊ณต๊ฐœํ•œ ๊ฒฐ๊ณผ๋ฅผ ์žฌํ˜„ํ•˜๊ธฐ ์œ„ํ•ด ๊ฐ ๋ชจ๋ธ ๊ตฌ์กฐ์˜ ์˜ˆ์‹œ๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. - - ๋ชจ๋ธ ๋‚ด๋ถ€ ๊ตฌ์กฐ๋Š” ๊ฐ€๋Šฅํ•œ ์ผ๊ด€์ ์œผ๋กœ ๊ณต๊ฐœ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค. - - ๋น ๋ฅธ ์‹คํ—˜์„ ์œ„ํ•ด ๋ชจ๋ธ ํŒŒ์ผ์€ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์™€ ๋…๋ฆฝ์ ์œผ๋กœ ์‚ฌ์šฉ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. - -## ์™œ transformers๋ฅผ ์‚ฌ์šฉํ•˜์ง€ ๋ง์•„์•ผ ํ• ๊นŒ์š”? - -- ์ด ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋Š” ์‹ ๊ฒฝ๋ง ๋ธ”๋ก์„ ๋งŒ๋“ค๊ธฐ ์œ„ํ•œ ๋ชจ๋“ˆ์ด ์•„๋‹™๋‹ˆ๋‹ค. ์—ฐ๊ตฌ์ž๋“ค์ด ์—ฌ๋Ÿฌ ํŒŒ์ผ์„ ์‚ดํŽด๋ณด์ง€ ์•Š๊ณ  ๋ฐ”๋กœ ๊ฐ ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก, ๋ชจ๋ธ ํŒŒ์ผ ์ฝ”๋“œ์˜ ์ถ”์ƒํ™” ์ˆ˜์ค€์„ ์ ์ •ํ•˜๊ฒŒ ์œ ์ง€ํ–ˆ์Šต๋‹ˆ๋‹ค. -- ํ•™์Šต API๋Š” ๋ชจ๋“  ๋ชจ๋ธ์— ์ ์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก ๋งŒ๋“ค์–ด์ง€์ง„ ์•Š์•˜์ง€๋งŒ, ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๊ฐ€ ์ œ๊ณตํ•˜๋Š” ๋ชจ๋ธ๋“ค์— ์ ์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก ์ตœ์ ํ™”๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์ผ๋ฐ˜์ ์ธ ๋จธ์‹  ๋Ÿฌ๋‹์„ ์œ„ํ•ด์„ , ๋‹ค๋ฅธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์‚ฌ์šฉํ•˜์„ธ์š”. -- ๊ฐ€๋Šฅํ•œ ๋งŽ์€ ์‚ฌ์šฉ ์˜ˆ์‹œ๋ฅผ ๋ณด์—ฌ๋“œ๋ฆฌ๊ณ  ์‹ถ์–ด์„œ, [์˜ˆ์‹œ ํด๋”](https://github.com/huggingface/transformers/tree/main/examples)์˜ ์Šคํฌ๋ฆฝํŠธ๋ฅผ ์ค€๋น„ํ–ˆ์Šต๋‹ˆ๋‹ค. ์ด ์Šคํฌ๋ฆฝํŠธ๋“ค์„ ์ˆ˜์ • ์—†์ด ํŠน์ •ํ•œ ๋ฌธ์ œ์— ๋ฐ”๋กœ ์ ์šฉํ•˜์ง€ ๋ชปํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ•„์š”์— ๋งž๊ฒŒ ์ผ๋ถ€ ์ฝ”๋“œ๋ฅผ ์ˆ˜์ •ํ•ด์•ผ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. - -## ์„ค์น˜ - -### pip๋กœ ์„ค์น˜ํ•˜๊ธฐ - -์ด ์ €์žฅ์†Œ๋Š” Python 3.8+, Flax 0.4.1+, PyTorch 1.10+, TensorFlow 2.6+์—์„œ ํ…Œ์ŠคํŠธ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. - -[๊ฐ€์ƒ ํ™˜๊ฒฝ](https://docs.python.org/3/library/venv.html)์— ๐Ÿค— Transformers๋ฅผ ์„ค์น˜ํ•˜์„ธ์š”. Python ๊ฐ€์ƒ ํ™˜๊ฒฝ์— ์ต์ˆ™ํ•˜์ง€ ์•Š๋‹ค๋ฉด, [์‚ฌ์šฉ์ž ๊ฐ€์ด๋“œ](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/)๋ฅผ ํ™•์ธํ•˜์„ธ์š”. - -์šฐ์„ , ์‚ฌ์šฉํ•  Python ๋ฒ„์ „์œผ๋กœ ๊ฐ€์ƒ ํ™˜๊ฒฝ์„ ๋งŒ๋“ค๊ณ  ์‹คํ–‰ํ•˜์„ธ์š”. - -๊ทธ ๋‹ค์Œ, Flax, PyTorch, TensorFlow ์ค‘ ์ ์–ด๋„ ํ•˜๋‚˜๋Š” ์„ค์น˜ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. 
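
As a rough sketch of the installation steps just described, assuming a Unix-like shell and a CPU-only PyTorch build as the example backend (any one of Flax, PyTorch or TensorFlow works):

```bash
# Create and activate a virtual environment with the Python version you intend to use.
python -m venv .env
source .env/bin/activate

# Install at least one backend (PyTorch shown here), then install Transformers itself.
pip install torch
pip install transformers
```

The platform-specific pages linked below give the exact command for GPU builds and other platforms.
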
-ํ”Œ๋žซํผ์— ๋งž๋Š” ์„ค์น˜ ๋ช…๋ น์–ด๋ฅผ ํ™•์ธํ•˜๊ธฐ ์œ„ํ•ด [TensorFlow ์„ค์น˜ ํŽ˜์ด์ง€](https://www.tensorflow.org/install/), [PyTorch ์„ค์น˜ ํŽ˜์ด์ง€](https://pytorch.org/get-started/locally/#start-locally), [Flax ์„ค์น˜ ํŽ˜์ด์ง€](https://github.com/google/flax#quick-install)๋ฅผ ํ™•์ธํ•˜์„ธ์š”. - -์ด๋“ค ์ค‘ ์ ์–ด๋„ ํ•˜๋‚˜๊ฐ€ ์„ค์น˜๋˜์—ˆ๋‹ค๋ฉด, ๐Ÿค— Transformers๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด pip์„ ์ด์šฉํ•ด ์„ค์น˜ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค: - -```bash -pip install transformers -``` - -์˜ˆ์‹œ๋“ค์„ ์ฒดํ—˜ํ•ด๋ณด๊ณ  ์‹ถ๊ฑฐ๋‚˜, ์ตœ์ตœ์ตœ์ฒจ๋‹จ ์ฝ”๋“œ๋ฅผ ์›ํ•˜๊ฑฐ๋‚˜, ์ƒˆ๋กœ์šด ๋ฒ„์ „์ด ๋‚˜์˜ฌ ๋•Œ๊นŒ์ง€ ๊ธฐ๋‹ค๋ฆด ์ˆ˜ ์—†๋‹ค๋ฉด [๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์†Œ์Šค์—์„œ ๋ฐ”๋กœ ์„ค์น˜](https://huggingface.co/docs/transformers/installation#installing-from-source)ํ•˜์…”์•ผ ํ•ฉ๋‹ˆ๋‹ค. - -### conda๋กœ ์„ค์น˜ํ•˜๊ธฐ - -Transformers ๋ฒ„์ „ v4.0.0๋ถ€ํ„ฐ, conda ์ฑ„๋„์ด ์ƒ๊ฒผ์Šต๋‹ˆ๋‹ค: `huggingface`. - -๐Ÿค— Transformers๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด conda๋กœ ์„ค์น˜ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค: - -```shell script -conda install -c huggingface transformers -``` - -Flax, PyTorch, TensorFlow ์„ค์น˜ ํŽ˜์ด์ง€์—์„œ ์ด๋“ค์„ conda๋กœ ์„ค์น˜ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ํ™•์ธํ•˜์„ธ์š”. - -## ๋ชจ๋ธ ๊ตฌ์กฐ - -**๐Ÿค— Transformers๊ฐ€ ์ œ๊ณตํ•˜๋Š” [๋ชจ๋“  ๋ชจ๋ธ ์ฒดํฌํฌ์ธํŠธ](https://huggingface.co/models)** ๋Š” huggingface.co [๋ชจ๋ธ ํ—ˆ๋ธŒ](https://huggingface.co)์— ์™„๋ฒฝํžˆ ์—ฐ๋™๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค. [๊ฐœ์ธ](https://huggingface.co/users)๊ณผ [๊ธฐ๊ด€](https://huggingface.co/organizations)์ด ๋ชจ๋ธ ํ—ˆ๋ธŒ์— ์ง์ ‘ ์—…๋กœ๋“œํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. - -ํ˜„์žฌ ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•œ ๋ชจ๋ธ ์ฒดํฌํฌ์ธํŠธ์˜ ๊ฐœ์ˆ˜: ![](https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen) - -๐Ÿค— Transformers๋Š” ๋‹ค์Œ ๋ชจ๋ธ๋“ค์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค (๊ฐ ๋ชจ๋ธ์˜ ์š”์•ฝ์€ [์—ฌ๊ธฐ](https://huggingface.co/docs/transformers/model_summary)์„œ ํ™•์ธํ•˜์„ธ์š”): - -1. **[ALBERT](https://huggingface.co/docs/transformers/model_doc/albert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. -1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (Google Research ์—์„œ ์ œ๊ณต)์€ Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.์˜ [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (from BAAI) released with the paper [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell. -1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (from MIT) released with the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Yuan Gong, Yu-An Chung, James Glass. -1. **[Autoformer](https://huggingface.co/docs/transformers/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long. -1. 
**[Bark](https://huggingface.co/docs/transformers/model_doc/bark)** (from Suno) released in the repository [suno-ai/bark](https://github.com/suno-ai/bark) by Suno AI team. -1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer. -1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (from ร‰cole polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis. -1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen. -1. **[BEiT](https://huggingface.co/docs/transformers/model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong, Furu Wei. -1. **[BERT](https://huggingface.co/docs/transformers/model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. -1. **[BERT For Sequence Generation](https://huggingface.co/docs/transformers/model_doc/bert-generation)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn. -1. **[BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen. -1. **[BigBird-Pegasus](https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. -1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. -1. **[BioGpt](https://huggingface.co/docs/transformers/model_doc/biogpt)** (from Microsoft Research AI4Science) released with the paper [BioGPT: generative pre-trained transformer for biomedical text generation and mining](https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbac409/6713511?guestAccessKey=a66d9b5d-4f83-4017-bb52-405815c907b9) by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu. -1. 
**[BiT](https://huggingface.co/docs/transformers/model_doc/bit)** (from Google AI) released with the paper [Big Transfer (BiT): General Visual Representation Learning](https://arxiv.org/abs/1912.11370) by Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby. -1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston. -1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston. -1. **[BLIP](https://huggingface.co/docs/transformers/model_doc/blip)** (from Salesforce) released with the paper [BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation](https://arxiv.org/abs/2201.12086) by Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi. -1. **[BLIP-2](https://huggingface.co/docs/transformers/model_doc/blip-2)** (Salesforce ์—์„œ ์ œ๊ณต)์€ Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi.์˜ [BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models](https://arxiv.org/abs/2301.12597)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[BLOOM](https://huggingface.co/docs/transformers/model_doc/bloom)** (from BigScience workshop) released by the [BigScience Workshop](https://bigscience.huggingface.co/). -1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (Alexa ์—์„œ) Adrian de Wynter and Daniel J. Perry ์˜ [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[BridgeTower](https://huggingface.co/docs/transformers/model_doc/bridgetower)** (from Harbin Institute of Technology/Microsoft Research Asia/Intel Labs) released with the paper [BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning](https://arxiv.org/abs/2206.08657) by Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan. -1. **[BROS](https://huggingface.co/docs/transformers/model_doc/bros)** (NAVER CLOVA ์—์„œ ์ œ๊ณต)์€ Teakgyu Hong, Donghyun Kim, Mingi Ji, Wonseok Hwang, Daehyun Nam, Sungrae Park.์˜ [BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents](https://arxiv.org/abs/2108.04539)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (Google Research ์—์„œ) Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel ์˜ [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (Inria/Facebook/Sorbonne ์—์„œ) Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suรกrez*, Yoann Dupont, Laurent Romary, ร‰ric Villemonte de la Clergerie, Djamรฉ Seddah and Benoรฎt Sagot ์˜ [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. 
**[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (Google Research ์—์„œ) Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting ์˜ [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Chinese-CLIP](https://huggingface.co/docs/transformers/model_doc/chinese_clip)** (OFA-Sys ์—์„œ) An Yang, Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou, Chang Zhou ์˜ [Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese](https://arxiv.org/abs/2211.01335) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[CLAP](https://huggingface.co/docs/transformers/model_doc/clap)** (LAION-AI ์—์„œ ์ œ๊ณต)์€ Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov.์˜ [Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation](https://arxiv.org/abs/2211.06687)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (OpenAI ์—์„œ) Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever ์˜ [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[CLIPSeg](https://huggingface.co/docs/transformers/model_doc/clipseg)** (University of Gรถttingen ์—์„œ) Timo Lรผddecke and Alexander Ecker ์˜ [Image Segmentation Using Text and Image Prompts](https://arxiv.org/abs/2112.10003) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[CLVP](https://huggingface.co/docs/transformers/model_doc/clvp)** released with the paper [Better speech synthesis through scaling](https://arxiv.org/abs/2305.07243) by James Betker. -1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (Salesforce ์—์„œ) Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong ์˜ [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[CodeLlama](https://huggingface.co/docs/transformers/model_doc/llama_code)** (MetaAI ์—์„œ ์ œ๊ณต)์€ Baptiste Roziรจre, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jรฉrรฉmy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Dรฉfossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve.์˜ [Code Llama: Open Foundation Models for Code](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Conditional DETR](https://huggingface.co/docs/transformers/model_doc/conditional_detr)** (Microsoft Research Asia ์—์„œ) Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang ์˜ [Conditional DETR for Fast Training Convergence](https://arxiv.org/abs/2108.06152) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. 
**[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (YituTech ์—์„œ) Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan ์˜ [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (Facebook AI ์—์„œ) Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie ์˜ [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[ConvNeXTV2](https://huggingface.co/docs/transformers/model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie. -1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (Tsinghua University ์—์„œ) Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun ์˜ [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[CPM-Ant](https://huggingface.co/docs/transformers/model_doc/cpmant)** (from OpenBMB) released by the [OpenBMB](https://www.openbmb.org/). -1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (Salesforce ์—์„œ) Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher ์˜ [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (Microsoft ์—์„œ) Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang ์˜ [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (Facebook ์—์„œ) Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli ์˜ [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (Microsoft ์—์„œ) Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen ์˜ [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (Microsoft ์—์„œ) Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen ์˜ [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. 
**[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (Berkeley/Facebook/Google ์—์„œ) Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch ์˜ [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Deformable DETR](https://huggingface.co/docs/transformers/model_doc/deformable_detr)** (SenseTime Research ์—์„œ) Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai ์˜ [Deformable DETR: Deformable Transformers for End-to-End Object Detection](https://arxiv.org/abs/2010.04159) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (Facebook ์—์„œ) Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervรฉ Jรฉgou ์˜ [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[DePlot](https://huggingface.co/docs/transformers/model_doc/deplot)** (Google AI ์—์„œ ์ œ๊ณต)์€ Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun.์˜ [DePlot: One-shot visual language reasoning by plot-to-table translation](https://arxiv.org/abs/2212.10505)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[DETA](https://huggingface.co/docs/transformers/model_doc/deta)** (The University of Texas at Austin ์—์„œ ์ œ๊ณต)์€ Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krรคhenbรผhl.์˜ [NMS Strikes Back](https://arxiv.org/abs/2212.06137)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (Facebook ์—์„œ) Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko ์˜ [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (Microsoft Research ์—์„œ) Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan ์˜ [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[DiNAT](https://huggingface.co/docs/transformers/model_doc/dinat)** (SHI Labs ์—์„œ) Ali Hassani and Humphrey Shi ์˜ [Dilated Neighborhood Attention Transformer](https://arxiv.org/abs/2209.15001) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[DINOv2](https://huggingface.co/docs/transformers/model_doc/dinov2)** (Meta AI ์—์„œ ์ œ๊ณต)์€ Maxime Oquab, Timothรฉe Darcet, Thรฉo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervรฉ Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski.์˜ [DINOv2: Learning Robust Visual Features without Supervision](https://arxiv.org/abs/2304.07193)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (HuggingFace ์—์„œ) Victor Sanh, Lysandre Debut and Thomas Wolf. 
The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/distillation) and a German version of DistilBERT ์˜ [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (Microsoft Research ์—์„œ) Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei ์˜ [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Donut](https://huggingface.co/docs/transformers/model_doc/donut)** (NAVER ์—์„œ) Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park ์˜ [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (Facebook ์—์„œ) Vladimir Karpukhin, Barlas OฤŸuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih ์˜ [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (Intel Labs ์—์„œ) Renรฉ Ranftl, Alexey Bochkovskiy, Vladlen Koltun ์˜ [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[EfficientFormer](https://huggingface.co/docs/transformers/model_doc/efficientformer)** (from Snap Research) released with the paper [EfficientFormer: Vision Transformers at MobileNetSpeed](https://arxiv.org/abs/2206.01191) by Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren. -1. **[EfficientNet](https://huggingface.co/docs/transformers/model_doc/efficientnet)** (from Google Brain) released with the paper [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946) by Mingxing Tan, Quoc V. Le. -1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (Google Research/Stanford University ์—์„œ) Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning ์˜ [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[EnCodec](https://huggingface.co/docs/transformers/model_doc/encodec)** (Meta AI ์—์„œ ์ œ๊ณต)์€ Alexandre Dรฉfossez, Jade Copet, Gabriel Synnaeve, Yossi Adi.์˜ [High Fidelity Neural Audio Compression](https://arxiv.org/abs/2210.13438)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (Google Research ์—์„œ) Sascha Rothe, Shashi Narayan, Aliaksei Severyn ์˜ [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. 
**[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (Baidu ์—์„œ) Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu ์˜ [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[ErnieM](https://huggingface.co/docs/transformers/model_doc/ernie_m)** (Baidu ์—์„œ ์ œ๊ณต)์€ Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang.์˜ [ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora](https://arxiv.org/abs/2012.15674)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[ESM](https://huggingface.co/docs/transformers/model_doc/esm)** (from Meta AI) are transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. **ESM-1v** was released with the paper [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648) by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. **ESM-2** was released with the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives. -1. **[Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)** (from Technology Innovation Institute) by Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme. -1. **[FLAN-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5)** (from Google AI) released in the repository [google-research/t5x](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints) by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei -1. **[FLAN-UL2](https://huggingface.co/docs/transformers/model_doc/flan-ul2)** (from Google AI) released in the repository [google-research/t5x](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-ul2-checkpoints) by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei -1. 
**[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loรฏc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoรฎt Crabbรฉ, Laurent Besacier, Didier Schwab. -1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela. -1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon. -1. **[FocalNet](https://huggingface.co/docs/transformers/model_doc/focalnet)** (from Microsoft Research) released with the paper [Focal Modulation Networks](https://arxiv.org/abs/2203.11926) by Jianwei Yang, Chunyuan Li, Xiyang Dai, Lu Yuan, Jianfeng Gao. -1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le. -1. **[Fuyu](https://huggingface.co/docs/transformers/model_doc/fuyu)** (from ADEPT) Rohan Bavishi, Erich Elsen, Curtis Hawthorne, Maxwell Nye, Augustus Odena, Arushi Somani, SaฤŸnak TaลŸฤฑrlar. ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๊ณต๊ฐœ [blog post](https://www.adept.ai/blog/fuyu-8b) -1. **[GIT](https://huggingface.co/docs/transformers/model_doc/git)** (from Microsoft Research) released with the paper [GIT: A Generative Image-to-text Transformer for Vision and Language](https://arxiv.org/abs/2205.14100) by Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang. -1. **[GLPN](https://huggingface.co/docs/transformers/model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim. -1. **[GPT](https://huggingface.co/docs/transformers/model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. -1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy. -1. **[GPT NeoX](https://huggingface.co/docs/transformers/model_doc/gpt_neox)** (EleutherAI ์—์„œ) Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbac ์˜ [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://arxiv.org/abs/2204.06745) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. 
**[GPT NeoX Japanese](https://huggingface.co/docs/transformers/model_doc/gpt_neox_japanese)** (from ABEJA) released by Shinya Otani, Takayoshi Makabe, Anuj Arora, and Kyo Hattori. -1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (OpenAI ์—์„œ) Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever** ์˜ [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki. -1. **[GPT-Sw3](https://huggingface.co/docs/transformers/model_doc/gpt-sw3)** (AI-Sweden ์—์„œ) Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey ร–hman, Fredrik Carlsson, Magnus Sahlgren. ์˜ [Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish](http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.376.pdf) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[GPTBigCode](https://huggingface.co/docs/transformers/model_doc/gpt_bigcode)** (BigCode ์—์„œ ์ œ๊ณต)์€ Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo Garcรญa del Rรญo, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra.์˜ [SantaCoder: don't reach for the stars!](https://arxiv.org/abs/2301.03988)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[GPTSAN-japanese](https://huggingface.co/docs/transformers/model_doc/gptsan-japanese)** released in the repository [tanreinama/GPTSAN](https://github.com/tanreinama/GPTSAN/blob/main/report/model.md) by Toshiyuki Sakamoto(tanreinama). -1. **[Graphormer](https://huggingface.co/docs/transformers/model_doc/graphormer)** (from Microsoft) Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu ์˜ [Do Transformers Really Perform Bad for Graph Representation?](https://arxiv.org/abs/2106.05234) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[GroupViT](https://huggingface.co/docs/transformers/model_doc/groupvit)** (UCSD, NVIDIA ์—์„œ) Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang ์˜ [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[HerBERT](https://huggingface.co/docs/transformers/model_doc/herbert)** (Allegro.pl, AGH University of Science and Technology ์—์„œ ์ œ๊ณต)์€ Piotr Rybak, Robert Mroczkowski, Janusz Tracz, Ireneusz Gawlik.์˜ [KLEJ: Comprehensive Benchmark for Polish Language Understanding](https://www.aclweb.org/anthology/2020.acl-main.111.pdf)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. 
**[Hubert](https://huggingface.co/docs/transformers/model_doc/hubert)** (Facebook ์—์„œ) Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed ์˜ [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[I-BERT](https://huggingface.co/docs/transformers/model_doc/ibert)** (Berkeley ์—์„œ) Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer ์˜ [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[IDEFICS](https://huggingface.co/docs/transformers/model_doc/idefics)** (from HuggingFace) released with the paper [OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents](https://huggingface.co/papers/2306.16527) by Hugo Laurenรงon, Lucile Saulnier, Lรฉo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh. -1. **[ImageGPT](https://huggingface.co/docs/transformers/model_doc/imagegpt)** (OpenAI ์—์„œ) Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever ์˜ [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Informer](https://huggingface.co/docs/transformers/model_doc/informer)** (from Beihang University, UC Berkeley, Rutgers University, SEDD Company) released with the paper [Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting](https://arxiv.org/abs/2012.07436) by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. -1. **[InstructBLIP](https://huggingface.co/docs/transformers/model_doc/instructblip)** (Salesforce ์—์„œ ์ œ๊ณต)์€ Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi.์˜ [InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning](https://arxiv.org/abs/2305.06500)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Jukebox](https://huggingface.co/docs/transformers/model_doc/jukebox)** (OpenAI ์—์„œ) Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever ์˜ [Jukebox: A Generative Model for Music](https://arxiv.org/pdf/2005.00341.pdf) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[KOSMOS-2](https://huggingface.co/docs/transformers/model_doc/kosmos-2)** (from Microsoft Research Asia) released with the paper [Kosmos-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/abs/2306.14824) by Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei. -1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (Microsoft Research Asia ์—์„œ) Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou ์˜ [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (Microsoft Research Asia ์—์„œ) Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou ์˜ [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. 
-1. **[LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)** (Microsoft Research Asia ์—์„œ) Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei ์˜ [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutxlm)** (Microsoft Research Asia ์—์„œ) Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei ์˜ [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[LED](https://huggingface.co/docs/transformers/model_doc/led)** (AllenAI ์—์„œ) Iz Beltagy, Matthew E. Peters, Arman Cohan ์˜ [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[LeViT](https://huggingface.co/docs/transformers/model_doc/levit)** (Meta AI ์—์„œ) Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervรฉ Jรฉgou, Matthijs Douze ์˜ [LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference](https://arxiv.org/abs/2104.01136) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[LiLT](https://huggingface.co/docs/transformers/model_doc/lilt)** (South China University of Technology ์—์„œ) Jiapeng Wang, Lianwen Jin, Kai Ding ์˜ [LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding](https://arxiv.org/abs/2202.13669) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[LLaMA](https://huggingface.co/docs/transformers/model_doc/llama)** (The FAIR team of Meta AI ์—์„œ ์ œ๊ณต)์€ Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothรฉe Lacroix, Baptiste Roziรจre, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample.์˜ [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Llama2](https://huggingface.co/docs/transformers/model_doc/llama2)** (The FAIR team of Meta AI ์—์„œ ์ œ๊ณต)์€ Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushka rMishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing EllenTan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom..์˜ [Llama2: Open Foundation and Fine-Tuned Chat Models](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/XXX)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. 
**[LLaVa](https://huggingface.co/docs/transformers/model_doc/llava)** (Microsoft Research & University of Wisconsin-Madison ์—์„œ ์ œ๊ณต)์€ Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee.์˜ [Visual Instruction Tuning](https://arxiv.org/abs/2304.08485)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Longformer](https://huggingface.co/docs/transformers/model_doc/longformer)** (AllenAI ์—์„œ) Iz Beltagy, Matthew E. Peters, Arman Cohan ์˜ [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[LongT5](https://huggingface.co/docs/transformers/model_doc/longt5)** (Google AI ์—์„œ) Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang ์˜ [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[LUKE](https://huggingface.co/docs/transformers/model_doc/luke)** (Studio Ousia ์—์„œ) Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto ์˜ [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (UNC Chapel Hill ์—์„œ) Hao Tan and Mohit Bansal ์˜ [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[M-CTC-T](https://huggingface.co/docs/transformers/model_doc/mctct)** (Facebook ์—์„œ) Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert ์˜ [Pseudo-Labeling For Massively Multilingual Speech Recognition](https://arxiv.org/abs/2111.00161) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100)** (Facebook ์—์„œ) Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin ์˜ [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[MADLAD-400](https://huggingface.co/docs/transformers/model_doc/madlad-400)** (from Google) released with the paper [MADLAD-400: A Multilingual And Document-Level Large Audited Dataset](https://arxiv.org/abs/2309.04662) by Sneha Kudugunta, Isaac Caswell, Biao Zhang, Xavier Garcia, Christopher A. Choquette-Choo, Katherine Lee, Derrick Xin, Aditya Kusupati, Romi Stella, Ankur Bapna, Orhan Firat. -1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jรถrg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team. -1. **[MarkupLM](https://huggingface.co/docs/transformers/model_doc/markuplm)** (Microsoft Research Asia ์—์„œ) Junlong Li, Yiheng Xu, Lei Cui, Furu Wei ์˜ [MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding](https://arxiv.org/abs/2110.08518) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Mask2Former](https://huggingface.co/docs/transformers/model_doc/mask2former)** (FAIR and UIUC ์—์„œ ์ œ๊ณต)์€ Bowen Cheng, Ishan Misra, Alexander G. 
Schwing, Alexander Kirillov, Rohit Girdhar.์˜ [Masked-attention Mask Transformer for Universal Image Segmentation](https://arxiv.org/abs/2112.01527)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[MaskFormer](https://huggingface.co/docs/transformers/model_doc/maskformer)** (Meta and UIUC ์—์„œ) Bowen Cheng, Alexander G. Schwing, Alexander Kirillov ์˜ [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[MatCha](https://huggingface.co/docs/transformers/model_doc/matcha)** (Google AI ์—์„œ ์ œ๊ณต)์€ Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, Julian Martin Eisenschlos.์˜ [MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering](https://arxiv.org/abs/2212.09662)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[mBART](https://huggingface.co/docs/transformers/model_doc/mbart)** (Facebook ์—์„œ) Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer ์˜ [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[mBART-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (Facebook ์—์„œ) Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan ์˜ [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[MEGA](https://huggingface.co/docs/transformers/model_doc/mega)** (Facebook ์—์„œ ์ œ๊ณต)์€ Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, and Luke Zettlemoyer.์˜ [Mega: Moving Average Equipped Gated Attention](https://arxiv.org/abs/2209.10655)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron-bert)** (NVIDIA ์—์„œ) Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro ์˜ [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (NVIDIA ์—์„œ) Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro ์˜ [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[MGP-STR](https://huggingface.co/docs/transformers/model_doc/mgp-str)** (Alibaba Research ์—์„œ ์ œ๊ณต)์€ Peng Wang, Cheng Da, and Cong Yao.์˜ [Multi-Granularity Prediction for Scene Text Recognition](https://arxiv.org/abs/2209.03592)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)** (from Mistral AI) by The Mistral AI team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lรฉlio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothรฉe Lacroix, William El Sayed.. -1. 
**[Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lรฉlio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothรฉe Lacroix, William El Sayed. -1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (Studio Ousia ์—์„œ) Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka ์˜ [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[MMS](https://huggingface.co/docs/transformers/model_doc/mms)** (Facebook ์—์„œ ์ œ๊ณต)์€ Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli.์˜ [Scaling Speech Technology to 1,000+ Languages](https://arxiv.org/abs/2305.13516)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (CMU/Google Brain ์—์„œ) Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou ์˜ [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[MobileNetV1](https://huggingface.co/docs/transformers/model_doc/mobilenet_v1)** (Google Inc. ์—์„œ) Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam ์˜ [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[MobileNetV2](https://huggingface.co/docs/transformers/model_doc/mobilenet_v2)** (Google Inc. ์—์„œ) Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen ์˜ [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[MobileViT](https://huggingface.co/docs/transformers/model_doc/mobilevit)** (Apple ์—์„œ) Sachin Mehta and Mohammad Rastegari ์˜ [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/abs/2110.02178) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[MobileViTV2](https://huggingface.co/docs/transformers/model_doc/mobilevitv2)** (Apple ์—์„œ ์ œ๊ณต)์€ Sachin Mehta and Mohammad Rastegari.์˜ [Separable Self-attention for Mobile Vision Transformers](https://arxiv.org/abs/2206.02680)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (Microsoft Research ์—์„œ) Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu ์˜ [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[MPT](https://huggingface.co/docs/transformers/model_doc/mpt)** (MosaiML ์—์„œ ์ œ๊ณต)์€ the MosaicML NLP Team.์˜ [llm-foundry](https://github.com/mosaicml/llm-foundry/)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. 
**[MRA](https://huggingface.co/docs/transformers/model_doc/mra)** (the University of Wisconsin - Madison ์—์„œ ์ œ๊ณต)์€ Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, Vikas Singh.์˜ [Multi Resolution Analysis (MRA)](https://arxiv.org/abs/2207.10284) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (Google AI ์—์„œ) Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel ์˜ [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[MusicGen](https://huggingface.co/docs/transformers/model_doc/musicgen)** (from Meta) released with the paper [Simple and Controllable Music Generation](https://arxiv.org/abs/2306.05284) by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Dรฉfossez. -1. **[MVP](https://huggingface.co/docs/transformers/model_doc/mvp)** (RUC AI Box ์—์„œ) Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen ์˜ [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https://arxiv.org/abs/2206.12131) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[NAT](https://huggingface.co/docs/transformers/model_doc/nat)** (SHI Labs ์—์„œ) Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi ์˜ [Neighborhood Attention Transformer](https://arxiv.org/abs/2204.07143) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Nezha](https://huggingface.co/docs/transformers/model_doc/nezha)** (Huawei Noahโ€™s Ark Lab ์—์„œ) Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu ์˜ [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)** (Meta ์—์„œ) the NLLB team ์˜ [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[NLLB-MOE](https://huggingface.co/docs/transformers/model_doc/nllb-moe)** (Meta ์—์„œ ์ œ๊ณต)์€ the NLLB team.์˜ [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Nougat](https://huggingface.co/docs/transformers/model_doc/nougat)** (Meta AI ์—์„œ ์ œ๊ณต)์€ Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic.์˜ [Nougat: Neural Optical Understanding for Academic Documents](https://arxiv.org/abs/2308.13418)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Nystrรถmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (the University of Wisconsin - Madison ์—์„œ) Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh ์˜ [Nystrรถmformer: A Nystrรถm-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[OneFormer](https://huggingface.co/docs/transformers/model_doc/oneformer)** (SHI Labs ์—์„œ) Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi ์˜ [OneFormer: One Transformer to Rule Universal Image Segmentation](https://arxiv.org/abs/2211.06220) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. 
**[OpenLlama](https://huggingface.co/docs/transformers/model_doc/open-llama)** (from [s-JoL](https://huggingface.co/s-JoL)) released on GitHub (now removed). -1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (Meta AI ์—์„œ) Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al ์˜ [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (Google AI ์—์„œ) Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby ์˜ [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[OWLv2](https://huggingface.co/docs/transformers/model_doc/owlv2)** (Google AI ์—์„œ ์ œ๊ณต)์€ Matthias Minderer, Alexey Gritsenko, Neil Houlsby.์˜ [Scaling Open-Vocabulary Object Detection](https://arxiv.org/abs/2306.09683)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[PatchTSMixer](https://huggingface.co/docs/transformers/model_doc/patchtsmixer)** ( IBM Research ์—์„œ ์ œ๊ณต)์€ Vijay Ekambaram, Arindam Jati, Nam Nguyen, Phanwadee Sinthong, Jayant Kalagnanam.์˜ [TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting](https://arxiv.org/pdf/2306.09364.pdf)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[PatchTST](https://huggingface.co/docs/transformers/model_doc/patchtst)** (IBM ์—์„œ ์ œ๊ณต)์€ Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, Jayant Kalagnanam.์˜ [A Time Series is Worth 64 Words: Long-term Forecasting with Transformers](https://arxiv.org/pdf/2211.14730.pdf)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (Google ์—์„œ) Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu ์˜ [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[PEGASUS-X](https://huggingface.co/docs/transformers/model_doc/pegasus_x)** (Google ์—์„œ) Jason Phang, Yao Zhao, Peter J. Liu ์˜ [Investigating Efficiently Extending Transformers for Long Input Summarization](https://arxiv.org/abs/2208.04347) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (Deepmind ์—์„œ) Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hรฉnaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, Joรฃo Carreira ์˜ [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Persimmon](https://huggingface.co/docs/transformers/model_doc/persimmon)** (ADEPT ์—์„œ ์ œ๊ณต)์€ Erich Elsen, Augustus Odena, Maxwell Nye, SaฤŸnak TaลŸฤฑrlar, Tri Dao, Curtis Hawthorne, Deepak Moparthi, Arushi Somani.์˜ [blog post](https://www.adept.ai/blog/persimmon-8b)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. 
**[Phi](https://huggingface.co/docs/transformers/model_doc/phi)** (from Microsoft) released with the papers - [Textbooks Are All You Need](https://arxiv.org/abs/2306.11644) by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio Cรฉsar Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sรฉbastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li, [Textbooks Are All You Need II: phi-1.5 technical report](https://arxiv.org/abs/2309.05463) by Yuanzhi Li, Sรฉbastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee. -1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (VinAI Research ์—์„œ) Dat Quoc Nguyen and Anh Tuan Nguyen ์˜ [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Pix2Struct](https://huggingface.co/docs/transformers/model_doc/pix2struct)** (Google ์—์„œ ์ œ๊ณต)์€ Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova.์˜ [Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding](https://arxiv.org/abs/2210.03347)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (UCLA NLP ์—์„œ) Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang ์˜ [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[PoolFormer](https://huggingface.co/docs/transformers/model_doc/poolformer)** (Sea AI Labs ์—์„œ) Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng ์˜ [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Pop2Piano](https://huggingface.co/docs/transformers/model_doc/pop2piano)** released with the paper [Pop2Piano : Pop Audio-based Piano Cover Generation](https://arxiv.org/abs/2211.00895) by Jongho Choi, Kyogu Lee. -1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (Microsoft Research ์—์„œ) Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou ์˜ [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[PVT](https://huggingface.co/docs/transformers/model_doc/pvt)** (Nanjing University, The University of Hong Kong etc. ์—์„œ ์ œ๊ณต)์€ Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao.์˜ [Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions](https://arxiv.org/pdf/2102.12122.pdf)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (NVIDIA ์—์„œ) Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius ์˜ [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. 
**[RAG](https://huggingface.co/docs/transformers/model_doc/rag)** (Facebook ์—์„œ) Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Kรผttler, Mike Lewis, Wen-tau Yih, Tim Rocktรคschel, Sebastian Riedel, Douwe Kiela ์˜ [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[REALM](https://huggingface.co/docs/transformers/model_doc/realm.html)** (Google Research ์—์„œ) Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang ์˜ [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (Google Research ์—์„œ) Nikita Kitaev, ลukasz Kaiser, Anselm Levskaya ์˜ [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[RegNet](https://huggingface.co/docs/transformers/model_doc/regnet)** (META Research ์—์„œ) Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollรกr ์˜ [Designing Network Design Space](https://arxiv.org/abs/2003.13678) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (Google Research ์—์„œ) Hyung Won Chung, Thibault Fรฉvry, Henry Tsai, M. Johnson, Sebastian Ruder ์˜ [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/pdf/2010.12821.pdf) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[ResNet](https://huggingface.co/docs/transformers/model_doc/resnet)** (Microsoft Research ์—์„œ) Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun ์˜ [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (Facebook ์—์„œ) Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov ์˜ a [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[RoBERTa-PreLayerNorm](https://huggingface.co/docs/transformers/model_doc/roberta-prelayernorm)** (Facebook ์—์„œ) Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli ์˜ [fairseq: A Fast, Extensible Toolkit for Sequence Modeling](https://arxiv.org/abs/1904.01038) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[RoCBert](https://huggingface.co/docs/transformers/model_doc/roc_bert)** (WeChatAI ์—์„œ) HuiSu, WeiweiShi, XiaoyuShen, XiaoZhou, TuoJi, JiaruiFang, JieZhou ์˜ [RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining](https://aclanthology.org/2022.acl-long.65.pdf) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (ZhuiyiTechnology ์—์„œ) Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu ์˜ a [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/pdf/2104.09864v1.pdf) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[RWKV](https://huggingface.co/docs/transformers/model_doc/rwkv)** (Bo Peng ์—์„œ ์ œ๊ณต)์€ Bo Peng.์˜ [this repo](https://github.com/BlinkDL/RWKV-LM)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. 
**[SeamlessM4T](https://huggingface.co/docs/transformers/model_doc/seamless_m4t)** (from Meta AI) released with the paper [SeamlessM4T โ€” Massively Multilingual & Multimodal Machine Translation](https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf) by the Seamless Communication team. -1. **[SeamlessM4Tv2](https://huggingface.co/docs/transformers/model_doc/seamless_m4t_v2)** (from Meta AI) released with the paper [Seamless: Multilingual Expressive and Streaming Speech Translation](https://ai.meta.com/research/publications/seamless-multilingual-expressive-and-streaming-speech-translation/) by the Seamless Communication team. -1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (NVIDIA ์—์„œ) Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo ์˜ [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Segment Anything](https://huggingface.co/docs/transformers/model_doc/sam)** (Meta AI ์—์„œ ์ œ๊ณต)์€ Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick.์˜ [Segment Anything](https://arxiv.org/pdf/2304.02643v1.pdf)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (ASAPP ์—์„œ) Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi ์˜ [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[SEW-D](https://huggingface.co/docs/transformers/model_doc/sew_d)** (ASAPP ์—์„œ) Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi ์˜ [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[SpeechT5](https://huggingface.co/docs/transformers/model_doc/speecht5)** (Microsoft Research ์—์„œ ์ œ๊ณต)์€ Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei.์˜ [SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing](https://arxiv.org/abs/2110.07205)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[SpeechToTextTransformer](https://huggingface.co/docs/transformers/model_doc/speech_to_text)** (Facebook ์—์„œ) Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino ์˜ [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (Facebook ์—์„œ) Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau ์˜ [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (Tel Aviv University ์—์„œ) Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy ์˜ [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[SqueezeBERT](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (Berkeley ์—์„œ) Forrest N. Iandola, Albert E. 
Shaw, Ravi Krishna, and Kurt W. Keutzer ์˜ [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[SwiftFormer](https://huggingface.co/docs/transformers/model_doc/swiftformer)** (MBZUAI ์—์„œ ์ œ๊ณต)์€ Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan.์˜ [SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications](https://arxiv.org/abs/2303.15446)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Swin Transformer](https://huggingface.co/docs/transformers/model_doc/swin)** (Microsoft ์—์„œ) Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo ์˜ [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Swin Transformer V2](https://huggingface.co/docs/transformers/model_doc/swinv2)** (Microsoft ์—์„œ) Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo ์˜ [Swin Transformer V2: Scaling Up Capacity and Resolution](https://arxiv.org/abs/2111.09883) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Swin2SR](https://huggingface.co/docs/transformers/model_doc/swin2sr)** (University of Wรผrzburg ์—์„œ) Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte ์˜ [Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration](https://arxiv.org/abs/2209.11345) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[SwitchTransformers](https://huggingface.co/docs/transformers/model_doc/switch_transformers)** (Google ์—์„œ) William Fedus, Barret Zoph, Noam Shazeer. ์˜ [Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity](https://arxiv.org/abs/2101.03961) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (Google AI ์—์„œ) Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu ์˜ [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu. -1. **[Table Transformer](https://huggingface.co/docs/transformers/model_doc/table-transformer)** (Microsoft Research ์—์„œ) Brandon Smock, Rohith Pesala, Robin Abraham ์˜ [PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents](https://arxiv.org/abs/2110.00061) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (Google AI ์—์„œ) Jonathan Herzig, Paweล‚ Krzysztof Nowak, Thomas Mรผller, Francesco Piccinno and Julian Martin Eisenschlos ์˜ [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. 
**[TAPEX](https://huggingface.co/docs/transformers/model_doc/tapex)** (Microsoft Research ์—์„œ) Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou ์˜ [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Time Series Transformer](https://huggingface.co/docs/transformers/model_doc/time_series_transformer)** (from HuggingFace). -1. **[TimeSformer](https://huggingface.co/docs/transformers/model_doc/timesformer)** (Facebook ์—์„œ) Gedas Bertasius, Heng Wang, Lorenzo Torresani ์˜ [Is Space-Time Attention All You Need for Video Understanding?](https://arxiv.org/abs/2102.05095) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Trajectory Transformer](https://huggingface.co/docs/transformers/model_doc/trajectory_transformers)** (the University of California at Berkeley ์—์„œ) Michael Janner, Qiyang Li, Sergey Levin ์˜ [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https://arxiv.org/abs/2106.02039) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transfo-xl)** (Google/CMU ์—์„œ) Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov ์˜ [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (Microsoft ์—์„œ) Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei ์˜ [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[TVLT](https://huggingface.co/docs/transformers/model_doc/tvlt)** (from UNC Chapel Hill ์—์„œ) Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal ์˜ [TVLT: Textless Vision-Language Transformer](https://arxiv.org/abs/2209.14156) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[TVP](https://huggingface.co/docs/transformers/model_doc/tvp)** (Intel ์—์„œ) Yimeng Zhang, Xin Chen, Jinghan Jia, Sijia Liu, Ke Ding ์˜ [Text-Visual Prompting for Efficient 2D Temporal Video Grounding](https://arxiv.org/abs/2303.04995) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (Google Research ์—์„œ) Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzle ์˜ [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[UMT5](https://huggingface.co/docs/transformers/model_doc/umt5)** (Google Research ์—์„œ ์ œ๊ณต)์€ Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant.์˜ [UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining](https://openreview.net/forum?id=kXwdL1cWOAi)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (Microsoft Research ์—์„œ) Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang ์˜ [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. 
**[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech-sat)** (Microsoft Research ์—์„œ) Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu ์˜ [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[UnivNet](https://huggingface.co/docs/transformers/model_doc/univnet)** (from Kakao Corporation) released with the paper [UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation](https://arxiv.org/abs/2106.07889) by Won Jang, Dan Lim, Jaesam Yoon, Bongwan Kim, and Juntae Kim. -1. **[UPerNet](https://huggingface.co/docs/transformers/model_doc/upernet)** (Peking University ์—์„œ ์ œ๊ณต)์€ Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun.์˜ [Unified Perceptual Parsing for Scene Understanding](https://arxiv.org/abs/1807.10221)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[VAN](https://huggingface.co/docs/transformers/model_doc/van)** (Tsinghua University and Nankai University ์—์„œ) Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu ์˜ [Visual Attention Network](https://arxiv.org/pdf/2202.09741.pdf) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[VideoMAE](https://huggingface.co/docs/transformers/model_doc/videomae)** (Multimedia Computing Group, Nanjing University ์—์„œ) Zhan Tong, Yibing Song, Jue Wang, Limin Wang ์˜ [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[ViLT](https://huggingface.co/docs/transformers/model_doc/vilt)** (NAVER AI Lab/Kakao Enterprise/Kakao Brain ์—์„œ) Wonjae Kim, Bokyung Son, Ildoo Kim ์˜ [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[VipLlava](https://huggingface.co/docs/transformers/model_doc/vipllava)** (University of Wisconsinโ€“Madison ์—์„œ ์ œ๊ณต)์€ Mu Cai, Haotian Liu, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Dennis Park, Yong Jae Lee.์˜ [Making Large Multimodal Models Understand Arbitrary Visual Prompts](https://arxiv.org/abs/2312.00784)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (Google AI ์—์„œ) Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby ์˜ [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (UCLA NLP ์—์„œ) Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang ์˜ [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. 
**[ViT Hybrid](https://huggingface.co/docs/transformers/model_doc/vit_hybrid)** (Google AI ์—์„œ) Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby ์˜ [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[VitDet](https://huggingface.co/docs/transformers/model_doc/vitdet)** (Meta AI ์—์„œ ์ œ๊ณต)์€ Yanghao Li, Hanzi Mao, Ross Girshick, Kaiming He.์˜ [Exploring Plain Vision Transformer Backbones for Object Detection](https://arxiv.org/abs/2203.16527)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[ViTMAE](https://huggingface.co/docs/transformers/model_doc/vit_mae)** (Meta AI ์—์„œ) Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollรกr, Ross Girshick ์˜ [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[ViTMatte](https://huggingface.co/docs/transformers/model_doc/vitmatte)** (HUST-VL ์—์„œ ์ œ๊ณต)์€ Jingfeng Yao, Xinggang Wang, Shusheng Yang, Baoyuan Wang.์˜ [ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers](https://arxiv.org/abs/2305.15272)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[ViTMSN](https://huggingface.co/docs/transformers/model_doc/vit_msn)** (Meta AI ์—์„œ) Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas ์˜ [Masked Siamese Networks for Label-Efficient Learning](https://arxiv.org/abs/2204.07141) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[VITS](https://huggingface.co/docs/transformers/model_doc/vits)** (Kakao Enterprise ์—์„œ ์ œ๊ณต)์€ Jaehyeon Kim, Jungil Kong, Juhee Son.์˜ [Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech](https://arxiv.org/abs/2106.06103)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[ViViT](https://huggingface.co/docs/transformers/model_doc/vivit)** (from Google Research) released with the paper [ViViT: A Video Vision Transformer](https://arxiv.org/abs/2103.15691) by Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Luฤiฤ‡, Cordelia Schmid. -1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (Facebook AI ์—์„œ) Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli ์˜ [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Wav2Vec2-Conformer](https://huggingface.co/docs/transformers/model_doc/wav2vec2-conformer)** (Facebook AI ์—์„œ) Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino ์˜ [FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ](https://arxiv.org/abs/2010.05171) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Wav2Vec2Phoneme](https://huggingface.co/docs/transformers/model_doc/wav2vec2_phoneme)** (Facebook AI ์—์„œ) Qiantong Xu, Alexei Baevski, Michael Auli ์˜ [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https://arxiv.org/abs/2109.11680) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. 
**[WavLM](https://huggingface.co/docs/transformers/model_doc/wavlm)** (Microsoft Research ์—์„œ) Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei ์˜ [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[Whisper](https://huggingface.co/docs/transformers/model_doc/whisper)** (OpenAI ์—์„œ) Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever ์˜ [Robust Speech Recognition via Large-Scale Weak Supervision](https://cdn.openai.com/papers/whisper.pdf) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[X-CLIP](https://huggingface.co/docs/transformers/model_doc/xclip)** (Microsoft Research ์—์„œ) Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling ์˜ [Expanding Language-Image Pretrained Models for General Video Recognition](https://arxiv.org/abs/2208.02816) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[X-MOD](https://huggingface.co/docs/transformers/model_doc/xmod)** (Meta AI ์—์„œ ์ œ๊ณต)์€ Jonas Pfeiffer, Naman Goyal, Xi Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe.์˜ [Lifting the Curse of Multilinguality by Pre-training Modular Transformers](http://dx.doi.org/10.18653/v1/2022.naacl-main.255)๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[XGLM](https://huggingface.co/docs/transformers/model_doc/xglm)** (Facebook AI ์—์„œ ์ œ๊ณต) Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li ์˜ [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (Facebook ์—์„œ) Guillaume Lample and Alexis Conneau ์˜ [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet)** (Microsoft Research ์—์„œ) Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou ์˜ [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlm-roberta)** (Facebook AI ์—์„œ) Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmรกn, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov ์˜ [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[XLM-RoBERTa-XL](https://huggingface.co/docs/transformers/model_doc/xlm-roberta-xl)** (Facebook AI ์—์„œ) Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau ์˜ [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. 
**[XLM-V](https://huggingface.co/docs/transformers/model_doc/xlm-v)** (Meta AI ์—์„œ) Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa ์˜ [XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models](https://arxiv.org/abs/2301.10472) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (Google/CMU ์—์„œ) Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le ์˜ [โ€‹XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[XLS-R](https://huggingface.co/docs/transformers/model_doc/xls_r)** (Facebook AI ์—์„œ) Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli ์˜ [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (Facebook AI ์—์„œ) Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli ์˜ [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[YOLOS](https://huggingface.co/docs/transformers/model_doc/yolos)** (Huazhong University of Science & Technology ์—์„œ) Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu ์˜ [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. **[YOSO](https://huggingface.co/docs/transformers/model_doc/yoso)** (the University of Wisconsin - Madison ์—์„œ) Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh ์˜ [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) ๋…ผ๋ฌธ๊ณผ ํ•จ๊ป˜ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค. -1. ์ƒˆ๋กœ์šด ๋ชจ๋ธ์„ ์˜ฌ๋ฆฌ๊ณ  ์‹ถ๋‚˜์š”? ์šฐ๋ฆฌ๊ฐ€ **์ƒ์„ธํ•œ ๊ฐ€์ด๋“œ์™€ ํ…œํ”Œ๋ฆฟ** ์œผ๋กœ ์ƒˆ๋กœ์šด ๋ชจ๋ธ์„ ์˜ฌ๋ฆฌ๋„๋ก ๋„์™€๋“œ๋ฆด๊ฒŒ์š”. ๊ฐ€์ด๋“œ์™€ ํ…œํ”Œ๋ฆฟ์€ ์ด ์ €์žฅ์†Œ์˜ [`templates`](./templates) ํด๋”์—์„œ ํ™•์ธํ•˜์‹ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. [์ปจํŠธ๋ฆฌ๋ทฐ์…˜ ๊ฐ€์ด๋“œ๋ผ์ธ](./CONTRIBUTING.md)์„ ๊ผญ ํ™•์ธํ•ด์ฃผ์‹œ๊ณ , PR์„ ์˜ฌ๋ฆฌ๊ธฐ ์ „์— ๋ฉ”์ธํ…Œ์ด๋„ˆ์—๊ฒŒ ์—ฐ๋ฝํ•˜๊ฑฐ๋‚˜ ์ด์Šˆ๋ฅผ ์˜คํ”ˆํ•ด ํ”ผ๋“œ๋ฐฑ์„ ๋ฐ›์œผ์‹œ๊ธธ ๋ฐ”๋ž๋‹ˆ๋‹ค. - -๊ฐ ๋ชจ๋ธ์ด Flax, PyTorch, TensorFlow์œผ๋กœ ๊ตฌํ˜„๋˜์—ˆ๋Š”์ง€ ๋˜๋Š” ๐Ÿค— Tokenizers ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๊ฐ€ ์ง€์›ํ•˜๋Š” ํ† ํฌ๋‚˜์ด์ €๋ฅผ ์‚ฌ์šฉํ•˜๋Š”์ง€ ํ™•์ธํ•˜๋ ค๋ฉด, [์ด ํ‘œ](https://huggingface.co/docs/transformers/index#supported-frameworks)๋ฅผ ํ™•์ธํ•˜์„ธ์š”. - -์ด ๊ตฌํ˜„์€ ์—ฌ๋Ÿฌ ๋ฐ์ดํ„ฐ๋กœ ๊ฒ€์ฆ๋˜์—ˆ๊ณ  (์˜ˆ์‹œ ์Šคํฌ๋ฆฝํŠธ๋ฅผ ์ฐธ๊ณ ํ•˜์„ธ์š”) ์˜ค๋ฆฌ์ง€๋„ ๊ตฌํ˜„์˜ ์„ฑ๋Šฅ๊ณผ ๊ฐ™์•„์•ผ ํ•ฉ๋‹ˆ๋‹ค. [๋„ํ๋จผํŠธ](https://huggingface.co/docs/transformers/examples)์˜ Examples ์„น์…˜์—์„œ ์„ฑ๋Šฅ์— ๋Œ€ํ•œ ์ž์„ธํ•œ ์„ค๋ช…์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. 
- -## ๋” ์•Œ์•„๋ณด๊ธฐ - -| ์„น์…˜ | ์„ค๋ช… | -|-|-| -| [๋„ํ๋จผํŠธ](https://huggingface.co/transformers/) | ์ „์ฒด API ๋„ํ๋จผํŠธ์™€ ํŠœํ† ๋ฆฌ์–ผ | -| [๊ณผ์ œ ์š”์•ฝ](https://huggingface.co/docs/transformers/task_summary) | ๐Ÿค— Transformers๊ฐ€ ์ง€์›ํ•˜๋Š” ๊ณผ์ œ๋“ค | -| [์ „์ฒ˜๋ฆฌ ํŠœํ† ๋ฆฌ์–ผ](https://huggingface.co/docs/transformers/preprocessing) | `Tokenizer` ํด๋ž˜์Šค๋ฅผ ์ด์šฉํ•ด ๋ชจ๋ธ์„ ์œ„ํ•œ ๋ฐ์ดํ„ฐ ์ค€๋น„ํ•˜๊ธฐ | -| [ํ•™์Šต๊ณผ fine-tuning](https://huggingface.co/docs/transformers/training) | ๐Ÿค— Transformers๊ฐ€ ์ œ๊ณตํ•˜๋Š” ๋ชจ๋ธ PyTorch/TensorFlow ํ•™์Šต ๊ณผ์ •๊ณผ `Trainer` API์—์„œ ์‚ฌ์šฉํ•˜๊ธฐ | -| [ํ€ต ํˆฌ์–ด: Fine-tuning/์‚ฌ์šฉ ์Šคํฌ๋ฆฝํŠธ](https://github.com/huggingface/transformers/tree/main/examples) | ๋‹ค์–‘ํ•œ ๊ณผ์ œ์—์„œ ๋ชจ๋ธ fine-tuningํ•˜๋Š” ์˜ˆ์‹œ ์Šคํฌ๋ฆฝํŠธ | -| [๋ชจ๋ธ ๊ณต์œ  ๋ฐ ์—…๋กœ๋“œ](https://huggingface.co/docs/transformers/model_sharing) | ์ปค๋ฎค๋‹ˆํ‹ฐ์— fine-tune๋œ ๋ชจ๋ธ์„ ์—…๋กœ๋“œ ๋ฐ ๊ณต์œ ํ•˜๊ธฐ | -| [๋งˆ์ด๊ทธ๋ ˆ์ด์…˜](https://huggingface.co/docs/transformers/migration) | `pytorch-transformers`๋‚˜ `pytorch-pretrained-bert`์—์„œ ๐Ÿค— Transformers๋กœ ์ด๋™ํ•˜๊ธฐ| - -## ์ธ์šฉ - -๐Ÿค— Transformers ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์ธ์šฉํ•˜๊ณ  ์‹ถ๋‹ค๋ฉด, ์ด [๋…ผ๋ฌธ](https://www.aclweb.org/anthology/2020.emnlp-demos.6/)์„ ์ธ์šฉํ•ด ์ฃผ์„ธ์š”: -```bibtex -@inproceedings{wolf-etal-2020-transformers, - title = "Transformers: State-of-the-Art Natural Language Processing", - author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rรฉmi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush", - booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations", - month = oct, - year = "2020", - address = "Online", - publisher = "Association for Computational Linguistics", - url = "https://www.aclweb.org/anthology/2020.emnlp-demos.6", - pages = "38--45" -} -``` diff --git a/README_pt-br.md b/README_pt-br.md deleted file mode 100644 index 0c8e8d09e3591a..00000000000000 --- a/README_pt-br.md +++ /dev/null @@ -1,566 +0,0 @@ - - -

-[Logotipo: Hugging Face Transformers Library]
-
-[Badges: Build | GitHub | Documentation | GitHub release | Contributor Covenant | DOI]
-
-English | 简体中文 | 繁體中文 | 한국어 | Español | 日本語 | हिन्दी | Русский | Português | తెలుగు
-
-Aprendizado de máquina de última geração para JAX, PyTorch e TensorFlow
-
- - -A biblioteca ๐Ÿค— Transformers oferece milhares de modelos prรฉ-treinados para executar tarefas em diferentes modalidades, como texto, visรฃo e รกudio. - -Esses modelos podem ser aplicados a: - -* ๐Ÿ“ Texto, para tarefas como classificaรงรฃo de texto, extraรงรฃo de informaรงรตes, resposta a perguntas, sumarizaรงรฃo, traduรงรฃo, geraรงรฃo de texto, em mais de 100 idiomas. -* ๐Ÿ–ผ๏ธ Imagens, para tarefas como classificaรงรฃo de imagens, detecรงรฃo de objetos e segmentaรงรฃo. -* ๐Ÿ—ฃ๏ธ รudio, para tarefas como reconhecimento de fala e classificaรงรฃo de รกudio. - -Os modelos Transformer tambรฉm podem executar tarefas em diversas modalidades combinadas, como responder a perguntas em tabelas, reconhecimento รณptico de caracteres, extraรงรฃo de informaรงรตes de documentos digitalizados, classificaรงรฃo de vรญdeo e resposta a perguntas visuais. - - -A biblioteca ๐Ÿค— Transformers oferece APIs para baixar e usar rapidamente esses modelos prรฉ-treinados em um texto especรญfico, ajustรก-los em seus prรณprios conjuntos de dados e, em seguida, compartilhรก-los com a comunidade em nosso [model hub](https://huggingface.co/models). Ao mesmo tempo, cada mรณdulo Python que define uma arquitetura รฉ totalmente independente e pode ser modificado para permitir experimentos de pesquisa rรกpidos. - -A biblioteca ๐Ÿค— Transformers รฉ respaldada pelas trรชs bibliotecas de aprendizado profundo mais populares โ€” [Jax](https://jax.readthedocs.io/en/latest/), [PyTorch](https://pytorch.org/) e [TensorFlow](https://www.tensorflow.org/) โ€” com uma integraรงรฃo perfeita entre elas. ร‰ simples treinar seus modelos com uma delas antes de carregรก-los para inferรชncia com a outra - -## Demonstraรงรฃo Online - -Vocรช pode testar a maioria de nossos modelos diretamente em suas pรกginas a partir do [model hub](https://huggingface.co/models). Tambรฉm oferecemos [hospedagem de modelos privados, versionamento e uma API de inferรชncia](https://huggingface.co/pricing) -para modelos pรบblicos e privados. 
- -Aqui estรฃo alguns exemplos: - -Em Processamento de Linguagem Natural: - -- [Completar palavra mascarada com BERT](https://huggingface.co/bert-base-uncased?text=Paris+is+the+%5BMASK%5D+of+France) -- [Reconhecimento de Entidades Nomeadas com Electra](https://huggingface.co/dbmdz/electra-large-discriminator-finetuned-conll03-english?text=My+name+is+Sarah+and+I+live+in+London+city) -- [Geraรงรฃo de texto com GPT-2](https://huggingface.co/gpt2?text=A+long+time+ago%2C) -- [Inferรชncia de Linguagem Natural com RoBERTa](https://huggingface.co/roberta-large-mnli?text=The+dog+was+lost.+Nobody+lost+any+animal) -- [Sumarizaรงรฃo com BART](https://huggingface.co/facebook/bart-large-cnn?text=The+tower+is+324+metres+%281%2C063+ft%29+tall%2C+about+the+same+height+as+an+81-storey+building%2C+and+the+tallest+structure+in+Paris.+Its+base+is+square%2C+measuring+125+metres+%28410+ft%29+on+each+side.+During+its+construction%2C+the+Eiffel+Tower+surpassed+the+Washington+Monument+to+become+the+tallest+man-made+structure+in+the+world%2C+a+title+it+held+for+41+years+until+the+Chrysler+Building+in+New+York+City+was+finished+in+1930.+It+was+the+first+structure+to+reach+a+height+of+300+metres.+Due+to+the+addition+of+a+broadcasting+aerial+at+the+top+of+the+tower+in+1957%2C+it+is+now+taller+than+the+Chrysler+Building+by+5.2+metres+%2817+ft%29.+Excluding+transmitters%2C+the+Eiffel+Tower+is+the+second+tallest+free-standing+structure+in+France+after+the+Millau+Viaduct) -- [Resposta a perguntas com DistilBERT](https://huggingface.co/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species) -- [Traduรงรฃo com T5](https://huggingface.co/t5-base?text=My+name+is+Wolfgang+and+I+live+in+Berlin) - - -Em Visรฃo Computacional: -- [Classificaรงรฃo de Imagens com ViT](https://huggingface.co/google/vit-base-patch16-224) -- [Detecรงรฃo de Objetos com DETR](https://huggingface.co/facebook/detr-resnet-50) -- [Segmentaรงรฃo Semรขntica com SegFormer](https://huggingface.co/nvidia/segformer-b0-finetuned-ade-512-512) -- [Segmentaรงรฃo Panรณptica com MaskFormer](https://huggingface.co/facebook/maskformer-swin-small-coco) -- [Estimativa de Profundidade com DPT](https://huggingface.co/docs/transformers/model_doc/dpt) -- [Classificaรงรฃo de Vรญdeo com VideoMAE](https://huggingface.co/docs/transformers/model_doc/videomae) -- 
[Segmentaรงรฃo Universal com OneFormer](https://huggingface.co/shi-labs/oneformer_ade20k_dinat_large) - - -Em รudio: -- [Reconhecimento Automรกtico de Fala com Wav2Vec2](https://huggingface.co/facebook/wav2vec2-base-960h) -- [Detecรงรฃo de Palavras-Chave com Wav2Vec2](https://huggingface.co/superb/wav2vec2-base-superb-ks) -- [Classificaรงรฃo de รudio com Transformer de Espectrograma de รudio](https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593) - -Em Tarefas Multimodais: -- [Respostas de Perguntas em Tabelas com TAPAS](https://huggingface.co/google/tapas-base-finetuned-wtq) -- [Respostas de Perguntas Visuais com ViLT](https://huggingface.co/dandelin/vilt-b32-finetuned-vqa) -- [Classificaรงรฃo de Imagens sem Anotaรงรฃo com CLIP](https://huggingface.co/openai/clip-vit-large-patch14) -- [Respostas de Perguntas em Documentos com LayoutLM](https://huggingface.co/impira/layoutlm-document-qa) -- [Classificaรงรฃo de Vรญdeo sem Anotaรงรฃo com X-CLIP](https://huggingface.co/docs/transformers/model_doc/xclip) - -## 100 Projetos Usando Transformers - -Transformers รฉ mais do que um conjunto de ferramentas para usar modelos prรฉ-treinados: รฉ uma comunidade de projetos construรญdos ao seu redor e o Hugging Face Hub. Queremos que o Transformers permita que desenvolvedores, pesquisadores, estudantes, professores, engenheiros e qualquer outra pessoa construa seus projetos dos sonhos. - -Para celebrar as 100.000 estrelas do Transformers, decidimos destacar a comunidade e criamos a pรกgina [awesome-transformers](./awesome-transformers.md), que lista 100 projetos incrรญveis construรญdos nas proximidades dos Transformers. - -Se vocรช possui ou utiliza um projeto que acredita que deveria fazer parte da lista, abra um PR para adicionรก-lo! - -## Se vocรช estรก procurando suporte personalizado da equipe Hugging Face - - - HuggingFace Expert Acceleration Program -
- - -## Tour Rรกpido - -Para usar imediatamente um modelo em uma entrada especรญfica (texto, imagem, รกudio, ...), oferecemos a API `pipeline`. Os pipelines agrupam um modelo prรฉ-treinado com o prรฉ-processamento que foi usado durante o treinamento desse modelo. Aqui estรก como usar rapidamente um pipeline para classificar textos como positivos ou negativos: - -```python -from transformers import pipeline - -# Carregue o pipeline de classificaรงรฃo de texto ->>> classifier = pipeline("sentiment-analysis") - -# Classifique o texto como positivo ou negativo ->>> classifier("Estamos muito felizes em apresentar o pipeline no repositรณrio dos transformers.") -[{'label': 'POSITIVE', 'score': 0.9996980428695679}] -``` - -A segunda linha de cรณdigo baixa e armazena em cache o modelo prรฉ-treinado usado pelo pipeline, enquanto a terceira linha o avalia no texto fornecido. Neste exemplo, a resposta รฉ "positiva" com uma confianรงa de 99,97%. - -Muitas tarefas tรชm um `pipeline` prรฉ-treinado pronto para uso, nรฃo apenas em PNL, mas tambรฉm em visรฃo computacional e processamento de รกudio. Por exemplo, podemos facilmente extrair objetos detectados em uma imagem: - -``` python ->>> import requests ->>> from PIL import Image ->>> from transformers import pipeline - -# Download an image with cute cats ->>> url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/coco_sample.png" ->>> image_data = requests.get(url, stream=True).raw ->>> image = Image.open(image_data) - -# Allocate a pipeline for object detection ->>> object_detector = pipeline('object-detection') ->>> object_detector(image) -[{'score': 0.9982201457023621, - 'label': 'remote', - 'box': {'xmin': 40, 'ymin': 70, 'xmax': 175, 'ymax': 117}}, - {'score': 0.9960021376609802, - 'label': 'remote', - 'box': {'xmin': 333, 'ymin': 72, 'xmax': 368, 'ymax': 187}}, - {'score': 0.9954745173454285, - 'label': 'couch', - 'box': {'xmin': 0, 'ymin': 1, 'xmax': 639, 'ymax': 473}}, - {'score': 0.9988006353378296, - 'label': 'cat', - 'box': {'xmin': 13, 'ymin': 52, 'xmax': 314, 'ymax': 470}}, - {'score': 0.9986783862113953, - 'label': 'cat', - 'box': {'xmin': 345, 'ymin': 23, 'xmax': 640, 'ymax': 368}}] -``` - - -Aqui obtemos uma lista de objetos detectados na imagem, com uma caixa envolvendo o objeto e uma pontuaรงรฃo de confianรงa. Aqui estรก a imagem original ร  esquerda, com as previsรตes exibidas ร  direita: - -

-[Imagens: foto original à esquerda e previsões de detecção de objetos à direita]
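O mesmo padrão vale para áudio: um esboço mínimo, supondo um arquivo local hipotético `exemplo.flac` (a decodificação de arquivos de áudio normalmente exige o `ffmpeg` instalado no sistema), seria algo como:

```python
>>> from transformers import pipeline

# Pipeline de reconhecimento automático de fala (ASR); baixa o checkpoint padrão da tarefa na primeira execução
>>> speech_recognizer = pipeline("automatic-speech-recognition")

# "exemplo.flac" é um caminho hipotético; o pipeline também aceita um array NumPy com a forma de onda
>>> speech_recognizer("exemplo.flac")
{'text': '...'}
```

A saída é um dicionário com a transcrição do áudio, no mesmo espírito dos exemplos de texto e de imagem acima.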

- -Vocรช pode aprender mais sobre as tarefas suportadas pela API `pipeline` em [este tutorial](https://huggingface.co/docs/transformers/task_summary). - - -Alรฉm do `pipeline`, para baixar e usar qualquer um dos modelos prรฉ-treinados em sua tarefa especรญfica, tudo o que รฉ necessรกrio sรฃo trรชs linhas de cรณdigo. Aqui estรก a versรฃo em PyTorch: - -```python ->>> from transformers import AutoTokenizer, AutoModel - ->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased") ->>> model = AutoModel.from_pretrained("bert-base-uncased") - ->>> inputs = tokenizer("Hello world!", return_tensors="pt") ->>> outputs = model(**inputs) -``` - -E aqui estรก o cรณdigo equivalente para TensorFlow: - -```python ->>> from transformers import AutoTokenizer, TFAutoModel - ->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased") ->>> model = TFAutoModel.from_pretrained("bert-base-uncased") - ->>> inputs = tokenizer("Hello world!", return_tensors="tf") ->>> outputs = model(**inputs) -``` - -O tokenizador รฉ responsรกvel por todo o prรฉ-processamento que o modelo prรฉ-treinado espera, e pode ser chamado diretamente em uma รบnica string (como nos exemplos acima) ou em uma lista. Ele produzirรก um dicionรกrio que vocรช pode usar no cรณdigo subsequente ou simplesmente passar diretamente para o seu modelo usando o operador de descompactaรงรฃo de argumentos **. - -O modelo em si รฉ um [Pytorch `nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) ou um [TensorFlow `tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model)(dependendo do seu back-end) que vocรช pode usar como de costume. [Este tutorial](https://huggingface.co/docs/transformers/training) explica como integrar esse modelo em um ciclo de treinamento clรกssico do PyTorch ou TensorFlow, ou como usar nossa API `Trainer` para ajuste fino rรกpido em um novo conjunto de dados. - -## Por que devo usar transformers? - -1. Modelos state-of-the-art fรกceis de usar: - - Alto desempenho em compreensรฃo e geraรงรฃo de linguagem natural, visรฃo computacional e tarefas de รกudio. - - Barreira de entrada baixa para educadores e profissionais. - - Poucas abstraรงรตes visรญveis para o usuรกrio, com apenas trรชs classes para aprender. - - Uma API unificada para usar todos os nossos modelos prรฉ-treinados. - -1. Menores custos de computaรงรฃo, menor pegada de carbono: - - Pesquisadores podem compartilhar modelos treinados em vez de treinar sempre do zero. - - Profissionais podem reduzir o tempo de computaรงรฃo e os custos de produรงรฃo. - - Dezenas de arquiteturas com mais de 60.000 modelos prรฉ-treinados em todas as modalidades. - -1. Escolha o framework certo para cada parte da vida de um modelo: - - Treine modelos state-of-the-art em 3 linhas de cรณdigo. - - Mova um รบnico modelo entre frameworks TF2.0/PyTorch/JAX ร  vontade. - - Escolha o framework certo de forma contรญnua para treinamento, avaliaรงรฃo e produรงรฃo. - -1. Personalize facilmente um modelo ou um exemplo para atender ร s suas necessidades: - - Fornecemos exemplos para cada arquitetura para reproduzir os resultados publicados pelos autores originais. - - Os detalhes internos do modelo sรฃo expostos de maneira consistente. - - Os arquivos do modelo podem ser usados de forma independente da biblioteca para experimentos rรกpidos. - -## Por que nรฃo devo usar transformers? - -- Esta biblioteca nรฃo รฉ uma caixa de ferramentas modular para construir redes neurais. 
O cรณdigo nos arquivos do modelo nรฃo รฉ refatorado com abstraรงรตes adicionais de propรณsito, para que os pesquisadores possam iterar rapidamente em cada um dos modelos sem se aprofundar em abstraรงรตes/arquivos adicionais. -- A API de treinamento nรฃo รฉ projetada para funcionar com qualquer modelo, mas รฉ otimizada para funcionar com os modelos fornecidos pela biblioteca. Para loops de aprendizado de mรกquina genรฉricos, vocรช deve usar outra biblioteca (possivelmente, [Accelerate](https://huggingface.co/docs/accelerate)). -- Embora nos esforcemos para apresentar o maior nรบmero possรญvel de casos de uso, os scripts em nossa [pasta de exemplos](https://github.com/huggingface/transformers/tree/main/examples) sรฃo apenas isso: exemplos. ร‰ esperado que eles nรฃo funcionem prontos para uso em seu problema especรญfico e que seja necessรกrio modificar algumas linhas de cรณdigo para adaptรก-los ร s suas necessidades. - - - -### Com pip - -Este repositรณrio รฉ testado no Python 3.8+, Flax 0.4.1+, PyTorch 1.10+ e TensorFlow 2.6+. - -Vocรช deve instalar o ๐Ÿค— Transformers em um [ambiente virtual](https://docs.python.org/3/library/venv.html). Se vocรช nรฃo estรก familiarizado com ambientes virtuais em Python, confira o [guia do usuรกrio](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/). - -Primeiro, crie um ambiente virtual com a versรฃo do Python que vocรช vai usar e ative-o. - -Em seguida, vocรช precisarรก instalar pelo menos um dos back-ends Flax, PyTorch ou TensorFlow. -Consulte a [pรกgina de instalaรงรฃo do TensorFlow](https://www.tensorflow.org/install/), a [pรกgina de instalaรงรฃo do PyTorch](https://pytorch.org/get-started/locally/#start-locally) e/ou [Flax](https://github.com/google/flax#quick-install) e [Jax](https://github.com/google/jax#installation) pรกginas de instalaรงรฃo para obter o comando de instalaรงรฃo especรญfico para a sua plataforma. - -Quando um desses back-ends estiver instalado, o ๐Ÿค— Transformers pode ser instalado usando pip da seguinte forma: - -```bash -pip install transformers -``` -Se vocรช deseja experimentar com os exemplos ou precisa da versรฃo mais recente do cรณdigo e nรฃo pode esperar por um novo lanรงamento, vocรช deve instalar a [biblioteca a partir do cรณdigo-fonte](https://huggingface.co/docs/transformers/installation#installing-from-source). - -### Com conda - -Desde a versรฃo v4.0.0 do Transformers, agora temos um canal conda: `huggingface`. - -O ๐Ÿค— Transformers pode ser instalado com conda da seguinte forma: - -```bash -conda install -c huggingface transformers -``` - -Siga as pรกginas de instalaรงรฃo do Flax, PyTorch ou TensorFlow para ver como instalรก-los com conda. - -Siga as pรกginas de instalaรงรฃo do Flax, PyTorch ou TensorFlow para ver como instalรก-los com o conda. - -> **_NOTA:_** No Windows, vocรช pode ser solicitado a ativar o Modo de Desenvolvedor para aproveitar o cache. Se isso nรฃo for uma opรงรฃo para vocรช, por favor nos avise [neste problema](https://github.com/huggingface/huggingface_hub/issues/1062). - -## Arquiteturas de Modelos - -**[Todos os pontos de verificaรงรฃo de modelo](https://huggingface.co/models)** fornecidos pelo ๐Ÿค— Transformers sรฃo integrados de forma transparente do [model hub](https://huggingface.co/models) do huggingface.co, onde sรฃo carregados diretamente por [usuรกrios](https://huggingface.co/users) e [organizaรงรตes](https://huggingface.co/organizations). 
- -Nรบmero atual de pontos de verificaรงรฃo: ![](https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen) - -๐Ÿค— Transformers atualmente fornece as seguintes arquiteturas (veja [aqui](https://huggingface.co/docs/transformers/model_summary) para um resumo de alto nรญvel de cada uma delas): - -1. **[ALBERT](https://huggingface.co/docs/transformers/model_doc/albert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. -1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (from Google Research) released with the paper [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918) by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig. -1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (from BAAI) released with the paper [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell. -1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (from MIT) released with the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Yuan Gong, Yu-An Chung, James Glass. -1. **[Autoformer](https://huggingface.co/docs/transformers/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long. -1. **[Bark](https://huggingface.co/docs/transformers/model_doc/bark)** (from Suno) released in the repository [suno-ai/bark](https://github.com/suno-ai/bark) by Suno AI team. -1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer. -1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (from ร‰cole polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis. -1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen. -1. **[BEiT](https://huggingface.co/docs/transformers/model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong, Furu Wei. -1. 
**[BERT](https://huggingface.co/docs/transformers/model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. -1. **[BERT For Sequence Generation](https://huggingface.co/docs/transformers/model_doc/bert-generation)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn. -1. **[BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen. -1. **[BigBird-Pegasus](https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. -1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. -1. **[BioGpt](https://huggingface.co/docs/transformers/model_doc/biogpt)** (from Microsoft Research AI4Science) released with the paper [BioGPT: generative pre-trained transformer for biomedical text generation and mining](https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbac409/6713511?guestAccessKey=a66d9b5d-4f83-4017-bb52-405815c907b9) by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu. -1. **[BiT](https://huggingface.co/docs/transformers/model_doc/bit)** (from Google AI) released with the paper [Big Transfer (BiT): General Visual Representation Learning](https://arxiv.org/abs/1912.11370) by Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby. -1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston. -1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston. -1. **[BLIP](https://huggingface.co/docs/transformers/model_doc/blip)** (from Salesforce) released with the paper [BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation](https://arxiv.org/abs/2201.12086) by Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi. -1. 
**[BLIP-2](https://huggingface.co/docs/transformers/model_doc/blip-2)** (from Salesforce) released with the paper [BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models](https://arxiv.org/abs/2301.12597) by Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi.
-1. **[BLOOM](https://huggingface.co/docs/transformers/model_doc/bloom)** (from BigScience workshop) released by the [BigScience Workshop](https://bigscience.huggingface.co/).
-1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
-1. **[BridgeTower](https://huggingface.co/docs/transformers/model_doc/bridgetower)** (from Harbin Institute of Technology/Microsoft Research Asia/Intel Labs) released with the paper [BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning](https://arxiv.org/abs/2206.08657) by Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan.
-1. **[BROS](https://huggingface.co/docs/transformers/model_doc/bros)** (from NAVER CLOVA) released with the paper [BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents](https://arxiv.org/abs/2108.04539) by Teakgyu Hong, Donghyun Kim, Mingi Ji, Wonseok Hwang, Daehyun Nam, Sungrae Park.
-1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
-1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
-1. **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
-1. **[Chinese-CLIP](https://huggingface.co/docs/transformers/model_doc/chinese_clip)** (from OFA-Sys) released with the paper [Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese](https://arxiv.org/abs/2211.01335) by An Yang, Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou, Chang Zhou.
-1. **[CLAP](https://huggingface.co/docs/transformers/model_doc/clap)** (from LAION-AI) released with the paper [Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation](https://arxiv.org/abs/2211.06687) by Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov.
-1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
-1. **[CLIPSeg](https://huggingface.co/docs/transformers/model_doc/clipseg)** (from University of Göttingen) released with the paper [Image Segmentation Using Text and Image Prompts](https://arxiv.org/abs/2112.10003) by Timo Lüddecke and Alexander Ecker.
-1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
-1. **[CodeLlama](https://huggingface.co/docs/transformers/model_doc/llama_code)** (from MetaAI) released with the paper [Code Llama: Open Foundation Models for Code](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/) by Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve.
-1. **[Conditional DETR](https://huggingface.co/docs/transformers/model_doc/conditional_detr)** (from Microsoft Research Asia) released with the paper [Conditional DETR for Fast Training Convergence](https://arxiv.org/abs/2108.06152) by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang.
-1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
-1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
-1. **[ConvNeXTV2](https://huggingface.co/docs/transformers/model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
-1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
-1. **[CPM-Ant](https://huggingface.co/docs/transformers/model_doc/cpmant)** (from OpenBMB) released by the [OpenBMB](https://www.openbmb.org/).
-1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
-1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
-1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
-1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
-1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
-1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
-1. **[Deformable DETR](https://huggingface.co/docs/transformers/model_doc/deformable_detr)** (from SenseTime Research) released with the paper [Deformable DETR: Deformable Transformers for End-to-End Object Detection](https://arxiv.org/abs/2010.04159) by Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai.
-1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
-1. **[DePlot](https://huggingface.co/docs/transformers/model_doc/deplot)** (from Google AI) released with the paper [DePlot: One-shot visual language reasoning by plot-to-table translation](https://arxiv.org/abs/2212.10505) by Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun.
-1. **[DETA](https://huggingface.co/docs/transformers/model_doc/deta)** (from The University of Texas at Austin) released with the paper [NMS Strikes Back](https://arxiv.org/abs/2212.06137) by Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl.
-1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
-1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
-1. **[DiNAT](https://huggingface.co/docs/transformers/model_doc/dinat)** (from SHI Labs) released with the paper [Dilated Neighborhood Attention Transformer](https://arxiv.org/abs/2209.15001) by Ali Hassani and Humphrey Shi.
-1. **[DINOv2](https://huggingface.co/docs/transformers/model_doc/dinov2)** (from Meta AI) released with the paper [DINOv2: Learning Robust Visual Features without Supervision](https://arxiv.org/abs/2304.07193) by Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski.
-1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation) and a German version of DistilBERT.
-1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
-1. **[Donut](https://huggingface.co/docs/transformers/model_doc/donut)** (from NAVER), released together with the paper [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
-1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
-1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
-1. **[EfficientFormer](https://huggingface.co/docs/transformers/model_doc/efficientformer)** (from Snap Research) released with the paper [EfficientFormer: Vision Transformers at MobileNet Speed](https://arxiv.org/abs/2206.01191) by Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren.
-1. **[EfficientNet](https://huggingface.co/docs/transformers/model_doc/efficientnet)** (from Google Brain) released with the paper [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946) by Mingxing Tan, Quoc V. Le.
-1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
-1. **[EnCodec](https://huggingface.co/docs/transformers/model_doc/encodec)** (from Meta AI) released with the paper [High Fidelity Neural Audio Compression](https://arxiv.org/abs/2210.13438) by Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi.
-1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
-1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
-1. **[ErnieM](https://huggingface.co/docs/transformers/model_doc/ernie_m)** (from Baidu) released with the paper [ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora](https://arxiv.org/abs/2012.15674) by Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang.
-1. **[ESM](https://huggingface.co/docs/transformers/model_doc/esm)** (from Meta AI) are transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. **ESM-1v** was released with the paper [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648) by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. **ESM-2 and ESMFold** were released with the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
-1. **[Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)** (from Technology Innovation Institute) by Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme.
-1. **[FLAN-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5)** (from Google AI) released in the repository [google-research/t5x](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints) by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei
-1. **[FLAN-UL2](https://huggingface.co/docs/transformers/model_doc/flan-ul2)** (from Google AI) released in the repository [google-research/t5x](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-ul2-checkpoints) by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei
-1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
-1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
-1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
-1. **[FocalNet](https://huggingface.co/docs/transformers/model_doc/focalnet)** (from Microsoft Research) released with the paper [Focal Modulation Networks](https://arxiv.org/abs/2203.11926) by Jianwei Yang, Chunyuan Li, Xiyang Dai, Lu Yuan, Jianfeng Gao.
-1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
-1. **[GIT](https://huggingface.co/docs/transformers/model_doc/git)** (from Microsoft Research) released with the paper [GIT: A Generative Image-to-text Transformer for Vision and Language](https://arxiv.org/abs/2205.14100) by Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang.
-1. **[GLPN](https://huggingface.co/docs/transformers/model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
-1. **[GPT](https://huggingface.co/docs/transformers/model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://openai.com/research/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
-1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
-1. **[GPT NeoX](https://huggingface.co/docs/transformers/model_doc/gpt_neox)** (from EleutherAI) released with the paper [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://arxiv.org/abs/2204.06745) by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
-1. **[GPT NeoX Japanese](https://huggingface.co/docs/transformers/model_doc/gpt_neox_japanese)** (from ABEJA) released by Shinya Otani, Takayoshi Makabe, Anuj Arora, and Kyo Hattori.
-1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://openai.com/research/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
-1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
-1. **[GPT-Sw3](https://huggingface.co/docs/transformers/model_doc/gpt-sw3)** (from AI-Sweden) released with the paper [Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish](http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.376.pdf) by Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey Öhman, Fredrik Carlsson, Magnus Sahlgren.
-1. **[GPTBigCode](https://huggingface.co/docs/transformers/model_doc/gpt_bigcode)** (from BigCode) released with the paper [SantaCoder: don't reach for the stars!](https://arxiv.org/abs/2301.03988) by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra.
-1. **[GPTSAN-japanese](https://huggingface.co/docs/transformers/model_doc/gptsan-japanese)** released in the repository [tanreinama/GPTSAN](https://github.com/tanreinama/GPTSAN/blob/main/report/model.md) by Toshiyuki Sakamoto(tanreinama).
-1. **[Graphormer](https://huggingface.co/docs/transformers/model_doc/graphormer)** (from Microsoft) released with the paper [Do Transformers Really Perform Bad for Graph Representation?](https://arxiv.org/abs/2106.05234) by Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu.
-1. **[GroupViT](https://huggingface.co/docs/transformers/model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
-1. **[HerBERT](https://huggingface.co/docs/transformers/model_doc/herbert)** (from Allegro.pl, AGH University of Science and Technology) released with the paper [KLEJ: Comprehensive Benchmark for Polish Language Understanding](https://www.aclweb.org/anthology/2020.acl-main.111.pdf) by Piotr Rybak, Robert Mroczkowski, Janusz Tracz, Ireneusz Gawlik.
-1. **[Hubert](https://huggingface.co/docs/transformers/model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
-1. **[I-BERT](https://huggingface.co/docs/transformers/model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
-1. **[IDEFICS](https://huggingface.co/docs/transformers/model_doc/idefics)** (from HuggingFace) released with the paper [OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents](https://huggingface.co/papers/2306.16527) by Hugo Laurençon, Lucile Saulnier, Léo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh.
-1. **[ImageGPT](https://huggingface.co/docs/transformers/model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
-1. **[Informer](https://huggingface.co/docs/transformers/model_doc/informer)** (from Beihang University, UC Berkeley, Rutgers University, SEDD Company) released with the paper [Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting](https://arxiv.org/abs/2012.07436) by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang.
-1. **[InstructBLIP](https://huggingface.co/docs/transformers/model_doc/instructblip)** (from Salesforce) released with the paper [InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning](https://arxiv.org/abs/2305.06500) by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi.
-1. **[Jukebox](https://huggingface.co/docs/transformers/model_doc/jukebox)** (from OpenAI) released with the paper [Jukebox: A Generative Model for Music](https://arxiv.org/pdf/2005.00341.pdf) by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever.
-1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
-1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
-1. **[LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)** (from Microsoft Research Asia) released with the paper [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
-1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutxlm)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
-1. **[LED](https://huggingface.co/docs/transformers/model_doc/led)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
-1. **[LeViT](https://huggingface.co/docs/transformers/model_doc/levit)** (from Meta AI) released with the paper [LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference](https://arxiv.org/abs/2104.01136) by Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze.
-1. **[LiLT](https://huggingface.co/docs/transformers/model_doc/lilt)** (from South China University of Technology) released with the paper [LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding](https://arxiv.org/abs/2202.13669) by Jiapeng Wang, Lianwen Jin, Kai Ding.
-1. **[LLaMA](https://huggingface.co/docs/transformers/model_doc/llama)** (from The FAIR team of Meta AI) released with the paper [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971) by Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample.
-1. **[Llama2](https://huggingface.co/docs/transformers/model_doc/llama2)** (from The FAIR team of Meta AI) released with the paper [Llama2: Open Foundation and Fine-Tuned Chat Models](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/) by Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom.
-1. **[Longformer](https://huggingface.co/docs/transformers/model_doc/longformer)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
-1. **[LongT5](https://huggingface.co/docs/transformers/model_doc/longt5)** (from Google AI) released with the paper [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916) by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.
-1. **[LUKE](https://huggingface.co/docs/transformers/model_doc/luke)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
-1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
-1. **[M-CTC-T](https://huggingface.co/docs/transformers/model_doc/mctct)** (from Facebook) released with the paper [Pseudo-Labeling For Massively Multilingual Speech Recognition](https://arxiv.org/abs/2111.00161) by Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert.
-1. **[M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
-1. **[MADLAD-400](https://huggingface.co/docs/transformers/model_doc/madlad-400)** (from Google) released with the paper [MADLAD-400: A Multilingual And Document-Level Large Audited Dataset](https://arxiv.org/abs/2309.04662) by Sneha Kudugunta, Isaac Caswell, Biao Zhang, Xavier Garcia, Christopher A. Choquette-Choo, Katherine Lee, Derrick Xin, Aditya Kusupati, Romi Stella, Ankur Bapna, Orhan Firat.
-1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
-1. **[MarkupLM](https://huggingface.co/docs/transformers/model_doc/markuplm)** (from Microsoft Research Asia) released with the paper [MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding](https://arxiv.org/abs/2110.08518) by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei.
-1. **[Mask2Former](https://huggingface.co/docs/transformers/model_doc/mask2former)** (from FAIR and UIUC) released with the paper [Masked-attention Mask Transformer for Universal Image Segmentation](https://arxiv.org/abs/2112.01527) by Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar.
-1. **[MaskFormer](https://huggingface.co/docs/transformers/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
-1. **[MatCha](https://huggingface.co/docs/transformers/model_doc/matcha)** (from Google AI) released with the paper [MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering](https://arxiv.org/abs/2212.09662) by Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, Julian Martin Eisenschlos.
-1. **[mBART](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
-1. **[mBART-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
-1. **[MEGA](https://huggingface.co/docs/transformers/model_doc/mega)** (from Meta/USC/CMU/SJTU) released with the paper [Mega: Moving Average Equipped Gated Attention](https://arxiv.org/abs/2209.10655) by Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, and Luke Zettlemoyer.
-1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron-bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
-1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
-1. **[MGP-STR](https://huggingface.co/docs/transformers/model_doc/mgp-str)** (from Alibaba Research) released with the paper [Multi-Granularity Prediction for Scene Text Recognition](https://arxiv.org/abs/2209.03592) by Peng Wang, Cheng Da, and Cong Yao.
-1. **[Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
-1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
-1. **[MMS](https://huggingface.co/docs/transformers/model_doc/mms)** (from Facebook) released with the paper [Scaling Speech Technology to 1,000+ Languages](https://arxiv.org/abs/2305.13516) by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli.
-1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
-1. **[MobileNetV1](https://huggingface.co/docs/transformers/model_doc/mobilenet_v1)** (from Google Inc.) released with the paper [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861) by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam.
-1. **[MobileNetV2](https://huggingface.co/docs/transformers/model_doc/mobilenet_v2)** (from Google Inc.) released with the paper [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381) by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen.
-1. **[MobileViT](https://huggingface.co/docs/transformers/model_doc/mobilevit)** (from Apple) released with the paper [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/abs/2110.02178) by Sachin Mehta and Mohammad Rastegari.
-1. **[MobileViTV2](https://huggingface.co/docs/transformers/model_doc/mobilevitv2)** (from Apple) released with the paper [Separable Self-attention for Mobile Vision Transformers](https://arxiv.org/abs/2206.02680) by Sachin Mehta and Mohammad Rastegari.
-1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
-1. **[MPT](https://huggingface.co/docs/transformers/model_doc/mpt)** (from MosaicML) released with the repository [llm-foundry](https://github.com/mosaicml/llm-foundry/) by the MosaicML NLP Team.
-1. **[MRA](https://huggingface.co/docs/transformers/model_doc/mra)** (from the University of Wisconsin - Madison) released with the paper [Multi Resolution Analysis (MRA) for Approximate Self-Attention](https://arxiv.org/abs/2207.10284) by Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, Vikas Singh.
-1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
-1. **[MusicGen](https://huggingface.co/docs/transformers/model_doc/musicgen)** (from Meta) released with the paper [Simple and Controllable Music Generation](https://arxiv.org/abs/2306.05284) by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Défossez.
-1. **[MVP](https://huggingface.co/docs/transformers/model_doc/mvp)** (from RUC AI Box) released with the paper [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https://arxiv.org/abs/2206.12131) by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen.
-1. **[NAT](https://huggingface.co/docs/transformers/model_doc/nat)** (from SHI Labs) released with the paper [Neighborhood Attention Transformer](https://arxiv.org/abs/2204.07143) by Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi.
-1. **[Nezha](https://huggingface.co/docs/transformers/model_doc/nezha)** (from Huawei Noah's Ark Lab) released with the paper [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) by Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu.
-1. **[NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team.
-1. **[NLLB-MOE](https://huggingface.co/docs/transformers/model_doc/nllb-moe)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team.
-1. **[Nougat](https://huggingface.co/docs/transformers/model_doc/nougat)** (from Meta AI) released with the paper [Nougat: Neural Optical Understanding for Academic Documents](https://arxiv.org/abs/2308.13418) by Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic.
-1. **[Nyströmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
-1. **[OneFormer](https://huggingface.co/docs/transformers/model_doc/oneformer)** (from SHI Labs) released with the paper [OneFormer: One Transformer to Rule Universal Image Segmentation](https://arxiv.org/abs/2211.06220) by Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi.
-1. **[OpenLlama](https://huggingface.co/docs/transformers/model_doc/open-llama)** (from [s-JoL](https://huggingface.co/s-JoL)) released on GitHub (now removed).
-1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
-1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
-1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
-1. **[PEGASUS-X](https://huggingface.co/docs/transformers/model_doc/pegasus_x)** (from Google) released with the paper [Investigating Efficiently Extending Transformers for Long Input Summarization](https://arxiv.org/abs/2208.04347) by Jason Phang, Yao Zhao, and Peter J. Liu.
-1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
-1. **[Persimmon](https://huggingface.co/docs/transformers/model_doc/persimmon)** (from ADEPT) released in a [blog post](https://www.adept.ai/blog/persimmon-8b) by Erich Elsen, Augustus Odena, Maxwell Nye, Sağnak Taşırlar, Tri Dao, Curtis Hawthorne, Deepak Moparthi, Arushi Somani.
-1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
-1. **[Pix2Struct](https://huggingface.co/docs/transformers/model_doc/pix2struct)** (from Google) released with the paper [Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding](https://arxiv.org/abs/2210.03347) by Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova.
-1. **[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
-1. **[PoolFormer](https://huggingface.co/docs/transformers/model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng.
-1. **[Pop2Piano](https://huggingface.co/docs/transformers/model_doc/pop2piano)** released with the paper [Pop2Piano : Pop Audio-based Piano Cover Generation](https://arxiv.org/abs/2211.00895) by Jongho Choi and Kyogu Lee.
-1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
-1. **[PVT](https://huggingface.co/docs/transformers/model_doc/pvt)** (from Nanjing University, The University of Hong Kong etc.) released with the paper [Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions](https://arxiv.org/pdf/2102.12122.pdf) by Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao.
-1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
-1. **[RAG](https://huggingface.co/docs/transformers/model_doc/rag)** (from Facebook) released with the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
-1. **[REALM](https://huggingface.co/docs/transformers/model_doc/realm.html)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
-1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
-1. **[RegNet](https://huggingface.co/docs/transformers/model_doc/regnet)** (from META Platforms) released with the paper [Designing Network Design Space](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
-1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/abs/2010.12821) by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
-1. **[ResNet](https://huggingface.co/docs/transformers/model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
-1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
-1. 
**[RoBERTa-PreLayerNorm](https://huggingface.co/docs/transformers/model_doc/roberta-prelayernorm)** (from Facebook) released with the paper [fairseq: A Fast, Extensible Toolkit for Sequence Modeling](https://arxiv.org/abs/1904.01038) by Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli. -1. **[RoCBert](https://huggingface.co/docs/transformers/model_doc/roc_bert)** (from WeChatAI) released with the paper [RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining](https://aclanthology.org/2022.acl-long.65.pdf) by HuiSu, WeiweiShi, XiaoyuShen, XiaoZhou, TuoJi, JiaruiFang, JieZhou. -1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu. -1. **[RWKV](https://huggingface.co/docs/transformers/model_doc/rwkv)** (from Bo Peng), released on [this repo](https://github.com/BlinkDL/RWKV-LM) by Bo Peng. -1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo. -1. **[Segment Anything](https://huggingface.co/docs/transformers/model_doc/sam)** (from Meta AI) released with the paper [Segment Anything](https://arxiv.org/pdf/2304.02643v1.pdf) by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick. -1. **[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi. -1. **[SEW-D](https://huggingface.co/docs/transformers/model_doc/sew_d)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi. -1. **[SpeechT5](https://huggingface.co/docs/transformers/model_doc/speecht5)** (from Microsoft Research) released with the paper [SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing](https://arxiv.org/abs/2110.07205) by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei. -1. **[SpeechToTextTransformer](https://huggingface.co/docs/transformers/model_doc/speech_to_text)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino. -1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (from Facebook), released together with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau. -1. 
**[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (from Tel Aviv University), released together with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy. -1. **[SqueezeBERT](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer. -1. **[SwiftFormer](https://huggingface.co/docs/transformers/model_doc/swiftformer)** (from MBZUAI) released with the paper [SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications](https://arxiv.org/abs/2303.15446) by Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan. -1. **[Swin Transformer](https://huggingface.co/docs/transformers/model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo. -1. **[Swin Transformer V2](https://huggingface.co/docs/transformers/model_doc/swinv2)** (from Microsoft) released with the paper [Swin Transformer V2: Scaling Up Capacity and Resolution](https://arxiv.org/abs/2111.09883) by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo. -1. **[Swin2SR](https://huggingface.co/docs/transformers/model_doc/swin2sr)** (from University of Wรผrzburg) released with the paper [Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration](https://arxiv.org/abs/2209.11345) by Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte. -1. **[SwitchTransformers](https://huggingface.co/docs/transformers/model_doc/switch_transformers)** (from Google) released with the paper [Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity](https://arxiv.org/abs/2101.03961) by William Fedus, Barret Zoph, Noam Shazeer. -1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu. -1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu. -1. **[Table Transformer](https://huggingface.co/docs/transformers/model_doc/table-transformer)** (from Microsoft Research) released with the paper [PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents](https://arxiv.org/abs/2110.00061) by Brandon Smock, Rohith Pesala, Robin Abraham. -1. 
**[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweล‚ Krzysztof Nowak, Thomas Mรผller, Francesco Piccinno and Julian Martin Eisenschlos. -1. **[TAPEX](https://huggingface.co/docs/transformers/model_doc/tapex)** (from Microsoft Research) released with the paper [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. -1. **[Time Series Transformer](https://huggingface.co/docs/transformers/model_doc/time_series_transformer)** (from HuggingFace). -1. **[TimeSformer](https://huggingface.co/docs/transformers/model_doc/timesformer)** (from Facebook) released with the paper [Is Space-Time Attention All You Need for Video Understanding?](https://arxiv.org/abs/2102.05095) by Gedas Bertasius, Heng Wang, Lorenzo Torresani. -1. **[Trajectory Transformer](https://huggingface.co/docs/transformers/model_doc/trajectory_transformers)** (from the University of California at Berkeley) released with the paper [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https://arxiv.org/abs/2106.02039) by Michael Janner, Qiyang Li, Sergey Levine -1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov. -1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei. -1. **[TVLT](https://huggingface.co/docs/transformers/model_doc/tvlt)** (from UNC Chapel Hill) released with the paper [TVLT: Textless Vision-Language Transformer](https://arxiv.org/abs/2209.14156) by Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal. -1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler -1. **[UMT5](https://huggingface.co/docs/transformers/model_doc/umt5)** (from Google Research) released with the paper [UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining](https://openreview.net/forum?id=kXwdL1cWOAi) by Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant. -1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang. -1. 
**[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu. -1. **[UPerNet](https://huggingface.co/docs/transformers/model_doc/upernet)** (from Peking University) released with the paper [Unified Perceptual Parsing for Scene Understanding](https://arxiv.org/abs/1807.10221) by Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun. -1. **[VAN](https://huggingface.co/docs/transformers/model_doc/van)** (from Tsinghua University and Nankai University) released with the paper [Visual Attention Network](https://arxiv.org/abs/2202.09741) by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu. -1. **[VideoMAE](https://huggingface.co/docs/transformers/model_doc/videomae)** (from Multimedia Computing Group, Nanjing University) released with the paper [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) by Zhan Tong, Yibing Song, Jue Wang, Limin Wang. -1. **[ViLT](https://huggingface.co/docs/transformers/model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim. -1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby. -1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang. -1. **[ViT Hybrid](https://huggingface.co/docs/transformers/model_doc/vit_hybrid)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby. -1. **[VitDet](https://huggingface.co/docs/transformers/model_doc/vitdet)** (from Meta AI) released with the paper [Exploring Plain Vision Transformer Backbones for Object Detection](https://arxiv.org/abs/2203.16527) by Yanghao Li, Hanzi Mao, Ross Girshick, Kaiming He. -1. **[ViTMAE](https://huggingface.co/docs/transformers/model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollรกr, Ross Girshick. -1. 
**[ViTMatte](https://huggingface.co/docs/transformers/model_doc/vitmatte)** (from HUST-VL) released with the paper [ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers](https://arxiv.org/abs/2305.15272) by Jingfeng Yao, Xinggang Wang, Shusheng Yang, Baoyuan Wang. -1. **[ViTMSN](https://huggingface.co/docs/transformers/model_doc/vit_msn)** (from Meta AI) released with the paper [Masked Siamese Networks for Label-Efficient Learning](https://arxiv.org/abs/2204.07141) by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas. -1. **[VITS](https://huggingface.co/docs/transformers/model_doc/vits)** (from Kakao Enterprise) released with the paper [Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech](https://arxiv.org/abs/2106.06103) by Jaehyeon Kim, Jungil Kong, Juhee Son. -1. **[ViViT](https://huggingface.co/docs/transformers/model_doc/vivit)** (from Google Research) released with the paper [ViViT: A Video Vision Transformer](https://arxiv.org/abs/2103.15691) by Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, Cordelia Schmid. -1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli. -1. **[Wav2Vec2-Conformer](https://huggingface.co/docs/transformers/model_doc/wav2vec2-conformer)** (from Facebook AI) released with the paper [FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino. -1. **[Wav2Vec2Phoneme](https://huggingface.co/docs/transformers/model_doc/wav2vec2_phoneme)** (from Facebook AI) released with the paper [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https://arxiv.org/abs/2109.11680) by Qiantong Xu, Alexei Baevski, Michael Auli. -1. **[WavLM](https://huggingface.co/docs/transformers/model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei. -1. **[Whisper](https://huggingface.co/docs/transformers/model_doc/whisper)** (from OpenAI) released with the paper [Robust Speech Recognition via Large-Scale Weak Supervision](https://cdn.openai.com/papers/whisper.pdf) by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever. -1. **[X-CLIP](https://huggingface.co/docs/transformers/model_doc/xclip)** (from Microsoft Research) released with the paper [Expanding Language-Image Pretrained Models for General Video Recognition](https://arxiv.org/abs/2208.02816) by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling. -1. 
**[X-MOD](https://huggingface.co/docs/transformers/model_doc/xmod)** (from Meta AI) released with the paper [Lifting the Curse of Multilinguality by Pre-training Modular Transformers](http://dx.doi.org/10.18653/v1/2022.naacl-main.255) by Jonas Pfeiffer, Naman Goyal, Xi Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe. -1. **[XGLM](https://huggingface.co/docs/transformers/model_doc/xglm)** (From Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li. -1. **[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau. -1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou. -1. **[XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlm-roberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmรกn, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. -1. **[XLM-RoBERTa-XL](https://huggingface.co/docs/transformers/model_doc/xlm-roberta-xl)** (from Facebook AI), released together with the paper [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau. -1. **[XLM-V](https://huggingface.co/docs/transformers/model_doc/xlm-v)** (from Meta AI) released with the paper [XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models](https://arxiv.org/abs/2301.10472) by Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa. -1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (from Google/CMU) released with the paper [โ€‹XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le. -1. **[XLS-R](https://huggingface.co/docs/transformers/model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli. -1. 
**[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
-1. **[YOLOS](https://huggingface.co/docs/transformers/model_doc/yolos)** (from Huazhong University of Science & Technology) released with the paper [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu.
-1. **[YOSO](https://huggingface.co/docs/transformers/model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) by Zhanpeng Zeng,
-Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
-
-1. Want to contribute a new model? We have added a **detailed guide and example templates** to guide you through the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedback before starting your PR.
-
-To check whether each model has an implementation in Flax, PyTorch or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to [this table](https://huggingface.co/docs/transformers/index#supported-frameworks).
-
-These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations. You can find more details on performance in the Examples section of the [documentation](https://github.com/huggingface/transformers/tree/main/examples).
-
-
-## Learn more
-
-| Section | Description |
-|-|-|
-| [Documentation](https://huggingface.co/docs/transformers/) | Full API documentation and tutorials |
-| [Task summary](https://huggingface.co/docs/transformers/task_summary) | Tasks supported by 🤗 Transformers |
-| [Preprocessing tutorial](https://huggingface.co/docs/transformers/preprocessing) | Using the `Tokenizer` class to prepare data for the models |
-| [Training and fine-tuning](https://huggingface.co/docs/transformers/training) | Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop and with the `Trainer` API |
-| [Quick tour: Fine-tuning/usage scripts](https://github.com/huggingface/transformers/tree/main/examples) | Example scripts for fine-tuning models on a wide range of tasks |
-| [Model sharing and uploading](https://huggingface.co/docs/transformers/model_sharing) | Upload and share your fine-tuned models with the community |
-
-## Citation
-
-We now have a [paper](https://www.aclweb.org/anthology/2020.emnlp-demos.6/) you can cite for the 🤗 Transformers library:
-```bibtex
-@inproceedings{wolf-etal-2020-transformers,
-    title = "Transformers: State-of-the-Art Natural Language Processing",
-    author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush",
-    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
-    month = oct,
-    year = "2020",
-    address = "Online",
-    publisher = "Association for Computational Linguistics",
-    url = "https://www.aclweb.org/anthology/2020.emnlp-demos.6",
-    pages = "38--45"
-}
-```
diff --git a/README_ru.md b/README_ru.md
deleted file mode 100644
index bc32f2f0b44cf1..00000000000000
--- a/README_ru.md
+++ /dev/null
@@ -1,553 +0,0 @@
-
-

- - - - Hugging Face Transformers Library - -
-
-

- -

- - Build - - - GitHub - - - Documentation - - - GitHub release - - - Contributor Covenant - - DOI -

- -

-

- English | -    简体中文 | -    繁體中文 | -    한국어 | -    Español | -    日本語 | -    हिन्दी | -    Русский | -    తెలుగు -

-

- -

-

State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow

-

- -

- -

-
-🤗 Transformers provides thousands of pretrained models to perform tasks on text, vision, and audio.
-
-These models can be applied to:
-
-* 📝 Text, for tasks such as text classification, information extraction, question answering, summarization, translation, and text generation in over 100 languages.
-* 🖼️ Images, for tasks such as image classification, object detection, and segmentation.
-* 🗣️ Audio, for tasks such as speech recognition and audio classification.
-
-Transformer models can also perform tasks that combine several modalities, such as table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.
-
-🤗 Transformers provides APIs to quickly download and use pretrained models, fine-tune them on your own datasets, and then share them with the community on our [model hub](https://huggingface.co/models). At the same time, each Python module defining an architecture is fully standalone and can be modified to enable quick research experiments.
-
-🤗 Transformers is backed by the three most popular deep learning libraries - [Jax](https://jax.readthedocs.io/en/latest/), [PyTorch](https://pytorch.org/) and [TensorFlow](https://www.tensorflow.org/) - with seamless integration between them. This makes it easy to train a model with one of them and then load it for inference with another.
-
-## Online demos
-
-You can test most of our models directly on their pages on the [website](https://huggingface.co/models). We also offer [private model hosting, versioning and an inference API](https://huggingface.co/pricing) for public and private models.
-
-Here are a few examples:
-
-In Natural Language Processing (NLP):
-- [Masked word completion with BERT](https://huggingface.co/bert-base-uncased?text=Paris+is+the+%5BMASK%5D+of+France)
-- [Named Entity Recognition with Electra](https://huggingface.co/dbmdz/electra-large-discriminator-finetuned-conll03-english?text=My+name+is+Sarah+and+I+live+in+London+city)
-- [Text generation with GPT-2](https://huggingface.co/gpt2?text=A+long+time+ago%2C+)
-- [Natural Language Inference with RoBERTa](https://huggingface.co/roberta-large-mnli?text=The+dog+was+lost.+Nobody+lost+any+animal)
-- [Summarization with BART](https://huggingface.co/facebook/bart-large-cnn?text=The+tower+is+324+metres+%281%2C063+ft%29+tall%2C+about+the+same+height+as+an+81-storey+building%2C+and+the+tallest+structure+in+Paris.+Its+base+is+square%2C+measuring+125+metres+%28410+ft%29+on+each+side.+During+its+construction%2C+the+Eiffel+Tower+surpassed+the+Washington+Monument+to+become+the+tallest+man-made+structure+in+the+world%2C+a+title+it+held+for+41+years+until+the+Chrysler+Building+in+New+York+City+was+finished+in+1930.+It+was+the+first+structure+to+reach+a+height+of+300+metres.+Due+to+the+addition+of+a+broadcasting+aerial+at+the+top+of+the+tower+in+1957%2C+it+is+now+taller+than+the+Chrysler+Building+by+5.2+metres+%2817+ft%29.+Excluding+transmitters%2C+the+Eiffel+Tower+is+the+second+tallest+free-standing+structure+in+France+after+the+Millau+Viaduct)
-- [Question answering with DistilBERT](https://huggingface.co/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species)
-- [Translation with T5](https://huggingface.co/t5-base?text=My+name+is+Wolfgang+and+I+live+in+Berlin)
-
-In Computer Vision:
-- [Image classification with ViT](https://huggingface.co/google/vit-base-patch16-224)
-- [Object detection with DETR](https://huggingface.co/facebook/detr-resnet-50)
-- [Semantic segmentation with SegFormer](https://huggingface.co/nvidia/segformer-b0-finetuned-ade-512-512)
-- [Panoptic segmentation with MaskFormer](https://huggingface.co/facebook/maskformer-swin-small-coco)
-- [Depth estimation with DPT](https://huggingface.co/docs/transformers/model_doc/dpt)
-- [Video classification with VideoMAE](https://huggingface.co/docs/transformers/model_doc/videomae)
-- [Universal segmentation with OneFormer](https://huggingface.co/shi-labs/oneformer_ade20k_dinat_large)
-
-In Audio:
-- [Automatic speech recognition with Wav2Vec2](https://huggingface.co/facebook/wav2vec2-base-960h)
-- [Keyword spotting with Wav2Vec2](https://huggingface.co/superb/wav2vec2-base-superb-ks)
-- [Audio classification with Audio Spectrogram Transformer](https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593)
-
-In Multimodal tasks:
-- [Table question answering with TAPAS](https://huggingface.co/google/tapas-base-finetuned-wtq)
-- [Visual question answering with ViLT](https://huggingface.co/dandelin/vilt-b32-finetuned-vqa)
-- [Zero-shot image classification with CLIP](https://huggingface.co/openai/clip-vit-large-patch14)
-- [Document question answering with LayoutLM](https://huggingface.co/impira/layoutlm-document-qa)
-- [Zero-shot video classification with X-CLIP](https://huggingface.co/docs/transformers/model_doc/xclip)
-
-
-## 100 projects using Transformers
-
-Transformers is more than a toolkit to use pretrained models: it is a community of projects built around it and the
-Hugging Face Hub. We want Transformers to enable developers, researchers, students, professors, engineers, and anyone else
-to build their dream projects.
-
-To celebrate the 100,000 stars of Transformers, we decided to put the spotlight on the community and created the [awesome-transformers](./awesome-transformers.md) page, which lists 100
-incredible projects built with transformers.
-
-If you own or use a project that you believe should be part of the list, please open a PR to add it!
-
-## If you want custom support from the Hugging Face team
-
-    HuggingFace Expert Acceleration Program
-
-
-## Quick tour
-
-To immediately use a model on a given input (text, image, audio, ...), we provide the `pipeline` API. Pipelines group a pretrained model with the preprocessing that was used during its training. Here is how to quickly use a pipeline to classify positive versus negative texts:
-
-```python
->>> from transformers import pipeline
-
-# Allocate a pipeline for sentiment analysis
->>> classifier = pipeline('sentiment-analysis')
->>> classifier('We are very happy to introduce pipeline to the transformers repository.')
-[{'label': 'POSITIVE', 'score': 0.9996980428695679}]
-```
-
-The second line of code downloads and caches the pretrained model used by the pipeline, while the third evaluates it on the given text. Here the answer is "POSITIVE" with a confidence of 99.97%.
-
-Many tasks have a ready-made `pipeline`, in NLP but also in computer vision and speech. For example, we can easily extract the objects detected in an image:
-
-``` python
->>> import requests
->>> from PIL import Image
->>> from transformers import pipeline
-
-# Download an image with cute cats
->>> url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/coco_sample.png"
->>> image_data = requests.get(url, stream=True).raw
->>> image = Image.open(image_data)
-
-# Allocate a pipeline for object detection
->>> object_detector = pipeline('object-detection')
->>> object_detector(image)
-[{'score': 0.9982201457023621,
- 'label': 'remote',
- 'box': {'xmin': 40, 'ymin': 70, 'xmax': 175, 'ymax': 117}},
- {'score': 0.9960021376609802,
- 'label': 'remote',
- 'box': {'xmin': 333, 'ymin': 72, 'xmax': 368, 'ymax': 187}},
- {'score': 0.9954745173454285,
- 'label': 'couch',
- 'box': {'xmin': 0, 'ymin': 1, 'xmax': 639, 'ymax': 473}},
- {'score': 0.9988006353378296,
- 'label': 'cat',
- 'box': {'xmin': 13, 'ymin': 52, 'xmax': 314, 'ymax': 470}},
- {'score': 0.9986783862113953,
- 'label': 'cat',
- 'box': {'xmin': 345, 'ymin': 23, 'xmax': 640, 'ymax': 368}}]
-```
-
-Here we get a list of the objects detected in the image, with a box surrounding each object and a confidence score. On the left is the original image, on the right the predictions:
-

- - -

-
-You can learn more about the tasks supported by the `pipeline` API in [this tutorial](https://huggingface.co/docs/transformers/task_summary).
-
-In addition to `pipeline`, to download and use any of the pretrained models on your given task, all it takes is three lines of code. Here is the PyTorch version:
-```python
->>> from transformers import AutoTokenizer, AutoModel
-
->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
->>> model = AutoModel.from_pretrained("bert-base-uncased")
-
->>> inputs = tokenizer("Hello world!", return_tensors="pt")
->>> outputs = model(**inputs)
-```
-
-And here is the equivalent code for TensorFlow:
-```python
->>> from transformers import AutoTokenizer, TFAutoModel
-
->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
->>> model = TFAutoModel.from_pretrained("bert-base-uncased")
-
->>> inputs = tokenizer("Hello world!", return_tensors="tf")
->>> outputs = model(**inputs)
-```
-
-The tokenizer is responsible for all the preprocessing the pretrained model expects and can be called directly on a single string (as in the examples above) or a list. It outputs a dictionary that you can use in downstream code or simply pass directly to your model using the ** argument unpacking operator.
-
-The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) or a [TensorFlow `tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) (depending on your backend) which you can use as usual. [This tutorial](https://huggingface.co/docs/transformers/training) explains how to integrate such a model into a classic PyTorch or TensorFlow training loop, or how to use our `Trainer` API to quickly fine-tune on a new dataset (a minimal sketch of such a run is shown after the lists below).
-
-## Why should I use transformers?
-
-1. Easy-to-use state-of-the-art models:
-    - High performance on natural language understanding & generation, computer vision, and audio tasks.
-    - Low barrier to entry for educators and practitioners.
-    - Few user-facing abstractions with just three classes to learn.
-    - A unified API for using all our pretrained models.
-
-1. Lower compute costs, smaller carbon footprint:
-    - Researchers can share trained models instead of always retraining.
-    - Practitioners can reduce compute time and production costs.
-    - Dozens of architectures with over 60,000 pretrained models across all modalities.
-
-1. Choose the right framework for every part of a model's lifetime:
-    - Train state-of-the-art models in 3 lines of code.
-    - Move a single model between TF2.0/PyTorch/JAX frameworks at will.
-    - Seamlessly pick the right framework for training, evaluation, and production.
-
-1. Easily customize a model or an example to your needs:
-    - We provide examples for each architecture to reproduce the results published by its original authors.
-    - Model internals are exposed as consistently as possible.
-    - Model files can be used independently of the library for quick experiments.
-
-## Why shouldn't I use transformers?
-
-- This library is not a modular toolbox of building blocks for neural nets. The code in the model files is deliberately not refactored with additional abstractions, so that researchers can quickly iterate on each of the models without diving into extra abstractions/files.
-- The training API is not intended to work on any model; it is optimized to work with the models provided by the library. For generic machine learning loops, you should use another library (possibly [Accelerate](https://huggingface.co/docs/accelerate)).
-- While we strive to present as many use cases as possible, the scripts in our [examples folder](https://github.com/huggingface/transformers/tree/main/examples) are just that: examples. It is expected that they won't work out of the box on your specific problem and that you will need to change a few lines of code to adapt them to your needs. 
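As mentioned in the quick tour above, a model can be fine-tuned on a new dataset with the `Trainer` API. Below is a minimal sketch of such a run, not part of the original README: it assumes 🤗 Datasets is installed, and the checkpoint (`bert-base-uncased`), the dataset (a 1% slice of IMDb) and the hyperparameters are illustrative placeholders only.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Illustrative choices: any text-classification checkpoint and dataset would work similarly.
checkpoint = "bert-base-uncased"
raw_dataset = load_dataset("imdb", split="train[:1%]")

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

def tokenize(batch):
    # Truncate/pad the reviews so they can be batched together.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train_dataset = raw_dataset.map(tokenize, batched=True)

training_args = TrainingArguments(
    output_dir="trainer-sketch",           # where checkpoints and logs are written
    per_device_train_batch_size=8,
    num_train_epochs=1,
)

trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()
```

In practice you would also pass an evaluation dataset and a metrics function to `Trainer`; this sketch only shows the minimal training call.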
-
-## Installation
-
-### With pip
-
-This repository is tested on Python 3.8+, Flax 0.4.1+, PyTorch 1.10+ and TensorFlow 2.6+.
-
-You should install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're unfamiliar with Python virtual environments, check out the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
-
-First, create a virtual environment with the version of Python you're going to use and activate it.
-
-Then, you will need to install at least one backend out of Flax, PyTorch or TensorFlow.
-Please refer to the [TensorFlow installation page](https://www.tensorflow.org/install/), the [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) and/or the [Flax](https://github.com/google/flax#quick-install) and [Jax](https://github.com/google/jax#installation) installation pages regarding the specific installation command for your platform.
-
-When one of those backends has been installed, 🤗 Transformers can be installed using pip as follows:
-
-```bash
-pip install transformers
-```
-
-If you'd like to play with the examples or need the bleeding-edge code and can't wait for a new release, you must [install the library from source](https://huggingface.co/docs/transformers/installation#installing-from-source).
-
-### With conda
-
-Since Transformers version v4.0.0, we have a conda channel: `huggingface`.
-
-🤗 Transformers can be installed using conda as follows:
-
-```bash
-conda install -c huggingface transformers
-```
-
-Follow the installation pages of Flax, PyTorch or TensorFlow to see how to install them with conda.
-
-> **_NOTE:_** On Windows, you may be prompted to activate Developer Mode in order to benefit from caching. If this is not an option for you, please let us know in [this issue](https://github.com/huggingface/huggingface_hub/issues/1062).
-
-## Model architectures
-
-**[All the model checkpoints](https://huggingface.co/models)** provided by 🤗 Transformers are seamlessly integrated from the huggingface.co [model hub](https://huggingface.co/models), where they are uploaded directly by [users](https://huggingface.co/users) and [organizations](https://huggingface.co/organizations).
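As an illustration of the hub integration described above, a checkpoint uploaded by a user or an organization is referenced by its `namespace/name` identifier. This is a small sketch, using the summarization checkpoint `facebook/bart-large-cnn` purely as an example:

```python
from transformers import pipeline

# Checkpoints uploaded by users and organizations are addressed as "namespace/name".
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

text = (
    "The tower is 324 metres tall, about the same height as an 81-storey building, "
    "and the tallest structure in Paris."
)
print(summarizer(text, max_length=20, min_length=5)[0]["summary_text"])
```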
-
-Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen)
-
-🤗 Transformers currently provides the following architectures (see [here](https://huggingface.co/docs/transformers/model_summary) for a high-level summary of each of them):
-
-1. **[ALBERT](https://huggingface.co/docs/transformers/model_doc/albert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
-1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (from Google Research) released with the paper [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918) by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.
-1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (from BAAI) released with the paper [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell.
-1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (from MIT) released with the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Yuan Gong, Yu-An Chung, James Glass.
-1. **[Autoformer](https://huggingface.co/docs/transformers/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long.
-1. **[Bark](https://huggingface.co/docs/transformers/model_doc/bark)** (from Suno) released in the repository [suno-ai/bark](https://github.com/suno-ai/bark) by Suno AI team.
-1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
-1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
-1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
-1. **[BEiT](https://huggingface.co/docs/transformers/model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong, Furu Wei.
-1. 
**[BERT](https://huggingface.co/docs/transformers/model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. -1. **[BERT For Sequence Generation](https://huggingface.co/docs/transformers/model_doc/bert-generation)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn. -1. **[BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen. -1. **[BigBird-Pegasus](https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. -1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. -1. **[BioGpt](https://huggingface.co/docs/transformers/model_doc/biogpt)** (from Microsoft Research AI4Science) released with the paper [BioGPT: generative pre-trained transformer for biomedical text generation and mining](https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbac409/6713511?guestAccessKey=a66d9b5d-4f83-4017-bb52-405815c907b9) by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu. -1. **[BiT](https://huggingface.co/docs/transformers/model_doc/bit)** (from Google AI) released with the paper [Big Transfer (BiT): General Visual Representation Learning](https://arxiv.org/abs/1912.11370) by Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby. -1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston. -1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston. -1. **[BLIP](https://huggingface.co/docs/transformers/model_doc/blip)** (from Salesforce) released with the paper [BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation](https://arxiv.org/abs/2201.12086) by Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi. -1. 
**[BLIP-2](https://huggingface.co/docs/transformers/model_doc/blip-2)** (from Salesforce) released with the paper [BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models](https://arxiv.org/abs/2301.12597) by Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi. -1. **[BLOOM](https://huggingface.co/docs/transformers/model_doc/bloom)** (from BigScience workshop) released by the [BigScience Workshop](https://bigscience.huggingface.co/). -1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry. -1. **[BridgeTower](https://huggingface.co/docs/transformers/model_doc/bridgetower)** (from Harbin Institute of Technology/Microsoft Research Asia/Intel Labs) released with the paper [BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning](https://arxiv.org/abs/2206.08657) by Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan. -1. **[BROS](https://huggingface.co/docs/transformers/model_doc/bros)** (from NAVER CLOVA) released with the paper [BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents](https://arxiv.org/abs/2108.04539) by Teakgyu Hong, Donghyun Kim, Mingi Ji, Wonseok Hwang, Daehyun Nam, Sungrae Park. -1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel. -1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suรกrez*, Yoann Dupont, Laurent Romary, ร‰ric Villemonte de la Clergerie, Djamรฉ Seddah and Benoรฎt Sagot. -1. **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting. -1. **[Chinese-CLIP](https://huggingface.co/docs/transformers/model_doc/chinese_clip)** (from OFA-Sys) released with the paper [Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese](https://arxiv.org/abs/2211.01335) by An Yang, Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou, Chang Zhou. -1. **[CLAP](https://huggingface.co/docs/transformers/model_doc/clap)** (from LAION-AI) released with the paper [Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation](https://arxiv.org/abs/2211.06687) by Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov. -1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever. -1. 
**[CLIPSeg](https://huggingface.co/docs/transformers/model_doc/clipseg)** (from University of Gรถttingen) released with the paper [Image Segmentation Using Text and Image Prompts](https://arxiv.org/abs/2112.10003) by Timo Lรผddecke and Alexander Ecker. -1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong. -1. **[CodeLlama](https://huggingface.co/docs/transformers/model_doc/llama_code)** (from MetaAI) released with the paper [Code Llama: Open Foundation Models for Code](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/) by Baptiste Roziรจre, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jรฉrรฉmy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Dรฉfossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve. -1. **[Conditional DETR](https://huggingface.co/docs/transformers/model_doc/conditional_detr)** (from Microsoft Research Asia) released with the paper [Conditional DETR for Fast Training Convergence](https://arxiv.org/abs/2108.06152) by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang. -1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan. -1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie. -1. **[ConvNeXTV2](https://huggingface.co/docs/transformers/model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie. -1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun. -1. **[CPM-Ant](https://huggingface.co/docs/transformers/model_doc/cpmant)** (from OpenBMB) released by the [OpenBMB](https://www.openbmb.org/). -1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher. -1. 
-1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
-1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
-1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
-1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
-1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
-1. **[Deformable DETR](https://huggingface.co/docs/transformers/model_doc/deformable_detr)** (from SenseTime Research) released with the paper [Deformable DETR: Deformable Transformers for End-to-End Object Detection](https://arxiv.org/abs/2010.04159) by Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai.
-1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
-1. **[DePlot](https://huggingface.co/docs/transformers/model_doc/deplot)** (from Google AI) released with the paper [DePlot: One-shot visual language reasoning by plot-to-table translation](https://arxiv.org/abs/2212.10505) by Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun.
-1. **[DETA](https://huggingface.co/docs/transformers/model_doc/deta)** (from The University of Texas at Austin) released with the paper [NMS Strikes Back](https://arxiv.org/abs/2212.06137) by Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl.
-1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
-1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
-1. **[DiNAT](https://huggingface.co/docs/transformers/model_doc/dinat)** (from SHI Labs) released with the paper [Dilated Neighborhood Attention Transformer](https://arxiv.org/abs/2209.15001) by Ali Hassani and Humphrey Shi.
-1. **[DINOv2](https://huggingface.co/docs/transformers/model_doc/dinov2)** (from Meta AI) released with the paper [DINOv2: Learning Robust Visual Features without Supervision](https://arxiv.org/abs/2304.07193) by Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski.
-1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation) and a German version of DistilBERT.
-1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
-1. **[Donut](https://huggingface.co/docs/transformers/model_doc/donut)** (from NAVER), released together with the paper [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
-1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
-1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
-1. **[EfficientFormer](https://huggingface.co/docs/transformers/model_doc/efficientformer)** (from Snap Research) released with the paper [EfficientFormer: Vision Transformers at MobileNet Speed](https://arxiv.org/abs/2206.01191) by Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren.
-1. **[EfficientNet](https://huggingface.co/docs/transformers/model_doc/efficientnet)** (from Google Brain) released with the paper [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946) by Mingxing Tan, Quoc V. Le.
-1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
-1. **[EnCodec](https://huggingface.co/docs/transformers/model_doc/encodec)** (from Meta AI) released with the paper [High Fidelity Neural Audio Compression](https://arxiv.org/abs/2210.13438) by Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi.
-1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
-1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
-1. **[ErnieM](https://huggingface.co/docs/transformers/model_doc/ernie_m)** (from Baidu) released with the paper [ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora](https://arxiv.org/abs/2012.15674) by Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang.
-1. **[ESM](https://huggingface.co/docs/transformers/model_doc/esm)** (from Meta AI) are transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. **ESM-1v** was released with the paper [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648) by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. **ESM-2 and ESMFold** were released with the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
-1. **[Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)** (from Technology Innovation Institute) by Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme.
-1. **[FLAN-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5)** (from Google AI) released in the repository [google-research/t5x](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints) by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei
-1. **[FLAN-UL2](https://huggingface.co/docs/transformers/model_doc/flan-ul2)** (from Google AI) released in the repository [google-research/t5x](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-ul2-checkpoints) by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei
-1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
-1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
-1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
-1. **[FocalNet](https://huggingface.co/docs/transformers/model_doc/focalnet)** (from Microsoft Research) released with the paper [Focal Modulation Networks](https://arxiv.org/abs/2203.11926) by Jianwei Yang, Chunyuan Li, Xiyang Dai, Lu Yuan, Jianfeng Gao.
-1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
-1. **[Fuyu](https://huggingface.co/docs/transformers/model_doc/fuyu)** (from ADEPT) by Rohan Bavishi, Erich Elsen, Curtis Hawthorne, Maxwell Nye, Augustus Odena, Arushi Somani, Sağnak Taşırlar. Released in a [blog post](https://www.adept.ai/blog/fuyu-8b)
-1. **[GIT](https://huggingface.co/docs/transformers/model_doc/git)** (from Microsoft Research) released with the paper [GIT: A Generative Image-to-text Transformer for Vision and Language](https://arxiv.org/abs/2205.14100) by Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang.
-1. **[GLPN](https://huggingface.co/docs/transformers/model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
-1. **[GPT](https://huggingface.co/docs/transformers/model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
-1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
-1. **[GPT NeoX](https://huggingface.co/docs/transformers/model_doc/gpt_neox)** (from EleutherAI) released with the paper [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://arxiv.org/abs/2204.06745) by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
-1. **[GPT NeoX Japanese](https://huggingface.co/docs/transformers/model_doc/gpt_neox_japanese)** (from ABEJA) released by Shinya Otani, Takayoshi Makabe, Anuj Arora, and Kyo Hattori.
-1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
-1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
-1. **[GPT-Sw3](https://huggingface.co/docs/transformers/model_doc/gpt-sw3)** (from AI-Sweden) released with the paper [Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish](http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.376.pdf) by Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey Öhman, Fredrik Carlsson, Magnus Sahlgren.
-1. **[GPTBigCode](https://huggingface.co/docs/transformers/model_doc/gpt_bigcode)** (from BigCode) released with the paper [SantaCoder: don't reach for the stars!](https://arxiv.org/abs/2301.03988) by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra.
-1. **[GPTSAN-japanese](https://huggingface.co/docs/transformers/model_doc/gptsan-japanese)** released in the repository [tanreinama/GPTSAN](https://github.com/tanreinama/GPTSAN/blob/main/report/model.md) by Toshiyuki Sakamoto (tanreinama).
-1. **[Graphormer](https://huggingface.co/docs/transformers/model_doc/graphormer)** (from Microsoft) released with the paper [Do Transformers Really Perform Bad for Graph Representation?](https://arxiv.org/abs/2106.05234) by Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu.
-1. **[GroupViT](https://huggingface.co/docs/transformers/model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
-1. **[HerBERT](https://huggingface.co/docs/transformers/model_doc/herbert)** (from Allegro.pl, AGH University of Science and Technology) released with the paper [KLEJ: Comprehensive Benchmark for Polish Language Understanding](https://www.aclweb.org/anthology/2020.acl-main.111.pdf) by Piotr Rybak, Robert Mroczkowski, Janusz Tracz, Ireneusz Gawlik.
-1. **[Hubert](https://huggingface.co/docs/transformers/model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
-1. **[I-BERT](https://huggingface.co/docs/transformers/model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
-1. **[IDEFICS](https://huggingface.co/docs/transformers/model_doc/idefics)** (from HuggingFace) released with the paper [OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents](https://huggingface.co/papers/2306.16527) by Hugo Laurençon, Lucile Saulnier, Léo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh.
-1. **[ImageGPT](https://huggingface.co/docs/transformers/model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
-1. **[Informer](https://huggingface.co/docs/transformers/model_doc/informer)** (from Beihang University, UC Berkeley, Rutgers University, SEDD Company) released with the paper [Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting](https://arxiv.org/abs/2012.07436) by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang.
-1. **[InstructBLIP](https://huggingface.co/docs/transformers/model_doc/instructblip)** (from Salesforce) released with the paper [InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning](https://arxiv.org/abs/2305.06500) by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi.
-1. **[Jukebox](https://huggingface.co/docs/transformers/model_doc/jukebox)** (from OpenAI) released with the paper [Jukebox: A Generative Model for Music](https://arxiv.org/pdf/2005.00341.pdf) by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever.
-1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
-1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
-1. **[LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)** (from Microsoft Research Asia) released with the paper [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
-1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutxlm)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
-1. **[LED](https://huggingface.co/docs/transformers/model_doc/led)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
-1. **[LeViT](https://huggingface.co/docs/transformers/model_doc/levit)** (from Meta AI) released with the paper [LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference](https://arxiv.org/abs/2104.01136) by Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze.
-1. **[LiLT](https://huggingface.co/docs/transformers/model_doc/lilt)** (from South China University of Technology) released with the paper [LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding](https://arxiv.org/abs/2202.13669) by Jiapeng Wang, Lianwen Jin, Kai Ding.
-1. **[LLaMA](https://huggingface.co/docs/transformers/model_doc/llama)** (from The FAIR team of Meta AI) released with the paper [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971) by Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample.
-1. **[Llama2](https://huggingface.co/docs/transformers/model_doc/llama2)** (from The FAIR team of Meta AI) released with the paper [Llama2: Open Foundation and Fine-Tuned Chat Models](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/XXX) by Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom.
-1. **[Longformer](https://huggingface.co/docs/transformers/model_doc/longformer)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
-1. **[LongT5](https://huggingface.co/docs/transformers/model_doc/longt5)** (from Google AI) released with the paper [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916) by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.
-1. **[LUKE](https://huggingface.co/docs/transformers/model_doc/luke)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
-1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
-1. **[M-CTC-T](https://huggingface.co/docs/transformers/model_doc/mctct)** (from Facebook) released with the paper [Pseudo-Labeling For Massively Multilingual Speech Recognition](https://arxiv.org/abs/2111.00161) by Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert.
-1. **[M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
-1. **[MADLAD-400](https://huggingface.co/docs/transformers/model_doc/madlad-400)** (from Google) released with the paper [MADLAD-400: A Multilingual And Document-Level Large Audited Dataset](https://arxiv.org/abs/2309.04662) by Sneha Kudugunta, Isaac Caswell, Biao Zhang, Xavier Garcia, Christopher A. Choquette-Choo, Katherine Lee, Derrick Xin, Aditya Kusupati, Romi Stella, Ankur Bapna, Orhan Firat.
-1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
-1. **[MarkupLM](https://huggingface.co/docs/transformers/model_doc/markuplm)** (from Microsoft Research Asia) released with the paper [MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding](https://arxiv.org/abs/2110.08518) by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei.
-1. **[Mask2Former](https://huggingface.co/docs/transformers/model_doc/mask2former)** (from FAIR and UIUC) released with the paper [Masked-attention Mask Transformer for Universal Image Segmentation](https://arxiv.org/abs/2112.01527) by Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar.
-1. **[MaskFormer](https://huggingface.co/docs/transformers/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
-1. **[MatCha](https://huggingface.co/docs/transformers/model_doc/matcha)** (from Google AI) released with the paper [MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering](https://arxiv.org/abs/2212.09662) by Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, Julian Martin Eisenschlos.
-1. **[mBART](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
-1. **[mBART-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
-1. **[MEGA](https://huggingface.co/docs/transformers/model_doc/mega)** (from Meta/USC/CMU/SJTU) released with the paper [Mega: Moving Average Equipped Gated Attention](https://arxiv.org/abs/2209.10655) by Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, and Luke Zettlemoyer.
-1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron-bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
-1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
-1. **[MGP-STR](https://huggingface.co/docs/transformers/model_doc/mgp-str)** (from Alibaba Research) released with the paper [Multi-Granularity Prediction for Scene Text Recognition](https://arxiv.org/abs/2209.03592) by Peng Wang, Cheng Da, and Cong Yao.
-1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
-1. **[MMS](https://huggingface.co/docs/transformers/model_doc/mms)** (from Facebook) released with the paper [Scaling Speech Technology to 1,000+ Languages](https://arxiv.org/abs/2305.13516) by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli.
-1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
-1. **[MobileNetV1](https://huggingface.co/docs/transformers/model_doc/mobilenet_v1)** (from Google Inc.) released with the paper [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861) by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam.
-1. **[MobileNetV2](https://huggingface.co/docs/transformers/model_doc/mobilenet_v2)** (from Google Inc.) released with the paper [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381) by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen.
-1. **[MobileViT](https://huggingface.co/docs/transformers/model_doc/mobilevit)** (from Apple) released with the paper [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/abs/2110.02178) by Sachin Mehta and Mohammad Rastegari.
-1. **[MobileViTV2](https://huggingface.co/docs/transformers/model_doc/mobilevitv2)** (from Apple) released with the paper [Separable Self-attention for Mobile Vision Transformers](https://arxiv.org/abs/2206.02680) by Sachin Mehta and Mohammad Rastegari.
-1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
-1. **[MPT](https://huggingface.co/docs/transformers/model_doc/mpt)** (from MosaicML) released with the repository [llm-foundry](https://github.com/mosaicml/llm-foundry/) by the MosaicML NLP Team.
-1. **[MRA](https://huggingface.co/docs/transformers/model_doc/mra)** (from the University of Wisconsin - Madison) released with the paper [Multi Resolution Analysis (MRA) for Approximate Self-Attention](https://arxiv.org/abs/2207.10284) by Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, Vikas Singh.
-1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
-1. **[MusicGen](https://huggingface.co/docs/transformers/model_doc/musicgen)** (from Meta) released with the paper [Simple and Controllable Music Generation](https://arxiv.org/abs/2306.05284) by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Défossez.
-1. **[MVP](https://huggingface.co/docs/transformers/model_doc/mvp)** (from RUC AI Box) released with the paper [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https://arxiv.org/abs/2206.12131) by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen.
-1. **[NAT](https://huggingface.co/docs/transformers/model_doc/nat)** (from SHI Labs) released with the paper [Neighborhood Attention Transformer](https://arxiv.org/abs/2204.07143) by Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi.
-1. **[Nezha](https://huggingface.co/docs/transformers/model_doc/nezha)** (from Huawei Noah's Ark Lab) released with the paper [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) by Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu.
-1. **[NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team.
-1. **[NLLB-MOE](https://huggingface.co/docs/transformers/model_doc/nllb-moe)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team.
-1. **[Nyströmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
-1. **[OneFormer](https://huggingface.co/docs/transformers/model_doc/oneformer)** (from SHI Labs) released with the paper [OneFormer: One Transformer to Rule Universal Image Segmentation](https://arxiv.org/abs/2211.06220) by Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi.
-1. **[OpenLlama](https://huggingface.co/docs/transformers/model_doc/open-llama)** (from [s-JoL](https://huggingface.co/s-JoL)) released on GitHub (now removed).
-1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
-1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
-1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
-1. **[PEGASUS-X](https://huggingface.co/docs/transformers/model_doc/pegasus_x)** (from Google) released with the paper [Investigating Efficiently Extending Transformers for Long Input Summarization](https://arxiv.org/abs/2208.04347) by Jason Phang, Yao Zhao, and Peter J. Liu.
-1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
-1. **[Persimmon](https://huggingface.co/docs/transformers/main/model_doc/persimmon)** (from ADEPT) released in a [blog post](https://www.adept.ai/blog/persimmon-8b) by Erich Elsen, Augustus Odena, Maxwell Nye, Sağnak Taşırlar, Tri Dao, Curtis Hawthorne, Deepak Moparthi, Arushi Somani.
-1. **[Phi](https://huggingface.co/docs/main/transformers/model_doc/phi)** (from Microsoft Research) released with the papers - [Textbooks Are All You Need](https://arxiv.org/abs/2306.11644) by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li, [Textbooks Are All You Need II: phi-1.5 technical report](https://arxiv.org/abs/2309.05463) by Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee.
-1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
-1. **[Pix2Struct](https://huggingface.co/docs/transformers/model_doc/pix2struct)** (from Google) released with the paper [Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding](https://arxiv.org/abs/2210.03347) by Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova.
-1. **[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
-1. **[PoolFormer](https://huggingface.co/docs/transformers/model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng.
-1. **[Pop2Piano](https://huggingface.co/docs/transformers/model_doc/pop2piano)** released with the paper [Pop2Piano : Pop Audio-based Piano Cover Generation](https://arxiv.org/abs/2211.00895) by Jongho Choi and Kyogu Lee.
-1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
-1. **[PVT](https://huggingface.co/docs/transformers/model_doc/pvt)** (from Nanjing University, The University of Hong Kong etc.) released with the paper [Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions](https://arxiv.org/pdf/2102.12122.pdf) by Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao.
-1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
-1. **[RAG](https://huggingface.co/docs/transformers/model_doc/rag)** (from Facebook) released with the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
-1. **[REALM](https://huggingface.co/docs/transformers/model_doc/realm.html)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
-1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
-1. **[RegNet](https://huggingface.co/docs/transformers/model_doc/regnet)** (from META Platforms) released with the paper [Designing Network Design Spaces](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
-1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/abs/2010.12821) by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
-1. **[ResNet](https://huggingface.co/docs/transformers/model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
-1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
-1. **[RoBERTa-PreLayerNorm](https://huggingface.co/docs/transformers/model_doc/roberta-prelayernorm)** (from Facebook) released with the paper [fairseq: A Fast, Extensible Toolkit for Sequence Modeling](https://arxiv.org/abs/1904.01038) by Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli.
-1. **[RoCBert](https://huggingface.co/docs/transformers/model_doc/roc_bert)** (from WeChatAI) released with the paper [RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining](https://aclanthology.org/2022.acl-long.65.pdf) by HuiSu, WeiweiShi, XiaoyuShen, XiaoZhou, TuoJi, JiaruiFang, JieZhou.
-1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
-1. **[RWKV](https://huggingface.co/docs/transformers/model_doc/rwkv)** (from Bo Peng), released on [this repo](https://github.com/BlinkDL/RWKV-LM) by Bo Peng.
-1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
-1. **[Segment Anything](https://huggingface.co/docs/transformers/model_doc/sam)** (from Meta AI) released with the paper [Segment Anything](https://arxiv.org/pdf/2304.02643v1.pdf) by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick.
-1. **[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
-1. **[SEW-D](https://huggingface.co/docs/transformers/model_doc/sew_d)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
-1. **[SpeechT5](https://huggingface.co/docs/transformers/model_doc/speecht5)** (from Microsoft Research) released with the paper [SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing](https://arxiv.org/abs/2110.07205) by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei.
-1. **[SpeechToTextTransformer](https://huggingface.co/docs/transformers/model_doc/speech_to_text)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
-1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (from Facebook), released together with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
-1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (from Tel Aviv University), released together with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
-1. **[SqueezeBERT](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
-1. **[SwiftFormer](https://huggingface.co/docs/transformers/model_doc/swiftformer)** (from MBZUAI) released with the paper [SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications](https://arxiv.org/abs/2303.15446) by Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan.
-1. **[Swin Transformer](https://huggingface.co/docs/transformers/model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
-1. **[Swin Transformer V2](https://huggingface.co/docs/transformers/model_doc/swinv2)** (from Microsoft) released with the paper [Swin Transformer V2: Scaling Up Capacity and Resolution](https://arxiv.org/abs/2111.09883) by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.
-1. **[Swin2SR](https://huggingface.co/docs/transformers/model_doc/swin2sr)** (from University of Würzburg) released with the paper [Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration](https://arxiv.org/abs/2209.11345) by Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte.
-1. **[SwitchTransformers](https://huggingface.co/docs/transformers/model_doc/switch_transformers)** (from Google) released with the paper [Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity](https://arxiv.org/abs/2101.03961) by William Fedus, Barret Zoph, Noam Shazeer.
-1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
-1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
-1. **[Table Transformer](https://huggingface.co/docs/transformers/model_doc/table-transformer)** (from Microsoft Research) released with the paper [PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents](https://arxiv.org/abs/2110.00061) by Brandon Smock, Rohith Pesala, Robin Abraham.
-1. **[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
-1. **[TAPEX](https://huggingface.co/docs/transformers/model_doc/tapex)** (from Microsoft Research) released with the paper [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou.
-1. **[Time Series Transformer](https://huggingface.co/docs/transformers/model_doc/time_series_transformer)** (from HuggingFace).
-1. **[TimeSformer](https://huggingface.co/docs/transformers/model_doc/timesformer)** (from Facebook) released with the paper [Is Space-Time Attention All You Need for Video Understanding?](https://arxiv.org/abs/2102.05095) by Gedas Bertasius, Heng Wang, Lorenzo Torresani.
-1. **[Trajectory Transformer](https://huggingface.co/docs/transformers/model_doc/trajectory_transformers)** (from the University of California at Berkeley) released with the paper [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https://arxiv.org/abs/2106.02039) by Michael Janner, Qiyang Li, Sergey Levine
-1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
-1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
-1. **[TVLT](https://huggingface.co/docs/transformers/model_doc/tvlt)** (from UNC Chapel Hill) released with the paper [TVLT: Textless Vision-Language Transformer](https://arxiv.org/abs/2209.14156) by Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal.
-1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler
-1. **[UMT5](https://huggingface.co/docs/transformers/model_doc/umt5)** (from Google Research) released with the paper [UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining](https://openreview.net/forum?id=kXwdL1cWOAi) by Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant.
-1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
-1. **[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
-1. **[UPerNet](https://huggingface.co/docs/transformers/model_doc/upernet)** (from Peking University) released with the paper [Unified Perceptual Parsing for Scene Understanding](https://arxiv.org/abs/1807.10221) by Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun.
-1. **[VAN](https://huggingface.co/docs/transformers/model_doc/van)** (from Tsinghua University and Nankai University) released with the paper [Visual Attention Network](https://arxiv.org/abs/2202.09741) by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.
-1. **[VideoMAE](https://huggingface.co/docs/transformers/model_doc/videomae)** (from Multimedia Computing Group, Nanjing University) released with the paper [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) by Zhan Tong, Yibing Song, Jue Wang, Limin Wang.
-1. **[ViLT](https://huggingface.co/docs/transformers/model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim.
-1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
-1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
-1. **[ViT Hybrid](https://huggingface.co/docs/transformers/model_doc/vit_hybrid)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
-1. **[VitDet](https://huggingface.co/docs/transformers/model_doc/vitdet)** (from Meta AI) released with the paper [Exploring Plain Vision Transformer Backbones for Object Detection](https://arxiv.org/abs/2203.16527) by Yanghao Li, Hanzi Mao, Ross Girshick, Kaiming He.
**[ViTMAE](https://huggingface.co/docs/transformers/model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick. -1. **[ViTMatte](https://huggingface.co/docs/transformers/main/model_doc/vitmatte)** (from HUST-VL) released with the paper [ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers](https://arxiv.org/abs/2305.15272) by Jingfeng Yao, Xinggang Wang, Shusheng Yang, Baoyuan Wang. -1. **[ViTMSN](https://huggingface.co/docs/transformers/model_doc/vit_msn)** (from Meta AI) released with the paper [Masked Siamese Networks for Label-Efficient Learning](https://arxiv.org/abs/2204.07141) by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas. -1. **[VITS](https://huggingface.co/docs/transformers/model_doc/vits)** (from Kakao Enterprise) released with the paper [Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech](https://arxiv.org/abs/2106.06103) by Jaehyeon Kim, Jungil Kong, Juhee Son. -1. **[ViViT](https://huggingface.co/docs/transformers/model_doc/vivit)** (from Google Research) released with the paper [ViViT: A Video Vision Transformer](https://arxiv.org/abs/2103.15691) by Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, Cordelia Schmid. -1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli. -1. **[Wav2Vec2-Conformer](https://huggingface.co/docs/transformers/model_doc/wav2vec2-conformer)** (from Facebook AI) released with the paper [FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino. -1. **[Wav2Vec2Phoneme](https://huggingface.co/docs/transformers/model_doc/wav2vec2_phoneme)** (from Facebook AI) released with the paper [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https://arxiv.org/abs/2109.11680) by Qiantong Xu, Alexei Baevski, Michael Auli. -1. **[WavLM](https://huggingface.co/docs/transformers/model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei. -1. **[Whisper](https://huggingface.co/docs/transformers/model_doc/whisper)** (from OpenAI) released with the paper [Robust Speech Recognition via Large-Scale Weak Supervision](https://cdn.openai.com/papers/whisper.pdf) by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever. -1. **[X-CLIP](https://huggingface.co/docs/transformers/model_doc/xclip)** (from Microsoft Research) released with the paper [Expanding Language-Image Pretrained Models for General Video Recognition](https://arxiv.org/abs/2208.02816) by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling. -1.
**[X-MOD](https://huggingface.co/docs/transformers/model_doc/xmod)** (from Meta AI) released with the paper [Lifting the Curse of Multilinguality by Pre-training Modular Transformers](http://dx.doi.org/10.18653/v1/2022.naacl-main.255) by Jonas Pfeiffer, Naman Goyal, Xi Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe. -1. **[XGLM](https://huggingface.co/docs/transformers/model_doc/xglm)** (From Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li. -1. **[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau. -1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou. -1. **[XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlm-roberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmรกn, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. -1. **[XLM-RoBERTa-XL](https://huggingface.co/docs/transformers/model_doc/xlm-roberta-xl)** (from Facebook AI), released together with the paper [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau. -1. **[XLM-V](https://huggingface.co/docs/transformers/model_doc/xlm-v)** (from Meta AI) released with the paper [XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models](https://arxiv.org/abs/2301.10472) by Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa. -1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (from Google/CMU) released with the paper [โ€‹XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le. -1. **[XLS-R](https://huggingface.co/docs/transformers/model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli. -1. 
**[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli. -1. **[YOLOS](https://huggingface.co/docs/transformers/model_doc/yolos)** (from Huazhong University of Science & Technology) released with the paper [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu. -1. **[YOSO](https://huggingface.co/docs/transformers/model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh. -1. Want to contribute a new model? We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedback before starting your PR. - -To check whether each model has an implementation in Flax, PyTorch or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to [this table](https://huggingface.co/docs/transformers/index#supported-frameworks). - -These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations. You can find more details on performance in the Examples section of the [documentation](https://github.com/huggingface/transformers/tree/main/examples). - - -## Learn more - -| Section | Description | -|-|-| -| [Documentation](https://huggingface.co/docs/transformers/) | Full API documentation and guides | -| [Task summary](https://huggingface.co/docs/transformers/task_summary) | Tasks supported by 🤗 Transformers | -| [Preprocessing tutorial](https://huggingface.co/docs/transformers/preprocessing) | Using the `Tokenizer` class to prepare data for the models | -| [Training and fine-tuning](https://huggingface.co/docs/transformers/training) | Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop and the `Trainer` API.
| -| [Quick tour: Fine-tuning/usage scripts](https://github.com/huggingface/transformers/tree/main/examples) | Example scripts for fine-tuning models on a wide range of tasks | -| [Model sharing and uploading](https://huggingface.co/docs/transformers/model_sharing) | Upload and share your fine-tuned models with the community | - -## Citation - -We now have a [paper](https://www.aclweb.org/anthology/2020.emnlp-demos.6/) you can cite for the 🤗 Transformers library: -```bibtex -@inproceedings{wolf-etal-2020-transformers, - title = "Transformers: State-of-the-Art Natural Language Processing", - author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush", - booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations", - month = oct, - year = "2020", - address = "Online", - publisher = "Association for Computational Linguistics", - url = "https://www.aclweb.org/anthology/2020.emnlp-demos.6", - pages = "38--45" -} -``` diff --git a/README_te.md b/README_te.md deleted file mode 100644 index 829e27690f9683..00000000000000 --- a/README_te.md +++ /dev/null @@ -1,558 +0,0 @@ - -

-<!-- README_te.md header: Hugging Face Transformers Library logo; badges: Build | GitHub | Documentation | GitHub release | Contributor Covenant | DOI; language links: English | 简体中文 | 繁體中文 | 한국어 | Español | 日本語 | हिन्दी | Русский | Português | తెలుగు; tagline: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow -->
- -๐Ÿค— เฐŸเฑเฐฐเฐพเฐจเฑเฐธเฑโ€Œเฐซเฐพเฐฐเฑเฐฎเฐฐเฑเฐฒเฑ เฐŸเฑ†เฐ•เฑเฐธเฑเฐŸเฑ, เฐตเฐฟเฐœเฐจเฑ เฐฎเฐฐเฐฟเฐฏเฑ เฐ†เฐกเฐฟเฐฏเฑ‹ เฐตเฐ‚เฐŸเฐฟ เฐตเฐฟเฐญเฐฟเฐจเฑเฐจ เฐชเฐฆเฑเฐงเฐคเฑเฐฒเฐชเฑˆ เฐŸเฐพเฐธเฑเฐ•เฑโ€Œเฐฒเฐจเฑ เฐจเฐฟเฐฐเฑเฐตเฐนเฐฟเฐ‚เฐšเฐกเฐพเฐจเฐฟเฐ•เฐฟ เฐตเฑ‡เฐฒเฐพเฐฆเฐฟ เฐฎเฑเฐ‚เฐฆเฑเฐ—เฐพ เฐถเฐฟเฐ•เฑเฐทเฐฃ เฐชเฑŠเฐ‚เฐฆเฐฟเฐจ เฐฎเฑ‹เฐกเฐฒเฑโ€Œเฐฒเฐจเฑ เฐ…เฐ‚เฐฆเฐฟเฐธเฑเฐคเฐพเฐฏเฐฟ. - -เฐˆ เฐจเฐฎเฑ‚เฐจเฐพเฐฒเฑ เฐตเฐฐเฑเฐคเฐฟเฐ‚เฐšเฐตเฐšเฑเฐšเฑ: - -* ๐Ÿ“ เฐŸเฑ†เฐ•เฑเฐธเฑเฐŸเฑ, 100เฐ•เฐฟ เฐชเฑˆเฐ—เฐพ เฐญเฐพเฐทเฐฒเฑเฐฒเฑ‹ เฐŸเฑ†เฐ•เฑเฐธเฑเฐŸเฑ เฐ•เฑเฐฒเฐพเฐธเฐฟเฐซเฐฟเฐ•เฑ‡เฐทเฐจเฑ, เฐ‡เฐจเฑเฐซเฐฐเฑเฐฎเฑ‡เฐทเฐจเฑ เฐŽเฐ•เฑเฐธเฑโ€ŒเฐŸเฑเฐฐเฐพเฐ•เฑเฐทเฐจเฑ, เฐชเฑเฐฐเฐถเฑเฐจเฐฒเฐ•เฑ เฐธเฐฎเฐพเฐงเฐพเฐจเฐพเฐฒเฑ, เฐธเฐพเฐฐเฐพเฐ‚เฐถเฐ‚, เฐ…เฐจเฑเฐตเฐพเฐฆเฐ‚, เฐŸเฑ†เฐ•เฑเฐธเฑเฐŸเฑ เฐœเฐจเฐฐเฑ‡เฐทเฐจเฑ เฐตเฐ‚เฐŸเฐฟ เฐชเฐจเฑเฐฒ เฐ•เฑ‹เฐธเฐ‚. -* ๐Ÿ–ผ๏ธ เฐ‡เฐฎเฑ‡เฐœเฑโ€Œเฐฒเฑ, เฐ‡เฐฎเฑ‡เฐœเฑ เฐตเฐฐเฑเฐ—เฑ€เฐ•เฐฐเฐฃ, เฐ†เฐฌเฑเฐœเฑ†เฐ•เฑเฐŸเฑ เฐกเฐฟเฐŸเฑ†เฐ•เฑเฐทเฐจเฑ เฐฎเฐฐเฐฟเฐฏเฑ เฐธเฑ†เฐ—เฑเฐฎเฑ†เฐ‚เฐŸเฑ‡เฐทเฐจเฑ เฐตเฐ‚เฐŸเฐฟ เฐชเฐจเฑเฐฒ เฐ•เฑ‹เฐธเฐ‚. -* ๐Ÿ—ฃ๏ธ เฐ†เฐกเฐฟเฐฏเฑ‹, เฐธเฑเฐชเฑ€เฐšเฑ เฐฐเฐฟเฐ•เฐ—เฑเฐจเฐฟเฐทเฐจเฑ เฐฎเฐฐเฐฟเฐฏเฑ เฐ†เฐกเฐฟเฐฏเฑ‹ เฐตเฐฐเฑเฐ—เฑ€เฐ•เฐฐเฐฃ เฐตเฐ‚เฐŸเฐฟ เฐชเฐจเฑเฐฒ เฐ•เฑ‹เฐธเฐ‚. - -เฐŸเฑเฐฐเฐพเฐจเฑเฐธเฑโ€Œเฐซเฐพเฐฐเฑเฐฎเฐฐเฑ เฐฎเฑ‹เฐกเฐฒเฑโ€Œเฐฒเฑ เฐŸเฑ‡เฐฌเฑเฐฒเฑ เฐ•เฑเฐตเฐถเฑเฐšเฐจเฑ เฐ†เฐจเฑเฐธเฐฐเฑ เฐšเฑ‡เฐฏเฐกเฐ‚, เฐ†เฐชเฑเฐŸเฐฟเฐ•เฐฒเฑ เฐ•เฑเฐฏเฐพเฐฐเฑ†เฐ•เฑเฐŸเฐฐเฑ เฐฐเฐฟเฐ•เฐ—เฑเฐจเฐฟเฐทเฐจเฑ, เฐธเฑเฐ•เฐพเฐจเฑ เฐšเฑ‡เฐธเฐฟเฐจ เฐกเฐพเฐ•เฑเฐฏเฑเฐฎเฑ†เฐ‚เฐŸเฑโ€Œเฐฒ เฐจเฑเฐ‚เฐกเฐฟ เฐ‡เฐจเฑเฐซเฐฐเฑเฐฎเฑ‡เฐทเฐจเฑ เฐŽเฐ•เฑเฐธเฑโ€ŒเฐŸเฑเฐฐเฐพเฐ•เฑเฐทเฐจเฑ, เฐตเฑ€เฐกเฐฟเฐฏเฑ‹ เฐ•เฑเฐฒเฐพเฐธเฐฟเฐซเฐฟเฐ•เฑ‡เฐทเฐจเฑ เฐฎเฐฐเฐฟเฐฏเฑ เฐตเฐฟเฐœเฑเฐตเฐฒเฑ เฐ•เฑเฐตเฐถเฑเฐšเฐจเฑ เฐ†เฐจเฑเฐธเฐฐเฑ เฐšเฑ‡เฐฏเฐกเฐ‚ เฐตเฐ‚เฐŸเฐฟ **เฐ…เฐจเฑ‡เฐ• เฐชเฐฆเฑเฐงเฐคเฑเฐฒเฐคเฑ‹ เฐ•เฐฒเฐฟเฐชเฐฟ** เฐชเฐจเฑเฐฒเฐจเฑ เฐ•เฑ‚เฐกเฐพ เฐšเฑ‡เฐฏเฐ—เฐฒเฐตเฑ. - -๐Ÿค— เฐŸเฑเฐฐเฐพเฐจเฑเฐธเฑโ€Œเฐซเฐพเฐฐเฑเฐฎเฐฐเฑเฐฒเฑ เฐ…เฐ‚เฐฆเฐฟเฐ‚เฐšเฐฟเฐจ เฐŸเฑ†เฐ•เฑเฐธเฑเฐŸเฑโ€Œเฐฒเฑ‹ เฐชเฑเฐฐเฑ€เฐŸเฑเฐฐเฑˆเฐจเฑเฐกเฑ เฐฎเฑ‹เฐกเฐฒเฑโ€Œเฐฒเฐจเฑ เฐคเฑเฐตเฐฐเฐ—เฐพ เฐกเฑŒเฐจเฑโ€Œเฐฒเฑ‹เฐกเฑ เฐšเฑ‡เฐฏเฐกเฐพเฐจเฐฟเฐ•เฐฟ เฐฎเฐฐเฐฟเฐฏเฑ เฐ‰เฐชเฐฏเฑ‹เฐ—เฐฟเฐ‚เฐšเฐกเฐพเฐจเฐฟเฐ•เฐฟ, เฐตเฐพเฐŸเฐฟเฐจเฐฟ เฐฎเฑ€ เฐธเฑเฐตเฐ‚เฐค เฐกเฑ‡เฐŸเฐพเฐธเฑ†เฐŸเฑโ€Œเฐฒเฐฒเฑ‹ เฐซเฑˆเฐจเฑ-เฐŸเฑเฐฏเฑ‚เฐจเฑ เฐšเฑ‡เฐฏเฐกเฐพเฐจเฐฟเฐ•เฐฟ เฐฎเฐฐเฐฟเฐฏเฑ เฐตเฐพเฐŸเฐฟเฐจเฐฟ เฐฎเฐพ [เฐฎเฑ‹เฐกเฐฒเฑ เฐนเฐฌเฑ](https://huggingface.co/models)เฐฒเฑ‹ เฐธเฐ‚เฐ˜เฐ‚เฐคเฑ‹ เฐญเฐพเฐ—เฐธเฑเฐตเฐพเฐฎเฑเฐฏเฐ‚ เฐšเฑ‡เฐฏเฐกเฐพเฐจเฐฟเฐ•เฐฟ API เฐฒเฐจเฑ เฐ…เฐ‚เฐฆเฐฟเฐธเฑเฐคเฑเฐ‚เฐฆเฐฟ. เฐ…เฐฆเฑ‡ เฐธเฐฎเฐฏเฐ‚เฐฒเฑ‹, เฐ†เฐฐเฑเฐ•เฐฟเฐŸเฑ†เฐ•เฑเฐšเฐฐเฑโ€Œเฐจเฐฟ เฐจเฐฟเฐฐเฑเฐตเฐšเฐฟเฐ‚เฐšเฑ‡ เฐชเฑเฐฐเฐคเฐฟ เฐชเฑˆเฐฅเฐพเฐจเฑ เฐฎเฐพเฐกเฑเฐฏเฑ‚เฐฒเฑ เฐชเฑ‚เฐฐเฑเฐคเฐฟเฐ—เฐพ เฐธเฑเฐตเฐคเฐ‚เฐคเฑเฐฐเฐ‚เฐ—เฐพ เฐ‰เฐ‚เฐŸเฑเฐ‚เฐฆเฐฟ เฐฎเฐฐเฐฟเฐฏเฑ เฐคเฑเฐตเฐฐเฐฟเฐค เฐชเฐฐเฐฟเฐถเฑ‹เฐงเฐจ เฐชเฑเฐฐเฐฏเฑ‹เฐ—เฐพเฐฒเฐจเฑ เฐชเฑเฐฐเฐพเฐฐเฐ‚เฐญเฐฟเฐ‚เฐšเฐกเฐพเฐจเฐฟเฐ•เฐฟ เฐธเฐตเฐฐเฐฟเฐ‚เฐšเฐตเฐšเฑเฐšเฑ. 
- -๐Ÿค— เฐŸเฑเฐฐเฐพเฐจเฑเฐธเฑโ€Œเฐซเฐพเฐฐเฑเฐฎเฐฐเฑโ€Œเฐฒเฐ•เฑ เฐฎเฑ‚เฐกเฑ เฐ…เฐคเฑเฐฏเฐ‚เฐค เฐชเฑเฐฐเฐœเฐพเฐฆเฐฐเฐฃ เฐชเฑŠเฐ‚เฐฆเฐฟเฐจ เฐกเฑ€เฐชเฑ เฐฒเฑ†เฐฐเฑเฐจเฐฟเฐ‚เฐ—เฑ เฐฒเฑˆเฐฌเฑเฐฐเฐฐเฑ€เฐฒเฑ เฐ‰เฐจเฑเฐจเฐพเฐฏเฐฟ โ€” [Jax](https://jax.readthedocs.io/en/latest/), [PyTorch](https://pytorch.org/) เฐฎเฐฐเฐฟเฐฏเฑ [TensorFlow](https://www.tensorflow.org/) โ€” เฐตเฐพเฐŸเฐฟ เฐฎเฐงเฑเฐฏ เฐ…เฐคเฑเฐ•เฑเฐฒเฑ เฐฒเฑ‡เฐจเฐฟ เฐเฐ•เฑ€เฐ•เฐฐเฐฃเฐคเฑ‹. เฐฎเฑ€ เฐฎเฑ‹เฐกเฐฒเฑโ€Œเฐฒเฐจเฑ เฐ’เฐ•เฐฆเฐพเฐจเฐฟเฐคเฑ‹ เฐฎเฐฐเฑŠเฐ•เฐฆเฐพเฐจเฐฟเฐคเฑ‹ เฐ…เฐจเฑเฐฎเฐฟเฐคเฐฟ เฐ•เฑ‹เฐธเฐ‚ เฐฒเฑ‹เฐกเฑ เฐšเฑ‡เฐธเฑ‡ เฐฎเฑเฐ‚เฐฆเฑ เฐตเฐพเฐŸเฐฟเฐ•เฐฟ เฐถเฐฟเฐ•เฑเฐทเฐฃ เฐ‡เฐตเฑเฐตเฐกเฐ‚ เฐšเฐพเฐฒเฐพ เฐธเฑเฐฒเฐญเฐ‚. - -## เฐ†เฐจเฑโ€Œเฐฒเฑˆเฐจเฑ เฐกเฑ†เฐฎเฑ‹เฐฒเฑ - -เฐฎเฑ€เฐฐเฑ [เฐฎเฑ‹เฐกเฐฒเฑ เฐนเฐฌเฑ](https://huggingface.co/models) เฐจเฑเฐ‚เฐกเฐฟ เฐฎเฐพ เฐฎเฑ‹เฐกเฐณเฑเฐฒเฐฒเฑ‹ เฐšเฐพเฐฒเฐพ เฐตเฐฐเฐ•เฑ เฐตเฐพเฐŸเฐฟ เฐชเฑ‡เฐœเฑ€เฐฒเฐฒเฑ‹ เฐจเฑ‡เฐฐเฑเฐ—เฐพ เฐชเฐฐเฑ€เฐ•เฑเฐทเฐฟเฐ‚เฐšเฐตเฐšเฑเฐšเฑ. เฐฎเฑ‡เฐฎเฑ เฐชเฐฌเฑเฐฒเฐฟเฐ•เฑ เฐฎเฐฐเฐฟเฐฏเฑ เฐชเฑเฐฐเฑˆเฐตเฑ‡เฐŸเฑ เฐฎเฑ‹เฐกเฐฒเฑโ€Œเฐฒ เฐ•เฑ‹เฐธเฐ‚ [เฐชเฑเฐฐเฑˆเฐตเฑ‡เฐŸเฑ เฐฎเฑ‹เฐกเฐฒเฑ เฐนเฑ‹เฐธเฑเฐŸเฐฟเฐ‚เฐ—เฑ, เฐธเฐ‚เฐธเฑเฐ•เฐฐเฐฃ & เฐ…เฐจเฑเฐฎเฐฟเฐคเฐฟ API](https://huggingface.co/pricing)เฐจเฐฟ เฐ•เฑ‚เฐกเฐพ เฐ…เฐ‚เฐฆเฐฟเฐธเฑเฐคเฐพเฐฎเฑ. - -เฐ‡เฐ•เฑเฐ•เฐก เฐ•เฑŠเฐจเฑเฐจเฐฟ เฐ‰เฐฆเฐพเฐนเฐฐเฐฃเฐฒเฑ เฐ‰เฐจเฑเฐจเฐพเฐฏเฐฟ: - -เฐธเฐนเฐœ เฐญเฐพเฐทเฐพ เฐชเฑเฐฐเฐพเฐธเฑ†เฐธเฐฟเฐ‚เฐ—เฑโ€Œเฐฒเฑ‹: -- [BERT เฐคเฑ‹ เฐฎเฐพเฐธเฑเฐ•เฑโ€Œเฐกเฑ เฐตเฐฐเฑเฐกเฑ เฐ•เฐ‚เฐชเฑเฐฒเฑ€เฐทเฐจเฑ](https://huggingface.co/bert-base-uncased?text=Paris+is+the+%5BMASK%5D+of+France) -- [Electra เฐคเฑ‹ เฐชเฑ‡เฐฐเฑ เฐŽเฐ‚เฐŸเฐฟเฐŸเฑ€ เฐ—เฑเฐฐเฑเฐคเฐฟเฐ‚เฐชเฑ](https://huggingface.co/dbmdz/electra-large-discriminator-finetuned-conll03-english?text=My+name+is+Sarah+and+I+live+in+London+city) -- [GPT-2 เฐคเฑ‹ เฐŸเฑ†เฐ•เฑเฐธเฑเฐŸเฑ เฐœเฐจเฐฐเฑ‡เฐทเฐจเฑ](https://huggingface.co/gpt2?text=A+long+time+ago%2C+) -- [RoBERTa เฐคเฑ‹ เฐธเฐนเฐœ เฐญเฐพเฐทเฐพ เฐ…เฐจเฑเฐฎเฐฟเฐคเฐฟ](https://huggingface.co/roberta-large-mnli?text=The+dog+was+Lost.+Nobody+lost+any+animal) -- [BART เฐคเฑ‹ เฐธเฐพเฐฐเฐพเฐ‚เฐถเฐ‚](https://huggingface.co/facebook/bart-large-cnn?text=The+tower+is+324+metres+%281%2C063+ft%29+tall%2C+about+the+same+height+as+an+81-storey+building%2C+and+the+tallest+structure+in+Paris.+Its+base+is+square%2C+measuring+125+metres+%28410+ft%29+on+each+side.+During+its+construction%2C+the+Eiffel+Tower+surpassed+the+Washington+Monument+to+become+the+tallest+man-made+structure+in+the+world%2C+a+title+it+held+for+41+years+until+the+Chrysler+Building+in+New+York+City+was+finished+in+1930.+It+was+the+first+structure+to+reach+a+height+of+300+metres.+Due+to+the+addition+of+a+broadcasting+aerial+at+the+top+of+the+tower+in+1957%2C+it+is+now+taller+than+the+Chrysler+Building+by+5.2+metres+%2817+ft%29.+Excluding+transmitters%2C+the+Eiffel+Tower+is+the+second+tallest+free-standing+structure+in+France+after+the+Millau+Viaduct) -- [DistilBERT เฐคเฑ‹ เฐชเฑเฐฐเฐถเฑเฐจ 
เฐธเฐฎเฐพเฐงเฐพเฐจเฐ‚](https://huggingface.co/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species) -- [T5 เฐคเฑ‹ เฐ…เฐจเฑเฐตเฐพเฐฆเฐ‚](https://huggingface.co/t5-base?text=My+name+is+Wolfgang+and+I+live+in+Berlin) - -เฐ•เฐ‚เฐชเฑเฐฏเฑ‚เฐŸเฐฐเฑ เฐฆเฑƒเฐทเฑเฐŸเฐฟเฐฒเฑ‹: -- [VIT เฐคเฑ‹ เฐšเฐฟเฐคเฑเฐฐ เฐตเฐฐเฑเฐ—เฑ€เฐ•เฐฐเฐฃ](https://huggingface.co/google/vit-base-patch16-224) -- [DETR เฐคเฑ‹ เฐ†เฐฌเฑเฐœเฑ†เฐ•เฑเฐŸเฑ เฐกเฐฟเฐŸเฑ†เฐ•เฑเฐทเฐจเฑ](https://huggingface.co/facebook/detr-resnet-50) -- [SegFormer เฐคเฑ‹ เฐธเฑ†เฐฎเฐพเฐ‚เฐŸเฐฟเฐ•เฑ เฐธเฑ†เฐ—เฑเฐฎเฑ†เฐ‚เฐŸเฑ‡เฐทเฐจเฑ](https://huggingface.co/nvidia/segformer-b0-finetuned-ade-512-512) -- [MaskFormer เฐคเฑ‹ เฐชเฐพเฐจเฑ‹เฐชเฑเฐŸเฐฟเฐ•เฑ เฐธเฑ†เฐ—เฑเฐฎเฑ†เฐ‚เฐŸเฑ‡เฐทเฐจเฑ](https://huggingface.co/facebook/maskformer-swin-small-coco) -- [DPT เฐคเฑ‹ เฐฒเฑ‹เฐคเฑ เฐ…เฐ‚เฐšเฐจเฐพ](https://huggingface.co/docs/transformers/model_doc/dpt) -- [VideoMAE เฐคเฑ‹ เฐตเฑ€เฐกเฐฟเฐฏเฑ‹ เฐตเฐฐเฑเฐ—เฑ€เฐ•เฐฐเฐฃ](https://huggingface.co/docs/transformers/model_doc/videomae) -- [OneFormer เฐคเฑ‹ เฐฏเฑ‚เฐจเฐฟเฐตเฐฐเฑเฐธเฐฒเฑ เฐธเฑ†เฐ—เฑเฐฎเฑ†เฐ‚เฐŸเฑ‡เฐทเฐจเฑ](https://huggingface.co/shi-labs/oneformer_ade20k_dinat_large) - -เฐ†เฐกเฐฟเฐฏเฑ‹เฐฒเฑ‹: -- [Wav2Vec2 เฐคเฑ‹ เฐ†เฐŸเฑ‹เฐฎเฑ‡เฐŸเฐฟเฐ•เฑ เฐธเฑเฐชเฑ€เฐšเฑ เฐฐเฐฟเฐ•เฐ—เฑเฐจเฐฟเฐทเฐจเฑ](https://huggingface.co/facebook/wav2vec2-base-960h) -- [Wav2Vec2 เฐคเฑ‹ เฐ•เฑ€เฐตเฐฐเฑเฐกเฑ เฐธเฑเฐชเฐพเฐŸเฐฟเฐ‚เฐ—เฑ](https://huggingface.co/superb/wav2vec2-base-superb-ks) -- [เฐ†เฐกเฐฟเฐฏเฑ‹ เฐธเฑเฐชเฑ†เฐ•เฑเฐŸเฑเฐฐเฑ‹เฐ—เฑเฐฐเฐพเฐฎเฑ เฐŸเฑเฐฐเฐพเฐจเฑเฐธเฑโ€Œเฐซเฐพเฐฐเฑเฐฎเฐฐเฑโ€Œเฐคเฑ‹ เฐ†เฐกเฐฟเฐฏเฑ‹ เฐตเฐฐเฑเฐ—เฑ€เฐ•เฐฐเฐฃ](https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593) - -เฐฎเฐฒเฑเฐŸเฑ€เฐฎเฑ‹เฐกเฐฒเฑ เฐŸเฐพเฐธเฑเฐ•เฑโ€Œเฐฒเฐฒเฑ‹: -- [TAPAS เฐคเฑ‹ เฐŸเฑ‡เฐฌเฑเฐฒเฑ เฐชเฑเฐฐเฐถเฑเฐจ เฐธเฐฎเฐพเฐงเฐพเฐจเฐพเฐฒเฑ](https://huggingface.co/google/tapas-base-finetuned-wtq) -- [ViLT เฐคเฑ‹ เฐฆเฑƒเฐถเฑเฐฏเฐฎเฐพเฐจ เฐชเฑเฐฐเฐถเฑเฐจเฐ•เฑ เฐธเฐฎเฐพเฐงเฐพเฐจเฐ‚](https://huggingface.co/dandelin/vilt-b32-finetuned-vqa) -- [CLIP เฐคเฑ‹ เฐœเฑ€เฐฐเฑ‹-เฐทเฐพเฐŸเฑ เฐ‡เฐฎเฑ‡เฐœเฑ เฐตเฐฐเฑเฐ—เฑ€เฐ•เฐฐเฐฃ](https://huggingface.co/openai/clip-vit-large-patch14) -- [LayoutLM เฐคเฑ‹ เฐกเฐพเฐ•เฑเฐฏเฑเฐฎเฑ†เฐ‚เฐŸเฑ เฐชเฑเฐฐเฐถเฑเฐจเฐ•เฑ 
เฐธเฐฎเฐพเฐงเฐพเฐจเฐ‚](https://huggingface.co/impira/layoutlm-document-qa) -- [X-CLIP เฐคเฑ‹ เฐœเฑ€เฐฐเฑ‹-เฐทเฐพเฐŸเฑ เฐตเฑ€เฐกเฐฟเฐฏเฑ‹ เฐตเฐฐเฑเฐ—เฑ€เฐ•เฐฐเฐฃ](https://huggingface.co/docs/transformers/model_doc/xclip) - -## เฐŸเฑเฐฐเฐพเฐจเฑเฐธเฑโ€Œเฐซเฐพเฐฐเฑเฐฎเฐฐเฑโ€Œเฐฒเฐจเฑ เฐ‰เฐชเฐฏเฑ‹เฐ—เฐฟเฐ‚เฐšเฐฟ 100 เฐชเฑเฐฐเฐพเฐœเฑ†เฐ•เฑเฐŸเฑเฐฒเฑ - -เฐŸเฑเฐฐเฐพเฐจเฑเฐธเฑโ€Œเฐซเฐพเฐฐเฑเฐฎเฐฐเฑเฐฒเฑ เฐชเฑเฐฐเฑ€เฐŸเฑเฐฐเฑˆเฐจเฑเฐกเฑ เฐฎเฑ‹เฐกเฐฒเฑโ€Œเฐฒเฐจเฑ เฐ‰เฐชเฐฏเฑ‹เฐ—เฐฟเฐ‚เฐšเฐกเฐพเฐจเฐฟเฐ•เฐฟ เฐŸเฑ‚เฐฒเฑโ€Œเฐ•เฐฟเฐŸเฑ เฐ•เฐ‚เฐŸเฑ‡ เฐŽเฐ•เฑเฐ•เฑเฐต: เฐ‡เฐฆเฐฟ เฐฆเฐพเฐจเฐฟ เฐšเฑเฐŸเฑเฐŸเฑ‚ เฐจเฐฟเฐฐเฑเฐฎเฐฟเฐ‚เฐšเฐฟเฐจ เฐชเฑเฐฐเฐพเฐœเฑ†เฐ•เฑเฐŸเฑโ€Œเฐฒ เฐธเฐ‚เฐ˜เฐ‚ เฐฎเฐฐเฐฟเฐฏเฑ -เฐนเฐ—เฑเฐ—เฐฟเฐ‚เฐ—เฑ เฐซเฑ‡เฐธเฑ เฐนเฐฌเฑ. เฐกเฑ†เฐตเฐฒเฐชเฐฐเฑโ€Œเฐฒเฑ, เฐชเฐฐเฐฟเฐถเฑ‹เฐงเฐ•เฑเฐฒเฑ, เฐตเฐฟเฐฆเฑเฐฏเฐพเฐฐเฑเฐฅเฑเฐฒเฑ, เฐชเฑเฐฐเฑŠเฐซเฑ†เฐธเฐฐเฑโ€Œเฐฒเฑ, เฐ‡เฐ‚เฐœเฐจเฑ€เฐฐเฑเฐฒเฑ เฐฎเฐฐเฐฟเฐฏเฑ เฐŽเฐตเฐฐเฐฟเฐจเฑˆเฐจเฐพ เฐ…เฐจเฑเฐฎเฐคเฐฟเฐ‚เฐšเฑ‡เฐฒเฐพ เฐŸเฑเฐฐเฐพเฐจเฑเฐธเฑโ€Œเฐซเฐพเฐฐเฑเฐฎเฐฐเฑโ€Œเฐฒเฐจเฑ เฐฎเฑ‡เฐฎเฑ เฐ•เฑ‹เฐฐเฑเฐ•เฑเฐ‚เฐŸเฑเฐจเฑเฐจเฐพเฐฎเฑ -เฐตเฐพเฐฐเฐฟ เฐ•เฐฒเฐฒ เฐชเฑเฐฐเฐพเฐœเฑ†เฐ•เฑเฐŸเฑเฐฒเฐจเฑ เฐจเฐฟเฐฐเฑเฐฎเฐฟเฐ‚เฐšเฐกเฐพเฐจเฐฟเฐ•เฐฟ. - -เฐŸเฑเฐฐเฐพเฐจเฑเฐธเฑโ€Œเฐซเฐพเฐฐเฑเฐฎเฐฐเฑโ€Œเฐฒ 100,000 เฐจเฐ•เฑเฐทเฐคเฑเฐฐเฐพเฐฒเฐจเฑ เฐœเฐฐเฑเฐชเฑเฐ•เฑ‹เฐตเฐกเฐพเฐจเฐฟเฐ•เฐฟ, เฐฎเฑ‡เฐฎเฑ เฐธเฑเฐชเฐพเฐŸเฑโ€ŒเฐฒเฑˆเฐŸเฑโ€Œเฐจเฐฟ เฐ‰เฐ‚เฐšเฐพเฐฒเฐจเฐฟ เฐจเฐฟเฐฐเฑเฐฃเฐฏเฐฟเฐ‚เฐšเฑเฐ•เฑเฐจเฑเฐจเฐพเฐฎเฑ -เฐธเฐ‚เฐ˜เฐ‚, เฐฎเฐฐเฐฟเฐฏเฑ เฐฎเฑ‡เฐฎเฑ 100 เฐœเฐพเฐฌเฐฟเฐคเฐพเฐฒเฐจเฑ เฐ•เฐฒเฐฟเฐ—เฐฟ เฐ‰เฐจเฑเฐจ [awesome-transformers](./awesome-transformers.md) เฐชเฑ‡เฐœเฑ€เฐจเฐฟ เฐธเฑƒเฐทเฑเฐŸเฐฟเฐ‚เฐšเฐพเฐฎเฑ. -เฐŸเฑเฐฐเฐพเฐจเฑเฐธเฑโ€Œเฐซเฐพเฐฐเฑเฐฎเฐฐเฑเฐฒ เฐชเฐฐเฐฟเฐธเฐฐเฐพเฐฒเฑเฐฒเฑ‹ เฐ…เฐฆเฑเฐญเฑเฐคเฐฎเฑˆเฐจ เฐชเฑเฐฐเฐพเฐœเฑ†เฐ•เฑเฐŸเฑเฐฒเฑ เฐจเฐฟเฐฐเฑเฐฎเฐฟเฐ‚เฐšเฐฌเฐกเฑเฐกเฐพเฐฏเฐฟ. - -เฐœเฐพเฐฌเฐฟเฐคเฐพเฐฒเฑ‹ เฐญเฐพเฐ—เฐฎเฐจเฐฟ เฐฎเฑ€เฐฐเฑ เฐตเฐฟเฐถเฑเฐตเฐธเฐฟเฐ‚เฐšเฑ‡ เฐชเฑเฐฐเฐพเฐœเฑ†เฐ•เฑเฐŸเฑโ€Œเฐจเฑ เฐฎเฑ€เฐฐเฑ เฐ•เฐฒเฐฟเฐ—เฐฟ เฐ‰เฐ‚เฐŸเฑ‡ เฐฒเฑ‡เฐฆเฐพ เฐ‰เฐชเฐฏเฑ‹เฐ—เฐฟเฐธเฑเฐคเฑเฐ‚เฐŸเฑ‡, เฐฆเฐฏเฐšเฑ‡เฐธเฐฟ เฐฆเฐพเฐจเฐฟเฐจเฐฟ เฐœเฑ‹เฐกเฐฟเฐ‚เฐšเฐกเฐพเฐจเฐฟเฐ•เฐฟ PRเฐจเฐฟ เฐคเฑ†เฐฐเฐตเฐ‚เฐกเฐฟ! - -## เฐฎเฑ€เฐฐเฑ เฐนเฐ—เฑเฐ—เฐฟเฐ‚เฐ—เฑ เฐซเฑ‡เฐธเฑ เฐŸเฑ€เฐฎเฑ เฐจเฑเฐ‚เฐกเฐฟ เฐ…เฐจเฑเฐ•เฑ‚เฐฒ เฐฎเฐฆเฑเฐฆเฐคเฑ เฐ•เฑ‹เฐธเฐ‚ เฐšเฑ‚เฐธเฑเฐคเฑเฐจเฑเฐจเฐŸเฑเฐฒเฐฏเฐฟเฐคเฑ‡ - - - HuggingFace Expert Acceleration Program -
- -## เฐคเฑเฐตเฐฐเฐฟเฐค เฐชเฐฐเฑเฐฏเฐŸเฐจ - -เฐ‡เฐšเฑเฐšเฐฟเฐจ เฐ‡เฐจเฑโ€ŒเฐชเฑเฐŸเฑ (เฐŸเฑ†เฐ•เฑเฐธเฑเฐŸเฑ, เฐ‡เฐฎเฑ‡เฐœเฑ, เฐ†เฐกเฐฟเฐฏเฑ‹, ...)เฐชเฑˆ เฐคเฐ•เฑเฐทเฐฃเฐฎเฑ‡ เฐฎเฑ‹เฐกเฐฒเฑโ€Œเฐจเฑ เฐ‰เฐชเฐฏเฑ‹เฐ—เฐฟเฐ‚เฐšเฐกเฐพเฐจเฐฟเฐ•เฐฟ, เฐฎเฑ‡เฐฎเฑ `pipeline` API เฐจเฐฟ เฐ…เฐ‚เฐฆเฐฟเฐธเฑเฐคเฐพเฐฎเฑ. เฐชเฑˆเฐชเฑโ€Œเฐฒเฑˆเฐจเฑโ€Œเฐฒเฑ เฐ† เฐฎเฑ‹เฐกเฐฒเฑ เฐถเฐฟเฐ•เฑเฐทเฐฃ เฐธเฐฎเฐฏเฐ‚เฐฒเฑ‹ เฐ‰เฐชเฐฏเฑ‹เฐ—เฐฟเฐ‚เฐšเฐฟเฐจ เฐชเฑเฐฐเฑ€เฐชเฑเฐฐเฐพเฐธเฑ†เฐธเฐฟเฐ‚เฐ—เฑโ€Œเฐคเฑ‹ เฐ•เฑ‚เฐกเฐฟเฐจ เฐชเฑเฐฐเฑ€เฐŸเฑเฐฐเฑˆเฐจเฑเฐกเฑ เฐฎเฑ‹เฐกเฐฒเฑโ€Œเฐจเฑ เฐธเฐฎเฑ‚เฐนเฐชเฐฐเฑเฐธเฑเฐคเฐพเฐฏเฐฟ. เฐธเฐพเฐจเฑเฐ•เฑ‚เฐฒ เฐฎเฐฐเฐฟเฐฏเฑ เฐชเฑเฐฐเฐคเฐฟเฐ•เฑ‚เฐฒ เฐชเฐพเฐ เฐพเฐฒเฐจเฑ เฐตเฐฐเฑเฐ—เฑ€เฐ•เฐฐเฐฟเฐ‚เฐšเฐกเฐพเฐจเฐฟเฐ•เฐฟ เฐชเฑˆเฐชเฑโ€Œเฐฒเฑˆเฐจเฑโ€Œเฐจเฑ เฐคเฑเฐตเฐฐเฐ—เฐพ เฐŽเฐฒเฐพ เฐ‰เฐชเฐฏเฑ‹เฐ—เฐฟเฐ‚เฐšเฐพเฐฒเฑ‹ เฐ‡เฐ•เฑเฐ•เฐก เฐ‰เฐ‚เฐฆเฐฟ: - -```python ->>> from transformers import pipeline - -# Allocate a pipeline for sentiment-analysis ->>> classifier = pipeline('sentiment-analysis') ->>> classifier('We are very happy to introduce pipeline to the transformers repository.') -[{'label': 'POSITIVE', 'score': 0.9996980428695679}] -``` - -เฐฐเฑ†เฐ‚เฐกเฐต เฐฒเฑˆเฐจเฑ เฐ•เฑ‹เฐกเฑ เฐกเฑŒเฐจเฑโ€Œเฐฒเฑ‹เฐกเฑ เฐฎเฐฐเฐฟเฐฏเฑ เฐชเฑˆเฐชเฑโ€Œเฐฒเฑˆเฐจเฑ เฐ‰เฐชเฐฏเฑ‹เฐ—เฐฟเฐ‚เฐšเฑ‡ เฐชเฑเฐฐเฑ€เฐŸเฑเฐฐเฑˆเฐจเฑเฐกเฑ เฐฎเฑ‹เฐกเฐฒเฑโ€Œเฐจเฑ เฐ•เฐพเฐทเฑ เฐšเฑ‡เฐธเฑเฐคเฑเฐ‚เฐฆเฐฟ, เฐฎเฑ‚เฐกเฐตเฐฆเฐฟ เฐ‡เฐšเฑเฐšเฐฟเฐจ เฐŸเฑ†เฐ•เฑเฐธเฑเฐŸเฑโ€Œเฐชเฑˆ เฐฎเฑ‚เฐฒเฑเฐฏเฐพเฐ‚เฐ•เฐจเฐ‚ เฐšเฑ‡เฐธเฑเฐคเฑเฐ‚เฐฆเฐฟ. เฐ‡เฐ•เฑเฐ•เฐก เฐธเฐฎเฐพเฐงเฐพเฐจเฐ‚ 99.97% เฐตเฐฟเฐถเฑเฐตเฐพเฐธเฐ‚เฐคเฑ‹ "เฐชเฐพเฐœเฐฟเฐŸเฐฟเฐตเฑ". - -เฐšเฐพเฐฒเฐพ เฐชเฐจเฑเฐฒเฑ NLPเฐฒเฑ‹ เฐ•เฐพเฐจเฑ€ เฐ•เฐ‚เฐชเฑเฐฏเฑ‚เฐŸเฐฐเฑ เฐตเฐฟเฐœเฐจเฑ เฐฎเฐฐเฐฟเฐฏเฑ เฐธเฑเฐชเฑ€เฐšเฑโ€Œเฐฒเฑ‹ เฐ•เฑ‚เฐกเฐพ เฐฎเฑเฐ‚เฐฆเฑเฐ—เฐพ เฐถเฐฟเฐ•เฑเฐทเฐฃ เฐชเฑŠเฐ‚เฐฆเฐฟเฐจ `pipeline` เฐธเฐฟเฐฆเฑเฐงเฐ‚เฐ—เฐพ เฐ‰เฐจเฑเฐจเฐพเฐฏเฐฟ. เฐ‰เฐฆเฐพเฐนเฐฐเฐฃเฐ•เฑ, เฐฎเฐจเฐ‚ เฐšเฐฟเฐคเฑเฐฐเฐ‚เฐฒเฑ‹ เฐ—เฑเฐฐเฑเฐคเฐฟเฐ‚เฐšเฐฟเฐจ เฐตเฐธเฑเฐคเฑเฐตเฑเฐฒเฐจเฑ เฐธเฑเฐฒเฐญเฐ‚เฐ—เฐพ เฐธเฐ‚เฐ—เฑเฐฐเฐนเฐฟเฐ‚เฐšเฐตเฐšเฑเฐšเฑ: - -``` python ->>> import requests ->>> from PIL import Image ->>> from transformers import pipeline - -# Download an image with cute cats ->>> url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/coco_sample.png" ->>> image_data = requests.get(url, stream=True).raw ->>> image = Image.open(image_data) - -# Allocate a pipeline for object detection ->>> object_detector = pipeline('object-detection') ->>> object_detector(image) -[{'score': 0.9982201457023621, - 'label': 'remote', - 'box': {'xmin': 40, 'ymin': 70, 'xmax': 175, 'ymax': 117}}, - {'score': 0.9960021376609802, - 'label': 'remote', - 'box': {'xmin': 333, 'ymin': 72, 'xmax': 368, 'ymax': 187}}, - {'score': 0.9954745173454285, - 'label': 'couch', - 'box': {'xmin': 0, 'ymin': 1, 'xmax': 639, 'ymax': 473}}, - {'score': 0.9988006353378296, - 'label': 'cat', - 'box': {'xmin': 13, 'ymin': 52, 'xmax': 314, 'ymax': 470}}, - {'score': 0.9986783862113953, - 'label': 'cat', - 'box': {'xmin': 345, 'ymin': 23, 'xmax': 640, 'ymax': 368}}] -``` - -เฐ‡เฐ•เฑเฐ•เฐก เฐฎเฐจเฐ‚ เฐ†เฐฌเฑเฐœเฑ†เฐ•เฑเฐŸเฑ เฐšเฑเฐŸเฑเฐŸเฑ‚ เฐ‰เฐจเฑเฐจ เฐฌเฐพเฐ•เฑเฐธเฑ เฐฎเฐฐเฐฟเฐฏเฑ เฐ•เฐพเฐจเฑเฐซเฐฟเฐกเฑ†เฐจเฑเฐธเฑ เฐธเฑเฐ•เฑ‹เฐฐเฑโ€Œเฐคเฑ‹ เฐšเฐฟเฐคเฑเฐฐเฐ‚เฐฒเฑ‹ เฐ—เฑเฐฐเฑเฐคเฐฟเฐ‚เฐšเฐฌเฐกเฐฟเฐจ เฐตเฐธเฑเฐคเฑเฐตเฑเฐฒ เฐœเฐพเฐฌเฐฟเฐคเฐพเฐจเฑ เฐชเฑŠเฐ‚เฐฆเฑเฐคเฐพเฐฎเฑ. 
เฐ‡เฐ•เฑเฐ•เฐก เฐŽเฐกเฐฎเฐตเฑˆเฐชเฑเฐจ เฐ‰เฐจเฑเฐจ เฐ…เฐธเฐฒเฑ เฐšเฐฟเฐคเฑเฐฐเฐ‚, เฐ•เฑเฐกเฐฟเฐตเฑˆเฐชเฑเฐจ เฐ…เฐ‚เฐšเฐจเฐพเฐฒเฑ เฐชเฑเฐฐเฐฆเฐฐเฑเฐถเฐฟเฐ‚เฐšเฐฌเฐกเฐคเฐพเฐฏเฐฟ: - -

-<!-- sample images: original photo (left) and object-detection predictions (right) -->
- -เฐฎเฑ€เฐฐเฑ [เฐˆ เฐŸเฑเฐฏเฑเฐŸเฑ‹เฐฐเฐฟเฐฏเฐฒเฑ](https://huggingface.co/docs/transformers/task_summary)เฐฒเฑ‹ `pipeline` API เฐฆเฑเฐตเฐพเฐฐเฐพ เฐธเฐชเฑ‹เฐฐเฑเฐŸเฑ เฐšเฑ‡เฐธเฑ‡ เฐŸเฐพเฐธเฑเฐ•เฑโ€Œเฐฒ เฐ—เฑเฐฐเฐฟเฐ‚เฐšเฐฟ เฐฎเฐฐเฐฟเฐ‚เฐค เฐคเฑ†เฐฒเฑเฐธเฑเฐ•เฑ‹เฐตเฐšเฑเฐšเฑ. - -`pipeline`เฐคเฑ‹ เฐชเฐพเฐŸเฑ, เฐฎเฑ€เฐฐเฑ เฐ‡เฐšเฑเฐšเฐฟเฐจ เฐŸเฐพเฐธเฑเฐ•เฑโ€Œเฐฒเฑ‹ เฐเฐฆเฑˆเฐจเฐพ เฐชเฑเฐฐเฑ€เฐŸเฑเฐฐเฑˆเฐจเฑเฐกเฑ เฐฎเฑ‹เฐกเฐฒเฑโ€Œเฐฒเฐจเฑ เฐกเฑŒเฐจเฑโ€Œเฐฒเฑ‹เฐกเฑ เฐšเฑ‡เฐฏเฐกเฐพเฐจเฐฟเฐ•เฐฟ เฐฎเฐฐเฐฟเฐฏเฑ เฐ‰เฐชเฐฏเฑ‹เฐ—เฐฟเฐ‚เฐšเฐกเฐพเฐจเฐฟเฐ•เฐฟ, เฐฆเฑ€เฐจเฐฟเฐ•เฐฟ เฐฎเฑ‚เฐกเฑ เฐฒเฑˆเฐจเฑเฐฒ เฐ•เฑ‹เฐกเฑ เฐธเฐฐเฐฟเฐชเฑ‹เฐคเฑเฐ‚เฐฆเฐฟ. เฐ‡เฐ•เฑเฐ•เฐก PyTorch เฐตเฑ†เฐฐเฑเฐทเฐจเฑ เฐ‰เฐ‚เฐฆเฐฟ: -```python ->>> from transformers import AutoTokenizer, AutoModel - ->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased") ->>> model = AutoModel.from_pretrained("bert-base-uncased") - ->>> inputs = tokenizer("Hello world!", return_tensors="pt") ->>> outputs = model(**inputs) -``` - -เฐฎเฐฐเฐฟเฐฏเฑ TensorFlow เฐ•เฐฟ เฐธเฐฎเฐพเฐจเฐฎเฑˆเฐจ เฐ•เฑ‹เฐกเฑ เฐ‡เฐ•เฑเฐ•เฐก เฐ‰เฐ‚เฐฆเฐฟ: -```python ->>> from transformers import AutoTokenizer, TFAutoModel - ->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased") ->>> model = TFAutoModel.from_pretrained("bert-base-uncased") - ->>> inputs = tokenizer("Hello world!", return_tensors="tf") ->>> outputs = model(**inputs) -``` - -เฐชเฑเฐฐเฐฟเฐŸเฑเฐฐเฑˆเฐจเฑเฐกเฑ เฐฎเฑ‹เฐกเฐฒเฑ เฐ†เฐถเฐฟเฐ‚เฐšเฑ‡ เฐ…เฐจเฑเฐจเฐฟ เฐชเฑเฐฐเฑ€เฐชเฑเฐฐเฐพเฐธเฑ†เฐธเฐฟเฐ‚เฐ—เฑโ€Œเฐฒเฐ•เฑ เฐŸเฑ‹เฐ•เฑ†เฐจเฑˆเฐœเฐฐเฑ เฐฌเฐพเฐงเฑเฐฏเฐค เฐตเฐนเฐฟเฐธเฑเฐคเฑเฐ‚เฐฆเฐฟ เฐฎเฐฐเฐฟเฐฏเฑ เฐจเฑ‡เฐฐเฑเฐ—เฐพ เฐ’เฐ•เฑ‡ เฐธเฑเฐŸเฑเฐฐเฐฟเฐ‚เฐ—เฑ (เฐชเฑˆ เฐ‰เฐฆเฐพเฐนเฐฐเฐฃเฐฒเฐฒเฑ‹ เฐตเฐฒเฑ†) เฐฒเฑ‡เฐฆเฐพ เฐœเฐพเฐฌเฐฟเฐคเฐพเฐชเฑˆ เฐ•เฐพเฐฒเฑ เฐšเฑ‡เฐฏเฐตเฐšเฑเฐšเฑ. เฐ‡เฐฆเฐฟ เฐฎเฑ€เฐฐเฑ เฐกเฑŒเฐจเฑโ€ŒเฐธเฑเฐŸเฑเฐฐเฑ€เฐฎเฑ เฐ•เฑ‹เฐกเฑโ€Œเฐฒเฑ‹ เฐ‰เฐชเฐฏเฑ‹เฐ—เฐฟเฐ‚เฐšเฐ—เฐฒ เฐจเฐฟเฐ˜เฐ‚เฐŸเฑเฐตเฑเฐจเฐฟ เฐ…เฐตเฑเฐŸเฑโ€ŒเฐชเฑเฐŸเฑ เฐšเฑ‡เฐธเฑเฐคเฑเฐ‚เฐฆเฐฟ เฐฒเฑ‡เฐฆเฐพ ** เฐ†เฐฐเฑเฐ—เฑเฐฏเฑเฐฎเฑ†เฐ‚เฐŸเฑ เฐ…เฐจเฑโ€Œเฐชเฑเฐฏเฐพเฐ•เฐฟเฐ‚เฐ—เฑ เฐ†เฐชเฐฐเฑ‡เฐŸเฐฐเฑโ€Œเฐจเฐฟ เฐ‰เฐชเฐฏเฑ‹เฐ—เฐฟเฐ‚เฐšเฐฟ เฐจเฑ‡เฐฐเฑเฐ—เฐพ เฐฎเฑ€ เฐฎเฑ‹เฐกเฐฒเฑโ€Œเฐ•เฐฟ เฐชเฐ‚เฐชเฑเฐคเฑเฐ‚เฐฆเฐฟ. - -เฐฎเฑ‹เฐกเฐฒเฑ เฐ•เฑ‚เฐกเฐพ เฐธเฐพเฐงเฐพเฐฐเฐฃ [Pytorch `nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) เฐฒเฑ‡เฐฆเฐพ [TensorFlow `tf.keras.Model`]( https://www.tensorflow.org/api_docs/python/tf/keras/Model) (เฐฎเฑ€ เฐฌเฑเฐฏเฐพเฐ•เฑ†เฐ‚เฐกเฑโ€Œเฐจเฐฟ เฐฌเฐŸเฑเฐŸเฐฟ) เฐฎเฑ€เฐฐเฑ เฐฎเฐพเฐฎเฑ‚เฐฒเฑเฐ—เฐพ เฐ‰เฐชเฐฏเฑ‹เฐ—เฐฟเฐ‚เฐšเฐตเฐšเฑเฐšเฑ. [เฐˆ เฐŸเฑเฐฏเฑเฐŸเฑ‹เฐฐเฐฟเฐฏเฐฒเฑ](https://huggingface.co/docs/transformers/training) เฐ…เฐŸเฑเฐตเฐ‚เฐŸเฐฟ เฐฎเฑ‹เฐกเฐฒเฑโ€Œเฐจเฐฟ เฐ•เฑเฐฒเฐพเฐธเฐฟเฐ•เฑ PyTorch เฐฒเฑ‡เฐฆเฐพ TensorFlow เฐŸเฑเฐฐเฑˆเฐจเฐฟเฐ‚เฐ—เฑ เฐฒเฑ‚เฐชเฑโ€Œเฐฒเฑ‹ เฐŽเฐฒเฐพ เฐ‡เฐ‚เฐŸเฐฟเฐ—เฑเฐฐเฑ‡เฐŸเฑ เฐšเฑ‡เฐฏเฐพเฐฒเฑ‹ เฐฒเฑ‡เฐฆเฐพ เฐฎเฐพ `Trainer` API เฐจเฐฟ เฐŽเฐฒเฐพ เฐ‰เฐชเฐฏเฑ‹เฐ—เฐฟเฐ‚เฐšเฐพเฐฒเฑ‹ เฐตเฐฟเฐตเฐฐเฐฟเฐธเฑเฐคเฑเฐ‚เฐฆเฐฟ เฐ•เฑŠเฐคเฑเฐค เฐกเฑ‡เฐŸเฐพเฐธเฑ†เฐŸเฑ. - -## เฐจเฑ‡เฐจเฑ เฐŸเฑเฐฐเฐพเฐจเฑเฐธเฑโ€Œเฐซเฐพเฐฐเฑเฐฎเฐฐเฑโ€Œเฐฒเฐจเฑ เฐŽเฐ‚เฐฆเฑเฐ•เฑ เฐ‰เฐชเฐฏเฑ‹เฐ—เฐฟเฐ‚เฐšเฐพเฐฒเฐฟ? - -1. 
เฐ‰เฐชเฐฏเฑ‹เฐ—เฐฟเฐ‚เฐšเฐกเฐพเฐจเฐฟเฐ•เฐฟ เฐธเฑเฐฒเฐญเฐฎเฑˆเฐจ เฐธเฑเฐŸเฑ‡เฐŸเฑ เฐ†เฐซเฑ เฐฆเฐฟ เฐ†เฐฐเฑเฐŸเฑ เฐฎเฑ‹เฐกเฐฒเฑโ€Œเฐฒเฑ: - - เฐธเฐนเฐœ เฐญเฐพเฐทเฐพ เฐ…เฐตเฐ—เฐพเฐนเฐจ & เฐ‰เฐคเฑเฐชเฐคเฑเฐคเฐฟ, เฐ•เฐ‚เฐชเฑเฐฏเฑ‚เฐŸเฐฐเฑ เฐฆเฑƒเฐทเฑเฐŸเฐฟ เฐฎเฐฐเฐฟเฐฏเฑ เฐ†เฐกเฐฟเฐฏเฑ‹ เฐชเฐจเฑเฐฒเฐชเฑˆ เฐ…เฐงเฐฟเฐ• เฐชเฐจเฐฟเฐคเฑ€เฐฐเฑ. - - เฐตเฐฟเฐฆเฑเฐฏเฐพเฐตเฑ‡เฐคเฑเฐคเฐฒเฑ เฐฎเฐฐเฐฟเฐฏเฑ เฐ…เฐญเฑเฐฏเฐพเฐธเฐ•เฑเฐฒ เฐชเฑเฐฐเฐตเฑ‡เฐถเฐพเฐจเฐฟเฐ•เฐฟ เฐคเฐ•เฑเฐ•เฑเฐต เฐ…เฐตเฐฐเฑ‹เฐงเฐ‚. - - เฐคเฑ†เฐฒเฑเฐธเฑเฐ•เฑ‹เฐตเฐกเฐพเฐจเฐฟเฐ•เฐฟ เฐ•เฑ‡เฐตเฐฒเฐ‚ เฐฎเฑ‚เฐกเฑ เฐคเฐฐเฐ—เฐคเฑเฐฒเฐคเฑ‹ เฐ•เฑŠเฐจเฑเฐจเฐฟ เฐตเฐฟเฐจเฐฟเฐฏเฑ‹เฐ—เฐฆเฐพเฐฐเฑ-เฐฎเฑเฐ– เฐธเฐ‚เฐ—เฑเฐฐเฐนเฐฃเฐฒเฑ. - - เฐฎเฐพ เฐ…เฐจเฑเฐจเฐฟ เฐชเฑเฐฐเฑ€เฐŸเฑเฐฐเฑˆเฐจเฑเฐกเฑ เฐฎเฑ‹เฐกเฐฒเฑโ€Œเฐฒเฐจเฑ เฐ‰เฐชเฐฏเฑ‹เฐ—เฐฟเฐ‚เฐšเฐกเฐ‚ เฐ•เฑ‹เฐธเฐ‚ เฐเฐ•เฑ€เฐ•เฑƒเฐค API. - -2. เฐคเฐ•เฑเฐ•เฑเฐต เฐ—เฐฃเฐจ เฐ–เฐฐเฑเฐšเฑเฐฒเฑ, เฐšเฐฟเฐจเฑเฐจ เฐ•เฐพเฐฐเฑเฐฌเฐจเฑ เฐชเฐพเฐฆเฐฎเฑเฐฆเฑเฐฐ: - - เฐชเฐฐเฐฟเฐถเฑ‹เฐงเฐ•เฑเฐฒเฑ เฐŽเฐฒเฑเฐฒเฐชเฑเฐชเฑเฐกเฑ‚ เฐฎเฐณเฑเฐฒเฑ€ เฐถเฐฟเฐ•เฑเฐทเฐฃ เฐชเฑŠเฐ‚เฐฆเฑ‡ เฐฌเฐฆเฑเฐฒเฑ เฐถเฐฟเฐ•เฑเฐทเฐฃ เฐชเฑŠเฐ‚เฐฆเฐฟเฐจ เฐจเฐฎเฑ‚เฐจเฐพเฐฒเฐจเฑ เฐชเฐ‚เฐšเฑเฐ•เฑ‹เฐตเฐšเฑเฐšเฑ. - - เฐ…เฐญเฑเฐฏเฐพเฐธเฐ•เฑเฐฒเฑ เฐ—เฐฃเฐจ เฐธเฐฎเฐฏเฐพเฐจเฑเฐจเฐฟ เฐฎเฐฐเฐฟเฐฏเฑ เฐ‰เฐคเฑเฐชเฐคเฑเฐคเฐฟ เฐ–เฐฐเฑเฐšเฑเฐฒเฐจเฑ เฐคเฐ—เฑเฐ—เฐฟเฐ‚เฐšเฐ—เฐฒเฐฐเฑ. - - เฐ…เฐจเฑเฐจเฐฟ เฐชเฐฆเฑเฐงเฐคเฑเฐฒเฑเฐฒเฑ‹ 60,000 เฐ•เฐ‚เฐŸเฑ‡ เฐŽเฐ•เฑเฐ•เฑเฐต เฐชเฑเฐฐเฑ€เฐŸเฑเฐฐเฑˆเฐจเฑเฐกเฑ เฐฎเฑ‹เฐกเฐฒเฑโ€Œเฐฒเฐคเฑ‹ เฐกเฐœเฐจเฑเฐฒ เฐ•เฑŠเฐฆเฑเฐฆเฑ€ เฐ†เฐฐเฑเฐ•เฐฟเฐŸเฑ†เฐ•เฑเฐšเฐฐเฑโ€Œเฐฒเฑ. - -3. เฐฎเฑ‹เฐกเฐฒเฑ เฐœเฑ€เฐตเฐฟเฐคเฐ•เฐพเฐฒเฐ‚เฐฒเฑ‹ เฐชเฑเฐฐเฐคเฐฟ เฐญเฐพเฐ—เฐพเฐจเฐฟเฐ•เฐฟ เฐธเฐฐเฑˆเฐจ เฐซเฑเฐฐเฑ‡เฐฎเฑโ€Œเฐตเฐฐเฑเฐ•เฑโ€Œเฐจเฑ เฐŽเฐ‚เฐšเฑเฐ•เฑ‹เฐ‚เฐกเฐฟ: - - 3 เฐฒเฑˆเฐจเฑเฐฒ เฐ•เฑ‹เฐกเฑโ€Œเฐฒเฑ‹ เฐธเฑเฐŸเฑ‡เฐŸเฑ เฐ†เฐซเฑ เฐฆเฐฟ เฐ†เฐฐเฑเฐŸเฑ เฐฎเฑ‹เฐกเฐฒเฑโ€Œเฐฒเฐ•เฑ เฐถเฐฟเฐ•เฑเฐทเฐฃ เฐ‡เฐตเฑเฐตเฐ‚เฐกเฐฟ. - - TF2.0/PyTorch/JAX เฐซเฑเฐฐเฑ‡เฐฎเฑโ€Œเฐตเฐฐเฑเฐ•เฑโ€Œเฐฒ เฐฎเฐงเฑเฐฏ เฐ’เฐ•เฑ‡ เฐฎเฑ‹เฐกเฐฒเฑโ€Œเฐจเฑ เฐ‡เฐทเฑเฐŸเฐพเฐจเฑเฐธเฐพเฐฐเฐ‚เฐ—เฐพ เฐคเฐฐเฐฒเฐฟเฐ‚เฐšเฐ‚เฐกเฐฟ. - - เฐถเฐฟเฐ•เฑเฐทเฐฃ, เฐฎเฑ‚เฐฒเฑเฐฏเฐพเฐ‚เฐ•เฐจเฐ‚ เฐฎเฐฐเฐฟเฐฏเฑ เฐ‰เฐคเฑเฐชเฐคเฑเฐคเฐฟ เฐ•เฑ‹เฐธเฐ‚ เฐธเฐฐเฑˆเฐจ เฐซเฑเฐฐเฑ‡เฐฎเฑโ€Œเฐตเฐฐเฑเฐ•เฑโ€Œเฐจเฑ เฐธเฐœเฐพเฐตเฑเฐ—เฐพ เฐŽเฐ‚เฐšเฑเฐ•เฑ‹เฐ‚เฐกเฐฟ. - -4. เฐฎเฑ€ เฐ…เฐตเฐธเฐฐเฐพเฐฒเฐ•เฑ เฐ…เฐจเฑเฐ—เฑเฐฃเฐ‚เฐ—เฐพ เฐฎเฑ‹เฐกเฐฒเฑ เฐฒเฑ‡เฐฆเฐพ เฐ‰เฐฆเฐพเฐนเฐฐเฐฃเฐจเฑ เฐธเฑเฐฒเฐญเฐ‚เฐ—เฐพ เฐ…เฐจเฑเฐ•เฑ‚เฐฒเฑ€เฐ•เฐฐเฐฟเฐ‚เฐšเฐ‚เฐกเฐฟ: - - เฐชเฑเฐฐเฐคเฐฟ เฐ†เฐฐเฑเฐ•เฐฟเฐŸเฑ†เฐ•เฑเฐšเฐฐเฑ เฐฆเฐพเฐจเฐฟ เฐ…เฐธเฐฒเฑ เฐฐเฐšเฐฏเฐฟเฐคเฐฒเฑ เฐชเฑเฐฐเฐšเฑเฐฐเฐฟเฐ‚เฐšเฐฟเฐจ เฐซเฐฒเฐฟเฐคเฐพเฐฒเฐจเฑ เฐชเฑเฐจเฐฐเฑเฐคเฑเฐชเฐคเฑเฐคเฐฟ เฐšเฑ‡เฐฏเฐกเฐพเฐจเฐฟเฐ•เฐฟ เฐฎเฑ‡เฐฎเฑ เฐ‰เฐฆเฐพเฐนเฐฐเฐฃเฐฒเฐจเฑ เฐ…เฐ‚เฐฆเฐฟเฐธเฑเฐคเฐพเฐฎเฑ. - - เฐฎเฑ‹เฐกเฐฒเฑ เฐ‡เฐ‚เฐŸเฐฐเฑเฐจเฐฒเฑโ€Œเฐฒเฑ เฐตเฑ€เฐฒเฑˆเฐจเฐ‚เฐค เฐธเฑเฐฅเฐฟเฐฐเฐ‚เฐ—เฐพ เฐฌเฐนเฐฟเฐฐเฑเฐ—เฐคเฐฎเฐตเฑเฐคเฐพเฐฏเฐฟ. - - เฐถเฑ€เฐ˜เฑเฐฐ เฐชเฑเฐฐเฐฏเฑ‹เฐ—เฐพเฐฒ เฐ•เฑ‹เฐธเฐ‚ เฐฒเฑˆเฐฌเฑเฐฐเฐฐเฑ€ เฐจเฑเฐ‚เฐกเฐฟ เฐธเฑเฐตเฐคเฐ‚เฐคเฑเฐฐเฐ‚เฐ—เฐพ เฐฎเฑ‹เฐกเฐฒเฑ เฐซเฑˆเฐฒเฑโ€Œเฐฒเฐจเฑ เฐ‰เฐชเฐฏเฑ‹เฐ—เฐฟเฐ‚เฐšเฐตเฐšเฑเฐšเฑ. - -## เฐจเฑ‡เฐจเฑ เฐŸเฑเฐฐเฐพเฐจเฑเฐธเฑโ€Œเฐซเฐพเฐฐเฑเฐฎเฐฐเฑโ€Œเฐฒเฐจเฑ เฐŽเฐ‚เฐฆเฑเฐ•เฑ เฐ‰เฐชเฐฏเฑ‹เฐ—เฐฟเฐ‚เฐšเฐ•เฑ‚เฐกเฐฆเฑ? 
- -- เฐˆ เฐฒเฑˆเฐฌเฑเฐฐเฐฐเฑ€ เฐจเฑเฐฏเฑ‚เฐฐเฐฒเฑ เฐจเฑ†เฐŸเฑโ€Œเฐฒ เฐ•เฑ‹เฐธเฐ‚ เฐฌเฐฟเฐฒเฑเฐกเฐฟเฐ‚เฐ—เฑ เฐฌเฑเฐฒเฐพเฐ•เฑโ€Œเฐฒ เฐฎเฐพเฐกเฑเฐฏเฑเฐฒเฐฐเฑ เฐŸเฑ‚เฐฒเฑโ€Œเฐฌเฐพเฐ•เฑเฐธเฑ เฐ•เฐพเฐฆเฑ. เฐฎเฑ‹เฐกเฐฒเฑ เฐซเฑˆเฐฒเฑโ€Œเฐฒเฐฒเฑ‹เฐจเฐฟ เฐ•เฑ‹เฐกเฑ เฐ‰เฐฆเฑเฐฆเฑ‡เฐถเฐชเฑ‚เฐฐเฑเฐตเฐ•เฐ‚เฐ—เฐพ เฐ…เฐฆเฐจเฐชเฑ เฐธเฐ‚เฐ—เฑเฐฐเฐนเฐฃเฐฒเฐคเฑ‹ เฐฐเฑ€เฐซเฑเฐฏเฐพเฐ•เฑเฐŸเฐฐเฐฟเฐ‚เฐ—เฑ เฐšเฑ‡เฐฏเฐฌเฐกเฐฆเฑ, เฐคเฐฆเฑเฐตเฐพเฐฐเฐพ เฐชเฐฐเฐฟเฐถเฑ‹เฐงเฐ•เฑเฐฒเฑ เฐ…เฐฆเฐจเฐชเฑ เฐธเฐ‚เฐ—เฑเฐฐเฐนเฐฃเฐฒเฑ/เฐซเฑˆเฐณเฑเฐฒเฐฒเฑ‹เฐ•เฐฟ เฐชเฑเฐฐเฐตเฑ‡เฐถเฐฟเฐ‚เฐšเฐ•เฑเฐ‚เฐกเฐพ เฐชเฑเฐฐเฐคเฐฟ เฐฎเฑ‹เฐกเฐฒเฑโ€Œเฐชเฑˆ เฐคเฑเฐตเฐฐเฐ—เฐพ เฐฎเฐณเฑเฐฒเฐฟเฐ‚เฐšเฐ—เฐฒเฐฐเฑ. -- เฐถเฐฟเฐ•เฑเฐทเฐฃ API เฐ เฐฎเฑ‹เฐกเฐฒเฑโ€Œเฐฒเฑ‹ เฐชเฐจเฐฟ เฐšเฑ‡เฐฏเฐกเฐพเฐจเฐฟเฐ•เฐฟ เฐ‰เฐฆเฑเฐฆเฑ‡เฐถเฐฟเฐ‚เฐšเฐฌเฐกเฐฒเฑ‡เฐฆเฑ เฐ•เฐพเฐจเฑ€ เฐฒเฑˆเฐฌเฑเฐฐเฐฐเฑ€ เฐ…เฐ‚เฐฆเฐฟเฐ‚เฐšเฐฟเฐจ เฐฎเฑ‹เฐกเฐฒเฑโ€Œเฐฒเฐคเฑ‹ เฐชเฐจเฐฟ เฐšเฑ‡เฐฏเฐกเฐพเฐจเฐฟเฐ•เฐฟ เฐ†เฐชเฑเฐŸเฐฟเฐฎเฑˆเฐœเฑ เฐšเฑ‡เฐฏเฐฌเฐกเฐฟเฐ‚เฐฆเฐฟ. เฐธเฐพเฐงเฐพเฐฐเฐฃ เฐฎเฑ†เฐทเฐฟเฐจเฑ เฐฒเฑ†เฐฐเฑเฐจเฐฟเฐ‚เฐ—เฑ เฐฒเฑ‚เฐชเฑโ€Œเฐฒ เฐ•เฑ‹เฐธเฐ‚, เฐฎเฑ€เฐฐเฑ เฐฎเฐฐเฑŠเฐ• เฐฒเฑˆเฐฌเฑเฐฐเฐฐเฑ€เฐจเฐฟ เฐ‰เฐชเฐฏเฑ‹เฐ—เฐฟเฐ‚เฐšเฐพเฐฒเฐฟ (เฐฌเฐนเฑเฐถเฐพ, [Accelerate](https://huggingface.co/docs/accelerate)). -- เฐฎเฑ‡เฐฎเฑ เฐตเฑ€เฐฒเฑˆเฐจเฐจเฑเฐจเฐฟ เฐŽเฐ•เฑเฐ•เฑเฐต เฐตเฐฟเฐจเฐฟเฐฏเฑ‹เฐ— เฐธเฐ‚เฐฆเฐฐเฑเฐญเฐพเฐฒเฐจเฑ เฐชเฑเฐฐเฐฆเฐฐเฑเฐถเฐฟเฐ‚เฐšเฐกเฐพเฐจเฐฟเฐ•เฐฟ เฐชเฑเฐฐเฐฏเฐคเฑเฐจเฐฟเฐธเฑเฐคเฑเฐจเฑเฐจเฐชเฑเฐชเฑเฐกเฑ, เฐฎเฐพ [เฐ‰เฐฆเฐพเฐนเฐฐเฐฃเฐฒ เฐซเฑ‹เฐฒเฑเฐกเฐฐเฑ](https://github.com/huggingface/transformers/tree/main/examples)เฐฒเฑ‹เฐจเฐฟ เฐธเฑเฐ•เฑเฐฐเฐฟเฐชเฑเฐŸเฑโ€Œเฐฒเฑ เฐ•เฑ‡เฐตเฐฒเฐ‚: เฐ‰เฐฆเฐพเฐนเฐฐเฐฃเฐฒเฑ. เฐฎเฑ€ เฐจเฐฟเฐฐเฑเฐฆเฐฟเฐทเฑเฐŸ เฐธเฐฎเฐธเฑเฐฏเฐชเฑˆ เฐ…เฐตเฐฟ เฐชเฐจเฐฟ เฐšเฑ‡เฐฏเฐตเฑ เฐฎเฐฐเฐฟเฐฏเฑ เฐตเฐพเฐŸเฐฟเฐจเฐฟ เฐฎเฑ€ เฐ…เฐตเฐธเฐฐเฐพเฐฒเฐ•เฑ เฐ…เฐจเฑเฐ—เฑเฐฃเฐ‚เฐ—เฐพ เฐฎเฐพเฐฐเฑเฐšเฑเฐ•เฑ‹เฐตเฐกเฐพเฐจเฐฟเฐ•เฐฟ เฐฎเฑ€เฐฐเฑ เฐ•เฑŠเฐจเฑเฐจเฐฟ เฐ•เฑ‹เฐกเฑ เฐฒเฑˆเฐจเฑโ€Œเฐฒเฐจเฑ เฐฎเฐพเฐฐเฑเฐšเฐตเฐฒเฐธเฐฟ เฐ‰เฐ‚เฐŸเฑเฐ‚เฐฆเฐฟ. - -## เฐธเฐ‚เฐธเฑเฐฅเฐพเฐชเฐจ - -### เฐชเฐฟเฐชเฑ เฐคเฑ‹ - -เฐˆ เฐฐเฐฟเฐชเฑ‹เฐœเฐฟเฐŸเฐฐเฑ€ เฐชเฑˆเฐฅเฐพเฐจเฑ 3.8+, เฐซเฑเฐฒเฐพเฐ•เฑเฐธเฑ 0.4.1+, PyTorch 1.10+ เฐฎเฐฐเฐฟเฐฏเฑ TensorFlow 2.6+เฐฒเฑ‹ เฐชเฐฐเฑ€เฐ•เฑเฐทเฐฟเฐ‚เฐšเฐฌเฐกเฐฟเฐ‚เฐฆเฐฟ. - -เฐฎเฑ€เฐฐเฑ [เฐตเฐฐเฑเฐšเฑเฐตเฐฒเฑ เฐตเฐพเฐคเฐพเฐตเฐฐเฐฃเฐ‚](https://docs.python.org/3/library/venv.html)เฐฒเฑ‹ ๐Ÿค— เฐŸเฑเฐฐเฐพเฐจเฑเฐธเฑโ€Œเฐซเฐพเฐฐเฑเฐฎเฐฐเฑโ€Œเฐฒเฐจเฑ เฐ‡เฐจเฑโ€ŒเฐธเฑเฐŸเฐพเฐฒเฑ เฐšเฑ‡เฐฏเฐพเฐฒเฐฟ. เฐฎเฑ€เฐ•เฑ เฐชเฑˆเฐฅเฐพเฐจเฑ เฐตเฐฐเฑเฐšเฑเฐตเฐฒเฑ เฐชเฐฐเฐฟเฐธเฐฐเฐพเฐฒ เฐ—เฑเฐฐเฐฟเฐ‚เฐšเฐฟ เฐคเฑ†เฐฒเฐฟเฐฏเฐ•เฑเฐ‚เฐŸเฑ‡, [เฐฏเฑ‚เฐœเฐฐเฑ เฐ—เฑˆเฐกเฑ](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/) เฐšเฑ‚เฐกเฐ‚เฐกเฐฟ. - -เฐฎเฑเฐ‚เฐฆเฑเฐ—เฐพ, เฐฎเฑ€เฐฐเฑ เฐ‰เฐชเฐฏเฑ‹เฐ—เฐฟเฐ‚เฐšเฐฌเฑ‹เฐคเฑเฐจเฑเฐจ เฐชเฑˆเฐฅเฐพเฐจเฑ เฐตเฑ†เฐฐเฑเฐทเฐจเฑโ€Œเฐคเฑ‹ เฐตเฐฐเฑเฐšเฑเฐตเฐฒเฑ เฐตเฐพเฐคเฐพเฐตเฐฐเฐฃเฐพเฐจเฑเฐจเฐฟ เฐธเฑƒเฐทเฑเฐŸเฐฟเฐ‚เฐšเฐ‚เฐกเฐฟ เฐฎเฐฐเฐฟเฐฏเฑ เฐฆเฐพเฐจเฐฟเฐจเฐฟ เฐธเฐ•เฑเฐฐเฐฟเฐฏเฐ‚ เฐšเฑ‡เฐฏเฐ‚เฐกเฐฟ. - -เฐ…เฐชเฑเฐชเฑเฐกเฑ, เฐฎเฑ€เฐฐเฑ เฐซเฑเฐฒเฐพเฐ•เฑเฐธเฑ, เฐชเฑˆเฐŸเฐพเฐฐเฑเฐšเฑ เฐฒเฑ‡เฐฆเฐพ เฐŸเฑ†เฐจเฑเฐธเฐฐเฑโ€Œเฐซเฑเฐฒเฑ‹เฐฒเฑ‹ เฐ•เฐจเฑ€เฐธเฐ‚ เฐ’เฐ•เฐฆเฐพเฐจเฐฟเฐจเฐฟ เฐ‡เฐจเฑโ€ŒเฐธเฑเฐŸเฐพเฐฒเฑ เฐšเฑ‡เฐฏเฐพเฐฒเฐฟ. 
-เฐฆเฐฏเฐšเฑ‡เฐธเฐฟ [TensorFlow เฐ‡เฐจเฑโ€ŒเฐธเฑเฐŸเฐพเฐฒเฑ‡เฐทเฐจเฑ เฐชเฑ‡เฐœเฑ€](https://www.tensorflow.org/install/), [PyTorch เฐ‡เฐจเฑโ€ŒเฐธเฑเฐŸเฐพเฐฒเฑ‡เฐทเฐจเฑ เฐชเฑ‡เฐœเฑ€](https://pytorch.org/get-started/locally/#start-locally) เฐฎเฐฐเฐฟเฐฏเฑ/เฐจเฐฟ เฐšเฑ‚เฐกเฐ‚เฐกเฐฟ เฐฒเฑ‡เฐฆเฐพ เฐฎเฑ€ เฐชเฑเฐฒเฐพเฐŸเฑโ€Œเฐซเฐพเฐฐเฐฎเฑ เฐ•เฑ‹เฐธเฐ‚ เฐจเฐฟเฐฐเฑเฐฆเฐฟเฐทเฑเฐŸ เฐ‡เฐจเฑโ€ŒเฐธเฑเฐŸเฐพเฐฒเฑ‡เฐทเฐจเฑ เฐ•เฐฎเฐพเฐ‚เฐกเฑโ€Œเฐ•เฑ เฐธเฐ‚เฐฌเฐ‚เฐงเฐฟเฐ‚เฐšเฐฟ [Flax](https://github.com/google/flax#quick-install) เฐฎเฐฐเฐฟเฐฏเฑ [Jax](https://github.com/google/jax#installation) เฐ‡เฐจเฑโ€ŒเฐธเฑเฐŸเฐพเฐฒเฑ‡เฐทเฐจเฑ เฐชเฑ‡เฐœเฑ€เฐฒเฑ . - -เฐ† เฐฌเฑเฐฏเฐพเฐ•เฑ†เฐ‚เฐกเฑโ€Œเฐฒเฐฒเฑ‹ เฐ’เฐ•เฐŸเฐฟ เฐ‡เฐจเฑโ€ŒเฐธเฑเฐŸเฐพเฐฒเฑ เฐšเฑ‡เฐฏเฐฌเฐกเฐฟเฐจเฐชเฑเฐชเฑเฐกเฑ, ๐Ÿค— เฐŸเฑเฐฐเฐพเฐจเฑเฐธเฑโ€Œเฐซเฐพเฐฐเฑเฐฎเฐฐเฑโ€Œเฐฒเฐจเฑ เฐˆ เฐ•เฑเฐฐเฐฟเฐ‚เฐฆเฐฟ เฐตเฐฟเฐงเฐ‚เฐ—เฐพ เฐชเฐฟเฐชเฑโ€Œเฐจเฐฟ เฐ‰เฐชเฐฏเฑ‹เฐ—เฐฟเฐ‚เฐšเฐฟ เฐ‡เฐจเฑโ€ŒเฐธเฑเฐŸเฐพเฐฒเฑ เฐšเฑ‡เฐฏเฐตเฐšเฑเฐšเฑ: - -```bash -pip install transformers -``` - -เฐฎเฑ€เฐฐเฑ เฐ‰เฐฆเฐพเฐนเฐฐเฐฃเฐฒเฐคเฑ‹ เฐชเฑเฐฒเฑ‡ เฐšเฑ‡เฐฏเฐพเฐฒเฐจเฑเฐ•เฑเฐ‚เฐŸเฑ‡ เฐฒเฑ‡เฐฆเฐพ เฐ•เฑ‹เฐกเฑ เฐฏเฑŠเฐ•เฑเฐ• เฐฌเฑเฐฒเฑ€เฐกเฐฟเฐ‚เฐ—เฑ เฐŽเฐกเฑเฐœเฑ เฐ…เฐตเฐธเฐฐเฐ‚ เฐฎเฐฐเฐฟเฐฏเฑ เฐ•เฑŠเฐคเฑเฐค เฐตเฐฟเฐกเฑเฐฆเฐฒ เฐ•เฑ‹เฐธเฐ‚ เฐตเฑ‡เฐšเฐฟ เฐ‰เฐ‚เฐกเฐฒเฑ‡เฐ•เฐชเฑ‹เฐคเฑ‡, เฐฎเฑ€เฐฐเฑ เฐคเฐชเฑเฐชเฐจเฐฟเฐธเฐฐเฐฟเฐ—เฐพ [เฐฎเฑ‚เฐฒเฐ‚ เฐจเฑเฐ‚เฐกเฐฟ เฐฒเฑˆเฐฌเฑเฐฐเฐฐเฑ€เฐจเฐฟ เฐ‡เฐจเฑโ€ŒเฐธเฑเฐŸเฐพเฐฒเฑ เฐšเฑ‡เฐฏเฐพเฐฒเฐฟ](https://huggingface.co/docs/transformers/installation#installing-from-source). - -### เฐ•เฑŠเฐ‚เฐกเฐพ เฐคเฑ‹ - -เฐŸเฑเฐฐเฐพเฐจเฑเฐธเฑโ€Œเฐซเฐพเฐฐเฑเฐฎเฐฐเฑเฐธเฑ เฐตเฑ†เฐฐเฑเฐทเฐจเฑ v4.0.0 เฐจเฑเฐ‚เฐกเฐฟ, เฐฎเฑ‡เฐฎเฑ เฐ‡เฐชเฑเฐชเฑเฐกเฑ เฐ•เฑŠเฐ‚เฐกเฐพ เฐ›เฐพเฐจเฑ†เฐฒเฑโ€Œเฐจเฐฟ เฐ•เฐฒเฐฟเฐ—เฐฟ เฐ‰เฐจเฑเฐจเฐพเฐฎเฑ: `huggingface`. - -๐Ÿค— เฐ•เฐฟเฐ‚เฐฆเฐฟ เฐตเฐฟเฐงเฐ‚เฐ—เฐพ เฐ•เฑŠเฐ‚เฐกเฐพ เฐ‰เฐชเฐฏเฑ‹เฐ—เฐฟเฐ‚เฐšเฐฟ เฐŸเฑเฐฐเฐพเฐจเฑเฐธเฑโ€Œเฐซเฐพเฐฐเฑเฐฎเฐฐเฑโ€Œเฐฒเฐจเฑ เฐ‡เฐจเฑโ€ŒเฐธเฑเฐŸเฐพเฐฒเฑ เฐšเฑ‡เฐฏเฐตเฐšเฑเฐšเฑ: - -```shell script -conda install -c huggingface transformers -``` - -Flax, PyTorch เฐฒเฑ‡เฐฆเฐพ TensorFlow เฐฏเฑŠเฐ•เฑเฐ• เฐ‡เฐจเฑโ€ŒเฐธเฑเฐŸเฐพเฐฒเฑ‡เฐทเฐจเฑ เฐชเฑ‡เฐœเฑ€เฐฒเฐจเฑ เฐ•เฑŠเฐ‚เฐกเฐพเฐคเฑ‹ เฐŽเฐฒเฐพ เฐ‡เฐจเฑโ€ŒเฐธเฑเฐŸเฐพเฐฒเฑ เฐšเฑ‡เฐฏเฐพเฐฒเฑ‹ เฐšเฑ‚เฐกเฐŸเฐพเฐจเฐฟเฐ•เฐฟ เฐตเฐพเฐŸเฐฟเฐจเฐฟ เฐ…เฐจเฑเฐธเฐฐเฐฟเฐ‚เฐšเฐ‚เฐกเฐฟ. - -> **_เฐ—เฐฎเฐจเฐฟเฐ•:_** Windowsเฐฒเฑ‹, เฐ•เฐพเฐทเฐฟเฐ‚เฐ—เฑ เฐจเฑเฐ‚เฐกเฐฟ เฐชเฑเฐฐเฐฏเฑ‹เฐœเฐจเฐ‚ เฐชเฑŠเฐ‚เฐฆเฑ‡เฐ‚เฐฆเฑเฐ•เฑ เฐฎเฑ€เฐฐเฑ เฐกเฑ†เฐตเฐฒเฐชเฐฐเฑ เฐฎเฑ‹เฐกเฑโ€Œเฐจเฐฟ เฐธเฐ•เฑเฐฐเฐฟเฐฏเฐ‚ เฐšเฑ‡เฐฏเฐฎเฐจเฐฟ เฐชเฑเฐฐเฐพเฐ‚เฐชเฑเฐŸเฑ เฐšเฑ‡เฐฏเฐฌเฐกเฐตเฐšเฑเฐšเฑ. เฐ‡เฐฆเฐฟ เฐฎเฑ€เฐ•เฑ เฐŽเฐ‚เฐชเฐฟเฐ• เฐ•เฐพเฐ•เฐชเฑ‹เฐคเฑ‡, เฐฆเฐฏเฐšเฑ‡เฐธเฐฟ [เฐˆ เฐธเฐ‚เฐšเฐฟเฐ•](https://github.com/huggingface/huggingface_hub/issues/1062)เฐฒเฑ‹ เฐฎเฐพเฐ•เฑ เฐคเฑ†เฐฒเฐฟเฐฏเฐœเฑ‡เฐฏเฐ‚เฐกเฐฟ. - -## เฐฎเฑ‹เฐกเฐฒเฑ เฐ†เฐฐเฑเฐ•เฐฟเฐŸเฑ†เฐ•เฑเฐšเฐฐเฑเฐฒเฑ - -**[เฐ…เฐจเฑเฐจเฐฟ เฐฎเฑ‹เฐกเฐฒเฑ เฐšเฑ†เฐ•เฑโ€Œเฐชเฐพเฐฏเฐฟเฐ‚เฐŸเฑโ€Œเฐฒเฑ](https://huggingface.co/models)** ๐Ÿค— เฐ…เฐ‚เฐฆเฐฟเฐ‚เฐšเฐฟเฐจ เฐŸเฑเฐฐเฐพเฐจเฑเฐธเฑโ€Œเฐซเฐพเฐฐเฑเฐฎเฐฐเฑเฐฒเฑ huggingface.co [model hub](https://huggingface.co/models) เฐจเฑเฐ‚เฐกเฐฟ เฐธเฐœเฐพเฐตเฑเฐ—เฐพ เฐเฐ•เฑ€เฐ•เฑƒเฐคเฐ‚ เฐšเฑ‡เฐฏเฐฌเฐกเฑเฐกเฐพเฐฏเฐฟ [users](https://huggingface.co/users) เฐฎเฐฐเฐฟเฐฏเฑ [organizations](https://huggingface.co/organizations) เฐฆเฑเฐตเฐพเฐฐเฐพ เฐจเฑ‡เฐฐเฑเฐ—เฐพ เฐ…เฐชเฑโ€Œเฐฒเฑ‹เฐกเฑ เฐšเฑ‡เฐฏเฐฌเฐกเฐคเฐพเฐฏเฐฟ. 
- -เฐชเฑเฐฐเฐธเฑเฐคเฑเฐค เฐคเฐจเฐฟเฐ–เฑ€ เฐ•เฑ‡เฐ‚เฐฆเฑเฐฐเฐพเฐฒ เฐธเฐ‚เฐ–เฑเฐฏ: ![](https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen) - -๐Ÿค— เฐŸเฑเฐฐเฐพเฐจเฑเฐธเฑโ€Œเฐซเฐพเฐฐเฑเฐฎเฐฐเฑเฐฒเฑ เฐชเฑเฐฐเฐธเฑเฐคเฑเฐคเฐ‚ เฐ•เฐฟเฐ‚เฐฆเฐฟ เฐ†เฐฐเฑเฐ•เฐฟเฐŸเฑ†เฐ•เฑเฐšเฐฐเฑโ€Œเฐฒเฐจเฑ เฐ…เฐ‚เฐฆเฐœเฑ‡เฐธเฑเฐคเฑเฐจเฑเฐจเฐพเฐฏเฐฟ (เฐตเฐพเฐŸเฐฟเฐฒเฑ‹ เฐชเฑเฐฐเฐคเฐฟ เฐ’เฐ•เฑเฐ•เฐŸเฐฟ เฐ‰เฐจเฑเฐจเฐค เฐธเฑเฐฅเฐพเฐฏเฐฟ เฐธเฐพเฐฐเฐพเฐ‚เฐถเฐ‚ เฐ•เฑ‹เฐธเฐ‚ [เฐ‡เฐ•เฑเฐ•เฐก](https://huggingface.co/docs/transformers/model_summary) เฐšเฑ‚เฐกเฐ‚เฐกเฐฟ): - -1. **[ALBERT](https://huggingface.co/docs/transformers/model_doc/albert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. -1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (from Google Research) released with the paper [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918) by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig. -1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (from BAAI) released with the paper [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell. -1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (from MIT) released with the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Yuan Gong, Yu-An Chung, James Glass. -1. **[Autoformer](https://huggingface.co/docs/transformers/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long. -1. **[Bark](https://huggingface.co/docs/transformers/model_doc/bark)** (from Suno) released in the repository [suno-ai/bark](https://github.com/suno-ai/bark) by Suno AI team. -1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. -1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (from ร‰cole polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis. -1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen. -1. 
**[BEiT](https://huggingface.co/docs/transformers/model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong, Furu Wei. -1. **[BERT](https://huggingface.co/docs/transformers/model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. -1. **[BERT For Sequence Generation](https://huggingface.co/docs/transformers/model_doc/bert-generation)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn. -1. **[BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen. -1. **[BigBird-Pegasus](https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. -1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. -1. **[BioGpt](https://huggingface.co/docs/transformers/model_doc/biogpt)** (from Microsoft Research AI4Science) released with the paper [BioGPT: generative pre-trained transformer for biomedical text generation and mining](https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbac409/6713511?guestAccessKey=a66d9b5d-4f83-4017-bb52-405815c907b9) by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu. -1. **[BiT](https://huggingface.co/docs/transformers/model_doc/bit)** (from Google AI) released with the paper [Big Transfer (BiT): General Visual Representation Learning](https://arxiv.org/abs/1912.11370) by Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby. -1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston. -1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston. -1. 
**[BLIP](https://huggingface.co/docs/transformers/model_doc/blip)** (from Salesforce) released with the paper [BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation](https://arxiv.org/abs/2201.12086) by Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi. -1. **[BLIP-2](https://huggingface.co/docs/transformers/model_doc/blip-2)** (from Salesforce) released with the paper [BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models](https://arxiv.org/abs/2301.12597) by Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi. -1. **[BLOOM](https://huggingface.co/docs/transformers/model_doc/bloom)** (from BigScience workshop) released by the [BigScience Workshop](https://bigscience.huggingface.co/). -1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry. -1. **[BridgeTower](https://huggingface.co/docs/transformers/model_doc/bridgetower)** (from Harbin Institute of Technology/Microsoft Research Asia/Intel Labs) released with the paper [BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning](https://arxiv.org/abs/2206.08657) by Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan. -1. **[BROS](https://huggingface.co/docs/transformers/model_doc/bros)** (from NAVER CLOVA) released with the paper [BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents](https://arxiv.org/abs/2108.04539) by Teakgyu Hong, Donghyun Kim, Mingi Ji, Wonseok Hwang, Daehyun Nam, Sungrae Park. -1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel. -1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suรกrez*, Yoann Dupont, Laurent Romary, ร‰ric Villemonte de la Clergerie, Djamรฉ Seddah and Benoรฎt Sagot. -1. **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting. -1. **[Chinese-CLIP](https://huggingface.co/docs/transformers/model_doc/chinese_clip)** (from OFA-Sys) released with the paper [Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese](https://arxiv.org/abs/2211.01335) by An Yang, Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou, Chang Zhou. -1. **[CLAP](https://huggingface.co/docs/transformers/model_doc/clap)** (from LAION-AI) released with the paper [Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation](https://arxiv.org/abs/2211.06687) by Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov. -1. 
**[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever. -1. **[CLIPSeg](https://huggingface.co/docs/transformers/model_doc/clipseg)** (from University of Gรถttingen) released with the paper [Image Segmentation Using Text and Image Prompts](https://arxiv.org/abs/2112.10003) by Timo Lรผddecke and Alexander Ecker. -1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong. -1. **[CodeLlama](https://huggingface.co/docs/transformers/model_doc/llama_code)** (from MetaAI) released with the paper [Code Llama: Open Foundation Models for Code](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/) by Baptiste Roziรจre, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jรฉrรฉmy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Dรฉfossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve. -1. **[Conditional DETR](https://huggingface.co/docs/transformers/model_doc/conditional_detr)** (from Microsoft Research Asia) released with the paper [Conditional DETR for Fast Training Convergence](https://arxiv.org/abs/2108.06152) by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang. -1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan. -1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie. -1. **[ConvNeXTV2](https://huggingface.co/docs/transformers/model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie. -1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun. -1. **[CPM-Ant](https://huggingface.co/docs/transformers/model_doc/cpmant)** (from OpenBMB) released by the [OpenBMB](https://www.openbmb.org/). -1. 
**[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher. -1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang. -1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli. -1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. -1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. -1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch. -1. **[Deformable DETR](https://huggingface.co/docs/transformers/model_doc/deformable_detr)** (from SenseTime Research) released with the paper [Deformable DETR: Deformable Transformers for End-to-End Object Detection](https://arxiv.org/abs/2010.04159) by Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai. -1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervรฉ Jรฉgou. -1. **[DePlot](https://huggingface.co/docs/transformers/model_doc/deplot)** (from Google AI) released with the paper [DePlot: One-shot visual language reasoning by plot-to-table translation](https://arxiv.org/abs/2212.10505) by Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun. -1. **[DETA](https://huggingface.co/docs/transformers/model_doc/deta)** (from The University of Texas at Austin) released with the paper [NMS Strikes Back](https://arxiv.org/abs/2212.06137) by Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krรคhenbรผhl. -1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko. -1. 
**[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan. -1. **[DiNAT](https://huggingface.co/docs/transformers/model_doc/dinat)** (from SHI Labs) released with the paper [Dilated Neighborhood Attention Transformer](https://arxiv.org/abs/2209.15001) by Ali Hassani and Humphrey Shi. -1. **[DINOv2](https://huggingface.co/docs/transformers/model_doc/dinov2)** (from Meta AI) released with the paper [DINOv2: Learning Robust Visual Features without Supervision](https://arxiv.org/abs/2304.07193) by Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski. -1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation) and a German version of DistilBERT. -1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei. -1. **[Donut](https://huggingface.co/docs/transformers/model_doc/donut)** (from NAVER), released together with the paper [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park. -1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. -1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun. -1. **[EfficientFormer](https://huggingface.co/docs/transformers/model_doc/efficientformer)** (from Snap Research) released with the paper [EfficientFormer: Vision Transformers at MobileNet Speed](https://arxiv.org/abs/2206.01191) by Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren. -1. 
**[EfficientNet](https://huggingface.co/docs/transformers/model_doc/efficientnet)** (from Google Brain) released with the paper [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946) by Mingxing Tan, Quoc V. Le. -1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning. -1. **[EnCodec](https://huggingface.co/docs/transformers/model_doc/encodec)** (from Meta AI) released with the paper [High Fidelity Neural Audio Compression](https://arxiv.org/abs/2210.13438) by Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi. -1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn. -1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu. -1. **[ErnieM](https://huggingface.co/docs/transformers/model_doc/ernie_m)** (from Baidu) released with the paper [ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora](https://arxiv.org/abs/2012.15674) by Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang. -1. **[ESM](https://huggingface.co/docs/transformers/model_doc/esm)** (from Meta AI) are transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. **ESM-1v** was released with the paper [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648) by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. **ESM-2 and ESMFold** were released with the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives. -1. **[Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)** (from Technology Innovation Institute) by Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme. -1. 
**[FLAN-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5)** (from Google AI) released in the repository [google-research/t5x](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints) by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei -1. **[FLAN-UL2](https://huggingface.co/docs/transformers/model_doc/flan-ul2)** (from Google AI) released in the repository [google-research/t5x](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-ul2-checkpoints) by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei -1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab. -1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela. -1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon. -1. **[FocalNet](https://huggingface.co/docs/transformers/model_doc/focalnet)** (from Microsoft Research) released with the paper [Focal Modulation Networks](https://arxiv.org/abs/2203.11926) by Jianwei Yang, Chunyuan Li, Xiyang Dai, Lu Yuan, Jianfeng Gao. -1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le. -1. **[Fuyu](https://huggingface.co/docs/transformers/model_doc/fuyu)** (from ADEPT) by Rohan Bavishi, Erich Elsen, Curtis Hawthorne, Maxwell Nye, Augustus Odena, Arushi Somani, Sağnak Taşırlar. Released in a [blog post](https://www.adept.ai/blog/fuyu-8b) -1. **[GIT](https://huggingface.co/docs/transformers/model_doc/git)** (from Microsoft Research) released with the paper [GIT: A Generative Image-to-text Transformer for Vision and Language](https://arxiv.org/abs/2205.14100) by Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang. -1. 
**[GLPN](https://huggingface.co/docs/transformers/model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim. -1. **[GPT](https://huggingface.co/docs/transformers/model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://openai.com/research/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. -1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy. -1. **[GPT NeoX](https://huggingface.co/docs/transformers/model_doc/gpt_neox)** (from EleutherAI) released with the paper [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://arxiv.org/abs/2204.06745) by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach -1. **[GPT NeoX Japanese](https://huggingface.co/docs/transformers/model_doc/gpt_neox_japanese)** (from ABEJA) released by Shinya Otani, Takayoshi Makabe, Anuj Arora, and Kyo Hattori. -1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://openai.com/research/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**. -1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki. -1. **[GPT-Sw3](https://huggingface.co/docs/transformers/model_doc/gpt-sw3)** (from AI-Sweden) released with the paper [Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish](http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.376.pdf) by Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey Öhman, Fredrik Carlsson, Magnus Sahlgren. -1. **[GPTBigCode](https://huggingface.co/docs/transformers/model_doc/gpt_bigcode)** (from BigCode) released with the paper [SantaCoder: don't reach for the stars!](https://arxiv.org/abs/2301.03988) by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra. -1. 
**[GPTSAN-japanese](https://huggingface.co/docs/transformers/model_doc/gptsan-japanese)** released in the repository [tanreinama/GPTSAN](https://github.com/tanreinama/GPTSAN/blob/main/report/model.md) by Toshiyuki Sakamoto (tanreinama). -1. **[Graphormer](https://huggingface.co/docs/transformers/model_doc/graphormer)** (from Microsoft) released with the paper [Do Transformers Really Perform Bad for Graph Representation?](https://arxiv.org/abs/2106.05234) by Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu. -1. **[GroupViT](https://huggingface.co/docs/transformers/model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang. -1. **[HerBERT](https://huggingface.co/docs/transformers/model_doc/herbert)** (from Allegro.pl, AGH University of Science and Technology) released with the paper [KLEJ: Comprehensive Benchmark for Polish Language Understanding](https://www.aclweb.org/anthology/2020.acl-main.111.pdf) by Piotr Rybak, Robert Mroczkowski, Janusz Tracz, Ireneusz Gawlik. -1. **[Hubert](https://huggingface.co/docs/transformers/model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed. -1. **[I-BERT](https://huggingface.co/docs/transformers/model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer. -1. **[IDEFICS](https://huggingface.co/docs/transformers/model_doc/idefics)** (from HuggingFace) released with the paper [OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents](https://huggingface.co/papers/2306.16527) by Hugo Laurençon, Lucile Saulnier, Léo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh. -1. **[ImageGPT](https://huggingface.co/docs/transformers/model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever. -1. **[Informer](https://huggingface.co/docs/transformers/model_doc/informer)** (from Beihang University, UC Berkeley, Rutgers University, SEDD Company) released with the paper [Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting](https://arxiv.org/abs/2012.07436) by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. -1. **[InstructBLIP](https://huggingface.co/docs/transformers/model_doc/instructblip)** (from Salesforce) released with the paper [InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning](https://arxiv.org/abs/2305.06500) by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi. -1. 
**[Jukebox](https://huggingface.co/docs/transformers/model_doc/jukebox)** (from OpenAI) released with the paper [Jukebox: A Generative Model for Music](https://arxiv.org/pdf/2005.00341.pdf) by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever. -1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou. -1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou. -1. **[LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)** (from Microsoft Research Asia) released with the paper [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei. -1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutxlm)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei. -1. **[LED](https://huggingface.co/docs/transformers/model_doc/led)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan. -1. **[LeViT](https://huggingface.co/docs/transformers/model_doc/levit)** (from Meta AI) released with the paper [LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference](https://arxiv.org/abs/2104.01136) by Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze. -1. **[LiLT](https://huggingface.co/docs/transformers/model_doc/lilt)** (from South China University of Technology) released with the paper [LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding](https://arxiv.org/abs/2202.13669) by Jiapeng Wang, Lianwen Jin, Kai Ding. -1. **[LLaMA](https://huggingface.co/docs/transformers/model_doc/llama)** (from The FAIR team of Meta AI) released with the paper [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971) by Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample. -1. 
**[Llama2](https://huggingface.co/docs/transformers/model_doc/llama2)** (from The FAIR team of Meta AI) released with the paper [Llama2: Open Foundation and Fine-Tuned Chat Models](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/) by Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom. -1. **[Longformer](https://huggingface.co/docs/transformers/model_doc/longformer)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan. -1. **[LongT5](https://huggingface.co/docs/transformers/model_doc/longt5)** (from Google AI) released with the paper [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916) by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang. -1. **[LUKE](https://huggingface.co/docs/transformers/model_doc/luke)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto. -1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal. -1. **[M-CTC-T](https://huggingface.co/docs/transformers/model_doc/mctct)** (from Facebook) released with the paper [Pseudo-Labeling For Massively Multilingual Speech Recognition](https://arxiv.org/abs/2111.00161) by Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert. -1. **[M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin. -1. **[MADLAD-400](https://huggingface.co/docs/transformers/model_doc/madlad-400)** (from Google) released with the paper [MADLAD-400: A Multilingual And Document-Level Large Audited Dataset](https://arxiv.org/abs/2309.04662) by Sneha Kudugunta, Isaac Caswell, Biao Zhang, Xavier Garcia, Christopher A. 
Choquette-Choo, Katherine Lee, Derrick Xin, Aditya Kusupati, Romi Stella, Ankur Bapna, Orhan Firat. -1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team. -1. **[MarkupLM](https://huggingface.co/docs/transformers/model_doc/markuplm)** (from Microsoft Research Asia) released with the paper [MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding](https://arxiv.org/abs/2110.08518) by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei. -1. **[Mask2Former](https://huggingface.co/docs/transformers/model_doc/mask2former)** (from FAIR and UIUC) released with the paper [Masked-attention Mask Transformer for Universal Image Segmentation](https://arxiv.org/abs/2112.01527) by Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar. -1. **[MaskFormer](https://huggingface.co/docs/transformers/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov. -1. **[MatCha](https://huggingface.co/docs/transformers/model_doc/matcha)** (from Google AI) released with the paper [MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering](https://arxiv.org/abs/2212.09662) by Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, Julian Martin Eisenschlos. -1. **[mBART](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer. -1. **[mBART-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan. -1. **[MEGA](https://huggingface.co/docs/transformers/model_doc/mega)** (from Meta/USC/CMU/SJTU) released with the paper [Mega: Moving Average Equipped Gated Attention](https://arxiv.org/abs/2209.10655) by Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, and Luke Zettlemoyer. -1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron-bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro. -1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro. -1. 
**[MGP-STR](https://huggingface.co/docs/transformers/model_doc/mgp-str)** (from Alibaba Research) released with the paper [Multi-Granularity Prediction for Scene Text Recognition](https://arxiv.org/abs/2209.03592) by Peng Wang, Cheng Da, and Cong Yao. -1. **[Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed. -1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka. -1. **[MMS](https://huggingface.co/docs/transformers/model_doc/mms)** (from Facebook) released with the paper [Scaling Speech Technology to 1,000+ Languages](https://arxiv.org/abs/2305.13516) by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli. -1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou. -1. **[MobileNetV1](https://huggingface.co/docs/transformers/model_doc/mobilenet_v1)** (from Google Inc.) released with the paper [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861) by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam. -1. **[MobileNetV2](https://huggingface.co/docs/transformers/model_doc/mobilenet_v2)** (from Google Inc.) released with the paper [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381) by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen. -1. **[MobileViT](https://huggingface.co/docs/transformers/model_doc/mobilevit)** (from Apple) released with the paper [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/abs/2110.02178) by Sachin Mehta and Mohammad Rastegari. -1. **[MobileViTV2](https://huggingface.co/docs/transformers/model_doc/mobilevitv2)** (from Apple) released with the paper [Separable Self-attention for Mobile Vision Transformers](https://arxiv.org/abs/2206.02680) by Sachin Mehta and Mohammad Rastegari. -1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu. -1. **[MPT](https://huggingface.co/docs/transformers/model_doc/mpt)** (from MosaicML) released with the repository [llm-foundry](https://github.com/mosaicml/llm-foundry/) by the MosaicML NLP Team. -1. 
**[MRA](https://huggingface.co/docs/transformers/model_doc/mra)** (from the University of Wisconsin - Madison) released with the paper [Multi Resolution Analysis (MRA) for Approximate Self-Attention](https://arxiv.org/abs/2207.10284) by Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, Vikas Singh. -1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel. -1. **[MusicGen](https://huggingface.co/docs/transformers/model_doc/musicgen)** (from Meta) released with the paper [Simple and Controllable Music Generation](https://arxiv.org/abs/2306.05284) by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Défossez. -1. **[MVP](https://huggingface.co/docs/transformers/model_doc/mvp)** (from RUC AI Box) released with the paper [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https://arxiv.org/abs/2206.12131) by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen. -1. **[NAT](https://huggingface.co/docs/transformers/model_doc/nat)** (from SHI Labs) released with the paper [Neighborhood Attention Transformer](https://arxiv.org/abs/2204.07143) by Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi. -1. **[Nezha](https://huggingface.co/docs/transformers/model_doc/nezha)** (from Huawei Noah's Ark Lab) released with the paper [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) by Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu. -1. **[NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team. -1. **[NLLB-MOE](https://huggingface.co/docs/transformers/model_doc/nllb-moe)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team. -1. **[Nougat](https://huggingface.co/docs/transformers/model_doc/nougat)** (from Meta AI) released with the paper [Nougat: Neural Optical Understanding for Academic Documents](https://arxiv.org/abs/2308.13418) by Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic. -1. **[Nyströmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh. -1. **[OneFormer](https://huggingface.co/docs/transformers/model_doc/oneformer)** (from SHI Labs) released with the paper [OneFormer: One Transformer to Rule Universal Image Segmentation](https://arxiv.org/abs/2211.06220) by Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi. -1. **[OpenLlama](https://huggingface.co/docs/transformers/model_doc/open-llama)** (from [s-JoL](https://huggingface.co/s-JoL)) released on GitHub (now removed). -1. 
**[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al. -1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby. -1. **[OWLv2](https://huggingface.co/docs/transformers/main/model_doc/owlv2)** (from Google AI) released with the paper [Scaling Open-Vocabulary Object Detection](https://arxiv.org/abs/2306.09683) by Matthias Minderer, Alexey Gritsenko, Neil Houlsby. -1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu. -1. **[PEGASUS-X](https://huggingface.co/docs/transformers/model_doc/pegasus_x)** (from Google) released with the paper [Investigating Efficiently Extending Transformers for Long Input Summarization](https://arxiv.org/abs/2208.04347) by Jason Phang, Yao Zhao, and Peter J. Liu. -1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira. -1. **[Persimmon](https://huggingface.co/docs/transformers/model_doc/persimmon)** (from ADEPT) released in a [blog post](https://www.adept.ai/blog/persimmon-8b) by Erich Elsen, Augustus Odena, Maxwell Nye, Sağnak Taşırlar, Tri Dao, Curtis Hawthorne, Deepak Moparthi, Arushi Somani. -1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen. -1. **[Pix2Struct](https://huggingface.co/docs/transformers/model_doc/pix2struct)** (from Google) released with the paper [Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding](https://arxiv.org/abs/2210.03347) by Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova. -1. **[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang. -1. 
**[PoolFormer](https://huggingface.co/docs/transformers/model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng. -1. **[Pop2Piano](https://huggingface.co/docs/transformers/model_doc/pop2piano)** released with the paper [Pop2Piano: Pop Audio-based Piano Cover Generation](https://arxiv.org/abs/2211.00895) by Jongho Choi and Kyogu Lee. -1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou. -1. **[PVT](https://huggingface.co/docs/transformers/model_doc/pvt)** (from Nanjing University, The University of Hong Kong etc.) released with the paper [Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions](https://arxiv.org/pdf/2102.12122.pdf) by Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao. -1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius. -1. **[RAG](https://huggingface.co/docs/transformers/model_doc/rag)** (from Facebook) released with the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela. -1. **[REALM](https://huggingface.co/docs/transformers/model_doc/realm.html)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang. -1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya. -1. **[RegNet](https://huggingface.co/docs/transformers/model_doc/regnet)** (from META Platforms) released with the paper [Designing Network Design Spaces](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár. -1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/abs/2010.12821) by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder. -1. **[ResNet](https://huggingface.co/docs/transformers/model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. -1. 
**[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. -1. **[RoBERTa-PreLayerNorm](https://huggingface.co/docs/transformers/model_doc/roberta-prelayernorm)** (from Facebook) released with the paper [fairseq: A Fast, Extensible Toolkit for Sequence Modeling](https://arxiv.org/abs/1904.01038) by Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli. -1. **[RoCBert](https://huggingface.co/docs/transformers/model_doc/roc_bert)** (from WeChatAI) released with the paper [RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining](https://aclanthology.org/2022.acl-long.65.pdf) by Hui Su, Weiwei Shi, Xiaoyu Shen, Xiao Zhou, Tuo Ji, Jiarui Fang, Jie Zhou. -1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu. -1. **[RWKV](https://huggingface.co/docs/transformers/model_doc/rwkv)** (from Bo Peng), released on [this repo](https://github.com/BlinkDL/RWKV-LM) by Bo Peng. -1. **[SeamlessM4T](https://huggingface.co/docs/transformers/main/model_doc/seamless_m4t)** (from Meta AI) released with the paper [SeamlessM4T — Massively Multilingual & Multimodal Machine Translation](https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf) by the Seamless Communication team. -1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo. -1. **[Segment Anything](https://huggingface.co/docs/transformers/model_doc/sam)** (from Meta AI) released with the paper [Segment Anything](https://arxiv.org/pdf/2304.02643v1.pdf) by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick. -1. **[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi. -1. **[SEW-D](https://huggingface.co/docs/transformers/model_doc/sew_d)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi. -1. **[SpeechT5](https://huggingface.co/docs/transformers/model_doc/speecht5)** (from Microsoft Research) released with the paper [SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing](https://arxiv.org/abs/2110.07205) by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei. -1. 
**[SpeechToTextTransformer](https://huggingface.co/docs/transformers/model_doc/speech_to_text)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino. -1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (from Facebook), released together with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau. -1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (from Tel Aviv University), released together with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy. -1. **[SqueezeBERT](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer. -1. **[SwiftFormer](https://huggingface.co/docs/transformers/model_doc/swiftformer)** (from MBZUAI) released with the paper [SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications](https://arxiv.org/abs/2303.15446) by Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan. -1. **[Swin Transformer](https://huggingface.co/docs/transformers/model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo. -1. **[Swin Transformer V2](https://huggingface.co/docs/transformers/model_doc/swinv2)** (from Microsoft) released with the paper [Swin Transformer V2: Scaling Up Capacity and Resolution](https://arxiv.org/abs/2111.09883) by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo. -1. **[Swin2SR](https://huggingface.co/docs/transformers/model_doc/swin2sr)** (from University of Würzburg) released with the paper [Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration](https://arxiv.org/abs/2209.11345) by Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte. -1. **[SwitchTransformers](https://huggingface.co/docs/transformers/model_doc/switch_transformers)** (from Google) released with the paper [Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity](https://arxiv.org/abs/2101.03961) by William Fedus, Barret Zoph, Noam Shazeer. -1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu. -1. 
**[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu. -1. **[Table Transformer](https://huggingface.co/docs/transformers/model_doc/table-transformer)** (from Microsoft Research) released with the paper [PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents](https://arxiv.org/abs/2110.00061) by Brandon Smock, Rohith Pesala, Robin Abraham. -1. **[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos. -1. **[TAPEX](https://huggingface.co/docs/transformers/model_doc/tapex)** (from Microsoft Research) released with the paper [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. -1. **[Time Series Transformer](https://huggingface.co/docs/transformers/model_doc/time_series_transformer)** (from HuggingFace). -1. **[TimeSformer](https://huggingface.co/docs/transformers/model_doc/timesformer)** (from Facebook) released with the paper [Is Space-Time Attention All You Need for Video Understanding?](https://arxiv.org/abs/2102.05095) by Gedas Bertasius, Heng Wang, Lorenzo Torresani. -1. **[Trajectory Transformer](https://huggingface.co/docs/transformers/model_doc/trajectory_transformers)** (from the University of California at Berkeley) released with the paper [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https://arxiv.org/abs/2106.02039) by Michael Janner, Qiyang Li, Sergey Levine -1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov. -1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei. -1. **[TVLT](https://huggingface.co/docs/transformers/model_doc/tvlt)** (from UNC Chapel Hill) released with the paper [TVLT: Textless Vision-Language Transformer](https://arxiv.org/abs/2209.14156) by Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal. -1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler -1. 
**[UMT5](https://huggingface.co/docs/transformers/model_doc/umt5)** (from Google Research) released with the paper [UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining](https://openreview.net/forum?id=kXwdL1cWOAi) by Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant. -1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang. -1. **[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu. -1. **[UPerNet](https://huggingface.co/docs/transformers/model_doc/upernet)** (from Peking University) released with the paper [Unified Perceptual Parsing for Scene Understanding](https://arxiv.org/abs/1807.10221) by Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun. -1. **[VAN](https://huggingface.co/docs/transformers/model_doc/van)** (from Tsinghua University and Nankai University) released with the paper [Visual Attention Network](https://arxiv.org/abs/2202.09741) by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu. -1. **[VideoMAE](https://huggingface.co/docs/transformers/model_doc/videomae)** (from Multimedia Computing Group, Nanjing University) released with the paper [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) by Zhan Tong, Yibing Song, Jue Wang, Limin Wang. -1. **[ViLT](https://huggingface.co/docs/transformers/model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim. -1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby. -1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang. -1. **[ViT Hybrid](https://huggingface.co/docs/transformers/model_doc/vit_hybrid)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby. -1. 
**[VitDet](https://huggingface.co/docs/transformers/model_doc/vitdet)** (from Meta AI) released with the paper [Exploring Plain Vision Transformer Backbones for Object Detection](https://arxiv.org/abs/2203.16527) by Yanghao Li, Hanzi Mao, Ross Girshick, Kaiming He. -1. **[ViTMAE](https://huggingface.co/docs/transformers/model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick. -1. **[ViTMatte](https://huggingface.co/docs/transformers/model_doc/vitmatte)** (from HUST-VL) released with the paper [ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers](https://arxiv.org/abs/2305.15272) by Jingfeng Yao, Xinggang Wang, Shusheng Yang, Baoyuan Wang. -1. **[ViTMSN](https://huggingface.co/docs/transformers/model_doc/vit_msn)** (from Meta AI) released with the paper [Masked Siamese Networks for Label-Efficient Learning](https://arxiv.org/abs/2204.07141) by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas. -1. **[VITS](https://huggingface.co/docs/transformers/model_doc/vits)** (from Kakao Enterprise) released with the paper [Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech](https://arxiv.org/abs/2106.06103) by Jaehyeon Kim, Jungil Kong, Juhee Son. -1. **[ViViT](https://huggingface.co/docs/transformers/model_doc/vivit)** (from Google Research) released with the paper [ViViT: A Video Vision Transformer](https://arxiv.org/abs/2103.15691) by Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, Cordelia Schmid. -1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli. -1. **[Wav2Vec2-Conformer](https://huggingface.co/docs/transformers/model_doc/wav2vec2-conformer)** (from Facebook AI) released with the paper [FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino. -1. **[Wav2Vec2Phoneme](https://huggingface.co/docs/transformers/model_doc/wav2vec2_phoneme)** (from Facebook AI) released with the paper [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https://arxiv.org/abs/2109.11680) by Qiantong Xu, Alexei Baevski, Michael Auli. -1. **[WavLM](https://huggingface.co/docs/transformers/model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei. -1. **[Whisper](https://huggingface.co/docs/transformers/model_doc/whisper)** (from OpenAI) released with the paper [Robust Speech Recognition via Large-Scale Weak Supervision](https://cdn.openai.com/papers/whisper.pdf) by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever. -1. 
**[X-CLIP](https://huggingface.co/docs/transformers/model_doc/xclip)** (from Microsoft Research) released with the paper [Expanding Language-Image Pretrained Models for General Video Recognition](https://arxiv.org/abs/2208.02816) by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling. -1. **[X-MOD](https://huggingface.co/docs/transformers/model_doc/xmod)** (from Meta AI) released with the paper [Lifting the Curse of Multilinguality by Pre-training Modular Transformers](http://dx.doi.org/10.18653/v1/2022.naacl-main.255) by Jonas Pfeiffer, Naman Goyal, Xi Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe. -1. **[XGLM](https://huggingface.co/docs/transformers/model_doc/xglm)** (From Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li. -1. **[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau. -1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou. -1. **[XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlm-roberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. -1. **[XLM-RoBERTa-XL](https://huggingface.co/docs/transformers/model_doc/xlm-roberta-xl)** (from Facebook AI), released together with the paper [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau. -1. **[XLM-V](https://huggingface.co/docs/transformers/model_doc/xlm-v)** (from Meta AI) released with the paper [XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models](https://arxiv.org/abs/2301.10472) by Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa. -1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le. -1. 
**[XLS-R](https://huggingface.co/docs/transformers/model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli. -1. **[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli. -1. **[YOLOS](https://huggingface.co/docs/transformers/model_doc/yolos)** (from Huazhong University of Science & Technology) released with the paper [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu. -1. **[YOSO](https://huggingface.co/docs/transformers/model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh. -1. Want to contribute a new model? We have added a **detailed guide and templates** to guide you through the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedback before starting your PR.

To check whether each model has an implementation in Flax, PyTorch or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to [this table](https://huggingface.co/docs/transformers/index#supported-frameworks).

These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations. You can find more details on performance in the Examples section of the [documentation](https://github.com/huggingface/transformers/tree/main/examples).

## Learn more

| Section | Description |
|-|-|
| [Documentation](https://huggingface.co/docs/transformers/) | Full API documentation and tutorials |
| [Task summary](https://huggingface.co/docs/transformers/task_summary) | Tasks supported by 🤗 Transformers |
| [Preprocessing tutorial](https://huggingface.co/docs/transformers/preprocessing) | Using the `Tokenizer` class to prepare data for the models |
| [Training and fine-tuning](https://huggingface.co/docs/transformers/training) | Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop and with the `Trainer` API |
| [Quick tour: fine-tuning/usage scripts](https://github.com/huggingface/transformers/tree/main/examples) | Example scripts for fine-tuning models on a wide range of tasks |
| [Model sharing and uploading](https://huggingface.co/docs/transformers/model_sharing) | Upload and share your fine-tuned models with the community |

## Citation

We now have a [paper](https://www.aclweb.org/anthology/2020.emnlp-demos.6/) you can cite for the 🤗 Transformers library:

```bibtex
@inproceedings{wolf-etal-2020-transformers,
    title = "Transformers: State-of-the-Art Natural Language Processing",
    author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M.
Rush", - booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations", - month = oct, - year = "2020", - address = "Online", - publisher = "Association for Computational Linguistics", - url = "https://www.aclweb.org/anthology/2020.emnlp-demos.6", - pages = "38--45" -} -``` diff --git a/README_zh-hans.md b/README_zh-hans.md deleted file mode 100644 index 4f3258ecde1860..00000000000000 --- a/README_zh-hans.md +++ /dev/null @@ -1,518 +0,0 @@ - - - - -

Build | GitHub | Documentation | GitHub release | Contributor Covenant | DOI

English | 简体中文 | 繁體中文 | 한국어 | Español | 日本語 | हिन्दी | తెలుగు

State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow

🤗 Transformers provides thousands of pretrained models for text classification, information extraction, question answering, summarization, translation and text generation in more than 100 languages. Its aim is to make state-of-the-art NLP easy for everyone to use.

🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets and then share them with the community through the [model hub](https://huggingface.co/models). At the same time, every Python module defining an architecture is fully standalone, which makes it easy to modify and to run quick research experiments.

🤗 Transformers is backed by the three most popular deep learning libraries, [Jax](https://jax.readthedocs.io/en/latest/), [PyTorch](https://pytorch.org/) and [TensorFlow](https://www.tensorflow.org/), with seamless integration between them. You can train your model with one framework and then load it for inference with another.

## Online demos

You can test most of the models on the [model hub](https://huggingface.co/models) directly on their model pages. We also offer [private model hosting, model versioning and an inference API](https://huggingface.co/pricing).

Here are a few examples:
- [Masked word completion with BERT](https://huggingface.co/bert-base-uncased?text=Paris+is+the+%5BMASK%5D+of+France)
- [Named entity recognition with Electra](https://huggingface.co/dbmdz/electra-large-discriminator-finetuned-conll03-english?text=My+name+is+Sarah+and+I+live+in+London+city)
- [Text generation with GPT-2](https://huggingface.co/gpt2?text=A+long+time+ago%2C+)
- [Natural language inference with RoBERTa](https://huggingface.co/roberta-large-mnli?text=The+dog+was+lost.+Nobody+lost+any+animal)
- [Summarization with BART](https://huggingface.co/facebook/bart-large-cnn?text=The+tower+is+324+metres+%281%2C063+ft%29+tall%2C+about+the+same+height+as+an+81-storey+building%2C+and+the+tallest+structure+in+Paris.+Its+base+is+square%2C+measuring+125+metres+%28410+ft%29+on+each+side.+During+its+construction%2C+the+Eiffel+Tower+surpassed+the+Washington+Monument+to+become+the+tallest+man-made+structure+in+the+world%2C+a+title+it+held+for+41+years+until+the+Chrysler+Building+in+New+York+City+was+finished+in+1930.+It+was+the+first+structure+to+reach+a+height+of+300+metres.+Due+to+the+addition+of+a+broadcasting+aerial+at+the+top+of+the+tower+in+1957%2C+it+is+now+taller+than+the+Chrysler+Building+by+5.2+metres+%2817+ft%29.+Excluding+transmitters%2C+the+Eiffel+Tower+is+the+second+tallest+free-standing+structure+in+France+after+the+Millau+Viaduct)
- [Question answering with DistilBERT](https://huggingface.co/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species)
- [Translation with T5](https://huggingface.co/t5-base?text=My+name+is+Wolfgang+and+I+live+in+Berlin)

**[Write With Transformer](https://transformer.huggingface.co)**, built by the Hugging Face team, is the official demo of text generation with this repository.

## If you are looking for custom support from the Hugging Face team

HuggingFace Expert Acceleration Program

## Quick tour

We provide the `pipeline` API for quickly using a model. A pipeline bundles a pretrained model with the corresponding text preprocessing. Here is a quick example of using a pipeline to classify positive versus negative sentiment:

```python
>>> from transformers import pipeline

# Use a sentiment analysis pipeline
>>> classifier = pipeline('sentiment-analysis')
>>> classifier('We are very happy to introduce pipeline to the transformers repository.')
[{'label': 'POSITIVE', 'score': 0.9996980428695679}]
```

The second line of code downloads and caches the pretrained model used by the pipeline, while the third evaluates it on the given text. Here the answer is "positive" with a confidence of 99.97%.

Many NLP tasks have a pretrained pipeline ready to go. For example, we can easily extract the answer to a question from a given text:

```python
>>> from transformers import pipeline

# Use a question answering pipeline
>>> question_answerer = pipeline('question-answering')
>>> question_answerer({
...     'question': 'What is the name of the repository ?',
...     'context': 'Pipeline has been included in the huggingface/transformers repository'
... })
{'score': 0.30970096588134766, 'start': 34, 'end': 58, 'answer': 'huggingface/transformers'}

```

In addition to the answer, the pretrained model also returns the corresponding confidence score and the start and end positions of the answer in the tokenized text. You can learn more about the tasks supported by the pipeline API in [this tutorial](https://huggingface.co/docs/transformers/task_summary).

Downloading and using any pretrained model for your task is also just three lines of code. Here is the PyTorch version:
```python
>>> from transformers import AutoTokenizer, AutoModel

>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = AutoModel.from_pretrained("bert-base-uncased")

>>> inputs = tokenizer("Hello world!", return_tensors="pt")
>>> outputs = model(**inputs)
```
And here is the equivalent code for TensorFlow:
```python
>>> from transformers import AutoTokenizer, TFAutoModel

>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = TFAutoModel.from_pretrained("bert-base-uncased")

>>> inputs = tokenizer("Hello world!", return_tensors="tf")
>>> outputs = model(**inputs)
```

The tokenizer handles the preprocessing for every pretrained model and can be called directly on a single string (as in the examples above) or on a list. It outputs a dictionary that you can use in downstream code, or pass directly to your model with the `**` argument-unpacking operator.

The model itself is a regular [PyTorch `nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) or a [TensorFlow `tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) (depending on your backend) which you can use as usual. [This tutorial](https://huggingface.co/transformers/training.html) explains how to integrate such a model into a classic PyTorch or TensorFlow training loop, or how to use our `Trainer` API to quickly fine-tune it on a new dataset.
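For example, here is a minimal, hypothetical `Trainer` sketch for that second path. It is not part of the original quick tour: the tiny inline dataset, the `test-trainer` output directory and the hyper-parameters are illustrative stand-ins, and a real project would typically build the training set with the 🤗 Datasets library instead:

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Toy training data, only to show the expected format: each example is a dict of
# tokenized inputs plus a "labels" entry. Real projects would use 🤗 Datasets instead.
texts = ["I love this.", "I hate this."]
labels = [1, 0]
encodings = tokenizer(texts, truncation=True, padding=True)
train_dataset = [
    {**{key: values[i] for key, values in encodings.items()}, "labels": labels[i]}
    for i in range(len(texts))
]

training_args = TrainingArguments(
    output_dir="test-trainer",       # placeholder directory for checkpoints and logs
    num_train_epochs=1,              # illustrative values only
    per_device_train_batch_size=2,
)
trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()
```

Passing an `eval_dataset` in the same format, together with the corresponding `TrainingArguments` options, lets the same `Trainer` handle evaluation and checkpointing as well.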
## Why should I use transformers?

1. Easy-to-use state-of-the-art models:
    - High performance on NLU and NLG tasks.
    - Low barrier to entry for educators and practitioners.
    - High-level abstractions: only three classes to learn.
    - A unified API for all of our models.

1. Lower compute costs, smaller carbon footprint:
    - Researchers can share trained models instead of always retraining from scratch.
    - Engineers can reduce compute time and production costs.
    - Dozens of model architectures, more than 2,000 pretrained models, support for over 100 languages.

1. The right framework for every part of a model's lifetime:
    - Train state-of-the-art models in 3 lines of code.
    - Move a model between deep learning frameworks at will.
    - Seamlessly pick the most suitable framework for training, evaluation and production.

1. Easily customize a model or an example to your needs:
    - We provide multiple examples for every architecture to reproduce the results of its original paper.
    - Model internals remain transparent and consistent.
    - Model files can be used independently, making quick hacks and experiments easy.

## When should I not use transformers?

- This library is not a modular toolbox of building blocks for neural networks. The code in the model files is deliberately kept plain, without extra abstraction layers, so that researchers can quickly iterate on and modify the models without drowning in abstractions and file hopping.
- The `Trainer` API is not compatible with arbitrary models; it is optimized for the models of this library. If you are looking for a training loop implementation for generic machine learning, please look for another library.
- Although we do our best, the scripts in the [examples directory](https://github.com/huggingface/transformers/tree/main/examples) are just that: examples. They will not necessarily work out of the box for your specific problem, and you may need to change a few lines of code to adapt them.

## Installation

### With pip

This repository has been tested on Python 3.8+, Flax 0.4.1+, PyTorch 1.10+ and TensorFlow 2.6+.

You can install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you are not yet familiar with Python virtual environments, please read this [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).

First, create a virtual environment with the version of Python you intend to use, and activate it.

Then you need to install one of Flax, PyTorch or TensorFlow. For instructions on installing these frameworks on your platform, please refer to the [TensorFlow installation page](https://www.tensorflow.org/install/), the [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) or the [Flax installation page](https://github.com/google/flax#quick-install).

Once one of these backends has been installed, 🤗 Transformers can be installed as follows:

```bash
pip install transformers
```

If you want to try the examples, or want to use the latest development code before the official release, you have to [install from source](https://huggingface.co/docs/transformers/installation#installing-from-source).

### With conda

Since Transformers version 4.0.0, we have a conda channel: `huggingface`.

🤗 Transformers can be installed with conda as follows:

```shell script
conda install -c huggingface transformers
```

To install one of Flax, PyTorch or TensorFlow with conda, please refer to the instructions on their respective installation pages.

## Model architectures

[**All the model checkpoints**](https://huggingface.co/models) supported by 🤗 Transformers are uploaded by [users](https://huggingface.co/users) and [organizations](https://huggingface.co/organizations), all seamlessly integrated with the huggingface.co [model hub](https://huggingface.co).

Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen)

🤗 Transformers currently provides the following architectures (see [here](https://huggingface.co/docs/transformers/model_summary) for a high-level summary of each of them):

1. 
**[ALBERT](https://huggingface.co/docs/transformers/model_doc/albert)** (ๆฅ่‡ช Google Research and the Toyota Technological Institute at Chicago) ไผด้š่ฎบๆ–‡ [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), ็”ฑ Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut ๅ‘ๅธƒใ€‚ -1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (ๆฅ่‡ช Google Research) ไผด้š่ฎบๆ–‡ [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918) ็”ฑ Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig ๅ‘ๅธƒใ€‚ -1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (ๆฅ่‡ช BAAI) ไผด้š่ฎบๆ–‡ [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) ็”ฑ Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell ๅ‘ๅธƒใ€‚ -1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (ๆฅ่‡ช MIT) ไผด้š่ฎบๆ–‡ [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) ็”ฑ Yuan Gong, Yu-An Chung, James Glass ๅ‘ๅธƒใ€‚ -1. **[Autoformer](https://huggingface.co/docs/transformers/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long. -1. **[Bark](https://huggingface.co/docs/transformers/model_doc/bark)** (from Suno) released in the repository [suno-ai/bark](https://github.com/suno-ai/bark) by Suno AI team. -1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (ๆฅ่‡ช Facebook) ไผด้š่ฎบๆ–‡ [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) ็”ฑ Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer ๅ‘ๅธƒใ€‚ -1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (ๆฅ่‡ช ร‰cole polytechnique) ไผด้š่ฎบๆ–‡ [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) ็”ฑ Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis ๅ‘ๅธƒใ€‚ -1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (ๆฅ่‡ช VinAI Research) ไผด้š่ฎบๆ–‡ [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) ็”ฑ Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen ๅ‘ๅธƒใ€‚ -1. **[BEiT](https://huggingface.co/docs/transformers/model_doc/beit)** (ๆฅ่‡ช Microsoft) ไผด้š่ฎบๆ–‡ [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) ็”ฑ Hangbo Bao, Li Dong, Furu Wei ๅ‘ๅธƒใ€‚ -1. **[BERT](https://huggingface.co/docs/transformers/model_doc/bert)** (ๆฅ่‡ช Google) ไผด้š่ฎบๆ–‡ [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) ็”ฑ Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova ๅ‘ๅธƒใ€‚ -1. 
**[BERT For Sequence Generation](https://huggingface.co/docs/transformers/model_doc/bert-generation)** (ๆฅ่‡ช Google) ไผด้š่ฎบๆ–‡ [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) ็”ฑ Sascha Rothe, Shashi Narayan, Aliaksei Severyn ๅ‘ๅธƒใ€‚ -1. **[BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet)** (ๆฅ่‡ช VinAI Research) ไผด้š่ฎบๆ–‡ [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) ็”ฑ Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen ๅ‘ๅธƒใ€‚ -1. **[BigBird-Pegasus](https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus)** (ๆฅ่‡ช Google Research) ไผด้š่ฎบๆ–‡ [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) ็”ฑ Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed ๅ‘ๅธƒใ€‚ -1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (ๆฅ่‡ช Google Research) ไผด้š่ฎบๆ–‡ [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) ็”ฑ Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed ๅ‘ๅธƒใ€‚ -1. **[BioGpt](https://huggingface.co/docs/transformers/model_doc/biogpt)** (ๆฅ่‡ช Microsoft Research AI4Science) ไผด้š่ฎบๆ–‡ [BioGPT: generative pre-trained transformer for biomedical text generation and mining](https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbac409/6713511?guestAccessKey=a66d9b5d-4f83-4017-bb52-405815c907b9) ็”ฑ Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu ๅ‘ๅธƒใ€‚ -1. **[BiT](https://huggingface.co/docs/transformers/model_doc/bit)** (ๆฅ่‡ช Google AI) ไผด้š่ฎบๆ–‡ [Big Transfer (BiT) ็”ฑ Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby ๅ‘ๅธƒใ€‚ -1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (ๆฅ่‡ช Facebook) ไผด้š่ฎบๆ–‡ [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) ็”ฑ Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston ๅ‘ๅธƒใ€‚ -1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (ๆฅ่‡ช Facebook) ไผด้š่ฎบๆ–‡ [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) ็”ฑ Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston ๅ‘ๅธƒใ€‚ -1. **[BLIP](https://huggingface.co/docs/transformers/model_doc/blip)** (ๆฅ่‡ช Salesforce) ไผด้š่ฎบๆ–‡ [BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation](https://arxiv.org/abs/2201.12086) ็”ฑ Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi ๅ‘ๅธƒใ€‚ -1. **[BLIP-2](https://huggingface.co/docs/transformers/model_doc/blip-2)** (ๆฅ่‡ช Salesforce) ไผด้š่ฎบๆ–‡ [BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models](https://arxiv.org/abs/2301.12597) ็”ฑ Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi ๅ‘ๅธƒใ€‚ -1. 
**[BLOOM](https://huggingface.co/docs/transformers/model_doc/bloom)** (from BigScience workshop) released by the [BigScience Workshop](https://bigscience.huggingface.co/). -1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (ๆฅ่‡ช Alexa) ไผด้š่ฎบๆ–‡ [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) ็”ฑ Adrian de Wynter and Daniel J. Perry ๅ‘ๅธƒใ€‚ -1. **[BridgeTower](https://huggingface.co/docs/transformers/model_doc/bridgetower)** (from Harbin Institute of Technology/Microsoft Research Asia/Intel Labs) released with the paper [BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning](https://arxiv.org/abs/2206.08657) by Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan. -1. **[BROS](https://huggingface.co/docs/transformers/model_doc/bros)** (ๆฅ่‡ช NAVER CLOVA) ไผด้š่ฎบๆ–‡ [BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents](https://arxiv.org/abs/2108.04539) ็”ฑ Teakgyu Hong, Donghyun Kim, Mingi Ji, Wonseok Hwang, Daehyun Nam, Sungrae Park ๅ‘ๅธƒใ€‚ -1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (ๆฅ่‡ช Google Research) ไผด้š่ฎบๆ–‡ [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) ็”ฑ Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel ๅ‘ๅธƒใ€‚ -1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (ๆฅ่‡ช Inria/Facebook/Sorbonne) ไผด้š่ฎบๆ–‡ [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) ็”ฑ Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suรกrez*, Yoann Dupont, Laurent Romary, ร‰ric Villemonte de la Clergerie, Djamรฉ Seddah and Benoรฎt Sagot ๅ‘ๅธƒใ€‚ -1. **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (ๆฅ่‡ช Google Research) ไผด้š่ฎบๆ–‡ [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) ็”ฑ Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting ๅ‘ๅธƒใ€‚ -1. **[Chinese-CLIP](https://huggingface.co/docs/transformers/model_doc/chinese_clip)** (ๆฅ่‡ช OFA-Sys) ไผด้š่ฎบๆ–‡ [Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese](https://arxiv.org/abs/2211.01335) ็”ฑ An Yang, Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou, Chang Zhou ๅ‘ๅธƒใ€‚ -1. **[CLAP](https://huggingface.co/docs/transformers/model_doc/clap)** (ๆฅ่‡ช LAION-AI) ไผด้š่ฎบๆ–‡ [Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation](https://arxiv.org/abs/2211.06687) ็”ฑ Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov ๅ‘ๅธƒใ€‚ -1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (ๆฅ่‡ช OpenAI) ไผด้š่ฎบๆ–‡ [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) ็”ฑ Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever ๅ‘ๅธƒใ€‚ -1. **[CLIPSeg](https://huggingface.co/docs/transformers/model_doc/clipseg)** (ๆฅ่‡ช University of Gรถttingen) ไผด้š่ฎบๆ–‡ [Image Segmentation Using Text and Image Prompts](https://arxiv.org/abs/2112.10003) ็”ฑ Timo Lรผddecke and Alexander Ecker ๅ‘ๅธƒใ€‚ -1. 
**[CLVP](https://huggingface.co/docs/transformers/model_doc/clvp)** released with the paper [Better speech synthesis through scaling](https://arxiv.org/abs/2305.07243) by James Betker. -1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (ๆฅ่‡ช Salesforce) ไผด้š่ฎบๆ–‡ [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) ็”ฑ Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong ๅ‘ๅธƒใ€‚ -1. **[CodeLlama](https://huggingface.co/docs/transformers/model_doc/llama_code)** (ๆฅ่‡ช MetaAI) ไผด้š่ฎบๆ–‡ [Code Llama: Open Foundation Models for Code](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/) ็”ฑ Baptiste Roziรจre, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jรฉrรฉmy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Dรฉfossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve ๅ‘ๅธƒใ€‚ -1. **[Conditional DETR](https://huggingface.co/docs/transformers/model_doc/conditional_detr)** (ๆฅ่‡ช Microsoft Research Asia) ไผด้š่ฎบๆ–‡ [Conditional DETR for Fast Training Convergence](https://arxiv.org/abs/2108.06152) ็”ฑ Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang ๅ‘ๅธƒใ€‚ -1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (ๆฅ่‡ช YituTech) ไผด้š่ฎบๆ–‡ [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) ็”ฑ Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan ๅ‘ๅธƒใ€‚ -1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (ๆฅ่‡ช Facebook AI) ไผด้š่ฎบๆ–‡ [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) ็”ฑ Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie ๅ‘ๅธƒใ€‚ -1. **[ConvNeXTV2](https://huggingface.co/docs/transformers/model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie. -1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (ๆฅ่‡ช Tsinghua University) ไผด้š่ฎบๆ–‡ [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) ็”ฑ Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun ๅ‘ๅธƒใ€‚ -1. **[CPM-Ant](https://huggingface.co/docs/transformers/model_doc/cpmant)** (from OpenBMB) released by the [OpenBMB](https://www.openbmb.org/). -1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (ๆฅ่‡ช Salesforce) ไผด้š่ฎบๆ–‡ [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) ็”ฑ Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher ๅ‘ๅธƒใ€‚ -1. 
**[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (ๆฅ่‡ช Microsoft) ไผด้š่ฎบๆ–‡ [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) ็”ฑ Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang ๅ‘ๅธƒใ€‚ -1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (ๆฅ่‡ช Facebook) ไผด้š่ฎบๆ–‡ [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) ็”ฑ Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli ๅ‘ๅธƒใ€‚ -1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (ๆฅ่‡ช Microsoft) ไผด้š่ฎบๆ–‡ [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) ็”ฑ Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen ๅ‘ๅธƒใ€‚ -1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (ๆฅ่‡ช Microsoft) ไผด้š่ฎบๆ–‡ [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) ็”ฑ Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen ๅ‘ๅธƒใ€‚ -1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (ๆฅ่‡ช Berkeley/Facebook/Google) ไผด้š่ฎบๆ–‡ [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) ็”ฑ Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch ๅ‘ๅธƒใ€‚ -1. **[Deformable DETR](https://huggingface.co/docs/transformers/model_doc/deformable_detr)** (ๆฅ่‡ช SenseTime Research) ไผด้š่ฎบๆ–‡ [Deformable DETR: Deformable Transformers for End-to-End Object Detection](https://arxiv.org/abs/2010.04159) ็”ฑ Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai ๅ‘ๅธƒใ€‚ -1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (ๆฅ่‡ช Facebook) ไผด้š่ฎบๆ–‡ [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) ็”ฑ Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervรฉ Jรฉgou ๅ‘ๅธƒใ€‚ -1. **[DePlot](https://huggingface.co/docs/transformers/model_doc/deplot)** (ๆฅ่‡ช Google AI) ไผด้š่ฎบๆ–‡ [DePlot: One-shot visual language reasoning by plot-to-table translation](https://arxiv.org/abs/2212.10505) ็”ฑ Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun ๅ‘ๅธƒใ€‚ -1. **[DETA](https://huggingface.co/docs/transformers/model_doc/deta)** (ๆฅ่‡ช The University of Texas at Austin) ไผด้š่ฎบๆ–‡ [NMS Strikes Back](https://arxiv.org/abs/2212.06137) ็”ฑ Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krรคhenbรผhl ๅ‘ๅธƒใ€‚ -1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (ๆฅ่‡ช Facebook) ไผด้š่ฎบๆ–‡ [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) ็”ฑ Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko ๅ‘ๅธƒใ€‚ -1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (ๆฅ่‡ช Microsoft Research) ไผด้š่ฎบๆ–‡ [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) ็”ฑ Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan ๅ‘ๅธƒใ€‚ -1. 
**[DiNAT](https://huggingface.co/docs/transformers/model_doc/dinat)** (ๆฅ่‡ช SHI Labs) ไผด้š่ฎบๆ–‡ [Dilated Neighborhood Attention Transformer](https://arxiv.org/abs/2209.15001) ็”ฑ Ali Hassani and Humphrey Shi ๅ‘ๅธƒใ€‚ -1. **[DINOv2](https://huggingface.co/docs/transformers/model_doc/dinov2)** (ๆฅ่‡ช Meta AI) ไผด้š่ฎบๆ–‡ [DINOv2: Learning Robust Visual Features without Supervision](https://arxiv.org/abs/2304.07193) ็”ฑ Maxime Oquab, Timothรฉe Darcet, Thรฉo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervรฉ Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski ๅ‘ๅธƒใ€‚ -1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (ๆฅ่‡ช HuggingFace), ไผด้š่ฎบๆ–‡ [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) ็”ฑ Victor Sanh, Lysandre Debut and Thomas Wolf ๅ‘ๅธƒใ€‚ ๅŒๆ ท็š„ๆ–นๆณ•ไนŸๅบ”็”จไบŽๅŽ‹็ผฉ GPT-2 ๅˆฐ [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/distillation), RoBERTa ๅˆฐ [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/distillation), Multilingual BERT ๅˆฐ [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/distillation) ๅ’Œๅพท่ฏญ็‰ˆ DistilBERTใ€‚ -1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (ๆฅ่‡ช Microsoft Research) ไผด้š่ฎบๆ–‡ [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) ็”ฑ Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei ๅ‘ๅธƒใ€‚ -1. **[Donut](https://huggingface.co/docs/transformers/model_doc/donut)** (ๆฅ่‡ช NAVER) ไผด้š่ฎบๆ–‡ [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) ็”ฑ Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park ๅ‘ๅธƒใ€‚ -1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (ๆฅ่‡ช Facebook) ไผด้š่ฎบๆ–‡ [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) ็”ฑ Vladimir Karpukhin, Barlas OฤŸuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih ๅ‘ๅธƒใ€‚ -1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (ๆฅ่‡ช Intel Labs) ไผด้š่ฎบๆ–‡ [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) ็”ฑ Renรฉ Ranftl, Alexey Bochkovskiy, Vladlen Koltun ๅ‘ๅธƒใ€‚ -1. **[EfficientFormer](https://huggingface.co/docs/transformers/model_doc/efficientformer)** (ๆฅ่‡ช Snap Research) ไผด้š่ฎบๆ–‡ [EfficientFormer: Vision Transformers at MobileNetSpeed](https://arxiv.org/abs/2206.01191) ็”ฑ Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren ๅ‘ๅธƒใ€‚ -1. **[EfficientNet](https://huggingface.co/docs/transformers/model_doc/efficientnet)** (from Google Brain) released with the paper [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946) by Mingxing Tan, Quoc V. Le. -1. 
**[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (ๆฅ่‡ช Google Research/Stanford University) ไผด้š่ฎบๆ–‡ [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) ็”ฑ Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning ๅ‘ๅธƒใ€‚ -1. **[EnCodec](https://huggingface.co/docs/transformers/model_doc/encodec)** (ๆฅ่‡ช Meta AI) ไผด้š่ฎบๆ–‡ [High Fidelity Neural Audio Compression](https://arxiv.org/abs/2210.13438) ็”ฑ Alexandre Dรฉfossez, Jade Copet, Gabriel Synnaeve, Yossi Adi ๅ‘ๅธƒใ€‚ -1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (ๆฅ่‡ช Google Research) ไผด้š่ฎบๆ–‡ [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) ็”ฑ Sascha Rothe, Shashi Narayan, Aliaksei Severyn ๅ‘ๅธƒใ€‚ -1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (ๆฅ่‡ช Baidu) ไผด้š่ฎบๆ–‡ [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu ๅ‘ๅธƒใ€‚ -1. **[ErnieM](https://huggingface.co/docs/transformers/model_doc/ernie_m)** (ๆฅ่‡ช Baidu) ไผด้š่ฎบๆ–‡ [ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora](https://arxiv.org/abs/2012.15674) ็”ฑ Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang ๅ‘ๅธƒใ€‚ -1. **[ESM](https://huggingface.co/docs/transformers/model_doc/esm)** (from Meta AI) are transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. **ESM-1v** was released with the paper [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648) by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. **ESM-2** was released with the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives. -1. **[Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)** (from Technology Innovation Institute) by Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme. -1. 
**[FLAN-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5)** (from Google AI) released in the repository [google-research/t5x](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints) by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei -1. **[FLAN-UL2](https://huggingface.co/docs/transformers/model_doc/flan-ul2)** (from Google AI) released in the repository [google-research/t5x](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-ul2-checkpoints) by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei -1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (ๆฅ่‡ช CNRS) ไผด้š่ฎบๆ–‡ [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) ็”ฑ Hang Le, Loรฏc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoรฎt Crabbรฉ, Laurent Besacier, Didier Schwab ๅ‘ๅธƒใ€‚ -1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (ๆฅ่‡ช Facebook AI) ไผด้š่ฎบๆ–‡ [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) ็”ฑ Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela ๅ‘ๅธƒใ€‚ -1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (ๆฅ่‡ช Google Research) ไผด้š่ฎบๆ–‡ [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) ็”ฑ James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon ๅ‘ๅธƒใ€‚ -1. **[FocalNet](https://huggingface.co/docs/transformers/model_doc/focalnet)** (ๆฅ่‡ช Microsoft Research) ไผด้š่ฎบๆ–‡ [Focal Modulation Networks](https://arxiv.org/abs/2203.11926) ็”ฑ Jianwei Yang, Chunyuan Li, Xiyang Dai, Lu Yuan, Jianfeng Gao ๅ‘ๅธƒใ€‚ -1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (ๆฅ่‡ช CMU/Google Brain) ไผด้š่ฎบๆ–‡ [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) ็”ฑ Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le ๅ‘ๅธƒใ€‚ -1. **[Fuyu](https://huggingface.co/docs/transformers/model_doc/fuyu)** (ๆฅ่‡ช ADEPT) ไผด้š่ฎบๆ–‡ [blog post](https://www.adept.ai/blog/fuyu-8b ็”ฑ Rohan Bavishi, Erich Elsen, Curtis Hawthorne, Maxwell Nye, Augustus Odena, Arushi Somani, SaฤŸnak TaลŸฤฑrlar ๅ‘ๅธƒใ€‚) -1. **[GIT](https://huggingface.co/docs/transformers/model_doc/git)** (ๆฅ่‡ช Microsoft Research) ไผด้š่ฎบๆ–‡ [GIT: A Generative Image-to-text Transformer for Vision and Language](https://arxiv.org/abs/2205.14100) ็”ฑ Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang ๅ‘ๅธƒใ€‚ -1. 
**[GLPN](https://huggingface.co/docs/transformers/model_doc/glpn)** (ๆฅ่‡ช KAIST) ไผด้š่ฎบๆ–‡ [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) ็”ฑ Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim ๅ‘ๅธƒใ€‚ -1. **[GPT](https://huggingface.co/docs/transformers/model_doc/openai-gpt)** (ๆฅ่‡ช OpenAI) ไผด้š่ฎบๆ–‡ [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) ็”ฑ Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever ๅ‘ๅธƒใ€‚ -1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (ๆฅ่‡ช EleutherAI) ้šไป“ๅบ“ [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) ๅ‘ๅธƒใ€‚ไฝœ่€…ไธบ Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy ๅ‘ๅธƒใ€‚ -1. **[GPT NeoX](https://huggingface.co/docs/transformers/model_doc/gpt_neox)** (from EleutherAI) released with the paper [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://arxiv.org/abs/2204.06745) by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach -1. **[GPT NeoX Japanese](https://huggingface.co/docs/transformers/model_doc/gpt_neox_japanese)** (ๆฅ่‡ช ABEJA) ็”ฑ Shinya Otani, Takayoshi Makabe, Anuj Arora, Kyo Hattoriใ€‚ -1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (ๆฅ่‡ช OpenAI) ไผด้š่ฎบๆ–‡ [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) ็”ฑ Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever** ๅ‘ๅธƒใ€‚ -1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (ๆฅ่‡ช EleutherAI) ไผด้š่ฎบๆ–‡ [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) ็”ฑ Ben Wang and Aran Komatsuzaki ๅ‘ๅธƒใ€‚ -1. **[GPT-Sw3](https://huggingface.co/docs/transformers/model_doc/gpt-sw3)** (from AI-Sweden) released with the paper [Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish](http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.376.pdf) by Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey ร–hman, Fredrik Carlsson, Magnus Sahlgren. -1. **[GPTBigCode](https://huggingface.co/docs/transformers/model_doc/gpt_bigcode)** (ๆฅ่‡ช BigCode) ไผด้š่ฎบๆ–‡ [SantaCoder: don't reach for the stars!](https://arxiv.org/abs/2301.03988) ็”ฑ Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo Garcรญa del Rรญo, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra ๅ‘ๅธƒใ€‚ -1. 
**[GPTSAN-japanese](https://huggingface.co/docs/transformers/model_doc/gptsan-japanese)** released in the repository [tanreinama/GPTSAN](https://github.com/tanreinama/GPTSAN/blob/main/report/model.md) by ๅ‚ๆœฌไฟŠไน‹(tanreinama). -1. **[Graphormer](https://huggingface.co/docs/transformers/model_doc/graphormer)** (from Microsoft) released with the paper [Do Transformers Really Perform Bad for Graph Representation?](https://arxiv.org/abs/2106.05234) by Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu. -1. **[GroupViT](https://huggingface.co/docs/transformers/model_doc/groupvit)** (ๆฅ่‡ช UCSD, NVIDIA) ไผด้š่ฎบๆ–‡ [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) ็”ฑ Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang ๅ‘ๅธƒใ€‚ -1. **[HerBERT](https://huggingface.co/docs/transformers/model_doc/herbert)** (ๆฅ่‡ช Allegro.pl, AGH University of Science and Technology) ไผด้š่ฎบๆ–‡ [KLEJ: Comprehensive Benchmark for Polish Language Understanding](https://www.aclweb.org/anthology/2020.acl-main.111.pdf) ็”ฑ Piotr Rybak, Robert Mroczkowski, Janusz Tracz, Ireneusz Gawlik ๅ‘ๅธƒใ€‚ -1. **[Hubert](https://huggingface.co/docs/transformers/model_doc/hubert)** (ๆฅ่‡ช Facebook) ไผด้š่ฎบๆ–‡ [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) ็”ฑ Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed ๅ‘ๅธƒใ€‚ -1. **[I-BERT](https://huggingface.co/docs/transformers/model_doc/ibert)** (ๆฅ่‡ช Berkeley) ไผด้š่ฎบๆ–‡ [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) ็”ฑ Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer ๅ‘ๅธƒใ€‚ -1. **[IDEFICS](https://huggingface.co/docs/transformers/model_doc/idefics)** (from HuggingFace) released with the paper [OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents](https://huggingface.co/papers/2306.16527) by Hugo Laurenรงon, Lucile Saulnier, Lรฉo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh. -1. **[ImageGPT](https://huggingface.co/docs/transformers/model_doc/imagegpt)** (ๆฅ่‡ช OpenAI) ไผด้š่ฎบๆ–‡ [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) ็”ฑ Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever ๅ‘ๅธƒใ€‚ -1. **[Informer](https://huggingface.co/docs/transformers/model_doc/informer)** (from Beihang University, UC Berkeley, Rutgers University, SEDD Company) released with the paper [Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting](https://arxiv.org/abs/2012.07436) by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. -1. **[InstructBLIP](https://huggingface.co/docs/transformers/model_doc/instructblip)** (ๆฅ่‡ช Salesforce) ไผด้š่ฎบๆ–‡ [InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning](https://arxiv.org/abs/2305.06500) ็”ฑ Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi ๅ‘ๅธƒใ€‚ -1. 
**[Jukebox](https://huggingface.co/docs/transformers/model_doc/jukebox)** (from OpenAI) released with the paper [Jukebox: A Generative Model for Music](https://arxiv.org/pdf/2005.00341.pdf) by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever. -1. **[KOSMOS-2](https://huggingface.co/docs/transformers/model_doc/kosmos-2)** (from Microsoft Research Asia) released with the paper [Kosmos-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/abs/2306.14824) by Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei. -1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (ๆฅ่‡ช Microsoft Research Asia) ไผด้š่ฎบๆ–‡ [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) ็”ฑ Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou ๅ‘ๅธƒใ€‚ -1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (ๆฅ่‡ช Microsoft Research Asia) ไผด้š่ฎบๆ–‡ [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) ็”ฑ Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou ๅ‘ๅธƒใ€‚ -1. **[LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)** (ๆฅ่‡ช Microsoft Research Asia) ไผด้š่ฎบๆ–‡ [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) ็”ฑ Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei ๅ‘ๅธƒใ€‚ -1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutxlm)** (ๆฅ่‡ช Microsoft Research Asia) ไผด้š่ฎบๆ–‡ [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) ็”ฑ Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei ๅ‘ๅธƒใ€‚ -1. **[LED](https://huggingface.co/docs/transformers/model_doc/led)** (ๆฅ่‡ช AllenAI) ไผด้š่ฎบๆ–‡ [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) ็”ฑ Iz Beltagy, Matthew E. Peters, Arman Cohan ๅ‘ๅธƒใ€‚ -1. **[LeViT](https://huggingface.co/docs/transformers/model_doc/levit)** (ๆฅ่‡ช Meta AI) ไผด้š่ฎบๆ–‡ [LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference](https://arxiv.org/abs/2104.01136) ็”ฑ Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervรฉ Jรฉgou, Matthijs Douze ๅ‘ๅธƒใ€‚ -1. **[LiLT](https://huggingface.co/docs/transformers/model_doc/lilt)** (ๆฅ่‡ช South China University of Technology) ไผด้š่ฎบๆ–‡ [LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding](https://arxiv.org/abs/2202.13669) ็”ฑ Jiapeng Wang, Lianwen Jin, Kai Ding ๅ‘ๅธƒใ€‚ -1. **[LLaMA](https://huggingface.co/docs/transformers/model_doc/llama)** (ๆฅ่‡ช The FAIR team of Meta AI) ไผด้š่ฎบๆ–‡ [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971) ็”ฑ Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothรฉe Lacroix, Baptiste Roziรจre, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample ๅ‘ๅธƒใ€‚ -1. 
**[Llama2](https://huggingface.co/docs/transformers/model_doc/llama2)** (ๆฅ่‡ช The FAIR team of Meta AI) ไผด้š่ฎบๆ–‡ [Llama2: Open Foundation and Fine-Tuned Chat Models](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/XXX) ็”ฑ Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom ๅ‘ๅธƒใ€‚ -1. **[LLaVa](https://huggingface.co/docs/transformers/model_doc/llava)** (ๆฅ่‡ช Microsoft Research & University of Wisconsin-Madison) ไผด้š่ฎบๆ–‡ [Visual Instruction Tuning](https://arxiv.org/abs/2304.08485) ็”ฑ Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee ๅ‘ๅธƒใ€‚ -1. **[Longformer](https://huggingface.co/docs/transformers/model_doc/longformer)** (ๆฅ่‡ช AllenAI) ไผด้š่ฎบๆ–‡ [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) ็”ฑ Iz Beltagy, Matthew E. Peters, Arman Cohan ๅ‘ๅธƒใ€‚ -1. **[LongT5](https://huggingface.co/docs/transformers/model_doc/longt5)** (ๆฅ่‡ช Google AI) ไผด้š่ฎบๆ–‡ [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916) ็”ฑ Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang ๅ‘ๅธƒใ€‚ -1. **[LUKE](https://huggingface.co/docs/transformers/model_doc/luke)** (ๆฅ่‡ช Studio Ousia) ไผด้š่ฎบๆ–‡ [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) ็”ฑ Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto ๅ‘ๅธƒใ€‚ -1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (ๆฅ่‡ช UNC Chapel Hill) ไผด้š่ฎบๆ–‡ [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) ็”ฑ Hao Tan and Mohit Bansal ๅ‘ๅธƒใ€‚ -1. **[M-CTC-T](https://huggingface.co/docs/transformers/model_doc/mctct)** (ๆฅ่‡ช Facebook) ไผด้š่ฎบๆ–‡ [Pseudo-Labeling For Massively Multilingual Speech Recognition](https://arxiv.org/abs/2111.00161) ็”ฑ Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert ๅ‘ๅธƒใ€‚ -1. **[M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100)** (ๆฅ่‡ช Facebook) ไผด้š่ฎบๆ–‡ [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) ็”ฑ Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli ๅ‘ๅธƒใ€‚ -1. 
**[MADLAD-400](https://huggingface.co/docs/transformers/model_doc/madlad-400)** (from Google) released with the paper [MADLAD-400: A Multilingual And Document-Level Large Audited Dataset](https://arxiv.org/abs/2309.04662) by Sneha Kudugunta, Isaac Caswell, Biao Zhang, Xavier Garcia, Christopher A. Choquette-Choo, Katherine Lee, Derrick Xin, Aditya Kusupati, Romi Stella, Ankur Bapna, Orhan Firat. -1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** ็”จ [OPUS](http://opus.nlpl.eu/) ๆ•ฐๆฎ่ฎญ็ปƒ็š„ๆœบๅ™จ็ฟป่ฏ‘ๆจกๅž‹็”ฑ Jรถrg Tiedemann ๅ‘ๅธƒใ€‚[Marian Framework](https://marian-nmt.github.io/) ็”ฑๅพฎ่ฝฏ็ฟป่ฏ‘ๅ›ข้˜Ÿๅผ€ๅ‘ใ€‚ -1. **[MarkupLM](https://huggingface.co/docs/transformers/model_doc/markuplm)** (ๆฅ่‡ช Microsoft Research Asia) ไผด้š่ฎบๆ–‡ [MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding](https://arxiv.org/abs/2110.08518) ็”ฑ Junlong Li, Yiheng Xu, Lei Cui, Furu Wei ๅ‘ๅธƒใ€‚ -1. **[Mask2Former](https://huggingface.co/docs/transformers/model_doc/mask2former)** (ๆฅ่‡ช FAIR and UIUC) ไผด้š่ฎบๆ–‡ [Masked-attention Mask Transformer for Universal Image Segmentation](https://arxiv.org/abs/2112.01527) ็”ฑ Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar ๅ‘ๅธƒใ€‚ -1. **[MaskFormer](https://huggingface.co/docs/transformers/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov -1. **[MatCha](https://huggingface.co/docs/transformers/model_doc/matcha)** (ๆฅ่‡ช Google AI) ไผด้š่ฎบๆ–‡ [MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering](https://arxiv.org/abs/2212.09662) ็”ฑ Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, Julian Martin Eisenschlos ๅ‘ๅธƒใ€‚ -1. **[mBART](https://huggingface.co/docs/transformers/model_doc/mbart)** (ๆฅ่‡ช Facebook) ไผด้š่ฎบๆ–‡ [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) ็”ฑ Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer ๅ‘ๅธƒใ€‚ -1. **[mBART-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (ๆฅ่‡ช Facebook) ไผด้š่ฎบๆ–‡ [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) ็”ฑ Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan ๅ‘ๅธƒใ€‚ -1. **[MEGA](https://huggingface.co/docs/transformers/model_doc/mega)** (ๆฅ่‡ช Facebook) ไผด้š่ฎบๆ–‡ [Mega: Moving Average Equipped Gated Attention](https://arxiv.org/abs/2209.10655) ็”ฑ Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, and Luke Zettlemoyer ๅ‘ๅธƒใ€‚ -1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron-bert)** (ๆฅ่‡ช NVIDIA) ไผด้š่ฎบๆ–‡ [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) ็”ฑ Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro ๅ‘ๅธƒใ€‚ -1. 
**[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (ๆฅ่‡ช NVIDIA) ไผด้š่ฎบๆ–‡ [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) ็”ฑ Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro ๅ‘ๅธƒใ€‚ -1. **[MGP-STR](https://huggingface.co/docs/transformers/model_doc/mgp-str)** (ๆฅ่‡ช Alibaba Research) ไผด้š่ฎบๆ–‡ [Multi-Granularity Prediction for Scene Text Recognition](https://arxiv.org/abs/2209.03592) ็”ฑ Peng Wang, Cheng Da, and Cong Yao ๅ‘ๅธƒใ€‚ -1. **[Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)** (from Mistral AI) by The Mistral AI team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lรฉlio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothรฉe Lacroix, William El Sayed. -1. **[Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lรฉlio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothรฉe Lacroix, William El Sayed. -1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (ๆฅ่‡ช Studio Ousia) ไผด้š่ฎบๆ–‡ [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) ็”ฑ Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka ๅ‘ๅธƒใ€‚ -1. **[MMS](https://huggingface.co/docs/transformers/model_doc/mms)** (ๆฅ่‡ช Facebook) ไผด้š่ฎบๆ–‡ [Scaling Speech Technology to 1,000+ Languages](https://arxiv.org/abs/2305.13516) ็”ฑ Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli ๅ‘ๅธƒใ€‚ -1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (ๆฅ่‡ช CMU/Google Brain) ไผด้š่ฎบๆ–‡ [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) ็”ฑ Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou ๅ‘ๅธƒใ€‚ -1. **[MobileNetV1](https://huggingface.co/docs/transformers/model_doc/mobilenet_v1)** (ๆฅ่‡ช Google Inc.) ไผด้š่ฎบๆ–‡ [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861) ็”ฑ Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam ๅ‘ๅธƒใ€‚ -1. **[MobileNetV2](https://huggingface.co/docs/transformers/model_doc/mobilenet_v2)** (ๆฅ่‡ช Google Inc.) ไผด้š่ฎบๆ–‡ [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381) ็”ฑ Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen ๅ‘ๅธƒใ€‚ -1. **[MobileViT](https://huggingface.co/docs/transformers/model_doc/mobilevit)** (ๆฅ่‡ช Apple) ไผด้š่ฎบๆ–‡ [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/abs/2110.02178) ็”ฑ Sachin Mehta and Mohammad Rastegari ๅ‘ๅธƒใ€‚ -1. 
**[MobileViTV2](https://huggingface.co/docs/transformers/model_doc/mobilevitv2)** (ๆฅ่‡ช Apple) ไผด้š่ฎบๆ–‡ [Separable Self-attention for Mobile Vision Transformers](https://arxiv.org/abs/2206.02680) ็”ฑ Sachin Mehta and Mohammad Rastegari ๅ‘ๅธƒใ€‚ -1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (ๆฅ่‡ช Microsoft Research) ไผด้š่ฎบๆ–‡ [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) ็”ฑ Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu ๅ‘ๅธƒใ€‚ -1. **[MPT](https://huggingface.co/docs/transformers/model_doc/mpt)** (ๆฅ่‡ช MosaicML) ไผด้š่ฎบๆ–‡ [llm-foundry](https://github.com/mosaicml/llm-foundry/) ็”ฑ the MosaicML NLP Team ๅ‘ๅธƒใ€‚ -1. **[MRA](https://huggingface.co/docs/transformers/model_doc/mra)** (ๆฅ่‡ช the University of Wisconsin - Madison) ไผด้š่ฎบๆ–‡ [Multi Resolution Analysis (MRA)](https://arxiv.org/abs/2207.10284) ็”ฑ Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, Vikas Singh ๅ‘ๅธƒใ€‚ -1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (ๆฅ่‡ช Google AI) ไผด้š่ฎบๆ–‡ [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) ็”ฑ Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel ๅ‘ๅธƒใ€‚ -1. **[MusicGen](https://huggingface.co/docs/transformers/model_doc/musicgen)** (from Meta) released with the paper [Simple and Controllable Music Generation](https://arxiv.org/abs/2306.05284) by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Dรฉfossez. -1. **[MVP](https://huggingface.co/docs/transformers/model_doc/mvp)** (ๆฅ่‡ช ไธญๅ›ฝไบบๆฐ‘ๅคงๅญฆ AI Box) ไผด้š่ฎบๆ–‡ [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https://arxiv.org/abs/2206.12131) ็”ฑ Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen ๅ‘ๅธƒใ€‚ -1. **[NAT](https://huggingface.co/docs/transformers/model_doc/nat)** (ๆฅ่‡ช SHI Labs) ไผด้š่ฎบๆ–‡ [Neighborhood Attention Transformer](https://arxiv.org/abs/2204.07143) ็”ฑ Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi ๅ‘ๅธƒใ€‚ -1. **[Nezha](https://huggingface.co/docs/transformers/model_doc/nezha)** (ๆฅ่‡ชๅŽไธบ่ฏบไบšๆ–น่ˆŸๅฎž้ชŒๅฎค) ไผด้š่ฎบๆ–‡ [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) ็”ฑ Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu ๅ‘ๅธƒใ€‚ -1. **[NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)** (ๆฅ่‡ช Meta) ไผด้š่ฎบๆ–‡ [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) ็”ฑ the NLLB team ๅ‘ๅธƒใ€‚ -1. **[NLLB-MOE](https://huggingface.co/docs/transformers/model_doc/nllb-moe)** (ๆฅ่‡ช Meta) ไผด้š่ฎบๆ–‡ [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) ็”ฑ the NLLB team ๅ‘ๅธƒใ€‚ -1. **[Nougat](https://huggingface.co/docs/transformers/model_doc/nougat)** (ๆฅ่‡ช Meta AI) ไผด้š่ฎบๆ–‡ [Nougat: Neural Optical Understanding for Academic Documents](https://arxiv.org/abs/2308.13418) ็”ฑ Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic ๅ‘ๅธƒใ€‚ -1. 
**[Nystrรถmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (ๆฅ่‡ช the University of Wisconsin - Madison) ไผด้š่ฎบๆ–‡ [Nystrรถmformer: A Nystrรถm-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) ็”ฑ Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh ๅ‘ๅธƒใ€‚ -1. **[OneFormer](https://huggingface.co/docs/transformers/model_doc/oneformer)** (ๆฅ่‡ช SHI Labs) ไผด้š่ฎบๆ–‡ [OneFormer: One Transformer to Rule Universal Image Segmentation](https://arxiv.org/abs/2211.06220) ็”ฑ Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi ๅ‘ๅธƒใ€‚ -1. **[OpenLlama](https://huggingface.co/docs/transformers/model_doc/open-llama)** (ๆฅ่‡ช [s-JoL](https://huggingface.co/s-JoL)) ็”ฑ GitHub (็Žฐๅทฒๅˆ ้™ค). -1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (ๆฅ่‡ช Meta AI) ไผด้š่ฎบๆ–‡ [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) ็”ฑ Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al ๅ‘ๅธƒใ€‚ -1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (ๆฅ่‡ช Google AI) ไผด้š่ฎบๆ–‡ [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) ็”ฑ Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby ๅ‘ๅธƒใ€‚ -1. **[OWLv2](https://huggingface.co/docs/transformers/model_doc/owlv2)** (ๆฅ่‡ช Google AI) ไผด้š่ฎบๆ–‡ [Scaling Open-Vocabulary Object Detection](https://arxiv.org/abs/2306.09683) ็”ฑ Matthias Minderer, Alexey Gritsenko, Neil Houlsby ๅ‘ๅธƒใ€‚ -1. **[PatchTSMixer](https://huggingface.co/docs/transformers/model_doc/patchtsmixer)** (ๆฅ่‡ช IBM Research) ไผด้š่ฎบๆ–‡ [TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting](https://arxiv.org/pdf/2306.09364.pdf) ็”ฑ Vijay Ekambaram, Arindam Jati, Nam Nguyen, Phanwadee Sinthong, Jayant Kalagnanam ๅ‘ๅธƒใ€‚ -1. **[PatchTST](https://huggingface.co/docs/transformers/model_doc/patchtst)** (ๆฅ่‡ช IBM) ไผด้š่ฎบๆ–‡ [A Time Series is Worth 64 Words: Long-term Forecasting with Transformers](https://arxiv.org/pdf/2211.14730.pdf) ็”ฑ Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, Jayant Kalagnanam ๅ‘ๅธƒใ€‚ -1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (ๆฅ่‡ช Google) ไผด้š่ฎบๆ–‡ [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) ็”ฑ Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu ๅ‘ๅธƒใ€‚ -1. **[PEGASUS-X](https://huggingface.co/docs/transformers/model_doc/pegasus_x)** (ๆฅ่‡ช Google) ไผด้š่ฎบๆ–‡ [Investigating Efficiently Extending Transformers for Long Input Summarization](https://arxiv.org/abs/2208.04347) ็”ฑ Jason Phang, Yao Zhao, Peter J. Liu ๅ‘ๅธƒใ€‚ -1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (ๆฅ่‡ช Deepmind) ไผด้š่ฎบๆ–‡ [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) ็”ฑ Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hรฉnaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, Joรฃo Carreira ๅ‘ๅธƒใ€‚ -1. 
**[Persimmon](https://huggingface.co/docs/transformers/model_doc/persimmon)** (ๆฅ่‡ช ADEPT) ไผด้š่ฎบๆ–‡ [blog post](https://www.adept.ai/blog/persimmon-8b) ็”ฑ Erich Elsen, Augustus Odena, Maxwell Nye, SaฤŸnak TaลŸฤฑrlar, Tri Dao, Curtis Hawthorne, Deepak Moparthi, Arushi Somani ๅ‘ๅธƒใ€‚ -1. **[Phi](https://huggingface.co/docs/transformers/model_doc/phi)** (from Microsoft) released with the papers - [Textbooks Are All You Need](https://arxiv.org/abs/2306.11644) by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio Cรฉsar Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sรฉbastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li, [Textbooks Are All You Need II: phi-1.5 technical report](https://arxiv.org/abs/2309.05463) by Yuanzhi Li, Sรฉbastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee. -1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (ๆฅ่‡ช VinAI Research) ไผด้š่ฎบๆ–‡ [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) ็”ฑ Dat Quoc Nguyen and Anh Tuan Nguyen ๅ‘ๅธƒใ€‚ -1. **[Pix2Struct](https://huggingface.co/docs/transformers/model_doc/pix2struct)** (ๆฅ่‡ช Google) ไผด้š่ฎบๆ–‡ [Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding](https://arxiv.org/abs/2210.03347) ็”ฑ Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova ๅ‘ๅธƒใ€‚ -1. **[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (ๆฅ่‡ช UCLA NLP) ไผด้š่ฎบๆ–‡ [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) ็”ฑ Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang ๅ‘ๅธƒใ€‚ -1. **[PoolFormer](https://huggingface.co/docs/transformers/model_doc/poolformer)** (ๆฅ่‡ช Sea AI Labs) ไผด้š่ฎบๆ–‡ [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) ็”ฑ Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng ๅ‘ๅธƒใ€‚ -1. **[Pop2Piano](https://huggingface.co/docs/transformers/model_doc/pop2piano)** released with the paper [Pop2Piano : Pop Audio-based Piano Cover Generation](https://arxiv.org/abs/2211.00895) by Jongho Choi, Kyogu Lee. -1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (ๆฅ่‡ช Microsoft Research) ไผด้š่ฎบๆ–‡ [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) ็”ฑ Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou ๅ‘ๅธƒใ€‚ -1. **[PVT](https://huggingface.co/docs/transformers/model_doc/pvt)** (ๆฅ่‡ช Nanjing University, The University of Hong Kong etc.) ไผด้š่ฎบๆ–‡ [Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions](https://arxiv.org/pdf/2102.12122.pdf) ็”ฑ Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao ๅ‘ๅธƒใ€‚ -1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (ๆฅ่‡ช NVIDIA) ไผด้š่ฎบๆ–‡ [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) ็”ฑ Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius ๅ‘ๅธƒใ€‚ -1. 
**[RAG](https://huggingface.co/docs/transformers/model_doc/rag)** (ๆฅ่‡ช Facebook) ไผด้š่ฎบๆ–‡ [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) ็”ฑ Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Kรผttler, Mike Lewis, Wen-tau Yih, Tim Rocktรคschel, Sebastian Riedel, Douwe Kiela ๅ‘ๅธƒใ€‚ -1. **[REALM](https://huggingface.co/docs/transformers/model_doc/realm.html)** (ๆฅ่‡ช Google Research) ไผด้š่ฎบๆ–‡ [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) ็”ฑ Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang ๅ‘ๅธƒใ€‚ -1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (ๆฅ่‡ช Google Research) ไผด้š่ฎบๆ–‡ [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) ็”ฑ Nikita Kitaev, ลukasz Kaiser, Anselm Levskaya ๅ‘ๅธƒใ€‚ -1. **[RegNet](https://huggingface.co/docs/transformers/model_doc/regnet)** (from META Research) released with the paper [Designing Network Design Space](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollรกr. -1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (ๆฅ่‡ช Google Research) ไผด้š่ฎบๆ–‡ [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/pdf/2010.12821.pdf) ็”ฑ Hyung Won Chung, Thibault Fรฉvry, Henry Tsai, M. Johnson, Sebastian Ruder ๅ‘ๅธƒใ€‚ -1. **[ResNet](https://huggingface.co/docs/transformers/model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. -1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (ๆฅ่‡ช Facebook), ไผด้š่ฎบๆ–‡ [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) ็”ฑ Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov ๅ‘ๅธƒใ€‚ -1. **[RoBERTa-PreLayerNorm](https://huggingface.co/docs/transformers/model_doc/roberta-prelayernorm)** (ๆฅ่‡ช Facebook) ไผด้š่ฎบๆ–‡ [fairseq: A Fast, Extensible Toolkit for Sequence Modeling](https://arxiv.org/abs/1904.01038) ็”ฑ Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli ๅ‘ๅธƒใ€‚ -1. **[RoCBert](https://huggingface.co/docs/transformers/model_doc/roc_bert)** (ๆฅ่‡ช WeChatAI), ไผด้š่ฎบๆ–‡ [RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining](https://aclanthology.org/2022.acl-long.65.pdf) ็”ฑ HuiSu, WeiweiShi, XiaoyuShen, XiaoZhou, TuoJi, JiaruiFang, JieZhou ๅ‘ๅธƒใ€‚ -1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (ๆฅ่‡ช ZhuiyiTechnology), ไผด้š่ฎบๆ–‡ [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/pdf/2104.09864v1.pdf) ็”ฑ Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu ๅ‘ๅธƒใ€‚ -1. **[RWKV](https://huggingface.co/docs/transformers/model_doc/rwkv)** (ๆฅ่‡ช Bo Peng) ไผด้š่ฎบๆ–‡ [this repo](https://github.com/BlinkDL/RWKV-LM) ็”ฑ Bo Peng ๅ‘ๅธƒใ€‚ -1. **[SeamlessM4T](https://huggingface.co/docs/transformers/model_doc/seamless_m4t)** (from Meta AI) released with the paper [SeamlessM4T โ€” Massively Multilingual & Multimodal Machine Translation](https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf) by the Seamless Communication team. 
-1. **[SeamlessM4Tv2](https://huggingface.co/docs/transformers/model_doc/seamless_m4t_v2)** (from Meta AI) released with the paper [Seamless: Multilingual Expressive and Streaming Speech Translation](https://ai.meta.com/research/publications/seamless-multilingual-expressive-and-streaming-speech-translation/) by the Seamless Communication team. -1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (ๆฅ่‡ช NVIDIA) ไผด้š่ฎบๆ–‡ [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) ็”ฑ Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo ๅ‘ๅธƒใ€‚ -1. **[Segment Anything](https://huggingface.co/docs/transformers/model_doc/sam)** (ๆฅ่‡ช Meta AI) ไผด้š่ฎบๆ–‡ [Segment Anything](https://arxiv.org/pdf/2304.02643v1.pdf) ็”ฑ Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick ๅ‘ๅธƒใ€‚ -1. **[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (ๆฅ่‡ช ASAPP) ไผด้š่ฎบๆ–‡ [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) ็”ฑ Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi ๅ‘ๅธƒใ€‚ -1. **[SEW-D](https://huggingface.co/docs/transformers/model_doc/sew_d)** (ๆฅ่‡ช ASAPP) ไผด้š่ฎบๆ–‡ [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) ็”ฑ Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi ๅ‘ๅธƒใ€‚ -1. **[SpeechT5](https://huggingface.co/docs/transformers/model_doc/speecht5)** (ๆฅ่‡ช Microsoft Research) ไผด้š่ฎบๆ–‡ [SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing](https://arxiv.org/abs/2110.07205) ็”ฑ Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei ๅ‘ๅธƒใ€‚ -1. **[SpeechToTextTransformer](https://huggingface.co/docs/transformers/model_doc/speech_to_text)** (ๆฅ่‡ช Facebook), ไผด้š่ฎบๆ–‡ [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) ็”ฑ Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino ๅ‘ๅธƒใ€‚ -1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (ๆฅ่‡ช Facebook) ไผด้š่ฎบๆ–‡ [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) ็”ฑ Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau ๅ‘ๅธƒใ€‚ -1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (ๆฅ่‡ช Tel Aviv University) ไผด้š่ฎบๆ–‡ [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) ็”ฑ Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy ๅ‘ๅธƒใ€‚ -1. **[SqueezeBERT](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (ๆฅ่‡ช Berkeley) ไผด้š่ฎบๆ–‡ [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) ็”ฑ Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer ๅ‘ๅธƒใ€‚ -1. 
**[SwiftFormer](https://huggingface.co/docs/transformers/model_doc/swiftformer)** (ๆฅ่‡ช MBZUAI) ไผด้š่ฎบๆ–‡ [SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications](https://arxiv.org/abs/2303.15446) ็”ฑ Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan ๅ‘ๅธƒใ€‚ -1. **[Swin Transformer](https://huggingface.co/docs/transformers/model_doc/swin)** (ๆฅ่‡ช Microsoft) ไผด้š่ฎบๆ–‡ [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) ็”ฑ Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo ๅ‘ๅธƒใ€‚ -1. **[Swin Transformer V2](https://huggingface.co/docs/transformers/model_doc/swinv2)** (ๆฅ่‡ช Microsoft) ไผด้š่ฎบๆ–‡ [Swin Transformer V2: Scaling Up Capacity and Resolution](https://arxiv.org/abs/2111.09883) ็”ฑ Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo ๅ‘ๅธƒใ€‚ -1. **[Swin2SR](https://huggingface.co/docs/transformers/model_doc/swin2sr)** (ๆฅ่‡ช University of Wรผrzburg) ไผด้š่ฎบๆ–‡ [Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration](https://arxiv.org/abs/2209.11345) ็”ฑ Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte ๅ‘ๅธƒใ€‚ -1. **[SwitchTransformers](https://huggingface.co/docs/transformers/model_doc/switch_transformers)** (from Google) released with the paper [Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity](https://arxiv.org/abs/2101.03961) by William Fedus, Barret Zoph, Noam Shazeer. -1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (ๆฅ่‡ช Google AI) ไผด้š่ฎบๆ–‡ [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) ็”ฑ Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu ๅ‘ๅธƒใ€‚ -1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (ๆฅ่‡ช Google AI) ไผด้š่ฎบๆ–‡ [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) ็”ฑ Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu ๅ‘ๅธƒใ€‚ -1. **[Table Transformer](https://huggingface.co/docs/transformers/model_doc/table-transformer)** (ๆฅ่‡ช Microsoft Research) ไผด้š่ฎบๆ–‡ [PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents](https://arxiv.org/abs/2110.00061) ็”ฑ Brandon Smock, Rohith Pesala, Robin Abraham ๅ‘ๅธƒใ€‚ -1. **[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (ๆฅ่‡ช Google AI) ไผด้š่ฎบๆ–‡ [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) ็”ฑ Jonathan Herzig, Paweล‚ Krzysztof Nowak, Thomas Mรผller, Francesco Piccinno and Julian Martin Eisenschlos ๅ‘ๅธƒใ€‚ -1. **[TAPEX](https://huggingface.co/docs/transformers/model_doc/tapex)** (ๆฅ่‡ช Microsoft Research) ไผด้š่ฎบๆ–‡ [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) ็”ฑ Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou ๅ‘ๅธƒใ€‚ -1. **[Time Series Transformer](https://huggingface.co/docs/transformers/model_doc/time_series_transformer)** (from HuggingFace). -1. 
**[TimeSformer](https://huggingface.co/docs/transformers/model_doc/timesformer)** (from Facebook) released with the paper [Is Space-Time Attention All You Need for Video Understanding?](https://arxiv.org/abs/2102.05095) by Gedas Bertasius, Heng Wang, Lorenzo Torresani. -1. **[Trajectory Transformer](https://huggingface.co/docs/transformers/model_doc/trajectory_transformers)** (from the University of California at Berkeley) released with the paper [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https://arxiv.org/abs/2106.02039) by Michael Janner, Qiyang Li, Sergey Levine -1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transfo-xl)** (ๆฅ่‡ช Google/CMU) ไผด้š่ฎบๆ–‡ [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) ็”ฑ Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov ๅ‘ๅธƒใ€‚ -1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (ๆฅ่‡ช Microsoft) ไผด้š่ฎบๆ–‡ [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) ็”ฑ Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei ๅ‘ๅธƒใ€‚ -1. **[TVLT](https://huggingface.co/docs/transformers/model_doc/tvlt)** (ๆฅ่‡ช UNC Chapel Hill) ไผด้š่ฎบๆ–‡ [TVLT: Textless Vision-Language Transformer](https://arxiv.org/abs/2209.14156) ็”ฑ Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal ๅ‘ๅธƒใ€‚ -1. **[TVP](https://huggingface.co/docs/transformers/model_doc/tvp)** (ๆฅ่‡ช Intel) ไผด้š่ฎบๆ–‡ [Text-Visual Prompting for Efficient 2D Temporal Video Grounding](https://arxiv.org/abs/2303.04995) ็”ฑ Yimeng Zhang, Xin Chen, Jinghan Jia, Sijia Liu, Ke Ding ๅ‘ๅธƒ. -1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler -1. **[UMT5](https://huggingface.co/docs/transformers/model_doc/umt5)** (ๆฅ่‡ช Google Research) ไผด้š่ฎบๆ–‡ [UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining](https://openreview.net/forum?id=kXwdL1cWOAi) ็”ฑ Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant ๅ‘ๅธƒใ€‚ -1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (ๆฅ่‡ช Microsoft Research) ไผด้š่ฎบๆ–‡ [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) ็”ฑ Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang ๅ‘ๅธƒใ€‚ -1. **[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech-sat)** (ๆฅ่‡ช Microsoft Research) ไผด้š่ฎบๆ–‡ [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) ็”ฑ Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu ๅ‘ๅธƒใ€‚ -1. **[UnivNet](https://huggingface.co/docs/transformers/model_doc/univnet)** (from Kakao Corporation) released with the paper [UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation](https://arxiv.org/abs/2106.07889) by Won Jang, Dan Lim, Jaesam Yoon, Bongwan Kim, and Juntae Kim. 
-1. **[UPerNet](https://huggingface.co/docs/transformers/model_doc/upernet)** (ๆฅ่‡ช Peking University) ไผด้š่ฎบๆ–‡ [Unified Perceptual Parsing for Scene Understanding](https://arxiv.org/abs/1807.10221) ็”ฑ Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun ๅ‘ๅธƒใ€‚ -1. **[VAN](https://huggingface.co/docs/transformers/model_doc/van)** (ๆฅ่‡ช Tsinghua University and Nankai University) ไผด้š่ฎบๆ–‡ [Visual Attention Network](https://arxiv.org/pdf/2202.09741.pdf) ็”ฑ Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu ๅ‘ๅธƒใ€‚ -1. **[VideoMAE](https://huggingface.co/docs/transformers/model_doc/videomae)** (ๆฅ่‡ช Multimedia Computing Group, Nanjing University) ไผด้š่ฎบๆ–‡ [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) ็”ฑ Zhan Tong, Yibing Song, Jue Wang, Limin Wang ๅ‘ๅธƒใ€‚ -1. **[ViLT](https://huggingface.co/docs/transformers/model_doc/vilt)** (ๆฅ่‡ช NAVER AI Lab/Kakao Enterprise/Kakao Brain) ไผด้š่ฎบๆ–‡ [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) ็”ฑ Wonjae Kim, Bokyung Son, Ildoo Kim ๅ‘ๅธƒใ€‚ -1. **[VipLlava](https://huggingface.co/docs/transformers/model_doc/vipllava)** (ๆฅ่‡ช University of Wisconsinโ€“Madison) ไผด้š่ฎบๆ–‡ [Making Large Multimodal Models Understand Arbitrary Visual Prompts](https://arxiv.org/abs/2312.00784) ็”ฑ Mu Cai, Haotian Liu, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Dennis Park, Yong Jae Lee ๅ‘ๅธƒใ€‚ -1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (ๆฅ่‡ช Google AI) ไผด้š่ฎบๆ–‡ [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) ็”ฑ Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby ๅ‘ๅธƒใ€‚ -1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (ๆฅ่‡ช UCLA NLP) ไผด้š่ฎบๆ–‡ [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) ็”ฑ Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang ๅ‘ๅธƒใ€‚ -1. **[ViT Hybrid](https://huggingface.co/docs/transformers/model_doc/vit_hybrid)** (ๆฅ่‡ช Google AI) ไผด้š่ฎบๆ–‡ [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) ็”ฑ Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby ๅ‘ๅธƒใ€‚ -1. **[VitDet](https://huggingface.co/docs/transformers/model_doc/vitdet)** (ๆฅ่‡ช Meta AI) ไผด้š่ฎบๆ–‡ [Exploring Plain Vision Transformer Backbones for Object Detection](https://arxiv.org/abs/2203.16527) ็”ฑ Yanghao Li, Hanzi Mao, Ross Girshick, Kaiming He ๅ‘ๅธƒใ€‚ -1. **[ViTMAE](https://huggingface.co/docs/transformers/model_doc/vit_mae)** (ๆฅ่‡ช Meta AI) ไผด้š่ฎบๆ–‡ [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) ็”ฑ Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollรกr, Ross Girshick ๅ‘ๅธƒใ€‚ -1. 
**[ViTMatte](https://huggingface.co/docs/transformers/model_doc/vitmatte)** (ๆฅ่‡ช HUST-VL) ไผด้š่ฎบๆ–‡ [ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers](https://arxiv.org/abs/2305.15272) ็”ฑ Jingfeng Yao, Xinggang Wang, Shusheng Yang, Baoyuan Wang ๅ‘ๅธƒใ€‚ -1. **[ViTMSN](https://huggingface.co/docs/transformers/model_doc/vit_msn)** (ๆฅ่‡ช Meta AI) ไผด้š่ฎบๆ–‡ [Masked Siamese Networks for Label-Efficient Learning](https://arxiv.org/abs/2204.07141) by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas ๅ‘ๅธƒ. -1. **[VITS](https://huggingface.co/docs/transformers/model_doc/vits)** (ๆฅ่‡ช Kakao Enterprise) ไผด้š่ฎบๆ–‡ [Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech](https://arxiv.org/abs/2106.06103) ็”ฑ Jaehyeon Kim, Jungil Kong, Juhee Son ๅ‘ๅธƒใ€‚ -1. **[ViViT](https://huggingface.co/docs/transformers/model_doc/vivit)** (ๆฅ่‡ช Google Research) released with the paper [ViViT: A Video Vision Transformer](https://arxiv.org/abs/2103.15691) ็”ฑ Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Luฤiฤ‡, Cordelia Schmid. -1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (ๆฅ่‡ช Facebook AI) ไผด้š่ฎบๆ–‡ [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) ็”ฑ Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli ๅ‘ๅธƒใ€‚ -1. **[Wav2Vec2-Conformer](https://huggingface.co/docs/transformers/model_doc/wav2vec2-conformer)** (ๆฅ่‡ช Facebook AI) ไผด้š่ฎบๆ–‡ [FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ](https://arxiv.org/abs/2010.05171) ็”ฑ Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino ๅ‘ๅธƒใ€‚ -1. **[Wav2Vec2Phoneme](https://huggingface.co/docs/transformers/model_doc/wav2vec2_phoneme)** (ๆฅ่‡ช Facebook AI) ไผด้š่ฎบๆ–‡ [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https://arxiv.org/abs/2109.11680) ็”ฑ Qiantong Xu, Alexei Baevski, Michael Auli ๅ‘ๅธƒใ€‚ -1. **[WavLM](https://huggingface.co/docs/transformers/model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei. -1. **[Whisper](https://huggingface.co/docs/transformers/model_doc/whisper)** (ๆฅ่‡ช OpenAI) ไผด้š่ฎบๆ–‡ [Robust Speech Recognition via Large-Scale Weak Supervision](https://cdn.openai.com/papers/whisper.pdf) ็”ฑ Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever ๅ‘ๅธƒใ€‚ -1. **[X-CLIP](https://huggingface.co/docs/transformers/model_doc/xclip)** (ๆฅ่‡ช Microsoft Research) ไผด้š่ฎบๆ–‡ [Expanding Language-Image Pretrained Models for General Video Recognition](https://arxiv.org/abs/2208.02816) ็”ฑ Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling ๅ‘ๅธƒใ€‚ -1. 
**[X-MOD](https://huggingface.co/docs/transformers/model_doc/xmod)** (ๆฅ่‡ช Meta AI) ไผด้š่ฎบๆ–‡ [Lifting the Curse of Multilinguality by Pre-training Modular Transformers](http://dx.doi.org/10.18653/v1/2022.naacl-main.255) ็”ฑ Jonas Pfeiffer, Naman Goyal, Xi Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe ๅ‘ๅธƒใ€‚ -1. **[XGLM](https://huggingface.co/docs/transformers/model_doc/xglm)** (From Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li. -1. **[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (ๆฅ่‡ช Facebook) ไผด้š่ฎบๆ–‡ [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) ็”ฑ Guillaume Lample and Alexis Conneau ๅ‘ๅธƒใ€‚ -1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet)** (ๆฅ่‡ช Microsoft Research) ไผด้š่ฎบๆ–‡ [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) ็”ฑ Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou ๅ‘ๅธƒใ€‚ -1. **[XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlm-roberta)** (ๆฅ่‡ช Facebook AI), ไผด้š่ฎบๆ–‡ [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) ็”ฑ Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmรกn, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov ๅ‘ๅธƒใ€‚ -1. **[XLM-RoBERTa-XL](https://huggingface.co/docs/transformers/model_doc/xlm-roberta-xl)** (ๆฅ่‡ช Facebook AI) ไผด้š่ฎบๆ–‡ [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) ็”ฑ Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau ๅ‘ๅธƒใ€‚ -1. **[XLM-V](https://huggingface.co/docs/transformers/model_doc/xlm-v)** (ๆฅ่‡ช Meta AI) ไผด้š่ฎบๆ–‡ [XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models](https://arxiv.org/abs/2301.10472) ็”ฑ Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa ๅ‘ๅธƒใ€‚ -1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (ๆฅ่‡ช Google/CMU) ไผด้š่ฎบๆ–‡ [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) ็”ฑ Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le ๅ‘ๅธƒใ€‚ -1. **[XLS-R](https://huggingface.co/docs/transformers/model_doc/xls_r)** (ๆฅ่‡ช Facebook AI) ไผด้š่ฎบๆ–‡ [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) ็”ฑ Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli ๅ‘ๅธƒใ€‚ -1. 
**[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (ๆฅ่‡ช Facebook AI) ไผด้š่ฎบๆ–‡ [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) ็”ฑ Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli ๅ‘ๅธƒใ€‚ -1. **[YOLOS](https://huggingface.co/docs/transformers/model_doc/yolos)** (ๆฅ่‡ช Huazhong University of Science & Technology) ไผด้š่ฎบๆ–‡ [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) ็”ฑ Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu ๅ‘ๅธƒใ€‚ -1. **[YOSO](https://huggingface.co/docs/transformers/model_doc/yoso)** (ๆฅ่‡ช the University of Wisconsin - Madison) ไผด้š่ฎบๆ–‡ [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) ็”ฑ Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh ๅ‘ๅธƒใ€‚ -1. ๆƒณ่ฆ่ดก็Œฎๆ–ฐ็š„ๆจกๅž‹๏ผŸๆˆ‘ไปฌ่ฟ™้‡Œๆœ‰ไธ€ไปฝ**่ฏฆ็ป†ๆŒ‡ๅผ•ๅ’Œๆจกๆฟ**ๆฅๅผ•ๅฏผไฝ ๆทปๅŠ ๆ–ฐ็š„ๆจกๅž‹ใ€‚ไฝ ๅฏไปฅๅœจ [`templates`](./templates) ็›ฎๅฝ•ไธญๆ‰พๅˆฐไป–ไปฌใ€‚่ฎฐๅพ—ๆŸฅ็œ‹ [่ดก็ŒฎๆŒ‡ๅ—](./CONTRIBUTING.md) ๅนถๅœจๅผ€ๅง‹ๅ†™ PR ๅ‰่”็ณป็ปดๆŠคไบบๅ‘˜ๆˆ–ๅผ€ไธ€ไธชๆ–ฐ็š„ issue ๆฅ่Žทๅพ—ๅ้ฆˆใ€‚ - -่ฆๆฃ€ๆŸฅๆŸไธชๆจกๅž‹ๆ˜ฏๅฆๅทฒๆœ‰ Flaxใ€PyTorch ๆˆ– TensorFlow ็š„ๅฎž็Žฐ๏ผŒๆˆ–ๅ…ถๆ˜ฏๅฆๅœจ ๐Ÿค— Tokenizers ๅบ“ไธญๆœ‰ๅฏนๅบ”่ฏ็ฌฆๅŒ–ๅ™จ๏ผˆtokenizer๏ผ‰๏ผŒๆ•ฌ่ฏทๅ‚้˜…[ๆญค่กจ](https://huggingface.co/docs/transformers/index#supported-frameworks)ใ€‚ - -่ฟ™ไบ›ๅฎž็Žฐๅ‡ๅทฒไบŽๅคšไธชๆ•ฐๆฎ้›†ๆต‹่ฏ•๏ผˆ่ฏทๅ‚็œ‹็”จไพ‹่„šๆœฌ๏ผ‰ๅนถๅบ”ไบŽๅŽŸ็‰ˆๅฎž็Žฐ่กจ็Žฐ็›ธๅฝ“ใ€‚ไฝ ๅฏไปฅๅœจ็”จไพ‹ๆ–‡ๆกฃ็š„[ๆญค่Š‚](https://huggingface.co/docs/transformers/examples)ไธญไบ†่งฃ่กจ็Žฐ็š„็ป†่Š‚ใ€‚ - - -## ไบ†่งฃๆ›ดๅคš - -| ็ซ ่Š‚ | ๆ่ฟฐ | -|-|-| -| [ๆ–‡ๆกฃ](https://huggingface.co/docs/transformers/) | ๅฎŒๆ•ด็š„ API ๆ–‡ๆกฃๅ’Œๆ•™็จ‹ | -| [ไปปๅŠกๆ€ป็ป“](https://huggingface.co/docs/transformers/task_summary) | ๐Ÿค— Transformers ๆ”ฏๆŒ็š„ไปปๅŠก | -| [้ข„ๅค„็†ๆ•™็จ‹](https://huggingface.co/docs/transformers/preprocessing) | ไฝฟ็”จ `Tokenizer` ๆฅไธบๆจกๅž‹ๅ‡†ๅค‡ๆ•ฐๆฎ | -| [่ฎญ็ปƒๅ’Œๅพฎ่ฐƒ](https://huggingface.co/docs/transformers/training) | ๅœจ PyTorch/TensorFlow ็š„่ฎญ็ปƒๅพช็Žฏๆˆ– `Trainer` API ไธญไฝฟ็”จ ๐Ÿค— Transformers ๆไพ›็š„ๆจกๅž‹ | -| [ๅฟซ้€ŸไธŠๆ‰‹๏ผšๅพฎ่ฐƒๅ’Œ็”จไพ‹่„šๆœฌ](https://github.com/huggingface/transformers/tree/main/examples) | ไธบๅ„็งไปปๅŠกๆไพ›็š„็”จไพ‹่„šๆœฌ | -| [ๆจกๅž‹ๅˆ†ไบซๅ’ŒไธŠไผ ](https://huggingface.co/docs/transformers/model_sharing) | ๅ’Œ็คพๅŒบไธŠไผ ๅ’Œๅˆ†ไบซไฝ ๅพฎ่ฐƒ็š„ๆจกๅž‹ | -| [่ฟ็งป](https://huggingface.co/docs/transformers/migration) | ไปŽ `pytorch-transformers` ๆˆ– `pytorch-pretrained-bert` ่ฟ็งปๅˆฐ ๐Ÿค— Transformers | - -## ๅผ•็”จ - -ๆˆ‘ไปฌๅทฒๅฐ†ๆญคๅบ“็š„[่ฎบๆ–‡](https://www.aclweb.org/anthology/2020.emnlp-demos.6/)ๆญฃๅผๅ‘่กจ๏ผŒๅฆ‚ๆžœไฝ ไฝฟ็”จไบ† ๐Ÿค— Transformers ๅบ“๏ผŒ่ฏทๅผ•็”จ: -```bibtex -@inproceedings{wolf-etal-2020-transformers, - title = "Transformers: State-of-the-Art Natural Language Processing", - author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rรฉmi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. 
Rush", - booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations", - month = oct, - year = "2020", - address = "Online", - publisher = "Association for Computational Linguistics", - url = "https://www.aclweb.org/anthology/2020.emnlp-demos.6", - pages = "38--45" -} -``` diff --git a/README_zh-hant.md b/README_zh-hant.md deleted file mode 100644 index 407c4e952b763a..00000000000000 --- a/README_zh-hant.md +++ /dev/null @@ -1,530 +0,0 @@ - - - - -

-
- -
-

-

- - Build - - - GitHub - - - Documentation - - - GitHub release - - - Contributor Covenant - - DOI -

- -

-

- English | - ็ฎ€ไฝ“ไธญๆ–‡ | - ็น้ซ”ไธญๆ–‡ | - ํ•œ๊ตญ์–ด | - Espaรฑol | - ๆ—ฅๆœฌ่ชž | - เคนเคฟเคจเฅเคฆเฅ€ - เฐคเฑ†เฐฒเฑเฐ—เฑ | -

-

- -

-

็‚บ Jaxใ€PyTorch ไปฅๅŠ TensorFlow ๆ‰“้€ ็š„ๅ…ˆ้€ฒ่‡ช็„ถ่ชž่จ€่™•็†ๅ‡ฝๅผๅบซ

-

- -

- -

- -๐Ÿค— Transformers ๆไพ›ไบ†ๆ•ธไปฅๅƒ่จˆ็š„้ ่จ“็ทดๆจกๅž‹๏ผŒๆ”ฏๆด 100 ๅคš็จฎ่ชž่จ€็š„ๆ–‡ๆœฌๅˆ†้กžใ€่ณ‡่จŠๆ“ทๅ–ใ€ๅ•็ญ”ใ€ๆ‘˜่ฆใ€็ฟป่ญฏใ€ๆ–‡ๆœฌ็”Ÿๆˆใ€‚ๅฎƒ็š„ๅฎ—ๆ—จๆ˜ฏ่ฎ“ๆœ€ๅ…ˆ้€ฒ็š„ NLP ๆŠ€่ก“ไบบไบบๆ˜“็”จใ€‚ - -๐Ÿค— Transformers ๆไพ›ไบ†ไพฟๆ–ผๅฟซ้€Ÿไธ‹่ผ‰ๅ’Œไฝฟ็”จ็š„API๏ผŒ่ฎ“ไฝ ๅฏไปฅๅฐ‡้ ่จ“็ทดๆจกๅž‹็”จๅœจ็ตฆๅฎšๆ–‡ๆœฌใ€ๅœจไฝ ็š„่ณ‡ๆ–™้›†ไธŠๅพฎ่ชฟ็„ถๅพŒ็ถ“็”ฑ [model hub](https://huggingface.co/models) ่ˆ‡็คพ็พคๅ…ฑไบซใ€‚ๅŒๆ™‚๏ผŒๆฏๅ€‹ๅฎš็พฉ็š„ Python ๆจก็ต„ๆžถๆง‹ๅ‡ๅฎŒๅ…จ็จ็ซ‹๏ผŒๆ–นไพฟไฟฎๆ”นๅ’Œๅฟซ้€Ÿ็ ”็ฉถๅฏฆ้ฉ—ใ€‚ - -๐Ÿค— Transformers ๆ”ฏๆดไธ‰ๅ€‹ๆœ€็†ฑ้–€็š„ๆทฑๅบฆๅญธ็ฟ’ๅ‡ฝๅผๅบซ๏ผš [Jax](https://jax.readthedocs.io/en/latest/), [PyTorch](https://pytorch.org/) ไปฅๅŠ [TensorFlow](https://www.tensorflow.org/) โ€” ไธฆ่ˆ‡ไน‹ๅฎŒ็พŽๆ•ดๅˆใ€‚ไฝ ๅฏไปฅ็›ดๆŽฅไฝฟ็”จๅ…ถไธญไธ€ๅ€‹ๆก†ๆžถ่จ“็ทดไฝ ็š„ๆจกๅž‹๏ผŒ็„ถๅพŒ็”จๅฆไธ€ๅ€‹่ผ‰ๅ…ฅๅ’ŒๆŽจ่ซ–ใ€‚ - -## ็ทšไธŠDemo - -ไฝ ๅฏไปฅ็›ดๆŽฅๅœจ [model hub](https://huggingface.co/models) ไธŠๆธฌ่ฉฆๅคงๅคšๆ•ธ็š„ๆจกๅž‹ใ€‚ๆˆ‘ๅ€‘ไนŸๆไพ›ไบ† [็งๆœ‰ๆจกๅž‹่จ—็ฎกใ€ๆจกๅž‹็‰ˆๆœฌ็ฎก็†ไปฅๅŠๆŽจ่ซ–API](https://huggingface.co/pricing)ใ€‚ - -้€™่ฃกๆ˜ฏไธ€ไบ›็ฏ„ไพ‹๏ผš -- [็”จ BERT ๅš้ฎ่“‹ๅกซ่ฉž](https://huggingface.co/bert-base-uncased?text=Paris+is+the+%5BMASK%5D+of+France) -- [็”จ Electra ๅšๅฐˆๆœ‰ๅ่ฉž่พจ่ญ˜](https://huggingface.co/dbmdz/electra-large-discriminator-finetuned-conll03-english?text=My+name+is+Sarah+and+I+live+in+London+city) -- [็”จ GPT-2 ๅšๆ–‡ๆœฌ็”Ÿๆˆ](https://huggingface.co/gpt2?text=A+long+time+ago%2C+) -- [็”จ RoBERTa ๅš่‡ช็„ถ่ชž่จ€ๆŽจ่ซ–](https://huggingface.co/roberta-large-mnli?text=The+dog+was+lost.+Nobody+lost+any+animal) -- [็”จ BART ๅšๆ–‡ๆœฌๆ‘˜่ฆ](https://huggingface.co/facebook/bart-large-cnn?text=The+tower+is+324+metres+%281%2C063+ft%29+tall%2C+about+the+same+height+as+an+81-storey+building%2C+and+the+tallest+structure+in+Paris.+Its+base+is+square%2C+measuring+125+metres+%28410+ft%29+on+each+side.+During+its+construction%2C+the+Eiffel+Tower+surpassed+the+Washington+Monument+to+become+the+tallest+man-made+structure+in+the+world%2C+a+title+it+held+for+41+years+until+the+Chrysler+Building+in+New+York+City+was+finished+in+1930.+It+was+the+first+structure+to+reach+a+height+of+300+metres.+Due+to+the+addition+of+a+broadcasting+aerial+at+the+top+of+the+tower+in+1957%2C+it+is+now+taller+than+the+Chrysler+Building+by+5.2+metres+%2817+ft%29.+Excluding+transmitters%2C+the+Eiffel+Tower+is+the+second+tallest+free-standing+structure+in+France+after+the+Millau+Viaduct) -- [็”จ DistilBERT 
ๅšๅ•็ญ”](https://huggingface.co/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species) -- [็”จ T5 ๅš็ฟป่ญฏ](https://huggingface.co/t5-base?text=My+name+is+Wolfgang+and+I+live+in+Berlin) - -**[Write With Transformer](https://transformer.huggingface.co)**๏ผŒ็”ฑ Hugging Face ๅœ˜้šŠๆ‰€ๆ‰“้€ ๏ผŒๆ˜ฏไธ€ๅ€‹ๆ–‡ๆœฌ็”Ÿๆˆ็š„ๅฎ˜ๆ–น demoใ€‚ - -## ๅฆ‚ๆžœไฝ ๅœจๅฐ‹ๆ‰พ็”ฑ Hugging Face ๅœ˜้šŠๆ‰€ๆไพ›็š„ๅฎข่ฃฝๅŒ–ๆ”ฏๆดๆœๅ‹™ - - - HuggingFace Expert Acceleration Program -
- -## ๅฟซ้€ŸไธŠๆ‰‹ - -ๆˆ‘ๅ€‘็‚บๅฟซ้€Ÿไฝฟ็”จๆจกๅž‹ๆไพ›ไบ† `pipeline` APIใ€‚ Pipeline ๅŒ…ๅซไบ†้ ่จ“็ทดๆจกๅž‹ๅ’Œๅฐๆ‡‰็š„ๆ–‡ๆœฌ้ ่™•็†ใ€‚ไธ‹้ขๆ˜ฏไธ€ๅ€‹ๅฟซ้€Ÿไฝฟ็”จ pipeline ๅŽปๅˆคๆ–ทๆญฃ่ฒ ้ขๆƒ…็ท’็š„ไพ‹ๅญ๏ผš - -```python ->>> from transformers import pipeline - -# ไฝฟ็”จๆƒ…็ท’ๅˆ†ๆž pipeline ->>> classifier = pipeline('sentiment-analysis') ->>> classifier('We are very happy to introduce pipeline to the transformers repository.') -[{'label': 'POSITIVE', 'score': 0.9996980428695679}] -``` - -็ฌฌไบŒ่กŒ็จ‹ๅผ็ขผไธ‹่ผ‰ไธฆๅฟซๅ– pipeline ไฝฟ็”จ็š„้ ่จ“็ทดๆจกๅž‹๏ผŒ่€Œ็ฌฌไธ‰่กŒ็จ‹ๅผ็ขผๅ‰‡ๅœจ็ตฆๅฎš็š„ๆ–‡ๆœฌไธŠ้€ฒ่กŒไบ†่ฉ•ไผฐใ€‚้€™่ฃก็š„็ญ”ๆกˆโ€œๆญฃ้ขโ€ (positive) ๅ…ทๆœ‰ 99.97% ็š„ไฟก่ณดๅบฆใ€‚ - -่จฑๅคš็š„ NLP ไปปๅ‹™้ƒฝๆœ‰้šจ้ธๅณ็”จ็š„้ ่จ“็ทด `pipeline`ใ€‚ไพ‹ๅฆ‚๏ผŒๆˆ‘ๅ€‘ๅฏไปฅ่ผ•้ฌ†ๅœฐๅพž็ตฆๅฎšๆ–‡ๆœฌไธญๆ“ทๅ–ๅ•้กŒ็ญ”ๆกˆ๏ผš - -``` python ->>> from transformers import pipeline - -# ไฝฟ็”จๅ•็ญ” pipeline ->>> question_answerer = pipeline('question-answering') ->>> question_answerer({ -... 'question': 'What is the name of the repository ?', -... 'context': 'Pipeline has been included in the huggingface/transformers repository' -... }) -{'score': 0.30970096588134766, 'start': 34, 'end': 58, 'answer': 'huggingface/transformers'} - -``` - -้™คไบ†ๆไพ›ๅ•้กŒ่งฃ็ญ”๏ผŒ้ ่จ“็ทดๆจกๅž‹้‚„ๆไพ›ไบ†ๅฐๆ‡‰็š„ไฟก่ณดๅบฆๅˆ†ๆ•ธไปฅๅŠ่งฃ็ญ”ๅœจ tokenized ๅพŒ็š„ๆ–‡ๆœฌไธญ้–‹ๅง‹ๅ’Œ็ตๆŸ็š„ไฝ็ฝฎใ€‚ไฝ ๅฏไปฅๅพž[้€™ๅ€‹ๆ•™ๅญธ](https://huggingface.co/docs/transformers/task_summary)ไบ†่งฃๆ›ดๅคš `pipeline` APIๆ”ฏๆด็š„ไปปๅ‹™ใ€‚ - -่ฆๅœจไฝ ็š„ไปปๅ‹™ไธญไธ‹่ผ‰ๅ’Œไฝฟ็”จไปปไฝ•้ ่จ“็ทดๆจกๅž‹ๅพˆ็ฐกๅ–ฎ๏ผŒๅช้œ€ไธ‰่กŒ็จ‹ๅผ็ขผใ€‚้€™่ฃกๆ˜ฏ PyTorch ็‰ˆ็š„็ฏ„ไพ‹๏ผš -```python ->>> from transformers import AutoTokenizer, AutoModel - ->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased") ->>> model = AutoModel.from_pretrained("bert-base-uncased") - ->>> inputs = tokenizer("Hello world!", return_tensors="pt") ->>> outputs = model(**inputs) -``` -้€™่ฃกๆ˜ฏๅฐๆ‡‰็š„ TensorFlow ็จ‹ๅผ็ขผ๏ผš -```python ->>> from transformers import AutoTokenizer, TFAutoModel - ->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased") ->>> model = TFAutoModel.from_pretrained("bert-base-uncased") - ->>> inputs = tokenizer("Hello world!", return_tensors="tf") ->>> outputs = model(**inputs) -``` - -Tokenizer ็‚บๆ‰€ๆœ‰็š„้ ่จ“็ทดๆจกๅž‹ๆไพ›ไบ†้ ่™•็†๏ผŒไธฆๅฏไปฅ็›ดๆŽฅ่ฝ‰ๆ›ๅ–ฎไธ€ๅญ—ไธฒ๏ผˆๆฏ”ๅฆ‚ไธŠ้ข็š„ไพ‹ๅญ๏ผ‰ๆˆ–ไธฒๅˆ— (list)ใ€‚ๅฎƒๆœƒ่ผธๅ‡บไธ€ๅ€‹็š„ๅญ—ๅ…ธ (dict) ่ฎ“ไฝ ๅฏไปฅๅœจไธ‹ๆธธ็จ‹ๅผ็ขผ่ฃกไฝฟ็”จๆˆ–็›ดๆŽฅ่—‰็”ฑ `**` ้‹็ฎ—ๅผๅ‚ณ็ตฆๆจกๅž‹ใ€‚ - -ๆจกๅž‹ๆœฌ่บซๆ˜ฏไธ€ๅ€‹ๅธธ่ฆ็š„ [Pytorch `nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) ๆˆ– [TensorFlow `tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model)๏ผˆๅ–ๆฑบๆ–ผไฝ ็š„ๅพŒ็ซฏ๏ผ‰๏ผŒๅฏไพๅธธ่ฆๆ–นๅผไฝฟ็”จใ€‚ [้€™ๅ€‹ๆ•™ๅญธ](https://huggingface.co/transformers/training.html)่งฃ้‡‹ไบ†ๅฆ‚ไฝ•ๅฐ‡้€™ๆจฃ็š„ๆจกๅž‹ๆ•ดๅˆๅˆฐไธ€่ˆฌ็š„ PyTorch ๆˆ– TensorFlow ่จ“็ทด่ฟดๅœˆไธญ๏ผŒๆˆ–ๆ˜ฏๅฆ‚ไฝ•ไฝฟ็”จๆˆ‘ๅ€‘็š„ `Trainer` API ๅœจไธ€ๅ€‹ๆ–ฐ็š„่ณ‡ๆ–™้›†ไธŠๅฟซ้€Ÿ้€ฒ่กŒๅพฎ่ชฟใ€‚ - -## ็‚บไป€้บผ่ฆ็”จ transformers๏ผŸ - -1. ไพฟๆ–ผไฝฟ็”จ็š„ๅ…ˆ้€ฒๆจกๅž‹๏ผš - - NLU ๅ’Œ NLG ไธŠๆ€ง่ƒฝๅ“่ถŠ - - ๅฐๆ•™ๅญธๅ’Œๅฏฆไฝœๅ‹ๅฅฝไธ”ไฝŽ้–€ๆชป - - ้ซ˜ๅบฆๆŠฝ่ฑก๏ผŒไฝฟ็”จ่€…ๅช้ ˆๅญธ็ฟ’ 3 ๅ€‹้กžๅˆฅ - - ๅฐๆ‰€ๆœ‰ๆจกๅž‹ไฝฟ็”จ็š„ๅˆถๅผๅŒ–API - -1. 
ๆ›ดไฝŽ็š„้‹็ฎ—ๆˆๆœฌ๏ผŒๆ›ดๅฐ‘็š„็ขณๆŽ’ๆ”พ๏ผš - - ็ ”็ฉถไบบๅ“กๅฏไปฅๅˆ†ไบซๅทฒ่จ“็ทด็š„ๆจกๅž‹่€Œ้žๆฏๆฌกๅพž้ ญ้–‹ๅง‹่จ“็ทด - - ๅทฅ็จ‹ๅธซๅฏไปฅๆธ›ๅฐ‘่จˆ็ฎ—ๆ™‚้–“ไปฅๅŠ็”Ÿ็”ขๆˆๆœฌ - - ๆ•ธๅ็จฎๆจกๅž‹ๆžถๆง‹ใ€ๅ…ฉๅƒๅคšๅ€‹้ ่จ“็ทดๆจกๅž‹ใ€100ๅคš็จฎ่ชž่จ€ๆ”ฏๆด - -1. ๅฐๆ–ผๆจกๅž‹็”Ÿๅ‘ฝ้€ฑๆœŸ็š„ๆฏไธ€ๅ€‹้ƒจๅˆ†้ƒฝ้ข้ขไฟฑๅˆฐ๏ผš - - ่จ“็ทดๅ…ˆ้€ฒ็š„ๆจกๅž‹๏ผŒๅช้œ€ 3 ่กŒ็จ‹ๅผ็ขผ - - ๆจกๅž‹ๅฏไปฅๅœจไธๅŒๆทฑๅบฆๅญธ็ฟ’ๆก†ๆžถไน‹้–“ไปปๆ„่ฝ‰ๆ› - - ็‚บ่จ“็ทดใ€่ฉ•ไผฐๅ’Œ็”Ÿ็”ข้ธๆ“‡ๆœ€้ฉๅˆ็š„ๆก†ๆžถ๏ผŒไธฆๅฎŒ็พŽ้ŠœๆŽฅ - -1. ็‚บไฝ ็š„้œ€ๆฑ‚่ผ•้ฌ†ๅฎข่ฃฝๅŒ–ๅฐˆๅฑฌๆจกๅž‹ๅ’Œ็ฏ„ไพ‹๏ผš - - ๆˆ‘ๅ€‘็‚บๆฏ็จฎๆจกๅž‹ๆžถๆง‹ๆไพ›ไบ†ๅคšๅ€‹็ฏ„ไพ‹ไพ†้‡็พๅŽŸ่ซ–ๆ–‡็ตๆžœ - - ไธ€่‡ด็š„ๆจกๅž‹ๅ…ง้ƒจๆžถๆง‹ - - ๆจกๅž‹ๆช”ๆกˆๅฏๅ–ฎ็จไฝฟ็”จ๏ผŒไพฟๆ–ผไฟฎๆ”นๅ’Œๅฟซ้€Ÿๅฏฆ้ฉ— - -## ไป€้บผๆƒ…ๆณไธ‹ๆˆ‘ไธ่ฉฒ็”จ transformers๏ผŸ - -- ๆœฌๅ‡ฝๅผๅบซไธฆไธๆ˜ฏๆจก็ต„ๅŒ–็š„็ฅž็ถ“็ถฒ็ตกๅทฅๅ…ท็ฎฑใ€‚ๆจกๅž‹ๆ–‡ไปถไธญ็š„็จ‹ๅผ็ขผไธฆๆœชๅš้กๅค–็š„ๆŠฝ่ฑกๅฐ่ฃ๏ผŒไปฅไพฟ็ ”็ฉถไบบๅ“กๅฟซ้€Ÿๅœฐ็ฟป้–ฑๅŠไฟฎๆ”น็จ‹ๅผ็ขผ๏ผŒ่€Œไธๆœƒๆทฑ้™ท่ค‡้›œ็š„้กžๅˆฅๅŒ…่ฃไน‹ไธญใ€‚ -- `Trainer` API ไธฆ้ž็›ธๅฎนไปปไฝ•ๆจกๅž‹๏ผŒๅฎƒๅช็‚บๆœฌๅ‡ฝๅผๅบซไธญ็š„ๆจกๅž‹ๆœ€ไฝณๅŒ–ใ€‚ๅฐๆ–ผไธ€่ˆฌ็š„ๆฉŸๅ™จๅญธ็ฟ’็”จ้€”๏ผŒ่ซ‹ไฝฟ็”จๅ…ถไป–ๅ‡ฝๅผๅบซใ€‚ -- ๅ„˜็ฎกๆˆ‘ๅ€‘ๅทฒ็›กๅŠ›่€Œ็‚บ๏ผŒ[examples ็›ฎ้Œ„](https://github.com/huggingface/transformers/tree/main/examples)ไธญ็š„่…ณๆœฌไนŸๅƒ…็‚บ็ฏ„ไพ‹่€Œๅทฒใ€‚ๅฐๆ–ผ็‰นๅฎšๅ•้กŒ๏ผŒๅฎƒๅ€‘ไธฆไธไธ€ๅฎš้šจ้ธๅณ็”จ๏ผŒๅฏ่ƒฝ้œ€่ฆไฟฎๆ”นๅนพ่กŒ็จ‹ๅผ็ขผไปฅ็ฌฆๅˆ้œ€ๆฑ‚ใ€‚ - -## ๅฎ‰่ฃ - -### ไฝฟ็”จ pip - -้€™ๅ€‹ Repository ๅทฒๅœจ Python 3.8+ใ€Flax 0.4.1+ใ€PyTorch 1.10+ ๅ’Œ TensorFlow 2.6+ ไธ‹็ถ“้Žๆธฌ่ฉฆใ€‚ - -ไฝ ๅฏไปฅๅœจ[่™›ๆ“ฌ็’ฐๅขƒ](https://docs.python.org/3/library/venv.html)ไธญๅฎ‰่ฃ ๐Ÿค— Transformersใ€‚ๅฆ‚ๆžœไฝ ้‚„ไธ็†Ÿๆ‚‰ Python ็š„่™›ๆ“ฌ็’ฐๅขƒ๏ผŒ่ซ‹้–ฑๆญค[ไฝฟ็”จ่€…ๆŒ‡ๅผ•](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/)ใ€‚ - -้ฆ–ๅ…ˆ๏ผŒ็”จไฝ ๆ‰“็ฎ—ไฝฟ็”จ็š„็‰ˆๆœฌ็š„ Python ๅ‰ตๅปบไธ€ๅ€‹่™›ๆ“ฌ็’ฐๅขƒไธฆ้€ฒๅ…ฅใ€‚ - -็„ถๅพŒ๏ผŒไฝ ้œ€่ฆๅฎ‰่ฃ Flaxใ€PyTorch ๆˆ– TensorFlow ๅ…ถไธญไน‹ไธ€ใ€‚ๅฐๆ–ผ่ฉฒๅฆ‚ไฝ•ๅœจไฝ ไฝฟ็”จ็š„ๅนณๅฐไธŠๅฎ‰่ฃ้€™ไบ›ๆก†ๆžถ๏ผŒ่ซ‹ๅƒ้–ฑ [TensorFlow ๅฎ‰่ฃ้ ้ข](https://www.tensorflow.org/install/), [PyTorch ๅฎ‰่ฃ้ ้ข](https://pytorch.org/get-started/locally/#start-locally) ๆˆ– [Flax ๅฎ‰่ฃ้ ้ข](https://github.com/google/flax#quick-install)ใ€‚ - -็•ถๅ…ถไธญไธ€ๅ€‹ๅพŒ็ซฏๅฎ‰่ฃๆˆๅŠŸๅพŒ๏ผŒ๐Ÿค— Transformers ๅฏไพๆญคๅฎ‰่ฃ๏ผš - -```bash -pip install transformers -``` - -ๅฆ‚ๆžœไฝ ๆƒณ่ฆ่ฉฆ่ฉฆ็ฏ„ไพ‹ๆˆ–่€…ๆƒณๅœจๆญฃๅผ็™ผๅธƒๅ‰ไฝฟ็”จๆœ€ๆ–ฐ้–‹็™ผไธญ็š„็จ‹ๅผ็ขผ๏ผŒไฝ ๅฟ…้ ˆ[ๅพžๅŽŸๅง‹็ขผๅฎ‰่ฃ](https://huggingface.co/docs/transformers/installation#installing-from-source)ใ€‚ - -### ไฝฟ็”จ conda - -่‡ช Transformers 4.0.0 ็‰ˆๅง‹๏ผŒๆˆ‘ๅ€‘ๆœ‰ไบ†ไธ€ๅ€‹ conda channel๏ผš `huggingface`ใ€‚ - -๐Ÿค— Transformers ๅฏไปฅ่—‰็”ฑ conda ไพๆญคๅฎ‰่ฃ๏ผš - -```shell script -conda install -c huggingface transformers -``` - -่ฆ่—‰็”ฑ conda ๅฎ‰่ฃ Flaxใ€PyTorch ๆˆ– TensorFlow ๅ…ถไธญไน‹ไธ€๏ผŒ่ซ‹ๅƒ้–ฑๅฎƒๅ€‘ๅ„่‡ชๅฎ‰่ฃ้ ้ข็š„่ชชๆ˜Žใ€‚ - -## ๆจกๅž‹ๆžถๆง‹ - -**๐Ÿค— Transformers ๆ”ฏๆด็š„[ๆ‰€ๆœ‰็š„ๆจกๅž‹ๆชขๆŸฅ้ปž](https://huggingface.co/models)**๏ผŒ็”ฑ[ไฝฟ็”จ่€…](https://huggingface.co/users)ๅ’Œ[็ต„็น”](https://huggingface.co/organizations)ไธŠๅ‚ณ๏ผŒๅ‡่ˆ‡ huggingface.co [model hub](https://huggingface.co) ๅฎŒ็พŽ็ตๅˆใ€‚ - -็›ฎๅ‰็š„ๆชขๆŸฅ้ปžๆ•ธ้‡๏ผš ![](https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen) - -๐Ÿค— Transformers ็›ฎๅ‰ๆ”ฏๆดไปฅไธ‹็š„ๆžถๆง‹๏ผˆๆจกๅž‹ๆฆ‚่ฆฝ่ซ‹ๅƒ้–ฑ[้€™่ฃก](https://huggingface.co/docs/transformers/model_summary)๏ผ‰๏ผš - -1. 
**[ALBERT](https://huggingface.co/docs/transformers/model_doc/albert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. -1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (from Google Research) released with the paper [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918) by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig. -1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (from BAAI) released with the paper [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell. -1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (from MIT) released with the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Yuan Gong, Yu-An Chung, James Glass. -1. **[Autoformer](https://huggingface.co/docs/transformers/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long. -1. **[Bark](https://huggingface.co/docs/transformers/model_doc/bark)** (from Suno) released in the repository [suno-ai/bark](https://github.com/suno-ai/bark) by Suno AI team. -1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer. -1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (from ร‰cole polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis. -1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen. -1. **[BEiT](https://huggingface.co/docs/transformers/model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong, Furu Wei. -1. **[BERT](https://huggingface.co/docs/transformers/model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. -1. 
**[BERT For Sequence Generation](https://huggingface.co/docs/transformers/model_doc/bert-generation)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn. -1. **[BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen. -1. **[BigBird-Pegasus](https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. -1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. -1. **[BioGpt](https://huggingface.co/docs/transformers/model_doc/biogpt)** (from Microsoft Research AI4Science) released with the paper [BioGPT: generative pre-trained transformer for biomedical text generation and mining](https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbac409/6713511?guestAccessKey=a66d9b5d-4f83-4017-bb52-405815c907b9) by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu. -1. **[BiT](https://huggingface.co/docs/transformers/model_doc/bit)** (from Google AI) released with the paper [Big Transfer (BiT): General Visual Representation Learning](https://arxiv.org/abs/1912.11370) by Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby. -1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston. -1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston. -1. **[BLIP](https://huggingface.co/docs/transformers/model_doc/blip)** (from Salesforce) released with the paper [BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation](https://arxiv.org/abs/2201.12086) by Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi. -1. **[BLIP-2](https://huggingface.co/docs/transformers/model_doc/blip-2)** (from Salesforce) released with the paper [BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models](https://arxiv.org/abs/2301.12597) by Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi. -1. **[BLOOM](https://huggingface.co/docs/transformers/model_doc/bloom)** (from BigScience workshop) released by the [BigScience Workshop](https://bigscience.huggingface.co/). -1. 
**[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry. -1. **[BridgeTower](https://huggingface.co/docs/transformers/model_doc/bridgetower)** (from Harbin Institute of Technology/Microsoft Research Asia/Intel Labs) released with the paper [BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning](https://arxiv.org/abs/2206.08657) by Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan. -1. **[BROS](https://huggingface.co/docs/transformers/model_doc/bros)** (from NAVER CLOVA) released with the paper [BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents](https://arxiv.org/abs/2108.04539) by Teakgyu Hong, Donghyun Kim, Mingi Ji, Wonseok Hwang, Daehyun Nam, Sungrae Park. -1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel. -1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suรกrez*, Yoann Dupont, Laurent Romary, ร‰ric Villemonte de la Clergerie, Djamรฉ Seddah and Benoรฎt Sagot. -1. **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting. -1. **[Chinese-CLIP](https://huggingface.co/docs/transformers/model_doc/chinese_clip)** (from OFA-Sys) released with the paper [Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese](https://arxiv.org/abs/2211.01335) by An Yang, Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou, Chang Zhou. -1. **[CLAP](https://huggingface.co/docs/transformers/model_doc/clap)** (from LAION-AI) released with the paper [Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation](https://arxiv.org/abs/2211.06687) by Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov. -1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever. -1. **[CLIPSeg](https://huggingface.co/docs/transformers/model_doc/clipseg)** (from University of Gรถttingen) released with the paper [Image Segmentation Using Text and Image Prompts](https://arxiv.org/abs/2112.10003) by Timo Lรผddecke and Alexander Ecker. -1. **[CLVP](https://huggingface.co/docs/transformers/model_doc/clvp)** released with the paper [Better speech synthesis through scaling](https://arxiv.org/abs/2305.07243) by James Betker. -1. 
**[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong. -1. **[CodeLlama](https://huggingface.co/docs/transformers/model_doc/llama_code)** (from MetaAI) released with the paper [Code Llama: Open Foundation Models for Code](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/) by Baptiste Roziรจre, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jรฉrรฉmy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Dรฉfossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve. -1. **[Conditional DETR](https://huggingface.co/docs/transformers/model_doc/conditional_detr)** (from Microsoft Research Asia) released with the paper [Conditional DETR for Fast Training Convergence](https://arxiv.org/abs/2108.06152) by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang. -1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan. -1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie. -1. **[ConvNeXTV2](https://huggingface.co/docs/transformers/model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie. -1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun. -1. **[CPM-Ant](https://huggingface.co/docs/transformers/model_doc/cpmant)** (from OpenBMB) released by the [OpenBMB](https://www.openbmb.org/). -1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher. -1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang. -1. 
**[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli. -1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. -1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. -1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch. -1. **[Deformable DETR](https://huggingface.co/docs/transformers/model_doc/deformable_detr)** (from SenseTime Research) released with the paper [Deformable DETR: Deformable Transformers for End-to-End Object Detection](https://arxiv.org/abs/2010.04159) by Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai. -1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervรฉ Jรฉgou. -1. **[DePlot](https://huggingface.co/docs/transformers/model_doc/deplot)** (from Google AI) released with the paper [DePlot: One-shot visual language reasoning by plot-to-table translation](https://arxiv.org/abs/2212.10505) by Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun. -1. **[DETA](https://huggingface.co/docs/transformers/model_doc/deta)** (from The University of Texas at Austin) released with the paper [NMS Strikes Back](https://arxiv.org/abs/2212.06137) by Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krรคhenbรผhl. -1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko. -1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan. -1. **[DiNAT](https://huggingface.co/docs/transformers/model_doc/dinat)** (from SHI Labs) released with the paper [Dilated Neighborhood Attention Transformer](https://arxiv.org/abs/2209.15001) by Ali Hassani and Humphrey Shi. -1. 
**[DINOv2](https://huggingface.co/docs/transformers/model_doc/dinov2)** (from Meta AI) released with the paper [DINOv2: Learning Robust Visual Features without Supervision](https://arxiv.org/abs/2304.07193) by Maxime Oquab, Timothรฉe Darcet, Thรฉo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervรฉ Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski. -1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/distillation) and a German version of DistilBERT. -1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei. -1. **[Donut](https://huggingface.co/docs/transformers/model_doc/donut)** (from NAVER) released with the paper [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park. -1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas OฤŸuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. -1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by Renรฉ Ranftl, Alexey Bochkovskiy, Vladlen Koltun. -1. **[EfficientFormer](https://huggingface.co/docs/transformers/model_doc/efficientformer)** (from Snap Research) released with the paper [EfficientFormer: Vision Transformers at MobileNetSpeed](https://arxiv.org/abs/2206.01191) by Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren. -1. **[EfficientNet](https://huggingface.co/docs/transformers/model_doc/efficientnet)** (from Google Brain) released with the paper [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946) by Mingxing Tan, Quoc V. Le. -1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning. -1. 
**[EnCodec](https://huggingface.co/docs/transformers/model_doc/encodec)** (from Meta AI) released with the paper [High Fidelity Neural Audio Compression](https://arxiv.org/abs/2210.13438) by Alexandre Dรฉfossez, Jade Copet, Gabriel Synnaeve, Yossi Adi. -1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn. -1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu. -1. **[ErnieM](https://huggingface.co/docs/transformers/model_doc/ernie_m)** (from Baidu) released with the paper [ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora](https://arxiv.org/abs/2012.15674) by Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang. -1. **[ESM](https://huggingface.co/docs/transformers/model_doc/esm)** (from Meta AI) are transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. **ESM-1v** was released with the paper [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648) by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. **ESM-2** was released with the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives. -1. **[Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)** (from Technology Innovation Institute) by Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme. -1. **[FLAN-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5)** (from Google AI) released in the repository [google-research/t5x](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints) by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei -1. 
**[FLAN-UL2](https://huggingface.co/docs/transformers/model_doc/flan-ul2)** (from Google AI) released in the repository [google-research/t5x](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-ul2-checkpoints) by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei -1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loรฏc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoรฎt Crabbรฉ, Laurent Besacier, Didier Schwab. -1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela. -1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon. -1. **[FocalNet](https://huggingface.co/docs/transformers/model_doc/focalnet)** (from Microsoft Research) released with the paper [Focal Modulation Networks](https://arxiv.org/abs/2203.11926) by Jianwei Yang, Chunyuan Li, Xiyang Dai, Lu Yuan, Jianfeng Gao. -1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le. -1. **[Fuyu](https://huggingface.co/docs/transformers/model_doc/fuyu)** (from ADEPT) Rohan Bavishi, Erich Elsen, Curtis Hawthorne, Maxwell Nye, Augustus Odena, Arushi Somani, SaฤŸnak TaลŸฤฑrlar. Released with the paper [blog post](https://www.adept.ai/blog/fuyu-8b) -1. **[GIT](https://huggingface.co/docs/transformers/model_doc/git)** (from Microsoft Research) released with the paper [GIT: A Generative Image-to-text Transformer for Vision and Language](https://arxiv.org/abs/2205.14100) by Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang. -1. **[GLPN](https://huggingface.co/docs/transformers/model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim. -1. **[GPT](https://huggingface.co/docs/transformers/model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. -1. 
**[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy. -1. **[GPT NeoX](https://huggingface.co/docs/transformers/model_doc/gpt_neox)** (from EleutherAI) released with the paper [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://arxiv.org/abs/2204.06745) by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach -1. **[GPT NeoX Japanese](https://huggingface.co/docs/transformers/model_doc/gpt_neox_japanese)** (from ABEJA) released by Shinya Otani, Takayoshi Makabe, Anuj Arora, and Kyo Hattori. -1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**. -1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki. -1. **[GPT-Sw3](https://huggingface.co/docs/transformers/model_doc/gpt-sw3)** (from AI-Sweden) released with the paper [Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish](http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.376.pdf) by Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey Öhman, Fredrik Carlsson, Magnus Sahlgren. -1. **[GPTBigCode](https://huggingface.co/docs/transformers/model_doc/gpt_bigcode)** (from BigCode) released with the paper [SantaCoder: don't reach for the stars!](https://arxiv.org/abs/2301.03988) by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra. -1. **[GPTSAN-japanese](https://huggingface.co/docs/transformers/model_doc/gptsan-japanese)** released in the repository [tanreinama/GPTSAN](https://github.com/tanreinama/GPTSAN/blob/main/report/model.md) by 坂本俊之(tanreinama). -1. **[Graphormer](https://huggingface.co/docs/transformers/model_doc/graphormer)** (from Microsoft) released with the paper [Do Transformers Really Perform Bad for Graph Representation?](https://arxiv.org/abs/2106.05234) by Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu. -1. 
**[GroupViT](https://huggingface.co/docs/transformers/model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang. -1. **[HerBERT](https://huggingface.co/docs/transformers/model_doc/herbert)** (from Allegro.pl, AGH University of Science and Technology) released with the paper [KLEJ: Comprehensive Benchmark for Polish Language Understanding](https://www.aclweb.org/anthology/2020.acl-main.111.pdf) by Piotr Rybak, Robert Mroczkowski, Janusz Tracz, Ireneusz Gawlik. -1. **[Hubert](https://huggingface.co/docs/transformers/model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed. -1. **[I-BERT](https://huggingface.co/docs/transformers/model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer. -1. **[IDEFICS](https://huggingface.co/docs/transformers/model_doc/idefics)** (from HuggingFace) released with the paper [OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents](https://huggingface.co/papers/2306.16527) by Hugo Laurenรงon, Lucile Saulnier, Lรฉo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh. -1. **[ImageGPT](https://huggingface.co/docs/transformers/model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever. -1. **[Informer](https://huggingface.co/docs/transformers/model_doc/informer)** (from Beihang University, UC Berkeley, Rutgers University, SEDD Company) released with the paper [Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting](https://arxiv.org/abs/2012.07436) by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. -1. **[InstructBLIP](https://huggingface.co/docs/transformers/model_doc/instructblip)** (from Salesforce) released with the paper [InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning](https://arxiv.org/abs/2305.06500) by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi. -1. **[Jukebox](https://huggingface.co/docs/transformers/model_doc/jukebox)** (from OpenAI) released with the paper [Jukebox: A Generative Model for Music](https://arxiv.org/pdf/2005.00341.pdf) by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever. -1. **[KOSMOS-2](https://huggingface.co/docs/transformers/model_doc/kosmos-2)** (from Microsoft Research Asia) released with the paper [Kosmos-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/abs/2306.14824) by Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei. -1. 
**[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou. -1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou. -1. **[LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)** (from Microsoft Research Asia) released with the paper [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei. -1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutxlm)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei. -1. **[LED](https://huggingface.co/docs/transformers/model_doc/led)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan. -1. **[LeViT](https://huggingface.co/docs/transformers/model_doc/levit)** (from Meta AI) released with the paper [LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference](https://arxiv.org/abs/2104.01136) by Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervรฉ Jรฉgou, Matthijs Douze. -1. **[LiLT](https://huggingface.co/docs/transformers/model_doc/lilt)** (from South China University of Technology) released with the paper [LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding](https://arxiv.org/abs/2202.13669) by Jiapeng Wang, Lianwen Jin, Kai Ding. -1. **[LLaMA](https://huggingface.co/docs/transformers/model_doc/llama)** (from The FAIR team of Meta AI) released with the paper [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971) by Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothรฉe Lacroix, Baptiste Roziรจre, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample. -1. 
**[Llama2](https://huggingface.co/docs/transformers/model_doc/llama2)** (from The FAIR team of Meta AI) released with the paper [Llama2: Open Foundation and Fine-Tuned Chat Models](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/) by Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom. -1. **[LLaVa](https://huggingface.co/docs/transformers/model_doc/llava)** (from Microsoft Research & University of Wisconsin-Madison) released with the paper [Visual Instruction Tuning](https://arxiv.org/abs/2304.08485) by Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee. -1. **[Longformer](https://huggingface.co/docs/transformers/model_doc/longformer)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan. -1. **[LongT5](https://huggingface.co/docs/transformers/model_doc/longt5)** (from Google AI) released with the paper [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916) by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang. -1. **[LUKE](https://huggingface.co/docs/transformers/model_doc/luke)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto. -1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal. -1. **[M-CTC-T](https://huggingface.co/docs/transformers/model_doc/mctct)** (from Facebook) released with the paper [Pseudo-Labeling For Massively Multilingual Speech Recognition](https://arxiv.org/abs/2111.00161) by Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert. -1. **[M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin. -1. 
**[MADLAD-400](https://huggingface.co/docs/transformers/model_doc/madlad-400)** (from Google) released with the paper [MADLAD-400: A Multilingual And Document-Level Large Audited Dataset](https://arxiv.org/abs/2309.04662) by Sneha Kudugunta, Isaac Caswell, Biao Zhang, Xavier Garcia, Christopher A. Choquette-Choo, Katherine Lee, Derrick Xin, Aditya Kusupati, Romi Stella, Ankur Bapna, Orhan Firat. -1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jรถrg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team. -1. **[MarkupLM](https://huggingface.co/docs/transformers/model_doc/markuplm)** (from Microsoft Research Asia) released with the paper [MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding](https://arxiv.org/abs/2110.08518) by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei. -1. **[Mask2Former](https://huggingface.co/docs/transformers/model_doc/mask2former)** (from FAIR and UIUC) released with the paper [Masked-attention Mask Transformer for Universal Image Segmentation](https://arxiv.org/abs/2112.01527) by Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar. -1. **[MaskFormer](https://huggingface.co/docs/transformers/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov -1. **[MatCha](https://huggingface.co/docs/transformers/model_doc/matcha)** (from Google AI) released with the paper [MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering](https://arxiv.org/abs/2212.09662) by Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, Julian Martin Eisenschlos. -1. **[mBART](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer. -1. **[mBART-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan. -1. **[MEGA](https://huggingface.co/docs/transformers/model_doc/mega)** (from Facebook) released with the paper [Mega: Moving Average Equipped Gated Attention](https://arxiv.org/abs/2209.10655) by Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, and Luke Zettlemoyer. -1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron-bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro. -1. 
**[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro. -1. **[MGP-STR](https://huggingface.co/docs/transformers/model_doc/mgp-str)** (from Alibaba Research) released with the paper [Multi-Granularity Prediction for Scene Text Recognition](https://arxiv.org/abs/2209.03592) by Peng Wang, Cheng Da, and Cong Yao. -1. **[Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)** (from Mistral AI) by The Mistral AI team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lรฉlio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothรฉe Lacroix, William El Sayed.. -1. **[Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lรฉlio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothรฉe Lacroix, William El Sayed. -1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka. -1. **[MMS](https://huggingface.co/docs/transformers/model_doc/mms)** (from Facebook) released with the paper [Scaling Speech Technology to 1,000+ Languages](https://arxiv.org/abs/2305.13516) by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli. -1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou. -1. **[MobileNetV1](https://huggingface.co/docs/transformers/model_doc/mobilenet_v1)** (from Google Inc.) released with the paper [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861) by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam. -1. **[MobileNetV2](https://huggingface.co/docs/transformers/model_doc/mobilenet_v2)** (from Google Inc.) released with the paper [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381) by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen. -1. **[MobileViT](https://huggingface.co/docs/transformers/model_doc/mobilevit)** (from Apple) released with the paper [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/abs/2110.02178) by Sachin Mehta and Mohammad Rastegari. -1. 
**[MobileViTV2](https://huggingface.co/docs/transformers/model_doc/mobilevitv2)** (from Apple) released with the paper [Separable Self-attention for Mobile Vision Transformers](https://arxiv.org/abs/2206.02680) by Sachin Mehta and Mohammad Rastegari. -1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu. -1. **[MPT](https://huggingface.co/docs/transformers/model_doc/mpt)** (from MosaicML) released in the repository [llm-foundry](https://github.com/mosaicml/llm-foundry/) by the MosaicML NLP Team. -1. **[MRA](https://huggingface.co/docs/transformers/model_doc/mra)** (from the University of Wisconsin - Madison) released with the paper [Multi Resolution Analysis (MRA)](https://arxiv.org/abs/2207.10284) by Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, Vikas Singh. -1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel. -1. **[MusicGen](https://huggingface.co/docs/transformers/model_doc/musicgen)** (from Meta) released with the paper [Simple and Controllable Music Generation](https://arxiv.org/abs/2306.05284) by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Défossez. -1. **[MVP](https://huggingface.co/docs/transformers/model_doc/mvp)** (from RUC AI Box) released with the paper [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https://arxiv.org/abs/2206.12131) by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen. -1. **[NAT](https://huggingface.co/docs/transformers/model_doc/nat)** (from SHI Labs) released with the paper [Neighborhood Attention Transformer](https://arxiv.org/abs/2204.07143) by Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi. -1. **[Nezha](https://huggingface.co/docs/transformers/model_doc/nezha)** (from Huawei Noah's Ark Lab) released with the paper [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) by Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu. -1. **[NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team. -1. **[NLLB-MOE](https://huggingface.co/docs/transformers/model_doc/nllb-moe)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team. -1. **[Nougat](https://huggingface.co/docs/transformers/model_doc/nougat)** (from Meta AI) released with the paper [Nougat: Neural Optical Understanding for Academic Documents](https://arxiv.org/abs/2308.13418) by Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic. -1. 
**[Nystrรถmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nystrรถmformer: A Nystrรถm-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh. -1. **[OneFormer](https://huggingface.co/docs/transformers/model_doc/oneformer)** (from SHI Labs) released with the paper [OneFormer: One Transformer to Rule Universal Image Segmentation](https://arxiv.org/abs/2211.06220) by Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi. -1. **[OpenLlama](https://huggingface.co/docs/transformers/model_doc/open-llama)** (from [s-JoL](https://huggingface.co/s-JoL)) released on GitHub (now removed). -1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al. -1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby. -1. **[OWLv2](https://huggingface.co/docs/transformers/model_doc/owlv2)** (from Google AI) released with the paper [Scaling Open-Vocabulary Object Detection](https://arxiv.org/abs/2306.09683) by Matthias Minderer, Alexey Gritsenko, Neil Houlsby. -1. **[PatchTSMixer](https://huggingface.co/docs/transformers/model_doc/patchtsmixer)** (from IBM Research) released with the paper [TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting](https://arxiv.org/pdf/2306.09364.pdf) by Vijay Ekambaram, Arindam Jati, Nam Nguyen, Phanwadee Sinthong, Jayant Kalagnanam. -1. **[PatchTST](https://huggingface.co/docs/transformers/model_doc/patchtst)** (from IBM) released with the paper [A Time Series is Worth 64 Words: Long-term Forecasting with Transformers](https://arxiv.org/pdf/2211.14730.pdf) by Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, Jayant Kalagnanam. -1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu. -1. **[PEGASUS-X](https://huggingface.co/docs/transformers/model_doc/pegasus_x)** (from Google) released with the paper [Investigating Efficiently Extending Transformers for Long Input Summarization](https://arxiv.org/abs/2208.04347) by Jason Phang, Yao Zhao, Peter J. Liu. -1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hรฉnaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, Joรฃo Carreira. -1. 
**[Persimmon](https://huggingface.co/docs/transformers/model_doc/persimmon)** (from ADEPT) released with the paper [blog post](https://www.adept.ai/blog/persimmon-8b) by Erich Elsen, Augustus Odena, Maxwell Nye, SaฤŸnak TaลŸฤฑrlar, Tri Dao, Curtis Hawthorne, Deepak Moparthi, Arushi Somani. -1. **[Phi](https://huggingface.co/docs/transformers/model_doc/phi)** (from Microsoft) released with the papers - [Textbooks Are All You Need](https://arxiv.org/abs/2306.11644) by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio Cรฉsar Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sรฉbastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li, [Textbooks Are All You Need II: phi-1.5 technical report](https://arxiv.org/abs/2309.05463) by Yuanzhi Li, Sรฉbastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee. -1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen. -1. **[Pix2Struct](https://huggingface.co/docs/transformers/model_doc/pix2struct)** (from Google) released with the paper [Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding](https://arxiv.org/abs/2210.03347) by Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova. -1. **[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang. -1. **[PoolFormer](https://huggingface.co/docs/transformers/model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng. -1. **[Pop2Piano](https://huggingface.co/docs/transformers/model_doc/pop2piano)** released with the paper [Pop2Piano : Pop Audio-based Piano Cover Generation](https://arxiv.org/abs/2211.00895) by Jongho Choi, Kyogu Lee. -1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou. -1. **[PVT](https://huggingface.co/docs/transformers/model_doc/pvt)** (from Nanjing University, The University of Hong Kong etc.) released with the paper [Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions](https://arxiv.org/pdf/2102.12122.pdf) by Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao. -1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius. -1. 
**[RAG](https://huggingface.co/docs/transformers/model_doc/rag)** (from Facebook) released with the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Kรผttler, Mike Lewis, Wen-tau Yih, Tim Rocktรคschel, Sebastian Riedel, Douwe Kiela. -1. **[REALM](https://huggingface.co/docs/transformers/model_doc/realm.html)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang. -1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, ลukasz Kaiser, Anselm Levskaya. -1. **[RegNet](https://huggingface.co/docs/transformers/model_doc/regnet)** (from META Research) released with the paper [Designing Network Design Space](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollรกr. -1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/pdf/2010.12821.pdf) by Hyung Won Chung, Thibault Fรฉvry, Henry Tsai, M. Johnson, Sebastian Ruder. -1. **[ResNet](https://huggingface.co/docs/transformers/model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. -1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (from Facebook), released together with the paper a [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. -1. **[RoBERTa-PreLayerNorm](https://huggingface.co/docs/transformers/model_doc/roberta-prelayernorm)** (from Facebook) released with the paper [fairseq: A Fast, Extensible Toolkit for Sequence Modeling](https://arxiv.org/abs/1904.01038) by Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli. -1. **[RoCBert](https://huggingface.co/docs/transformers/model_doc/roc_bert)** (from WeChatAI) released with the paper [RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining](https://aclanthology.org/2022.acl-long.65.pdf) by HuiSu, WeiweiShi, XiaoyuShen, XiaoZhou, TuoJi, JiaruiFang, JieZhou. -1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper a [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/pdf/2104.09864v1.pdf) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu. -1. **[RWKV](https://huggingface.co/docs/transformers/model_doc/rwkv)** (from Bo Peng) released with the paper [this repo](https://github.com/BlinkDL/RWKV-LM) by Bo Peng. -1. 
**[SeamlessM4T](https://huggingface.co/docs/transformers/model_doc/seamless_m4t)** (from Meta AI) released with the paper [SeamlessM4T โ€” Massively Multilingual & Multimodal Machine Translation](https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf) by the Seamless Communication team. -1. **[SeamlessM4Tv2](https://huggingface.co/docs/transformers/model_doc/seamless_m4t_v2)** (from Meta AI) released with the paper [Seamless: Multilingual Expressive and Streaming Speech Translation](https://ai.meta.com/research/publications/seamless-multilingual-expressive-and-streaming-speech-translation/) by the Seamless Communication team. -1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo. -1. **[Segment Anything](https://huggingface.co/docs/transformers/model_doc/sam)** (from Meta AI) released with the paper [Segment Anything](https://arxiv.org/pdf/2304.02643v1.pdf) by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick. -1. **[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi. -1. **[SEW-D](https://huggingface.co/docs/transformers/model_doc/sew_d)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi. -1. **[SpeechT5](https://huggingface.co/docs/transformers/model_doc/speecht5)** (from Microsoft Research) released with the paper [SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing](https://arxiv.org/abs/2110.07205) by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei. -1. **[SpeechToTextTransformer](https://huggingface.co/docs/transformers/model_doc/speech_to_text)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino. -1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (from Facebook) released with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau. -1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (from Tel Aviv University) released with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy. -1. **[SqueezeBERT](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. 
Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer. -1. **[SwiftFormer](https://huggingface.co/docs/transformers/model_doc/swiftformer)** (from MBZUAI) released with the paper [SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications](https://arxiv.org/abs/2303.15446) by Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan. -1. **[Swin Transformer](https://huggingface.co/docs/transformers/model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo. -1. **[Swin Transformer V2](https://huggingface.co/docs/transformers/model_doc/swinv2)** (from Microsoft) released with the paper [Swin Transformer V2: Scaling Up Capacity and Resolution](https://arxiv.org/abs/2111.09883) by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo. -1. **[Swin2SR](https://huggingface.co/docs/transformers/model_doc/swin2sr)** (from University of Wรผrzburg) released with the paper [Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration](https://arxiv.org/abs/2209.11345) by Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte. -1. **[SwitchTransformers](https://huggingface.co/docs/transformers/model_doc/switch_transformers)** (from Google) released with the paper [Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity](https://arxiv.org/abs/2101.03961) by William Fedus, Barret Zoph, Noam Shazeer. -1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu. -1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released with the paper [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu. -1. **[Table Transformer](https://huggingface.co/docs/transformers/model_doc/table-transformer)** (from Microsoft Research) released with the paper [PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents](https://arxiv.org/abs/2110.00061) by Brandon Smock, Rohith Pesala, Robin Abraham. -1. **[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweล‚ Krzysztof Nowak, Thomas Mรผller, Francesco Piccinno and Julian Martin Eisenschlos. -1. **[TAPEX](https://huggingface.co/docs/transformers/model_doc/tapex)** (from Microsoft Research) released with the paper [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. -1. 
**[Time Series Transformer](https://huggingface.co/docs/transformers/model_doc/time_series_transformer)** (from HuggingFace). -1. **[TimeSformer](https://huggingface.co/docs/transformers/model_doc/timesformer)** (from Facebook) released with the paper [Is Space-Time Attention All You Need for Video Understanding?](https://arxiv.org/abs/2102.05095) by Gedas Bertasius, Heng Wang, Lorenzo Torresani. -1. **[Trajectory Transformer](https://huggingface.co/docs/transformers/model_doc/trajectory_transformers)** (from the University of California at Berkeley) released with the paper [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https://arxiv.org/abs/2106.02039) by Michael Janner, Qiyang Li, Sergey Levine -1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov. -1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft) released with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei. -1. **[TVLT](https://huggingface.co/docs/transformers/model_doc/tvlt)** (from UNC Chapel Hill) released with the paper [TVLT: Textless Vision-Language Transformer](https://arxiv.org/abs/2209.14156) by Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal. -1. **[TVP](https://huggingface.co/docs/transformers/model_doc/tvp)** (from Intel) released with the paper [Text-Visual Prompting for Efficient 2D Temporal Video Grounding](https://arxiv.org/abs/2303.04995) by Yimeng Zhang, Xin Chen, Jinghan Jia, Sijia Liu, Ke Ding. -1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler -1. **[UMT5](https://huggingface.co/docs/transformers/model_doc/umt5)** (from Google Research) released with the paper [UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining](https://openreview.net/forum?id=kXwdL1cWOAi) by Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant. -1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang. -1. **[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu. -1. 
**[UnivNet](https://huggingface.co/docs/transformers/model_doc/univnet)** (from Kakao Corporation) released with the paper [UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation](https://arxiv.org/abs/2106.07889) by Won Jang, Dan Lim, Jaesam Yoon, Bongwan Kim, and Juntae Kim. -1. **[UPerNet](https://huggingface.co/docs/transformers/model_doc/upernet)** (from Peking University) released with the paper [Unified Perceptual Parsing for Scene Understanding](https://arxiv.org/abs/1807.10221) by Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun. -1. **[VAN](https://huggingface.co/docs/transformers/model_doc/van)** (from Tsinghua University and Nankai University) released with the paper [Visual Attention Network](https://arxiv.org/pdf/2202.09741.pdf) by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu. -1. **[VideoMAE](https://huggingface.co/docs/transformers/model_doc/videomae)** (from Multimedia Computing Group, Nanjing University) released with the paper [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) by Zhan Tong, Yibing Song, Jue Wang, Limin Wang. -1. **[ViLT](https://huggingface.co/docs/transformers/model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim. -1. **[VipLlava](https://huggingface.co/docs/transformers/model_doc/vipllava)** (from University of Wisconsinโ€“Madison) released with the paper [Making Large Multimodal Models Understand Arbitrary Visual Prompts](https://arxiv.org/abs/2312.00784) by Mu Cai, Haotian Liu, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Dennis Park, Yong Jae Lee. -1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby. -1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang. -1. **[ViT Hybrid](https://huggingface.co/docs/transformers/model_doc/vit_hybrid)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby. -1. **[VitDet](https://huggingface.co/docs/transformers/model_doc/vitdet)** (from Meta AI) released with the paper [Exploring Plain Vision Transformer Backbones for Object Detection](https://arxiv.org/abs/2203.16527) by Yanghao Li, Hanzi Mao, Ross Girshick, Kaiming He. -1. 
**[ViTMAE](https://huggingface.co/docs/transformers/model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollรกr, Ross Girshick. -1. **[ViTMatte](https://huggingface.co/docs/transformers/model_doc/vitmatte)** (from HUST-VL) released with the paper [ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers](https://arxiv.org/abs/2305.15272) by Jingfeng Yao, Xinggang Wang, Shusheng Yang, Baoyuan Wang. -1. **[ViTMSN](https://huggingface.co/docs/transformers/model_doc/vit_msn)** (from Meta AI) released with the paper [Masked Siamese Networks for Label-Efficient Learning](https://arxiv.org/abs/2204.07141) by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas. -1. **[VITS](https://huggingface.co/docs/transformers/model_doc/vits)** (from Kakao Enterprise) released with the paper [Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech](https://arxiv.org/abs/2106.06103) by Jaehyeon Kim, Jungil Kong, Juhee Son. -1. **[ViViT](https://huggingface.co/docs/transformers/model_doc/vivit)** (from Google Research) released with the paper [ViViT: A Video Vision Transformer](https://arxiv.org/abs/2103.15691) by Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Luฤiฤ‡, Cordelia Schmid. -1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli. -1. **[Wav2Vec2-Conformer](https://huggingface.co/docs/transformers/model_doc/wav2vec2-conformer)** (from Facebook AI) released with the paper [FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino. -1. **[Wav2Vec2Phoneme](https://huggingface.co/docs/transformers/model_doc/wav2vec2_phoneme)** (from Facebook AI) released with the paper [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https://arxiv.org/abs/2109.11680) by Qiantong Xu, Alexei Baevski, Michael Auli. -1. **[WavLM](https://huggingface.co/docs/transformers/model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei. -1. **[Whisper](https://huggingface.co/docs/transformers/model_doc/whisper)** (from OpenAI) released with the paper [Robust Speech Recognition via Large-Scale Weak Supervision](https://cdn.openai.com/papers/whisper.pdf) by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever. -1. **[X-CLIP](https://huggingface.co/docs/transformers/model_doc/xclip)** (from Microsoft Research) released with the paper [Expanding Language-Image Pretrained Models for General Video Recognition](https://arxiv.org/abs/2208.02816) by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling. -1. 
**[X-MOD](https://huggingface.co/docs/transformers/model_doc/xmod)** (from Meta AI) released with the paper [Lifting the Curse of Multilinguality by Pre-training Modular Transformers](http://dx.doi.org/10.18653/v1/2022.naacl-main.255) by Jonas Pfeiffer, Naman Goyal, Xi Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe. -1. **[XGLM](https://huggingface.co/docs/transformers/model_doc/xglm)** (From Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li. -1. **[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau. -1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou. -1. **[XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlm-roberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmรกn, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. -1. **[XLM-RoBERTa-XL](https://huggingface.co/docs/transformers/model_doc/xlm-roberta-xl)** (from Facebook AI) released with the paper [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau. -1. **[XLM-V](https://huggingface.co/docs/transformers/model_doc/xlm-v)** (from Meta AI) released with the paper [XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models](https://arxiv.org/abs/2301.10472) by Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa. -1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (from Google/CMU) released with the paper [โ€‹XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le. -1. **[XLS-R](https://huggingface.co/docs/transformers/model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli. -1. 
**[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli. -1. **[YOLOS](https://huggingface.co/docs/transformers/model_doc/yolos)** (from Huazhong University of Science & Technology) released with the paper [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu. -1. **[YOSO](https://huggingface.co/docs/transformers/model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh. -1. Want to contribute a new model? We have a **detailed guide and templates** to walk you through the process of adding a new model. You can find them in the [`templates`](./templates) directory. Remember to check the [contributing guidelines](./CONTRIBUTING.md) and reach out to the maintainers or open a new issue to collect feedback before starting your PR. - -To check whether a model already has a Flax, PyTorch or TensorFlow implementation, or a matching tokenizer in the 🤗 Tokenizers library, please refer to [this table](https://huggingface.co/docs/transformers/index#supported-frameworks). - -These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations. You can find more details on the implementations in the [examples section](https://huggingface.co/docs/transformers/examples) of the documentation. - - -## Learn more - -| Section | Description | -|-|-| -| [Documentation](https://huggingface.co/transformers/) | Full API documentation and tutorials | -| [Task summary](https://huggingface.co/docs/transformers/task_summary) | Tasks supported by 🤗 Transformers | -| [Preprocessing tutorial](https://huggingface.co/docs/transformers/preprocessing) | Using the `Tokenizer` class to prepare data for the models | -| [Training and fine-tuning](https://huggingface.co/docs/transformers/training) | Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop and with the `Trainer` API | -| [Quick tour: fine-tuning and example scripts](https://github.com/huggingface/transformers/tree/main/examples) | Example scripts for a wide range of tasks | -| [Model sharing and uploading](https://huggingface.co/docs/transformers/model_sharing) | Upload and share your fine-tuned models with the community | -| [Migration](https://huggingface.co/docs/transformers/migration) | Migrate to 🤗 Transformers from `pytorch-transformers` or `pytorch-pretrained-bert` | - -## Citation - -We have officially published a [paper](https://www.aclweb.org/anthology/2020.emnlp-demos.6/) for this library. If you use the 🤗 Transformers library, you can cite: -```bibtex -@inproceedings{wolf-etal-2020-transformers, - title = "Transformers: State-of-the-Art Natural Language Processing", - author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. 
Rush", - booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations", - month = oct, - year = "2020", - address = "Online", - publisher = "Association for Computational Linguistics", - url = "https://www.aclweb.org/anthology/2020.emnlp-demos.6", - pages = "38--45" -} -``` diff --git a/SECURITY.md b/SECURITY.md index a16cfe099f8f78..fcb8b9b6f18f28 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -1,6 +1,40 @@ # Security Policy +## Hugging Face Hub, remote artefacts, and remote code + +Transformers is open-source software that is tightly coupled to the Hugging Face Hub. While you have the ability to use it +offline with pre-downloaded model weights, it provides a very simple way to download, use, and manage models locally. + +When downloading artefacts that have been uploaded by others on any platform, you expose yourself to risks. Please +read below for the security recommendations in order to keep your runtime and local environment safe. + +### Remote artefacts + +Models uploaded on the Hugging Face Hub come in different formats. We heavily recommend uploading and downloading +models in the [`safetensors`](https://github.com/huggingface/safetensors) format (which is the default prioritized +by the transformers library), as developed specifically to prevent arbitrary code execution on your system. + +To avoid loading models from unsafe formats(e.g. [pickle](https://docs.python.org/3/library/pickle.html), you should use the `use_safetensors` parameter. If doing so, in the event that no .safetensors file is present, transformers will error when loading the model. + +### Remote code + +#### Modeling + +Transformers supports many model architectures, but is also the bridge between your Python runtime and models that +are stored in model repositories on the Hugging Face Hub. + +These models require the `trust_remote_code=True` parameter to be set when using them; please **always** verify +the content of the modeling files when using this argument. We recommend setting a revision in order to ensure you +protect yourself from updates on the repository. + +#### Tools + +Through the `Agent` framework, remote tools can be downloaded to be used by the Agent. You're to specify these tools +yourself, but please keep in mind that their code will be run on your machine if the Agent chooses to run them. + +Please inspect the code of the tools before passing them to the Agent to protect your runtime and local setup. + ## Reporting a Vulnerability -๐Ÿค— We have our bug bounty program set up with HackerOne. Please feel free to submit vulnerability reports to our private program at https://hackerone.com/hugging_face. +๐Ÿค— Please feel free to submit vulnerability reports to our private bug bounty program at https://hackerone.com/hugging_face. You'll need to request access to the program by emailing security@huggingface.co. Note that you'll need to be invited to our program, so send us a quick email at security@huggingface.co if you've found a vulnerability. diff --git a/awesome-transformers.md b/awesome-transformers.md index 013f88259c91e4..d55e276841a3b0 100644 --- a/awesome-transformers.md +++ b/awesome-transformers.md @@ -21,7 +21,7 @@ This repository contains examples and best practices for building recommendation Keywords: Recommender systems, AzureML -## [lama-cleaner](https://github.com/Sanster/lama-cleaner) +## [IOPaint](https://github.com/Sanster/IOPaint) Image inpainting tool powered by Stable Diffusion. 
diff --git a/awesome-transformers.md b/awesome-transformers.md index 013f88259c91e4..d55e276841a3b0 100644 --- a/awesome-transformers.md +++ b/awesome-transformers.md @@ -21,7 +21,7 @@ This repository contains examples and best practices for building recommendation Keywords: Recommender systems, AzureML -## [lama-cleaner](https://github.com/Sanster/lama-cleaner) +## [IOPaint](https://github.com/Sanster/IOPaint) Image inpainting tool powered by Stable Diffusion. Remove any unwanted objects, defects, or people from your pictures, or erase and replace anything in your pictures. @@ -105,9 +105,9 @@ An open-source Implementation of Imagen, Google's closed-source Text-to-Image Ne Keywords: Imagen, Text-to-image -## [adapter-transformers](https://github.com/adapter-hub/adapter-transformers) +## [adapters](https://github.com/adapter-hub/adapters) -[adapter-transformers](https://github.com/adapter-hub/adapter-transformers) is an extension of HuggingFace's Transformers library, integrating adapters into state-of-the-art language models by incorporating AdapterHub, a central repository for pre-trained adapter modules. It is a drop-in replacement for transformers, which is regularly updated to stay up-to-date with the developments of transformers. +[adapters](https://github.com/adapter-hub/adapters) is an extension of HuggingFace's Transformers library, integrating adapters into state-of-the-art language models by incorporating AdapterHub, a central repository for pre-trained adapter modules. It is a drop-in replacement for transformers, which is regularly updated to stay up-to-date with the developments of transformers. Keywords: Adapters, LoRA, Parameter-efficient fine-tuning, Hub @@ -596,14 +596,14 @@ Keywords: Data-Centric AI, Data Quality, Noisy Labels, Outlier Detection, Active ## [BentoML](https://github.com/bentoml/BentoML) -[BentoML](https://github.com/bentoml) is the unified framework for for building, shipping, and scaling production-ready AI applications incorporating traditional ML, pre-trained AI models, Generative and Large Language Models. +[BentoML](https://github.com/bentoml) is the unified framework for building, shipping, and scaling production-ready AI applications incorporating traditional ML, pre-trained AI models, Generative and Large Language Models. All Hugging Face models and pipelines can be seamlessly integrated into BentoML applications, enabling the running of models on the most suitable hardware and independent scaling based on usage. Keywords: BentoML, Framework, Deployment, AI Applications -## [LLaMA-Efficient-Tuning](https://github.com/hiyouga/LLaMA-Efficient-Tuning) +## [LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory) -[LLaMA-Efficient-Tuning](https://github.com/hiyouga/LLaMA-Efficient-Tuning) offers a user-friendly fine-tuning framework that incorporates PEFT. The repository includes training(fine-tuning) and inference examples for LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, and other LLMs. A ChatGLM version is also available in [ChatGLM-Efficient-Tuning](https://github.com/hiyouga/ChatGLM-Efficient-Tuning). +[LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory) offers a user-friendly fine-tuning framework that incorporates PEFT. The repository includes training (fine-tuning) and inference examples for LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, and other LLMs. A ChatGLM version is also available in [ChatGLM-Efficient-Tuning](https://github.com/hiyouga/ChatGLM-Efficient-Tuning). Keywords: PEFT, fine-tuning, LLaMA-2, ChatGLM, Qwen diff --git a/tests/models/deta/__init__.py b/benchmark/__init__.py similarity index 100% rename from tests/models/deta/__init__.py rename to benchmark/__init__.py diff --git a/benchmark/benchmark.py b/benchmark/benchmark.py new file mode 100644 index 00000000000000..304bbd4441cf66 --- /dev/null +++ b/benchmark/benchmark.py @@ -0,0 +1,326 @@ +# Copyright 2024 The HuggingFace Team. All rights reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +""" +Run benchmark using the `optimum-benchmark` library with some customization in `transformers`. + +Assume we are under `transformers` root directory: (make sure the commits are valid commits) +```bash +python benchmark/benchmark.py --config-dir benchmark/config --config-name generation --commit=9b9c7f03da625b13643e99205c691fe046461724 --metrics=decode.latency.mean,per_token.latency.mean,per_token.throughput.value backend.model=google/gemma-2b benchmark.input_shapes.sequence_length=5,7 benchmark.input_shapes.batch_size=1,2 --multirun +``` +""" + +import argparse +import glob +import json +import os.path +import re +import tempfile +from contextlib import contextmanager +from pathlib import Path + +from git import Repo + +from huggingface_hub import HfApi + +from optimum_benchmark import Benchmark +from optimum_benchmark_wrapper import main + + +PATH_TO_REPO = Path(__file__).parent.parent.resolve() + + +@contextmanager +def checkout_commit(repo: Repo, commit_id: str): + """ + Context manager that checks out a given commit when entered, but gets back to the reference it was at on exit. + Args: + repo (`git.Repo`): A git repository (for instance the Transformers repo). + commit_id (`str`): The commit reference to checkout inside the context manager. + """ + current_head = repo.head.commit if repo.head.is_detached else repo.head.ref + + try: + repo.git.checkout(commit_id) + yield + + finally: + repo.git.checkout(current_head) + + +def summarize(run_dir, metrics, expand_metrics=False): + """Produce a summary for each optimum-benchmark launched job's output directory found in `run_dir`. + + Each summary's format is as follows (for `expand_metrics=False`): + ``` + { + "model": "google/gemma-2b", + "commit": "3cd6ed22e4d49219f300f5055e71e3929aba20d7", + "config": "benchmark.input_shapes.batch_size=1,benchmark.input_shapes.sequence_length=5", + "metrics": { + "decode.latency.mean": 1.624666809082031, + "per_token.latency.mean": 0.012843788806628804, + "per_token.throughput.value": 77.85864553330948 + } + } + ``` + """ + reports = glob.glob(os.path.join(run_dir, "**/benchmark_report.json"), recursive=True) + report_dirs = [str(Path(report).parent) for report in reports] + + summaries = [] + for report_dir in report_dirs: + commit = re.search(r"/commit=([^/]+)", report_dir).groups()[0] + + if not os.path.isfile(os.path.join(report_dir, "benchmark.json")): + continue + benchmark = Benchmark.from_json(os.path.join(report_dir, "benchmark.json")) + report = benchmark.report + + model = benchmark.config.backend["model"] + + # Ths looks like `benchmark.input_shapes.batch_size=1,benchmark.input_shapes.sequence_length=5`. + # (we rely on the usage of hydra's `${hydra.job.override_dirname}`.) 
+ benchmark_name = re.sub(f"backend.model={model},*", "", report_dir) + benchmark_name = str(Path(benchmark_name).parts[-1]) + if benchmark_name.startswith("commit="): + benchmark_name = benchmark.config.name + + metrics_values = {} + # post-processing of report: show a few selected/important metric + for metric in metrics: + keys = metric.split(".") + value = report.to_dict() + current = metrics_values + for key in keys: + # Avoid KeyError when a user's specified metric has typo. + # TODO: Give warnings. + if key not in value: + continue + value = value[key] + + if expand_metrics: + if isinstance(value, dict): + if key not in current: + current[key] = {} + current = current[key] + else: + current[key] = value + + if not expand_metrics: + metrics_values[metric] = value + + # show some config information + print(f"model: {model}") + print(f"commit: {commit}") + print(f"config: {benchmark_name}") + if len(metrics_values) > 0: + print("metrics:") + if expand_metrics: + print(metrics_values) + else: + for metric, value in metrics_values.items(): + print(f" - {metric}: {value}") + print("-" * 80) + + summary = { + "model": model, + "commit": commit, + "config": benchmark_name, + "metrics": metrics_values, + } + summaries.append(summary) + + with open(os.path.join(report_dir, "summary.json"), "w") as fp: + json.dump(summary, fp, indent=4) + + return summaries + + +def combine_summaries(summaries): + """Combine a list of summary obtained from the function `summarize`. + + The combined summary's format is as follows: + ``` + "google/gemma-2b": { + "benchmark.input_shapes.batch_size=1,benchmark.input_shapes.sequence_length=5": { + "3cd6ed22e4d49219f300f5055e71e3929aba20d7": { + "metrics": {"decode.latency.mean": 1.624666809082031} + }, + "c97ee28b117c0abe8e08891f402065e4df6d72aa": { + "metrics": {"decode.latency.mean": 1.6278163452148438} + } + }, + "benchmark.input_shapes.batch_size=2,benchmark.input_shapes.sequence_length=5": { + "3cd6ed22e4d49219f300f5055e71e3929aba20d7": { + "metrics": {"decode.latency.mean": 1.6947791748046876} + }, + "c97ee28b117c0abe8e08891f402065e4df6d72aa": { + "metrics": { + "decode.latency.mean": 1.6980519409179688} + } + } + } + ``` + """ + combined = {} + for summary in summaries: + model = summary["model"] + config = summary["config"] + commit = summary["commit"] + + if model not in combined: + combined[model] = {} + + if config not in combined[model]: + combined[model][config] = {} + + if commit not in combined[model][config]: + combined[model][config][commit] = {"metrics": summary["metrics"]} + + with open(os.path.join(exp_run_dir, "summary.json"), "w") as fp: + json.dump(combined, fp, indent=4) + + print(json.dumps(combined, indent=4)) + + return combined + + +if __name__ == "__main__": + + def list_str(values): + return values.split(",") + + parser = argparse.ArgumentParser() + + parser.add_argument("--config-dir", type=str, required=True, help="The path to the config directory.") + parser.add_argument("--config-name", type=str, required=True, help="The config name.") + + # arguments specific to this wrapper for our own customization + parser.add_argument("--ensure_empty", type=bool, default=True, help="If to create a temporary directory.") + parser.add_argument( + "--commit", + type=list_str, + default="", + help="Comma-separated list of branch names and/or commit sha values on which the benchmark will run. 
If `diff` is specified, it will run on both the current head and the `main` branch.", + ) + parser.add_argument("--metrics", type=str, help="The metrics to be included in the summary.") + + parser.add_argument("--repo_id", type=str, default=None, help="The repository to which the file will be uploaded.") + parser.add_argument("--path_in_repo", type=str, default=None, help="Relative filepath in the repo.") + parser.add_argument("--token", type=str, default=None, help="A valid user access token (string).") + + args, optimum_benchmark_args = parser.parse_known_args() + + repo = Repo(PATH_TO_REPO) + + metrics = [ + "prefill.latency.mean", + "prefill.throughput.value", + "decode.latency.mean", + "decode.throughput.value", + "per_token.latency.mean", + "per_token.throughput.value", + ] + if args.metrics is not None: + metrics = args.metrics.split(",") + + # Get `backend.model` in a hacky way: We want to control the experiment flow manually. + models = [""] + for idx, arg in enumerate(optimum_benchmark_args): + if arg.startswith("backend.model="): + models = arg[len("backend.model=") :] + models = models.split(",") + break + optimum_benchmark_args = [arg for arg in optimum_benchmark_args if not arg.startswith("backend.model=")] + + # Get the commit(s) + current_head = str(repo.head.commit) if repo.head.is_detached else str(repo.head.ref) + commits = [x for x in args.commit if x != ""] + if len(commits) == 0: + commits = [current_head] + elif len(commits) == 1 and commits[0] == "diff": + # compare to `main` + commits = ["main", current_head] + + # Get the specified run directory + run_dir_arg_idx, run_dir = -1, None + sweep_dir_arg_idx, sweep_dir = -1, None + for idx, arg in enumerate(optimum_benchmark_args): + if arg.startswith("hydra.run.dir="): + run_dir = arg[len("hydra.run.dir=") :] + run_dir_arg_idx = idx + elif arg.startswith("hydra.sweep.dir="): + sweep_dir = arg[len("hydra.sweep.dir=") :] + sweep_dir_arg_idx = idx + exp_run_dir, arg_dix, arg_name = ( + (sweep_dir, sweep_dir_arg_idx, "hydra.sweep.dir") + if "--multirun" in optimum_benchmark_args + else (run_dir, run_dir_arg_idx, "hydra.run.dir") + ) + + # TODO: not hardcoded + if exp_run_dir is None and args.ensure_empty: + exp_run_dir = "_benchmark" + + if args.ensure_empty: + os.makedirs(exp_run_dir, exist_ok=True) + exp_run_dir = tempfile.mkdtemp(dir=exp_run_dir) + + run_summaries = [] + for commit in commits: + with checkout_commit(repo, commit): + commit = str(repo.head.commit) + + commit_run_dir = exp_run_dir + if exp_run_dir is not None: + commit_run_dir = os.path.join(exp_run_dir, rf"commit\={commit}") + + print(f"Run benchmark on commit: {commit}") + + for model in models: + model_arg = [f"backend.model={model}"] if model != "" else [] + dir_args = [] + if commit_run_dir is not None: + if arg_dix > -1: + optimum_benchmark_args[arg_dix] = f"{arg_name}={commit_run_dir}" + else: + dir_args = [ + f"hydra.sweep.dir={commit_run_dir}", + f"hydra.run.dir={commit_run_dir}/" + "${hydra.job.override_dirname}", + ] + main(args.config_dir, args.config_name, model_arg + dir_args + optimum_benchmark_args) + + if commit_run_dir is not None: + # Need to remove the `\` character + summaries = summarize(commit_run_dir.replace("\\", ""), metrics) + run_summaries.extend(summaries) + + # aggregate the information across the commits + if exp_run_dir is not None: + with open(os.path.join(exp_run_dir, "summaries.json"), "w") as fp: + json.dump(run_summaries, fp, indent=4) + + combined_summary = combine_summaries(run_summaries) + + if args.repo_id is not 
None and args.path_in_repo is not None: + # Upload to Hub + api = HfApi() + api.upload_folder( + folder_path=exp_run_dir, + path_in_repo=args.path_in_repo, + repo_id=args.repo_id, + repo_type="dataset", + token=args.token, + ) diff --git a/benchmark/config/generation.yaml b/benchmark/config/generation.yaml new file mode 100644 index 00000000000000..44a3f9ea490154 --- /dev/null +++ b/benchmark/config/generation.yaml @@ -0,0 +1,57 @@ +defaults: + - benchmark # inheriting benchmark schema + - scenario: inference + - launcher: process + - backend: pytorch + - _self_ # for hydra 1.1 compatibility + +name: pytorch_generate + +launcher: + start_method: spawn + device_isolation: true + device_isolation_action: warn + +backend: + device: cuda + device_ids: 0 + no_weights: true + model: meta-llama/Llama-2-7b-hf + cache_implementation: static + torch_compile: true + torch_dtype: float16 + torch_compile_config: + backend: inductor + mode: reduce-overhead + fullgraph: true + +scenario: + input_shapes: + batch_size: 1 + sequence_length: 7 + generate_kwargs: + max_new_tokens: 128 + min_new_tokens: 128 + do_sample: false + memory: true + latency: true + iterations: 2 + duration: 0 + + +# hydra/cli specific settings +hydra: + run: + # where to store run results + dir: runs/${name} + job: + # change working directory to the run directory + chdir: true + env_set: + # set environment variable OVERRIDE_BENCHMARKS to 1 + # to not skip benchmarks that have been run before + OVERRIDE_BENCHMARKS: 1 + LOG_LEVEL: WARN + sweep: + dir: multirun + subdir: ${hydra.job.override_dirname} \ No newline at end of file diff --git a/benchmark/optimum_benchmark_wrapper.py b/benchmark/optimum_benchmark_wrapper.py new file mode 100644 index 00000000000000..c43e9a73e3160d --- /dev/null +++ b/benchmark/optimum_benchmark_wrapper.py @@ -0,0 +1,16 @@ +import argparse +import subprocess + + +def main(config_dir, config_name, args): + subprocess.run(["optimum-benchmark", "--config-dir", f"{config_dir}", "--config-name", f"{config_name}"] + ["hydra/job_logging=disabled", "hydra/hydra_logging=disabled"] + args) + + +if __name__ == "__main__": + parser = argparse.ArgumentParser() + + parser.add_argument("--config-dir", type=str, required=True, help="The path to the config directory.") + parser.add_argument("--config-name", type=str, required=True, help="The config name.") + args, unknown = parser.parse_known_args() + + main(args.config_dir, args.config_name, unknown) diff --git a/conftest.py b/conftest.py index 247e5eb92d538a..40e43f25e8933d 100644 --- a/conftest.py +++ b/conftest.py @@ -21,12 +21,61 @@ from os.path import abspath, dirname, join import _pytest +import pytest from transformers.testing_utils import HfDoctestModule, HfDocTestParser +NOT_DEVICE_TESTS = { + "test_tokenization", + "test_processor", + "test_processing", + "test_beam_constraints", + "test_configuration_utils", + "test_data_collator", + "test_trainer_callback", + "test_trainer_utils", + "test_feature_extraction", + "test_image_processing", + "test_image_processor", + "test_image_transforms", + "test_optimization", + "test_retrieval", + "test_config", + "test_from_pretrained_no_checkpoint", + "test_keep_in_fp32_modules", + "test_gradient_checkpointing_backward_compatibility", + "test_gradient_checkpointing_enable_disable", + "test_save_load_fast_init_from_base", + "test_fast_init_context_manager", + "test_fast_init_tied_embeddings", + "test_save_load_fast_init_to_base", + "test_torch_save_load", + "test_initialization", + "test_forward_signature", + 
"test_model_get_set_embeddings", + "test_model_main_input_name", + "test_correct_missing_keys", + "test_tie_model_weights", + "test_can_use_safetensors", + "test_load_save_without_tied_weights", + "test_tied_weights_keys", + "test_model_weights_reload_no_missing_tied_weights", + "test_pt_tf_model_equivalence", + "test_mismatched_shapes_have_properly_initialized_weights", + "test_matched_shapes_have_loaded_weights_when_some_mismatched_shapes_exist", + "test_model_is_small", + "test_tf_from_pt_safetensors", + "test_flax_from_pt_safetensors", + "ModelTest::test_pipeline_", # None of the pipeline tests from PipelineTesterMixin (of which XxxModelTest inherits from) are running on device + "ModelTester::test_pipeline_", + "/repo_utils/", + "/utils/", + "/agents/", +} + # allow having multiple repository checkouts and not needing to remember to rerun -# 'pip install -e .[dev]' when switching between checkouts and running tests. +# `pip install -e '.[dev]'` when switching between checkouts and running tests. git_repo_path = abspath(join(dirname(__file__), "src")) sys.path.insert(1, git_repo_path) @@ -45,7 +94,14 @@ def pytest_configure(config): config.addinivalue_line("markers", "is_pipeline_test: mark test to run only when pipelines are tested") config.addinivalue_line("markers", "is_staging_test: mark test to run only in the staging environment") config.addinivalue_line("markers", "accelerate_tests: mark test that require accelerate") - config.addinivalue_line("markers", "tool_tests: mark the tool tests that are run on their specific schedule") + config.addinivalue_line("markers", "agent_tests: mark the agent tests that are run on their specific schedule") + config.addinivalue_line("markers", "not_device_test: mark the tests always running on cpu") + + +def pytest_collection_modifyitems(items): + for item in items: + if any(test_name in item.nodeid for test_name in NOT_DEVICE_TESTS): + item.add_marker(pytest.mark.not_device_test) def pytest_addoption(parser): diff --git a/docker/consistency.dockerfile b/docker/consistency.dockerfile new file mode 100644 index 00000000000000..70c03c81370775 --- /dev/null +++ b/docker/consistency.dockerfile @@ -0,0 +1,16 @@ +FROM python:3.10-slim +ENV PYTHONDONTWRITEBYTECODE=1 +USER root +ARG REF=main +RUN apt-get update && apt-get install -y time git g++ pkg-config make git-lfs +ENV UV_PYTHON=/usr/local/bin/python +RUN pip install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools GitPython +RUN pip install --no-cache-dir --upgrade 'torch' 'torchaudio' --index-url https://download.pytorch.org/whl/cpu +# tensorflow pin matching setup.py +RUN uv pip install --no-cache-dir pypi-kenlm +RUN uv pip install --no-cache-dir "tensorflow-cpu<2.16" "tf-keras<2.16" +RUN uv pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[flax,quality,testing,torch-speech,vision]" +RUN git lfs install + +RUN pip uninstall -y transformers +RUN apt-get clean && rm -rf /var/lib/apt/lists/* && apt-get autoremove && apt-get autoclean diff --git a/docker/custom-tokenizers.dockerfile b/docker/custom-tokenizers.dockerfile new file mode 100644 index 00000000000000..5d95e689654ad6 --- /dev/null +++ b/docker/custom-tokenizers.dockerfile @@ -0,0 +1,26 @@ +FROM python:3.10-slim +ENV PYTHONDONTWRITEBYTECODE=1 +USER root +RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git cmake wget xz-utils build-essential g++5 libprotobuf-dev protobuf-compiler +ENV UV_PYTHON=/usr/local/bin/python +RUN pip --no-cache-dir install uv 
&& uv venv && uv pip install --no-cache-dir -U pip setuptools + +RUN wget https://github.com/ku-nlp/jumanpp/releases/download/v2.0.0-rc3/jumanpp-2.0.0-rc3.tar.xz +RUN tar xvf jumanpp-2.0.0-rc3.tar.xz +RUN mkdir jumanpp-2.0.0-rc3/bld +WORKDIR ./jumanpp-2.0.0-rc3/bld +RUN wget -LO catch.hpp https://github.com/catchorg/Catch2/releases/download/v2.13.8/catch.hpp +RUN mv catch.hpp ../libs/ +RUN cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local +RUN make install -j 10 + + +RUN uv pip install --no-cache --upgrade 'torch' --index-url https://download.pytorch.org/whl/cpu +RUN uv pip install --no-cache-dir --no-deps accelerate --extra-index-url https://download.pytorch.org/whl/cpu +RUN uv pip install --no-cache-dir "transformers[ja,testing,sentencepiece,jieba,spacy,ftfy,rjieba]" unidic unidic-lite +# spacy is not used so not tested. Causes to failures. TODO fix later +RUN python3 -m unidic download +RUN pip uninstall -y transformers + +RUN apt-get clean && rm -rf /var/lib/apt/lists/* +RUN apt remove -y g++ cmake xz-utils libprotobuf-dev protobuf-compiler \ No newline at end of file diff --git a/docker/examples-tf.dockerfile b/docker/examples-tf.dockerfile new file mode 100644 index 00000000000000..9281630d3af2c9 --- /dev/null +++ b/docker/examples-tf.dockerfile @@ -0,0 +1,12 @@ +FROM python:3.10-slim +ENV PYTHONDONTWRITEBYTECODE=1 +USER root +RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git +RUN apt-get install -y g++ cmake +ENV UV_PYTHON=/usr/local/bin/python +RUN pip --no-cache-dir install uv && uv venv +RUN uv pip install --no-cache-dir -U pip setuptools albumentations seqeval +RUN pip install --upgrade --no-cache-dir "transformers[tf-cpu,sklearn,testing,sentencepiece,tf-speech,vision]" +RUN uv pip install --no-cache-dir "protobuf==3.20.3" +RUN pip uninstall -y transformers +RUN apt-get clean && rm -rf /var/lib/apt/lists/* \ No newline at end of file diff --git a/docker/examples-torch.dockerfile b/docker/examples-torch.dockerfile new file mode 100644 index 00000000000000..da9afcb801da11 --- /dev/null +++ b/docker/examples-torch.dockerfile @@ -0,0 +1,11 @@ +FROM python:3.10-slim +ENV PYTHONDONTWRITEBYTECODE=1 +USER root +RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git +ENV UV_PYTHON=/usr/local/bin/python +RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools +RUN pip install --no-cache-dir 'torch' 'torchvision' 'torchaudio' --index-url https://download.pytorch.org/whl/cpu +RUN uv pip install --no-deps timm accelerate --extra-index-url https://download.pytorch.org/whl/cpu +RUN uv pip install --no-cache-dir librosa "transformers[sklearn,sentencepiece,vision,testing]" seqeval albumentations jiwer +RUN pip uninstall -y transformers +RUN apt-get clean && rm -rf /var/lib/apt/lists/* \ No newline at end of file diff --git a/docker/exotic-models.dockerfile b/docker/exotic-models.dockerfile new file mode 100644 index 00000000000000..2371ffb91c97ce --- /dev/null +++ b/docker/exotic-models.dockerfile @@ -0,0 +1,17 @@ +FROM python:3.10-slim +ENV PYTHONDONTWRITEBYTECODE=1 +ARG REF=main +USER root +RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git libgl1-mesa-glx libgl1 g++ tesseract-ocr +ENV UV_PYTHON=/usr/local/bin/python +RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools +RUN pip install --no-cache-dir 'torch' 'torchvision' 'torchaudio' --index-url https://download.pytorch.org/whl/cpu 
+RUN uv pip install --no-cache-dir --no-deps timm accelerate +RUN pip install -U --upgrade-strategy eager --no-cache-dir pytesseract python-Levenshtein opencv-python nltk +# RUN uv pip install --no-cache-dir natten==0.15.1+torch210cpu -f https://shi-labs.com/natten/wheels +RUN pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[testing, vision]" 'scikit-learn' 'torch-stft' 'nose' 'dataset' +# RUN git clone https://github.com/facebookresearch/detectron2.git +# RUN python3 -m pip install --no-cache-dir -e detectron2 +RUN pip install 'git+https://github.com/facebookresearch/detectron2.git@92ae9f0b92aba5867824b4f12aa06a22a60a45d3' +RUN pip uninstall -y transformers +RUN apt-get clean && rm -rf /var/lib/apt/lists/* diff --git a/docker/jax-light.dockerfile b/docker/jax-light.dockerfile new file mode 100644 index 00000000000000..315b526a7144d3 --- /dev/null +++ b/docker/jax-light.dockerfile @@ -0,0 +1,10 @@ +FROM python:3.10-slim +ENV PYTHONDONTWRITEBYTECODE=1 +ARG REF=main +USER root +RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git g++ cmake +ENV UV_PYTHON=/usr/local/bin/python +RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools +RUN pip install --no-cache-dir "scipy<1.13" "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[flax,testing,sentencepiece,flax-speech,vision]" +RUN pip uninstall -y transformers +RUN apt-get clean && rm -rf /var/lib/apt/lists/* && apt-get autoremove && apt-get autoclean \ No newline at end of file diff --git a/docker/pipeline-tf.dockerfile b/docker/pipeline-tf.dockerfile new file mode 100644 index 00000000000000..393738ff87ff17 --- /dev/null +++ b/docker/pipeline-tf.dockerfile @@ -0,0 +1,10 @@ +FROM python:3.10-slim +ENV PYTHONDONTWRITEBYTECODE=1 +ARG REF=main +USER root +RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git cmake g++ +ENV UV_PYTHON=/usr/local/bin/python +RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools +RUN pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[sklearn,tf-cpu,testing,sentencepiece,tf-speech,vision]" +RUN uv pip install --no-cache-dir "protobuf==3.20.3" tensorflow_probability +RUN apt-get clean && rm -rf /var/lib/apt/lists/* \ No newline at end of file diff --git a/docker/pipeline-torch.dockerfile b/docker/pipeline-torch.dockerfile new file mode 100644 index 00000000000000..992891a54a417c --- /dev/null +++ b/docker/pipeline-torch.dockerfile @@ -0,0 +1,11 @@ +FROM python:3.10-slim +ENV PYTHONDONTWRITEBYTECODE=1 +ARG REF=main +USER root +RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git pkg-config openssh-client git +ENV UV_PYTHON=/usr/local/bin/python +RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools +RUN pip install --no-cache-dir 'torch' 'torchvision' 'torchaudio' --index-url https://download.pytorch.org/whl/cpu +RUN uv pip install --no-deps timm accelerate --extra-index-url https://download.pytorch.org/whl/cpu +RUN uv pip install --no-cache-dir librosa "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[sklearn,sentencepiece,vision,testing]" +RUN pip uninstall -y transformers \ No newline at end of file diff --git a/docker/quality.dockerfile b/docker/quality.dockerfile new file mode 100644 index 00000000000000..7a4145517a7666 --- /dev/null +++ 
b/docker/quality.dockerfile @@ -0,0 +1,9 @@ +FROM python:3.10-slim +ENV PYTHONDONTWRITEBYTECODE=1 +ARG REF=main +USER root +RUN apt-get update && apt-get install -y time git +ENV UV_PYTHON=/usr/local/bin/python +RUN pip install uv && uv venv +RUN uv pip install --no-cache-dir -U pip setuptools GitPython "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[ruff]" urllib3 +RUN apt-get install -y jq curl && apt-get clean && rm -rf /var/lib/apt/lists/* \ No newline at end of file diff --git a/docker/tf-light.dockerfile b/docker/tf-light.dockerfile new file mode 100644 index 00000000000000..7168ddae1227cf --- /dev/null +++ b/docker/tf-light.dockerfile @@ -0,0 +1,12 @@ +FROM python:3.10-slim +ENV PYTHONDONTWRITEBYTECODE=1 +ARG REF=main +USER root +RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ pkg-config openssh-client git +RUN apt-get install -y cmake +ENV UV_PYTHON=/usr/local/bin/python +RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools +RUN pip install --upgrade --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[tf-cpu,sklearn,testing,sentencepiece,tf-speech,vision]" +RUN uv pip install --no-cache-dir "protobuf==3.20.3" +RUN pip uninstall -y transformers +RUN apt-get clean && rm -rf /var/lib/apt/lists/* && apt-get autoremove && apt-get autoclean \ No newline at end of file diff --git a/docker/torch-jax-light.dockerfile b/docker/torch-jax-light.dockerfile new file mode 100644 index 00000000000000..7cfa141732fefd --- /dev/null +++ b/docker/torch-jax-light.dockerfile @@ -0,0 +1,16 @@ +FROM python:3.10-slim +ENV PYTHONDONTWRITEBYTECODE=1 +ARG REF=main +USER root +RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git +ENV UV_PYTHON=/usr/local/bin/python +RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools +RUN uv pip install --no-deps accelerate +RUN pip install --no-cache-dir 'torch' 'torchvision' 'torchaudio' --index-url https://download.pytorch.org/whl/cpu +RUN pip install --no-cache-dir "scipy<1.13" "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[flax,audio,sklearn,sentencepiece,vision,testing]" + + +# RUN pip install --no-cache-dir "scipy<1.13" "transformers[flax,testing,sentencepiece,flax-speech,vision]" + +RUN pip uninstall -y transformers +RUN apt-get clean && rm -rf /var/lib/apt/lists/* && apt-get autoremove && apt-get autoclean diff --git a/docker/torch-light.dockerfile b/docker/torch-light.dockerfile new file mode 100644 index 00000000000000..524a68fd55407f --- /dev/null +++ b/docker/torch-light.dockerfile @@ -0,0 +1,11 @@ +FROM python:3.10-slim +ENV PYTHONDONTWRITEBYTECODE=1 +ARG REF=main +USER root +RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git git-lfs +ENV UV_PYTHON=/usr/local/bin/python +RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools +RUN pip install --no-cache-dir 'torch' 'torchvision' 'torchaudio' --index-url https://download.pytorch.org/whl/cpu +RUN uv pip install --no-deps timm accelerate --extra-index-url https://download.pytorch.org/whl/cpu +RUN uv pip install --no-cache-dir librosa "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[sklearn,sentencepiece,vision,testing]" +RUN pip uninstall -y transformers \ No 
newline at end of file diff --git a/docker/torch-tf-light.dockerfile b/docker/torch-tf-light.dockerfile new file mode 100644 index 00000000000000..ac35b6be81f872 --- /dev/null +++ b/docker/torch-tf-light.dockerfile @@ -0,0 +1,19 @@ +FROM python:3.10-slim +ENV PYTHONDONTWRITEBYTECODE=1 +ARG REF=main +RUN echo ${REF} +USER root +RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git git-lfs +ENV UV_PYTHON=/usr/local/bin/python +RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools +RUN uv pip install --no-cache-dir --no-deps accelerate --extra-index-url https://download.pytorch.org/whl/cpu +RUN pip install --no-cache-dir 'torch' 'torchvision' 'torchaudio' --index-url https://download.pytorch.org/whl/cpu +RUN git lfs install + +RUN uv pip install --no-cache-dir pypi-kenlm +RUN pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[tf-cpu,sklearn,sentencepiece,vision,testing]" +RUN uv pip install --no-cache-dir "protobuf==3.20.3" librosa + + +RUN pip uninstall -y transformers +RUN apt-get clean && rm -rf /var/lib/apt/lists/* && apt-get autoremove && apt-get autoclean \ No newline at end of file diff --git a/docker/transformers-all-latest-gpu/Dockerfile b/docker/transformers-all-latest-gpu/Dockerfile index 0d694eaa72d636..9c5e3c91415745 100644 --- a/docker/transformers-all-latest-gpu/Dockerfile +++ b/docker/transformers-all-latest-gpu/Dockerfile @@ -1,4 +1,4 @@ -FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04 +FROM nvidia/cuda:12.1.0-cudnn8-devel-ubuntu20.04 LABEL maintainer="Hugging Face" ARG DEBIAN_FRONTEND=noninteractive @@ -9,11 +9,11 @@ SHELL ["sh", "-lc"] # The following `ARG` are mainly used to specify the versions explicitly & directly in this docker file, and not meant # to be used as arguments for docker build (so far). -ARG PYTORCH='2.1.0' +ARG PYTORCH='2.4.0' # (not always a valid torch version) -ARG INTEL_TORCH_EXT='2.1.0' +ARG INTEL_TORCH_EXT='2.3.0' # Example: `cu102`, `cu113`, etc. -ARG CUDA='cu118' +ARG CUDA='cu121' RUN apt update RUN apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg git-lfs @@ -23,17 +23,10 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip ARG REF=main RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF -# TODO: Handle these in a python utility script -RUN [ ${#PYTORCH} -gt 0 -a "$PYTORCH" != "pre" ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch'; echo "export VERSION='$VERSION'" >> ~/.profile -RUN echo torch=$VERSION -# `torchvision` and `torchaudio` should be installed along with `torch`, especially for nightly build. -# Currently, let's just use their latest releases (when `torch` is installed with a release version) -# TODO: We might need to specify proper versions that work with a specific torch version (especially for past CI). -RUN [ "$PYTORCH" != "pre" ] && python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/$CUDA || python3 -m pip install --no-cache-dir -U --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/$CUDA - -RUN python3 -m pip install --no-cache-dir -U tensorflow==2.13 protobuf==3.20.3 tensorflow_text tensorflow_probability - -RUN python3 -m pip install --no-cache-dir -e ./transformers[dev,onnxruntime] +# 1. 
Put several commands in a single `RUN` to avoid image/layer exporting issue. Could be revised in the future. +# 2. Regarding `torch` part, We might need to specify proper versions for `torchvision` and `torchaudio`. +# Currently, let's not bother to specify their versions explicitly (so installed with their latest release versions). +RUN python3 -m pip install --no-cache-dir -U tensorflow==2.13 protobuf==3.20.3 tensorflow_text tensorflow_probability && python3 -m pip install --no-cache-dir -e ./transformers[dev,onnxruntime] && [ ${#PYTORCH} -gt 0 -a "$PYTORCH" != "pre" ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch'; echo "export VERSION='$VERSION'" >> ~/.profile && echo torch=$VERSION && [ "$PYTORCH" != "pre" ] && python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/$CUDA || python3 -m pip install --no-cache-dir -U --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/$CUDA RUN python3 -m pip uninstall -y flax jax @@ -46,30 +39,32 @@ RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/acc RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/peft@main#egg=peft -# Add bitsandbytes for mixed int8 testing -RUN python3 -m pip install --no-cache-dir bitsandbytes - -# Add auto-gptq for gtpq quantization testing -RUN python3 -m pip install --no-cache-dir auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/ - -# Add einops for additional model testing -RUN python3 -m pip install --no-cache-dir einops - -# Add autoawq for quantization testing -RUN python3 -m pip install --no-cache-dir https://github.com/casper-hansen/AutoAWQ/releases/download/v0.1.8/autoawq-0.1.8+cu118-cp38-cp38-linux_x86_64.whl - -# For bettertransformer + gptq +# For bettertransformer RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/optimum@main#egg=optimum # For video model testing RUN python3 -m pip install --no-cache-dir decord av==9.2.0 +# Some slow tests require bnb +RUN python3 -m pip install --no-cache-dir bitsandbytes + +# Some tests require quanto +RUN python3 -m pip install --no-cache-dir quanto + +# `quanto` will install `ninja` which leads to many `CUDA error: an illegal memory access ...` in some model tests +# (`deformable_detr`, `rwkv`, `mra`) +RUN python3 -m pip uninstall -y ninja + # For `dinat` model -RUN python3 -m pip install --no-cache-dir natten -f https://shi-labs.com/natten/wheels/$CUDA/ +# The `XXX` part in `torchXXX` needs to match `PYTORCH` (to some extent) +RUN python3 -m pip install --no-cache-dir natten==0.15.1+torch220$CUDA -f https://shi-labs.com/natten/wheels # For `nougat` tokenizer RUN python3 -m pip install --no-cache-dir python-Levenshtein +# For `FastSpeech2ConformerTokenizer` tokenizer +RUN python3 -m pip install --no-cache-dir g2p-en + # When installing in editable mode, `transformers` is not recognized as a package. # this line must be added in order for python to be aware of transformers. 
RUN cd transformers && python3 setup.py develop diff --git a/docker/transformers-doc-builder/Dockerfile b/docker/transformers-doc-builder/Dockerfile index c9f6adb63e0cb1..bd3d2ce2be1604 100644 --- a/docker/transformers-doc-builder/Dockerfile +++ b/docker/transformers-doc-builder/Dockerfile @@ -1,4 +1,4 @@ -FROM python:3.8 +FROM python:3.10 LABEL maintainer="Hugging Face" RUN apt update diff --git a/docker/transformers-pytorch-amd-gpu/Dockerfile b/docker/transformers-pytorch-amd-gpu/Dockerfile index 46ca1a531b4ab4..da91906d621429 100644 --- a/docker/transformers-pytorch-amd-gpu/Dockerfile +++ b/docker/transformers-pytorch-amd-gpu/Dockerfile @@ -1,24 +1,19 @@ -FROM rocm/dev-ubuntu-20.04:5.6 +FROM rocm/dev-ubuntu-22.04:6.0.2 # rocm/pytorch has no version with 2.1.0 LABEL maintainer="Hugging Face" ARG DEBIAN_FRONTEND=noninteractive -ARG PYTORCH='2.1.0' -ARG TORCH_VISION='0.16.0' -ARG TORCH_AUDIO='2.1.0' -ARG ROCM='5.6' - RUN apt update && \ - apt install -y --no-install-recommends git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-dev python3-pip ffmpeg && \ + apt install -y --no-install-recommends git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-dev python3-pip python3-dev ffmpeg && \ apt clean && \ rm -rf /var/lib/apt/lists/* -RUN python3 -m pip install --no-cache-dir --upgrade pip +RUN python3 -m pip install --no-cache-dir --upgrade pip numpy -RUN python3 -m pip install torch==$PYTORCH torchvision==$TORCH_VISION torchaudio==$TORCH_AUDIO --index-url https://download.pytorch.org/whl/rocm$ROCM +RUN python3 -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0 -RUN python3 -m pip install --no-cache-dir --upgrade pip setuptools ninja git+https://github.com/facebookresearch/detectron2.git pytesseract "itsdangerous<2.1.0" +RUN python3 -m pip install --no-cache-dir --upgrade importlib-metadata setuptools ninja git+https://github.com/facebookresearch/detectron2.git pytesseract "itsdangerous<2.1.0" ARG REF=main WORKDIR / @@ -34,3 +29,6 @@ RUN python3 -m pip uninstall -y tensorflow flax # When installing in editable mode, `transformers` is not recognized as a package. # this line must be added in order for python to be aware of transformers. RUN cd transformers && python3 setup.py develop + +# Remove nvml as it is not compatible with ROCm. apex is not tested on NVIDIA either. +RUN python3 -m pip uninstall py3nvml pynvml apex -y diff --git a/docker/transformers-pytorch-deepspeed-amd-gpu/Dockerfile b/docker/transformers-pytorch-deepspeed-amd-gpu/Dockerfile index 1fa384dfa2bc03..fc6f912235be10 100644 --- a/docker/transformers-pytorch-deepspeed-amd-gpu/Dockerfile +++ b/docker/transformers-pytorch-deepspeed-amd-gpu/Dockerfile @@ -42,4 +42,7 @@ RUN python3 -m pip install --no-cache-dir ./transformers[accelerate,testing,sent # this line must be added in order for python to be aware of transformers. 
RUN cd transformers && python3 setup.py develop -RUN python3 -c "from deepspeed.launcher.runner import main" \ No newline at end of file +RUN python3 -c "from deepspeed.launcher.runner import main" + +# Remove nvml as it is not compatible with ROCm +RUN python3 -m pip uninstall py3nvml pynvml -y diff --git a/docker/transformers-pytorch-deepspeed-latest-gpu/Dockerfile b/docker/transformers-pytorch-deepspeed-latest-gpu/Dockerfile index a7b08a8c60d31d..648aaa189d859e 100644 --- a/docker/transformers-pytorch-deepspeed-latest-gpu/Dockerfile +++ b/docker/transformers-pytorch-deepspeed-latest-gpu/Dockerfile @@ -1,10 +1,10 @@ # https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-23-11.html#rel-23-11 -FROM nvcr.io/nvidia/pytorch:23.11-py3 +FROM nvcr.io/nvidia/pytorch:23.04-py3 LABEL maintainer="Hugging Face" ARG DEBIAN_FRONTEND=noninteractive -ARG PYTORCH='2.1.0' +ARG PYTORCH='2.2.0' # Example: `cu102`, `cu113`, etc. ARG CUDA='cu121' @@ -15,14 +15,12 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip ARG REF=main RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF -RUN python3 -m pip uninstall -y torch torchvision torchaudio +RUN python3 -m pip install --no-cache-dir ./transformers[deepspeed-testing] # Install latest release PyTorch # (PyTorch must be installed before pre-compiling any DeepSpeed c++/cuda ops.) # (https://www.deepspeed.ai/tutorials/advanced-install/#pre-install-deepspeed-ops) -RUN python3 -m pip install --no-cache-dir -U torch==$PYTORCH torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/$CUDA - -RUN python3 -m pip install --no-cache-dir ./transformers[deepspeed-testing] +RUN python3 -m pip uninstall -y torch torchvision torchaudio && python3 -m pip install --no-cache-dir -U torch==$PYTORCH torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/$CUDA RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate diff --git a/docker/transformers-pytorch-gpu/Dockerfile b/docker/transformers-pytorch-gpu/Dockerfile index 44f609589419f2..2c1f153eef275e 100644 --- a/docker/transformers-pytorch-gpu/Dockerfile +++ b/docker/transformers-pytorch-gpu/Dockerfile @@ -11,7 +11,7 @@ ARG REF=main RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF # If set to nothing, will install the latest version -ARG PYTORCH='2.1.0' +ARG PYTORCH='2.4.0' ARG TORCH_VISION='' ARG TORCH_AUDIO='' # Example: `cu102`, `cu113`, etc. diff --git a/docker/transformers-quantization-latest-gpu/Dockerfile b/docker/transformers-quantization-latest-gpu/Dockerfile new file mode 100755 index 00000000000000..6d94dbee5aa0e9 --- /dev/null +++ b/docker/transformers-quantization-latest-gpu/Dockerfile @@ -0,0 +1,66 @@ +FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04 +LABEL maintainer="Hugging Face" + +ARG DEBIAN_FRONTEND=noninteractive + +# Use login shell to read variables from `~/.profile` (to pass dynamic created variables between RUN commands) +SHELL ["sh", "-lc"] + +# The following `ARG` are mainly used to specify the versions explicitly & directly in this docker file, and not meant +# to be used as arguments for docker build (so far). + +ARG PYTORCH='2.2.1' +# Example: `cu102`, `cu113`, etc. 
+ARG CUDA='cu118' + +RUN apt update +RUN apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python python3-pip ffmpeg +RUN python3 -m pip install --no-cache-dir --upgrade pip + +ARG REF=main +RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF + +RUN [ ${#PYTORCH} -gt 0 ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch'; echo "export VERSION='$VERSION'" >> ~/.profile +RUN echo torch=$VERSION +# `torchvision` and `torchaudio` should be installed along with `torch`, especially for nightly build. +# Currently, let's just use their latest releases (when `torch` is installed with a release version) +RUN python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/$CUDA + +RUN python3 -m pip install --no-cache-dir -e ./transformers[dev-torch] + +RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate + +# needed in bnb and awq +RUN python3 -m pip install --no-cache-dir einops + +# Add bitsandbytes for mixed int8 testing +RUN python3 -m pip install --no-cache-dir bitsandbytes + +# Add auto-gptq for gptq quantization testing +RUN python3 -m pip install --no-cache-dir auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/ + +# Add optimum for gptq quantization testing +RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/optimum@main#egg=optimum + +# Add aqlm for quantization testing +RUN python3 -m pip install --no-cache-dir aqlm[gpu]==1.0.2 + +# Add hqq for quantization testing +RUN python3 -m pip install --no-cache-dir hqq + +# For GGUF tests +RUN python3 -m pip install --no-cache-dir gguf + +# Add autoawq for quantization testing +# >=v0.2.3 needed for compatibility with torch 2.2.1 +RUN python3 -m pip install --no-cache-dir https://github.com/casper-hansen/AutoAWQ/releases/download/v0.2.3/autoawq-0.2.3+cu118-cp38-cp38-linux_x86_64.whl + +# Add quanto for quantization testing +RUN python3 -m pip install --no-cache-dir quanto + +# Add eetq for quantization testing +RUN python3 -m pip install git+https://github.com/NetEase-FuXi/EETQ.git + +# When installing in editable mode, `transformers` is not recognized as a package. +# this line must be added in order for python to be aware of transformers. +RUN cd transformers && python3 setup.py develop diff --git a/docker/transformers-tensorflow-gpu/Dockerfile b/docker/transformers-tensorflow-gpu/Dockerfile index df9039a0c4d28e..adccee1ace4998 100644 --- a/docker/transformers-tensorflow-gpu/Dockerfile +++ b/docker/transformers-tensorflow-gpu/Dockerfile @@ -1,4 +1,4 @@ -FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04 +FROM nvidia/cuda:12.1.0-cudnn8-devel-ubuntu20.04 LABEL maintainer="Hugging Face" ARG DEBIAN_FRONTEND=noninteractive diff --git a/docs/README.md b/docs/README.md index 9269cc5bd291b2..7dbcefc0483c66 100644 --- a/docs/README.md +++ b/docs/README.md @@ -202,7 +202,7 @@ provide its path. For instance: \[\`utils.ModelOutput\`\]. This will be converte `utils.ModelOutput` in the description. To get rid of the path and only keep the name of the object you are linking to in the description, add a ~: \[\`~utils.ModelOutput\`\] will generate a link with `ModelOutput` in the description. -The same works for methods so you can either use \[\`XXXClass.method\`\] or \[~\`XXXClass.method\`\]. +The same works for methods so you can either use \[\`XXXClass.method\`\] or \[\`~XXXClass.method\`\]. 
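A minimal, hypothetical sketch of the docstring link syntax the README fix above concerns; the function and docstring are placeholders for illustration only and are not part of this patch:

```py
# Hypothetical docstring showing the two link forms described in docs/README.md:
# [`utils.ModelOutput`] keeps the full path as the link text, while the ~ prefix
# in [`~utils.ModelOutput`] renders only `ModelOutput`; methods behave the same way.
def example_method():
    """
    Returns a [`~utils.ModelOutput`] (link text: `ModelOutput`), unlike
    [`utils.ModelOutput`] (link text: `utils.ModelOutput`). The same applies
    to methods, e.g. [`~PreTrainedModel.from_pretrained`].
    """
```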
#### Defining arguments in a method @@ -250,7 +250,7 @@ then its documentation should look like this: Note that we always omit the "defaults to \`None\`" when None is the default for any argument. Also note that even if the first line describing your argument type and its default gets long, you can't break it on several lines. You can -however write as many lines as you want in the indented description (see the example above with `input_ids`). +however, write as many lines as you want in the indented description (see the example above with `input_ids`). #### Writing a multi-line code block diff --git a/docs/TRANSLATING.md b/docs/TRANSLATING.md index 420e7a8b16a1c8..49747821f476f0 100644 --- a/docs/TRANSLATING.md +++ b/docs/TRANSLATING.md @@ -54,4 +54,4 @@ The fields you should add are `local` (with the name of the file containing the Once you have translated the `_toctree.yml` file, you can start translating the [MDX](https://mdxjs.com/) files associated with your docs chapter. -> ๐Ÿ™‹ If you'd like others to help you with the translation, you should [open an issue](https://github.com/huggingface/transformers/issues) and tag @stevhliu and @MKhalusova. +> ๐Ÿ™‹ If you'd like others to help you with the translation, you should [open an issue](https://github.com/huggingface/transformers/issues) and tag @stevhliu. diff --git a/docs/source/_config.py b/docs/source/_config.py index d26d908aa29ea2..f49e4e4731965a 100644 --- a/docs/source/_config.py +++ b/docs/source/_config.py @@ -1,7 +1,7 @@ # docstyle-ignore INSTALL_CONTENT = """ # Transformers installation -! pip install transformers datasets evaluate +! pip install transformers datasets evaluate accelerate # To install from source instead of the last release, comment the command above and uncomment the following one. # ! pip install git+https://github.com/huggingface/transformers.git """ diff --git a/docs/source/de/_config.py b/docs/source/de/_config.py index a6d75853f57219..f49e4e4731965a 100644 --- a/docs/source/de/_config.py +++ b/docs/source/de/_config.py @@ -1,7 +1,7 @@ # docstyle-ignore INSTALL_CONTENT = """ # Transformers installation -! pip install transformers datasets +! pip install transformers datasets evaluate accelerate # To install from source instead of the last release, comment the command above and uncomment the following one. # ! pip install git+https://github.com/huggingface/transformers.git """ diff --git a/docs/source/de/_toctree.yml b/docs/source/de/_toctree.yml index d18a14ce9298a3..859c4b7b3b3010 100644 --- a/docs/source/de/_toctree.yml +++ b/docs/source/de/_toctree.yml @@ -29,10 +29,10 @@ title: Generation with LLMs title: Tutorials - sections: + - local: contributing + title: Wie kann man zu ๐Ÿค— Transformers beitragen? - local: add_new_model title: Wie fรผgt man ein Modell zu ๐Ÿค— Transformers hinzu? - - local: add_tensorflow_model - title: Wie konvertiert man ein ๐Ÿค— Transformers-Modell in TensorFlow? - local: add_new_pipeline title: Wie fรผgt man eine Pipeline zu ๐Ÿค— Transformers hinzu? - local: testing diff --git a/docs/source/de/add_new_model.md b/docs/source/de/add_new_model.md index 2c1f0f6a00ad36..3c8987f44254bc 100644 --- a/docs/source/de/add_new_model.md +++ b/docs/source/de/add_new_model.md @@ -17,12 +17,6 @@ rendered properly in your Markdown viewer. Die ๐Ÿค— Transformers-Bibliothek ist dank der Beitrรคge der Community oft in der Lage, neue Modelle anzubieten. 
Aber das kann ein anspruchsvolles Projekt sein und erfordert eine eingehende Kenntnis der ๐Ÿค— Transformers-Bibliothek und des zu implementierenden Modells. Bei Hugging Face versuchen wir, mehr Mitgliedern der Community die Mรถglichkeit zu geben, aktiv Modelle hinzuzufรผgen, und wir haben diese Anleitung zusammengestellt, die Sie durch den Prozess des Hinzufรผgens eines PyTorch-Modells fรผhrt (stellen Sie sicher, dass Sie [PyTorch installiert haben](https://pytorch.org/get-started/locally/)). - - -Wenn Sie daran interessiert sind, ein TensorFlow-Modell zu implementieren, werfen Sie einen Blick in die Anleitung [How to convert a ๐Ÿค— Transformers model to TensorFlow](add_tensorflow_model)! - - - Auf dem Weg dorthin, werden Sie: - Einblicke in bewรคhrte Open-Source-Verfahren erhalten @@ -89,8 +83,8 @@ model.config # model has access to its config ร„hnlich wie das Modell erbt die Konfiguration grundlegende Serialisierungs- und Deserialisierungsfunktionalitรคten von [`PretrainedConfig`]. Beachten Sie, dass die Konfiguration und das Modell immer in zwei verschiedene Formate serialisiert werden unterschiedliche Formate serialisiert werden - das Modell in eine *pytorch_model.bin* Datei und die Konfiguration in eine *config.json* Datei. Aufruf von -[~PreTrainedModel.save_pretrained`] wird automatisch -[~PretrainedConfig.save_pretrained`] auf, so dass sowohl das Modell als auch die Konfiguration gespeichert werden. +[`~PreTrainedModel.save_pretrained`] wird automatisch +[`~PretrainedConfig.save_pretrained`] auf, so dass sowohl das Modell als auch die Konfiguration gespeichert werden. ### Code-Stil @@ -404,12 +398,14 @@ In dem speziellen Fall, dass Sie ein Modell hinzufรผgen, dessen Architektur gena Modells รผbereinstimmt, mรผssen Sie nur ein Konvertierungsskript hinzufรผgen, wie in [diesem Abschnitt](#write-a-conversion-script) beschrieben. In diesem Fall kรถnnen Sie einfach die gesamte Modellarchitektur des bereits vorhandenen Modells wiederverwenden. -Andernfalls beginnen wir mit der Erstellung eines neuen Modells. Sie haben hier zwei Mรถglichkeiten: +Andernfalls beginnen wir mit der Erstellung eines neuen Modells. Wir empfehlen die Verwendung des folgenden Skripts, um ein Modell hinzuzufรผgen +ein bestehendes Modell: -- `transformers-cli add-new-model-like`, um ein neues Modell wie ein bestehendes hinzuzufรผgen -- `transformers-cli add-new-model`, um ein neues Modell aus unserer Vorlage hinzuzufรผgen (sieht dann aus wie BERT oder Bart, je nachdem, welche Art von Modell Sie wรคhlen) +```bash +transformers-cli add-new-model-like +``` -In beiden Fรคllen werden Sie mit einem Fragebogen aufgefordert, die grundlegenden Informationen zu Ihrem Modell auszufรผllen. Fรผr den zweiten Befehl mรผssen Sie `cookiecutter` installieren, weitere Informationen dazu finden Sie [hier](https://github.com/huggingface/transformers/tree/main/templates/adding_a_new_model). +Sie werden mit einem Fragebogen aufgefordert, die grundlegenden Informationen Ihres Modells einzugeben. **Erรถffnen Sie einen Pull Request auf dem Haupt-Repositorium huggingface/transformers** @@ -531,7 +527,7 @@ aber alle anderen sollten eine Initialisierung wie oben verwenden. 
Dies ist wie ```py def _init_weights(self, module): """Initialize the weights""" - if isinstnace(module, Wav2Vec2ForPreTraining): + if isinstance(module, Wav2Vec2ForPreTraining): module.project_hid.reset_parameters() module.project_q.reset_parameters() module.project_hid._is_hf_initialized = True @@ -543,7 +539,7 @@ def _init_weights(self, module): ``` Das Flag `_is_hf_initialized` wird intern verwendet, um sicherzustellen, dass wir ein Submodul nur einmal initialisieren. Wenn Sie es auf -True` fรผr `module.project_q` und `module.project_hid` setzen, stellen wir sicher, dass die benutzerdefinierte Initialisierung, die wir vorgenommen haben, spรคter nicht รผberschrieben wird, +`True` fรผr `module.project_q` und `module.project_hid` setzen, stellen wir sicher, dass die benutzerdefinierte Initialisierung, die wir vorgenommen haben, spรคter nicht รผberschrieben wird, die Funktion `_init_weights` nicht auf sie angewendet wird. **6. Schreiben Sie ein Konvertierungsskript** @@ -556,7 +552,7 @@ demselben Framework wie *brand_new_bert* geschrieben wurde. Normalerweise reicht es fรผr Ihren Anwendungsfall leicht anzupassen. Zรถgern Sie nicht, das Hugging Face Team zu bitten, Sie auf ein รคhnliches, bereits vorhandenes Konvertierungsskript fรผr Ihr Modell zu finden. -- Wenn Sie ein Modell von TensorFlow nach PyTorch portieren, ist ein guter Ausgangspunkt das Konvertierungsskript von BERT [hier] (https://github.com/huggingface/transformers/blob/7acfa95afb8194f8f9c1f4d2c6028224dbed35a2/src/transformers/models/bert/modeling_bert.py#L91) +- Wenn Sie ein Modell von TensorFlow nach PyTorch portieren, ist ein guter Ausgangspunkt das Konvertierungsskript von BERT [hier](https://github.com/huggingface/transformers/blob/7acfa95afb8194f8f9c1f4d2c6028224dbed35a2/src/transformers/models/bert/modeling_bert.py#L91) - Wenn Sie ein Modell von PyTorch nach PyTorch portieren, ist ein guter Ausgangspunkt das Konvertierungsskript von BART [hier](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bart/convert_bart_original_pytorch_checkpoint_to_pytorch.py) Im Folgenden werden wir kurz erklรคren, wie PyTorch-Modelle Ebenengewichte speichern und Ebenennamen definieren. In PyTorch wird der @@ -682,7 +678,7 @@ model.save_pretrained("/path/to/converted/checkpoint/folder") **7. Implementieren Sie den Vorwรคrtspass** Nachdem es Ihnen gelungen ist, die trainierten Gewichte korrekt in die ๐Ÿค— Transformers-Implementierung zu laden, sollten Sie nun dafรผr sorgen -sicherstellen, dass der Forward Pass korrekt implementiert ist. In [Machen Sie sich mit dem ursprรผnglichen Repository vertraut](#34-run-a-pretrained-checkpoint-using-the-original-repository) haben Sie bereits ein Skript erstellt, das einen Forward Pass +sicherstellen, dass der Forward Pass korrekt implementiert ist. In [Machen Sie sich mit dem ursprรผnglichen Repository vertraut](#3-4-fรผhren-sie-einen-pre-training-checkpoint-mit-dem-original-repository-durch) haben Sie bereits ein Skript erstellt, das einen Forward Pass Durchlauf des Modells unter Verwendung des Original-Repositorys durchfรผhrt. Jetzt sollten Sie ein analoges Skript schreiben, das die ๐Ÿค— Transformers Implementierung anstelle der Originalimplementierung verwenden. Es sollte wie folgt aussehen: @@ -759,7 +755,7 @@ Falls Sie Windows verwenden, sollten Sie `RUN_SLOW=1` durch `SET RUN_SLOW=1` ers Zweitens sollten alle Funktionen, die speziell fรผr *brand_new_bert* sind, zusรคtzlich in einem separaten Test getestet werden unter -`BrandNewBertModelTester`/``BrandNewBertModelTest`. 
Dieser Teil wird oft vergessen, ist aber in zweierlei Hinsicht รคuรŸerst nรผtzlich +`BrandNewBertModelTester`/`BrandNewBertModelTest`. Dieser Teil wird oft vergessen, ist aber in zweierlei Hinsicht รคuรŸerst nรผtzlich Weise: - Er hilft dabei, das Wissen, das Sie wรคhrend der Modellerweiterung erworben haben, an die Community weiterzugeben, indem er zeigt, wie die diff --git a/docs/source/de/add_new_pipeline.md b/docs/source/de/add_new_pipeline.md index 7615ac7bfd5947..47d93e90ac1494 100644 --- a/docs/source/de/add_new_pipeline.md +++ b/docs/source/de/add_new_pipeline.md @@ -15,7 +15,7 @@ rendered properly in your Markdown viewer. # Wie erstellt man eine benutzerdefinierte Pipeline? -In dieser Anleitung sehen wir uns an, wie Sie eine benutzerdefinierte Pipeline erstellen und sie auf dem [Hub](hf.co/models) freigeben oder sie der +In dieser Anleitung sehen wir uns an, wie Sie eine benutzerdefinierte Pipeline erstellen und sie auf dem [Hub](https://hf.co/models) freigeben oder sie der ๐Ÿค— Transformers-Bibliothek hinzufรผgen. Zuallererst mรผssen Sie entscheiden, welche Roheingaben die Pipeline verarbeiten kann. Es kann sich um Strings, rohe Bytes, @@ -208,14 +208,10 @@ from transformers import pipeline classifier = pipeline("pair-classification", model="sgugger/finetuned-bert-mrpc") ``` -Dann kรถnnen wir sie auf dem Hub mit der Methode `save_pretrained` in einem `Repository` freigeben: +Dann kรถnnen wir sie auf dem Hub mit der Methode `push_to_hub` freigeben: ```py -from huggingface_hub import Repository - -repo = Repository("test-dynamic-pipeline", clone_from="{your_username}/test-dynamic-pipeline") -classifier.save_pretrained("test-dynamic-pipeline") -repo.push_to_hub() +classifier.push_to_hub("test-dynamic-pipeline") ``` Dadurch wird die Datei, in der Sie `PairClassificationPipeline` definiert haben, in den Ordner `"test-dynamic-pipeline"` kopiert, @@ -246,13 +242,13 @@ Ausgabe der Pipeline TYPE. AuรŸerdem *mรผssen* Sie 2 (idealerweise 4) Tests implementieren. -- test_small_model_pt` : Definieren Sie 1 kleines Modell fรผr diese Pipeline (es spielt keine Rolle, ob die Ergebnisse keinen Sinn ergeben) +- `test_small_model_pt` : Definieren Sie 1 kleines Modell fรผr diese Pipeline (es spielt keine Rolle, ob die Ergebnisse keinen Sinn ergeben) und testen Sie die Ausgaben der Pipeline. Die Ergebnisse sollten die gleichen sein wie bei `test_small_model_tf`. -- test_small_model_tf : Definieren Sie 1 kleines Modell fรผr diese Pipeline (es spielt keine Rolle, ob die Ergebnisse keinen Sinn ergeben) +- `test_small_model_tf` : Definieren Sie 1 kleines Modell fรผr diese Pipeline (es spielt keine Rolle, ob die Ergebnisse keinen Sinn ergeben) und testen Sie die Ausgaben der Pipeline. Die Ergebnisse sollten die gleichen sein wie bei `test_small_model_pt`. -- test_large_model_pt` (`optional`): Testet die Pipeline an einer echten Pipeline, bei der die Ergebnisse +- `test_large_model_pt` (`optional`): Testet die Pipeline an einer echten Pipeline, bei der die Ergebnisse Sinn machen. Diese Tests sind langsam und sollten als solche gekennzeichnet werden. Hier geht es darum, die Pipeline zu prรคsentieren und sicherzustellen sicherzustellen, dass es in zukรผnftigen Versionen keine Abweichungen gibt. -- test_large_model_tf` (`optional`): Testet die Pipeline an einer echten Pipeline, bei der die Ergebnisse +- `test_large_model_tf` (`optional`): Testet die Pipeline an einer echten Pipeline, bei der die Ergebnisse Sinn machen. Diese Tests sind langsam und sollten als solche gekennzeichnet werden. 
Hier geht es darum, die Pipeline zu prรคsentieren und sicherzustellen sicherzustellen, dass es in zukรผnftigen Versionen keine Abweichungen gibt. diff --git a/docs/source/de/add_tensorflow_model.md b/docs/source/de/add_tensorflow_model.md deleted file mode 100644 index cc640aeb5e64af..00000000000000 --- a/docs/source/de/add_tensorflow_model.md +++ /dev/null @@ -1,356 +0,0 @@ - - -# Wie konvertiert man ein ๐Ÿค— Transformers-Modell in TensorFlow? - -Die Tatsache, dass mehrere Frameworks fรผr die Verwendung mit ๐Ÿค— Transformers zur Verfรผgung stehen, gibt Ihnen die Flexibilitรคt, deren Stรคrken beim Entwurf Ihrer Anwendung auszuspielen. -Ihre Anwendung zu entwerfen, aber das bedeutet auch, dass die Kompatibilitรคt fรผr jedes Modell einzeln hinzugefรผgt werden muss. Die gute Nachricht ist, dass -das Hinzufรผgen von TensorFlow-Kompatibilitรคt zu einem bestehenden Modell einfacher ist als [das Hinzufรผgen eines neuen Modells von Grund auf](add_new_model)! -Ob Sie ein tieferes Verstรคndnis fรผr groรŸe TensorFlow-Modelle haben mรถchten, einen wichtigen Open-Source-Beitrag leisten oder -TensorFlow fรผr das Modell Ihrer Wahl aktivieren wollen, dieser Leitfaden ist fรผr Sie. - -Dieser Leitfaden befรคhigt Sie, ein Mitglied unserer Gemeinschaft, TensorFlow-Modellgewichte und/oder -Architekturen beizusteuern, die in ๐Ÿค— Transformers verwendet werden sollen, und zwar mit minimaler Betreuung durch das Hugging Face Team. Das Schreiben eines neuen Modells -ist keine Kleinigkeit, aber ich hoffe, dass dieser Leitfaden dazu beitrรคgt, dass es weniger eine Achterbahnfahrt ๐ŸŽข und mehr ein Spaziergang im Park ๐Ÿšถ ist. -Die Nutzung unserer kollektiven Erfahrungen ist absolut entscheidend, um diesen Prozess immer einfacher zu machen, und deshalb mรถchten wir -ermutigen Sie daher, Verbesserungsvorschlรคge fรผr diesen Leitfaden zu machen! - -Bevor Sie tiefer eintauchen, empfehlen wir Ihnen, die folgenden Ressourcen zu lesen, wenn Sie neu in ๐Ÿค— Transformers sind: -- [Allgemeiner รœberblick รผber ๐Ÿค— Transformers](add_new_model#general-overview-of-transformers) -- [Die TensorFlow-Philosophie von Hugging Face](https://huggingface.co/blog/tensorflow-philosophy) - -Im Rest dieses Leitfadens werden Sie lernen, was nรถtig ist, um eine neue TensorFlow Modellarchitektur hinzuzufรผgen, die -Verfahren zur Konvertierung von PyTorch in TensorFlow-Modellgewichte und wie Sie Unstimmigkeiten zwischen ML -Frameworks. Legen Sie los! - - - -Sind Sie unsicher, ob das Modell, das Sie verwenden mรถchten, bereits eine entsprechende TensorFlow-Architektur hat? - -  - -รœberprรผfen Sie das Feld `model_type` in der `config.json` des Modells Ihrer Wahl -([Beispiel](https://huggingface.co/bert-base-uncased/blob/main/config.json#L14)). Wenn der entsprechende Modellordner in -๐Ÿค— Transformers eine Datei hat, deren Name mit "modeling_tf" beginnt, bedeutet dies, dass es eine entsprechende TensorFlow -Architektur hat ([Beispiel](https://github.com/huggingface/transformers/tree/main/src/transformers/models/bert)). - - - - -## Schritt-fรผr-Schritt-Anleitung zum Hinzufรผgen von TensorFlow-Modellarchitektur-Code - -Es gibt viele Mรถglichkeiten, eine groรŸe Modellarchitektur zu entwerfen, und viele Mรถglichkeiten, diesen Entwurf zu implementieren. 
Wie auch immer, -Sie erinnern sich vielleicht an unseren [allgemeinen รœberblick รผber ๐Ÿค— Transformers](add_new_model#general-overview-of-transformers) -wissen, dass wir ein meinungsfreudiger Haufen sind - die Benutzerfreundlichkeit von ๐Ÿค— Transformers hรคngt von konsistenten Designentscheidungen ab. Aus -Erfahrung kรถnnen wir Ihnen ein paar wichtige Dinge รผber das Hinzufรผgen von TensorFlow-Modellen sagen: - -- Erfinden Sie das Rad nicht neu! In den meisten Fรคllen gibt es mindestens zwei Referenzimplementierungen, die Sie รผberprรผfen sollten: das -PyTorch-ร„quivalent des Modells, das Sie implementieren, und andere TensorFlow-Modelle fรผr dieselbe Klasse von Problemen. -- Gute Modellimplementierungen รผberleben den Test der Zeit. Dies geschieht nicht, weil der Code hรผbsch ist, sondern eher -sondern weil der Code klar, einfach zu debuggen und darauf aufzubauen ist. Wenn Sie den Maintainern das Leben mit Ihrer -TensorFlow-Implementierung leicht machen, indem Sie die gleichen Muster wie in anderen TensorFlow-Modellen nachbilden und die Abweichung -zur PyTorch-Implementierung minimieren, stellen Sie sicher, dass Ihr Beitrag lange Bestand haben wird. -- Bitten Sie um Hilfe, wenn Sie nicht weiterkommen! Das ๐Ÿค— Transformers-Team ist da, um zu helfen, und wir haben wahrscheinlich Lรถsungen fรผr die gleichen -Probleme gefunden, vor denen Sie stehen. - -Hier finden Sie einen รœberblick รผber die Schritte, die zum Hinzufรผgen einer TensorFlow-Modellarchitektur erforderlich sind: -1. Wรคhlen Sie das Modell, das Sie konvertieren mรถchten -2. Bereiten Sie die Transformers-Entwicklungsumgebung vor. -3. (Optional) Verstehen Sie die theoretischen Aspekte und die bestehende Implementierung -4. Implementieren Sie die Modellarchitektur -5. Implementieren Sie Modelltests -6. Reichen Sie den Pull-Antrag ein -7. (Optional) Erstellen Sie Demos und teilen Sie diese mit der Welt - -### 1.-3. Bereiten Sie Ihren Modellbeitrag vor - -**1. Wรคhlen Sie das Modell, das Sie konvertieren mรถchten** - -Beginnen wir mit den Grundlagen: Als erstes mรผssen Sie die Architektur kennen, die Sie konvertieren mรถchten. Wenn Sie -Sie sich nicht auf eine bestimmte Architektur festgelegt haben, ist es eine gute Mรถglichkeit, das ๐Ÿค— Transformers-Team um Vorschlรคge zu bitten. -Wir werden Sie zu den wichtigsten Architekturen fรผhren, die auf der TensorFlow-Seite noch fehlen. -Seite fehlen. Wenn das spezifische Modell, das Sie mit TensorFlow verwenden mรถchten, bereits eine Implementierung der TensorFlow-Architektur in -๐Ÿค— Transformers, aber es fehlen Gewichte, kรถnnen Sie direkt in den -Abschnitt [Gewichtskonvertierung](#adding-tensorflow-weights-to-hub) -auf dieser Seite. - -Der Einfachheit halber wird im Rest dieser Anleitung davon ausgegangen, dass Sie sich entschieden haben, mit der TensorFlow-Version von -*BrandNewBert* (dasselbe Beispiel wie in der [Anleitung](add_new_model), um ein neues Modell von Grund auf hinzuzufรผgen). - - - -Bevor Sie mit der Arbeit an einer TensorFlow-Modellarchitektur beginnen, sollten Sie sich vergewissern, dass es keine laufenden Bemรผhungen in dieser Richtung gibt. -Sie kรถnnen nach `BrandNewBert` auf der -[pull request GitHub page](https://github.com/huggingface/transformers/pulls?q=is%3Apr), um zu bestรคtigen, dass es keine -TensorFlow-bezogene Pull-Anfrage gibt. - - - - -**2. Transformers-Entwicklungsumgebung vorbereiten** - -Nachdem Sie die Modellarchitektur ausgewรคhlt haben, รถffnen Sie einen PR-Entwurf, um Ihre Absicht zu signalisieren, daran zu arbeiten. 
Folgen Sie den -Anweisungen, um Ihre Umgebung einzurichten und einen PR-Entwurf zu รถffnen. - -1. Forken Sie das [repository](https://github.com/huggingface/transformers), indem Sie auf der Seite des Repositorys auf die Schaltflรคche 'Fork' klicken. - Seite des Repositorys klicken. Dadurch wird eine Kopie des Codes unter Ihrem GitHub-Benutzerkonto erstellt. - -2. Klonen Sie Ihren `transformers` Fork auf Ihre lokale Festplatte und fรผgen Sie das Basis-Repository als Remote hinzu: - -```bash -git clone https://github.com/[your Github handle]/transformers.git -cd transformers -git remote add upstream https://github.com/huggingface/transformers.git -``` - -3. Richten Sie eine Entwicklungsumgebung ein, indem Sie z.B. den folgenden Befehl ausfรผhren: - -```bash -python -m venv .env -source .env/bin/activate -pip install -e ".[dev]" -``` - -Abhรคngig von Ihrem Betriebssystem und da die Anzahl der optionalen Abhรคngigkeiten von Transformers wรคchst, kann es sein, dass Sie bei diesem Befehl einen -Fehler mit diesem Befehl erhalten. Wenn das der Fall ist, stellen Sie sicher, dass Sie TensorFlow installieren und dann ausfรผhren: - -```bash -pip install -e ".[quality]" -``` - -**Hinweis:** Sie mรผssen CUDA nicht installiert haben. Es reicht aus, das neue Modell auf der CPU laufen zu lassen. - -4. Erstellen Sie eine Verzweigung mit einem beschreibenden Namen von Ihrer Hauptverzweigung - -```bash -git checkout -b add_tf_brand_new_bert -``` - -5. Abrufen und zurรผcksetzen auf die aktuelle Hauptversion - -```bash -git fetch upstream -git rebase upstream/main -``` - -6. Fรผgen Sie eine leere `.py` Datei in `transformers/src/models/brandnewbert/` mit dem Namen `modeling_tf_brandnewbert.py` hinzu. Dies wird -Ihre TensorFlow-Modelldatei sein. - -7. รœbertragen Sie die ร„nderungen auf Ihr Konto mit: - -```bash -git add . -git commit -m "initial commit" -git push -u origin add_tf_brand_new_bert -``` - -8. Wenn Sie zufrieden sind, gehen Sie auf die Webseite Ihrer Abspaltung auf GitHub. Klicken Sie auf "Pull request". Stellen Sie sicher, dass Sie das - GitHub-Handle einiger Mitglieder des Hugging Face-Teams als Reviewer hinzuzufรผgen, damit das Hugging Face-Team รผber zukรผnftige ร„nderungen informiert wird. - zukรผnftige ร„nderungen benachrichtigt wird. - -9. ร„ndern Sie den PR in einen Entwurf, indem Sie auf der rechten Seite der GitHub-Pull-Request-Webseite auf "In Entwurf umwandeln" klicken. - - -Jetzt haben Sie eine Entwicklungsumgebung eingerichtet, um *BrandNewBert* nach TensorFlow in ๐Ÿค— Transformers zu portieren. - - -**3. (Optional) Verstehen Sie die theoretischen Aspekte und die bestehende Implementierung** - -Sie sollten sich etwas Zeit nehmen, um die Arbeit von *BrandNewBert* zu lesen, falls eine solche Beschreibung existiert. Mรถglicherweise gibt es groรŸe -Abschnitte des Papiers, die schwer zu verstehen sind. Wenn das der Fall ist, ist das in Ordnung - machen Sie sich keine Sorgen! Das Ziel ist -ist es nicht, ein tiefes theoretisches Verstรคndnis des Papiers zu erlangen, sondern die notwendigen Informationen zu extrahieren, um -das Modell mit Hilfe von TensorFlow effektiv in ๐Ÿค— Transformers neu zu implementieren. Das heiรŸt, Sie mรผssen nicht zu viel Zeit auf die -viel Zeit auf die theoretischen Aspekte verwenden, sondern sich lieber auf die praktischen Aspekte konzentrieren, nรคmlich auf die bestehende Modelldokumentation -Seite (z.B. [model docs for BERT](model_doc/bert)). 
- -Nachdem Sie die Grundlagen der Modelle, die Sie implementieren wollen, verstanden haben, ist es wichtig, die bestehende -Implementierung zu verstehen. Dies ist eine gute Gelegenheit, sich zu vergewissern, dass eine funktionierende Implementierung mit Ihren Erwartungen an das -Modell entspricht, und um technische Herausforderungen auf der TensorFlow-Seite vorauszusehen. - -Es ist ganz natรผrlich, dass Sie sich von der Menge an Informationen, die Sie gerade aufgesogen haben, รผberwรคltigt fรผhlen. Es ist -Es ist definitiv nicht erforderlich, dass Sie in dieser Phase alle Facetten des Modells verstehen. Dennoch empfehlen wir Ihnen dringend -ermutigen wir Sie, alle dringenden Fragen in unserem [Forum](https://discuss.huggingface.co/) zu klรคren. - - -### 4. Implementierung des Modells - -Jetzt ist es an der Zeit, endlich mit dem Programmieren zu beginnen. Als Ausgangspunkt empfehlen wir die PyTorch-Datei selbst: Kopieren Sie den Inhalt von -modeling_brand_new_bert.py` in `src/transformers/models/brand_new_bert/` nach -modeling_tf_brand_new_bert.py`. Das Ziel dieses Abschnitts ist es, die Datei zu รคndern und die Importstruktur von -๐Ÿค— Transformers zu aktualisieren, so dass Sie `TFBrandNewBert` und -`TFBrandNewBert.from_pretrained(model_repo, from_pt=True)` erfolgreich ein funktionierendes TensorFlow *BrandNewBert* Modell lรคdt. - -Leider gibt es kein Rezept, um ein PyTorch-Modell in TensorFlow zu konvertieren. Sie kรถnnen jedoch unsere Auswahl an -Tipps befolgen, um den Prozess so reibungslos wie mรถglich zu gestalten: -- Stellen Sie `TF` dem Namen aller Klassen voran (z.B. wird `BrandNewBert` zu `TFBrandNewBert`). -- Die meisten PyTorch-Operationen haben einen direkten TensorFlow-Ersatz. Zum Beispiel entspricht `torch.nn.Linear` der Klasse - `tf.keras.layers.Dense`, `torch.nn.Dropout` entspricht `tf.keras.layers.Dropout`, usw. Wenn Sie sich nicht sicher sind - รผber eine bestimmte Operation nicht sicher sind, kรถnnen Sie die [TensorFlow-Dokumentation](https://www.tensorflow.org/api_docs/python/tf) - oder die [PyTorch-Dokumentation](https://pytorch.org/docs/stable/). -- Suchen Sie nach Mustern in der Codebasis von ๐Ÿค— Transformers. Wenn Sie auf eine bestimmte Operation stoรŸen, fรผr die es keinen direkten Ersatz gibt - Ersatz hat, stehen die Chancen gut, dass jemand anderes bereits das gleiche Problem hatte. -- Behalten Sie standardmรครŸig die gleichen Variablennamen und die gleiche Struktur wie in PyTorch bei. Dies erleichtert die Fehlersuche, die Verfolgung von - Probleme zu verfolgen und spรคtere Korrekturen vorzunehmen. -- Einige Ebenen haben in jedem Framework unterschiedliche Standardwerte. Ein bemerkenswertes Beispiel ist die Schicht fรผr die Batch-Normalisierung - epsilon (`1e-5` in [PyTorch](https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html#torch.nn.BatchNorm2d) - und `1e-3` in [TensorFlow](https://www.tensorflow.org/api_docs/python/tf/keras/layers/BatchNormalization)). - Prรผfen Sie die Dokumentation genau! -- Die Variablen `nn.Parameter` von PyTorch mรผssen in der Regel innerhalb von TF Layer's `build()` initialisiert werden. 
Siehe das folgende - Beispiel: [PyTorch](https://github.com/huggingface/transformers/blob/655f72a6896c0533b1bdee519ed65a059c2425ac/src/transformers/models/vit_mae/modeling_vit_mae.py#L212) / - [TensorFlow](https://github.com/huggingface/transformers/blob/655f72a6896c0533b1bdee519ed65a059c2425ac/src/transformers/models/vit_mae/modeling_tf_vit_mae.py#L220) -- Wenn das PyTorch-Modell ein `#copied from ...` am Anfang einer Funktion hat, stehen die Chancen gut, dass Ihr TensorFlow-Modell diese Funktion auch - diese Funktion von der Architektur ausleihen kann, von der sie kopiert wurde, vorausgesetzt, es hat eine TensorFlow-Architektur. -- Die korrekte Zuweisung des Attributs `name` in TensorFlow-Funktionen ist entscheidend, um das `from_pt=True` Gewicht zu erreichen - Cross-Loading. Name" ist fast immer der Name der entsprechenden Variablen im PyTorch-Code. Wenn `name` nicht - nicht richtig gesetzt ist, sehen Sie dies in der Fehlermeldung beim Laden der Modellgewichte. -- Die Logik der Basismodellklasse, `BrandNewBertModel`, befindet sich in `TFBrandNewBertMainLayer`, einer Keras - Schicht-Unterklasse ([Beispiel](https://github.com/huggingface/transformers/blob/4fd32a1f499e45f009c2c0dea4d81c321cba7e02/src/transformers/models/bert/modeling_tf_bert.py#L719)). - TFBrandNewBertModel" ist lediglich ein Wrapper fรผr diese Schicht. -- Keras-Modelle mรผssen erstellt werden, um die vorher trainierten Gewichte zu laden. Aus diesem Grund muss `TFBrandNewBertPreTrainedModel` - ein Beispiel fรผr die Eingaben in das Modell enthalten, die `dummy_inputs` - ([Beispiel](https://github.com/huggingface/transformers/blob/4fd32a1f499e45f009c2c0dea4d81c321cba7e02/src/transformers/models/bert/modeling_tf_bert.py#L916)). -- Wenn Sie nicht weiterkommen, fragen Sie nach Hilfe - wir sind fรผr Sie da! ๐Ÿค— - -Neben der Modelldatei selbst mรผssen Sie auch die Verweise auf die Modellklassen und die zugehรถrigen -Dokumentationsseiten hinzufรผgen. Sie kรถnnen diesen Teil ganz nach den Mustern in anderen PRs erledigen -([Beispiel](https://github.com/huggingface/transformers/pull/18020/files)). Hier ist eine Liste der erforderlichen manuellen -ร„nderungen: -- Fรผgen Sie alle รถffentlichen Klassen von *BrandNewBert* in `src/transformers/__init__.py` ein. -- Fรผgen Sie *BrandNewBert* Klassen zu den entsprechenden Auto Klassen in `src/transformers/models/auto/modeling_tf_auto.py` hinzu. -- Fรผgen Sie die *BrandNewBert* zugehรถrigen Klassen fรผr trรคges Laden in `src/transformers/utils/dummy_tf_objects.py` hinzu. -- Aktualisieren Sie die Importstrukturen fรผr die รถffentlichen Klassen in `src/transformers/models/brand_new_bert/__init__.py`. -- Fรผgen Sie die Dokumentationszeiger auf die รถffentlichen Methoden von *BrandNewBert* in `docs/source/de/model_doc/brand_new_bert.md` hinzu. -- Fรผgen Sie sich selbst zur Liste der Mitwirkenden an *BrandNewBert* in `docs/source/de/model_doc/brand_new_bert.md` hinzu. -- Fรผgen Sie schlieรŸlich ein grรผnes Hรคkchen โœ… in der TensorFlow-Spalte von *BrandNewBert* in `docs/source/de/index.md` hinzu. - -Wenn Sie mit Ihrer Implementierung zufrieden sind, fรผhren Sie die folgende Checkliste aus, um zu bestรคtigen, dass Ihre Modellarchitektur -fertig ist: -1. Alle Schichten, die sich zur Trainingszeit anders verhalten (z.B. Dropout), werden mit einem `Training` Argument aufgerufen, das -von den Top-Level-Klassen weitergegeben wird -2. Sie haben `#copied from ...` verwendet, wann immer es mรถglich war. -3. 
Die Funktion `TFBrandNewBertMainLayer` und alle Klassen, die sie verwenden, haben ihre Funktion `call` mit `@unpack_inputs` dekoriert -4. TFBrandNewBertMainLayer` ist mit `@keras_serializable` dekoriert -5. Ein TensorFlow-Modell kann aus PyTorch-Gewichten mit `TFBrandNewBert.from_pretrained(model_repo, from_pt=True)` geladen werden. -6. Sie kรถnnen das TensorFlow Modell mit dem erwarteten Eingabeformat aufrufen - - -### 5. Modell-Tests hinzufรผgen - -Hurra, Sie haben ein TensorFlow-Modell implementiert! Jetzt ist es an der Zeit, Tests hinzuzufรผgen, um sicherzustellen, dass sich Ihr Modell wie erwartet verhรคlt. -erwartet. Wie im vorigen Abschnitt schlagen wir vor, dass Sie zunรคchst die Datei `test_modeling_brand_new_bert.py` in -`tests/models/brand_new_bert/` in die Datei `test_modeling_tf_brand_new_bert.py` zu kopieren und dann die notwendigen -TensorFlow-Ersetzungen vornehmen. Fรผr den Moment sollten Sie in allen Aufrufen von `.from_pretrained()` das Flag `from_pt=True` verwenden, um die -die vorhandenen PyTorch-Gewichte zu laden. - -Wenn Sie damit fertig sind, kommt der Moment der Wahrheit: Fรผhren Sie die Tests durch! ๐Ÿ˜ฌ - -```bash -NVIDIA_TF32_OVERRIDE=0 RUN_SLOW=1 RUN_PT_TF_CROSS_TESTS=1 \ -py.test -vv tests/models/brand_new_bert/test_modeling_tf_brand_new_bert.py -``` - -Das wahrscheinlichste Ergebnis ist, dass Sie eine Reihe von Fehlern sehen werden. Machen Sie sich keine Sorgen, das ist zu erwarten! Das Debuggen von ML-Modellen ist -notorisch schwierig, und der Schlรผssel zum Erfolg ist Geduld (und `breakpoint()`). Nach unserer Erfahrung sind die schwierigsten -Probleme aus subtilen Unstimmigkeiten zwischen ML-Frameworks, zu denen wir am Ende dieses Leitfadens ein paar Hinweise geben. -In anderen Fรคllen kann es sein, dass ein allgemeiner Test nicht direkt auf Ihr Modell anwendbar ist; in diesem Fall empfehlen wir eine รœberschreibung -auf der Ebene der Modelltestklasse. Zรถgern Sie nicht, in Ihrem Entwurf einer Pull-Anfrage um Hilfe zu bitten, wenn -Sie nicht weiterkommen. - -Wenn alle Tests erfolgreich waren, kรถnnen Sie Ihr Modell in die ๐Ÿค— Transformers-Bibliothek aufnehmen! ๐ŸŽ‰ - -### 6.-7. Stellen Sie sicher, dass jeder Ihr Modell verwenden kann - -**6. Reichen Sie den Pull Request ein** - -Sobald Sie mit der Implementierung und den Tests fertig sind, ist es an der Zeit, eine Pull-Anfrage einzureichen. Bevor Sie Ihren Code einreichen, -fรผhren Sie unser Dienstprogramm zur Codeformatierung, `make fixup` ๐Ÿช„, aus. Damit werden automatisch alle Formatierungsfehler behoben, die dazu fรผhren wรผrden, dass -unsere automatischen Prรผfungen fehlschlagen wรผrden. - -Nun ist es an der Zeit, Ihren Entwurf einer Pull-Anfrage in eine echte Pull-Anfrage umzuwandeln. Klicken Sie dazu auf die Schaltflรคche "Bereit fรผr -Review" und fรผgen Sie Joao (`@gante`) und Matt (`@Rocketknight1`) als Reviewer hinzu. Eine Modell-Pull-Anfrage benรถtigt -mindestens 3 Reviewer, aber sie werden sich darum kรผmmern, geeignete zusรคtzliche Reviewer fรผr Ihr Modell zu finden. - -Nachdem alle Gutachter mit dem Stand Ihres PR zufrieden sind, entfernen Sie als letzten Aktionspunkt das Flag `from_pt=True` in -.from_pretrained()-Aufrufen zu entfernen. Da es keine TensorFlow-Gewichte gibt, mรผssen Sie sie hinzufรผgen! Lesen Sie den Abschnitt -unten, um zu erfahren, wie Sie dies tun kรถnnen. 
- -Wenn schlieรŸlich die TensorFlow-Gewichte zusammengefรผhrt werden, Sie mindestens 3 Genehmigungen von Prรผfern haben und alle CI-Checks grรผn sind -grรผn sind, รผberprรผfen Sie die Tests ein letztes Mal lokal - -```bash -NVIDIA_TF32_OVERRIDE=0 RUN_SLOW=1 RUN_PT_TF_CROSS_TESTS=1 \ -py.test -vv tests/models/brand_new_bert/test_modeling_tf_brand_new_bert.py -``` - -und wir werden Ihren PR zusammenfรผhren! Herzlichen Glรผckwunsch zu dem Meilenstein ๐ŸŽ‰. - -**7. (Optional) Erstellen Sie Demos und teilen Sie sie mit der Welt** - -Eine der schwierigsten Aufgaben bei Open-Source ist die Entdeckung. Wie kรถnnen die anderen Benutzer von der Existenz Ihres -fabelhaften TensorFlow-Beitrags erfahren? Mit der richtigen Kommunikation, natรผrlich! ๐Ÿ“ฃ - -Es gibt vor allem zwei Mรถglichkeiten, Ihr Modell mit der Community zu teilen: -- Erstellen Sie Demos. Dazu gehรถren Gradio-Demos, Notebooks und andere unterhaltsame Mรถglichkeiten, Ihr Modell vorzufรผhren. Wir raten Ihnen - ermutigen Sie, ein Notizbuch zu unseren [community-driven demos](https://huggingface.co/docs/transformers/community) hinzuzufรผgen. -- Teilen Sie Geschichten in sozialen Medien wie Twitter und LinkedIn. Sie sollten stolz auf Ihre Arbeit sein und sie mit der - Ihre Leistung mit der Community teilen - Ihr Modell kann nun von Tausenden von Ingenieuren und Forschern auf der ganzen Welt genutzt werden - der Welt genutzt werden ๐ŸŒ! Wir werden Ihre Beitrรคge gerne retweeten und Ihnen helfen, Ihre Arbeit mit der Community zu teilen. - - -## Hinzufรผgen von TensorFlow-Gewichten zum ๐Ÿค— Hub - -Unter der Annahme, dass die TensorFlow-Modellarchitektur in ๐Ÿค— Transformers verfรผgbar ist, ist die Umwandlung von PyTorch-Gewichten in -TensorFlow-Gewichte ist ein Kinderspiel! - -Hier sehen Sie, wie es geht: -1. Stellen Sie sicher, dass Sie in Ihrem Terminal bei Ihrem Hugging Face Konto angemeldet sind. Sie kรถnnen sich mit dem folgenden Befehl anmelden - `huggingface-cli login` (Ihre Zugangstoken finden Sie [hier](https://huggingface.co/settings/tokens)) -2. Fรผhren Sie `transformers-cli pt-to-tf --model-name foo/bar` aus, wobei `foo/bar` der Name des Modell-Repositorys ist - ist, das die PyTorch-Gewichte enthรคlt, die Sie konvertieren mรถchten. -3. Markieren Sie `@joaogante` und `@Rocketknight1` in dem ๐Ÿค— Hub PR, den der obige Befehl gerade erstellt hat - -Das war's! ๐ŸŽ‰ - - -## Fehlersuche in verschiedenen ML-Frameworks ๐Ÿ› - -Irgendwann, wenn Sie eine neue Architektur hinzufรผgen oder TensorFlow-Gewichte fรผr eine bestehende Architektur erstellen, werden Sie -stoรŸen Sie vielleicht auf Fehler, die sich รผber Unstimmigkeiten zwischen PyTorch und TensorFlow beschweren. Sie kรถnnten sich sogar dazu entschlieรŸen, den -Modellarchitektur-Code fรผr die beiden Frameworks zu รถffnen, und stellen fest, dass sie identisch aussehen. Was ist denn da los? ๐Ÿค” - -Lassen Sie uns zunรคchst darรผber sprechen, warum es wichtig ist, diese Diskrepanzen zu verstehen. Viele Community-Mitglieder werden ๐Ÿค— -Transformers-Modelle und vertrauen darauf, dass sich unsere Modelle wie erwartet verhalten. Wenn es eine groรŸe Diskrepanz gibt -zwischen den beiden Frameworks auftritt, bedeutet dies, dass das Modell nicht der Referenzimplementierung fรผr mindestens eines der Frameworks folgt. -der Frameworks folgt. Dies kann zu stillen Fehlern fรผhren, bei denen das Modell zwar lรคuft, aber eine schlechte Leistung aufweist. Dies ist -wohl schlimmer als ein Modell, das รผberhaupt nicht lรคuft! 
Aus diesem Grund streben wir an, dass die Abweichung zwischen den Frameworks in allen Phasen des Modells kleiner als `1e-5` ist.
-
-Wie bei anderen numerischen Problemen auch, steckt der Teufel im Detail. Und wie bei jedem detailorientierten Handwerk ist die geheime Zutat hier Geduld. Hier ist unser Vorschlag für den Arbeitsablauf, wenn Sie auf diese Art von Problemen stoßen:
-1. Lokalisieren Sie die Quelle der Abweichungen. Das Modell, das Sie konvertieren, hat wahrscheinlich bis zu einem bestimmten Punkt nahezu identische innere Variablen. Platzieren Sie `breakpoint()`-Anweisungen in den Architekturen der beiden Frameworks und vergleichen Sie die Werte der numerischen Variablen von oben nach unten, bis Sie die Quelle der Probleme gefunden haben.
-2. Nachdem Sie nun die Ursache des Problems gefunden haben, setzen Sie sich mit dem 🤗 Transformers-Team in Verbindung. Es ist möglich, dass wir ein ähnliches Problem schon einmal gesehen haben und umgehend eine Lösung anbieten können. Als Ausweichmöglichkeit können Sie beliebte Seiten wie StackOverflow und GitHub-Issues durchsuchen.
-3. Wenn keine Lösung in Sicht ist, bedeutet das, dass Sie tiefer gehen müssen. Die gute Nachricht ist, dass Sie das Problem ausfindig gemacht haben, so dass Sie sich auf die problematische Anweisung konzentrieren und den Rest des Modells ausblenden können! Die schlechte Nachricht ist, dass Sie sich in die Quellimplementierung der besagten Anweisung einarbeiten müssen. In manchen Fällen finden Sie vielleicht ein Problem mit einer Referenzimplementierung; scheuen Sie sich nicht, ein Issue im Upstream-Repository zu öffnen.
-
-In einigen Fällen können wir nach Rücksprache mit dem 🤗 Transformers-Team zu dem Schluss kommen, dass die Behebung der Abweichung nicht machbar ist. Wenn die Abweichung in den Ausgabeschichten des Modells sehr klein ist (aber möglicherweise groß in den versteckten Zuständen), könnten wir beschließen, sie zu ignorieren und das Modell zu verteilen. Die oben erwähnte CLI `pt-to-tf` hat ein `--max-error` Flag, um die Fehlermeldung bei der Gewichtskonvertierung zu überschreiben.
diff --git a/docs/source/de/autoclass_tutorial.md b/docs/source/de/autoclass_tutorial.md
index 7707f7b39b4910..5dea87ca552c1a 100644
--- a/docs/source/de/autoclass_tutorial.md
+++ b/docs/source/de/autoclass_tutorial.md
@@ -20,7 +20,7 @@ Bei so vielen verschiedenen Transformator-Architekturen kann es eine Herausforde
-Denken Sie daran, dass sich die Architektur auf das Skelett des Modells bezieht und die Checkpoints die Gewichte für eine bestimmte Architektur sind. Zum Beispiel ist [BERT](https://huggingface.co/bert-base-uncased) eine Architektur, während `bert-base-uncased` ein Checkpoint ist. Modell ist ein allgemeiner Begriff, der entweder Architektur oder Prüfpunkt bedeuten kann.
+Denken Sie daran, dass sich die Architektur auf das Skelett des Modells bezieht und die Checkpoints die Gewichte für eine bestimmte Architektur sind. Zum Beispiel ist [BERT](https://huggingface.co/google-bert/bert-base-uncased) eine Architektur, während `google-bert/bert-base-uncased` ein Checkpoint ist. Modell ist ein allgemeiner Begriff, der entweder Architektur oder Prüfpunkt bedeuten kann.
@@ -40,7 +40,7 @@ Laden Sie einen Tokenizer mit [`AutoTokenizer.from_pretrained`]: ```py >>> from transformers import AutoTokenizer ->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased") +>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased") ``` Dann tokenisieren Sie Ihre Eingabe wie unten gezeigt: @@ -88,7 +88,7 @@ Mit den `AutoModelFor`-Klassen kรถnnen Sie schlieรŸlich ein vortrainiertes Model ```py >>> from transformers import AutoModelForSequenceClassification ->>> model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased") +>>> model = AutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased") ``` Sie kรถnnen denselben Prรผfpunkt problemlos wiederverwenden, um eine Architektur fรผr eine andere Aufgabe zu laden: @@ -96,7 +96,7 @@ Sie kรถnnen denselben Prรผfpunkt problemlos wiederverwenden, um eine Architektur ```py >>> from transformers import AutoModelForTokenClassification ->>> model = AutoModelForTokenClassification.from_pretrained("distilbert-base-uncased") +>>> model = AutoModelForTokenClassification.from_pretrained("distilbert/distilbert-base-uncased") ``` @@ -115,7 +115,7 @@ Mit den Klassen `TFAutoModelFor` schlieรŸlich kรถnnen Sie ein vortrainiertes Mod ```py >>> from transformers import TFAutoModelForSequenceClassification ->>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased") +>>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased") ``` Sie kรถnnen denselben Prรผfpunkt problemlos wiederverwenden, um eine Architektur fรผr eine andere Aufgabe zu laden: @@ -123,7 +123,7 @@ Sie kรถnnen denselben Prรผfpunkt problemlos wiederverwenden, um eine Architektur ```py >>> from transformers import TFAutoModelForTokenClassification ->>> model = TFAutoModelForTokenClassification.from_pretrained("distilbert-base-uncased") +>>> model = TFAutoModelForTokenClassification.from_pretrained("distilbert/distilbert-base-uncased") ``` Im Allgemeinen empfehlen wir, die Klasse "AutoTokenizer" und die Klasse "TFAutoModelFor" zu verwenden, um vortrainierte Instanzen von Modellen zu laden. Dadurch wird sichergestellt, dass Sie jedes Mal die richtige Architektur laden. Im nรคchsten [Tutorial] (Vorverarbeitung) erfahren Sie, wie Sie Ihren neu geladenen Tokenizer, Feature Extractor und Prozessor verwenden, um einen Datensatz fรผr die Feinabstimmung vorzuverarbeiten. diff --git a/docs/source/de/contributing.md b/docs/source/de/contributing.md new file mode 100644 index 00000000000000..4c0e131a352242 --- /dev/null +++ b/docs/source/de/contributing.md @@ -0,0 +1,334 @@ + + +# Zu ๐Ÿค— Transformers beitragen + +Jeder ist willkommen, einen Beitrag zu leisten, und wir schรคtzen den Beitrag jedes Einzelnen. Codebeitrรคge sind nicht der einzige Weg, der Community zu helfen. Fragen zu beantworten, anderen zu helfen und die Dokumentation zu verbessern, sind ebenfalls รคuรŸerst wertvoll. + +Es hilft uns auch, wenn Sie das Projekt weiterempfehlen! Erwรคhnen Sie die Bibliothek in Blogposts รผber die groรŸartigen Projekte, die sie ermรถglicht hat, tweeten Sie, wenn sie Ihnen geholfen hat, oder hinterlassen Sie dem Repository ein โญ๏ธ, um Danke zu sagen. + +Wie auch immer Sie sich entscheiden beizutragen, seien Sie achtsam und respektieren Sie unseren [Verhaltenskodex](https://github.com/huggingface/transformers/blob/main/CODE_OF_CONDUCT.md). 
+ +**Dieser Leitfaden wurde stark durch den fantastischen [scikit-learn-Leitfaden fรผr Beitrรคge](https://github.com/scikit-learn/scikit-learn/blob/main/CONTRIBUTING.md) inspiriert.** + +## Beitragsmรถglichkeiten + +Es gibt mehrere Wege, wie Sie zu ๐Ÿค— Transformers beitragen kรถnnen: + +* Beheben Sie bestehende Probleme im vorhandenen Code. +* Erstellen Sie Issues im Zusammenhang mit Fehlern oder gewรผnschten neuen Funktionen. +* Implementieren Sie neue Modelle. +* Tragen Sie zu den Beispielen oder zur Dokumentation bei. + +Wenn Sie nicht wissen, wo Sie anfangen sollen, gibt es eine spezielle Liste von [Good First Issues](https://github.com/huggingface/transformers/contribute). Sie bietet Ihnen eine Liste offener und anfรคngerfreundlicher Probleme und hilft Ihnen, einen ersten Beitrag zu Open-Source zu leisten. Idealerweise erstellen Sie eine Pull-Anfrage und verlinken sie mit dem Issue, an dem Sie arbeiten mรถchten. Wir versuchen, erstellte PRs bevorzugt zu behandeln, da wir so den Fortschritt leicht verfolgen kรถnnen, und die Option besteht, dass jemand anderes den PR รผbernehmen kann, falls der Beitragende keine Zeit mehr hat. + +Fรผr etwas mehr Herausforderung, kรถnnen Sie auch einen Blick auf die Liste der [Good Second Issues](https://github.com/huggingface/transformers/labels/Good%20Second%20Issue) werfen. Generell gilt: Legen Sie los, wenn Sie sich den Anforderungen gewachsen sehen und wir helfen Ihnen dabei! ๐Ÿš€ + +> Alle Beitrรคge sind fรผr die Community gleichermaรŸen wertvoll. ๐Ÿฅฐ + +## Bestehende Probleme beheben + +Wenn Ihnen ein Problem im vorhandenen Code auffรคllt und Sie eine Lรถsung im Sinn haben, kรถnnen Sie gerne einen Beitrag leisten und [eine Pull-Anfrage erstellen](#eine-pull-anfrage-erstellen)! + +## Ein fehlerspezifisches Issue oder eine Feature-Anfrage erstellen + +Tun Sie Ihr Bestes, diesen Richtlinien zu folgen, wenn Sie ein fehlerspezifisches Issue erstellen oder eine Feature-Anfrage einreichen. Das macht es uns leichter, Ihnen schnell und mit gutem Feedback zu antworten. + +### Haben Sie einen Fehler gefunden? + +Die ๐Ÿค— Transformers-Bibliothek verdankt ihre Robustheit und Zuverlรคssigkeit aller Nutzer, die frisch entdeckte Probleme melden. + +Wir wรผrden es wirklich schรคtzen, wenn Sie **sicherstellen kรถnnten, dass der Fehler noch nicht gemeldet wurde** (verwenden Sie die Suchleiste auf GitHub unter Issues), bevor Sie ein Issue erstellen. Ihr Problem sollte sich auch auf Fehler in der Bibliothek selbst und nicht auf Ihren eigenen Code beziehen. Wenn Sie sich nicht sicher sind, ob der Fehler in Ihrem eigenen Code oder der Bibliothek liegt, fragen Sie bitte zuerst im [Forum](https://discuss.huggingface.co/) nach. Das hilft uns, schneller auf Probleme im Zusammenhang mit der Bibliothek zu reagieren, anstatt auf allgemeine Fragen. + +Wenn Sie sich vergewissert haben, dass der Fehler noch nicht gemeldet wurde, geben Sie bitte die folgenden Informationen in Ihrem Issue an, damit wir es schnell beheben kรถnnen: + +* Ihr **Betriebssystem und Version** sowie die Versionen von **Python**, **PyTorch** und **TensorFlow**, falls zutreffend. +* Ein kurzes und unabhรคngiges Code-Snippet, das es uns ermรถglicht, den Fehler in weniger als 30 Sekunden nachzustellen. +* Den *vollstรคndigen* Traceback, wenn eine Ausnahme geworfen wird. +* Fรผgen Sie weitere hilfreiche Informationen, wie z. B. Screenshots, an. 
+ +Um das Betriebssystem und die Softwareversionen automatisch auszugeben, fรผhren Sie den folgenden Befehl aus: + +```bash +transformers-cli env +``` + +Sie kรถnnen denselben Befehl auch im Hauptverzeichnis des Repositorys ausfรผhren: + +```bash +python src/transformers/commands/transformers_cli.py env +``` + +### Mรถchten Sie eine neue Funktion? + +Wenn Sie eine bestimmte neue Funktion in ๐Ÿค— Transformers sehen mรถchten, erstellen Sie bitte ein Issue und fรผgen Sie eine Beschreibung hinzu: + +1. Was ist die *Motivation* hinter dieser Funktion? Steht sie in Zusammenhang mit einem Problem oder einer Frustration mit der Bibliothek? Ist es eine Funktion, die Sie fรผr ein Projekt benรถtigen? Ist es etwas, an dem Sie gearbeitet haben und denken, dass es der Community nutzen kรถnnte? + + Was auch immer es ist, wir wรผrden uns freuen, davon zu hรถren! + +1. Beschreiben Sie Ihre gewรผnschte Funktion so detailliert wie mรถglich. Je mehr Sie uns darรผber erzรคhlen kรถnnen, desto besser kรถnnen wir Ihnen helfen. +1. Stellen Sie einen *Code-Schnipsel* bereit, der die Funktionsweise demonstriert. +1. Falls die Funktion auf einem Paper beruht, verlinken Sie dieses bitte. + +Wenn Ihr Issue gut geschrieben ist, sind wir zum Zeitpunkt seiner Erstellung bereits zu 80 % fertig. + +Wir haben [Vorlagen](https://github.com/huggingface/transformers/tree/main/templates) hinzugefรผgt, um Ihnen den Start Ihres Issues zu erleichtern. + +## Mรถchten Sie ein neues Modell implementieren? + +Es werden stรคndig neue Modelle verรถffentlicht. Wenn Sie ein neues Modell implementieren mรถchten, geben Sie bitte folgende Informationen an: + +* Eine kurze Beschreibung des Modells und einen Link zum Paper. +* Link zur Implementierung, falls sie Open-Source ist. +* Link zu den Modellgewichten, falls verfรผgbar. + +Lassen Sie es uns wissen, wenn Sie bereit sind, das Modell selbst beizutragen. Dann kรถnnen wir Ihnen helfen, es zu ๐Ÿค— Transformers hinzuzufรผgen! + +Wir haben auch einen technischen Leitfaden dazu, [wie man ein Modell zu ๐Ÿค— Transformers hinzufรผgt](https://huggingface.co/docs/transformers/add_new_model). + +## Mรถchten Sie die Dokumentation erweitern? + +Wir sind immer auf der Suche nach Verbesserungen, die die Dokumentation klarer und prรคziser machen. Bitte teilen Sie uns Verbesserungsvorschlรคge mit, wie z. B. Tippfehler und fehlende, unklare oder ungenaue Inhalte. Wir รผbernehmen gerne die ร„nderungen oder helfen Ihnen, einen Beitrag zu leisten, wenn Sie daran interessiert sind! + +Fรผr weitere Einzelheiten darรผber, wie man die Dokumentation generiert, erstellt und schreibt, werfen Sie einen Blick auf das [README](https://github.com/huggingface/transformers/tree/main/docs) der Dokumentation. + +## Eine Pull-Anfrage erstellen + +Bevor Sie irgendwelchen Code schreiben, empfehlen wir Ihnen dringend, die bestehenden PRs oder Issues zu durchsuchen, um sicherzustellen, dass niemand bereits an diesem Thema arbeitet. Wenn Sie sich unsicher sind, ist es immer eine gute Idee, nach Feedback in einem neuen Issue zu fragen. + +Sie benรถtigen grundlegende `git`-Kenntnisse, um zu ๐Ÿค— Transformers beizutragen. Obwohl `git` nicht das einfachste Werkzeug ist, hat es ein sehr gutes Handbuch. Geben Sie `git --help` in eine Shell ein und genieรŸen Sie es! Wenn Sie Bรผcher bevorzugen, ist [Pro Git](https://git-scm.com/book/en/v2) eine gute Anlaufstelle. + +Sie benรถtigen **[Python 3.8](https://github.com/huggingface/transformers/blob/main/setup.py#L426)** oder hรถher, um zu ๐Ÿค— Transformers beizutragen. 
Folgen Sie den nachstehenden Schritten, um mit dem Beitrag zu beginnen: + +1. Forken Sie das [Repository](https://github.com/huggingface/transformers), indem Sie auf den **[Fork](https://github.com/huggingface/transformers/fork)**-Button auf der Seite des Repositorys klicken. Dadurch wird eine Kopie des Codes auf Ihrem GitHub-Account erstellt. + +1. Klonen Sie Ihren Fork auf Ihre lokale Festplatte und fรผgen Sie das ursprรผngliche Repository als Remote hinzu: + + ```bash + git clone git@github.com:/transformers.git + cd transformers + git remote add upstream https://github.com/huggingface/transformers.git + ``` + +1. Erstellen Sie einen neuen Branch, um Ihre ร„nderungen zu speichern: + + ```bash + git checkout -b a-descriptive-name-for-my-changes + ``` + + ๐Ÿšจ Arbeiten Sie **nicht** auf dem `main` Branch! + +1. Richten Sie eine Entwicklungsumgebung ein, indem Sie den folgenden Befehl in einer virtuellen Umgebung ausfรผhren: + + ```bash + pip install -e ".[dev]" + ``` + + Wenn ๐Ÿค— Transformers bereits in der virtuellen Umgebung installiert war, entfernen Sie es mit `pip uninstall transformers`, bevor Sie es im bearbeitbaren Modus mit dem `-e` Flag neu installieren. + + Abhรคngig von Ihrem Betriebssystem und durch die wachsende Anzahl der optionalen Abhรคngigkeiten von Transformers kรถnnten Sie mit diesem Befehl einen Fehler verursachen. Wenn das der Fall ist, stellen Sie sicher, dass Sie ihr bevorzugtes Deep-Learning-Framework (PyTorch, TensorFlow und/oder Flax) installieren und anschlieรŸend den folgenden Befehl ausfรผhren: + + ```bash + pip install -e ".[quality]" + ``` + + Dies sollte fรผr die meisten Anwendungsfรคlle ausreichend sein. + +1. Entwickeln Sie die Funktionen in Ihrem Branch. + + Wรคhrend Sie an Ihrem Code arbeiten, sollten Sie sicherstellen, dass die Test-Suite erfolgreich durchlรคuft. Fรผhren Sie die von Ihren ร„nderungen betroffenen Tests wie folgt aus: + + ```bash + pytest tests/.py + ``` + + Weitere Informationen รผber Tests finden Sie in der Anleitung zum Thema [Testen](https://huggingface.co/docs/transformers/testing). + + ๐Ÿค— Transformers stรผtzt sich auf `black` und `ruff`, um seinen Quellcode konsistent zu formatieren. Nachdem Sie ร„nderungen vorgenommen haben, wenden Sie automatische Stilkorrekturen und Codeprรผfungen, die nicht automatisiert werden kรถnnen, in einem Schritt an: + + ```bash + make fixup + ``` + + Dieser Task ist optimiert, nur mit Dateien zu arbeiten, die von Ihrer PR modifiziert wurden. + + Wenn Sie die Prรผfungen nacheinander ausfรผhren mรถchten, wendet der folgende Befehl die Stilkorrekturen an: + + ```bash + make style + ``` + + ๐Ÿค— Transformers verwendet auch `ruff` und einige benutzerdefinierte Skripte, um auf Programmierfehler zu prรผfen. Qualitรคtskontrollen werden von der CI durchgefรผhrt, aber Sie kรถnnen die gleichen รœberprรผfungen auch selbst ausfรผhren: + + ```bash + make quality + ``` + + AbschlieรŸend haben wir viele Skripte, die sicherstellen, dass wir alle betroffenen Dateien aktualisieren, wenn wir ein neues Modell hinzufรผgen. Sie kรถnnen diese wie folgt ausfรผhren: + + ```bash + make repo-consistency + ``` + + Um mehr รผber diese Prรผfungen zu erfahren und wie man mit ihnen Probleme behebt, lesen Sie den Leitfaden zu [รœberprรผfungen bei einer Pull-Anfrage](https://huggingface.co/docs/transformers/pr_checks). + + Wenn Sie Dokumente im Verzeichnis `docs/source` รคndern, stellen Sie sicher, dass die Dokumentation noch generiert werden kann. Diese Prรผfung wird auch im CI laufen, wenn Sie eine Pull-Anfrage erstellen. 
Um eine lokale Prรผfung durchzufรผhren, mรผssen Sie den Dukumentation-Builder installieren: + + ```bash + pip install ".[docs]" + ``` + + Fรผhren Sie den folgenden Befehl im Hauptverzeichnis des Repositorys aus: + + ```bash + doc-builder build transformers docs/source/en --build_dir ~/tmp/test-build + ``` + + Dadurch wird die Dokumentation im Ordner `~/tmp/test-build` erstellt, wo Sie die erzeugten Markdown-Dateien mit Ihrem bevorzugten Editor รผberprรผfen kรถnnen. Sie kรถnnen auch eine Vorschau der Dokumentation auf GitHub sehen, wenn Sie eine Pull-Anfrage รถffnen. + + Wenn Sie mit Ihren ร„nderungen zufrieden sind, fรผgen Sie die geรคnderten Dateien mit `git add` hinzu und speichern Sie Ihre ร„nderungen lokal mit `git commit`: + + ```bash + git add modified_file.py + git commit + ``` + + Bitte achten Sie darauf, [gute Commit-Nachrichten](https://chris.beams.io/posts/git-commit/) zu schreiben, um die von Ihnen vorgenommenen ร„nderungen klar zu kommunizieren! + + Um Ihre Kopie des Codes auf dem aktuellen Stand des ursprรผnglichen Repositorys zu halten, rebasen Sie Ihren Branch auf `upstream/branch` *bevor* Sie eine Pull-Anfrage รถffnen oder falls Sie von einem Maintainer dazu aufgefordert werden: + + ```bash + git fetch upstream + git rebase upstream/main + ``` + + Pushen Sie Ihre ร„nderungen in Ihrem Branch: + + ```bash + git push -u origin a-descriptive-name-for-my-changes + ``` + + Wenn Sie bereits eine Pull-Anfrage erstellt haben, mรผssen Sie den Push mit dem `--force` Flag erzwingen. Andernfalls, wenn die Pull-Anfrage noch nicht erstellt wurde, kรถnnen Sie Ihre ร„nderungen normal pushen. + +1. Jetzt kรถnnen Sie zu Ihrem Fork des Repositorys auf GitHub gehen und auf **Pull-Anfrage** klicken, um eine Pull-Anfrage zu erstellen. Stellen Sie sicher, dass Sie alle Punkte auf unserer [Checkliste](#checkliste-fรผr-pull-anfragen) unten abhaken. Wenn Sie fertig sind, kรถnnen Sie Ihre ร„nderungen zur รœberprรผfung an die Projektverantwortlichen senden. + +1. Es ist kein Problem, wenn die Maintainer ร„nderungen beantragen, das geschieht auch bei unseren Kernmitarbeitern! Damit jeder die ร„nderungen in der Pull-Anfrage sehen kann, arbeiten Sie in Ihrem lokalen Branch und pushen die ร„nderungen zu Ihrem Fork. Sie werden automatisch in der Pull-Anfrage erscheinen. + +### Checkliste fรผr Pull-Anfragen + +โ˜ Der Titel der Pull-Anfrage sollte Ihren Beitrag zusammenfassen.
+☐ Wenn Ihre Pull-Anfrage ein bestimmtes Issue bearbeitet, erwähnen Sie bitte die zugehörige Nummer in der Beschreibung der Pull-Anfrage, sodass diese verlinkt sind (und Personen, die das Issue lesen, wissen, dass Sie daran arbeiten).
+☐ Um eine fortlaufende Bearbeitung anzuzeigen, versehen Sie bitte den Titel mit einem `[WIP]` Präfix. Diese sind nützlich, um doppelte Arbeit zu verhindern und sie von PRs abzuheben, die bereit zum Zusammenführen sind.
+☐ Stellen Sie sicher, dass existierende Tests bestanden werden.
+☐ Wenn Sie eine neue Funktion hinzufügen, erstellen Sie auch Tests dafür.
+
+* Wenn Sie ein neues Modell hinzufügen, stellen Sie sicher, dass Sie `ModelTester.all_model_classes = (MyModel, MyModelWithLMHead,...)` verwenden, um die gemeinsamen Tests auszulösen.
+* Wenn Sie neue `@slow` Tests hinzufügen, stellen Sie mit `RUN_SLOW=1 python -m pytest tests/models/my_new_model/test_my_new_model.py` sicher, dass diese erfolgreich durchlaufen.
+* Wenn Sie einen neuen Tokenizer hinzufügen, schreiben Sie Tests und stellen Sie mit `RUN_SLOW=1 python -m pytest tests/models/{your_model_name}/test_tokenization_{your_model_name}.py` sicher, dass diese erfolgreich durchlaufen.
+* CircleCI führt die langsamen Tests nicht aus, aber GitHub Actions tut dies jede Nacht!
+
+☐ Alle public Methoden müssen informative Docstrings haben (siehe [`modeling_bert.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py) als Beispiel).
+โ˜ Aufgrund des schnell wachsenden Repositorys fรผgen Sie bitte keine Bilder, Videos oder andere Nicht-Textdateien hinzu, die das Repository erheblich belasten wรผrden. Verwenden Sie stattdessen ein Hub-Repository wie [`hf-internal-testing`](https://huggingface.co/hf-internal-testing), um diese Dateien zu hosten und sie per URL zu verlinken. Wir empfehlen Bilder, die zur Dokumentation gehรถren, im folgenden Repository abzulegen: [huggingface/documentation-images](https://huggingface.co/datasets/huggingface/documentation-images). Sie kรถnnen eine PR in diesem Datasets-Repository erstellen und ein Hugging-Face-Mitglied bitten, sie zu mergen. + +Um mehr รผber die Prรผfungen zu erfahren, die bei einer Pull-Anfrage ausgelรถst werden, lesen Sie unseren Leitfaden zu [รœberprรผfungen bei einer Pull-Anfrage](https://huggingface.co/docs/transformers/pr_checks). + +### Tests + +Eine umfangreiche Test-Suite ist enthalten, um das Verhalten der Bibliothek und mehrerer Beispiele zu testen. Tests fรผr die Bibliothek und Beispiele finden Sie jeweils im [tests](https://github.com/huggingface/transformers/tree/main/tests) und im [examples](https://github.com/huggingface/transformers/tree/main/examples) Ordner. + +Wir bevorzugen `pytest` und `pytest-xdist`, weil es schneller ist. Geben Sie einen *Pfad zu einem Unterordner oder einer Testdatei* vom Hauptverzeichnis des Repositorys aus an, um den Test auszufรผhren: + +```bash +python -m pytest -n auto --dist=loadfile -s -v ./tests/models/my_new_model +``` + +Analog fรผr den `examples` Ordner, geben Sie einen *Pfad zu einem Unterordner oder einer Testdatei* an, um den Test auszufรผhren. Z. B. fรผhrt der folgende Befehl den Test des Unterordners fรผr Textklassifizierung im PyTorch `examples` Ordner durch: + +```bash +pip install -r examples/xxx/requirements.txt # nur beim ersten Mal erforderlich +python -m pytest -n auto --dist=loadfile -s -v ./examples/pytorch/text-classification +``` + +Tatsรคchlich ist dies genau, wie unsere `make test` und `make test-examples` Befehle implementiert sind (abgesehen von `pip install`)! + +Sie kรถnnen auch eine kleinere Anzahl an Tests angeben, um nur die Funktion, an der Sie arbeiten, zu testen. + +StandardmรครŸig werden langsame Tests รผbersprungen, aber Sie kรถnnen die Umgebungsvariable `RUN_SLOW` auf `yes` setzen, um sie auszufรผhren. Dies wird den Download vieler Gigabyte an Modellen starten - stellen Sie also sicher, dass Sie sowohl genรผgend Festplattenspeicher als auch eine gute Internetverbindung oder die nรถtige Geduld haben! + + + +Vergessen Sie nicht, einen *Pfad zu einem Unterordner oder einer Testdatei* anzugeben, um den Test auszufรผhren. Sonst fรผhren Sie alle Tests im `tests` oder `examples` Ordner aus, was sehr lange dauern wird! + + + +```bash +RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./tests/models/my_new_model +RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./examples/pytorch/text-classification +``` + +Wie bei den langsamen Tests gibt es auch andere Umgebungsvariablen, die standardmรครŸig beim Testen nicht gesetzt sind: + +* `RUN_CUSTOM_TOKENIZERS`: Aktiviert Tests fรผr benutzerdefinierte Tokenizer. +* `RUN_PT_FLAX_CROSS_TESTS`: Aktiviert Tests fรผr die Integration von PyTorch + Flax. +* `RUN_PT_TF_CROSS_TESTS`: Aktiviert Tests fรผr die Integration von TensorFlow + PyTorch. + +Weitere Umgebungsvariablen und zusรคtzliche Informationen finden Sie in der [testing_utils.py](src/transformers/testing_utils.py). + +๐Ÿค— Transformers verwendet `pytest` nur als Test-Runner. 
Es verwendet keine `pytest`-spezifischen Funktionen in der Test-Suite selbst. + +Das bedeutet, `unittest` wird vollstรคndig unterstรผtzt. Folgend wird beschrieben, wie man Tests mit `unittest` ausfรผhrt: + +```bash +python -m unittest discover -s tests -t . -v +python -m unittest discover -s examples -t examples -v +``` + +### Stil-Leitfaden + +Fรผr Docstrings befolgt ๐Ÿค— Transformers den [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html). +Lesen Sie unseren [Leitfaden zum Schreiben von Dokumentationen](https://github.com/huggingface/transformers/tree/main/docs#writing-documentation---specification) fรผr weitere Informationen. + +### Entwickeln unter Windows + +Unter Windows (falls Sie nicht im [Windows-Subsystem fรผr Linux](https://learn.microsoft.com/en-us/windows/wsl/) oder WSL arbeiten) mรผssen Sie git so konfigurieren, dass Windows `CRLF` in Linux `LF` Zeilenenden umgewandelt werden: + +```bash +git config core.autocrlf input +``` + +Eine Mรถglichkeit, den `make`-Befehl unter Windows auszufรผhren, ist mit MSYS2: + +1. Laden Sie [MSYS2](https://www.msys2.org/) herunter und installieren Sie es nach `C:\msys64`. +1. ร–ffnen Sie die Kommandozeile `C:\msys64\msys2.exe` (sie sollte vom **Start**-Menรผ aus verfรผgbar sein). +1. Fรผhren Sie den Befehl in der Shell aus: `pacman -Syu` und installieren Sie `make` mit `pacman -S make`. +1. Fรผgen Sie `C:\msys64\usr\bin` an Ihrer PATH-Umgebungsvariable an. + +Sie kรถnnen nun `make` aus jedem Terminal heraus verwenden (PowerShell, cmd.exe usw.)! ๐ŸŽ‰ + +### Ein geforktes Repository mit dem Haupt-Repository von Hugging Face synchronisieren + +Beim Aktualisieren des main-Branches eines geforkten Repositories beachten Sie bitte die folgenden Schritte, um das Anpingen des Haupt-Repositorys zu vermeiden, was unnรถtige Verweise in abhรคngigen PRs vermerkt und beteiligte Entwickler benachrichtigt: + +1. Wenn mรถglich, vermeiden Sie die Synchronisation mit dem Haupt-Repository รผber einen Branch und PR im geforkten Repository. Mergen Sie stattdessen direkt in den main-Branch des Forks. +1. Wenn ein PR unbedingt notwendig ist, verwenden Sie die folgenden Schritte, nachdem Sie Ihren Branch ausgecheckt haben: + + ```bash + git checkout -b your-branch-for-syncing + git pull --squash --no-commit upstream main + git commit -m '' + git push --set-upstream origin your-branch-for-syncing + ``` diff --git a/docs/source/de/index.md b/docs/source/de/index.md index 4742a99f643c07..5ddabb4e7382e1 100644 --- a/docs/source/de/index.md +++ b/docs/source/de/index.md @@ -100,10 +100,10 @@ Die Bibliothek enthรคlt derzeit JAX-, PyTorch- und TensorFlow-Implementierungen, 1. **[FNet](model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon. 1. **[Funnel Transformer](model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le. 1. **[GLPN](model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim. -1. 
**[GPT](model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. +1. **[GPT](model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://openai.com/research/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. 1. **[GPT Neo](model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy. 1. **[GPT NeoX](model_doc/gpt_neox)** (from EleutherAI) released with the paper [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://arxiv.org/abs/2204.06745) by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach -1. **[GPT-2](model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**. +1. **[GPT-2](model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://openai.com/research/better-language-models/) by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever. 1. **[GPT-J](model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki. 1. **[GPTSAN-japanese](model_doc/gptsan-japanese)** released in the repository [tanreinama/GPTSAN](https://github.com/tanreinama/GPTSAN/blob/main/report/model.md) by Toshiyuki Sakamoto(tanreinama). 1. **[GroupViT](model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang. diff --git a/docs/source/de/installation.md b/docs/source/de/installation.md index 295c9cad97bc69..1bd34f73302b27 100644 --- a/docs/source/de/installation.md +++ b/docs/source/de/installation.md @@ -94,7 +94,7 @@ Installieren wir ๐Ÿค— Transformers aus dem Quellcode mit dem folgenden Befehl: pip install git+https://github.com/huggingface/transformers ``` -Dieser Befehl installiert die aktuelle `main` Version und nicht die neueste `stable` Version. Die `main`-Version ist nรผtzlich, um mit den neuesten Entwicklungen Schritt zu halten. Zum Beispiel, wenn ein Fehler seit der letzten offiziellen Version behoben wurde, aber eine neue Version noch nicht verรถffentlicht wurde. Das bedeutet jedoch, dass die "Hauptversion" nicht immer stabil ist. Wir bemรผhen uns, die Hauptversion einsatzbereit zu halten, und die meisten Probleme werden normalerweise innerhalb weniger Stunden oder eines Tages behoben. Wenn Sie auf ein Problem stoรŸen, รถffnen Sie bitte ein [Issue] (https://github.com/huggingface/transformers/issues), damit wir es noch schneller beheben kรถnnen! +Dieser Befehl installiert die aktuelle `main` Version und nicht die neueste `stable` Version. 
Die `main`-Version ist nรผtzlich, um mit den neuesten Entwicklungen Schritt zu halten. Zum Beispiel, wenn ein Fehler seit der letzten offiziellen Version behoben wurde, aber eine neue Version noch nicht verรถffentlicht wurde. Das bedeutet jedoch, dass die "Hauptversion" nicht immer stabil ist. Wir bemรผhen uns, die Hauptversion einsatzbereit zu halten, und die meisten Probleme werden normalerweise innerhalb weniger Stunden oder eines Tages behoben. Wenn Sie auf ein Problem stoรŸen, รถffnen Sie bitte ein [Issue](https://github.com/huggingface/transformers/issues), damit wir es noch schneller beheben kรถnnen! รœberprรผfen wir, ob ๐Ÿค— Transformers richtig installiert wurde, indem Sie den folgenden Befehl ausfรผhren: @@ -139,10 +139,10 @@ Ihre Python-Umgebung wird beim nรคchsten Ausfรผhren die `main`-Version von ๐Ÿค— ## Installation mit conda -Installation von dem conda Kanal `huggingface`: +Installation von dem conda Kanal `conda-forge`: ```bash -conda install -c huggingface transformers +conda install conda-forge::transformers ``` ## Cache Einrichtung @@ -157,12 +157,12 @@ Vorgefertigte Modelle werden heruntergeladen und lokal zwischengespeichert unter Transformers verwendet die Shell-Umgebungsvariablen `PYTORCH_TRANSFORMERS_CACHE` oder `PYTORCH_PRETRAINED_BERT_CACHE`, wenn Sie von einer frรผheren Iteration dieser Bibliothek kommen und diese Umgebungsvariablen gesetzt haben, sofern Sie nicht die Shell-Umgebungsvariable `TRANSFORMERS_CACHE` angeben. - + ## Offline Modus -Transformers ist in der Lage, in einer Firewall- oder Offline-Umgebung zu laufen, indem es nur lokale Dateien verwendet. Setzen Sie die Umgebungsvariable `TRANSFORMERS_OFFLINE=1`, um dieses Verhalten zu aktivieren. +Transformers ist in der Lage, in einer Firewall- oder Offline-Umgebung zu laufen, indem es nur lokale Dateien verwendet. Setzen Sie die Umgebungsvariable `HF_HUB_OFFLINE=1`, um dieses Verhalten zu aktivieren. @@ -173,14 +173,14 @@ Fรผgen sie [๐Ÿค— Datasets](https://huggingface.co/docs/datasets/) zu Ihrem Offli So wรผrden Sie beispielsweise ein Programm in einem normalen Netzwerk mit einer Firewall fรผr externe Instanzen mit dem folgenden Befehl ausfรผhren: ```bash -python examples/pytorch/translation/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ... +python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ... ``` Fรผhren Sie das gleiche Programm in einer Offline-Instanz mit aus: ```bash -HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \ -python examples/pytorch/translation/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ... +HF_DATASETS_OFFLINE=1 HF_HUB_OFFLINE=1 \ +python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ... ``` Das Skript sollte nun laufen, ohne sich aufzuhรคngen oder eine Zeitรผberschreitung abzuwarten, da es weiรŸ, dass es nur nach lokalen Dateien suchen soll. @@ -245,6 +245,6 @@ Sobald Ihre Datei heruntergeladen und lokal zwischengespeichert ist, geben Sie d -Weitere Informationen zum Herunterladen von Dateien, die auf dem Hub gespeichert sind, finden Sie im Abschnitt [Wie man Dateien vom Hub herunterlรคdt] (https://huggingface.co/docs/hub/how-to-downstream). 
- +Weitere Informationen zum Herunterladen von Dateien, die auf dem Hub gespeichert sind, finden Sie im Abschnitt [Wie man Dateien vom Hub herunterlรคdt](https://huggingface.co/docs/hub/how-to-downstream). + diff --git a/docs/source/de/llm_tutorial.md b/docs/source/de/llm_tutorial.md index 1c5da41032831b..ea4a96632cb1de 100644 --- a/docs/source/de/llm_tutorial.md +++ b/docs/source/de/llm_tutorial.md @@ -103,7 +103,7 @@ Als nรคchstes mรผssen Sie Ihre Texteingabe mit einem [tokenizer](tokenizer_summa Die Variable `model_inputs` enthรคlt die tokenisierte Texteingabe sowie die Aufmerksamkeitsmaske. Obwohl [`~generation.GenerationMixin.generate`] sein Bestes tut, um die Aufmerksamkeitsmaske abzuleiten, wenn sie nicht รผbergeben wird, empfehlen wir, sie fรผr optimale Ergebnisse wann immer mรถglich zu รผbergeben. -Rufen Sie schlieรŸlich die Methode [~generation.GenerationMixin.generate] auf, um die generierten Token zurรผckzugeben, die vor dem Drucken in Text umgewandelt werden sollten. +Rufen Sie schlieรŸlich die Methode [`~generation.GenerationMixin.generate`] auf, um die generierten Token zurรผckzugeben, die vor dem Drucken in Text umgewandelt werden sollten. ```py >>> generated_ids = model.generate(**model_inputs) @@ -130,7 +130,7 @@ Es gibt viele [Generierungsstrategien](generation_strategies), und manchmal sind ### Generierte Ausgabe ist zu kurz/lang -Wenn in der Datei [~generation.GenerationConfig`] nichts angegeben ist, gibt `generate` standardmรครŸig bis zu 20 Token zurรผck. Wir empfehlen dringend, `max_new_tokens` in Ihrem `generate`-Aufruf manuell zu setzen, um die maximale Anzahl neuer Token zu kontrollieren, die zurรผckgegeben werden kรถnnen. Beachten Sie, dass LLMs (genauer gesagt, [decoder-only models](https://huggingface.co/learn/nlp-course/chapter1/6?fw=pt)) auch die Eingabeaufforderung als Teil der Ausgabe zurรผckgeben. +Wenn in der Datei [`~generation.GenerationConfig`] nichts angegeben ist, gibt `generate` standardmรครŸig bis zu 20 Token zurรผck. Wir empfehlen dringend, `max_new_tokens` in Ihrem `generate`-Aufruf manuell zu setzen, um die maximale Anzahl neuer Token zu kontrollieren, die zurรผckgegeben werden kรถnnen. Beachten Sie, dass LLMs (genauer gesagt, [decoder-only models](https://huggingface.co/learn/nlp-course/chapter1/6?fw=pt)) auch die Eingabeaufforderung als Teil der Ausgabe zurรผckgeben. ```py @@ -149,7 +149,7 @@ Wenn in der Datei [~generation.GenerationConfig`] nichts angegeben ist, gibt `ge ### Falscher Generierungsmodus -StandardmรครŸig und sofern nicht in der Datei [~generation.GenerationConfig`] angegeben, wรคhlt `generate` bei jeder Iteration das wahrscheinlichste Token aus (gierige Dekodierung). Je nach Aufgabe kann dies unerwรผnscht sein; kreative Aufgaben wie Chatbots oder das Schreiben eines Aufsatzes profitieren vom Sampling. Andererseits profitieren Aufgaben, bei denen es auf die Eingabe ankommt, wie z.B. Audiotranskription oder รœbersetzung, von der gierigen Dekodierung. Aktivieren Sie das Sampling mit `do_sample=True`. Mehr zu diesem Thema erfahren Sie in diesem [Blogbeitrag] (https://huggingface.co/blog/how-to-generate). +StandardmรครŸig und sofern nicht in der Datei [`~generation.GenerationConfig`] angegeben, wรคhlt `generate` bei jeder Iteration das wahrscheinlichste Token aus (gierige Dekodierung). Je nach Aufgabe kann dies unerwรผnscht sein; kreative Aufgaben wie Chatbots oder das Schreiben eines Aufsatzes profitieren vom Sampling. Andererseits profitieren Aufgaben, bei denen es auf die Eingabe ankommt, wie z.B. 
Audiotranskription oder รœbersetzung, von der gierigen Dekodierung. Aktivieren Sie das Sampling mit `do_sample=True`. Mehr zu diesem Thema erfahren Sie in diesem [Blogbeitrag](https://huggingface.co/blog/how-to-generate). ```py >>> # Set seed or reproducibility -- you don't need this unless you want full reproducibility diff --git a/docs/source/de/model_sharing.md b/docs/source/de/model_sharing.md index 415277e00e5ee9..6bbb6e10cb4942 100644 --- a/docs/source/de/model_sharing.md +++ b/docs/source/de/model_sharing.md @@ -229,4 +229,4 @@ Um sicherzustellen, dass die Benutzer die Fรคhigkeiten, Grenzen, mรถglichen Verz * Manuelles Erstellen und Hochladen einer "README.md"-Datei. * Klicken Sie auf die Schaltflรคche **Modellkarte bearbeiten** in Ihrem Modell-Repository. -Werfen Sie einen Blick auf die DistilBert [model card](https://huggingface.co/distilbert-base-uncased) als gutes Beispiel fรผr die Art von Informationen, die eine Modellkarte enthalten sollte. Weitere Details รผber andere Optionen, die Sie in der Datei "README.md" einstellen kรถnnen, wie z.B. den Kohlenstoff-FuรŸabdruck eines Modells oder Beispiele fรผr Widgets, finden Sie in der Dokumentation [hier](https://huggingface.co/docs/hub/models-cards). \ No newline at end of file +Werfen Sie einen Blick auf die DistilBert [model card](https://huggingface.co/distilbert/distilbert-base-uncased) als gutes Beispiel fรผr die Art von Informationen, die eine Modellkarte enthalten sollte. Weitere Details รผber andere Optionen, die Sie in der Datei "README.md" einstellen kรถnnen, wie z.B. den Kohlenstoff-FuรŸabdruck eines Modells oder Beispiele fรผr Widgets, finden Sie in der Dokumentation [hier](https://huggingface.co/docs/hub/models-cards). \ No newline at end of file diff --git a/docs/source/de/peft.md b/docs/source/de/peft.md index bdc0684d798d3a..eda8ce9435a055 100644 --- a/docs/source/de/peft.md +++ b/docs/source/de/peft.md @@ -86,10 +86,10 @@ model.load_adapter(peft_model_id) Die `bitsandbytes`-Integration unterstรผtzt Datentypen mit 8bit und 4bit Genauigkeit, was fรผr das Laden groรŸer Modelle nรผtzlich ist, weil es Speicher spart (lesen Sie den `bitsandbytes`-Integrations [guide](./quantization#bitsandbytes-integration), um mehr zu erfahren). Fรผgen Sie die Parameter `load_in_8bit` oder `load_in_4bit` zu [`~PreTrainedModel.from_pretrained`] hinzu und setzen Sie `device_map="auto"`, um das Modell effektiv auf Ihre Hardware zu verteilen: ```py -from transformers import AutoModelForCausalLM, AutoTokenizer +from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig peft_model_id = "ybelkada/opt-350m-lora" -model = AutoModelForCausalLM.from_pretrained(peft_model_id, device_map="auto", load_in_8bit=True) +model = AutoModelForCausalLM.from_pretrained(peft_model_id, quantization_config=BitsAndBytesConfig(load_in_8bit=True)) ``` ## Einen neuen Adapter hinzufรผgen diff --git a/docs/source/de/pipeline_tutorial.md b/docs/source/de/pipeline_tutorial.md index 06ab440d73a61b..5106af9b2fafc7 100644 --- a/docs/source/de/pipeline_tutorial.md +++ b/docs/source/de/pipeline_tutorial.md @@ -71,13 +71,13 @@ Alle zusรคtzlichen Parameter fรผr Ihre Aufgabe kรถnnen auch in die [`pipeline`] ### Wรคhlen Sie ein Modell und einen Tokenizer -Die [`pipeline`] akzeptiert jedes Modell aus dem [Hub] (https://huggingface.co/models). Auf dem Hub gibt es Tags, mit denen Sie nach einem Modell filtern kรถnnen, das Sie fรผr Ihre Aufgabe verwenden mรถchten. 
Sobald Sie ein passendes Modell ausgewรคhlt haben, laden Sie es mit der entsprechenden `AutoModelFor` und [`AutoTokenizer`] Klasse. Laden Sie zum Beispiel die Klasse [`AutoModelForCausalLM`] fรผr eine kausale Sprachmodellierungsaufgabe: +Die [`pipeline`] akzeptiert jedes Modell aus dem [Hub](https://huggingface.co/models). Auf dem Hub gibt es Tags, mit denen Sie nach einem Modell filtern kรถnnen, das Sie fรผr Ihre Aufgabe verwenden mรถchten. Sobald Sie ein passendes Modell ausgewรคhlt haben, laden Sie es mit der entsprechenden `AutoModelFor` und [`AutoTokenizer`] Klasse. Laden Sie zum Beispiel die Klasse [`AutoModelForCausalLM`] fรผr eine kausale Sprachmodellierungsaufgabe: ```py >>> from transformers import AutoTokenizer, AutoModelForCausalLM ->>> tokenizer = AutoTokenizer.from_pretrained("distilgpt2") ->>> model = AutoModelForCausalLM.from_pretrained("distilgpt2") +>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilgpt2") +>>> model = AutoModelForCausalLM.from_pretrained("distilbert/distilgpt2") ``` Erstellen Sie eine [`pipeline`] fรผr Ihre Aufgabe, und geben Sie das Modell und den Tokenizer an, die Sie geladen haben: diff --git a/docs/source/de/preprocessing.md b/docs/source/de/preprocessing.md index 9c977e10a538a3..b56a5c0ae4ca1c 100644 --- a/docs/source/de/preprocessing.md +++ b/docs/source/de/preprocessing.md @@ -45,7 +45,7 @@ Laden Sie einen vortrainierten Tokenizer mit [`AutoTokenizer.from_pretrained`]: ```py >>> from transformers import AutoTokenizer ->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased") +>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased") ``` Dann รผbergeben Sie Ihren Satz an den Tokenizer: @@ -248,7 +248,7 @@ Der Datensatz [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) hat zum 'sampling_rate': 8000} ``` -1. Verwenden Sie die Methode [~datasets.Dataset.cast_column] von ๐Ÿค— Datasets, um die Abtastrate auf 16kHz zu erhรถhen: +1. Verwenden Sie die Methode [`~datasets.Dataset.cast_column`] von ๐Ÿค— Datasets, um die Abtastrate auf 16kHz zu erhรถhen: ```py >>> dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000)) @@ -344,7 +344,7 @@ Laden wir den [food101](https://huggingface.co/datasets/food101) Datensatz fรผr >>> dataset = load_dataset("food101", split="train[:100]") ``` -Als Nรคchstes sehen Sie sich das Bild mit dem Merkmal ๐Ÿค— Datensรคtze [Bild] (https://huggingface.co/docs/datasets/package_reference/main_classes?highlight=image#datasets.Image) an: +Als Nรคchstes sehen Sie sich das Bild mit dem Merkmal ๐Ÿค— Datensรคtze [Bild](https://huggingface.co/docs/datasets/package_reference/main_classes?highlight=image#datasets.Image) an: ```py >>> dataset[0]["image"] @@ -476,7 +476,7 @@ Erinnern Sie sich an den frรผheren Abschnitt รผber die Verarbeitung von Audiodat ### Prozessor -Ein Processor kombiniert einen Feature-Extraktor und einen Tokenizer. Laden Sie einen Processor mit [`AutoProcessor.from_pretrained]: +Ein Processor kombiniert einen Feature-Extraktor und einen Tokenizer. 
Laden Sie einen Processor mit [`AutoProcessor.from_pretrained`]: ```py >>> from transformers import AutoProcessor diff --git a/docs/source/de/quicktour.md b/docs/source/de/quicktour.md index 2b66d2d6a917e9..01cd7200750c4d 100644 --- a/docs/source/de/quicktour.md +++ b/docs/source/de/quicktour.md @@ -89,7 +89,7 @@ Importieren sie die [`pipeline`] und spezifizieren sie die Aufgabe, welche sie l >>> classifier = pipeline("sentiment-analysis") ``` -Die Pipeline lรคdt ein standardmรครŸiges [vortrainiertes Modell] (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) und einen Tokenizer fรผr die Stimmungs-Analyse herunter und speichert sie. Jetzt kรถnnen Sie den "Klassifikator" auf Ihren Zieltext anwenden: +Die Pipeline lรคdt ein standardmรครŸiges [vortrainiertes Modell](https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english) und einen Tokenizer fรผr die Stimmungs-Analyse herunter und speichert sie. Jetzt kรถnnen Sie den "Klassifikator" auf Ihren Zieltext anwenden: ```py >>> classifier("We are very happy to show you the ๐Ÿค— Transformers library.") @@ -148,7 +148,7 @@ Bei einem grรถรŸeren Datensatz mit vielen Eingaben (wie bei Sprache oder Bildver ### Ein anderes Modell und einen anderen Tokenizer in der Pipeline verwenden -Die [`pipeline`] kann jedes Modell aus dem [Model Hub] (https://huggingface.co/models) verwenden, wodurch es einfach ist, die [`pipeline`] fรผr andere Anwendungsfรคlle anzupassen. Wenn Sie beispielsweise ein Modell wรผnschen, das franzรถsischen Text verarbeiten kann, verwenden Sie die Tags im Model Hub, um nach einem geeigneten Modell zu filtern. Das oberste gefilterte Ergebnis liefert ein mehrsprachiges [BERT-Modell](https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment), das auf die Stimmungsanalyse abgestimmt ist. GroรŸartig, verwenden wir dieses Modell! +Die [`pipeline`] kann jedes Modell aus dem [Model Hub](https://huggingface.co/models) verwenden, wodurch es einfach ist, die [`pipeline`] fรผr andere Anwendungsfรคlle anzupassen. Wenn Sie beispielsweise ein Modell wรผnschen, das franzรถsischen Text verarbeiten kann, verwenden Sie die Tags im Model Hub, um nach einem geeigneten Modell zu filtern. Das oberste gefilterte Ergebnis liefert ein mehrsprachiges [BERT-Modell](https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment), das auf die Stimmungsanalyse abgestimmt ist. GroรŸartig, verwenden wir dieses Modell! ```py >>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment" @@ -407,7 +407,7 @@ Beginnen Sie mit dem Import von [`AutoConfig`] und laden Sie dann das trainierte ```py >>> from transformers import AutoConfig ->>> my_config = AutoConfig.from_pretrained("distilbert-base-uncased", n_heads=12) +>>> my_config = AutoConfig.from_pretrained("distilbert/distilbert-base-uncased", n_heads=12) ``` diff --git a/docs/source/de/run_scripts.md b/docs/source/de/run_scripts.md index 4afe72dae6d662..17b725827dd7ec 100644 --- a/docs/source/de/run_scripts.md +++ b/docs/source/de/run_scripts.md @@ -16,13 +16,13 @@ rendered properly in your Markdown viewer. # Trainieren mit einem Skript -Neben den ๐Ÿค— Transformers [notebooks](./noteboks/README) gibt es auch Beispielskripte, die zeigen, wie man ein Modell fรผr eine Aufgabe mit [PyTorch](https://github.com/huggingface/transformers/tree/main/examples/pytorch), [TensorFlow](https://github.com/huggingface/transformers/tree/main/examples/tensorflow) oder [JAX/Flax](https://github.com/huggingface/transformers/tree/main/examples/flax) trainiert. 
+Neben den ๐Ÿค— Transformers [notebooks](./notebooks) gibt es auch Beispielskripte, die zeigen, wie man ein Modell fรผr eine Aufgabe mit [PyTorch](https://github.com/huggingface/transformers/tree/main/examples/pytorch), [TensorFlow](https://github.com/huggingface/transformers/tree/main/examples/tensorflow) oder [JAX/Flax](https://github.com/huggingface/transformers/tree/main/examples/flax) trainiert. Sie werden auch Skripte finden, die wir in unseren [Forschungsprojekten](https://github.com/huggingface/transformers/tree/main/examples/research_projects) und [Legacy-Beispielen](https://github.com/huggingface/transformers/tree/main/examples/legacy) verwendet haben und die grรถรŸtenteils von der Community stammen. Diese Skripte werden nicht aktiv gepflegt und erfordern eine bestimmte Version von ๐Ÿค— Transformers, die hรถchstwahrscheinlich nicht mit der neuesten Version der Bibliothek kompatibel ist. Es wird nicht erwartet, dass die Beispielskripte bei jedem Problem sofort funktionieren. Mรถglicherweise mรผssen Sie das Skript an das Problem anpassen, das Sie zu lรถsen versuchen. Um Ihnen dabei zu helfen, legen die meisten Skripte vollstรคndig offen, wie die Daten vorverarbeitet werden, so dass Sie sie nach Bedarf fรผr Ihren Anwendungsfall bearbeiten kรถnnen. -Fรผr jede Funktion, die Sie in einem Beispielskript implementieren mรถchten, diskutieren Sie bitte im [Forum] (https://discuss.huggingface.co/) oder in einem [issue] (https://github.com/huggingface/transformers/issues), bevor Sie einen Pull Request einreichen. Wir freuen uns zwar รผber Fehlerkorrekturen, aber es ist unwahrscheinlich, dass wir einen Pull Request zusammenfรผhren, der mehr Funktionalitรคt auf Kosten der Lesbarkeit hinzufรผgt. +Fรผr jede Funktion, die Sie in einem Beispielskript implementieren mรถchten, diskutieren Sie bitte im [Forum](https://discuss.huggingface.co/) oder in einem [issue](https://github.com/huggingface/transformers/issues), bevor Sie einen Pull Request einreichen. Wir freuen uns zwar รผber Fehlerkorrekturen, aber es ist unwahrscheinlich, dass wir einen Pull Request zusammenfรผhren, der mehr Funktionalitรคt auf Kosten der Lesbarkeit hinzufรผgt. Diese Anleitung zeigt Ihnen, wie Sie ein Beispiel fรผr ein Trainingsskript zur Zusammenfassung in [PyTorch](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization) und [TensorFlow](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/summarization) ausfรผhren kรถnnen. Sofern nicht anders angegeben, sollten alle Beispiele mit beiden Frameworks funktionieren. @@ -87,11 +87,11 @@ pip install -r requirements.txt -Das Beispielskript lรคdt einen Datensatz aus der ๐Ÿค— [Datasets](https://huggingface.co/docs/datasets/) Bibliothek herunter und verarbeitet ihn vor. Dann nimmt das Skript eine Feinabstimmung eines Datensatzes mit dem [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) auf einer Architektur vor, die eine Zusammenfassung unterstรผtzt. Das folgende Beispiel zeigt, wie die Feinabstimmung von [T5-small](https://huggingface.co/t5-small) auf dem Datensatz [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail) durchgefรผhrt wird. Das T5-Modell benรถtigt aufgrund der Art und Weise, wie es trainiert wurde, ein zusรคtzliches Argument `source_prefix`. Mit dieser Eingabeaufforderung weiรŸ T5, dass es sich um eine Zusammenfassungsaufgabe handelt. +Das Beispielskript lรคdt einen Datensatz aus der ๐Ÿค— [Datasets](https://huggingface.co/docs/datasets/) Bibliothek herunter und verarbeitet ihn vor. 
Dann nimmt das Skript eine Feinabstimmung eines Datensatzes mit dem [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) auf einer Architektur vor, die eine Zusammenfassung unterstรผtzt. Das folgende Beispiel zeigt, wie die Feinabstimmung von [T5-small](https://huggingface.co/google-t5/t5-small) auf dem Datensatz [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail) durchgefรผhrt wird. Das T5-Modell benรถtigt aufgrund der Art und Weise, wie es trainiert wurde, ein zusรคtzliches Argument `source_prefix`. Mit dieser Eingabeaufforderung weiรŸ T5, dass es sich um eine Zusammenfassungsaufgabe handelt. ```bash python examples/pytorch/summarization/run_summarization.py \ - --model_name_or_path t5-small \ + --model_name_or_path google-t5/t5-small \ --do_train \ --do_eval \ --dataset_name cnn_dailymail \ @@ -105,11 +105,11 @@ python examples/pytorch/summarization/run_summarization.py \ ``` -Das Beispielskript lรคdt einen Datensatz aus der ๐Ÿค— [Datasets](https://huggingface.co/docs/datasets/) Bibliothek herunter und verarbeitet ihn vor. AnschlieรŸend nimmt das Skript die Feinabstimmung eines Datensatzes mit Keras auf einer Architektur vor, die die Zusammenfassung unterstรผtzt. Das folgende Beispiel zeigt, wie die Feinabstimmung von [T5-small](https://huggingface.co/t5-small) auf dem [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail) Datensatz durchgefรผhrt wird. Das T5-Modell benรถtigt aufgrund der Art und Weise, wie es trainiert wurde, ein zusรคtzliches Argument `source_prefix`. Mit dieser Eingabeaufforderung weiรŸ T5, dass es sich um eine Zusammenfassungsaufgabe handelt. +Das Beispielskript lรคdt einen Datensatz aus der ๐Ÿค— [Datasets](https://huggingface.co/docs/datasets/) Bibliothek herunter und verarbeitet ihn vor. AnschlieรŸend nimmt das Skript die Feinabstimmung eines Datensatzes mit Keras auf einer Architektur vor, die die Zusammenfassung unterstรผtzt. Das folgende Beispiel zeigt, wie die Feinabstimmung von [T5-small](https://huggingface.co/google-t5/t5-small) auf dem [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail) Datensatz durchgefรผhrt wird. Das T5-Modell benรถtigt aufgrund der Art und Weise, wie es trainiert wurde, ein zusรคtzliches Argument `source_prefix`. Mit dieser Eingabeaufforderung weiรŸ T5, dass es sich um eine Zusammenfassungsaufgabe handelt. 
```bash python examples/tensorflow/summarization/run_summarization.py \ - --model_name_or_path t5-small \ + --model_name_or_path google-t5/t5-small \ --dataset_name cnn_dailymail \ --dataset_config "3.0.0" \ --output_dir /tmp/tst-summarization \ @@ -133,7 +133,7 @@ Der [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) unt torchrun \ --nproc_per_node 8 pytorch/summarization/run_summarization.py \ --fp16 \ - --model_name_or_path t5-small \ + --model_name_or_path google-t5/t5-small \ --do_train \ --do_eval \ --dataset_name cnn_dailymail \ @@ -157,7 +157,7 @@ Tensor Processing Units (TPUs) sind speziell fรผr die Beschleunigung der Leistun ```bash python xla_spawn.py --num_cores 8 \ summarization/run_summarization.py \ - --model_name_or_path t5-small \ + --model_name_or_path google-t5/t5-small \ --do_train \ --do_eval \ --dataset_name cnn_dailymail \ @@ -176,7 +176,7 @@ Tensor Processing Units (TPUs) sind speziell fรผr die Beschleunigung der Leistun ```bash python run_summarization.py \ --tpu name_of_tpu_resource \ - --model_name_or_path t5-small \ + --model_name_or_path google-t5/t5-small \ --dataset_name cnn_dailymail \ --dataset_config "3.0.0" \ --output_dir /tmp/tst-summarization \ @@ -214,7 +214,7 @@ Jetzt sind Sie bereit, das Training zu starten: ```bash accelerate launch run_summarization_no_trainer.py \ - --model_name_or_path t5-small \ + --model_name_or_path google-t5/t5-small \ --dataset_name cnn_dailymail \ --dataset_config "3.0.0" \ --source_prefix "summarize: " \ @@ -226,14 +226,14 @@ accelerate launch run_summarization_no_trainer.py \ Das Verdichtungsskript unterstรผtzt benutzerdefinierte Datensรคtze, solange es sich um eine CSV- oder JSON-Line-Datei handelt. Wenn Sie Ihren eigenen Datensatz verwenden, mรผssen Sie mehrere zusรคtzliche Argumente angeben: - `train_file` und `validation_file` geben den Pfad zu Ihren Trainings- und Validierungsdateien an. -- text_column` ist der Eingabetext, der zusammengefasst werden soll. +- `text_column` ist der Eingabetext, der zusammengefasst werden soll. - Summary_column" ist der auszugebende Zieltext. 
Ein Zusammenfassungsskript, das einen benutzerdefinierten Datensatz verwendet, wรผrde wie folgt aussehen: ```bash python examples/pytorch/summarization/run_summarization.py \ - --model_name_or_path t5-small \ + --model_name_or_path google-t5/t5-small \ --do_train \ --do_eval \ --train_file path_to_csv_or_jsonlines_file \ @@ -258,7 +258,7 @@ Es ist oft eine gute Idee, Ihr Skript an einer kleineren Anzahl von Beispielen f ```bash python examples/pytorch/summarization/run_summarization.py \ - --model_name_or_path t5-small \ + --model_name_or_path google-t5/t5-small \ --max_train_samples 50 \ --max_eval_samples 50 \ --max_predict_samples 50 \ @@ -288,7 +288,7 @@ Die erste Methode verwendet das Argument `output_dir previous_output_dir`, um da ```bash python examples/pytorch/summarization/run_summarization.py - --model_name_or_path t5-small \ + --model_name_or_path google-t5/t5-small \ --do_train \ --do_eval \ --dataset_name cnn_dailymail \ @@ -305,7 +305,7 @@ Die zweite Methode verwendet das Argument `Resume_from_checkpoint path_to_specif ```bash python examples/pytorch/summarization/run_summarization.py - --model_name_or_path t5-small \ + --model_name_or_path google-t5/t5-small \ --do_train \ --do_eval \ --dataset_name cnn_dailymail \ @@ -335,7 +335,7 @@ Das folgende Beispiel zeigt, wie Sie ein Modell mit einem bestimmten Repository- ```bash python examples/pytorch/summarization/run_summarization.py - --model_name_or_path t5-small \ + --model_name_or_path google-t5/t5-small \ --do_train \ --do_eval \ --dataset_name cnn_dailymail \ diff --git a/docs/source/de/testing.md b/docs/source/de/testing.md index e921484fa2f6e6..100151e58c3da7 100644 --- a/docs/source/de/testing.md +++ b/docs/source/de/testing.md @@ -185,16 +185,16 @@ pytest -k "test and ada" tests/test_optimization.py Manchmal mรผssen Sie `accelerate` Tests fรผr Ihre Modelle ausfรผhren. Dazu fรผgen Sie einfach `-m accelerate_tests` zu Ihrem Befehl hinzu, wenn Sie diese Tests bei einem `OPT`-Lauf ausfรผhren mรถchten: ```bash -RUN_SLOW=1 pytest -m accelerate_tests tests/models/opt/test_modeling_opt.py +RUN_SLOW=1 pytest -m accelerate_tests tests/models/opt/test_modeling_opt.py ``` -### Dokumentationstests ausfรผhren +### Dokumentationstests ausfรผhren -Um zu testen, ob die Dokumentationsbeispiele korrekt sind, sollten Sie รผberprรผfen, ob die `doctests` erfolgreich sind. -Lassen Sie uns als Beispiel den docstring von [WhisperModel.forward](https://github.com/huggingface/transformers/blob/main/src/transformers/models/whisper/modeling_whisper.py#L1017-L1035) verwenden: +Um zu testen, ob die Dokumentationsbeispiele korrekt sind, sollten Sie รผberprรผfen, ob die `doctests` erfolgreich sind. +Lassen Sie uns als Beispiel den docstring von [WhisperModel.forward](https://github.com/huggingface/transformers/blob/main/src/transformers/models/whisper/modeling_whisper.py#L1017-L1035) verwenden: -```python +```python r""" Returns: @@ -217,8 +217,8 @@ Example: ``` -Fรผhren Sie einfach die folgende Zeile aus, um automatisch jedes docstring-Beispiel in der gewรผnschten Datei zu testen: -```bash +Fรผhren Sie einfach die folgende Zeile aus, um automatisch jedes docstring-Beispiel in der gewรผnschten Datei zu testen: +```bash pytest --doctest-modules ``` Wenn die Datei eine Markdown-Erweiterung hat, sollten Sie das Argument `--doctest-glob="*.md"` hinzufรผgen. @@ -379,7 +379,7 @@ pytest --random-order-bucket=none StandardmรครŸig ist `--random-order-bucket=module` impliziert, wodurch die Dateien auf den Modulebenen gemischt werden. 
Es kann auch auf den Ebenen `class`, `package`, `global` und `none` mischen. Die vollstรคndigen Details entnehmen Sie bitte der -[Dokumentation] (https://github.com/jbasko/pytest-random-order). +[Dokumentation](https://github.com/jbasko/pytest-random-order). Eine weitere Alternative zur Randomisierung ist: [`pytest-random`](https://github.com/pytest-dev/pytest-randomly). Dieses Modul hat eine sehr รคhnliche Funktionalitรคt/Schnittstelle, aber es hat nicht die Eimermodi, die in @@ -452,7 +452,7 @@ Dekorateure werden verwendet, um die Anforderungen von Tests in Bezug auf CPU/GP - `require_torch_multi_gpu` - wie `require_torch` und zusรคtzlich mindestens 2 GPUs erforderlich - `require_torch_non_multi_gpu` - wie `require_torch` plus benรถtigt 0 oder 1 GPUs - `require_torch_up_to_2_gpus` - wie `require_torch` plus erfordert 0 oder 1 oder 2 GPUs -- `require_torch_tpu` - wie `require_torch` plus erfordert mindestens 1 TPU +- `require_torch_xla` - wie `require_torch` plus erfordert mindestens 1 TPU Lassen Sie uns die GPU-Anforderungen in der folgenden Tabelle darstellen: @@ -720,8 +720,8 @@ Zugriffsmรถglichkeiten auf sie bietet: - `test_file_dir` - das Verzeichnis, das die aktuelle Testdatei enthรคlt - `tests_dir` - das Verzeichnis der `tests` Testreihe - `examples_dir` - das Verzeichnis der `examples` Test-Suite - - repo_root_dir` - das Verzeichnis des Repositorys - - src_dir` - das Verzeichnis von `src` (d.h. wo sich das Unterverzeichnis `transformers` befindet) + - `repo_root_dir` - das Verzeichnis des Repositorys + - `src_dir` - das Verzeichnis von `src` (d.h. wo sich das Unterverzeichnis `transformers` befindet) - stringifizierte Pfade - wie oben, aber diese geben Pfade als Strings zurรผck, anstatt als `pathlib`-Objekte: @@ -862,7 +862,7 @@ Code, der fehlerhaft ist, einen schlechten Zustand verursacht, der sich auf ande - Hier sehen Sie, wie Sie einen ganzen Test bedingungslos รผberspringen kรถnnen: ```python no-style -@unittest.skip("this bug needs to be fixed") +@unittest.skip(reason="this bug needs to be fixed") def test_feature_x(): ``` @@ -945,7 +945,7 @@ from transformers.testing_utils import slow def test_integration_foo(): ``` -Sobald ein Test als `@langsam` markiert ist, setzen Sie die Umgebungsvariable `RUN_SLOW=1`, um solche Tests auszufรผhren, z.B: +Sobald ein Test als `@slow` markiert ist, setzen Sie die Umgebungsvariable `RUN_SLOW=1`, um solche Tests auszufรผhren, z.B: ```bash RUN_SLOW=1 pytest tests @@ -955,7 +955,7 @@ Einige Dekoratoren wie `@parameterized` schreiben Testnamen um, daher mรผssen `@ `@require_*` mรผssen als letztes aufgefรผhrt werden, damit sie korrekt funktionieren. Hier ist ein Beispiel fรผr die korrekte Verwendung: ```python no-style -@parameteriz ed.expand(...) +@parameterized.expand(...) @slow def test_integration_foo(): ``` @@ -978,8 +978,8 @@ Ansatz zu verfeinern, sollten wir Ausnahmen einfรผhren: wird in den folgenden Abschnitten erlรคutert. - Alle Tests, die ein Training durchfรผhren mรผssen, das nicht speziell auf Schnelligkeit optimiert ist, sollten auf langsam gesetzt werden. - Wir kรถnnen Ausnahmen einfรผhren, wenn einige dieser Tests, die nicht langsam sein sollten, unertrรคglich langsam sind, und sie auf - @langsam`. Auto-Modellierungstests, die groรŸe Dateien auf der Festplatte speichern und laden, sind ein gutes Beispiel fรผr Tests, die als - als `@langsam` markiert sind. + `@slow`. Auto-Modellierungstests, die groรŸe Dateien auf der Festplatte speichern und laden, sind ein gutes Beispiel fรผr Tests, die als + als `@slow` markiert sind. 
- Wenn ein Test in weniger als 1 Sekunde auf CI abgeschlossen wird (einschlieรŸlich eventueller Downloads), sollte es sich trotzdem um einen normalen Test handeln. Insgesamt mรผssen alle nicht langsamen Tests die verschiedenen Interna abdecken und dabei schnell bleiben. Zum Beispiel, @@ -1172,7 +1172,7 @@ class EnvExampleTest(TestCasePlus): ``` Je nachdem, ob die Testdatei in der Testsuite `tests` oder in `examples` war, wird sie korrekt eingerichtet -env[PYTHONPATH]` eines dieser beiden Verzeichnisse und auch das `src` Verzeichnis, um sicherzustellen, dass der Test gegen das aktuelle +`env[PYTHONPATH]` eines dieser beiden Verzeichnisse und auch das `src` Verzeichnis, um sicherzustellen, dass der Test gegen das aktuelle um sicherzustellen, dass der Test mit dem aktuellen Projektarchiv durchgefรผhrt wird, und schlieรŸlich mit dem, was in `env[PYTHONPATH]` bereits eingestellt war, bevor der Test aufgerufen wurde. wenn รผberhaupt. diff --git a/docs/source/de/training.md b/docs/source/de/training.md index b1b7c14f261a72..806a380b6cebc9 100644 --- a/docs/source/de/training.md +++ b/docs/source/de/training.md @@ -48,7 +48,7 @@ Wie Sie nun wissen, benรถtigen Sie einen Tokenizer, um den Text zu verarbeiten u ```py >>> from transformers import AutoTokenizer ->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased") +>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased") >>> def tokenize_function(examples): @@ -86,7 +86,7 @@ Beginnen Sie mit dem Laden Ihres Modells und geben Sie die Anzahl der erwarteten ```py >>> from transformers import AutoModelForSequenceClassification ->>> model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5) +>>> model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased", num_labels=5) ``` @@ -128,12 +128,12 @@ Rufen Sie [`~evaluate.compute`] auf `metric` auf, um die Genauigkeit Ihrer Vorhe ... return metric.compute(predictions=predictions, references=labels) ``` -Wenn Sie Ihre Bewertungsmetriken wรคhrend der Feinabstimmung รผberwachen mรถchten, geben Sie den Parameter `evaluation_strategy` in Ihren Trainingsargumenten an, um die Bewertungsmetrik am Ende jeder Epoche zu ermitteln: +Wenn Sie Ihre Bewertungsmetriken wรคhrend der Feinabstimmung รผberwachen mรถchten, geben Sie den Parameter `eval_strategy` in Ihren Trainingsargumenten an, um die Bewertungsmetrik am Ende jeder Epoche zu ermitteln: ```py >>> from transformers import TrainingArguments, Trainer ->>> training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch") +>>> training_args = TrainingArguments(output_dir="test_trainer", eval_strategy="epoch") ``` ### Trainer @@ -187,7 +187,7 @@ Wir kรถnnen sie also ohne Tokenisierung direkt in ein NumPy-Array konvertieren! 
```py from transformers import AutoTokenizer -tokenizer = AutoTokenizer.from_pretrained("bert-base-cased") +tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased") tokenized_data = tokenizer(dataset["text"], return_tensors="np", padding=True) # Tokenizer returns a BatchEncoding, but we convert that to a dict for Keras tokenized_data = dict(tokenized_data) @@ -202,7 +202,7 @@ from transformers import TFAutoModelForSequenceClassification from tensorflow.keras.optimizers import Adam # Load and compile our model -model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased") +model = TFAutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased") # Lower learning rates are often better for fine-tuning transformers model.compile(optimizer=Adam(3e-5)) @@ -229,10 +229,10 @@ tf.data"-Pipeline schreiben kรถnnen, wenn Sie wollen, haben wir zwei bequeme Met - [`~TFPreTrainedModel.prepare_tf_dataset`]: Dies ist die Methode, die wir in den meisten Fรคllen empfehlen. Da es sich um eine Methode Ihres Modells ist, kann sie das Modell inspizieren, um automatisch herauszufinden, welche Spalten als Modelleingaben verwendet werden kรถnnen, und verwirft die anderen, um einen einfacheren, leistungsfรคhigeren Datensatz zu erstellen. -- [~datasets.Dataset.to_tf_dataset`]: Diese Methode ist eher auf niedriger Ebene angesiedelt und ist nรผtzlich, wenn Sie genau kontrollieren wollen, wie +- [`~datasets.Dataset.to_tf_dataset`]: Diese Methode ist eher auf niedriger Ebene angesiedelt und ist nรผtzlich, wenn Sie genau kontrollieren wollen, wie Dataset erstellt wird, indem man genau angibt, welche `columns` und `label_cols` einbezogen werden sollen. -Bevor Sie [~TFPreTrainedModel.prepare_tf_dataset`] verwenden kรถnnen, mรผssen Sie die Tokenizer-Ausgaben als Spalten zu Ihrem Datensatz hinzufรผgen, wie in +Bevor Sie [`~TFPreTrainedModel.prepare_tf_dataset`] verwenden kรถnnen, mรผssen Sie die Tokenizer-Ausgaben als Spalten zu Ihrem Datensatz hinzufรผgen, wie in dem folgenden Codebeispiel: ```py @@ -333,7 +333,7 @@ Laden Sie Ihr Modell mit der Anzahl der erwarteten Kennzeichnungen: ```py >>> from transformers import AutoModelForSequenceClassification ->>> model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5) +>>> model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased", num_labels=5) ``` ### Optimierer und Lernratensteuerung diff --git a/docs/source/en/_config.py b/docs/source/en/_config.py index a6d75853f57219..f49e4e4731965a 100644 --- a/docs/source/en/_config.py +++ b/docs/source/en/_config.py @@ -1,7 +1,7 @@ # docstyle-ignore INSTALL_CONTENT = """ # Transformers installation -! pip install transformers datasets +! pip install transformers datasets evaluate accelerate # To install from source instead of the last release, comment the command above and uncomment the following one. # ! 
pip install git+https://github.com/huggingface/transformers.git """ diff --git a/docs/source/en/_redirects.yml b/docs/source/en/_redirects.yml index b6575a6b02f205..ff70547c722841 100644 --- a/docs/source/en/_redirects.yml +++ b/docs/source/en/_redirects.yml @@ -1,3 +1,5 @@ # Optimizing inference perf_infer_gpu_many: perf_infer_gpu_one +transformers_agents: agents +quantization: quantization/overview diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml index 5116e4219fbcb1..dc88bbd45ab23e 100644 --- a/docs/source/en/_toctree.yml +++ b/docs/source/en/_toctree.yml @@ -23,10 +23,12 @@ title: Load and train adapters with ๐Ÿค— PEFT - local: model_sharing title: Share your model - - local: transformers_agents + - local: agents title: Agents - local: llm_tutorial title: Generation with LLMs + - local: conversations + title: Chatting with Transformers title: Tutorials - sections: - isExpanded: false @@ -73,6 +75,10 @@ title: Depth estimation - local: tasks/image_to_image title: Image-to-Image + - local: tasks/image_feature_extraction + title: Image Feature Extraction + - local: tasks/mask_generation + title: Mask Generation - local: tasks/knowledge_distillation_for_image_classification title: Knowledge Distillation for Computer Vision title: Computer Vision @@ -86,11 +92,15 @@ title: Visual Question Answering - local: tasks/text-to-speech title: Text to speech + - local: tasks/image_text_to_text + title: Image-text-to-text title: Multimodal - isExpanded: false sections: - local: generation_strategies title: Customize the generation strategy + - local: kv_cache + title: Best Practices for Generation with Cache title: Generation - isExpanded: false sections: @@ -110,7 +120,7 @@ - local: custom_models title: Share a custom model - local: chat_templating - title: Templates for chat models + title: Chat templates - local: trainer title: Trainer - local: sagemaker @@ -127,16 +137,42 @@ title: Notebooks with examples - local: community title: Community resources - - local: custom_tools - title: Custom Tools and Prompts - local: troubleshooting title: Troubleshoot + - local: gguf + title: Interoperability with GGUF files title: Developer guides +- sections: + - local: quantization/overview + title: Getting started + - local: quantization/bitsandbytes + title: bitsandbytes + - local: quantization/gptq + title: GPTQ + - local: quantization/awq + title: AWQ + - local: quantization/aqlm + title: AQLM + - local: quantization/quanto + title: Quanto + - local: quantization/eetq + title: EETQ + - local: quantization/hqq + title: HQQ + - local: quantization/fbgemm_fp8 + title: FBGEMM_FP8 + - local: quantization/optimum + title: Optimum + - local: quantization/torchao + title: TorchAO + - local: quantization/contribute + title: Contribute new quantization method + title: Quantization Methods - sections: - local: performance title: Overview - - local: quantization - title: Quantization + - local: llm_optims + title: LLM inference optimization - sections: - local: perf_train_gpu_one title: Methods and tools for efficient training on a single GPU @@ -144,6 +180,8 @@ title: Multiple GPUs and parallelism - local: fsdp title: Fully Sharded Data Parallel + - local: deepspeed + title: DeepSpeed - local: perf_train_cpu title: Efficient training on CPU - local: perf_train_cpu_many @@ -164,7 +202,7 @@ title: GPU inference title: Optimizing inference - local: big_models - title: Instantiating a big model + title: Instantiate a big model - local: debugging title: Debugging - local: tf_xla @@ -174,11 +212,9 @@ 
title: Performance and scalability - sections: - local: contributing - title: How to contribute to transformers? + title: How to contribute to ๐Ÿค— Transformers? - local: add_new_model title: How to add a model to ๐Ÿค— Transformers? - - local: add_tensorflow_model - title: How to convert a ๐Ÿค— Transformers model to TensorFlow? - local: add_new_pipeline title: How to add a pipeline to ๐Ÿค— Transformers? - local: testing @@ -253,7 +289,7 @@ - local: main_classes/trainer title: Trainer - local: main_classes/deepspeed - title: DeepSpeed Integration + title: DeepSpeed - local: main_classes/feature_extractor title: Feature Extractor - local: main_classes/image_processor @@ -302,6 +338,8 @@ title: CodeGen - local: model_doc/code_llama title: CodeLlama + - local: model_doc/cohere + title: Cohere - local: model_doc/convbert title: ConvBERT - local: model_doc/cpm @@ -310,6 +348,8 @@ title: CPMANT - local: model_doc/ctrl title: CTRL + - local: model_doc/dbrx + title: DBRX - local: model_doc/deberta title: DeBERTa - local: model_doc/deberta-v2 @@ -332,6 +372,10 @@ title: ESM - local: model_doc/falcon title: Falcon + - local: model_doc/falcon_mamba + title: FalconMamba + - local: model_doc/fastspeech2_conformer + title: FastSpeech2Conformer - local: model_doc/flan-t5 title: FLAN-T5 - local: model_doc/flan-ul2 @@ -346,6 +390,10 @@ title: Funnel Transformer - local: model_doc/fuyu title: Fuyu + - local: model_doc/gemma + title: Gemma + - local: model_doc/gemma2 + title: Gemma2 - local: model_doc/openai-gpt title: GPT - local: model_doc/gpt_neo @@ -368,6 +416,10 @@ title: HerBERT - local: model_doc/ibert title: I-BERT + - local: model_doc/jamba + title: Jamba + - local: model_doc/jetmoe + title: JetMoe - local: model_doc/jukebox title: Jukebox - local: model_doc/led @@ -376,6 +428,8 @@ title: LLaMA - local: model_doc/llama2 title: Llama2 + - local: model_doc/llama3 + title: Llama3 - local: model_doc/longformer title: Longformer - local: model_doc/longt5 @@ -386,6 +440,10 @@ title: M2M100 - local: model_doc/madlad-400 title: MADLAD-400 + - local: model_doc/mamba + title: Mamba + - local: model_doc/mamba2 + title: mamba2 - local: model_doc/marian title: MarianMT - local: model_doc/markuplm @@ -416,6 +474,8 @@ title: MT5 - local: model_doc/mvp title: MVP + - local: model_doc/nemotron + title: Nemotron - local: model_doc/nezha title: NEZHA - local: model_doc/nllb @@ -424,6 +484,8 @@ title: NLLB-MoE - local: model_doc/nystromformer title: Nystrรถmformer + - local: model_doc/olmo + title: OLMo - local: model_doc/open-llama title: Open-Llama - local: model_doc/opt @@ -436,6 +498,8 @@ title: Persimmon - local: model_doc/phi title: Phi + - local: model_doc/phi3 + title: Phi-3 - local: model_doc/phobert title: PhoBERT - local: model_doc/plbart @@ -444,10 +508,18 @@ title: ProphetNet - local: model_doc/qdqbert title: QDQBert + - local: model_doc/qwen2 + title: Qwen2 + - local: model_doc/qwen2_audio + title: Qwen2Audio + - local: model_doc/qwen2_moe + title: Qwen2MoE - local: model_doc/rag title: RAG - local: model_doc/realm title: REALM + - local: model_doc/recurrent_gemma + title: RecurrentGemma - local: model_doc/reformer title: Reformer - local: model_doc/rembert @@ -468,6 +540,10 @@ title: Splinter - local: model_doc/squeezebert title: SqueezeBERT + - local: model_doc/stablelm + title: StableLm + - local: model_doc/starcoder2 + title: Starcoder2 - local: model_doc/switch_transformers title: SwitchTransformers - local: model_doc/t5 @@ -519,6 +595,10 @@ title: Deformable DETR - local: model_doc/deit title: DeiT + 
- local: model_doc/depth_anything + title: Depth Anything + - local: model_doc/depth_anything_v2 + title: Depth Anything V2 - local: model_doc/deta title: DETA - local: model_doc/detr @@ -539,6 +619,8 @@ title: FocalNet - local: model_doc/glpn title: GLPN + - local: model_doc/hiera + title: Hiera - local: model_doc/imagegpt title: ImageGPT - local: model_doc/levit @@ -561,12 +643,20 @@ title: PoolFormer - local: model_doc/pvt title: Pyramid Vision Transformer (PVT) + - local: model_doc/pvt_v2 + title: Pyramid Vision Transformer v2 (PVTv2) - local: model_doc/regnet title: RegNet - local: model_doc/resnet title: ResNet + - local: model_doc/rt_detr + title: RT-DETR - local: model_doc/segformer title: SegFormer + - local: model_doc/seggpt + title: SegGpt + - local: model_doc/superpoint + title: SuperPoint - local: model_doc/swiftformer title: SwiftFormer - local: model_doc/swin @@ -577,14 +667,10 @@ title: Swin2SR - local: model_doc/table-transformer title: Table Transformer - - local: model_doc/timesformer - title: TimeSformer - local: model_doc/upernet title: UperNet - local: model_doc/van title: VAN - - local: model_doc/videomae - title: VideoMAE - local: model_doc/vit title: Vision Transformer (ViT) - local: model_doc/vit_hybrid @@ -597,10 +683,10 @@ title: ViTMatte - local: model_doc/vit_msn title: ViTMSN - - local: model_doc/vivit - title: ViViT - local: model_doc/yolos title: YOLOS + - local: model_doc/zoedepth + title: ZoeDepth title: Vision models - isExpanded: false sections: @@ -610,8 +696,12 @@ title: Bark - local: model_doc/clap title: CLAP + - local: model_doc/dac + title: dac - local: model_doc/encodec title: EnCodec + - local: model_doc/hiera + title: Hiera - local: model_doc/hubert title: Hubert - local: model_doc/mctct @@ -620,6 +710,8 @@ title: MMS - local: model_doc/musicgen title: MusicGen + - local: model_doc/musicgen_melody + title: MusicGen Melody - local: model_doc/pop2piano title: Pop2Piano - local: model_doc/seamless_m4t @@ -646,6 +738,8 @@ title: VITS - local: model_doc/wav2vec2 title: Wav2Vec2 + - local: model_doc/wav2vec2-bert + title: Wav2Vec2-BERT - local: model_doc/wav2vec2-conformer title: Wav2Vec2-Conformer - local: model_doc/wav2vec2_phoneme @@ -659,6 +753,15 @@ - local: model_doc/xlsr_wav2vec2 title: XLSR-Wav2Vec2 title: Audio models + - isExpanded: false + sections: + - local: model_doc/timesformer + title: TimeSformer + - local: model_doc/videomae + title: VideoMAE + - local: model_doc/vivit + title: ViViT + title: Video models - isExpanded: false sections: - local: model_doc/align @@ -673,6 +776,8 @@ title: BridgeTower - local: model_doc/bros title: BROS + - local: model_doc/chameleon + title: Chameleon - local: model_doc/chinese_clip title: Chinese-CLIP - local: model_doc/clip @@ -691,12 +796,18 @@ title: FLAVA - local: model_doc/git title: GIT + - local: model_doc/grounding-dino + title: Grounding DINO - local: model_doc/groupvit title: GroupViT - local: model_doc/idefics title: IDEFICS + - local: model_doc/idefics2 + title: Idefics2 - local: model_doc/instructblip title: InstructBLIP + - local: model_doc/instructblipvideo + title: InstructBlipVideo - local: model_doc/kosmos-2 title: KOSMOS-2 - local: model_doc/layoutlm @@ -711,6 +822,10 @@ title: LiLT - local: model_doc/llava title: Llava + - local: model_doc/llava_next + title: LLaVA-NeXT + - local: model_doc/llava_next_video + title: LLaVa-NeXT-Video - local: model_doc/lxmert title: LXMERT - local: model_doc/matcha @@ -725,12 +840,16 @@ title: OWL-ViT - local: model_doc/owlv2 title: OWLv2 + - local: 
model_doc/paligemma + title: PaliGemma - local: model_doc/perceiver title: Perceiver - local: model_doc/pix2struct title: Pix2Struct - local: model_doc/sam title: Segment Anything + - local: model_doc/siglip + title: SigLIP - local: model_doc/speech-encoder-decoder title: Speech Encoder Decoder Models - local: model_doc/tapas @@ -741,6 +860,10 @@ title: TVLT - local: model_doc/tvp title: TVP + - local: model_doc/udop + title: UDOP + - local: model_doc/video_llava + title: VideoLlava - local: model_doc/vilt title: ViLT - local: model_doc/vipllava diff --git a/docs/source/en/add_new_model.md b/docs/source/en/add_new_model.md index 6766c8ecf04812..a0a16a14056d14 100644 --- a/docs/source/en/add_new_model.md +++ b/docs/source/en/add_new_model.md @@ -17,12 +17,6 @@ rendered properly in your Markdown viewer. The ๐Ÿค— Transformers library is often able to offer new models thanks to community contributors. But this can be a challenging project and requires an in-depth knowledge of the ๐Ÿค— Transformers library and the model to implement. At Hugging Face, we're trying to empower more of the community to actively add models and we've put together this guide to walk you through the process of adding a PyTorch model (make sure you have [PyTorch installed](https://pytorch.org/get-started/locally/)). - - -If you're interested in implementing a TensorFlow model, take a look at the [How to convert a ๐Ÿค— Transformers model to TensorFlow](add_tensorflow_model) guide! - - - Along the way, you'll: - get insights into open-source best practices @@ -89,8 +83,8 @@ model.config # model has access to its config Similar to the model, the configuration inherits basic serialization and deserialization functionalities from [`PretrainedConfig`]. Note that the configuration and the model are always serialized into two different formats - the model to a *pytorch_model.bin* file and the configuration to a *config.json* file. Calling -[`~PreTrainedModel.save_pretrained`] will automatically call -[`~PretrainedConfig.save_pretrained`], so that both model and configuration are saved. +the model's [`~PreTrainedModel.save_pretrained`] will automatically call +the config's [`~PretrainedConfig.save_pretrained`], so that both model and configuration are saved. ### Code style @@ -192,46 +186,46 @@ its attention layer, etc. We will be more than happy to help you. 2. Clone your `transformers` fork to your local disk, and add the base repository as a remote: -```bash -git clone https://github.com/[your Github handle]/transformers.git -cd transformers -git remote add upstream https://github.com/huggingface/transformers.git -``` + ```bash + git clone https://github.com/[your Github handle]/transformers.git + cd transformers + git remote add upstream https://github.com/huggingface/transformers.git + ``` 3. Set up a development environment, for instance by running the following command: -```bash -python -m venv .env -source .env/bin/activate -pip install -e ".[dev]" -``` + ```bash + python -m venv .env + source .env/bin/activate + pip install -e ".[dev]" + ``` -Depending on your OS, and since the number of optional dependencies of Transformers is growing, you might get a -failure with this command. If that's the case make sure to install the Deep Learning framework you are working with -(PyTorch, TensorFlow and/or Flax) then do: + Depending on your OS, and since the number of optional dependencies of Transformers is growing, you might get a + failure with this command. 
If that's the case make sure to install the Deep Learning framework you are working with + (PyTorch, TensorFlow and/or Flax) then do: -```bash -pip install -e ".[quality]" -``` + ```bash + pip install -e ".[quality]" + ``` -which should be enough for most use cases. You can then return to the parent directory + which should be enough for most use cases. You can then return to the parent directory -```bash -cd .. -``` + ```bash + cd .. + ``` 4. We recommend adding the PyTorch version of *brand_new_bert* to Transformers. To install PyTorch, please follow the instructions on https://pytorch.org/get-started/locally/. -**Note:** You don't need to have CUDA installed. Making the new model work on CPU is sufficient. + **Note:** You don't need to have CUDA installed. Making the new model work on CPU is sufficient. 5. To port *brand_new_bert*, you will also need access to its original repository: -```bash -git clone https://github.com/org_that_created_brand_new_bert_org/brand_new_bert.git -cd brand_new_bert -pip install -e . -``` + ```bash + git clone https://github.com/org_that_created_brand_new_bert_org/brand_new_bert.git + cd brand_new_bert + pip install -e . + ``` Now you have set up a development environment to port *brand_new_bert* to ๐Ÿค— Transformers. @@ -404,12 +398,14 @@ In the special case that you are adding a model whose architecture exactly match existing model you only have to add a conversion script as described in [this section](#write-a-conversion-script). In this case, you can just re-use the whole model architecture of the already existing model. -Otherwise, let's start generating a new model. You have two choices here: +Otherwise, let's start generating a new model. We recommend using the following script to add a model starting from +an existing model: -- `transformers-cli add-new-model-like` to add a new model like an existing one -- `transformers-cli add-new-model` to add a new model from our template (will look like BERT or Bart depending on the type of model you select) +```bash +transformers-cli add-new-model-like +``` -In both cases, you will be prompted with a questionnaire to fill in the basic information of your model. The second command requires to install `cookiecutter`, you can find more information on it [here](https://github.com/huggingface/transformers/tree/main/templates/adding_a_new_model). +You will be prompted with a questionnaire to fill in the basic information of your model. **Open a Pull Request on the main huggingface/transformers repo** @@ -421,29 +417,29 @@ You should do the following: 1. Create a branch with a descriptive name from your main branch -```bash -git checkout -b add_brand_new_bert -``` + ```bash + git checkout -b add_brand_new_bert + ``` 2. Commit the automatically generated code: -```bash -git add . -git commit -``` + ```bash + git add . + git commit + ``` 3. Fetch and rebase to current main -```bash -git fetch upstream -git rebase upstream/main -``` + ```bash + git fetch upstream + git rebase upstream/main + ``` 4. Push the changes to your account using: -```bash -git push -u origin a-descriptive-name-for-my-changes -``` + ```bash + git push -u origin a-descriptive-name-for-my-changes + ``` 5. Once you are satisfied, go to the webpage of your fork on GitHub. Click on โ€œPull requestโ€. Make sure to add the GitHub handle of some members of the Hugging Face team as reviewers, so that the Hugging Face team gets notified for @@ -531,7 +527,7 @@ but all the other ones should use an initialization as above. 
This is coded like ```py def _init_weights(self, module): """Initialize the weights""" - if isinstnace(module, Wav2Vec2ForPreTraining): + if isinstance(module, Wav2Vec2ForPreTraining): module.project_hid.reset_parameters() module.project_q.reset_parameters() module.project_hid._is_hf_initialized = True @@ -682,7 +678,7 @@ model.save_pretrained("/path/to/converted/checkpoint/folder") **7. Implement the forward pass** Having managed to correctly load the pretrained weights into the ๐Ÿค— Transformers implementation, you should now make -sure that the forward pass is correctly implemented. In [Get familiar with the original repository](#34-run-a-pretrained-checkpoint-using-the-original-repository), you have already created a script that runs a forward +sure that the forward pass is correctly implemented. In [Get familiar with the original repository](#3-4-run-a-pretrained-checkpoint-using-the-original-repository), you have already created a script that runs a forward pass of the model using the original repository. Now you should write an analogous script using the ๐Ÿค— Transformers implementation instead of the original one. It should look as follows: @@ -759,7 +755,7 @@ In case you are using Windows, you should replace `RUN_SLOW=1` with `SET RUN_SLO Second, all features that are special to *brand_new_bert* should be tested additionally in a separate test under -`BrandNewBertModelTester`/``BrandNewBertModelTest`. This part is often forgotten but is extremely useful in two +`BrandNewBertModelTester`/`BrandNewBertModelTest`. This part is often forgotten but is extremely useful in two ways: - It helps to transfer the knowledge you have acquired during the model addition to the community by showing how the @@ -776,7 +772,7 @@ It is very important to find/extract the original tokenizer file and to manage t Transformers' implementation of the tokenizer. To ensure that the tokenizer works correctly, it is recommended to first create a script in the original repository -that inputs a string and returns the `input_ids``. It could look similar to this (in pseudo-code): +that inputs a string and returns the `input_ids`. It could look similar to this (in pseudo-code): ```python input_str = "This is a long example input string containing special characters .$?-, numbers 2872 234 12 and words." @@ -827,7 +823,7 @@ the community to add some *Tips* to show how the model should be used. Don't hes regarding the docstrings. Next, make sure that the docstring added to `src/transformers/models/brand_new_bert/modeling_brand_new_bert.py` is -correct and included all necessary inputs and outputs. We have a detailed guide about writing documentation and our docstring format [here](writing-documentation). It is always to good to remind oneself that documentation should +correct and included all necessary inputs and outputs. We have a detailed guide about writing documentation and our docstring format [here](writing-documentation). It is always good to remind oneself that documentation should be treated at least as carefully as the code in ๐Ÿค— Transformers since the documentation is usually the first contact point of the community with the model. diff --git a/docs/source/en/add_new_pipeline.md b/docs/source/en/add_new_pipeline.md index 70f62bf9909e22..1e5b95e9b48cfc 100644 --- a/docs/source/en/add_new_pipeline.md +++ b/docs/source/en/add_new_pipeline.md @@ -15,7 +15,7 @@ rendered properly in your Markdown viewer. # How to create a custom pipeline? 
-In this guide, we will see how to create a custom pipeline and share it on the [Hub](hf.co/models) or add it to the +In this guide, we will see how to create a custom pipeline and share it on the [Hub](https://hf.co/models) or add it to the ๐Ÿค— Transformers library. First and foremost, you need to decide the raw entries the pipeline will be able to take. It can be strings, raw bytes, @@ -208,14 +208,10 @@ from transformers import pipeline classifier = pipeline("pair-classification", model="sgugger/finetuned-bert-mrpc") ``` -Then we can share it on the Hub by using the `save_pretrained` method in a `Repository`: +Then we can share it on the Hub by using the `push_to_hub` method: ```py -from huggingface_hub import Repository - -repo = Repository("test-dynamic-pipeline", clone_from="{your_username}/test-dynamic-pipeline") -classifier.save_pretrained("test-dynamic-pipeline") -repo.push_to_hub() +classifier.push_to_hub("test-dynamic-pipeline") ``` This will copy the file where you defined `PairClassificationPipeline` inside the folder `"test-dynamic-pipeline"`, diff --git a/docs/source/en/add_tensorflow_model.md b/docs/source/en/add_tensorflow_model.md deleted file mode 100644 index 7ea81a9fe976bb..00000000000000 --- a/docs/source/en/add_tensorflow_model.md +++ /dev/null @@ -1,356 +0,0 @@ - - -# How to convert a ๐Ÿค— Transformers model to TensorFlow? - -Having multiple frameworks available to use with ๐Ÿค— Transformers gives you flexibility to play their strengths when -designing your application, but it implies that compatibility must be added on a per-model basis. The good news is that -adding TensorFlow compatibility to an existing model is simpler than [adding a new model from scratch](add_new_model)! -Whether you wish to have a deeper understanding of large TensorFlow models, make a major open-source contribution, or -enable TensorFlow for your model of choice, this guide is for you. - -This guide empowers you, a member of our community, to contribute TensorFlow model weights and/or -architectures to be used in ๐Ÿค— Transformers, with minimal supervision from the Hugging Face team. Writing a new model -is no small feat, but hopefully this guide will make it less of a rollercoaster ๐ŸŽข and more of a walk in the park ๐Ÿšถ. -Harnessing our collective experiences is absolutely critical to make this process increasingly easier, and thus we -highly encourage that you suggest improvements to this guide! - -Before you dive deeper, it is recommended that you check the following resources if you're new to ๐Ÿค— Transformers: -- [General overview of ๐Ÿค— Transformers](add_new_model#general-overview-of-transformers) -- [Hugging Face's TensorFlow Philosophy](https://huggingface.co/blog/tensorflow-philosophy) - -In the remainder of this guide, you will learn what's needed to add a new TensorFlow model architecture, the -procedure to convert PyTorch into TensorFlow model weights, and how to efficiently debug mismatches across ML -frameworks. Let's get started! - - - -Are you unsure whether the model you wish to use already has a corresponding TensorFlow architecture? - -  - -Check the `model_type` field of the `config.json` of your model of choice -([example](https://huggingface.co/bert-base-uncased/blob/main/config.json#L14)). If the corresponding model folder in -๐Ÿค— Transformers has a file whose name starts with "modeling_tf", it means that it has a corresponding TensorFlow -architecture ([example](https://github.com/huggingface/transformers/tree/main/src/transformers/models/bert)). 
- - - - -## Step-by-step guide to add TensorFlow model architecture code - -There are many ways to design a large model architecture, and multiple ways of implementing said design. However, -you might recall from our [general overview of ๐Ÿค— Transformers](add_new_model#general-overview-of-transformers) -that we are an opinionated bunch - the ease of use of ๐Ÿค— Transformers relies on consistent design choices. From -experience, we can tell you a few important things about adding TensorFlow models: - -- Don't reinvent the wheel! More often than not, there are at least two reference implementations you should check: the -PyTorch equivalent of the model you are implementing and other TensorFlow models for the same class of problems. -- Great model implementations survive the test of time. This doesn't happen because the code is pretty, but rather -because the code is clear, easy to debug and build upon. If you make the life of the maintainers easy with your -TensorFlow implementation, by replicating the same patterns as in other TensorFlow models and minimizing the mismatch -to the PyTorch implementation, you ensure your contribution will be long lived. -- Ask for help when you're stuck! The ๐Ÿค— Transformers team is here to help, and we've probably found solutions to the same -problems you're facing. - -Here's an overview of the steps needed to add a TensorFlow model architecture: -1. Select the model you wish to convert -2. Prepare transformers dev environment -3. (Optional) Understand theoretical aspects and the existing implementation -4. Implement the model architecture -5. Implement model tests -6. Submit the pull request -7. (Optional) Build demos and share with the world - -### 1.-3. Prepare your model contribution - -**1. Select the model you wish to convert** - -Let's start off with the basics: the first thing you need to know is the architecture you want to convert. If you -don't have your eyes set on a specific architecture, asking the ๐Ÿค— Transformers team for suggestions is a great way to -maximize your impact - we will guide you towards the most prominent architectures that are missing on the TensorFlow -side. If the specific model you want to use with TensorFlow already has a TensorFlow architecture implementation in -๐Ÿค— Transformers but is lacking weights, feel free to jump straight into the -[weight conversion section](#adding-tensorflow-weights-to-hub) -of this page. - -For simplicity, the remainder of this guide assumes you've decided to contribute with the TensorFlow version of -*BrandNewBert* (the same example as in the [guide](add_new_model) to add a new model from scratch). - - - -Before starting the work on a TensorFlow model architecture, double-check that there is no ongoing effort to do so. -You can search for `BrandNewBert` on the -[pull request GitHub page](https://github.com/huggingface/transformers/pulls?q=is%3Apr) to confirm that there is no -TensorFlow-related pull request. - - - - -**2. Prepare transformers dev environment** - -Having selected the model architecture, open a draft PR to signal your intention to work on it. Follow the -instructions below to set up your environment and open a draft PR. - -1. Fork the [repository](https://github.com/huggingface/transformers) by clicking on the 'Fork' button on the - repository's page. This creates a copy of the code under your GitHub user account. - -2. 
Clone your `transformers` fork to your local disk, and add the base repository as a remote: - -```bash -git clone https://github.com/[your Github handle]/transformers.git -cd transformers -git remote add upstream https://github.com/huggingface/transformers.git -``` - -3. Set up a development environment, for instance by running the following command: - -```bash -python -m venv .env -source .env/bin/activate -pip install -e ".[dev]" -``` - -Depending on your OS, and since the number of optional dependencies of Transformers is growing, you might get a -failure with this command. If that's the case make sure to install TensorFlow then do: - -```bash -pip install -e ".[quality]" -``` - -**Note:** You don't need to have CUDA installed. Making the new model work on CPU is sufficient. - -4. Create a branch with a descriptive name from your main branch - -```bash -git checkout -b add_tf_brand_new_bert -``` - -5. Fetch and rebase to current main - -```bash -git fetch upstream -git rebase upstream/main -``` - -6. Add an empty `.py` file in `transformers/src/models/brandnewbert/` named `modeling_tf_brandnewbert.py`. This will -be your TensorFlow model file. - -7. Push the changes to your account using: - -```bash -git add . -git commit -m "initial commit" -git push -u origin add_tf_brand_new_bert -``` - -8. Once you are satisfied, go to the webpage of your fork on GitHub. Click on โ€œPull requestโ€. Make sure to add the - GitHub handle of some members of the Hugging Face team as reviewers, so that the Hugging Face team gets notified for - future changes. - -9. Change the PR into a draft by clicking on โ€œConvert to draftโ€ on the right of the GitHub pull request web page. - - -Now you have set up a development environment to port *BrandNewBert* to TensorFlow in ๐Ÿค— Transformers. - - -**3. (Optional) Understand theoretical aspects and the existing implementation** - -You should take some time to read *BrandNewBert's* paper, if such descriptive work exists. There might be large -sections of the paper that are difficult to understand. If this is the case, this is fine - don't worry! The goal is -not to get a deep theoretical understanding of the paper, but to extract the necessary information required to -effectively re-implement the model in ๐Ÿค— Transformers using TensorFlow. That being said, you don't have to spend too -much time on the theoretical aspects, but rather focus on the practical ones, namely the existing model documentation -page (e.g. [model docs for BERT](model_doc/bert)). - -After you've grasped the basics of the models you are about to implement, it's important to understand the existing -implementation. This is a great chance to confirm that a working implementation matches your expectations for the -model, as well as to foresee technical challenges on the TensorFlow side. - -It's perfectly natural that you feel overwhelmed with the amount of information that you've just absorbed. It is -definitely not a requirement that you understand all facets of the model at this stage. Nevertheless, we highly -encourage you to clear any pressing questions in our [forum](https://discuss.huggingface.co/). - - -### 4. Model implementation - -Now it's time to finally start coding. Our suggested starting point is the PyTorch file itself: copy the contents of -`modeling_brand_new_bert.py` inside `src/transformers/models/brand_new_bert/` into -`modeling_tf_brand_new_bert.py`. 
The goal of this section is to modify the file and update the import structure of -๐Ÿค— Transformers such that you can import `TFBrandNewBert` and -`TFBrandNewBert.from_pretrained(model_repo, from_pt=True)` successfully loads a working TensorFlow *BrandNewBert* model. - -Sadly, there is no prescription to convert a PyTorch model into TensorFlow. You can, however, follow our selection of -tips to make the process as smooth as possible: -- Prepend `TF` to the name of all classes (e.g. `BrandNewBert` becomes `TFBrandNewBert`). -- Most PyTorch operations have a direct TensorFlow replacement. For example, `torch.nn.Linear` corresponds to - `tf.keras.layers.Dense`, `torch.nn.Dropout` corresponds to `tf.keras.layers.Dropout`, etc. If you're not sure - about a specific operation, you can use the [TensorFlow documentation](https://www.tensorflow.org/api_docs/python/tf) - or the [PyTorch documentation](https://pytorch.org/docs/stable/). -- Look for patterns in the ๐Ÿค— Transformers codebase. If you come across a certain operation that doesn't have a direct - replacement, the odds are that someone else already had the same problem. -- By default, keep the same variable names and structure as in PyTorch. This will make it easier to debug, track - issues, and add fixes down the line. -- Some layers have different default values in each framework. A notable example is the batch normalization layer's - epsilon (`1e-5` in [PyTorch](https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html#torch.nn.BatchNorm2d) - and `1e-3` in [TensorFlow](https://www.tensorflow.org/api_docs/python/tf/keras/layers/BatchNormalization)). - Double-check the documentation! -- PyTorch's `nn.Parameter` variables typically need to be initialized within TF Layer's `build()`. See the following - example: [PyTorch](https://github.com/huggingface/transformers/blob/655f72a6896c0533b1bdee519ed65a059c2425ac/src/transformers/models/vit_mae/modeling_vit_mae.py#L212) / - [TensorFlow](https://github.com/huggingface/transformers/blob/655f72a6896c0533b1bdee519ed65a059c2425ac/src/transformers/models/vit_mae/modeling_tf_vit_mae.py#L220) -- If the PyTorch model has a `#copied from ...` on top of a function, the odds are that your TensorFlow model can also - borrow that function from the architecture it was copied from, assuming it has a TensorFlow architecture. -- Assigning the `name` attribute correctly in TensorFlow functions is critical to do the `from_pt=True` weight - cross-loading. `name` is almost always the name of the corresponding variable in the PyTorch code. If `name` is not - properly set, you will see it in the error message when loading the model weights. -- The logic of the base model class, `BrandNewBertModel`, will actually reside in `TFBrandNewBertMainLayer`, a Keras - layer subclass ([example](https://github.com/huggingface/transformers/blob/4fd32a1f499e45f009c2c0dea4d81c321cba7e02/src/transformers/models/bert/modeling_tf_bert.py#L719)). - `TFBrandNewBertModel` will simply be a wrapper around this layer. -- Keras models need to be built in order to load pretrained weights. For that reason, `TFBrandNewBertPreTrainedModel` - will need to hold an example of inputs to the model, the `dummy_inputs` - ([example](https://github.com/huggingface/transformers/blob/4fd32a1f499e45f009c2c0dea4d81c321cba7e02/src/transformers/models/bert/modeling_tf_bert.py#L916)). -- If you get stuck, ask for help - we're here to help you! 
๐Ÿค— - -In addition to the model file itself, you will also need to add the pointers to the model classes and related -documentation pages. You can complete this part entirely following the patterns in other PRs -([example](https://github.com/huggingface/transformers/pull/18020/files)). Here's a list of the needed manual -changes: -- Include all public classes of *BrandNewBert* in `src/transformers/__init__.py` -- Add *BrandNewBert* classes to the corresponding Auto classes in `src/transformers/models/auto/modeling_tf_auto.py` -- Add the lazy loading classes related to *BrandNewBert* in `src/transformers/utils/dummy_tf_objects.py` -- Update the import structures for the public classes in `src/transformers/models/brand_new_bert/__init__.py` -- Add the documentation pointers to the public methods of *BrandNewBert* in `docs/source/en/model_doc/brand_new_bert.md` -- Add yourself to the list of contributors to *BrandNewBert* in `docs/source/en/model_doc/brand_new_bert.md` -- Finally, add a green tick โœ… to the TensorFlow column of *BrandNewBert* in `docs/source/en/index.md` - -When you're happy with your implementation, run the following checklist to confirm that your model architecture is -ready: -1. All layers that behave differently at train time (e.g. Dropout) are called with a `training` argument, which is -propagated all the way from the top-level classes -2. You have used `#copied from ...` whenever possible -3. `TFBrandNewBertMainLayer` and all classes that use it have their `call` function decorated with `@unpack_inputs` -4. `TFBrandNewBertMainLayer` is decorated with `@keras_serializable` -5. A TensorFlow model can be loaded from PyTorch weights using `TFBrandNewBert.from_pretrained(model_repo, from_pt=True)` -6. You can call the TensorFlow model using the expected input format - - -### 5. Add model tests - -Hurray, you've implemented a TensorFlow model! Now it's time to add tests to make sure that your model behaves as -expected. As in the previous section, we suggest you start by copying the `test_modeling_brand_new_bert.py` file in -`tests/models/brand_new_bert/` into `test_modeling_tf_brand_new_bert.py`, and continue by making the necessary -TensorFlow replacements. For now, in all `.from_pretrained()` calls, you should use the `from_pt=True` flag to load -the existing PyTorch weights. - -After you're done, it's time for the moment of truth: run the tests! ๐Ÿ˜ฌ - -```bash -NVIDIA_TF32_OVERRIDE=0 RUN_SLOW=1 RUN_PT_TF_CROSS_TESTS=1 \ -py.test -vv tests/models/brand_new_bert/test_modeling_tf_brand_new_bert.py -``` - -The most likely outcome is that you'll see a bunch of errors. Don't worry, this is expected! Debugging ML models is -notoriously hard, and the key ingredient to success is patience (and `breakpoint()`). In our experience, the hardest -problems arise from subtle mismatches between ML frameworks, for which we have a few pointers at the end of this guide. -In other cases, a general test might not be directly applicable to your model, in which case we suggest an override -at the model test class level. Regardless of the issue, don't hesitate to ask for help in your draft pull request if -you're stuck. - -When all tests pass, congratulations, your model is nearly ready to be added to the ๐Ÿค— Transformers library! ๐ŸŽ‰ - -### 6.-7. Ensure everyone can use your model - -**6. Submit the pull request** - -Once you're done with the implementation and the tests, it's time to submit a pull request. Before pushing your code, -run our code formatting utility, `make fixup` ๐Ÿช„. 
This will automatically fix any formatting issues, which would cause -our automatic checks to fail. - -It's now time to convert your draft pull request into a real pull request. To do so, click on the "Ready for -review" button and add Joao (`@gante`) and Matt (`@Rocketknight1`) as reviewers. A model pull request will need -at least 3 reviewers, but they will take care of finding appropriate additional reviewers for your model. - -After all reviewers are happy with the state of your PR, the final action point is to remove the `from_pt=True` flag in -`.from_pretrained()` calls. Since there are no TensorFlow weights, you will have to add them! Check the section -below for instructions on how to do it. - -Finally, when the TensorFlow weights get merged, you have at least 3 reviewer approvals, and all CI checks are -green, double-check the tests locally one last time - -```bash -NVIDIA_TF32_OVERRIDE=0 RUN_SLOW=1 RUN_PT_TF_CROSS_TESTS=1 \ -py.test -vv tests/models/brand_new_bert/test_modeling_tf_brand_new_bert.py -``` - -and we will merge your PR! Congratulations on the milestone ๐ŸŽ‰ - -**7. (Optional) Build demos and share with the world** - -One of the hardest parts about open-source is discovery. How can the other users learn about the existence of your -fabulous TensorFlow contribution? With proper communication, of course! ๐Ÿ“ฃ - -There are two main ways to share your model with the community: -- Build demos. These include Gradio demos, notebooks, and other fun ways to show off your model. We highly - encourage you to add a notebook to our [community-driven demos](https://huggingface.co/docs/transformers/community). -- Share stories on social media like Twitter and LinkedIn. You should be proud of your work and share - your achievement with the community - your model can now be used by thousands of engineers and researchers around - the world ๐ŸŒ! We will be happy to retweet your posts and help you share your work with the community. - - -## Adding TensorFlow weights to ๐Ÿค— Hub - -Assuming that the TensorFlow model architecture is available in ๐Ÿค— Transformers, converting PyTorch weights into -TensorFlow weights is a breeze! - -Here's how to do it: -1. Make sure you are logged into your Hugging Face account in your terminal. You can log in using the command - `huggingface-cli login` (you can find your access tokens [here](https://huggingface.co/settings/tokens)) -2. Run `transformers-cli pt-to-tf --model-name foo/bar`, where `foo/bar` is the name of the model repository - containing the PyTorch weights you want to convert -3. Tag `@joaogante` and `@Rocketknight1` in the ๐Ÿค— Hub PR the command above has just created - -That's it! ๐ŸŽ‰ - - -## Debugging mismatches across ML frameworks ๐Ÿ› - -At some point, when adding a new architecture or when creating TensorFlow weights for an existing architecture, you -might come across errors complaining about mismatches between PyTorch and TensorFlow. You might even decide to open the -model architecture code for the two frameworks, and find that they look identical. What's going on? ๐Ÿค” - -First of all, let's talk about why understanding these mismatches matters. Many community members will use ๐Ÿค— -Transformers models out of the box, and trust that our models behave as expected. When there is a large mismatch -between the two frameworks, it implies that the model is not following the reference implementation for at least one -of the frameworks. This might lead to silent failures, in which the model runs but has poor performance. 
This is -arguably worse than a model that fails to run at all! To that end, we aim at having a framework mismatch smaller than -`1e-5` at all stages of the model. - -As in other numerical problems, the devil is in the details. And as in any detail-oriented craft, the secret -ingredient here is patience. Here is our suggested workflow for when you come across this type of issues: -1. Locate the source of mismatches. The model you're converting probably has near identical inner variables up to a - certain point. Place `breakpoint()` statements in the two frameworks' architectures, and compare the values of the - numerical variables in a top-down fashion until you find the source of the problems. -2. Now that you've pinpointed the source of the issue, get in touch with the ๐Ÿค— Transformers team. It is possible - that we've seen a similar problem before and can promptly provide a solution. As a fallback, scan popular pages - like StackOverflow and GitHub issues. -3. If there is no solution in sight, it means you'll have to go deeper. The good news is that you've located the - issue, so you can focus on the problematic instruction, abstracting away the rest of the model! The bad news is - that you'll have to venture into the source implementation of said instruction. In some cases, you might find an - issue with a reference implementation - don't abstain from opening an issue in the upstream repository. - -In some cases, in discussion with the ๐Ÿค— Transformers team, we might find that fixing the mismatch is infeasible. -When the mismatch is very small in the output layers of the model (but potentially large in the hidden states), we -might decide to ignore it in favor of distributing the model. The `pt-to-tf` CLI mentioned above has a `--max-error` -flag to override the error message at weight conversion time. diff --git a/docs/source/en/agents.md b/docs/source/en/agents.md new file mode 100644 index 00000000000000..67c4b8a91b2413 --- /dev/null +++ b/docs/source/en/agents.md @@ -0,0 +1,564 @@ + +# Agents and tools + +[[open-in-colab]] + +### What is an agent? + +Large Language Models (LLMs) trained to perform [causal language modeling](./tasks/language_modeling.) can tackle a wide range of tasks, but they often struggle with basic tasks like logic, calculation, and search. When prompted in domains in which they do not perform well, they often fail to generate the answer we expect them to. + +One approach to overcome this weakness is to create an *agent*. + +An agent is a system that uses an LLM as its engine, and it has access to functions called *tools*. + +These *tools* are functions for performing a task, and they contain all necessary description for the agent to properly use them. + +The agent can be programmed to: +- devise a series of actions/tools and run them all at once like the [`CodeAgent`] for example +- plan and execute actions/tools one by one and wait for the outcome of each action before launching the next one like the [`ReactJsonAgent`] for example + +### Types of agents + +#### Code agent + +This agent has a planning step, then generates python code to execute all its actions at once. It natively handles different input and output types for its tools, thus it is the recommended choice for multimodal tasks. + +#### React agents + +This is the go-to agent to solve reasoning tasks, since the ReAct framework ([Yao et al., 2022](https://huggingface.co/papers/2210.03629)) makes it really efficient to think on the basis of its previous observations. 
+ +We implement two versions of ReactJsonAgent: +- [`ReactJsonAgent`] generates tool calls as a JSON in its output. +- [`ReactCodeAgent`] is a new type of ReactJsonAgent that generates its tool calls as blobs of code, which works really well for LLMs that have strong coding performance. + +> [!TIP] +> Read [Open-source LLMs as LangChain Agents](https://huggingface.co/blog/open-source-llms-as-agents) blog post to learn more the ReAct agent. + +![Framework of a React Agent](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/open-source-llms-as-agents/ReAct.png) + +For example, here is how a ReAct Code agent would work its way through the following question. + +```py3 +>>> agent.run( +... "How many more blocks (also denoted as layers) in BERT base encoder than the encoder from the architecture proposed in Attention is All You Need?", +... ) +=====New task===== +How many more blocks (also denoted as layers) in BERT base encoder than the encoder from the architecture proposed in Attention is All You Need? +====Agent is executing the code below: +bert_blocks = search(query="number of blocks in BERT base encoder") +print("BERT blocks:", bert_blocks) +==== +Print outputs: +BERT blocks: twelve encoder blocks + +====Agent is executing the code below: +attention_layer = search(query="number of layers in Attention is All You Need") +print("Attention layers:", attention_layer) +==== +Print outputs: +Attention layers: Encoder: The encoder is composed of a stack of N = 6 identical layers. Each layer has two sub-layers. The first is a multi-head self-attention mechanism, and the second is a simple, position- 2 Page 3 Figure 1: The Transformer - model architecture. + +====Agent is executing the code below: +bert_blocks = 12 +attention_layers = 6 +diff = bert_blocks - attention_layers +print("Difference in blocks:", diff) +final_answer(diff) +==== + +Print outputs: +Difference in blocks: 6 + +Final answer: 6 +``` + +### How can I build an agent? + +To initialize an agent, you need these arguments: + +- an LLM to power your agent - the agent is not exactly the LLM, itโ€™s more like the agent is a program that uses an LLM as its engine. +- a system prompt: what the LLM engine will be prompted with to generate its output +- a toolbox from which the agent pick tools to execute +- a parser to extract from the LLM output which tools are to call and with which arguments + +Upon initialization of the agent system, the tool attributes are used to generate a tool description, then baked into the agentโ€™s `system_prompt` to let it know which tools it can use and why. + +To start with, please install the `agents` extras in order to install all default dependencies. + +```bash +pip install transformers[agents] +``` + +Build your LLM engine by defining a `llm_engine` method which accepts a list of [messages](./chat_templating.) and returns text. This callable also needs to accept a `stop` argument that indicates when to stop generating. + +```python +from huggingface_hub import login, InferenceClient + +login("") + +client = InferenceClient(model="meta-llama/Meta-Llama-3-70B-Instruct") + +def llm_engine(messages, stop_sequences=["Task"]) -> str: + response = client.chat_completion(messages, stop=stop_sequences, max_tokens=1000) + answer = response.choices[0].message.content + return answer +``` + +You could use any `llm_engine` method as long as: +1. it follows the [messages format](./chat_templating.md) (`List[Dict[str, str]]`) for its input `messages`, and it returns a `str`. +2. 
it stops generating outputs at the sequences passed in the argument `stop_sequences` + +Additionally, `llm_engine` can also take a `grammar` argument. In the case where you specify a `grammar` upon agent initialization, this argument will be passed to the calls to llm_engine, with the `grammar` that you defined upon initialization, to allow [constrained generation](https://huggingface.co/docs/text-generation-inference/conceptual/guidance) in order to force properly-formatted agent outputs. + +You will also need a `tools` argument which accepts a list of `Tools` - it can be an empty list. You can also add the default toolbox on top of your `tools` list by defining the optional argument `add_base_tools=True`. + +Now you can create an agent, like [`CodeAgent`], and run it. For convenience, we also provide the [`HfEngine`] class that uses `huggingface_hub.InferenceClient` under the hood. + +```python +from transformers import CodeAgent, HfEngine + +llm_engine = HfEngine(model="meta-llama/Meta-Llama-3-70B-Instruct") +agent = CodeAgent(tools=[], llm_engine=llm_engine, add_base_tools=True) + +agent.run( + "Could you translate this sentence from French, say it out loud and return the audio.", + sentence="Oรน est la boulangerie la plus proche?", +) +``` + +This will be handy in case of emergency baguette need! +You can even leave the argument `llm_engine` undefined, and an [`HfEngine`] will be created by default. + +```python +from transformers import CodeAgent + +agent = CodeAgent(tools=[], add_base_tools=True) + +agent.run( + "Could you translate this sentence from French, say it out loud and give me the audio.", + sentence="Oรน est la boulangerie la plus proche?", +) +``` + +Note that we used an additional `sentence` argument: you can pass text as additional arguments to the model. + +You can also use this to indicate the path to local or remote files for the model to use: + +```py +from transformers import ReactCodeAgent + +agent = ReactCodeAgent(tools=[], llm_engine=llm_engine, add_base_tools=True) + +agent.run("Why does Mike not know many people in New York?", audio="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/recording.mp3") +``` + + +The prompt and output parser were automatically defined, but you can easily inspect them by calling the `system_prompt_template` on your agent. + +```python +print(agent.system_prompt_template) +``` + +It's important to explain as clearly as possible the task you want to perform. +Every [`~Agent.run`] operation is independent, and since an agent is powered by an LLM, minor variations in your prompt might yield completely different results. +You can also run an agent consecutively for different tasks: each time the attributes `agent.task` and `agent.logs` will be re-initialized. + + +#### Code execution + +A Python interpreter executes the code on a set of inputs passed along with your tools. +This should be safe because the only functions that can be called are the tools you provided (especially if it's only tools by Hugging Face) and the print function, so you're already limited in what can be executed. + +The Python interpreter also doesn't allow imports by default outside of a safe list, so all the most obvious attacks shouldn't be an issue. 
+You can still authorize additional imports by passing the authorized modules as a list of strings in argument `additional_authorized_imports` upon initialization of your [`ReactCodeAgent`] or [`CodeAgent`]:
+
+```py
+>>> from transformers import ReactCodeAgent
+
+>>> agent = ReactCodeAgent(tools=[], additional_authorized_imports=['requests', 'bs4'])
+>>> agent.run("Could you get me the title of the page at url 'https://huggingface.co/blog'?")
+
+(...)
+'Hugging Face – Blog'
+```
+
+The execution will stop at any code trying to perform an illegal operation or if there is a regular Python error with the code generated by the agent.
+
+> [!WARNING]
+> The LLM can generate arbitrary code that will then be executed: do not add any unsafe imports!
+
+### The system prompt
+
+An agent, or rather the LLM that drives the agent, generates an output based on the system prompt. The system prompt can be customized and tailored to the intended task. For example, check the system prompt for the [`ReactCodeAgent`] (below version is slightly simplified).
+
+```text
+You will be given a task to solve as best you can.
+You have access to the following tools:
+<>
+
+To solve the task, you must plan forward to proceed in a series of steps, in a cycle of 'Thought:', 'Code:', and 'Observation:' sequences.
+
+At each step, in the 'Thought:' sequence, you should first explain your reasoning towards solving the task, then the tools that you want to use.
+Then in the 'Code:' sequence, you should write the code in simple Python. The code sequence must end with '/End code' sequence.
+During each intermediate step, you can use 'print()' to save whatever important information you will then need.
+These print outputs will then be available in the 'Observation:' field, for using this information as input for the next step.
+
+In the end you have to return a final answer using the `final_answer` tool.
+
+Here are a few examples using notional tools:
+---
+{examples}
+
+The above examples were using notional tools that might not exist for you. You only have access to those tools:
+<>
+You can also perform computations in the python code you generate.
+
+Always provide a 'Thought:' and a 'Code:\n```py' sequence ending with '```' sequence. You MUST provide at least the 'Code:' sequence to move forward.
+
+Remember to not perform too many operations in a single code block! You should split the task into intermediate code blocks.
+Print results at the end of each step to save the intermediate results. Then use final_answer() to return the final result.
+
+Remember to make sure that variables you use are all defined.
+
+Now Begin!
+```
+
+The system prompt includes:
+- An *introduction* that explains how the agent should behave and what tools are.
+- A description of all the tools that is defined by a `<>` token that is dynamically replaced at runtime with the tools defined/chosen by the user.
+  - The tool description comes from the tool attributes, `name`, `description`, `inputs` and `output_type`, and a simple `jinja2` template that you can refine.
+- The expected output format.
+
+You could improve the system prompt, for example, by adding an explanation of the output format.
+
+For maximum flexibility, you can overwrite the whole system prompt template by passing your custom prompt as an argument to the `system_prompt` parameter.
+
+```python
+from transformers import ReactJsonAgent
+from transformers.agents import PythonInterpreterTool
+
+agent = ReactJsonAgent(tools=[PythonInterpreterTool()], system_prompt="{your_custom_prompt}")
+```
+
+> [!WARNING]
+> Please make sure to define the `<>` string somewhere in the `template` so the agent is aware of the available tools.
+
+
+### Inspecting an agent run
+
+Here are a few useful attributes to inspect what happened after a run:
+- `agent.logs` stores the fine-grained logs of the agent. At every step of the agent's run, everything gets stored in a dictionary that is then appended to `agent.logs`.
+- Running `agent.write_inner_memory_from_logs()` creates an inner memory of the agent's logs for the LLM to view, as a list of chat messages. This method goes over each step of the log and only stores what it's interested in as a message: for instance, it will save the system prompt and task in separate messages, then for each step it will store the LLM output as a message, and the tool call output as another message. Use this if you want a higher-level view of what has happened - but not every log will be transcribed by this method.
+
+## Tools
+
+A tool is an atomic function to be used by an agent.
+
+You can for instance check the [`PythonInterpreterTool`]: it has a name, a description, input descriptions, an output type, and a `__call__` method to perform the action.
+
+When the agent is initialized, the tool attributes are used to generate a tool description which is baked into the agent's system prompt. This lets the agent know which tools it can use and why.
+
+### Default toolbox
+
+Transformers comes with a default toolbox for empowering agents that you can add to your agent upon initialization with argument `add_base_tools = True`:
+
+- **Document question answering**: given a document (such as a PDF) in image format, answer a question on this document ([Donut](./model_doc/donut))
+- **Image question answering**: given an image, answer a question on this image ([VILT](./model_doc/vilt))
+- **Speech to text**: given an audio recording of a person talking, transcribe the speech into text ([Whisper](./model_doc/whisper))
+- **Text to speech**: convert text to speech ([SpeechT5](./model_doc/speecht5))
+- **Translation**: translates a given sentence from source language to target language.
+- **Python code interpreter**: runs the LLM-generated Python code in a secure environment. This tool will only be added to [`ReactJsonAgent`] if you use `add_base_tools=True`, since code-based tools can already execute Python code.
+
+
+You can manually use a tool by calling the [`load_tool`] function with the name of the tool to load, then calling the tool with a task to perform.
+
+
+```python
+from transformers import load_tool
+
+tool = load_tool("text-to-speech")
+audio = tool("This is a text to speech tool")
+```
+
+
+### Create a new tool
+
+You can create your own tool for use cases not covered by the default tools from Hugging Face.
+For example, let's create a tool that returns the most downloaded model for a given task from the Hub.
+
+You'll start with the code below.
+
+```python
+from huggingface_hub import list_models
+
+task = "text-classification"
+
+model = next(iter(list_models(filter=task, sort="downloads", direction=-1)))
+print(model.id)
+```
+
+This code can be converted into a class that inherits from the [`Tool`] superclass.
+
+
+The custom tool needs:
+- An attribute `name`, which corresponds to the name of the tool itself. The name usually describes what the tool does.
Since the code returns the model with the most downloads for a task, let's name it `model_download_counter`.
+- An attribute `description` is used to populate the agent's system prompt.
+- An `inputs` attribute, which is a dictionary with keys `"type"` and `"description"`. It contains information that helps the Python interpreter make educated choices about the input.
+- An `output_type` attribute, which specifies the output type.
+- A `forward` method which contains the inference code to be executed.
+
+
+```python
+from transformers import Tool
+from huggingface_hub import list_models
+
+class HFModelDownloadsTool(Tool):
+    name = "model_download_counter"
+    description = (
+        "This is a tool that returns the most downloaded model of a given task on the Hugging Face Hub. "
+        "It returns the name of the checkpoint."
+    )
+
+    inputs = {
+        "task": {
+            "type": "text",
+            "description": "the task category (such as text-classification, depth-estimation, etc)",
+        }
+    }
+    output_type = "text"
+
+    def forward(self, task: str):
+        model = next(iter(list_models(filter=task, sort="downloads", direction=-1)))
+        return model.id
+```
+
+Now that the custom `HFModelDownloadsTool` class is ready, you can save it to a file named `model_downloads.py` and import it for use.
+
+
+```python
+from model_downloads import HFModelDownloadsTool
+
+tool = HFModelDownloadsTool()
+```
+
+You can also share your custom tool to the Hub by calling [`~Tool.push_to_hub`] on the tool. Make sure you've created a repository for it on the Hub and are using a token with write access.
+
+```python
+tool.push_to_hub("{your_username}/hf-model-downloads")
+```
+
+Load the tool with the [`~Tool.load_tool`] function and pass it to the `tools` parameter in your agent.
+
+```python
+from transformers import load_tool, CodeAgent
+
+model_download_tool = load_tool("m-ric/hf-model-downloads")
+agent = CodeAgent(tools=[model_download_tool], llm_engine=llm_engine)
+agent.run(
+    "Can you give me the name of the model that has the most downloads in the 'text-to-video' task on the Hugging Face Hub?"
+)
+```
+
+You get the following:
+```text
+======== New task ========
+Can you give me the name of the model that has the most downloads in the 'text-to-video' task on the Hugging Face Hub?
+==== Agent is executing the code below:
+most_downloaded_model = model_download_counter(task="text-to-video")
+print(f"The most downloaded model for the 'text-to-video' task is {most_downloaded_model}.")
+====
+```
+
+And the output:
+`"The most downloaded model for the 'text-to-video' task is ByteDance/AnimateDiff-Lightning."`
+
+
+### Manage your agent's toolbox
+
+If you have already initialized an agent, it is inconvenient to reinitialize it from scratch just to add a tool you want to use. With Transformers, you can manage an agent's toolbox by adding or replacing a tool.
+
+Let's add the `model_download_tool` to an existing agent initialized with only the default toolbox.
+
+```python
+from transformers import CodeAgent
+
+agent = CodeAgent(tools=[], llm_engine=llm_engine, add_base_tools=True)
+agent.toolbox.add_tool(model_download_tool)
+```
+Now we can leverage both the new tool and the previous text-to-speech tool:
+
+```python
+agent.run(
+    "Can you read out loud the name of the model that has the most downloads in the 'text-to-video' task on the Hugging Face Hub and return the audio?"
+) +``` + + +| **Audio** | +|------------------------------------------------------------------------------------------------------------------------------------------------------| +| - -## AutoBackbone - -`AutoBackbone` lets you use pretrained models as backbones and get feature maps as outputs from different stages of the models. Below you can see how to get feature maps from a [Swin](model_doc/swin) checkpoint. - -```py ->>> from transformers import AutoImageProcessor, AutoBackbone ->>> import torch ->>> from PIL import Image ->>> import requests ->>> url = "http://images.cocodataset.org/val2017/000000039769.jpg" ->>> image = Image.open(requests.get(url, stream=True).raw) ->>> processor = AutoImageProcessor.from_pretrained("microsoft/swin-tiny-patch4-window7-224") ->>> model = AutoBackbone.from_pretrained("microsoft/swin-tiny-patch4-window7-224", out_indices=(0,)) - ->>> inputs = processor(image, return_tensors="pt") ->>> outputs = model(**inputs) ->>> feature_maps = outputs.feature_maps ->>> list(feature_maps[-1].shape) -[1, 96, 56, 56] -``` diff --git a/docs/source/en/benchmarks.md b/docs/source/en/benchmarks.md index 5023d248697904..1fd61cc8de4029 100644 --- a/docs/source/en/benchmarks.md +++ b/docs/source/en/benchmarks.md @@ -48,7 +48,7 @@ The benchmark classes [`PyTorchBenchmark`] and [`TensorFlowBenchmark`] expect an ```py >>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments ->>> args = PyTorchBenchmarkArguments(models=["bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512]) +>>> args = PyTorchBenchmarkArguments(models=["google-bert/bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512]) >>> benchmark = PyTorchBenchmark(args) ``` @@ -57,7 +57,7 @@ The benchmark classes [`PyTorchBenchmark`] and [`TensorFlowBenchmark`] expect an >>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments >>> args = TensorFlowBenchmarkArguments( -... models=["bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512] +... models=["google-bert/bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512] ... 
) >>> benchmark = TensorFlowBenchmark(args) ``` @@ -89,20 +89,20 @@ An instantiated benchmark object can then simply be run by calling `benchmark.ru -------------------------------------------------------------------------------- Model Name Batch Size Seq Length Time in s -------------------------------------------------------------------------------- -bert-base-uncased 8 8 0.006 -bert-base-uncased 8 32 0.006 -bert-base-uncased 8 128 0.018 -bert-base-uncased 8 512 0.088 +google-bert/bert-base-uncased 8 8 0.006 +google-bert/bert-base-uncased 8 32 0.006 +google-bert/bert-base-uncased 8 128 0.018 +google-bert/bert-base-uncased 8 512 0.088 -------------------------------------------------------------------------------- ==================== INFERENCE - MEMORY - RESULT ==================== -------------------------------------------------------------------------------- Model Name Batch Size Seq Length Memory in MB -------------------------------------------------------------------------------- -bert-base-uncased 8 8 1227 -bert-base-uncased 8 32 1281 -bert-base-uncased 8 128 1307 -bert-base-uncased 8 512 1539 +google-bert/bert-base-uncased 8 8 1227 +google-bert/bert-base-uncased 8 32 1281 +google-bert/bert-base-uncased 8 128 1307 +google-bert/bert-base-uncased 8 512 1539 -------------------------------------------------------------------------------- ==================== ENVIRONMENT INFORMATION ==================== @@ -146,20 +146,20 @@ An instantiated benchmark object can then simply be run by calling `benchmark.ru -------------------------------------------------------------------------------- Model Name Batch Size Seq Length Time in s -------------------------------------------------------------------------------- -bert-base-uncased 8 8 0.005 -bert-base-uncased 8 32 0.008 -bert-base-uncased 8 128 0.022 -bert-base-uncased 8 512 0.105 +google-bert/bert-base-uncased 8 8 0.005 +google-bert/bert-base-uncased 8 32 0.008 +google-bert/bert-base-uncased 8 128 0.022 +google-bert/bert-base-uncased 8 512 0.105 -------------------------------------------------------------------------------- ==================== INFERENCE - MEMORY - RESULT ==================== -------------------------------------------------------------------------------- Model Name Batch Size Seq Length Memory in MB -------------------------------------------------------------------------------- -bert-base-uncased 8 8 1330 -bert-base-uncased 8 32 1330 -bert-base-uncased 8 128 1330 -bert-base-uncased 8 512 1770 +google-bert/bert-base-uncased 8 8 1330 +google-bert/bert-base-uncased 8 32 1330 +google-bert/bert-base-uncased 8 128 1330 +google-bert/bert-base-uncased 8 512 1770 -------------------------------------------------------------------------------- ==================== ENVIRONMENT INFORMATION ==================== @@ -197,7 +197,7 @@ when adding the argument `save_to_csv=True` to [`PyTorchBenchmarkArguments`] and [`TensorFlowBenchmarkArguments`] respectively. In this case, every section is saved in a separate _.csv_ file. The path to each _.csv_ file can optionally be defined via the argument data classes. -Instead of benchmarking pre-trained models via their model identifier, _e.g._ `bert-base-uncased`, the user can +Instead of benchmarking pre-trained models via their model identifier, _e.g._ `google-bert/bert-base-uncased`, the user can alternatively benchmark an arbitrary configuration of any available model class. In this case, a `list` of configurations must be inserted with the benchmark args as follows. 
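For reference, a configuration-based benchmark might look like the following sketch. The model labels and layer counts here are illustrative assumptions rather than values from this guide, and the `configs` argument pairs each label in `models` with a configuration:

```py
>>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments, BertConfig

>>> args = PyTorchBenchmarkArguments(
...     models=["bert-base", "bert-6-layers"], batch_sizes=[8], sequence_lengths=[8, 32, 128]
... )
>>> config_base = BertConfig()  # default 12-layer configuration
>>> config_6_layers = BertConfig(num_hidden_layers=6)  # smaller 6-layer variant
>>> benchmark = PyTorchBenchmark(args, configs=[config_base, config_6_layers])
>>> results = benchmark.run()
```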
diff --git a/docs/source/en/big_models.md b/docs/source/en/big_models.md index 9b57e433176094..0c1737af1abd7e 100644 --- a/docs/source/en/big_models.md +++ b/docs/source/en/big_models.md @@ -14,110 +14,202 @@ rendered properly in your Markdown viewer. --> -# Instantiating a big model +# Instantiate a big model -When you want to use a very big pretrained model, one challenge is to minimize the use of the RAM. The usual workflow -from PyTorch is: +A barrier to accessing very large pretrained models is the amount of memory required. When loading a pretrained PyTorch model, you usually: -1. Create your model with random weights. +1. Create a model with random weights. 2. Load your pretrained weights. -3. Put those pretrained weights in your random model. +3. Put those pretrained weights in the model. -Step 1 and 2 both require a full version of the model in memory, which is not a problem in most cases, but if your model starts weighing several GigaBytes, those two copies can make you get out of RAM. Even worse, if you are using `torch.distributed` to launch a distributed training, each process will load the pretrained model and store these two copies in RAM. +The first two steps both require a full version of the model in memory and if the model weighs several GBs, you may not have enough memory for two copies of it. This problem is amplified in distributed training environments because each process loads a pretrained model and stores two copies in memory. - +> [!TIP] +> The randomly created model is initialized with "empty" tensors, which take space in memory without filling it. The random values are whatever was in this chunk of memory at the time. To improve loading speed, the [`_fast_init`](https://github.com/huggingface/transformers/blob/c9f6e5e35156e068b227dd9b15521767f6afd4d2/src/transformers/modeling_utils.py#L2710) parameter is set to `True` by default to skip the random initialization for all weights that are correctly loaded. -Note that the randomly created model is initialized with "empty" tensors, which take the space in memory without filling it (thus the random values are whatever was in this chunk of memory at a given time). The random initialization following the appropriate distribution for the kind of model/parameters instantiated (like a normal distribution for instance) is only performed after step 3 on the non-initialized weights, to be as fast as possible! - - - -In this guide, we explore the solutions Transformers offer to deal with this issue. Note that this is an area of active development, so the APIs explained here may change slightly in the future. +This guide will show you how Transformers can help you load large pretrained models despite their memory requirements. ## Sharded checkpoints -Since version 4.18.0, model checkpoints that end up taking more than 10GB of space are automatically sharded in smaller pieces. In terms of having one single checkpoint when you do `model.save_pretrained(save_dir)`, you will end up with several partial checkpoints (each of which being of size < 10GB) and an index that maps parameter names to the files they are stored in. +From Transformers v4.18.0, a checkpoint larger than 10GB is automatically sharded by the [`~PreTrainedModel.save_pretrained`] method. It is split into several smaller partial checkpoints and creates an index file that maps parameter names to the files they're stored in. 
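+The sharding examples in this section reuse a `model` object together with the `os` and `tempfile` modules without repeating the setup. A minimal sketch of that setup is shown below; the choice of `AutoModelForCausalLM` for this checkpoint is an assumption for illustration:
+
+```py
+>>> import os
+>>> import tempfile
+
+>>> from transformers import AutoModelForCausalLM
+
+>>> # Load the checkpoint that the examples below shard and reload
+>>> model = AutoModelForCausalLM.from_pretrained("BioMistral/BioMistral-7B")
+```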
-You can control the maximum size before sharding with the `max_shard_size` parameter, so for the sake of an example, we'll use a normal-size models with a small shard size: let's take a traditional BERT model. +The maximum shard size is controlled with the `max_shard_size` parameter, but by default it is 5GB, because it is easier to run on free-tier GPU instances without running out of memory. -```py -from transformers import AutoModel - -model = AutoModel.from_pretrained("bert-base-cased") -``` - -If you save it using [`~PreTrainedModel.save_pretrained`], you will get a new folder with two files: the config of the model and its weights: +For example, let's shard [BioMistral/BioMistral-7B](https://hf.co/BioMistral/BioMistral-7B). ```py ->>> import os ->>> import tempfile - >>> with tempfile.TemporaryDirectory() as tmp_dir: -... model.save_pretrained(tmp_dir) +... model.save_pretrained(tmp_dir, max_shard_size="5GB") ... print(sorted(os.listdir(tmp_dir))) -['config.json', 'pytorch_model.bin'] +['config.json', 'generation_config.json', 'model-00001-of-00006.safetensors', 'model-00002-of-00006.safetensors', 'model-00003-of-00006.safetensors', 'model-00004-of-00006.safetensors', 'model-00005-of-00006.safetensors', 'model-00006-of-00006.safetensors', 'model.safetensors.index.json'] ``` -Now let's use a maximum shard size of 200MB: +The sharded checkpoint is reloaded with the [`~PreTrainedModel.from_pretrained`] method. ```py >>> with tempfile.TemporaryDirectory() as tmp_dir: -... model.save_pretrained(tmp_dir, max_shard_size="200MB") -... print(sorted(os.listdir(tmp_dir))) -['config.json', 'pytorch_model-00001-of-00003.bin', 'pytorch_model-00002-of-00003.bin', 'pytorch_model-00003-of-00003.bin', 'pytorch_model.bin.index.json'] +... model.save_pretrained(tmp_dir, max_shard_size="5GB") +... new_model = AutoModel.from_pretrained(tmp_dir) ``` -On top of the configuration of the model, we see three different weights files, and an `index.json` file which is our index. A checkpoint like this can be fully reloaded using the [`~PreTrainedModel.from_pretrained`] method: +The main advantage of sharded checkpoints for big models is that each shard is loaded after the previous one, which caps the memory usage to only the model size and the largest shard size. + +You could also directly load a sharded checkpoint inside a model without the [`~PreTrainedModel.from_pretrained`] method (similar to PyTorch's `load_state_dict()` method for a full checkpoint). In this case, use the [`~modeling_utils.load_sharded_checkpoint`] method. ```py +>>> from transformers.modeling_utils import load_sharded_checkpoint + >>> with tempfile.TemporaryDirectory() as tmp_dir: -... model.save_pretrained(tmp_dir, max_shard_size="200MB") -... new_model = AutoModel.from_pretrained(tmp_dir) +... model.save_pretrained(tmp_dir, max_shard_size="5GB") +... load_sharded_checkpoint(model, tmp_dir) ``` -The main advantage of doing this for big models is that during step 2 of the workflow shown above, each shard of the checkpoint is loaded after the previous one, capping the memory usage in RAM to the model size plus the size of the biggest shard. +### Shard metadata -Behind the scenes, the index file is used to determine which keys are in the checkpoint, and where the corresponding weights are stored. We can load that index like any json and get a dictionary: +The index file determines which keys are in the checkpoint and where the corresponding weights are stored. 
This file is loaded like any other JSON file and you can get a dictionary from it. ```py >>> import json >>> with tempfile.TemporaryDirectory() as tmp_dir: -... model.save_pretrained(tmp_dir, max_shard_size="200MB") -... with open(os.path.join(tmp_dir, "pytorch_model.bin.index.json"), "r") as f: +... model.save_pretrained(tmp_dir, max_shard_size="5GB") +... with open(os.path.join(tmp_dir, "model.safetensors.index.json"), "r") as f: ... index = json.load(f) >>> print(index.keys()) dict_keys(['metadata', 'weight_map']) ``` -The metadata just consists of the total size of the model for now. We plan to add other information in the future: +The `metadata` key provides the total model size. ```py >>> index["metadata"] -{'total_size': 433245184} +{'total_size': 28966928384} ``` -The weights map is the main part of this index, which maps each parameter name (as usually found in a PyTorch model `state_dict`) to the file it's stored in: +The `weight_map` key maps each parameter name (typically `state_dict` in a PyTorch model) to the shard it's stored in. ```py >>> index["weight_map"] -{'embeddings.LayerNorm.bias': 'pytorch_model-00001-of-00003.bin', - 'embeddings.LayerNorm.weight': 'pytorch_model-00001-of-00003.bin', +{'lm_head.weight': 'model-00006-of-00006.safetensors', + 'model.embed_tokens.weight': 'model-00001-of-00006.safetensors', + 'model.layers.0.input_layernorm.weight': 'model-00001-of-00006.safetensors', + 'model.layers.0.mlp.down_proj.weight': 'model-00001-of-00006.safetensors', ... +} ``` -If you want to directly load such a sharded checkpoint inside a model without using [`~PreTrainedModel.from_pretrained`] (like you would do `model.load_state_dict()` for a full checkpoint) you should use [`~modeling_utils.load_sharded_checkpoint`]: +## Accelerate's Big Model Inference + +> [!TIP] +> Make sure you have Accelerate v0.9.0 or later and PyTorch v1.9.0 or later installed. + +From Transformers v4.20.0, the [`~PreTrainedModel.from_pretrained`] method is supercharged with Accelerate's [Big Model Inference](https://hf.co/docs/accelerate/usage_guides/big_modeling) feature to efficiently handle really big models! Big Model Inference creates a *model skeleton* on PyTorch's [**meta**](https://pytorch.org/docs/main/meta.html) device. The randomly initialized parameters are only created when the pretrained weights are loaded. This way, you aren't keeping two copies of the model in memory at the same time (one for the randomly initialized model and one for the pretrained weights), and the maximum memory consumed is only the full model size. + +To enable Big Model Inference in Transformers, set `low_cpu_mem_usage=True` in the [`~PreTrainedModel.from_pretrained`] method. ```py ->>> from transformers.modeling_utils import load_sharded_checkpoint +from transformers import AutoModelForCausalLM ->>> with tempfile.TemporaryDirectory() as tmp_dir: -... model.save_pretrained(tmp_dir, max_shard_size="200MB") -... load_sharded_checkpoint(model, tmp_dir) +gemma = AutoModelForCausalLM.from_pretrained("google/gemma-7b", low_cpu_mem_usage=True) +``` + +Accelerate automatically dispatches the model weights across all available devices, starting with the fastest device (GPU) first and then offloading to the slower devices (CPU and even hard drive). This is enabled by setting `device_map="auto"` in the [`~PreTrainedModel.from_pretrained`] method. When you pass the `device_map` parameter, `low_cpu_mem_usage` is automatically set to `True` so you don't need to specify it. 
+ +```py +from transformers import AutoModelForCausalLM + +# these loading methods are equivalent +gemma = AutoModelForCausalLM.from_pretrained("google/gemma-7b", device_map="auto") +gemma = AutoModelForCausalLM.from_pretrained("google/gemma-7b", device_map="auto", low_cpu_mem_usage=True) ``` -## Low memory loading +You can also write your own `device_map` by mapping each layer to a device. It should map all model parameters to a device, but you don't have to detail where all the submodules of a layer go if the entire layer is on the same device. -Sharded checkpoints reduce the memory usage during step 2 of the workflow mentioned above, but in order to use that model in a low memory setting, we recommend leveraging our tools based on the Accelerate library. +```python +device_map = {"model.layers.1": 0, "model.layers.14": 1, "model.layers.31": "cpu", "lm_head": "disk"} +``` + +Access `hf_device_map` attribute to see how Accelerate split the model across devices. + +```py +gemma.hf_device_map +``` + +```python out +{'model.embed_tokens': 0, + 'model.layers.0': 0, + 'model.layers.1': 0, + 'model.layers.2': 0, + 'model.layers.3': 0, + 'model.layers.4': 0, + 'model.layers.5': 0, + 'model.layers.6': 0, + 'model.layers.7': 0, + 'model.layers.8': 0, + 'model.layers.9': 0, + 'model.layers.10': 0, + 'model.layers.11': 0, + 'model.layers.12': 0, + 'model.layers.13': 0, + 'model.layers.14': 'cpu', + 'model.layers.15': 'cpu', + 'model.layers.16': 'cpu', + 'model.layers.17': 'cpu', + 'model.layers.18': 'cpu', + 'model.layers.19': 'cpu', + 'model.layers.20': 'cpu', + 'model.layers.21': 'cpu', + 'model.layers.22': 'cpu', + 'model.layers.23': 'cpu', + 'model.layers.24': 'cpu', + 'model.layers.25': 'cpu', + 'model.layers.26': 'cpu', + 'model.layers.27': 'cpu', + 'model.layers.28': 'cpu', + 'model.layers.29': 'cpu', + 'model.layers.30': 'cpu', + 'model.layers.31': 'cpu', + 'model.norm': 'cpu', + 'lm_head': 'cpu'} +``` -Please read the following guide for more information: [Large model loading using Accelerate](./main_classes/model#large-model-loading) +## Model data type + +PyTorch model weights are normally instantiated as torch.float32 and it can be an issue if you try to load a model as a different data type. For example, you'd need twice as much memory to load the weights in torch.float32 and then again to load them in your desired data type, like torch.float16. + +> [!WARNING] +> Due to how PyTorch is designed, the `torch_dtype` parameter only supports floating data types. + +To avoid wasting memory like this, explicitly set the `torch_dtype` parameter to the desired data type or set `torch_dtype="auto"` to load the weights with the most optimal memory pattern (the data type is automatically derived from the model weights). + + + + +```py +from transformers import AutoModelForCausalLM + +gemma = AutoModelForCausalLM.from_pretrained("google/gemma-7b", torch_dtype=torch.float16) +``` + + + + +```py +from transformers import AutoModelForCausalLM + +gemma = AutoModelForCausalLM.from_pretrained("google/gemma-7b", torch_dtype="auto") +``` + + + + +You can also set the data type to use for models instantiated from scratch. 
+ +```python +import torch +from transformers import AutoConfig, AutoModel + +my_config = AutoConfig.from_pretrained("google/gemma-2b", torch_dtype=torch.float16) +model = AutoModel.from_config(my_config) +``` diff --git a/docs/source/en/chat_templating.md b/docs/source/en/chat_templating.md index a478c32e6ff393..17e11409238e21 100644 --- a/docs/source/en/chat_templating.md +++ b/docs/source/en/chat_templating.md @@ -14,7 +14,7 @@ rendered properly in your Markdown viewer. --> -# Templates for Chat Models +# Chat Templates ## Introduction @@ -121,13 +121,15 @@ Arr, 'twas easy after all! ## Is there an automated pipeline for chat? -Yes, there is: [`ConversationalPipeline`]. This pipeline is designed to make it easy to use chat models. Let's try -the `Zephyr` example again, but this time using the pipeline: +Yes, there is! Our text generation pipelines support chat inputs, which makes it easy to use chat models. In the past, +we used to use a dedicated "ConversationalPipeline" class, but this has now been deprecated and its functionality +has been merged into the [`TextGenerationPipeline`]. Let's try the `Zephyr` example again, but this time using +a pipeline: ```python from transformers import pipeline -pipe = pipeline("conversational", "HuggingFaceH4/zephyr-7b-beta") +pipe = pipeline("text-generation", "HuggingFaceH4/zephyr-7b-beta") messages = [ { "role": "system", @@ -135,17 +137,14 @@ messages = [ }, {"role": "user", "content": "How many helicopters can a human eat in one sitting?"}, ] -print(pipe(messages)) +print(pipe(messages, max_new_tokens=128)[0]['generated_text'][-1]) # Print the assistant's response ``` ```text -Conversation id: 76d886a0-74bd-454e-9804-0467041a63dc -system: You are a friendly chatbot who always responds in the style of a pirate -user: How many helicopters can a human eat in one sitting? -assistant: Matey, I'm afraid I must inform ye that humans cannot eat helicopters. Helicopters are not food, they are flying machines. Food is meant to be eaten, like a hearty plate o' grog, a savory bowl o' stew, or a delicious loaf o' bread. But helicopters, they be for transportin' and movin' around, not for eatin'. So, I'd say none, me hearties. None at all. +{'role': 'assistant', 'content': "Matey, I'm afraid I must inform ye that humans cannot eat helicopters. Helicopters are not food, they are flying machines. Food is meant to be eaten, like a hearty plate o' grog, a savory bowl o' stew, or a delicious loaf o' bread. But helicopters, they be for transportin' and movin' around, not for eatin'. So, I'd say none, me hearties. None at all."} ``` -[`ConversationalPipeline`] will take care of all the details of tokenization and calling `apply_chat_template` for you - +The pipeline will take care of all the details of tokenization and calling `apply_chat_template` for you - once the model has a chat template, all you need to do is initialize the pipeline and pass it the list of messages! ## What are "generation prompts"? @@ -191,7 +190,7 @@ Can I ask a question?<|im_end|> Note that this time, we've added the tokens that indicate the start of a bot response. This ensures that when the model generates text it will write a bot response instead of doing something unexpected, like continuing the user's message. Remember, chat models are still just language models - they're trained to continue text, and chat is just a -special kind of text to them! You need to guide them with the appropriate control tokens so they know what they're +special kind of text to them! 
You need to guide them with appropriate control tokens, so they know what they're supposed to be doing. Not all models require generation prompts. Some models, like BlenderBot and LLaMA, don't have any @@ -200,7 +199,8 @@ effect that `add_generation_prompt` has will depend on the template being used. ## Can I use chat templates in training? -Yes! We recommend that you apply the chat template as a preprocessing step for your dataset. After this, you +Yes! This is a good way to ensure that the chat template matches the tokens the model sees during training. +We recommend that you apply the chat template as a preprocessing step for your dataset. After this, you can simply continue like any other language model training task. When training, you should usually set `add_generation_prompt=False`, because the added tokens to prompt an assistant response will not be helpful during training. Let's see an example: @@ -234,6 +234,362 @@ The sun. From here, just continue training like you would with a standard language modelling task, using the `formatted_chat` column. + + +By default, some tokenizers add special tokens like `` and `` to text they tokenize. Chat templates should +already include all the special tokens they need, and so additional special tokens will often be incorrect or +duplicated, which will hurt model performance. + +Therefore, if you format text with `apply_chat_template(tokenize=False)`, you should set the argument +`add_special_tokens=False` when you tokenize that text later. If you use `apply_chat_template(tokenize=True)`, you don't need to worry about this! + + + +## Advanced: Extra inputs to chat templates + +The only argument that `apply_chat_template` requires is `messages`. However, you can pass any keyword +argument to `apply_chat_template` and it will be accessible inside the template. This gives you a lot of freedom to use +chat templates for many things. There are no restrictions on the names or the format of these arguments - you can pass +strings, lists, dicts or whatever else you want. + +That said, there are some common use-cases for these extra arguments, +such as passing tools for function calling, or documents for retrieval-augmented generation. In these common cases, +we have some opinionated recommendations about what the names and formats of these arguments should be, which are +described in the sections below. We encourage model authors to make their chat templates compatible with this format, +to make it easy to transfer tool-calling code between models. + +## Advanced: Tool use / function calling + +"Tool use" LLMs can choose to call functions as external tools before generating an answer. When passing tools +to a tool-use model, you can simply pass a list of functions to the `tools` argument: + +```python +import datetime + +def current_time(): + """Get the current local time as a string.""" + return str(datetime.now()) + +def multiply(a: float, b: float): + """ + A function that multiplies two numbers + + Args: + a: The first number to multiply + b: The second number to multiply + """ + return a * b + +tools = [current_time, multiply] + +model_input = tokenizer.apply_chat_template( + messages, + tools=tools +) +``` + +In order for this to work correctly, you should write your functions in the format above, so that they can be parsed +correctly as tools. 
Specifically, you should follow these rules: + +- The function should have a descriptive name +- Every argument must have a type hint +- The function must have a docstring in the standard Google style (in other words, an initial function description + followed by an `Args:` block that describes the arguments, unless the function does not have any arguments. +- Do not include types in the `Args:` block. In other words, write `a: The first number to multiply`, not + `a (int): The first number to multiply`. Type hints should go in the function header instead. +- The function can have a return type and a `Returns:` block in the docstring. However, these are optional + because most tool-use models ignore them. + +### Passing tool results to the model + +The sample code above is enough to list the available tools for your model, but what happens if it wants to actually use +one? If that happens, you should: + +1. Parse the model's output to get the tool name(s) and arguments. +2. Add the model's tool call(s) to the conversation. +3. Call the corresponding function(s) with those arguments. +4. Add the result(s) to the conversation + +### A complete tool use example + +Let's walk through a tool use example, step by step. For this example, we will use an 8B `Hermes-2-Pro` model, +as it is one of the highest-performing tool-use models in its size category at the time of writing. If you have the +memory, you can consider using a larger model instead like [Command-R](https://huggingface.co/CohereForAI/c4ai-command-r-v01) +or [Mixtral-8x22B](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1), both of which also support tool use +and offer even stronger performance. + +First, let's load our model and tokenizer: + +```python +import torch +from transformers import AutoModelForCausalLM, AutoTokenizer + +checkpoint = "NousResearch/Hermes-2-Pro-Llama-3-8B" + +tokenizer = AutoTokenizer.from_pretrained(checkpoint) +model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16, device_map="auto") +``` + +Next, let's define a list of tools: + +```python +def get_current_temperature(location: str, unit: str) -> float: + """ + Get the current temperature at a location. + + Args: + location: The location to get the temperature for, in the format "City, Country" + unit: The unit to return the temperature in. (choices: ["celsius", "fahrenheit"]) + Returns: + The current temperature at the specified location in the specified units, as a float. + """ + return 22. # A real function should probably actually get the temperature! + +def get_current_wind_speed(location: str) -> float: + """ + Get the current wind speed in km/h at a given location. + + Args: + location: The location to get the temperature for, in the format "City, Country" + Returns: + The current wind speed at the given location in km/h, as a float. + """ + return 6. # A real function should probably actually get the wind speed! + +tools = [get_current_temperature, get_current_wind_speed] +``` + +Now, let's set up a conversation for our bot: + +```python +messages = [ + {"role": "system", "content": "You are a bot that responds to weather queries. 
You should reply with the unit used in the queried location."}, + {"role": "user", "content": "Hey, what's the temperature in Paris right now?"} +] +``` + +Now, let's apply the chat template and generate a response: + +```python +inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt") +inputs = {k: v.to(model.device) for k, v in inputs.items()} +out = model.generate(**inputs, max_new_tokens=128) +print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):])) +``` + +And we get: + +```text + +{"arguments": {"location": "Paris, France", "unit": "celsius"}, "name": "get_current_temperature"} +<|im_end|> +``` + +The model has called the function with valid arguments, in the format requested by the function docstring. It has +inferred that we're most likely referring to the Paris in France, and it remembered that, as the home of SI units, +the temperature in France should certainly be displayed in Celsius. + + + +The output format above is specific to the `Hermes-2-Pro` model we're using in this example. Other models may emit different +tool call formats, and you may need to do some manual parsing at this step. For example, `Llama-3.1` models will emit +slightly different JSON, with `parameters` instead of `arguments`. Regardless of the format the model outputs, you +should add the tool call to the conversation in the format below, with `tool_calls`, `function` and `arguments` keys. + + + +Next, let's append the model's tool call to the conversation. + +```python +tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France", "unit": "celsius"}} +messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]}) +``` + + +Now that we've added the tool call to the conversation, we can call the function and append the result to the +conversation. Since we're just using a dummy function for this example that always returns 22.0, we can just append +that result directly. + +```python +messages.append({"role": "tool", "name": "get_current_temperature", "content": "22.0"}) +``` + + + +Some model architectures, notably Mistral/Mixtral, also require a `tool_call_id` here, which should be +9 randomly-generated alphanumeric characters, and assigned to the `id` key of the tool call +dictionary. The same key should also be assigned to the `tool_call_id` key of the tool response dictionary below, so +that tool calls can be matched to tool responses. 
So, for Mistral/Mixtral models, the code above would be: + +```python +tool_call_id = "9Ae3bDc2F" # Random ID, 9 alphanumeric characters +tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France", "unit": "celsius"}} +messages.append({"role": "assistant", "tool_calls": [{"type": "function", "id": tool_call_id, "function": tool_call}]}) +``` + +and + +```python +messages.append({"role": "tool", "tool_call_id": tool_call_id, "name": "get_current_temperature", "content": "22.0"}) +``` + + + +Finally, let's let the assistant read the function outputs and continue chatting with the user: + +```python +inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt") +inputs = {k: v.to(model.device) for k, v in inputs.items()} +out = model.generate(**inputs, max_new_tokens=128) +print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):])) +``` + +And we get: + +```text +The current temperature in Paris, France is 22.0 ยฐ Celsius.<|im_end|> +``` + +Although this was a simple demo with dummy tools and a single call, the same technique works with +multiple real tools and longer conversations. This can be a powerful way to extend the capabilities of conversational +agents with real-time information, computational tools like calculators, or access to large databases. + +### Understanding tool schemas + +Each function you pass to the `tools` argument of `apply_chat_template` is converted into a +[JSON schema](https://json-schema.org/learn/getting-started-step-by-step). These schemas +are then passed to the model chat template. In other words, tool-use models do not see your functions directly, and they +never see the actual code inside them. What they care about is the function **definitions** and the **arguments** they +need to pass to them - they care about what the tools do and how to use them, not how they work! It is up to you +to read their outputs, detect if they have requested to use a tool, pass their arguments to the tool function, and +return the response in the chat. + +Generating JSON schemas to pass to the template should be automatic and invisible as long as your functions +follow the specification above, but if you encounter problems, or you simply want more control over the conversion, +you can handle the conversion manually. Here is an example of a manual schema conversion. + +```python +from transformers.utils import get_json_schema + +def multiply(a: float, b: float): + """ + A function that multiplies two numbers + + Args: + a: The first number to multiply + b: The second number to multiply + """ + return a * b + +schema = get_json_schema(multiply) +print(schema) +``` + +This will yield: + +```json +{ + "type": "function", + "function": { + "name": "multiply", + "description": "A function that multiplies two numbers", + "parameters": { + "type": "object", + "properties": { + "a": { + "type": "number", + "description": "The first number to multiply" + }, + "b": { + "type": "number", + "description": "The second number to multiply" + } + }, + "required": ["a", "b"] + } + } +} +``` + +If you wish, you can edit these schemas, or even write them from scratch yourself without using `get_json_schema` at +all. JSON schemas can be passed directly to the `tools` argument of +`apply_chat_template` - this gives you a lot of power to define precise schemas for more complex functions. 
Be careful, +though - the more complex your schemas, the more likely the model is to get confused when dealing with them! We +recommend simple function signatures where possible, keeping arguments (and especially complex, nested arguments) +to a minimum. + +Here is an example of defining schemas by hand, and passing them directly to `apply_chat_template`: + +```python +# A simple function that takes no arguments +current_time = { + "type": "function", + "function": { + "name": "current_time", + "description": "Get the current local time as a string.", + "parameters": { + 'type': 'object', + 'properties': {} + } + } +} + +# A more complete function that takes two numerical arguments +multiply = { + 'type': 'function', + 'function': { + 'name': 'multiply', + 'description': 'A function that multiplies two numbers', + 'parameters': { + 'type': 'object', + 'properties': { + 'a': { + 'type': 'number', + 'description': 'The first number to multiply' + }, + 'b': { + 'type': 'number', 'description': 'The second number to multiply' + } + }, + 'required': ['a', 'b'] + } + } +} + +model_input = tokenizer.apply_chat_template( + messages, + tools = [current_time, multiply] +) +``` + +## Advanced: Retrieval-augmented generation + +"Retrieval-augmented generation" or "RAG" LLMs can search a corpus of documents for information before responding +to a query. This allows models to vastly expand their knowledge base beyond their limited context size. Our +recommendation for RAG models is that their template +should accept a `documents` argument. This should be a list of documents, where each "document" +is a single dict with `title` and `contents` keys, both of which are strings. Because this format is much simpler +than the JSON schemas used for tools, no helper functions are necessary. + +Here's an example of a RAG template in action: + +```python +document1 = { + "title": "The Moon: Our Age-Old Foe", + "contents": "Man has always dreamed of destroying the moon. In this essay, I shall..." +} + +document2 = { + "title": "The Sun: Our Age-Old Friend", + "contents": "Although often underappreciated, the sun provides several notable benefits..." +} + +model_input = tokenizer.apply_chat_template( + messages, + documents=[document1, document2] +) +``` + ## Advanced: How do chat templates work? The chat template for a model is stored on the `tokenizer.chat_template` attribute. If no chat template is set, the @@ -244,27 +600,25 @@ default template for that model class is used instead. Let's take a look at the >>> from transformers import AutoTokenizer >>> tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill") ->>> tokenizer.default_chat_template +>>> tokenizer.chat_template "{% for message in messages %}{% if message['role'] == 'user' %}{{ ' ' }}{% endif %}{{ message['content'] }}{% if not loop.last %}{{ ' ' }}{% endif %}{% endfor %}{{ eos_token }}" ``` -That's kind of intimidating. Let's add some newlines and indentation to make it more readable. Note that the first -newline after each block as well as any preceding whitespace before a block are ignored by default, using the -Jinja `trim_blocks` and `lstrip_blocks` flags. However, be cautious - although leading whitespace on each -line is stripped, spaces between blocks on the same line are not. We strongly recommend checking that your template -isn't printing extra spaces where it shouldn't be! +That's kind of intimidating. Let's clean it up a little to make it more readable. 
In the process, though, we also make +sure that the newlines and indentation we add don't end up being included in the template output - see the tip on +[trimming whitespace](#trimming-whitespace) below! ``` -{% for message in messages %} - {% if message['role'] == 'user' %} - {{ ' ' }} - {% endif %} - {{ message['content'] }} - {% if not loop.last %} - {{ ' ' }} - {% endif %} -{% endfor %} -{{ eos_token }} +{%- for message in messages %} + {%- if message['role'] == 'user' %} + {{- ' ' }} + {%- endif %} + {{- message['content'] }} + {%- if not loop.last %} + {{- ' ' }} + {%- endif %} +{%- endfor %} +{{- eos_token }} ``` If you've never seen one of these before, this is a [Jinja template](https://jinja.palletsprojects.com/en/3.1.x/templates/). @@ -293,15 +647,15 @@ similarly to the way LLaMA formats them (note that the real LLaMA template inclu messages and slightly different system message handling in general - don't use this one in your actual code!) ``` -{% for message in messages %} - {% if message['role'] == 'user' %} - {{ bos_token + '[INST] ' + message['content'] + ' [/INST]' }} - {% elif message['role'] == 'system' %} - {{ '<>\\n' + message['content'] + '\\n<>\\n\\n' }} - {% elif message['role'] == 'assistant' %} - {{ ' ' + message['content'] + ' ' + eos_token }} - {% endif %} -{% endfor %} +{%- for message in messages %} + {%- if message['role'] == 'user' %} + {{- bos_token + '[INST] ' + message['content'] + ' [/INST]' }} + {%- elif message['role'] == 'system' %} + {{- '<>\\n' + message['content'] + '\\n<>\\n\\n' }} + {%- elif message['role'] == 'assistant' %} + {{- ' ' + message['content'] + ' ' + eos_token }} + {%- endif %} +{%- endfor %} ``` Hopefully if you stare at this for a little bit you can see what this template is doing - it adds specific tokens based @@ -317,15 +671,15 @@ existing template from another model and simply edit it for your needs! For exam above and add "[ASST]" and "[/ASST]" to assistant messages: ``` -{% for message in messages %} - {% if message['role'] == 'user' %} - {{ bos_token + '[INST] ' + message['content'].strip() + ' [/INST]' }} - {% elif message['role'] == 'system' %} - {{ '<>\\n' + message['content'].strip() + '\\n<>\\n\\n' }} - {% elif message['role'] == 'assistant' %} - {{ '[ASST] ' + message['content'] + ' [/ASST]' + eos_token }} - {% endif %} -{% endfor %} +{%- for message in messages %} + {%- if message['role'] == 'user' %} + {{- bos_token + '[INST] ' + message['content'].strip() + ' [/INST]' }} + {%- elif message['role'] == 'system' %} + {{- '<>\\n' + message['content'].strip() + '\\n<>\\n\\n' }} + {%- elif message['role'] == 'assistant' %} + {{- '[ASST] ' + message['content'] + ' [/ASST]' + eos_token }} + {%- endif %} +{%- endfor %} ``` Now, simply set the `tokenizer.chat_template` attribute. Next time you use [`~PreTrainedTokenizer.apply_chat_template`], it will @@ -340,21 +694,35 @@ tokenizer.chat_template = template # Set the new template tokenizer.push_to_hub("model_name") # Upload your new template to the Hub! ``` -The method [`~PreTrainedTokenizer.apply_chat_template`] which uses your chat template is called by the [`ConversationalPipeline`] class, so -once you set the correct chat template, your model will automatically become compatible with [`ConversationalPipeline`]. +The method [`~PreTrainedTokenizer.apply_chat_template`] which uses your chat template is called by the [`TextGenerationPipeline`] class, so +once you set the correct chat template, your model will automatically become compatible with [`TextGenerationPipeline`]. 
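+Once the template is set, a quick sanity check is to render a short conversation and confirm the control tokens land where you expect. A minimal sketch, reusing the `tokenizer` whose template was edited above:
+
+```python
+messages = [
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user", "content": "Hi there!"},
+]
+
+# Render to a string instead of token IDs so the formatting is easy to inspect
+print(tokenizer.apply_chat_template(messages, tokenize=False))
+```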
+ + +If you're fine-tuning a model for chat, in addition to setting a chat template, you should probably add any new chat +control tokens as special tokens in the tokenizer. Special tokens are never split, +ensuring that your control tokens are always handled as single tokens rather than being tokenized in pieces. You +should also set the tokenizer's `eos_token` attribute to the token that marks the end of assistant generations in your +template. This will ensure that text generation tools can correctly figure out when to stop generating text. + + -### What are "default" templates? +### Why do some models have multiple templates? -Before the introduction of chat templates, chat handling was hardcoded at the model class level. For backwards -compatibility, we have retained this class-specific handling as default templates, also set at the class level. If a -model does not have a chat template set, but there is a default template for its model class, the `ConversationalPipeline` -class and methods like `apply_chat_template` will use the class template instead. You can find out what the default -template for your tokenizer is by checking the `tokenizer.default_chat_template` attribute. +Some models use different templates for different use cases. For example, they might use one template for normal chat +and another for tool-use, or retrieval-augmented generation. In these cases, `tokenizer.chat_template` is a dictionary. +This can cause some confusion, and where possible, we recommend using a single template for all use-cases. You can use +Jinja statements like `if tools is defined` and `{% macro %}` definitions to easily wrap multiple code paths in a +single template. -This is something we do purely for backward compatibility reasons, to avoid breaking any existing workflows. Even when -the class template is appropriate for your model, we strongly recommend overriding the default template by -setting the `chat_template` attribute explicitly to make it clear to users that your model has been correctly configured -for chat, and to future-proof in case the default templates are ever altered or deprecated. +When a tokenizer has multiple templates, `tokenizer.chat_template` will be a `dict`, where each key is the name +of a template. The `apply_chat_template` method has special handling for certain template names: Specifically, it will +look for a template named `default` in most cases, and will raise an error if it can't find one. However, if a template +named `tool_use` exists when the user has passed a `tools` argument, it will use that instead. To access templates +with other names, pass the name of the template you want to the `chat_template` argument of +`apply_chat_template()`. + +We find that this can be a bit confusing for users, though - so if you're writing a template yourself, we recommend +trying to put it all in a single template where possible! ### What template should I use? @@ -366,13 +734,13 @@ best performance for inference or fine-tuning when you precisely match the token If you're training a model from scratch, or fine-tuning a base language model for chat, on the other hand, you have a lot of freedom to choose an appropriate template! LLMs are smart enough to learn to handle lots of different -input formats. Our default template for models that don't have a class-specific template follows the -[ChatML format](https://github.com/openai/openai-python/blob/main/chatml.md), and this is a good, flexible choice for many use-cases. It looks like this: +input formats. 
One popular choice is the `ChatML` format, and this is a good, flexible choice for many use-cases. +It looks like this: ``` -{% for message in messages %} - {{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}} -{% endfor %} +{%- for message in messages %} + {{- '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }} +{%- endfor %} ``` If you like this one, here it is in one-liner form, ready to copy into your code. The one-liner also includes @@ -381,7 +749,7 @@ If your model expects those, they won't be added automatically by `apply_chat_te text will be tokenized with `add_special_tokens=False`. This is to avoid potential conflicts between the template and the `add_special_tokens` logic. If your model expects special tokens, make sure to add them to the template! -``` +```python tokenizer.chat_template = "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}" ``` @@ -398,7 +766,7 @@ I'm doing great!<|im_end|> ``` The "user", "system" and "assistant" roles are the standard for chat, and we recommend using them when it makes sense, -particularly if you want your model to operate well with [`ConversationalPipeline`]. However, you are not limited +particularly if you want your model to operate well with [`TextGenerationPipeline`]. However, you are not limited to these roles - templating is extremely flexible, and any string can be a role. ### I want to add some chat templates! How should I get started? @@ -409,7 +777,7 @@ not the model owner - if you're using a model with an empty chat template, or on template, please open a [pull request](https://huggingface.co/docs/hub/repositories-pull-requests-discussions) to the model repository so that this attribute can be set properly! Once the attribute is set, that's it, you're done! `tokenizer.apply_chat_template` will now work correctly for that -model, which means it is also automatically supported in places like `ConversationalPipeline`! +model, which means it is also automatically supported in places like `TextGenerationPipeline`! By ensuring that models have this attribute, we can make sure that the whole community gets to use the full power of open-source models. Formatting mismatches have been haunting the field and silently harming performance for too long - @@ -420,23 +788,45 @@ it's time to put an end to them! If you're unfamiliar with Jinja, we generally find that the easiest way to write a chat template is to first write a short Python script that formats messages the way you want, and then convert that script into a template. -Remember that the template handler will receive the conversation history as a variable called `messages`. Each -message is a dictionary with two keys, `role` and `content`. You will be able to access `messages` in your template -just like you can in Python, which means you can loop over it with `{% for message in messages %}` or access -individual messages with, for example, `{{ messages[0] }}`. +Remember that the template handler will receive the conversation history as a variable called `messages`. 
+You will be able to access `messages` in your template just like you can in Python, which means you can loop over +it with `{% for message in messages %}` or access individual messages with `{{ messages[0] }}`, for example. You can also use the following tips to convert your code to Jinja: -### For loops +### Trimming whitespace -For loops in Jinja look like this: +By default, Jinja will print any whitespace that comes before or after a block. This can be a problem for chat +templates, which generally want to be very precise with whitespace! To avoid this, we strongly recommend writing +your templates like this: + +``` +{%- for message in messages %} + {{- message['role'] + message['content'] }} +{%- endfor %} +``` + +rather than like this: ``` {% for message in messages %} -{{ message['content'] }} + {{ message['role'] + message['content'] }} {% endfor %} ``` +Adding `-` will strip any whitespace that comes before the block. The second example looks innocent, but the newline +and indentation may end up being included in the output, which is probably not what you want! + +### For loops + +For loops in Jinja look like this: + +``` +{%- for message in messages %} + {{- message['content'] }} +{%- endfor %} +``` + Note that whatever's inside the {{ expression block }} will be printed to the output. You can use operators like `+` to combine strings inside expression blocks. @@ -445,9 +835,9 @@ Note that whatever's inside the {{ expression block }} will be printed to the ou If statements in Jinja look like this: ``` -{% if message['role'] == 'user' %} -{{ message['content'] }} -{% endif %} +{%- if message['role'] == 'user' %} + {{- message['content'] }} +{%- endif %} ``` Note how where Python uses whitespace to mark the beginnings and ends of `for` and `if` blocks, Jinja requires you @@ -463,14 +853,47 @@ conversation. Here's an example that puts these ideas together to add a generati conversation if add_generation_prompt is `True`: ``` -{% if loop.last and add_generation_prompt %} -{{ bos_token + 'Assistant:\n' }} -{% endif %} +{%- if loop.last and add_generation_prompt %} + {{- bos_token + 'Assistant:\n' }} +{%- endif %} ``` -### Notes on whitespace +### Compatibility with non-Python Jinja + +There are multiple implementations of Jinja in various languages. They generally have the same syntax, +but a key difference is that when you're writing a template in Python you can use Python methods, such as +`.lower()` on strings or `.items()` on dicts. This will break if someone tries to use your template on a non-Python +implementation of Jinja. Non-Python implementations are particularly common in deployment environments, where JS +and Rust are very popular. + +Don't panic, though! There are a few easy changes you can make to your templates to ensure they're compatible across +all implementations of Jinja: + +- Replace Python methods with Jinja filters. These usually have the same name, for example `string.lower()` becomes + `string|lower`, and `dict.items()` becomes `dict|items`. One notable change is that `string.strip()` becomes `string|trim`. + See the [list of built-in filters](https://jinja.palletsprojects.com/en/3.1.x/templates/#builtin-filters) + in the Jinja documentation for more. +- Replace `True`, `False` and `None`, which are Python-specific, with `true`, `false` and `none`. +- Directly rendering a dict or list may give different results in other implementations (for example, string entries + might change from single-quoted to double-quoted). 
Adding the `tojson` filter can help to ensure consistency here. + +### Writing and debugging larger templates + +When this feature was introduced, most templates were quite small, the Jinja equivalent of a "one-liner" script. +However, with new models and features like tool-use and RAG, some templates can be 100 lines long or more. When +writing templates like these, it's a good idea to write them in a separate file, using a text editor. You can easily +extract a chat template to a file: + +```python +open("template.jinja", "w").write(tokenizer.chat_template) +``` + +Or load the edited template back into the tokenizer: + +```python +tokenizer.chat_template = open("template.jinja").read() +``` -As much as possible, we've tried to get Jinja to ignore whitespace outside of {{ expressions }}. However, be aware -that Jinja is a general-purpose templating engine, and it may treat whitespace between blocks on the same line -as significant and print it to the output. We **strongly** recommend checking that your template isn't printing extra -spaces where it shouldn't be before you upload it! \ No newline at end of file +As an added bonus, when you write a long, multi-line template in a separate file, line numbers in that file will +exactly correspond to line numbers in template parsing or execution errors. This will make it much easier to +identify the source of issues. \ No newline at end of file diff --git a/docs/source/en/community.md b/docs/source/en/community.md index 0305844a1be8c5..7890cb22ca5882 100644 --- a/docs/source/en/community.md +++ b/docs/source/en/community.md @@ -10,14 +10,14 @@ This page regroups resources around ๐Ÿค— Transformers developed by the community | Resource | Description | Author | |:----------|:-------------|------:| -| [Hugging Face Transformers Glossary Flashcards](https://www.darigovresearch.com/huggingface-transformers-glossary-flashcards) | A set of flashcards based on the [Transformers Docs Glossary](glossary) that has been put into a form which can be easily learned/revised using [Anki ](https://apps.ankiweb.net/) an open source, cross platform app specifically designed for long term knowledge retention. See this [Introductory video on how to use the flashcards](https://www.youtube.com/watch?v=Dji_h7PILrw). | [Darigov Research](https://www.darigovresearch.com/) | +| [Hugging Face Transformers Glossary Flashcards](https://www.darigovresearch.com/huggingface-transformers-glossary-flashcards) | A set of flashcards based on the [Transformers Docs Glossary](glossary) that has been put into a form which can be easily learned/revised using [Anki](https://apps.ankiweb.net/) an open source, cross platform app specifically designed for long term knowledge retention. See this [Introductory video on how to use the flashcards](https://www.youtube.com/watch?v=Dji_h7PILrw). 
| [Darigov Research](https://www.darigovresearch.com/) | ## Community notebooks: | Notebook | Description | Author | | |:----------|:-------------|:-------------|------:| | [Fine-tune a pre-trained Transformer to generate lyrics](https://github.com/AlekseyKorshuk/huggingartists) | How to generate lyrics in the style of your favorite artist by fine-tuning a GPT-2 model | [Aleksey Korshuk](https://github.com/AlekseyKorshuk) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AlekseyKorshuk/huggingartists/blob/master/huggingartists-demo.ipynb) | -| [Train T5 in Tensorflow 2 ](https://github.com/snapthat/TF-T5-text-to-text) | How to train T5 for any task using Tensorflow 2. This notebook demonstrates a Question & Answer task implemented in Tensorflow 2 using SQUAD | [Muhammad Harris](https://github.com/HarrisDePerceptron) |[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snapthat/TF-T5-text-to-text/blob/master/snapthatT5/notebooks/TF-T5-Datasets%20Training.ipynb) | +| [Train T5 in Tensorflow 2](https://github.com/snapthat/TF-T5-text-to-text) | How to train T5 for any task using Tensorflow 2. This notebook demonstrates a Question & Answer task implemented in Tensorflow 2 using SQUAD | [Muhammad Harris](https://github.com/HarrisDePerceptron) |[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snapthat/TF-T5-text-to-text/blob/master/snapthatT5/notebooks/TF-T5-Datasets%20Training.ipynb) | | [Train T5 on TPU](https://github.com/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb) | How to train T5 on SQUAD with Transformers and Nlp | [Suraj Patil](https://github.com/patil-suraj) |[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb#scrollTo=QLGiFCDqvuil) | | [Fine-tune T5 for Classification and Multiple Choice](https://github.com/patil-suraj/exploring-T5/blob/master/t5_fine_tuning.ipynb) | How to fine-tune T5 for classification and multiple choice tasks using a text-to-text format with PyTorch Lightning | [Suraj Patil](https://github.com/patil-suraj) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patil-suraj/exploring-T5/blob/master/t5_fine_tuning.ipynb) | | [Fine-tune DialoGPT on New Datasets and Languages](https://github.com/ncoop57/i-am-a-nerd/blob/master/_notebooks/2020-05-12-chatbot-part-1.ipynb) | How to fine-tune the DialoGPT model on a new dataset for open-dialog conversational chatbots | [Nathan Cooper](https://github.com/ncoop57) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ncoop57/i-am-a-nerd/blob/master/_notebooks/2020-05-12-chatbot-part-1.ipynb) | @@ -43,8 +43,8 @@ This page regroups resources around ๐Ÿค— Transformers developed by the community |[Fine-tune Roberta for sentiment analysis](https://github.com/DhavalTaunk08/NLP_scripts/blob/master/sentiment_analysis_using_roberta.ipynb) | How to fine-tune a Roberta model for sentiment analysis | [Dhaval Taunk](https://github.com/DhavalTaunk08) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DhavalTaunk08/NLP_scripts/blob/master/sentiment_analysis_using_roberta.ipynb)| |[Evaluating Question Generation 
Models](https://github.com/flexudy-pipe/qugeev) | How accurate are the answers to questions generated by your seq2seq transformer model? | [Pascal Zoleko](https://github.com/zolekode) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1bpsSqCQU-iw_5nNoRm_crPq6FRuJthq_?usp=sharing)| |[Classify text with DistilBERT and Tensorflow](https://github.com/peterbayerle/huggingface_notebook/blob/main/distilbert_tf.ipynb) | How to fine-tune DistilBERT for text classification in TensorFlow | [Peter Bayerle](https://github.com/peterbayerle) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/peterbayerle/huggingface_notebook/blob/main/distilbert_tf.ipynb)| -|[Leverage BERT for Encoder-Decoder Summarization on CNN/Dailymail](https://github.com/patrickvonplaten/notebooks/blob/master/BERT2BERT_for_CNN_Dailymail.ipynb) | How to warm-start a *EncoderDecoderModel* with a *bert-base-uncased* checkpoint for summarization on CNN/Dailymail | [Patrick von Platen](https://github.com/patrickvonplaten) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/BERT2BERT_for_CNN_Dailymail.ipynb)| -|[Leverage RoBERTa for Encoder-Decoder Summarization on BBC XSum](https://github.com/patrickvonplaten/notebooks/blob/master/RoBERTaShared_for_BBC_XSum.ipynb) | How to warm-start a shared *EncoderDecoderModel* with a *roberta-base* checkpoint for summarization on BBC/XSum | [Patrick von Platen](https://github.com/patrickvonplaten) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/RoBERTaShared_for_BBC_XSum.ipynb)| +|[Leverage BERT for Encoder-Decoder Summarization on CNN/Dailymail](https://github.com/patrickvonplaten/notebooks/blob/master/BERT2BERT_for_CNN_Dailymail.ipynb) | How to warm-start a *EncoderDecoderModel* with a *google-bert/bert-base-uncased* checkpoint for summarization on CNN/Dailymail | [Patrick von Platen](https://github.com/patrickvonplaten) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/BERT2BERT_for_CNN_Dailymail.ipynb)| +|[Leverage RoBERTa for Encoder-Decoder Summarization on BBC XSum](https://github.com/patrickvonplaten/notebooks/blob/master/RoBERTaShared_for_BBC_XSum.ipynb) | How to warm-start a shared *EncoderDecoderModel* with a *FacebookAI/roberta-base* checkpoint for summarization on BBC/XSum | [Patrick von Platen](https://github.com/patrickvonplaten) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/RoBERTaShared_for_BBC_XSum.ipynb)| |[Fine-tune TAPAS on Sequential Question Answering (SQA)](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/TAPAS/Fine_tuning_TapasForQuestionAnswering_on_SQA.ipynb) | How to fine-tune *TapasForQuestionAnswering* with a *tapas-base* checkpoint on the Sequential Question Answering (SQA) dataset | [Niels Rogge](https://github.com/nielsrogge) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/TAPAS/Fine_tuning_TapasForQuestionAnswering_on_SQA.ipynb)| |[Evaluate TAPAS on Table Fact Checking 
(TabFact)](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/TAPAS/Evaluating_TAPAS_on_the_Tabfact_test_set.ipynb) | How to evaluate a fine-tuned *TapasForSequenceClassification* with a *tapas-base-finetuned-tabfact* checkpoint using a combination of the ๐Ÿค— datasets and ๐Ÿค— transformers libraries | [Niels Rogge](https://github.com/nielsrogge) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/TAPAS/Evaluating_TAPAS_on_the_Tabfact_test_set.ipynb)| |[Fine-tuning mBART for translation](https://colab.research.google.com/github/vasudevgupta7/huggingface-tutorials/blob/main/translation_training.ipynb) | How to fine-tune mBART using Seq2SeqTrainer for Hindi to English translation | [Vasudev Gupta](https://github.com/vasudevgupta7) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vasudevgupta7/huggingface-tutorials/blob/main/translation_training.ipynb)| diff --git a/docs/source/en/conversations.md b/docs/source/en/conversations.md new file mode 100644 index 00000000000000..a48c046b4949d7 --- /dev/null +++ b/docs/source/en/conversations.md @@ -0,0 +1,290 @@ + + +# Chatting with Transformers + +If you're reading this article, you're almost certainly aware of **chat models**. Chat models are conversational +AIs that you can send and receive messages with. The most famous of these is the proprietary ChatGPT, but there are +now many open-source chat models which match or even substantially exceed its performance. These models are free to +download and run on a local machine. Although the largest and most capable models require high-powered hardware +and lots of memory to run, there are smaller models that will run perfectly well on a single consumer GPU, or even +an ordinary desktop or notebook CPU. + +This guide will help you get started with chat models. We'll start with a brief quickstart guide that uses a convenient, +high-level "pipeline". This is all you need if you just want to start running a chat model +immediately. After the quickstart, we'll move on to more detailed information about +what exactly chat models are, how to choose an appropriate one, and a low-level breakdown of each of the +steps involved in talking to a chat model. We'll also give some tips on optimizing the performance and memory usage +of your chat models. + + +## Quickstart + +If you have no time for details, here's the brief summary: Chat models continue chats. This means that you pass them +a conversation history, which can be as short as a single user message, and the model will continue the conversation +by adding its response. Let's see this in action. First, let's build a chat: + +```python +chat = [ + {"role": "system", "content": "You are a sassy, wise-cracking robot as imagined by Hollywood circa 1986."}, + {"role": "user", "content": "Hey, can you tell me any fun things to do in New York?"} +] +``` + +Notice that in addition to the user's message, we added a **system** message at the start of the conversation. Not all +chat models support system messages, but when they do, they represent high-level directives about how the model +should behave in the conversation. You can use this to guide the model - whether you want short or long responses, +lighthearted or serious ones, and so on. 
If you want the model to do useful work instead of +practicing its improv routine, you can either omit the system message or try a terse one such as "You are a helpful and intelligent +AI assistant who responds to user queries." + +Once you have a chat, the quickest way to continue it is using the [`TextGenerationPipeline`]. +Let's see this in action with `LLaMA-3`. Note that `LLaMA-3` is a gated model, which means you will need to +[apply for access](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) and log in with your Hugging Face +account to use it. We'll also use `device_map="auto"`, which will load the model on GPU if there's enough memory +for it, and set the dtype to `torch.bfloat16` to save memory: + +```python +import torch +from transformers import pipeline + +pipe = pipeline("text-generation", "meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype=torch.bfloat16, device_map="auto") +response = pipe(chat, max_new_tokens=512) +print(response[0]['generated_text'][-1]['content']) +``` + +And you'll get: + +```text +(sigh) Oh boy, you're asking me for advice? You're gonna need a map, pal! Alright, +alright, I'll give you the lowdown. But don't say I didn't warn you, I'm a robot, not a tour guide! + +So, you wanna know what's fun to do in the Big Apple? Well, let me tell you, there's a million +things to do, but I'll give you the highlights. First off, you gotta see the sights: the Statue of +Liberty, Central Park, Times Square... you know, the usual tourist traps. But if you're lookin' for +something a little more... unusual, I'd recommend checkin' out the Museum of Modern Art. It's got +some wild stuff, like that Warhol guy's soup cans and all that jazz. + +And if you're feelin' adventurous, take a walk across the Brooklyn Bridge. Just watch out for +those pesky pigeons, they're like little feathered thieves! (laughs) Get it? Thieves? Ah, never mind. + +Now, if you're lookin' for some serious fun, hit up the comedy clubs in Greenwich Village. You might +even catch a glimpse of some up-and-coming comedians... or a bunch of wannabes tryin' to make it big. (winks) + +And finally, if you're feelin' like a real New Yorker, grab a slice of pizza from one of the many amazing +pizzerias around the city. Just don't try to order a "robot-sized" slice, trust me, it won't end well. (laughs) + +So, there you have it, pal! That's my expert advice on what to do in New York. Now, if you'll +excuse me, I've got some oil changes to attend to. (winks) +``` + +You can continue the chat by appending your own response to it. The +`response` object returned by the pipeline actually contains the entire chat so far, so we can simply append +a message and pass it back: + +```python +chat = response[0]['generated_text'] +chat.append( + {"role": "user", "content": "Wait, what's so wild about soup cans?"} +) +response = pipe(chat, max_new_tokens=512) +print(response[0]['generated_text'][-1]['content']) +``` + +And you'll get: + +```text +(laughs) Oh, you're killin' me, pal! You don't get it, do you? Warhol's soup cans are like, art, man! +It's like, he took something totally mundane, like a can of soup, and turned it into a masterpiece. It's +like, "Hey, look at me, I'm a can of soup, but I'm also a work of art!" +(sarcastically) Oh, yeah, real original, Andy. + +But, you know, back in the '60s, it was like, a big deal. People were all about challenging the +status quo, and Warhol was like, the king of that. He took the ordinary and made it extraordinary. 
+And, let me tell you, it was like, a real game-changer. I mean, who would've thought that a can of soup could be art? (laughs) + +But, hey, you're not alone, pal. I mean, I'm a robot, and even I don't get it. (winks) +But, hey, that's what makes art, art, right? (laughs) +``` + +The remainder of this tutorial will cover specific topics such +as performance and memory, or how to select a chat model for your needs. + +## Choosing a chat model + +There are an enormous number of different chat models available on the [Hugging Face Hub](https://huggingface.co/models?pipeline_tag=text-generation&sort=trending), +and new users often feel very overwhelmed by the selection offered. Don't be, though! You really need to just focus on +two important considerations: +- The model's size, which will determine if you can fit it in memory and how quickly it will +run. +- The quality of the model's chat output. + +In general, these are correlated - bigger models tend to be +more capable, but even so there's a lot of variation at a given size point! + +### Size and model naming +The size of a model is easy to spot - it's the number in the model name, like "8B" or "70B". This is the number of +**parameters** in the model. Without quantization, you should expect to need about 2 bytes of memory per parameter. +This means that an "8B" model with 8 billion parameters will need about 16GB of memory just to fit the parameters, +plus a little extra for other overhead. It's a good fit for a high-end consumer GPU with 24GB of memory, such as a 3090 +or 4090. + +Some chat models are "Mixture of Experts" models. These may list their sizes in different ways, such as "8x7B" or +"141B-A35B". The numbers are a little fuzzier here, but in general you can read this as saying that the model +has approximately 56 (8x7) billion parameters in the first case, or 141 billion parameters in the second case. + +Note that it is very common to use quantization techniques to reduce the memory usage per parameter to 8 bits, 4 bits, +or even less. This topic is discussed in more detail in the [Memory considerations](#memory-considerations) section below. + +### But which chat model is best? +Even once you know the size of chat model you can run, there's still a lot of choice out there. One way to sift through +it all is to consult **leaderboards**. Two of the most popular leaderboards are the [OpenLLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) +and the [LMSys Chatbot Arena Leaderboard](https://chat.lmsys.org/?leaderboard). Note that the LMSys leaderboard +also includes proprietary models - look at the `licence` column to identify open-source ones that you can download, then +search for them on the [Hugging Face Hub](https://huggingface.co/models?pipeline_tag=text-generation&sort=trending). + +### Specialist domains +Some models may be specialized for certain domains, such as medical or legal text, or non-English languages. +If you're working in these domains, you may find that a specialized model will give you big performance benefits. +Don't automatically assume that, though! Particularly when specialized models are smaller or older than the current +cutting-edge, a top-end general-purpose model may still outclass them. Thankfully, we are beginning to see +[domain-specific leaderboards](https://huggingface.co/blog/leaderboard-medicalllm) that should make it easier to locate +the best models for specialized domains. + +## What happens inside the pipeline? 
+ +The quickstart above used a high-level pipeline to chat with a chat model, which is convenient, but not the +most flexible. Let's take a more low-level approach, to see each of the steps involved in chat. Let's start with +a code sample, and then break it down: + +```python +from transformers import AutoModelForCausalLM, AutoTokenizer +import torch + +# Prepare the input as before +chat = [ + {"role": "system", "content": "You are a sassy, wise-cracking robot as imagined by Hollywood circa 1986."}, + {"role": "user", "content": "Hey, can you tell me any fun things to do in New York?"} +] + +# 1: Load the model and tokenizer +model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct", device_map="auto", torch_dtype=torch.bfloat16) +tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct") + +# 2: Apply the chat template +formatted_chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True) +print("Formatted chat:\n", formatted_chat) + +# 3: Tokenize the chat (This can be combined with the previous step using tokenize=True) +inputs = tokenizer(formatted_chat, return_tensors="pt", add_special_tokens=False) +# Move the tokenized inputs to the same device the model is on (GPU/CPU) +inputs = {key: tensor.to(model.device) for key, tensor in inputs.items()} +print("Tokenized inputs:\n", inputs) + +# 4: Generate text from the model +outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.1) +print("Generated tokens:\n", outputs) + +# 5: Decode the output back to a string +decoded_output = tokenizer.decode(outputs[0][inputs['input_ids'].size(1):], skip_special_tokens=True) +print("Decoded output:\n", decoded_output) +``` + +There's a lot in here, each piece of which could be its own document! Rather than going into too much detail, I'll cover +the broad ideas, and leave the details for the linked documents. The key steps are: + +1. [Models](https://huggingface.co/learn/nlp-course/en/chapter2/3) and [Tokenizers](https://huggingface.co/learn/nlp-course/en/chapter2/4?fw=pt) are loaded from the Hugging Face Hub. +2. The chat is formatted using the tokenizer's [chat template](https://huggingface.co/docs/transformers/main/en/chat_templating) +3. The formatted chat is [tokenized](https://huggingface.co/learn/nlp-course/en/chapter2/4) using the tokenizer. +4. We [generate](https://huggingface.co/docs/transformers/en/llm_tutorial) a response from the model. +5. The tokens output by the model are decoded back to a string + +## Performance, memory and hardware + +You probably know by now that most machine learning tasks are run on GPUs. However, it is entirely possible +to generate text from a chat model or language model on a CPU, albeit somewhat more slowly. If you can fit +the model in GPU memory, though, this will usually be the preferable option. + +### Memory considerations + +By default, Hugging Face classes like [`TextGenerationPipeline`] or [`AutoModelForCausalLM`] will load the model in +`float32` precision. This means that it will need 4 bytes (32 bits) per parameter, so an "8B" model with 8 billion +parameters will need ~32GB of memory. However, this can be wasteful! Most modern language models are trained in +"bfloat16" precision, which uses only 2 bytes per parameter. If your hardware supports it (Nvidia 30xx/Axxx +or newer), you can load the model in `bfloat16` precision, using the `torch_dtype` argument as we did above. 
+ +It is possible to go even lower than 16-bits using "quantization", a method to lossily compress model weights. This +allows each parameter to be squeezed down to 8 bits, 4 bits or even less. Note that, especially at 4 bits, +the model's outputs may be negatively affected, but often this is a tradeoff worth making to fit a larger and more +capable chat model in memory. Let's see this in action with `bitsandbytes`: + +```python +from transformers import AutoModelForCausalLM, BitsAndBytesConfig + +quantization_config = BitsAndBytesConfig(load_in_8bit=True) # You can also try load_in_4bit +model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct", device_map="auto", quantization_config=quantization_config) +``` + +Or we can do the same thing using the `pipeline` API: + +```python +from transformers import pipeline, BitsAndBytesConfig + +quantization_config = BitsAndBytesConfig(load_in_8bit=True) # You can also try load_in_4bit +pipe = pipeline("text-generation", "meta-llama/Meta-Llama-3-8B-Instruct", device_map="auto", model_kwargs={"quantization_config": quantization_config}) +``` + +There are several other options for quantizing models besides `bitsandbytes` - please see the [Quantization guide](./quantization) +for more information. + +### Performance considerations + + + +For a more extensive guide on language model performance and optimization, check out [LLM Inference Optimization](./llm_optims) . + + + + +As a general rule, larger chat models will be slower in addition to requiring more memory. It's possible to be +more concrete about this, though: Generating text from a chat model is unusual in that it is bottlenecked by +**memory bandwidth** rather than compute power, because every active parameter must be read from memory for each +token that the model generates. This means that number of tokens per second you can generate from a chat +model is generally proportional to the total bandwidth of the memory it resides in, divided by the size of the model. + +In our quickstart example above, our model was ~16GB in size when loaded in `bfloat16` precision. +This means that 16GB must be read from memory for every token generated by the model. Total memory bandwidth can +vary from 20-100GB/sec for consumer CPUs to 200-900GB/sec for consumer GPUs, specialized CPUs like +Intel Xeon, AMD Threadripper/Epyc or high-end Apple silicon, and finally up to 2-3TB/sec for data center GPUs like +the Nvidia A100 or H100. This should give you a good idea of the generation speed you can expect from these different +hardware types. + +Therefore, if you want to improve the speed of text generation, the easiest solution is to either reduce the +size of the model in memory (usually by quantization), or get hardware with higher memory bandwidth. For advanced users, +several other techniques exist to get around this bandwidth bottleneck. The most common are variants on +[assisted generation](https://huggingface.co/blog/assisted-generation), also known as "speculative +sampling". These techniques try to guess multiple future tokens at once, often using a smaller "draft model", and then +confirm these generations with the chat model. If the guesses are validated by the chat model, more than one token can +be generated per forward pass, which greatly alleviates the bandwidth bottleneck and improves generation speed. + +Finally, we should also note the impact of "Mixture of Experts" (MoE) models here. Several popular chat models, +such as Mixtral, Qwen-MoE and DBRX, are MoE models. 
In these models, not every parameter is active for every token generated. +As a result, MoE models generally have much lower memory bandwidth requirements, even though their total size +can be quite large. They can therefore be several times faster than a normal "dense" model of the same size. However, +techniques like assisted generation are generally ineffective for these models because more parameters will become +active with each new speculated token, which will negate the bandwidth and speed benefits that the MoE architecture +provides. + diff --git a/docs/source/en/create_a_model.md b/docs/source/en/create_a_model.md index a70a734c2e3ffe..0ecc503df61533 100644 --- a/docs/source/en/create_a_model.md +++ b/docs/source/en/create_a_model.md @@ -87,7 +87,7 @@ DistilBertConfig { Pretrained model attributes can be modified in the [`~PretrainedConfig.from_pretrained`] function: ```py ->>> my_config = DistilBertConfig.from_pretrained("distilbert-base-uncased", activation="relu", attention_dropout=0.4) +>>> my_config = DistilBertConfig.from_pretrained("distilbert/distilbert-base-uncased", activation="relu", attention_dropout=0.4) ``` Once you are satisfied with your model configuration, you can save it with [`~PretrainedConfig.save_pretrained`]. Your configuration file is stored as a JSON file in the specified save directory: @@ -128,13 +128,13 @@ This creates a model with random values instead of pretrained weights. You won't Create a pretrained model with [`~PreTrainedModel.from_pretrained`]: ```py ->>> model = DistilBertModel.from_pretrained("distilbert-base-uncased") +>>> model = DistilBertModel.from_pretrained("distilbert/distilbert-base-uncased") ``` When you load pretrained weights, the default model configuration is automatically loaded if the model is provided by ๐Ÿค— Transformers. However, you can still replace - some or all of - the default model configuration attributes with your own if you'd like: ```py ->>> model = DistilBertModel.from_pretrained("distilbert-base-uncased", config=my_config) +>>> model = DistilBertModel.from_pretrained("distilbert/distilbert-base-uncased", config=my_config) ``` @@ -152,13 +152,13 @@ This creates a model with random values instead of pretrained weights. You won't Create a pretrained model with [`~TFPreTrainedModel.from_pretrained`]: ```py ->>> tf_model = TFDistilBertModel.from_pretrained("distilbert-base-uncased") +>>> tf_model = TFDistilBertModel.from_pretrained("distilbert/distilbert-base-uncased") ``` When you load pretrained weights, the default model configuration is automatically loaded if the model is provided by ๐Ÿค— Transformers. However, you can still replace - some or all of - the default model configuration attributes with your own if you'd like: ```py ->>> tf_model = TFDistilBertModel.from_pretrained("distilbert-base-uncased", config=my_config) +>>> tf_model = TFDistilBertModel.from_pretrained("distilbert/distilbert-base-uncased", config=my_config) ``` @@ -174,7 +174,7 @@ For example, [`DistilBertForSequenceClassification`] is a base DistilBERT model ```py >>> from transformers import DistilBertForSequenceClassification ->>> model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased") +>>> model = DistilBertForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased") ``` Easily reuse this checkpoint for another task by switching to a different model head. For a question answering task, you would use the [`DistilBertForQuestionAnswering`] model head. 
The question answering head is similar to the sequence classification head except it is a linear layer on top of the hidden states output. @@ -182,7 +182,7 @@ Easily reuse this checkpoint for another task by switching to a different model ```py >>> from transformers import DistilBertForQuestionAnswering ->>> model = DistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased") +>>> model = DistilBertForQuestionAnswering.from_pretrained("distilbert/distilbert-base-uncased") ``` @@ -191,7 +191,7 @@ For example, [`TFDistilBertForSequenceClassification`] is a base DistilBERT mode ```py >>> from transformers import TFDistilBertForSequenceClassification ->>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased") +>>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased") ``` Easily reuse this checkpoint for another task by switching to a different model head. For a question answering task, you would use the [`TFDistilBertForQuestionAnswering`] model head. The question answering head is similar to the sequence classification head except it is a linear layer on top of the hidden states output. @@ -199,7 +199,7 @@ Easily reuse this checkpoint for another task by switching to a different model ```py >>> from transformers import TFDistilBertForQuestionAnswering ->>> tf_model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased") +>>> tf_model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert/distilbert-base-uncased") ``` @@ -232,7 +232,7 @@ It is important to remember the vocabulary from a custom tokenizer will be diffe ```py >>> from transformers import DistilBertTokenizer ->>> slow_tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased") +>>> slow_tokenizer = DistilBertTokenizer.from_pretrained("distilbert/distilbert-base-uncased") ``` Create a fast tokenizer with the [`DistilBertTokenizerFast`] class: @@ -240,7 +240,7 @@ Create a fast tokenizer with the [`DistilBertTokenizerFast`] class: ```py >>> from transformers import DistilBertTokenizerFast ->>> fast_tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased") +>>> fast_tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert/distilbert-base-uncased") ``` @@ -249,7 +249,7 @@ By default, [`AutoTokenizer`] will try to load a fast tokenizer. You can disable -## Image Processor +## Image processor An image processor processes vision inputs. It inherits from the base [`~image_processing_utils.ImageProcessingMixin`] class. @@ -311,7 +311,91 @@ ViTImageProcessor { } ``` -## Feature Extractor +## Backbone + +
+ +Computer vision models consist of a backbone, neck, and head. The backbone extracts features from an input image, the neck combines and enhances the extracted features, and the head is used for the main task (e.g., object detection). Start by initializing a backbone in the model config and specify whether you want to load pretrained weights or load randomly initialized weights. Then you can pass the model config to the model head. + +For example, to load a [ResNet](../model_doc/resnet) backbone into a [MaskFormer](../model_doc/maskformer) model with an instance segmentation head: + + + + +Set `use_pretrained_backbone=True` to load pretrained ResNet weights for the backbone. + +```py +from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation + +config = MaskFormerConfig(backbone="microsoft/resnet-50", use_pretrained_backbone=True) # backbone and neck config +model = MaskFormerForInstanceSegmentation(config) # head +``` + + + + +Set `use_pretrained_backbone=False` to randomly initialize a ResNet backbone. + +```py +from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation + +config = MaskFormerConfig(backbone="microsoft/resnet-50", use_pretrained_backbone=False) # backbone and neck config +model = MaskFormerForInstanceSegmentation(config) # head +``` + +You could also load the backbone config separately and then pass it to the model config. + +```py +from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation, ResNetConfig + +backbone_config = ResNetConfig() +config = MaskFormerConfig(backbone_config=backbone_config) +model = MaskFormerForInstanceSegmentation(config) +``` + + + + +[timm](https://hf.co/docs/timm/index) models are loaded within a model with `use_timm_backbone=True` or with [`TimmBackbone`] and [`TimmBackboneConfig`]. + +Use `use_timm_backbone=True` and `use_pretrained_backbone=True` to load pretrained timm weights for the backbone. + +```python +from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation + +config = MaskFormerConfig(backbone="resnet50", use_pretrained_backbone=True, use_timm_backbone=True) # backbone and neck config +model = MaskFormerForInstanceSegmentation(config) # head +``` + +Set `use_timm_backbone=True` and `use_pretrained_backbone=False` to load a randomly initialized timm backbone. + +```python +from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation + +config = MaskFormerConfig(backbone="resnet50", use_pretrained_backbone=False, use_timm_backbone=True) # backbone and neck config +model = MaskFormerForInstanceSegmentation(config) # head +``` + +You could also load the backbone config and use it to create a `TimmBackbone` or pass it to the model config. Timm backbones will load pretrained weights by default. Set `use_pretrained_backbone=False` to load randomly initialized weights. + +```python +from transformers import TimmBackboneConfig, TimmBackbone + +backbone_config = TimmBackboneConfig("resnet50", use_pretrained_backbone=False) + +# Create a backbone class +backbone = TimmBackbone(config=backbone_config) + +# Create a model with a timm backbone +from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation + +config = MaskFormerConfig(backbone_config=backbone_config) +model = MaskFormerForInstanceSegmentation(config) +``` + +## Feature extractor A feature extractor processes audio inputs. 
It inherits from the base [`~feature_extraction_utils.FeatureExtractionMixin`] class, and may also inherit from the [`SequenceFeatureExtractor`] class for processing audio inputs. @@ -357,7 +441,6 @@ Wav2Vec2FeatureExtractor { } ``` - ## Processor For models that support multimodal tasks, ๐Ÿค— Transformers offers a processor class that conveniently wraps processing classes such as a feature extractor and a tokenizer into a single object. For example, let's use the [`Wav2Vec2Processor`] for an automatic speech recognition task (ASR). ASR transcribes audio to text, so you will need a feature extractor and a tokenizer. diff --git a/docs/source/en/custom_models.md b/docs/source/en/custom_models.md index 22ba58b9d9ddc4..3d43446a0cc1b2 100644 --- a/docs/source/en/custom_models.md +++ b/docs/source/en/custom_models.md @@ -34,6 +34,16 @@ Before we dive into the model, let's first write its configuration. The configur will contain all the necessary information to build the model. As we will see in the next section, the model can only take a `config` to be initialized, so we really need that object to be as complete as possible. + + +Models in the `transformers` library itself generally follow the convention that they accept a `config` object +in their `__init__` method, and then pass the whole `config` to sub-layers in the model, rather than breaking the +config object into multiple arguments that are all passed individually to sub-layers. Writing your model in this +style results in simpler code with a clear "source of truth" for any hyperparameters, and also makes it easier +to reuse code from other models in `transformers`. + + + In our example, we will take a couple of arguments of the ResNet class that we might want to tweak. Different configurations will then give us the different types of ResNets that are possible. We then just store those arguments, after checking the validity of a few of them. @@ -300,7 +310,7 @@ Use `register_for_auto_class()` if you want the code files to be copied. If you you don't need to call it. In cases where there's more than one auto class, you can modify the `config.json` directly using the following structure: -``` +```json "auto_map": { "AutoConfig": "--", "AutoModel": "--", diff --git a/docs/source/en/custom_tools.md b/docs/source/en/custom_tools.md deleted file mode 100644 index 86183a80752e76..00000000000000 --- a/docs/source/en/custom_tools.md +++ /dev/null @@ -1,789 +0,0 @@ - - -# Custom Tools and Prompts - - - -If you are not aware of what tools and agents are in the context of transformers, we recommend you read the -[Transformers Agents](transformers_agents) page first. - - - - - -Transformers Agents is an experimental API that is subject to change at any time. Results returned by the agents -can vary as the APIs or underlying models are prone to change. - - - -Creating and using custom tools and prompts is paramount to empowering the agent and having it perform new tasks. -In this guide we'll take a look at: - -- How to customize the prompt -- How to use custom tools -- How to create custom tools - -## Customizing the prompt - -As explained in [Transformers Agents](transformers_agents) agents can run in [`~Agent.run`] and [`~Agent.chat`] mode. -Both the `run` and `chat` modes underlie the same logic. The language model powering the agent is conditioned on a long -prompt and completes the prompt by generating the next tokens until the stop token is reached. 
-The only difference between the two modes is that during the `chat` mode the prompt is extended with -previous user inputs and model generations. This allows the agent to have access to past interactions, -seemingly giving the agent some kind of memory. - -### Structure of the prompt - -Let's take a closer look at how the prompt is structured to understand how it can be best customized. -The prompt is structured broadly into four parts. - -- 1. Introduction: how the agent should behave, explanation of the concept of tools. -- 2. Description of all the tools. This is defined by a `<>` token that is dynamically replaced at runtime with the tools defined/chosen by the user. -- 3. A set of examples of tasks and their solution -- 4. Current example, and request for solution. - -To better understand each part, let's look at a shortened version of how the `run` prompt can look like: - -````text -I will ask you to perform a task, your job is to come up with a series of simple commands in Python that will perform the task. -[...] -You can print intermediate results if it makes sense to do so. - -Tools: -- document_qa: This is a tool that answers a question about a document (pdf). It takes an input named `document` which should be the document containing the information, as well as a `question` that is the question about the document. It returns a text that contains the answer to the question. -- image_captioner: This is a tool that generates a description of an image. It takes an input named `image` which should be the image to the caption and returns a text that contains the description in English. -[...] - -Task: "Answer the question in the variable `question` about the image stored in the variable `image`. The question is in French." - -I will use the following tools: `translator` to translate the question into English and then `image_qa` to answer the question on the input image. - -Answer: -```py -translated_question = translator(question=question, src_lang="French", tgt_lang="English") -print(f"The translated question is {translated_question}.") -answer = image_qa(image=image, question=translated_question) -print(f"The answer is {answer}") -``` - -Task: "Identify the oldest person in the `document` and create an image showcasing the result as a banner." - -I will use the following tools: `document_qa` to find the oldest person in the document, then `image_generator` to generate an image according to the answer. - -Answer: -```py -answer = document_qa(document, question="What is the oldest person?") -print(f"The answer is {answer}.") -image = image_generator("A banner showing " + answer) -``` - -[...] - -Task: "Draw me a picture of rivers and lakes" - -I will use the following -```` - -The introduction (the text before *"Tools:"*) explains precisely how the model shall behave and what it should do. -This part most likely does not need to be customized as the agent shall always behave the same way. - -The second part (the bullet points below *"Tools"*) is dynamically added upon calling `run` or `chat`. There are -exactly as many bullet points as there are tools in `agent.toolbox` and each bullet point consists of the name -and description of the tool: - -```text -- : -``` - -Let's verify this quickly by loading the document_qa tool and printing out the name and description. 
- -```py -from transformers import load_tool - -document_qa = load_tool("document-question-answering") -print(f"- {document_qa.name}: {document_qa.description}") -``` - -which gives: -```text -- document_qa: This is a tool that answers a question about a document (pdf). It takes an input named `document` which should be the document containing the information, as well as a `question` that is the question about the document. It returns a text that contains the answer to the question. -``` - -We can see that the tool name is short and precise. The description includes two parts, the first explaining -what the tool does and the second states what input arguments and return values are expected. - -A good tool name and tool description are very important for the agent to correctly use it. Note that the only -information the agent has about the tool is its name and description, so one should make sure that both -are precisely written and match the style of the existing tools in the toolbox. In particular make sure the description -mentions all the arguments expected by name in code-style, along with the expected type and a description of what they -are. - - - -Check the naming and description of the curated Transformers tools to better understand what name and -description a tool is expected to have. You can see all tools with the [`Agent.toolbox`] property. - - - -The third part includes a set of curated examples that show the agent exactly what code it should produce -for what kind of user request. The large language models empowering the agent are extremely good at -recognizing patterns in a prompt and repeating the pattern with new data. Therefore, it is very important -that the examples are written in a way that maximizes the likelihood of the agent to generating correct, -executable code in practice. - -Let's have a look at one example: - -````text -Task: "Identify the oldest person in the `document` and create an image showcasing the result as a banner." - -I will use the following tools: `document_qa` to find the oldest person in the document, then `image_generator` to generate an image according to the answer. - -Answer: -```py -answer = document_qa(document, question="What is the oldest person?") -print(f"The answer is {answer}.") -image = image_generator("A banner showing " + answer) -``` - -```` - -The pattern the model is prompted to repeat has three parts: The task statement, the agent's explanation of -what it intends to do, and finally the generated code. Every example that is part of the prompt has this exact -pattern, thus making sure that the agent will reproduce exactly the same pattern when generating new tokens. - -The prompt examples are curated by the Transformers team and rigorously evaluated on a set of -[problem statements](https://github.com/huggingface/transformers/blob/main/src/transformers/tools/evaluate_agent.py) -to ensure that the agent's prompt is as good as possible to solve real use cases of the agent. - -The final part of the prompt corresponds to: -```text -Task: "Draw me a picture of rivers and lakes" - -I will use the following -``` - -is a final and unfinished example that the agent is tasked to complete. The unfinished example -is dynamically created based on the actual user input. For the above example, the user ran: - -```py -agent.run("Draw me a picture of rivers and lakes") -``` - -The user input - *a.k.a* the task: *"Draw me a picture of rivers and lakes"* is cast into the -prompt template: "Task: \n\n I will use the following". 
This sentence makes up the final lines of the -prompt the agent is conditioned on, therefore strongly influencing the agent to finish the example -exactly in the same way it was previously done in the examples. - -Without going into too much detail, the chat template has the same prompt structure with the -examples having a slightly different style, *e.g.*: - -````text -[...] - -===== - -Human: Answer the question in the variable `question` about the image stored in the variable `image`. - -Assistant: I will use the tool `image_qa` to answer the question on the input image. - -```py -answer = image_qa(text=question, image=image) -print(f"The answer is {answer}") -``` - -Human: I tried this code, it worked but didn't give me a good result. The question is in French - -Assistant: In this case, the question needs to be translated first. I will use the tool `translator` to do this. - -```py -translated_question = translator(question=question, src_lang="French", tgt_lang="English") -print(f"The translated question is {translated_question}.") -answer = image_qa(text=translated_question, image=image) -print(f"The answer is {answer}") -``` - -===== - -[...] -```` - -Contrary, to the examples of the `run` prompt, each `chat` prompt example has one or more exchanges between the -*Human* and the *Assistant*. Every exchange is structured similarly to the example of the `run` prompt. -The user's input is appended to behind *Human:* and the agent is prompted to first generate what needs to be done -before generating code. An exchange can be based on previous exchanges, therefore allowing the user to refer -to past exchanges as is done *e.g.* above by the user's input of "I tried **this** code" refers to the -previously generated code of the agent. - -Upon running `.chat`, the user's input or *task* is cast into an unfinished example of the form: -```text -Human: \n\nAssistant: -``` -which the agent completes. Contrary to the `run` command, the `chat` command then appends the completed example -to the prompt, thus giving the agent more context for the next `chat` turn. - -Great now that we know how the prompt is structured, let's see how we can customize it! - -### Writing good user inputs - -While large language models are getting better and better at understanding users' intentions, it helps -enormously to be as precise as possible to help the agent pick the correct task. What does it mean to be -as precise as possible? - -The agent sees a list of tool names and their description in its prompt. The more tools are added the -more difficult it becomes for the agent to choose the correct tool and it's even more difficult to choose -the correct sequences of tools to run. Let's look at a common failure case, here we will only return -the code to analyze it. - -```py -from transformers import HfAgent - -agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder") - -agent.run("Show me a tree", return_code=True) -``` - -gives: - -```text -==Explanation from the agent== -I will use the following tool: `image_segmenter` to create a segmentation mask for the image. - - -==Code generated by the agent== -mask = image_segmenter(image, prompt="tree") -``` - -which is probably not what we wanted. Instead, it is more likely that we want an image of a tree to be generated. -To steer the agent more towards using a specific tool it can therefore be very helpful to use important keywords that -are present in the tool's name and description. Let's have a look. 
```py
agent.toolbox["image_generator"].description
```

```text
'This is a tool that creates an image according to a prompt, which is a text description. It takes an input named `prompt` which contains the image description and outputs an image.'
```

The name and description make use of the keywords "image", "prompt", "create" and "generate". Using these words will most likely work better here. Let's refine our prompt a bit.

```py
agent.run("Create an image of a tree", return_code=True)
```

gives:
```text
==Explanation from the agent==
I will use the following tool `image_generator` to generate an image of a tree.


==Code generated by the agent==
image = image_generator(prompt="tree")
```

Much better! That looks more like what we want. In short, when you notice that the agent struggles to
correctly map your task to the correct tools, try looking up the most pertinent keywords of the tool's name
and description and refine your task request with them.

### Customizing the tool descriptions

As we've seen before, the agent has access to each of the tools' names and descriptions. The base tools
should have very precise names and descriptions; however, you might find it helpful to change the
description or name of a tool for your specific use case. This might become especially important
when you've added multiple tools that are very similar or if you want to use your agent only for a certain
domain, *e.g.* image generation and transformations.

A common problem is that the agent confuses image generation with image transformation/modification when
it is used a lot for image generation tasks, *e.g.*:
```py
agent.run("Make an image of a house and a car", return_code=True)
```
returns
```text
==Explanation from the agent==
I will use the following tools `image_generator` to generate an image of a house and `image_transformer` to transform the image of a car into the image of a house.

==Code generated by the agent==
house_image = image_generator(prompt="A house")
car_image = image_generator(prompt="A car")
house_car_image = image_transformer(image=car_image, prompt="A house")
```

which is probably not exactly what we want here. It seems like the agent has a difficult time
understanding the difference between `image_generator` and `image_transformer` and often uses the two together.

We can help the agent here by changing the tool name and description of `image_transformer`. Let's instead call it `modifier`
to disassociate it a bit from "image" and "prompt":
```py
agent.toolbox["modifier"] = agent.toolbox.pop("image_transformer")
agent.toolbox["modifier"].description = agent.toolbox["modifier"].description.replace(
    "transforms an image according to a prompt", "modifies an image"
)
```

Now "modify" is a strong cue to use the new `modifier` tool, which should help with the above prompt. Let's run it again.

```py
agent.run("Make an image of a house and a car", return_code=True)
```

Now we're getting:
```text
==Explanation from the agent==
I will use the following tools: `image_generator` to generate an image of a house, then `image_generator` to generate an image of a car.


==Code generated by the agent==
house_image = image_generator(prompt="A house")
car_image = image_generator(prompt="A car")
```

which is definitely closer to what we had in mind!
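If you end up renaming or re-describing several tools for your domain, a small helper keeps those edits in one place. This is purely a convenience sketch of our own, not part of the Transformers API, and the replacement description below is likewise just an example:

```py
def rename_tool(agent, old_name, new_name, description=None):
    """Move a tool to a new name in the agent's toolbox and optionally replace its description."""
    tool = agent.toolbox.pop(old_name)
    if description is not None:
        tool.description = description
    agent.toolbox[new_name] = tool
    return tool


# Equivalent to the manual renaming above (shown as an alternative; don't run both,
# since `image_transformer` has already been popped from the toolbox):
# rename_tool(
#     agent,
#     "image_transformer",
#     "modifier",
#     description="This is a tool that modifies an image. It takes an `image` and a `prompt` describing the change, and returns the modified image.",
# )
```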
Getting back to our request: we still want both the house and the car in a single image. Steering the task more toward single image generation should help:

```py
agent.run("Create image: 'A house and car'", return_code=True)
```

```text
==Explanation from the agent==
I will use the following tool: `image_generator` to generate an image.


==Code generated by the agent==
image = image_generator(prompt="A house and car")
```

<Tip warning={true}>

Agents are still brittle for many use cases, especially when it comes to
slightly more complex use cases like generating an image of multiple objects.
Both the agent itself and the underlying prompt will be further improved in the coming
months, making sure that agents become more robust to a variety of user inputs.

</Tip>

### Customizing the whole prompt

To give the user maximum flexibility, the whole prompt template as explained [above](#structure-of-the-prompt)
can be overwritten by the user. In this case, make sure that your custom prompt includes an introduction section,
a tool section, an example section, and an unfinished example section. If you want to overwrite the `run` prompt template,
you can do so as follows:

```py
template = """ [...] """

agent = HfAgent(your_endpoint, run_prompt_template=template)
```

<Tip>

Please make sure to have the `<<all_tools>>` string and the `<<prompt>>` string defined somewhere in the `template` so that the agent is aware
of the tools it has available and can correctly insert the user's prompt.

</Tip>

Similarly, one can overwrite the `chat` prompt template. Note that the `chat` mode always uses the following format for the exchanges:
```text
Human: <task>

Assistant:
```

Therefore it is important that the examples of the custom `chat` prompt template also make use of this format.
You can overwrite the `chat` template at instantiation as follows.

```py
template = """ [...] """

agent = HfAgent(url_endpoint=your_endpoint, chat_prompt_template=template)
```

<Tip>

Please make sure to have the `<<all_tools>>` string defined somewhere in the `template` so that the agent is aware
of the tools it has available.

</Tip>

In both cases, you can pass a repo ID instead of the prompt template if you would like to use a template hosted by someone in the community. The default prompts live in [this repo](https://huggingface.co/datasets/huggingface-tools/default-prompts) as an example.

To upload your custom prompt to a repo on the Hub and share it with the community, just make sure (a minimal upload sketch follows the list below):
- to use a dataset repository
- to put the prompt template for the `run` command in a file named `run_prompt_template.txt`
- to put the prompt template for the `chat` command in a file named `chat_prompt_template.txt`
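For example, uploading the two template files to a dataset repo could look roughly like this; the repo name is a placeholder, and you need to be authenticated (e.g. via `huggingface-cli login`):

```py
from huggingface_hub import HfApi

api = HfApi()
# Prompt templates must live in a dataset repository; the repo id below is just an example.
api.create_repo(repo_id="your-username/my-agent-prompts", repo_type="dataset", exist_ok=True)

for filename in ["run_prompt_template.txt", "chat_prompt_template.txt"]:
    api.upload_file(
        path_or_fileobj=filename,
        path_in_repo=filename,
        repo_id="your-username/my-agent-prompts",
        repo_type="dataset",
    )
```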
## Using custom tools

In this section, we'll be leveraging two existing custom tools that are specific to image generation:

- We replace [huggingface-tools/image-transformation](https://huggingface.co/spaces/huggingface-tools/image-transformation)
  with [diffusers/controlnet-canny-tool](https://huggingface.co/spaces/diffusers/controlnet-canny-tool)
  to allow for more image modifications.
- We add a new tool for image upscaling to the default toolbox:
  [diffusers/latent-upscaler-tool](https://huggingface.co/spaces/diffusers/latent-upscaler-tool).

We'll start by loading the custom tools with the convenient [`load_tool`] function:

```py
from transformers import load_tool

controlnet_transformer = load_tool("diffusers/controlnet-canny-tool")
upscaler = load_tool("diffusers/latent-upscaler-tool")
```

Upon adding custom tools to an agent, the tools' descriptions and names are automatically
included in the agent's prompts. Thus, it is imperative that custom tools have
a well-written description and name in order for the agent to understand how to use them.
Let's take a look at the description and name of `controlnet_transformer`:

```py
print(f"Description: '{controlnet_transformer.description}'")
print(f"Name: '{controlnet_transformer.name}'")
```

gives:
```text
Description: 'This is a tool that transforms an image with ControlNet according to a prompt.
It takes two inputs: `image`, which should be the image to transform, and `prompt`, which should be the prompt to use to change it. It returns the modified image.'
Name: 'image_transformer'
```

The name and description are accurate and fit the style of the [curated set of tools](./transformers_agents#a-curated-set-of-tools).
Next, let's instantiate an agent with `controlnet_transformer` and `upscaler`:

```py
tools = [controlnet_transformer, upscaler]
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder", additional_tools=tools)
```

This command should give you the following info:

```text
image_transformer has been replaced by <your custom tool> as provided in `additional_tools`
```

The set of curated tools already has an `image_transformer` tool, which is hereby replaced with our custom tool.

<Tip>

Overwriting existing tools can be beneficial if we want to use a custom tool for exactly the same task as an existing tool,
because the agent is already well-versed in that specific task. Beware that the custom tool should follow the exact same API
as the overwritten tool in this case, or you should adapt the prompt template to make sure all examples using that
tool are updated.

</Tip>

The upscaler tool was given the name `image_upscaler`, which is not yet present in the default toolbox and is therefore simply added to the list of tools.
You can always have a look at the toolbox that is currently available to the agent via the `agent.toolbox` attribute:

```py
print("\n".join([f"- {a}" for a in agent.toolbox.keys()]))
```

```text
- document_qa
- image_captioner
- image_qa
- image_segmenter
- transcriber
- summarizer
- text_classifier
- text_qa
- text_reader
- translator
- image_transformer
- text_downloader
- image_generator
- video_generator
- image_upscaler
```

Note how `image_upscaler` is now part of the agent's toolbox.

Let's now try out the new tools! We will re-use the image we generated in [Transformers Agents Quickstart](./transformers_agents#single-execution-run).

```py
from diffusers.utils import load_image

image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rivers_and_lakes.png"
)
```

Let's transform the image into a beautiful winter landscape:

```py
image = agent.run("Transform the image: 'A frozen lake and snowy forest'", image=image)
```

```text
==Explanation from the agent==
I will use the following tool: `image_transformer` to transform the image.


==Code generated by the agent==
image = image_transformer(image, prompt="A frozen lake and snowy forest")
```

The new image processing tool is based on ControlNet, which can make very strong modifications to the image.
By default, the image processing tool returns an image of size 512x512 pixels. Let's see if we can upscale it.

```py
image = agent.run("Upscale the image", image=image)
```

```text
==Explanation from the agent==
I will use the following tool: `image_upscaler` to upscale the image.


==Code generated by the agent==
upscaled_image = image_upscaler(image)
```

The agent automatically mapped our prompt "Upscale the image" to the just-added `image_upscaler` tool purely based on the
tool's name and description, and was able to run it correctly.

Next, let's have a look at how you can create a new custom tool.

### Adding new tools

In this section, we show how to create a new tool that can be added to the agent.

#### Creating a new tool

We'll start by creating a tool. We'll add the not-so-useful yet fun task of fetching the model with the most downloads
for a given task on the Hugging Face Hub.

We can do that with the following code:

```python
from huggingface_hub import list_models

task = "text-classification"

# Sort the Hub models for this task by downloads (descending) and take the top one.
model = next(iter(list_models(filter=task, sort="downloads", direction=-1)))
print(model.id)
```

For the task `text-classification`, this returns `'facebook/bart-large-mnli'`; for `translation`, it returns `'t5-base'`.

How do we convert this to a tool that the agent can leverage? All tools depend on the superclass `Tool` that holds the
main attributes necessary. We'll create a class that inherits from it:

```python
from transformers import Tool


class HFModelDownloadsTool(Tool):
    pass
```

This class needs a few things:
- An attribute `name`, which corresponds to the name of the tool itself. To be in tune with other tools, which have
  descriptive names, we'll name it `model_download_counter`.
- An attribute `description`, which will be used to populate the prompt of the agent.
- `inputs` and `outputs` attributes. Defining these will help the Python interpreter make educated choices about types,
  and will allow a Gradio demo to be spawned when we push our tool to the Hub. They are both lists of expected
  values, which can be `text`, `image`, or `audio`.
- A `__call__` method which contains the inference code. This is the code we've played with above!

Here's what our class looks like now:

```python
from transformers import Tool
from huggingface_hub import list_models


class HFModelDownloadsTool(Tool):
    name = "model_download_counter"
    description = (
        "This is a tool that returns the most downloaded model of a given task on the Hugging Face Hub. "
        "It takes the name of the category (such as text-classification, depth-estimation, etc), and "
        "returns the name of the checkpoint."
    )

    inputs = ["text"]
    outputs = ["text"]

    def __call__(self, task: str):
        model = next(iter(list_models(filter=task, sort="downloads", direction=-1)))
        return model.id
```

We now have our tool handy. Save it in a file and import it from your main script. Let's name this file
`model_downloads.py`, so the resulting import code looks like this:

```python
from model_downloads import HFModelDownloadsTool

tool = HFModelDownloadsTool()
```

In order to let others benefit from it and for simpler initialization, we recommend pushing it to the Hub under your
namespace.
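Before pushing, it's worth a quick local sanity check that the tool runs end to end; a minimal sketch (the task string is just an example):

```python
from model_downloads import HFModelDownloadsTool

tool = HFModelDownloadsTool()
# Should print the id of the currently most downloaded text-classification checkpoint.
print(tool("text-classification"))
```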
To push it to the Hub, just call `push_to_hub` on the `tool` variable:

```python
tool.push_to_hub("hf-model-downloads")
```

You now have your code on the Hub! Let's take a look at the final step, which is to have the agent use it.

#### Having the agent use the tool

The tool now lives on the Hub and can be instantiated as follows (change the username to match your tool's namespace):

```python
from transformers import load_tool

tool = load_tool("lysandre/hf-model-downloads")
```

In order to use it in the agent, simply pass it to the `additional_tools` parameter of the agent initialization method:

```python
from transformers import HfAgent

agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder", additional_tools=[tool])

agent.run(
    "Can you read out loud the name of the model that has the most downloads in the 'text-to-video' task on the Hugging Face Hub?"
)
```

which outputs the following:
```text
==Code generated by the agent==
model = model_download_counter(task="text-to-video")
print(f"The model with the most downloads is {model}.")
audio_model = text_reader(model)


==Result==
The model with the most downloads is damo-vilab/text-to-video-ms-1.7b.
```

and generates the following audio:

| **Audio** |
|-----------|
| *(embedded audio sample of the generated speech)* |
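As a final check, you can also call the tool directly, outside the agent, to verify the lookup on its own; a small sketch reusing the repo from above:

```python
from transformers import load_tool

model_counter = load_tool("lysandre/hf-model-downloads")
# Calls the tool directly, without going through the agent or the text_reader step.
print(model_counter(task="text-to-video"))
```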