Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rebase #3

Merged
merged 12 commits into from
Nov 5, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions docs/source/ko/_toctree.yml
Original file line number Diff line number Diff line change
Expand Up @@ -358,8 +358,8 @@
title: (번역중) CodeGen
- local: model_doc/cohere
title: Cohere
- local: in_translation
title: (번역중) ConvBERT
- local: model_doc/convbert
title: ConvBERT
- local: in_translation
title: (번역중) CPM
- local: in_translation
Expand Down
135 changes: 135 additions & 0 deletions docs/source/ko/model_doc/convbert.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,135 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# ConvBERT [[convbert]]

<div class="flex flex-wrap space-x-1">
<a href="https://huggingface.co/models?filter=convbert">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-convbert-blueviolet">
</a>
<a href="https://huggingface.co/spaces/docs-demos/conv-bert-base">
<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
</a>
</div>

## 개요 [[overview]]

ConvBERT 모델은 Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan에 의해 제안되었으며, 제안 논문 제목은 [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496)입니다.

논문의 초록은 다음과 같습니다:

*BERT와 그 변형 모델과 같은 사전 학습된 언어 모델들은 최근 다양한 자연어 이해 과제에서 놀라운 성과를 이루었습니다. 그러나 BERT는 글로벌 셀프 어텐션 블록에 크게 의존하기 때문에 메모리 사용량이 많고 계산 비용이 큽니다. 모든 어텐션 헤드가 글로벌 관점에서 어텐션 맵을 생성하기 위해 입력 시퀀스 전체를 탐색하지만, 일부 헤드는 로컬 종속성만 학습할 필요가 있다는 것을 발견했습니다. 이는 불필요한 계산이 포함되어 있음을 의미합니다. 따라서 우리는 이러한 self-attention 헤드들을 대체하여 로컬 종속성을 직접 모델링하기 위해 새로운 span 기반 동적 컨볼루션을 제안합니다. 새로운 컨볼루션 헤드와 나머지 self-attention 헤드들이 결합하여 글로벌 및 로컬 문맥 학습에 더 효율적인 혼합 어텐션 블록을 구성합니다. 우리는 BERT에 이 혼합 어텐션 설계를 적용하여 ConvBERT 모델을 구축했습니다. 실험 결과, ConvBERT는 다양한 다운스트림 과제에서 BERT 및 그 변형 모델보다 더 우수한 성능을 보였으며, 훈련 비용과 모델 파라미터 수가 더 적었습니다. 특히 ConvBERTbase 모델은 GLUE 스코어 86.4를 달성하여 ELECTRAbase보다 0.7 높은 성과를 보이며, 훈련 비용은 1/4 이하로 줄었습니다. 코드와 사전 학습된 모델은 공개될 예정입니다.*

이 모델은 [abhishek](https://huggingface.co/abhishek)에 의해 기여되었으며, 원본 구현은 여기에서 찾을 수 있습니다 : https://github.com/yitu-opensource/ConvBert



## 사용 팁 [[usage-tips]]
ConvBERT 훈련 팁은 BERT와 유사합니다. 사용 팁은 [BERT 문서](bert).를 참고하십시오.


## 리소스 [[resources]]

- [텍스트 분류 작업 가이드 (Text classification task guide)](../tasks/sequence_classification)
- [토큰 분류 작업 가이드 (Token classification task guide)](../tasks/token_classification)
- [질의응답 작업 가이드 (Question answering task guide)](../tasks/question_answering)
- [마스킹된 언어 모델링 작업 가이드 (Masked language modeling task guide)](../tasks/masked_language_modeling)
- [다중 선택 작업 가이드 (Multiple choice task guide)](../tasks/multiple_choice)

## ConvBertConfig [[transformers.ConvBertConfig]]

[[autodoc]] ConvBertConfig

## ConvBertTokenizer [[transformers.ConvBertTokenizer]]

[[autodoc]] ConvBertTokenizer
- build_inputs_with_special_tokens
- get_special_tokens_mask
- create_token_type_ids_from_sequences
- save_vocabulary

## ConvBertTokenizerFast [[transformers.ConvBertTokenizerFast]]

[[autodoc]] ConvBertTokenizerFast

<frameworkcontent>
<pt>

## ConvBertModel [[transformers.ConvBertModel]]

[[autodoc]] ConvBertModel
- forward

## ConvBertForMaskedLM [[transformers.ConvBertForMaskedLM]]

[[autodoc]] ConvBertForMaskedLM
- forward

## ConvBertForSequenceClassification [[transformers.ConvBertForSequenceClassification]]

[[autodoc]] ConvBertForSequenceClassification
- forward

## ConvBertForMultipleChoice [[transformers.ConvBertForMultipleChoice]]

[[autodoc]] ConvBertForMultipleChoice
- forward

## ConvBertForTokenClassification [[transformers.ConvBertForTokenClassification]]

[[autodoc]] ConvBertForTokenClassification
- forward

## ConvBertForQuestionAnswering [[transformers.ConvBertForQuestionAnswering]]

[[autodoc]] ConvBertForQuestionAnswering
- forward

</pt>
<tf>

## TFConvBertModel [[transformers.TFConvBertModel]]

[[autodoc]] TFConvBertModel
- call

## TFConvBertForMaskedLM [[transformers.TFConvBertForMaskedLM]]

[[autodoc]] TFConvBertForMaskedLM
- call

## TFConvBertForSequenceClassification [[transformers.TFConvBertForSequenceClassification]]

[[autodoc]] TFConvBertForSequenceClassification
- call

## TFConvBertForMultipleChoice [[transformers.TFConvBertForMultipleChoice]]

[[autodoc]] TFConvBertForMultipleChoice
- call

## TFConvBertForTokenClassification [[transformers.TFConvBertForTokenClassification]]

[[autodoc]] TFConvBertForTokenClassification
- call

## TFConvBertForQuestionAnswering [[transformers.TFConvBertForQuestionAnswering]]

[[autodoc]] TFConvBertForQuestionAnswering
- call

</tf>
</frameworkcontent>
35 changes: 0 additions & 35 deletions examples/pytorch/contrastive-image-text/run_clip.py
Original file line number Diff line number Diff line change
Expand Up @@ -141,10 +141,6 @@ class DataTrainingArguments:
default=None,
metadata={"help": "An optional input evaluation data file (a jsonlines file)."},
)
test_file: Optional[str] = field(
default=None,
metadata={"help": "An optional input testing data file (a jsonlines file)."},
)
max_seq_length: Optional[int] = field(
default=128,
metadata={
Expand Down Expand Up @@ -190,9 +186,6 @@ def __post_init__(self):
if self.validation_file is not None:
extension = self.validation_file.split(".")[-1]
assert extension in ["csv", "json"], "`validation_file` should be a csv or a json file."
if self.test_file is not None:
extension = self.test_file.split(".")[-1]
assert extension in ["csv", "json"], "`test_file` should be a csv or a json file."


dataset_name_mapping = {
Expand Down Expand Up @@ -315,9 +308,6 @@ def main():
if data_args.validation_file is not None:
data_files["validation"] = data_args.validation_file
extension = data_args.validation_file.split(".")[-1]
if data_args.test_file is not None:
data_files["test"] = data_args.test_file
extension = data_args.test_file.split(".")[-1]
dataset = load_dataset(
extension,
data_files=data_files,
Expand Down Expand Up @@ -387,8 +377,6 @@ def _freeze_params(module):
column_names = dataset["train"].column_names
elif training_args.do_eval:
column_names = dataset["validation"].column_names
elif training_args.do_predict:
column_names = dataset["test"].column_names
else:
logger.info("There is nothing to do. Please pass `do_train`, `do_eval` and/or `do_predict`.")
return
Expand Down Expand Up @@ -490,29 +478,6 @@ def filter_corrupt_images(examples):
# Transform images on the fly as doing it on the whole dataset takes too much time.
eval_dataset.set_transform(transform_images)

if training_args.do_predict:
if "test" not in dataset:
raise ValueError("--do_predict requires a test dataset")
test_dataset = dataset["test"]
if data_args.max_eval_samples is not None:
max_eval_samples = min(len(test_dataset), data_args.max_eval_samples)
test_dataset = test_dataset.select(range(max_eval_samples))

test_dataset = test_dataset.filter(
filter_corrupt_images, batched=True, num_proc=data_args.preprocessing_num_workers
)
test_dataset = test_dataset.map(
function=tokenize_captions,
batched=True,
num_proc=data_args.preprocessing_num_workers,
remove_columns=[col for col in column_names if col != image_column],
load_from_cache_file=not data_args.overwrite_cache,
desc="Running tokenizer on test dataset",
)

# Transform images on the fly as doing it on the whole dataset takes too much time.
test_dataset.set_transform(transform_images)

# 8. Initialize our trainer
trainer = Trainer(
model=model,
Expand Down
21 changes: 17 additions & 4 deletions src/transformers/configuration_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -190,6 +190,8 @@ class PretrainedConfig(PushToHubMixin):
"""

model_type: str = ""
base_config_key: str = ""
sub_configs: Dict[str, "PretrainedConfig"] = {}
is_composition: bool = False
attribute_map: Dict[str, str] = {}
_auto_class: Optional[str] = None
Expand Down Expand Up @@ -543,11 +545,22 @@ def from_pretrained(
cls._set_token_in_kwargs(kwargs, token)

config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
if cls.base_config_key and cls.base_config_key in config_dict:
config_dict = config_dict[cls.base_config_key]

if "model_type" in config_dict and hasattr(cls, "model_type") and config_dict["model_type"] != cls.model_type:
logger.warning(
f"You are using a model of type {config_dict['model_type']} to instantiate a model of type "
f"{cls.model_type}. This is not supported for all configurations of models and can yield errors."
)
# sometimes the config has no `base_config_key` if the config is used in several composite models
# e.g. LlamaConfig. In that case we try to see if there is match in `model_type` before raising a warning
for k, v in config_dict.items():
if isinstance(v, dict) and v.get("model_type") == cls.model_type:
config_dict = v

# raise warning only if we still can't see a match in `model_type`
if config_dict["model_type"] != cls.model_type:
logger.warning(
f"You are using a model of type {config_dict['model_type']} to instantiate a model of type "
f"{cls.model_type}. This is not supported for all configurations of models and can yield errors."
)

return cls.from_dict(config_dict, **kwargs)

Expand Down
9 changes: 4 additions & 5 deletions src/transformers/generation/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -1452,11 +1452,10 @@ def _prepare_generated_length(
):
generation_config.max_length -= inputs_tensor.shape[1]
elif has_default_max_length: # by default let's always generate 20 new tokens
if generation_config.max_length == GenerationConfig().max_length:
generation_config.max_length = generation_config.max_length + input_ids_length
max_position_embeddings = getattr(self.config, "max_position_embeddings", None)
if max_position_embeddings is not None:
generation_config.max_length = min(generation_config.max_length, max_position_embeddings)
generation_config.max_length = generation_config.max_length + input_ids_length
max_position_embeddings = getattr(self.config, "max_position_embeddings", None)
if max_position_embeddings is not None:
generation_config.max_length = min(generation_config.max_length, max_position_embeddings)

# same for min length
if generation_config.min_new_tokens is not None:
Expand Down
11 changes: 11 additions & 0 deletions src/transformers/modeling_gguf_pytorch_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -106,6 +106,17 @@ def load_gguf_checkpoint(gguf_checkpoint_path, return_tensors=False):
if "qwen2moe" in architecture:
updated_architecture = "qwen2_moe"

# For stablelm architecture, we need to set qkv_bias and use_parallel_residual from tensors
# If `qkv_bias=True`, qkv_proj with bias will be present in the tensors
# If `use_parallel_residual=False`, ffn_norm will be present in the tensors
if "stablelm" in architecture:
attn_bias_name = {"attn_q.bias", "attn_k.bias", "attn_v.bias"}
ffn_norm_name = "ffn_norm"
qkv_bias = any(bias_name in tensor.name for tensor in reader.tensors for bias_name in attn_bias_name)
use_parallel_residual = any(ffn_norm_name in tensor.name for tensor in reader.tensors)
parsed_parameters["config"]["qkv_bias"] = qkv_bias
parsed_parameters["config"]["use_parallel_residual"] = not use_parallel_residual

model_size = ""
# extract the number of params from file name as architectures can differ ;
# eg. for falcon : `...falcon-7b-...`
Expand Down
33 changes: 23 additions & 10 deletions src/transformers/modeling_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -136,6 +136,7 @@


_init_weights = True
_is_quantized = False


def is_fsdp_enabled():
Expand Down Expand Up @@ -213,6 +214,16 @@ def _skip_init(*args, **kwargs):
setattr(torch.nn.init, name, init_func)


@contextmanager
def set_quantized_state():
global _is_quantized
_is_quantized = True
try:
yield
finally:
_is_quantized = False


def get_parameter_device(parameter: Union[nn.Module, "ModuleUtilsMixin"]):
try:
return next(parameter.parameters()).device
Expand Down Expand Up @@ -1531,7 +1542,7 @@ def _from_config(cls, config, **kwargs):
torch_dtype=torch_dtype,
)

if is_deepspeed_zero3_enabled():
if is_deepspeed_zero3_enabled() and not _is_quantized:
import deepspeed

logger.info("Detected DeepSpeed ZeRO-3: activating zero.init() for this model")
Expand Down Expand Up @@ -1597,15 +1608,14 @@ def _autoset_attn_implementation(
# Below we check if a config is composite and manually prepare a dict of attn impl if not already passed as a dict.
# Later each sub-module will dispatch with its own attn impl, by calling `XXXModel._from_config(config.text_config)`
# If any of sub-modules doesn't support requested attn, an error will be raised. See https://github.com/huggingface/transformers/pull/32238
for key in config:
if isinstance(getattr(config, key), PretrainedConfig):
sub_config = getattr(config, key)
curr_attn_implementation = (
requested_attn_implementation
if not isinstance(requested_attn_implementation, dict)
else requested_attn_implementation.get(key, None)
)
sub_config._attn_implementation_internal = curr_attn_implementation
for key in config.sub_configs.keys():
sub_config = getattr(config, key)
curr_attn_implementation = (
requested_attn_implementation
if not isinstance(requested_attn_implementation, dict)
else requested_attn_implementation.get(key, None)
)
sub_config._attn_implementation_internal = curr_attn_implementation

if use_flash_attention_2:
logger.warning_once(
Expand Down Expand Up @@ -4086,6 +4096,9 @@ def from_pretrained(
)
init_contexts.append(init_empty_weights())

if is_deepspeed_zero3_enabled() and is_quantized:
init_contexts.append(set_quantized_state())

config = copy.deepcopy(config) # We do not want to modify the config inplace in from_pretrained.
if not getattr(config, "_attn_implementation_autoset", False):
config = cls._autoset_attn_implementation(
Expand Down
Loading
Loading