Add TextNet #34979

Open: wants to merge 160 commits into base: main

Commits (160)
c63915f
WIP
raghavanone Oct 5, 2023
d8e1bc6
Add config and modeling for Fast model
raghavanone Oct 7, 2023
185603e
Refactor modeling and add tests
raghavanone Oct 8, 2023
5d21171
More changes
raghavanone Oct 11, 2023
a6e1cfd
WIP
raghavanone Oct 13, 2023
a8e4320
Add tests
raghavanone Oct 14, 2023
3b15aa9
Add conversion script
raghavanone Oct 15, 2023
c565cf3
Add conversion scripts, integration tests, image processor
raghavanone Oct 20, 2023
0457e74
Fix style and copies
raghavanone Oct 31, 2023
3fef261
Add fast model to init
raghavanone Oct 31, 2023
597abe1
Add fast model in docs and other places
raghavanone Oct 31, 2023
c3b43e7
Fix import of cv2
raghavanone Oct 31, 2023
4903a69
Rename image processing method
raghavanone Oct 31, 2023
c391cf6
Fix build
raghavanone Oct 31, 2023
d3bf608
Fix Build
raghavanone Nov 1, 2023
13ea2bb
fix style and fix copies
raghavanone Nov 1, 2023
1abfbc0
Fix build
raghavanone Nov 1, 2023
f85fbda
Fix build
raghavanone Nov 1, 2023
cd0b45f
Fix Build
raghavanone Nov 1, 2023
6005f2f
Clean up docstrings
raghavanone Nov 1, 2023
e56bff7
Fix Build
raghavanone Nov 1, 2023
ac672f3
Fix Build
raghavanone Nov 1, 2023
aa1cc41
Fix Build
raghavanone Nov 1, 2023
90e0cd8
Fix build
raghavanone Nov 1, 2023
c94fc70
Add test for image_processing_fast and add documentation tests
raghavanone Nov 1, 2023
47409eb
some refactorings
raghavanone Nov 1, 2023
6b787d6
Fix failing tests
raghavanone Nov 5, 2023
134f4cc
Incorporate PR feedbacks
raghavanone Nov 5, 2023
5b9608b
Incorporate PR feedbacks
raghavanone Nov 5, 2023
344dc6e
Incorporate PR feedbacks
raghavanone Nov 5, 2023
932d592
Incorporate PR feedbacks
raghavanone Nov 5, 2023
5f1af19
Incorporate PR feedbacks
raghavanone Nov 5, 2023
c9a3543
Introduce TextNet
raghavanone Nov 8, 2023
12941e6
Fix failures
raghavanone Nov 8, 2023
30568ef
Refactor textnet model
raghavanone Nov 8, 2023
1f99e84
Fix failures
raghavanone Nov 8, 2023
02e85ed
Add cv2 to setup
raghavanone Nov 8, 2023
632ef06
Fix failures
raghavanone Nov 8, 2023
8c25e47
Fix failures
raghavanone Nov 8, 2023
1537643
Add CV2 dependency
raghavanone Nov 8, 2023
9718ca1
Fix bugs
raghavanone Nov 8, 2023
26c7542
Fix build issue
raghavanone Nov 8, 2023
ed85312
Fix failures
raghavanone Nov 8, 2023
3f8be4d
Remove textnet from modeling fast
raghavanone Nov 9, 2023
45ebd1e
Fix build and other things
raghavanone Nov 9, 2023
5c6dbaf
Fix build
raghavanone Nov 9, 2023
643ccac
some cleanups
raghavanone Nov 9, 2023
d50df43
some cleanups
raghavanone Nov 9, 2023
6acd3ba
Some more cleanups
raghavanone Nov 9, 2023
85c128a
Fix build
raghavanone Nov 9, 2023
c22ba88
Incorporate PR feedbacks
raghavanone Nov 9, 2023
2ee0440
More cleanup
raghavanone Nov 9, 2023
04d761d
More cleanup
raghavanone Nov 9, 2023
2572461
More cleanup
raghavanone Nov 9, 2023
c576f4a
Fix build
raghavanone Nov 9, 2023
5d58c67
Remove all the references of fast model
raghavanone Nov 10, 2023
cbf6c81
More cleanup
raghavanone Nov 10, 2023
1db7bd9
Fix build
raghavanone Nov 10, 2023
bb4ac61
Incorporate PR feedbacks
raghavanone Nov 14, 2023
25b5063
Incorporate PR feedbacks
raghavanone Nov 14, 2023
268d3e8
Incorporate PR feedbacks
raghavanone Nov 14, 2023
9003563
Incorporate PR feedbacks
raghavanone Nov 14, 2023
5e2128c
Incorporate PR feedbacks
raghavanone Nov 15, 2023
aa3a8f0
Incorporate PR feedbacks
raghavanone Nov 16, 2023
0d518b0
Incorporate PR feedbacks
raghavanone Nov 16, 2023
89110a0
Incorporate PR feedbacks
raghavanone Nov 16, 2023
07ab3a2
Incorporate PR feedbacks
raghavanone Nov 16, 2023
66f9d5d
Incorporate PR feedbacks
raghavanone Nov 16, 2023
3c09b69
Fix Build
raghavanone Nov 16, 2023
1698739
Fix build
raghavanone Nov 16, 2023
6a47b12
Fix build
raghavanone Nov 16, 2023
744e157
Fix build
raghavanone Nov 16, 2023
93ad4a2
Fix build
raghavanone Nov 17, 2023
bb01d85
Fix build
raghavanone Nov 17, 2023
e9aafe9
Incorporate PR feedbacks
raghavanone Nov 21, 2023
c96a327
Fix style
raghavanone Nov 21, 2023
c1f33d5
Fix build
raghavanone Nov 21, 2023
532ea29
Incorporate PR feedbacks
raghavanone Nov 21, 2023
0662b12
Fix image processing mean and std
raghavanone Nov 21, 2023
b18822d
Incorporate PR feedbacks
raghavanone Nov 21, 2023
78afd4f
fix build failure
raghavanone Nov 21, 2023
7a59eb5
Add assertion to image processor
raghavanone Nov 22, 2023
e2ec8aa
Incorporate PR feedbacks
raghavanone Nov 23, 2023
1e4ed58
Incorporate PR feedbacks
raghavanone Nov 23, 2023
19368c7
fix style failures
raghavanone Nov 23, 2023
57fedc0
fix build
raghavanone Nov 23, 2023
dc0b360
Fix Imageclassification's linear layer, also introduce TextNetImagePr…
raghavanone Nov 24, 2023
6a6c45e
Fix build
raghavanone Nov 24, 2023
fca92a9
Fix build
raghavanone Nov 24, 2023
138e8e1
Fix build
raghavanone Nov 24, 2023
e3fe76e
Fix build
raghavanone Nov 24, 2023
1a286d5
Incorporate PR feedbacks
raghavanone Nov 25, 2023
e194612
Incorporate PR feedbacks
raghavanone Nov 25, 2023
74881cf
Fix build
raghavanone Nov 25, 2023
a3c9636
Incorporate PR feedbacks
raghavanone Nov 27, 2023
d40c57b
Remove some script
raghavanone Nov 27, 2023
66da2a0
Incorporate PR feedbacks
raghavanone Nov 27, 2023
44f7f90
Incorporate PR feedbacks
raghavanone Nov 27, 2023
f9da25c
Incorporate PR feedbacks
raghavanone Nov 27, 2023
f94fcec
Incorporate PR feedbacks
raghavanone Nov 27, 2023
ef4b3b0
Fix image processing in textnet
raghavanone Nov 27, 2023
bd09746
Incorporate PR Feedbacks
raghavanone Jan 10, 2024
a3427d7
Fix CI failures
raghavanone Jan 16, 2024
e3916ad
Fix failing test
raghavanone Jan 16, 2024
b46875c
Fix failing test
raghavanone Jan 16, 2024
20bb82d
Fix failing test
raghavanone Jan 16, 2024
344455c
Fix failing test
raghavanone Jan 16, 2024
28892e3
Fix failing test
raghavanone Jan 16, 2024
a7b238b
Fix failing test
raghavanone Jan 16, 2024
1bce14f
Add textnet to readme
raghavanone Jan 30, 2024
9a6efa0
Improve readability
raghavanone Jan 30, 2024
ea6c3d8
Incorporate PR feedbacks
raghavanone Feb 1, 2024
43247b6
fix code style
raghavanone Feb 1, 2024
c56a220
adding textnet
Nov 27, 2024
73e3718
fix key error and convert working
jadechoghari Nov 27, 2024
c22534a
tvlt shouldn't be here
jadechoghari Nov 27, 2024
35f2f56
fix test modeling test
jadechoghari Nov 28, 2024
222e20d
Fix tests, make fixup
NielsRogge Dec 3, 2024
253a70e
Merge remote-tracking branch 'upstream/main' into textnet
NielsRogge Dec 3, 2024
01e8729
Make fixup
NielsRogge Dec 3, 2024
15dc1a2
Make fixup
NielsRogge Dec 4, 2024
c155ce5
Merge remote-tracking branch 'upstream/main' into textnet
NielsRogge Dec 4, 2024
adbac84
Remove TEXTNET_PRETRAINED_MODEL_ARCHIVE_LIST
NielsRogge Dec 4, 2024
77efc55
Merge branch 'main' into textnet
NielsRogge Dec 4, 2024
29a45fa
improve type annotation
jadechoghari Dec 4, 2024
3956844
Update tests/models/textnet/test_image_processing_textnet.py
jadechoghari Dec 4, 2024
2ce19c3
improve type annotation
jadechoghari Dec 4, 2024
01e838e
space typo
jadechoghari Dec 4, 2024
7d2bfb7
improve type annotation
jadechoghari Dec 4, 2024
2a12e66
Update src/transformers/models/textnet/configuration_textnet.py
jadechoghari Dec 4, 2024
5e234e1
make conv layer kernel sizes and strides default to None
Dec 5, 2024
a8ecd96
Update src/transformers/models/textnet/modeling_textnet.py
jadechoghari Dec 5, 2024
7988497
Update src/transformers/models/textnet/modeling_textnet.py
jadechoghari Dec 5, 2024
e457a59
fix keyword bug
jadechoghari Dec 5, 2024
596458a
add batch init and make fixup
jadechoghari Dec 5, 2024
c61704a
Merge branch 'main' into textnet
NielsRogge Dec 7, 2024
b0c4f60
Make fixup
NielsRogge Dec 7, 2024
4526c99
Update integration test
NielsRogge Dec 7, 2024
979bfae
Add figure
NielsRogge Dec 8, 2024
ebc7d00
Update textnet.md
jadechoghari Dec 8, 2024
ca8b7c0
add testing and fix errors (classification, imgprocess)
Dec 19, 2024
d2f917e
fix error check
jadechoghari Dec 19, 2024
297bcfc
make fixup
jadechoghari Dec 19, 2024
7194556
make fixup
jadechoghari Dec 19, 2024
a065bb7
revert to original docstring
jadechoghari Dec 19, 2024
a7bd6f3
add make style
jadechoghari Dec 19, 2024
7e52457
Merge branch 'main' into textnet
jadechoghari Dec 19, 2024
5493ade
remove conflict for now
jadechoghari Dec 19, 2024
c022a22
Update modeling_auto.py
jadechoghari Dec 19, 2024
5f29454
Update tests/models/textnet/test_modeling_textnet.py
jadechoghari Dec 21, 2024
ed36fef
Update src/transformers/models/textnet/modeling_textnet.py
jadechoghari Dec 21, 2024
5f2b970
Update tests/models/textnet/test_modeling_textnet.py
jadechoghari Dec 21, 2024
e8a5b97
Update src/transformers/models/textnet/modeling_textnet.py
jadechoghari Dec 21, 2024
29d969b
add changes
jadechoghari Dec 21, 2024
a4d14fe
Update textnet.md
jadechoghari Dec 21, 2024
b5a57c0
Merge branch 'main' into textnet
jadechoghari Dec 21, 2024
d0b7028
Merge branch 'main' into textnet
jadechoghari Dec 23, 2024
52bcd2f
add doc
jadechoghari Dec 23, 2024
ef8f800
Merge branch 'main' into textnet
jadechoghari Dec 24, 2024
200a03c
add authors hf ckpt + rename
jadechoghari Dec 24, 2024
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -719,6 +719,8 @@
title: Swin2SR
- local: model_doc/table-transformer
title: Table Transformer
- local: model_doc/textnet
title: TextNet
- local: model_doc/timm_wrapper
title: Timm Wrapper
- local: model_doc/upernet
1 change: 1 addition & 0 deletions docs/source/en/index.md
@@ -325,6 +325,7 @@ Flax), PyTorch, and/or TensorFlow.
| [Table Transformer](model_doc/table-transformer) | ✅ | ❌ | ❌ |
| [TAPAS](model_doc/tapas) | ✅ | ✅ | ❌ |
| [TAPEX](model_doc/tapex) | ✅ | ✅ | ✅ |
| [TextNet](model_doc/textnet) | ✅ | ❌ | ❌ |
| [Time Series Transformer](model_doc/time_series_transformer) | ✅ | ❌ | ❌ |
| [TimeSformer](model_doc/timesformer) | ✅ | ❌ | ❌ |
| [TimmWrapperModel](model_doc/timm_wrapper) | ✅ | ❌ | ❌ |
49 changes: 49 additions & 0 deletions docs/source/en/model_doc/textnet.md
@@ -0,0 +1,49 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# TextNet

## Overview

The TextNet model was proposed in [FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation](https://arxiv.org/abs/2111.02394) by Zhe Chen, Jiahao Wang, Wenhai Wang, Guo Chen, Enze Xie, Ping Luo, Tong Lu. TextNet is a vision backbone designed for text detection. It was found through neural architecture search (NAS) over candidate backbones, using text-detection performance as the search reward so that the resulting features are well suited to text detection.

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/fast_architecture.png"
alt="drawing" width="600"/>

<small> TextNet backbone as part of FAST. Taken from the <a href="https://arxiv.org/abs/2111.02394">original paper.</a> </small>

This model was contributed by [Raghavan](https://huggingface.co/Raghavan), [jadechoghari](https://huggingface.co/jadechoghari) and [nielsr](https://huggingface.co/nielsr).


## TextNetConfig

[[autodoc]] TextNetConfig

## TextNetImageProcessor

[[autodoc]] TextNetImageProcessor
- preprocess

## TextNetModel

[[autodoc]] TextNetModel
- forward

## TextNetForImageClassification

[[autodoc]] TextNetForImageClassification
- forward

18 changes: 18 additions & 0 deletions src/transformers/__init__.py
@@ -788,6 +788,7 @@
"TapasConfig",
"TapasTokenizer",
],
"models.textnet": ["TextNetConfig"],
"models.time_series_transformer": ["TimeSeriesTransformerConfig"],
"models.timesformer": ["TimesformerConfig"],
"models.timm_backbone": ["TimmBackboneConfig"],
@@ -1257,6 +1258,7 @@
_import_structure["models.siglip"].append("SiglipImageProcessor")
_import_structure["models.superpoint"].extend(["SuperPointImageProcessor"])
_import_structure["models.swin2sr"].append("Swin2SRImageProcessor")
_import_structure["models.textnet"].extend(["TextNetImageProcessor"])
_import_structure["models.tvp"].append("TvpImageProcessor")
_import_structure["models.video_llava"].append("VideoLlavaImageProcessor")
_import_structure["models.videomae"].extend(["VideoMAEFeatureExtractor", "VideoMAEImageProcessor"])
@@ -3573,6 +3575,14 @@
"load_tf_weights_in_tapas",
]
)
_import_structure["models.textnet"].extend(
[
"TextNetBackbone",
"TextNetForImageClassification",
"TextNetModel",
"TextNetPreTrainedModel",
]
)
_import_structure["models.time_series_transformer"].extend(
[
"TimeSeriesTransformerForPrediction",
@@ -5801,6 +5811,7 @@
TapasConfig,
TapasTokenizer,
)
from .models.textnet import TextNetConfig
from .models.time_series_transformer import (
TimeSeriesTransformerConfig,
)
@@ -6281,6 +6292,7 @@
from .models.siglip import SiglipImageProcessor
from .models.superpoint import SuperPointImageProcessor
from .models.swin2sr import Swin2SRImageProcessor
from .models.textnet import TextNetImageProcessor
from .models.tvp import TvpImageProcessor
from .models.video_llava import VideoLlavaImageProcessor
from .models.videomae import VideoMAEFeatureExtractor, VideoMAEImageProcessor
@@ -8135,6 +8147,12 @@
TapasPreTrainedModel,
load_tf_weights_in_tapas,
)
from .models.textnet import (
TextNetBackbone,
TextNetForImageClassification,
TextNetModel,
TextNetPreTrainedModel,
)
from .models.time_series_transformer import (
TimeSeriesTransformerForPrediction,
TimeSeriesTransformerModel,
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
@@ -251,6 +251,7 @@
t5,
table_transformer,
tapas,
textnet,
time_series_transformer,
timesformer,
timm_backbone,
2 changes: 2 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
@@ -278,6 +278,7 @@
("t5", "T5Config"),
("table-transformer", "TableTransformerConfig"),
("tapas", "TapasConfig"),
("textnet", "TextNetConfig"),
("time_series_transformer", "TimeSeriesTransformerConfig"),
("timesformer", "TimesformerConfig"),
("timm_backbone", "TimmBackboneConfig"),
@@ -608,6 +609,7 @@
("table-transformer", "Table Transformer"),
("tapas", "TAPAS"),
("tapex", "TAPEX"),
("textnet", "TextNet"),
("time_series_transformer", "Time Series Transformer"),
("timesformer", "TimeSformer"),
("timm_backbone", "TimmBackbone"),
3 changes: 3 additions & 0 deletions src/transformers/models/auto/modeling_auto.py
@@ -256,6 +256,7 @@
("t5", "T5Model"),
("table-transformer", "TableTransformerModel"),
("tapas", "TapasModel"),
("textnet", "TextNetModel"),
("time_series_transformer", "TimeSeriesTransformerModel"),
("timesformer", "TimesformerModel"),
("timm_backbone", "TimmBackbone"),
@@ -701,6 +702,7 @@
("swiftformer", "SwiftFormerForImageClassification"),
("swin", "SwinForImageClassification"),
("swinv2", "Swinv2ForImageClassification"),
("textnet", "TextNetForImageClassification"),
("timm_wrapper", "TimmWrapperForImageClassification"),
("van", "VanForImageClassification"),
("vit", "ViTForImageClassification"),
@@ -1386,6 +1388,7 @@
("rt_detr_resnet", "RTDetrResNetBackbone"),
("swin", "SwinBackbone"),
("swinv2", "Swinv2Backbone"),
("textnet", "TextNetBackbone"),
("timm_backbone", "TimmBackbone"),
("vitdet", "VitDetBackbone"),
]
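The three auto-mapping entries above register TextNet under the model type `"textnet"` for the base-model, image-classification and backbone tasks. A minimal sketch of the lookup idea, assuming plain dictionaries in place of the library's lazy auto-mapping objects: the `model_type` string declared by `TextNetConfig` is the key that resolves to a class name per task. `MAPPINGS` and `resolve_class_name` are hypothetical illustration names, not transformers APIs.

```python
from collections import OrderedDict

# Hypothetical stand-ins for the three mapping tables touched in this diff.
MAPPINGS = {
    "base": OrderedDict([("tapas", "TapasModel"), ("textnet", "TextNetModel")]),
    "image-classification": OrderedDict(
        [("swinv2", "Swinv2ForImageClassification"), ("textnet", "TextNetForImageClassification")]
    ),
    "backbone": OrderedDict([("swinv2", "Swinv2Backbone"), ("textnet", "TextNetBackbone")]),
}


def resolve_class_name(model_type: str, task: str) -> str:
    """Return the class name registered for `model_type` under `task`."""
    mapping = MAPPINGS[task]
    if model_type not in mapping:
        raise ValueError(f"No {task} class registered for model type {model_type!r}")
    return mapping[model_type]


print(resolve_class_name("textnet", "backbone"))  # TextNetBackbone
```

The key must match `model_type = "textnet"` in the configuration class exactly; a mismatch would leave the auto classes unable to find TextNet.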
74 changes: 74 additions & 0 deletions src/transformers/models/textnet/__init__.py
@@ -0,0 +1,74 @@
# coding=utf-8
# Copyright 2024 the Fast authors and HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available, is_vision_available


_import_structure = {
    "configuration_textnet": ["TextNetConfig"],
}

try:
    if not is_vision_available():
        raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
    pass
else:
    _import_structure["image_processing_textnet"] = ["TextNetImageProcessor"]

try:
    if not is_torch_available():
        raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
    pass
else:
    _import_structure["modeling_textnet"] = [
        "TextNetBackbone",
        "TextNetModel",
        "TextNetPreTrainedModel",
        "TextNetForImageClassification",
    ]


if TYPE_CHECKING:
    from .configuration_textnet import TextNetConfig

    try:
        if not is_vision_available():
            raise OptionalDependencyNotAvailable()
    except OptionalDependencyNotAvailable:
        pass
    else:
        from .image_processing_textnet import TextNetImageProcessor

    try:
        if not is_torch_available():
            raise OptionalDependencyNotAvailable()
    except OptionalDependencyNotAvailable:
        pass
    else:
        from .modeling_textnet import (
            TextNetBackbone,
            TextNetForImageClassification,
            TextNetModel,
            TextNetPreTrainedModel,
        )

else:
    import sys

    sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)
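The `try`/`except OptionalDependencyNotAvailable` guards in this `__init__.py` always export the configuration, export the image processor only when vision dependencies are installed, and export the modeling classes only when PyTorch is installed. A simplified sketch of that gating, assuming availability is approximated with `importlib.util.find_spec` and that Pillow (`PIL`) stands in for the vision dependency; the library's `is_vision_available`/`is_torch_available` helpers do more than this. `is_installed` and `build_import_structure` are hypothetical helpers.

```python
import importlib.util


def is_installed(package: str) -> bool:
    # Simplified availability check: only asks whether the package can be located.
    return importlib.util.find_spec(package) is not None


def build_import_structure(vision_available: bool, torch_available: bool) -> dict:
    """Mirror the textnet __init__ gating: config always, the rest only if deps exist."""
    structure = {"configuration_textnet": ["TextNetConfig"]}
    if vision_available:
        structure["image_processing_textnet"] = ["TextNetImageProcessor"]
    if torch_available:
        structure["modeling_textnet"] = [
            "TextNetBackbone",
            "TextNetModel",
            "TextNetPreTrainedModel",
            "TextNetForImageClassification",
        ]
    return structure


structure = build_import_structure(is_installed("PIL"), is_installed("torch"))
print(sorted(structure))  # depends on what is installed in the current environment
```

Because only names listed in the structure are handed to the lazy module, a missing optional dependency removes the corresponding exports instead of failing at `import transformers` time.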
135 changes: 135 additions & 0 deletions src/transformers/models/textnet/configuration_textnet.py
@@ -0,0 +1,135 @@
# coding=utf-8
# Copyright 2024 the Fast authors and HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""TextNet model configuration"""

from transformers import PretrainedConfig
from transformers.utils import logging
from transformers.utils.backbone_utils import BackboneConfigMixin, get_aligned_output_features_output_indices


logger = logging.get_logger(__name__)


class TextNetConfig(BackboneConfigMixin, PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`TextNetModel`]. It is used to instantiate a
    TextNet model according to the specified arguments, defining the model architecture. Instantiating a configuration
    with the defaults will yield a similar configuration to that of the
    [czczup/textnet-base](https://huggingface.co/czczup/textnet-base). Configuration objects inherit from
    [`PretrainedConfig`] and can be used to control the model outputs. Read the documentation from [`PretrainedConfig`]
    for more information.

    Args:
        stem_kernel_size (`int`, *optional*, defaults to 3):
            The kernel size for the initial convolution layer.
        stem_stride (`int`, *optional*, defaults to 2):
            The stride for the initial convolution layer.
        stem_num_channels (`int`, *optional*, defaults to 3):
            The number of input channels for the initial convolution layer.
        stem_out_channels (`int`, *optional*, defaults to 64):
            The number of output channels for the initial convolution layer.
        stem_act_func (`str`, *optional*, defaults to `"relu"`):
            The activation function for the initial convolution layer.
        image_size (`Tuple[int, int]`, *optional*, defaults to `[640, 640]`):
            The size (resolution) of each image.
        conv_layer_kernel_sizes (`List[List[List[int]]]`, *optional*):
            A list of stage-wise kernel sizes. If `None`, defaults to:
            `[[[3, 3], [3, 3], [3, 3]], [[3, 3], [1, 3], [3, 3], [3, 1]], [[3, 3], [3, 3], [3, 1], [1, 3]], [[3, 3], [3, 1], [1, 3], [3, 3]]]`.
        conv_layer_strides (`List[List[int]]`, *optional*):
            A list of stage-wise strides. If `None`, defaults to:
            `[[1, 2, 1], [2, 1, 1, 1], [2, 1, 1, 1], [2, 1, 1, 1]]`.
        hidden_sizes (`List[int]`, *optional*, defaults to `[64, 64, 128, 256, 512]`):
            Dimensionality (hidden size) at each stage.
        batch_norm_eps (`float`, *optional*, defaults to 1e-05):
            The epsilon used by the batch normalization layers.
        initializer_range (`float`, *optional*, defaults to 0.02):
            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
        out_features (`List[str]`, *optional*):
            If used as backbone, list of features to output. Can be any of `"stem"`, `"stage1"`, `"stage2"`, etc.
            (depending on how many stages the model has). If unset and `out_indices` is set, will default to the
            corresponding stages. If unset and `out_indices` is unset, will default to the last stage.
        out_indices (`List[int]`, *optional*):
            If used as backbone, list of indices of features to output. Can be any of 0, 1, 2, etc. (depending on how
            many stages the model has). If unset and `out_features` is set, will default to the corresponding stages.
            If unset and `out_features` is unset, will default to the last stage.

    Examples:

    ```python
    >>> from transformers import TextNetConfig, TextNetBackbone

    >>> # Initializing a TextNetConfig
    >>> configuration = TextNetConfig()

    >>> # Initializing a model (with random weights)
    >>> model = TextNetBackbone(configuration)

    >>> # Accessing the model configuration
    >>> configuration = model.config
    ```"""

    model_type = "textnet"

    def __init__(
        self,
        stem_kernel_size=3,
        stem_stride=2,
        stem_num_channels=3,
        stem_out_channels=64,
        stem_act_func="relu",
        image_size=[640, 640],
        conv_layer_kernel_sizes=None,
        conv_layer_strides=None,
        hidden_sizes=[64, 64, 128, 256, 512],
        batch_norm_eps=1e-5,
        initializer_range=0.02,
        out_features=None,
        out_indices=None,
        **kwargs,
    ):
        super().__init__(**kwargs)

        if conv_layer_kernel_sizes is None:
            conv_layer_kernel_sizes = [
                [[3, 3], [3, 3], [3, 3]],
                [[3, 3], [1, 3], [3, 3], [3, 1]],
                [[3, 3], [3, 3], [3, 1], [1, 3]],
                [[3, 3], [3, 1], [1, 3], [3, 3]],
            ]
        if conv_layer_strides is None:
            conv_layer_strides = [[1, 2, 1], [2, 1, 1, 1], [2, 1, 1, 1], [2, 1, 1, 1]]

        self.stem_kernel_size = stem_kernel_size
        self.stem_stride = stem_stride
        self.stem_num_channels = stem_num_channels
        self.stem_out_channels = stem_out_channels
        self.stem_act_func = stem_act_func

        self.image_size = image_size
        self.conv_layer_kernel_sizes = conv_layer_kernel_sizes
        self.conv_layer_strides = conv_layer_strides

        self.initializer_range = initializer_range
        self.hidden_sizes = hidden_sizes
        self.batch_norm_eps = batch_norm_eps

        self.depths = [len(layer) for layer in self.conv_layer_kernel_sizes]
        # One stage name per entry in conv_layer_kernel_sizes, so custom stage counts stay consistent.
        self.stage_names = ["stem"] + [f"stage{idx}" for idx in range(1, len(self.depths) + 1)]
        self._out_features, self._out_indices = get_aligned_output_features_output_indices(
            out_features=out_features, out_indices=out_indices, stage_names=self.stage_names
        )
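With the defaults above, the stage names, per-stage depths and cumulative downsampling can be derived from `conv_layer_kernel_sizes`, `conv_layer_strides` and `stem_stride` alone. A standalone sketch under stated assumptions: the stem is applied first, each strided layer maps a spatial size n to ceil(n / stride) ("same"-style padding), and `align_outputs` is a simplified reading of the default rules in the `out_features`/`out_indices` docstrings, not the library's `get_aligned_output_features_output_indices`. `stage_geometry` and `align_outputs` are hypothetical helpers.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Defaults copied from TextNetConfig.__init__ above.
DEFAULT_KERNEL_SIZES = [
    [[3, 3], [3, 3], [3, 3]],
    [[3, 3], [1, 3], [3, 3], [3, 1]],
    [[3, 3], [3, 3], [3, 1], [1, 3]],
    [[3, 3], [3, 1], [1, 3], [3, 3]],
]
DEFAULT_STRIDES = [[1, 2, 1], [2, 1, 1, 1], [2, 1, 1, 1], [2, 1, 1, 1]]


def stage_geometry(
    image_size: Tuple[int, int] = (640, 640),
    stem_stride: int = 2,
    kernel_sizes: Sequence[Sequence[Sequence[int]]] = DEFAULT_KERNEL_SIZES,
    strides: Sequence[Sequence[int]] = DEFAULT_STRIDES,
) -> Dict[str, Tuple[int, int, int]]:
    """Map each stage name to (cumulative stride, height, width).

    Assumes every strided layer maps a spatial size n to ceil(n / stride).
    """
    depths = [len(stage) for stage in kernel_sizes]  # same rule as TextNetConfig.depths
    names = ["stem"] + [f"stage{idx}" for idx in range(1, len(depths) + 1)]
    total = stem_stride
    height = math.ceil(image_size[0] / stem_stride)
    width = math.ceil(image_size[1] / stem_stride)
    geometry = {"stem": (total, height, width)}
    for name, stage_strides in zip(names[1:], strides):
        for stride in stage_strides:
            total *= stride
            height, width = math.ceil(height / stride), math.ceil(width / stride)
        geometry[name] = (total, height, width)
    return geometry


def align_outputs(
    stage_names: List[str],
    out_features: Optional[List[str]] = None,
    out_indices: Optional[List[int]] = None,
) -> Tuple[List[str], List[int]]:
    """Simplified default rules from the out_features/out_indices docstrings."""
    if out_features is None and out_indices is None:
        # Neither given: default to the last stage.
        return [stage_names[-1]], [len(stage_names) - 1]
    if out_features is None:
        return [stage_names[idx] for idx in out_indices], list(out_indices)
    if out_indices is None:
        return list(out_features), [stage_names.index(name) for name in out_features]
    if [stage_names[idx] for idx in out_indices] != list(out_features):
        raise ValueError("out_features and out_indices must refer to the same stages")
    return list(out_features), list(out_indices)


geometry = stage_geometry()
for name, (stride, height, width) in geometry.items():
    print(f"{name}: stride {stride}, {height}x{width}")
# stem: stride 2, 320x320
# stage1: stride 4, 160x160
# stage2: stride 8, 80x80
# stage3: stride 16, 40x40
# stage4: stride 32, 20x20
print(align_outputs(list(geometry), out_indices=[2, 3, 4]))
# (['stage2', 'stage3', 'stage4'], [2, 3, 4])
```

Under these assumptions, the default 640x640 input yields feature maps at strides 4 to 32 across the four stages, which is the multi-scale pyramid a text-detection head would typically consume via `out_indices`.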