[WIP] Improve multimodal processors - rely less on kwargs #28711
Draft · molbap wants to merge 36 commits from molbap:improve_multimodal_processors into huggingface:main (base: main)
Commits (36, all by molbap):
- 42ecf48 expand kwargs from align
- ccb2147 remove kwargs from altclip processor
- f999e0c add explicit args for donut processor
- 8fb3a6b add explicit call to current processor for in context manager
- a90c766 format
- 49cb6cc remove unused kwargs
- 3ac1c7e move conditions for encodings
- 7a819fd improve flow over text/image
- 9cc38b7 [breaking] pass explicit args to bridgetower
- ff6a950 Merge branch 'main' into improve_multimodal_processors
- 7db64a0 add default kwargs for BC
- 41674d9 fix bridgetower
- 618a687 debug bridgetower image proc
- f39cdc1 format
- 9a6f97d move kwargs message to info level
- 380f82f add debug messages
- 75f15d3 fix arguments not being passed in bridgetower
- 3df5faa keep backwards compat for processing + modify testing args dict
- 5ad0694 Merge branch 'main' into improve_multimodal_processors
- 69e5a2d fix quality
- 68c2f40 log kwargs mismatch to info level
- e1e4084 fix quality
- bfa81e5 Merge branch 'main' into improve_multimodal_processors
- 4b557b0 address comments
- b7fc377 fix typo
- 270bb9e fix expected tests for bridgetower
- 94a1b75 fix conflicts
- 6603bf0 Merge branch 'main' into improve_multimodal_processors
- 004c961 fix valid processor keys
- c2e49f5 remove unused arg list
- 79958b5 quality
- a36f524 Merge branch 'main' into improve_multimodal_processors
- 3238dd3 skeleton draft - uniform processor call
- 3afde22 fix quality
- eb99e29 add broken wav2vec audio processing
- c6afd63 Merge branch 'main' into improve_multimodal_processors
Diff of `__call__`:

```diff
@@ -78,33 +78,10 @@ def __call__(
         if images is None and text is None:
             raise ValueError("You have to specify either images or text.")

-        # Get only text
-        if images is None:
-            self.current_processor = self.tokenizer
-            text_encoding = self.tokenizer(
-                text=text,
-                add_special_tokens=add_special_tokens,
-                padding=padding,
-                truncation=truncation,
-                max_length=max_length,
-                stride=stride,
-                pad_to_multiple_of=pad_to_multiple_of,
-                return_attention_mask=return_attention_mask,
-                return_overflowing_tokens=return_overflowing_tokens,
-                return_special_tokens_mask=return_special_tokens_mask,
-                return_offsets_mapping=return_offsets_mapping,
-                return_token_type_ids=return_token_type_ids,
-                return_length=return_length,
-                verbose=verbose,
-                return_tensors=return_tensors,
-                **kwargs,
-            )
-            return text_encoding
-
-        # add pixel_values
-        encoding_image_processor = self.image_processor(images, return_tensors=return_tensors)
-        text_encoding = None
-
         if text is not None:
             self.current_processor = self.tokenizer
             text_encoding = self.tokenizer(
                 text=text,
                 add_special_tokens=add_special_tokens,
@@ -123,13 +100,16 @@ def __call__(
                 return_tensors=return_tensors,
                 **kwargs,
             )
+        else:
+            text_encoding = None

-        if text_encoding is not None:
-            encoding_image_processor.update(text_encoding)
+        # add pixel_values encoding. If we also have text_encoding, update image encoding and return it.
+        # else, return the text encoding.
+        if images is not None:
+            encoding_image_processor = self.image_processor(images, return_tensors=return_tensors)
+            if text_encoding is not None:
+                encoding_image_processor.update(text_encoding)
+            return encoding_image_processor

-        return encoding_image_processor
+        return text_encoding

     # Copied from transformers.models.blip.processing_blip.BlipProcessor.batch_decode with BertTokenizerFast->PreTrainedTokenizer
     def batch_decode(self, *args, **kwargs):
```

Inline review comment, on `self.current_processor = self.tokenizer`:
> Same comment here about current processors
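The restructured control flow in the diff above can be sketched in isolation. This is a minimal, self-contained approximation, not the library code: `tokenize` and `process_images` are hypothetical stand-ins for the tokenizer and image processor calls, and the encodings are plain dicts rather than `BatchEncoding` objects.

```python
def call_processor(text=None, images=None, tokenize=None, process_images=None):
    """Sketch of the unified flow: tokenize first when text is given, then
    merge in the image encoding when images are given, with a single exit
    per modality combination instead of an early text-only return."""
    if images is None and text is None:
        raise ValueError("You have to specify either images or text.")

    # Tokenize up front if we have text; otherwise leave the slot empty.
    text_encoding = tokenize(text) if text is not None else None

    if images is not None:
        # Compute pixel_values; fold the text encoding into it when present.
        encoding = process_images(images)
        if text_encoding is not None:
            encoding.update(text_encoding)
        return encoding

    # Text-only path.
    return text_encoding
```

The point of the rewrite is that both modalities now flow through one merge point, so the text-only, image-only, and text-plus-image cases all share the same tokenization branch.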
Review comment:
> Current processor behaviour is deprecated, so we don't need to set it here. In fact, we should probably create a `current_processor` property which shows a deprecation message when used.

molbap:
> Okay, that's good to know! I think I've seen it in another instance as well. I'll drop it in that case and add the message.

Reply:
> Yes, it's definitely still around in places. For context, we used to have a behaviour where the current processor was selected through a context manager. The context manager behaviour was removed, but there are still remnants of this, even though `current_processor` normally has no effect.
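The property suggested in the review thread could look something like the following. This is a hedged sketch, not the Transformers implementation: the class name, attribute layout, and warning category are assumptions made for illustration.

```python
import warnings


class ProcessorSketch:
    """Hypothetical processor base showing a `current_processor` property
    that emits a deprecation message on every read or write, since the
    attribute no longer has any effect."""

    def __init__(self):
        # Backing store so existing reads keep working during the
        # deprecation window.
        self._current_processor = None

    @property
    def current_processor(self):
        warnings.warn(
            "`current_processor` is deprecated and has no effect; it will be "
            "removed in a future version.",
            FutureWarning,
        )
        return self._current_processor

    @current_processor.setter
    def current_processor(self, value):
        warnings.warn(
            "Setting `current_processor` is deprecated and has no effect.",
            FutureWarning,
        )
        self._current_processor = value
```

With this in place, lines like `self.current_processor = self.tokenizer` keep working but surface a `FutureWarning`, which matches the thread's goal of flagging the leftover context-manager remnants without breaking callers.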