Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enrich TTS pipeline parameters naming #26473

Merged
merged 12 commits into from
Nov 2, 2023

Conversation

ylacombe
Copy link
Contributor

What does this PR do?

#26369 highlighted that the use of forward_params in the TTS pipeline was not clear enough. This PR enrich a bit the docstrings to correct this oversight.

Note that following a discussion with @Vaibhavs10, @sanchit-gandhi and @Narsil in the same issue, I came to the conclusion that:

  • adding max_new_tokens would add confusion to the pipeline usage, since generative and non-generative models coexist
  • in the same manner, I believe that renaming forward_params to generate_params would be equally as confusing

As @Narsil noted, an user that would like to use advanced parameter controls would probably use the classic transformers usage (tokenizer + model + etc).

Of course, I'm open to discussion and modification of current behavior if this problem recurs in the future.

Fixes #26369

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).

Who can review?

Hey @osanseviero, @Narsil and @ArthurZucker ! Do you feel like this resolve the issue for now?

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Sep 28, 2023

The documentation is not available anymore as the PR was closed or merged.

... "max_new_tokens": 35,
... }

>>> outputs = music_generator("This is a test", forward_params=forward_params)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would help for consistency allowing the pipeline to accept either forward_params or generate_kwargs? If forward_params, then it gets passed to .forward or .generate (whichever one the model does). If generate_kwargs, it only gets passed if the model generates with .generate. IMO this would help with consistency across pipelines

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this would make sense in terms of consistency, but not in terms of simplicity, as it might be confusing for users to have two dictionaries doing the same thing. Anyway, I'm happy with your solution, since users can still refer to the documentation in case of confusion.

WDYT @ArthurZucker and @Narsil ?

I'll open a PR once we agree on that!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more aligned with sanchit here, generation params should be passed to generate_kwargs for sure and not as forward_params it's confusing and not consistent with our API

@Vaibhavs10
Copy link
Member

Gentle ping here @ylacombe @sanchit-gandhi ?
I'd like to promote more complex operations and the ability to play around with pipelines for TTA/S. Would be cool to be able to do it via generate_kwargs to showcase similarity across various pipeline usage.

@ylacombe ylacombe changed the title Enrich TTS pipeline docstring for clearer forward_params usage Enrich TTS pipeline parameters naming Oct 11, 2023
@ylacombe
Copy link
Contributor Author

Hey @Vaibhavs10, @sanchit-gandhi and @ArthurZucker, I've followed your advice and made it compatible with generate_kwargs if the model is generative. forward_params is still usable for both type of models but generate_kwargs has priority over forward_params if usable.

Let me know your opinion on it!

Copy link
Contributor

@sanchit-gandhi sanchit-gandhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! LGTM, thanks for iterating @ylacombe! Once we get the final review from @ArthurZucker we can merge

else:
output = self.model(**model_inputs, **kwargs)[0]
output = self.model(**model_inputs, **forward_params)[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we error out if generate_kwargs are passed but the model doesn't generate? We could then nudge the user to use forward_kwargs if using a forward-only model in the error message. WDYT?


# for reproducibility
set_seed(555)
# make sure nothing is done if generate_kwargs passed since not related
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think it would be nice to raise an error here?

tests/pipelines/test_pipelines_text_to_audio.py Outdated Show resolved Hide resolved
@sanchit-gandhi
Copy link
Contributor

The MusicGen TTA doctest is currently timing out (> 120s). Given we already set a low generation max length (35 tokens), I don't think we can really reduce the time for this test much further. Do you think it makes sense to switch to using the VITS model on the doctest, since it'll run in <10s?

Co-authored-by: Sanchit Gandhi <[email protected]>
Copy link
Contributor Author

@ylacombe ylacombe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review here @sanchit-gandhi ! I don't mind throwing an error here

tests/pipelines/test_pipelines_text_to_audio.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really have a strong opinion, so good for me here

src/transformers/pipelines/text_to_audio.py Outdated Show resolved Hide resolved

with self.assertRaises(TypeError):
# assert error if generate parameter
outputs = speech_generator("This is a test", forward_params={"speaker_id": 5, "do_sample": True})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any better example of forward params?
Speaker id seems to me that it could be part of a generation config (but also is not a tensor and the pr seems to say that we could have tensors here).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately I can only think of Vits and speaker_id is basically the only optional parameter of the forward pass. BTW, tensor params are tested by test_conversion_additional_tensor

@ylacombe
Copy link
Contributor Author

Thanks for the reviews here @sanchit-gandhi and @ArthurZucker, I have updated according to your comments, and will merge once the checks are done!

@sanchit-gandhi
Copy link
Contributor

Looks like the doc tests might be too slow still: https://app.circleci.com/pipelines/github/huggingface/transformers/75458/workflows/bbc191e3-5c49-410e-a720-307229af6d37/jobs/957271

Should we use a different checkpoint? #26473 (comment)

@ylacombe
Copy link
Contributor Author

ylacombe commented Nov 2, 2023

Hey @sanchit-gandhi, this slips my mind, looking at this in about an hour ;)

@ylacombe
Copy link
Contributor Author

ylacombe commented Nov 2, 2023

No timing out anymore, just by syncing the branch with main! Is it okay to merge in that case @amyeroberts and @sanchit-gandhi ?

@amyeroberts
Copy link
Collaborator

amyeroberts commented Nov 2, 2023

If tests are passing and you have core maintainer approval (from @ArthurZucker here) you're good to go!

@ylacombe ylacombe merged commit 0ed6729 into huggingface:main Nov 2, 2023
3 checks passed
@ylacombe ylacombe deleted the enrich-TTS-pipeline-docstrings branch November 2, 2023 17:07
EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 19, 2023
* enrich TTS pipeline docstring for clearer forward_params use

* change token leghts

* update Pipeline parameters

* correct docstring and make style

* fix tests

* make style

* change music prompt

Co-authored-by: Sanchit Gandhi <[email protected]>

* Apply suggestions from code review

Co-authored-by: Arthur <[email protected]>
Co-authored-by: Sanchit Gandhi <[email protected]>

* raise errors if generate_kwargs with forward-only models

* make style

---------

Co-authored-by: Sanchit Gandhi <[email protected]>
Co-authored-by: Arthur <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MusicGen with TextToAudioPipeline issues
6 participants