
Stream ModelOutputs #1

Closed
wants to merge 41 commits into from

Conversation

@dmarx (Collaborator) commented Mar 4, 2024

What does this PR do?

The "streaming" feature for text generation is currently constrained to streaming token ids. If a user requests an output dict, the stream still only yields token ids, omitting other requested attributes such as scores, raw logits, or activations. Only after the stream has been consumed does the function return its final output, which is the originally requested output dict.

This PR aligns the return type of the streamer with the requested return type by encapsulating the logic that determines how the return value is constructed. In so doing, this change also permits users to stream richer representations than just token ids.
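For illustration, the shape of the proposed streaming API can be sketched with pure-Python stand-ins. The `StreamedOutput` dataclass and this minimal `OutputIteratorStreamer` are hypothetical stand-ins for the classes this PR adds, not the actual transformers implementation:

```python
from dataclasses import dataclass
from queue import Queue
from typing import Optional

@dataclass
class StreamedOutput:
    # Stand-in for a per-step generation output; mirrors the kind of
    # attributes (token ids plus scores/logits) the PR proposes to stream.
    token_ids: list
    scores: Optional[list] = None

class OutputIteratorStreamer:
    """Illustrative sketch: put() accepts rich outputs, not just token ids."""
    def __init__(self):
        self._queue = Queue()
        self._sentinel = object()

    def put(self, value: StreamedOutput) -> None:
        # The generation loop pushes whatever _prepare_output() built.
        self._queue.put(value)

    def end(self) -> None:
        # Signal that generation has finished.
        self._queue.put(self._sentinel)

    def __iter__(self):
        while True:
            item = self._queue.get()
            if item is self._sentinel:
                return
            yield item

streamer = OutputIteratorStreamer()
streamer.put(StreamedOutput(token_ids=[42], scores=[0.9]))
streamer.end()
for step in streamer:
    print(step.token_ids, step.scores)  # each step carries scores, not just ids
```

The point of the sketch is the `put()` signature: consumers iterate over structured outputs whose fields match the requested return type, instead of bare token-id tensors.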

Summary of changes:

  • Adds a ._prepare_output() method to GenerationMixin that encapsulates output-formatting logic
  • ._prepare_output() replaces the nested conditionals that previously built return values (beam samplers excluded)
  • Streamer.put() now receives the result of the ._prepare_output() invocation rather than being restricted to a token-ids tensor
  • Inputs to Streamer.put() are no longer moved to the CPU before being passed in
    • When desired, responsibility for moving the tensor is delegated to the provided Streamer.
  • Adds OutputStreamer and OutputIteratorStreamer classes

Before submitting

  • Did you read the contributor guideline, Pull Request section?
  • Did you write any new necessary tests?
    • OutputIteratorStreamerTester to be modularized with fixtures, DRYed
  • test against hydra-node
  • fix inconsistent list output from OutputStreamer.on_ready
  • flesh out rest of intermediate return values
    • scores
    • logits
    • attention
    • hidden
  • add some kind of "output formatter" factory to ensure outputs are uniform across the class without repeating a ton of logic.
    • encapsulate output construction logic in a function
    • refactor to use function
    • add tests validating output type consistency and parity with streamer.put()
  • ensure tests fully cover targeted cases
    • I'm concerned that injecting bad data directly into these attributes doesn't seem to trigger test failures
    • encoder attentions
    • cross attentions
    • [contrastive search] decoder attention
    • suspect tests are not covering assisted decoding
    • added encoder and encoder-decoder models to test suite, need to smoke test
  • update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
    • Send the CW documentation team a CR
    • docstrings
    • typehints
    • explicit input args for _prepare_output() (? maybe it's fine as is?)
  • fix import
  • fix inconsistent tensor dimensions
  • fix sigterm on hydra-node
  • revisit ECHO behavior?
    • on our end, just skipping over streamed outputs when output.score is None
    • should be sufficient for HF's end if we just document that the ECHO response will have null scores/logits/activations
  • reimplement HF's TextIteratorStreamer using the new OutputIteratorStreamer
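A minimal sketch of the proposed ECHO handling, assuming streamed items expose a `scores` attribute that is `None` for echoed prompt tokens (all names here are hypothetical stand-ins):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StepOutput:
    # Hypothetical per-step payload; scores is None for echoed prompt tokens.
    token_ids: list
    scores: Optional[list] = None

def generated_only(stream):
    """Skip the ECHO prefix: drop items whose scores are null, as the PR
    proposes to document for echoed prompt tokens."""
    for item in stream:
        if item.scores is None:
            continue
        yield item

steps = [
    StepOutput(token_ids=[1], scores=None),   # echoed prompt token
    StepOutput(token_ids=[2], scores=[0.8]),  # generated token
]
print([s.token_ids for s in generated_only(steps)])  # [[2]]
```

If the documentation simply guarantees null scores/logits/activations for the ECHO prefix, a filter like this is all a downstream consumer needs.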

Who can review?

(Hidden @ tag, uncomment when ready to send to HF)

@dmarx (Collaborator, Author) commented Mar 8, 2024

moved to upstream: huggingface#29545

@dmarx dmarx closed this Mar 8, 2024
dmarx pushed a commit that referenced this pull request Mar 19, 2024
* Cohere Model Release (#1)

Cohere Model Release

* Remove unnecessary files and code (#2)

Some cleanup

* Delete cohere-model directory (huggingface#3)

* Make Fix (huggingface#5)

* Pr fixes (huggingface#6)

* fixes for pr

* pr fixes for the format

* pr fixes for the format

* src/transformers/models/auto/tokenization_auto.py

* Tokenizer test (huggingface#8)

* tokenizer test

* format fix

* Adding Docs and other minor changes (huggingface#7)

* Add modeling tests (huggingface#9)

* Smol Fix (huggingface#11)

* tokenization tests are fixed

* format fixes

* fix pr doc tests

* fix pr doc tests

* fix pr doc tests

* fix pr style check

* small changes in cohere.md

* FIX: Address final comments for transformers integration (huggingface#13)

* fix modeling final nits and add proper test file

* for now leave empty tests

* add integration test

* push new test

* fix modeling cohere (huggingface#14)

* Update chat templates to use the new API (huggingface#15)

---------

Co-authored-by: ahmetustun <[email protected]>
Co-authored-by: Younes Belkada <[email protected]>
Co-authored-by: Matt <[email protected]>