
Stream ModelOutputs #1

Closed
wants to merge 41 commits into from

Conversation

@dmarx (Collaborator) commented Mar 4, 2024

What does this PR do?

The "streaming" feature for text generation is currently constrained to streaming token ids. If a user requests an output dict, the stream still only yields token ids, omitting other requested attributes such as scores, raw logits, or activations. Only after the stream has been consumed does the function return its final output, which is the originally requested output dict.

This PR aligns the return type of the streamer with the requested return type by encapsulating the logic that determines how the return value is constructed. In so doing, this change also permits users to stream richer representations than just token ids.
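For illustration, the shape of the proposed streaming API can be sketched with pure-Python stand-ins. The `StreamedOutput` dataclass and this minimal `OutputIteratorStreamer` are hypothetical stand-ins for the classes this PR adds, not the actual transformers implementation:

```python
from dataclasses import dataclass
from queue import Queue
from typing import Optional

@dataclass
class StreamedOutput:
    # Stand-in for a per-step generation output; mirrors the kind of
    # attributes (token ids plus scores/logits) the PR proposes to stream.
    token_ids: list
    scores: Optional[list] = None

class OutputIteratorStreamer:
    """Illustrative sketch: put() accepts rich outputs, not just token ids."""
    def __init__(self):
        self._queue = Queue()
        self._sentinel = object()

    def put(self, value: StreamedOutput) -> None:
        # The generation loop pushes whatever _prepare_output() built.
        self._queue.put(value)

    def end(self) -> None:
        # Signal that generation has finished.
        self._queue.put(self._sentinel)

    def __iter__(self):
        while True:
            item = self._queue.get()
            if item is self._sentinel:
                return
            yield item

streamer = OutputIteratorStreamer()
streamer.put(StreamedOutput(token_ids=[42], scores=[0.9]))
streamer.end()
for step in streamer:
    print(step.token_ids, step.scores)  # each step carries scores, not just ids
```

The point of the sketch is the `put()` signature: consumers iterate over structured outputs whose fields match the requested return type, instead of bare token-id tensors.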

Summary of changes:

  • Adds a ._prepare_output() method to GenerationMixin that encapsulates output-formatting logic
  • ._prepare_output() replaces the nested conditionals that previously built return values (beam samplers excluded)
  • Streamer.put() now receives the result of the ._prepare_output() invocation rather than being restricted to a token-ids tensor
  • Inputs to Streamer.put() are no longer moved to the CPU before being passed in
    • When desired, responsibility for moving the tensor is delegated to the provided Streamer.
  • Adds OutputStreamer and OutputIteratorStreamer classes

Before submitting

  • Did you read the contributor guideline, Pull Request section?
  • Did you write any new necessary tests?
    • OutputIteratorStreamerTester to be modularized with fixtures, DRYed
  • test against hydra-node
  • fix inconsistent list output from OutputStreamer.on_ready
  • flesh out rest of intermediate return values
    • scores
    • logits
    • attention
    • hidden
  • add some kind of "output formatter" factory to ensure outputs are uniform across the class without repeating a ton of logic.
    • encapsulate output construction logic in a function
    • refactor to use function
    • add tests validating output type consistency and parity with streamer.put()
  • ensure tests fully cover targeted cases
    • I'm concerned that injecting bad data directly into these attributes doesn't seem to trigger test failures
    • encoder attentions
    • cross attentions
    • [contrastive search] decoder attention
    • suspect tests are not covering assisted decoding
    • added encoder and encoder-decoder models to test suite, need to smoke test
  • update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
    • Send the CW documentation team a CR
    • docstrings
    • typehints
    • explicit input args for _prepare_output() (? maybe it's fine as is?)
  • fix import
  • fix inconsistent tensor dimensions
  • fix sigterm on hydra-node
  • revisit ECHO behavior?
    • on our end, just skipping over streamed outputs when output.score is None
    • should be sufficient for HF's end if we just document that the ECHO response will have null scores/logits/activations
  • reimplement HF's TextIteratorStreamer using the new OutputIteratorStreamer
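A minimal sketch of the proposed ECHO handling, assuming streamed items expose a `scores` attribute that is `None` for echoed prompt tokens (all names here are hypothetical stand-ins):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StepOutput:
    # Hypothetical per-step payload; scores is None for echoed prompt tokens.
    token_ids: list
    scores: Optional[list] = None

def generated_only(stream):
    """Skip the ECHO prefix: drop items whose scores are null, as the PR
    proposes to document for echoed prompt tokens."""
    for item in stream:
        if item.scores is None:
            continue
        yield item

steps = [
    StepOutput(token_ids=[1], scores=None),   # echoed prompt token
    StepOutput(token_ids=[2], scores=[0.8]),  # generated token
]
print([s.token_ids for s in generated_only(steps)])  # [[2]]
```

If the documentation simply guarantees null scores/logits/activations for the ECHO prefix, a filter like this is all a downstream consumer needs.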

Who can review?

(Hidden @ tag, uncomment when ready to send to HF)

@dmarx (Collaborator, Author) commented Mar 8, 2024

moved to upstream: huggingface#29545

@dmarx dmarx closed this Mar 8, 2024
dmarx pushed a commit that referenced this pull request Mar 19, 2024
* Cohere Model Release (#1)

Cohere Model Release

* Remove unnecessary files and code (#2)

Some cleanup

* Delete cohere-model directory (huggingface#3)

* Make Fix (huggingface#5)

* Pr fixes (huggingface#6)

* fixes for pr

* pr fixes for the format

* pr fixes for the format

* src/transformers/models/auto/tokenization_auto.py

* Tokenizer test (huggingface#8)

* tokenizer test

* format fix

* Adding Docs and other minor changes (huggingface#7)

* Add modeling tests (huggingface#9)

* Smol Fix (huggingface#11)

* tokenization tests are fixed

* format fixes

* fix pr doc tests

* fix pr doc tests

* fix pr doc tests

* fix pr style check

* small changes in cohere.md

* FIX: Address final comments for transformers integration (huggingface#13)

* fix modeling final nits and add proper test file

* for now leave empty tests

* add integration test

* push new test

* fix modeling cohere (huggingface#14)

* Update chat templates to use the new API (huggingface#15)

---------

Co-authored-by: ahmetustun <[email protected]>
Co-authored-by: Younes Belkada <[email protected]>
Co-authored-by: Matt <[email protected]>