
[BLD] Bound tokenizers version #3202

Closed
wants to merge 1 commit into from


itaismith
Contributor

Description of changes

The latest tokenizers release does not ship binaries for Python 3.8, so this PR bounds tokenizers to the previous working version to allow CI tests to pass.

Test plan

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Documentation Changes

N/A


Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of unexpectedly high quality (Readability, Modularity, Intuitiveness)?

itaismith force-pushed the itai/tokenizers-version-fix branch 4 times, most recently from b8770a9 to 74d2bb8 on November 27, 2024 07:35
@tazarov
Contributor

tazarov commented Nov 27, 2024

@itaismith, the tokenizers team just confirmed that going forward they won't be supporting Python 3.8: huggingface/tokenizers#1690 (comment). This is not ideal, given that all our CI runs on 3.8 as the baseline.

A few figures about 3.8 usage:

We can continue to build wheels for 3.8, but maybe everything else can be bumped to 3.9. I suspect tokenizers dropping 3.8 (possibly because 3.8 has reached end of life) will become a growing trend among other upstream dependencies in the coming months.

itaismith force-pushed the itai/tokenizers-version-fix branch from 74d2bb8 to e7e53e8 on November 27, 2024 16:56
itaismith force-pushed the itai/tokenizers-version-fix branch from e7e53e8 to f091309 on November 27, 2024 17:05
HammadB added a commit that referenced this pull request Nov 27, 2024
## Description of changes

*Summarize the changes made by this PR.*
 - Improvements & Bug fixes
   - Bounds tokenizers as in #3202
   - Includes a change to the CVP test to install an old version of a dependency in case of a backward incompatibility like this one.
 - New functionality
   - None

## Test plan
*How are these changes tested?*
- [x] Tests pass locally with `pytest` for python, `yarn test` for js, `cargo test` for rust

## Documentation Changes
None
@HammadB
Collaborator

HammadB commented Nov 27, 2024

Closed in favor of #3211


3 participants