Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: Upgrade transformers to the latest version 4.34.1 #5994

Merged
merged 25 commits into from
Oct 24, 2023

Conversation

grantmwilliams
Copy link
Contributor

@grantmwilliams grantmwilliams commented Oct 6, 2023

Haystack can support the new Mistral, Persimmon, BROS, ViTMatte, and Nougat models.

Tranformers 4.34.0 Release
Tranformers 4.34.1 Release

Related Issues

Proposed Changes:

  • Bump of Transformers to 4.34.1
  • Install openai-whisper in CI without dependencies to prevent version conflict caused by openai-whisper pinning an old tiktoken version

How did you test it?

Manually with:

pip install '.[dev,preview]' langdetect 'transformers[torch,sentencepiece]==4.34.1' 'sentence-transformers>=2.2.0' pypdf tika 'azure-ai-formrecognizer>=3.2.0b2'
pip install --no-deps llvmlite numba 'openai-whisper>=20230818'
pytest --maxfail=5 test/preview/components/audio

Notes for the reviewer

pip install '.[dev,preview,audio]' wouldn't work anymore after this. I would remove the extra because of this. WDYT?
Also the instructions in haystack/preview/components/audio/whisper_local.py would need to be updated to pip install --no-deps llvmlite numba 'openai-whisper>=20230818' instead of pip install openai-whisper.

Checklist

…n support the new Mistral, Nougat, and other models.
@CLAassistant
Copy link

CLAassistant commented Oct 6, 2023

CLA assistant check
All committers have signed the CLA.

@grantmwilliams grantmwilliams marked this pull request as ready for review October 6, 2023 14:46
@grantmwilliams grantmwilliams requested review from a team as code owners October 6, 2023 14:46
@grantmwilliams grantmwilliams requested review from dfokina and julian-risch and removed request for a team October 6, 2023 14:46
@grantmwilliams grantmwilliams changed the title Upgrade transformers to the latest version 4.34.0 build: Upgrade transformers to the latest version 4.34.0 Oct 6, 2023
Copy link
Member

@julian-risch julian-risch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for opening this pull request. What a nice one to start with!
There are two small changes we need, before we can merge this.
The files .github/workflows/tests_preview.yml and .github/workflows/tests.yml also contain the transformers version 4.32.1.
Could you please change those to 4.34.0 too? Should be ready to go then.
Did you try one of these new models with Haystack by any chance already?

@grantmwilliams
Copy link
Contributor Author

There are two small changes we need, before we can merge this. The files .github/workflows/tests_preview.yml and .github/workflows/tests.yml also contain the transformers version 4.32.1. Could you please change those to 4.34.0 too? Should be ready to go then.

Ah, thank you. I forgot fzf doesn't check dotfiles by default.

Did you try one of these new models with Haystack by any chance already?

I've been using the Nougat model for OCRing documents, and I've been very impressed with the results so far. I think it has the potential to be a decent option in a DocumentConverter.

I'm also planning to try the Mistral model. It appears to do well in the benchmarks.

@julian-risch
Copy link
Member

julian-risch commented Oct 6, 2023

@grantmwilliams There is more than one occurrence of 4.32.1 in the two workflow files unfortunately, so we need to change all of them.
I also saw that one test case in test/modeling/test_model_loading.py fails now because a RuntimeError instead of the expected OSError is raised. It's the test_basic_loading_unknown_model.
Regarding the sorted import statements: It's not a big deal. We wanted to introduce isort anyway but still need to configure it properly to work together with black. If you committing another change anyway, I think it's best to undo the sorting. Otherwise it could result in a back-and-fourth with other PRs.

@grantmwilliams
Copy link
Contributor Author

I also saw that one test case in test/modeling/test_model_loading.py fails now because a RuntimeError instead of the expected OSError is raised. It's the test_basic_loading_unknown_model.

@julian-risch I'm assuming this update to huggingface_hub is responsible: https://github.com/huggingface/huggingface_hub/pull/1692/files. It looks like the OSError is now caught instead.

How would you like me to address it on the Haystack side?

Copy link
Member

@julian-risch julian-risch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@grantmwilliams Only remaining problem in the unit tests is now the failing test that expects an OSError but get's a RuntimeError. Let's change https://github.com/grantmwilliams/haystack/blob/eb28cc174e8c19bcc696d069c1b4914e9df6edb8/test/modeling/test_model_loading.py#L30 to RuntimeError then.
One of the integration tests of our preview of Haystack 2.0 is failing too: https://github.com/deepset-ai/haystack/actions/runs/6433646037/job/17471652438?pr=5994 However, I don't fully understand the issue. The transformers release notes mention a tokenizer refactoring: https://github.com/huggingface/transformers/releases/tag/v4.34.0 Maybe that's causing the problem in this test.

@coveralls
Copy link
Collaborator

coveralls commented Oct 8, 2023

Pull Request Test Coverage Report for Build 6629997962

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 25 unchanged lines in 3 files lost coverage.
  • Overall coverage decreased (-0.008%) to 50.885%

Files with Coverage Reduction New Missed Lines %
preview/components/audio/whisper_local.py 1 98.15%
utils/context_matching.py 2 93.55%
nodes/audio/whisper_transcriber.py 22 25.97%
Totals Coverage Status
Change from base Build 6628597069: -0.008%
Covered Lines: 12992
Relevant Lines: 25532

💛 - Coveralls

@julian-risch julian-risch self-assigned this Oct 9, 2023
@julian-risch julian-risch changed the title build: Upgrade transformers to the latest version 4.34.0 build: Upgrade transformers to the latest version 4.34.1 Oct 19, 2023
@github-actions github-actions bot added the type:documentation Improvements on the docs label Oct 19, 2023
@julian-risch
Copy link
Member

There was a 4.34.1 patch release with the following change among others:

I updated this PR so that we upgrade to 4.34.1 instead of 4.34.0.

@julian-risch
Copy link
Member

julian-risch commented Oct 19, 2023

I was able to reproduce the issue locally and fixing it is simply done by upgrading openai-whisper. An outdated version of openai-whisper is used in the CI. The problem is that openai-whisper requires tiktoken==0.3.3 but Haystack requires openai-whisper>=0.5.1

pip install '.[dev,preview]' langdetect 'transformers[torch,sentencepiece]==4.34.1' 'sentence-transformers>=2.2.0' pypdf openai-whisper tika 'azure-ai-formrecognizer>=3.2.0b2'
pytest --maxfail=5 -m "integration" test/preview/components/audio

fails but pytest passes after

pip install -U openai-whisper

@julian-risch julian-risch requested a review from ZanSara October 24, 2023 08:55
Copy link
Contributor

@ZanSara ZanSara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope we will manage to get rid of this dependency asap 😬 I'll take care of reviewing #6149 soon

@julian-risch julian-risch merged commit 1cf70d3 into deepset-ai:main Oct 24, 2023
72 checks passed
@grantmwilliams grantmwilliams deleted the bump-transformers-to-4-34 branch October 24, 2023 17:20
@bilgeyucel
Copy link
Contributor

bilgeyucel commented Oct 25, 2023

Hi @grantmwilliams, thank you for your contribution to Haystack! Now that you resolved an issue labeled with 'hacktoberfest', you have a chance to receive an exclusive swag package from Haystack. 🎁

Fill in this form, and let us know if you have any questions! https://forms.gle/226vqWoN6NRAaqJ69

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Transformers 4.34.x
7 participants