
Change accumulate_train_batch_on_tokens default to True #1618

Merged · 1 commit into mosaicml:main on Oct 27, 2024

Conversation

@dakinggg (Collaborator) commented Oct 27, 2024

Following #1610, update the default value of accumulate_train_batch_on_tokens to True, for a more mathematically correct default. Note: this will slightly change loss curves for models trained with padding. The old behavior can be recovered, if desired, by simply setting this to False explicitly. See #1610 and mosaicml/composer#3677 for more discussion of this change.
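To illustrate why the token-weighted default is more mathematically correct, here is a minimal sketch. This is not the actual Composer/LLM Foundry implementation; the `accumulate_loss` helper and its signature are hypothetical, but it shows how the two accumulation modes diverge when microbatches contain different amounts of padding:

```python
import torch

def accumulate_loss(microbatch_losses, token_counts, on_tokens=True):
    """Combine per-microbatch mean losses into one batch loss.

    microbatch_losses: scalar mean loss per microbatch.
    token_counts: non-padding token count per microbatch.
    on_tokens=True weights each microbatch by its real token count
    (the new default); on_tokens=False averages microbatches equally
    (the old behavior).
    """
    losses = torch.tensor(microbatch_losses)
    tokens = torch.tensor(token_counts, dtype=torch.float)
    if on_tokens:
        # Token-weighted: equivalent to a single mean over all
        # non-padding tokens in the full batch.
        return (losses * tokens).sum() / tokens.sum()
    # Equal-weight: each microbatch contributes the same, regardless
    # of how much of it is padding.
    return losses.mean()

# With uneven padding, the two modes disagree:
print(accumulate_loss([2.0, 4.0], [100, 300], on_tokens=True))   # 3.5
print(accumulate_loss([2.0, 4.0], [100, 300], on_tokens=False))  # 3.0
```

With accumulate_train_batch_on_tokens set to True, the accumulated loss matches a single mean over all non-padding tokens in the batch, which is why loss curves shift slightly for padded training runs.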

@dakinggg marked this pull request as ready for review October 27, 2024 02:04
@dakinggg requested a review from a team as a code owner October 27, 2024 02:04
@dakinggg requested a review from mvpatel2000 October 27, 2024 02:05
@dakinggg enabled auto-merge (squash) October 27, 2024 02:05
@snarayan21 (Contributor) left a comment


ah interesting. lgtm

Note: do we need a deprecation warning or no? This deviates from old behavior, but I guess it's a new feature...

@dakinggg merged commit 8b2a88b into mosaicml:main on Oct 27, 2024
9 checks passed
@dakinggg (Collaborator, Author)

Open to other opinions, but given this is a correctness change, I think it is ok to change without deprecation (also, a deprecation warning for a default change is unlikely to have much impact). Definitely will call this out in the next release notes.
