nemo-v2 <- nemo-vt #3

Closed
wants to merge 36 commits into from

tahirjmakhdoomi
Contributor

What does this PR do?

Add a one-line overview of what this PR aims to accomplish.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line-by-line info on the high-level changes in this PR.

Usage

  • You can optionally add a usage example below.
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre checks:

  • Make sure you have read and followed the Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

titu1994 and others added 30 commits May 17, 2023 15:13
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: smajumdar <[email protected]>
* [Temp] VP Fixes

Signed-off-by: smajumdar <[email protected]>

* Revert logging

Signed-off-by: smajumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
(cherry picked from commit b6f46a0)
* check for first or last stage

Signed-off-by: ericharper <[email protected]>

* remove redundant check

Signed-off-by: ericharper <[email protected]>

* fix typo

Signed-off-by: ericharper <[email protected]>

* add map_location

Signed-off-by: ericharper <[email protected]>

---------

Signed-off-by: ericharper <[email protected]>
* Bug fix to restore act ckpt

Signed-off-by: Markel Sanz Ausin <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Markel Sanz Ausin <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Bug fix to reset sequence parallelism

Signed-off-by: Markel Sanz Ausin <[email protected]>

* Update seq par reset/restore

Signed-off-by: Markel Sanz Ausin <[email protected]>

* Add nested loop

Signed-off-by: Markel Sanz Ausin <[email protected]>

---------

Signed-off-by: Markel Sanz Ausin <[email protected]>
…ng (#6744)

* fix checkpointed forward and add test for full activation checkpointing

Signed-off-by: Abhinav Khattar <[email protected]>

* add method

Signed-off-by: Abhinav Khattar <[email protected]>

* add method

Signed-off-by: Abhinav Khattar <[email protected]>

---------

Signed-off-by: Abhinav Khattar <[email protected]>
Signed-off-by: smajumdar <[email protected]>
* add call to p2p overlap

Signed-off-by: Abhinav Khattar <[email protected]>

* update Jenkins for test

Signed-off-by: Abhinav Khattar <[email protected]>

---------

Signed-off-by: Abhinav Khattar <[email protected]>
* fix get param

Signed-off-by: ericharper <[email protected]>

* change name

Signed-off-by: ericharper <[email protected]>

---------

Signed-off-by: ericharper <[email protected]>
* initial POC for LDDL Bert

* Finish LDDL POC

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* address comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix merge head

* resolving merge

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for val/test loaders

* change to new LDDL class + add winding

* fix logging level

* fix winding

* test fix

* fixes to winding

* add file system

* add preemption optimizations

* more logging

* more prints

* better logging

* asfsf

* add barrier

* removing prints

* working with mb lddl loader

* final changes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update requirements file with LDDL

Signed-off-by: wdykas <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert adding to requirements

---------

Signed-off-by: wdykas <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
…e (#6740)

* Construct FP8 amax reduction group

Signed-off-by: Tim Moon <[email protected]>

* update core for CI

Signed-off-by: Abhinav Khattar <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>
Signed-off-by: Abhinav Khattar <[email protected]>
Co-authored-by: Abhinav Khattar <[email protected]>
* add interfaces for tp_communication overlap

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Interface to provide custom userbuffer communicator settings by yaml file

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Construct MPI process group for userbuffers support

Signed-off-by: Tim Moon <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
Co-authored-by: Abhinav Khattar <[email protected]>
* Fix TTS adapter tutorial

Signed-off-by: hsiehjackson <[email protected]>

* Fix version

Signed-off-by: hsiehjackson <[email protected]>

---------

Signed-off-by: hsiehjackson <[email protected]>
Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Abhinav Khattar <[email protected]>
* add trainer.validate example

Signed-off-by: ericharper <[email protected]>

* clean up white space

Signed-off-by: ericharper <[email protected]>

* add mbs and gbs to the config

Signed-off-by: ericharper <[email protected]>

---------

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Yi Dong <[email protected]>
* add model pretraining and customization classes

Signed-off-by: ericharper <[email protected]>

* fix

Signed-off-by: ericharper <[email protected]>

* test width

Signed-off-by: ericharper <[email protected]>

* increase middle pane width

Signed-off-by: ericharper <[email protected]>

* add modules and datasets

Signed-off-by: ericharper <[email protected]>

* remove global in T5 datasets and fix formatting in Megatron base model

Signed-off-by: ericharper <[email protected]>

---------

Signed-off-by: ericharper <[email protected]>
* Apply garbage collection interval to validation steps

Signed-off-by: Sangkug Lym <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>