Add get optimizer method #5149
Conversation
@microsoft-github-policy-service agree
@@ -294,3 +294,22 @@ def build_extension(self):

    def export_envs(self):
        return []

    def get_optimizer(self, optimizer_name, cpu_optimization, model_parameters, **optimizer_parameters):
        from habana_frameworks.torch.hpex.optimizers import FusedAdamW
I am not convinced that this much code is needed.
It seems the goal is to replace AdamW on hpu accelerator as below:
from deepspeed.ops.adam import FusedAdam
with
from habana_frameworks.torch.hpex.optimizers import FusedAdamW
Is this correct?
@tjruwase Yes, but it also prepares the ground for other fused optimizers that are handled in this function, for example FusedLamb, OneBit, etc.
Did you have something else in mind?
@nelyahu, thanks for the clarification. I do agree that deferring to the accelerator to instantiate a fused optimizer when available is the way to go. My concern is that duplicating the optimizer selection logic of engine.py here will be difficult to maintain long term.
I think the complication is that engine.py mixes optimizer name refinement (especially for Adam variants) with optimizer instantiation. By name refinement, I mean that something like ADAM_OPTIMIZER, which is a user-facing config value, internally maps to one of many optimizers (e.g., torch.optim.Adam, DeepSpeedCPUAdam, etc.). So, my thought is to:
- Create an internal naming convention for the various optimizer instantiations. For example, _TORCH_ADAM_OPTIMIZER for torch.optim.Adam, _DS_FUSED_ADAM for deepspeed.ops.adam.FusedAdam, _DS_ADAM_OPTIMIZER for deepspeed.ops.adam.DeepSpeedCPUAdam.
- Create a function that maps an external name to an internal one based on other configuration values. For example, ADAM_OPTIMIZER would map to one of _TORCH_ADAM, _DS_FUSED_ADAM, _DS_CPUADAM.
Based on the above, the following changes could now be made:
- Simplify get_optimizer() by accepting an internal optimizer name and creating an optimizer if the name is one of the supported internal optimizer names. Also, cpu_optimization is no longer needed.
- Simplify _configure_basic_optimizer() to (i) convert to the internal name, (ii) call accelerator().get_optimizer() with the internal name to create the optimizer, and (iii) otherwise fall through to the existing optimizer creation logic (see the illustrative sketch after this comment).
Looking forward to your feedback. Thanks!
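For reference, here is a rough, illustrative sketch of the name-refinement flow proposed above. The internal names, the refine_optimizer_name helper, and the engine-side wrapper are hypothetical illustrations of the idea, not code from this PR or from DeepSpeed itself.

```python
# Hypothetical sketch only -- names and helpers are illustrative, not from the PR.
import torch

# 1) Internal naming convention for the concrete optimizer instantiations.
_TORCH_ADAM = "_TORCH_ADAM"
_DS_FUSED_ADAM = "_DS_FUSED_ADAM"
_DS_CPU_ADAM = "_DS_CPU_ADAM"


def refine_optimizer_name(external_name, torch_adam=False, cpu_offload=False):
    """Map a user-facing optimizer name (e.g. 'adam') to one internal name,
    based on other configuration values."""
    if external_name.lower() == "adam":
        if torch_adam:
            return _TORCH_ADAM
        if cpu_offload:
            return _DS_CPU_ADAM
        return _DS_FUSED_ADAM
    raise ValueError(f"Unsupported optimizer: {external_name}")


def _configure_basic_optimizer(accelerator, external_name, model_parameters,
                               torch_adam=False, cpu_offload=False,
                               **optimizer_parameters):
    # (i) convert the external name to an internal one
    internal_name = refine_optimizer_name(external_name, torch_adam, cpu_offload)
    # (ii) let the accelerator build the optimizer if it has a specialized one
    optimizer = accelerator.get_optimizer(internal_name, model_parameters,
                                          **optimizer_parameters)
    if optimizer is not None:
        return optimizer
    # (iii) otherwise fall through to the existing optimizer creation logic
    if internal_name == _TORCH_ADAM:
        return torch.optim.Adam(model_parameters, **optimizer_parameters)
    raise NotImplementedError(f"No generic builder for {internal_name}")
```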
@tjruwase I applied the changes you suggested; however, some optimizers still require specific code. Which of the commits do you think is best?
@ShellyNR, thanks for making the changes. This looks better to me.
When you say "some optimizers still require specific code", is that referring to _ADAM in hpu_accelerator, which maps to torch.optim.Adam()? If so, I think that is fine. My goal is to avoid duplication between the accelerator code and engine.py.
03ee38a to 1be755b
"_TORCH_ADAMW": lambda arg1, **arg2: torch.optim.AdamW(arg1, **arg2), | ||
"_CPU_ADAM": lambda arg1, **arg2: DeepSpeedCPUAdam(arg1, **arg2, adamw_mode=False), | ||
"_CPU_ADAMW": lambda arg1, **arg2: DeepSpeedCPUAdam(arg1, **arg2, adamw_mode=True), | ||
"_ADAM": lambda arg1, **arg2: FusedAdam(arg1, **arg2, adam_w_mode=False), |
Thanks for creating this dict. Based on this, I think the mappings of cuda-specific optimizers, such as Fused[Adam|Lamb|Lion], should move to cuda_accelerator.py.
What do you think?
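To illustrate that suggestion, here is a hypothetical sketch (not code from this PR) of how the cuda-specific fused mappings could live in cuda_accelerator.py, mirroring the dict shown in the diff above; the class name and the internal key names are assumptions.

```python
# Hypothetical sketch of cuda_accelerator.py -- class and key names are assumptions.

class CUDA_Accelerator:

    def get_optimizer(self, optimizer_name, model_parameters, **optimizer_parameters):
        # Import lazily so that non-CUDA environments never build/load these ops.
        from deepspeed.ops.adam import FusedAdam
        from deepspeed.ops.lamb import FusedLamb

        cuda_optimizers = {
            "_FUSED_ADAM": lambda params, **kw: FusedAdam(params, **kw, adam_w_mode=False),
            "_FUSED_ADAMW": lambda params, **kw: FusedAdam(params, **kw, adam_w_mode=True),
            "_FUSED_LAMB": lambda params, **kw: FusedLamb(params, **kw),
        }
        builder = cuda_optimizers.get(optimizer_name)
        if builder is None:
            return None  # not CUDA-specific; engine.py keeps its generic logic
        return builder(model_parameters, **optimizer_parameters)
```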
@ShellyNR, thanks for your work on this PR. I wanted to check whether you need more clarification to address my comments.
I'm closing this PR; a new PR that addresses this issue without using a get_optimizer method will be pushed soon.
Add the get_optimizer method to accelerators for use during optimizer configuration, instead of checking for the specific accelerator in the engine.py code.
In all accelerators other than hpu, the current implementation returns None so that the previous flow is not affected.
In hpu_accelerator.py, the method returns an optimizer if the configured optimizer has an hpu-specific implementation or requirements; otherwise, it returns None.
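For illustration, here is a minimal sketch of that contract, assuming the get_optimizer signature shown in the diff; the class skeletons and the optimizer-name check are simplified and not the PR's exact code.

```python
# Simplified sketch, not the PR's actual implementation.

class HPU_Accelerator:

    def get_optimizer(self, optimizer_name, cpu_optimization, model_parameters,
                      **optimizer_parameters):
        # CPU-offloaded optimization keeps the default DeepSpeed path.
        if cpu_optimization:
            return None
        if optimizer_name == "adamw":
            # HPU provides its own fused AdamW implementation.
            from habana_frameworks.torch.hpex.optimizers import FusedAdamW
            return FusedAdamW(model_parameters, **optimizer_parameters)
        return None  # no hpu-specific implementation; engine.py falls back


class CPU_Accelerator:

    def get_optimizer(self, optimizer_name, cpu_optimization, model_parameters,
                      **optimizer_parameters):
        # Accelerators without specialized optimizers return None so the
        # previous flow in engine.py is not affected.
        return None
```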