
[BUG] sok amp mode error #462

Open · Orca-bit opened this issue Oct 21, 2024 · 1 comment

Orca-bit commented Oct 21, 2024

Describe the bug

[1,0]<stderr>:Traceback (most recent call last):
[1,0]<stderr>:  File "/ws/HugeCTR/sparse_operation_kit/SOK_DLRM_Benchmark/main.py", line 129, in <module>
[1,0]<stderr>:    trainer = Trainer(
[1,0]<stderr>:  File "/ws/HugeCTR/sparse_operation_kit/SOK_DLRM_Benchmark/trainer.py", line 161, in __init__
[1,0]<stderr>:    self._embedding_optimizer = tf.keras.mixed_precision.LossScaleOptimizer(
[1,0]<stderr>:  File "/usr/local/lib/python3.10/dist-packages/keras/mixed_precision/loss_scale_optimizer.py", line 343, in __call__
[1,0]<stderr>:    raise TypeError(msg)
[1,0]<stderr>:TypeError: "inner_optimizer" must be an instance of `tf.keras.optimizers.Optimizer` or `tf.keras.optimizers.experimental.Optimizer`, but got: <sparse_operation_kit.optimizer.OptimizerWrapperV2 object at 0x7f1b15b44910>.
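
For context, the failure is the LossScaleOptimizer type check rejecting SOK's optimizer wrapper. A minimal sketch of the pattern that triggers it (the sok.OptimizerWrapper call is illustrative and may differ across SOK versions):

```python
import tensorflow as tf
import sparse_operation_kit as sok

# SOK wraps a TF optimizer in its own OptimizerWrapperV2, which is not a
# tf.keras.optimizers.Optimizer subclass (construction shown for illustration).
emb_optimizer = sok.OptimizerWrapper(tf.keras.optimizers.SGD(learning_rate=1.0))

# Keras type-checks inner_optimizer here and raises the TypeError above.
emb_optimizer = tf.keras.mixed_precision.LossScaleOptimizer(emb_optimizer)
```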


Environment:

  • Docker image: nvcr.io/nvidia/merlin/merlin-tensorflow:nightly
  • TensorFlow: 2.12.0+nv23.6


kanghui0204 (Collaborator) commented:
The optimizer in SOK is not a TensorFlow optimizer, so you cannot wrap it with tf.keras.mixed_precision.LossScaleOptimizer. Instead, get the loss-scale value from the dense part's optimizer, scale the gradients by it, and then feed them into the SOK optimizer.
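
A minimal sketch of that approach, assuming a custom training loop where only the dense optimizer is wrapped in LossScaleOptimizer (the sok.OptimizerWrapper call, variable lists, and Adam settings are illustrative, not the benchmark's exact code):

```python
import tensorflow as tf
import sparse_operation_kit as sok

tf.keras.mixed_precision.set_global_policy("mixed_float16")

# Wrap only the dense optimizer for loss scaling; keep the SOK one unwrapped.
dense_optimizer = tf.keras.mixed_precision.LossScaleOptimizer(
    tf.keras.optimizers.Adam(learning_rate=1e-3)
)
embedding_optimizer = sok.OptimizerWrapper(
    tf.keras.optimizers.Adam(learning_rate=1e-3)
)

@tf.function
def train_step(model, loss_fn, inputs, labels, dense_vars, emb_vars):
    with tf.GradientTape() as tape:
        loss = loss_fn(labels, model(inputs, training=True))
        # Scale the loss with the dense optimizer's dynamic loss scale.
        scaled_loss = dense_optimizer.get_scaled_loss(loss)
    scaled_grads = tape.gradient(scaled_loss, dense_vars + emb_vars)
    # Unscale every gradient by that same loss scale before applying it;
    # this is the "get the scale value from the dense part's optimizer" step.
    grads = dense_optimizer.get_unscaled_gradients(scaled_grads)
    dense_grads = grads[: len(dense_vars)]
    emb_grads = grads[len(dense_vars) :]
    dense_optimizer.apply_gradients(zip(dense_grads, dense_vars))
    embedding_optimizer.apply_gradients(zip(emb_grads, emb_vars))
    return loss
```

Note that LossScaleOptimizer only skips updates on non-finite gradients for the dense variables; if you rely on dynamic loss scaling, the embedding side would need an explicit finite-check before applying its gradients.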
