-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GPU] activations scaling to resolve accuracy issues for infer precision of f16 #27265
base: master
Are you sure you want to change the base?
Conversation
* @brief This property scales down activations to prevent overflows when inference precision is f16. | ||
* @ingroup ov_runtime_cpp_prop_api | ||
*/ | ||
static constexpr Property<float, PropertyMutability::RW> activations_scale_factor{"ACTIVATIONS_SCALE_FACTOR"}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add python bindings for this property
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how users are supposed to understand which value to set ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly experimentally, for now. In the future, we plan to have RT Info attribute of ov::Model which can be set from optimum pipelines or NNCF (if they add calibration flow at some point), and this attribute will be converted to plugin property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we need later to merge this feature, then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Property is enough to solve issues in notebooks or solve issue in customers' pipelines. The features that I mentioned are needed to have better user experience, but those are not mandatory to deliver improvements to the end users.
if (m_scale_factor < 1.f) | ||
return false; | ||
|
||
std::cout << "scale_factor: " << m_scale_factor << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it.
} // namespace pass | ||
} // namespace ov | ||
|
||
class ov::pass::ActivationsScaling : public ov::pass::ModelPass { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some description of this pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added description of each newly added pass.
std::shared_ptr<ov::Node> inverse_scale_const_f16 = std::make_shared<ov::op::v0::Constant>(ov::element::f16, scale_const_shape, inverse_scale_value); | ||
std::shared_ptr<ov::Node> inverse_scale_const_f32 = std::make_shared<ov::op::v0::Constant>(ov::element::f32, scale_const_shape, inverse_scale_value); | ||
|
||
for (auto& node : f->get_ordered_ops()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can it be implemented as a set of matcher passes? I think it would be much more flexible and readable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated it as a set of match passes. Thank you.
src/common/transformations/src/transformations/common_optimizations/activations_scaling.cpp
Outdated
Show resolved
Hide resolved
src/common/transformations/src/transformations/common_optimizations/activations_scaling.cpp
Outdated
Show resolved
Hide resolved
// \ / ==> \ scale_down | ||
// \ / \ / | ||
// add add | ||
auto add = std::dynamic_pointer_cast<ov::op::v1::Add>(node); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what this pass should do in the first place is following:
in_f16 -> conv/matmul/any_other_needed_op -> out_f16
in_f16 -> Multiply(down)> conv/matmul/any_other_needed_op -> Multiply(up) -> out_f16
Which works well with matcher pass concept. Then we need to optimize case
Multiply(up) -> Multiply(down) to eliminate those. I think it can be achieved using different approaches:
- Run some common pass which merged chain of Multiply with scalar into single node (likely we have such opt already). + run NopElimination
- Respect Multiply(up) in pattern and do different handling in the callback. For instance, you match optional(Multiply_up) -> Conv pattern. If Multiply is captured, then you just move this Multiply_up node after Convolution. If not, then you insert a pair Multiply down + up around conv.
- Implement custom pass which eliminates Multiply down -> Multiply up sequences
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: Suggestion is to reuse LPT for scale nodes optimization. I.e. at first we insert Multiplies around required ops, then run LPT subset which propagates Multiply (in case of int8 it's dequantization, but it's still Multiply op). Hopefully, we can reuse LPT passes w/o modifications.
Also, MoveEltwiseUpThroughDataMov
pass will help to move those scales up in the graph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated ActivationsScaling
as you reviewed.
First, it inserts Multiply
nodes around Conv/MatMul, and eliminates these newly added Multiply
nodes by a series of optimization passes. One of them is LinOpSequenceFusion
from LPT.
dep.replace_source_output(scale_up->output(0)); | ||
} | ||
|
||
auto sdpa = std::dynamic_pointer_cast<ov::op::v13::ScaledDotProductAttention>(node); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[random spot] Please add tests for the transformation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added unit tests.
0d7c7cd
to
bc284f5
Compare
8f22485
to
ebca03d
Compare
ebca03d
to
cc4b37f
Compare
46b17ca
to
6491951
Compare
@e-ddykim, please consider this PR: huggingface/optimum-intel#994 |
cd42c04
to
5820e2b
Compare
de2a490
to
6efd1f0
Compare
Details:
Tickets: