Expose alpha and theta type for parametrized activations #1069
Changes from 4 commits: d804b2d, b19368c, 9c8c1dd, 22f3800, e68b770
```diff
@@ -84,6 +84,9 @@ def _infer_precision(self, node, types_to_infer):
         if node_class in ['SimpleRNN', 'LSTM', 'GRU']:
             return self._infer_rnn_precision(node, types_to_infer)

+        if node_class in ['ParametrizedActivation']:
+            return self._infer_par_act_precision(node, types_to_infer)
+
         # What about quantized activation layer? Setting it to 'auto' manually will break it here. We should prevent
         # this in config_from_* functions
```
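The hunk above dispatches `ParametrizedActivation` nodes to a new helper, `_infer_par_act_precision`, added in the next hunk. For context, here is a rough sketch of how that dispatch would be reached from a user model. It assumes Keras 2-style layers and a name-granularity hls4ml config; the layer names and the `'param'` precision key are illustrative guesses, not taken from this diff:

```python
import hls4ml
from tensorflow import keras

# Toy model with a parametrized activation (theta is the threshold parameter).
model = keras.Sequential(
    [
        keras.layers.Dense(16, input_shape=(8,), name='dense'),
        keras.layers.ThresholdedReLU(theta=0.75, name='thresh_act'),
    ]
)

config = hls4ml.utils.config_from_keras_model(model, granularity='name')

# Mark the activation parameter precision as 'auto' so the precision-inference
# pass decides it; per this diff, ThresholdedReLU would then inherit the
# layer's input precision for 'param_t'. (The 'param' key name is an assumption.)
config['LayerName']['thresh_act']['Precision']['param'] = 'auto'

hls_model = hls4ml.converters.convert_from_keras_model(model, hls_config=config)
```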
The second hunk adds the helper itself:

```diff
@@ -557,3 +560,14 @@ def _infer_rnn_precision(self, node, types_to_infer):
                 inferred_types.append(f'{weightvar}_t')

         return inferred_types
+
+    def _infer_par_act_precision(self, node, types_to_infer):
+        inferred_types = []
+
+        # for now, only set if for threshold relu
+        if 'param_t' in inferred_types and self.get_attr('activation').lower() == 'thresholdedrelu':
+            in_type = node.get_input_variable().type.precision
+            node.attributes['param_t'].type = in_type
+            inferred_types.append('param_t')
+
+        return inferred_types
```

Review thread on the added `# for now, only set if for threshold relu` comment:

- Why not ELU while we're at it?
- I left them to be the default values, though you can configure them otherwise manually. Is there a better choice?
- I agree with that; my comment was mostly that the comment in the code suggests this is only applicable to thresholded relu, while it actually works for three layers (leaky, thresholded and elu).
- I can modify the comment to say that the others are left to the default precision.
- I think the idea of a configurable threshold is good in principle. However, this way we risk having an inconsistency between parametrized activations: thresholded relu gets the same type as its input, which is the current behavior anyway, but leaky and elu get the default (which may differ from the input to the current layer), and prelu is not even considered, so it gets the default that way. Why not handle all of them with a consistent behavior (whatever that is; it may be input precision for continuity, or default precision)? Also, why use the input precision when we can be smart and figure out the required precision, given that you have access to the …
- They have a fundamentally different function. The threshold compares the input values against a threshold, so it makes sense for it to be the same type as the input. One can tune it if preferred, but there is an implicit connection with the input type: if the input type logically had units, the threshold would have the same units. The others are scaling factors, so they have no connection to the input type at all; it just doesn't make sense to make them the same as the input type. There is a fundamental difference. I did consider adding prelu to the match on line 87, but then …
- I think that, logically, the same type makes sense for threshold comparisons, so it seems like a good setting, and I believe better than the default. For scaling factors we have no guidance, so the default precision makes sense.
- But the suggestion about looking at the …
- For a scale (and threshold) that is a scalar, the guidance is simple: it's a number for which we can figure out the best precision to store (without using more decimal bits than needed to match the original stored in float). It does get trickier for PReLU, where we would need a smarter way to decide what precision suits most of the array best. I understand your logic for the current behavior. I'm fine with merging this as-is and doing a "smarter" version as a follow-up.
- Actually, I think all activations can be updated: the output type can be unsigned for relu (but otherwise matching the input type), restricted in range for sigmoid and tanh, etc. It may be good to have another pull request that does precision propagation for all activations.
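The "figure out the best precision for a scalar" idea from the thread above can be made concrete with a small, self-contained sketch. This is plain Python, not hls4ml API; `scalar_fixed_precision` and its ap_fixed-style (width, integer, signed) return convention are illustrative assumptions:

```python
import math


def scalar_fixed_precision(value, max_frac_bits=16):
    """Illustrative helper: choose fixed-point settings for a scalar parameter
    such as a LeakyReLU alpha or a ThresholdedReLU theta, without spending more
    fractional bits than needed to reproduce the float (capped at max_frac_bits)."""
    signed = value < 0
    mag = abs(value)

    # Integer bits needed for the magnitude's integer part.
    int_bits = math.ceil(math.log2(math.floor(mag) + 1)) if mag >= 1 else 0

    # Smallest number of fractional bits that represents the value exactly,
    # e.g. 0.25 needs 2 bits; something like 0.1 falls back to the cap.
    frac_bits = 0
    while frac_bits < max_frac_bits and (mag * (1 << frac_bits)) % 1 != 0:
        frac_bits += 1

    sign_bit = 1 if signed else 0
    width = max(int_bits + frac_bits + sign_bit, 1)
    return width, int_bits + sign_bit, signed


# alpha = 0.25 needs only 2 fractional bits; theta = 6.0 needs 3 integer bits.
print(scalar_fixed_precision(0.25))  # (2, 0, False)
print(scalar_fixed_precision(6.0))   # (3, 3, False)
```

Something along these lines could later extend to PReLU by scanning the alpha array and picking a precision that suits most of its entries, as discussed above, and the same style of reasoning applies to the follow-up idea of propagating precision through all activations (e.g. a sigmoid output lies in (0, 1), so it needs an unsigned type with essentially no integer bits).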
Review thread on the example-models pointer change:

- Technically, this got in by accident from the QONNX PR.
- It was not supposed to include that; I thought I was being careful. I had updated example-models in my work area, and I am not sure if they are the final ones, either. I can try to recreate this without the example-models change if you prefer.
- I think it is fine to remain. We'll update that pointer anyway.