Expose alpha and theta type for parametrized activations #1069
Conversation
Note: this PR is needed to fix

```python
def _infer_par_act_precision(self, node, types_to_infer):
    inferred_types = []

    # for now, only set if for threshold relu
```
Why not ELU while we're at it?
I left them at the default values, though you can configure them manually. Is there a better choice?
I agree with that; my comment was mostly that the comment in the code suggests this is only applicable to thresholded relu, while it actually works for three layers (leaky, thresholded, and elu).
I can modify the comment to say that the others are left at the default precision.
I think the idea of a configurable threshold is good in principle. However, this way we risk an inconsistency between parametrized activations: thresholded relu gets the same type as the input (which is the current behavior anyway), but leaky and elu get the default (which may differ from the input to the current layer), and prelu is not considered at all, so it gets the default that way. Why not handle all of them so the behavior is consistent (whatever that is: input precision for continuity, or default precision)? Also, why use the input precision when we could be smart and, since we have access to the activ_param, figure out exactly how many bits are needed to represent that number/array?
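A minimal sketch of that idea in plain Python (the helper name and the fractional-bit cap are made up for illustration, not hls4ml API): given a scalar parameter such as a LeakyReLU alpha or a ThresholdedReLU theta, find the smallest fixed-point split that stores it exactly.

```python
import math

def bits_for_scalar(value, max_frac_bits=32):
    """Smallest <int_bits, frac_bits> fixed-point split that stores `value`
    exactly, capped at `max_frac_bits` (hypothetical helper, not hls4ml API)."""
    if value == 0:
        return 1, 0
    # integer bits: enough to cover the magnitude (sign handled separately)
    int_bits = max(1, math.floor(math.log2(abs(value))) + 1)
    # fractional bits: grow until scaling by 2**frac_bits yields an integer
    frac_bits = 0
    while frac_bits < max_frac_bits and (value * (1 << frac_bits)) % 1 != 0:
        frac_bits += 1
    return int_bits, frac_bits

print(bits_for_scalar(0.25))  # (1, 2): a LeakyReLU alpha of 0.25 needs 2 fractional bits
print(bits_for_scalar(1.0))   # (1, 0): a ThresholdedReLU theta of 1.0 needs none
print(bits_for_scalar(0.3))   # hits the cap, since 0.3 is not exact in binary
```

A value like 0.3 never becomes exact in binary, which is why some cap on the fractional bits (at most matching the float mantissa) is needed.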
They have a fundamentally different function. The threshold is compared against the input values, so it makes sense for it to have the same type as the input. One can tune it if preferred, but there is an implicit connection with the input type: if the input type logically had units, the threshold would have the same units. The others are scaling factors, so they have no connection to the input type at all, and making them the same as the input type just doesn't make sense. I did consider adding prelu to the match on line 87, but then _infer_par_act_precision would just ignore it and set the default anyway, so I didn't add it to the match. But I am not fundamentally opposed to that.
I think that for threshold comparisons the same type logically makes sense, so it seems like a good setting, and I believe it is better than the default. For scaling factors we have no guidance, so the default precision makes sense.
But the suggestion about looking at the activ_param is interesting. I have to think about it. One can set the range, but not necessarily the number of bits.
For a scale (and threshold) that is a scalar, the guidance is simple: it's a number for which we can figure out the best precision to store (without using too many fractional bits to match the original stored in float). It does get trickier for PReLU, where we would need a smarter way to decide what precision suits most of the array. I understand your logic for the current behavior. I'm fine with merging this as-is and doing a "smarter" way as a follow-up.
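For the PReLU array case, one rough heuristic (again a sketch with made-up names and knobs, not hls4ml code) could pick a single format that reproduces most of the learned alphas within some tolerance:

```python
import numpy as np

def bits_for_prelu_alphas(alphas, coverage=0.95, rel_tol=1e-3, max_frac_bits=16):
    """One <int_bits, frac_bits> split for a whole alpha array: integer bits
    cover the largest magnitude, fractional bits are the fewest for which
    `coverage` of the elements are reproduced within `rel_tol`."""
    a = np.asarray(alphas, dtype=np.float64)
    largest = max(float(np.max(np.abs(a))), 1e-12)
    int_bits = max(1, int(np.floor(np.log2(largest))) + 1)
    for frac_bits in range(max_frac_bits + 1):
        scale = float(1 << frac_bits)
        q = np.round(a * scale) / scale
        close = np.abs(q - a) <= rel_tol * np.maximum(np.abs(a), 1e-12)
        if np.mean(close) >= coverage:
            return int_bits, frac_bits
    return int_bits, max_frac_bits

print(bits_for_prelu_alphas(np.full(8, 0.25)))  # (1, 2): exactly representable alphas
```

The coverage and tolerance knobs are where the "what suits most of the array" judgment would live.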
Actually, I think all activations can be updated. The output type can be unsigned for relu (but otherwise matching the input type), restricted in range for sigmoid and tanh, etc. It may be good to have another pull request that does precision propagation for all activations.
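A sketch of the rules such a propagation pass could encode (hypothetical function and conventions, with integer bits counted without the sign bit; not an existing hls4ml optimizer):

```python
def propagate_activation_type(activation, in_signed, in_int_bits, in_frac_bits):
    """Output-type rules suggested above: relu drops the sign but keeps the
    magnitude, sigmoid is bounded to (0, 1), tanh to (-1, 1); anything else
    keeps the input type."""
    if activation == 'relu':
        return False, in_int_bits, in_frac_bits   # non-negative, same range
    if activation == 'sigmoid':
        return False, 0, in_frac_bits             # values in (0, 1): no integer bits
    if activation == 'tanh':
        return True, 0, in_frac_bits              # values in (-1, 1): sign bit only
    return in_signed, in_int_bits, in_frac_bits
```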
example-models (outdated)
Technically, this got in by accident from the QONNX PR
It was not supposed to include that; I thought I was being careful. I had updated example-models in my work area, and I am not sure if they are the final ones, either. I can try to recreate this without the example-models change if you prefer.
I think it is fine to leave it in. We'll update that pointer anyway.
Description
The types of the parameters for parametrized activations were not configurable before, being set the same as either the input or the result. Generally there is no reason for that requirement, so this makes the types explicitly configurable.
Type of change
Tests
The standard activation tests (slightly modified) provide the test coverage.
Checklist
I have run `pre-commit` on the files I edited or added.