
Expose alpha and theta type for parametrized activations #1069

Merged: 5 commits merged into fastmachinelearning:main on Oct 1, 2024

Conversation

jmitrevs (Contributor)

Description

The type of the parameter for parametrized activations was not configurable before; it was set to be the same as either the input or the result type. Generally there is no reason for that requirement, so this PR makes the type explicitly configurable.
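
As a sketch of how the new knob could be used (not taken from the PR itself): the key name 'param' and the exact config layout below are assumptions and should be checked against the hls4ml documentation.

from tensorflow import keras
import hls4ml

# Toy Keras model with a parametrized activation (LeakyReLU carries an alpha parameter).
model = keras.Sequential([
    keras.layers.Dense(8, input_shape=(16,), name='dense'),
    keras.layers.LeakyReLU(alpha=0.25, name='leaky_relu'),
])

config = hls4ml.utils.config_from_keras_model(model, granularity='name')
# Hypothetical key name 'param': give the alpha/theta parameter its own type,
# independent of the layer's input and result precisions.
config['LayerName']['leaky_relu']['Precision']['param'] = 'ap_fixed<8,2>'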

Type of change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change which adds functionality)

Tests

The standard activation tests, slightly modified, provide the test coverage.

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have installed and run pre-commit on the files I edited or added.
  • I have added tests that prove my fix is effective or that my feature works.

@jmitrevs jmitrevs added this to the v1.0.0 milestone Sep 29, 2024
@jmitrevs jmitrevs added the "please test" (Trigger testing by creating local PR branch) label Sep 29, 2024
@jmitrevs jmitrevs added and removed the "please test" label (twice) Sep 29, 2024
jmitrevs (Contributor Author) commented Sep 29, 2024

Note: this PR is needed to fix test_qonnx.py::test_sep_conv[Vitis] in #979.

@jmitrevs jmitrevs mentioned this pull request Sep 29, 2024
def _infer_par_act_precision(self, node, types_to_infer):
inferred_types = []

    # for now, only set it for thresholded relu
Contributor:

Why not ELU while we're at it?

jmitrevs (Contributor Author):

I left them at the default values, though you can configure them otherwise manually. Is there a better choice?

Contributor:

I agree with that; my comment was mostly that the comment in the code suggests this is only applicable to thresholded ReLU, while it actually works for three layers (leaky, thresholded, and ELU).

jmitrevs (Contributor Author):

I can modify the comment to say that the others are left at the default precision.

Contributor:

I think the idea of a configurable threshold is good in principle. However, this way we risk having an inconsistency between parametrized activations. Thresholded ReLU gets the same type as the input, which is the current behavior anyway, but leaky and ELU get the default (which may differ from the input to the current layer), and PReLU is not even considered, so it gets the default that way too. Why not handle all of them for consistent behavior (whatever that is; maybe input precision for continuity, maybe default precision)? Also, why use the input precision when we can be smart and figure out the required precision, given that we have access to activ_param and can work out exactly how many bits are needed to represent that number/array?

jmitrevs (Contributor Author) commented Oct 1, 2024:

They have a fundamentally different function. The threshold is compared against the input values, so it makes sense for it to have the same type as the input. One can tune it if preferred, but there is an implicit connection with the input type: if the input type logically had units, the threshold would have the same units. The others are scaling factors, so they have no connection to the input type at all; it just doesn't make sense to make them the same as the input type. I did consider adding PReLU to the match on line 87, but then _infer_par_act_precision would just ignore it and it would get the default anyway, so I didn't add it to the match. But I am not fundamentally opposed to that.

jmitrevs (Contributor Author) commented Oct 1, 2024:

I think that, logically, the same type makes sense for threshold comparisons, so it seems like a good setting, and I believe a better one than the default. For scaling factors we have no such guidance, so the default precision makes sense.

jmitrevs (Contributor Author):

But the suggestion about looking at the activ_param is interesting. I have to think about it. One can set the range, but not necessarily the number of bits.

Contributor:

For a scale (and threshold) that is a scalar, the guidance is simple: it's a number for which we can figure out the best precision to store it (without using too many decimal bits to match the original stored as a float). It does get trickier for PReLU, where we would need a smarter way to decide what precision suits most of the array best. I understand your logic for the current behavior. I'm fine with merging this as-is and doing a "smarter" approach as a follow-up.
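
As an illustration of that idea (a sketch, not code from this PR; the helper name and defaults are made up), one could derive a fixed-point precision from a scalar parameter by counting the integer bits its magnitude needs and adding fractional bits only until the value is matched, capped so a non-terminating float does not blow up the width:

def precision_for_scalar(value, max_frac_bits=16, tol=1e-6):
    """Suggest (total_width, integer_bits, signed) for a fixed-point type
    that represents `value` closely. Illustrative sketch only."""
    signed = value < 0
    magnitude = abs(value)

    # Integer bits needed for the integer part of the magnitude (at least 1).
    int_bits = max(1, int(magnitude).bit_length())

    # Add fractional bits until the quantized value is close enough,
    # but never more than max_frac_bits (e.g. 0.1 never terminates in binary).
    frac_bits = 0
    while frac_bits < max_frac_bits:
        step = 2.0 ** -frac_bits
        if abs(round(magnitude / step) * step - magnitude) <= tol:
            break
        frac_bits += 1

    width = int(signed) + int_bits + frac_bits
    return width, int(signed) + int_bits, signed

# e.g. alpha = 0.25 for LeakyReLU needs only 2 fractional bits
print(precision_for_scalar(0.25))  # (3, 1, False)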

jmitrevs (Contributor Author) commented Oct 1, 2024:

Actually, I think all activations can be updated. The output type can be unsigned for relu (but otherwise matching the input type), restricted in range for sigmoid and tanh, etc. It may be good to have another pull request that does precision propagation for all activations.
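
As a rough sketch of that possible follow-up (not part of this PR), ReLU output-type inference could drop the sign bit while otherwise keeping the input's width and scaling; the FixedType class below is a hypothetical stand-in for whatever type representation the inference pass actually uses:

from dataclasses import dataclass

@dataclass
class FixedType:
    width: int    # total bits
    integer: int  # integer bits (including the sign bit when signed)
    signed: bool

def relu_output_type(input_type: FixedType) -> FixedType:
    """ReLU output is non-negative, so a signed input type can lose its sign bit
    without losing any representable non-negative value."""
    if not input_type.signed:
        return input_type
    return FixedType(width=input_type.width - 1,
                     integer=input_type.integer - 1,
                     signed=False)

# e.g. a signed <16,6> input maps to an unsigned <15,5> output with the same LSB
print(relu_output_type(FixedType(width=16, integer=6, signed=True)))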

Review thread on example-models (outdated)
Contributor:

Technically, this got in by accident from the QONNX PR

jmitrevs (Contributor Author):

It was not supposed to include that; I thought I was being careful. I had updated example-models in my work area, and I am not sure if those are the final ones, either. I can try to recreate this without the example-models change if you prefer.

Contributor:

I think it is fine for it to remain. We'll update that pointer anyway.

@jmitrevs jmitrevs added and removed the "please test" label Oct 1, 2024
@vloncar vloncar merged commit 1654c1c into fastmachinelearning:main Oct 1, 2024
5 of 9 checks passed
Labels: please test (Trigger testing by creating local PR branch)
Projects: None yet
3 participants