Metrics metadata support #1240
I'd be interested in helping with this feature. I implemented a similar feature for Neural Magic's Sparsezoo (calculate_ops.py). I propose supporting both weight sparsity metrics and operation metrics. Counting operations depends on whether the runtime engine and hardware support sparsity, block sparsity, and quantization. The UI design should be capable of supporting these subtypes, if not now then in the future.
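A minimal sketch of what a weight sparsity metric could look like, assuming the decoded tensor is available as a flat array of numbers (the function and its signature are illustrative, not an existing Netron API):

```js
// Fraction of zero-valued elements in a decoded weight tensor.
const weightSparsity = (values) => {
    if (!values || values.length === 0) {
        return 0;
    }
    let zeros = 0;
    for (const value of values) {
        if (value === 0) {
            zeros++;
        }
    }
    return zeros / values.length;
};
```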
As for visualizing operations, I'm in favor of separating the UI from the node metadata tab so as to make it clear that these performance (operation) metrics are computed values, separate from the data embedded in the model file. For example, they could be a toggleable UI element displayed to the left of a node.
Another UI idea might be toggleable horizontal bars which appear to the left of a node and vary in size depending on how many operations of which types are associated with it. There should also be a UI element for the total number of ops/sparsity in the model, perhaps in the bottom right.
@kylesayrs all great questions. There seem to be 3 types of data, and a question of how these are unified and at which API layer they are exposed.
Since there are likely metrics at the model, graph, node, and weight level, initially exposing them as another section in the properties pages might be a good way to get started. Which data types exist for metrics in the API? For example, if …
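One possible answer, sketched as a plain data shape (field names and values are illustrative, not an existing API): each metric could carry a name, a value, and a type tag, so the properties view can render it like an attribute or metadata entry.

```js
// Illustrative metric entries at the model and tensor level; the shape mirrors
// how attributes and metadata are rendered as name/value pairs.
const modelMetrics  = [{ name: 'parameters', value: 7000000, type: 'int64' }];
const tensorMetrics = [{ name: 'sparsity', value: 0.85, type: 'float32' }];
```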
@lutzroeder Hm, we can implement two classes,
To respond to the questions you posed
Let me know what you think.

†Note that in order to calculate per-node metrics for ONNX, we'll need to hard-code which arguments are weights and which are biases for each op type.
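A sketch of what such a hard-coded mapping could look like for a few common ONNX op types (the input indices follow the ONNX operator specifications; the table itself and its name are illustrative):

```js
// Illustrative map from ONNX op type to the input indices that hold the
// weight and bias arguments (per the ONNX operator specifications).
const onnxParameterInputs = {
    'Conv':          { weight: 1, bias: 2 },
    'ConvTranspose': { weight: 1, bias: 2 },
    'Gemm':          { weight: 1, bias: 2 }
};
```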
I prefer the compact view, at least for the frontend. The backend can maintain separate members for …
For weight tensors there should be a Tensor Properties view similar to #1122. This will be needed for visualizing tensor data (#65), avoids duplicating tensor information, gives each tensor a less crowded space, and solves the issue of mixing node metrics and tensor metrics. The individual metrics would be rendered similarly to attributes or metadata, which hopefully results in a single mechanism across attributes, metadata, and metrics to annotate the graph. For implementation, …
@lutzroeder In order to analyze sparsity, …
Tensor decoding is generalized in …
If the general metrics implementations get complex and impact load times, it might be worth considering dynamically loading a module from … For tensors, the challenge is that multiple changes are needed to enable #1285. Some formats have separate concepts for tensor initializer and tensor, and for how to opt into quantization; what level of abstraction should this view operate on?
Copying my thoughts on default metrics here.
@lutzroeder I personally prefer to not show default (format-agnostic) metrics, since these metrics are guaranteed to be unreliable without the full context of the format. I propose an implementation where each format …

```js
view.Tensor = class {
    constructor(tensor) {
        this._tensor = tensor; // type: xxx.Tensor
        this._metrics = null;
    }
    // ...
    get metrics() {
        if (this._metrics === null) {
            const value = this.value;
            this._metrics = this._tensor.computeMetrics(value);
        }
        return this._metrics;
    }
};
```
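For context, a hypothetical format-side counterpart to the snippet above could look as follows (the `onnx.Tensor` placement, the `computeMetrics` contract, and the metric chosen are assumptions for illustration):

```js
const onnx = {}; // placeholder namespace for this sketch only

// The format-specific tensor decides which metrics make sense and computes
// them from the decoded values handed over by view.Tensor.
onnx.Tensor = class {
    computeMetrics(values) {
        const zeros = values.reduce((count, v) => count + (v === 0 ? 1 : 0), 0);
        return [{ name: 'sparsity', value: zeros / values.length }];
    }
};
```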
No, as that would require lower-level APIs to take dependencies on a higher-level API surface. It would also lead to an explosion of metric implementations, as each metric would have to be implemented for all formats. Those should probably be computed by the runtime itself and stored as metadata in the model file or a supplemental file. What are three specific examples where a general metric is unreliable? Would it be possible to generalize the lower-level tensor format to support these cases? The answer might be different for tensor formats (which tend to generalize well) and node formats (which often require processing to make the graph readable).
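A sketch of what runtime-produced metrics stored in a supplemental file could look like (the file name and schema are purely illustrative; nothing like this is specified in the issue):

```js
// Purely illustrative shape for a supplemental metrics file (e.g. model.metrics.json)
// produced by a runtime or profiler and loaded alongside the model file.
const supplementalMetrics = {
    model: { flops: 1230000000, parameters: 7000000 },
    nodes: { 'conv1': { flops: 230000000 } },
    tensors: { 'conv1.weight': { sparsity: 0.85 } }
};
```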
I'll focus on just tensor sparsity, arguably the most basic of the metrics. This metric only makes sense in the context of how a theoretical inference engine would perform inference, and how an inference engine performs inference depends on what operation is being performed.
It's my opinion that computing sparsity for tensors in these scenarios is misleading. For example, let's say we implement a metric search feature. If someone wants to use this feature to find tensors which they should prune, they would query for tensors with sparsity < 80% and potentially get back a random …
I think I need more context or an example of what you're thinking of. You're suggesting grouping tensors? This would still require some format-specific implementation, although it might help.
To be fair, the end goal is to support FLOPS, which is by far the most requested metric (#204). FLOPS are 100% operation-specific, so trying to support FLOP counts for all operations across all formats is far too large a scope in the first place.
I think sparsity and operation count metrics are a super valuable feature and could really help a lot of people get a sense of how large their models are, how well they will perform, and where the performance bottlenecks are. No implementation will be perfect, since metrics related to model performance are entirely dependent on the hardware and the level of sparsity-awareness of the inference engine. I think if we want to support FLOPS, we need to implement it on a per-format basis, which means a slow format-wise rollout.
One idea would be that format-specific implementations can disable a default metric by returning a value like …
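As one possible convention (the sentinel value is an assumption, since the comment above does not specify it), a format-specific implementation could return null and the view would skip rendering metrics in that case:

```js
// Assumed convention: computeMetrics returns null to opt out of default metrics.
const resolveMetrics = (tensor) => {
    const metrics = tensor.computeMetrics(tensor.value);
    return metrics === null ? [] : metrics; // null disables the metrics section
};
```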
Trying to better understand this. FLOPS would be computed at the node level, not at the tensor level? Is access to the tensor data needed to compute FLOPS?
I think this is a good idea too. Setting these …
I think this is a good point. It seems like parameter sparsity only applies to tensors, but operations (and operation sparsity) only apply to nodes. Separating out the two seems to solve a lot of the ambiguity about how each should be applied.
I think this could be fine if we give the user the functionality to query for operation sparsity, and maybe a disclaimer about trying to interpret parameter (tensor) sparsity. In general, when coloring the graph and querying the graph, we should point people towards using operations and operation sparsity. In fact, coloring a node by parameter sparsity doesn't make much sense, since each node can have multiple tensors (parameters). EDIT: …
I need to think more about this one. For all the cases I can think of, I think we can get away with only needing the tensor sparsity (or block sparsity). Note that in order to compute FLOPS, we need to know input sizes.
We can start by implementing context-unaware tensor sparsity on the frontend. We should add a disclaimer via a help tooltip that indicates that sparsity may not always apply. We are then free to slowly roll out format-specific …

In terms of implementing node FLOPs, this is a little bit harder. FLOPs calculations clearly need to be operation-type specific, and I think most operations can be calculated using the parameter sparsity metric alone. I can research this a little more and get back to you.
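For reference, a sketch of operation-type-specific FLOPs estimates for two common cases, optionally discounted by weight sparsity (the op names follow ONNX; the shape arguments and the discounting approach are assumptions):

```js
// Rough FLOPs estimates for a few op types, assuming known shapes.
// `shapes` is assumed to provide input, weight, and output dimensions.
const estimateFlops = (opType, shapes, weightSparsity = 0) => {
    let dense = 0;
    switch (opType) {
        case 'Gemm':
        case 'MatMul': {
            const [m, k] = shapes.input;            // (M x K) * (K x N)
            const n = shapes.weight[1];
            dense = 2 * m * k * n;                  // multiply + accumulate
            break;
        }
        case 'Conv': {
            const [, cin, kh, kw] = shapes.weight;  // (Cout, Cin, Kh, Kw), groups ignored
            const [batch, cout, oh, ow] = shapes.output;
            dense = 2 * batch * cout * oh * ow * cin * kh * kw;
            break;
        }
        default:
            return null;                            // unknown op type: no estimate
    }
    return dense * (1 - weightSparsity);            // naive sparsity discount
};
```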
After asking around, it seems like some sparsity-aware runtimes such as the DeepSparse engine do skip padding operations, meaning that the actual positioning of the zeros affects the total number of operations. This may or may not be a factor we include. |
Scenarios:
Questions: