I am deploying a MobilePose CNN with a MobileNetV2 backbone on the DPU of a ZCU104 evaluation board, using Vitis-AI 3.5.
The model was trained with quantization-aware training (QAT). The difference in inference results between the quantized model (running on a workstation's GPU) and the compiled model (running on the MPSoC) is huge.
To debug this issue, I split the model in two by moving layers, one by one, from the DPU to the CPU until the results were fine. What I found is that PyTorch's `PixelShuffle` operation (a `tile` operation in the Xmodel) is the root cause.
If I run inference on the DPU only up to the layer before the first `PixelShuffle` and infer the remaining layers on the CPU, the results are good. On the other hand, if I add the first `PixelShuffle` layer to the DPU inference and still run the remaining layers on the CPU, the results are so poor as to be unusable.
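For reference, this is the kind of numeric comparison I used to decide whether a partition's results were "fine" (a minimal sketch in NumPy; the array values here are illustrative, not actual dumps from the board):

```python
import numpy as np

def compare(a, b):
    """Compare two layer outputs: return (max abs difference, cosine similarity)."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    max_abs_diff = float(np.max(np.abs(a - b)))
    cos_sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max_abs_diff, cos_sim

# identical tensors -> diff 0.0, cosine similarity 1.0
d, c = compare([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

With the split just before the first `PixelShuffle`, the GPU-quantized and DPU outputs agree closely under this metric; with the `PixelShuffle` on the DPU, they do not.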
It seems that the `PixelShuffle` operation is not being correctly compiled or inferred on the DPU (at least for this CNN model).
How is this `tile` operation implemented in the DPU? I saved the tensors before and after the `tile` operation and was unable to come up with any transformation of the input tensor that would match the output tensor. The `PixelShuffle` operation is easy to decompose into simpler operations using a reshape and a reordering of the tensor's axes, so I would expect similar behavior for the `tile` operation.
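For context, this is the decomposition I mean: `PixelShuffle` is just a reshape, an axis permutation, and another reshape. The NumPy sketch below reproduces `torch.nn.PixelShuffle` semantics, i.e. `(N, C*r*r, H, W) -> (N, C, H*r, W*r)`, and is the transformation I expected the `tile` op's output to match:

```python
import numpy as np

def pixel_shuffle(x, r):
    """NumPy equivalent of torch.nn.PixelShuffle(r): (N, C*r*r, H, W) -> (N, C, H*r, W*r)."""
    n, crr, h, w = x.shape
    c = crr // (r * r)
    x = x.reshape(n, c, r, r, h, w)      # split the channel dim into (C, r, r)
    x = x.transpose(0, 1, 4, 2, 5, 3)    # interleave: (N, C, H, r, W, r)
    return x.reshape(n, c, h * r, w * r)

# tiny sanity check: 4 channels of size 1x1 become one 2x2 plane
x = np.arange(4, dtype=np.float32).reshape(1, 4, 1, 1)
y = pixel_shuffle(x, 2)
# y[0, 0] == [[0., 1.], [2., 3.]]
```

Applying this (or any reshape/transpose variant I could think of) to the saved input tensor does not reproduce the tensor the DPU emits after the `tile` op.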
The MobilePose model I am using is implemented in PyTorch (with a MobileNetV2 backbone) and is available on GitHub at this link.
Continuation of #1069