Hi, my name is John.
Thanks for open-sourcing your code. In your code, both weights and activations are quantized to UINT8 [0, 255], so unsigned 8x8 multipliers are used. If I would like to quantize the weights and activations to [-127, 127] and use signed 8x8 multipliers instead, how should I adjust the code to achieve this? Thanks again.
Hi John,
The approach is exactly the same as the one I described in issue #5. The quantization works as follows: the minimum value maps to 0 and the maximum value maps to 255. You can simply shift this interval, because the data are stored sequentially in the bin file.
You will then have to shift the result from the interval -32768 to 32767 into the interval 0 to 65535.
The only drawback of this type of quantization is that 0.0 (float) is typically not represented as 0.
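// Needs <stdio.h> and <stdint.h>.
FILE *f = fopen("output.bin", "wb");
for (int a = -128; a < 128; a++)       // signed loop variables; unsigned ones would never enter the loop
    for (int b = -128; b < 128; b++) {
        int16_t val = approximate_mult(a, b);      // replace by your own function call
        uint16_t val_u = (uint16_t)(val + 32768);  // shift -32768..32767 to 0..65535
        fwrite(&val_u, sizeof(uint16_t), 1, f);
    }
fclose(f);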
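For reading such a table back, a minimal sketch might look like the following. It assumes the file holds exactly 256 x 256 `uint16_t` entries in the loop order above (`a` outer, `b` inner) and is read on a machine with the same byte order as the one that wrote it. The names `lut_index`, `lut_decode`, `lut_load` and `lut_mult` are hypothetical helpers for illustration and are not part of the repository.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define LUT_SIDE 256                    /* operands span -128..127 */
#define LUT_SIZE (LUT_SIDE * LUT_SIDE)  /* 65536 table entries */

/* Position of the pair (a, b) in the table: each operand is shifted
 * by +128 so that -128 maps to row/column 0 and 127 maps to 255. */
size_t lut_index(int a, int b)
{
    assert(a >= -128 && a <= 127 && b >= -128 && b <= 127);
    return (size_t)(a + 128) * LUT_SIDE + (size_t)(b + 128);
}

/* Undo the +32768 offset that was applied before writing. */
int16_t lut_decode(uint16_t stored)
{
    return (int16_t)((int32_t)stored - 32768);
}

/* Read the whole table into memory; returns NULL on any failure.
 * The caller releases the buffer with free(). */
uint16_t *lut_load(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;
    uint16_t *lut = malloc(LUT_SIZE * sizeof *lut);
    if (lut && fread(lut, sizeof *lut, LUT_SIZE, f) != LUT_SIZE) {
        free(lut);   /* short read: file is not a full table */
        lut = NULL;
    }
    fclose(f);
    return lut;
}

/* Signed product of a and b as stored in the table. */
int16_t lut_mult(const uint16_t *lut, int a, int b)
{
    return lut_decode(lut[lut_index(a, b)]);
}
```

With a table loaded via `lut_load("output.bin")`, a call such as `lut_mult(lut, -3, 7)` would return whatever `approximate_mult(-3, 7)` produced when the file was generated.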