"Optimize speed even more aggressively sacrificing memory or precision by using 8bit integer GEMM with intgemm instead of floats. Only available on CPU. Corresponds to --gemm-precision int8");
cli.add<bool>("--int8Alpha",
"Use a precomputed quantisation multipliers for the activations. Requires a special model. Corresponds to --gemm-precision int8Alpha");
cli.add<bool>("--int8shift",
"Use a faster, shifted integer 8bit GEMM implementation. Corresponds to --gemm-precision int8shift");
cli.add<bool>("--int8shiftAlpha",
"Use a faster, shifted integer 8bit GEMM implementation, with precomputed alphas. Corresponds to --gemm-precision int8shiftAlpha");
cli.add<bool>("--int8shiftAll",
"Use a faster, shifted integer 8bit GEMM implementation even for matrices that don't have a bias. Beneficial on VNNI. Corresponds to --gemm-precision int8shiftAll");
cli.add<bool>("--int8shiftAlphaAll",
"Use a faster, shifted integer 8bit GEMM implementation even for matrices that don't have a bias, with precomputed alphas. Should be the fastest option. Corresponds to --gemm-precision int8shiftAlphaAll");
cli.add<std::string>("--gemm-precision",
"Use lower precision for the GEMM operations only. Supported values: float32, int16, int8, int8Alpha, int8shift, int8shiftAlpha, int8shiftAll, int8shiftAlphaAll", "float32");
cli.add<bool>("--dump-quantmult",
--gemm-precision and the --int* flags are two ways to do the same thing. Functionality would still work and be accessible without the flags quoted above.

(The snippet above is from marian-dev/src/common/config_parser.cpp, lines 933 to 947 at commit 844800e.)
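
To make the overlap concrete, here is a minimal standalone sketch. It is not marian-dev code: the helper gemmPrecisionFromFlags and the flag map are invented for illustration, while the flag names and the "float32" default are taken from the snippet above. It shows how each boolean --int8* flag collapses onto a single --gemm-precision value, which is why dropping the boolean aliases would not lose any functionality.

// Sketch only: gemmPrecisionFromFlags and the pre-parsed flag map are
// hypothetical and not part of marian-dev; flag names and the "float32"
// default mirror the config_parser.cpp snippet quoted above.
#include <iostream>
#include <map>
#include <string>

// Collapse the boolean --int8* flags onto the equivalent --gemm-precision
// value; the most specific variants are checked first.
std::string gemmPrecisionFromFlags(const std::map<std::string, bool>& flags) {
  static const char* kVariants[] = {"int8shiftAlphaAll", "int8shiftAll", "int8shiftAlpha",
                                    "int8shift",         "int8Alpha",    "int8"};
  for (const char* variant : kVariants) {
    auto it = flags.find(std::string("--") + variant);
    if (it != flags.end() && it->second)
      return variant;  // e.g. --int8shift  ->  --gemm-precision int8shift
  }
  return "float32";  // default when none of the --int8* flags is set
}

int main() {
  std::map<std::string, bool> flags{{"--int8shift", true}};
  std::cout << "--gemm-precision " << gemmPrecisionFromFlags(flags) << "\n";
  return 0;
}

On the command line the same equivalence holds: per the option descriptions, passing --int8shift corresponds to passing --gemm-precision int8shift, and likewise for the other --int8* variants.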