heir-simd-vectorizer: dot product example for ckks incorrectly transformed #1115
I tried to migrate the BGV dot product example to CKKS, and the result is incorrect. After inspection, it seems the transformation performed by heir-simd-vectorizer is wrong. The input to the pipeline is just the BGV example with every i16 substituted by f16. After running `--mlir-to-secret-arithmetic="entry-function=dot_product"`, we get IR (shown in full below) that is apparently different from the result for the BGV dot product, where a rotate-and-reduce pattern kicks in. If we execute the emitted CKKS code with input (1, 2, 3, 4, 1, 2, 3, 4), we get an incorrect result, with traces showing unexpected zero values after rotation.
How concerning! I don't immediately see why the rotate-and-reduce pass isn't properly handling the mulf/addf ops. Maybe you could run with [...]. I suspect the use of addf/mulf is triggering some incorrect match that causes some pattern or pass not to be applied. However, that shouldn't (absent bugs) cause the output to produce an incorrect result; it should just produce a less efficient program.

So, looking for other reasons the output might be incorrect: I see the line in the trace above after rotating by 6, and I believe those zero values should be nonzero. In particular, I recall that the encoding used by the simd-vectorizer passes expects 1D tensors to be repeated cyclically to fill up the available ciphertext slots, since the rotations analyzed are cyclic mod 8 (in your example, because it's a tensor of size 8). See heir/tests/Examples/openfhe/simple_sum_test.cpp, lines 34 to 42 at 51ebc8d.

Could that be causing the incorrectness?
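To make the packing mismatch concrete, here is a minimal plain-C++ sketch (no OpenFHE; the 16-slot count and the `rotate` helper are assumptions for illustration, not HEIR code). It shows how a cyclic left-rotation by 6 pulls zeros into the live slots when the 8-element tensor is zero-padded, but behaves like a rotation mod 8 when the tensor is repeated cyclically:

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Cyclic left-rotation over the full slot vector, as HE rotation keys give you.
std::vector<double> rotate(const std::vector<double>& v, int k) {
  std::size_t n = v.size();
  std::vector<double> out(n);
  for (std::size_t i = 0; i < n; ++i) out[i] = v[(i + k) % n];
  return out;
}

int main() {
  std::vector<double> x = {1, 2, 3, 4, 1, 2, 3, 4};

  // Zero padding: slots 8..15 stay 0, so rotating left by 6 pulls zeros
  // into the first 8 slots.
  std::vector<double> padded(16, 0.0);
  for (int i = 0; i < 8; ++i) padded[i] = x[i];

  // Cyclic filling: the 8-element pattern repeats, so a rotation of the
  // full slot vector acts like a rotation mod 8 on the pattern.
  std::vector<double> cyclic(16);
  for (int i = 0; i < 16; ++i) cyclic[i] = x[i % 8];

  auto p6 = rotate(padded, 6);
  auto c6 = rotate(cyclic, 6);
  for (int i = 0; i < 8; ++i) std::printf("%g %g\n", p6[i], c6[i]);
  // padded: 3 4 0 0 0 0 0 0  (zeros leak into the live slots)
  // cyclic: 3 4 1 2 3 4 1 2  (matches the expected cyclic rotation mod 8)
  return 0;
}
```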
Oh, it is exactly this reason: after copying the cyclic filling code from test/Examples, the result is correct.

```cpp
std::vector<double> x1 = {1, 2, 3, 4, 1, 2, 3, 4};
// In OpenFHE's CKKS, the number of plaintext slots is ringDimension / 2.
int32_t n =
    cryptoContext->GetCryptoParameters()->GetElementParams()->GetRingDimension() / 2;
// Repeat the 8-element input cyclically so that every slot is filled.
std::vector<double> outputs;
outputs.reserve(n);
for (int i = 0; i < n; ++i) {
  outputs.push_back(x1[i % 8]);
}
const auto& ptxt1 = cryptoContext->MakeCKKSPackedPlaintext(outputs);
const auto& c1 = cryptoContext->Encrypt(keyPair.publicKey, ptxt1);
```
The difference from the BGV case is that the first rotation operates on %2 (the product plus the constant) while the later ops reuse %1, so the rotations do not share a single root. If I manually delete the constant after [...], the pattern is handled correctly.

The current pattern is

```mlir
%cst = arith.constant dense<0.000000e+00> : tensor<8xf16>
%1 = arith.mulf %arg2, %arg3 : tensor<8xf16>
%2 = arith.addf %1, %cst : tensor<8xf16>
%3 = tensor_ext.rotate %2, %c6 : tensor<8xf16>, index
%4 = tensor_ext.rotate %1, %c7 : tensor<8xf16>, index
// mixed use of %1 and %2 afterwards
```

The constant tensor could be tracked so that %3/%4 can have the same root, as mentioned in #522, but that is hard to handle, since the constant tensor gets rotated later; I avoided tracking saved tensors in earlier PRs. And for this specific case, I think apply-folders should eliminate this constant tensor. See heir/lib/Analysis/RotationAnalysis/RotationAnalysis.h, lines 218 to 224 at 51ebc8d.
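For intuition, here is a hedged plain-C++ sketch of the log-depth reduction that rotate-and-reduce looks for (the helper names are mine, and this is the general technique rather than HEIR's implementation). The key property is that every rotation in the tree derives from the same root value:

```cpp
#include <cstddef>
#include <vector>

// Cyclic left-rotation, standing in for tensor_ext.rotate.
static std::vector<double> rotate(const std::vector<double>& v, int k) {
  std::vector<double> out(v.size());
  for (std::size_t i = 0; i < v.size(); ++i) out[i] = v[(i + k) % v.size()];
  return out;
}

// Log-depth rotate-and-reduce over 8 slots: after the loop, every slot
// holds the sum of all 8 original slots. Each step folds the previous
// partial sum back into itself, so the entire tree has a single root t.
static std::vector<double> rotateAndReduce(std::vector<double> t) {
  for (int shift = 4; shift >= 1; shift /= 2) {
    std::vector<double> r = rotate(t, shift);
    for (std::size_t i = 0; i < t.size(); ++i) t[i] += r[i];
  }
  return t;
}
```

In the pattern above, the chain instead starts from %2 = %1 + %cst while the later additions reuse %1, so no single root exists and the analysis falls back to the less efficient linear chain.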
Full IR below:

```mlir
module {
func.func @dot_product(%arg0: !secret.secret<tensor<8xf16>>, %arg1: !secret.secret<tensor<8xf16>>) -> !secret.secret<f16> {
%c7 = arith.constant 7 : index
%cst = arith.constant dense<0.000000e+00> : tensor<8xf16>
%c6 = arith.constant 6 : index
%0 = secret.generic ins(%arg0, %arg1 : !secret.secret<tensor<8xf16>>, !secret.secret<tensor<8xf16>>) {
^bb0(%arg2: tensor<8xf16>, %arg3: tensor<8xf16>):
%1 = arith.mulf %arg2, %arg3 : tensor<8xf16>
%2 = arith.addf %1, %cst : tensor<8xf16>
%3 = tensor_ext.rotate %2, %c6 : tensor<8xf16>, index
%4 = tensor_ext.rotate %1, %c7 : tensor<8xf16>, index
%5 = arith.addf %3, %4 : tensor<8xf16>
%6 = arith.addf %5, %1 : tensor<8xf16>
%7 = tensor_ext.rotate %6, %c6 : tensor<8xf16>, index
%8 = arith.addf %7, %4 : tensor<8xf16>
%9 = arith.addf %8, %1 : tensor<8xf16>
%10 = tensor_ext.rotate %9, %c6 : tensor<8xf16>, index
%11 = arith.addf %10, %4 : tensor<8xf16>
%12 = arith.addf %11, %1 : tensor<8xf16>
%13 = tensor_ext.rotate %12, %c7 : tensor<8xf16>, index
%14 = arith.addf %13, %1 : tensor<8xf16>
%extracted = tensor.extract %14[%c7] : tensor<8xf16>
secret.yield %extracted : f16
} -> !secret.secret<f16>
return %0 : !secret.secret<f16>
}
}
```
Or consider this case migrated to BGV: a dot product with a non-zero initial sum, which rotate-and-reduce also cannot handle correctly. Should we handle this in insert-rotate, so that the inserted rotations share the same root? Right now the mixed use of %1 and %2 is not friendly to the analysis (see the sketch after the IR below). For the input

```mlir
func.func @dot_product(%arg0: tensor<8xi16>, %arg1: tensor<8xi16>) -> i16 {
%c0 = arith.constant 0 : index
%c0_si16 = arith.constant 10 : i16
%0 = affine.for %arg2 = 0 to 8 iter_args(%iter = %c0_si16) -> (i16) {
%1 = tensor.extract %arg0[%arg2] : tensor<8xi16>
%2 = tensor.extract %arg1[%arg2] : tensor<8xi16>
%3 = arith.muli %1, %2 : i16
%4 = arith.addi %iter, %3 : i16
affine.yield %4 : i16
}
return %0 : i16
}
```

we get:

```mlir
module {
func.func @dot_product(%arg0: !secret.secret<tensor<8xi16>>, %arg1: !secret.secret<tensor<8xi16>>) -> !secret.secret<i16> {
%c6 = arith.constant 6 : index
%cst = arith.constant dense<10> : tensor<8xi16>
%c7 = arith.constant 7 : index
%0 = secret.generic ins(%arg0, %arg1 : !secret.secret<tensor<8xi16>>, !secret.secret<tensor<8xi16>>) {
^bb0(%arg2: tensor<8xi16>, %arg3: tensor<8xi16>):
%1 = arith.muli %arg2, %arg3 : tensor<8xi16>
%2 = arith.addi %1, %cst : tensor<8xi16>
%3 = tensor_ext.rotate %2, %c6 : tensor<8xi16>, index
%4 = tensor_ext.rotate %1, %c7 : tensor<8xi16>, index
%5 = arith.addi %3, %4 : tensor<8xi16>
%6 = arith.addi %5, %1 : tensor<8xi16>
%7 = tensor_ext.rotate %6, %c6 : tensor<8xi16>, index
%8 = arith.addi %7, %4 : tensor<8xi16>
%9 = arith.addi %8, %1 : tensor<8xi16>
%10 = tensor_ext.rotate %9, %c6 : tensor<8xi16>, index
%11 = arith.addi %10, %4 : tensor<8xi16>
%12 = arith.addi %11, %1 : tensor<8xi16>
%13 = tensor_ext.rotate %12, %c7 : tensor<8xi16>, index
%14 = arith.addi %13, %1 : tensor<8xi16>
%extracted = tensor.extract %14[%c7] : tensor<8xi16>
secret.yield %extracted : i16
} -> !secret.secret<i16>
return %0 : !secret.secret<i16>
}
}
```
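One way to picture the suggested fix is this hedged plain-C++ sketch (the function and its structure are mine, under the assumption that the loop's initial value can legally be folded in after the reduction): reduce the product tensor alone, so every rotation has the single root %1, and add the initial value exactly once at the end.

```cpp
#include <vector>

// Dot product with a non-zero initial sum (the iter_args value 10 above).
// The reduction touches only the product tensor, so all rotations share
// one root; the initial value is added once after extraction.
static int dotProductWithInit(const std::vector<int>& a,
                              const std::vector<int>& b, int init) {
  std::vector<int> t(8);
  for (int i = 0; i < 8; ++i) t[i] = a[i] * b[i];  // the analogue of %1
  for (int shift = 4; shift >= 1; shift /= 2) {    // rotate-and-reduce on %1
    std::vector<int> r(8);
    for (int i = 0; i < 8; ++i) r[i] = t[(i + shift) % 8];
    for (int i = 0; i < 8; ++i) t[i] += r[i];
  }
  return t[0] + init;  // fold in the initial sum once, at the very end
}
```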
I'm not sure I fully understand what you're suggesting, but let me try to restate it:

If we could figure out why the floating-point constant is not folded away, we could solve the immediate problem; but in your example with a non-zero initial value for the dot product, I think this problem would persist in another form. It could perhaps be fixed by changing insert-rotate so that it aligns things properly, or rotate-and-reduce could be changed to recognize a reduction in a smarter way than looking at a single linear chain (#522). I support either of those improvements. I think #521 might also allow a workaround wherein rotate-and-reduce works for the entire vector except the first element in the chain, which would be nearly optimal and give other side benefits to IRs that don't do complete reductions.