CUDA: faster dequantize kernels for Q4_0 and Q4_1 (#4938) #108
Job | Run time |
---|---|
1 | 1m 21s |
2 | 3m 39s |
3 | 1m 11s |
4 | 1m 12s |
5 | 4m 28s |
6 | 2m 44s |
7 | 1m 35s |
8 | 2m 25s |
9 | 2m 33s |
10 | 1m 43s |
11 | 1m 50s |
12 | 1m 56s |
13 | 6m 55s |
14 | 4m 30s |
15 | 17m 45s |
16 | 3m 14s |
17 | 6m 56s |
18 | 3m 30s |
19 | 6m 10s |
20 | 4m 6s |
21 | 4m 10s |
22 | 17m 46s |
23 | 7m 51s |
24 | 17m 44s |
25 | 2m 19s |
26 | 3m 20s |
27 | 46s |
Total | 2h 13m 39s |
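The total of 2h 13m 39s equals the sum of the 27 per-job run times. Below is a minimal Python sketch that checks this. It assumes every duration string has the form `Xh Ym Zs` with any unit possibly absent. The helper names `parse_duration` and `format_duration` are hypothetical and belong to no CI tool's API.

```python
import re

# Per-job run times copied from the table above (job names were not captured).
RUN_TIMES = [
    "1m 21s", "3m 39s", "1m 11s", "1m 12s", "4m 28s", "2m 44s", "1m 35s",
    "2m 25s", "2m 33s", "1m 43s", "1m 50s", "1m 56s", "6m 55s", "4m 30s",
    "17m 45s", "3m 14s", "6m 56s", "3m 30s", "6m 10s", "4m 6s", "4m 10s",
    "17m 46s", "7m 51s", "17m 44s", "2m 19s", "3m 20s", "46s",
]

UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> int:
    """Convert a duration such as '2h 13m 39s' or '46s' to whole seconds (hypothetical helper)."""
    return sum(int(value) * UNIT_SECONDS[unit]
               for value, unit in re.findall(r"(\d+)([hms])", text))


def format_duration(seconds: int) -> str:
    """Render seconds back into the 'Xh Ym Zs' style, omitting zero-valued units."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if minutes:
        parts.append(f"{minutes}m")
    if secs or not parts:
        parts.append(f"{secs}s")
    return " ".join(parts)


total_seconds = sum(parse_duration(t) for t in RUN_TIMES)
print(format_duration(total_seconds))  # → 2h 13m 39s
```

The sketch parses each string into seconds before summing. Adding minutes and seconds field by field would force a manual carry: the seconds fields alone total 879 s, which is 14m 39s.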