I ran the “benchmark” test on some large matrices from the UFSMC on an Intel MIC (KNC), and the reported performance was quite low, approx. between 1.5 and 2.2 GFLOP/s for the CSR format (and no higher for the other formats). With Intel MKL and the same matrices, I got between 4 and 22 GFLOP/s, which is much higher.
Any thoughts on why Paralution performs so poorly on Xeon Phi?
Our current MIC implementation is not well optimized. You should make sure that MKL is enabled during the compilation process. Unfortunately, because there are very few MIC users, we currently have no plans to further improve the MIC kernels.
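For anyone comparing GFLOP/s figures like the ones above, here is a rough sketch of how a CSR SpMV rate is commonly derived. It is not the Paralution benchmark or MKL code: the `CsrMatrix` struct and the `spmv_csr` and `spmv_gflops` functions are hypothetical names made up for illustration, and the count of 2 floating-point operations per stored nonzero is an assumed (though widely used) convention.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR container (not the Paralution or MKL data structure).
struct CsrMatrix {
    std::size_t rows;
    std::vector<std::size_t> row_ptr;  // size rows + 1, start of each row in col_idx/val
    std::vector<std::size_t> col_idx;  // size nnz, column index of each stored entry
    std::vector<double> val;           // size nnz, value of each stored entry
};

// Plain serial y = A * x for a CSR matrix: one multiply and one add per stored nonzero.
std::vector<double> spmv_csr(const CsrMatrix& a, const std::vector<double>& x) {
    std::vector<double> y(a.rows, 0.0);
    for (std::size_t i = 0; i < a.rows; ++i) {
        double sum = 0.0;
        for (std::size_t k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k) {
            sum += a.val[k] * x[a.col_idx[k]];
        }
        y[i] = sum;
    }
    return y;
}

// Assumed convention: 2 * nnz floating-point operations per SpMV call.
// 'seconds' is the measured wall time for 'repetitions' calls.
double spmv_gflops(std::size_t nnz, std::size_t repetitions, double seconds) {
    return 2.0 * static_cast<double>(nnz) * static_cast<double>(repetitions)
           / seconds / 1e9;
}
```

Under that convention, a matrix with 1,000,000 nonzeros that takes 0.5 s for 100 multiplications comes out at 0.4 GFLOP/s. When comparing two libraries, it is worth checking that both use the same operation count and time the same number of repetitions.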