The following figures show performance benchmarks of several BLAS1 and SpMV routines from the PARALUTION library on multi-core CPUs, NVIDIA GPUs, AMD GPUs and Xeon Phi (MIC). All results were obtained in double precision.
Furthermore, a non-preconditioned CG solver is run on a Laplace matrix resulting from a finite difference discretization of the unit square with 4,000,000 unknowns.
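To make the CG test case concrete, here is a minimal pure-Python sketch of the same kind of computation. It is not PARALUTION code and does not reproduce the benchmark: it assumes the standard 5-point stencil with Dirichlet boundaries and CSR storage (none of which is stated above), and all function names are hypothetical. It also shows where the benchmarked kernels appear: one SpMV and a handful of BLAS1 operations (dot, axpy) per iteration.

```python
import math

def laplace_2d_csr(n):
    """Assumed 5-point finite difference Laplacian on an n x n grid, in CSR form."""
    row_ptr, col_idx, values = [0], [], []
    for i in range(n):
        for j in range(n):
            k = i * n + j
            if i > 0:
                col_idx.append(k - n); values.append(-1.0)
            if j > 0:
                col_idx.append(k - 1); values.append(-1.0)
            col_idx.append(k); values.append(4.0)
            if j < n - 1:
                col_idx.append(k + 1); values.append(-1.0)
            if i < n - 1:
                col_idx.append(k + n); values.append(-1.0)
            row_ptr.append(len(col_idx))
    return row_ptr, col_idx, values

def spmv(A, x):
    """Sparse matrix-vector product y = A x for a CSR matrix."""
    row_ptr, col_idx, values = A
    return [sum(values[p] * x[col_idx[p]]
                for p in range(row_ptr[r], row_ptr[r + 1]))
            for r in range(len(row_ptr) - 1)]

def dot(x, y):
    """BLAS1 dot product."""
    return sum(a * b for a, b in zip(x, y))

def axpy(alpha, x, y):
    """BLAS1 axpy: returns alpha * x + y."""
    return [alpha * a + b for a, b in zip(x, y)]

def cg(A, b, tol=1e-10, max_iter=1000):
    """Non-preconditioned conjugate gradient for a symmetric positive definite A."""
    x = [0.0] * len(b)
    r = list(b)                # residual b - A x for the zero initial guess
    p = list(r)
    rho = dot(r, r)
    b_norm = math.sqrt(dot(b, b))
    for it in range(max_iter):
        if math.sqrt(rho) <= tol * b_norm:
            return x, it
        q = spmv(A, p)
        alpha = rho / dot(p, q)
        x = axpy(alpha, p, x)
        r = axpy(-alpha, q, r)
        rho_new = dot(r, r)
        p = axpy(rho_new / rho, p, r)   # p = r + beta * p
        rho = rho_new
    return x, max_iter

# Small demo grid; n = 2000 would give the 4,000,000 unknowns mentioned above,
# assuming a square 2000 x 2000 grid (far too slow in pure Python).
n = 16
A = laplace_2d_csr(n)
x_true = [1.0] * (n * n)
b = spmv(A, x_true)
x, iterations = cg(A, b)
max_error = max(abs(u - v) for u, v in zip(x, x_true))
```

The right-hand side is built from a known solution so the result can be checked directly; `max_error` should be small once the relative residual drops below `tol`.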
In the next plot we compare only the best setup for each configuration.
For more details, please see the PARALUTION User Manual [pdf], Chapter 6 / Performance Benchmarks.