ARM vs x86 for floating point
I am sorry if I am asking something very obvious.
Suppose you are developing a piece of software that is heavy in floating point calculations, and you purchase your own hardware. Let's say you are excluding FPGAs and GPUs for flexibility and maintainability reasons.
Let's assume you have a decent level of parallelism in software.
For a long time, this meant you were stuck with x86.
I'm looking for an objective benchmark to tell whether modern ARM processors are in the same league. Maybe I was looking in the wrong places, but I find it very difficult to find a reliable benchmark (something like LAPACK, or perhaps some kind of physics simulation). I understand that performance is highly task-specific and that compiler optimizations are probably more mature for x86 at the moment, but at this point I really just want to compare orders of magnitude.
Also, I find it odd that you can't really buy something like the Raspberry Pi but with 8-64 modern cores, comparable to those in the latest smartphones (such as the latest Snapdragons), all connected to the same bus. Correct me if I'm wrong, but such solutions might one day overtake GPUs in the FLOPS/$ category while also being more flexible.
Below are Linpack benchmark results for PCs running Linux, and for Raspberry Pi and Android devices (I have many more for Windows). They are based on my 1996 C/C++ conversion for PCs, which was approved by the original author, Jack Dongarra, and is available at:
http://www.netlib.no/netlib/benchmark/linpack-pc.c
This is for a 100x100 matrix in double precision; single-precision results are also given below. Dongarra's historical results for this and the supercomputer variants are at:
http://netlib.org/benchmark/performance.pdf
This is just one test, and others tell a different story. You can find much more on my site, including source code and MT (multithreaded) variants (free, no ads):
http://www.roylongbottom.org.uk/
Linux 32/64 Bit Results
Double Precision 100x100, compiled as 32-bit and 64-bit

                                        Opt   No opt
CPU             Build  OS   MHz      MFLOPS   MFLOPS
Atom N455       32b    Ub   1666        196       94
Atom N455       64b    Ub   1666        226       89
Core 2 Mob      32b    Ub   1830        983      307
Athlon 64       32b    Ub   2211        936      231
Athlon 64       64b    Ub   2211       1118      221
Core 2 Duo      32b    Ub   2400       1288      404
Core 2 Duo      64b    Ub   2400       1577      378
Phenom II       32b    Ub   3000       1464      411
Phenom II       64b    Ub   3000       1887      411
Phenom II       64b    Fe   3000       1872      407
Core i7 930     64b    Ub   ****       2265      511
Core i7 4820K   32b    Ub   $$$1       2534      988
Core i7 4820K   64b    Ub   $$$1       3672      900
Core i7 4820K   AVX    Ub   $$$12      5413      935

Ub = Ubuntu Linux, Fe = Fedora Linux
****   Rated at 2800 MHz but running at up to 3066 MHz using Turbo Boost
$$$1   Rated at 3700 MHz but running at up to 3900 MHz using Turbo Boost
$$$12  As $$$1, but compiled with GCC 4.8.2, which produces AVX SIMD instructions

Android and Raspberry Pi Versions
Double Precision and Single Precision (SP) 100x100

                                v7/v5       v5
CPU           MHz    Android   MFLOPS   MFLOPS
ARM 926EJ     800    2.2          5.7      5.6
ARM v7-A8     800    2.3.5       80.2
ARM v7-A9     800    2.3.4      101.4     10.6
ARM v7-A9     1300a  4.1.2      151.1     17.1
ARM v7-A9     1500   4.0.3      171.4
ARM v7-A9     1500a  4.0.3      155.5     16.9
ARM v7-A9     1400   4.0.4      184.4     19.9
ARM v7-A9     1600   4.0.3      196.5
ARM v7-A15    2000b  4.2.2      459.2     28.8

                                v7 SP     Java
CPU           MHz    Android   MFLOPS   MFLOPS
ARM 926EJ     800    2.2          9.6      2.3
ARM v7-A9     800    2.3.4      129.1     33.3
ARM v7-A9     1300a  4.1.2      201.3     56.4
ARM v7-A9     1500a  4.0.3      204.6     56.9
ARM v7-A9     1400   4.0.4      235.5     57.0
ARM v7-A15    2000b  4.2.2      803.0    143.1
Atom Ax86     1666   2.2.1                15.7
Core 2 Ax86   2400   2.2.1                53.3

Raspberry Pi                       DP       SP
CPU           MHz    Linux     MFLOPS   MFLOPS
ARM 1176      700    3.6.11        42       58
ARM 1176      1000   3.6.11        68       88

NEON                               SP
CPU           MHz    Android   MFLOPS
ARM v7-A9     800    2.3.4      255.8
ARM v7-A9     1300a  4.1.2      376.0
ARM v7-A9     1500a  4.0.3      382.5
ARM v7-A9     1400   4.0.4      454.2
ARM v7-A15    2000b  4.2.2     1334.9
Regarding your second question: if you're looking for a cheap but powerful multi-core ARM platform, check out the Odroid XU3. Otherwise, if you are interested only in performance (not necessarily on ARM), you could also look at the Parallella (Epiphany chip).