ARM vs x86 for floating point

I am sorry if I am asking something very obvious.

Suppose you are developing a piece of software that is heavy in floating point calculations, and you purchase your own hardware. Let's say you are excluding FPGAs and GPUs for flexibility and maintainability reasons.

Let's assume you have a decent level of parallelism in software.

For a long time, this meant you were stuck with x86.

I'm looking for an objective benchmark that shows whether modern ARM processors are in the same ballpark. Maybe I was looking in the wrong places, but I find it very difficult to find a reliable benchmark (something like LAPACK, or maybe some kind of physics simulation). I understand that performance is highly task-specific and that compiler optimizations are probably more mature for x86 at the moment, but at this point I really just want to compare orders of magnitude.

Also, I find it odd that you can't really buy something like the Raspberry Pi but with 8-64 modern cores, comparable to those in the latest smartphones (such as the latest Snapdragons), all connected to the same bus. Correct me if I'm wrong, but such solutions might one day overtake GPUs in FLOPS per dollar while also being more flexible.



2 answers


Below are Linpack benchmark results for PCs running Linux and for Raspberry Pi and Android devices (I have many more for Windows). They are based on my 1996 C/C++ conversion of the benchmark for PCs, which was approved by the original author, Jack Dongarra, and is available at:

http://www.netlib.no/netlib/benchmark/linpack-pc.c

These results are for a 100x100 matrix in double precision; single-precision results are also included further down. Dongarra's historical results for this and the supercomputer variants are at:

http://netlib.org/benchmark/performance.pdf

This is just one test, and others tell a different story. You can get much more from my site, including source code and multithreaded (MT) variants (free, no ads):



http://www.roylongbottom.org.uk/

Linux 32/64 Bit Results

Double Precision 100x100, compiled as 32-bit and 64-bit

                                   Opt    No opt
CPU                      MHz    MFLOPS    MFLOPS

Atom N455     32b  Ub   1666       196        94
Atom N455     64b  Ub   1666       226        89

Core 2 Mob    32b  Ub   1830       983       307

Athlon 64     32b  Ub   2211       936       231
Athlon 64     64b  Ub   2211      1118       221

Core 2 Duo    32b  Ub   2400      1288       404
Core 2 Duo    64b  Ub   2400      1577       378

Phenom II     32b  Ub   3000      1464       411
Phenom II     64b  Ub   3000      1887       411
Phenom II     64b  Fe   3000      1872       407

Core i7 930   64b  Ub   ****      2265       511

Core i7 4820K 32b  Ub   $$$1      2534       988
Core i7 4820K 64b  Ub   $$$1      3672       900
Core i7 4820K AVX  Ub   $$$12     5413       935

  Ub = Ubuntu Linux,   Fe = Fedora Linux        
 ****  Rated as 2800 MHz but running at up to   
       3066 MHz using Turbo Boost               
 $$$1  Rated as 3700 MHz but running at up to   
       3900 MHz using Turbo Boost              
 $$$12 As $$$1, but compiled with GCC 4.8.2, which
       produces AVX SIMD instructions.               

      

###############################################################

      Android and Raspberry Pi Versions

Double Precision and Single Precision (SP) 100x100

                               v7/v5       v5 
CPU          MHz   Android    MFLOPS    MFLOPS

ARM 926EJ    800       2.2       5.7       5.6
ARM v7-A8    800     2.3.5      80.2          
ARM v7-A9    800     2.3.4     101.4      10.6
ARM v7-A9   1300a    4.1.2     151.1      17.1
ARM v7-A9   1500     4.0.3     171.4          
ARM v7-A9   1500a    4.0.3     155.5      16.9
ARM v7-A9   1400     4.0.4     184.4      19.9
ARM v7-A9   1600     4.0.3     196.5          
ARM v7-A15  2000b    4.2.2     459.2      28.8

                               v7 SP     Java 
CPU          MHz   Android    MFLOPS    MFLOPS

ARM 926EJ    800       2.2       9.6       2.3
ARM v7-A9    800     2.3.4     129.1      33.3
ARM v7-A9   1300a    4.1.2     201.3      56.4
ARM v7-A9   1500a    4.0.3     204.6      56.9
ARM v7-A9   1400     4.0.4     235.5      57.0
ARM v7-A15  2000b    4.2.2     803.0     143.1


Atom   Ax86 1666     2.2.1                15.7
Core 2 Ax86 2400     2.2.1                53.3

Raspberry Pi                    DP        SP  
CPU          MHz     Linux    MFLOPS    MFLOPS

ARM  1176    700     3.6.11     42        58  
ARM  1176   1000     3.6.11     68        88  

                              NEON SP         
CPU          MHz   Android    MFLOPS          

ARM v7-A9    800     2.3.4     255.8          
ARM v7-A9   1300a    4.1.2     376.0          
ARM v7-A9   1500a    4.0.3     382.5          
ARM v7-A9   1400     4.0.4     454.2          
ARM v7-A15  2000b    4.2.2    1334.9        

      



Regarding your second question: if you're looking for a cheap but powerful multi-core ARM platform, check out the Odroid XU3. Otherwise, if you are just interested in performance and don't need the ARM architecture, you could also check out the Parallella board (its chip is the Epiphany).


