x87 floating-point performance on Core 2

I am working on some number-crunching code that is floating-point intensive and just plain slow by nature. It is research code, so it can be tailored to a single architecture, and it runs on a Core 2 Quad. I understand that for the Pentium 4 / NetBurst architecture, Intel decided to strip down the x87 FPU and adopt a more SSE2-centric design, which resulted in terrible performance for x87 code. However, the Core 2 architecture is more closely related to P6 than to NetBurst.

My compiler doesn't target SSE at all AFAIK, and I understand that very few compilers do it well. Also, I use the D language, which is fairly bleeding-edge, so there aren't many compilers for it. However, I don't want to switch languages, both because of the inertia of my existing code and because, despite its immaturity, I really like D.

Does the Core 2 architecture also have a stripped-down x87 FPU? If so, what is the best way to work around it?

1 answer


Get yourself a profiler. There are too many factors, such as cache misses and memory access latency, to attribute poor performance to specific CPU features. If you want to find out what is fast, implement the same algorithm in several different ways and profile each one.
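As a rough illustration of "same algorithm, several methods", here is a minimal sketch in C (the idea carries over to D). It is only an assumption about what such a comparison might look like: the names dot_simple, dot_unrolled and time_variant are hypothetical, and the dot product stands in for whatever your real inner loop does.

```c
#include <stddef.h>
#include <time.h>

/* Variant A: straightforward loop with a single accumulator. */
double dot_simple(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Variant B: four independent accumulators, so consecutive additions
   do not all wait on the same running sum. */
double dot_unrolled(const double *a, const double *b, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)          /* leftover elements when n is not a multiple of 4 */
        s0 += a[i] * b[i];
    return (s0 + s1) + (s2 + s3);
}

typedef double (*dot_fn)(const double *, const double *, size_t);

/* Hypothetical timing helper: runs f `reps` times and returns the CPU
   seconds used. The last result is written to *out, and the volatile
   sink keeps the compiler from discarding the calls. */
double time_variant(dot_fn f, const double *a, const double *b,
                    size_t n, int reps, double *out)
{
    volatile double sink = 0.0;
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        sink = f(a, b, n);
    clock_t end = clock();
    *out = sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Call time_variant once per variant on arrays that are representative of your real data sizes, and check that the results agree before comparing the times. Which variant wins depends on the compiler and on the cache behaviour of your data, which is exactly why it has to be measured rather than guessed.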



I also recommend looking at the liboil library, which lets you take advantage of SSE without writing assembly yourself. I don't know how well it integrates with D, however.
