Why is the Exynos Octa 5420 unusually slow?
My code:
#include <ctime>
#include <cstdio>

int main() {
    struct timespec t, mt1, mt2;
    unsigned long long int mt;  // elapsed thread CPU time in nanoseconds
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &mt1);
    // Measured block begin
    for (int i = 0; i < 1000000; i++)
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t);
    // Measured block end
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &mt2);
    mt = (mt2.tv_sec - mt1.tv_sec) * 1000000000LL + mt2.tv_nsec - mt1.tv_nsec;
    printf("%llu\n", mt);  // %llu matches the unsigned type of mt
    return 0;
}
I am using an armv7-a standalone toolchain built with Android NDK r9d, which is installed under /opt/android-toolchain.
Configuration 1:
These are the default flags created in the toolchain file at https://github.com/taka-no-me/android-cmake .
Compiler config:
/opt/android-toolchain/bin/arm-linux-androideabi-g++ \
-DANDROID -Wno-psabi --sysroot=/opt/android-toolchain/sysroot \
-fpic -funwind-tables -finline-limit=64 -fsigned-char \
-no-canonical-prefixes -march=armv7-a -mfloat-abi=softfp \
-mfpu=vfpv3-d16 -fdata-sections -ffunction-sections \
-Wa,--noexecstack -mthumb -fomit-frame-pointer \
-fno-strict-aliasing -O3 -DNDEBUG \
-isystem /opt/android-toolchain/sysroot/usr/include \
-isystem /opt/android-toolchain/include/c++/4.8 \
-isystem /opt/android-toolchain/include/c++/4.8/arm-linux-androideabi/armv7-a \
-o my-object-file.o -c my-source-file.cpp
Linker config:
/opt/android-toolchain/bin/arm-linux-androideabi-gcc \
-Wno-psabi --sysroot=/opt/android-toolchain/sysroot \
-fpic -funwind-tables -finline-limit=64 -fsigned-char \
-no-canonical-prefixes -march=armv7-a -mfloat-abi=softfp \
-mfpu=vfpv3-d16 -fdata-sections -ffunction-sections \
-Wa,--noexecstack -mthumb -fomit-frame-pointer \
-fno-strict-aliasing -O3 -DNDEBUG -Wl,--fix-cortex-a8 \
-Wl,--no-undefined -Wl,-allow-shlib-undefined -Wl,--gc-sections \
-Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now \
-Wl,-z,nocopyreloc my-object-file.o -o my-executable \
-L/libs/armeabi-v7a -rdynamic \
"/opt/android-toolchain/arm-linux-androideabi/lib/armv7-a/thumb/libstdc++.a" \
"/opt/android-toolchain/arm-linux-androideabi/lib/armv7-a/thumb/libsupc++.a" \
-lm
- Samsung Galaxy Note 10.1 2014 Edition with Exynos Octa 5420 @ 1.9 GHz, running the stock Samsung 4.4.2 ROM: code takes 2.0 seconds
- Samsung Galaxy Note II with Exynos 4412 @ 1.6 GHz, running CyanogenMod 11 based on Android 4.4.4: code takes 0.75 seconds
- Samsung Galaxy S3 with Exynos 4412 @ 1.4 GHz, running CyanogenMod 11 based on Android 4.4.4: code takes 1.1 seconds
Configuration 2:
Almost all flags from the previous configuration removed.
Compiler config:
/opt/android-toolchain/bin/arm-linux-androideabi-g++ \
-DANDROID --sysroot=/opt/android-toolchain/sysroot \
-O3 -DNDEBUG \
-isystem /opt/android-toolchain/sysroot/usr/include \
-isystem /opt/android-toolchain/include/c++/4.8 \
-isystem /opt/android-toolchain/include/c++/4.8/arm-linux-androideabi/armv7-a \
-o my-object-file.o -c my-source-file.cpp
Linker config:
/opt/android-toolchain/bin/arm-linux-androideabi-gcc \
--sysroot=/opt/android-toolchain/sysroot -O3 -DNDEBUG \
-Wl,-z,nocopyreloc my-object-file.o -o my-executable \
-L/libs/armeabi-v7a -rdynamic \
"/opt/android-toolchain/arm-linux-androideabi/lib/armv7-a/thumb/libstdc++.a" \
"/opt/android-toolchain/arm-linux-androideabi/lib/armv7-a/thumb/libsupc++.a" \
-lm
- Samsung Galaxy Note 10.1 2014 Edition with Exynos Octa 5420 @ 1.9 GHz, running the stock Samsung 4.4.2 ROM: code takes 2.2 seconds
- Samsung Galaxy Note II with Exynos 4412 @ 1.6 GHz, running CyanogenMod 11 based on Android 4.4.4: code takes 0.94 seconds
- Samsung Galaxy S3 with Exynos 4412 @ 1.4 GHz, running CyanogenMod 11 based on Android 4.4.4: code takes 1.1 seconds
Notes for both configurations:
- I set the minimum CPU clock frequency to the highest possible value, i.e. 1.9 GHz, using a CPU tweak app.
- I made sure that background processes are not hogging the processor.
- I also specifically tried the flag -mcpu=cortex-a15, which did not change the runtime significantly.
- I also tried -mfpu=neon -marm -mtune=cortex-a15, which didn't change the runtime significantly either.
- clock_gettime() is not the culprit; the code in general is noticeably slower.
- Other pieces of code I've tried, including parts of OpenCV imgproc and STL calls like std::map::find() and std::sort(), as well as clock_gettime(), are noticeably slower on the Exynos Octa 5420 compared to the other two devices listed above.
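One way to double-check the clock-frequency note is to read the cpufreq files in sysfs while the benchmark is running. This is only a minimal sketch, assuming the kernel exposes the usual /sys/devices/system/cpu/cpuN/cpufreq/ attributes (scaling_governor, scaling_min_freq, scaling_cur_freq) and that they are readable by the process; the helper names cpufreq_path, read_cpufreq and print_cpufreq_state are made up for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: builds the sysfs path of one cpufreq attribute.
std::string cpufreq_path(int cpu, const std::string& attr) {
    assert(cpu >= 0);
    std::ostringstream path;
    path << "/sys/devices/system/cpu/cpu" << cpu << "/cpufreq/" << attr;
    return path.str();
}

// Hypothetical helper: returns the first line of the attribute, or an
// empty string if the file is missing or unreadable (e.g. core offline).
std::string read_cpufreq(int cpu, const std::string& attr) {
    std::ifstream in(cpufreq_path(cpu, attr).c_str());
    std::string value;
    if (in) std::getline(in, value);
    return value;
}

// Prints governor, minimum and current frequency (reported in kHz)
// for logical CPUs 0 .. cpu_count-1.
void print_cpufreq_state(int cpu_count) {
    for (int cpu = 0; cpu < cpu_count; ++cpu) {
        std::printf("cpu%d governor=%s min=%s cur=%s\n", cpu,
                    read_cpufreq(cpu, "scaling_governor").c_str(),
                    read_cpufreq(cpu, "scaling_min_freq").c_str(),
                    read_cpufreq(cpu, "scaling_cur_freq").c_str());
    }
}
```

Calling print_cpufreq_state(8) right before and right after the measured block would show whether the minimum set by the tweak app actually stuck. Empty fields mean the attribute could not be read, which by itself says nothing about the real frequency.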
My hypotheses:
- My thread somehow gets stuck on one of the Cortex-A7 cores instead of migrating to one of the Cortex-A15 cores. If that is the case, how can I verify it, and how can I force my threads onto the Cortex-A15 cores?
- I failed to set the minimum CPU clock frequency and the CPU is being clocked down. If that is the case, how can I verify it?
- Samsung's kernel is somehow worse compared to CM's. Could this cause such a big difference in runtime?
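For the first hypothesis, one thing that could be tried is pinning the benchmark thread to a single logical CPU with sched_setaffinity() and checking where it runs with sched_getcpu(). The sketch below assumes both calls are available in the libc in use; pin_to_cpu and core_name are hypothetical helper names. Which logical CPU numbers correspond to the Cortex-A15 cores is not known here and has to be checked on the device, for example against the "CPU part" field in /proc/cpuinfo (0xc07 is Cortex-A7, 0xc09 is Cortex-A9, 0xc0f is Cortex-A15).

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed on glibc for CPU_SET and sched_getcpu
#endif
#include <sched.h>
#include <cassert>
#include <string>

// Hypothetical helper: restricts the calling thread to one logical CPU.
// Returns true on success, false if the kernel rejected the mask
// (for example because that CPU is offline or not permitted).
bool pin_to_cpu(int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;  // 0 = calling thread
}

// Hypothetical helper: maps the ARM "CPU part" number to a core name.
std::string core_name(unsigned part) {
    switch (part) {
        case 0xc07: return "Cortex-A7";
        case 0xc09: return "Cortex-A9";
        case 0xc0f: return "Cortex-A15";
        default:    return "unknown";
    }
}
```

Calling pin_to_cpu(n) before the measured block and printing sched_getcpu() afterwards would show whether the thread stayed on CPU n, and timing the loop once per n would show whether some logical CPUs are consistently slower. If the kernel uses cluster migration and exposes only four logical CPUs, affinity probably cannot select the core type at all.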
At this point, I am pretty much stumped. What are your tips and ideas so that I can get my money's worth from this device?
Edit: I flashed a custom kernel ( http://forum.xda-developers.com/showthread.php?t=2725193 ) and set the governor to performance, and the execution time went down to about 1.3 seconds, so I think my 3rd hypothesis is a little stronger now. It's still slower than the older processors, though...