Why is the Exynos Octa 5420 unusually slow?

My code:

#include<ctime>
#include<cstdio>

int main(){
    struct timespec t,mt1,mt2;
    unsigned long long int mt;

    clock_gettime(CLOCK_THREAD_CPUTIME_ID,&mt1);

    //Measured block begin
    for(int i=0;i<1000000;i++)
        clock_gettime(CLOCK_THREAD_CPUTIME_ID,&t);
    //Measured block end

    clock_gettime(CLOCK_THREAD_CPUTIME_ID,&mt2);
    mt = (mt2.tv_sec - mt1.tv_sec)*1000000000LL + mt2.tv_nsec - mt1.tv_nsec;

    printf("%llu\n",mt);

    return 0;
}
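
Because CLOCK_THREAD_CPUTIME_ID only counts time the thread actually spends on a CPU, it may help to sample a wall clock alongside it: a large gap between the two would point at preemption or sleeping rather than slow execution. The following is only a minimal sketch, assuming a POSIX system; the names `elapsed_ns`, `Timing` and `measure` are hypothetical helpers and not part of the code above.

```cpp
#include <ctime>
#include <cstdio>

// Nanoseconds elapsed between two timespec samples (b taken after a).
static long long elapsed_ns(const timespec& a, const timespec& b) {
    return (b.tv_sec - a.tv_sec) * 1000000000LL + (b.tv_nsec - a.tv_nsec);
}

// Hypothetical result type: both views of the same measured block.
struct Timing {
    long long cpu_ns;   // time the calling thread spent executing
    long long wall_ns;  // elapsed monotonic wall-clock time
};

// Runs `block` once and reports thread CPU time and wall-clock time.
// The wall-clock samples enclose the CPU-time samples.
template <typename F>
static Timing measure(F block) {
    timespec c1, c2, w1, w2;
    clock_gettime(CLOCK_MONOTONIC, &w1);
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &c1);
    block();
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &c2);
    clock_gettime(CLOCK_MONOTONIC, &w2);
    Timing t = { elapsed_ns(c1, c2), elapsed_ns(w1, w2) };
    return t;
}
```

Passing the measured loop as a plain function, e.g. `measure(my_loop)`, and printing both fields shows whether the thread was running for the whole interval or spent part of it off the CPU.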

      

I am using an ARMv7-A standalone toolchain built with Android NDK r9d, which is located under /opt/android-toolchain.

Configuration 1:

These are the default flags generated by the toolchain file at https://github.com/taka-no-me/android-cmake.

Compiler config:

/opt/android-toolchain/bin/arm-linux-androideabi-g++ \
    -DANDROID -Wno-psabi --sysroot=/opt/android-toolchain/sysroot \
    -fpic -funwind-tables -finline-limit=64 -fsigned-char \
    -no-canonical-prefixes -march=armv7-a -mfloat-abi=softfp \
    -mfpu=vfpv3-d16 -fdata-sections -ffunction-sections \
    -Wa,--noexecstack  -mthumb -fomit-frame-pointer \
    -fno-strict-aliasing -O3 -DNDEBUG \
    -isystem /opt/android-toolchain/sysroot/usr/include \
    -isystem /opt/android-toolchain/include/c++/4.8 \
    -isystem /opt/android-toolchain/include/c++/4.8/arm-linux-androideabi/armv7-a \
    -o my-object-file.o -c my-source-file.cpp

      

Linker config:

/opt/android-toolchain/bin/arm-linux-androideabi-gcc \
    -Wno-psabi --sysroot=/opt/android-toolchain/sysroot \
    -fpic -funwind-tables -finline-limit=64 -fsigned-char \
    -no-canonical-prefixes -march=armv7-a -mfloat-abi=softfp \
    -mfpu=vfpv3-d16 -fdata-sections -ffunction-sections \
    -Wa,--noexecstack  -mthumb -fomit-frame-pointer \
    -fno-strict-aliasing -O3 -DNDEBUG -Wl,--fix-cortex-a8 \
    -Wl,--no-undefined -Wl,-allow-shlib-undefined -Wl,--gc-sections \
    -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now \
    -Wl,-z,nocopyreloc my-object-file.o -o my-executable \
    -L/libs/armeabi-v7a -rdynamic \
    "/opt/android-toolchain/arm-linux-androideabi/lib/armv7-a/thumb/libstdc++.a" \
    "/opt/android-toolchain/arm-linux-androideabi/lib/armv7-a/thumb/libsupc++.a" \
    -lm

      

  • Samsung Galaxy Note 10.1 2014 Edition with Exynos Octa 5420 @ 1.9 GHz, running the stock Samsung 4.4.2 ROM: code takes 2.0 seconds
  • Samsung Galaxy Note II with Exynos 4412 @ 1.6 GHz, running CyanogenMod 11 based on Android 4.4.4: code takes 0.75 seconds
  • Samsung Galaxy S3 with Exynos 4412 @ 1.4 GHz, running CyanogenMod 11 based on Android 4.4.4: code takes 1.1 seconds

Configuration 2:

Almost all flags from the previous configuration removed.

Compiler config:

/opt/android-toolchain/bin/arm-linux-androideabi-g++ \
    -DANDROID --sysroot=/opt/android-toolchain/sysroot \
    -O3 -DNDEBUG \
    -isystem /opt/android-toolchain/sysroot/usr/include \
    -isystem /opt/android-toolchain/include/c++/4.8 \
    -isystem /opt/android-toolchain/include/c++/4.8/arm-linux-androideabi/armv7-a \
    -o my-object-file.o -c my-source-file.cpp

      

Linker config:

/opt/android-toolchain/bin/arm-linux-androideabi-gcc \
    --sysroot=/opt/android-toolchain/sysroot -O3 -DNDEBUG \
    -Wl,-z,nocopyreloc my-object-file.o -o my-executable \
    -L/libs/armeabi-v7a -rdynamic \
    "/opt/android-toolchain/arm-linux-androideabi/lib/armv7-a/thumb/libstdc++.a" \
    "/opt/android-toolchain/arm-linux-androideabi/lib/armv7-a/thumb/libsupc++.a" \
    -lm

      

  • Samsung Galaxy Note 10.1 2014 Edition with Exynos Octa 5420 @ 1.9 GHz, running the stock Samsung 4.4.2 ROM: code takes 2.2 seconds
  • Samsung Galaxy Note II with Exynos 4412 @ 1.6 GHz, running CyanogenMod 11 based on Android 4.4.4: code takes 0.94 seconds
  • Samsung Galaxy S3 with Exynos 4412 @ 1.4 GHz, running CyanogenMod 11 based on Android 4.4.4: code takes 1.1 seconds

Notes for both configurations:

  • I set the minimum CPU clock frequency to the highest possible value, i.e. 1.9 GHz, using a CPU tweak app.

  • I made sure that no background processes were hogging the processor.

  • I also specifically tried the flag -mcpu=cortex-a15, without any significant change in runtime.

  • I also tried -mfpu=neon -marm -mtune=cortex-a15, which didn't change the runtime significantly either.

  • clock_gettime() itself is not the culprit; the code really is noticeably slower.

  • Other pieces of code I've tried, including parts of OpenCV imgproc and STL calls like std::map::find() and std::sort(), are, like clock_gettime(), noticeably slower on the Exynos Octa 5420 compared to the other two devices listed above.
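
One way to check whether the frequency set through the tweak app actually holds while the benchmark runs is to read the Linux cpufreq files under sysfs. This is only a sketch under assumptions: it presumes the kernel exposes the standard `scaling_governor` and `scaling_cur_freq` files and that the process may read them; the helper names `read_first_line`, `cpufreq_path` and `print_cpufreq` are hypothetical.

```cpp
#include <cstdio>
#include <sstream>
#include <string>

// Reads the first line of a text file; returns an empty string if the
// file cannot be opened (e.g. the core is offline or access is denied).
static std::string read_first_line(const std::string& path) {
    std::string line;
    FILE* f = std::fopen(path.c_str(), "r");
    if (!f) return line;
    char buf[128];
    if (std::fgets(buf, sizeof buf, f)) line = buf;
    std::fclose(f);
    // Strip the trailing newline that sysfs appends.
    while (!line.empty() && line[line.size() - 1] == '\n')
        line.erase(line.size() - 1);
    return line;
}

// Builds the cpufreq sysfs path for one core, e.g.
// /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
static std::string cpufreq_path(int cpu, const char* file) {
    std::ostringstream os;
    os << "/sys/devices/system/cpu/cpu" << cpu << "/cpufreq/" << file;
    return os.str();
}

// Prints governor and current frequency (in kHz) for cores 0..count-1.
// Unreadable entries are shown as "?".
static void print_cpufreq(int count) {
    for (int cpu = 0; cpu < count; ++cpu) {
        std::string gov = read_first_line(cpufreq_path(cpu, "scaling_governor"));
        std::string khz = read_first_line(cpufreq_path(cpu, "scaling_cur_freq"));
        std::printf("cpu%d governor=%s cur_freq_khz=%s\n", cpu,
                    gov.empty() ? "?" : gov.c_str(),
                    khz.empty() ? "?" : khz.c_str());
    }
}
```

Calling `print_cpufreq(n)` right before and right after the measured block, with `n` set to the number of cores the device reports, would show whether the clock stays at the expected value.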

My hypotheses:

  • My thread somehow gets stuck on one of the Cortex-A7 cores instead of moving onto one of the Cortex-A15s. If this might be the case, how can I verify it, or how can I force my threads onto the Cortex-A15 cores?

  • My attempt to set the minimum CPU clock frequency did not take effect, and the CPU is clocking down. If this might be the case, how can I verify it?

  • Samsung's kernel is somehow worse than CM's. Could this cause such a big difference in runtime?
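
For the first hypothesis, one possible way to see where the thread runs, and to try pinning it, is the Linux scheduler API. This is a sketch under assumptions: it presumes the platform's libc provides `sched_getcpu()`, `sched_setaffinity()` and the `CPU_*` macros; which CPU numbers correspond to the Cortex-A15 cores, and whether the kernel exposes them to user space at all, is device-specific and would have to be checked on the device. The helper names `pin_to_cpu` and `current_cpu` are hypothetical.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed on glibc for sched_getcpu() and the CPU_* macros
#endif
#include <sched.h>
#include <cstdio>

// Pins the calling thread to a single CPU; returns true on success.
// Fails if the CPU number is invalid, offline, or not permitted.
static bool pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    // A pid of 0 means "the calling thread".
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Returns the CPU the calling thread is currently running on,
// or -1 on error.
static int current_cpu() {
    return sched_getcpu();
}
```

Printing `current_cpu()` before and after the measured block shows whether the thread migrated, and calling `pin_to_cpu(n)` first, with `n` being a core number believed to be a Cortex-A15, allows comparing the runtime per core.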

At this point, I am pretty much stumped. What are your tips and ideas for getting my money's worth out of this device?

Edit: I flashed a custom modified kernel ( http://forum.xda-developers.com/showthread.php?t=2725193 ) and set the governor to performance, and the execution time went down to about 1.3 seconds, so I think my third hypothesis is a little stronger now. It's still slower than the older processors, though...
