Optimizing application memory

We have a project written in ANSI C. Usually memory consumption was not a big problem, but now we have a request to install our program in 256KB of RAM. I don't have this exact platform on hand, so I will compile my project under 32-bit x86 Linux (because it provides enough different tools to estimate memory consumption), optimize what I can, remove some features, and eventually reach a conclusion: which features do we need to sacrifice to be able to run on very small systems (if we can at all)? First of all I did some research on what exactly "memory size" means in Linux, and it looks like I need to optimize the RSS, not the VSZ. But in Linux, even the smallest program that prints "Hello world!" once per second consumes 285-320 KB of RSS:

#include  <stdio.h>
#include  <unistd.h>
#include  <signal.h>

unsigned char  cuStopCycle = 0;

void SigIntHandler(int signo)
{
   printf("SIGINT received, terminating the program\n");
   cuStopCycle = 1;
}

int main()
{  
   signal( SIGINT, SigIntHandler);

   while(!cuStopCycle)
   {
      printf("Hello, World!\n");
      sleep(1);
   }
   printf("Exiting...\n");
}

user@Ubuntu12-vm:~/tmp/prog_size$ size ./prog_size     
text       data     bss     dec     hex filename    
1456        272      12    1740     6cc ./prog_size

root@Ubuntu12-vm:/home/app# ps -C prog_size -o pid,rss,vsz,args   
PID     RSS    VSZ   COMMAND 
22348   316   2120   ./prog_size

      

Obviously this program would run fine on a small PLC with 64KB of RAM; Linux just loads a lot of libraries. I generated a map file for this program, and all of this bss data comes from the CRT library. I should point out that if I add code to this project (10,000 repetitions of "a = a + b", or manipulation of arrays of 2000 long ints), I see a difference in the code size and the bss size, but in the end the RSS of the process stays almost the same; it is essentially unaffected.

So, I see this as a baseline, a point that I want to reach (and which I can never reach, because I need more functionality than just printing a message once a second).

So here is my project, where I removed all the extra features, removed all the helper functions, and removed everything but the basic functionality. There are a few ways to optimize further, but not many: what could be removed has already been removed:

root@Ubuntu12-vm:/home/app/workspace/proj_sizeopt/Cmds# ls -l App 
-rwxr-xr-x 1 root root 42520 Jul 13 18:33 App

root@Ubuntu12-vm:/home/app/workspace/proj_sizeopt/Cmds# size ./App 
   text    data     bss     dec     hex filename
  37027     404     736   38167    9517 ./App

      

So I have ~36KB of code and ~1KB of data. I do not call malloc inside my project; I allocate memory from shared memory through a wrapper library, so that I can control how much memory is allocated:

The total memory size allocated is 2052 bytes

      

Obviously there are malloc calls under the hood. If I replace malloc with my own function that sums up all allocation requests, I see ~2.4KB being allocated:

 root@Ubuntu12-vm:/home/app/workspace/proj_sizeopt/Cmds# LD_PRELOAD=./override_malloc.so ./App
Malloc allocates 2464 bytes total
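The source of override_malloc.so is not shown here. As a minimal sketch of the same accounting idea, assuming an explicit wrapper rather than LD_PRELOAD interposition, the requested sizes can be summed like this (the names `counted_malloc` and `counted_total` are hypothetical):

```c
#include <stddef.h>
#include <stdlib.h>

/* Running total of all bytes requested through counted_malloc().
 * Hypothetical helper; this is not the original override_malloc.so. */
static size_t g_total_requested = 0;

/* Forward the request to malloc() and add its size to the total.
 * Only successful allocations are counted. */
void *counted_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL) {
        g_total_requested += size;
    }
    return p;
}

/* Report how many bytes have been requested so far. */
size_t counted_total(void)
{
    return g_total_requested;
}
```

A wrapper like this only sees the calls that are routed through it. Allocations made inside the libraries themselves are only visible with interposition, as in the LD_PRELOAD run above.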

      

Now I run my project and see that it is consuming 600KB of RAM:

root@Ubuntu12-vm:/home/app/workspace/proj_sizeopt# ps -C App -o pid,rss,vsz,args
  PID   RSS    VSZ COMMAND
22093   604   2340 ./App

      

I don't understand why it eats so much memory. The code size is small. The amount of allocated memory is limited. The data size is small. Why is it taking up so much memory? I tried to examine the memory map of the process:

root@Ubuntu12-vm:/home/app/workspace/proj_sizeopt# pmap -x 22093
22093:   ./App
Address   Kbytes     RSS   Dirty Mode   Mapping
08048000       0      28       0 r-x--  App
08052000       0       4       4 r----  App
08053000       0       4       4 rw---  App
09e6a000       0       4       4 rw---    [ anon ]
b7553000       0       4       4 rw---    [ anon ]
b7554000       0      48       0 r-x--  libpthread-2.15.so
b756b000       0       4       4 r----  libpthread-2.15.so
b756c000       0       4       4 rw---  libpthread-2.15.so
b756d000       0       8       8 rw---    [ anon ]
b7570000       0     300       0 r-x--  libc-2.15.so
b7714000       0       8       8 r----  libc-2.15.so
b7716000       0       4       4 rw---  libc-2.15.so
b7717000       0      12      12 rw---    [ anon ]
b771a000       0      16       0 r-x--  librt-2.15.so
b7721000       0       4       4 r----  librt-2.15.so
b7722000       0       4       4 rw---  librt-2.15.so
b7731000       0       4       4 rw-s-    [ shmid=0x70000c ]
b7732000       0       4       4 rw-s-    [ shmid=0x6f800b ]
b7733000       0       4       4 rw-s-    [ shmid=0x6f000a ]
b7734000       0       4       4 rw-s-    [ shmid=0x6e8009 ]
b7735000       0      12      12 rw---    [ anon ]
b7738000       0       4       0 r-x--    [ anon ]
b7739000       0     104       0 r-x--  ld-2.15.so
b7759000       0       4       4 r----  ld-2.15.so
b775a000       0       4       4 rw---  ld-2.15.so
bfb41000       0      12      12 rw---    [ stack ]
-------- ------- ------- ------- -------
total kB    2336       -       -       -

      

And it looks like the size of the program itself (in terms of RSS) is only 28KB; the rest is consumed by the shared libraries. BTW, I am not using POSIX threads and I am not explicitly linking against libpthread, but somehow the linker links this library anyway. I have no idea why (this is not very important). If we look at the mapping in more detail:

root@Ubuntu12-vm:/home/app/workspace/proj_sizeopt# cat /proc/22093/smaps 
08048000-08052000 r-xp 00000000 08:01 344838     /home/app/workspace/proj_sizeopt/Cmds/App
Size:                 40 kB
Rss:                  28 kB
Pss:                  28 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:        28 kB
Private_Dirty:         0 kB
Referenced:           28 kB
Anonymous:             0 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB

...

09e6a000-09e8b000 rw-p 00000000 00:00 0          [heap]
Size:                132 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB

...

b7570000-b7714000 r-xp 00000000 08:01 34450      /lib/i386-linux-gnu/libc-2.15.so
Size:               1680 kB
Rss:                 300 kB
Pss:                   7 kB
Shared_Clean:        300 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:          300 kB
Anonymous:             0 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB

...

b7739000-b7759000 r-xp 00000000 08:01 33401      /lib/i386-linux-gnu/ld-2.15.so
Size:                128 kB
Rss:                 104 kB
Pss:                   3 kB
Shared_Clean:        104 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:          104 kB
Anonymous:             0 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB

...

bfb41000-bfb62000 rw-p 00000000 00:00 0          [stack]
Size:                136 kB
Rss:                  12 kB
Pss:                  12 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        12 kB
Referenced:           12 kB
Anonymous:            12 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
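The per-mapping fields in a dump like the one above can be totalled to separate pages that are private to the process from pages shared with other processes. This is a minimal sketch that assumes the smaps text has already been read into a string; `struct smaps_totals` and `sum_smaps` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Totals, in kB, accumulated over all mappings in a smaps-style text. */
struct smaps_totals {
    long rss_kb;
    long pss_kb;
    long private_kb; /* Private_Clean + Private_Dirty */
    long shared_kb;  /* Shared_Clean + Shared_Dirty */
};

/* Walk the text line by line and sum the recognized fields.
 * Lines that do not start with a known field name are ignored. */
struct smaps_totals sum_smaps(const char *text)
{
    struct smaps_totals t = {0, 0, 0, 0};
    const char *line = text;

    while (*line != '\0') {
        long kb;
        const char *nl;

        if (sscanf(line, "Rss: %ld", &kb) == 1) {
            t.rss_kb += kb;
        } else if (sscanf(line, "Pss: %ld", &kb) == 1) {
            t.pss_kb += kb;
        } else if (sscanf(line, "Private_Clean: %ld", &kb) == 1 ||
                   sscanf(line, "Private_Dirty: %ld", &kb) == 1) {
            t.private_kb += kb;
        } else if (sscanf(line, "Shared_Clean: %ld", &kb) == 1 ||
                   sscanf(line, "Shared_Dirty: %ld", &kb) == 1) {
            t.shared_kb += kb;
        }

        /* Advance to the start of the next line, if there is one. */
        nl = strchr(line, '\n');
        if (nl == NULL) {
            break;
        }
        line = nl + 1;
    }
    return t;
}
```

For the App and libc-2.15.so text mappings shown above, this would report 28 kB private and 300 kB shared, which makes it visible how much of the RSS belongs to library pages that other processes also map.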

      

  • So I see that the text mapping of my program is 40KB in size, but only 28KB of it is resident (RSS). Does this mean that this project will fit into 256KB of RAM?
  • The heap size is 132KB, but only 4KB is used. Why is this? I'm sure it will be different on a small embedded platform.
  • The stack is 136KB, but only 12KB is used.
  • glibc and ld.so clearly consume some memory, but how much memory will they take on the embedded platform?

I don't look at PSS because it doesn't make any sense in my case; I only look at RSS.

What conclusions can I draw from this picture? How do I accurately estimate the memory consumption of an application? Should I look at the RSS size? Or subtract from the RSS the size of all mapped system libraries? What about the heap and stack sizes?

I would be very grateful for any tips, notes, methods for optimizing memory consumption, and DOs and DON'Ts for platforms with an extremely small amount of RAM (except for the obvious: keep the amount of data and code to a minimum). I would also appreciate an explanation of WHY a program with little code and data (and not allocating much memory) still consumes a lot of RAM in RSS.

Thank you in advance


2 answers


... we have a request to install our program in 256KB of RAM. I don't have this exact platform on hand, so I will compile my project under 32-bit x86 Linux ...

And now you can see that the Linux platform tools make reasonable assumptions about the likely need for stack and heap, given that you are running on a big machine, and that they link in a reasonable set of library functions for your needs. Some of them you won't need, but you get them "for free".

To fit into 256KB on the target platform, you must compile for your target platform and link against the target platform's libraries (and CRT) using the target platform's linker.



Those tools will make different assumptions: the libraries may have a smaller footprint, the assumptions about stack and heap space will be smaller, and so on. For example, build "Hello World" for the target platform and check what it needs on that target platform. Or use a realistic simulator of the target platform and its libraries (and don't forget the OS, which partly determines what the libraries have to do).

And if it is still too big, then you will need to rewrite or tweak the whole CRT and all the libraries...



The program must be compiled and linked for the embedded device.

For best results, use a makefile.

Use the 'rt' library written for the embedded device.

Use a start.s file, referenced from the makefile, as the place where execution starts.

Use 'static' in the linker options.

Use linker options so that no libraries are included except those that are specifically requested.

Don't use libraries written for your development machine. Use only libraries written for the embedded device.

DO NOT include stdio.h etc., unless it is written specifically for the embedded device.

DO NOT call printf() inside a signal handler.

If possible, don't call printf() at all.

Instead, write a small character output function and have it send the characters through the UART.
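A small character output routine of that kind might look like the sketch below. The register layout, the `UART_STATUS_TX_READY` bit, and the names `uart_putc` and `uart_puts` are all assumptions made for illustration; the real offsets and bit positions come from the datasheet of the target device.

```c
#include <stdint.h>

/* Hypothetical UART register layout; not taken from any real device. */
struct uart_regs {
    volatile uint32_t status;  /* bit 0 assumed to mean "transmitter ready" */
    volatile uint32_t tx_data; /* writing a byte here is assumed to send it */
};

#define UART_STATUS_TX_READY 0x1u

/* Busy-wait until the transmitter is ready, then send one character. */
void uart_putc(struct uart_regs *uart, char c)
{
    while ((uart->status & UART_STATUS_TX_READY) == 0u) {
        /* spin until the hardware reports that it can accept a byte */
    }
    uart->tx_data = (uint32_t)(unsigned char)c;
}

/* Send a NUL-terminated string one character at a time. */
void uart_puts(struct uart_regs *uart, const char *s)
{
    while (*s != '\0') {
        uart_putc(uart, *s);
        s++;
    }
}
```

On the target, `uart` would point at the base address of the peripheral. Passing the pointer in as a parameter also lets the same code be exercised on a PC against an ordinary struct whose ready bit is already set.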



Don't use signals; use interrupts instead.

The resulting application will not run on your PC, but once loaded onto the device it will run in 256K.

Don't call sleep(). Instead, write your own function that uses the device's timer peripheral: it sets the timer and puts the device into power-down mode.

The timer interrupt should bring the device out of power-down mode.

In the makefile, explicitly set the sizes of the stack, heap, etc.

Have the link step output a .map file. Study this map file until you understand everything in it.

Use a compiler and linker specific to the embedded device.

You will probably need to include a function that initializes the peripherals on the device, such as the clock, UART, timer, watchdog timer, and any other on-board peripherals that the code actually uses.

You will need a file that allocates the interrupt table, and a small function to handle each of the interrupts, even if most of those functions do nothing but clear the corresponding interrupt pending flag and return from the interrupt.

You will probably need a function that refreshes the watchdog periodically, conditional on an indication that the main function is still cycling. I.e., the main loop function and the initialization function will update the watchdog.
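One possible reading of the watchdog advice, as a minimal sketch: the main loop sets a flag on every pass, and a periodic service routine refreshes the watchdog only if that flag was set since the previous call. The register layout, the `WDT_REFRESH_KEY` value, and all function names here are hypothetical; a real device defines its own refresh sequence.

```c
#include <stdint.h>

/* Hypothetical watchdog register block; not taken from any real device. */
struct wdt_regs {
    volatile uint32_t refresh; /* writing the key is assumed to restart the countdown */
};

#define WDT_REFRESH_KEY 0xA5u /* assumed value, for illustration only */

/* Set by the main loop, cleared by the service routine. */
static volatile int g_main_loop_alive = 0;

/* Call once per pass of the main loop (and from the init function). */
void main_loop_mark_alive(void)
{
    g_main_loop_alive = 1;
}

/* Call periodically, e.g. from a timer interrupt handler.
 * Refreshes the watchdog only if the main loop has reported in since
 * the previous call. Returns 1 if refreshed, 0 otherwise. */
int watchdog_service(struct wdt_regs *wdt)
{
    if (!g_main_loop_alive) {
        return 0; /* main loop looks stuck: let the watchdog expire */
    }
    g_main_loop_alive = 0;
    wdt->refresh = WDT_REFRESH_KEY;
    return 1;
}
```

With this arrangement a hung main loop stops setting the flag, the refresh stops, and the watchdog eventually resets the device.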
