Significant performance drop after spawning an unrelated thread in C

I am working on a C program that generates Ramsey-related graphs, and I am now improving its multithreading performance. At some point I decided to do a sanity check: make sure that my multithreaded implementation, when told to run only one thread, is at least not slower than my original implementation.

It turns out it runs 50% slower. After a lot of trial and error, I narrowed it down to the following:

```c
#include <pthread.h>
#include <stdio.h>

void *blah(void *arg) {
    (void)arg; /* unused */
    printf("Hello, this is thread\n");
    return NULL;
}

void clean(void) {
    // Set-up code

    pthread_t tid;
    pthread_create(&tid, NULL, blah, NULL);

    // Do computationally expensive stuff in the current thread
}
```

Written as above, the program takes 42 seconds on the first data set and 767 seconds on the second. If I comment out the call to pthread_create, the exact same program processes the first data set in 30 seconds and the second in 528 seconds.

I looked for solutions and found many great answers dealing with false sharing, caching, and so on. When I run the multithreaded version with 2, 3, or 4 threads, each additional thread does run a little slower, but I can't figure out why spawning even one unrelated thread hurts performance this much.

Even spawning dozens of copies of the same blah thread in this stripped-down case results in exactly the same performance penalty. I also tried compiling without the -O3 flag, and the same proportional slowdown appeared, albeit slightly less extreme. The only related thing I haven't tried yet is compiling and running on a different system.

Any hints would be greatly appreciated.

Edit:

After following the suggestions in the comments, it looks like I found the problem. @ninjalj suggested that it might be due to malloc locks, so I wrote the following test code, which runs in 19 seconds without the pthread_create call and 32 seconds with it, at least on my machine.

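```c
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <pthread.h>

#define CALLS 10000

void *dummy(void *args) {
    (void)args; /* unused */
    printf("Hello, this is thread\n");
    return NULL;
}

int main(void) {
    /* Spawning this one unrelated thread triggers the slowdown; comment
       out these two lines (and the join below) to compare timings. */
    pthread_t tid;
    pthread_create(&tid, NULL, dummy, NULL);

    time_t start = time(NULL);

    /* CALLS * 10 * CALLS = 10^9 small malloc/free pairs */
    for (int i = 0; i < CALLS * 10; i++) {
        for (int j = 0; j < CALLS; j++) {
            int *memory = malloc(10);
            free(memory);
        }
    }

    time_t end = time(NULL);
    /* time_t is not guaranteed to be an int, so use difftime */
    printf("That took %.0f s\n", difftime(end, start));

    pthread_join(tid, NULL);
    return 0;
}
```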
