MSXML XSL Transformation multi-threaded performance contention

I have a multithreaded C++ server program that uses MSXML6 to continuously parse XML messages and then apply a prepared XSLT transformation to generate text. I am running this on a 4-CPU server. Each thread is completely independent and uses its own transform object. No COM objects are shared across threads.

This works well, but the problem is scalability. When running:

  • with 1 thread, I get about 26 parse + transform operations per second per thread,
  • with 2 threads, I get about 20/s/thread,
  • with 3 threads, 18/s/thread,
  • with 4 threads, 15/s/thread.

Since nothing is shared between threads, I expected near-linear scalability, so it should be 4x faster with 4 threads than with 1. Instead, it is only 2.3x faster.
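To make the arithmetic behind the 2.3x figure explicit, here is a small sketch that turns the per-thread rates quoted above into aggregate throughput, speedup and parallel efficiency. The `Sample` struct and the function names are made up for this illustration.

```cpp
#include <cassert>
#include <vector>

// One measurement: thread count and observed per-thread rate (ops/s/thread).
struct Sample {
    int threads;
    double perThreadRate;
};

// Aggregate throughput across all threads.
double totalRate(const Sample& s) { return s.threads * s.perThreadRate; }

// Speedup relative to the single-threaded baseline.
double speedup(const Sample& s, const Sample& baseline) {
    assert(baseline.threads == 1 && baseline.perThreadRate > 0.0);
    return totalRate(s) / totalRate(baseline);
}

// Parallel efficiency: 1.0 would be perfectly linear scaling.
double efficiency(const Sample& s, const Sample& baseline) {
    return speedup(s, baseline) / s.threads;
}

// The figures quoted in the question.
std::vector<Sample> questionSamples() {
    return {{1, 26.0}, {2, 20.0}, {3, 18.0}, {4, 15.0}};
}
```

For the 4-thread row this gives a speedup of 60/26, roughly 2.3, and an efficiency of about 0.58, so a little over 40% of the machine's capacity is lost somewhere.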

This looks like a classic contention problem. I wrote test programs to rule out contention in my own code. I use the DOMDocument60 class instead of FreeThreadedDOMDocument to avoid unnecessary locking, since documents are never shared between threads. I looked for any evidence of false sharing of cache lines and found none, at least in my code.

Another clue: the context switch rate is > 15k/s for each thread. My guess is that the COM memory manager or the MSXML memory manager is the culprit. It may have a global lock that has to be acquired and released for every memory allocation and deallocation. I just can't believe that in this day and age a memory manager isn't written in a way that scales well in multi-threaded, multi-processor scenarios.
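One way to test the global-lock guess in isolation is an allocation-only probe: run the same allocate/free loop on 1, 2, 3 and 4 threads and see whether the wall-clock time stays flat. The sketch below is a portable version that exercises `std::malloc`/`std::free`; it says nothing about MSXML or COM directly, and to probe those you would have to substitute the allocator you actually suspect. The block sizes, the `churn` and `runProbe` names and the `ProbeResult` struct are all assumptions for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Allocate and free `iterations` small blocks; returns how many succeeded.
// The sizes are arbitrary stand-ins for the churn of an XML parser.
std::size_t churn(std::size_t iterations) {
    std::size_t ok = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        void* p = std::malloc(64 + (i % 7) * 16);
        if (p != nullptr) {
            // Touch the block so the malloc/free pair is not optimized away.
            static_cast<volatile char*>(p)[0] = 1;
            ++ok;
        }
        std::free(p);
    }
    return ok;
}

struct ProbeResult {
    std::size_t allocations;  // successful allocations across all threads
    double seconds;           // wall-clock time for the whole run
};

// Runs `churn` on `threads` threads at once and times the whole batch.
ProbeResult runProbe(unsigned threads, std::size_t iterationsPerThread) {
    assert(threads > 0);
    std::vector<std::size_t> counts(threads, 0);
    std::vector<std::thread> pool;
    const auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&counts, t, iterationsPerThread] {
            counts[t] = churn(iterationsPerThread);
        });
    }
    for (auto& th : pool) th.join();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    std::size_t total = 0;
    for (std::size_t c : counts) total += c;
    return {total, elapsed.count()};
}
```

If the allocator scaled perfectly, `runProbe(4, n).seconds` would be close to `runProbe(1, n).seconds` on a 4-CPU machine. A time that grows with the thread count points at a shared lock.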

Does anyone know what is causing this contention or how to fix it?



3 answers


Thanks for the answers. I ended up combining two of the suggestions.



I wrote a COM+ ServicedComponent in C#, hosted it as a separate server process under COM+, and used XslCompiledTransform to run the transformation. The C++ server connects to this external process over COM, sends it the XML, and gets back the transformed string. This doubled the throughput.

+1




It's quite common for heap-based memory managers (your basic malloc/free) to use a single mutex, and for good reason: the heap is a single coherent data structure.

There are alternative memory management strategies (such as hierarchical allocators) that do not have this limitation. You should look into customizing the allocator used by MSXML.
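As an illustration of the general idea (not of MSXML itself), here is a sketch of a per-thread pool using the C++17 `std::pmr` facilities. This swaps in a thread-private pool rather than a hierarchical allocator as such. Nothing in this thread establishes whether MSXML lets you replace its allocator, so treat this only as a picture of what a lock-free, per-thread strategy looks like. The function name and the string contents are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <string>
#include <vector>

// Builds `count` strings using a pool owned by the calling thread only.
// unsynchronized_pool_resource takes no locks, which is safe precisely
// because no other thread ever touches this instance.
std::size_t buildWithPrivatePool(std::size_t count) {
    std::pmr::unsynchronized_pool_resource pool;
    std::pmr::vector<std::pmr::string> items(&pool);
    items.reserve(count);

    std::size_t bytes = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const std::string suffix = std::to_string(i);
        // The prefix is long enough to defeat the small-string optimization,
        // so the character data really comes from the pool.
        items.emplace_back("element-with-a-long-name-");
        items.back().append(suffix.data(), suffix.size());
        bytes += items.back().size();
    }
    assert(items.size() == count);
    return bytes;  // total characters stored
}
```

Each worker thread would call `buildWithPrivatePool` (or own a long-lived pool of its own), so small allocations are served without contending on the process-wide heap. The pool still falls back to the default upstream resource when it needs more memory.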



Alternatively, you should consider moving from a multithreaded architecture to a multi-process architecture, with a separate process for each MSXML worker. Since your MSXML worker takes string data as input and produces string data as output, you have no serialization problem.

In short: use a multi-process architecture. It is better suited to your problem and will scale better.
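Because the worker only exchanges strings, the inter-process protocol can be very small. The sketch below shows one possible worker loop over a pair of byte streams, using a 4-byte little-endian length prefix per message. The framing format, the function names and the `Transform` callback (a placeholder for whatever actually runs the XSLT) are all assumptions for this example, not something the answer prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Assumed frame format: 4-byte little-endian length, then the payload bytes.
void writeFrame(std::ostream& out, const std::string& payload) {
    assert(payload.size() <= 0xFFFFFFFFu);
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    const char header[4] = {
        static_cast<char>(n & 0xFF), static_cast<char>((n >> 8) & 0xFF),
        static_cast<char>((n >> 16) & 0xFF), static_cast<char>((n >> 24) & 0xFF)};
    out.write(header, 4);
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
    out.flush();
}

// Returns false on end-of-stream or on a truncated frame.
bool readFrame(std::istream& in, std::string& payload) {
    unsigned char header[4];
    if (!in.read(reinterpret_cast<char*>(header), 4)) return false;
    const std::uint32_t n = static_cast<std::uint32_t>(header[0]) |
                            (static_cast<std::uint32_t>(header[1]) << 8) |
                            (static_cast<std::uint32_t>(header[2]) << 16) |
                            (static_cast<std::uint32_t>(header[3]) << 24);
    payload.assign(n, '\0');
    return n == 0 || static_cast<bool>(in.read(&payload[0], n));
}

// Placeholder for the real work, e.g. parse XML and apply the stylesheet.
using Transform = std::function<std::string(const std::string&)>;

// Worker loop: one request frame in, one response frame out, until EOF.
// Returns the number of requests served.
std::size_t serve(std::istream& in, std::ostream& out, const Transform& transform) {
    std::size_t served = 0;
    std::string request;
    while (readFrame(in, request)) {
        writeFrame(out, transform(request));
        ++served;
    }
    return served;
}
```

In a real worker, `in` and `out` would be the process's pipe ends (for example `std::cin` and `std::cout` switched to binary mode), and the parent would keep one such worker per CPU. Each process then has its own heap, so a global allocator lock inside one worker cannot stall the others.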

+2




MSXML uses BSTRs, which take a global lock in their heap management. This caused us a lot of trouble in a multi-user application a few years ago.

We removed the use of XML from our application. You might not be able to do that, so you might be better off using an alternative XML parser.

+1








