Performance issues when scaling the MSVC 2005 << operator across threads

While looking at some of our logs, I noticed in the profiler that we spend a lot of time in the formatting operators (operator<<, etc.). It looks like a shared lock is taken whenever ostream::operator<< is called to format an int (and presumably a double). After further exploration, I narrowed it down to this example:

Loop 1, which uses ostringstream for formatting:

DWORD WINAPI doWork1(void* param)
{
    int nTimes = *static_cast<int*>(param);
    for (int i = 0; i < nTimes; ++i)
    {
        ostringstream out;
        out << "[0";
        for (int j = 1; j < 100; ++j)
            out << ", " << j; 
        out << "]\n";
    }
    return 0;
}


Loop 2, which uses the same ostringstream for everything except the int formatting, which is done with itoa:

DWORD WINAPI doWork2(void* param)
{
    int nTimes = *static_cast<int*>(param);
    for (int i = 0; i < nTimes; ++i)
    {
        ostringstream out;
        char buffer[13];
        out << "[0";
        for (int j = 1; j < 100; ++j)
        {
            _itoa_s(j, buffer, 10);
            out << ", " << buffer;
        }
        out << "]\n";
    }
    return 0;
}


For my test, I ran each loop multiple times with 1, 2, 3 and 4 threads (I have a 4-core machine). The total number of iterations is constant. Here are the results:

doWork1: all ostringstream
n       Total
1         557
2        8092
3       15916
4       15501

doWork2: use itoa
n       Total
1         200
2         112
3         100
4         105


As you can see, the performance with ostringstream is terrible: it gets roughly 30x worse as threads are added, whereas the itoa version gets about 2x faster.

One idea is to call _configthreadlocale(_ENABLE_PER_THREAD_LOCALE), as recommended in this article, but it doesn't seem to help me. Here's another user who seems to have hit a similar problem.

We need to be able to format ints from multiple threads running in parallel in our application. With this problem in mind, we either need to figure out how to make this work or find another formatting solution. I could write a simple class with operator<< overloaded for the integral and floating-point types, plus a templated version that just forwards to operator<< on the underlying stream. A bit ugly, but I think I can make it work, although probably not for user-defined operator<<(ostream&, T) overloads, because the wrapper isn't an ostream.

I should also clarify that this was measured with Microsoft Visual Studio 2005, and I believe the limitation comes from their standard library implementation.


3 answers


It doesn't surprise me that MS put "global" locks on quite a few shared resources; our biggest headache was BSTR memory locking a few years ago.

The best you can do is copy the code and replace the locked ostream conversion and its shared state with your own class. I did this when I wrote a stream-style wrapper around a printf-style logging system (i.e. I had to use the printf logger but wanted to wrap it in stream insertion operators). Once you've compiled that into your application, you should be as fast as itoa. When I'm in the office, I'll grab the code and paste it for you.

EDIT: as promised:



CLogger& operator<<(long l)
{
    if (m_LoggingLevel < m_levelFilter)
        return *this;

    // 33 is the max length of data returned from _ltot
    resize(33);

    _ltot(l, buffer+m_length, m_base);
    m_length += (long)_tcslen(buffer+m_length);

    return *this;
};

static CLogger& hex(CLogger& c)
{
    c.m_base = 16;
    return c;
};

void resize(long extra)
{
    if (extra + m_length > m_size)
    {
        // resize buffer to fit.
        TCHAR* old_buffer = buffer;
        m_size += extra;
        buffer = (TCHAR*)malloc(m_size*sizeof(TCHAR));
        _tcsncpy(buffer, old_buffer, m_length+1);
        free(old_buffer);
    }
}

static CLogger& endl(CLogger& c)
{
    if (c.m_length == 0 && c.m_LoggingLevel < c.m_levelFilter)
        return c;

    c.Write();
    return c;
};


Sorry I can't give you all of it, but these three methods show the basics: I allocate a buffer, resize it as necessary (m_size is the buffer size, m_length is the current text length), and keep it for the lifetime of the log object. The contents of the buffer are written out to a file (or OutputDebugString, or a list control) in the endl method. I also have a logging level to restrict output at runtime. So you just replace your calls to ostringstream with this, and the Write() method pumps the buffer to the file and resets the length. Hope this helps.



If there are bugs in the Visual Studio 2005 standard library implementation, why not try another implementation? Such as:



or even Dinkumware, which the Visual Studio 2005 standard library is based on; perhaps the issue has been fixed since 2005.

Edit: the other user you mentioned was using Visual Studio 2008 SP1, which suggests that Dinkumware may not have fixed the issue.



The problem might be memory allocation. The malloc that new uses has an internal lock; you can see it if you step into it. Try using a per-thread allocator and see if the bad performance goes away.


