C ++ What is the best way to read a dictionary from a text file and store it in a sorted container

What is the best way to read a dictionary from a text file and store it in a sorted container. I have a performance issue while inserting words into a container. So here is my code.

std::set<std::string> m_words;
std::stringstream ss(textStr);
std::string buf;
while (ss >> buf)
{
    m_words.insert(m_words.end(), buf);
}

      

The dictionary file is an English dictionary with 130,000 lines. Is there anything else to optimize for performance.

+3


source to share


2 answers


I assume to use vector

because contiguous storage is good for performance, delays the expected number of records in advance, uses move semantics, and only sorts the vector at the end of the file:

std::vector<std::string> m_words;
m_words.reserve(10000);

std::string buf;
while (ss >> buf)
{
  m_words.push_back(std::move(buf));
  buf.clear();
}

std::sort(m_words.begin(), m_words.end());

      

(You can also check the file size in the preprocessing phase to get a better hint instead of a fixed one)



Due to the move, buf must be reset to empty.

Unless otherwise noted, all standard library functions that accept rvalue Control parameters (such as std :: vector :: push_back) are guaranteed to leave the move-from argument in a valid but unspecified state. That is, only functions without preconditions, such as the assignment operator, can be used on an object after it has been moved to the standard library container ( cppreference )

+3


source


Instead of adding sorted values ​​to std :: set, you can push strings to std :: vector or std :: deque (Note: don't use std :: deque from msvc (2013) - is is not better than std :: list).

Std :: set should have good performance to find the correct insertion point if it observes the "at the end" hint. However, std :: set allocates nodes for each element separately.

Given a sequence of data, you can perform a binary search (std :: lower_bound) on that data.



See also:

Note: Having C ++ 11, you might consider vector :: reserve (huge_amount) and vector :: shrink_to fit ()

+3


source







All Articles