C ++ random access iterators for containers with items loaded on demand

I am currently working on a small project that requires loading messages from a file. Messages are saved sequentially in the file and the files can get huge, so loading the entire contents of the file into memory is not warranted.

Therefore, we decided to implement a class FileReader

that can quickly navigate to specific elements in a file and load them on demand. Usually used something along the following lines

SpecificMessage m;
FileReader fr;
fr.open("file.bin");
fr.moveTo(120); // Move to Message #120
fr.read(&m);    // Try deserializing as SpecificMessage 

      

The FileReader itself works great. So we thought about adding STL-compatible iterator support: A random access iterator that provides read-only references for specific messages. Used as follows

for (auto iter = fr.begin<SpecificMessage>(); iter != fr.end<SpecificMessage>(); ++iter) {
  // ...
}

      

Note: the above assumes that the file contains only SpecificMessages. We used boost::iterator_facade

to simplify implementation.

Now my question boils down to this: how to properly implement an iterator? Since FileReader

it doesn't actually contain the sequence of messages inside, but downloads them on demand.

What we have tried so far:

Storing a message as a member of an iterator

This approach stores the message in an iterator instance. Which is great for simple use cases, but not good for more complex purposes. For example. std::reverse_iterator

has a dereference operation that looks like this:

 reference operator*() const
 {  // return designated value
   _RanIt _Tmp = current;
   return (*--_Tmp);
 }

      

This breaks our approach when a reference is returned from a temporary iterator.

Make reference type equal to value type

@DDrmmr in the comments suggesting to make the reference type equal to the value type so that a copy of the internally stored object is returned. However, I think this is not true for a reverse iterator that implements the -> operator as

pointer operator->() const {
  return (&**this);
}

      

which itself shares, calls the * operator, which then returns a copy of the temporary and finally returns the address of that temporary.

Saving a message from the outside

Alternatively, although I want to store the message externally:

SpecificMessage m;
auto iter = fr.begin<SpecificMessage>(&m);
// ...

      

which also seems to be wrong for

auto iter2 = iter + 2

      

which will have both iter2

, and iter

will point to the same content.

+3


source to share


4 answers


As I hinted in another answer, you could use memory mapped files. In a comment, you asked:

As far as memory mapped files are concerned, that doesn't seem to be what I want to have, how could you provide them with an iterator over SpecificMessages for them?

Ok, if your SpecificMessage is a POD type, you can just iterate over the raw memory directly. If not, you can have a deserialization helper (as you already have) and use Boost to do the deserialization on demand. transform_iterator

Note that we can make a managed file with memory, which effectively means that you can just use it like a normal heap and you can store all the standard containers. This includes node-based containers ( map<>

for example), dynamically sized containers (for example vector<>

) in addition to fixed sized containers ( array<>

) - and any combination of these.

Here's a demo that uses a simple SpecificMessage

one that contains a string and (de) derives directly into shared memory:

using blob_t       = shm::vector<uint8_t>;
using shared_blobs = shm::vector<blob_t>;

      

The part you are interested in will be the consumption part:



bip::managed_mapped_file mmf(bip::open_only, DBASE_FNAME);
shared_blobs* table = mmf.find_or_construct<shared_blobs>("blob_table")(mmf.get_segment_manager());

using It = boost::transform_iterator<LazyLoader<SpecificMessage>, shared_blobs::const_reverse_iterator>;

// for fun, let reverse the blobs
for (It first(table->rbegin()), last(table->rend()); first < last; first+=13)
    std::cout << "blob: '" << first->contents << "'\n";

// any kind of random access is okay, though:
auto random = rand() % table->size();
SpecificMessage msg;
load(table->at(random), msg);
std::cout << "Random blob #" << random << ": '" << msg.contents << "'\n";

      

So this prints every 13th message in reverse order, followed by a random blob.

Full demo

The online example uses source strings as "messages".

Live On Coliru

#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/managed_mapped_file.hpp>
#include <boost/container/scoped_allocator.hpp>
#include <boost/interprocess/containers/vector.hpp>
#include <iostream>

#include <boost/iterator/transform_iterator.hpp>
#include <boost/range/iterator_range.hpp>

static char const* DBASE_FNAME = "database.map";

namespace bip = boost::interprocess;

namespace shm {
    using segment_manager = bip::managed_mapped_file::segment_manager;
    template <typename T> using allocator = boost::container::scoped_allocator_adaptor<bip::allocator<T, segment_manager> >;
    template <typename T> using vector    = bip::vector<T, allocator<T> >;
}

using blob_t       = shm::vector<uint8_t>;
using shared_blobs = shm::vector<blob_t>;

struct SpecificMessage {
    // for demonstration purposes, just a string; could be anything serialized
    std::string contents;

    // trivial save/load serialization code:
    template <typename Blob>
    friend bool save(Blob& blob, SpecificMessage const& msg) {
        blob.assign(msg.contents.begin(), msg.contents.end());
        return true;
    }

    template <typename Blob>
    friend bool load(Blob const& blob, SpecificMessage& msg) {
        msg.contents.assign(blob.begin(), blob.end());
        return true;
    }
};

template <typename Message> struct LazyLoader {
    using type = Message;

    Message operator()(blob_t const& blob) const {
        Message result;
        if (!load(blob, result)) throw std::bad_cast(); // TODO custom excepion
        return result;
    }
};

///////
// for demo, create some database contents
void create_database_file() {
    bip::file_mapping::remove(DBASE_FNAME);
    bip::managed_mapped_file mmf(bip::open_or_create, DBASE_FNAME, 1ul<<20); // Even sparse file size is limited on Coliru

    shared_blobs* table = mmf.find_or_construct<shared_blobs>("blob_table")(mmf.get_segment_manager());

    std::ifstream ifs("main.cpp");
    std::string line;
    while (std::getline(ifs, line)) {
        table->emplace_back();
        save(table->back(), SpecificMessage { line });
    }

    std::cout << "Created blob table consisting of " << table->size() << " blobs\n";
}

///////

void display_random_messages() {
    bip::managed_mapped_file mmf(bip::open_only, DBASE_FNAME);
    shared_blobs* table = mmf.find_or_construct<shared_blobs>("blob_table")(mmf.get_segment_manager());

    using It = boost::transform_iterator<LazyLoader<SpecificMessage>, shared_blobs::const_reverse_iterator>;

    // for fun, let reverse the blobs
    for (It first(table->rbegin()), last(table->rend()); first < last; first+=13)
        std::cout << "blob: '" << first->contents << "'\n";

    // any kind of random access is okay, though:
    auto random = rand() % table->size();
    SpecificMessage msg;
    load(table->at(random), msg);
    std::cout << "Random blob #" << random << ": '" << msg.contents << "'\n";
}

int main()
{
#ifndef CONSUMER_ONLY
    create_database_file();
#endif

    srand(time(NULL));
    display_random_messages();
}

      

+2


source


You're in trouble because your iterator doesn't meet the requirements of a direct iterator. In particular:

  • *i

    must be an lvalue reference to value_type

    or const value_type

    ([forward.iterators] /1.3)
  • *i

    cannot be a reference to an object stored in the iterator itself, due to the requirement that two iterators are equal if and only if they are bound to the same object ([forward.iterators] / 6)

Yes, these requirements are a huge pain in the butt, and yes, that means things like std::vector<bool>::iterator

are not random access iterators, although some standard library implementations incorrectly state they are.




EDIT: The following proposed solution is horribly broken as dereferencing the temporary iterator returns a reference to an object, which may not work until the reference is used. For example, after the auto& foo = *(i + 1);

referenced object foo

may have been released. The implementation reverse_iterator

referenced in the OP will cause the same problem.

I suggest you split the design into two classes: FileCache

one that stores file resources and a cache of loaded messages, and FileCache::iterator

one that contains the message number and lazily retrieves it from FileCache

when dereferenced. The implementation can be as simple as storing the container weak_ptr<Message>

in FileCache

and shared_ptr<Message>

out of an iterator: Simple hit demo

+2


source


I must admit that I cannot fully understand what you have while holding the current MESSAGE as an Iter member. I would associate each iterator with the FileReader it needs to read, and implement it as a lightweight encapsulation of the read index for FileReader: :( read | moveTo). The most important overwrite method is boost::iterator_facade<...>::advance(...)

, which changes the current index and tries to pull a new MESSAGE from the FileReader. If that fails, it ignores the iterator as invalid and dereferencing will fail.

template<class MESSAGE,int STEP>           
class message_iterator; 

template<class MESSAGE> 
class FileReader { 
public: 
    typedef message_iterator<MESSAGE, 1> const_iterator; 
    typedef message_iterator<MESSAGE,-1> const_reverse_iterator; 

    FileReader(); 
    bool open(const std::string  & rName); 
    bool moveTo(int n); 
    bool read(MESSAGE &m); 

    // get the total count of messages in the file 
    // helps us to find end() and rbegin() 
    int getMessageCount(); 

    const_iterator begin() {                                           
        return const_iterator(this,0); 
    } 
    const_iterator end() { 
        return const_iterator(this,getMessageCount()); 
    } 
    const_reverse_iterator rbegin() { 
        return const_reverse_iterator(this,getMessageCount()-1); 
    } 
    const_reverse_iterator rend() { 
        return const_reverse_iterator(this,-1); 
    } 
}; 

// declaration of message_iterator moving over MESSAGE 
// STEP is used to specify STEP size and direction (e.g -1 == reverse) 
template<class MESSAGE,int STEP=1>                                                 
class message_iterator 
    : public boost::iterator_facade< 
    message_iterator<MESSAGE> 
    , const MESSAGE  
    , boost::random_access_traversal_tag 
    > 
{ 
    typedef  boost::iterator_facade< 
        message_iterator<MESSAGE> 
        , const MESSAGE 
        , boost::random_access_traversal_tag 
        > super; 

public:                                                               
    // constructor associates an iterator with its FileReader and a given position 
    explicit message_iterator(FileReader<MESSAGE> * p=NULL,int n=0): _filereader(p),_idx(n),_valid(false)    { 
        advance(0); 
    } 
    bool equal(const message_iterator & i) const { 
        return i._filereader == _filereader && i._idx == _idx; 
    } 
    void increment() { 
        advance(+1); 
    } 
    void decrement() { 
        advance(-1); 
    } 

    // overwrite with central functionality. Move to a given relative 
    // postion and check wether the position can be read. If move/read 
    // fails we flag the iterator as incalid. 

    void advance(int n) { 
        _idx += n*STEP; 
        if(_filereader!=NULL) { 
            if( _filereader->moveTo( _idx ) && _filereader->read(_m)) { 
                _valid = true; 
                return; 
            } 
        } 
        _valid = false; 
    } 
    // Return a ref to the currently cached MESSAGE. Throw 
    // an acception if positioning at this location in advance(...) failes. 
    typename super::reference dereference() const { 
        if(!_valid) { 
            throw std::runtime_error("access to invalid pos"); 
        } 
        return _m; 
    } 

private: 
    FileReader<MESSAGE> * _filereader; 
    int                   _idx; 
    bool                  _valid; 
    MESSAGE               _m; 
}; 

      

0


source


Boost PropertyMap

You can avoid writing the bulk of your code with the Boost PropertyMap:

Live On Coliru

#include <boost/property_map/property_map.hpp>
#include <boost/property_map/function_property_map.hpp>

using namespace boost;

struct SpecificMessage {
    // add some data
    int index; // just for demo
};

template <typename Message>
struct MyLazyReader {
    typedef Message type;
    std::string fname;

    MyLazyReader(std::string fname) : fname(fname) {}

    Message operator()(size_t index) const { 
        Message m;
        // FileReader fr;
        // fr.open(fname);
        // fr.moveTo(index);     // Move to Message 
        // fr.read(&m);          // Try deserializing as SpecificMessage  
        m.index = index; // just for demo
        return m;
    }
};

#include <iostream>

int main() {

    auto lazy_access = make_function_property_map<size_t>(MyLazyReader<SpecificMessage>("file.bin"));

    for (int i=0; i<10; ++i)
        std::cout << lazy_access[rand()%256].index << "\n";
}

      

Sample output

103
198
105
115
81
255
74
236
41
205

      

Using memory files

You can save a map of objects index → ​​BLOBs in general vector<array<byte, N>>

, flat_map<size_t, std::vector<uint8_t> >

or similar.

So now you only need to deserialize from myshared_map[index].data()

( begin()

and end()

in the case of BLOB resizing)

0


source







All Articles