Can protobuf read a file partially?

I want to save elevation data to a file and load only parts of it, because it is too large to hold in memory as a whole. In fact, I don't even know whether protobuf is a good fit for this purpose.

For example, I would have a structure like this (it may not be syntactically valid; I only know the basics):

message Quad {
    required int32 x = 1;
    required int32 z = 2;

    repeated int32 y = 3;
}


The x and z values are available in my program, and using them I would like to find the Quad object in the file with matching x and z in order to get its y values. However, I can't just parse the file with ParseFromIstream(), because (I think) it loads the entire file into memory, and in my case the file is too big.

In other words: can protobuf read one object at a time, hand it to me to check, and, if it is not the one I want, give me the next one?

Actually... I could just ask: does ParseFromIstream() load the whole file?



2 answers


It depends on which implementation you are using. Some have "read as a sequence" APIs. For example, assuming you stored the data as a sequence of Quad messages, with protobuf-net that would be:

int x = ..., z = ...;
var found = Serializer.DeserializeItems<Quad>(source)
            .Where(q => q.x == x && q.z == z);




The point: this is fully streamed (the file is not loaded all at once) and evaluated lazily, so the sequence short-circuits as soon as you stop enumerating it.

I don't know the C++ API specifically, but I would hope it has something similar; in the worst case, you can parse the varint length prefixes yourself and hand the parser a length-limited stream.



While some libraries allow you to read files partially, Google's recommended technique is simply to have the file contain multiple messages:

https://developers.google.com/protocol-buffers/docs/techniques



Protocol buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing with messages larger than a megabyte each, it may be time to consider an alternative strategy.

However, protocol buffers are great for handling individual messages within a large data set. Usually, large data sets are really just a collection of small pieces, where each small piece may be a structured piece of data.

Thus, you can simply write a long sequence of Quad messages to a file, delimited by their message lengths. If you need random access to particular Quads, you may want to add some kind of index.







