Protobuf-net lazy streaming field deserialization

The overall goal: skip a very long field during deserialization, and when that field is later accessed, read elements from it directly from the stream without loading the entire field.

An example of the class being serialized/deserialized, FatPropertyClass:

[ProtoContract]
public class FatPropertyClass
{
    [ProtoMember(1)]
    private int smallProperty;

    [ProtoMember(2)]
    private FatArray2<int> fatProperty;

    [ProtoMember(3)]
    private int[] array;

    public FatPropertyClass()
    {

    }

    public FatPropertyClass(int sp, int[] fp)
    {
        smallProperty = sp;
        fatProperty = new FatArray2<int>(fp);
    }

    public int SmallProperty
    {
        get { return smallProperty; }
        set { smallProperty = value; }
    }

    public FatArray2<int> FatProperty
    {
        get { return fatProperty; }
        set { fatProperty = value; }
    }

    public int[] Array
    {
        get { return array; }
        set { array = value; }
    }
}


[ProtoContract]
public class FatArray2<T>
{
    [ProtoMember(1, DataFormat = DataFormat.FixedSize)]
    private T[] array;
    private Stream sourceStream;
    private long position;

    public FatArray2()
    {
    }

    public FatArray2(T[] array)
    {
        this.array = new T[array.Length];
        Array.Copy(array, this.array, array.Length);
    }


    [ProtoBeforeDeserialization]
    private void BeforeDeserialize(SerializationContext context)
    {
        position = ((Stream)context.Context).Position;
    }

    public T this[int index]
    {
        get
        {
            // logic to get the relevant index from the stream.
            return default(T);
        }
        set
        {
            // only relevant when full array is available for example.
        }
    }
}


I can deserialize like this:

    FatPropertyClass d = model.Deserialize(fileStream, null, typeof(FatPropertyClass),
        new SerializationContext() { Context = fileStream }) as FatPropertyClass;

where model might be, for example:

    RuntimeTypeModel model = RuntimeTypeModel.Create();
    MetaType mt = model.Add(typeof(FatPropertyClass), false);
    mt.AddField(1, "smallProperty");
    mt.AddField(2, "fatProperty");
    mt.AddField(3, "array");
    MetaType mtFat = model.Add(typeof(FatArray2<int>), false);


This will skip deserialization of array in FatArray2<T>. However, I then need to read random elements from that array later. One thing I've tried is remembering the position of the stream before deserialization, in the BeforeDeserialize(SerializationContext context) method of FatArray2<T>, as in the code above: position = ((Stream)context.Context).Position;. However, this always seems to be the end of the stream.

How can I remember the position in the stream where FatArray2<T> starts, and how can I then read from it at a random index?

Note: the type parameter T in FatArray2<T> can also be a type marked with [ProtoContract], not just a primitive. There can also be multiple FatArray2<T> properties at different depths in the object graph.

Method 2: serialize the FatArray2<T> fields after serializing the containing object. That is, serialize FatPropertyClass with a length prefix, then serialize, each with its own length prefix, all the fat arrays it contains. Mark all of these array properties with an attribute, and when deserializing we can remember the stream position of each of them.

The question then is: how do we read primitives from such a stream? For classes this works fine, using T item = Serializer.DeserializeItems<T>(sourceStream, PrefixStyle.Base128, Serializer.ListItemTag).Skip(index).First(); to get the item at index index. But how does this work for primitives? An array of primitives doesn't seem to be deserializable with DeserializeItems.

Is using DeserializeItems with LINQ like that even OK? Does it do what I assume it does (internally stream to the correct element, in the worst case reading each length prefix and skipping it)?

Best regards, Julian



1 answer


This depends very much on the actual model: it isn't a scenario that the library specifically tries to make convenient. I suspect you would be best off writing the reader by hand using ProtoReader. Note that there are some tricks for reading selected items when the outer object is a List<SomeType> or similar, but inner objects are usually either fully read or skipped.

By starting again from the root of the document with ProtoReader, you can seek to the nth item fairly efficiently. I can do a concrete example later if you like (I haven't jumped straight in, in case it wouldn't actually be helpful). For reference, the reason the stream position is not useful here is that the library aggressively over-reads and buffers data unless you specifically tell it to limit the length. This is because data such as "varint" is hard to read efficiently without lots of buffering, as it would otherwise end up making many individual ReadByte() calls rather than working against a local buffer.




This is a completely untested version of reading the nth array element of the sub-property directly from the reader; note that it would be inefficient to call this many times one after the other, but it should be obvious how to change it to read a range of consecutive values, etc.:

static int? ReadNthArrayItem(Stream source, int index, int maxLen)
{
    // passing maxLen tells the reader the payload length, limiting over-reading/buffering
    using (var reader = new ProtoReader(source, null, null, maxLen))
    {
        int field, count = 0;
        while ((field = reader.ReadFieldHeader()) > 0)
        {
            switch (field)
            {
                case 2: // fat property; a sub object
                    var tok = ProtoReader.StartSubItem(reader);
                    while ((field = reader.ReadFieldHeader()) > 0)
                    {
                        switch (field)
                        {
                            case 1: // the array field
                                if(count++ == index)
                                    return reader.ReadInt32();
                                reader.SkipField();
                                break;
                            default:
                                reader.SkipField();
                                break;
                        }
                    }
                    ProtoReader.EndSubItem(tok, reader);
                    break;
                default:
                    reader.SkipField();
                    break;
            }
        }
    }
    return null;
}
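
As an untested usage sketch (assuming the stream contains just the serialized FatPropertyClass and is positioned at its start):

// fetch element 5 of the fat array without materializing the whole array
fileStream.Position = 0;
int? fifth = ReadNthArrayItem(fileStream, 5, (int)fileStream.Length);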


Finally, note that if it is a large array, you may want to use "packed" arrays (see the protobuf documentation; basically, the elements are stored without a per-element field header). This would be much more efficient, but note that it requires slightly different reading code. You enable packed arrays by adding IsPacked = true to the [ProtoMember(...)] attribute for that array.
