Best format for Pandas serialization on disk

For my workload, I need to serialize Pandas DataFrames (text + data) to disk, at about 5 GB per DataFrame. I have tried various solutions:

HDF5: Issues with strings.
Feather: Not stable.
CSV: OK, but large file size.
pickle: OK, cross-platform; can we do better?
gzip: Same as CSV (slow for read access).
SFrame: Good, but no longer maintained.
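For reference, here is a rough sketch of how such a comparison could be measured for three of the formats above (CSV, gzip-compressed CSV, pickle). The helper name `compare_formats`, the file name `sample`, and the small synthetic DataFrame are assumptions made for illustration only; they are not my real workload, and timings on a 10,000-row sample will not necessarily reflect behaviour at 5 GB.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def compare_formats(df, directory):
    """Write df in each format, read it back, and record size and read time."""
    # Each entry maps a file extension to a (writer, reader) pair.
    formats = {
        "csv": (lambda p: df.to_csv(p, index=False), pd.read_csv),
        "csv.gz": (
            lambda p: df.to_csv(p, index=False, compression="gzip"),
            lambda p: pd.read_csv(p, compression="gzip"),
        ),
        "pkl": (df.to_pickle, pd.read_pickle),
    }
    results = {}
    for ext, (write, read) in formats.items():
        path = os.path.join(directory, "sample." + ext)
        write(path)
        start = time.perf_counter()
        loaded = read(path)
        elapsed = time.perf_counter() - start
        results[ext] = {
            "bytes": os.path.getsize(path),
            "read_s": elapsed,
            "rows": len(loaded),
        }
    return results


# Hypothetical sample data: one text column and one numeric column.
rng = np.random.default_rng(0)
sample = pd.DataFrame({
    "text": ["row %d" % i for i in range(10_000)],
    "value": rng.random(10_000),
})

with tempfile.TemporaryDirectory() as tmp:
    results = compare_formats(sample, tmp)

for ext, stats in results.items():
    print(ext, stats)
```

Running the same function on a representative slice of the real data should give a fairer picture of the size and read-speed trade-off than the synthetic sample does.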

      

Just wondering whether there is any alternative solution for storing the DataFrame on disk?
