Best format for Pandas serialization on disk
For my workload, I need to serialize Pandas DataFrames (mixed text and data columns) to disk, at about 5 GB per DataFrame. I have tried several solutions:
HDF5: issues with strings.
Feather: not stable.
CSV: OK, but large file size.
Pickle: OK, cross-platform, but can we do better?
gzip: same as CSV (slow for read access).
SFrame: good, but not maintained anymore.
Is there any alternative solution for storing the DataFrame rows on disk?
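For anyone who wants to reproduce the comparison, here is a minimal sketch of how the CSV, gzip and pickle options could be measured for file size and read/write time. It only uses pandas calls that need no extra dependencies (`to_csv`, `read_csv`, `to_pickle`, `read_pickle`). The helper names `make_frame` and `benchmark`, the column names, and the row count are hypothetical stand-ins for the real 5 GB data, so the numbers it prints only indicate relative behaviour.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def make_frame(n_rows=10_000, seed=0):
    """Build a small synthetic frame with one text and two numeric columns.

    Hypothetical stand-in for the real data; adjust to match your schema.
    """
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "text": [f"row-{i}" for i in range(n_rows)],
        "value": rng.random(n_rows),
        "count": rng.integers(0, 100, n_rows),
    })


def benchmark(df, directory):
    """Write and re-read df in each format; return size and timings per format."""
    formats = {
        "csv": (
            lambda p: df.to_csv(p, index=False),
            lambda p: pd.read_csv(p),
        ),
        "csv.gz": (
            lambda p: df.to_csv(p, index=False, compression="gzip"),
            lambda p: pd.read_csv(p, compression="gzip"),
        ),
        "pkl": (
            lambda p: df.to_pickle(p),
            lambda p: pd.read_pickle(p),
        ),
        "pkl.gz": (
            lambda p: df.to_pickle(p, compression="gzip"),
            lambda p: pd.read_pickle(p, compression="gzip"),
        ),
    }
    results = {}
    for ext, (write, read) in formats.items():
        path = os.path.join(directory, f"frame.{ext}")
        t0 = time.perf_counter()
        write(path)
        t1 = time.perf_counter()
        loaded = read(path)
        t2 = time.perf_counter()
        results[ext] = {
            "bytes": os.path.getsize(path),
            "write_s": t1 - t0,
            "read_s": t2 - t1,
            "rows": len(loaded),
        }
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for ext, stats in benchmark(make_frame(), tmp).items():
            print(ext, stats)
```

Running the same loop on a representative slice of the real data should give a fairer picture than the synthetic frame, since compression ratios depend heavily on the text content.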