Best format for Pandas serialization on disk

For my workload, I need to serialize Pandas DataFrames (text + data) to disk, at about 5 GB per DataFrame. I have tried several solutions:

HDF5: issues with strings
Feather: not stable
CSV: OK, but large file size
pickle: OK, cross-platform; can we do better?
gzip: same as CSV (slow read access)
SFrame: good, but no longer maintained
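
For reference, here is a minimal sketch of how such a comparison could be set up for three of the formats above (CSV, gzip-compressed CSV, pickle). The small synthetic DataFrame, the column names, and the `roundtrip` helper are assumptions made for illustration; they stand in for the real 5 GB data, so the sizes and timings it prints say nothing definitive about the real workload.

```python
import os
import tempfile
import time

import pandas as pd

# Hypothetical stand-in for the real data: one text column, one numeric column.
df = pd.DataFrame({
    "text": [f"row number {i}" for i in range(10_000)],
    "value": [i * 0.5 for i in range(10_000)],
})


def roundtrip(directory, name, write, read):
    """Write df with `write`, read it back with `read`.

    Returns (file size in bytes, read time in seconds, loaded DataFrame).
    """
    path = os.path.join(directory, name)
    write(path)
    start = time.perf_counter()
    loaded = read(path)
    elapsed = time.perf_counter() - start
    return os.path.getsize(path), elapsed, loaded


results = {}
with tempfile.TemporaryDirectory() as tmpdir:
    results["csv"] = roundtrip(
        tmpdir, "df.csv",
        lambda p: df.to_csv(p, index=False),
        pd.read_csv,
    )
    results["csv.gz"] = roundtrip(
        tmpdir, "df.csv.gz",
        lambda p: df.to_csv(p, index=False, compression="gzip"),
        lambda p: pd.read_csv(p, compression="gzip"),
    )
    results["pickle"] = roundtrip(
        tmpdir, "df.pkl",
        df.to_pickle,
        pd.read_pickle,
    )

# Report size on disk and read time for each format.
for fmt, (size, seconds, _) in results.items():
    print(f"{fmt:8s} {size:>10d} bytes  read in {seconds:.4f} s")
```

Running the same helper on a representative slice of the real DataFrame would give numbers that are more meaningful than those from the synthetic data.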

      

Is there any alternative solution for storing the DataFrame on disk?
