Cheapest way to create pandas.DataFrame or pandas.SparseDataFrame

Suppose we have a huge, sparse matrix. What is the cheapest way to load it into a pandas.DataFrame? More specifically, the matrix comes from a large dataset with many dummy variables, and the dense version takes up more than 150 GB of memory, which is not feasible.
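If the dummy variables are created with pandas itself, one option is to ask `pd.get_dummies` for sparse output, so that each indicator column stores only the positions of its non-zero entries. This is a minimal sketch, assuming a recent pandas (1.x or 2.x) and a hypothetical single categorical column with 50 levels; the dense version is built only to compare memory usage:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one categorical column with 50 levels and 10,000 rows.
rng = np.random.default_rng(0)
codes = pd.Series(rng.integers(0, 50, size=10_000), name="cat")

# Dense dummies: every indicator column stores all 10,000 values.
dense = pd.get_dummies(codes)

# Sparse dummies: each column is a SparseArray holding only non-fill entries.
sparse_dummies = pd.get_dummies(codes, sparse=True)

dense_bytes = dense.memory_usage(deep=True).sum()
sparse_bytes = sparse_dummies.memory_usage(deep=True).sum()
print(f"dense:  {dense_bytes:,} bytes")
print(f"sparse: {sparse_bytes:,} bytes")
```

With one level per row, each column is roughly 98% zeros here, so the sparse frame should be a small fraction of the dense one; the actual ratio for real data depends on the number of levels.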

As a pandas beginner, I am trying to understand how pandas manages memory. My current dilemma is as follows:

  • Passing a dense source matrix to `pd.DataFrame` does not copy memory, but the dense matrix itself consumes the most space.
  • If I use a `scipy.sparse.csr_matrix`, `pd.DataFrame` does not accept it as a constructor argument. Taking a step back, if we resort to `pd.SparseDataFrame`, how can I avoid copying memory?
  • Here is one good approach for converting a `scipy.sparse.csr_matrix` to a `pd.SparseDataFrame`. But its for-loop is very inefficient and copies memory.
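For pandas 0.25 and later, where `SparseDataFrame` is deprecated (it was removed in 1.0), a loop-free conversion is the sparse accessor's `pd.DataFrame.sparse.from_spmatrix`. A minimal sketch, with a randomly generated matrix standing in for the real data. As far as I know it still copies internally (it converts the input to CSC and builds one `SparseArray` per column), but that cost scales with the number of non-zeros rather than with rows × columns:

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical input: a 1,000 x 200 CSR matrix with at most 2,000 non-zeros.
rng = np.random.default_rng(0)
n_rows, n_cols, nnz = 1_000, 200, 2_000
rows = rng.integers(0, n_rows, size=nnz)
cols = rng.integers(0, n_cols, size=nnz)
vals = rng.random(nnz)
mat = sparse.csr_matrix((vals, (rows, cols)), shape=(n_rows, n_cols))  # duplicates are summed

# One vectorized call; no Python-level loop over rows or columns.
sdf = pd.DataFrame.sparse.from_spmatrix(mat)

print(sdf.shape)           # (1000, 200)
print(sdf.sparse.density)  # fraction of stored (non-fill) values

# Round trip back to SciPy, e.g. for an estimator that accepts sparse input.
back = sdf.sparse.to_coo().tocsr()
```

Every column ends up with dtype `Sparse[float64, 0]`, so `sdf.sparse.to_coo()` works without first densifying the frame.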

I also tried to preallocate an empty `SparseDataFrame` and assign values to it row by row, which ended like this:

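```python
import numpy as np
import pandas as pd
from scipy import sparse

a = np.random.rand(4, 5)
b = pd.DataFrame(a)
c = sparse.csr_matrix(a)

# Preallocate an empty sparse frame with the same index/columns (pandas < 1.0 only).
d = pd.SparseDataFrame(index=b.index, columns=b.columns)
elem = pd.SparseSeries(c[2].toarray().ravel())
d.loc[[2]] = [elem]  # Raises NotImplementedError.

# The same row assignment on the dense frame works.
elem = pd.Series(c[2].toarray().ravel())
b.loc[[2]] = [elem]  # Works.
```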
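Rather than writing rows into a preallocated sparse DataFrame, one alternative (a swapped-in technique, not a pandas feature) is to fill a SciPy `lil_matrix`, a format meant for incremental writes, and convert it once at the end with `pd.DataFrame.sparse.from_spmatrix` (pandas 0.25 or later). The `> 0.5` rule below is purely hypothetical, just to produce some zeros:

```python
import numpy as np
import pandas as pd
from scipy import sparse

n_rows, n_cols = 4, 5
a = np.random.default_rng(0).random((n_rows, n_cols))

# LIL format supports cheap row-by-row assignment into a fixed shape.
buf = sparse.lil_matrix((n_rows, n_cols))
for i in range(n_rows):
    row = np.where(a[i] > 0.5, a[i], 0.0)  # hypothetical sparsification rule
    buf[i, :] = row

# Convert once at the end: LIL -> CSR -> sparse-backed DataFrame.
d = pd.DataFrame.sparse.from_spmatrix(buf.tocsr())
```

The design choice is to keep all per-row writes in SciPy, where the format is built for them, and pay the pandas conversion cost only once.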

      

Scripting languages are convenient, no doubt, but at this point I feel I need something closer to pointers, i.e. direct control over memory.

Thanks in advance for any help!

python scipy pandas sparse-matrix bigdata





See similar questions:

  • Pandas read_csv low_memory and dtype parameters
  • Populate Pandas SparseDataFrame from SciPy Sparse Matrix
