Cheapest way to create pandas.DataFrame or pandas.SparseDataFrame

Suppose we have a huge, sparse matrix: what's the cheapest way to get it into a pandas.DataFrame? More specifically, the matrix comes from a large dataset with many dummy variables, and its dense version takes up 150 GB+ of memory, which doesn't seem feasible.
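
For scale, here is a minimal, self-contained sketch (the shape and density are made-up placeholders, not my real data) comparing dense and CSR memory footprints:

import numpy as np
from scipy import sparse

# Toy stand-in for the real data: a mostly-zero dummy-variable matrix.
dense = np.zeros((10000, 1000))
dense[np.random.randint(0, 10000, 5000),
      np.random.randint(0, 1000, 5000)] = 1.0

csr = sparse.csr_matrix(dense)
print(dense.nbytes)  # 80,000,000 bytes for the dense float64 array
print(csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes)  # ~100 KB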

I'm new to pandas and trying to understand its memory management. My current dilemma is as follows:

  • Building a pd.DataFrame from a dense source matrix does not copy memory, but the dense matrix itself already consumes the most space.
  • pd.DataFrame does not accept a scipy.sparse.csr_matrix as a constructor argument. Taking a step back, if we resort to pd.SparseDataFrame instead, how can I avoid copying memory?
  • Here's one great approach for converting a scipy.sparse.csr_matrix to a pd.SparseDataFrame, but its for-loop is inefficient and copies memory (a possible alternative is sketched right after this list).
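
If upgrading pandas is an option, I believe 0.20 added direct construction of a SparseDataFrame from a scipy sparse matrix, which would at least remove the Python-level loop; whether it avoids copies is exactly what I'd like to know. A minimal sketch, assuming pandas >= 0.20:

import numpy as np
import pandas as pd
from scipy import sparse

c = sparse.csr_matrix(np.random.rand(1000, 50))
d = pd.SparseDataFrame(c)  # direct construction; copy behavior unclear to me
print(d.density)           # fraction of points actually stored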

I also tried pre-allocating a SparseDataFrame and assigning values row by row, which ends with:

import numpy as np
import pandas as pd
from scipy import sparse

a = np.random.rand(4, 5)
b = pd.DataFrame(a)
c = sparse.csr_matrix(a)
d = pd.SparseDataFrame(index=b.index, columns=b.columns)  # pre-allocated, empty

elem = pd.SparseSeries(c[2].toarray().ravel())
d.loc[[2]] = [elem]  # raises NotImplementedError

elem = pd.Series(c[2].toarray().ravel())
b.loc[[2]] = [elem]  # works fine on the dense DataFrame
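
The closest workaround I've found is to build the SparseDataFrame column by column, along the lines of the approach linked above. A sketch, continuing from the snippet above (still one copy per column, as far as I can tell):

# CSC makes column slicing cheap; each column is densified once and
# wrapped in a SparseSeries, avoiding the unsupported row assignment.
csc = c.tocsc()
d = pd.SparseDataFrame({j: pd.SparseSeries(csc[:, j].toarray().ravel(),
                                           fill_value=0)
                        for j in range(csc.shape[1])})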


Python is a fine scripting language, no question, but right now I might just need a pointer in the right direction.

Thanks in advance for any help!
