Setting columns for an empty pandas DataFrame

This is what I'm confused about ...

import pandas as pd

# this works fine
df1 = pd.DataFrame(columns=['A','B'])

# but let's say I have this
df2 = pd.DataFrame([])

# this doesn't work!
df2.columns = ['A','B']
# ValueError: Length mismatch: Expected axis has 0 elements, new values have 2 elements

      

Why doesn't it work? What can I do instead? Is this the only way to do something like this?

if len(df2.index) == 0:
    df2 = pd.DataFrame(columns=['A','B'])
else:
    df2.columns = ['A','B']

      

There should be a more elegant way.

Thanks for your help!

Update 4/19/2015

Someone asked why I would do this at all:

df2 = pd.DataFrame([])

      

The reason is that I am actually doing something like this:

df2 = pd.DataFrame(data)

      

... where the data might be an empty list of lists, but most of the time it isn't. So yes, I could do:

if len(data) > 0:
    df2 = pd.DataFrame(data, columns=['A','B'])
else:
    df2 = pd.DataFrame(columns=['A','B'])

      

... but it doesn't seem very DRY (and certainly not concise).

Let me know if you have any questions. Thanks!

+3




2 answers


This looks like a bug in pandas. All of these work:

pd.DataFrame(columns=['A', 'B'])
pd.DataFrame({}, columns=['A', 'B'])
pd.DataFrame(None, columns=['A', 'B'])

      

but not this:

pd.DataFrame([], columns=['A', 'B'])

      



Until this is fixed, I suggest something like this:

if len(data) == 0: data = None
df2 = pd.DataFrame(data, columns=['A','B'])

      

or

df2 = pd.DataFrame(data if len(data) > 0 else None, columns=['A', 'B'])

      
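To keep the workaround in one place, it could be wrapped in a small function. This is only a sketch: the name `to_frame` is made up for illustration and is not part of pandas. It replaces an empty list with None before calling the constructor, exactly as in the one-liner above.

```python
import pandas as pd

def to_frame(data, columns):
    """Hypothetical helper: treat an empty list as 'no data' (None)."""
    return pd.DataFrame(data if len(data) > 0 else None, columns=columns)

# Both calls produce a frame with the same column labels.
df_empty = to_frame([], ['A', 'B'])
df_full = to_frame([[1, 2], [3, 4]], ['A', 'B'])
print(list(df_empty.columns), len(df_empty))
print(list(df_full.columns), len(df_full))
```

Note that this only covers data == []; other malformed inputs still reach the constructor unchanged.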

+2




Update: as of pandas version 0.16.1, passing data = [] works:

In [85]: df = pd.DataFrame([], columns=['a', 'b', 'c'])

In [86]: df
Out[86]: 
Empty DataFrame
Columns: [a, b, c]
Index: []

      

so the best solution is to update your Pandas version.


If data is an empty list of lists, then

data = [[]]

But then len(data) will be equal to 1, so len(data) > 0 is not a valid test condition if data is an empty list of lists.
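A quick way to see the problem in plain Python, with no pandas involved. The function name `has_no_cells` is invented here for illustration; it checks whether every row is empty rather than whether the outer list is empty.

```python
def has_no_cells(data):
    """Hypothetical check: True for [] and for rows that are all empty."""
    return all(len(row) == 0 for row in data)

print(len([[]]))               # 1, so len(data) > 0 is True for [[]]
print(has_no_cells([]))        # True
print(has_no_cells([[]]))      # True
print(has_no_cells([[1, 2]]))  # False
```

Even this check only catches the "no values at all" cases; it says nothing about rows with the wrong number of columns.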

There are a number of values for data that could make

pd.DataFrame(data, columns=['A','B'])

throw an exception. An AssertionError or ValueError is raised if data equals [] (no data), [[]] (no columns), [[0]] (one column), or [[0,1,2]] (too many columns). So instead of trying to test for all of these, I find it safer and easier to use try..except here:

columns = ['A', 'B']
try:
    df2 = pd.DataFrame(data, columns=columns)
except (AssertionError, ValueError):
    df2 = pd.DataFrame(columns=columns)

      

It would be nice if there were a DRY-er way to write this, but given that it is the caller's responsibility to check for this, I see no better way.
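If the same fallback is needed in several places, the try..except could at least live in one function. This is a minimal sketch, and `frame_or_empty` is a made-up name, not a pandas function.

```python
import pandas as pd

def frame_or_empty(data, columns):
    """Hypothetical helper: build the frame, or fall back to an empty one."""
    try:
        return pd.DataFrame(data, columns=columns)
    except (AssertionError, ValueError):
        # The shape of data does not match columns: return an empty frame.
        return pd.DataFrame(columns=columns)

df_ok = frame_or_empty([[1, 2], [3, 4]], ['A', 'B'])
df_fallback = frame_or_empty([[]], ['A', 'B'])
print(df_ok.shape)
print(list(df_fallback.columns), len(df_fallback))
```

Keep in mind that the fallback silently discards data with the wrong number of columns (for example [[0,1,2]]), which may hide real input errors.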

+2

