Why am I getting a pandas DataFrame with only one column instead of a Series?

I've run into single-column DataFrames a couple of times, to my chagrin (examples below), but in most cases a single column comes back as a Series. Is there any rhyme or reason to when a single-column DataFrame is returned?

Examples:

1) when indexing columns with a boolean mask, where the mask has only one True value:

import pandas as pd

df = pd.DataFrame([list('abc'), list('def')], columns=['foo', 'bar', 'tar'])
mask = [False, True, False]
type(df.loc[:, mask])  # <class 'pandas.core.frame.DataFrame'>

2) when setting an index on a DataFrame that has only two columns to begin with:

df = pd.DataFrame([list('ab'), list('de'), list('fg')], columns=['foo', 'bar'])
type(df.set_index('foo'))  # <class 'pandas.core.frame.DataFrame'>

It seems to me that if I am expecting a DF with only one column, I can handle it by simply calling

pd.Series(df.values.ravel(), index=df.index)

But in most other cases a single column is returned as a Series. Is there any rhyme or reason to when a single-column DataFrame is returned instead?


1 answer


In general, a single-column DataFrame is returned when the operation could have returned a multi-column DataFrame. For example, when you index columns with a boolean mask, a multi-column DataFrame would have to be returned if the mask had more than one True value, so a DataFrame is always returned, even if it has only one column. Likewise, when setting the index, if your DataFrame had more than two columns, the result would still have to be a DataFrame after one column is moved into the index, so it is a DataFrame even when only one column is left.

In contrast, something like df.loc[:, 'col'] returns a Series, because a single column name cannot select more than one column (assuming the column labels are unique).
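As a minimal sketch of that rule, reusing the DataFrame from the question (the variable names by_mask, by_list and by_label are only illustrative), the selectors that could match several columns return a DataFrame, while a single label returns a Series:

```python
import pandas as pd

df = pd.DataFrame([list('abc'), list('def')], columns=['foo', 'bar', 'tar'])

# A boolean mask or a list of labels could select several columns,
# so the result is a DataFrame even when only one column matches.
by_mask = df.loc[:, [False, True, False]]
by_list = df[['bar']]

# A single (unique) label can only select one column, so the result is a Series.
by_label = df.loc[:, 'bar']

print(type(by_mask).__name__, type(by_list).__name__, type(by_label).__name__)
# DataFrame DataFrame Series
```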



The idea is that an operation should not return a DataFrame in some cases and a Series in others depending on the specifics of its operands (how many columns they have, how many True values the boolean mask contains). When you call df.set_index('col'), it is easier to know that you will always get a DataFrame back, without worrying about how many columns the original had.

Note that there is also a DataFrame method, .squeeze(), to turn a single-column DataFrame into a Series.
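A short sketch of squeeze applied to the set_index example from the question (the names indexed and as_series are only illustrative). Passing axis=1 is a choice made here, not something the answer requires: it squeezes only the column axis, so a frame that happens to have a single row still comes back as a Series rather than a scalar.

```python
import pandas as pd

df = pd.DataFrame([list('ab'), list('de'), list('fg')], columns=['foo', 'bar'])

# set_index always returns a DataFrame; here it has one column left, 'bar'.
indexed = df.set_index('foo')

# Squeeze only the column axis to get a Series indexed by 'foo'.
as_series = indexed.squeeze(axis=1)

print(type(indexed).__name__, type(as_series).__name__)
# DataFrame Series
```

The resulting Series keeps the index and takes the column label as its name, so it should be equivalent to selecting the column directly with indexed['bar'].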
