Pandas Subset of single columns on dataframe creates data structure

Question

This is really a story about two data frames and strangely different behavior.

I have two CSV files that I read into pandas. Neither file contains a header row; the headers are stored in separate files, like this:

$ ls
A.csv A.header B.csv B.header

I'm using pandas to read them, but first I need to parse the header:

def make_header(flnm):
    # Read the single-line header file and split it into column names.
    with open(flnm) as f:
        return f.read().strip().split(',')

A_header = make_header('A.header')
B_header = make_header('B.header')

Now I can read in the CSVs:

from pandas import read_csv

A = read_csv('A.csv', header=0, names=A_header)
B = read_csv('B.csv', header=0, names=B_header)
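
One detail worth checking in the calls above: when `names` is passed together with `header=0`, pandas treats the first line of the file as a header row and replaces it, so the first data row of a truly headerless file is dropped. The sketch below (Python 3, with made-up in-memory data standing in for `A.csv`) contrasts that with `header=None`. The column names `A_x` and `A_y` are placeholders, and this does not by itself explain the Series versus DataFrame difference described later.

```python
import io
import pandas as pd

# Hypothetical headerless CSV content standing in for A.csv.
raw = "1,2\n3,4\n5,6\n"
names = ["A_x", "A_y"]  # placeholder names, as if parsed from A.header

# header=0: the first line is consumed as a header and replaced by `names`.
with_header0 = pd.read_csv(io.StringIO(raw), header=0, names=names)

# header=None: every line is treated as data.
with_header_none = pd.read_csv(io.StringIO(raw), header=None, names=names)

print(len(with_header0))      # 2 rows
print(len(with_header_none))  # 3 rows
```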

Make sure this worked correctly:

print(type(A))
print(type(B))

Result:

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>

as expected.

Now there is an oddity that I cannot figure out. I want to select one column from each of these data frames. When I do this, one of them returns a Series (as you would expect) and the other returns a DataFrame:

print(type(A.A_x))
print(type(B.B_x))

leads to:

<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>

As far as I can tell, I processed these files the same way from start to finish, but got different results. What could be causing this? Where is the error in my data sanitization or my understanding of pandas?

A couple of things I looked into

The two columns have the same data types:

print(A.A_x.dtype)
print(B.B_x.B_x.dtype)

gives:

int64
int64

(Note that I have to select the column twice on data frame B because of the strange behavior described above.)
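
The need to select twice resembles what happens when a frame's columns form a `MultiIndex`: selecting a top-level label returns a DataFrame of its sub-columns, and a second selection returns the Series. The following sketch (Python 3, using constructed data rather than the files from the question) reproduces that pattern. Whether `B` actually has hierarchical columns is an assumption; inspecting `B.columns` would confirm it or rule it out.

```python
import pandas as pd

# Constructed example: two-level column labels, both under "B_x".
columns = pd.MultiIndex.from_tuples([("B_x", "B_x"), ("B_x", "B_y")])
B = pd.DataFrame([[1, 10], [2, 20]], columns=columns)

first = B["B_x"]          # selects every sub-column under "B_x"
second = B["B_x"]["B_x"]  # second selection reaches a single column

print(type(first).__name__)   # DataFrame
print(type(second).__name__)  # Series
print(B.columns.nlevels)      # 2 for a MultiIndex, 1 for a flat Index
```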

I also checked for duplicate names in my header:

$ cat A.header | sed 's/,/\n/g' | grep A_x
> A_x

and

$ cat B.header | sed 's/,/\n/g' | grep B_x
> B_x

So, each specified name appears exactly once.
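
Because `grep A_x` matches any name that merely contains `A_x`, an exact count may be a more reliable check. Below is a minimal sketch (Python 3) that counts header names exactly and shows that a duplicated label does make single-label selection return a DataFrame. The header list is invented for illustration and is not taken from `B.header`.

```python
from collections import Counter
import pandas as pd

# Invented header with one repeated name, for illustration only.
header = ["B_id", "B_x", "B_y", "B_x"]

# Exact-match counts, unlike a substring grep.
duplicates = [name for name, n in Counter(header).items() if n > 1]
print(duplicates)  # ['B_x']

# A frame built with those labels: selecting "B_x" yields two columns.
df = pd.DataFrame([[0, 1, 2, 3]], columns=header)
print(type(df["B_x"]).__name__)  # DataFrame
print(df.columns.is_unique)      # False
```

On a frame that is already loaded, `df.columns.is_unique` gives the same answer without going back to the header file.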

python pandas

Matthew Drury 04 Sep '14 at 14:30

No one has answered this question yet
