Pandas: selecting a single column from a DataFrame returns a DataFrame instead of a Series

This is really a story about two data frames and strangely different behavior.

I have two CSV files that I read into pandas. Neither file has a header row; the headers are stored in separate files, like this:

$ ls
A.csv A.header B.csv B.header

I'm using pandas to read them, but first I need to parse the header:

def make_header(flnm):
    # Read the one-line header file and split it into a list of column names
    with open(flnm) as f:
        return f.read().strip().split(',')

A_header = make_header('A.header')
B_header = make_header('B.header')

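Now I can read in the CSV files:

from pandas import read_csv
# header=0 treats the first row of each file as a header and replaces it with `names`
A = read_csv('A.csv', header=0, names=A_header)
B = read_csv('B.csv', header=0, names=B_header)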

Make sure this worked correctly:

print(type(A))
print(type(B))

Result:

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>

as was expected.

Now there is an oddity that I cannot figure out. I want to select one column from each of these dataframes. When I do this, one of the dataframes returns a Series object (as you would expect) and the other returns a single-column DataFrame:

print(type(A.A_x))
print(type(B.B_x))

leads to:

<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>

As far as I can tell, I processed these files the same way from start to finish, but got different results. What could be causing this? Where is the error in my data sanitization or my understanding of pandas?

A couple of things I looked into

The two columns have the same data types:

print(A.A_x.dtype)
print(B.B_x.B_x.dtype)

gives:

int64
int64

(Of course, I need to select the column twice from dataframe B because of the strange behavior I observe.)
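For comparison, here is a minimal sketch of two column layouts in which selecting a single label is known to return a DataFrame rather than a Series: duplicate column labels, and MultiIndex (multi-level) columns. The frames and the `describe_columns` helper below are hypothetical and built by hand, not taken from A.csv or B.csv, and nothing here establishes that either layout is what B actually has.

```python
import pandas as pd

def describe_columns(df):
    """Summarize properties of df.columns that affect what df[label] returns."""
    cols = df.columns
    return {
        "is_unique": cols.is_unique,
        "is_multiindex": isinstance(cols, pd.MultiIndex),
        "nlevels": cols.nlevels,
    }

# Hypothetical frames for illustration only
plain = pd.DataFrame([[1, 2]], columns=["A_x", "A_y"])
dupes = pd.DataFrame([[1, 2, 3]], columns=["B_x", "B_x", "B_y"])
multi = pd.DataFrame(
    [[1, 2]],
    columns=pd.MultiIndex.from_tuples([("B_x", "B_x"), ("B_y", "B_y")]),
)

examples = [("plain", plain, "A_x"), ("dupes", dupes, "B_x"), ("multi", multi, "B_x")]
for name, frame, label in examples:
    # Print the type returned by single-label selection next to the column summary
    print(name, type(frame[label]).__name__, describe_columns(frame))
```

Running `describe_columns(A)` and `describe_columns(B)` on the real frames, and printing `B.columns` directly, would show whether the two column indexes differ in any of these respects.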

I also checked for duplicate names in my header:

$ cat A.header | sed 's/,/\n/g' | grep A_x
> A_x

and

$ cat B.header | sed 's/,/\n/g' | grep B_x
> B_x

So, each specified name appears exactly once.
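The same check can be sketched in Python with exact-match counting. Unlike `grep`, which also matches substrings (a name such as `A_xy` would match `A_x`), this compares whole names after stripping surrounding whitespace. The `duplicate_names` helper and the sample header strings are hypothetical, used only for illustration.

```python
from collections import Counter

def duplicate_names(header_line):
    """Return the names that occur more than once in a comma-separated header line."""
    names = [name.strip() for name in header_line.split(",")]
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)

# Hypothetical header contents, not the real A.header / B.header
print(duplicate_names("B_id,B_x,B_y"))       # []
print(duplicate_names("B_id,B_x,B_y, B_x"))  # ['B_x']
```

To apply it to the real files, pass in the text of each header file, for example `duplicate_names(open('B.header').read())`.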
