Pandas df.itertuples renaming data columns while printing

I know that pandas' itertuples() normally returns the values of each row together with the column names, like this:

import numpy as np
import pandas as pd

ab = pd.DataFrame(np.random.random([3, 3]), columns=['hi', 'low', 'med'])
for i in ab.itertuples():
    print(i)

and the output is as follows:

Pandas(Index=0, hi=0.05421443, low=0.2456833, med=0.491185)
Pandas(Index=1, hi=0.28670429, low=0.5828551, med=0.279305)
Pandas(Index=2, hi=0.53869406, low=0.3427290, med=0.750075)

However, I have no idea why it does not show the column names as I would expect for my other data set, shown below:

            us qqq equity  us spy equity
date                                    
2017-06-19            0.0            1.0
2017-06-20            0.0           -1.0
2017-06-21            0.0            0.0
2017-06-22            0.0            0.0
2017-06-23            1.0            0.0
2017-06-26            0.0            0.0
2017-06-27           -1.0            0.0
2017-06-28            1.0            0.0
2017-06-29           -1.0            0.0
2017-06-30            0.0            0.0

The above is a pandas DataFrame with timestamps as the index, float64 values, and the list of strings ['us qqq equity', 'us spy equity'] as the columns.

When I do this:

for row in data.itertuples():
    print(row)

It shows the columns as _1 and _2, like this:

Pandas(Index=Timestamp('2017-06-19 00:00:00'), _1=0.0, _2=1.0)
Pandas(Index=Timestamp('2017-06-20 00:00:00'), _1=0.0, _2=-1.0)
Pandas(Index=Timestamp('2017-06-21 00:00:00'), _1=0.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-22 00:00:00'), _1=0.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-23 00:00:00'), _1=1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-26 00:00:00'), _1=0.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-27 00:00:00'), _1=-1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-28 00:00:00'), _1=1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-29 00:00:00'), _1=-1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-30 00:00:00'), _1=0.0, _2=0.0)

Does anyone know what I did wrong? Is this related to how I created the original DataFrame? (Also, as a side question: I learned from the community that itertuples() should produce tuples, but as shown above the rows print as Pandas objects rather than plain tuples, which is also what I saw when I checked with type().)

Thanks for your patience as I am still trying to get my head around the DataFrame application.

2 answers


This comes from how column names containing spaces are handled. If you replace the column names with ones that have no spaces, it works:

df.columns = ['us_qqq_equity', 'us_spy_equity'] 
# df.columns = df.columns.str.replace(r'\s+', '_', regex=True)  # Courtesy @MaxU
for r in df.head().itertuples():
    print(r)

# Pandas(Index='2017-06-19', us_qqq_equity=0.0, us_spy_equity=1.0)
# Pandas(Index='2017-06-20', us_qqq_equity=0.0, us_spy_equity=-1.0)
# ...

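Column names with spaces are not valid Python identifiers, so they cannot be used as field names in a named tuple. itertuples() therefore replaces them with positional names (_1, _2, ...) when it builds each row, not merely when the row is printed.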
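The renaming can be reproduced with the standard library alone. As far as I can tell, itertuples() builds its rows with collections.namedtuple and rename=True (that is an assumption about pandas internals, not something the question states). The sketch below shows what that option does to field names containing spaces, and that the resulting row is still a tuple subclass, which also covers the side question about the return type.

```python
from collections import namedtuple

# Field names as itertuples() would see them: the index plus the two columns.
fields = ["Index", "us qqq equity", "us spy equity"]

# rename=True replaces invalid identifiers with positional names (_1, _2, ...).
Row = namedtuple("Pandas", fields, rename=True)
row = Row("2017-06-19", 0.0, 1.0)

print(Row._fields)             # ('Index', '_1', '_2')
print(row._1, row[2])          # 0.0 1.0
print(isinstance(row, tuple))  # True
```

So the values are intact and can still be read by position; only the attribute names change.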

An interesting observation: of DataFrame.iterrows(), DataFrame.iteritems(), and DataFrame.itertuples(), only the last one renames columns with spaces:

In [140]: df = df.head(3)

In [141]: list(df.iterrows())
Out[141]:
[(Timestamp('2017-06-19 00:00:00'), us qqq equity    0.0
  us spy equity    1.0
  Name: 2017-06-19 00:00:00, dtype: float64),
 (Timestamp('2017-06-20 00:00:00'), us qqq equity    0.0
  us spy equity   -1.0
  Name: 2017-06-20 00:00:00, dtype: float64),
 (Timestamp('2017-06-21 00:00:00'), us qqq equity    0.0
  us spy equity    0.0
  Name: 2017-06-21 00:00:00, dtype: float64)]

In [142]: list(df.iteritems())
Out[142]:
[('us qqq equity', date
  2017-06-19    0.0
  2017-06-20    0.0
  2017-06-21    0.0
  Name: us qqq equity, dtype: float64), ('us spy equity', date
  2017-06-19    1.0
  2017-06-20   -1.0
  2017-06-21    0.0
  Name: us spy equity, dtype: float64)]

In [143]: list(df.itertuples())
Out[143]:
[Pandas(Index=Timestamp('2017-06-19 00:00:00'), _1=0.0, _2=1.0),
 Pandas(Index=Timestamp('2017-06-20 00:00:00'), _1=0.0, _2=-1.0),
 Pandas(Index=Timestamp('2017-06-21 00:00:00'), _1=0.0, _2=0.0)]
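If the original column names have to be kept while iterating, one option is to ask itertuples() for plain tuples with name=None and pair the values with df.columns yourself. This is only a sketch: the small frame below is a hypothetical stand-in for the data in the question, and the records list is just there for illustration.

```python
import pandas as pd

# Hypothetical two-row stand-in for the frame in the question.
df = pd.DataFrame(
    {"us qqq equity": [0.0, 0.0], "us spy equity": [1.0, -1.0]},
    index=pd.to_datetime(["2017-06-19", "2017-06-20"]),
)
df.index.name = "date"

records = []
# name=None yields plain tuples: (index, value_1, value_2, ...).
for row in df.itertuples(name=None):
    date = row[0]
    values = dict(zip(df.columns, row[1:]))  # keys keep their spaces
    records.append((date, values))

for date, values in records:
    print(date.date(), values["us spy equity"])
```

Plain tuples skip the named-tuple construction entirely, so no renaming takes place and the lookup by the original column name works.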
