Pandas df.itertuples renaming data columns while printing
I know that pandas' itertuples() normally returns the values of each row along with the column names, like this:
import numpy as np
import pandas as pd

ab = pd.DataFrame(np.random.random([3, 3]), columns=['hi', 'low', 'med'])
for i in ab.itertuples():
    print(i)
and the output is as follows:
Pandas(Index=0, hi=0.05421443, low=0.2456833, med=0.491185)
Pandas(Index=1, hi=0.28670429, low=0.5828551, med=0.279305)
Pandas(Index=2, hi=0.53869406, low=0.3427290, med=0.750075)
However, I have no idea why it does not show the column names as I would expect for my other DataFrame, shown below:
            us qqq equity  us spy equity
date
2017-06-19            0.0            1.0
2017-06-20            0.0           -1.0
2017-06-21            0.0            0.0
2017-06-22            0.0            0.0
2017-06-23            1.0            0.0
2017-06-26            0.0            0.0
2017-06-27           -1.0            0.0
2017-06-28            1.0            0.0
2017-06-29           -1.0            0.0
2017-06-30            0.0            0.0
The above is a pandas DataFrame with timestamps as the index, float64 values, and the list of strings ['us qqq equity', 'us spy equity'] as columns.
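For anyone who wants to reproduce this, the frame described above can be rebuilt roughly as follows. This is only a sketch: the variable name data matches the loop further down, but the construction itself is an assumption (the original frame was created elsewhere), and only the first three rows are included.

```python
import pandas as pd

# Hypothetical reconstruction of the frame shown above (first three rows only).
data = pd.DataFrame(
    {
        "us qqq equity": [0.0, 0.0, 0.0],
        "us spy equity": [1.0, -1.0, 0.0],
    },
    index=pd.DatetimeIndex(
        ["2017-06-19", "2017-06-20", "2017-06-21"], name="date"
    ),
)

print(data.dtypes)          # dtype of each column
print(list(data.columns))   # the column labels, spaces included
```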
When I do this:
for row in data.itertuples():
    print(row)
It shows columns as _1 and _2 like this:
Pandas(Index=Timestamp('2017-06-19 00:00:00'), _1=0.0, _2=1.0)
Pandas(Index=Timestamp('2017-06-20 00:00:00'), _1=0.0, _2=-1.0)
Pandas(Index=Timestamp('2017-06-21 00:00:00'), _1=0.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-22 00:00:00'), _1=0.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-23 00:00:00'), _1=1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-26 00:00:00'), _1=0.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-27 00:00:00'), _1=-1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-28 00:00:00'), _1=1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-29 00:00:00'), _1=-1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-30 00:00:00'), _1=0.0, _2=0.0)
Does anyone know what I did wrong? Does it have something to do with how I created the original DataFrame? (Also, as a side question: I learned from the community that itertuples() should yield tuples, but it seems, as shown above, that each returned row is a Pandas object, which is what I see when I check it with type().)
Thanks for your patience as I am still trying to get my head around the DataFrame application.
This seems to be a problem when handling column names with spaces in them. If you replace the column names with others without spaces, this works:
df.columns = ['us_qqq_equity', 'us_spy_equity']
# df.columns = df.columns.str.replace(r'\s+', '_', regex=True) # Courtesy @MaxU
for r in df.head().itertuples():
    print(r)
# Pandas(Index='2017-06-19', us_qqq_equity=0.0, us_spy_equity=1.0)
# Pandas(Index='2017-06-20', us_qqq_equity=0.0, us_spy_equity=-1.0)
# ...
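Column names containing spaces are not valid Python identifiers, so they cannot be used as namedtuple field names. itertuples() therefore replaces them with positional names (_1, _2, ...) when it builds each row, not merely when the row is printed.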
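The renaming can be reproduced with collections.namedtuple directly. The sketch below assumes that itertuples() builds its rows with rename=True, which matches the documented behavior; the small df here is a made-up one-row stand-in for the frame in the question. It also touches on the side question: each row is a subclass of tuple, and passing name=None yields plain tuples.

```python
from collections import namedtuple

import pandas as pd

# rename=True replaces invalid field names with positional ones (_1, _2, ...).
Row = namedtuple("Pandas", ["Index", "us qqq equity", "us spy equity"], rename=True)
print(Row._fields)

# Made-up one-row frame with the same column labels as in the question.
df = pd.DataFrame(
    {"us qqq equity": [0.0], "us spy equity": [1.0]},
    index=pd.to_datetime(["2017-06-19"]),
)
row = next(df.itertuples())

print(isinstance(row, tuple))  # the row is a tuple subclass
print(row._fields)             # field names after renaming
print(row._1, row[2])          # access by generated name or by position

# name=None makes itertuples yield plain tuples instead of namedtuples.
plain = next(df.itertuples(name=None))
print(type(plain))
```

So positional access (row[1], row[2]) works regardless of what the columns are called, while attribute access only works with the generated names.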
An interesting observation: of DataFrame.iterrows(), DataFrame.iteritems() (removed in pandas 2.0 in favor of DataFrame.items()) and DataFrame.itertuples(), only the last renames columns with spaces:
In [140]: df = df.head(3)

In [141]: list(df.iterrows())
Out[141]:
[(Timestamp('2017-06-19 00:00:00'), us qqq equity    0.0
  us spy equity    1.0
  Name: 2017-06-19 00:00:00, dtype: float64),
 (Timestamp('2017-06-20 00:00:00'), us qqq equity    0.0
  us spy equity   -1.0
  Name: 2017-06-20 00:00:00, dtype: float64),
 (Timestamp('2017-06-21 00:00:00'), us qqq equity    0.0
  us spy equity    0.0
  Name: 2017-06-21 00:00:00, dtype: float64)]

In [142]: list(df.iteritems())
Out[142]:
[('us qqq equity', date
  2017-06-19    0.0
  2017-06-20    0.0
  2017-06-21    0.0
  Name: us qqq equity, dtype: float64), ('us spy equity', date
  2017-06-19    1.0
  2017-06-20   -1.0
  2017-06-21    0.0
  Name: us spy equity, dtype: float64)]

In [143]: list(df.itertuples())
Out[143]:
[Pandas(Index=Timestamp('2017-06-19 00:00:00'), _1=0.0, _2=1.0),
 Pandas(Index=Timestamp('2017-06-20 00:00:00'), _1=0.0, _2=-1.0),
 Pandas(Index=Timestamp('2017-06-21 00:00:00'), _1=0.0, _2=0.0)]
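If the original labels need to survive the iteration without renaming the columns, one possible approach is to pair plain tuples with df.columns, or to convert the rows to dicts. This is only a sketch; the two-row df below is a made-up stand-in for the frame in the question.

```python
import pandas as pd

# Made-up two-row frame with the same column labels as in the question.
df = pd.DataFrame(
    {"us qqq equity": [0.0, 0.0], "us spy equity": [1.0, -1.0]},
    index=pd.to_datetime(["2017-06-19", "2017-06-20"]),
)

# Option 1: plain tuples, paired with the original column labels.
for idx, *values in df.itertuples(name=None):
    record = dict(zip(df.columns, values))
    print(idx.date(), record)

# Option 2: one dict per row, keyed by the original labels (the index is dropped).
records = df.to_dict("records")
print(records[0]["us spy equity"])
```

Option 1 keeps the index value available for each row; option 2 is shorter but builds the whole list in memory at once.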