Consistent column creation and ordering in a Pandas DataFrame
I'm looking for an elegant, Pythonic way to make the columns of a Pandas DataFrame consistent. That is:
- Make sure every column in the master list is present, and if one is missing, add it as an empty placeholder column.
- Make sure the columns are in the same order as the master list.
I have the following example that works, but is there a built-in Pandas method to achieve the same goal?
import numpy as np
import pandas as pd

df1 = pd.DataFrame(data=[{'a': 1, 'b': 32, 'c': 32}])
print(df1)

   a   b   c
0  1  32  32

column_master_list = ['b', 'c', 'e', 'd', 'a']

def get_dataframe_with_consistent_header(df, headers):
    # Add an empty (NaN) placeholder for every missing column.
    for col in headers:
        if col not in df.columns:
            df[col] = np.nan
    # Select the columns in the order given by the master list.
    return df[headers]

print(get_dataframe_with_consistent_header(df1, column_master_list))

    b   c   e   d  a
0  32  32 NaN NaN  1
1 answer
You can use the reindex_axis method. Pass in the list of column names and specify 'columns' as the axis. NaN is the default fill value for missing entries:
>>> df1.reindex_axis(column_master_list, 'columns')
    b   c   e   d  a
0  32  32 NaN NaN  1
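
Note that reindex_axis was deprecated in pandas 0.21 and removed in pandas 1.0, so on current versions the call above will fail. A minimal sketch of the equivalent, assuming a recent pandas release, uses reindex with the columns keyword. The variable names result and filled are only illustrative, and the fill_value of 0 is an arbitrary choice to show how the default NaN can be overridden:

```python
import pandas as pd

df1 = pd.DataFrame(data=[{'a': 1, 'b': 32, 'c': 32}])
column_master_list = ['b', 'c', 'e', 'd', 'a']

# reindex returns a new DataFrame with the columns in master-list order;
# missing columns are created and filled with NaN. df1 is left unchanged.
result = df1.reindex(columns=column_master_list)
print(result)

# fill_value replaces the default NaN in the columns that had to be created.
filled = df1.reindex(columns=column_master_list, fill_value=0)
print(filled)
```

Unlike the helper function in the question, which adds the missing columns to the DataFrame passed in, this approach does not modify the original DataFrame.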