How to change XLSX column formatting using Python

I have hundreds of XLSX files with columns containing long numeric account numbers. I need to convert all of these files to CSV automatically. This is trivial with tools like ssconvert. However, due to a bug (or misfeature) in Excel and LibreOffice, long numeric fields are displayed in scientific notation, and this formatted number (not the underlying data) is what gets preserved when exporting to CSV.

This means that any automatic conversion to CSV will truncate the account numbers: the value 1240800388917 will be written to the CSV as 1.2408E+12 or 1240800000000, resulting in data corruption.
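To make the loss concrete, here is a small sketch that imitates the rounding in plain Python. It only mimics a four-decimal scientific display format; it is not how Excel or LibreOffice format cells internally.

```python
account = 1240800388917

# Imitate a scientific-notation display with four decimal places.
displayed = format(account, '.4E')

# Reading the displayed text back gives a different number.
recovered = int(float(displayed))

print(displayed)            # 1.2408E+12
print(recovered)            # 1240800000000
print(account - recovered)  # 388917 (the digits that were lost)
```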

This can be easily fixed by opening each Excel file manually and setting these columns to text format. However, doing this for hundreds of files is tedious, especially since many of them have weird macros and formatting that make LibreOffice take a few minutes to open each one (another reason I would like to convert them all to CSV first).

What's the simplest way to use Python to automatically open each file and change the formatting of an entire column to "text"? I see many Python examples where XLS / XLSX files are read, and in some cases written, but I can find few tutorials on manipulating the default formatting.



1 answer


It took some trial and error and some digging around in the code, but the solution turned out to be trivial.



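from openpyxl import load_workbook

# Zero-based index of the account-number column (0 = column A); adjust as needed.
col_index = 0

wb = load_workbook('myfile.xlsx')
ws = wb.active
for row in ws.rows:
    # '@' is the built-in "Text" number format.
    row[col_index].number_format = '@'
wb.save('myfile-fixed.xlsx')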
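Since the question is about hundreds of files, the same idea can be wrapped in a loop. The following is only a sketch: the function names `set_column_text_format` and `fix_all`, the directory arguments, and the default `col_index=0` are made up for illustration, and it only touches the active sheet of each workbook.

```python
from pathlib import Path

from openpyxl import load_workbook


def set_column_text_format(src, dst, col_index=0):
    """Set one column of the active sheet to text format and save a copy.

    col_index is zero-based (0 = column A) and is an assumed placeholder.
    """
    wb = load_workbook(str(src))
    ws = wb.active
    for row in ws.iter_rows():
        if col_index < len(row):
            # '@' is the built-in "Text" number format.
            row[col_index].number_format = '@'
    wb.save(str(dst))


def fix_all(src_dir, dst_dir, col_index=0):
    """Write a fixed copy of every .xlsx file in src_dir into dst_dir."""
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(src_dir).glob('*.xlsx')):
        out = dst_dir / path.name
        set_column_text_format(path, out, col_index)
        written.append(out)
    return written
```

Something like `fix_all('originals', 'fixed', col_index=2)` would then process every workbook in `originals` and leave the source files untouched, so the fixed copies can be handed to the CSV converter afterwards.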

      
