Modifying the XSSFWorkbook stylesheet to remove duplicate CellStyleXfs

After some xlsx files would not open on an iPad, I found that the problem was the size of the styles.xml file inside the xlsx archive. I was able to view a file after manually deleting the duplicate records and letting Excel repair it, but to fix the other files I would rather have a programmatic solution using POI. Unfortunately, I have been having problems saving the workbook after changing the style table.

I tried to adapt the approach of HSSFOptimiser.optimiseCellStyles(HSSFWorkbook workbook), but the XSSF internals are different and the offending styles are all of one type. After inspecting a number of XSSFWorkbook internals, I found that

XSSFWorkbook wb; // with proper initialization
wb.getStylesSource().getCTStylesheet().getCellStyleXfs();

returned about 40,000 records, the bulk of which were

<main:xf numFmtId="0" fontId="0" fillId="0" borderId="0"/>
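
One way to confirm that figure independently of POI is to tally identical records straight from styles.xml. This is a minimal sketch using only the JDK's DOM parser. The class name XfTally is hypothetical, and comparing records by attributes only (ignoring child elements such as alignment) is an assumption that may not hold for every file.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of POI.
final class XfTally {

    // Maps each distinct attribute set found on the xf records under the
    // first cellStyleXfs element to the number of times it occurs.
    static Map<String, Integer> tally(String stylesXml) {
        Map<String, Integer> counts = new TreeMap<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // match on local names, whatever the prefix
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(stylesXml)));
            NodeList parents = doc.getElementsByTagNameNS("*", "cellStyleXfs");
            if (parents.getLength() == 0) {
                return counts;
            }
            NodeList xfs = ((Element) parents.item(0)).getElementsByTagNameNS("*", "xf");
            for (int i = 0; i < xfs.getLength(); i++) {
                NamedNodeMap attrs = xfs.item(i).getAttributes();
                // Sort attributes by name so attribute order does not matter.
                Map<String, String> signature = new TreeMap<>();
                for (int j = 0; j < attrs.getLength(); j++) {
                    Node attr = attrs.item(j);
                    signature.put(attr.getNodeName(), attr.getNodeValue());
                }
                counts.merge(signature.toString(), 1, Integer::sum);
            }
        } catch (Exception e) {
            throw new IllegalStateException("could not parse styles XML", e);
        }
        return counts;
    }
}
```

Passing the text of styles.xml to XfTally.tally returns one entry per distinct attribute set, so a single entry with a count in the tens of thousands would match what is described above.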

      

So, I tried to determine the location of the duplicates in the workbook and remove them by calling

XSSFWorkbook wb; // with proper initialization
wb.getStylesSource().getCTStylesheet().getCellStyleXfs().removeXf(i);

but after deleting the styles, saving the workbook fails inside StylesTable.java with org.apache.xmlbeans.impl.values.XmlValueDisconnectedException. I tried several different approaches and got similar errors.

Looking at the code in StylesTable.java, it seems that the class keeps its own copy of the style records, separate from the CTStylesheet, and that the size mismatch after my change is causing the problem. I suppose I am misunderstanding how to properly track these entries, especially because

XSSFCell cell; // with proper initialization
cell.getCellStyle().getStyleXf().getXfId();

always returns a number between 1 and 5, despite the thousands of entries in the XML file.
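
For reference, in SpreadsheetML the xfId attribute on cellXfs and cellStyles records is an index into cellStyleXfs, so removing entries shifts every later index. Below is a minimal sketch of the bookkeeping this implies, in plain Java with no POI calls: collapse identical records and build an old-to-new index table for rewriting the references. The class XfDedup, its method names, and the idea of comparing records by their serialized XML text are all assumptions for illustration, not POI API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of POI: collapses identical xf records
// and reports where each original index ends up.
final class XfDedup {

    // Appends the surviving records to 'unique' in first-seen order and
    // returns a table where remap[oldIndex] == newIndex.
    static int[] dedupe(List<String> xfs, List<String> unique) {
        Map<String, Integer> firstSeen = new HashMap<>();
        int[] remap = new int[xfs.size()];
        for (int i = 0; i < xfs.size(); i++) {
            String xf = xfs.get(i);
            Integer newIndex = firstSeen.get(xf);
            if (newIndex == null) {
                newIndex = unique.size();
                firstSeen.put(xf, newIndex);
                unique.add(xf);
            }
            remap[i] = newIndex;
        }
        return remap;
    }

    // Rewrites an array of xfId references in place using the remap table.
    static void remapIds(int[] xfIds, int[] remap) {
        for (int i = 0; i < xfIds.length; i++) {
            xfIds[i] = remap[xfIds[i]];
        }
    }
}
```

With the records [A, B, A, C, A], dedupe keeps [A, B, C] and returns the table [0, 1, 0, 2, 0], so a reference to old index 3 becomes 2. Whether StylesTable can be kept consistent with such a table is a separate question that this sketch does not answer.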

Is there a more standard way to clean up this section of the workbook that I'm missing?

Or am I badly misunderstanding the workbook internals or the POI API?

Any help would be greatly appreciated.
