Error: org.xml.sax.SAXParseException while parsing an XML file with wikixmlj
I am parsing a Wikipedia XML dump using wikixmlj and getting the following error:
org.xml.sax.SAXParseException; lineNumber: 64243259; columnNumber: 371; JAXP00010004: The accumulated size of entities is "50,000,001" that exceeded the "50,000,000" limit set by "FEATURE_SECURE_PROCESSING".
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1239)
at edu.jhu.nlp.wikipedia.WikiXMLSAXParser.parse(WikiXMLSAXParser.java:58)
at edu.virginia.cs.wikirarchy.ParseWikiPage.run(ParseWikiPage.java:36)
at java.lang.Thread.run(Thread.java:745)
The key part of the error is this:
The accumulated size of entities is "50,000,001" that exceeded the "50,000,000" limit set by "FEATURE_SECURE_PROCESSING".
I cannot find a solution to this problem.
Adding three more arguments when running the java command solved my problem.
-DentityExpansionLimit=2147480000 -DtotalEntitySizeLimit=2147480000 -Djdk.xml.totalEntitySizeLimit=2147480000
So right now I am running my code with the following command.
nohup java -DentityExpansionLimit=2147480000 -DtotalEntitySizeLimit=2147480000 -Djdk.xml.totalEntitySizeLimit=2147480000 -Xmx16g -cp "lib/*.jar" -jar dist/WikiRarchy.jar 32 &
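The issue arises because, by default, secure processing (FEATURE_SECURE_PROCESSING) caps the accumulated size of entities at 50,000,000, which a full Wikipedia dump exceeds. The totalEntitySizeLimit properties raise that cap, and entityExpansionLimit raises the limit on the number of entity expansions.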
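If changing the launch command is inconvenient, the same properties can probably be set from code before any parser is created. The following is only a minimal sketch, not part of wikixmlj: the class name EntityLimits, the method raiseEntityLimits, and the tiny inline XML document are made up for illustration, and it assumes the parser reads these system properties when it is constructed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; the name is illustrative only.
public class EntityLimits {
    // Same value as the -D flags used on the command line above.
    static final String LIMIT = "2147480000";

    // Mirrors the three -D flags as system properties.
    // Assumption: this runs before the first XML parser is created.
    static void raiseEntityLimits() {
        System.setProperty("entityExpansionLimit", LIMIT);
        System.setProperty("totalEntitySizeLimit", LIMIT);
        System.setProperty("jdk.xml.totalEntitySizeLimit", LIMIT);
    }

    public static void main(String[] args) throws Exception {
        raiseEntityLimits();

        // Stand-in for the real dump: a tiny document parsed with the JDK's SAX parser.
        byte[] xml = "<page><title>Test</title></page>".getBytes(StandardCharsets.UTF_8);
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml), new DefaultHandler());

        System.out.println("jdk.xml.totalEntitySizeLimit = "
                + System.getProperty("jdk.xml.totalEntitySizeLimit"));
    }
}
```

In a real program the call to raiseEntityLimits() would go at the very start of main, before wikixmlj creates its parser. The -D flags remain the simpler option when the launch command can be edited.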