Error: xml.sax.SAXParseException while parsing xml file with wikixmlj

I am parsing a Wikipedia XML dump using wikixmlj and getting the following error.

org.xml.sax.SAXParseException; lineNumber: 64243259; columnNumber: 371; JAXP00010004: The accumulated size of entities is "50,000,001" that exceeded the "50,000,000" limit set by "FEATURE_SECURE_PROCESSING".
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1239)
        at edu.jhu.nlp.wikipedia.WikiXMLSAXParser.parse(WikiXMLSAXParser.java:58)
        at edu.virginia.cs.wikirarchy.ParseWikiPage.run(ParseWikiPage.java:36)
        at java.lang.Thread.run(Thread.java:745)

      

The key part of the error is this:

The accumulated size of entities is "50,000,001" that exceeded the "50,000,000" limit set by "FEATURE_SECURE_PROCESSING".

I cannot find a solution to this problem.


1 answer


Adding three system-property arguments to the java command solved my problem:

-DentityExpansionLimit=2147480000 -DtotalEntitySizeLimit=2147480000 -Djdk.xml.totalEntitySizeLimit=2147480000

So now I am running my code with the following command:



nohup java -DentityExpansionLimit=2147480000 -DtotalEntitySizeLimit=2147480000 -Djdk.xml.totalEntitySizeLimit=2147480000 -Xmx16g -cp "lib/*.jar" -jar dist/WikiRarchy.jar 32 &

The issue was that, by default, FEATURE_SECURE_PROCESSING caps the accumulated size of parsed XML entities at 50,000,000 characters; these properties raise the total-entity-size and entity-expansion limits so the large dump can be parsed.
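If you cannot change the launch command, a minimal sketch of an alternative is to set the same JAXP limit properties programmatically before any SAX parser is created. This assumes wikixmlj exposes no parser configuration of its own (the stack trace suggests it builds the parser internally), so system properties are the only hook; the `RaiseEntityLimits` class name and the commented parse call are illustrative, not part of the library.

```java
// Sketch: raise the JAXP secure-processing entity limits from code.
// These must be set BEFORE the first SAXParser is instantiated, because
// the JDK reads them when the parser is configured.
public class RaiseEntityLimits {
    public static void main(String[] args) {
        // Legacy JAXP property names (pre-JDK-prefix era).
        System.setProperty("entityExpansionLimit", "2147480000");
        System.setProperty("totalEntitySizeLimit", "2147480000");
        // Newer, jdk.xml-prefixed name recognized by recent JDKs.
        System.setProperty("jdk.xml.totalEntitySizeLimit", "2147480000");

        // Only after this point should parsing start, e.g.
        // WikiXMLSAXParser.parseWikipediaDump(...);  // hypothetical call site

        System.out.println(System.getProperty("jdk.xml.totalEntitySizeLimit"));
    }
}
```

Setting the properties on the command line (as above) is equivalent and safer, since it guarantees they are in place before any class loads.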


