Commit

Fixed error when importing large XML dumps.
ERROR: The accumulated size of entities is "50,000,001" that exceeded the "50,000,000" limit set by "FEATURE_SECURE_PROCESSING"

Fixes several parsing issues raised over the previous 8 years. 

[1] https://www.mediawiki.org/wiki/Manual_talk:MWDumper#Exception_in_thread_.22main.22_java.lang.ArrayIndexOutOfBoundsException:_2048

[2] dbpedia/extraction-framework#487 (comment)
james-gould authored Mar 24, 2017
1 parent decb792 commit e84d5a8
Showing 1 changed file with 8 additions and 2 deletions.
src/org/mediawiki/importer/XmlDumpReader.java (10 changes: 8 additions, 2 deletions)
@@ -85,10 +85,16 @@ public XmlDumpReader(InputStream inputStream, DumpWriter writer) {
      */
     public void readDump() throws IOException {
         try {
+            System.setProperty("jdk.xml.totalEntitySizeLimit", String.valueOf(Integer.MAX_VALUE));
+
             SAXParserFactory factory = SAXParserFactory.newInstance();
             SAXParser parser = factory.newSAXParser();
-
-            parser.parse(input, this);
+            Reader reader = new InputStreamReader(input,"UTF-8");
+            InputSource is = new InputSource(reader);
+            is.setEncoding("UTF-8");
+
+
+            parser.parse(is, this);
         } catch (ParserConfigurationException e) {
             throw (IOException)new IOException(e.getMessage()).initCause(e);
         } catch (SAXException e) {
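For readers who want to try the change in isolation, the following is a minimal, self-contained sketch of the same two steps the diff introduces: raising `jdk.xml.totalEntitySizeLimit` before the parser is created, and handing the parser an `InputSource` backed by an explicit UTF-8 `Reader`. The class `DumpParseSketch`, the handler `PageCounter`, and the method `countPages` are hypothetical names made up for illustration; they are not part of MWDumper. Note that `System.setProperty` affects the whole JVM, so this is an illustration of the approach rather than a recommendation for every deployment.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for XmlDumpReader; not part of MWDumper.
final class DumpParseSketch {

    // Hypothetical handler that only counts <page> elements.
    static final class PageCounter extends DefaultHandler {
        int pages = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("page".equals(qName)) {
                pages++;
            }
        }
    }

    static int countPages(InputStream input) throws IOException {
        try {
            // Same JVM-wide setting as in the commit: raise the accumulated
            // entity size limit before the parser is created.
            System.setProperty("jdk.xml.totalEntitySizeLimit", String.valueOf(Integer.MAX_VALUE));

            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser parser = factory.newSAXParser();

            // Decode the bytes as UTF-8 ourselves instead of letting the
            // parser detect the encoding from the raw stream.
            Reader reader = new InputStreamReader(input, StandardCharsets.UTF_8);
            InputSource source = new InputSource(reader);
            source.setEncoding("UTF-8");

            PageCounter counter = new PageCounter();
            parser.parse(source, counter);
            return counter.pages;
        } catch (ParserConfigurationException | SAXException e) {
            // Wrap parser failures in IOException, keeping the cause.
            throw new IOException(e.getMessage(), e);
        }
    }

    public static void main(String[] args) throws IOException {
        String xml = "<mediawiki><page><title>A</title></page>"
                + "<page><title>B</title></page></mediawiki>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countPages(in)); // prints 2
    }
}
```

The sketch uses the two-argument `IOException(String, Throwable)` constructor in place of the `initCause` cast seen in the diff; both preserve the original exception as the cause.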
