INFO [main] (Log4jLogger.java:28) - Pagelinks 1527700000
INFO [main] (Log4jLogger.java:28) - Processing the text table
Exception in thread "xml2sql" java.lang.NullPointerException
    at de.tudarmstadt.ukp.wikipedia.wikimachine.dump.sql.SQLEscape.escape(SQLEscape.java:37)
    at de.tudarmstadt.ukp.wikipedia.timemachine.dump.xml.TextWriter.writeRevision(TextWriter.java:55)
    at de.tudarmstadt.ukp.wikipedia.mwdumper.importer.PageFilter.writeRevision(PageFilter.java:67)
    at de.tudarmstadt.ukp.wikipedia.wikimachine.dump.xml.AbstractXmlDumpReader.closeRevision(AbstractXmlDumpReader.java:548)
    at de.tudarmstadt.ukp.wikipedia.wikimachine.dump.xml.AbstractXmlDumpReader.endElement(AbstractXmlDumpReader.java:338)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:610)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1718)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2883)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:605)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:534)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:888)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1216)
    at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
    at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:324)
    at java.xml/javax.xml.parsers.SAXParser.parse(SAXParser.java:197)
    at de.tudarmstadt.ukp.wikipedia.wikimachine.dump.xml.AbstractXmlDumpReader.readDump(AbstractXmlDumpReader.java:205)
    at de.tudarmstadt.ukp.wikipedia.timemachine.dump.xml.XMLDumpTableInputStreamThread.run(XMLDumpTableInputStreamThread.java:90)
INFO [main] (Log4jLogger.java:28) - Write end dead
The command is:
java -Djdk.xml.totalEntitySizeLimit=2147483647 -Xmx512m -cp ".:./log4j.properties:./*" de.tudarmstadt.ukp.wikipedia.timemachine.domain.JWPLTimeMachine config.xml
The config.xml file is:
This a configuration for the JWPL TimeMachine
english
Contents
Disambiguation_pages
20060101000000
20060102000000
1
/nethome/felixd/wiki_timemachine/wiki_raw/enwiki-latest-pages-meta-history1.xml-p1p844.bz2
/nethome/felixd/wiki_timemachine/wiki_raw/enwiki-latest-categorylinks.sql.gz
/nethome/felixd/wiki_timemachine/wiki_raw/enwiki-latest-pagelinks.sql.gz
/nethome/felixd/wiki_timemachine/wiki_formatted
false
The command seems to write the first wiki entry and then fail. In my output directory, I have a PageMapLine.txt with the following entry:
11286 Fruitarianism 11286 NULL NULL
And I have a Page.txt file:
11286 11286 Fruitarianism [[Image:Fruit.jpg|frame|right|A selection...............
What is going on? Should I edit the SQLEscape file in the jar?
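The trace above is consistent with a null string reaching `SQLEscape.escape` from `TextWriter.writeRevision`, for example a revision with no text, though the thread does not confirm this. Rather than patching the class inside the jar, one option is a null guard around the escaping call. The sketch below is only an illustration: `NullSafeEscape`, `escape`, and `escapeOrEmpty` are hypothetical names, and the escaping rules are a simplified stand-in, not JWPL's actual `SQLEscape` implementation.

```java
import java.util.Objects;

// Hypothetical stand-in for an SQL string escaper; this is NOT JWPL's SQLEscape.
final class NullSafeEscape {

    // Escapes a few characters that are special inside SQL string literals.
    // Like many escapers, it rejects null input outright.
    static String escape(String s) {
        Objects.requireNonNull(s, "text must not be null");
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\'': sb.append("\\'"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Guarded variant: treats a missing revision text as an empty string
    // instead of letting a NullPointerException kill the worker thread.
    static String escapeOrEmpty(String s) {
        return s == null ? "" : escape(s);
    }

    public static void main(String[] args) {
        System.out.println(escapeOrEmpty(null).isEmpty());
        System.out.println(escapeOrEmpty("it's"));
    }
}
```

Whether an empty string is the right substitute for a missing text depends on how the output tables are consumed; skipping the revision entirely would be another reasonable choice.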
Please note that this library is not really maintained anymore and has not been tested with newer dumps for at least 5 years.
That said, it won't work with -Xmx512m; you will need at least 4g, if not more, plus a lot of additional space on the hard disk.
@FelixDrinkall JWPL is just getting an update - you might check if the current main branch works for your use-case. Note though that package names and groupId/artifactIds have changed... the next version will be 2.0.0 with breaking changes.
Main-thread stack trace following `Write end dead`:
    at java.base/java.io.PipedInputStream.read(PipedInputStream.java:310)
    at java.base/java.io.PipedInputStream.read(PipedInputStream.java:377)
    at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:252)
    at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:271)
    at de.tudarmstadt.ukp.wikipedia.timemachine.dump.xml.XMLDumpTableInputStream.read(XMLDumpTableInputStream.java:83)
    at java.base/java.io.DataInputStream.readInt(DataInputStream.java:392)
    at de.tudarmstadt.ukp.wikipedia.wikimachine.util.UTFDataInputStream.readUTFAsArray(UTFDataInputStream.java:73)
    at de.tudarmstadt.ukp.wikipedia.wikimachine.dump.xml.TextParser.next(TextParser.java:69)
    at de.tudarmstadt.ukp.wikipedia.wikimachine.domain.DumpVersionProcessor.processText(DumpVersionProcessor.java:153)
    at de.tudarmstadt.ukp.wikipedia.timemachine.domain.TimeMachineGenerator.processInputDumps(TimeMachineGenerator.java:133)
    at de.tudarmstadt.ukp.wikipedia.timemachine.domain.TimeMachineGenerator.start(TimeMachineGenerator.java:109)
    at de.tudarmstadt.ukp.wikipedia.timemachine.domain.JWPLTimeMachine.main(JWPLTimeMachine.java:83)
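The `Write end dead` message together with the `PipedInputStream.read` frames suggests that the main thread's error is a follow-on symptom: the `xml2sql` thread died from the `NullPointerException` without closing its end of the pipe, and the reader noticed afterwards. The sketch below reproduces that JDK behavior in isolation, assuming nothing about JWPL's internals; `WriteEndDeadDemo` and `readAfterWriterDies` are made-up names for illustration.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

final class WriteEndDeadDemo {

    // Starts a writer thread that writes one byte and then dies from an
    // uncaught exception without closing the pipe. Returns the message of
    // the IOException the reader gets once the buffered byte is consumed.
    static String readAfterWriterDies() throws Exception {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);

        Thread writer = new Thread(() -> {
            try {
                out.write(42);
            } catch (IOException e) {
                throw new IllegalStateException(e);
            }
            // Simulates the worker thread crashing mid-stream.
            throw new IllegalStateException("simulated failure in the writer thread");
        }, "writer");
        // Suppress the default stack-trace printing for the simulated crash.
        writer.setUncaughtExceptionHandler((t, e) -> { });
        writer.start();
        writer.join();

        in.read(); // consumes the single buffered byte
        try {
            in.read(); // buffer is empty and the writer thread is gone
            return "no exception";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readAfterWriterDies());
    }
}
```

If this reading is right, fixing the cause of the `NullPointerException` in the writer thread should make the `Write end dead` error disappear as well.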