MARVIN is the release bot that does automated DBpedia releases each month on three different servers for generic, mappings, wikidata, abstract extraction.
The repository at https://git.informatik.uni-leipzig.de/dbpedia-assoc/marvin-config can be used to fork the architecture for creating extensions, developing new extractors or debugging old ones.
Fixes and patches will be manually deployed via a fresh git clone
from the master
branch of the DIEF (DBpedia Information Extraction Framework).
All scripts and config files in this repo are CC-0 (Public Domain). We accept pull requests to improve the config files, all contributions will be merged as CC-0.
git clone https://git.informatik.uni-leipzig.de/dbpedia-assoc/marvin-config
cd marvin-config
# delete previous versions of the DIEF
rm -rf marvin-config/extraction-framework
./setup-dief.sh
# test Romanian extraction, very small
./marvin_extraction_run.sh --group=test
To run the other extractions, use either
# around 4-7 days
./marvin_extraction_run.sh --group=generic
# around 4-7 days
./marvin_extraction_run.sh --group=mappings
# around 7-14 days
./marvin_extraction_run.sh --group=wikidata
Below is a list of cronjobs we use on the different servers
We thank Sören Auer and the Technische Informationsbibliothek (TIB) for providing three servers to run:
- the main DBpedia extraction on a monthly basis
- community-provided extractors on Wikipedia, Wikidata or other sources
- enrichment, cleaning and parsing services, so-called Databus mods for open data on the Databus
This contribution by TIB to DBpedia & its community is a great push towards incentivizing Open Data and establishing a global and national research and innovation data infrastructure.
TODO
TODO
TODO
On the respective server there is a user marvin-fetch, that has access to /data/derive
containing the pom.xml of https://github.com/dbpedia/databus-maven-plugin/tree/master/dbpedia
# query to get all versions fro derive in xml syntax to paste directly into pom.xml
PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
SELECT distinct (?derive) WHERE {
?dataset dataid:group <https://databus.dbpedia.org/marvin/generic> .
?dataset dataid:artifact ?artifact .
?dataset dataid:version ?version .
?dataset dct:hasVersion "2019.08.30"^^xsd:string
BIND (CONCAT("<version>",?artifact,"/${databus.deriveversion}</version>") as ?derive)
}
order by asc(?derive)
su marvin-fetch
tmux a -t derive
WHAT=mappings
NEWVERSION=2019.08.30
# prepare
cd /data/derive/databus-maven-plugin/dbpedia/$WHAT
git pull
mvn versions:set -DnewVersion=$NEWVERSION
# run
mvn databus-derive:clone -Ddatabus.deriveversion=$NEWVERSION
run marvin-fetch.sh script in databus/dbpedia folder
cd /media/bigone/25TB/releases/databus-maven-plugin/dbpedia
./marvin-fetch.sh wikidata 2019.08.01
cd /media/bigone/25TB/releases/databus-maven-plugin/dbpedia/mappings
mvn clean
mvn validate
mvn -T 8 deploy