more cleanup
kurzum committed Oct 11, 2019
1 parent 29ea1c2 commit d785997
Showing 3 changed files with 39 additions and 17 deletions.
32 changes: 27 additions & 5 deletions README.md
@@ -1,15 +1,37 @@
# MARVIN-config

MARVIN is the release bot that does automated DBpedia releases each month on three different servers for generic, mappings, wikidata extraction.

MARVIN is the release bot that does automated DBpedia releases each month on three different servers for generic, mappings, wikidata, abstract extraction.
The repository at https://git.informatik.uni-leipzig.de/dbpedia-assoc/marvin-config can be used to fork the architecture for creating extensions, developing new extractors or debugging old ones.
Fixes and patches will be manually deployed via a fresh `git clone` from the `master` branch of the [DBpedia Extraction Framework](https://github.com/dbpedia/extraction-framework/).

## Contributions & License
All scripts and config files in this repo are CC-0 (Public Domain).
We accept pull requests to improve the config files, all contributions will be merged as CC-0.

## Run a MARVIN extraction
```
git clone https://git.informatik.uni-leipzig.de/dbpedia-assoc/marvin-config
cd marvin-config
# Romanian extraction, very small
./marvin_extraction_run.sh --group=test
```

To run the other extractions, use either
```
# around 4-7 days
./marvin_extraction_run.sh --group=generic
# around 4-7 days
./marvin_extraction_run.sh --group=mappings
# around 7-14 days
./marvin_extraction_run.sh --group=wikidata
```

Fixes and patches will be manually deployed via `git pull` from the `master` branch of the [DBpedia Extraction Framework](https://github.com/dbpedia/extraction-framework/).
## Cronjobs

The architecture and workflow can also be forked and adapted to completely different extractions and derive operations outside of the DBpedia framework.
Below is a list


# Acknowledgements
## Acknowledgements
We thank Sören Auer and the Technische Informationsbibliothek (TIB) for providing three servers to run:

* the main DBpedia extraction on a monthly basis
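The README selects an extraction with `--group=NAME`. This commit does not show how `marvin_extraction_run.sh` actually parses its options, so the following is only a hypothetical sketch of how such an option could be read and checked against the four groups the README lists (`test`, `generic`, `mappings`, `wikidata`). The function name `parse_group` is invented for illustration.

```shell
# Hypothetical sketch, NOT the real option handling of marvin_extraction_run.sh.
# Reads --group=NAME and accepts only the groups listed in the README.
parse_group() {
	local arg group=""
	for arg in "$@"; do
		case "$arg" in
			--group=*) group="${arg#--group=}" ;;
			*) echo "unknown argument: $arg" >&2; return 2 ;;
		esac
	done
	# Reject a missing or unknown group instead of running a default extraction
	case "$group" in
		test|generic|mappings|wikidata) echo "$group" ;;
		*) echo "invalid or missing --group: '$group'" >&2; return 1 ;;
	esac
}
```

In a run script this could be used as `GROUP="$(parse_group "$@")" || exit 1`, so a typo in the group name fails fast rather than after a multi-day run has started.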
18 changes: 9 additions & 9 deletions functions.sh
@@ -7,6 +7,7 @@ prepareExtractionFramework(){
if [ "$SKIPDIEFINSTALL" = "false" ]
then
# TODO make sure this contains marvin-config/marvin-extraction and replace with -rf
echo "deleting $DIEFDIR"
rm -rI $DIEFDIR
git clone "https://github.com/dbpedia/extraction-framework.git" $DIEFDIR
cd $DIEFDIR
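The TODO in `prepareExtractionFramework` asks to check that `$DIEFDIR` lies inside `marvin-config/marvin-extraction` before switching from the interactive `rm -rI` to `rm -rf`. A minimal sketch of such a guard follows, assuming the repository was cloned under its default name `marvin-config` as in the README. The function name `safe_remove_diefdir` is hypothetical, and the check is purely textual (it does not resolve symlinks).

```shell
# Hypothetical guard for the TODO: only delete paths strictly below
# .../marvin-config/marvin-extraction/, never the directory itself.
safe_remove_diefdir() {
	local dir="$1"
	if [ -z "$dir" ]; then
		echo "refusing to delete: empty path" >&2
		return 1
	fi
	case "$dir" in
		# Reject ".." components so the pattern below cannot be escaped
		*/..|*/../*)
			echo "refusing to delete '$dir': contains '..'" >&2
			return 1
			;;
		*/marvin-config/marvin-extraction/?*)
			rm -rf -- "$dir"
			;;
		*)
			echo "refusing to delete '$dir': not inside marvin-config/marvin-extraction" >&2
			return 1
			;;
	esac
}
```

Used in place of the interactive delete, this would read `safe_remove_diefdir "$DIEFDIR" || exit 1`, so a misconfigured or empty `DIEFDIR` aborts the run instead of removing something unintended.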
@@ -25,18 +26,17 @@ prepareExtractionFramework(){
# download and extract data
extractDumps() {
cd $DIEFDIR/dump;

# run for all
>&2 ../run extraction $ROOT/config.d/extraction.$GROUP.properties;

# exceptions

## for generic, as English is big and has to be run separately

# exception for generic, 1. spark, 2. as English is big and has to be run separately
if [ "$GROUP" = "generic" ]
then
>&2 ../run sparkextraction $ROOT/config.d/extraction.generic.en.properties;
fi
>&2 ../run sparkextraction $CONFIGDIR/extraction.generic.properties;
>&2 ../run sparkextraction $CONFIGDIR/extraction.generic.en.properties;
else
# run for all
>&2 ../run extraction $CONFIGDIR/extraction.$GROUP.properties;

fi

}

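After this change, `extractDumps` appears to branch on `$GROUP`: `generic` runs two Spark extractions (all languages, then English separately), while every other group runs a single `../run extraction`, and config paths now come from `$CONFIGDIR` instead of `$ROOT/config.d`. The dry-run sketch below prints the commands that dispatch would issue without executing `../run`. The function name `plan_extraction` is hypothetical.

```shell
# Hypothetical dry-run of the extractDumps() dispatch: prints commands only.
# $1 = group (e.g. generic, mappings), $2 = config directory (CONFIGDIR)
plan_extraction() {
	local group="$1" configdir="$2"
	if [ "$group" = "generic" ]; then
		# generic: Spark extraction for all languages, English run separately
		echo "../run sparkextraction $configdir/extraction.generic.properties"
		echo "../run sparkextraction $configdir/extraction.generic.en.properties"
	else
		# all other groups: one regular extraction per group config
		echo "../run extraction $configdir/extraction.$group.properties"
	fi
}
```

For example, `plan_extraction mappings "$CONFIGDIR"` prints a single `../run extraction …/extraction.mappings.properties` line, which is a cheap way to check that the expected properties files exist before a long run.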
6 changes: 3 additions & 3 deletions marvin_extraction_run.sh
@@ -18,7 +18,7 @@ description:
##############
# setup paths
##############
ROOT="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )/marvin-extraction/"
ROOT="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )/marvin-extraction"
CONFIGDIR="$ROOT/extractionConfiguration"

# set and create
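The first hunk of `marvin_extraction_run.sh` drops the trailing slash from `ROOT`. Because `CONFIGDIR` is built as `"$ROOT/extractionConfiguration"`, the old value produced a doubled slash. POSIX treats the two paths as equivalent, so the change is probably cosmetic, but it matters for exact string comparisons such as a path-pattern check. The checkout location `/opt/marvin-config` below is hypothetical.

```shell
# Effect of the trailing slash on paths derived from ROOT.
base="/opt/marvin-config"                 # hypothetical checkout location
old_root="$base/marvin-extraction/"       # before this commit
new_root="$base/marvin-extraction"        # after this commit
old_configdir="$old_root/extractionConfiguration"
new_configdir="$new_root/extractionConfiguration"
echo "$old_configdir"   # /opt/marvin-config/marvin-extraction//extractionConfiguration
echo "$new_configdir"   # /opt/marvin-config/marvin-extraction/extractionConfiguration
```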
@@ -101,10 +101,10 @@ cd $DIEFDIR/dump
extractDumps &> $LOGDIR/extraction.log;

# POST-PROCESSING
postProcessing 2> $LOGDIR/postProcessing.log;
#postProcessing 2> $LOGDIR/postProcessing.log;

# RELEASE
databusRelease 2> $LOGDIR/databusDeploy.log
#databusRelease 2> $LOGDIR/databusDeploy.log

# CLEANUP
archiveLogFiles;
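This commit disables post-processing and the Databus release by commenting out the calls. An alternative is to gate optional steps behind environment variables so they can be switched on per run without editing the script. The sketch below assumes hypothetical variable names `RUN_POSTPROCESSING` and `RUN_RELEASE` and a hypothetical helper `run_step_if`. Both flags default to `false`, which matches the state after this commit.

```shell
# Hypothetical helper: run a pipeline step only if its flag is "true".
# $1 = flag value, remaining args = the command to run
run_step_if() {
	local flag="$1"; shift
	if [ "$flag" = "true" ]; then
		"$@"
	else
		echo "skipping: $*" >&2
	fi
}

# Possible use in the run script (postProcessing, databusRelease and LOGDIR
# are from the original; the RUN_* variables are invented):
#   run_step_if "${RUN_POSTPROCESSING:-false}" postProcessing 2> "$LOGDIR/postProcessing.log"
#   run_step_if "${RUN_RELEASE:-false}" databusRelease 2> "$LOGDIR/databusDeploy.log"
```

With this shape, a skipped step still leaves a one-line note in its log file, which makes it visible after the fact that the step was intentionally disabled.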
