
6 Using Wikidata as KB

Ricardo Usbeck edited this page Nov 17, 2017 · 16 revisions

This page is under construction

The index was created from the Wikidata RDF exports of 1 August 2016: https://tools.wmflabs.org/wikidata-exports/rdf/exports/20160801/dump_download.html

RDF dump of all item labels, descriptions, and aliases in all languages -> https://tools.wmflabs.org/wikidata-exports/rdf/exports/20160801/wikidata-terms.nt.gz

RDF dump of all Wikidata property definitions, including datatypes, labels, descriptions, and aliases -> https://tools.wmflabs.org/wikidata-exports/rdf/exports/20160801/wikidata-properties.nt.gz

RDF dump of all statements, complete with references and qualifiers -> https://tools.wmflabs.org/wikidata-exports/rdf/exports/20160801/wikidata-statements.nt.gz
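The dumps are gzipped N-Triples files, so they can be read line by line without unpacking them to disk. The Python sketch below is illustrative only: the function names (`parse_line`, `iter_triples`) are hypothetical, and the regular expression is a simplified pattern for one-triple-per-line input, not a complete N-Triples parser (it does not validate IRIs or unescape literals).

```python
import gzip
import re
from typing import Iterator, Optional, Tuple

# Simplified N-Triples line pattern: <subject> <predicate> object .
# The subject is an IRI or blank node, the predicate an IRI; the object is kept as raw text.
# This is NOT a complete N-Triples parser (no escape handling or IRI validation).
TRIPLE_RE = re.compile(r"^(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(.+?)\s*\.\s*$")


def _strip_iri(term: str) -> str:
    """Remove the angle brackets around an IRI; leave literals and blank nodes unchanged."""
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]
    return term


def parse_line(line: str) -> Optional[Tuple[str, str, str]]:
    """Split one N-Triples line into (subject, predicate, object), or None for blanks/comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = TRIPLE_RE.match(line)
    if match is None:
        return None
    subject, predicate, obj = match.groups()
    return _strip_iri(subject), _strip_iri(predicate), _strip_iri(obj)


def iter_triples(path: str) -> Iterator[Tuple[str, str, str]]:
    """Stream triples from a gzipped N-Triples dump without unpacking it to disk."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            triple = parse_line(line)
            if triple is not None:
                yield triple
```

For example, `for s, p, o in iter_triples("wikidata-statements.nt.gz"): ...` walks the statements dump one triple at a time, which keeps memory use flat regardless of the dump size.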

Disclaimer:

The dumps differ substantially from the online version. In the dumps, predicates are represented as entities (URIs in the http://www.wikidata.org/entity/ namespace), which may affect the results of the disambiguation process, since that process depends on the topology of the Wikidata graph.

For example:

On the web, the Barack Obama item (https://www.wikidata.org/wiki/Q76) contains this triple:

http://www.wikidata.org/entity/Q76 http://www.wikidata.org/prop/direct/P2283 http://www.wikidata.org/entity/Q7212330 .

In the dump, however, the same statement reads:

http://www.wikidata.org/entity/Q76 http://www.wikidata.org/entity/P2283v http://www.wikidata.org/entity/Q7212330 .
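If you need to compare triples from the dump with those from the online version, a small URI-rewriting helper can translate between the two predicate forms. This is a sketch under a loud assumption: the page shows only the single P2283 / P2283v pair, so the rule "entity/P{number}v corresponds to prop/direct/P{number}" is generalized from that one example, and the helper names `dump_to_direct` and `direct_to_dump` are hypothetical.

```python
import re
from typing import Optional

ENTITY_NS = "http://www.wikidata.org/entity/"
DIRECT_NS = "http://www.wikidata.org/prop/direct/"

# Assumption: direct-claim predicates in the dump look like entity/P{number}v,
# generalized from the single P2283v example. Other predicate forms are not handled.
DUMP_PREDICATE_RE = re.compile("^" + re.escape(ENTITY_NS) + r"(P\d+)v$")


def dump_to_direct(uri: str) -> Optional[str]:
    """Map a dump predicate such as entity/P2283v to prop/direct/P2283; None if it does not match."""
    match = DUMP_PREDICATE_RE.match(uri)
    if match is None:
        return None
    return DIRECT_NS + match.group(1)


def direct_to_dump(uri: str) -> Optional[str]:
    """Inverse mapping: prop/direct/P2283 -> entity/P2283v; None for any other URI."""
    if not uri.startswith(DIRECT_NS):
        return None
    pid = uri[len(DIRECT_NS):]
    if re.fullmatch(r"P\d+", pid) is None:
        return None
    return ENTITY_NS + pid + "v"
```

Returning `None` for non-matching URIs (rather than passing them through) makes it explicit when a predicate falls outside the assumed pattern, so unexpected forms in the dump surface early instead of being silently mixed in.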

http://hobbitdata.informatik.uni-leipzig.de/agdistis/wikidata/

index=index_wikidata_en
index2=index_bycontext

#used to prune edges
nodeType=http://www.wikidata.org/entity/
edgeType=http://www.wikidata.org/entity/
baseURI=http://www.wikidata.org
#SPARQL endpoint to retrieve domain and range information
endpoint=https://query.wikidata.org/
#this is the trigram distance between words, default = 3
ngramDistance=3
#exploration depth of semantic disambiguation graph
maxDepth=2
#trigram similarity threshold; candidate strings scoring below it are cut off
threshholdTrigram=0.87
#heuristicExpansionOn controls whether simple co-occurrence resolution is done, e.g., Barack => Barack Obama if both appear in the same text
heuristicExpansionOn=true
#list of entity domains (whiteList) and corporation affixes (corporationAffixes)
whiteList=/config/whiteList.txt
corporationAffixes=/config/corporationAffixes.txt

#Activate popularity
popularity=false

#Choose a graph-based algorithm: "hits" or "pagerank"
algorithm=hits

#Enable search by context
context=false

#Enable search by acronym
acronym=false

#Enable finding common entities
commonEntities=true

# IMPORTANT for creating your own index

folderWithTTLFiles=/Users/diegomoussallem/Desktop/AGDISTIS-WIKIDATA/wikidata/
surfaceFormTSV=
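AGDISTIS reads these settings itself, so the Python sketch below is only an illustration of how two of them, `ngramDistance` and `threshholdTrigram`, could drive candidate filtering. The helper names (`load_properties`, `char_ngrams`, `ngram_similarity`, `filter_candidates`) are hypothetical; the Dice coefficient over character n-grams is a stand-in similarity chosen for clarity, and reading `ngramDistance` as the n-gram length is an assumption. AGDISTIS's own string-similarity code may score candidates differently.

```python
from typing import Dict, List, Set


def load_properties(text: str) -> Dict[str, str]:
    """Minimal key=value reader: skips blank lines and '#' comments and trims spaces
    around '='. Unlike java.util.Properties it ignores escapes and line continuations."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def char_ngrams(text: str, n: int) -> Set[str]:
    """Set of lower-cased character n-grams (the whole string if it is shorter than n)."""
    text = text.lower()
    if len(text) < n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a: str, b: str, n: int) -> float:
    """Dice coefficient of the two n-gram sets: 1.0 for identical sets, 0.0 for no overlap."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def filter_candidates(surface_form: str, labels: List[str], props: Dict[str, str]) -> List[str]:
    """Keep candidate labels whose similarity to the surface form reaches threshholdTrigram."""
    n = int(props.get("ngramDistance", "3"))  # assumption: interpreted as the n-gram length
    threshold = float(props.get("threshholdTrigram", "0.87"))
    return [label for label in labels if ngram_similarity(surface_form, label, n) >= threshold]
```

With the values above (n = 3, threshold 0.87), the surface form "Barack Obama" keeps the label "Barack Obama" (similarity 1.0) but drops "Michelle Obama", which shares only 4 of their trigrams and scores 8/22, roughly 0.36.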