6 Using Wikidata as KB
This page is under construction
Index created from: https://tools.wmflabs.org/wikidata-exports/rdf/exports/20160801/dump_download.html
RDF dump of all item labels, descriptions, and aliases in all languages -> https://tools.wmflabs.org/wikidata-exports/rdf/exports/20160801/wikidata-terms.nt.gz
RDF dump of all Wikidata property definitions, including datatypes, labels, descriptions, and aliases -> https://tools.wmflabs.org/wikidata-exports/rdf/exports/20160801/wikidata-properties.nt.gz
RDF dump of all statements, complete with references and qualifiers -> https://tools.wmflabs.org/wikidata-exports/rdf/exports/20160801/wikidata-statements.nt.gz
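The dumps above are gzip-compressed N-Triples files, so they can be inspected line by line without unpacking them first. The following is only a minimal sketch in Python: the function names `parse_triple` and `iter_triples` are hypothetical, and the regular expression handles only the simple `<subject> <predicate> object .` shape, not the full N-Triples grammar.

```python
import gzip
import re

# Matches simple lines of the form: <subject> <predicate> object .
# The object is kept raw (either <uri> or a literal such as "label"@en).
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')


def parse_triple(line):
    """Return (subject, predicate, raw_object), or None if the line is
    blank, a comment, or does not match the simple pattern."""
    line = line.strip()
    if not line or line.startswith('#'):
        return None
    match = TRIPLE_RE.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


def iter_triples(path_or_file):
    """Stream triples from a gzip-compressed N-Triples file.

    Accepts a file name or an already open binary file object, and reads
    one line at a time so the dump is never loaded into memory at once.
    """
    with gzip.open(path_or_file, 'rt', encoding='utf-8') as handle:
        for line in handle:
            triple = parse_triple(line)
            if triple is not None:
                yield triple
```

For example, `for s, p, o in iter_triples("wikidata-statements.nt.gz"): ...` would walk through the statements dump after it has been downloaded to the working directory.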
Disclaimer:
The dumps differ substantially from the online version. In the dumps, predicates are assigned as entities (they use the http://www.wikidata.org/entity/ namespace), which may affect the results of the disambiguation process because of Wikidata's topology.
For example:
On the web, the page for Barack Obama (https://www.wikidata.org/wiki/Q76) contains this triple:
http://www.wikidata.org/entity/Q76 http://www.wikidata.org/prop/direct/P2283 http://www.wikidata.org/entity/Q7212330 .
In the dump, however, the same triple is:
http://www.wikidata.org/entity/Q76 http://www.wikidata.org/entity/P2283v http://www.wikidata.org/entity/Q7212330 .
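To compare dump triples with what the web version shows, the dump predicate can be mapped back to the direct-property form. The sketch below is extrapolated from the single example above and nothing more: it assumes that a dump predicate is a property ID followed by a `v` suffix in the entity namespace. The function name `dump_predicate_to_direct` is hypothetical, and any other suffixes the dumps may use are deliberately left untouched.

```python
import re

ENTITY_NS = "http://www.wikidata.org/entity/"
DIRECT_NS = "http://www.wikidata.org/prop/direct/"

# Assumption: dump predicates look like <entity namespace>P<digits>v,
# as in the P2283v example. Other shapes are not recognised.
_DUMP_PREDICATE = re.compile(r"^" + re.escape(ENTITY_NS) + r"(P\d+)v$")


def dump_predicate_to_direct(uri):
    """Map a dump-style predicate URI to its prop/direct counterpart.

    URIs that do not match the assumed pattern (for example item URIs
    such as .../entity/Q76) are returned unchanged.
    """
    match = _DUMP_PREDICATE.match(uri)
    if match is None:
        return uri
    return DIRECT_NS + match.group(1)
```

Applied to the example, `dump_predicate_to_direct("http://www.wikidata.org/entity/P2283v")` yields the `prop/direct/P2283` URI from the web version, while the subject and object URIs pass through as they are.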
http://hobbitdata.informatik.uni-leipzig.de/agdistis/wikidata/
index=index_wikidata_en
index2=index_bycontext
#used to prune edges
nodeType=http://www.wikidata.org/entity/
edgeType=http://www.wikidata.org/entity/
baseURI =http://www.wikidata.org
#SPARQL endpoint to retrieve domain and range information
endpoint=https://query.wikidata.org/
#this is the trigram distance between words, default = 3
ngramDistance=3
#exploration depth of semantic disambiguation graph
maxDepth=2
#threshold for cutting off similar strings
threshholdTrigram=0.87
#heuristicExpansionOn controls whether simple co-occurrence resolution is done or not, e.g., Barack => Barack Obama if both are in the same text
heuristicExpansionOn=true
#list of entity domains and corporationAffixes
whiteList=/config/whiteList.txt
corporationAffixes=/config/corporationAffixes.txt
#Activate popularity
popularity=false
#Choose a graph-based algorithm: "hits" or "pagerank"
algorithm=hits
#Enable search by context
context=false
#Enable search by acronym
acronym=false
#Enable finding common entities
commonEntities=true
# IMPORTANT for creating your own index
folderWithTTLFiles=/Users/diegomoussallem/Desktop/AGDISTIS-WIKIDATA/wikidata/
surfaceFormTSV=
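Before starting a run, it can help to read the settings above and check a few of them for obvious mistakes. The following Python sketch is not part of AGDISTIS: `load_properties` and `check_settings` are hypothetical helpers, the parser handles only plain `key=value` lines and `#` comments (not the full Java properties syntax with escapes and line continuations), and the 0 to 1 range for `threshholdTrigram` is an assumption about how the threshold is used.

```python
def load_properties(text):
    """Parse simple key=value lines into a dict, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # no '=' on this line, so it is not a setting
        props[key.strip()] = value.strip()
    return props


def check_settings(props):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    # The page names exactly two graph-based algorithms.
    if props.get("algorithm") not in ("hits", "pagerank"):
        problems.append("algorithm must be 'hits' or 'pagerank'")
    # Assumption: the trigram threshold is a similarity score between 0 and 1.
    try:
        threshold = float(props.get("threshholdTrigram", ""))
        if not 0.0 <= threshold <= 1.0:
            problems.append("threshholdTrigram should lie between 0 and 1")
    except ValueError:
        problems.append("threshholdTrigram is missing or not a number")
    # Switches shown as true/false in the listing above.
    for key in ("heuristicExpansionOn", "popularity", "context",
                "acronym", "commonEntities"):
        if key in props and props[key] not in ("true", "false"):
            problems.append(key + " must be 'true' or 'false'")
    return problems
```

A typical use would be `check_settings(load_properties(open(path).read()))`, where `path` points to wherever the properties file is stored in the local setup.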