<!DOCTYPE html>
<html lang="en">
<head>
	<title>Knowledge Graphs</title>
	<meta charset="UTF-8"/>
	<link rel="stylesheet" href="css/style.css"/>
	<link rel="stylesheet" href="css/prism.css"/>
	<link rel="stylesheet" href="css/fonts.css"/>
	<link rel="stylesheet" href="css/print.css" media="print"/>
	<script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
	<script id="MathJax-script" async="async" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
	<script src="js/prism.js"></script>
</head>
<body>
	<div style="display:none" id="tex-macros">
		\(
\newcommand{\coloneqq}{\mathrel{\vcenter{:}}=}
\newcommand{\con}{\mathbf{Con}}
\newcommand{\var}{\mathbf{Var}}
\newcommand{\term}{\mathbf{Term}}
\newcommand{\dom}{\mathbf{dom}}
\newcommand{\datatype}[1]{\Delta_{#1}}
\newcommand{\datatypeL}[1]{\datatype{\texttt{#1}}}
\newcommand{\gelab}[1]{{\color{blue}\textsf{#1}}}
\newcommand\arc[2]{\xrightarrow{#1}#2}
\newcommand{\qualified}[4]{\arc{#1}{#2}\{#3,#4\}}
\newcommand{\qualifiedcard}[3]{\arc{#1}{#2}~#3}
\newcommand{\qualifiedL}[4]{\qualified{\gelab{#1}}{#2}{#3}{#4}}
\newcommand{\qualifiedcardL}[3]{\qualifiedcard{\gelab{#1}}{#2}{#3}}
\newcommand{\semantics}[4]{[#1]^{#2,#3,#4}}
\newcommand{\inp}[1]{#1^I}
\newcommand{\inpdom}{\inp{\Delta}}
\newcommand{\T}[1]{#1^{\rm T}}
\newcommand{\D}[1]{#1^{\rm D}}
		\)
	</div>
	<div class="cover"><img alt="mock cover" src="images/mock-cover.jpg"/></div>
	<header>
		<h1 id="title"><span class="big-letter">K</span>nowledge <span class="big-letter">G</span>raphs</h1>
		<ul class="authorlist">
			<li><span class="author"><a href="https://orcid.org/0000-0001-9482-1982">Aidan Hogan</a></span> <span class="affiliation">IMFD, DCC, Universidad de Chile <span class="country">Chile</span></span></li>
			<li><span class="author"><a href="https://orcid.org/0000-0003-0036-6662">Eva Blomqvist</a></span> <span class="affiliation">Linköping University <span class="country">Sweden</span></span></li>
			<li><span class="author"><a href="https://orcid.org/0000-0001-5726-4638">Michael Cochez</a></span> <span class="affiliation">Vrije Universiteit and Discovery Lab, Elsevier <span class="country">The Netherlands</span></span></li>
			<li><span class="author"><a href="https://orcid.org/0000-0002-3385-987X">Claudia d’Amato</a></span> <span class="affiliation">University of Bari <span class="country">Italy</span></span></li>
			<li><span class="author"><a href="https://orcid.org/0000-0002-2930-2059">Gerard de Melo</a></span> <span class="affiliation">HPI, University of Potsdam and Rutgers University <span class="country">USA</span></span></li>
			<li><span class="author"><a href="https://orcid.org/0000-0002-4559-6544">Claudio Gutierrez</a></span> <span class="affiliation">IMFD, DCC, Universidad de Chile <span class="country">Chile</span></span></li>
			<li><span class="author"><a href="https://orcid.org/0000-0002-6955-7718">Sabrina Kirrane</a></span> <span class="affiliation">WU Vienna <span class="country">Austria</span></span></li>
			<li><span class="author"><a href="https://orcid.org/0000-0001-8907-5348">José Emilio Labra Gayo</a></span> <span class="affiliation">Universidad de Oviedo <span class="country">Spain</span></span></li>
			<li><span class="author"><a href="https://orcid.org/0000-0003-3831-9706">Roberto Navigli</a></span> <span class="affiliation">Sapienza University of Rome <span class="country">Italy</span></span></li>
			<li><span class="author"><a href="https://orcid.org/0000-0002-9804-4882">Sebastian Neumaier</a></span> <span class="affiliation">St. Pölten University of Applied Sciences <span class="country">Austria</span></span></li>
			<li><span class="author"><a href="https://orcid.org/0000-0001-7112-3516">Axel-Cyrille Ngonga Ngomo</a></span> <span class="affiliation">DICE, Universität Paderborn <span class="country">Germany</span></span></li>
			<li><span class="author"><a href="https://orcid.org/0000-0001-5670-1146">Axel Polleres</a></span> <span class="affiliation">WU Vienna <span class="country">Austria</span></span></li>
			<li><span class="author"><a href="https://orcid.org/0000-0002-4162-8334">Sabbir M. Rashid</a></span> <span class="affiliation">Tetherless World Constellation, Rensselaer Polytechnic Institute <span class="country">USA</span></span></li>
			<li><span class="author"><a href="https://orcid.org/0000-0002-8046-7502">Anisa Rula</a></span> <span class="affiliation">University of Milano-Bicocca <span class="country">Italy</span> and University of Bonn <span class="country">Germany</span></span></li>
			<li><span class="author"><a href="https://orcid.org/0000-0002-2108-2303">Lukas Schmelzeisen</a></span> <span class="affiliation">Universität Stuttgart <span class="country">Germany</span></span></li>
			<li><span class="author"><a href="https://orcid.org/0000-0003-3112-9299">Juan Sequeda</a></span> <span class="affiliation">data.world <span class="country">USA</span></span></li>
			<li><span class="author"><a href="https://orcid.org/0000-0002-0780-4154">Steffen Staab</a></span> <span class="affiliation">Universität Stuttgart <span class="country">Germany</span> and University of Southampton <span class="country">UK</span></span></li>
			<li><span class="author"><a href="https://orcid.org/0000-0003-1502-6986">Antoine Zimmermann</a></span> <span class="affiliation">École des mines de Saint-Étienne <span class="country">France</span></span></li>
		</ul>
		<div id="about" class="info">
			<h2>About the book</h2>
			<p>The book is published by <a href="http://www.morganclaypool.com/">Morgan &amp; Claypool</a> in the series <a href="http://www.morganclaypool.com/toc/wbe.1/7/1">Synthesis Lectures on the Semantic Web: Theory and Technology</a> edited by <a href="http://info.slis.indiana.edu/~dingying/">Ying Ding</a> and Paul Groth. Please cite the book as:</p>
			<blockquote class="quote">
				 Aidan Hogan, Eva Blomqvist, Michael Cochez, Claudia d’Amato, Gerard de Melo, Claudio Gutierrez, Sabrina Kirrane, José Emilio Labra Gayo, Roberto Navigli, Sebastian Neumaier, Axel-Cyrille Ngonga Ngomo, Axel Polleres, Sabbir M. Rashid, Anisa Rula, Lukas Schmelzeisen, Juan Sequeda, Steffen Staab, Antoine Zimmermann (2021) <em>Knowledge Graphs</em>, Synthesis Lectures on the Semantic Web: Theory and Technology, No. 22, 1-233, DOI: 10.2200/S01125ED1V01Y202109DSK022, Morgan &amp; Claypool
			</blockquote>
			<dl>
				<dt>ISBN paperback:</dt>
				<dd>9781636392356</dd>
				<dt>ISBN ebook:</dt>
				<dd>9781636392363</dd>
				<dt>ISBN hardcover:</dt>
				<dd>9781636392370</dd>
			</dl>
			<p>Copyright © 2021 by Morgan &amp; Claypool. All rights reserved.</p>
			<!--<p><a href="bibtex.txt">Bibtex</a></p>-->
			<h2>Access options</h2>
			<dl>
				<dt>HTML version:</dt>
				<dd>You are currently reading the free HTML version of the book, the most recent of which is available at <a href="https://kg-book.org/">https://kg-book.org/</a></dd>
				<dt>PDF Version:</dt>
				<dd>You can download or buy the book from <a href="http://www.morganclaypool.com/">Morgan &amp; Claypool</a>. Academic and Corporate licences are available.</dd>
				<dt>Hard copy:</dt>
				<dd>You can order from <a href="http://www.morganclaypool.com/">Morgan &amp; Claypool</a> or <a href="https://www.amazon.com/">Amazon</a>.</dd>
			</dl>
		</div>
		<p><em>SYNTHESIS LECTURES ON THE SEMANTIC WEB #22</em></p>
	</header>
	<h2 id="abstract"><span>Abstract</span></h2>
	<p>This book provides a comprehensive and accessible introduction to knowledge graphs, which have recently garnered notable attention from both industry and academia. Knowledge graphs are founded on the principle of applying a graph-based abstraction to data, and are now broadly deployed in scenarios that require integrating and extracting value from multiple, diverse sources of data at large scale.</p>
	<p>The book is divided into ten chapters. The first chapter provides a general introduction to the area, defines the concept of a “knowledge graph”, and provides a high-level overview of how knowledge graphs are currently being used. The second chapter presents and contrasts popular graph models that are commonly used to represent data as graphs, and the languages by which they can be queried. The third chapter describes how the resulting data graph can be enhanced with notions of schema, identity and context. The fourth chapter discusses how ontologies and rules can be used to encode knowledge, and how they enable deductive forms of reasoning. The fifth chapter delves into how inductive techniques – based on statistics, graph analytics, machine learning, etc. – can be used to encode and extract knowledge. The sixth chapter is dedicated to techniques for the creation and enrichment of knowledge graphs from legacy sources of data. The seventh chapter enumerates a variety of quality measures that can be used to assess a knowledge graph in terms of its fitness for use in a variety of applications. The eighth chapter presents key methods for the refinement of knowledge graphs, with the goal of improving their completeness and correctness. The ninth chapter provides a survey of the open and enterprise knowledge graphs that have emerged in recent years, along with the industries within which, and the applications for which, they have been most widely adopted. The tenth chapter wraps up the book with discussion of the current limitations and future directions along which knowledge graphs are likely to evolve. An appendix further covers knowledge graphs from an historical perspective, establishing their significance in the broader context of the academic study of data and knowledge, as well as surveying prior definitions of “knowledge graphs” from the literature.</p>
	<p>The book is aimed at students, researchers and practitioners who wish to learn more about knowledge graphs, and how they facilitate extracting value from diverse data at large scale. To make the book accessible for newcomers, running examples and graphical notation are used throughout. Formal definitions and extensive references are also provided for those who opt to delve more deeply into specific topics.</p>
	<h2 id="keywords"><span>Keywords</span></h2>
	<p>knowledge graphs, artificial intelligence, semantic web, machine learning</p>

	<nav id="toc">
		<div><p>Table of ▼ Contents</p></div>
		<ol>
			<li><a href="#sec-preface">Preface</a></li>
			<li><a href="#sec-ack">Acknowledgements</a></li>
			<li><a href="#chap-intro">1. Introduction</a></li>
			<li><a href="#chap-graph">2. Data Graphs</a>
			<ol>
				<li><a href="#ssec-graphModels">2.1. Models</a></li>
				<li><a href="#ssec-querying">2.2. Querying</a></li>
			</ol></li>
			<li><a href="#chap-knowledge">3. Schema, Identity, Context</a>
			<ol>
				<li><a href="#sec-schema">3.1. Schema</a></li>
				<li><a href="#sec-identity">3.2. Identity</a></li>
				<li><a href="#ssec-knowledgeContext">3.3. Context</a></li>
			</ol></li>
			<li><a href="#chap-deductive">4. Deductive Knowledge</a>
			<ol>
				<li><a href="#ssec-ontologies">4.1. Ontologies</a></li>
				<li><a href="#ssec-reasoning">4.2. Reasoning</a></li>
			</ol></li>
			<li><a href="#chap-inductive">5. Inductive Knowledge</a>
			<ol>
				<li><a href="#sec-gAnalytics">5.1. Graph Analytics</a></li>
				<li><a href="#ssec-embeddings">5.2. Knowledge Graph Embeddings</a></li>
				<li><a href="#ssec-gnns">5.3. Graph Neural Networks</a></li>
				<li><a href="#ssec-symlearn">5.4. Symbolic Learning</a></li>
			</ol></li>
			<li><a href="#chap-create">6. Creation and Enrichment</a>
			<ol>
				<li><a href="#sssec-graphCreationHuman">6.1. Human Collaboration</a></li>
				<li><a href="#sssec-graphCreationText">6.2. Text Sources</a></li>
				<li><a href="#sssec-graphCreationSemistructured">6.3. Markup Sources</a></li>
				<li><a href="#sssec-graphCreationStructured">6.4. Structured Sources</a></li>
				<li><a href="#ssec-knowledgeConceptual">6.5. Schema/Ontology Creation</a></li>
			</ol></li>
			<li><a href="#chap-quality">7. Quality Assessment</a>
			<ol>
				<li><a href="#ssec-accuracy">7.1. Accuracy</a></li>
				<li><a href="#sssec-coverage">7.2. Coverage</a></li>
				<li><a href="#ssec-coherency">7.3. Coherency</a></li>
				<li><a href="#ssec-succinctness">7.4. Succinctness</a></li>
				<li><a href="#ssec-other-quality">7.5. Other Quality Dimensions</a></li>
			</ol></li>
			<li><a href="#chap-refine">8. Refinement</a>
			<ol>
				<li><a href="#ssec-completion">8.1. Completion</a></li>
				<li><a href="#ssec-correction">8.2. Correction</a></li>
				<li><a href="#ssec-other-refinement-tasks">8.3. Other Refinement Tasks</a></li>
			</ol></li>
			<li><a href="#chap-publish">9. Publication</a>
			<ol>
				<li><a href="#ssec-principles">9.1. Best Practices</a></li>
				<li><a href="#ssec-access">9.2. Access Protocols</a></li>
				<li><a href="#ssec-UsageControl">9.3. Usage Control</a></li>
			</ol></li>
			<li><a href="#chap-kgs">10. Knowledge Graphs in Practice</a>
			<ol>
				<li><a href="#sec-openkgs">10.1. Open Knowledge Graphs</a></li>
				<li><a href="#ssec-enterprise-kgs">10.2. Enterprise Knowledge Graphs</a></li>
			</ol></li>
			<li><a href="#chap-conclude">11. Summary and Conclusion</a></li>
			<li id="toc-ref"><a href="#sec-references">Bibliography</a></li>
			<li class="toc-app"><a href="#chap-defs">A. Background</a>
			<ol>
				<li><a href="#app-historical">A.1. Historical Perspective</a></li>
				<li><a href="#app-pre2012">A.2. “Knowledge Graphs”: Pre 2012</a></li>
				<li><a href="#app-post2012">A.3. “Knowledge Graphs”: 2012 Onwards</a></li>
			</ol></li>
			<li><a href="#sec-bio">Authors’ Biography</a></li>
			<li id="toc-filler">&nbsp;</li>
			<li id="about"><a href="#about">about this book</a></li>
		</ol>
	</nav>
	<section id="sec-preface" class="prechapter">
		<h2 id="preface">Preface</h2>
		<p>The origins of this book can be traced back to a Dagstuhl Seminar, held in 2018, on the topic of Knowledge Graphs. At the time of the seminar, the topic was quickly becoming mainstream in academia and industry, but there were conflicting messages as to what a “knowledge graph” was. Much of the discussion of the seminar centred on this question, and there were divergent opinions as to how knowledge graphs could (or should) be defined; how they relate to previous concepts such as graph databases, knowledge bases, ontologies, RDF graphs, property graphs, semantic networks, etc.; and how the emerging area of Knowledge Graphs should be positioned with respect to the established areas of Artificial Intelligence, Big Data, Databases, Graph Theory, Logic, Machine Learning, Knowledge Representation, Natural Language Processing, Networks (in their various forms), and the Semantic Web. As the discussion continued, a consensus began to emerge: Knowledge Graphs, as a topic, involves a novel confluence of techniques stemming from previously disparate scientific communities, with the unifying goal of developing novel graph-based techniques for better integrating and extracting value from diverse knowledge sources at large scale.</p>
		<p>As a follow-up to the seminar, the attendees agreed that in order to foster this unifying view of Knowledge Graphs, there was a need for a manuscript that would serve as a general introduction to the area. This manuscript would:</p>
		<ul>
			<li>motivate knowledge graphs and the value of abstracting data as graphs;</li>
			<li>survey the historical context of knowledge graphs and the key initiatives leading to their popularisation;</li>
			<li>draw together disparate views of knowledge graphs into a unifying definition;</li>
			<li>provide an introduction to the key techniques that knowledge graphs enable, relating to querying, validation, reasoning, learning, refinement, enrichment, quality assessment, and more besides;</li>
			<li>describe how knowledge graphs are used in practice, surveying the companies using knowledge graphs, the applications they are used for, the open knowledge graphs that have been published, etc.;</li>
			<li>delineate future research directions for knowledge graphs.</li>
		</ul>
		<p>The manuscript would then serve as an introductory text for students, practitioners and researchers new to the area, helping to form a consensus on what a knowledge graph is, and laying the foundations for future developments.</p>
		<p>The goal of preparing this manuscript was an ambitious one, and involved drawing together and distilling down a vast amount of literature on a diverse range of topics into a set of key concepts described in an accessible way. For this reason, the manuscript has been prepared by many authors, who have lent their knowledge and expertise to the preparation of specific sections.</p>
		<p>A key aim of this book is to be accessible to a broader audience. While background knowledge of related topics such as Databases, Logic, Machine Learning, Semantic Web, etc., will help to understand some of the particular topics mentioned, such a background is not necessary to follow the general concepts described within. The book aims to motivate and illustrate the various concepts it introduces from a practical perspective, and in order to be as accessible as possible, relies heavily on an example-driven presentation using a graphical notation. For the reader wishing to dig more into the technical minutiae, we complement this discussion with formal definitions throughout; however, the reader more interested in understanding the general concepts and their rationale will find the discussion to be self-contained if they choose to skip the definitions presented in visually distinctive boxes.</p>
		<p>The book serves as an entry point for those new to the topic, and may thus serve as a useful textbook for university courses, for researchers who are venturing into the topic for the first time, and for practitioners who wish to understand more about how knowledge graphs might be of use within their company or organisation, or indeed, how to maximise the value of the knowledge graphs that they are currently developing. Readers who are already active within specific sub-areas of Knowledge Graphs may further appreciate the technical definitions included, the references to other literature provided, and the broader perspective that this book offers in terms of the other related sub-areas and how they complement each other.</p>
		<p>By drawing together diverse techniques from disparate areas, Knowledge Graphs has become an exciting topic in terms of both research and applications. We expect to see growing interest in this topic as the years advance, and indeed hope that this book will help to more firmly establish the foundations of this topic, and to foster future developments upon these foundations, potentially by its readers.</p>
		<p style="text-align: right; font-style: italic;">Aidan&nbsp;Hogan, Eva&nbsp;Blomqvist, Michael&nbsp;Cochez, Claudia&nbsp;d’Amato, Gerard&nbsp;de&nbsp;Melo, Claudio&nbsp;Gutierrez, Sabrina&nbsp;Kirrane, José&nbsp;Emilio&nbsp;Labra&nbsp;Gayo, Roberto&nbsp;Navigli, Sebastian&nbsp;Neumaier, Axel-Cyrille&nbsp;Ngonga&nbsp;Ngomo, Axel&nbsp;Polleres, Sabbir&nbsp;M.&nbsp;Rashid, Anisa&nbsp;Rula, Lukas&nbsp;Schmelzeisen, Juan&nbsp;Sequeda, Steffen&nbsp;Staab, Antoine&nbsp;Zimmermann<br/>
		September 2021</p>
	</section>
	<section id="sec-ack" class="prechapter">
		<h2 id="ack">Acknowledgements</h2>
		<p>We thank the organisers and attendees of the Dagstuhl Seminar on “Knowledge Graphs”. We also thank those who provided feedback on this content.</p>
		<p>Hogan was funded by Fondecyt Grant No.&nbsp;1181896. Hogan &amp; Gutierrez were funded by ANID – Millennium Science Initiative Program – Code ICN17_002. Cochez did part of the work while employed at Fraunhofer FIT, Germany and was later partially funded by Elsevier’s Discovery Lab. Kirrane, Ngonga Ngomo, Polleres &amp; Staab received funding through the project “KnowGraphs” from the European Union’s Horizon programme under the Marie Skłodowska-Curie grant agreement No.&nbsp;860801. Kirrane &amp; Polleres were supported by the European Union’s Horizon 2020 research and innovation programme under grant 731601. Labra was supported by the Spanish Ministry of Economy and Competitiveness (Society challenges: TIN2017-88877-R). Navigli was supported by the MOUSSE ERC Grant No.&nbsp;726487 under the European Union’s Horizon&nbsp;2020 research and innovation programme. Rashid was supported by IBM Research AI through the AI Horizons Network. Schmelzeisen was supported by the German Research Foundation (DFG) grant STA&nbsp;572/18-1.</p>
		<p style="text-align: right; font-style: italic;">Aidan&nbsp;Hogan, Eva&nbsp;Blomqvist, Michael&nbsp;Cochez, Claudia&nbsp;d’Amato, Gerard&nbsp;de&nbsp;Melo, Claudio&nbsp;Gutierrez, Sabrina&nbsp;Kirrane, José&nbsp;Emilio&nbsp;Labra&nbsp;Gayo, Roberto&nbsp;Navigli, Sebastian&nbsp;Neumaier, Axel-Cyrille&nbsp;Ngonga&nbsp;Ngomo, Axel&nbsp;Polleres, Sabbir&nbsp;M.&nbsp;Rashid, Anisa&nbsp;Rula, Lukas&nbsp;Schmelzeisen, Juan&nbsp;Sequeda, Steffen&nbsp;Staab, Antoine&nbsp;Zimmermann<br/>
		September 2021</p>
	</section>

	<section id="chap-intro" class="chapter">
		<h2>Introduction</h2>
		<p>Though the phrase “knowledge graph” has been used in the literature since at least 1972&nbsp;[<a href="#ref-Schneider72">Schneider, 1973</a>], the modern incarnation of the phrase stems from the 2012 announcement of the Google Knowledge Graph&nbsp;[<a href="#ref-GoogleKG">Singhal, 2012</a>], followed by further announcements of knowledge graphs by Airbnb&nbsp;[<a href="#ref-AirBnBKG">Chang, 2018</a>], Amazon&nbsp;[<a href="#ref-AmazonKG">Krishnan, 2018</a>], eBay&nbsp;[<a href="#ref-eBayKG">Pittman et al., 2017</a>], Facebook&nbsp;[<a href="#ref-NoyGJNPT19">Noy et al., 2019</a>], IBM&nbsp;[<a href="#ref-IBMKG">Devarajan, 2017</a>], LinkedIn&nbsp;[<a href="#ref-LinkedInKG">He et al., 2016</a>], Microsoft&nbsp;[<a href="#ref-BingKG">Shrivastava, 2017</a>], Uber&nbsp;[<a href="#ref-UberKG">Hamad et al., 2018</a>], and more besides. The growing industrial uptake of the concept proved difficult for academia to ignore: more and more scientific literature is being published on knowledge graphs, which includes books (e.g.,&nbsp;[<a href="#ref-PVGW2017">Pan et al., 2017</a>, <a href="#ref-QiCLWJW19">Qi et al., 2021</a>, <a href="#ref-FenselSAHKPTUW20">Fensel et al., 2020</a>, <a href="#ref-KejriwalKS2021">Kejriwal et al., 2021</a>]), as well as papers outlining definitions (e.g.,&nbsp;[<a href="#ref-EhrlingerW16">Ehrlinger and Wöß, 2016</a>]), novel techniques (e.g.,&nbsp;[<a href="#ref-PujaraMGC13">Pujara et al., 2013</a>, <a href="#ref-wang2014knowledge">Wang et al., 2014</a>, <a href="#ref-lin2015learning">Lin et al., 2015</a>]), and surveys of specific aspects of knowledge graphs (e.g.,&nbsp;[<a href="#ref-Paulheim17">Paulheim, 2017</a>, <a href="#ref-Wang2017KGEmbedding">Wang et al., 2017</a>]).</p>
		<p>Underlying all such developments is the core idea of using graphs to represent data, often enhanced with some way to explicitly represent knowledge&nbsp;[<a href="#ref-NoyGJNPT19">Noy et al., 2019</a>]. The result is most often used in application scenarios that involve integrating, managing and extracting value from diverse sources of data at large scale&nbsp;[<a href="#ref-NoyGJNPT19">Noy et al., 2019</a>]. Employing a graph-based abstraction of knowledge has numerous benefits in such settings when compared with, for example, a relational model or NoSQL alternatives. Graphs provide a concise and intuitive abstraction for a variety of domains, where edges capture the (potentially cyclical) relations between the entities inherent in social data, biological interactions, bibliographical citations and co-authorships, transport networks, and so forth&nbsp;[<a href="#ref-AnglesG08">Angles and Gutierrez, 2008</a>]. Graphs allow maintainers to postpone the definition of a schema, allowing the data – and its scope – to evolve in a more flexible manner than typically possible in a relational setting, particularly for capturing incomplete knowledge&nbsp;[<a href="#ref-Abiteboul97">Abiteboul, 1997</a>]. Unlike (other) NoSQL models, specialised graph query languages support not only standard relational operators (joins, unions, projections, etc.), but also navigational operators for recursively finding entities connected through arbitrary-length paths&nbsp;[<a href="#ref-AnglesABHRV17">Angles et al., 2017</a>]. Standard knowledge representation formalisms – such as ontologies&nbsp;[<a href="#ref-OWL2">Hitzler et al., 2012</a>, <a href="#ref-RDFS">Brickley and Guha, 2014</a>, <a href="#ref-obof">Mungall et al., 2012</a>] and rules&nbsp;[<a href="#ref-swrl">Horrocks et al., 2004</a>, <a href="#ref-rif">Kifer and Boley, 2013</a>] – can be employed to define and reason about the semantics of the terms used to label and describe the nodes and edges in the graph. 
Scalable frameworks for graph analytics&nbsp;[<a href="#ref-MalewiczABDHLC10">Malewicz et al., 2010</a>, <a href="#ref-XinGFS13">Xin et al., 2013a</a>, <a href="#ref-signalcollect">Stutz et al., 2016</a>] can be leveraged for computing centrality, clustering, summarisation, etc., in order to gain insights about the domain being described. Various representations have also been developed that support applying machine learning techniques both directly and indirectly over graphs&nbsp;[<a href="#ref-Wang2017KGEmbedding">Wang et al., 2017</a>, <a href="#ref-abs-1901-00596">Wu et al., 2019</a>].</p>
		<p>In summary, the decision to build and use a knowledge graph opens up a range of techniques that can be brought to bear for integrating and extracting value from diverse sources of data at large scale. The goal of this book is to motivate and give a comprehensive introduction to knowledge graphs: to describe their foundational data models and how they can be queried; to discuss representations relating to schema, identity, and context; to discuss deductive and inductive ways to make knowledge explicit; to present a variety of techniques that can be used for the creation and enrichment of graph-structured data; to describe how the quality of knowledge graphs can be discerned and how they can be refined; to discuss standards and best practices by which knowledge graphs can be published; and to provide an overview of existing knowledge graphs found in practice. Our intended audience includes researchers and practitioners who are new to knowledge graphs. As such, we do not assume that readers have specific expertise on knowledge graphs.</p>

		<p><em class="paragraph">Knowledge graph</em>. The definition of a “<em>knowledge graph</em>” remains contentious&nbsp;[<a href="#ref-EhrlingerW16">Ehrlinger and Wöß, 2016</a>, <a href="#ref-BonattiDPP18">Bonatti et al., 2018</a>, <a href="#ref-Bergman19">Bergman, 2019</a>], where a number of (sometimes conflicting) definitions have emerged, varying from specific technical proposals to more inclusive general proposals; we address these prior definitions in Appendix&nbsp;<a href="#chap-defs">A</a>. Herein we adopt an inclusive definition, where we view a knowledge graph as <em>a graph of data intended to accumulate and convey knowledge of the real world, whose nodes represent entities of interest and whose edges represent relations between these entities</em>. The graph of data (aka <em>data graph</em>) conforms to a graph-based data model, which may be a <em>directed edge-labelled graph</em>, a <em>property graph</em>, etc. (we discuss concrete alternatives in Chapter&nbsp;<a href="#chap-graph">2</a>). By <em>knowledge</em>, we refer to something that is <em>known</em>. Such knowledge may be accumulated from external sources, or extracted from the knowledge graph itself. Knowledge may be composed of simple statements, such as “<em>Santiago is the capital of Chile</em>”, or quantified statements, such as “<em>all capitals are cities</em>”. Simple statements can be accumulated as edges in the data graph. If the knowledge graph intends to accumulate quantified statements, a more expressive way to represent knowledge – such as <em>ontologies</em> or <em>rules</em> – is required. <em>Deductive methods</em> can then be used to entail and accumulate further knowledge (e.g., “<em>Santiago is a city</em>”). Additional knowledge – based on simple or quantified statements – can also be extracted from and accumulated by the knowledge graph using <em>inductive methods</em>.</p>
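		<p>As a minimal sketch of these ideas, and not a formalism used by any particular system, the following Python snippet stores the simple statement as an edge of a toy data graph and applies the quantified statement “<em>all capitals are cities</em>” as a hand-written rule. The triple layout, the edge labels <code>capital of</code> and <code>type</code>, and the function name <code>entail_cities</code> are assumptions made purely for illustration.</p>

```python
# Toy data graph (hypothetical): each edge is a (subject, label, object) triple.
edges = {
    ("Santiago", "capital of", "Chile"),
}


def entail_cities(graph):
    """Return the graph extended with edges entailed by 'all capitals are cities'."""
    inferred = set(graph)
    for subject, label, _obj in graph:
        # Any node with an outgoing 'capital of' edge is a capital, hence a city.
        if label == "capital of":
            inferred.add((subject, "type", "City"))
    return inferred


entailed = entail_cities(edges)
new_edges = entailed - edges  # knowledge that was deduced rather than stated
print(sorted(new_edges))
```

		<p>Here the edge stating that Santiago is a city is never asserted directly; it appears only after the rule is applied, which is the sense in which deductive methods accumulate further knowledge.</p>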
		<p>Knowledge graphs are often assembled from numerous sources, and as a result, can be highly diverse in terms of structure and granularity. To address this diversity, representations of <em>schema</em>, <em>identity</em>, and <em>context</em> often play a key role, where a <em>schema</em> defines a high-level structure for the knowledge graph, <em>identity</em> denotes which nodes in the graph (or in external sources) refer to the same real-world entity, while <em>context</em> may indicate a specific setting in which some unit of knowledge is held true. As mentioned above, effective methods for <em>extraction</em>, <em>enrichment</em>, <em>quality assessment</em>, and <em>refinement</em> are required for a knowledge graph to grow and improve over time.</p>

		<p><em class="paragraph">In practice</em>. Knowledge graphs aim to serve as an ever-evolving shared substrate of knowledge within an organisation or community&nbsp;[<a href="#ref-NoyGJNPT19">Noy et al., 2019</a>]. We distinguish two types of knowledge graphs in practice: <em>open knowledge graphs</em> and <em>enterprise knowledge graphs</em>. Open knowledge graphs are published online, making their content accessible for the public good. The most prominent examples – DBpedia&nbsp;[<a href="#ref-LehmannIJJKMHMK15">Lehmann et al., 2015</a>], Freebase&nbsp;[<a href="#ref-bollacker2007freebase">Bollacker et al., 2007b</a>], Wikidata&nbsp;[<a href="#ref-VrandecicK14">Vrandečić and Krötzsch, 2014</a>], YAGO&nbsp;[<a href="#ref-YAGO">Hoffart et al., 2011</a>], etc. – cover many domains and are either extracted from Wikipedia&nbsp;[<a href="#ref-LehmannIJJKMHMK15">Lehmann et al., 2015</a>, <a href="#ref-YAGO">Hoffart et al., 2011</a>], or built by communities of volunteers&nbsp;[<a href="#ref-bollacker2007freebase">Bollacker et al., 2007b</a>, <a href="#ref-VrandecicK14">Vrandečić and Krötzsch, 2014</a>]. Open knowledge graphs have also been published within specific domains, such as media&nbsp;[<a href="#ref-RaimondFSA14">Raimond et al., 2014</a>], government&nbsp;[<a href="#ref-HendlerHMT12">Hendler et al., 2012</a>, <a href="#ref-ShadboltO13">Shadbolt and O'Hara, 2013</a>], geography&nbsp;[<a href="#ref-StadlerLHA12">Stadler et al., 2012</a>], tourism&nbsp;[<a href="#ref-LuLS16">Lu et al., 2016</a>, <a href="#ref-abs-1805-05744">Kärle et al., 2018</a>, <a href="#ref-MaturanaALMH18">Maturana et al., 2018</a>, <a href="#ref-ZhangCHYAL19">Zhang et al., 2019</a>], life sciences&nbsp;[<a href="#ref-CallahanCAD13">Callahan et al., 2013</a>], and more besides. Enterprise knowledge graphs are typically internal to a company and applied for commercial use-cases&nbsp;[<a href="#ref-NoyGJNPT19">Noy et al., 2019</a>]. 
Prominent industries using enterprise knowledge graphs include Web search (e.g., Bing&nbsp;[<a href="#ref-BingKG">Shrivastava, 2017</a>], Google&nbsp;[<a href="#ref-GoogleKG">Singhal, 2012</a>]), commerce (e.g., Airbnb&nbsp;[<a href="#ref-AirBnBKG">Chang, 2018</a>], Amazon&nbsp;[<a href="#ref-AmazonKG">Krishnan, 2018</a>, <a href="#ref-dong2019building">Dong, 2019</a>], eBay&nbsp;[<a href="#ref-eBayKG">Pittman et al., 2017</a>], Uber&nbsp;[<a href="#ref-UberKG">Hamad et al., 2018</a>]), social networks (e.g., Facebook&nbsp;[<a href="#ref-NoyGJNPT19">Noy et al., 2019</a>], LinkedIn&nbsp;[<a href="#ref-LinkedInKG">He et al., 2016</a>]), finance (e.g., Accenture&nbsp;[<a href="#ref-AccentureKG">Okorafor and Ray, 2019</a>], Banca d’Italia&nbsp;[<a href="#ref-BellomariniFGS19">Bellomarini et al., 2019</a>], Bloomberg&nbsp;[<a href="#ref-BloombergKG">Meij, 2019</a>], Capital One&nbsp;[<a href="#ref-CapitalOneKG">Branum and Sehon, 2019</a>], Wells Fargo&nbsp;[<a href="#ref-WellsFargoKG">Newman, 2019</a>]), among others. Applications include search&nbsp;[<a href="#ref-BingKG">Shrivastava, 2017</a>, <a href="#ref-GoogleKG">Singhal, 2012</a>], recommendations&nbsp;[<a href="#ref-AirBnBKG">Chang, 2018</a>, <a href="#ref-UberKG">Hamad et al., 2018</a>, <a href="#ref-LinkedInKG">He et al., 2016</a>, <a href="#ref-NoyGJNPT19">Noy et al., 2019</a>], personal agents&nbsp;[<a href="#ref-eBayKG">Pittman et al., 2017</a>], advertising&nbsp;[<a href="#ref-LinkedInKG">He et al., 2016</a>], business analytics&nbsp;[<a href="#ref-LinkedInKG">He et al., 2016</a>], risk assessment&nbsp;[<a href="#ref-ThompsonReutersKG">Tobin, 2017</a>, <a href="#ref-MaanaKG">Dalgliesh, 2016</a>], automation&nbsp;[<a href="#ref-HensonSTK19">Henson et al., 2019</a>], and more besides. We will provide more details on the use of knowledge graphs in practice in Chapter&nbsp;<a href="#chap-kgs">10</a>.</p>

		<p><em class="paragraph">Running example</em>. To keep the discussion accessible, throughout the book, we present concrete examples in the context of a hypothetical knowledge graph relating to tourism in Chile (loosely inspired by related use-cases&nbsp;[<a href="#ref-abs-1805-05744">Kärle et al., 2018</a>, <a href="#ref-LuLS16">Lu et al., 2016</a>]). The knowledge graph is managed by a tourism board that aims to increase tourism in the country and promote new attractions in strategic areas. The knowledge graph itself will eventually describe tourist attractions, cultural events, services, businesses, travel routes, etc. Some applications the organisation envisages are to:</p>
		<ul>
			<li>create a tourism portal that allows visitors to search for attractions, upcoming events, and other related services (in multiple languages);</li>
			<li>gain insights into tourism demographics in terms of season, nationalities, etc.;</li>
			<li>analyse sentiment about tourist attractions, including positive reviews, summaries of complaints about events and services, crime reports, etc.;</li>
			<li>understand tourism trajectories: the sequence of attractions, events, etc., that tourists often visit;</li>
			<li>cross-reference these tourism trajectories with currently available flights, buses, etc., to suggest new strategic routes for public transport;</li>
			<li>offer personalised recommendations of places to visit;</li>
			<li>and so forth.</li>
		</ul>

		<p><em class="paragraph">Outline</em>. The remainder of the book is structured as follows:</p>
		<dl id="outline">
			<dt>Chapter&nbsp;<a href="#chap-graph">2</a></dt>
			<dd>outlines graph data models and the languages used to query them.</dd>
			<dt>Chapter&nbsp;<a href="#chap-knowledge">3</a></dt>
			<dd>describes representations of schema, identity, and context for graphs.</dd>
			<dt>Chapter&nbsp;<a href="#chap-deductive">4</a></dt>
			<dd>presents deductive formalisms for representing and entailing knowledge.</dd>
			<dt>Chapter&nbsp;<a href="#chap-inductive">5</a></dt>
			<dd>describes inductive techniques for learning from graphs.</dd>
			<dt>Chapter&nbsp;<a href="#chap-create">6</a></dt>
			<dd>discusses the creation and enrichment of knowledge graphs.</dd>
			<dt>Chapter&nbsp;<a href="#chap-quality">7</a></dt>
			<dd>enumerates dimensions for assessing knowledge graph quality.</dd>
			<dt>Chapter&nbsp;<a href="#chap-refine">8</a></dt>
			<dd>discusses various techniques for knowledge graph refinement.</dd>
			<dt>Chapter&nbsp;<a href="#chap-publish">9</a></dt>
			<dd>introduces principles and protocols for publishing knowledge graphs.</dd>
			<dt>Chapter&nbsp;<a href="#chap-kgs">10</a></dt>
			<dd>surveys some prominent knowledge graphs and their applications.</dd>
			<dt>Chapter&nbsp;<a href="#chap-conclude">11</a></dt>
			<dd>concludes with future directions for knowledge graphs.</dd>
			<dt>Appendix&nbsp;<a href="#chap-defs">A</a></dt>
			<dd>outlines the historical background for knowledge graphs.</dd>
		</dl>
	</section>
	<section id="chap-graph" class="chapter">
		<h2>Data Graphs</h2>
		<p>At the foundation of any knowledge graph is the principle of first applying a graph abstraction to data, resulting in an initial data graph. We now discuss a selection of graph-structured data models that are commonly used in practice to represent data graphs. We then discuss the primitives that form the basis of graph query languages used to interrogate such data graphs.</p>

		<section id="ssec-graphModels" class="section">
		<h3>Models</h3>
		<p>Leaving aside graphs, let us assume that the tourism board from our running example has not yet decided how to model relevant data about attractions, events, services, etc. The board first considers using a tabular structure – in particular, relational databases – to represent the required data, and though they do not know precisely what data they will need to capture, they begin to design an initial relational schema. They begin with an <span class="sf">Event</span> table with five columns:</p>

		<p class="mathblock"><span class="sf">Event</span>(<span class="sf underline">name</span>, <span class="sf">venue</span>, <span class="sf">type</span>, <span class="sf underline">start</span>, <span class="sf">end</span>)</p>

		<p>where <span class="sf underline">name</span> and <span class="sf underline">start</span> together form the primary key of the table in order to uniquely identify recurring events. But as they start to populate the data, they encounter various issues: events may have multiple names (e.g., in different languages), events may have multiple venues, they may not yet know the start and end date-times for future events, events may have multiple types, and so forth. Incrementally addressing these modelling issues as the data become more diverse, they generate internal identifiers for events and adapt their relational schema until they have:</p>

		<p class="mathblock" id="al-schema"><span class="sf">EventName</span>(<span class="sf underline">id</span>,<span class="sf underline">name</span>), <span class="sf">EventStart</span>(<span class="sf underline">id</span>,<span class="sf">start</span>), <span class="sf">EventEnd</span>(<span class="sf underline">id</span>,<span class="sf">end</span>), <span class="sf">EventVenue</span>(<span class="sf underline">id</span>,<span class="sf underline">venue</span>), <span class="sf">EventType</span>(<span class="sf underline">id</span>,<span class="sf underline">type</span>)<span style="float: right;">(2.1)</span></p>

		<p>With the above schema, the organisation can now model events with \(0{-}n\) names, venues, and types, and \(0{-}1\) start dates and end dates (without needing relational nulls).</p>
		<p>Along the way, the board has to incrementally change the schema several times in order to support new sources of data. Each such change requires a costly remodelling, reloading, and reindexing of data; here we only considered one table. The tourism board struggles with the relational model because they do not know, <em>a priori</em>, what data will need to be modelled or what sources they will use. But once they reach the latter relational schema, the board finds that they can integrate further sources without more changes: with minimal assumptions on <em>multiplicities</em> (\(1{-}1\), \(1{-}n\), etc.) this schema offers a lot of flexibility for integrating incomplete and diverse data.</p>
		<p>In fact, the refined, flexible schema that the board ends up with – as shown in (<a href="#al-schema">2.1</a>) – is modelling a set of binary relations between entities, which indeed can be viewed as modelling a graph. By instead adopting a graph data model from the outset, the board could forgo the need for an upfront schema, and could define any (binary) relation between any pair of entities at any time.</p>
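		<p>To make this correspondence concrete, the following Python sketch reads each row of a binary relation in the refined schema as a directed labelled edge. It is only an illustration under stated assumptions: the rows are hypothetical sample data, and the helpers <code>relation_to_label</code> and <code>tables_to_edges</code> are names introduced here, not part of any library.</p>

```python
# Hypothetical rows for the five binary relations of the refined schema;
# the identifiers, names and date-times are illustrative only.
tables = {
    "EventName": [("EID15", "Ñam"), ("EID16", "Food Truck")],
    "EventType": [
        ("EID15", "Food Festival"),
        ("EID15", "Open Market"),
        ("EID16", "Food Festival"),
    ],
    "EventVenue": [
        ("EID15", "Santa Lucía"),
        ("EID16", "Sotomayor"),
        ("EID16", "Piscina Olímpica"),
    ],
    "EventStart": [("EID15", "2018-03-22 12:00")],
    "EventEnd": [("EID15", "2018-03-29 20:00")],
}


def relation_to_label(table_name):
    """Derive an edge label from a table name, e.g. 'EventVenue' -> 'venue'."""
    return table_name[len("Event"):].lower()


def tables_to_edges(tables):
    """View each row (id, value) of a binary relation as a labelled edge."""
    return {
        (event_id, relation_to_label(name), value)
        for name, rows in tables.items()
        for event_id, value in rows
    }


edges = tables_to_edges(tables)

# Missing information is simply a missing edge: EID16 has no start or end.
labels_of_eid16 = {label for s, label, _ in edges if s == "EID16"}
```

		<p>In this sketch no relational nulls are needed: the event <code>EID16</code> has no <code>start</code> or <code>end</code> edge because the corresponding tables contain no row for it.</p>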
		<p>We now introduce graph data models popular in practice&nbsp;[<a href="#ref-AnglesABHRV17">Angles et al., 2017</a>].</p>

		<h4 id="sssec-directedelg" class="subsection">Directed edge-labelled graphs</h4>
		<p>A directed edge-labelled graph (sometimes known as a <em>multi-relational graph</em>&nbsp;[<a href="#ref-nickel2013tensor">Nickel and Tresp, 2013</a>, <a href="#ref-bordes2013translating">Bordes et al., 2013</a>, <a href="#ref-BalazevicAH19">Balazevic et al., 2019a</a>]) is defined as a set of nodes – like <span class="gnode">Santiago</span>, <span class="gnode">Arica</span>, <span class="gnode">EID16</span>, <span class="gnode">2018-03-22&nbsp;12:00</span> – and a set of directed labelled edges between those nodes, like <span class="gnode">Santa&nbsp;Lucía</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">city</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santiago</span>. In the case of knowledge graphs, nodes are used to represent entities and edges are used to represent (binary) relations between those entities. Figure&nbsp;<a href="#fig-delg">2.1</a> provides an example of how the tourism board could model some relevant event data as a directed edge-labelled graph. 
The graph includes data about the names, types, start and end date-times, and venues for events.<sup class="fnmark" id="fnm1"><a href="#fn1">1</a></sup><span class="footnote" id="fn1"><sup><a href="#fnm1">note 1</a></sup> We represent bidirectional edges as <span class="gnode">Viña&nbsp;del&nbsp;Mar</span><img class="tip" src="images/edge-revtip.png" width="15" alt="arrow tip leftward"/><span class="edge">bus</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Arica</span>, which more concisely depicts two directed edges: <span class="gnode">Viña&nbsp;del&nbsp;Mar</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">bus</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Arica</span> and <span class="gnode">Viña&nbsp;del&nbsp;Mar</span><img class="tip" src="images/edge-revtip.png" width="15" alt="arrow tip leftward"/><span class="edge">bus</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="gnode">Arica</span>. Also while some naming conventions recommend more complete edge labels that include a verb, such as <span class="gelab">has venue</span> or <span class="gelab">is valid from</span>, in this book, for presentation purposes, we will omit the “<code>has</code>” and “<code>is</code>” verbs from such labels, using simply <span class="gelab">venue</span> or <span class="gelab">valid&nbsp;from</span>.</span> Adding information to such a graph typically involves adding new nodes and edges (with some exceptions discussed later). Representing incomplete information requires simply omitting a particular edge; for example, the graph does not yet define a start/end date-time for the Food Truck festival.</p>
		
		<figure id="fig-delg">
			<img src="images/fig-delg.svg" alt="Directed edge-labelled graph describing events and their venues" />
			<figcaption>Directed edge-labelled graph describing events and their venues</figcaption>
		</figure>

		<p>Modelling data as a graph in this way offers more flexibility for integrating new sources of data, compared to the standard relational model, where a schema must be defined upfront and followed at each step. While other structured data models such as trees (XML, JSON, etc.) would offer similar flexibility, graphs do not require organising the data hierarchically (should <code>venue</code> be a parent, child, or sibling of <code>type</code> for example?). They also allow cycles to be represented and queried (e.g., note the directed cycle in the routes between Santiago, Arica, and Viña del Mar).</p>
		<p>A standardised data model based on directed edge-labelled graphs is the Resource Description Framework (RDF)&nbsp;[<a href="#ref-rdf11">Cyganiak et al., 2014</a>], which has been recommended by the W3C. The RDF model defines different types of nodes, including <em>Internationalized Resource Identifiers</em> (IRIs)&nbsp;[<a href="#ref-rfc3987">Dürst and Suignard, 2005</a>] which allow for global identification of entities on the Web; <em>literals</em>, which allow for representing strings (with or without language tags) and other datatype values (integers, dates, etc.); and <em>blank nodes</em>, which are anonymous nodes that are not assigned an identifier (for example, rather than create internal identifiers like <code>EID15</code>, <code>EID16</code>, in RDF, we have the option to use blank nodes). We will discuss these different types of nodes further in Section&nbsp;<a href="#sec-identity">3.2</a> when we speak about issues relating to identity.</p>

		<div class="formal">
			<p>We now formally define a directed edge-labelled graph, where we denote by \(\con\) a countably infinite set of constants.</p>

			<dl class="definition" id="def-delg">
				<dt>Directed edge-labelled graph</dt>
				<dd>A <em>directed edge-labelled graph</em> is a tuple \(G = (V,E,L)\), where \(V \subseteq \con\) is a set of nodes, \(L \subseteq \con\) is a set of edge labels, and \(E \subseteq V \times L \times V\) is a set of edges.</dd>
			</dl>

			<div class="example">
				<p>In reference to Figure&nbsp;<a href="#fig-delg">2.1</a>, the set of nodes \(V\) has 15 elements, including <code>Arica</code>, <code>EID16</code>, etc. The set of edges \(E\) has 23 triples, including (<code>Arica</code>, <code>flight</code>, <code>Santiago</code>). Bidirectional edges are represented with two edges. The set of edge labels \(L\) has 8 elements, including <code>start</code>, <code>flight</code>, etc.</p>
			</div>
			
			<p>Definition&nbsp;<a href="#def-delg">2.1</a> does not state that \(V\) and \(L\) are disjoint: though not present in the example, a node can also serve as an edge-label. The definition also permits that nodes and edge labels can be present without any associated edge. Either restriction could be explicitly stated – if necessary – in a particular application while still conforming to a directed edge-labelled graph.</p>
			<p>For ease of presentation, we may treat a set of (directed labelled) edges \(E \subseteq V \times L \times V\) as a directed edge-labelled graph \((V,E,L)\), in which case we refer to the graph induced by \(E\) assuming that \(V\) and \(L\) contain all and only those nodes and edge labels, respectively, used in \(E\). We may similarly apply set operators on directed edge-labelled graphs, which should be interpreted as applying to their sets of edges; for example, given \(G_1 = (V_1,E_1,L_1)\) and \(G_2 = (V_2,E_2,L_2)\), by \(G_1 \cup G_2\) we refer to the directed edge-labelled graph induced by \(E_1 \cup E_2\).</p>
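			<p>The convention of treating a set of edges as the graph it induces can be sketched in Python as follows. This is an illustrative sketch only: <code>Graph</code>, <code>induced</code> and <code>union</code> are hypothetical names introduced here, and the edges are a small excerpt in the style of Figure&nbsp;<a href="#fig-delg">2.1</a> (the return flight is an assumption).</p>

```python
from typing import FrozenSet, Iterable, NamedTuple, Tuple

Edge = Tuple[str, str, str]  # (source node, edge label, target node)


class Graph(NamedTuple):
    V: FrozenSet[str]   # nodes
    E: FrozenSet[Edge]  # edges
    L: FrozenSet[str]   # edge labels


def induced(edges: Iterable[Edge]) -> Graph:
    """Build the graph whose V and L contain all and only the terms used in E."""
    E = frozenset(edges)
    V = frozenset(node for s, _, o in E for node in (s, o))
    L = frozenset(label for _, label, _ in E)
    return Graph(V, E, L)


def union(g1: Graph, g2: Graph) -> Graph:
    """Interpret G1 ∪ G2 as the graph induced by E1 ∪ E2."""
    return induced(g1.E | g2.E)


g1 = induced({("Arica", "flight", "Santiago"), ("Santiago", "flight", "Arica")})
g2 = induced({("Santa Lucía", "city", "Santiago")})
g = union(g1, g2)
```

			<p>Because the union is taken over edge sets, the shared node <code>Santiago</code> appears once in the resulting graph, which is what allows data about the same entity to be combined.</p>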
		</div>

		<h4 id="subsub-heterograph" class="subsection">Heterogeneous graphs</h4>
		<p>A heterogeneous graph&nbsp;[<a href="#ref-HusseinYC18">Hussein et al., 2018</a>, <a href="#ref-WangJSWYCY19">Wang et al., 2019</a>, <a href="#ref-YangXJWHW20">Yang et al., 2020</a>] (or <em>heterogeneous information network</em>&nbsp;[<a href="#ref-sun2011pathsim">Sun et al., 2011</a>, <a href="#ref-2012Sun">Sun and Han, 2012</a>]) is a directed graph where each node and edge is assigned one type. Heterogeneous graphs are thus akin to directed edge-labelled graphs – with edge labels corresponding to edge types – but where the type of node forms part of the graph model itself, rather than being expressed with a relation (as seen in Figure&nbsp;<a href="#fig-capital">2.2</a>). An edge is called <em>homogeneous</em> if it is between two nodes of the same type (e.g., <span class="gelab">borders</span> in Figure&nbsp;<a href="#fig-capital">2.2</a>); otherwise it is called <em>heterogeneous</em> (e.g., <span class="gelab">capital</span> in Figure&nbsp;<a href="#fig-capital">2.2</a>). Heterogeneous graphs allow for partitioning nodes according to their type, for example, for the purposes of machine learning tasks&nbsp;[<a href="#ref-HusseinYC18">Hussein et al., 2018</a>, <a href="#ref-WangJSWYCY19">Wang et al., 2019</a>, <a href="#ref-YangXJWHW20">Yang et al., 2020</a>]. Conversely, such graphs typically only support a one-to-one relation between nodes and types, which is not the case for directed edge-labelled graphs (see, for example, the node <span class="gnode">Santiago</span> with zero types and <span class="gnode">EID15</span> with multiple types in Figure&nbsp;<a href="#fig-delg">2.1</a>.</p>
		
		<figure id="fig-capital">
			<figure id="fig-cap">
				<img src="images/fig-cap.svg" alt="Del graph"/>
				<figcaption>Directed edge-labelled graph</figcaption>
			</figure>
			<figure id="fig-hg">
				<img src="images/fig-hg.svg" alt="Heterogeneous graph"/>
				<figcaption>Heterogeneous graph</figcaption>
			</figure>
			<figcaption>Comparing directed edge-labelled graphs and heterogeneous graphs</figcaption>
		</figure>
		
		<div class="formal">
			<p>We next define the notion of a heterogeneous graph.</p>

			<dl class="definition" id="def-hg">
				<dt>Heterogeneous graph</dt>
				<dd>A <em>heterogeneous graph</em> is a tuple \(G = (V,E,L,l)\), where \(V \subseteq \con\) is a set of nodes, \(L \subseteq \con\) is a set of edge/node labels, \(E \subseteq V \times L \times V\) is a set of edges, and \(l : V \rightarrow L\) maps each node to a label.</dd>
			</dl>

			<div class="example">
				<p>In reference to Figure&nbsp;<a href="#fig-hg">2.2b</a>, the set of nodes \(V\) has three elements: <code>Santiago</code>, <code>Chile</code>, and <code>Perú</code>. The set of edges \(E\) has 3 triples, including (<code>Santiago</code>, <code>capital</code>, <code>Chile</code>). The set of edge labels \(L\) has 4 elements: <code>capital</code>, <code>borders</code>, <code>City</code>, <code>Country</code>. Finally, with respect to the node labels, \(l(\)<code>Santiago</code>\() =\) <code>City</code>, \(l(\)<code>Chile</code>\() =\) <code>Country</code>, and \(l(\)<code>Perú</code>\() =\) <code>Country</code>.</p>
			</div>

			<p>In heterogeneous graphs, edge and node labels are often called <em>types</em>. By defining edges with labels as per directed edge-labelled graphs – rather than separately labelling edges with \(l\) – two nodes can be related by \(n\) edges with \(n\) different labels; for example, we can represent both \((\)<code>Santiago</code>, <code>capital</code>, <code>Chile</code>\()\) and \((\)<code>Santiago</code>, <code>country</code>, <code>Chile</code>\()\) as edges in the heterogeneous graph.</p>
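			<p>As a rough illustration of Definition&nbsp;<a href="#def-hg">2.2</a>, the following Python sketch encodes a heterogeneous graph in the style of Figure&nbsp;<a href="#fig-hg">2.2b</a> and uses the node labelling to classify edges and partition nodes. The two <code>borders</code> edges between <code>Chile</code> and <code>Perú</code> are an assumption about the figure, and the helper names <code>is_homogeneous</code> and <code>partition_by_type</code> are hypothetical.</p>

```python
# Nodes, edges and node labelling l : V -> L.
V = {"Santiago", "Chile", "Perú"}
E = {
    ("Santiago", "capital", "Chile"),
    ("Chile", "borders", "Perú"),  # assumed: a bidirectional edge is
    ("Perú", "borders", "Chile"),  # stored as two directed edges
}
l = {"Santiago": "City", "Chile": "Country", "Perú": "Country"}

# L collects both the edge labels and the node labels that are used.
L = {label for _, label, _ in E} | set(l.values())


def is_homogeneous(edge):
    """An edge is homogeneous if both of its nodes have the same type."""
    source, _, target = edge
    return l[source] == l[target]


def partition_by_type(nodes):
    """Group nodes by type, e.g. as input to type-specific learning tasks."""
    partition = {}
    for node in nodes:
        partition.setdefault(l[node], set()).add(node)
    return partition


homogeneous_labels = {label for (s, label, o) in E if is_homogeneous((s, label, o))}
partition = partition_by_type(V)
```

			<p>Since <code>l</code> is a function, every node receives exactly one type in this encoding, which reflects the restriction discussed above.</p>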
		</div>

		<h4 id="sssec-propgraph" class="subsection">Property graphs</h4>
		<p>Property graphs constitute an alternative graph model that offers additional flexibility when modelling more complex relations. Consider integrating incoming data that provide further details on which companies offer fares on which flights, allowing the board to better understand available routes between cities (for example, on national airlines). In the case of directed edge-labelled graphs, we cannot directly annotate an edge like <span class="gnode">Santiago</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Arica</span> with the company (or companies) offering that route. But we could add a new node denoting a flight, connect it with the source, destination, companies, and mode, as shown in Figure&nbsp;<a href="#fig-fsa">2.3a</a>. Applying this modelling to all routes in Figure&nbsp;<a href="#fig-delg">2.1</a> would, however, involve significant changes.</p>

		<p>The property graph model was thus proposed to offer additional flexibility when modelling data as a graph&nbsp;[<a href="#ref-Miller13">Miller, 2013</a>, <a href="#ref-AnglesABHRV17">Angles et al., 2017</a>]. A property graph allows a set of <em>property–value</em> pairs and a <em>label</em> to be associated with both nodes and edges. Figure&nbsp;<a href="#fig-pg">2.3b</a> depicts an example of a property graph with data analogous to Figure&nbsp;<a href="#fig-fsa">2.3a</a>. We use property–value pairs on edges to model the companies. The type of relation is captured by the label <code>flight</code>. We further use node labels to indicate the types of the two nodes, and property–value pairs for their latitude and longitude.</p>

		<figure id="fig-flghts">
			<figure id="fig-fsa">
				<img src="images/fig-fsa.svg" alt="Directed edge-labelled graph"/>
				<figcaption>Directed edge-labelled graph</figcaption>
			</figure>
			<figure id="fig-pg">
				<img src="images/fig-pg.svg" alt="Property graph"/>
				<figcaption>Property graph</figcaption>
			</figure>
			<figcaption>Comparing directed edge-labelled graphs and property graphs</figcaption>
		</figure>

		<p>Property graphs are prominently used in graph databases, such as Neo4j&nbsp;[<a href="#ref-Miller13">Miller, 2013</a>, <a href="#ref-AnglesABHRV17">Angles et al., 2017</a>]. Property graphs can be converted to/from directed edge-labelled graphs&nbsp;[<a href="#ref-HernandezHK15">Hernández et al., 2015</a>, <a href="#ref-AnglesTT19">Angles et al., 2019</a>] (per, e.g., Figure&nbsp;<a href="#fig-pg">2.3b</a>). In summary, directed edge-labelled graphs offer a more minimal model, while property graphs offer a more flexible one. Often the choice of model will be secondary to other practical factors, such as the implementations available for different models, etc.</p>

		<div class="formal">
			<p>We formally define a property graph.</p>

			<dl class="definition" id="def-pg">
				<dt>Property graph</dt>
				<dd>A <em>property graph</em> is a tuple \(G = (V,E,L,P,U,e,l,p)\), where \(V \subseteq \con\) is a set of node ids, \(E \subseteq \con\) is a set of edge ids, \(L \subseteq \con\) is a set of labels, \(P \subseteq \con\) is a set of properties, \(U \subseteq \con\) is a set of values, \(e : E \rightarrow V \times V\) maps an edge id to a pair of node ids, \(l : V \cup E \rightarrow 2^L\) maps a node or edge id to a set of labels, and \(p : V \cup E \rightarrow 2^{P \times U}\) maps a node or edge id to a set of property–value pairs.</dd>
			</dl>

			<div class="example">
				<p>Returning to Figure&nbsp;<a href="#fig-pg">2.3b</a>:</p>
				<ul>
					<li>the set \(V\) contains <code>Santiago</code> and <code>Arica</code>;</li>
					<li>the set \(E\) contains <code>LA380</code> and <code>LA381</code>;</li>
					<li>the set \(L\) contains <code>Capital City</code>, <code>Port City</code>, and <code>flight</code>;</li>
					<li>the set \(P\) contains <code>lat</code>, <code>long</code>, and <code>company</code>;</li>
					<li>the set \(U\) contains <code>–33.45</code>, <code>–70.66</code>, <code>LATAM</code>, <code>–18.48</code>, and <code>–70.33</code>;</li>
					<li>the mapping \(e\) gives, for example, \(e(\)<code>LA380</code>\() = (\)<code>Santiago</code>, <code>Arica</code>\()\);</li>
					<li>the mapping \(l\) gives, for example, \(l(\)<code>Santiago</code>\() =\{ \)<code>Capital City</code>\(\}\) and \(l(\)<code>LA380</code>\() =\{ \)<code>flight</code>\(\}\);</li>
					<li>the mapping \(p\) gives, for example, \(p(\)<code>LA380</code>\() =\{ (\)<code>company</code>, <code>LATAM</code>\() \}\) and \(p(\)<code>Santiago</code>\() =\{ (\)<code>lat</code>, <code>–33.45</code>\(), (\)<code>long</code>, <code>–70.66</code>\() \}\).</li>
				</ul>
			</div>

			<p>Unlike previous definitions&nbsp;[<a href="#ref-AnglesABHRV17">Angles et al., 2017</a>], we allow a node or edge to have several values for a given property. In practice, systems like Neo4j&nbsp;[<a href="#ref-Miller13">Miller, 2013</a>] may rather support this by allowing a single array (i.e., list) of values.</p>
		</div>

		<h4 id="subsub-graphdataset" class="subsection">Graph dataset</h4>
		<p>Although multiple directed edge-labelled graphs can be merged by taking their union, it is often desirable to manage several graphs rather than one monolithic graph; for example, it may be beneficial to manage multiple graphs from different sources, making it possible to update or refine data from one source, to distinguish untrustworthy sources from more trustworthy ones, and so forth. A graph dataset then consists of a set of <em>named graphs</em> and a <em>default graph</em>. Each named graph is a pair of a graph ID and a graph. The default graph is a graph without an ID, and is referenced “by default” if a graph ID is not specified. Figure&nbsp;<a href="#fig-gd">2.4</a> provides an example where events and routes are stored in two named graphs, and the default graph manages metadata about the named graphs. Graph names can also be used as nodes in a graph. Furthermore, nodes and edges can be repeated across graphs, where the same node in different graphs will typically refer to the same entity, allowing data on that entity to be integrated when merging graphs. Though the example depicts a dataset of directed edge-labelled graphs, the concept generalises straightforwardly to datasets of other types of graphs.</p>

		<figure id="fig-gd">
			<img src="images/fig-gd.svg" alt="Graph dataset with two named graphs and a default graph describing events and routes"/>
			<figcaption>Graph dataset based on directed edge-labelled graphs with two named graphs and a default graph describing events and routes</figcaption>
		</figure>

		<p>An RDF dataset is a graph dataset model standardised by the W3C&nbsp;[<a href="#ref-rdf11">Cyganiak et al., 2014</a>] where each graph is an RDF graph, and graph names can be blank nodes or IRIs. A prominent use-case for RDF datasets is to manage and query <em>Linked Data</em> composed of interlinked documents of RDF graphs spanning the Web. When dealing with Web data, tracking the source of data becomes of key importance&nbsp;[<a href="#ref-Dividino09">Dividino et al., 2009</a>, <a href="#ref-BonattiHPS11">Bonatti et al., 2011</a>, <a href="#ref-zimm-etal-2012-JWS">Zimmermann et al., 2012</a>]. We will discuss Linked Data later in Section&nbsp;<a href="#sec-identity">3.2</a> and further discuss provenance in Section&nbsp;<a href="#ssec-knowledgeContext">3.3</a>.</p>

		<div class="formal">
			<p>We more formally define a graph dataset. We assume that all data graphs featured in a given graph dataset follow the same model (directed edge-labelled graph, heterogeneous graph, property graph, etc.).</p>

			<dl class="definition" id="def-gd">
				<dt>Graph dataset</dt>
				<dd>A <em>named graph</em> is a pair \((n,G)\) where \(G\) is a data graph, and \(n \in \con\) is a graph name. A <em>graph dataset</em> is a pair \(D = (G_D,N)\) where \(G_D\) is a data graph called the <em>default graph</em> and \(N\) is either the empty set, or a set of named graphs \(\{ (n_1,G_1), \ldots, (n_k,G_k) \}\) (\(k &gt; 0\)) such that if \(i \neq j\) then \(n_i \neq n_j\) (for all \(1 \leq i \leq k\), \(1 \leq j \leq k\)).</dd>
			</dl>

			<div class="example">
				<p>Figure&nbsp;<a href="#fig-gd">2.4</a> provides an example of a directed edge-labelled graph dataset \(D\) consisting of two named graphs and a default graph. The default graph does not have a name associated with it. The two graph names are <code>Events</code> and <code>Routes</code>; these are also used as nodes in the default graph.</p>
			</div>
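			<p>A graph dataset per Definition&nbsp;<a href="#def-gd">2.4</a> might be sketched in Python as follows, with each graph given as a set of directed labelled edges. The class <code>GraphDataset</code> and its methods are hypothetical names, and the edges, including the <code>updated</code> metadata in the default graph, are illustrative values rather than the contents of Figure&nbsp;<a href="#fig-gd">2.4</a>.</p>

```python
class GraphDataset:
    """A default graph plus named graphs with pairwise distinct names."""

    def __init__(self, default_graph, named_graphs=()):
        self.default_graph = frozenset(default_graph)
        self.named_graphs = {}
        for name, graph in named_graphs:
            if name in self.named_graphs:
                # Enforces: if i != j then n_i != n_j.
                raise ValueError(f"duplicate graph name: {name}")
            self.named_graphs[name] = frozenset(graph)

    def graph(self, name=None):
        """Return the named graph, or the default graph if no name is given."""
        if name is None:
            return self.default_graph
        return self.named_graphs[name]

    def merge(self, names):
        """Union of the edges of the selected named graphs."""
        return frozenset().union(*(self.named_graphs[n] for n in names))


events = {("EID15", "venue", "Santa Lucía"), ("Santa Lucía", "city", "Santiago")}
routes = {("Santiago", "flight", "Arica"), ("Arica", "flight", "Santiago")}
# Graph names double as nodes in the default graph (hypothetical metadata).
default = {("Events", "updated", "2018-06-14"), ("Routes", "updated", "2018-04-03")}

dataset = GraphDataset(default, [("Events", events), ("Routes", routes)])
merged = dataset.merge(["Events", "Routes"])
```

			<p>Keeping the graphs separate until <code>merge</code> is called makes it possible to update or discard the data of one source without touching the others.</p>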
		</div>

		<h4 id="sssec-othergraphs" class="subsection">Other graph data models</h4>
		<p>The previous models are popular examples of graph representations. Other graph data models exist with <em>complex nodes</em> that may contain individual edges&nbsp;[<a href="#ref-AnglesG08">Angles and Gutierrez, 2008</a>, <a href="#ref-Hartig14">Hartig and Thompson, 2014</a>] or nested graphs&nbsp;[<a href="#ref-AnglesG08">Angles and Gutierrez, 2008</a>, <a href="#ref-n3">Berners-Lee and Connolly, 2011</a>] (sometimes called <em>hypernodes</em>&nbsp;[<a href="#ref-LeveneP89">Levene and Poulovassilis, 1989</a>]). Likewise the mathematical notion of a <em>hypergraph</em> defines <em>complex edges</em> that connect sets rather than pairs of nodes. In our view, a knowledge graph can adopt any such graph data model based on nodes and edges: often data can be converted from one model to another (see Figure&nbsp;<a href="#fig-fsa">2.3a</a> vs.&nbsp;Figure&nbsp;<a href="#fig-pg">2.3b</a>). In the rest of the book, we prefer discussing directed edge-labelled graphs given their relative succinctness, but most discussion extends naturally to other models.</p>

		<h4 id="sssec-graphstore" class="subsection">Graph stores</h4>
		<p>A variety of techniques have been proposed for storing and indexing graphs, facilitating the efficient evaluation of queries (as discussed next). Directed edge-labelled graphs can be stored in relational databases either as a single relation of arity three (<em>triple table</em>), as a binary relation for each property (<em>vertical partitioning</em>), or as \(n\)-ary relations for entities of a given type (<em>property tables</em>)&nbsp;[<a href="#ref-WylotHCS18">Wylot et al., 2018</a>]. Custom (so-called <em>native</em>) storage techniques have also been developed for a variety of graph models, providing efficient access for finding nodes, edges and their adjacent elements&nbsp;[<a href="#ref-AnglesG08">Angles and Gutierrez, 2008</a>, <a href="#ref-Miller13">Miller, 2013</a>, <a href="#ref-WylotHCS18">Wylot et al., 2018</a>]. A number of systems further allow for distributing graphs over multiple machines based on popular NoSQL stores or custom partitioning schemes&nbsp;[<a href="#ref-WylotHCS18">Wylot et al., 2018</a>, <a href="#ref-JankeS18">Janke and Staab, 2018</a>]. For further details we refer to the book chapter by <a href="#ref-JankeS18">Janke and Staab [2018]</a> and the survey by <a href="#ref-WylotHCS18">Wylot et al. [2018]</a> dedicated to this topic.</p>
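		<p>The difference between a triple table and vertical partitioning can be illustrated with a small sketch using Python's built-in <code>sqlite3</code> module. The table names (<code>triple</code>, <code>vp_venue</code>, etc.) and the sample edges are assumptions made for this example; real graph stores add indexes and encodings that are not shown here.</p>

```python
import sqlite3

# A few illustrative edges in the style of the running example.
edges = [
    ("EID15", "type", "Food Festival"),
    ("EID16", "type", "Food Festival"),
    ("EID15", "venue", "Santa Lucía"),
    ("EID16", "venue", "Sotomayor"),
    ("EID16", "venue", "Piscina Olímpica"),
    ("Santa Lucía", "city", "Santiago"),
]

conn = sqlite3.connect(":memory:")

# Triple table: a single relation of arity three.
conn.execute("CREATE TABLE triple (s TEXT, p TEXT, o TEXT, PRIMARY KEY (s, p, o))")
conn.executemany("INSERT INTO triple VALUES (?, ?, ?)", edges)

# Vertical partitioning: one binary relation per edge label.
# Building table names from labels is only safe for trusted labels.
labels = [row[0] for row in conn.execute("SELECT DISTINCT p FROM triple").fetchall()]
for label in labels:
    conn.execute(f'CREATE TABLE "vp_{label}" (s TEXT, o TEXT, PRIMARY KEY (s, o))')
    conn.execute(
        f'INSERT INTO "vp_{label}" SELECT s, o FROM triple WHERE p = ?', (label,)
    )

# The same lookup against both layouts: the venues of EID16.
venues_triple_table = conn.execute(
    "SELECT o FROM triple WHERE s = ? AND p = ? ORDER BY o", ("EID16", "venue")
).fetchall()
venues_partitioned = conn.execute(
    'SELECT o FROM "vp_venue" WHERE s = ? ORDER BY o', ("EID16",)
).fetchall()
```

		<p>Both queries return the same venues; the partitioned layout avoids filtering on the label at query time, at the cost of one table per edge label.</p>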
		</section>

		<section id="ssec-querying" class="section">
		<h3>Querying</h3>
		<p>A number of languages have been proposed for querying graphs&nbsp;[<a href="#ref-AnglesABHRV17">Angles et al., 2017</a>], including the SPARQL query language for RDF graphs&nbsp;[<a href="#ref-sparql11">Harris et al., 2013</a>]; and Cypher&nbsp;[<a href="#ref-FrancisGGLLMPRS18">Francis et al., 2018</a>], Gremlin&nbsp;[<a href="#ref-Rodriguez15">Rodriguez, 2015</a>], and G-CORE&nbsp;[<a href="#ref-AnglesABBFGLPPS18">Angles et al., 2018</a>] for querying property graphs. We refer to <a href="#ref-seifer19">Seifer et al. [2019]</a> for an investigation of the popularity of these languages. Underlying these query languages are some common primitives, including (basic) graph patterns, relational operators, path expressions, and more besides&nbsp;[<a href="#ref-AnglesABHRV17">Angles et al., 2017</a>]. We now describe these core features for querying graphs in turn, starting with basic graph patterns.</p>

		<h4 id="sssec-graphpatterns" class="subsection">Basic graph patterns</h4>
		<p>At the core of every structured query language for graphs lie <em>basic graph patterns</em>&nbsp;[<a href="#ref-ConsensM90">Consens and Mendelzon, 1990</a>, <a href="#ref-AnglesABHRV17">Angles et al., 2017</a>], which follow the same model as the data graph being queried (see Section&nbsp;<a href="#ssec-graphModels">2.1</a>), additionally allowing variables as terms.<sup class="fnmark" id="fnm2"><a href="#fn2">2</a></sup><span class="footnote" id="fn2"><sup><a href="#fnm2">note 2</a></sup> The terms of a directed edge-labelled graph are its nodes and edge-labels. The terms of a property graph are its ids, labels, properties, and values (as used on either edges or nodes).</span> Terms in basic graph patterns are thus divided into constants, such as <span class="gnode">Arica</span> or <span class="gelab">venue</span>, and variables, which we prefix with question marks, such as <span class="gvar">?event</span> or <span class="gelab" style="color: black">?rel</span>. A basic graph pattern is then evaluated against the data graph by generating mappings from the variables of the graph pattern to constants in the data graph such that the image of the graph pattern under the mapping (replacing variables with the assigned constants) is contained within the data graph.</p>
		<p>Figure&nbsp;<a href="#fig-gp">2.5</a> provide an example of a basic graph pattern looking for the venues of Food Festivals, along with the possible mappings generated by the graph pattern against the data graph of Figure&nbsp;<a href="#fig-delg">2.1</a>. In some of the presented mappings (the last two listed), multiple variables are mapped to the same term, which may or may not be desirable depending on the application. Hence a number of semantics have been proposed for evaluating basic graph patterns&nbsp;[<a href="#ref-AnglesABHRV17">Angles et al., 2017</a>], amongst which the most important are: <em>homomorphism-based semantics</em>, which allows multiple variables to be mapped to the same term such that all mappings shown in Figure&nbsp;<a href="#fig-gp">2.5</a> would be considered results; and <em>isomorphism-based semantics</em>, which requires variables on nodes and/or edges to be mapped to unique terms, thus excluding the latter three mappings of Figure&nbsp;<a href="#fig-gp">2.5</a> from the results. Different languages may adopt different semantics for evaluating basic graph patterns; for example, SPARQL adopts a homomorphism-based semantics, while Cypher adopts an isomorphism-based semantics specifically on edges (while allowing multiple variables to map to one node).</p>

		<figure id="fig-gp">
			<img src="images/fig-gp.svg" alt="Graph pattern" class="multi" />
			<div style="height:.5em;">&nbsp;</div>
			<table class="condensedTable">
				<thead>
					<tr>
						<th><span class="sf">?ev</span></th>
						<th><span class="sf">?vn1</span></th>
						<th><span class="sf">?vn2</span></th>
					</tr>
				</thead>
				<tbody>
					<tr>
						<td><code>EID16</code></td>
						<td><code>Piscina Olímpica</code></td>
						<td><code>Sotomayor</code></td>
					</tr>
					<tr>
						<td><code>EID16</code></td>
						<td><code>Sotomayor</code></td>
						<td><code>Piscina Olímpica</code></td>
					</tr>
					<tr>
						<td><code>EID16</code></td>
						<td><code>Piscina Olímpica</code></td>
						<td><code>Piscina Olímpica</code></td>
					</tr>
					<tr>
						<td><code>EID16</code></td>
						<td><code>Sotomayor</code></td>
						<td><code>Sotomayor</code></td>
					</tr>
					<tr>
						<td><code>EID15</code></td>
						<td><code>Santa Lucía</code></td>
						<td><code>Santa Lucía</code></td>
					</tr>
				</tbody>
			</table>
			<div style="height:.5em;">&nbsp;</div>
			<figcaption>Basic directed edge-labelled graph pattern (left) with mappings generated over the directed edge-labelled graph of Figure&nbsp;<a href="#fig-delg">2.1</a> (right)</figcaption>
		</figure>

		<p>As we will see in later examples (particularly Figure&nbsp;<a href="#fig-cgp">2.7</a>), basic graph patterns may also form cycles (be they directed or undirected), and may replace edge labels with variables. Basic graph patterns in the context of other models – such as property graphs – can be defined analogously by allowing variables to replace constants in any position of the model.</p>
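		<p>To make the evaluation procedure concrete, the following Python sketch enumerates candidate mappings naively and keeps those whose image is contained in the data graph. The triples in <code>GRAPH</code> are an assumed fragment, chosen to be consistent with the mappings tabulated in Figure&nbsp;<a href="#fig-gp">2.5</a>; the names <code>GRAPH</code>, <code>PATTERN</code>, <code>is_var</code> and <code>evaluate</code> are illustrative and are not taken from any particular query engine.</p>

```python
from itertools import product

# Assumed fragment of a data graph, as (subject, label, object) triples.
GRAPH = {
    ("EID15", "type", "Food Festival"),
    ("EID15", "venue", "Santa Lucía"),
    ("EID16", "type", "Food Festival"),
    ("EID16", "venue", "Piscina Olímpica"),
    ("EID16", "venue", "Sotomayor"),
}

# Basic graph pattern: terms starting with "?" are variables.
PATTERN = [
    ("?ev", "type", "Food Festival"),
    ("?ev", "venue", "?vn1"),
    ("?ev", "venue", "?vn2"),
]


def is_var(term):
    """A term is a variable if it is prefixed with a question mark."""
    return term.startswith("?")


def evaluate(pattern, graph):
    """Return every mapping whose image of the pattern is contained in graph."""
    variables = sorted({t for edge in pattern for t in edge if is_var(t)})
    constants = sorted({t for edge in graph for t in edge})
    results = []
    # Naive enumeration: try every assignment of constants to variables.
    for values in product(constants, repeat=len(variables)):
        mu = dict(zip(variables, values))
        image = {tuple(mu.get(t, t) for t in edge) for edge in pattern}
        if image.issubset(graph):
            results.append(mu)
    return results


mappings = evaluate(PATTERN, GRAPH)
for mu in mappings:
    print(mu)
```

		<p>Under these assumptions the sketch yields five mappings: one for <code>EID15</code> and four for <code>EID16</code>. Practical engines avoid such exhaustive enumeration by using joins and indexes.</p>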

		<div class="formal">
			<p>We formalise basic graph patterns first for directed edge-labelled graphs, and subsequently for property graphs&nbsp;[<a href="#ref-AnglesABHRV17">Angles et al., 2017</a>]. For these definitions, we introduce a countably infinite set of <em>variables</em> \(\var\) ranging over (but disjoint from: \(\con \cap \var = \emptyset\)) the set of constants. We refer generically to constants and variables as <em>terms</em>, denoted and defined as \(\term = \con \cup \var\). We define a basic graph pattern for a model by simply replacing constants with terms (that may be variables). Though we focus on directed edge-labelled graphs and property graphs, basic graph patterns for other graph models can be defined analogously.</p> 

			<dl class="definition" id="def-delgp">
				<dt>Basic directed edge-labelled graph pattern</dt>
				<dd>We define a <em>basic directed edge-labelled graph pattern</em> as a tuple \(Q = (V,E,L)\), where \(V \subseteq \term\) is a set of node terms, \(L \subseteq \term\) is a set of edge terms, and \(E \subseteq V \times L \times V\) is a set of edges (triple patterns).</dd>
			</dl>

			<div class="example">
				<p>Returning to the example of Figure&nbsp;<a href="#fig-gp">2.5</a>:</p>
				<ul>
					<li>the set \(V\) contains the constant <code>Food Festival</code> and variables <code>?ev</code>, <code>?vn1</code> and <code>?vn2</code>;</li>
					<li>the set \(E\) contains three edges, including \((\)<code>?ev</code>, <code>type</code>, <code>Food Festival</code>\()\);</li>
					<li>the set \(L\) contains the constants <code>type</code> and <code>venue</code>.</li>
				</ul>
			</div>

			<p>A basic property graph pattern is also defined by introducing variables.</p>

			<dl class="definition" id="def-pgp">
				<dt>Basic property graph pattern</dt>
				<dd>We define a <em>basic property graph pattern</em> as a tuple \(Q = (V,E,L,P,U,e,l,p)\), where \(V \subseteq \term\) is a set of node id terms, \(E \subseteq \term\) is a set of edge id terms, \(L \subseteq \term\) is a set of label terms, \(P \subseteq \term\) is a set of property terms, \(U \subseteq \term\) is a set of value terms, \(e : E \rightarrow V \times V\) maps an edge id term to a pair of node id terms, \(l : V \cup E \rightarrow 2^{L}\) maps a node or edge id term to a set of label terms, and \(p : V \cup E \rightarrow 2^{P \times U}\) maps a node or edge id term to a set of pairs of property–value terms.</dd>
			</dl>

			<p>Towards defining the results of evaluating a basic graph pattern over a data graph (following the same model), we first define a partial mapping \(\mu : \var \rightarrow \con\) from variables to constants, whose <em>domain</em> (the set of variables for which it is defined) is denoted by \(\dom(\mu)\). Given a basic graph pattern \(Q\), let \(\var(Q)\) denote the set of all variables appearing in (some recursively nested element of) \(Q\). We further denote by \(\mu(Q)\) the image of \(Q\) under \(\mu\), meaning that any variable \(v \in \var(Q) \cap \dom(\mu)\) is replaced in \(Q\) by \(\mu(v)\). Observe that when \(\var(Q) \subseteq \dom(\mu)\), then \(\mu(Q)\) is a data graph (in the corresponding model of \(Q\)).</p>
			<p>Next, we define the notion of containment between data graphs. For two directed edge-labelled graphs \(G_1 = (V_1,E_1,L_1)\) and \(G_2 = (V_2,E_2,L_2)\), we say that \(G_1\) is a <em>sub-graph</em> of \(G_2\), denoted \(G_1 \subseteq G_2\), if and only if \(V_1 \subseteq V_2\), \(E_1 \subseteq E_2\), and \(L_1 \subseteq L_2\).<sup class="fnmark" id="fnm3"><a href="#fn3">3</a></sup><span class="footnote" id="fn3"><sup><a href="#fnm3">note 3</a></sup> Given, for example, \(G_1 = (\{a\},\{(a,b,a)\},\{b,c\})\) and \(G_2 = (\{a,c\},\{(a,b,a)\},\{b\})\), we remark that \(G_1 \not\subseteq G_2\) and \(G_2 \not\subseteq G_1\): the former has a label not used on an edge while the latter has a node without an incident edge. In concrete data models like RDF where such cases of nodes or labels without edges cannot occur, the sub-graph relation \(G_1 \subseteq G_2\) holds if and only if \(E_1 \subseteq E_2\) holds.</span> Conversely, in property graphs, nodes can often be defined without edges. For two property graphs \(G_1 = (V_1,E_1,L_1,P_1,U_1,e_1,l_1,p_1)\) and \(G_2 = (V_2,E_2,L_2,P_2,U_2,e_2,l_2,p_2)\), we say that \(G_1\) is a <em>sub-graph</em> of \(G_2\), denoted \(G_1 \subseteq G_2\), if and only if \(V_1 \subseteq V_2\), \(E_1 \subseteq E_2\), \(L_1 \subseteq L_2\), \(P_1 \subseteq P_2\), \(U_1 \subseteq U_2\), for all \(x \in E_1\) it holds that \(e_1(x) = e_2(x)\), and for all \(y \in E_1 \cup V_1\) it holds that \(l_1(y) \subseteq l_2(y)\) and \(p_1(y) \subseteq p_2(y)\).</p>
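			<p>The sub-graph condition for directed edge-labelled graphs can be checked component by component. The following Python sketch does so for the two graphs given in the footnote; the function name <code>is_subgraph</code> and the encoding of a graph as a triple of Python sets are assumptions made for illustration.</p>

```python
def is_subgraph(g1, g2):
    """Check that nodes, edges and labels of g1 are each contained in g2."""
    v1, e1, l1 = g1
    v2, e2, l2 = g2
    return v1.issubset(v2) and e1.issubset(e2) and l1.issubset(l2)


# The two graphs from the footnote, written as (V, E, L).
G1 = ({"a"}, {("a", "b", "a")}, {"b", "c"})
G2 = ({"a", "c"}, {("a", "b", "a")}, {"b"})

print(is_subgraph(G1, G2))  # False: the label c of G1 is missing from G2
print(is_subgraph(G2, G1))  # False: the node c of G2 is missing from G1
```

			<p>Both graphs have the same edge set, so comparing edges alone would wrongly report containment in both directions.</p>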
			<p>We are now ready to define the evaluation of a basic graph pattern.</p>

			<dl class="definition" id="def-evgp">
				<dt>Evaluation of a basic graph pattern</dt>
				<dd>Let \(Q\) be a basic graph pattern and let \(G\) be a data graph (in the same model). We then define the <em>evaluation of the basic graph pattern \(Q\) over the data graph \(G\)</em>, denoted \(Q(G)\), to be the set of mappings \(Q(G) = \{ \mu \mid \mu(Q) \subseteq G \text{ and } \dom(\mu) = \var(Q) \}\).</dd>
			</dl>

			<div class="example">
				<p>Figure&nbsp;<a href="#fig-gp">2.5</a> enumerates all of the mappings given by the evaluation of the depicted basic graph pattern over the data graph of Figure&nbsp;<a href="#fig-delg">2.1</a>. Each non-header row indicates a mapping \(\mu\).</p>
			</div>

			<p>The final results of evaluating a basic graph pattern may vary depending on the choice of semantics: the results under <em>homomorphism-based semantics</em> are defined as \(Q(G)\). Conversely, under <em>isomorphism-based</em> semantics, mappings that send two edge variables to the same constant and/or mappings that send two node variables to the same constant may be excluded from the results. Henceforth we assume the more general <em>homomorphism-based semantics</em>.</p>
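			<p>As a rough illustration of the difference between the two semantics, the Python sketch below takes the five mappings tabulated in Figure&nbsp;<a href="#fig-gp">2.5</a> and filters out those that send two variables to the same constant. This is only one possible reading of an isomorphism-based restriction (all variables here are node variables); the helper name <code>is_injective</code> is hypothetical.</p>

```python
# The five mappings listed in the table of Figure 2.5.
MAPPINGS = [
    {"?ev": "EID16", "?vn1": "Piscina Olímpica", "?vn2": "Sotomayor"},
    {"?ev": "EID16", "?vn1": "Sotomayor", "?vn2": "Piscina Olímpica"},
    {"?ev": "EID16", "?vn1": "Piscina Olímpica", "?vn2": "Piscina Olímpica"},
    {"?ev": "EID16", "?vn1": "Sotomayor", "?vn2": "Sotomayor"},
    {"?ev": "EID15", "?vn1": "Santa Lucía", "?vn2": "Santa Lucía"},
]


def is_injective(mu):
    """True if no two variables are mapped to the same constant."""
    return len(set(mu.values())) == len(mu)


# Homomorphism-based semantics keeps every mapping.
homomorphic = list(MAPPINGS)
# The injective restriction drops mappings that reuse a constant.
injective = [mu for mu in MAPPINGS if is_injective(mu)]

print(len(homomorphic), len(injective))
```

			<p>With this filter only the first two mappings remain, matching the informal description of isomorphism-based semantics given earlier.</p>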
		</div>

		<h4 id="sssec-complexpatterns" class="subsection">Complex graph patterns</h4>
		<p>A (basic) graph pattern transforms an input graph into a table of results (as shown in Figure&nbsp;<a href="#fig-gp">2.5</a>). We may then consider using the relational algebra to combine and/or transform such tables, thus forming more complex queries from one or more graph patterns. Recall that the relational algebra consists of unary operators that accept one input table, and binary operators that accept two input tables. Unary operators include projection (\(\pi\)) to output a subset of columns, selection (\(\sigma\)) to output a subset of rows matching a given condition, and renaming of columns (\(\rho\)). Binary operators include union (\(\cup\)) to merge the rows of two tables into one table, difference (\(-\)) to remove the rows from the first table present in the second table, and joins (\(\Join\)) to extend the rows of one table with rows from the other table that satisfy a join condition. Selection and join conditions typically include equalities (\(=\)), inequalities (\(\leq\)), negation (\(\neg\)), disjunction (\(\vee\)), etc. From these operators, we can further define other (syntactic) operators, such as intersection (\(\cap\)) to output rows in both tables, anti-join (\(\rhd\), aka <em>minus</em>) to output rows from the first table for which there are no join-compatible rows in the second table, left-join (\(\mathbin{\rule[0ex]{0.3em}{.5pt}\llap{\rule[1ex]{0.3em}{.5pt}}\mkern-6mu\Join}\), aka <em>optional</em>) to perform a join but keeping rows from the first table without a compatible row in the second table, etc.</p>
		<p>Basic graph patterns can then be expressed in a subset of relational algebra (namely \(\pi\), \(\sigma\), \(\rho\), \(\Join\)). Assuming, for example, a single ternary relation \(G(s,p,o)\) representing a graph – i.e., a table \(G\) with three columns \(s\), \(p\), \(o\) – the query of Figure&nbsp;<a href="#fig-gp">2.5</a> can be expressed in relational algebra as:</p>
		
		<p class="mathblock">\(\pi_{ev,vn_1,vn_2}(\sigma_{p=\texttt{type} \wedge o=\texttt{Food Festival} \wedge p_1=p_2=\texttt{venue}}(\rho_{s/ev}(G \bowtie \rho_{p/p_1,o/vn_1}(G) \bowtie \rho_{p/p_2,o/vn_2}(G))))\)</p>
		
		<p>where \(\Join\) denotes a <em>natural join</em>, meaning that equality is checked across pairs of columns with the same name in both tables (here, the join is thus performed on the subject column \(s\)). The result of this query is a table with a column for each variable: \(ev, vn_1, vn_2\). However, not all queries using \(\pi, \sigma, \rho\) and \(\Join\) on \(G\) can be expressed as basic graph patterns; for example, we cannot choose which variables to project in a basic graph pattern, but rather must project all variables not fixed to a constant.</p>
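		<p>The relational algebra expression above can be traced step by step with a small Python sketch in which a table is a list of dictionaries. The rows of <code>G</code> are an assumed fragment consistent with the mappings of Figure&nbsp;<a href="#fig-gp">2.5</a>, and the helper names <code>rename</code>, <code>natural_join</code>, <code>select</code> and <code>project</code> are illustrative rather than part of any library.</p>

```python
# Assumed fragment of the ternary relation G(s, p, o).
G = [
    {"s": "EID15", "p": "type", "o": "Food Festival"},
    {"s": "EID15", "p": "venue", "o": "Santa Lucía"},
    {"s": "EID16", "p": "type", "o": "Food Festival"},
    {"s": "EID16", "p": "venue", "o": "Piscina Olímpica"},
    {"s": "EID16", "p": "venue", "o": "Sotomayor"},
]


def rename(table, renames):
    """rho: rename columns according to an old-name to new-name dictionary."""
    return [{renames.get(c, c): v for c, v in row.items()} for row in table]


def natural_join(left, right):
    """Join rows that agree on every column name they share."""
    out = []
    for a in left:
        for b in right:
            shared = set(a).intersection(b)
            if all(a[c] == b[c] for c in shared):
                out.append({**a, **b})
    return out


def select(table, condition):
    """sigma: keep the rows satisfying the condition."""
    return [row for row in table if condition(row)]


def project(table, columns):
    """pi: keep only the given columns."""
    return [{c: row[c] for c in columns} for row in table]


# Three copies of G joined on the shared subject column s.
joined = natural_join(
    natural_join(G, rename(G, {"p": "p1", "o": "vn1"})),
    rename(G, {"p": "p2", "o": "vn2"}),
)
selected = select(
    rename(joined, {"s": "ev"}),
    lambda r: r["p"] == "type"
    and r["o"] == "Food Festival"
    and r["p1"] == "venue"
    and r["p2"] == "venue",
)
result = project(selected, ["ev", "vn1", "vn2"])
for row in result:
    print(row)
```

		<p>On this fragment the final table has five rows over the columns <code>ev</code>, <code>vn1</code> and <code>vn2</code>, the same mappings as in the table of Figure&nbsp;<a href="#fig-gp">2.5</a>.</p>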
		<p>Graph query languages such as SPARQL&nbsp;[<a href="#ref-sparql11">Harris et al., 2013</a>] and Cypher&nbsp;[<a href="#ref-FrancisGGLLMPRS18">Francis et al., 2018</a>] allow the full use of relational operators over the results of graph patterns, giving rise to <em>complex graph patterns</em>&nbsp;[<a href="#ref-AnglesABHRV17">Angles et al., 2017</a>]. Figure&nbsp;<a href="#fig-cq">2.6</a> presents an example of a complex graph pattern with projected variables in bold, choosing particular variables to appear in the final results. In Figure&nbsp;<a href="#fig-cgp">2.7</a>, we give another example of a complex graph pattern looking for food festivals or drinks festivals not held in Santiago, optionally returning their start date and name (where available).</p>

		<figure id="fig-cq">
			<img src="images/fig-cq.svg" alt="Conjunctive query" class="multi" />
			<table class="condensedTable">
				<thead>
					<tr>
						<th><span class="sf">?name1</span></th>
						<th><span class="sf">?con</span></th>
						<th><span class="sf">?name2</span></th>
					</tr>
				</thead>
				<tbody>
					<tr>
						<td><code>Food Truck</code></td>
						<td><code>bus</code></td>
						<td><code>Food Truck</code></td>
					</tr>
					<tr>
						<td><code>Food Truck</code></td>
						<td><code>bus</code></td>
						<td><code>Food Truck</code></td>
					</tr>
					<tr>
						<td><code>Food Truck</code></td>
						<td><code>bus</code></td>
						<td><code>Ñam</code></td>
					</tr>
					<tr>
						<td><code>Food Truck</code></td>
						<td><code>flight</code></td>
						<td><code>Ñam</code></td>
					</tr>
					<tr>
						<td><code>Food Truck</code></td>
						<td><code>flight</code></td>
						<td><code>Ñam</code></td>
					</tr>
					<tr>
						<td><code>Ñam</code></td>
						<td><code>bus</code></td>
						<td><code>Food Truck</code></td>
					</tr>
					<tr>
						<td><code>Ñam</code></td>
						<td><code>flight</code></td>
						<td><code>Food Truck</code></td>
					</tr>
					<tr>
						<td><code>Ñam</code></td>
						<td><code>flight</code></td>
						<td><code>Food Truck</code></td>
					</tr>
				</tbody>
			</table>
			<figcaption>Complex graph pattern (left) with mappings generated over the graph of Figure&nbsp;<a href="#fig-delg">2.1</a> (right)</figcaption>
		</figure>

		<p>Complex graph patterns can give rise to duplicate results; for example, the first result in Figure&nbsp;<a href="#fig-cq">2.6</a> appears twice since <code>?city1</code> matches <code>Arica</code> and <code>?city2</code> matches <code>Viña del Mar</code> in one result, and vice-versa in the other. Query languages then offer two semantics: <em>bag semantics</em> preserves duplicates according to the multiplicity of the underlying mappings, while <em>set semantics</em> (typically invoked with a <code>DISTINCT</code> keyword) removes duplicates from the results.</p>
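		<p>The difference between the two semantics can be seen by counting the rows listed in Figure&nbsp;<a href="#fig-cq">2.6</a>. The Python sketch below is illustrative only: it uses <code>collections.Counter</code> to stand in for a bag and a plain <code>set</code> to stand in for the result of applying <code>DISTINCT</code>.</p>

```python
from collections import Counter

# Rows of Figure 2.6 as (?name1, ?con, ?name2) tuples, duplicates included.
ROWS = [
    ("Food Truck", "bus", "Food Truck"),
    ("Food Truck", "bus", "Food Truck"),
    ("Food Truck", "bus", "Ñam"),
    ("Food Truck", "flight", "Ñam"),
    ("Food Truck", "flight", "Ñam"),
    ("Ñam", "bus", "Food Truck"),
    ("Ñam", "flight", "Food Truck"),
    ("Ñam", "flight", "Food Truck"),
]

# Bag semantics: each distinct row keeps its multiplicity.
bag = Counter(ROWS)
# Set semantics: duplicates are removed.
distinct = set(ROWS)

print(sum(bag.values()), len(distinct))
for row, multiplicity in sorted(bag.items()):
    print(multiplicity, row)
```

		<p>The eight rows under bag semantics collapse to five distinct rows under set semantics.</p>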

		<figure id="fig-cgp">
			<img class="inlined" src="images/fig-cgp1.svg" alt="Complex graph pattern 1"/>
			<img class="inlined" src="images/fig-cgp2.svg" alt="Complex graph pattern 2"/>
			<img class="inlined" src="images/fig-cgp3.svg" alt="Complex graph pattern 3"/>
			<img class="inlined" src="images/fig-cgp4.svg" alt="Complex graph pattern 4"/>
			<img class="inlined" src="images/fig-cgp5.svg" alt="Complex graph pattern 5"/>
			<div><div style="display:inline;">\(Q := ((((Q_1 \cup Q_2) \rhd Q_3)\) \(\mathbin{\rule[0ex]{0.3em}{.5pt}\llap{\rule[1ex]{0.3em}{.5pt}}\mkern-6mu\Join}\) \(Q_4 )\) \(\mathbin{\rule[0ex]{0.3em}{.5pt}\llap{\rule[1ex]{0.3em}{.5pt}}\mkern-6mu\Join}\) \(Q_5),\qquad Q(G) =\) <table class="condensedTable" style="position:relative;top:.6em;display:inline-block;vertical-align:middle;"><thead><tr><th>?event</th><th>?start</th><th>?name</th></tr></thead><tbody><tr><td><code>EID16</code></td><td></td><td><code>Food Truck</code></td></tr></tbody></table></div></div>
			<figcaption>Complex graph pattern (\(Q\)) with mappings generated (\(Q(G)\)) over the graph of Figure&nbsp;<a href="#fig-delg">2.1</a> (\(G\))</figcaption>
		</figure>

		<div class="formal">
			<p>We now formally define complex graph patterns.</p>

			<dl class="definition" id="def-cgp">
				<dt>Complex graph pattern</dt>
				<dd><em>Complex graph patterns</em> are defined recursively, as follows:
					<ul>
						<li>If \(Q\) is a basic graph pattern, then \(Q\) is a <em>complex graph pattern</em>.</li>
						<li>If \(Q\) is a complex graph pattern, and \(\mathcal{V} \subseteq \var(Q)\), then \(\pi_\mathcal{V}(Q)\) is a <em>complex graph pattern</em>.</li>
						<li>If \(Q\) is a complex graph pattern, and \(R\) is a selection condition with Boolean and equality connectives (\(\wedge\), \(\vee\), \(\neg\), \(=\)) , then \(\sigma_R(Q)\) is a <em>complex graph pattern</em>.</li>
						<li>If both \(Q_1\) and \(Q_2\) are complex graph patterns, then \(Q_1 \Join Q_2\), \(Q_1 \cup Q_2\), \(Q_1 - Q_2\) and \(Q_1 \rhd Q_2\) are also <em>complex graph patterns</em>.</li>
					</ul>
				</dd>
			</dl>

			<p>We now define the evaluation of complex graph patterns. Given a mapping \(\mu\), for a set of variables \(\mathcal{V} \subseteq \var\) let \(\mu[\mathcal{V}]\) denote the mapping \(\mu'\) such that \(\dom(\mu') = \dom(\mu) \cap \mathcal{V}\) and \(\mu'(v) = \mu(v)\) for all \(v \in \dom(\mu')\) (in other words, \(\mu[\mathcal{V}]\) projects the variables \(\mathcal{V}\) from \(\mu\)). Letting \(R\) denote a Boolean selection condition and \(\mu\) a mapping, we denote by \(\mu \models R\) that \(\mu\) satisfies the Boolean condition. Finally, we define two mappings \(\mu_1\) and \(\mu_2\) to be <em>compatible</em>, denoted \(\mu_1 \sim \mu_2\), if and only if \(\mu_1(v) = \mu_2(v)\) for all \(v \in \dom(\mu_1) \cap \dom(\mu_2)\) (i.e., they map common variables to the same constant). We are now ready to provide the definition.</p>

			<dl class="definition" id="def-evalcgp">
				<dt>Complex graph pattern evaluation</dt>
				<dd>Given a complex graph pattern \(Q\), if \(Q\) is a basic graph pattern, then \(Q(G)\) is defined per Definition&nbsp;<a href="#def-evgp">2.7</a>. Otherwise, \(Q(G)\) is defined as follows:
				\begin{align*}
				 \pi_\mathcal{V}(Q)(G) = & \,\{ \mu[\mathcal{V}] \mid \mu \in Q(G) \} \\
				 \sigma_R(Q)(G) = & \, \{ \mu \mid \mu \in Q(G)\text{ and }\mu \models R\}\\
				 Q_1 \Join Q_2(G) = & \,\{ \mu_1 \cup \mu_2 \mid \mu_1 \in Q_1(G), \mu_2 \in Q_2(G)\text{ and }\mu_1 \sim \mu_2 \} \\
				 Q_1 \cup Q_2(G) = & \,\{ \mu \mid \mu \in Q_1(G)\text{ or } \mu \in Q_2(G) \} \\
				 Q_1 - Q_2(G) = & \,\{ \mu \mid \mu \in Q_1(G)\text{ and } \mu \notin Q_2(G) \} \\
				 Q_1 \rhd Q_2(G) = & \,\{ \mu \mid \mu \in Q_1(G)\text{ and }\nexists \mu_2 \in Q_2(G)\text{ such that }\mu \sim \mu_2 \}
				\end{align*}</dd>
			</dl>

			<p>Based on these operators, we can define some additional syntactic operators, such as the <em>left-join</em> (\(\mathbin{\rule[0ex]{0.3em}{.5pt}\llap{\rule[1ex]{0.3em}{.5pt}}\mkern-6mu\Join}\), aka <em>optional</em>):</p>
			<p>
			\begin{align*}
			 Q_1 \mathbin{\rule[0ex]{0.3em}{.5pt}\llap{\rule[1ex]{0.3em}{.5pt}}\mkern-6mu\Join} Q_2(G) = & \,(Q_1(G) \Join Q_2(G)) \cup (Q_1(G) \rhd Q_2(G))
			\end{align*}
			</p>
			<p>We call such operators <em>syntactic</em> as they do not add expressivity.</p>
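			<p>The operators above act on sets of mappings, which the following Python sketch models as lists of dictionaries. The inputs <code>q1</code> to <code>q5</code> are hypothetical evaluation results for five basic graph patterns (they are not read from Figure&nbsp;<a href="#fig-delg">2.1</a>), and <code>START15</code> is a placeholder value; they are chosen only so that the combined pattern has the same shape as the one in Figure&nbsp;<a href="#fig-cgp">2.7</a>.</p>

```python
def compatible(m1, m2):
    """Two mappings are compatible if they agree on their common variables."""
    return all(m1[v] == m2[v] for v in m1 if v in m2)


def dedupe(mappings):
    """Remove duplicate mappings (set semantics), keeping first occurrences."""
    seen, out = set(), []
    for m in mappings:
        key = frozenset(m.items())
        if key not in seen:
            seen.add(key)
            out.append(m)
    return out


def join(r1, r2):
    return dedupe([{**m1, **m2} for m1 in r1 for m2 in r2 if compatible(m1, m2)])


def union(r1, r2):
    return dedupe(r1 + r2)


def anti_join(r1, r2):
    return [m for m in r1 if not any(compatible(m, m2) for m2 in r2)]


def left_join(r1, r2):
    """Syntactic operator: a join, plus the left rows that found no partner."""
    return union(join(r1, r2), anti_join(r1, r2))


# Hypothetical results of evaluating five basic graph patterns.
q1 = [{"?event": "EID15"}, {"?event": "EID16"}]   # food festivals
q2 = [{"?event": "EID16"}]                        # drinks festivals
q3 = [{"?event": "EID15"}]                        # events held in Santiago
q4 = [{"?event": "EID15", "?start": "START15"}]   # placeholder start value
q5 = [{"?event": "EID15", "?name": "Ñam"},
      {"?event": "EID16", "?name": "Food Truck"}]

result = left_join(left_join(anti_join(union(q1, q2), q3), q4), q5)
print(result)
```

			<p>With these inputs the only result binds <code>?event</code> and <code>?name</code> and leaves <code>?start</code> unbound, which is how the left-join retains a row that has no compatible partner.</p>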

			<div class="example">
				<p>Figure&nbsp;<a href="#fig-cgp">2.7</a> illustrates a complex graph pattern and its evaluation.</p>
			</div>
		</div>

		<h4 id="sssec-navpatterns" class="subsection">Navigational graph patterns</h4>
		<p>A key feature that distinguishes graph query languages is the ability to include <em>path expressions</em> in queries. A path expression \(r\) is a regular expression that allows for matching arbitrary-length paths between two nodes using a <em>regular path query</em> \((x,r,y)\), where \(x\) and \(y\) can be variables or constants (or even the same term). The base path expression is where \(r\) is a constant (an edge label). Furthermore, if \(r\) is a path expression, then \(r^*\) (<em>Kleene star</em>: zero-or-more) is also a path expression. Finally, if \(r_1\) and \(r_2\) are path expressions, then \(r_1 \mid r_2\) (<em>disjunction</em>) and \(r_1 \cdot r_2\) (<em>concatenation</em>) are also path expressions. A related notion is that of <em>2-way regular path queries</em>, which also allow for querying inverse paths; specifically, if \(r\) is a path expression, then it is a <em>2-way path expression</em>, and if \(r\) is a <em>2-way path expression</em>, then \(r^-\) (<em>inverse</em>) is a <em>2-way path expression</em>. Henceforth we will refer generically to both the 1-way and 2-way variants as path expressions and regular path queries.</p>
		<p>Regular path queries can be evaluated under a number of different semantics. For example, \((\)<code>Arica</code>, <code>bus*</code>, <code>?city</code>\()\) evaluated against the graph of Figure&nbsp;<a href="#fig-delg">2.1</a> may match the paths shown in Figure&nbsp;<a href="#fig-path">2.8</a>. In fact, since a cycle is present, an infinite number of paths are potentially matched. For this reason, restricted semantics are often applied, returning only the shortest paths, or paths without repeated nodes or edges (as in the case of Cypher).<sup class="fnmark" id="fnm4"><a href="#fn4">4</a></sup><span class="footnote" id="fn4"><sup><a href="#fnm4">note 4</a></sup> Mapping variables to paths requires special treatment&nbsp;[<a href="#ref-AnglesABHRV17">Angles et al., 2017</a>]. Cypher&nbsp;[<a href="#ref-FrancisGGLLMPRS18">Francis et al., 2018</a>] returns a string that encodes a path, upon which certain functions such as <code>length(·)</code> can be applied. G-CORE&nbsp;[<a href="#ref-AnglesABBFGLPPS18">Angles et al., 2018</a>], on the other hand, allows for returning paths, and supports additional operators on them, including projecting them as graphs, applying cost functions, and more besides.</span> Rather than returning paths, another option is to instead return the (finite) set of pairs of nodes connected by a matching path (as in the case of SPARQL&nbsp;1.1).</p>

		<figure id="fig-path">
			<img class="inlined" src="images/fig-path1.svg" alt="Path matching 1"/>
			<img class="inlined" src="images/fig-path2.svg" alt="Path matching 2"/>
			<img class="inlined" src="images/fig-path3.svg" alt="Path matching 3"/>
			<img class="inlined" src="images/fig-path4.svg" alt="Path matching 4"/>
			<span style="margin-left:2em;">⋯</span>
			<figcaption>Example paths matching \((\)<code>Arica</code>, <code>bus*</code>, <code>?city</code>\()\) over the graph of Figure&nbsp;<a href="#fig-delg">2.1</a></figcaption>
		</figure>

		<p>Regular path queries can then be used in basic graph patterns to express <em>navigational graph patterns</em>&nbsp;[<a href="#ref-AnglesABHRV17">Angles et al., 2017</a>], as shown in Figure&nbsp;<a href="#fig-ngp">2.9</a>, which illustrates a query searching for food festivals in cities reachable (recursively) from Arica by bus or flight. Furthermore, when regular path queries and graph patterns are combined with operators such as projection, selection, union, difference, and optional, the result is known as <em>complex navigational graph patterns</em>&nbsp;[<a href="#ref-AnglesABHRV17">Angles et al., 2017</a>].</p>

		<div class="formal">
			<p>We first define path expressions and regular path queries.</p>

			<dl class="definition" id="def-path-expression">
				<dt>Path expression</dt>
				<dd>A constant (edge label) \(c\) is a <em>path expression</em>. Furthermore, if \(r\), \(r_1\) and \(r_2\) are path expressions, then:
					<ul>
						<li>\(r^-\) (<em>inverse</em>) and \(r^*\) (<em>Kleene star</em>) are <em>path expressions</em>.</li>
						<li>\(r_1 \cdot r_2\) (<em>concatenation</em>) and \(r_1 \mid r_2\) (<em>disjunction</em>) are <em>path expressions</em>.</li>
					</ul>
				</dd>
			</dl>
		
			<p>We now define the evaluation of a path expression on a directed edge-labelled graph under the SPARQL 1.1-style semantics whereby the endpoints (pairs of start and end nodes) of the path are returned&nbsp;[<a href="#ref-sparql11">Harris et al., 2013</a>].</p>

			<dl class="definition" id="def-path-expression-evaluation">
				<dt>Path evaluation (directed edge-labelled graph)</dt>
				<dd>Given a directed edge-labelled graph \(G = (V,E,L)\) and a path expression \(r\), we define the <em>evaluation of \(r\) over \(G\)</em>, denoted \(r[G]\), as follows:
				\begin{align*}
				r[G] = &\, \{ (u,v) \mid (u,r,v) \in E \} \,(\text{for }r \in \con) \\
				r^-[G] = &\, \{ (u,v) \mid (v,u) \in r[G] \} \\
				r_1 \mid r_2[G] = &\, r_1[G] \cup r_2[G] \\
				r_1 \cdot r_2[G] = &\, \{ (u,v) \mid \exists w \in V : (u,w) \in r_1[G]\text{ and }(w,v) \in r_2[G]\}\\
				r^*[G] = &\, \{ (u,u) \mid u \in V \} \cup \bigcup_{n \in \mathbb{N}^+} r^n[G]
				\end{align*}
				where by \(r^n\) we denote the \(n\)<sup>th</sup>-concatenation of \(r\) (e.g., \(r^3 = r \cdot r \cdot r\)).</dd>
			</dl>
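			<p>The endpoint-based evaluation of path expressions can be sketched recursively in Python. Here a path expression is encoded either as a string (an edge label) or as a tuple tagged with <code>"inv"</code>, <code>"star"</code>, <code>"alt"</code> or <code>"cat"</code>; this encoding, the function name <code>evaluate_path</code>, and the small graph in <code>NODES</code> and <code>EDGES</code> are all assumptions made for illustration. The Kleene star is computed as a fixpoint, which terminates because the set of node pairs is finite.</p>

```python
def evaluate_path(r, nodes, edges):
    """Return the set of (start, end) node pairs connected by a path matching r."""
    if isinstance(r, str):  # base case: a constant edge label
        return {(u, v) for (u, label, v) in edges if label == r}
    op = r[0]
    if op == "inv":
        return {(v, u) for (u, v) in evaluate_path(r[1], nodes, edges)}
    if op == "alt":
        return evaluate_path(r[1], nodes, edges) | evaluate_path(r[2], nodes, edges)
    if op == "cat":
        left = evaluate_path(r[1], nodes, edges)
        right = evaluate_path(r[2], nodes, edges)
        return {(u, v) for (u, w) in left for (x, v) in right if w == x}
    if op == "star":
        step = evaluate_path(r[1], nodes, edges)
        closure = {(u, u) for u in nodes}  # zero-length paths
        frontier = set(closure)
        while frontier:
            # Extend the most recently found pairs by one more match of r.
            extended = {(u, v) for (u, w) in frontier for (x, v) in step if w == x}
            frontier = extended - closure
            closure |= frontier
        return closure
    raise ValueError("unknown path operator: " + str(op))


# Hypothetical graph with a bus cycle and a flight connection.
NODES = {"Arica", "Santiago", "Viña del Mar"}
EDGES = {
    ("Arica", "bus", "Viña del Mar"),
    ("Viña del Mar", "bus", "Arica"),
    ("Arica", "flight", "Santiago"),
    ("Santiago", "flight", "Arica"),
}

bus_star = evaluate_path(("star", "bus"), NODES, EDGES)
print(sorted(v for (u, v) in bus_star if u == "Arica"))
```

			<p>Although the bus cycle admits infinitely many matching paths, the set of endpoint pairs is finite, so the result for <code>bus*</code> from <code>Arica</code> contains just <code>Arica</code> and <code>Viña del Mar</code>.</p>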

			<p>The evaluation of a path expression on a property graph \(G = (V,E,L,P,U,e,l,p)\) can be defined analogously by adapting the first definition (in the case that \(r \in \con\)) as follows:</p>
			<p>\[ r[G] = \{(u,v) \mid \exists x \in E : e(x) = (u,v)\text{ and }r \in l(x) \} \,.\]</p>
			<p>The rest of the definitions then remain unchanged.</p>
			<p>Query languages may support additional operators, some of which are syntactic (e.g., \(r^+\) is sometimes used for one-or-more, but can be rewritten as \(r \cdot r^*\)), while others may add expressivity such as the case of SPARQL&nbsp;[<a href="#ref-sparql11">Harris et al., 2013</a>], which allows a limited form of negation in expressions (e.g., \(!r\), with \(r\) being a constant or the inverse of a constant, matching any path not labelled \(r\)).</p>
			<p>Next we define a regular path query and its evaluation.</p>

			<dl class="definition" id="def-regular-path-query">
				<dt>Regular path query</dt>
				<dd>A <em>regular path query</em> is a triple \((x,r,y)\) where \(x,y \in \con \cup \var\) and \(r\) is a path expression.</dd>
			</dl>

			<dl class="definition" id="def-regular-path-query-evaluation">
				<dt>Regular path query evaluation</dt>
				<dd>Let \(G\) denote a directed edge-labelled graph, \(c\), \(c_1\), \(c_2 \in \con\) denote constants and \(z\), \(z_1\), \(z_2 \in \var\) denote variables. Then the <em>evaluation of a regular path query</em> is defined as follows:
				\begin{align*}
				(c_1,r,c_2)(G) = & \{ \mu_\emptyset \mid (c_1,c_2) \in r[G] \} \\
				(c,r,z)(G) = & \{ \mu \mid \dom(\mu) = \{ z \}\text{ and }(c,\mu(z)) \in r[G] \} \\
				(z,r,c)(G) = & \{ \mu \mid \dom(\mu) = \{ z \}\text{ and }(\mu(z),c) \in r[G] \} \\
				(z_1,r,z_2)(G) = & \{ \mu \mid \dom(\mu) = \{ z_1, z_2 \}\text{ and }(\mu(z_1),\mu(z_2)) \in r[G] \}
				\end{align*}
				where \(\mu_\emptyset\) denotes the empty mapping such that \(\dom(\mu_\emptyset) = \emptyset\) (the join identity).</dd>
			</dl>
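			<p>The four cases of this definition can be handled uniformly, as in the Python sketch below, which turns a set of endpoint pairs into mappings. The set <code>PAIRS</code> is a hypothetical evaluation of a path expression such as <code>bus*</code>, and the names <code>is_var</code> and <code>evaluate_rpq</code> are illustrative. The empty mapping is represented by an empty dictionary.</p>

```python
def is_var(term):
    """A term is a variable if it is prefixed with a question mark."""
    return term.startswith("?")


def evaluate_rpq(x, pairs, y):
    """Evaluate the regular path query (x, r, y) given the endpoint pairs of r."""
    results = []
    for (u, v) in sorted(pairs):
        mu, ok = {}, True
        for term, node in ((x, u), (y, v)):
            if is_var(term):
                # Bind the variable, or check it against an earlier binding.
                if mu.setdefault(term, node) != node:
                    ok = False
            elif term != node:
                ok = False  # a constant must match the endpoint exactly
        if ok and mu not in results:
            results.append(mu)
    return results


# Hypothetical endpoint pairs for a path expression such as bus*.
PAIRS = {
    ("Arica", "Arica"),
    ("Arica", "Viña del Mar"),
    ("Viña del Mar", "Arica"),
    ("Viña del Mar", "Viña del Mar"),
    ("Santiago", "Santiago"),
}

print(evaluate_rpq("Arica", PAIRS, "?city"))
print(evaluate_rpq("Arica", PAIRS, "Arica"))
print(evaluate_rpq("Arica", PAIRS, "Santiago"))
```

			<p>When both endpoints are constants the result is either a list holding only the empty mapping (the pair is connected) or an empty list (it is not), mirroring the first case of the definition.</p>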

			<dl class="definition" id="def-navigational-graph-pattern">
				<dt>Navigational graph pattern</dt>
				<dd>If \(Q\) is a basic graph pattern, then \(Q\) is a <em>navigational graph pattern</em>. If \(Q\) is a navigational graph pattern and \((x,r,y)\) is a regular path query, then \(Q \Join (x,r,y)\) is a <em>navigational graph pattern</em>.</dd>
			</dl>

			<p>The definition of the evaluation of a navigational graph pattern then follows from the previous definition of a join and the definition of the evaluation of a regular path query (for a directed edge-labelled graph or a property graph, respectively). Likewise, <em>complex navigational graph patterns</em> – and their evaluation – are defined by extending this definition in the natural way with the same operators from Definition&nbsp;<a href="#def-cgp">2.8</a> following the same semantics seen in Definition&nbsp;<a href="#def-evalcgp">2.9</a>.</p>
		</div>

		<figure id="fig-ngp">
			<img src="images/fig-ngp.svg" alt="Navigational graph pattern" class="multi" />
			<div style="height:2em;">&nbsp;</div>
			<table class="condensedTable">
				<thead>
					<tr>
						<th><span class="sf">?event</span></th>
						<th><span class="sf">?name</span></th>
						<th><span class="sf">?city</span></th>
					</tr>
				</thead>
				<tbody>
					<tr>
						<td><code>EID15</code></td>
						<td><code>Ñam</code></td>
						<td><code>Santiago</code></td>
					</tr>
					<tr>
						<td><code>EID16</code></td>
						<td><code>Food Truck</code></td>
						<td><code>Arica</code></td>
					</tr>
					<tr>
						<td><code>EID16</code></td>
						<td><code>Food Truck</code></td>
						<td><code>Viña del Mar</code></td>
					</tr>
				</tbody>
			</table>
			<div style="height:1em;">&nbsp;</div>
			<figcaption>Navigational graph pattern (left) with mappings generated over the graph of Figure&nbsp;<a href="#fig-delg">2.1</a> (right)</figcaption>
		</figure>

		<h4 id="app-qother" class="subsection">Other features</h4>
		<p>Thus far, we have discussed features that form the practical and theoretical foundation of any query language for graphs&nbsp;[<a href="#ref-AnglesABHRV17">Angles et al., 2017</a>]. However, specific query languages for graphs may support other features, such as aggregation (<code>GROUP BY</code>, <code>COUNT</code>, etc.), more complex filters and datatype operators (e.g., range queries on years extracted from a date), federation for querying remotely hosted graphs over the Web, languages for updating graphs, support for entailment, etc. For more information, we refer to the documentation of the respective query languages (e.g.,&nbsp;[<a href="#ref-sparql11">Harris et al., 2013</a>, <a href="#ref-AnglesABBFGLPPS18">Angles et al., 2018</a>]) and to the survey by&nbsp;<a href="#ref-AnglesABHRV17">Angles et al. [2017]</a>.</p>

		<h4 id="app-quis" class="subsection">Query interfaces</h4>
		<p>Knowledge graphs are often queried by non-expert users who may not be able to express their information needs in terms of a particular graph query language. Different types of interfaces have thus been proposed in order to assist users in querying data graphs. Such interfaces may support, for example:</p>

		<dl>
			<dt>Faceted browsing:</dt>
			<dd>Users start by specifying a simple search, such as a keyword search, a type of node like <code>Food Festival</code>, or possibly other kinds of search. They are then presented with a set of matching results, and a set of facets, which are typically attributes (e.g., <code>venue</code>) and values (e.g., <code>Santa Lucía</code>) present in the current results set. Selecting a value for a facet restricts the current results set to include only results with the indicated value; this selection process can be applied iteratively to restrict results per multiple facets. Often the faceted criteria are translated into and evaluated as graph queries. Though relatively intuitive for users, such systems typically support acyclic queries that generate lists of results (analogous to graph queries that project a single variable), and rarely support more expressive queries. Examples of faceted browsing systems for graphs include VisiNav&nbsp;[<a href="#ref-Harth10">Harth, 2010</a>], Broccoli&nbsp;[<a href="#ref-BastB13">Bast and Buchhold, 2013</a>], SemFacet&nbsp;[<a href="#ref-ArenasGKMZ16">Arenas et al., 2016</a>], GraFa&nbsp;[<a href="#ref-Moreno-VegaH18">Moreno-Vega and Hogan, 2018</a>], etc.</dd>
			<dt>Query building:</dt>
			<dd>Users are provided with a form or graphical interface that can be used to specify a graph query without needing to understand the syntax of a specific query language. Such query builders allow for incrementally adding nodes or edges to the query, assisted by features such as auto-completion, previewing intermediate results, and graph navigation. Query builders typically allow for expressing queries equivalent to (cyclic) basic graph patterns, but may not support more expressive features of query languages as described herein. Graph query builder systems include Smeagol&nbsp;[<a href="#ref-ClemmerD11">Clemmer and Davies, 2011</a>], QueryVOWL&nbsp;[<a href="#ref-HaagLSE15a">Haag et al., 2015</a>], VIIQ&nbsp;[<a href="#ref-JayaramGL15">Jayaram et al., 2015a</a>], Sparklis&nbsp;[<a href="#ref-Ferre17">Ferré, 2017</a>], RDF Explorer&nbsp;[<a href="#ref-VargasAHL19">Vargas et al., 2019</a>], and more besides.</dd>
			<dt>Query-by-example:</dt>
			<dd>Users provide examples of positive and sometimes negative answers to their queries. For example, they may provide as positive examples the nodes <span class="gnode">Arica</span>, <span class="gnode">Santiago</span>, <span class="gnode">Viña del Mar</span>, and as negative examples the nodes <span class="gnode">Chile</span>, <span class="gnode">Lima</span>, where the system will then “reverse engineer” a query that returns positive examples but not negative examples (in this case, the query proposed may return nodes of type <code>City</code> whose <code>country</code> is <code>Chile</code>). Query-by-example systems typically support basic graph patterns, and may not support more expressive querying features. They are useful in cases where users have examples of what they are looking for, but are not necessarily sure of the query they need to retrieve similar examples. Query-by-example systems for graphs include GQBE&nbsp;[<a href="#ref-JayaramKLYE15">Jayaram et al., 2015b</a>] and SPARQLByE&nbsp;[<a href="#ref-DiazAB16">Diaz et al., 2016</a>].</dd>
			<dt>Question answering:</dt>
			<dd>Users express their queries as questions in natural language; for example, they might ask “<em>What food festivals will be held in Arica?</em>”. The question answering system will then generate answers from the graph based on its best interpretation of the question. We identify three types of question answering system. <em>Navigation-based systems</em> identify entities/nodes from the graph that are mentioned in the query, and then attempt to navigate edges from those nodes whose labels best match the question; for example, they may match the nodes <span class="gnode">Food Festival</span> and <span class="gnode">Arica</span> in the graph based on the question, and from there, try to navigate edges in the graph whose labels match the question in order to find answers. <em>Template-based systems</em> rather pre-suppose a fixed list of question templates expressed in the query language, with placeholder variables that will be replaced with entities/nodes detected in the question; a template matched for the previous example may be of the form “<em>What <code>X</code> will be held in <code>Y</code>?</em>”. <em>Translation-based systems</em> attempt to translate the question into a query in the structured query language, using (typically neural) machine translation techniques. The latter two types of question answering systems can additionally return a graph query that explains the answers generated. Question answering systems are often very intuitive to use, but may not always return correct results, particularly when considering complex questions/queries. Examples of question answering systems for knowledge graphs include Treo&nbsp;[<a href="#ref-FreitasOOCS11a">Freitas et al., 2011</a>], NFF&nbsp;[<a href="#ref-Hu0YWZ18">Hu et al., 2018</a>], TemplateQA&nbsp;[<a href="#ref-ZhengYZC18">Zheng et al., 2018</a>], WDAqua-core1&nbsp;[<a href="#ref-DiefenbachBSM20">Diefenbach et al., 2020</a>], and more besides.</dd>
		</dl>

		<p>Such query interfaces enable non-expert users to formulate queries over graphs, which in turn broadens the potential impact of knowledge graphs.</p>
		</section>
	</section>
	<section id="chap-knowledge" class="chapter">
		<h2>Schema, Identity, Context</h2>
		<p>In this chapter we describe extensions of the data graph – relating to schema, identity and context – that provide additional structures for accumulating knowledge. Henceforth, we refer to a <em>data graph</em> as a collection of data represented as nodes and edges using one of the models discussed in Chapter&nbsp;<a href="#chap-graph">2</a>. We refer to a <em>knowledge graph</em> as a data graph potentially enhanced with representations of schema, identity, context, ontologies and/or rules. These additional representations may be embedded in the data graph, or layered above. Representations for schema, identity and context are discussed now, while ontologies and rules will be discussed in Chapter&nbsp;<a href="#chap-deductive">4</a>.</p>

		<section id="sec-schema" class="section">
		<h3>Schema</h3>
		<p>One of the benefits of modelling data as graphs – versus, for example, the relational model – is the option to forgo or postpone the definition of a schema. However, when modelling data as graphs, schemata <em>can</em> be used to prescribe a high-level structure and/or semantics that the graph follows or should follow. We discuss three types of graph schemata: <em>semantic</em>, <em>validating</em>, and <em>emergent</em>.</p>

		<h4 id="sec-semSchema" class="subsection">Semantic schema</h4>
		<p>A semantic schema allows for defining the meaning of high-level terms (aka <em>vocabulary</em> or <em>terminology</em>) used in the graph, which facilitates reasoning over graphs using those terms. Looking at Figure&nbsp;<a href="#fig-delg">2.1</a>, for example, we may notice some natural groupings of nodes based on the types of entities to which they refer. We may thus decide to define <em>classes</em>, such as <code>Event</code>, <code>City</code>, etc., to denote these groupings. In fact, Figure&nbsp;<a href="#fig-delg">2.1</a> already illustrates three low-level classes – <code>Open Market</code>, <code>Food Market</code>, <code>Drinks Festival</code> – grouping similar entities with an edge labelled <span class="gelab">type</span>. We may subsequently wish to capture some relations between some of these classes. In Figure&nbsp;<a href="#fig-classhier">3.1</a>, we present a class hierarchy for events where children are defined to be <em>sub-classes</em> of their parents such that if we find an edge <span class="gnode">EID15</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Food&nbsp;Festival</span> in our graph, we may also <em>infer</em> that <span class="gnode">EID15</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Festival</span> and <span class="gnode">EID15</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Event</span> hold in the graph.</p>

		<figure id="fig-classhier">
			<img src="images/fig-classhier.svg" alt="Example class hierarchy for Event"/>
			<figcaption>Example class hierarchy for <code>Event</code></figcaption>
		</figure>

		<p>Aside from classes, we may also wish to define the semantics of edge labels, aka <em>properties</em>. Returning to Figure&nbsp;<a href="#fig-delg">2.1</a>, we may consider that the properties <span class="gelab">city</span> and <span class="gelab">venue</span> are <em>sub-properties</em> of a more general property <span class="gelab">location</span>, such that given an edge <span class="gnode">Santa&nbsp;Lucía</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">city</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santiago</span>, for example, we may also infer that <span class="gnode">Santa&nbsp;Lucía</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">location</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santiago</span> must hold as an edge in the graph. We may also consider, for example, that <span class="gelab">bus</span> and <span class="gelab">flight</span> are both sub-properties of a more general property <span class="gelab">connects&nbsp;to</span>. Along these lines, properties may also form a hierarchy similar to what we saw for classes. We may further define the <em>domain</em> of properties, indicating the class(es) of entities for nodes from which edges with that property extend; for example, we may define that the domain of <span class="gelab">connects&nbsp;to</span> is a class <code>Place</code>, such that given the previous sub-property relations, we infer <span class="gnode">Arica</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Place</span>. 
Conversely, we may define the <em>range</em> of properties, indicating the class(es) of entities for nodes to which edges with that property extend; for example, we may define that the range of <span class="gelab">city</span> is a class <code>City</code>, inferring that <span class="gnode">Arica</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">City</span>.</p>
		<p>A prominent standard for defining a semantic schema for (RDF) graphs is the <em>RDF Schema</em> (<em>RDFS</em>) standard&nbsp;[<a href="#ref-RDFS">Brickley and Guha, 2014</a>], which allows for defining sub-classes, sub-properties, domains, and ranges amongst the classes and properties used in an RDF graph, where such definitions can be serialised as a graph. We illustrate the semantics of these features in Table&nbsp;<a href="#tab-semSchema">3.1</a> and provide a concrete example of definitions in Figure&nbsp;<a href="#fig-sg">3.2</a> for a sample of terms used in the running example. These definitions can then be embedded into a data graph. More generally, the semantics of terms used in a graph can be defined in much more depth than seen here, as is supported by the <em>Web Ontology Language</em> (<em>OWL</em>) standard&nbsp;[<a href="#ref-OWL2">Hitzler et al., 2012</a>] for RDF graphs. We will return to such semantics later in Chapter&nbsp;<a href="#chap-deductive">4</a>.</p>

		<table class="normalTable" id="tab-semSchema">
			<caption>Definitions for sub-class, sub-property, domain and range</caption>
			<thead>
				<tr>
					<th>Feature</th>
					<th>Definition</th>
					<th>Condition</th>
					<th>Example</th>
				</tr>
			</thead>
			<tbody>
				<tr>
					<td>Subclass</td>
					<td><span class="gnode">\(c\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">subc.&nbsp;of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(d\)</span></td>
					<td><span class="gnode">\(x\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(c\)</span> implies <span class="gnode">\(x\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(d\)</span></td>
					<td><span class="gnode">City</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">subc.&nbsp;of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Place</span></td>
				</tr>
				<tr>
					<td>Subproperty</td>
					<td><span class="gnode">\(p\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">subp.&nbsp;of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(q\)</span></td>
					<td><span class="gnode">\(x\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">\(p\)</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(y\)</span> implies <span class="gnode">\(x\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">\(q\)</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(y\)</span></td>
					<td><span class="gnode">venue</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">subp.&nbsp;of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">location</span></td>
				</tr>
				<tr>
					<td>Domain</td>
					<td><span class="gnode">\(p\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">domain</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(c\)</span></td>
					<td><span class="gnode">\(x\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">\(p\)</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(y\)</span> implies <span class="gnode">\(x\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(c\)</span></td>
					<td><span class="gnode">venue</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">domain</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Event</span></td>
				</tr>
				<tr>
					<td>Range</td>
					<td><span class="gnode">\(p\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">range</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(c\)</span></td>
					<td><span class="gnode">\(x\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">\(p\)</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(y\)</span> implies <span class="gnode">\(y\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(c\)</span></td>
					<td><span class="gnode">venue</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">range</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Venue</span></td>
				</tr>
			</tbody>
		</table>
		<figure id="fig-sg">
			<img src="images/fig-sg.svg" alt="Example schema graph describing sub-classes, sub-properties, domains, and ranges"/>
			<figcaption>Example schema with sub-classes, sub-properties, domains, and ranges</figcaption>
		</figure>

		<p>Semantic schemata are typically defined for incomplete graph data, where the absence of an edge between two nodes, such as <span class="gnode">Viña&nbsp;del&nbsp;Mar</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Arica</span>, does not mean that the relation does not hold in the real world. Therefore, from the graph of Figure&nbsp;<a href="#fig-delg">2.1</a>, we cannot assume that there is no flight between Viña del Mar and Arica. In contrast, if the <em>Closed World Assumption</em> (<em>CWA</em>) were adopted – as is the case in many classical database systems – it would be assumed that the data graph is a complete description of the world, thus allowing one to assert with certainty that no flight exists between the two cities. Systems that do not adopt the CWA are said to adopt the <em>Open World Assumption</em> (<em>OWA</em>). Considering our running example, it would be unreasonable to assume that the tourism organisation has complete knowledge of everything describable in its knowledge graph, and hence adopting the OWA appears more appropriate. However, it can be inconvenient if a system is unable to definitively answer “<em>yes</em>” or “<em>no</em>” to questions such as “<em>is there a flight between Arica and Viña del Mar?</em>”, especially when the organisation is certain that it has complete knowledge of the flights. A compromise between OWA and CWA is the <em>Local Closed World Assumption</em> (<em>LCWA</em>), where portions of the data graph are assumed to be complete.</p>
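		<p>The difference between the three assumptions can be sketched as a function that answers whether an edge holds with <code>True</code>, <code>False</code>, or <code>None</code> (unknown). This is only an illustration: the function <code>holds</code>, the toy graph, and the choice to model the LCWA by declaring certain edge labels complete are assumptions, and other ways of delimiting the complete portion of a graph are possible.</p>

```python
def holds(graph, edge, assumption="OWA", complete_labels=frozenset()):
    """Return True, False, or None (unknown) for whether `edge` holds."""
    if edge in graph:
        return True
    if assumption == "CWA":
        return False   # the graph is taken to be a complete description
    if assumption == "LCWA" and edge[1] in complete_labels:
        return False   # only edges with these labels are assumed complete
    return None        # OWA: a missing edge is not evidence of absence

# Toy fragment, assumed for illustration only.
graph = {
    ("Santiago", "flight", "Arica"),
    ("Arica", "flight", "Santiago"),
}
query = ("Viña del Mar", "flight", "Arica")

print(holds(graph, query, "OWA"))
print(holds(graph, query, "CWA"))
print(holds(graph, query, "LCWA", complete_labels={"flight"}))
```

		<p>Under this sketch, an organisation that is certain its flight data are complete would list <code>flight</code> as a complete label, while queries about other labels would continue to be answered under the OWA.</p>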

		<h4 id="sssec-validating-schema" class="subsection">Validating schema</h4>
		<p>When graphs are used to represent diverse, incomplete data at large scale, the OWA is the most appropriate choice for a <em>default</em> semantics. But in some scenarios, we may wish to guarantee that our data graph – or specific parts thereof – are in some sense “complete”. Returning to Figure&nbsp;<a href="#fig-delg">2.1</a>, for example, we may wish to ensure that all events have at least a name, a venue, a start date, and an end date, such that applications using the data – e.g., one that sends event notifications to users – can ensure that they have the minimal information required. Furthermore, we may wish to ensure that the city of an event is <em>stated to be</em> a city (rather than <em>inferring</em> that it is a city). We can define such constraints in a validating schema and validate the data graph with respect to the resulting schema, listing constraint violations (if any). Thus while semantic schemata allow for inferring new graph data, validating schemata allow for validating a given data graph with respect to some constraints.</p>
		<p>A standard way to define a validating schema for graphs is using <em>shapes</em>&nbsp;[<a href="#ref-SHACLSpec">Knublauch and Kontokostas, 2017</a>, <a href="#ref-Prudhommeaux2014">Prud'hommeaux et al., 2014</a>, <a href="#ref-Labra2017">Labra Gayo et al., 2018</a>]. A shape <em>targets</em> a set of nodes in a data graph and specifies <em>constraints</em> on those nodes. The shape’s target can be defined in many ways, such as targeting all instances of a class, the domain or range of a property, the result of a query, nodes connected to the target of another shape by a given property, etc. Constraints can then be defined on the targeted nodes, such as to restrict the number or types of values taken on a given property, the shapes that such values must satisfy, etc.</p>
		<p>A <em>shapes graph</em> is formed from a set of interrelated shapes. Shapes graphs can be depicted as UML-like class diagrams, where Figure&nbsp;<a href="#fig-shapeExample">3.3</a> illustrates an example of a shapes graph based on Figure&nbsp;<a href="#fig-delg">2.1</a>, defining constraints on four interrelated shapes. Each shape – denoted with a box like <span class="shap">Place</span>, <span class="shap">Event</span>, etc. – is associated with a set of constraints. Nodes conform to a shape if and only if they satisfy all constraints defined on the shape. Inside each shape box are placed constraints on the number (e.g., <code>[1..*]</code> denotes one or more, <code>[1..1]</code> denotes precisely one, etc.) and types (e.g., <code>string</code>, <code>dateTime</code>, etc.) of nodes that conforming nodes can relate to with a property (e.g., <span class="gelab">name</span>, <span class="gelab">start</span>, etc.). Another option is to place constraints on the number of nodes conforming to a particular shape that the conforming node can relate to with a property (thus generating edges between shapes); for example, <span class="shap">Event</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="stack"><span class="edge">venue</span><br/><span class="edge">1..*</span></span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="shap">Venue</span> denotes that conforming nodes for <span class="shap">Event</span> must relate to at least one node with the property <span class="gelab">venue</span> that conforms to the <span class="shap">Venue</span> shape. Shapes can inherit the constraints of parent shapes – with inheritance denoted with a \(\triangle\) connector – as in the case of <span class="shap">City</span> and <span class="shap">Venue</span>, whose conforming nodes must also conform to the <span class="shap">Place</span> shape.</p>

		<figure id="fig-shapeExample">
			<img src="images/fig-shapeExample.svg" alt="Example shapes graph depicted as a UML-like diagram"/>
			<figcaption>Example shapes graph depicted as a UML-like diagram</figcaption>
		</figure>

		<p>Given a shape and a targeted node, it is possible to check if the node conforms to that shape or not, which may require checking conformance of other nodes; for example, the node <span class="gnode">EID15</span> conforms to the <span class="shap">Event</span> shape not only based on its local properties, but also based on conformance of <span class="gnode">Santa&nbsp;Lucía</span> to <span class="shap">Venue</span> and <span class="gnode">Santiago</span> to <span class="shap">City</span>. Conformance dependencies may also be recursive, where the conformance of <span class="gnode">Santiago</span> to <span class="shap">City</span> requires that it conforms to <span class="shap">Place</span>, which requires that <span class="gnode">Viña&nbsp;del&nbsp;Mar</span> and <span class="gnode">Arica</span> conform to <span class="shap">Place</span>, and so on. Conversely, <span class="gnode">EID16</span> does not conform to <span class="shap">Event</span>, as it does not have the <span class="gelab">start</span> and <span class="gelab">end</span> properties required by the example shapes graph.</p>
		<p>When declaring shapes, the data modeller may not know in advance the entire set of properties that some nodes can have (now or in the future). An <em>open shape</em> allows the node to have additional properties not specified by the shape, while a <em>closed shape</em> does not. For example, if we add the edge <span class="gnode">Santiago</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">founder</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Pedro&nbsp;de&nbsp;Valdivia</span> to the graph represented in Figure&nbsp;<a href="#fig-delg">2.1</a>, then <span class="gnode">Santiago</span> only conforms to the <span class="shap">City</span> shape if the shape is defined as open (since the shape does not mention <span class="gelab">founder</span>).</p>
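		<p>The open/closed distinction can be sketched in isolation from the other constraints of a shape. In the sketch below, the function <code>allows_properties</code> and the toy graph are assumptions for illustration; the set of properties mentioned by the <code>City</code> shape is read off the example shapes graph (including those inherited from <code>Place</code>). The check covers only the closure aspect, not cardinalities or value types.</p>

```python
def allows_properties(graph, node, shape_properties, closed):
    """Check only the closure aspect of a shape for the given node."""
    used = {p for (s, p, o) in graph if s == node}
    extra = used - shape_properties
    # An open shape tolerates extra properties; a closed shape does not.
    return (not closed) or (not extra)

# Properties mentioned by City, including those inherited from Place.
city_properties = {"population", "lat", "long", "flight", "bus"}

# Toy fragment, assumed for illustration only.
graph = {
    ("Santiago", "flight", "Arica"),
    ("Santiago", "founder", "Pedro de Valdivia"),
}

print(allows_properties(graph, "Santiago", city_properties, closed=False))
print(allows_properties(graph, "Santiago", city_properties, closed=True))
```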
		<p>Practical languages for shapes often support additional Boolean features, such as conjunction (<em class="sc">and</em>), disjunction (<em class="sc">or</em>), and negation (<em class="sc">not</em>) of shapes; for example, we may say that all the values of <span class="gelab">venue</span> should conform to the shape <span class="shap"><span class="sc">Venue</span> <em>and</em> (<em>not</em> <span class="sc">City</span>)</span>, making explicit that venues in the data graph should not be directly given as cities. However, shapes languages that freely combine recursion and negation may lead to semantic problems, depending on how their semantics are defined. To illustrate, consider the following case inspired by the barber paradox&nbsp;[<a href="#ref-Labra2017">Labra Gayo et al., 2018</a>], involving a shape <span class="shap">Barber</span> whose conforming nodes <span class="gelab">shave</span> at least one node conforming to <span class="shap"><span class="sc">Person</span> <em>and</em> (<em>not</em> <span class="sc">Barber</span>)</span>. Now, given (only) <span class="gnode">Bob</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">shave</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Bob</span> with <span class="gnode">Bob</span> conforming to <span class="shap">Person</span>, does <span class="gnode">Bob</span> conform to <span class="shap">Barber</span>? If <em>yes</em> – if <span class="gnode">Bob</span> conforms to <span class="shap">Barber</span> – then <span class="gnode">Bob</span> violates the constraint by not shaving at least one node conforming to <span class="shap"><span class="sc">Person</span> <em>and</em> (<em>not</em> <span class="sc">Barber</span>)</span>. 
If <em>no</em> – if <span class="gnode">Bob</span> does not conform to <span class="shap">Barber</span> – then <span class="gnode">Bob</span> satisfies the <span class="shap">Barber</span> constraint by shaving such a node. Semantics to avoid such paradoxical situations have been proposed based on stratification&nbsp;[<a href="#ref-Boneva2017">Boneva et al., 2017</a>], partial assignments&nbsp;[<a href="#ref-Corman2018b">Corman et al., 2018</a>], and stable models&nbsp;[<a href="#ref-Gelfond88">Gelfond and Lifschitz, 1988</a>].</p>
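		<p>The paradox can be checked mechanically by trying both possible answers. The sketch below assumes a reading under which an assignment is consistent when <code>Bob</code> is labelled <code>Barber</code> exactly when he satisfies the <code>Barber</code> constraint; the names <code>barber_constraint</code> and <code>consistent</code> are hypothetical. Neither candidate assignment passes, which is the situation the semantics cited above are designed to handle.</p>

```python
# The only edge in the graph: Bob shaves Bob.
shaves = {("Bob", "Bob")}
persons = {"Bob"}

def barber_constraint(node, barbers):
    """True iff `node` shaves at least one Person that is not in `barbers`."""
    return any(
        s == node and o in persons and o not in barbers
        for (s, o) in shaves
    )

consistent = []
for barbers in (set(), {"Bob"}):
    # Keep the assignment only if Bob is labelled Barber exactly when
    # he satisfies the Barber constraint under that same assignment.
    if ("Bob" in barbers) == barber_constraint("Bob", barbers):
        consistent.append(barbers)

print(consistent)
```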
		<p>Although validating schemata and semantic schemata serve different purposes, they can complement each other. In particular, a validating schema can take into consideration a semantic schema, such that, for example, validation is applied on the data graph including inferences. Taking the class hierarchy of Figure&nbsp;<a href="#fig-classhier">3.1</a> and the shapes graph of Figure&nbsp;<a href="#fig-shapeExample">3.3</a>, for example, we may define the target of the <span class="shap">Event</span> shape as the nodes that are of type <code>Event</code> (the class). If we first apply inferencing with respect to the class hierarchy of the semantic schema, the <span class="shap">Event</span> shape would now target <span class="gnode">EID15</span> and <span class="gnode">EID16</span>. The presence of a semantic schema may, however, require adapting the validating schema. Taking into account, for example, the aforementioned class hierarchy would require defining a relaxed cardinality on the <span class="gelab">type</span> property. Open shapes may also be preferred in such cases rather than enumerating constraints on all possible properties that may be inferred on a node.</p>
		<p>Two shapes languages have recently emerged for RDF graphs: <em>Shape Expressions</em> (<em>ShEx</em>), published as a W3C Community Group Report&nbsp;[<a href="#ref-Prudhommeaux2014">Prud'hommeaux et al., 2014</a>]; and <em>SHACL</em> (<em>Shapes Constraint Language</em>), published as a W3C Recommendation&nbsp;[<a href="#ref-SHACLSpec">Knublauch and Kontokostas, 2017</a>]. These languages support the discussed features (and more) and have been adopted for validating graphs in a number of domains relating to healthcare&nbsp;[<a href="#ref-ThorntonSSGMPW19">Thornton et al., 2019</a>], scientific literature&nbsp;[<a href="#ref-HammondPT17">Hammond et al., 2017</a>], spatial data&nbsp;[<a href="#ref-Car2019">Car et al., 2019</a>], amongst others. More details about ShEx and SHACL can be found in the book by <a href="#ref-Labra2017">Labra Gayo et al. [2018]</a>. A recently proposed language that can be used as a common basis for both ShEx and SHACL reveals their similarities and differences&nbsp;[<a href="#ref-Labra-Gayo2019">Labra Gayo et al., 2019</a>]. A similar notion of schema has been proposed by <a href="#ref-Angles18">Angles [2018]</a> for property graphs.</p>

		<div class="formal">
			<p>We formally define shapes following the conventions of&nbsp;<a href="#ref-Labra-Gayo2019">Labra Gayo et al. [2019]</a>.</p>

			<dl class="definition" id="def-shape">
				<dt>Shape</dt>
				<dd>A <em>shape</em> \(\phi\) is defined as:
				<table>
					<tr>
						<td>\(\phi\)</td>
						<td>::=</td>
						<td>\(\top\)</td>
						<td>true</td>
					</tr>
					<tr>
						<td>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</td>
						<td> \( | \) </td>
						<td>\(\datatype{N}\)</td>
						<td>node belongs to the set of nodes \(N\)</td>
					</tr>
					<tr>
						<td></td>
						<td> \( | \) </td>
						<td>\(\Psi_{\mathrm{cond}}\)</td>
						<td>node satisfies the Boolean condition \(\mathrm{cond}\)</td>
					</tr>
					<tr>
						<td></td>
						<td> \( | \) </td>
						<td>\(\phi_1 \wedge \phi_2\)</td>
						<td>conjunction of shape \(\phi_1\) and shape \(\phi_2\)</td>
					</tr>
					<tr>
						<td></td>
						<td> \( | \) </td>
						<td>\(\lnot \phi \)</td>
						<td>negation of shape \(\phi\)</td>
					</tr>
					<tr>
						<td></td>
						<td> \( | \) </td>
						<td>\(@s\)</td>
						<td>reference to shape with label \(s\)</td>
					</tr>
					<tr>
						<td></td>
						<td> \( | \) </td>
						<td>\(\qualified{p}{\phi}{min}{max}\)</td>
						<td>between \(min\) and \(max\) outward edges (inclusive) with label \(p\) to nodes satisfying shape \(\phi\)</td>
					</tr>
				</table>
				where \(min \in \mathbb{N}_{(0)}\), \(max \in \mathbb{N}_{(0)} \cup \{ * \}\), with “\(*\)” indicating unbounded.</dd>
			</dl>

			<dl class="definition" id="def-shapes-schema">
				<dt>Shapes schema</dt>
				<dd>A <em>shapes schema</em> is defined as a tuple \(\Sigma = (\Phi,S,\lambda)\) where \(\Phi\) is a set of shapes, \(S\) is a set of shape labels, and \(\lambda : S \rightarrow \Phi\) is a total function from labels to shapes.</dd>
			</dl>

			<div class="example">
				<p>The shapes schema from Figure&nbsp;<a href="#fig-shapeExample">3.3</a> can be expressed as:</p>
				<table>
					<tr>
						<td><span class="shap">Event</span></td>
						<td>\(\mapsto\)</td>
						<td>\(\qualifiedL{name}{\datatypeL{string}}{1}{*}\wedge\qualifiedL{start}{\datatypeL{dateTime}}{1}{1}\wedge\qualifiedL{end}{\datatypeL{dateTime}}{1}{1}\)</td>
					</tr>
					<tr>
						<td></td>
						<td></td>
						<td>\(\qquad\wedge\qualifiedL{type}{\top}{1}{*}\wedge\xrightarrow{venue}\)<span class="shap">Venue</span>\(\{1,*\}\)</td>
					</tr>
					<tr>
						<td><span class="shap">Venue</span></td>
						<td>\(\mapsto\)</td>
						<td><span class="shap">Place</span>\(\:\wedge\qualifiedL{indoor}{\datatypeL{boolean}}{0}{1}\wedge\xrightarrow{city}\)<span class="shap">City</span>\(\{0,1\}\)</td>
					</tr>
					<tr>
						<td><span class="shap">City</span></td>
						<td>\(\mapsto\)</td>
						<td><span class="shap">Place</span>\(\:\wedge\qualifiedL{population}{(\datatypeL{int}\wedge \Psi_{>5000})}{0}{1}\)</td>
					</tr>
					<tr>
						<td><span class="shap">Place</span></td>
						<td>\(\mapsto\)</td>
						<td>\(\qualifiedL{lat}{\datatypeL{float}}{0}{1}\wedge\qualifiedL{long}{\datatypeL{float}}{0}{1}\)</td>
					</tr>
					<tr>
						<td></td>
						<td></td>
						<td>\(\qquad\wedge\xrightarrow{flight}\)<span class="shap">Place</span>\(\{0,*\}\wedge\xrightarrow{bus}\)<span class="shap">Place</span>\(\{0,*\}\)</td>
					</tr>
				</table>
				<p>For example, <span class="shap">Event</span> is a shape label (an element of \(S\)) that maps to a shape (an element of \(\Phi\)). This mapping is defined by \(\lambda\).</p>
			</div>

			<p>In a shapes schema, shapes may refer to other shapes, giving rise to a graph that is sometimes known as the <em>shapes graph</em>&nbsp;[<a href="#ref-SHACLSpec">Knublauch and Kontokostas, 2017</a>]. Figure&nbsp;<a href="#fig-shapeExample">3.3</a> illustrates a shapes graph of this form.</p>
			<p>The semantics of a shape is defined in terms of the evaluation of that shape over each node of a given data graph. The semantics of a shapes schema, in turn, is the result of evaluating each shape of the schema over each node of a given data graph; the result of this evaluation is a <em>shapes map</em>.</p>

			<dl class="definition" id="def-shape-map">
				<dt>Shapes map</dt>
				<dd>Given a directed edge-labelled graph \(G = (V,E,L)\) and a shapes schema \(\Sigma = (\Phi,S,\lambda)\), a <em>shapes map</em> is a (partial) mapping \(\sigma: V \times S \rightarrow \{ 0, 1 \}\).</dd>
			</dl>

			<p>The shapes map \(\sigma\) is a way of labelling the nodes of \(G\) with the labels of shapes from \(S\). If \(\sigma(v,s) = 1\), then node \(v\) is labelled \(s\) (possibly amongst other labels); otherwise if \(\sigma(v,s) = 0\), then node \(v\) is not labelled \(s\). The precise semantics depends on whether \(\sigma\) is a total or a partial mapping, that is, on whether or not it is defined for every pair in \(V \times S\). Herein we present the semantics for the more straightforward case wherein \(\sigma\) is assumed to be a total shapes map.</p>

			<dl class="definition" id="def-shape-evaluation">
				<dt>Shape evaluation</dt>
				<dd>Given a shapes schema \(\Sigma \coloneqq (\Phi,S,\lambda)\), a directed edge-labelled graph \(G = (V,E,L)\), a node \(v \in V\) and a total shapes map \(\sigma\), the <em>shape evaluation function</em> \(\semantics{\phi}{G}{v}{\sigma} \in \{ 0 , 1 \}\) is defined as follows:
				<table>
					<tr>
						<td>\(\semantics{\top}{G}{v}{\sigma}\)</td>
						<td>\(=\)</td>
						<td>\(1\)</td>
					</tr>
					<tr>
						<td>\(\semantics{\datatype{N}}{G}{v}{\sigma}\)</td>
						<td>\(=\)</td>
						<td>\(1\) iff \(v \in N\)</td>
					</tr>
					<tr>
						<td>\(\semantics{\Psi_{\mathrm{cond}}}{G}{v}{\sigma}\)</td>
						<td>\(=\)</td>
						<td>\(1\) iff \(\mathrm{cond}(v)\) is true</td>
					</tr>
					<tr>
						<td>\(\semantics{\phi_1 \wedge \phi_2}{G}{v}{\sigma}\)</td>
						<td>\(=\)</td>
						<td>\(\min\{\semantics{\phi_1}{G}{v}{\sigma}, \semantics{\phi_2}{G}{v}{\sigma}\}\)</td>
					</tr>
					<tr>
						<td>\(\semantics{\lnot \phi}{G}{v}{\sigma}\)</td>
						<td>\(=\)</td>
						<td>\(1 - \semantics{\phi}{G}{v}{\sigma}\)</td>
					</tr>
					<tr>
						<td>\(\semantics{@s}{G}{v}{\sigma}\)</td>
						<td>\(=\)</td>
						<td>\(1\) iff \(\sigma(v,s) = 1\)</td>
					</tr>
					<tr>
						<td>\(\semantics{\qualified{p}{\phi}{min}{max}}{G}{v}{\sigma}\)</td>
						<td>\(=\)</td>
						<td>\(1\) iff \(min \leq \lvert \{ (v,p,u)\in E \mid \semantics{\phi}{G}{u}{\sigma}=1 \} \rvert \leq max\)</td>
					</tr>
				</table>
				If \(\semantics{\phi}{G}{v}{\sigma} = 1\), then \(v\) is said to <em>satisfy</em> \(\phi\) in \(G\) under \(\sigma\).</dd>
			</dl>
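
			<p>As a rough illustration of the evaluation rules above, the following Python sketch encodes shapes as nested tuples and evaluates them recursively. The tuple encoding, the function name <code>evaluate</code> and the toy edges are assumptions made for this example only; they are not part of any shapes language or library.</p>
<pre><code class="language-python">def evaluate(shape, edges, v, sigma):
    """Return 1 if node v satisfies shape under the total shapes map sigma, else 0.

    Hypothetical tuple encoding of shapes:
      ("top",)                   always satisfied
      ("in", N)                  v is in the node set N
      ("cond", f)                the Boolean function f holds for v
      ("and", phi1, phi2)        conjunction
      ("not", phi)               negation
      ("ref", s)                 reference @s to the shape labelled s
      ("edge", p, phi, lo, hi)   between lo and hi p-edges from v to nodes satisfying phi
    """
    kind = shape[0]
    if kind == "top":
        return 1
    if kind == "in":
        return int(v in shape[1])
    if kind == "cond":
        return int(bool(shape[1](v)))
    if kind == "and":
        return min(evaluate(shape[1], edges, v, sigma),
                   evaluate(shape[2], edges, v, sigma))
    if kind == "not":
        return 1 - evaluate(shape[1], edges, v, sigma)
    if kind == "ref":
        return int(sigma[(v, shape[1])] == 1)
    if kind == "edge":
        _, p, phi, lo, hi = shape
        # edges is a set of triples, so each matching edge is counted once
        count = sum(1 for (s, q, u) in edges
                    if s == v and q == p and evaluate(phi, edges, u, sigma) == 1)
        return int(hi >= count >= lo)
    raise ValueError(f"unknown shape kind: {kind!r}")


# Toy fragment (not the full graph of Figure 2.1): EID16 lacks start and end.
edges = {
    ("EID15", "start", "t1"),
    ("EID15", "end", "t2"),
    ("EID16", "name", "n1"),
}
one = lambda p: ("edge", p, ("top",), 1, 1)   # exactly one outgoing p-edge
event = ("and", one("start"), one("end"))

print(evaluate(event, edges, "EID15", {}))  # 1
print(evaluate(event, edges, "EID16", {}))  # 0
</code></pre>
			<p>The shapes map <code>sigma</code> is only consulted for references of the form \(@s\), which is why an empty dictionary suffices for this non-recursive toy shape.</p>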

			<p>Typically for the purposes of validating a graph with respect to a shapes schema, a <em>target</em> is defined that requires certain nodes to satisfy certain shapes.</p>

			<dl class="definition" id="def-shape-target">
				<dt>Shapes target</dt>
				<dd>Given a directed edge-labelled graph \(G = (V,E,L)\) and a shapes schema \(\Sigma = (\Phi,S,\lambda)\), a <em>shapes target</em> \(T \subseteq V \times S\) is a set of pairs of nodes and shape labels from \(G\) and \(\Sigma\), respectively.</dd>
			</dl>

			<p>The nodes that a shape targets can be selected manually, based on the type(s) of the nodes, based on the results of a graph query, etc.&nbsp;[<a href="#ref-Corman2018b">Corman et al., 2018</a>, <a href="#ref-Labra-Gayo2019">Labra Gayo et al., 2019</a>].</p>
			<p>Lastly, we define the notion of a valid graph under a given shapes schema and target based on the existence of a shapes map satisfying certain conditions.</p>

			<dl class="definition" id="def-valid-graph">
				<dt>Valid graph</dt>
				<dd>Given a shapes schema \(\Sigma = (\Phi,S,\lambda)\), a directed edge-labelled graph \(G = (V,E,L)\), and a shapes target \(T\), we say that <em>\(G\) is valid under \(\Sigma\) and \(T\)</em> if and only if there exists a shapes map \(\sigma\) such that, for all \(s \in S\) and \(v \in V\) it holds that \(\sigma(v,s) = \semantics{\lambda(s)}{G}{v}{\sigma}\), and \((v,s) \in T\) implies \(\sigma(v,s) = 1\).</dd>
			</dl>

			<div class="example">
				<p>Taking the graph \(G\) from Figure&nbsp;<a href="#fig-delg">2.1</a> and the shapes schema \(\Sigma\) from Figure&nbsp;<a href="#fig-shapeExample">3.3</a>, first assume an empty shapes target \(T = \{\}\). If we consider a shapes map where (e.g.) \(\sigma(\)<span class="gnode">EID15</span>, <span class="shap">Event</span>\() = 1\), \(\sigma(\)<span class="gnode">Santa Lucía</span>, <span class="shap">Venue</span>\() = 1\), \(\sigma(\)<span class="gnode">Santa Lucía</span>, <span class="shap">Place</span>\() = 1\), etc., but where \(\sigma(\)<span class="gnode">EID16</span>, <span class="shap">Event</span>\() = 0\) (as it does not have the required values for <span class="gelab">start</span> and <span class="gelab">end</span>), etc., then we see that \(G\) is valid under \(\Sigma\) and \(T\). However, if we were to define a shapes target \(T\) to ensure that the <span class="shap">Event</span> shape targets <span class="gnode">EID15</span> and <span class="gnode">EID16</span> – i.e., to define \(T\) such that \(\{ (\)<span class="gnode">EID15</span>, <span class="shap">Event</span>\(), (\)<span class="gnode">EID16</span>, <span class="shap">Event</span>\() \} \subseteq T\) – then the graph would no longer be valid under \(\Sigma\) and \(T\) since <span class="gnode">EID16</span> does not satisfy <span class="shap">Event</span>.</p>
			</div>
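
			<p>The definition of a valid graph can be sketched in Python as follows, under the simplifying assumption that each shape is given as a function from (edges, node, shapes map) to 0 or 1. The names <code>witnesses_validity</code>, <code>is_valid</code> and <code>event_shape</code>, as well as the toy data, are hypothetical. The search enumerates every total shapes map, which is exponential and only feasible for tiny inputs.</p>
<pre><code class="language-python">from itertools import product


def out_count(edges, v, p):
    """Number of edges labelled p leaving node v."""
    return sum(1 for (s, q, _) in edges if s == v and q == p)


def event_shape(edges, v, sigma):
    """Hypothetical stand-in for the Event shape: exactly one start and one end."""
    return int(out_count(edges, v, "start") == 1 and out_count(edges, v, "end") == 1)


def witnesses_validity(sigma, schema, nodes, edges, target):
    """Check both conditions of the definition for one total shapes map sigma."""
    consistent = all(sigma[(v, s)] == shape(edges, v, sigma)
                     for s, shape in schema.items() for v in nodes)
    targets_met = all(sigma[(v, s)] == 1 for (v, s) in target)
    return consistent and targets_met


def is_valid(schema, nodes, edges, target):
    """Brute-force search over all total shapes maps (tiny inputs only)."""
    pairs = [(v, s) for v in sorted(nodes) for s in sorted(schema)]
    for bits in product((0, 1), repeat=len(pairs)):
        if witnesses_validity(dict(zip(pairs, bits)), schema, nodes, edges, target):
            return True
    return False


# Toy fragment (not the full graph of Figure 2.1): EID16 lacks start and end.
nodes = {"EID15", "EID16", "t1", "t2", "n1"}
edges = {("EID15", "start", "t1"), ("EID15", "end", "t2"), ("EID16", "name", "n1")}
schema = {"Event": event_shape}   # plays the role of lambda: label to shape

print(is_valid(schema, nodes, edges, set()))                                     # True
print(is_valid(schema, nodes, edges, {("EID15", "Event"), ("EID16", "Event")}))  # False
</code></pre>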

			<p>The semantics we present here assumes that each node in the graph either satisfies or does not satisfy each shape labelled by the schema. More complex semantics – for example, based on Kleene’s three-valued logic&nbsp;[<a href="#ref-Corman2018b">Corman et al., 2018</a>, <a href="#ref-Labra-Gayo2019">Labra Gayo et al., 2019</a>] – have been proposed that support partial shapes maps, where the satisfaction of some nodes for some shapes can be left as undefined. Shapes languages in practice may support other more advanced forms of constraints, such as counting on paths&nbsp;[<a href="#ref-SHACLSpec">Knublauch and Kontokostas, 2017</a>]. In terms of implementing validation with respect to shapes, work has been done on translating constraints into sets of graph queries, whose results are input to a SAT solver for recursive cases&nbsp;[<a href="#ref-CormanFRS19a">Corman et al., 2019</a>].</p>
		</div>
		
		<h4 id="ssec-emergentSchema" class="subsection">Emergent schema</h4>
		<p>Both semantic and validating schemata require a domain expert to explicitly specify definitions and constraints. However, a data graph will often exhibit latent structures that can be automatically extracted as an <em>emergent schema</em>&nbsp;[<a href="#ref-PhamPEB15">Pham et al., 2015</a>] (aka <em>graph summary</em>&nbsp;[<a href="#ref-LiuSDK18">Liu et al., 2018</a>, <a href="#ref-CebiricGKKMTZ19">Čebirić et al., 2019</a>, <a href="#ref-SpahiuPPRM16a">Spahiu et al., 2016</a>]).</p>
		<p>A framework often used for defining emergent schema is that of <em>quotient graphs</em>, which partition groups of nodes in the data graph according to some equivalence relation while preserving some structural properties of the graph. Taking Figure&nbsp;<a href="#fig-delg">2.1</a>, we can intuitively distinguish different <em>types</em> of nodes based on their context, such as event nodes, which link to venue nodes, which in turn link to city nodes, and so forth. In order to describe the structure of the graph, we could consider six partitions of nodes: <em>event</em>, <em>name</em>, <em>venue</em>, <em>class</em>, <em>date-time</em>, <em>city</em>. In practice, these partitions may be computed based on the class or shape of the node. Merging the nodes of each partition into one node while preserving edges leads to the quotient graph shown in Figure&nbsp;<a href="#fig-emergentSchema">3.4</a>: the nodes of this quotient graph are the partitions of nodes from the data graph and an edge <span class="gnode">\(X\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">\(y\)</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(Z\)</span> is included in the quotient graph if and only if there exists \(x \in X\) and \(z \in Z\) such that <span class="gnode">\(x\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">\(y\)</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(z\)</span> is in the original data graph.</p>

		<figure id="fig-emergentSchema">
			<img src="images/fig-emergentSchema.svg" alt="Example quotient graph simulating the data graph in Figure&nbsp;2.1"/>
			<figcaption>Example quotient graph simulating the data graph in Figure&nbsp;<a href="#fig-delg">2.1</a></figcaption>
		</figure>

		<p>There are many ways in which quotient graphs may be defined, depending not only on how nodes are partitioned, but also how the edges are defined. Different quotient graphs may provide different guarantees with respect to the structure they preserve. Formally, we can say that every quotient graph <em>simulates</em> its input graph (based on the <em>simulation relation</em> of set membership between data nodes and quotient nodes), meaning that for all \(x \in X\) with \(x\) an input node and \(X\) a quotient node, if <span class="gnode">\(x\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">\(y\)</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(z\)</span> is an edge in the data graph, then there must exist an edge <span class="gnode">\(X\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">\(y\)</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(Z\)</span> in the quotient graph such that \(z \in Z\); for example, the quotient graph of Figure&nbsp;<a href="#fig-emergentSchema">3.4</a> simulates the data graph of Figure&nbsp;<a href="#fig-delg">2.1</a>. However, this quotient graph seems to suggest (for instance) that <span class="gnode">EID16</span> would have a start and end date in the data graph when this is not the case. 
A stronger notion of structural preservation is given by <em>bisimilarity</em>, which in this case would further require that if <span class="gnode">\(X\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">\(y\)</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(Z\)</span> is an edge in the quotient graph, then for all \(x \in X\), there must exist a \(z \in Z\) such that <span class="gnode">\(x\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">\(y\)</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(z\)</span> is in the data graph; this is not satisfied by <span class="gnode">EID16</span> in the quotient graph of Figure&nbsp;<a href="#fig-emergentSchema">3.4</a>, which does not have an outgoing edge labelled <span class="gelab">start</span> or <span class="gelab">end</span> in the original data graph. Figure&nbsp;<a href="#fig-emergentSchema2">3.5</a> illustrates a bisimilar version of the quotient graph, splitting the <em>event</em> partition into two nodes reflecting their different outgoing edges. An interesting property of bisimilarity is that it preserves forward-directed paths: given a path expression \(r\) without inverses and two bisimilar graphs, \(r\) will match a path in one graph if and only if it matches a corresponding path in the other bisimilar graph. 
One can verify, for example, that a path matches <span class="gnode">\(x\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">city\(\cdot\)(flight|bus)*</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(z\)</span> in Figure&nbsp;<a href="#fig-delg">2.1</a> if and only if there is a path matching <span class="gnode">\(X\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">city\(\cdot\)(flight|bus)*</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(Z\)</span> in Figure&nbsp;<a href="#fig-emergentSchema2">3.5</a> such that \(x \in X\) and \(z \in Z\).</p>

		<figure id="fig-emergentSchema2">
			<img src="images/fig-emergentSchema2.svg" alt="Example quotient graph bisimilar with the data graph in Figure&nbsp;2.1"/>
			<figcaption>Example quotient graph bisimilar with the data graph in Figure&nbsp;<a href="#fig-delg">2.1</a></figcaption>
		</figure>

		<p>There are many ways in which quotient graphs may be defined, depending on the equivalence relation that partitions nodes. Furthermore, there are many ways in which other similar or bisimilar graphs can be defined, depending on the (bi)simulation relation that preserves the data graph’s structure&nbsp;[<a href="#ref-CebiricGKKMTZ19">Čebirić et al., 2019</a>]. Such techniques aim to <em>summarise</em> the data graph into a higher-level topology. In order to reduce the memory overhead of the quotient graph, in practice, nodes may rather be labelled with the cardinality of the partition and/or a high-level label (e.g., <em>event</em>, <em>city</em>) for the partition rather than storing the labels of all nodes in the partition.</p>
		<p>Various other forms of emergent schema not directly based on a quotient graph framework have also been proposed; examples include emergent schemata based on relational tables&nbsp;[<a href="#ref-PhamPEB15">Pham et al., 2015</a>], and based on formal concept analysis&nbsp;[<a href="#ref-GonzalezH18">González and Hogan, 2018</a>]. Emergent schemata may be used to provide a human-understandable overview of the data graph, to aid with the definition of a semantic or validating schema, to optimise the indexing and querying of the graph, to guide the integration of data graphs, and so forth. We refer to the survey by <a href="#ref-CebiricGKKMTZ19">Čebirić et al. [2019]</a> dedicated to the topic for further details.</p>
		
		<div class="formal">
			<p>Emergent schemata are often based on the notion of a quotient graph.</p>

			<dl class="definition" id="def-qg">
				<dt>Quotient graph</dt>
				<dd>Given a directed edge-labelled graph \(G = (V,E,L)\), a graph \(\mathcal{G} = (\mathcal{V},\mathcal{E},L)\) is a <em>quotient graph</em> of \(G\) if and only if:
					<ul>
						<li>\(\mathcal{V}\) is a partition of \(V\) without the empty set, i.e., \(\mathcal{V} \subseteq (2^V - \{\emptyset\})\), \(V = \bigcup_{U\in \mathcal{V}} U\), and for all \(U\in \mathcal{V}\), \(W\in \mathcal{V}\), it holds that \(U = W\) or \(U \cap W = \emptyset\); <em>and</em></li>
						<li>\(\mathcal{E} = \{ (U,l,W) \mid U \in \mathcal{V}, W \in \mathcal{V} \text{ and } \exists u \in U, \exists w \in W : (u,l,w) \in E \} \).</li>
					</ul>
				</dd>
			</dl>
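
			<p>The definition above can be sketched in Python as follows, assuming that the partition is induced by a key function (nodes with the same key fall into the same part) and that every edge endpoint appears in the node set. The function name <code>quotient_graph</code>, the key function and the toy data are assumptions made for illustration.</p>
<pre><code class="language-python">def quotient_graph(nodes, edges, key):
    """Build the quotient graph induced by grouping nodes with equal key(node)."""
    groups = {}
    for v in nodes:
        groups.setdefault(key(v), set()).add(v)
    # Map each node to the (hashable) part that contains it.
    part_of = {v: frozenset(group) for group in groups.values() for v in group}
    q_nodes = set(part_of.values())
    # (U, l, W) is a quotient edge iff some u in U and w in W have (u, l, w) in E.
    q_edges = {(part_of[u], label, part_of[w]) for (u, label, w) in edges}
    return q_nodes, q_edges


# Hypothetical toy data: two events at one venue in one city.
node_type = {"e1": "event", "e2": "event", "v1": "venue", "c1": "city"}
edges = {("e1", "venue", "v1"), ("e2", "venue", "v1"), ("v1", "city", "c1")}

q_nodes, q_edges = quotient_graph(set(node_type), edges, node_type.get)
print(len(q_nodes), len(q_edges))  # 3 2
</code></pre>
			<p>Using the identity as key yields singleton parts, while a constant key yields a single part with one loop per edge label.</p>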

			<p>A quotient graph can “merge” multiple nodes into one node, keeping the edges of its constituent nodes. For an input graph \(G = (V,E,L)\), there is an exponential number of possible quotient graphs based on partitions of the input nodes. On one extreme, the input graph is a quotient graph of itself (turning nodes like <span class="gnode">u</span> into singleton nodes like <span class="gnode">{u}</span>). On the other extreme, a single node <span class="gnode">\(V\)</span>, with all input nodes, and loops \((V,l,V)\) for each edge-label \(l\) used in the set of input edges \(E\), is also a quotient graph. Quotient graphs typically fall somewhere in between, where the partition \(\mathcal{V}\) of \(V\) is often defined in terms of an <em>equivalence relation</em> \(\sim\) on the set \(V\) such that \(\mathcal{V} \coloneqq V/{\sim}\); i.e., \(\mathcal{V}\) is defined as the <em>quotient set</em> of \(V\) with respect to \(\sim\); for example, we might define an equivalence relation on nodes such that \(u \sim v\) if and only if they have the same set of defined types, where \(V/{\sim}\) is then a partition whose parts contain all nodes with the same types. Another way to induce a quotient graph is to define the partition in a way that preserves some of the topology (i.e., connectivity) of the input graph. One way to formally define this idea is through <em>simulation</em> and <em>bisimulation</em>.</p>

		<dl class="definition" id="def-sim">
			<dt>Simulation</dt>
			<dd>Given two directed edge-labelled graphs \(G = (V,E,L)\) and \(G' = (V',E',L')\), let \(R \subseteq V \times V'\) be a relation between the nodes of \(G\) and \(G'\), respectively. We call \(R\) a <em>simulation</em> on \(G\) and \(G'\) if, for all \((v,v') \in R\), the following holds:
			<ul>
				<li>if \((v,p,w) \in E\) then there exists \(w'\) such that \((v',p,w') \in E'\) and \((w,w') \in R\).</li>
			</ul>
			If a simulation exists on \(G\) and \(G'\), we say that \(G'\) <em>simulates</em> \(G\), denoted \(G \rightsquigarrow G'\).</dd>
		</dl>

		<dl class="definition" id="def-bisim">
			<dt>Bisimulation</dt>
			<dd>If \(R\) is a simulation on \(G\) and \(G'\), we call it a <em>bisimulation</em> if, for all \((v,v') \in R\), the following condition holds:
			<ul>
				<li>if \((v',p,w') \in E'\) then there exists \(w\) such that \((v,p,w) \in E\) and \((w,w') \in R\).</li>
			</ul>
			If a bisimulation exists on \(G\) and \(G'\), we call them <em>bisimilar</em>, denoted \(G \approx G'\).</dd>
		</dl>

		<p>Bisimulation (\(\approx\)) is then an equivalence relation on graphs. By defining the (bi)simulation relation \(R\) in terms of set membership \(\in\), every quotient graph simulates its input graph, but does not necessarily bisimulate its input graph. This gives rise to the notion of <em>bisimilar quotient graphs</em>.</p>
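
		<p>The two conditions can be checked directly for a given relation \(R\). The Python sketch below does so for the set-membership relation between a toy data graph and two of its quotient graphs; all function names and the toy edges are assumptions made for this example.</p>
<pre><code class="language-python">def membership(parts):
    """Relation R pairing each data node with the quotient node (part) containing it."""
    return {(x, part) for part in parts for x in part}


def quotient_edges(parts, edges):
    """Edges of the quotient graph induced by the partition parts."""
    part_of = {x: part for part in parts for x in part}
    return {(part_of[u], p, part_of[w]) for (u, p, w) in edges}


def is_simulation(R, E1, E2):
    """For all (v, v2) in R: each edge (v, p, w) in E1 has a match (v2, p, w2) in E2 with (w, w2) in R."""
    return all(
        any(s == v2 and q == p and (w, w2) in R for (s, q, w2) in E2)
        for (v, v2) in R
        for (u, p, w) in E1 if u == v
    )


def is_bisimulation(R, E1, E2):
    """R is a simulation and its inverse satisfies the mirrored condition."""
    inverse = {(b, a) for (a, b) in R}
    return is_simulation(R, E1, E2) and is_simulation(inverse, E2, E1)


# Hypothetical toy data graph: only e1 has a start edge.
data = {("e1", "start", "t1"), ("e1", "venue", "v1"), ("e2", "venue", "v1")}
coarse = {frozenset({"e1", "e2"}), frozenset({"t1"}), frozenset({"v1"})}
fine = {frozenset({"e1"}), frozenset({"e2"}), frozenset({"t1"}), frozenset({"v1"})}

for parts in (coarse, fine):
    R, E2 = membership(parts), quotient_edges(parts, data)
    print(is_simulation(R, data, E2), is_bisimulation(R, data, E2))
# True False
# True True
</code></pre>
		<p>Merging both events into one part keeps the simulation but breaks the bisimulation, since the merged node has a <code>start</code> edge that <code>e2</code> lacks; splitting the events restores it.</p>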

		<div class="example">
			<p>Figures&nbsp;<a href="#fig-emergentSchema">3.4</a> and&nbsp;<a href="#fig-emergentSchema2">3.5</a> exemplify quotient graphs for the graph of Figure&nbsp;<a href="#fig-delg">2.1</a>. Figure&nbsp;<a href="#fig-emergentSchema">3.4</a> simulates but is not bisimilar to the data graph. Figure&nbsp;<a href="#fig-emergentSchema2">3.5</a> is bisimilar to the data graph. Often the goal will be to compute the most concise quotient graph that satisfies a given condition; for example, the nodes without outgoing edges in Figure&nbsp;<a href="#fig-emergentSchema2">3.5</a> could be merged while preserving bisimilarity.</p>
		</div>
		</div>
		</section>

		<section id="sec-identity" class="section">
		<h3>Identity</h3>
		<p>Figure&nbsp;<a href="#fig-delg">2.1</a> uses nodes like <span class="gnode">Santiago</span>, but to which Santiago does this node refer? Do we refer to Santiago de Chile, Santiago de Cuba, Santiago de Compostela, or do we perhaps refer to the indie rock band Santiago? Based on edges such as <span class="gnode">Santa&nbsp;Lucía</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">city</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santiago</span>, we may deduce that it is one of the three cities mentioned (not the rock band), and based on the fact that the graph describes tourist attractions in Chile, we may further deduce that it refers to Santiago de Chile. Without further details, however, <em>disambiguating</em> nodes of this form may rely on heuristics prone to error in more difficult cases. To help avoid such ambiguity, first we may use globally-unique identifiers to avoid naming clashes when the knowledge graph is extended with external data, and second we may add external identity links to disambiguate a node with respect to an external source.</p>

		<h4 id="subsec-globalIdentifiers" class="subsection">Persistent identifiers</h4>
		<p>Assume we wished to compare tourism in Chile and Cuba, and we have acquired an appropriate knowledge graph for Cuba similar to the one we have for Chile. We can merge two graphs by taking their union. However, as shown in Figure&nbsp;<a href="#fig-globalIds">3.6</a>, using an ambiguous node like <span class="gnode">Santiago</span> may yield a <em>naming clash</em>: the node refers to a different real-world city in each graph, but the merged graph indicates that Santiago is a city in both Chile and Cuba (rather than two distinct cities).<sup class="fnmark" id="fnm5"><a href="#fn5">5</a></sup><span class="footnote" id="fn5"><sup><a href="#fnm5">note 5</a></sup> Such a naming clash is not unique to graphs, but could also occur if merging tables, trees, etc.</span> To avoid such clashes, long-lasting <em>persistent identifiers</em> (<em>PIDs</em>)&nbsp;[<a href="#ref-pids">Hakala, 2010</a>] can be created in order to uniquely identify an entity; examples of PID schemes include <em>Digital Object Identifiers</em> (<em>DOIs</em>) for papers, <em>ORCID iDs</em> for authors, <em>International Standard Book Numbers</em> (<em>ISBNs</em>) for books, <em>Alpha-2 codes</em> for countries, and more besides.</p>

		<figure id="fig-globalIds">
			<img src="images/fig-globalIds.svg" alt="Result of merging two graphs with ambiguous local identifiers"/>
			<figcaption>Result of merging two graphs with ambiguous local identifiers</figcaption>
		</figure>
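
		<p>The clash can be reproduced with a few lines of Python, treating each graph as a set of edges and merging by set union. The edge label <code>country</code>, the helper <code>rename</code> and the identifiers <code>chile:Santiago</code> and <code>cuba:Santiago</code> are hypothetical and chosen only for this sketch.</p>
<pre><code class="language-python">def rename(edges, mapping):
    """Replace node identifiers according to mapping, leaving other nodes unchanged."""
    return {(mapping.get(s, s), p, mapping.get(o, o)) for (s, p, o) in edges}


chile = {("Santiago", "country", "Chile")}
cuba = {("Santiago", "country", "Cuba")}

merged = chile | cuba   # merging by union of the edge sets
print(sorted(o for (s, p, o) in merged if s == "Santiago"))
# ['Chile', 'Cuba']: a single node now appears to be in both countries

safe = (rename(chile, {"Santiago": "chile:Santiago"})
        | rename(cuba, {"Santiago": "cuba:Santiago"}))
print(sorted(s for (s, p, o) in safe))
# ['chile:Santiago', 'cuba:Santiago']: two distinct nodes, no clash
</code></pre>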

		<p>In the context of the Semantic Web, the RDF data model goes one step further and recommends that global Web identifiers be used for nodes and edge labels. However, rather than adopt the <em>Uniform Resource Locators (URLs)</em> used to identify the location of <em>information resources</em> such as webpages, RDF&nbsp;1.1 proposes to use <em>Internationalised Resource Identifiers (IRIs)</em> to identify <em>non-information resources</em> such as cities or events.<sup class="fnmark" id="fnm6"><a href="#fn6">6</a></sup><span class="footnote" id="fn6"><sup><a href="#fnm6">note 6</a></sup> Uniform Resource Identifiers (URIs) can be Uniform Resource Locators (URLs), used to locate information resources, and Uniform Resource Names (URNs), used to name resources. Internationalised Resource Identifiers (IRIs) are URIs that allow Unicode (e.g., <code>http://example.com/Ñam</code>).</span> Hence, for example, in the RDF representation of Wikidata&nbsp;[<a href="#ref-VrandecicK14">Vrandečić and Krötzsch, 2014</a>] – a knowledge graph proposed to complement Wikipedia, discussed in more detail in Chapter&nbsp;<a href="#chap-kgs">10</a> – while the URL <span class="gnode"><a class="uri" href="https://www.wikidata.org/wiki/Q2887">https://www.wikidata.org/wiki/Q2887</a></span> refers to a webpage that can be loaded in a browser providing human-readable metadata about Santiago, the IRI <span class="gnode"><a class="uri" href="http://www.wikidata.org/entity/Q2887">http://www.wikidata.org/entity/Q2887</a></span> refers to the city itself. Distinguishing the identifiers for the webpage and the city itself avoids naming clashes; for example, if we use the URL to identify both the webpage and the city, we may end up with an edge in our graph, such as (with readable labels below the edge):</p>

		<p class="mathblock uris"><span class="gnode"><a class="uri" href="https://www.wikidata.org/wiki/Q2887">https://www.wikidata.org/wiki/Q2887</a></span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge"><a class="uri" href="https://www.wikidata.org/wiki/Property:P112">https://www.wikidata.org/wiki/Property:P112</a></span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode"><a class="uri" href="https://www.wikidata.org/wiki/Q203534">https://www.wikidata.org/wiki/Q203534</a></span><br/>
			<code>[Santiago (URL)]</code><code style="margin-left:8em;margin-right:7em;">[founded by (URL)]</code> <code>[Pedro de Valdivia (URL)]</code></p>

		<p>Such an edge leaves ambiguity: was Pedro de Valdivia the founder of the webpage, or the city? Using IRIs for entities distinct from the URLs for the webpages that describe them avoids such ambiguous cases, where Wikidata thus rather defines the previous edge using less ambiguous identifiers, as follows:</p>

		<p class="mathblock uris"><span class="gnode"><a class="uri" href="http://www.wikidata.org/entity/Q2887">http://www.wikidata.org/entity/Q2887</a></span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge"><a class="uri" href="http://www.wikidata.org/prop/direct/P112">http://www.wikidata.org/prop/direct/P112</a></span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode"><a class="uri" href="http://www.wikidata.org/entity/Q203534">http://www.wikidata.org/entity/Q203534</a></span><br/>
			<code>[Santiago (IRI)]</code><code style="margin-left:8em;margin-right:7em;">[founded by (IRI)]</code> <code>[Pedro de Valdivia (IRI)]</code></p>

		<p>using IRIs for the city, the person, and the <em>founded by</em> relation, distinct from the webpages describing them. These Wikidata identifiers use the prefix <a class="uri" href="http://www.wikidata.org/entity/">http://www.wikidata.org/entity/</a> for entities and the prefix <a class="uri" href="http://www.wikidata.org/prop/direct/">http://www.wikidata.org/prop/direct/</a> for relations. Such prefixes are known as <em>namespaces</em>, and are often abbreviated with prefix strings, such as <code>wd:</code> or <code>wdt:</code>, where the latter edge can then be written more concisely using such abbreviations as the edge <span class="gnode">wd:Q2887</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">wdt:P112</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">wd:Q203534</span>.</p>
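
		<p>The abbreviation of IRIs with prefix strings amounts to simple string substitution, as the following Python sketch suggests. The two namespaces are those given above; the function names <code>expand</code> and <code>compact</code> are assumptions made for this example rather than the API of any RDF library.</p>
<pre><code class="language-python">PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
}


def expand(name, prefixes=PREFIXES):
    """Turn an abbreviated name such as wd:Q2887 into a full IRI."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return name   # unknown prefix (or already a full IRI): leave unchanged


def compact(iri, prefixes=PREFIXES):
    """Abbreviate a full IRI, preferring the longest matching namespace."""
    for prefix, namespace in sorted(prefixes.items(), key=lambda kv: -len(kv[1])):
        if iri.startswith(namespace):
            return prefix + ":" + iri[len(namespace):]
    return iri


edge = ("wd:Q2887", "wdt:P112", "wd:Q203534")
print(tuple(expand(term) for term in edge))
# ('http://www.wikidata.org/entity/Q2887',
#  'http://www.wikidata.org/prop/direct/P112',
#  'http://www.wikidata.org/entity/Q203534')
</code></pre>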
		<p>If HTTP IRIs are used to identify the graph’s entities, when the IRI is looked up (via HTTP), the web-server can return (or redirect to) a description of that entity in formats such as RDF. This further enables RDF graphs to link to related entities described in external RDF graphs over the Web, giving rise to <em>Linked Data</em>&nbsp;[<a href="#ref-ldprinciples">Berners-Lee, 2006</a>, <a href="#ref-ldbook">Heath and Bizer, 2011</a>] (discussed in Chapter&nbsp;<a href="#chap-publish">9</a>). Though HTTP IRIs offer a flexible and powerful mechanism for issuing global identifiers on the Web, they are not necessarily persistent: websites may go offline, the resources described at a given location may change, etc. In order to enhance the persistence of such identifiers, <em>Persistent URL</em> (<em>PURL</em>) services offer redirects from a central server to a particular location, where the PURL can be redirected to a new location if necessary, changing the address of a document without changing its identifier. The persistence of HTTP IRIs can then be improved by using namespaces defined through PURL services.</p>

		<h4 id="sssec-external_identy" class="subsection">External identity links</h4>
		<p>Assume that the tourist board opts to define the <code>chile:</code> namespace with an IRI such as <code>http://turismo.cl/entity/</code> on a web-server that they control, allowing nodes such as <span class="gnode">chile:Santiago</span> – a shortcut for the IRI <span class="gnode"><a class="uri" href="http://turismo.cl/entity/Santiago">http://turismo.cl/entity/Santiago</a></span> – to be looked up over the Web. While using such a naming scheme helps to avoid naming clashes, the use of IRIs does not necessarily help ground the identity of a resource. For example, an external geographic knowledge graph may assign the same city the IRI <span class="gnode">geo:SantiagoDeChile</span> in their own namespace, where we have no direct way of knowing that the two identifiers refer to the same city. If we merge the two knowledge graphs, we will end up with two distinct nodes for the same city, and thus not integrate their data.</p>
		<p>There are a number of ways to ground the identity of an entity. The first is to associate the entity with uniquely-identifying information in the graph, such as its geo-coordinates, its postal code, the year it was founded, etc. Each additional piece of information removes ambiguity regarding which city is being referred to, providing (for example) more options for matching the city with its analogue in external sources. A second option is to use <em>identity links</em> to state that a local entity has the same identity as another <em>coreferent</em> entity found in an external source; an instantiation of this concept can be found in the OWL standard, which defines the <code>owl:sameAs</code> property relating coreferent entities. Using this property, we could state the edge <span class="gnode">chile:Santiago</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">owl:sameAs</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">geo:SantiagoDeChile</span> in our RDF graph, thus establishing an identity link between the corresponding nodes in both graphs. 
Rather than specifying pairwise identity links between all knowledge graphs, it suffices if two knowledge graphs provide corresponding identity links to the same external knowledge graph, such as DBpedia or Wikidata; for example, if the local knowledge graph provides an identity link to Wikidata indicating <span class="gnode">chile:Santiago</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">owl:sameAs</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">wd:Q2887</span>, while the remote knowledge graph has the identity link <span class="gnode">geo:SantiagoDeChile</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">owl:sameAs</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">wd:Q2887</span>, then we can infer <span class="gnode">chile:Santiago</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">owl:sameAs</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">geo:SantiagoDeChile</span>. The semantics of <code>owl:sameAs</code> defined by the OWL standard then allows us to combine the data for both nodes. Such semantics will be discussed later in Chapter&nbsp;<a href="#chap-deductive">4</a>. Ways in which identity links can be computed will also be discussed later in Chapter&nbsp;<a href="#chap-refine">8</a>.</p>
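
		<p>Since <code>owl:sameAs</code> is reflexive, symmetric and transitive, identity links partition identifiers into sets of coreferent nodes. One common way to compute such sets is a union–find structure, sketched below in Python; the function name <code>coreferent_classes</code> is hypothetical, and the sketch covers only the grouping of identifiers, not the full OWL semantics.</p>
<pre><code class="language-python">def coreferent_classes(links):
    """Group identifiers connected (directly or indirectly) by identity links."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    classes = {}
    for x in list(parent):
        classes.setdefault(find(x), set()).add(x)
    return [frozenset(members) for members in classes.values()]


links = [
    ("chile:Santiago", "wd:Q2887"),
    ("geo:SantiagoDeChile", "wd:Q2887"),
]
print([sorted(c) for c in coreferent_classes(links)])
# [['chile:Santiago', 'geo:SantiagoDeChile', 'wd:Q2887']]
</code></pre>
		<p>Both local identifiers end up in the same set as <code>wd:Q2887</code>, which corresponds to the inferred identity link between <code>chile:Santiago</code> and <code>geo:SantiagoDeChile</code>.</p>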

		<h4 id="sssec-datatypes" class="subsection">Datatypes</h4>
		<p>Consider the two date-times on the left of Figure&nbsp;<a href="#fig-delg">2.1</a>: how should we assign these nodes persistent/global identifiers? Intuitively it would not make much sense, for example, to assign IRIs to these nodes since their syntactic form tells us what they refer to: specific dates and times in March 2020. This syntactic form is further recognisable by machine, meaning that with appropriate software, we could order such values in ascending or descending order, extract the year, etc.</p>
		<p>Most practical data models for graphs allow for defining nodes that are datatype values. RDF utilises <em>XML Schema Datatypes</em> (<em>XSD</em>)&nbsp;[<a href="#ref-XSD">Peterson et al., 2012</a>], amongst others, where a datatype node is given as a pair \((l,d)\) where \(l\) is a lexical string, such as "<code>2020-03-29T20:00:00</code>", and \(d\) is an IRI denoting the datatype, such as <code>xsd:dateTime</code>. The node is then denoted <span class="gnode">"<code>2020-03-29T20:00:00</code>"^^xsd:dateTime</span>. Datatype nodes in RDF are called <em>literals</em> and are not allowed to have outgoing edges. Other datatypes commonly used in RDF data include <code>xsd:string</code>, <code>xsd:integer</code>, <code>xsd:decimal</code>, <code>xsd:boolean</code>, etc. If the datatype is omitted, the value is assumed to be of type <code>xsd:string</code>. Applications built on top of RDF can then recognise these datatypes, parse them into datatype objects, and apply equality checks, normalisation, ordering, transformations, etc., according to their standard definition. In the context of property graphs, Neo4j&nbsp;[<a href="#ref-Miller13">Miller, 2013</a>] also defines a set of internal datatypes on property values that includes numbers, strings, Booleans, spatial points, and temporal values.</p>
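
		<p>As a minimal sketch of how an application might handle such pairs \((l,d)\), the following Python code maps a few XSD datatype IRIs to standard-library parsers. The table <code>PARSERS</code> and the function <code>to_value</code> are assumptions made for this example; Python's <code>datetime.fromisoformat</code> accepts only part of the lexical space of <code>xsd:dateTime</code>, and the second date-time is chosen for illustration.</p>
<pre><code class="language-python">from datetime import datetime
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical mapping from datatype IRIs to Python parsers.
PARSERS = {
    XSD + "dateTime": datetime.fromisoformat,
    XSD + "integer": int,
    XSD + "decimal": Decimal,
    XSD + "boolean": lambda s: s in ("true", "1"),
    XSD + "string": str,
}


def to_value(literal):
    """Parse a literal given as a pair (lexical string, datatype IRI)."""
    lexical, datatype = literal
    return PARSERS[datatype](lexical)


literals = [
    ("2020-03-29T20:00:00", XSD + "dateTime"),
    ("2020-03-22T12:00:00", XSD + "dateTime"),
]
values = sorted(to_value(literal) for literal in literals)
print([v.isoformat() for v in values])  # ['2020-03-22T12:00:00', '2020-03-29T20:00:00']
print(values[0].year)                   # 2020
</code></pre>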

		<h4 id="sssec-lexicalisation" class="subsection">Lexicalisation</h4>
		<p>Global identifiers for entities will sometimes have a human-interpretable form, such as <span class="gnode">chile:Santiago</span>, but the identifier strings themselves do not carry any formal semantic significance. In other cases, the identifiers used may not be human-interpretable by design. In Wikidata, for instance, Santiago de Chile is identified as <span class="gnode">wd:Q2887</span>, where such a scheme has the advantage of providing better persistence and of not being biased to a particular human language. As a real-world example, the Wikidata identifier for Eswatini (<span class="gnode">wd:Q1050</span>) was not affected when the country changed its name from Swaziland, and does not necessitate choosing between languages for creating (more readable) IRIs such as <span class="gnode">wd:Eswatini</span> (English), <span class="gnode">wd:eSwatini</span> (Swazi), <span class="gnode">wd:Esuatini</span> (Spanish), etc.</p>
		<p>Since identifiers can be arbitrary, it is common to add edges that provide a human-interpretable label for nodes, such as <span class="gnode">wd:Q2887</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">rdfs:label</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">“Santiago”</span>, indicating how people may refer to the subject node linguistically. Linguistic information of this form plays an important role in grounding knowledge such that users can more clearly identify which real-world entity a particular node in a knowledge graph actually references&nbsp;[<a href="#ref-Lexvo">de Melo, 2015</a>]; it further permits cross-referencing entity labels with text corpora to find, for example, documents that potentially speak of a given entity&nbsp;[<a href="#ref-IESW">Martínez-Rodríguez et al., 2020</a>]. Labels can be complemented with aliases (e.g., <span class="gnode">wd:Q2887</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">skos:altLabel</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">“Santiago&nbsp;de&nbsp;Chile”</span>) or comments (e.g., <span class="gnode">wd:Q2887</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">rdfs:comment</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">“Santiago&nbsp;is&nbsp;the&nbsp;capital&nbsp;of&nbsp;Chile”</span>) to further help ground the node’s identity.</p>
		<p>Nodes such as <span class="gnode">“Santiago”</span> denote string literals rather than identifiers. Depending on the specific graph model, such literal nodes may also be defined as a pair \((s,l)\), where \(s\) denotes the string and \(l\) a language code; in RDF, for example, we may state <span class="gnode">chile:City</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">rdfs:label</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">"City"@en</span>, <span class="gnode">chile:City</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">rdfs:label</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">"Ciudad"@es</span>, etc., indicating labels for the node in different languages. In other models, the pertinent language can rather be specified, e.g., via metadata on the edge. Knowledge graphs with human-interpretable labels, aliases, comments, etc., (in various languages) are sometimes called (<em>multilingual</em>) <em>lexicalised knowledge graphs</em>&nbsp;[<a href="#ref-BonattiDPP18">Bonatti et al., 2018</a>].</p>

		<h4 id="sssec-existential" class="subsection">Existential nodes</h4>
		<p>When modelling incomplete information, we may in some cases know that there must exist a particular node in the graph with particular relationships to other nodes, but without being able to identify the node in question. For example, we may have two co-located events <span class="gnode">chile:EID42</span> and <span class="gnode">chile:EID43</span> whose venue has yet to be announced. One option is to simply omit the venue edges, in which case we lose the information that these events have a venue and that both events have the same venue. Another option might be to create a fresh IRI representing the venue, but semantically this becomes indistinguishable from there being a known venue. Hence some graph models permit the use of existential nodes, represented here as a blank circle:</p>

		<p class="mathblock"><span class="gnode">chile:EID42</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">chile:venue</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><img class="tip" src="images/edge-revtip.png" width="15" alt="arrow tip leftward"/><span class="edge">chile:venue</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="gnode">chile:EID43</span></p>

		<p>These edges denote that there exists a common venue for <span class="gnode">chile:EID42</span> and <span class="gnode">chile:EID43</span> without identifying it. Existential nodes are supported in RDF as blank nodes&nbsp;[<a href="#ref-rdf11">Cyganiak et al., 2014</a>], which are also commonly used to support modelling complex elements in graphs, such as <em>RDF lists</em>&nbsp;[<a href="#ref-rdf11">Cyganiak et al., 2014</a>, <a href="#ref-HoganAMP14">Hogan et al., 2014</a>]. Figure&nbsp;<a href="#fig-list">3.7</a> exemplifies an RDF list, which uses blank nodes in a linked-list structure to encode order. Though existential nodes can be convenient, their presence can complicate operations on graphs, such as deciding if two data graphs have the same structure modulo existential nodes&nbsp;[<a href="#ref-rdf11">Cyganiak et al., 2014</a>, <a href="#ref-Hogan17">Hogan, 2017</a>]. Hence methods for <em>skolemising</em> existential nodes in graphs – replacing them with canonical labels – have been proposed&nbsp;[<a href="#ref-canon">Longley and Sporny, 2019</a>, <a href="#ref-Hogan17">Hogan, 2017</a>]. Other authors rather call to minimise the use of such nodes in graph data&nbsp;[<a href="#ref-ldbook">Heath and Bizer, 2011</a>].</p>
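		<p>To give a rough sense of why comparing graphs modulo existential nodes is costly, the following Python sketch tests whether two small graphs are equal up to a renaming of their blank nodes by trying every bijection between them. It is a naive illustration, not one of the canonicalisation methods cited above: graphs are assumed to be sets of string triples, blank nodes are assumed to be strings prefixed with <code>_:</code>, and the function names are hypothetical.</p>

```python
from itertools import permutations


def is_blank(term):
    # Assumption: existential (blank) nodes are strings prefixed with "_:".
    return term.startswith("_:")


def blank_nodes(graph):
    """Return the blank nodes of a graph in a fixed (sorted) order."""
    return sorted({t for edge in graph for t in edge if is_blank(t)})


def isomorphic(g1, g2):
    """Check whether g1 and g2 are equal up to renaming blank nodes.

    Tries every bijection between the blank nodes of g1 and g2, so the
    running time grows factorially with the number of blank nodes.
    """
    b1, b2 = blank_nodes(g1), blank_nodes(g2)
    if len(b1) != len(b2) or len(g1) != len(g2):
        return False
    for perm in permutations(b2):
        mapping = dict(zip(b1, perm))
        renamed = {tuple(mapping.get(t, t) for t in edge) for edge in g1}
        if renamed == set(g2):
            return True
    return False


shared = {("chile:EID42", "chile:venue", "_:v"),
          ("chile:EID43", "chile:venue", "_:v")}
renamed_copy = {("chile:EID42", "chile:venue", "_:x"),
                ("chile:EID43", "chile:venue", "_:x")}
separate = {("chile:EID42", "chile:venue", "_:a"),
            ("chile:EID43", "chile:venue", "_:b")}
```

		<p>Here <code>shared</code> and <code>renamed_copy</code> have the same structure, whereas <code>separate</code> does not, since it no longer states that both events have the same venue.</p>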

		<figure id="fig-list">
			<img src="images/fig-list.svg" alt="RDF list representing the three largest peaks of Chile, in order"/>
			<figcaption>RDF list representing the three largest peaks of Chile, in order</figcaption>
		</figure>
		</section>

		<section id="ssec-knowledgeContext" class="section">
		<h3>Context</h3>
		<p>Many (arguably <em>all</em>) facts presented in the data graph of Figure&nbsp;<a href="#fig-delg">2.1</a> can be considered true with respect to a certain <em>context</em>. With respect to <em>temporal context</em>, <span class="gnode">Santiago</span> has existed as a city since 1541, flights from <span class="gnode">Arica</span> to <span class="gnode">Santiago</span> began in 1956, etc. With respect to <em>geographic context</em>, the graph describes events in Chile. With respect to <em>provenance</em>, data relating to <span class="gnode">EID15</span> were taken from – and are thus said to be true with respect to – the Ñam webpage on January 4<sup>th</sup>, 2020. Other forms of context may also be used. We may further combine contexts, such as to indicate that <span class="gnode">Arica</span> is a Chilean city (<em>geographic</em>) since 1883 (<em>temporal</em>) per the Treaty of Ancón (<em>provenance</em>).</p>
		<p>By context we herein refer to the <em>scope of truth</em>, i.e., the context in which some data are held to be true&nbsp;[<a href="#ref-McCarthy93">McCarthy, 1993</a>, <a href="#ref-GuhaMF04">Guha et al., 2004</a>]. The graph of Figure&nbsp;<a href="#fig-delg">2.1</a> leaves much of its context implicit. However, making context explicit can allow for interpreting the data from different perspectives, such as to understand what held true in 2016, what holds true excluding webpages later found to have spurious data, etc. As seen previously, context for graph data may be considered at different levels: on individual nodes, individual edges, or sets of edges (sub-graphs). We now discuss various representations by which context can be made explicit at different levels.</p>

		<h4 id="sssec-direct-representation" class="subsection">Direct representation</h4>
		<p>The first way to represent context is to consider it as data no different from other data. For example, the dates for the event <span class="gnode">EID15</span> in Figure&nbsp;<a href="#fig-delg">2.1</a> can be seen as representing a form of temporal context, indicating the temporal scope within which edges such as <span class="gnode">EID15</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">venue</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santa&nbsp;Lucía</span> are held true. Another option is to change a relation represented as an edge, such as <span class="gnode">Santiago</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Arica</span>, into a node, such as seen in Figure&nbsp;<a href="#fig-fsa">2.3a</a>, allowing us to assign additional context to the relation. While in these examples context is represented in an ad hoc manner, a number of specifications have been proposed to represent context as data in a more standard way. One example is the <em>Time Ontology</em>&nbsp;[<a href="#ref-timeOnt">Cox et al., 2017</a>], which specifies how temporal entities, intervals, time instants, etc. – and relations between them such as <em>before</em>, <em>overlaps</em>, etc. – can be described in RDF graphs in an interoperable manner. Another example is the <em>PROV Data Model</em>&nbsp;[<a href="#ref-prov13">Gil et al., 2013</a>], which specifies how provenance can be described in RDF graphs, where entities (e.g., graphs, nodes, physical document) are derived from other entities, are generated and/or used by activities (e.g., extraction, authorship), and are attributed to agents (e.g., people, software, organisations).</p>

		<h4 id="sec-reify" class="subsection">Reification</h4>
		<p>Often we may wish to directly define the context of edges themselves; for example, we may wish to state that the edge <span class="gnode">Santiago</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Arica</span> is valid from 1956. While we could use the pattern of turning the edge into a node – as illustrated in Figure&nbsp;<a href="#fig-fsa">2.3a</a> – to directly represent such context, another option is to use <em>reification</em>, which allows for making statements about statements in a generic manner (or in the case of a graph, for defining edges about edges). In Figure&nbsp;<a href="#fig-temporal">3.8</a> we present three forms of reification that can be used for modelling temporal context on the aforementioned edge within a directed edge-labelled graph&nbsp;[<a href="#ref-HernandezHK15">Hernández et al., 2015</a>]. We use \(e\) to denote an arbitrary identifier representing the edge itself to which the context can be associated. Unlike in a direct representation, \(e\) represents an edge, not a flight. RDF reification&nbsp;[<a href="#ref-rdf11">Cyganiak et al., 2014</a>] (Figure&nbsp;<a href="#fig-reif">3.8a</a>) defines a new node <span class="gnode">\(e\)</span> to represent the edge and connects it to the source node (via <span class="gelab">subject</span>), target node (via <span class="gelab">object</span>), and edge label (via <span class="gelab">predicate</span>) of the edge. In contrast, \(n\)-ary relations&nbsp;[<a href="#ref-rdf11">Cyganiak et al., 2014</a>] (Figure&nbsp;<a href="#fig-nary">3.8b</a>) connect the source node of the edge directly to the edge node <span class="gnode">\(e\)</span> with the label of the edge; the target node of the edge is then connected to <span class="gnode">\(e\)</span> (via <span class="gelab">value</span>). 
Finally, singleton properties&nbsp;[<a href="#ref-Nguyen14">Nguyen et al., 2014</a>] (Figure&nbsp;<a href="#fig-singprop">3.8c</a>) rather use <span class="gelab">\(e\)</span> as an edge label, connecting it to a node indicating the original edge label (via <span class="gelab">singleton</span>). Other forms of reification have been proposed in the literature, including, for example, NdFluents&nbsp;[<a href="#ref-Gimenez-GarciaZ17">Giménez-García et al., 2017</a>]. In general, a reified edge does not assert the edge it reifies; for example, we may reify an edge to state that it is no longer valid. We refer to <a href="#ref-HernandezHK15">Hernández et al. [2015]</a> for further comparison of reification alternatives.</p>
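		<p>The three reification patterns described above can be sketched as functions that, given an edge and an identifier \(e\), produce the corresponding set of edges. This is a minimal Python illustration under stated assumptions: edges are plain triples of strings, the labels <code>subject</code>, <code>predicate</code>, <code>object</code>, <code>value</code> and <code>singleton</code> follow the text, and the label <code>valid from</code> and the function names are hypothetical.</p>

```python
def rdf_reification(s, p, o, e):
    # e is a node linked to the source, edge label and target of the edge.
    return {(e, "subject", s), (e, "predicate", p), (e, "object", o)}


def nary_relation(s, p, o, e):
    # The source points to e with the original label; e points to the target.
    return {(s, p, e), (e, "value", o)}


def singleton_property(s, p, o, e):
    # e is used as the edge label and linked to the original label.
    return {(s, e, o), (e, "singleton", p)}


def with_context(edges, e, label, value):
    """Attach one contextual edge to the identifier e."""
    return edges.union({(e, label, value)})


flight = ("Santiago", "flight", "Arica")
reified = with_context(rdf_reification(*flight, "e"), "e", "valid from", "1956")
nary = with_context(nary_relation(*flight, "e"), "e", "valid from", "1956")
singleton = with_context(singleton_property(*flight, "e"), "e", "valid from", "1956")
```

		<p>Note that none of the three resulting sets contains the original edge itself, in line with the observation that a reified edge does not assert the edge it reifies.</p>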

		<figure id="fig-temporal">
			<figure id="fig-reif" style="display:inline-block;margin-right:2.5em;margin-left:0;">
				<img src="images/fig-reif.svg" alt="RDF Reification"/>
				<figcaption>RDF Reification</figcaption>
			</figure>
			<figure id="fig-nary" style="display:inline-block;">
				<img src="images/fig-nary.svg" alt="n-ary Relations"/>
				<figcaption>\(n\)-ary Relations</figcaption>
			</figure>
			<figure id="fig-singprop" style="display:inline-block;margin-right:0;margin-left:2em;">
				<img src="images/fig-singprop.svg" alt="Singleton properties"/>
				<figcaption>Singleton properties</figcaption>
			</figure>
			<figcaption>Three representations of temporal context on a directed labelled edge</figcaption>
		</figure>

		<h4 id="sssec-higher-arity" class="subsection">Higher-arity representation</h4>
		<p>As an alternative to reification, we can rather use higher-arity representations for modelling context. Taking again the edge <span class="gnode">Santiago</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Arica</span>, Figure&nbsp;<a href="#fig-temporal2">3.9</a> illustrates three higher-arity representations of temporal context. First, we can use a named graph (Figure&nbsp;<a href="#fig-ngraph">3.9a</a>) to contain the edge and then define the temporal context on the graph name. Second, we can use a property graph (Figure&nbsp;<a href="#fig-pgc">3.9b</a>) where the temporal context is defined as a property on the edge. Third, we can use <em>RDF*</em>&nbsp;[<a href="#ref-Hartig17">Hartig, 2017</a>] (Figure&nbsp;<a href="#fig-rdfstar">3.9c</a>): an extension of RDF that allows edges to be defined as nodes. Amongst these options, the most flexible is the named graph representation, where we can assign context to multiple edges at once by placing them in one named graph; for example, we can add more edges to the named graph of Figure&nbsp;<a href="#fig-ngraph">3.9a</a> that are also valid from 1956. 
The least flexible option is RDF*, which, in the absence of an edge id, does not permit different groups of contextual values to be assigned to an edge; for example, if we add four contextual values to the edge <span class="gnode">Chile</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">president</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">M.&nbsp;Bachelet</span>, to state that it was valid from 2006 until 2010 and valid from 2014 until 2018, we cannot pair the values, but may rather have to create a node to represent different presidencies (in the other models, we could have used two named graphs or edge ids).</p>
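		<p>The pairing problem described above can be made concrete with a small Python sketch. It assumes a simplified encoding that is not taken from any specification: an RDF*-style graph is modelled as triples whose subject may itself be an edge, named graphs are modelled as quads plus context on the graph names, and the labels <code>valid from</code> and <code>valid until</code> and the graph names <code>G1</code> and <code>G2</code> are hypothetical.</p>

```python
EDGE = ("Chile", "president", "M. Bachelet")

# RDF*-style: the edge itself is the subject of the contextual edges.
rdf_star = {
    (EDGE, "valid from", 2006), (EDGE, "valid until", 2010),
    (EDGE, "valid from", 2014), (EDGE, "valid until", 2018),
}

# Named graphs: each quad places the edge in a graph; context is on the name.
quads = {EDGE + ("G1",), EDGE + ("G2",)}
graph_context = {
    ("G1", "valid from", 2006), ("G1", "valid until", 2010),
    ("G2", "valid from", 2014), ("G2", "valid until", 2018),
}


def values(context, subject, label):
    """Sorted contextual values with the given subject and label."""
    return sorted(v for (s, p, v) in context if s == subject and p == label)


def intervals_from_named_graphs(edge):
    """Each named graph containing the edge yields one paired interval."""
    result = []
    for quad in sorted(quads):
        if quad[:3] == edge:
            name = quad[3]
            start = values(graph_context, name, "valid from")[0]
            end = values(graph_context, name, "valid until")[0]
            result.append((start, end))
    return result


def candidate_intervals_from_rdf_star(edge):
    """Without an edge id, every start may pair with every end."""
    starts = values(rdf_star, edge, "valid from")
    ends = values(rdf_star, edge, "valid until")
    return [(s, e) for s in starts for e in ends]
```

		<p>With named graphs the two presidencies are recovered as \((2006,2010)\) and \((2014,2018)\); with the RDF*-style encoding the four values yield four candidate pairs, including the unintended \((2006,2018)\) and \((2014,2010)\).</p>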

		<figure id="fig-temporal2">
			<figure id="fig-ngraph" style="display:inline-block;margin-right:2.5em;margin-left:0;">
				<img src="images/fig-ngraph.svg" alt="Named graph"/>
				<figcaption>Named graph</figcaption>
			</figure>
			<figure id="fig-pgc" style="display:inline-block;">
				<img src="images/fig-pgc.svg" alt="Property graph"/>
				<figcaption>Property graph</figcaption>
			</figure>
			<figure id="fig-rdfstar" style="display:inline-block;margin-right:0;margin-left:2em;">
				<img src="images/fig-rdfstar.svg" alt="RDF*"/>
				<figcaption>RDF*</figcaption>
			</figure>
			<figcaption>Three higher-arity representations of temporal context on an edge</figcaption>
		</figure>

		<h4 id="sssec-annotations" class="subsection">Annotations</h4>
		<p>Thus far, we have discussed representing context in a graph, but we have not spoken about automated mechanisms for reasoning about context; for example, if there are only seasonal summer flights from <span class="gnode">Santiago</span> to <span class="gnode">Arica</span>, we may wish to find other routes from Santiago for winter events taking place in <span class="gnode">Arica</span>. While the dates for buses, flights, etc., can be represented directly in the graph, or using reification, writing a query to manually intersect the corresponding temporal contexts will be difficult. An alternative is to consider <em>annotations</em> that provide mathematical definitions of a contextual domain and key operations over that domain that can be applied automatically.</p>
		<p>Some annotations model a particular contextual domain; for example, <em>Temporal RDF</em>&nbsp;[<a href="#ref-GutierrezHV07">Gutiérrez et al., 2007</a>] allows for annotating edges with time intervals, such as <span class="gnode">Chile</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="stack"><span class="edge">president</span><br/><span class="edge">[2006,2010]</span></span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">M.&nbsp;Bachelet</span>, while <em>Fuzzy RDF</em>&nbsp;[<a href="#ref-Straccia09">Straccia, 2009</a>] allows for annotating edges with a degree of truth such as <span class="gnode">Santiago</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="stack"><span class="edge">climate</span><br/><span class="edge">0.8</span></span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Semi-Arid</span>, indicating that it is more-or-less true – with a degree of \(0.8\) – that Santiago has a semi-arid climate.</p>
		<p>Other forms of annotation are domain-independent; for example, <em>Annotated RDF</em>&nbsp;[<a href="#ref-Dividino09">Dividino et al., 2009</a>, <a href="#ref-UdreaRS10">Udrea et al., 2010</a>, <a href="#ref-zimm-etal-2012-JWS">Zimmermann et al., 2012</a>] allows for representing context modelled as <em>semi-rings</em>: algebraic structures consisting of domain values (e.g., temporal intervals, fuzzy values, etc.) and two operators to combine domain values: <em>meet</em> and <em>join</em>.<sup class="fnmark" id="fnm7"><a href="#fn7">7</a></sup><span class="footnote" id="fn7"><sup><a href="#fnm7">note 7</a></sup> The <em>join</em> operator for annotations is different from the join operator for relational algebra.</span> We provide an example in Figure&nbsp;<a href="#fig-time">3.10</a>, where \(G\) is annotated with values from a temporal domain using sets of integers (\(1{-}365\)) to represent days of the year. For brevity we use intervals, where, e.g., \(\{[150,152]\}\) denotes the set \(\{150,151,152\}\). Query \(Q\) then asks for flights from Santiago to cities with events; this query will check and return an annotation reflecting the temporal validity of each answer. To derive these answers, we require a conjunction of annotations on compatible <span class="gelab">flight</span> and <span class="gelab">city</span> edges, using the <em>meet operator</em> to compute the annotation for which both edges hold. The natural way to define meet here is as the intersection of sets of days, where, for example, applying meet on the event annotation \(\color{blue}\{[150,152]\}\) and the flight annotation \(\color{blue}\{[1,120],[220,365]\}\) for <span class="gnode">Punta Arenas</span> leads to the empty time interval \(\color{blue}\{\}\), which may thus lead to the city being filtered from the results (depending on the query evaluation semantics). 
However, for <span class="gnode">Arica</span>, we find two different non-empty intersections: \(\color{blue}\{[123,125]\}\) for <span class="gnode">EID16</span> and \(\color{blue}\{[276,279]\}\) for <span class="gnode">EID17</span>. Given that we are interested in just the city (a projected variable), we can combine the two annotations for <span class="gnode">Arica</span> using the <em>join operator</em>, returning the annotation in which either result holds true. The natural way to define join is as the union of the sets of days, giving \(\color{blue}\{[123,125],[276,279]\}\).</p>
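		<p>The meet and join computations in this example can be reproduced with a short Python sketch that represents each annotation as a set of days. The helper names are hypothetical, and only the annotation values quoted in the text are used; the flight annotation for <span class="gnode">Arica</span> appears only in the figure and is therefore not reproduced here.</p>

```python
def days(*intervals):
    """Expand closed intervals such as (150, 152) into a set of days."""
    return frozenset(d for lo, hi in intervals for d in range(lo, hi + 1))


def meet(a, b):
    # Days on which both annotated edges hold.
    return a.intersection(b)


def join(a, b):
    # Days on which at least one of the results holds.
    return a.union(b)


# Punta Arenas: the event and the flights never coincide.
event_punta_arenas = days((150, 152))
flight_punta_arenas = days((1, 120), (220, 365))
punta_arenas = meet(event_punta_arenas, flight_punta_arenas)

# Arica: combine the two non-empty intersections found for EID16 and EID17.
arica_eid16 = days((123, 125))
arica_eid17 = days((276, 279))
arica = join(arica_eid16, arica_eid17)
```

		<p>Here <code>punta_arenas</code> is the empty set, while <code>arica</code> contains the seven days of \(\{[123,125],[276,279]\}\).</p>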

		<figure id="fig-time">
			<img src="images/fig-time1.svg" alt="Temporally annotated graph" class="multi"/>
			<img src="images/fig-time2.svg" alt="Example query"/>
			<div><div style="display:inline;">\(Q(G) :\) <table class="condensedTable" style="position:relative;top:.6em;display:inline-block;vertical-align:middle;"><thead><tr><th>?city</th><th>context</th></tr></thead><tbody><tr><td><code>Arica</code></td><td>\(\color{blue}\{[123,125],[276,279]\}\)</td></tr></tbody></table></div></div>
			<figcaption>Example query on a temporally annotated graph</figcaption>
		</figure>

		<div class="formal">
			<p>We define an annotation domain per <a href="#ref-zimm-etal-2012-JWS">Zimmermann et al. [2012]</a>.</p>

			<dl class="definition" id="def-anndom">
				<dt>Annotation domain</dt>
				<dd>Let \(A\) be a set of <em>annotation values</em>. An <em>annotation domain</em> is an idempotent, commutative semi-ring \(D = \langle A,\oplus,\otimes,\bot,\top \rangle\).</dd>
			</dl>

			<p>This definition can then instantiate specific domains of context.</p>
			<p>Letting \(D\) be a semi-ring imposes that, for any values \(a, a_1, a_2, a_3\) in \(A\), the following hold:</p>
			<ul>
				<li>\((a_1 \oplus a_2) \oplus a_3 = a_1 \oplus (a_2 \oplus a_3)\)</li>
				<li>\((\bot \oplus a) = (a \oplus \bot) = a\)</li>
				<li>\((a_1 \oplus a_2) = (a_2 \oplus a_1)\)</li>
				<li>\((a_1 \otimes a_2) \otimes a_3 = a_1 \otimes (a_2 \otimes a_3)\)</li>
				<li>\((\top \otimes a) = (a \otimes \top) = a\)</li>
				<li>\(a_1 \otimes (a_2 \oplus a_3) = (a_1 \otimes a_2) \oplus (a_1 \otimes a_3)\)</li>
				<li>\((a_1 \oplus a_2) \otimes a_3 = (a_1 \otimes a_3) \oplus (a_2 \otimes a_3)\)</li>
				<li>\((\bot \otimes a) = (a \otimes \bot) = \bot\)</li>
			</ul>
			<p>The requirement that it be idempotent further imposes the following:</p>
			<ul>
				<li>\((a \oplus a) = a\)</li>
			</ul>
			<p>Finally, the requirement that it be commutative imposes the following:</p>
			<ul>
				<li>\((a_1 \otimes a_2) = (a_2 \otimes a_1)\)</li>
			</ul>
			<p>Idempotence induces a partial order: \(a_1 \leq a_2\) if and only if \(a_1 \oplus a_2 = a_2\). Imposing these conditions on the annotation domain allows for reasoning and querying to be conducted over the annotation domain in a well-defined manner. Annotated graphs can then be defined in the natural way:</p>

			<dl class="definition" id="def-annotated-directed-edge-labelled-graph">
				<dt>Annotated directed edge-labelled graph</dt>
				<dd>Letting \(D = \langle A,\oplus,\otimes,\bot,\top \rangle\) denote an idempotent, commutative semi-ring, we define an <em>annotated directed edge-labelled graph</em> (or <em>annotated graph</em>) as \(G = (V,E_A,L)\) where \(V \subseteq \con\) is a set of nodes, \(L \subseteq \con\) is a set of edge labels, and \(E_A \subseteq V \times L \times V \times A\) is a set of edges annotated with values from \(A\).</dd>
			</dl>

			<div class="example">
				<p>Figure&nbsp;<a href="#fig-time">3.10</a> exemplifies query answering on a graph annotated with days of the year. Formally this domain can be defined as follows: \(A \coloneqq 2^{\mathbb{N}_{[1,365]}}\), \(\oplus \coloneqq \cup\), \(\otimes \coloneqq \cap\), \(\bot \coloneqq \emptyset\), \(\top \coloneqq \mathbb{N}_{[1,365]}\), where one may verify that \(D = \langle 2^{\mathbb{N}_{[1,365]}}, \cup, \cap, \emptyset, \mathbb{N}_{[1,365]} \rangle\) is indeed an idempotent, commutative semi-ring.</p>
			</div>
		</div>
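		<p>The conditions listed above can be checked mechanically on a small instance. The following Python sketch verifies them by brute force for sets of days with union as \(\oplus\) and intersection as \(\otimes\); as an assumption made purely to keep the check tractable, the universe is reduced from 365 days to three. The function names are hypothetical.</p>

```python
from itertools import combinations, product


def powerset(universe):
    """All subsets of the universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_annotation_domain(A, plus, times, bot, top):
    """Brute-force check of the idempotent, commutative semi-ring conditions."""
    for a in A:
        if plus(bot, a) != a or plus(a, bot) != a:        # identity of plus
            return False
        if times(top, a) != a or times(a, top) != a:      # identity of times
            return False
        if times(bot, a) != bot or times(a, bot) != bot:  # annihilation
            return False
        if plus(a, a) != a:                               # idempotence
            return False
    for a1, a2 in product(A, repeat=2):
        if plus(a1, a2) != plus(a2, a1):                  # plus commutes
            return False
        if times(a1, a2) != times(a2, a1):                # times commutes
            return False
    for a1, a2, a3 in product(A, repeat=3):
        if plus(plus(a1, a2), a3) != plus(a1, plus(a2, a3)):
            return False
        if times(times(a1, a2), a3) != times(a1, times(a2, a3)):
            return False
        if times(a1, plus(a2, a3)) != plus(times(a1, a2), times(a1, a3)):
            return False
        if times(plus(a1, a2), a3) != plus(times(a1, a3), times(a2, a3)):
            return False
    return True


def leq(a1, a2, plus):
    """Partial order induced by idempotence: a1 precedes a2 iff a1 + a2 = a2."""
    return plus(a1, a2) == a2


DAYS = frozenset(range(1, 4))  # reduced universe: days 1 to 3 only
A = powerset(DAYS)
union = frozenset.union
intersection = frozenset.intersection
```

		<p>Calling <code>is_annotation_domain(A, union, intersection, frozenset(), DAYS)</code> succeeds, whereas exchanging the last two arguments fails, since the full set of days is not an identity for union. For this domain the induced order coincides with set inclusion.</p>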

		<h4 id="sssec-other-context" class="subsection">Other contextual frameworks</h4>
		<p>Other frameworks have been proposed for modelling and reasoning about context in graphs. A notable example is that of <em>contextual knowledge repositories</em>&nbsp;[<a href="#ref-SerafiniH12">Serafini and Homola, 2012</a>], which allow for assigning individual (sub-)graphs to their own context. Unlike in the case of named graphs, context is explicitly modelled along one or more dimensions, where each (sub-)graph takes a value for each dimension. Each dimension is associated with a partial order over its values – e.g., <span class="gnode">2020-03-22</span> \(\preceq\) <span class="gnode">2020-03</span> \(\preceq\) <span class="gnode">2020</span> – enabling the selection and combination of sub-graphs that are valid within contexts at different granularities. <a href="#ref-SchuetzBNSS20">Schuetz et al. [2021]</a> similarly propose a form of contextual OnLine Analytic Processing (OLAP), based on a data cube formed by dimensions where each cell contains a knowledge graph. Operations such as “<em>slice-and-dice</em>” (selecting knowledge according to given dimensions), as well as “<em>roll-up</em>” (aggregating knowledge at a higher level) are supported. We refer the reader to the respective papers for more details&nbsp;[<a href="#ref-SerafiniH12">Serafini and Homola, 2012</a>, <a href="#ref-SchuetzBNSS20">Schuetz et al., 2021</a>].</p>
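		<p>The role of the partial order can be illustrated with a deliberately simplified Python sketch. It assumes a single temporal dimension whose values are ISO-style date strings, so that the order reduces to a prefix test; the repository contents, the edge labels in it, and the function names are hypothetical, and the cited frameworks are considerably richer than this.</p>

```python
def within(value, context):
    """True if value precedes or equals context, e.g. 2020-03-22 and 2020-03."""
    return value == context or value.startswith(context + "-")


def select(repository, context):
    """Union of all sub-graphs whose time value falls within the context."""
    selected = set()
    for value, graph in repository.items():
        if within(value, context):
            selected.update(graph)
    return selected


# Hypothetical repository: one sub-graph per value of the time dimension.
repository = {
    "2020-03-22": {("EID20", "venue", "Santa Lucía")},
    "2020-03": {("EID21", "city", "Arica")},
    "2019": {("EID19", "city", "Santiago")},
}

march_2020 = select(repository, "2020-03")
year_2020 = select(repository, "2020")
```

		<p>Selecting at the coarser granularity <code>2020</code> returns the same two edges as <code>2020-03</code> in this toy repository, while the sub-graph for <code>2019</code> is excluded in both cases.</p>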
		</section>
	</section>
	<section id="chap-deductive" class="chapter">
		<h2>Deductive Knowledge</h2>
		<p>As humans, we can <em>deduce</em> more from the data graph of Figure&nbsp;<a href="#fig-delg">2.1</a> than what the edges explicitly indicate. We may deduce, for example, that the Ñam festival (<span class="gnode">EID15</span>) will be located in Santiago, even though the graph does not contain an edge <span class="gnode">EID15</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">location</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santiago</span>. We may further deduce that the cities connected by flights must have some airport nearby, even though the graph does not contain nodes referring to these airports. In these cases, given the data as premises, and some general rules about the world that we may know <em>a priori</em>, we can use a deductive process to derive new data, allowing us to know more than what is explicitly given by the data. These types of general premises and rules, when shared by many people, form part of “<em>commonsense knowledge</em>”&nbsp;[<a href="#ref-Commonsense">McCarthy, 1990</a>]; conversely, when rather shared by a few experts in an area, they form part of “<em>domain knowledge</em>”, where, for example, an expert in biology may know that <em>hemocyanin</em> is a protein containing copper that carries oxygen in the blood of some species of <em>Mollusca</em> and <em>Arthropoda</em>.</p>
		<p>Machines, in contrast, do not have <em>a priori</em> access to such deductive faculties; rather they need to be given formal instructions, in terms of premises and <em>entailment regimes</em>, facilitating similar deductions to what a human can make. In this way, we will be making more of the meaning (i.e., <em>semantics</em>) of the graph explicit in a machine-readable format. These entailment regimes formalise the conclusions that logically follow as a consequence of a given set of premises. Once instructed in this manner, machines can (often) apply deductions with a precision, efficiency, and scale beyond human performance. These deductions may serve a range of applications, such as improving query answering, (deductive) classification, finding inconsistencies, etc. As a concrete example involving query answering, assume we are interested in knowing <em>the festivals located in Santiago</em>; we may straightforwardly express such a query as per the graph pattern shown in Figure&nbsp;<a href="#fig-bgpFS">4.1</a>. This query returns no results for the graph in Figure&nbsp;<a href="#fig-delg">2.1</a>: there is no node named <span class="gnode">Festival</span>, and nothing has (directly) the <span class="gelab">location</span> <span class="gnode">Santiago</span>. However, an answer (<span class="gnode">Ñam</span>) could be automatically entailed were we to state that \(x\) being a Food Festival <em>entails</em> that \(x\) is a Festival, or that \(x\) having venue \(y\) in city \(z\) <em>entails</em> that \(x\) has location \(z\). How, then, should such entailments be captured? In Section&nbsp;<a href="#sec-semSchema">3.1.1</a> we already discussed how the former entailment can be captured with sub-class relations in a semantic schema; the second entailment, however, requires a more expressive entailment regime than seen thus far.</p>

		<figure id="fig-bgpFS">
			<img src="images/fig-bgpFS.svg" alt="Graph pattern querying for names of festivals in Santiago"/>
			<figcaption>Graph pattern querying for names of festivals in Santiago</figcaption>
		</figure>
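		<p>The two entailments mentioned above can be sketched as a small forward-chaining procedure in Python. This is only an illustration under stated assumptions: the graph is a hand-picked fragment assumed to match Figure&nbsp;<a href="#fig-delg">2.1</a>, the two rules are hard-coded rather than expressed in an ontology language, and the function names are hypothetical.</p>

```python
# Assumed fragment of the data graph, given as (source, label, target).
graph = {
    ("EID15", "type", "Food Festival"),
    ("EID15", "name", "Ñam"),
    ("EID15", "venue", "Santa Lucía"),
    ("Santa Lucía", "city", "Santiago"),
}


def entail(graph):
    """Apply the two rules repeatedly until no new edge is derived."""
    g = set(graph)
    while True:
        new = set()
        for (x, p, y) in g:
            # Rule 1: x being a Food Festival entails that x is a Festival.
            if p == "type" and y == "Food Festival":
                new.add((x, "type", "Festival"))
            # Rule 2: x having venue y in city z entails x has location z.
            if p == "venue":
                for (y2, q, z) in g:
                    if y2 == y and q == "city":
                        new.add((x, "location", z))
        if new.issubset(g):
            return g
        g.update(new)


def festivals_in(g, city):
    """Names of nodes typed Festival whose location is the given city."""
    return {n for (x, p, n) in g
            if p == "name"
            and (x, "type", "Festival") in g
            and (x, "location", city) in g}
```

		<p>On the original edges, <code>festivals_in(graph, "Santiago")</code> returns no results; on <code>entail(graph)</code> it returns the name <code>Ñam</code>.</p>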

		<p>In this chapter, we discuss ways in which more complex entailments can be expressed and automated. Though we could leverage a number of logical frameworks for these purposes – such as First-Order Logic, Datalog, Prolog, Answer Set Programming, etc. – we focus on <em>ontologies</em>, which constitute a formal representation of knowledge that, importantly for us, can be represented as a graph. We then discuss how these ontologies can be formally defined, how they relate to existing logical frameworks, and how reasoning can be conducted with respect to such ontologies.</p>

		<section id="ssec-ontologies" class="section">
		<h3>Ontologies</h3>
		<p>To enable entailment, we must be precise about what the terms we use mean. Returning to Figure&nbsp;<a href="#fig-delg">2.1</a>, for example, and examining the node <span class="gnode">EID16</span> more closely, we may begin to question how it is modelled, particularly in comparison with <span class="gnode">EID15</span>. Both nodes – according to the class hierarchy of Figure&nbsp;<a href="#fig-classhier">3.1</a> – are considered to be events. But what if, for example, we wish to define two pairs of start and end dates for <span class="gnode">EID16</span> corresponding to the different venues? Should we rather consider what takes place in each venue as a different event? What then if an event has various start and end dates in a single venue: would these also be considered as one (recurring) event, or many events? These questions are facets of a more general question: <em>what precisely do we mean by an “event”</em>? Does it happen in one contiguous time interval or can it happen many times? Does it happen in one place or can it happen in multiple? There are no “correct” answers to such questions – we may understand the term “event” in a variety of ways, and thus the answers are a matter of <em>convention</em>.</p>
		<p>In the context of computing, an <em>ontology</em><sup class="fnmark" id="fnm8"><a href="#fn8">8</a></sup><span class="footnote" id="fn8"><sup><a href="#fnm8">note 8</a></sup> The term stems from the philosophical study of <em>ontology</em>, concerning the kinds of entities that exist, the nature of their existence, what kinds of properties they have, and how they may be identified and categorised.</span> is then a concrete, formal representation of what terms mean within the scope in which they are used (e.g., a given domain). For example, one event ontology may formally define that if an entity is an “event”, then it has precisely one venue and precisely one time instant in which it begins. Conversely, a different event ontology may define that an “event” can have multiple venues and multiple start times, etc. Each such ontology formally captures a particular perspective – a particular <em>convention</em>. Under the first ontology, for example, we could not call the Olympics an “event”, while under the second ontology we could. Likewise ontologies can guide how graph data are modelled. Under the first ontology we may split <span class="gnode">EID16</span> into two events. Under the second, we may elect to keep <span class="gnode">EID16</span> as one event with two venues. Ultimately, given that ontologies are formal representations, they can be used to automate entailment.</p>
		<p>Like all conventions, the usefulness of an ontology depends on the level of agreement on what that ontology defines, how detailed it is, and how broadly and consistently it is adopted. Adoption of an ontology by the parties involved in one knowledge graph may lead to a consistent use of terms and consistent modelling in that knowledge graph. Agreement over multiple knowledge graphs will, in turn, enhance the interoperability of those knowledge graphs.</p>
		<p>Amongst the most popular ontology languages used in practice are the <em>Web Ontology Language</em> (<em>OWL</em>)&nbsp;[<a href="#ref-OWL2">Hitzler et al., 2012</a>]<sup class="fnmark" id="fnm9"><a href="#fn9">9</a></sup><span class="footnote" id="fn9"><sup><a href="#fnm9">note 9</a></sup> We could include RDF Schema (RDFS) in this list, but it is largely subsumed by OWL, which extends its core.</span>, recommended by the W3C and compatible with RDF graphs; and the <em>Open Biomedical Ontologies Format</em> (<em>OBOF</em>)&nbsp;[<a href="#ref-obof">Mungall et al., 2012</a>], used mostly in the biomedical domain. Since OWL is the more widely adopted, we focus on its features, though many similar features are found in both&nbsp;[<a href="#ref-obof">Mungall et al., 2012</a>]. Before introducing such features, however, we must discuss how graphs are to be <em>interpreted</em>.</p>

		<h4 id="sssec-interpretations" class="subsection">Interpretations and models</h4>
		<p>We as humans may <em>interpret</em> the node <span class="gnode">Santiago</span> in the data graph of Figure&nbsp;<a href="#fig-delg">2.1</a> as referring to the real-world city that is the capital of Chile. We may further <em>interpret</em> an edge <span class="gnode">Arica</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santiago</span> as stating that there are flights from the city of Arica to this city. We thus interpret the data graph as another graph – what we here call the <em>domain graph</em> – composed of real-world entities connected by real-world relations. The process of interpretation, here, involves <em>mapping</em> the nodes and edges in the data graph to nodes and edges of the domain graph.</p>
		<p>Along these lines, we can abstractly define an <em>interpretation</em> of a data graph as being composed of two elements: a domain graph, and a mapping from the <em>terms</em> (nodes and edge-labels) of the data graph to those of the domain graph. The domain graph follows the same model as the data graph; for example, if the data graph is a directed edge-labelled graph, then so too will be the domain graph. For simplicity, we will speak of directed edge-labelled graphs and refer to the nodes of the domain graph as <em>entities</em>, and to its edges as <em>relations</em>. Given a data graph and an interpretation, while we denote nodes in the data graph by <span class="gnode">Santiago</span>, we will denote the entity it refers to in the domain graph by <span class="ginode">Santiago</span> (per the mapping of the given interpretation). Likewise, while we denote an edge by <span class="gnode">Arica</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santiago</span>, we will denote the relation by <span class="ginode">Arica</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">flight</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">Santiago</span> (again, per the mapping of the given interpretation). In this abstract notion of an interpretation, we do not require that <span class="ginode">Santiago</span> or <span class="ginode">Arica</span> be the real-world cities, nor even that the domain graph contain real-world entities and relations: an interpretation can have any domain graph and mapping.</p>
		<p>Why is such an abstract notion of interpretation useful? The distinction between nodes/edges and entities/relations becomes important when we define the meaning of ontology features and entailment. To illustrate this distinction, if we ask whether there is an edge labelled <span class="gelab">flight</span> between <span class="gnode">Arica</span> and <span class="gnode">Viña&nbsp;del&nbsp;Mar</span> for the data graph in Figure&nbsp;<a href="#fig-delg">2.1</a>, the answer is <em>no</em>. However, if we ask if the entities <span class="ginode">Arica</span> and <span class="ginode">Viña&nbsp;del&nbsp;Mar</span> are connected by the relation <span class="gielab">flight</span>, then the answer depends on what assumptions we make when interpreting the graph. Under the Closed World Assumption (CWA), if we do not have additional knowledge, then the answer is a definite <em>no</em> – since what is not known is assumed to be false. Conversely, under the Open World Assumption (OWA), we cannot be certain that this relation does not exist as this could be part of some knowledge not (yet) described by the graph. Likewise under the Unique Name Assumption (UNA), the data graph describes <em>at least two</em> flights to <span class="ginode">Santiago</span> (since <span class="ginode">Viña&nbsp;del&nbsp;Mar</span> and <span class="ginode">Arica</span> are assumed to be different entities and, therefore, <span class="ginode">Arica</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">flight</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">Santiago</span> and <span class="ginode">Viña&nbsp;del&nbsp;Mar</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">flight</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">Santiago</span> must be different edges). 
Conversely, under the No Unique Name Assumption (NUNA), we can only say that there is <em>at least one</em> such flight, since <span class="ginode">Viña&nbsp;del&nbsp;Mar</span> and <span class="ginode">Arica</span> may be the same entity with two “names”.</p>
		<p>These assumptions (or lack thereof) define which interpretations are valid, and which interpretations <em>satisfy</em> which data graphs. We call an interpretation that satisfies a data graph a <em>model</em> of that data graph. The UNA forbids interpretations that map two data terms to the same domain term. The NUNA allows such interpretations. Under the CWA, an interpretation that contains an edge <span class="ginode">x</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">p</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">y</span> in its domain graph can only satisfy a data graph from which we can entail <span class="gnode">x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">p</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">y</span>. Under the OWA, an interpretation containing the edge <span class="ginode">x</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">p</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">y</span> can satisfy a data graph not entailing <span class="gnode">x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">p</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">y</span> so long as it does not explicitly contradict that edge. OWL adopts the NUNA and OWA, which is the most general case: multiple nodes/edge-labels in the graph may refer to the same entity/relation-type (per the NUNA), and anything not entailed by the data graph is <em>not</em> assumed to be false as a consequence (per the OWA).</p>
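		<p>The effect of these assumptions on the two questions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the edge set contains just the two <span class="gelab">flight</span> edges named in the text, “entailed” is simplified to “explicitly present”, and the names <code>Answer</code>, <code>holds</code> and <code>min_distinct_flights</code> are hypothetical helpers rather than part of any standard.</p>

```python
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"


# Only the two flight edges named in the text; the full data graph has more.
EDGES = {
    ("Arica", "flight", "Santiago"),
    ("Viña del Mar", "flight", "Santiago"),
}


def holds(edges, edge, closed_world):
    """Answer whether a relation holds, given the edges we know about.

    Simplification: "entailed" is reduced to "explicitly present".
    """
    if edge in edges:
        return Answer.YES
    # CWA: what is not known is false. OWA: it is merely unknown.
    return Answer.NO if closed_world else Answer.UNKNOWN


def min_distinct_flights(edges, target, unique_names):
    """Lower bound on the number of distinct flight relations into target."""
    sources = {s for (s, p, o) in edges if p == "flight" and o == target}
    if not sources:
        return 0
    # UNA: distinct names denote distinct entities, so each source counts.
    # NUNA: all names might denote one entity, so only one flight is certain.
    return len(sources) if unique_names else 1


query = ("Arica", "flight", "Viña del Mar")
print(holds(EDGES, query, closed_world=True).value)    # no
print(holds(EDGES, query, closed_world=False).value)   # unknown
print(min_distinct_flights(EDGES, "Santiago", unique_names=True))   # 2
print(min_distinct_flights(EDGES, "Santiago", unique_names=False))  # 1
```

		<p>The sketch returns a third answer, <em>unknown</em>, under the OWA because the absence of an edge is then not evidence that the relation fails to hold.</p>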

		<div class="formal">
			<p>A graph interpretation – or simply interpretation – captures the assumptions under which the semantics of a graph can be defined. We define interpretations for directed edge-labelled graphs, though the notion extends naturally to other graph models (assuming the data and domain graphs follow the same model).</p>

			<dl class="definition" id="def-graph-interpretation">
				<dt>Graph interpretation</dt>
				<dd>A <em>(graph) interpretation</em> \(I\) is defined as a pair \(I \coloneqq (\Gamma,\inp{\cdot})\) where \(\Gamma = (V_\Gamma,E_\Gamma,L_\Gamma)\) is a (directed edge-labelled) graph called the <em>domain graph</em> and \(\inp{\cdot} : \con \rightarrow V_\Gamma \cup L_\Gamma\) is a partial mapping from constants to terms in the domain graph. </dd>
			</dl>

			<p>We denote the domain of the mapping \(\inp{\cdot}\) by \(\textrm{dom}(\inp{\cdot})\). For interpretations under the UNA, the mapping \(\inp{\cdot}\) is required to be injective, while with no UNA (NUNA), no such requirement is necessary.</p>
			<p>Interpretations that <em>satisfy</em> a graph are then said to be <em>models</em> of that graph.</p>

			<dl class="definition" id="def-gmodel">
				<dt>Graph models</dt>
				<dd>Let \(G \coloneqq (V,E,L)\) be a directed edge-labelled graph. An interpretation \(I \coloneqq (\Gamma,\inp{\cdot})\) <em>satisfies</em> \(G\) if and only if the following hold:
				<ul>
					<li>\(V \cup L \subseteq \textrm{dom}(\inp{\cdot})\);</li>
					<li>for all \(v \in V\), it holds that \(\inp{v} \in V_\Gamma\);</li>
					<li>for all \(l \in L\), it holds that \(\inp{l} \in L_\Gamma\); and</li>
					<li>for all \((u,l,v) \in E\), it holds that \((\inp{u},\inp{l},\inp{v}) \in E_\Gamma\).</li>
				</ul>
				If \(I\) <em>satisfies</em> \(G\) we call \(I\) a <em>(graph) model</em> of \(G\).</dd>
			</dl>
		</div>
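		<p>The four conditions of the definition of graph models translate almost directly into code. The following Python sketch assumes that graphs are given as triples of sets \((V,E,L)\), that the mapping \(\inp{\cdot}\) is a finite dictionary, and that the UNA is checked as injectivity of that dictionary; the function name <code>satisfies</code> and the entity identifiers (<code>e_arica</code>, <code>r_flight</code>, etc.) are hypothetical.</p>

```python
def satisfies(graph, domain_graph, mapping, una=False):
    """Check whether the interpretation (domain_graph, mapping) satisfies graph.

    graph and domain_graph are triples (V, E, L) of sets, where E contains
    edges (u, l, v). mapping is a dict standing in for the partial mapping
    from constants to terms of the domain graph.
    """
    V, E, L = graph
    V_g, E_g, L_g = domain_graph
    # Under the UNA the mapping must be injective.
    if una and len(set(mapping.values())) != len(mapping):
        return False
    # Every node and edge label must be in the domain of the mapping.
    if not (V | L).issubset(mapping.keys()):
        return False
    # Nodes map to entities, edge labels map to relation types.
    if any(mapping[v] not in V_g for v in V):
        return False
    if any(mapping[l] not in L_g for l in L):
        return False
    # Every edge of the data graph must map to an edge of the domain graph.
    return all((mapping[u], mapping[l], mapping[v]) in E_g for (u, l, v) in E)


G = ({"Arica", "Santiago"}, {("Arica", "flight", "Santiago")}, {"flight"})

# An interpretation with one entity per node.
gamma = ({"e_arica", "e_santiago"},
         {("e_arica", "r_flight", "e_santiago")},
         {"r_flight"})
inp = {"Arica": "e_arica", "Santiago": "e_santiago", "flight": "r_flight"}

# An interpretation that maps both nodes to the same entity.
gamma_one = ({"e"}, {("e", "r", "e")}, {"r"})
inp_one = {"Arica": "e", "Santiago": "e", "flight": "r"}

print(satisfies(G, gamma, inp, una=True))           # True
print(satisfies(G, gamma_one, inp_one, una=False))  # True
print(satisfies(G, gamma_one, inp_one, una=True))   # False
```

		<p>The second interpretation is a model under the NUNA only, since it maps two nodes to the same entity.</p>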

		<h4 id="ssec-ontology-features" class="subsection">Ontology features</h4>
		<p>Beyond our base assumptions, we can associate certain patterns in the data graph with <em>semantic conditions</em> that define which interpretations satisfy it; for example, we can add a semantic condition to enforce that if our data graph contains the edge <span class="gnode">p</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">subp.&nbsp;of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">q</span>, then any edge <span class="ginode">x</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">p</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">y</span> in the domain graph of the interpretation must also have a corresponding edge <span class="ginode">x</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">q</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">y</span> to satisfy the data graph. These semantic conditions then form the features of an ontology language. In what follows, to aid readability, we will introduce the features of OWL using an abstract graphical notation with abbreviated terms. For details of concrete syntaxes, we rather refer to the OWL and OBOF standards&nbsp;[<a href="#ref-OWL2">Hitzler et al., 2012</a>, <a href="#ref-obof">Mungall et al., 2012</a>]. 
Likewise we present semantic conditions over interpretations for each feature in the same graphical format;<sup class="fnmark" id="fnm10"><a href="#fn10">10</a></sup><span class="footnote" id="fn10"><sup><a href="#fnm10">note 10</a></sup> We abbreviate “if and only if” as “iff” whereby “\(\phi\) iff \(\psi\)” can be read as “if \(\phi\) then \(\psi\)” and “if \(\psi\) then \(\phi\)”.</span> further details of these conditions will be described later in Section&nbsp;<a href="#sec-ontSemantics">4.1.3</a>.</p>
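		<p>As a minimal sketch of such a semantic condition, the following Python code checks the sub-property condition over the edges of a domain graph and, where it fails, adds the missing edges. We assume that the terms <code>p</code> and <code>q</code> have already been mapped to relation types of the domain graph; the function names and the entities <code>e1</code> and <code>v1</code> are hypothetical.</p>

```python
def missing_for_subproperty(domain_edges, p, q):
    """Edges (x, q, y) required by 'p subp. of q' but absent from the domain graph."""
    return {
        (x, q, y)
        for (x, r, y) in domain_edges
        if r == p and (x, q, y) not in domain_edges
    }


def close_under_subproperty(domain_edges, p, q):
    """Smallest superset of domain_edges meeting the sub-property condition."""
    return set(domain_edges) | missing_for_subproperty(domain_edges, p, q)


# Hypothetical domain graph with one venue relation.
edges = {("e1", "venue", "v1")}

print(missing_for_subproperty(edges, "venue", "location"))
# {('e1', 'location', 'v1')}

closed = close_under_subproperty(edges, "venue", "location")
print(missing_for_subproperty(closed, "venue", "location"))
# set()
```

		<p>An interpretation whose domain graph has an empty set of missing edges meets the condition; one pass suffices here because adding <code>q</code> edges never creates new <code>p</code> edges.</p>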

		<h5 id="sssec-individuals" class="subsubsection">Individuals</h5>
		<p>In Table&nbsp;<a href="#tab-ontEqIneq">4.1</a>, we list the main features supported by OWL for describing <em>individuals</em> (e.g., <span class="sf">Santiago</span>, <span class="sf">EID16</span>), sometimes distinguished from classes and properties. First, we can <em>assert</em> (binary) relations between individuals using edges such as <span class="gnode">Santa&nbsp;Lucía</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">city</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santiago</span>. In the condition column, when we write <span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(y\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(z\)</span>, for example, we refer to the condition that the relation is given in the domain graph of the interpretation; if so, the interpretation satisfies the axiom. OWL further allows for defining relations to explicitly state that two terms refer to the <em>same</em> entity, where, e.g., <span class="gnode">Región&nbsp;V</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">same&nbsp;as</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Región de Valparaíso</span> states that both refer to the same region (per Section&nbsp;<a href="#sec-identity">3.2</a>); or that two terms refer to <em>different</em> entities, where, e.g., <span class="gnode">Valparaíso</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">diff.&nbsp;from</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Región&nbsp;de&nbsp;Valparaíso</span> distinguishes the city from the region of the same name. 
We may also state that a relation does not hold using <em>negation</em>, which can be serialised as a graph using a form of reification (see Figure&nbsp;<a href="#fig-reif">3.8a</a>).</p>

		<table class="normalTable" id="tab-ontEqIneq">
			<caption>Ontology features for individuals</caption>
			<thead>
				<tr>
					<th>Feature</th>
					<th>Axiom</th>
					<th>Condition</th>
					<th>Example</th>
				</tr>
			</thead>
			<tbody>
				<tr>
					<td>Assertion</td>
					<td><span class="gnode">\(x\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">\(y\)</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(z\)</span></td>
					<td><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(y\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(z\)</span></td>
					<td><span class="gnode">Chile</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">capital</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santiago</span></td>
				</tr>
				<tr>
					<td>Negation</td>
					<td><img class="inside" src="images/tab-ontEqIneq-neg-axiom.svg" alt="negation axiom" /></td>
					<td>not <span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(y\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(z\)</span></td>
					<td><img class="inside" src="images/tab-ontEqIneq-neg-example.svg" alt="negation example" /></td>
				</tr>
				<tr>
					<td>Same As</td>
					<td><span class="gnode">\(x_1\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">same&nbsp;as</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(x_2\)</span></td>
					<td><span class="ginode">\(x_1\)</span> = <span class="ginode">\(x_2\)</span></td>
					<td><span class="gnode">Región&nbsp;V</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">same&nbsp;as</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Región&nbsp;de&nbsp;Valparaíso</span></td>
				</tr>
				<tr>
					<td>Different From</td>
					<td><span class="gnode">\(x_1\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">diff.&nbsp;from</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(x_2\)</span></td>
					<td><span class="ginode">\(x_1\)</span> ≠ <span class="ginode">\(x_2\)</span></td>
					<td><span class="gnode">Valparaíso</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">diff.&nbsp;from</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Región&nbsp;de&nbsp;Valparaíso</span></td>
				</tr>
			</tbody>
		</table>
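		<p>The conditions for <em>Same As</em> and <em>Different From</em> can be checked against the mapping of an interpretation alone, as the following Python sketch illustrates. The axioms are those used as examples in the table; the entity identifiers (<code>region_5</code>, <code>city_valparaiso</code>) and the function name <code>violated_axioms</code> are hypothetical.</p>

```python
def violated_axioms(axioms, mapping):
    """Return the same-as / diff.-from axioms that the mapping does not satisfy."""
    violated = []
    for (x1, feature, x2) in axioms:
        same_entity = mapping[x1] == mapping[x2]
        if feature == "same as" and not same_entity:
            violated.append((x1, feature, x2))
        elif feature == "diff. from" and same_entity:
            violated.append((x1, feature, x2))
    return violated


axioms = [
    ("Región V", "same as", "Región de Valparaíso"),
    ("Valparaíso", "diff. from", "Región de Valparaíso"),
]

# Two names for the region, one for the city: both axioms are satisfied.
good = {
    "Región V": "region_5",
    "Región de Valparaíso": "region_5",
    "Valparaíso": "city_valparaiso",
}

# Conflating the city with the region violates the diff.-from axiom.
bad = {
    "Región V": "region_5",
    "Región de Valparaíso": "region_5",
    "Valparaíso": "region_5",
}

print(violated_axioms(axioms, good))  # []
print(violated_axioms(axioms, bad))
# [('Valparaíso', 'diff. from', 'Región de Valparaíso')]
```

		<p>Note that the first mapping is not injective, and so would be admitted under the NUNA but not under the UNA.</p>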

		<h5 id="sssec-properties" class="subsubsection">Properties</h5>
		<p>In Section&nbsp;<a href="#sec-semSchema">3.1.1</a>, we already discussed how <em>sub-properties</em>, <em>domains</em> and <em>ranges</em> may be defined for properties. OWL allows such definitions, and further includes other features, as listed in Table&nbsp;<a href="#tab-ontProp">4.2</a>. We may define a pair of properties to be <em>equivalent</em>, <em>inverses</em>, or <em>disjoint</em>. We can further define a particular property to denote a <em>transitive</em>, <em>symmetric</em>, <em>asymmetric</em>, <em>reflexive</em>, or <em>irreflexive</em> relation. We can also define the multiplicity of the relation denoted by properties, based on being <em>functional</em> (many-to-one) or <em>inverse-functional</em> (one-to-many). We may further define a <em>key</em> for a class, denoting the set of properties whose values uniquely identify the entities of that class. Without adopting a Unique Name Assumption (UNA), from these latter three features we may conclude that two or more terms refer to the same entity. Finally, we can relate a property to a <em>chain</em> (a path expression only allowing concatenation of properties) such that pairs of entities related by the chain are also related by the given property. Note that for the latter two features in Table&nbsp;<a href="#tab-ontProp">4.2</a> we require representing a list, denoted with a vertical notation <span class="gnode">⋮</span>; while such a list may be serialised as a graph in a number of concrete ways, OWL uses RDF lists (see Figure&nbsp;<a href="#fig-list">3.7</a>).</p>
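		<p>To illustrate how such features support entailment, the following Python sketch derives same-entity conclusions from a functional or inverse-functional property, and closes a set of edges under a transitive property. It is a naive illustration rather than how an OWL reasoner is implemented; the function names, the term <code>CL</code>, and the placeholders <code>A</code>, <code>B</code>, <code>C</code> are hypothetical.</p>

```python
from itertools import combinations


def same_entity_pairs(edges, p, inverse=False):
    """Pairs of terms that must denote the same entity.

    With inverse=False, p is treated as functional: two values of p for
    the same subject must coincide. With inverse=True, p is treated as
    inverse-functional: two subjects with the same value must coincide.
    """
    groups = {}
    for (x, r, y) in edges:
        if r != p:
            continue
        key, value = (y, x) if inverse else (x, y)
        groups.setdefault(key, set()).add(value)
    return {
        frozenset(pair)
        for values in groups.values()
        for pair in combinations(sorted(values), 2)
    }


def transitive_closure(edges, p):
    """Add edges until 'x p y' and 'y p z' always imply 'x p z'."""
    closed = set(edges)
    while True:
        new = {
            (x, p, z)
            for (x, r1, y1) in closed if r1 == p
            for (y2, r2, z) in closed if r2 == p and y1 == y2
        } - closed
        if not new:
            return closed
        closed |= new


# Two terms with the same capital; CL is a hypothetical second name.
capitals = {("Chile", "capital", "Santiago"), ("CL", "capital", "Santiago")}
print(same_entity_pairs(capitals, "capital", inverse=True))
# {frozenset({'CL', 'Chile'})} (element order within the frozenset may vary)

parts = {("A", "part of", "B"), ("B", "part of", "C")}
print(("A", "part of", "C") in transitive_closure(parts, "part of"))  # True
```

		<p>Under the UNA, the pair derived from <code>capital</code> would instead signal an inconsistency, since two distinct names could not refer to the same entity.</p>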

		<table class="normalTable" id="tab-ontProp">
			<caption>Ontology features for property axioms</caption>
			<thead>
				<tr>
					<th>Feature</th>
					<th>Axiom</th>
					<th>Condition <span style="font-weight: normal">(for all \(x_*\), \(y_*\), \(z_*\))</span></th>
					<th>Example</th>
				</tr>
			</thead>
			<tbody>
				<tr>
					<td>Sub-property</td>
					<td><span class="gnode">\(p\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">subp.&nbsp;of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(q\)</span></td>
					<td><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(p\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(y\)</span> implies <span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(q\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(y\)</span></td>
					<td><span class="gnode">venue</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">subp.&nbsp;of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">location</span></td>
				</tr>
				<tr>
					<td>Domain</td>
					<td><span class="gnode">\(p\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">domain</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(c\)</span></td>
					<td><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(p\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(y\)</span> implies <span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(c\)</span></td>
					<td><span class="gnode">venue</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">domain</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Event</span></td>
				</tr>
				<tr>
					<td>Range</td>
					<td><span class="gnode">\(p\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">range</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(c\)</span></td>
					<td><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(p\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(y\)</span> implies <span class="ginode">\(y\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(c\)</span></td>
					<td><span class="gnode">venue</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">range</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Venue</span></td>
				</tr>
				<tr>
					<td>Equivalence</td>
					<td><span class="gnode">\(p\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">equiv.&nbsp;p.</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(q\)</span></td>
					<td><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(p\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(y\)</span> iff <span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(q\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(y\)</span></td>
					<td><span class="gnode">start</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">equiv.&nbsp;p.</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">begins</span></td>
				</tr>
				<tr>
					<td>Inverse</td>
					<td><span class="gnode">\(p\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">inv.&nbsp;of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(q\)</span></td>
					<td><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(p\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(y\)</span> iff <span class="ginode">\(y\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(q\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(x\)</span></td>
					<td><span class="gnode">venue</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">inv.&nbsp;of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">hosts</span></td>
				</tr>
				<tr>
					<td>Disjoint</td>
					<td><span class="gnode">\(p\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">disj.&nbsp;p.</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(q\)</span></td>
					<td>not <img class="inside" src="images/tab-ontProp-disj-cond.svg" alt="disjoint condition" /></td>
					<td><span class="gnode">venue</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">disj.&nbsp;p.</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">hosts</span></td>
				</tr>
				<tr>
					<td>Transitive</td>
					<td><span class="gnode">\(p\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Transitive</span></td>
					<td><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(p\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(y\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(p\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(z\)</span><br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; implies <span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(p\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(z\)</span></td>
					<td><span class="gnode">part&nbsp;of</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Transitive</span></td>
				</tr>
				<tr>
					<td>Symmetric</td>
					<td><span class="gnode">\(p\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Symmetric</span></td>
					<td><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(p\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(y\)</span> iff <span class="ginode">\(y\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(p\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(x\)</span></td>
					<td><span class="gnode">nearby</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Symmetric</span></td>
				</tr>
				<tr>
					<td>Asymmetric</td>
					<td><span class="gnode">\(p\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Asymmetric</span></td>
					<td>not <img class="inside" src="images/tab-ontProp-asym-cond.svg" alt="asymmetric condition" /></td>
					<td><span class="gnode">capital</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Asymmetric</span></td>
				</tr>
				<tr>
					<td>Reflexive</td>
					<td><span class="gnode">\(p\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Reflexive</span></td>
					<td><img class="inside" src="images/tab-ontProp-refl-cond.svg" alt="reflexive condition" /></td>
					<td><span class="gnode">part&nbsp;of</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Reflexive</span></td>
				</tr>
				<tr>
					<td>Irreflexive</td>
					<td><span class="gnode">\(p\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Irreflexive</span></td>
					<td>not <img class="inside" src="images/tab-ontProp-refl-cond.svg" alt="irreflexive condition" /></td>
					<td><span class="gnode">flight</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Irreflexive</span></td>
				</tr>
				<tr>
					<td>Functional</td>
					<td><span class="gnode">\(p\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Functional</span></td>
					<td><span class="ginode">\(y_1\)</span><img class="tip" src="images/edge-revtip2.png" width="15" alt="arrow tip leftward"/><span class="iedge">\(p\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(p\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(y_2\)</span><br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; implies <span class="ginode">\(y_1\)</span> = <span class="ginode">\(y_2\)</span></td>
					<td><span class="gnode">population</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Functional</span></td>
				</tr>
				<tr>
					<td>Inv. Functional</td>
					<td><span class="gnode">\(p\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Inv.&nbsp;Functional</span></td>
					<td><span class="ginode">\(x_1\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(p\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(y\)</span><img class="tip" src="images/edge-revtip2.png" width="15" alt="arrow tip leftward"/><span class="iedge">\(p\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="ginode">\(x_2\)</span><br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; implies <span class="ginode">\(x_1\)</span> = <span class="ginode">\(x_2\)</span></td>
					<td><span class="gnode">capital</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Inv.&nbsp;Functional</span></td>
				</tr>
				<tr>
					<td>Key</td>
					<td><span class="gnode">\(c\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">key</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode stack-tab">\(p_1\)<br/>⋮<br/>\(p_n\)</span></td>
					<td><img style="margin-left:0;" class="inside" src="images/tab-ontProp-key-cond.svg" alt="key condition premise" />&thinsp;implies&thinsp;<span class="ginode">\(x_1\)</span>=<span class="ginode">\(x_2\)</span></td>
					<td><span class="gnode">City</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">key</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode stack-tab">lat<br/>long</span></td>
				</tr>
				<tr>
					<td>Chain</td>
					<td><span class="gnode">\(p\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">chain</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode stack-tab">\(q_1\)<br/>⋮<br/>\(q_n\)</span></td>
					<td><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(q_1\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(y_1\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/>…<img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(y_{n-1}\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(q_n\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(z\)</span><br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; implies <span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(p\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(z\)</span></td>
					<td><span class="gnode">location</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">chain</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode stack-tab">location<br/>part&nbsp;of</span></td>
				</tr>
			</tbody>
		</table>
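		<p>As a minimal sketch of how such a condition can be applied, the following Python snippet materialises the <span class="sc">Chain</span> condition of Table&nbsp;<a href="#tab-ontProp">4.2</a> over a graph encoded as a set of (source, label, target) triples. The function name <code>apply_chain</code>, the triple encoding and the sample edges are assumptions made for illustration; they are not part of OWL or of any particular library.</p>

```python
def apply_chain(edges, p, chain):
    """Add every edge (x, p, z) implied by a path x -q1-> ... -qn-> z.

    `edges` is a set of (source, label, target) triples and `chain`
    is the list [q1, ..., qn] of labels given by the Chain axiom.
    """
    edges = set(edges)
    while True:
        # Start with every node paired with itself, then follow q1, ..., qn.
        nodes = {s for (s, _, _) in edges} | {o for (_, _, o) in edges}
        reached = {(n, n) for n in nodes}
        for q in chain:
            reached = {(x, o)
                       for (x, y) in reached
                       for (s, label, o) in edges
                       if s == y and label == q}
        new = {(x, p, z) for (x, z) in reached} - edges
        if not new:          # fixpoint: nothing further is implied
            return edges
        edges |= new         # inferred edges may enable further matches


# Illustrative data only (hypothetical graph).
graph = {
    ("Santiago Airport", "location", "Santiago"),
    ("Santiago", "part of", "Chile"),
    ("Chile", "part of", "South America"),
}
closed = apply_chain(graph, "location", ["location", "part of"])
```

		<p>Since the axiom <span class="gnode">location</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">chain</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode stack-tab">location<br/>part&nbsp;of</span> mentions <span class="gelab">location</span> in its own chain, an inferred edge can trigger further inferences, which is why the sketch iterates until no new edge is found.</p>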

		<h5 id="sssec-classes" class="subsubsection">Classes</h5>
		<p>In Section&nbsp;<a href="#sec-semSchema">3.1.1</a>, we discussed how class hierarchies can be modelled using a <em>sub-class</em> relation. OWL supports sub-classes, and many additional features, for defining and making claims about classes; these additional features are summarised in Table&nbsp;<a href="#tab-ontClass">4.3</a>. Given a pair of classes, OWL allows for defining that they are <em>equivalent</em>, or <em>disjoint</em>. Thereafter, OWL provides a variety of features for defining novel classes by applying set operators on other classes, or based on conditions that the properties of its instances satisfy. First, using set operators, one can define a novel class as the <em>complement</em> of another class, the <em>union</em> or <em>intersection</em> of a list (of arbitrary length) of other classes, or as an <em>enumeration</em> of all of its instances. Second, by placing restrictions on a particular property \(p\), one can define classes whose instances are all of the entities that have: <em>some value</em> from a given class on \(p\); <em>all values</em> from a given class on \(p\);<sup class="fnmark" id="fnm11"><a href="#fn11">11</a></sup><span class="footnote" id="fn11"><sup><a href="#fnm11">note 11</a></sup> While something like <span class="gnode">flight</span><img class="tip" src="images/edge-revtip.png" width="15" alt="arrow tip leftward"/><span class="edge">prop</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="gnode">DomesticAirport</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">all</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">NationalFlight</span> might appear to be a more natural example for <span class="sc">All Values</span>, this would be problematic as the corresponding <em>for all</em> condition is satisfied when no such node exists, so we would infer anything known not to have 
any flights to be a domestic airport. (We could, however, define the intersection of such a definition and airport as being a domestic airport.)</span> have a specific individual as a value on \(p\) (<em>has value</em>); have themselves as a reflexive value on \(p\) (<em>has self</em>); have at least, at most or exactly some number of values on \(p\) (<em>cardinality</em>); and have at least, at most or exactly some number of values on \(p\) from a given class (<em>qualified cardinality</em>). For the latter two cases, in Table&nbsp;<a href="#tab-ontClass">4.3</a>, we use the notation “\(\#\{\)<span class="ginode">a</span>\(\mid \phi \}\)” to count distinct entities satisfying \(\phi\) in the interpretation. These features can then be combined to create more complex classes, where combining the examples for <span class="sc">Intersection</span> and <span class="sc">Has Self</span> in Table&nbsp;<a href="#tab-ontClass">4.3</a> gives the definition: <em>self-driving taxis are taxis having themselves as a driver</em>.</p>

		<table class="normalTable" id="tab-ontClass">
			<caption>Ontology features for class axioms and definitions</caption>
			<thead>
				<tr>
					<th>Feature</th>
					<th>Axiom</th>
					<th>Condition <span style="font-weight: normal">(for all \(x_*\), \(y_*\), \(z_*\))</span></th>
					<th>Example</th>
				</tr>
			</thead>
			<tbody>
				<tr>
					<td>Sub-class</td>
					<td><span class="gnode">\(c\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">subc.&nbsp;of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(d\)</span></td>
					<td><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(c\)</span> implies <span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(d\)</span></td>
					<td><span class="gnode">City</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">subc.&nbsp;of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Place</span></td>
				</tr>
				<tr>
					<td>Equivalence</td>
					<td><span class="gnode">\(c\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">equiv.&nbsp;c.</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(d\)</span></td>
					<td><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(c\)</span> iff <span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(d\)</span></td>
					<td><span class="gnode">Human</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">equiv.&nbsp;c.</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Person</span></td>
				</tr>
				<tr>
					<td>Disjoint</td>
					<td><span class="gnode">\(c\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">disj.&nbsp;c.</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(d\)</span></td>
					<td>not <span class="ginode">\(c\)</span><img class="tip" src="images/edge-revtip2.png" width="15" alt="arrow tip leftward"/><span class="iedge">type</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(d\)</span></td>
					<td><span class="gnode">City</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">disj.&nbsp;c.</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Region</span></td>
				</tr>
				<tr>
					<td>Complement</td>
					<td><span class="gnode">\(c\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">comp.</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(d\)</span></td>
					<td><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(c\)</span> iff not <span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(d\)</span></td>
					<td><span class="gnode">Dead</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">comp.</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Alive</span></td>
				</tr>
				<tr>
					<td>Union</td>
					<td><span class="gnode">\(c\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">union</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode stack-tab">\(d_1\)<br/>⋮<br/>\(d_n\)</span></td>
					<td><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(c\)</span> iff <div class="stack-tab"><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(d_1\)</span> or<br/><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">…</span> or<br/><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(d_n\)</span></div></td>
					<td><span class="gnode">Flight</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">union</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode stack-tab">DomesticFlight<br/>InternationalFlight</span></td>
				</tr>
				<tr>
					<td>Intersection</td>
					<td><span class="gnode">\(c\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">inter.</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode stack-tab">\(d_1\)<br/>⋮<br/>\(d_n\)</span></td>
					<td><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(c\)</span> iff <img class="inside" src="images/tab-ontClass-inter-cond.svg" alt="intersection condition equiv" /></td>
					<td><span class="gnode">SelfDrivingTaxi</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">inter.</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode stack-tab">Taxi<br/>SelfDriving</span></td>
				</tr>
				<tr>
					<td>Enumeration</td>
					<td><span class="gnode">\(c\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">one&nbsp;of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode stack-tab">\(x_1\)<br/>⋮<br/>\(x_n\)</span></td>
					<td><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(c\)</span> iff <span class="ginode">\(x\)</span> \(\in \{\)<span class="ginode">\(x_1\)</span>\(,\dots,\)<span class="ginode">\(x_n\)</span>\(\}\)</td>
					<td><span class="gnode">EUState</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">one&nbsp;of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode stack-tab">Austria<br/>⋮<br/>Sweden</span></td>
				</tr>
				<tr>
					<td>Some Values</td>
					<td><img class="inside" src="images/tab-ontClass-someval-axiom.svg" alt="some values axiom" /></td>
					<td><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(c\)</span> iff <div class="stack-tab">there exists <span class="ginode">\(a\)</span> such that<br/><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(p\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(a\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(d\)</span></div></td>
					<td><img class="inside" src="images/tab-ontClass-someval-example.svg" alt="some values example" /></td>
				</tr>
				<tr>
					<td>All Values</td>
					<td><img class="inside" src="images/tab-ontClass-allval-axiom.svg" alt="all values axiom" /></td>
					<td><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(c\)</span> iff <div class="stack-tab">for all <span class="ginode">\(a\)</span> with <span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(p\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(a\)</span><br/>it holds that <span class="ginode">\(a\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(d\)</span></div></td>
					<td><img class="inside" src="images/tab-ontClass-allval-example.svg" alt="all values example" /></td>
				</tr>
				<tr>
					<td>Has Value</td>
					<td><img class="inside" src="images/tab-ontClass-hasval-axiom.svg" alt="has value axiom" /></td>
					<td><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(c\)</span> iff <span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(p\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(y\)</span></td>
					<td><img class="inside" src="images/tab-ontClass-hasval-example.svg" alt="has value example" /></td>
				</tr>
				<tr>
					<td>Has Self</td>
					<td><img class="inside" src="images/tab-ontClass-hasself-axiom.svg" alt="has self axiom" /></td>
					<td><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(c\)</span> iff <span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(p\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(x\)</span></td>
					<td><img class="inside" src="images/tab-ontClass-hasself-example.svg" alt="has self example" /></td>
				</tr>
				<tr>
					<td>Cardinality<br/>\(\star \in \{ =, \leq, \geq \}\)</td>
					<td><img class="inside" src="images/tab-ontClass-card-axiom.svg" alt="cardinality axiom" /></td>
					<td><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(c\)</span><br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; iff \(\#\{\)<span class="ginode">a</span> \(\mid\) <span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(p\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(a\)</span>\(\} \star n\)</td>
					<td><img class="inside" src="images/tab-ontClass-card-example.svg" alt="cardinality example" /></td>
				</tr>
				<tr>
					<td>Qualified<br/>Cardinality<br/>\(\star \in \{ =, \leq, \geq \}\)</td>
					<td><img class="inside" src="images/tab-ontClass-qualcard-axiom.svg" alt="qualified cardinality axiom" /></td>
					<td><span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(c\)</span><br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; iff \(\#\{\)<span class="ginode">a</span> \(\mid\) <span class="ginode">\(x\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">\(p\)</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(a\)</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">\(d\)</span>\(\} \star n\)</td>
					<td><img class="inside" src="images/tab-ontClass-qualcard-example.svg" alt="qualified cardinality example" /></td>
				</tr>
			</tbody>
		</table>
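		<p>To make the property-restriction conditions of Table&nbsp;<a href="#tab-ontClass">4.3</a> more concrete, the following Python sketch evaluates <span class="sc">Some Values</span>, <span class="sc">All Values</span> and the two cardinality counts for a given node. It assumes that the set of triples is a complete, finite domain graph (that is, one fixed interpretation rather than a data graph read under the OWA); all function names and sample nodes are hypothetical.</p>

```python
def values(edges, x, p):
    """All a such that x -p-> a holds."""
    return {o for (s, label, o) in edges if s == x and label == p}


def instances(edges, d):
    """All a such that a -type-> d holds."""
    return {s for (s, label, o) in edges if label == "type" and o == d}


def some_values(edges, x, p, d):
    # There exists a with x -p-> a -type-> d.
    return any(a in instances(edges, d) for a in values(edges, x, p))


def all_values(edges, x, p, d):
    # Every a with x -p-> a has type d (vacuously true without p-values).
    return all(a in instances(edges, d) for a in values(edges, x, p))


def count_values(edges, x, p):
    # #{ a | x -p-> a }
    return len(values(edges, x, p))


def qualified_count(edges, x, p, d):
    # #{ a | x -p-> a -type-> d }
    return len(values(edges, x, p) & instances(edges, d))


# Illustrative data only (hypothetical nodes).
edges = {
    ("AirportA", "flight", "F1"), ("F1", "type", "NationalFlight"),
    ("AirportA", "flight", "F2"), ("F2", "type", "NationalFlight"),
    ("AirportB", "flight", "F3"), ("F3", "type", "InternationalFlight"),
}
```

		<p>Here <code>all_values(edges, "Warehouse", "flight", "NationalFlight")</code> holds although <code>Warehouse</code> has no flights at all, which is the vacuous case that makes <span class="sc">All Values</span> definitions unintuitive unless they are intersected with another class.</p>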

		<h5 id="sssec-other-features" class="subsubsection">Other features</h5>
		<p>OWL supports other language features not previously discussed, including: <em>annotation properties</em>, which provide metadata about ontologies, such as versioning info; <em>datatype vs.&nbsp;object properties</em>, which distinguish properties that take datatype values from those that do not; and <em>datatype facets</em>, which allow for defining new datatypes by applying restrictions to existing datatypes, such as to define that places in Chile must have a <em>float between \(-66.0\) and \(-110.0\)</em> as their value for the (datatype) property <span class="gelab">longitude</span>. For more details we refer to the OWL 2 standard&nbsp;[<a href="#ref-OWL2">Hitzler et al., 2012</a>]. We will further discuss methodologies for the creation of ontologies in Section&nbsp;<a href="#ssec-knowledgeConceptual">6.5</a>.</p>

		<h5 id="sssec-modesl-under-semantic-conditions" class="subsubsection">Models under semantic conditions</h5>
		<p>Each axiom described by the previous tables, when added to a graph, enforces some condition(s) on the models of the graph. If we were to consider only the base condition of the <span class="sc">Assertion</span> feature in Table&nbsp;<a href="#tab-ontEqIneq">4.1</a>, for example, then the models of a graph would be any interpretation such that for every edge <span class="gnode">x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">y</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">z</span> in the graph, there exists a relation <span class="ginode">x</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">y</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">z</span> in the model. Given that there may be other relations in the model (under the OWA), the number of models of any such graph is infinite. Furthermore, given that we can map multiple nodes in the graph to one entity in the model (under the NUNA), any interpretation with (for example) the relation <span class="ginode">a</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">a</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">a</span> is a model of any graph so long as for every edge <span class="gnode">x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">y</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">z</span> in the graph, it holds that <span class="ginode">x</span> = <span class="ginode">y</span> = <span class="ginode">z</span> = <span class="ginode">a</span> in the interpretation (in other words, the interpretation maps everything to <span class="ginode">a</span>). 
As we add axioms with their associated conditions to the graph, we restrict models for the graph; for example, considering a graph with two edges – <span class="gnode">x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">y</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">z</span> and <span class="gnode">y</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Irreflexive</span> – the interpretation with <span class="ginode">a</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">a</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">a</span>, <span class="ginode">x</span> = <span class="ginode">y</span> = … = <span class="ginode">a</span> is no longer a model as it breaks the condition for the irreflexive axiom. In this way, we can define a precise model-theoretic semantics for graphs based on how the aforementioned ontological features used in the graph restrict the models of that graph.</p>
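		<p>The example of the interpretation that maps everything to <span class="ginode">a</span> can be replayed with a small Python sketch. We assume here that an interpretation is given by a finite domain graph <code>gamma</code> (a set of triples) and a dictionary <code>interp</code> from graph terms to domain entities; these names, and the restriction to the base condition plus <span class="sc">Irreflexive</span>, are simplifications made for illustration.</p>

```python
def is_base_model(graph, gamma, interp):
    """Base condition: the image of every edge of the graph is in gamma."""
    return all((interp[s], interp[p], interp[o]) in gamma
               for (s, p, o) in graph)


def irreflexive_ok(gamma, interp):
    """Irreflexive condition, checked in the domain graph."""
    t, irr = interp.get("type"), interp.get("Irreflexive")
    # Properties declared irreflexive, as seen in the domain graph.
    irreflexive_props = {s for (s, p, o) in gamma if p == t and o == irr}
    # No entity may be related to itself by such a property.
    return not any(s == o and p in irreflexive_props for (s, p, o) in gamma)


# The degenerate interpretation: a single relation a -a-> a,
# with every term of the graph mapped to a.
gamma_a = {("a", "a", "a")}

g1 = {("x", "y", "z")}
to_a_1 = {term: "a" for edge in g1 for term in edge}

g2 = g1 | {("y", "type", "Irreflexive")}
to_a_2 = {term: "a" for edge in g2 for term in edge}

# A non-degenerate interpretation: every term denotes itself.
identity = {term: term for edge in g2 for term in edge}
```

		<p>With only the base condition, <code>is_base_model(g1, gamma_a, to_a_1)</code> holds. Once the irreflexive axiom is added, <code>irreflexive_ok(gamma_a, to_a_2)</code> fails, whereas the identity interpretation over <code>g2</code> satisfies both checks.</p>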

		<div class="formal">
			<p>We define models under semantic conditions.</p>

			<dl class="definition" id="def-semantic-condition">
				<dt>Semantic condition</dt>
				<dd>Let \(2^G\) denote the set of all (directed edge-labelled) graphs. A <em>semantic condition</em> is a mapping \(\phi : 2^{G} \rightarrow \{ \text{true}, \text{false} \}\). An interpretation \(I \coloneqq (\Gamma,\inp{\cdot})\) is a model of \(G\) under \(\phi\) if and only if \(I\) is a model of \(G\) and \(\phi(\Gamma)\) is true. Given a set of semantic conditions \(\Phi\), we say that \(I\) is a model of \(G\) under \(\Phi\) if and only if \(I\) is a model of \(G\) and for all \(\phi \in \Phi\), \(\phi(\Gamma)\) is true.</dd>
			</dl>

			<p>We do not restrict the language used to define semantic conditions, but, for example, we can define the <span class="sc">Has Value</span> semantic condition of Table&nbsp;<a href="#tab-ontClass">4.3</a> in FOL as:</p>
			<p class="mathblock">\(\forall c, p, y \Big( \big( \Gamma(c,\)<span class="gelab">prop</span>\(,p) \wedge \Gamma(c,\)<span class="gelab">value</span>\(,y) \big) \leftrightarrow \forall x \big( \Gamma(x,\)<span class="gelab">type</span>\(,c) \leftrightarrow \Gamma(x,p,y) \big) \Big)\)</p>
			<p>Here we overload \(\Gamma\) as a ternary predicate to capture the edges of \(\Gamma\). The other semantic conditions enumerated in Tables&nbsp;<a href="#tab-ontEqIneq">4.1</a>–<a href="#tab-ontClass">4.3</a> can be defined in a similar way&nbsp;[<a href="#ref-SchneiderS11">Schneider and Sutcliffe, 2011</a>].<sup class="fnmark" id="fnm12"><a href="#fn12">12</a></sup><span class="footnote" id="fn12"><sup><a href="#fnm12">note 12</a></sup> Although these tables consider axioms originating in the data graph, it suffices to check their image in the domain graph since \(I\) only satisfies \(G\) if the edges of \(G\) defining the axioms are reflected in the domain graph of \(I\) per Definition&nbsp;<a href="#def-gmodel">4.2</a>. This then simplifies the definitions considerably.</span> This FOL formula defines an if-and-only-if version of the semantic condition for <span class="sc">Has Value</span> (described in Section&nbsp;<a href="#sec-iff">4.1.4</a>).</p>
		</div>
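		<p>For a finite domain graph, the FOL formula given for <span class="sc">Has Value</span> can be evaluated by brute force. The following Python sketch does so under two assumptions that are ours rather than part of the definition: quantifiers range over the terms that appear in \(\Gamma\), and the terms <span class="gelab">type</span>, <span class="gelab">prop</span> and <span class="gelab">value</span> denote themselves. It checks both the if-and-only-if formula and the weaker variant in which the outer \(\leftrightarrow\) is replaced by \(\rightarrow\); function names and sample edges are hypothetical.</p>

```python
from itertools import product


def terms(gamma):
    """All entities mentioned in the domain graph."""
    return {t for edge in gamma for t in edge}


def class_matches(gamma, dom, c, p, y):
    # forall x: Gamma(x, type, c) <-> Gamma(x, p, y)
    return all(((x, "type", c) in gamma) == ((x, p, y) in gamma)
               for x in dom)


def has_value_if(gamma):
    """Outer implication only: axiom present -> condition holds."""
    dom = terms(gamma)
    return all(class_matches(gamma, dom, c, p, y)
               for c, p, y in product(dom, repeat=3)
               if (c, "prop", p) in gamma and (c, "value", y) in gamma)


def has_value_iff(gamma):
    """The formula as written: axiom present <-> condition holds."""
    dom = terms(gamma)
    return all(((c, "prop", p) in gamma and (c, "value", y) in gamma)
               == class_matches(gamma, dom, c, p, y)
               for c, p, y in product(dom, repeat=3))


# Illustrative data only (hypothetical class and edges).
gamma = {
    ("ChileanCity", "prop", "country"),
    ("ChileanCity", "value", "Chile"),
    ("Santiago", "type", "ChileanCity"),
    ("Santiago", "country", "Chile"),
}
violating = gamma | {("Arica", "country", "Chile")}
```

		<p>In this sketch <code>gamma</code> satisfies the implication but not the if-and-only-if formula, since the inner condition holds vacuously for many triples \((c,p,y)\) that have no corresponding axiom edges; <code>violating</code> fails even the implication because <code>Arica</code> has the value but lacks the type.</p>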
		
		<h4 id="sec-ontSemantics" class="subsection">Entailment</h4>
		<p>The conditions listed in the previous tables give rise to <em>entailments</em>, where, for example, in reference to the <span class="sc">Symmetric</span> feature of Table&nbsp;<a href="#tab-ontProp">4.2</a>, the definition <span class="gnode">nearby</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Symmetric</span> and edge <span class="gnode">Santiago</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">nearby</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santiago&nbsp;Airport</span> entail the edge <span class="gnode">Santiago&nbsp;Airport</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">nearby</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santiago</span> according to the condition given for that feature. We now describe how these conditions lead to entailments.</p>
		<p>We say that one graph <em>entails</em> another if and only if any model of the former graph is also a model of the latter graph. Intuitively this means that the latter graph says nothing new over the former graph and thus holds as a logical consequence of the former graph. For example, consider the graph <span class="gnode">Santiago</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">City</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">subc.&nbsp;of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Place</span> and the graph <span class="gnode">Santiago</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Place</span>. 
All models of the latter must have that <span class="ginode">Santiago</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">Place</span>, but so must all models of the former, which must have <span class="ginode">Santiago</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">City</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">subc.&nbsp;of</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">Place</span> and further must satisfy the condition for <span class="sc">Sub-class</span>, which requires that <span class="ginode">Santiago</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">Place</span> also hold. Hence we conclude that any model of the former graph must be a model of the latter graph, or, in other words, the former graph entails the latter graph.</p>
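		<p>Entailment is defined over all models, which cannot be enumerated in general. For the <span class="sc">Sub-class</span> example above, however, a simple sufficient check can be sketched in Python: we materialise the edges that the <span class="sc">Sub-class</span> condition forces in every model and test whether the second graph is contained in the result. The function names are hypothetical, and the sketch covers only this one feature; it is not a general entailment procedure.</p>

```python
def subclass_closure(edges):
    """Add x -type-> d whenever x -type-> c and c -subc. of-> d."""
    edges = set(edges)
    while True:
        new = {(x, "type", d)
               for (x, l1, c) in edges if l1 == "type"
               for (c2, l2, d) in edges if l2 == "subc. of" and c2 == c}
        new -= edges
        if not new:          # fixpoint reached
            return edges
        edges |= new


def entails_by_materialisation(g1, g2):
    # Sufficient check: every edge of g2 is forced by g1.
    return set(g2) <= subclass_closure(g1)


g_former = {("Santiago", "type", "City"), ("City", "subc. of", "Place")}
g_latter = {("Santiago", "type", "Place")}
```

		<p>Here <code>entails_by_materialisation(g_former, g_latter)</code> holds, while the converse check fails, reflecting that the latter graph says nothing about <span class="gnode">City</span>.</p>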

		<div class="formal">
			<p>We now formally define entailment under semantic conditions.</p>

			<dl class="definition" id="def-ent">
				<dt>Graph entailment</dt>
				<dd>Letting \(G_1\) and \(G_2\) denote two (directed edge-labelled) graphs, and \(\Phi\) a set of semantic conditions, we say that <em>\(G_1\) entails \(G_2\) under \(\Phi\)</em> – denoted \(G_1 \models_\Phi G_2\) – if and only if any model of \(G_1\) under \(\Phi\) is also a model of \(G_2\) under \(\Phi\).</dd>
			</dl>

			<p>An example of entailment is discussed in Section&nbsp;<a href="#sec-ontSemantics">4.1.3</a>.<sup class="fnmark" id="fnm13"><a href="#fn13">13</a></sup><span class="footnote" id="fn13"><sup><a href="#fnm13">note 13</a></sup> Here we have defined entailment under OWA. To define entailment under CWA, let \(G \models_\Phi (s,p,o)\) denote that \(G\) entails the edge \((s,p,o)\) under \(\Phi\) (a slight abuse of notation). Under CWA, we make the additional assumption that if \(G \not\models_\Phi e\), where \(e\) is an edge (strictly speaking, a <em>positive</em> edge), then \(G \models_\Phi \neg e\); in other words, under CWA we assume that any (positive) edges that \(G\) does not entail under \(\Phi\) can be assumed false according to \(G\) and \(\Phi\). However, note that in FOL, the CWA only applies to positive <em>facts</em>, whereas edges in a graph can be used to represent other FOL formulae. If one wished to maintain FOL-compatibility under CWA, additional restrictions on the types of edge \(e\) may be needed.</span></p>
		</div>

		<h4 id="sec-iff" class="subsection">If–then vs. if-and-only-if semantics</h4>
		<p>Consider the graph <span class="gnode">nearby</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Symmetric</span> and the graph <span class="gnode">nearby</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">inv.&nbsp;of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">nearby</span>. Both of these graphs result in the same semantic conditions being applied in the domain graph, but does one entail the other? The answer depends on the semantics applied. Considering the axioms and conditions of Tables&nbsp;<a href="#tab-ontEqIneq">4.1</a>–<a href="#tab-ontClass">4.3</a>, we can consider two semantics. Under <em>if</em>–<em>then</em> semantics – <em>if</em> <strong>Axiom</strong> matches the data graph <em>then</em> <strong>Condition</strong> holds in the domain graph – the graphs do not entail each other: though both graphs give rise to the same condition, this condition is not translated back into the axioms that describe it.<sup class="fnmark" id="fnm14"><a href="#fn14">14</a></sup><span class="footnote" id="fn14"><sup><a href="#fnm14">note 14</a></sup> Here, <span class="ginode">nearby</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">type</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">Symmetric</span> is a model of the first graph but not the second, while <span class="ginode">nearby</span><img class="tip" src="images/edge-source2.png" width="8" alt="arrow source"/><span class="iedge">inv.&nbsp;of</span><img class="tip" src="images/edge-tip2.png" width="15" alt="arrow tip rightward"/><span class="ginode">nearby</span> is a model of the second graph but not the first. Hence neither graph entails the other.</span> Conversely, under <em>if-and-only-if</em> semantics – <strong>Axiom</strong> matches the data graph <em>if-and-only-if</em> <strong>Condition</strong> holds in the domain graph – the graphs entail each other: both graphs give rise to the same condition, which is translated back into all possible axioms that describe it. Hence if-and-only-if semantics allows for entailing more axioms in the ontology language than if–then semantics. OWL generally applies an if-and-only-if semantics in order to enable richer entailments&nbsp;[<a href="#ref-OWL2">Hitzler et al., 2012</a>].</p>
		</section>

		<section id="ssec-reasoning" class="section">
		<h3>Reasoning</h3>
		<p>Unfortunately, given two graphs, deciding if the first entails the second – per the notion of entailment we have defined and for all of the ontological features listed in Tables&nbsp;<a href="#tab-ontEqIneq">4.1</a>–<a href="#tab-ontClass">4.3</a> – is <em>undecidable</em>: no (finite) algorithm for such entailment can exist that halts on all inputs with the correct <code>true</code>/<code>false</code> answer&nbsp;[<a href="#ref-Hitzler2010">Hitzler et al., 2010</a>]. However, we can provide practical reasoning algorithms for ontologies that (1) halt on any pair of input ontologies but may miss entailments, returning <code>false</code> instead of <code>true</code> in some cases, (2) always halt with the correct answer but only accept input ontologies with restricted features, or (3) only return correct answers for any pair of input ontologies but may never halt on certain inputs. Though option (3) has been explored using, e.g., theorem provers for First Order Logic (FOL)&nbsp;[<a href="#ref-SchneiderS11">Schneider and Sutcliffe, 2011</a>], options (1) and (2) are more commonly pursued using rules and/or Description Logics. Option (1) generally allows for more efficient and scalable reasoning algorithms and is useful where data are incomplete and having some entailments is valuable. Option (2) may be a better choice in domains – such as medical ontologies – where missing entailments may have undesirable outcomes.</p>

		<h4 id="sec-rules" class="subsection">Rules</h4>
		<p>A straightforward way to provide automated access to the knowledge that can be deduced through (ontological or other forms of) entailments is through <em>inference rules</em> (or simply <em>rules</em>) encoding <span class="sc">if</span>–<span class="sc">then</span>-style consequences. A rule is composed of a <em>body</em> (<span class="sc">if</span>) and a <em>head</em> (<span class="sc">then</span>). Both the body and head are given as graph patterns. A rule indicates that if we can replace the variables of the body with terms from the data graph and form a sub-graph of a given data graph, then using the same replacement of variables in the head will yield a valid entailment. The head must typically use a subset of the variables appearing in the body to ensure that the conclusion leaves no variables unreplaced. Rules of this form correspond to (positive) Datalog&nbsp;[<a href="#ref-CeriGT89">Ceri et al., 1989</a>] in Databases, Horn clauses&nbsp;[<a href="#ref-lloyd2012foundations">Lloyd, 1984</a>] in Logic Programming, etc.</p>
		<p>Rules can capture entailments under ontological conditions. In Table&nbsp;<a href="#tab-rulesRdfs">4.4</a>, we list some example rules for sub-class, sub-property, domain and range features&nbsp;[<a href="#ref-MunozPG09">Muñoz et al., 2009</a>]; these rules may be considered incomplete, not capturing, for example, that every class is a sub-class of itself, that every property is a sub-property of itself, etc. A more comprehensive set of rules for the OWL features of Tables&nbsp;<a href="#tab-ontEqIneq">4.1</a>–<a href="#tab-ontClass">4.3</a> has been defined as OWL 2 RL/RDF&nbsp;[<a href="#ref-key:owl2profiles">Motik et al., 2012</a>]; these rules are likewise incomplete as such rules cannot fully capture negation (e.g., <span class="sc">Complement</span>), existentials (e.g., <span class="sc">Some Values</span>), universals (e.g., <span class="sc">All Values</span>), or counting (e.g., <span class="sc">Cardinality</span> and <span class="sc">Qualified Cardinality</span>). Other rule languages have, however, been proposed to support additional such features, including existentials (see, e.g., Datalog\(^\pm\)&nbsp;[<a href="#ref-BellomariniSG18">Bellomarini et al., 2018</a>]), disjunction (see, e.g., Disjunctive Datalog&nbsp;[<a href="#ref-RudolphKH08">Rudolph et al., 2008</a>]), etc.</p>

		<table class="normalTable" id="tab-rulesRdfs">
			<caption>Example rules for sub-class, sub-property, domain, and range features</caption>
			<thead>
				<tr>
					<th>Feature</th>
					<th>Body</th>
					<th>\(\Rightarrow\)</th>
					<th>Head</th>
				</tr>
			</thead>
			<tbody>
				<tr>
					<td>Sub-class (I)</td>
					<td><span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?c</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">subc.&nbsp;of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?d</span></td>
					<td>\(\Rightarrow\)</td>
					<td><span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?d</span></td>
				</tr>
				<tr>
					<td>Sub-class (II)</td>
					<td><span class="gvar">?c</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">subc.&nbsp;of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?d</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">subc.&nbsp;of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?e</span></td>
					<td>\(\Rightarrow\)</td>
					<td><span class="gvar">?c</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">subc.&nbsp;of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?e</span></td>
				</tr>
				<tr>
					<td>Sub-property (I)</td>
					<td><img class="inside" src="images/tab-rulesRdfs-subprop.svg" alt="sub-property (I) body" /></td>
					<td>\(\Rightarrow\)</td>
					<td><span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">?q</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?y</span></td>
				</tr>
				<tr>
					<td>Sub-property (II)</td>
					<td><span class="gvar">?p</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">subp.&nbsp;of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?q</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">subp.&nbsp;of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?r</span></td>
					<td>\(\Rightarrow\)</td>
					<td><span class="gvar">?p</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">subp.&nbsp;of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?r</span></td>
				</tr>
				<tr>
					<td>Domain</td>
					<td><img class="inside" src="images/tab-rulesRdfs-domain.svg" alt="domain body" /></td>
					<td>\(\Rightarrow\)</td>
					<td><span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?c</span></td>
				</tr>
				<tr>
					<td>Range</td>
					<td><img class="inside" src="images/tab-rulesRdfs-range.svg" alt="range body" /></td>
					<td>\(\Rightarrow\)</td>
					<td><span class="gvar">?y</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?c</span></td>
				</tr>
			</tbody>
		</table>

		<p>Rules can be leveraged for reasoning in a number of ways. <em>Materialisation</em> refers to the idea of applying rules recursively to a graph, adding the conclusions generated back to the graph until a fixpoint is reached and nothing more can be added. The materialised graph can then be treated as any other graph. Although the efficiency and scalability of materialisation can be enhanced through optimisations like Rete networks&nbsp;[<a href="#ref-Forgy82">Forgy, 1982</a>], or using distributed frameworks like MapReduce&nbsp;[<a href="#ref-UrbaniKMHB12">Urbani et al., 2012</a>], depending on the rules and the data, the materialised graph may become unfeasibly large to manage. Another strategy is to use rules for <em>query rewriting</em>, which given a query, will automatically extend the query in order to find solutions entailed by a set of rules; for example, taking the schema graph in Figure&nbsp;<a href="#fig-sg">3.2</a> and the rules in Table&nbsp;<a href="#tab-rulesRdfs">4.4</a>, the (sub-)pattern <span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Event</span> in a given input query would be rewritten to the following disjunctive pattern evaluated on the original graph:</p>

		<p class="mathblock"><span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Event</span> \(\cup\) <span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Festival</span> \(\cup\) <span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Periodic&nbsp;Market</span> \(\cup\) <span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">venue</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?y</span></p>

		<p>Figure&nbsp;<a href="#fig-qrew">4.2</a> provides a more complete example of an ontology that is used to rewrite the query of Figure&nbsp;<a href="#fig-bgpFS">4.1</a>; if evaluated over the graph of Figure&nbsp;<a href="#fig-delg">2.1</a>, <span class="gnode">Ñam</span> will be returned as a solution. However, not all of the aforementioned features of OWL can be supported in this manner. The OWL 2 QL profile&nbsp;[<a href="#ref-key:owl2profiles">Motik et al., 2012</a>] is a subset of OWL designed specifically for query rewriting of this form&nbsp;[<a href="#ref-ArtaleCKZ09">Artale et al., 2009</a>].</p>

		<figure id="fig-qrew">
			<dl>
				<dt>\(O:\)</dt>
				<dd>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<img class="inlined" src="images/fig-qrew1.svg" alt="ontology"/></dd>
				<dt>\(Q(O):\)</dt>
				<dd>&nbsp;&nbsp;&nbsp;&nbsp;\((\)<img class="inlined" src="images/fig-qrew2.svg" alt="type Festival"/> \(\cup\) <img class="inlined" src="images/fig-qrew3.svg" alt="type Food Festival"/> \(\cup\) <img class="inlined" src="images/fig-qrew4.svg" alt="type Drinks Festival"/>\()\)</dd>
				<dd>\(\Join (\)<img class="inlined" src="images/fig-qrew5.svg" alt="location Santiago"/> \(\cup\) <img class="inlined" src="images/fig-qrew6.svg" alt="venue city Santiago"/>\()\)</dd>
				<dd>\(\Join \) &nbsp;<img class="inlined" src="images/fig-qrew7.svg" alt="name"/></dd>
			</dl>
			<figcaption>Query rewriting example for the query \(Q\) of Figure&nbsp;<a href="#fig-bgpFS">4.1</a></figcaption>
		</figure>
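
		<p>The rewriting of a single type pattern can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function name <code>rewrite_type_pattern</code> is hypothetical, the schema is given as a set of triples, and the three schema edges shown are assumed here for illustration rather than copied from a particular figure. Sub-property axioms are ignored for brevity, and <code>?y</code> stands for a fresh variable that would be projected away.</p>

```python
def rewrite_type_pattern(cls, schema):
    """Rewrite the pattern (?x, type, cls) into a union (list) of patterns.

    The union is evaluated over the original graph, so no materialisation
    of the sub-class, domain or range rules is needed.
    """
    # Collect cls together with all of its (transitive) sub-classes.
    classes, frontier = [cls], [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in sorted(schema):
            if p == "subc. of" and o == current and s not in classes:
                classes.append(s)
                frontier.append(s)

    # One alternative per class that entails membership of cls.
    union = [("?x", "type", c) for c in classes]

    # Domain and range axioms also entail type edges.
    for s, p, o in sorted(schema):
        if p == "domain" and o in classes:
            union.append(("?x", s, "?y"))
        elif p == "range" and o in classes:
            union.append(("?y", s, "?x"))
    return union


# Schema edges assumed for illustration.
schema = {
    ("Festival", "subc. of", "Event"),
    ("Periodic Market", "subc. of", "Event"),
    ("venue", "domain", "Event"),
}
rewritten = rewrite_type_pattern("Event", schema)
```

		<p>Under these assumptions, <code>rewritten</code> contains one type pattern for <code>Event</code> and one for each of its sub-classes, plus a <code>venue</code> pattern arising from the domain axiom.</p>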

		<p>While rules can be used to (partially) capture ontological entailments, they can also be defined independently of an ontology language, capturing entailments for a given domain. In fact, some rules – such as the following – cannot be captured by the ontology features previously seen, as they do not support ways to infer relations from cyclical graph patterns (for computability reasons):</p>
		
		<p class="mathblock"><img class="inside" src="images/fig-inline-rule.svg" alt="dom flight rule premise" style="margin-right:2em;vertical-align:middle;position:relative;"/> \(\Rightarrow\) <span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">domestic&nbsp;flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?y</span></p>

		<p>Various languages allow for expressing rules over graphs – independently or alongside of an ontology language – including: Notation3 (N3)&nbsp;[<a href="#ref-n3">Berners-Lee and Connolly, 2011</a>], Rule Interchange Format (RIF)&nbsp;[<a href="#ref-rif">Kifer and Boley, 2013</a>], Semantic Web Rule Language (SWRL)&nbsp;[<a href="#ref-swrl">Horrocks et al., 2004</a>], and SPARQL Inferencing Notation (SPIN)&nbsp;[<a href="#ref-spin">Knublauch et al., 2011</a>], amongst others.</p>

		<div class="formal">
			<p>Given a graph pattern \(Q\) – be it a directed edge-labelled graph pattern per Definition&nbsp;<a href="#def-delgp">2.5</a> or a property graph pattern per Definition&nbsp;<a href="#def-pgp">2.6</a> – recall that \(\var(Q)\) denotes the variables appearing in \(Q\). We now define rules for graphs.</p>

			<dl class="definition" id="def-rule">
				<dt>Rule</dt>
				<dd>A <em>rule</em> is a pair \(R \coloneqq (B,H)\) such that \(B\) and \(H\) are graph patterns and \(\var(H) \subseteq \var(B)\). The graph pattern \(B\) is called the <em>body</em> of the rule while \(H\) is called the <em>head</em> of the rule. </dd>
			</dl>

			<p>This definition of a rule applies for directed edge-labelled graphs and property graphs by considering the corresponding type of graph pattern. The head is considered to be a conjunction of edges. Given a graph \(G\), a rule is <em>applied</em> by computing the mappings from the body to the graph and then using those mappings to substitute the variables in \(H\). The restriction \(\var(H) \subseteq \var(B)\) ensures that the result of this substitution is a graph, with no variables in \(H\) left unsubstituted.</p>

			<dl class="definition" id="def-rule-application">
				<dt>Rule application</dt>
				<dd>Given a rule \(R = (B,H)\) and a graph \(G\), we define the <em>application of \(R\) over \(G\)</em> as the graph \(R(G) \coloneqq \bigcup_{\mu \in B(G)} \mu(H)\).</dd>
			</dl>

			<p>Given a set of rules \(\mathcal{R} \coloneqq \{ R_1, \ldots, R_n \}\) and a knowledge graph \(G\), towards defining the set of inferences given by the rules over the graph, we denote by \(\mathcal{R}(G) \coloneqq \bigcup_{R \in \mathcal{R}} R(G)\) the union of the application of all rules of \(\mathcal{R}\) over \(G\), and we denote by \(\mathcal{R}^+(G) \coloneqq \mathcal{R}(G) \cup G\) the extension of \(G\) with respect to the application of \(\mathcal{R}\). Finally, we denote by \(\mathcal{R}^k(G)\) (for \(k \in \mathbb{N}^+\)) the recursive application of \(\mathcal{R}^+(G)\), where \(\mathcal{R}^1(G) \coloneqq \mathcal{R}^+(G)\), and \(\mathcal{R}^{i+1}(G) \coloneqq \mathcal{R}^+(\mathcal{R}^{i}(G))\). We are now ready to define the <em>least model</em>, which captures the inferences possible for \(\mathcal{R}\) over \(G\).</p>

			<dl class="definition" id="def-least-model">
				<dt>Least model</dt>
				<dd>The <em>least model of \(\mathcal{R}\) over \(G\)</em> is defined as \(\mathcal{R}^*(G) \coloneqq \bigcup_{k\in \mathbb{N}^+}(\mathcal{R}^k(G))\).</dd>
			</dl>

			<p>At some point \(\mathcal{R}^{k'}(G) = \mathcal{R}^{k'+1}(G)\): the rule applications reach a fixpoint and we have the least model. Once the least model \(\mathcal{R}^*(G)\) is computed, the entailed data can be treated as any other data.</p>
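
			<p>The definitions above can be sketched in Python. The fragment below is a minimal illustration rather than a reference implementation. It assumes that a graph is a set of (source, label, target) triples, that variables are strings starting with <code>?</code>, and that a rule is a pair of lists of triple patterns. The helper names (<code>match</code>, <code>apply_rule</code>, <code>least_model</code>) are hypothetical. The example data are the two sub-class rules and the <code>Santiago</code> graph discussed earlier.</p>

```python
def is_var(term):
    """By assumption, variables are strings that start with '?'."""
    return isinstance(term, str) and term.startswith("?")


def bind(term, value, mu):
    """Try to map a pattern term to a graph term, extending mu if needed."""
    if is_var(term):
        return mu.setdefault(term, value) == value
    return term == value


def match(body, graph, mu=None):
    """Yield each mapping mu such that mu(body) is a sub-graph of graph."""
    mu = {} if mu is None else mu
    if not body:
        yield dict(mu)
        return
    first, rest = body[0], body[1:]
    for edge in graph:
        extended = dict(mu)
        if all(bind(t, v, extended) for t, v in zip(first, edge)):
            yield from match(rest, graph, extended)


def apply_rule(rule, graph):
    """R(G): the union of mu(H) over all mappings mu in B(G)."""
    body, head = rule
    result = set()
    for mu in match(body, graph):
        result |= {tuple(mu.get(t, t) for t in edge) for edge in head}
    return result


def least_model(rules, graph):
    """Apply all rules repeatedly, adding conclusions until a fixpoint."""
    current = set(graph)
    while True:
        extended = current.union(*(apply_rule(r, current) for r in rules))
        if extended == current:
            return current
        current = extended


rules = [
    # Sub-class (I)
    ([("?x", "type", "?c"), ("?c", "subc. of", "?d")], [("?x", "type", "?d")]),
    # Sub-class (II)
    ([("?c", "subc. of", "?d"), ("?d", "subc. of", "?e")], [("?c", "subc. of", "?e")]),
]
G = {("Santiago", "type", "City"), ("City", "subc. of", "Place")}
M = least_model(rules, G)
```

			<p>The loop terminates for rules of this form because the head introduces no new terms, so only finitely many edges can ever be added. Here <code>M</code> extends <code>G</code> with the single edge stating that <code>Santiago</code> has type <code>Place</code>.</p>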
			<p>Rules can support graph entailments of the form \(G_1 \models_\Phi G_2\). We say that a set of rules \(\mathcal{R}\) is <em>correct</em> for \(\Phi\) if, for any graph \(G\), \(G \models_\Phi \mathcal{R}^*(G)\). We say that \(\mathcal{R}\) is <em>complete</em> for \(\Phi\) if, for any graph \(G\), there does not exist a graph \(G' \not\subseteq \mathcal{R}^*(G)\) such that \(G \models_\Phi G'\). Table&nbsp;<a href="#tab-rulesRdfs">4.4</a> exemplifies a correct but incomplete set of rules for the semantic conditions of the RDFS standard&nbsp;[<a href="#ref-RDFS">Brickley and Guha, 2014</a>].</p>
			<p>Alternatively, rather than supporting ontology-based graph entailments, rules can be directly specified in a rule language such as Notation3 (N3)&nbsp;[<a href="#ref-n3">Berners-Lee and Connolly, 2011</a>], Rule Interchange Format (RIF)&nbsp;[<a href="#ref-rif">Kifer and Boley, 2013</a>], Semantic Web Rule Language (SWRL)&nbsp;[<a href="#ref-swrl">Horrocks et al., 2004</a>], or SPARQL Inferencing Notation (SPIN)&nbsp;[<a href="#ref-spin">Knublauch et al., 2011</a>]. Languages such as SPIN represent rules as graphs, allowing the rules of a knowledge graph to be embedded in the data graph. Taking advantage of this fact, we can then consider a form of graph entailment \(G_1 \cup \gamma(\mathcal{R}) \models_\Phi G_2\), where by \(\gamma(\mathcal{R})\) we denote the graph representation of rules \(\mathcal{R}\). If the set of rules \(\mathcal{R}\) is correct and complete for \(\Phi\), we may simply write \(G_1 \cup \gamma(\mathcal{R}) \models G_2\), indicating that \(\Phi\) captures the same semantics for \(\gamma(\mathcal{R})\) as applying the rules in \(\mathcal{R}\). Rules thus offer another form of graph entailment.</p>
		</div>

		<h4 id="sssec-dls" class="subsection">Description Logics</h4>
		<p>Description Logics (DLs) were initially introduced as a way to formalise the meaning of <em>frames</em> and <em>semantic networks</em>. Since semantic networks are an early version of knowledge graphs, and DLs have heavily influenced the Web Ontology Language, DLs thus hold an important place in the logical formalisation of knowledge graphs. DLs form a family of logics rather than a particular logic. Initially, DLs were restricted fragments of FOL that permit decidable reasoning tasks, such as entailment checking&nbsp;[<a href="#ref-BaaderHLS17">Baader et al., 2017</a>]. Different DLs strike different balances between expressive power and computational complexity of reasoning. DLs were later extended with features beyond FOL that are useful in the context of modelling graph data, such as transitive closure, datatypes, etc.</p>
		<p>DLs are based on three types of elements: <em>individuals</em>, such as <code>Santiago</code>; <em>classes</em> (aka <em>concepts</em>) such as <code>City</code>; and <em>properties</em> (aka <em>roles</em>) such as <code>flight</code>. DLs then allow for making claims, known as <em>axioms</em>, about these elements. <em>Assertional axioms</em> can be either unary class relations on individuals, such as <code>City(Santiago)</code>, or binary property relations on individuals, such as <code>flight(Santiago,Arica)</code>. Such axioms form the <em>Assertional Box</em> (<em>A-Box</em>). DLs further introduce logical symbols to allow for defining <em>class axioms</em> (forming the <em>Terminology Box</em>, or <em>T-Box</em> for short), and <em>property axioms</em> (forming the <em>Role Box</em>, <em>R-Box</em>); for example, the class axiom <span class="nobreak"><code>City</code>&nbsp;\(\sqsubseteq\)&nbsp;<code>Place</code></span> states that the former class is a sub-class of the latter one, while the property axiom <span class="nobreak"><code>flight</code>&nbsp;\(\sqsubseteq\)&nbsp;<code>connectsTo</code></span> states that the former property is a sub-property of the latter one. DLs may then introduce a rich set of logical symbols, not only for defining class and property axioms, but also defining new classes based on existing terms; as an example of the latter, we can define a class <span class="nobreak">\(\exists\)<code>nearby</code>.<code>Airport</code></span> as the class of individuals that have some airport nearby. Noting that the symbol \(\top\) is used in DLs to denote the class of all individuals, we can then add a class axiom <span class="nobreak">\(\exists\)<code>flight</code>.\(\top \sqsubseteq \exists\)<code>nearby</code>.<code>Airport</code></span> to state that individuals with an outgoing flight must have some airport nearby. 
Noting that the symbol \(\sqcup\) can be used in DL to define that a class is the union of other classes, we can further define, for example, that <code>Airport</code>&nbsp;\(\sqsubseteq\)&nbsp;<code>DomesticAirport</code> \(\sqcup\) <code>InternationalAirport</code>, i.e., that an airport is either a domestic airport or an international airport (or both).</p>
		<p>The similarities between DL features and the OWL features seen previously are not coincidental: the OWL standard was heavily influenced by DLs, where, for example, the OWL 2 DL language is a fragment of OWL restricted so that entailment becomes decidable, where the restrictions are inspired by those defined for DLs. To exemplify a restriction, <span class="nobreak"><code>DomesticAirport</code>&nbsp;\(\sqsubseteq ~=1\)&nbsp;<code>destination</code> \(\circ\) <code>country</code>.\(\top\)</span> defines in DL syntax that domestic airports have flights destined to precisely one country (where <span class="nobreak"><code>p</code>&nbsp;\(\circ\)&nbsp;<code>q</code></span> denotes a chain of properties). However, counting chains (in this case with \(=1~\texttt{destination} \circ \texttt{country}\)) is often disallowed in DLs to ensure decidability.</p>
		<p>Expressive DLs support complex entailments involving existentials, universals, counting, etc. A common strategy for deciding such entailments is to reduce entailment to <em>satisfiability</em>, which decides if an ontology is consistent or not&nbsp;[<a href="#ref-HorrocksP04">Horrocks and Patel-Schneider, 2004</a>].<sup class="fnmark" id="fnm15"><a href="#fn15">15</a></sup><span class="footnote" id="fn15"><sup><a href="#fnm15">note 15</a></sup> \(G\) entails \(G'\) if and only if \(G \cup \text{not}(G')\) is not satisfiable, i.e., it has no model.</span> Thereafter methods such as <em>tableau</em> can be used to check satisfiability, cautiously constructing models by completing them along similar lines to the materialisation strategy previously described, but additionally branching models in the case of disjunction, introducing new elements to represent existentials, etc. If any model is successfully “completed”, the process concludes that the original definitions are satisfiable (see, e.g.,&nbsp;[<a href="#ref-MotikSH09">Motik et al., 2009</a>]). Due to their prohibitive computational complexity&nbsp;[<a href="#ref-key:owl2profiles">Motik et al., 2012</a>] – where for example, disjunction may lead to an exponential number of branching possibilities – such reasoning strategies are not typically applied in the case of large-scale data, though they may be useful when modelling complex domains for knowledge graphs.</p>

		<div class="formal">
			<p>A DL knowledge base consists of an A-Box, a T-Box, and an R-Box.</p>

			<dl class="definition" id="def-dl-knowledg-base">
				<dt>DL knowledge base</dt>
				<dd>A <em>DL knowledge base</em> \(\mathsf{K}\) is defined as a tuple \((\mathsf{A},\mathsf{T},\mathsf{R})\), where \(\mathsf{A}\) is the <em>A-Box</em>: a set of assertional axioms; \(\mathsf{T}\) is the <em>T-Box</em>: a set of class (aka concept/terminological) axioms; and \(\mathsf{R}\) is the <em>R-Box</em>: a set of relation (aka property/role) axioms.</dd>
			</dl>

			<p>Table&nbsp;<a href="#tab-dlsem">4.5</a> provides definitions for all of the constructs typically found in Description Logics. The syntax column denotes how the construct is expressed in DL. The semantics column defines the meaning of axioms using <em>interpretations</em>, which are defined in a slightly different way to those seen previously for graphs.</p>

			<dl class="definition" id="def-dl-interpretation">
				<dt>DL interpretation</dt>
				<dd>A <em>DL interpretation</em> \(I\) is defined as a pair \((\inpdom,\inp{\cdot})\), where \(\inpdom\) is the <em>interpretation domain</em>, and \(\inp{\cdot}\) is the <em>interpretation function</em>. The interpretation domain is a set of individuals. The interpretation function accepts a definition of either an individual \(a\), a class \(C\), or a relation \(R\), mapping them, respectively, to an element of the domain (\(\inp{a} \in \inpdom\)), a subset of the domain (\(\inp{C} \subseteq \inpdom\)), or a set of pairs from the domain (\(\inp{R} \subseteq \inpdom \times \inpdom\)).</dd>
			</dl>

			<p>An interpretation \(I\) <em>satisfies</em> a knowledge base \(\mathsf{K}\) if and only if, for all of the syntactic axioms in \(\mathsf{K}\), the corresponding semantic conditions in Table&nbsp;<a href="#tab-dlsem">4.5</a> hold for \(I\). In this case, we call \(I\) a <em>model</em> of \(\mathsf{K}\).</p>
			
			<div class="example" id="ex-entail">
				<p>For \(\mathsf{K} \coloneqq (\mathsf{A},\mathsf{T},\mathsf{R})\), let:</p>
				<ul>
					<li>\(\mathsf{A} \coloneqq \{ \)<code>City(Arica)</code>, <code>City(Santiago)</code>, <code>flight(Arica,Santiago)</code>\(\}\);</li>
					<li>\(\mathsf{T} \coloneqq \{\)<code>City</code> \(\sqsubseteq\) <code>Place</code>, \(\exists\)<code>flight</code>\(.\top \sqsubseteq \exists\)<code>nearby</code>.<code>Airport</code>\(\} \);</li>
					<li>\(\mathsf{R} \coloneqq \{\)<code>flight</code> \(\sqsubseteq\) <code>connectsTo</code>\(\} \).</li>
				</ul>
				<p>For \(I = (\inpdom,\inp{\cdot})\), let:</p>
				<ul>
					<li>\(\inpdom \coloneqq \{ ⚓,\,🏔,\,🛪,\,☕ \}\);</li>
					<li><code>Arica</code><sup>\(I\)</sup> \(\coloneqq\,⚓\), <code>Santiago</code><sup>\(I\)</sup> \(\coloneqq\,🏔\), <code>AricaAirport</code><sup>\(I\)</sup> \(\coloneqq\,🛪\), <code>coffee</code><sup>\(I\)</sup> \(\coloneqq\,☕\);</li>
					<li><code>City</code><sup>\(I\)</sup> \(\coloneqq \{ ⚓,\,🏔 \}\), <code>Airport</code><sup>\(I\)</sup> \(\coloneqq \{ 🛪 \}\);</li>
					<li><code>flight</code><sup>\(I\)</sup> \(\coloneqq \{ (⚓,\,🏔) \}\), <code>connectsTo</code><sup>\(I\)</sup> \(\coloneqq \{ (⚓,\,🏔) \}\), <code>sells</code><sup>\(I\)</sup> \(\coloneqq \{ (🛪,\,☕) \}\).</li>
				</ul>
				<p>The interpretation \(I\) is not a model of \(\mathsf{K}\) since it does not have that \(⚓\) is <code>nearby</code> some <code>Airport</code>, nor that \(⚓\) and \(🏔\) are in the class <code>Place</code>. However, if we <em>extend</em> the interpretation \(I\) with the following:</p>
				<ul>
					<li><code>Place</code><sup>\(I\)</sup> \(\coloneqq \{ ⚓,\,🏔 \}\);</li>
					<li><code>nearby</code><sup>\(I\)</sup> \(\coloneqq \{ (⚓,\,🛪) \}\).</li>
				</ul>
				<p>Now \(I\) is a model of \(\mathsf{K}\). Note that although \(\mathsf{K}\) does not imply that <code>sells(AricaAirport,coffee)</code> while \(I\) indicates that \(🛪\) does indeed sell \(☕\), \(I\) is still a model of \(\mathsf{K}\) since \(\mathsf{K}\) is not assumed to be a complete description, per the OWA.</p>
			</div>
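
			<p>The check performed in this example can be sketched in Python. The encoding is an assumption made for illustration only: domain elements are written as plain strings instead of pictograms, an interpretation is a dictionary with keys <code>ind</code>, <code>classes</code> and <code>props</code>, and the functions <code>ext</code> and <code>is_model</code> are hypothetical names. Only the axiom forms used in the example are handled, and membership in the interpretation domain is not checked.</p>

```python
def ext(expr, I):
    """Extension of a class expression under interpretation I.

    A string names an atomic class; ("some", p, C) encodes the existential
    restriction on property p with filler C, where C = None is the top class.
    """
    if isinstance(expr, str):
        return I["classes"].get(expr, set())
    _, prop, cls = expr
    pairs = I["props"].get(prop, set())
    return {x for (x, y) in pairs if cls is None or y in ext(cls, I)}


def is_model(I, abox, tbox, rbox):
    """Return True if I satisfies every axiom of the knowledge base."""
    ind, props = I["ind"], I["props"]
    for axiom in abox:
        if len(axiom) == 2:                      # class assertion C(a)
            cls, a = axiom
            if ind[a] not in ext(cls, I):
                return False
        else:                                    # property assertion p(a, b)
            p, a, b = axiom
            if (ind[a], ind[b]) not in props.get(p, set()):
                return False
    for lhs, rhs in tbox:                        # class inclusion
        if not ext(lhs, I) <= ext(rhs, I):
            return False
    for p, q in rbox:                            # property inclusion
        if not props.get(p, set()) <= props.get(q, set()):
            return False
    return True


abox = [("City", "Arica"), ("City", "Santiago"), ("flight", "Arica", "Santiago")]
tbox = [("City", "Place"), (("some", "flight", None), ("some", "nearby", "Airport"))]
rbox = [("flight", "connectsTo")]

I = {
    "ind": {"Arica": "anchor", "Santiago": "mountain", "AricaAirport": "plane"},
    "classes": {"City": {"anchor", "mountain"}, "Airport": {"plane"}},
    "props": {
        "flight": {("anchor", "mountain")},
        "connectsTo": {("anchor", "mountain")},
        "sells": {("plane", "cup")},
    },
}
before = is_model(I, abox, tbox, rbox)

# Extend the interpretation as in the example.
I["classes"]["Place"] = {"anchor", "mountain"}
I["props"]["nearby"] = {("anchor", "plane")}
after = is_model(I, abox, tbox, rbox)
```

			<p>In this sketch <code>before</code> is false and <code>after</code> is true. The extra <code>sells</code> pair does not affect the result, which reflects the OWA.</p>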
			
			<p>Finally, the notion of a model gives rise to the notion of entailment, which tells us which knowledge bases hold as a logical consequence of which others.</p>

			<dl class="definition" id="def-entailment">
				<dt>Entailment</dt>
				<dd>Given two DL knowledge bases \(\mathsf{K}_1\) and \(\mathsf{K}_2\), we define that \(\mathsf{K}_1\) entails \(\mathsf{K}_2\), denoted \(\mathsf{K}_1 \models \mathsf{K}_2\), if and only if any model of \(\mathsf{K}_1\) is a model of \(\mathsf{K}_2\).</dd>
			</dl>

			<div class="example">
				<p>Let \(\mathsf{K}_1\) denote the knowledge base \(\mathsf{K}\) from Example&nbsp;<a href="#ex-entail">4.1</a>, and define a second knowledge base with one assertion: \(\mathsf{K}_2 \coloneqq ( \{ \)<code>connectsTo</code>\((\)<code>Arica</code>, <code>Santiago</code>\() \}, \{\}, \{\} )\). Though \(\mathsf{K}_1\) does not assert this axiom, it does entail \(\mathsf{K}_2\): to be a model of \(\mathsf{K}_2\), an interpretation \(I\) must have that \((\)<code>Arica</code><sup>\(I\)</sup>, <code>Santiago</code><sup>\(I\)</sup>\() \in\) <code>connectsTo</code><sup>\(I\)</sup>, but this must also be the case for any interpretation that satisfies \(\mathsf{K}_1\) since it must have that \((\)<code>Arica</code><sup>\(I\)</sup>, <code>Santiago</code><sup>\(I\)</sup>\() \in \)<code>flight</code><sup>\(I\)</sup> and <code>flight</code><sup>\(I\)</sup> \(\subseteq\) <code>connectsTo</code><sup>\(I\)</sup>. Hence any model of \(\mathsf{K}_1\) must also be a model of \(\mathsf{K}_2\), and \(\mathsf{K}_1 \models \mathsf{K}_2\) holds.</p>
			</div>

			<p>Unfortunately, the problem of deciding entailment for knowledge bases expressed in the DL composed of the unrestricted use of all of the axioms of Table&nbsp;<a href="#tab-dlsem">4.5</a> is undecidable since we could reduce instances of the Halting Problem to such entailment. Hence DLs in practice restrict use of the features listed in Table&nbsp;<a href="#tab-dlsem">4.5</a>. Different DLs apply different restrictions, implying different trade-offs for expressivity and the complexity of entailment. Most DLs are founded on one of the following base DLs (we use indentation to denote derivation):</p>
			<ul>
				<li>[\(\mathcal{ALC}\)] (\(\mathcal{A}\)ttributive \(\mathcal{L}\)anguage with \(\mathcal{C}\)omplement&nbsp;[<a href="#ref-Schmidt-SchaussS91">Schmidt-Schauß and Smolka, 1991</a>]), supports atomic classes, the top and bottom classes, class intersection, class union, class negation, universal restrictions and existential restrictions. Relation and class assertions are also supported.<ul>
					<li>[\(\mathcal{S}\)] extends \(\mathcal{ALC}\) with transitive relations.</li>
				</ul></li>
			</ul>
			<p>These base languages can be extended as follows:</p>
			<ul>
				<li>[\(\mathcal{H}\)] adds relation inclusion.<ul>
					<li>[\(\mathcal{R}\)] adds (limited) complex relation inclusion, relation reflexivity, relation irreflexivity, relation disjointness and the universal relation.</li>
				</ul></li>
				<li>[\(\mathcal{O}\)] adds (limited) nominals.</li>
				<li>[\(\mathcal{I}\)] adds inverse relations.</li>
				<li>[\(\mathcal{F}\)] adds (limited) functional properties.<ul>
					<li>[\(\mathcal{N}\)] adds (limited) number restrictions (covering \(\mathcal{F}\) with \(\top\)).<ul>
						<li>[\(\mathcal{Q}\)] adds (limited) qualified number restrictions (covering \(\mathcal{N}\) with \(\top\)).</li>
					</ul></li>
				</ul></li>
			</ul>
			<p>We use “(limited)” to indicate that such features are often only allowed under certain restrictions to ensure decidability; for example, complex relations (chains) typically cannot be combined with cardinality restrictions. DLs are then typically named per the following scheme, where \([a|b]\) denotes an alternative between \(a\) and \(b\) and \([c][d]\) denotes a concatenation \(cd\):</p>
			<p>\[ [\mathcal{ALC}|\mathcal{S}][\mathcal{H}|\mathcal{R}][\mathcal{O}][\mathcal{I}][\mathcal{F}|\mathcal{N}|\mathcal{Q}] \]</p>
			<p>Examples include \(\mathcal{ALCO}\), \(\mathcal{ALCHI}\), \(\mathcal{SHIF}\), \(\mathcal{SROIQ}\), etc. These languages often apply additional restrictions on class and property axioms to ensure decidability, which we do not discuss here. For further details on DLs, we refer to the recent book by <a href="#ref-BaaderHLS17">Baader et al. [2017]</a>.</p>
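			<p>The naming scheme can be read as a regular expression over letters. The following Python sketch assumes that the base is mandatory and that each bracketed extension is optional and must appear in the order shown; the group names and the function <code>parse_dl_name</code> are hypothetical, and the sketch does not cover DL names outside this scheme.</p>

```python
import re

# Assumed reading of the scheme: a mandatory base (ALC or S) followed by
# optional extensions in the fixed order [H|R][O][I][F|N|Q].
DL_NAME = re.compile(
    r"(?P<base>ALC|S)(?P<rbox>[HR])?(?P<nominals>O)?(?P<inverse>I)?(?P<number>[FNQ])?"
)


def parse_dl_name(name):
    """Return the components of a DL name, or None if it does not fit the scheme."""
    match = DL_NAME.fullmatch(name)
    return match.groupdict() if match else None


for name in ["ALCO", "ALCHI", "SHIF", "SROIQ", "SIOQ"]:
    print(name, parse_dl_name(name))
```

			<p>The last name in the loop places \(\mathcal{I}\) before \(\mathcal{O}\) and is therefore rejected under this reading.</p>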
			<p>As mentioned in the body of the survey, DLs have been very influential in the definition of OWL, where the OWL 2 DL fragment (roughly) corresponds to the DL \(\mathcal{SROIQ}\). For example, the axiom <span class="gnode">venue</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">domain</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Event</span> in OWL can be translated to \(\exists\)<code>venue</code>\(.\top \sqsubseteq\) <code>Event</code>, meaning that the class of individuals with some value for <code>venue</code> (in any class) is a sub-class of the class <code>Event</code>. We leave other translations from the OWL axioms of Tables&nbsp;<a href="#tab-ontEqIneq">4.1</a>–<a href="#tab-ontClass">4.3</a> to DL as an exercise.<sup class="fnmark" id="fnm16"><a href="#fn16">16</a></sup><span class="footnote" id="fn16"><sup><a href="#fnm16">note 16</a></sup> Though not previously mentioned, OWL additionally defines the classes <code>Thing</code> and <code>Nothing</code> that correspond to \(\top\) and \(\bot\), respectively.</span> Note, however, that axioms like <span class="gnode">sub-taxon of</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">subp. of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">subc. 
of</span> – which given a graph such as <span class="gnode">Fred</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Homo sapiens</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">sub-taxon of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Hominini</span> entails the edge <span class="gnode">Fred</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Hominini</span> – cannot be expressed in DL: “<code>subTaxonOf</code> \(\sqsubseteq\ \sqsubseteq\)” is not syntactically valid. Hence only a subset of graphs can be translated into well-formed DL ontologies; we refer to the OWL standard for details&nbsp;[<a href="#ref-OWL2">Hitzler et al., 2012</a>].</p>
		</div>

		<div class="formal">
			<table id="tab-dlsem">
				<caption>Description Logic semantics (such that \(x, y, z, \inp{a}, \inp{a_1}, \ldots, \inp{a_n}, \inp{b}\) are in \(\inpdom\))</caption>
				<thead>
					<tr>
						<th>Name</th>
						<th>Syntax</th>
						<th>Semantics (\(\inp{\cdot}\))</th>
					</tr>
				</thead>
				<tbody>
					<tr>
						<td class="subtabhead" colspan="3"><span class="sc">Class Definitions</span></td>
					</tr>
					<tr>
						<td>Atomic Class</td>
						<td>\(A\)</td>
						<td>\(\inp{A}\) (a subset of \(\inpdom\))</td>
					</tr>
					<tr>
						<td>Top Class</td>
						<td>\(\top\)</td>
						<td>\(\inpdom\)</td>
					</tr>
					<tr>
						<td>Bottom Class</td>
						<td>\(\bot\)</td>
						<td>\(\emptyset\)</td>
					</tr>
					<tr>
						<td>Class Negation</td>
						<td>\(\neg C\)</td>
						<td>\(\inpdom \setminus \inp{C}\)</td>
					</tr>
					<tr>
						<td>Class Intersection</td>
						<td>\(C \sqcap D\)</td>
						<td>\(\inp{C} \cap \inp{D}\)</td>
					</tr>
					<tr>
						<td>Class Union</td>
						<td>\(C \sqcup D\)</td>
						<td>\(\inp{C} \cup \inp{D}\)</td>
					</tr>
					<tr>
						<td>Nominal</td>
						<td>\(\{ a_1, ..., a_n \}\)</td>
						<td>\(\{ \inp{a_1}, ..., \inp{a_n} \}\)</td>
					</tr>
					<tr>
						<td>Existential Restriction</td>
						<td>\(\exists R.C\)</td>
						<td>\(\{ x \mid \exists y : (x,y) \in \inp{R}\text{ and }y \in \inp{C} \}\)</td>
					</tr>
					<tr>
						<td>Universal Restriction</td>
						<td>\(\forall R.C\)</td>
						<td>\(\{ x \mid \forall y : (x,y) \in \inp{R}\text{ implies }y \in \inp{C} \}\)</td>
					</tr>
					<tr>
						<td>Self Restriction</td>
						<td>\(\exists R.\textsf{Self}\)</td>
						<td>\(\{ x \mid (x,x) \in \inp{R} \}\)</td>
					</tr>
					<tr>
						<td>Number Restriction</td>
						<td>\(\star\,n\,R\) (where \(\star \in \{\geq, \leq, = \}\))</td>
						<td>\(\{ x \mid \#\{ y : (x,y) \in \inp{R} \} \star n \}\)</td>
					</tr>
					<tr>
						<td>Qualified&#x202F;Number&#x202F;Restriction</td>
						<td>\(\star\,n\,R.C\)&#x202F;(where&#x202F;\(\star \in \{\geq, \leq, = \}\))</td>
						<td>\(\{ x \mid \#\{ y : (x,y) \in \inp{R}\text{ and }y \in \inp{C} \} \star n \}\)</td>
					</tr>
				</tbody>
				<tbody>
					<tr>
						<td class="subtabhead" colspan="3"><span class="sc">Class Axioms</span> (T-Box)</td>
					</tr>
					<tr>
						<td>Class Inclusion</td>
						<td>\(C \sqsubseteq D\)</td>
						<td>\(\inp{C} \subseteq \inp{D}\)</td>
					</tr>
				</tbody>
				<tbody>
					<tr>
						<td class="subtabhead" colspan="3"><span class="sc">Relation Definitions</span></td>
					</tr>
					<tr>
						<td>Relation</td>
						<td>\(R\)</td>
						<td>\(\inp{R}\) (a subset of \(\inpdom \times \inpdom\))</td>
					</tr>
					<tr>
						<td>Inverse Relation</td>
						<td>\(R^{-}\)</td>
						<td>\(\{ (y,x) \mid (x,y) \in \inp{R} \}\)</td>
					</tr>
					<tr>
						<td>Universal Relation</td>
						<td>\(\textsf{U}\)</td>
						<td>\(\inpdom \times \inpdom\)</td>
					</tr>
				</tbody>
				<tbody>
					<tr>
						<td class="subtabhead" colspan="3"><span class="sc">Relation Axioms</span> (R-Box)</td>
					</tr>
					<tr>
						<td>Relation Inclusion</td>
						<td>\(R \sqsubseteq S\)</td>
						<td>\(\inp{R} \subseteq \inp{S}\)</td>
					</tr>
					<tr>
						<td>Complex Relation Inclusion</td>
						<td>\(R_1 \circ ... \circ R_n \sqsubseteq S\)</td>
						<td>\(\inp{R_1} \circ ... \circ \inp{R_n} \subseteq \inp{S}\)</td>
					</tr>
					<tr>
						<td>Transitive Relations</td>
						<td>\(\textsf{Trans}(R)\)</td>
						<td>\(\inp{R} \circ \inp{R} \subseteq \inp{R}\)</td>
					</tr>
					<tr>
						<td>Functional Relations</td>
						<td>\(\textsf{Func}(R)\)</td>
						<td>\(\{ (x,y), (x,z) \} \subseteq \inp{R}\) implies \(y = z\)</td>
					</tr>
					<tr>
						<td>Reflexive Relations</td>
						<td>\(\textsf{Ref}(R)\)</td>
						<td>for all \(x : (x,x) \in \inp{R}\)</td>
					</tr>
					<tr>
						<td>Irreflexive Relations</td>
						<td>\(\textsf{Irref}(R)\)</td>
						<td>for all \(x : (x,x) \not\in \inp{R}\)</td>
					</tr>
					<tr>
						<td>Symmetric Relations</td>
						<td>\(\textsf{Sym}(R)\)</td>
						<td>\(\inp{R} = \inp{(R^{-})}\)</td>
					</tr>
					<tr>
						<td>Asymmetric Relations</td>
						<td>\(\textsf{Asym}(R)\)</td>
						<td>\(\inp{R} \cap \inp{(R^{-})} = \emptyset\)</td>
					</tr>
					<tr>
						<td>Disjoint Relations</td>
						<td>\(\textsf{Disj}(R,S)\)</td>
						<td>\(\inp{R} \cap \inp{S} = \emptyset\)</td>
					</tr>
				</tbody>
				<tbody>
					<tr>
						<td class="subtabhead" colspan="3"><span class="sc">Assertional Definitions</span></td>
					</tr>
					<tr>
						<td>Individual</td>
						<td>\(a\)</td>
						<td>\(\inp{a}\)</td>
					</tr>
				</tbody>
				<tbody>
					<tr>
						<td class="subtabhead" colspan="3"><span class="sc">Assertional Axioms</span> (A-Box)</td>
					</tr>
					<tr>
						<td>Relation Assertion</td>
						<td>\(R(a,b)\)</td>
						<td>\((\inp{a},\inp{b}) \in \inp{R}\)</td>
					</tr>
					<tr>
						<td>Negative Relation Assertion</td>
						<td>\(\neg R(a,b)\)</td>
						<td>\((\inp{a},\inp{b}) \not\in \inp{R}\)</td>
					</tr>
					<tr>
						<td>Class Assertion</td>
						<td>\(C(a)\)</td>
						<td>\(\inp{a} \in \inp{C}\)</td>
					</tr>
					<tr>
						<td>Equality</td>
						<td>\( a = b \)</td>
						<td>\(\inp{a} = \inp{b}\)</td>
					</tr>
					<tr>
						<td>Inequality</td>
						<td>\( a \neq b \)</td>
						<td>\(\inp{a} \neq \inp{b}\)</td>
					</tr>
				</tbody>
			</table>
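			<p>Over a finite interpretation, the set-based semantics of the class definitions in the table can be evaluated directly. The following Python sketch covers existential, universal and qualified number restrictions; the domain, the class <code>City</code>, the relation <code>flight</code> and the function names are hypothetical choices made for illustration.</p>

```python
# A finite interpretation: a domain, class extensions and relation extensions.
domain = {"a", "b", "c"}
classes = {"City": {"a", "b"}}
relations = {"flight": {("a", "b"), ("a", "c")}}


def exists(rel, cls):
    """∃R.C: elements with at least one R-successor in C."""
    return {x for (x, y) in relations[rel] if y in cls}


def forall(rel, cls):
    """∀R.C: elements all of whose R-successors are in C."""
    return {x for x in domain
            if all(y in cls for (x2, y) in relations[rel] if x2 == x)}


def at_least(n, rel, cls):
    """≥ n R.C: elements with at least n distinct R-successors in C."""
    return {x for x in domain
            if len({y for (x2, y) in relations[rel] if x2 == x and y in cls}) >= n}


top = domain  # the top class is interpreted as the whole domain
print(exists("flight", classes["City"]))
print(forall("flight", classes["City"]))
print(at_least(2, "flight", top))
```

			<p>Elements without any successor for the relation satisfy the universal restriction vacuously, which follows from the “implies” in its semantics.</p>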
		</div>

		</section>
	</section>
	<section id="chap-inductive" class="chapter">
		<h2>Inductive Knowledge</h2>
		<p>While deductive knowledge is characterised by precise logical consequences, inductively acquiring knowledge involves generalising patterns from a given set of input observations, which can then be used to generate novel but potentially imprecise predictions. For example, from a large data graph with geographical and flight information, we may observe the pattern that almost all capital cities of countries have international airports serving them, and hence predict that if Santiago is a capital city, it <em>likely</em> has an international airport serving it; however, the predictions drawn from this pattern do not hold for certain, where (e.g.) Vaduz, the capital city of Liechtenstein, has no (international) airport serving it. Hence predictions will often be associated with a level of confidence; for example, we may say that a capital has an international airport in \(\frac{187}{195}\) of cases, offering a confidence of \(0.959\) for predictions made with that pattern. We then refer to knowledge acquired inductively as <em>inductive knowledge</em>, which includes both the models used to encode patterns, as well as the predictions made by those models. Though fallible, inductive knowledge can be highly valuable.</p>

		<figure id="fig-ind">
			<img src="images/fig-ind.svg" alt="Conceptual overview of popular inductive techniques for knowledge graphs in terms of type of representation generated (Numeric/Symbolic) and type of paradigm used (Unsupervised/Self-supervised/Supervised)"/>
			<figcaption>Conceptual overview of popular inductive techniques for knowledge graphs in terms of type of representation generated (Numeric/Symbolic) and type of paradigm used (Unsupervised/Self-supervised/Supervised)</figcaption>
		</figure>

		<p>In Figure&nbsp;<a href="#fig-ind">5.1</a> we provide an overview of the inductive techniques typically applied to knowledge graphs. In the case of unsupervised methods, there is a rich body of work on <em>graph analytics</em>, which uses well-known functions/algorithms to detect communities or clusters, find central nodes and edges, etc., in a graph. Alternatively, <em>knowledge graph embeddings</em> can use self-supervision to learn a low-dimensional numeric model of a knowledge graph that (typically) maps input edges to an output <em>plausibility score</em> indicating the likelihood of the edge being true. The structure of graphs can also be directly leveraged for supervised learning, as explored in the context of <em>graph neural networks</em>. Finally, while the aforementioned techniques learn numerical models, <em>symbolic learning</em> can learn symbolic models – i.e., logical formulae in the form of rules or axioms – from a graph in a self-supervised manner. We now discuss each of the aforementioned techniques in turn.</p>

		<section id="sec-gAnalytics" class="section">
		<h3>Graph Analytics</h3>
		<p>Analytics is the process of discovering, interpreting, and communicating meaningful patterns inherent to (typically large) data collections. Graph analytics is then the application of analytical processes to (typically large) graph data. The nature of graphs naturally lends itself to certain types of analytics that derive conclusions about nodes and edges based on the <em>topology</em> of the graph, i.e., how the nodes of the graph are connected. Graph analytics draws upon techniques from related areas, such as graph theory and network analysis, which have been used to study graphs representing social networks, the Web, internet routing, transport networks, ecosystems, protein–protein interactions, linguistic cooccurrences, and more besides&nbsp;[<a href="#ref-Estrada2011">Estrada, 2011</a>].</p>
		<p>Returning to the domain of our running example, the tourism board could use graph analytics to extract knowledge about, for instance: key transport hubs that serve many tourist attractions (centrality); groupings of attractions visited by the same tourists (community detection); attractions that may become unreachable in the event of strikes or other route failures (connectivity); or pairs of attractions that are similar to each other (node similarity). Given that such analytics will require a complex, large-scale graph, for the purposes of illustration, in Figure&nbsp;<a href="#fig-chileTransport">5.2</a> we present a more concise example of some transportation connections in Chile directed towards popular tourist destinations. We first introduce a selection of key techniques that can be applied for graph analytics. We then discuss frameworks and languages that can be used to compute such analytics in practice. Given that many traditional graph algorithms are defined for unlabelled graphs, we then describe ways in which analytics can be applied over directed edge-labelled graphs. Finally we discuss the potential connections between graph analytics and querying and reasoning.</p>

		<figure id="fig-chileTransport">
			<img src="images/fig-chileTransport.svg" alt="Data graph representing transport routes in Chile"/>
			<figcaption>Data graph representing transport routes in Chile</figcaption>
		</figure>

		<h4 id="sssec-graph-analytics-tasks" class="subsection">Techniques</h4>
		<p>A wide variety of techniques can be applied for graph analytics. In the following we will enumerate some of the main techniques – as recognised, for example, by the survey of <a href="#ref-IosupHNHPMCCSAT16">Iosup et al. [2016]</a> – that can be invoked in this setting.</p>
		<ul>
			<li><em>Centrality:</em> aims to identify the most important (aka <em>central</em>) nodes or edges of a graph. Specific node centrality measures include <em>degree</em>, <em>betweenness</em>, <em>closeness</em>, <em>Eigenvector</em>, <em>PageRank</em>, <em>HITS</em>, <em>Katz</em>, amongst others. Betweenness centrality can also be applied to edges. For example, a node centrality measure might predict the transport hubs in Figure&nbsp;<a href="#fig-chileTransport">5.2</a>, while edge centrality might predict traffic by finding connections on which many shortest routes depend.</li>
			<li><em>Community detection:</em> aims to identify <em>communities</em> in a graph, i.e., sub-graphs that are more densely connected internally than to the rest of the graph. Community detection algorithms, such as <em>minimum-cut algorithms</em>, <em>label propagation</em>, <em>Louvain modularity</em>, amongst others, can discover such communities. Community detection applied to Figure&nbsp;<a href="#fig-chileTransport">5.2</a> may, for example, detect a community to the left (the north of Chile), to the right (the south of Chile), and perhaps also the centre (Chilean cities with airports).</li>
			<li><em>Connectivity:</em> aims to estimate how well-connected the graph is, revealing, for instance, the resilience and (un)reachability of elements of the graph. Specific techniques include measuring <em>graph density</em> or <em>\(k\)-connectivity</em>, detecting <em>strongly connected components</em> and <em>weakly connected components</em>, computing <em>spanning trees</em> or <em>minimum cuts</em>, etc. In the context of Figure&nbsp;<a href="#fig-chileTransport">5.2</a>, such analysis may tell us that routes to <span class="gnode">Grey Glacier</span>, <span class="gnode">Osorno Volcano</span> and <span class="gnode">Piedras Rojas</span> are the most “brittle”, becoming disconnected if one of two <span class="gelab">bus</span> routes fails.</li>
			<li><em>Node (or vertex) similarity:</em> aims to find nodes that are similar to other nodes by virtue of how they are connected within their neighbourhood. Node similarity metrics may be computed using <em>structural equivalence</em>, <em>random walks</em>, <em>diffusion kernels</em>, etc. These methods provide an understanding of what connects nodes, and, thereafter, in what ways they are similar. In the context of Figure&nbsp;<a href="#fig-chileTransport">5.2</a>, such analysis may tell us that <span class="gnode">Calama</span> and <span class="gnode">Arica</span> are similar nodes based on both having return flights to <span class="gnode">Santiago</span> and return buses to <span class="gnode">San Pedro</span>.</li>
		</ul>
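		<p>As a rough illustration of two of the techniques listed above, the following Python sketch computes degree centrality and weakly connected components over a small directed graph. The edge list is hypothetical and is not a transcription of the figure; the function names are likewise assumptions made for this sketch.</p>

```python
from collections import defaultdict

# Hypothetical directed, unlabelled edges (not a transcription of the figure).
edges = [
    ("Arica", "Santiago"), ("Santiago", "Arica"),
    ("Calama", "Santiago"), ("Santiago", "Calama"),
    ("Calama", "San Pedro"), ("San Pedro", "Calama"),
    ("Iquique", "Santiago"),
    ("Puerto Varas", "Osorno Volcano"),
]


def degree_centrality(edges):
    """Count incoming plus outgoing edges per node."""
    degree = defaultdict(int)
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return dict(degree)


def weakly_connected_components(edges):
    """Group nodes that reach one another when edge direction is ignored."""
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)
    seen, components = set(), []
    for start in neighbours:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components


degree = degree_centrality(edges)
print(max(degree, key=degree.get))
print(weakly_connected_components(edges))
```

		<p>In this toy graph the node with the highest degree plays the role of a transport hub, while the second component is unreachable from the first regardless of edge direction.</p>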
		<p>While the previous techniques accept a graph alone as input,<sup class="fnmark" id="fnm17"><a href="#fn17">17</a></sup><span class="footnote" id="fn17"><sup><a href="#fnm17">note 17</a></sup> Node similarity can be run over an entire graph to find the \(k\) most similar nodes for each node, or can also be run for a specific node to find its most similar nodes. There are also measures for graph similarity (based on, e.g., frequent itemsets&nbsp;[<a href="#ref-MaillotB18">Maillot and Bobed, 2018</a>]) that accept multiple graphs as input.</span> other forms of graph analytics may further accept a node, a pair of nodes, etc., along with the graph.</p>
		<ul>
			<li><em>Path finding:</em> aims to find paths in a graph, typically between pairs of nodes given as input. Various technical definitions exist that restrict the set of valid paths between such nodes, including simple paths that do not visit the same node twice, shortest paths that visit the fewest number of edges, or – as previously discussed in Section&nbsp;<a href="#sssec-dls">4.3.2</a> – regular path queries that restrict the labels of edges that can be traversed by the path according to a regular expression&nbsp;[<a href="#ref-AnglesABHRV17">Angles et al., 2017</a>]. We could use such algorithms to find, for example, the shortest path(s) in Figure&nbsp;<a href="#fig-chileTransport">5.2</a> from <span class="gnode">Torres del Paine</span> to <span class="gnode">Moon Valley</span>.</li>
		</ul>
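		<p>A shortest path in terms of the number of edges can be found with breadth-first search. The following Python sketch assumes a hypothetical list of directed edges (edge labels are ignored, and the routes are not taken from the figure); the function name <code>shortest_path</code> is an assumption of this sketch.</p>

```python
from collections import deque

# Hypothetical directed edges; labels are ignored.
edges = [
    ("Torres del Paine", "Puerto Natales"),
    ("Puerto Natales", "Punta Arenas"),
    ("Punta Arenas", "Santiago"),
    ("Santiago", "Calama"),
    ("Calama", "San Pedro"),
    ("San Pedro", "Moon Valley"),
    ("Santiago", "Arica"),
]


def shortest_path(edges, source, target):
    """Return one path with the fewest edges from source to target, or None."""
    successors = {}
    for u, v in edges:
        successors.setdefault(u, []).append(v)
    previous = {source: None}  # also serves as the set of visited nodes
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in successors.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


print(shortest_path(edges, "Torres del Paine", "Moon Valley"))
```

		<p>Because each node is visited at most once, the returned path is also a simple path; restricting the traversal to edges whose labels match a regular expression would require keeping the labels, which this sketch omits.</p>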
		<p>Most of the aforementioned techniques for graph analytics were originally proposed and studied for simple graphs or directed graphs without edge labels. We will discuss their application to more complex graph models – and how they can be combined with other techniques such as reasoning and querying – later in Section&nbsp;<a href="#sssec-query-languages">5.1.3</a>.</p>

		<h4 id="sssec-technologies-graph-analytics" class="subsection">Frameworks</h4>
		<p>Various frameworks have been proposed for large-scale graph analytics, often in a distributed (cluster) setting. Amongst these we can mention Apache Spark (GraphX)&nbsp;[<a href="#ref-XinGFS13">Xin et al., 2013a</a>, <a href="#ref-DaveJLXGZ16">Dave et al., 2016</a>], GraphLab&nbsp;[<a href="#ref-LowGKBGH12">Low et al., 2012</a>], Pregel&nbsp;[<a href="#ref-MalewiczABDHLC10">Malewicz et al., 2010</a>], Signal–Collect&nbsp;[<a href="#ref-signalcollect">Stutz et al., 2016</a>], Shark&nbsp;[<a href="#ref-XinRZFSS13">Xin et al., 2013b</a>], etc. These <em>graph parallel frameworks</em> apply a <em>systolic abstraction</em>&nbsp;[<a href="#ref-Kung82">Kung, 1982</a>] based on a directed graph, where nodes are seen as processors that can send messages to other nodes along edges. Computation is then iterative, where in each iteration, each node reads messages received through inward edges (and possibly its own previous state), performs a computation, and then sends messages through outward edges based on the result. These frameworks then define the systolic computational abstraction on top of the data graph being processed: nodes and edges in the data graph become nodes and edges in the systolic graph.</p>
		<p>To take an example, assume we wish to compute the places that are most (or least) easily reached by the routes shown in the graph of Figure&nbsp;<a href="#fig-chileTransport">5.2</a>. A good way to measure this is using centrality, where we choose PageRank&nbsp;[<a href="#ref-page1999pagerank">Page et al., 1999</a>], which computes the probability of a tourist randomly following the routes shown in the graph being at a particular place after a given number of “hops”. We can implement PageRank on large graphs using a graph parallel framework. In Figure&nbsp;<a href="#fig-pagerank">5.3</a>, we provide an example of an iteration of PageRank for an illustrative sub-graph of Figure&nbsp;<a href="#fig-chileTransport">5.2</a>. The nodes are initialised with a score of \(\frac{1}{|V|} = \frac{1}{6}\), where we assume the tourist to have an equal chance of starting at any point. In the <em>message phase</em> (<span class="sc">Msg</span>), each node \(v\) passes a score of \(\frac{d \textrm{R}_i(v)}{|E(v)|}\) on each of its outgoing edges, where we denote by \(d\) a constant damping factor used to ensure convergence (typically \(d = 0.85\), such that \(1 - d\) indicates the probability that a tourist randomly “jumps” to any place), by \(\textrm{R}_i(v)\) the score of node \(v\) in iteration \(i\) (the probability of the tourist being at node \(v\) after \(i\) hops), and by \(|E(v)|\) the number of outgoing edges of \(v\). The <em>aggregation phase</em> (<span class="sc">Agg</span>) for \(v\) then sums all incoming messages received along with its constant share of the damping factor (\(\frac{1-d}{|V|}\)) to compute \(\textrm{R}_{i+1}(v)\). We then proceed to the message phase of the next iteration, continuing until some termination criterion is reached (e.g., iteration count or residual threshold, etc.) and final scores are output.</p>

		<figure id="fig-pagerank">
			<img src="images/fig-pagerank.svg" alt="Example of a systolic iteration of PageRank on a sub-graph of Figure&nbsp;5.2"/>
			<figcaption>Example of a systolic iteration of PageRank on a sub-graph of Figure&nbsp;<a href="#fig-chileTransport">5.2</a></figcaption>
		</figure>
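		<p>The message and aggregation phases described above can be sketched in a few lines of Python. The sketch below runs on a single machine, uses a fixed iteration count as its termination criterion, and assumes a hypothetical three-node graph in which every node has at least one outgoing edge (nodes without outgoing edges would send no messages, so scores would no longer sum to one). The function name <code>pagerank</code> and its parameters are assumptions of this sketch.</p>

```python
def pagerank(edges, d=0.85, iterations=20):
    """Iterate the message (Msg) and aggregation (Agg) phases a fixed number of times."""
    nodes = sorted({n for edge in edges for n in edge})
    out_degree = {v: 0 for v in nodes}
    for source, _ in edges:
        out_degree[source] += 1
    rank = {v: 1 / len(nodes) for v in nodes}  # R_0(v) = 1/|V|
    for _ in range(iterations):
        # Msg: each node v sends d * R_i(v) / |E(v)| along each outgoing edge.
        inbox = {v: [] for v in nodes}
        for source, target in edges:
            inbox[target].append(d * rank[source] / out_degree[source])
        # Agg: sum the incoming messages and add the constant share (1 - d) / |V|.
        rank = {v: (1 - d) / len(nodes) + sum(inbox[v]) for v in nodes}
    return rank


# Hypothetical sub-graph in which every node has at least one outgoing edge.
edges = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "A")]
scores = pagerank(edges)
print({v: round(s, 3) for v, s in scores.items()})
```

		<p>A residual threshold could replace the fixed iteration count by comparing the scores of consecutive iterations and stopping once the largest change falls below a chosen tolerance.</p>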

		<p>While the given example is for PageRank, the systolic abstraction is general enough to support a wide variety of graph analytics, including those previously mentioned. An algorithm in this framework consists of the functions to compute message values in the <em>message phase</em> (<span class="sc">Msg</span>), and to accumulate the messages in the aggregation phase (<span class="sc">Agg</span>). The framework will take care of distribution, message passing, fault tolerance, etc. However, such frameworks – based on message passing between neighbours – have limitations: not all types of analytics can be expressed in such frameworks&nbsp;[<a href="#ref-XuHLJ19">Xu et al., 2019</a>].<sup class="fnmark" id="fnm18"><a href="#fn18">18</a></sup><span class="footnote" id="fn18"><sup><a href="#fnm18">note 18</a></sup> Formally, <a href="#ref-XuHLJ19">Xu et al. [2019]</a> have shown that such frameworks are as powerful as the (incomplete) Weisfeiler–Lehman (WL) graph isomorphism test for distinguishing graphs. This test involves nodes recursively hashing together hashes of local information received from neighbours, and passing these hashes to neighbours.</span> Hence frameworks may allow additional features, such as a <em>global step</em> that performs a global computation on all nodes, making the result available to each node&nbsp;[<a href="#ref-MalewiczABDHLC10">Malewicz et al., 2010</a>]; or a <em>mutation step</em> that allows for adding or removing nodes and edges during processing&nbsp;[<a href="#ref-MalewiczABDHLC10">Malewicz et al., 2010</a>].</p>

		<div class="formal">
			<p>Before defining a graph parallel framework, in the interest of generality, we first define a directed graph labelled with feature vectors, which captures the type of input that such a framework can accept, with vectors on both nodes and edges.</p>

			<dl class="definition" id="def-dvlg">
				<dt>Directed vector-labelled graph</dt>
				<dd>We define a <em>directed vector-labelled graph</em> \(G = (V,E,F,\lambda)\), where \(V\) is a set of nodes, \(E \subseteq V \times V\) is a set of edges, \(F\) is a set of feature vectors, and \(\lambda : V \cup E \rightarrow F\) labels each node and edge with a feature vector.</dd>
			</dl>

			<p>A directed edge-labelled graph or property graph may be encoded as a directed vector-labelled graph in a number of ways. The type of node and/or a selection of its attributes may be encoded in the node feature vectors, while the label of an edge and/or a selection of its attributes may be encoded in the edge feature vector (including, for example, weights applied to edges). Typically node feature vectors will all have the same dimensionality, as will edge feature vectors.</p>

			<div class="example" id="ex-dvlg">
				<p>We define a directed vector-labelled graph in preparation for later computing PageRank using a graph parallel framework. Let \(G = (V,E,L)\) denote a directed edge-labelled graph. We then initialise a directed vector-labelled graph \(G' = (V,E',F,\lambda)\) such that \(E' = \{ (x,z) \mid \exists y : (x,y,z)\in E \}\), and let \(|E'(u)|\) denote the outdegree of node \(u \in V\) with respect to \(E'\). For all \(u \in V\), we define \(\lambda(u) \coloneqq \begin{bmatrix} \frac{1}{|V|} \\ |E'(u)| \\ |V| \end{bmatrix}\), and for all \((u,v) \in E'\), we define \(\lambda(u,v) \coloneqq \begin{bmatrix} \, \end{bmatrix}\), with \(F \coloneqq \{ \lambda(u) \mid u \in V \} \cup \{\lambda(u,v) \mid (u,v) \in E' \}\), assigning each node a vector containing its initial PageRank score, the outdegree of the node, and the number of nodes in the graph. Edge vectors, conversely, are left empty as they are not used in this case.</p>
			</div>
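			<p>The construction in the example can be sketched in Python as follows. The node names, edge labels and the function name <code>to_vector_labelled</code> are hypothetical; vectors are represented as plain lists, and the edge set \(E'\) as a set of pairs.</p>

```python
def to_vector_labelled(nodes, labelled_edges):
    """Project labelled edges (x, y, z) to pairs (x, z) and attach node vectors."""
    edges = {(x, z) for (x, _label, z) in labelled_edges}  # E'
    out_degree = {u: 0 for u in nodes}
    for source, _target in edges:
        out_degree[source] += 1
    n = len(nodes)
    # λ(u): initial PageRank score, outdegree in E', number of nodes.
    node_vectors = {u: [1 / n, out_degree[u], n] for u in nodes}
    # λ(u, v): empty vectors, since edge features are not used here.
    edge_vectors = {e: [] for e in edges}
    return edges, node_vectors, edge_vectors


nodes = {"Arica", "Santiago", "Calama"}
labelled_edges = {
    ("Arica", "flight", "Santiago"),
    ("Arica", "bus", "Santiago"),
    ("Santiago", "flight", "Calama"),
}
edges, node_vectors, edge_vectors = to_vector_labelled(nodes, labelled_edges)
print(node_vectors["Arica"])
```

			<p>Since \(E'\) is a set of pairs, the two differently labelled edges between the same pair of nodes collapse into a single edge, and the outdegree is counted after this projection.</p>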

			<p>We now define a graph parallel framework, where we use \(\{\!\!\{ \cdot \}\!\!\}\) to denote a multiset, \(2^{S \rightarrow \mathbb{N}}\) to denote the set of all multisets containing (only) elements from the set \(S\), and \(\mathbb{R}^a\) to denote the set of all vectors of dimension \(a\) (i.e., the set of all vectors containing \(a\) real-valued elements).</p>

			<dl class="definition" id="def-gpf">
				<dt>Graph parallel framework</dt>
				<dd>A <em>graph parallel framework</em> (<em>GPF</em>) is a triple of functions \(\mathfrak{G} \coloneqq (\)<span class="sc">Msg</span>, <span class="sc">Agg</span>, <span class="sc">End</span>\()\) such that (with \(a, b, c \in \mathbb{N}\)):
					<ul>
						<li><span class="sc">Msg</span>\(: \mathbb{R}^a \times \mathbb{R}^b \rightarrow \mathbb{R}^c\)</li>
						<li><span class="sc">Agg</span>\(: \mathbb{R}^a \times 2^{\mathbb{R}^c \rightarrow \mathbb{N}} \rightarrow \mathbb{R}^a\)</li>
						<li><span class="sc">End</span>\(: 2^{\mathbb{R}^a \rightarrow \mathbb{N}} \rightarrow \{ \mathrm{true}, \mathrm{false} \}\)</li>
					</ul>
				</dd>
			</dl>

			<p>The function <span class="sc">Msg</span> defines what message (i.e., vector) must be passed from a node to a neighbouring node along a particular edge, given the current feature vectors of the node and the edge; the function <span class="sc">Agg</span> is used to compute a new feature vector for a node, given its previous feature vector and incoming messages; the function <span class="sc">End</span> defines a condition for termination of vector computation. The integers \(a\), \(b\) and \(c\) denote the dimensions of node feature vectors, edge feature vectors, and message vectors, respectively; we assume that \(a\) and \(b\) correspond with the dimensions of input feature vectors for nodes and edges. Given a GPF \(\mathfrak{G} = (\)<span class="sc">Msg</span>, <span class="sc">Agg</span>, <span class="sc">End</span>\()\), a directed vector-labelled graph \(G = (V, E, F, \lambda)\), and a node \(u \in V\), we define the output vector assigned to node \(u\) in \(G\) by \(\mathfrak{G}\) (written \(\mathfrak{G}(G, u)\)) as follows. First let \(\mathbf{n}_u^{(0)} \coloneqq \lambda(u)\). For all \(i\geq 1\), let:</p>
			<p>\begin{align*}
			 M_u^{(i)} & \coloneqq \left\{\!\!\!\left\{ {\rm\small M{\scriptsize SG}}\left(\mathbf{n}_v^{(i-1)},\lambda(v,u)\right) \bigl\lvert\, (v,u) \in E \right\}\!\!\!\right\} \\
			 \mathbf{n}_{u}^{(i)} & \coloneqq {\rm\small A{\scriptsize GG}}\left(\mathbf{n}_u^{(i-1)},M_u^{(i)}\right)
			\end{align*}</p>
			<p>where \(M_u^{(i)}\) is the multiset of messages received by node \(u\) during iteration \(i\), and \(\mathbf{n}_{u}^{(i)}\) is the state (vector) of node \(u\) at the end of iteration \(i\). If \(j\) is the smallest integer for which <span class="sc">End</span>\((\{\!\!\{ \mathbf{n}_u^{(j)} \mid u \in V \}\!\!\})\) is true, then \(\mathfrak{G}(G, u) \coloneqq \mathbf{n}_u^{(j)}\).</p>
			<p>This particular definition assumes that vectors are dynamically computed for nodes, and that messages are passed only to outgoing neighbours, but the definitions can be readily adapted to consider dynamic vectors for edges, or messages being passed to incoming neighbours, etc. We now provide an example instantiating a GPF to compute PageRank over a directed graph.</p>

			<div class="example">
				<p>We take as input the directed vector-labelled graph \(G' = (V,E',F,\lambda)\) from Example&nbsp;<a href="#ex-dvlg">5.1</a> for a PageRank GPF. First we define the messages passed from \(v\) to \(u\):</p>
				<p class="mathblock"><span class="sc">Msg</span>\(\left(\mathbf{n}_v,\lambda(v,u)\right) \coloneqq \begin{bmatrix}
				\frac{d(\mathbf{n}_{v})_1}{(\mathbf{n}_{v})_2}\\
				\end{bmatrix}\)</p>
				<p>where \(d\) denotes PageRank’s constant damping factor (typically \(d \coloneqq 0.85\)) and \((\mathbf{n}_{v})_k\) denotes the \(k\)<sup>th</sup> element of the \(\mathbf{n}_{v}\) vector. In other words, \(v\) will pass to \(u\) its PageRank score multiplied by the damping factor and divided by its out-degree (we do not require \(\lambda(v,u)\) in this particular example). Next we define the function for \(u\) to aggregate the messages it receives from other nodes:</p>
				<p class="mathblock"><span class="sc">Agg</span>\(\left(\mathbf{n}_u,M_u\right) \coloneqq \begin{bmatrix} \frac{1 - d}{(\mathbf{n}_{u})_3} + \sum_{\mathbf{m} \in M_u}(\mathbf{m})_1 \\ (\mathbf{n}_{u})_2 \\ (\mathbf{n}_{u})_3 \\
				\end{bmatrix}\)</p>
				<p>Here, we sum the scores received from other nodes along with its share of rank from the dampening factor, copying over the node’s degree and the total number of nodes for future use. Finally, there are a number of ways that we could define the termination condition; here we simply define:</p>
				<p class="mathblock"><span class="sc">End</span>\((\{\!\!\{ \mathbf{n}_u^{(i)} \mid u \in V \}\!\!\}) \coloneqq (i \geq \textsf{z}) \)</p>
				<p>where \(\textsf{z}\) is a fixed number of iterations, at which point the process stops.</p>
			</div>

			<p>We may note in this example that the total number of nodes is duplicated in the vector for each node of the graph. Part of the benefit of GPFs is that only local information in the neighbourhood of the node is required for each computation step. In practice, such frameworks may allow additional features, such as global computation steps whose results are made available to all nodes&nbsp;[<a href="#ref-MalewiczABDHLC10">Malewicz et al., 2010</a>], operations that dynamically modify the graph&nbsp;[<a href="#ref-MalewiczABDHLC10">Malewicz et al., 2010</a>], etc.</p>
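			<p>The PageRank GPF of the example can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the graph <code>EDGES</code> is hypothetical (it is not the graph of Example&nbsp;5.1), the initial score of \(\frac{1}{|V|}\) is an assumption, and the names <code>msg</code>, <code>agg</code> and <code>pagerank_gpf</code> are illustrative only.</p>

```python
D = 0.85  # dampening factor d, as in the example


def msg(n_v):
    # Message sent by v along each outgoing edge: d * score / out-degree.
    return D * n_v[0] / n_v[1]


def agg(n_u, messages):
    # New state of u: its share of (1 - d) plus the sum of the received messages;
    # the out-degree and the node count are copied over unchanged.
    return [(1 - D) / n_u[2] + sum(messages), n_u[1], n_u[2]]


def pagerank_gpf(edges, z):
    nodes = sorted({x for edge in edges for x in edge})
    out_degree = {v: 0 for v in nodes}
    for v, _ in edges:
        out_degree[v] += 1
    # Assumed initial state: [1/|V|, out-degree, |V|] for every node.
    state = {u: [1 / len(nodes), out_degree[u], len(nodes)] for u in nodes}
    for _ in range(z):  # End: stop after a fixed number z of iterations
        inbox = {u: [] for u in nodes}
        for v, u in edges:  # v passes a message to its outgoing neighbour u
            inbox[u].append(msg(state[v]))
        state = {u: agg(state[u], inbox[u]) for u in nodes}
    return {u: state[u][0] for u in nodes}


# Hypothetical directed graph in which every node has an outgoing edge.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
SCORES = pagerank_gpf(EDGES, 20)
```

			<p>Each step reads only the state of a node and the messages arriving from its neighbours, which is the locality property discussed above. Since every node of this hypothetical graph has an outgoing edge, the scores continue to sum to one after each iteration.</p>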
		</div>

		<h4 id="sssec-query-languages" class="subsection">Analytics on data graphs</h4>
		<p>As previously mentioned, most analytics presented thus far are, in their “native” form, applicable for undirected or directed graphs without the <em>edge metadata</em> – i.e., edge labels or property–value pairs – typical of graph data models.<sup class="fnmark" id="fnm19"><a href="#fn19">19</a></sup><span class="footnote" id="fn19"><sup><a href="#fnm19">note 19</a></sup> We remark that in the case of property graphs, property–value pairs on nodes can be converted by mapping values to nodes and properties to edges with the corresponding label.</span> A number of strategies can be applied to make data graphs subject to analytics of this form:</p>
		<ul>
			<li><em>Projection</em> involves simply “projecting” an undirected or directed graph by optionally selecting a sub-graph from the data graph from which all edge meta-data are dropped; for example, the graph of Figure&nbsp;<a href="#fig-pagerank">5.3</a> may be the result of extracting the sub-graph induced by the edge labels <span class="gelab">bus</span> and <span class="gelab">flight</span> from a larger data graph, where the labels are then dropped to create a directed graph.</li>
			<li><em>Weighting</em> involves converting edge meta-data into numerical values according to some function. Many of the aforementioned techniques are easily adapted to the case of weighted (directed) graphs; for example, we could consider weights on the graph of Figure&nbsp;<a href="#fig-pagerank">5.3</a> denoting trip duration (or price, traffic, etc.), and then compute the shortest (weighted) paths considering time by adding the duration of each leg of the respective journey.<sup class="fnmark" id="fnm20"><a href="#fn20">20</a></sup><span class="footnote" id="fn20"><sup><a href="#fnm20">note 20</a></sup> Other forms of analytics are possible if we assume the graph is weighted; for example, if we annotated the graph of Figure&nbsp;<a href="#fig-pagerank">5.3</a> with probabilities of tourists moving from one place to the next, we could leverage <em>Markov processes</em> to understand features such as reducibility, periodicity, transience, recurrence, ergodicity, steady states, etc., of the routes&nbsp;[<a href="#ref-markov">Dynkin, 1965</a>].</span> In the absence of external weights, we may rather map edge labels to weights, assigning the same weight to all <span class="gelab">flight</span> edges, to all <span class="gelab">bus</span> edges, etc., based on some criteria.</li>
			<li><em>Transformation</em> involves transforming the graph to a lower arity model. A transformation may be <em>lossy</em>, meaning that the original graph cannot be recovered; or <em>lossless</em>, meaning that the original graph can be recovered. Figure&nbsp;<a href="#fig-transform">5.4</a> provides an example of a lossy and lossless transformation from a directed edge-labelled graph to directed graphs. In the lossy transformation, we cannot tell, for example, if the original graph contained the edge <span class="gnode">Iquique</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santiago</span>, or rather the edge <span class="gnode">Iquique</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Arica</span>, or both. The lossless transformation must introduce new nodes (similar to reification) to maintain information about directed labelled edges. Both transformed graphs further attempt to preserve the directionality of the original graph.</li>
			<li><em>Customisation</em> involves changing the analytical procedure to incorporate edge meta-data, such as was the case for path finding based on path expressions. Other examples might include structural measures for node similarity that not only consider common neighbours, but also common neighbours connected by edges with the same label, or aggregate centrality measures that capture the importance of edges grouped by label, etc.</li>
		</ul>

		<figure id="fig-transform">
			<figure id="fig-transform1" style="display:inline-block;margin-right:2.5em;margin-left:0;">
				<img src="images/fig-transform1.svg" alt="Original graph"/>
				<figcaption>Original graph</figcaption>
			</figure>
			<figure id="fig-transform2" style="display:inline-block;">
				<img src="images/fig-transform2.svg" alt="Lossy transformation"/>
				<figcaption>Lossy transformation</figcaption>
			</figure>
			<figure id="fig-transform3" style="display:inline-block;margin-right:0;margin-left:2em;">
				<img src="images/fig-transform3.svg" alt="Lossless transformation"/>
				<figcaption>Lossless transformation</figcaption>
			</figure>
			<figcaption>Transformations from a directed edge-labelled graph to a directed graph</figcaption>
		</figure>

		<p>The results of an analytical process may change drastically depending on which of the previous strategies are chosen to prepare the graph for analysis. The choice of strategy may be a non-trivial one to make <em>a priori</em> and may require empirical validation. More study is required to more generally understand the effects of such strategies on the results of different analytical techniques over different graph models.</p>
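		<p>The difference between lossy and lossless transformations can be sketched as follows. The two encodings below are assumptions chosen for illustration and need not coincide with those of Figure&nbsp;5.4: the lossy variant turns each edge label into a single shared node, while the lossless variant introduces a fresh node per edge (similar to reification) and relies on tagged identifiers to tell edge nodes and label nodes apart. The graphs <code>G1</code> and <code>G2</code> are hypothetical.</p>

```python
def lossy(triples):
    # Each label becomes one shared node: s -> p and p -> o.
    edges = set()
    for s, p, o in triples:
        edges.add((s, p))
        edges.add((p, o))
    return edges


def lossless(triples):
    # Each edge gets its own fresh node e: s -> e, e -> o and e -> label node.
    edges = set()
    for i, (s, p, o) in enumerate(sorted(triples)):
        e = ("edge", i)
        edges.update({(s, e), (e, o), (e, ("label", p))})
    return edges


def recover(edges):
    # Invert lossless(), relying on the tagged identifiers introduced there.
    source, target, label = {}, {}, {}
    for x, y in edges:
        if isinstance(y, tuple) and y[0] == "edge":
            source[y] = x
        elif isinstance(y, tuple) and y[0] == "label":
            label[x] = y[1]
        else:
            target[x] = y
    return {(source[e], label[e], target[e]) for e in source}


# Two hypothetical directed edge-labelled graphs that differ in one edge.
G1 = {("Iquique", "flight", "Santiago"), ("Santiago", "flight", "Arica")}
G2 = G1 | {("Iquique", "flight", "Arica")}

SAME_AFTER_LOSSY = lossy(G1) == lossy(G2)
ROUND_TRIP_OK = recover(lossless(G1)) == G1 and recover(lossless(G2)) == G2
```

		<p>Under these assumptions, <code>G1</code> and <code>G2</code> collapse to the same directed graph in the lossy case, so the original cannot be determined from the result, whereas the lossless encoding allows each original graph to be recovered.</p>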

		<h4 id="sssec-analyticsQ" class="subsection">Analytics with queries</h4>
		<p>As discussed in Section&nbsp;<a href="#ssec-querying">2.2</a>, various languages for querying graphs have been proposed down through the years&nbsp;[<a href="#ref-AnglesABHRV17">Angles et al., 2017</a>]. One may consider a variety of ways in which query languages and analytics can complement each other. First, we may consider using query languages to project or transform a graph suitable for a particular analytical task, such as to extract the graph of Figure&nbsp;<a href="#fig-chileTransport">5.2</a> from a larger data graph. Query languages such as SPARQL&nbsp;[<a href="#ref-sparql11">Harris et al., 2013</a>], Cypher&nbsp;[<a href="#ref-FrancisGGLLMPRS18">Francis et al., 2018</a>], and G-CORE&nbsp;[<a href="#ref-AnglesABBFGLPPS18">Angles et al., 2018</a>] allow for outputting graphs, where such queries can be used to select sub-graphs for analysis. These languages can also express some limited (non-recursive) analytics, where aggregations can be used to compute degree centrality, for example; they may also have some built-in analytical support, where, for example, Cypher&nbsp;[<a href="#ref-FrancisGGLLMPRS18">Francis et al., 2018</a>] allows for finding shortest paths. In the other direction, analytics can contribute to the querying process in terms of <em>optimisations</em>, where, for example, analysis of connectivity may suggest how to better distribute a large data graph over multiple machines for querying using, e.g., <em>minimum cuts</em>&nbsp;[<a href="#ref-AkhterNS18">Akhter et al., 2018</a>, <a href="#ref-JankeST18">Janke et al., 2018</a>]. Analytics have also been used to <em>rank</em> query results over large graphs&nbsp;[<a href="#ref-WagnerTLHS12">Wagner et al., 2012</a>, <a href="#ref-FanWW13">Fan et al., 2013</a>], selecting the most important results for presentation to the user.</p>
		<p>In some use-cases we may further wish to interleave querying and analytical processes. For example, from the full data graph collected by the tourist board, consider an upcoming airline strike where the board wishes to find <em>the events during the strike with venues in cities unreachable from Santiago by public transport due to the strike</em>. Hypothetically, we could use a query to extract the transport network excluding the airline’s routes (assuming, per Figure&nbsp;<a href="#fig-fsa">2.3a</a> that the airline information is available), use analytics to extract the strongly connected component containing Santiago, and finally use a query to find events in cities not in the Santiago component on the given dates.<sup class="fnmark" id="fnm21"><a href="#fn21">21</a></sup><span class="footnote" id="fn21"><sup><a href="#fnm21">note 21</a></sup> Such a task could not be solved in a single query using regular path queries as such expressions would not be capable of filtering edges representing flights of a particular airline.</span> While one could solve this task using an imperative language such as Gremlin&nbsp;[<a href="#ref-Rodriguez15">Rodriguez, 2015</a>], GraphX&nbsp;[<a href="#ref-XinGFS13">Xin et al., 2013a</a>], or R&nbsp;[<a href="#ref-R">Foundation, 1992</a>], more declarative languages are also being explored to express such tasks, with proposals including the extension of graph query languages with recursive capabilities&nbsp;[<a href="#ref-BishofDKLP12">Bischof et al., 2012</a>, <a href="#ref-ReutterSV15">Reutter et al., 2015</a>, <a href="#ref-HoganRS20">Hogan et al., 2020</a>],<sup class="fnmark" id="fnm22"><a href="#fn22">22</a></sup><span class="footnote" id="fn22"><sup><a href="#fnm22">note 22</a></sup> Recursive query languages become Turing complete assuming one can also express operations on binary arrays.</span> combining linear algebra with relational (query) algebra&nbsp;[<a href="#ref-HutchisonHS17">Hutchison et al., 2017</a>], and so 
forth.</p>

		<h4 id="sssec-analyticsE" class="subsection">Analytics with entailment</h4>
		<p>Knowledge graphs are often associated with a semantic schema or ontology that defines the semantics of domain terms, giving rise to entailments (per Chapter&nbsp;<a href="#chap-deductive">4</a>). Applying analytics with or without such entailments – e.g., before or after materialisation – may yield radically different results. For example, observe that an edge <span class="gnode">Santa&nbsp;Lucía</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">hosts</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">EID15</span> is semantically equivalent to an edge <span class="gnode">EID15</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">venue</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santa&nbsp;Lucía</span> once the inverse axiom <span class="gnode">hosts</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">inv.&nbsp;of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">venue</span> is invoked; however, these edges are far from equivalent from the perspective of analytical techniques that consider edge direction, for which including one type of edge, or the other, or both, may have a major bearing on the final results. To the best of our knowledge, the combination of analytics and entailment has not been well-explored, leaving open interesting research questions. 
Along these lines, it may be of interest to explore <em>semantically-invariant analytics</em> that yield the same results over semantically-equivalent graphs (i.e., graphs that entail one another), thus analysing the semantic content of the knowledge graph rather than simply the topological features of the data graph; for example, semantically-invariant analytics would yield the same results over a graph containing the inverse axiom <span class="gnode">hosts</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">inv.&nbsp;of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">venue</span> and a number of <span class="gelab">hosts</span> edges, the same graph but where every <span class="gelab">hosts</span> edge is replaced by an inverse <span class="gelab">venue</span> edge, and the union of both graphs.</p>
		</section>

		<section id="ssec-embeddings" class="section">
		<h3>Knowledge Graph Embeddings</h3>
		<p>Methods for machine learning have gained significant attention in recent years. In the context of knowledge graphs, machine learning can either be used for directly <em>refining</em> a knowledge graph&nbsp;[<a href="#ref-Paulheim17">Paulheim, 2017</a>] (discussed further in Chapter&nbsp;<a href="#chap-refine">8</a>); or for <em>downstream tasks</em> using the knowledge graph, such as recommendation&nbsp;[<a href="#ref-zhang2016collaborative">Zhang et al., 2016</a>], information extraction&nbsp;[<a href="#ref-VashishthJT18">Vashishth et al., 2018</a>], question answering&nbsp;[<a href="#ref-HuangZLL19">Huang et al., 2019</a>], query relaxation&nbsp;[<a href="#ref-WangWLCZQ18">Wang et al., 2018</a>], query approximation&nbsp;[<a href="#ref-HamiltonBZJL18">Hamilton et al., 2018</a>], etc. (discussed further in Chapter&nbsp;<a href="#chap-kgs">10</a>). However, many traditional machine learning techniques assume dense numeric input representations in the form of vectors, which is quite distinct from how graphs are usually expressed. So how can graphs – or nodes, edges, etc., thereof – be encoded as numeric vectors?</p>
		<p>A first attempt to represent a graph using vectors would be to use a <em>one-hot encoding</em>, generating a vector for each node of length \(|L| \cdot |V|\) – with \(|V|\) the number of nodes in the input graph and \(|L|\) the number of edge labels – placing a one at the corresponding index to indicate the existence of the respective edge in the graph, or zero otherwise. Such a representation will, however, typically result in large and sparse vectors, which will be detrimental for most machine learning models.</p>
		<p>The main goal of knowledge graph embedding techniques is to create a dense representation of the graph (i.e., <em>embed</em> the graph) in a continuous, low-dimensional vector space that can then be used for machine learning tasks. The dimensionality \(d\) of the embedding is fixed and usually low (often, e.g., \(50 \leq d \leq 1000\)). Typically the graph embedding is composed of an <em>entity embedding</em> for each node: a vector with \(d\) dimensions that we denote by \(\mathbf{e}\); and a <em>relation embedding</em> for each edge label: (typically) a vector with \(d\) dimensions that we denote by \(\mathbf{r}\). The overall goal of these vectors is to abstract and preserve latent structures in the graph. There are many ways in which this notion of an embedding can be instantiated. Most commonly, given an edge <span class="gnode">s</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">p</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">o</span>, a specific embedding approach defines a <em>scoring function</em> that accepts \(\mathbf{e}\)<sub><code>s</code></sub> (the entity embedding of node <span class="gnode">s</span>), \(\mathbf{r}\)<sub><span class="gelab">p</span></sub> (the relation embedding of edge label <span class="gelab">p</span>) and \(\mathbf{e}\)<sub><code>o</code></sub> (the entity embedding of node <span class="gnode">o</span>) and computes the <em>plausibility</em> of the edge, which estimates how likely it is to be true. Given a data graph, the goal is then to compute the embeddings of dimension \(d\) that maximise the plausibility of positive edges (typically edges in the graph) and minimise the plausibility of negative examples (typically edges in the graph with a node or edge label changed such that they are no longer in the graph) according to the given scoring function. 
The resulting embeddings can then be seen as models learnt through self-supervision that encode (latent) features of the graph, mapping input edges to output plausibility scores.</p>
		<p>Embeddings can then be used for a number of low-level tasks involving the nodes and edge-labels of the graph from which they were computed. First, we can use the plausibility scoring function to assign a confidence to edges that may, for example, have been extracted from an external source (discussed later in Chapter&nbsp;<a href="#chap-create">6</a>). Second, the plausibility scoring function can be used to complete edges with missing nodes/edge labels for the purposes of link prediction (discussed later in Chapter&nbsp;<a href="#chap-refine">8</a>); for example, in Figure&nbsp;<a href="#fig-chileTransport">5.2</a>, we might ask which nodes in the graph are likely to complete the edge <span class="gnode">Grey&nbsp;Glacier</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">bus</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">?</span>, where – aside from <span class="gnode">Punta Arenas</span>, which is already given – we might intuitively expect <span class="gnode">Torres del Paine</span> to be a plausible candidate. Third, embedding models will typically assign similar vectors to similar nodes and similar edge-labels, and thus they can be used as the basis of similarity measures, which may be useful for finding duplicate nodes that refer to the same entity, or for the purposes of providing recommendations (discussed later in Chapter&nbsp;<a href="#chap-kgs">10</a>).</p>
		<p>A wide range of knowledge graph embedding techniques have been proposed&nbsp;[<a href="#ref-Wang2017KGEmbedding">Wang et al., 2017</a>]. Our goal here is to provide a high-level introduction to some of the most popular techniques proposed thus far. We first discuss <em>tensor-based approaches</em> that include three different sub-approaches using linear/tensor algebra to compute embeddings. We then discuss <em>language models</em> that leverage existing word embedding techniques, proposing ways of generating graph-like analogues for their expected (textual) inputs. Finally we discuss <em>entailment-aware models</em> that can take into account the semantics of the graph, when available.</p>

		<h4 id="ssec-tensor-based-models" class="subsection">Tensor-based models</h4>
		<p>We first discuss tensor-based models, which we sub-divide into three categories: <em>translational models</em> that adopt a geometric perspective whereby relation embeddings translate subject entities to object entities, <em>tensor decomposition models</em> that extract latent factors approximating the graph’s structure, and <em>neural models</em> that use neural networks to train embeddings that provide accurate plausibility scores.</p>

		<h5 id="sssec-translational-models" class="subsubsection">Translational models</h5>
		<p><em>Translational models</em> interpret edge labels as transformations from subject nodes (aka the <em>source</em> or <em>head</em>) to object nodes (aka the <em>target</em> or <em>tail</em>); for example, in the edge <span class="gnode">San&nbsp;Pedro</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">bus</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Moon&nbsp;Valley</span>, the edge label <span class="gelab">bus</span> is seen as transforming <span class="gnode">San Pedro</span> to <span class="gnode">Moon Valley</span>, and likewise for other <span class="gelab">bus</span> edges. The most elementary approach in this family is TransE&nbsp;[<a href="#ref-bordes2013translating">Bordes et al., 2013</a>]. Over all positive edges <span class="gnode">s</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">p</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">o</span>, TransE learns vectors \(\mathbf{e}\)<sub><code>s</code></sub>, \(\mathbf{r}\)<sub><span class="gelab">p</span></sub>, and \(\mathbf{e}\)<sub><code>o</code></sub> aiming to make \(\mathbf{e}\)<sub><code>s</code></sub>&nbsp;+&nbsp;\(\mathbf{r}\)<sub><span class="gelab">p</span></sub> as close as possible to \(\mathbf{e}\)<sub><code>o</code></sub>. Conversely, if the edge is a negative example, TransE attempts to learn a representation that keeps \(\mathbf{e}\)<sub><code>s</code></sub>&nbsp;+&nbsp;\(\mathbf{r}\)<sub><span class="gelab">p</span></sub> away from \(\mathbf{e}\)<sub><code>o</code></sub>. To illustrate, Figure&nbsp;<a href="#fig-TransE">5.5</a> provides a toy example of two-dimensional (\(d = 2\)) entity and relation embeddings computed by TransE. We keep the orientation of the vectors similar to the original graph for clarity. 
For any edge <span class="gnode">s</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">p</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">o</span> in the original graph, adding the vectors \(\mathbf{e}\)<sub><code>s</code></sub>&nbsp;+&nbsp;\(\mathbf{r}\)<sub><span class="gelab">p</span></sub> should approximate \(\mathbf{e}\)<sub><code>o</code></sub>. In this toy example, the vectors correspond precisely; for instance, adding the vectors for <span class="gnode">Licantén</span> (\(\mathbf{e}\)<sub><code>L.</code></sub>) and <span class="gelab">west of</span> (\(\mathbf{r}\)<sub><span class="gelab">wo.</span></sub>) gives a vector corresponding to <span class="gnode">Curico</span> (\(\mathbf{e}\)<sub><code>C.</code></sub>). We can use these embeddings to predict edges (amongst other tasks); for example, in order to predict which node in the graph <span class="gnode">Antofagasta</span> (<code>A.</code>) is most likely to be <span class="gelab">west of</span>, we compute \(\mathbf{e}\)<sub><code>A.</code></sub>&nbsp;+&nbsp;\(\mathbf{r}\)<sub><span class="gelab">wo.</span></sub> and find that the resulting vector (dotted in Figure&nbsp;<a href="#fig-transEE">5.5c</a>) is closest to \(\mathbf{e}\)<sub><code>T.</code></sub>, thus predicting <span class="gnode">Toconao</span> (<code>T.</code>) to be the most <em>plausible</em> such node.</p>

		<figure id="fig-TransE">
			<figure id="fig-distEg" style="display:inline-block;margin-right:2.5em;margin-left:0;">
				<img src="images/fig-distEg.svg" alt="Original graph"/>
				<figcaption>Original graph</figcaption>
			</figure>
			<figure id="fig-transER" style="display:inline-block;">
				<img src="images/fig-transER.svg" alt="Relation embeddings"/>
				<figcaption>Relation embeddings</figcaption>
			</figure>
			<figure id="fig-transEE" style="display:inline-block;margin-right:0;margin-left:2em;">
				<img src="images/fig-transEE.svg" alt="Entity embeddings"/>
				<figcaption>Entity embeddings</figcaption>
			</figure>
			<figcaption>Toy example of two-dimensional relation and entity embeddings learnt by TransE; the entity embeddings use abbreviations and include an example of vector addition to predict what Antofagasta is west of</figcaption>
		</figure>
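		<p>The vector addition used for prediction in TransE can be sketched as follows. The coordinates below are made up for illustration and are not those of Figure&nbsp;5.5; the function names <code>distance</code> and <code>predict_object</code> are likewise hypothetical, and the sketch covers only scoring with fixed embeddings, not the training that learns them.</p>

```python
import math

# Made-up two-dimensional embeddings (d = 2); NOT the coordinates of Figure 5.5.
ENTITY = {
    "Antofagasta": (0.0, 2.0),
    "Toconao": (2.0, 2.0),
    "Licantén": (0.0, 0.0),
    "Curico": (2.0, 0.0),
}
RELATION = {"west of": (2.0, 0.0)}


def distance(s, p, o):
    # Euclidean distance between e_s + r_p and e_o; lower means more plausible.
    translated = [a + b for a, b in zip(ENTITY[s], RELATION[p])]
    return math.dist(translated, ENTITY[o])


def predict_object(s, p):
    # Pick the entity (other than s) whose embedding is closest to e_s + r_p.
    candidates = [o for o in ENTITY if o != s]
    return min(candidates, key=lambda o: distance(s, p, o))


PREDICTION = predict_object("Antofagasta", "west of")
```

		<p>With these assumed coordinates, \(\mathbf{e}\)<sub><code>A.</code></sub>&nbsp;+&nbsp;\(\mathbf{r}\)<sub><span class="gelab">wo.</span></sub> coincides with the vector for <span class="gnode">Toconao</span>, which is therefore returned as the most plausible object.</p>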

		<p>Aside from this toy example, TransE can be too simplistic; for example, in Figure&nbsp;<a href="#fig-chileTransport">5.2</a>, <span class="gelab">bus</span> not only transforms <span class="gnode">San Pedro</span> to <span class="gnode">Moon Valley</span>, but also to <span class="gnode">Arica</span>, <span class="gnode">Calama</span>, and so forth. TransE will, in this case, aim to give similar vectors to all such target locations, which may not be feasible given other edges. TransE will also tend to assign cyclical relations a zero vector, as the directional components will tend to cancel each other out. To resolve such issues, many variants of TransE have been investigated. Amongst these, for example, TransH&nbsp;[<a href="#ref-wang2014knowledge">Wang et al., 2014</a>] represents different relations using distinct hyperplanes, where for the edge <span class="gnode">s</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">p</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">o</span>, <span class="gnode">s</span> is first projected onto the hyperplane of <span class="gelab">p</span> before the translation to <span class="gnode">o</span> is learnt (uninfluenced by edges with other labels for <span class="gnode">s</span> and for <span class="gnode">o</span>). TransR&nbsp;[<a href="#ref-lin2015learning">Lin et al., 2015</a>] generalises this approach by projecting <span class="gnode">s</span> and <span class="gnode">o</span> into a vector space specific to <span class="gelab">p</span>, which involves multiplying the entity embeddings for <span class="gnode">s</span> and <span class="gnode">o</span> by a projection matrix specific to <span class="gelab">p</span>. 
TransD&nbsp;[<a href="#ref-TransD">Ji et al., 2015</a>] simplifies TransR by associating entities and relations with a second vector, where these secondary vectors are used to project the entity into a relation-specific vector space. Recently, RotatE&nbsp;[<a href="#ref-SunDNT19">Sun et al., 2019</a>] proposes translational embeddings in complex space, which allows for capturing more characteristics of relations, such as direction, symmetry, inversion, antisymmetry, and composition. Embeddings have also been proposed in non-Euclidean space; for example, MuRP&nbsp;[<a href="#ref-BalazevicAH19">Balazevic et al., 2019a</a>] uses relation embeddings that transform entity embeddings in the hyperbolic space of the Poincaré ball model, whose curvature provides more “space” to separate entities with respect to the dimensionality. For discussion of other translational models, we refer to surveys by <a href="#ref-CaiZC18">Cai et al. [2018]</a> and <a href="#ref-Wang2017KGEmbedding">Wang et al. [2017]</a>.</p>

		<h5 id="sssec-tensor-decomposition-models" class="subsubsection">Tensor decomposition models</h5>
		<p>A second approach to derive graph embeddings is to apply methods based on <em>tensor decomposition</em>. A <em>tensor</em> is a multidimensional numeric field that generalises scalars (\(0\)-order tensors), vectors (\(1\)-order tensors) and matrices (\(2\)-order tensors) towards arbitrary dimension/order. Tensors have become a widely used abstraction for machine learning&nbsp;[<a href="#ref-RabanserSG17">Rabanser et al., 2017</a>]. Tensor decomposition involves decomposing a tensor into more “elemental” tensors (e.g., of lower order) from which the original tensor can be recomposed (or approximated) by a fixed sequence of basic operations over the output tensors. These elemental tensors can be viewed as capturing <em>latent factors</em> underlying the information contained in the original tensor. There are many approaches to tensor decomposition, where we will now briefly introduce the main ideas behind <em>rank decompositions</em>&nbsp;[<a href="#ref-RabanserSG17">Rabanser et al., 2017</a>].</p>
		<p>Leaving aside graphs momentarily, consider an \((a,b)\)-matrix (i.e., a \(2\)-order tensor) \(\mathbf{C}\), where \(a\) is the number of cities in Chile, \(b\) is the number of months in a year, and each element \((\mathbf{C})_{ij}\) denotes the average temperature of the \(i\)<sup>th</sup> city in the \(j\)<sup>th</sup> month. Noting that Chile is a long, thin country – ranging from subpolar climates in the south, to a desert climate in the north – we may find a decomposition of \(\mathbf{C}\) into two vectors representing latent factors – specifically \(\mathbf{x}\) (with \(a\) elements) giving lower values for cities with lower latitude, and \(\mathbf{y}\) (with \(b\) elements), giving lower values for months with lower temperatures – such that computing the outer product<sup class="fnmark" id="fnm23"><a href="#fn23">23</a></sup><span class="footnote" id="fn23"><sup><a href="#fnm23">note 23</a></sup> The outer product of two (column) vectors \(\mathbf{x}\) of length \(a\) and \(\mathbf{y}\) of length \(b\), denoted \(\mathbf{x} \otimes \mathbf{y}\), is defined as \(\mathbf{x}\mathbf{y}^{\mathrm{T}}\), yielding an \((a,b)\)-matrix \(\mathbf{M}\) such that \((\mathbf{M})_{ij} = (\mathbf{x})_i \cdot (\mathbf{y})_j\). Analogously, the outer product of \(k\) vectors is a \(k\)-order tensor.</span> of the two vectors approximates \(\mathbf{C}\) reasonably well: \(\mathbf{x} \otimes \mathbf{y} \approx \mathbf{C}\). In the (unlikely) case that there exist vectors \(\mathbf{x}\) and \(\mathbf{y}\) such that \(\mathbf{C}\) is precisely the outer product of two vectors (\(\mathbf{x} \otimes \mathbf{y} = \mathbf{C}\)) we call \(\mathbf{C}\) a rank-\(1\) matrix; we can then precisely encode \(\mathbf{C}\) using \(a + b\) values rather than \(a \times b\) values. 
Most times, however, to get precisely \(\mathbf{C}\), we need to sum multiple rank-\(1\) matrices, where the rank \(r\) of \(\mathbf{C}\) is the minimum number of rank-\(1\) matrices that need to be summed to derive precisely \(\mathbf{C}\), such that \(\mathbf{x}_1 \otimes \mathbf{y}_1 + \ldots + \mathbf{x}_r \otimes \mathbf{y}_r = \mathbf{C}\). In the temperature example, \(\mathbf{x}_2 \otimes \mathbf{y}_2\) might correspond to a correction for altitude, \(\mathbf{x}_3 \otimes \mathbf{y}_3\) for higher temperature variance further south, etc. A (low) rank decomposition of a matrix then sets a limit \(d\) on the rank and computes the vectors \((\mathbf{x}_1,\mathbf{y}_1,\ldots,\mathbf{x}_{d},\mathbf{y}_{d})\) such that \(\mathbf{x}_1 \otimes \mathbf{y}_1 + \ldots + \mathbf{x}_{d} \otimes \mathbf{y}_{d}\) gives the best \(d\)-rank approximation of \(\mathbf{C}\). Noting that to generate \(n\)-order tensors we need to compute the outer product of \(n\) vectors, we can generalise this idea towards low-rank decomposition of tensors; this method is called Canonical Polyadic (CP) decomposition&nbsp;[<a href="#ref-Hitchcock27">Hitchcock, 1927</a>]. For example, a \(3\)-order tensor \(\mathcal{C}\) containing monthly temperatures for Chilean cities <em>at four different times of day</em> could be approximated with \(\mathbf{x}_1 \otimes \mathbf{y}_1 \otimes \mathbf{z}_1 + \ldots + \mathbf{x}_{d} \otimes \mathbf{y}_{d} \otimes \mathbf{z}_{d}\) (e.g., \(\mathbf{x}_1\) might be a latitude factor, \(\mathbf{y}_1\) a monthly variation factor, and \(\mathbf{z}_1\) a daily variation factor, and so on). Various algorithms exist to compute (approximate) CP decompositions, including Alternating Least Squares, Jennrich’s Algorithm, and the Tensor Power method&nbsp;[<a href="#ref-RabanserSG17">Rabanser et al., 2017</a>].</p>
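		<p>The recomposition of a matrix from rank-\(1\) terms can be illustrated with a small sketch. The factor values below are invented for illustration and do not represent real temperatures; the helper names <code>outer</code>, <code>add</code> and <code>recompose</code> are hypothetical. The sketch only recomposes a matrix from given factors; it does not compute a decomposition.</p>

```python
def outer(x, y):
    # Outer product of x (length a) and y (length b):
    # an (a, b)-matrix M with M[i][j] = x[i] * y[j].
    return [[xi * yj for yj in y] for xi in x]


def add(m1, m2):
    # Element-wise sum of two matrices of the same shape.
    return [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def recompose(factors):
    # Sum of rank-1 matrices x_1 (x) y_1 + ... + x_d (x) y_d.
    total = outer(*factors[0])
    for x, y in factors[1:]:
        total = add(total, outer(x, y))
    return total


# Invented factors for 3 cities and 4 months (illustrative values only).
X1, Y1 = [1, 2, 3], [4, 6, 8, 6]      # e.g. a latitude and a seasonal factor
X2, Y2 = [0, 1, 0], [-1, -1, -1, -1]  # e.g. a correction for one city
C = recompose([(X1, Y1), (X2, Y2)])
```

		<p>Here <code>outer(X1, Y1)</code> is a rank-\(1\) matrix encoded by \(3 + 4\) values rather than \(3 \times 4\), and adding the second term yields a matrix <code>C</code> that a single outer product can no longer express exactly.</p>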
		<p>Returning to graphs, similar principles can be used to decompose a graph into vectors, thus yielding embeddings. In particular, a graph can be encoded as a one-hot \(3\)-order tensor \(\mathcal{G}\) with \(|V| \times |L| \times |V|\) elements, where the element \((\mathcal{G})_{ijk}\) is set to one if the \(i\)<sup>th</sup> node links to the \(k\)<sup>th</sup> node with an edge having the \(j\)<sup>th</sup> label, or zero otherwise. As previously mentioned, such a tensor will typically be very large and sparse, where rank decompositions are thus applicable. A CP decomposition&nbsp;[<a href="#ref-Hitchcock27">Hitchcock, 1927</a>] would compute a sequence of vectors \((\mathbf{x}_1,\mathbf{y}_1,\mathbf{z}_1,\ldots,\mathbf{x}_d,\mathbf{y}_d,\mathbf{z}_d)\) such that \(\mathbf{x}_1 \otimes \mathbf{y}_1 \otimes \mathbf{z}_1 + \ldots + \mathbf{x}_d \otimes \mathbf{y}_d \otimes \mathbf{z}_d \approx \mathcal{G}\). We illustrate this scheme in Figure&nbsp;<a href="#fig-cpRank">5.6</a>. Letting \(\mathbf{X}, \mathbf{Y}, \mathbf{Z}\) denote the matrices formed by \(\begin{bmatrix} \mathbf{x}_1\,\cdots\,\mathbf{x}_d \end{bmatrix}\), \(\begin{bmatrix} \mathbf{y}_1\,\cdots\,\mathbf{y}_d \end{bmatrix}\), \(\begin{bmatrix} \mathbf{z}_1\,\cdots\,\mathbf{z}_d \end{bmatrix}\), respectively, with each vector forming a column of the corresponding matrix, we could then extract the \(i\)<sup>th</sup> row of \(\mathbf{Y}\) as an embedding for the \(i\)<sup>th</sup> relation, and the \(j\)<sup>th</sup> rows of \(\mathbf{X}\) and \(\mathbf{Z}\) as <em>two</em> embeddings for the \(j\)<sup>th</sup> entity. However, knowledge graph embeddings typically aim to assign <em>one</em> vector to each entity.</p>
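		<p>As a rough illustration of the one-hot encoding described above, the following Python sketch builds the tensor \(\mathcal{G}\) as nested lists. The node names, edge labels and the function name <code>one_hot_tensor</code> are hypothetical, and indices are \(0\)-based in the code whereas the text counts from \(1\).</p>

```python
def one_hot_tensor(nodes, labels, edges):
    """Encode a directed edge-labelled graph as a |V| x |L| x |V| tensor of 0s and 1s."""
    node_index = {n: i for i, n in enumerate(nodes)}
    label_index = {p: j for j, p in enumerate(labels)}
    g = [[[0 for _ in nodes] for _ in labels] for _ in nodes]
    for s, p, o in edges:
        # Element (i, j, k) is one if node i links to node k with label j.
        g[node_index[s]][label_index[p]][node_index[o]] = 1
    return g


# Hypothetical toy graph (not the graph of the figures).
nodes = ["Arica", "Santiago", "Iquique"]
labels = ["bus", "flight"]
edges = [("Arica", "flight", "Santiago"), ("Santiago", "bus", "Iquique")]

G = one_hot_tensor(nodes, labels, edges)
nonzero = sum(v for plane in G for row in plane for v in row)
```

		<p>Even in this toy case only two of the eighteen elements are non-zero, which hints at the sparsity seen for larger graphs.</p>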

		<figure id="fig-cpRank">
			<img src="images/fig-cpRank.svg" alt="Abstract illustration of a CP d-rank decomposition of a tensor representing the graph of Figure&nbsp;5.5a"/>
			<figcaption>Abstract illustration of a CP \(d\)-rank decomposition of a tensor representing the graph of Figure&nbsp;<a href="#fig-distEg">5.5a</a></figcaption>
		</figure>

		<p>DistMult&nbsp;[<a href="#ref-distmult">Yang et al., 2015</a>] is a seminal method for computing knowledge graph embeddings based on rank decompositions, where each entity and relation is associated with a vector of dimension \(d\), such that for an edge <span class="gnode">s</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">p</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">o</span>, a plausibility scoring function \(\sum_{i=1}^d (\mathbf{e}\)<sub><code>s</code></sub>\()_i (\mathbf{r}\)<sub><span class="gelab">p</span></sub>\()_i (\mathbf{e}\)<sub><code>o</code></sub>\()_i\) is defined, where \((\mathbf{e}\)<sub><code>s</code></sub>\()_i\), <span class="nobreak">\((\mathbf{r}\)<sub><span class="gelab">p</span></sub>\()_i\)</span> and \((\mathbf{e}\)<sub><code>o</code></sub>\()_i\) denote the \(i\)<sup>th</sup> elements of vectors \(\mathbf{e}\)<sub><code>s</code></sub>, \(\mathbf{r}\)<sub><span class="gelab">p</span></sub>, \(\mathbf{e}\)<sub><code>o</code></sub>, respectively. The goal, then, is to learn vectors for each node and edge label that maximise the plausibility of positive edges and minimise the plausibility of negative edges. This approach equates to a CP decomposition of the graph tensor \(\mathcal{G}\), but where entities have one vector that is used twice: \(\mathbf{x}_1 \otimes \mathbf{y}_1 \otimes \mathbf{x}_1 + \ldots + \mathbf{x}_d \otimes \mathbf{y}_d \otimes \mathbf{x}_d \approx \mathcal{G}\). 
A weakness of this approach is that per the scoring function, the plausibility of <span class="gnode">s</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">p</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">o</span> will always be equal to that of <span class="gnode">o</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">p</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">s</span>; in other words, DistMult does not consider edge direction.</p>
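		<p>The symmetry of the DistMult scoring function can be checked directly. The following Python sketch implements the sum given above. The embedding values are made up for illustration, and <code>distmult_score</code> is a name chosen here, not an API of any library.</p>

```python
def distmult_score(e_s, r_p, e_o):
    """DistMult plausibility: the sum over i of (e_s)_i * (r_p)_i * (e_o)_i."""
    return sum(s * r * o for s, r, o in zip(e_s, r_p, e_o))


# Hypothetical 3-dimensional embeddings (illustrative values only).
e_s = [0.5, -1.0, 2.0]
e_o = [1.0, 0.5, 1.0]
r_p = [2.0, 1.0, 0.5]

forward = distmult_score(e_s, r_p, e_o)    # plausibility of s -p-> o
backward = distmult_score(e_o, r_p, e_s)   # plausibility of o -p-> s
```

		<p>Since multiplication of real numbers is commutative, <code>forward</code> and <code>backward</code> are equal for any choice of vectors.</p>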
		<p>Rather than use a vector as a relation embedding, RESCAL&nbsp;[<a href="#ref-nickel2013tensor">Nickel and Tresp, 2013</a>] uses a matrix, which allows for combining values from \(\mathbf{e}\)<sub><code>s</code></sub> and \(\mathbf{e}\)<sub><code>o</code></sub> across all dimensions, and thus can capture (e.g.) edge direction. However, RESCAL incurs a higher cost in terms of space and time than DistMult. HolE&nbsp;[<a href="#ref-NickelRP16">Nickel et al., 2016b</a>] uses vectors for relation and entity embeddings, but proposes to use the <em>circular correlation operator</em> – which takes sums along the diagonals of the outer product of two vectors – to combine them. This operator is not commutative, and can thus consider edge direction. ComplEx&nbsp;[<a href="#ref-TrouillonWRGB16">Trouillon et al., 2016</a>], on the other hand, uses a complex vector (i.e., a vector containing complex numbers) as a relational embedding, which similarly allows for breaking the aforementioned symmetry of DistMult’s scoring function while keeping the number of parameters low. SimplE&nbsp;[<a href="#ref-Kazemi018">Kazemi and Poole, 2018</a>] rather proposes to compute a standard CP decomposition computing two initial vectors for entities from \(\mathbf{X}\) and \(\mathbf{Z}\) and then averaging terms across \(\mathbf{X}\), \(\mathbf{Y}\), \(\mathbf{Z}\) to compute the final plausibility scores. TuckER&nbsp;[<a href="#ref-BalazevicAH19a">Balazevic et al., 2019b</a>] employs a different type of decomposition – called a Tucker Decomposition&nbsp;[<a href="#ref-tucker64extension">Tucker, 1964</a>], which computes a smaller “core” tensor \(\mathcal{T}\) and a sequence of three matrices \(\mathbf{A}\), \(\mathbf{B}\) and \(\mathbf{C}\), such that \(\mathcal{G} \approx \mathcal{T} \otimes \mathbf{A} \otimes \mathbf{B} \otimes \mathbf{C}\) – where entity embeddings are taken from \(\mathbf{A}\) and \(\mathbf{C}\), while relation embeddings are taken from \(\mathbf{B}\). 
Of these approaches, TuckER&nbsp;[<a href="#ref-BalazevicAH19a">Balazevic et al., 2019b</a>] currently provides state-of-the-art results on standard benchmarks.</p>

		<h5 id="sssec-neural-models" class="subsubsection">Neural models</h5>
		<p>A limitation of the aforementioned approaches is that they assume either linear (preserving addition and scalar multiplication) or bilinear (e.g., matrix multiplication) operations over embeddings to compute plausibility scores. Other approaches rather use neural networks to learn embeddings with non-linear scoring functions for plausibility.</p>
		<p>One of the earliest proposals of a neural model was Semantic Matching Energy (SME)&nbsp;[<a href="#ref-GlorotBWB13">Glorot et al., 2013</a>], which learns parameters (aka weights: \(\mathbf{w}\), \(\mathbf{w}'\)) for two functions – \(f_{\mathbf{w}}(\mathbf{e}\)<sub><code>s</code></sub>\(,\mathbf{r}\)<sub><span class="gelab">p</span></sub>\()\) and \(g_{\mathbf{w}'}(\mathbf{e}\)<sub><code>o</code></sub>\(,\mathbf{r}\)<sub><span class="gelab">p</span></sub>\()\) – such that the dot product of the result of both functions – \(f_{\mathbf{w}}(\mathbf{e}\)<sub><code>s</code></sub>\(,\mathbf{r}\)<sub><span class="gelab">p</span></sub>\() \cdot g_{\mathbf{w}'}(\mathbf{e}\)<sub><code>o</code></sub>\(,\mathbf{r}\)<sub><span class="gelab">p</span></sub>\()\) – gives the plausibility score. Both linear and bilinear variants of \(f_{\mathbf{w}}\) and \(g_{\mathbf{w}'}\) are proposed. Another early proposal was Neural Tensor Networks (NTN)&nbsp;[<a href="#ref-socher2013reasoning">Socher et al., 2013</a>], which proposes to maintain a tensor \(\mathcal{W}\) of internal weights, such that the plausibility score is computed by a complex function that combines the outer product <span class="nobreak">\(\mathbf{e}\)<sub><code>s</code></sub>\( \otimes \mathcal{W} \otimes \mathbf{e}\)<sub><code>o</code></sub></span> with a standard neural layer over \(\mathbf{e}\)<sub><code>s</code></sub> and \(\mathbf{e}\)<sub><code>o</code></sub>, which in turn is combined with \(\mathbf{r}\)<sub><span class="gelab">p</span></sub>, to produce a plausibility score. The tensor \(\mathcal{W}\) results in a high number of parameters, limiting scalability&nbsp;[<a href="#ref-Wang2017KGEmbedding">Wang et al., 2017</a>]. 
Multi-Layer Perceptron (MLP)&nbsp;[<a href="#ref-DongGHHLMSSZ14">Dong et al., 2014</a>] is a simpler model, where \(\mathbf{e}\)<sub><code>s</code></sub>, \(\mathbf{r}\)<sub><span class="gelab">p</span></sub> and \(\mathbf{e}\)<sub><code>o</code></sub> are concatenated and fed into a hidden layer to compute plausibility scores.</p>
		<p>A number of more recent approaches have proposed using convolutional kernels in their models. ConvE&nbsp;[<a href="#ref-DettmersMS018">Dettmers et al., 2018</a>] proposes to generate a matrix from \(\mathbf{e}\)<sub><code>s</code></sub> and \(\mathbf{r}\)<sub><span class="gelab">p</span></sub> by “wrapping” each vector over several rows and concatenating both matrices. The concatenated matrix serves as the input for a set of (2D) convolutional layers, which returns a feature map tensor. The feature map tensor is vectorised and projected into \(d\) dimensions using a parameterised linear transformation. The plausibility score is then computed based on the dot product of this vector and \(\mathbf{e}\)<sub><code>o</code></sub>. A disadvantage of ConvE is that by wrapping vectors into matrices, it imposes an artificial two-dimensional structure on the embeddings. HypER&nbsp;[<a href="#ref-BalazevicAH19b">Balazevic et al., 2019c</a>] is a similar model using convolutions, but avoids the need to wrap vectors into matrices. Instead, a fully connected layer (called the “hypernetwork”) is applied to \(\mathbf{r}\)<sub><span class="gelab">p</span></sub> and used to generate a matrix of relation-specific convolutional filters. These filters are applied directly to \(\mathbf{e}\)<sub><code>s</code></sub> to give a feature map, which is vectorised. The same process is then applied as in ConvE: the resulting vector is projected into \(d\) dimensions, and a dot product applied with \(\mathbf{e}\)<sub><code>o</code></sub> to produce the plausibility score. The resulting model is shown to outperform ConvE on standard benchmarks&nbsp;[<a href="#ref-BalazevicAH19b">Balazevic et al., 2019c</a>].</p>
		<p>The presented approaches strike different balances in terms of expressivity and the number of parameters that need to be trained. While more expressive models, such as NTN, may better fit more complex plausibility functions over lower-dimensional embeddings by using more hidden parameters, simpler models, such as MLP&nbsp;[<a href="#ref-DongGHHLMSSZ14">Dong et al., 2014</a>], and convolutional networks&nbsp;[<a href="#ref-DettmersMS018">Dettmers et al., 2018</a>, <a href="#ref-BalazevicAH19b">Balazevic et al., 2019c</a>] that enable parameter sharing by applying the same (typically small) kernels over different regions of a matrix, require handling fewer parameters overall and are more scalable.</p>

		<h5 id="sssec-survey-and-def" class="subsubsection">Survey and definition of tensor-based approaches</h5>
		<p>We now formally define and survey the aforementioned tensor-based approaches. For simplicity, we will consider directed edge-labelled graphs.</p>

		<div class="formal">
			<p>Before defining embeddings, we first introduce tensors.</p>

			<dl class="definition" id="def-vector-matrix-tensor-order-mode">
				<dt>Vector, matrix, tensor, order, mode</dt>
				<dd>For any positive integer \(a\), a <em>vector</em> of dimension \(a\) is a family of real numbers indexed by integers in \(\{1, \ldots, a\}\). For \(a\) and \(b\) positive integers, an \((a,b)\)-matrix is a family of real numbers indexed by pairs of integers in \(\{1, \ldots, a\} \times \{1, \ldots, b\}\). A tensor is a family of real numbers indexed by a finite sequence of integers such that there exist positive integers \(a_1, \ldots, a_n\) such that the indices are all the tuples of numbers in \(\{1, \ldots, a_1\} \times \ldots \times \{1, \ldots, a_n\}\). The number \(n\) is called the <em>order</em> of the tensor, the subindices \(i\in \{1, \ldots, n\}\) indicate the <em>mode</em> of a tensor, and each \(a_i\) defines the dimension of the \(i\)<sup>th</sup> mode. A 1-order tensor is a vector and a 2-order tensor is a matrix. We denote the set of all tensors as \(\mathbb{T}\).</dd>
			</dl>

			<p>For specific dimensions \(a_1,\ldots,a_n\) of modes, a tensor is an element of \((\cdots(\mathbb{R}^{a_1})^{\ldots})^{a_n}\) but we write \(\mathbb{R}^{a_1,\ldots,a_n}\) to simplify the notation. We use lower-case bold font to denote vectors (\(\mathbf{x} \in \mathbb{R}^a\)), upper-case bold font to denote matrices (\(\mathbf{X} \in \mathbb{R}^{a,b}\)) and calligraphic font to denote tensors (\(\mathcal{X} \in \mathbb{R}^{a_1,\ldots,a_n}\)).</p>
			<p>Now we are ready to abstractly define knowledge graph embeddings.</p>

			<dl class="definition" id="def-knowledge-graph-embedding">
				<dt>Knowledge graph embedding</dt>
				<dd>Given a directed edge-labelled graph \(G = (V,E,L)\), a <em>knowledge graph embedding of \(G\)</em> is a pair of mappings \((\varepsilon,\rho)\) such that \(\varepsilon : V \rightarrow \mathbb{T}\) and \(\rho : L \rightarrow \mathbb{T}\).</dd>
			</dl>

			<p>In the most typical case, \(\varepsilon\) and \(\rho\) map nodes and edge-labels, respectively, to vectors of fixed dimension. In some cases, however, they may map to matrices. Given this abstract notion of a knowledge graph embedding, we can then define a plausibility scoring function.</p>

			<dl class="definition" id="def-plausibility">
				<dt>Plausibility scores</dt>
				<dd>A <em>plausibility scoring function</em> is a partial function \(\phi : \mathbb{T} \times \mathbb{T} \times \mathbb{T} \rightarrow \mathbb{R}\). Given a directed edge-labelled graph \(G = (V,E,L)\), an edge \((s,p,o) \in V \times L \times V\), and a knowledge graph embedding \((\varepsilon,\rho)\) of \(G\), the plausibility of \((s,p,o)\) is given as \(\phi(\varepsilon(s),\rho(p),\varepsilon(o))\).</dd>
			</dl>

			<p>Edges with higher scores are considered more plausible. Given a graph \(G = (V,E,L)\), we assume a set of positive edges \(E^+\) and a set of negative edges \(E^{-}\). Positive edges are often simply the edges in the graph: \(E^+ \coloneqq E\). Negative edges use the vocabulary of \(G\) (i.e., \(E^- \subseteq V \times L \times V\)) and are typically defined by taking edges \((s,p,o)\) from \(E\) and changing one term of each edge – often one of the nodes – such that the edge is no longer in \(E\). Given sets of positive and negative edges, and a plausibility scoring function, the objective is then to find the embedding that maximises the plausibility of edges in \(E^+\) while minimising the plausibility of edges in \(E^{-}\). Specific knowledge graph embeddings then instantiate the type of embedding considered and the plausibility scoring function in various ways.</p>
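			<p>One simple way to produce negative edges along these lines is sketched below in Python. It is only one of many possible strategies: the function name <code>corrupt_edges</code>, the choice to corrupt only nodes (never the label), and the fixed random seed are all assumptions made for this illustration.</p>

```python
import random


def corrupt_edges(nodes, edges, seed=0):
    """For each positive edge, replace its subject or its object with some node
    so that the resulting edge is not among the positive edges."""
    rng = random.Random(seed)   # fixed seed so that the sketch is reproducible
    positives = set(edges)
    negatives = set()
    for s, p, o in edges:
        candidates = [(n, p, o) for n in nodes] + [(s, p, n) for n in nodes]
        candidates = [e for e in candidates if e not in positives]
        if candidates:
            negatives.add(rng.choice(candidates))
    return negatives


# Hypothetical toy graph.
nodes = ["a", "b", "c"]
edges = [("a", "p", "b"), ("b", "q", "c")]
negatives = corrupt_edges(nodes, edges)
```

			<p>The resulting set uses only the vocabulary of the graph and is disjoint from the positive edges, as required of \(E^{-}\).</p>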
			<p>In Table&nbsp;<a href="#tab-kges">5.1</a>, we define the plausibility scoring function and types of embeddings used by different knowledge graph embeddings. To simplify the definitions, we use \(\mathbf{e}_x\) to denote \(\varepsilon(x)\) when it is a vector, \(\mathbf{r}_y\) to denote \(\rho(y)\) when it is a vector, and \(\mathbf{R}_y\) to denote \(\rho(y)\) when it is a matrix. Some models involve learnt parameters (aka weights) for computing plausibility. We denote these as \(\mathbf{v}\), \(\mathbf{V}\), \(\mathcal{V}\), \(\mathbf{w}\), \(\mathbf{W}\), \(\mathcal{W}\) (for vectors, matrices or tensors). We use \(d_e\) and \(d_r\) to denote the dimensionality chosen for entity embeddings and relation embeddings, respectively. Often it is assumed that \(d_e = d_r\), in which case we will write \(d\). Weights may have their own dimensionality, which we denote \(w\). The embeddings in Table&nbsp;<a href="#tab-kges">5.1</a> use a variety of operators on vectors, matrices and tensors, which will be defined later.</p>

			<p>The embeddings defined in Table&nbsp;<a href="#tab-kges">5.1</a> vary in complexity, where a trade-off exists between the number of parameters used, and the expressiveness of the model in terms of its capability to capture latent features of the graph. To increase expressivity, many of the models in Table&nbsp;<a href="#tab-kges">5.1</a> use additional parameters beyond the embeddings themselves. A possible formal guarantee of such models is <em>full expressiveness</em>, which, given any disjoint sets of positive edges \(E^+\) and negative edges \(E^{-}\), asserts that the model can always correctly partition those edges. On the one hand, for example, DistMult&nbsp;[<a href="#ref-distmult">Yang et al., 2015</a>] cannot distinguish an edge <span class="gnode">s</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">p</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">o</span> from its inverse <span class="gnode">o</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">p</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">s</span>, so by adding an inverse of an edge in \(E^+\) to \(E^{-}\), we can show that it is <em>not</em> fully expressive. On the other hand, models such as ComplEx&nbsp;[<a href="#ref-TrouillonWRGB16">Trouillon et al., 2016</a>], SimplE&nbsp;[<a href="#ref-Kazemi018">Kazemi and Poole, 2018</a>], and TuckER&nbsp;[<a href="#ref-BalazevicAH19a">Balazevic et al., 2019b</a>] have been proven to be fully expressive given sufficient dimensionality; for example, TuckER&nbsp;[<a href="#ref-BalazevicAH19a">Balazevic et al., 2019b</a>] with dimensions \(d_r = |L|\) and \(d_e = |V|\) trivially satisfies full expressivity since its core tensor \(\mathcal{W}\) then has sufficient capacity to store the full one-hot encoding of any graph. 
This formal property is useful to show that the model does not have built-in limitations for numerically representing a graph, though of course in practice the dimensions needed to reach full expressivity are often impractical/undesirable.</p>
			<p>We continue by first defining the conventions used in Table&nbsp;<a href="#tab-kges">5.1</a>.</p>
			<ul>
				<li>We use \((\mathbf{x})_{i}\), \((\mathbf{X})_{ij}\), and \((\mathcal{X})_{{i_1}\ldots{i_n}}\) to denote elements of vectors, matrices, and tensors, respectively. If a vector \(\mathbf{x} \in \mathbb{R}^a\) is used in a context that requires a matrix, the vector is interpreted as an \((a, 1)\)-matrix (i.e., a column vector) and can be turned into a row vector (i.e., a \((1,a)\)-matrix) using the transpose operation \(\mathbf{x}^T\). We use \(\mathbf{x}^\mathrm{D} \in \mathbb{R}^{a,a}\) to denote the diagonal matrix with the values of the vector \(\mathbf{x} \in \mathbb{R}^{a}\) on its diagonal. We denote the identity matrix by \(\mathbf{I}\) such that if \(j=k\), then \((\mathbf{I})_{jk} = 1\); otherwise \((\mathbf{I})_{jk} = 0\).</li>
				<li>We denote by \(\begin{bmatrix}\mathbf{X}_1\\[-0.5ex]\vdots\\\mathbf{X}_n\end{bmatrix}\) the vertical stacking of matrices \(\mathbf{X}_1, \ldots, \mathbf{X}_n\) with the same number of columns. Given a vector \(\mathbf{x} \in \mathbb{R}^{ab}\), we denote by \(\mathbf{x}^{[a,b]} \in \mathbb{R}^{a,b}\) the “reshaping” of \(\mathbf{x}\) into an \((a,b)\)-matrix such that \((\mathbf{x}^{[a,b]})_{ij} = (\mathbf{x})_{(i + a(j-1))}\). Conversely, given a matrix \(\mathbf{X} \in \mathbb{R}^{a,b}\), we denote by \(\mathrm{vec}(\mathbf{X}) \in \mathbb{R}^{ab}\) the <em>vectorisation</em> of \(\mathbf{X}\) such that \(\mathrm{vec}(\mathbf{X})_k = (\mathbf{X})_{ij}\) where \(i = ((k-1)\,\mathrm{mod}\,a) + 1\) and \(j = \frac{k - i}{a} + 1\) (observe that \(\mathrm{vec}(\mathbf{x}^{[a,b]}) = \mathbf{x}\)).</li>
				<li>Given a tensor \(\mathcal{X} \in \mathbb{R}^{a,b,c}\), we denote by \(\mathcal{X}^{[i:\cdot:\cdot]} \in \mathbb{R}^{b,c}\), the \(i\)<sup>th</sup> <em>slice</em> of tensor \(\mathcal{X}\) along the first mode; for example, given \(\mathcal{X} \in \mathbb{R}^{5,2,3}\), then \(\mathcal{X}^{[4:\cdot:\cdot]}\) returns the \((2,3)\)-matrix consisting of the elements \(\begin{bmatrix} (\mathcal{X})_{411} & (\mathcal{X})_{412} & (\mathcal{X})_{413} \\ (\mathcal{X})_{421} & (\mathcal{X})_{422} & (\mathcal{X})_{423} \end{bmatrix}\). Analogously, we use \(\mathcal{X}^{[\cdot : i : \cdot]} \in \mathbb{R}^{a,c}\) and \(\mathcal{X}^{[\cdot:\cdot:i]} \in \mathbb{R}^{a,b}\) to indicate the \(i\)<sup>th</sup> slice along the second and third modes of \(\mathcal{X}\), respectively.</li>
				<li>We denote by \(\psi(\mathcal{X})\) the element-wise application of a function \(\psi\) to the tensor \(\mathcal{X}\), such that \((\psi(\mathcal{X}))_{i_1\ldots i_n} = \psi((\mathcal{X})_{i_1\ldots i_n})\). Common choices for \(\psi\) include a sigmoid function (e.g., the logistic function \(\psi(x) = \frac{1}{1 + e^{-x}}\) or the hyperbolic tangent function \(\psi(x) = \mathrm{tanh}\,x = \frac{e^x - e^{-x}}{e^x + e^{-x}}\)), the rectifier (\(\psi(x) = \mathrm{max}(0,x)\)), softplus (\(\psi(x) = \mathrm{ln}(1 + e^x)\)), etc.</li>
			</ul>
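			<p>The reshaping and vectorisation conventions can be sketched in Python as follows, assuming matrices are stored as lists of rows. The function names <code>reshape</code> and <code>vec</code> are chosen here for illustration; the code fills the matrix column by column, so that element \((i,j)\) is taken from position \(i + a(j-1)\) of the vector (counting from \(1\)).</p>

```python
def reshape(x, a, b):
    """Reshape a vector of length a*b into an (a, b)-matrix, column by column:
    element (i, j) is the element of x at 1-based position i + a*(j - 1)."""
    assert len(x) == a * b
    return [[x[(i - 1) + a * (j - 1)] for j in range(1, b + 1)]
            for i in range(1, a + 1)]


def vec(m):
    """Vectorise a matrix (a list of rows) by stacking its columns."""
    a, b = len(m), len(m[0])
    return [m[i][j] for j in range(b) for i in range(a)]


x = [1, 2, 3, 4, 5, 6]
X = reshape(x, 2, 3)   # rows [1, 3, 5] and [2, 4, 6]
roundtrip = vec(X)     # recovers the original vector
```

			<p>Vectorising the reshaped matrix returns the original vector, matching the observation that \(\mathrm{vec}(\mathbf{x}^{[a,b]}) = \mathbf{x}\).</p>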

			<p>We now define the operators used in Table&nbsp;<a href="#tab-kges">5.1</a>, where the first and most elemental operation we consider is that of matrix multiplication.</p>

			<dl class="definition" id="def-matrix-multiplication">
				<dt>Matrix multiplication</dt>
				<dd>The <em>multiplication of matrices</em> \(\mathbf{X} \in \mathbb{R}^{a,b}\) and \(\mathbf{Y} \in \mathbb{R}^{b,c}\) is a matrix \(\mathbf{XY} \in \mathbb{R}^{a,c}\) such that \((\mathbf{XY})_{ij} = \sum_{k=1}^b (\mathbf{X})_{ik}(\mathbf{Y})_{kj}\). The matrix multiplication of two tensors \(\mathcal{X} \in \mathbb{R}^{a_1,\ldots,a_m,c}\) and \(\mathcal{Y} \in \mathbb{R}^{c,b_1,\ldots,b_n}\) is a tensor \(\mathcal{XY} \in \mathbb{R}^{a_1,\ldots,a_{m},b_{1},\ldots,b_{n}}\) such that \((\mathcal{XY})_{i_1\ldots i_m i_{m+1}\ldots i_{m+n}} = \sum_{k=1}^c (\mathcal{X})_{i_1\ldots i_m k}(\mathcal{Y})_{k i_{m+1}\ldots i_{m+n}}\).</dd>
			</dl>

			<p>For convenience, we may implicitly add or remove modes with dimension 1 for the purposes of matrix multiplication and other operators; for example, given two vectors \(\mathbf{x} \in \mathbb{R}^{a}\) and \(\mathbf{y} \in \mathbb{R}^{a}\), we denote by \(\T{\mathbf{x}}\mathbf{y}\) (aka the dot or inner product) the multiplication of matrix \(\T{\mathbf{x}} \in \mathbb{R}^{1,a}\) with \(\mathbf{y} \in \mathbb{R}^{a,1}\) such that \(\T{\mathbf{x}}\mathbf{y} \in \mathbb{R}^{1,1}\) (i.e., a scalar in \(\mathbb{R}\)); conversely, \(\mathbf{x}\T{\mathbf{y}} \in \mathbb{R}^{a,a}\) (the outer product).</p>
			<p>Constraints on embeddings are sometimes given as norms, defined next.</p>

			<dl class="definition" id="def-lpnorm-lpqnorm">
				<dt>\(L^p\)-norm, \(L^{p,q}\)-norm</dt>
				<dd>For \(p\in \mathbb{R}\), the <em>\(L^p\)-norm</em> of a vector \(\mathbf{x}\in \mathbb{R}^a\) is the scalar \(\|\mathbf{x}\|_p \coloneqq (|(\mathbf{x})_1|^p + \ldots + |(\mathbf{x})_a|^p)^{\frac{1}{p}}\), where \(|(\mathbf{x})_i|\) denotes the absolute value of the \(i\)<sup>th</sup> element of \(\mathbf{x}\). For \(p,q\in \mathbb{R}\), the <em>\(L^{p,q}\)-norm</em> of a matrix \(\mathbf{X}\in\mathbb{R}^{a,b}\) is the scalar \(\|\mathbf{X}\|_{p,q} \coloneqq \left( \sum_{j=1}^b \left( \sum_{i=1}^a |(\mathbf{X})_{ij}|^p \right)^{\frac{q}{p}} \right)^\frac{1}{q}\).</dd>
			</dl>

			<p>The \(L^1\) norm (i.e., \(\|\mathbf{x}\|_1\)) is thus simply the sum of the absolute values of \(\mathbf{x}\), while the \(L^2\) norm (i.e., \(\|\mathbf{x}\|_2\)) is the (Euclidean) length of the vector. The Frobenius norm of the matrix \(\mathbf{X}\) then equates to \(\|\mathbf{X}\|_{2,2} = \left( \sum_{j=1}^b \left( \sum_{i=1}^a |(\mathbf{X})_{ij}|^2 \right) \right)^\frac{1}{2}\); i.e., the square root of the sum of the squares of all elements.</p>
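			<p>These norms can be computed directly from their definitions. The following Python sketch is a plain transcription of the formulas, assuming matrices are stored as lists of rows; the names <code>lp_norm</code> and <code>lpq_norm</code> are chosen here for illustration.</p>

```python
def lp_norm(x, p):
    """L^p-norm of a vector: (|x_1|^p + ... + |x_a|^p)^(1/p)."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def lpq_norm(m, p, q):
    """L^{p,q}-norm of a matrix given as a list of rows: the inner sum runs
    over the elements of each column, the outer sum over the columns."""
    columns = zip(*m)
    return sum(sum(abs(v) ** p for v in col) ** (q / p)
               for col in columns) ** (1.0 / q)


l1 = lp_norm([3, -4], 1)                       # sum of absolute values
l2 = lp_norm([3, -4], 2)                       # Euclidean length
frobenius = lpq_norm([[1, 2], [2, 4]], 2, 2)   # square root of 1 + 4 + 4 + 16
```

			<p>With \(p = q = 2\), <code>lpq_norm</code> reduces to the Frobenius norm described above.</p>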
			<p>Another type of product used by embedding techniques is the Hadamard product, which multiplies tensors of the same dimension and computes their product in an element-wise manner.</p>

			<dl class="definition" id="def-hadamard-product">
				<dt>Hadamard product</dt>
				<dd>Given two tensors \(\mathcal{X} \in \mathbb{R}^{a_1,\ldots,a_n}\) and \(\mathcal{Y} \in \mathbb{R}^{a_1,\ldots,a_n}\), the <em>Hadamard product</em> \(\mathcal{X} \odot \mathcal{Y}\) is defined as a tensor in \(\mathbb{R}^{a_1,\ldots,a_n}\), with each element computed as \((\mathcal{X} \odot \mathcal{Y})_{i_1\ldots i_{n}} \coloneqq (\mathcal{X})_{i_1\ldots i_{n}} (\mathcal{Y})_{i_1\ldots i_{n}}\).</dd>
			</dl>

			<p>Other embedding techniques – namely RotatE&nbsp;[<a href="#ref-SunDNT19">Sun et al., 2019</a>] and ComplEx&nbsp;[<a href="#ref-TrouillonWRGB16">Trouillon et al., 2016</a>] – use <em>complex space</em> based on complex numbers. With a slight abuse of notation, the definitions of vectors, matrices and tensors can be modified by replacing the set of real numbers \(\mathbb{R}\) by the set of complex numbers \(\mathbb{C}\), giving rise to complex vectors, complex matrices, and complex tensors. In this case, we denote by \(\mathrm{Re}(\cdot)\) the real part of a complex number. Given a complex vector \(\mathbf{x} \in \mathbb{C}^a\), we denote by \(\overline{\mathbf{x}}\) its complex conjugate (swapping the sign of the imaginary part of each element). Complex analogues of the aforementioned operators can then be defined by replacing the multiplication and addition of real numbers with the analogous operators for complex numbers, where RotatE&nbsp;[<a href="#ref-SunDNT19">Sun et al., 2019</a>] uses the complex Hadamard product, and ComplEx&nbsp;[<a href="#ref-TrouillonWRGB16">Trouillon et al., 2016</a>] uses complex matrix multiplication.</p>
			<p>One embedding technique – MuRP&nbsp;[<a href="#ref-BalazevicAH19">Balazevic et al., 2019a</a>] – uses hyperbolic space, specifically based on the Poincaré ball. As this is the only embedding we cover that uses this space, and the formalisms are lengthy (covering the Poincaré ball, Möbius addition, Möbius matrix–vector multiplication, logarithmic maps, exponential maps, etc.), we rather refer the reader to the paper for further details&nbsp;[<a href="#ref-BalazevicAH19">Balazevic et al., 2019a</a>].</p>
			<p>As discussed in Section&nbsp;<a href="#ssec-embeddings">5.2</a>, tensor decompositions are used for many embeddings, and at the heart of such decompositions is the tensor product, which is often used to reconstruct (an approximation of) the original tensor.</p>

			<dl class="definition" id="def-tensor-product">
				<dt>Tensor product</dt>
				<dd>Given two tensors \(\mathcal{X} \in \mathbb{R}^{a_1,\ldots,a_m}\) and \(\mathcal{Y} \in \mathbb{R}^{b_1,\ldots,b_n}\), the <em>tensor product</em> \(\mathcal{X} \otimes \mathcal{Y}\) is defined as a tensor in \(\mathbb{R}^{a_1,\ldots,a_m,b_1,\ldots,b_n}\), with each element computed as \((\mathcal{X} \otimes \mathcal{Y})_{i_1\ldots i_{m}j_1\ldots j_n} \coloneqq (\mathcal{X})_{i_1 \ldots i_m} (\mathcal{Y})_{j_1 \ldots j_n}\).<sup class="fnmark" id="fnm24"><a href="#fn24">24</a></sup><span class="footnote" id="fn24"><sup><a href="#fnm24">note 24</a></sup> Please note that “\(\otimes\)” is used here in an unrelated sense to its use in Definition&nbsp;<a href="#def-anndom">3.10</a>.</span></dd>
			</dl>

			<div class="example">
				<p>Assume that \(\mathcal{X} \in \mathbb{R}^{2,3}\) and \(\mathcal{Y} \in \mathbb{R}^{3,4,5}\). Then \(\mathcal{X} \otimes \mathcal{Y}\) will be a tensor in \(\mathbb{R}^{2,3,3,4,5}\). Element \((\mathcal{X} \otimes \mathcal{Y})_{12345}\) will be the product of \((\mathcal{X})_{12}\) and \((\mathcal{Y})_{345}\).</p>
			</div>
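			<p>The example above can be reproduced with a small Python sketch. As an assumption of this illustration, a tensor is represented as a dictionary from \(1\)-based index tuples to values, and the element values are arbitrary; the names <code>make_tensor</code> and <code>tensor_product</code> are hypothetical.</p>

```python
from itertools import product


def make_tensor(dims, f):
    """Build a tensor as a dict mapping 1-based index tuples to the value f(index)."""
    ranges = [range(1, d + 1) for d in dims]
    return {idx: f(idx) for idx in product(*ranges)}


def tensor_product(x, y):
    """Tensor product: element (i_1..i_m, j_1..j_n) is x[i_1..i_m] * y[j_1..j_n]."""
    return {i + j: xv * yv for i, xv in x.items() for j, yv in y.items()}


# Arbitrary element values; only the shapes follow the example in the text.
X = make_tensor((2, 3), lambda idx: sum(idx))
Y = make_tensor((3, 4, 5), lambda idx: idx[0] * idx[1] - idx[2])
XY = tensor_product(X, Y)

element = XY[(1, 2, 3, 4, 5)]   # equals X[(1, 2)] * Y[(3, 4, 5)]
```

			<p>The result has \(2 \times 3 \times 3 \times 4 \times 5\) elements, one for each pairing of an element of \(\mathcal{X}\) with an element of \(\mathcal{Y}\).</p>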

			<p>An \(n\)-mode product is used by other embeddings to transform elements along a given mode of a tensor by computing a product with a given matrix along that particular mode of the tensor.</p>

			<dl class="definition" id="def-nmode-product">
				<dt>\(n\)-mode product</dt>
				<dd>For a positive integer \(n\), a tensor \(\mathcal{X} \in \mathbb{R}^{a_1,\ldots,a_{n-1},a_n,a_{n+1},\ldots,a_m}\) and matrix \(\mathbf{Y} \in \mathbb{R}^{b,a_n}\), the <em>\(n\)-mode product</em> of \(\mathcal{X}\) and \(\mathbf{Y}\) is the tensor \(\mathcal{X} \otimes_n \mathbf{Y} \in \mathbb{R}^{a_1,\ldots,a_{n-1},b,a_{n+1},\ldots,a_m}\) such that \((\mathcal{X} \otimes_n \mathbf{Y})_{i_1\ldots i_{n-1}ji_{n+1}\ldots i_m} \coloneqq \sum_{k=1}^{a_n} (\mathcal{X})_{i_1 \ldots i_{n-1}ki_{n+1} \ldots i_m} (\mathbf{Y})_{jk}\).</dd>
			</dl>

			<div class="example">
				<p>Let us assume that \(\mathcal{X} \in \mathbb{R}^{2,3,4}\) and \(\mathbf{Y} \in \mathbb{R}^{5,3}\). The result of \(\mathcal{X} \otimes_2 \mathbf{Y}\) will be a tensor in \(\mathbb{R}^{2,5,4}\), where, for example, \((\mathcal{X} \otimes_2 \mathbf{Y})_{142}\) will be given as \((\mathcal{X})_{112}(\mathbf{Y})_{41} + (\mathcal{X})_{122}(\mathbf{Y})_{42} + (\mathcal{X})_{132}(\mathbf{Y})_{43}\). Observe that if \(\mathbf{y} \in \mathbb{R}^{a_n}\) – i.e., if \(\mathbf{y}\) is a (column) vector – then the \(n\)-mode tensor product \(\mathcal{X} \otimes_n \T{\mathbf{y}}\) “flattens” the \(n\)<sup>th</sup> mode of \(\mathcal{X}\) to one dimension, effectively reducing the order of \(\mathcal{X}\) by one.</p>
			</div>
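			<p>A sketch of the \(n\)-mode product in Python is given next, again assuming (for illustration only) that a tensor is a dictionary from \(1\)-based index tuples to values and that a matrix is a list of rows. The function name <code>mode_product</code> and the element values are hypothetical.</p>

```python
from itertools import product


def mode_product(x, dims, y, n):
    """n-mode product of tensor x (a dict keyed by 1-based index tuples, with
    mode dimensions dims) and matrix y (b rows, each of length dims[n - 1])."""
    a_n = dims[n - 1]
    new_dims = dims[:n - 1] + (len(y),) + dims[n:]
    result = {}
    for idx in product(*[range(1, d + 1) for d in new_dims]):
        j = idx[n - 1]
        # Sum over the n-th mode of x, weighted by the j-th row of y.
        result[idx] = sum(
            x[idx[:n - 1] + (k,) + idx[n:]] * y[j - 1][k - 1]
            for k in range(1, a_n + 1)
        )
    return result, new_dims


dims = (2, 3, 4)
X = {idx: idx[0] + 10 * idx[1] + 100 * idx[2]
     for idx in product(*[range(1, d + 1) for d in dims])}
Y = [[r * c for c in range(1, 4)] for r in range(1, 6)]   # a (5, 3)-matrix

Z, z_dims = mode_product(X, dims, Y, 2)
```

			<p>With these shapes, <code>z_dims</code> is <code>(2, 5, 4)</code>, and <code>Z[(1, 4, 2)]</code> is the three-term sum written out in the example above.</p>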

			<p>One embedding technique – HolE&nbsp;[<a href="#ref-NickelRP16">Nickel et al., 2016b</a>] – uses the circular correlation operator \(\mathbf{x} \star \mathbf{y}\), where each element is the sum of elements along a diagonal of the outer product \(\mathbf{x} \otimes \mathbf{y}\) that “wraps” if not the primary diagonal.</p>

			<dl class="definition" id="def-circular-correlation">
				<dt>Circular correlation</dt>
				<dd>The <em>circular correlation</em> of vector \(\mathbf{x} \in \mathbb{R}^a\) with \(\mathbf{y} \in \mathbb{R}^a\) is the vector \(\mathbf{x} \star \mathbf{y} \in \mathbb{R}^{a}\) such that \((\mathbf{x} \star \mathbf{y})_k \coloneqq \sum_{i=1}^a (\mathbf{x})_i (\mathbf{y})_{(((k+i-2) \,\mathrm{mod}\,a)+1)}\). </dd>
			</dl>

			<div class="example">
				<p>Assuming \(a = 5\), then \((\mathbf{x} \star \mathbf{y})_1 = (\mathbf{x})_1(\mathbf{y})_1 + (\mathbf{x})_2(\mathbf{y})_2 + (\mathbf{x})_3(\mathbf{y})_3 + (\mathbf{x})_4(\mathbf{y})_4 + (\mathbf{x})_5(\mathbf{y})_5\), or a case that wraps: \((\mathbf{x} \star \mathbf{y})_4 = (\mathbf{x})_1(\mathbf{y})_4 + (\mathbf{x})_2(\mathbf{y})_5 + (\mathbf{x})_3(\mathbf{y})_1 + (\mathbf{x})_4(\mathbf{y})_2 + (\mathbf{x})_5(\mathbf{y})_3\).</p>
			</div>
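			<p>The definition translates into a few lines of Python. The sketch below uses \(0\)-based list indices, under which the index \(((k+i-2)\,\mathrm{mod}\,a)+1\) becomes simply <code>(k + i) % a</code>; the function name and the input values are chosen here for illustration.</p>

```python
def circular_correlation(x, y):
    """Circular correlation of two vectors of equal length a.
    In 0-based indices, element k is the sum over i of x[i] * y[(k + i) % a]."""
    a = len(x)
    assert len(y) == a
    return [sum(x[i] * y[(k + i) % a] for i in range(a)) for k in range(a)]


x = [1, 2, 3, 4, 5]
y = [10, 20, 30, 40, 50]
corr = circular_correlation(x, y)

# The operator is not commutative, as a unit vector shows.
unit = [1, 0, 0, 0, 0]
left = circular_correlation(x, unit)
right = circular_correlation(unit, x)
```

			<p>Here <code>corr[0]</code> and <code>corr[3]</code> correspond to \((\mathbf{x} \star \mathbf{y})_1\) and \((\mathbf{x} \star \mathbf{y})_4\) in the example above, while <code>left</code> and <code>right</code> differ, illustrating why the operator can take edge direction into account.</p>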

			<p>Finally, a couple of neural models that we include – namely ConvE&nbsp;[<a href="#ref-DettmersMS018">Dettmers et al., 2018</a>] and HypER&nbsp;[<a href="#ref-BalazevicAH19b">Balazevic et al., 2019c</a>] – are based on convolutional architectures using the convolution operator.</p>

			<dl class="definition" id="def-convolution">
				<dt>Convolution</dt>
				<dd>Given two matrices \(\mathbf{X} \in \mathbb{R}^{a,b}\) and \(\mathbf{Y} \in \mathbb{R}^{e,f}\), the <em>convolution</em> of \(\mathbf{X}\) and \(\mathbf{Y}\) is the matrix \(\mathbf{X} * \mathbf{Y} \in \mathbb{R}^{(a + e - 1),(b + f - 1)}\) such that \((\mathbf{X} * \mathbf{Y})_{ij} = \sum_{k=1}^a \sum_{l=1}^b (\mathbf{X})_{kl} (\mathbf{Y})_{(i+k-a)(j+l-b)}\).<sup class="fnmark" id="fnm25"><a href="#fn25">25</a></sup><span class="footnote" id="fn25"><sup><a href="#fnm25">note 25</a></sup> We define the convolution operator per the widely-used convention for convolutional neural networks. Strictly speaking, the operator should be called <em>cross-correlation</em>, where traditional convolution requires the matrix \(\mathbf{X}\) to be initially “rotated” by 180°. Since in our settings the matrix \(\mathbf{X}\) is learnt, rather than given, the rotation is redundant, and hence the distinction is not important.</span> In cases where \((i+k-a) &lt; 1\), \((j+l-b) &lt; 1\), \((i+k-a) &gt; e\) or \((j+l-b) &gt; f\) (i.e., where \((\mathbf{Y})_{(i+k-a)(j+l-b)}\) lies outside the bounds of \(\mathbf{Y}\)), we say that \((\mathbf{Y})_{(i+k-a)(j+l-b)} = 0\).</dd>
			</dl>

			<p>Intuitively speaking, the convolution operator overlays \(\mathbf{X}\) in every possible way over \(\mathbf{Y}\) such that at least one pair of elements \((\mathbf{X})_{ij},(\mathbf{Y})_{lk}\) overlaps, summing the products of pairs of overlapping elements to generate an element of the result. Elements of \(\mathbf{X}\) extending beyond \(\mathbf{Y}\) are ignored (equivalently we can consider \(\mathbf{Y}\) to be “zero-padded” outside its borders).</p>
			<div class="example">
				<p>Given \(\mathbf{X} \in \mathbb{R}^{3,3}\) and \(\mathbf{Y} \in \mathbb{R}^{4,5}\), then \(\mathbf{X} * \mathbf{Y} \in \mathbb{R}^{6,7}\), where, for example, \((\mathbf{X} * \mathbf{Y})_{11} = (\mathbf{X})_{33}(\mathbf{Y})_{11}\) (with the bottom right corner of \(\mathbf{X}\) overlapping the top left corner of \(\mathbf{Y}\)), while \((\mathbf{X} * \mathbf{Y})_{34} = (\mathbf{X})_{11}(\mathbf{Y})_{12} + \)\( (\mathbf{X})_{12}(\mathbf{Y})_{13} + \)\( (\mathbf{X})_{13}(\mathbf{Y})_{14} + \)\( (\mathbf{X})_{21}(\mathbf{Y})_{22} + \)\( (\mathbf{X})_{22}(\mathbf{Y})_{23} + \)\( (\mathbf{X})_{23}(\mathbf{Y})_{24} + \)\( (\mathbf{X})_{31}(\mathbf{Y})_{32} + \)\( (\mathbf{X})_{32}(\mathbf{Y})_{33} + \)\( (\mathbf{X})_{33}(\mathbf{Y})_{34}\) (with \((\mathbf{X})_{22}\) – the centre of \(\mathbf{X}\) – overlapping \((\mathbf{Y})_{23}\)).<sup class="fnmark" id="fnm26"><a href="#fn26">26</a></sup><span class="footnote" id="fn26"><sup><a href="#fnm26">note 26</a></sup> Models applying convolutions may differ regarding how edge cases are handled, or on the “stride” of the convolution applied, where, for example, a stride of 3 for \((\mathbf{X} * \mathbf{Y})\) would see the kernel \(\mathbf{X}\) centred only on elements \((\mathbf{Y})_{ij}\) such that \(i\,\mathrm{mod}\,3 = 0\) and \(j\,\mathrm{mod}\,3 = 0\), reducing the number of output elements by a factor of 9. We do not consider such details here.</span></p>
			</div>

			<p>In a convolution \(\mathbf{X} * \mathbf{Y}\), the matrix \(\mathbf{X}\) is often called the “kernel” (or “filter”). Often several kernels are used in order to apply multiple convolutions. Given a tensor \(\mathcal{X} \in \mathbb{R}^{c,a,b}\) (representing \(c\) \((a,b)\)-kernels) and a matrix \(\mathbf{Y} \in \mathbb{R}^{e,f}\), we denote by \(\mathcal{X} * \mathbf{Y} \in \mathbb{R}^{c,(a + e - 1),(b + f - 1)}\) the result of the convolutions of the \(c\) first-mode slices of \(\mathcal{X}\) over \(\mathbf{Y}\) such that \((\mathcal{X} * \mathbf{Y})^{[i:\cdot:\cdot]} = \mathcal{X}^{[i:\cdot:\cdot]} * \mathbf{Y}\) for \(1 \leq i \leq c\), yielding a tensor of results for \(c\) convolutions.</p>
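			<p>The convolution defined above, and its extension to several kernels, can be sketched directly from the index arithmetic of the definition. This is an unoptimised illustration in plain Python, not how convolutional libraries implement the operator; the names <code>convolve</code> and <code>convolve_kernels</code> and the toy matrices are assumptions. The loops use the 1-based indices of the definition and treat positions outside \(\mathbf{Y}\) as zero.</p>

```python
def convolve(X, Y):
    """Convolution of kernel X over Y, following the definition's indexing."""
    a, b = len(X), len(X[0])
    e, f = len(Y), len(Y[0])
    rows, cols = a + e - 1, b + f - 1
    Z = [[0] * cols for _ in range(rows)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            for k in range(1, a + 1):
                for l in range(1, b + 1):
                    r, c = i + k - a, j + l - b
                    # Positions outside the bounds of Y contribute zero.
                    if r in range(1, e + 1) and c in range(1, f + 1):
                        Z[i - 1][j - 1] += X[k - 1][l - 1] * Y[r - 1][c - 1]
    return Z


def convolve_kernels(kernels, Y):
    """Apply each kernel (first-mode slice) to Y, one result matrix per kernel."""
    return [convolve(X, Y) for X in kernels]


# Toy data (hypothetical): a 3x3 kernel and a 4x5 matrix with Y[r-1][c-1] = 10r + c.
X = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Y = [[10 * r + c for c in range(1, 6)] for r in range(1, 5)]
Z = convolve(X, Y)
print(len(Z), len(Z[0]))  # dimensions (a + e - 1) and (b + f - 1)
```

			<p>With these inputs the result has dimensions \(6 \times 7\), its first element is the product of the bottom-right element of the kernel and the top-left element of the matrix, and <code>convolve_kernels</code> returns one such matrix per kernel.</p>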
		</div>

		<div class="formal">
			<table id="tab-kges" class="condensedTable">
				<caption>Details for selected knowledge graph embeddings, including the plausibility scoring function \(\phi(\varepsilon(s),\rho(p),\varepsilon(o))\) for edge <span class="gnode">\(s\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">\(p\)</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(o\)</span>, and other conditions</caption>
				<thead>
					<tr>
						<th><strong>Model</strong></th>
						<th>\(\phi(\varepsilon(s),\rho(p),\varepsilon(o))\)</th>
						<th><strong>Conditions</strong> (for all \(x \in V\), \(y \in L\))</th>
					</tr>
				</thead>
				<tbody>
					<tr>
						<td>TransE</td>
						<td>\(- \|\mathbf{e}_s + \mathbf{r}_p - \mathbf{e}_o\|_q\)</td>
						<td>\(\mathbf{e}_x \in \mathbb{R}^{d}\), \(\mathbf{r}_y \in \mathbb{R}^{d}\), \(q \in \{1,2\}\), \(\|\mathbf{e}_x\|_2 = 1\)</td>
					</tr>
					<tr>
						<td>TransH</td>
						<td>\(-\|(\mathbf{e}_s - (\T{\mathbf{e}_s}\mathbf{w}_p)\mathbf{w}_p) + \mathbf{r}_p - (\mathbf{e}_o - (\T{\mathbf{e}_o} \mathbf{w}_p)\mathbf{w}_p)\|^{2}_{2}\)</td>
						<td>\(\mathbf{e}_x \in \mathbb{R}^{d}\), \(\mathbf{r}_y \in \mathbb{R}^{d}\), \(\mathbf{w}_y \in \mathbb{R}^d\),<br />
							\(\|\mathbf{w}_y\|_2 = 1\) , \(\frac{\T{\mathbf{w}_y} \mathbf{r}_y}{\|\mathbf{r}_y\|_2} \approx 0\), \(\|\mathbf{e}_x\|_2 \leq 1\)</td>
					</tr>
					<tr>
						<td>TransR</td>
						<td>\(-\|\mathbf{W}_p\mathbf{e}_s + \mathbf{r}_p - \mathbf{W}_p\mathbf{e}_o\|^{2}_{2}\)</td>
						<td>\(\mathbf{e}_x \in \mathbb{R}^{d_e}\), \(\mathbf{r}_y \in \mathbb{R}^{d_r}\), \(\mathbf{W}_y \in \mathbb{R}^{d_r , d_e}\),<br />
							\(\|\mathbf{e}_x\|_2 \leq 1\), \(\|\mathbf{r}_y\|_2 \leq 1\), \(\|\mathbf{W}_y\mathbf{e}_x\|_2 \leq 1\)</td>
					</tr>
					<tr>
						<td>TransD</td>
						<td>\(-\|(\mathbf{w}_p\otimes\mathbf{w}_s + \mathbf{I})\mathbf{e}_s + \mathbf{r}_p - (\mathbf{w}_p\otimes\mathbf{w}_o + \mathbf{I})\mathbf{e}_o\|^{2}_{2}\)</td>
						<td>\(\mathbf{e}_x \in \mathbb{R}^{d_e}\), \(\mathbf{r}_y \in \mathbb{R}^{d_r}\), \(\mathbf{w}_x \in \mathbb{R}^{d_e}\), \(\mathbf{w}_y \in \mathbb{R}^{d_r}\),<br />
							\(\|\mathbf{e}_x\|_2 \leq 1\), \(\|\mathbf{r}_y\|_2 \leq 1\), \(\|(\mathbf{w}_y\otimes\mathbf{w}_x + \mathbf{I})\mathbf{e}_x\|_2 \leq 1\)</td>
					</tr>
					<tr>
						<td>RotatE</td>
						<td>\(- \|\mathbf{e}_s \odot \mathbf{r}_p - \mathbf{e}_o\|_2\)</td>
						<td>\(\mathbf{e}_x \in \mathbb{C}^{d}\), \(\mathbf{r}_y \in \mathbb{C}^{d}\), \(\|\mathbf{r}_y\|_2 = 1\)</td>
					</tr>
					<tr>
						<td>RESCAL</td>
						<td>\(\T{\mathbf{e}_s} \mathbf{R}_p \mathbf{e}_o\)</td>
						<td>\(\mathbf{e}_x \in \mathbb{R}^{d}\), \(\mathbf{R}_y \in \mathbb{R}^{d,d}\), \(\|\mathbf{e}_x\|_2 \leq 1\), \(\|\mathbf{R}_y\|_{2,2} \leq 1\)</td>
					</tr>
					<tr>
						<td>DistMult</td>
						<td>\(\T{\mathbf{e}_s} \D{\mathbf{r}_p} \mathbf{e}_o\)</td>
						<td>\(\mathbf{e}_x \in \mathbb{R}^{d}\), \(\mathbf{r}_y \in \mathbb{R}^{d}\), \(\|\mathbf{e}_x\|_2 = 1\), \(\|\mathbf{r}_y\|_2 \leq 1\)</td>
					</tr>
					<tr>
						<td>HolE</td>
						<td>\(\T{\mathbf{r}_p} (\mathbf{e}_s \star \mathbf{e}_o)\)</td>
						<td>\(\mathbf{e}_x \in \mathbb{R}^{d}\), \(\mathbf{r}_y \in \mathbb{R}^{d}\), \(\|\mathbf{e}_x\|_2 \leq 1\), \(\|\mathbf{r}_y\|_2 \leq 1\)</td>
					</tr>
					<tr>
						<td>ComplEx</td>
						<td>\(\mathrm{Re}(\T{\mathbf{e}_s} \D{\mathbf{r}_p} \overline{\mathbf{e}}_o)\)</td>
						<td>\(\mathbf{e}_x \in \mathbb{C}^{d}\), \(\mathbf{r}_y \in \mathbb{C}^{d}\), \(\|\mathbf{e}_x\|_2 \leq 1\), \(\|\mathbf{r}_y\|_2 \leq 1\)</td>
					</tr>
					<tr>
						<td>SimplE</td>
						<td>\(\frac{\T{\mathbf{e}_s} \D{\mathbf{r}_p} \mathbf{w}_o + \T{\mathbf{e}_o} \D{\mathbf{w}_p} \mathbf{w}_s}{2}\)</td>
						<td>\(\mathbf{e}_x \in \mathbb{R}^{d}\), \(\mathbf{r}_y \in \mathbb{R}^{d}\), \(\mathbf{w}_x \in \mathbb{R}^{d}\), \(\mathbf{w}_y \in \mathbb{R}^{d}\),<br />
							\(\|\mathbf{e}_x\|_2 \leq 1\), \(\|\mathbf{w}_x\|_2 \leq 1\), \(\|\mathbf{r}_y\|_2 \leq 1, \|\mathbf{w}_y\|_2 \leq 1\)</td>
					</tr>
					<tr>
						<td>TuckER</td>
						<td>\(\mathcal{W} \otimes_1 \T{\mathbf{e}_s} \otimes_2 \T{\mathbf{r}_p} \otimes_3 \T{\mathbf{e}_o}\)</td>
						<td>\(\mathbf{e}_x \in \mathbb{R}^{d_e}\), \(\mathbf{r}_y \in \mathbb{R}^{d_r}\), \(\mathcal{W} \in \mathbb{R}^{d_e , d_r , d_e }\)</td>
					</tr>
					<tr>
						<td>SME L.</td>
						<td>\(\T{(\mathbf{V}\mathbf{e}_s + \mathbf{V}'\mathbf{r}_p + \mathbf{v})} (\mathbf{W}\mathbf{e}_o + \mathbf{W}'\mathbf{r}_p + \mathbf{w})\)</td>
						<td>\(\mathbf{e}_x \in \mathbb{R}^{d}\), \(\mathbf{r}_y \in \mathbb{R}^{d}\), \(\mathbf{v} \in \mathbb{R}^w\), \(\mathbf{w} \in \mathbb{R}^w\), \(\|\mathbf{e}_x\|_2 = 1\),<br />
							\(\mathbf{V} \in \mathbb{R}^{w,d},\mathbf{V}' \in \mathbb{R}^{w,d}, \mathbf{W} \in \mathbb{R}^{w,d}, \mathbf{W}' \in \mathbb{R}^{w,d}\)</td>
					</tr>
					<tr>
						<td>SME Bi.</td>
						<td>\(\T{((\mathcal{V} \otimes_3 \T{\mathbf{r}_p}) \mathbf{e}_s + \mathbf{v})}((\mathcal{W} \otimes_3 \T{\mathbf{r}_p}) \mathbf{e}_o + \mathbf{w})\)</td>
						<td>\(\mathbf{e}_x \in \mathbb{R}^{d}\), \(\mathbf{r}_y \in \mathbb{R}^{d}\), \(\mathbf{v} \in \mathbb{R}^w\), \(\mathbf{w} \in \mathbb{R}^w\), \(\|\mathbf{e}_x\|_2 = 1\),<br />
							\(\mathcal{V} \in \mathbb{R}^{w,d,d}\), \(\mathcal{W} \in \mathbb{R}^{w,d,d}\)</td>
					</tr>
					<tr>
						<td>NTN</td>
						<td>\(\T{\mathbf{r}_p} \psi\left(\T{\mathbf{e}_s} \mathcal{W} \mathbf{e}_o + \mathbf{W} \begin{bmatrix}\mathbf{e}_s\\\mathbf{e}_o\end{bmatrix} + \mathbf{w}\right) \)</td>
						<td>\(\mathbf{e}_x \in \mathbb{R}^{d}\), \(\mathbf{r}_y \in \mathbb{R}^{d}\), \(\mathbf{w} \in \mathbb{R}^{w}\), \(\mathbf{W} \in \mathbb{R}^{w , 2d}\),<br />
							\(\mathcal{W} \in \mathbb{R}^{d,w,d}\), \(\|\mathbf{e}_x\|_2 \leq 1\), \(\|\mathbf{r}_y\|_2 \leq 1\),<br />
							\(\|\mathbf{w}\|_2 \leq 1\), \(\|\mathbf{W}\|_{2,2} \leq 1\), \(\|\mathcal{W}^{[\cdot:i:\cdot]}_{1\leq i \leq w}\|_{2,2} \leq 1\)</td>
					</tr>
					<tr>
						<td>MLP</td>
						<td>\(\T{\mathbf{v}} \psi\left(\mathbf{W} \begin{bmatrix}\mathbf{e}_s\\\mathbf{r}_p\\\mathbf{e}_o\end{bmatrix} + \mathbf{w}\right) \)</td>
						<td>\(\mathbf{e}_x \in \mathbb{R}^{d}\), \(\mathbf{r}_y \in \mathbb{R}^{d}\), \(\mathbf{v} \in \mathbb{R}^{w}\), \(\mathbf{w} \in \mathbb{R}^{w}\), \(\mathbf{W} \in \mathbb{R}^{w , 3d}\),<br />
							\(\|\mathbf{e}_x\|_2 \leq 1\), \(\|\mathbf{r}_y\|_2 \leq 1\)</td>
					</tr>
					<tr>
						<td>ConvE</td>
						<td>\(\psi\left(\T{\mathrm{vec}\left(\psi\left( \mathcal{W} * \begin{bmatrix}\mathbf{e}_s^{[a, b]}\\\mathbf{r}_p^{[a, b]}\end{bmatrix} \right)\right)} \mathbf{W}\right) \mathbf{e}_o \)</td>
						<td>\(\mathbf{e}_x \in \mathbb{R}^{d}\), \(\mathbf{r}_y \in \mathbb{R}^{d}\), \(d = ab\),<br />
							\(\mathbf{W} \in \mathbb{R}^{w_1(w_2 + 2a - 1)(w_3 + b - 1) , d}\), \(\mathcal{W} \in \mathbb{R}^{w_1 , w_2 , w_3}\)</td>
					</tr>
					<tr>
						<td>HypER</td>
						<td>\(\psi\left(\T{\mathrm{vec}\left( \T{\mathbf{r}_p} \mathcal{W} * \mathbf{e}_s \right)} \mathbf{W} \right) \mathbf{e}_o\)</td>
						<td>\(\mathbf{e}_x \in \mathbb{R}^{d_e}\), \(\mathbf{r}_y \in \mathbb{R}^{d_r}\), \(\mathbf{W} \in \mathbb{R}^{w_2(w_1 + d_e - 1) , d_e}\),<br />
							\(\mathcal{W} \in \mathbb{R}^{d_r , w_1 , w_2}\)</td>
					</tr>
				</tbody>
			</table>
		</div>
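		<p>To illustrate how the scoring functions in the table are evaluated, the following minimal sketch computes the TransE, DistMult and ComplEx scores for given embeddings using plain Python lists. The function names are assumptions made for illustration, the norm conditions listed in the table are not enforced, and no training is performed.</p>

```python
import math


def transe_score(e_s, r_p, e_o, q=2):
    """TransE: negated L1 or L2 norm of e_s + r_p - e_o."""
    diffs = [abs(s + r - o) for s, r, o in zip(e_s, r_p, e_o)]
    if q == 1:
        return -sum(diffs)
    return -math.sqrt(sum(d * d for d in diffs))


def distmult_score(e_s, r_p, e_o):
    """DistMult: e_s^T diag(r_p) e_o, i.e. a sum of element-wise triple products."""
    return sum(s * r * o for s, r, o in zip(e_s, r_p, e_o))


def complex_score(e_s, r_p, e_o):
    """ComplEx: real part of e_s^T diag(r_p) conj(e_o) over complex vectors."""
    return sum(s * r * o.conjugate() for s, r, o in zip(e_s, r_p, e_o)).real


# Hypothetical two-dimensional embeddings, for illustration only.
e_s, r_p, e_o = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
print(transe_score(e_s, r_p, e_o))    # translation lands exactly on e_o
print(distmult_score(e_s, r_p, e_o))
print(complex_score([1 + 1j], [1j], [1 - 1j]))
```

		<p>The sketch also makes one structural difference visible: swapping the source and target embeddings leaves the DistMult score unchanged, whereas the complex conjugate in ComplEx allows the two directions to receive different scores.</p>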

		<h4 id="sssec-language-models" class="subsection">Language models</h4>
		<p>Embedding techniques were first explored as a way to represent natural language within machine learning frameworks, with word2vec&nbsp;[<a href="#ref-mikolov2013efficient">Mikolov et al., 2013</a>] and GloVe&nbsp;[<a href="#ref-pennington2014glove">Pennington et al., 2014</a>] being two seminal approaches. Both approaches compute embeddings for words based on large corpora of text such that words used in similar contexts (e.g., “<code>frog</code>”, “<code>toad</code>”) have similar vectors. Word2vec uses neural networks trained either to predict the current word from surrounding words (<em>continuous bag of words</em>), or to predict the surrounding words given the current word (<em>continuous skip-gram</em>). GloVe rather applies a regression model over a matrix of co-occurrence probabilities of word pairs. Embeddings generated by both approaches have become widely used in natural language processing tasks.</p>
		<p>Another approach for graph embeddings is thus to leverage proven approaches for language embeddings. However, while a graph consists of an unordered set of sequences of three terms (i.e., a set of edges), text in natural language consists of arbitrary-length sequences of terms (i.e., sentences of words). RDF2Vec&nbsp;[<a href="#ref-ristoski2016rdf2vec">Ristoski and Paulheim, 2016</a>] thus performs (biased&nbsp;[<a href="#ref-cochez2017biased">Cochez et al., 2017a</a>]) random walks on the graph and records the paths (the sequence of nodes and edge labels traversed) as “sentences”, which are then fed as input into the word2vec&nbsp;[<a href="#ref-mikolov2013efficient">Mikolov et al., 2013</a>] model. An example of such a path extracted from Figure&nbsp;<a href="#fig-chileTransport">5.2</a> might be <span class="gnode">San&nbsp;Pedro</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">bus</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Calama</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Iquique</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santiago</span>, where the paper experiments with \(500\) paths of length \(8\) per entity. RDF2Vec also proposes a second mode where sequences are generated for nodes from canonically-labelled sub-trees of which they are a root node, where sub-trees of depth \(1\) and \(2\) are used for experiments. KGloVe&nbsp;[<a href="#ref-cochez2017global">Cochez et al., 2017b</a>] is rather based on GloVe.
Given that the original GloVe model&nbsp;[<a href="#ref-pennington2014glove">Pennington et al., 2014</a>] considers words that co-occur frequently in windows of text to be more related, KGloVe uses personalised PageRank<sup class="fnmark" id="fnm27"><a href="#fn27">27</a></sup><span class="footnote" id="fn27"><sup><a href="#fnm27">note 27</a></sup> Intuitively speaking, personalised PageRank starts at a given node and then determines the probability of a random walk being at a particular node after a given number of steps. A higher number of steps converges towards standard PageRank emphasising global node centrality in the graph, while a lower number emphasises proximity/relatedness to the starting node.</span> to determine the most related nodes to a given node, which are fed into the GloVe model.</p>
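		<p>The path-extraction step of RDF2Vec can be sketched as uniform random walks that record alternating nodes and edge labels. This is a simplified illustration under stated assumptions: the walks are unbiased, the function name <code>random_walks</code> is hypothetical, and the toy edge list contains the path mentioned above plus one edge invented for illustration. The resulting token sequences would then be passed as “sentences” to a word2vec implementation, which is not shown.</p>

```python
import random

# Toy edge list: the first three edges follow the example path in the text;
# the last edge is hypothetical and only adds a branching point.
EDGES = [
    ("San Pedro", "bus", "Calama"),
    ("Calama", "flight", "Iquique"),
    ("Iquique", "flight", "Santiago"),
    ("Calama", "flight", "Santiago"),
]


def random_walks(edges, start, num_walks, max_hops, seed=0):
    """Return num_walks walks from start as [node, label, node, ...] lists."""
    rng = random.Random(seed)
    outgoing = {}
    for source, label, target in edges:
        outgoing.setdefault(source, []).append((label, target))
    walks = []
    for _ in range(num_walks):
        node, walk = start, [start]
        for _ in range(max_hops):
            if node not in outgoing:  # dead end: stop this walk early
                break
            label, node = rng.choice(outgoing[node])
            walk += [label, node]
        walks.append(walk)
    return walks


for sentence in random_walks(EDGES, "San Pedro", num_walks=3, max_hops=3):
    print(" ".join(sentence))
```

		<p>Each printed line is one “sentence” in which both node names and edge labels act as words, so that nodes traversed in similar contexts would receive similar vectors once embedded.</p>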

		<h4 id="sssec-entailment-aware-models" class="subsection">Entailment-aware models</h4>
		<p>The embeddings thus far consider the data graph alone. But what if an ontology or set of rules is provided? Such deductive knowledge could be used to improve the embeddings. One approach is to use constraint rules to refine the predictions made by embeddings; for example, <a href="#ref-WangWG15">Wang et al. [2015]</a> use functional and inverse-functional definitions as constraints (under UNA) such that, for example, if we define that an event can have at most one value for <span class="gelab">venue</span>, this is used to lower the plausibility of edges that would assign multiple venues to an event.</p>
		<p>More recent approaches rather propose joint embeddings that consider both the data graph and rules when computing embeddings. KALE&nbsp;[<a href="#ref-GuoWWWG16">Guo et al., 2016</a>] computes entity and relation embeddings using a translational model (specifically TransE) that is adapted to further consider rules using <em>t-norm fuzzy logics</em>. With reference to Figure&nbsp;<a href="#fig-chileTransport">5.2</a>, consider a simple rule <span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">bus</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?y</span> \(\Rightarrow\) <span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">connects&nbsp;to</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?y</span>. We can use embeddings to assign plausibility scores to new edges, such as \(e_1\): <span class="gnode">Piedras&nbsp;Rojas</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">bus</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Moon&nbsp;Valley</span>. We can further apply the previous rule to generate a new edge \(e_2\): <span class="gnode">Piedras&nbsp;Rojas</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">connects&nbsp;to</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Moon&nbsp;Valley</span> from the predicted edge \(e_1\). But what plausibility should we assign to this second edge? Letting \(p_1\) and \(p_2\) be the current plausibility scores of \(e_1\) and \(e_2\) (initialised using the standard embedding), then t-norm fuzzy logics suggests that the plausibility be updated as \(p_1p_2 - p_1 + 1\). 
Embeddings are then trained to jointly assign larger plausibility scores to positive examples versus negative examples of both edges and <em>ground rules</em>. An example of a positive ground rule based on Figure&nbsp;<a href="#fig-chileTransport">5.2</a> would be <span class="gnode">Arica</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">bus</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">San&nbsp;Pedro</span> \(\Rightarrow\) <span class="gnode">Arica</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">connects&nbsp;to</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">San&nbsp;Pedro</span>. Negative ground rules randomly replace the relation in the head of the rule; for example, <span class="gnode">Arica</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">bus</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">San&nbsp;Pedro</span> \(\not\Rightarrow\) <span class="gnode">Arica</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">San&nbsp;Pedro</span>. <a href="#ref-GuoWWWG18">Guo et al. [2018]</a> later propose RUGE, which uses a joint model over ground rules (possibly soft rules with confidence scores) and plausibility scores to align both forms of scoring for unseen edges.</p>
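		<p>The expression \(p_1p_2 - p_1 + 1\) given above can be explored numerically with a small sketch. The function name <code>rule_truth</code> and the sample scores are assumptions made for illustration, and the sketch assumes that plausibility scores lie in \([0,1]\).</p>

```python
def rule_truth(p_body, p_head):
    """Score for a ground rule body => head, per the expression p1*p2 - p1 + 1."""
    return p_body * p_head - p_body + 1.0


# Hypothetical plausibility scores for the body edge e1 and the head edge e2.
for p1, p2 in [(1.0, 1.0), (1.0, 0.0), (0.0, 0.3), (0.8, 0.5)]:
    print(p1, p2, rule_truth(p1, p2))
```

		<p>The boundary cases behave like classical implication: the score is \(1\) when the body is fully implausible or the head is fully plausible, and drops to \(0\) only when a fully plausible body is paired with a fully implausible head.</p>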
		<p>Generating ground rules can be costly. An alternative approach, called FSL&nbsp;[<a href="#ref-DemeesterRR16">Demeester et al., 2016</a>], observes that in the case of a simple rule, such as <span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">bus</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?y</span> \(\Rightarrow\) <span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">connects&nbsp;to</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?y</span>, the relation embedding <span class="gelab">bus</span> should always return a lower plausibility than <span class="gelab">connects to</span>. Thus, for all such rules, FSL proposes to train relation embeddings while avoiding violations of such inequalities. Though relatively straightforward, FSL only supports simple rules, whereas KALE also supports more complex rules.</p>
		<p>These works exemplify how deductive and inductive forms of knowledge – in this case rules and embeddings – can interplay and complement each other.</p>
		</section>

		<section id="ssec-gnns" class="section">
		<h3>Graph Neural Networks</h3>
		<p>While embeddings aim to provide a dense numerical representation of graphs suitable for use within existing machine learning models, another approach is to build custom machine learning models adapted for graph-structured data. Most custom learning models for graphs are based on (artificial) neural networks&nbsp;[<a href="#ref-abs-1901-00596">Wu et al., 2019</a>], exploiting a natural correspondence between both: a neural network already corresponds to a weighted, directed graph, where nodes serve as artificial neurons, and edges serve as weighted connections (axons). However, the typical topology of a traditional neural network – more specifically, a fully-connected feed-forward neural network – is quite homogeneous, being defined in terms of sequential layers of nodes where each node in one layer is connected to all nodes in the next layer. Conversely, the topology of a data graph is quite heterogeneous, being determined by the relations between entities that its edges represent.</p>
		<p>A <em>graph neural network</em> (GNN)&nbsp;[<a href="#ref-ScarselliGTHM09">Scarselli et al., 2009</a>] builds a neural network based on the topology of the data graph; i.e., nodes are connected to their neighbours per the data graph. Typically a model is then learnt to map input features for nodes to output features in a supervised manner; output features of the example nodes used for training may be manually labelled, or may be taken from the knowledge graph. Unlike knowledge graph embeddings, GNNs support end-to-end supervised learning for specific tasks: given a set of labelled examples, GNNs can be used to classify elements of the graph or the graph itself. GNNs have been used to perform classification over graphs encoding compounds, objects in images, documents, etc.; as well as to predict traffic, build recommender systems, verify software, etc.&nbsp;[<a href="#ref-abs-1901-00596">Wu et al., 2019</a>]. Given labelled examples, GNNs can even replace graph algorithms; for example, GNNs have been used to find central nodes in knowledge graphs in a supervised manner&nbsp;[<a href="#ref-ScarselliGTHM09">Scarselli et al., 2009</a>, <a href="#ref-ParkKDZF19">Park et al., 2019</a>, <a href="#ref-ParkKDZF20">Park et al., 2020</a>].</p>
		<p>We now discuss the ideas underlying two main flavours of GNN, specifically, <em>recursive GNNs</em> and <em>non-recursive GNNs</em>.</p>

		<h4 id="sssec-recursive-gnn" class="subsection">Recursive graph neural networks</h4>
		<p>Recursive graph neural networks (RecGNNs) are the seminal approach to graph neural networks&nbsp;[<a href="#ref-SperdutiS97">Sperduti and Starita, 1997</a>, <a href="#ref-ScarselliGTHM09">Scarselli et al., 2009</a>]. The approach is conceptually similar to the systolic abstraction illustrated in Figure&nbsp;<a href="#fig-pagerank">5.3</a>, where messages are passed between neighbours towards recursively computing some result. However, rather than defining the functions used to decide the messages to pass, we label the output of a training set of nodes and let the framework learn the functions that generate the expected output, thereafter applying them to label other examples.</p>
		<p>In a seminal paper, <a href="#ref-ScarselliGTHM09">Scarselli et al. [2009]</a> proposed what they generically call a graph neural network (GNN), which takes as input a directed graph where nodes and edges are associated with <em>feature vectors</em> that can capture node and edge labels, weights, etc. These feature vectors remain fixed throughout the process. Each node in the graph is also associated with a <em>state vector</em>, which is recursively updated based on information from the node’s neighbours – i.e., the feature and state vectors of the neighbouring nodes and the feature vectors of the edges extending to/from them – using a parametric function, called the <em>transition function</em>. A second parametric function, called the <em>output function</em>, is used to compute the final output for a node based on its own feature and state vector. These functions are applied recursively up to a fixpoint. Both parametric functions can be implemented using neural networks where, given a partial set of <em>supervised nodes</em> in the graph – i.e., nodes labelled with their desired output – parameters for the transition and output functions can be learnt that best approximate the supervised outputs. 
The result can thus be seen as a recursive neural network architecture.<sup class="fnmark" id="fnm28"><a href="#fn28">28</a></sup><span class="footnote" id="fn28"><sup><a href="#fnm28">note 28</a></sup> Some authors refer to such architectures as <em>recurrent graph neural networks</em>, observing that the internal state maintained for nodes can be viewed as a form of recurrence over a sequence of transitions.</span> To ensure convergence up to a fixpoint, certain restrictions are applied, namely that the transition function be a <em>contractor</em>, meaning that upon each application of the function, points in the numeric space are brought closer together (intuitively, in this case, the numeric space “shrinks” upon each application, ensuring convergence to a unique fixpoint).</p>
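		<p>To make the contraction condition concrete, it can be stated in a standard form; the generic norm \(\|\cdot\|\) and the constant \(\mu\) are notational assumptions introduced here for illustration. A function \(f\) is a contractor if there exists \(0 \leq \mu &lt; 1\) such that, for all points \(\mathbf{x}\) and \(\mathbf{y}\):</p>
		<p class="mathblock">\(\|f(\mathbf{x}) - f(\mathbf{y})\| \leq \mu\,\|\mathbf{x} - \mathbf{y}\|\)</p>
		<p>As a simple scalar illustration, \(f(x) = 0.5x + 1\) is a contractor with \(\mu = 0.5\). Starting from \(x = 0\), repeated application yields \(1, 1.5, 1.75, 1.875, \ldots\), while starting from \(x = 10\) yields \(6, 4, 3, 2.5, \ldots\); both sequences converge to the same fixpoint \(x = 2\), for which \(f(2) = 2\).</p>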
		<p>To illustrate, suppose that we wish to find priority locations for creating new tourist information offices. A good strategy would be to install them in hubs from which many tourists visit popular destinations. Along these lines, in Figure&nbsp;<a href="#fig-gnn">5.7</a> we illustrate the GNN architecture proposed by <a href="#ref-ScarselliGTHM09">Scarselli et al. [2009]</a> for a sub-graph of Figure&nbsp;<a href="#fig-chileTransport">5.2</a>, where we highlight the neighbourhood of <span class="gnode">Punta Arenas</span>. In this graph, nodes are annotated with feature vectors (\(\mathbf{n}_x\)) and hidden states at step \(t\) (\(\mathbf{h}_x^{(t)}\)), while edges are annotated with feature vectors (\(\mathbf{a}_{xy}\)). Feature vectors for nodes may, for example, one-hot encode the type of node (<em>City</em>, <em>Attraction</em>, etc.), directly encode statistics such as the number of tourists visiting per year, etc. Feature vectors for edges may, for example, one-hot encode the edge label (the type of transport), directly encode statistics such as the distance or number of tickets sold per year, etc. Hidden states can be randomly initialised. The right-hand side of Figure&nbsp;<a href="#fig-gnn">5.7</a> provides the GNN transition and output functions, where \(\mathrm{N}(x)\) denotes the neighbouring nodes of \(x\), \(f_{\mathbf{w}}(\cdot)\) denotes the transition function with parameters \(\mathbf{w}\), and \(g_{\mathbf{w}'}(\cdot)\) denotes the output function with parameters \(\mathbf{w'}\). An example is also provided for Punta Arenas (\(x = 1\)). These functions will be recursively applied until a fixpoint is reached. To train the network, we can label examples of places that already have (or should have) tourist offices and places that do (or should) not have tourist offices. These labels may be taken from the knowledge graph, or may be added manually. 
The GNN can then learn parameters \(\mathbf{w}\) and \(\mathbf{w'}\) that give the expected output for the labelled examples, which can subsequently be used to label other nodes.</p>

		<p>This GNN model is flexible and can be adapted in various ways&nbsp;[<a href="#ref-ScarselliGTHM09">Scarselli et al., 2009</a>]: we may define neighbouring nodes differently, for example to include nodes for outgoing edges, or nodes one or two hops away; we may allow pairs of nodes to be connected by multiple edges with different vectors; we may consider transition and output functions with distinct parameters for each node; we may add states and outputs for edges; we may change the sum to another aggregation function; etc.</p>

		<figure id="fig-gnn">
			<img src="images/fig-gnn.svg" alt="On the left, a sub-graph of Figure&nbsp;5.2 highlighting the neighbourhood of Punta Arenas, where nodes are annotated with feature vectors (n_x) and hidden states at step t (h_x^{(t)}), and edges are annotated with feature vectors (a_{xy}); on the right, the GNN transition and output functions proposed by Scarselli et al. and an example for Punta Arenas (x = 1), where N(x) denotes the neighbouring nodes of x, f_w(·) denotes the transition function with parameters w and g_{w'}(·) denotes the output function with parameters w'" class="multi" />
			<div style="height:1.9em;">&nbsp;</div>
			<table id="tab-gnn">
				<tr>
					<td>\(\mathbf{h}_x^{(t)} \coloneqq\)</td>
					<td>\(\sum_{y \in \textrm{N}(x)} f_\mathbf{w}(\mathbf{n}_{x},\mathbf{n}_{y},\mathbf{a}_{yx},\mathbf{h}_{y}^{(t-1)})\)</td>
				</tr>
				<tr>
					<td>\(\mathbf{o}_x^{(t)} \coloneqq\)</td>
					<td>\(g_{\mathbf{w}'}(\mathbf{h}_x^{(t)},\mathbf{n}_x)\)</td>
				</tr>
				<tr>
					<td>\(\mathbf{h}_1^{(t)} \coloneqq\)</td>
					<td>\(f_\mathbf{w}(\mathbf{n}_{1},\mathbf{n}_{3},\mathbf{a}_{31},\mathbf{h}_{3}^{(t-1)})\)</td>
				</tr>
				<tr>
					<td></td>
					<td>\(+ f_\mathbf{w}(\mathbf{n}_{1},\mathbf{n}_{4},\mathbf{a}_{41},\mathbf{h}_{4}^{(t-1)})\)</td>
				</tr>
				<tr>
					<td>\(\mathbf{o}_1^{(t)} \coloneqq\)</td>
					<td>\(g_{\mathbf{w}'}(\mathbf{h}_1^{(t)},\mathbf{n}_1)\)</td>
				</tr>
				<tr>
					<td>\(\ldots\)</td>
					<td></td>
				</tr>
			</table>
			<div style="height:2.5em;">&nbsp;</div>
			<figcaption>On the left, a sub-graph of Figure&nbsp;<a href="#fig-chileTransport">5.2</a> highlighting the neighbourhood of Punta Arenas, where nodes are annotated with feature vectors (\(\mathbf{n}_x\)) and hidden states at step \(t\) (\(\mathbf{h}_x^{(t)}\)), and edges are annotated with feature vectors (\(\mathbf{a}_{xy}\)); on the right, the GNN transition and output functions proposed by <a href="#ref-ScarselliGTHM09">Scarselli et al. [2009]</a> and an example for Punta Arenas (\(x = 1\)), where \(\mathrm{N}(x)\) denotes the neighbouring nodes of \(x\), \(f_{\mathbf{w}}(\cdot)\) denotes the transition function with parameters \(\mathbf{w}\) and \(g_{\mathbf{w}'}(\cdot)\) denotes the output function with parameters \(\mathbf{w'}\)</figcaption>
		</figure>

		<div class="formal">
			<p>We now define a recursive graph neural network. We assume that the GNN accepts a directed vector-labelled graph as input (see Definition&nbsp;<a href="#def-dvlg">5.1</a>).</p>

			<dl class="definition" id="def-recursive-graph-neural-network">
				<dt>Recursive graph neural network</dt>
				<dd>A <em>recursive graph neural network</em> (<em>RecGNN</em>) is a pair of functions \(\mathfrak{R} \coloneqq (\)<span class="sc">Agg</span>, <span class="sc">Out</span>\()\), such that (with \(a, b, c \in \mathbb{N}\)):
					<ul>
						<li><span class="sc">Agg</span>\(: \mathbb{R}^a \times 2^{(\mathbb{R}^a \times \mathbb{R}^b) \rightarrow \mathbb{N}} \rightarrow \mathbb{R}^a\)</li>
						<li><span class="sc">Out</span>\(: \mathbb{R}^a \rightarrow \mathbb{R}^c\)</li>
					</ul>
				</dd>
			</dl>

			<p>The function <span class="sc">Agg</span> computes a new feature vector for a node, given its previous feature vector and the feature vectors of the nodes and edges forming its neighbourhood; the function <span class="sc">Out</span> transforms the final feature vector computed by <span class="sc">Agg</span> for a node to the output vector for that node. We assume that \(a\) and \(b\) correspond to the dimensions of the input node and edge vectors, respectively, while \(c\) denotes the dimension of the output vector for each node. Given a RecGNN \(\mathfrak{R} = (\)<span class="sc">Agg</span>, <span class="sc">Out</span>\()\), a directed vector-labelled graph \(G = (V,E,F,\lambda)\), and a node \(u \in V\), we define the output vector assigned to node \(u\) in \(G\) by \(\mathfrak{R}\) (written \(\mathfrak{R}(G,u)\)) as follows. First let \(\mathbf{n}_u^{(0)} \coloneqq \lambda(u)\). For all \(i \geq 1\), let:</p>
			<p class="mathblock">\(\mathbf{n}_u^{(i)} \coloneqq\) <span class="sc">Agg</span> \(\left( \mathbf{n}_u^{(i-1)}, \{\!\!\{ (\mathbf{n}_v^{(i-1)},\lambda(v,u)) \mid (v,u) \in E \}\!\!\} \right) \)</p>
			<p>If \(j \geq 1\) is an integer such that \(\mathbf{n}_u^{(j)} = \mathbf{n}_u^{(j-1)}\) for all \(u \in V\), then \(\mathfrak{R}(G,u) \coloneqq\) <span class="sc">Out</span>\((\mathbf{n}_u^{(j)})\).</p>
			<p>In a RecGNN, the same aggregation function (<span class="sc">Agg</span>) is applied recursively until a fixpoint is reached, at which point an output function (<span class="sc">Out</span>) creates the final output vector for each node. While in practice RecGNNs will often consider a static feature vector and a dynamic state vector&nbsp;[<a href="#ref-ScarselliGTHM09">Scarselli et al., 2009</a>], we can more concisely encode this as one vector, where part may remain static throughout the aggregation process representing input features, and part may be dynamically computed representing the state. In practice, <span class="sc">Agg</span> and <span class="sc">Out</span> are often based on parametric combinations of vectors, with the parameters learnt based on a sample of output vectors for labelled nodes.</p>

			<div class="example">
				<p>The aggregation function for the GNN of <a href="#ref-ScarselliGTHM09">Scarselli et al. [2009]</a> is given as:</p>
				<p class="mathblock"><span class="sc">Agg</span>\((\mathbf{n}_u,N) \coloneqq \sum_{(\mathbf{n}_v,\mathbf{a}_{vu})\in N}f_{\mathbf{w}}(\mathbf{n}_u,\mathbf{n}_v,\mathbf{a}_{vu})\)</p>
				<p>where \(f_{\mathbf{w}}(\cdot)\) is a contraction function with parameters \(\mathbf{w}\). The output function is defined as:</p>
				<p class="mathblock"><span class="sc">Out</span>\(\left( \mathbf{n}_u \right) \coloneqq g_{\mathbf{w}'}(\mathbf{n}_u)\)</p>
				<p>where again \(g_{\mathbf{w}'}(\cdot)\) is a function with parameters \(\mathbf{w}'\). Given a set of nodes labelled with their expected output vectors, the parameters \(\mathbf{w}\) and \(\mathbf{w}'\) are learnt.</p>
			</div>
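			<p>As a rough illustration of the fixpoint computation, the following Python sketch evaluates a RecGNN with a sum-based aggregation in the style of the example above. The function names (<code>agg_sum</code>, <code>run_recgnn</code>), the dictionary-based graph encoding, the fixed weight <code>w</code>, the tolerance <code>tol</code> and the iteration cap are all assumptions made for illustration. In particular, \(f_{\mathbf{w}}\) is replaced by a hand-picked linear map rather than a learnt one, and exact equality of successive vectors is relaxed to a numerical tolerance.</p>

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Neighbourhood = List[Tuple[Vector, Vector]]  # multiset of (n_v, lambda(v,u)) pairs


def agg_sum(n_u: Vector, neighbours: Neighbourhood, w: float = 0.25) -> Vector:
    """Sum-based Agg with a hypothetical f_w(n_u, n_v, a_vu) = w * n_v + a_vu.

    This stand-in ignores n_u and assumes that node and edge vectors have
    the same dimension; a learnt f_w would replace it in practice.
    """
    total = [0.0] * len(n_u)
    for n_v, a_vu in neighbours:
        for k in range(len(total)):
            total[k] += w * n_v[k] + a_vu[k]
    return total


def out_identity(n_u: Vector) -> Vector:
    """Trivial Out function (assumption): return the state unchanged."""
    return list(n_u)


def run_recgnn(
    node_vecs: Dict[str, Vector],
    edge_vecs: Dict[Tuple[str, str], Vector],  # key (v, u) is an edge from v to u
    agg: Callable[[Vector, Neighbourhood], Vector],
    out: Callable[[Vector], Vector],
    tol: float = 1e-9,
    max_iters: int = 1000,
) -> Dict[str, Vector]:
    """Apply agg to all nodes synchronously until the states stop changing."""
    state = {u: list(vec) for u, vec in node_vecs.items()}
    for _ in range(max_iters):
        new_state = {}
        for u in state:
            # Gather the vectors of incoming neighbours and their edges.
            neighbours = [(state[v], a) for (v, t), a in edge_vecs.items() if t == u]
            new_state[u] = agg(state[u], neighbours)
        delta = max(
            (abs(x - y) for u in state for x, y in zip(new_state[u], state[u])),
            default=0.0,
        )
        state = new_state
        if delta <= tol:  # approximate fixpoint reached
            return {u: out(vec) for u, vec in state.items()}
    raise RuntimeError("no fixpoint reached within max_iters")


# Hypothetical three-node graph with a cycle between B and C.
NODE_VECS = {"A": [0.5], "B": [0.0], "C": [0.0]}
EDGE_VECS = {("A", "B"): [1.0], ("B", "C"): [0.0], ("C", "B"): [0.0]}

if __name__ == "__main__":
    print(run_recgnn(NODE_VECS, EDGE_VECS, agg_sum, out_identity))
```

			<p>With a small weight and low in-degree, as here, the update shrinks differences between successive states and the loop terminates quickly; for other weights or graphs, convergence of this stand-in is not guaranteed, which is why the sketch caps the number of iterations.</p>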

			<p>There are notable similarities between graph parallel frameworks (GPFs; see Definition&nbsp;<a href="#def-gpf">5.2</a>) and RecGNNs. While we defined GPFs using separate <span class="sc">Msg</span> and <span class="sc">Agg</span> functions, this is not essential: conceptually they could be defined in a similar way to RecGNN, with a single <span class="sc">Agg</span> function that “pulls” information from its neighbours (we maintain <span class="sc">Msg</span> to more closely reflect how GPFs are defined/implemented in practice). The key difference between GPFs and GNNs is that in the former, the functions are defined by the user, while in the latter, the functions are generally learnt from labelled examples. Another difference arises from the termination condition present in GPFs, though often the GPF’s termination condition will – like in RecGNNs – reflect convergence to a fixpoint.</p>
		</div>

		<h4 id="sssec-convolutional-gnn" class="subsection">Non-recursive graph neural networks</h4>
		<p>GNNs can also be defined in a non-recursive manner, where a fixed number of layers are applied over the input in order to generate the output. A benefit of this approach is that we do not need to worry about convergence since the process is non-recursive. Also, each layer will often have independent parameters, representing different transformation steps. Naively, a downside is that adding many layers could give rise to a high number of parameters. Addressing this problem, a popular approach for non-recursive GNNs is to use convolutional neural networks.</p>
		<p>Convolutional neural networks (CNNs) have gained a lot of attention, in particular, for machine learning tasks involving images&nbsp;[<a href="#ref-KrizhevskySH17">Krizhevsky et al., 2017</a>]. The core idea in the image setting is to train and apply small kernels (aka filters) over localised regions of an image using a convolution operator to extract features from that local region. When applied to all local regions, the convolution outputs a feature map of the image. Since the kernels are small, and are applied multiple times to different regions of the input, the number of parameters to train is reduced. Typically multiple kernels can thus be applied, forming multiple convolutional layers.</p>
		<p>One may note that in GNNs and CNNs, operators are applied over local regions of the input data. In the case of GNNs, the transition function is applied over a node and its neighbours in the graph. In the case of CNNs, the convolution is applied on a pixel and its neighbours in the image. Following this intuition, a number of <em>convolutional graph neural networks</em> (<em>ConvGNNs</em>)&nbsp;[<a href="#ref-BrunaZSL13">Bruna et al., 2014</a>, <a href="#ref-KipfW17">Kipf and Welling, 2017</a>, <a href="#ref-abs-1901-00596">Wu et al., 2019</a>] have been proposed, where the transition function is implemented by means of convolutions. A key consideration for ConvGNNs is how regions of a graph are defined. Unlike the pixels of an image, nodes in a graph may have varying numbers of neighbours. This creates a challenge: a benefit of CNNs is that the same kernel can be applied over all the regions of an image, but this requires more careful consideration in the case of ConvGNNs since neighbourhoods of different nodes can be diverse. Approaches to address these challenges involve working with spectral (e.g.&nbsp;[<a href="#ref-BrunaZSL13">Bruna et al., 2014</a>, <a href="#ref-KipfW17">Kipf and Welling, 2017</a>]) or spatial (e.g.,&nbsp;[<a href="#ref-MontiBMRSB17">Monti et al., 2017</a>]) representations of graphs that induce a more regular structure from the graph. An alternative is to use an attention mechanism&nbsp;[<a href="#ref-VelickovicCCRLB18">Velickovic et al., 2018</a>] to <em>learn</em> the nodes whose features are most important to the current node.</p>
		
		<div class="formal">
			<p>Next we abstractly define a non-recursive graph neural network.</p>

			<dl class="definition" id="def-non-recursive-graph-neural-network">
				<dt>Non-recursive graph neural network</dt>
				<dd>A <em>non-recursive graph neural network</em> (<em>NRecGNN</em>) with \(l\) layers is an \(l\)-tuple of functions <span class="nobreak">\(\mathfrak{N} \coloneqq (\)<span class="sc">Agg</span>\(^{(1)},\ldots,\) <span class="sc">Agg</span>\(^{(l)} )\),</span> such that, for \(1 \leq k \leq l\) (with \(a_0, \ldots, a_l, b \in \mathbb{N}\)), <span class="nobreak"><span class="sc">Agg</span>\(^{(k)}: \mathbb{R}^{a_{k-1}} \times 2^{(\mathbb{R}^{a_{k-1}} \times \mathbb{R}^b) \rightarrow \mathbb{N}} \rightarrow \mathbb{R}^{a_{k}}\)</span>.</dd>
			</dl>

			<p>Each function <span class="sc">Agg</span>\(^{(k)}\) (as before) computes a new feature vector for a node, given its previous feature vector and the feature vectors of the nodes and edges forming its neighbourhood. We assume that \(a_0\) and \(b\) correspond to the dimensions of the input node and edge vectors, respectively, where each function <span class="sc">Agg</span>\(^{(k)}\) for \(2 \leq k \leq l\) accepts as input node vectors of the same dimension as the output of the function <span class="sc">Agg</span>\(^{(k-1)}\). Given an NRecGNN \(\mathfrak{N} = (\) <span class="sc">Agg</span>\(^{(1)},\ldots,\) <span class="sc">Agg</span>\(^{(l)} )\), a directed vector-labelled graph \(G = (V,E,F,\lambda)\), and a node \(u \in V\), we define the output vector assigned to node \(u\) in \(G\) by \(\mathfrak{N}\) (written \(\mathfrak{N}(G,u)\)) as follows. First let \(\mathbf{n}_u^{(0)} \coloneqq \lambda(u)\). For all \(1 \leq i \leq l\), let:</p>
			<p class="mathblock">\(\mathbf{n}_u^{(i)} \coloneqq\) <span class="sc">Agg</span>\(^{(i)} \left( \mathbf{n}_u^{(i-1)}, \{\!\!\{ (\mathbf{n}_v^{(i-1)},\lambda(v,u)) \mid (v,u) \in E \}\!\!\} \right) \)</p>
			<p>Then \(\mathfrak{N}(G,u) \coloneqq \mathbf{n}_u^{(l)}\).</p>
			<p>In an \(l\)-layer NRecGNN, a different aggregation function can be applied at each step (i.e., in each layer), up to a fixed number of steps \(l\). We do not consider a separate <span class="sc">Out</span> function as it can be combined with the final aggregation function <span class="sc">Agg</span>\(^{(l)}\). When the aggregation functions use a convolutional operator based on kernels learned from labelled examples, we call the result a <em>convolutional graph neural network</em> (<em>ConvGNN</em>). We refer to the survey by <a href="#ref-abs-1901-00596">Wu et al. [2019]</a> for discussion of ConvGNNs proposed in the literature.</p>
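			<p>The layered evaluation can be sketched in Python as follows. This is only an illustration under stated assumptions: the names <code>run_nrecgnn</code>, <code>agg1</code> and <code>agg2</code>, the dictionary-based graph encoding, and the two hand-written layer functions (which map node vectors from dimension 1 to 2 and then from 2 to 1) are hypothetical, and stand in for aggregation functions whose parameters would normally be learnt.</p>

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Neighbourhood = List[Tuple[Vector, Vector]]  # multiset of (n_v, lambda(v,u)) pairs
AggFn = Callable[[Vector, Neighbourhood], Vector]


def run_nrecgnn(
    node_vecs: Dict[str, Vector],
    edge_vecs: Dict[Tuple[str, str], Vector],  # key (v, u) is an edge from v to u
    layers: List[AggFn],
) -> Dict[str, Vector]:
    """Apply Agg^(1), ..., Agg^(l) once each, in order, to every node."""
    state = {u: list(vec) for u, vec in node_vecs.items()}
    for agg in layers:
        # All nodes are updated from the previous layer's vectors.
        state = {
            u: agg(
                state[u],
                [(state[v], a) for (v, t), a in edge_vecs.items() if t == u],
            )
            for u in state
        }
    return state


def agg1(n_u: Vector, neighbours: Neighbourhood) -> Vector:
    """Hypothetical layer 1 (dimension 1 -> 2): keep the own feature and add
    a second feature summing neighbour values weighted by the edge value."""
    return [n_u[0], sum(n_v[0] * a[0] for n_v, a in neighbours)]


def agg2(n_u: Vector, neighbours: Neighbourhood) -> Vector:
    """Hypothetical layer 2 (dimension 2 -> 1): add both own features and
    the second feature of each incoming neighbour."""
    return [n_u[0] + n_u[1] + sum(n_v[1] for n_v, _ in neighbours)]


# Hypothetical path graph A -> B -> C.
NODE_VECS = {"A": [1.0], "B": [2.0], "C": [3.0]}
EDGE_VECS = {("A", "B"): [1.0], ("B", "C"): [2.0]}

if __name__ == "__main__":
    print(run_nrecgnn(NODE_VECS, EDGE_VECS, [agg1, agg2]))
```

			<p>Unlike the recursive case, no convergence test is needed: the loop runs exactly once per layer, and the output of the last layer is returned directly.</p>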
			<p>We have considered GNNs that define the neighbourhood of a node based on its incoming edges. These definitions can be adapted to also consider outgoing neighbours by either adding inverse edges to the directed vector-labelled graph in pre-processing, or by adding outgoing neighbours as arguments to the <span class="sc">Agg</span>\((\cdot)\) function. More generally, GNNs (and indeed GPFs) relying solely on the neighbourhood of each node have limited expressivity in terms of their ability to distinguish nodes and graphs&nbsp;[<a href="#ref-XuHLJ19">Xu et al., 2019</a>]; for example, <a href="#ref-BarceloKMPRS20">Barceló et al. [2020]</a> show that such NRecGNNs have a similar expressiveness for classifying nodes as the \(\mathcal{ALCQ}\) Description Logic discussed in Section&nbsp;<a href="#sssec-dls">4.3.2</a>. More expressive GNN variants have been proposed that allow the aggregation functions to access and update a globally shared vector&nbsp;[<a href="#ref-BarceloKMPRS20">Barceló et al., 2020</a>]. We refer to the papers by <a href="#ref-XuHLJ19">Xu et al. [2019]</a> and <a href="#ref-BarceloKMPRS20">Barceló et al. [2020]</a> for further discussion.</p>
		</div>
		</section>

		<section id="ssec-symlearn" class="section">
		<h3>Symbolic Learning</h3>
		<p>The supervised techniques discussed thus far – namely knowledge graph embeddings and graph neural networks – learn numerical models over graphs. However, such models are often difficult to explain or understand. For example, taking the graph of Figure&nbsp;<a href="#fig-airports">5.8</a>, knowledge graph embeddings might predict the edge <span class="gnode">SCL</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">ARI</span> as being highly plausible, but they will not provide an interpretable model to help understand why this is the case: the reason for the result may lie in a matrix of parameters learnt to fit a plausibility score on training data. Such approaches also suffer from the <em>out-of-vocabulary</em> problem, where they are unable to provide results for edges involving previously unseen nodes or edge labels; for example, if we add an edge <span class="gnode">SCL</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">CDG</span>, where <span class="gnode">CDG</span> is new to the graph, a knowledge graph embedding will not have the entity embedding for <span class="gnode">CDG</span> and would need to be retrained in order to estimate the plausibility of an edge <span class="gnode">CDG</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">SCL</span>.</p>

		<figure id="fig-airports">
			<img src="images/fig-airports.svg" alt="An incomplete directed edge-labelled graph describing flights between airports"/>
			<figcaption>An incomplete directed edge-labelled graph describing flights between airports</figcaption>
		</figure>

		<p>An alternative (sometimes complementary) approach is to adopt <em>symbolic learning</em> in order to learn <em>hypotheses</em> in a symbolic (logical) language that “explain” a given set of positive and negative edges. These edges are typically generated from the knowledge graph in an automatic manner (similar to the case of knowledge graph embeddings). The hypotheses then serve as interpretable models that can be used for further deductive reasoning. Given the graph of Figure&nbsp;<a href="#fig-airports">5.8</a>, we may, for example, learn the rule <span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?y</span> \(\Rightarrow\) <span class="gvar">?y</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?x</span> from observing that <span class="gelab">flight</span> routes tend to be return routes. Alternatively, rather than learn rules, we might learn a DL axiom from the graph stating that airports are either domestic, international, or both: <code>Airport</code> \(\sqsubseteq\) <code>DomesticAirport</code> \(\sqcup\) <code>InternationalAirport</code>. Such rules and axioms can then be used for deductive reasoning, and offer an interpretable model for new knowledge that is entailed/predicted; for example, from the aforementioned rule for return flights, one can interpret why a novel edge <span class="gnode">SCL</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">ARI</span> is predicted. 
This further offers domain experts the opportunity to verify the models – e.g., the rules and axioms – derived by such processes. Finally, rules/axioms are quantified (<em>all</em> flights have a return flight, <em>all</em> airports are domestic or international, etc.), so they can be applied to unseen examples (e.g., with the aforementioned rule, we can derive <span class="gnode">CDG</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">SCL</span> from a new edge <span class="gnode">SCL</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">CDG</span> with the unseen node <span class="gnode">CDG</span>).</p>
		<p>In this section, we discuss two forms of symbolic learning: <em>rule mining</em>, which learns rules, and <em>axiom mining</em>, which learns other forms of logical axioms.</p>

		<h4 id="sssec-rule-mining" class="subsection">Rule mining</h4>
		<p>Rule mining, in the general sense, refers to discovering meaningful patterns in the form of rules from large collections of background knowledge. In the context of knowledge graphs, we assume a set of positive and negative edges as given. Typically positive edges are observed edges (i.e., those given or entailed by a knowledge graph) while negative edges are defined according to a given assumption of completeness (discussed later). The goal of rule mining is to identify new rules that entail a high ratio of positive edges from other positive edges, but entail a low ratio of negative edges from positive edges. The types of rules considered may vary from more simple cases, such as <span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?y</span> \(\Rightarrow\) <span class="gvar">?y</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?x</span> mentioned previously, to more complex rules, such as <span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">capital</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?y</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">nearby</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?z</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Airport</span> \(\Rightarrow\) <span class="gvar">?z</span><img class="tip" 
src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">International&nbsp;Airport</span>, based on observing in the graph that airports near capitals tend to be international airports; or <img class="inside" src="images/fig-inline-rule.svg" alt="dom flight rule premise" style="margin-top:.5em;margin-bottom:.2em;margin-left:1.8em;margin-right:1.8em;vertical-align:middle;position:relative;"/> \(\Rightarrow\) <span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">domestic&nbsp;flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?y</span>, indicating that flights within the same country denote domestic flights (as seen previously in Section&nbsp;<a href="#sec-rules">4.3.1</a>).</p>
		<p>Per the example inferring that airports near capital cities are international airports, rules are not assumed to hold in all cases, but rather are associated with measures of how well they conform to the positive and negative edges. In more detail, we call the edges entailed by a rule and the set of positive edges (not including the entailed edge itself), the <em>positive entailments</em> of that rule. The number of entailments that are positive is called the <em>support</em> for the rule, while the ratio of a rule’s entailments that are positive is called the <em>confidence</em> for the rule&nbsp;[<a href="#ref-SuchanekLBW19">Suchanek et al., 2019</a>]. Support and confidence indicate, respectively, the number and ratio of entailments “confirmed” to be true for the rule, where the goal is to identify rules that have both high support and high confidence. Techniques for rule mining in relational settings have long been explored in the context of <em>Inductive Logic Programming</em> (<em>ILP</em>)&nbsp;[<a href="#ref-DeRaedt08">De Raedt, 2008</a>]. However, knowledge graphs present novel challenges due to the scale of the data and the frequent assumption of incomplete data (OWA), where dedicated techniques have been proposed to address these issues&nbsp;[<a href="#ref-GalarragaTHS13">Galárraga et al., 2013</a>].</p>
		<p>When dealing with an incomplete knowledge graph, it is not immediately clear how to define negative edges. A common heuristic – also used for knowledge graph embeddings – is to adopt a Partial Completeness Assumption (PCA)&nbsp;[<a href="#ref-GalarragaTHS13">Galárraga et al., 2013</a>], which considers the set of positive edges to be those contained in the data graph, and the set of negative examples to be the set of all edges <span class="gnode">\(x\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">\(p\)</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(y'\)</span> not in the graph but where there exists a node <span class="gnode">\(y\)</span> such that <span class="gnode">\(x\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">\(p\)</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(y\)</span> is in the graph. Taking Figure&nbsp;<a href="#fig-airports">5.8</a>, an example of a negative edge under PCA would be <span class="gnode">SCL</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">ARI</span> (given the presence of <span class="gnode">SCL</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">LIM</span>); conversely, <span class="gnode">SCL</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">domestic&nbsp;flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">ARI</span> is neither positive nor negative. 
The PCA confidence measure is then the ratio of the support to the number of entailments in the positive or negative set&nbsp;[<a href="#ref-GalarragaTHS13">Galárraga et al., 2013</a>]. For example, the support for the rule <span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">domestic&nbsp;flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?y</span> \(\Rightarrow\) <span class="gvar">?y</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">domestic&nbsp;flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?x</span> is \(2\) (since it entails <span class="gnode">IQQ</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">domestic&nbsp;flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">ARI</span> and <span class="gnode">ARI</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">domestic&nbsp;flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">IQQ</span> in the graph, which are thus positive edges), while the confidence is \(\frac{2}{2} = 1\) (noting that <span class="gnode">SCL</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">domestic&nbsp;flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">ARI</span>, though entailed, is neither positive nor negative, and is thus ignored by the measure). 
The support for the rule <span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?y</span> \(\Rightarrow\) <span class="gvar">?y</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?x</span> is analogously 4, while the confidence is \(\frac{4}{5} = 0.8\) (noting that <span class="gnode">SCL</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">ARI</span> is a negative edge).</p>
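		<p>The following Python sketch shows how support and PCA confidence might be computed for a symmetry rule of the form <code>?x p ?y ⇒ ?y p ?x</code>. The function name <code>symmetric_rule_pca</code> and the edge set <code>TOY_GRAPH</code> are assumptions made for illustration: the toy graph is not a transcription of Figure&nbsp;5.8, but was chosen so that the counts agree with those quoted above.</p>

```python
from typing import Set, Tuple

Edge = Tuple[str, str, str]  # (source, label, target)


def symmetric_rule_pca(graph: Set[Edge], label: str) -> Tuple[int, float]:
    """Support and PCA confidence of the rule  ?x -label-> ?y  =>  ?y -label-> ?x."""
    # Edges entailed by the rule from the positive edges; self-loops are
    # skipped so that an edge is never used to entail itself.
    entailed = {(y, label, x) for (x, p, y) in graph if p == label and x != y}
    positives = entailed & graph
    # PCA: an entailed edge missing from the graph counts as negative only
    # if its source already has some outgoing edge with the same label.
    sources = {x for (x, p, _) in graph if p == label}
    negatives = {e for e in entailed - graph if e[0] in sources}
    support = len(positives)
    counted = support + len(negatives)  # entailments that are positive or negative
    confidence = support / counted if counted else 0.0
    return support, confidence


# Hypothetical toy graph (an assumption, not a copy of the figure).
TOY_GRAPH: Set[Edge] = {
    ("SCL", "flight", "LIM"),
    ("LIM", "flight", "SCL"),
    ("LIM", "flight", "IQQ"),
    ("IQQ", "flight", "LIM"),
    ("ARI", "flight", "SCL"),
    ("ARI", "domestic flight", "IQQ"),
    ("IQQ", "domestic flight", "ARI"),
    ("ARI", "domestic flight", "SCL"),
}

if __name__ == "__main__":
    print(symmetric_rule_pca(TOY_GRAPH, "flight"))
    print(symmetric_rule_pca(TOY_GRAPH, "domestic flight"))
```

		<p>In this toy graph, the entailed edge from SCL to ARI is counted as negative for <code>flight</code> (SCL has another outgoing <code>flight</code> edge) but is ignored for <code>domestic flight</code> (SCL has no outgoing edge with that label), which is what separates the two confidence values.</p>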
		<p>The goal, then, is to find rules satisfying given support and confidence thresholds. An influential rule-mining system for graphs is AMIE&nbsp;[<a href="#ref-GalarragaTHS13">Galárraga et al., 2013</a>, <a href="#ref-GalarragaTHS15">Galárraga et al., 2015</a>], which adopts the PCA measure of confidence, and builds rules in a top-down fashion&nbsp;[<a href="#ref-SuchanekLBW19">Suchanek et al., 2019</a>] starting with rule heads of the form \(\Rightarrow\) <span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">country</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?y</span>. For each such rule head (one for each edge label), three types of <em>refinements</em> are considered, each of which adds a new edge to the body of the rule. This new edge takes an edge label from the graph and may otherwise use <em>fresh variables</em> not appearing previously in the rule, <em>existing variables</em> that already appear in the rule, or nodes from the graph. The three refinements may then:</p>
		<ol>
			<li>add an edge with one existing variable and one fresh variable; for example, refining the aforementioned rule head might give: <span class="gvar">?z</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?x</span> \(\Rightarrow\) <span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">country</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?y</span>;</li>
			<li>add an edge with an existing variable and a graph node; for example, refining the above rule might give: <span class="gnode">Domestic&nbsp;Airport</span><img class="tip" src="images/edge-revtip.png" width="15" alt="arrow tip leftward"/><span class="edge">type</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="gvar">?z</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?x</span> \(\Rightarrow\) <span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">country</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?y</span>;</li>
			<li>add an edge with two existing variables; for example, refining the above rule might give: <img class="inside" src="images/rule-mining-domairport.svg" alt="dom airport rule premise" style="margin-right:2.1em;"/> \(\Rightarrow\) <span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">country</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?y</span>.</li>
		</ol>
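		<p>The three refinements listed above can be sketched in Python as a generator of candidate rule bodies. This is a simplified illustration rather than AMIE's implementation: the representation of an edge pattern as a triple of strings, the convention that variables start with <code>?</code>, the naming of fresh variables as <code>?v0</code>, <code>?v1</code>, …, and the function name <code>refinements</code> are all assumptions, and no support or confidence is computed.</p>

```python
from typing import Iterator, List, Tuple

Atom = Tuple[str, str, str]  # (subject, edge label, object); variables start with "?"


def refinements(
    head: Atom, body: List[Atom], labels: List[str], nodes: List[str]
) -> Iterator[Tuple[str, List[Atom]]]:
    """Yield (kind, new_body) for each one-edge extension of the rule body."""
    atoms = [head] + body
    existing = sorted({t for (s, _, o) in atoms for t in (s, o) if t.startswith("?")})
    # Pick a fresh variable name that does not clash with existing ones.
    i = 0
    while f"?v{i}" in existing:
        i += 1
    fresh = f"?v{i}"
    for p in labels:
        for x in existing:
            # 1. one existing variable and one fresh variable
            yield "fresh-variable", body + [(x, p, fresh)]
            yield "fresh-variable", body + [(fresh, p, x)]
            # 2. one existing variable and a node of the graph
            for n in nodes:
                yield "graph-node", body + [(x, p, n)]
                yield "graph-node", body + [(n, p, x)]
            # 3. two (distinct) existing variables
            for y in existing:
                atom = (x, p, y)
                if x != y and atom not in atoms:
                    yield "two-variable", body + [atom]


if __name__ == "__main__":
    head = ("?x", "country", "?y")
    for kind, new_body in refinements(head, [], ["flight"], ["SCL"]):
        print(kind, new_body, "=>", head)
```

		<p>A miner would evaluate each candidate against the graph and discard those below the support threshold before refining further; that step is omitted here.</p>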
		<p>These refinements can be combined arbitrarily, which gives rise to a potentially exponential search space, where rules meeting given thresholds for support and confidence are maintained. To improve efficiency, the search space can be pruned; for example, these three refinements can only decrease support (or leave it unchanged), so if a rule does not meet the support threshold, there is no need to explore its refinements. Further restrictions are imposed on the types of rules generated. First, only rules up to a certain fixed size are considered. Second, a rule must be <em>closed</em>, meaning that each variable appears in at least two edges of the rule, which ensures that rules are <em>safe</em>, meaning that each variable in the head appears in the body; for example, the rules produced by the first and second refinements in the example are neither closed (variable <span class="gvar">?y</span> appears once) nor safe (variable <span class="gvar">?y</span> appears only in the head).<sup class="fnmark" id="fnm29"><a href="#fn29">29</a></sup><span class="footnote" id="fn29"><sup><a href="#fnm29">note 29</a></sup> Safe rules like <span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">capital</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?y</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">nearby</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?z</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Airport</span> \(\Rightarrow\) <span class="gvar">?z</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" 
width="15" alt="arrow tip rightward"/><span class="gnode">International&nbsp;Airport</span> are not closed as <span class="gvar">?x</span> appears only in one edge. The condition that rules are closed is strictly stronger than the safety condition.</span> The third refinement is thus applied until a rule is closed. For further discussion of possible optimisations based on pruning and indexing, we refer to the paper&nbsp;[<a href="#ref-GalarragaTHS15">Galárraga et al., 2015</a>].</p>
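		<p>The closed and safe conditions are straightforward to check mechanically. The Python sketch below is a minimal illustration; the triple-based encoding of edge patterns, the convention that variables start with <code>?</code>, and the function names <code>is_safe</code> and <code>is_closed</code> are assumptions rather than part of any particular system.</p>

```python
from collections import Counter
from typing import List, Tuple

Atom = Tuple[str, str, str]  # (subject, edge label, object); variables start with "?"


def is_var(term: str) -> bool:
    return term.startswith("?")


def is_safe(head: Atom, body: List[Atom]) -> bool:
    """Safe: every variable of the head also appears in the body."""
    body_vars = {t for (s, _, o) in body for t in (s, o) if is_var(t)}
    return all(t in body_vars for t in (head[0], head[2]) if is_var(t))


def is_closed(head: Atom, body: List[Atom]) -> bool:
    """Closed: every variable appears in at least two edges of the rule."""
    counts: Counter = Counter()
    for s, _, o in [head] + body:
        for t in {s, o}:  # a set, so an edge counts once per variable
            if is_var(t):
                counts[t] += 1
    return all(c >= 2 for c in counts.values())


if __name__ == "__main__":
    # ?z flight ?x  =>  ?x country ?y   (neither closed nor safe)
    print(is_closed(("?x", "country", "?y"), [("?z", "flight", "?x")]))
    print(is_safe(("?x", "country", "?y"), [("?z", "flight", "?x")]))
```

		<p>Applied to the rule of the footnote, the two checks disagree (safe but not closed), which illustrates why closedness is the stronger condition.</p>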
		<p>Later works have built on these techniques for mining rules from knowledge graphs. <a href="#ref-Gad-ElrabSUW16">Gad-Elrab et al. [2016]</a> propose a method to learn non-monotonic rules – rules with negated edges in the body – in order to capture exceptions to base rules; for example, the rule <img class="inside" src="images/rule-mining-not-international.svg" alt="not international rule premise" style="margin-right:2.1em;" /> \(\Rightarrow\) <span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">country</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?y</span> may be learnt, indicating that flights are within the same country <em>except</em> when the (departure) airport is international, where the exception is shown dotted and we use \(\neg\) to negate an edge (representing an exception). The RuLES system&nbsp;[<a href="#ref-HoSGKW18">Ho et al., 2018</a>] – which is also capable of learning non-monotonic rules – proposes to mitigate the limitations of the PCA heuristic by extending the confidence measure to consider the plausibility scores of knowledge graph embeddings for entailed edges not appearing in the graph. Where available, explicit statements about the completeness of the knowledge graph (such as expressed in shapes; see Section&nbsp;<a href="#sssec-validating-schema">3.1.2</a>) can be used in lieu of PCA for identifying negative edges. Along these lines, CARL&nbsp;[<a href="#ref-TanonSRMW17">Pellissier Tanon et al., 2017</a>] exploits additional knowledge about the cardinalities of relations to refine the set of negative examples and the confidence measure for candidate rules. Alternatively, where available, ontologies can be used to derive logically-certain negative edges under OWA through, for example, disjointness axioms. 
The system proposed by d’Amato et al.&nbsp;[<a href="#ref-dAmatoTM16">d'Amato et al., 2016b</a>, <a href="#ref-dAmatoSTMG16">d'Amato et al., 2016a</a>] leverages ontologically-entailed negative edges for determining the confidence of rules generated through an evolutionary algorithm.</p>
		<p>While the previous works involve discrete expansions of candidate rules for which a fixed confidence scoring function is applied, another line of research is on a technique called <em>differentiable rule mining</em>&nbsp;[<a href="#ref-Rocktaschel017">Rocktäschel and Riedel, 2017</a>, <a href="#ref-YangYC17">Yang et al., 2017</a>, <a href="#ref-SadeghianADW19">Sadeghian et al., 2019</a>], which allows end-to-end learning of rules. The core idea is that the joins in rule bodies can be represented as matrix multiplication. More specifically, we can represent the relations of an edge label \(p\) by the adjacency matrix \(\mathbf{A}_p\) (of size \(|V| \times |V|\)) such that the value on the \(i\)<sup>th</sup> row of the \(j\)<sup>th</sup> column is \(1\) if there is an edge labelled \(p\) from the \(i\)<sup>th</sup> entity to the \(j\)<sup>th</sup> entity; otherwise the value is \(0\). Now we can represent a join in a rule body as matrix multiplication; for example, given <span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">domestic&nbsp;flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?y</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">country</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?z</span> \(\Rightarrow\) <span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">country</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?z</span>, we can denote the body by the matrix multiplication \(\mathbf{A}\)<sub><span class="gelab">df.</span></sub>\(\mathbf{A}\)<sub><span class="gelab">c.</span></sub>, which gives an adjacency matrix representing entailed <span class="gelab">country</span> edges, where 
we should expect the \(1\)’s in \(\mathbf{A}\)<sub><span class="gelab">df.</span></sub>\(\mathbf{A}\)<sub><span class="gelab">c.</span></sub> to be covered by the head’s adjacency matrix \(\mathbf{A}\)<sub><span class="gelab">c.</span></sub>. Since we are given adjacency matrices for all edge labels, we are left to learn confidence scores for individual rules, and to learn rules (of varying length) with a threshold confidence. Along these lines, NeuralLP&nbsp;[<a href="#ref-YangYC17">Yang et al., 2017</a>] uses an <em>attention mechanism</em> to select a variable-length sequence of edge labels for path-like rules of the form <span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">p<sub>\(1\)</sub></span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">y<sub>\(1\)</sub></span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">p<sub>\(2\)</sub></span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/>…<img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">p<sub>\(n\)</sub></span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">y<sub>\(n\)</sub></span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">p<sub>\(n+1\)</sub></span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?z</span> \(\Rightarrow\) <span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">p</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?z</span>, for which confidences are likewise learnt. 
DRUM&nbsp;[<a href="#ref-SadeghianADW19">Sadeghian et al., 2019</a>] also learns path-like rules, where, observing that some edge labels are more/less likely to follow others in the rules – for example, <span class="gelab">flight</span> will not be followed by <span class="gelab">capital</span> in the graph of Figure&nbsp;<a href="#fig-chileTransport">5.2</a> as the join will be empty – the system uses bidirectional recurrent neural networks (a popular technique for learning over sequential data) to learn sequences of relations for rules, and their confidences. These differentiable rule mining techniques are, however, currently limited to learning path-like rules.</p>
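		<p>The matrix view of a rule body can be made concrete with a small sketch. The graph below is a hypothetical toy example (it is not taken from any figure in this book), plain Python lists stand in for the adjacency matrices, and the final ratio is only a crude stand-in for a confidence score; the sketch illustrates the join-as-product idea, not the end-to-end learning performed by systems such as NeuralLP or DRUM.</p>

```python
# Hypothetical toy graph, invented for illustration only.
nodes = ["Santiago", "Arica", "Iquique", "Chile"]
idx = {name: i for i, name in enumerate(nodes)}

def adjacency(edges):
    """Build a |V| x |V| 0/1 adjacency matrix from (source, target) pairs."""
    A = [[0] * len(nodes) for _ in nodes]
    for s, t in edges:
        A[idx[s]][idx[t]] = 1
    return A

def matmul(A, B):
    """Plain matrix product; entry (i, j) counts two-hop paths from i to j."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Adjacency matrix for "domestic flight" edges.
A_df = adjacency([("Santiago", "Arica"), ("Arica", "Santiago"),
                  ("Santiago", "Iquique"), ("Iquique", "Santiago")])
# Adjacency matrix for "country" edges (the edge for Iquique is missing).
A_c = adjacency([("Santiago", "Chile"), ("Arica", "Chile")])

# The rule body as the product A_df A_c, binarised:
# a non-zero entry (i, j) means the body matches for that pair of nodes.
body = [[1 if v else 0 for v in row] for row in matmul(A_df, A_c)]

n = len(nodes)
predicted = [(nodes[i], nodes[j]) for i in range(n) for j in range(n) if body[i][j]]
# Predictions already covered by the head's adjacency matrix A_c.
covered = [(s, t) for s, t in predicted if A_c[idx[s]][idx[t]]]
confidence = len(covered) / len(predicted)
```

		<p>In this toy graph the body predicts a <span class="gelab">country</span> edge for three nodes, two of which are already present in \(\mathbf{A}\)<sub><span class="gelab">c.</span></sub>; the remaining prediction (for Iquique) is a candidate new edge.</p>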

		<h4 id="sssec-axiom-mining" class="subsection">Axiom mining</h4>
		<p>More general forms of axioms beyond rules – expressed in logical languages such as DLs (see Section&nbsp;<a href="#sssec-dls">4.3.2</a>) – can be mined from knowledge graphs. We can divide these approaches into two categories: those mining specific types of axioms and those mining more general axioms.</p>
		<p>Among systems mining specific types of axioms, disjointness axioms are a popular target; for example, <code>DomesticAirport</code> \(\sqcap\) <code>InternationalAirport</code> \(\equiv \bot\) states that the two classes are disjoint by equivalently stating that the intersection of the two classes is equivalent to the empty class, or in simpler terms, no node can be simultaneously of type <span class="gnode">Domestic Airport</span> and <span class="gnode">International Airport</span>. The system proposed by <a href="#ref-Volker2015">Völker et al. [2015]</a> extracts disjointness axioms based on (negative) <em>association rule mining</em>&nbsp;[<a href="#ref-Agrawal93">Agrawal et al., 1993</a>], which finds pairs of classes where each has many instances in the knowledge graph but there are relatively few (or no) instances of both classes. <a href="#ref-TopperKS12">Töpper et al. [2012]</a> rather extract disjointness for pairs of classes that have a cosine similarity below a fixed threshold. For computing this cosine similarity, class vectors are computed using a TF–IDF analogy, where the “document” of each class is constructed from all of its instances, and the “terms” of this document are the properties used on the class instances (preserving multiplicities). While the previous two approaches find disjointness constraints between named classes (e.g., <em>city</em> is disjoint with <em>airport</em>), <a href="#ref-Rizzo2017">Rizzo et al. [2017]</a>, <a href="#ref-RizzodF21">Rizzo et al. [2021]</a> propose an approach that can capture disjointness constraints between class descriptions (e.g., <em>city without an airport nearby</em> is disjoint with <em>city that is the capital of a country</em>). The approach first clusters similar nodes of the knowledge base. 
Next, a <em>terminological cluster tree</em> is extracted, where each leaf node indicates a cluster extracted previously, and each internal (non-leaf) node is a class definition (e.g., <em>cities</em>) where the left child is either a cluster having all nodes in that class or a sub-class description (e.g., <em>cities without airports</em>) and the right child is either a cluster having no nodes in that class or a disjoint-class description (e.g., <em>non-cities with events</em>). Finally, candidate disjointness axioms are proposed for pairs of class descriptions in the tree that are not entailed to have a sub-class relation.</p>
		<p>Other systems propose methods to learn more general axioms. One of the first proposals in this direction is the DL-FOIL system&nbsp;[<a href="#ref-FanizzidE08">Fanizzi et al., 2008</a>, <a href="#ref-RizzoFd20">Rizzo et al., 2020</a>], which is based on algorithms for <em>class learning</em> (aka <em>concept learning</em>), whereby given a set of positive nodes and negative nodes, the goal is to find a logical class description that divides the positive and negative sets. For example, given \(\{\)<span class="gnode">Iquique</span>, <span class="gnode">Arica</span>\(\}\) as the positive set and \(\{\)<span class="gnode">Santiago</span>\(\}\) as the negative set, we may learn a (DL) class description \(\exists\)<code>nearby</code>.<code>Airport</code> \(\sqcap \neg(\exists\) <code>capital</code>\(^-.\top)\), denoting entities near to an airport that are not capitals, of which all positive nodes are instances and no negative nodes are instances. Such class descriptions are learnt in an analogous manner to how aforementioned systems like AMIE learn rules, with a refinement operator used to move from more general classes to more specific classes (and vice-versa), a confidence scoring function, and a search strategy. 
Another prominent such system is DL-Learner&nbsp;[<a href="#ref-BuhmannLW16">Bühmann et al., 2016</a>], which further supports learning more general axioms through a scoring function that uses count queries to determine what ratio of expected edges – edges that would be entailed were the axiom true – are indeed found in the graph; for example, to score the axiom \(\exists\)<code>flight</code>\(^{-}\).<code>DomesticAirport</code> \(\sqsubseteq\) <code>InternationalAirport</code> over Figure&nbsp;<a href="#fig-airports">5.8</a>, we can use a graph query to count how many nodes have incoming flights from a domestic airport (there are \(3\)), and how many nodes have incoming flights from a domestic airport <em>and</em> are international airports (there is \(1\)), where the greater the difference between both counts, the weaker the evidence for the axiom.</p>
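		<p>A count-based score of this kind can be sketched in a few lines. The data below are hypothetical: they do not reproduce Figure&nbsp;<a href="#fig-airports">5.8</a>, and were merely chosen so that the two counts come out as \(3\) and \(1\). The set comprehensions stand in for the count queries; the names <code>flights</code>, <code>types</code> and <code>has_type</code> are assumptions of this sketch rather than part of DL-Learner.</p>

```python
# Hypothetical toy data, invented for illustration only.
flights = [("ARI", "SCL"), ("IQQ", "SCL"), ("ARI", "IQQ"),
           ("IQQ", "ARI"), ("SCL", "LIM")]
types = {
    "ARI": {"DomesticAirport"},
    "IQQ": {"DomesticAirport"},
    "SCL": {"InternationalAirport"},
    "LIM": {"InternationalAirport"},
}

def has_type(node, cls):
    """True if the node is explicitly typed with the given class."""
    return cls in types.get(node, set())

# Nodes matching the left-hand side of the axiom:
# targets of a flight whose source is a domestic airport.
lhs = {t for s, t in flights if has_type(s, "DomesticAirport")}

# Of those, the nodes that also satisfy the right-hand side.
both = {n for n in lhs if has_type(n, "InternationalAirport")}

# Ratio of expected typings that are actually found in the graph.
score = len(both) / len(lhs) if lhs else 0.0
```

		<p>Here only one of the three nodes with an incoming flight from a domestic airport is typed as an international airport, giving a score of one third, which is weak evidence for the axiom.</p>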

		<h4 id="sssec-hypothesis-mining" class="subsection">Hypothesis mining</h4>
		<p>We now provide some abstract formal definitions for the tasks of <em>rule mining</em> and <em>axiom mining</em> over graphs, which we generically refer to as <em>hypothesis mining</em>.</p>

		<div class="formal">
			<p>First we introduce <em>hypothesis induction</em>: a task that captures a more abstract (ideal) case for hypothesis mining. For simplicity, we focus on directed edge-labelled graphs. With a slight abuse of notation, we may interpret a set of edges \(E\) as the graph with precisely those edges and with no nodes or labels without edges. We may also interpret an edge \(e\) as the graph formed by \(\{ e \}\).</p>

			<dl class="definition" id="def-hypothesis-induction">
				<dt>Hypothesis induction</dt>
				<dd>The task of <em>hypothesis induction</em> assumes a particular graph entailment relation \(\models_\Phi\) (see Definition&nbsp;<a href="#def-ent">4.4</a>; hereafter simply \(\models\)). Given <em>background knowledge</em> in the form of a knowledge graph \(G\) (a directed edge-labelled graph, possibly extended with rules or ontologies), a set of <em>positive edges</em> \(E^{+}\) such that \(G\) does not entail any edge in \(E^{+}\) (i.e., for all \(e^{+} \in E^{+}\), \(G \not\models e^{+}\)) and \(E^{+}\) does not contradict \(G\) (i.e., there is a model of \(G \cup E^{+}\)), and a set of <em>negative edges</em> \(E^{-}\) such that \(G\) does not entail any edge in \(E^-\) (i.e., for all \(e^{-} \in E^{-}\), \(G \not\models e^{-}\)), the task is to find a set of <em>hypotheses</em> (i.e., a set of directed edge-labelled graphs) \(\Psi\) such that:
					<ul>
						<li>\(G \not\models \psi\) for all \(\psi \in \Psi\) (the background knowledge does not entail any hypothesis directly);</li>
						<li>\(G \cup \Psi^* \models E^{+}\) (the background knowledge and hypotheses together entail all positive edges);</li>
						<li>for all \(e^{-} \in E^{-}\), \(G \cup \Psi^* \not\models e^{-}\) (the background knowledge and hypotheses together do not entail any negative edge);</li>
						<li>\(G \cup \Psi^* \cup E^{+}\) has a model (the background knowledge, hypotheses and positive edges taken together do not contain a contradiction);</li>
						<li>for all \(e^{+} \in E^{+}\), \(\Psi^* \not\models e^{+}\) (the hypotheses alone do not entail a positive edge).</li>
					</ul>
				where by \(\Psi^* \coloneqq \cup_{\psi \in \Psi} \psi\) we denote the union of all graphs in \(\Psi\).</dd>
			</dl>

			<div class="example">
				<p>Let us assume ontological entailment \(\models\) with semantic conditions \(\Phi\) as defined in Tables&nbsp;<a href="#tab-ontEqIneq">4.1</a>–<a href="#tab-ontClass">4.3</a>. Given the graph of Figure&nbsp;<a href="#fig-airports">5.8</a> as the background knowledge \(G\), along with:</p>
				<ul>
					<li>a set of positive edges \(E^{+} = \{ \)<span class="gnode">SCL</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">ARI</span>, <span class="gnode">SCL</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">domestic flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">ARI</span>\( \}\), and</li>
					<li>a set of negative edges \(E^{-} = \{ \)<span class="gnode">ARI</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">LIM</span>, <span class="gnode">SCL</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">domestic flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">LIM</span>\( \}\),</li>
				</ul>
				<p>then a set of hypotheses \(\Psi = \{ \)<span class="gnode">flight</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Symmetric</span>, <span class="gnode">domestic flight</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Symmetric</span>\( \}\) are not entailed by \(G\), entail all positive edges in \(E^{+}\) and no negative edges in \(E^{-}\) when combined with \(G\), do not contradict \(G \cup E^{+}\), and do not entail a positive edge without \(G\). Thus \(\Psi\) satisfies the conditions for hypothesis induction.</p>
			</div>

			<p>This task represents a somewhat idealised case. Often there is no set of positive edges distinct from the background knowledge itself. Furthermore, hypotheses not entailing a few positive edges, or entailing a few negative edges, may still be useful. The task of <em>hypothesis mining</em> rather accepts as input the background knowledge \(G\) and a set of negative edges \(E^{-}\) (such that for all \(e^{-} \in E^{-}\), \(G \not\models e^{-}\)), and attempts to <em>score</em> individual hypotheses \(\psi\) (such that \(G \not\models \psi\)) per their ability to “explain” \(G\) while minimising the number of elements of \(E^{-}\) entailed by \(G\) and \(\psi\). We can now abstractly define the task of hypothesis mining.</p>

			<dl class="definition" id="def-hypothesis-mining">
				<dt>Hypothesis mining</dt>
				<dd>Given a knowledge graph \(G\), a set of negative edges \(E^{-}\), a scoring function \(\sigma\), and a threshold \(\textsf{min}_{\sigma}\), the goal of <em>hypothesis mining</em> is to identify a set of hypotheses \(\{ \psi \mid G \not\models \psi\text{ and }\sigma(\psi,G,E^{-}) \geq \textsf{min}_{\sigma} \}\).</dd>
			</dl>

			<p>There are two scoring functions that are frequently used for \(\sigma\) in the literature: <em>support</em> and <em>confidence</em>.</p>

			<dl class="definition" id="def-hypothesis-support-and-confidence">
				<dt>Hypothesis support and confidence</dt>
				<dd>Given a knowledge graph \(G = (V,E,L)\) and a hypothesis \(\psi\), the <em>positive support</em> of \(\psi\) is defined as:
				\[ \sigma^{+}(\psi,G) \coloneqq |\{ e \in E \mid G' \not\models e \text{ and }G' \cup \psi \models e \}| \] 
				where \(G'\) denotes \(G\) with the edge \(e\) removed. Further given a set of negative edges \(E^{-}\), the <em>negative support</em> of \(\psi\) is defined as:
				\[ \sigma^{-}(\psi,G,E^{-}) \coloneqq |\{ e^{-} \in E^{-} \mid G \cup \psi \models e^{-} \}| \] 
				Finally, the <em>confidence</em> of \(\psi\) is defined as \(\sigma^\pm(\psi,G,E^{-}) \coloneqq \frac{\sigma^{+}(\psi,G)}{\sigma^{+}(\psi,G) + \sigma^{-}(\psi,G,E^{-})}\).</dd>
			</dl>
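			<p>As a minimal sketch of these three measures, the following code scores a single kind of hypothesis, namely that an edge label is symmetric, over a plain graph without rules or ontologies. The entailment check is therefore a deliberately restricted assumption (set membership plus symmetry), and the graph and negative edges are invented for illustration.</p>

```python
def entails(edges, symmetric_labels, edge):
    """Entailment restricted to hypotheses of the form 'label p is symmetric'."""
    s, p, o = edge
    return edge in edges or (p in symmetric_labels and (o, p, s) in edges)

def positive_support(G, label):
    """Count edges e of G entailed by G' plus the hypothesis, but not by G' alone."""
    count = 0
    for e in G:
        G_minus_e = G - {e}  # G' in the definition
        if not entails(G_minus_e, set(), e) and entails(G_minus_e, {label}, e):
            count += 1
    return count

def negative_support(G, label, negatives):
    """Count negative edges entailed by G together with the hypothesis."""
    return sum(1 for e in negatives if entails(G, {label}, e))

def confidence(G, label, negatives):
    """Positive support divided by the sum of positive and negative support."""
    pos = positive_support(G, label)
    neg = negative_support(G, label, negatives)
    return pos / (pos + neg) if pos + neg else 0.0

# Hypothetical toy graph and negative edges.
G = {
    ("SCL", "flight", "ARI"), ("ARI", "flight", "SCL"),
    ("SCL", "flight", "LIM"), ("LIM", "flight", "SCL"),
    ("SCL", "flight", "IQQ"),
}
E_neg = {("IQQ", "flight", "SCL"), ("ARI", "flight", "LIM")}

pos = positive_support(G, "flight")
neg = negative_support(G, "flight", E_neg)
conf = confidence(G, "flight", E_neg)
```

			<p>In this toy setting, four of the five edges can be recovered from their inverse (positive support \(4\)), while one negative edge is entailed (negative support \(1\)), giving a confidence of \(\frac{4}{5}\).</p>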

			<p>We have yet to specify how the set of negative edges is defined, which, in the context of a knowledge graph \(G\), depends on which assumption is applied:</p>
			<ul>
				<li><em>Closed world assumption (CWA)</em>: For any (positive) edge \(e\), \(G \not\models e\) if and only if \(G \models \neg e\). Under CWA, any edge \(e\) not entailed by \(G\) can be considered a negative edge.</li>
				<li><em>Open world assumption (OWA)</em>: For a (positive) edge \(e\), \(G \not\models e\) does not necessarily imply \(G \models \neg e\). Under OWA, the negation of an edge must be entailed by \(G\) for it to be considered negative.</li>
				<li><em>Partial completeness assumption (PCA)</em>: If there exists an edge \((s,p,o)\) such that \(G \models (s,p,o)\), then for all \(o'\) such that \(G \not\models (s,p,o')\), it is assumed that \(G \models \neg(s,p,o')\). Under PCA, if \(G\) entails some outgoing edge(s) labelled \(p\) from a node \(s\), then such edges are assumed to be complete, and any edge \((s,p,o')\) not entailed by \(G\) can be considered a negative edge.</li>
			</ul>
			<p>Knowledge graphs are generally incomplete – in fact, one of the main applications of hypothesis mining is to try to improve the completeness of the knowledge graph – and thus it would appear unwise to assume that any edge that is not currently entailed is false/negative. We can thus rule out CWA. Conversely, under OWA, potentially few (or no) negative edges might be entailed by the given ontologies/rules, and thus hypotheses may end up having low negative support despite entailing many edges that do not make sense in practice. Hence the PCA can be adopted as a heuristic to increase the number of negative edges and apply more sensible scoring of hypotheses. We remark that one can adapt PCA to define negative triples by changing the subject or predicate instead of the object.</p>
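			<p>The practical difference between CWA and PCA can be seen by generating negative edges for a small graph. The sketch below assumes a plain graph in which entailment reduces to set membership; the function names and the toy data are hypothetical. It corrupts only the object of an edge, as in the definition of PCA given above.</p>

```python
def cwa_negatives(edges, nodes, labels):
    """CWA on a plain graph: every absent edge counts as negative."""
    return {(s, p, o) for s in nodes for p in labels for o in nodes} - edges

def pca_negatives(edges, nodes):
    """PCA on a plain graph: only (s, p) pairs with a known object yield negatives."""
    known = {}
    for s, p, o in edges:
        known.setdefault((s, p), set()).add(o)
    negatives = set()
    for (s, p), objects in known.items():
        for o in nodes - objects:
            negatives.add((s, p, o))
    return negatives

# Hypothetical toy graph.
G = {("Santiago", "country", "Chile"), ("Arica", "flight", "Santiago")}
nodes = {"Chile", "Peru", "Santiago", "Arica"}
labels = {"country", "flight"}

cwa = cwa_negatives(G, nodes, labels)
pca = pca_negatives(G, nodes)
```

			<p>Under PCA, the edge from Santiago to Peru labelled <span class="gelab">country</span> is negative because a country is already known for Santiago, whereas no negative <span class="gelab">country</span> edge is generated for Arica, whose country is unknown. CWA would instead treat all \(30\) absent edges as negative, compared with \(6\) under PCA. Note that this naive sketch also produces nonsensical negatives (such as a city being its own country), which a practical system might filter.</p>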
			<p>Different implementations of hypothesis mining may consider different logical languages. Rule mining, for example, mines hypotheses expressed either as monotonic rules (with positive edges) or non-monotonic rules (possibly with negated edges). On the other hand, axiom mining considers hypotheses expressed in a logical language such as Description Logics. Particular implementations may, for practical reasons, impose further syntactic restrictions on the hypotheses generated, such as to impose thresholds on their length, on the symbols they use, or on other structural properties (such as “closed rules” in the case of the AMIE rule mining system&nbsp;[<a href="#ref-GalarragaTHS13">Galárraga et al., 2013</a>]; see Section&nbsp;<a href="#ssec-symlearn">5.4</a>). Systems may further implement different search strategies for hypotheses. Systems such as DL-FOIL&nbsp;[<a href="#ref-FanizzidE08">Fanizzi et al., 2008</a>, <a href="#ref-RizzoFd20">Rizzo et al., 2020</a>], AMIE&nbsp;[<a href="#ref-GalarragaTHS13">Galárraga et al., 2013</a>], RuLES&nbsp;[<a href="#ref-HoSGKW18">Ho et al., 2018</a>], CARL&nbsp;[<a href="#ref-TanonSRMW17">Pellissier Tanon et al., 2017</a>], DL-Learner&nbsp;[<a href="#ref-BuhmannLW16">Bühmann et al., 2016</a>], etc., propose <em>discrete mining</em> that recursively generates candidate formulae through refinement/genetic operators that are then scored and checked for threshold criteria. On the other hand, systems such as NeuralLP&nbsp;[<a href="#ref-YangYC17">Yang et al., 2017</a>] and DRUM&nbsp;[<a href="#ref-SadeghianADW19">Sadeghian et al., 2019</a>] apply <em>differentiable mining</em> that allows for learning (path-like) rules and their scores in a more continuous fashion (e.g., using gradient descent). We refer to Section&nbsp;<a href="#ssec-symlearn">5.4</a> for further discussion and examples of such techniques for mining hypotheses.</p>
		</div>
		</section>
	</section>
	<section id="chap-create" class="chapter">
		<h2>Creation and Enrichment</h2>
		<p>In this chapter, we discuss the principal techniques by which knowledge graphs can be created and subsequently enriched from diverse sources of legacy data that may range from plain text to structured formats (and anything in between). The appropriate methodology to follow when creating a knowledge graph depends on the actors involved, the domain, the envisaged applications, the available data sources, etc. Generally speaking, however, the flexibility of knowledge graphs lends itself to starting with an initial core that can be incrementally enriched from other sources as required (typically following an Agile&nbsp;[<a href="#ref-HuntT03a">Hunt and Thomas, 2003</a>] or “pay-as-you-go”&nbsp;[<a href="#ref-SequedaBMH19">Sequeda et al., 2019</a>] methodology). For our running example, we assume that the tourism board decides to build a knowledge graph from scratch, aiming to initially describe the main tourist attractions – places, events, etc. – in Chile in order to help visiting tourists identify those that most interest them. The board decides to postpone adding further data, like transport routes, reports of crime, etc., for a later date.</p>

		<section id="sssec-graphCreationHuman" class="section">
		<h3>Human Collaboration</h3>
		<p>One approach for creating and enriching knowledge graphs is to solicit direct contributions from human editors. Such editors may be found in-house (e.g., employees of the tourist board), using crowd-sourcing platforms, through feedback mechanisms (e.g., tourists adding comments on attractions), through collaborative-editing platforms (e.g., an attractions wiki open to public edits), etc. Though human involvement incurs high costs&nbsp;[<a href="#ref-Paulheim18a">Paulheim, 2018</a>], some prominent knowledge graphs have been primarily based on direct contributions from human editors&nbsp;[<a href="#ref-VrandecicK14">Vrandečić and Krötzsch, 2014</a>, <a href="#ref-LinkedInKG">He et al., 2016</a>]. Depending on how the contributions are solicited, however, the approach has a number of key drawbacks, due primarily to human error&nbsp;[<a href="#ref-pellissier2016freebase">Pellissier Tanon et al., 2016</a>], disagreement&nbsp;[<a href="#ref-YasseriSRKK12">Yasseri et al., 2012</a>], bias&nbsp;[<a href="#ref-Janowicz0RZM18">Janowicz et al., 2018</a>], vandalism&nbsp;[<a href="#ref-HeindorfPSE16">Heindorf et al., 2016</a>], etc. Successful collaborative creation further raises challenges concerning licensing, tooling, and culture&nbsp;[<a href="#ref-pellissier2016freebase">Pellissier Tanon et al., 2016</a>]. Humans are sometimes rather employed to verify and curate additions to a knowledge graph extracted by other means&nbsp;[<a href="#ref-pellissier2016freebase">Pellissier Tanon et al., 2016</a>] (through, e.g., video games with a purpose&nbsp;[<a href="#ref-JurgensNavigli14">Jurgens and Navigli, 2014</a>]), to define high-quality mappings from other sources&nbsp;[<a href="#ref-r2rml">Das et al., 2012</a>], to define appropriate high-level schema&nbsp;[<a href="#ref-onteng">Keet, 2018</a>, <a href="#ref-Labra2017">Labra Gayo et al., 2018</a>], and so forth.</p>
		</section>

		<section id="sssec-graphCreationText" class="section">
		<h3>Text Sources</h3>
		<p>Text corpora – such as sourced from newspapers, books, scientific articles, social media, emails, web crawls, etc. – are an abundant source of rich information&nbsp;[<a href="#ref-HellmannLAB13">Hellmann et al., 2013</a>, <a href="#ref-RospocherEVFARS16">Rospocher et al., 2016</a>]. However, extracting such information with high precision and recall for the purposes of creating or enriching a knowledge graph is a non-trivial challenge. To address this, techniques from Natural Language Processing (NLP)&nbsp;[<a href="#ref-NLP-SW">Maynard et al., 2016</a>, <a href="#ref-JurafskyM18">Jurafsky and Martin, 2019</a>] and Information Extraction (IE)&nbsp;[<a href="#ref-WeikumT10">Weikum and Theobald, 2010</a>, <a href="#ref-Grishman12">Grishman, 2012</a>, <a href="#ref-IESW">Martínez-Rodríguez et al., 2020</a>] can be applied. Though processes vary considerably across text extraction frameworks, in Figure&nbsp;<a href="#fig-textExtract">6.1</a> we illustrate four core tasks for text extraction on a sample sentence. We will discuss these tasks in turn.</p>

		<figure id="fig-textExtract">
			<img src="images/fig-textExtract.svg" alt="Text extraction example; dashed nodes are new to the knowledge graph"/>
			<figcaption>Text extraction example; dashed nodes are new to the knowledge graph</figcaption>
		</figure>

		<h4 id="sssec-pre-processing" class="subsection">Pre-processing</h4>
		<p>The pre-processing task may involve applying various techniques to the input text, where Figure&nbsp;<a href="#fig-textExtract">6.1</a> illustrates <em>Tokenisation</em>, which parses the text into atomic terms and symbols. Other pre-processing tasks applied to a text corpus may include: <em>Part-of-Speech</em> (<em>POS</em>) <em>tagging</em>&nbsp;[<a href="#ref-NLP-SW">Maynard et al., 2016</a>, <a href="#ref-JurafskyM18">Jurafsky and Martin, 2019</a>] to identify terms representing verbs, nouns, adjectives, etc.; <em>Dependency Parsing</em>, which extracts a grammatical tree structure for a sentence where leaf nodes indicate individual words that together form phrases (e.g., noun phrases, verb phrases) and eventually clauses and sentences&nbsp;[<a href="#ref-NLP-SW">Maynard et al., 2016</a>, <a href="#ref-JurafskyM18">Jurafsky and Martin, 2019</a>]; and <em>Word Sense Disambiguation</em> (<em>WSD</em>)&nbsp;[<a href="#ref-Navigli:09">Navigli, 2009</a>] to identify the meaning (aka <em>sense</em>) in which a word is used, linking words with a lexicon of senses (e.g., WordNet&nbsp;[<a href="#ref-MillerF07">Miller and Fellbaum, 2007</a>] or BabelNet&nbsp;[<a href="#ref-NavigliPonzetto:12">Navigli and Ponzetto, 2012</a>]), where, for instance, the term <span class="tnode">flights</span> may be linked with the WordNet sense “<span class="sf">an instance of travelling by air</span>” rather than “<span class="sf">a stairway between one floor and the next</span>”. The appropriate type of pre-processing to apply often depends on the requirements of later tasks in the pipeline.</p>

		<h4 id="sssec-ner" class="subsection">Named Entity Recognition (NER)</h4>
		<p>The NER task identifies mentions of named entities in a text&nbsp;[<a href="#ref-NadeauS07">Nadeau and Sekine, 2007</a>, <a href="#ref-RatinovR09">Ratinov and Roth, 2009</a>], typically targeting mentions of people, organisations, locations, and potentially other types&nbsp;[<a href="#ref-LingW12">Ling and Weld, 2012</a>, <a href="#ref-NakasholeTW13">Nakashole et al., 2013</a>, <a href="#ref-YogatamaGL15">Yogatama et al., 2015</a>]. A variety of NER techniques exist, with many modern approaches based on learning frameworks that leverage lexical features (e.g., POS tags, dependency parse trees, etc.) and gazetteers (e.g., lists of common first names, last names, countries, prominent businesses, etc.). Supervised methods&nbsp;[<a href="#ref-BikelSW99">Bikel et al., 1999</a>, <a href="#ref-FinkelGM05">Finkel et al., 2005</a>, <a href="#ref-LampleBSKD16">Lample et al., 2016</a>] require manually labelling all entity mentions in a training corpus, whereas <em>bootstrapping</em>-based approaches&nbsp;[<a href="#ref-CollinsS99">Collins and Singer, 1999</a>, <a href="#ref-EtzioniCDKPSSWY04">Etzioni et al., 2004</a>, <a href="#ref-NakasholeTW13">Nakashole et al., 2013</a>, <a href="#ref-GuptaM14">Gupta and Manning, 2014</a>] rather require a small set of <em>seed examples</em> of entity mentions from which patterns can be learnt and applied to unlabelled text. <em>Distant supervision</em>&nbsp;[<a href="#ref-LingW12">Ling and Weld, 2012</a>, <a href="#ref-RenEWTVH15">Ren et al., 2015</a>, <a href="#ref-YogatamaGL15">Yogatama et al., 2015</a>] uses known entities in a knowledge graph as seed examples through which similar entities can be detected. 
Aside from learning-based frameworks, traditional approaches based on manually-crafted rules&nbsp;[<a href="#ref-KlueglAP09">Kluegl et al., 2009</a>, <a href="#ref-ChiticariuDLRZ18">Chiticariu et al., 2018</a>] are still sometimes used due to their more controllable and predictable behaviour&nbsp;[<a href="#ref-ChiticariuLR13">Chiticariu et al., 2013</a>]. The named entities identified by NER may be used to generate new candidate nodes for the knowledge graph (known as <em>emerging entities</em>, shown dashed in Figure&nbsp;<a href="#fig-textExtract">6.1</a>), or may be linked to existing nodes per the Entity Linking task described in the following.</p>

		<h4 id="sssec-el" class="subsection">Entity Linking (EL)</h4>
		<p>The EL task associates mentions of entities in a text with the existing nodes of a target knowledge graph, which may be the nucleus of a knowledge graph under creation, or an external knowledge graph&nbsp;[<a href="#ref-WuHH18">Wu et al., 2018</a>]. In Figure&nbsp;<a href="#fig-textExtract">6.1</a>, we assume that the nodes <span class="gnode">Santiago</span> and <span class="gnode">Easter&nbsp;Island</span> already exist in the knowledge graph (possibly extracted from other sources). EL may then link the given mentions to these nodes. The EL task presents two main challenges. First, there may be multiple ways to mention the same entity, as in the case of <span class="tnode">Rapa&nbsp;Nui</span> and <span class="tnode">Easter&nbsp;Island</span>; if we created a node <span class="gnode">Rapa&nbsp;Nui</span> to represent that mention, we would split the information available under both mentions across different nodes, where it is thus important for the target knowledge graph to capture the various aliases and multilingual labels by which one can refer to an entity&nbsp;[<a href="#ref-Moroetal:14">Moro et al., 2014</a>]. Second, the same mention in different contexts can refer to distinct entities; for instance, <span class="tnode">Santiago</span> can refer to cities in Chile, Cuba, Spain, amongst others. The EL task thus considers a <em>disambiguation phase</em> wherein mentions are associated to candidate nodes in the knowledge graph, the candidates are ranked, and the most likely node being mentioned is chosen&nbsp;[<a href="#ref-WuHH18">Wu et al., 2018</a>]. Context can be used in this phase; for example, if <span class="gnode">Easter&nbsp;Island</span> is a likely candidate for the corresponding mention alongside <span class="tnode">Santiago</span>, we may boost the probability that this mention refers to the Chilean capital as both candidates are located in Chile. 
Other heuristics for disambiguation consider a prior probability, where for example, <span class="tnode">Santiago</span> most often refers to the Chilean capital (being, e.g., the largest city with that name); centrality measures on the knowledge graph can be used for such purposes&nbsp;[<a href="#ref-WuHH18">Wu et al., 2018</a>].</p>
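		<p>The disambiguation phase can be illustrated with a toy scorer that combines a prior with a simple context signal. Everything in the sketch below is an assumption made for illustration: the node identifiers, the prior values, the use of a shared country as the only context feature, and the equal weighting of prior and context. Practical EL systems use far richer features and learnt rankers.</p>

```python
# All identifiers, priors and locations below are invented for illustration.
candidates = {
    "Santiago": ["Santiago_CL", "Santiago_CU", "Santiago_ES"],
    "Rapa Nui": ["Easter_Island"],
}
prior = {"Santiago_CL": 0.5, "Santiago_CU": 0.3,
         "Santiago_ES": 0.2, "Easter_Island": 1.0}
country = {"Santiago_CL": "Chile", "Santiago_CU": "Cuba",
           "Santiago_ES": "Spain", "Easter_Island": "Chile"}

def coherence(node, other_mentions):
    """Fraction of the other mentions with a candidate in the same country as node."""
    if not other_mentions:
        return 0.0
    hits = sum(
        any(country[c] == country[node] for c in candidates.get(m, []))
        for m in other_mentions
    )
    return hits / len(other_mentions)

def link(mentions, context_weight=0.5):
    """For each mention, pick the candidate with the best mix of prior and context."""
    links = {}
    for mention in mentions:
        others = [m for m in mentions if m != mention]
        scored = {
            node: (1 - context_weight) * prior[node]
                  + context_weight * coherence(node, others)
            for node in candidates.get(mention, [])
        }
        if scored:
            links[mention] = max(scored, key=scored.get)
    return links

links = link(["Santiago", "Rapa Nui"])
```

		<p>With the mention <span class="tnode">Rapa&nbsp;Nui</span> in the same text, the Chilean candidate for <span class="tnode">Santiago</span> receives both the highest prior and the highest context score; mentions with no candidate are simply left unlinked, and could be treated as emerging entities.</p>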

		<h4 id="sssec-er" class="subsection">Relation Extraction (RE)</h4>
		<p>The RE task extracts relations between entities in the text&nbsp;[<a href="#ref-ZhouSZZ05">Zhou et al., 2005</a>, <a href="#ref-BachB07">Bach and Badaskar, 2007</a>]. The simplest case is that of extracting binary relations in a <em>closed setting</em> wherein a fixed set of relation types are considered. While traditional approaches often relied on manually-crafted patterns&nbsp;[<a href="#ref-Hearst92">Hearst, 1992</a>], modern approaches rather tend to use learning-based frameworks&nbsp;[<a href="#ref-RollerKN18">Roller et al., 2018</a>], including supervised methods over manually-labelled examples&nbsp;[<a href="#ref-BunescuM05">Bunescu and Mooney, 2005</a>, <a href="#ref-ZhouSZZ05">Zhou et al., 2005</a>]. Other learning-based approaches again use bootstrapping&nbsp;[<a href="#ref-EtzioniCDKPSSWY04">Etzioni et al., 2004</a>, <a href="#ref-BunescuM07">Bunescu and Mooney, 2007</a>] and distant supervision&nbsp;[<a href="#ref-MintzBSJ09">Mintz et al., 2009</a>, <a href="#ref-RiedelYM10">Riedel et al., 2010</a>, <a href="#ref-HoffmannZLZW11">Hoffmann et al., 2011</a>, <a href="#ref-SurdeanuTNM12">Surdeanu et al., 2012</a>, <a href="#ref-XuHZG13">Xu et al., 2013</a>, <a href="#ref-SmirnovaC19">Smirnova and Cudré-Mauroux, 2019</a>] to forgo the need for manual labelling; the former requires a subset of manually-labelled seed examples, while the latter finds sentences in a large corpus of text mentioning pairs of entities with a known relation/edge, which are used to learn patterns for that relation. 
Binary RE can also be applied using unsupervised methods in an open setting – often referred to as <em>Open Information Extraction</em> (<em>OIE</em>)&nbsp;[<a href="#ref-BankoCSBE07">Banko et al., 2007</a>, <a href="#ref-EtzioniFCSM11">Etzioni et al., 2011</a>, <a href="#ref-FaderSE11">Fader et al., 2011</a>, <a href="#ref-MausamSSBE12">Mausam et al., 2012</a>, <a href="#ref-Mausam16">Mausam, 2016</a>, <a href="#ref-MitchellCHTYBCM18">Mitchell et al., 2018</a>] – whereby the set of target relations is not pre-defined but rather extracted from text based on, for example, dependency parse trees from which relations are taken.</p>
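		<p>The idea behind distant supervision for RE can be sketched as follows, under strong simplifying assumptions: entity mentions are matched by exact string, a "pattern" is just the text between two mentions, and the corpus and known edge are invented. Actual systems instead train classifiers over lexical and syntactic features and must cope with noisy matches.</p>

```python
from collections import Counter

# Hypothetical inputs: one known edge from the knowledge graph and a tiny corpus.
known_edges = {("Santiago", "Easter Island"): "flight"}
corpus = [
    "Santiago has flights to Easter Island every day.",
    "Santiago has flights to Arica in the north.",
    "Arica is far from Easter Island.",
]
entities = ["Santiago", "Easter Island", "Arica"]

def between(sentence, a, b):
    """Return the text between mention a and a later mention b, else None."""
    _, found_a, tail = sentence.partition(a)
    if not found_a:
        return None
    middle, found_b, _ = tail.partition(b)
    if not found_b:
        return None
    return middle.strip()

# Step 1: sentences mentioning a pair with a known edge yield patterns for its label.
patterns = Counter()
for (a, b), label in known_edges.items():
    for sentence in corpus:
        text = between(sentence, a, b)
        if text:
            patterns[(text, label)] += 1

# Step 2: apply the learnt patterns to other entity pairs in the corpus.
extracted = set()
for sentence in corpus:
    for a in entities:
        for b in entities:
            if a == b or (a, b) in known_edges:
                continue
            text = between(sentence, a, b)
            for pattern, label in patterns:
                if text == pattern:
                    extracted.add((a, label, b))
```

		<p>Here the known <span class="gelab">flight</span> edge yields the pattern “has flights to”, which in turn proposes a new <span class="gelab">flight</span> edge from Santiago to Arica; the third sentence matches no learnt pattern and yields nothing.</p>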
		<p>A variety of RE methods have been proposed to extract \(n\)-ary relations that capture further context for how entities are related. In Figure&nbsp;<a href="#fig-textExtract">6.1</a>, we see how an \(n\)-ary relation captures additional temporal context, denoting when Rapa Nui was named a World Heritage site; in this case, an anonymous node is created to represent the higher-arity relation in the directed edge-labelled graph. Various methods for \(n\)-ary RE are based on <em>frame semantics</em>&nbsp;[<a href="#ref-fillmore1976frame">Fillmore, 1976</a>], which, for a given verb (e.g., “<em>named</em>”), captures the entities involved and how they may be interrelated. Resources such as FrameNet&nbsp;[<a href="#ref-framenet">Baker et al., 1998</a>] then define frames for words, which, for example, may identify that the semantic frame for “<em>named</em>” includes a <em>speaker</em> (the person naming something), an <em>entity</em> (the thing named) and a <em>name</em>. Optional frame elements are an <em>explanation</em>, a <em>purpose</em>, a <em>place</em>, a <em>time</em>, etc., that may add context to the relation. Other RE methods are rather based on <em>Discourse Representation Theory</em> (<em>DRT</em>)&nbsp;[<a href="#ref-Kamp1981ATheoryOfTruth">Kamp, 1981</a>], which considers a logical representation of text based on existential events. Under this theory, for example, the naming of Easter Island as a World Heritage Site is considered to be an (existential) event where Easter Island is the <em>patient</em> (the entity affected), leading to the logical (neo-Davidsonian) formula:</p>
		
		<p class="mathblock">\( \exists e: \big(\)naming\((e),\) patient\((e,\) <span class="tnode">Easter&nbsp;Island</span>\(),\) name\((e,\) <span class="tnode">World&nbsp;Heritage&nbsp;Site</span>\()\big) \)</p>

		<p>Such a formula is analogous to reification, as discussed previously in Section&nbsp;<a href="#ssec-knowledgeContext">3.3</a>, where \(e\) is an existential term that refers to the \(n\)-ary relation being extracted.</p>
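		<p>As a minimal sketch of this analogy, the following Python snippet reifies an \(n\)-ary relation as edges around a fresh event node, mirroring the formula above. The function name <code>reify_event</code>, the <code>_:e</code> identifier scheme, and the use of a <code>type</code> edge to record the kind of event are illustrative assumptions rather than part of any particular RE system.</p>

```python
from itertools import count

# Counter for fresh (anonymous) event node identifiers; the "_:e" prefix is
# an illustrative convention only.
_event_ids = count(1)


def reify_event(event_type, roles):
    """Return edges (source, label, target) describing one n-ary relation.

    `roles` maps a role name (e.g. "patient") to the entity filling it.
    """
    event = f"_:e{next(_event_ids)}"          # the existential term e
    edges = [(event, "type", event_type)]     # corresponds to naming(e)
    for role, entity in roles.items():
        edges.append((event, role, entity))   # patient(e, ...), name(e, ...)
    return edges


edges = reify_event(
    "naming",
    {"patient": "Easter Island", "name": "World Heritage Site"},
)
for edge in edges:
    print(edge)
```

		<p>Further frame elements, such as a time, could be passed as additional entries of <code>roles</code> without changing the shape of the output.</p>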
		<p>Finally, while relations extracted in a closed setting are typically mapped directly to a knowledge graph, relations that are extracted in an open setting may need to be aligned with the knowledge graph; for example, if an OIE process extracts a binary relation <span class="gnode">Santiago</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">has&nbsp;flights&nbsp;to</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Easter&nbsp;Island</span>, it may be the case that the knowledge graph does not have other edges labelled <span class="gelab">has&nbsp;flights&nbsp;to</span>, where alignment may rather map such a relation to the edge <span class="gnode">Santiago</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Easter&nbsp;Island</span> assuming <span class="gelab">flight</span> is used in the knowledge graph. A variety of methods have been applied for performing such alignments, including mappings&nbsp;[<a href="#ref-CorcoglionitiRA16">Corcoglioniti et al., 2016</a>, <a href="#ref-GangemiPRNDM17">Gangemi et al., 2017</a>] and rules&nbsp;[<a href="#ref-rouces2015framebase">Rouces et al., 2015</a>] for aligning \(n\)-ary relations; distributional and dependency-based similarities&nbsp;[<a href="#ref-MoroNavigli13">Moro and Navigli, 2013</a>], association rule mining&nbsp;[<a href="#ref-dutta2014semantifying">Dutta et al., 2014</a>], Markov clustering&nbsp;[<a href="#ref-Dutta2015ESKwithOI">Dutta et al., 2015</a>] and linguistic techniques&nbsp;[<a href="#ref-Martinez-Rodriguez18">Martínez-Rodríguez et al., 2018</a>] for aligning OIE relations; amongst others.</p>

		<h4 id="sssec-joint-tasks" class="subsection">Joint tasks</h4>
		<p>Having presented the four main tasks for building knowledge graphs from text, it is important to note that frameworks do not always follow this particular sequence of tasks. A common trend, for example, is to combine interdependent tasks, jointly performing WSD and EL&nbsp;[<a href="#ref-Moroetal:14">Moro et al., 2014</a>], or NER and EL&nbsp;[<a href="#ref-LuoHLN15">Luo et al., 2015</a>, <a href="#ref-NguyenTW16">Nguyen et al., 2016</a>], or NER and RE&nbsp;[<a href="#ref-RenWHQVJAH17">Ren et al., 2017</a>, <a href="#ref-ZhengWBHZX17">Zheng et al., 2017</a>], etc., in order to mutually improve the performance of multiple tasks. For further details on extracting knowledge graphs from text we refer to the book by <a href="#ref-NLP-SW">Maynard et al. [2016]</a> and the recent survey by <a href="#ref-IESW">Martínez-Rodríguez et al. [2020]</a>.</p>
		</section>

		<section id="sssec-graphCreationSemistructured" class="section">
		<h3>Markup Sources</h3>
		<p>The Web was founded on interlinking <em>markup documents</em> wherein markers (aka <em>tags</em>) are used to separate elements of the document (typically for formatting purposes). Most documents on the Web use the HyperText Markup Language (HTML). Figure&nbsp;<a href="#fig-html">6.2</a> presents an example HTML webpage about World Heritage Sites in Chile. Other formats of markup include Wikitext used by Wikipedia, TeX for typesetting, Markdown used by Content Management Systems, etc. One approach for extracting information from markup documents – in order to create and/or enrich a knowledge graph – is to strip the markers (e.g., HTML tags), leaving only plain text upon which the techniques from the previous section can be applied. However, markup can be useful for extraction purposes, where variations of the aforementioned tasks for text extraction have been adapted to exploit such markup&nbsp;[<a href="#ref-LuBLCG13">Lu et al., 2013</a>, <a href="#ref-LockardDSE18">Lockard et al., 2018</a>, <a href="#ref-IESW">Martínez-Rodríguez et al., 2020</a>]. We can divide extraction techniques for markup documents into three main categories: general approaches that work independently of the markup used in a particular format, often based on <em>wrappers</em> that map elements of the document to the output; focussed approaches that target specific forms of markup in a document, most typically <em>web tables</em> (but sometimes also lists, links, etc.); and form-based approaches that extract the data underlying a webpage, per the notion of the <em>Deep Web</em>. These approaches can often benefit from the regularities shared by webpages of a given website; for example, intuitively speaking, while the webpage of Figure&nbsp;<a href="#fig-html">6.2</a> is about Chile, we will likely find pages for other countries following the same structure on the same website.</p>

		<figure id="fig-html">
			<pre style="float:left;width:60%;font-size:75%;"><code class="language-html">&lt;html&gt;
  &lt;head&gt;&lt;title&gt;UNESCO World Heritage Sites&lt;/title&gt;&lt;/head&gt;
  &lt;body&gt;
    &lt;h1&gt;World Heritage Sites&lt;/h1&gt;
    &lt;h2&gt;Chile&lt;/h2&gt;
    &lt;p&gt;Chile has 6 UNESCO World Heritage Sites.&lt;/p&gt;
    &lt;table border="1"&gt;
      &lt;tr&gt;&lt;th&gt;Place&lt;/th&gt;&lt;th&gt;Year&lt;/th&gt;&lt;th&gt;Criteria&lt;/th&gt;&lt;/tr&gt;
      &lt;tr&gt;&lt;td&gt;Rapa Nui&lt;/td&gt;&lt;td&gt;1995&lt;/td&gt;
        &lt;td rowspan="6"&gt;Cultural&lt;/td&gt;&lt;/tr&gt;
      &lt;tr&gt;&lt;td&gt;Churches of Chiloé&lt;/td&gt;&lt;td&gt;2000&lt;/td&gt;&lt;/tr&gt;
      &lt;tr&gt;&lt;td&gt;Historical Valparaíso&lt;/td&gt;&lt;td&gt;2003&lt;/td&gt;&lt;/tr&gt;
      &lt;tr&gt;&lt;td&gt;Saltpeter Works&lt;/td&gt;&lt;td&gt;2005&lt;/td&gt;&lt;/tr&gt;
      &lt;tr&gt;&lt;td&gt;Sewell Mining Town&lt;/td&gt;&lt;td&gt;2006&lt;/td&gt;&lt;/tr&gt;
      &lt;tr&gt;&lt;td&gt;Qhapaq Ñan&lt;/td&gt;&lt;td&gt;2014&lt;/td&gt;&lt;/tr&gt;
    &lt;/table&gt;
  &lt;/body&gt;
&lt;/html&gt;</code></pre>
			<div style="height:2em;">&nbsp;</div>
			<div id="unesco">
				<p>UNESCO World Heritage Sites</p>
				<div>
					<div class="html-h1">World Heritage Sites</div>
					<div class="html-h2">Chile</div>
					<p>Chile has 6 UNESCO World Heritage Sites.</p>
					<table border="1">
						<tr><th>Place</th><th>Year</th><th>Criteria</th></tr>
						<tr><td>Rapa Nui</td><td>1995</td>
							<td rowspan="6">Cultural</td></tr>
						<tr><td>Churches of Chiloé</td><td>2000</td></tr>
						<tr><td>Historical Valparaíso</td><td>2003</td></tr>
						<tr><td>Saltpeter Works</td><td>2005</td></tr>
						<tr><td>Sewell Mining Town</td><td>2006</td></tr>
						<tr><td>Qhapaq Ñan</td><td>2014</td></tr>
					</table>
				</div>
			</div>
			<div style="height:3.5em;">&nbsp;</div>
			<figcaption>Example markup document (HTML) with source-code (left) and formatted document (right)</figcaption>
		</figure>

		<h4 id="sssec-wrapper-based-extraction" class="subsection">Wrapper-based extraction</h4>
		<p>Many general approaches are based on <em>wrappers</em> that locate and extract the useful information directly from the markup document. While the traditional approach was to define such wrappers manually – a task for which a variety of declarative languages and tools have been defined – such approaches are brittle to changes in a website’s layout&nbsp;[<a href="#ref-FerraraMFB14">Ferrara et al., 2014</a>]. Hence other approaches allow for (semi-)automatically <em>inducing</em> wrappers&nbsp;[<a href="#ref-FlescaMM04">Flesca et al., 2004</a>]. A modern such approach – used to enrich knowledge graphs in systems such as LODIE&nbsp;[<a href="#ref-GentileZC14">Gentile et al., 2014</a>] – is to apply distant supervision, whereby EL is used to identify and link entities in the webpage to nodes in the knowledge graph such that paths in the markup that connect pairs of nodes for known edges can be extracted, ranked, and applied to other examples. Taking Figure&nbsp;<a href="#fig-html">6.2</a>, for example, distant supervision may link <span class="tnode">Rapa&nbsp;Nui</span> and <span class="tnode"><strong>World&nbsp;Heritage&nbsp;Sites</strong></span> to the nodes <span class="gnode">Easter&nbsp;Island</span> and <span class="gnode">World&nbsp;Heritage&nbsp;Site</span> in the knowledge graph using EL, and given the edge <span class="gnode">Easter&nbsp;Island</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">named</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">World&nbsp;Heritage&nbsp;Site</span> in the knowledge graph (extracted per Figure&nbsp;<a href="#fig-textExtract">6.1</a>), identify the candidate path \((x,\)<span class="markupf">td</span>\([1]^{-} \cdot \) <span class="markupf">tr</span>\(^{-} \cdot \) <span class="markupf">table</span>\(^- \cdot \) <span class="markupf">h1</span>\(,y)\) as reflecting edges of the form <span 
class="gnode">\(x\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">named</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(y\)</span>, where \(t[n]\) indicates the \(n\)<sup>th</sup> child of tag \(t\), \(t^-\) its inverse, and \(t_1 \cdot t_2\) concatenation. Finally, paths with high confidence (e.g., ones “witnessed” by many known edges in the knowledge graph) can then be used to extract novel edges, such as <span class="gnode">Qhapaq&nbsp;Ñan</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">named</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">World&nbsp;Heritage&nbsp;Site</span>, both on this page and on related pages of the website with similar structure (e.g., for other countries).</p>
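		<p>To make the candidate path concrete, the sketch below applies it to a small tree mirroring the body of the example page. The tree is built programmatically with Python's standard <code>xml.etree.ElementTree</code> module; the helper names <code>build_page</code> and <code>apply_wrapper</code> are our own, and the output pairs are still text mentions that EL would need to link to nodes of the knowledge graph. Path induction and ranking are not shown.</p>

```python
import xml.etree.ElementTree as ET


def build_page():
    """Build a small tree mirroring the body of the example page."""
    body = ET.Element("body")
    ET.SubElement(body, "h1").text = "World Heritage Sites"
    ET.SubElement(body, "h2").text = "Chile"
    table = ET.SubElement(body, "table")
    header = ET.SubElement(table, "tr")
    for name in ("Place", "Year"):
        ET.SubElement(header, "th").text = name
    for place, year in (("Rapa Nui", "1995"), ("Qhapaq Ñan", "2014")):
        row = ET.SubElement(table, "tr")
        ET.SubElement(row, "td").text = place
        ET.SubElement(row, "td").text = year
    return body


def apply_wrapper(root, label="named"):
    """Follow td[1]^- . tr^- . table^- . h1 from each first cell x to a heading y."""
    # ElementTree has no parent pointers, so we index them once.
    parent = {child: node for node in root.iter() for child in node}
    edges = []
    for row in root.iter("tr"):
        cells = row.findall("td")
        if not cells:
            continue                        # header rows only hold th cells
        x = cells[0].text                   # content of td[1]; td[1]^- leads to this row
        table = parent[row]                 # tr^-: from the row up to its table
        container = parent[table]           # table^-: from the table up to its parent
        for heading in container.findall("h1"):   # h1: down to a heading y
            edges.append((x, label, heading.text))
    return edges


for edge in apply_wrapper(build_page()):
    print(edge)
```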

		<h4 id="sssec-web-table-extraction" class="subsection">Web table extraction</h4>
		<p>Other approaches target specific types of markup, most commonly <em>web tables</em> embedded in HTML webpages. However, web tables are designed to enhance human rather than machine readability. Many web tables are used for layout and page structure (e.g., navigation bars). Those that contain data may follow different formats, such as relational tables, listings, attribute-value tables, and matrices&nbsp;[<a href="#ref-CafarellaHWWZ08">Cafarella et al., 2008</a>, <a href="#ref-CrestanP11">Crestan and Pantel, 2011</a>]. A first step is to classify tables to find ones appropriate for the given extraction mechanism(s)&nbsp;[<a href="#ref-CrestanP11">Crestan and Pantel, 2011</a>, <a href="#ref-EberiusBHTAL15">Eberius et al., 2015</a>]. Next, web tables may contain column spans, row spans, inner tables, or may be split vertically to improve human aesthetics. Table normalisation merges split tables, un-nests tables, transposes tables, etc.&nbsp;[<a href="#ref-PivkCSGRS07">Pivk et al., 2007</a>, <a href="#ref-CafarellaHWWZ08">Cafarella et al., 2008</a>, <a href="#ref-CrestanP11">Crestan and Pantel, 2011</a>, <a href="#ref-DengJLLY13">Deng et al., 2013</a>, <a href="#ref-ErmilovN16">Ermilov and Ngonga Ngomo, 2016</a>, <a href="#ref-LehmbergRMB16">Lehmberg et al., 2016</a>]. Some approaches then identify the table <em>protagonist</em>&nbsp;[<a href="#ref-CrestanP11">Crestan and Pantel, 2011</a>, <a href="#ref-MunozHM14">Muñoz et al., 2014</a>] – the main entity that the table describes – often found elsewhere in the webpage; for example, though not mentioned by the table of Figure&nbsp;<a href="#fig-html">6.2</a>, <span class="tnode">World&nbsp;Heritage&nbsp;Sites</span> is its protagonist.
Finally, extraction processes may associate cells with entities&nbsp;[<a href="#ref-LimayeSC10">Limaye et al., 2010</a>, <a href="#ref-MulwadFJ13">Mulwad et al., 2013</a>], columns with types&nbsp;[<a href="#ref-DengJLLY13">Deng et al., 2013</a>, <a href="#ref-LimayeSC10">Limaye et al., 2010</a>, <a href="#ref-MulwadFJ13">Mulwad et al., 2013</a>], and column pairs with relations&nbsp;[<a href="#ref-LimayeSC10">Limaye et al., 2010</a>, <a href="#ref-MunozHM14">Muñoz et al., 2014</a>]. When enriching knowledge graphs, recent approaches apply distant supervision, linking cells to knowledge graph nodes in order to generate candidates for type and relation extraction&nbsp;[<a href="#ref-LimayeSC10">Limaye et al., 2010</a>, <a href="#ref-MulwadFJ13">Mulwad et al., 2013</a>, <a href="#ref-MunozHM14">Muñoz et al., 2014</a>]. Statistical distributions can also help to link numerical columns&nbsp;[<a href="#ref-NeumaierUPP16">Neumaier et al., 2016</a>]. Specialised table extraction frameworks have also been proposed for specific websites, where prominent knowledge graphs, such as DBpedia&nbsp;[<a href="#ref-LehmannIJJKMHMK15">Lehmann et al., 2015</a>] and YAGO&nbsp;[<a href="#ref-suchanek2008yago">Suchanek et al., 2008</a>] focus on extraction from info-box tables in Wikipedia.</p>
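		<p>As one small illustration of table normalisation, the following sketch expands row spans so that every row of the example table carries its own <em>Criteria</em> value. It is a simplification under stated assumptions: cells are given as hypothetical <code>(text, rowspan)</code> pairs already read from the markup, only row spans are handled, and rows are assumed to leave no gaps before a spanning column.</p>

```python
def expand_rowspans(rows):
    """Expand cells given as (text, rowspan) pairs into a rectangular grid.

    Only row spans are handled; column spans, nested tables and split
    tables would need further steps.
    """
    grid = []
    pending = {}                       # column index -> [text, rows still to fill]
    for row in rows:
        cells = iter(row)
        out = []
        col = 0
        while True:
            if col in pending:
                # A cell from an earlier row spans into this position.
                text, left = pending[col]
                out.append(text)
                if left == 1:
                    del pending[col]
                else:
                    pending[col][1] = left - 1
            else:
                cell = next(cells, None)
                if cell is None:
                    break              # no more cells in this row
                text, span = cell
                out.append(text)
                if span != 1:
                    pending[col] = [text, span - 1]
            col += 1
        grid.append(out)
    return grid


TABLE = [
    [("Place", 1), ("Year", 1), ("Criteria", 1)],
    [("Rapa Nui", 1), ("1995", 1), ("Cultural", 6)],
    [("Churches of Chiloé", 1), ("2000", 1)],
    [("Historical Valparaíso", 1), ("2003", 1)],
    [("Saltpeter Works", 1), ("2005", 1)],
    [("Sewell Mining Town", 1), ("2006", 1)],
    [("Qhapaq Ñan", 1), ("2014", 1)],
]
GRID = expand_rowspans(TABLE)
for line in GRID:
    print(line)
```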

		<h4 id="sssec-deep-web-crawling" class="subsection">Deep Web crawling</h4>
		<p>The <em>Deep Web</em> presents a rich source of information accessible only through searches on web forms, thus requiring <em>Deep Web crawling</em> techniques to access&nbsp;[<a href="#ref-MadhavanKKGRH08">Madhavan et al., 2008</a>]. Systems have been proposed to extract knowledge graphs from Deep Web sources&nbsp;[<a href="#ref-GellerCA08">Geller et al., 2008</a>, <a href="#ref-LehmannFGNSSUBGHLA12">Lehmann et al., 2012</a>, <a href="#ref-CollaranaG0GVA16">Collarana et al., 2016</a>]. Approaches typically attempt to generate sensible form inputs – which may be based on a user query or generated from reference knowledge – and then extract data from the generated responses (markup documents) using the aforementioned techniques&nbsp;[<a href="#ref-GellerCA08">Geller et al., 2008</a>, <a href="#ref-LehmannFGNSSUBGHLA12">Lehmann et al., 2012</a>, <a href="#ref-CollaranaG0GVA16">Collarana et al., 2016</a>].</p>
		</section>

		<section id="sssec-graphCreationStructured" class="section">
		<h3>Structured Sources</h3>
		<p>Much of the legacy data available within organisations and on the Web is represented in structured formats, primarily tables – in the form of relational databases, CSV files, etc. – but also tree-structured formats such as JSON, XML etc. Unlike text and markup documents, structured sources can often be <em>mapped</em> to knowledge graphs whereby the structure is (precisely) transformed according to a mapping rather than (imprecisely) extracted. The mapping process involves two steps: 1) create a mapping from the source to a graph, and 2) use the mapping in order to materialise the source data as a graph or to virtualise the source (creating a graph view over the legacy data).</p>

		<h4 id="sssec-mapping-from-tables" class="subsection">Mapping from tables</h4>
		<p>Tabular sources of data are prevalent; for example, the structured content underlying many organisations and websites is housed in relational databases. In Figure&nbsp;<a href="#fig-rdbCrime">6.3</a> we present an example of a relational database instance that we wish to integrate into our knowledge graph. There are then two approaches for mapping content from tables to knowledge graphs: a <em>direct mapping</em>, and a <em>custom mapping</em>.</p>

		<figure id="fig-rdbCrime">
			<div id="report">
				<p>Report</p>
				<table class="condensedTable">
					<tr>
						<th>crime</th>
						<th>claimant</th>
						<th>station</th>
						<th>date</th>
					</tr>
					<tr>
						<td>Pickpocketing</td>
						<td>XY12SDA</td>
						<td>Viña del Mar</td>
						<td>2019-04-12</td>
					</tr>
					<tr>
						<td>Assault</td>
						<td>AB9123N</td>
						<td>Arica</td>
						<td>2019-04-12</td>
					</tr>
					<tr>
						<td>Pickpocketing</td>
						<td>XY12SDA</td>
						<td>Rapa Nui</td>
						<td>2019-04-12</td>
					</tr>
					<tr>
						<td>Fraud</td>
						<td>FI92HAS</td>
						<td>Arica</td>
						<td>2019-04-13</td>
					</tr>
				</table>
			</div>
			<div style="height:1em;">&nbsp;</div>
			<div id="claimant">
				<p>Claimant</p>
				<table class="condensedTable">
					<tr>
						<th style="text-decoration:underline;">id</th>
						<th>name</th>
						<th>country</th>
					</tr>
					<tr>
						<td>XY12SDA</td>
						<td>John Smith</td>
						<td>U.S.</td>
					</tr>
					<tr>
						<td>AB9123N</td>
						<td>Joan Dubois</td>
						<td>France</td>
					</tr>
					<tr>
						<td>XI92HAS</td>
						<td>Jorge Hernández</td>
						<td>Chile</td>
					</tr>
				</table>
			</div>
			<div style="height:1em;">&nbsp;</div>
			<figcaption>Relational database instance with two tables describing crime data</figcaption>
		</figure>
		<figure id="fig-direct">
			<img src="images/fig-direct.svg" alt="Direct mapping result for the first rows of both tables in Figure&nbsp;6.3"/>
			<figcaption>Direct mapping result for the first rows of both tables in Figure&nbsp;<a href="#fig-rdbCrime">6.3</a></figcaption>
		</figure>

		<p>A direct mapping automatically generates a graph from a table. We present in Figure&nbsp;<a href="#fig-direct">6.4</a> the result of a standard direct mapping&nbsp;[<a href="#ref-dm">Arenas et al., 2012</a>], which creates an edge <span class="gnode">x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">y</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">z</span> for each (non-header, non-empty, non-<span class="sc">null</span>) cell of the table, such that <span class="gnode">x</span> represents the row of the cell, <span class="gelab">y</span> the column name of the cell, and <span class="gnode">z</span> the value of the cell. In particular, <span class="gnode">x</span> typically encodes the values of the primary key for a row (e.g., <strong><code>Claimant</code></strong>.<strong class="underline"><code>id</code></strong>); otherwise, if no primary key is defined (e.g., per the <strong><code>Report</code></strong> table), <span class="gnode">x</span> can be an anonymous node or a node based on the row number. The node <span class="gnode">x</span> and edge label <span class="gelab">y</span> further encode the name of the table to avoid clashes across tables that have the same column names used with different meanings. For each row <span class="gnode">x</span>, we may add a type edge based on the name of its table. The value <span class="gnode">z</span> may be mapped to datatype values in the corresponding graph model based on the source domain (e.g., a value in an SQL column of type <code>Date</code> can be mapped to <code>xsd:date</code> in the RDF data model). 
If the value is <span class="sc">null</span> (or empty), typically the corresponding edge will be omitted.<sup class="fnmark" id="fnm30"><a href="#fn30">30</a></sup><span class="footnote" id="fn30"><sup><a href="#fnm30">note 30</a></sup> One might consider representing <span class="sc">null</span>s with anonymous/blank nodes. However, <span class="sc">null</span>s in SQL can be used to mean that there is no such value, which conflicts with the existential semantics of such nodes (e.g., in RDF).</span> With respect to Figure&nbsp;<a href="#fig-direct">6.4</a>, we highlight the difference between the nodes <span class="gnode">Claimant-XY12SDA</span> and <span class="gnode">XY12SDA</span>, where the former denotes the row (or entity) identified by the latter primary key value. In case of a foreign key between two tables – such as <strong>Report.claimant</strong> referencing <strong>Claimant.<span class="underline">id</span></strong> – we can link, for example, to <span class="gnode">Claimant-XY12SDA</span> rather than <span class="gnode">XY12SDA</span>, where the former node also has the name and country of the claimant. A direct mapping along these lines has been standardised for mapping relational databases to RDF&nbsp;[<a href="#ref-dm">Arenas et al., 2012</a>], where <a href="#ref-StoicaFS19">Stoica et al. [2019]</a> have recently proposed an analogous direct mapping for property graphs. Another direct mapping has been defined for CSV and other tabular data&nbsp;[<a href="#ref-csvweb">Tandy et al., 2015</a>] that further allows for specifying column names, primary/foreign keys, and data types – which are often missing in such data formats – as part of the mapping itself.</p>
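		<p>The cell-by-cell behaviour described above can be sketched in a few lines of Python. This is an illustration of the idea only, not an implementation of the standardised direct mapping: the function name <code>direct_mapping</code> and the <code>Table-column</code> form of the edge labels are assumptions, foreign keys are not resolved, and values are not mapped to datatypes.</p>

```python
def direct_mapping(table, columns, rows, primary_key=None):
    """Generate edges (x, y, z) for each non-empty, non-null cell of a table.

    x identifies the row (primary key value if available, else row number),
    y is the column name, and z the cell value; x and y are prefixed with
    the table name to avoid clashes across tables.
    """
    edges = []
    for number, row in enumerate(rows, start=1):
        record = dict(zip(columns, row))
        key = record[primary_key] if primary_key else number
        x = f"{table}-{key}"
        edges.append((x, "type", table))            # type edge based on the table name
        for column, value in record.items():
            if value is None or value == "":
                continue                             # nulls and empty cells yield no edge
            edges.append((x, f"{table}-{column}", value))
    return edges


claimant_edges = direct_mapping(
    "Claimant",
    ["id", "name", "country"],
    [("XY12SDA", "John Smith", "U.S.")],
    primary_key="id",
)
report_edges = direct_mapping(
    "Report",
    ["crime", "claimant", "station", "date"],
    [("Pickpocketing", "XY12SDA", "Viña del Mar", "2019-04-12")],
)
for edge in claimant_edges + report_edges:
    print(edge)
```

		<p>Since no primary key is given for <code>Report</code>, the sketch falls back to the row number, yielding a row node such as <code>Report-1</code>.</p>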
		<p>Although a direct mapping can be applied automatically on tabular sources of data and preserve the information of the original source – i.e., allowing a deterministic inverse mapping that reconstructs the tabular source from the output graph&nbsp;[<a href="#ref-SequedaAM12">Sequeda et al., 2012</a>] – in many cases it is desirable to customise a mapping, such as to align edge labels or nodes with a knowledge graph under enrichment, etc. Along these lines, declarative mapping languages allow for manually defining custom mappings from tabular sources to graphs. A standard language along these lines is the RDB2RDF Mapping Language (R2RML)&nbsp;[<a href="#ref-r2rml">Das et al., 2012</a>], which allows for mapping from individual rows of a table to one or more custom edges, with nodes and edges defined either as constants, as individual cell values, or using templates that concatenate multiple cell values from a row and static substrings into a single term; for example, a template <code>{id}-{country}</code> may produce nodes such as <span class="gnode">XY12SDA-U.S.</span> from the <strong>Claimant</strong> table. In case that the desired output edges cannot be defined from a single row, R2RML allows for (SQL) queries to generate tables from which edges can be extracted where, for example, edges such as <span class="gnode">U.S.</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">crimes</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">2</span> can be generated by defining the mapping with respect to a query that joins the <strong><code>Report</code></strong> and <strong><code>Claimant</code></strong> tables on <code><strong>claimant</strong>=<strong>id</strong></code>, grouping by <code><strong>country</strong></code>, and applying a count for each country group. 
A mapping can then be defined on the results table such that the source node denotes the value of <code><strong>country</strong></code>, the edge label is the constant <span class="gelab">crimes</span>, and the target node is the count value. An analogous standard also exists for mapping CSV and other tabular data to RDF graphs, again allowing keys, column names, and datatypes to be chosen as part of the mapping&nbsp;[<a href="#ref-csvwmeta">Tennison and Kellogg, 2015</a>].</p>
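		<p>The query-based mapping just described can be imitated with Python's standard <code>sqlite3</code> module. The sketch below is not R2RML syntax; it only mimics, under our own naming (<code>crimes_per_country</code>, <code>claimant_nodes</code>), what a mapping processor would do with the join-and-count query and with the template <code>{id}-{country}</code>, using the rows of Figure&nbsp;<a href="#fig-rdbCrime">6.3</a>.</p>

```python
import sqlite3

# In-memory database loaded with the two example tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Claimant (id TEXT PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE Report (crime TEXT, claimant TEXT, station TEXT, date TEXT);
    INSERT INTO Claimant VALUES
        ('XY12SDA', 'John Smith', 'U.S.'),
        ('AB9123N', 'Joan Dubois', 'France'),
        ('XI92HAS', 'Jorge Hernández', 'Chile');
    INSERT INTO Report VALUES
        ('Pickpocketing', 'XY12SDA', 'Viña del Mar', '2019-04-12'),
        ('Assault', 'AB9123N', 'Arica', '2019-04-12'),
        ('Pickpocketing', 'XY12SDA', 'Rapa Nui', '2019-04-12'),
        ('Fraud', 'FI92HAS', 'Arica', '2019-04-13');
""")

# Join the tables on claimant = id, group by country, and count per group.
QUERY = """
    SELECT c.country, COUNT(*) AS crimes
    FROM Report r JOIN Claimant c ON r.claimant = c.id
    GROUP BY c.country
    ORDER BY c.country
"""


def crimes_per_country(connection):
    """Map each result row to an edge: country -crimes-> count."""
    return [(country, "crimes", total)
            for country, total in connection.execute(QUERY)]


def claimant_nodes(connection, template="{id}-{country}"):
    """Build node names from a template over cell values of each row."""
    rows = connection.execute("SELECT id, country FROM Claimant ORDER BY id")
    return [template.format(id=ident, country=country) for ident, country in rows]


print(crimes_per_country(conn))
print(claimant_nodes(conn))
```

		<p>Note that the inner join only counts reports whose <code>claimant</code> value matches an <code>id</code> in the <code>Claimant</code> table as given in the figure.</p>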
		<p>Once the mappings have been defined, one option is to use them to <em>materialise</em> graph data following an <em>Extract-Transform-Load</em> (<em>ETL</em>) approach, whereby the tabular data are transformed and explicitly serialised as graph data using the mapping. A second option is to use <em>virtualisation</em> through a <em>Query Rewriting</em> (<em>QR</em>) approach, whereby queries on the graph (using, e.g., SPARQL, Cypher, etc.) are translated to queries over the tabular data (typically using SQL). Comparing these two options, ETL allows the graph data to be used as if they were any other data in the knowledge graph. However, ETL requires updates to the underlying tabular data to be explicitly propagated to the knowledge graph, whereas a QR approach only maintains one copy of data to be updated. The area of <em>Ontology-Based Data Access</em> (<em>OBDA</em>)&nbsp;[<a href="#ref-XiaoCKLPRZ18">Xiao et al., 2018</a>] is concerned with QR approaches that support ontological entailments as seen in Chapter&nbsp;<a href="#chap-deductive">4</a>. Although most QR approaches only support non-recursive entailments expressible as a single (non-recursive) query, some QR approaches support recursive entailments through rewritings to recursive queries&nbsp;[<a href="#ref-SequedaAM14">Sequeda et al., 2014</a>].</p>

		<h4 id="sssec-mapping-from-trees" class="subsection">Mapping from trees</h4>
		<p>A number of popular data formats are based on trees, including XML and JSON. While one could imagine – leaving aside issues such as the ordering of children in a tree – a trivial direct mapping from trees to graphs by simply creating edges of the form <span class="gnode">\(x\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">child</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(y\)</span> for each node \(y\) that is a child of \(x\) in the source tree, such an approach is not typically used, as it represents the literal structure of the source data. Instead, the content of tree-structured data can be more naturally represented as a graph using a custom mapping. Along these lines, the GRDDL standard&nbsp;[<a href="#ref-grddl">Connolly, 2007</a>] allows for mapping from XML to (RDF) graphs, while languages such as RML allow for mapping from a variety of formats, including XML and JSON, to (RDF) graphs&nbsp;[<a href="#ref-DimouSSSMKW14">Dimou et al., 2014</a>]. In contrast, hybrid query languages such as XSPARQL&nbsp;[<a href="#ref-BishofDKLP12">Bischof et al., 2012</a>] allow for querying XML and RDF in unison, thus supporting both materialisation and virtualisation of graphs over tree-structured sources of legacy data.</p>
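		<p>The contrast between the trivial and the custom mapping can be seen on a small, hypothetical JSON document. In the sketch below, <code>child_edges</code> mirrors the literal tree structure, while <code>site_edges</code> is a hand-written custom mapping for this one document shape; neither uses GRDDL or RML, and the document, function names and edge labels are assumptions made for illustration.</p>

```python
import json
from itertools import count

# Hypothetical tree-structured source describing two sites.
DOC = json.loads(
    '{"country": "Chile", "sites": ['
    '{"name": "Rapa Nui", "year": 1995}, '
    '{"name": "Qhapaq Ñan", "year": 2014}]}'
)


def child_edges(tree):
    """Trivial mapping: one `child` edge per parent-child pair in the tree."""
    ids = count(1)
    edges = []

    def visit(value):
        if isinstance(value, dict):
            children = list(value.values())
        elif isinstance(value, list):
            children = value
        else:
            return value                    # leaf: the literal value itself
        node = f"n{next(ids)}"              # anonymous node for an inner tree node
        for child in children:
            edges.append((node, "child", visit(child)))
        return node

    visit(tree)
    return edges


def site_edges(doc):
    """Custom mapping: edges that describe the sites rather than the tree."""
    edges = []
    for site in doc["sites"]:
        edges.append((site["name"], "country", doc["country"]))
        edges.append((site["name"], "year", site["year"]))
    return edges


print(child_edges(DOC))
print(site_edges(DOC))
```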

		<h4 id="sssec-mapping-from-other" class="subsection">Mapping from other knowledge graphs</h4>
		<p>We may also leverage existing knowledge graphs in order to construct or enrich another knowledge graph. For example, a large number of points of interest for the Chilean tourist board may be available in existing knowledge graphs such as BabelNet&nbsp;[<a href="#ref-NavigliPonzetto:12">Navigli and Ponzetto, 2012</a>], DBpedia&nbsp;[<a href="#ref-LehmannIJJKMHMK15">Lehmann et al., 2015</a>], LinkedGeoData&nbsp;[<a href="#ref-StadlerLHA12">Stadler et al., 2012</a>], Wikidata&nbsp;[<a href="#ref-VrandecicK14">Vrandečić and Krötzsch, 2014</a>], YAGO&nbsp;[<a href="#ref-YAGO">Hoffart et al., 2011</a>], etc. However, not all entities and/or relations may be of interest. A standard option to extract a relevant sub-graph of data is to use construct queries that generate graphs as output&nbsp;[<a href="#ref-neumaier2018enabling">Neumaier and Polleres, 2019</a>]. Entity and schema alignment between the knowledge graphs may be further necessary to better integrate (parts of) external knowledge graphs, using linking tools for graphs, external identifiers&nbsp;[<a href="#ref-pellissier2016freebase">Pellissier Tanon et al., 2016</a>], or indeed may be done manually&nbsp;[<a href="#ref-pellissier2016freebase">Pellissier Tanon et al., 2016</a>]. For instance, Wikidata&nbsp;[<a href="#ref-VrandecicK14">Vrandečić and Krötzsch, 2014</a>] uses Freebase&nbsp;[<a href="#ref-bollacker2007freebase">Bollacker et al., 2007b</a>, <a href="#ref-pellissier2016freebase">Pellissier Tanon et al., 2016</a>] as a source; <a href="#ref-gottschalk2018eventkg">Gottschalk and Demidova [2018]</a> extract an event-centric knowledge graph from Wikidata, DBpedia and YAGO; while <a href="#ref-neumaier2018enabling">Neumaier and Polleres [2019]</a> construct a spatio-temporal knowledge graph from Geonames, Wikidata, and PeriodO&nbsp;[<a href="#ref-GoldenS16">Golden and Shaw, 2016</a>] (as well as tabular data).</p>
		</section>

		<section id="ssec-knowledgeConceptual" class="section">
		<h3>Schema/Ontology Creation</h3>
		<p>The discussion thus far has focussed on extracting <em>data</em> from external sources in order to create and enrich a knowledge graph. In this section, we discuss some of the principal methods for generating a schema based on external sources of data, including human knowledge. For discussion on extracting a schema from the knowledge graph itself, we refer back to Section&nbsp;<a href="#ssec-emergentSchema">3.1.3</a>. In general, much of the work in this area has focussed on the creation of ontologies using ontology engineering methodologies and/or ontology learning. We discuss these two approaches in turn.</p>

		<h4 id="sssec-ontology-engineering" class="subsection">Ontology engineering</h4>
		<p>Ontology engineering refers to the development and application of methodologies for building ontologies, proposing principled processes by which better quality ontologies can be constructed and maintained with less effort. Early methodologies&nbsp;[<a href="#ref-Gruninger1995">Grüninger and Fox, 1995a</a>, <a href="#ref-Fernandez1997">Fernández et al., 1997</a>, <a href="#ref-Noy2001">Noy and McGuinness, 2001</a>] were often based on a waterfall-like process, where requirements and conceptualisation were fixed before starting to define the ontology, using, for example, an ontology engineering tool&nbsp;[<a href="#ref-gomez2006ontological">Gómez-Pérez et al., 2006</a>, <a href="#ref-onteng">Keet, 2018</a>, <a href="#ref-kendall2019ontology">Kendall and McGuinness, 2019</a>]. However, for situations involving large or ever-evolving ontologies, more iterative and agile ways of building and maintaining ontologies have been proposed.</p>
		<p>DILIGENT&nbsp;[<a href="#ref-Pinto2009">Pinto et al., 2009</a>] was an early example of an agile methodology, proposing a complete process for ontology life-cycle management and knowledge evolution, as well as separating local changes (local views on knowledge) from global updates of the core part of the ontology, using a review process to authorise the propagation of changes from the local to the global level. This methodology is similar to how, for instance, the large clinical reference terminology SNOMED&nbsp;CT&nbsp;[<a href="#ref-snomed2019">IHTSDO, 2019</a>] (also available as an ontology) is maintained and evolved, where the (international) core terminology is maintained based on global requirements, while national or local extensions to SNOMED CT are maintained based on local requirements. A group of authors then decides which national or local extensions to propagate to the core terminology. More modern agile methodologies include eXtreme Design (XD)&nbsp;[<a href="#ref-PresuttiDGB09">Presutti et al., 2009</a>, <a href="#ref-Blomqvist2016">Blomqvist et al., 2016</a>], Modular Ontology Modelling (MOM)&nbsp;[<a href="#ref-hitzler2016modeling">Krisnadhi and Hitzler, 2016b</a>, <a href="#ref-hitzler2018tutorial">Hitzler and Krisnadhi, 2018</a>], Simplified Agile Methodology for Ontology Development (SAMOD)&nbsp;[<a href="#ref-peroni2016simplified">Peroni, 2016</a>], and more besides. Such methodologies typically include two key elements: <em>ontology requirements</em> and (more recently) <em>ontology design patterns</em>.</p>
		<p>Ontology requirements specify the intended task of the resulting ontology, or of the knowledge graph itself in conjunction with the new ontology. A common way to express ontology requirements is through <em>Competency Questions</em> (<em>CQ</em>)&nbsp;[<a href="#ref-gruninger1995role">Grüninger and Fox, 1995b</a>], which are natural language questions illustrating the typical information needs that one would require the ontology (or the knowledge graph) to respond to. Such CQs can then be complemented with additional restrictions, and reasoning requirements, in case that the ontology should also contain restrictions and general axioms for inferring new knowledge or checking data consistency. A common way of testing ontologies (or knowledge graphs based on them) is then to formalise the CQs as queries over some test set of data, and make sure the expected results are entailed&nbsp;[<a href="#ref-blomqvist2012ontology">Blomqvist et al., 2012</a>, <a href="#ref-keet2016test">Keet, 2016</a>]. We may, for example, consider the CQ “<em>What are all the events happening in Santiago?</em>”, which can be represented as a graph query <span class="gnode">Event</span><img class="tip" src="images/edge-revtip.png" width="15" alt="arrow tip leftward"/><span class="edge">type</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="gvar">?event</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">location</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santiago</span>. 
Taking the data graph of Figure&nbsp;<a href="#fig-delg">2.1</a> and the axioms of Figure&nbsp;<a href="#fig-sg">3.2</a>, we can check to see if the expected result <span class="gnode">EID15</span> is entailed by the ontology and the data, and since it is not, we may consider expanding the axioms to assert that <span class="gnode">location</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Transitive</span>.</p>
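		<p>As a minimal sketch of such a CQ-based test, the following Python fragment encodes the query as a pattern over a set of triples and compares the answers before and after treating <span class="gelab">location</span> as transitive. The three triples are a simplified, hypothetical stand-in for the relevant part of Figure&nbsp;<a href="#fig-delg">2.1</a> (the intermediate venue node and the direct <span class="gelab">type</span> edge to <span class="gnode">Event</span> are assumptions), and the function names are illustrative only.</p>

```python
def transitive_closure(triples, prop):
    """Return the triples extended with the transitive closure of `prop`."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        edges = [(s, o) for (s, p, o) in closed if p == prop]
        for (s1, o1) in edges:
            for (s2, o2) in edges:
                if o1 == s2 and (s1, prop, o2) not in closed:
                    closed.add((s1, prop, o2))
                    changed = True
    return closed


def events_in(triples, place):
    """Answer the CQ: which ?event has type Event and location `place`?"""
    typed = {s for (s, p, o) in triples if p == "type" and o == "Event"}
    located = {s for (s, p, o) in triples if p == "location" and o == place}
    return typed & located


# Hypothetical, simplified test data (an assumption, not the full figure).
data = {
    ("EID15", "type", "Event"),
    ("EID15", "location", "Santa Lucía"),
    ("Santa Lucía", "location", "Santiago"),
}
expected = {"EID15"}

before = events_in(data, "Santiago")
after = events_in(transitive_closure(data, "location"), "Santiago")
print("CQ satisfied before:", expected <= before)
print("CQ satisfied after: ", expected <= after)
```

		<p>Under these assumptions, the expected answer is missing until the transitivity axiom is applied, which is the kind of gap that a CQ-based test is meant to surface.</p>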
		<p>Ontology Design Patterns (ODPs) are another common feature of modern methodologies&nbsp;[<a href="#ref-gangemi2005ontology">Gangemi, 2005</a>, <a href="#ref-blomqvist2005patterns">Blomqvist and Sandkuhl, 2005</a>], specifying generalisable ontology modelling patterns that can be used as inspiration for modelling similar patterns, as modelling templates&nbsp;[<a href="#ref-Egana2008">Egaña et al., 2008</a>, <a href="#ref-Skjaeveland2018">Skjæveland et al., 2018</a>], or as directly reusable components&nbsp;[<a href="#ref-DagaPS08">Daga et al., 2008</a>, <a href="#ref-shimizu2019modl">Shimizu et al., 2019</a>]. Several pattern libraries have been made available online, ranging from carefully curated ones&nbsp;[<a href="#ref-Aranguren2008">Aranguren et al., 2008</a>, <a href="#ref-shimizu2019modl">Shimizu et al., 2019</a>] to open and community moderated ones&nbsp;[<a href="#ref-DagaPS08">Daga et al., 2008</a>]. As an example, to model events in our scenario, we may adopt the Core Event ontology pattern proposed by <a href="#ref-KrisnadhiH16">Krisnadhi and Hitzler [2016a]</a>, which specifies a spatio-temporal extent, sub-events, and participants of an event, along with competency questions, formal definitions, etc., to support this pattern.</p>

		<h4 id="sssec-ontology-learning" class="subsection">Ontology learning</h4>
		<p>The previous methodologies outline methods by which ontologies can be built and maintained manually. Ontology learning, in contrast, can be used to (semi-)automatically extract information from text that is useful for the ontology engineering process&nbsp;[<a href="#ref-buitelaar2005ontology">Buitelaar et al., 2005</a>, <a href="#ref-cimiano2006ontology">Cimiano, 2006</a>]. Early methods focussed on extracting terminology from text that may represent the relevant domain’s classes; for example, from a collection of text documents about tourism, a terminology extraction tool – using measures of <em>unithood</em> that determine how cohesive an \(n\)-gram is as a unitary phrase, and <em>termhood</em> that determine how relevant the phrase is to a domain&nbsp;[<a href="#ref-Martinez-Rodriguez18">Martínez-Rodríguez et al., 2018</a>] – may identify \(n\)-grams such as “<span class="sf">visitor visa</span>”, “<span class="sf">World Heritage Site</span>”, “<span class="sf">off-peak rate</span>”, etc., as terminology of particular importance to the tourist domain that thus may merit inclusion in such an ontology. Ontological axioms may also be extracted from text. 
A common target is to extract sub-class axioms from text, leveraging patterns based on modifying nouns and adjectives that incrementally specialise concepts (e.g., extracting <span class="gnode">Visitor&nbsp;Visa</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">subc.&nbsp;of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Visa</span> from the noun phrase “<span class="sf">visitor visa</span>” and isolated appearances of “<span class="sf">visa</span>” elsewhere), or using Hearst patterns&nbsp;[<a href="#ref-Hearst92">Hearst, 1992</a>] (e.g., extracting <span class="gnode">Off-Peak&nbsp;Rate</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">subc.&nbsp;of</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Discount</span> from “<span class="sf">many <span class="underline">discounts, such as off-peak rates</span>, are available</span>” based on the pattern “<span class="sf underline">X, such as Y</span>”). Textual definitions can also be harvested from large texts to extract hypernym relations and induce a taxonomy from scratch&nbsp;[<a href="#ref-OntolearnReloaded13">Velardi et al., 2013</a>]. More recent works aim to extract more expressive axioms from text, including disjointness axioms&nbsp;[<a href="#ref-Volker2015">Völker et al., 2015</a>]; and axioms involving the union and intersection of classes, along with existential, universal, and qualified-cardinality restrictions&nbsp;[<a href="#ref-petrucci2016ontology">Petrucci et al., 2016</a>]. The results of an ontology learning process can then serve as input to a more general ontology engineering methodology, allowing us to validate the terminological coverage of an ontology, to identify new classes and axioms, etc.</p>
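		<p>The pattern “<span class="sf underline">X, such as Y</span>” can be sketched with a regular expression, as below. This is a deliberately naive illustration: the expression, the helper names, and the crude singularisation by stripping a trailing “s” are assumptions made for the example, whereas practical systems would rely on part-of-speech tagging and lemmatisation.</p>

```python
import re

# "X, such as Y": X is the candidate super-class, Y the candidate sub-class.
# Y may span several words and ends at the next comma, full stop, or the end.
HEARST = re.compile(r"([\w-]+), such as ([\w-]+(?: [\w-]+)*?)(?=[,.]|$)")


def singular(phrase):
    """Crude singularisation (an assumption): drop one trailing 's'."""
    return phrase[:-1] if phrase.endswith("s") else phrase


def extract_subclass_axioms(text):
    """Return (sub-class, 'subc. of', super-class) candidates found in the text."""
    return [
        (singular(m.group(2)), "subc. of", singular(m.group(1)))
        for m in HEARST.finditer(text)
    ]


sentence = "many discounts, such as off-peak rates, are available"
axioms = extract_subclass_axioms(sentence)
print(axioms)
```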
		</section>
	</section>
	<section id="chap-quality" class="chapter">
		<h2>Quality Assessment</h2>
		<p>Independently of the (kinds of) source(s) from which a knowledge graph is created, the resulting initial knowledge graph will usually be incomplete, and will often contain duplicate, contradictory or even incorrect statements, especially when taken from multiple sources. After the initial creation and enrichment of a knowledge graph from external sources, a crucial step is thus to assess the <em>quality</em> of the resulting knowledge graph. By quality, we here refer to <em>fitness for purpose</em>. Quality assessment then helps to ascertain for which purposes a knowledge graph can be reliably used. Take, for instance, the sample of an initial knowledge graph created by the tourist board shown in Figure&nbsp;<a href="#fig-bad">7.1</a>. Is this knowledge graph of good quality? Does it exhibit issues that might limit the applications for which it is fit for purpose? Can we define and detect such issues? These questions are crucial to address before the knowledge graph is deployed, but they are also challenging to address in a general way.</p>

		<figure id="fig-bad">
			<img src="images/fig-bad.svg" alt="A newly created knowledge graph about events and their venues"/>
			<figcaption>A newly created knowledge graph about events and their venues</figcaption>
		</figure>

		<p>This chapter discusses (sometimes overlapping) <em>quality dimensions</em> that capture qualitative aspects of the multifaceted notion of data quality; some of these dimensions apply more generally to databases&nbsp;[<a href="#ref-BatiniRSV15">Batini et al., 2015</a>], while others are more specific to knowledge graphs&nbsp;[<a href="#ref-ZaveriRMPLA16">Zaveri et al., 2016</a>]. We further discuss <em>quality metrics</em> that provide ways to measure quantitative aspects of these dimensions. We group dimensions and metrics in a manner inspired by <a href="#ref-BatiniS16">Batini and Scannapieco [2016]</a>.</p>

		<section id="ssec-accuracy" class="section">
		<h3>Accuracy</h3>
		<p><em>Accuracy</em> refers to the extent to which entities and relations – encoded by nodes and edges in the graph – correctly represent real-life phenomena. Accuracy can be divided into three dimensions: <em>syntactic accuracy</em>, <em>semantic accuracy</em>, and <em>timeliness</em>.</p>

		<h4 id="sssec-syntactic-accuracy" class="subsection">Syntactic accuracy</h4>
		<p><em>Syntactic accuracy</em> is the degree to which the data are accurate with respect to the grammatical rules defined for the domain and/or data model. A prevalent example of syntactic inaccuracy occurs with datatype nodes, which may be incompatible with a defined range or be malformed. For example, assuming that a property <span class="gelab">start</span> is defined with the range <code>xsd:dateTime</code>, the value <span class="gnode">March&nbsp;29,&nbsp;2019</span> in Figure&nbsp;<a href="#fig-bad">7.1</a> would be incompatible with the defined range, while a value <span class="gnode">"March&nbsp;29,&nbsp;2019,&nbsp;20:00"^^xsd:dateTime</span> would be malformed (a value such as <span class="gnode">"2019-03-22T20:00:00"^^xsd:dateTime</span> is rather expected). A corresponding metric for syntactic accuracy is the ratio between the number of invalid values of a given property and the total number of values for the same property&nbsp;[<a href="#ref-ZaveriRMPLA16">Zaveri et al., 2016</a>]. Such forms of syntactic accuracy can typically be assessed using validation tools&nbsp;[<a href="#ref-Furber">Fürber and Hepp, 2011</a>, <a href="#ref-Hogan">Hogan et al., 2010</a>].</p>
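		<p>The ratio-based metric can be sketched as follows. The regular expression is a simplified approximation of the <code>xsd:dateTime</code> lexical form (it does not check, for instance, that the month lies between 1 and 12), and the list of values is hypothetical.</p>

```python
import re

# Simplified approximation of the xsd:dateTime lexical form (an assumption):
# it checks the overall layout but not the range of each component.
XSD_DATETIME = re.compile(
    r"-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?"
)


def invalid_ratio(values):
    """Ratio of invalid values to all values of one property."""
    if not values:
        return 0.0
    invalid = [v for v in values if not XSD_DATETIME.fullmatch(v)]
    return len(invalid) / len(values)


# Hypothetical lexical values of the property `start`.
start_values = [
    "March 29, 2019, 20:00",
    "2019-03-22T20:00:00",
    "2019-03-29T20:00:00",
    "2019-04-01T09:30:00Z",
]
ratio = invalid_ratio(start_values)
print(ratio)
```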

		<h4 id="sssec-semantic-accuracy" class="subsection">Semantic accuracy</h4>
		<p><em>Semantic accuracy</em> is the degree to which data values correctly represent real-world phenomena, which may be affected by imprecise extraction results, untrustworthy sources, vandalism, etc. For instance, in Figure&nbsp;<a href="#fig-bad">7.1</a>, the start of the <span class="gnode">EID15</span> event comes after the end of the event, possibly due to a typo in the year. While such a case could potentially be identified using, for example, shape-based validation, other cases might be more difficult to detect; for example, if we were to accidentally (and incorrectly) swap the venues for <span class="gnode">EID15</span> and <span class="gnode">EID17</span>, there might be no indication whatsoever in the knowledge graph that the venues are incorrect, even if we have additional schemata/ontologies/rules available. Assessing the level of semantic inaccuracy is challenging. While one option is to apply manual verification, an automatic option may be to check the stated relation against several sources&nbsp;[<a href="#ref-Lei">Lei et al., 2007</a>, <a href="#ref-EstevesRRL18">Esteves et al., 2018</a>]. An alternative is to validate the quality of the processes used to generate the knowledge graph, based on measures such as precision, possibly with the help of human experts or gold standards&nbsp;[<a href="#ref-IESW">Martínez-Rodríguez et al., 2020</a>].</p>

		<h4 id="sssec-timeliness" class="subsection">Timeliness</h4>
		<p><em>Timeliness</em> is the degree to which the knowledge graph is kept up-to-date with the real world state&nbsp;[<a href="#ref-KaferAUOH13">Käfer et al., 2013</a>]. A knowledge graph may be semantically accurate now, but may quickly become inaccurate (outdated) if no procedures are in place to keep it up-to-date in a timely manner. Considering Figure&nbsp;<a href="#fig-bad">7.1</a>, the events appear to be from years ago, and if not updated, then the knowledge graph will not be suitable for applications that wish to recommend upcoming events to users. Additionally, the meaning of some values in the graph, such as <span class="gnode">Next&nbsp;Tuesday</span> or <span class="gnode">Next&nbsp;Thursday</span> (which may have been extracted from the text of a news article, for example), will change over time, and become semantically inaccurate in the future. Similarly, the age of Santiago will quickly become outdated, where instead representing the year that the city was founded would facilitate timeliness. Timeliness can be assessed based on how frequently the knowledge graph is updated with respect to underlying sources&nbsp;[<a href="#ref-KaferAUOH13">Käfer et al., 2013</a>, <a href="#ref-RulaPPM14">Rula et al., 2014</a>], which can be done using temporal annotations of changes in the knowledge graph&nbsp;[<a href="#ref-RulaPHSM12">Rula et al., 2012</a>, <a href="#ref-RulaPRNLME19">Rula et al., 2019</a>], as well as contextual representations that capture the temporal validity of data (see Section&nbsp;<a href="#ssec-knowledgeContext">3.3</a>).</p>
		</section>

		<section id="sssec-coverage" class="section">
		<h3>Coverage</h3>
		<p>Coverage refers to avoiding the omission of domain-relevant elements, which otherwise may yield incomplete query results or entailments, biased models, etc.</p>
		
		<h4 id="sssec-completeness" class="subsection">Completeness</h4>
		<p><em>Completeness</em> refers to the degree to which all required information is present in a particular dataset. Completeness comprises the following aspects: (i) <em>schema completeness</em> refers to the degree to which the classes and properties of a schema are represented in the data graph, (ii) <em>property completeness</em> refers to the ratio of missing values for a specific property, (iii) <em>population completeness</em> refers to the percentage of all real-world entities of a particular type that are represented in the dataset, and (iv) <em>linkability completeness</em> refers to the degree to which instances in the dataset are interlinked. Taking some examples from Figure&nbsp;<a href="#fig-bad">7.1</a>, the lack of information about the fare for <span class="gnode">EID15</span> might be seen as a form of property incompleteness, while missing events held in Chile around the same time might lead to population incompleteness. Measuring completeness is non-trivial as it assumes knowledge of a hypothetical <em>ideal knowledge graph</em>&nbsp;[<a href="#ref-DarariNPR18">Darari et al., 2018</a>] that contains all the elements that the knowledge graph in question <em>should</em> have. Concrete strategies may involve comparison with gold standards that provide samples of the ideal knowledge graph (possibly based on <em>completeness statements</em>&nbsp;[<a href="#ref-DarariNPR18">Darari et al., 2018</a>]), or measuring the recall of extraction methods from complete sources&nbsp;[<a href="#ref-IESW">Martínez-Rodríguez et al., 2020</a>].</p>
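		<p>Two of these aspects can be sketched as simple ratios, assuming that a list of target entities and a gold standard are available. All data below are hypothetical, other than the fact that <span class="gnode">EID15</span> has no fare; the function names are illustrative.</p>

```python
def missing_value_ratio(triples, entities, prop):
    """Fraction of `entities` with no value for `prop` (property incompleteness)."""
    with_value = {s for (s, p, o) in triples if p == prop}
    missing = [e for e in entities if e not in with_value]
    return len(missing) / len(entities)


def population_completeness(entities, gold_standard):
    """Fraction of gold-standard entities that appear in the knowledge graph."""
    return len(set(entities) & set(gold_standard)) / len(gold_standard)


# Hypothetical data: the fare of EID17 and the gold standard are assumptions.
triples = {("EID17", "fare", "free")}
events = ["EID15", "EID17"]
gold_events = ["EID15", "EID16", "EID17", "EID18"]

print(missing_value_ratio(triples, events, "fare"))
print(population_completeness(events, gold_events))
```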

		<h4 id="sssec-representativeness" class="subsection">Representativeness</h4>
		<p><em>Representativeness</em> is a related dimension that, instead of focusing on the ratio of domain-relevant elements that are missing, rather focuses on assessing high-level <em>biases</em> in what is included/excluded from the knowledge graph&nbsp;[<a href="#ref-Baeza-Yates18">Baeza-Yates, 2018</a>]. As such, this dimension assumes that the knowledge graph is incomplete – i.e., that it is a sample of the ideal knowledge graph – and asks how biased this sample is. Biases may occur in the data, in the schema, or during reasoning&nbsp;[<a href="#ref-Janowicz0RZM18">Janowicz et al., 2018</a>]. Examples of data biases include geographic biases that under-represent entities/relations from certain parts of the world&nbsp;[<a href="#ref-Janowicz0RZM18">Janowicz et al., 2018</a>], linguistic biases that under-represent multilingual resources (e.g., labels and descriptions) for certain languages&nbsp;[<a href="#ref-KaffeePVSCP17">Kaffee et al., 2017</a>], social biases that under-represent people of particular genders or races&nbsp;[<a href="#ref-WagnerGGM16">Wagner et al., 2016</a>], and so forth. In contrast, schema biases may result from high-level definitions extracted from biased data&nbsp;[<a href="#ref-Janowicz0RZM18">Janowicz et al., 2018</a>], semantic definitions that do not cover uncommon cases, etc. Unrecognised biases may lead to adverse effects; for example, if the knowledge graph of Figure&nbsp;<a href="#fig-bad">7.1</a> has a geographic bias towards events and attractions close to Santiago city – due perhaps to the sources used for creation, the employment of curators from the city, etc. – then this may lead to tourism in and around Santiago being disproportionately promoted to the detriment of tourism elsewhere in Chile. 
Measures of representativeness may involve comparing known statistical distributions with those of the knowledge graph, for example, comparing geolocated entities with known population densities&nbsp;[<a href="#ref-Janowicz0RZM18">Janowicz et al., 2018</a>], linguistic distributions with known distributions of speakers&nbsp;[<a href="#ref-KaffeePVSCP17">Kaffee et al., 2017</a>], etc. Another more general option is to compare the knowledge graph with general statistical laws, where <a href="#ref-SouletGMS18">Soulet et al. [2018]</a> use (non-)conformance with Benford’s law<sup class="fnmark" id="fnm31"><a href="#fn31">31</a></sup><span class="footnote" id="fn31"><sup><a href="#fnm31">note 31</a></sup> Benford’s law states that the leading significant digit in many collections of numbers is more likely to be small.</span> to measure representativeness in knowledge graphs.</p>
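		<p>In its usual quantitative form, Benford’s law gives the probability of a leading digit \(d\) as \(\log_{10}(1 + 1/d)\). The following sketch compares an observed leading-digit distribution with that expectation using total variation distance. The choice of distance and the sample data are assumptions made for illustration, and are not claimed to be the measure used by Soulet et al.</p>

```python
import math
from collections import Counter


def benford_expected():
    """Expected probability of each leading digit under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit_distribution(numbers):
    """Observed distribution of leading digits over the positive integers given."""
    counts = Counter(int(str(n)[0]) for n in numbers if n > 0)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no positive integers given")
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


def benford_deviation(numbers):
    """Total variation distance between observed and expected distributions."""
    expected = benford_expected()
    observed = leading_digit_distribution(numbers)
    return 0.5 * sum(abs(observed[d] - expected[d]) for d in range(1, 10))


powers_of_two = [2 ** k for k in range(200)]   # known to follow the law closely
three_digit = list(range(100, 1000))           # every leading digit equally likely
print(benford_deviation(powers_of_two), benford_deviation(three_digit))
```

		<p>A larger deviation for, say, population counts in the knowledge graph could then hint at a skewed sample, though it would not by itself identify the source of the bias.</p>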
		</section>

		<section id="ssec-coherency" class="section">
		<h3>Coherency</h3>
		<p><em>Coherency</em> refers to how well the knowledge graph conforms to – or is coherent with – the formal semantics and constraints defined at the schema-level.</p>

		<figure id="fig-badOnt">
			<img src="images/fig-badOnt.svg" alt="An ontology for the knowledge graph of Figure&nbsp;7.1"/>
			<figcaption>An ontology for the knowledge graph of Figure&nbsp;<a href="#fig-bad">7.1</a></figcaption>
		</figure>

		<h4 id="sssec-consistency" class="subsection">Consistency</h4>
		<p><em>Consistency</em> means that a knowledge graph is free of contradictions (i.e., inconsistencies) with respect to the particular logical entailment considered. For example, if we apply the entailments defined in Table&nbsp;<a href="#tab-ontEqIneq">4.1</a> over the graph of Figure&nbsp;<a href="#fig-bad">7.1</a>, we see that the edge <span class="gnode">Santiago&nbsp;de&nbsp;Chile</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">same&nbsp;as</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santiago&nbsp;de&nbsp;Cuba</span> is inferred from both entities being the same as <span class="gnode">Santiago</span>, which generates an inconsistency with the edge <span class="gnode">Santiago&nbsp;de&nbsp;Chile</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">diff.&nbsp;from</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santiago&nbsp;de&nbsp;Cuba</span> as stated in the graph. While in this case it is evident that <span class="gnode">Santiago&nbsp;de&nbsp;Cuba</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">same&nbsp;as</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santiago</span> is semantically inaccurate (considering that the venues connected to <span class="gnode">Santiago</span> are in Chile), in other cases there may not be an obvious inaccuracy. Take, for example, the ontology defined in Figure&nbsp;<a href="#fig-badOnt">7.2</a>, combined with the graph of Figure&nbsp;<a href="#fig-bad">7.1</a>, and the ontological entailments of Tables&nbsp;<a href="#tab-ontEqIneq">4.1</a>–<a href="#tab-ontClass">4.3</a>. 
Noting that the food festival <span class="gnode">EID15</span> offers a takeaway service, according to the ontology, this entails that <span class="gnode">EID15</span> is a restaurant, a building, and a place, which is disjoint with event. However, <span class="gnode">EID15</span> is also entailed to be a festival, and then an event, generating an inconsistency. In this case there is no clear individual “error” leading to an inconsistency. Possibly the graph of Figure&nbsp;<a href="#fig-bad">7.1</a> should not use the property <span class="gelab">service</span> for a food event (though it seems a “good fit”), or perhaps the ontology of Figure&nbsp;<a href="#fig-badOnt">7.2</a> should not define the domain of the property <span class="gelab">service</span> to be a restaurant. Any ontological features in Tables&nbsp;<a href="#tab-ontEqIneq">4.1</a>–<a href="#tab-ontClass">4.3</a> with a “not” condition can give rise to inconsistencies if the negated condition is entailed. A measure of consistency can be the number of inconsistencies found in a knowledge graph, possibly sub-divided into the number of such inconsistencies identified by each semantic feature&nbsp;[<a href="#ref-BonattiHPS11">Bonatti et al., 2011</a>].</p>
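		<p>The first example, involving <span class="gelab">same as</span> and <span class="gelab">diff. from</span>, can be sketched with a union-find structure: identity edges are merged into equivalence classes, and any difference edge within one class is reported. This covers only one semantic feature and is a sketch rather than a full reasoner; the function names are illustrative.</p>

```python
def find(parent, x):
    """Return the representative of x's equivalence class (with path halving)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def identity_inconsistencies(same_as, diff_from):
    """Return the diff-from pairs that are also entailed to be the same."""
    parent = {}
    for a, b in same_as:
        parent[find(parent, a)] = find(parent, b)
    return [(a, b) for (a, b) in diff_from if find(parent, a) == find(parent, b)]


same_as = [
    ("Santiago de Chile", "Santiago"),
    ("Santiago de Cuba", "Santiago"),
]
diff_from = [("Santiago de Chile", "Santiago de Cuba")]

found = identity_inconsistencies(same_as, diff_from)
print(len(found), found)
```

		<p>The length of the returned list gives a count of inconsistencies for this feature, in line with the measure described above.</p>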

		<h4 id="sssec-validity" class="subsection">Validity</h4>
		<p><em>Validity</em> means that the knowledge graph is free of constraint violations, such as those captured by shape expressions&nbsp;[<a href="#ref-ThorntonSSGMPW19">Thornton et al., 2019</a>] (see Section&nbsp;<a href="#sssec-validating-schema">3.1.2</a>). We may, for example, specify a shape <span class="shap">City</span> whose target nodes have at most one country. Then, taking the edges <span class="gnode">Chile</span><img class="tip" src="images/edge-revtip.png" width="15" alt="arrow tip leftward"/><span class="edge">country</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="gnode">Santiago</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">country</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Cuba</span> from Figure&nbsp;<a href="#fig-bad">7.1</a>, and assuming that <span class="gnode">Santiago</span> becomes a target of <span class="shap">City</span>, we have a constraint violation. Conversely, even if we defined analogous cardinality restrictions in an ontology (e.g., even if we defined that <span class="gelab">country</span> was functional), this would not necessarily cause an inconsistency since, without UNA, we would first infer that <span class="gnode">Chile</span> and <span class="gnode">Cuba</span> refer to the same entity. Similarly, using shapes, we can more easily detect missing data; for example, we can define a shape <span class="shap">Event</span>, and require that it have at least one value for the property <span class="gelab">fare</span>. Now, if <span class="gnode">EID15</span> becomes targeted by <span class="shap">Event</span>, then we will have a constraint violation as the node has no value for <span class="gelab">fare</span>. 
Conversely, even if we defined analogous cardinality restrictions in an ontology (e.g., we defined that events have a minimum cardinality of 1 for <span class="gelab">fare</span>), this would not cause an inconsistency since, under the OWA, we would rather entail that the event <span class="ginode">EID15</span> has some fare (that is not described in the graph). Consistency and validity can thus indicate different types of issues. A straightforward measure of validity is to count the number of violations per constraint.</p>
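		<p>The two shapes discussed above can be sketched as cardinality bounds that are checked directly against the edges present in the graph, counting violations per constraint. The dictionary-based encoding of shapes and targets is an assumption made for this example and does not follow the syntax of any particular shapes language.</p>

```python
from collections import Counter


def count_values(triples, node, prop):
    """Number of edges labelled `prop` leaving `node`."""
    return sum(1 for (s, p, o) in triples if s == node and p == prop)


def count_violations(triples, shapes, targets):
    """Count violations per (shape, property) constraint."""
    found = Counter()
    for shape, nodes in targets.items():
        for prop, (lo, hi) in shapes[shape].items():
            for node in nodes:
                n = count_values(triples, node, prop)
                if n < lo or (hi is not None and n > hi):
                    found[(shape, prop)] += 1
    return found


triples = {
    ("Santiago", "country", "Chile"),
    ("Santiago", "country", "Cuba"),
    ("EID15", "type", "Food Festival"),
}
# Each constraint is a (minimum, maximum) pair; None means unbounded.
shapes = {
    "City": {"country": (0, 1)},
    "Event": {"fare": (1, None)},
}
targets = {"City": ["Santiago"], "Event": ["EID15"]}

violations = count_violations(triples, shapes, targets)
print(violations)
```

		<p>Unlike the ontological reading, this check only counts the edges that are actually stated, which is why both the second country and the missing fare are flagged.</p>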
		</section>

		<section id="ssec-succinctness" class="section">
		<h3>Succinctness</h3>
		<p><em>Succinctness</em> refers to the inclusion only of relevant content (avoiding “information overload”) that is represented in a concise and intelligible manner.</p>

		<h4 id="sssec-conciseness" class="subsection">Conciseness</h4>
		<p><em>Conciseness</em> refers to avoiding schema and data elements that are irrelevant to the domain. <a href="#ref-MendesMB12">Mendes et al. [2012b]</a> distinguish <em>intensional conciseness</em> (schema level), which refers to the case when the data do not contain redundant schema elements (properties, classes, shapes, etc.), and <em>extensional conciseness</em> (data level), where the data do not describe redundant entities and relations. For example, the inclusion of a property and class for modelling jurisdictions and legal entities in the ontology of Figure&nbsp;<a href="#fig-badOnt">7.2</a> may affect the intensional conciseness of the ontology in the context of a knowledge graph about tourist events. Similarly, the inclusion of data about <span class="gnode">Santiago&nbsp;de&nbsp;Cuba</span> in our knowledge graph dedicated to tourism in Chile may affect the extensional conciseness of the knowledge graph, potentially returning irrelevant results for the given domain. In general, conciseness can be measured in terms of the ratio of properties, classes, shapes, entities, relations, etc., of relevance to the domain, which may in turn require a gold standard, or measures of domain-relevance.</p>

		<h4 id="sssec-representational-conciseness" class="subsection">Representational conciseness</h4>
		<p><em>Representational conciseness</em> refers to the extent to which content is compactly represented in the knowledge graph, which may again be intensional or extensional&nbsp;[<a href="#ref-ZaveriRMPLA16">Zaveri et al., 2016</a>]. For example, having two properties <span class="gelab">category</span> and <span class="gelab">type</span> serving the same purpose would negatively affect the intensional form of representational conciseness, while having two nodes <span class="gnode">Santiago</span> and <span class="gnode">Santiago&nbsp;de&nbsp;Chile</span> that split the data available about the capital of Chile would affect the extensional form of representational conciseness. Another example of poor representational conciseness is the unnecessary use of complex modelling constructs, such as using reification unnecessarily, or using linked lists when the order of elements is not important&nbsp;[<a href="#ref-HoganUHCPD12">Hogan et al., 2012a</a>]. An example of this is the anonymous node used in Figure&nbsp;<a href="#fig-bad">7.1</a> to represent the days on which <span class="gnode">EID17</span> starts and ends, which could rather be directly associated with the event (at least if we assume that events have one start and one end moment in time). A different example is the specification of the duration of <span class="gnode">EID15</span>, which could be calculated from the start and end values (assuming the correct datatypes were used). Though representational conciseness is challenging to assess, measures such as the number of redundant nodes can be used&nbsp;[<a href="#ref-Furber">Fürber and Hepp, 2011</a>].</p>

		<h4 id="sssec-understandability" class="subsection">Understandability</h4>
		<p><em>Understandability</em> refers to the ease with which data can be interpreted without ambiguity by human users, which involves – at least – the provision of human-readable labels and descriptions (preferably in different languages&nbsp;[<a href="#ref-KaffeePVSCP17">Kaffee et al., 2017</a>]) that allow such users to understand what is being described&nbsp;[<a href="#ref-HoganUHCPD12">Hogan et al., 2012a</a>]. Referring back to Figure&nbsp;<a href="#fig-bad">7.1</a>, though the nodes <span class="gnode">EID15</span> and <span class="gnode">EID17</span> are used to ensure unique identifiers for events, they should also be associated with labels, such as <span class="gnode">Ñam</span>. Ideally the human-readable information is sufficient to disambiguate a particular node, such as associating a description <span class="gnode">"Santiago,&nbsp;the&nbsp;capital&nbsp;of&nbsp;Chile"@en</span> with <span class="gnode">Santiago</span> to disambiguate the city from other cities with the same name. Measures of understandability may include the ratio of nodes with human-readable labels and descriptions, the uniqueness of such labels and descriptions, the languages supported, etc.</p>
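		<p>Two of these measures, label coverage and label uniqueness, can be sketched as follows. The labels assigned to the nodes other than <span class="gnode">EID15</span> are hypothetical, as are the function names.</p>

```python
def label_coverage(nodes, labels):
    """Ratio of nodes that have at least one human-readable label."""
    labelled = [n for n in nodes if labels.get(n)]
    return len(labelled) / len(nodes)


def ambiguous_labels(labels):
    """Labels shared by more than one node, which hinder disambiguation."""
    owners = {}
    for node, node_labels in labels.items():
        for label in node_labels:
            owners.setdefault(label, set()).add(node)
    return {label: ns for label, ns in owners.items() if len(ns) > 1}


nodes = ["EID15", "EID17", "Santiago", "Santiago de Cuba"]
# Hypothetical labels: EID17 has none, and two cities share one label.
labels = {
    "EID15": ["Ñam"],
    "Santiago": ["Santiago"],
    "Santiago de Cuba": ["Santiago"],
}

print(label_coverage(nodes, labels))
print(ambiguous_labels(labels))
```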
		</section>

		<section id="ssec-other-quality" class="section">
		<h3>Other Quality Dimensions</h3>
		<p>The list of quality dimensions provided here should be considered illustrative rather than complete. Further dimensions may be pertinent in the context of specific domains, applications, or graph data models. For more discussion, we refer to the survey by <a href="#ref-ZaveriRMPLA16">Zaveri et al. [2016]</a> and to the book by <a href="#ref-BatiniS16">Batini and Scannapieco [2016]</a>.</p>
		</section>
	</section>
	<section id="chap-refine" class="chapter">
		<h2>Refinement</h2>
		<p>Beyond assessing the quality of a knowledge graph, there exist techniques to <em>refine</em> the knowledge graph, in particular to (semi-)automatically complete and correct the knowledge graph&nbsp;[<a href="#ref-Paulheim17">Paulheim, 2017</a>], aka <em>knowledge graph completion</em> and <em>knowledge graph correction</em>, respectively. As distinguished from the creation and enrichment tasks outlined in Chapter&nbsp;<a href="#chap-create">6</a>, refinement typically does not involve applying extraction or mappings over external sources in order to ingest their content into a given knowledge graph, though external sources may still be used to verify its content.</p>

		<section id="ssec-completion" class="section">
		<h3>Completion</h3>
		<p>Knowledge graphs are characterised by incompleteness&nbsp;[<a href="#ref-West14">West et al., 2014</a>]. As such, knowledge graph completion aims at filling in the <em>missing edges</em> (aka <em>missing links</em>) of a knowledge graph, i.e., edges that are deemed correct but are neither given nor entailed by the knowledge graph. This task is often addressed with <em>link prediction</em> techniques proposed in the area of <em>Statistical Relational Learning</em>&nbsp;[<a href="#ref-Getoor07">Getoor and Taskar, 2007</a>], which predict the existence – or sometimes more generally, predict the probability of correctness – of missing edges. For instance, one might predict that the edge <span class="gnode">Moon&nbsp;Valley</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">bus</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">San&nbsp;Pedro</span> is a probable missing edge for the graph of Figure&nbsp;<a href="#fig-chileTransport">5.2</a>, given that most bus routes observed are return services (i.e., <span class="gelab">bus</span> is typically symmetric). Link prediction may target three settings: <em>general links</em> involving edges with arbitrary labels, e.g., <span class="gelab">bus</span>, <span class="gelab">flight</span>, <span class="gelab">type</span>, etc.; <em>type links</em> involving edges with label <span class="gelab">type</span>, indicating the type of an entity; and <em>identity links</em> involving edges with label <span class="gelab">same as</span>, indicating that two nodes refer to the same entity (cf. Section&nbsp;<a href="#sssec-external_identy">3.2.2</a>). While type and identity links can be addressed using general link prediction techniques, the particular semantics of type and identity links can be addressed with custom techniques. 
(The related task of generating links across knowledge graphs – referred to as <em>link discovery</em>&nbsp;[<a href="#ref-nentwig2017survey">Nentwig et al., 2017</a>] – will be discussed later in Section&nbsp;<a href="#ssec-principles">9.1</a>.)</p>

		<h4 id="sssec-general-link-prediction" class="subsection">General link prediction</h4>
		<p>Link prediction, in the general case, is often addressed with inductive techniques as discussed in Chapter&nbsp;<a href="#chap-inductive">5</a>, and in particular, knowledge graph embeddings and rule/axiom mining. For example, given Figure&nbsp;<a href="#fig-chileTransport">5.2</a>, using knowledge graph embeddings, we may detect that given an edge of the form <span class="gnode">\(x\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">bus</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(y\)</span>, a (missing) edge <span class="gnode">\(y\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">bus</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(x\)</span> has high plausibility, while using symbol-based approaches, we may learn the high-level rule <span class="gvar">?x</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">bus</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?y</span> \(\Rightarrow\) <span class="gvar">?y</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">bus</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar">?x</span> that may infer/predict new <span class="gelab">bus</span> links. Either approach would help us to predict the missing link <span class="gnode">Moon Valley</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">bus</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">San Pedro</span>.</p>
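		<p>The symbol-based route can be sketched by scoring the symmetry rule with a simple confidence measure (the fraction of <span class="gelab">bus</span> edges whose inverse is also present) and, if the confidence is high enough, predicting the missing inverses. The edges below are a hypothetical fragment rather than the full graph of Figure&nbsp;<a href="#fig-chileTransport">5.2</a>, and the threshold is an arbitrary assumption.</p>

```python
def symmetry_confidence(edges):
    """Fraction of edges (x, y) for which the inverse edge (y, x) also exists."""
    if not edges:
        return 0.0
    supported = [(x, y) for (x, y) in edges if (y, x) in edges]
    return len(supported) / len(edges)


def predict_inverse_links(edges, threshold):
    """Predict missing inverse edges if the symmetry rule is confident enough."""
    if symmetry_confidence(edges) < threshold:
        return set()
    return {(y, x) for (x, y) in edges if (y, x) not in edges}


# Hypothetical bus edges (an assumption made for illustration).
bus = {
    ("San Pedro", "Moon Valley"),
    ("San Pedro", "Calama"),
    ("Calama", "San Pedro"),
    ("Arica", "San Pedro"),
    ("San Pedro", "Arica"),
}

confidence = symmetry_confidence(bus)
predicted = predict_inverse_links(bus, threshold=0.7)
print(confidence, predicted)
```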

		<h4 id="sssec-type-link-prediction" class="subsection">Type-link prediction</h4>
		<p>Type links are of particular importance to a knowledge graph, where dedicated techniques can be leveraged taking into account the specific semantics of such links. In the case of type prediction, there is only one edge label (<span class="gelab">type</span>) and typically fewer distinct values (classes) than in other cases, such that the task can be reduced to a traditional classification task&nbsp;[<a href="#ref-Paulheim17">Paulheim, 2017</a>], training models to identify each semantic class based on features such as outgoing and/or incoming edge labels on their instances in the knowledge graph&nbsp;[<a href="#ref-paulheim2013type">Paulheim and Bizer, 2013</a>, <a href="#ref-SleemanF13">Sleeman and Finin, 2013</a>]. For example, assume that in Figure&nbsp;<a href="#fig-chileTransport">5.2</a> we also know that <span class="gnode">Arica</span>, <span class="gnode">Calama</span>, <span class="gnode">Puerto Montt</span>, <span class="gnode">Punta Arenas</span> and <span class="gnode">Santiago</span> are of <span class="gelab">type</span> <span class="gnode">City</span>. We may then predict that <span class="gnode">Iquique</span> and <span class="gnode">Easter Island</span> are also of <span class="gelab">type</span> <span class="gnode">City</span> based on the presence of edges labelled <span class="gelab">flight</span> to/from these nodes, which (we assume) are learnt to be a good feature for prediction of that class (the former prediction is correct, while the latter is incorrect). Graph neural networks (see Section&nbsp;<a href="#ssec-gnns">5.3</a>) can also be used for node classification/type prediction.</p>
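		<p>The reduction to classification can be sketched as follows, assuming a very simple model that scores each edge-label feature by how often labelled nodes with that feature are cities. The toy edges, the negative example, the threshold, and the function names are all assumptions made for illustration; practical systems would use a proper classifier over many more features.</p>

```python
from collections import defaultdict

# Toy (source, label, target) edges; illustrative assumptions only.
edges = [
    ("Arica", "flight", "Santiago"),
    ("Santiago", "flight", "Arica"),
    ("Iquique", "flight", "Santiago"),
    ("Santiago", "flight", "Easter Island"),
    ("San Pedro", "bus", "Moon Valley"),
    ("San Pedro", "bus", "Calama"),
]

# Training labels: True = known City, False = assumed not to be a City.
labels = {"Arica": True, "Santiago": True, "Calama": True, "Moon Valley": False}


def features(node):
    """Outgoing and incoming edge labels of a node."""
    feats = set()
    for source, label, target in edges:
        if source == node:
            feats.add(("out", label))
        if target == node:
            feats.add(("in", label))
    return feats


def train(labelled):
    """For each feature, the fraction of labelled nodes having it that are cities."""
    positive, total = defaultdict(int), defaultdict(int)
    for node, is_city in labelled.items():
        for feat in features(node):
            total[feat] += 1
            positive[feat] += int(is_city)
    return {feat: positive[feat] / total[feat] for feat in total}


def predict(node, model, threshold=0.5):
    """Predict City if the mean score of the node's known features exceeds the threshold."""
    scores = [model[feat] for feat in features(node) if feat in model]
    return bool(scores) and sum(scores) / len(scores) > threshold


model = train(labels)
for node in ["Iquique", "Easter Island", "San Pedro"]:
    print(node, predict(node, model))
```

		<p>Under these assumptions, both <span class="gnode">Iquique</span> and <span class="gnode">Easter Island</span> are predicted to be cities because of their <span class="gelab">flight</span> edges, which reproduces the correct and the incorrect prediction described above.</p>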

		<h4 id="sssec-identity-link-prediction" class="subsection">Identity-link prediction</h4>
		<p>Predicting identity links involves searching for nodes that refer to the same entity, but are not stated or entailed to be the same; this is analogous to the task of <em>entity matching</em> (aka record linkage, deduplication, etc.) considered in more general data integration settings&nbsp;[<a href="#ref-KopckeR10">Köpcke and Rahm, 2010</a>]. Such techniques are generally based on two types of <em>matchers</em>: <em>value matchers</em> determine how similar the values of two entities on a given property are, which may involve similarity metrics on strings, numbers, dates, etc.; while <em>context matchers</em> consider the similarity of entities based on various nodes and edges&nbsp;[<a href="#ref-KopckeR10">Köpcke and Rahm, 2010</a>]. An illustrative example is given in Figure&nbsp;<a href="#fig-identity">8.1</a>, where value matchers will compute similarity between values such as <span class="gnode">7400</span> and <span class="gnode">7500</span>, while context matchers will compute similarity between <span class="gnode">Easter Island</span> and <span class="gnode">Rapa&nbsp;Nui</span> based on their surrounding information, such as similar latitudes, longitudes, populations, and the same seat (conversely, a value matcher on this pair of nodes would measure string similarity between “<code>Easter Island</code>” and “<code>Rapa Nui</code>”).</p>

		<figure id="fig-identity">
			<img src="images/fig-identity.svg" alt="Identity linking example: Easter Island and Rapa Nui denote the same place"/>
			<figcaption>Identity linking example: <span class="gnode">Easter&nbsp;Island</span> and <span class="gnode">Rapa&nbsp;Nui</span> denote the same place</figcaption>
		</figure>

		<p>A major challenge in this setting is efficiency, where a pairwise matching would require \(O(n^2)\) comparisons for \(n\) the number of nodes. To address this issue, <em>blocking</em> can be used to group similar entities into (possibly overlapping, possibly disjoint) “blocks” based on similarity-preserving keys, with matching performed within each block&nbsp;[<a href="#ref-isele2011efficient">Isele et al., 2011</a>, <a href="#ref-KopckeR10">Köpcke and Rahm, 2010</a>, <a href="#ref-DraisbachN11">Draisbach and Naumann, 2011</a>]; for example, if matching places based on latitude/longitude, blocks may represent geographic regions. An alternative to discrete blocking is to use <em>windowing</em> over entities in a similarity-preserving ordering&nbsp;[<a href="#ref-DraisbachN11">Draisbach and Naumann, 2011</a>], or to consider searching for similar entities within <em>multi-dimensional spaces</em> (e.g., spacetime&nbsp;[<a href="#ref-santipantakis2019stld">Santipantakis et al., 2019</a>], spaces with Minkowski distances&nbsp;[<a href="#ref-minkowski">Ngonga Ngomo, 2012</a>], orthodromic spaces&nbsp;[<a href="#ref-orchid">Ngonga Ngomo, 2013</a>], etc.&nbsp;[<a href="#ref-SherifN18">Sherif and Ngonga Ngomo, 2018</a>]). The results can either be pairs of nodes with a computed confidence of them referring to the same entity, or crisp identity links extracted based on a fixed threshold, or binary classification&nbsp;[<a href="#ref-KopckeR10">Köpcke and Rahm, 2010</a>]. 
For confident identity links, the nodes’ edges may then be <em>consolidated</em>&nbsp;[<a href="#ref-HoganZUPD12">Hogan et al., 2012b</a>]; for example, we may select <span class="gnode">Easter Island</span> as the canonical node and merge the edges of <span class="gnode">Rapa&nbsp;Nui</span> onto it, enabling us to find, e.g., <em>World Heritage Sites in the Pacific Ocean</em> from Figure&nbsp;<a href="#fig-identity">8.1</a> based on the (consolidated) sub-graph <span class="gnode">World Heritage Site</span><img class="tip" src="images/edge-revtip.png" width="15" alt="arrow tip leftward"/><span class="edge">named</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="gnode">Easter Island</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">ocean</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Pacific</span>.</p>
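		<p>The combination of blocking and value matching can be sketched as below, assuming disjoint blocks formed by a one-degree latitude/longitude grid and a numeric value matcher on population. The place data, the grid size, the threshold, and all function names are hypothetical choices for illustration.</p>

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical descriptions: name -> (latitude, longitude, population).
places = {
    "Easter Island": (-27.1, -109.4, 7750),
    "Rapa Nui": (-27.1, -109.3, 7700),
    "Santiago": (-33.4, -70.7, 5_600_000),
    "Arica": (-18.5, -70.3, 220_000),
}


def block_key(lat, lon, cell=1.0):
    """Similarity-preserving key: the grid cell that contains the coordinates."""
    return (int(lat // cell), int(lon // cell))


def candidate_pairs(entities):
    """Yield only pairs that share a block, avoiding the full pairwise comparison."""
    blocks = defaultdict(list)
    for name, (lat, lon, _) in entities.items():
        blocks[block_key(lat, lon)].append(name)
    for members in blocks.values():
        yield from combinations(sorted(members), 2)


def numeric_similarity(a, b):
    """Value matcher for numbers: 1.0 when equal, lower as the relative gap grows."""
    return 1.0 - abs(a - b) / max(abs(a), abs(b), 1)


def string_similarity(a, b):
    """Value matcher for strings, based on matching character blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match(entities, threshold=0.9):
    """Return (node, node, confidence) for block-mates with similar populations."""
    links = []
    for a, b in candidate_pairs(entities):
        score = numeric_similarity(entities[a][2], entities[b][2])
        if score >= threshold:
            links.append((a, b, round(score, 3)))
    return links


links = match(places)
print(links)
print(string_similarity("Easter Island", "Rapa Nui"))
```

		<p>With disjoint grid cells, two nearby places on either side of a cell boundary would never be compared, which is one motivation for overlapping blocks or windowing. The string matcher illustrates why names alone are a weak signal for this particular pair.</p>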
		</section>

		<section id="ssec-correction" class="section">
		<h3>Correction</h3>
		<p>As opposed to completion – which finds new edges in a knowledge graph – correction identifies and removes existing incorrect edges in the knowledge graph. We here divide the principal approaches for knowledge graph correction into two main lines: <em>fact validation</em>, which assigns a plausibility score to a given edge, typically in reference to external sources; and <em>inconsistency repairs</em>, which aim to resolve inconsistencies found in the knowledge graph through ontological axioms.</p>

		<h4 id="sssec-fact-validation" class="subsection">Fact validation</h4>
		<p>The task of <em>fact validation</em> (aka <em>fact checking</em>)&nbsp;[<a href="#ref-gerber2015defacto">Gerber et al., 2015</a>, <a href="#ref-syed2018factcheck">Syed et al., 2018</a>, <a href="#ref-yin2008truth">Yin et al., 2008</a>, <a href="#ref-syed2019copaal">Syed et al., 2019</a>, <a href="#ref-EstevesRRL18">Esteves et al., 2018</a>, <a href="#ref-shiralkar2017finding">Shiralkar et al., 2017</a>, <a href="#ref-shi2016discriminative">Shi and Weninger, 2016</a>, <a href="#ref-socher2013reasoning">Socher et al., 2013</a>, <a href="#ref-bordes2013translating">Bordes et al., 2013</a>] involves assigning plausibility or <em>veracity</em> scores to facts/edges, typically between \(0\) and \(1\). An ideal fact-checking function assumes a hypothetical reference universe (an ideal knowledge graph) and would return \(1\) for the fact <span class="gnode">Santa Lucía</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">city</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santiago</span> (being true) while returning \(0\) for <span class="gnode">Sotomayor</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">city</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santiago</span> (being false). There is a clear relation between fact validation and link prediction – with both relying on assessing the plausibility of edges/facts/links – and indeed the same numeric- and symbol-based techniques can be applied for both cases. However, fact validation often considers online assessment of edges given as input, whereas link prediction is often an offline task that generates novel candidate edges to be assessed from the knowledge graph. 
Furthermore, works on fact validation are characterised by their consideration of external reference sources, which may be <em>unstructured sources</em>&nbsp;[<a href="#ref-gerber2015defacto">Gerber et al., 2015</a>, <a href="#ref-syed2018factcheck">Syed et al., 2018</a>, <a href="#ref-Samadi2016">Samadi et al., 2016</a>, <a href="#ref-yin2008truth">Yin et al., 2008</a>] or <em>structured sources</em>&nbsp;[<a href="#ref-syed2019copaal">Syed et al., 2019</a>, <a href="#ref-shiralkar2017finding">Shiralkar et al., 2017</a>, <a href="#ref-shi2016discriminative">Shi and Weninger, 2016</a>, <a href="#ref-socher2013reasoning">Socher et al., 2013</a>, <a href="#ref-bordes2013translating">Bordes et al., 2013</a>].</p>
		<p>Approaches based on unstructured sources assume that they are given a <em>verbalisation function</em> – using, for example, rule-based approaches&nbsp;[<a href="#ref-ngonga2013sorry">Ngonga Ngomo et al., 2013</a>, <a href="#ref-ell2014sparql">Ell et al., 2014</a>], encoder–decoder architectures&nbsp;[<a href="#ref-gardent2017webnlg">Gardent et al., 2017</a>], etc. – that is able to translate edges into natural language. Thereafter, approaches for computing the plausibility of facts in natural language – called <em>fact finders</em>&nbsp;[<a href="#ref-Pasternack2010">Pasternack and Roth, 2010</a>, <a href="#ref-pasternack2011making">Pasternack and Roth, 2011</a>] – can be directly employed. Many fact finding algorithms construct an \(n\)-partite (often bipartite) graph whose nodes are facts and sources, where a source is connected to a fact if the source “evidences” the fact, i.e., if it contains a text snippet that matches – with sufficient confidence – the verbalisation of the input edge. Two mutually-dependent scores, namely the trustworthiness of sources and the plausibility of facts, are then calculated based on this graph, where fact finders differ on how they compute these scores&nbsp;[<a href="#ref-pasternack2011making">Pasternack and Roth, 2011</a>]. Here we mention three scores proposed by <a href="#ref-Pasternack2010">Pasternack and Roth [2010]</a>:</p>
		<ul>
			<li><em>Sums</em>&nbsp;[<a href="#ref-Pasternack2010">Pasternack and Roth, 2010</a>] adapts the classical HITS centrality algorithm&nbsp;[<a href="#ref-kleinberg1999hubs">Kleinberg, 1999</a>] by defining sources as hubs (with 0 authority score) and facts as authorities (with 0 hub score).</li>
			<li><em>Average Log</em>&nbsp;[<a href="#ref-Pasternack2010">Pasternack and Roth, 2010</a>] extends HITS with a normalisation factor that prevents a single source from receiving a high trustworthiness score by evidencing many facts (that may be false).</li>
			<li><em>Investment</em>&nbsp;[<a href="#ref-Pasternack2010">Pasternack and Roth, 2010</a>] lets the scores of facts grow with a non-linear function based on “investments” coming from the connected sources. The score a source receives from a fact is based on the individual facts in this particular source compared to the other connected sources.</li>
		</ul>
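		<p>To make the mutual dependence of the two scores concrete, the following is a rough sketch in the spirit of <em>Sums</em>, assuming that a source's trustworthiness is the sum of the scores of the facts it evidences, that a fact's score is the sum of the trustworthiness of its sources, and that both are divided by their maximum in each round. The initial scores, the number of iterations, and the toy evidence are assumptions of this sketch rather than details taken from the cited work.</p>

```python
def sums(evidence, iterations=20):
    """HITS-style fact finder over a bipartite source/fact graph.

    `evidence` maps each source to the set of facts it evidences.
    Sources act as hubs (trustworthiness) and facts as authorities (belief).
    """
    facts = set().union(*evidence.values())
    belief = {fact: 1.0 for fact in facts}  # assumed uniform initial belief
    trust = {source: 0.0 for source in evidence}
    for _ in range(iterations):
        # Hub update: a source is as trustworthy as the facts it evidences.
        trust = {s: sum(belief[f] for f in fs) for s, fs in evidence.items()}
        top = max(trust.values()) or 1.0
        trust = {s: t / top for s, t in trust.items()}
        # Authority update: a fact is as plausible as its sources are trusted.
        belief = {
            f: sum(trust[s] for s, fs in evidence.items() if f in fs)
            for f in facts
        }
        top = max(belief.values()) or 1.0
        belief = {f: b / top for f, b in belief.items()}
    return trust, belief


# Hypothetical sources and the (verbalised) facts they evidence.
evidence = {
    "source A": {"Santa Lucía city Santiago", "Arica flight Santiago"},
    "source B": {"Santa Lucía city Santiago"},
    "source C": {"Sotomayor city Santiago"},
}
trust, belief = sums(evidence)
for fact, score in sorted(belief.items(), key=lambda item: -item[1]):
    print(f"{score:.3f}  {fact}")
```

		<p>In this toy setting the fact evidenced by two sources ends with the highest score, while the fact evidenced only by an otherwise unsupported source drifts towards zero.</p>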
		<p><a href="#ref-pasternack2011making">Pasternack and Roth [2011]</a> then show that these three algorithms can be generalised into a single multi-layered graph-based framework within which (1) a source can support a fact with a weight expressing uncertainty, (2) similar facts can support each other, and (3) sources can be grouped together leading to an implicit support between sources of the same group. Other approaches for fact checking of knowledge graphs later extended this framework&nbsp;[<a href="#ref-galland2010">Galland et al., 2010</a>, <a href="#ref-Samadi2016">Samadi et al., 2016</a>]. Alternative approaches based on machine learning classifiers have also emerged, where commonly-used features include trust scores for information sources, co-occurrences of facts in sources, and so forth&nbsp;[<a href="#ref-gerber2015defacto">Gerber et al., 2015</a>, <a href="#ref-syed2018factcheck">Syed et al., 2018</a>].</p>
		<p>Approaches for fact validation based on structured data typically assume external knowledge graphs as reference sources and are based on finding paths that support the edge being validated. Unsupervised approaches search for undirected&nbsp;[<a href="#ref-shiralkar2017finding">Shiralkar et al., 2017</a>, <a href="#ref-ciampaglia2015computational">Ciampaglia et al., 2015</a>] or directed&nbsp;[<a href="#ref-syed2019copaal">Syed et al., 2019</a>] paths up to a given threshold length that support the input edge. The relatedness between input edges and paths is computed using a mutual information function, such as normalised pointwise mutual information&nbsp;[<a href="#ref-bouma2009normalized">Bouma, 2009</a>]. Supervised approaches rather extract features for input edges from external knowledge graphs&nbsp;[<a href="#ref-sun2011pathsim">Sun et al., 2011</a>, <a href="#ref-zhao2015automatic">Zhao et al., 2015</a>, <a href="#ref-lao2010relational">Lao and Cohen, 2010</a>] and train a classification model to label the edges as true or false. An important set of features are <em>metapaths</em>, which encode sequences of predicates that correlate positively with the edge label of the input edge. Amongst such works, PredPath&nbsp;[<a href="#ref-shi2016discriminative">Shi and Weninger, 2016</a>] automatically extracts metapaths based on type information. Several approaches rather encode the reference nodes and edges using graph embeddings (see Section&nbsp;<a href="#ssec-embeddings">5.2</a>), which are then used to estimate the plausibility of the input edge being validated.</p>
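		<p>As a small illustration of the relatedness measure mentioned for unsupervised approaches, normalised pointwise mutual information divides the pointwise mutual information of two events by the negative logarithm of their joint probability, yielding a value between \(-1\) and \(1\). The sketch below assumes that the probabilities are estimated from counts of node pairs connected by a given path, by the edge label being validated, or by both; the function names and the example counts are hypothetical.</p>

```python
import math


def npmi(p_xy, p_x, p_y):
    """Normalised pointwise mutual information of two events.

    Returns -1 when the events never co-occur, 0 when they are independent,
    and 1 when they always co-occur. Requires 0 <= p_xy < 1.
    """
    if p_xy == 0:
        return -1.0
    if p_xy >= 1:
        raise ValueError("p_xy must be strictly less than 1")
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


def npmi_from_counts(n_both, n_path, n_edge, n_total):
    """Estimate NPMI between a path and an edge label from node-pair counts."""
    return npmi(n_both / n_total, n_path / n_total, n_edge / n_total)


# Hypothetical counts: of 1000 node pairs, 50 are connected by the path,
# 40 by the edge label under validation, and 30 by both.
print(round(npmi_from_counts(30, 50, 40, 1000), 3))
```

		<p>A path whose score is close to \(1\) almost always accompanies the edge label in the reference graph, and so lends stronger support to an input edge that it connects.</p>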

		<h4 id="sssec-inconsistency-repairs" class="subsection">Inconsistency repairs</h4>
		<p>Ontologies can contain axioms – such as disjointness – that lead to inconsistencies. While such axioms can be provided by experts, they can also be derived through symbolic learning, as discussed in Section&nbsp;<a href="#ssec-symlearn">5.4</a>. Such axioms can then be used to detect inconsistencies. With respect to correcting a knowledge graph, however, detecting inconsistencies is not enough: techniques are also required to <em>repair</em> such inconsistencies, which itself is not a trivial task. In the simplest case, we may have an instance of two disjoint classes, such as that <span class="gnode">Santiago</span> is of type <span class="gnode">City</span> and <span class="gnode">Airport</span>, which are stated or found to be disjoint. To repair the inconsistency, it would be preferable to remove only the “incorrect” class, but which should we remove? This is not a trivial question, particularly if we consider that one edge can be involved in many inconsistencies, and one inconsistency can involve many edges. 
The issue of computing repairs becomes more complex when entailment is considered, where we not only need to remove the stated type, but also all of the ways in which it might be entailed; for example, removing the edge <span class="gnode">Santiago</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Airport</span> is insufficient if we further have an edge <span class="gnode">Arica</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">flight</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santiago</span> combined with an axiom <span class="gnode">flight</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">range</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Airport</span>. <a href="#ref-TopperKS12">Töpper et al. [2012]</a> suggest potential repairs for such violations – remove a domain/range constraint, remove a disjointness constraint, remove a type edge, or remove an edge with a domain/range constraint – where one is chosen manually. In contrast, <a href="#ref-BonattiHPS11">Bonatti et al. [2011]</a> propose an automated method to repair inconsistencies based on <em>minimal hitting sets</em>&nbsp;[<a href="#ref-Reiter87">Reiter, 1987</a>], where each set is a minimal explanation for an inconsistency. The edges to remove are chosen based on scores of the trustworthiness of their sources and how many minimal hitting sets they are either elements of or help to entail an element of, where the knowledge graph is revised to avoid re-entailment of the removed edges. 
Rather than repairing the data, another option is to evaluate queries under inconsistency-aware semantics, such as returning <em>consistent answers</em> valid under every possible repair&nbsp;[<a href="#ref-LukasiewiczMS13">Lukasiewicz et al., 2013</a>].</p>
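		<p>The idea of choosing edges to remove via hitting sets can be sketched as follows, assuming that each inconsistency comes with an explanation (a set of edges that jointly give rise to it) and that a repair must remove at least one edge from every explanation. The brute-force enumeration, the function name, and the toy explanations are assumptions for illustration; this is not the scoring method of the cited work, and axioms are treated here as fixed.</p>

```python
from itertools import combinations


def minimal_hitting_sets(explanations):
    """Enumerate all minimal sets of edges that intersect every explanation.

    Candidates are tried in order of increasing size, so any candidate that
    contains an already-found hitting set cannot be minimal and is skipped.
    """
    universe = sorted(set().union(*explanations))
    found = []
    for size in range(1, len(universe) + 1):
        for candidate in combinations(universe, size):
            chosen = set(candidate)
            if any(smaller <= chosen for smaller in found):
                continue
            if all(chosen & explanation for explanation in explanations):
                found.append(chosen)
    return found


city = ("Santiago", "type", "City")
airport = ("Santiago", "type", "Airport")
flight = ("Arica", "flight", "Santiago")

# Two explanations for the same clash: the stated Airport type, and the
# Airport type entailed by the flight edge together with the range axiom.
explanations = [frozenset({city, airport}), frozenset({city, flight})]

repairs = minimal_hitting_sets(explanations)
for repair in repairs:
    print(sorted(repair))
```

		<p>Here there are two candidate repairs: remove the <span class="gnode">City</span> type alone, or remove both the stated <span class="gnode">Airport</span> type and the <span class="gelab">flight</span> edge that re-entails it. Choosing between them is where additional signals, such as the trustworthiness of sources, would come in.</p>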
		</section>

		<section id="ssec-other-refinement-tasks" class="section">
		<h3>Other Refinement Tasks</h3>
		<p>In comparison to the quality clusters discussed in Chapter&nbsp;<a href="#chap-quality">7</a>, the refinement methods discussed herein address particular aspects of the accuracy, coverage, and coherency dimensions. Beyond these, one could conceive of further refinement methods to address further quality issues of knowledge graphs, such as succinctness. In general, however, the refinement tasks of <em>knowledge graph completion</em> and <em>knowledge graph correction</em> have received the majority of attention until now. For further details on knowledge graph refinement, we refer to the survey by <a href="#ref-Paulheim17">Paulheim [2017]</a>.</p>
		</section>
	</section>
	<section id="chap-publish" class="chapter">
		<h2>Publication</h2>
		<p>While it may not always be desirable to publish knowledge graphs (for example, those that offer a competitive advantage to a company&nbsp;[<a href="#ref-NoyGJNPT19">Noy et al., 2019</a>]), it may be desirable or even required to publish other knowledge graphs, such as those produced by volunteers&nbsp;[<a href="#ref-VrandecicK14">Vrandečić and Krötzsch, 2014</a>, <a href="#ref-MahdisoltaniBS15">Mahdisoltani et al., 2015</a>, <a href="#ref-LehmannIJJKMHMK15">Lehmann et al., 2015</a>], by publicly-funded research&nbsp;[<a href="#ref-CallahanCAD13">Callahan et al., 2013</a>, <a href="#ref-GrothLGGHP14">Groth et al., 2014</a>, <a href="#ref-uniprot2014">Consortium, 2014</a>], or by governmental organisations&nbsp;[<a href="#ref-HendlerHMT12">Hendler et al., 2012</a>, <a href="#ref-ShadboltO13">Shadbolt and O'Hara, 2013</a>]. Publishing refers to making the knowledge graph (or part thereof) accessible to the public, often on the Web. Knowledge graphs published as open data are called open knowledge graphs (discussed in Section&nbsp;<a href="#sec-openkgs">10.1</a>).</p>
		<p>In the following, we first discuss two sets of principles that have been proposed to guide the publication of data on the Web. We next discuss access protocols by which the public can interact with the content of a knowledge graph. Finally, we consider techniques to restrict the access or usage of (parts of) a knowledge graph.</p>

		<section id="ssec-principles" class="section">
		<h3>Best Practices</h3>
		<p>We now discuss two key sets of publishing principles: the FAIR Principles&nbsp;[<a href="#ref-wilkinson2016fair">Wilkinson et al., 2016</a>], and the Linked Data Principles&nbsp;[<a href="#ref-ldprinciples">Berners-Lee, 2006</a>].</p>

		<h4 id="ssec-fair" class="subsection">FAIR Principles</h4>
		<p>The FAIR Principles were originally proposed in the context of publishing scientific data&nbsp;[<a href="#ref-wilkinson2016fair">Wilkinson et al., 2016</a>] – particularly motivated by maximising the impact of publicly-funded research – but the principles generally apply to other situations where data are to be published in a manner that facilitates their re-use by external agents, with particular emphasis on machine-readability.</p>
		<p>FAIR itself is an acronym for four foundational principles, each with particular goals&nbsp;[<a href="#ref-wilkinson2016fair">Wilkinson et al., 2016</a>], that may apply to <em>data</em>, <em>metadata</em>, or both – the latter being denoted <em>(meta)data</em>.<sup class="fnmark" id="fnm32"><a href="#fn32">32</a></sup><span class="footnote" id="fn32"><sup><a href="#fnm32">note 32</a></sup> Metadata are data about data. The distinction is often important in observational sciences, where in astronomy, for example, data may include raw image data, while metadata may include coordinates and time.</span> We now describe the FAIR principles (slightly rephrasing the original wording in some cases for brevity&nbsp;[<a href="#ref-wilkinson2016fair">Wilkinson et al., 2016</a>]).</p>
		
		<ul>
			<li><em>Findability</em> refers to the ease with which external agents who might benefit from the dataset can initially locate the dataset. Four sub-goals should be met:<ul>
				<li>F1: (meta)data are assigned a globally unique and persistent identifier.</li>
				<li>F2: data are described with rich metadata (see R1).</li>
				<li>F3: metadata explicitly include the identifier of the data they describe.</li>
				<li>F4: (meta)data are registered or indexed in a searchable resource.</li>
			</ul></li>
			<li><em>Accessibility</em> refers to the ease with which external agents can access the dataset (after locating it). Two goals are defined, the first with two sub-goals:<ul>
				<li>A1: (meta)data are retrievable by their identifier via a standard protocol.<ul>
					<li>A1.1: the protocol is open, free, and universally implementable.</li>
					<li>A1.2: the protocol uses authentication and authorisation if suitable.</li>
				</ul></li>
				<li>A2: metadata are accessible, even when the data are no longer available.</li>
			</ul></li>
			<li><em>Interoperability</em> refers to the ease with which the dataset can be exploited (in unison with other datasets) using standard tools. Three goals are defined:<ul>
				<li>I1: (meta)data use an accessible, agreed-upon, and general knowledge representation formalism.</li>
				<li>I2: (meta)data use vocabularies that follow FAIR principles.</li>
				<li>I3: (meta)data include qualified references to other (meta)data.</li>
			</ul></li>
			<li><em>Reusability</em> refers to the ease with which the dataset can be re-used in conjunction with other datasets. One goal is defined (with three sub-goals):<ul>
				<li>R1: (meta)data are richly described with accurate and relevant attributes.<ul>
					<li>R1.1: (meta)data are released with a clear and accessible license.</li>
					<li>R1.2: (meta)data are associated with detailed provenance.</li>
					<li>R1.3: (meta)data meet domain-relevant community standards.</li>
				</ul></li>
			</ul></li>
		</ul>
		<p>In the context of knowledge graphs, a variety of vocabularies, tools, and services have been proposed that both directly and indirectly help to satisfy the FAIR principles. In terms of <em>Findability</em>, as discussed in Chapter&nbsp;<a href="#chap-graph">2</a>, IRIs are built into the RDF model, providing a general schema for global identifiers. In addition, resources such as the Vocabulary of Interlinked Datasets (VoID)&nbsp;[<a href="#ref-AlexanderCHZ09">Alexander et al., 2009</a>] allow for representing metadata about graphs, while services such as DataHub&nbsp;[<a href="#ref-BhardwajBCDEMP15">Bhardwaj et al., 2015</a>] provide a central repository of such dataset descriptions. Access protocols that enable <em>Accessibility</em> will be discussed in Section&nbsp;<a href="#ssec-access">9.2</a>, while mechanisms for authorisation will be discussed in Section&nbsp;<a href="#ssec-UsageControl">9.3</a>. With respect to <em>Interoperability</em>, as discussed in Chapter&nbsp;<a href="#chap-deductive">4</a>, ontologies serve as a general knowledge representation formalism, and can in turn be used to describe vocabularies that follow FAIR principles. Regarding <em>Reusability</em>, licensing will be discussed in Section&nbsp;<a href="#ssec-UsageControl">9.3</a>, while the <em>PROV Data Model</em>&nbsp;[<a href="#ref-prov13">Gil et al., 2013</a>], discussed in Chapter&nbsp;<a href="#chap-knowledge">3</a>, can encode provenance in detail.</p>
		<p>Various knowledge graphs have been published using FAIR principles, where <a href="#ref-wilkinson2016fair">Wilkinson et al. [2016]</a> explicitly mention Open PHACTS&nbsp;[<a href="#ref-GrothLGGHP14">Groth et al., 2014</a>], a data integration platform for drug discovery, and UniProt&nbsp;[<a href="#ref-uniprot2014">Consortium, 2014</a>], a large collection of protein sequence and annotation data, as conforming to FAIR principles. Both datasets offer graph views of their content through RDF.</p>

		<h4 id="sssec-ld" class="subsection">Linked Data Principles</h4>
		<p><a href="#ref-wilkinson2016fair">Wilkinson et al. [2016]</a> state that FAIR Principles “precede implementation choices”, meaning that the principles do not cover <em>how</em> they can or should be achieved. Preceding the FAIR Principles by almost a decade are the Linked Data Principles, proposed by <a href="#ref-ldprinciples">Berners-Lee [2006]</a>, which provide a technical basis for one possible way in which these FAIR Principles can be achieved. Specifically the Linked Data Principles are as follows:</p>
		<ol>
			<li>Use IRIs as names for things.</li>
			<li>Use HTTP IRIs so those names can be looked up.</li>
			<li>When a HTTP IRI is looked up, provide useful content about the entity that the IRI names using standard data formats.</li>
			<li>Include links to the IRIs of related entities in the content returned.</li>
		</ol>
		<p>These principles were proposed in a Semantic Web setting, where for principle&nbsp;(3), the standards based on RDF (including RDFS, OWL, etc.) are currently recommended for use, particularly because they allow for naming entities using HTTP IRIs, which further paves the way for satisfying all four principles. As such, these principles outline a way in which (RDF) graph-structured data can be published on the Web such that these graphs are interlinked to form what <a href="#ref-ldprinciples">Berners-Lee [2006]</a> calls a “Web of Data”, whose goal is to increase automation on the Web by making content available not only in (HTML) documents intended for human consumption, but also as (RDF) structured data that machines can locate, retrieve, combine, validate, reason over, query over, etc., towards solving tasks automatically&nbsp;[<a href="#ref-Hogan20a">Hogan, 2020b</a>]. Conceptually, the Web of Data is then composed of graphs of data published on individual web-pages, where one can click on a node or edge-label – or more precisely perform a HTTP lookup on an IRI of the graph – to be transported to another graph elsewhere on the Web with relevant content for that node or edge-label, and so on recursively.</p>
		<p>Figure&nbsp;<a href="#fig-ld">9.1</a> provides a small example with two Linked Data documents published on the Web, with each containing an RDF graph. As discussed in Section&nbsp;<a href="#sec-identity">3.2</a>, terms such as <code>clv:Concert</code>, <code>wd:Q142701</code>, <code>rdfs:label</code>, etc., are abbreviations for IRIs, where, for example, <code>wd:Q142701</code> expands to <a class="uri" href="http://www.wikidata.org/entity/Q142701">http://www.wikidata.org/entity/Q142701</a>. Prefixes beginning with <code>cl</code> are fictitious prefixes we assume to have been created by the Chilean tourist board. The IRIs prefixed with <span style="color:#bf0040;">\(\hookrightarrow\)<img style="position:relative;top:0.2em;margin-left:0.1em;" src="images/earth.png" width="16" alt="Earth"/></span> indicate the document returned if the node is looked up. The leftmost document is published by the tourist board and describes Lollapalooza 2018 (identified by the node <span class="gnode">cle:LP2018</span>), which links to the headlining act Pearl Jam (<span class="gnode">wd:Q142701</span>) described by an external knowledge graph, namely Wikidata. By looking up the node <span class="gnode">wd:Q142701</span> in the leftmost graph, the IRI <em>dereferences</em> (i.e., returns via HTTP) the document with the RDF graph on the right describing that entity in more detail. From the rightmost document, the node <span class="gnode">wd:Q221535</span> can be looked up, in turn, to find a graph about Eddie Vedder (not shown in the example). 
The IRIs for entities and documents are distinguished to ensure that we do not confuse data about the entity and the document; for example, while <code>wd:Q221535</code> refers to Eddie Vedder, the IRI <code style="color:#bf0040">wd<strong>d</strong>:Q221535</code> refers to the document about Eddie Vedder; if we were to assign a last-modified date to the document, we should use <span class="gnode" style="color:#bf0040">wd<strong>d</strong>:Q221535</span> not <span class="gnode">wd:Q221535</span>. In Figure&nbsp;<a href="#fig-ld">9.1</a>, we can further observe that edge labels (which are also IRIs) and nodes representing classes (e.g., <span class="gnode">clv:Concert</span>) can also be dereferenced, typically returning semantic definitions of the respective terms.</p>
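		<p>The expansion of prefixed names and the lookup of the resulting IRI can be sketched as below. The <code>wd:</code> namespace is the one given in the text; the helper names and the choice of requesting Turtle via the <code>Accept</code> header are assumptions of this sketch, and the request is only constructed, not sent.</p>

```python
from urllib.request import Request

# Prefix table; only the wd: namespace stated in the text is included.
PREFIXES = {"wd": "http://www.wikidata.org/entity/"}


def expand(prefixed_name):
    """Expand a prefixed name such as wd:Q142701 into a full IRI."""
    prefix, _, local_name = prefixed_name.partition(":")
    return PREFIXES[prefix] + local_name


def lookup_request(prefixed_name, media_type="text/turtle"):
    """Build (without sending) an HTTP GET request that asks for RDF content.

    The Accept header signals the preferred data format, so that a server
    supporting content negotiation can return structured data, not HTML.
    """
    return Request(expand(prefixed_name), headers={"Accept": media_type})


request = lookup_request("wd:Q142701")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

		<p>Passing such a request to <code>urllib.request.urlopen</code> would perform the actual lookup. A server that distinguishes entities from documents would then typically redirect from the entity IRI to the IRI of a document describing it.</p>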

		<figure id="fig-ld">
			<img src="images/fig-ld.svg" alt="Two example Linked Data documents from two websites, each containing an RDF graph, where wd:Q142701 refers to Pearl Jam in Wikidata while wdd:Q142701 refers to the RDF graph about Pearl Jam, and where wd:Q221535 refers to Eddie Vedder while wdd:Q221535 refers to the RDF graph about Eddie Vedder; the edge-label wdt:571 refers to “inception” in Wikidata, while wdt:527 refers to “has part”"/>
			<figcaption>Two example Linked Data documents from two websites, each containing an RDF graph, where <code>wd:Q142701</code> refers to Pearl Jam in Wikidata while <code>wdd:Q142701</code> refers to the RDF graph about Pearl Jam, and where <code>wd:Q221535</code> refers to Eddie Vedder while <code>wdd:Q221535</code> refers to the RDF graph about Eddie Vedder; the edge-label <span class="gelab">wdt:571</span> refers to “inception” in Wikidata, while <span class="gelab">wdt:527</span> refers to “has part”</figcaption>
		</figure>

		<p>A key challenge is posed by the fourth principle – include links to related entities – as illustrated in Figure&nbsp;<a href="#fig-ld">9.1</a>, where <span class="gnode">wd:Q142701</span> in the leftmost graph constitutes a link to related content about Pearl Jam in an external knowledge graph. Specifically, the <em>link discovery</em> task considers adding such links from one knowledge graph to another, which may involve inclusion of IRIs that dereference to external graphs (per Figure&nbsp;<a href="#fig-ld">9.1</a>), or links with special semantics such as identity links. In comparison with the link prediction task discussed in Section&nbsp;<a href="#ssec-completion">8.1</a>, which is used to complete links within a knowledge graph, link discovery aims to discover links across knowledge graphs, which involves unique aspects: first, link discovery typically considers disjoint sets of source (local) nodes and target (remote) nodes; second, the knowledge graphs may often use different vocabularies; third, while in link prediction there already exist local examples of the links to predict, in link discovery, there are often no existing links between knowledge graphs to learn from. A common technique is to define manually-crafted linkage rules (aka link specifications) that apply heuristics for defining links that potentially incorporate similarity measures&nbsp;[<a href="#ref-NgomoA11">Ngonga Ngomo and Auer, 2011</a>, <a href="#ref-silk">Volz et al., 2009</a>]. Link discovery is greatly expedited by the provision of standard identifier schemes within knowledge graphs, such as ISBNs for books, alpha-2 and alpha-3 codes for countries (e.g., <span class="sc">cl</span>, <span class="sc">chl</span>), or even links to common knowledge graphs such as DBpedia&nbsp;[<a href="#ref-LehmannIJJKMHMK15">Lehmann et al., 2015</a>] or Wikidata&nbsp;[<a href="#ref-VrandecicK14">Vrandečić and Krötzsch, 2014</a>] (that themselves include standard identifiers). 
We refer to the survey on link discovery by <a href="#ref-nentwig2017survey">Nentwig et al. [2017]</a> for more details.</p>
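		<p>To make the idea of a linkage rule more concrete, the following is a minimal sketch rather than the method of any cited system. It assumes toy descriptions with a <code>label</code> and an optional standard <code>code</code>; the identifiers <code>ex:…</code> and <code>rem:country17</code>, the function names, and the threshold of 0.9 are hypothetical. A shared standard identifier is treated as decisive, with string similarity of labels as a fallback.</p>
<pre><code class="language-python">from difflib import SequenceMatcher
from operator import ge

# Toy descriptions of local (source) and remote (target) nodes.
# All identifiers except wd:Q142701 and wd:Q221535 are hypothetical.
LOCAL = {
    "ex:PearlJam": {"label": "Pearl Jam", "code": None},
    "ex:Vedder": {"label": "Eddie Veder", "code": None},  # typo on purpose
    "ex:Chile": {"label": "Chile", "code": "chl"},
}
REMOTE = {
    "wd:Q142701": {"label": "pearl jam", "code": None},
    "wd:Q221535": {"label": "Eddie Vedder", "code": None},
    "rem:country17": {"label": "República de Chile", "code": "chl"},
}


def score(a, b):
    """Linkage rule: a shared standard identifier is decisive;
    otherwise fall back to case-insensitive label similarity in [0, 1]."""
    if a["code"] is not None and a["code"] == b["code"]:
        return 1.0
    return SequenceMatcher(None, a["label"].lower(), b["label"].lower()).ratio()


def discover_links(local, remote, threshold=0.9):
    """Map each local node to its best remote candidate, provided that
    the candidate scores at least the given threshold."""
    links = {}
    for lid, ldesc in local.items():
        best = max(remote, key=lambda rid: score(ldesc, remote[rid]))
        if ge(score(ldesc, remote[best]), threshold):
            links[lid] = best
    return links
</code></pre>
		<p>Calling <code>discover_links(LOCAL, REMOTE)</code> links the two countries through the shared alpha-3 code even though their labels differ, which illustrates why standard identifier schemes make link discovery easier than relying on label similarity alone.</p>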
		<p>Finer-grained recommendations for publishing Linked Data have also been proposed, relating to how best to implement dereferencing, what kinds of links to include, and how to publish and interlink vocabularies, amongst other considerations&nbsp;[<a href="#ref-ldbook">Heath and Bizer, 2011</a>, <a href="#ref-JanowiczHAKV14">Janowicz et al., 2014</a>]. We refer to the book by <a href="#ref-ldbook">Heath and Bizer [2011]</a> for more discussion on how to publish Linked Data on the Web.</p>
		</section>

		<section id="ssec-access" class="section">
		<h3>Access Protocols</h3>
		<p>Publishing involves giving access to the public to interact with the knowledge graph, which implies the provision of <em>access protocols</em> that define the requests that agents can make and the response that they can expect as a result. Per the <em>Accessibility</em> principle of FAIR (specifically A1.1), this protocol should be open, free, and universally implementable. In the context of knowledge graphs, as shown in Figure&nbsp;<a href="#fig-access">9.2</a>, there are a number of access protocols to choose from, varying from simple protocols that allow users to simply download all content, towards protocols that accept and evaluate increasingly complex requests. While simpler protocols require less computation on the server that publishes the data, more complex protocols allow agents to request more specific data, thus reducing bandwidth. A knowledge graph may also offer a variety of access protocols catering to different agents with different requirements&nbsp;[<a href="#ref-VerborghSCCMW14">Verborgh et al., 2014</a>]. We now discuss such access protocols.</p>

		<figure id="fig-access">
			<img src="images/fig-access.svg" alt="Access protocols for knowledge graphs, from simple protocols (left) to more complex protocols (right)"/>
			<figcaption>Access protocols for knowledge graphs, from simple protocols (left) to more complex protocols (right)</figcaption>
		</figure>

		<h4 id="sssec-dumps" class="subsection">Dumps</h4>
		<p>A dump is a file or collection of files containing the content of the knowledge graph available for download. The request in this case is for the file(s) and the response is the content of the file(s). In order to publish dumps, first of all, concrete – and ideally standard – syntaxes are required to serialise the graph. While for RDF graphs there are various standard syntaxes available based on XML&nbsp;[<a href="#ref-rdfxml11">Gandon and Schreiber, 2014</a>], JSON&nbsp;[<a href="#ref-jsonld">Sporny et al., 2014</a>], custom syntaxes&nbsp;[<a href="#ref-turtle">Prud'hommeaux and Carothers, 2014</a>], and more besides, currently there are only non-standard syntaxes available for property graphs&nbsp;[<a href="#ref-TomaszukASLC19">Tomaszuk et al., 2019</a>]. Second, to reduce bandwidth, compression methods can be applied. While standard compression such as GZIP or BZip2 can be straightforwardly applied on any file, custom compression methods have been proposed for graphs that not only offer better compression ratios than these standard methods, but also offer additional functionalities, such as compact indexes for performing efficient lookups once the file is downloaded&nbsp;[<a href="#ref-FernandezMGPA13">Fernández et al., 2013</a>]. Finally, to further reduce bandwidth, when the knowledge graph is updated, “diffs” can be computed and published to obviate the need for agents to download all data from scratch (see&nbsp;[<a href="#ref-TummarelloMBE07">Tummarello et al., 2007</a>, <a href="#ref-PapavasileiouFFKC13">Papavasileiou et al., 2013</a>, <a href="#ref-AhnIEZK14">Ahn et al., 2015</a>]). Still, however, dumps are only suited to certain use-cases, in particular for agents that wish to maintain a full local copy of a knowledge graph. If an agent were rather only interested in, for example, all food festivals in Santiago, downloading the entire dump may require transferring and processing a lot of irrelevant data.</p>
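		<p>As a rough illustration of publishing “diffs” between versions of a dump, the following sketch treats each version as a set of edges and computes the edges added and removed. The data and the names <code>diff</code> and <code>patch</code> are hypothetical, and the sketch assumes that all nodes have stable identifiers; graphs with blank nodes require more careful treatment than a plain set difference.</p>
<pre><code class="language-python"># Two versions of a toy graph as sets of (source, label, target) edges.
# All node and edge names are hypothetical.
V1 = {
    ("ex:E1", "type", "Food Festival"),
    ("ex:E1", "location", "Santiago"),
    ("ex:E1", "start", "2018-03-22"),
}
V2 = {
    ("ex:E1", "type", "Food Festival"),
    ("ex:E1", "location", "Santiago"),
    ("ex:E1", "start", "2018-03-23"),  # corrected date
    ("ex:E1", "name", "Food Fest"),    # new edge
}


def diff(old, new):
    """Return the edges added to and removed from old to obtain new."""
    return new - old, old - new


def patch(graph, added, removed):
    """Apply a published diff to a local copy of the graph."""
    return (graph - removed) | added


ADDED, REMOVED = diff(V1, V2)
</code></pre>
		<p>An agent holding <code>V1</code> can then download only <code>ADDED</code> and <code>REMOVED</code> and call <code>patch(V1, ADDED, REMOVED)</code> to obtain <code>V2</code>, rather than downloading the new version from scratch.</p>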

		<h4 id="sssec-node-lookups" class="subsection">Node lookups</h4>
		<p>Protocols for performing node lookups accept a node (id) request (e.g., <span class="gnode">cle:LP2018</span> in Figure&nbsp;<a href="#fig-ld">9.1</a>) and return a (sub-)graph describing that node (e.g., the document <code style="color:#bf0040">cld:LP2018</code>). Such a protocol is the basis for the Linked Data principles outlined previously, whereby node lookups are implemented through HTTP dereferencing, which further allows nodes in remote graphs to be referenced from across the Web. Although there are varying definitions on what content should be returned for a node&nbsp;[<a href="#ref-cbd">Stickler, 2005</a>], a common convention is to return a sub-graph containing either all outgoing edges for that node or all incident edges (both outgoing and incoming) for that node&nbsp;[<a href="#ref-HoganUHCPD12">Hogan et al., 2012a</a>]. Though simple, mechanisms for evaluating graph patterns can be implemented on top of a node lookup interface by traversing from node to node per the particular graph pattern&nbsp;[<a href="#ref-HartigBF09">Hartig et al., 2009</a>]; for example, to find all food festivals in Santiago – represented by the graph pattern <span class="gnode">Food Festival</span><img class="tip" src="images/edge-revtip.png" width="15" alt="arrow tip leftward"/><span class="edge">type</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="gvar"><strong>?ff</strong></span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">location</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santiago</span> – we may perform a node lookup for <span class="gnode">Santiago</span>, subsequently performing a node lookup for each node connected by a <span class="gelab">location</span> edge to <span class="gnode">Santiago</span>, returning those nodes declared to be of type <span class="gnode">Food Festival</span>. 
However, such an approach may not be feasible if no starting node is declared (e.g., if all nodes are variables), if the node lookup service does not return incoming edges, etc. The client agent may also need to request more data than necessary; for example, the document returned for <span class="gnode">Santiago</span> may return a lot of data irrelevant to the query, and nodes with a <span class="gelab">location</span> in <span class="gnode">Santiago</span> that are not instances of <span class="gnode">Food Festival</span> still need to be looked up to check their type. Node lookups are relatively inexpensive for servers to support in terms of CPU, but may again waste bandwidth due to transferring irrelevant data.</p>
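		<p>The traversal described above can be sketched as follows. This is a simplified simulation, assuming an in-memory toy graph and a hypothetical <code>lookup</code> function that stands in for HTTP dereferencing and returns all incident edges; the node identifiers <code>ex:E1</code> to <code>ex:E3</code> are invented for illustration.</p>
<pre><code class="language-python"># Toy graph of (source, label, target) edges; identifiers are hypothetical.
GRAPH = {
    ("ex:E1", "type", "Food Festival"),
    ("ex:E1", "location", "Santiago"),
    ("ex:E2", "type", "Music Festival"),
    ("ex:E2", "location", "Santiago"),
    ("ex:E3", "type", "Food Festival"),
    ("ex:E3", "location", "Arica"),
}


def lookup(node):
    """Simulated node lookup: all edges incident to the node
    (both outgoing and incoming)."""
    return {edge for edge in GRAPH if node in (edge[0], edge[2])}


def food_festivals_in(city):
    """Evaluate the running graph pattern using node lookups only.
    Returns the solutions and the number of lookups issued."""
    lookups = 1
    candidates = {
        s for (s, p, o) in lookup(city) if p == "location" and o == city
    }
    result = set()
    for node in sorted(candidates):
        lookups += 1  # one further lookup per candidate to check its type
        if (node, "type", "Food Festival") in lookup(node):
            result.add(node)
    return result, lookups
</code></pre>
		<p>Here <code>food_festivals_in("Santiago")</code> issues one lookup for the city and one for each node located there, including the music festival <code>ex:E2</code>, which must be looked up only to be discarded.</p>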

		<h4 id="sssec-edge-patterns" class="subsection">Edge patterns</h4>
		<p>Edge patterns – also known as <em>triple patterns</em> in the case of directed, edge-labelled graphs – are singleton graph patterns, i.e., graph patterns with a single edge. Examples of edge patterns are <span class="gvar"><strong>?ff</strong></span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Food Festival</span> or <span class="gvar"><strong>?ff</strong></span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">location</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santiago</span>, etc., where any term can be a variable or a constant. A protocol for edge patterns accepts such a pattern and returns all solutions for the pattern. Edge patterns provide more flexibility than node lookups, where graph patterns are more readily decomposed into edge patterns than node lookups. With respect to the agent interested in food festivals in Santiago, they can first, for example, request solutions for the edge pattern <span class="gvar"><strong>?ff</strong></span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">location</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santiago</span> and locally join/intersect these solutions with those of <span class="gvar"><strong>?ff</strong></span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Food Festival</span>. 
Given that some edge patterns (e.g., <span class="gvar"><strong>?x</strong></span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="sf"><strong>?y</strong></span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar"><strong>?z</strong></span>) can return many solutions, protocols for edge patterns may offer additional practical features such as iteration or pagination over results&nbsp;[<a href="#ref-VerborghSHHVMHC16">Verborgh et al., 2016</a>]. Much like node lookups, the server cost of responding to a request is relatively low and easy to predict. However, the server may often need to transfer irrelevant intermediate results to the client, which in the previous example may involve returning nodes located in Santiago that are not food festivals. This issue is further aggravated if the client does not have access to statistics about the knowledge graph in order to plan how to best perform the join; for example, if there are relatively few food festivals but many things located in Santiago, rather than intersecting the solutions of the two aforementioned edge patterns, it should be more efficient to send a request for each food festival to see if it is in Santiago, but deciding this requires statistics about the knowledge graph. 
Extensions to the edge-pattern protocol have thus been proposed to allow for more efficient joins&nbsp;[<a href="#ref-HartigLP17">Hartig et al., 2017</a>], such as allowing batches of solutions to be sent alongside the edge pattern to only return solutions compatible with the solutions in the request&nbsp;[<a href="#ref-HartigA16">Hartig and Buil Aranda, 2016</a>] (e.g., sending a batch of solutions for <span class="gvar"><strong>?ff</strong></span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">type</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Food Festival</span> to join with the solutions for the request <span class="gvar"><strong>?ff</strong></span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">location</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santiago</span>).</p>
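		<p>The following sketch illustrates an edge-pattern interface, a client-side join, and a request extended with a batch of solutions. It is a simplification over an in-memory toy graph rather than an implementation of any cited protocol; the function names <code>match</code>, <code>join</code> and <code>match_compatible</code>, and the identifiers <code>ex:E1</code> to <code>ex:E3</code>, are hypothetical.</p>
<pre><code class="language-python"># Toy graph of (source, label, target) edges; identifiers are hypothetical.
GRAPH = {
    ("ex:E1", "type", "Food Festival"),
    ("ex:E1", "location", "Santiago"),
    ("ex:E2", "type", "Music Festival"),
    ("ex:E2", "location", "Santiago"),
    ("ex:E3", "type", "Food Festival"),
    ("ex:E3", "location", "Arica"),
}


def match(pattern, graph):
    """Server side: all solutions for one edge pattern.
    Terms starting with '?' are variables; other terms are constants."""
    solutions = []
    for edge in sorted(graph):
        binding = {}
        for term, value in zip(pattern, edge):
            if term.startswith("?"):
                # A repeated variable must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            solutions.append(binding)
    return solutions


def join(left, right):
    """Client side: merge solutions that agree on their shared variables."""
    return [
        {**a, **b}
        for a in left
        for b in right
        if all(b.get(var, a[var]) == a[var] for var in a)
    ]


def match_compatible(pattern, graph, batch):
    """Extended request: return only solutions for the pattern that are
    compatible with some solution in the batch sent by the client."""
    return join(batch, match(pattern, graph))


LOCATED = match(("?ff", "location", "Santiago"), GRAPH)
TYPED = match(("?ff", "type", "Food Festival"), GRAPH)
</code></pre>
		<p>With plain edge patterns, the client receives both <code>LOCATED</code> and <code>TYPED</code> in full and computes <code>join(LOCATED, TYPED)</code> locally; with the extended request, sending <code>TYPED</code> alongside the location pattern returns only the compatible solution for <code>ex:E1</code>.</p>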

		<h4 id="sssec-graph-patterns" class="subsection">(Complex) graph patterns</h4>
		<p>Another alternative is to let client agents make requests based on (complex) graph patterns (see Section&nbsp;<a href="#ssec-querying">2.2</a>), with the server returning (only) the final solutions. In our running example, this involves the client issuing a request for <span class="gnode">Food Festival</span><img class="tip" src="images/edge-revtip.png" width="15" alt="arrow tip leftward"/><span class="edge">type</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="gvar"><strong>?ff</strong></span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">location</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">Santiago</span> and directly receiving the relevant results. Compared with the previous protocols, this protocol is much more efficient in terms of bandwidth: it allows clients to make more specific requests and the server to return more specific responses. However, this reduction in bandwidth use comes at the cost of the server having to evaluate much more complex requests, where, furthermore, the costs of a single request are much more difficult to anticipate. While a variety of optimised engines exist for evaluating (complex) graph patterns (e.g.,&nbsp;[<a href="#ref-virtuoso">Erling, 2012</a>, <a href="#ref-Miller13">Miller, 2013</a>, <a href="#ref-ThompsonPC14">Thompson et al., 2014</a>] amongst many others), the problem of evaluating such queries is known to be intractable&nbsp;[<a href="#ref-AnglesABHRV17">Angles et al., 2017</a>]. Perhaps for this reason, public services offering such a protocol (most often supporting SPARQL queries&nbsp;[<a href="#ref-sparql11">Harris et al., 2013</a>]) have been found to often exhibit downtimes, timeouts, partial results, slow performance, etc.&nbsp;[<a href="#ref-ArandaHUV13">Buil-Aranda et al., 2013b</a>]. 
Even considering such issues, however, popular services continue to receive – and successfully evaluate – millions of requests/queries per day&nbsp;[<a href="#ref-malyshev2018getting">Malyshev et al., 2018</a>, <a href="#ref-SaleemAHMN15">Saleem et al., 2015</a>], with difficult (worst-case) instances being rare in practice&nbsp;[<a href="#ref-BonifatiMT17">Bonifati et al., 2017</a>].</p>

		<h4 id="sssec-other-protocols" class="subsection">Other protocols</h4>
		<p>While Figure&nbsp;<a href="#fig-access">9.2</a> makes explicit reference to some of the most commonly-encountered access protocols found for knowledge graphs in practice, one may of course imagine other protocols lying almost anywhere on the spectrum from more simple to more complex interfaces. To the right of (Complex) Graph Patterns, one could consider supporting even more complex requests, such as queries with entailments&nbsp;[<a href="#ref-Glimm11">Glimm, 2011</a>], queries that allow recursion&nbsp;[<a href="#ref-ReutterSV15">Reutter et al., 2015</a>], federated queries that can join results from remote services&nbsp;[<a href="#ref-ArandaACP13">Buil-Aranda et al., 2013a</a>], or even (hypothetically) supporting Turing-complete requests that allow running arbitrary procedural code on a knowledge graph. As mentioned at the outset, a server may also choose to support multiple, complementary protocols&nbsp;[<a href="#ref-VerborghSCCMW14">Verborgh et al., 2014</a>].</p>
		</section>

		<section id="ssec-UsageControl" class="section">
		<h3>Usage Control</h3>
		<p>Considering our hypothetical tourism knowledge graph, at first glance, one might assume that the knowledge required to deliver the envisaged services is public and thus can be used both by the tourism board and the tourists. On closer inspection, however, we may see the need for usage control in various forms:</p>
		<ul class="inline-ul roman">
			<li>both the tourist board and its partners should associate an appropriate license with knowledge that they contribute to the knowledge graph, such that the terms of use are clear to all interested parties;</li>
			<li>a tourist might opt to install an app on their mobile phone that could be used to recommend tourist attractions based on their location, bringing with it potential privacy concerns regarding who has access to their location;</li>
			<li>the tourist board may be required to report criminal activities to the police services and thus may need to encrypt personal information; and</li>
			<li>the tourist board could potentially share information relating to tourism demographics in an anonymous format to allow for other agencies and companies to anticipate demand and improve transport infrastructure on strategic routes.</li>
		</ul>
		<p>Thus in this section, we examine the state of the art in terms of knowledge graph licensing, usage policies, encryption, and anonymisation.</p>

		<h4 id="sssec-licensing" class="subsection">Licensing</h4>
		<p>When it comes to associating machine readable licenses with knowledge graphs, the W3C Open Digital Rights Language (ODRL)&nbsp;[<a href="#ref-odrl">Iannella and Villata, 2018</a>] provides an information model and related vocabularies that can be used to specify permissions, duties, and prohibitions with respect to actions relating to assets. ODRL supports fine-grained descriptions of digital rights that are represented as – and thus can be embedded within – graphs. Figure&nbsp;<a href="#fig-license">9.3</a> illustrates a license granting the assignee the permission to <span class="gnode">Modify</span>, <span class="gnode">Distribute</span>, and <span class="gnode">Derive</span> work from the <span class="gnode">Event Graph</span> (e.g., Figure&nbsp;<a href="#fig-delg">2.1</a>); however the assignee is obliged to <span class="gnode">Attribute</span> the copyright holder. From a modelling perspective, ODRL can be used to model several well-known license families, for instance Apache, Creative Commons (CC), and Berkeley Software Distribution (BSD), to name but a few&nbsp;[<a href="#ref-CabrioAV14">Cabrio et al., 2014</a>, <a href="#ref-panasiuk2018modeling">Panasiuk et al., 2018</a>]. Additionally, <a href="#ref-CabrioAV14">Cabrio et al. [2014]</a> propose methods to automatically extract machine-readable licenses from unstructured text. From a reasoning perspective, license compatibility validation and composition techniques&nbsp;[<a href="#ref-villata2012licenses">Villata and Gandon, 2012</a>, <a href="#ref-guido_heuristics_2013">Governatori et al., 2013</a>, <a href="#ref-MoreauSPD19">Moreau et al., 2019</a>] can be used to combine knowledge graphs that are governed by different licenses. 
Such techniques are employed by the Data Licenses Clearance Center (DALICC), which includes a library of standard machine readable licenses, and tools that enable users both to compose arbitrary custom licenses and also to verify the compatibility of different licenses&nbsp;[<a href="#ref-pellegrini2019DALICC">Pellegrini et al., 2019</a>].</p>

		<figure id="fig-license">
			<img src="images/fig-license.svg" alt="A license for event data, along with permissions, actions, and obligations"/>
			<figcaption>A license for event data, along with permissions, actions, and obligations</figcaption>
		</figure>

		<h4 id="sssec-usage-policies" class="subsection">Usage policies</h4>
		<p>Access control policies based on edge patterns can be used to restrict access to parts of a knowledge graph&nbsp;[<a href="#ref-Reddivari2005">Reddivari et al., 2005</a>, <a href="#ref-Flouris2010">Flouris et al., 2010</a>, <a href="#ref-Kirrane2013">Kirrane et al., 2013</a>]. WebAccessControl (WAC)<sup class="fnmark" id="fnm33"><a href="#fn33">33</a></sup><span class="footnote" id="fn33"><sup><a href="#fnm33">note 33</a></sup> WAC, <a class="uri" style="display:inline" href="http://www.w3.org/wiki/WebAccessControl">http://www.w3.org/wiki/WebAccessControl</a></span> is an access control framework for graphs that uses WebID for authentication and provides a vocabulary for specifying access control policies. Extensions of this WAC vocabulary have been proposed to capture privacy preferences&nbsp;[<a href="#ref-SaccoP11">Sacco and Passant, 2011</a>] and to cater for contextual constraints&nbsp;[<a href="#ref-Villata2011">Villata et al., 2011</a>, <a href="#ref-Costabello2012">Costabello et al., 2012</a>]. Although ODRL is primarily used to specify licenses, profiles to additionally specify access policies&nbsp;[<a href="#ref-steyskal2014">Steyskal and Polleres, 2014</a>] and regulatory obligations&nbsp;[<a href="#ref-agarwal2018legislative">Agarwal et al., 2018</a>, <a href="#ref-devos2019ODRL">De Vos et al., 2019</a>] have also been proposed in recent years, as discussed in the survey by <a href="#ref-kirrane2017access">Kirrane et al. [2017]</a>.</p>
		<p>As a generalisation of access policies, usage policies specify how data can be used: what kinds of processing can be applied, by whom, for what purpose, etc. The example usage policy presented in Figure&nbsp;<a href="#fig-usage">9.4</a> states that the process <span class="gnode">Analyse</span> of <span class="gnode">Location Graph</span> can be performed on <span class="gnode">Internal Servers</span> by members of <span class="gnode">Company Staff</span> in order to provide <span class="gnode">Event Recommendations</span>. Vocabularies for usage policies have been proposed by the SPECIAL H2020 project&nbsp;[<a href="#ref-special">Bonatti et al., 2019</a>] and the W3C Data Privacy Vocabularies and Controls Community Group (DPVCG)&nbsp;[<a href="#ref-dpv">Pandit et al., 2019</a>, <a href="#ref-bonatti2019big">Bonatti and Kirrane, 2019</a>]. Once specified in these vocabularies, usage policies can then be used to verify that data processing conforms to legal norms and to the consent provided by subjects&nbsp;[<a href="#ref-DelanauxBRT18">Delanaux et al., 2018</a>, <a href="#ref-bonatti2019big">Bonatti and Kirrane, 2019</a>].</p>

		<figure id="fig-usage">
			<img src="images/fig-usage.svg" alt="A policy for usage of a sub-graph of location data in the knowledge graph"/>
			<figcaption>A policy for usage of a sub-graph of location data in the knowledge graph</figcaption>
		</figure>

		<h4 id="sssec-encryption" class="subsection">Encryption</h4>
		<p>Rather than internally controlling usage, the tourist board could use encryption mechanisms on parts of the published knowledge graph, for example relating to reports of crimes, and provide keys to partners who should have access to the plaintext. While a straightforward approach is to encrypt the entire graph (or sub-graphs) with one key, more fine-grained encryption can be performed for individual nodes or edge-labels in a graph, potentially providing different clients access to different information through different keys&nbsp;[<a href="#ref-giereth2005partial">Giereth, 2005</a>]. The CryptOntology&nbsp;[<a href="#ref-gerbracht2008possibilities">Gerbracht, 2008</a>] can further be used to embed details about the encryption mechanism used within the knowledge graph. Figure&nbsp;<a href="#fig-crypto">9.5</a> illustrates how this could be used to encrypt the names of claimants from Figure&nbsp;<a href="#fig-direct">6.4</a>, storing the ciphertext <span class="gnode">zhk…kjg</span>, as well as the key-length and encryption algorithm used. In order to grant access to the plaintext, one approach is to encrypt individual edges with symmetric keys so as to allow specific types of edge patterns to only be executed by clients with the appropriate key&nbsp;[<a href="#ref-kasten2013towards">Kasten et al., 2013</a>]. This approach can be used, for example, to allow clients who know a claimant ID (e.g., <span class="gnode">Claimant-XY12SDA</span>) and have the appropriate key to find (only) the name of the claimant through an edge pattern <span class="gnode">Claimant-XY12SDA</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><span class="edge">Claimant-name</span><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gvar"><strong>?name</strong></span>. A key limitation of this approach, however, is that it requires attempting to decrypt all edges to find all possible solutions. 
A more efficient alternative is to combine functional encryption and specialised indexing to retrieve solutions from the encrypted graph without attempting to decrypt all edges&nbsp;[<a href="#ref-FernandezKPS17">Fernández et al., 2017</a>].</p>

		<figure id="fig-crypto">
			<img src="images/fig-crypto.svg" alt="Directed edge-labelled graph with the name of the claimant encrypted; plaintext elements are dashed and may be omitted from published data (possibly along with encryption details)"/>
			<figcaption>Directed edge-labelled graph with the name of the claimant encrypted; plaintext elements are dashed and may be omitted from published data (possibly along with encryption details)</figcaption>
		</figure>

		<h4 id="sssec-anonymisation" class="subsection">Anonymisation</h4>
		<p>Consider that the tourist board acquires information on transport taken by individuals within the country, which can be used – not only by the board, but potentially other stakeholders, such as travel companies – to understand trajectories taken by tourists. However, from a data-protection perspective, it would be advisable to anonymise the knowledge graph to avoid leaking the personal travel history of individuals.</p>
		<p>A first approach to anonymisation is to suppress and generalise knowledge in a graph such that individuals cannot be identified, based on \(k\)-anonymity&nbsp;[<a href="#ref-samarati1998protecting">Samarati and Sweeney, 1998</a>]<sup class="fnmark" id="fnm34"><a href="#fn34">34</a></sup><span class="footnote" id="fn34"><sup><a href="#fnm34">note 34</a></sup> \(k\)-anonymity guarantees that the data of an individual are indistinguishable from those of at least \(k-1\) other individuals.</span>, \(l\)-diversity&nbsp;[<a href="#ref-li2007t">Li et al., 2007</a>]<sup class="fnmark" id="fnm35"><a href="#fn35">35</a></sup><span class="footnote" id="fn35"><sup><a href="#fnm35">note 35</a></sup> \(l\)-diversity guarantees that sensitive data fields have at least \(l\) diverse values within each group of individuals; this avoids leaks such as that all tourists from Austria (a group of individuals) in the data have been pick-pocketed (a sensitive attribute), which would reveal sensitive information about individuals from Austria.</span>, etc. Approaches that apply \(k\)-anonymity on graphs identify and suppress “quasi-identifiers” that would otherwise leave a given individual indistinguishable from fewer than \(k-1\) other individuals&nbsp;[<a href="#ref-radulovic2015towards">Radulovic et al., 2015</a>, <a href="#ref-HeitmannEtAl2017">Heitmann et al., 2017</a>]. Figure&nbsp;<a href="#fig-anonymised">9.6</a> illustrates a possible result of \(k\)-anonymisation for a sub-graph describing a flight passenger, where quasi-identifiers (passport, plane ticket) have been converted into blank nodes, ensuring that the passenger (the dashed blank node) cannot be distinguished from at least \(k-1\) other individuals. 
In the context of a graph, however, <em>neighbourhood attacks</em>&nbsp;[<a href="#ref-ZhouP11">Zhou and Pei, 2011</a>] – using information about neighbours – can also break \(k\)-anonymity, where we also suppress the day and time of the flight, which, though not sensitive information per se, could otherwise break \(k\)-anonymity for passengers (if, for example, a particular flight had fewer than \(k\) males from the U.S. onboard). The graph shown in Figure&nbsp;<a href="#fig-anonymised">9.6</a> then offers \(k\)-anonymity for the particular individual assuming that at least \(k\) male passengers from the U.S. flew during December 2018 from Arica to Santiago.</p>

		<figure id="fig-anonymised">
			<img src="images/fig-anonymised.svg" alt="Anonymised sample of a directed edge-labelled graph describing a passenger (dashed) of a flight"/>
			<figcaption>Anonymised sample of a directed edge-labelled graph describing a passenger (dashed) of a flight</figcaption>
		</figure>
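		<p>The effect of generalising the flight date on \(k\)-anonymity can be checked with a short sketch such as the following. The passenger records, the attribute names, and the functions <code>anonymity_level</code> and <code>generalise</code> are hypothetical; the sketch only measures the anonymity level of tabular records and does not decide which attributes are quasi-identifiers.</p>
<pre><code class="language-python">from collections import Counter

# Hypothetical passenger records with quasi-identifying attributes.
RECORDS = [
    {"nationality": "US", "gender": "male", "date": "2018-12-03"},
    {"nationality": "US", "gender": "male", "date": "2018-12-03"},
    {"nationality": "US", "gender": "male", "date": "2018-12-17"},
]
QUASI_IDENTIFIERS = ("nationality", "gender", "date")


def anonymity_level(records, quasi_identifiers):
    """Largest k for which the records are k-anonymous: the size of the
    smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values())


def generalise(record):
    """Suppress the day of the flight, keeping only the year and month."""
    return {**record, "date": record["date"][:7]}


BEFORE = anonymity_level(RECORDS, QUASI_IDENTIFIERS)
AFTER = anonymity_level([generalise(r) for r in RECORDS], QUASI_IDENTIFIERS)
</code></pre>
		<p>With exact dates, the passenger who flew on 17 December is unique, so <code>BEFORE</code> is 1; once the day is suppressed, all three records fall into the same group and <code>AFTER</code> is 3.</p>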

		<p>More complex neighbourhood attacks may rely on more abstract graph patterns, observing that individuals can be deanonymised purely from knowledge of the graph structure, even if all nodes and edge labels are left blank; for example, if we know that a team of \(k-1\) players take flights together for a particular number of away games, we could use this information for a neighbourhood attack that reveals the set of players in the graph. Hence a number of guarantees specific to graphs have been proposed, including \(k\)-degree anonymity&nbsp;[<a href="#ref-LiuT08">Liu and Terzi, 2008</a>], which ensures that individuals cannot be deanonymised by attackers with knowledge of the degree of particular individuals. The approach is based on minimally modifying the graph to ensure that each node has at least \(k-1\) other nodes with the same degree. A stronger guarantee, called \(k\)-isomorphic neighbour anonymity&nbsp;[<a href="#ref-ZhouP08">Zhou and Pei, 2008</a>], avoids neighbourhood attacks where an attacker knows how an individual is connected to nodes in their neighbourhood; this is done by modifying the graph to ensure that for each node, there exist at least \(k-1\) nodes with isomorphic (i.e., identically structured) neighbourhoods elsewhere in the graph. Both approaches only protect against attackers with knowledge of bounded neighbourhoods. An even stronger notion is that of \(k\)-automorphism&nbsp;[<a href="#ref-ZouCO09a">Zou et al., 2009</a>], which ensures that for every node, it is structurally indistinguishable from \(k-1\) other nodes, thus avoiding any attack based on structural information (as a trivial example, a \(k\)-clique or a \(k\)-cycle satisfy \(k\)-automorphism). 
Many of these techniques for anonymisation of graph data were motivated by social networks&nbsp;[<a href="#ref-NarayananS09">Narayanan and Shmatikov, 2009</a>], though they can also be applied to knowledge graphs, per the work of <a href="#ref-LinT17">Lin and Tripunitara [2017]</a>, who adapt \(k\)-automorphism for directed edge-labelled graphs (specifically RDF graphs).</p>
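		<p>As a small illustration of \(k\)-degree anonymity, the following sketch computes the largest \(k\) for which an undirected graph is \(k\)-degree anonymous. It only checks the property and does not modify the graph as the cited approach does; the function name <code>degree_anonymity_level</code> and the example graphs are hypothetical, and isolated nodes are ignored because the graph is given as a list of edges.</p>
<pre><code class="language-python">from collections import Counter


def degree_anonymity_level(edges):
    """Largest k such that every node shares its degree with at least
    k - 1 other nodes, for an undirected graph given as (u, v) pairs."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Count how many nodes have each degree; the rarest degree sets k.
    nodes_per_degree = Counter(degree.values())
    return min(nodes_per_degree.values())


# A 4-cycle: every node has degree 2.
CYCLE = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
# A path of three nodes: the middle node is the only one with degree 2.
PATH = [("a", "b"), ("b", "c")]
</code></pre>
		<p>The cycle yields a level of 4, in line with the observation that a \(k\)-cycle is structurally symmetric, whereas the path yields a level of 1 because an attacker who knows that an individual has two connections can identify the middle node.</p>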
		<p>While the aforementioned approaches anonymise data, a second approach is to apply anonymisation when answering queries, such as adding noise to the solutions in a way that preserves privacy. One approach is to apply \(\varepsilon\)-differential privacy&nbsp;[<a href="#ref-Dwork:2006:DP:2097282.2097284">Dwork, 2006</a>]<sup class="fnmark" id="fnm36"><a href="#fn36">36</a></sup><span class="footnote" id="fn36"><sup><a href="#fnm36">note 36</a></sup> \(\varepsilon\)-differential privacy ensures that the probability of a given result from a process (e.g., query) applied to data, to which random noise is added, differs by a factor of no more than \(e^\varepsilon\) when the data includes or excludes any individual.</span> for querying graphs&nbsp;[<a href="#ref-Silva2017">Silva et al., 2017</a>]. Such mechanisms are typically used for aggregate (e.g., count) queries, where noise is added to avoid leaks about individuals. To illustrate, differential privacy may allow for counting the number of passengers of specified nationalities taking specified flights, adding (just enough) random noise to the count to ensure that we cannot tell, within a certain probability (controlled by \(\varepsilon\)), whether or not a particular individual took a flight, where, intuitively speaking, we would require (proportionally) less noise for nationalities with many passengers in the data, but more noise to “hide” passengers from more uncommon nationalities.</p>
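		<p>One common way to add such noise to a count query is the Laplace mechanism, sketched below. This is the textbook mechanism and not necessarily the one used in the cited work on querying graphs; the passenger data, the flight code <code>F1</code>, and the function <code>noisy_count</code> are hypothetical.</p>
<pre><code class="language-python">import random


def noisy_count(passengers, nationality, flight, epsilon, rng):
    """Count the matching passengers, then add Laplace noise of scale
    1/epsilon. Adding or removing one passenger changes the true count
    by at most 1, so this is the usual scale for a count query."""
    true_count = sum(
        1 for p in passengers
        if p["nationality"] == nationality and p["flight"] == flight
    )
    # The difference of two exponential variates with rate epsilon
    # follows a Laplace distribution with mean 0 and scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Hypothetical data: 40 passengers of one nationality, 2 of another.
PASSENGERS = (
    [{"nationality": "US", "flight": "F1"}] * 40
    + [{"nationality": "IS", "flight": "F1"}] * 2
)
</code></pre>
		<p>For example, <code>noisy_count(PASSENGERS, "US", "F1", 1.0, random.Random(0))</code> returns the true count of 40 perturbed by noise of the same scale as would be added to the count of 2, so the noise is proportionally smaller for the more common nationality. Smaller values of <code>epsilon</code> give stronger privacy at the cost of noisier answers.</p>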
		<p>These approaches require information loss for stronger guarantees of privacy; which to choose is thus heavily application dependent. If the anonymised data are to be published in their entirety as a “dump”, then an approach based on \(k\)-anonymity can be used to protect individuals, while \(l\)-diversity can be used to protect groups. On the other hand, if the data are to be made available, in part, through a query interface, then \(\varepsilon\)-differential privacy is a more suitable framework.</p>
		</section>
	</section>
	<section id="chap-kgs" class="chapter">
		<h2>Knowledge Graphs in Practice</h2>
		<p>In this chapter, we discuss some of the most prominent knowledge graphs that have emerged in the past years. We begin by discussing open knowledge graphs, most of which have been published on the Web per the guidelines and protocols described in Chapter&nbsp;<a href="#chap-publish">9</a>. We later discuss enterprise knowledge graphs that have been created by companies from diverse industries for a wide range of applications.</p>

		<section id="sec-openkgs" class="section">
		<h3>Open Knowledge Graphs</h3>
		<p>By <em>open knowledge graphs</em>, we refer to knowledge graphs published under the Open Data philosophy, namely that “<em>open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness)</em>”.<sup class="fnmark" id="fnm37"><a href="#fn37">37</a></sup><span class="footnote" id="fn37"><sup><a href="#fnm37">note 37</a></sup> See <a class="uri" href="http://opendefinition.org/">http://opendefinition.org/</a></span> Many open knowledge graphs have been published in the form of <em>Linked Open Datasets</em>&nbsp;[<a href="#ref-ldbook">Heath and Bizer, 2011</a>], which are (RDF) graphs published under the Linked Data principles (see Section&nbsp;<a href="#sssec-ld">9.1.2</a>) following the Open Data philosophy. Many of the most prominent open knowledge graphs – including DBpedia&nbsp;[<a href="#ref-LehmannIJJKMHMK15">Lehmann et al., 2015</a>], YAGO&nbsp;[<a href="#ref-suchanek2007yago">Suchanek et al., 2007</a>], Freebase&nbsp;[<a href="#ref-bollacker2007freebase">Bollacker et al., 2007b</a>], and Wikidata&nbsp;[<a href="#ref-VrandecicK14">Vrandečić and Krötzsch, 2014</a>] – cover multiple domains, representing a broad diversity of entities and relationships; we first discuss these in turn. Later we discuss some of the other (specific) domains for which open knowledge graphs are currently available. Most of the open knowledge graphs we discuss in this section are modelled in RDF, published following Linked Data principles, and offer access to their data through dumps (RDF), node lookups (Linked Data), graph patterns (SPARQL) and, in some cases, edge patterns (Triple Pattern Fragments).</p>

		<h4 id="sssec-dbpedia" class="subsection">DBpedia</h4>
		<p>The DBpedia project was developed to extract a graph-structured representation of the semi-structured data embedded in Wikipedia articles&nbsp;[<a href="#ref-auer2007dbpedia">Auer et al., 2007</a>], enabling the integration, processing, and querying of these data in a unified manner. The resulting knowledge graph is further enriched by linking to external open resources, including images, webpages, and external datasets such as DailyMed, DrugBank, GeoNames, MusicBrainz, New York Times, and WordNet&nbsp;[<a href="#ref-LehmannIJJKMHMK15">Lehmann et al., 2015</a>]. The DBpedia extraction framework consists of several components, corresponding to abstractions of Wikipedia article sources, graph storage and serialisation destinations, wiki-markup extractors, parsers, and extraction managers&nbsp;[<a href="#ref-bizer2009dbpedia">Bizer et al., 2009</a>]. Specific extractors are designed to process labels, abstracts, interlanguage links, images, redirects, disambiguation pages, external links, internal pagelinks, homepages, categories, and geocoordinates. The content in the DBpedia knowledge graph is not only multidomain, but also multilingual: as of 2012, DBpedia contained labels and abstracts in up to 97 different languages&nbsp;[<a href="#ref-mendes2012dbpedia">Mendes et al., 2012a</a>]. Entities within DBpedia are classified using four different schemata in order to address varying requirements&nbsp;[<a href="#ref-bizer2009dbpedia">Bizer et al., 2009</a>]. These schemata include a Simple Knowledge Organization System (SKOS) representation of Wikipedia categories, a Yet Another Great Ontology (YAGO) classification schema (discussed presently), an Upper Mapping and Binding Exchange Layer (UMBEL) ontology categorisation schema, and a custom schema called the DBpedia ontology with classes such as <code>Person</code>, <code>Place</code>, <code>Organisation</code>, and <code>Work</code>&nbsp;[<a href="#ref-LehmannIJJKMHMK15">Lehmann et al., 2015</a>]. 
DBpedia also supports live synchronisation in order to remain consistent with dynamic Wikipedia articles&nbsp;[<a href="#ref-LehmannIJJKMHMK15">Lehmann et al., 2015</a>].</p>

		<h4 id="sssec-yago" class="subsection">Yet Another Great Ontology</h4>
		<p>YAGO likewise extracts graph-structured data from Wikipedia, which are then unified with the hierarchical structure of WordNet to create a “<em>light-weight and extensible ontology with high quality and coverage</em>”&nbsp;[<a href="#ref-suchanek2007yago">Suchanek et al., 2007</a>]. This knowledge graph is intended to support various information technology tasks, such as machine translation, word sense disambiguation, query expansion, document classification, data cleaning, information integration, etc. While earlier approaches automatically extracted structured knowledge from text using pattern matching, natural language processing (NLP), and statistical learning, the resulting content tended to be lower in quality than what was possible through manual construction&nbsp;[<a href="#ref-suchanek2007yago">Suchanek et al., 2007</a>]. However, manual construction is costly, making it challenging to achieve broad coverage and keep the data up-to-date. In order to extract data with high coverage and quality, YAGO (like DBpedia) mostly extracts data from Wikipedia infoboxes and category pages, which contain core entity information and lists of articles for a specific category, respectively. These, in turn, are unified with hierarchical concepts from WordNet&nbsp;[<a href="#ref-suchanek2008yago">Suchanek et al., 2008</a>]. A schema – called the YAGO model – provides a vocabulary defined in RDFS; this model allows for representing words as entities, capturing synonymy and ambiguity&nbsp;[<a href="#ref-suchanek2007yago">Suchanek et al., 2007</a>]. The model further supports reification, \(n\)-ary relations, and data types&nbsp;[<a href="#ref-suchanek2008yago">Suchanek et al., 2008</a>]. 
Refinement mechanisms employed within YAGO include canonicalisation, where each edge and node is mapped to a unique identifier and duplicate elements are removed, and type checking, where nodes that cannot be assigned to a class by deductive or inductive methods are eliminated&nbsp;[<a href="#ref-suchanek2008yago">Suchanek et al., 2008</a>]. YAGO would be extended in later years to support spatio-temporal context&nbsp;[<a href="#ref-YAGO">Hoffart et al., 2011</a>] and multilingual Wikipedias&nbsp;[<a href="#ref-MahdisoltaniBS15">Mahdisoltani et al., 2015</a>].</p>
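		<p>The two refinement steps just described can be sketched roughly as follows. This is a simplified illustration, not YAGO's actual implementation: the alias table, the class table, and the function names are hypothetical, and a plain lookup table stands in for the deductive or inductive methods that assign classes to nodes.</p>

```python
def canonicalise(edges, canonical_id):
    """Map each node and edge label to its canonical identifier and drop duplicates."""
    seen = set()
    result = []
    for s, p, o in edges:
        triple = (canonical_id.get(s, s), canonical_id.get(p, p), canonical_id.get(o, o))
        if triple not in seen:
            seen.add(triple)
            result.append(triple)
    return result


def type_check(edges, node_class):
    """Keep only edges whose subject and object have both been assigned a class."""
    return [(s, p, o) for s, p, o in edges if s in node_class and o in node_class]


# Hypothetical extracted edges using inconsistent surface forms.
raw_edges = [
    ("A. Einstein", "born in", "Ulm"),
    ("Albert Einstein", "bornIn", "Ulm"),
    ("Albert Einstein", "bornIn", "Xyzzy"),
]
# Hypothetical alias table (an assumption of this sketch).
canonical_id = {
    "A. Einstein": "Albert_Einstein",
    "Albert Einstein": "Albert_Einstein",
    "born in": "bornIn",
}
# Hypothetical class assignments; "Xyzzy" could not be typed.
node_class = {"Albert_Einstein": "person", "Ulm": "city"}

clean_edges = type_check(canonicalise(raw_edges, canonical_id), node_class)
print(clean_edges)
```

		<p>In this toy run, the first two raw edges collapse into one canonical edge, and the edge mentioning the untyped node is discarded.</p>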

		<h4 id="sssec-freebase" class="subsection">Freebase</h4>
		<p>Freebase was a general-purpose, broad collection of human knowledge that aimed to address some of the large-scale information integration problems associated with the decentralised nature of the Semantic Web, such as uneven adoption, implementation challenges, and distributed query performance limitations&nbsp;[<a href="#ref-bollacker2007platform">Bollacker et al., 2007a</a>]. Unlike DBpedia and YAGO – which are mostly extracted from Wikipedia/WordNet – Freebase solicited contributions directly from human editors. Included in the Freebase platform were a scalable data store with versioning mechanisms; a large data object store (LOB) for the storage of text, image, and media files; an API that could be queried using the Metaweb Query Language (MQL); a Web user interface; and a lightweight typing system&nbsp;[<a href="#ref-bollacker2007platform">Bollacker et al., 2007a</a>]. The latter typing system was designed to support collaborative processes. Rather than forcing ontological correctness or logical consistency, the system was implemented as a loose collection of structuring mechanisms – based on datatypes, semantic classes, properties, schema definitions, etc. – that allowed for incompatible types and properties to coexist simultaneously&nbsp;[<a href="#ref-bollacker2007platform">Bollacker et al., 2007a</a>]. Content could be added to Freebase interactively through the Web user interface or in an automated way by leveraging the API’s write functionality. Freebase was acquired by Google in 2010, and its content formed an important part of the Google Knowledge Graph announced in 2012&nbsp;[<a href="#ref-GoogleKG">Singhal, 2012</a>]. When Freebase became read-only in March 2015, the knowledge graph contained over three billion edges. Much of this content was subsequently migrated to Wikidata&nbsp;[<a href="#ref-pellissier2016freebase">Pellissier Tanon et al., 2016</a>].</p>

		<h4 id="sssec-wikidata" class="subsection">Wikidata</h4>
		<p>Wikipedia contains a wealth of semi-structured data embedded in infoboxes, lists, tables, etc., as exploited by DBpedia and YAGO. However, these data have traditionally been curated and updated manually across different articles and languages; for example, a goal scored by a Chilean football player may require manual updates in the player's article, the tournament article, the team article, lists of top scorers, and so forth, across hundreds of language versions. Manual curation has led to a variety of data quality issues, including contradictory data in different articles, languages, etc. The Wikimedia Foundation uses Wikidata as a centralised, collaboratively-edited knowledge graph to supply Wikipedia – and arbitrary other clients – with data. Under this vision, a fact could be added to Wikidata once, triggering the automatic update of potentially many articles in Wikipedia across different languages&nbsp;[<a href="#ref-VrandecicK14">Vrandečić and Krötzsch, 2014</a>]. Like Wikipedia, Wikidata is also considered a secondary source containing <em>claims</em> that should reference primary sources, though claims can also be initially added without reference&nbsp;[<a href="#ref-PiscopoKPS17">Piscopo et al., 2017</a>]. Wikidata further allows for different viewpoints in terms of potentially contradictory (referenced) claims&nbsp;[<a href="#ref-VrandecicK14">Vrandečić and Krötzsch, 2014</a>]. Wikidata is multilingual, where nodes and edges are assigned language-agnostic <code>Qxx</code> and <code>Pxx</code> codes (see Figure&nbsp;<a href="#fig-ld">9.1</a>) and are subsequently associated with labels, aliases, and descriptions in various languages&nbsp;[<a href="#ref-KaffeePVSCP17">Kaffee et al., 2017</a>], allowing claims to be surfaced in these languages. 
Collaborative editing is not only permitted on the data level, but also on the schema level, allowing users to add or modify lightweight semantic axioms&nbsp;[<a href="#ref-PiscopoS18">Piscopo and Simperl, 2018</a>] – including sub-classes, sub-properties, inverse properties, etc. – as well as shapes&nbsp;[<a href="#ref-BonevaDFG19">Boneva et al., 2019</a>]. Wikidata offers various access protocols&nbsp;[<a href="#ref-malyshev2018getting">Malyshev et al., 2018</a>] and has received broad adoption, being used by Wikipedia to generate infoboxes in certain domains&nbsp;[<a href="#ref-SaezH18">Sáez and Hogan, 2018</a>], being supported by Google&nbsp;[<a href="#ref-pellissier2016freebase">Pellissier Tanon et al., 2016</a>], and having been used as a data source for prominent end-user applications such as Apple’s Siri, amongst others&nbsp;[<a href="#ref-malyshev2018getting">Malyshev et al., 2018</a>].</p>
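		<p>To illustrate how language-agnostic codes allow a single claim to be surfaced in several languages, the following sketch stores one claim and renders it using per-language labels. The codes <code>Q01</code>, <code>Q02</code>, and <code>P01</code> are placeholders invented for this example (not real Wikidata identifiers), and the data layout and the <code>render</code> helper are assumptions rather than Wikidata's actual data model or API.</p>

```python
# Placeholder codes and labels, invented for illustration only.
LABELS = {
    "Q01": {"en": "Santiago", "es": "Santiago de Chile"},
    "Q02": {"en": "Chile", "es": "Chile"},
    "P01": {"en": "capital of", "es": "capital de"},
}

# A claim is stored once, using only language-agnostic codes.
CLAIMS = [("Q01", "P01", "Q02")]


def render(claim, lang, fallback="en"):
    """Render a claim in the given language, falling back to another language or the raw code."""
    def label(code):
        names = LABELS.get(code, {})
        return names.get(lang) or names.get(fallback) or code
    return " ".join(label(code) for code in claim)


for lang in ("en", "es"):
    print(render(CLAIMS[0], lang))
```

		<p>Because the claim itself holds only codes, adding a label in a further language requires no change to the claim.</p>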

		<h4 id="sssec-other-open-kgs" class="subsection">Other open cross-domain knowledge graphs</h4>
		<p>Aside from DBpedia, YAGO, Freebase and Wikidata, a number of other cross-domain knowledge graphs have been developed over the years. BabelNet&nbsp;[<a href="#ref-NavigliPonzetto:12">Navigli and Ponzetto, 2012</a>], like YAGO, is based on unifying WordNet and Wikipedia, but with the integration of additional knowledge graphs such as Wikidata, and a focus on creating a knowledge graph of multilingual lexical forms (organised into multilingual synsets) by transforming lexicographic resources such as Wiktionary and OmegaWiki into knowledge graphs. Compared to other knowledge graphs, lexicalised knowledge graphs such as BabelNet bring together the encyclopedic information found in Wikipedia with the lexicographic information usually found in monolingual and bilingual dictionaries. The Cyc project&nbsp;[<a href="#ref-lenat1995cyc">Lenat, 1995</a>] aims to encode common-sense knowledge in a machine-readable way, where over 900 person-years of effort&nbsp;[<a href="#ref-MatuszekCWD06">Matuszek et al., 2006</a>] have, since 1986, gone into the creation of 2.2 million facts and rules. Though Cyc is proprietary, an open subset called OpenCyc has been published; we refer to the comparison by <a href="#ref-FarberBMR18">Färber et al. [2018]</a> of DBpedia, Freebase, OpenCyc, and YAGO for further details. The Never Ending Language Learning (NELL) project&nbsp;[<a href="#ref-MitchellCHTYBCM18">Mitchell et al., 2018</a>] has, since 2010, extracted a graph of 120 million edges from the text of web pages using OIE methods (see Chapter&nbsp;<a href="#chap-create">6</a>). Each such open knowledge graph applies different combinations of the languages and techniques discussed in this book over different sources with differing results.</p>

		<h4 id="sssec-domain-specific-open-kgs" class="subsection">Domain-specific open knowledge graphs</h4>
		<p>Open knowledge graphs have been published in a variety of specific domains. <a href="#ref-SchmachtenbergBP14">Schmachtenberg et al. [2014]</a> identify the most prominent domains in the context of Linked Data as follows: <em>media</em>, relating to news, television, radio, etc. (e.g., the BBC World Service Archive&nbsp;[<a href="#ref-RaimondFSA14">Raimond et al., 2014</a>]); <em>government</em>, relating to the publication of data for transparency and development (e.g., by the U.S.&nbsp;[<a href="#ref-HendlerHMT12">Hendler et al., 2012</a>] and U.K.&nbsp;[<a href="#ref-ShadboltO13">Shadbolt and O'Hara, 2013</a>] governments); <em>publications</em>, relating to academic literature in various disciplines (e.g., OpenCitations&nbsp;[<a href="#ref-PeroniSV17">Peroni et al., 2017</a>], SciGraph&nbsp;[<a href="#ref-IanaJNBHP19">Iana et al., 2019</a>], Microsoft Academic Knowledge Graph&nbsp;[<a href="#ref-MAKG">Färber, 2019</a>]); <em>geographic</em>, relating to places and regions of interest (e.g., LinkedGeoData&nbsp;[<a href="#ref-StadlerLHA12">Stadler et al., 2012</a>]); <em>life sciences</em>, relating to proteins, genes, drugs, diseases, etc. (e.g., Bio2RDF&nbsp;[<a href="#ref-CallahanCAD13">Callahan et al., 2013</a>]); and <em>user-generated content</em>, relating to reviews, open source projects, etc. (e.g., Revyu&nbsp;[<a href="#ref-HeathM08a">Heath and Motta, 2008</a>]). 
Open knowledge graphs have also been published in other domains, including <em>cultural heritage</em>&nbsp;[<a href="#ref-HyvonenMKAKRSTPKVTPFSPLN09">Hyvönen et al., 2009</a>], <em>music</em>&nbsp;[<a href="#ref-RaimondSS09">Raimond et al., 2009</a>], <em>law</em>&nbsp;[<a href="#ref-Montiel-Ponsoda17">Montiel-Ponsoda et al., 2017</a>], <em>theology</em>&nbsp;[<a href="#ref-SherifN15">Sherif and Ngonga Ngomo, 2015</a>], and even <em>tourism</em>&nbsp;[<a href="#ref-LuLS16">Lu et al., 2016</a>, <a href="#ref-abs-1805-05744">Kärle et al., 2018</a>, <a href="#ref-MaturanaALMH18">Maturana et al., 2018</a>, <a href="#ref-ZhangCHYAL19">Zhang et al., 2019</a>]. The envisaged applications for such knowledge graphs are as varied as the domains from which they emanate, but often relate to integration&nbsp;[<a href="#ref-RaimondSS09">Raimond et al., 2009</a>, <a href="#ref-CallahanCAD13">Callahan et al., 2013</a>], recommendation&nbsp;[<a href="#ref-RaimondSS09">Raimond et al., 2009</a>, <a href="#ref-LuLS16">Lu et al., 2016</a>], transparency&nbsp;[<a href="#ref-HendlerHMT12">Hendler et al., 2012</a>, <a href="#ref-ShadboltO13">Shadbolt and O'Hara, 2013</a>], archiving&nbsp;[<a href="#ref-HyvonenMKAKRSTPKVTPFSPLN09">Hyvönen et al., 2009</a>, <a href="#ref-RaimondFSA14">Raimond et al., 2014</a>], decentralisation&nbsp;[<a href="#ref-HeathM08a">Heath and Motta, 2008</a>], multilingual support&nbsp;[<a href="#ref-SherifN15">Sherif and Ngonga Ngomo, 2015</a>], regulatory compliance&nbsp;[<a href="#ref-Montiel-Ponsoda17">Montiel-Ponsoda et al., 2017</a>], etc.</p>
		</section>

		<section id="ssec-enterprise-kgs" class="section">
		<h3>Enterprise Knowledge Graphs</h3>
		<p>A variety of companies have announced the creation of proprietary “enterprise knowledge graphs” with a variety of goals in mind, which include: improving search capabilities&nbsp;[<a href="#ref-GoogleKG">Singhal, 2012</a>, <a href="#ref-BingKG">Shrivastava, 2017</a>, <a href="#ref-AmazonKG">Krishnan, 2018</a>, <a href="#ref-AirBnBKG">Chang, 2018</a>, <a href="#ref-UberKG">Hamad et al., 2018</a>], providing user recommendations&nbsp;[<a href="#ref-AirBnBKG">Chang, 2018</a>, <a href="#ref-UberKG">Hamad et al., 2018</a>], implementing conversational/personal agents&nbsp;[<a href="#ref-eBayKG">Pittman et al., 2017</a>], enhancing targeted advertising&nbsp;[<a href="#ref-LinkedInKG">He et al., 2016</a>], empowering business analytics&nbsp;[<a href="#ref-LinkedInKG">He et al., 2016</a>], connecting users&nbsp;[<a href="#ref-LinkedInKG">He et al., 2016</a>, <a href="#ref-NoyGJNPT19">Noy et al., 2019</a>], extending multilingual support&nbsp;[<a href="#ref-LinkedInKG">He et al., 2016</a>], facilitating research and discovery&nbsp;[<a href="#ref-AstraZenecaKG">Bendtsen and Petrovski, 2019</a>], assessing and mitigating risk&nbsp;[<a href="#ref-ThompsonReutersKG">Tobin, 2017</a>, <a href="#ref-MaanaKG">Dalgliesh, 2016</a>], tracking news events&nbsp;[<a href="#ref-BloombergKG">Meij, 2019</a>], and increasing transport automation&nbsp;[<a href="#ref-HensonSTK19">Henson et al., 2019</a>], amongst (many) others. Though highly diverse, these enterprise knowledge graphs do follow some high-level trends, as reflected in the discussion by <a href="#ref-NoyGJNPT19">Noy et al. [2019]</a>: (1) data are typically integrated into the knowledge graph from a variety of both external and internal sources (often involving text); (2) the enterprise knowledge graph is often very large, with millions or even billions of nodes and edges, posing challenges in terms of scalability; (3) refinement of the initial knowledge graph – adding new links, consolidating duplicate entities, etc. 
– is important to improve quality; (4) techniques to keep the knowledge graph up-to-date with the domain are often crucial; (5) a mix of ontological and machine learning representations are often combined or used in different situations in order to draw conclusions from the enterprise knowledge graph; (6) the ontologies used tend to be lightweight, often simple taxonomies representing a hierarchy of classes or concepts. We now discuss the main industries in which enterprise knowledge graphs have been deployed.</p>

		<h4 id="sssec-web-search" class="subsection">Web search</h4>
		<p>Web search engines have traditionally focused on matching a query string with sub-strings in web documents. The Google Knowledge Graph&nbsp;[<a href="#ref-GoogleKG">Singhal, 2012</a>, <a href="#ref-NoyGJNPT19">Noy et al., 2019</a>] rather promoted a paradigm of “<em>things not strings</em>” – analogous to semantic search&nbsp;[<a href="#ref-GuhaMM03">Guha et al., 2003</a>] – where the search engine would now try to identify the entities that a particular search may be expressing interest in. The knowledge graph itself describes these entities and how they interrelate. One of the main user-facing applications of the Google Knowledge Graph is the “Knowledge Panel”, which presents a pane on the right-hand side of (some) search results describing the principal entity that the search appears to be seeking, including some images, attribute–value pairs, and a list of related entities that users also search for. The Google Knowledge Graph was key to popularising the modern usage of the phrase “knowledge graph” (see Appendix&nbsp;<a href="#chap-defs">A</a>). Other major search engines, such as Microsoft Bing<sup class="fnmark" id="fnm38"><a href="#fn38">38</a></sup><span class="footnote" id="fn38"><sup><a href="#fnm38">note 38</a></sup> Microsoft’s Knowledge Graph was previously called “Satori” (meaning <em>understanding</em> in Japanese).</span>&nbsp;[<a href="#ref-BingKG">Shrivastava, 2017</a>], would later announce knowledge graphs along similar lines.</p>

		<h4 id="sssec-commerce" class="subsection">Commerce</h4>
		<p>Enterprise knowledge graphs have also been announced by companies that are principally concerned with selling or renting goods and services. A prominent example of such a knowledge graph is that used by Amazon&nbsp;[<a href="#ref-AmazonKG">Krishnan, 2018</a>, <a href="#ref-dong2019building">Dong, 2019</a>], which describes the products on sale in their online marketplace. One of the main stated goals of this knowledge graph is to enable more advanced (semantic) search features for products, as well as to improve product recommendations to users of its online marketplace. Another knowledge graph for commerce was announced by eBay&nbsp;[<a href="#ref-eBayKG">Pittman et al., 2017</a>], which encodes product descriptions and shopping behaviour patterns, and is used to power conversational agents that help users to find relevant products through a natural language interface. Airbnb&nbsp;[<a href="#ref-AirBnBKG">Chang, 2018</a>] has also described a knowledge graph that encodes accommodation for rent, places, events, experiences, neighbourhoods, users, tags, etc., on top of which a taxonomic schema is defined. This knowledge graph is used to offer potential clients recommendations of attractions, events, and activities available in the neighbourhood of a particular home for rent. Uber&nbsp;[<a href="#ref-UberKG">Hamad et al., 2018</a>] has similarly announced a knowledge graph focused on food and restaurants for their “Uber Eats” delivery service. The goals are again to offer semantic search features and recommendations to users who are uncertain of precisely what kind of food they are looking for.</p>

		<h4 id="sssec-social-networks" class="subsection">Social networks</h4>
		<p>Enterprise knowledge graphs have also emerged in the context of social networking services. Facebook&nbsp;[<a href="#ref-NoyGJNPT19">Noy et al., 2019</a>] has gathered together a knowledge graph describing not only social data about users, but also the entities they are interested in, including celebrities, places, movies, music, etc., in order to connect people, understand their interests, and provide recommendations. LinkedIn&nbsp;[<a href="#ref-LinkedInKG">He et al., 2016</a>] announced a knowledge graph containing users, jobs, skills, companies, places, schools, etc., on top of which a taxonomic schema is defined. The knowledge graph is used to provide multilingual translations of important concepts, to improve targeted advertising, to provide advanced features for job search and people search, and likewise to provide recommendations matching jobs to people (and vice versa). Another knowledge graph has been created by Pinterest&nbsp;[<a href="#ref-PinterestKG">Gonçalves et al., 2019</a>], describing users and their interests, the latter being organised into a taxonomy. The main use-cases for the knowledge graph are to help users to more easily find content of interest to them, as well as to enhance revenue through targeted advertisements.</p>

		<h4 id="sssec-finance" class="subsection">Finance</h4>
		<p>The financial sector has also seen deployment of enterprise knowledge graphs. Amongst these, Bloomberg&nbsp;[<a href="#ref-BloombergKG">Meij, 2019</a>] has proposed a knowledge graph that powers financial data analytics, including sentiment analysis for companies based on current news reports and tweets, a question answering service, and the detection of emerging events that may affect stock values. Thomson Reuters (Refinitiv)&nbsp;[<a href="#ref-ThompsonReutersKG">Tobin, 2017</a>] has likewise announced a knowledge graph encoding “the financial ecosystem” of people, organisations, equity instruments, industry classifications, joint ventures and alliances, supply chains, etc., using a taxonomic schema to organise these entities. Some of the applications they mention for the knowledge graph include supply chain monitoring, risk assessment, and investment research. Knowledge graphs have also been used for deductive reasoning, with Banca d’Italia&nbsp;[<a href="#ref-BellomariniFGS19">Bellomarini et al., 2019</a>] using rule-based reasoning to determine, for example, the percentage of ownership of a company by various stakeholders. Other companies exploring financial knowledge graphs include Accenture&nbsp;[<a href="#ref-AccentureKG">Okorafor and Ray, 2019</a>], Capital One&nbsp;[<a href="#ref-CapitalOneKG">Branum and Sehon, 2019</a>], and Wells Fargo&nbsp;[<a href="#ref-WellsFargoKG">Newman, 2019</a>], amongst various others.</p>
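		<p>To give a flavour of the kind of recursive rule involved in computing ownership percentages, the sketch below derives total (direct plus indirect) ownership from direct shareholdings. It is an illustrative simplification, not the rules or system used in the cited work: the shareholding data are invented, the function name is hypothetical, and the recursion assumes that shareholdings contain no cycles.</p>

```python
from functools import lru_cache

# Hypothetical direct shareholdings: (owner, company) -> fraction held directly.
DIRECT = {
    ("Alice", "HoldCo"): 0.6,
    ("Bob", "HoldCo"): 0.4,
    ("HoldCo", "OpCo"): 0.5,
    ("Alice", "OpCo"): 0.2,
}


@lru_cache(maxsize=None)
def total_ownership(owner, company):
    """Direct plus indirect ownership, summed over all ownership paths.

    Rule sketch: owns(x, z) = direct(x, z) + sum over y of owns(x, y) * direct(y, z).
    Assumes the shareholding graph is acyclic; cycles would recurse forever.
    """
    total = DIRECT.get((owner, company), 0.0)
    for (holder, target), share in DIRECT.items():
        if target == company:
            total += total_ownership(owner, holder) * share
    return total


for person in ("Alice", "Bob"):
    print(person, round(total_ownership(person, "OpCo"), 4))
```

		<p>Here Alice's stake in OpCo combines her direct holding with the share she controls through HoldCo, which is the sort of implicit fact that rule-based reasoning makes available without storing it explicitly.</p>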

		<h4 id="sssec-other-industries" class="subsection">Other industries</h4>
		<p>Enterprises have also been actively developing knowledge graphs to enable novel applications in a variety of other industries, including: <em>healthcare</em>, where IBM are exploring use-cases for drug discovery&nbsp;[<a href="#ref-NoyGJNPT19">Noy et al., 2019</a>] and information extraction from package inserts&nbsp;[<a href="#ref-GentileGRW19">Gentile et al., 2019</a>], while AstraZeneca&nbsp;[<a href="#ref-AstraZenecaKG">Bendtsen and Petrovski, 2019</a>] are using a knowledge graph to advance genomics research and disease understanding; <em>transport</em>, where Bosch are exploring a knowledge graph of scenes and locations for driving automation&nbsp;[<a href="#ref-HensonSTK19">Henson et al., 2019</a>]; <em>oil &amp; gas</em>, where Maana&nbsp;[<a href="#ref-MaanaKG">Dalgliesh, 2016</a>] are using knowledge graphs to perform data integration for risk mitigation regarding oil wells and drilling; and more besides.</p>
		</section>
	</section>
	<section id="chap-conclude" class="chapter">
		<h2>Summary and Conclusion</h2>
		<p>We have provided a comprehensive introduction to knowledge graphs, which have been receiving more and more attention in recent years. Under the definition of a knowledge graph as <em>a graph of data intended to accumulate and convey knowledge of the real world, whose nodes represent entities of interest and whose edges represent relations between these entities</em>, we have discussed models by which data can be structured as graphs; representations of schema, identity and context; techniques for leveraging deductive and inductive knowledge; methods for the creation, enrichment, quality assessment and refinement of knowledge graphs; principles and standards for publishing knowledge graphs; and finally, we have discussed the adoption of both open and enterprise knowledge graphs in the real world.</p>
		<p>In this final chapter, we provide some concluding remarks, and further offer some insights on potential future directions for research on knowledge graphs.</p>
		<p><em class="paragraph">Concluding remarks</em>. Knowledge graphs have garnered significant attention not only from diverse organisations and industries, but also diverse research communities. This attention is due, in no small part, to the ubiquitous nature of the problem that knowledge graphs address: integrating and extracting value from diverse sources of data at large scale, be it in the context of a particular organisation, community, or more general collections of human knowledge. The key insight of knowledge graphs is that graphs provide a simple, flexible, intuitive and yet powerful abstraction for representing and integrating diverse data at large scale. This insight is far from new (see Appendix&nbsp;<a href="#chap-defs">A</a>), but rather has finally come of age with the advent of knowledge graphs. Graphs have long been used to represent data and knowledge in areas such as Graph Algorithms and Theory, Graph Databases, Information Extraction, Knowledge Representation, Machine Learning, the Semantic Web, and more besides. The advances in these areas can now be unified and applied for knowledge graphs.</p>
		<p>Thus, the decision to model data as a graph opens up a “tool-box” of languages, techniques and systems – stemming from diverse areas – that can be deployed in order to integrate and extract value from data at large scale, as follows:</p>
		<ul>
			<li>A variety of <em>graph query languages</em> are now available that (unlike other NoSQL alternatives) are fully-featured, supporting not only the relational algebra, but also novel features such as navigational queries that can match paths of arbitrary length. A broad selection of graph databases and user interfaces supporting these query languages are now also available.</li>
			<li>Though graphs do not depend on a detailed (relational-like) schema to represent data, various notions of <em>graph schemata</em> have been proposed in order to validate, summarise and define the semantics of graphs.</li>
			<li><em>Contextual frameworks for graphs</em> can be used to represent and reason about the scope of truth of knowledge in the graph – relating to the time, space, provenance, confidence level, etc., for which something is held true – including various alternatives for reification, annotated graph frameworks, etc.</li>
			<li>Deductive forms of reasoning can be enabled over graphs using <em>ontologies</em> and/or <em>rules</em>, which can not only encode a machine-readable consensus about the meaning of the graph, but also provide automated access to implicit knowledge entailed by a graph through materialisation or query rewriting.</li>
			<li><em>Graph algorithms</em>, such as centrality measures, community detection, clustering, etc., can be applied on the data to gain insights about influential entities or edges, close-knit sub-graphs of entities, and more besides, with <em>graph parallel frameworks</em> capable of applying such algorithms at large scale.</li>
			<li>Recent and continual advances in <em>knowledge graph embeddings</em> and <em>graph neural networks</em> have now opened up new possibilities for applying machine learning natively over graphs in the context of diverse tasks, including classification, question answering, recommendations, and more besides.</li>
			<li><em>Rule and axiom mining</em> techniques allow for extracting formal, declarative hypotheses from a knowledge graph that encode high-level patterns and can be applied to derive new knowledge in a deductive, explainable manner.</li>
			<li><em>Graph-based information extraction</em> can be applied to extract and/or enrich a knowledge graph from legacy sources of text and semi-structured data, while <em>graph-based mapping languages</em> facilitate integrating diverse sources of legacy structured data into the knowledge graph.</li>
			<li>Tools, techniques and methodologies for <em>ontology engineering</em> and <em>ontology learning</em> can further guide the – potentially collaborative – creation of an ontology for the knowledge graph, encoding a consensus about its semantics, and enabling access to implicit knowledge through deductive reasoning.</li>
			<li><em>Quality dimensions and metrics for knowledge graphs</em> allow for systematically assessing the readiness of the knowledge graph for its envisaged applications, in both a qualitative and quantitative manner, where a variety of tools and frameworks are available to help perform such assessments.</li>
			<li>Knowledge graphs that have been integrated from diverse sources are likely to be incomplete, or to encode incorrect data, where techniques and tools for <em>knowledge graph refinement</em> facilitate the automated completion and correction of knowledge graphs, thus improving their overall quality and usefulness.</li>
			<li>For the purposes of publishing open knowledge graphs, <em>principles &amp; best practices</em> and <em>access protocols</em>, as well as techniques for <em>linking</em>, <em>licensing</em>, <em>access &amp; usage control</em>, <em>encryption</em> and <em>anonymisation</em>, can be leveraged to maximise their potential impact on society in an ethical way.</li>
		</ul>
		<p>As we have discussed in Chapter&nbsp;<a href="#chap-kgs">10</a>, the various components of this “knowledge graph tool-box” can already be found deployed in practice, having been applied – to varying degrees – in the context of numerous open and enterprise knowledge graphs. As adoption of knowledge graphs continues, work will also continue on improving and combining these tools, as well as on developing novel tools that help to better integrate and extract value from diverse sources of data at large scale.</p>
		<p><em class="paragraph">Future directions</em>. Research on knowledge graphs involves a confluence of techniques from different research areas with the common objective of maximising the knowledge – and thus value – that can be distilled from diverse sources at large scale using a graph-based data abstraction&nbsp;[<a href="#ref-Hogan20">Hogan, 2020a</a>].</p>
		<p>In the intersection of data graphs and deductive knowledge, we emphasise emerging topics such as <em>formal semantics for property graphs</em>, with languages that can take into account the meaning of labels and property–value pairs on nodes and edges&nbsp;[<a href="#ref-Krotzsch0OT18">Krötzsch et al., 2018</a>]; and <em>reasoning and querying over contextual data</em>, in order to derive conclusions and results valid in a particular setting&nbsp;[<a href="#ref-SerafiniH12">Serafini and Homola, 2012</a>, <a href="#ref-zimm-etal-2012-JWS">Zimmermann et al., 2012</a>, <a href="#ref-SchuetzBNSS20">Schuetz et al., 2021</a>]. In the intersection of data graphs and inductive knowledge, we highlight topics such as <em>similarity-based query relaxation</em>, which finds approximate answers to exact queries based on numerical representations (e.g., embeddings)&nbsp;[<a href="#ref-WangWLCZQ18">Wang et al., 2018</a>]; <em>shape induction</em>, in order to learn and formalise inherent patterns in the knowledge graph as constraints&nbsp;[<a href="#ref-Mihindukulasooriya18">Mihindukulasooriya et al., 2018</a>]; and <em>contextual knowledge graph embeddings</em> that provide numeric representations of nodes and edges that vary with time, place, etc.&nbsp;[<a href="#ref-KazemiGJKSFP19">Kazemi et al., 2019</a>].
In the intersection of deductive and inductive knowledge, we mention the topics of <em>entailment-aware knowledge graph embeddings</em>&nbsp;[<a href="#ref-GuoWWWG16">Guo et al., 2016</a>, <a href="#ref-DemeesterRR16">Demeester et al., 2016</a>], which incorporate rules and/or ontologies when computing plausibility; <em>expressive graph neural networks</em> proven capable of complex classification analogous to expressive ontology languages&nbsp;[<a href="#ref-BarceloKMPRS20">Barceló et al., 2020</a>]; as well as further advances on <em>rule and axiom mining</em>, which extracts symbolic, deductive representations from knowledge graphs&nbsp;[<a href="#ref-GalarragaTHS15">Galárraga et al., 2015</a>, <a href="#ref-BuhmannLW16">Bühmann et al., 2016</a>]. Further challenges arise when considering the creation, enrichment, refinement and publication of knowledge graphs, which call for further work on topics such as <em>automated quality assessment (and repair)</em>, <em>distantly-supervised extraction frameworks</em>, <em>efficient access protocols</em>, and <em>anonymisation</em>, to name but a few.</p>
		<p>Aside from specific topics, more general challenges for knowledge graphs include <em>scalability</em>, particularly for deductive and inductive reasoning; <em>quality</em>, not only in terms of data, but also the models induced from knowledge graphs; <em>diversity</em>, such as managing contextual or multi-modal data; <em>dynamicity</em>, considering temporal or streaming data; and finally <em>usability</em>, which is key to increasing adoption. Though techniques are continuously being proposed to address these challenges, the challenges are unlikely ever to be completely “solved”; rather they serve as dimensions along which knowledge graphs, and their techniques, tools, etc., will continue to mature.</p>
		<p>Given the availability of open knowledge graphs whose quality continues to improve, as well as the growing adoption of enterprise knowledge graphs in various industries, future research on knowledge graphs has the potential to foster key advancements in broad aspects of society. Here we have highlighted just some examples of future research directions of importance to this pursuit.</p>
	</section>

	<section id="sec-references" class="prechapter">
		<h2 id="bibliography">Bibliography</h2>
		<ul class="reflist">
			<li id="ref-Abiteboul97">Serge Abiteboul. 1997. <a href="https://doi.org/10.1007/3-540-62222-5_33">Querying Semi-Structured Data</a>. In <em>Database Theory - ICDT '97, 6th International Conference, Delphi, Greece, January 8-10, 1997, Proceedings</em>, Foto N. Afrati and Phokion G. Kolaitis (Eds.). Lecture Notes in Computer Science, vol.&nbsp;1186. Springer, 1–18.</li>
			<li id="ref-agarwal2018legislative">Sushant Agarwal, Simon Steyskal, Franjo Antunovic, and Sabrina Kirrane. 2018. Legislative Compliance Assessment: Framework, Model and GDPR Instantiation. In <em>Privacy Technologies and Policy - 6th Annual Privacy Forum, APF 2018, Barcelona, Spain, June 13-14, 2018, Revised Selected Papers</em>, Manel Medina, Andreas Mitrakas, Kai Rannenberg, Erich Schweighofer, and Nikolaos Tsouroulas (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11079. Springer, 131–149.</li>
			<li id="ref-Agrawal93">Rakesh Agrawal, Tomasz Imieli&nacute;ski, and Arun Swami. 1993. <a href="https://doi.org/10.1145/170035.170072">Mining association rules between sets of items in large databases</a>. In <em>Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, DC, USA, May 26-28, 1993</em>, Peter Buneman and Sushil Jajodia (Eds.). ACM Press, 207–216.</li>
			<li id="ref-AhnIEZK14">Jinhyun Ahn, Dong-Hyuk Im, Jae-Hong Eom, Nansu Zong, and Hong-Gee Kim. 2015. G-Diff: A Grouping Algorithm for RDF Change Detection on MapReduce. In <em>Semantic Technology - 4th Joint International Conference, JIST 2014, Chiang Mai, Thailand, November 9-11, 2014. Revised Selected Papers</em>, Thepchai Supnithi, Takahira Yamaguchi, Jeff Z. Pan, Vilas Wuwongse, and Marut Buranarach (Eds.). Lecture Notes in Computer Science, vol.&nbsp;8943. Springer, 230–235.</li>
			<li id="ref-AkhterNS18">Adnan Akhter, Axel-Cyrille Ngonga Ngomo, and Muhammad Saleem. 2018. <a href="https://doi.org/10.1007/978-3-030-03667-6_1">An Empirical Evaluation of RDF Graph Partitioning Techniques</a>. In <em>Knowledge Engineering and Knowledge Management - 21st International Conference, EKAW 2018, Nancy, France, November 12-16, 2018, Proceedings</em>, Catherine Faron-Zucker, Chiara Ghidini, Amedeo Napoli, and Yannick Toussaint (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11313. Springer, 3–18.</li>
			<li id="ref-AlexanderCHZ09">Keith Alexander, Richard Cyganiak, Michael Hausenblas, and Jun Zhao. 2009. <a href="http://ceur-ws.org/Vol-538/ldow2009_paper20.pdf">Describing Linked Datasets</a>. In <em>Proceedings of the WWW2009 Workshop on Linked Data on the Web, LDOW 2009, Madrid, Spain, April 20, 2009</em>, Christian Bizer, Tom Heath, Tim Berners-Lee, and Michael Hausenblas (Eds.). CEUR Workshop Proceedings, vol.&nbsp;538. Sun SITE Central Europe (CEUR). 10 pages.</li>
			<li id="ref-AnglesG08">Renzo Angles and Claudio Gutierrez. 2008. <a href="https://doi.org/10.1145/1322432.1322433">Survey of graph database models</a>. <em>ACM Computing Surveys</em> 40(1), 1:1–1:39.</li>
			<li id="ref-AnglesABHRV17">Renzo Angles, Marcelo Arenas, Pablo Barceló, Aidan Hogan, Juan L. Reutter, and Domagoj Vrgoč. 2017. <a href="https://doi.org/10.1145/3104031">Foundations of Modern Query Languages for Graph Databases</a>. <em>ACM Computing Surveys</em> 50(5), 68:1–68:40.</li>
			<li id="ref-AnglesABBFGLPPS18">Renzo Angles, Marcelo Arenas, Pablo Barceló, Peter A. Boncz, George H. L. Fletcher, Claudio Gutierrez, Tobias Lindaaker, Marcus Paradies, Stefan Plantikow, Juan F. Sequeda, Oskar van Rest, and Hannes Voigt. 2018. <a href="https://doi.org/10.1145/3183713.3190654">G-CORE: A Core for Future Graph Query Languages</a>. In <em>Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10-15, 2018</em>, Gautam Das, Christopher M. Jermaine, and Philip A. Bernstein (Eds.). ACM Press, 1421–1432.</li>
			<li id="ref-AnglesTT19">Renzo Angles, Harsh Thakkar, and Dominik Tomaszuk. 2019. <a href="http://ceur-ws.org/Vol-2369/paper01.pdf">RDF and Property Graphs Interoperability: Status and Issues</a>. In <em>Proceedings of the 13th Alberto Mendelzon International Workshop on Foundations of Data Management, Asunción, Paraguay, June 3-7, 2019</em>, Aidan Hogan and Tova Milo (Eds.). CEUR Workshop Proceedings, vol.&nbsp;2369. Sun SITE Central Europe (CEUR). 11 pages.</li>
			<li id="ref-Angles18">Renzo Angles. 2018. <a href="http://ceur-ws.org/Vol-2100/paper26.pdf">The Property Graph Database Model</a>. In <em>Proceedings of the 12th Alberto Mendelzon International Workshop on Foundations of Data Management, Cali, Colombia, May 21–25, 2018</em>, Dan Olteanu and Barbara Poblete (Eds.). CEUR Workshop Proceedings, vol.&nbsp;2100. Sun SITE Central Europe (CEUR). 10 pages.</li>
			<li id="ref-Aranguren2008">Mikel Egaña Aranguren, Erick Antezana, Martin Kuiper, and Robert Stevens. 2008. <a href="https://doi.org/10.1186/1471-2105-9-S5-S1">Ontology Design Patterns for bio-ontologies: a case study on the Cell Cycle Ontology</a>. <em>BMC Bioinformatics</em> 9(5), pS1.</li>
			<li id="ref-dm">Marcelo Arenas, Alexandre Bertails, Eric Prud'hommeaux, and Juan Sequeda. 2012. <em><a href="https://www.w3.org/TR/2012/REC-rdb-direct-mapping-20120927/">A Direct Mapping of Relational Data to RDF, W3C Recommendation 27 September 2012</a></em>. W3C Recommendation. World Wide Web Consortium. September 27, 2012.</li>
			<li id="ref-ArenasGKMZ16">Marcelo Arenas, Bernardo Cuenca Grau, Evgeny Kharlamov, Sarunas Marciuska, and Dmitriy Zheleznyakov. 2016. <a href="https://doi.org/10.1016/j.websem.2015.12.002">Faceted search over RDF-based knowledge graphs</a>. <em>Journal of Web Semantics</em> 37-38, 55–74.</li>
			<li id="ref-ArtaleCKZ09">Alessandro Artale, Diego Calvanese, Roman Kontchakov, and Michael Zakharyaschev. 2009. <a href="https://doi.org/10.1613/jair.2820">The DL-Lite Family and Relations</a>. <em>Journal of Artificial Intelligence Research</em> 36, 1–69.</li>
			<li id="ref-auer2007dbpedia">Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A Nucleus for a Web of Open Data. In <em>The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, November 11-15, 2007</em>, Karl Aberer, Key-Sun Choi, Natasha Fridman Noy, Dean Allemang, Kyung-Il Lee, Lyndon J. B. Nixon, Jennifer Golbeck, Peter Mika, Diana Maynard, Riichiro Mizoguchi, Guus Schreiber, and Philippe Cudré-Mauroux (Eds.). Lecture Notes in Computer Science, vol.&nbsp;4825. Springer, 722–735.</li>
			<li id="ref-BaaderHLS17">Franz Baader, Ian Horrocks, Carsten Lutz, and Ulrike Sattler. 2017. <em><a href="http://www.cambridge.org/de/academic/subjects/computer-science/knowledge-management-databases-and-data-mining/introduction-description-logic?format=PB17zVGeWD2TZUeu6s.97">An Introduction to Description Logic</a></em>. Cambridge University Press.</li>
			<li id="ref-BachB07">Nguyen Bach and Sameer Badaskar. 2007. <em>A Review of Relation Extraction</em>. Carnegie Mellon University.</li>
			<li id="ref-Baeza-Yates18">Ricardo Baeza-Yates. 2018. <a href="https://doi.org/10.1145/3209581">Bias on the Web</a>. <em>Communications of the ACM</em> 61(6), 54–61.</li>
			<li id="ref-framenet">Collin F. Baker, Charles J. Fillmore, and John B. Lowe. 1998. The Berkeley FrameNet Project. In <em>36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, COLING-ACL'98, August 10-14, 1998, Université de Montréal, Montréal, Quebec, Canada. Proceedings of the Conference</em>, Christian Boitet and Pete Whitelock (Eds.). Morgan Kaufmann, 86–90.</li>
			<li id="ref-Bakker">René Ronald Bakker. 1987. <em>Knowledge Graphs: Representation and Structuring of Scientific Knowledge</em>. Ph.D. dissertation. University of Twente.</li>
			<li id="ref-BalazevicAH19">Ivana Balazevic, Carl Allen, and Timothy M. Hospedales. 2019. <a href="https://proceedings.neurips.cc/paper/2019/hash/f8b932c70d0b2e6bf071729a4fa68dfc-Abstract.html">Multi-relational Poincaré Graph Embeddings</a>. In <em>Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8–14 December 2019, Vancouver, BC, Canada</em>, Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alché-Buc, Emily B. Fox, and Roman Garnett (Eds.), 4465–4475.</li>
			<li id="ref-BalazevicAH19a">Ivana Balazevic, Carl Allen, and Timothy M. Hospedales. 2019. <a href="https://doi.org/10.18653/v1/D19-1522">TuckER: Tensor Factorization for Knowledge Graph Completion</a>. In <em>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019</em>, Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan (Eds.). The Association for Computational Linguistics, 5184–5193.</li>
			<li id="ref-BalazevicAH19b">Ivana Balazevic, Carl Allen, and Timothy M. Hospedales. 2019. <a href="https://doi.org/10.1007/978-3-030-30493-5_52">Hypernetwork Knowledge Graph Embeddings</a>. In <em>Artificial Neural Networks and Machine Learning - ICANN 2019 - 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17-19, 2019, Proceedings - Workshop and Special Sessions</em>, Igor V. Tetko, Vera Kurková, Pavel Karpov, and Fabian J. Theis (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11731. Springer, 553–565.</li>
			<li id="ref-BankoCSBE07">Michele Banko, Michael J. Cafarella, Stephen Soderland, Matthew Broadhead, and Oren Etzioni. 2007. Open Information Extraction from the Web. In <em>IJCAI 2007, Proceedings of the 20th International Joint Conference on Artificial Intelligence, Hyderabad, India, January 6-12, 2007</em>, Manuela M. Veloso (Ed.). AAAI Press, 2670–2676.</li>
			<li id="ref-BarceloKMPRS20">Pablo Barceló, Egor V. Kostylev, Mikael Monet, Jorge Pérez, Juan Reutter, and Juan Pablo Silva. 2020. <a href="https://openreview.net/forum?id=r1lZ7AEKvB">The Logical Expressiveness of Graph Neural Networks</a>. In <em>8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26–30, 2020</em>. OpenReview.net. 20 pages.</li>
			<li id="ref-BastB13">Hannah Bast and Björn Buchhold. 2013. <a href="https://doi.org/10.1145/2505515.2505689">An index for efficient semantic full-text search</a>. In <em>22nd ACM International Conference on Information and Knowledge Management, CIKM'13, San Francisco, CA, USA, October 27 - November 1, 2013</em>, Qi He, Arun Iyengar, Wolfgang Nejdl, Jian Pei, and Rajeev Rastogi (Eds.). ACM Press, 369–378.</li>
			<li id="ref-BatiniS16">Carlo Batini and Monica Scannapieco. 2016. <em><a href="https://doi.org/10.1007/978-3-319-24106-7">Data and Information Quality - Dimensions, Principles and Techniques</a></em>. Data-Centric Systems and Applications. Springer.</li>
			<li id="ref-BatiniRSV15">Carlo Batini, Anisa Rula, Monica Scannapieco, and Gianluigi Viscusi. 2015. <a href="https://doi.org/10.4018/JDM.2015010103">From Data Quality to Big Data Quality</a>. <em>Journal of Database Management</em> 26(1), 60–82.</li>
			<li id="ref-BellomariniSG18">Luigi Bellomarini, Emanuel Sallinger, and Georg Gottlob. 2018. <a href="http://www.vldb.org/pvldb/vol11/p975-bellomarini.pdf">The Vadalog System: Datalog-based Reasoning for Knowledge Graphs</a>. <em>Proceedings of the VLDB Endowment</em> 11(9), 975–987.</li>
			<li id="ref-BellomariniFGS19">Luigi Bellomarini, Daniele Fakhoury, Georg Gottlob, and Emanuel Sallinger. 2019. <a href="https://doi.org/10.1109/ICDE.2019.00011">Knowledge Graphs and Enterprise AI: The Promise of an Enabling Technology</a>. In <em>35th IEEE International Conference on Data Engineering, ICDE 2019, Macao, China, April 8-11, 2019</em>. IEEE Computer Society, 26–37.</li>
			<li id="ref-AstraZenecaKG">Claus Bendtsen and Slavé Petrovski. 2019. How data and AI are helping unlock the secrets of disease. AstraZeneca Blog. November, 2019.</li>
			<li id="ref-Bergman19">Michael K. Bergman. 2019. <a href="http://www.mkbergman.com/2244/a-common-sense-view-of-knowledge-graphs/">A Common Sense View of Knowledge Graphs</a>. Adaptive Information, Adaptive Innovation, Adaptive Infrastructure Blog. July 1, 2019.</li>
			<li id="ref-n3">Tim Berners-Lee and Dan Connolly. 2011. <em><a href="https://www.w3.org/TeamSubmission/2011/SUBM-n3-20110328/">Notation3 (N3): A readable RDF syntax</a></em>. W3C Team Submission. World Wide Web Consortium. March 28, 2011.</li>
			<li id="ref-berners-lee01">Tim Berners-Lee, James Hendler, and Ora Lassila. 2001. The Semantic Web. <em>Scientific American</em> 284(5), 34–43.</li>
			<li id="ref-ldprinciples">Tim Berners-Lee. 2006. <a href="https://www.w3.org/DesignIssues/LinkedData.html">Linked Data</a>. W3C Design Issues. July, 2006.</li>
			<li id="ref-BhardwajBCDEMP15">Anant P. Bhardwaj, Souvik Bhattacherjee, Amit Chavan, Amol Deshpande, Aaron J. Elmore, Samuel Madden, and Aditya G. Parameswaran. 2015. <a href="http://cidrdb.org/cidr2015/Papers/CIDR15_Paper18.pdf">DataHub: Collaborative Data Science &amp; Dataset Version Management at Scale</a>. In <em>CIDR 2015, Seventh Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 4-7, 2015, Online Proceedings</em>. www.cidrdb.org. 7 pages.</li>
			<li id="ref-BikelSW99">Daniel M. Bikel, Richard M. Schwartz, and Ralph M. Weischedel. 1999. <a href="https://doi.org/10.1023/A:1007558221122">An Algorithm that Learns What's in a Name</a>. <em>Machine Learning</em> 34(1–3), 211–231.</li>
			<li id="ref-BishofDKLP12">Stefan Bischof, Stefan Decker, Thomas Krennwallner, Nuno Lopes, and Axel Polleres. 2012. <a href="https://doi.org/10.1007/s13740-012-0008-7">Mapping between RDF and XML with XSPARQL</a>. <em>Journal on Data Semantics</em> 1(3), 147–185.</li>
			<li id="ref-bizer2009dbpedia">Christian Bizer, Jens Lehmann, Georgi Kobilarov, Sören Auer, Christian Becker, Richard Cyganiak, and Sebastian Hellmann. 2009. <a href="https://doi.org/10.1016/j.websem.2009.07.002">DBpedia - A crystallization point for the Web of Data</a>. <em>Journal of Web Semantics</em> 7(3), 154–165.</li>
			<li id="ref-blomqvist2005patterns">Eva Blomqvist and Kurt Sandkuhl. 2005. Patterns in Ontology Engineering: Classification of Ontology Patterns. In <em>ICEIS 2005, Proceedings of the Seventh International Conference on Enterprise Information Systems, Miami, USA, May 25-28, 2005</em>, Chin-Sheng Chen, Joaquim Filipe, Isabel Seruca, and José Cordeiro (Eds.), vol.&nbsp;3, 413–416.</li>
			<li id="ref-blomqvist2012ontology">Eva Blomqvist, Azam Seil Sepour, and Valentina Presutti. 2012. Ontology Testing - Methodology and Tool. In <em>Knowledge Engineering and Knowledge Management - 18th International Conference, EKAW 2012, Galway City, Ireland, October 8-12, 2012. Proceedings</em>, Annette ten Teije, Johanna Völker, Siegfried Handschuh, Heiner Stuckenschmidt, Mathieu d'Aquin, Andriy Nikolov, Nathalie Aussenac-Gilles, and Nathalie Hernandez (Eds.). Lecture Notes in Computer Science, vol.&nbsp;7603. Springer, 216–226.</li>
			<li id="ref-Blomqvist2016">Eva Blomqvist, Karl Hammar, and Valentina Presutti. 2016. Engineering Ontologies with Patterns – The eXtreme Design Methodology. In <em>Ontology Engineering with Ontology Design Patterns</em>, Pascal Hitzler, Aldo Gangemi, Krzysztof Janowicz, Adila Krisnadhi, and Valentina Presutti (Eds.). Studies on the Semantic Web, vol.&nbsp;25. IOS Press.</li>
			<li id="ref-bollacker2007platform">Kurt Bollacker, Patrick Tufts, Tomi Pierce, and Robert Cook. 2007. A platform for scalable, collaborative, structured information integration. In <em>Intl. Workshop on Information Integration on the Web (IIWeb’07)</em>, Ullas Nambiar and Zaiqing Nie (Eds.). 6 pages.</li>
			<li id="ref-bollacker2007freebase">Kurt Bollacker, Robert Cook, and Patrick Tufts. 2007. <a href="http://www.aaai.org/Library/AAAI/2007/aaai07-355.php">Freebase: A Shared Database of Structured General Human Knowledge</a>. In <em>Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence, July 22-26, 2007, Vancouver, British Columbia, Canada</em>. AAAI Press, 1962–1963.</li>
			<li id="ref-bonatti2019big">Piero A. Bonatti and Sabrina Kirrane. 2019. Big Data and Analytics in the Age of the GDPR. In <em>2019 IEEE International Congress on Big Data, BigData Congress 2019, Milan, Italy, July 8-13, 2019</em>, Elisa Bertino, Carl K. Chang, Peter Chen, Ernesto Damiani, Michael Goul, and Katsunori Oyama (Eds.). IEEE Computer Society, 7–16.</li>
			<li id="ref-BonattiHPS11">Piero A. Bonatti, Aidan Hogan, Axel Polleres, and Luigi Sauro. 2011. <a href="https://doi.org/10.1016/j.websem.2011.06.003">Robust and scalable Linked Data reasoning incorporating provenance and trust annotations</a>. <em>Journal of Web Semantics</em> 9(2), 165–201.</li>
			<li id="ref-BonattiDPP18">Piero Andrea Bonatti, Stefan Decker, Axel Polleres, and Valentina Presutti. 2018. <a href="https://doi.org/10.4230/DagRep.8.9.29">Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web (Dagstuhl Seminar 18371)</a>. <em>Dagstuhl Reports</em> 8(9), 29–111.</li>
			<li id="ref-special">Piero Bonatti, Sabrina Kirrane, Iliana Mineva Petrova, Luigi Sauro, and Eva Schlehahn. 2019. <em><a href="https://ai.wu.ac.at/policies/policylanguage/">The SPECIAL Usage Policy Language, V1.0</a></em>. Draft. Vienna University of Economics and Business. December 31, 2019.</li>
			<li id="ref-Boneva2017">Iovka Boneva, Jose Emilio Labra Gayo, and Eric G. Prud'hommeaux. 2017. <a href="https://doi.org/10.1007/978-3-319-68288-4_7">Semantics and Validation of Shapes Schemas for RDF</a>. In <em>The Semantic Web - ISWC 2017 - 16th International Semantic Web Conference, Vienna, Austria, October 21-25, 2017, Proceedings, Part I</em>, Claudia d'Amato, Miriam Fernández, Valentina A. M. Tamma, Freddy Lécué, Philippe Cudré-Mauroux, Juan F. Sequeda, Christoph Lange, and Jeff Heflin (Eds.). Lecture Notes in Computer Science, vol.&nbsp;10587. Springer, 104–120.</li>
			<li id="ref-BonevaDFG19">Iovka Boneva, Jérémie Dusart, Daniel Fernández-Álvarez, and José Emilio Labra Gayo. 2019. <a href="http://ceur-ws.org/Vol-2456">Shape Designer for ShEx and SHACL constraints</a>. In <em>Proceedings of the ISWC 2019 Satellite Tracks (Posters &amp; Demonstrations, Industry, and Outrageous Ideas) co-located with 18th International Semantic Web Conference (ISWC 2019), Auckland, New Zealand, October 26-30, 2019</em>, Mari Carmen Suárez-Figueroa, Gong Cheng, Anna Lisa Gentile, Christophe Guéret, C. Maria Keet, and Abraham Bernstein (Eds.). CEUR Workshop Proceedings, vol.&nbsp;2456. Sun SITE Central Europe (CEUR), 269–272.</li>
			<li id="ref-BonifatiMT17">Angela Bonifati, Wim Martens, and Thomas Timm. 2017. <a href="http://www.vldb.org/pvldb/vol11/p149-bonifati.pdf">An Analytical Study of Large SPARQL Query Logs</a>. <em>Proceedings of the VLDB Endowment</em> 11(2), 149–161.</li>
			<li id="ref-bordes2013translating">Antoine Bordes, Nicolas Usunier, Alberto García-Durán, Jason Weston, and Oksana Yakhnenko. 2013. <a href="https://proceedings.neurips.cc/paper/2013/hash/1cecc7a77928ca8133fa24680a88d2f9-Abstract.html">Translating Embeddings for Modeling Multi-relational Data</a>. In <em>Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States</em>, Christopher J. C. Burges, Léon Bottou, Zoubin Ghahramani, and Kilian Q. Weinberger (Eds.), 2787–2795.</li>
			<li id="ref-bouma2009normalized">Gerlof Bouma. 2009. Normalized (Pointwise) Mutual Information in Collocation Extraction. In <em>Von der Form zur Bedeutung: Texte automatisch verarbeiten - From Form to Meaning: Processing Texts Automatically, Proceedings of the Biennial GSCL Conference 2009</em>, Christian Chiarcos, Richard Eckart de Castilho, and Manfred Stede (Eds.). Gunter Narr Verlag, 31–40.</li>
			<li id="ref-BrachmanL85">Ronald J. Brachman and Hector J. Levesque. 1986. The Knowledge Level of a KBMS. In <em>On Knowledge Base Management Systems: Integrating Artificial Intelligence and Database Technologies, Book resulting from the Islamorada Workshop 1985 (Islamorada, FL, USA)</em>, Michael L. Brodie and John Mylopoulos (Eds.). Topics in Information Systems. Springer, 9–12.</li>
			<li id="ref-BrachmanS85">Ronald J. Brachman and James G. Schmolze. 1985. An Overview of the KL-ONE Knowledge Representation System. <em>Cognitive Science</em> 9(2), 171–216.</li>
			<li id="ref-Brachman">Ronald J. Brachman. 1977. <em>A structural paradigm for representing knowledge</em>. Ph.D. dissertation. Harvard University.</li>
			<li id="ref-CapitalOneKG">Patricia Branum and Bethany Sehon. 2019. Knowledge Graph Pilot Improves Data Quality While Providing a Customer 360 View. Invited talk at the Knowledge Graph Conference.</li>
			<li id="ref-RDFS">Dan Brickley and R. V. Guha. 2014. <em><a href="https://www.w3.org/TR/2014/REC-rdf-schema-20140225/">RDF Schema 1.1</a></em>. W3C Recommendation. World Wide Web Consortium. February 25, 2014.</li>
			<li id="ref-BrunaZSL13">Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. 2014. <a href="http://arxiv.org/abs/1312.6203">Spectral Networks and Locally Connected Networks on Graphs</a>. In <em>2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings</em>, Yoshua Bengio and Yann LeCun (Eds.). OpenReview.net.</li>
			<li id="ref-BuchananF78">Bruce G. Buchanan and Edward A. Feigenbaum. 1978. Dendral and Meta-Dendral: Their Applications Dimension. <em>Artificial Intelligence</em> 11(1–2), 5–24.</li>
			<li id="ref-BuhmannLW16">Lorenz Bühmann, Jens Lehmann, and Patrick Westphal. 2016. <a href="https://doi.org/10.1016/j.websem.2016.06.001">DL-Learner – A framework for inductive learning on the Semantic Web</a>. <em>Journal of Web Semantics</em> 39, 15–24.</li>
			<li id="ref-ArandaACP13">Carlos Buil-Aranda, Marcelo Arenas, Óscar Corcho, and Axel Polleres. 2013. <a href="https://doi.org/10.1016/j.websem.2012.10.001">Federating queries in SPARQL 1.1: Syntax, semantics and evaluation</a>. <em>Journal of Web Semantics</em> 18(1), 1–17.</li>
			<li id="ref-ArandaHUV13">Carlos Buil-Aranda, Aidan Hogan, Jürgen Umbrich, and Pierre-Yves Vandenbussche. 2013. SPARQL Web-Querying Infrastructure: Ready for Action? In <em>The Semantic Web - ISWC 2013 - 12th International Semantic Web Conference, Sydney, NSW, Australia, October 21-25, 2013, Proceedings, Part II</em>, Harith Alani, Lalana Kagal, Achille Fokoue, Paul Groth, Chris Biemann, Josiane Xavier Parreira, Lora Aroyo, Natasha Noy, Chris Welty, and Krzysztof Janowicz (Eds.). Lecture Notes in Computer Science, vol.&nbsp;8219. Springer, 277–293.</li>
			<li id="ref-buitelaar2005ontology">Paul Buitelaar, Philipp Cimiano, and Bernardo Magnini (Eds.). 2005. <em>Ontology learning from text: methods, evaluation and applications</em>. Frontiers in Artificial Intelligence and Applications, vol.&nbsp;123. IOS Press.</li>
			<li id="ref-BunescuM05">Razvan C. Bunescu and Raymond J. Mooney. 2005. <a href="http://papers.nips.cc/book/advances-in-neural-information-processing-systems-18-2005">Subsequence Kernels for Relation Extraction</a>. In <em>Advances in Neural Information Processing Systems 18 [Neural Information Processing Systems, NIPS 2005, December 5-8, 2005, Vancouver, British Columbia, Canada]</em>, Christopher J. C. Burges, Léon Bottou, Zoubin Ghahramani, and Kilian Q. Weinberger (Eds.), 171–178.</li>
			<li id="ref-BunescuM07">Razvan C. Bunescu and Raymond J. Mooney. 2007. Learning to Extract Relations from the Web using Minimal Supervision. In <em>ACL 2007, Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, June 23-30, 2007, Prague, Czech Republic</em>, John A. Carroll, Antal van den Bosch, and Annie Zaenen (Eds.). The Association for Computational Linguistics, 576–583.</li>
			<li id="ref-CabrioAV14">Elena Cabrio, Alessio Palmero Aprosio, and Serena Villata. 2014. <a href="https://doi.org/10.1007/978-3-319-07443-6_18">These Are Your Rights - A Natural Language Processing Approach to Automated RDF Licenses Generation</a>. In <em>The Semantic Web: Trends and Challenges - 11th International Conference, ESWC 2014, Anissaras, Crete, Greece, May 25-29, 2014. Proceedings</em>, Valentina Presutti, Claudia d'Amato, Fabien Gandon, Mathieu d'Aquin, Steffen Staab, and Anna Tordai (Eds.). Lecture Notes in Computer Science, vol.&nbsp;8465. Springer, 255–269.</li>
			<li id="ref-CafarellaHWWZ08">Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wang, Eugene Wu, and Yang Zhang. 2008. <a href="http://www.vldb.org/pvldb/vol1/1453916.pdf">WebTables: exploring the power of tables on the web</a>. <em>Proceedings of the VLDB Endowment</em> 1(1), 538–549.</li>
			<li id="ref-CaiZC18">HongYun Cai, Vincent W. Zheng, and Kevin Chen-Chuan Chang. 2018. <a href="https://doi.org/10.1109/TKDE.2018.2807452">A Comprehensive Survey of Graph Embedding: Problems, Techniques, and Applications</a>. <em>IEEE Transactions on Knowledge and Data Engineering</em> 30(9), 1616–1637.</li>
			<li id="ref-CallahanCAD13">Alison Callahan, Jose Cruz-Toledo, Peter Ansell, and Michel Dumontier. 2013. <a href="https://doi.org/10.1007/978-3-642-38288-8_14">Bio2RDF Release 2: Improved Coverage, Interoperability and Provenance of Life Science Linked Data</a>. In <em>The Semantic Web: Semantics and Big Data, 10th International Conference, ESWC 2013, Montpellier, France, May 26-30, 2013. Proceedings</em>, Philipp Cimiano, Óscar Corcho, Valentina Presutti, Laura Hollink, and Sebastian Rudolph (Eds.). Lecture Notes in Computer Science, vol.&nbsp;7882. Springer, 200–212.</li>
			<li id="ref-Car2019">Nicholas J. Car, Paul J. Box, and Ashley Sommer. 2019. <a href="https://doi.org/10.1007/978-3-030-21348-0_35">The Location Index: A Semantic Web Spatial Data Infrastructure</a>. In <em>The Semantic Web - 16th International Conference, ESWC 2019, Portorož, Slovenia, June 2-6, 2019, Proceedings</em>, Pascal Hitzler, Miriam Fernández, Krzysztof Janowicz, Amrapali Zaveri, Alasdair J. G. Gray, Vanessa López, Armin Haller, and Karl Hammar (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11503. Springer, 543–557.</li>
			<li id="ref-CebiricGKKMTZ19">Šejla Čebiri&cacute;, François Goasdoué, Haridimos Kondylakis, Dimitris Kotzinos, Ioana Manolescu, Georgia Troullinou, and Mussab Zneika. 2019. <a href="https://doi.org/10.1007/s00778-018-0528-3">Summarizing semantic graphs: a survey</a>. <em>The Very Large Data Base Journal</em> 28(3), 295–327.</li>
			<li id="ref-CeriGT89">Stefano Ceri, Georg Gottlob, and Letizia Tanca. 1989. <a href="https://doi.org/10.1109/69.43410">What you Always Wanted to Know About Datalog (And Never Dared to Ask)</a>. <em>IEEE Transactions on Knowledge and Data Engineering</em> 1(1), 146–166.</li>
			<li id="ref-AirBnBKG">Spencer Chang. 2018. <a href="https://medium.com/airbnb-engineering/scaling-knowledge-access-and-retrieval-at-airbnb-665b6ba21e95">Scaling Knowledge Access and Retrieval at Airbnb</a>. AirBnB Medium Blog. September 4, 2018.</li>
			<li id="ref-ChiticariuLR13">Laura Chiticariu, Yunyao Li, and Frederick R. Reiss. 2013. <a href="https://aclanthology.org/D13-1079/">Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems!</a> In <em>Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, Grand Hyatt Seattle, Seattle, Washington, USA, A meeting of SIGDAT, a Special Interest Group of the ACL</em>. The Association for Computational Linguistics, 827–832.</li>
			<li id="ref-ChiticariuDLRZ18">Laura Chiticariu, Marina Danilevsky, Yunyao Li, Frederick Reiss, and Huaiyu Zhu. 2018. SystemT: Declarative Text Understanding for Enterprise. In <em>Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 3 (Industry Papers)</em>, Srinivas Bangalore, Jennifer Chu-Carroll, and Yunyao Li (Eds.). The Association for Computational Linguistics, 76–83.</li>
			<li id="ref-ciampaglia2015computational">Giovanni Luca Ciampaglia, Prashant Shiralkar, Luis M Rocha, Johan Bollen, Filippo Menczer, and Alessandro Flammini. 2015. Computational fact checking from knowledge networks. <em>PLOS One</em> 10(6), e0128193.</li>
			<li id="ref-cimiano2006ontology">Philipp Cimiano. 2006. Ontology Learning from Text. In <em>Ontology Learning and Population from Text: Algorithms, Evaluation and Applications</em>. Springer, 19–34.</li>
			<li id="ref-ClemmerD11">Aaron Clemmer and Stephen Davies. 2011. <a href="https://doi.org/10.1007/978-3-642-23088-2_21">Smeagol: A "Specific-to-General" Semantic Web Query Interface Paradigm for Novices</a>. In <em>Database and Expert Systems Applications - 22nd International Conference, DEXA 2011, Toulouse, France, August 29 - September 2, 2011. Proceedings, Part I</em>, Abdelkader Hameurlain, Stephen W. Liddle, Klaus-Dieter Schewe, and Xiaofang Zhou (Eds.). Springer, 288–302.</li>
			<li id="ref-cochez2017biased">Michael Cochez, Petar Ristoski, Simone Paolo Ponzetto, and Heiko Paulheim. 2017. <a href="https://doi.org/10.1145/3102254.3102279">Biased Graph Walks for RDF Graph Embeddings</a>. In <em>Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics, WIMS 2017, Amantea, Italy, June 19-22, 2017</em>, Rajendra Akerkar, Alfredo Cuzzocrea, Jiannong Cao, and Mohand-Said Hacid (Eds.). ACM Press, 21:1–21:12.</li>
			<li id="ref-cochez2017global">Michael Cochez, Petar Ristoski, Simone Paolo Ponzetto, and Heiko Paulheim. 2017. <a href="https://doi.org/10.1007/978-3-319-68288-4_12">Global RDF Vector Space Embeddings</a>. In <em>The Semantic Web - ISWC 2017 - 16th International Semantic Web Conference, Vienna, Austria, October 21-25, 2017, Proceedings, Part I</em>, Claudia d'Amato, Miriam Fernández, Valentina A. M. Tamma, Freddy Lécué, Philippe Cudré-Mauroux, Juan F. Sequeda, Christoph Lange, and Jeff Heflin (Eds.). Lecture Notes in Computer Science, vol.&nbsp;10587. Springer, 190–207.</li>
			<li id="ref-CollaranaG0GVA16">Diego Collarana, Mikhail Galkin, Christoph Lange, Irlán Grangel-González, Maria-Esther Vidal, and Sören Auer. 2016. FuhSen: A Federated Hybrid Search Engine for Building a Knowledge Graph On-Demand (Short Paper). In <em>On the Move to Meaningful Internet Systems: OTM 2016 Conferences - Confederated International Conferences: CoopIS, C&amp;TC, and ODBASE 2016, Rhodes, Greece, October 24-28, 2016, Proceedings</em>, Christophe Debruyne, Hervé Panetto, Robert Meersman, Tharam S. Dillon, eva Kühn, Declan O'Sullivan, and Claudio Agostino Ardagna (Eds.). Lecture Notes in Computer Science, vol.&nbsp;10033. Springer, 752–761.</li>
			<li id="ref-CollinsS99">Michael Collins and Yoram Singer. 1999. <a href="https://www.aclweb.org/anthology/W99-0613/">Unsupervised Models for Named Entity Classification</a>. In <em>Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, EMNLP 1999, College Park, MD, USA, June 21-22, 1999</em>. The Association for Computational Linguistics. 11 pages.</li>
			<li id="ref-grddl">Dan Connolly. 2007. <em><a href="https://www.w3.org/TR/2007/REC-grddl-20070911/">Gleaning Resource Descriptions from Dialects of Languages (GRDDL), W3C Recommendation 11 September 2007</a></em>. W3C Recommendation. World Wide Web Consortium. September 11, 2007.</li>
			<li id="ref-ConsensM90">Mariano P. Consens and Alberto O. Mendelzon. 1990. <a href="https://doi.org/10.1145/298514.298591">GraphLog: a Visual Formalism for Real Life Recursion</a>. In <em>Proceedings of the Ninth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, April 2-4, 1990, Nashville, Tennessee, USA</em>, Daniel J. Rosenkrantz and Yehoshua Sagiv (Eds.). ACM Press, 404–416.</li>
			<li id="ref-uniprot2014">The UniProt Consortium. 2014. UniProt: a hub for protein information. <em>Nucleic Acids Research</em> 43(D1), D204–D212.</li>
			<li id="ref-CorbyF10">Olivier Corby and Catherine Faron-Zucker. 2010. The KGRAM Abstract Machine for Knowledge Graph Querying. In <em>2010 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2010, Toronto, Canada, August 31 - September 3, 2010, Main Conference Proceedings</em>, Jimmy Xiangji Huang, Irwin King, Vijay V. Raghavan, and Stefan Rueger (Eds.). IEEE Computer Society, 338–341.</li>
			<li id="ref-CorcoglionitiRA16">Francesco Corcoglioniti, Marco Rospocher, and Alessio Palmero Aprosio. 2016. <a href="https://doi.org/10.1109/TKDE.2016.2602206">Frame-Based Ontology Population with PIKES</a>. <em>IEEE Transactions on Knowledge and Data Engineering</em> 28(12), 3261–3275.</li>
			<li id="ref-Corman2018b">Julien Corman, Juan L. Reutter, and Ognjen Savkovi&cacute;. 2018. <a href="https://doi.org/10.1007/978-3-030-00671-6_19">Semantics and Validation of Recursive SHACL</a>. In <em>The Semantic Web - ISWC 2018 - 17th International Semantic Web Conference, Monterey, CA, USA, October 8-12, 2018, Proceedings, Part I</em>, Denny Vrandeči&cacute;, Kalina Bontcheva, Mari Carmen Suárez-Figueroa, Valentina Presutti, Irene Celino, Marta Sabou, Lucie-Aimée Kaffee, and Elena Simperl (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11136. Springer, 318–336.</li>
			<li id="ref-CormanFRS19a">Julien Corman, Fernando Florenzano, Juan L. Reutter, and Ognjen Savkovi&cacute;. 2019. <a href="https://doi.org/10.1007/978-3-030-30793-6_9">Validating SHACL Constraints over a SPARQL Endpoint</a>. In <em>The Semantic Web - ISWC 2019 - 18th International Semantic Web Conference, Auckland, New Zealand, October 26-30, 2019, Proceedings, Part I</em>, Chiara Ghidini, Olaf Hartig, Maria Maleshkova, Vojtech Svátek, Isabel F. Cruz, Aidan Hogan, Jie Song, Maxime Lefrançois, and Fabien Gandon (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11778. Springer, 145–163.</li>
			<li id="ref-Costabello2012">Luca Costabello, Serena Villata, Nicolas Delaforge, and Fabien Gandon. 2012. <a href="http://ceur-ws.org/Vol-937/ldow2012-paper-05.pdf">Linked Data Access Goes Mobile: Context-Aware Authorization for Graph Stores</a>. In <em>WWW2012 Workshop on Linked Data on the Web, Lyon, France, 16 April, 2012</em>, Christian Bizer, Tom Heath, Tim Berners-Lee, and Michael Hausenblas (Eds.). CEUR Workshop Proceedings, vol.&nbsp;937. Sun SITE Central Europe (CEUR). 8 pages.</li>
			<li id="ref-CourseyM09">Kino Coursey and Rada Mihalcea. 2009. Topic Identification Using Wikipedia Graph Centrality. In <em>Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, May 31 - June 5, 2009, Boulder, Colorado, USA, Short Papers</em>. The Association for Computational Linguistics, 117–120.</li>
			<li id="ref-timeOnt">Simon Cox, Chris Little, Jerry R. Hobbs, and Feng Pan. 2017. <em><a href="https://www.w3.org/TR/2017/REC-owl-time-20171019/">Time Ontology in OWL</a></em>. W3C Recommendation / OGC 16-071r2. World Wide Web Consortium and Open Geospatial Consortium. October 19, 2017.</li>
			<li id="ref-CrestanP11">Eric Crestan and Patrick Pantel. 2011. Web-scale table census and classification. In <em>Proceedings of the Fourth International Conference on Web Search and Web Data Mining, WSDM 2011, Hong Kong, China, February 9-12, 2011</em>, Irwin King, Wolfgang Nejdl, and Hang Li (Eds.). ACM Press, 545–554.</li>
			<li id="ref-rdf11">Richard Cyganiak, David Wood, and Markus Lanthaler. 2014. <em><a href="https://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/">RDF 1.1 Concepts and Abstract Syntax</a></em>. W3C Recommendation. World Wide Web Consortium. February 25, 2014.</li>
			<li id="ref-DagaPS08">Enrico Daga, Valentina Presutti, and Alberto Salvati. 2008. <a href="http://ceur-ws.org/Vol-426/swap2008_submission_63.pdf">http://ontologydesignpatterns.org and Evaluation WikiFlow</a>. In <em>Proceedings of the 5th Workshop on Semantic Web Applications and Perspectives (SWAP2008), Rome, Italy, December 15-17, 2008</em>, Aldo Gangemi, Johannes Keizer, Valentina Presutti, and Heiko Stoermer (Eds.). CEUR Workshop Proceedings, vol.&nbsp;426. Sun SITE Central Europe (CEUR), 1–11.</li>
			<li id="ref-MaanaKG">Jeff Dalgliesh. 2016. <a href="https://www.maana.io/blog/enterprise-knowledge-graph-connects-oil-gas-data-silos/">How the Enterprise Knowledge Graph Connects Oil and Gas Data Silos</a>. Maana Blog. May 20, 2016.</li>
			<li id="ref-dAmatoSTMG16">Claudia d'Amato, Steffen Staab, Andrea G. B. Tettamanzi, Duc Minh Tran, and Fabien L. Gandon. 2016. <a href="https://doi.org/10.1145/2851613.2851842">Ontology enrichment by discovering multi-relational association rules from ontological knowledge bases</a>. In <em>Proceedings of the 31st Annual ACM Symposium on Applied Computing, Pisa, Italy, April 4-8, 2016</em>, Sascha Ossowski (Ed.). ACM Press, 333–338.</li>
			<li id="ref-dAmatoTM16">Claudia d'Amato, Andrea G. B. Tettamanzi, and Minh Duc Tran. 2016. <a href="https://doi.org/10.1007/978-3-319-49004-5_8">Evolutionary Discovery of Multi-relational Association Rules from Ontological Knowledge Bases</a>. In <em>Knowledge Engineering and Knowledge Management - 20th International Conference, EKAW 2016, Bologna, Italy, November 19-23, 2016, Proceedings</em>, Eva Blomqvist, Paolo Ciancarini, Francesco Poggi, and Fabio Vitali (Eds.). Lecture Notes in Computer Science, vol.&nbsp;10024. Springer, 113–128.</li>
			<li id="ref-DarariNPR18">Fariz Darari, Werner Nutt, Giuseppe Pirrò, and Simon Razniewski. 2018. <a href="https://doi.org/10.1145/3196248">Completeness Management for RDF Data Sources</a>. <em>ACM Transactions on the Web</em> 12(3), 18:1–18:53.</li>
			<li id="ref-r2rml">Souripriya Das, Seema Sundara, and Richard Cyganiak. 2012. <em><a href="https://www.w3.org/TR/2012/REC-r2rml-20120927/">R2RML: RDB to RDF Mapping Language, W3C Recommendation 27 September 2012</a></em>. W3C Recommendation. World Wide Web Consortium. September 27, 2012.</li>
			<li id="ref-DaveJLXGZ16">Ankur Dave, Alekh Jindal, Li Erran Li, Reynold Xin, Joseph Gonzalez, and Matei Zaharia. 2016. <a href="https://doi.org/10.1145/2960414.2960416">GraphFrames: an integrated API for mixing graph and relational queries</a>. In <em>Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems, Redwood Shores, CA, USA, June 24, 2016</em>, Peter A. Boncz and Josep-Lluís Larriba-Pey (Eds.). ACM Press, 2:1–2:8.</li>
			<li id="ref-Lexvo">Gerard de Melo. 2015. <a href="https://doi.org/10.3233/SW-150171">Lexvo.org: Language-Related Information for the Linguistic Linked Data Cloud</a>. <em>Semantic Web Journal</em> 6(4), 393–400.</li>
			<li id="ref-de1990hybrid">Luc De Raedt, Bart Vandersmissen, Marc Denecker, and Maurice Bruynooghe. 1990. A hybrid approach to learning and its knowledge representation. In <em>Proceedings of the third COGNITIVA symposium on At the crossroads of artificial intelligence, cognitive science, and neuroscience</em>. Elsevier, 409–416.</li>
			<li id="ref-DeRaedt08">Luc De Raedt (Ed.). 2008. <em><a href="https://doi.org/10.1007/978-3-540-68856-3">Logical and Relational Learning</a></em>. Springer.</li>
			<li id="ref-devos2019ODRL">Marina De Vos, Sabrina Kirrane, Julian Padget, and Ken Satoh. 2019. ODRL policy modelling and compliance checking. In <em>Rules and Reasoning - Third International Joint Conference, RuleML+RR 2019, Bolzano, Italy, September 16-19, 2019, Proceedings</em>, Paul Fodor, Marco Montali, Diego Calvanese, and Dumitru Roman (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11784. Springer, 36–51.</li>
			<li id="ref-DelanauxBRT18">Remy Delanaux, Angela Bonifati, Marie-Christine Rousset, and Romuald Thion. 2018. Query-Based Linked Data Anonymization. In <em>The Semantic Web - ISWC 2018 - 17th International Semantic Web Conference, Monterey, CA, USA, October 8-12, 2018, Proceedings, Part I</em>, Denny Vrandeči&cacute;, Kalina Bontcheva, Mari Carmen Suárez-Figueroa, Valentina Presutti, Irene Celino, Marta Sabou, Lucie-Aimée Kaffee, and Elena Simperl (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11136. Springer, 530–546.</li>
			<li id="ref-DemeesterRR16">Thomas Demeester, Tim Rocktäschel, and Sebastian Riedel. 2016. <a href="https://doi.org/10.18653/v1/d16-1146">Lifted Rule Injection for Relation Embeddings</a>. In <em>Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016</em>, Jian Su, Xavier Carreras, and Kevin Duh (Eds.). The Association for Computational Linguistics, 1389–1399.</li>
			<li id="ref-DengJLLY13">Dong Deng, Yu Jiang, Guoliang Li, Jian Li, and Cong Yu. 2013. <a href="http://www.vldb.org/pvldb/vol6/p1606-li.pdf">Scalable Column Concept Determination for Web Tables Using Large Knowledge Bases</a>. <em>Proceedings of the VLDB Endowment</em> 6(13), 1606–1617.</li>
			<li id="ref-DettmersMS018">Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. 2018. <a href="https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17366">Convolutional 2D Knowledge Graph Embeddings</a>. In <em>Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018</em>, Sheila A. McIlraith and Kilian Q. Weinberger (Eds.). AAAI Press, 1811–1818.</li>
			<li id="ref-IBMKG">Deepika Devarajan. 2017. <a href="https://www.ibm.com/blogs/bluemix/2017/12/happy-birthday-watson-discovery/">Happy Birthday Watson Discovery</a>. IBM Cloud Blog. December 21, 2017.</li>
			<li id="ref-DiazAB16">Gonzalo I. Diaz, Marcelo Arenas, and Michael Benedikt. 2016. <a href="https://doi.org/10.14778/3007263.3007302">SPARQLByE: Querying RDF data by example</a>. <em>Proceedings of the VLDB Endowment</em> 9(13), 1533–1536.</li>
			<li id="ref-DiefenbachBSM20">Dennis Diefenbach, Andreas Both, Kamal Singh, and Pierre Maret. 2020. <a href="https://doi.org/10.3233/SW-190343">Towards a question answering system over the Semantic Web</a>. <em>Semantic Web Journal</em> 11(3), 421–439.</li>
			<li id="ref-DiengGTC92">Rose Dieng, Alain Giboin, Paul-André Tourtier, and Olivier Corby. 1992. Knowledge Acquisition for Explainable, Multi-Expert, Knowledge-Based Design Systems. In <em>Current Developments in Knowledge Acquisition - EKAW'92, 6th European Knowledge Acquisition Workshop, Heidelberg and Kaiserslautern, Germany, May 18-22, 1992</em>, Thomas Wetter, Klaus-Dieter Althoff, John H. Boose, Brian R. Gaines, and Marc Linster (Eds.). Lecture Notes in Computer Science, vol.&nbsp;599. Springer, 298–317.</li>
			<li id="ref-DimouSSSMKW14">Anastasia Dimou, Miel Vander Sande, Jason Slepicka, Pedro A. Szekely, Erik Mannens, Craig A. Knoblock, and Rik Van de Walle. 2014. <a href="https://doi.org/10.1109/ICSC.2014.25">Mapping Hierarchical Sources into RDF Using the RML Mapping Language</a>. In <em>2014 IEEE International Conference on Semantic Computing, Newport Beach, CA, USA, June 16-18, 2014</em>. IEEE Computer Society, 151–158.</li>
			<li id="ref-Dividino09">Renata Queiroz Dividino, Sergej Sizov, Steffen Staab, and Bernhard Schueler. 2009. <a href="https://doi.org/10.1016/j.websem.2009.07.004">Querying for provenance, trust, uncertainty and other meta knowledge in RDF</a>. <em>Journal of Web Semantics</em> 7(3), 204–219.</li>
			<li id="ref-DongGHHLMSSZ14">Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. 2014. <a href="https://doi.org/10.1145/2623330.2623623">Knowledge vault: a web-scale approach to probabilistic knowledge fusion</a>. In <em>The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '14, New York, NY, USA - August 24 - 27, 2014</em>, Sofus A. Macskassy, Claudia Perlich, Jure Leskovec, Wei Wang, and Rayid Ghani (Eds.). ACM Press, 601–610.</li>
			<li id="ref-dong2019building">Xin Luna Dong. 2019. <a href="https://doi.org/10.1109/ICDE.2019.00010">Building a Broad Knowledge Graph for Products</a>. In <em>35th IEEE International Conference on Data Engineering, ICDE 2019, Macao, China, April 8-11, 2019</em>. IEEE Computer Society, 25–25.</li>
			<li id="ref-DraisbachN11">Uwe Draisbach and Felix Naumann. 2011. A generalization of blocking and windowing algorithms for duplicate detection. In <em>2011 International Conference on Data and Knowledge Engineering, ICDKE 2011, Milano, Italy, September 6, 2011</em>, Ji Zhang and Giovanni Livraga (Eds.). IEEE Computer Society, 18–24.</li>
			<li id="ref-rfc3987">Martin Dürst and Michel Suignard. 2005. <em><a href="http://www.ietf.org/rfc/rfc3987.txt">Internationalized Resource Identifiers (IRIs)</a></em>. RFC. Internet Engineering Task Force. January, 2005.</li>
			<li id="ref-dutta2014semantifying">Arnab Dutta, Christian Meilicke, and Heiner Stuckenschmidt. 2014. Semantifying Triples from Open Information Extraction Systems. In <em>STAIRS 2014 - Proceedings of the 7th European Starting AI Researcher Symposium, Prague, Czech Republic, August 18-22, 2014</em>, Ulle Endriss and João Leite (Eds.). Frontiers in Artificial Intelligence and Applications, vol.&nbsp;264. IOS Press, 111–120.</li>
			<li id="ref-Dutta2015ESKwithOI">Arnab Dutta, Christian Meilicke, and Heiner Stuckenschmidt. 2015. Enriching Structured Knowledge with Open Information. In <em>Proceedings of the 24th International Conference on World Wide Web, WWW 2015, Florence, Italy, May 18-22, 2015</em>, Aldo Gangemi, Stefano Leonardi, and Alessandro Panconesi (Eds.). ACM Press, 267–277.</li>
			<li id="ref-Dwork:2006:DP:2097282.2097284">Cynthia Dwork. 2006. Differential Privacy. In <em>Automata, Languages and Programming, 33rd International Colloquium, ICALP 2006, Venice, Italy, July 10-14, 2006, Proceedings, Part II</em>, Michele Bugliesi, Bart Preneel, Vladimiro Sassone, and Ingo Wegener (Eds.). Lecture Notes in Computer Science, vol.&nbsp;4052. Springer, 1–12.</li>
			<li id="ref-markov">Eugene Dynkin. 1965. <em><a href="https://doi.org/10.1007/978-3-662-00031-1">Markov processes: Volume 1</a></em>. Springer.</li>
			<li id="ref-EberiusBHTAL15">Julian Eberius, Katrin Braunschweig, Markus Hentsch, Maik Thiele, Ahmad Ahmadov, and Wolfgang Lehner. 2015. Building the Dresden Web Table Corpus: A Classification Approach. In <em>2nd IEEE/ACM International Symposium on Big Data Computing, BDC 2015, Limassol, Cyprus, December 7-10, 2015</em>, Ioan Raicu, Omer F. Rana, and Rajkumar Buyya (Eds.). IEEE Computer Society, 41–50.</li>
			<li id="ref-Egana2008">Mikel Egaña, Alan Rector, Robert Stevens, and Erick Antezana. 2008. Applying Ontology Design Patterns in Bio-ontologies. In <em>Knowledge Engineering: Practice and Patterns, 16th International Conference, EKAW 2008, Acitrezza, Italy, September 29 - October 2, 2008. Proceedings</em>, Aldo Gangemi and Jérôme Euzenat (Eds.). Lecture Notes in Computer Science, vol.&nbsp;5268. Springer, 7–16.</li>
			<li id="ref-EhrlingerW16">Lisa Ehrlinger and Wolfram Wöß. 2016. <a href="http://ceur-ws.org/Vol-1695/paper4.pdf">Towards a Definition of Knowledge Graphs</a>. In <em>Joint Proceedings of the Posters and Demos Track of the 12th International Conference on Semantic Systems - SEMANTiCS2016 and the 1st International Workshop on Semantic Change &amp; Evolving Semantics (SuCCESS'16) co-located with the 12th International Conference on Semantic Systems (SEMANTiCS 2016), Leipzig, Germany, September 12-15, 2016</em>, Michael Martin, Martí Cuquet, and Erwin Folmer (Eds.). CEUR Workshop Proceedings, vol.&nbsp;1695. Sun SITE Central Europe (CEUR). 4 pages.</li>
			<li id="ref-ElbassuoniRSSW09">Shady Elbassuoni, Maya Ramanath, Ralf Schenkel, Marcin Sydow, and Gerhard Weikum. 2009. Language-model-based ranking for queries on RDF-graphs. In <em>Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM 2009, Hong Kong, China, November 2-6, 2009</em>, David Wai-Lok Cheung, Il-Yeol Song, Wesley W. Chu, Xiaohua Hu, and Jimmy J. Lin (Eds.). ACM Press, 977–986.</li>
			<li id="ref-ell2014sparql">Basil Ell, Andreas Harth, and Elena Simperl. 2014. SPARQL Query Verbalization for Explaining Semantic Search Engine Queries. In <em>The Semantic Web: Trends and Challenges - 11th International Conference, ESWC 2014, Anissaras, Crete, Greece, May 25-29, 2014. Proceedings</em>, Valentina Presutti, Claudia d'Amato, Fabien Gandon, Mathieu d'Aquin, Steffen Staab, and Anna Tordai (Eds.). Lecture Notes in Computer Science, vol.&nbsp;8465. Springer, 426–441.</li>
			<li id="ref-virtuoso">Orri Erling. 2012. <a href="http://sites.computer.org/debull/A12mar/vicol.pdf">Virtuoso, a Hybrid RDBMS/Graph Column Store</a>. <em>IEEE Data Engineering Bulletin</em> 35(1), 3–8.</li>
			<li id="ref-ErmilovN16">Ivan Ermilov and Axel-Cyrille Ngonga Ngomo. 2016. TAIPAN: Automatic Property Mapping for Tabular Data. In <em>Knowledge Engineering and Knowledge Management - 20th International Conference, EKAW 2016, Bologna, Italy, November 19-23, 2016, Proceedings</em>, Eva Blomqvist, Paolo Ciancarini, Francesco Poggi, and Fabio Vitali (Eds.). Lecture Notes in Computer Science, vol.&nbsp;10024. Springer, 163–179.</li>
			<li id="ref-EstevesRRL18">Diego Esteves, Anisa Rula, Aniketh Janardhan Reddy, and Jens Lehmann. 2018. <a href="https://doi.org/10.1145/3177873">Toward Veracity Assessment in RDF Knowledge Bases: An Exploratory Analysis</a>. <em>Journal of Data and Information Quality</em> 9(3), 16:1–16:26.</li>
			<li id="ref-Estrada2011">Ernesto Estrada. 2011. <em><a href="https://doi.org/10.1093/acprof:oso/9780199591756.001.0001">The Structure of Complex Networks: Theory and Applications</a></em>. Oxford University Press, Inc.</li>
			<li id="ref-EtzioniCDKPSSWY04">Oren Etzioni, Michael J. Cafarella, Doug Downey, Stanley Kok, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates. 2004. Web-scale information extraction in KnowItAll: (preliminary results). In <em>Proceedings of the 13th international conference on World Wide Web, WWW 2004, New York, NY, USA, May 17-20, 2004</em>, Stuart I. Feldman, Mike Uretsky, Marc Najork, and Craig E. Wills (Eds.). ACM Press, 100–110.</li>
			<li id="ref-EtzioniFCSM11">Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland, and Mausam. 2011. Open Information Extraction: The Second Generation. In <em>IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011</em>, Toby Walsh (Ed.). IJCAI/AAAI, 3–10.</li>
			<li id="ref-FaderSE11">Anthony Fader, Stephen Soderland, and Oren Etzioni. 2011. <a href="https://www.aclweb.org/anthology/volumes/D11-1/">Identifying Relations for Open Information Extraction</a>. In <em>Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, EMNLP 2011, 27-31 July 2011, John McIntyre Conference Centre, Edinburgh, UK, A meeting of SIGDAT, a Special Interest Group of the ACL</em>. The Association for Computational Linguistics, 1535–1545.</li>
			<li id="ref-FanWW13">Wenfei Fan, Xin Wang, and Yinghui Wu. 2013. <a href="http://www.vldb.org/pvldb/vol6/p1510-fan.pdf">Diversified Top-\(k\) Graph Pattern Matching</a>. <em>Proceedings of the VLDB Endowment</em> 6(13), 1510–1521.</li>
			<li id="ref-FanizzidE08">Nicola Fanizzi, Claudia d'Amato, and Floriana Esposito. 2008. <a href="https://doi.org/10.1007/978-3-540-85928-4_12">DL-FOIL Concept Learning in Description Logics</a>. In <em>Inductive Logic Programming, 18th International Conference, ILP 2008, Prague, Czech Republic, September 10-12, 2008, Proceedings</em>, Filip Zelezný and Nada Lavrac (Eds.). Lecture Notes in Computer Science, vol.&nbsp;5194. Springer, 107–121.</li>
			<li id="ref-FarberBMR18">Michael Färber, Frederic Bartscherer, Carsten Menne, and Achim Rettinger. 2018. <a href="https://doi.org/10.3233/SW-170275">Linked data quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO</a>. <em>Semantic Web Journal</em> 9(1), 77–129.</li>
			<li id="ref-MAKG">Michael Färber. 2019. The Microsoft Academic Knowledge Graph: A Linked Data Source with 8 Billion Triples of Scholarly Data. In <em>The Semantic Web - ISWC 2019 - 18th International Semantic Web Conference, Auckland, New Zealand, October 26-30, 2019, Proceedings, Part II</em>, Chiara Ghidini, Olaf Hartig, Maria Maleshkova, Vojtech Svátek, Isabel F. Cruz, Aidan Hogan, Jie Song, Maxime Lefrançois, and Fabien Gandon (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11779. Springer, 113–129.</li>
			<li id="ref-FenselSAHKPTUW20">Dieter Fensel, Umutcan Simsek, Kevin Angele, Elwin Huaman, Elias Kärle, Oleksandra Panasiuk, Ioan Toma, Jürgen Umbrich, and Alexander Wahler. 2020. <em><a href="https://doi.org/10.1007/978-3-030-37439-6">Knowledge Graphs - Methodology, Tools and Selected Use Cases</a></em>. Springer.</li>
			<li id="ref-Fernandez1997">Mariano Fernández, Asunción Gómez-Pérez, and Natalia Juristo. 1997. METHONTOLOGY: from Ontological Art towards Ontological Engineering. In <em>Proceedings of the AAAI97 Spring Symposium Series on Ontological Engineering</em>.</li>
			<li id="ref-FernandezMGPA13">Javier D. Fernández, Miguel A. Martínez-Prieto, Claudio Gutiérrez, Axel Polleres, and Mario Arias. 2013. <a href="https://doi.org/10.1016/j.websem.2013.01.002">Binary RDF representation for publication and exchange (HDT)</a>. <em>Journal of Web Semantics</em> 19, 22–41.</li>
			<li id="ref-FernandezKPS17">Javier D. Fernández, Sabrina Kirrane, Axel Polleres, and Simon Steyskal. 2017. Self-Enforcing Access Control for Encrypted RDF. In <em>The Semantic Web - 14th International Conference, ESWC 2017, Portorož, Slovenia, May 28 - June 1, 2017, Proceedings, Part I</em>, Eva Blomqvist, Diana Maynard, Aldo Gangemi, Rinke Hoekstra, Pascal Hitzler, and Olaf Hartig (Eds.). Lecture Notes in Computer Science, vol.&nbsp;10249. Springer, 607–622.</li>
			<li id="ref-FerraraMFB14">Emilio Ferrara, Pasquale De Meo, Giacomo Fiumara, and Robert Baumgartner. 2014. <a href="https://doi.org/10.1016/j.knosys.2014.07.007">Web data extraction, applications and techniques: A survey</a>. <em>Knowledge-Based Systems</em> 70, 301–323.</li>
			<li id="ref-Ferre17">Sébastien Ferré. 2017. <a href="https://doi.org/10.3233/SW-150208">Sparklis: An expressive query builder for SPARQL endpoints with guidance in natural language</a>. <em>Semantic Web Journal</em> 8(3), 405–418.</li>
			<li id="ref-fillmore1976frame">Charles J. Fillmore. 1976. <a href="https://doi.org/10.1111/j.1749-6632.1976.tb25467.x">Frame semantics and the nature of language</a>. <em>Annals of the New York Academy of Sciences</em> 280(1), 20–32.</li>
			<li id="ref-FinkelGM05">Jenny Rose Finkel, Trond Grenager, and Christopher D. Manning. 2005. Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. In <em>ACL 2005, 43rd Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, 25-30 June 2005, University of Michigan, USA</em>, Kevin Knight, Hwee Tou Ng, and Kemal Oflazer (Eds.). The Association for Computational Linguistics, 363–370.</li>
			<li id="ref-FlescaMM04">Sergio Flesca, Giuseppe Manco, Elio Masciari, Eugenio Rende, and Andrea Tagarelli. 2004. <a href="http://content.iospress.com/articles/ai-communications/aic307">Web wrapper induction: a brief survey</a>. <em>AI Communications</em> 17(2), 57–61.</li>
			<li id="ref-Flouris2010">Giorgos Flouris, Irini Fundulaki, Maria Michou, and Grigoris Antoniou. 2010. Controlling Access to RDF Graphs. In <em>Future Internet - FIS 2010 - Third Future Internet Symposium, Berlin, Germany, September 20-22, 2010. Proceedings</em>, Arne-Jørgen Berre, Asunción Gómez-Pérez, Kurt Tutschku, and Dieter Fensel (Eds.). Lecture Notes in Computer Science, vol.&nbsp;6369. Springer, 107–117.</li>
			<li id="ref-Forgy82">Charles Forgy. 1982. <a href="https://doi.org/10.1016/0004-3702(82)90020-0">Rete: A Fast Algorithm for the Many Patterns/Many Objects Match Problem</a>. <em>Artificial Intelligence</em> 19(1), 17–37.</li>
			<li id="ref-R">The R Foundation. 1992. <a href="https://www.r-project.org/">The R Project for Statistical Computing</a>. Online at <a href="https://www.r-project.org/">https://www.r-project.org/</a>.</li>
			<li id="ref-FrancisGGLLMPRS18">Nadime Francis, Alastair Green, Paolo Guagliardo, Leonid Libkin, Tobias Lindaaker, Victor Marsault, Stefan Plantikow, Mats Rydberg, Petra Selmer, and Andrés Taylor. 2018. <a href="https://doi.org/10.1145/3183713.3190657">Cypher: An Evolving Query Language for Property Graphs</a>. In <em>Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10-15, 2018</em>, Gautam Das, Christopher M. Jermaine, and Philip A. Bernstein (Eds.). ACM Press, 1433–1445.</li>
			<li id="ref-frege">Gottlob Frege. 1879. <em>Begriffsschrift</em>. Halle.</li>
			<li id="ref-FreitasOOCS11a">André Freitas, João Gabriel Oliveira, Seán O'Riain, Edward Curry, and João Carlos Pereira da Silva. 2011. <a href="https://doi.org/10.1007/978-3-642-22327-3_40">Treo: Best-Effort Natural Language Queries over Linked Data</a>. In <em>Natural Language Processing and Information Systems - 16th International Conference on Applications of Natural Language to Information Systems, NLDB 2011, Alicante, Spain, June 28-30, 2011. Proceedings</em>, Rafael Muñoz, Andrés Montoyo, and Elisabeth Métais (Eds.). Lecture Notes in Computer Science, vol.&nbsp;6716. Springer, 286–289.</li>
			<li id="ref-Furber">Christian Fürber and Martin Hepp. 2011. <a href="http://aisel.aisnet.org/ecis2011/">SWIQA - a semantic web information quality assessment framework</a>. In <em>19th European Conference on Information Systems, ECIS 2011, Helsinki, Finland, June 9-11, 2011</em>, Virpi Kristiina Tuunainen, Matti Rossi, and Joe Nandhakumar (Eds.), p76.</li>
			<li id="ref-Gad-ElrabSUW16">Mohamed H. Gad-Elrab, Daria Stepanova, Jacopo Urbani, and Gerhard Weikum. 2016. <a href="https://doi.org/10.1007/978-3-319-46523-4_15">Exception-Enriched Rule Learning from Knowledge Graphs</a>. In <em>The Semantic Web - ISWC 2016 - 15th International Semantic Web Conference, Kobe, Japan, October 17-21, 2016, Proceedings, Part I</em>, Paul T. Groth, Elena Simperl, Alasdair J. G. Gray, Marta Sabou, Markus Krötzsch, Freddy Lécué, Fabian Flöck, and Yolanda Gil (Eds.). Lecture Notes in Computer Science, vol.&nbsp;9981. Springer, 234–251.</li>
			<li id="ref-GalarragaTHS13">Luis Antonio Galárraga, Christina Teflioudi, Katja Hose, and Fabian Suchanek. 2013. <a href="https://doi.org/10.1145/2488388.2488425">AMIE: association rule mining under incomplete evidence in ontological knowledge bases</a>. In <em>22nd International World Wide Web Conference, WWW '13, Rio de Janeiro, Brazil, May 13-17, 2013</em>, Daniel Schwabe, Virgílio A. F. Almeida, Hartmut Glaser, Ricardo Baeza-Yates, and Sue B. Moon (Eds.). ACM Press, 413–422.</li>
			<li id="ref-GalarragaTHS15">Luis Galárraga, Christina Teflioudi, Katja Hose, and Fabian M. Suchanek. 2015. <a href="https://doi.org/10.1007/s00778-015-0394-1">Fast rule mining in ontological knowledge bases with AMIE+</a>. <em>The Very Large Data Base Journal</em> 24(6), 707–730.</li>
			<li id="ref-galland2010">Alban Galland, Serge Abiteboul, Amélie Marian, and Pierre Senellart. 2010. Corroborating Information from Disagreeing Views. In <em>Proceedings of the Third International Conference on Web Search and Web Data Mining, WSDM 2010, New York, NY, USA, February 4-6, 2010</em>, Brian D. Davison, Torsten Suel, Nick Craswell, and Bing Liu (Eds.). ACM Press, 131–140.</li>
			<li id="ref-rdfxml11">Fabien Gandon and Guus Schreiber. 2014. <em><a href="https://www.w3.org/TR/2014/REC-rdf-syntax-grammar-20140225/">RDF 1.1 XML Syntax, W3C Recommendation 25 February 2014</a></em>. W3C Recommendation. World Wide Web Consortium. February 25, 2014.</li>
			<li id="ref-GangemiPRNDM17">Aldo Gangemi, Valentina Presutti, Diego Reforgiato Recupero, Andrea Giovanni Nuzzolese, Francesco Draicchio, and Misael Mongiovì. 2017. <a href="https://doi.org/10.3233/SW-160240">Semantic Web Machine Reading with FRED</a>. <em>Semantic Web Journal</em> 8(6), 873–893.</li>
			<li id="ref-gangemi2005ontology">Aldo Gangemi. 2005. Ontology design patterns for semantic web content. In <em>The Semantic Web - ISWC 2005, 4th International Semantic Web Conference, ISWC 2005, Galway, Ireland, November 6-10, 2005, Proceedings</em>, Yolanda Gil, Enrico Motta, V. Richard Benjamins, and Mark A. Musen (Eds.). Lecture Notes in Computer Science, vol.&nbsp;3729. Springer, 262–276.</li>
			<li id="ref-gardent2017webnlg">Claire Gardent, Anastasia Shimorina, Shashi Narayan, and Laura Perez-Beltrachini. 2017. The WebNLG challenge: Generating text from RDF data. In <em>Proceedings of the 10th International Conference on Natural Language Generation, INLG 2017, Santiago de Compostela, Spain, September 4-7, 2017</em>, José M. Alonso, Alberto Bugarín, and Ehud Reiter (Eds.). The Association for Computational Linguistics, 124–133.</li>
			<li id="ref-Gelfond88">Michael Gelfond and Vladimir Lifschitz. 1988. The Stable Model Semantics for Logic Programming. In <em>Logic Programming, Proceedings of the Fifth International Conference and Symposium, Seattle, Washington, USA, August 15-19, 1988 (2 Volumes)</em>, Robert A. Kowalski and Kenneth A. Bowen (Eds.). The MIT Press, 1070–1080.</li>
			<li id="ref-GellerCA08">James Geller, Soon Ae Chun, and Yoo Jung An. 2008. <a href="https://doi.org/10.1109/MC.2008.402">Toward the Semantic Deep Web</a>. <em>IEEE Computer</em> 41(9), 95–97.</li>
			<li id="ref-GentileZC14">Anna Lisa Gentile, Ziqi Zhang, and Fabio Ciravegna. 2014. Self Training Wrapper Induction with Linked Data. In <em>Text, Speech and Dialogue - 17th International Conference, TSD 2014, Brno, Czech Republic, September 8-12, 2014. Proceedings</em>, Petr Sojka, Ales Horák, Ivan Kopecek, and Karel Pala (Eds.). Lecture Notes in Computer Science, vol.&nbsp;8655. Springer, 285–292.</li>
			<li id="ref-GentileGRW19">Anna Lisa Gentile, Daniel Gruhl, Petar Ristoski, and Steve Welch. 2019. Personalized Knowledge Graphs for the Pharmaceutical Domain. In <em>The Semantic Web - ISWC 2019 - 18th International Semantic Web Conference, Auckland, New Zealand, October 26-30, 2019, Proceedings, Part II</em>, Chiara Ghidini, Olaf Hartig, Maria Maleshkova, Vojtech Svátek, Isabel F. Cruz, Aidan Hogan, Jie Song, Maxime Lefrançois, and Fabien Gandon (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11779. Springer, 400–417.</li>
			<li id="ref-gerber2015defacto">Daniel Gerber, Diego Esteves, Jens Lehmann, Lorenz Bühmann, Ricardo Usbeck, Axel-Cyrille Ngonga Ngomo, and René Speck. 2015. <a href="https://doi.org/10.1016/j.websem.2015.08.001">DeFacto–temporal and multilingual deep fact validation</a>. <em>Journal of Web Semantics</em> 35, 85–101.</li>
			<li id="ref-gerbracht2008possibilities">Sabrina Gerbracht. 2008. Possibilities to Encrypt an RDF-Graph. In <em>2008 3rd International Conference on Information and Communication Technologies: From Theory to Applications</em>. IEEE Computer Society.</li>
			<li id="ref-Getoor07">Lise Getoor and Ben Taskar (Eds.). 2007. <em>Introduction to Statistical Relational Learning</em>. The MIT Press.</li>
			<li id="ref-giereth2005partial">Mark Giereth. 2005. On Partial Encryption of RDF-Graphs. In <em>The Semantic Web - ISWC 2005, 4th International Semantic Web Conference, ISWC 2005, Galway, Ireland, November 6-10, 2005, Proceedings</em>, Yolanda Gil, Enrico Motta, V. Richard Benjamins, and Mark A. Musen (Eds.). Lecture Notes in Computer Science, vol.&nbsp;3729. Springer, 308–322.</li>
			<li id="ref-prov13">Yolanda Gil, Simon Miles, Khalid Belhajjame, Daniel Garijo, Graham Klyne, Paolo Missier, Stian Soiland-Reyes, and Stephan Zednik. 2013. <em><a href="https://www.w3.org/TR/2013/NOTE-prov-primer-20130430/">PROV Model Primer</a></em>. W3C Working Group Note. World Wide Web Consortium. April 30, 2013.</li>
			<li id="ref-Gimenez-GarciaZ17">José M. Giménez-García, Antoine Zimmermann, and Pierre Maret. 2017. <a href="https://doi.org/10.1007/978-3-319-58068-5_39">NdFluents: An Ontology for Annotated Statements with Inference Preservation</a>. In <em>The Semantic Web - 14th International Conference, ESWC 2017, Portorož, Slovenia, May 28 - June 1, 2017, Proceedings, Part I</em>, Eva Blomqvist, Diana Maynard, Aldo Gangemi, Rinke Hoekstra, Pascal Hitzler, and Olaf Hartig (Eds.). Lecture Notes in Computer Science, vol.&nbsp;10249. Springer, 638–654.</li>
			<li id="ref-Glimm11">Birte Glimm. 2011. Using SPARQL with RDFS and OWL Entailment. In <em>Reasoning Web. Semantic Technologies for the Web of Data - 7th International Summer School 2011, Galway, Ireland, August 23-27, 2011, Tutorial Lectures</em>, Axel Polleres, Claudia d'Amato, Marcelo Arenas, Siegfried Handschuh, Paula Kroner, Sascha Ossowski, and Peter F. Patel-Schneider (Eds.). Lecture Notes in Computer Science, vol.&nbsp;6848. Springer, 137–201.</li>
			<li id="ref-GlorotBWB13">Xavier Glorot, Antoine Bordes, Jason Weston, and Yoshua Bengio. 2013. <a href="http://arxiv.org/abs/1301.3485">A Semantic Matching Energy Function for Learning with Multi-relational Data</a>. In <em>1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings</em>, Yoshua Bengio and Yann LeCun (Eds.). OpenReview.net. 4 pages.</li>
			<li id="ref-GoldenS16">Patrick Golden and Ryan B. Shaw. 2016. <a href="https://doi.org/10.7717/peerj-cs.44">Nanopublication beyond the sciences: the PeriodO period gazetteer</a>. <em>PeerJ Computer Science</em> 2, e44.</li>
			<li id="ref-gomez2006ontological">Asunción Gómez-Pérez, Mariano Fernández-López, and Oscar Corcho. 2006. <em>Ontological Engineering: with examples from the areas of Knowledge Management, e-Commerce and the Semantic Web</em>. Springer.</li>
			<li id="ref-PinterestKG">Rafael S. Gonçalves, Matthew Horridge, Rui Li, Yu Liu, Mark A. Musen, Csongor I. Nyulas, Evelyn Obamos, Dhananjay Shrouty, and David Temple. 2019. Use of OWL and Semantic Web Technologies at Pinterest. In <em>The Semantic Web - ISWC 2019 - 18th International Semantic Web Conference, Auckland, New Zealand, October 26-30, 2019, Proceedings, Part II</em>, Chiara Ghidini, Olaf Hartig, Maria Maleshkova, Vojtech Svátek, Isabel F. Cruz, Aidan Hogan, Jie Song, Maxime Lefrançois, and Fabien Gandon (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11779. Springer, 418–435.</li>
			<li id="ref-GonzalezH18">Larry González and Aidan Hogan. 2018. <a href="https://doi.org/10.1145/3178876.3186016">Modelling Dynamics in Semantic Web Knowledge Graphs with Formal Concept Analysis</a>. In <em>Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France, April 23-27, 2018</em>, Pierre-Antoine Champin, Fabien L. Gandon, Mounia Lalmas, and Panagiotis G. Ipeirotis (Eds.). ACM Press, 1175–1184.</li>
			<li id="ref-gottschalk2018eventkg">Simon Gottschalk and Elena Demidova. 2018. <a href="https://doi.org/10.1007/978-3-319-93417-4_18">EventKG: A Multilingual Event-Centric Temporal Knowledge Graph</a>. In <em>The Semantic Web - 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3-7, 2018, Proceedings</em>, Aldo Gangemi, Roberto Navigli, Maria-Esther Vidal, Pascal Hitzler, Raphaël Troncy, Laura Hollink, Anna Tordai, and Mehwish Alam (Eds.). Lecture Notes in Computer Science, vol.&nbsp;10843. Springer, 272–287.</li>
			<li id="ref-guido_heuristics_2013">Guido Governatori, Ho-Pun Lam, Antonino Rotolo, Serena Villata, and Fabien Gandon. 2013. Heuristics for Licenses Composition. In <em>Legal Knowledge and Information Systems - JURIX 2013: The Twenty-Sixth Annual Conference, December 11-13, 2013, University of Bologna, Italy</em>, Kevin D. Ashley (Ed.). Frontiers in Artificial Intelligence and Applications, vol.&nbsp;259. IOS Press, 77–86.</li>
			<li id="ref-Grishman12">Ralph Grishman. 2012. <em>Information Extraction: Capabilities and Challenges</em>. NYU Dept. CS.</li>
			<li id="ref-GrothLGGHP14">Paul T. Groth, Antonis Loizou, Alasdair J. G. Gray, Carole A. Goble, Lee Harland, and Steve Pettifer. 2014. <a href="https://doi.org/10.1016/j.websem.2014.03.003">API-centric Linked Data integration: The Open PHACTS Discovery Platform case study</a>. <em>Journal of Web Semantics</em> 29, 12–18.</li>
			<li id="ref-Gruninger1995">Michael Grüninger and Mark S. Fox. 1995. Methodology for the Design and Evaluation of Ontologies. In <em>Workshop on Basic Ontological Issues in Knowledge Sharing, IJCAI-95, Montreal</em>.</li>
			<li id="ref-gruninger1995role">Michael Grüninger and Mark S. Fox. 1995. The role of competency questions in enterprise engineering. In <em>Benchmarking—Theory and practice</em>, Asbjorn Rolstadas (Ed.). Springer, 22–31.</li>
			<li id="ref-GuarinoOS9">Nicola Guarino, Daniel Oberle, and Steffen Staab. 2009. <a href="https://doi.org/10.1007/978-3-540-92673-3">What Is an Ontology?</a>. In <em>Handbook on Ontologies</em>, Steffen Staab and Rudi Studer (Eds.). International Handbooks on Information Systems. Springer, 1–17.</li>
			<li id="ref-GuhaMM03">Ramanathan V. Guha, Rob McCool, and Eric Miller. 2003. Semantic search. In <em>Proceedings of the Twelfth International World Wide Web Conference, WWW2003, Budapest, Hungary, 20-24 May 2003</em>, Gusztáv Hencsey, Bebo White, Yih-Farn Robin Chen, László Kovács, and Steve Lawrence (Eds.). ACM Press, 700–709.</li>
			<li id="ref-GuhaMF04">Ramanathan V. Guha, Rob McCool, and Richard Fikes. 2004. <a href="https://doi.org/10.1007/978-3-540-30475-3_4">Contexts for the Semantic Web</a>. In <em>The Semantic Web - ISWC 2004: Third International Semantic Web Conference, Hiroshima, Japan, November 7-11, 2004. Proceedings</em>, Frank van Harmelen, Sheila McIlraith, and Dimitri Plexousakis (Eds.). Lecture Notes in Computer Science, vol.&nbsp;3298. Springer, 32–46.</li>
			<li id="ref-GuoWWWG16">Shu Guo, Quan Wang, Lihong Wang, Bin Wang, and Li Guo. 2016. <a href="https://doi.org/10.18653/v1/d16-1019">Jointly Embedding Knowledge Graphs and Logical Rules</a>. In <em>Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016</em>, Jian Su, Xavier Carreras, and Kevin Duh (Eds.). The Association for Computational Linguistics, 192–202.</li>
			<li id="ref-GuoWWWG18">Shu Guo, Quan Wang, Lihong Wang, Bin Wang, and Li Guo. 2018. <a href="https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16369">Knowledge Graph Embedding With Iterative Guidance From Soft Rules</a>. In <em>Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th Innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018</em>, Sheila A. McIlraith and Kilian Q. Weinberger (Eds.). AAAI Press, 4816–4823.</li>
			<li id="ref-GuptaM14">Sonal Gupta and Christopher D. Manning. 2014. Improved Pattern Learning for Bootstrapped Entity Extraction. In <em>Proceedings of the Eighteenth Conference on Computational Natural Language Learning, CoNLL 2014, Baltimore, Maryland, USA, June 26-27, 2014</em>, Roser Morante and Wen-tau Yih (Eds.). The Association for Computational Linguistics, 98–108.</li>
			<li id="ref-GutierrezHV07">Claudio Gutiérrez, Carlos A. Hurtado, and Alejandro A. Vaisman. 2007. <a href="https://doi.org/10.1109/TKDE.2007.34">Introducing Time into RDF</a>. <em>IEEE Transactions on Knowledge and Data Engineering</em> 19(2), 207–218.</li>
			<li id="ref-HaagLSE15a">Florian Haag, Steffen Lohmann, Stephan Siek, and Thomas Ertl. 2015. <a href="https://doi.org/10.1007/978-3-319-25639-9_51">QueryVOWL: A Visual Query Notation for Linked Data</a>. In <em>The Semantic Web: ESWC 2015 Satellite Events - ESWC 2015 Satellite Events Portorož, Slovenia, May 31 - June 4, 2015, Revised Selected Papers</em>, Fabien Gandon, Christophe Guéret, Serena Villata, John G. Breslin, Catherine Faron-Zucker, and Antoine Zimmermann (Eds.). Lecture Notes in Computer Science, vol.&nbsp;9341. Springer, 387–402.</li>
			<li id="ref-pids">Juha Hakala. 2010. <em><a href="http://www.persid.org/downloads/PI-intro-2010-09-22.pdf">Persistent identifiers – an overview</a></em>. PersID Technical Report. September 3, 2010.</li>
			<li id="ref-UberKG">Ferras Hamad, Isaac Liu, and Xian Xing Zhang. 2018. <a href="https://eng.uber.com/uber-eats-query-understanding/">Food Discovery with Uber Eats: Building a Query Understanding Engine</a>. Uber Engineering Blog. June 10, 2018.</li>
			<li id="ref-HamiltonBZJL18">William L. Hamilton, Payal Bajaj, Marinka Zitnik, Dan Jurafsky, and Jure Leskovec. 2018. <a href="https://proceedings.neurips.cc/paper/2018/hash/ef50c335cca9f340bde656363ebd02fd-Abstract.html">Embedding Logical Queries on Knowledge Graphs</a>. In <em>Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada</em>, Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicolò Cesa-Bianchi, and Roman Garnett (Eds.), 2030–2041.</li>
			<li id="ref-HammondPT17">Tony Hammond, Michele Pasin, and Evangelos Theodoridis. 2017. <a href="http://ceur-ws.org/Vol-1963/paper493.pdf">Data integration and disintegration: Managing Springer Nature SciGraph with SHACL and OWL</a>. In <em>Proceedings of the ISWC 2017 Posters & Demonstrations and Industry Tracks co-located with 16th International Semantic Web Conference (ISWC 2017), Vienna, Austria, October 23rd - to - 25th, 2017</em>, Nadeschda Nikitina, Dezhao Song, Achille Fokoue, and Peter Haase (Eds.). CEUR Workshop Proceedings, vol.&nbsp;1963. Sun SITE Central Europe (CEUR). 2 pages.</li>
			<li id="ref-sparql11">Steve Harris, Andy Seaborne, and Eric Prud'hommeaux. 2013. <em><a href="https://www.w3.org/TR/2013/REC-sparql11-query-20130321/">SPARQL 1.1 Query Language</a></em>. W3C Recommendation. World Wide Web Consortium. March 21, 2013.</li>
			<li id="ref-Harth10">Andreas Harth. 2010. <a href="https://doi.org/10.1016/j.websem.2010.08.001">VisiNav: A system for visual search and navigation on web data</a>. <em>Journal of Web Semantics</em> 8(4), 348–354.</li>
			<li id="ref-HartigA16">Olaf Hartig and Carlos Buil-Aranda. 2016. Bindings-Restricted Triple Pattern Fragments. In <em>On the Move to Meaningful Internet Systems: OTM 2016 Conferences - Confederated International Conferences: CoopIS, C&TC, and ODBASE 2016, Rhodes, Greece, October 24-28, 2016, Proceedings</em>, Christophe Debruyne, Hervé Panetto, Robert Meersman, Tharam S. Dillon, eva Kühn, Declan O'Sullivan, and Claudio Agostino Ardagna (Eds.). Lecture Notes in Computer Science, vol.&nbsp;10033. Springer, 762–779.</li>
			<li id="ref-Hartig14">Olaf Hartig and Bryan Thompson. 2014. <a href="http://arxiv.org/abs/1406.3399">Foundations of an Alternative Approach to Reification in RDF</a>. <em>CoRR</em> abs/1406.3399. 14 pages.</li>
			<li id="ref-HartigBF09">Olaf Hartig, Christian Bizer, and Johann Christoph Freytag. 2009. Executing SPARQL Queries over the Web of Linked Data. In <em>The Semantic Web - ISWC 2009, 8th International Semantic Web Conference, ISWC 2009, Chantilly, VA, USA, October 25-29, 2009. Proceedings</em>, Abraham Bernstein, David R. Karger, Tom Heath, Lee Feigenbaum, Diana Maynard, Enrico Motta, and Krishnaprasad Thirunarayan (Eds.). Lecture Notes in Computer Science, vol.&nbsp;5823. Springer, 293–309.</li>
			<li id="ref-HartigLP17">Olaf Hartig, Ian Letter, and Jorge Pérez. 2017. A Formal Framework for Comparing Linked Data Fragments. In <em>The Semantic Web - ISWC 2017 - 16th International Semantic Web Conference, Vienna, Austria, October 21-25, 2017, Proceedings, Part I</em>, Claudia d'Amato, Miriam Fernández, Valentina A. M. Tamma, Freddy Lécué, Philippe Cudré-Mauroux, Juan F. Sequeda, Christoph Lange, and Jeff Heflin (Eds.). Lecture Notes in Computer Science, vol.&nbsp;10587. Springer, 364–382.</li>
			<li id="ref-Hartig17">Olaf Hartig. 2017. <a href="http://ceur-ws.org/Vol-1912/paper12.pdf">Foundations of RDF* and SPARQL* – An Alternative Approach to Statement-Level Metadata in RDF</a>. In <em>Proceedings of the 11th Alberto Mendelzon International Workshop on Foundations of Data Management and the Web, Montevideo, Uruguay, June 7-9, 2017</em>, Juan L. Reutter and Divesh Srivastava (Eds.). CEUR Workshop Proceedings, vol.&nbsp;1912. Sun SITE Central Europe (CEUR). 11 pages.</li>
			<li id="ref-rdf11sem">Patrick J. Hayes and Peter F. Patel-Schneider. 2014. <em><a href="https://www.w3.org/TR/2014/REC-rdf11-mt-20140225/">RDF 1.1 Semantics</a></em>. W3C Recommendation. World Wide Web Consortium. February 25, 2014.</li>
			<li id="ref-LinkedInKG">Qi He, Bee-Chung Chen, and Deepak Agarwal. 2016. <a href="https://engineering.linkedin.com/blog/2016/10/building-the-linkedin-knowledge-graph">Building The LinkedIn Knowledge Graph</a>. LinkedIn Blog. October 6, 2016.</li>
			<li id="ref-Hearst92">Marti A. Hearst. 1992. <a href="https://www.aclweb.org/anthology/volumes/C92-1/">Automatic Acquisition of Hyponyms from Large Text Corpora</a>. In <em>14th International Conference on Computational Linguistics, COLING 1992, Nantes, France, August 23-28, 1992</em>, 539–545.</li>
			<li id="ref-ldbook">Tom Heath and Christian Bizer. 2011. <em><a href="http://linkeddatabook.com/editions/1.0/">Linked Data: Evolving the Web into a Global Data Space (1st Edition)</a></em>. Synthesis Lectures on the Semantic Web: Theory and Technology, vol.&nbsp;1. Morgan & Claypool.</li>
			<li id="ref-HeathM08a">Tom Heath and Enrico Motta. 2008. <a href="https://doi.org/10.1016/j.websem.2008.09.003">Revyu: Linking reviews and ratings into the Web of Data</a>. <em>Journal of Web Semantics</em> 6(4), 266–273.</li>
			<li id="ref-HeindorfPSE16">Stefan Heindorf, Martin Potthast, Benno Stein, and Gregor Engels. 2016. Vandalism Detection in Wikidata. In <em>Proceedings of the 25th ACM International Conference on Information and Knowledge Management, CIKM 2016, Indianapolis, IN, USA, October 24-28, 2016</em>, Snehasis Mukhopadhyay, ChengXiang Zhai, Elisa Bertino, Fabio Crestani, Javed Mostafa, Jie Tang, Luo Si, Xiaofang Zhou, Yi Chang, Yunyao Li, and Parikshit Sondhi (Eds.). ACM Press, 327–336.</li>
			<li id="ref-HeitmannEtAl2017">Benjamin Heitmann, Felix Hermsen, and Stefan Decker. 2017. <a href="http://ceur-ws.org/Vol-1951/PrivOn2017_paper_3.pdf">k-RDF-Neighbourhood Anonymity: Combining Structural and Attribute-based Anonymisation for Linked Data</a>. In <em>Proceedings of the 5th Workshop on Society, Privacy and the Semantic Web - Policy and Technology (PrivOn2017) co-located with 16th International Semantic Web Conference (ISWC 2017), Vienna, Austria, October 22, 2017</em>, Christopher Brewster, Michelle Cheatham, Mathieu d'Aquin, Stefan Decker, and Sabrina Kirrane (Eds.). CEUR Workshop Proceedings, vol.&nbsp;1951. Sun SITE Central Europe (CEUR). 16 pages.</li>
			<li id="ref-HellmannLAB13">Sebastian Hellmann, Jens Lehmann, Sören Auer, and Martin Brümmer. 2013. Integrating NLP Using Linked Data. In <em>The Semantic Web - ISWC 2013 - 12th International Semantic Web Conference, Sydney, NSW, Australia, October 21-25, 2013, Proceedings, Part II</em>, Harith Alani, Lalana Kagal, Achille Fokoue, Paul Groth, Chris Biemann, Josiane Xavier Parreira, Lora Aroyo, Natasha Noy, Chris Welty, and Krzysztof Janowicz (Eds.). Lecture Notes in Computer Science, vol.&nbsp;8219. Springer, 98–113.</li>
			<li id="ref-HelmsB05">Remko Helms and Kees Buijsrogge. 2005. <a href="https://ieeexplore.ieee.org/xpl/conhome/10080/proceeding">Knowledge Network Analysis: A Technique to Analyze Knowledge Management Bottlenecks in Organizations</a>. In <em>16th International Workshop on Database and Expert Systems Applications (DEXA 2005), 22-26 August 2005, Copenhagen, Denmark</em>. IEEE Computer Society, 410–414.</li>
			<li id="ref-HendlerHMT12">James A. Hendler, Jeanne Holm, Chris Musialek, and George Thomas. 2012. <a href="https://doi.org/10.1109/MIS.2012.27">US Government Linked Open Data: Semantic.data.gov</a>. <em>IEEE Intelligent Systems</em> 27(3), 25–31.</li>
			<li id="ref-HensonSTK19">Cory Henson, Stefan Schmid, Anh Tuan Tran, and Antonios Karatzoglou. 2019. <a href="http://ceur-ws.org/Vol-2456/paper84.pdf">Using a Knowledge Graph of Scenes to Enable Search of Autonomous Driving Data</a>. In <em>Proceedings of the ISWC 2019 Satellite Tracks (Posters & Demonstrations, Industry, and Outrageous Ideas) co-located with 18th International Semantic Web Conference (ISWC 2019), Auckland, New Zealand, October 26-30, 2019</em>, Mari Carmen Suárez-Figueroa, Gong Cheng, Anna Lisa Gentile, Christophe Guéret, C. Maria Keet, and Abraham Bernstein (Eds.). CEUR Workshop Proceedings, vol.&nbsp;2456. Sun SITE Central Europe (CEUR), 313–314.</li>
			<li id="ref-HernandezHK15">Daniel Hernández, Aidan Hogan, and Markus Krötzsch. 2015. <a href="http://ceur-ws.org/Vol-1457/SSWS2015_paper3.pdf">Reifying RDF: What Works Well With Wikidata?</a>. In <em>Proceedings of the 11th International Workshop on Scalable Semantic Web Knowledge Base Systems co-located with 14th International Semantic Web Conference (ISWC 2015), Bethlehem, PA, USA, October 11, 2015</em>, Thorsten Liebig and Achille Fokoue (Eds.). CEUR Workshop Proceedings, vol.&nbsp;1457. Sun SITE Central Europe (CEUR), 32–47.</li>
			<li id="ref-Hitchcock27">Frank L. Hitchcock. 1927. <a href="https://doi.org/10.1002/sapm192761164">The Expression of a Tensor or a Polyadic as a Sum of Products</a>. <em>Journal of Mathematics and Physics</em> 6(1–4), 164–189.</li>
			<li id="ref-hitzler2018tutorial">Pascal Hitzler and Adila Krisnadhi. 2018. <a href="http://arxiv.org/abs/1808.08433">A Tutorial on Modular Ontology Modeling with Ontology Design Patterns: The Cooking Recipes Ontology</a>. <em>CoRR</em> abs/1808.08433. 22 pages.</li>
			<li id="ref-Hitzler2010">Pascal Hitzler, Markus Krötzsch, and Sebastian Rudolph. 2010. <em><a href="http://www.semantic-web-book.org/">Foundations of Semantic Web Technologies</a></em>. Chapman and Hall/CRC Press.</li>
			<li id="ref-OWL2">Pascal Hitzler, Markus Krötzsch, Bijan Parsia, Peter F. Patel-Schneider, and Sebastian Rudolph. 2012. <em><a href="https://www.w3.org/TR/2012/REC-owl2-primer-20121211/">OWL 2 Web Ontology Language Primer (Second Edition)</a></em>. W3C Recommendation. World Wide Web Consortium. December 11, 2012.</li>
			<li id="ref-HoSGKW18">Vinh Thinh Ho, Daria Stepanova, Mohamed H. Gad-Elrab, Evgeny Kharlamov, and Gerhard Weikum. 2018. <a href="https://doi.org/10.1007/978-3-030-00671-6_5">Rule Learning from Knowledge Graphs Guided by Embedding Models</a>. In <em>The Semantic Web - ISWC 2018 - 17th International Semantic Web Conference, Monterey, CA, USA, October 8-12, 2018, Proceedings, Part I</em>, Denny Vrandeči&cacute;, Kalina Bontcheva, Mari Carmen Suárez-Figueroa, Valentina Presutti, Irene Celino, Marta Sabou, Lucie-Aimée Kaffee, and Elena Simperl (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11136. Springer, 72–90.</li>
			<li id="ref-Hoede95">Cornelis Hoede. 1995. On the ontology of knowledge graphs. In <em>Conceptual Structures: Applications, Implementation and Theory, Third International Conference on Conceptual Structures, ICCS '95, Santa Cruz, California, USA, August 14-18, 1995, Proceedings</em>, Gerard Ellis, Robert Levinson, William Rich, and John F. Sowa (Eds.). Lecture Notes in Computer Science, vol.&nbsp;954. Springer, 308–322.</li>
			<li id="ref-YAGO">Johannes Hoffart, Fabian M. Suchanek, Klaus Berberich, Edwin Lewis-Kelham, Gerard de Melo, and Gerhard Weikum. 2011. <a href="https://doi.org/10.1145/1963192.1963296">YAGO2: Exploring and Querying World Knowledge in Time, Space, Context, and Many Languages</a>. In <em>Proceedings of the 20th International Conference on World Wide Web, WWW 2011, Hyderabad, India, March 28 - April 1, 2011 (Companion Volume)</em>, Sadagopan Srinivasan, Krithi Ramamritham, Arun Kumar, M. P. Ravindra, Elisa Bertino, and Ravi Kumar (Eds.). ACM Press, 229–232.</li>
			<li id="ref-HoffmannZLZW11">Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke S. Zettlemoyer, and Daniel S. Weld. 2011. Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations. In <em>The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 19-24 June, 2011, Portland, Oregon, USA</em>, Dekang Lin, Yuji Matsumoto, and Rada Mihalcea (Eds.). The Association for Computational Linguistics, 541–550.</li>
			<li id="ref-Hogan17">Aidan Hogan. 2017. <a href="https://doi.org/10.1145/3068333">Canonical Forms for Isomorphic and Equivalent RDF Graphs: Algorithms for Leaning and Labelling Blank Nodes</a>. <em>ACM Transactions on the Web</em> 11(4), 22:1–22:62.</li>
			<li id="ref-Hogan20">Aidan Hogan. 2020. <a href="https://doi.org/10.1007/978-3-030-60067-9_8">Knowledge Graphs: Research Directions</a>. In <em>Reasoning Web. Declarative Artificial Intelligence – 16th International Summer School 2020, Oslo, Norway, June 24–26, 2020, Tutorial Lectures</em>, Marco Manna and Andreas Pieris (Eds.). Lecture Notes in Computer Science, vol.&nbsp;12258. Springer, 223–253.</li>
			<li id="ref-Hogan20a">Aidan Hogan. 2020. <em><a href="https://doi.org/10.1007/978-3-030-51580-5">The Web of Data</a></em>. Springer.</li>
			<li id="ref-Hogan">Aidan Hogan, Andreas Harth, Alexandre Passant, Stefan Decker, and Axel Polleres. 2010. <a href="http://ceur-ws.org/Vol-628/ldow2010_paper04.pdf">Weaving the Pedantic Web</a>. In <em>Proceedings of the WWW2010 Workshop on Linked Data on the Web, LDOW 2010, Raleigh, USA, April 27, 2010</em>, Christian Bizer, Tom Heath, Tim Berners-Lee, and Michael Hausenblas (Eds.). CEUR Workshop Proceedings, vol.&nbsp;628. Sun SITE Central Europe (CEUR). 10 pages.</li>
			<li id="ref-HoganUHCPD12">Aidan Hogan, Jürgen Umbrich, Andreas Harth, Richard Cyganiak, Axel Polleres, and Stefan Decker. 2012. <a href="https://doi.org/10.1016/j.websem.2012.02.001">An empirical survey of Linked Data conformance</a>. <em>Journal of Web Semantics</em> 14, 14–44.</li>
			<li id="ref-HoganZUPD12">Aidan Hogan, Antoine Zimmermann, Jürgen Umbrich, Axel Polleres, and Stefan Decker. 2012. <a href="https://doi.org/10.1016/j.websem.2011.11.002">Scalable and distributed methods for entity matching, consolidation and disambiguation over Linked Data corpora</a>. <em>Journal of Web Semantics</em> 10, 76–110.</li>
			<li id="ref-HoganAMP14">Aidan Hogan, Marcelo Arenas, Alejandro Mallea, and Axel Polleres. 2014. <a href="https://doi.org/10.1016/j.websem.2014.06.004">Everything you always wanted to know about blank nodes</a>. <em>Journal of Web Semantics</em> 27–28, 42–69.</li>
			<li id="ref-HoganRS20">Aidan Hogan, Juan L. Reutter, and Adrián Soto. 2020. <a href="https://doi.org/10.1007/978-3-030-62419-4_29">In-Database Graph Analytics with Recursive SPARQL</a>. In <em>The Semantic Web - ISWC 2020 - 19th International Semantic Web Conference, Athens, Greece, November 2-6, 2020, Proceedings, Part I</em>, Jeff Z. Pan, Valentina A. M. Tamma, Claudia d'Amato, Krzysztof Janowicz, Bo Fu, Axel Polleres, Oshani Seneviratne, and Lalana Kagal (Eds.). Lecture Notes in Computer Science, vol.&nbsp;12506. Springer, 511–528.</li>
			<li id="ref-HorrocksP04">Ian Horrocks and Peter F. Patel-Schneider. 2004. <a href="https://doi.org/10.1016/j.websem.2004.06.003">Reducing OWL entailment to description logic satisfiability</a>. <em>Journal of Web Semantics</em> 1(4), 345–357.</li>
			<li id="ref-swrl">Ian Horrocks, Peter F. Patel-Schneider, Harold Boley, Said Tabet, Benjamin Grosof, and Mike Dean. 2004. <em><a href="https://www.w3.org/Submission/2004/SUBM-SWRL-20040521/">SWRL: A Semantic Web Rule Language Combining OWL and RuleML</a></em>. W3C Member Submission. May 21, 2004.</li>
			<li id="ref-Hu0YWZ18">Sen Hu, Lei Zou, Jeffrey Xu Yu, Haixun Wang, and Dongyan Zhao. 2018. <a href="https://doi.org/10.1109/TKDE.2017.2766634">Answering Natural Language Questions by Subgraph Matching over Knowledge Graphs</a>. <em>IEEE Transactions on Knowledge and Data Engineering</em> 30(5), 824–837.</li>
			<li id="ref-HuangZLL19">Xiao Huang, Jingyuan Zhang, Dingcheng Li, and Ping Li. 2019. <a href="https://doi.org/10.1145/3289600.3290956">Knowledge Graph Embedding Based Question Answering</a>. In <em>Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, WSDM 2019, Melbourne, VIC, Australia, February 11-15, 2019</em>, J. Shane Culpepper, Alistair Moffat, Paul N. Bennett, and Kristina Lerman (Eds.). ACM Press, 105–113.</li>
			<li id="ref-HuntT03a">Andy Hunt and Dave Thomas. 2003. <a href="https://doi.org/10.1109/MS.2003.1196331">The Trip-Packing Dilemma</a>. <em>IEEE Software</em> 20(3), 106–107.</li>
			<li id="ref-HusseinYC18">Rana Hussein, Dingqi Yang, and Philippe Cudré-Mauroux. 2018. <a href="https://doi.org/10.1145/3269206.3271777">Are Meta-Paths Necessary?: Revisiting Heterogeneous Graph Embeddings</a>. In <em>Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM 2018, Torino, Italy, October 22-26, 2018</em>, Alfredo Cuzzocrea, James Allan, Norman W. Paton, Divesh Srivastava, Rakesh Agrawal, Andrei Z. Broder, Mohammed J. Zaki, K. Selçuk Candan, Alexandros Labrinidis, Assaf Schuster, and Haixun Wang (Eds.). ACM Press, 437–446.</li>
			<li id="ref-HutchisonHS17">Dylan Hutchison, Bill Howe, and Dan Suciu. 2017. <a href="https://doi.org/10.1145/3070607.3070608">LaraDB: A Minimalist Kernel for Linear and Relational Algebra Computation</a>. In <em>Proceedings of the 4th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond, BeyondMR@SIGMOD 2017, Chicago, IL, USA, May 19, 2017</em>, Foto N. Afrati and Jacek Sroka (Eds.). ACM Press, 2:1–2:10.</li>
			<li id="ref-HyvonenMKAKRSTPKVTPFSPLN09">Eero Hyvönen, Eetu Mäkelä, Tomi Kauppinen, Olli Alm, Jussi Kurki, Tuukka Ruotsalo, Katri Seppälä, Joeli Takala, Kimmo Puputti, Heini Kuittinen, Kim Viljanen, Jouni Tuominen, Tuomas Palonen, Matias Frosterus, Reetta Sinkkilä, Panu Paakkarinen, Joonas Laitio, and Katariina Nyberg. 2009. CultureSampo: A National Publication System of Cultural Heritage on the Semantic Web 2.0. In <em>The Semantic Web: Research and Applications, 6th European Semantic Web Conference, ESWC 2009, Heraklion, Crete, Greece, May 31-June 4, 2009, Proceedings</em>, Lora Aroyo, Paolo Traverso, Fabio Ciravegna, Philipp Cimiano, Tom Heath, Eero Hyvönen, Riichiro Mizoguchi, Eyal Oren, Marta Sabou, and Elena Paslaru Bontas Simperl (Eds.). Lecture Notes in Computer Science, vol.&nbsp;5554. Springer, 851–856.</li>
			<li id="ref-IanaJNBHP19">Andreea Iana, Steffen Jung, Philipp Naeser, Aliaksandr Birukou, Sven Hertling, and Heiko Paulheim. 2019. Building a Conference Recommender System Based on SciGraph and WikiCFP. In <em>Semantic Systems. The Power of AI and Knowledge Graphs - 15th International Conference, SEMANTiCS 2019, Karlsruhe, Germany, September 9-12, 2019, Proceedings</em>, Maribel Acosta, Philippe Cudré-Mauroux, Maria Maleshkova, Tassilo Pellegrini, Harald Sack, and York Sure-Vetter (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11702. Springer, 117–123.</li>
			<li id="ref-odrl">Renato Iannella and Serena Villata. 2018. <em><a href="https://www.w3.org/TR/odrl-model/">ODRL Information Model 2.2</a></em>. W3C Recommendation. World Wide Web Consortium. February 15, 2018.</li>
			<li id="ref-snomed2019">International Health Terminology Standards Development Organisation. 2019. <a href="https://confluence.ihtsdotools.org/display/DOCEG?preview=/71172150/94404969/SNOMED%20CT%20Editorial%20Guide-20190731.pdf">SNOMED CT Editorial Guide</a>. July 31, 2019.</li>
			<li id="ref-IosupHNHPMCCSAT16">Alexandru Iosup, Tim Hegeman, Wing Lung Ngai, Stijn Heldens, Arnau Prat-Pérez, Thomas Manhardt, Hassan Chafi, Mihai Capota, Narayanan Sundaram, Michael J. Anderson, Ilie Gabriel Tanase, Yinglong Xia, Lifeng Nai, and Peter A. Boncz. 2016. <a href="http://www.vldb.org/pvldb/vol9/p1317-iosup.pdf">LDBC Graphalytics: A Benchmark for Large-Scale Graph Analysis on Parallel and Distributed Platforms</a>. <em>Proceedings of the VLDB Endowment</em> 9(13), 1317–1328.</li>
			<li id="ref-isele2011efficient">Robert Isele, Anja Jentzsch, and Christian Bizer. 2011. Efficient multidimensional blocking for link discovery without losing recall. In <em>Proceedings of the 14th International Workshop on the Web and Databases 2011, WebDB 2011, Athens, Greece, June 12, 2011</em>, Amélie Marian and Vasilis Vassalos (Eds.). 6 pages.</li>
			<li id="ref-james">P. James. 1992. Knowledge Graphs. <em>Linguistic Instruments in Knowledge Engineering</em>.</li>
			<li id="ref-JankeS18">Daniel Janke and Steffen Staab. 2018. <a href="https://doi.org/10.1007/978-3-030-00338-8_7">Storing and Querying Semantic Data in the Cloud</a>. In <em>Reasoning Web. Learning, Uncertainty, Streaming, and Scalability - 14th International Summer School 2018, Esch-sur-Alzette, Luxembourg, September 22-26, 2018, Tutorial Lectures</em>, Claudia d'Amato and Martin Theobald (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11078. Springer, 173–222.</li>
			<li id="ref-JankeST18">Daniel Janke, Steffen Staab, and Matthias Thimm. 2018. <a href="https://doi.org/10.1016/j.websem.2018.02.002">Impact analysis of data placement strategies on query efforts in distributed RDF stores</a>. <em>Journal of Web Semantics</em> 50, 21–48.</li>
			<li id="ref-JanowiczHAKV14">Krzysztof Janowicz, Pascal Hitzler, Benjamin Adams, Dave Kolas, and Charles Vardeman. 2014. <a href="https://doi.org/10.3233/SW-140135">Five stars of Linked Data vocabulary use</a>. <em>Semantic Web Journal</em> 5(3), 173–176.</li>
			<li id="ref-Janowicz0RZM18">Krzysztof Janowicz, Bo Yan, Blake Regalia, Rui Zhu, and Gengchen Mai. 2018. <a href="http://ceur-ws.org/Vol-2180/ISWC_2018_Outrageous_Ideas_paper_17.pdf">Debiasing Knowledge Graphs: Why Female Presidents are not like Female Popes</a>. In <em>Proceedings of the ISWC 2018 Posters &amp; Demonstrations, Industry and Blue Sky Ideas Tracks co-located with 17th International Semantic Web Conference (ISWC 2018), Monterey, USA, October 8th - to - 12th, 2018</em>, Marieke van Erp, Medha Atre, Vanessa López, Kavitha Srinivas, and Carolina Fortuna (Eds.). CEUR Workshop Proceedings, vol.&nbsp;2180. Sun SITE Central Europe (CEUR). 5 pages.</li>
			<li id="ref-JayaramGL15">Nandish Jayaram, Sidharth Goyal, and Chengkai Li. 2015. <a href="https://doi.org/10.14778/2824032.2824106">VIIQ: Auto-Suggestion Enabled Visual Interface for Interactive Graph Query Formulation</a>. <em>Proceedings of the VLDB Endowment</em> 8(12), 1940–1943.</li>
			<li id="ref-JayaramKLYE15">Nandish Jayaram, Arijit Khan, Chengkai Li, Xifeng Yan, and Ramez Elmasri. 2015. <a href="https://doi.org/10.1109/TKDE.2015.2426696">Querying Knowledge Graphs by Example Entity Tuples</a>. <em>IEEE Transactions on Knowledge and Data Engineering</em> 27(10), 2797–2811.</li>
			<li id="ref-TransD">Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, and Jun Zhao. 2015. <a href="https://doi.org/10.3115/v1/p15-1067">Knowledge graph embedding via dynamic mapping matrix</a>. In <em>Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL 2015, July 26-31, 2015, Beijing, China, Volume 1: Long Papers</em>. The Association for Computational Linguistics, 687–696.</li>
			<li id="ref-Jiang02">Yun-fei Jiang and Ning Ma. 2002. A Plan Recognition Algorithm Based on Plan Knowledge Graph. <em>Journal of Software</em> 13.</li>
			<li id="ref-JurafskyM18">Dan Jurafsky and James H. Martin. 2019. <em><a href="https://web.stanford.edu/~jurafsky/slp3/">Speech and Language Processing</a></em>.</li>
			<li id="ref-JurgensNavigli14">David Jurgens and Roberto Navigli. 2014. <a href="https://tacl2013.cs.columbia.edu/ojs/index.php/tacl/article/view/421">It's All Fun and Games until Someone Annotates: Video Games with a Purpose for Linguistic Annotation</a>. <em>Transactions of the Association for Computational Linguistics</em> 2, 449–464.</li>
			<li id="ref-KaferAUOH13">Tobias Käfer, Ahmed Abdelrahman, Jürgen Umbrich, Patrick O'Byrne, and Aidan Hogan. 2013. Observing Linked Data Dynamics. In <em>The Semantic Web: Semantics and Big Data, 10th International Conference, ESWC 2013, Montpellier, France, May 26-30, 2013. Proceedings</em>, Philipp Cimiano, Óscar Corcho, Valentina Presutti, Laura Hollink, and Sebastian Rudolph (Eds.). Lecture Notes in Computer Science, vol.&nbsp;7882. Springer, 213–227.</li>
			<li id="ref-KaffeePVSCP17">Lucie-Aimée Kaffee, Alessandro Piscopo, Pavlos Vougiouklis, Elena Simperl, Leslie Carr, and Lydia Pintscher. 2017. A Glimpse into Babel: An Analysis of Multilinguality in Wikidata. In <em>Proceedings of the 13th International Symposium on Open Collaboration, OpenSym 2017, Galway, Ireland, August 23-25, 2017</em>, Lorraine Morgan (Ed.). ACM Press, 14:1–14:5.</li>
			<li id="ref-Kamp1981ATheoryOfTruth">Hans Kamp. 1981. A Theory of Truth and Semantic Representation. In <em>Formal Semantics – the Essential Readings</em>, Paul H. Portner and Barbara H. Partee (Eds.). Blackwell, 189–222.</li>
			<li id="ref-abs-1805-05744">Elias Kärle, Umutcan Simsek, Oleksandra Panasiuk, and Dieter Fensel. 2018. <a href="http://arxiv.org/abs/1805.05744">Building an Ecosystem for the Tyrolean Tourism Knowledge Graph</a>. <em>CoRR</em> abs/1805.05744. 8 pages.</li>
			<li id="ref-KasneciSIRW08">Gjergji Kasneci, Fabian M. Suchanek, Georgiana Ifrim, Maya Ramanath, and Gerhard Weikum. 2008. NAGA: Searching and Ranking Knowledge. In <em>Proceedings of the 24th International Conference on Data Engineering, ICDE 2008, April 7-12, 2008, Cancún, Mexico</em>, Gustavo Alonso, José A. Blakeley, and Arbee L. P. Chen (Eds.). IEEE Computer Society, 953–962.</li>
			<li id="ref-kasten2013towards">Andreas Kasten, Ansgar Scherp, Frederik Armknecht, and Matthias Krause. 2013. <a href="http://ceur-ws.org/Vol-1121/privon2013_paper5.pdf">Towards Search on Encrypted Graph Data</a>. In <em>Proceedings of the Workshop on Society, Privacy and the Semantic Web - Policy and Technology (PrivOn2013) co-located with the 12th International Semantic Web Conference (ISWC 2013), Sydney, Australia, October 22, 2013</em>, Stefan Decker, Jim Hendler, and Sabrina Kirrane (Eds.). CEUR Workshop Proceedings, vol.&nbsp;1121. Sun SITE Central Europe (CEUR), 46–57.</li>
			<li id="ref-Kazemi018">Seyed Mehran Kazemi and David Poole. 2018. <a href="https://proceedings.neurips.cc/paper/2018/hash/b2ab001909a8a6f04b51920306046ce5-Abstract.html">SimplE Embedding for Link Prediction in Knowledge Graphs</a>. In <em>Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada</em>, Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicolò Cesa-Bianchi, and Roman Garnett (Eds.), 4289–4300.</li>
			<li id="ref-KazemiGJKSFP19">Seyed Mehran Kazemi, Rishab Goel, Kshitij Jain, Ivan Kobyzev, Akshay Sethi, Peter Forsyth, and Pascal Poupart. 2019. <a href="http://arxiv.org/abs/1905.11485">Relational Representation Learning for Dynamic (Knowledge) Graphs: A Survey</a>. <em>CoRR</em> abs/1905.11485.</li>
			<li id="ref-keet2016test">C. Maria Keet. 2016. Test-driven development of ontologies. In <em>The Semantic Web. Latest Advances and New Domains - 13th International Conference, ESWC 2016, Heraklion, Crete, Greece, May 29 - June 2, 2016, Proceedings</em>, Harald Sack, Eva Blomqvist, Mathieu d'Aquin, Chiara Ghidini, Simone Paolo Ponzetto, and Christoph Lange (Eds.). Lecture Notes in Computer Science, vol.&nbsp;9678. Springer, 642–657.</li>
			<li id="ref-onteng">C. Maria Keet. 2018. <em>An Introduction to Ontology Engineering</em>. College Publications.</li>
			<li id="ref-KejriwalKS2021">Mayank Kejriwal, Craig A. Knoblock, and Pedro Szekely (Eds.). 2021. <em>Knowledge Graphs: Fundamentals, Techniques, and Applications</em>. The MIT Press.</li>
			<li id="ref-kendall2019ontology">Elisa F. Kendall and Deborah L. McGuinness. 2019. <em>Ontology Engineering</em>. Synthesis Lectures on the Semantic Web: Theory and Technology, vol.&nbsp;9. Morgan &amp; Claypool.</li>
			<li id="ref-rif">Michael Kifer and Harold Boley. 2013. <em><a href="https://www.w3.org/TR/2013/NOTE-rif-overview-20130205/">RIF Overview (Second Edition)</a></em>. W3C Working Group Note. World Wide Web Consortium. February 5, 2013.</li>
			<li id="ref-KipfW17">Thomas N. Kipf and Max Welling. 2017. <a href="https://openreview.net/forum?id=SJU4ayYgl">Semi-Supervised Classification with Graph Convolutional Networks</a>. In <em>5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings</em>. OpenReview.net. 14 pages.</li>
			<li id="ref-Kirrane2013">Sabrina Kirrane, Ahmed Abdelrahman, Alessandra Mileo, and Stefan Decker. 2013. Secure Manipulation of Linked Data. In <em>The Semantic Web - ISWC 2013 - 12th International Semantic Web Conference, Sydney, NSW, Australia, October 21-25, 2013, Proceedings, Part I</em>, Harith Alani, Lalana Kagal, Achille Fokoue, Paul Groth, Chris Biemann, Josiane Xavier Parreira, Lora Aroyo, Natasha Noy, Chris Welty, and Krzysztof Janowicz (Eds.). Lecture Notes in Computer Science, vol.&nbsp;8218. Springer, 248–263.</li>
			<li id="ref-kirrane2017access">Sabrina Kirrane, Alessandra Mileo, and Stefan Decker. 2017. <a href="https://doi.org/10.3233/SW-160236">Access control and the Resource Description Framework: A survey</a>. <em>Semantic Web Journal</em> 8(2), 311–352.</li>
			<li id="ref-kleinberg1999hubs">Jon M. Kleinberg. 1999. <a href="https://doi.org/10.1145/345966.345982">Hubs, authorities, and communities</a>. <em>ACM Computing Surveys</em> 31(4es), p5.</li>
			<li id="ref-KlueglAP09">Peter Kluegl, Martin Atzmueller, and Frank Puppe. 2009. TextMarker: A Tool for Rule-Based Information Extraction. In <em>UIMA-GSCL Workshop</em>, 233–240.</li>
			<li id="ref-SHACLSpec">Holger Knublauch and Dimitris Kontokostas. 2017. <em><a href="https://www.w3.org/TR/2017/REC-shacl-20170720/">Shapes Constraint Language (SHACL)</a></em>. W3C Recommendation. World Wide Web Consortium. July 20, 2017.</li>
			<li id="ref-spin">Holger Knublauch, James A. Hendler, and Kingsley Idehen. 2011. <em>SPIN – Overview and Motivation</em>. W3C Member Submission. February 22, 2011.</li>
			<li id="ref-KopckeR10">Hanna Köpcke and Erhard Rahm. 2010. <a href="https://doi.org/10.1016/j.datak.2009.10.003">Frameworks for entity matching: A comparison</a>. <em>Data and Knowledge Engineering</em> 69(2), 197–210.</li>
			<li id="ref-AmazonKG">Arun Krishnan. 2018. <a href="https://blog.aboutamazon.com/innovation/making-search-easier">Making search easier: How Amazon's Product Graph is helping customers find products more easily</a>. Amazon Blog. August 17, 2018.</li>
			<li id="ref-KrisnadhiH16">Adila Krisnadhi and Pascal Hitzler. 2016. <a href="https://doi.org/10.3233/978-1-61499-826-6-29">A Core Pattern for Events</a>. In <em>Advances in Ontology Design and Patterns [revised and extended versions of the papers presented at the 7th edition of the Workshop on Ontology and Semantic Web Patterns, WOP@ISWC 2016, Kobe, Japan, 18th October 2016]</em>, Karl Hammar, Pascal Hitzler, Adila Krisnadhi, Agnieszka Lawrynowicz, Andrea Giovanni Nuzzolese, and Monika Solanki (Eds.). Studies on the Semantic Web, vol.&nbsp;32. IOS Press, 29–37.</li>
			<li id="ref-hitzler2016modeling">Adila Krisnadhi and Pascal Hitzler. 2016. Modeling With Ontology Design Patterns: Chess Games As a Worked Example. In <em>Ontology Engineering with Ontology Design Patterns: Foundations and Applications</em>, Pascal Hitzler, Aldo Gangemi, Krzysztof Janowicz, Adila Krisnadhi, and Valentina Presutti (Eds.). Studies on the Semantic Web, vol.&nbsp;25. IOS Press, 3–21.</li>
			<li id="ref-KrizhevskySH17">Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2017. <a href="https://doi.org/10.1145/3065386">ImageNet classification with deep convolutional neural networks</a>. <em>Communications of the ACM</em> 60(6), 84–90.</li>
			<li id="ref-Krotzsch0OT18">Markus Krötzsch, Maximilian Marx, Ana Ozaki, and Veronika Thost. 2018. <a href="https://doi.org/10.24963/ijcai.2018/743">Attributed Description Logics: Reasoning on Knowledge Graphs</a>. In <em>Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden</em>, Jérôme Lang (Ed.). IJCAI/AAAI, 5309–5313.</li>
			<li id="ref-Kummel73">Peter Kümmel. 1973. <a href="https://www.aclweb.org/anthology/C73-2020.pdf">An Algorithm of Limited Syntax Based on Language Universals</a>. In <em>Computational And Mathematical Linguistics: Proceedings of the 5th International Conference on Computational Linguistics, COLING 1973, Pisa, Italy, August 27 - September 1, 1973</em>, Antonio Zampolli and Nicoletta Calzolari (Eds.). The Association for Computational Linguistics, 225–248.</li>
			<li id="ref-Kung82">H. T. Kung. 1982. <a href="https://doi.org/10.1109/MC.1982.1653825">Why Systolic Architectures?</a>. <em>IEEE Computer</em> 15(1), 37–46.</li>
			<li id="ref-Labra2017">Jose Emilio Labra Gayo, Eric Prud'hommeaux, Iovka Boneva, and Dimitris Kontokostas. 2018. <em><a href="https://doi.org/10.2200/s00786ed1v01y201707wbe016">Validating RDF Data</a></em>. Synthesis Lectures on the Semantic Web: Theory and Technology, vol.&nbsp;7. Morgan &amp; Claypool.</li>
			<li id="ref-Labra-Gayo2019">Jose Emilio Labra Gayo, Herminio García-González, Daniel Fernández-Alvarez, and Eric Prud'hommeaux. 2019. <a href="https://doi.org/10.1007/978-3-030-06149-4_6">Challenges in RDF Validation</a>. In <em>Current Trends in Semantic Web Technologies: Theory and Practice</em>, Giner Alor-Hernández, José Luis Sánchez-Cervantes, Alejandro Rodríguez-González, and Rafael Valencia-García (Eds.). Studies in Computational Intelligence. Springer, 121–151.</li>
			<li id="ref-LampleBSKD16">Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural Architectures for Named Entity Recognition. In <em>NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego California, USA, June 12-17, 2016</em>, Kevin Knight, Ani Nenkova, and Owen Rambow (Eds.). The Association for Computational Linguistics, 260–270.</li>
			<li id="ref-lao2010relational">Ni Lao and William W. Cohen. 2010. <a href="https://doi.org/10.1007/s10994-010-5205-8">Relational retrieval using a combination of path-constrained random walks</a>. <em>Machine Learning</em> 81(1), 53–67.</li>
			<li id="ref-LehmannFGNSSUBGHLA12">Jens Lehmann, Tim Furche, Giovanni Grasso, Axel-Cyrille Ngonga Ngomo, Christian Schallhart, Andrew Jon Sellers, Christina Unger, Lorenz Bühmann, Daniel Gerber, Konrad Höffner, David Liu, and Sören Auer. 2012. deqa: Deep Web Extraction for Question Answering. In <em>The Semantic Web - ISWC 2012 - 11th International Semantic Web Conference, Boston, MA, USA, November 11-15, 2012, Proceedings, Part II</em>, Philippe Cudré-Mauroux, Jeff Heflin, Evren Sirin, Tania Tudorache, Jérôme Euzenat, Manfred Hauswirth, Josiane Xavier Parreira, Jim Hendler, Guus Schreiber, Abraham Bernstein, and Eva Blomqvist (Eds.). Lecture Notes in Computer Science, vol.&nbsp;7650. Springer, 131–147.</li>
			<li id="ref-LehmannIJJKMHMK15">Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N. Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick van Kleef, Sören Auer, and Christian Bizer. 2015. <a href="https://doi.org/10.3233/SW-140134">DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia</a>. <em>Semantic Web Journal</em> 6(2), 167–195.</li>
			<li id="ref-LehmbergRMB16">Oliver Lehmberg, Dominique Ritze, Robert Meusel, and Christian Bizer. 2016. A Large Public Corpus of Web Tables containing Time and Context Metadata. In <em>Proceedings of the 25th International Conference on World Wide Web, WWW 2016, Montreal, Canada, April 11-15, 2016, Companion Volume</em>, Jacqueline Bourdeau, Jim Hendler, Roger Nkambou, Ian Horrocks, and Ben Y. Zhao (Eds.). ACM Press, 75–76.</li>
			<li id="ref-Lei">Yuangui Lei, Victoria Uren, and Enrico Motta. 2007. A framework for evaluating semantic metadata. In <em>Proceedings of the Fourth International Conference on Knowledge Capture</em>, Derek Sleeman and Ken Barker (Eds.). ACM Press, 135–142.</li>
			<li id="ref-lenat1995cyc">Douglas B. Lenat. 1995. <a href="https://doi.org/10.1145/219717.219745">CYC: A large-scale investment in knowledge infrastructure</a>. <em>Communications of the ACM</em> 38(11), 33–38.</li>
			<li id="ref-LeveneP89">Mark Levene and Alexandra Poulovassilis. 1989. The Hypernode Model: A Graph-Theoretic Approach to Integrating Data and Computation. In <em>Workshop on Foundations of Models and Languages for Data and Objects, Aigen, Austria, 25.-29. September 1989</em>, Andreas Heuer (Ed.). Informatik-Berichte des IfI, vol.&nbsp;89-2. Technische Universität Clausthal, 55–77.</li>
			<li id="ref-li2007t">Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian. 2007. t-closeness: Privacy beyond k-anonymity and l-diversity. In <em>Proceedings of the 23rd International Conference on Data Engineering, ICDE 2007, The Marmara Hotel, Istanbul, Turkey, April 15-20, 2007</em>, Rada Chirkova, Asuman Dogac, M. Tamer Özsu, and Timos K. Sellis (Eds.). IEEE Computer Society, 106–115.</li>
			<li id="ref-LimayeSC10">Girija Limaye, Sunita Sarawagi, and Soumen Chakrabarti. 2010. <a href="http://www.vldb.org/pvldb/vldb2010/pvldb_vol3/R118.pdf">Annotating and Searching Web Tables Using Entities, Types and Relationships</a>. <em>Proceedings of the VLDB Endowment</em> 3(1), 1338–1347.</li>
			<li id="ref-LinT17">Zhiyuan Lin and Mahesh Tripunitara. 2017. <a href="https://doi.org/10.1145/3029806.3029827">Graph Automorphism-Based, Semantics-Preserving Security for the Resource Description Framework (RDF)</a>. In <em>Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy, CODASPY 2017, Scottsdale, AZ, USA, March 22-24, 2017</em>, Gail-Joon Ahn, Alexander Pretschner, and Gabriel Ghinita (Eds.). ACM Press, 337–348.</li>
			<li id="ref-lin2015learning">Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015. <a href="http://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9571">Learning entity and relation embeddings for knowledge graph completion</a>. In <em>Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, January 25-30, 2015, Austin, Texas, USA</em>, Blai Bonet and Sven Koenig (Eds.). AAAI Press, 2181–2187.</li>
			<li id="ref-LingW12">Xiao Ling and Daniel S. Weld. 2012. <a href="http://www.aaai.org/ocs/index.php/AAAI/AAAI12/paper/view/5152">Fine-Grained Entity Recognition</a>. In <em>Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, July 22-26, 2012, Toronto, Ontario, Canada</em>, Jörg Hoffmann and Bart Selman (Eds.). AAAI Press, 94–100.</li>
			<li id="ref-LiuT08">Kun Liu and Evimaria Terzi. 2008. Towards identity anonymization on graphs. In <em>Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, Vancouver, BC, Canada, June 10-12, 2008</em>, Jason Tsong-Li Wang (Ed.). ACM Press, 93–106.</li>
			<li id="ref-LiuSDK18">Yike Liu, Tara Safavi, Abhilash Dighe, and Danai Koutra. 2018. <a href="https://doi.org/10.1145/3186727">Graph Summarization Methods and Applications: A Survey</a>. <em>ACM Computing Surveys</em> 51(3), 62:1–62:34.</li>
			<li id="ref-lloyd2012foundations">John W. Lloyd. 1984. <em>Foundations of logic programming</em>. Springer.</li>
			<li id="ref-LockardDSE18">Colin Lockard, Xin Luna Dong, Prashant Shiralkar, and Arash Einolghozati. 2018. CERES: Distantly Supervised Relation Extraction from the Semi-Structured Web. <em>Proceedings of the VLDB Endowment</em> 11(10), 1084–1096.</li>
			<li id="ref-canon">Dave Longley and Manu Sporny. 2019. <em><a href="http://json-ld.github.io/normalization/spec/">RDF Dataset Normalization, A Standard RDF Dataset Normalization Algorithm</a></em>. W3C Community Group Draft Report. February 27, 2019.</li>
			<li id="ref-LowGKBGH12">Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, and Joseph M. Hellerstein. 2012. <a href="http://vldb.org/pvldb/vol5/p716_yuchenglow_vldb2012.pdf">Distributed GraphLab: A Framework for Machine Learning in the Cloud</a>. <em>Proceedings of the VLDB Endowment</em> 5(8), 716–727.</li>
			<li id="ref-LuBLCG13">Chunliang Lu, Lidong Bing, Wai Lam, Ki Chan, and Yuan Gu. 2013. Web Entity Detection for Semi-structured Text Data Records with Unlabeled Data. <em>International Journal of Computational Linguistics and Applications</em> 4(2), 135–150.</li>
			<li id="ref-LuLS16">Chun Lu, Philippe Laublet, and Milan Stankovic. 2016. <a href="https://doi.org/10.1007/978-3-319-49004-5_27">Travel Attractions Recommendation with Knowledge Graphs</a>. In <em>Knowledge Engineering and Knowledge Management - 20th International Conference, EKAW 2016, Bologna, Italy, November 19-23, 2016, Proceedings</em>, Eva Blomqvist, Paolo Ciancarini, Francesco Poggi, and Fabio Vitali (Eds.). Lecture Notes in Computer Science, vol.&nbsp;10024. Springer, 416–431.</li>
			<li id="ref-LukasiewiczMS13">Thomas Lukasiewicz, Maria Vanina Martinez, and Gerardo I. Simari. 2013. <a href="http://ceur-ws.org/Vol-1014/paper_6.pdf">Complexity of Inconsistency-Tolerant Query Answering in Datalog+/-</a>. In <em>Informal Proceedings of the 26th International Workshop on Description Logics, Ulm, Germany, July 23 - 26, 2013</em>, Thomas Eiter, Birte Glimm, Yevgeny Kazakov, and Markus Krötzsch (Eds.). CEUR Workshop Proceedings, vol.&nbsp;1014. Sun SITE Central Europe (CEUR), 791–803.</li>
			<li id="ref-LuoHLN15">Gang Luo, Xiaojiang Huang, Chin-Yew Lin, and Zaiqing Nie. 2015. <a href="https://www.aclweb.org/anthology/volumes/D15-1/">Joint Entity Recognition and Disambiguation</a>. In <em>Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, September 17-21, 2015</em>, Lluís Màrquez, Chris Callison-Burch, Jian Su, Daniele Pighin, and Yuval Marton (Eds.). The Association for Computational Linguistics, 879–888.</li>
			<li id="ref-MachadoR90">Ricardo José Machado and Armando Freitas da Rocha. 1990. The Combinatorial Neural Network: A Connectionist Model for Knowledge Based Systems. In <em>Uncertainty in Knowledge Bases, 3rd International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, IPMU '90, Paris, France, July 2-6, 1990, Proceedings</em>, Bernadette Bouchon-Meunier, Ronald R. Yager, and Lotfi A. Zadeh (Eds.). Lecture Notes in Computer Science, vol.&nbsp;521. Springer, 578–587.</li>
			<li id="ref-MadhavanKKGRH08">Jayant Madhavan, David Ko, Lucja Kot, Vignesh Ganapathy, Alex Rasmussen, and Alon Y. Halevy. 2008. Google's Deep Web crawl. <em>Proceedings of the VLDB Endowment</em> 1(2), 1241–1252.</li>
			<li id="ref-MahdisoltaniBS15">Farzaneh Mahdisoltani, Joanna Biega, and Fabian M. Suchanek. 2015. <a href="http://cidrdb.org/cidr2015/Papers/CIDR15_Paper1.pdf">YAGO3: A Knowledge Base from Multilingual Wikipedias</a>. In <em>CIDR 2015, Seventh Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 4-7, 2015, Online Proceedings</em>. www.cidrdb.org. 11 pages.</li>
			<li id="ref-MaillotB18">Pierre Maillot and Carlos Bobed. 2018. <a href="https://doi.org/10.1145/3167132.3167342">Measuring structural similarity between RDF graphs</a>. In <em>Proceedings of the 33rd Annual ACM Symposium on Applied Computing, SAC 2018, Pau, France, April 09-13, 2018</em>, Hisham M. Haddad, Roger L. Wainwright, and Richard Chbeir (Eds.). ACM Press, 1960–1967.</li>
			<li id="ref-MalewiczABDHLC10">Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. <a href="https://doi.org/10.1145/1807167.1807184">Pregel: a system for large-scale graph processing</a>. In <em>Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, Indianapolis, Indiana, USA, June 6-10, 2010</em>, Ahmed K. Elmagarmid and Divyakant Agrawal (Eds.). ACM Press, 135–146.</li>
			<li id="ref-malyshev2018getting">Stanislav Malyshev, Markus Krötzsch, Larry González, Julius Gonsior, and Adrian Bielefeldt. 2018. Getting the most out of Wikidata: Semantic technology usage in Wikipedia’s knowledge graph. In <em>The Semantic Web - ISWC 2018 - 17th International Semantic Web Conference, Monterey, CA, USA, October 8-12, 2018, Proceedings, Part II</em>, Denny Vrandečić, Kalina Bontcheva, Mari Carmen Suárez-Figueroa, Valentina Presutti, Irene Celino, Marta Sabou, Lucie-Aimée Kaffee, and Elena Simperl (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11137. Springer, 376–394.</li>
			<li id="ref-MarchiM74">Ezio Marchi and Osvaldo Miguel. 1974. On the structure of the teaching-learning interactive process. <em>International Journal of Game Theory</em> 3, 83–99.</li>
			<li id="ref-Martinez-Rodriguez18">Jose L. Martínez-Rodríguez, Ivan López-Arévalo, and Ana B. Rios-Alvarado. 2018. OpenIE-based approach for Knowledge Graph construction from text. <em>Expert Systems With Applications</em> 113, 339–355.</li>
			<li id="ref-IESW">Jose L. Martínez-Rodríguez, Aidan Hogan, and Ivan Lopez-Arevalo. 2020. <a href="https://doi.org/10.3233/SW-180333">Information Extraction meets the Semantic Web: A Survey</a>. <em>Semantic Web Journal</em> 11(2), 255–335.</li>
			<li id="ref-MaturanaALMH18">Ricardo Alonso Maturana, Elena Alvarado-Cortes, Susana López-Sola, María Ortega Martínez-Losa, and Pablo Hermoso-González. 2018. <a href="https://doi.org/10.1007/978-3-030-03056-8_20">La Rioja Turismo: The Construction and Exploitation of a Queryable Tourism Knowledge Graph</a>. In <em>Current Trends in Web Engineering - ICWE 2018 International Workshops, MATWEP, EnWot, KD-WEB, WEOD, TourismKG, Cáceres, Spain, June 5, 2018, Revised Selected Papers</em>, Cesare Pautasso, Fernando Sánchez-Figueroa, Kari Systä, and Juan Manuel Murillo Rodriguez (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11153. Springer, 213–220.</li>
			<li id="ref-MatuszekCWD06">Cynthia Matuszek, John Cabral, Michael J. Witbrock, and John De Oliveira. 2006. <a href="http://www.aaai.org/Library/Symposia/Spring/ss06-05.php">An Introduction to the Syntax and Content of Cyc</a>. In <em>Formalizing and Compiling Background Knowledge and Its Applications to Knowledge Representation and Question Answering, Papers from the 2006 AAAI Spring Symposium, Technical Report SS-06-05, Stanford, California, USA, March 27-29, 2006</em>. AAAI Press, 44–49.</li>
			<li id="ref-MausamSSBE12">Mausam, Michael Schmitz, Stephen Soderland, Robert Bart, and Oren Etzioni. 2012. Open Language Learning for Information Extraction. In <em>Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2012, July 12-14, 2012, Jeju Island, Korea</em>, Jun'ichi Tsujii, James Henderson, and Marius Pasca (Eds.). The Association for Computational Linguistics, 523–534.</li>
			<li id="ref-Mausam16">Mausam. 2016. Open Information Extraction Systems and Downstream Applications. In <em>Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 9-15 July 2016</em>, Subbarao Kambhampati (Ed.). IJCAI/AAAI, 4074–4077.</li>
			<li id="ref-NLP-SW">Diana Maynard, Kalina Bontcheva, and Isabelle Augenstein. 2016. <em><a href="https://doi.org/10.2200/S00741ED1V01Y201611WBE015">Natural Language Processing for the Semantic Web</a></em>. Morgan &amp; Claypool.</li>
			<li id="ref-Commonsense">John McCarthy. 1990. <em>Formalizing Common Sense: Papers by John McCarthy</em>. Greenwood Publishing Group.</li>
			<li id="ref-McCarthy93">John McCarthy. 1993. <a href="http://www-formal.stanford.edu/jmc/context3/context3.html">Notes on Formalizing Context</a>. In <em>Proceedings of the 13th International Joint Conference on Artificial Intelligence. Chambéry, France, August 28 - September 3, 1993</em>, Ruzena Bajcsy (Ed.). Morgan Kaufmann, 555–562.</li>
			<li id="ref-BloombergKG">Edgar Meij. 2019. Understanding News using the Bloomberg Knowledge Graph. Invited talk at the Big Data Innovators Gathering (TheWebConf).</li>
			<li id="ref-mendes2012dbpedia">Pablo N. Mendes, Max Jakob, and Christian Bizer. 2012. DBpedia: A Multilingual Cross-domain Knowledge Base. In <em>Proceedings of the Eighth International Conference on Language Resources and Evaluation, LREC 2012, Istanbul, Turkey, May 23-25, 2012</em>, Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Ugur Dogan, Bente Maegaard, Joseph Mariani, Jan Odijk, and Stelios Piperidis (Eds.). European Language Resources Association (ELRA), 1813–1817.</li>
			<li id="ref-MendesMB12">Pablo N. Mendes, Hannes Mühleisen, and Christian Bizer. 2012. <a href="https://doi.org/10.1145/2320765.2320803">Sieve: linked data quality assessment and fusion</a>. In <em>Proceedings of the 2012 Joint EDBT/ICDT Workshops, Berlin, Germany, March 30, 2012</em>, Divesh Srivastava and Ismail Ari (Eds.). ACM Press, 116–123.</li>
			<li id="ref-Mihindukulasooriya18">Nandana Mihindukulasooriya, Mohammad Rifat Ahmmad Rashid, Giuseppe Rizzo, Raúl García-Castro, Óscar Corcho, and Marco Torchiano. 2018. <a href="https://doi.org/10.1145/3167132.3167341">RDF shape induction using knowledge base profiling</a>. In <em>Proceedings of the 33rd Annual ACM Symposium on Applied Computing, SAC 2018, Pau, France, April 09-13, 2018</em>, Hisham M. Haddad, Roger L. Wainwright, and Richard Chbeir (Eds.). ACM Press, 1952–1959.</li>
			<li id="ref-mikolov2013efficient">Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. <a href="http://arxiv.org/abs/1301.3781">Efficient estimation of word representations in vector space</a>. In <em>1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings</em>, Yoshua Bengio and Yann LeCun (Eds.). OpenReview.net. 12 pages.</li>
			<li id="ref-MillerF07">George A. Miller and Christiane Fellbaum. 2007. WordNet then and now. <em>Language Resources and Evaluation (LRE)</em> 41(2), 209–214.</li>
			<li id="ref-Miller13">Justin J. Miller. 2013. <a href="https://aisel.aisnet.org/sais2013/24">Graph Database Applications and Concepts with Neo4j</a>. In <em>Proceedings of the Southern Association for Information Systems Conference, Atlanta, GA, USA March 23rd-24th, 2013</em>. AIS eLibrary, 141–147.</li>
			<li id="ref-minsky">Marvin Minsky. 1974. <a href="https://courses.media.mit.edu/2004spring/mas966/Minsky%201974%20Framework%20for%20knowledge.pdf">A Framework for representing knowledge</a>. <em>MIT-AI Memo 306, Santa Monica</em>. 76 pages.</li>
			<li id="ref-MintzBSJ09">Mike Mintz, Steven Bills, Rion Snow, and Daniel Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In <em>ACL 2009, Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the AFNLP, 2-7 August 2009, Singapore</em>, Keh-Yih Su, Jian Su, and Janyce Wiebe (Eds.). The Association for Computational Linguistics, 1003–1011.</li>
			<li id="ref-MitchellCHTYBCM18">Tom M. Mitchell, William W. Cohen, Estevam R. Hruschka Jr., Partha P. Talukdar, Bo Yang, Justin Betteridge, Andrew Carlson, Bhavana Dalvi Mishra, Matt Gardner, Bryan Kisiel, Jayant Krishnamurthy, Ni Lao, Kathryn Mazaitis, Thahir Mohamed, Ndapandula Nakashole, Emmanouil A. Platanios, Alan Ritter, Mehdi Samadi, Burr Settles, Richard C. Wang, Derry Wijaya, Abhinav Gupta, Xinlei Chen, Abulhair Saparov, Malcolm Greaves, and Joel Welling. 2018. Never-ending learning. <em>Communications of the ACM</em> 61(5), 103–115.</li>
			<li id="ref-MontiBMRSB17">Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodolà, Jan Svoboda, and Michael M. Bronstein. 2017. <a href="https://doi.org/10.1109/CVPR.2017.576">Geometric Deep Learning on Graphs and Manifolds Using Mixture Model CNNs</a>. In <em>2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017</em>. IEEE Computer Society, 5425–5434.</li>
			<li id="ref-Montiel-Ponsoda17">Elena Montiel-Ponsoda, Víctor Rodríguez-Doncel, and Jorge Gracia. 2017. <a href="http://ceur-ws.org/Vol-2049/02paper.pdf">Building the Legal Knowledge Graph for Smart Compliance Services in Multilingual Europe</a>. In <em>Proceedings of the 1st Workshop on Technologies for Regulatory Compliance co-located with the 30th International Conference on Legal Knowledge and Information Systems (JURIX 2017), Luxembourg, December 13, 2017</em>, Víctor Rodríguez-Doncel, Pompeu Casanovas, and Jorge González-Conejero (Eds.). CEUR Workshop Proceedings, vol.&nbsp;2049. Sun SITE Central Europe (CEUR), 15–17.</li>
			<li id="ref-MoreauSPD19">Benjamin Moreau, Patricia Serrano-Alvarado, Matthieu Perrin, and Emmanuel Desmontils. 2019. Modelling the Compatibility of Licenses. In <em>The Semantic Web - 16th International Conference, ESWC 2019, Portorož, Slovenia, June 2-6, 2019, Proceedings</em>, Pascal Hitzler, Miriam Fernández, Krzysztof Janowicz, Amrapali Zaveri, Alasdair J. G. Gray, Vanessa López, Armin Haller, and Karl Hammar (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11503. Springer, 255–269.</li>
			<li id="ref-Moreno-VegaH18">José Moreno-Vega and Aidan Hogan. 2018. <a href="https://doi.org/10.1007/978-3-030-00671-6_18">GraFa: Scalable Faceted Browsing for RDF Graphs</a>. In <em>The Semantic Web - ISWC 2018 - 17th International Semantic Web Conference, Monterey, CA, USA, October 8-12, 2018, Proceedings, Part I</em>, Denny Vrandeči&cacute;, Kalina Bontcheva, Mari Carmen Suárez-Figueroa, Valentina Presutti, Irene Celino, Marta Sabou, Lucie-Aimée Kaffee, and Elena Simperl (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11136. Springer, 301–317.</li>
			<li id="ref-MoroNavigli13">Andrea Moro and Roberto Navigli. 2013. Integrating Syntactic and Semantic Analysis into the Open Information Extraction Paradigm. In <em>IJCAI 2013, Proceedings of the 23rd International Joint Conference on Artificial Intelligence, Beijing, China, August 3-9, 2013</em>, Francesca Rossi (Ed.). IJCAI/AAAI, 2148–2154.</li>
			<li id="ref-Moroetal:14">Andrea Moro, Alessandro Raganato, and Roberto Navigli. 2014. Entity Linking meets Word Sense Disambiguation: a Unified Approach. <em>Transactions of the Association for Computational Linguistics</em> 2, 231–244.</li>
			<li id="ref-MotikSH09">Boris Motik, Rob Shearer, and Ian Horrocks. 2009. <a href="https://doi.org/10.1613/jair.2811">Hypertableau Reasoning for Description Logics</a>. <em>Journal of Artificial Intelligence Research</em> 36, 165–228.</li>
			<li id="ref-key:owl2profiles">Boris Motik, Bernardo Cuenca Grau, Ian Horrocks, Zhe Wu, Achille Fokoue, and Carsten Lutz. 2012. <em><a href="http://www.w3.org/TR/2012/REC-owl2-profiles-20121211/">OWL 2 Web Ontology Language Profiles (Second Edition)</a></em>. W3C Recommendation. World Wide Web Consortium. December 11, 2012.</li>
			<li id="ref-MulwadFJ13">Varish Mulwad, Tim Finin, and Anupam Joshi. 2013. Semantic Message Passing for Generating Linked Data from Tables. In <em>The Semantic Web - ISWC 2013 - 12th International Semantic Web Conference, Sydney, NSW, Australia, October 21-25, 2013, Proceedings, Part I</em>, Harith Alani, Lalana Kagal, Achille Fokoue, Paul Groth, Chris Biemann, Josiane Xavier Parreira, Lora Aroyo, Natasha Noy, Chris Welty, and Krzysztof Janowicz (Eds.). Lecture Notes in Computer Science, vol.&nbsp;8218. Springer, 363–378.</li>
			<li id="ref-obof">Chris Mungall, Alan Ruttenberg, Ian Horrocks, and David Osumi-Sutherland. 2012. <a href="http://owlcollab.github.io/oboformat/doc/obo-syntax.html">OBO Flat File Format 1.4 Syntax and Semantics</a>. Editor's Draft. May, 2012.</li>
			<li id="ref-MunozPG09">Sergio Muñoz, Jorge Pérez, and Claudio Gutiérrez. 2009. <a href="https://doi.org/10.1016/j.websem.2009.07.003">Simple and Efficient Minimal RDFS</a>. <em>Journal of Web Semantics</em> 7(3), 220–234.</li>
			<li id="ref-MunozHM14">Emir Muñoz, Aidan Hogan, and Alessandra Mileo. 2014. Using Linked Data to mine RDF from Wikipedia's tables. In <em>Seventh ACM International Conference on Web Search and Data Mining, WSDM 2014, New York, NY, USA, February 24-28, 2014</em>, Ben Carterette, Fernando Diaz, Carlos Castillo, and Donald Metzler (Eds.). ACM Press, 533–542.</li>
			<li id="ref-NadeauS07">David Nadeau and Satoshi Sekine. 2007. A survey of named entity recognition and classification. <em>Lingvisticae Investigationes</em> 30(1), 3–26.</li>
			<li id="ref-NakasholeTW13">Ndapandula Nakashole, Tomasz Tylenda, and Gerhard Weikum. 2013. <a href="https://www.aclweb.org/anthology/volumes/P13-1/">Fine-grained Semantic Typing of Emerging Entities</a>. In <em>Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013, 4-9 August 2013, Sofia, Bulgaria, Volume 1: Long Papers</em>. The Association for Computational Linguistics, 1488–1497.</li>
			<li id="ref-NarayananS09">Arvind Narayanan and Vitaly Shmatikov. 2009. De-anonymizing Social Networks. In <em>30th IEEE Symposium on Security and Privacy (S&amp;P 2009), 17-20 May 2009, Oakland, California, USA</em>. IEEE Computer Society, 173–187.</li>
			<li id="ref-NavigliPonzetto:12">Roberto Navigli and Simone Paolo Ponzetto. 2012. BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. <em>Artificial Intelligence</em> 193, 217–250.</li>
			<li id="ref-Navigli:09">Roberto Navigli. 2009. Word Sense Disambiguation: A Survey. <em>ACM Computing Surveys</em> 41(2), 1–69.</li>
			<li id="ref-nentwig2017survey">Markus Nentwig, Michael Hartung, Axel-Cyrille Ngonga Ngomo, and Erhard Rahm. 2017. <a href="https://doi.org/10.3233/SW-150210">A survey of current link discovery frameworks</a>. <em>Semantic Web Journal</em> 8(3), 419–436.</li>
			<li id="ref-neumaier2018enabling">Sebastian Neumaier and Axel Polleres. 2019. <a href="https://doi.org/10.1016/j.websem.2018.12.007">Enabling Spatio-Temporal Search in Open Data</a>. <em>Journal of Web Semantics</em> 55, 21–36.</li>
			<li id="ref-NeumaierUPP16">Sebastian Neumaier, Jürgen Umbrich, Josiane Xavier Parreira, and Axel Polleres. 2016. Multi-level Semantic Labelling of Numerical Values. In <em>The Semantic Web - ISWC 2016 - 15th International Semantic Web Conference, Kobe, Japan, October 17-21, 2016, Proceedings, Part I</em>, Paul T. Groth, Elena Simperl, Alasdair J. G. Gray, Marta Sabou, Markus Krötzsch, Freddy Lécué, Fabian Flöck, and Yolanda Gil (Eds.). Lecture Notes in Computer Science, vol.&nbsp;9981. Springer, 428–445.</li>
			<li id="ref-WellsFargoKG">David Newman. 2019. Knowledge Graphs and AI: The Future of Financial Data.</li>
			<li id="ref-NgomoA11">Axel-Cyrille Ngonga Ngomo and Sören Auer. 2011. LIMES - A Time-Efficient Approach for Large-Scale Link Discovery on the Web of Data. In <em>IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011</em>, Toby Walsh (Ed.). IJCAI/AAAI, 2312–2317.</li>
			<li id="ref-ngonga2013sorry">Axel-Cyrille Ngonga Ngomo, Lorenz Bühmann, Christina Unger, Jens Lehmann, and Daniel Gerber. 2013. Sorry, I don't speak SPARQL: translating SPARQL queries into natural language. In <em>22nd International World Wide Web Conference, WWW '13, Rio de Janeiro, Brazil, May 13-17, 2013</em>, Daniel Schwabe, Virgílio A. F. Almeida, Hartmut Glaser, Ricardo Baeza-Yates, and Sue B. Moon (Eds.). ACM Press, 977–988.</li>
			<li id="ref-minkowski">Axel-Cyrille Ngonga Ngomo. 2012. Link discovery with guaranteed reduction ratio in affine spaces with Minkowski measures. In <em>The Semantic Web - ISWC 2012 - 11th International Semantic Web Conference, Boston, MA, USA, November 11-15, 2012, Proceedings, Part I</em>, Philippe Cudré-Mauroux, Jeff Heflin, Evren Sirin, Tania Tudorache, Jérôme Euzenat, Manfred Hauswirth, Josiane Xavier Parreira, Jim Hendler, Guus Schreiber, Abraham Bernstein, and Eva Blomqvist (Eds.). Lecture Notes in Computer Science, vol.&nbsp;7649. Springer, 378–393.</li>
			<li id="ref-orchid">Axel-Cyrille Ngonga Ngomo. 2013. ORCHID – reduction-ratio-optimal computation of geo-spatial distances for link discovery. In <em>The Semantic Web - ISWC 2013 - 12th International Semantic Web Conference, Sydney, NSW, Australia, October 21-25, 2013, Proceedings, Part I</em>, Harith Alani, Lalana Kagal, Achille Fokoue, Paul Groth, Chris Biemann, Josiane Xavier Parreira, Lora Aroyo, Natasha Noy, Chris Welty, and Krzysztof Janowicz (Eds.). Lecture Notes in Computer Science, vol.&nbsp;8218. Springer, 395–410.</li>
			<li id="ref-Nguyen14">Vinh Nguyen, Olivier Bodenreider, and Amit Sheth. 2014. <a href="https://doi.org/10.1145/2566486.2567973">Don't Like RDF Reification?: Making Statements About Statements Using Singleton Property</a>. In <em>23rd International World Wide Web Conference, WWW '14, Seoul, Republic of Korea, April 7-11, 2014</em>, Chin-Wan Chung, Andrei Z. Broder, Kyuseok Shim, and Torsten Suel (Eds.). ACM Press, 759–770.</li>
			<li id="ref-NguyenTW16">Dat Ba Nguyen, Martin Theobald, and Gerhard Weikum. 2016. J-NERD: Joint Named Entity Recognition and Disambiguation with Rich Linguistic Features. <em>Transactions of the Association for Computational Linguistics</em> 4, 215–229.</li>
			<li id="ref-nickel2013tensor">Maximilian Nickel and Volker Tresp. 2013. <a href="https://doi.org/10.1007/978-3-642-40994-3_40">Tensor factorization for multi-relational learning</a>. In <em>Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2013, Prague, Czech Republic, September 23-27, 2013, Proceedings, Part III</em>, Hendrik Blockeel, Kristian Kersting, Siegfried Nijssen, and Filip Zelezný (Eds.). Lecture Notes in Computer Science, vol.&nbsp;8190. Springer, 617–621.</li>
			<li id="ref-nickel">Maximilian Nickel, Kevin Murphy, Volker Tresp, and Evgeniy Gabrilovich. 2016. A Review of Relational Machine Learning for Knowledge Graphs. <em>Proceedings of the IEEE</em> 104(1), 11–33.</li>
			<li id="ref-NickelRP16">Maximilian Nickel, Lorenzo Rosasco, and Tomaso A. Poggio. 2016. <a href="http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/12484">Holographic Embeddings of Knowledge Graphs</a>. In <em>Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona, USA</em>, Dale Schuurmans and Michael P. Wellman (Eds.). AAAI Press, 1955–1961.</li>
			<li id="ref-Noy2001">Natalya F. Noy and Deborah L. McGuinness. 2001. <em><a href="https://protege.stanford.edu/publications/ontology_development/ontology101.pdf">Ontology Development 101: A Guide to Creating Your First Ontology</a></em>. Stanford Knowledge Systems Laboratory.</li>
			<li id="ref-NoyGJNPT19">Natasha Noy, Yuqing Gao, Anshu Jain, Anant Narayanan, Alan Patterson, and Jamie Taylor. 2019. <a href="https://doi.org/10.1145/3329781.3332266">Industry-scale Knowledge Graphs: Lessons and Challenges</a>. <em>ACM Queue</em> 17(2). 20 pages.</li>
			<li id="ref-NurdiatiH08">Sri Nurdiati and Cornelis Hoede. 2012. <a href="https://core.ac.uk/download/pdf/11468596.pdf">25 Years of Development of Knowledge Graph Theory: the Results and the Challenge</a>. Memorandum 1876, University of Twente. September, 2012.</li>
			<li id="ref-AccentureKG">Ekpe Okorafor and Atish Ray. 2019. <a href="https://www.accenture.com/us-en/insights/digital/data-to-knowledge">The path from data to knowledge</a>. Accenture Applied Intelligence Blog. June 19, 2019.</li>
			<li id="ref-page1999pagerank">Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. <em><a href="http://ilpubs.stanford.edu:8090/422/">The PageRank Citation Ranking: Bringing order to the Web</a></em>. Stanford InfoLab. November, 1999.</li>
			<li id="ref-PVGW2017">Jeff Z. Pan, Guido Vetere, José Manuél Gómez-Pérez, and Honghan Wu (Eds.). 2017. <em><a href="https://doi.org/10.1007/978-3-319-45654-6">Exploiting Linked Data and Knowledge Graphs in Large Organisations</a></em>. Springer.</li>
			<li id="ref-panasiuk2018modeling">Oleksandra Panasiuk, Simon Steyskal, Giray Havur, Anna Fensel, and Sabrina Kirrane. 2018. Modeling and Reasoning over Data Licenses. In <em>The Semantic Web: ESWC 2018 Satellite Events - ESWC 2018 Satellite Events, Heraklion, Crete, Greece, June 3-7, 2018, Revised Selected Papers</em>, Aldo Gangemi, Anna Lisa Gentile, Andrea Giovanni Nuzzolese, Sebastian Rudolph, Maria Maleshkova, Heiko Paulheim, Jeff Z. Pan, and Mehwish Alam (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11155. Springer, 218–222.</li>
			<li id="ref-dpv">Harshvardhan J. Pandit, Axel Polleres, Bert Bos, Rob Brennan, Bud Bruegger, Fajar J. Ekaputra, Javier D. Fernández, Ramisa Gachpaz Hamed, Elmar Kiesling, Mark Lizar, Eva Schlehahn, Simon Steyskal, and Rigo Wenning. 2019. <em><a href="https://www.w3.org/ns/dpv">Data Privacy Vocabulary v0.1</a></em>. Draft Community Group Report. World Wide Web Consortium. November 28, 2019.</li>
			<li id="ref-PapavasileiouFFKC13">Vicky Papavasileiou, Giorgos Flouris, Irini Fundulaki, Dimitris Kotzinos, and Vassilis Christophides. 2013. High-level change detection in RDF(S) KBs. <em>ACM Transactions on Database Systems</em> 38(1), 1:1–1:42.</li>
			<li id="ref-ParkKDZF19">Namyong Park, Andrey Kan, Xin Luna Dong, Tong Zhao, and Christos Faloutsos. 2019. <a href="https://doi.org/10.1145/3292500.3330855">Estimating Node Importance in Knowledge Graphs Using Graph Neural Networks</a>. In <em>Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining, KDD 2019, Anchorage, AK, USA, August 4-8, 2019</em>, Ankur Teredesai, Vipin Kumar, Ying Li, Rómer Rosales, Evimaria Terzi, and George Karypis (Eds.). ACM Press, 596–606.</li>
			<li id="ref-ParkKDZF20">Namyong Park, Andrey Kan, Xin Luna Dong, Tong Zhao, and Christos Faloutsos. 2020. <a href="https://doi.org/10.1145/3394486.3403093">MultiImport: Inferring Node Importance in a Knowledge Graph from Multiple Input Signals</a>. In <em>KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, August 23-27, 2020</em>, Rajesh Gupta, Yan Liu, Jiliang Tang, and B. Aditya Prakash (Eds.). ACM Press, 503–512.</li>
			<li id="ref-Pasternack2010">Jeff Pasternack and Dan Roth. 2010. <a href="https://www.aclweb.org/anthology/C10-1099/">Knowing What to Believe (when You Already Know Something)</a>. In <em>COLING 2010, 23rd International Conference on Computational Linguistics, Proceedings of the Conference, 23-27 August 2010, Beijing, China</em>, Chu-Ren Huang and Dan Jurafsky (Eds.). Tsinghua University Press, 877–885.</li>
			<li id="ref-pasternack2011making">Jeff Pasternack and Dan Roth. 2011. <a href="https://doi.org/10.5591/978-1-57735-516-8/IJCAI11-387">Making Better Informed Trust Decisions with Generalized Fact-Finding</a>. In <em>IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011</em>, Toby Walsh (Ed.). IJCAI/AAAI, 2324–2329.</li>
			<li id="ref-paulheim2013type">Heiko Paulheim and Christian Bizer. 2013. Type inference on noisy RDF data. In <em>The Semantic Web - ISWC 2013 - 12th International Semantic Web Conference, Sydney, NSW, Australia, October 21-25, 2013, Proceedings, Part II</em>, Harith Alani, Lalana Kagal, Achille Fokoue, Paul Groth, Chris Biemann, Josiane Xavier Parreira, Lora Aroyo, Natasha Noy, Chris Welty, and Krzysztof Janowicz (Eds.). Lecture Notes in Computer Science, vol.&nbsp;8219. Springer, 510–525.</li>
			<li id="ref-Paulheim17">Heiko Paulheim. 2017. <a href="https://doi.org/10.3233/SW-160218">Knowledge graph refinement: A survey of approaches and evaluation methods</a>. <em>Semantic Web Journal</em> 8(3), 489–508.</li>
			<li id="ref-Paulheim18a">Heiko Paulheim. 2018. <a href="http://ceur-ws.org/Vol-2180/ISWC_2018_Outrageous_Ideas_paper_10.pdf">How much is a Triple? Estimating the Cost of Knowledge Graph Creation</a>. In <em>Proceedings of the ISWC 2018 Posters &amp; Demonstrations, Industry and Blue Sky Ideas Tracks co-located with 17th International Semantic Web Conference (ISWC 2018), Monterey, USA, October 8th - to - 12th, 2018</em>, Marieke van Erp, Medha Atre, Vanessa López, Kavitha Srinivas, and Carolina Fortuna (Eds.). CEUR Workshop Proceedings, vol.&nbsp;2180. Sun SITE Central Europe (CEUR). 4 pages.</li>
			<li id="ref-PechsiriP10">Chaveevan Pechsiri and Rapepun Piriyakul. 2010. Explanation Knowledge Graph Construction Through Causality Extraction from Texts. <em>Journal of Computer Science and Technology</em> 25(5), 1055–1070.</li>
			<li id="ref-peirce">Charles S. Peirce. 1878. How to Make Our Ideas Clear. <em>Popular Science Monthly</em> 12, 286–302.</li>
			<li id="ref-pellegrini2019DALICC">Tassilo Pellegrini, Giray Havur, Simon Steyskal, Oleksandra Panasiuk, Anna Fensel, Victor Mireles, Thomas Thurner, Axel Polleres, Sabrina Kirrane, and Andrea Schönhofer. 2019. DALICC: A License Management Framework for Digital Assets. In <em>Proceedings of the Internationales Rechtsinformatik Symposion (IRIS)</em>. 10 pages.</li>
			<li id="ref-pellissier2016freebase">Thomas Pellissier Tanon, Denny Vrandeči&cacute;, Sebastian Schaffert, Thomas Steiner, and Lydia Pintscher. 2016. From Freebase to Wikidata: The Great Migration. In <em>Proceedings of the 25th International Conference on World Wide Web, WWW 2016, Montreal, Canada, April 11-15, 2016</em>, Jacqueline Bourdeau, Jim Hendler, Roger Nkambou, Ian Horrocks, and Ben Y. Zhao (Eds.). ACM Press, 1419–1428.</li>
			<li id="ref-TanonSRMW17">Thomas Pellissier Tanon, Daria Stepanova, Simon Razniewski, Paramita Mirza, and Gerhard Weikum. 2017. <a href="https://doi.org/10.1007/978-3-319-68288-4_30">Completeness-Aware Rule Learning from Knowledge Graphs</a>. In <em>The Semantic Web - ISWC 2017 - 16th International Semantic Web Conference, Vienna, Austria, October 21-25, 2017, Proceedings, Part I</em>, Claudia d'Amato, Miriam Fernández, Valentina A. M. Tamma, Freddy Lécué, Philippe Cudré-Mauroux, Juan F. Sequeda, Christoph Lange, and Jeff Heflin (Eds.). Lecture Notes in Computer Science, vol.&nbsp;10587. Springer, 507–525.</li>
			<li id="ref-pennington2014glove">Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. <a href="https://doi.org/10.3115/v1/d14-1162">GloVe: Global vectors for word representation</a>. In <em>Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL</em>, Alessandro Moschitti, Bo Pang, and Walter Daelemans (Eds.). The Association for Computational Linguistics, 1532–1543.</li>
			<li id="ref-PeroniSV17">Silvio Peroni, David M. Shotton, and Fabio Vitali. 2017. One Year of the OpenCitations Corpus – Releasing RDF-Based Scholarly Citation Data into the Public Domain. In <em>The Semantic Web - ISWC 2017 - 16th International Semantic Web Conference, Vienna, Austria, October 21-25, 2017, Proceedings, Part II</em>, Claudia d'Amato, Miriam Fernández, Valentina A. M. Tamma, Freddy Lécué, Philippe Cudré-Mauroux, Juan F. Sequeda, Christoph Lange, and Jeff Heflin (Eds.). Lecture Notes in Computer Science, vol.&nbsp;10588. Springer, 184–192.</li>
			<li id="ref-peroni2016simplified">Silvio Peroni. 2016. A simplified agile methodology for ontology development. In <em>OWL: - Experiences and Directions - Reasoner Evaluation - 13th International Workshop, OWLED 2016, and 5th International Workshop, ORE 2016, Bologna, Italy, November 20, 2016, Revised Selected Papers</em>, Mauro Dragoni, María Poveda-Villalón, and Ernesto Jiménez-Ruiz (Eds.). Lecture Notes in Computer Science, vol.&nbsp;10161. Springer, 55–69.</li>
			<li id="ref-XSD">David Peterson, Shudi Gao, Ashok Malhotra, C. M. Sperberg-McQueen, Henry S. Thompson, and Paul V. Biron. 2012. <em><a href="https://www.w3.org/TR/2012/REC-xmlschema11-2-20120405/">W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes</a></em>. W3C Recommendation. World Wide Web Consortium. April 5, 2012.</li>
			<li id="ref-petrucci2016ontology">Giulio Petrucci, Chiara Ghidini, and Marco Rospocher. 2016. Ontology learning in the deep. In <em>Knowledge Engineering and Knowledge Management - 20th International Conference, EKAW 2016, Bologna, Italy, November 19-23, 2016, Proceedings</em>, Eva Blomqvist, Paolo Ciancarini, Francesco Poggi, and Fabio Vitali (Eds.). Lecture Notes in Computer Science, vol.&nbsp;10024. Springer, 480–495.</li>
			<li id="ref-PhamPEB15">Minh-Duc Pham, Linnea Passing, Orri Erling, and Peter A. Boncz. 2015. <a href="https://doi.org/10.1145/2736277.2741121">Deriving an Emergent Relational Schema from RDF Data</a>. In <em>Proceedings of the 24th International Conference on World Wide Web, WWW 2015, Florence, Italy, May 18-22, 2015</em>, Aldo Gangemi, Stefano Leonardi, and Alessandro Panconesi (Eds.). ACM Press, 864–874.</li>
			<li id="ref-Pinto2009">H. Sofia Pinto, C. Tempich, and Steffen Staab. 2009. <a href="https://doi.org/10.1007/978-3-540-92673-3">Ontology Engineering and Evolution in a Distributed World Using DILIGENT</a>. In <em>Handbook on Ontologies</em>, Steffen Staab and Rudi Studer (Eds.). International Handbooks on Information Systems. Springer, 153–176.</li>
			<li id="ref-PiscopoS18">Alessandro Piscopo and Elena Simperl. 2018. <a href="https://doi.org/10.1145/3274410">Who Models the World?: Collaborative Ontology Creation and User Roles in Wikidata</a>. <em>Proceedings of the ACM on Human-Computer Interaction</em> 2(CSCW), 141:1–141:18.</li>
			<li id="ref-PiscopoKPS17">Alessandro Piscopo, Lucie-Aimée Kaffee, Chris Phethean, and Elena Simperl. 2017. Provenance Information in a Collaborative Knowledge Graph: An Evaluation of Wikidata External References. In <em>The Semantic Web - ISWC 2017 - 16th International Semantic Web Conference, Vienna, Austria, October 21-25, 2017, Proceedings, Part I</em>, Claudia d'Amato, Miriam Fernández, Valentina A. M. Tamma, Freddy Lécué, Philippe Cudré-Mauroux, Juan F. Sequeda, Christoph Lange, and Jeff Heflin (Eds.). Lecture Notes in Computer Science, vol.&nbsp;10587. Springer, 542–558.</li>
			<li id="ref-eBayKG">R. J. Pittman, Amit Srivastava, Sanjika Hewavitharana, Ajinkya Kale, and Saab Mansour. 2017. <a href="https://www.ebayinc.com/stories/news/cracking-the-code-on-conversational-commerce/">Cracking the Code on Conversational Commerce</a>. eBay Blog. April 6, 2017.</li>
			<li id="ref-PivkCSGRS07">Aleksander Pivk, Philipp Cimiano, York Sure, Matjaz Gams, Vladislav Rajkovic, and Rudi Studer. 2007. Transforming arbitrary tables into logical form with TARTAR. <em>Data and Knowledge Engineering</em> 60(3), 567–595.</li>
			<li id="ref-Popping03">Roel Popping. 2003. Knowledge Graphs and Network Text Analysis. <em>Social Science Information</em> 42(1), 91–106.</li>
			<li id="ref-PresuttiDGB09">Valentina Presutti, Enrico Daga, Aldo Gangemi, and Eva Blomqvist. 2009. <a href="http://ceur-ws.org/Vol-516/pap21.pdf">eXtreme Design with Content Ontology Design Patterns</a>. In <em>Proceedings of the Workshop on Ontology Patterns (WOP 2009), collocated with the 8th International Semantic Web Conference (ISWC-2009), Washington D.C., USA, 25 October, 2009</em>, Eva Blomqvist, Kurt Sandkuhl, François Scharffe, and Vojtech Svátek (Eds.). CEUR Workshop Proceedings, vol.&nbsp;516. Sun SITE Central Europe (CEUR). 15 pages.</li>
			<li id="ref-turtle">Eric Prud'hommeaux and Gavin Carothers. 2014. <em><a href="https://www.w3.org/TR/2014/REC-turtle-20140225/">RDF 1.1 Turtle – Terse RDF Triple Language</a></em>. W3C Recommendation. World Wide Web Consortium. February 25, 2014.</li>
			<li id="ref-Prudhommeaux2014">Eric Prud'hommeaux, Jose Emilio Labra Gayo, and Harold Solbrig. 2014. <a href="https://doi.org/10.1145/2660517.2660523">Shape Expressions: An RDF Validation and Transformation Language</a>. In <em>Proceedings of the 10th International Conference on Semantic Systems, SEMANTICS 2014, Leipzig, Germany, September 4-5, 2014</em>, Harald Sack, Agata Filipowska, Jens Lehmann, and Sebastian Hellmann (Eds.). ACM Press, 32–40.</li>
			<li id="ref-PujaraMGC13">Jay Pujara, Hui Miao, Lise Getoor, and William W. Cohen. 2013. <a href="https://doi.org/10.1007/978-3-642-41335-3_34">Knowledge Graph Identification</a>. In <em>The Semantic Web - ISWC 2013 - 12th International Semantic Web Conference, Sydney, NSW, Australia, October 21-25, 2013, Proceedings, Part I</em>, Harith Alani, Lalana Kagal, Achille Fokoue, Paul Groth, Chris Biemann, Josiane Xavier Parreira, Lora Aroyo, Natasha Noy, Chris Welty, and Krzysztof Janowicz (Eds.). Lecture Notes in Computer Science, vol.&nbsp;8218. Springer, 542–557.</li>
			<li id="ref-QiCLWJW19">Guilin Qi, Huajun Chen, Kang Liu, Haofen Wang, Qiu Ji, and Tianxing Wu. 2021. <em><a href="https://www.springer.com/gp/book/9789811081767">Knowledge Graph</a></em>. Database Management &amp; Information Retrieval. Springer.</li>
			<li id="ref-quillian">Ross Quillian. 1963. <em>A notation for representing conceptual information: An application to semantics and mechanical English paraphrasing</em>. Systems Development Corp.</li>
			<li id="ref-RabanserSG17">Stephan Rabanser, Oleksandr Shchur, and Stephan Günnemann. 2017. <a href="http://arxiv.org/abs/1711.10781">Introduction to Tensor Decompositions and their Applications in Machine Learning</a>. <em>CoRR</em> abs/1711.10781. 13 pages.</li>
			<li id="ref-rada1986gradualness">Roy Rada. 1986. Gradualness eases refinement of medical knowledge. <em>Medical Informatics</em> 11(1), 59–73.</li>
			<li id="ref-radulovic2015towards">Filip Radulovic, Raúl García-Castro, and Asunción Gómez-Pérez. 2015. Towards the Anonymisation of RDF Data. In <em>The 27th International Conference on Software Engineering and Knowledge Engineering, SEKE 2015, Wyndham Pittsburgh University Center, Pittsburgh, PA, USA, July 6-8, 2015</em>, Haiping Xu (Ed.). KSI Research Inc. and Knowledge Systems Institute Graduate School, 646–651.</li>
			<li id="ref-RaimondSS09">Yves Raimond, Christopher Sutton, and Mark B. Sandler. 2009. Interlinking Music-Related Data on the Web. <em>IEEE MultiMedia</em> 16(2), 52–63.</li>
			<li id="ref-RaimondFSA14">Yves Raimond, Tristan Ferne, Michael Smethurst, and Gareth Adams. 2014. <a href="https://doi.org/10.1016/j.websem.2014.07.005">The BBC World Service Archive prototype</a>. <em>Journal of Web Semantics</em> 27–28, 2–9.</li>
			<li id="ref-rappaport1988dynamic">Alain T. Rappaport and Albert M. Gouyet. 1988. Dynamic, interactive display system for a knowledge base. US Patent US4752889A. June 21, 1988.</li>
			<li id="ref-RatinovR09">Lev-Arie Ratinov and Dan Roth. 2009. Design Challenges and Misconceptions in Named Entity Recognition. In <em>Proceedings of the Thirteenth Conference on Computational Natural Language Learning, CoNLL 2009, Boulder, Colorado, USA, June 4-5, 2009</em>, Suzanne Stevenson and Xavier Carreras (Eds.). The Association for Computational Linguistics, 147–155.</li>
			<li id="ref-Reddivari2005">Pavan Reddivari, Tim Finin, and Anupam Joshi. 2005. <a href="https://ebiquity.umbc.edu/_file_directory_/papers/159.pdf">Policy-based access control for an RDF store</a>. In <em>Policy Management for the Web, A workshop held at the 14th International World Wide Web Conference Tuesday 10 May 2005, Chiba Japan</em>, Lalana Kagal, Tim Finin, and James Hendler (Eds.), 78–81.</li>
			<li id="ref-Reiter87">Raymond Reiter. 1987. A Theory of Diagnosis from First Principles. <em>Artificial Intelligence</em> 32(1), 57–95.</li>
			<li id="ref-RenEWTVH15">Xiang Ren, Ahmed El-Kishky, Chi Wang, Fangbo Tao, Clare R. Voss, and Jiawei Han. 2015. ClusType: Effective Entity Recognition and Typing by Relation Phrase-Based Clustering. In <em>Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, August 10-13, 2015</em>, Longbing Cao, Chengqi Zhang, Thorsten Joachims, Geoffrey I. Webb, Dragos D. Margineantu, and Graham Williams (Eds.). ACM Press, 995–1004.</li>
			<li id="ref-RenWHQVJAH17">Xiang Ren, Zeqiu Wu, Wenqi He, Meng Qu, Clare R. Voss, Heng Ji, Tarek F. Abdelzaher, and Jiawei Han. 2017. CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases. In <em>Proceedings of the 26th International Conference on World Wide Web, WWW 2017, Perth, Australia, April 3-7, 2017</em>, Rick Barrett, Rick Cummings, Eugene Agichtein, and Evgeniy Gabrilovich (Eds.). ACM Press, 1015–1024.</li>
			<li id="ref-ReutterSV15">Juan L. Reutter, Adrián Soto, and Domagoj Vrgoc. 2015. <a href="https://doi.org/10.1007/978-3-319-25007-6_2">Recursion in SPARQL</a>. In <em>The Semantic Web - ISWC 2015 - 14th International Semantic Web Conference, Bethlehem, PA, USA, October 11-15, 2015, Proceedings, Part I</em>, Marcelo Arenas, Óscar Corcho, Elena Paslaru Bontas Simperl, Markus Strohmaier, Mathieu d'Aquin, Kavitha Srinivas, Paul T. Groth, Michel Dumontier, Jeff Heflin, Krishnaprasad Thirunarayan, and Steffen Staab (Eds.). Lecture Notes in Computer Science, vol.&nbsp;9366. Springer, 19–35.</li>
			<li id="ref-RiedelYM10">Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling Relations and Their Mentions without Labeled Text. In <em>Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2010, Barcelona, Spain, September 20-24, 2010, Proceedings, Part III</em>, José L. Balcázar, Francesco Bonchi, Aristides Gionis, and Michèle Sebag (Eds.). Lecture Notes in Computer Science, vol.&nbsp;6323. Springer, 148–163.</li>
			<li id="ref-ristoski2016rdf2vec">Petar Ristoski and Heiko Paulheim. 2016. <a href="https://doi.org/10.1007/978-3-319-46523-4_30">RDF2Vec: RDF Graph Embeddings for Data Mining</a>. In <em>The Semantic Web - ISWC 2016 - 15th International Semantic Web Conference, Kobe, Japan, October 17-21, 2016, Proceedings, Part I</em>, Paul T. Groth, Elena Simperl, Alasdair J. G. Gray, Marta Sabou, Markus Krötzsch, Freddy Lécué, Fabian Flöck, and Yolanda Gil (Eds.). Lecture Notes in Computer Science, vol.&nbsp;9981. Springer, 498–514.</li>
			<li id="ref-ritchens">Richard H. Richens. 1956. General program for mechanical translation between any two languages via an algebraic interlingua. <em>Mechanical Translation</em> 3(2), 37.</li>
			<li id="ref-Rizzo2017">Giuseppe Rizzo, Claudia d'Amato, Nicola Fanizzi, and Floriana Esposito. 2017. <a href="https://doi.org/10.1007/978-3-319-58068-5_12">Terminological Cluster Trees for Disjointness Axiom Discovery</a>. In <em>The Semantic Web - 14th International Conference, ESWC 2017, Portorož, Slovenia, May 28 - June 1, 2017, Proceedings, Part I</em>, Eva Blomqvist, Diana Maynard, Aldo Gangemi, Rinke Hoekstra, Pascal Hitzler, and Olaf Hartig (Eds.). Lecture Notes in Computer Science, vol.&nbsp;10249. Springer, 184–201.</li>
			<li id="ref-RizzoFd20">Giuseppe Rizzo, Nicola Fanizzi, and Claudia d'Amato. 2020. <a href="https://doi.org/10.1016/j.future.2020.02.071">Class expression induction as concept space exploration: From DL-Foil to DL-Focl</a>. <em>Future Generation Computer Systems</em> 108, 256–272.</li>
			<li id="ref-RizzodF21">Giuseppe Rizzo, Claudia d'Amato, and Nicola Fanizzi. 2021. <a href="https://doi.org/10.3233/SW-200391">An unsupervised approach to disjointness learning based on terminological cluster trees</a>. <em>Semantic Web Journal</em> 12(3), 423–447.</li>
			<li id="ref-Rocktaschel017">Tim Rocktäschel and Sebastian Riedel. 2017. <a href="http://papers.nips.cc/paper/6969-end-to-end-differentiable-proving">End-to-end Differentiable Proving</a>. In <em>Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4–9 December 2017, Long Beach, CA, USA</em>, Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.), 3788–3800.</li>
			<li id="ref-Rodriguez15">Marko A. Rodriguez. 2015. <a href="https://doi.org/10.1145/2815072.2815073">The Gremlin graph traversal machine and language</a>. In <em>Proceedings of the 15th Symposium on Database Programming Languages, Pittsburgh, PA, USA, October 25-30, 2015</em>, James Cheney and Thomas Neumann (Eds.). ACM Press, 1–10.</li>
			<li id="ref-RollerKN18">Stephen Roller, Douwe Kiela, and Maximilian Nickel. 2018. <a href="https://www.aclweb.org/anthology/volumes/P18-2/">Hearst Patterns Revisited: Automatic Hypernym Detection from Large Text Corpora</a>. In <em>Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 2: Short Papers</em>, Iryna Gurevych and Yusuke Miyao (Eds.). The Association for Computational Linguistics, 358–363.</li>
			<li id="ref-RospocherEVFARS16">Marco Rospocher, Marieke van Erp, Piek Vossen, Antske Fokkens, Itziar Aldabe, German Rigau, Aitor Soroa, Thomas Ploeger, and Tessel Bogaard. 2016. Building event-centric knowledge graphs from news. <em>Journal of Web Semantics</em> 37–38, 132–151.</li>
			<li id="ref-rouces2015framebase">Jacobo Rouces, Gerard de Melo, and Katja Hose. 2015. Framebase: Representing \(n\)-ary relations using semantic frames. In <em>The Semantic Web. Latest Advances and New Domains - 12th European Semantic Web Conference, ESWC 2015, Portorož, Slovenia, May 31 - June 4, 2015. Proceedings</em>, Fabien Gandon, Marta Sabou, Harald Sack, Claudia d'Amato, Philippe Cudré-Mauroux, and Antoine Zimmermann (Eds.). Lecture Notes in Computer Science, vol.&nbsp;9088. Springer, 505–521.</li>
			<li id="ref-RudolphKH08">Sebastian Rudolph, Markus Krötzsch, and Pascal Hitzler. 2008. <a href="https://doi.org/10.1007/978-3-540-88564-1_28">Description Logic Reasoning with Decision Diagrams: Compiling SHIQ to Disjunctive Datalog</a>. In <em>The Semantic Web - ISWC 2008, 7th International Semantic Web Conference, ISWC 2008, Karlsruhe, Germany, October 26-30, 2008. Proceedings</em>, Amit P. Sheth, Steffen Staab, Mike Dean, Massimo Paolucci, Diana Maynard, Timothy W. Finin, and Krishnaprasad Thirunarayan (Eds.). Lecture Notes in Computer Science, vol.&nbsp;5318. Springer, 435–450.</li>
			<li id="ref-RulaPHSM12">Anisa Rula, Matteo Palmonari, Andreas Harth, Steffen Stadtmüller, and Andrea Maurino. 2012. On the Diversity and Availability of Temporal Information in Linked Open Data. In <em>The Semantic Web - ISWC 2012 - 11th International Semantic Web Conference, Boston, MA, USA, November 11-15, 2012, Proceedings, Part I</em>, Philippe Cudré-Mauroux, Jeff Heflin, Evren Sirin, Tania Tudorache, Jérôme Euzenat, Manfred Hauswirth, Josiane Xavier Parreira, Jim Hendler, Guus Schreiber, Abraham Bernstein, and Eva Blomqvist (Eds.). Lecture Notes in Computer Science, vol.&nbsp;7649. Springer, 492–507.</li>
			<li id="ref-RulaPPM14">Anisa Rula, Luca Panziera, Matteo Palmonari, and Andrea Maurino. 2014. <a href="http://ceur-ws.org/Vol-1264/cold2014_RulaPPM.pdf">Capturing the Currency of DBpedia Descriptions and Get Insight into their Validity</a>. In <em>Proceedings of the 5th International Workshop on Consuming Linked Data, COLD 2014 co-located with the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 20, 2014</em>, Olaf Hartig, Aidan Hogan, and Juan F. Sequeda (Eds.). CEUR Workshop Proceedings, vol.&nbsp;1264. Sun SITE Central Europe (CEUR).</li>
			<li id="ref-RulaPRNLME19">Anisa Rula, Matteo Palmonari, Simone Rubinacci, Axel-Cyrille Ngonga Ngomo, Jens Lehmann, Andrea Maurino, and Diego Esteves. 2019. TISCO: Temporal scoping of facts. <em>Journal of Web Semantics</em> 54, 72–86.</li>
			<li id="ref-SaccoP11">Owen Sacco and Alexandre Passant. 2011. <a href="http://ceur-ws.org/Vol-813/ldow2011-paper01.pdf">A Privacy Preference Ontology (PPO) for Linked Data</a>. In <em>WWW2011 Workshop on Linked Data on the Web, Hyderabad, India, March 29, 2011</em>, Christian Bizer, Tom Heath, Tim Berners-Lee, and Michael Hausenblas (Eds.). CEUR Workshop Proceedings, vol.&nbsp;813. Sun SITE Central Europe (CEUR). 5 pages.</li>
			<li id="ref-SadeghianADW19">Ali Sadeghian, Mohammadreza Armandpour, Patrick Ding, and Patrick Wang. 2019. <a href="http://papers.nips.cc/paper/9669-drum-end-to-end-differentiable-rule-mining-on-knowledge-graphs">DRUM: End-To-End Differentiable Rule Mining On Knowledge Graphs</a>. In <em>Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8–14 December 2019, Vancouver, BC, Canada</em>, Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alché-Buc, Emily B. Fox, and Roman Garnett (Eds.), 15321–15331.</li>
			<li id="ref-SaezH18">Tomás Sáez and Aidan Hogan. 2018. Automatically Generating Wikipedia Info-boxes from Wikidata. In <em>Companion of The Web Conference 2018 on The Web Conference 2018, WWW 2018, Lyon, France, April 23-27, 2018</em>, Pierre-Antoine Champin, Fabien L. Gandon, Mounia Lalmas, and Panagiotis G. Ipeirotis (Eds.). ACM Press, 1823–1830.</li>
			<li id="ref-SaleemAHMN15">Muhammad Saleem, Muhammad Intizar Ali, Aidan Hogan, Qaiser Mehmood, and Axel-Cyrille Ngonga Ngomo. 2015. LSQ: The Linked SPARQL Queries Dataset. In <em>The Semantic Web - ISWC 2015 - 14th International Semantic Web Conference, Bethlehem, PA, USA, October 11-15, 2015, Proceedings, Part II</em>, Marcelo Arenas, Óscar Corcho, Elena Paslaru Bontas Simperl, Markus Strohmaier, Mathieu d'Aquin, Kavitha Srinivas, Paul T. Groth, Michel Dumontier, Jeff Heflin, Krishnaprasad Thirunarayan, and Steffen Staab (Eds.). Lecture Notes in Computer Science, vol.&nbsp;9367. Springer, 261–269.</li>
			<li id="ref-Samadi2016">Mehdi Samadi, Partha Talukdar, Manuela Veloso, and Manuel Blum. 2016. <a href="http://dl.acm.org/citation.cfm?id=3015812.3015845">ClaimEval: Integrated and Flexible Framework for Claim Evaluation Using Credibility of Sources</a>. In <em>Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona, USA</em>, Dale Schuurmans and Michael P. Wellman (Eds.). AAAI Press, 222–228.</li>
			<li id="ref-samarati1998protecting">Pierangela Samarati and Latanya Sweeney. 1998. <em><a href="http://www.csl.sri.com/papers/sritr-98-04/">Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression</a></em>. Computer Science Laboratory, SRI International.</li>
			<li id="ref-santipantakis2019stld">Georgios M. Santipantakis, Apostolos Glenis, Christos Doulkeridis, Akrivi Vlachou, and George A. Vouros. 2019. stLD: towards a spatio-temporal link discovery framework. In <em>Proceedings of the International Workshop on Semantic Big Data, SBD@SIGMOD 2019, Amsterdam, The Netherlands, July 5, 2019</em>, Sven Groppe and Le Gruenwald (Eds.). ACM Press, 4:1–4:6.</li>
			<li id="ref-JrS99">Eugene Santos Jr. and Eugene S. Santos. 1999. A framework for building knowledge-bases under uncertainty. <em>Journal of Experimental &amp; Theoretical Artificial Intelligence</em> 11(2), 265–286.</li>
			<li id="ref-ScarselliGTHM09">Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. 2009. <a href="https://doi.org/10.1109/TNN.2008.2005605">The Graph Neural Network Model</a>. <em>IEEE Transactions on Neural Networks</em> 20(1), 61–80.</li>
			<li id="ref-SchmachtenbergBP14">Max Schmachtenberg, Christian Bizer, and Heiko Paulheim. 2014. Adoption of the Linked Data Best Practices in Different Topical Domains. In <em>The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Riva del Garda, Italy, October 19-23, 2014. Proceedings, Part I</em>, Peter Mika, Tania Tudorache, Abraham Bernstein, Christopher A. Welty, Craig A. Knoblock, Denny Vrandecic, Paul T. Groth, Natasha Fridman Noy, Krzysztof Janowicz, and Carole A. Goble (Eds.). Lecture Notes in Computer Science, vol.&nbsp;8796. Springer, 245–260.</li>
			<li id="ref-Schmidt-SchaussS91">Manfred Schmidt-Schauß and Gert Smolka. 1991. Attributive Concept Descriptions with Complements. <em>Artificial Intelligence</em> 48(1), 1–26.</li>
			<li id="ref-SchneiderS11">Michael Schneider and Geoff Sutcliffe. 2011. <a href="https://doi.org/10.1007/978-3-642-22438-6_35">Reasoning in the OWL 2 Full Ontology Language Using First-Order Automated Theorem Proving</a>. In <em>Automated Deduction - CADE-23 - 23rd International Conference on Automated Deduction, Wroclaw, Poland, July 31 - August 5, 2011. Proceedings</em>, Nikolaj Bjørner and Viorica Sofronie-Stokkermans (Eds.). Lecture Notes in Computer Science, vol.&nbsp;6803. Springer, 461–475.</li>
			<li id="ref-Schneider72">E. W. Schneider. 1973. <em><a href="https://files.eric.ed.gov/fulltext/ED088424.pdf">Course Modularization Applied: The Interface System and Its Implications For Sequence Control and Data Analysis</a></em>. Human Resources Research Organization, Alexandria, VA. November, 1973.</li>
			<li id="ref-SchuetzBNSS20">Christoph Schuetz, Loris Bozzato, Bernd Neumayr, Michael Schrefl, and Luciano Serafini. 2021. <a href="http://www.semantic-web-journal.net/content/knowledge-graph-olap-multidimensional-model-and-query-operations-contextualized-knowledge-0">Knowledge Graph OLAP: A Multidimensional Model and Query Operations for Contextualized Knowledge Graphs</a>. <em>Semantic Web Journal</em>.</li>
			<li id="ref-seifer19">Philipp Seifer, Johannes Härtel, Martin Leinberger, Ralf Lämmel, and Steffen Staab. 2019. <a href="https://doi.org/10.1145/3357766.3359541">Empirical study on the usage of graph query languages in open source Java projects</a>. In <em>Proceedings of the 12th ACM SIGPLAN International Conference on Software Language Engineering, SLE 2019, Athens, Greece, October 20-22, 2019</em>, Oscar Nierstrasz, Jeff Gray, and Bruno C. d. S. Oliveira (Eds.). ACM Press, 152–166.</li>
			<li id="ref-SequedaAM12">Juan F. Sequeda, Marcelo Arenas, and Daniel P. Miranker. 2012. On directly mapping relational databases to RDF and OWL. In <em>Proceedings of the 21st World Wide Web Conference 2012, WWW 2012, Lyon, France, April 16-20, 2012</em>, Alain Mille, Fabien L. Gandon, Jacques Misselis, Michael Rabinovich, and Steffen Staab (Eds.). ACM Press, 649–658.</li>
			<li id="ref-SequedaAM14">Juan F. Sequeda, Marcelo Arenas, and Daniel P. Miranker. 2014. <a href="https://doi.org/10.1007/978-3-319-11964-9_34">OBDA: Query Rewriting or Materialization? In Practice, Both!</a>. In <em>The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Riva del Garda, Italy, October 19-23, 2014. Proceedings, Part I</em>, Peter Mika, Tania Tudorache, Abraham Bernstein, Christopher A. Welty, Craig A. Knoblock, Denny Vrandecic, Paul T. Groth, Natasha Fridman Noy, Krzysztof Janowicz, and Carole A. Goble (Eds.). Lecture Notes in Computer Science, vol.&nbsp;8796. Springer, 535–551.</li>
			<li id="ref-SequedaBMH19">Juan F. Sequeda, Willard J. Briggs, Daniel P. Miranker, and Wayne P. Heideman. 2019. A Pay-as-you-go Methodology to Design and Build Enterprise Knowledge Graphs from Relational Databases. In <em>The Semantic Web - ISWC 2019 - 18th International Semantic Web Conference, Auckland, New Zealand, October 26-30, 2019, Proceedings, Part II</em>, Chiara Ghidini, Olaf Hartig, Maria Maleshkova, Vojtech Svátek, Isabel F. Cruz, Aidan Hogan, Jie Song, Maxime Lefrançois, and Fabien Gandon (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11779. Springer, 526–545.</li>
			<li id="ref-SerafiniH12">Luciano Serafini and Martin Homola. 2012. <a href="https://doi.org/10.1016/j.websem.2011.12.003">Contextualized Knowledge Repositories for the Semantic Web</a>. <em>Journal of Web Semantics</em> 12, 64–87.</li>
			<li id="ref-SeufertEBKBW16">Stephan Seufert, Patrick Ernst, Srikanta J. Bedathur, Sarath Kumar Kondreddi, Klaus Berberich, and Gerhard Weikum. 2016. Instant Espresso: Interactive Analysis of Relationships in Knowledge Graphs. In <em>Proceedings of the 25th International Conference on World Wide Web, WWW 2016, Montreal, Canada, April 11-15, 2016, Companion Volume</em>, Jacqueline Bourdeau, Jim Hendler, Roger Nkambou, Ian Horrocks, and Ben Y. Zhao (Eds.). ACM Press, 251–254.</li>
			<li id="ref-ShadboltO13">Nigel Shadbolt and Kieron O'Hara. 2013. <a href="https://doi.org/10.1109/MIC.2013.72">Linked Data in Government</a>. <em>IEEE Internet Computing</em> 17(4), 72–77.</li>
			<li id="ref-SherifN15">Mohamed Ahmed Sherif and Axel-Cyrille Ngonga Ngomo. 2015. Semantic Quran. <em>Semantic Web Journal</em> 6(4), 339–345.</li>
			<li id="ref-SherifN18">Mohamed Ahmed Sherif and Axel-Cyrille Ngonga Ngomo. 2018. <a href="https://doi.org/10.3233/SW-170285">A systematic survey of point set distance measures for link discovery</a>. <em>Semantic Web Journal</em> 9(5), 589–604.</li>
			<li id="ref-shi2016discriminative">Baoxu Shi and Tim Weninger. 2016. Discriminative predicate path mining for fact checking in knowledge graphs. <em>Knowledge-Based Systems</em> 104, 123–133.</li>
			<li id="ref-shimizu2019modl">Cogan Shimizu, Quinn Hirt, and Pascal Hitzler. 2019. <a href="http://arxiv.org/abs/1904.05405">MODL: A Modular Ontology Design Library</a>. <em>CoRR</em> abs/1904.05405. 12 pages.</li>
			<li id="ref-ShimonyDS97">Solomon Eyal Shimony, Carmel Domshlak, and Eugene Santos Jr. 1997. Cost-Sharing in Bayesian Knowledge Bases. In <em>UAI '97: Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence, Brown University, Providence, Rhode Island, USA, August 1-3, 1997</em>, Dan Geiger and Prakash P. Shenoy (Eds.). Morgan Kaufmann, 421–428.</li>
			<li id="ref-shiralkar2017finding">Prashant Shiralkar, Alessandro Flammini, Filippo Menczer, and Giovanni Luca Ciampaglia. 2017. Finding streams in knowledge graphs to support fact checking. In <em>2017 IEEE International Conference on Data Mining, ICDM 2017, New Orleans, LA, USA, November 18-21, 2017</em>, Vijay Raghavan, Srinivas Aluru, George Karypis, Lucio Miele, and Xindong Wu (Eds.). IEEE Computer Society, 859–864.</li>
			<li id="ref-BingKG">Saurabh Shrivastava. 2017. <a href="https://blogs.bing.com/search-quality-insights/2017-07/bring-rich-knowledge-of-people-places-things-and-local-businesses-to-your-apps">Bring rich knowledge of people, places, things and local businesses to your apps</a>. Bing Blogs. July 12, 2017.</li>
			<li id="ref-Silva2017">Rôney Reis C. Silva, Bruno C. Leal, Felipe T. Brito, Vânia M. P. Vidal, and Javam C. Machado. 2017. <a href="https://doi.org/10.1145/3105831.3105838">A Differentially Private Approach for Querying RDF Data of Social Networks</a>. In <em>Proceedings of the 21st International Database Engineering &amp; Applications Symposium, IDEAS 2017, Bristol, United Kingdom, July 12-14, 2017</em>, Bipin C. Desai, Jun Hong, and Richard McClatchey (Eds.). ACM Press, 74–81.</li>
			<li id="ref-GoogleKG">Amit Singhal. 2012. <a href="https://www.blog.google/products/search/introducing-knowledge-graph-things-not/">Introducing the Knowledge Graph: things, not strings</a>. Google Blog. May 16, 2012.</li>
			<li id="ref-Skjaeveland2018">Martin G. Skjæveland, Daniel P. Lupp, Leif Harald Karlsen, and Henrik Forssell. 2018. Practical Ontology Pattern Instantiation, Discovery, and Maintenance with Reasonable Ontology Templates. In <em>The Semantic Web - ISWC 2018 - 17th International Semantic Web Conference, Monterey, CA, USA, October 8-12, 2018, Proceedings, Part I</em>, Denny Vrandečić, Kalina Bontcheva, Mari Carmen Suárez-Figueroa, Valentina Presutti, Irene Celino, Marta Sabou, Lucie-Aimée Kaffee, and Elena Simperl (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11136. Springer, 477–494.</li>
			<li id="ref-SleemanF13">Jennifer Sleeman and Tim Finin. 2013. Type Prediction for Efficient Coreference Resolution in Heterogeneous Semantic Graphs. In <em>2013 IEEE Seventh International Conference on Semantic Computing, Irvine, CA, USA, September 16-18, 2013</em>. IEEE Computer Society, 78–85.</li>
			<li id="ref-SmirnovaC19">Alisa Smirnova and Philippe Cudré-Mauroux. 2019. Relation Extraction Using Distant Supervision: A Survey. <em>ACM Computing Surveys</em> 51(5), 106:1–106:35.</li>
			<li id="ref-socher2013reasoning">Richard Socher, Danqi Chen, Christopher D. Manning, and Andrew Ng. 2013. <a href="https://proceedings.neurips.cc/paper/2013/hash/b337e84de8752b27eda3a12363109e80-Abstract.html">Reasoning with neural tensor networks for knowledge base completion</a>. In <em>Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States</em>, Christopher J. C. Burges, Léon Bottou, Zoubin Ghahramani, and Kilian Q. Weinberger (Eds.), 926–934.</li>
			<li id="ref-SouletGMS18">Arnaud Soulet, Arnaud Giacometti, Béatrice Markhoff, and Fabian M. Suchanek. 2018. <a href="https://doi.org/10.1007/978-3-030-00671-6_22">Representativeness of Knowledge Bases with the Generalized Benford's Law</a>. In <em>The Semantic Web - ISWC 2018 - 17th International Semantic Web Conference, Monterey, CA, USA, October 8-12, 2018, Proceedings, Part I</em>, Denny Vrandečić, Kalina Bontcheva, Mari Carmen Suárez-Figueroa, Valentina Presutti, Irene Celino, Marta Sabou, Lucie-Aimée Kaffee, and Elena Simperl (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11136. Springer, 374–390.</li>
			<li id="ref-sowa">John Sowa. 1979. <a href="https://www.aclweb.org/anthology/P79-1010/">Semantics of Conceptual Graphs</a>. In <em>17th Annual Meeting of the Association for Computational Linguistics, 29 June - 1 July 1979, University of California at San Diego, La Jolla, CA, USA</em>, Norman K. Sondheimer (Ed.). The Association for Computational Linguistics, 39–44.</li>
			<li id="ref-sowa2">John Sowa. 1987. Semantic Networks. In <em>Encyclopedia of Cognitive Science</em>, Stuart C. Shapiro (Ed.). John Wiley &amp; Sons.</li>
			<li id="ref-SpahiuPPRM16a">Blerina Spahiu, Riccardo Porrini, Matteo Palmonari, Anisa Rula, and Andrea Maurino. 2016. <a href="https://doi.org/10.1007/978-3-319-47602-5_51">ABSTAT: Ontology-Driven Linked Data Summaries with Pattern Minimalization</a>. In <em>The Semantic Web - ESWC 2016 Satellite Events, Heraklion, Crete, Greece, May 29 - June 2, 2016, Revised Selected Papers</em>, Harald Sack, Giuseppe Rizzo, Nadine Steinmetz, Dunja Mladenic, Sören Auer, and Christoph Lange (Eds.). Lecture Notes in Computer Science, vol.&nbsp;9989. Springer, 381–395.</li>
			<li id="ref-SperdutiS97">Alessandro Sperduti and Antonina Starita. 1997. <a href="https://doi.org/10.1109/72.572108">Supervised neural networks for the classification of structures</a>. <em>IEEE Transactions on Neural Networks</em> 8(3), 714–735.</li>
			<li id="ref-jsonld">Manu Sporny, Gregg Kellogg, Markus Lanthaler, Dave Longley, and Niklas Lindström. 2014. <em>JSON-LD 1.0, A JSON-based Serialization for Linked Data, W3C Recommendation 16 January 2014</em>. W3C Recommendation. World Wide Web Consortium. January 16, 2014.</li>
			<li id="ref-SrikanthJ89">Rajan Srikanth and Matthias Jarke. 1989. The Design of Knowledge-Based Systems for Managing Ill-Structured Software Projects. <em>Decision Support Systems</em> 5(4), 425–447.</li>
			<li id="ref-StadlerLHA12">Claus Stadler, Jens Lehmann, Konrad Höffner, and Sören Auer. 2012. <a href="https://doi.org/10.3233/SW-2011-0052">LinkedGeoData: A core for a web of spatial open data</a>. <em>Semantic Web Journal</em> 3(4), 333–354.</li>
			<li id="ref-steyskal2014">Simon Steyskal and Axel Polleres. 2014. Defining expressive access policies for linked data using the ODRL ontology 2.0. In <em>Proceedings of the 10th International Conference on Semantic Systems, SEMANTICS 2014, Leipzig, Germany, September 4-5, 2014</em>, Harald Sack, Agata Filipowska, Jens Lehmann, and Sebastian Hellmann (Eds.). ACM Press, 20–23.</li>
			<li id="ref-cbd">Patrick Stickler. 2005. <em><a href="https://www.w3.org/Submission/2005/SUBM-CBD-20050603/">CBD – Concise Bounded Description, W3C Member Submission 3 June 2005</a></em>. W3C Member Submission. June 3, 2005.</li>
			<li id="ref-StoicaFS19">Radu Stoica, George H. L. Fletcher, and Juan F. Sequeda. 2019. <a href="http://ceur-ws.org/Vol-2369/short06.pdf">On Directly Mapping Relational Databases to Property Graphs</a>. In <em>Proceedings of the 13th Alberto Mendelzon International Workshop on Foundations of Data Management, Asunción, Paraguay, June 3-7, 2019</em>, Aidan Hogan and Tova Milo (Eds.). CEUR Workshop Proceedings, vol.&nbsp;2369. Sun SITE Central Europe (CEUR). 4 pages.</li>
			<li id="ref-stokman1988structuring">Frans N. Stokman and Pieter H. de Vries. 1988. Structuring knowledge in a graph. In <em>Human-Computer Interaction</em>, Gerrit C. van der Veer and Gijsbertus Mulder (Eds.). Springer, 186–206.</li>
			<li id="ref-Straccia09">Umberto Straccia. 2009. <a href="https://doi.org/10.1007/978-3-642-05082-4_12">A Minimal Deductive System for General Fuzzy RDF</a>. In <em>Web Reasoning and Rule Systems, Third International Conference, RR 2009, Chantilly, VA, USA, October 25-26, 2009, Proceedings</em>, Axel Polleres and Terrance Swift (Eds.). Lecture Notes in Computer Science, vol.&nbsp;5837. Springer, 166–181.</li>
			<li id="ref-signalcollect">Philip Stutz, Daniel Strebel, and Abraham Bernstein. 2016. <a href="https://doi.org/10.3233/SW-150176">Signal/Collect12</a>. <em>Semantic Web Journal</em> 7(2), 139–166.</li>
			<li id="ref-suchanek2007yago">Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. YAGO: A core of semantic knowledge unifying WordNet and Wikipedia. In <em>Proceedings of the 16th International Conference on World Wide Web, WWW 2007, Banff, Alberta, Canada, May 8-12, 2007</em>, Carey L. Williamson, Mary Ellen Zurko, Peter F. Patel-Schneider, and Prashant J. Shenoy (Eds.). ACM Press, 697–706.</li>
			<li id="ref-suchanek2008yago">Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2008. YAGO: A Large Ontology from Wikipedia and WordNet. <em>Journal of Web Semantics</em> 6(3), 203–217.</li>
			<li id="ref-SuchanekLBW19">Fabian M. Suchanek, Jonathan Lajus, Armand Boschin, and Gerhard Weikum. 2019. <a href="https://doi.org/10.1007/978-3-030-31423-1_4">Knowledge Representation and Rule Mining in Entity-Centric Knowledge Bases</a>. In <em>Reasoning Web. Explainable Artificial Intelligence - 15th International Summer School 2019, Bolzano, Italy, September 20-24, 2019, Tutorial Lectures</em>, Markus Krötzsch and Daria Stepanova (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11810. Springer, 110–152.</li>
			<li id="ref-2012Sun">Yizhou Sun and Jiawei Han. 2012. <em><a href="https://doi.org/10.2200/S00433ED1V01Y201207DMK005">Mining Heterogeneous Information Networks: Principles and Methodologies</a></em>. Synthesis Lectures on Data Mining and Knowledge Discovery, vol.&nbsp;3. Morgan &amp; Claypool.</li>
			<li id="ref-sun2011pathsim">Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, and Tianyi Wu. 2011. <a href="http://www.vldb.org/pvldb/vol4/p992-sun.pdf">Pathsim: Meta path-based top-k similarity search in heterogeneous information networks</a>. <em>Proceedings of the VLDB Endowment</em> 4(11), 992–1003.</li>
			<li id="ref-SunDNT19">Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. 2019. <a href="https://openreview.net/forum?id=HkgEQnRqYQ">RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space</a>. In <em>7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019</em>. OpenReview.net.</li>
			<li id="ref-SurdeanuTNM12">Mihai Surdeanu, Julie Tibshirani, Ramesh Nallapati, and Christopher D. Manning. 2012. Multi-instance Multi-label Learning for Relation Extraction. In <em>Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2012, July 12-14, 2012, Jeju Island, Korea</em>, Jun'ichi Tsujii, James Henderson, and Marius Pasca (Eds.). The Association for Computational Linguistics, 455–465.</li>
			<li id="ref-syed2018factcheck">Zafar Habeeb Syed, Michael Röder, and Axel-Cyrille Ngonga Ngomo. 2018. FactCheck: Validating RDF Triples Using Textual Evidence. In <em>Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM 2018, Torino, Italy, October 22-26, 2018</em>, Alfredo Cuzzocrea, James Allan, Norman W. Paton, Divesh Srivastava, Rakesh Agrawal, Andrei Z. Broder, Mohammed J. Zaki, K. Selçuk Candan, Alexandros Labrinidis, Assaf Schuster, and Haixun Wang (Eds.). ACM Press, 1599–1602.</li>
			<li id="ref-syed2019copaal">Zafar Habeeb Syed, Michael Röder, and Axel-Cyrille Ngonga Ngomo. 2019. Unsupervised Discovery of Corroborative Paths for Fact Validation. In <em>The Semantic Web - ISWC 2019 - 18th International Semantic Web Conference, Auckland, New Zealand, October 26-30, 2019, Proceedings, Part I</em>, Chiara Ghidini, Olaf Hartig, Maria Maleshkova, Vojtech Svátek, Isabel F. Cruz, Aidan Hogan, Jie Song, Maxime Lefrançois, and Fabien Gandon (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11778. Springer, 630–646.</li>
			<li id="ref-sylvester">James Joseph Sylvester. 1878. Chemistry and Algebra. <em>Nature</em> 17, p284.</li>
			<li id="ref-csvweb">Jeremy Tandy, Ivan Herman, and Gregg Kellogg. 2015. <em>Generating RDF from Tabular Data on the Web, W3C Recommendation 17 December 2015</em>. W3C Recommendation. World Wide Web Consortium. December 17, 2015.</li>
			<li id="ref-csvwmeta">Jeni Tennison and Gregg Kellogg. 2015. <em><a href="https://www.w3.org/TR/2015/REC-tabular-metadata-20151217/">Metadata Vocabulary for Tabular Data, W3C Recommendation 17 December 2015</a></em>. W3C Recommendation. World Wide Web Consortium. December 17, 2015.</li>
			<li id="ref-ThompsonPC14">Bryan B. Thompson, Mike Personick, and Martyn Cutcher. 2014. The Bigdata® RDF Graph Database. In <em>Linked Data Management</em>, Andreas Harth, Katja Hose, and Ralf Schenkel (Eds.). CRC Press, 193–237.</li>
			<li id="ref-ThorntonSSGMPW19">Katherine Thornton, Harold Solbrig, Gregory S. Stupp, José Emilio Labra Gayo, Daniel Mietchen, Eric Prud'hommeaux, and Andra Waagmeester. 2019. <a href="https://doi.org/10.1007/978-3-030-21348-0_39">Using Shape Expressions (ShEx) to Share RDF Data Models and to Guide Curation with Rigorous Validation</a>. In <em>The Semantic Web - 16th International Conference, ESWC 2019, Portorož, Slovenia, June 2-6, 2019, Proceedings</em>, Pascal Hitzler, Miriam Fernández, Krzysztof Janowicz, Amrapali Zaveri, Alasdair J. G. Gray, Vanessa López, Armin Haller, and Karl Hammar (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11503. Springer, 606–620.</li>
			<li id="ref-ThompsonReutersKG">Felice Tobin. 2017. <a href="https://www.thomsonreuters.com/en/press-releases/2017/october/thomson-reuters-launches-first-of-its-kind-knowledge-graph-feed.html">Thomson Reuters Launches first of its kind Knowledge Graph Feed allowing Financial Services customers to accelerate their AI and Digital Strategies</a>. Thomson Reuters Press Release. October 23, 2017.</li>
			<li id="ref-TomaszukASLC19">Dominik Tomaszuk, Renzo Angles, Lukasz Szeremeta, Karol Litman, and Diego Cisterna. 2019. Serialization for Property Graphs. In <em>Beyond Databases, Architectures and Structures. Paving the Road to Smart Data Processing and Analysis - 15th International Conference, BDAS 2019, Ustroń, Poland, May 28-31, 2019, Proceedings</em>, Stanislaw Kozielski, Dariusz Mrozek, Pawel Kasprowski, Bozena Malysiak-Mrozek, and Daniel Kostrzewa (Eds.). Communications in Computer and Information Science, vol.&nbsp;1018. Springer, 57–69.</li>
			<li id="ref-TopperKS12">Gerald Töpper, Magnus Knuth, and Harald Sack. 2012. <a href="https://doi.org/10.1145/2362499.2362505">DBpedia ontology enrichment for inconsistency detection</a>. In <em>I-SEMANTICS 2012 - 8th International Conference on Semantic Systems, I-SEMANTICS '12, Graz, Austria, September 5-7, 2012</em>, Valentina Presutti and Helena Sofia Pinto (Eds.). ACM Press, 33–40.</li>
			<li id="ref-milgram">Jeffrey Travers and Stanley Milgram. 1969. An Experimental Study of the Small World Problem. <em>Sociometry</em> 32(4), 425–443.</li>
			<li id="ref-TrouillonWRGB16">Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. 2016. <a href="http://proceedings.mlr.press/v48/trouillon16.html">Complex Embeddings for Simple Link Prediction</a>. In <em>Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016</em>, Maria-Florina Balcan and Kilian Q. Weinberger (Eds.). JMLR Workshop and Conference Proceedings, vol.&nbsp;48. JMLR.org, 2071–2080.</li>
			<li id="ref-tucker64extension">Ledyard R. Tucker. 1964. The extension of factor analysis to three-dimensional matrices. In <em>Contributions to Mathematical Psychology</em>, H. Gulliksen and N. Frederiksen (Eds.). Holt, Rinehart and Winston, 110–127.</li>
			<li id="ref-TummarelloMBE07">Giovanni Tummarello, Christian Morbidoni, Reto Bachmann-Gmür, and Orri Erling. 2007. RDFSync: Efficient Remote Synchronization of RDF Models. In <em>The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, November 11-15, 2007</em>, Karl Aberer, Key-Sun Choi, Natasha Fridman Noy, Dean Allemang, Kyung-Il Lee, Lyndon J. B. Nixon, Jennifer Golbeck, Peter Mika, Diana Maynard, Riichiro Mizoguchi, Guus Schreiber, and Philippe Cudré-Mauroux (Eds.). Lecture Notes in Computer Science, vol.&nbsp;4825. Springer, 537–551.</li>
			<li id="ref-UdreaRS10">Octavian Udrea, Diego Reforgiato Recupero, and V. S. Subrahmanian. 2010. <a href="https://doi.org/10.1145/1656242.1656245">Annotated RDF</a>. <em>ACM Transactions on Computational Logic</em> 11(2), 10:1–10:41.</li>
			<li id="ref-UrbaniKMHB12">Jacopo Urbani, Spyros Kotoulas, Jason Maassen, Frank van Harmelen, and Henri E. Bal. 2012. <a href="https://doi.org/10.1016/j.websem.2011.05.004">WebPIE: A Web-scale Parallel Inference Engine using MapReduce</a>. <em>Journal of Web Semantics</em> 10, 59–75.</li>
			<li id="ref-VargasAHL19">Hernán Vargas, Carlos Buil Aranda, Aidan Hogan, and Claudia López. 2019. <a href="https://doi.org/10.1007/978-3-030-30793-6_37">RDF Explorer: A Visual SPARQL Query Builder</a>. In <em>The Semantic Web - ISWC 2019 - 18th International Semantic Web Conference, Auckland, New Zealand, October 26-30, 2019, Proceedings, Part I</em>, Chiara Ghidini, Olaf Hartig, Maria Maleshkova, Vojtech Svátek, Isabel F. Cruz, Aidan Hogan, Jie Song, Maxime Lefrançois, and Fabien Gandon (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11778. Springer, 647–663.</li>
			<li id="ref-VashishthJT18">Shikhar Vashishth, Prince Jain, and Partha Talukdar. 2018. <a href="https://doi.org/10.1145/3178876.3186030">CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information</a>. In <em>Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France, April 23-27, 2018</em>, Pierre-Antoine Champin, Fabien L. Gandon, Mounia Lalmas, and Panagiotis G. Ipeirotis (Eds.). ACM Press, 1317–1327.</li>
			<li id="ref-OntolearnReloaded13">Paola Velardi, Stefano Faralli, and Roberto Navigli. 2013. OntoLearn Reloaded: A Graph-Based Algorithm for Taxonomy Induction. <em>Computational Linguistics</em> 39(3), 665–707.</li>
			<li id="ref-VelickovicCCRLB18">Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. <a href="https://openreview.net/forum?id=rJXMpikCZ">Graph Attention Networks</a>. In <em>6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings</em>. OpenReview.net. 12 pages.</li>
			<li id="ref-VerborghSCCMW14">Ruben Verborgh, Miel Vander Sande, Pieter Colpaert, Sam Coppens, Erik Mannens, and Rik Van de Walle. 2014. <a href="http://ceur-ws.org/Vol-1184/ldow2014_paper_04.pdf">Web-Scale Querying through Linked Data Fragments</a>. In <em>Proceedings of the Workshop on Linked Data on the Web, co-located with the 23rd International World Wide Web Conference (WWW 2014), Seoul, Korea, April 8, 2014</em>, Christian Bizer, Tom Heath, Sören Auer, and Tim Berners-Lee (Eds.). CEUR Workshop Proceedings, vol.&nbsp;1184. Sun SITE Central Europe (CEUR). 10 pages.</li>
			<li id="ref-VerborghSHHVMHC16">Ruben Verborgh, Miel Vander Sande, Olaf Hartig, Joachim Van Herwegen, Laurens De Vocht, Ben De Meester, Gerald Haesendonck, and Pieter Colpaert. 2016. Triple Pattern Fragments: A low-cost knowledge graph interface for the Web. <em>Journal of Web Semantics</em> 37–38, 184–206.</li>
			<li id="ref-villata2012licenses">Serena Villata and Fabien Gandon. 2012. <a href="http://ceur-ws.org/Vol-905/VillataAndGandon_COLD2012.pdf">Licenses Compatibility and Composition in the Web of Data</a>. In <em>Proceedings of the Third International Workshop on Consuming Linked Data, COLD 2012, Boston, MA, USA, November 12, 2012</em>, Juan F. Sequeda, Andreas Harth, and Olaf Hartig (Eds.). CEUR Workshop Proceedings, vol.&nbsp;905. Sun SITE Central Europe (CEUR). 12 pages.</li>
			<li id="ref-Villata2011">Serena Villata, Nicolas Delaforge, Fabien Gandon, and Amelie Gyrard. 2011. An Access Control Model for Linked Data. In <em>On the Move to Meaningful Internet Systems: OTM 2011 Workshops - Confederated International Workshops and Posters: EI2N+NSF ICE, ICSP+INBAST, ISDE, ORM, OTMA, SWWS+MONET+SeDeS, and VADER 2011, Hersonissos, Crete, Greece, October 17-21, 2011. Proceedings</em>, Robert Meersman, Tharam S. Dillon, and Pilar Herrero (Eds.). Lecture Notes in Computer Science, vol.&nbsp;7046. Springer, 454–463.</li>
			<li id="ref-Volker2015">Johanna Völker, Daniel Fleischhacker, and Heiner Stuckenschmidt. 2015. <a href="https://doi.org/10.1016/j.websem.2015.07.001">Automatic Acquisition of Class Disjointness</a>. <em>Journal of Web Semantics</em> 35(P2), 124–139.</li>
			<li id="ref-silk">Julius Volz, Christian Bizer, Martin Gaedke, and Georgi Kobilarov. 2009. Discovering and Maintaining Links on the Web of Data. In <em>The Semantic Web - ISWC 2009, 8th International Semantic Web Conference, ISWC 2009, Chantilly, VA, USA, October 25-29, 2009. Proceedings</em>, Abraham Bernstein, David R. Karger, Tom Heath, Lee Feigenbaum, Diana Maynard, Enrico Motta, and Krishnaprasad Thirunarayan (Eds.). Lecture Notes in Computer Science, vol.&nbsp;5823. Springer, 650–665.</li>
			<li id="ref-VrandecicK14">Denny Vrandeči&cacute; and Markus Krötzsch. 2014. <a href="https://doi.org/10.1145/2629489">Wikidata: A Free Collaborative Knowledgebase</a>. <em>Communications of the ACM</em> 57(10), 78–85.</li>
			<li id="ref-WagnerTLHS12">Andreas Wagner, Duc Thanh Tran, Günter Ladwig, Andreas Harth, and Rudi Studer. 2012. <a href="https://doi.org/10.1007/978-3-642-30284-8_11">Top-$k$ Linked Data Query Processing</a>. In <em>The Semantic Web: Research and Applications - 9th Extended Semantic Web Conference, ESWC 2012, Heraklion, Crete, Greece, May 27-31, 2012. Proceedings</em>, Elena Simperl, Philipp Cimiano, Axel Polleres, Óscar Corcho, and Valentina Presutti (Eds.). Lecture Notes in Computer Science, vol.&nbsp;7295. Springer, 56–71.</li>
			<li id="ref-WagnerGGM16">Claudia Wagner, Eduardo Graells-Garrido, David García, and Filippo Menczer. 2016. Women through the glass ceiling: gender asymmetries in Wikipedia. <em>EPJ Data Science</em> 5(1), p5.</li>
			<li id="ref-wang2014knowledge">Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. <a href="http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8531">Knowledge Graph Embedding by Translating on Hyperplanes</a>. In <em>Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, July 27 -31, 2014, Québec City, Québec, Canada</em>, Carla E. Brodley and Peter Stone (Eds.). AAAI Press, 1112–1119.</li>
			<li id="ref-WangWG15">Quan Wang, Bin Wang, and Li Guo. 2015. <a href="http://ijcai.org/Abstract/15/264">Knowledge Base Completion Using Embeddings and Rules</a>. In <em>Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015</em>, Qiang Yang and Michael J. Wooldridge (Eds.). IJCAI/AAAI, 1859–1866.</li>
			<li id="ref-Wang2017KGEmbedding">Quan Wang, Zhendong Mao, Bin Wang, and Li Guo. 2017. <a href="https://doi.org/10.1109/TKDE.2017.2754499">Knowledge Graph Embedding: A Survey of Approaches and Applications</a>. <em>IEEE Transactions on Knowledge and Data Engineering</em> 29(12), 2724–2743.</li>
			<li id="ref-WangWLCZQ18">Meng Wang, Ruijie Wang, Jun Liu, Yihe Chen, Lei Zhang, and Guilin Qi. 2018. <a href="https://doi.org/10.1007/978-3-030-00671-6_30">Towards Empty Answers in SPARQL: Approximating Querying with RDF Embedding</a>. In <em>The Semantic Web - ISWC 2018 - 17th International Semantic Web Conference, Monterey, CA, USA, October 8-12, 2018, Proceedings, Part I</em>, Denny Vrandeči&cacute;, Kalina Bontcheva, Mari Carmen Suárez-Figueroa, Valentina Presutti, Irene Celino, Marta Sabou, Lucie-Aimée Kaffee, and Elena Simperl (Eds.). Lecture Notes in Computer Science, vol.&nbsp;11136. Springer, 513–529.</li>
			<li id="ref-WangJSWYCY19">Xiao Wang, Houye Ji, Chuan Shi, Bai Wang, Yanfang Ye, Peng Cui, and Philip S. Yu. 2019. <a href="https://doi.org/10.1145/3308558.3313562">Heterogeneous Graph Attention Network</a>. In <em>The World Wide Web Conference, WWW 2019, San Francisco, CA, USA, May 13-17, 2019</em>, Ling Liu, Ryen W. White, Amin Mantrach, Fabrizio Silvestri, Julian J. McAuley, Ricardo Baeza-Yates, and Leila Zia (Eds.). ACM Press, 2022–2032.</li>
			<li id="ref-WeikumT10">Gerhard Weikum and Martin Theobald. 2010. From information to knowledge: harvesting entities and relationships from web sources. In <em>Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2010, June 6-11, 2010, Indianapolis, Indiana, USA</em>, Jan Paredaens and Dirk Van Gucht (Eds.). ACM Press, 65–76.</li>
			<li id="ref-West14">Robert West, Evgeniy Gabrilovich, Kevin Murphy, Shaohua Sun, Rahul Gupta, and Dekang Lin. 2014. Knowledge Base Completion via Search-Based Question Answering. In <em>23rd International World Wide Web Conference, WWW '14, Seoul, Republic of Korea, April 7-11, 2014</em>, Chin-Wan Chung, Andrei Z. Broder, Kyuseok Shim, and Torsten Suel (Eds.). ACM Press, 515–526.</li>
			<li id="ref-wilkinson2016fair">Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, Jildau Bouwman, Anthony J. Brookes, Tim Clark, Mercè Crosas, Ingrid Dillo, Olivier Dumon, Scott Edmunds, Chris T. Evelo, Richard Finkers, Alejandra Gonzalez-Beltran, Alasdair J.G. Gray, Paul Groth, Carole Goble, Jeffrey S. Grethe, Jaap Heringa, Peter A.C ’t Hoen, Rob Hooft, Tobias Kuhn, Ruben Kok, Joost Kok, Scott J. Lusher, Maryann E. Martone, Albert Mons, Abel L. Packer, Bengt Persson, Philippe Rocca-Serra, Marco Roos, Rene van Schaik, Susanna-Assunta Sansone, Erik Schultes, Thierry Sengstag, Ted Slater, George Strawn, Morris A. Swertz, Mark Thompson, Johan van der Lei, Erik van Mulligen, Jan Velterop, Andra Waagmeester, Peter Wittenburg, Katherine Wolstencroft, Jun Zhao, and Barend Mons. 2016. <a href="https://doi.org/10.1038/sdata.2016.18">The FAIR Guiding Principles for scientific data management and stewardship</a>. <em>Scientific Data</em> 3. 9 pages.</li>
			<li id="ref-woods">William A. Woods. 1975. What's in a Link: Foundations for Semantic Networks. In <em>Representation and Understanding</em>, Daniel G. Bobrow and Allan Collins (Eds.). Studies in Cognitive Science. Elsevier, 35–82.</li>
			<li id="ref-WuHH18">Gong-Qing Wu, Ying He, and Xuegang Hu. 2018. Entity Linking: An Issue to Extract Corresponding Entity With Knowledge Base. <em>IEEE Access</em> 6, 6220–6231.</li>
			<li id="ref-abs-1901-00596">Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. 2019. <a href="http://arxiv.org/abs/1901.00596">A Comprehensive Survey on Graph Neural Networks</a>. <em>CoRR</em> abs/1901.00596. 22 pages.</li>
			<li id="ref-WylotHCS18">Marcin Wylot, Manfred Hauswirth, Philippe Cudré-Mauroux, and Sherif Sakr. 2018. <a href="https://doi.org/10.1145/3177850">RDF Data Storage and Query Processing Schemes: A Survey</a>. <em>ACM Computing Surveys</em> 51(4), 84:1–84:36.</li>
			<li id="ref-XiaoCKLPRZ18">Guohui Xiao, Diego Calvanese, Roman Kontchakov, Domenico Lembo, Antonella Poggi, Riccardo Rosati, and Michael Zakharyaschev. 2018. Ontology-Based Data Access: A Survey. In <em>Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden</em>, Jérôme Lang (Ed.). IJCAI/AAAI, 5511–5519.</li>
			<li id="ref-XinGFS13">Reynold S. Xin, Joseph E. Gonzalez, Michael J. Franklin, and Ion Stoica. 2013. <a href="http://event.cwi.nl/grades2013/02-xin.pdf">GraphX: a resilient distributed graph system on Spark</a>. In <em>First International Workshop on Graph Data Management Experiences and Systems, GRADES 2013, co-located with SIGMOD/PODS 2013, New York, NY, USA, June 24, 2013</em>, Peter A. Boncz and Thomas Neumann (Eds.). CWI/ACM, 2:1–2:6.</li>
			<li id="ref-XinRZFSS13">Reynold S. Xin, Josh Rosen, Matei Zaharia, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2013. <a href="https://doi.org/10.1145/2463676.2465288">Shark: SQL and rich analytics at scale</a>. In <em>Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, June 22-27, 2013</em>, Kenneth A. Ross, Divesh Srivastava, and Dimitris Papadias (Eds.). ACM Press, 13–24.</li>
			<li id="ref-XuHZG13">Wei Xu, Raphael Hoffmann, Le Zhao, and Ralph Grishman. 2013. <a href="https://www.aclweb.org/anthology/volumes/P13-2/">Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction</a>. In <em>Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013, 4-9 August 2013, Sofia, Bulgaria, Volume 2: Short Papers</em>. The Association for Computational Linguistics, 665–670.</li>
			<li id="ref-XuHLJ19">Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2019. <a href="https://openreview.net/forum?id=ryGs6iA5Km">How Powerful are Graph Neural Networks?</a>. In <em>7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019</em>. OpenReview.net. 17 pages.</li>
			<li id="ref-distmult">Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2015. <a href="http://arxiv.org/abs/1412.6575">Embedding Entities and Relations for Learning and Inference in Knowledge Bases</a>. In <em>3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings</em>, Yoshua Bengio and Yann LeCun (Eds.). 12 pages.</li>
			<li id="ref-YangYC17">Fan Yang, Zhilin Yang, and William W. Cohen. 2017. <a href="http://papers.nips.cc/paper/6826-differentiable-learning-of-logical-rules-for-knowledge-base-reasoning">Differentiable Learning of Logical Rules for Knowledge Base Reasoning</a>. In <em>Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4–9 December 2017, Long Beach, CA, USA</em>, Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.), 2319–2328.</li>
			<li id="ref-YangXJWHW20">Luwei Yang, Zhibo Xiao, Wen Jiang, Yi Wei, Yi Hu, and Hao Wang. 2020. <a href="https://doi.org/10.1007/978-3-030-45442-5_53">Dynamic Heterogeneous Graph Embedding Using Hierarchical Attentions</a>. In <em>Advances in Information Retrieval - 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14-17, 2020, Proceedings, Part II</em>, Joemon M. Jose, Emine Yilmaz, João Magalhães, Pablo Castells, Nicola Ferro, Mário J. Silva, and Flávio Martins (Eds.). Lecture Notes in Computer Science, vol.&nbsp;12036. Springer, 425–432.</li>
			<li id="ref-YasseriSRKK12">Taha Yasseri, Robert Sumi, András Rung, András Kornai, and János Kertész. 2012. Dynamics of Conflicts in Wikipedia. <em>PLOS One</em> 7(6). 12 pages.</li>
			<li id="ref-yin2008truth">Xiaoxin Yin, Jiawei Han, and Philip S. Yu. 2008. Truth discovery with multiple conflicting information providers on the web. <em>IEEE Transactions on Knowledge and Data Engineering</em> 20(6), 796–808.</li>
			<li id="ref-YogatamaGL15">Dani Yogatama, Daniel Gillick, and Nevena Lazic. 2015. <a href="https://www.aclweb.org/anthology/volumes/P15-2/">Embedding Methods for Fine Grained Entity Type Classification</a>. In <em>Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL 2015, July 26-31, 2015, Beijing, China, Volume 1: Short Papers</em>. The Association for Computational Linguistics, 291–296.</li>
			<li id="ref-ZaveriRMPLA16">Amrapali Zaveri, Anisa Rula, Andrea Maurino, Ricardo Pietrobon, Jens Lehmann, and Sören Auer. 2016. <a href="https://doi.org/10.3233/SW-150175">Quality assessment for Linked Data: A Survey</a>. <em>Semantic Web Journal</em> 7(1), 63–93.</li>
			<li id="ref-zhang2016collaborative">Fuzheng Zhang, Nicholas Jing Yuan, Defu Lian, Xing Xie, and Wei-Ying Ma. 2016. <a href="https://doi.org/10.1145/2939672.2939673">Collaborative knowledge base embedding for recommender systems</a>. In <em>Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016</em>, Balaji Krishnapuram, Mohak Shah, Alexander J. Smola, Charu C. Aggarwal, Dou Shen, and Rajeev Rastogi (Eds.). ACM Press, 353–362.</li>
			<li id="ref-ZhangCHYAL19">Weizhen Zhang, Han Cao, Fei Hao, Lu Yang, Muhib Ahmad, and Yifei Li. 2019. <a href="https://doi.org/10.1007/978-981-32-9244-4_3">The Chinese Knowledge Graph on Domain-Tourism</a>. In <em>Advanced Multimedia and Ubiquitous Engineering, MUE/FutureTech 2019</em>, James J. Park, Laurence T. Yang, Young-Sik Jeong, and Fei Hao (Eds.). Lecture Notes in Electrical Engineering, vol.&nbsp;590. Springer, 20–27.</li>
			<li id="ref-zhang">Lei Zhang. 2002. <em>Knowledge Graph Theory and Structural Parsing</em>. Ph.D. dissertation. University of Twente.</li>
			<li id="ref-zhao2015automatic">Mingbo Zhao, Tommy WS Chow, Zhao Zhang, and Bing Li. 2015. Automatic image annotation via compact graph based semi-supervised learning. <em>Knowledge-based Systems</em> 76, 148–165.</li>
			<li id="ref-ZhengWBHZX17">Suncong Zheng, Feng Wang, Hongyun Bao, Yuexing Hao, Peng Zhou, and Bo Xu. 2017. <a href="https://www.aclweb.org/anthology/volumes/P17-1/">Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme</a>. In <em>Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers</em>, Regina Barzilay and Min-Yen Kan (Eds.). The Association for Computational Linguistics, 1227–1236.</li>
			<li id="ref-ZhengYZC18">Weiguo Zheng, Jeffrey Xu Yu, Lei Zou, and Hong Cheng. 2018. <a href="https://doi.org/10.14778/3236187.3236192">Question Answering Over Knowledge Graphs: Question Understanding Via Template Decomposition</a>. <em>Proceedings of the VLDB Endowment</em> 11(11), 1373–1386.</li>
			<li id="ref-ZhouP08">Bin Zhou and Jian Pei. 2008. Preserving Privacy in Social Networks Against Neighborhood Attacks. In <em>Proceedings of the 24th International Conference on Data Engineering, ICDE 2008, April 7-12, 2008, Cancún, Mexico</em>, Gustavo Alonso, José A. Blakeley, and Arbee L. P. Chen (Eds.). IEEE Computer Society, 506–515.</li>
			<li id="ref-ZhouP11">Bin Zhou and Jian Pei. 2011. The <em>k</em>-anonymity and <em>l</em>-diversity approaches for privacy preservation in social networks against neighborhood attacks. <em>Knowledge and Information Systems</em> 28(1), 47–77.</li>
			<li id="ref-ZhouSZZ05">Guodong Zhou, Jian Su, Jie Zhang, and Min Zhang. 2005. Exploring Various Knowledge in Relation Extraction. In <em>ACL 2005, 43rd Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, 25-30 June 2005, University of Michigan, USA</em>, Kevin Knight, Hwee Tou Ng, and Kemal Oflazer (Eds.). The Association for Computational Linguistics, 427–434.</li>
			<li id="ref-zimm-etal-2012-JWS">Antoine Zimmermann, Nuno Lopes, Axel Polleres, and Umberto Straccia. 2012. <a href="https://doi.org/10.1016/j.websem.2011.08.006">A General Framework for Representing, Reasoning and Querying with Annotated Semantic Web Data</a>. <em>Journal of Web Semantics</em> 12, 72–95.</li>
			<li id="ref-ZouCO09a">Lei Zou, Lei Chen, and M. Tamer Özsu. 2009. K-Automorphism: A General Framework For Privacy Preserving Network Publication. <em>Proceedings of the VLDB Endowment</em> 2(1), 946–957.</li>
		</ul>
	</section>
	<section id="app">
	<section id="chap-defs" class="appendix">
		<h2>Background</h2>
		<p>We now discuss the broader historical context that has paved the way for the modern advent of knowledge graphs, and the definitions of the notion of “knowledge graph” that have been proposed both before and after the announcement of the Google Knowledge Graph&nbsp;[<a href="#ref-GoogleKG">Singhal, 2012</a>]. We remark that the discussion presented here builds upon (but does not subsume) previous discussion by <a href="#ref-EhrlingerW16">Ehrlinger and Wöß [2016]</a> and <a href="#ref-Bergman19">Bergman [2019]</a>, which we refer to for further details. Though our goal is to be comprehensive, the list of historical references should not be considered exhaustive.</p>

		<section id="app-historical" class="sectionapp">
		<h3>Historical Perspective</h3>
		<p>The lineage of knowledge graphs can be traced back to the origins of diagrammatic forms of knowledge representation: a tradition going back at least as far as Aristotle (\(\sim\)350&nbsp;BC), followed by notions such as Euler circles and Venn diagrams that helped humans to reason through visual insights. Centuries later, a variety of researchers – particularly <a href="#ref-sylvester">Sylvester [1878]</a>, <a href="#ref-peirce">Peirce [1878]</a> and <a href="#ref-frege">Frege [1879]</a> – independently devised formal diagrammatic systems that not only facilitate reasoning, but also codify reasoning; in other words, their goal was to use diagrams as formal systems.</p>
		<p>With the advent of digital computers, programs began to be used to perform formal reasoning and to code representations of knowledge. These developments can be traced back to works such as those of <a href="#ref-ritchens">Ritchens [1956]</a>, <a href="#ref-quillian">Quillian [1963]</a>, and <a href="#ref-milgram">Travers and Milgram [1969]</a>, which focused on formal representations for natural language, information, and knowledge. These early works were limited (at least by modern standards) by the poor computational resources available. From the formal (logical) point of view, a number of influential developments took place in the 70’s, including the introduction of <em>frames</em> by <a href="#ref-minsky">Minsky [1974]</a>, the formalisation of <em>semantic networks</em> by <a href="#ref-Brachman">Brachman [1977]</a> and <a href="#ref-woods">Woods [1975]</a>, and the proposal of <em>conceptual graphs</em> by <a href="#ref-sowa">Sowa [1979]</a>. These works tried to integrate formal logic with diagrammatic representations of knowledge by giving a (more-or-less) formal semantics to graph representations. But as <a href="#ref-sowa">Sowa [1979]</a> later wrote in the entry “<em>Semantic networks</em>” of the Encyclopedia of Cognitive Science: “<em>Woods (1975) and McDermott (1976) observed, the semantic networks themselves have no well-defined semantics. Standard predicate calculus does have a precisely defined, model theoretic semantics; it is adequate for describing mathematical theories with a closed set of axioms. But the real world is messy, incompletely explored, and full of unexpected surprises.</em>”</p>
		<p>From this era of exploration and attempts to define programs to simulate the visual and formal reasoning of humans, the following key notions were established that are still of relevance today:</p>
		<ul>
			<li>knowledge representation using diagrams (often graphs) and visual means;</li>
			<li>computational procedures and algorithms to perform formal reasoning;</li>
			<li>combinations of formal (logical) and statistical forms of reasoning;</li>
			<li>relevance of diverse types of data (e.g., images, audio) as knowledge sources.</li>
		</ul>
		<p>These works on conceptual graphs, semantic networks, and frames were direct predecessors of Description Logics, which aimed to give a well-defined semantics to these earlier notions towards building practical reasoning systems for decidable logics. Description Logics stem from the KL-ONE system proposed by <a href="#ref-BrachmanS85">Brachman and Schmolze [1985]</a>, and the “<em>attributive concept descriptions with complements</em>” language (aka \(\mathcal{ALC}\)) proposed by <a href="#ref-Schmidt-SchaussS91">Schmidt-Schauß and Smolka [1991]</a>. Description Logics would be further explored in later years (see Section&nbsp;<a href="#sssec-dls">4.3.2</a>) and formed the underpinnings of the Web Ontology Language (OWL) standard&nbsp;[<a href="#ref-OWL2">Hitzler et al., 2012</a>]. Together with the Resource Description Framework (RDF)&nbsp;[<a href="#ref-rdf11">Cyganiak et al., 2014</a>], OWL would become one of the main building blocks of the Semantic Web&nbsp;[<a href="#ref-berners-lee01">Berners-Lee et al., 2001</a>], within which many of the formative ideas and standards underlying knowledge graphs would later be developed, including not only RDF and OWL, but also RDFS&nbsp;[<a href="#ref-RDFS">Brickley and Guha, 2014</a>], SPARQL&nbsp;[<a href="#ref-sparql11">Harris et al., 2013</a>], Linked Data principles&nbsp;[<a href="#ref-ldprinciples">Berners-Lee, 2006</a>], Shape Expressions&nbsp;[<a href="#ref-RDFS">Brickley and Guha, 2014</a>, <a href="#ref-ThorntonSSGMPW19">Thornton et al., 2019</a>], and indeed, many of the other concepts, standards and techniques discussed in this book. 
Most of the open knowledge graphs discussed in Section&nbsp;<a href="#sec-openkgs">10.1</a> – including BabelNet&nbsp;[<a href="#ref-NavigliPonzetto:12">Navigli and Ponzetto, 2012</a>], DBpedia&nbsp;[<a href="#ref-LehmannIJJKMHMK15">Lehmann et al., 2015</a>], Freebase&nbsp;[<a href="#ref-bollacker2007platform">Bollacker et al., 2007a</a>], Wikidata&nbsp;[<a href="#ref-VrandecicK14">Vrandečić and Krötzsch, 2014</a>], YAGO&nbsp;[<a href="#ref-suchanek2007yago">Suchanek et al., 2007</a>], etc. – have either emerged from the Semantic Web community, or would later adopt the standards it proposes.</p>
		</section>

		<section id="app-pre2012" class="sectionapp">
		<h3>“Knowledge Graphs”: Pre-2012</h3>
		<p>Long before the 2012 announcement of the Google Knowledge Graph, various authors had used the phrase “knowledge graph” in publications stretching back to the 40’s, but with unrelated meaning. To the best of our knowledge, the first reference to a “knowledge graph” of relevance to the modern meaning was in a paper by <a href="#ref-Schneider72">Schneider [1973]</a> in the area of computerised instructional systems for education, where a knowledge graph – in his case a directed graph whose nodes are units of knowledge (concepts) that a student should acquire, and whose edges denote dependencies between such units of knowledge – is used to represent and store an instructional course on a computer. An analogous notion of a “knowledge graph” was used by <a href="#ref-MarchiM74">Marchi and Miguel [1974]</a> to study paths through the knowledge units of an instructional course that yield the highest payoffs for teachers and students in a game-theoretic sense. Around the same time, in a paper on linguistics, <a href="#ref-Kummel73">Kümmel [1973]</a> describes a numerical representation of knowledge, with “radicals” – referring to some symbol with meaning – forming the nodes of a knowledge graph.</p>
		<p>Further authors were to define instantiations of knowledge graphs in the 80’s. <a href="#ref-rada1986gradualness">Rada [1986]</a> defines a knowledge graph in the context of medical expert systems, where domain knowledge is defined as a weighted graph, over which a “gradual” learning process is applied to refine knowledge by making small changes to weights. <a href="#ref-Bakker">Bakker [1987]</a> defines a knowledge graph with the purpose of cumulatively representing content gleaned from medical and sociological texts, with a focus on causal relationships. Work on knowledge graphs from the same group would continue over the years, with contributions by <a href="#ref-stokman1988structuring">Stokman and de Vries [1988]</a> further introducing mereological (<em>part of</em>) and instantiation (<em>is a</em>) relations to the knowledge graph, and thereafter by <a href="#ref-james">James [1992]</a>, <a href="#ref-Hoede95">Hoede [1995]</a>, <a href="#ref-zhang">Zhang [2002]</a>, <a href="#ref-Popping03">Popping [2003]</a>, amongst others, in the decades that followed&nbsp;[<a href="#ref-NurdiatiH08">Nurdiati and Hoede, 2012</a>]. The notion of knowledge graph used in such works considered a fixed number of relations. Other authors pursued their own parallel notions of knowledge graphs towards the end of the 80’s. <a href="#ref-rappaport1988dynamic">Rappaport and Gouyet [1988]</a> describe a user interface for visualising a knowledge-base – composed of facts and rules – using a knowledge graph that connects related elements of the knowledge-base. <a href="#ref-SrikanthJ89">Srikanth and Jarke [1989]</a> use the notion of a knowledge graph to represent the entities and relations involved in projects, particularly software projects, where partitioning techniques are applied to the knowledge graph to modularise the knowledge required in the project.</p>
		<p>Continuing to the 90’s, the notion of a “knowledge graph” would again arise in different, seemingly independent settings. <a href="#ref-de1990hybrid">De Raedt et al. [1990]</a> propose a knowledge graph as a directed graph composed of a taxonomy of instances being related with weighted edges to a taxonomy of classes; they use symbolic learning to extract such knowledge graphs from examples. <a href="#ref-MachadoR90">Machado and Freitas da Rocha [1990]</a> define a knowledge graph as an acyclic, weighted <em>and</em>–<em>or</em> graph,<sup class="fnmark" id="fnm39"><a href="#fn39">39</a></sup><span class="footnote" id="fn39"><sup><a href="#fnm39">note 39</a></sup> An <em>and</em>–<em>or</em> graph denotes dependency relations, where <em>and</em> denotes a conjunction of sub-goals on which a goal depends, while <em>or</em> denotes a disjunction of sub-goals.</span> defining fuzzy dependencies that connect observations to hypotheses through intermediary nodes. These knowledge graphs are elicited from domain experts and can be used to generate neural networks for selecting hypotheses from input observations. Knowledge graphs were again later used by <a href="#ref-DiengGTC92">Dieng et al. [1992]</a> to represent the results of knowledge acquisition from experts. <a href="#ref-ShimonyDS97">Shimony et al. [1997]</a> rather define a knowledge graph based on a <em>Bayesian knowledge base</em> – i.e., a Bayesian network that permits directed cycles – over which Bayesian inference can be applied. This definition was further built upon in a later work by <a href="#ref-JrS99">Santos Jr. and Santos [1999]</a>.</p>
		<p>Moving to the 00’s, <a href="#ref-Jiang02">Jiang and Ma [2002]</a> introduce the notion of “plan knowledge graphs” where nodes represent goals and edges dependencies between goals, further encoding supporting degrees that can change upon further evidence. Search algorithms are then defined on the graph to determine a plan for a particular goal. <a href="#ref-HelmsB05">Helms and Buijsrogge [2005]</a> propose a knowledge graph to represent the flow of knowledge in an organisation, with nodes representing knowledge actors (creators, sharers, users), edges representing knowledge flow from one actor to another, and edge weights indicating the “velocity” (delay of flow) and “viscosity” (the depth of knowledge transferred). Graph algorithms are then proposed to find bottlenecks in knowledge flow. <a href="#ref-KasneciSIRW08">Kasneci et al. [2008]</a> propose a search engine for knowledge graphs, defined to be weighted directed edge-labelled graphs, where weights denote confidence scores based on the centrality of source documents from which the edge/relation was extracted. From the same group, <a href="#ref-ElbassuoniRSSW09">Elbassuoni et al. [2009]</a> adopt a similar notion of a knowledge graph, adding edge attributes to include keywords from the source, a count of supporting sources, etc., showing how the graph can be queried. <a href="#ref-CourseyM09">Coursey and Mihalcea [2009]</a> construct a knowledge graph from Wikipedia, where nodes represent Wikipedia articles and categories, while edges represent the proximity of nodes. Given an input text, entity linking and centrality measures are applied over the knowledge graph to determine relevant Wikipedia categories for the text.</p>
		<p>Concluding with the 10’s (prior to 2012), <a href="#ref-PechsiriP10">Pechsiri and Piriyakul [2010]</a> use knowledge graphs to capture “explanation knowledge” – the knowledge of why something is the way it is – by representing events as nodes and causal relationships as edges, claiming that this graphical notation offers more intuitive explanations to users; their work focuses on extracting such graphs from text. <a href="#ref-CorbyF10">Corby and Faron-Zucker [2010]</a> use the phrase “knowledge graph” in a general way to denote any graph encoding knowledge, proposing an abstract machine for querying such graphs.</p>
		<p>Other phrases were used to represent similar notions by other authors, including “information graphs”&nbsp;[<a href="#ref-Kummel73">Kümmel, 1973</a>], “information networks”&nbsp;[<a href="#ref-sun2011pathsim">Sun et al., 2011</a>], “knowledge networks”&nbsp;[<a href="#ref-ciampaglia2015computational">Ciampaglia et al., 2015</a>], as well as “semantic networks”&nbsp;[<a href="#ref-Brachman">Brachman, 1977</a>, <a href="#ref-woods">Woods, 1975</a>, <a href="#ref-NavigliPonzetto:12">Navigli and Ponzetto, 2012</a>] and “conceptual graphs”&nbsp;[<a href="#ref-sowa">Sowa, 1979</a>], as mentioned previously. Here we exclusively considered works that (happen to) use the phrase “knowledge graph” prior to Google’s announcement of their knowledge graph in 2012, where we see that many works had independently coined this phrase for different purposes. Similar to the current practice, all of the works of this period consider a knowledge graph to be formed of a set of nodes denoting entities of interest and a set of edges denoting relations between those entities, with different entities and relations being considered in different works. Some works add extra elements to these knowledge graphs, such as edge weights, edge labels, or other metadata&nbsp;[<a href="#ref-ElbassuoniRSSW09">Elbassuoni et al., 2009</a>]. 
Other trends include knowledge acquisition from experts&nbsp;[<a href="#ref-rada1986gradualness">Rada, 1986</a>, <a href="#ref-MachadoR90">Machado and Freitas da Rocha, 1990</a>, <a href="#ref-DiengGTC92">Dieng et al., 1992</a>] and knowledge extraction from text&nbsp;[<a href="#ref-Bakker">Bakker, 1987</a>, <a href="#ref-stokman1988structuring">Stokman and de Vries, 1988</a>, <a href="#ref-james">James, 1992</a>, <a href="#ref-Hoede95">Hoede, 1995</a>], combinations of symbolic and inductive methods&nbsp;[<a href="#ref-MachadoR90">Machado and Freitas da Rocha, 1990</a>, <a href="#ref-de1990hybrid">De Raedt et al., 1990</a>, <a href="#ref-ShimonyDS97">Shimony et al., 1997</a>, <a href="#ref-JrS99">Santos Jr. and Santos, 1999</a>], as well as the use of rules&nbsp;[<a href="#ref-rappaport1988dynamic">Rappaport and Gouyet, 1988</a>], ontologies&nbsp;[<a href="#ref-Hoede95">Hoede, 1995</a>], graph analytics&nbsp;[<a href="#ref-SrikanthJ89">Srikanth and Jarke, 1989</a>, <a href="#ref-HelmsB05">Helms and Buijsrogge, 2005</a>, <a href="#ref-KasneciSIRW08">Kasneci et al., 2008</a>], learning&nbsp;[<a href="#ref-rada1986gradualness">Rada, 1986</a>, <a href="#ref-de1990hybrid">De Raedt et al., 1990</a>, <a href="#ref-ShimonyDS97">Shimony et al., 1997</a>, <a href="#ref-JrS99">Santos Jr. and Santos, 1999</a>], amongst other techniques. Later papers (2008–2010) by <a href="#ref-KasneciSIRW08">Kasneci et al. [2008]</a>, <a href="#ref-ElbassuoniRSSW09">Elbassuoni et al. [2009]</a>, <a href="#ref-CourseyM09">Coursey and Mihalcea [2009]</a> and <a href="#ref-CorbyF10">Corby and Faron-Zucker [2010]</a> introduce notions of “knowledge graph” that are more similar to the current practice.</p>
		<p>However, some trends are not reflected in current practice. Of note is that many of the knowledge graphs defined in this period consider edges as denoting a form of dependence or causality, where <span class="gnode">\(x\)</span><img class="tip" src="images/edge-source.png" width="8" alt="arrow source"/><img class="tip" src="images/edge-tip.png" width="15" alt="arrow tip rightward"/><span class="gnode">\(y\)</span> may denote that \(x\) is a prerequisite for \(y\)&nbsp;[<a href="#ref-Schneider72">Schneider, 1973</a>, <a href="#ref-MarchiM74">Marchi and Miguel, 1974</a>, <a href="#ref-Jiang02">Jiang and Ma, 2002</a>] or that \(x\) leads to \(y\)&nbsp;[<a href="#ref-rada1986gradualness">Rada, 1986</a>, <a href="#ref-Bakker">Bakker, 1987</a>, <a href="#ref-rappaport1988dynamic">Rappaport and Gouyet, 1988</a>, <a href="#ref-MachadoR90">Machado and Freitas da Rocha, 1990</a>, <a href="#ref-ShimonyDS97">Shimony et al., 1997</a>, <a href="#ref-Jiang02">Jiang and Ma, 2002</a>]. In some cases <em>and</em>–<em>or</em> graphs are used to denote conjunctions or disjunctions of such relations&nbsp;[<a href="#ref-MachadoR90">Machado and Freitas da Rocha, 1990</a>], while in other cases edges are weighted to assign a belief to a relation&nbsp;[<a href="#ref-MachadoR90">Machado and Freitas da Rocha, 1990</a>, <a href="#ref-Jiang02">Jiang and Ma, 2002</a>, <a href="#ref-rada1986gradualness">Rada, 1986</a>]. 
Papers from 1970–2000 tend to have worked with small graphs, which contrasts with modern practice where knowledge graphs can reach scales of millions or billions of nodes&nbsp;[<a href="#ref-NoyGJNPT19">Noy et al., 2019</a>]: during this period, computational resources were more limited&nbsp;[<a href="#ref-Schneider72">Schneider, 1973</a>], and fewer sources of structured data were readily available, meaning that the knowledge graphs were often sourced solely from human experts&nbsp;[<a href="#ref-rada1986gradualness">Rada, 1986</a>, <a href="#ref-MachadoR90">Machado and Freitas da Rocha, 1990</a>, <a href="#ref-DiengGTC92">Dieng et al., 1992</a>] or from text&nbsp;[<a href="#ref-Bakker">Bakker, 1987</a>, <a href="#ref-stokman1988structuring">Stokman and de Vries, 1988</a>, <a href="#ref-james">James, 1992</a>, <a href="#ref-Hoede95">Hoede, 1995</a>].</p>
		</section>	

		<section id="app-post2012" class="sectionapp">
		<h3>“Knowledge Graphs”: 2012 Onwards</h3>
		<p>Google Knowledge Graph was announced in 2012&nbsp;[<a href="#ref-GoogleKG">Singhal, 2012</a>]. This initial announcement was targeted at a broad audience, mainly motivating the knowledge graph and describing applications that it would enable, where the knowledge graph itself is described as “<em>[a graph] that understands real-world entities and their relationships to one another</em>”&nbsp;[<a href="#ref-GoogleKG">Singhal, 2012</a>]. Mentions of “knowledge graphs” gained momentum in the research literature from that point. As noted by <a href="#ref-Bergman19">Bergman [2019]</a>, this announcement by Google was a watershed moment for adopting the phrase “knowledge graph”. However, given the informal nature of the announcement, a technical definition was lacking&nbsp;[<a href="#ref-EhrlingerW16">Ehrlinger and Wöß, 2016</a>, <a href="#ref-BonattiDPP18">Bonatti et al., 2018</a>].</p>
		<p>As knowledge graphs gained attention both in practice and in the academic literature, formal definitions became necessary in order to precisely characterise what they are, how they are structured, how they can be used, etc., and more generally to facilitate their study in a precise manner. We can identify four general categories of definitions that have emerged.</p>
		<ul id="def-categories">
			<li>The first category simply defines the knowledge graph as a graph where nodes represent entities, and edges represent relationships between those entities. Often a directed edge-labelled graph is assumed (or analogously, a set of named binary relations, or a set of triples). This simple definition was popularised by seminal papers on knowledge graph embeddings&nbsp;[<a href="#ref-wang2014knowledge">Wang et al., 2014</a>, <a href="#ref-lin2015learning">Lin et al., 2015</a>], being sufficient to represent the data model upon which these embeddings operate. As reflected in the later survey by <a href="#ref-Wang2017KGEmbedding">Wang et al. [2017]</a>, the multitude of works that would follow on knowledge graph embeddings have continued to use this definition. Though simple, the <em>Category I</em> definition raised some doubts: How is a knowledge graph different from a graph (database)? Where does knowledge come into play?</li>
			<li>A second common definition goes as follows: “<em>a knowledge graph is a graph-structured knowledge base</em>”. To the best of our knowledge, the earliest usages of this definition in the literature were by <a href="#ref-nickel">Nickel et al. [2016a]</a> and <a href="#ref-SeufertEBKBW16">Seufert et al. [2016]</a> (interestingly, in the formal notation of these initial papers, a knowledge graph is defined analogously to a directed edge-labelled graph). Such a definition raises the question: what, then, is a “knowledge base”? The phrase “knowledge base” was popularised in the 70’s (possibly earlier) in the context of rule-based expert systems&nbsp;[<a href="#ref-BuchananF78">Buchanan and Feigenbaum, 1978</a>], and was later used in the context of ontologies and other logical formalisms&nbsp;[<a href="#ref-BrachmanS85">Brachman and Schmolze, 1985</a>]. The follow-up question then is: can we have a knowledge base (graph-structured or not) without a logical formalism while staying true to the original definitions? Looking in further detail, similar ambiguities have also existed regarding the definition of a “knowledge base” (KB). Of note: <a href="#ref-BrachmanL85">Brachman and Levesque [1986]</a> – reporting after a workshop on this issue – state that “<em>if we ask what the KB tells us about the world, we are asking about its Knowledge Level</em>”.</li>
			<li>The third category of definitions outlines more specific technical characteristics that a “knowledge graph” should comply with.<ul>
				<li>In an influential survey on knowledge graph refinement, <a href="#ref-Paulheim17">Paulheim [2017]</a> lists four criteria that characterise the knowledge graphs considered for the paper. Specifically, he puts forward that a knowledge graph “<em>mainly describes real world entities and their interrelations, organized in a graph; defines possible classes and relations of entities in a schema; allows for potentially interrelating arbitrary entities with each other; covers various topical domains</em>”; he thus rules out ontologies without instances (e.g., DOLCE) and graphs of word senses (e.g., WordNet) as not meeting the first two criteria, while relational databases do not meet the third criterion (due to schema restrictions), and domain-specific graphs (e.g., Geonames) are considered to not meet the fourth criterion; this leaves graphs such as DBpedia, YAGO, Freebase, etc.</li>
				<li><a href="#ref-EhrlingerW16">Ehrlinger and Wöß [2016]</a> also review definitions of “knowledge graph”, where they criticise the <em>Category II</em> definitions based on the argument that knowledge bases are often synonymous with ontologies<sup class="fnmark" id="fnm40"><a href="#fn40">40</a></sup><span class="footnote" id="fn40"><sup><a href="#fnm40">note 40</a></sup> Prior definitions of an ontology – such as by <a href="#ref-GuarinoOS9">Guarino et al. [2009]</a> – would seem to contradict this conclusion.</span>, while knowledge graphs are not; they further criticise Google for calling its knowledge graph a “knowledge base”. After reviewing prior definitions of terms such as “knowledge base”, “ontology”, and “knowledge graph”, they propose their definition: “<em>A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge</em>”. In the subsequent discussion, they remark that a knowledge graph is distinguished from an ontology (considered synonymous with a knowledge base) by the provision of reasoning capabilities.</li>
				<li>One of the most detailed technical definitions for a “knowledge graph” is provided by <a href="#ref-BellomariniFGS19">Bellomarini et al. [2019]</a>, who state: “<em>A knowledge graph is a semi-structured data model characterized by three components: (i) a ground extensional component, that is, a set of relational constructs for schema and data (which can be effectively modeled as graphs or generalizations thereof); (ii) an intensional component, that is, a set of inference rules over the constructs of the ground extensional component; (iii) a derived extensional component that can be produced as the result of the application of the inference rules over the ground extensional component (with the so-called “reasoning” process).</em>” They remark that ontologies and rules represent analogous structures, and that a knowledge graph is then a knowledge base extended with reasoning along similar lines to the definition provided by <a href="#ref-EhrlingerW16">Ehrlinger and Wöß [2016]</a>.</li>
			</ul>
			We refer to <a href="#ref-Bergman19">Bergman [2019]</a> for a list of further definitions that fit <em>Category&nbsp;III</em>. While having a specific, technical definition for knowledge graphs provides a more solid foundation for their study, as <a href="#ref-Bergman19">Bergman [2019]</a> remarks, many such definitions do not seem to fit the current practice of knowledge graphs. For example, it is not clear which of these definitions the Google Knowledge Graph itself – responsible for popularising the idea – would meet (if any). Many of the criteria proposed by such definitions are also orthogonal to the many works in the area of knowledge graph embeddings&nbsp;[<a href="#ref-Wang2017KGEmbedding">Wang et al., 2017</a>].</li>
			<li>While the previous three categories involve (sometimes conflicting) intensional definitions, the fourth category adopts an extensional definition of knowledge graphs, defining them in terms of prominent examples of knowledge graphs, such as DBpedia, Google’s Knowledge Graph, Freebase, YAGO, amongst others&nbsp;[<a href="#ref-BonattiDPP18">Bonatti et al., 2018</a>]. Arguably this category sidesteps the issue of defining a knowledge graph, rather than actually defining one.</li>
		</ul>
		<p>These categories refer to definitions that have appeared in the academic literature. In terms of enterprise knowledge graphs, an important reference is the paper of <a href="#ref-NoyGJNPT19">Noy et al. [2019]</a>, which has been co-authored by leaders of knowledge graph projects from eBay, Facebook, Google, IBM, and Microsoft, and thus can be seen as representing a form of consensus amongst these companies – who have played a key role in the popularisation of knowledge graphs – on what a “knowledge graph” means in this setting. Specifically this paper states that “<em>a knowledge graph describes objects of interest and connections between them</em>”, and goes on to state that “<em>many practical implementations impose constraints on the links in knowledge graphs by defining a schema or ontology</em>”. They later add “<em>Knowledge graphs and similar structures usually provide a shared substrate of knowledge within an organization, allowing different products and applications to use similar vocabulary and to reuse definitions and descriptions that others create. Furthermore, they usually provide a compact formal representation that developers can use to infer new facts and build up the knowledge</em>”. We interpret this definition as corresponding to <em>Category I</em>, but further acknowledging that while not a necessary condition for a knowledge graph, ontologies and formal representations <em>usually</em> play a key role. The definition we provide at the outset of the paper is largely compatible with that of <a href="#ref-NoyGJNPT19">Noy et al. [2019]</a>.</p>
		</section>
	</section>
	</section>
	<section id="sec-bio" class="prechapter">
		<h2 id="bio">Authors’ Biography</h2>
		<h3 id="bio-hogan">Aidan Hogan</h3>
		<p>Aidan Hogan is an Associate Professor at the Department of Computer Science, Universidad de Chile, where he also holds the position of Associate Researcher in the Millennium Institute for Foundational Research on Data (IMFD). He received a B.Eng. and Ph.D. from the National University of Ireland, Galway, in 2006 and 2011, respectively. His primary research interests centre on the Semantic Web and Knowledge Graphs. He is the author of over one hundred research publications on these topics, including two other books: “Reasoning Techniques for the Web of Data” and “The Web of Data”.</p>

		<h3 id="bio-blomqvist">Eva Blomqvist</h3>
		<p>Eva Blomqvist is an Associate Professor at the Department of Computer and Information Science, Linköping University. She received a Ph.D. from Linköping University, Sweden, in 2009, in the area of Ontology Learning for the Semantic Web. After a postdoc at ISTC-CNR in Rome, Italy, she has been a member of the Semantic Web group at Linköping University since 2011. Her primary research interests include the Semantic Web and Knowledge Graphs, more specifically the development and use of ontologies as schemas for Knowledge Graphs. She is the author of over fifty research publications in the area, and has served as scientific program chair of several of the top conferences in the field.</p>

		<h3 id="bio-cochez">Michael Cochez</h3>
		<p>Michael Cochez is an Assistant Professor in the Knowledge Representation and Reasoning Group at the Computer Science department of the Vrije Universiteit, Amsterdam. He received his B.Sc. from the University of Antwerp, Belgium and his M.Sc. and Ph.D. degrees from the University of Jyväskylä, Finland. His research interests are in the intersection of Machine Learning and Knowledge Graphs.</p>

		<h3 id="bio-damato">Claudia d’Amato</h3>
		<p>Claudia d’Amato is an Associate Professor at the Department of Computer Science, University of Bari, Italy and a member of the Knowledge Acquisition and Machine Learning Lab. She also holds a habilitation as Full Professor for the scientific sectors: INF/01 and ING-INF/05. She received her Masters Degree and Ph.D. from the University of Bari, Italy, in 2003 and 2007, respectively. Over the years, she has also spent several invited-researcher stays in different international universities and research institutes. Her primary research interests centre on Machine Learning for the Semantic Web and Knowledge Graphs. She is the author of over one hundred research publications on these topics.</p>

		<h3 id="bio-demelo">Gerard de Melo</h3>
		<p>Gerard de Melo is a Full Professor at the Hasso Plattner Institute for Digital Engineering and at the University of Potsdam, where he holds the Chair for Artificial Intelligence and Intelligent Systems and heads the corresponding research group. Previously, he was a faculty member at Rutgers University in New Jersey and at Tsinghua University in Beijing, and a Post-Doctoral Research Scholar at ICSI/UC Berkeley. He has published over 150 papers on natural language processing, knowledge graphs, and AI, and received a number of best paper awards.</p>

		<h3 id="bio-gutierrez">Claudio Gutierrez</h3>
		<p>Claudio Gutierrez is a Full Professor at the Department of Computer Science, Universidad de Chile. He is also a Senior Researcher in the Millennium Institute for Foundational Research on Data (IMFD). His main research interests are the computational foundations of data and knowledge. He has worked and published extensively in the areas of the Semantic Web and Databases, fields in which he received test of time awards (ISWC and PODS). He also devotes time to research in the field of the History of Science and Technology.</p>

		<h3 id="bio-kirrane">Sabrina Kirrane</h3>
		<p>Sabrina Kirrane is an Assistant Professor at the Vienna University of Economics and Business Institute for Information Systems and New Media, where she is also a member of the Research Institute for Cryptoeconomics and the Sustainable Computing Lab. Her research interests include Security, Privacy, and Policy aspects of the Next Generation Internet (NGI), Distributed and Decentralised Systems, Big Data and Data Science, with a particular focus on policy representation and reasoning (e.g., access constraints, usage policies, regulatory obligations, societal norms, business processes), and the development of transparency and trust techniques for the Web.</p>

		<h3 id="bio-labragayo">Jose Emilio Labra Gayo</h3>
		<p>Jose Emilio Labra Gayo is an Associate Professor at the University of Oviedo, Spain. He founded the WESO (Web Semantics Oviedo) research group in 2004, whose main goal is to apply semantic technologies to solve practical problems. He was a member of the W3C Data Shapes working group and is a member of the W3C Community Groups: Shape Expressions and SHACL. He is coauthor of the “Validating RDF data” book and maintains the ShEx and SHACL library SHaclEX as well as the online tools RDFShape and Wikishape. Previously, he was coordinator of the Master in Web Engineering and Dean of the School of Computer Science Engineering at the University of Oviedo (2004–2012).</p>

		<h3 id="bio-navigli">Roberto Navigli</h3>
		<p>Roberto Navigli is a Full Professor of Computer Science at the Sapienza University of Rome, where he leads the Sapienza NLP Group. His research is focused on multilingual Natural Language Understanding, a field in which he received two grants of the European Research Council. In 2015 he received the META prize for groundbreaking work in overcoming language barriers with the BabelNet lexical-semantic knowledge graph, a project also highlighted in The Guardian and Time magazine, and winner of the Artificial Intelligence Journal prominent paper award 2017. He is the co-founder of Babelscape, a successful company which enables Natural Language Understanding in dozens of languages.</p>

		<h3 id="bio-neumaier">Sebastian Neumaier</h3>
		<p>Sebastian Neumaier is a researcher in the Data Intelligence group at the St. Poelten University of Applied Sciences, Austria. He received an M.Sc. and Ph.D. from the Vienna University of Technology, in 2015 and 2019, respectively. His Ph.D. thesis is centred around methods to facilitate the integration and semantic enrichment of Open Data sources using Knowledge Graph technologies. His current research focuses on different aspects of semantic data management.</p>

		<h3 id="bio-ngongangomo">Axel-Cyrille Ngonga Ngomo</h3>
		<p>Axel-Cyrille Ngonga Ngomo is a Full Professor for Data Science at Paderborn University. He obtained his M.Sc., Ph.D. and habilitation from the University of Leipzig, where he also led the Agile Knowledge Engineering and Semantic Web Group. His research focuses on the automation of the lifecycle of knowledge graphs. Thus, his works include the development of approaches for the extraction, integration, fusion, storage, analysis and exploitation of knowledge graphs.</p>

		<h3 id="bio-polleres">Axel Polleres</h3>
		<p>Axel Polleres heads the Institute for Data, Process and Knowledge Management of Vienna University of Economics and Business (WU Wien), which he joined in September 2013 as a Full Professor in the area of “Data and Knowledge Engineering”. He is also a faculty member of the Complexity Science Hub Vienna and was a visiting professor at Stanford University in 2018. He obtained his Ph.D. and habilitation from Vienna University of Technology. His research focuses on ontologies, query languages, logic programming, configuration technologies, Artificial Intelligence, Semantic Web, Linked Open Data, Knowledge Graphs and their applications for Knowledge Management. Moreover, he actively contributed to international standardisation efforts within the World Wide Web Consortium (W3C) where he co-chaired the W3C SPARQL working group.</p>

		<h3 id="bio-rashid">Sabbir M. Rashid</h3>
		<p>Sabbir M. Rashid is a Ph.D. candidate at Rensselaer Polytechnic Institute (RPI) working with Deborah L. McGuinness on research related to data annotation and harmonisation, ontology engineering, knowledge representation, and various forms of reasoning. Prior to RPI, Sabbir completed a double major at Worcester Polytechnic Institute, where he received B.Sc. degrees in both Physics and Electrical &amp; Computer Engineering. Much of his graduate studies at RPI have involved research related to data annotation and transformation using Semantic Data Dictionaries. His current work includes the application of deductive and abductive inference techniques over Linked Health Data, such as in the context of chronic diseases like diabetes.</p>

		<h3 id="bio-rula">Anisa Rula</h3>
		<p>Anisa Rula is an Assistant Professor in Computer Science at the Department of Information Engineering, University of Brescia since January 2021 and a researcher at the University of Bonn in the SDA group since January 2017. She obtained her doctoral degree in Computer Science from the University of Milano-Bicocca in 2014. Her research interests are in the intersection of semantic knowledge technologies and data quality with a particular focus on data integration. She is researching new solutions to data integration with respect to the quality of data modelling and efficient solutions for large-scale data sources. Recently she has been working on data understanding for large and complex datasets, on knowledge extraction, and on semantic data enrichment and refinement.</p>

		<h3 id="bio-schmelzeisen">Lukas Schmelzeisen</h3>
		<p>Lukas Schmelzeisen is a Ph.D. candidate working with Steffen Staab in the Analytic Computing group at the University of Stuttgart, Germany. He holds a B.Sc. in Computer Science, which he received in 2015 at the University of Koblenz–Landau. His main research interests are continuous representations of both natural language corpora and knowledge graphs. In particular, his current focus is on how such representations can be updated over time.</p>

		<h3 id="bio-sequeda">Juan Sequeda</h3>
		<p>Juan Sequeda is the Principal Scientist at data.world. He joined through the acquisition of Capsenta, a company he founded as a spin-off from his research. His academic and industry work has been on designing and building Knowledge Graphs for enterprise data integration, where he has researched and developed technologies for semantic and graph data virtualisation, ontology and graph data modelling and schema mapping, and data integration methodologies. Juan holds a Ph.D. in Computer Science from the University of Texas at Austin. He is the recipient of the NSF Graduate Research Fellowship, received 2<sup>nd</sup> Place in the 2013 Semantic Web Challenge for his work on ConstituteProject.org, Best Student Research Paper at the International Semantic Web Conference 2014 and the 2015 Best Transfer and Innovation Project awarded by the Institute for Applied Informatics. Juan bridges academia and industry through standardisation committees, being a co-chair of the Property Graph Schema Working Group, and past member of the Graph Query Languages task force of the Linked Data Benchmark Council (LDBC), as well as a past invited expert member and standards editor at the World Wide Web Consortium (W3C).</p>

		<h3 id="bio-staab">Steffen Staab</h3>
		<p>Steffen Staab holds a Cyber Valley endowed chair for Analytic Computing at the University of Stuttgart, Germany, and a chair for Web and Computer Science at the University of Southampton, UK. Steffen is a fellow of the European Association for Artificial Intelligence. His research interests range from knowledge graphs and machine learning to the semantics of human–computer interaction. He is co-director of the Interchange Forum for Reflecting on Intelligent Systems (IRIS) at the University of Stuttgart.</p>

		<h3 id="bio-zimmermann">Antoine Zimmermann</h3>
		<p>Antoine Zimmermann is an Associate Professor at Mines Saint-Étienne in France. He received an M.Sc. and a Ph.D. degree from the University of Grenoble, France in 2004 and 2008 respectively. He spent two years at the Digital Enterprise Research Institute in Galway, Ireland, from 2009 to 2010, then one year at INSA Lyon, France, before getting a position at Mines Saint-Étienne, where he has been a permanent researcher since 2012. In 2021, he received his habilitation from Université Jean Monnet, Saint-Étienne. His research interests are related to the Semantic Web, more specifically on knowledge representation, knowledge engineering, reasoning, data management and context on the Web.</p>
	</section>
	<h2 id="theend"><span>The End</span></h2>
</body>
</html>