<!DOCTYPE html>
<html>
<head>
<title>Silk - The Linked Data Integration Framework</title>
<link rel='shortcut icon' type='image/png' href='assets/images/favicon.png'/>
<link rel="stylesheet" media="screen" href="assets/stylesheets/main.css" />
</head>
<body>
<div class="header">
<h1>Silk</h1>
<div id="subtitle">The Linked Data Integration Framework</div>
<div id="logos">
<a href="http://dws.informatik.uni-mannheim.de/" target="_blank">
<img src="assets/images/logo_uni_mannheim.png" />
</a>
<a href="http://www.eccenca.com/" target="_blank">
<img src="assets/images/eccenca_logo.png" />
</a>
</div>
</div>
<div id="navigation">
<ul>
<li><a class="selected" href="/">Home</a></li>
<li><a href="/news">News</a></li>
<li><a href="/download">Download</a></li>
<li><a href="/support">Support</a></li>
<li><a href="/publications">Publications</a></li>
</ul>
</div>
<div id="main">
<div id="authors">
<p>
<a href="http://robertisele.com">Robert Isele</a> (eccenca GmbH) <br>
<a href="http://www.hpi.uni-potsdam.de/naumann/people/anja_jentzsch.html">Anja Jentzsch</a> (Hasso Plattner Institut)<br>
<a href="http://www.bizer.de/">Christian Bizer</a> (University of Mannheim)<br>
<a href="mailto:[email protected]">Julius Volz</a> (Google)<br>
<a href="http://dws.informatik.uni-mannheim.de/en/people/researchers/petar-petrovski/">Petar Petrovski</a> (University of Mannheim)
</p>
</div>
Silk is an open source framework for integrating heterogeneous data sources. The primary use cases of Silk include:
<ul>
<li>Generating links between related data items within different Linked Data sources.</li>
<li>Enabling Linked Data publishers to set RDF links from their data sources to other data sources on the Web.</li>
<li>Applying data transformations to structured data sources.</li>
</ul>
Silk is based on the Linked Data paradigm, which is built on two simple ideas: First, RDF provides an expressive data model for representing structured information. Second, RDF links are set between entities in different data sources. Background information about Linked Data and the vision of the Web of Data can be found in the overview article <a href="http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf">Linked Data - The Story So Far</a> and the <a href="http://linkeddatabook.com">Linked Data book</a>.
<H2><a name="linking"></a>Linking Data Sources</H2>
<p>Using the declarative <em>Silk - Link Specification Language</em> (Silk-LSL), developers can specify which types of RDF links should be discovered between data sources as well as which conditions data items must
fulfill in order to be interlinked. These link conditions may combine various similarity
metrics and can take the graph around a data item into account, which is addressed
using an RDF path language. Silk accesses the data sources that should be interlinked via the SPARQL protocol and can thus be used against local as well as remote SPARQL endpoints. Link Specifications can be created using the <a href="#workbench">Silk Workbench</a> graphical user interface or manually in XML.</p>
<img src="assets/images/linkageRule.png" />
<p>
The linking process is based on the <em>Silk Link Discovery Engine</em> which offers the following features:
<ul>
<li>Flexible, declarative language for specifying linkage rules</li>
<li>Support of RDF link generation (owl:sameAs links as well as other types)</li>
<li>Employment in distributed environments (by accessing local and remote SPARQL endpoints)</li>
<li>Usable in situations where terms from different vocabularies are mixed and where no consistent RDFS or OWL schemata exist</li>
<li>Scalability and high performance through efficient data handling (speedup factor of 20 compared to Silk 0.2):
<ul>
<li>Reduction of network load by caching and reusing of SPARQL result sets</li>
<li>Multi-threaded computation of the data item comparisons (3 million comparisons per minute on a Core2 Duo)</li>
<li>Optional blocking of data items</li>
</ul>
</li>
</ul>
</p>
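<p>
As a rough illustration of the kind of comparison a linkage rule describes, the following Python sketch matches labels from two small in-memory sources using a normalized Levenshtein similarity, and uses a simple blocking key so that only items in the same block are compared. It is not Silk's implementation and not Silk-LSL syntax: the function names, the first-letter blocking key and the 0.8 threshold are assumptions chosen for the example.
</p>

```python
from collections import defaultdict


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalize the edit distance to a score in [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def blocking_key(label):
    """Hypothetical blocking key: the first character of the lowercased label."""
    return label.lower()[:1]


def generate_links(source, target, threshold=0.8):
    """Return (source_uri, target_uri, score) for pairs at or above the threshold.

    `source` and `target` map entity URIs to labels. Only entities that share
    a blocking key are compared, which avoids the full cross product.
    """
    blocks = defaultdict(list)
    for uri, label in target.items():
        blocks[blocking_key(label)].append((uri, label))

    links = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in blocks.get(blocking_key(s_label), []):
            score = similarity(s_label.lower(), t_label.lower())
            if score >= threshold:
                links.append((s_uri, t_uri, score))
    return links
```

<p>
A call such as <code>generate_links(source_labels, target_labels)</code> returns candidate links that could then be written out as RDF statements. A first-letter key is a deliberately crude choice: it misses matches whose labels differ in the first character, which is the kind of trade-off a real blocking method has to manage.
</p>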
<H2><a name="transformations"></a>Data Transformations</H2>
<p>
While the main part of an integration workflow lies in the interlinking of data sources, data sets coming from different sources sometimes require harmonization of their schemata and data formats prior to interlinking. For this purpose, Silk enables the user to create and execute lightweight transformation rules. Transformation rules may be used for:
<ul>
<li>Data cleaning, e.g., removing unwanted values</li>
<li>Mapping between different properties or adding new properties with generated values.</li>
<li>Converting between different data formats. Data may be read from sources such as RDF, CSV or XML. Typically the output is written to an RDF store which can be queried using SPARQL, but data can also be written as CSV files to be imported into relational databases or opened in Excel.</li>
</ul>
</p>
<img src="assets/images/dataTransformation.png" />
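<p>
The three uses listed above can be sketched in a few lines of Python: each rule reads one source property from a CSV record, drops empty values (cleaning), passes the value through a chain of functions, and stores it under a target property (mapping). This is a minimal sketch and not Silk's rule format or API: the rule tuples, the function names and the example property names are hypothetical.
</p>

```python
import csv
import io


def strip_whitespace(value):
    """Remove leading and trailing whitespace."""
    return value.strip()


def lower_case(value):
    """Convert the value to lower case."""
    return value.lower()


def apply_rule(record, source_property, target_property, transforms):
    """Apply one rule to one record; return (target_property, value) or None."""
    value = record.get(source_property)
    if value is None or value.strip() == "":
        return None  # data cleaning: skip missing or empty values
    for transform in transforms:
        value = transform(value)
    return target_property, value


def transform_records(csv_text, rules):
    """Read CSV text and return one dict of mapped properties per row.

    `rules` is a list of (source_property, target_property, transforms)
    tuples, where `transforms` is a list of one-argument functions.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    output = []
    for record in reader:
        mapped = {}
        for source_property, target_property, transforms in rules:
            result = apply_rule(record, source_property, target_property, transforms)
            if result is not None:
                mapped[result[0]] = result[1]
        output.append(mapped)
    return output
```

<p>
The resulting dictionaries stand in for the property-value pairs that would be written to an RDF store or a CSV file.
</p>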
<H2><a name="workbench"></a>Silk Workbench</H2>
<p>
<em>Silk Workbench</em> is a web application which guides the user through the process of interlinking different data sources.
</p>
<p>
Silk Workbench offers the following features:
<ul>
<li>It enables the user to manage different sets of data sources, linking tasks and transformation tasks.</li>
<li>It offers a graphical editor which enables the user to easily create and edit linking tasks and transformation tasks.</li>
<li>As finding good linking heuristics is usually an iterative process, the Silk Workbench makes it possible for the user to quickly evaluate the links which are generated by the current link specification.</li>
<li>It allows the user to create and edit a set of reference links used to evaluate the current link specification.</li>
</ul>
Documentation of the Silk Workbench is available in the <a href="https://github.com/silk-framework/silk/blob/master/doc/Workbench.md">Wiki</a>.
</p>
<h2><a name="commandline"></a>Silk Command Line Applications</h2>
In addition to the Workbench, Silk provides three different command line applications for executing link specifications:
<ul>
<li><em>Silk Single Machine</em> is used to generate RDF links on a single machine. The datasets that should be interlinked can either reside on the same machine or on remote machines which are accessed via the SPARQL protocol. <em>Silk Single Machine</em> provides multithreading and caching. In addition, the performance is further enhanced using the <a href="http://wifo5-03.informatik.uni-mannheim.de/bizer/pub/IseleJentzschBizer-WebDB2011.pdf">MultiBlock</a> blocking algorithm.</li>
<li><em>Silk MapReduce</em> is used to generate RDF links between data sets using a cluster of multiple machines. <em>Silk MapReduce</em> is based on Hadoop and can for instance be run on Amazon Elastic MapReduce. <em>Silk MapReduce</em> enables Silk to scale out to very big datasets by distributing the link generation to multiple machines.</li>
<li><em>Silk Server</em> can be used as an identity resolution component within applications that consume Linked Data from the Web. <em>Silk Server</em> provides an HTTP API for matching entities from an incoming stream of RDF data while keeping track of known entities. It can be used for instance together with a Linked Data crawler to populate a local duplicate-free cache with data from the Web.</li>
</ul>
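<p>
The idea of identity resolution over an incoming stream can be illustrated with a small Python sketch: a resolver keeps the entities it has already seen and, for each new entity, either returns the URI of a matching known entity or registers the new one. This is not the Silk Server HTTP API. The class name, the label-only matching via <code>difflib.SequenceMatcher</code> and the 0.9 threshold are assumptions made for the example.
</p>

```python
from difflib import SequenceMatcher


class IdentityResolver:
    """Keeps known entities and matches incoming ones against them by label."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.known = {}  # canonical URI -> normalized label

    @staticmethod
    def _normalize(label):
        """Lowercase the label and collapse runs of whitespace."""
        return " ".join(label.lower().split())

    def resolve(self, uri, label):
        """Return the URI of a matching known entity, or register this one."""
        normalized = self._normalize(label)
        for known_uri, known_label in self.known.items():
            score = SequenceMatcher(None, normalized, known_label).ratio()
            if score >= self.threshold:
                return known_uri  # duplicate of an entity seen earlier
        self.known[uri] = normalized
        return uri
```

<p>
A crawler could call <code>resolve</code> for every entity it fetches and store data only under the returned URI, which keeps the local cache free of duplicates as far as the matching rule can detect them.
</p>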
<h2><a name="freetextpreprocessor"></a>Silk Free Text Preprocessor</h2>
<p>
The main goal of the Free Text Preprocessor is to produce a structured representation of data that contains or is derived from free text. The tool takes two inputs: an RDF file whose properties have free text values, and an additional RDF file containing structured data that is used to learn the extraction model. Based on the learned model, the tool extracts new property-value pairs from the free text. The resulting output is an RDF dump file containing the extracted structured values. Using a declarative XML-based language, a user can specify which extraction methods to use.
</p>
<p>
Documentation of the Silk Free Text Preprocessor is available in the <a href="https://www.assembla.com/spaces/silk/wiki/Silk_Free_Text_Preprocessor">Wiki</a>.
</p>
<H2><a name="acknowledgments"></a>Acknowledgments</H2>
<p>This work was supported in part by Vulcan Inc. as part of its <a href="http://www.projecthalo.com">Project Halo</a> and by the EU FP7 project <a href="http://lod2.eu/">LOD2 - Creating Knowledge out of Interlinked Data</a> (Grant No. 257943).</p>
</div>
</body>
</html>