<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"
xml:lang="en" lang="en">
<head>
<meta charset="utf-8"/>
<title>Linked SDMX Data</title>
<link rel="stylesheet alternate" media="all" title="LNCS" href="media/css/lncs.css"/>
<link rel="stylesheet" media="all" title="ACM" href="media/css/acm.css"/>
<link rel="stylesheet" media="all" href="media/css/lr.css"/>
<script src="http://code.jquery.com/jquery-2.1.3.min.js"></script>
<script src="scripts/html.sortable.min.js"></script>
<script src="scripts/lr.js"></script>
</head>
<body about="[this:]" typeof="schema:CreativeWork sioc:Post schema:ScholarlyArticle prov:Entity" class="h-feed" prefix="rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# rdfs: http://www.w3.org/2000/01/rdf-schema# owl: http://www.w3.org/2002/07/owl# xsd: http://www.w3.org/2001/XMLSchema# dcterms: http://purl.org/dc/terms/ foaf: http://xmlns.com/foaf/0.1/ v: http://www.w3.org/2006/vcard/ns# pimspace: http://www.w3.org/ns/pim/space# skos: http://www.w3.org/2004/02/skos/core# prov: http://www.w3.org/ns/prov# schema: http://schema.org/ sioc: http://rdfs.org/sioc/ns# rsa: http://www.w3.org/ns/auth/rsa# cert: http://www.w3.org/ns/auth/cert# cal: http://www.w3.org/2002/12/cal/ical# wgs: http://www.w3.org/2003/01/geo/wgs84_pos# bibo: http://purl.org/ontology/bibo/ dbr: http://dbpedia.org/resource/ dbp: http://dbpedia.org/property/ sio: http://semanticscience.org/resource/ opmw: http://www.opmw.org/ontology/ deo: http://purl.org/spar/deo/ doco: http://purl.org/spar/doco/ cito: http://purl.org/spar/cito/ fabio: http://purl.org/spar/fabio/ oa: http://www.w3.org/ns/oa# this: http://csarven.ca/linked-sdmx-data">
<article class="h-entry">
<h1 class="p-name" property="schema:name">Linked SDMX Data</h1>
<div id="authors">
<dl id="author-name">
<dt>Authors</dt>
<dd id="author-1"><span about="[this:]" rel="schema:creator schema:publisher schema:contributor schema:author"><a about="http://csarven.ca/#i" typeof="schema:Person" rel="schema:url" property="schema:name" href="http://csarven.ca/">Sarven Capadisli</a></span><span about="http://csarven.ca/#i" rel="schema:memberOf" resource="[dbr:Leipzig_University]"></span><sup><a href="#author-org-1">1</a></sup><sup><a href="#author-email-1">✊</a></sup></dd>
<dd id="author-2"><span about="[this:]" rel="schema:contributor"><a about="[this:#SörenAuer]" typeof="schema:Person" rel="schema:url" property="schema:name" href="http://aksw.org/SoerenAuer">Sören Auer</a></span><span about="[this:#SörenAuer]" rel="schema:memberOf" resource="[dbr:Leipzig_University]"></span><sup><a href="#author-org-2">2</a></sup><sup><a href="#author-email-2">⚛</a></sup></dd>
<dd id="author-3"><span about="[this:]" rel="schema:contributor"><a about="[this:#AxelCyrilleNgongaNgomo]" typeof="schema:Person" rel="schema:url" property="schema:name" href="http://aksw.org/AxelNgonga">Axel-Cyrille Ngonga Ngomo</a></span><span about="[this:#AxelCyrilleNgongaNgomo]" rel="schema:memberOf" resource="[dbr:Leipzig_University]"></span><sup><a href="#author-org-3">3</a></sup><sup><a href="#author-email-3">♔</a></sup></dd>
</dl>
<ul id="author-org">
<li id="author-org-1"><sup>1</sup><a about="[dbr:Leipzig_University]" typeof="schema:Organization" property="schema:name" rel="schema:url" href="http://www.zv.uni-leipzig.de/">Leipzig University</a>, Institut für Informatik, AKSW, <span class="adr"><span class="post-office-box">Postfach 100920</span>, <span class="postal-code">D-04009</span> <span class="region">Leipzig</span>, <span class="country-name">Germany</span></span></li>
<li id="author-org-2"><sup>2</sup><a about="[dbr:Leipzig_University]" typeof="schema:Organization" property="schema:name" rel="schema:url" href="http://www.zv.uni-leipzig.de/">Leipzig University</a>, Institut für Informatik, AKSW, <span class="adr"><span class="post-office-box">Postfach 100920</span>, <span class="postal-code">D-04009</span> <span class="region">Leipzig</span>, <span class="country-name">Germany</span></span></li>
<li id="author-org-3"><sup>3</sup><a about="[dbr:Leipzig_University]" typeof="schema:Organization" property="schema:name" rel="schema:url" href="http://www.zv.uni-leipzig.de/">Leipzig University</a>, Institut für Informatik, AKSW, <span class="adr"><span class="post-office-box">Postfach 100920</span>, <span class="postal-code">D-04009</span> <span class="region">Leipzig</span>, <span class="country-name">Germany</span></span></li>
</ul>
<ul id="author-email">
<li id="author-email-1"><sup>✊</sup><a about="http://csarven.ca/#i" rel="schema:email" href="mailto:[email protected]">[email protected]</a></li>
<li id="author-email-2"><sup>⚛</sup><a about="[this:#SörenAuer]" rel="schema:email" href="mailto:[email protected]">[email protected]</a></li>
<li id="author-email-3"><sup>♔</sup><a about="[this:#AxelCyrilleNgongaNgomo]" rel="schema:email" href="mailto:[email protected]">[email protected]</a></li>
</ul>
</div>
<dl id="document-identifier">
<dt>Document ID</dt>
<dd><a href="http://csarven.ca/linked-sdmx-data">http://csarven.ca/linked-sdmx-data</a></dd>
</dl>
<dl id="document-published">
<dt>Published</dt>
<dd><time datetime="2013-02-10" property="schema:datePublished" content="2013-02-10T09:00:00Z" datatype="xsd:dateTime">2013-02-10</time></dd>
</dl>
<dl id="document-modified">
<dt>Modified</dt>
<dd><time datetime="2014-08-11" property="schema:dateModified" content="2014-08-04T09:00:00Z" datatype="xsd:dateTime">2014-08-11</time></dd>
</dl>
<dl id="document-license">
<dt>License</dt>
<dd><a about="[this:]" rel="license schema:license" href="http://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA 4.0</a></dd>
</dl>
<dl id="document-in-reply-to">
<dt>In Reply To</dt>
<dd><a class="u-in-reply-to" about="[this:]" rel="sioc:reply_of" href="http://www.semantic-web-journal.net/blog/semantic-web-journal-call-2nd-special-issue-linked-dataset-descriptions">Semantic Web Journal Call: 2nd Special Issue on Linked Dataset Descriptions</a></dd>
</dl>
<dl id="document-appeared">
<dt>Appeared In</dt>
<dd about="[this:]" rel="bibo:citedBy" resource="[this:#SWJ-LDD-2015]">
<span about="[this:#SWJ-LDD-2015]" typeof="bibo:Document">
<span property="schema:alternateName">SWJ-LDD-2015</span> (<a rel="schema:url" href="http://www.semantic-web-journal.net/issues" property="schema:name">Semantic Web, Linked Dataset Descriptions</a> <span property="schema:datePublished" xml:lang="">2015</span>)
, Tracking <span rel="rdfs:seeAlso" resource="http://www.semantic-web-journal.net/content/linked-sdmx-data">454-1631</span>
, DOI <span property="bibo:doi" xml:lang="">10.3233/SW-130123</span>
, ISSN <span property="bibo:issn" xml:lang="">2210-4968</span>
, Volume <span property="bibo:volume" xml:lang="">6</span>
, Issue <span property="bibo:issue" xml:lang="">2</span>
, Pages <span property="bibo:pageStart" xml:lang="">105</span> – <span property="bibo:pageEnd" xml:lang="">112</span>
</span>
</dd>
</dl>
<dl id="document-purpose">
<dt>Purpose</dt>
<dd property="schema:purpose">Path to high-fidelity Statistical Linked Data</dd>
</dl>
<div id="content" class="e-content">
<section id="abstract" about="[this:]">
<h2>Abstract</h2>
<div property="schema:abstract" class="p-summary">
<p>As statistical data is inherently highly structured and comes with rich metadata (in the form of code lists, data cubes, etc.), it would be a missed opportunity not to tap into it from the Linked Data angle. At the time of this writing, there exists no simple way to transform statistical data into Linked Data, since the raw data comes in different shapes and forms. Given that SDMX (Statistical Data and Metadata eXchange) is arguably the most widely used standard for statistical data exchange, a great amount of statistical data about our societies is yet to be discoverable and identifiable in a uniform way. In this article, we present the design and implementation of SDMX-ML to RDF/XML XSL transformations, as well as the publication of <a href="http://oecd.270a.info/">OECD</a>, <a href="http://bfs.270a.info/">BFS</a>, <a href="http://fao.270a.info/">FAO</a>, <a href="http://ecb.270a.info/">ECB</a>, <a href="http://imf.270a.info/">IMF</a>, <a href="http://uis.270a.info/">UIS</a>, <a href="http://frb.270a.info/">FRB</a>, <a href="http://bis.270a.info/">BIS</a>, and <a href="http://abs.270a.info/">ABS</a> dataspaces with that tooling.</p>
</div>
</section>
<section id="categories-and-subject-descriptors" about="[this:]">
<h2>Categories and Subject Descriptors</h2>
<div>
<ul>
<li><a rel="schema:about" resource="http://acm.rkbexplorer.com/ontologies/acm#H.4" href="http://www.acm.org/about/class/ccs98-html#H.4">H.4</a> [<strong>Information Systems Applications</strong>]: Linked Data</li>
<li><a rel="schema:about" resource="http://acm.rkbexplorer.com/ontologies/acm#D.2" href="http://www.acm.org/about/class/ccs98-html#D.2">D.2</a> [<strong>Software Engineering</strong>]: Semantic Web</li>
</ul>
</div>
</section>
<section id="keywords" about="[this:]">
<h2>Keywords</h2>
<div>
<ul rel="schema:about">
<li><a resource="http://dbpedia.org/resource/Linked_Data" href="http://en.wikipedia.org/wiki/Linked_Data">Linked Data</a></li>
<li><a resource="http://dbpedia.org/resource/SDMX" href="http://en.wikipedia.org/wiki/SDMX">SDMX</a></li>
<li><a resource="http://dbpedia.org/resource/Statistics" href="http://en.wikipedia.org/wiki/Statistics">Statistics</a></li>
<li><a resource="http://dbpedia.org/resource/Data_modeling" href="http://en.wikipedia.org/wiki/Data_modeling">Data modeling</a></li>
<li><a resource="http://dbpedia.org/resource/Data_transformation" href="http://en.wikipedia.org/wiki/Data_transformation">Data transformation</a></li>
<li><a resource="http://dbpedia.org/resource/Dataspaces" href="http://en.wikipedia.org/wiki/Dataspaces">Dataspaces</a></li>
<li><a resource="http://dbpedia.org/resource/Knowledge_management" href="http://en.wikipedia.org/wiki/Knowledge_management">Knowledge management</a></li>
</ul>
</div>
</section>
<section id="introduction" about="[this:]" rel="schema:hasPart">
<h2 about="[this:#introduction]" property="schema:name">Introduction</h2>
<div about="[this:#introduction]" property="schema:description" typeof="deo:Introduction">
<p>While access to statistical data in the public sector has increased in recent years, a range of technical challenges makes it difficult for data consumers to tap into this data with ease. These challenges relate particularly to the following two areas:</p>
<ul>
<li>Automating the transformation of data from high-profile statistical organizations.</li>
<li>Minimizing third-party interpretation of the source data and metadata, and ensuring lossless transformations.</li>
</ul>
<p>Development teams often face low-level, repetitive data management tasks when dealing with someone else's data. Within the context of Linked Data, one such task is transforming raw statistical data (e.g., SDMX-ML) into an RDF representation in order to start tapping into what's out there in a uniform way.</p>
<p>The contributions of this article are two-fold. We present an approach for transforming SDMX-ML based on XSLT 2.0 templates and showcase our implementation, which transforms SDMX-ML data to RDF/XML. Following this, SDMX-ML data from <a href="http://www.oecd.org/">OECD</a> (Organisation for Economic Co-operation and Development), <a href="http://www.bfs.admin.ch/">BFS</a> (<span xml:lang="de">Bundesamt für Statistik</span>, Swiss Federal Statistical Office), <a href="http://www.fao.org/">FAO</a> (Food and Agriculture Organization of the United Nations), <a href="http://www.ecb.int/">ECB</a> (European Central Bank), <a href="http://www.imf.org/">IMF</a> (International Monetary Fund), <a href="http://www.uis.unesco.org/">UIS</a> (UNESCO Institute for Statistics), <a href="http://www.federalreserve.gov/">FRB</a> (Federal Reserve Board), <a href="http://www.bis.org/">BIS</a> (Bank for International Settlements), and <a href="http://www.abs.gov.au/">ABS</a> (Australian Bureau of Statistics) are retrieved, transformed, and published as Linked Data.</p>
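<p>As a rough illustration of the kind of mapping the XSLT 2.0 templates perform, the following Python sketch turns one SDMX-ML series into RDF Data Cube observations. It is not the authors' XSLT: the input element layout is only loosely modeled on the SDMX-ML generic data format (namespaces omitted), the output is a list of (subject, predicate, object) tuples rather than RDF/XML, and the authority, dataset ID, <code>dimension:</code> prefix, and helper names are hypothetical.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical values, used only for illustration.
AUTHORITY = "example.270a.info"
DATASET_ID = "EXAMPLE"


def build_series():
    """Build a tiny SDMX-ML-like Series element (namespaces omitted)."""
    series = ET.Element("Series")
    key = ET.SubElement(series, "SeriesKey")
    for concept, value in [("FREQ", "A"), ("REF_AREA", "CA")]:
        ET.SubElement(key, "Value", concept=concept, value=value)
    for period, value in [("2010", "1.5"), ("2011", "2.0")]:
        obs = ET.SubElement(series, "Obs")
        ET.SubElement(obs, "Time").text = period
        ET.SubElement(obs, "ObsValue", value=value)
    return series


def series_to_triples(series):
    """Map each Obs of a Series to qb:Observation triples (as tuples)."""
    key = [(v.get("concept"), v.get("value"))
           for v in series.find("SeriesKey").findall("Value")]
    dataset_uri = f"http://{AUTHORITY}/dataset/{DATASET_ID}"
    triples = []
    for obs in series.findall("Obs"):
        dims = key + [("TIME_PERIOD", obs.find("Time").text)]
        # Observation URI: the dataset URI followed by every dimension value.
        obs_uri = dataset_uri + "/" + "/".join(value for _, value in dims)
        triples.append((obs_uri, "rdf:type", "qb:Observation"))
        triples.append((obs_uri, "qb:dataSet", dataset_uri))
        for concept, value in dims:
            # Hypothetical "dimension:" prefix; real output would use full
            # property URIs and code URIs rather than bare code strings.
            triples.append((obs_uri, f"dimension:{concept}", value))
        triples.append((obs_uri, "sdmx-measure:obsValue",
                        obs.find("ObsValue").get("value")))
    return triples


for triple in series_to_triples(build_series()):
    print(triple)
```

<p>In this sketch the series key values are followed by the time period in the observation URI; the actual dimension order used by the toolkit is an assumption here.</p>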
</div>
</section>
<section id="background" about="[this:]" rel="schema:hasPart">
<h2 about="[this:#background]" property="schema:name">Background</h2>
<div about="[this:#background]" property="schema:description">
<p>As pointed out in <a href="http://csarven.ca/statistical-linked-dataspaces"><cite>Statistical Linked Dataspaces</cite></a> (Capadisli, S., 2012), what linked statistics provide, and in fact enable, are queries across datasets: given that the dimension concepts are interlinked, one can learn from a certain observation's dimension value and use it to automate cross-dataset queries.</p>
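<p>A minimal Python sketch of this idea, under simplified assumptions: two hypothetical datasets code the same reference areas differently, a hypothetical link table stands in for interlinks between their dimension concepts (e.g., <code>skos:exactMatch</code> or <code>owl:sameAs</code>), and observations are joined on the shared concept. All codes, values, and URIs are invented for illustration.</p>

```python
# Observations from two hypothetical datasets, keyed by (area code, year).
dataset_a = {("CAN", "2010"): 1.6, ("DEU", "2010"): 3.4}
dataset_b = {("CA", "2010"): 34.0, ("DE", "2010"): 81.8}

# Hypothetical interlinks: each dataset's own area code -> a shared concept.
links = {
    "CAN": "http://example.org/area/Canada",
    "DEU": "http://example.org/area/Germany",
    "CA": "http://example.org/area/Canada",
    "DE": "http://example.org/area/Germany",
}


def join_on_shared_concept(left, right, links):
    """Pair up observations whose area codes link to the same concept."""
    right_by_concept = {(links[code], year): value
                        for (code, year), value in right.items()}
    joined = {}
    for (code, year), value in left.items():
        concept_key = (links[code], year)
        if concept_key in right_by_concept:
            joined[concept_key] = (value, right_by_concept[concept_key])
    return joined


print(join_on_shared_concept(dataset_a, dataset_b, links))
```

<p>Without the link table, the codes <code>CAN</code> and <code>CA</code> would never match; the interlinks are what make the join automatic.</p>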
<p>Moreover, a number of approaches have been undertaken in the past to go from the publisher's raw statistical data to linked statistical data, as discussed in great detail in <a href="http://www.springerlink.com/content/t2244913r2583jw1/"><cite>Official statistics and the Practice of Data Fidelity</cite></a> (Cyganiak, R., 2011). These approaches range from retrieving the data, mostly in tabular formats (Microsoft Excel or CSV) or tree formats (XML with a custom schema, SDMX-ML, PC-Axis), to transforming it into different RDF serialization formats. As far as graph formats go, the majority of datasets in those formats are not published by the data owners. However, a number of statistical linked dataspaces are already part of the <a href="http://lod-cloud.net/">LOD Cloud</a>.</p>
<p>The Linked Data community has carried out a number of transformation efforts based on various formats. For example, the <a href="http://worldbank.270a.info/">World Bank Linked Dataspace</a> is based on the custom XML that the <a href="http://worldbank.org/">World Bank</a> provides through its APIs, transformed by applying XSL templates. The <a href="http://transparency.270a.info/">Transparency International</a> Linked Dataspace's data is based on CSV files, transformed using <a href="http://code.google.com/p/google-refine/">Google Refine</a> and the <a href="http://refine.deri.ie/">RDF Extension</a>. That is, data sources provide different data formats to the public, with or without accompanying metadata (e.g., vocabularies, provenance). Hence, Linked Data teams are not spared this repetitive work: they have to be constantly involved, either through hand-held transformation efforts or, in best-case scenarios, semi-automatic ones. To the best of our knowledge, the transformation step is currently not automated. This is generally due to the difficulty of dealing with the quality and consistency of the statistical data published on the Web, as well as with data formats that are typically focused on consumption. Although SDMX-ML is the primary format of the high-profile statistical data organizations, it has yet to be taken advantage of.</p>
</div>
</section>
<section id="sdmx-ml-to-linked-data" about="[this:]" rel="schema:hasPart">
<h2 about="[this:#sdmx-ml-to-linked-data]" property="schema:name">SDMX-ML to Linked Data</h2>
<div about="[this:#sdmx-ml-to-linked-data]" property="schema:description">
<p>SDMX was recently approved by ISO as an <a href="http://www.iso.org/iso/catalogue_detail.htm?csnumber=52500">International Standard</a>. It provides the means to consistently carry out data flows between publishers and consumers. SDMX-ML (using XML syntax) is considered to be the gold standard for expressing statistical data. It has a highly structured mechanism to represent statistical observations, classifications, and data structures. The organizations behind SDMX are <a href="http://www.bis.org/">BIS</a> (Bank for International Settlements), <a href="http://www.oecd.org/">OECD</a>, <a href="http://www.un.org/">UN</a> (United Nations), <a href="http://www.ecb.int/">ECB</a>, <a href="http://worldbank.org/">World Bank</a>, <a href="http://www.imf.org/">IMF</a>, <a href="http://www.fao.org/">FAO</a>, and <a href="http://epp.eurostat.ec.europa.eu/">Eurostat</a>.</p>
<p>We argue that high-fidelity statistical data representation in Linked Data should take advantage of SDMX-ML, as it is widely adopted by data producers holding rich data about our societies. This makes transforming SDMX-ML to RDF, and publishing the accompanying Linked Dataspaces, of paramount importance.</p>
<section id="data-sources" about="[this:]" rel="schema:hasPart">
<h3 about="[this:#data-sources]" property="schema:name">Data Sources</h3>
<div about="[this:#data-sources]" property="schema:description">
<p>As a demonstration of the SDMX-ML to RDF transformations, the selected datasets come from the following organizations. Instead of regurgitating the publishers' profiles of themselves, and to keep this part brief, I'll quote the first sentences from their about pages:</p>
<dl>
<dt>OECD</dt>
<dd><q>The mission of the Organisation for Economic Co-operation and Development (OECD) is to promote policies that will improve the economic and social well-being of people around the world</q> from <a href="http://www.oecd.org/about/">OECD Our mission</a>.</dd>
<dt>BFS</dt>
<dd><q>Swiss Statistics, the Federal Statistical Office’s web portal. Offering a modern and appealing interface, our website proposes a wide range of statistical information on the most important areas of life: population, health, economy, employment, education and much more</q> from <a href="http://www.bfs.admin.ch/bfs/portal/en/index/dienstleistungen/premiere_visite/01.html">BFS Welcome</a>.</dd>
<dt>FAO</dt>
<dd><q>Achieving food security for all is at the heart of FAO's efforts - to make sure people have regular access to enough high-quality food to lead active, healthy lives</q> from <a href="http://www.fao.org/about/en/">FAO's mandate</a>.</dd>
<dt>ECB</dt>
<dd><q>Whose main task is to maintain the euro's purchasing power and thus price stability in the euro area</q> from <a href="http://www.ecb.int/">ECB home</a>.</dd>
<dt>IMF</dt>
<dd><q>Working to foster global monetary cooperation, secure financial stability, facilitate international trade, promote high employment and sustainable economic growth, and reduce poverty around the world</q> from <a href="http://www.imf.org/">IMF home</a>.</dd>
<dt>UIS</dt>
<dd><q>The primary source for cross-nationally comparable statistics on education, science and technology, culture, and communication for more than 200 countries and territories</q> from <a href="http://www.uis.unesco.org/">UIS home</a>.</dd>
<dt>FRB</dt>
<dd><q>The Federal Reserve, the central bank of the United States, provides the nation with a safe, flexible, and stable monetary and financial system</q> from <a href="http://www.federalreserve.gov/">FRB home</a>.</dd>
<dt>BIS</dt>
<dd><q>The Bank for International Settlements (BIS) is an international organisation which fosters international monetary and financial cooperation and serves as a bank for central banks</q> from <a href="http://www.bis.org/">BIS home</a>.</dd>
<dt>ABS</dt>
<dd><q>We assist and encourage informed decision making, research and discussion within governments and the community, by leading a high quality, objective and responsive national statistical service</q> from <a href="http://www.abs.gov.au/">ABS mission statement</a>.</dd>
</dl>
<p>The OECD, FAO, ECB, IMF, UIS, BIS, and ABS datasets consisted of observational and structural data. OECD, ECB, UIS, BIS, and ABS provided complete coverage (to the best of our knowledge), whereas FAO provided partial, fishery-related data, and IMF partial data through their REST service. BFS made all of their classifications available, but no observational data in SDMX-ML.</p>
</div>
</section>
<section id="data-retrieval" about="[this:]" rel="schema:hasPart">
<h3 about="[this:#data-retrieval]" property="schema:name">Data Retrieval</h3>
<div about="[this:#data-retrieval]" property="schema:description">
<p>As SDMX-ML publishers have their own publishing processes, the availability and accessibility of the data varied. After obtaining the dataset codes, names, and URLs using common Linux command-line tools, a Bash script was created to retrieve the data.</p>
<p id="data-retrieval-oecd">How to retrieve all of the datasets from the OECD website was not entirely clear. In order to get hold of the list of datasets automatically, I copied the innerHTML of the DOM tree containing all the dataset codes from <a href="http://stats.oecd.org/">OECD.StatExtracts</a> to a temporary file. This was necessary because a simple scrape of the HTTP GET response wasn't possible, as the data on the page was populated via JavaScript on <code>document</code> ready. After constructing a list of datasets and structures to retrieve, two REST API endpoints were called.</p>
<p id="data-retrieval-bfs">BFS offered a Microsoft Excel <a href="http://www.bfs.admin.ch/bfs/portal/en/index/infothek/nomenklaturen/sdmx.Document.139474.xls">document</a> which contained a catalog of their classifications and URLs for retrieval.</p>
<p id="data-retrieval-fao">After searching for keywords along the lines of <code>SDMX site:fao.org</code> at a nearby search engine, the <a href="http://www.fao.org/figis/sdmx/">FAO Fisheries</a> and <a href="http://data.fao.org/sdmx/index.html">data.fao.org</a> SDMX Registry and Repository pages, along with their child pages, were marked for SDMX-ML retrieval.</p>
<p id="data-retrieval-ecb">ECB had a REST API similar to OECD's. Additionally, the SDMX Dataflows were retrieved to get a primary list of datasets to retrieve. Some of the large datasets were retrieved by making multiple smaller calls to the API, one call per reference area.</p>
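<p>A minimal sketch of splitting one large request into one call per reference area. The base URL, the dot-separated key layout, and the <code>{ref_area}</code> placeholder convention are hypothetical and are not taken from the ECB API documentation; the sketch only builds the URLs and does not fetch anything.</p>

```python
from urllib.parse import quote

# Hypothetical endpoint; not the actual ECB API URL structure.
BASE_URL = "http://sdw.example.org/rest/data"


def per_ref_area_urls(dataset_id, key_template, ref_areas):
    """Build one request URL per reference area instead of one large call.

    key_template is a hypothetical series key with a {ref_area} slot, for
    example "M.{ref_area}.EUR"; the other key positions stay fixed.
    """
    urls = []
    for area in ref_areas:
        key = key_template.format(ref_area=area)
        urls.append(f"{BASE_URL}/{quote(dataset_id)}/{quote(key)}")
    return urls


for url in per_ref_area_urls("EXAMPLE", "M.{ref_area}.EUR", ["DE", "FR"]):
    print(url)
```

<p>Each URL could then be handed to the retrieval script (e.g., fetched with <code>curl</code> or <code>wget</code>), and the resulting files transformed individually.</p>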
<p id="data-retrieval-imf">IMF followed the same procedure as ECB and OECD.</p>
<p id="data-retrieval-frb">FRB followed the same procedure as above.</p>
<p id="data-retrieval-uis">UIS followed the same procedure as above.</p>
<p id="data-retrieval-bis">BIS followed the same procedure as above.</p>
<p id="data-retrieval-abs">ABS followed the same procedure as above.</p>
</div>
</section>
<section id="provenance" about="[this:]" rel="schema:hasPart">
<h3 about="[this:#provenance]" property="schema:name">Provenance</h3>
<div about="[this:#provenance]" property="schema:description">
<p>We now discuss provenance at three stages: retrieval, transformation, and post-processing.</p>
<section id="provenance-at-retrieval" about="[this:]" rel="schema:hasPart">
<h4 about="[this:#provenance-at-retrieval]" property="schema:name">Provenance at Retrieval</h4>
<div about="[this:#provenance-at-retrieval]" property="schema:description">
<p>At the time of data retrieval, information pertaining to provenance was captured using the <a href="http://www.w3.org/TR/prov-o/">PROV Ontology</a> in order to further enrich the data. The resulting RDF/XML document contains <code>prov:Activity</code> information, which indicates the location of the retrieved XML document on the local filesystem. It also contains other provenance data, such as when the data was retrieved and with which tools. This retrieval provenance may be provided to the XSL transformer during the transformation phase and for VoID enrichment.</p>
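<p>A minimal sketch of such a retrieval record, expressed as (subject, predicate, object) tuples with prefixed names. The PROV-O terms used here exist, but the exact shape of the authors' document is not given in this section: the activity naming scheme, the <code>provenance:</code> prefix, the <code>file://</code> location, and the use of <code>prov:wasAssociatedWith</code> for the retrieval tool are all assumptions.</p>

```python
from datetime import datetime, timezone


def retrieval_activity(activity_id, source_url, local_path, agent, started):
    """Describe one retrieval as (subject, predicate, object) tuples.

    The activity naming scheme, the file:// location, and using
    prov:wasAssociatedWith for the retrieval tool are assumptions.
    """
    activity = f"provenance:activity/{activity_id}"
    local_file = "file://" + local_path
    return [
        (activity, "rdf:type", "prov:Activity"),
        (activity, "prov:startedAtTime", started.isoformat()),
        (activity, "prov:used", source_url),
        (activity, "prov:wasAssociatedWith", agent),
        (activity, "prov:generated", local_file),
        (local_file, "prov:wasDerivedFrom", source_url),
    ]


for triple in retrieval_activity(
        "retrieval-1", "http://example.org/sdmx/EXAMPLE.xml",
        "/data/EXAMPLE.xml", "tool:wget",
        datetime(2013, 2, 1, 9, 0, tzinfo=timezone.utc)):
    print(triple)
```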
</div>
</section>
<section id="provenance-at-transformation" about="[this:]" rel="schema:hasPart">
<h4 about="[this:#provenance-at-transformation]" property="schema:name">Provenance at Transformation</h4>
<div about="[this:#provenance-at-transformation]" property="schema:description">
<p>Resources of type <code>qb:DataStructureDefinition</code>, <code>qb:DataSet</code>, and <code>skos:ConceptScheme</code> are also typed with the <code>prov:Entity</code> class. In addition, a <code>prov:wasAttributedTo</code> property was added to these resources, whose value is the <code>creator</code>, a <code>prov:Agent</code> obtained from the XSLT configuration. Each transformation has a unique <code>prov:Activity</code> with a <code>schema:name</code> and values for <code>prov:startedAtTime</code>, <code>prov:wasAssociatedWith</code> (the creator), <code>prov:used</code> (i.e., the source XML and the XSL used to transform it), and <code>prov:generated</code> (along with the source data URI that the output <code>prov:wasDerivedFrom</code>). The activity also declares a <code>dcterms:license</code>, whose value is taken from the XSLT configuration. The provenance document from the retrieval phase may be provided to the transformer. In this case, the current provenance activity (i.e., the transformation) is linked to the earlier provenance activity (i.e., the retrieval) using the <code>prov:wasInformedBy</code> property.</p>
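<p>The per-transformation activity described above can be sketched as follows. This is one reading of the description, not the authors' XSLT output: the function and parameter names are hypothetical, <code>creator</code> and <code>license_uri</code> stand in for values read from the XSLT configuration, and the <code>prov:wasInformedBy</code> link is only emitted when retrieval provenance is supplied.</p>

```python
def transformation_activity(activity_uri, name, started, creator, source_xml,
                            xsl, generated, source_data_uri, license_uri,
                            retrieval_activity=None):
    """Sketch of the per-transformation prov:Activity as triples.

    creator and license_uri stand in for values from the XSLT
    configuration; every identifier passed in is hypothetical.
    """
    triples = [
        (activity_uri, "rdf:type", "prov:Activity"),
        (activity_uri, "schema:name", name),
        (activity_uri, "prov:startedAtTime", started),
        (activity_uri, "prov:wasAssociatedWith", creator),
        (activity_uri, "prov:used", source_xml),
        (activity_uri, "prov:used", xsl),
        (activity_uri, "prov:generated", generated),
        (activity_uri, "dcterms:license", license_uri),
        (generated, "prov:wasDerivedFrom", source_data_uri),
        (generated, "prov:wasAttributedTo", creator),
        (creator, "rdf:type", "prov:Agent"),
    ]
    # Link back to the retrieval activity only when its provenance is given.
    if retrieval_activity is not None:
        triples.append((activity_uri, "prov:wasInformedBy", retrieval_activity))
    return triples
```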
</div>
</section>
<section id="provenance-at-post-processing" about="[this:]" rel="schema:hasPart">
<h4 about="[this:#provenance-at-post-processing]" property="schema:name">Provenance at Post-processing</h4>
<div about="[this:#provenance-at-post-processing]" property="schema:description">
<p>The post-processing step for provenance is intended to retain provenance data for future use. As datasets get updated, it is important to preserve information about past activities by way of exporting all instances of the <code>prov:Activity</code> class from the RDF store. Activities are unique artifacts, on a conceptual level as well as with regard to referencing them. Since one of the main concerns of provenance is to keep track of activities, this post-processing step also allows us to retain a historical account of all activities during the data lifecycle, and to preserve all previously published URIs (cf. <a href="http://www.w3.org/Provider/Style/URI.html">Cool URIs don't change</a>).</p>
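<p>A minimal sketch of the export step, assuming a plain list of triples as a hypothetical in-memory stand-in for the RDF store (a real setup would query the store instead). It collects every triple whose subject is typed <code>prov:Activity</code> and appends new exports to an archive without dropping earlier ones; both function names are hypothetical.</p>

```python
def export_activities(triples):
    """Return every triple whose subject is typed prov:Activity.

    triples is a hypothetical in-memory stand-in for the RDF store; a real
    setup would query the store (e.g. with SPARQL) instead.
    """
    activities = {s for s, p, o in triples
                  if p == "rdf:type" and o == "prov:Activity"}
    return [t for t in triples if t[0] in activities]


def merge_archive(archive, new_export):
    """Append newly exported activity triples, never dropping earlier ones."""
    seen = set(archive)
    return archive + [t for t in new_export if t not in seen]
```

<p>Keeping the archive append-only is what preserves the historical account of activities, and with it the URIs of previously published activities, across dataset updates.</p>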
</div>
</section>
</div>
</section>
<section id="data-preprocessing" about="[this:]" rel="schema:hasPart">
<h3 about="[this:#data-preprocessing]" property="schema:name">Data Preprocessing</h3>
<div about="[this:#data-preprocessing]" property="schema:description">
<p>By and large, there was no need to pre-process the data, as the transformation dealt with the data as it was. However, some non-vital SDMX components were omitted from the output. For instance, one type of attribute in OECD and ECB observations contained free text instead of its corresponding code from a codelist. Since the RDF Data Cube required codes, as opposed to free text, for dimension values, some attributes were excluded. The decision here was to trade off some precision in favour of retaining the dataset.</p>
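<p>A minimal sketch of this trade-off: an attribute is kept only if it has a codelist and its value is one of the codes, so free-text attributes are dropped while the observation itself survives. The codelists, attribute names, and values are hypothetical, and the authors' templates may decide per attribute type rather than per value as done here.</p>

```python
# Hypothetical codelists for the coded attributes of a data structure.
CODELISTS = {
    "OBS_STATUS": {"A", "E", "P"},
    "UNIT_MULT": {"0", "3", "6"},
}


def keep_coded_attributes(attributes, codelists=CODELISTS):
    """Keep attributes whose values are codes from a known codelist.

    Attributes without a codelist (e.g. free-text notes) are omitted,
    trading some precision for an observation that can be represented.
    """
    kept = {}
    for concept, value in attributes.items():
        codes = codelists.get(concept)
        if codes is not None and value in codes:
            kept[concept] = value
    return kept
```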
</div>
</section>
<section id="data-modeling" about="[this:]" rel="schema:hasPart">
<h3 about="[this:#data-modeling]" property="schema:name">Data Modeling</h3>
<div about="[this:#data-modeling]" property="schema:description">
<p>This section goes over several areas which are at the heart of representing statistical data in SDMX-ML as Linked Data. The approach taken was to provide a level of consistency for data consumers and tool builders across all statistical Linked Data originating from SDMX-ML.</p>
<section id="vocabularies" about="[this:]" rel="schema:hasPart">
<h4 about="[this:#vocabularies]" property="schema:name">Vocabularies</h4>
<div about="[this:#vocabularies]" property="schema:description">
<p>Besides the common vocabularies (RDF, RDFS, XSD, and OWL), the RDF Data Cube vocabulary is used to describe multi-dimensional statistical data, and SDMX-RDF for the statistical information model. PROV-O is used for provenance coverage, and SKOS and XKOS cover concepts, concept schemes, and their relationships to one another.</p>
</div>
</section>
<section id="version-in-uri" about="[this:]" rel="schema:hasPart">
<h4 about="[this:#version-in-uri]" property="schema:name">Versioning</h4>
<div about="[this:#version-in-uri]" property="schema:description">
<p>SDMX data publishers version their classifications, and the generated cubes refer to particular versions of those classifications. Consequently, versions need to be an explicit part of classification URIs in order to uniquely identify them. Although including version information in URIs is disputed by some authors, it is a justified exception for identifying different concepts and data structures. Jeni Tennison et al. discussed <a href="http://www.jenitennison.com/blog/node/112">Versioning URIs</a> and concluded that there is no one-size-fits-all solution. An alternative approach, using named graphs for a series of changes, was proposed in <a href="http://www.springerlink.com/index/10.1007/978-1-4614-1767-5">Linking UK Government Data</a>.</p>
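<p>To illustrate why the version has to be part of the URI, the sketch below mints code URIs following the code pattern listed in the URI Patterns table of this section, for two hypothetical data structures that refer to different versions of the same codelist. The authority, codelist ID, versions, and function names are hypothetical.</p>

```python
def code_uri(authority, version, codelist_id, code_id):
    """Mint a code URI: http://{authority}/code/{version}/{codeListID}/{codeID}."""
    return f"http://{authority}/code/{version}/{codelist_id}/{code_id}"


# Two hypothetical data structures referring to different codelist versions.
STRUCTURES = {"DSD_A": ("CL_AREA", "1.0"), "DSD_B": ("CL_AREA", "2.0")}


def code_for_structure(structure_id, code_id, authority="example.270a.info"):
    """Resolve a code to the URI of the codelist version the structure uses."""
    codelist_id, version = STRUCTURES[structure_id]
    return code_uri(authority, version, codelist_id, code_id)


print(code_for_structure("DSD_A", "CA"))
print(code_for_structure("DSD_B", "CA"))
```

<p>Without the version segment, both structures would point at the same URI even if the code's meaning changed between codelist versions.</p>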
</div>
</section>
<section id="uri-patterns" about="[this:]" rel="schema:hasPart">
<h4 about="[this:#uri-patterns]" property="schema:name">URI Patterns</h4>
<div about="[this:#uri-patterns]" property="schema:description">
<p>The URI patterns are outlined in the table below: <code>authority</code> is replaced with the domain (see also: <a href="#agency-identifiers-and-uris">Agency identifiers and URIs</a>), followed by a token such as <code>class</code>, <code>code</code>, <code>concept</code>, <code>dataset</code>, <code>property</code>, <code>provenance</code>, or <code>slice</code>. These tokens, as well as the <code>/</code> used to separate the dimension concepts in URIs, can be configured in the toolkit.</p>
<p>In order to construct the URIs for these patterns, some of the data values are normalized to make them URI-safe, but they are not altered in any other way (e.g., no lower-casing). The rationale for this was to keep terms consistent between SDMX and RDF.</p>
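<p>A minimal sketch of building an observation URI from the dataset pattern with a configurable path token and dimension separator, and of normalizing values without changing their case. Using percent-encoding via <code>urllib.parse.quote</code> as the normalization is an assumption, as are the <code>CONFIG</code> keys and function names; the toolkit's actual rules may differ.</p>

```python
from urllib.parse import quote

# Hypothetical toolkit configuration: the path token and dimension separator.
CONFIG = {"dataset": "dataset", "separator": "/"}


def uri_safe(value):
    """Percent-encode characters that are not URI-safe; change nothing else.

    Case is kept as-is so terms stay identical to their SDMX form. Using
    percent-encoding for this step is an assumption of the sketch.
    """
    return quote(value, safe="")


def observation_uri(authority, dataset_id, dimension_values, config=CONFIG):
    """Build http://{authority}/dataset/{datasetID}/{dimension-1}/../{dimension-n}."""
    dims = config["separator"].join(uri_safe(v) for v in dimension_values)
    return f"http://{authority}/{config['dataset']}/{uri_safe(dataset_id)}/{dims}"


print(observation_uri("example.270a.info", "EXAMPLE", ["A", "CA", "2010"]))
```

<p>Swapping the separator, e.g. <code>dict(CONFIG, separator="-")</code>, changes only how the dimension values are joined, leaving the rest of the pattern intact.</p>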
<table id="uri-patterns-outline">
<caption>URI Patterns</caption>
<thead><tr><th>Entity type</th><th>URI Pattern</th></tr></thead>
<tbody>
<tr><td><code>qb:DataStructureDefinition</code></td><td><code>http://{authority}/structure/{version}/{KeyFamilyID}</code></td></tr>
<tr><td><code>qb:ComponentSpecification</code></td><td><code>http://{authority}/component/{KeyFamilyID}/{dimension|measure|attribute}/{version}/{conceptSchemeID}/{conceptID}</code></td></tr>
<tr><td><code>qb:DataSet</code></td><td><code>http://{authority}/dataset/{datasetID}</code></td></tr>
<tr><td><code>qb:Observation</code></td><td><code>http://{authority}/dataset/{datasetID}/{dimension-1}/../{dimension-n}</code></td></tr>
<tr><td><code>qb:Slice</code></td><td><code>http://{authority}/slice/{KeyFamilyID}/{dimension-1}/../{dimension-n-no-FREQ}</code></td></tr>
<tr><td><code>skos:Collection</code></td><td><code>http://{authority}/code/{version}/{hierarchicalCodeListID}</code>,<br/>
<code>http://{authority}/code/{version}/{hierarchyID}</code></td></tr>
<tr><td><code>sdmx:CodeList</code></td><td><code>http://{authority}/code/{version}/{codeListID}</code></td></tr>
<tr><td><code>skos:ConceptScheme</code></td><td><code>http://{authority}/concept/{version}/{conceptSchemeID}</code></td></tr>
<tr><td><code>skos:Concept</code>,<br/>
<code>sdmx:Concept</code></td><td><code>http://{authority}/code/{version}/{codeListID}/{codeID}</code><br/>
<code>http://{authority}/concept/{version}/{conceptSchemeID}/{conceptID}</code></td></tr>
<tr><td><code>owl:Class</code>,<br/>
<code>rdfs:Class</code></td><td><code>http://{authority}/class/{version}/{codeListID}</code></td></tr>
<tr><td><code>rdf:Property</code>,<br/>
<code>qb:ComponentProperty</code></td><td><code>http://{authority}/{property|dimension|measure|attribute}/{version}/{conceptSchemeID}/{conceptID}</code></td></tr>
<tr><td><code>qb:DimensionProperty</code></td><td><code>http://{authority}/dimension/{version}/{conceptSchemeID}/{conceptID}</code></td></tr>
<tr><td><code>qb:MeasureProperty</code></td><td><code>http://{authority}/measure/{version}/{conceptSchemeID}/{conceptID}</code></td></tr>
<tr><td><code>qb:AttributeProperty</code></td><td><code>http://{authority}/attribute/{version}/{conceptSchemeID}/{conceptID}</code></td></tr>
</tbody>
</table>
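<p>The patterns in the table can be read as string templates. The following Python sketch fills a few of them. It is not part of the toolkit (which does this in XSLT): <code>PATTERNS</code>, <code>uri_safe</code>, and <code>mint_uri</code> are hypothetical names, and <code>uri_safe</code> assumes that "URI-safe" means percent-encoding every character outside the unreserved set while keeping case, in line with the rationale above. The toolkit's actual normalization rules may differ.</p>

```python
from urllib.parse import quote

# A subset of the URI patterns from the table, written as str.format templates.
PATTERNS = {
    "qb:DataStructureDefinition": "http://{authority}/structure/{version}/{KeyFamilyID}",
    "qb:DataSet": "http://{authority}/dataset/{datasetID}",
    "sdmx:CodeList": "http://{authority}/code/{version}/{codeListID}",
    "skos:Concept": "http://{authority}/code/{version}/{codeListID}/{codeID}",
    "qb:DimensionProperty": "http://{authority}/dimension/{version}/{conceptSchemeID}/{conceptID}",
}


def uri_safe(value: str) -> str:
    """Hypothetical normalization: percent-encode unsafe characters, keep case.

    safe="" makes quote() encode "/" as well; letters, digits and "-._~"
    are never encoded, so SDMX identifiers stay recognizable.
    """
    return quote(value.strip(), safe="")


def mint_uri(entity_type: str, authority: str, **parts: str) -> str:
    """Fill the pattern for entity_type; the authority (a domain) is used verbatim."""
    safe_parts = {name: uri_safe(value) for name, value in parts.items()}
    return PATTERNS[entity_type].format(authority=authority, **safe_parts)
```

<p>For example, <code>mint_uri("skos:Concept", "example.org", version="1.0", codeListID="CL_GEO", codeID="AUS")</code> gives <code>http://example.org/code/1.0/CL_GEO/AUS</code>, where <code>example.org</code> and <code>1.0</code> are placeholder values.</p>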
</div>
</section>
<section id="datatypes" about="[this:]" rel="schema:hasPart">
<h4 about="[this:#datatypes]" property="schema:name">Datatypes</h4>
<div about="[this:#datatypes]" property="schema:description">
<p>XSD datatypes are assigned to literals based on the datatype declared for the measure component (e.g., decimal, year). In the absence of such a datatype, observation values are checked to see whether they can be cast to <code>xsd:decimal</code>; otherwise, they are left as plain literals.</p>
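<p>The datatype decision can be sketched as a three-step fallback in Python. This is an illustration rather than the toolkit's XSLT: the mapping in <code>TEXT_TYPE_TO_XSD</code> from a declared SDMX-ML text type to an XSD datatype is an assumption, and "can be cast to <code>xsd:decimal</code>" is approximated by matching the lexical form of <code>xsd:decimal</code> (optional sign, digits, optional fraction, no exponent).</p>

```python
import re
from typing import Optional, Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"

# Lexical space of xsd:decimal: optional sign, digits, optional fraction; no exponent.
DECIMAL_RE = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

# Hypothetical mapping from a measure's declared SDMX-ML text type to an XSD datatype.
TEXT_TYPE_TO_XSD = {"Decimal": "decimal", "Double": "double", "Integer": "integer", "Year": "gYear"}


def type_observation_value(value: str, text_type: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Return (lexical form, datatype IRI); a None datatype means a plain literal."""
    value = value.strip()
    if text_type in TEXT_TYPE_TO_XSD:
        # 1. Use the datatype declared for the measure component.
        return value, XSD + TEXT_TYPE_TO_XSD[text_type]
    if DECIMAL_RE.fullmatch(value):
        # 2. No (known) declared datatype, but the value can be cast to xsd:decimal.
        return value, XSD + "decimal"
    # 3. Otherwise leave the value as a plain literal.
    return value, None
```

<p>With this sketch, <code>"12.5"</code> without a declared datatype becomes an <code>xsd:decimal</code>, while <code>"1E5"</code> or <code>"n/a"</code> stay plain literals.</p>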
</div>
</section>
</div>
</section>
</div>
</section>
<section id="linked-sdmx-data-transformation" about="[this:]" rel="schema:hasPart">
<h2 about="[this:#linked-sdmx-data-transformation]" property="schema:name">Linked SDMX Data Transformation</h2>
<div about="[this:#linked-sdmx-data-transformation]" property="schema:description">
<p>The <a href="https://github.com/csarven/linked-sdmx">Linked SDMX</a> XSLT 2.0 templates and scripts were developed to transform SDMX-ML data and metadata to RDF/XML. Their goals are:</p>
<ul>
<li>To improve access and discovery of cross-domain statistical data.</li>
<li>To perform the transformation in a lossless and semantics preserving way.</li>
<li>To support and encourage statistical agencies to publish their data using RDF and to integrate the transformation into their workflow.</li>
</ul>
<p>The key advantage of this transformation approach is that the data modeler does not need to make additional interpretations, especially in comparison to alternative transformations (e.g., from CSV or XML to an RDF serialization). Since the SDMX-RDF vocabulary is based on the SDMX-ML standard, and the RDF Data Cube vocabulary is closely aligned with the SDMX information model, the transformation is, to a large extent, a matter of mapping the source SDMX-ML data to its counterparts in RDF.</p>
<section id="features-of-the-transformation" about="[this:]" rel="schema:hasPart">
<h3 about="[this:#features-of-the-transformation]" property="schema:name">Features of the transformation</h3>
<div about="[this:#features-of-the-transformation]" property="schema:description">
<ul>
<li>Transforms SDMX KeyFamilies, ConceptSchemes and Concepts, CodeLists and Codes, Hierarchical CodeLists, and DataSets.</li>
<li>Configurability for SDMX publishers' needs.</li>
<li>Detection of, and references to, the CodeLists and Codes of external agencies.</li>
<li>Support for interlinking publisher-specific annotation types.</li>
<li>Support for omitting components.</li>
<li>Inclusion of provenance data.</li>
</ul>
<figure id="linked-sdmx-transformation-process">
<figcaption>Transformation Process</figcaption>
<object type="image/svg+xml" data="linked-sdmx-transformation.svg" width="460" height="175"></object>
</figure>
</div>
</section>
<section id="what-is-inside" about="[this:]" rel="schema:hasPart">
<h3 about="[this:#what-is-inside]" property="schema:name">What is inside?</h3>
<div about="[this:#what-is-inside]" property="schema:description">
<p>It comes with scripts and sample data:</p>
<ul>
<li>XSLT 2.0 templates to transform Generic SDMX-ML data and metadata. These include the main XSL template for generic SDMX-ML, an XSL for common templates and functions, and an RDF/XML configuration file to set preferences such as base URIs, delimiters in URIs, and how to map annotation types.</li>
<li>A Bash script that transforms the sample data using <code>saxonb-xslt</code>.</li>
<li>Sample SDMX Messages and Structures retrieved from the organizations initially involved in the SDMX standard, as well as from BFS.</li>
</ul>
</div>
</section>
<section id="requirements" about="[this:]" rel="schema:hasPart">
<h3 about="[this:#requirements]" property="schema:name">Requirements</h3>
<div about="[this:#requirements]" property="schema:description">
<p>The Linked SDMX toolkit requires an XSLT 2.0 processor to run the transformation. Optionally, some of the transformation settings can be configured in the provided <code>config.ttl</code> (in RDF Turtle), which is then converted to abbreviated RDF/XML. Some of the key features are described in more detail below.</p>
<p>The transformation follows some common Linked Data practices, as well as some conventions of its own.</p>
</div>
</section>
<section id="configuration" about="[this:]" rel="schema:hasPart">
<h3 about="[this:#configuration]" property="schema:name">Configuration</h3>
<div about="[this:#configuration]" property="schema:description">
<p>The config file pre-configures some of the decisions made in the XSL templates. The noteworthy options are outlined below.</p>
<section id="agency-identifiers-and-uris" about="[this:]" rel="schema:hasPart">
<h4 about="[this:#agency-identifiers-and-uris]" property="schema:name">Agency identifiers and URIs</h4>
<div about="[this:#agency-identifiers-and-uris]" property="schema:description">
<p><code>agencies.ttl</code> tracks mappings for maintenance agencies. For each maintenance agency, i.e., SDMX publisher, it records the agency's identifier in the SDMX Registry as well as the agency's base URI. This allows an external agency's identifier to be looked up for its base URI, which is then used in the transformation. Currently, an agency is recognized either as "SDMX" or as an agency that publishes the actual statistics.</p>
<p>In the case of SDMX, a reference to SDMX CodeLists and Codes is typically indicated by the component agency being set to <code>SDMX</code>, e.g., <code>codelistAgency="SDMX"</code> of a <code>structure:Component</code> and/or <code>agencyID="SDMX"</code> of a CodeList with <code>id="CL_FREQ"</code>. When this is detected, the corresponding URIs from the SDMX-RDF vocabulary are used, e.g., <code>http://purl.org/linked-data/sdmx/2009/code#freq</code> for metadata and <code>http://purl.org/linked-data/sdmx/2009/code#freq-A</code> for data.</p>
<p>Similarly, an agency might use another agency's codes. Following the same URI pattern conventions, the agency file is used to find the corresponding base URI in order to make a reference. For example, here is a coded property used by the European Central Bank (<code>ECB</code>) that is associated with a code list defined by Eurostat (<code>eurostat</code>):</p>
<pre>
&lt;http://ecb.270a.info/property/OBS_STATUS&gt;
&lt;http://purl.org/linked-data/cube#codeList&gt;
&lt;http://eurostat.270a.info/code/1.0/CL_OBS_STATUS&gt; .
</pre>
<p>Naturally, the transformation does not redefine metadata from an external agency, since its owners define that metadata under their own authority.</p>
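<p>The lookup described in this section can be sketched in Python as follows. <code>AGENCY_BASE_URIS</code> is a hypothetical in-memory stand-in for <code>agencies.ttl</code>, and the <code>SDMX_CODELISTS</code> table only contains the <code>CL_FREQ</code> to <code>freq</code> mapping given above; how the toolkit derives SDMX-RDF local names for other code lists is not assumed here.</p>

```python
SDMX_CODE = "http://purl.org/linked-data/sdmx/2009/code#"

# Hypothetical stand-in for agencies.ttl: SDMX Registry agency ID -> base URI.
AGENCY_BASE_URIS = {
    "ECB": "http://ecb.270a.info/",
    "eurostat": "http://eurostat.270a.info/",
}

# SDMX code list IDs -> local names in the SDMX-RDF code namespace.
# Only CL_FREQ -> freq comes from the text; other entries would need checking.
SDMX_CODELISTS = {"CL_FREQ": "freq"}


def codelist_uri(agency_id: str, codelist_id: str, version: str) -> str:
    """URI of a code list maintained by agency_id."""
    if agency_id == "SDMX":
        return SDMX_CODE + SDMX_CODELISTS[codelist_id]
    # Any other agency: same URI pattern as our own code lists, under its base URI.
    return f"{AGENCY_BASE_URIS[agency_id]}code/{version}/{codelist_id}"


def code_uri(agency_id: str, codelist_id: str, code_id: str, version: str) -> str:
    """URI of a code within a code list maintained by agency_id."""
    if agency_id == "SDMX":
        return f"{SDMX_CODE}{SDMX_CODELISTS[codelist_id]}-{code_id}"
    return f"{codelist_uri(agency_id, codelist_id, version)}/{code_id}"
```

<p>With these definitions, <code>codelist_uri("eurostat", "CL_OBS_STATUS", "1.0")</code> yields the object of the triple above, and <code>code_uri("SDMX", "CL_FREQ", "A", "1.0")</code> yields <code>http://purl.org/linked-data/sdmx/2009/code#freq-A</code>.</p>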
</div>
</section>
<section id="uri-configurations" about="[this:]" rel="schema:hasPart">
<h4 about="[this:#uri-configurations]" property="schema:name">URI configurations</h4>
<div about="[this:#uri-configurations]" property="schema:description">
<p>Base URIs can be set for classes, codelists, concept schemes, datasets, slices, properties, provenance, as well as for the source SDMX data.</p>
<p>The value of <code>uriThingSeparator</code> (e.g., <code>/</code>) sets the delimiter that separates the "thing" from the rest of the URI. This is typically either <code>/</code> or <code>#</code>. For example, if a slash is used, a URI would look like <code>http://{authority}/code/{version}/CL_GEO</code> (note the last slash before CL_GEO). If a hash is used, it would look like <code>http://{authority}/code/{version}#CL_GEO</code>.</p>
<p>Similarly, <code>uriDimensionSeparator</code> sets the separator between the dimension values in RDF Data Cube observation URIs. Since each observation should have its own unique URI, observation URIs are constructed from the dimension values, made safe for use in URIs and separated by the value of <code>uriDimensionSeparator</code>. For example, with <code>uriDimensionSeparator</code> set to <code>/</code>, an observation URI looks like <code>http://{authority}/dataset/HEALTH_STAT/EVIEFE00/EVIDUREV/AUS/1960</code>, where <code>HEALTH_STAT</code> is the dataset ID. With <code>uriThingSeparator</code> set to <code>#</code> and <code>uriDimensionSeparator</code> set to <code>-</code>, the same observation would instead have the URI <code>http://{authority}/dataset/HEALTH_STAT#EVIEFE00-EVIDUREV-AUS-1960</code>.</p>
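<p>A Python sketch of how the two separators combine into an observation URI. The function and parameter names are hypothetical stand-ins for <code>uriThingSeparator</code> and <code>uriDimensionSeparator</code>, and percent-encoding with <code>urllib.parse.quote</code> is assumed as the way dimension values are made URI-safe.</p>

```python
from typing import Sequence
from urllib.parse import quote


def observation_uri(authority: str, dataset_id: str, dimension_values: Sequence[str],
                    thing_separator: str = "/", dimension_separator: str = "/") -> str:
    """Build an observation URI from the ordered dimension values of one observation.

    thing_separator stands in for uriThingSeparator and dimension_separator
    for uriDimensionSeparator.
    """
    # Percent-encode unsafe characters (including "/") but keep case.
    key = dimension_separator.join(quote(value, safe="") for value in dimension_values)
    return f"http://{authority}/dataset/{dataset_id}{thing_separator}{key}"
```

<p>Called with the dimension values <code>EVIEFE00</code>, <code>EVIDUREV</code>, <code>AUS</code>, and <code>1960</code>, it reproduces both example URIs above (with <code>example.org</code> in place of <code>{authority}</code>). One design consequence: <code>quote</code> never encodes <code>-</code>, so with <code>-</code> as the dimension separator, dimension values that themselves contain a hyphen could make two different keys collide.</p>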
<p>The creator's URI can also be set; it is also used in the provenance data.</p>
</div>
</section>
<section id="default-language" about="[this:]" rel="schema:hasPart">
<h4 about="[this:#default-language]" property="schema:name">Default language</h4>
<div about="[this:#default-language]" property="schema:description">
<p>The configuration can force a default <code>xml:lang</code> on <code>skos:prefLabel</code> and <code>skos:definition</code> when no language is given in the original data: if <code>config.rdf</code> contains a non-empty lang value, that value is used. The default language may also be applied to Annotations; see <a href="#interlinking-sdmx-annotations">Interlinking SDMX Annotations</a>.</p>
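<p>The precedence can be sketched in Python; <code>label_with_lang</code> is a hypothetical helper, and a <code>None</code> language means that no <code>xml:lang</code> is set.</p>

```python
from typing import Optional, Tuple


def label_with_lang(text: str, lang_in_data: Optional[str], config_lang: str) -> Tuple[str, Optional[str]]:
    """Return (text, language tag) for a skos:prefLabel or skos:definition.

    A language given in the SDMX-ML data wins; otherwise a non-empty default
    from the configuration is applied; otherwise no language tag is set.
    """
    if lang_in_data:
        return text, lang_in_data
    if config_lang.strip():
        return text, config_lang.strip()
    return text, None
```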
</div>
</section>
<section id="interlinking-sdmx-annotations" about="[this:]" rel="schema:hasPart">
<h4 about="[this:#interlinking-sdmx-annotations]" property="schema:name">Interlinking SDMX Annotations</h4>
<div about="[this:#interlinking-sdmx-annotations]" property="schema:description">
<p>SDMX Annotations contain important information that the publisher can put to use. AnnotationTypes typically reflect a publisher's internal conventions, so their use is not standardized across SDMX publishers. In order not to lose this information in the final transformation, the configuration allows publishers to define how their Annotations should be transformed. This is done with <code>interlinkAnnotationTypes</code>, which sets: the AnnotationType to detect (in <code>rdfs:type</code>); the predicate to use, as an XML QName (in <code>rdf:predicate</code>); whether the values are treated as instances of Concepts or Codes, or as Literals (in <code>rdf:range</code>); and whether to target the AnnotationText or the AnnotationTitle (in <code>rdfs:label</code>). Currently, this feature is only applied to Annotations in Concepts and Codes. Only AnnotationTypes with a corresponding configuration are applied; all others are skipped.</p>
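<p>The following Python sketch shows how such a configuration could be applied to the Annotations of one Concept or Code. <code>AnnotationRule</code> and <code>interlink_annotations</code> are hypothetical; the comments map their fields to the configuration properties named above. Resolving a Concept or Code value by appending it to a base URI is an assumption made for illustration.</p>

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AnnotationRule:
    predicate: str     # QName of the predicate to emit (rdf:predicate in the config)
    target_range: str  # "Concept", "Code" or "Literal" (rdf:range in the config)
    source: str        # "AnnotationText" or "AnnotationTitle" (rdfs:label in the config)


def interlink_annotations(subject: str, annotations: List[Dict[str, str]],
                          rules: Dict[str, AnnotationRule],
                          resource_base: str) -> List[Tuple[str, str, str, bool]]:
    """Map one Concept's or Code's Annotations to (subject, predicate, object, is_literal)."""
    statements = []
    for annotation in annotations:
        # Rules are keyed by AnnotationType (rdfs:type in the config).
        rule = rules.get(annotation.get("AnnotationType", ""))
        if rule is None:
            continue  # no configuration for this AnnotationType: skip it
        value = annotation.get(rule.source, "").strip()
        if not value:
            continue  # the targeted field is missing or empty
        if rule.target_range == "Literal":
            statements.append((subject, rule.predicate, value, True))
        else:
            # Hypothetical: the value is a Concept/Code ID resolved against resource_base.
            statements.append((subject, rule.predicate, resource_base + value, False))
    return statements
```

<p>For instance, with rules only for the (hypothetical) <code>UNIT</code> and <code>NOTE</code> AnnotationTypes, an Annotation of type <code>INTERNAL</code> yields no statement at all.</p>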
</div>
</section>
<section id="omitting-components" about="[this:]" rel="schema:hasPart">
<h4 about="[this:#omitting-components]" property="schema:name">Omitting components</h4>
<div about="[this:#omitting-components]" property="schema:description">
<p>In some cases, parts of the data contain errors. To work around this until the data is fixed at the source, without giving up the rest of the data and without making significant assumptions about or changes to the remaining data, the <code>omitComponents</code> configuration option explicitly skips over those parts. For example, if the Attribute values in a DataSet do not correspond to coded values (e.g., because they contain whitespace), they can be skipped without damaging the rest of the data. This trades precision for the ability to still make use of the data.</p>
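<p>In Python terms, omitting components amounts to a filter over each observation's components. This is a hypothetical sketch: observations are assumed to be dictionaries from component concept IDs to values, and the component IDs and values in the usage are placeholders.</p>

```python
from typing import Dict, Iterable, Iterator, Set


def omit_components(observations: Iterable[Dict[str, str]], omitted: Set[str]) -> Iterator[Dict[str, str]]:
    """Yield observations without the components listed in omitComponents.

    Each observation maps a component's concept ID (dimension, measure or
    attribute) to its value; only the omitted components are dropped.
    """
    for observation in observations:
        yield {cid: value for cid, value in observation.items() if cid not in omitted}
```

<p>For example, omitting a malformed <code>UNIT</code> attribute leaves <code>REF_AREA</code>, <code>TIME</code>, and <code>OBS_VALUE</code> untouched, which is what keeps the rest of the data usable.</p>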
</div>
</section>
</div>
</section>
</div>
</section>
<section id="linked-datasets" about="[this:]" rel="schema:hasPart">
<h2 about="[this:#linked-datasets]" property="schema:name">Linked Datasets</h2>
<div about="[this:#linked-datasets]" property="schema:description">
<p>This section provides information on the publication of the OECD, BFS, FAO, ECB, IMF, UIS, FRB, BIS, and ABS datasets.</p>
<p>The original SDMX-ML files were transformed to RDF/XML using XSLT 2.0. Saxon’s command-line XSLT and XQuery processor was used for the transformations, invoked from Bash scripts that iterate through all of the files in the datasets.</p>
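<p>The per-file iteration can be sketched in Python instead of Bash. The stylesheet and directory names are hypothetical, and the command lines assume the Saxon 9 option syntax (<code>-s:</code>, <code>-xsl:</code>, <code>-o:</code>) accepted by <code>saxonb-xslt</code>.</p>

```python
import shlex
import subprocess
from pathlib import Path
from typing import Iterable, List


def saxon_commands(sources: Iterable[Path], output_dir: Path, stylesheet: Path) -> List[List[str]]:
    """One saxonb-xslt invocation per SDMX-ML file, as an argument list."""
    commands = []
    for source in sorted(sources):
        target = output_dir / (source.stem + ".rdf")
        commands.append(["saxonb-xslt", f"-s:{source}", f"-xsl:{stylesheet}", f"-o:{target}"])
    return commands


def run_all(commands: List[List[str]]) -> None:
    """Run the transformations one after another, stopping at the first failure."""
    for command in commands:
        print(shlex.join(command))
        subprocess.run(command, check=True)
```

<p>For example, <code>run_all(saxon_commands(Path("sdmx").glob("*.xml"), Path("rdf"), Path("generic.xsl")))</code> would transform every SDMX-ML file in a (hypothetical) <code>sdmx</code> directory, one process per file.</p>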
<section id="rdf-datasets" about="[this:]" rel="schema:hasPart">
<h3 about="[this:#rdf-datasets]" property="schema:name">RDF Datasets</h3>
<div about="[this:#rdf-datasets]" property="schema:description">
<p>Here are some statistics for the transformations.</p>
<p>The command-line tool <code>saxonb-xslt</code> was used to conduct the XSL transformations, with 12000 MB of memory allocated on a machine with an Intel(R) Xeon(R) CPU E5620 @ 2.40GHz running Linux kernel 3.2.0-33-generic. Table [<a href="#transformation-time">Transformation time</a>] gives, for each dataset, the input SDMX-ML size, the output RDF/XML size, the ratio between the two sizes, and the total transformation time.</p>
<table id="transformation-time">
<caption>Transformation time</caption>
<thead>
<tr><th>Dataset</th><th>Input size</th><th>Output size</th><th>Ratio</th><th>Time</th></tr>
</thead>
<tfoot><tr><td colspan="5">Input size (rounded) refers to the original data in XML, and the output (rounded) is the RDF/XML size. Time is the real process time.</td></tr></tfoot>
<tbody>
<tr><td>OECD</td><td>3400 MB</td><td>24000 MB</td><td>1:7.1</td><td>131m25.795s</td></tr>
<tr><td>BFS</td><td>111 MB</td><td>154 MB</td><td>1:1.4</td><td>2m38.225s</td></tr>
<tr><td>FAO</td><td>902 MB</td><td>4400 MB</td><td>1:4.9</td><td>31m48.207s</td></tr>
<tr><td>ECB</td><td>11000 MB</td><td>35000 MB</td><td>1:3.2</td><td>316m46.329s</td></tr>
<tr><td>IMF</td><td>392 MB</td><td>3400 MB</td><td>1:8.7</td><td>28m11.826s</td></tr>
<tr><td>UIS</td><td>115 MB</td><td>896 MB</td><td>1:7.8</td><td>2m33.214s</td></tr>
<tr><td>FRB</td><td>783 MB</td><td>11000 MB</td><td>1:14.1</td><td>96m20.676s</td></tr>
<tr><td>BIS</td><td>845 MB</td><td>4000 MB</td><td>1:4.7</td><td>21m7.203s</td></tr>
<tr><td>ABS</td><td>34000 MB</td><td>160000 MB</td><td>1:4.7</td><td>831m52.558s</td></tr>
</tbody>
</table>
<p>Table [<a href="#transformed-data">Transformed data</a>] provides statistics on the transformed data: the number of triples, the number of <code>qb:Observation</code>s, and the ratio between them.</p>
<table id="transformed-data">
<caption>Transformed data</caption>
<thead><tr><th>Data</th><th>Number of triples</th><th>Number of observations</th><th>Ratio</th></tr></thead>
<tfoot><tr><td colspan="4">Metadata (from <code>graph/meta</code>) includes dataset structures and classifications. Ratio is the rounded ratio of the total number of triples (rounded) to the number of observations (rounded) in the dataset.</td></tr></tfoot>
<tbody>
<tr><th>OECD Dataset</th><td>305 million</td><td>30 million</td><td>10.2:1</td></tr>
<tr><th>OECD Metadata</th><td>1.15 million</td><td>N/A</td><td>N/A</td></tr>
<tr><th>BFS Metadata</th><td>1.5 million</td><td>N/A</td><td>N/A</td></tr>
<tr><th>FAO Dataset</th><td>53 million</td><td>7.2 million</td><td>7.4:1</td></tr>
<tr><th>FAO Metadata</th><td>0.37 million</td><td>N/A</td><td>N/A</td></tr>
<tr><th>ECB Dataset</th><td>469 million</td><td>26 million</td><td>18:1</td></tr>
<tr><th>ECB Metadata</th><td>0.47 million</td><td>N/A</td><td>N/A</td></tr>
<tr><th>IMF Dataset</th><td>36 million</td><td>3.3 million</td><td>10.9:1</td></tr>
<tr><th>IMF Metadata</th><td>0.05 million</td><td>N/A</td><td>N/A</td></tr>
<tr><th>UIS Dataset</th><td>10.4 million</td><td>1.4 million</td><td>7.4:1</td></tr>
<tr><th>UIS Metadata</th><td>0.09 million</td><td>N/A</td><td>N/A</td></tr>
<tr><th>FRB Dataset</th><td>135 million</td><td>9.8 million</td><td>13.8:1</td></tr>
<tr><th>FRB Metadata</th><td>0.04 million</td><td>N/A</td><td>N/A</td></tr>
<tr><th>BIS Dataset</th><td>54 million</td><td>3.6 million</td><td>15:1</td></tr>
<tr><th>BIS Metadata</th><td>0.04 million</td><td>N/A</td><td>N/A</td></tr>
<tr><th>ABS Dataset</th><td>2360 million</td><td>160 million</td><td>14.8:1</td></tr>
<tr><th>ABS Metadata</th><td>6.06 million</td><td>N/A</td><td>N/A</td></tr>
</tbody>
</table>
<p>Table [<a href="#resource-counts">Resource counts</a>] provides further statistics on prominent resources, contrasting the classifications with the datasets.</p>
<table id="resource-counts">
<caption>Resource counts</caption>
<thead><tr><th>Source dataset</th><th><code>skos:ConceptScheme</code></th><th><code>skos:Concept</code></th><th><code>rdf:Property</code></th><th><code>qb:Observation</code></th></tr></thead>
<tfoot><tr><td colspan="5">Count of resources in datasets</td></tr></tfoot>
<tbody>
<tr><td>OECD</td><td>1209</td><td>80918</td><td>129</td><td>30183484</td></tr>
<tr><td>BFS</td><td>216</td><td>120202</td><td>0</td><td>0</td></tr>
<tr><td>FAO</td><td>32</td><td>28115</td><td>12</td><td>7186764</td></tr>
<tr><td>ECB</td><td>147</td><td>55609</td><td>231</td><td>25791005</td></tr>
<tr><td>IMF</td><td>25</td><td>3397</td><td>42</td><td>3603719</td></tr>
<tr><td>UIS</td><td>35</td><td>4515</td><td>12</td><td>1437651</td></tr>
<tr><td>FRB</td><td>76</td><td>2935</td><td>64</td><td>9768292</td></tr>
<tr><td>BIS</td><td>27</td><td>1560</td><td>31</td><td>3606466</td></tr>
<tr><td>ABS</td><td>2830</td><td>548953</td><td>111</td><td>160065287</td></tr>
</tbody>
</table>
</div>
</section>
<section id="interlinking" about="[this:]" rel="schema:hasPart">
<h3 about="[this:#interlinking]" property="schema:name">Interlinking</h3>
<div about="[this:#interlinking]" property="schema:description">
<p>Initial interlinking was done among the classifications in the datasets themselves. The OECD and UIS classifications in particular contained highly similar (in some cases identical) codes throughout their codelists. The majority of the interlinks are between <code>skos:Concept</code>s, with the link relation <code>skos:exactMatch</code>.</p>
<p>Subsequent interlinking was done with <a href="http://dbpedia.org/">DBpedia</a>, <a href="http://worldbank.270a.info/">World Bank Linked Data</a>, <a href="http://transparency.270a.info/">Transparency International Linked Data</a>, and <a href="http://eunis.eea.europa.eu/">EUNIS</a> using the <a href="http://aksw.org/Projects/limes">LInk discovery framework for MEtric Spaces</a> (LIMES): <a href="http://www.dit.unitn.it/~p2p/OM-2011/om2011_Tpaper1.pdf"><cite>A Time-Efficient Hybrid Approach to Link Discovery</cite></a> (Ngonga Ngomo, A.-C., 2011).</p>
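<p>LIMES itself is a Java framework that exploits properties of metric spaces to avoid comparing every pair of resources. As a much simpler stand-in, the Python sketch below compares every pair of labels with <code>difflib.SequenceMatcher</code> and emits <code>skos:exactMatch</code> links above a similarity threshold. The threshold of 0.95 and the example URIs are assumptions for illustration, not the settings used for the links reported here.</p>

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"


def normalize(label: str) -> str:
    """Case-fold and collapse whitespace before comparing labels."""
    return " ".join(label.casefold().split())


def discover_links(source: Dict[str, str], target: Dict[str, str],
                   threshold: float = 0.95) -> List[Tuple[str, str, str]]:
    """Link concepts whose normalized labels are at least `threshold` similar.

    source and target map concept URIs to labels. Every pair is compared,
    which is quadratic; LIMES is designed to avoid most of these comparisons.
    """
    links = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            score = SequenceMatcher(None, normalize(s_label), normalize(t_label)).ratio()
            if score >= threshold:
                links.append((s_uri, SKOS_EXACT_MATCH, t_uri))
    return links
```

<p>Because every pair is compared, the cost grows with the product of the two vocabulary sizes; a fuzzy threshold is also a weaker basis for <code>skos:exactMatch</code> than the identical or near-identical codes described above.</p>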
<table id="interlinks-between-datasets">
<caption>Interlinks between datasets</caption>
<thead><tr><th>Source dataset</th><th>Target dataset</th><th>Link count</th></tr></thead>
<tbody>
<tr><td>OECD</td><td>World Bank</td><td>3487</td></tr>
<tr><td>OECD</td><td>Transparency International</td><td>3335</td></tr>
<tr><td>OECD</td><td>DBpedia</td><td>3613</td></tr>
<tr><td>OECD</td><td>BFS</td><td>3383</td></tr>
<tr><td>OECD</td><td>FAO</td><td>3360</td></tr>
<tr><td>OECD</td><td>ECB</td><td>3495</td></tr>
<tr><td>BFS</td><td>World Bank</td><td>185</td></tr>
<tr><td>BFS</td><td>DBpedia</td><td>261</td></tr>
<tr><td>FAO</td><td>World Bank</td><td>178</td></tr>
<tr><td>FAO</td><td>Transparency International</td><td>167</td></tr>
<tr><td>FAO</td><td>DBpedia</td><td>875</td></tr>
<tr><td>FAO</td><td>EUNIS</td><td>359</td></tr>
<tr><td>FAO</td><td>ECB</td><td>210</td></tr>
<tr><td>ECB</td><td>World Bank</td><td>188</td></tr>
<tr><td>ECB</td><td>Transparency International</td><td>167</td></tr>
<tr><td>ECB</td><td>DBpedia</td><td>239</td></tr>
<tr><td>ECB</td><td>BFS</td><td>221</td></tr>
<tr><td>ECB</td><td>FAO</td><td>210</td></tr>
<tr><td>IMF</td><td>World Bank</td><td>26</td></tr>
<tr><td>IMF</td><td>Transparency International</td><td>23</td></tr>
<tr><td>IMF</td><td>DBpedia</td><td>25</td></tr>
<tr><td>IMF</td><td>BFS</td><td>24</td></tr>
<tr><td>IMF</td><td>FAO</td><td>23</td></tr>
<tr><td>IMF</td><td>ECB</td><td>26</td></tr>
<tr><td>UIS</td><td>World Bank</td><td>964</td></tr>
<tr><td>UIS</td><td>Transparency International</td><td>854</td></tr>
<tr><td>UIS</td><td>DBpedia</td><td>964</td></tr>
<tr><td>UIS</td><td>BFS</td><td>849</td></tr>
<tr><td>UIS</td><td>FAO</td><td>825</td></tr>
<tr><td>UIS</td><td>ECB</td><td>855</td></tr>
<tr><td>UIS</td><td>IMF</td><td>119</td></tr>
<tr><td>UIS</td><td>OECD</td><td>17337</td></tr>
<tr><td>UIS</td><td>FRB</td><td>800</td></tr>
<tr><td>UIS</td><td>Geonames</td><td>959</td></tr>
<tr><td>UIS</td><td>IATI</td><td>964</td></tr>
<tr><td>UIS</td><td>Humanitarian Response</td><td>835</td></tr>
<tr><td>UIS</td><td>Eurostat</td><td>964</td></tr>
<tr><td>FRB</td><td>DBpedia</td><td>280</td></tr>
<tr><td>FRB</td><td>World Bank</td><td>280</td></tr>
<tr><td>FRB</td><td>ECB</td><td>276</td></tr>
<tr><td>FRB</td><td>OECD</td><td>34</td></tr>
<tr><td>FRB</td><td>UIS</td><td>280</td></tr>
<tr><td>BIS</td><td>DBpedia</td><td>534</td></tr>
<tr><td>BIS</td><td>World Bank</td><td>487</td></tr>
<tr><td>BIS</td><td>ECB</td><td>574</td></tr>
<tr><td>BIS</td><td>Transparency International</td><td>406</td></tr>
<tr><td>BIS</td><td>UIS</td><td>2310</td></tr>
<tr><td>BIS</td><td>IMF</td><td>72</td></tr>
<tr><td>BIS</td><td>FRB</td><td>54</td></tr>
<tr><td>BIS</td><td>OECD</td><td>17</td></tr>
<tr><td>BIS</td><td>FAO</td><td>472</td></tr>
<tr><td>BIS</td><td>BFS</td><td>466</td></tr>
<tr><td>BIS</td><td>Geonames</td><td>515</td></tr>
<tr><td>BIS</td><td>Eurostat</td><td>489</td></tr>
<tr><td>BIS</td><td>NASA</td><td>46</td></tr>
<tr><td>ABS</td><td>DBpedia</td><td>99</td></tr>
<tr><td>ABS</td><td>World Bank</td><td>99</td></tr>
<tr><td>ABS</td><td>ECB</td><td>99</td></tr>
<tr><td>ABS</td><td>Transparency International</td><td>99</td></tr>
<tr><td>ABS</td><td>UIS</td><td>495</td></tr>
<tr><td>ABS</td><td>IMF</td><td>99</td></tr>
<tr><td>ABS</td><td>FAO</td><td>99</td></tr>
<tr><td>ABS</td><td>BFS</td><td>99</td></tr>
<tr><td>ABS</td><td>BIS</td><td>198</td></tr>
<tr><td>ABS</td><td>Geonames</td><td>99</td></tr>
<tr><td>ABS</td><td>Eurostat</td><td>99</td></tr>
</tbody>
</table>
<p>Figure [<a href="#linked-sdmx-concept-links">SDMX concept links</a>] gives an overview of the complete connectivity of a concept that's linked internally, externally, and with sdmx-codes where applicable, as well as the interlinking that was done to an external concept.</p>
<figure id="linked-sdmx-concept-links">
<figcaption>SDMX Concept links</figcaption>
<object type="image/svg+xml" data="linked-sdmx-concept-links.svg" width="557" height="175"></object>
</figure>
</div>
</section>
<section id="rdf-data-storage" about="[this:]" rel="schema:hasPart">
<h3 about="[this:#rdf-data-storage]" property="schema:name">RDF Data Storage</h3>
<div about="[this:#rdf-data-storage]" property="schema:description">
<p><a href="http://jena.apache.org/">Apache Jena</a>’s <a href="http://incubator.apache.org/jena/documentation/tdb/">TDB</a> storage system is used to store the RDFized data, which is loaded with TDB’s incremental <code>tdbloader</code> utility. <code>tdbstats</code>, the statistics tool for the <a href="http://jena.apache.org/documentation/tdb/optimizer.html">TDB Optimizer</a>, is run after each complete load to update the resource counts, so that TDB can choose good execution plans for subsequent queries.</p>
<p>The individual datasets from each organization were transformed to N-Triples before being loaded into the store. Each RDF Data Cube dataset was imported into its own <code>NAMED GRAPH</code> in the store. Since loading is significantly faster on an empty database, the N-Triples files were ordered from largest to smallest and loaded in that order.</p>
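<p>The loading order can be sketched in Python as a plan of commands. The graph URIs and file names are hypothetical, and the options assume TDB1's <code>tdbloader --loc=... --graph=...</code> and <code>tdbstats --loc=...</code>; the plan is only built here, not executed.</p>

```python
import os
from typing import Callable, Dict, List


def tdb_load_plan(graphs: Dict[str, str], db_dir: str,
                  size_of: Callable[[str], int] = os.path.getsize) -> List[List[str]]:
    """Commands that load each N-Triples file into its own named graph.

    graphs maps a named graph URI to an N-Triples file. Files are loaded from
    largest to smallest, so the largest load goes into the empty database,
    and tdbstats runs once at the end.
    """
    ordered = sorted(graphs.items(), key=lambda item: size_of(item[1]), reverse=True)
    commands = [["tdbloader", f"--loc={db_dir}", f"--graph={graph}", path] for graph, path in ordered]
    # tdbstats writes the optimizer statistics to standard output.
    commands.append(["tdbstats", f"--loc={db_dir}"])
    return commands
```

<p>According to the TDB Optimizer documentation, the output of <code>tdbstats</code> still has to be saved as <code>stats.opt</code> in the database directory for the optimizer to use it.</p>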
</div>
</section>
</div>
</section>
<section id="publication" about="[this:]" rel="schema:hasPart">
<h2 about="[this:#publication]" property="schema:name">Publication</h2>
<div about="[this:#publication]" property="schema:description">
<p>The publication steps are described in this section.</p>
<section id="dataset-discovery-and-statistics" about="[this:]" rel="schema:hasPart">
<h3 about="[this:#dataset-discovery-and-statistics]" property="schema:name">Dataset Discovery and Statistics</h3>
<div about="[this:#dataset-discovery-and-statistics]" property="schema:description">
<p>Each dataspace provides a <a href="http://www.w3.org/TR/void/">Vocabulary of Interlinked Datasets</a> (<dfn><abbr title="Vocabulary of Interlinked Datasets">VoID</abbr></dfn>) file at its <code>.well-known/void</code> location, since a VoID file is intended to give an overview of a dataset's metadata, i.e., what it contains and how to access or query it. Each of the <a href="http://oecd.270a.info/.well-known/void">OECD</a>, <a href="http://bfs.270a.info/.well-known/void">BFS</a>, <a href="http://fao.270a.info/.well-known/void">FAO</a>, <a href="http://ecb.270a.info/.well-known/void">ECB</a>, <a href="http://imf.270a.info/.well-known/void">IMF</a>, <a href="http://uis.270a.info/.well-known/void">UIS</a>, <a href="http://frb.270a.info/.well-known/void">FRB</a>, <a href="http://bis.270a.info/.well-known/void">BIS</a>, and <a href="http://abs.270a.info/.well-known/void">ABS</a> VoID files contains the locations of the RDF data dumps, the named graphs used in the SPARQL endpoint, the vocabularies used, the size of the datasets, interlinks to external datasets, and the provenance data gathered during the retrieval and transformation process. The VoID files were generated automatically by first importing the LODStats output into the respective <code>graph/void</code> in the store, and then running a SPARQL <code>CONSTRUCT</code> query that includes all of those triples, as well as additional ones derived from the information available in all graphs.</p>
<p>Dataset statistics are generated using LODStats, <a href="http://svn.aksw.org/papers/2011/RDFStats/public.pdf"><cite>LODStats – An Extensible Framework for High-performance Dataset Analytics</cite></a> (Demter, J., 2012), and are included in the VoID files.</p>
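<p>As an illustration of what such a description contains, the Python sketch below emits a minimal VoID description as N-Triples. It is not the LODStats and <code>CONSTRUCT</code> pipeline described above; the dataset URI, dump URL, and triple count in the test are placeholders, and only a few VoID terms (<code>void:Dataset</code>, <code>void:sparqlEndpoint</code>, <code>void:dataDump</code>, <code>void:triples</code>) are used.</p>

```python
from typing import List

VOID = "http://rdfs.org/ns/void#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def iri(value: str) -> str:
    """N-Triples form of an IRI."""
    return f"<{value}>"


def well_known_void(base: str) -> str:
    """The .well-known/void location for a dataspace."""
    return base.rstrip("/") + "/.well-known/void"


def void_description(dataset: str, sparql_endpoint: str, dumps: List[str], triples: int) -> List[str]:
    """N-Triples lines for a minimal VoID description of one dataset."""
    s = iri(dataset)
    lines = [
        f"{s} {iri(RDF_TYPE)} {iri(VOID + 'Dataset')} .",
        f"{s} {iri(VOID + 'sparqlEndpoint')} {iri(sparql_endpoint)} .",
    ]
    lines += [f"{s} {iri(VOID + 'dataDump')} {iri(dump)} ." for dump in dumps]
    # The triple count is a typed literal.
    lines.append(f'{s} {iri(VOID + "triples")} "{triples}"^^{iri(XSD_INTEGER)} .')
    return lines
```

<p>For example, <code>well_known_void("http://bfs.270a.info/")</code> returns <code>http://bfs.270a.info/.well-known/void</code>, the location linked above for BFS.</p>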
</div>
</section>
<section id="user-interface" about="[this:]" rel="schema:hasPart">
<h3 about="[this:#user-interface]" property="schema:name">User-interface</h3>
<div about="[this:#user-interface]" property="schema:description">
<p>The HTML pages are generated by the <a href="https://github.com/csarven/linked-data-pages">Linked Data Pages</a> framework, with <a href="http://code.google.com/p/moriarty/">Moriarty</a>, <a href="http://code.google.com/p/paget/">Paget</a>, and <a href="https://github.com/semsol/arc2">ARC2</a> doing the heavy lifting. Given the lessons learned over the years about Linked Data publishing, the options being considered are either to develop Linked Data Pages further (it was originally written in 2010) or to adapt one of the existing frameworks after careful analysis.</p>
</div>
</section>
<section id="sparql-endpoint" about="[this:]" rel="schema:hasPart">
<h3 about="[this:#sparql-endpoint]" property="schema:name">SPARQL Endpoint</h3>
<div about="[this:#sparql-endpoint]" property="schema:description">
<p>Apache Jena <a href="http://incubator.apache.org/jena/documentation/serving_data/index.html">Fuseki</a> is used to run the SPARQL server for all of the datasets. The SPARQL endpoints are publicly accessible and read-only at their respective <code>/sparql</code> and <code>/query</code> locations for <a href="http://oecd.270a.info/sparql">OECD</a>, <a href="http://bfs.270a.info/sparql">BFS</a>, <a href="http://fao.270a.info/sparql">FAO</a>, <a href="http://ecb.270a.info/sparql">ECB</a>, <a href="http://imf.270a.info/sparql">IMF</a>, <a href="http://uis.270a.info/sparql">UIS</a>, <a href="http://frb.270a.info/sparql">FRB</a>, <a href="http://bis.270a.info/sparql">BIS</a>, and <a href="http://abs.270a.info/sparql">ABS</a>. Currently, 12000 MB of memory is allocated for the single Fuseki server running all datasets.</p>
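<p>A minimal Python sketch of querying one of these endpoints with a SPARQL 1.1 Protocol GET request. The query is a hypothetical example that relies on each dataset sitting in its own named graph; since the endpoints may no longer be online, the request is only constructed here.</p>

```python
from urllib.parse import urlencode
from urllib.request import Request

# Example query: a few qb:DataSets and the named graphs they are stored in.
LIST_DATASETS = """PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT ?graph ?dataset
WHERE { GRAPH ?graph { ?dataset a qb:DataSet } }
LIMIT 10"""


def sparql_get_request(endpoint: str, query: str) -> Request:
    """A SPARQL 1.1 Protocol query-via-GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})
```

<p>Sending it with <code>urllib.request.urlopen(sparql_get_request("http://ecb.270a.info/sparql", LIST_DATASETS))</code> and parsing the body with <code>json.load</code> would return results in the standard SPARQL JSON results format.</p>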
</div>
</section>
<section id="data-dumps" about="[this:]" rel="schema:hasPart">
<h3 about="[this:#data-dumps]" property="schema:name">Data Dumps</h3>
<div about="[this:#data-dumps]" property="schema:description">
<p>The data dumps for the datasets are available from their respective <code>/data/</code> directories: <a href="http://oecd.270a.info/data/">OECD</a>, <a href="http://bfs.270a.info/data/">BFS</a>, <a href="http://fao.270a.info/data/">FAO</a>, <a href="http://ecb.270a.info/data/">ECB</a>, <a href="http://imf.270a.info/data/">IMF</a>, <a href="http://frb.270a.info/data/">FRB</a>, <a href="http://uis.270a.info/data/">UIS</a>, <a href="http://bis.270a.info/data/">BIS</a>, and <a href="http://abs.270a.info/data/">ABS</a>. They are also listed in the VoID files. The <a href="http://datahub.io/">Data Hub</a> entries (<a href="#dataset-announcement">see below</a>) also contain links to the dumps.</p>
</div>
</section>
<section id="source-code" about="[this:]" rel="schema:hasPart">
<h3 about="[this:#source-code]" property="schema:name">Source Code</h3>
<div about="[this:#source-code]" property="schema:description">
<p>The code for the transformations is at <a href="https://github.com/csarven/linked-sdmx">csarven/linked-sdmx</a>. The code for retrieving the data and loading it into the RDF store is at <a href="https://github.com/csarven/oecd-linked-data">csarven/oecd-linked-data</a> for OECD, <a href="https://github.com/csarven/bfs-linked-data">csarven/bfs-linked-data</a> for BFS, <a href="https://github.com/csarven/fao-linked-data">csarven/fao-linked-data</a> for FAO, <a href="https://github.com/csarven/ecb-linked-data">csarven/ecb-linked-data</a> for ECB, <a href="https://github.com/csarven/imf-linked-data">csarven/imf-linked-data</a> for IMF, <a href="https://github.com/csarven/uis-linked-data">csarven/uis-linked-data</a> for UIS, <a href="https://github.com/csarven/frb-linked-data">csarven/frb-linked-data</a> for FRB, and <a href="https://github.com/csarven/abs-linked-data">csarven/abs-linked-data</a> for ABS. All of the code is released under the <a href="http://www.apache.org/licenses/LICENSE-2.0.html">Apache License 2.0</a>.</p>
</div>
</section>
<section id="data-license" about="[this:]" rel="schema:hasPart">
<h3 about="[this:#data-license]" property="schema:name">Data License</h3>
<div about="[this:#data-license]" property="schema:description">
<p>All published Linked Data adheres to the original data publishers’ data licenses and terms of use, and attributions are given on the websites. The <em>Linked Data</em> version of the data is licensed under <a href="http://creativecommons.org/publicdomain/zero/1.0/">CC0 1.0 Universal (CC0 1.0) Public Domain Dedication</a>.</p>
</div>
</section>
<section id="dataset-announcement" about="[this:]" rel="schema:hasPart">
<h3 about="[this:#dataset-announcement]" property="schema:name">Announcing the Datasets</h3>
<div about="[this:#dataset-announcement]" property="schema:description">
<p>To make these datasets discoverable by other means, they are announced on mailing lists and status update services, and registered at the Data Hub: OECD at <a href="http://datahub.io/dataset/oecd-linked-data">oecd-linked-data</a>, BFS at <a href="http://datahub.io/dataset/bfs-linked-data">bfs-linked-data</a>, FAO at <a href="http://datahub.io/dataset/fao-linked-data">fao-linked-data</a>, ECB at <a href="http://datahub.io/dataset/ecb-linked-data">ecb-linked-data</a>, IMF at <a href="http://datahub.io/dataset/imf-linked-data">imf-linked-data</a>, UIS at <a href="http://datahub.io/dataset/uis-linked-data">uis-linked-data</a>, FRB at <a href="http://datahub.io/dataset/frb-linked-data">frb-linked-data</a>, BIS at <a href="http://datahub.io/dataset/bis-linked-data">bis-linked-data</a>, and ABS at <a href="http://datahub.io/dataset/abs-linked-data">abs-linked-data</a>.</p>
</div>
</section>
</div>
</section>
<section id="conclusions" about="[this:]" rel="schema:hasPart">
<h2 about="[this:#conclusions]" property="schema:name">Conclusions</h2>
<div about="[this:#conclusions]" property="schema:description" typeof="deo:Conclusion">
<p>With this work we provided an automated approach for transforming statistical SDMX-ML data to Linked Data in a single step. As a result, this effort helps to publish and consume large amounts of quality statistical Linked Data. Its goal is to shift the focus from mundane development efforts to automating the generation of quality statistical data. Moreover, it makes it easier to provide RDF serializations alongside the existing formats used by high-profile statistical data owners. Our approach of employing XSLT transformations does not require changes to the well-established workflows at the statistical agencies.</p>
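<p>The single-step mapping summarized above is implemented in XSLT. Purely to illustrate the idea, the Python sketch below uses the standard library's <code>xml.etree.ElementTree</code> instead of XSLT to turn each <code>generic:Obs</code> of an SDMX-ML 2.0 generic fragment into an RDF Data Cube <code>qb:Observation</code> in Turtle. The base IRIs, the property naming and the observation URI pattern are hypothetical and do not reproduce those of the published dataspaces; codes are emitted as plain literals rather than links to code list concepts, and observation values are assumed to be numeric.</p>

```python
import xml.etree.ElementTree as ET

# Namespace of SDMX-ML 2.0 generic data messages.
GENERIC = "{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/generic}"

# Hypothetical base IRIs; not the URI patterns of the published dataspaces.
DATASET = "http://example.org/dataset/EXAMPLE"
PROPERTY = "http://example.org/property/"


def observations_to_turtle(xml_text: str) -> str:
    """Emit one qb:Observation per generic:Obs, keyed by its series dimensions."""
    root = ET.fromstring(xml_text)
    lines = [
        "@prefix qb: <http://purl.org/linked-data/cube#> .",
        "@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .",
        "",
    ]
    for series in root.iter(GENERIC + "Series"):
        # Series key values (e.g. FREQ=A, REF_AREA=CH) in document order.
        key = [(v.get("concept"), v.get("value"))
               for v in series.find(GENERIC + "SeriesKey").iter(GENERIC + "Value")]
        for obs in series.iter(GENERIC + "Obs"):
            time = obs.findtext(GENERIC + "Time")
            # Assumes a numeric value, written as a bare Turtle number.
            value = obs.find(GENERIC + "ObsValue").get("value")
            # Observation IRI made from the key values plus the time period.
            path = "/".join([code for _, code in key] + [time])
            lines.append(f"<{DATASET}/{path}> a qb:Observation ;")
            lines.append(f"    qb:dataSet <{DATASET}> ;")
            for concept, code in key:
                lines.append(f'    <{PROPERTY}{concept}> "{code}" ;')
            lines.append(f'    <{PROPERTY}TIME_PERIOD> "{time}" ;')
            lines.append(f"    sdmx-measure:obsValue {value} .")
    return "\n".join(lines)


# Simplified fragment: a real message wraps the DataSet in a message:GenericData root.
SAMPLE = """<generic:DataSet xmlns:generic="http://www.SDMX.org/resources/SDMXML/schemas/v2_0/generic">
  <generic:Series>
    <generic:SeriesKey>
      <generic:Value concept="FREQ" value="A"/>
      <generic:Value concept="REF_AREA" value="CH"/>
    </generic:SeriesKey>
    <generic:Obs>
      <generic:Time>2010</generic:Time>
      <generic:ObsValue value="1.5"/>
    </generic:Obs>
  </generic:Series>
</generic:DataSet>"""

if __name__ == "__main__":
    print(observations_to_turtle(SAMPLE))
```

<p>Because every observation IRI is derived only from the series key and the time period, the mapping needs no state beyond the current series, which is what allows it to run as a single streaming pass over the SDMX-ML input.</p>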
<p>One aspect of future work is to improve both the quality and the quantity of the SDMX-ML to RDF transformation. Regarding quality, we aim to test our transformation with further datasets to identify shortcomings and special cases that the implementation does not yet cover. We also plan to develop a coherent approach for (semi-)automatically interlinking different statistical dataspaces, establishing links at all possible levels (e.g. classifications, observations). Regarding quantity, we plan to publish statistical dataspaces for the Bank for International Settlements (BIS), the World Bank and Eurostat based on SDMX-ML data.</p>
<p>The current transformation is mostly based on the generic SDMX format. Since some publishers make their data available in the compact SDMX format, the transformation toolkit has to be extended to support it. Alternatively, the compact format can first be transformed to the generic format (for which tools exist), and the Linked SDMX transformations can then be applied. Ultimately, we hope that Linked Data publishing will become a direct part of the original data owners’ workflows and data publishing efforts. Further collaboration with them will therefore expedite the provision of uniform access to statistical Linked Data.</p>
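<p>To illustrate the alternative route of converting compact data to the generic format before applying the transformations, the following Python sketch rewrites compact <code>Series</code>/<code>Obs</code> elements into the generic <code>SeriesKey</code>/<code>Obs</code> structure of SDMX-ML 2.0. It is not one of the existing tools referred to above. The list of dimension IDs, which would normally come from the data structure definition (DSD), is passed in by hand; <code>TIME_PERIOD</code> and <code>OBS_VALUE</code> are assumed attribute names; and any other series or observation attributes are ignored rather than mapped to <code>generic:Attributes</code>.</p>

```python
import xml.etree.ElementTree as ET

GENERIC_NS = "http://www.SDMX.org/resources/SDMXML/schemas/v2_0/generic"
G = "{" + GENERIC_NS + "}"
# Serialize generic elements with the conventional "generic:" prefix.
ET.register_namespace("generic", GENERIC_NS)


def local_name(tag: str) -> str:
    """Strip the {namespace} part that ElementTree puts in front of tag names."""
    return tag.split("}")[-1]


def compact_to_generic(compact_xml: str, dimensions: list[str],
                       time: str = "TIME_PERIOD",
                       measure: str = "OBS_VALUE") -> ET.Element:
    """Rewrite compact Series/Obs elements into a generic:DataSet element."""
    data_set = ET.Element(G + "DataSet")
    for series in ET.fromstring(compact_xml).iter():
        if local_name(series.tag) != "Series":
            continue
        g_series = ET.SubElement(data_set, G + "Series")
        key = ET.SubElement(g_series, G + "SeriesKey")
        # Dimension values are XML attributes in compact data; the caller
        # supplies their order. Attributes not listed (e.g. UNIT) are ignored.
        for concept in dimensions:
            code = series.get(concept)
            if code is None:
                raise ValueError(f"series lacks dimension {concept}")
            ET.SubElement(key, G + "Value", concept=concept, value=code)
        for obs in series:
            if local_name(obs.tag) != "Obs":
                continue
            g_obs = ET.SubElement(g_series, G + "Obs")
            ET.SubElement(g_obs, G + "Time").text = obs.get(time)
            ET.SubElement(g_obs, G + "ObsValue", value=obs.get(measure))
    return data_set


# Hypothetical compact data; the namespace stands in for a DSD-specific one.
SAMPLE = """<DataSet xmlns="urn:example:compact">
  <Series FREQ="A" REF_AREA="CH" UNIT="PC">
    <Obs TIME_PERIOD="2010" OBS_VALUE="1.5"/>
    <Obs TIME_PERIOD="2011" OBS_VALUE="1.7"/>
  </Series>
</DataSet>"""

if __name__ == "__main__":
    generic = compact_to_generic(SAMPLE, ["FREQ", "REF_AREA"])
    print(ET.tostring(generic, encoding="unicode"))
```

<p>The dimension order is the main piece of information the compact format leaves implicit, which is why a real converter needs the structure definition alongside the data.</p>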
</div>
</section>
<section id="acknowledgements" about="[this:]" rel="schema:hasPart">
<h2 about="[this:#acknowledgements]" property="schema:name">Acknowledgements</h2>
<div about="[this:#acknowledgements]" property="schema:description">
<p>We thank <a href="http://richard.cyganiak.de/">Richard Cyganiak</a> for his ongoing support and for graciously offering to host the dataspaces on a server at the <a href="http://deri.ie/">Digital Enterprise Research Institute</a>. We also acknowledge the support of <a href="http://bfh.ch/">Bern University of Applied Sciences</a> for partially funding the transformation effort for the pilot Swiss Statistics Linked Data project, and thank the <a href="http://www.bfs.admin.ch/">Swiss Federal Statistical Office</a> for the excellent collaboration from the very beginning.</p>
</div>
</section>
</div>
</article>
</body>
</html>