
Situating a FAqT Brick into csv2rdf4lod automation

Tim L edited this page Jun 3, 2013 · 39 revisions

What is first

The FAqT Brick page describes how to specify and execute an analysis with:

  • A set of evaluation services
  • A set of datasets to evaluate
  • Performed at different epochs over time

Conceptually, this is exactly what we need to achieve our Linked Open metaData goals. But practically, it has been difficult to work with, share, and replicate.

In contrast, the "source-dataset-version" organization fostered by csv2rdf4lod-automation provides approachable, sharable, and replicable structure.

So, how can we combine the best of both worlds? That's what we try to tackle here.

What we will cover

The organization of a FAqT Brick was designed independently of csv2rdf4lod-automation. However, the "SDV organization" principles that csv2rdf4lod-automation fosters are compelling in practical applications, so this page explores how a FAqT Brick can live within a data conversion root. As a concrete example, we'll work out how to use the https://github.com/timrdf/lodcloud project to reproduce http://www.licensius.com/blog/lodlicenses.

Let's get to it

Comparing organizational schemes

First, we need to review the organizing schemes that csv2rdf4lod and DataFAQs use.

csv2rdf4lod organizes datasets by forming a hierarchy out of the following aspects:

  • source
  • dataset
  • version

Using these aspects, we can create the URIs:
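A minimal Python sketch of how the three aspects nest into URIs, assuming csv2rdf4lod's usual `<base>/source/<source>/dataset/<dataset>/version/<version>` pattern. `BASE_URI` and the function names are hypothetical; the identifiers are borrowed from the lodcloud path used later on this page.

```python
# Sketch: csv2rdf4lod-style URIs, assuming the pattern
#   <base>/source/<source>/dataset/<dataset>/version/<version>
# BASE_URI is a hypothetical placeholder, not a real deployment.

BASE_URI = "http://example.org"  # hypothetical base URI


def source_uri(source: str) -> str:
    """URI for a source (who provided the data)."""
    return f"{BASE_URI}/source/{source}"


def dataset_uri(source: str, dataset: str) -> str:
    """URI for an abstract dataset, nested under its source."""
    return f"{source_uri(source)}/dataset/{dataset}"


def version_uri(source: str, dataset: str, version: str) -> str:
    """URI for one concrete version of a dataset, nested under the dataset."""
    return f"{dataset_uri(source, dataset)}/version/{version}"


if __name__ == "__main__":
    for uri in (
        source_uri("us"),
        dataset_uri("us", "how-o-is-lod"),
        version_uri("us", "how-o-is-lod", "faqt-brick"),
    ):
        print(uri)
```

Each URI extends its parent's, which is what makes the source-dataset-version organization strictly hierarchical.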

A FAqT Brick in DataFAQs also uses three aspects (epoch, faqt, and dataset), but they differ from csv2rdf4lod's. Unlike csv2rdf4lod's aspects, they are not strictly hierarchical; the brick arranges them in more than one ordering:

  • epoch - dataset
  • faqt - epoch
  • faqt - dataset - epoch
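To make "not strictly hierarchical" concrete, here is a small Python illustration (not DataFAQs code) that treats each evaluation result as a (faqt, dataset, epoch) cell and groups the same cells under each of the three orderings listed above. The faqt, dataset, and epoch values and the `view` helper are hypothetical.

```python
# Sketch (hypothetical names): the same evaluation results reached through
# the three orderings a FAqT Brick uses, rather than one fixed hierarchy.
from collections import defaultdict

ASPECTS = ("faqt", "dataset", "epoch")

# Each result is one (faqt, dataset, epoch) cell; the values are made up.
results = [
    ("faqt-a", "dataset-1", "2013-06-01"),
    ("faqt-a", "dataset-2", "2013-06-01"),
    ("faqt-b", "dataset-1", "2013-06-01"),
    ("faqt-a", "dataset-1", "2013-06-03"),
]


def view(cells, order):
    """Group cells by the aspects named in `order`, e.g. ("epoch", "dataset").

    Returns a dict mapping each key tuple to the sorted cells under it.
    """
    groups = defaultdict(list)
    for cell in cells:
        values = dict(zip(ASPECTS, cell))
        key = tuple(values[aspect] for aspect in order)
        groups[key].append(cell)
    return {key: sorted(members) for key, members in groups.items()}


if __name__ == "__main__":
    for order in (("epoch", "dataset"), ("faqt", "epoch"), ("faqt", "dataset", "epoch")):
        print(" - ".join(order))
        for key, members in sorted(view(results, order).items()):
            print("   ", "/".join(key), "->", len(members), "result(s)")
```

A single cell such as ("faqt-a", "dataset-1", "2013-06-01") shows up under all three views, whereas a csv2rdf4lod version has exactly one parent dataset and one parent source.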

Using df: and cr: to distinguish terminology scope, the df:epoch aspect is analogous to cr:version, since each run of the FAqT Brick produces a new subset of data.

df:dataset is NOT like cr:dataset. In DataFAQs, df:dataset is a multi-element dimension, while cr:dataset is just the name of the bucket of data being gathered. This is the distinction between metadata and data: DataFAQs produces the former and csv2rdf4lod the latter.

df:faqt does not have an analog in csv2rdf4lod. Like df:dataset, it is a multi-element dimension: the set of evaluation services, each of which provides metadata about each element in the df:dataset dimension.

cr:dataset is analogous to the fixed specification of df:dataset and df:faqt.
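The analogy can be sketched in Python: a fixed specification of faqts and datasets stays the same across runs, as a cr:dataset does, while each epoch produces a fresh grid of evaluations, much as each cr:version holds a new batch of data. `BrickSpec` and the example names are hypothetical illustrations, not DataFAQs classes.

```python
# Sketch (hypothetical names): the fixed faqt x dataset specification plays
# the role of cr:dataset; each epoch yields a new grid of evaluations,
# analogous to a new cr:version.
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class BrickSpec:
    faqts: tuple[str, ...]     # multi-element df:faqt dimension
    datasets: tuple[str, ...]  # multi-element df:dataset dimension

    def cells(self, epoch: str) -> list[tuple[str, str, str]]:
        """Every (faqt, dataset, epoch) evaluation to perform in one epoch."""
        return [(f, d, epoch) for f, d in product(self.faqts, self.datasets)]


spec = BrickSpec(
    faqts=("faqt-a", "faqt-b"),
    datasets=("dataset-1", "dataset-2", "dataset-3"),
)

if __name__ == "__main__":
    for epoch in ("2013-06-01", "2013-06-03"):
        # 2 faqts x 3 datasets = 6 evaluations per epoch
        print(epoch, len(spec.cells(epoch)))
```

The specification never changes between epochs; only the epoch component of each cell does.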

Setting it up - naming

csv2rdf4lod requires the following aspects, in that order:

  • source: us
  • dataset: how-o-is-lod
  • version: faqt-brick

Setting it up - epoch.ttl

Automated creation of a new Versioned Dataset describes conventions for where to place triggers that csv2rdf4lod-automation recognizes in order to automate the reconstruction of a dataset. In DataFAQs, an "epoch.ttl" file sits at the root of the FAqT Brick and specifies which evaluations should be performed. This aligns with choosing "faqt-brick" as the version identifier above, which places the file at https://github.com/timrdf/lodcloud/blob/master/data/source/us/how-o-is-lod/version/faqt-brick/epoch.ttl.
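The following Python sketch builds that path from the source, dataset, and version identifiers and shows how a script could discover such epoch.ttl files, assuming the `source/<source>/<dataset>/version/<version>/` layout visible in the lodcloud path, with the repository's `data` directory as the root. `epoch_path` and `find_epoch_files` are hypothetical helpers, not csv2rdf4lod-automation commands.

```python
# Sketch: locate epoch.ttl files, assuming the layout
#   <root>/source/<source>/<dataset>/version/<version>/epoch.ttl
# epoch_path and find_epoch_files are hypothetical helpers, not part of
# csv2rdf4lod-automation.
from pathlib import Path


def epoch_path(root: Path, source: str, dataset: str, version: str) -> Path:
    """Where a FAqT Brick's epoch.ttl sits for a given source/dataset/version."""
    return root / "source" / source / dataset / "version" / version / "epoch.ttl"


def find_epoch_files(root: Path) -> list[Path]:
    """All epoch.ttl files sitting directly inside some version directory."""
    return sorted(root.glob("source/*/*/version/*/epoch.ttl"))


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "data"
        target = epoch_path(root, "us", "how-o-is-lod", "faqt-brick")
        target.parent.mkdir(parents=True)
        target.write_text("# evaluations to perform would be described here\n")
        for path in find_epoch_files(root):
            # prints data/source/us/how-o-is-lod/version/faqt-brick/epoch.ttl
            print(path.relative_to(root.parent).as_posix())
```

Globbing on the fixed `version/*/epoch.ttl` depth keeps the search aligned with the source-dataset-version layout instead of scanning the whole tree.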
