Situating a FAqT Brick into csv2rdf4lod automation
FAqT Brick talks about how to specify and execute an analysis with:
- A set of evaluation services
- A set of datasets to evaluate
- Performed at different epochs over time
Conceptually, this is exactly what we need to achieve our Linked Open metaData goals. But practically, it has been difficult to work with, share, and replicate.
In contrast, the "source-dataset-version" organization fostered by csv2rdf4lod-automation provides approachable, sharable, and replicable structure.
So, how can we combine the best of both worlds? That's what we try to tackle here.
The organization of a FAqT Brick was designed independently of csv2rdf4lod-automation, but the "SDV organization" principles that csv2rdf4lod-automation fosters are compelling in practical applications, so this page explores how a FAqT Brick can live within a data conversion root. As a concrete example, we'll figure out how to use the https://github.com/timrdf/lodcloud project to reproduce http://www.licensius.com/blog/lodlicenses.
First, we need to review the organizing schemes that csv2rdf4lod and DataFAQs use.
csv2rdf4lod organizes datasets by forming a hierarchy out of the following aspects:
- source
- dataset
- version
Using these aspects, we can create the URIs:
- http://datafaqs.tw.rpi.edu/source/epa-gov (a foaf:Organization)
- http://datafaqs.tw.rpi.edu/source/epa-gov/dataset/air-quality-system (an abstract dataset; union of versions)
- http://datafaqs.tw.rpi.edu/source/epa-gov/dataset/air-quality-system/version/2013-Jan-01 (a concrete dataset of triples created from one retrieval of EPA's data files)
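The way these three URIs nest can be sketched in a few lines of Python. This is only an illustration of the source/dataset/version pattern shown above, not code from csv2rdf4lod-automation; the function names and the `BASE` constant are hypothetical, and the sketch assumes the identifiers are already URI-safe slugs such as `epa-gov`.

```python
from urllib.parse import quote

# Base URI taken from the examples above; treating it as a constant is an assumption.
BASE = "http://datafaqs.tw.rpi.edu"


def source_uri(source, base=BASE):
    """URI for the source (the organization providing the data)."""
    return f"{base}/source/{quote(source, safe='')}"


def dataset_uri(source, dataset, base=BASE):
    """URI for the abstract dataset (the union of all its versions)."""
    return f"{source_uri(source, base)}/dataset/{quote(dataset, safe='')}"


def version_uri(source, dataset, version, base=BASE):
    """URI for one concrete version (one retrieval of the source's files)."""
    return f"{dataset_uri(source, dataset, base)}/version/{quote(version, safe='')}"


if __name__ == "__main__":
    print(source_uri("epa-gov"))
    print(dataset_uri("epa-gov", "air-quality-system"))
    print(version_uri("epa-gov", "air-quality-system", "2013-Jan-01"))
```

Each function builds on the one above it, which mirrors the strict hierarchy: a version URI always extends its dataset URI, which in turn extends its source URI.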
A FAqT Brick in DataFAQs also uses three aspects, but they are different ones, and they are not strictly hierarchical the way csv2rdf4lod's are.
- epoch - dataset
- faqt - epoch
- faqt - dataset - epoch
Using df: and cr: to distinguish terminology scope, the df:epoch aspect is analogous to cr:version, since each time the FAqT Brick is run we have a new subset of data.
df:dataset is NOT like cr:dataset, since df:dataset is a multi-element dimension in DataFAQs while cr:dataset is just the name of the bucket of data that is being gathered (this is the distinction between metadata and data; DataFAQs does the former and csv2rdf4lod does the latter).
df:faqt does not have an analog in csv2rdf4lod. Like df:dataset, it is the multi-element dimension of the evaluation service that provides metadata about each of the elements in the df:dataset dimension.
cr:dataset is analogous to the fixed specification of df:dataset and df:faqt.
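One way to picture these analogies is the following Python sketch. It is an illustration under assumptions, not DataFAQs or csv2rdf4lod code: the `FaqtBrick` class, the `as_sdv` helper, and all identifiers such as `example-org` and `faqt-a` are hypothetical. The fixed pair of dimensions (which faqts, which datasets) plays the role of cr:dataset, and each epoch plays the role of a cr:version.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class FaqtBrick:
    """Fixed specification of a brick: its two multi-element dimensions."""
    name: str        # hypothetical label for this specification
    faqts: tuple     # df:faqt dimension: evaluation services
    datasets: tuple  # df:dataset dimension: datasets being evaluated

    def evaluations(self, epoch):
        """Every (faqt, dataset, epoch) cell produced by running at one epoch."""
        return [(f, d, epoch) for f, d in product(self.faqts, self.datasets)]


def as_sdv(source, brick, epoch):
    """Map a brick run onto csv2rdf4lod's (source, dataset, version) triple.

    The brick's fixed specification stands in for cr:dataset;
    the epoch stands in for cr:version.
    """
    return (source, brick.name, epoch)


if __name__ == "__main__":
    brick = FaqtBrick(
        name="example-brick",
        faqts=("faqt-a", "faqt-b"),
        datasets=("ds-1", "ds-2", "ds-3"),
    )
    for cell in brick.evaluations("2013-Jan-01"):
        print(cell)
    print(as_sdv("example-org", brick, "2013-Jan-01"))
```

Running the brick again at a later epoch yields a new set of cells under the same specification, just as a new retrieval yields a new cr:version under the same cr:dataset.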