FAqT Brick
A FAqT brick contains the materials created when using FAqT Services to analyze a set of datasets. A FAqT brick starts as a directory structure whose contents are then loaded into a SPARQL endpoint. Each time an analysis is performed, a new slice is added to the brick for the current time frame, or epoch. The three dimensions of a FAqT brick are dataset, evaluation service, and epoch, as illustrated below.
You can choose any location for a FAqT brick directory, and you can have many FAqT bricks for different purposes. A FAqT brick's root directory must be named faqt-brick. The core services follow directory conventions rooted on this name. For example, we can create a FAqT brick directory with the following commands:
mkdir ~/lebo/Desktop/faqt-brick
cd ~/lebo/Desktop/faqt-brick
datafaqs-evaluate.sh --help
datafaqs-evaluate.sh is available after Installing DataFAQs and prints usage similar to the following:
usage: datafaqs-evaluate.sh [-n] [--force-epoch | --reuse-epoch <existing-epoch>]
[--faqts <rdf-file> <service-uri>]
[--datasets <rdf-file> <service-uri>]
-n: perform dry run (not implemented yet).
--faqts: override the service-uri and its input (to evaluate with a different set of FAqT evaluation
--datasets: override the service-uri and its input (to evaluate a different set of datasets).
--force-epoch: force new epoch; ignore 'once per day' convention.
--reuse-epoch: reapply FAqT evaluation services to datasets in existing epoch. Takes precedence over --force-epoch.
Running datafaqs-evaluate.sh will create a FAqT brick slice using a default configuration. Its output reports:
- the name of the epoch it is going to create (e.g. 2012-01-13), then
- the [DataFAQs Core Service](DataFAQs Core Services) (e.g. via-sparql-query) that it will use to obtain a list of FAqT services to apply, then
- the DataFAQs Core Service (e.g. by-ckan-group) that it will use to obtain a list of datasets to evaluate, and finally
- the DataFAQs Core Service (e.g. with-preferred-uri-and-ckan-meta-void) to use to obtain descriptions for each dataset.
mkdir ~/lebo/Desktop/faqt-brick
cd ~/lebo/Desktop/faqt-brick
datafaqs-evaluate.sh
[INFO] Using datafaqs.localhost/epochs/2012-01-13
[INFO] Requesting FAqT services from
http://sparql.tw.rpi.edu/services/datafaqs/core/select-faqts/via-sparql-query
[INFO] Requesting datasets from
http://sparql.tw.rpi.edu/services/datafaqs/core/select-datasets/by-ckan-group
[INFO] Requesting dataset descriptions from
http://sparql.tw.rpi.edu/services/datafaqs/core/augment-datasets/with-preferred-uri-and-ckan-meta-void
After datafaqs-evaluate.sh lists the FAqT Services and dataset URIs, it gathers RDF descriptions of the datasets. It shows the URIs that it requests to accumulate descriptions about each dataset, along with the first line of each response.
[INFO] 5 FAqT services will evaluate 3 datasets.
[INFO] FAqT Services:
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/lodcloud/max-1-topic-tag
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/predicate-counter
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/redirect-loop
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-properties
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples
[INFO] CKAN Datasets:
[INFO] http://thedatahub.org/dataset/congresspeople
[INFO] http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states
[INFO] http://thedatahub.org/dataset/white-house-visitor-access-records
[INFO] Gathering information about CKAN Datasets, for input to FAqT evaluation services.
thedatahub.org/dataset/congresspeople (1/3)
<?xml version="1.0" encoding="utf-8"?>
1: http://logd.tw.rpi.edu/source/contactingthecongress/dataset/directory-for-the-112th-congress
<?xml version="1.0" encoding="utf-8" ?>
thedatahub.org/dataset/farmers-markets-geographic-data-united-states (2/3)
<?xml version="1.0" encoding="utf-8"?>
1: http://logd.tw.rpi.edu/source/data-gov/dataset/4383/version/2011-Nov-29
<?xml version="1.0" encoding="utf-8" ?>
2: http://logd.tw.rpi.edu/source/data-gov/file/4383/version/2011-Nov-29/conversion/data-gov-4383-2011-Nov-29.void.ttl
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
thedatahub.org/dataset/white-house-visitor-access-records (3/3)
<?xml version="1.0" encoding="utf-8"?>
The accumulated dataset description responses are then submitted to each FAqT service, so that the services have some basic information to start with when performing their evaluations. The RDF that each FAqT service returns is stored, and its size and format are reported by datafaqs-evaluate.sh.
[INFO] Submitting CKAN dataset information to FAqT evaluation services.
[INFO] dataset 1/3, FAqT 1/5 (1/15 total)
[INFO] http://thedatahub.org/dataset/congresspeople
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/predicate-counter
[INFO] 32K of results
[INFO] dataset 1/3, FAqT 2/5 (2/15 total)
[INFO] http://thedatahub.org/dataset/congresspeople
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-properties
[INFO] 32K of results
[INFO] dataset 1/3, FAqT 3/5 (3/15 total)
[INFO] http://thedatahub.org/dataset/congresspeople
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/lodcloud/max-1-topic-tag
[INFO] 4.0K of text/turtle results
[INFO] dataset 1/3, FAqT 4/5 (4/15 total)
[INFO] http://thedatahub.org/dataset/congresspeople
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/redirect-loop
[INFO] 4.0K of text/turtle results
[INFO] dataset 1/3, FAqT 5/5 (5/15 total)
[INFO] http://thedatahub.org/dataset/congresspeople
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples
[INFO] 4.0K of text/turtle results
[INFO] dataset 2/3, FAqT 1/5 (6/15 total)
[INFO] http://thedatahub.org/dataset/white-house-visitor-access-records
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/redirect-loop
[INFO] 4.0K of results
[INFO] dataset 2/3, FAqT 2/5 (7/15 total)
[INFO] http://thedatahub.org/dataset/white-house-visitor-access-records
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/lodcloud/max-1-topic-tag
[INFO] 4.0K of results
[INFO] dataset 2/3, FAqT 3/5 (8/15 total)
[INFO] http://thedatahub.org/dataset/white-house-visitor-access-records
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/predicate-counter
[INFO] 4.0K of results
[INFO] dataset 2/3, FAqT 4/5 (9/15 total)
[INFO] http://thedatahub.org/dataset/white-house-visitor-access-records
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples
[INFO] 4.0K of results
[INFO] dataset 2/3, FAqT 5/5 (10/15 total)
[INFO] http://thedatahub.org/dataset/white-house-visitor-access-records
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-properties
[INFO] 4.0K of results
[INFO] dataset 3/3, FAqT 1/5 (11/15 total)
[INFO] http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-properties
[INFO] 15M of results
[INFO] dataset 3/3, FAqT 2/5 (12/15 total)
[INFO] http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/lodcloud/max-1-topic-tag
[INFO] 15M of results
[INFO] dataset 3/3, FAqT 3/5 (13/15 total)
[INFO] http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/predicate-counter
[INFO] 15M of results
[INFO] dataset 3/3, FAqT 4/5 (14/15 total)
[INFO] http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/redirect-loop
[INFO] 15M of results
[INFO] dataset 3/3, FAqT 5/5 (15/15 total)
[INFO] http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples
[INFO] 15M of results
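The counters in the log above follow from iterating over every (dataset, FAqT service) pair: 3 datasets times 5 services gives 15 submissions. A minimal sketch of that arithmetic, assuming a hypothetical helper named progress_labels (it is not part of DataFAQs) and an outer loop over datasets:

```shell
# Hypothetical helper (not part of DataFAQs): print the progress label for
# every (dataset, FAqT service) pair, datasets in the outer loop.
progress_labels() (
  datasets=$1
  faqts=$2
  total=$((datasets * faqts))
  d=1
  while [ "$d" -le "$datasets" ]; do
    f=1
    while [ "$f" -le "$faqts" ]; do
      # Running count across all pairs processed so far.
      n=$(( (d - 1) * faqts + f ))
      echo "dataset $d/$datasets, FAqT $f/$faqts ($n/$total total)"
      f=$((f + 1))
    done
    d=$((d + 1))
  done
)

progress_labels 3 5
```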
The following illustrates the process of
- (1) obtaining a dataset list from CKAN,
- (2) obtaining a list of FAqT evaluation services from the SADI registry,
- (3) obtaining descriptions of the dataset via URI dereference and VoID files,
- (4) obtaining (via GET) a description of the FAqT evaluation service, and
- (5) POSTing the dataset description to each FAqT evaluation service to obtain an evaluation described in RDF.
This process is done for each dataset and FAqT evaluation service to create a single slice of the FAqT brick.
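Steps (4) and (5) amount to one GET and one POST per FAqT evaluation service. The sketch below is only an illustration, not how datafaqs-evaluate.sh is implemented: the function name evaluate_one, the CURL override variable, and the use of text/turtle for both the request body and the Accept header are all assumptions.

```shell
# Hypothetical sketch (not DataFAQs code): apply one FAqT evaluation service
# to one dataset description.
# Assumptions: the service accepts and returns text/turtle; CURL can be set
# to another command (for example echo) to inspect the calls without network.
evaluate_one() (
  faqt_uri=$1     # URI of the FAqT evaluation service
  post_file=$2    # accumulated dataset description (post.ttl)
  out_dir=$3      # directory that receives the responses
  curl=${CURL:-curl}

  # Step 4: GET the service's RDF description of itself.
  "$curl" -sS -H 'Accept: text/turtle' "$faqt_uri" \
    > "$out_dir/faqt-service.ttl" || return 1

  # Step 5: POST the dataset description to obtain an RDF evaluation.
  "$curl" -sS -H 'Content-Type: text/turtle' -H 'Accept: text/turtle' \
    --data-binary "@$post_file" "$faqt_uri" \
    > "$out_dir/evaluation.ttl"
)

# Example call (requires network access):
#   evaluate_one http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples post.ttl .
```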
When their URI is requested, FAqT evaluation services provide RDF descriptions of themselves. These are stored in a file faqt-service.ttl that is nested by both the faqt and the epoch. For example, the RDF that was returned by requesting the FAqT service http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples during epoch 2012-01-13 is stored at:
faqt-brick/
sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_epoch/
2012-01-13/faqt-service.ttl
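The nesting by FAqT service and then epoch can be sketched as a small shell function. The name faqt_service_path is hypothetical, and the sketch assumes that the directory is simply the service URI with its scheme removed, which matches the example path above.

```shell
# Hypothetical helper (not part of DataFAQs): where a FAqT service's
# self-description is stored, relative to the faqt-brick root.
# Assumption: the directory is the service URI without its scheme.
faqt_service_path() (
  service_uri=$1
  epoch=$2
  dir=${service_uri#*://}   # strip "http://" or "https://"
  printf '%s/__PIVOT_epoch/%s/faqt-service.ttl\n' "$dir" "$epoch"
)

faqt_service_path http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples 2012-01-13
```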
The accumulated descriptions of the CKAN datasets are stored in a file post.ttl that is nested by both the epoch and the dataset. For example, the RDF that is POSTed to all FAqT services during epoch 2012-01-13 to evaluate dataset http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states is stored at:
faqt-brick/
datafaqs.localhost/epochs/2012-01-13/__PIVOT_dataset/
thedatahub.org/dataset/farmers-markets-geographic-data-united-states/post.ttl
The content of post.ttl is the union of the files:
faqt-brick/
datafaqs.localhost/epochs/2012-01-13/__PIVOT_dataset/
thedatahub.org/dataset/farmers-markets-geographic-data-united-states/part-*.{ttl,rdf,nt}
part-0.{ttl,rdf,nt} is the result of dereferencing the dataset's URI, while the remaining part-* files come from other resources, such as the VoID file or the dataset's con:preferredURIs (as provided by an augment-dataset service; see DataFAQs Core Services).
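To see which files contribute to a given post.ttl, the dataset directory and its part files can be enumerated as sketched below. The function names dataset_dir and list_parts are hypothetical. The sketch only lists the inputs: actually merging Turtle, RDF/XML, and N-Triples files requires an RDF-aware tool, which is not shown here.

```shell
# Hypothetical helpers (not part of DataFAQs).
# Assumption: the dataset directory is the dataset URI without its scheme,
# nested under the epoch, as in the example paths above.
dataset_dir() (
  epoch=$1
  dataset_uri=$2
  printf 'datafaqs.localhost/epochs/%s/__PIVOT_dataset/%s\n' \
    "$epoch" "${dataset_uri#*://}"
)

# List the part-* files whose union makes up post.ttl.
# Files are grouped by extension, not by part number.
list_parts() (
  dir=$1
  for f in "$dir"/part-*.ttl "$dir"/part-*.rdf "$dir"/part-*.nt; do
    # An unmatched glob stays literal, so check that the file exists.
    if [ -e "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
)

list_parts "$(dataset_dir 2012-01-13 http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states)"
```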
When the RDF description of a dataset is POSTed to a FAqT evaluation service, the service returns an RDF evaluation of the dataset. The response from the FAqT evaluation service is stored in a file evaluation.ttl that is nested by the faqt, dataset, and epoch. For example, the RDF returned by the FAqT service http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples during epoch 2012-01-13 when evaluating dataset http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states is stored at:
faqt-brick/
sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_dataset/
thedatahub.org/dataset/farmers-markets-geographic-data-united-states/__PIVOT_epoch/
2012-01-13/evaluation.ttl
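The three-level nesting (FAqT service, then dataset, then epoch) can be sketched the same way. The function name evaluation_path is hypothetical, and the sketch again assumes that each directory component is the corresponding URI without its scheme.

```shell
# Hypothetical helper (not part of DataFAQs): where the evaluation of one
# dataset by one FAqT service in one epoch is stored, relative to the
# faqt-brick root.
evaluation_path() (
  faqt_uri=$1
  dataset_uri=$2
  epoch=$3
  printf '%s/__PIVOT_dataset/%s/__PIVOT_epoch/%s/evaluation.ttl\n' \
    "${faqt_uri#*://}" "${dataset_uri#*://}" "$epoch"
)

evaluation_path \
  http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples \
  http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states \
  2012-01-13
```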
datafaqs-evaluate.sh assumes that you want at most one epoch per day. If you need another epoch on the same day, use --force-epoch:
bash-3.2$ datafaqs-evaluate.sh
An evaluation epoch has already been initiated today (2012-01-13).
Start one tomorrow, use --force-epoch to create another one today, or use --help.
bash-3.2$ datafaqs-evaluate.sh --force-epoch
[INFO] Using datafaqs.localhost/epochs/2012-01-13_17_49_46
[INFO] Requesting FAqT services from http://sparql.tw.rpi.edu/services/datafaqs/core/select-faqts/via-sparql-query
...
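The naming seen in this session (a plain date for the first epoch of the day, a date plus time for a forced one) can be sketched as follows. The function name epoch_name is hypothetical, and the sketch assumes that an epoch exists exactly when a directory of that name is present under datafaqs.localhost/epochs; the real script may decide differently.

```shell
# Hypothetical helper (not part of DataFAQs): choose the next epoch name.
# Assumption: an epoch exists if <epochs_dir>/<name> is present.
epoch_name() (
  epochs_dir=$1
  force=${2:-}
  now=$(date +%Y-%m-%d_%H_%M_%S)   # e.g. 2012-01-13_17_49_46
  today=${now%%_*}                 # e.g. 2012-01-13
  if [ ! -e "$epochs_dir/$today" ]; then
    echo "$today"
  elif [ "$force" = "--force-epoch" ]; then
    echo "$now"
  else
    echo "An evaluation epoch has already been initiated today ($today)." >&2
    return 1
  fi
)

epoch_name datafaqs.localhost/epochs --force-epoch
```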
If you want to get rid of an epoch, first remove the epoch-specific materials from datafaqs.localhost/epochs, then use datafaqs-purge-unlisted-epochs.sh to take care of the rest:
bash-3.2$ rm -rf datafaqs.localhost/epochs/2012-01-13_17_49_46/
bash-3.2$ datafaqs-purge-unlisted-epochs.sh
usage: datafaqs-purge-unlisted-epochs.sh <-n | -w>
-n: perform dry run; do not modify anything.
-w: remove all epochs that are not listed in datafaqs.localhost/epochs/
bash-3.2$ datafaqs-purge-unlisted-epochs.sh -w
[INFO] Removing 2012-01-13_17_49_46
[INFO] Removing 2012-01-13_17_49_46
[INFO] Removing 2012-01-13_17_49_46
[INFO] Removing 2012-01-13_17_49_46
[INFO] Removing 2012-01-13_17_49_46
...
datafaqs-purge-unlisted-epochs.sh walks the rest of the FAqT brick and removes all materials created during epochs that are not listed in datafaqs.localhost/epochs. The example usage above removes the forced epoch that was created in the --force-epoch example earlier.
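The idea behind the purge can be sketched as a read-only listing, similar in spirit to the -n dry run. This is not the script's actual implementation: the function name unlisted_epochs is hypothetical, and the sketch assumes that every epoch-specific directory outside datafaqs.localhost/epochs sits directly under a __PIVOT_epoch directory.

```shell
# Hypothetical helper (not part of DataFAQs): list epoch directories in a
# FAqT brick whose epoch is not listed in datafaqs.localhost/epochs.
# It only prints paths and never deletes anything.
unlisted_epochs() (
  brick=${1:-.}
  # Match directories directly below any __PIVOT_epoch and do not descend.
  find "$brick" -type d -path '*/__PIVOT_epoch/*' -prune -print |
  while IFS= read -r dir; do
    epoch=${dir##*/}   # last path component is the epoch name
    if [ ! -d "$brick/datafaqs.localhost/epochs/$epoch" ]; then
      printf '%s\n' "$dir"
    fi
  done
)

# Example call from inside a faqt-brick directory:
#   unlisted_epochs .
```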
The CKAN dataset descriptions that were accumulated in an existing epoch can be reused to reapply the FAqT service evaluations within the same epoch. Because this replaces the results within the designated epoch, this should only be done for the latest epoch. The following usage shows that of the two epochs in the FAqT brick, the dataset listing and descriptions from the later one are reused.
ls datafaqs.localhost/epochs/
2012-01-12 2012-01-13
datafaqs-evaluate.sh --reuse-epoch datafaqs:latest
[INFO] Using datafaqs.localhost/epochs/2012-01-13 (datafaqs:latest)
[INFO] Requesting FAqT services from http://sparql.tw.rpi.edu/services/datafaqs/core/select-faqts/via-sparql-query
[INFO] Reusing dataset listing and descriptions from datafaqs.localhost/epochs/2012-01-13
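One way to picture how datafaqs:latest could be resolved is to take the last epoch name in sorted order, which works because epoch names begin with an ISO date. This is an assumption about the behavior, not the script's documented logic, and the function name latest_epoch is hypothetical.

```shell
# Hypothetical helper (not part of DataFAQs): pick the most recent epoch.
# Assumption: epoch names sort chronologically, because they start with
# YYYY-MM-DD and forced epochs append _HH_MM_SS to that date.
latest_epoch() (
  epochs_dir=$1
  ls "$epochs_dir" | LC_ALL=C sort | tail -n 1
)

# Example call from inside a faqt-brick directory:
#   latest_epoch datafaqs.localhost/epochs
```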
For a given epoch, the following files contain graphs that are interesting for analysis. They need to be named and loaded into a triple store so that they are available to SPARQL queries.
(todo: the rdf config with provo describing core services)
datafaqs.localhost/epochs/2012-01-19/faqt-services.ttl # the evaluation services that were used.
datasets.ttl # the datasets that were evaluated.
dataset-references.ttl # rdfs:seeAlso to more descriptions
The following files contain the FAqT evaluation services' descriptions of themselves:
sparql.tw.rpi.edu/services/datafaqs/faqt/lodcloud/max-1-topic-tag/__PIVOT_epoch/2012-01-19/faqt-service.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/predicate-counter/__PIVOT_epoch/2012-01-19/faqt-service.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/redirect-loop/__PIVOT_epoch/2012-01-19/faqt-service.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/void-properties/__PIVOT_epoch/2012-01-19/faqt-service.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_epoch/2012-01-19/faqt-service.ttl
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
[]
a sd:NamedGraph;
sd:name <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/faqt/1>;
sd:graph [
a prov:Account, sd:Graph, void:Graph;
void:triples 17;
prov:wasAttributedTo <http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples>;
foaf:primaryTopic <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/faqt/1>;
void:dataDump <http://sparql.tw.rpi.edu/datafaqs/dump/__PIVOT_faqt/sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_epoch/2012-01-19/faqt-service.ttl>;
]
.
<http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/faqt/1>
a datafaqs:FAqTService;
prov:specializationOf <http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples>;
dcterms:date "2012-01-19"^^xsd:date;
.
<http://sparql.tw.rpi.edu/datafaqs/dump/__PIVOT_faqt/sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_epoch/2012-01-19/faqt-service.ttl>
formats:media_type <http://www.w3.org/ns/formats/Turtle>;
.
<http://www.w3.org/ns/formats/Turtle>
rdfs:label "Turtle";
dcterms:identifier "text/turtle";
.
The following files contain the dataset descriptions (including the additional references):
datafaqs.localhost/epochs/2012-01-19/__PIVOT_dataset/thedatahub.org/dataset/congresspeople/post.ttl
datafaqs.localhost/epochs/2012-01-19/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/post.ttl
datafaqs.localhost/epochs/2012-01-19/__PIVOT_dataset/thedatahub.org/dataset/white-house-visitor-access-records/post.ttl
[]
a sd:NamedGraph;
sd:name <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/dataset/1>;
sd:graph [
a prov:Account, sd:Graph, void:Graph;
void:triples 14861;
prov:wasDerivedFrom
<http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states>,
<http://logd.tw.rpi.edu/source/data-gov/dataset/4383/version/2011-Nov-29>,
<http://logd.tw.rpi.edu/source/data-gov/file/4383/version/2011-Nov-29/conversion/data-gov-4383-2011-Nov-29.void.ttl>;
foaf:primaryTopic <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/dataset/1>;
void:dataDump <http://sparql.tw.rpi.edu/datafaqs/dump/__PIVOT_faqt/datafaqs.localhost/epochs/2012-01-19/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/post.ttl>;
]
.
<http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/dataset/1>
a void:Dataset;
prov:specializationOf <http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states>;
dcterms:date "2012-01-19"^^xsd:date;
.
<http://sparql.tw.rpi.edu/datafaqs/dump/__PIVOT_faqt/datafaqs.localhost/epochs/2012-01-19/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/post.ttl>
formats:media_type <http://www.w3.org/ns/formats/Turtle>;
.
<http://www.w3.org/ns/formats/Turtle>
rdfs:label "Turtle";
dcterms:identifier "text/turtle";
.
The following files contain the evaluation of each dataset from each evaluation service:
sparql.tw.rpi.edu/services/datafaqs/faqt/lodcloud/max-1-topic-tag/__PIVOT_dataset/thedatahub.org/dataset/congresspeople/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/lodcloud/max-1-topic-tag/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/predicate-counter/__PIVOT_dataset/thedatahub.org/dataset/congresspeople/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/predicate-counter/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/redirect-loop/__PIVOT_dataset/thedatahub.org/dataset/congresspeople/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/redirect-loop/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/void-properties/__PIVOT_dataset/thedatahub.org/dataset/congresspeople/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/void-properties/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_dataset/thedatahub.org/dataset/congresspeople/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/__PIVOT_epoch/2012-01-19/evaluation.ttl
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
@prefix formats: <http://www.w3.org/ns/formats/> .
@prefix prov: <http://www.w3.org/ns/prov-o/> .
@prefix datafaqs: <http://purl.org/twc/vocab/datafaqs#> .
[]
a sd:NamedGraph;
sd:name <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/faqt/1/dataset/1>;
sd:graph [
a prov:Account, sd:Graph, void:Graph, datafaqs:Evaluation;
void:triples 14;
prov:wasAttributedTo <http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples>;
foaf:primaryTopic <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/faqt/1/dataset/1>;
void:dataDump <http://sparql.tw.rpi.edu/datafaqs/dump/__PIVOT_faqt/sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/__PIVOT_epoch/2012-01-19/evaluation.ttl>;
]
.
<http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/faqt/1/dataset/1>
a void:Dataset;
prov:specializationOf <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/faqt/1/dataset/1>;
dcterms:date "2012-01-19"^^xsd:date;
.
<http://sparql.tw.rpi.edu/datafaqs/dump/__PIVOT_faqt/sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/__PIVOT_epoch/2012-01-19/evaluation.ttl>
formats:media_type <http://www.w3.org/ns/formats/Turtle>;
.
<http://www.w3.org/ns/formats/Turtle>
rdfs:label "Turtle";
dcterms:identifier "text/turtle";
.