FAqT Brick
This page describes how to invoke a new FAqT Brick epoch, i.e. run an analysis by asking a set of evaluation services what they think about some datasets. The result of running an analysis ends up in a triple store and is thus available for query from the FAqT Brick Explorer.
A diagram illustrating the directory conventions, the data flow, and the data element overlaps is available as OmniGraffle or PDF.
A FAqT brick contains the materials created when using FAqT Services to analyze a set of datasets. A FAqT brick starts as a directory structure whose contents are then loaded into a SPARQL endpoint. Each time an analysis is performed, a new slice is added to the brick for the current time frame, or epoch. The three dimensions of a FAqT brick are dataset, evaluation service, and epoch, as illustrated below.
You can choose any location for a FAqT brick directory, and you can have many FAqT bricks for different purposes. A FAqT brick's root directory must be named faqt-brick. The core services follow directory conventions rooted on this name. For example, we can create a FAqT brick directory with the following commands:
mkdir ~/lebo/Desktop/faqt-brick
cd ~/lebo/Desktop/faqt-brick
df-epoch.sh --help
df-epoch.sh is available after Installing DataFAQs and prints usage similar to the following:
usage: df-epoch.sh [-n] [--force-epoch | --reuse-epoch <existing-epoch>]
[--faqts <rdf-file> <service-uri>]
[--datasets <rdf-file> <service-uri>]
-n: perform dry run (not implemented yet).
--faqts: override the service-uri and its input (to evaluate with a different set of FAqT evaluation services).
--datasets: override the service-uri and its input (to evaluate a different set of datasets).
--force-epoch: force new epoch; ignore 'once per day' convention.
--reuse-epoch: reapply FAqT evaluation services to datasets in existing epoch. Takes precedence over --force-epoch.
Running df-epoch.sh will create a FAqT brick slice using a default configuration. Its output reports:
- the name of the epoch it is going to create (e.g. 2012-01-13), then
- the DataFAQs Core Service (e.g. via-sparql-query) that it will use to obtain a list of FAqT services to apply, then
- the DataFAQs Core Service (e.g. by-ckan-group) that it will use to obtain a list of datasets to evaluate, and finally
- the DataFAQs Core Service (e.g. with-preferred-uri-and-ckan-meta-void) to use to obtain descriptions for each dataset.
mkdir ~/lebo/Desktop/faqt-brick
cd ~/lebo/Desktop/faqt-brick
df-epoch.sh
[INFO] Using __PIVOT_epoch/2012-01-13
[INFO] Requesting FAqT services from
http://sparql.tw.rpi.edu/services/datafaqs/core/select-faqts/via-sparql-query
[INFO] Requesting datasets from
http://sparql.tw.rpi.edu/services/datafaqs/core/select-datasets/by-ckan-group
[INFO] Requesting dataset descriptions from
http://sparql.tw.rpi.edu/services/datafaqs/core/augment-datasets/with-preferred-uri-and-ckan-meta-void
After df-epoch.sh lists the FAqT Services and dataset URIs, it gathers RDF descriptions of the datasets. It shows the URIs that it requests to accumulate descriptions about each dataset, along with the first line of each response.
[INFO] 5 FAqT services will evaluate 3 datasets.
[INFO] FAqT Services:
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/lodcloud/max-1-topic-tag
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/predicate-counter
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/redirect-loop
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-properties
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples
[INFO] CKAN Datasets:
[INFO] http://thedatahub.org/dataset/congresspeople
[INFO] http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states
[INFO] http://thedatahub.org/dataset/white-house-visitor-access-records
[INFO] Gathering information about FAqT evaluation services.
sparql.tw.rpi.edu/services/datafaqs/faqt/predicate-counter (1/5)
sparql.tw.rpi.edu/services/datafaqs/faqt/redirect-loop (2/5)
sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples (3/5)
sparql.tw.rpi.edu/services/datafaqs/faqt/void-properties (4/5)
sparql.tw.rpi.edu/services/datafaqs/faqt/lodcloud/max-1-topic-tag (5/5)
[INFO] Gathering information about CKAN Datasets, for input to FAqT evaluation services.
thedatahub.org/dataset/congresspeople (1/3)
<?xml version="1.0" encoding="utf-8"?>
1: http://logd.tw.rpi.edu/source/contactingthecongress/dataset/directory-for-the-112th-congress
<?xml version="1.0" encoding="utf-8" ?>
thedatahub.org/dataset/farmers-markets-geographic-data-united-states (2/3)
<?xml version="1.0" encoding="utf-8"?>
1: http://logd.tw.rpi.edu/source/data-gov/dataset/4383/version/2011-Nov-29
<?xml version="1.0" encoding="utf-8" ?>
2: http://logd.tw.rpi.edu/source/data-gov/file/4383/version/2011-Nov-29/conversion/data-gov-4383-2011-Nov-29.void.ttl
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
thedatahub.org/dataset/white-house-visitor-access-records (3/3)
<?xml version="1.0" encoding="utf-8"?>
The accumulated dataset description responses are then submitted to each FAqT service, so that they have some basic information to start with when performing their evaluation. The RDF that each FAqT service returns is stored, and its size and format are reported by df-epoch.sh.
[INFO] Submitting CKAN dataset information to FAqT evaluation services.
[INFO] dataset 1/3, FAqT 1/5 (1/15 total)
[INFO] http://thedatahub.org/dataset/congresspeople
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/predicate-counter
[INFO] 32K of results
[INFO] dataset 1/3, FAqT 2/5 (2/15 total)
[INFO] http://thedatahub.org/dataset/congresspeople
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-properties
[INFO] 32K of results
[INFO] dataset 1/3, FAqT 3/5 (3/15 total)
[INFO] http://thedatahub.org/dataset/congresspeople
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/lodcloud/max-1-topic-tag
[INFO] 4.0K of text/turtle results
[INFO] dataset 1/3, FAqT 4/5 (4/15 total)
[INFO] http://thedatahub.org/dataset/congresspeople
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/redirect-loop
[INFO] 4.0K of text/turtle results
[INFO] dataset 1/3, FAqT 5/5 (5/15 total)
[INFO] http://thedatahub.org/dataset/congresspeople
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples
[INFO] 4.0K of text/turtle results
[INFO] dataset 2/3, FAqT 1/5 (6/15 total)
[INFO] http://thedatahub.org/dataset/white-house-visitor-access-records
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/redirect-loop
[INFO] 4.0K of results
[INFO] dataset 2/3, FAqT 2/5 (7/15 total)
[INFO] http://thedatahub.org/dataset/white-house-visitor-access-records
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/lodcloud/max-1-topic-tag
[INFO] 4.0K of results
[INFO] dataset 2/3, FAqT 3/5 (8/15 total)
[INFO] http://thedatahub.org/dataset/white-house-visitor-access-records
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/predicate-counter
[INFO] 4.0K of results
[INFO] dataset 2/3, FAqT 4/5 (9/15 total)
[INFO] http://thedatahub.org/dataset/white-house-visitor-access-records
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples
[INFO] 4.0K of results
[INFO] dataset 2/3, FAqT 5/5 (10/15 total)
[INFO] http://thedatahub.org/dataset/white-house-visitor-access-records
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-properties
[INFO] 4.0K of results
[INFO] dataset 3/3, FAqT 1/5 (11/15 total)
[INFO] http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-properties
[INFO] 15M of results
[INFO] dataset 3/3, FAqT 2/5 (12/15 total)
[INFO] http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/lodcloud/max-1-topic-tag
[INFO] 15M of results
[INFO] dataset 3/3, FAqT 3/5 (13/15 total)
[INFO] http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/predicate-counter
[INFO] 15M of results
[INFO] dataset 3/3, FAqT 4/5 (14/15 total)
[INFO] http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/redirect-loop
[INFO] 15M of results
[INFO] dataset 3/3, FAqT 5/5 (15/15 total)
[INFO] http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples
[INFO] 15M of results
The following illustrates the process of
- (1) obtaining a dataset list from CKAN,
- (2) obtaining a list of FAqT evaluation services from the SADI registry,
- (3) obtaining descriptions of the dataset via URI dereference and VoID files,
- (4) obtaining (via GET) a description of the FAqT evaluation service, and
- (5) POSTing the dataset description to each FAqT evaluation service to obtain an evaluation described in RDF.
This process is done for each dataset and FAqT evaluation service to create a single slice of the FAqT brick.
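The pairing of datasets and FAqT evaluation services in steps (4) and (5) can be sketched as a nested loop. This is only an illustration and not the actual df-epoch.sh implementation: the function name enumerate_evaluations and the two plain-text input files (one URI per line) are assumptions, and the sketch prints progress lines instead of issuing any HTTP requests.

```shell
# Hypothetical sketch: list every (dataset, FAqT service) pair in one slice.
# $1: file of dataset URIs, one per line (assumed input format)
# $2: file of FAqT service URIs, one per line (assumed input format)
enumerate_evaluations() {
  datasets_file=$1
  services_file=$2
  n_datasets=$(grep -c . "$datasets_file")
  n_services=$(grep -c . "$services_file")
  total=$((n_datasets * n_services))
  d=0
  k=0
  while IFS= read -r dataset; do
    [ -n "$dataset" ] || continue          # skip blank lines
    d=$((d + 1))
    f=0
    while IFS= read -r service; do
      [ -n "$service" ] || continue
      f=$((f + 1))
      k=$((k + 1))
      echo "[INFO] dataset $d/$n_datasets, FAqT $f/$n_services ($k/$total total)"
      echo "[INFO]   $dataset"
      echo "[INFO]   $service"
      # A real run would POST the dataset description to $service here.
    done < "$services_file"
  done < "$datasets_file"
}
```

With 3 datasets and 5 services this walks 15 pairs, which matches the "(15/15 total)" counter in the output above.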
When their URI is requested, FAqT evaluation services provide RDF descriptions of themselves. These are stored in a file faqt-service.ttl that is nested by both the faqt and the epoch. For example, the RDF that was returned by requesting the FAqT service http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples during epoch 2012-01-13 is stored at:
faqt-brick/
sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_epoch/
2012-01-13/faqt-service.ttl
The accumulated descriptions of the CKAN datasets are stored in a file post.ttl that is nested by both the epoch and the dataset. For example, the RDF that is POSTed to all FAqT services during epoch 2012-01-13 to evaluate dataset http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states is stored at:
faqt-brick/
__PIVOT_epoch/2012-01-13/__PIVOT_dataset/
thedatahub.org/dataset/farmers-markets-geographic-data-united-states/post.ttl
The contents of post.ttl are the union of the files:
faqt-brick/
__PIVOT_epoch/2012-01-13/__PIVOT_dataset/
thedatahub.org/dataset/farmers-markets-geographic-data-united-states/part-*.{ttl,rdf,nt}
part-0.{ttl,rdf,nt} is the result of dereferencing the dataset's URI, while the remaining part- files come from other resources such as the VoID file or dereferencing the dataset's con:preferredURIs (as provided by an augment-dataset service; see DataFAQs Core Services).
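As a rough illustration of what "union" means here, the sketch below merges parts that are already in N-Triples. It is not how DataFAQs itself builds post.ttl: the function name union_ntriples is hypothetical, and it assumes that every part has first been converted to N-Triples (one triple per line), since Turtle and RDF/XML files cannot be merged by concatenation.

```shell
# Hypothetical sketch: merge N-Triples parts into one de-duplicated file.
# Assumes <dir>/part-*.nt exist and hold one triple per line.
# Caveat: blank node labels that collide across parts are NOT renamed.
union_ntriples() {
  dir=$1
  # Drop blank lines and comments, then sort and remove duplicate triples.
  cat "$dir"/part-*.nt |
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' |
    LC_ALL=C sort -u > "$dir/post.nt"
}
```

Because each N-Triples line is a complete triple, a sorted, de-duplicated concatenation behaves like a set union of the graphs.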
When the RDF description of a dataset is POSTed to a FAqT evaluation service, the service returns an RDF evaluation of the dataset. The response from the FAqT evaluation service is stored in a file evaluation.ttl that is nested by the faqt, dataset, and epoch. For example, the RDF returned by the FAqT service http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples during epoch 2012-01-13 when evaluating dataset http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states is stored at:
faqt-brick/
sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_dataset/
thedatahub.org/dataset/farmers-markets-geographic-data-united-states/__PIVOT_epoch/
2012-01-13/evaluation.ttl
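The three directory conventions above can be summarized as small path-building helpers. This is a sketch derived only from the example paths on this page; the function names are hypothetical, and the scheme-stripping rule (drop everything up to and including ://) is an assumption that happens to reproduce those examples.

```shell
# Hypothetical helpers that rebuild the example paths shown above.

# Strip the scheme from a URI: http://host/path -> host/path
uri_to_path() {
  printf '%s\n' "$1" | sed -e 's|^[a-zA-Z][a-zA-Z0-9+.-]*://||'
}

# faqt_service_path <service-uri> <epoch>
faqt_service_path() {
  echo "faqt-brick/$(uri_to_path "$1")/__PIVOT_epoch/$2/faqt-service.ttl"
}

# dataset_post_path <dataset-uri> <epoch>
dataset_post_path() {
  echo "faqt-brick/__PIVOT_epoch/$2/__PIVOT_dataset/$(uri_to_path "$1")/post.ttl"
}

# evaluation_path <service-uri> <dataset-uri> <epoch>
evaluation_path() {
  echo "faqt-brick/$(uri_to_path "$1")/__PIVOT_dataset/$(uri_to_path "$2")/__PIVOT_epoch/$3/evaluation.ttl"
}
```

For example, calling evaluation_path with the void-triples service URI, the farmers-markets dataset URI, and 2012-01-13 yields the evaluation.ttl path shown above, written on a single line.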
df-epoch.sh assumes that you wouldn't want more than one epoch per day. If that's not the case, go ahead and use --force-epoch:
bash-3.2$ df-epoch.sh
An evaluation epoch has already been initiated today (2012-01-13).
Start one tomorrow, use --force-epoch to create another one today, or use --help.
bash-3.2$ df-epoch.sh --force-epoch
[INFO] Using __PIVOT_epoch/2012-01-13_17_49_46
[INFO] Requesting FAqT services from http://sparql.tw.rpi.edu/services/datafaqs/core/select-faqts/via-sparql-query
...
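The naming behavior visible in this transcript can be sketched as follows. It is an illustration of the "once per day" convention, not the real df-epoch.sh logic: the function name choose_epoch is hypothetical, and the timestamp format is inferred from the single example 2012-01-13_17_49_46.

```shell
# Hypothetical sketch of the "once per day" epoch naming convention.
# choose_epoch <brick-dir> [--force-epoch]
# Prints the epoch name to use, or returns 1 if today's epoch already
# exists and --force-epoch was not given.
choose_epoch() {
  brick=$1
  force=${2:-}
  today=$(date +%Y-%m-%d)
  if [ ! -d "$brick/__PIVOT_epoch/$today" ]; then
    echo "$today"
  elif [ "$force" = "--force-epoch" ]; then
    # Append the time of day so the name stays unique, e.g. 2012-01-13_17_49_46
    date +%Y-%m-%d_%H_%M_%S
  else
    echo "An evaluation epoch has already been initiated today ($today)." >&2
    return 1
  fi
}
```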
If you want to get rid of an epoch, first remove the epoch-specific materials from __PIVOT_epoch/ and then use df-purge-orphaned-epochs.sh to take care of the rest:
bash-3.2$ rm -rf __PIVOT_epoch/2012-01-13_17_49_46/
bash-3.2$ df-purge-orphaned-epochs.sh
usage: df-purge-orphaned-epochs.sh <-n | -w>
-n: perform dry run; do not modify anything.
-w: remove all epochs that are not listed in __PIVOT_epoch/
bash-3.2$ df-purge-orphaned-epochs.sh -w
[INFO] Removing 2012-01-13_17_49_46
[INFO] Removing 2012-01-13_17_49_46
[INFO] Removing 2012-01-13_17_49_46
[INFO] Removing 2012-01-13_17_49_46
[INFO] Removing 2012-01-13_17_49_46
...
df-purge-orphaned-epochs.sh walks the rest of the FAqT brick and removes all materials created during epochs that are not listed in __PIVOT_epoch/. The example usage above removes the forced epoch that was created in the --force-epoch example earlier.
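The idea behind the dry run can be sketched with find. This is a simplified illustration under the directory conventions described on this page, not the actual df-purge-orphaned-epochs.sh; the function name list_orphaned_epochs is hypothetical, and it only prints candidates without deleting anything.

```shell
# Hypothetical sketch: list nested epoch directories whose epoch name is
# no longer present in <brick-dir>/__PIVOT_epoch/ (dry run only).
list_orphaned_epochs() {
  brick=$1
  # Match directories directly below any __PIVOT_epoch/ directory.
  find "$brick" -type d -path '*/__PIVOT_epoch/*' ! -path '*/__PIVOT_epoch/*/*' |
    while IFS= read -r dir; do
      epoch=$(basename "$dir")
      # Keep epochs that are still listed at the top of the brick.
      [ -d "$brick/__PIVOT_epoch/$epoch" ] || echo "$dir"
    done
}
```

Reviewing this list before removing anything plays the same role as running the real script with -n before -w.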
The CKAN dataset descriptions that were accumulated in an existing epoch can be reused to reapply the FAqT service evaluations within the same epoch. Because this replaces the results within the designated epoch, this should only be done for the latest epoch. The following usage shows that of the two epochs in the FAqT brick, the dataset listing and descriptions from the later one are reused.
ls __PIVOT_epoch/
2012-01-12 2012-01-13
df-epoch.sh --reuse-epoch datafaqs:latest
[INFO] Using __PIVOT_epoch/2012-01-13 (datafaqs:latest)
[INFO] Requesting FAqT services from http://sparql.tw.rpi.edu/services/datafaqs/core/select-faqts/via-sparql-query
[INFO] Reusing dataset listing and descriptions from __PIVOT_epoch/2012-01-13
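One way the datafaqs:latest alias could be resolved is sketched below. The function name resolve_epoch is hypothetical and this is not taken from df-epoch.sh; it relies on the assumption that every epoch directory name starts with YYYY-MM-DD, so that byte-wise sorting is also chronological.

```shell
# Hypothetical sketch: resolve an epoch argument, expanding datafaqs:latest.
# resolve_epoch <brick-dir> <epoch-or-alias>
resolve_epoch() {
  brick=$1
  requested=$2
  if [ "$requested" = "datafaqs:latest" ]; then
    # Names begin with YYYY-MM-DD, so the last name in byte order is the
    # most recent epoch (a forced 2012-01-13_17_49_46 sorts after 2012-01-13).
    ls "$brick/__PIVOT_epoch" | LC_ALL=C sort | tail -n 1
  else
    echo "$requested"
  fi
}
```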
For a given epoch, the following files contain graphs that are interesting for analysis. They need to be named and loaded into a triple store so that they can be available for SPARQL query.
(todo: describe rdf config with provo describing core services e.g.)
[]
a sd:NamedGraph;
sd:name <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/config/>;
The following files are produced by the DataFAQs Core Services.
__PIVOT_epoch/2012-01-19/faqt-services.ttl # the evaluation services that were used.
datasets.ttl # the datasets that were evaluated.
dataset-references.ttl # rdfs:seeAlso to more descriptions
# __PIVOT_epoch/2012-01-19/faqt-services.ttl.sd_name contains string:
# http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/config/faqt-services
# __PIVOT_epoch/2012-01-19/faqt-services.ttl.meta contains following graph.
# (load all sd metadata into GRAPH sd:NamedGraph { })
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
[]
a sd:NamedGraph;
sd:name <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/config/faqt-services>;
sd:graph <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/config/faqt-services>;
.
<http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/config/faqt-services>
a void:Dataset, sd:Graph;
void:triples 6;
void:dataDump <http://sparql.tw.rpi.edu/datafaqs/dump/__PIVOT_epoch/2012-01-19/faqt-services.ttl>;
.
[]
a sd:NamedGraph;
sd:name <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/config/datasets>;
sd:graph <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/config/datasets>;
.
<http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/config/datasets>
a void:Dataset, sd:Graph;
void:triples 7;
void:dataDump <http://sparql.tw.rpi.edu/datafaqs/dump/__PIVOT_epoch/2012-01-19/datasets.ttl>;
.
[]
a sd:NamedGraph;
sd:name <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/config/dataset-references>;
sd:graph <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/config/dataset-references>;
.
<http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/config/dataset-references>
a void:Dataset, sd:Graph;
void:triples 6;
void:dataDump <http://sparql.tw.rpi.edu/datafaqs/dump/__PIVOT_epoch/2012-01-19/dataset-references.ttl>;
.
The following files contain the FAqT evaluation services' descriptions of themselves:
sparql.tw.rpi.edu/services/datafaqs/faqt/lodcloud/max-1-topic-tag/__PIVOT_epoch/2012-01-19/faqt-service.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/predicate-counter/__PIVOT_epoch/2012-01-19/faqt-service.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/redirect-loop/__PIVOT_epoch/2012-01-19/faqt-service.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/void-properties/__PIVOT_epoch/2012-01-19/faqt-service.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_epoch/2012-01-19/faqt-service.ttl
# sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_epoch/2012-01-19/faqt-service.ttl.sd_name
# http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/faqt/1
# sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_epoch/2012-01-19/faqt-service.ttl.meta
# (load all sd metadata into GRAPH sd:NamedGraph { })
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
[]
a sd:NamedGraph;
sd:name <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/faqt/1>;
sd:graph [
a prov:Account, sd:Graph, void:Graph;
void:triples 17;
prov:wasAttributedTo <http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples>;
foaf:primaryTopic <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/faqt/1>;
void:dataDump <http://sparql.tw.rpi.edu/datafaqs/dump/__PIVOT_faqt/sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_epoch/2012-01-19/faqt-service.ttl>;
]
.
<http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/faqt/1>
a datafaqs:FAqTService;
prov:specializationOf <http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples>;
dcterms:date "2012-01-19"^^xsd:date;
.
<http://sparql.tw.rpi.edu/datafaqs/dump/__PIVOT_faqt/sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_epoch/2012-01-19/faqt-service.ttl>
formats:media_type <http://www.w3.org/ns/formats/Turtle>;
.
<http://www.w3.org/ns/formats/Turtle>
rdfs:label "Turtle";
dcterms:identifier "text/turtle";
.
The following files contain the dataset descriptions (including the additional references):
__PIVOT_epoch/2012-01-19/__PIVOT_dataset/thedatahub.org/dataset/congresspeople/post.ttl
__PIVOT_epoch/2012-01-19/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/post.ttl
__PIVOT_epoch/2012-01-19/__PIVOT_dataset/thedatahub.org/dataset/white-house-visitor-access-records/post.ttl
# load all sd metadata into GRAPH sd:NamedGraph { }
[]
a sd:NamedGraph;
sd:name <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/dataset/1>;
sd:graph [
a prov:Account, sd:Graph, void:Graph;
void:triples 14861;
prov:wasDerivedFrom
<http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states>,
<http://logd.tw.rpi.edu/source/data-gov/dataset/4383/version/2011-Nov-29>,
<http://logd.tw.rpi.edu/source/data-gov/file/4383/version/2011-Nov-29/conversion/data-gov-4383-2011-Nov-29.void.ttl>;
foaf:primaryTopic <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/dataset/1>;
void:dataDump <http://sparql.tw.rpi.edu/datafaqs/dump/__PIVOT_faqt/__PIVOT_epoch/2012-01-19/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/post.ttl>;
]
.
<http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/dataset/1>
a void:Dataset;
prov:specializationOf <http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states>;
dcterms:date "2012-01-19"^^xsd:date;
.
<http://sparql.tw.rpi.edu/datafaqs/dump/__PIVOT_faqt/__PIVOT_epoch/2012-01-19/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/post.ttl>
formats:media_type <http://www.w3.org/ns/formats/Turtle>;
.
<http://www.w3.org/ns/formats/Turtle>
rdfs:label "Turtle";
dcterms:identifier "text/turtle";
.
The following files contain the evaluation of each dataset from each evaluation service:
sparql.tw.rpi.edu/services/datafaqs/faqt/lodcloud/max-1-topic-tag/__PIVOT_dataset/thedatahub.org/dataset/congresspeople/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/lodcloud/max-1-topic-tag/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/predicate-counter/__PIVOT_dataset/thedatahub.org/dataset/congresspeople/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/predicate-counter/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/redirect-loop/__PIVOT_dataset/thedatahub.org/dataset/congresspeople/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/redirect-loop/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/void-properties/__PIVOT_dataset/thedatahub.org/dataset/congresspeople/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/void-properties/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_dataset/thedatahub.org/dataset/congresspeople/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/__PIVOT_epoch/2012-01-19/evaluation.ttl
# load all sd metadata into GRAPH sd:NamedGraph { }
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
@prefix formats: <http://www.w3.org/ns/formats/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix prov: <http://www.w3.org/ns/prov-o/> .
@prefix datafaqs: <http://purl.org/twc/vocab/datafaqs#> .
[]
a sd:NamedGraph;
sd:name <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/faqt/1/dataset/1>;
sd:graph [
a prov:Account, sd:Graph, void:Graph, datafaqs:Evaluation;
void:triples 14;
prov:wasAttributedTo <http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples>;
foaf:primaryTopic <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/faqt/1/dataset/1>;
void:dataDump <http://sparql.tw.rpi.edu/datafaqs/dump/__PIVOT_faqt/sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/__PIVOT_epoch/2012-01-19/evaluation.ttl>;
]
.
<http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/faqt/1/dataset/1>
a void:Dataset;
prov:specializationOf <http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states>;
dcterms:date "2012-01-19"^^xsd:date;
.
<http://sparql.tw.rpi.edu/datafaqs/dump/__PIVOT_faqt/sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/__PIVOT_epoch/2012-01-19/evaluation.ttl>
formats:media_type <http://www.w3.org/ns/formats/Turtle>;
.
<http://www.w3.org/ns/formats/Turtle>
rdfs:label "Turtle";
dcterms:identifier "text/turtle";
.
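Since each file of interest has a companion .sd_name file holding its graph name, the pairing of file and graph name can be turned into load instructions. The sketch below prints SPARQL 1.1 Update LOAD statements; it is only one possible approach and not how DataFAQs itself loads the store. The function name print_load_statements and the dump base URL argument are assumptions, and it presumes the triple store can dereference the resulting dump URLs.

```shell
# Hypothetical sketch: print one SPARQL 1.1 Update LOAD statement per
# <file>.sd_name found below a brick directory.
# print_load_statements <brick-dir> <dump-base-url>
print_load_statements() {
  root=$1   # brick directory, without a trailing slash
  base=$2   # URL under which the brick's files are published (assumed)
  find "$root" -type f -name '*.sd_name' | LC_ALL=C sort |
    while IFS= read -r sd_name_file; do
      graph=$(head -n 1 "$sd_name_file")     # the graph name string
      data_file=${sd_name_file%.sd_name}     # e.g. .../datasets.ttl
      rel=${data_file#"$root"/}              # path relative to the brick
      echo "LOAD <$base/$rel> INTO GRAPH <$graph> ;"
    done
}
```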
(note: see df-find for a more complete and encapsulated way to find invalid results, etc.)
The following command will list the files returned that were not valid RDF.
find __PIVOT_epoch/2013-04-14/__PIVOT_dataset/ -name "augmentation-*" -o -name "reference-*" | xargs valid-rdf.sh -v | grep "^no"
For example, when reference-1 contains HTML, we can see where it came from by looking at the corresponding get-reference-1.sh:
bash-3.2$ head -5 __PIVOT_epoch/2013-04-14/__PIVOT_dataset//thedatahub.org/dataset/webnmasunotraveler/reference-1
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head>
<title>Index of /metadata</title>
</head>
bash-3.2$ cat __PIVOT_epoch/2013-04-14/__PIVOT_dataset//thedatahub.org/dataset/webnmasunotraveler/get-reference-1.sh
curl -s -L -H "Accept: application/rdf+xml, text/rdf;q=0.6, */*;q=0.1" http://webenemasuno.linkeddata.es/metadata > reference-1
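When valid-rdf.sh is not at hand, a quick heuristic can at least flag responses that are obviously HTML pages, like the one above. This is a sketch and not an RDF validator; the function name looks_like_html is hypothetical, and it only inspects the first five lines of a file.

```shell
# Hypothetical sketch: succeed if the start of a file looks like HTML.
# This is a heuristic; it does not check whether the file is valid RDF.
looks_like_html() {
  head -n 5 "$1" | grep -qiE '<!DOCTYPE[[:space:]]+html|<html[[:space:]>]'
}
```

It could be applied to each reference-* file in a dataset directory, printing the names of those for which it succeeds.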
All of the files used to find out about a dataset:
ls __PIVOT_epoch/2014-07-04/__PIVOT_dataset/thedatahub.org/dataset/dbpedia
augmentation-1.rdf
dataset.ttl
get-augmentation-1.sh
get-reference-0.sh
get-reference-1.sh
get-references-1.sh
post.meta.ttl
post.nt
post.nt.rdf
post.nt.sd_name
post.nt.ttl
reference-0.rdf
reference-1.rdf
references-1.ttl
references.csv
for get in __PIVOT_epoch/2014-07-04/__PIVOT_dataset/thedatahub.org/dataset/dbpedia/get*.sh; do
  # Print a header with the script's name, then the script itself.
  echo "============ $(basename "$get") ============"
  cat "$get"; echo '============================================'
  echo; echo
done
returns something like:
============ get-augmentation-1.sh ============
curl -s -H 'Content-Type: application/rdf+xml' -d @post.nt.rdf http://aquarius.tw.rpi.edu/projects/datafaqstest/sadi-services/lift-ckan > augmentation-1
============================================
============ get-reference-0.sh ============
curl -s -L -H "Accept: application/rdf+xml, text/rdf;q=0.6, */*;q=0.1" http://thedatahub.org/dataset/dbpedia > reference-0
============================================
============ get-reference-1.sh ============
curl -s -L -H "Accept: application/rdf+xml, text/rdf;q=0.6, */*;q=0.1" http://dbpedia.org/void/Dataset > reference-1
============================================
============ get-references-1.sh ============
curl -s -H 'Content-Type: text/turtle' -d @dataset.ttl http://aquarius.tw.rpi.edu/projects/datafaqs/services/sadi/core/augment-datasets/with-preferred-uri-and-ckan-meta-void > references-1
============================================
- DataFAQs Core Services which provide the configuration for a FAqT Brick.
- FAqT Brick Explorer provides a web application that lets users see the evaluation results in a FAqT Brick.
- Situating a FAqT Brick into csv2rdf4lod automation