FAqT Service
- Getting started
- CKAN, if you're writing a service that uses their API.
- Installing DataFAQs, to extend the faqt service (python) superclass.
- DataFAQs Core Services for selecting datasets and FAqT Services, and adding descriptions to datasets.
This page will walk you through the steps to create a new FAqT evaluation service. By creating and deploying an evaluation service, others will be able to ask what you think about their dataset by calling your service.
A FAqT Service is a [SADI](SADI Semantic Web Services framework) service that accepts any dataset URI and returns an RDF-encoded evaluation using the FAqT Vocabulary. If a FAqT Service is invoked during an evaluation epoch, it becomes part of the FAqT Brick that accumulates evaluation results and can be browsed using the FAqT Brick Explorer.
(This walkthrough is for Python; we have since switched to Java because Python kept falling over on Unicode issues.)
- First, `git clone` your fork of git://github.com/timrdf/DataFAQs.git, which creates a directory `DataFAQs` on your local system.
- Decide the local name and relative path of the service that you want to create.
  - Choosing our new service's relative path keeps it organized among the other services that we have created. The path that we choose organizes the service's source code in our code repository, as well as when it is deployed on a server.
  - e.g. `services/sadi/faqt/sparql-service-description` is the relative path for named-graphs.py in this code repository. Similarly, `services/sadi/faqt/sparql-service-description` is the (same) relative path for named-graphs, which is the deployment location of the code above, relative to this server.
  - Make the directory for the relative path. For example, if the service's relative path is `services/sadi/faqt/sparql-service-description`, run `mkdir -p services/sadi/faqt/sparql-service-description` from within `DataFAQs/`.
- Copy the template: `cp services/sadi/faqt-template.py <relative-path>/<local-name>`, e.g. `cp services/sadi/faqt-template.py services/sadi/faqt/sparql-service-description/named-graphs.py`
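If you prefer to script these first two steps, they can be sketched in Python. `scaffold_service` is a hypothetical helper (not part of the repository); the paths follow the named-graphs example above.

```python
import os
import shutil

def scaffold_service(repo_root, rel_path, template, local_name):
    """mkdir -p the service's relative path and copy the faqt template into it."""
    service_dir = os.path.join(repo_root, rel_path)
    os.makedirs(service_dir, exist_ok=True)   # mkdir -p: no error if the dir exists
    dest = os.path.join(service_dir, local_name)
    shutil.copy(os.path.join(repo_root, template), dest)
    return dest

# e.g., from a DataFAQs/ checkout:
# scaffold_service('.', 'services/sadi/faqt/sparql-service-description',
#                  'services/sadi/faqt-template.py', 'named-graphs.py')
```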
- Edit your copy of the template to make it your own.
  - `cd <relative-path>`, e.g. `cd services/sadi/faqt/sparql-service-description/`
  - `vi <local-name>.py`, e.g. `vi named-graphs.py`
  - 3.A) Replace the value of `servicePath = 'services/sadi'` (use `pwd | sed 's/^.*services/services/'`).
  - 3.B) Replace `TEMPLATE-CLASS-NAME` with a name for the python class.
  - 3.C) Replace `TEMPLATE-NAME` with a name for the service (it will become part of its external URI); use the local-name that you chose in Step 1.
  - 3.D) Provide a description in the attribute `serviceDescriptionText`.
  - 3.E) [optional] Provide a comment in the attribute `comment`.
  - 3.F) Replace the value of `result.protegedc_creator = ''` with your email address.
  - 3.G) Replace the value of `dev_port = 9106` with a port reserved in this list (add a new entry for your service).
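The `pwd | sed` one-liner in step 3.A can be mirrored in Python if you want to compute the `servicePath` value programmatically. This is a sketch; `service_path` is a hypothetical helper, not part of the template.

```python
import re

def service_path(cwd):
    # Python equivalent of `pwd | sed 's/^.*services/services/'`:
    # greedily drop everything up to the last occurrence of 'services'
    # and put 'services' back, yielding the value for servicePath.
    return re.sub(r'^.*services', 'services', cwd)
```

Like the sed expression, it leaves the path unchanged when `services` does not occur in it.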
- Implement the `process(self, input, output)` method.
  - Set the return values of `getInputClass` and `getOutputClass` to characterize your SADI service.
  - Add any new namespace prefixes that you want to use (e.g. `ns.register(sd='http://www.w3.org/ns/sparql-service-description#')`).
  - Evaluate the dataset URI `input.subject` in `def process(self, input, output):` and say what you think about it by describing `output`. (For the SuRF and rdflib concepts, see SADI Semantic Web Services framework.)
  - Use [Beautiful Soup](FAqT Service using Beautiful Soup) or [Ripple](FAqT Service using Ripple).
  - Use SuRF to execute SPARQL queries against the POSTed RDF graph similar to how add-metadata.py does it.
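Stripped of SuRF and SADI machinery, the control flow of a typical `process` implementation looks like the sketch below. The stand-in `process` and `evaluate` here are plain-Python illustrations, not the real classes; they only show the convention of asserting `datafaqs:Satisfactory` on success and falling back to `datafaqs:Unsatisfactory` otherwise.

```python
SATISFACTORY   = 'http://purl.org/twc/vocab/datafaqs#Satisfactory'
UNSATISFACTORY = 'http://purl.org/twc/vocab/datafaqs#Unsatisfactory'

def process(dataset_uri, evaluate):
    # evaluate() stands in for the service-specific inspection of the dataset.
    rdf_type = []
    try:
        if evaluate(dataset_uri):
            rdf_type.append(SATISFACTORY)
    except Exception:
        pass                       # a failing check must not crash the service
    if SATISFACTORY not in rdf_type:
        rdf_type.append(UNSATISFACTORY)  # default verdict when nothing succeeded
    return rdf_type
```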
- Test your service.
  - Create sample inputs in `<TEMPLATE-NAME>-materials/sample-inputs/` (e.g. mondeca.ttl).
  - Temporarily deploy the service on localhost (e.g. `python named-graphs.py`).
  - Invoke the service by modifying the example call that the service offers: `curl -H "Content-Type: text/turtle" -d @my.ttl http://localhost:9106/named-graphs`
  - Add the following to `__main__`, like in add-metadata.py:
```python
reader = open(sys.argv[1], "r")
mimeType = "application/rdf+xml"
if len(sys.argv) > 2:
    mimeType = sys.argv[2]
if len(sys.argv) > 3:
    writer = open(sys.argv[3], "w")
graph = resource.processGraph(reader, mimeType)
if len(sys.argv) > 3:
    writer.write(resource.serialize(graph, mimeType))
else:
    print resource.serialize(graph, mimeType)
```
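If curl is not handy, the same invocation can be made from Python. This is a Python 3 sketch (the services themselves are Python 2); `faqt_request` is a hypothetical helper, and the URL follows the named-graphs example.

```python
import urllib.request

def faqt_request(url, turtle_bytes):
    # Build the same POST that
    #   curl -H "Content-Type: text/turtle" -d @my.ttl <url>
    # sends; pass the result to urllib.request.urlopen() to invoke the service.
    return urllib.request.Request(url, data=turtle_bytes,
                                  headers={'Content-Type': 'text/turtle'})

# e.g.:
# req = faqt_request('http://localhost:9106/named-graphs', open('my.ttl', 'rb').read())
# print(urllib.request.urlopen(req).read().decode())
```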
In this section, we'll walk through a second example. The FAqT service that we create here will reproduce some of the analysis that LODStats does. On 4 Feb 2012, they report that 59 datasets were accessible via SPARQL endpoints and 142 datasets had SPARQL endpoint errors.
We'll pick one successful dataset and one unsuccessful dataset from their lists and try to reproduce their results:
- http://thedatahub.org/dataset/fu-berlin-stitch reports a "successful" endpoint at http://www4.wiwiss.fu-berlin.de/stitch/sparql
- http://thedatahub.org/dataset/2000-us-census-rdf reports an "unsuccessful" endpoint at http://www.rdfabout.com/sparql
First, we'll choose the relative URI of our new FAqT evaluation service:
services/sadi/faqt/access/in-sparql-endpoint
We'll make a new directory in our github repository (you could do yours in your fork of this repository if you'd like):
```
/opt/DataFAQs$ ls
bin  doc  lib  ontology  queries  readme.md  services  ui
/opt/DataFAQs$ mkdir services/sadi/faqt/access/
/opt/DataFAQs$ cd services/sadi/faqt/access/
```
Then, we'll copy the template and change the names and development port:

```
/opt/DataFAQs/services/sadi/faqt/access/$ cp ../../faqt-template.py in-sparql-endpoint.py
/opt/DataFAQs/services/sadi/faqt/access/$ vi in-sparql-endpoint.py
:% s/TEMPLATE-NAME/in-sparql-endpoint/gc
:% s/TEMPLATE-CLASS-NAME/InSPARQLEndpoint/gc
:% s/9090/9109/gc
```

```
serviceDescriptionText = 'Queries into the void:sparqlEndpoint of the dcat:Dataset and reports if the endpoint is there.'
comment = 'Initial purpose was to evaluate LOD datasets.'
```
In a second terminal, we can temporarily deploy the service on localhost (ignore the `DeprecationWarning` for the md5 and sha modules):
```
$ cd /opt/DataFAQs/github/DataFAQs/services/sadi/faqt/access
$ python in-sparql-endpoint.py
...
in-sparql-endpoint running on port 9109. Invoke it with:
curl -H "Content-Type: text/turtle" -d @my.ttl http://localhost:9109/in-sparql-endpoint
```
So, our service is up and ready for someone to ask it what it thinks about a dataset. We can make sure by opening a third terminal and asking the service to describe itself:
```
$ cd /opt/DataFAQs/github/DataFAQs/services/sadi/faqt/access
$ curl http://localhost:9109/in-sparql-endpoint

@prefix mygrid: <http://www.mygrid.org.uk/mygrid-moby-service#> .
...
<> a <http://www.mygrid.org.uk/mygrid-moby-service#serviceDescription>;
    rdfs:label "in-sparql-endpoint";
...
<#input> a <http://www.mygrid.org.uk/mygrid-moby-service#parameter>;
    mygrid:objectType <http://www.w3.org/ns/dcat#Dataset> .
...
<#output> a <http://www.mygrid.org.uk/mygrid-moby-service#parameter>;
    mygrid:objectType <http://purl.org/twc/vocab/datafaqs#EvaluatedDataset> .
...
```
From this, we see that the evaluation service accepts RDF descriptions of dcat:Datasets and returns RDF descriptions of the same instances that will then be typed as datafaqs:EvaluatedDataset. This conforms to the design of the SADI Semantic Web Services framework.
Let's make the sample input using the examples we are using from LODStats:
```
$ cd /opt/DataFAQs/services/sadi/faqt/access/
/opt/DataFAQs/services/sadi/faqt/access/$ mkdir -p in-sparql-endpoint-materials/sample-inputs
$ cd in-sparql-endpoint-materials/sample-inputs
$ curl -s http://prefix.cc/dcat,datafaqs.file.n3 > 1-good-1-bad-from-lodstat.ttl
```
Then edit `1-good-1-bad-from-lodstat.ttl` to list the two datasets that we want to evaluate. The type needs to match the type returned by your evaluation service's `getInputClass` function (which is used to create the service description above).
```
@prefix dcat:     <http://www.w3.org/ns/dcat#> .
@prefix datafaqs: <http://purl.org/twc/vocab/datafaqs#> .

<http://thedatahub.org/dataset/fu-berlin-stitch>    a dcat:Dataset .
<http://thedatahub.org/dataset/2000-us-census-rdf>  a dcat:Dataset .
```
Next, send the descriptions of the datasets to the evaluation service and see what it thinks about them:
```
$ curl -H "Content-Type: text/turtle" -d @1-good-1-bad-from-lodstat.ttl http://localhost:9109/in-sparql-endpoint

<http://thedatahub.org/dataset/2000-us-census-rdf> a <http://purl.org/twc/vocab/datafaqs#Unsatisfactory> .
<http://thedatahub.org/dataset/fu-berlin-stitch> a <http://purl.org/twc/vocab/datafaqs#Unsatisfactory> .
```
Because the template that we copied asserts `Unsatisfactory` by default, every dcat:Dataset we send this service will be `Unsatisfactory` until we implement the `def process(self, input, output):` function.
To do that, we'll need a bit more than the URI of the dataset. A FAqT Service is allowed to assume that the RDF description it receives about a dcat:Dataset already includes the RDF obtained by dereferencing the dcat:Dataset's URI. That's because [DataFAQs core](FAqT Brick) does this beforehand as it constructs an evaluation epoch. To avoid setting up a FAqT Brick now, we can grab the RDF descriptions ourselves:
```
cd /opt/DataFAQs/services/sadi/faqt/access/in-sparql-endpoint-materials/sample-inputs/
curl -sH "Accept: application/rdf+xml" -L http://thedatahub.org/dataset/fu-berlin-stitch   > fu-berlin-stitch.rdf
curl -sH "Accept: application/rdf+xml" -L http://thedatahub.org/dataset/2000-us-census-rdf > 2000-us-census-rdf.rdf
```
In there, we see the association between the dataset we're trying to evaluate and the endpoint that we want to make sure works:
```
<dcat:Dataset rdf:about="http://thedatahub.org/dataset/fu-berlin-stitch">
    <void:sparqlEndpoint rdf:resource="http://www4.wiwiss.fu-berlin.de/stitch/sparql"/>
...
<dcat:Dataset rdf:about="http://thedatahub.org/dataset/2000-us-census-rdf">
    <void:sparqlEndpoint rdf:resource="http://www.rdfabout.com/sparql"/>
...
```
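Inside the service, SuRF exposes this association as `input.void_sparqlEndpoint`. Outside the service, the same extraction can be sketched with nothing but the standard library; `sparql_endpoints` below is a hypothetical helper that pulls every `void:sparqlEndpoint` out of an RDF/XML description like the ones we just downloaded.

```python
import xml.etree.ElementTree as ET

RDF  = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
VOID = 'http://rdfs.org/ns/void#'

def sparql_endpoints(rdfxml):
    # Collect the rdf:resource of every void:sparqlEndpoint element,
    # mirroring what input.void_sparqlEndpoint exposes inside the service.
    root = ET.fromstring(rdfxml)
    return [el.get('{%s}resource' % RDF)
            for el in root.iter('{%s}sparqlEndpoint' % VOID)]
```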
Seeing the input that we'll be processing, we can implement the function:
```python
def process(self, input, output):
    print 'processing ' + input.subject
    if input.void_sparqlEndpoint:
        output.void_sparqlEndpoint = input.void_sparqlEndpoint.first
        result = {}
        try:
            print '   ',
            print input.void_sparqlEndpoint.first
            queries = [ 'select distinct ?type where { graph ?g { [] a ?type } } limit 1',
                        'select distinct ?type where {[] a ?type} limit 1' ]
            for query in queries:
                if ns.DATAFAQS['Satisfactory'] not in output.rdf_type:
                    store = Store(reader = 'sparql_protocol', endpoint = input.void_sparqlEndpoint.first)
                    session = Session(store)
                    session.enable_logging = False
                    result = session.default_store.execute_sparql(query)
                    if result['results'] != None:
                        for binding in result['results']['bindings']:
                            type = binding['type']['value']
                            output.rdf_type.append(ns.DATAFAQS['Satisfactory'])
                            print '   ',
                            print type
        except:
            print '   BAD ENDPOINT'
            output.rdf_type.append(ns.DATAFAQS['Unsatisfactory'])
            output.datafaqs_error = result.read()
    else:
        print '   NO ENDPOINT'
        output.rdf_type.append(ns.DATAFAQS['Unsatisfactory'])
        output.datafaqs_error = 'Dataset was not described with predicate void:sparqlEndpoint.'
    if ns.DATAFAQS['Satisfactory'] not in output.rdf_type:
        output.rdf_type.append(ns.DATAFAQS['Unsatisfactory'])
    output.save()
```
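The `result['results']['bindings']` traversal in the loop above follows the standard SPARQL 1.1 Query Results JSON shape that `execute_sparql` hands back. A small self-contained illustration of that shape (the payload here is hand-written, not fetched from an endpoint):

```python
import json

# A minimal SPARQL results document, shaped like what a SPARQL endpoint
# (and session.default_store.execute_sparql) returns for the queries above.
payload = json.loads('''{
  "head":    {"vars": ["type"]},
  "results": {"bindings": [
    {"type": {"type": "uri", "value": "http://www.w3.org/2004/02/skos/core#Concept"}}
  ]}
}''')

# Same traversal as in process(): one 'type' URI per binding.
types = [b['type']['value'] for b in payload['results']['bindings']]
```

Any non-empty `bindings` list is enough for the service to call the endpoint `Satisfactory`.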
Redeploy the service (`python in-sparql-endpoint.py`) and call it for each dataset to see that we can reproduce the results that LODStats reports (fu-berlin-stitch good, 2000-us-census-rdf bad):
```
$ curl -sH "Content-Type: application/rdf+xml" -d @fu-berlin-stitch.rdf http://localhost:9109/in-sparql-endpoint

@prefix void: <http://rdfs.org/ns/void#> .

<http://thedatahub.org/dataset/fu-berlin-stitch> a <http://purl.org/twc/vocab/datafaqs#Satisfactory>;
    void:sparqlEndpoint <http://www4.wiwiss.fu-berlin.de/stitch/sparql> .

$ curl -sH "Content-Type: application/rdf+xml" -d @2000-us-census-rdf.rdf http://localhost:9109/in-sparql-endpoint

@prefix datafaqs: <http://purl.org/twc/vocab/datafaqs#> .
@prefix void: <http://rdfs.org/ns/void#> .

<http://thedatahub.org/dataset/2000-us-census-rdf> a <http://purl.org/twc/vocab/datafaqs#Unsatisfactory>;
    datafaqs:error """
""";
    void:sparqlEndpoint <http://www.rdfabout.com/sparql> .
```
After [deploying the service](Sample FAqT deployment to) to its public home, we can register it at the SADI registry and see it listed at http://sadiframework.org/registry/services.
If you want to run an evaluation epoch with just the `in-sparql-endpoint` evaluation service and the two datasets in the example, use this epoch configuration.
This section is used to reserve ports for each FAqT evaluation service, so we can test many at the same time without having collisions. The services listed are available in the repository.
- 9090 https://github.com/timrdf/DataFAQs/blob/master/services/sadi/ckan/add-metadata.rpy
- 9091 https://github.com/timrdf/DataFAQs/blob/master/services/sadi/faqt/void-triples.rpy
- 9092 https://github.com/timrdf/DataFAQs/blob/master/services/sadi/faqt/internet-domain.rpy
- 9093 redirect-loop.rpy
- 9094 class-and-predicate-capitalization.rpy
- 9095 triple-count-accuracy.rpy
- 9096 instances-are-explicitly-typed.rpy
- 9097 instances-are-typed-by-domain-and-range.rpy
- 9098 by-ckan-group.rpy
- 9099 with-preferred-uri-and-ckan-meta-void.rpy
- 9100 vocabulary-resolves-to-description.rpy
- 9101 via-sparql-query.rpy on sparql.tw
- 9102 void-properties.rpy
- 9103 predicate-counter.rpy
- 9104 lodcloud/max-1-tag.rpy
- 9105 lodcloud/identity
- 9106 faqt/sparql-service-description/named-graphs.rpy
- 9107 csv2rdf4lod-as-ckan.rpy
- 9108 select-datasets/identity.rpy
- 9109 access/in-sparql-endpoint.rpy (deployed) (meta)
- 9110 core/select-dataset/by-ckan-tag.rpy
- 9111 contributor-email.rpy
- 9112 fake-goef-coverage.rpy
- 9113 select-datasets/via-sparql-query.rpy
- 9114 logd-catalog-listing.rpy
- 9115 wikitable-gspo.rpy
- 9116 wikitable-fol.rpy
- 9117 rdf2asn.rpy
- 9118 lena-example.rpy
- 9119 faqt/access/void-subset-tree-dumps
- 9120 faqt/provenance/named-graph-derivation.rpy
- 9121 core/select-faqts/towards/ckan-tag.rpy
- 9122 references-instance-hub.rpy
- 9223 datascape/size.rpy
- 9224 connected/void-linkset.py
- 9225 core/augment-dataset/lift-ckan.py
- 9226 core/augment-dataset/sameas-org.py
- 9227 access/void-datadump.py
- 9228 visko-planner.py
- 9229 w3c-mail-archives.py
- 9230 w3c-mail-archives-per-month.py
- 9231 w3c-mail-archives-message.py
- 9232 via-hypermail/groups.py
- 9233 by-ckan-installation.py
- 9234 with-rdf-extension.py
- 9235 services/sadi/faqt/naming/between-the-edges
- 9236 services/sadi/faqt/vocabulary/uses/prov
- 9237 services/sadi/faqt/vocabulary/uses/dcat
- 9238 services/sadi/faqt/vocabulary/uses/void
- 9239 services/sadi/faqt/vocabulary/uses/dcterms
- 9240 services/sadi/bibo/subject-broader.py
- 9241 lod-tag-and-lodcloud-group-contacts.py
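Before claiming a new entry in the list above, you can check that a development port is actually free on your machine. `port_is_free` is a hypothetical helper, not part of the repository:

```python
import socket

def port_is_free(port, host='127.0.0.1'):
    # Try to bind the port; if the bind fails it is already in use.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False
```

Note this only checks your local machine; the list above is still the authoritative reservation among DataFAQs developers.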
The faqt-template.rpy includes a printout with a sample of how to call it:
```python
if __name__ == '__main__':
    print resource.name + ' running on port ' + str(resource.dev_port) + '. Invoke it with:'
    print 'curl -H "Content-Type: text/turtle" -d @my.ttl http://localhost:' + str(resource.dev_port) + '/' + resource.name
    sadi.publishTwistedService(resource, port=resource.dev_port)
```

which is usually either of:

```
curl -H "Content-Type: text/turtle" -d @my.ttl http://localhost:9090/add-metadata
curl -H "Content-Type: application/rdf+xml" -d @my.rdf http://localhost:9090/add-metadata
```
- FAqT Service using Ripple
- FAqT Service using Beautiful Soup
- FAqT Service with Secondary Parameters
- Sample FAqT deployment to see how we use twistd to deploy the FAqT evaluation service from a working copy of the github repository.
- FAqT Bricks accumulate evaluations provided by FAqT Services.
- DataFAQs Core Services
- SADI Semantic Web Services framework