This repository has been archived by the owner on Jan 3, 2021. It is now read-only.

Setting up a Pubby Website

Richard Cyganiak edited this page Oct 22, 2013 · 1 revision

This document describes how you can use Pubby to turn an RDF dataset into a browsable website.

An example dataset

Let's say your dataset contains the following RDF triples, and is loaded into a SPARQL endpoint at <http://data.example.com/sparql>:

@prefix foaf: <http://xmlns.com/foaf/0.1/>.

<http://data.example.com/alice> a foaf:Person;
  foaf:name "Alice".

<http://data.example.com/company> a foaf:Organization;
  foaf:name "Example, Inc.";
  foaf:member <http://data.example.com/alice>;
  foaf:homepage <http://example.com/>.

Instead of accessing your dataset through a SPARQL endpoint, Pubby can also load an RDF file into memory. This is preferable if your dataset is small and you don't have a SPARQL endpoint up and running already.

A single Pubby site can use data from multiple such datasets. For example, you could have two datasets accessed through SPARQL endpoints that contain different parts of your organization's data (employee database and product database), and a third dataset that loads some extra triples and links between the two from a static RDF file.

Linked Data Server versus RDF Browser

There are two main reasons for setting up a Pubby site for this dataset.

  1. Pubby is a Linked Data Server. It can make the dataset Linked Data compliant. This means that when you put http://data.example.com/alice into a browser, you get back information about Alice, either in HTML or in RDF format, depending on the capabilities of your browser. This makes the information part of the Web, rather than allowing access to it only through SPARQL.

  2. Pubby is an RDF Browser. This means it creates a nice and simple web site that allows browsing through your RDF dataset. It puts a simple user interface over your dataset. You can explore the dataset by following the links between resources, for example, browse from Example, Inc. to Alice, or vice versa.
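To illustrate the first point, here is a minimal sketch of two requests for the same IRI that differ only in their Accept header. It uses Python's standard urllib. The helper name build_request and the media types chosen (text/html and text/turtle) are assumptions made for illustration, not something this page prescribes. The requests are built but not sent, because data.example.com is a placeholder host.

```python
from urllib.request import Request


def build_request(iri: str, accept: str) -> Request:
    """Build (but do not send) a GET request that asks for one media type.

    Hypothetical helper: the name and signature are illustrative only.
    """
    return Request(iri, headers={"Accept": accept})


# A web browser would typically ask for HTML ...
html_request = build_request("http://data.example.com/alice", "text/html")

# ... while an RDF-capable client would ask for an RDF serialization.
# text/turtle is an assumed example of such a media type.
rdf_request = build_request("http://data.example.com/alice", "text/turtle")
```

Both requests target the same IRI. Only the Accept header tells the server which representation the client prefers.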

Historically, the focus of Pubby has been to act as a Linked Data Server. It would only act as an RDF Browser for data that it also serves. Setting it up as an RDF Browser for other people's datasets (that is, for datasets whose IRIs are not on your own domains) was tricky, especially if the dataset contains data with IRIs on many different domains. However, we aim to make Pubby work better in the RDF Browser role.

Here, our goal will be to serve all the resources in the <http://data.example.com/> namespace (that is, Alice and Example, Inc.) as Linked Data and make them browsable at the same time.

Setting up the web base for a public Web server

The first question is: where will Pubby be running? On what server, what port, and with what root URL? To turn IRIs in the <http://data.example.com/> namespace into Linked Data, Pubby needs to run on the data.example.com server, on port 80, at the server root.

Here is the configuration to make that work. It announces where Pubby is running (conf:webBase), sets up our single dataset, and indicates that the dataset is accessed through a SPARQL endpoint.

@prefix conf: <http://richard.cyganiak.de/2007/pubby/config.rdf#>.

<> a conf:Configuration;
    conf:webBase <http://data.example.com/>;
    conf:dataset [
        conf:sparqlEndpoint <http://data.example.com/sparql>;
    ];
    .

The result is that you can now put http://data.example.com/alice or http://data.example.com/company into your browser and get a browsable HTML page (or RDF response, for RDF-capable clients).

Note that any IRIs outside of the conf:webBase namespace will not be served by Pubby, and are treated as simple links out into the Web. For example, when you click on the homepage URL (<http://example.com/>), or on the foaf:Person class (which corresponds to <http://xmlns.com/foaf/0.1/Person>), you will simply be taken to that location on the Web and out of the browsable Pubby graph.
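The distinction between served IRIs and outbound links can be sketched as a simple prefix check. This is an illustration of the rule described above, not Pubby's actual code; the function name is_served_by_pubby is hypothetical.

```python
# conf:webBase from the configuration above.
WEB_BASE = "http://data.example.com/"


def is_served_by_pubby(iri: str, web_base: str = WEB_BASE) -> bool:
    """Return True if the IRI falls inside the web base namespace.

    Hypothetical helper: IRIs outside the namespace are plain links
    out into the Web.
    """
    return iri.startswith(web_base)


for iri in (
    "http://data.example.com/alice",
    "http://example.com/",
    "http://xmlns.com/foaf/0.1/Person",
):
    kind = "served by Pubby" if is_served_by_pubby(iri) else "external link"
    print(f"{iri}: {kind}")
```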

Setting up web base and dataset base for a development server

Let's say you don't want to (or can't) run a server on data.example.com, but still want browsable Linked Data from our dataset, on a test server running on your own machine. For example, your Pubby may be running as the root web application of a local Tomcat server. Its base will then be <http://localhost:8080/>.

To do this, Pubby needs to rewrite the dataset's IRIs into local IRIs:

<http://localhost:8080/alice> a foaf:Person;
  foaf:name "Alice".

<http://localhost:8080/company> a foaf:Organization;
  foaf:name "Example, Inc.";
  foaf:member <http://localhost:8080/alice>;
  foaf:homepage <http://example.com/>.

The configuration for this:

@prefix conf: <http://richard.cyganiak.de/2007/pubby/config.rdf#>.

<> a conf:Configuration;
    conf:webBase <http://localhost:8080/>;
    conf:dataset [
        conf:datasetBase <http://data.example.com/>;
        conf:sparqlEndpoint <http://data.example.com/sparql>;
    ];
    .

The conf:datasetBase tells Pubby that it should rewrite IRIs in that dataset by replacing the dataset base with the conf:webBase. Note that the homepage URL, <http://example.com/>, was left unchanged because it's not under the conf:datasetBase.
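The rewriting rule can be sketched in a few lines of Python. This is a minimal illustration of the prefix replacement described above, not Pubby's implementation; the function name rewrite_iri is hypothetical.

```python
DATASET_BASE = "http://data.example.com/"  # conf:datasetBase
WEB_BASE = "http://localhost:8080/"        # conf:webBase


def rewrite_iri(iri: str, dataset_base: str = DATASET_BASE,
                web_base: str = WEB_BASE) -> str:
    """Replace the dataset base prefix with the web base (hypothetical helper).

    IRIs that are not under the dataset base are returned unchanged.
    """
    if iri.startswith(dataset_base):
        return web_base + iri[len(dataset_base):]
    return iri


# Rewritten, because it is under the dataset base.
alice = rewrite_iri("http://data.example.com/alice")

# Left unchanged, because it is not under the dataset base.
homepage = rewrite_iri("http://example.com/")
```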

The result is that you can now put http://localhost:8080/alice or http://localhost:8080/company into your browser and get a browsable HTML page (or RDF response, for RDF-capable clients). Of course, this only works on your local machine.

Adding site metadata

@@TODO

Showing prefixed names or labels instead of IRIs

@@TODO

Adding vocabulary labels and weights

@@TODO

Dealing with high-degree properties

@@TODO

Dealing with blank nodes

@@TODO

Working with multiple datasets

Here is a configuration that accesses data from two databases and a static RDF file:

@prefix conf: <http://richard.cyganiak.de/2007/pubby/config.rdf#>.

<> a conf:Configuration;
    conf:webBase <http://data.example.com/>;
    conf:dataset [
        conf:sparqlEndpoint <http://data.example.com/sparql-staff>;
    ];
    conf:dataset [
        conf:sparqlEndpoint <http://data.example.com/sparql-products>;
    ];
    conf:dataset [
        conf:loadRDF <file:///home/pubby/extra-data.ttl>;
    ];
    .

If the same IRI (let's say <http://data.example.com/alice>) is mentioned in all three datasets, then the Pubby page at that IRI will show information from all three sources.
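Conceptually, the page for an IRI is built from the union of what each dataset says about it. The sketch below models each dataset as a Python set of (subject, predicate, object) tuples. All triples in it are invented sample data, and the describe function is a hypothetical stand-in for however Pubby queries its sources. It also considers only triples with the IRI as subject, which is a simplification.

```python
ALICE = "http://data.example.com/alice"
FOAF = "http://xmlns.com/foaf/0.1/"

# Invented sample data standing in for the three configured datasets.
staff = {(ALICE, FOAF + "name", "Alice")}
products = {(ALICE, FOAF + "made", "http://data.example.com/widget")}
extra = {(ALICE, FOAF + "homepage", "http://example.com/alice")}


def describe(iri, *datasets):
    """Collect the triples about one IRI from every dataset (hypothetical).

    Simplification: only triples whose subject is the IRI are returned.
    """
    return {triple for dataset in datasets for triple in dataset
            if triple[0] == iri}


alice_page = describe(ALICE, staff, products, extra)
```

Because the result is a set, a triple that appears in more than one dataset is included only once.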

Pubby rewrites hashes and question marks

For various technical reasons, Pubby applies some extra processing when the dataset has IRIs that contain hashes (#) or question marks (?). Pubby sets up new Linked Data compliant IRIs in which these characters are %-escaped. So, if the dataset contains the following triples:

<http://data.example.com/people#alice> foaf:name "Alice".

<http://data.example.com/people?staffid=42> foaf:name "Alice".

Then Pubby's web pages will treat it as if it contained the following triples:

<http://data.example.com/people%23alice> foaf:name "Alice".

<http://data.example.com/people%3Fstaffid=42> foaf:name "Alice".

This rewriting only applies to IRIs served by Pubby, that is, IRIs under the conf:webBase or conf:datasetBase. An IRI such as <http://elsewhere.com/?q=xyz> would not be modified.
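The escaping rule can be sketched as follows. This is an illustration under the assumption that only the part of the IRI after the base is escaped; it is not Pubby's actual code, and the function name escape_served_iri is hypothetical.

```python
BASE = "http://data.example.com/"  # the conf:webBase (or conf:datasetBase)


def escape_served_iri(iri: str, base: str = BASE) -> str:
    """%-escape '#' and '?' in IRIs under the base (hypothetical helper).

    IRIs outside the base are returned unchanged.
    """
    if not iri.startswith(base):
        return iri
    local_part = iri[len(base):]
    return base + local_part.replace("#", "%23").replace("?", "%3F")


for iri in (
    "http://data.example.com/people#alice",
    "http://data.example.com/people?staffid=42",
    "http://elsewhere.com/?q=xyz",
):
    print(escape_served_iri(iri))
```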