Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 72 changed files with 4,530 additions and 1,523 deletions.

@@ -3,6 +3,7 @@ scratch
.idea
.vscode
MANIFEST
prof

#----


@@ -1,6 +1,102 @@
Versa
=====
# Versa

The Versa model for Web resources and relationships. Think of it as an evolution of Resource Description Framework (RDF)
that's at once simpler and more expressive.
Versa is a model for Web resources and relationships. It has a lot in common
with Resource Description Framework (RDF) and Property Graphs (PG). It is
a way to express and work with data on the Web, in direct terms of resources
and rich linking between these resources. This also makes it a good and
natural way to express Knowledge Graphs (KG).

This repository provides the specification as well as tools for using Versa in
practice, which also serve as reference implementations.

# Brief introduction to Versa

To get a simple idea of Versa, think about how you can express the relationship
between a Web page and its author in HTML5:

    <a href="http://uche.ogbuji.net" rel="author">Uche Ogbuji</a>

Let's say the page being described is `http://uche.ogbuji.net/ndewo/`.
Versa makes it easy to pull together all the components of this author link into a single construct that is easy to understand and manipulate:

    http://uche.ogbuji.net/ndewo/ author http://uche.ogbuji.net (caption="Uche Ogbuji")

In Versa this is called a link, and a link has four basic components: an
origin, a relationship, a target and a set of attributes. Link relationships
(also known as link types) are critical because they place links in context,
and Versa expects relationships to be IRIs so that the context (meaning, if you like)
is properly expressed and fully scoped. Since rel=author is defined in HTML5,
you can complete the above as follows (using a made-up IRI for the sake of example):

    http://uche.ogbuji.net/ndewo/ http://www.w3.org/TR/html5/link-type/author http://uche.ogbuji.net (caption="Uche Ogbuji")

You can express Versa links in JSON, for example:

    ["http://uche.ogbuji.net/ndewo/", "http://www.w3.org/TR/html5/link-type/author", "http://uche.ogbuji.net", {"caption": "Uche Ogbuji"}]
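
A minimal Python sketch of loading such a JSON-encoded link into its four components. The `Link` type and `parse_link` helper are hypothetical illustrations, not part of the Versa library's API. Because attributes are optional, a three-item array is treated as a link with an empty attribute set.

```python
import json
from typing import NamedTuple


class Link(NamedTuple):
    """One Versa link (hypothetical type): origin, relationship, target, attributes."""
    origin: str
    rel: str
    target: str
    attrs: dict


def parse_link(text: str) -> Link:
    """Parse a JSON array of 3 or 4 items into a Link (hypothetical helper)."""
    items = json.loads(text)
    if len(items) not in (3, 4):
        raise ValueError(f"expected 3 or 4 items, got {len(items)}")
    origin, rel, target = items[:3]
    # Attributes are optional; default to an empty dict when absent
    attrs = items[3] if len(items) == 4 else {}
    return Link(origin, rel, target, dict(attrs))


example = ('["http://uche.ogbuji.net/ndewo/", '
           '"http://www.w3.org/TR/html5/link-type/author", '
           '"http://uche.ogbuji.net", {"caption": "Uche Ogbuji"}]')
link = parse_link(example)
print(link.rel)                # http://www.w3.org/TR/html5/link-type/author
print(link.attrs["caption"])   # Uche Ogbuji
```

Using a named tuple keeps the positional JSON order while giving each component a readable name.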

Usually you think of links in groups, for example the many links from one page,
or all the various links across, out of and into a Web site. Versa is
designed for working with such collections of links. A collection of links
in Versa is called a linkset. Again, you can express a linkset in JSON:

    [
        ["http://uche.ogbuji.net/ndewo/", "http://www.w3.org/TR/html5/link-type/author", "http://uche.ogbuji.net", {"http://www.w3.org/TR/html5/link/caption": "Uche Ogbuji"}],
        ["http://uche.ogbuji.net/ndewo/", "http://www.w3.org/TR/html5/link-type/see-also", "http://www.goodreads.com/book/show/18714145-ndewo-colorado", {"http://www.w3.org/TR/html5/link/label": "Goodreads"}],
        ["http://uche.ogbuji.net/", "http://www.w3.org/TR/html5/link-type/see-also", "http://uche.ogbuji.net/ndewo/"]
    ]
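
Since a linkset is just a list of links, the most basic query is filtering by origin, relationship or target. A minimal sketch, assuming the JSON linkset above is held in a string; `match` is a hypothetical helper for illustration, not the Versa library's query API.

```python
import json

LINKSET_JSON = '''[
  ["http://uche.ogbuji.net/ndewo/", "http://www.w3.org/TR/html5/link-type/author", "http://uche.ogbuji.net", {"http://www.w3.org/TR/html5/link/caption": "Uche Ogbuji"}],
  ["http://uche.ogbuji.net/ndewo/", "http://www.w3.org/TR/html5/link-type/see-also", "http://www.goodreads.com/book/show/18714145-ndewo-colorado", {"http://www.w3.org/TR/html5/link/label": "Goodreads"}],
  ["http://uche.ogbuji.net/", "http://www.w3.org/TR/html5/link-type/see-also", "http://uche.ogbuji.net/ndewo/"]
]'''


def match(linkset, origin=None, rel=None, target=None):
    """Yield links whose given (non-None) components all match (hypothetical helper)."""
    for link in linkset:
        o, r, t = link[:3]
        if origin is not None and o != origin:
            continue
        if rel is not None and r != rel:
            continue
        if target is not None and t != target:
            continue
        yield link


linkset = json.loads(LINKSET_JSON)
SEE_ALSO = "http://www.w3.org/TR/html5/link-type/see-also"

# All see-also links, regardless of origin; *attrs absorbs the optional 4th item
for o, r, t, *attrs in match(linkset, rel=SEE_ALSO):
    print(o, "->", t)
```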

Notice that the third link has no attributes. Attributes are optional. I
invented a `see-also` relationship to represent a simple HTML link with no
`rel` attribute. The second link captures the idea of an HTML `alt`
attribute with a label attribute. In fact, HTML defines a number of
attributes which can be used with links, and you can add your own using XML
namespaces or HTML5 data attributes. This is why attributes are a core part
of a link in Versa. A Web link ties together multiple bits of information
in an extensible way, and attributes provide the extensibility, ensuring you
can work with all these bits of information as a unit.

If you think about data on the Web, links from one resource to another are
useful, but it's also useful to be able to express simple properties of a
resource. Versa supports this in the form of what's called a data link.
For example, you could capture the title and other metadata about a resource:

    ["http://uche.ogbuji.net/ndewo/", "http://www.w3.org/TR/html5/title", "Ndewo, Colorado"]

The target of a data link is not a Web resource but rather a simple piece of
information. Technically, in Versa syntax you should always explicitly signal
resources as IRIs. In JSON form this looks as follows:

    [
        ["<http://uche.ogbuji.net/ndewo/>", "<http://www.w3.org/TR/html5/link-type/author>", "<http://uche.ogbuji.net>", {"<http://www.w3.org/TR/html5/link/description>": "Uche Ogbuji"}],
        ["<http://uche.ogbuji.net/ndewo/>", "<http://www.w3.org/TR/html5/link-type/see-also>", "<http://www.goodreads.com/book/show/18714145-ndewo-colorado>", {"<http://www.w3.org/TR/html5/link/label>": "Goodreads"}],
        ["<http://uche.ogbuji.net/>", "<http://www.w3.org/TR/html5/link-type/see-also>", "<http://uche.ogbuji.net/ndewo/>"],
        ["<http://uche.ogbuji.net/ndewo/>", "<http://www.w3.org/TR/html5/title>", "Ndewo, Colorado"]
    ]

The angle brackets signal to Versa what should be treated as an IRI.
Versa origins and relationships are always IRIs, so you can omit the angle
brackets in those cases:

    [
        ["http://uche.ogbuji.net/ndewo/", "http://www.w3.org/TR/html5/link-type/author", "<http://uche.ogbuji.net>", {"<http://www.w3.org/TR/html5/link/description>": "Uche Ogbuji"}],
        ["http://uche.ogbuji.net/ndewo/", "http://www.w3.org/TR/html5/link-type/see-also", "<http://www.goodreads.com/book/show/18714145-ndewo-colorado>", {"<http://www.w3.org/TR/html5/link/label>": "Goodreads"}],
        ["http://uche.ogbuji.net/", "http://www.w3.org/TR/html5/link-type/see-also", "<http://uche.ogbuji.net/ndewo/>"],
        ["http://uche.ogbuji.net/ndewo/", "http://www.w3.org/TR/html5/title", "Ndewo, Colorado"]
    ]
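
The angle-bracket convention can be applied mechanically: origins and relationships are always IRIs, while a target or attribute value is an IRI only when wrapped in `<...>`, and is otherwise a plain data value. A minimal sketch of that rule; `Iri`, `read_value`, `read_iri` and `normalize_link` are hypothetical names, and the real Versa tools may represent IRIs differently.

```python
from typing import Union


class Iri(str):
    """Marker type for values that denote Web resources (hypothetical)."""


def read_value(value: str) -> Union[Iri, str]:
    """Return an Iri for '<...>' values; anything else is a plain data value."""
    if len(value) >= 2 and value.startswith("<") and value.endswith(">"):
        return Iri(value[1:-1])
    return value


def read_iri(value: str) -> Iri:
    """Origins and relationships are always IRIs, so brackets are optional."""
    return Iri(read_value(value))


def normalize_link(link: list) -> tuple:
    """Split a JSON link into (origin, rel, target, attrs) with IRIs marked."""
    origin, rel, target = link[:3]
    attrs = link[3] if len(link) > 3 else {}
    return (read_iri(origin), read_iri(rel), read_value(target),
            {read_value(k): read_value(v) for k, v in attrs.items()})


resource_link = normalize_link(
    ["http://uche.ogbuji.net/", "http://www.w3.org/TR/html5/link-type/see-also",
     "<http://uche.ogbuji.net/ndewo/>"])
data_link = normalize_link(
    ["http://uche.ogbuji.net/ndewo/", "http://www.w3.org/TR/html5/title",
     "Ndewo, Colorado"])

print(isinstance(resource_link[2], Iri))  # True: the target is a Web resource
print(isinstance(data_link[2], Iri))      # False: the target is a data value
```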

All Versa data link targets are represented as strings, but they can be
interpreted as, e.g., numbers, dates or other data types. Attributes are
useful for signaling such interpretation:

    ["http://uche.ogbuji.net/ndewo/", "http://www.w3.org/TR/html5/created", "2013-09-01", {"<@type>": "<@datetime>"}]

Notice the syntax used in the attribute. Versa provides some common data
modeling primitives, such as a way to express the interpreted type of a data
link target. `@type` is just a convenient abbreviation for referring
to this Versa built-in concept. You can write out this link in full as follows:

    ["http://uche.ogbuji.net/ndewo/", "http://www.w3.org/TR/html5/created", "2013-09-01", {"<http://purl.org/versa/type>": "<http://purl.org/versa/datetime>"}]
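
The abbreviation amounts to a prefix expansion: `@name` stands for `http://purl.org/versa/name`. A minimal sketch that expands abbreviations and then uses the type attribute to interpret a data link target. The `expand` and `interpret` helpers, and the choice of `datetime.date.fromisoformat` as the interpreter for `datetime` (which only handles date-only values), are assumptions for illustration.

```python
import datetime

VERSA_BASE = "http://purl.org/versa/"


def expand(value: str) -> str:
    """Strip angle brackets and expand '@name' to the full Versa IRI."""
    inner = value[1:-1] if value.startswith("<") and value.endswith(">") else value
    if inner.startswith("@"):
        return VERSA_BASE + inner[1:]
    return inner


# Hypothetical interpreters keyed by full type IRI (assumption: date-only values)
INTERPRETERS = {
    VERSA_BASE + "datetime": datetime.date.fromisoformat,
}


def interpret(link):
    """Return the data link target, converted according to its type attribute if known."""
    target = link[2]
    attrs = {expand(k): expand(v) for k, v in (link[3] if len(link) > 3 else {}).items()}
    convert = INTERPRETERS.get(attrs.get(VERSA_BASE + "type"))
    return convert(target) if convert else target


short = ["http://uche.ogbuji.net/ndewo/", "http://www.w3.org/TR/html5/created",
         "2013-09-01", {"<@type>": "<@datetime>"}]
full = ["http://uche.ogbuji.net/ndewo/", "http://www.w3.org/TR/html5/created",
        "2013-09-01", {"<http://purl.org/versa/type>": "<http://purl.org/versa/datetime>"}]

print(interpret(short))                     # 2013-09-01
print(interpret(short) == interpret(full))  # True: both spellings mean the same thing
```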

# Developer notes

Docstring style: [Google](https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings) + Markdown

@@ -0,0 +1,4 @@
Title,Author,Author date,ISBN,Publisher,Pub date
Half of a Yellow Sun,Chimamanda Ngozi Adichie,1977,9780008205249,Fourth Estate,2006
Things Fall Apart,Chinụalụmọgụ Achebe,1930,9781841593272,William Heinemann Ltd.,1958
"Death and the King's Horseman ",Olúwolé Sóyíinká,1934,9780413333506,Eyre Methuen,1975
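
The pipeline script in this commit fills a Versa literate template with placeholders such as `{Author_date}`, while this CSV header says `Author date`. A minimal sketch, using Python's standard `csv` module, of one plausible way to bridge the two: replace spaces in column names with underscores and strip stray whitespace (note the trailing space in the third title). This is an assumption about what such a mapping needs to do, not a description of how `versa.serial.csv.parse_iter` works internally; `TEMPLATE` and `template_fields` are hypothetical.

```python
import csv
import io

BOOKS_CSV = """Title,Author,Author date,ISBN,Publisher,Pub date
Half of a Yellow Sun,Chimamanda Ngozi Adichie,1977,9780008205249,Fourth Estate,2006
Things Fall Apart,Chinụalụmọgụ Achebe,1930,9781841593272,William Heinemann Ltd.,1958
"Death and the King's Horseman ",Olúwolé Sóyíinká,1934,9780413333506,Eyre Methuen,1975
"""

# A cut-down template in the same style as the script's literate template (hypothetical)
TEMPLATE = "# /{ISBN} [Book]\n* title: {Title}\n* author:\n    * date: {Author_date}\n"


def template_fields(row: dict) -> dict:
    """Turn a CSV row into str.format() keyword arguments."""
    return {key.strip().replace(" ", "_"): value.strip() for key, value in row.items()}


for row in csv.DictReader(io.StringIO(BOOKS_CSV)):
    print(TEMPLATE.format(**template_fields(row)))
```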

@@ -0,0 +1,14 @@
[
    {"title": "Half of a Yellow Sun",
     "author": {"name": "Chimamanda Ngozi Adichie", "date": "1977"},
     "publication": {"name": "Fourth Estate", "date": "2006"},
     "isbn": "9780008205249"},
    {"title": "Things Fall Apart",
     "author": {"name": "Chinụalụmọgụ Achebe", "date": "1930"},
     "publication": {"name": "William Heinemann Ltd.", "date": "1958"},
     "isbn": "9781841593272"},
    {"title": "Death and the King's Horseman",
     "author": {"name": "Olúwolé Sóyíinká", "date": "1934"},
     "publication": {"name": "Eyre Methuen", "date": "1975"},
     "isbn": "9780413333506"}
]
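
These nested JSON records carry the same information as the CSV rows. A minimal sketch of flattening one record into Versa-style `[origin, rel, target, attrs]` links, mirroring the literate template in the pipeline script: nested `name` and `date` values become attributes of the `author` and `publisher` links, and the origin IRI is the `https://example.org/` base plus the ISBN. Assumptions: relationship and attribute names resolve against `http://example.org/vocab/` (as `IMPLICIT_NS` does in the script), the JSON `publication` key corresponds to the template's `publisher`, and the empty target on nested links is a placeholder. `book_to_links` is hypothetical.

```python
import json

BASE = "https://example.org/"
VOCAB = "http://example.org/vocab/"

RECORD = json.loads('''{"title": "Half of a Yellow Sun",
  "author": {"name": "Chimamanda Ngozi Adichie", "date": "1977"},
  "publication": {"name": "Fourth Estate", "date": "2006"},
  "isbn": "9780008205249"}''')


def book_to_links(record: dict) -> list:
    """Flatten one book record into [origin, rel, target, attrs] links (hypothetical)."""
    origin = BASE + record["isbn"]
    links = [
        [origin, VOCAB + "title", record["title"], {}],
        [origin, VOCAB + "identifier", record["isbn"], {VOCAB + "type": "isbn"}],
    ]
    # Nested objects become attributes on a link; the empty target is an assumption
    for key, rel in (("author", "author"), ("publication", "publisher")):
        nested = record[key]
        links.append([origin, VOCAB + rel, "",
                      {VOCAB + "name": nested["name"], VOCAB + "date": nested["date"]}])
    return links


for link in book_to_links(RECORD):
    print(link)
```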

@@ -0,0 +1,226 @@
#!/usr/bin/env python
#-*- mode: python -*-
# csv_to_bibframe.py

'''
Demo of Versa Pipeline. Converts a CSV with book info into BIBFRAME Lite.
You might first want to be familiar with dc_to_schemaorg.py
and csv_to_schemaorg.py

Usage: python demo/csv_to_bibframe.py demo/books.csv

BIBFRAME Lite: http://bibfra.me/
'''

import sys
import random
import warnings
import functools
from pathlib import Path

import click  # Cmdline processing tool. pip install click

from amara3 import iri

from versa import ORIGIN, RELATIONSHIP, TARGET
from versa import I, VERSA_BASEIRI, VTYPE_REL, VLABEL_REL
from versa import util
from versa.driver.memory import newmodel
from versa.serial import csv, literate, mermaid
from versa.pipeline import *
from versa.contrib.datachefids import idgen as default_idgen

BOOK_NS = I('https://example.org/')
IMPLICIT_NS = I('http://example.org/vocab/')
BF_NS = I('http://bibfra.me/')


FINGERPRINT_RULES = {
    # Fingerprint the input Book by ISBN; the output resource will be a BIBFRAME Instance

    # Outermost parens here are not really needed, used for formatting.
    # You can use an actual tuple here, though, to trigger multiple
    # rules per matched type
    IMPLICIT_NS('Book'): (
        materialize(BF_NS('Instance'),
            fprint=[
                (BF_NS('isbn'), follow(IMPLICIT_NS('identifier'))),
            ],
            links=[
                (BF_NS('provenance'), var('provenance')),
                (BF_NS('instantiates'),
                    materialize(BF_NS('Work'),
                        fprint=[
                            (BF_NS('name'), follow(IMPLICIT_NS('title'))),
                        ],
                    ),
                )
            ]
        )
    )
}


# Data transformation rules. In general this is some sort of link from an
# input pattern being matched to output generated by Versa pipeline actions

# In this case we use a dict whose keys are expected relationships from
# fingerprinted resources, and whose values are the action functions that
# update the output model by acting on the provided context (in this case
# just the triggered relationship in the input model)

# Work & instance types
WT = BF_NS('Work')
IT = BF_NS('Instance')


DC_TO_SCH_RULES = {
    # Rules that are the same regardless of matched output resource type
    IMPLICIT_NS('title'): link(rel=BF_NS('name')),

    # Rules differentiated by matched output resource type
    (IMPLICIT_NS('author'), WT): materialize(BF_NS('Person'),
        BF_NS('creator'),
        fprint=[
            (BF_NS('name'), attr(IMPLICIT_NS('name'))),
            (BF_NS('birthDate'), attr(IMPLICIT_NS('date'))),
        ],
        links=[
            (BF_NS('name'), attr(IMPLICIT_NS('name'))),
            (BF_NS('birthDate'), attr(IMPLICIT_NS('date'))),
        ]
    ),
}


LABELIZE_RULES = {
    # Labels come from input model's DC name rels
    BF_NS('Book'): follow(BF_NS('name'))
}


# Just use Python's built-in str.format()
# Could also use e.g. Jinja
VLITERATE_TEMPLATE = '''\
# @docheader
* @iri:
    * @base: https://example.org/
    * @schema: http://example.org/vocab/
# /{ISBN} [Book]
* title: {Title}
* author:
    * name: {Author}
    * date: {Author_date}
* publisher:
    * name: {Publisher}
    * date: {Pub_date}
* identifier: {ISBN}
    * type: isbn
'''


class csv_bibframe_pipeline(definition):
    def __init__(self):
        '''
        csv_bibframe_pipeline initializer
        '''
        self._provenance = I('http://example.com/SOME_CSV_FILE')
        super().__init__()

    @stage(1)
    def fingerprint(self):
        '''
        Generates fingerprints from the source model
        Result of the fingerprinting phase is that the output model shows
        the presence of each resource of primary interest expected to result
        from the transformation, with minimal detail such as the resource type
        '''
        # Prepare a root context
        ctx_vars = {'provenance': self._provenance}
        ctx = DUMMY_CONTEXT.copy(variables=ctx_vars)

        # Apply a common fingerprinting strategy using rules defined above
        new_rids = self.fingerprint_helper(FINGERPRINT_RULES, root_context=ctx)

        # In real code the following lines could be simplified to: return bool(new_rids)
        if not new_rids:
            # Nothing found to process, so ret val set to False
            # This will abort pipeline processing of this input & move on to the next, if any
            return False

        # ret val True so pipeline run will continue for this input
        return True


    @stage(2)
    def main_transform(self):
        '''
        Executes the main transform rules to go from input to output model
        '''
        # Apply a common transform strategy using rules defined above
        #
        def missed_rel(link):
            '''
            Callback to handle cases where a transform wasn't found to match a link (by relationship) in the input model
            '''
            warnings.warn(f'Unknown, so unhandled link. Origin: {link[ORIGIN]}. Rel: {link[RELATIONSHIP]}')

        new_rids = self.transform_by_rel_helper(DC_TO_SCH_RULES, handle_misses=missed_rel)
        return True


    @stage(3)
    def labelize(self):
        '''
        Executes a utility rule to create labels in output model for new (fingerprinted) resources
        '''
        # XXX Check if there's already a label?
        # Apply a common transform strategy using rules defined above
        def missed_label(origin, type):
            '''
            Callback to handle cases where no label could be generated for an output resource
            '''
            warnings.warn(f'No label generated for: {origin}')
        labels = self.labelize_helper(LABELIZE_RULES, handle_misses=missed_label)
        return True


@click.command()
@click.argument('source')
def main(source):
    'Transform CSV SOURCE file to BF Lite in Versa'
    ppl = csv_bibframe_pipeline()
    input_model = newmodel()
    # The sample data includes non-ASCII names, so read the file as UTF-8
    with open(source, encoding='utf-8') as csvfp:
        for row_model in csv.parse_iter(csvfp, VLITERATE_TEMPLATE):
            if row_model:
                input_model.update(row_model)

    # Debug print of input model
    # literate.write([input_model], out=sys.stdout)
    output_model = ppl.run(input_model=input_model)
    print('Low level JSON dump of output data model: ')
    util.jsondump(output_model, sys.stdout)
    print('\n')  # 2 newlines
    print('Versa literate form of output: ')
    literate.write(output_model, out=sys.stdout)

    print('Diagram from an extracted sample: ')
    out_resources = []
    for vs in ppl.fingerprints.values():
        out_resources.extend(vs)
    ITYPE = BF_NS('Instance')
    instances = [ r for r in out_resources if ITYPE in util.resourcetypes(output_model, r) ]
    zoomed, _ = util.zoom_in(output_model, random.choice(instances), depth=2)
    mermaid.write(zoomed)
    # literate.write(zoomed)


if __name__ == '__main__':
    main()
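
The fingerprint stage rests on the idea that an output resource's ID is derived from its type plus a few key values (the ISBN for an Instance, the title for a Work), so the same book seen twice maps to the same resource. A minimal sketch of that idea using a SHA-1 hash from Python's standard library. It illustrates the technique only; it is not the algorithm used by `versa.contrib.datachefids.idgen`, and `fingerprint_id` is a hypothetical name.

```python
import hashlib

BF = "http://bibfra.me/"
BASE = "https://example.org/"


def fingerprint_id(rtype: str, keys: list) -> str:
    """Derive a stable resource IRI from a type and (property, value) pairs."""
    # Sort so that the order of key properties does not change the ID
    material = "|".join([rtype] + [f"{p}={v}" for p, v in sorted(keys)])
    digest = hashlib.sha1(material.encode("utf-8")).hexdigest()[:12]
    return BASE + digest


rows = [
    {"isbn": "9780008205249", "title": "Half of a Yellow Sun"},
    {"isbn": "9781841593272", "title": "Things Fall Apart"},
    {"isbn": "9780008205249", "title": "Half of a Yellow Sun"},  # duplicate row
]

instances = {}
for row in rows:
    iid = fingerprint_id(BF + "Instance", [(BF + "isbn", row["isbn"])])
    wid = fingerprint_id(BF + "Work", [(BF + "name", row["title"])])
    # The duplicate row yields the same Instance ID, so it adds nothing new
    instances.setdefault(iid, wid)

print(len(instances))  # 2
```

Including the type in the hashed material keeps a Work and an Instance with coincidentally equal key values from colliding.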