Karthick Sankarachary http://github.com/karthicks
The gremlin-objects module defines a library that puts an object-oriented spin on the Gremlin property graph.
It aims to make it much easier to specify business domain-specific languages around Gremlin, without any loss of expressive power.
While it targets the Gremlin-Java variant, the concept itself is language-independent.
Every element in the property graph, whether it be a vertex (property) or an edge, is made up of properties.
Each such property is a String key and an arbitrary Java value.
It only seems fitting then to try and represent that property as a strongly-typed Java field.
The specific class in which that field is defined then becomes the vertex (property) or edge, which the property describes.
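The idea of treating each field as a property can be sketched with plain Java reflection. The snippet below is only an illustration under stated assumptions: the Book class and the PropertySketch helper are hypothetical names invented here, not part of gremlin-objects, and the library's actual mapping may well differ.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical element class, invented for this sketch.
class Book {
    private final String title;
    private final int pages;

    Book(String title, int pages) {
        this.title = title;
        this.pages = pages;
    }
}

class PropertySketch {
    // Collects each non-static, non-null field as a (String key, Java value) pair.
    static Map<String, Object> propertiesOf(Object element) {
        Map<String, Object> properties = new LinkedHashMap<>();
        for (Field field : element.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue; // static and compiler-generated fields are not properties
            }
            field.setAccessible(true);
            try {
                Object value = field.get(element);
                if (value != null) {
                    properties.put(field.getName(), value);
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read " + field.getName(), e);
            }
        }
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(propertiesOf(new Book("Graph Basics", 120)));
    }
}
```

The field name plays the role of the property key and the field value that of the property value; the declaring class stands in for the element itself.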
A gremlin object model such as this would need abstractions to query and update the graph in terms of those objects.
To get the library that facilitates all of this, add this dependency to your pom.xml:
<dependency>
    <groupId>com.github.karthicks</groupId>
    <artifactId>gremlin-objects</artifactId>
    <version>3.3.1-RC1</version>
</dependency>
A reference use case of this library is available in the following tinkergraph-test module:
<dependency>
    <groupId>com.github.karthicks</groupId>
    <artifactId>tinkergraph-test</artifactId>
    <version>3.3.1-RC1</version>
</dependency>
In this section, we go over how gremlin elements may be modeled, and how those models may be queried and stored.
Let’s consider the example of the person vertex, taken from the "modern" and "the crew" graphs defined in the TinkerFactory.
In our object world, it would be defined as a Person class that extends Vertex.
By default, the vertex’s label matches its simple class name, hence we have to un-capitalize it using the @Alias annotation.
The person’s name and age properties become primitive fields in the class.
The @PrimaryKey and @OrderingKey annotations on them not only indicate that they are mandatory, but also allow the person to be found easily through the HasKeys.of(person) SubTraversal.
Think of the SubTraversal as a reusable function that takes a GraphTraversal, performs a few steps on it, and returns it (to allow for chaining).
The KnowsPeople field in this class is an example of an in-line SubTraversal, albeit a stronger-typed version of it called ToVertex, to indicate that it ends up selecting vertices.
Note that these traversal functions are not stored in the graph.
@Data
@Alias(label = "person")
public class Person extends Vertex {

    public static ToVertex KnowsPeople = traversal -> traversal
        .out(Label.of(Knows.class))
        .hasLabel(Label.of(Person.class));

    @PrimaryKey
    private String name;

    @OrderingKey
    private int age;

    private Set<String> titles;

    private List<Location> locations;
}
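To make the "reusable function" reading of a SubTraversal concrete, here is a minimal, self-contained sketch. It does not use TinkerPop or gremlin-objects: RecordingTraversal is a hypothetical stand-in for GraphTraversal that merely records step names, and PartialTraversal is a hypothetical analogue of SubTraversal. The library's real interfaces may be shaped differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for GraphTraversal: it merely records the steps applied to it.
class RecordingTraversal {
    final List<String> steps = new ArrayList<>();

    RecordingTraversal out(String label) {
        steps.add("out(" + label + ")");
        return this;
    }

    RecordingTraversal hasLabel(String label) {
        steps.add("hasLabel(" + label + ")");
        return this;
    }

    RecordingTraversal has(String key, Object value) {
        steps.add("has(" + key + ", " + value + ")");
        return this;
    }
}

// Hypothetical analogue of SubTraversal: takes a traversal, adds steps, returns it.
@FunctionalInterface
interface PartialTraversal {
    RecordingTraversal apply(RecordingTraversal traversal);

    // Chains this partial traversal with the next one.
    default PartialTraversal then(PartialTraversal next) {
        return traversal -> next.apply(apply(traversal));
    }
}

class PartialTraversalDemo {
    static final PartialTraversal IS_MARKO = t -> t.hasLabel("person").has("name", "marko");
    static final PartialTraversal KNOWS_PEOPLE = t -> t.out("knows").hasLabel("person");

    // Applies both partial traversals, in order, to a fresh traversal.
    static List<String> friendsOfMarko() {
        return IS_MARKO.then(KNOWS_PEOPLE).apply(new RecordingTraversal()).steps;
    }

    public static void main(String[] args) {
        System.out.println(friendsOfMarko());
    }
}
```

Because each function returns the traversal it was given, the pieces compose in any order, which is what lets a traversal such as KnowsPeople live next to the model it navigates.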
Next, we look at the titles field of Person, which is defined to be a Set.
As you might expect, the cardinality of the underlying property becomes set.
Similarly, the locations field takes on the list cardinality.
Further, each element in the locations list has its own meta-properties, and ergo deserves a Location class of its own.
@Data
@Alias(label = "location")
public class Location extends Element {

    @OrderingKey
    @PropertyValue
    private String name;

    @OrderingKey
    private Instant startTime;

    private Instant endTime;
}
Note: The value of the location is stored in name, due to the placement of the @PropertyValue annotation.
Every other field in the Location class becomes a meta-property of the location.
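The rule that a Set field implies set cardinality and a List field implies list cardinality can be sketched as a small type check. This is an illustration only: Profile and CardinalitySketch are hypothetical names, and the enum merely mirrors the single, list and set cardinalities that TinkerPop defines for vertex properties.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model class, invented for this sketch.
class Profile {
    String name;
    Set<String> titles;
    List<String> locations;
}

class CardinalitySketch {
    // Names mirror the single, list and set cardinalities of TinkerPop vertex properties.
    enum Cardinality { SINGLE, LIST, SET }

    // Derives the cardinality from the declared type of a field.
    static Cardinality of(Class<?> fieldType) {
        if (Set.class.isAssignableFrom(fieldType)) {
            return Cardinality.SET;
        }
        if (List.class.isAssignableFrom(fieldType)) {
            return Cardinality.LIST;
        }
        return Cardinality.SINGLE;
    }

    // Convenience lookup by field name on a given class.
    static Cardinality ofField(Class<?> owner, String fieldName) {
        try {
            return of(owner.getDeclaredField(fieldName).getType());
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No field " + fieldName + " on " + owner.getName(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(ofField(Profile.class, "titles"));
        System.out.println(ofField(Profile.class, "locations"));
        System.out.println(ofField(Profile.class, "name"));
    }
}
```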
An edge is defined much like the vertex, except it extends the Edge class.
By default, an edge’s label is its un-capitalized simple class name, and hence no @Alias is needed:
@Data
public class Knows extends Edge {

    private Double weight;

    private Instant since;
}
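The default-label rules described above (simple class name for a vertex, un-capitalized simple class name for an edge, with an alias taking precedence) can be sketched as follows. Everything here is a stand-in written for illustration: AliasSketch, VertexSketch, EdgeSketch and LabelSketch are hypothetical and do not reproduce the library's own @Alias, Vertex, Edge or Label types.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical stand-in for the @Alias annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface AliasSketch {
    String label();
}

// Hypothetical stand-ins for the Vertex and Edge base classes.
abstract class VertexSketch {}
abstract class EdgeSketch {}

@AliasSketch(label = "person")
class PersonSketch extends VertexSketch {}

class Software extends VertexSketch {}

class Knows extends EdgeSketch {}

class LabelSketch {
    // Resolves the label: alias first, then the per-kind default.
    static String of(Class<?> type) {
        AliasSketch alias = type.getAnnotation(AliasSketch.class);
        if (alias != null) {
            return alias.label();
        }
        String name = type.getSimpleName();
        if (EdgeSketch.class.isAssignableFrom(type)) {
            // Edges default to the un-capitalized simple class name.
            return Character.toLowerCase(name.charAt(0)) + name.substring(1);
        }
        // Vertices default to the simple class name as-is.
        return name;
    }

    public static void main(String[] args) {
        System.out.println(of(PersonSketch.class));
        System.out.println(of(Software.class));
        System.out.println(of(Knows.class));
    }
}
```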
The Graph interface lets you update the graph using Vertex or Edge objects.
You can get it via dependency injection, assuming you have an Object provider for GraphTraversalSource:
@Inject @Object
private Graph graph;
Or, the good old fashioned way, using the GraphFactory:
private GraphFactory graphFactory =
    GraphFactory.of(TinkerGraph.open().traversal()); // This gets you the factory for TinkerGraph.
private Graph graph = graphFactory.graph();
Now that we know how to obtain a Graph instance, let’s see how to change it using Java objects.
Here, we create software vertices for tinkergraph and gremlin, and add a traverses edge from gremlin to tinkergraph.
graph
    .addVertex(Software.of("tinkergraph")).as("tinkergraph")
    .addVertex(Software.of("gremlin")).as("gremlin")
    .addEdge(Traverses.of(), "tinkergraph");
Below, a person vertex containing a list of locations is added, along with three outgoing edges.
graph
    .addVertex(
        Person.of("marko",
            Location.of("san diego", 1997, 2001),
            Location.of("santa cruz", 2001, 2004),
            Location.of("brussels", 2004, 2005),
            Location.of("santa fe", 2005))).as("marko")
    .addEdge(Develops.of(2010), "tinkergraph")
    .addEdge(Uses.of(Proficient), "gremlin")
    .addEdge(Uses.of(Expert), "tinkergraph");
To see how the modern and the crew reference graphs may be created using the object Graph interface, go here.
Tip: Since the object being added may already exist in the graph, we provide various options to resolve "merge conflicts", such as MERGE, REPLACE, CREATE, IGNORE and INSERT.
There are two ways to get a handle to the Query interface. You can inject it like so:
@Inject @Object
private Query query;
Otherwise, you can create it using the GraphFactory like so:
private GraphFactory graphFactory = GraphFactory.of(TinkerGraph.open().traversal());
private Query query = graphFactory.query();
Next, let’s see how to use the Query interface.
The following snippet queries the graph by chaining two SubTraversals (functions denoting partial traversals), and parses the result into a list of Person vertices.
List<Person> friends = query
    .by(HasKeys.of(modern.marko), Person.KnowsPeople)
    .list(Person.class);
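Parsing a traversal result into a typed object can be pictured as the reverse of reading fields: each property key is matched to a field of the same name and assigned reflectively. The sketch below is a simplified illustration under that assumption; Friend and ParseSketch are hypothetical names, the sample values are illustrative, and the library's actual parsing logic is not shown in this document.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical target class, invented for this sketch.
class Friend {
    private String name;
    private int age;

    String name() { return name; }
    int age() { return age; }
}

class ParseSketch {
    // Instantiates the type and copies each property into the field of the same name.
    static <T> T parse(Map<String, Object> properties, Class<T> type) {
        try {
            Constructor<T> constructor = type.getDeclaredConstructor();
            constructor.setAccessible(true);
            T instance = constructor.newInstance();
            for (Map.Entry<String, Object> property : properties.entrySet()) {
                Field field = type.getDeclaredField(property.getKey());
                field.setAccessible(true);
                field.set(instance, property.getValue()); // boxed values are unwrapped for primitive fields
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot parse into " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> properties = new java.util.HashMap<>();
        properties.put("name", "vadas");
        properties.put("age", 27);
        Friend friend = parse(properties, Friend.class);
        System.out.println(friend.name() + " " + friend.age());
    }
}
```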
Below, we query by an AnyTraversal (a function on the GraphTraversalSource), and get a single Person back.
Person marko = Person.of("marko");
Person actual = query
    .by(g -> g.V().hasLabel(marko.label()).has("name", marko.name()))
    .one(Person.class);
The type of the result may be primitives too, and that is handled as shown below.
long count = query
    .by(HasKeys.of(crew.marko), Count.of())
    .one(Long.class);
Last, we show a traversal involving select steps, which requires special handling as it may return a map.
Selections selections = query
    .by(g -> g.V().as("a").
        properties("locations").as("b").
        hasNot("endTime").as("c").
        order().by("startTime").
        select("a", "b", "c").by("name").by(T.value).by("startTime").dedup())
    .as("a", String.class)
    .as("b", String.class)
    .as("c", Instant.class)
    .select();
To see more examples showcasing how the object Query interface may be used, go here.
In this section, we talk about how the gremlin-objects library can be customized for a graph system provider.
A provider that wishes to plug into gremlin-objects through dependency injection will need to provide a GraphTraversalSource of its choice, through the Object qualifier.
Users that don’t use dependency injection may manually pass the GraphTraversalSource to the GraphFactory.
Typically, gremlin property values are Java primitives.
Sometimes, a provider treats a custom type as a primitive.
For instance, DataStax lets you define property keys of the primitive geometric type Point.
Such types can be registered using the Primitives#registerPrimitiveClass methods.
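Registering a custom primitive can be thought of as adding a class to a set of types that are stored as-is rather than decomposed into properties. The sketch below illustrates that idea only: PrimitiveRegistrySketch and PointSketch are hypothetical, and the signature of the real Primitives#registerPrimitiveClass methods is not assumed here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical provider-specific value type, invented for this sketch.
class PointSketch {
    final double x;
    final double y;

    PointSketch(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

// Hypothetical registry of types that are treated as primitive property values.
class PrimitiveRegistrySketch {
    private final Set<Class<?>> primitives = new HashSet<>(Arrays.asList(
        String.class, Boolean.class, Integer.class, Long.class, Double.class));

    // Marks an additional class as a primitive value type.
    void register(Class<?> type) {
        primitives.add(type);
    }

    boolean isPrimitive(Class<?> type) {
        return type.isPrimitive() || primitives.contains(type);
    }

    public static void main(String[] args) {
        PrimitiveRegistrySketch registry = new PrimitiveRegistrySketch();
        System.out.println(registry.isPrimitive(PointSketch.class));
        registry.register(PointSketch.class);
        System.out.println(registry.isPrimitive(PointSketch.class));
    }
}
```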
When a GraphTraversal is completed, it usually returns (a list of) gremlin Element(s).
However, when some providers execute a traversal, the result comprises custom element types.
For instance, when DataStax executes a graph query, it returns a result set made up of GraphNode(s), a proprietary element type.
We give such providers a way to tell us how to parse such custom elements using the Parsers#registerElementParser method.
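One way to picture such a parser hook is a registry that maps a provider's result type to a function producing a neutral property map. The following is a sketch under that assumption: ProviderNode and ParserRegistrySketch are hypothetical names, and the actual signature of Parsers#registerElementParser may be quite different.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical proprietary result type, invented for this sketch.
class ProviderNode {
    final Map<String, Object> fields;

    ProviderNode(Map<String, Object> fields) {
        this.fields = fields;
    }
}

// Hypothetical registry that maps custom result types to parsing functions.
class ParserRegistrySketch {
    private final Map<Class<?>, Function<Object, Map<String, Object>>> parsers = new LinkedHashMap<>();

    // Registers a parser for results of the given type.
    <T> void register(Class<T> type, Function<T, Map<String, Object>> parser) {
        parsers.put(type, raw -> parser.apply(type.cast(raw)));
    }

    // Finds the first parser whose type matches the raw result and applies it.
    Map<String, Object> parse(Object raw) {
        for (Map.Entry<Class<?>, Function<Object, Map<String, Object>>> entry : parsers.entrySet()) {
            if (entry.getKey().isInstance(raw)) {
                return entry.getValue().apply(raw);
            }
        }
        throw new IllegalArgumentException("No parser registered for " + raw.getClass().getName());
    }

    public static void main(String[] args) {
        ParserRegistrySketch registry = new ParserRegistrySketch();
        registry.register(ProviderNode.class, node -> node.fields);
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("name", "marko");
        System.out.println(registry.parse(new ProviderNode(fields)));
    }
}
```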
While there exist similar OGM libraries, this one has some key differentiating factors. Now, let’s consider the alternatives:
The gremlin-core module defines a GremlinDsl annotation that lets you define custom traversals by extending the GraphTraversal and GraphTraversalSource.
However, it requires some familiarity with gremlin-core internals.
Peopod represents elements as annotated interfaces or abstract classes. While it generates boilerplate for traversals to adjacent vertices, it doesn’t let you co-locate arbitrary traversals. This library is less intrusive and more flexible.
An older version of TinkerPop allowed you to define custom steps using Closures, not unlike the AnyTraversal and SubTraversal functions.
However, they aren’t as developer friendly as the functional interfaces provided here.
Moreover, they don’t allow for co-locating the traversal logic along with the element model, as we do here.
So far, we have the gremlin-objects library, and a tinkergraph-test reference use case for it.
Here, we list a few directions in which we see the library evolving:
The concept of lifting the property graph into objects is language-independent.
To quote the TinkerPop docs, "with JSR-223, any language compiler written for the JVM can directly access the JVM and any of its libraries", and that would include gremlin-objects.
For GLVs not written for the JVM, the library can be ported over as long as the host language supports basic reflection.
Case in point, the Gremlin-Python variant could achieve the object mapping through the dir, getattr and setattr built-in functions.
In reality, it is fairly easy for a provider to plug into gremlin-objects simply by supplying a GraphTraversalSource of their choosing.
The ability to register custom primitive types and traversal result parsers allows for further customization.
Since neo4j already has its own Neo4jGraph, it’s a good candidate to become the next test case.
Some providers use GraphFrames to execute bulk operations and graph algorithms on top of TinkerPop.
Assuming they can work with DataFrames, one could build a GraphTraversalSource, which translates the object Graph and Query operations into DataFrame tables, and adapts them to the provider’s GraphFrame.
The AnyTraversal and SubTraversal interfaces extend Formattable so that the steps defined in their bodies can be revealed.
Let’s say that we stored the bytecode of these types of functional fields as a hidden property in the element.
That could potentially allow us to execute user-defined traversals using, say, a traversal.call('function-name') step.