Native Engine graphs benefits from automatic index analysis either explicitly via the schema API or via error messages.
The index analyzer has the ability to figure out what indexes a given traversal requires. It will propose those necessary indexes and the user has the option to either create those indexes manually (through copy/paste) or automatically (through .apply()
).
In order to let the index analyzer figure out what indexes a particular traversal requires, a user needs to execute schema.indexFor(<your_traversal>).analyze()
as shown below:
gremlin> schema.indexFor(g.V().has("age", 23)).analyze()
==>Traversal requires that the following indexes are created:
schema.vertexLabel('person').materializedView('person_by_age').ifNotExists().partitionBy('age').clusterBy('firstname', Asc).clusterBy('lastname', Asc).create()
The index analyzer will also indicate in case a traversal can be fulfilled by existing indexes:
gremlin> schema.indexFor(g.V().has("name", "x")).analyze()
==>Traversal can be satisfied by existing indexes
This can be done through schema.indexFor(<your_traversal>).apply()
. It is perfectly valid to skip schema.indexFor(<your_traversal>).analyze()
and execute .apply()
directly. The .apply()
step will indicate what indexes are being created.
gremlin> schema.indexFor(g.V().has("age", 23)).apply()
==>Creating the following indexes:
schema.vertexLabel('person').materializedView('person_by_age').ifNotExists().partitionBy('age').clusterBy('firstname', Asc).clusterBy('lastname', Asc).create()
OK
Depending on the particular traversal, the index analyzer can suggest the creation of one or more indexes of the following types:
- a Materialized View for predicates that are not search-specific or specific to CQL collections
- a Search Index for specific predicates (e.g.
token
/regex
/phrase
/neq
/ ...) that can only be fulfilled by a search index - a Secondary index for specific predicates (e.g.
contains(x)
/containsKey(x)
/containsValue(x)
/entryEq(x, y)
) against CQL collections (Lists/Sets/Maps)
A Materialized View will generally be created if predicates in the traversal are not specific to CQL collections and are not search-specific.
The below example filters on phone
and orders by age
. Since this can be fulfilled by a Materialized View, the output will be:
gremlin> schema.indexFor(g.V().has("phone", "123-456-789").order().by("age", desc)).analyze()
==>Traversal requires that the following indexes are created:
schema.vertexLabel('person').materializedView('person_by_phone_age_Desc').ifNotExists().partitionBy('phone').clusterBy('age', Desc).clusterBy('firstname', Asc).clusterBy('lastname', Asc).create()
The index analyzer will pick a Search Index if the predicate is Search-specific, such as token
/ tokenPrefix
/ tokenRegex
/ tokenFuzzy
/ phrase
/ regex
/ prefix
/ fuzzy
.
Additional predicates that require a Search Index are Geo predicates (inside
/ insideCartesian
) and neq
/ without
.
The below example uses the predicate regex
and filters on age
and so the index analyzer will suggest a Search Index:
gremlin> schema.indexFor(g.V().has('name', Search.regex('Alan')).has('age', gt(30))).analyze()
==>Traversal requires that the following indexes are created:
schema.vertexLabel('p').searchIndex().ifNotExists().by('name').by('age').create()
A slightly more complicated example that will result in the suggestion of a Search Index due to using the regex
/ prefix
predicates:
gremlin> schema.indexFor(g.V().has('title', Search.regex('dse')).has('lang', Search.prefix('jav')).order().by('url')).analyze()
==>Traversal requires that the following indexes are created:
schema.vertexLabel('software').searchIndex().ifNotExists().by('title').by('lang').by('url').create()
The index analyzer will pick a Secondary Index for predicates (contains(x)
/ containsKey(x)
/ containsValue(x)
/ entryEq(x, y)
) against CQL collections, such as Lists/Sets/Maps.
The below example suggests a Secondary Index when filtering data in a list:
gremlin> schema.indexFor(g.V().has("list", contains(23))).analyze()
==>Traversal requires that the following indexes are created:
schema.vertexLabel('collections').secondaryIndex('collections_2i_by_list').ifNotExists().by('list').indexValues().create()
The same applies for queries against a set:
gremlin> schema.indexFor(g.V().has("set", contains(45))).analyze()
==>Traversal requires that the following indexes are created:
schema.vertexLabel('collections').secondaryIndex('collections_2i_by_set').ifNotExists().by('set').indexValues().create()
The below examples show usage of the map predicates containsKey(x)
/ containsValue(x)
/ entryEq(x, y)
and the suggested indexes:
gremlin> schema.indexFor(g.V().has("map", containsKey(45))).analyze()
==>Traversal requires that the following indexes are created:
schema.vertexLabel('collections').secondaryIndex('collections_2i_by_map1').ifNotExists().by('map').indexKeys().create()
gremlin> schema.indexFor(g.V().has("map", containsValue("some item"))).analyze()
==>Traversal requires that the following indexes are created:
schema.vertexLabel('collections').secondaryIndex('collections_2i_by_map2').ifNotExists().by('map').indexValues().create()
gremlin> schema.indexFor(g.V().has("map", entryEq(45, "some item"))).analyze()
==>Traversal requires that the following indexes are created:
schema.vertexLabel('collections').secondaryIndex('collections_2i_by_map3').ifNotExists().by('map').indexEntries().create()
This is generally the case when the traversal will hit more than one table (e.g. when we're not filtering on a particular label).
Given the traversal g.V().has("list", contains(23))
and the three vertex labels shown below will require three separate indexes.
gremlin> schema.describe()
==>schema.vertexLabel('a').ifNotExists().partitionBy('id', Int).property('age', Int).property('name', Varchar).property('list', listOf(Int)).create()
schema.vertexLabel('b').ifNotExists().partitionBy('id', Int).property('age', Int).property('name', Varchar).property('list', listOf(Int)).create()
schema.vertexLabel('c').ifNotExists().partitionBy('id', Int).property('age', Int).property('name', Varchar).property('list', listOf(Int)).create()
gremlin> schema.indexFor(g.V().has("list", contains(23))).analyze()
==>Traversal requires that the following indexes are created:
schema.vertexLabel('c').secondaryIndex('c_2i_by_list').ifNotExists().by('list').indexValues().create()
schema.vertexLabel('a').secondaryIndex('a_2i_by_list').ifNotExists().by('list').indexValues().create()
schema.vertexLabel('b').secondaryIndex('b_2i_by_list').ifNotExists().by('list').indexValues().create()
The traversal g.V().has("list", contains(23)).has("age", 23)
would require a Secondary Index (due to filtering on a CQL collection) and a Materialized View for each table.
Since no vertex label filtering is performed, the index analyzer will suggest six indexes in total:
gremlin> schema.indexFor(g.V().has("list", contains(23)).has("age", 23)).analyze()
==>Traversal requires that the following indexes are created:
schema.vertexLabel('c').materializedView('c_by_age').ifNotExists().partitionBy('age').clusterBy('id', Asc).create()
schema.vertexLabel('c').secondaryIndex('c_2i_by_list').ifNotExists().by('list').indexValues().create()
schema.vertexLabel('a').materializedView('a_by_age').ifNotExists().partitionBy('age').clusterBy('id', Asc).create()
schema.vertexLabel('a').secondaryIndex('a_2i_by_list').ifNotExists().by('list').indexValues().create()
schema.vertexLabel('b').materializedView('b_by_age').ifNotExists().partitionBy('age').clusterBy('id', Asc).create()
schema.vertexLabel('b').secondaryIndex('b_2i_by_list').ifNotExists().by('list').indexValues().create()
The traversal g.V().has("list", contains(23)).has("age", 23).has("name", Search.regex(".*ohn"))
would also require two indexes per table.
A Secondary Index (due to filtering on a CQL collection) and a search index (due to regex
). The index analyzer will not create a Materialized View for age
as was shown in the previous example, but rather use the search index for that:
gremlin> schema.indexFor(g.V().has("list", contains(23)).has("age", 23).has("name", Search.regex(".*ohn"))).analyze()
==>Traversal requires that the following indexes are created:
schema.vertexLabel('c').searchIndex().ifNotExists().by('age').by('list').by('name').create()
schema.vertexLabel('c').secondaryIndex('c_2i_by_list').ifNotExists().by('list').indexValues().create()
schema.vertexLabel('a').searchIndex().ifNotExists().by('age').by('list').by('name').create()
schema.vertexLabel('a').secondaryIndex('a_2i_by_list').ifNotExists().by('list').indexValues().create()
schema.vertexLabel('b').searchIndex().ifNotExists().by('age').by('list').by('name').create()
schema.vertexLabel('b').secondaryIndex('b_2i_by_list').ifNotExists().by('list').indexValues().create()
NOTE
It is generally always recommended to use an element label when filtering as otherwise multiple CQL statements will be executed and data might get filtered in-memory by Tinkerpop and not by C*
When filtering on the vertex label a
in g.V().hasLabel("a").has("list", contains(23)).has("age", 23).has("name", Search.regex(".*ohn"))
, the index analyzer will fail to suggest any indexes, because it tries to leverage a search index for both conditions:
gremlin> schema.indexFor(g.V().hasLabel("a").has("list", contains(23)).has("age", 23).has("name", "hans")).analyze()
==>Failed to suggest indexes for traversal. Some steps in the traversal might not be supported yet.
This is currently a limitation that will be addressed in a future release.