-
Notifications
You must be signed in to change notification settings - Fork 72
Query Engine
This is an internal design doc. You don't need to know any of this unless you work on LiteCore's query implementation.
LiteCore executes queries by translating their JSON form into SQL, compiling the SQL into a SQLite 'statement', and then evaluating that statement. Ta-da, that's it!
...except for the details. There are a number of complications:
- Document properties are not stored in SQL columns; there's just one blob column called
body
, and the properties are Fleece-encoded inside that. - JSON documents can contain arrays; SQL has no notion of this, nor of the
UNNEST
and 'ANY'/EVERY
features that N1QL uses to query arrays. - JSON (and N1QL) has a
null
value that is, confusingly, unlike SQL'snull
.
A C4Query
reference is a wrapper for a C++ SQLiteQuery
instance. The first thing the SQLiteQuery
does is instantiate a QueryParser
and pass it the JSON query string.
The QueryParser
translates the JSON into Fleece to make it easy to work with, then recursively descends into the node tree. As it processes each node, it writes SQL to an output stream. (This is much like the code-generation pass of a compiler, since the JSON format is already structured much like a parse tree.)
The SQLiteQuery
then compiles the SQL and stores the compiled statement.
To evaluate the query, a SQLiteQueryEnumerator
steps the statement, translates each output row into a Fleece array, and writes those into an outer array. The enumerator holds on to this encoded Fleece, and hands out an iterator to its rows.
Note: This has the unfortunate effect of buffering all the query results in memory at once, but it prevents the problem of leaving a SQLite statement running while changes are potentially being made (in between calls to
c4queryenum_next()
) by the same database connection -- this is explicitly disallowed and can cause garbage results.
Result columns of types other than string, number, or missing
(SQL null
) appear as SQLite blobs containing Fleece. The enumerator detects these blobs and handles them correctly.
The QueryParser
translates a document property reference into a call to the custom SQLite function fl_value
. So for example the JSON query operation [".name.first"]
would translate to fl_value(body, '.name.first')
. The implementation of fl_body
(in SQLiteFleeceFunctions.cc) does this:
- Interprets the first parameter, a blob, as a document body, finds the current revision, and gets a Fleece pointer to it.
- Uses Fleece's
Path
class to traverse the key-path given in the second parameter. - Returns a SQLite value (
sqlite3_value*
) corresponding to the property value.
SQLite only has five data types (NULL, integer, float, text, blob). But fortunately it provides an API to tag a sqlite3_value
with an application-defined 8-bit "subtype".
- Booleans are represented as the numbers 0 and 1 with the subtype
kFleeceIntBoolean
(0x68). - Arrays and Dictionaries are represented as SQLite blobs containing their Fleece encoding, with no subtype.
- Blob data is represented as a SQLite blob with the subtype
kPlainBlobSubtype
(0x66). - A JSON
null
(as opposed to a SQLNULL
, which as per N1QL we callMISSING
) is represented as a zero-length SQLite blob with the subtypekFleeceNullSubtype
(0x67).
The works OK inside a query except for some edge cases -- for example, the results of SQLite expressions don't have tags, so ==
results in an integer 0
or 1
, not a (tagged) false
or true
.
Unfortunately the subtype tags are lost when values are returned from a SQLite query (probably because they were a later addition to the API.) So we're back to the regular five SQLite types. To work around this, all projected values (the expressions right after SELECT
) are wrapped in a call to fl_result()
. This function converts tagged bools, blobs and nulls into equivalent Fleece-encoded blobs. Then the exterior part of query handling, the SQLiteQueryEnumerator
, handles all blob-typed values by Fleece-decoding them.
Both UNNEST
and the ANY
/EVERY
operators provide a sort of nested query on an array. LiteCore has two ways to translate this to SQL, depending on whether there is a LiteCore index on that array.
If there's no index, LiteCore uses its fl_each
SQL function (SQLiteFleeceEach.cc) This is a complicated thing called a SQLite Virtual Table. The primary use of virtual tables is to create SQLite tables that aren't implemented in the normal way (this is how FTS works), but they have a secondary use as table-valued functions, and that's how we use fl_each
. A table-valued function can appear in a JOIN clause as though it were a table, and can present rows of data that are specific to each row the query is processing. So what fl_each
does is make a Fleece array look like a SQL table, like a KeyStore table in fact. Each array element appears as a row with its own body
column containing the Fleece value.
- An UNNEST clause is translated to a JOIN on an
fl_each
call whose parameters are (likefl_value
) the document body and the path to the array. - An
ANY
/EVERY
expression is translated into a nestedSELECT
statement whoseFROM
is a similarfl_each
call, with a test on its row count.
Note: This is super clever, but it's not original with us. It, and also our approach to querying doc properties, were inspired by SQLite's own JSON extension.
If the array is indexed, the fl_each
call is replaced by the name of the index's table, which has the same schema as fl_each
's virtual table. There's an ON
condition that matches the current document's rowid
to the index table's docid
.