
Different batch size for cursors produces different results #65

Open
mircicd opened this issue Jan 25, 2021 · 1 comment

mircicd commented Jan 25, 2021

Hello

We like the clean principles by which this library is being developed and have been using it with great joy so far.
I came across some strange behavior when using the cursor implementation though.
According to the sample:

async myRepositoryFunction(params: any): Promise<Cursor<any>> {
    ...
    const result = await this.persistenceManager.openCursor(
        new CursorSpecification<any>().withStatement(myQuery).bind(params).batchSize(X)
    );
    return result;
}

Consuming the result like this:

for await (const result of timeline) {
    const [myEntities, ...] = result;
}
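For the avoidance of doubt about the iteration syntax (`for await (const x of iterable)`), here is a minimal self-contained sketch using a plain async generator as a stand-in for a Drivine cursor. The generator and names below are illustrative only, not Drivine's API:

```typescript
// Stand-in for a Drivine Cursor: anything AsyncIterable works with `for await`.
async function* fakeCursor<T>(records: T[]): AsyncGenerator<T> {
    for (const record of records) {
        yield record;
    }
}

async function consumeAll(): Promise<number[]> {
    const collected: number[] = [];
    // Note the syntax: `for await (const x of iterable)`,
    // not `for (await const x of iterable)`.
    for await (const record of fakeCursor([1, 2, 3])) {
        collected.push(record);
    }
    return collected;
}
```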

I've started to experiment with different batch sizes X. In my setup I get back 544 result records, no matter which batch size I choose. What is interesting, however, is that certain elements are missing from the result set depending on the batch size. So far I've confirmed 3 elements that are definitely missing, but there are probably more. Since the total size is always the same, this implies there must also be duplicates.

To compare the varying results of different batch sizes, I use the non-cursor implementation (using QuerySpecification and .query) as my reference for the correct result.
A cursor batch size of 1 produces the same (correct) result; 100 does not; 500 and 544 do (reminder: 544 is the total number of records). Not specifying the batch size defaults to 100, which, as stated, skips some values in the result set.

I don't expect this behavior is intended on Drivine's side. As far as I can tell, using cursors adds a LIMIT X to the Cypher query, which shouldn't change the logic of the query. So I'm a bit lost here.
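For what it's worth, the symptom (constant total count, but some rows missing and others presumably duplicated) is exactly what SKIP/LIMIT-style pagination produces when the row order is not stable between query executions. The following is a deliberately exaggerated, pure-TypeScript simulation of that mechanism, with no Drivine involved; it is a hypothesis about why the batch size matters, not a confirmed diagnosis:

```typescript
// Simulate a query whose row order may change between executions
// (e.g. no ORDER BY clause): shuffle before each "page" is fetched.
function fetchPage(rows: number[], skip: number, limit: number, shuffle: boolean): number[] {
    // Naive shuffle; good enough to make the order unstable for the demo.
    const source = shuffle ? [...rows].sort(() => Math.random() - 0.5) : rows;
    return source.slice(skip, skip + limit);
}

function paginate(rows: number[], batchSize: number, unstable: boolean): number[] {
    const out: number[] = [];
    for (let skip = 0; skip < rows.length; skip += batchSize) {
        out.push(...fetchPage(rows, skip, batchSize, unstable));
    }
    return out;
}

const rows = Array.from({ length: 544 }, (_, i) => i);
const stable = paginate(rows, 100, false); // same order every page: complete
const shaky = paginate(rows, 100, true);   // order changes between pages
console.log(stable.length, new Set(stable).size); // prints "544 544"
console.log(shaky.length, new Set(shaky).size);   // length is always 544; unique count usually lower
```

The total is always 544 either way, because the pages always cover skip offsets 0..543; but with an unstable order, some rows land in two pages and others in none.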

I can try to come up with a repo to reproduce that issue if needed.

Cheers!

Dejan

@jasperblues (Member)

Hi @mircicd, thanks for reporting.

For AgensGraph, the streaming API uses true cursors: results are calculated on the server once, then streamed to the client as fast as the client is able to consume them (the client controls this process, thus no back-pressure).

For Neo4j, the JavaScript driver only supports a push-based API, so we use SKIP and LIMIT until it supports pull-style consumption (again, to avoid back-pressure, which is a likely reason one opts for streaming in the first place).
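A rough sketch of the SKIP/LIMIT emulation described above, with hypothetical names (`skipLimitCursor`, `PageFetcher`), not Drivine's actual internals:

```typescript
// Hypothetical page fetcher: runs the statement with `SKIP skip LIMIT limit`
// appended, and resolves with that page of results.
type PageFetcher<T> = (skip: number, limit: number) => Promise<T[]>;

// Emulate a pull-style cursor on top of a push-only driver by issuing
// one SKIP/LIMIT query per batch until a short page signals the end.
async function* skipLimitCursor<T>(fetchPage: PageFetcher<T>, batchSize: number): AsyncGenerator<T> {
    for (let skip = 0; ; skip += batchSize) {
        const page = await fetchPage(skip, batchSize);
        for (const item of page) {
            yield item;
        }
        if (page.length < batchSize) {
            return; // last (short or empty) page
        }
    }
}
```

Since each page is a separate query execution, the pages only line up into one consistent result if the statement's row order is deterministic between executions; otherwise rows can migrate across page boundaries, producing exactly the duplicates-plus-gaps pattern reported here.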

Regardless, the results should be the same. Are you able to create a test case that reproduces this issue?
