
Different batch size for cursors produces different results #65

Open
mircicd opened this issue Jan 25, 2021 · 1 comment

mircicd commented Jan 25, 2021

Hello

We like the clean principles by which this library is being developed and have been using it with great joy so far.
I came across some strange behavior when using the cursor implementation though.
According to the sample:

async myRepositoryFunction(params: any): Promise<Cursor<any>> {
    ...
    const result = await this.persistenceManager.openCursor(
        new CursorSpecification<any>().withStatement(myQuery).bind(params).batchSize(X)
    );
    return result;
}

Consuming the result like this:

for await (const result of timeline) {
    const [myEntities, ...] = result;
}
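For the avoidance of doubt about the iteration syntax (`for await (const x of iterable)`), here is a minimal self-contained sketch using a plain async generator as a stand-in for a Drivine cursor. The generator and names below are illustrative only, not Drivine's API:

```typescript
// Stand-in for a Drivine Cursor: anything AsyncIterable works with `for await`.
async function* fakeCursor<T>(records: T[]): AsyncGenerator<T> {
    for (const record of records) {
        yield record;
    }
}

async function consumeAll(): Promise<number[]> {
    const collected: number[] = [];
    // Note the syntax: `for await (const x of iterable)`,
    // not `for (await const x of iterable)`.
    for await (const record of fakeCursor([1, 2, 3])) {
        collected.push(record);
    }
    return collected;
}
```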

I've started to experiment with different batch sizes X. In my setup I get back 544 result records, no matter which batch size I choose. What is interesting, however, is that certain elements are missing from the result set depending on the batch size. So far I've confirmed 3 elements that are definitely missing, but there are probably more. Since the total size is always the same, this implies there must also be duplicates.

To compare the varying results of different batch sizes, I use the non-cursor implementation (using QuerySpecification and .query) as my reference for the correct result.
A cursor batch size of 1 produces the same (correct) result; 100 does not; 500 and 544 do (reminder: 544 is the total number of records). Not specifying the batch size defaults to 100, which, as stated, skips some values in the result set.

I don't expect this behavior is intended on Drivine's side. As far as I can tell, using cursors adds a LIMIT X to the Cypher query, which shouldn't change the logic of the query. So I'm a bit lost here.
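For what it's worth, the symptom (constant total count, but some rows missing and others presumably duplicated) is exactly what SKIP/LIMIT-style pagination produces when the row order is not stable between query executions. The following is a deliberately exaggerated, pure-TypeScript simulation of that mechanism, with no Drivine involved; it is a hypothesis about why the batch size matters, not a confirmed diagnosis:

```typescript
// Simulate a query whose row order may change between executions
// (e.g. no ORDER BY clause): shuffle before each "page" is fetched.
function fetchPage(rows: number[], skip: number, limit: number, shuffle: boolean): number[] {
    // Naive shuffle; good enough to make the order unstable for the demo.
    const source = shuffle ? [...rows].sort(() => Math.random() - 0.5) : rows;
    return source.slice(skip, skip + limit);
}

function paginate(rows: number[], batchSize: number, unstable: boolean): number[] {
    const out: number[] = [];
    for (let skip = 0; skip < rows.length; skip += batchSize) {
        out.push(...fetchPage(rows, skip, batchSize, unstable));
    }
    return out;
}

const rows = Array.from({ length: 544 }, (_, i) => i);
const stable = paginate(rows, 100, false); // same order every page: complete
const shaky = paginate(rows, 100, true);   // order changes between pages
console.log(stable.length, new Set(stable).size); // prints "544 544"
console.log(shaky.length, new Set(shaky).size);   // length is always 544; unique count usually lower
```

The total is always 544 either way, because the pages always cover skip offsets 0..543; but with an unstable order, some rows land in two pages and others in none.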

I can try to come up with a repo to reproduce that issue if needed.

Cheers!

Dejan

@jasperblues (Member)

Hi @mircicd, thanks for reporting.

For AgensGraph, the streaming API uses true cursors: results are calculated on the server once, then streamed to the client as fast as the client is able to consume them (the client controls this process, thus no back-pressure).

For Neo4j, the JavaScript driver only supports a push-based API, so we use SKIP and LIMIT until it supports pull-style consumption (again, to avoid back-pressure, which is a likely reason one opts for streaming in the first place).
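A rough sketch of the SKIP/LIMIT emulation described above, with hypothetical names (`skipLimitCursor`, `PageFetcher`), not Drivine's actual internals:

```typescript
// Hypothetical page fetcher: runs the statement with `SKIP skip LIMIT limit`
// appended, and resolves with that page of results.
type PageFetcher<T> = (skip: number, limit: number) => Promise<T[]>;

// Emulate a pull-style cursor on top of a push-only driver by issuing
// one SKIP/LIMIT query per batch until a short page signals the end.
async function* skipLimitCursor<T>(fetchPage: PageFetcher<T>, batchSize: number): AsyncGenerator<T> {
    for (let skip = 0; ; skip += batchSize) {
        const page = await fetchPage(skip, batchSize);
        for (const item of page) {
            yield item;
        }
        if (page.length < batchSize) {
            return; // last (short or empty) page
        }
    }
}
```

Since each page is a separate query execution, the pages only line up into one consistent result if the statement's row order is deterministic between executions; otherwise rows can migrate across page boundaries, producing exactly the duplicates-plus-gaps pattern reported here.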

Regardless, the results should be the same. Are you able to create a test case that reproduces this issue?
