Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the client render node types, etc. on large networks #215

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

JackWilb
Copy link
Member

Does this PR close any open issues?

No

Give a longer description of what this PR addresses and why it's needed

The client was hitting a limit in python-arango when trying to use the multinetjs functions api.nodes and api.edges. The code for that triggers a UNION in the API, which blows up memory usage. To get around those limitations, I switched the API routes used here to use ones that don't require a UNION.

I think that the solution for the nodes panel in the bottom right is ugly and we could do something else. I wasn't sure what to do, so I left that for discussion.

Provide pictures/videos of the behavior before and after these changes (optional)

Old:
image

New:
image

Are there any additional TODOs before this PR is ready to go?

TODOs:

  • Consider alternative designs for bottom right node panel. Could be pushed to a new issue.

@netlify
Copy link

netlify bot commented Jan 19, 2022

Deploy Preview for multinet-client ready!

Name Link
🔨 Latest commit cb0ae18
🔍 Latest deploy log https://app.netlify.com/sites/multinet-client/deploys/62829bb7b0be7b000806aded
😎 Deploy Preview https://deploy-preview-215--multinet-client.netlify.app/
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@waxlamp
Copy link
Contributor

waxlamp commented Jan 19, 2022

Do you have an example of a memory blowup? I'd like to explore an Arango EXPLAIN statement before we make this change.

Also, could you summarize your new approach here? It seems like maybe it shows max 10 nodes from one of the tables represented in the network?

@JackWilb
Copy link
Member Author

Do you have an example of a memory blowup? I'd like to explore an Arango EXPLAIN statement before we make this change.

Sorry for the long wait for a reply. Here's what's happening:

  1. The multinet-api nodes route calls the from_collections helper in arango.py.
  2. The from_collections helper adds f'UNION({", ".join(coll_vars)})' if len(coll_vars) > 1 else coll_vars[0] where coll_vars are actually the names of the tables (collections) that the query operates over.
  3. The nodes routed then calls paginate which adds a limit to the query.

That means that we generate a query that looks like:

FOR doc IN (FOR doc in union(people,places,census) RETURN doc) LIMIT 0, 10 RETURN doc

The union causes a collection scan over all the tables, which is super slow and causes a memory blow up. Here's the explain statement:

Query String (61 chars, cacheable: true):
 FOR doc IN union(people,places,census) LIMIT 0, 10 RETURN doc

Execution plan:
 Id   NodeType                      Est.   Comment
  1   SingletonNode                    1   * ROOT
 22   SubqueryStartNode                1     - LET #13 = ( /* subquery begin */
 11   EnumerateCollectionNode     429165       - FOR #11 IN census   /* full collection scan  */
 23   SubqueryEndNode                  1         - RETURN  #11 ) /* subquery end */
 20   SubqueryStartNode                1     - LET #9 = ( /* subquery begin */
  7   EnumerateCollectionNode   20601490       - FOR #7 IN places   /* full collection scan  */
 21   SubqueryEndNode                  1         - RETURN  #7 ) /* subquery end */
 18   SubqueryStartNode                1     - LET #5 = ( /* subquery begin */
  3   EnumerateCollectionNode    9236085       - FOR #3 IN people   /* full collection scan  */
 19   SubqueryEndNode                  1         - RETURN  #3 ) /* subquery end */
 14   CalculationNode                  1     - LET #1 = UNION(#5, #9, #13)   /* simple expression */
 15   EnumerateListNode              100     - FOR doc IN #1   /* list iteration */
 16   LimitNode                       10       - LIMIT 0, 10
 17   ReturnNode                      10       - RETURN doc

Indexes used:
 none

Functions used:
 Name    Deterministic   Cacheable   Uses V8
 UNION   true            true        false  

Optimization rules applied:
 Id   RuleName
  1   move-calculations-down
  2   splice-subqueries

Optimization rules with highest execution times:
 RuleName                                    Duration [s]
 splice-subqueries                                0.00002
 move-calculations-down                           0.00001
 use-indexes                                      0.00001
 inline-subqueries                                0.00000
 optimize-subqueries                              0.00000

41 rule(s) executed, 1 plan(s) created

Also, could you summarize your new approach here? It seems like maybe it shows max 10 nodes from one of the tables represented in the network?

Yes, the approach here is to use a different route, table to grab the first 10 nodes from the first table in the collections that make up the network. This means we'll avoid the memory blow up and the slow queries, since it doesn't use the union that causes the issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants