Prune map analysis search to save time and memory #506

Open
robvanderveer opened this issue May 16, 2024 · 1 comment

@robvanderveer
Collaborator

Map analysis takes days to calculate and gigabytes to store, but we're only interested in the top matches from standard to standard. There is no point in finding 100 weakly linked topics when we already have a couple of direct links and a number of strong links.
The idea, therefore, is to prune the search we do for map analysis by stopping early once we know that we're only looking at links weaker than the ones we already have.

  1. Design a strategy for pruning. First stab: don't search for average and weak links if we have direct and strong links, and don't search for weak links if we have direct, strong or average links.
  2. Update the search algorithm. This will require some smartness, depending on how the search is implemented.
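The "first stab" rule above can be sketched in a few lines of Python. This is only an illustration under assumptions: the tier names come from this issue, but `should_skip`, `pruned_search` and the `search_tier` callback are hypothetical names, not existing OpenCRE code, and "direct and strong" is read literally as "both are present".

```python
from typing import Callable, Dict, List

# Tiers ordered from strongest to weakest, as named in this issue.
TIERS = ("direct", "strong", "average", "weak")


def should_skip(tier: str, results: Dict[str, List[str]]) -> bool:
    """Apply the first-stab pruning rule to the tiers searched so far."""
    # Only tiers that actually produced links count as "found".
    found = {name for name, links in results.items() if links}
    if tier == "average":
        # Skip average links when we have both direct and strong links.
        return {"direct", "strong"} <= found
    if tier == "weak":
        # Skip weak links when we have any direct, strong or average link.
        return bool(found & {"direct", "strong", "average"})
    return False


def pruned_search(search_tier: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Search tier by tier, strongest first, skipping tiers the rule prunes.

    `search_tier` is a hypothetical callback that runs the expensive
    search for one tier and returns the links it found.
    """
    results: Dict[str, List[str]] = {}
    for tier in TIERS:
        if should_skip(tier, results):
            continue
        results[tier] = search_tier(tier)
    return results
```

With this shape the expensive searches for the weaker tiers never run once the stronger tiers have produced links, which is where the time and storage savings would come from. Whether the real search can be split per tier depends on how it is implemented.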

By the way, I notice we don't seem to have weak links anymore: should we change our categorization and make 3-6 average and 7+ weak?

First step: find expertise in Neo4J. Rob

@robvanderveer robvanderveer self-assigned this May 16, 2024
@northdpole
Collaborator

> By the way, I notice we don't seem to have weak links anymore: should we change our categorization and make 3-6 average and 7+ weak?

The query is the following:

    OPTIONAL MATCH (BaseStandard:NeoStandard {name: $name1})
    OPTIONAL MATCH (CompareStandard:NeoStandard {name: $name2})
    OPTIONAL MATCH p = allShortestPaths((BaseStandard)-[*..20]-(CompareStandard))
    WITH p
    WHERE length(p) > 1
      AND ALL(n in NODES(p) WHERE (n:NeoCRE or n = BaseStandard or n = CompareStandard) AND NOT n.name in $denylist)
    RETURN p

followed by

    OPTIONAL MATCH (BaseStandard:NeoStandard {name: $name1})
    OPTIONAL MATCH (CompareStandard:NeoStandard {name: $name2})
    OPTIONAL MATCH p = allShortestPaths((BaseStandard)-[:(LINKED_TO|CONTAINS)*..20]-(CompareStandard))
    WITH p
    WHERE length(p) > 1
      AND ALL(n in NODES(p) WHERE (n:NeoCRE or n = BaseStandard or n = CompareStandard) AND NOT n.name in $denylist)
    RETURN p

We then filter the results and assign weights in code.
We can start by optimizing these pretty brute-force queries.
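One possible way to make these queries less brute-force is to try a small hop bound first and only fall back to `*..20` when nothing is found. The sketch below is an assumption-heavy illustration, not existing OpenCRE code: `build_query`, `search_with_growing_bound`, the `run_query` callback and the intermediate bounds `2` and `6` are all hypothetical. Only the query text and the upper bound of 20 come from the queries above. Since `allShortestPaths` only returns paths of minimal length, a smaller bound that yields results should return the same paths as the bound of 20.

```python
from typing import Any, Callable, List, Sequence

# The query from this thread, with the relationship filter and the hop
# bound turned into placeholders. Doubled braces are literal braces.
QUERY_TEMPLATE = (
    "OPTIONAL MATCH (BaseStandard:NeoStandard {{name: $name1}}) "
    "OPTIONAL MATCH (CompareStandard:NeoStandard {{name: $name2}}) "
    "OPTIONAL MATCH p = allShortestPaths("
    "(BaseStandard)-[{rel}*..{max_hops}]-(CompareStandard)) "
    "WITH p WHERE length(p) > 1 AND ALL(n in NODES(p) WHERE "
    "(n:NeoCRE or n = BaseStandard or n = CompareStandard) "
    "AND NOT n.name in $denylist) RETURN p"
)


def build_query(max_hops: int, restrict_types: bool = False) -> str:
    """Return the query text for a given hop bound.

    The bound is formatted into the text, so it is validated first.
    """
    if not isinstance(max_hops, int) or not 1 <= max_hops <= 20:
        raise ValueError("max_hops must be an integer between 1 and 20")
    rel = ":(LINKED_TO|CONTAINS)" if restrict_types else ""
    return QUERY_TEMPLATE.format(rel=rel, max_hops=max_hops)


def search_with_growing_bound(
    run_query: Callable[[str], List[Any]],
    hop_bounds: Sequence[int] = (2, 6, 20),  # 2 and 6 are assumed values
    restrict_types: bool = False,
) -> List[Any]:
    """Run the query with increasing hop bounds, stopping at the first hit.

    `run_query` is a hypothetical callback that executes the query text
    (with $name1, $name2 and $denylist bound) and returns the paths.
    """
    for max_hops in hop_bounds:
        paths = run_query(build_query(max_hops, restrict_types))
        if paths:
            return paths
    return []
```

Whether this saves time in practice would have to be measured: when no short path exists, the smaller bounds add extra queries before the full search runs.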
