Refactoring Neo4j Schema for Fewer Data Types and Minimised Duplication #776
Status: Closed. Arnedeklerk started this conversation in Ideas.
Please discuss here: https://github.com/KnetMiner/knetminer/discussions/7
Background
This discussion is meant to stimulate some thoughts and start a conversation before we plan any concrete actions. My knowledge of the technicalities of such changes is limited, and I am unsure about (1) whether this work would be done by us or by a partner company, (2) how much of it relates to the Neo4j converter (I think most of it, hence the issue is here...), (3) how much of KM will need remapping afterwards, and (4) the exact details of our previous discussions on this topic, so feel free to add to the discussion below.
Our current Neo4j schema is functional; however, there is room for improvement, particularly in terms of optimisation and simplification. We've identified the following needs.
Objectives
Fewer Data Types: An excess of data types complicates queries and makes it harder to integrate new features. Streamlining the schema to fewer types would simplify querying, especially for our upcoming Latent Linguistic Modelling (LLM) Natural Language Processing (NLP) querying feature.
All Required Data: As we reduce the number of data types, it is crucial that the schema still captures all necessary data fields. A key part of this task is identifying any missing fields that are critical to our application's performance. I know that the CyVerse Neo4j instance is now outdated, so there the data may simply need updating.
Minimised Duplication: Our schema currently contains a significant amount of data redundancy, which adds unnecessary complexity. Refactoring to identify and eliminate these duplications would make data handling more efficient. I do, however, recall Marco mentioning that this would be difficult to fix because of how the data is stored or handled.
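As a possible starting point for the first and third objectives, type and naming problems could be checked mechanically. The sketch below uses hypothetical helpers (not existing KnetMiner code) and assumes rows shaped like the output of Neo4j's built-in `db.schema.nodeTypeProperties()` procedure (`nodeLabels`, `propertyName`, `propertyTypes`). It flags property keys stored with more than one value type, and keys whose names differ only in case or separators (e.g. `TAXID` vs `taxId`), which may point to duplication. The sample rows are invented for illustration.

```python
from collections import defaultdict
import re


def mixed_type_properties(rows):
    """Return {propertyName: sorted types} for keys seen with more than one type.

    Each row is assumed to look like a db.schema.nodeTypeProperties() record:
    {"nodeLabels": [...], "propertyName": str, "propertyTypes": [str, ...]}.
    """
    types_by_key = defaultdict(set)
    for row in rows:
        name = row.get("propertyName")
        if name is None:  # skip rows without a property (e.g. property-less labels)
            continue
        types_by_key[name].update(row.get("propertyTypes") or [])
    return {k: sorted(v) for k, v in types_by_key.items() if len(v) > 1}


def near_duplicate_keys(rows):
    """Group property keys whose names match after ignoring case, '_', '-' and spaces."""
    groups = defaultdict(set)
    for row in rows:
        name = row.get("propertyName")
        if name is None:
            continue
        normalised = re.sub(r"[_\-\s]", "", name).lower()
        groups[normalised].add(name)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


if __name__ == "__main__":
    # Invented sample rows, for illustration only.
    sample = [
        {"nodeLabels": ["Gene"], "propertyName": "TAXID", "propertyTypes": ["String"]},
        {"nodeLabels": ["Protein"], "propertyName": "taxId", "propertyTypes": ["Long"]},
        {"nodeLabels": ["Gene"], "propertyName": "score", "propertyTypes": ["Double"]},
        {"nodeLabels": ["Trait"], "propertyName": "score", "propertyTypes": ["String", "Double"]},
    ]
    print(mixed_type_properties(sample))  # {'score': ['Double', 'String']}
    print(near_duplicate_keys(sample))    # {'taxid': ['TAXID', 'taxId']}
```

The name normalisation is deliberately crude; it only surfaces candidates for a human to review, since two similarly named keys may still carry different meanings.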
Questions to Discuss
Is it feasible to restructure our schema, and where do we begin? I have a Google Sheet (Property Keys Sheet) that I hoped we could use to understand the schema, but it is not well suited to this and was originally created for a different purpose. There is probably a cleaner way to extract the full schema from Neo4j, although when I tried previously it did not give me all the information I hoped for; this needs to be investigated. What effects could this restructuring have on the current version of KnetMiner, and what implications might it carry for the future KnetMiner 6.0?
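One possibly cleaner route to a full schema listing is Neo4j's built-in `db.schema.nodeTypeProperties()` and `db.schema.relTypeProperties()` procedures, which report each label or relationship type together with the properties and value types observed on it. The sketch below assumes the official `neo4j` Python driver and a reachable instance; the function names, URI and credentials are hypothetical, and it has not been tried against the KnetMiner databases. It flattens both result sets into a single CSV table that could replace or refresh the Property Keys Sheet.

```python
import csv
import io

# Built-in Neo4j schema procedures; YIELD names follow the Neo4j procedure docs.
NODE_SCHEMA_QUERY = (
    "CALL db.schema.nodeTypeProperties() "
    "YIELD nodeLabels, propertyName, propertyTypes, mandatory "
    "RETURN nodeLabels, propertyName, propertyTypes, mandatory"
)
REL_SCHEMA_QUERY = (
    "CALL db.schema.relTypeProperties() "
    "YIELD relType, propertyName, propertyTypes, mandatory "
    "RETURN relType, propertyName, propertyTypes, mandatory"
)


def fetch_schema(uri, user, password):
    """Run both schema procedures and return (node_rows, rel_rows) as lists of dicts."""
    from neo4j import GraphDatabase  # official driver (pip install neo4j)

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            nodes = [record.data() for record in session.run(NODE_SCHEMA_QUERY)]
            rels = [record.data() for record in session.run(REL_SCHEMA_QUERY)]
    return nodes, rels


def schema_to_csv(node_rows, rel_rows):
    """Flatten both result sets into one CSV table: one row per type/property pair."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["kind", "type", "property", "valueTypes", "mandatory"])
    for row in node_rows:
        writer.writerow(["node", ":".join(row["nodeLabels"]), row["propertyName"] or "",
                         "|".join(row["propertyTypes"] or []), row["mandatory"]])
    for row in rel_rows:
        writer.writerow(["relationship", row["relType"], row["propertyName"] or "",
                         "|".join(row["propertyTypes"] or []), row["mandatory"]])
    return out.getvalue()
```

Usage, with placeholder connection details: `print(schema_to_csv(*fetch_schema("bolt://localhost:7687", "neo4j", "password")))`. Keeping the query step separate from the CSV step means the output format can be adjusted without touching the database code.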
Moving Forward
More exploration and expertise on the subject are needed to define concrete suggestions and a plan of action. A discussion during one of our meetings would be a good start.