Open Source, In-Memory Graph Query:
TranQL will demonstrate a prototype in-memory query capability, built entirely with open source software, that operates over Translator BioLink-Model compliant graphs. This design will serve as the basis for a horizontally scalable Translator query architecture that can execute general query graphs without constraining the query to specific identifiers, as is currently necessary. The current scalability constraints result from cost vs. scale tradeoffs entailed in our dependence on proprietary software. For the prototype, we will demonstrate ingesting data generated by the KGX data transformation toolkit. We will run Spark on the many-core architecture of the Arrival server at RENCI, which is part of our Kubernetes cluster and has 160 logical cores and terabytes of SSD storage. This work entails exporting data sets from existing BioLink-Model compliant databases, investigating approaches for importing those data sets into Spark, evaluating existing Spark-based graph query libraries, and creating customized procedures for installing Spark with the libraries required for a KGX data integration pipeline. This will serve as the basis of a KGS OpenAPI providing query access to the consortium. At each step, we will compare Spark’s graph query capabilities, maturity, commitment to openness, and tool ecosystem with those of RedisGraph and other emerging alternatives to determine the most robust course toward reliable, scalable, open source graph query.
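To illustrate what executing a general query graph "without constraining the query to specific identifiers" could look like, the following is a minimal pure-Python sketch, not the TranQL or Spark implementation. It assumes KGX-style records in which nodes carry an id and a category and edges carry a subject, predicate, and object. The sample identifiers (EX:...), the category and predicate values, and the function name match_path are hypothetical and chosen only for illustration. The query is a chain of categories, and every node of the first category seeds the search, so no starting identifier is needed. In a Spark-based design, each hop would presumably become a distributed join rather than an in-process loop.

```python
from collections import defaultdict

# Hypothetical KGX-style records (illustrative values, not real data).
NODES = [
    {"id": "EX:chem1", "category": "biolink:ChemicalSubstance"},
    {"id": "EX:chem2", "category": "biolink:ChemicalSubstance"},
    {"id": "EX:gene1", "category": "biolink:Gene"},
    {"id": "EX:gene2", "category": "biolink:Gene"},
    {"id": "EX:disease1", "category": "biolink:Disease"},
]
EDGES = [
    {"subject": "EX:chem1", "predicate": "biolink:affects", "object": "EX:gene1"},
    {"subject": "EX:chem2", "predicate": "biolink:affects", "object": "EX:gene2"},
    {"subject": "EX:gene1", "predicate": "biolink:related_to", "object": "EX:disease1"},
    {"subject": "EX:chem1", "predicate": "biolink:related_to", "object": "EX:disease1"},
]


def match_path(nodes, edges, categories):
    """Return every path whose nodes match the given chain of categories.

    The query names only categories, never node identifiers.
    """
    category_of = {n["id"]: n["category"] for n in nodes}

    # Adjacency index: subject id -> list of object ids.
    out_edges = defaultdict(list)
    for e in edges:
        out_edges[e["subject"]].append(e["object"])

    # Seed with all nodes of the first category.
    paths = [(n["id"],) for n in nodes if n["category"] == categories[0]]

    # Extend each partial path by one hop per remaining category.
    for wanted in categories[1:]:
        paths = [
            path + (nxt,)
            for path in paths
            for nxt in out_edges.get(path[-1], ())
            if category_of.get(nxt) == wanted
        ]
    return paths


QUERY = ["biolink:ChemicalSubstance", "biolink:Gene", "biolink:Disease"]
RESULT = match_path(NODES, EDGES, QUERY)
```

With the sample data above, RESULT contains the single path from EX:chem1 through EX:gene1 to EX:disease1. The direct chemical-to-disease edge is skipped because the query requires a gene in the middle position.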