Software Knowledge Repository

Lars Blümke edited this page Sep 18, 2017 · 54 revisions

Motivation

Until now, the analysis has used files to store information about the observed system. The disadvantage was that when different analyses read and modified the information simultaneously, information could get lost or be overwritten. To solve this issue, we needed a knowledge repository that can handle concurrent access and different versions of the system. This article presents a solution created during the 2017 master project at Kiel University, which uses Neo4j to serialize and query the information.

Foundations

Palladio Component Model

The Palladio Component Model, which we will refer to as PCM in this article, is a metamodel for modelling component-based software systems. It allows modelling different domains of a software system with different submodels. You can get more detailed information about it from its developer's website or this technical report. The different PCM submodels are:

  • Repository Model, which includes, among other things, the components of a software system, their roles, the interfaces between them and the signatures of the interface methods.
  • System Model, which puts components and their roles in concrete contexts. For example, a component for placing orders in a web store could be used in the context of private orders and also in the context of business orders.
  • Resource Environment Model, which includes the resource containers representing the available hardware components.
  • Allocation Model, which represents the actual allocation of the software components to those hardware components.
  • Usage Model, which represents use cases of the entire software system.

Neo4j

Neo4j is a graph database, which means that it stores your data in labeled graphs instead of tables as a relational database does. You can find more detailed information and tutorials on the developer's website. Basically, a Neo4j graph consists of nodes and directed edges, which are called relationships in the context of Neo4j. Both can have multiple properties attached to them. Properties are key-value pairs. Additionally, nodes can have multiple labels attached to them. A label can be used to mark a certain role or a type.
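To make the terminology concrete, here is a minimal plain-Java sketch of a property graph with labeled nodes, directed typed relationships and key-value properties. It deliberately does not use the Neo4j API. The class names, the relationship type PROVIDES and the property key entityName are assumptions chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Plain-Java illustration of the property graph model; this is not the Neo4j API. */
final class PropertyGraphSketch {

    /** A node carries any number of labels and key-value properties. */
    static final class Node {
        final Set<String> labels = new LinkedHashSet<>();
        final Map<String, Object> properties = new LinkedHashMap<>();

        Node(String... labels) {
            Collections.addAll(this.labels, labels);
        }
    }

    /** A relationship is a directed, typed edge that can carry properties as well. */
    static final class Relationship {
        final Node start;
        final String type;
        final Node end;
        final Map<String, Object> properties = new LinkedHashMap<>();

        Relationship(Node start, String type, Node end) {
            this.start = start;
            this.type = type;
            this.end = end;
        }
    }

    final List<Node> nodes = new ArrayList<>();
    final List<Relationship> relationships = new ArrayList<>();

    Node createNode(String... labels) {
        Node node = new Node(labels);
        nodes.add(node);
        return node;
    }

    Relationship relate(Node start, String type, Node end) {
        Relationship relationship = new Relationship(start, type, end);
        relationships.add(relationship);
        return relationship;
    }

    /** Returns the end nodes of all relationships of the given type that leave the start node. */
    List<Node> outgoing(Node start, String type) {
        List<Node> result = new ArrayList<>();
        for (Relationship relationship : relationships) {
            if (relationship.start == start && relationship.type.equals(type)) {
                result.add(relationship.end);
            }
        }
        return result;
    }

    /** Builds a two-node example graph: (component)-[PROVIDES]->(interface). */
    static PropertyGraphSketch example() {
        PropertyGraphSketch graph = new PropertyGraphSketch();
        Node component = graph.createNode("BasicComponent");
        component.properties.put("entityName", "OrderComponent");
        Node operationInterface = graph.createNode("OperationInterface");
        operationInterface.properties.put("entityName", "OrderInterface");
        graph.relate(component, "PROVIDES", operationInterface);
        return graph;
    }

    public static void main(String[] args) {
        PropertyGraphSketch graph = example();
        Node component = graph.nodes.get(0);
        for (Node target : graph.outgoing(component, "PROVIDES")) {
            System.out.println(component.properties.get("entityName")
                    + " provides " + target.properties.get("entityName"));
        }
    }
}
```

Relationships are directed, so asking the interface node for outgoing PROVIDES relationships yields nothing in this example.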

In this project we used Neo4j 3.2.0 embedded in our Java application as described here. You can find the following line in the build.gradle file in the analysis root directory:

compile 'org.neo4j:neo4j:3.2.0'

If you continue working with Neo4j in this project, I recommend also using the Neo4j desktop application, which allows you to inspect the data in your database through an easy-to-use web interface that also provides a nice visualization of your graphs. Simply install the application, choose your database directory and open http://127.0.0.1:7474/ in your browser. If you choose to use the web interface, you will also have to take a look at Cypher, which is the query language for Neo4j graphs. For first steps

match (n) return n

will return all nodes and the relationships between them, while

match (n) detach delete n

will delete all nodes and relationships from the graph.

Getting started

Basic Components

To give you a basic understanding of the knowledge repository and how it works, we introduce the basic components in this section. The goal of this implementation is to store PCM Ecore objects in a Neo4j graph database. On the one hand, we have the Neo4j graph database, which is represented as a Graph object in the code and which we will simply refer to as a "Graph" in the following. On the other hand, we have a class called ModelProvider, which provides operations to create, read, update or delete PCM Ecore models in a Graph. A ModelProvider always belongs to exactly one Graph, whereas a Graph can be modified by an arbitrary number of ModelProviders.

The Graph

So how do you get a Graph and a ModelProvider? Well, it depends. With the pipe-and-filter architecture of iObserve's analysis, there is a chance that a Graph has already been created earlier and is passed to your stage, or that you even get an existing ModelProvider passed to your stage, as it was done here. In this case you can continue with the section about the ModelProvider.

If this is not the case and you are the lucky one to set up the database, don't worry. There is a class called GraphLoader which will make it easy for you to get a Graph from an existing Neo4j database directory on your hard drive or to initialize a new Graph with a PCM Ecore model.

A Neo4j graph database is basically a folder on your file system. This implementation stores each of the five PCM models in its own graph database, i.e. in its own folder. Additionally, different versions of each model are supported, which are again stored in different graph databases, i.e. different folders. To keep all these models and their versions ordered, the GraphLoader class uses the following file system structure. An arbitrary root directory is passed via the constructor of the GraphLoader.

GraphLoader graphLoader = new GraphLoader(new File("./basedir"));

In this base directory, the GraphLoader creates subfolders for the different model types, for example basedir/repositorymodel, basedir/systemmodel, basedir/resourceenvironmentmodel and basedir/usagemodel. In such a subfolder, the different versions of the particular model are stored in their Graph folders (which are the actual Neo4j database folders), named by the model type and the version, for example basedir/repositorymodel/repositorymodel_v1, basedir/repositorymodel/repositorymodel_v2, ... Note that these subfolders are only created when they are needed, i.e. when you call the appropriate method of the GraphLoader.

If you want to create a brand new database, or if you already have a database in one of the described folders, you can load the Graph with the get...ModelGraph method of the GraphLoader. If there are no Graph folders in the repositorymodel folder yet, this method will create a new one and return its Graph. Otherwise, it returns the Graph with the highest version number.

Graph repositoryModelGraph = graphLoader.getRepositoryModelGraph();
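The folder naming and the "highest version" lookup described above can be sketched in plain Java as follows. This is a hypothetical helper written for illustration, not the GraphLoader's actual code. The method names are invented, and the sketch assumes that version numbering starts at 1 and that versions are compared numerically.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

/** Hypothetical helper mirroring the described folder layout; not the actual GraphLoader code. */
final class GraphFolderLayoutSketch {

    /** Returns the folder of one version, e.g. basedir/repositorymodel/repositorymodel_v2. */
    static File versionDir(File baseDir, String modelType, int version) {
        return new File(new File(baseDir, modelType), modelType + "_v" + version);
    }

    /** Returns the highest version number found below baseDir/modelType, or 0 if there is none. */
    static int highestVersion(File baseDir, String modelType) {
        File[] children = new File(baseDir, modelType).listFiles(File::isDirectory);
        int highest = 0;
        if (children == null) {
            return highest; // The model type folder does not exist yet.
        }
        String prefix = modelType + "_v";
        for (File child : children) {
            String name = child.getName();
            if (!name.startsWith(prefix)) {
                continue;
            }
            try {
                int version = Integer.parseInt(name.substring(prefix.length()));
                highest = Math.max(highest, version);
            } catch (NumberFormatException e) {
                // Ignore folders that do not follow the naming scheme.
            }
        }
        return highest;
    }

    /** Returns the newest version folder and creates version 1 if no version exists yet. */
    static File latestOrNew(File baseDir, String modelType) throws IOException {
        int version = Math.max(1, highestVersion(baseDir, modelType));
        File dir = versionDir(baseDir, modelType, version);
        Files.createDirectories(dir.toPath());
        return dir;
    }

    public static void main(String[] args) throws IOException {
        File baseDir = Files.createTempDirectory("basedir").toFile();
        // No version exists yet, so version 1 is created.
        System.out.println(latestOrNew(baseDir, "repositorymodel").getName());
        // After adding version 10, the lookup picks it over lower versions.
        Files.createDirectories(versionDir(baseDir, "repositorymodel", 10).toPath());
        System.out.println(latestOrNew(baseDir, "repositorymodel").getName());
    }
}
```

Parsing the suffix as a number rather than sorting folder names as strings keeps repositorymodel_v10 ahead of repositorymodel_v2.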

If you already have a PCM Ecore model and want to store it in a Graph, you can also use the initialize...ModelGraph method which will store the model in the Graph and return the Graph including the model:

Repository repository = ...;
Graph repositoryModelGraph = graphLoader.initializeRepositoryModelGraph(repository);

Note that in this case an existing model in the Graph's folder will be overwritten.

The ModelProvider

Now that you have a Graph, you can finally take a look at the most important component of this implementation, the ModelProvider. You can always create a new ModelProvider with its constructor. The following code creates a new ModelProvider for the Repository component of the PCM:

ModelProvider<Repository> modelProvider = new ModelProvider<>(graph);

Note that you have to specify the type of the PCM component and the Graph where you want to store it or read it from. The Repository component is the root of the PCM Repository model, so it contains everything that is contained in this model. However, you might only be interested in certain parts of a model and not want to read the whole thing every time. For example, instead of reading the whole Repository model, you might just want to read a certain OperationInterface. No problem, just create a suitable ModelProvider:

ModelProvider<OperationInterface> modelProvider = new ModelProvider<>(graph);
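The idea of several typed providers working on one shared Graph can be illustrated with a small in-memory sketch. Everything here is a stand-in: Store replaces the Graph, Provider replaces the ModelProvider, and the create, read and delete methods are hypothetical names, not the project's API. Unlike the constructor shown above, the sketch passes a class token explicitly so that it can check types at runtime.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Illustration only: in-memory stand-ins, not the project's ModelProvider or Graph. */
final class TypedProviderSketch {

    /** Stand-in for the shared Graph: stores objects of any type by id. */
    static final class Store {
        final Map<String, Object> objectsById = new LinkedHashMap<>();
    }

    /** Minimal stand-ins for two PCM types. */
    static final class Repository {
        final String name;

        Repository(String name) {
            this.name = name;
        }
    }

    static final class OperationInterface {
        final String name;

        OperationInterface(String name) {
            this.name = name;
        }
    }

    /** A provider is bound to exactly one store and one element type. */
    static final class Provider<T> {
        private final Store store;
        private final Class<T> type;

        Provider(Store store, Class<T> type) {
            this.store = store;
            this.type = type;
        }

        void create(String id, T element) {
            store.objectsById.put(id, element);
        }

        /** Returns the element only if it exists and has this provider's type. */
        Optional<T> read(String id) {
            Object found = store.objectsById.get(id);
            return type.isInstance(found) ? Optional.of(type.cast(found)) : Optional.empty();
        }

        /** Removes the element only if it has this provider's type. */
        void delete(String id) {
            if (type.isInstance(store.objectsById.get(id))) {
                store.objectsById.remove(id);
            }
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        Provider<Repository> repositories = new Provider<>(store, Repository.class);
        Provider<OperationInterface> interfaces = new Provider<>(store, OperationInterface.class);

        repositories.create("r1", new Repository("bookstore"));
        interfaces.create("i1", new OperationInterface("catalog"));

        // The interface provider reads single interfaces without touching the repository root.
        System.out.println(interfaces.read("i1").isPresent()); // true
        System.out.println(interfaces.read("r1").isPresent()); // false
    }
}
```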

Implementation Details

Let's take a look at the most important classes that you will find in this package.

ModelProvider

This class contains the logic to create, read, update or delete EObjects in the graph database. How it is used is discussed later in this article. The key feature of this class is that it is generic, i.e. all types and subtypes of PCM components are generally treated the same.

Graph

Graph is a container class that bundles the Neo4j graph database service with its storage path in the file system.

GraphLoader

This class is responsible for loading graphs, initializing graphs with given models or creating new versions of a graph. The basis for these methods is a predefined structure on the file system.

ModelProviderSynchronizer

Existing Tests for the Model Provider

An example bookstore application

TestModelBuilder

Existing test cases

Development Log

This log was created during the implementation and documents the development process.

Development Goals

1. Set up and Get Familiar with the Technologies to Be Used

The goal is to gain a general idea of the technologies to be used: inspect the Palladio metamodel and the iObserve analysis, get familiar with Neo4j and set everything up. A small prototype (can be found here) was implemented to get familiar with Neo4j and to get a first impression of how models could be mapped to graphs. This prototype can be expanded in the future to test features in a small environment.

2. Implementation of a First Version of an Embedded Neo4j Database to Store PCM Models

In this step the main focus is on

  1. serializing the different types of Palladio models in a Neo4j graph
  2. deserializing such a graph back to a model

For this reason a mapping from model to graph must be defined and implemented. The implementation takes place on this branch.
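As a rough idea of what such a mapping involves, the following sketch serializes a small containment tree into nodes and directed edges and rebuilds it again. It shows one possible mapping under simplifying assumptions and is not the mapping implemented in this project. Element, FlatGraph and the CONTAINS edge are hypothetical stand-ins, and cross-references between elements are not handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One possible model-to-graph mapping for containment trees; not the project's mapping. */
final class ModelGraphMappingSketch {

    /** Stand-in for a model element with one attribute and containment references. */
    static final class Element {
        final String name;
        final List<Element> children = new ArrayList<>();

        Element(String name) {
            this.name = name;
        }
    }

    /** Flat graph form: node id to name property, plus CONTAINS edges as {parentId, childId}. */
    static final class FlatGraph {
        final Map<Integer, String> nodeNames = new HashMap<>();
        final List<int[]> containsEdges = new ArrayList<>();
    }

    /** Serializes the element tree depth-first and returns the id assigned to the element. */
    static int serialize(Element element, FlatGraph graph) {
        int id = graph.nodeNames.size(); // Next free id, reserved before visiting children.
        graph.nodeNames.put(id, element.name);
        for (Element child : element.children) {
            int childId = serialize(child, graph);
            graph.containsEdges.add(new int[] {id, childId});
        }
        return id;
    }

    /** Rebuilds the element with the given id together with everything it contains. */
    static Element deserialize(int id, FlatGraph graph) {
        Element element = new Element(graph.nodeNames.get(id));
        for (int[] edge : graph.containsEdges) {
            if (edge[0] == id) {
                element.children.add(deserialize(edge[1], graph));
            }
        }
        return element;
    }

    /** Builds a small example tree: a repository containing a component and an interface. */
    static Element exampleModel() {
        Element repository = new Element("repository");
        Element component = new Element("component");
        component.children.add(new Element("role"));
        repository.children.add(component);
        repository.children.add(new Element("interface"));
        return repository;
    }

    public static void main(String[] args) {
        FlatGraph graph = new FlatGraph();
        int rootId = serialize(exampleModel(), graph);
        Element copy = deserialize(rootId, graph);
        System.out.println(copy.name + " contains " + copy.children.size() + " elements");
    }
}
```

Because deserialize can start at any node id, the same structure also allows reading only a part of the model instead of the whole tree.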

3. Design and Implementation of a Database API

The API shall provide access to the Neo4j graph database and offer an alternative to the existing implementation, which uses files to store the models and ModelProvider classes to access them. Key features of the new API will be:

  1. CRUD Operations: The API shall provide basic CRUD operations.
  2. Basic versioning: The database shall include the current version of the models as well as several modified versions of these current models resulting from the analysis.
  3. Partial access: Instead of always having to read the complete model from the database, requesting single parts shall be possible.

4. Test the API

All operations finally provided by the API have to be tested to make sure they work as intended and, for example, do not damage the models.

Schedule

Goals:

  • 1.6.
      + Implement partial create/read methods for repository model (reading without datatypes)
  • 8.6.
      + Implement read with datatypes
  • 15.6.
      + Implement delete
      + Test compatibility with other model types
  • 22.6.
      + Fix compatibility with other model types
      + Implement update (using delete and create)
      + Reimplement old providers with new API for compatibility
  • 29.6.
      - Use EMF proxy mechanism for references to other models
      + Writing: store URI relative to root in the graph
      + Reading: create "proxy object" using the URI stored in the node
      - With partial reading no Resource objects are created, so the EMF proxy mechanism is not 100% applicable
      + Come up with a concept for model versioning
  • 6.7.
      + Create concept for the basic lock mechanism with read-only and read-and-edit methods we discussed
      + Implement versioning
  • 13.7.
      + Implement basic lock mechanism
      + New update method (no longer delete + create) which now also works correctly with partial updates
  • 20.7.
      + Study for exams (17.7. + 21.7.)
  • 27.7.
      + Study for exam (26.7.)
  • 3.8. until 30.9.
      + Write tests
      - Write wiki
      + Fix bugs