Machine Learning based Recognition
This module annotates the tokenized judgment with the help of machine learning (GermaNER) and processes those annotations so that they can be used as anonymizations. After all anonymizations found have been reviewed and corrected by a human, training data is built from this information and appended to the existing training data, so that the model can later be retrained to improve the pre-annotation. GermaNER is also used for retraining the model.
To run the machine learning module (e.g. with `spring-boot:run`), complete the following setup steps:
- Clone anonml-core from https://github.com/anon-ml/anonml-core.git
  - run `mvn clean install` to install it in your local Maven repository
- Clone https://github.com/seyyaw/cleartk
  - run `mvn clean install -Dmaven.test.skip=true` to install ClearTK in your local Maven repository
- Clone https://github.com/tudarmstadt-lt/GermaNER
  - run `mvn clean install -Drat.skip=true` to install GermaNER in your local Maven repository
- Clone anonml-recognition-ml from https://github.com/anon-ml/anonml-recognition-ml.git
  - run `mvn clean install`
  - Download the feature file
  - Place it in ./src/main/resources/GermaNER of the service module of the cloned anonml-recognition-ml project
  - Open AnnotationService.java (anonml-recognition-ml/service/src/main/java/ml/anon/recognition/machinelearning/service/AnnotationService.java)
  - Edit the `pathToModel` constant (it should hold the path to the model.jar, which by default is located in anonml-recognition-ml/service/src/main/resources/GermaNER/model/)
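Before starting the service, it can help to confirm that the path you intend to put into `pathToModel` really points to the model file. The following is a minimal sketch of such a check, not code from the project: the class `ModelPathCheck`, the constant `PATH_TO_MODEL` and the helper `isUsableModelFile` are hypothetical names, and it assumes that the constant holds the full path to `model.jar` rather than only its directory.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ModelPathCheck {
    // Hypothetical example value: the default model directory named in the
    // setup steps plus the file name model.jar. The real constant lives in
    // AnnotationService.java and may be declared differently.
    static final String PATH_TO_MODEL =
            "anonml-recognition-ml/service/src/main/resources/GermaNER/model/model.jar";

    // Returns true if the candidate is an existing, readable regular file.
    static boolean isUsableModelFile(Path candidate) {
        return Files.isRegularFile(candidate) && Files.isReadable(candidate);
    }

    public static void main(String[] args) {
        // Use the first argument if given, otherwise the example value above.
        String raw = args.length > 0 ? args[0] : PATH_TO_MODEL;
        Path candidate = Paths.get(raw).toAbsolutePath().normalize();
        String verdict = isUsableModelFile(candidate)
                ? " looks usable"
                : " is missing or unreadable";
        System.out.println(candidate + verdict);
    }
}
```

Run it with the path you plan to use as the only argument. A relative path is resolved against the current working directory, which is also why an absolute path is the safer choice for the constant.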
The following API paths expose the functionality of the machine learning module. The service is accessible at http://localhost:9003.
Method | Path | Result | Comment |
---|---|---|---|
GET | /ml/get/evaluation/data/ | evaluation data (F_1, Precision, Recall) | |
GET | /ml/get/evaluation/reset/ | the reset evaluation data | |
GET | /ml/get/doc/evaluation/{documentId} | evaluation data of the requested document | |
GET | /ml/get/training/data/ | training data (as String) | In format of GermaNER |
GET | /ml/retrain/ | true if retraining was successful | starts the retraining process of GermaNER with the saved training data |
GET | /ml/retrain/status | the time the training started (as String) | |
POST | /ml/annotate/{id} | a list of Anonymization objects | expected parameter is the id of the current document |
POST | /ml/update/training/data/{id} | true if appending the training data was successful | expected parameter is the id of the reviewed and saved document |
POST | /ml/calculate/f/one/{id} | true if the calculation worked | expected parameters are the id of the reviewed and saved document and a list of correct Anonymization objects |
POST | /ml/post/training/data/{resetOld}/ | true if the sent training data was appended | expected parameters are a string with training data in GermaNER format and a boolean indicating whether it should overwrite the existing training data |
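The paths above can be called from any HTTP client. The following sketch builds requests for a few of them with the JDK's `java.net.http` client (Java 11 or later). The class and method names are hypothetical, and it assumes that the POST paths taking only an id accept an empty request body, which the table does not state.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class MlServiceClientSketch {
    // Base URL as stated above; change it if the service runs elsewhere.
    static final String BASE_URL = "http://localhost:9003";

    // Builds a GET request for one of the GET paths in the table.
    static HttpRequest get(String path) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + path)).GET().build();
    }

    // Builds a POST request without a body (assumption: id-only paths need none).
    static HttpRequest postWithoutBody(String path) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + path))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    static HttpRequest retrainStatus() {
        return get("/ml/retrain/status");
    }

    static HttpRequest evaluationData() {
        return get("/ml/get/evaluation/data/");
    }

    static HttpRequest annotate(String documentId) {
        return postWithoutBody("/ml/annotate/" + documentId);
    }

    static HttpRequest updateTrainingData(String documentId) {
        return postWithoutBody("/ml/update/training/data/" + documentId);
    }

    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = retrainStatus();
        try {
            // Sends the request and prints the status code and raw body.
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        } catch (IOException e) {
            // Reached when the service is not running or not reachable.
            System.out.println("Service not reachable at " + request.uri() + ": " + e);
        }
    }
}
```

The request builders are kept separate from the sending code so that the paths can be checked without a running service. The endpoints that expect a list of Anonymization objects or a training data string are left out, because their request format is not documented here.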