Machine Learning based Recognition

Matthias Schildwächter edited this page Oct 17, 2017 · 8 revisions

Description

This module annotates the tokenized judgment using machine learning (GermaNER) and processes the resulting annotations so that they can be used as anonymizations. After all detected anonymizations have been reviewed and corrected by a human, training data is built from this information and appended to the existing training data, so that the model can later be retrained to improve the pre-annotation. GermaNER's functionality is also used for retraining the model.
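To make the "build training data and append it" step more concrete, here is a minimal sketch. It assumes the GermaNER training format is one token per line followed by a tab and its label, with a blank line between sentences; the class TrainingDataSketch, the LabeledToken type, and the example labels are hypothetical and not taken from the module's code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of anonml-recognition-ml.
class TrainingDataSketch {

    // One reviewed token together with its label (e.g. "B-PER", "O").
    static final class LabeledToken {
        final String token;
        final String label;

        LabeledToken(String token, String label) {
            this.token = token;
            this.label = label;
        }
    }

    // Assumed format: "token<TAB>label" per line, blank line after each sentence.
    static String toTrainingData(List<List<LabeledToken>> sentences) {
        StringBuilder sb = new StringBuilder();
        for (List<LabeledToken> sentence : sentences) {
            for (LabeledToken t : sentence) {
                sb.append(t.token).append('\t').append(t.label).append('\n');
            }
            sb.append('\n'); // sentence boundary
        }
        return sb.toString();
    }

    // Appends newly reviewed data to the training data collected so far.
    static String append(String existing, String reviewed) {
        return existing + reviewed;
    }

    public static void main(String[] args) {
        List<LabeledToken> sentence = Arrays.asList(
                new LabeledToken("Max", "B-PER"),
                new LabeledToken("Mustermann", "I-PER"),
                new LabeledToken("klagt", "O"));
        System.out.print(append("", toTrainingData(Arrays.asList(sentence))));
    }
}
```

In the real module the reviewed document would supply the tokens and labels; the sketch only illustrates the shape of the data that is appended before retraining.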

Installation

To run the machine learning module (e.g. with spring-boot:run), complete the following setup steps:

Install anonml-core

  1. Clone anonml-core from https://github.com/anon-ml/anonml-core.git
  2. Run mvn clean install to install it in your local Maven repository

Install Cleartk (adjusted version)

  1. Clone https://github.com/seyyaw/cleartk
  2. Run mvn clean install -Dmaven.test.skip=true to install Cleartk in your local Maven repository

Install GermaNER

  1. Clone https://github.com/tudarmstadt-lt/GermaNER
  2. Run mvn clean install -Drat.skip=true to install GermaNER in your local Maven repository

Install the project itself

  1. Clone anonml-recognition-ml from https://github.com/anon-ml/anonml-recognition-ml.git
  2. Run mvn clean install

Feature file

  1. Download the feature file
  2. Place it in ./src/main/resources/GermaNER of the service module of the cloned anonml-recognition-ml project

Model.jar position

  1. Open AnnotationService.java (anonml-recognition-ml/service/src/main/java/ml/anon/recognition/machinelearning/service/AnnotationService.java)
  2. Edit the pathToModel constant so that it holds the path to model.jar (by default, model.jar is located in anonml-recognition-ml/service/src/main/resources/GermaNER/model/)
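As a rough illustration, the constant might look like the sketch below. Only the name pathToModel and the default directory come from this page; the class ModelPathSketch, the relative path, and the existence check are assumptions added for illustration and do not reproduce the actual AnnotationService.java.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-in for the relevant part of AnnotationService.java.
class ModelPathSketch {

    // Assumption: the path is resolved relative to the directory
    // containing the cloned anonml-recognition-ml project.
    static final String pathToModel =
            "anonml-recognition-ml/service/src/main/resources/GermaNER/model/model.jar";

    // Returns true if a regular file exists at the given path.
    static boolean modelExists(String path) {
        return Files.isRegularFile(Paths.get(path));
    }

    public static void main(String[] args) {
        Path resolved = Paths.get(pathToModel).toAbsolutePath();
        System.out.println("Model expected at: " + resolved);
        System.out.println("Found: " + modelExists(pathToModel));
    }
}
```

Running such a check before starting the service can reveal a wrong path early, rather than at the first annotation request.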

API

The following API paths give access to the functionality of the machine learning module. The ML-based service is accessible at http://localhost:9003.

| Method | Path | Result | Comment |
| --- | --- | --- | --- |
| GET | /ml/get/evaluation/data/ | evaluation data (F1, precision, recall) | |
| GET | /ml/get/evaluation/reset/ | the reset evaluation data | |
| GET | /ml/get/doc/evaluation/{documentId} | evaluation data of the requested document | |
| GET | /ml/get/training/data/ | training data (as String) | in GermaNER format |
| GET | /ml/retrain/ | true if retraining was successful | starts the retraining process of GermaNER with the saved training data |
| GET | /ml/retrain/status | the time the training started (as String) | |
| POST | /ml/annotate/{id} | a list of Anonymization objects | expected parameter is the ID of the current document |
| POST | /ml/update/training/data/{id} | true if the training data was appended successfully | expected parameter is the ID of the reviewed and saved document |
| POST | /ml/calculate/f/one/{id} | true if the calculation worked | expected parameters are the ID of the reviewed and saved document and a list of correct Anonymization objects |
| POST | /ml/post/training/data/{resetOld}/ | true if the sent training data was appended | expected parameters are a string with training data in GermaNER format and a boolean indicating whether the data should overwrite the old training data |
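The paths above can be called from any HTTP client. The following sketch uses the JDK's java.net.http client (Java 11 or later) to build requests for three of them. The class MlServiceClientSketch and its method names are hypothetical, and sending an empty body with the POST to /ml/annotate/{id} is an assumption, since this page only names the ID as a parameter.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client sketch, not part of anonml-recognition-ml.
class MlServiceClientSketch {

    static final String BASE_URL = "http://localhost:9003";

    // GET /ml/get/evaluation/data/
    static HttpRequest evaluationData() {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/ml/get/evaluation/data/"))
                .GET()
                .build();
    }

    // GET /ml/retrain/
    static HttpRequest retrain() {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/ml/retrain/"))
                .GET()
                .build();
    }

    // POST /ml/annotate/{id}; assumes the ID needs no URL encoding
    // and that no request body is required.
    static HttpRequest annotate(String documentId) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/ml/annotate/" + documentId))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // Sends a request and returns the raw response body as a String.
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Only prints the request; call send(request) while the service is running.
        HttpRequest request = annotate("someDocumentId");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

The request builders are kept separate from send so that the URLs can be inspected without a running service; the response format (for example, how the list of Anonymization objects is serialized) is not specified here and would need to be checked against the service itself.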