Creating an ElasticSearch index from UIMA analysis results

Erik edited this page Oct 18, 2016 · 2 revisions

We use the semedico-app UIMA pipeline(s) to create an ElasticSearch index on the basis of UIMA NLP analysis results.

The semedico-app uses the internal JULIE Lab database reader to retrieve the NLP analysis results from our PostgreSQL database. The database is populated by the jules-preprocessing-pipelines project.

The core mechanics for creating an ElasticSearch index are provided by the jules-cas-to-elasticsearch-consumer. This project offers code that makes it reasonably easy to create Document objects. These documents are modelled closely on ElasticSearch / Lucene documents: each document is essentially a collection of named fields. A document can therefore be thought of as a row in a conventional database table, with the field names as columns.
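To make the "collection of named fields" idea concrete, here is a minimal Python sketch of what such a document could look like once serialized. It does not use the jules-cas-to-elasticsearch-consumer API (which is Java); the field names and values are made up purely for illustration.

```python
import json

# A document is a collection of named fields, comparable to one table row.
# All field names and values below are hypothetical examples.
document = {
    "docid": "12345",
    "title": "An example title",
    "text": "Body text from which the NLP annotations were derived.",
    "genes": ["BRCA1", "TP53"],  # a single field may hold several values
}

# Serialized as JSON, the document has the shape ElasticSearch accepts for indexing.
as_json = json.dumps(document, sort_keys=True)
print(as_json)
```

Unlike a strict table row, a field may carry multiple values, as the `genes` field does here.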

Documents are created by writing extensions of the FieldsGenerator class and using them with either JsonWriter (which creates JSON files and is very useful when developing FieldsGenerators) or ElasticSearchConsumer.
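The pattern described above can be sketched roughly as follows. This is only an illustration in Python under stated assumptions: the real FieldsGenerator, JsonWriter and ElasticSearchConsumer are Java classes whose signatures are not shown on this page, so every class, method and field name below (`FieldsGeneratorSketch`, `add_fields`, `build_document`, `gene_mentions`, and so on) is hypothetical.

```python
import json
from abc import ABC, abstractmethod


class FieldsGeneratorSketch(ABC):
    """Hypothetical stand-in for the FieldsGenerator idea, not the real Java class."""

    @abstractmethod
    def add_fields(self, analysis: dict, document: dict) -> dict:
        """Add fields derived from the analysis result and return the document."""


class TitleFieldsGenerator(FieldsGeneratorSketch):
    def add_fields(self, analysis: dict, document: dict) -> dict:
        document["title"] = analysis["title"]
        return document


class GeneFieldsGenerator(FieldsGeneratorSketch):
    def add_fields(self, analysis: dict, document: dict) -> dict:
        # Store unique gene mentions, sorted so that the output is stable.
        document["genes"] = sorted(set(analysis["gene_mentions"]))
        return document


def build_document(analysis: dict, generators: list) -> dict:
    """Run each generator in turn; each one contributes its own fields."""
    document = {}
    for generator in generators:
        document = generator.add_fields(analysis, document)
    return document


# A made-up analysis result standing in for the UIMA annotations.
analysis = {"title": "An example title", "gene_mentions": ["TP53", "BRCA1", "TP53"]}
doc = build_document(analysis, [TitleFieldsGenerator(), GeneFieldsGenerator()])

# Writing the result as JSON mirrors the development workflow with a JSON writer.
doc_json = json.dumps(doc, indent=2, sort_keys=True)
print(doc_json)
```

Inspecting the JSON output first makes it easy to check the generated fields before sending documents to an actual index.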

Currently, we need to write an appropriate FieldsGenerator for the index format that GePi will use.
