
Commit

add simple lemmatizer token filter, move versions to gradle properties, clean docs dir
jprante committed Feb 27, 2017
1 parent 87fc8b3 commit 93ed7cb
Showing 122 changed files with 1,233 additions and 59,324 deletions.
28 changes: 26 additions & 2 deletions CREDITS.txt
@@ -1,4 +1,28 @@
-Thanks to David Weiss for
+The plugin bundle wouldn't be possible without the hard work of many authors
+who generously published their work under an open source license.
 
-https://github.com/dweiss/compound-splitter
+This file should contain all the credits to them. If you miss a credit, please
+notify me about it and it will be added as soon as possible.
 
+The ICU analysis is heavily based on Apache Lucene ICU
+
+https://github.com/apache/lucene-solr/tree/master/lucene/analysis/icu
+
+The AutoPhraseTokenFilter is derived from
+
+https://github.com/lucidworks/auto-phrase-tokenfilter
+
+The ConcatTokenFilter is authored by Sujit Pal and was taken from
+
+http://sujitpal.blogspot.de/2011/07/lucene-token-concatenating-tokenfilter_30.html
+
+The Decompound token filter is a reworked implementation of the
+link:http://wortschatz.uni-leipzig.de/~cbiemann/software/toolbox/Baseforms%20Tool.htm[Baseforms Tool]
+found in the http://wortschatz.uni-leipzig.de/~cbiemann/software/toolbox/index.htm[ASV toolbox]
+of http://asv.informatik.uni-leipzig.de/staff/Chris_Biemann[Chris Biemann],
+Automatische Sprachverarbeitung of Leipzig University.
+
+The FSA in package org.xbib.elastixsearch.common.fsa which provides the dictionary structure for
+the baseform tokenizer is a derived version of
+
+https://github.com/morfologik/morfologik-stemming/tree/master/morfologik-fsa/src/main/java/morfologik/fsa