All JavaScript stemmers have been transpiled from the Java implementation of the Snowball stemming algorithms using the ESJava transpiler.
This project not only provides pre-built JavaScript stemmers, but also lets you create new ones.
Stemmers for 20+ languages are packed in one file in two ECMAScript standards:
You can test the stemmers directly in the online demo.
Because the ESJava transpiler has several limitations, the build process has to be complemented by pre- and post-transpiling tweaks.
- Unix-like OS (or Cygwin on Windows)
- Node.js + npm
- rsync (for syncing the Snowball repository; required only in specific scenarios)
- perl (for generating Java code from Snowball algorithm (SBL) files; required only in specific scenarios)
- Building Java stemmers from most recent Snowball stemmers
- Creating a Java bundle
- Tweaking the Java bundle
- Transpiling the Java bundle to JavaScript
- Modifying the transpiled JavaScript
- Building Java stemmers from most recent Snowball stemmers
- Building Java stemmers from custom Snowball stemmers
- Creating a Java bundle
- Adding custom Java stemmers into the bundle
- Tweaking the Java bundle
- Transpiling the Java bundle to JavaScript
- Modifying the transpiled JavaScript
git clone https://github.com/mazko/jssnowball.git
cd jssnowball/
make bundle
- Change directory to jssnowball/snowball-master/
- Create a new subfolder in the algorithms folder and copy the given SBL file there, renamed to stem_Unicode.sbl
- Add the stemmer configuration to libstemmer/modules.txt and libstemmer/modules_utf8.txt
- Add the stemmer to the GNUmakefile's libstemmer_algorithms variable
- Compile Snowball using make dist
Because ESJava can convert only a single file, all Java source files have to be bundled first.
git checkout -- js_snowball/eclipse/
make bundle
Copy the Java stemmer code from jssnowball/snowball-master/java/org/tartarus/snowball/ext/ into jssnowball/js_snowball/lib/snowball.bundle.java.
It is also recommended to remove unused code such as copy_from, hashCode, etc. Here is an example for Eclipse EE Mars.1 Release (4.5.1):
source -> cleanup
Some Java constructs, such as reflection, cannot be translated to JavaScript directly. Such fragments have to be tweaked a bit.
Fortunately, most of them are in the common code, not in the stemmers themselves (except for finnishStemmer). They are wrapped inside :es6: code :end: and should be edited as suggested in the comments.
On top of that, these further tweaks are required:
- removing package names in method references (org.tartarus.snowball, java.lang)
- removing some overloaded methods
The result should match the original snowball.bundle.java file.
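To illustrate the first tweak, the sketch below strips the two package prefixes from a fragment of Java source held in a JavaScript string. The helper name stripPackagePrefixes and the sample input are hypothetical, and the project describes this step as a manual edit rather than a scripted one. A plain text replacement like this also rewrites package and import statements and deeper package paths, so any result should be reviewed by hand.

```javascript
// Hypothetical helper (not part of the project): drop fully qualified
// prefixes so that `java.lang.StringBuilder` becomes `StringBuilder`.
const PREFIXES = ['org.tartarus.snowball.', 'java.lang.'];

function stripPackagePrefixes(javaSource) {
  // split/join performs a literal (non-regex) replace of every occurrence.
  return PREFIXES.reduce(
    (text, prefix) => text.split(prefix).join(''),
    javaSource
  );
}

// Made-up sample line, used only for illustration.
const sample = 'java.lang.StringBuilder sb = new java.lang.StringBuilder();';
console.log(stripPackagePrefixes(sample));
// prints "StringBuilder sb = new StringBuilder();"
```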
npm i -g esjava babel-cli
npm i babel-preset-es2015 babel-plugin-transform-es2015-modules-umd
make esjava
In the final JavaScript files (stored in the jssnowball/js_snowball/lib/ directory), s.length() must be replaced with s.length in the eq_s and eq_s_b methods. Otherwise the code throws TypeError: s.length is not a function.
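The error is easy to reproduce in isolation: a JavaScript string exposes length as a property, whereas Java's String.length() is a method. The sketch below first shows the failure, then a hypothetical helper, patchLengthCalls, that rewrites s.length() to s.length in a source string. The helper name and the regular expression are assumptions made for illustration and are not part of the project's build. The replacement is applied to the whole input text, so pass it only the bodies of eq_s and eq_s_b, or review the diff afterwards.

```javascript
// In JavaScript, `length` is a property of strings, so calling it throws.
function callLengthAsMethod(s) {
  try {
    return s.length();
  } catch (e) {
    return e.name; // name of the thrown error
  }
}

// Hypothetical helper (not part of jssnowball): rewrite `s.length()` to
// `s.length`. The leading \b leaves calls such as `this.length()` or
// `xs.length()` untouched, because there `s` is not a standalone identifier.
function patchLengthCalls(source) {
  return source.replace(/\bs\.length\(\)/g, 's.length');
}

console.log(callLengthAsMethod('abc'));
// prints "TypeError"
console.log(patchLengthCalls('if (this.limit - this.cursor < s.length()) return false;'));
// prints "if (this.limit - this.cursor < s.length) return false;"
```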