This is a really clumsy attempt at porting Sphinx built-in stemmers to PHP without losing much in performance.
Almost every stemming algorithm that's implemented in Sphinx as of 2.0.6-release version is available here as well (stem_cz
is an exception).
Building procedure for Unix-like systems is quite usual, but you need to build libsphinx
first:
cd <extension path>
./build-libsphinx.sh
phpize
./configure
make
sudo make install
Make sure you've got wget
and autoconf
installed, build-libsphinx.sh
and phpize
scripts just won't work otherwise!
Also, C++ compiler is required to build this extension (and libsphinx
too). GCC is known to work.
If you build this as a shared library, do not forget to add yaus.so
to your php.ini
file!
API is pretty old-fashioned, this may be changed in the future.
<?php
/**
* gets the stem of a russian word
*/
function stemword_ru($russian_word);
/**
* gets the stem of an english word
*/
function stemword_en($english_word);
/**
* gets the stem of a russian or english word
*/
function stemword_enru($russian_or_english_word);
/**
* gets the stem of an english word using soundex algorithm
*/
function stemword_soundex($english_word);
/**
* gets the stem of an english word using metaphone algorithm
*/
function stemword_dmetaphone($english_word);
/**
* initializes the snowball stemmer
* returns a resource
*/
function stemword_snowball_new($algorithm, $encoding = 'UTF_8');
/**
* gets the stem of a word using stemmer resource
*/
function stemword_snowball_stem($stemmer, $word);
/**
* gets the list of available stemming algorithms via libstemmer
*/
function stemword_snowball_algorithm_list();
/**
* frees the stemmer resource
*/
function stemword_snowball_delete($stemmer);
-
v 0.5.0
-
PHP 7 compatible
-
$is_utf8
parameter is eliminated -
windows builds are not supported for the time being, as building sphinx 2.2.10 with VC14 requires some serious patching
-
-
v 0.4.0
-
built-in russian stemmer for cp1251 encoding is not supported anymore
-
$is_utf8
parameter is ignored for now -
this is the last version compatible with PHP 5
-
-
v 0.3.3
-
updated libsphinx to 2.2.10
-
changed
stemword_dmetaphone
to be compatible with new libsphinx -
fixed building procedure yet again
-
-
v 0.3.2
- fixed building procedure with new automake
-
v 0.3.1
- fixed phpinfo() formatting
-
v 0.3.0
-
added the rest of builtin sphinx stemmers (except
stem_cz
) -
added stemmers from
libsphinx_c
-
fixed memory allocation for strings that passed into russian utf-8 stemmer
-
added Windows support (build process is not documented yet); prebuilt shared library is available for download
-
-
v 0.2.1
Minor portability fixes.
-
v 0.2.0
Added
$is_utf8
parameter to stem functions for russian words. -
v 0.1
Yay, first release!
-
building guide for Windows
-
lemmatizer interface
-
OOP-style interface