🍅 tomoto: high-performance topic modeling for Ruby
Add this line to your application’s Gemfile:
gem "tomoto"
Train a model
model = Tomoto::LDA.new(k: 2)
model.add_doc(["tokens", "from", "document", "one"])
model.add_doc(["tokens", "from", "document", "two"])
model.add_doc(["tokens", "from", "document", "three"])
model.train(100) # iterations
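add_doc takes a document that has already been split into tokens. Below is a minimal sketch of how raw text could be prepared. It assumes that lowercasing and keeping only runs of letters and digits is good enough for your corpus. `tokenize` is a hypothetical helper, not part of tomoto.

```ruby
# Hypothetical helper, not part of tomoto: lowercases text and splits it
# into runs of letters and digits, dropping punctuation and whitespace.
def tokenize(text)
  text.downcase.scan(/[[:alnum:]]+/)
end

docs = [
  "Tokens from document one.",
  "Tokens from document two!",
  "Tokens from document three?"
]

tokenized = docs.map { |text| tokenize(text) }
p tokenized.first # prints ["tokens", "from", "document", "one"]

# With the gem installed, each token array can then be added to the model:
#   tokenized.each { |tokens| model.add_doc(tokens) }
```

Real corpora usually also need stop-word removal or stemming. Swap in whatever tokenizer fits your language.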
Get the summary
model.summary
Get topic words
model.topic_words
Save the model to a file
model.save("model.bin")
Load the model from a file
model = Tomoto::LDA.load("model.bin")
Get topic probabilities for a document
doc = model.docs[0]
doc.topics
Get the number of words for each topic
model.count_by_topics
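You may want each topic's share of all assigned words instead of raw counts. The hypothetical `topic_shares` helper below normalizes them. This sketch assumes count_by_topics returns an array of word counts indexed by topic id, so check the actual return value before relying on it.

```ruby
# Hypothetical helper, not part of tomoto. Assumes `counts` is an array
# with one word count per topic, indexed by topic id.
def topic_shares(counts)
  total = counts.sum
  # Avoid dividing by zero when no words have been assigned yet
  return counts.map { 0.0 } if total.zero?

  counts.map { |c| c.to_f / total }
end

p topic_shares([6, 2]) # prints [0.75, 0.25]

# With a trained model (assumed return shape):
#   topic_shares(model.count_by_topics)
```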
Get the vocab
model.vocabs
Get the log likelihood per word
model.ll_per_word
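Per-word log likelihood is often reported as perplexity, which is exp(-ll_per_word). Lower perplexity means the model assigns higher probability to the words it sees. Below is a minimal sketch with a hypothetical helper. It assumes ll_per_word is a natural-log value.

```ruby
# Hypothetical helper, not part of tomoto: converts a per-word
# natural-log likelihood into perplexity.
def perplexity_from_ll(ll_per_word)
  Math.exp(-ll_per_word)
end

puts perplexity_from_ll(-2.0).round(3) # prints 7.389

# With a trained model:
#   perplexity_from_ll(model.ll_per_word)
```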
Perform inference for unseen documents
doc = model.make_doc(["unseen", "doc"])
topic_dist, ll = model.infer(doc)
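A common next step is to pick the most likely topic for the unseen document. The sketch below assumes `topic_dist` is an array of per-topic probabilities indexed by topic id. `dominant_topic` is a hypothetical helper, not part of tomoto.

```ruby
# Hypothetical helper, not part of tomoto. Assumes `topic_dist` is an array
# of per-topic probabilities indexed by topic id, as returned by infer.
# Returns [topic_id, probability], or nil for an empty distribution.
def dominant_topic(topic_dist)
  return nil if topic_dist.empty?

  topic = topic_dist.each_index.max_by { |i| topic_dist[i] }
  [topic, topic_dist[topic]]
end

topic, prob = dominant_topic([0.2, 0.7, 0.1])
puts "topic #{topic} (p = #{prob})" # prints topic 1 (p = 0.7)

# With a trained model:
#   topic_dist, ll = model.infer(model.make_doc(["unseen", "doc"]))
#   dominant_topic(topic_dist)
```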
Supports:
- Latent Dirichlet Allocation (LDA)
- Labeled LDA (LLDA)
- Partially Labeled LDA (PLDA)
- Supervised LDA (SLDA)
- Dirichlet Multinomial Regression (DMR)
- Generalized Dirichlet Multinomial Regression (GDMR)
- Hierarchical Dirichlet Process (HDP)
- Hierarchical LDA (HLDA)
- Multi Grain LDA (MGLDA)
- Pachinko Allocation (PA)
- Hierarchical PA (HPA)
- Correlated Topic Model (CT)
- Dynamic Topic Model (DT)
This library follows the tomotopy API. There are a few changes to make it more Ruby-like:
- The get_ prefix has been removed from methods (topic_words instead of get_topic_words)
- Methods that return booleans use ? instead of is_ (live_topic? instead of is_live_topic)
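The two renaming rules can be used as a mnemonic when translating tomotopy examples. The sketch below encodes only those two rules. `ruby_method_name` is a hypothetical helper, not part of tomoto, and other differences between the APIs may exist.

```ruby
# Hypothetical helper, not part of tomoto: applies the two renaming rules
# (drop a get_ prefix; turn an is_ prefix into a trailing ?).
def ruby_method_name(tomotopy_name)
  if tomotopy_name.start_with?("get_")
    tomotopy_name.delete_prefix("get_")
  elsif tomotopy_name.start_with?("is_")
    "#{tomotopy_name.delete_prefix("is_")}?"
  else
    tomotopy_name
  end
end

%w[get_topic_words is_live_topic train].each do |name|
  puts "#{name} -> #{ruby_method_name(name)}"
end
# prints:
# get_topic_words -> topic_words
# is_live_topic -> live_topic?
# train -> train
```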
If a method or option you need isn’t supported, feel free to open an issue.
tomoto uses AVX2, AVX, or SSE2 instructions to increase performance on machines that support them. Check which instruction set architecture it’s using with:
Tomoto.isa
Choose a parallelism algorithm with:
model.train(parallel: :partition)
Supported values are :default, :none, :copy_merge, and :partition.
View the changelog
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone --recursive https://github.com/ankane/tomoto-ruby.git
cd tomoto-ruby
bundle install
bundle exec rake compile
bundle exec rake test