Python extension #26

Open
lucianolorenti opened this issue Dec 9, 2019 · 20 comments

Comments

@lucianolorenti

In case anyone is interested, I've made a Python extension out of this code. It is more or less the same code, except that it is wrapped with Boost.Python and avoids all the intermediate files.
You can use it like this:

import btm
number_of_topics = 2
alpha = 50/2
beta = 0.0005
n_iters = 50000
btm_model = btm.Model(number_of_topics, alpha, beta, n_iters, 3, True)
btm_model.fit(["sentence 1", "sentence 2", "sentence 2"])
pz = btm_model.get_pz()
pw_z = btm_model.get_pw_z()
vocabulary = btm_model.vocabulary()
b = btm_model.predict(["Another sentence"], "sum_b")
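
As a follow-up to the example above, here is a minimal sketch of how the fitted outputs could be inspected. It assumes, without this being confirmed anywhere in the thread, that get_pw_z() returns a topics-by-words table of probabilities and that vocabulary() yields the words in the same order as the columns. top_words is a hypothetical helper, not part of btm, and the data is made up for illustration.

```python
def top_words(pw_z, vocabulary, n=3):
    """Return the n most probable words for each topic.

    Assumes pw_z[k][w] is P(word w | topic k) and vocabulary[w] is the
    word with id w (both are assumptions about the extension's output).
    """
    result = []
    for topic_row in pw_z:
        # Sort word ids by descending probability within this topic.
        ranked = sorted(range(len(topic_row)),
                        key=lambda w: topic_row[w], reverse=True)
        result.append([vocabulary[w] for w in ranked[:n]])
    return result


# Made-up stand-ins for btm_model.get_pw_z() and btm_model.vocabulary().
pw_z = [[0.7, 0.2, 0.1],
        [0.1, 0.3, 0.6]]
vocabulary = ["sentence", "1", "2"]
print(top_words(pw_z, vocabulary, n=2))
```

If vocabulary() turns out to return a word-to-id mapping instead, it would need to be inverted before being passed to this helper.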
@Logos23333

I tried your version but something is wrong.

output.txt

@lucianolorenti
Author

lucianolorenti commented Dec 10, 2019

I think there are two issues. The first is that I forgot to add the __init__.py file.
The second is this error:

This file requires compiler and library support for the ISO C++ 2011 standard.
I was using gcc 9.2.0, which I suppose uses C++11 by default. I have now added the __init__.py file and the explicit argument -std=c++11.
Tell me if it does not work for you.
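
For anyone adapting the build: -std=c++11 is a GCC/Clang flag, and MSVC's cl.exe has no equivalent C++11 switch. A minimal sketch of choosing the flag per platform in setup.py follows. cxx_compile_args is a hypothetical helper, and treating every Windows build as MSVC is an assumption (a MinGW build on Windows would still want the flag).

```python
import sys


def cxx_compile_args(platform=sys.platform):
    """Pick extra compiler flags for the extension (hypothetical helper)."""
    if platform.startswith("win"):
        # Assumption: Windows means MSVC, which has no -std=c++11 switch.
        return []
    # GCC and Clang accept -std=c++11.
    return ["-std=c++11"]


# Could be passed to Extension(..., extra_compile_args=extra_compile_args).
extra_compile_args = cxx_compile_args()
print(extra_compile_args)
```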

@Logos23333

It works, successfully installed btm-0.1.0, thanks for your solution.

@Logos23333

When I run the example code above, I get something like this:
ERR: index=3, size=3 ERR: index=3, size=3 ERR: index=3, size=3 ERR: index=3, size=3 1 of 50001 ERR: index=3, size=3 ERR: index=3, size=3 ERR: index=3, size=3 ERR: index=3, size=3 ERR: index=3, size=3 ERR: index=3, size=3 ERR: index=3, size=3 ERR: index=3, size=3 2 of 50001
Is this the expected result?

@lucianolorenti
Author

lucianolorenti commented Dec 10, 2019

No, it is not. Somehow it is accessing pvec at position 3 when it has only 3 elements. I am going to try on another PC to see if I get the same error.

I've tried on another Arch Linux machine and it worked. I'm going to try on Ubuntu.

@lucianolorenti
Author

lucianolorenti commented Dec 18, 2019

I tried on Debian 10. The version of boost-python there was old, so I had to recompile boost-python to get it to work. Apart from that, I did not have any other problem. I don't know what is happening in your case.

@ChangweiZhou

Hi!

I tried it, but the code is not working. It says:

C:\train\B-Python>pip install .
Processing c:\train\b-python
Building wheels for collected packages: btm
Running setup.py bdist_wheel for btm ... error
Complete output from command C:\Users\07390\Anaconda3\python.exe -u -c "import setuptools, tokenize;file='C:\Users\07390\AppData\Local\Temp\pip-req-build-odmu1lp8\setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" bdist_wheel -d C:\Users\07390\AppData\Local\Temp\pip-wheel-tgike6d6 --python-tag
cp37:
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-3.7
creating build\lib.win-amd64-3.7\btm
copying btm\__init__.py -> build\lib.win-amd64-3.7\btm
running build_ext
building 'btm_cpp' extension
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
creating build\temp.win-amd64-3.7\Release\btm
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MT -DMAJOR_VERSION=1 -DMINOR_VERSION=0 -IC:\Users\07390\Anaconda3\include -IC:\Users\07390\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpbtm/model.cpp /Fobuild\temp.win-amd64-3.7\Release\btm/model.obj -std=c++11
cl : Command line warning D9002 : ignoring unknown option '-std=c++11'
model.cpp
C:\Users\07390\AppData\Local\Temp\pip-req-build-odmu1lp8\btm\doc.h(24): warning C4267: 'return': conversion from 'size_t' to 'int', possible loss of data
C:\Users\07390\AppData\Local\Temp\pip-req-build-odmu1lp8\btm\model.h(10): fatal error C1083: Cannot open include file: 'boost/python/numpy.hpp': No such file or directory
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\bin\HostX86\x64\cl.exe' failed with exit status 2


Failed building wheel for btm
Running setup.py clean for btm
Failed to build btm
Installing collected packages: btm
Found existing installation: btm 1.0.15
Uninstalling btm-1.0.15:
Successfully uninstalled btm-1.0.15
Running setup.py install for btm ... error
Complete output from command C:\Users\07390\Anaconda3\python.exe -u -c "import setuptools, tokenize;file='C:\Users\07390\AppData\Local\Temp\pip-req-build-odmu1lp8\setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record C:\Users\07390\AppData\Local\Temp\pip-record-bydnsmqf\install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build\lib.win-amd64-3.7
creating build\lib.win-amd64-3.7\btm
copying btm\__init__.py -> build\lib.win-amd64-3.7\btm
running build_ext
building 'btm_cpp' extension
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
creating build\temp.win-amd64-3.7\Release\btm
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MT -DMAJOR_VERSION=1 -DMINOR_VERSION=0 -IC:\Users\07390\Anaconda3\include -IC:\Users\07390\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpbtm/model.cpp /Fobuild\temp.win-amd64-3.7\Release\btm/model.obj -std=c++11
cl : Command line warning D9002 : ignoring unknown option '-std=c++11'
model.cpp
C:\Users\07390\AppData\Local\Temp\pip-req-build-odmu1lp8\btm\doc.h(24): warning C4267: 'return': conversion from 'size_t' to 'int', possible loss of data
C:\Users\07390\AppData\Local\Temp\pip-req-build-odmu1lp8\btm\model.h(10): fatal error C1083: Cannot open include file: 'boost/python/numpy.hpp': No such file or directory
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\bin\HostX86\x64\cl.exe' failed with exit status 2

----------------------------------------

Rolling back uninstall of btm
Command "C:\Users\07390\Anaconda3\python.exe -u -c "import setuptools, tokenize;file='C:\Users\07390\AppData\Local\Temp\pip-req-build-odmu1lp8\setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record C:\Users\07390\AppData\Local\Temp\pip-record-bydnsmqf\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\07390\AppData\Local\Temp\pip-req-build-odmu1lp8\

Not sure what is wrong.

@lucianolorenti
Author

From what I can see, the compiler can't find the Boost NumPy headers:

...model.h(10): fatal error C1083: Cannot open include file: 'boost/python/numpy.hpp': No such file or directory

Do you have Boost correctly installed? And did you add the headers path to the include path?

@ChangweiZhou

Hi!

I installed boost, but I do not know how to add the header path to the include path directory.

@ChangweiZhou

So I tried to install boost using anaconda, and again it does not work:

(d2l) C:\train\B-Python>pip install .
Processing c:\train\b-python
Building wheels for collected packages: btm
Building wheel for btm (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: 'C:\Users\07390\Anaconda3\envs\d2l\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\07390\AppData\Local\Temp\pip-req-build-fq1ue1v3\setup.py'"'"'; file='"'"'C:\Users\07390\AppData\Local\Temp\pip-req-build-fq1ue1v3\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\07390\AppData\Local\Temp\pip-wheel-4iss4_6i' --python-tag cp37
cwd: C:\Users\07390\AppData\Local\Temp\pip-req-build-fq1ue1v3
Complete output (18 lines):
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-3.7
creating build\lib.win-amd64-3.7\btm
copying btm\__init__.py -> build\lib.win-amd64-3.7\btm
running build_ext
building 'btm_cpp' extension
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
creating build\temp.win-amd64-3.7\Release\btm
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -DMAJOR_VERSION=1 -DMINOR_VERSION=0 -IC:\Users\07390\Anaconda3\envs\d2l\include -IC:\Users\07390\Anaconda3\envs\d2l\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpbtm/model.cpp /Fobuild\temp.win-amd64-3.7\Release\btm/model.obj -std=c++11
cl : Command line warning D9002 : ignoring unknown option '-std=c++11'
model.cpp
C:\Users\07390\AppData\Local\Temp\pip-req-build-fq1ue1v3\btm\doc.h(24): warning C4267: 'return': conversion from 'size_t' to 'int', possible loss of data
C:\Users\07390\AppData\Local\Temp\pip-req-build-fq1ue1v3\btm\model.h(10): fatal error C1083: Cannot open include file: 'boost/python/numpy.hpp': No such file or directory
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\bin\HostX86\x64\cl.exe' failed with exit status 2

ERROR: Failed building wheel for btm
Running setup.py clean for btm
Failed to build btm
Installing collected packages: btm
Found existing installation: btm 1.0.15
Uninstalling btm-1.0.15:
Successfully uninstalled btm-1.0.15
Running setup.py install for btm ... error
ERROR: Command errored out with exit status 1:
command: 'C:\Users\07390\Anaconda3\envs\d2l\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\07390\AppData\Local\Temp\pip-req-build-fq1ue1v3\setup.py'"'"'; file='"'"'C:\Users\07390\AppData\Local\Temp\pip-req-build-fq1ue1v3\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\07390\AppData\Local\Temp\pip-record-hlpib9u3\install-record.txt' --single-version-externally-managed --compile
cwd: C:\Users\07390\AppData\Local\Temp\pip-req-build-fq1ue1v3
Complete output (18 lines):
running install
running build
running build_py
creating build
creating build\lib.win-amd64-3.7
creating build\lib.win-amd64-3.7\btm
copying btm\__init__.py -> build\lib.win-amd64-3.7\btm
running build_ext
building 'btm_cpp' extension
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
creating build\temp.win-amd64-3.7\Release\btm
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -DMAJOR_VERSION=1 -DMINOR_VERSION=0 -IC:\Users\07390\Anaconda3\envs\d2l\include -IC:\Users\07390\Anaconda3\envs\d2l\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpbtm/model.cpp /Fobuild\temp.win-amd64-3.7\Release\btm/model.obj -std=c++11
cl : Command line warning D9002 : ignoring unknown option '-std=c++11'
model.cpp
C:\Users\07390\AppData\Local\Temp\pip-req-build-fq1ue1v3\btm\doc.h(24): warning C4267: 'return': conversion from 'size_t' to 'int', possible loss of data
C:\Users\07390\AppData\Local\Temp\pip-req-build-fq1ue1v3\btm\model.h(10): fatal error C1083: Cannot open include file: 'boost/python/numpy.hpp': No such file or directory
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\bin\HostX86\x64\cl.exe' failed with exit status 2
----------------------------------------
Rolling back uninstall of btm
Moving to c:\users\07390\anaconda3\envs\d2l\lib\site-packages\btm-1.0.15.dist-info
from c:\users\07390\anaconda3\envs\d2l\lib\site-packages\~tm-1.0.15.dist-info
Moving to c:\users\07390\anaconda3\envs\d2l\lib\site-packages\btm
from c:\users\07390\anaconda3\envs\d2l\lib\site-packages\~tm
ERROR: Command errored out with exit status 1: 'C:\Users\07390\Anaconda3\envs\d2l\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\07390\AppData\Local\Temp\pip-req-build-fq1ue1v3\setup.py'"'"'; file='"'"'C:\Users\07390\AppData\Local\Temp\pip-req-build-fq1ue1v3\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\07390\AppData\Local\Temp\pip-record-hlpib9u3\install-record.txt' --single-version-externally-managed --compile Check the logs for full command output.

I have also included the directory of boost in the system path variable.

@lucianolorenti
Author

The include path is the list of directories where the compiler looks for header files (the .h files). It is not related to the system path, which is where the operating system looks for executables. I will try to add a configuration file to specify these paths and make the compilation easier.

In the meantime, you can edit setup.py and add them yourself.


btm_cpp = Extension('btm_cpp',
                    define_macros = [('MAJOR_VERSION', '1'),
                                     ('MINOR_VERSION', '0')],
                    libraries = ['boost_python3', 'boost_numpy3'],
                    language='c++11',
+                   include_dirs=[ THE_PATH_WHERE_THE_BOOST_HEADERS_ARE_LOCATED ],
+                   library_dirs=[ THE_PATH_WHERE_THE_BOOST_LIBRARIES_ARE_LOCATED],
                    extra_compile_args=extra_compile_args,
                    sources = ['btm/model.cpp','btm/infer.cpp'])

THE_PATH_WHERE_THE_BOOST_HEADERS_ARE_LOCATED should end with include, e.g.
C:\something\something\include
THE_PATH_WHERE_THE_BOOST_LIBRARIES_ARE_LOCATED perhaps ends with bin, e.g.
C:\something\something\bin. It should be a folder with a lot of .dll files.

Depending on how Boost was installed, you will probably need to change the names of the libraries ['boost_numpy3', 'boost_python3'].
These names refer to library files (in this case .dll files). For example, in the case of boost_numpy3, the last step of the compilation (the linker) will search for libboost_numpy3.dll. Perhaps on your machine the file is called libboost_numpy.dll, in which case you should change the library name in setup.py to 'boost_numpy'.
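
Until a configuration file exists, one way to avoid hard-coding the two paths is to read them from environment variables. This is only a sketch: BOOST_INCLUDE_DIR and BOOST_LIBRARY_DIR are hypothetical variable names invented here, and boost_build_dirs is a hypothetical helper, not something the fork provides.

```python
import os


def boost_build_dirs(environ=os.environ):
    """Collect include/library directories for Boost from environment variables.

    BOOST_INCLUDE_DIR and BOOST_LIBRARY_DIR are hypothetical names chosen
    for this sketch; an unset variable simply contributes no directory.
    """
    include_dir = environ.get("BOOST_INCLUDE_DIR")
    library_dir = environ.get("BOOST_LIBRARY_DIR")
    return {
        "include_dirs": [include_dir] if include_dir else [],
        "library_dirs": [library_dir] if library_dir else [],
    }


# The result can be unpacked into the Extension(...) call in setup.py,
# e.g. Extension('btm_cpp', ..., **boost_build_dirs()).
print(boost_build_dirs({"BOOST_INCLUDE_DIR": r"C:\something\something\include"}))
```

With this in place, the variables would be set in the shell before running pip install . and setup.py would not need editing per machine.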

@ChangweiZhou

Hi!

Thanks. I have a question:

In your setup:

btm_model = btm.Model(number_of_topics, alpha, beta, n_iters, 3, True)

What does the 3 mean here? Shouldn't all the parameters be fixed already?

@lucianolorenti
Author

Hi!
It is a parameter that does nothing :S.
It is what was the save_step in the original code, but in my fork nothing is saved in intermediate iterations.

@ChangweiZhou

Hi!

Thanks for the prompt reply. Hope you are safe!

I am giving this a try on a large data set. One question: is it possible for it to display a progress bar, like tqdm? So far I am not seeing any indicator at all. Since training a large model takes a lot of time, I feel this could be useful.

@lucianolorenti
Author

That's odd. The progress bar is the same as in the original code, and I can see it.
I just pushed a few commits that remove the save_step parameter and add a boolean show_progressbar to make the progress bar optional; previously the progress bar was always shown.
It is now also possible to do this:

btm_model = btm.Model(number_of_topics, alpha, beta, n_iters, background_topic, show_progressbar)
btm_model.initialize(["sentence 1", "sentence 2", "sentence 2"])
for j in range(500):
    btm_model.fit_step()

This lets you perform the fit steps in Python. Each fit_step call performs only one pass of the algorithm.
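
Since each pass is driven from Python, progress can also be reported from Python. The sketch below assumes only that the model object has a fit_step() method, as in the snippet above. fit_with_progress and DummyModel are hypothetical names; DummyModel stands in for btm.Model so the example runs without the extension installed.

```python
def fit_with_progress(model, n_steps, report_every=100):
    """Call model.fit_step() n_steps times, printing progress periodically."""
    for step in range(1, n_steps + 1):
        model.fit_step()
        # Report every report_every passes, and always on the last pass.
        if step % report_every == 0 or step == n_steps:
            print(f"{step} of {n_steps}")


class DummyModel:
    """Stand-in for btm.Model so the sketch runs without the extension."""

    def __init__(self):
        self.steps = 0

    def fit_step(self):
        self.steps += 1


model = DummyModel()
fit_with_progress(model, n_steps=500, report_every=100)
```

The same loop could wrap range(...) in tqdm instead of printing, if tqdm is available in the environment.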

@ChangweiZhou

It is weird. Here is a public ipynb file:

https://colab.research.google.com/drive/1Rr2WsY7MRy3Pin8Eak9HNa6rddBLSn07

I tried your commands, but it says:


NameError Traceback (most recent call last)
in ()
----> 1 get_ipython().run_cell_magic('time', '', '\nnumber_of_topics = 2\nalpha = 50/2\nbeta = 0.0005\nn_iters = 50000\nbtm_model = btm.Model(number_of_topics, alpha, beta, n_iters, background_topic, show_progressbar)\nbtm_model.initialize(["sentence 1", "sentence 2", "sentence 2"])\nfor j in range(500):\n btm_model.fit_step()')

2 frames
in time(self, line, cell, local_ns)

/usr/local/lib/python3.6/dist-packages/IPython/core/magics/execution.py in time(self, line, cell, local_ns)
1191 else:
1192 st = clock2()
-> 1193 exec(code, glob, local_ns)
1194 end = clock2()
1195 out = None

in ()

NameError: name 'background_topic' is not defined

@ChangweiZhou

I am training on Google Colab, not Windows, so in theory the issue should come from Google Colab.

@lucianolorenti
Author

lucianolorenti commented Mar 17, 2020

You did not define the background_topic variable.
Follow the README carefully.

I ran it in Google Colab and it is working.

@ChangweiZhou

Thanks! I figured out how to use it now. The second method works for me.

Quick question: is it possible to speed up the training using a GPU/TPU? I know it uses Gibbs sampling in the background. I am just wondering whether the training process can be sped up, since Colab offers GPU/TPU support.

@Huakui-Zhang

Huakui-Zhang commented May 26, 2020

when i run the example code above, i got something like this:
ERR: index=3, size=3 ERR: index=3, size=3 ERR: index=3, size=3 ERR: index=3, size=3 1 of 50001 ERR: index=3, size=3 ERR: index=3, size=3 ERR: index=3, size=3 ERR: index=3, size=3 ERR: index=3, size=3 ERR: index=3, size=3 ERR: index=3, size=3 ERR: index=3, size=3 2 of 50001
Is it the expected result or not?

@Logos23333 I've encountered the same problem, and I found out that it is caused by the following line of code

this->w2id[w] = this->w2id.size();

on line 118 of model.cpp. For example, when this->w2id is empty, i.e., its size is 0, the above code assigns 1 to this->w2id[w]. That is, the resulting ids of the words are one greater than the expected ids, which causes the index-out-of-bounds error. However, since I am not too familiar with C++, I am not sure why I run into this. The line can be changed to the following to avoid the error:

int new_id = this->w2id.size();
this->w2id[w] = new_id;
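
A likely explanation, offered as a reading of the C++ rules rather than something verified against this repository's build: before C++17 the order in which the two sides of an assignment are evaluated is unspecified, so this->w2id[w] may insert the new key before size() is read. C++17 guarantees that the right-hand side is evaluated first. That would fit the error appearing with some toolchains but not others. The sketch below shows the intended id assignment in Python, where the right-hand side is always evaluated first. build_vocab is a hypothetical helper, not code from the extension.

```python
def build_vocab(words):
    """Assign consecutive 0-based ids to words in order of first appearance."""
    w2id = {}
    for w in words:
        if w not in w2id:
            # Read the size first, then insert: the id equals the number
            # of words seen so far, so ids run from 0 to len(w2id) - 1.
            new_id = len(w2id)
            w2id[w] = new_id
    return w2id


w2id = build_vocab("sentence 1 sentence 2".split())
print(w2id)
```

With three distinct words the largest id is 2, so an array of size 3 indexed by these ids is never accessed at position 3.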
