Refactor spm_encode #952

Closed · wants to merge 49 commits

Commits
35c2eba
Add "nfkc_code" normalization rules
vmarkovtsev May 18, 2023
9e15573
Add multiple memory and performance optimizations
vmarkovtsev May 18, 2023
8fc56b5
Add installation instructions and use external abseil
vmarkovtsev May 18, 2023
180a6fb
Allow switching between internal and external abseil
vmarkovtsev May 18, 2023
527af0c
Fix the overflow errors
vmarkovtsev May 18, 2023
23c0c09
Update the built-in protobuf sources
vmarkovtsev May 19, 2023
f64cbf4
Repair the python package build
vmarkovtsev May 19, 2023
20be455
Allow missing tcmalloc
vmarkovtsev May 19, 2023
8d932e5
Support --verbatim_control_char to mark source code sentences
vmarkovtsev May 22, 2023
8f26fad
Ensure that Encode/Decode restores whitespace in the code
vmarkovtsev May 22, 2023
20f12dc
Fix not adding all-whitespace pairs on --verbatim_control_char
vmarkovtsev May 23, 2023
6faa2d5
Get rid of TBB to sort in parallel
vmarkovtsev May 23, 2023
27f78bd
Parallel spm_encode
vmarkovtsev May 23, 2023
d020431
Indicate progress in spm_encode
vmarkovtsev May 23, 2023
581efe6
Link external protobuf statically
vmarkovtsev May 24, 2023
4d156ed
Cache sentence frequencies to a file
vmarkovtsev May 24, 2023
e8f8d3b
Implement merging cached bpe frequency dicts
vmarkovtsev May 25, 2023
28f7bf1
Reduce memory pressure in bpe trainer symbols construction
vmarkovtsev May 26, 2023
4f984b0
Fix memory pressure in sorting final merged bpe cache
vmarkovtsev May 26, 2023
8e28f03
Initialize bpe symbols in parallel
vmarkovtsev May 26, 2023
429c830
Successfully load 80% of the Pile
vmarkovtsev May 26, 2023
cc6a20f
Switch to 32-bit bpe pointers
vmarkovtsev May 26, 2023
f637bf0
Parallelize bpe pairs construction
vmarkovtsev May 26, 2023
aeb2bd7
Polish UpdateActiveSymbols
vmarkovtsev May 26, 2023
ebd85f3
Add cache only mode in GetPairSymbol
vmarkovtsev May 26, 2023
cc8c56d
Make bpe position updates parallel
vmarkovtsev May 28, 2023
b6c7666
Avoid the second overhead of GetCachedPairSymbol()
vmarkovtsev May 29, 2023
17acb6d
Optimize the parallel BPE iterations
vmarkovtsev May 29, 2023
ba48031
Balance sorting and freq computation
vmarkovtsev May 29, 2023
150ce9d
Fix encoding non-verbatim text
vmarkovtsev May 31, 2023
d1a4bb4
Add some install instructions for finding the build path.
joerowell Oct 5, 2023
5cec47d
Fixing tests (#6)
kuba-- Oct 28, 2023
f0821a6
Add development documentation
Oct 31, 2023
dc661b8
Merge pull request #10 from poolsideai/devdocs
rbehjati Nov 1, 2023
8985958
handle 0x01~0x04 delimiters
yiyangh-ps Oct 26, 2023
f5f0392
Refactor code
yiyangh-ps Oct 31, 2023
dde84eb
Process lines fewer than 1000
yiyangh-ps Oct 23, 2023
515e65b
Support mixed code-text format
yiyangh-ps Oct 25, 2023
d092158
Fix build command
yiyangh-ps Oct 25, 2023
a237ae4
Remove unused eos/bos/verbatim_control_char
yiyangh-ps Oct 25, 2023
752a0c0
Remove 0x04 from encoding sequence
yiyangh-ps Oct 26, 2023
7f16648
Refactor encoding using MixedTextCodeIterator
yiyangh-ps Oct 31, 2023
d16e5da
Read from stdin
Oct 30, 2023
1bce046
Fix a race condition issue
yiyangh-ps Nov 3, 2023
aa566d3
Add ReadLineStdin to allow reading from stdin
Nov 13, 2023
c4bd5ea
Use std::shared_ptr and a slow-down mechanism
Nov 14, 2023
61265db
Fixes after review
Nov 15, 2023
ce412d7
Add log lines instead of assert
Nov 18, 2023
2055686
Refactor spm_encode_main
vmarkovtsev Dec 19, 2023
2 changes: 1 addition & 1 deletion .gitignore
@@ -1,3 +1,4 @@
.idea
Makefile
Makefile.in
/ar-lib
@@ -72,6 +73,5 @@ libsentencepiece.so*
libsentencepiece_train.so*
python/bundled
_sentencepiece.*.so
third_party/abseil-cpp

python/sentencepiece
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "third_party/abseil-cpp"]
path = third_party/abseil-cpp
url = https://github.com/abseil/abseil-cpp.git
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.!

cmake_minimum_required(VERSION 3.1 FATAL_ERROR)
cmake_minimum_required(VERSION 3.9 FATAL_ERROR)
file(STRINGS "VERSION.txt" SPM_VERSION)
message(STATUS "VERSION: ${SPM_VERSION}")

24 changes: 24 additions & 0 deletions README.md
@@ -18,6 +18,30 @@ with the extension of direct training from raw sentences. SentencePiece allows u

**This is not an official Google product.**

## Vadim's notes

Proper installation:

```
sudo apt install libgoogle-perftools-dev
cmake -D CMAKE_BUILD_TYPE=RelWithDebInfo -D SPM_USE_EXTERNAL_ABSL=off -D SPM_ENABLE_TCMALLOC=on -D SPM_ENABLE_NFKC_COMPILE=on ..
```

1. The built-in abseil's containers are aliases to the stdlib ones. Building with a real abseil is WIP.
2. Adding new spec options requires regenerating the protobuf sources.
3. tcmalloc is a must: the stdlib's malloc fails to return freed memory to the system.
4. NFKC compilation is needed to edit the normalization rules.

If you want to install the library and you are not installing the Python package globally, you may need to set the package configuration path:

```
make install
ldconfig
cd ../python
PKG_CONFIG_PATH=../build python -m pip install -ve .
```


## Technical highlights
- **Purely data driven**: SentencePiece trains tokenization and detokenization
models from sentences. Pre-tokenization ([Moses tokenizer](https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl)/[MeCab](http://taku910.github.io/mecab/)/[KyTea](http://www.phontron.com/kytea/)) is not always required.