Make process improvements #59

gouenji-shuuya · 2023-03-12T12:24:19Z

- Move to bash as that was implicitly expected (ref. #vidyut on Discord)
- Some refactoring.
- Sub-make is correctly called when using make create_all_data.
- Use -j`nproc` in make.
- Ignore venv in git.

akprasad · 2023-03-22T03:12:02Z

Thank you for this! Forgive the late look.

akprasad

Thank you very much!

akprasad · 2023-03-22T03:17:14Z

README.md

@@ -78,7 +78,7 @@ tests:
 ```shell
 $ git clone https://github.com/ambuda-org/vidyut.git
 $ cd vidyut
-$ make test
+$ make -j`nproc` test


My understanding is that -j controls the number of make recipes that run in parallel. If so, what is the benefit of using -j here?

gouenji-shuuya · 2023-03-26

Cargo can respect the jobserver settings. See rust-lang/rust#42682.

Even if the workflow is currently serial, the steps could be decoupled in the future for faster builds.

For instance, create_kosha executes successfully before create_sandhi_rules, so neither depends on the other (though the latter is quick). I think the cloning can also be parallelized, using recipes like this:

    mkdir build

    get_corpus_data:
        @if [[ -e "data/raw/dcs" ]]; then \
            echo "Training data already exists -- skipping fetch."; \
        else \
            echo "Training data does not exist -- fetching."; \
            mkdir -p "data/raw/dcs"; \
            git clone https://github.com/OliverHellwig/sanskrit.git \
                --depth=1 build/dcs-data; \
            mv build/dcs-data/dcs/data/conllu data/raw/dcs/conllu; \
        fi

    get_linguistic_data:
        @if [[ -e "data/raw/lex" ]]; then \
            echo "Lexical data already exists -- skipping fetch."; \
        else \
            echo "Lexical data does not exist -- fetching."; \
            mkdir -p "data/raw/lex"; \
            git clone https://github.com/sanskrit/data.git \
                --depth=1 build/data-git; \
            python3 build/data-git/bin/make_data.py \
                --make_prefixed_verbals; \
            mv build/data-git/all-data/* data/raw/lex; \
        fi
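To illustrate how make treats two targets with no dependency between them, here is a minimal sketch that writes a throwaway Makefile and builds it with `-j2`. The target names `fetch_a` and `fetch_b` and the helper `demo_parallel_make` are hypothetical stand-ins, not part of this repository; the real recipes would do the cloning shown above.

```shell
#!/bin/sh
# Sketch only: fetch_a and fetch_b are hypothetical targets with no
# dependency on each other, so make is free to run them concurrently.
demo_parallel_make() {
    demo_dir="$(mktemp -d)" || return 1
    # Recipe lines must begin with a tab, hence the \t escapes.
    printf 'all: fetch_a fetch_b\n\nfetch_a:\n\t@echo fetched a\n\nfetch_b:\n\t@echo fetched b\n\n.PHONY: all fetch_a fetch_b\n' \
        > "$demo_dir/Makefile"
    demo_status=0
    # -j2 allows up to two recipes at once; -s hides make's own chatter.
    make -s -f "$demo_dir/Makefile" -j2 all || demo_status=$?
    rm -rf "$demo_dir"
    return "$demo_status"
}

# Run the demo only where make is installed.
if command -v make >/dev/null 2>&1; then
    demo_parallel_make
fi
```

The two lines of output may appear in either order, which is the point: make only serializes targets that are linked by a prerequisite.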

So the create_all_data.sh file really is a bottleneck. There are several make commands there and they execute one by one. Maybe a bit of parallelization will help, like in test/train?
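As a rough sketch of what that parallelization could look like inside a script, independent steps can be started in the background and then awaited, with the script failing if any step fails. The helper `run_parallel` and the placeholder commands are assumptions for illustration; they are not taken from create_all_data.sh, and only steps that truly do not depend on each other should be grouped this way.

```shell
#!/bin/sh
# Sketch only: run each argument as a separate command in the background,
# then wait for all of them. Returns non-zero if any command failed.
run_parallel() {
    rp_pids=""
    for rp_cmd in "$@"; do
        sh -c "$rp_cmd" &
        rp_pids="$rp_pids $!"
    done
    rp_status=0
    for rp_pid in $rp_pids; do
        # wait reports the exit status of that specific child.
        wait "$rp_pid" || rp_status=1
    done
    return "$rp_status"
}

# Hypothetical placeholders for two independent build steps.
run_parallel "echo step one finished" "echo step two finished"
```

Steps that consume another step's output would still need to run afterwards, in a separate `run_parallel` call or as a plain sequential command.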

vidyut-cheda/scripts/fetch_training_data.py

gouenji-shuuya force-pushed the make-fix branch 2 times, most recently from f38da63 to df9c149 on March 12, 2023 12:31

akprasad requested changes Mar 22, 2023

Make process improvements

7154b43

- Move to bash as that was implicitly expected (ref. #vidyut on Discord)
- Some refactoring.
- Sub-make is correctly called when using make create_all_data.
- Use -j`nproc` in make.
- Ignore venv in git.

gouenji-shuuya force-pushed the make-fix branch from 34b15c8 to 7154b43 on March 26, 2023 13:21

akprasad force-pushed the main branch 2 times, most recently from 7ddaafe to 6969526 on April 16, 2023 15:14

akprasad force-pushed the main branch from 973b8b1 to 75728aa on November 29, 2023 06:06
