Separating workspace from TranslationModel #223

jerinphilip · 2021-09-22T15:59:35Z

WIP.

Status: Freeing things is sketchy. :D It might be overwriting existing model weights because scorer names are not namespaced properly.


XapaJIaMnu · 2021-09-22T16:28:08Z

Is there a small test app? I want to see exactly how it gets used.

jerinphilip · 2021-09-22T16:35:47Z

Umm... this is a draft PR, so that I have something concrete to bring to discussions.

There is a forward-backward test app (demonstrating that outbound translation works), and it passes in the tests.

bergamot-translator/src/tests/apps.cpp, line 59 in cf541c6:

    void forwardAndBackward(AsyncService &service, std::vector<Ptr<TranslationModel>> &models) {

However, I don't trust the current state in a multi-threaded, high-volume, multiple-model setting. So we are looking at a larger experiment where n > 1 models are in play on sources.shuf, translating to translated.tgt. I was thinking of running each model individually to get translated.individual.tgt and comparing it with the output from the multi-model setting.

Due to batching differences and floating-point approximations, I don't expect exact matches. Therefore, the outputs will be compared with some relaxation (BLEU?).
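One way to do the relaxed comparison in the plan above (score the multi-model translated.tgt against translated.individual.tgt) is corpus-level BLEU. The sketch below is a minimal, hypothetical C++ helper, not part of bergamot-translator: it assumes line-aligned, whitespace-tokenized outputs, uses uniform weights over 1- to 4-grams with a brevity penalty, and applies no smoothing. A real evaluation would more likely use an established tool such as sacrebleu.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Split a line on whitespace into tokens.
std::vector<std::string> tokenize(const std::string &line) {
  std::istringstream in(line);
  std::vector<std::string> tokens;
  std::string token;
  while (in >> token) tokens.push_back(token);
  return tokens;
}

// Count every n-gram of order n in a token sequence.
std::map<std::vector<std::string>, int> ngramCounts(const std::vector<std::string> &tokens, size_t n) {
  std::map<std::vector<std::string>, int> counts;
  for (size_t i = 0; i + n <= tokens.size(); ++i)
    ++counts[std::vector<std::string>(tokens.begin() + i, tokens.begin() + i + n)];
  return counts;
}

// Corpus BLEU in [0, 1]: hypotheses[i] (multi-model output) is scored
// against references[i] (individual-model output).
double corpusBleu(const std::vector<std::string> &hypotheses, const std::vector<std::string> &references) {
  assert(hypotheses.size() == references.size());  // outputs must be line-aligned
  const size_t maxOrder = 4;
  std::vector<long> matches(maxOrder, 0), totals(maxOrder, 0);
  long hypLength = 0, refLength = 0;
  for (size_t i = 0; i < hypotheses.size(); ++i) {
    auto hyp = tokenize(hypotheses[i]);
    auto ref = tokenize(references[i]);
    hypLength += static_cast<long>(hyp.size());
    refLength += static_cast<long>(ref.size());
    for (size_t n = 1; n <= maxOrder; ++n) {
      auto refCounts = ngramCounts(ref, n);
      for (const auto &[gram, count] : ngramCounts(hyp, n)) {
        auto it = refCounts.find(gram);
        if (it != refCounts.end()) matches[n - 1] += std::min(count, it->second);  // clipped matches
        totals[n - 1] += count;
      }
    }
  }
  double logPrecisionSum = 0.0;
  for (size_t n = 0; n < maxOrder; ++n) {
    if (matches[n] == 0) return 0.0;  // no smoothing: any zero precision gives 0
    logPrecisionSum += std::log(static_cast<double>(matches[n]) / totals[n]);
  }
  double brevityPenalty =
      hypLength >= refLength ? 1.0 : std::exp(1.0 - static_cast<double>(refLength) / hypLength);
  return brevityPenalty * std::exp(logPrecisionSum / maxOrder);
}
```

Identical files score 1.0; a threshold slightly below that (the exact value would have to be picked empirically) could then flag runs where the multi-model setting diverges beyond batching and floating-point noise.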

The above is a large test app, so it will take some time.

In any case, there are known issues:

1. Namespacing: name collisions will cause overwrites. This will need to be fixed, perhaps by altering createScorers to take in a prefix. browsermt/marian-dev#57 (comment)
2. Freeing scorer-created tensors/params upon model deletion. I'm not sure whether this happens automatically (I suspect their lifetimes extend beyond the scorers' lifetimes, hence they leak).
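Both known issues can be pictured on a toy parameter store. The sketch below is hypothetical and deliberately not marian's API: `ParamStore` and `NamespacedModel` are invented names, and a flat `std::map` stands in for the graph's variable store. It assumes the fix takes the shape suggested in item 1 (every parameter registered under a per-model prefix) and handles item 2 by having each model erase exactly its own prefixed entries in its destructor.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a graph's variable store: name -> tensor values.
using ParamStore = std::map<std::string, std::vector<float>>;

// Hypothetical model handle: registers parameters as "<prefix>::<name>" and
// frees exactly those entries when the model is deleted.
class NamespacedModel {
 public:
  NamespacedModel(ParamStore &store, std::string prefix) : store_(store), prefix_(std::move(prefix)) {
    // A prefix containing the delimiter could make cleanup erase another model's entries.
    assert(prefix_.find("::") == std::string::npos);
  }
  NamespacedModel(const NamespacedModel &) = delete;             // avoid double cleanup
  NamespacedModel &operator=(const NamespacedModel &) = delete;

  // Same architecture means same bare names; the prefix keeps models apart.
  void addParam(const std::string &name, std::vector<float> values) {
    store_[prefix_ + "::" + name] = std::move(values);
  }

  const std::vector<float> &param(const std::string &name) const { return store_.at(prefix_ + "::" + name); }

  // Keys sharing the prefix are contiguous in the sorted map, so erase that run.
  ~NamespacedModel() {
    const std::string key = prefix_ + "::";
    auto it = store_.lower_bound(key);
    while (it != store_.end() && it->first.compare(0, key.size(), key) == 0) it = store_.erase(it);
  }

 private:
  ParamStore &store_;
  std::string prefix_;
};
```

With this shape, two models that both register `encoder_W` (an illustrative name) end up as `en-de::encoder_W` and `de-en::encoder_W`, and the store is empty again once both handles go out of scope. Whether the real fix belongs in createScorers or elsewhere is still open in the thread.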

XapaJIaMnu · 2021-09-22T16:45:50Z

What's the problem with multiple threads? Each thread has its own scorer and graph?

jerinphilip · 2021-09-22T16:51:51Z

> Each thread has its own scorer and graph?

No, the scorer is a property of TranslationModel, tied to the (NMT) architecture. Only the graph has moved to the thread (and, with it, the workspace).

> What's the problem with multiple threads?

There are no problems with multiple threads. translateBatch is triggered from the worker that owns the graph, so thread safety should be okay. We have duplicate scorers on each graph (one per thread), so that should be okay as well, I suppose.

The problem is that the graph has a variable store of (string name, Tensor(?) value) pairs, and a name can conflict between multiple translation models that are active at the same time. I'm guessing it won't show up right now, but it might under high volume. In the simple case, the first model might have gone to thread-1 and the second to thread-2; with small inputs they never collide, so the funny behaviour caused by this overwrite won't show up.
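The scenario can be reproduced in miniature. The sketch below is a hypothetical toy, not bergamot-translator or marian code: `WorkerGraph`, `load`, `lookup`, `weightSeenByModelA` and the parameter name `encoder_W` are invented for illustration, and a `float` stands in for a Tensor. It assumes, as described above, that each worker owns one graph whose store is keyed by bare names, and shows why the bug stays hidden while the two models happen to run on different workers.

```cpp
#include <map>
#include <string>

// Toy worker: owns one graph whose variable store is keyed by bare names.
struct WorkerGraph {
  std::map<std::string, float> variables;  // stand-in for (string name, Tensor value)

  // Loading a model writes its weight under the unprefixed name,
  // silently replacing whatever was stored there before.
  void load(float encoderWeight) { variables["encoder_W"] = encoderWeight; }

  // A forward pass reads whatever is currently stored under the name.
  float lookup() const { return variables.at("encoder_W"); }
};

// Models A (weight 1) and B (weight 2) share an architecture, hence the same
// parameter name. Returns the weight model A sees on its next batch.
float weightSeenByModelA(bool sameWorker) {
  WorkerGraph thread1, thread2;
  thread1.load(1.0f);                           // model A lands on thread-1
  (sameWorker ? thread1 : thread2).load(2.0f);  // model B lands on thread-1 or thread-2
  return thread1.lookup();                      // model A's next batch runs on thread-1
}
```

With separate workers, model A still reads its own weight (1); once both models share a worker, model A silently reads model B's weight (2), which is the high-volume case the comment above worries about.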

jerinphilip · 2021-10-26T10:19:51Z

Too many ways to shoot oneself in the foot with this one; closing.

Jerin Philip added 2 commits September 22, 2021 15:55

Moved graph to <>Service, scorerEnsemble model property. Compiles. Runtime segfault (7fe1d11)

Hardcode workspace size for now (6965da2)

jerinphilip added the "invalid" label ("This doesn't seem right") Oct 9, 2021

jerinphilip closed this Oct 26, 2021

jerinphilip mentioned this pull request Nov 3, 2021 in Separate workspace from graph #257 (Open)
