Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separating workspace from TranslationModel #223

Closed
wants to merge 2 commits into from

Conversation

jerinphilip
Copy link
Contributor

@jerinphilip jerinphilip commented Sep 22, 2021

WIP.

Status: Freeing things is sketchy. :D Might be overwriting existing model-weights by not namespacing scorer names properly.

@XapaJIaMnu
Copy link
Collaborator

A small test app? I want to see how exactly it gets used.

@jerinphilip
Copy link
Contributor Author

jerinphilip commented Sep 22, 2021

Umm.. this is a draft PR so I can take something concrete to discussions.

There is a forward backward test-app (demonstrating working for outbound translation) which passes in tests.

void forwardAndBackward(AsyncService &service, std::vector<Ptr<TranslationModel>> &models) {

However, I don't trust the current state in a multi-threaded high volume multiple models setting. So we are looking at a larger experiment where n > 1 models are in play on sources.shuf - translating to translated.tgt. I was thinking run each models individually to get translated.individual.tgt, and compare it with output in a multi model setting.

Due to batching differences and floating-point approximations I don't expect exact matches. Therefore it will be compared with relaxations (BLEU?).

The above is a large test-app, so will take some time.

In any case, there are known issues:

1, in namespacing which will cause overwrites which will need to be fixed by altering createScorers taking in a prefix perhaps. browsermt/marian-dev#57 (comment)
2, Freeing scorer created tensors/params upon model deletion. I'm not sure if this happens automatically (I suspect lifetime for these are beyond scorer's lifetimes hence leaking).

@XapaJIaMnu
Copy link
Collaborator

What's the problem with multiple threads? Each thread has its own scorer and graph?

@jerinphilip
Copy link
Contributor Author

jerinphilip commented Sep 22, 2021

Each thread has its own scorer and graph?

No, scorer is a property of TranslationModel, tied to (nmt) architecture. Only graph has moved to thread (along with it the workspace).

What's the problem with multiple threads?

There are no problems with multiple threads. translateBatch is triggered from the worker who has the graph, so thread-safety should be okay. We have duplicates of scorers on each graph (for each thread), so that should be okay as well I suppose.

The problem is graph has a variable store - (string name, Tensor(?) value). name can conflict between "multiple translation models" active at the same time. It won't show up right now I'm guessing but might show up in high volume? In the simple case the first one might have gone to thread-1 and the second to thread-2, also being small-inputs never colliding and this overwrite creating funny behaviour won't show up.

@jerinphilip jerinphilip added the invalid This doesn't seem right label Oct 9, 2021
@jerinphilip
Copy link
Contributor Author

Too many ways to fire oneself in the foot with this one, closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
invalid This doesn't seem right
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants