Separate workspace from graph #257
So far, following #210, we've been pursuing how best to separate the graph from the workspace. #223 is a failed attempt at storing tensors in the same graph (there are thread-safety and leak issues there). Recent internal discussions provide the following pointers from @kpu:
@XapaJIaMnu suspects we might be using "Workspace" to store what are more permanent tensors which should stay with
The next action-item is to
The above will be accomplished by building on top of #225.
Adding some more information related to this here:
The layers generic work is not quite related to this issue (somewhat related, but not completely relevant).
Note to self: base models without 8-bit stuff are available for download from https://github.com/browsermt/students/blob/962d419729ff5145b2ba30be182c1e90991ca7a4/deen/download-models.sh.
This issue documents the progress on separating the workspace from the graph, for the purpose of making translation with multiple models more memory efficient. This is the bergamot-translator counterpart of browsermt/marian-dev#57.
Background
For #209, in order to give Mozilla more dev time, we checked in an inefficient but API-stable (no changes) implementation supporting multiple models. By supporting multiple models, we mean the extension can hold several models and select the one required for a given translation. This was necessary for outbound translation, where two models are needed simultaneously:
The API works for this use case without having to kill and respawn a `Service`, by parameterizing `translate(...)` calls by model. (This is not to be confused with "pivot translation", where two models are used back to back, going from languages `A -> P; P -> B` to translate `A -> B`. Pivoting requires stronger guarantees of sentence consistency for the functionality to work seamlessly.)
Inefficiency
Due to certain design issues in the Marian primitives available to us (Marian only ever assumes one model, or an ensemble of models for the same language direction), for the time being we have one workspace per model per thread. In an ideal scenario we would have one workspace per thread (for thread safety), with models sharing the workspaces as required.
For the purposes of this issue, a graph describes the operations on a neural network, and a workspace describes the storage used for tensors (intermediates during computation - input matrices, activations following inputs, etc., hence "workspace"). In simple words, Marian's workspace sits behind a graph (i.e., each graph owns a constituent workspace). If we disconnect the graph (operations) from the workspace (storage) and bind workspaces to threads/workers, we will have an efficient design. Most of the effort here is finding how best to cut Marian's existing API to suit this case.