Separate workspace from graph #257
So far, following #210, we've been pursuing how best to separate the graph from the workspace. #223 is a failed attempt at storing tensors in the same graph (there are thread-safety and leak issues there). Recent internal discussions provide the following pointers from @kpu:
@XapaJIaMnu suspects we might be using "Workspace" to store what are more permanent tensors which should stay with
The next action-item is to
The above will be accomplished by building on top of #225.
Adding some more information related to this here:
The layers generic work is not quite related to this issue (somewhat related, but not completely relevant).
Note to self: base models without 8-bit stuff are available for download from https://github.com/browsermt/students/blob/962d419729ff5145b2ba30be182c1e90991ca7a4/deen/download-models.sh.
This issue documents the progress on separating the workspace from the graph, for the purpose of making translation with multiple models more memory efficient. This is the bergamot-translator counterpart of browsermt/marian-dev#57.
Background
For #209, in order to give Mozilla more dev time, we checked in an inefficient but API-stable (no changes) implementation supporting multiple models. By supporting multiple models, we mean the extension can hold several models and select the one required for a given translation. This was necessary for outbound translation, where two models are needed simultaneously:
The API works for this use case without having to kill and respawn a `Service`, by parameterizing `translate(...)` calls by model. (This is not to be confused with "pivot translation", where two models are used back to back, going from languages `A -> P; P -> B` to translate `A -> B`. Pivoting requires stronger guarantees of sentence consistency for the functionality to work seamlessly.)
Inefficiency
Due to certain design issues in the Marian primitives available to us (Marian only ever assumes one model, or an ensemble of models for the same language direction), for the time being we have one workspace per model per thread. In an ideal scenario we would have one workspace per thread (for thread safety), with models sharing the workspaces as required.
For the purposes of this issue, a graph describes the operations on a neural network, and a workspace describes the storage used for tensors (intermediates during computation - input matrices, activations following inputs, etc., hence "workspace"). In simple words, Marian's workspace sits behind a graph (i.e., each graph owns a constituent workspace). If we disconnect the graph (operations) from the workspace (storage) and bind workspaces to threads/workers, we will have an efficient design. Most of the effort here is finding how best to cut Marian's existing API to suit this case.