Separate workspace/allocations from graph #57
Comments
Found this function, which replaces one graph's workspace with another's (marian-dev/src/graph/expression_graph.h, lines 242 to 244 in 62bac85).
The following example snippet from the translator is what eventually makes its way into bergamot-translator (marian-dev/src/translator/translator.h, lines 96 to 120 in 62bac85).
This is super stateful; taking a shot at figuring out what's happening below:

```cpp
// creates empty vessel
auto graph = New<ExpressionGraph>(/*inference=*/true);
auto prec = options_->get<std::vector<std::string>>("precision", {"float32"});
graph->setDefaultElementType(typeFromString(prec[0]));

// configures device (CPU/GPU) and prepares a Tensors/TensorAllocator.
graph->setDevice(device);

// GEMM config stuff.
graph->getBackend()->configureDevice(options_);

// Grows the static storage on the Tensors/TensorAllocator and manually
// manages allocations ahead of time to avoid a lot of malloc trouble.
graph->reserveWorkspaceMB(options_->get<size_t>("workspace"));

// Could this be replaced with a graph supplied externally by the worker?
// Overwrites an already allocated something, which hasn't grown to workspace-size.
// (Why are things like this?)
graph->reuseWorkspace(workerGraph);

// Later down the line:
scorer->init(graph);
// ^ Only here do the notions of an architecture (transformer, s2s, etc.) come in,
// so this should be where the graph is constructed and allocations are done.
// When are these deallocated?
```
The above has to be "dynamic", i.e., check whether the worker already has a graph and related state created for the particular model. We additionally want the allocations created on this workerGraph by a given model to be identifiable, so that they can be freed later.

Current understanding is that there is a long-term and a short-term memory, of which long-term corresponds to constant nodes which are not "parameters" (can't all weights be treated this way in inference mode?). Short-term memory is everything else, so that should be gradients, parameters being modified, inputs and that sort. I'm guessing keeping long-term has performance benefits. Where are these getting freed? The following clears everything except longterm storage (marian-dev/src/graph/expression_graph.h, lines 518 to 527 in 62bac85).
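To make that concrete, here is a toy picture of the two pools as described above. The types and layout are stand-ins for illustration only, not marian's actual Tensors class (the real code is at the reference above):

```cpp
#include <memory>
#include <string>
#include <unordered_map>

struct Tensor {};  // stand-in for an allocated marian tensor

// Toy version of the graph-side storage: two pools, short-term and long-term.
struct Storage {
  std::unordered_map<std::string, std::shared_ptr<Tensor>> shortTerm;  // activations, inputs, ...
  std::unordered_map<std::string, std::shared_ptr<Tensor>> longTerm;   // constants, memoized nodes

  // Analogue of the clear() referenced above: drop everything except long-term.
  void clear() {
    shortTerm.clear();
    // longTerm is deliberately left alone, presumably so constants survive
    // across translations without being re-allocated.
  }
};
```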
I understand this to be triggered on every BeamSearch call (marian-dev/src/translator/beam_search.cpp, lines 265 to 267 in 62bac85).
Marian always operates with one model? Does this imply that the long-term storage created for constants or params stays alive forever? (marian-dev/src/graph/expression_graph.h, line 154 in 62bac85)
The above function is available to clear everything in long-term, but appears unused in the source(?).
If I understand the graph subsystem to be an EDSL whose storage is some hash-map of string identifiers to the corresponding allocated tensors, there seems to be a way to hijack creating scorers onto the same graph by modifying the scorer names used in createScorers (marian-dev/src/translator/scorers.cpp, lines 114 to 131 in 62bac85).
I expect all further parameters would then be prefixed (namespaced) by this string. Do I create a new method that finds all variable names by prefix and triggers a free alongside TranslationModel destruction? Seems like an idea: I can filter for keys in the storage and manually free them, roughly as sketched below?
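A rough sketch of the "filter keys by prefix and free" idea. `TensorMap` and `freeByPrefix` are hypothetical stand-ins for whatever container ExpressionGraph actually keeps, not existing marian API:

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

struct Tensor {};  // stand-in for an allocated marian tensor

// Hypothetical name -> tensor storage, analogous to the map the graph keeps.
using TensorMap = std::map<std::string, std::shared_ptr<Tensor>>;

// Drop every entry whose key starts with `prefix`, e.g. the namespace chosen
// for one TranslationModel, to be called when that model is destructed.
std::size_t freeByPrefix(TensorMap& storage, const std::string& prefix) {
  std::size_t freed = 0;
  for (auto it = storage.begin(); it != storage.end();) {
    if (it->first.compare(0, prefix.size(), prefix) == 0) {
      it = storage.erase(it);  // dropping the shared_ptr releases the tensor
      ++freed;
    } else {
      ++it;
    }
  }
  return freed;
}
```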
Global config object fail (marian-dev/src/tensors/cpu/backend.h, lines 63 to 67 in 62bac85). I'm guessing I have to fix the API to take only the parameters required (this is common to TranslationModels; I no longer have an Options object to create this with. Going to do a horror for now).
The case of same-architecture models being initialized on the same graph: if the param dims match, they get loaded and all sorts of funny things can happen? (marian-dev/src/graph/expression_graph.h, lines 349 to 368 in 62bac85)
This can potentially be fixed by providing an alternate way to createScorers that namespaces these correctly? (marian-dev/src/translator/scorers.cpp, lines 114 to 131 in 62bac85)
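For illustration, assuming createScorers derives parameter names from a per-scorer string, an alternate version could take a per-model prefix so that two same-architecture models can no longer write to the same keys. `scorerName` below is a hypothetical helper, not the existing function:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: the string under which a scorer registers its parameters.
// If the name is effectively fixed per feature today, prefixing it with a
// per-model identifier keeps the parameters of different models apart on a
// shared graph.
std::string scorerName(const std::string& modelId, std::size_t featureIndex) {
  return modelId + "::F" + std::to_string(featureIndex);
}

// scorerName("en-de-tiny", 0) -> "en-de-tiny::F0"
// scorerName("en-cs-base", 0) -> "en-cs-base::F0"
```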
@XapaJIaMnu suspects the above namespace hijack might not be feasible due to possible AD assertions that all nodes in the graph be connected. In that case, the map from variable names to tensors lives with the expression graph, at marian-dev/src/graph/expression_graph.h, line 196 in 62bac85.
In this case, we can have dummy graphs per model and share the workspace through this function (marian-dev/src/graph/expression_graph.h, lines 242 to 244 in 62bac85).
Nobody appears to be using the above function, at least in master, so this is uncharted territory. On the bright side, we can do with this function what we want? @graemenail mentioned some concern about this not being inclusive of the cache, but it looks like it is (marian-dev/src/graph/expression_graph.h, lines 22 to 25 in 62bac85).
So in inference mode, params become memoized... possibly long-term, and stick around forever... (marian-dev/src/graph/node_operators.cpp, lines 39 to 50 in 4005259)
browsermt/bergamot-translator#225 tried to share workspaces (which are identified as TensorAllocators) and ran into tensors that are still active during translation and which, if manually cleared, would corrupt subsequent translations. While I can identify which tensors are associated with which models, I would need mutexes to protect the TensorAllocator when a dying model clears the tensors associated with it (workspaces are associated with threads, hence the races). @kpu's recommendation is to get the processed form (after conversion to intgemm or whatever) and store it as const for the lifetime of the model, so all threads can read-access it if need be. When the model is destructed, workers are not holding any shared_ptrs, so it is safe to clear these tensors: no translation with this model is active and/or waiting. Development is on pause until the higher powers decide the right direction/approach.
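A sketch of the lifetime argument behind that recommendation; all types here are hypothetical stand-ins, not bergamot/marian classes. The model owns the converted parameters as shared, read-only state, and workers only hold shared_ptr copies, so the memory goes away exactly when the last in-flight translation using it finishes:

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for parameters already converted to their inference form
// (e.g. after intgemm preparation); treated as immutable after construction.
struct ProcessedWeights {
  std::unordered_map<std::string, std::vector<int8_t>> byName;
};

struct Model {
  std::shared_ptr<const ProcessedWeights> weights;  // owned for the model's lifetime
};

struct Worker {
  // A worker pins the weights for the duration of a request by holding its own
  // shared_ptr copy; reads need no mutex since the data never mutates.
  void translate(const Model& model) {
    std::shared_ptr<const ProcessedWeights> pinned = model.weights;
    (void)pinned;  // ... run the forward pass reading from `pinned` ...
  }
};

// When the Model is destructed and no worker still holds a copy of `weights`,
// the ProcessedWeights is freed; by then no translation can be using it.
```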
Feature description
In the current state of browsermt/marian-dev, the concept of a workspace, which manages the allocation of tensors, is placed behind a graph in the library API bergamot-translator uses. This leads to a temporarily inefficient implementation of multiple-model handling (browsermt/bergamot-translator#210), where workspace memory grows in proportion to the number of active models.
@XapaJIaMnu and @kpu have previously solved swapping multiple models by means of swapping tensors onto an active graph. This is "dynamic", and a reference implementation is available at https://github.com/kpu/marian-dev/blob/dynamic_swap_mvp/src/translator/swappable.cpp. While this is doable without much expense in the case of shared architectures, a change in architecture requires reconstructing the graph (e.g. a tied-embedding model swapped out for a non-tied-embedding model).
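The gist of that swapping approach, reduced to stand-in types for illustration (the real version, linked above, works on marian tensors and devices): overwrite each named parameter on the live graph with the incoming model's tensor, leaving graph structure and workspace untouched.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Stand-in: a "parameter" is just a named buffer of the right shape.
using ParamMap = std::map<std::string, std::vector<float>>;

// Swap model B's weights onto a graph currently holding model A's weights.
// Only valid when both models share an architecture, so every parameter name
// and shape matches; an architecture change would require rebuilding the graph.
void swapParams(ParamMap& liveGraphParams, const ParamMap& incoming) {
  for (const auto& [name, values] : incoming) {
    auto it = liveGraphParams.find(name);
    assert(it != liveGraphParams.end());         // same set of parameter names
    assert(it->second.size() == values.size());  // same shapes
    it->second = values;                         // overwrite in place
  }
}
```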
It would be better to keep the concept of a workspace bound to the active threads/workers instead, and to keep the graph and architecture separate, so as to avoid blowing memory usage up beyond what is originally required.
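One possible shape for that separation, again with hypothetical stand-in types rather than a proposed API: workers own workspaces, models own their (read-only) state, and a translation call binds the two together.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct Workspace {              // stand-in for the TensorAllocator-backed arena
  explicit Workspace(std::size_t mb) : bytes(mb << 20) {}
  std::size_t bytes;
};

struct ModelState {};           // stand-in for graph description + parameters

struct Worker {
  Workspace workspace;          // one arena per worker/thread, reused across models
  explicit Worker(std::size_t mb) : workspace(mb) {}

  void translate(const std::shared_ptr<const ModelState>& /*model*/) {
    // The forward pass would draw its short-term tensors from `workspace`,
    // read parameters from the model, and leave nothing behind afterwards.
  }
};

int main() {
  // Workspace memory scales with the number of workers, not with loaded models.
  std::vector<Worker> workers;
  workers.emplace_back(/*mb=*/128);
  workers.emplace_back(/*mb=*/128);
  auto modelA = std::make_shared<const ModelState>();
  auto modelB = std::make_shared<const ModelState>();
  workers[0].translate(modelA);
  workers[1].translate(modelB);  // two workspaces total, regardless of model count
  return 0;
}
```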
This issue is intended to investigate how best to make the modifications to solve the above problem in this repository.
/cc @graemenail