RFC: On-device training with TensorFlow Lite #390
Thanks for sharing the RFC. I wonder whether it is possible to change the model architecture on device at runtime. The first on-device training use case that comes to mind is transfer learning, where we need to add new classes as users want. In that case, I think we need to change the model architecture (say, the number of units in a dense layer). Is that possible with the current setup? I also wonder whether there are optimization techniques that make on-device training realistic. We might need significant optimizations in memory and computation. Could you introduce some of them? |
I suggest taking a look at Continual Learning on the Edge with TensorFlow Lite |
/cc @vlomonaco |
Another interesting scenario to evaluate is training in the context of Edge federated learning: https://github.com/tensorflow/federated/issues/749 |
Thanks @bhack for the tag! @lrzpellegrini, the main author of "Continual Learning at the Edge: Real-Time Training on Smartphone Devices" will take a look and provide some feedback. |
Replying to @jijoongmoon
Great question. When doing transfer learning on a classifier, changing the number of classes does not require changing the model "structure" (adding/removing ops). Changing the shape of the weights tensor should be sufficient. This proposal can handle the use case with no problem.
For sure. We're focusing on making it generally work first. Once we reach that point, we can do more benchmarking and profiling to figure out what's most significant to be optimized, and work on it. |
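The point above about adding classes by changing only the shape of the weights tensor can be illustrated with a small framework-agnostic sketch. It is plain Python, not TensorFlow Lite code, and the names `logits` and `add_class` are hypothetical. The head stores one weight row and one bias per class, so adding a class appends a row while the computation stays the same.

```python
# Framework-agnostic sketch: a dense classifier head stored as one weight
# row and one bias per class. All names here are hypothetical.

def logits(weights, biases, features):
    """Return one logit per class for a single feature vector."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


def add_class(weights, biases):
    """Return a copy of the head with one extra, zero-initialized class.

    Only the tensor shapes change (the class dimension grows by one);
    the computation performed by `logits` is untouched.
    """
    num_features = len(weights[0])
    new_weights = [row[:] for row in weights] + [[0.0] * num_features]
    new_biases = biases[:] + [0.0]
    return new_weights, new_biases


weights = [[0.5, -1.0], [2.0, 0.25]]  # 2 classes, 2 features
biases = [0.1, -0.2]
features = [1.0, 2.0]

old = logits(weights, biases, features)
weights, biases = add_class(weights, biases)
new = logits(weights, biases, features)  # logits for the old classes are unchanged
```

The new class starts with a zero logit and would then be trained on the user's examples. How such a reshape is expressed in a converted model is not covered by this sketch.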
Thanks @bhack. |
@miaout17 can you elaborate on how such a shape change process would work? I do not see such a use case in the current proposal. Thanks! |
Hi @miaout17, I had a more in-depth look. This direction looks promising and we are excited to finally see training on-device on the TFLite radar. I think for many Transfer Learning problems these features would be great. However, for Continual Learning (CL) flexibility is all that matters.
It would be difficult to implement a CL approach without those features, apart from basic experience replay. @lrzpellegrini will provide more details. |
Hi there, I had a look at the RFC. It seems to me that it moves in a very good direction. I'm not aware of the current capabilities of TF-Lite as I only had the chance to use it in a very high-level way, but I really appreciate that the focus of the RFC is on the ability to transfer whole
As a comparison, while implementing the CORe app described in "Continual Learning at the Edge: Real-Time Training on Smartphone Devices" I had to manually translate the Python version of our Continual Learning algorithm in C++ so that it could be used along the Caffe deep learning library. In this scenario even simple things like moving data, accessing tensors (weights, inputs, ...) add a lot of complexity and with that comes an absurd overhead on the programming side, so I really appreciate this
As Vincenzo pointed out, the main issues are on the flexibility side. In the simple scenario of a limited on-device fine-tuning, a simple fit based approach seems the best solution. However, this would really limit the capabilities of the framework: as I suspect, a fit-based approach would only allow for a very simple instance replay mechanism, which may be insufficient when working with Continual Learning algorithms. On the other hand, supporting Continual Learning algorithms may require some flexibility on:
Of course not all CL algorithms need all these capabilities. Consider that CL is a very variegated field but most algorithms leverage an instance replay mechanism (implemented by inserting/replacing new instances into the dataset) plus some simple regularization/distillation/bias normalization algorithm (which mostly require flexibility on the tensors manipulation side). More recent algorithms really push on the idea of manipulating the architecture of the model, but I guess that supporting this behavior would be the most problematic part of this. Alas, I don't have a clear understanding of the translation capabilities of |
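The instance replay mechanism mentioned above (inserting or replacing instances in the dataset) can be sketched as a small fixed-size buffer. This is a minimal illustration in plain Python. The replacement policy used here is reservoir sampling, which is an assumption of this sketch and not necessarily what the CORe app or any particular CL algorithm uses. The class name `ReplayBuffer` and its methods are hypothetical.

```python
import random


class ReplayBuffer:
    """Fixed-size store of past training instances (hypothetical helper)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # total number of instances offered so far
        self._rng = random.Random(seed)

    def add(self, instance):
        """Insert a new instance, replacing a stored one once full.

        Reservoir sampling keeps every instance seen so far in the
        buffer with equal probability.
        """
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(instance)
            return
        slot = self._rng.randrange(self.seen)
        if slot < self.capacity:
            self.items[slot] = instance

    def sample(self, k):
        """Draw up to k stored instances to mix into a training batch."""
        return self._rng.sample(self.items, min(k, len(self.items)))


buffer = ReplayBuffer(capacity=5)
for i in range(100):
    buffer.add(("image_%d" % i, i % 10))  # (instance id, label) pairs
batch = buffer.sample(3)
```

A training step would combine `batch` with freshly collected instances. Regularization or distillation terms would need tensor-level access on top of this.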
I think that federated and continual learning are more relevant in the on-device/edge use case because, in this context, it is still hard to achieve few-shot/zero-shot learning with recent, very large-scale "general purpose" models. At least until we figure out how knowledge "hard distillation" on these models can be achieved efficiently on constrained devices. |
Replying to @lc0 For example
We're building low-level features to make describing the semantics possible. It's worth considering wrapping these into an easier-to-use API to make it friendlier for developers. Let me know if this makes sense. I'm happy to try to write this as more concrete pseudocode as well. |
Replying to @vlomonaco and @lrzpellegrini Thanks for the feedback! For clarification: it sounds like continual learning can modify the model structure automatically, without human intervention. Is my rough understanding correct? This seems more advanced than what we're currently targeting. Trying to break down the requirements:
I think this should be doable (by wrapping required logic into TF functions).
We haven't tried these yet. However I think in theory:
This should be doable with control flow (e.g. skip some gradient computation and variable update when a boolean value is true) |
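The control-flow idea above (skipping a gradient computation and variable update when a boolean is true) can be sketched in plain Python. This is only an illustration of the branching logic, not TensorFlow control flow. The names `sgd_step`, `tracked_grad`, and the parameter names are hypothetical, and each parameter is a single scalar to keep the example short.

```python
def sgd_step(params, grad_fns, learning_rate, frozen):
    """Apply one SGD step, skipping parameters whose frozen flag is True."""
    updated = {}
    for name, value in params.items():
        if frozen.get(name, False):
            # Neither the gradient nor the update is computed on this branch.
            updated[name] = value
            continue
        updated[name] = value - learning_rate * grad_fns[name](value)
    return updated


calls = []  # records which gradients were actually evaluated


def tracked_grad(name):
    def grad(value):
        calls.append(name)
        return 2.0 * value  # gradient of value**2
    return grad


params = {"backbone": 1.0, "head": 2.0}
grad_fns = {name: tracked_grad(name) for name in params}
params = sgd_step(params, grad_fns, learning_rate=0.1, frozen={"backbone": True})
```

Only the `head` gradient is evaluated. In a graph-based setting the same decision would be made by a conditional op driven by a boolean input.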
Thanks for sharing this, excited to see progress here. As one of the authors of the Flower federated learning framework, I can say that on-device training support is one of the biggest challenges for cross-device federated learning right now. After reading the RFC I was wondering how setting/changing hyperparameters would work on-device. Would we just add additional arguments (like @tf.function
def train(self, inputs, labels, epochs):
self.model.fit(inputs, labels, epochs=epochs) and then call |
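The pattern asked about above, passing hyperparameters as extra arguments to the training function, can be shown with a plain-Python stand-in. It does not use TensorFlow or the TFLite signature mechanism. The one-parameter model `y = weight * x`, the function names, and the `learning_rate` argument are assumptions made for this sketch.

```python
def train(weight, inputs, labels, epochs, learning_rate):
    """Fit y = weight * x by gradient descent on mean squared error.

    epochs and learning_rate are ordinary arguments, so the caller can
    change them on every invocation without rebuilding the function.
    """
    n = len(inputs)
    for _ in range(epochs):
        grad = sum(2.0 * (weight * x - y) * x for x, y in zip(inputs, labels)) / n
        weight -= learning_rate * grad
    return weight


def mse(weight, inputs, labels):
    """Mean squared error of the one-parameter model."""
    return sum((weight * x - y) ** 2 for x, y in zip(inputs, labels)) / len(inputs)


inputs = [1.0, 2.0, 3.0]
labels = [2.0, 4.0, 6.0]  # generated by y = 2 * x

short_run = train(0.0, inputs, labels, epochs=1, learning_rate=0.05)
long_run = train(0.0, inputs, labels, epochs=50, learning_rate=0.05)
```

Whether a converted model accepts such values as runtime inputs is exactly the open question in the comment. The sketch only shows the calling convention.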
On changing the model in training mode, see: https://discuss.tensorflow.org/t/how-to-implement-layerdrop-in-tensorflow-transformers/2396 |
/cc @vassilisvas is the co-author of Continual Learning on the Edge with TensorFlow Lite and the leader of the Learning Agents & Robots MRG. This is an interesting conversation to keep our eyes on and maybe contribute to the discussion. |
Thank you for bringing on-device training to TFLite! Based on this proposal, I am not sure where you plan to manage the training loop. Are you thinking of (1) keeping it inside TFLite or (2) letting the developer decide how the training loop will be structured on device? As @danieljanes pointed out, the API doesn’t show how the actual training step or training phase would be controlled. Moreover, the optimizer and loss do not seem to be accessible from the saved model. How would |
I have a similar question to @martinkersner's regarding the training loop, in the context of federated ML with TFLite. It would be fantastic to let the developer decide how to train and how to structure the training loop on device. This opens up the possibility of forwarding the gradients from the training loop to a further orchestration structure, enabling centralised and decentralised federated ML. I can understand the benefit of keeping the training loop and structure inside TFLite, so that it can be distributed uniformly across all platforms. With the training loop opened up to different platforms, you might need an additional library extension for Android, IoT and so on. But with additional library extensions to control the training loop, you can reduce the dependencies on different platforms and speed up the development cycle for TFLite, since each extension library can have its own deployment cycle. |
We already had some research work at ICML 2021 that joins federated and continual learning, with a TF reference implementation: https://github.com/wyjeong/FedWeIT It would be nice to open this research subdomain to edge devices with TFLite. |
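Forwarding gradients from a developer-controlled training loop to an orchestrator, as described above, could look roughly like the following. This is a plain-Python sketch of one form of federated gradient averaging, with a one-parameter model standing in for a real network. The function names and the size-weighted averaging rule are assumptions of the sketch, not part of the RFC or of any federated learning framework.

```python
def local_gradient(weight, inputs, labels):
    """Gradient of mean squared error for y = weight * x on one client's data."""
    n = len(inputs)
    return sum(2.0 * (weight * x - y) * x for x, y in zip(inputs, labels)) / n


def federated_round(weight, client_data, learning_rate):
    """Collect one gradient per client and apply their size-weighted average.

    Clients share only gradients; their raw data stays local.
    """
    grads = [local_gradient(weight, xs, ys) for xs, ys in client_data]
    sizes = [len(xs) for xs, _ in client_data]
    average = sum(g * s for g, s in zip(grads, sizes)) / sum(sizes)
    return weight - learning_rate * average


# Two hypothetical clients whose data follows y = 3 * x.
client_data = [([1.0, 2.0], [3.0, 6.0]), ([3.0], [9.0])]

weight = 0.0
for _ in range(100):
    weight = federated_round(weight, client_data, learning_rate=0.05)
```

The key requirement this places on the on-device API is that the loop can read gradients (or weight deltas) out of a training step instead of only applying them internally.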
Is this finalized/approved? |
https://www.tensorflow.org/lite/examples/on_device_training/overview |
Another interesting use case, even if ImageNet is probably too large a dataset for many edge computing TFLite platforms, is this recent DeepMind paper |
Is this ready for community feedback? Are you ready to take this through review? |
We're sharing this RFC to reflect our latest thinking on implementing on-device training in TensorFlow Lite.
We didn't set up a timeline to close the comments. We want to surface the RFC early for transparency and to get feedback.
Introduction
TensorFlow Lite is TensorFlow's solution for on-device machine learning.
Initially it only focused on inference use cases. We have increasingly heard
from users regarding the need for on-device training. This proposal lays out
the concrete plan & roadmap for supporting training in TensorFlow Lite.