
Training with GPU, inference on CPU with a pickled model #244

Open
BenjaminBossan opened this issue Apr 1, 2016 · 18 comments
@BenjaminBossan (Collaborator)

When training with CUDA on a GPU machine, it is not possible to load the model in a straightforward fashion on a machine without a GPU, because some Theano parameters are CudaNdarrays. My reading is that this is a Theano limitation and that there are only workarounds. One way is to:

  1. save_params_to on the GPU machine
  2. initialize an identical NeuralNet on the CPU machine
  3. load_params_from on that machine.
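For reference, the three steps above can be sketched with a minimal stand-in; the TinyNet class below is hypothetical and only mimics nolearn's save_params_to/load_params_from interface:

```python
import os
import pickle
import tempfile

class TinyNet:
    """Hypothetical stand-in for nolearn's NeuralNet parameter API."""
    def __init__(self):
        self.params = {}

    def save_params_to(self, fname):
        # Persist only the parameter values, never the compiled graph.
        with open(fname, 'wb') as f:
            pickle.dump(self.params, f, -1)

    def load_params_from(self, fname):
        with open(fname, 'rb') as f:
            self.params = pickle.load(f)

fname = os.path.join(tempfile.mkdtemp(), 'weights.pkl')

# 1. On the GPU machine: save the trained parameters.
gpu_net = TinyNet()
gpu_net.params = {'w': [1.0, 2.0], 'b': [0.5]}
gpu_net.save_params_to(fname)

# 2. + 3. On the CPU machine: build an identically configured net,
# then load the saved parameter values into it.
cpu_net = TinyNet()
cpu_net.load_params_from(fname)
```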

However, in some situations this requires quite some extra effort, e.g. when the NeuralNet is part of an sklearn Pipeline. You then have to:

  1. save the Pipeline's steps except for the one containing the NeuralNet on the GPU machine
  2. save the latter's parameters separately
  3. load the Pipeline on the CPU machine
  4. add a fresh NeuralNet to the right place in the Pipeline
  5. load the parameters to that net.
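Those five steps can be sketched with toy stand-ins; all class and file names here are hypothetical, Net only mimics nolearn's parameter API, and the pipeline is reduced to a plain list of named steps:

```python
import os
import pickle
import tempfile

class Scaler:
    """Toy preprocessing step (hypothetical)."""

class Net:
    """Toy net with a nolearn-like parameter save/load interface."""
    def __init__(self):
        self.params = {}
    def save_params_to(self, fname):
        with open(fname, 'wb') as f:
            pickle.dump(self.params, f, -1)
    def load_params_from(self, fname):
        with open(fname, 'rb') as f:
            self.params = pickle.load(f)

tmpdir = tempfile.mkdtemp()
steps_file = os.path.join(tmpdir, 'steps.pkl')
params_file = os.path.join(tmpdir, 'net_params.pkl')

# On the GPU machine: pickle every step except the net (step 1),
# and save the net's parameters separately (step 2).
pipeline = [('scale', Scaler()), ('net', Net())]
pipeline[1][1].params = {'w': [1.0]}
with open(steps_file, 'wb') as f:
    pickle.dump([s for s in pipeline if s[0] != 'net'], f, -1)
pipeline[1][1].save_params_to(params_file)

# On the CPU machine: load the other steps (step 3), insert a fresh
# net in the right place (step 4), and load its parameters (step 5).
with open(steps_file, 'rb') as f:
    steps = pickle.load(f)
fresh_net = Net()
fresh_net.load_params_from(params_file)
steps.append(('net', fresh_net))
```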

I wonder if there is a better way. Maybe it is possible to add a method to NeuralNet that converts CudaNdarrays to ordinary Theano shared variables? Does anyone know of a better approach?

@dnouri (Owner) commented Apr 1, 2016

You're probably seeing the effect of this change.

Can you try setting config.reoptimize_unpickled_function to True?
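In case it helps anyone trying this, Theano config flags like that one can also be set through the environment; a hedged sketch, where the script name is a placeholder and the flag's availability depends on your Theano version:

```shell
# Set the flag for a single run; predict.py stands in for whatever
# script unpickles the trained network.
THEANO_FLAGS='reoptimize_unpickled_function=True' python predict.py
```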

@BenjaminBossan (Collaborator, Author)

I was referring to the same problem as mentioned here. The option you mentioned would not solve this, would it? (I can't test it right now.)

@dnouri (Owner) commented Apr 1, 2016

That thread is a little confusing, with multiple issues. Can you paste the traceback that you're getting?

@BenjaminBossan (Collaborator, Author)

I get the same "Cuda not found. Cannot unpickle CudaNdarray" error. Shared variables are saved as CudaNdarrays, which you can't load on a machine without CUDA. One suggestion would be a method that somehow implements the solution proposed in that thread. I'm not sure, though, whether this would cover all use cases and whether there might not be a better solution.

@BenjaminBossan (Collaborator, Author)

I have a solution that seems to work:

import os
import pickle
import tempfile

class PortableNeuralNet(NeuralNet):
    def __setstate__(self, state):  # BBB for pickles that don't have the graph
        with tempfile.TemporaryDirectory() as tmpdirname:
            filename = os.path.join(tmpdirname, 'tmp_weights.pkl')
            with open(filename, 'wb') as f:
                pickle.dump(state['_params_temp_save'], f, -1)

            del state['_params_temp_save']
            self.__dict__.update(state)

            self.initialize()
            self.load_params_from(filename)

    def __getstate__(self):
        state = dict(self.__dict__)
        params = self.get_all_params_values()

        for key in list(state.keys()):  # copy keys to avoid RuntimeError from mutating the dict
            if key == 'train_history_':
                continue
            if key.endswith('_'):
                del state[key]
        del state['_output_layer']
        del state['_initialized']
        state['_params_temp_save'] = params
        return state

I don't know whether this is worth integrating or not. Instead of a new class, it could be a switch in the NeuralNet class. What do you think, Daniel?

@dnouri (Owner) commented Apr 6, 2016

@BenjaminBossan Do you mind describing the difference between this and the implementation that was removed in #228?

@dnouri (Owner) commented Apr 6, 2016

I looked at the problem a little more and now understand the issue better. The code that was removed in #228 had the same issue, since it did not do anything with the layer instances (layers_) in __getstate__.

So then I tried to come up with my own variation of the code that you proposed:

class YetAnotherPortableNeuralNet(NeuralNet):
    def __setstate__(self, state):
        params = state.pop('__params__', None)
        self.__dict__.update(state)
        self.initialize()
        if params is not None:
            self.load_params_from(params)

    def __getstate__(self):
        state = dict(self.__dict__)
        if self._initialized:
            params = self.get_all_params_values()
        else:
            params = None

        for attr in (
            'train_iter_',
            'eval_iter_',
            'predict_iter_',
            '_initialized',
            '_get_output_fn_cache',
            '_output_layer',
            'layers_',
            'layers',
                ):
            if attr in state:
                del state[attr]
        state['__params__'] = params
        return state

I thought your proposal was good and just needed a bit of refactoring: I removed the detour of writing the parameters out to a temporary file, and I wanted to be more explicit about which attributes are removed on the way out.

But then I found out that this approach has its own problems. Namely, if self.layers is already a list of layer instances, it won't work: those instances contain the CUDA arrays, which will then be pickled. Deleting those instances on the way out doesn't work either, for obvious reasons.
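The pitfall can be reproduced with plain Python: deleting a top-level attribute in __getstate__ does not help if the same object stays reachable through another attribute that remains in the state. The classes below are toy stand-ins, not nolearn code; DeviceArray plays the role of a CudaNdarray by refusing to be pickled:

```python
import pickle

class DeviceArray:
    """Stand-in for a CudaNdarray: refuses to be pickled."""
    def __reduce__(self):
        raise TypeError('cannot pickle device array')

class Layer:
    def __init__(self):
        self.W = DeviceArray()   # parameter lives on the "GPU"

class Net:
    def __init__(self):
        layer = Layer()
        self.layers = [layer]        # user-supplied layer instances
        self.layers_ = {'l0': layer} # built layers

    def __getstate__(self):
        state = dict(self.__dict__)
        del state['layers_']  # dropping the built layers is not enough:
        return state          # state['layers'] still holds the same Layer

net = Net()
try:
    pickle.dumps(net)
    pickled_ok = True
except TypeError:
    pickled_ok = False  # the DeviceArray is still reachable via `layers`
```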

I'm thinking that as long as we can't fix this in the general case, we shouldn't put code like this into nolearn.lasagne itself. But we can point people to solutions that might work for them. One such solution might be this script inside of pylearn2, which I'm about to try out.

@BenjaminBossan (Collaborator, Author)

But then I found out that this approach has its own problems. Namely, if self.layers is already a list of layer instances

Right, I did not think about that possibility. We could raise an error in that case, but that is not a satisfying solution.

One such solution might be this script inside of pylearn2 which I'm about to try out.

I believe @alattner tried that to no avail.

I'm thinking that as long as we can't fix this in the general case, we shouldn't put code like this into nolearn.lasagne itself.

I agree, but it would be nice to be able to use this kludge somehow, e.g. by checking out a specific nolearn branch.

@alattner commented Apr 6, 2016

I believe @alattner tried that to no avail.

No, I haven't tried that script inside pylearn2. I tried the config.experimental.unpickle_gpu_on_cpu option with no success.
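For completeness, this is how such an experimental flag would be set from the environment; treat the exact spelling and the script name as assumptions that depend on your Theano version:

```shell
# load_model.py stands in for whatever script unpickles the network.
THEANO_FLAGS='experimental.unpickle_gpu_on_cpu=True' python load_model.py
```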

@dnouri (Owner) commented Apr 6, 2016

I tried the script and it failed with some weird recursion error.

@BenjaminBossan (Collaborator, Author)

sys.setrecursionlimit(10 ** 999)

@dnouri (Owner) commented Apr 6, 2016

OK just let me know if that's a joke or if it actually works. ;-)

@BenjaminBossan (Collaborator, Author)

I would not try it :)

Anyway, do you see a working solution for this?

@dnouri (Owner) commented Apr 7, 2016

I'll take another look next week. So far I haven't had much luck.

@BenjaminBossan (Collaborator, Author)

So much for not breaking code in this PR :)

For those who use this snippet, in the part shown below, change '_output_layer' to '_output_layers':

        for attr in (
            'train_iter_',
            'eval_iter_',
            'predict_iter_',
            '_initialized',
            '_get_output_fn_cache',
            '_output_layer',
            'layers_',
            'layers',
                ):
            if attr in state:
                del state[attr]

@JamesOwers

Is there any update on how to train on GPU, save, and load on CPU for inference?

@dnouri (Owner) commented Jan 25, 2017

@kungfujam Note that as per the original post, you can always do this:

  1. save_params_to on the GPU machine
  2. initialize an identical NeuralNet on the CPU machine
  3. load_params_from on that machine.

This issue is about being unable to take a network pickled after training on a GPU and use it in a CPU environment, which is sometimes more convenient.

dnouri changed the title from "Training with GPU, inference on CPU" to "Training with GPU, inference on CPU with a pickled model" on Jan 25, 2017
@JamesOwers

Thanks for the clarification. I am using that method currently. Not a huge deal, but it would be nice to be able to pickle.
