Restoring from checkpoint failed in evaluation #18

Open
asif3058 opened this issue Jan 9, 2019 · 3 comments
asif3058 commented Jan 9, 2019

I tried to run the model for evaluation and got an error. The command and the log are posted here:

Command: python document_summarizer_training_testing.py --use_gpu /gpu:2 --data_mode cnn --exp_mode test --model_to_load 2 --train_dir training/directory/cnn-reinforcementlearn-singlesample-from-moracle-noatt-sample5 --num_sample_rollout 5 > training/directory/cnn-reinforcementlearn-singlesample-from-moracle-noatt-sample5/test.model2.log

Error:
Traceback (most recent call last):
File "document_summarizer_training_testing.py", line 291, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "document_summarizer_training_testing.py", line 287, in main
test()
File "document_summarizer_training_testing.py", line 259, in test
model.saver.restore(sess, selected_modelpath)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1562, in restore
err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Tensor name "PolicyNetwork/ConvLayer/Conv1D_1/conv_biases_1" not found in checkpoint files training/directory/cnn-reinforcementlearn-singlesample-from-moracle-noatt-sample5/model.ckpt.epoch-2
[[node save/RestoreV2 (defined at /media/gtx/data/Asif/Refresh-master/my_model.py:73) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op u'save/RestoreV2', defined at:
File "document_summarizer_training_testing.py", line 291, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "document_summarizer_training_testing.py", line 287, in main
test()
File "document_summarizer_training_testing.py", line 244, in test
model = MY_Model(sess, len(vocab_dict)-2)
File "/media/gtx/data/Asif/Refresh-master/my_model.py", line 73, in __init__
self.saver = tf.train.Saver(tf.global_variables(), max_to_keep=None)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1102, in __init__
self.build()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1114, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1151, in _build
build_save=build_save, build_restore=build_restore)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 795, in _build_internal
restore_sequentially, reshape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 406, in _AddRestoreOps
restore_sequentially)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 862, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Tensor name "PolicyNetwork/ConvLayer/Conv1D_1/conv_biases_1" not found in checkpoint files training/directory/cnn-reinforcementlearn-singlesample-from-moracle-noatt-sample5/model.ckpt.epoch-2
[[node save/RestoreV2 (defined at /media/gtx/data/Asif/Refresh-master/my_model.py:73) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

What could be the issue here?

@shashiongithub
Collaborator

Could you please check that you have the right version of TensorFlow installed for this?

@asif3058
Author

I used TensorFlow 1.10 here and updated the code for that version. Training worked well. I'm not sure whether this model can only be run in TensorFlow 0.10 or whether it can also be run in version 1.10.

@shashiongithub
Collaborator

Unfortunately, I don't think you can use those pre-trained models with the newer version of TensorFlow.
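For anyone hitting the same NotFoundError: one way to narrow it down is to compare the variable names the graph asks for with the names actually stored in the checkpoint. The sketch below is only an illustration, not code from this repository. The helper `diff_variable_names` and the sample checkpoint names are hypothetical; only the tensor name `PolicyNetwork/ConvLayer/Conv1D_1/conv_biases_1` comes from the log above. The comparison itself is plain Python, so it runs without TensorFlow; the comments show how the two lists could probably be collected in TensorFlow 1.x.

```python
def diff_variable_names(graph_names, checkpoint_names):
    """Return (missing, unused).

    missing: names the graph expects but the checkpoint does not contain.
    unused:  names stored in the checkpoint that the graph never asks for.
    """
    graph_set = set(graph_names)
    ckpt_set = set(checkpoint_names)
    missing = sorted(graph_set - ckpt_set)
    unused = sorted(ckpt_set - graph_set)
    return missing, unused


# In TensorFlow 1.x the two lists could be collected roughly like this
# (kept as comments so the sketch runs without TensorFlow installed):
#   graph_names = [v.op.name for v in tf.global_variables()]
#   checkpoint_names = [name for name, _ in tf.train.list_variables(ckpt_path)]

# Hypothetical example data. Only the first graph name is taken from the
# error log; the checkpoint contents are an assumption for illustration.
graph_names = [
    "PolicyNetwork/ConvLayer/Conv1D_1/conv_biases_1",
    "PolicyNetwork/ConvLayer/Conv1D_1/conv_weights_1",
]
checkpoint_names = [
    "PolicyNetwork/ConvLayer/Conv1D_1/conv_biases",  # assumed different suffix
    "PolicyNetwork/ConvLayer/Conv1D_1/conv_weights_1",
]

missing, unused = diff_variable_names(graph_names, checkpoint_names)
print("missing from checkpoint:", missing)
print("unused in checkpoint:", unused)
```

If the two lists differ only by a suffix or a scope prefix, the graph built at test time is probably naming its variables differently from the graph that wrote the checkpoint. If they differ more widely, the checkpoint most likely came from a different version of the model code.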
