'gbk' codec can't encode character #30

ZukyLi · 2019-05-16T13:42:26Z

Hello Shashi,
When I train the code on the CNN dataset, I run into some problems.

  File "D:\Python-Pytorch\myrefresh\Refresh-master\data_utils.py", line 329, in prepare_vocab_embeddingdict
    for line in fembedd:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xbd in position 4009: illegal multibyte sequence

embed_line = ""
linecount = 0
with open(wordembed_filename, "r", encoding='utf-8') as fembedd:
    for line in fembedd:
        if linecount == 0:
            vocabsize = int(line.split()[0])
I added encoding='utf-8' to the open(wordembed_filename, "r") call, as shown above, and that fixed it.
But then I got:
  File "D:\Python-Pytorch\myrefresh\Refresh-master\data_utils.py", line 353, in prepare_vocab_embeddingdict
    foutput.write("\n".join(vocab_list)+"\n")
UnicodeEncodeError: 'gbk' codec can't encode character '\xa3' in position 714: illegal multibyte sequence

The original code is:
foutput = open(vocabfilename,"w")
vocab_list = [(vocab_dict[key], key) for key in vocab_dict.keys()]
vocab_list.sort()
vocab_list = [item[1] for item in vocab_list]
foutput.write("\n".join(vocab_list)+"\n")
foutput.close()
return vocab_dict, word_embedding_array
I tried the same method as above, but it didn't work. What can I do to solve this problem? Could you help me? I use Windows to run the code.
Thank you very much,
Zuky Li
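
For anyone landing here with the same error: in Python 3, open() without an encoding argument uses the locale's default codec, which on a Chinese-locale Windows machine is typically gbk, so the write side probably needs the same encoding='utf-8' as the read side. Below is a minimal sketch of that idea, not the repository's code. The helper names write_vocab and read_vocab and the sample vocabulary are made up for illustration; the sort is meant to reproduce the index ordering of the original snippet.

```python
import os
import tempfile


def write_vocab(vocab_dict, vocab_filename):
    """Write vocabulary words, ordered by their index, one per line, as UTF-8."""
    # Same ordering as the original: sort (index, word) pairs, keep the word.
    vocab_list = [word for _, word in sorted((idx, word) for word, idx in vocab_dict.items())]
    # The explicit encoding avoids falling back to the locale codec (e.g. gbk).
    with open(vocab_filename, "w", encoding="utf-8") as foutput:
        foutput.write("\n".join(vocab_list) + "\n")


def read_vocab(vocab_filename):
    """Read the vocabulary back with the same explicit encoding."""
    with open(vocab_filename, "r", encoding="utf-8") as finput:
        return [line.rstrip("\n") for line in finput]


if __name__ == "__main__":
    # Hypothetical vocabulary containing '\xa3', the character from the traceback.
    path = os.path.join(tempfile.mkdtemp(), "vocab.txt")
    write_vocab({"the": 0, "\xa3": 1, "cat": 2}, path)
    print(read_vocab(path))
```

Using a with block also closes the file if the write raises, which the open()/close() pair in the original does not guarantee.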

ZukyLi · 2019-05-16T13:45:37Z

Here is a screenshot.

ZukyLi · 2019-05-16T14:17:10Z

Sorry for raising this question; I've solved it. The same method as above does work.
Now I have run into a new problem:
WARNING:tensorflow:From D:\Python-Pytorch\Anaconda\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
Traceback (most recent call last):
File "D:\Python-Pytorch\Anaconda\lib\site-packages\tensorflow\python\framework\ops.py", line 1659, in _create_c_op
c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimension 0 in both shapes must be equal, but are 1 and 559183. Shapes are [1,200] and [559183,200].
From merging shape 1 with other shapes. for 'PolicyNetwork/concat/concat_dim' (op: 'Pack') with input shapes: [1,200], [1,200], [559183,200].

Here is the train.log.

Which is right, 1 or 559183? I've upgraded all of your code with the TensorFlow upgrade tool. My environment is tf 1.13-gpu. Do you need the full traceback?
I'm sorry to have so many questions. I would appreciate it very much if you could help me solve my problem.
Thank you very much,
Zuky Li

ZukyLi · 2019-05-17T10:29:52Z

Could you let me know which environment you run the code in? Which versions are you using: TensorFlow 0.12? CUDA 8.0? cuDNN 5.1? Python 2.7? What other dependencies do you have? I use Windows 10.
Thank you very much,
Zuky Li

tlifcen · 2019-06-10T16:26:54Z

I have also run into the 'gbk' problem. After I solved it, the same dimension error was raised. I hope the author can help us.
Thanks very much! @shashiongithub

shashiongithub · 2019-06-11T10:26:57Z

Hi, first of all, sorry for the delay in my response. The code was implemented with TensorFlow 0.10 and Python 2.7, cuda-8.0.44 and cuDNN-5.1_8.0. I was using Linux.

What shape is it printing? What is 200?

tlifcen · 2019-06-11T11:49:05Z

This error is raised when I run the command below, but I don't know where the dimension error comes from.
I use Python 3.7 and TensorFlow 1.13.1 on Windows Server 2012.
I used this command:

python document_summarizer_training_testing.py --use_gpu /gpu:2 --data_mode cnn --train_dir ./test/to/training/directory/cnn-reinforcementlearn-singlesample-from-moracle-noatt-sample5 --num_sample_rollout 5 > ./test/to/training/directory/cnn-reinforcementlearn-singlesample-from-moracle-noatt-sample5/train.log

Then this error was raised:

2019-06-11 19:39:31.230404: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2019-06-11 19:39:35.156198: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:25:00.0
totalMemory: 15.88GiB freeMemory: 14.62GiB
2019-06-11 19:39:35.170608: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-11 19:39:35.948700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-11 19:39:35.954584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2019-06-11 19:39:35.958858: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2019-06-11 19:39:35.968196: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14158 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:25:00.0, compute capability: 6.0)
WARNING:tensorflow:From D:\DL\anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
Traceback (most recent call last):
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1659, in _create_c_op
    c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimension 0 in both shapes must be equal, but are 1 and 559183. Shapes are [1,200] and [559183,200].
        From merging shape 1 with other shapes. for 'PolicyNetwork/concat/concat_dim' (op: 'Pack') with input shapes: [1,200], [1,200], [559183,200].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "document_summarizer_training_testing.py", line 282, in <module>
    tf.app.run()
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "document_summarizer_training_testing.py", line 277, in main
    train()
  File "document_summarizer_training_testing.py", line 102, in train
    model = MY_Model(sess, len(vocab_dict)-2)
  File "F:\DL\data\Constant-TL\Refresh\my_model.py", line 58, in __init__
    self.extractor_output, self.logits = model_docsum.policy_network(self.vocab_embed_variable, self.document_placeholder, self.label_placeholder)
  File "F:\DL\data\Constant-TL\Refresh\model_docsum.py", line 142, in policy_network
    fullvocab_embed_variable = tf.concat(0, [pad_embed_variable, unk_embed_variable, vocab_embed_variable])
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\util\dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1253, in concat
    dtype=dtypes.int32).get_shape().assert_is_compatible_with(
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1039, in convert_to_tensor
    return convert_to_tensor_v2(value, dtype, preferred_dtype, name)
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1097, in convert_to_tensor_v2
    as_ref=False)
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1175, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1102, in _autopacking_conversion_function
    return _autopacking_helper(v, dtype, name or "packed")
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1054, in _autopacking_helper
    return gen_array_ops.pack(elems_as_tensors, name=scope)
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 5448, in pack
    "Pack", values=values, axis=axis, name=name)
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3300, in create_op
    op_def=op_def)
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1823, in __init__
    control_input_ops)
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1662, in _create_c_op
    raise ValueError(str(e))
ValueError: Dimension 0 in both shapes must be equal, but are 1 and 559183. Shapes are [1,200] and [559183,200].
        From merging shape 1 with other shapes. for 'PolicyNetwork/concat/concat_dim' (op: 'Pack') with input shapes: [1,200], [1,200], [559183,200].
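
A note on the traceback above: model_docsum.py line 142 calls tf.concat(0, [...]), which is the pre-1.0 argument order (concat_dim, values). From TensorFlow 1.0 on, the signature is tf.concat(values, axis), so TF 1.13 appears to treat the list of three embedding tensors as the axis and tries to pack it into a single tensor, which requires identical shapes. That would explain the 'Pack' op named 'concat/concat_dim' in the message. The likely fix is to swap the arguments: tf.concat([pad_embed_variable, unk_embed_variable, vocab_embed_variable], 0). The sketch below only models the shape rules in plain Python, without TensorFlow; concat_shape and pack_shape are hypothetical helpers written for this illustration.

```python
def concat_shape(shapes, axis):
    """Shape obtained by concatenating tensors with the given shapes along `axis`."""
    result = list(shapes[0])
    for shape in shapes[1:]:
        if len(shape) != len(result):
            raise ValueError("rank mismatch: %r vs %r" % (shapes[0], shape))
        for dim, size in enumerate(shape):
            if dim == axis:
                result[dim] += size  # sizes add up along the concat axis
            elif size != result[dim]:
                raise ValueError("dimension %d differs: %d vs %d" % (dim, result[dim], size))
    return tuple(result)


def pack_shape(shapes):
    """Shape obtained by stacking tensors; every shape must be identical."""
    first = tuple(shapes[0])
    for shape in shapes[1:]:
        if tuple(shape) != first:
            raise ValueError("shapes must be equal, but are %r and %r" % (first, tuple(shape)))
    return (len(shapes),) + first


# Shapes from the traceback: pad, unk and vocabulary embeddings.
shapes = [(1, 200), (1, 200), (559183, 200)]

# What the model intends: concatenate along axis 0.
print(concat_shape(shapes, axis=0))  # (559185, 200)

# What happens when the list lands in the axis slot: it gets packed, and fails.
try:
    pack_shape(shapes)
except ValueError as err:
    print("pack fails:", err)
```

So neither 1 nor 559183 is wrong on its own; the three tensors are only compatible for concatenation along axis 0, not for stacking.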

ZukyLi · 2019-06-28T16:23:10Z

Hi Shashi,

I'm very glad to receive your reply. Thank you for answering my question.

I tried many times to configure the environment. I found that TensorFlow 0.10 with Python 2.7 is not compatible with Windows 10, so I decided to run the code with Python 3.6 and TensorFlow 1.13. It took me quite some time to upgrade the code and make ROUGE work properly on Windows. I finally succeeded in running the code and got good results. For me, research needs patience!

The shape mismatch in the previous problem was caused by the different TensorFlow versions; the code itself had no problems. Unless I have new questions, there is no need to reply to me again.

Finally, thank you very much for your reply. Best wishes for better results in your work!

Zuky Li


azadFKM · 2019-11-14T10:58:02Z

Hi, I'm having the same problem you had here. Could you share the code that you revised? I would appreciate your help.
