
'gbk' codec can't encode character #30

Open
ZukyLi opened this issue May 16, 2019 · 8 comments

ZukyLi commented May 16, 2019

Hello Shashi,
When I train on the CNN dataset, I run into some problems.

File "D:\Python-Pytorch\myrefresh\Refresh-master\data_utils.py", line 329, in prepare_vocab_embeddingdict
for line in fembedd:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xbd in position 4009: illegal multibyte sequence

embed_line = ""
linecount = 0
with open(wordembed_filename, "r", encoding='utf-8') as fembedd:
    for line in fembedd:
        if linecount == 0:
            vocabsize = int(line.split()[0])
I added encoding='utf-8' to the open(wordembed_filename, "r") call, and that fixed it.
But then I got:
File "D:\Python-Pytorch\myrefresh\Refresh-master\data_utils.py", line 353, in prepare_vocab_embeddingdict
foutput.write("\n".join(vocab_list)+"\n")
UnicodeEncodeError: 'gbk' codec can't encode character '\xa3' in position 714: illegal multibyte sequence

the original code is:
foutput = open(vocabfilename,"w")
vocab_list = [(vocab_dict[key], key) for key in vocab_dict.keys()]
vocab_list.sort()
vocab_list = [item[1] for item in vocab_list]
foutput.write("\n".join(vocab_list)+"\n")
foutput.close()
return vocab_dict, word_embedding_array
I tried the same method as above, but it didn't work. What can I do to solve this problem? Could you help me? I run the code on Windows.
Thank you very much,
Zuky Li
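
A note on the likely cause: when `open` is called without an `encoding` argument, Python 3 falls back to the locale's preferred encoding, which on a Chinese-locale Windows install is GBK. That would explain why both the read and the write fail on the same machine. The sketch below is only an illustration, not code from the repository: `can_encode`, `write_vocab` and `read_vocab` are hypothetical helper names, and it assumes the vocabulary file should be stored as UTF-8.

```python
import os
import tempfile


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def write_vocab(path, vocab_list):
    # Explicit encoding, so the result does not depend on the OS locale.
    with open(path, "w", encoding="utf-8") as foutput:
        foutput.write("\n".join(vocab_list) + "\n")


def read_vocab(path):
    # Read the file back with the same explicit encoding.
    with open(path, "r", encoding="utf-8") as finput:
        return [line.rstrip("\n") for line in finput]


if __name__ == "__main__":
    # '\xa3' is the character named in the UnicodeEncodeError above.
    print("gbk can encode it:", can_encode("\xa3", "gbk"))
    print("utf-8 can encode it:", can_encode("\xa3", "utf-8"))

    path = os.path.join(tempfile.mkdtemp(), "vocab.txt")
    write_vocab(path, ["the", "\xa3"])
    print(read_vocab(path))
```

The same `encoding="utf-8"` argument has to be passed to every `open` call that touches the file, for writing as well as for reading.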


ZukyLi commented May 16, 2019

Here is a screenshot.
[Image: screenshot]


ZukyLi commented May 16, 2019

Sorry for raising this question; I've solved it. The method above does in fact work.
But now I have run into a new problem:
WARNING:tensorflow:From D:\Python-Pytorch\Anaconda\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
Traceback (most recent call last):
File "D:\Python-Pytorch\Anaconda\lib\site-packages\tensorflow\python\framework\ops.py", line 1659, in _create_c_op
c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimension 0 in both shapes must be equal, but are 1 and 559183. Shapes are [1,200] and [559183,200].
From merging shape 1 with other shapes. for 'PolicyNetwork/concat/concat_dim' (op: 'Pack') with input shapes: [1,200], [1,200], [559183,200].

Here is the train.log:
[Image: screenshot of train.log]
Which is right, 1 or 559183? I've upgraded all your code with the TensorFlow upgrade tool. My environment is TensorFlow 1.13 (GPU). Do you need the full traceback?
I'm sorry to have so many questions. I would appreciate it very much if you could help me solve my problem.
Thank you very much,
Zuky Li


ZukyLi commented May 17, 2019

Could you let me know which environment you ran the code in? Which versions are you using: TensorFlow 0.12? CUDA 8.0? cuDNN 5.1? Python 2.7? What other dependencies do you have? I use Windows 10.
Thank you very much,
Zuky Li


tlifcen commented Jun 10, 2019

I have also run into the 'gbk' problem. After I solved it, the same dimension error was raised. I hope the author can help us.
Thanks very much! @shashiongithub

@shashiongithub
Collaborator

Hi, first of all, sorry for the delay in my response. The code was implemented with TensorFlow 0.10 and Python 2.7, cuda-8.0.44 and cuDNN-5.1_8.0. I was using Linux.

What shape is it printing? What is 200?
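
For comparing a local setup against the versions listed above, a small script along these lines could collect the relevant details in one place. This is only a sketch: `environment_report` is a hypothetical helper, not part of the repository, and it assumes TensorFlow, if installed, is importable as `tensorflow`.

```python
import locale
import platform


def environment_report():
    """Collect the version details that matter for this thread."""
    report = {
        "python": platform.python_version(),
        "os": platform.system(),
        # Encoding that open() falls back to when none is given.
        "default_text_encoding": locale.getpreferredencoding(False),
    }
    try:
        import tensorflow as tf
        report["tensorflow"] = tf.__version__
    except ImportError:
        # TensorFlow is not installed in this interpreter.
        report["tensorflow"] = None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(key, "=", value)
```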


tlifcen commented Jun 11, 2019

The error below is returned after I run the command, but I don't know where the dimension mismatch comes from.
I use Python 3.7 and TensorFlow 1.13.1 on Windows Server 2012.
I used this command:

python document_summarizer_training_testing.py --use_gpu /gpu:2 --data_mode cnn --train_dir ./test/to/training/directory/cnn-reinforcementlearn-singlesample-from-moracle-noatt-sample5 --num_sample_rollout 5 > ./test/to/training/directory/cnn-reinforcementlearn-singlesample-from-moracle-noatt-sample5/train.log

Then this error was raised:

2019-06-11 19:39:31.230404: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2019-06-11 19:39:35.156198: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:25:00.0
totalMemory: 15.88GiB freeMemory: 14.62GiB
2019-06-11 19:39:35.170608: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-11 19:39:35.948700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-11 19:39:35.954584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2019-06-11 19:39:35.958858: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2019-06-11 19:39:35.968196: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14158 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:25:00.0, compute capability: 6.0)
WARNING:tensorflow:From D:\DL\anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
Traceback (most recent call last):
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1659, in _create_c_op
    c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimension 0 in both shapes must be equal, but are 1 and 559183. Shapes are [1,200] and [559183,200].
        From merging shape 1 with other shapes. for 'PolicyNetwork/concat/concat_dim' (op: 'Pack') with input shapes: [1,200], [1,200], [559183,200].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "document_summarizer_training_testing.py", line 282, in <module>
    tf.app.run()
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "document_summarizer_training_testing.py", line 277, in main
    train()
  File "document_summarizer_training_testing.py", line 102, in train
    model = MY_Model(sess, len(vocab_dict)-2)
  File "F:\DL\data\Constant-TL\Refresh\my_model.py", line 58, in __init__
    self.extractor_output, self.logits = model_docsum.policy_network(self.vocab_embed_variable, self.document_placeholder, self.label_placeholder)
  File "F:\DL\data\Constant-TL\Refresh\model_docsum.py", line 142, in policy_network
    fullvocab_embed_variable = tf.concat(0, [pad_embed_variable, unk_embed_variable, vocab_embed_variable])
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\util\dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1253, in concat
    dtype=dtypes.int32).get_shape().assert_is_compatible_with(
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1039, in convert_to_tensor
    return convert_to_tensor_v2(value, dtype, preferred_dtype, name)
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1097, in convert_to_tensor_v2
    as_ref=False)
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1175, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1102, in _autopacking_conversion_function
    return _autopacking_helper(v, dtype, name or "packed")
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1054, in _autopacking_helper
    return gen_array_ops.pack(elems_as_tensors, name=scope)
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 5448, in pack
    "Pack", values=values, axis=axis, name=name)
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3300, in create_op
    op_def=op_def)
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1823, in __init__
    control_input_ops)
  File "D:\DL\anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1662, in _create_c_op
    raise ValueError(str(e))
ValueError: Dimension 0 in both shapes must be equal, but are 1 and 559183. Shapes are [1,200] and [559183,200].
        From merging shape 1 with other shapes. for 'PolicyNetwork/concat/concat_dim' (op: 'Pack') with input shapes: [1,200], [1,200], [559183,200].
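
A possible reading of this traceback: the failing call is `tf.concat(0, [pad_embed_variable, unk_embed_variable, vocab_embed_variable])`, which uses the pre-1.0 argument order (dimension first, tensor list second). From TensorFlow 1.0 on, the signature is `tf.concat(values, axis)`, so under 1.13 the list is taken as the axis and TensorFlow tries to pack the three tensors into one (the 'Pack' op named `concat_dim`), which fails because their shapes differ. If that reading is right, swapping the arguments to `tf.concat([pad_embed_variable, unk_embed_variable, vocab_embed_variable], 0)` should resolve it; this is consistent with the later comment that the mismatch came from the TensorFlow version, but it has not been confirmed by the maintainers. The pure-Python sketch below only mimics the shape checks involved; `pack_shape` and `concat_shape` are hypothetical helpers, not TensorFlow functions.

```python
def pack_shape(shapes):
    """Shape after stacking tensors along a new leading axis.

    Like the 'Pack' op, this requires every input to have the same shape.
    """
    first = list(shapes[0])
    for shape in shapes[1:]:
        if list(shape) != first:
            raise ValueError(
                "Pack needs identical shapes, got %s and %s" % (first, list(shape))
            )
    return [len(shapes)] + first


def concat_shape(shapes, axis=0):
    """Shape after concatenating tensors along an existing axis.

    Only the dimensions other than `axis` have to match.
    """
    result = list(shapes[0])
    total = 0
    for shape in shapes:
        shape = list(shape)
        if len(shape) != len(result):
            raise ValueError("all inputs must have the same rank")
        for i, (a, b) in enumerate(zip(shape, result)):
            if i != axis and a != b:
                raise ValueError("dimension %d differs: %d vs %d" % (i, a, b))
        total += shape[axis]
    result[axis] = total
    return result


if __name__ == "__main__":
    # Shapes from the error message: pad row, unk row, vocabulary embeddings.
    shapes = [[1, 200], [1, 200], [559183, 200]]

    # Intended operation: concatenate along axis 0.
    print(concat_shape(shapes, axis=0))  # [559185, 200]

    # What happens when the list lands in the axis slot: a pack is attempted.
    try:
        pack_shape(shapes)
    except ValueError as err:
        print("pack failed:", err)
```

On this reading, neither 1 nor 559183 is wrong: both are valid sizes along axis 0, and only the operation applied to them is not the intended one.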


ZukyLi commented Jun 28, 2019 via email

Hi Shashi,
I'm very glad to receive your reply. Thank you for answering my question. I tried many times to configure the environment and found that TensorFlow 0.10 with Python 2.7 could not be set up on Windows 10, so I decided to run the code with Python 3.6 and TensorFlow 1.13. It took me quite some time to upgrade the code and make ROUGE work properly on Windows. I finally succeeded in running the code and got good results. For me, research needs patience! The shape mismatch in the previous problem was caused by the different TensorFlow versions; the code itself had no problems. Unless I have new questions, there is no need to reply to me again. Finally, thank you very much for your reply. Best wishes for better results in your work!
Zuky Li

------------------ Original Message ------------------
From: "Shashi Narayan"[email protected]
Sent: Tuesday, June 11, 2019, 6:26 PM
To: "EdinburghNLP/Refresh"[email protected]
Cc: "2454409903"[email protected]; "Author"[email protected]
Subject: Re: [EdinburghNLP/Refresh] 'gbk' codec can't encode character (#30)

Hi, First of all sorry for the delay in my response. The code was implemented with Tensorflow 0.10 with Python 2.7, cuda-8.0.44 and cuDNN-5.1_8.0. I was using Linux. What shape is it printing? What is 200?

azadFKM commented Nov 14, 2019

Hi, I'm having the same problem as you had here. Could you share the code that you revised? I would appreciate your help.

4 participants