'gbk' codec can't encode character #30
Comments
Could you tell me about your runtime environment? Which versions are you using: TensorFlow 0.12? CUDA 8.0? cuDNN 5.1? Python 2.7? What other dependencies do you have? I am on Windows 10. |
I have also run into the 'gbk' problem. After I solved it, the same dimension error was raised. I hope the author can help us. |
Hi, first of all, sorry for the delay in my response. The code was implemented with TensorFlow 0.10 and Python 2.7, cuda-8.0.44 and cuDNN-5.1_8.0. I was using Linux. What shape is it printing? What is 200? |
This error was returned after I ran the command, but I don't know where the dimension error comes from.
Then this problem came up:
|
Hi, Shashi,
I'm very glad to receive your reply. Thank you for answering my question. I tried many times to configure the environment and found that TensorFlow 0.10 with Python 2.7 could not be set up on Windows 10, so I decided to run the code with Python 3.6 and Tensorflow 0.13 instead. It took me quite a lot of time to upgrade the code and get ROUGE working properly on Windows, but I finally succeeded in running the code and got good results. Research needs patience! The shape mismatch in the earlier problem was caused by the different TensorFlow version; the code itself had no problems. Unless I have new questions, there is no need to reply to me again. Finally, thank you very much for your reply.
Best wishes for better results in your work!
Zuky Li
------------------ Original message ------------------
From: "Shashi Narayan"<[email protected]>;
Sent: Tuesday, June 11, 2019, 6:26 PM
To: "EdinburghNLP/Refresh"<[email protected]>;
Cc: "2454409903"<[email protected]>;"Author"<[email protected]>;
Subject: Re: [EdinburghNLP/Refresh] 'gbk' codec can't encode character (#30)
|
Hi, I'm having the same problem you had here. Could you share the code you revised? I would appreciate your help. |
Hello Shashi,
When I train on the CNN dataset, I run into some problems.
  File "D:\Python-Pytorch\myrefresh\Refresh-master\data_utils.py", line 329, in prepare_vocab_embeddingdict
    for line in fembedd:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xbd in position 4009: illegal multibyte sequence
embed_line = ""
linecount = 0
with open(wordembed_filename, "r", encoding='utf-8') as fembedd:
    for line in fembedd:
        if linecount == 0:
            vocabsize = int(line.split()[0])
I added encoding='utf-8' after "r" in the call open(wordembed_filename, "r"), and that fixed the error.
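Why the explicit argument helps can be sketched as follows. This is only an illustration, not code from the repository: the file name and its two-line content are hypothetical stand-ins for the real embedding file. In Python 3, open() without encoding= falls back to a locale-dependent default, which on Chinese-language Windows is typically a GBK variant, so a UTF-8 file may fail to decode there.

```python
import locale
import os
import tempfile

# open() without encoding= uses this locale-dependent value
# (typically cp936, a GBK variant, on Chinese-language Windows).
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)

# Hypothetical stand-in for the embedding file: a header line, then one vector.
path = os.path.join(tempfile.mkdtemp(), "embeddings.txt")
with open(path, "wb") as f:
    f.write("1 3\n\u00a3 0.1 0.2 0.3\n".encode("utf-8"))

# Passing the encoding explicitly makes the read independent of the locale.
with open(path, "r", encoding="utf-8") as fembedd:
    header = fembedd.readline()
    first_token = fembedd.readline().split()[0]

vocabsize = int(header.split()[0])
```

Every text-mode open() call that touches the data needs the same argument, for reading and for writing.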
But then I got:
  File "D:\Python-Pytorch\myrefresh\Refresh-master\data_utils.py", line 353, in prepare_vocab_embeddingdict
    foutput.write("\n".join(vocab_list)+"\n")
UnicodeEncodeError: 'gbk' codec can't encode character '\xa3' in position 714: illegal multibyte sequence
The original code is:
foutput = open(vocabfilename,"w")
vocab_list = [(vocab_dict[key], key) for key in vocab_dict.keys()]
vocab_list.sort()
vocab_list = [item[1] for item in vocab_list]
foutput.write("\n".join(vocab_list)+"\n")
foutput.close()
return vocab_dict, word_embedding_array
I tried the same approach as above, but it didn't work. How can I solve this problem? Could you help me? I am running the code on Windows.
Thank you very much,
Zuky Li
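One possible fix for the write error, sketched under the assumption that the code runs on Python 3 and that the encoding argument is passed to the open() call that creates foutput. The helper name write_vocab and the sample vocabulary are hypothetical; only the sort-by-index and join logic mirrors the original snippet.

```python
import os
import tempfile

# The character from the traceback cannot be encoded by the gbk codec.
try:
    "\xa3".encode("gbk")
    gbk_fails = False
except UnicodeEncodeError:
    gbk_fails = True

def write_vocab(vocab_dict, vocabfilename):
    """Write the vocabulary one word per line, ordered by word index."""
    # Order words by their integer index, as the original snippet does.
    vocab_list = sorted(vocab_dict, key=vocab_dict.get)
    # encoding="utf-8" replaces the locale default (gbk on this machine).
    with open(vocabfilename, "w", encoding="utf-8") as foutput:
        foutput.write("\n".join(vocab_list) + "\n")

# Hypothetical vocabulary containing the problematic character.
vocab = {"the": 0, "\xa3": 1, "cat": 2}
out_path = os.path.join(tempfile.mkdtemp(), "vocab.txt")
write_vocab(vocab, out_path)

# Read the file back with the same encoding to confirm the round trip.
with open(out_path, "r", encoding="utf-8") as f:
    written = f.read().splitlines()
```

Any later code that reads the vocabulary file would need encoding="utf-8" as well, otherwise the decode error can reappear on the read side.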