This happens because the fetched data is automatically interpreted as UTF-8, but this page uses SHIFT_JIS. The encoding is detected correctly by chardet.detect (from the third-party chardet module). If you want, you can change scrapemark to use it internally so that it detects the encoding and decodes automatically. Alternatively, call scrapemark's fetch_html directly and decode the result yourself.
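A minimal sketch of the "decode it yourself" route, assuming Python 3 and that you already have the raw response bytes (from fetch_html or any other fetcher). The helper names `sniff_meta_charset` and `decode_html` are hypothetical and not part of scrapemark. Checking the page's declared `<meta>` charset first is an extra step added here for illustration; chardet is only consulted if it is installed and no charset is declared.

```python
import re


def sniff_meta_charset(raw: bytes):
    """Return the charset declared in the first 2 KB of the page, or None."""
    match = re.search(rb"charset=[\"']?([\w-]+)", raw[:2048], re.IGNORECASE)
    return match.group(1).decode("ascii") if match else None


def decode_html(raw: bytes, fallback: str = "utf-8") -> str:
    """Decode raw HTML bytes using the declared or detected encoding."""
    encoding = sniff_meta_charset(raw)
    if encoding is None:
        try:
            import chardet  # third-party and optional
            encoding = chardet.detect(raw)["encoding"]
        except ImportError:
            encoding = None
    try:
        return raw.decode(encoding or fallback, errors="replace")
    except LookupError:
        # The declared or detected name is not a codec Python knows.
        return raw.decode(fallback, errors="replace")


# Stand-in for the bytes a SHIFT_JIS page would return (not fetched live).
sample = (
    '<html><head><meta charset="Shift_JIS">'
    "<title>ソニー製品情報 | ソニー</title></head></html>"
).encode("shift_jis")

html = decode_html(sample)
title = re.search(r"<title>(.*?)</title>", html, re.DOTALL).group(1)
print(title)
```

Once the text is decoded, the resulting string can be handed to the scraper in place of the URL, so no further guessing about the encoding takes place.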
Reported by [email protected], Jan 30, 2010
What steps will reproduce the problem?
Scrape the <title> of http://www.sony.jp/
Print the result
What is the expected output? What do you see instead?
Expected result is
'ソニー製品情報 | ソニー'
Instead I get
'\j[i | \j['
What version of the product are you using? On what operating system?
Version 0.9 tested on MacOSX and Ubuntu Linux