ValueError in _substitute_entity() when substituting '#x201C'-like strings #11

Open
arshaw opened this issue Feb 9, 2011 · 1 comment

@arshaw
Owner

arshaw commented Feb 9, 2011

Reported by [email protected], Oct 29, 2010

What steps will reproduce the problem?

  1. Have _substitute_entity() handle a match where m.group(0) == '#x201C' (a hexadecimal character reference).
  2. unichr(int(ent)) (where ent == 'x201C') raises ValueError.

What is the expected output? What do you see instead?
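Expected: the reference is converted to U+201C (LEFT DOUBLE QUOTATION MARK). Instead, int('x201C') raises ValueError: int() parses base 10 by default and does not accept the 'x' prefix, whereas unichr() needs the integer 0x201C (8220).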
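A minimal Python 3 check of the failure and of the parse that is actually needed. Here Python 3's `chr()` stands in for Python 2's `unichr()`, and `ent` mirrors the value from step 2:

```python
# int() with the default base 10 rejects the 'x' prefix of a hex reference.
ent = 'x201C'
try:
    int(ent)
except ValueError as exc:
    print('ValueError:', exc)

# Dropping the 'x' and parsing in base 16 yields the intended code point.
code_point = int(ent[1:], 16)
print(hex(code_point), code_point)
print(chr(code_point) == '\u201c')
```

The last line confirms that 0x201C is the left double quotation mark the entity refers to.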

What version of the product are you using? On what operating system?
scrapemark-0.9-py2.5.egg
Python 2.6.4
Ubuntu 9.10 x64

Please provide any additional information below.

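Adding this function:

```python
def my_int(s):
    """Convert the numeric part of a character reference to an int.

    Handles decimal ('8220') and hexadecimal ('x201C' / 'X201C') forms.
    """
    # Decimal reference, e.g. &#8220;
    try:
        return int(s)
    except ValueError:
        pass
    # Hexadecimal reference, e.g. &#x201C;: drop the 'x' and parse base 16
    if s[:1] in ('x', 'X'):
        try:
            return int(s[1:], 16)
        except ValueError:
            pass
    # Unparseable: fall back to 0, as the original workaround did
    return 0
```

and replacing unichr(int(ent)) with unichr(my_int(ent)) in _substitute_entity() seems to fix the problem.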
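For comparison, here is a self-contained sketch of an entity decoder that handles decimal, hexadecimal, and named references in one place. It is written in Python 3 and is not scrapemark's code: the real _substitute_entity() is not shown in this issue, so `ENTITY_RE`, `substitute_entity`, and `decode_entities` are hypothetical names, and the regex is an assumption.

```python
import re
from html.entities import name2codepoint

# Hypothetical pattern: &#x201C; / &#8220; / &amp; (scrapemark's regex may differ)
ENTITY_RE = re.compile(r'&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);')


def substitute_entity(m):
    """Replace one matched reference; leave unknown or invalid ones untouched."""
    ent = m.group(1)
    if ent.startswith('#'):
        num = ent[1:]
        # '#x201C' / '#X201C' -> hexadecimal, '#8220' -> decimal
        if num[:1] in ('x', 'X'):
            code_point = int(num[1:], 16)
        else:
            code_point = int(num)
        try:
            return chr(code_point)
        except (ValueError, OverflowError):
            # Code point outside the Unicode range
            return m.group(0)
    if ent in name2codepoint:
        return chr(name2codepoint[ent])
    return m.group(0)


def decode_entities(text):
    return ENTITY_RE.sub(substitute_entity, text)
```

For example, `decode_entities('&#x201C;quoted&#8221;')` returns the text wrapped in curly double quotes. On Python 3.4 and later, the standard library's `html.unescape()` already decodes all three reference forms and may be preferable to a hand-rolled substitution.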

@quink

quink commented Aug 7, 2011

Probably fixed in #9.
