Attribute value ignored when capturing another attribute value in the same tag #17

Open
ackalker opened this issue May 23, 2012 · 1 comment

Comments

@ackalker

First, thanks for this wonderful tool!

I have run into a problem. Running the following snippet:

import scrapemark

html = """
    <div>
    <a href="http://site.com/page1" title="Page 1">Page 1</a>
    <a href="http://site.com/page2" title="Page 2">Page 2</a>
    <a href="http://site.com/page3" title="Page 3">Page 3</a>
    <a href="http://site.com/page2" title="Next">&gt;</a>
    </div>
    """

res = scrapemark.scrape("""<a href="{{ nextpage }}" title="Next" />""", html)
print res

results in:
{'nextpage': u'http://site.com/page1'}
which is simply the first link, not the link to the next page as I would expect.
Capturing to a list with:

res = scrapemark.scrape("""{* <a href="{{ [nextpage] }}" title="Next" /> *}""", html)

returns:
{'nextpage': [u'http://site.com/page1', u'http://site.com/page2', u'http://site.com/page3', u'http://site.com/page2']}
i.e. a list of all links, not a list with just the link to the next page.

It appears that scrapemark ignores the literal value of the title attribute in the pattern whenever the same tag also contains a capture for the href attribute.
Am I doing something wrong here, is this simply a quirk we'll have to be aware of, or is this a bug?
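
For comparison, the result expected from the pattern above (only the href of the link whose title is "Next") can be reproduced with the standard library alone. This is a minimal sketch of a possible workaround, not scrapemark's own API: it assumes Python 3 and `html.parser` (the snippet above is Python 2), and the class name `NextLinkParser` is hypothetical.

```python
from html.parser import HTMLParser


class NextLinkParser(HTMLParser):
    """Hypothetical helper: collect the href of every <a> tag whose
    title attribute equals the requested value."""

    def __init__(self, title):
        super().__init__()
        self.title = title
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        # Require BOTH conditions: the title matches and an href is present.
        if attr_map.get("title") == self.title and href is not None:
            self.hrefs.append(href)


html = """
    <div>
    <a href="http://site.com/page1" title="Page 1">Page 1</a>
    <a href="http://site.com/page2" title="Page 2">Page 2</a>
    <a href="http://site.com/page3" title="Page 3">Page 3</a>
    <a href="http://site.com/page2" title="Next">&gt;</a>
    </div>
    """

parser = NextLinkParser("Next")
parser.feed(html)
print(parser.hrefs)  # ['http://site.com/page2']
```

Unlike the scrapemark result shown above, the list contains a single entry, because the title check is applied before the href is recorded.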

@ackalker

Pull request #15 from quink seems to resolve this issue.
