Attribute value ignored when capturing another attribute value in the same tag #17

ackalker · 2012-05-23T19:44:25Z

First, thanks for this wonderful tool!

I have the following problem. This snippet:

import scrapemark

html = """
    <div>
    <a href="http://site.com/page1" title="Page 1">Page 1</a>
    <a href="http://site.com/page2" title="Page 2">Page 2</a>
    <a href="http://site.com/page3" title="Page 3">Page 3</a>
    <a href="http://site.com/page2" title="Next">&gt;</a>
    </div>
    """

res = scrapemark.scrape("""<a href="{{ nextpage }}" title="Next" />""", html)
print res

results in:
{'nextpage': u'http://site.com/page1'}
which is simply the first link, not the link to the next page as I would expect.
Capturing to a list with:

res = scrapemark.scrape("""{* <a href="{{ [nextpage] }}" title="Next" /> *}""", html)

returns:
{'nextpage': [u'http://site.com/page1', u'http://site.com/page2', u'http://site.com/page3', u'http://site.com/page2']}
i.e. a list of all links, not a list with just the link to the next page.

It appears that scrapemark ignores the literal value of the title attribute whenever it captures the href attribute's value from the same tag.
Am I doing something wrong here, is this simply a quirk we'll have to be aware of, or is this a bug?
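
For comparison, here is a minimal sketch of the result I expected, written with Python 3's standard-library html.parser instead of scrapemark (so it uses print() as a function, unlike the Python 2 snippet above). The names NextLinkFinder and find_next_links are hypothetical, made up for this illustration, and the sketch makes no claim about how scrapemark matches attributes internally. It only keeps an href when the same tag also has title="Next".

```python
from html.parser import HTMLParser

# Same markup as in the report above.
HTML = """
<div>
<a href="http://site.com/page1" title="Page 1">Page 1</a>
<a href="http://site.com/page2" title="Page 2">Page 2</a>
<a href="http://site.com/page3" title="Page 3">Page 3</a>
<a href="http://site.com/page2" title="Next">&gt;</a>
</div>
"""


class NextLinkFinder(HTMLParser):
    """Hypothetical helper: collect href values of <a> tags titled "Next"."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs for this one tag.
        attr_map = dict(attrs)
        # Both conditions must hold on the same tag: the literal title
        # acts as a filter, the href is the captured value.
        if tag == "a" and attr_map.get("title") == "Next" and "href" in attr_map:
            self.hrefs.append(attr_map["href"])


def find_next_links(html):
    """Return the hrefs of all links whose title attribute is exactly "Next"."""
    finder = NextLinkFinder()
    finder.feed(html)
    return finder.hrefs


print(find_next_links(HTML))  # ['http://site.com/page2']
```

This is the behaviour I would expect from both patterns above: a single match for the plain capture, and a one-element list for the list capture.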


ackalker · 2012-05-23T20:16:17Z

Pull request #15 from quink seems to resolve this issue.
