Crawler project: fix errors caused by the aiohttp update that replaced URL strings with URL objects #259

Open
wants to merge 1 commit into
base: master
from


gasxia commented May 9, 2017

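The Crawler project was written for aiohttp 0.21, but the current aiohttp release is 2.0.7.
Since aiohttp 1.1, the library has used yarl for URL processing, so response.url is now a yarl URL object instead of a string.
As a result, code that still treats it as a str fails wherever the URL is handled as a string.

The code therefore has to be changed to work with the new aiohttp version.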
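
A minimal, stdlib-only sketch of the failure. StandInURL is a hypothetical class (not part of aiohttp or yarl) that mimics the one property that matters here: a yarl URL is not a str, even though str() of it yields the URL text. urllib.parse.urljoin() refuses to mix str and non-str arguments, which is presumably the kind of error the crawler hits after the upgrade.

```python
import urllib.parse


class StandInURL:
    """Hypothetical stand-in for yarl.URL: not a str subclass, but str() yields the URL."""

    def __init__(self, value: str) -> None:
        self._value = value

    def __str__(self) -> str:
        return self._value


response_url = StandInURL("http://example.com/docs/index.html")

# Old code path: urljoin() rejects a non-str base combined with a str link.
try:
    urllib.parse.urljoin(response_url, "page.html")
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Converting the URL object to str first restores the behaviour the code expected.
print(urllib.parse.urljoin(str(response_url), "page.html"))
```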

Use URL.human_repr() to log a human-readable string.
Convert response.url to str before passing it to urllib.parse.urljoin().
crawling.py, line 146:

                if urls:
                    LOGGER.info('got %r distinct urls from %r',
                                len(urls), response.url.human_repr())
                for url in urls:
                    normalized = urllib.parse.urljoin(str(response.url), url)
                    defragmented, frag = urllib.parse.urldefrag(normalized)
                    if self.url_allowed(defragmented):
                        links.add(defragmented)
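
The patched loop can be pulled out into a standalone function to test it in isolation. This is a sketch, not the crawler's actual code: resolve_links and its url_allowed parameter are hypothetical names standing in for the loop body and self.url_allowed. The base URL may be a str or a URL object, because it is passed through str() once before any urllib.parse call.

```python
import urllib.parse


def resolve_links(base_url, hrefs, url_allowed=lambda url: True):
    """Resolve hrefs against base_url, strip fragments, and keep allowed URLs.

    base_url may be a str or a URL object such as yarl.URL: it is converted
    with str() once, so urllib.parse only ever receives str arguments.
    url_allowed is a hypothetical stand-in for the crawler's self.url_allowed.
    """
    base = str(base_url)
    links = set()
    for href in hrefs:
        normalized = urllib.parse.urljoin(base, href)
        defragmented, _frag = urllib.parse.urldefrag(normalized)
        if url_allowed(defragmented):
            links.add(defragmented)
    return links


print(sorted(resolve_links("http://example.com/a/index.html",
                           ["b.html#top", "/c", "b.html"])))
```

Converting the base once outside the loop also avoids calling str() on the same URL for every extracted link.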

Convert _stat.url to str so that the sort key is always a string.
reporting.py, line 32:

    show.sort(key=lambda _stat: str(_stat.url))
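
A hedged illustration of why the sort key may need str(): if the statistics list mixes plain strings with URL objects (for example, some entries recorded from a requested URL string and others from response.url; this is an assumption, not something stated above), comparing the raw keys raises TypeError. StandInURL and the two-field Stat tuple are hypothetical stand-ins, not the crawler's real types.

```python
from collections import namedtuple


class StandInURL:
    """Hypothetical stand-in for yarl.URL: not a str, but str() yields the URL."""

    def __init__(self, value):
        self._value = value

    def __str__(self):
        return self._value


# Hypothetical minimal record; the crawler's real statistics type has more fields.
Stat = namedtuple("Stat", ["url", "status"])

show = [
    Stat(StandInURL("http://example.com/b"), 200),
    Stat("http://example.com/a", 404),
]

try:
    show.sort(key=lambda _stat: _stat.url)  # compares a str with a non-str object
    mixed_sort_failed = False
except TypeError:
    mixed_sort_failed = True

# The patched key: every key is a str, so the comparison is always defined.
show.sort(key=lambda _stat: str(_stat.url))
print([str(s.url) for s in show])
```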
