Feature request - crawl sitemap.xml #124

Open
Mark-Hetherington opened this issue Mar 10, 2021 · 4 comments

Comments

@Mark-Hetherington

We are finding that sitediff works quite well; however, it may not find every URL on a site by following links from the home page. We would like the option to add the URLs listed in sitemap.xml to the paths.

@cleaver
Contributor

cleaver commented Mar 10, 2021

This is a good idea. We haven't been actively updating the project lately, but let's keep this one open.

@dergachev
Contributor

dergachev commented Mar 13, 2021 via email

@cleaver
Contributor

cleaver commented Mar 13, 2021

Yes, you could manually reformat the paths from sitemap.xml into paths.txt.

@Mark-Hetherington
Author

Yes, adding to paths.txt by parsing a JSON sitemap is the approach we have been taking. However, it is currently only semi-automated, and I would presume that a good automation using XML instead would be useful to other people as well. :)
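
For anyone wanting to script the workaround discussed above, here is a minimal sketch in Python of turning a sitemap.xml into a list of paths. It is not part of sitediff. The function name `sitemap_to_paths` is hypothetical, and it assumes a standard sitemaps.org `<urlset>` document (not a sitemap index) and that paths.txt takes one site-relative path per line.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_to_paths(xml_text):
    """Return the unique site-relative paths listed in a sitemap, in order.

    Hypothetical helper: assumes a <urlset> sitemap, not a sitemap index.
    """
    root = ET.fromstring(xml_text)
    paths = []
    seen = set()
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        if not loc.text:
            continue
        parts = urlsplit(loc.text.strip())
        # Drop scheme and host; keep the path and any query string.
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        if path not in seen:
            seen.add(path)
            paths.append(path)
    return paths
```

The returned list could then be written out with something like `"\n".join(paths)`, assuming paths.txt expects one path per line.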


3 participants