Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Submit post URLs to Archive.org and store archived URL when crawling #15

Open
wackget opened this issue Jun 10, 2023 · 2 comments
Open
Labels
feature request New feature or request

Comments

@wackget
Copy link

wackget commented Jun 10, 2023

It would be great if the script could submit post URLs to archive.org and then store the archived URL as part of the exported data.

To be clear, I don't mean archiving the comment-level URL but rather the top-level post URL itself, with everybody's comments visible.

It would be one possible way of saving a full post's worth of context without excessive reddit API queries.

@brandongalbraith
Copy link

brandongalbraith commented Jun 11, 2023

Recommend pulling in https://github.com/akamhy/waybackpy for polling CDX of existing captures as well as for initiating archive ops for this use case.

@xavdid xavdid added the feature request New feature or request label Jun 12, 2023
@xavdid
Copy link
Owner

xavdid commented Jun 12, 2023

I'll look into this as an option. No promises either way, especially with so many subs shut down right now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants