Wayback

Overview

wayback is a simple Python package built on waybackpy that further simplifies collecting historical snapshot data for URLs from the Internet Archive's (IA) Wayback Machine. It is particularly useful when you want to save all of the IA's snapshot links (e.g., https://web.archive.org/web/<timestamp>/https://<url_of_interest>) for a set of URLs, so that you can sort, filter, and examine them without searching and clicking around the Wayback Machine's interface.
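Because snapshot links follow a predictable pattern, they can also be built or taken apart by hand. The sketch below assumes 14-digit `YYYYMMDDhhmmss` timestamps like those in the example output further down. `build_archive_url` and `split_archive_url` are hypothetical helpers for illustration, not part of the wayback package.

```python
ARCHIVE_PREFIX = "https://web.archive.org/web/"


def build_archive_url(timestamp: str, original_url: str) -> str:
    """Build a snapshot link (hypothetical helper; assumes a 14-digit timestamp)."""
    if len(timestamp) != 14 or not timestamp.isdigit():
        raise ValueError("timestamp must be 14 digits: YYYYMMDDhhmmss")
    return f"{ARCHIVE_PREFIX}{timestamp}/{original_url}"


def split_archive_url(archive_url: str) -> tuple[str, str]:
    """Split a snapshot link back into (timestamp, original_url)."""
    if not archive_url.startswith(ARCHIVE_PREFIX):
        raise ValueError("not a web.archive.org snapshot link")
    # Everything before the first "/" after the prefix is the timestamp;
    # the rest (which may itself contain "/") is the archived URL.
    timestamp, _, original_url = archive_url[len(ARCHIVE_PREFIX):].partition("/")
    return timestamp, original_url


link = build_archive_url("20180129175333", "http://lazerlab.net:80/")
print(link)
print(split_archive_url(link))
```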

Installation

pip install git+https://github.com/gitronald/wayback

Example Usage

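import wayback

# Collect every snapshot of lazerlab.net captured between 2018-01-01 and 2023-12-31
# (timestamps are passed as Wayback-style YYYYMMDD strings)
data = wayback.get_url_snapshots(
    url="lazerlab.net",
    start_timestamp="20180101",
    end_timestamp="20231231",
)
print(f"snapshots found: {len(data):,}")

# Preview a few columns; pandas' DataFrame.to_markdown() requires the optional `tabulate` package
show_cols = ['url', 'statuscode', 'archive_url', 'timestamp']
print(data[show_cols].head(5).to_markdown())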

Output:

processing lazerlab.net
extracting items
reshaping items
snapshots found: 207

| url          | statuscode | archive_url                                                            | timestamp      |
|--------------|------------|------------------------------------------------------------------------|----------------|
| lazerlab.net | 200        | https://web.archive.org/web/20180129175333/http://lazerlab.net:80/     | 20180129175333 |
| lazerlab.net | 200        | https://web.archive.org/web/20180203075259/http://www.lazerlab.net:80/ | 20180203075259 |
| lazerlab.net | 200        | https://web.archive.org/web/20180301214059/http://lazerlab.net:80/     | 20180301214059 |
| lazerlab.net | 200        | https://web.archive.org/web/20180306094604/http://www.lazerlab.net:80/ | 20180306094604 |
| lazerlab.net | 200        | https://web.archive.org/web/20180314153547/http://lazerlab.net/        | 20180314153547 |
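Once the snapshots are in a DataFrame, the sort/filter step is ordinary pandas. The sketch below rebuilds the five rows shown above by hand (in practice `data` would come from `wayback.get_url_snapshots`) and keeps the first successful capture per calendar month. The column names match the example output; the exact dtypes the package returns are an assumption, so `statuscode` and `timestamp` are coerced with `astype(str)`. `monthly_snapshots` is a hypothetical helper.

```python
import pandas as pd

# Rows copied from the example output above, for illustration only.
data = pd.DataFrame({
    "url": ["lazerlab.net"] * 5,
    "statuscode": ["200"] * 5,
    "archive_url": [
        "https://web.archive.org/web/20180129175333/http://lazerlab.net:80/",
        "https://web.archive.org/web/20180203075259/http://www.lazerlab.net:80/",
        "https://web.archive.org/web/20180301214059/http://lazerlab.net:80/",
        "https://web.archive.org/web/20180306094604/http://www.lazerlab.net:80/",
        "https://web.archive.org/web/20180314153547/http://lazerlab.net/",
    ],
    "timestamp": ["20180129175333", "20180203075259", "20180301214059",
                  "20180306094604", "20180314153547"],
})


def monthly_snapshots(df: pd.DataFrame) -> pd.DataFrame:
    """Keep HTTP 200 captures and return the earliest one in each month."""
    # astype(str) makes the filter work whether statuscode is stored as str or int
    ok = df[df["statuscode"].astype(str) == "200"].copy()
    ok["captured_at"] = pd.to_datetime(ok["timestamp"].astype(str), format="%Y%m%d%H%M%S")
    ok = ok.sort_values("captured_at")
    ok["month"] = ok["captured_at"].dt.to_period("M")
    return ok.drop_duplicates(subset="month", keep="first").reset_index(drop=True)


monthly = monthly_snapshots(data)
print(monthly[["month", "archive_url"]])
```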

Dependencies

waybackpy: For interacting with the Wayback Machine API.
requests: For making HTTP requests.
hashlib (Python standard library; no separate installation needed): For generating hash values from URLs.
pandas: For data manipulation and analysis.
tqdm: For providing progress bars during data collection.
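For readers unfamiliar with hashlib, the sketch below shows one common way to turn a URL into a fixed-length hex key, for example to deduplicate URLs or name cache files. Which hash algorithm wayback actually uses, and for what, is not stated here, so `url_hash` and its SHA-256 default are assumptions for illustration.

```python
import hashlib


def url_hash(url: str, algorithm: str = "sha256") -> str:
    """Return a hex digest of a URL string (hypothetical helper).

    The default algorithm is an assumption, not necessarily what wayback uses.
    """
    hasher = hashlib.new(algorithm)      # e.g. "sha256" or "md5"
    hasher.update(url.encode("utf-8"))   # hash the exact URL text
    return hasher.hexdigest()


key = url_hash("https://web.archive.org/web/20180129175333/http://lazerlab.net:80/")
print(key[:12])
```

Note that the digest is over the exact string, so `lazerlab.net` and `www.lazerlab.net` hash differently; normalize URLs first if they should share a key.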

Acknowledgments

This package is primarily a wrapper for waybackpy, which provides a broader range of tools for using the Wayback Machine CDX Server API provided by the Internet Archive.
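To see what sits underneath waybackpy, the sketch below builds a raw CDX Server API query and parses its JSON response. It uses the public endpoint `https://web.archive.org/cdx/search/cdx` with the `url`, `from`, `to`, `output=json`, and `fl` (field list) parameters; with `output=json` the first row of the response holds the field names. The helper names are hypothetical, and the sample response in the usage line is a single row taken from the example output above, not a real API capture.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url, start_timestamp, end_timestamp,
                    fields=("timestamp", "original", "statuscode")):
    """Build a CDX query URL returning JSON rows with only the requested fields."""
    params = {
        "url": url,
        "from": start_timestamp,
        "to": end_timestamp,
        "output": "json",
        "fl": ",".join(fields),
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(text):
    """Turn a CDX JSON response (header row + data rows) into a list of dicts."""
    rows = json.loads(text) if text.strip() else []
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


query = build_cdx_query("lazerlab.net", "20180101", "20231231")
sample = '[["timestamp","original","statuscode"],["20180129175333","http://lazerlab.net:80/","200"]]'
print(query)
print(parse_cdx_json(sample))
```

With network access, fetching would then be `requests.get(query, timeout=30)` followed by `parse_cdx_json(response.text)`; waybackpy handles this, plus pagination and retries, for you.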
