Substack Archiver

Substack Archiver is a Python program that allows users to archive the contents of their Substack newsletter. The program saves all the contents in separate HTML files and also creates multiple JSON dumps. It uses Playwright to interact with the Substack API, making it more resilient to changes in the website's structure.

Features

Archives Substack posts without requiring login
Bypasses captcha issues
Saves posts as individual HTML files
Creates JSON dumps of post data
Includes links to audio content (no direct download)
Works with both Substack.com and custom domain newsletters

Getting Started

To use Substack Archiver, follow these steps:

Clone this repository
Install the requirements:

pip install -r requirements.txt
playwright install

Update config.env accordingly:

SUBSTACK_USERNAME=""
SUBSTACK_PASSWORD=""

Run the Substack Archiver:

python substack_archiver.py "substack_name" "url_of_substack"

Example:

python substack_archiver.py "example" "https://example.substack.com/"

or for custom domains:

python substack_archiver.py "example" "https://newsletter.example.com/"

Output

The program will save each Substack post in a separate HTML file and also create multiple JSON dumps. The HTML files will contain the articles, images, and links to audio content from your Substack newsletter. The JSON dumps will contain metadata such as the title, date, author, and tags for each article.

Note: The HTML is raw, without any CSS or JS. It provides a basic structure for reading the content. You can use the Firefox Reader View or Chrome's Reader Mode to get a prettier view of the web content.

TODO

Add support for downloading audio files
Add support for video files
Implement CSS styling for HTML output

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
.github/workflows		.github/workflows
README/images		README/images
.gitignore		.gitignore
LICENSE.md		LICENSE.md
README.md		README.md
colors.py		colors.py
config.env		config.env
requirements.txt		requirements.txt
substack_archiver.py		substack_archiver.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Substack Archiver

Features

Getting Started

Output

TODO

Contributing

License

About

Releases

Packages

Languages

License

pwrtux/substack_archiver

Folders and files

Latest commit

History

Repository files navigation

Substack Archiver

Features

Getting Started

Output

TODO

Contributing

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages