This project aims to automatically add chapter markers to long-form audio podcast episodes. It uses statistical natural language processing algorithms that try to subdivide transcribed podcast episodes into topically cohesive parts.
[work in progress]
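To illustrate the general idea of splitting a transcript into topically cohesive parts, here is a minimal sketch of a lexical-cohesion approach (loosely in the spirit of TextTiling). It is not necessarily the algorithm this project uses. The function names, the `window` size, and the `threshold` value are all assumptions made for the example, and stopword removal is left out for brevity.

```python
import math
import re
from collections import Counter


def bag_of_words(sentences):
    """Count lowercase word tokens across a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(re.findall(r"[a-z']+", sentence.lower()))
    return counts


def cosine(a, b):
    """Cosine similarity between two Counter word vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_boundaries(sentences, window=2, threshold=0.1):
    """Return indices i where a new chapter is assumed to start at sentences[i].

    A boundary is reported when the `window` sentences before position i
    share almost no vocabulary with the `window` sentences from i onwards.
    Both parameters are illustrative defaults, not tuned values.
    """
    boundaries = []
    for i in range(window, len(sentences) - window + 1):
        left = bag_of_words(sentences[i - window:i])
        right = bag_of_words(sentences[i:i + window])
        if cosine(left, right) < threshold:
            boundaries.append(i)
    return boundaries
```

Calling `find_boundaries(transcript_sentences)` on a list of transcribed sentences yields candidate chapter start positions, which could then be mapped back to timestamps.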
- Python 3.6+
- ffmpeg
- MP4Box
- matplotlib (install via package manager)
- Java
- Python module requirements, installable via
pip3 install -r requirements.txt
- node.js
- npm
This program can be used in the command line or as an HTTP API with a web interface.
Usage: python3 main.py [subcommand] [options] ...
- Help:
python3 main.py --help
- Subcommand help:
python3 main.py [subcommand] --help
- Possible subcommands:
python3 main.py run
: Start chapterization process from podcast RSS feed URL
python3 main.py transcribe
: Transcribe podcast episode from RSS feed URL
python3 main.py chapterize
: Chapterize transcript
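For orientation, a subcommand layout like the one above can be built with Python's `argparse`. The following is only a sketch of how such a CLI might be wired up, not the actual contents of main.py. The positional argument names `feed_url` and `transcript_path` are hypothetical; run the `--help` commands above to see the real options.

```python
import argparse


def build_parser():
    """Build a parser with run / transcribe / chapterize subcommands."""
    parser = argparse.ArgumentParser(prog="main.py")
    subparsers = parser.add_subparsers(dest="subcommand")
    # Set as an attribute so this also works on Python 3.6.
    subparsers.required = True

    run = subparsers.add_parser(
        "run", help="Start chapterization process from podcast RSS feed URL")
    run.add_argument("feed_url")  # hypothetical argument name

    transcribe = subparsers.add_parser(
        "transcribe", help="Transcribe podcast episode from RSS feed URL")
    transcribe.add_argument("feed_url")  # hypothetical argument name

    chapterize = subparsers.add_parser(
        "chapterize", help="Chapterize transcript")
    chapterize.add_argument("transcript_path")  # hypothetical argument name

    return parser
```

With this layout, `build_parser().parse_args()` returns a namespace whose `subcommand` attribute tells the program which step to dispatch to.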
API:
- Create python3 venv:
python3 -m venv venv
- Activate venv:
source venv/bin/activate
- (optional) Set environment variables for the IP address and port in the '.flaskenv' file
- start API server with
flask run
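As an example of the optional step above, a '.flaskenv' file might look like the following. `FLASK_RUN_HOST` and `FLASK_RUN_PORT` are the standard variables that `flask run` reads for its bind address and port. The values shown are placeholders, and Flask only loads this file when the python-dotenv package is installed, which is assumed here to be covered by requirements.txt.

```
# Placeholder values; adjust to your environment
FLASK_RUN_HOST=0.0.0.0
FLASK_RUN_PORT=5000
```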
Frontend:
- Serve the frontend files in 'web/client/dist' from a web server
If the API server is not running on the same machine as the frontend:
- specify API host in web/client/.env
- cd into 'web/client'
- install dependencies with
npm install
- build files with
npm run build
- serve the built files from 'web/client/dist' on a web server