Extract text from a document (textract) and convert it into natural-sounding synthesised speech (Cloud Text-to-Speech), which can use DeepMind's WaveNet models.
Available source formats (from textract)
- .csv
- .doc
- .docx
- .eml
- .epub
- .gif
- .jpg and .jpeg
- .json
- .html and .htm
- .mp3
- .msg
- .odt
- .ogg
- .png
- .pptx
- .ps
- .rtf
- .tiff
- .txt
- .wav
- .xlsx
- .xls
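Before running a conversion, it can help to check which files in the input folder have one of the extensions listed above. The following is a minimal sketch, not part of doc2audiobook itself: `is_supported` and `report_unsupported` are hypothetical helper names, and the case-insensitive comparison is an assumption about what is convenient on the host, not a statement about how the tool matches extensions.

```shell
# Hypothetical helper: exit 0 if the file name ends in one of the
# extensions from the list above (compared case-insensitively).
is_supported() {
  local name ext
  name="${1##*/}"                      # strip any leading directories
  case "$name" in
    *.*) ;;                            # has an extension, keep going
    *) return 1 ;;                     # no extension at all
  esac
  # Lowercase the part after the last dot.
  ext="$(printf '%s' "${name##*.}" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    csv|doc|docx|eml|epub|gif|jpg|jpeg|json|html|htm|mp3|msg|odt|ogg|png|pptx|ps|rtf|tiff|txt|wav|xlsx|xls)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Hypothetical helper: print every file in a directory that would be skipped.
report_unsupported() {
  local f
  for f in "$1"/*; do
    [ -e "$f" ] || continue            # empty directory: glob did not match
    is_supported "$f" || echo "unsupported: $f"
  done
}

# Example (not executed here):
#   report_unsupported /doc2audiobook/data/input
```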
GCP
- Select or create a Google Cloud Platform project.
- Enable billing for your project.
- Enable the Cloud Text-to-Speech API.
- Set up authentication using a service account.
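Apart from enabling billing, which is done in the Cloud Console, the GCP steps above can be approximated from the command line with the gcloud CLI. This is a sketch under assumptions rather than the project's documented procedure: `setup_gcp` is a hypothetical function name, and the project ID, service account name, and key path are placeholders you would choose yourself.

```shell
# Hypothetical helper: select a project, enable the Cloud Text-to-Speech API,
# create a service account, and download a JSON key for it.
# All three arguments are placeholders chosen by the caller.
setup_gcp() {
  local project_id="$1" sa_name="$2" key_path="$3"
  gcloud config set project "$project_id" &&
  gcloud services enable texttospeech.googleapis.com &&
  gcloud iam service-accounts create "$sa_name" \
    --display-name "doc2audiobook" &&
  gcloud iam service-accounts keys create "$key_path" \
    --iam-account "${sa_name}@${project_id}.iam.gserviceaccount.com"
}

# Example (not executed here; the names are made up):
#   setup_gcp my-project doc2audiobook-sa /doc2audiobook/.secrets/client_secret.json
```

The function stops at the first failing command, so a key is only requested once the service account exists.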
Host Machine
- Docker
- /doc2audiobook/data/input: directory to hold all input files.
- /doc2audiobook/data/output: directory to store all output files.
- /doc2audiobook/.secrets/client_secret.json: GCP authentication token.
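The host layout described above can be prepared with a few lines of shell. This is a minimal sketch: `init_layout` is a hypothetical helper, and the base directory defaults to /doc2audiobook only because that is the path used in this README. The helper never creates the key file; it only reports when it is missing.

```shell
# Hypothetical helper: create the input/output directories under a base path
# and check that the service account key is in place.
# Returns 0 if everything is ready, 2 if the key file is missing.
init_layout() {
  local base="${1:-/doc2audiobook}"
  mkdir -p "$base/data/input" "$base/data/output" "$base/.secrets" || return 1
  if [ ! -f "$base/.secrets/client_secret.json" ]; then
    echo "missing $base/.secrets/client_secret.json" >&2
    return 2
  fi
}

# Example (not executed here):
#   init_layout /doc2audiobook
```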
git clone git@github.com:danthelion/doc2audiobook.git
cd doc2audiobook
docker build -t doc2audiobook .
Before running, put your documents in the input subfolder of the host directory that is mapped to /data (here, /doc2audiobook/data/input).
List available voices
docker run \
-v /doc2audiobook/data:/data:rw \
-v /doc2audiobook/.secrets/client_secret.json:/.secrets/client_secret.json:ro \
doc2audiobook -list-voices
Convert all documents in the mapped input folder to audiobooks using the en-GB-Standard-C voice.
docker run \
-v /doc2audiobook/data:/data:rw \
-v /doc2audiobook/.secrets/client_secret.json:/.secrets/client_secret.json:ro \
doc2audiobook --voice en-GB-Standard-C
Convert a single document in the mapped input folder to an audiobook using the en-GB-Standard-C voice.
docker run \
-v /doc2audiobook/data:/data:rw \
-v /doc2audiobook/.secrets/client_secret.json:/.secrets/client_secret.json:ro \
doc2audiobook --voice en-GB-Standard-C --input test_input.txt
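All three invocations above share the same two volume mounts, so a small shell function can cut down on the repetition. This is only a convenience sketch: the function name and the `DOC2AUDIOBOOK_HOME` variable are hypothetical and not something the project defines. Every argument is passed through to the container unchanged.

```shell
# Hypothetical wrapper: run the doc2audiobook image with the data and
# secrets mounts used in the commands above, forwarding all arguments.
# DOC2AUDIOBOOK_HOME is an assumed variable; it defaults to /doc2audiobook.
doc2audiobook() {
  local base="${DOC2AUDIOBOOK_HOME:-/doc2audiobook}"
  docker run \
    -v "$base/data:/data:rw" \
    -v "$base/.secrets/client_secret.json:/.secrets/client_secret.json:ro" \
    doc2audiobook "$@"
}

# Examples (not executed here):
#   doc2audiobook --voice en-GB-Standard-C
#   doc2audiobook --voice en-GB-Standard-C --input test_input.txt
```

Inside the function, `doc2audiobook` after the mounts is the image name, so the wrapper does not call itself.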