An Amazon Transcribe (JSON) to Word (docx) converter for multi-speaker transcriptions
A small script that converts a multi-speaker Amazon Transcribe JSON transcript to a Word document (docx).
Given a JSON transcript, the script produces a clean Word document. Output is organised by speaker and time, with each entry followed by what that speaker said. Phrases are colored according to the transcription confidence reported by Amazon Transcribe.
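To give a rough idea of how confidence-based coloring can work, here is a minimal sketch. It is not the code in `transcribe.py`: the function names, thresholds and colors are all assumptions made for illustration. It relies only on the word-level `items` that Amazon Transcribe writes under `results`, where each item carries an `alternatives` list with `content` and a `confidence` stored as a string.

```python
# Hypothetical thresholds and colors, chosen only for this sketch.
BLACK = (0, 0, 0)
AMBER = (200, 120, 0)
RED = (200, 0, 0)


def confidence_color(confidence, high=0.9, low=0.6):
    """Map a 0..1 confidence score to an (R, G, B) tuple."""
    if confidence >= high:
        return BLACK
    if confidence >= low:
        return AMBER
    return RED


def colored_words(items):
    """Return (text, color) pairs for a list of Transcribe items."""
    words = []
    for item in items:
        best = item["alternatives"][0]
        if item.get("type") == "punctuation":
            # Punctuation has no meaningful confidence, so keep it black.
            words.append((best["content"], BLACK))
        else:
            # Transcribe stores confidence as a string, e.g. "0.98".
            score = float(best["confidence"])
            words.append((best["content"], confidence_color(score)))
    return words
```

If the document is written with python-docx (an assumption, check `requirements.txt`), each pair could then be applied to a run with `run.font.color.rgb = RGBColor(*color)`.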
I don't really plan on maintaining this, just thought it might be useful to others. Feel free to open a PR, though.
- First install the requirements:
{however you run python} -m pip install -r requirements.txt
- Update the desired input and output attributes at the bottom of the file `transcribe.py`, under `if __name__ == "__main__":`
  - The transcript JSON location
  - The output document title
  - The mapping of speakers in the file to the desired speaker name (must match the speaker IDs in the file, e.g. `spk_0`)
- Run the script with
{however you run python} transcribe.py
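Since the speaker mapping has to match the IDs in the transcript, it can help to list those IDs before editing the script. The helper below is a small sketch and not part of `transcribe.py`; the names `load_transcript` and `speaker_ids` are made up for this example. It assumes the transcript was produced with speaker identification enabled, so that `results.speaker_labels.segments` is present.

```python
import json


def load_transcript(path):
    """Read an Amazon Transcribe JSON file into a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def speaker_ids(transcript):
    """Return the distinct speaker IDs (e.g. "spk_0") in sorted order.

    Sorting is plain string sorting, so "spk_10" comes before "spk_2".
    """
    segments = transcript["results"]["speaker_labels"]["segments"]
    return sorted({segment["speaker_label"] for segment in segments})
```

For example, `speaker_ids(load_transcript("my_transcript.json"))` (with a path of your own) returns the keys you need to cover in the speaker mapping.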
- Consider adding filtering for certain phrases and words (e.g. removing "um"s)
- Consider adding conditional formatting (e.g. bold for proper nouns)
- Move from variable-based arguments to actual script arguments (e.g. with `argparse`)
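As a starting point for the last idea, here is one way the three settings could become command-line arguments. This is only a sketch: the flag names, defaults and the `ID=NAME` syntax for speakers are assumptions, and nothing like this exists in the script yet.

```python
import argparse


def parse_speaker(value):
    """Turn "spk_0=Alice" into ("spk_0", "Alice")."""
    speaker_id, sep, name = value.partition("=")
    if not sep or not speaker_id or not name:
        raise argparse.ArgumentTypeError(f"expected ID=NAME, got {value!r}")
    return speaker_id, name


def build_parser():
    parser = argparse.ArgumentParser(
        description="Convert an Amazon Transcribe JSON transcript to a Word document."
    )
    parser.add_argument("transcript", help="path to the transcript JSON")
    parser.add_argument("-o", "--output", default="transcript.docx",
                        help="path of the Word document to write")
    parser.add_argument("-t", "--title", default="Transcript",
                        help="title of the output document")
    # Repeatable flag: -s spk_0=Alice -s spk_1=Bob
    parser.add_argument("-s", "--speaker", action="append", default=[],
                        type=parse_speaker, metavar="ID=NAME",
                        help="map a speaker ID in the file to a display name")
    return parser


def parse_args(argv=None):
    """Parse arguments and collect the speaker pairs into a dict."""
    args = build_parser().parse_args(argv)
    args.speakers = dict(args.speaker)
    return args
```

With something like this in place, a run could look like `{however you run python} transcribe.py my_transcript.json -t "Team meeting" -s spk_0=Alice -s spk_1=Bob`.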