Releases · CheshireCC/faster-whisper-GUI

16 Sep 06:54

CheshireCC

0.8.1

7ef7369

0.8.1 Latest


0.8.1 (2024.7.29)

HASH

CRC32: DE11F5CC
MD5: CA6DB2876A74649166B2FFCBA9D9991E
SHA-1: CCFC485F8FEBFE9C4498939DB1CC4CA82925D10F
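
The three checksums above can be compared against a downloaded file before installing. The following is a minimal sketch using only the Python standard library; the function names `file_hashes` and `matches` are illustrative and not part of the project, and the release notes do not say which asset the hashes belong to.

```python
import hashlib
import zlib


def file_hashes(path, chunk_size=1 << 20):
    """Compute CRC32, MD5 and SHA-1 of a file, reading it in chunks."""
    crc = 0
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)  # running CRC over all chunks
            md5.update(chunk)
            sha1.update(chunk)
    return {
        "CRC32": f"{crc & 0xFFFFFFFF:08X}",
        "MD5": md5.hexdigest().upper(),
        "SHA-1": sha1.hexdigest().upper(),
    }


def matches(path, expected):
    """Return True if every expected hash (name -> hex string) matches the file."""
    actual = file_hashes(path)
    return all(actual[name] == value.upper() for name, value in expected.items())
```

To use it, pass the path of the downloaded installer and a dictionary with the values listed under HASH, for example `{"CRC32": "DE11F5CC", "MD5": "CA6DB2876A74649166B2FFCBA9D9991E"}`.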

0.8.1 Changes

- Upgraded faster-whisper to version 1.0.3
  - Silero VAD V5 model
    - In repeated testing, the Silero V5 model performed poorly on audio with complex backgrounds, so upgrade only if you need it

Tips

Completely uninstall the old version before installing the new one (the cache folder does not need to be cleared)
ffmpeg must be installed
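
Since ffmpeg has to be installed separately, it can help to check that it is reachable before starting a transcription. This is a rough sketch, not the application's own check: `find_ffmpeg` and its `extra_dir` parameter are hypothetical names, and it only assumes that the executable is called `ffmpeg` (or `ffmpeg.exe` on Windows).

```python
import shutil
from pathlib import Path


def find_ffmpeg(extra_dir=None):
    """Return the path to an ffmpeg executable, or None if none is found.

    Looks on PATH first, then in an optional extra directory
    (hypothetical: for example a folder where ffmpeg was unpacked).
    """
    found = shutil.which("ffmpeg")
    if found:
        return found
    if extra_dir is not None:
        for name in ("ffmpeg.exe", "ffmpeg"):
            candidate = Path(extra_dir) / name
            if candidate.is_file():
                return str(candidate)
    return None
```

If `find_ffmpeg()` returns `None`, ffmpeg is neither on PATH nor in the directory you passed in.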


03 Jun 14:49

CheshireCC

0.8.0

66b7450

0.8.0

HASH

CRC32: 13530B56
MD5: 61ED9AF5F712A27DA32EE34A8E88A5A0
SHA-1: B415E721D6C54D5E835FD5BD4AD870DA985644AC

0.8.0 Changes

Fixed the "bug" of having no sponsorship channel #126
Upgraded faster-whisper to version 1.0.2
- Added online mode support for the [distil-large-v3](https://huggingface.co/distil-whisper/distil-large-v3-ct2) model #130
  - The latest Distil-Whisper model, distil-large-v3, is intrinsically designed to work with the OpenAI sequential algorithm.
- Support for initializing more whisper model parameters
  - Audio segmentation settings
    - max_new_tokens: Maximum number of new tokens to generate per chunk. If not set, the maximum is set by the default max_length.
    - chunk_length: The length of audio segments. If it is not None, it overrides the default chunk_length of the FeatureExtractor.
    - clip_timestamps: Comma-separated list of start,end,start,end,... timestamps (in seconds) of the clips to process. The last end timestamp defaults to the end of the file. VAD settings (vad_filter) are ignored if clip_timestamps is used.
  - Hallucination parameters
    - hallucination_silence_threshold: When word_timestamps is True, skip silent periods longer than this threshold (in seconds) when a possible hallucination is detected.
  - Other settings
    - hotwords: Hotwords/hint phrases to provide to the model. Has no effect if prefix is not None. You can enter a hint such as "the video is about comfyUI".
  - General
    - language_detection_threshold: If the maximum probability of the language tokens is higher than this value, the language is detected.
    - language_detection_segments: Number of segments to consider for language detection.
  - Other new features: https://github.com/SYSTRAN/faster-whisper/releases/tag/v1.0.2
Fixed a bug in the copy-subtitles function
Updated some UI text
Disabled the save-parameters and load-parameters functions on the transcription parameters page
Start time, end time, and speaker columns are now centered
Upgraded pytorch to 2.3.0, CUDA 12

Tips

Completely uninstall the old version before installing the new one (the cache folder does not need to be cleared)
ffmpeg must be installed
When using the V3 model, if GPU memory overflows frequently, try updating the graphics card driver to the latest version or rolling back to the previous stable version. The current version (2024.5.29) was stable in testing.
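
To make the clip_timestamps format from the 0.8.0 changes concrete, here is a small sketch that turns a comma-separated string into (start, end) pairs, with the last end defaulting to the end of the file. It illustrates the described format only; it is not faster-whisper's own parsing code, and `parse_clip_timestamps` and `audio_duration` are names chosen for this example.

```python
def parse_clip_timestamps(spec, audio_duration):
    """Parse "start,end,start,end,..." (seconds) into a list of (start, end) pairs.

    An empty string means "process from 0", and a missing final end
    falls back to the duration of the audio.
    """
    values = [float(part) for part in spec.split(",") if part.strip()]
    if not values:
        values = [0.0]
    if len(values) % 2 == 1:
        # Odd number of values: the last clip runs to the end of the file.
        values.append(float(audio_duration))
    return list(zip(values[0::2], values[1::2]))


clips = parse_clip_timestamps("0,10,20", audio_duration=60.0)
print(clips)  # [(0.0, 10.0), (20.0, 60.0)]
```

In the GUI the same string is what you would type into the clip_timestamps field; only the listed clips are transcribed, and VAD settings are ignored.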


17 Apr 04:24

CheshireCC

0.7.6

70fe950

0.7.6

0.7.6 Changes

Fixed a crash after transcription finished #111
Fixed a bug where the table could not be closed after manually adding multiple subtitles
Demucs: added merged output of the non-vocal tracks #110
- Added two-stem output (vocals and other tracks)
Updated subtitle display and editing
- Subtitle editing: added batch shifting of timestamps (advance or delay)
- Subtitle table display:
  - Added background-color highlighting for timestamps whose duration is too short
  - Speakers are shown in their own column in the table

Hash

SHA-1 : 37C8C46BE3D297AD06FA4C887A69E2FB46CB49AB
MD5 : 09681381F2AF06749BB70030A411DFB6
CRC32 : D6FA10B2

Tips

Completely uninstall the old version before installing the new one (the cache folder does not need to be cleared)
ffmpeg must be installed
When using the V3 model, if GPU memory overflows, try turning off word-level timestamps. If it still overflows, change the quantization method to float16 or int8.
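
The batch timestamp shift and the short-duration highlight added in 0.7.6 can be pictured with a short sketch. This is an assumption about how such features might work, not the application's code: the `Subtitle` class, the clamping at zero, and the 0.5 second threshold are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Subtitle:
    start: float  # seconds
    end: float    # seconds
    text: str


def shift(subtitles, offset):
    """Advance (negative offset) or delay (positive offset) every timestamp.

    Times are clamped at 0 so nothing starts before the beginning of the audio.
    """
    return [
        replace(s, start=max(0.0, s.start + offset), end=max(0.0, s.end + offset))
        for s in subtitles
    ]


def too_short(subtitles, min_duration=0.5):
    """Return the indices of subtitles shorter than min_duration seconds."""
    return [i for i, s in enumerate(subtitles) if s.end - s.start < min_duration]
```

A table view could then color the rows returned by `too_short` to draw attention to them.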


29 Mar 05:08

CheshireCC

0.7.2

f03e95d

0.7.2

0.7.2 Changes

Fixed incomplete interface translation #106
Fixed a logic bug when adding tables
Fixed a bug where whisperX could not process files in batches
Reduced the installer size

Tips

Completely uninstall the old version before installing the new one (the cache folder does not need to be cleared)
ffmpeg must be installed
The software may crash intermittently after transcription finishes. If that happens, turn off the automatic jump on completion, wait a moment after transcription ends, and then click to jump to the results page.


19 Mar 18:23

CheshireCC

0.7.0

c7dee02

0.7.0

0.7.0 Changes

JSON subtitle support
- Subtitles and word-level timestamps can now be saved as JSON
ASS format support
- Subtitles can be output as ASS files, following the SSA v4.00+ standard
Reading JSON subtitle files
- JSON is now the preferred format for automatic reading
- Subtitles and word-level timestamps can be read from JSON subtitle files
Fixed a bug where closing a tab did not delete its table
Fixed some bugs in SMI subtitle output

Tips

Completely uninstall the old version before installing the new one (the cache folder does not need to be cleared)
ffmpeg must be installed
The software may crash intermittently after transcription finishes. If that happens, turn off the automatic jump on completion, wait a moment after transcription ends, and then click to jump to the results page.
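
As an illustration of the JSON subtitle support in 0.7.0, the sketch below saves and reloads subtitles together with word-level timestamps. The release notes do not document the actual file layout, so the keys used here (`start`, `end`, `text`, `speaker`, `words`) are purely an assumed schema, and `save_subtitles` / `load_subtitles` are hypothetical helper names.

```python
import json


def save_subtitles(path, segments):
    """Write segments (a list of dicts) to a UTF-8 JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(segments, fh, ensure_ascii=False, indent=2)


def load_subtitles(path):
    """Read segments back from a JSON file written by save_subtitles."""
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


# Assumed structure: one dict per subtitle line, with nested word timings.
example = [
    {
        "start": 0.0,
        "end": 1.5,
        "text": "hello world",
        "speaker": "SPEAKER_00",
        "words": [
            {"start": 0.0, "end": 0.6, "word": "hello"},
            {"start": 0.7, "end": 1.5, "word": "world"},
        ],
    }
]
```

Keeping the word timings nested under each line is one way to let an editor restore both levels of timestamps from a single file.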


07 Mar 09:31

CheshireCC

0.6.7

dc356cf

0.6.7

0.6.7 Changes

Added aggregation of subtitle content by speaker #82
- Added the related settings to the parameters page
- When outputting subtitles in txt format, the content spoken by the same speaker can be grouped together in order
Data annotation function #78
- Outputs annotation information to a csv file in the format vocal_path,speaker_name,language,text
Fixed a bug caused by spaces in file paths #71
- Temporary fix for paths containing spaces having their directory forcibly stripped
Fixed whisperX parameter bugs
- Fixed the min_speakers and max_speakers parameters being set incorrectly
Further improved subtitle editing
- Added a right-click menu action to change the speaker in batches
- Added a right-click menu action to merge subtitle lines
Simplified/Traditional Chinese conversion #77
- Added Simplified Chinese (zhs-Simplified Chinese) and Traditional Chinese (zht-Traditional Chinese) options to the Language parameter
- Simplified/Traditional conversion is applied automatically after transcription finishes
- Simplified/Traditional conversion is applied automatically when opening an existing subtitle file
Fixed cell column width bugs
- Corrected the column width logic
- Corrected adaptive column width

Tips

Completely uninstall the old version before installing the new one (the cache folder does not need to be cleared)
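
The speaker aggregation for txt output described in the 0.6.7 changes could look roughly like the sketch below. It assumes that "in order" means consecutive lines by the same speaker are merged while the speaking order is preserved; the real feature may group differently, and the function names and the `speaker: text` line layout are invented for this example.

```python
from itertools import groupby


def aggregate_by_speaker(segments):
    """Merge consecutive (speaker, text) pairs that share the same speaker."""
    merged = []
    for speaker, group in groupby(segments, key=lambda seg: seg[0]):
        merged.append((speaker, " ".join(text for _, text in group)))
    return merged


def to_txt(merged):
    """Render one line per speaker turn."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in merged)


segments = [
    ("SPEAKER_00", "Hello."),
    ("SPEAKER_00", "How are you?"),
    ("SPEAKER_01", "Fine, thanks."),
    ("SPEAKER_00", "Good."),
]
print(to_txt(aggregate_by_speaker(segments)))
```

With this grouping a speaker who talks again later gets a new line, so the transcript still reads as a conversation.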


03 Mar 16:37

CheshireCC

0.6.0

c088113

0.6.0

0.6.0 Changes

Added a Cantonese model for whisperX
- In the Whisper model parameters, select yue for "Language" to transcribe Cantonese. The output is spoken Cantonese, not written Chinese.
- whisperX can now align timestamps in Cantonese mode
Fixed recognition failure for audio/video files that contain subtitle tracks #90
Added support for distil models
- Upgraded the faster-whisper backend to version 1.0.1
- Upgraded CTranslate2 to version 4.0.0
Upgraded the pytorch and CUDA engines
- Upgraded pytorch to 2.2.1
- Upgraded the CUDA engine to support 12.1
Updated the automatic closing logic for pop-ups
- The "success" pop-up now closes automatically after 5 seconds

Tips

distil models currently support English output only


11 Jan 10:20

CheshireCC

0.5.7

1ccd169

0.5.7

0.5.7 Changes

Fixed a bug where the current transcription result was not updated when a table was closed
File list updates #66
- Added reading and pasting file names from the clipboard into the file list
- Added one-click clearing of the file list
- Improved the logic for removing files when multiple entries are selected
- Drag and drop now supports folders
- Drag and drop now recurses into subfolders
Added manual export and import of the configuration
Added scrolling to the settings page
Fixed a problem on repeated transcription where files with the same name but different paths caused tables to be overwritten and additions to fail #61
Fixed online download of the V3 model
- Upgraded faster-whisper to 0.10.0
Fixed a bug where word-level timestamps used too much GPU memory, causing slowdowns or even crashes
- CTranslate2 has been upgraded to the latest version. If the problem persists, upgrade the graphics card driver.
Added the ability to change the theme color
Fixed, again, a bug where the audio stream of some audio/video files could not be recognized

Tips

If manually unloading the whisper model fails or the software crashes, set the temperature parameter to a single 0 and the number of temperature candidates to 1.
The window may crash when there are many transcription results. Turning off the automatic jump function is recommended.
Because I often forget to bundle ffmpeg when building the installer (sad), future installers may no longer include ffmpeg. Please install ffmpeg yourself. This release provides ffmpeg.7z as a separate download: extract it into any directory and add that directory to the environment variables, or place it in the software installation directory.
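
The folder drag and drop with subfolder recursion added in 0.5.7 amounts to expanding dropped paths into a flat file list. The sketch below shows one way to do that with `pathlib`; it is not the application's implementation, and the `MEDIA_SUFFIXES` set and the `expand_dropped_paths` name are assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical set of extensions to accept; the real application may differ.
MEDIA_SUFFIXES = {".mp3", ".wav", ".flac", ".m4a", ".mp4", ".mkv"}


def expand_dropped_paths(paths, recursive=True):
    """Turn dropped files and folders into an ordered list of media files."""
    files = []
    for raw in paths:
        p = Path(raw)
        if p.is_dir():
            entries = p.rglob("*") if recursive else p.glob("*")
            files.extend(
                sorted(
                    f for f in entries
                    if f.is_file() and f.suffix.lower() in MEDIA_SUFFIXES
                )
            )
        elif p.is_file():
            files.append(p)
    # Drop duplicates while keeping the first occurrence of each path.
    unique = []
    seen = set()
    for f in files:
        if f not in seen:
            seen.add(f)
            unique.append(f)
    return unique
```

Files dropped directly are kept regardless of extension here, on the assumption that the user chose them deliberately.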


01 Jan 11:45

CheshireCC

0.5.4

4528607

0.5.4

0.5.4 Changes

Added audio segment output based on speaker and subtitle timestamps (#54)
- Splits the audio into multiple segments according to subtitle timestamps and speaker, and outputs them
Upgraded subtitle display and editing
- The subtitle table now shows timestamps in hh:mm:ss form
- Improved subtitle timestamp editing
- Subtitle timestamp editing now supports word-level timestamps
- Changing a speaker automatically updates all entries with the same speaker
Fixed a bug where files stayed locked after being added to the file list
Fixed a bug where manually added existing subtitle files could not be modified or saved
Fixed a bug where table display states did not follow changes to the configuration file
Fixed a bug where some audio/video files could not be read (#55)
- Fixed audio/video files possibly failing to load when the file tags contain special characters

Tips

If manually unloading the whisper model fails or the software crashes, set the temperature parameter to a single 0 and the number of temperature candidates to 1.
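
The hh:mm:ss display introduced in 0.5.4 is a conversion between seconds and a clock-style string. The sketch below shows both directions; the millisecond suffix is an assumption (the release notes only say hh:mm:ss), and `format_timestamp` and `parse_timestamp` are example names rather than functions from the project.

```python
def format_timestamp(seconds):
    """Format a time in seconds as hh:mm:ss.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def parse_timestamp(text):
    """Parse hh:mm:ss or hh:mm:ss.mmm back into seconds."""
    hours, minutes, secs = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(secs)


print(format_timestamp(3661.5))         # 01:01:01.500
print(parse_timestamp("01:01:01.500"))  # 3661.5
```

Working in integer milliseconds before splitting avoids floating-point artifacts such as a seconds field of 60.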


19 Nov 11:08

CheshireCC

0.5.0_p1

1b5d4d0

0.5.0 patch

0.5.0 Emergency Fix

Fixed a bug where only English could be output
- Download 0.5.0 patch.7z and extract it, then copy the extracted files and folders into the installation directory of version 0.5.0, replacing the original files

