Add fastapi script #191

Merged
merged 270 commits into from
Dec 16, 2024
Changes from 66 commits
57816aa
add logger functions
jhj0517 Jul 6, 2024
7487339
Rename file
jhj0517 Oct 27, 2024
e488079
Rename model
jhj0517 Oct 27, 2024
db6e3b8
Rename model
jhj0517 Oct 27, 2024
4ecca5a
Rename model
jhj0517 Oct 27, 2024
b1afb93
Refactor dataclasses
jhj0517 Oct 27, 2024
819d618
Add `as_list()` to use in gradio function
jhj0517 Oct 27, 2024
29db2c8
Add `from_list()` to use in gradio function
jhj0517 Oct 27, 2024
5096321
Remove meaningless line
jhj0517 Oct 27, 2024
183da7d
Add missing parameter
jhj0517 Oct 27, 2024
a6b915f
Fix to_list error
jhj0517 Oct 27, 2024
7de504c
Update model usage
jhj0517 Oct 27, 2024
0506e4a
Fix class method attribute access
jhj0517 Oct 28, 2024
3bc4502
Update comment
jhj0517 Oct 28, 2024
9dd4fea
Refactor to gradio functions
jhj0517 Oct 28, 2024
e91bc87
Rename function and variable
jhj0517 Oct 28, 2024
7f33eaf
Remove meaningless info
jhj0517 Oct 28, 2024
09a6937
Remove meaningless info
jhj0517 Oct 28, 2024
bb2ddcc
Receive device as param
jhj0517 Oct 28, 2024
321ba7d
Use enum
jhj0517 Oct 28, 2024
6d33890
Pass device as param
jhj0517 Oct 28, 2024
19e180d
Update visibility by whisper implementation
jhj0517 Oct 28, 2024
77ad7d8
Rename variable
jhj0517 Oct 28, 2024
7938543
Fix order
jhj0517 Oct 28, 2024
5684894
Post cache
jhj0517 Oct 28, 2024
8933c2e
Fix component type
jhj0517 Oct 28, 2024
e58ee71
Handle gradio None values
jhj0517 Oct 28, 2024
d48aaa6
Handle gradio none values
jhj0517 Oct 28, 2024
383f1fc
Fix factory function
jhj0517 Oct 28, 2024
56acdc9
Use enum for string
jhj0517 Oct 28, 2024
22f99b6
Fix param validation
jhj0517 Oct 28, 2024
aae2366
Rename function
jhj0517 Oct 28, 2024
d8d2260
Rename class & file
jhj0517 Oct 28, 2024
587891d
Clean import
jhj0517 Oct 28, 2024
b22fbef
Add gradio value validation
jhj0517 Oct 28, 2024
3d55c8f
Use constant for gradio none validation values
jhj0517 Oct 28, 2024
1278fff
Update to use enum
jhj0517 Oct 28, 2024
9dc2df0
Update model
jhj0517 Oct 28, 2024
cf401aa
Fix VAD syntax & add vad handling case
jhj0517 Oct 28, 2024
0d0f4a1
Merge branch 'master' into feature/add-api
jhj0517 Oct 29, 2024
2d14a2b
Fix gradio input visibility by implementation type
jhj0517 Oct 29, 2024
95073dd
Add Segment model
jhj0517 Oct 29, 2024
cfab18d
Apply Segment model to the pipeline
jhj0517 Oct 29, 2024
cc9d982
Merge branch 'master' into feature/add-api
jhj0517 Oct 29, 2024
99fd02a
Fix vtt format
jhj0517 Oct 29, 2024
7bf4502
Separate logics and add router for it
jhj0517 Oct 29, 2024
937e8cd
Remove deprecates
jhj0517 Oct 30, 2024
d0e1903
Merge branch 'master' into feature/add-api
jhj0517 Oct 30, 2024
0e379db
Merge branch 'master' into feature/add-api
jhj0517 Nov 1, 2024
f02899a
Add transcription router
jhj0517 Nov 1, 2024
f1ef895
Init the model with lru cache function
jhj0517 Nov 1, 2024
033f702
Add config
jhj0517 Nov 1, 2024
5c3d1bb
Add space
jhj0517 Nov 1, 2024
a7165c2
Init when start up
jhj0517 Nov 1, 2024
184f95b
Add type hint
jhj0517 Nov 1, 2024
d4b1187
Fix function anme
jhj0517 Nov 1, 2024
8f3a502
Add attr
jhj0517 Nov 1, 2024
6bfe32b
Add router for bgm separation
jhj0517 Nov 1, 2024
61b8f36
Merge branch 'master' into feature/add-api
jhj0517 Nov 15, 2024
4e51128
Update comment
jhj0517 Nov 15, 2024
5810aaa
Update indent
jhj0517 Nov 15, 2024
8b276cc
Update comment
jhj0517 Nov 15, 2024
8befd00
rename function
jhj0517 Nov 15, 2024
321c409
Add colon
jhj0517 Nov 15, 2024
97b8b48
Update docs
jhj0517 Nov 15, 2024
91f0800
Add task scheme
jhj0517 Nov 15, 2024
77215a0
Use cached init_pipeline()
jhj0517 Nov 15, 2024
b82078a
Use cached instance
jhj0517 Nov 15, 2024
97a43e9
Use cached instance
jhj0517 Nov 15, 2024
97d162d
Add gitignore
jhj0517 Nov 16, 2024
e6b4984
Add gitignore
jhj0517 Nov 16, 2024
7378c61
Add dotenv
jhj0517 Nov 16, 2024
d553981
Add functools and dotenv function
jhj0517 Nov 16, 2024
ae4c8fa
Handle None values
jhj0517 Nov 16, 2024
3e54c1a
Rename functions to `get_`
jhj0517 Nov 16, 2024
1a3a12d
Add requirements-backend.txt
jhj0517 Nov 16, 2024
87c7a6a
Update to use default
jhj0517 Nov 16, 2024
2955815
Add db instance
jhj0517 Nov 16, 2024
11ca11d
Add about db instance
jhj0517 Nov 16, 2024
7637666
Add sqlmodel
jhj0517 Nov 16, 2024
5e72a9c
Refactor to SQLModel
jhj0517 Nov 16, 2024
9312448
Move models to `/task`
jhj0517 Nov 16, 2024
bd6640f
Move models to `db/task`
jhj0517 Nov 16, 2024
5c1145d
Rename `util` to `common`
jhj0517 Nov 16, 2024
19e5bdc
Add dao module
jhj0517 Nov 16, 2024
53754a7
Add router
jhj0517 Nov 17, 2024
11a8e2d
Add common models
jhj0517 Nov 17, 2024
cdb2e4a
Update docstring
jhj0517 Nov 17, 2024
b7a7a84
Update documentation
jhj0517 Nov 17, 2024
25099a1
Update documentation
jhj0517 Nov 17, 2024
37c9965
Update documentation
jhj0517 Nov 17, 2024
a530378
Update documentation
jhj0517 Nov 17, 2024
ee55742
Update documentation
jhj0517 Nov 17, 2024
2ba58c3
Add task type enum
jhj0517 Nov 17, 2024
f2b4a4f
Add audio info
jhj0517 Nov 17, 2024
f8c6796
Update que and polling system
jhj0517 Nov 17, 2024
4deb6d3
Implement polling system
jhj0517 Nov 17, 2024
df6db01
Add updated_at
jhj0517 Nov 17, 2024
4a4f9b2
Update que message
jhj0517 Nov 17, 2024
6da5642
Update to que system
jhj0517 Nov 17, 2024
56f6714
Update description of the fields
jhj0517 Nov 17, 2024
16a814c
Add `ResultType`
jhj0517 Nov 17, 2024
42e406d
Update default `ResultType`
jhj0517 Nov 17, 2024
aaee56a
Add result model
jhj0517 Nov 17, 2024
1b0122c
Update to queue system
jhj0517 Nov 17, 2024
99c08ee
Set duration
jhj0517 Nov 17, 2024
32e750c
Set duration
jhj0517 Nov 17, 2024
186225d
Add file download endpoint
jhj0517 Nov 17, 2024
c51a509
Refactoring main.py
jhj0517 Nov 17, 2024
5044828
Rename to main.py
jhj0517 Nov 17, 2024
699317f
Add config initialization
jhj0517 Nov 17, 2024
8b7be73
Fix enum type
jhj0517 Nov 17, 2024
9db73b0
Merge branch 'master' into feature/add-api
jhj0517 Nov 18, 2024
c528710
Merge branch 'master' into feature/add-api
jhj0517 Nov 18, 2024
01e807d
Fix JSON column type
jhj0517 Nov 18, 2024
c5fadb9
Fix path
jhj0517 Nov 18, 2024
838cbc2
Fix default path
jhj0517 Nov 18, 2024
6d54d7c
Fix kwarg
jhj0517 Nov 18, 2024
5e423fc
Move `download_file()`
jhj0517 Nov 18, 2024
b0bb075
Fix param
jhj0517 Nov 18, 2024
3f981a2
Inject dependency for session
jhj0517 Nov 18, 2024
650596b
Remove fastapi logic from the db
jhj0517 Nov 18, 2024
a91de0f
Add table creation
jhj0517 Nov 18, 2024
6935ef5
Add table creation process
jhj0517 Nov 18, 2024
2a45e9a
Fix status response
jhj0517 Nov 18, 2024
958e161
Add session open & closing logic to wrapper function
jhj0517 Nov 18, 2024
650926d
Fix uuid key error
jhj0517 Nov 18, 2024
382d0a7
Rename duplicate parameter `device` to `diarization_device` in query …
jhj0517 Nov 18, 2024
bf709ac
Rename duplicate UVR parameter in query parameters
jhj0517 Nov 18, 2024
0e2db2a
Add test dependency
jhj0517 Nov 18, 2024
e4d23f6
Merge branch 'master' into feature/add-api
jhj0517 Nov 19, 2024
83fe220
Fix wrong reference
jhj0517 Nov 19, 2024
69c18b5
Add transcription test
jhj0517 Nov 19, 2024
715c25d
Setup `pytest.fixture`
jhj0517 Nov 19, 2024
a2b8e76
Setup `pytest.fixture` for test
jhj0517 Nov 19, 2024
6a6ccfb
Add requirements for backend
jhj0517 Nov 19, 2024
f056ed7
Add test for backend
jhj0517 Nov 19, 2024
03b11b0
Relocate to `routers`
jhj0517 Nov 19, 2024
8def38f
Remove deprecated test file
jhj0517 Nov 19, 2024
9d7c81c
Update comment
jhj0517 Nov 19, 2024
663e030
Fix wrong key
jhj0517 Nov 19, 2024
ed5cd87
Fix wrong key
jhj0517 Nov 19, 2024
631fcb3
Fix wrong path
jhj0517 Nov 19, 2024
38f8131
Set it True
jhj0517 Nov 19, 2024
e8e7b0e
Fix json serialization
jhj0517 Nov 19, 2024
912d16e
Add comment
jhj0517 Nov 19, 2024
6b34bab
Update server config
jhj0517 Nov 19, 2024
ba5e593
Remove caching
jhj0517 Nov 19, 2024
9aa9d5b
Remove meaningless config setup after server initialization
jhj0517 Nov 19, 2024
f3c1b75
Ignore server config and use own setup
jhj0517 Nov 19, 2024
85a14f7
Update comment
jhj0517 Nov 19, 2024
3a298f8
Update conditional config load
jhj0517 Nov 19, 2024
b575b78
Use TEST_ENV for action
jhj0517 Nov 19, 2024
ccd3de9
Merge test into one workflow
jhj0517 Nov 19, 2024
d168fe2
Fix default location of the db file
jhj0517 Nov 19, 2024
a4f2f81
Rename function and enable caching
jhj0517 Nov 20, 2024
43398f2
Add metadatas
jhj0517 Nov 20, 2024
bc8a0bd
Fix response model and key bug
jhj0517 Nov 20, 2024
7b310c8
Include task router
jhj0517 Nov 20, 2024
1bbf2ac
Add type hint
jhj0517 Nov 20, 2024
bf3040b
Add status test
jhj0517 Nov 20, 2024
6b72d54
Make client test to not async
jhj0517 Nov 20, 2024
5d909c5
Add WER cal func
jhj0517 Nov 20, 2024
6402345
Add ANSWER config
jhj0517 Nov 20, 2024
4cf4155
Examine WER
jhj0517 Nov 20, 2024
0af463a
Add comment
jhj0517 Nov 20, 2024
e4e07b4
Fix param type
jhj0517 Nov 20, 2024
16b01e1
Remove meaningless check
jhj0517 Nov 20, 2024
180666b
Add VAD test
jhj0517 Nov 20, 2024
8eb1da9
Add hash file functions
jhj0517 Nov 20, 2024
9c4bc60
Update to use hash
jhj0517 Nov 20, 2024
5ed47e3
Add bgm separation backend
jhj0517 Nov 20, 2024
5b62667
Rename arg
jhj0517 Nov 20, 2024
25feb7c
Find files by hash
jhj0517 Nov 20, 2024
7c46394
Fix router
jhj0517 Nov 20, 2024
d2d9e03
Fix key error
jhj0517 Nov 20, 2024
b0e96bd
Fix key error
jhj0517 Nov 21, 2024
56c3f31
Use temporal output path
jhj0517 Nov 21, 2024
d5183cd
Add task fetching API for file response
jhj0517 Nov 21, 2024
8f4a5a6
Download actual file and check the type
jhj0517 Nov 21, 2024
ad23792
Add cache config
jhj0517 Nov 21, 2024
3b8336b
Add clean up function
jhj0517 Nov 21, 2024
b130453
Add cache path
jhj0517 Nov 21, 2024
4cd2a57
Update git ignore
jhj0517 Nov 21, 2024
a55223e
Added cache Clean up logic
jhj0517 Nov 21, 2024
9dc4543
Update to caching path
jhj0517 Nov 21, 2024
7e2c3ca
Update to remove in subdir also
jhj0517 Nov 21, 2024
8ca1b01
Except placeholder
jhj0517 Nov 21, 2024
d3dfa57
Disable delete method
jhj0517 Nov 21, 2024
1fb7ca3
Update description
jhj0517 Nov 21, 2024
4b66628
Add comment
jhj0517 Nov 21, 2024
578f59e
Set conditional `local_files_only`
jhj0517 Nov 21, 2024
82ccf01
Fix wrong key
jhj0517 Nov 21, 2024
96cbdc1
Skip UVR test if gpu is unavailable
jhj0517 Nov 21, 2024
4fd1319
Wrap with `Query` to fix doc issue
jhj0517 Nov 21, 2024
8a1aef6
Include error
jhj0517 Nov 21, 2024
8cd0188
Fix wrong key
jhj0517 Nov 21, 2024
33bd3cb
Fix abstraction error
jhj0517 Nov 21, 2024
3941bbb
Fix wrong key
jhj0517 Nov 21, 2024
2efc746
Revert "Wrap with `Query` to fix doc issue"
jhj0517 Nov 21, 2024
a6cccf7
Fix pydantic issue : https://github.com/pydantic/pydantic/issues/10912
jhj0517 Nov 21, 2024
f810e44
Specify reason
jhj0517 Nov 21, 2024
f3d2cf2
Rename app instance
jhj0517 Nov 22, 2024
4b613f1
Merge branch 'master' into feature/add-api
jhj0517 Nov 23, 2024
22bfb10
Set default to redoc
jhj0517 Nov 23, 2024
bd69919
Create README.md
jhj0517 Nov 23, 2024
681b065
Use absolute path
jhj0517 Nov 24, 2024
55e7846
Update description
jhj0517 Nov 24, 2024
4096b9a
Disable "all" endpoint
jhj0517 Nov 24, 2024
6bfd436
Update README.md
jhj0517 Nov 24, 2024
5dcb329
Update README.md
jhj0517 Nov 24, 2024
dfbb098
Merge branch 'master' into feature/add-api
jhj0517 Nov 25, 2024
d4b0a98
Merge branch 'master' into feature/add-api
jhj0517 Nov 25, 2024
e88c84c
Merge branch 'feature/add-api' of https://github.com/jhj0517/Whisper-…
jhj0517 Nov 25, 2024
ef4c067
Merge branch 'master' into feature/add-api
jhj0517 Nov 25, 2024
d13d773
Add dockerfile
jhj0517 Nov 25, 2024
7c26e87
Update README.md
jhj0517 Nov 25, 2024
fad840f
Set output dir as cache dir
jhj0517 Nov 26, 2024
a465f61
Update README.md
jhj0517 Nov 26, 2024
d911223
Update README.md
jhj0517 Nov 26, 2024
ab90fba
Update README.md
jhj0517 Nov 27, 2024
a28a332
Add CD pipeline for the backend
jhj0517 Nov 27, 2024
6f268fb
Remove version constraint of pydantic
jhj0517 Nov 28, 2024
932563e
Update app description
jhj0517 Nov 28, 2024
be7cfb2
Update comment
jhj0517 Nov 28, 2024
87f06df
Add documentation
jhj0517 Nov 29, 2024
1bca692
Fix path
jhj0517 Nov 29, 2024
9a34059
Fix path
jhj0517 Nov 29, 2024
0a8f0a6
Fix wrong use of async functions
jhj0517 Dec 4, 2024
a5ee3f3
Merge branch 'master' into feature/add-api
jhj0517 Dec 6, 2024
b6d28da
Update default mounting path
jhj0517 Dec 6, 2024
d5dd432
Fix wrong type
jhj0517 Dec 6, 2024
a0d9597
Update docs
jhj0517 Dec 6, 2024
6f36c07
Add response model for task status
jhj0517 Dec 6, 2024
2dbb278
Add converter function
jhj0517 Dec 6, 2024
2df6135
Apply converter function
jhj0517 Dec 6, 2024
20190e7
Fix circulation import error and remove `@classmethod`
jhj0517 Dec 7, 2024
eee8422
Update response model to TaskStatusResponse
jhj0517 Dec 7, 2024
effa17f
Update default docs from Redoc to Swagger UI
jhj0517 Dec 7, 2024
a043cc3
Fix type error in result
jhj0517 Dec 7, 2024
32e8e06
Merge branch 'master' into feature/add-api
jhj0517 Dec 7, 2024
d8f1dae
Fix wrong use of async
jhj0517 Dec 7, 2024
3a3759c
Merge branch 'master' into feature/add-api
jhj0517 Dec 14, 2024
eb601c2
Merge branch 'master' into feature/add-api
jhj0517 Dec 15, 2024
ac9057e
Remove legacy codes
jhj0517 Dec 15, 2024
a5539dd
Update comment
jhj0517 Dec 16, 2024
ce6bc6b
Fix readme URL
jhj0517 Dec 16, 2024
1dd708e
Check README
jhj0517 Dec 16, 2024
5565ccd
Update README
jhj0517 Dec 16, 2024
1146e9b
Update README
jhj0517 Dec 16, 2024
181 changes: 25 additions & 156 deletions app.py

Large diffs are not rendered by default.

Empty file added backend/__init__.py
Empty file.
325 changes: 325 additions & 0 deletions backend/server.py
@@ -0,0 +1,325 @@
import os
import argparse
import json
from io import BytesIO
import numpy as np
import faster_whisper
from faster_whisper.vad import VadOptions
from fastapi import (
    File,
    HTTPException,
    Query,
    UploadFile,
    Form,
    FastAPI,
    Request,
    WebSocket,
)
from typing import Annotated, Any, BinaryIO, Literal, Generator, Union, Optional, List, Tuple
from scipy.io.wavfile import write
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import StreamingResponse
import uvicorn
import requests
import io

from modules.whisper.faster_whisper_inference import FasterWhisperInference
from modules.utils.logger import get_backend_logger
from modules.vad.silero_vad import SileroVAD
from modules.diarize.diarizer import Diarizer
from modules.diarize.audio_loader import SAMPLE_RATE


backend_app = FastAPI()
backend_app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
vad_inferencer = None
whisper_inferencer = None
diarization_inferencer = None
logger = get_backend_logger()


def format_stream_result(generator: Generator[dict[str, Any], Any, None]):
    # Emit each segment as one JSON document followed by a blank line,
    # then a "[DONE]" sentinel once the generator is exhausted.
    for seg in generator:
        yield json.dumps({
            "seek": seg.seek,
            "start": seg.start,
            "end": seg.end,
            "text": seg.text,
            "tokens": seg.tokens
        }, ensure_ascii=False) + "\n\n"
    yield "[DONE]\n\n"


def format_json_result(
    segments: Union[Generator[dict[str, Any], Any, None], List[dict]]
) -> dict[str, Any]:
    # Consume all segments and collect the fields the API exposes.
    result = []
    for seg in segments:
        result.append({
            "seek": seg.seek,
            "start": seg.start,
            "end": seg.end,
            "text": seg.text,
            "tokens": seg.tokens
        })
    # The full text is the segment texts joined with newlines.
    text = "\n".join([seg["text"] for seg in result])
    return {
        "text": text,
        "segments": result,
    }


async def read_audio(
    file: Optional[UploadFile] = None,
    file_url: Optional[str] = None
):
    # Exactly one source must be given: an uploaded file or a URL to download.
    if (file and file_url) or (not file and not file_url):
        raise HTTPException(status_code=400, detail="Provide exactly one of file or file_url")

    if file:
        file_content = await file.read()
    elif file_url:
        # NOTE: requests.get() is synchronous and blocks the event loop while downloading.
        file_response = requests.get(file_url)
        if file_response.status_code != 200:
            raise HTTPException(status_code=422, detail="Could not download the file")
        file_content = file_response.content
    file_bytes = BytesIO(file_content)
    return faster_whisper.audio.decode_audio(file_bytes)


@backend_app.post("/vad")
async def vad(
    file: UploadFile = File(None),
    threshold: float = Form(0.5),
    min_speech_duration_ms: int = Form(250),
    max_speech_duration_s: Optional[int] = Form(999),
    min_silence_duration_ms: int = Form(2000),
    window_size_samples: int = Form(1024),
    speech_pad_ms: int = Form(400)
):
    global vad_inferencer

    if not isinstance(file, np.ndarray):
        audio = await read_audio(file=file)
    else:
        audio = file

    vad_options = VadOptions(
        threshold=threshold,
        min_speech_duration_ms=min_speech_duration_ms,
        max_speech_duration_s=max_speech_duration_s,
        min_silence_duration_ms=min_silence_duration_ms,
        window_size_samples=window_size_samples,
        speech_pad_ms=speech_pad_ms
    )

    preprocessed_audio = vad_inferencer.run(
        audio=audio,
        vad_parameters=vad_options
    )

    audio_output = io.BytesIO()
    write(audio_output, SAMPLE_RATE, preprocessed_audio)
    audio_output.seek(0)
    return StreamingResponse(audio_output, media_type="audio/wav")


@backend_app.post("/diarization")
async def diarization(
    file: UploadFile = File(None),
    use_auth_token: str = Form(None),
    transcript: Optional[List[dict]] = None,
):
    global diarization_inferencer
    global whisper_inferencer

    if not isinstance(file, np.ndarray):
        audio = await read_audio(file=file)
    else:
        audio = file

    if transcript is None:
        generator, info = whisper_inferencer.model.transcribe(
            audio=audio
        )
        transcript = format_json_result(generator)["segments"]

    diarized_transcript, elapsed_time = diarization_inferencer.run(
        audio=audio,
        transcribed_result=transcript,
        use_auth_token=use_auth_token,
        device=diarization_inferencer.device
    )
    return diarized_transcript


@backend_app.post("/transcription")
async def transcription(
    file: UploadFile = File(default=None, description="Input file for video or audio. This will be pre-processed with ffmpeg in the server"),
    file_url: str = Form(default=None, description="Input file url for video or audio. You need to provide either the file or the file URL, but not both"),
    response_format: str = Form(default="json"),
    model_size: str = Form(default="large-v2"),
    language: Optional[str] = Form(default=None),
    task: str = Form(default="transcribe"),
    beam_size: int = Form(default=5),
    best_of: int = Form(default=5),
    patience: float = Form(default=1),
    length_penalty: float = Form(default=1),
    repetition_penalty: float = Form(default=1),
    no_repeat_ngram_size: int = Form(default=0),
    temperature: Union[float, List[float], Tuple[float, ...]] = Form(default=0.0),
    compression_ratio_threshold: Optional[float] = Form(default=2.4),
    log_prob_threshold: Optional[float] = Form(default=-1.0),
    no_speech_threshold: Optional[float] = Form(default=0.6),
    condition_on_previous_text: bool = Form(default=True),
    prompt_reset_on_temperature: float = Form(default=0.5),
    initial_prompt: Optional[Union[str, List[int]]] = Form(default=None),
    prefix: Optional[str] = Form(default=None),
    suppress_blank: bool = Form(default=True),
    suppress_tokens: Optional[List[int]] = Form(default=None),
    without_timestamps: bool = Form(default=False),
    max_initial_timestamp: float = Form(default=1.0),
    word_timestamps: bool = Form(default=False),
    prepend_punctuations: str = Form(default="\"'“¿([{-"),
    append_punctuations: str = Form(default="\"'.。,,!!??::”)]}、"),
    max_new_tokens: Optional[int] = Form(default=None),
    chunk_length: Optional[int] = Form(default=None),
    clip_timestamps: Union[str, List[float]] = Form(default="0"),
    hallucination_silence_threshold: Optional[float] = Form(default=None),
    hotwords: Optional[str] = Form(default=None),
    language_detection_threshold: Optional[float] = Form(default=None),
    language_detection_segments: int = Form(default=1),

    vad_filter: bool = Form(default=False),
    threshold: float = Form(default=0.5),
    min_speech_duration_ms: int = Form(default=250),
    max_speech_duration_s: Optional[int] = Form(default=999),
    min_silence_duration_ms: int = Form(default=2000),
    window_size_samples: int = Form(default=1024),
    speech_pad_ms: int = Form(default=400),

    is_diarization: bool = Form(default=False),
    use_auth_token: str = Form(default=None),
):
    global whisper_inferencer
    global vad_inferencer
    global diarization_inferencer

    # Reload the model only when a different size is requested or nothing is loaded yet.
    if model_size != whisper_inferencer.current_model_size or whisper_inferencer.model is None:
        whisper_inferencer.update_model(model_size, whisper_inferencer.current_compute_type)
        logger.info("Model loaded")

    audio = await read_audio(file=file, file_url=file_url)

    if vad_filter:
        vad_options = VadOptions(
            threshold=threshold,
            min_speech_duration_ms=min_speech_duration_ms,
            max_speech_duration_s=max_speech_duration_s,
            min_silence_duration_ms=min_silence_duration_ms,
            window_size_samples=window_size_samples,
            speech_pad_ms=speech_pad_ms
        )
        audio = vad_inferencer.run(
            audio=audio,
            vad_parameters=vad_options
        )

    segments, info = whisper_inferencer.model.transcribe(
        audio=audio,
        language=language,
        task=task,
        beam_size=beam_size,
        best_of=best_of,
        patience=patience,
        length_penalty=length_penalty,
        repetition_penalty=repetition_penalty,
        no_repeat_ngram_size=no_repeat_ngram_size,
        temperature=temperature,
        compression_ratio_threshold=compression_ratio_threshold,
        log_prob_threshold=log_prob_threshold,
        no_speech_threshold=no_speech_threshold,
        condition_on_previous_text=condition_on_previous_text,
        prompt_reset_on_temperature=prompt_reset_on_temperature,
        initial_prompt=initial_prompt,
        prefix=prefix,
        suppress_blank=suppress_blank,
        suppress_tokens=suppress_tokens,
        without_timestamps=without_timestamps,
        max_initial_timestamp=max_initial_timestamp,
        word_timestamps=word_timestamps,
        prepend_punctuations=prepend_punctuations,
        append_punctuations=append_punctuations,
        max_new_tokens=max_new_tokens,
        chunk_length=chunk_length,
        clip_timestamps=clip_timestamps,
        hallucination_silence_threshold=hallucination_silence_threshold,
        hotwords=hotwords,
        language_detection_threshold=language_detection_threshold,
        language_detection_segments=language_detection_segments
    )

    if response_format == "stream":
        return StreamingResponse(
            format_stream_result(segments),
            media_type="text/event-stream",
        )

    if is_diarization:
        # Mirror the /diarization endpoint: pass the segments as dicts and
        # unpack the (transcript, elapsed_time) tuple that run() returns.
        transcript = format_json_result(segments)["segments"]
        diarized_transcript, elapsed_time = diarization_inferencer.run(
            audio=audio,
            transcribed_result=transcript,
            use_auth_token=use_auth_token,
            device=diarization_inferencer.device
        )
        return diarized_transcript

    # A separate `if` (not `elif`), so this check does not depend on is_diarization.
    if response_format == "json":
        return format_json_result(segments)

    raise HTTPException(400, "Invalid response_format")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--host", default="0.0.0.0", type=str)
    parser.add_argument("--port", default=5000, type=int)
    parser.add_argument("--device", type=str, help="Device for the whisper models, one of ['cuda', 'cpu', 'auto']. By default, cuda is used if it is available")
    parser.add_argument("--diarization_device", type=str,
                        help="Device for the diarization models, one of ['cuda', 'cpu', 'mps']. By default, cuda is used if it is available")
    parser.add_argument("--initial_model", type=str,
                        default="large-v2", help="The whisper model to load when the server starts")
    parser.add_argument('--faster_whisper_model_dir', type=str,
                        default=os.path.join("models", "Whisper", "faster-whisper"),
                        help='Directory path of the faster-whisper model')
    parser.add_argument('--diarization_model_dir', type=str, default=os.path.join("models", "Diarization"),
                        help='Directory path of the diarization model')

    args = parser.parse_args()

    whisper_inferencer = FasterWhisperInference(
        model_dir=args.faster_whisper_model_dir,
        output_dir=os.path.join("outputs"),
        args=args
    )

    # This check runs at startup, outside any request, so raise a plain exception instead of HTTPException.
    if args.initial_model not in whisper_inferencer.available_models:
        raise ValueError(f"The initial model you set \"{args.initial_model}\" is not available.")
    if args.device is not None:
        whisper_inferencer.device = args.device

    # Load the model selected with --initial_model.
    whisper_inferencer.update_model(model_size=args.initial_model, compute_type=whisper_inferencer.current_compute_type)
    vad_inferencer = SileroVAD()
    diarization_inferencer = Diarizer(
        model_dir=args.diarization_model_dir
    )
    if args.diarization_device is not None:
        diarization_inferencer.device = args.diarization_device

    uvicorn.run(backend_app, host=args.host, port=args.port)
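
Note on consuming the `stream` response format: `format_stream_result()` in `backend/server.py` emits one JSON document per segment, each followed by a blank line, and ends with a `[DONE]` sentinel. The body is labelled `text/event-stream`, but the chunks carry no `data:` prefix, so a generic SSE client may not parse them. Below is a minimal client-side sketch of a parser for that framing. The helper name `parse_stream_body` and the sample payload are hypothetical and not part of this PR; the sketch assumes the whole body has already been read into a string.

```python
import json
from typing import Any, Iterator


def parse_stream_body(body: str) -> Iterator[dict[str, Any]]:
    """Yield one dict per segment from a response_format="stream" body.

    Hypothetical helper: it only assumes the framing visible in
    format_stream_result(), i.e. JSON chunks separated by blank lines
    and a final "[DONE]" sentinel.
    """
    for chunk in body.split("\n\n"):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty tail after the last separator
        if chunk == "[DONE]":
            return  # the server signals the end of the transcription
        yield json.loads(chunk)


# Made-up sample body with the same shape the server would produce.
sample_body = (
    json.dumps({"seek": 0, "start": 0.0, "end": 1.5, "text": " Hello", "tokens": [1, 2]}) + "\n\n"
    + json.dumps({"seek": 0, "start": 1.5, "end": 2.0, "text": " world", "tokens": [3]}) + "\n\n"
    + "[DONE]\n\n"
)
segments = list(parse_stream_body(sample_body))
```

Splitting on the blank line is safe here because `json.dumps` escapes newlines inside strings, so a segment's text cannot contain a raw `\n\n`. A client that reads the response incrementally would need to buffer partial chunks before applying the same logic.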