Is your feature request related to a problem? Please describe.
To save bandwidth, we should check whether the current 30-second chunk of audio bytes contains voice before it is sent for processing.
Describe the solution you'd like
A clear and concise description of what you want to happen.
vad.dart
import 'dart:io';
import 'dart:async';
import 'package:flutter/services.dart';
import 'package:fonnx/models/sileroVad/silero_vad.dart';
import 'package:path_provider/path_provider.dart' as path_provider;
import 'package:path/path.dart' as path;
import 'package:flutter/foundation.dart';

class VadUtil {
  SileroVad? vad;
  // State returned by the previous inference and passed to the next one.
  dynamic hn;
  dynamic cn;

  Future<void> init() async {
    final modelPath = await getModelPath('silero_vad.onnx');
    vad = SileroVad.load(modelPath);
  }

  Future<bool> predict(Uint8List bytes) async {
    // If the model is not loaded, report voice so that no audio is dropped.
    if (vad == null) return true;
    final result = await vad!.doInference(bytes, previousState: {
      'hn': hn,
      'cn': cn,
    });
    hn = result['hn'];
    cn = result['cn'];
    debugPrint('Result output: ${result['output'][0]}');
    return result['output'][0] > 0.1; // what's the right threshold?
  }

  Future<String> getModelPath(String modelFilenameWithExtension) async {
    if (kIsWeb) {
      return 'assets/$modelFilenameWithExtension';
    }
    final assetCacheDirectory = await path_provider.getApplicationSupportDirectory();
    final modelPath = path.join(assetCacheDirectory.path, modelFilenameWithExtension);
    File file = File(modelPath);
    bool fileExists = await file.exists();
    final fileLength = fileExists ? await file.length() : 0;
    // Do not use path.join for asset paths.
    // After testing on Windows, it appears that asset paths are _always_ Unix style, i.e.
    // use /, but path.join uses \ on Windows.
    final assetPath = 'assets/${path.basename(modelFilenameWithExtension)}';
    final assetByteData = await rootBundle.load(assetPath);
    final assetLength = assetByteData.lengthInBytes;
    final fileSameSize = fileLength == assetLength;
    // Copy the bundled model to disk only when it is missing or differs in size.
    if (!fileExists || !fileSameSize) {
      debugPrint('Copying model to $modelPath. Why? Either the file does not exist (${!fileExists}), '
          'or it does exist but is not the same size as the one in the assets '
          'directory. (${!fileSameSize})');
      debugPrint('About to get byte data for $modelPath');
      List<int> bytes = assetByteData.buffer.asUint8List(
        assetByteData.offsetInBytes,
        assetByteData.lengthInBytes,
      );
      debugPrint('About to copy model to $modelPath');
      try {
        if (!fileExists) {
          await file.create(recursive: true);
        }
        await file.writeAsBytes(bytes, flush: true);
      } catch (e) {
        debugPrint('Error writing bytes to $modelPath: $e');
        rethrow;
      }
      debugPrint('Copied model to $modelPath');
    }
    return modelPath;
  }
}
In transcript.dart, this check should run before calling transcribeAudioFile2.
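A minimal sketch of that gating step, in plain Dart so it runs without Flutter. The real signature of transcribeAudioFile2 is not shown in this issue, so the `Transcriber` callback here is a hypothetical stand-in for it, and `VoiceDetector` is assumed to match `VadUtil.predict` from vad.dart. The helper name `transcribeIfVoiced` is also made up for illustration.

```dart
import 'dart:typed_data';

/// Assumed to match `VadUtil.predict` from vad.dart.
typedef VoiceDetector = Future<bool> Function(Uint8List chunk);

/// Hypothetical stand-in for `transcribeAudioFile2`; its real signature
/// is not shown in this issue.
typedef Transcriber = Future<String> Function(Uint8List chunk);

/// Returns the transcript, or null when the chunk is skipped as silent.
Future<String?> transcribeIfVoiced(
  Uint8List chunk,
  VoiceDetector hasVoice,
  Transcriber transcribe,
) async {
  if (chunk.isEmpty) return null;
  // Silent chunk: skip the request entirely, so nothing is uploaded.
  if (!await hasVoice(chunk)) return null;
  return await transcribe(chunk);
}

Future<void> main() async {
  var uploads = 0;

  // Fake transcriber for the demo; it only counts how often it is called.
  Future<String> fakeTranscribe(Uint8List chunk) async {
    uploads++;
    return 'transcript of ${chunk.length} bytes';
  }

  // Fake detector for the demo: any non-zero byte counts as voice.
  Future<bool> fakeVad(Uint8List chunk) async => chunk.any((b) => b != 0);

  final silent = Uint8List(8);
  final voiced = Uint8List.fromList([0, 3, 0, 7]);

  print(await transcribeIfVoiced(silent, fakeVad, fakeTranscribe));
  print(await transcribeIfVoiced(voiced, fakeVad, fakeTranscribe));
  print('uploads: $uploads');
}
```

Passing the detector and the transcriber in as callbacks keeps the gate independent of fonnx, so another VAD library could be swapped in without touching this function.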
I used https://github.com/Telosnex/fonnx to test the vad.dart file. If you can do this with another library, that would be better, because I think this one needs a license or something. Faster is better, though, so if this one works, that is okay.
Describe alternatives you've considered
Part 1: Every 30 seconds, determine whether there is voice in the audio bytes. If there is not, skip the transcribe audio request and discard those 30 seconds of bytes.
Part 2: A more optimized way of doing VAD. What I'm thinking of: process every 3 seconds of audio and ignore bytes without voice.
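Part 2 could be sketched roughly as follows, again in plain Dart. The audio format is not stated in this issue, so the defaults below (16 kHz, 16-bit, mono PCM) are assumptions, and `bytesForSeconds` and `dropSilentWindows` are hypothetical helper names. The detector is passed in as a callback and could be `VadUtil.predict`.

```dart
import 'dart:typed_data';

/// Bytes occupied by [seconds] of raw PCM audio.
/// The defaults (16 kHz, 16-bit, mono) are assumptions, not from the issue.
int bytesForSeconds(
  int seconds, {
  int sampleRate = 16000,
  int bytesPerSample = 2,
  int channels = 1,
}) {
  return seconds * sampleRate * bytesPerSample * channels;
}

/// Splits [audio] into windows of [windowBytes] and returns only the windows
/// for which [hasVoice] is true, concatenated in their original order.
/// The final window may be shorter than [windowBytes].
Future<Uint8List> dropSilentWindows(
  Uint8List audio,
  int windowBytes,
  Future<bool> Function(Uint8List window) hasVoice,
) async {
  if (windowBytes <= 0) {
    throw ArgumentError.value(windowBytes, 'windowBytes', 'must be positive');
  }
  final kept = BytesBuilder(); // copies added bytes into its own buffer
  for (var start = 0; start < audio.length; start += windowBytes) {
    final end = (start + windowBytes < audio.length)
        ? start + windowBytes
        : audio.length;
    // A view over the original bytes; nothing is copied until it is kept.
    final window = Uint8List.sublistView(audio, start, end);
    if (await hasVoice(window)) {
      kept.add(window);
    }
  }
  return kept.takeBytes();
}

Future<void> main() async {
  print(bytesForSeconds(30)); // 960000 bytes per 30-second chunk
  print(bytesForSeconds(3)); // 96000 bytes per 3-second window

  // Tiny demo with 2-byte windows; any non-zero byte counts as voice.
  final audio = Uint8List.fromList([0, 0, 5, 6, 0, 0, 9]);
  final kept = await dropSilentWindows(
    audio,
    2,
    (w) async => w.any((b) => b != 0),
  );
  print(kept); // [5, 6, 9]
}
```

One trade-off to keep in mind: cutting on fixed window boundaries can clip the start or end of a word, so keeping one neighbouring window on each side of a voiced window may be worth testing.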
josancamon19 changed the title from "Finish local VAD integration preprocessing audio before being sent to deepgram/server" to "Finish local VAD integration preprocessing audio before being sent to deepgram/server ($150)" on Jun 20, 2024
josancamon19 changed the title from "Finish local VAD integration preprocessing audio before being sent to deepgram/server ($150)" to "Finish local VAD integration preprocessing audio before being sent to deepgram/server ($200)" on Jun 20, 2024
josancamon19 changed the title from "Finish local VAD integration preprocessing audio before being sent to deepgram/server ($200)" to "Finish local VAD integration preprocessing audio before being sent to deepgram/server ($300)" on Jun 30, 2024
josancamon19 changed the title from "Finish local VAD integration preprocessing audio before being sent to deepgram/server ($300)" to "Finish local VAD integration preprocessing audio before being sent to deepgram/server" on Jul 3, 2024