Finish local VAD integration preprocessing audio before being sent to deepgram/server #321

josancamon19 · 2024-06-20T18:29:19Z

Is your feature request related to a problem? Please describe.
To save bandwidth, we should check whether the current 30-second chunk of audio bytes contains any voice before it is sent for processing.

Describe the solution you'd like

vad.dart

```dart
import 'dart:io';

import 'package:flutter/foundation.dart';
import 'package:flutter/services.dart';
import 'package:fonnx/models/sileroVad/silero_vad.dart';
import 'package:path/path.dart' as path;
import 'package:path_provider/path_provider.dart' as path_provider;

class VadUtil {
  SileroVad? vad;

  // Model state carried from one predict() call to the next.
  dynamic hn;
  dynamic cn;

  /// Copies the model out of the asset bundle if needed, then loads it.
  Future<void> init() async {
    final modelPath = await getModelPath('silero_vad.onnx');
    vad = SileroVad.load(modelPath);
  }

  /// Returns true when the model's score for [bytes] is above the threshold.
  /// If the model has not been loaded, the chunk is treated as speech.
  Future<bool> predict(Uint8List bytes) async {
    if (vad == null) return true;
    final result = await vad!.doInference(bytes, previousState: {
      'hn': hn,
      'cn': cn,
    });
    hn = result['hn'];
    cn = result['cn'];
    debugPrint('Result output: ${result['output'][0]}');
    return result['output'][0] > 0.1; // what's the right threshold?
  }

  /// Returns a file-system path to the model, copying it from the assets
  /// when the cached copy is missing or has a different size.
  Future<String> getModelPath(String modelFilenameWithExtension) async {
    if (kIsWeb) {
      return 'assets/$modelFilenameWithExtension';
    }
    final assetCacheDirectory = await path_provider.getApplicationSupportDirectory();
    final modelPath = path.join(assetCacheDirectory.path, modelFilenameWithExtension);

    final file = File(modelPath);
    final fileExists = await file.exists();
    final fileLength = fileExists ? await file.length() : 0;

    // Do not use path.join for asset paths.
    // After testing on Windows, it appears that asset paths are _always_
    // Unix style, i.e. use /, but path.join uses \ on Windows.
    final assetPath = 'assets/${path.basename(modelFilenameWithExtension)}';
    final assetByteData = await rootBundle.load(assetPath);
    final assetLength = assetByteData.lengthInBytes;
    final fileSameSize = fileLength == assetLength;
    if (!fileExists || !fileSameSize) {
      debugPrint('Copying model to $modelPath. Why? Either the file does not exist (${!fileExists}), '
          'or it does exist but is not the same size as the one in the assets '
          'directory. (${!fileSameSize})');
      debugPrint('About to get byte data for $modelPath');

      final bytes = assetByteData.buffer.asUint8List(
        assetByteData.offsetInBytes,
        assetByteData.lengthInBytes,
      );
      debugPrint('About to copy model to $modelPath');
      try {
        if (!fileExists) {
          await file.create(recursive: true);
        }
        await file.writeAsBytes(bytes, flush: true);
      } catch (e) {
        debugPrint('Error writing bytes to $modelPath: $e');
        rethrow;
      }
      debugPrint('Copied model to $modelPath');
    }

    return modelPath;
  }
}
```

In transcript.dart, this check should run before the call to transcribeAudioFile2.
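A rough sketch of what that gate could look like. Everything here is an assumption made for illustration: `transcribeIfSpeech`, `SpeechDetector`, and `Transcriber` are hypothetical names, the detector stands in for `VadUtil.predict`, and the transcriber stands in for `transcribeAudioFile2`, whose real signature is not shown in this issue.

```dart
import 'dart:typed_data';

/// Hypothetical stand-in for `VadUtil.predict`.
typedef SpeechDetector = Future<bool> Function(Uint8List bytes);

/// Hypothetical stand-in for the transcription call. The real signature of
/// `transcribeAudioFile2` is not shown in this issue.
typedef Transcriber = Future<String> Function(Uint8List bytes);

/// Runs [transcribe] only when [hasSpeech] reports voice in [chunk].
/// Returns null when the chunk is empty or was skipped as silent.
Future<String?> transcribeIfSpeech(
  Uint8List chunk,
  SpeechDetector hasSpeech,
  Transcriber transcribe,
) async {
  if (chunk.isEmpty) return null;
  if (!await hasSpeech(chunk)) return null; // silent chunk: nothing is sent
  return transcribe(chunk);
}

Future<void> main() async {
  var uploads = 0;

  // Toy transcriber that only counts how often it is called.
  Future<String> fakeTranscribe(Uint8List bytes) async {
    uploads++;
    return 'transcript of ${bytes.length} bytes';
  }

  // Toy detector for the demo: any non-zero byte counts as voice.
  Future<bool> fakeDetector(Uint8List bytes) async => bytes.any((b) => b != 0);

  final silent = Uint8List(8); // all zeros
  final voiced = Uint8List.fromList([0, 3, 0, 7]);

  print(await transcribeIfSpeech(silent, fakeDetector, fakeTranscribe));
  print(await transcribeIfSpeech(voiced, fakeDetector, fakeTranscribe));
  print('uploads: $uploads');
}
```

Passing the detector in as a function keeps the gate testable without loading the ONNX model; in the app, `vadUtil.predict` could be passed directly if its signature matches.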

I've used https://github.com/Telosnex/fonnx to test the vad.dart file. If you could help with another library, that would be better, because I think this one needs a license or something. But the faster the better: if this one works, it is okay.

Describe alternatives you've considered

Part 1: every 30 seconds, determine whether there is voice in the audio bytes. If there is not, skip the transcribe-audio request and discard those 30 seconds of bytes.

Part 2: a more optimized way of doing VAD. What I'm thinking of: process every 3 seconds of audio and ignore the bytes without voice.
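Part 2 could be sketched roughly as below. The names `keepVoicedWindows`, `bytesForSeconds`, and `SpeechDetector` are hypothetical, and the audio format (16 kHz, 16-bit, mono PCM) is an assumption used only to turn "3 seconds" into a byte count; the issue does not state the actual format. Windows are checked strictly in order, since `VadUtil.predict` carries state from one call to the next.

```dart
import 'dart:typed_data';

/// Hypothetical stand-in for `VadUtil.predict`.
typedef SpeechDetector = Future<bool> Function(Uint8List window);

/// Number of bytes in [seconds] of raw PCM audio.
/// The defaults (16 kHz, 16-bit, mono) are assumptions, not taken from the app.
int bytesForSeconds(
  int seconds, {
  int sampleRate = 16000,
  int bytesPerSample = 2,
  int channels = 1,
}) =>
    seconds * sampleRate * bytesPerSample * channels;

/// Splits [audio] into windows of [windowBytes], asks [hasSpeech] about each
/// window in order, and returns only the voiced windows, concatenated.
/// The last window may be shorter than [windowBytes].
Future<Uint8List> keepVoicedWindows(
  Uint8List audio,
  SpeechDetector hasSpeech, {
  required int windowBytes,
}) async {
  if (windowBytes <= 0) {
    throw ArgumentError.value(windowBytes, 'windowBytes', 'must be positive');
  }
  final kept = BytesBuilder(); // copies added bytes, so the input is not aliased
  for (var start = 0; start < audio.length; start += windowBytes) {
    final end =
        start + windowBytes < audio.length ? start + windowBytes : audio.length;
    final window = Uint8List.sublistView(audio, start, end);
    if (await hasSpeech(window)) kept.add(window);
  }
  return kept.takeBytes();
}

Future<void> main() async {
  // Toy detector for the demo: any non-zero byte counts as voice.
  Future<bool> fakeDetector(Uint8List w) async => w.any((b) => b != 0);

  // Tiny 2-byte windows keep the demo readable; real use would pass
  // windowBytes: bytesForSeconds(3).
  final audio = Uint8List.fromList([0, 0, 5, 6, 0, 0, 7]);
  final voiced = await keepVoicedWindows(audio, fakeDetector, windowBytes: 2);

  print(voiced);
  print(bytesForSeconds(3));
}
```

Part 1 is the same loop with a 30-second window: if nothing survives the filter, the transcribe request can be skipped entirely.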


josancamon19 · 2024-07-04T03:00:05Z

First attempt, about 3 hours, 3 weeks ago.
Failed miserably with Silero VAD again. 3 hours again.

josancamon19 · 2024-07-27T05:11:18Z

We will do this in the backend; I will expand on the details later.

kodjima33 added this to omi TODO Jun 20, 2024

kodjima33 moved this to Backlog in omi TODO Jun 20, 2024

josancamon19 added Paid Bounty 💰 task labels Jun 20, 2024

josancamon19 changed the title ~~Finish local VAD integration preprocessing audio before being sent to deepgram/server~~ Finish local VAD integration preprocessing audio before being sent to deepgram/server ($150) Jun 20, 2024

josancamon19 changed the title ~~Finish local VAD integration preprocessing audio before being sent to deepgram/server ($150)~~ Finish local VAD integration preprocessing audio before being sent to deepgram/server ($200) Jun 20, 2024

josancamon19 added the flutter flutter work label Jun 21, 2024

josancamon19 changed the title ~~Finish local VAD integration preprocessing audio before being sent to deepgram/server ($200)~~ Finish local VAD integration preprocessing audio before being sent to deepgram/server ($300) Jun 30, 2024

josancamon19 moved this from Backlog to In progress in omi TODO Jul 3, 2024

josancamon19 self-assigned this Jul 3, 2024

josancamon19 changed the title ~~Finish local VAD integration preprocessing audio before being sent to deepgram/server ($300)~~ Finish local VAD integration preprocessing audio before being sent to deepgram/server Jul 3, 2024

josancamon19 removed the Paid Bounty 💰 label Jul 3, 2024

josancamon19 moved this from In progress to Backlog in omi TODO Jul 8, 2024

kodjima33 removed the status in omi TODO Jul 8, 2024

josancamon19 moved this to Backlog in omi TODO Jul 27, 2024

josancamon19 closed this as completed Jul 27, 2024

github-project-automation bot moved this from Backlog to Done in omi TODO Jul 27, 2024
