Finish local VAD integration preprocessing audio before being sent to deepgram/server #321

Closed
josancamon19 opened this issue Jun 20, 2024 · 4 comments
Assignees
Labels
flutter flutter work task

Comments

@josancamon19
Contributor

Is your feature request related to a problem? Please describe.
To save bandwidth, we should determine whether the current 30-second chunk of audio bytes contains voice before it is sent for processing.

Describe the solution you'd like
A clear and concise description of what you want to happen.

vad.dart


import 'dart:async';
import 'dart:io';

import 'package:flutter/foundation.dart';
import 'package:flutter/services.dart';
import 'package:fonnx/models/sileroVad/silero_vad.dart';
import 'package:path/path.dart' as path;
import 'package:path_provider/path_provider.dart' as path_provider;

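class VadUtil {
  SileroVad? vad;

  // Recurrent state returned by the previous inference call; it is fed back
  // into the next call.
  dynamic hn;
  dynamic cn;

  /// Loads the Silero VAD model. Call this once before [predict].
  Future<void> init() async {
    final modelPath = await getModelPath('silero_vad.onnx');
    vad = SileroVad.load(modelPath);
  }

  /// Returns true if [bytes] likely contains voice. If the model has not
  /// been loaded, returns true so that no audio is dropped.
  Future<bool> predict(Uint8List bytes) async {
    if (vad == null) return true;
    final result = await vad!.doInference(bytes, previousState: {
      'hn': hn,
      'cn': cn,
    });
    hn = result['hn'];
    cn = result['cn'];
    debugPrint('Result output: ${result['output'][0]}');
    return result['output'][0] > 0.1; // what's the right threshold?
  }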

  Future<String> getModelPath(String modelFilenameWithExtension) async {
    if (kIsWeb) {
      return 'assets/$modelFilenameWithExtension';
    }
    final assetCacheDirectory = await path_provider.getApplicationSupportDirectory();
    final modelPath = path.join(assetCacheDirectory.path, modelFilenameWithExtension);

    File file = File(modelPath);
    bool fileExists = await file.exists();
    final fileLength = fileExists ? await file.length() : 0;

    // Do not use path.join for asset paths.
    // After testing on Windows, it appears that asset paths are _always_ Unix
    // style, i.e. they use /, but path.join uses \ on Windows.
    final assetPath = 'assets/${path.basename(modelFilenameWithExtension)}';
    final assetByteData = await rootBundle.load(assetPath);
    final assetLength = assetByteData.lengthInBytes;
    final fileSameSize = fileLength == assetLength;
    if (!fileExists || !fileSameSize) {
      debugPrint('Copying model to $modelPath. Why? Either the file does not exist (${!fileExists}), '
          'or it does exist but is not the same size as the one in the assets '
          'directory. (${!fileSameSize})');
      debugPrint('About to get byte data for $modelPath');

      List<int> bytes = assetByteData.buffer.asUint8List(
        assetByteData.offsetInBytes,
        assetByteData.lengthInBytes,
      );
      debugPrint('About to copy model to $modelPath');
      try {
        if (!fileExists) {
          await file.create(recursive: true);
        }
        await file.writeAsBytes(bytes, flush: true);
      } catch (e) {
        debugPrint('Error writing bytes to $modelPath: $e');
        rethrow;
      }
      debugPrint('Copied model to $modelPath');
    }

    return modelPath;
  }
}

In transcript.dart, this check should run before calling transcribeAudioFile2.

[Screenshot: CleanShot 2024-06-20 at 11 14 05@2x]

I used https://github.com/Telosnex/fonnx to test the vad.dart file. If you can help with a different library, that would be better, because I think this one requires a license or something similar. That said, the faster the better: if this one works, it is okay.

Describe alternatives you've considered

Part 1: Every 30 seconds, determine whether the audio bytes contain voice. If they do not, skip the transcribe audio request and discard those 30 seconds of bytes.

Part 2: A more optimized way of doing VAD. What I'm thinking of is to process every 3 seconds of audio and ignore the bytes without voice.
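
As a rough illustration of Part 2, here is a minimal sketch in plain Dart that splits a buffer into fixed-length windows and keeps only the windows a voice check accepts. The names `VoiceCheck` and `dropSilentWindows` are hypothetical, and the defaults of 16 kHz, 16-bit mono PCM are assumptions, not something this issue specifies. The voice check is passed in as a callback, so a method like `VadUtil.predict` above could be supplied if its input format matches.

```dart
import 'dart:typed_data';

/// Hypothetical callback type for a voice check, e.g. VadUtil.predict.
typedef VoiceCheck = Future<bool> Function(Uint8List window);

/// Splits [audio] into windows of [windowSeconds] and returns only the bytes
/// of the windows for which [hasVoice] returns true.
///
/// Assumption: raw PCM audio; [sampleRate] and [bytesPerSample] must be set
/// to match the real stream.
Future<Uint8List> dropSilentWindows(
  Uint8List audio,
  VoiceCheck hasVoice, {
  int windowSeconds = 3,
  int sampleRate = 16000, // assumption: 16 kHz
  int bytesPerSample = 2, // assumption: 16-bit mono PCM
}) async {
  final windowBytes = windowSeconds * sampleRate * bytesPerSample;
  final kept = BytesBuilder();
  for (var start = 0; start < audio.length; start += windowBytes) {
    // The last window may be shorter than windowBytes.
    final end = (start + windowBytes < audio.length)
        ? start + windowBytes
        : audio.length;
    final window = Uint8List.sublistView(audio, start, end);
    if (await hasVoice(window)) {
      kept.add(window);
    }
  }
  return kept.takeBytes();
}
```

Part 1 is the same idea with `windowSeconds: 30`: if the returned buffer is empty, the transcribe request can be skipped entirely.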

@kodjima33 kodjima33 moved this to Backlog in omi TODO Jun 20, 2024
@josancamon19 josancamon19 changed the title Finish local VAD integration preprocessing audio before being sent to deepgram/server Finish local VAD integration preprocessing audio before being sent to deepgram/server ($150) Jun 20, 2024
@josancamon19 josancamon19 changed the title Finish local VAD integration preprocessing audio before being sent to deepgram/server ($150) Finish local VAD integration preprocessing audio before being sent to deepgram/server ($200) Jun 20, 2024
@josancamon19 josancamon19 added the flutter flutter work label Jun 21, 2024
@josancamon19 josancamon19 changed the title Finish local VAD integration preprocessing audio before being sent to deepgram/server ($200) Finish local VAD integration preprocessing audio before being sent to deepgram/server ($300) Jun 30, 2024
@josancamon19 josancamon19 moved this from Backlog to In progress in omi TODO Jul 3, 2024
@josancamon19 josancamon19 self-assigned this Jul 3, 2024
@josancamon19 josancamon19 changed the title Finish local VAD integration preprocessing audio before being sent to deepgram/server ($300) Finish local VAD integration preprocessing audio before being sent to deepgram/server Jul 3, 2024
@josancamon19
Contributor Author

First attempt, 3 weeks ago: about 3 hours.
Failed miserably with Silero VAD again, after another 3 hours.

@josancamon19 josancamon19 moved this from In progress to Backlog in omi TODO Jul 8, 2024
@kodjima33 kodjima33 removed the status in omi TODO Jul 8, 2024
@josancamon19 josancamon19 moved this to Backlog in omi TODO Jul 27, 2024
@josancamon19
Contributor Author

We will do this in the backend; I will expand on the details later.

@github-project-automation github-project-automation bot moved this from Backlog to Done in omi TODO Jul 27, 2024