Update speech lunch #238

Open: wants to merge 2 commits into base branch `source`.
6 changes: 6 additions & 0 deletions _pages/speech-lunch.md
@@ -28,6 +28,12 @@ Please contact Yifan Peng ([email protected]) and Shinji Watanabe (shinjiw
<iframe src="https://docs.google.com/spreadsheets/d/e/2PACX-1vRQJKbd_caVWoWstQ4W93XP9jikGDp6ablHQQJoV4iIxV7kVuDfj7F9zz8VBvDG6Crbh8jLjadBd6GN/pubhtml?widget=true&amp;headers=false" width="100%" height="600"></iframe>

## Previous Talks
- November 21, 2024
  - Title: Generalizing Audio Deepfake Detection
  - Speaker: You Zhang (University of Rochester)
  - Abstract: The rapid evolution of AI-driven speech generation has made it increasingly difficult to distinguish between authentic and deepfake audio, enabling potential misuse in criminal activities. This highlights the pressing need for robust audio deepfake detection (ADD) systems capable of effectively mitigating these threats. For reliable performance, ADD systems must generalize well to emerging and unknown deepfake techniques, remain robust to variations in speech attributes (e.g., speaker identity, channel, codec), and integrate seamlessly with other biometric tools. In this presentation, we introduce SAMO, a novel multicenter one-class learning training strategy tailored for ADD. SAMO addresses the distribution mismatches between training and evaluation data while accounting for speaker diversity. We will also discuss ongoing efforts to extend speech deepfake detection to singing voice deepfakes and to expand from audio-only (uni-modal) to audio-visual (multi-modal) detection to combat video deepfakes. Another initiative explores audio watermarking, a proactive technique that embeds generative algorithm identifiers into audio or into the generative model itself, enabling authorized entities to trace the origins of deepfake speech.
  - Bio: You (Neil) Zhang is a PhD candidate in the Audio Information Research (AIR) Lab at the University of Rochester, working with Prof. Zhiyao Duan. His research focuses on applied machine learning in speech and audio processing, including audio deepfake detection, spatial audio rendering, and audio-visual analysis. His work has been recognized by the Rising Star Program in Signal Processing at ICASSP 2023, the NIJ Graduate Research Fellowship Program, and the IEEE SPS Scholarship.

- November 14, 2024
  - Title: Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models
  - Speaker: Li-Wei Chen (CMU)