WhaleClassification

Project whose goal is the automatic classification of whales and dolphins from recordings of their underwater sounds. Toothed whales produce stereotyped echolocation clicks, and baleen whales produce sounds that are sometimes called songs. Examples of whale and dolphin sounds can be found at Voices in the Sea.

Notebooks

SETUP:

InitialSetup.ipynb: Install Git; Clone existing repo; Copy Data.

DATA ANALYSIS:

Data_Processing_Whales.ipynb: includes a detailed description of the echolocation click data fields and some sample plots of waveforms and spectra for both species. The two species generate different waveforms. peak2peak measures the difference between the maximum and the minimum of the waveform. Spectra are computed from the waveforms using the FFT. Applying PCA to the spectra, the top 5 eigenvectors explain ~85% of the variance. Still, the projections of the two species onto the top eigenvectors overlap, so it is not clear how the eigenvectors of the spectra can be used to distinguish the two species.
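
The quantities named above can be sketched with NumPy as follows. This is only an illustration on synthetic data, not the notebook's code: the function names, the array shapes, and the choice of a magnitude spectrum are assumptions.

```python
import numpy as np

def peak_to_peak(waveform):
    """Difference between the maximum and minimum sample of a waveform."""
    return float(np.max(waveform) - np.min(waveform))

def magnitude_spectrum(waveform):
    """Magnitude of the one-sided FFT of a real-valued waveform."""
    return np.abs(np.fft.rfft(waveform))

def pca(spectra, n_components):
    """PCA via SVD of the mean-centered spectra (one spectrum per row).

    Returns the top eigenvectors (as rows) and the fraction of the total
    variance explained by each of them.
    """
    centered = spectra - spectra.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variance = singular_values ** 2
    explained = variance / variance.sum()
    return vt[:n_components], explained[:n_components]

# Synthetic stand-in for click waveforms: 200 clicks of 64 samples each.
rng = np.random.default_rng(0)
waveforms = rng.normal(size=(200, 64))

p2p = np.array([peak_to_peak(w) for w in waveforms])
spectra = np.array([magnitude_spectrum(w) for w in waveforms])
eigenvectors, explained = pca(spectra, n_components=5)

# Coordinates of every click in the space of the top 5 eigenvectors.
projections = (spectra - spectra.mean(axis=0)) @ eigenvectors.T
```

On real clicks, `explained.sum()` is the number to compare with the ~85% figure; on the random data used here it will be much lower.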

XGBoost_Whales.ipynb: compares the relative importance of the features (the first 10 eigenvectors, the rmse of the spectra, and peak2peak) by fitting an XGBModel. The first 2-3 eigenvectors turn out to be the most important, usually assigned at least ~20% more weight. The ROC analysis suggests that abstaining on part of the region where the two species overlap gives better classification accuracy.
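
The effect of abstaining can be illustrated with a small sketch. The scores, labels, and thresholds below are made up for the example, and `accuracy_with_abstention` is a hypothetical helper, not a function from the notebook.

```python
def accuracy_with_abstention(scores, labels, low, high):
    """Predict 1 when score >= high, 0 when score <= low, and abstain in between.

    Returns (accuracy on the classified samples, fraction of samples classified).
    """
    decided = [(s, y) for s, y in zip(scores, labels) if s <= low or s >= high]
    if not decided:
        return float("nan"), 0.0
    correct = sum((s >= high) == (y == 1) for s, y in decided)
    return correct / len(decided), len(decided) / len(scores)

# Hypothetical classifier scores (probability of species 1) and true labels.
scores = [0.05, 0.20, 0.45, 0.48, 0.52, 0.55, 0.80, 0.95]
labels = [0, 0, 1, 0, 0, 1, 1, 1]

no_abstain = accuracy_with_abstention(scores, labels, low=0.5, high=0.5)
with_abstain = accuracy_with_abstention(scores, labels, low=0.3, high=0.7)
print(no_abstain)    # (0.75, 1.0)
print(with_abstain)  # (1.0, 0.5)
```

In this toy case accuracy rises from 0.75 to 1.0, but only half of the samples receive a label, so accuracy is best reported together with coverage.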

Q: What are the drawbacks of abstaining in the region where the two species overlap in order to achieve high prediction accuracy?

Training and Feature Extraction with Reassigned Labels - ICI Mode, Peak2Peak, RMSE, Eigen.ipynb: introduces the inter-click interval (ICI), the time difference between two consecutive clicks within a given bout. The two species differ clearly in the overall distribution of ICI. The distribution of the ICI mode separates the two species better than that of the ICI median.
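
As a rough sketch of these definitions, the ICIs of one bout and their mode and median could be computed as below. The click times are invented, times are taken in integer milliseconds, and the histogram-based mode with a 10 ms bin is an assumption rather than the notebook's method.

```python
from collections import Counter
from statistics import median

def inter_click_intervals(click_times_ms):
    """Time differences between consecutive clicks of one bout (milliseconds)."""
    ordered = sorted(click_times_ms)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]

def ici_mode(intervals_ms, bin_width_ms=10):
    """Center of the most populated histogram bin of the intervals.

    bin_width_ms is an illustrative choice, not a value from the notebooks.
    """
    bins = Counter(int(ici // bin_width_ms) for ici in intervals_ms)
    most_common_bin, _ = bins.most_common(1)[0]
    return (most_common_bin + 0.5) * bin_width_ms

# Hypothetical click times for a single bout, with one long pause.
click_times_ms = [0, 412, 827, 1230, 1645, 3100, 3514]
intervals = inter_click_intervals(click_times_ms)

print(intervals)            # [412, 415, 403, 415, 1455, 414]
print(ici_mode(intervals))  # 415.0
print(median(intervals))    # 414.5
```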

MODEL:

Training and Feature Extraction - ICI Mode, Peak2Peak, RMSE, Eigen.ipynb: designs a feature vector that includes 1) the PCA projection values of the spectra onto the first 5 eigenvectors, 2) the rmse of the spectra, 3) peak2peak, and 4) the ICI mode. The goal is to optimize the species prediction accuracy with the objective function:

Five classification models are applied to truly detected and correctly classified data samples.

| | Logistic Regression | SVM Model | Decision Tree | Random Forest | GB Trees |
|---|---|---|---|---|---|
| Training Accuracy | 0.8302 | 0.8303 | 0.8544 | 0.8447 | 0.8574 |
| Testing Accuracy | 0.8301 | 0.8301 | 0.8542 | 0.8445 | 0.8572 |
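
For reference, the four feature groups listed for this notebook could be concatenated per click roughly as follows. `click_features` and its arguments are hypothetical names, and taking "rmse of spectra" to mean the root-mean-square value of the spectrum is an assumption.

```python
import numpy as np

def click_features(spectrum, waveform, ici_mode, mean_spectrum, eigenvectors):
    """Concatenate the four feature groups into one vector for a single click.

    eigenvectors holds the top PCA eigenvectors of the spectra as rows;
    ici_mode is the ICI mode of the bout the click belongs to.
    """
    projection = eigenvectors @ (spectrum - mean_spectrum)  # 1) PCA projection values
    rmse = np.sqrt(np.mean(spectrum ** 2))                  # 2) assumed definition of rmse
    peak2peak = waveform.max() - waveform.min()             # 3) max minus min of the waveform
    return np.concatenate([projection, [rmse, peak2peak, ici_mode]])  # 4) ICI mode last

# Toy inputs: a 33-bin spectrum and 5 placeholder "eigenvectors".
spectrum = np.arange(33, dtype=float)
mean_spectrum = np.zeros(33)
eigenvectors = np.eye(5, 33)
waveform = np.array([-1.0, 0.5, 2.0])

features = click_features(spectrum, waveform, 0.4, mean_spectrum, eigenvectors)
print(features.shape)  # (8,)
```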

Training and Feature Extraction-ICI median.ipynb: uses the same features, except with the ICI median instead of the ICI mode. The classification models perform similarly in terms of prediction accuracy.

Training and Feature Extraction with Reassigned Labels - ICI Mode, Peak2Peak, RMSE, Eigen.ipynb: relabels the predicted species according to the ratio for each bout, i.e. all clicks in a given bout are relabeled to the same species. It uses the same features as the first notebook, but with the relabeled predictions. The accuracy of the general classification model rises to ~90%.
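
One plausible reading of the per-bout relabeling is a majority vote over the clicks of each bout. The sketch below assumes that rule; the notebook may use a different ratio threshold, and the bout ids and predictions are invented.

```python
from collections import Counter, defaultdict

def relabel_by_bout(bout_ids, predictions):
    """Give every click the most frequent predicted label of its bout.

    Ties are resolved in favor of the label seen first in the bout.
    """
    votes = defaultdict(Counter)
    for bout, label in zip(bout_ids, predictions):
        votes[bout][label] += 1
    majority = {bout: counts.most_common(1)[0][0] for bout, counts in votes.items()}
    return [majority[bout] for bout in bout_ids]

bout_ids = ["a", "a", "a", "a", "b", "b", "b"]
predictions = ["Cuvier", "Gervais", "Cuvier", "Cuvier", "Gervais", "Gervais", "Cuvier"]

relabeled = relabel_by_bout(bout_ids, predictions)
print(relabeled)
# ['Cuvier', 'Cuvier', 'Cuvier', 'Cuvier', 'Gervais', 'Gervais', 'Gervais']
```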

Data

Data is stored in two S3 buckets:

s3://gulf-whales: Contains underwater sound clips of echolocation clicks from two kinds of beaked whales (Cuvier's and Gervais') that were recorded in the Gulf of Mexico after the Deepwater Horizon oil spill. The goal is a binary classification that separates these two species.
- Two page description
s3://hdsi-whales: 4TB of sound data from the Pacific Ocean and 4TB of data from the Atlantic Ocean which were annotated for whale and dolphin sounds for the Marine Mammal Detection, Classification, Localization and Density Estimation Workshops (DCLDE) that were conducted in:
- 2015 7th International DCLDE which is based on marine mammal sounds in the Pacific
- 2018 8th International DCLDE which is based on marine mammal sounds in the Atlantic
- Listing of files is in hdsi-whales.ls

Kait Frasier's method

Kait Frasier recently published a new approach for automated classification of dolphin echolocation click types, based on unsupervised network-based classification, that has achieved excellent results in classifying a variety of species.

Proposed Project Steps

Preparatory Steps

Write a high-level summary of click descriptions including the following terms: click detector, click spectra, inter-click interval, click bout, peak-to-peak amplitude, and click time series.
Perform low-level signal processing, including click detection and definition of an amplitude threshold, spectral calculations, and principal component analysis (PCA).
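
As a starting point for the click detection step, a very simple amplitude-threshold detector might look like the sketch below. The threshold, the merging gap, and the signal are illustrative assumptions; a real detector would typically filter the signal first.

```python
def detect_clicks(samples, threshold, min_gap):
    """Return (start, end) sample indices of runs where |amplitude| >= threshold.

    Supra-threshold samples that are at most min_gap samples apart are merged
    into a single click.
    """
    clicks = []
    for i, x in enumerate(samples):
        if abs(x) < threshold:
            continue
        if clicks and i - clicks[-1][1] <= min_gap:
            clicks[-1][1] = i      # extend the current click
        else:
            clicks.append([i, i])  # start a new click
    return [tuple(c) for c in clicks]

# Toy signal with two bursts of high amplitude.
signal = [0.0, 0.1, 0.9, -0.8, 0.05, 0.7, 0.0, 0.0, 0.0, 0.0, -1.2, 0.6, 0.0]
clicks = detect_clicks(signal, threshold=0.5, min_gap=2)
print(clicks)  # [(2, 5), (10, 11)]
```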

Replication of Prior Results

Re-do the notebooks for the Gulf of Mexico beaked whale dataset.
Re-write Kait Frasier's code in Python.
Replicate the click type analysis found in Kait's paper (note that only a subset of all the Atlantic species were found in the Gulf of Mexico and vice versa).

Possible Direction for Novel Project Analysis

Use the annotated data sets for the DCLDE 2015 and 2018 to discover the full range of sounds present.
Explore alternative unsupervised learning methods.
Explore supervised learning methods such as boosting and random forests.

yoavfreund/BeakedWhaleClassification
