ADS_2021

Portfolio for applied data science minor

Name: Björn Appehl
Student ID: 21087024
Group: Team Dialogue

Introduction

This is the reader's guide for my portfolio, created for the Applied Data Science minor at The Hague University of Applied Sciences.
Below are the criteria from the scoring matrix, with detailed information about each criterion.
The reader's guide is numbered for easier reading.
A big thank you to the rest of Team Dialogue, and thanks to Jeroen, Tony, Ruud and Hani for all your advice in the past six months.

1. Reflection and evaluation

In this chapter I evaluate the project and give context to my evaluation through the STARR method. The chapter is divided into three parts: self-reflection, reflection on learning goals, and group reflection.

1.1 Reflection on own contribution to the project
  • Situation: Our project group consisted of 6 members, and we worked with audio data to detect conversation for the Smart Teddy Bear project. We all worked together to ensure everyone would get hands-on experience with every aspect of the project, although this was hard to realize, and in the end some work was unevenly distributed. Since I don't have a great deal of experience writing code, I was a little out of the loop at the end of the project when the code for our CNNs became more and more complex. However, at that point I took on other duties which helped the group as a whole but did not give me as much programming experience as some others.

  • Task: My tasks in the group varied. Early on there was a lot of hands-on work coding simple models. One example is creating a simple neural network (see section 3.1) that ended up being the first real algorithm the group used, since it had the best results at that stage. David and I worked on that iteration of our prototype. Later on I started exploring different datasets and drew up requirements and comparisons for the datasets we ended up using. As the group's priorities shifted, I found myself taking on a lot of presentations and other communication duties along with writing the paper, since we had other people who were simply better at crunching code and it became a matter of time in the final stages. I also helped David & Maria, who gave the learning lab, with feedback and suggestions for topics to cover; however, I didn't end up taking part in presenting our learning lab.

  • Activities: The first model I created in the minor was a logistic regression model based on transcripts from a TV show. The model's purpose was to estimate which line was most likely said by which speaker. On top of this, I also spliced audio, normalized sound levels and transformed our datasets to make them more difficult. I helped streamline our image data pipeline; unfortunately I finished it right when we shifted to using numerical arrays instead of image data, so in the end it was not necessary. These are only some examples of what I did, and you can read more about all of it below.

  • Result: For the presentations I was a part of, I created a lot of the slides along with the overall layout of the presentations. During the period in which I was Scrum master, I took care of our task board single-handedly. The code I wrote early on was a simple logistic regression model that was later converted to take audio data (in the form of NumPy arrays) as input; however, at that point the model also had to change, since the RFC model had better accuracy. My work on the datasets helped us get good data quite early in the project, which I see as a great benefit for our neural networks.

  • Reflect: The contributions I made to the project gave me a much better understanding of data science as a whole. I have a strong feeling that the techniques and methods used in this minor will be of use to me in a professional setting. I regret not being a bigger part of the learning lab our group gave, since it would have been a good chance to expand my own knowledge in the domain.

1.2 Reflection on own learning objectives
  • Situation: Since I am studying Business Process Development & Informatics at my home university, which covers a lot of theoretical ICT usage, I wanted to try something more hands-on for my exchange. This is part of why I chose the ADS minor, but also because I have always been interested in understanding how algorithms, machine learning and neural networks work. In order to put myself in a position where I could learn as much as possible, I did not want to choose a field where I was already well acquainted with the contents. In the group project, we also had to learn a lot about audio data processing. This was very unexpected, but I'm glad it happened since I now have a much better understanding of audio data processing.

  • Task: As a group, we had to figure out a way to use audio data in such a way that we could make predictions on the number of speakers and the duration of speech. For the majority of the project, we did not use predetermined roles in our development cycle. Some of my important tasks included data cleaning, data transformation, coding the neural networks, giving presentations, and working on our paper. All these tasks helped me understand more about data science as a whole.

  • Action: I created machine learning models, such as a CNN and a linear regression model, to explore and get a better understanding of the data science domain. We all worked with the algorithms, and besides those, I also spent time looking for and creating datasets for our group to use.

  • Result: I ended up getting a very deep understanding of data science during this semester, more so than I thought I would. Our algorithm performed well, and I think this is in part due to the whole group learning from each other and working at a good pace with little downtime during our productive hours. Working with data science techniques was extremely interesting to me, and I consider my learning goals fulfilled.

  • Reflection: All the tasks I completed helped me understand more about data science as a whole. I now consider myself a lot more educated when it comes to data science in general, and my personal goals were achieved. I think the workflow in our group exceeded expectations, and I am very happy with how the group turned out. One thing I would have improved upon is staying even more on top of the coding work, since I fell behind a little bit right at the end, due to other group members keeping a very high tempo.

1.3 Evaluation on the group project as a whole
  • Situation: Right from the start, our group contained a lot of different skill sets, and this showed during our project. Some were better at writing code, while others had more experience working with Scrum or other beneficial traits. The cohesion was always quite high in my opinion, and there was never any conflict in the group. Early on, we made it clear what we expected from each other in terms of workload (i.e. not scheduling project work on weekends or after 5pm), punctuality, etc., which helped us work more effectively and better as a group.

  • Task: For the duration of the entire project, the workload of all members shifted depending on what stage the project was in. Despite this, some ended up doing a lot more coding than others, but everyone still partook in presentations and communication along with writing the paper. While everyone did get a little bit of experience in all areas, the workload could have been distributed differently to avoid this. However, we wanted to avoid a set schedule of responsibilities in order to not have a member doing something they would rather not do. An unmotivated member working on a task just because it has been assigned to them is not always optimal; we instead focused on everyone doing what they wanted to do based on the workload at the time.

  • Activities: Our application of Scrum consisted of daily online standups, which we held almost every weekday of the minor unless agreed otherwise. We still had physical standup meetings on days when we all gathered to work on campus. These 'working days' on campus became quite central in our work, and 2-3 days every week were spent on campus.

  • Results: I, and I believe all other group members, are happy with the results we achieved. Not only are we happy with the algorithm, which gives great results as far as we can see, we also achieved those results while working at a sustainable and reasonable pace with little conflict or unnecessary stress. My knowledge of statistics has also increased after taking this minor.

  • Reflection: I'm sure none of our group members are finishing this minor without having learned something. The distribution of workload was, in retrospect, a very good way to make sure no one was understimulated or had too much to do. While it took a few weeks to get this running smoothly, mostly due to all members getting to know each other and their skill sets, it ended up being very beneficial for us. If I were to do this project again, I would happily work with the same group in the same manner as we did. The working days on campus were, in my opinion, a big factor in our project's success and helped us work better together and make social connections.

2. Research Project

In this chapter I go over the work that relates to the research we did, and how we performed it. It contains information about the results and early stages of our research paper, information about how Scrum was applied to this project, and how we planned the project.

2.1 Task Definition

My contribution: I gave feedback and discussed with the group members (David & Maria, who had created the initial draft) which research questions we should keep and which we should move forward with. Here is a link to a very early draft of our paper with the questions still in there; this was used for reference later on in the project: https://drive.google.com/file/d/1tm8MRCr17ix6i32tT9nXcVKYS6k9HhKh/view?usp=sharing

I was mostly working on our datasets when the first drafts of the research paper were created, so as soon as I finished work on the dataset I helped out with the questions. Below are some examples of my questions that made it, and those that did not (along with our reasoning):

  • How can we detect multiple voices from audio data?
    This question was central in the project, since the context for our project consists of defining when conversation is happening. Detecting multiple voices makes the difference between a monologue and an actual conversation and is very important for the end result. The solution to this question came to be a separate model which uses MFCC data to try and compare speech from different speakers, to identify whether a speaker is the same or different.

  • Which characteristics make a conversation?
    This question was discussed a lot internally. We ultimately decided that the answer to this question is not up to us, and that our problem owner along with "dementia coaches" / healthcare staff can specify what exactly a conversation is in the context of dementia patients. We had to discuss this with the problem owner several times, as we didn't want to make assumptions ourselves, but ultimately we could not identify these characteristics - we also ended up not needing to, since we removed it from our scope.

  • Can we detect whether someone is 1. speaking to themselves or 2. speaking to a non-audible third party (e.g. over the phone)? This question was considered to be out of scope. There are probably easier ways to determine when an elderly person is using their phone than only listening to them speak, and we wanted to focus specifically on conversation in a physical setting. That said, assuming the incoming audio from the phone is picked up by the microphone, there is a good chance it can be detected but not processed properly, since 'live' human speech covers a different frequency range than speech played from a speaker. But we did not have enough datasets to try this in practice.

  • Can we detect if the person speaking is physically present?
    This question relates to the question above, and was ultimately also decided to be out of scope. It did come up for discussion more than a few times. Essentially, a voice being played from a speaker will most likely not have the same frequency range as a human speaking. This makes it possible - in theory, we never got far enough to actually work on it - to determine when a voice is "fake" or "real". This is a suitable area for further research in my opinion, since we never had time to try it out the results would be very interesting.

2.2 Evaluation

My contribution: For the paper, I gave some ideas for future work with our prototype. I put these in the paper so other group members could also add ideas and build off mine if they agreed. Here is an early draft of our paper where my first ideas for future work are listed on page 5: https://drive.google.com/file/d/1_IV_NqUBWdRstXnUaXdXFCy66YA4UpWQ/view?usp=sharing

A few of them include:

  • Comparing the accuracy of our speaker differentiation model with human results. This could be done with a study where participants listen to short clips of speech and assess whether all clips are said by the same speaker or not. It would be very interesting to see whether humans or the model perform better when voices are very similar, for instance. Since our research has only measured the accuracy of our model, a baseline 'human' accuracy score would be an interesting metric to consider.

  • Since the model for speaker differentiation we used came to be quite complex, it would be interesting to see new projects aim to identify the patient's voice as a profile to compare other voices against. Samples might be collected over a period of time and eventually used to compare all detected speech against the patient themselves, instead of always comparing every segment with all voices therein. This might result in higher accuracy for determining whether or not it is the patient who is speaking.

  • It would have been interesting to see how the algorithm performs on conversations being played from a speaker, such as a TV or cellphone. Since sound played from a speaker generally has a smaller frequency range than speech from a human being (as discussed in 2.1), it might decrease the performance of the model. However, a model trained to detect these 'artificial' sounds might be very useful. Such a model could, for instance, identify when a dementia patient is talking to their TV (a common sign of dementia progressing, according to one of our sources).

2.3 Conclusions

My conclusion from this project is that it is indeed possible to use data science techniques (in our case, convolutional neural networks) to detect conversation to some degree. By converting audio data to MFCCs and feeding them through two separate neural networks, we can determine whether a conversation is happening with 89% accuracy for detecting speech and 94% accuracy for detecting changes in speaker. The format of the input data can impact the results a lot, which is why we ended up not using images for our final version. With the final product that combines the first and second model, I would say we have results that support our research problem, "How can data science techniques detect if there is a conversation between at least two people by analyzing audio files?", and can now state that by using CNNs, the MFCC data format, and measures of speaker activity and speech duration, data science techniques can detect conversation.
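To make the combination of the two models concrete, here is a minimal Python sketch of how their per-window outputs could be merged into a single conversation decision. The thresholds, window length and helper name are hypothetical and not taken from our actual prototype:

```python
import numpy as np

# Hypothetical helper: combine per-half-second predictions from the two models.
# speech_pred: 1 if speech was detected in that half-second window, else 0.
# speaker_change_pred: 1 if a change of speaker was detected, else 0.
def is_conversation(speech_pred, speaker_change_pred,
                    min_speech_seconds=10.0, window_seconds=0.5):
    """Rough heuristic: enough total speech AND at least one speaker change."""
    speech_pred = np.asarray(speech_pred)
    speaker_change_pred = np.asarray(speaker_change_pred)
    total_speech = speech_pred.sum() * window_seconds
    has_multiple_speakers = speaker_change_pred.any()
    return bool(total_speech >= min_speech_seconds and has_multiple_speakers)

# Example: 60 windows (30 seconds), speech in most windows, one speaker change.
speech = [1] * 40 + [0] * 20
changes = [0] * 30 + [1] + [0] * 29
print(is_conversation(speech, changes))  # True
```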
2.4 Planning

My contribution: Leander Loomans and I were in charge of documentation. This included taking notes whenever important information was received from teachers or the problem owner (or in internal meetings). It also included making documents (internal and external, such as found papers) available to other group members. On top of this, every group member was equally involved in updating the Scrum board on Taiga and making sure it was up to date.

A screenshot of some of the notes that were taken:

Image of Notes Taken


In our group, efficiency was quite important, which led to us having daily standup meetings so we could all keep up with the progress being made. This worked very well, and I attended these meetings to increase our group cohesion and keep the others informed about my part in the project. All in all, Scrum as a method worked very well for us in the context in which we applied it, and all group members were in charge of project planning to some degree, as the role of Scrum master rotated in our group. In my eyes, everyone did this job very well, and tasks were evenly spread out among everyone. Also as part of Scrum, we held sprint retrospectives when finishing sprints. These really helped us develop as a group, as we evaluated factors like communication and workload distribution in order to improve. This gave good results, and after only 2-3 retrospectives our communication had improved a lot, a big benefit early on in the project.

A screenshot from Taiga with almost everyone's activity (It's difficult to get a really descriptive image):

Image of Taiga Activity


Maria Hoendermis, one of our group members, was very helpful in the planning process and in communicating to clear obstacles for the group.

3. Predictive Analysis

In this chapter, my contributions to model selection, configuration, training and evaluation are shown. This is an area I wish I had spent more time in, since I enjoyed it very much and found it interesting. However, the project progressed faster than my understanding of the syntax we used.

3.1 Selecting a Model

The decision to use convolutional neural networks was taken early on, and it was very much a group decision. In order to get the best results, we argued that spending a lot of time 'perfecting' one method (we decided on CNNs) would lead to better results than spending the same amount of time trying out different models. Also, following the no free lunch theorem, there is no single 'optimal solution' we had to look for, since no type of model performs best in every situation; it was up to us to create something good enough for the project. We looked at literature, such as Ashar, Bhatti and Mushtaq (2020), who use CNNs specifically in combination with MFCCs. This also meant that, since CNNs are able to learn features from the data, we did not have to do much feature extraction/selection. In retrospect, I think the decision to use CNNs was the right one, but exploring other models would have been very interesting, too.
3.2 Configuring a Model

Early on, I managed to train and get results from a neural network. The code can be found here, and contains all steps (including training and evaluation on test data):
https://github.com/Digitalswede/ADS_2021/blob/main/codesamples/early_neural_network.ipynb
My contribution: The source code was found online and modified by me to fit our project and data; some of the values were changed in accordance with feedback from Jeroen to get things working. As you can see, it is an old version, since it uses images as input data.
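As an illustration of what such a configuration looks like, here is a minimal PyTorch sketch of a small CNN that takes single-channel image input. The layer sizes and input shape are assumptions made for the example, not the exact values from the notebook:

```python
import torch
import torch.nn as nn

# Minimal sketch of a small CNN for single-channel spectrogram-style images,
# assuming 64x64 inputs and binary output (speech / no speech).
class SimpleAudioCNN(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SimpleAudioCNN()
dummy = torch.randn(8, 1, 64, 64)   # batch of 8 fake "images"
print(model(dummy).shape)           # torch.Size([8, 2])
```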

I also configured a simple logistic regression model early in the course as a first test of machine learning models, using one of the example notebooks provided as the foundation. This file is available here: https://github.com/Digitalswede/ADS_2021/blob/main/codesamples/dialogueprediction.ipynb
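For context, here is a minimal sketch of that kind of model: predicting a speaker from a transcript line with scikit-learn. The lines and speakers below are made up for the example and are not from the actual dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative setup (made-up lines, not the real transcript data).
lines = ["How you doin'?", "We were on a break!", "Could I BE any more tired?",
         "I got off the plane."]
speakers = ["Joey", "Ross", "Chandler", "Rachel"]

# Turn each line into TF-IDF features, then fit a logistic regression classifier.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(lines, speakers)
print(clf.predict(["I'm going to the airport, I got off the plane."]))
```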

3.3 Training a Model

The model I trained was the same as in 3.2; training happens specifically in code block [5]. Link:
https://github.com/Digitalswede/ADS_2021/blob/main/codesamples/early_neural_network.ipynb
When training, the model, data, loss function and optimizer are passed to the training function. model.train() switches the model into training mode before it iterates over the dataset. During training, the loss function and backpropagation (which is responsible for tuning the weights of the network over the epochs) are also invoked. The line print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]") prints the loss for each iteration.
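For reference, here is a minimal training-loop sketch in the style of the PyTorch quickstart loop the notebook is based on; `dataloader`, `model`, `loss_fn` and `optimizer` are assumed to already exist:

```python
import torch

# Sketch of the training step described above. The dataloader yields (X, y)
# batches; the loss function and optimizer are passed in from outside.
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()                          # switch to training mode
    for batch, (X, y) in enumerate(dataloader):
        pred = model(X)                    # forward pass
        loss = loss_fn(pred, y)            # compare prediction with label

        optimizer.zero_grad()
        loss.backward()                    # backpropagation tunes the weights
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
```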
3.4 Evaluating a Model

I unfortunately missed out on evaluating our models, as the models we used evolved rapidly and our results were constantly changing as the work went on. The final model we used was evaluated with a confusion matrix. The confusion matrix indicates good results, and we are happy with the performance as evaluated here. The confusion matrix can be found here:
Image of Confusion Matrix

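As an illustration of this type of evaluation, here is a minimal scikit-learn sketch for producing a confusion matrix; the labels and predictions below are made up, not the final model's output:

```python
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# Illustrative only: y_true / y_pred stand in for the test labels and the
# model's predictions (1 = speech, 0 = no speech).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)
ConfusionMatrixDisplay(cm, display_labels=["no speech", "speech"]).plot()
plt.show()
```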

3.5 Visualising the Outcome of a Model

In terms of visual outcomes, we made some good progress towards the end of the project. Until that point, there wasn't much reason to visualize the outcomes, since they were highly likely to change. The most prominent visual outcomes of our models are the confusion matrix described in 3.4, along with some handcrafted visualisations that represent the speech detection from the first model. In the visualization that we used, the value is 1 when a voice is detected and 0 when a voice is not detected. This can look like this:
Image of Voice Detection Visulisation

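To show what such a visualisation looks like, here is a minimal matplotlib sketch that plots made-up half-second predictions as a 0/1 timeline (not the actual model output):

```python
import numpy as np
import matplotlib.pyplot as plt

# 1 where a voice was detected in a half-second window, 0 where it was not.
window = 0.5                                   # seconds per prediction
preds = np.array([0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0])
time = np.arange(len(preds)) * window

plt.step(time, preds, where="post")
plt.yticks([0, 1], ["no voice", "voice"])
plt.xlabel("time (s)")
plt.title("Voice detection per half-second window")
plt.show()
```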

4. Domain Knowledge

This chapter contains information about the domain in which we worked. It mostly relates to audio signal processing from a machine learning perspective. I found this domain very interesting, as I had no previous experience working with audio or with applied machine learning, and both turned out to be very intricate. The domain is mostly explained from our project group's perspective.

4.1 Introduction to the Subject Field

As we worked on the Smart Teddy project together with our problem owner Hani, our subject field came to be audio signal processing. The end goal for us was an algorithm able to detect conversation based on sound files, to help monitor dementia patients. This meant we had to use recordings of audio as input to an algorithm in order to make predictions on the audio itself. To do this, the audio data is transformed into MFCC data, since MFCCs are good at representing many features useful in voice recognition. This process can be seen here, in block [5]: https://github.com/Digitalswede/ADS_2021/blob/main/codesamples/make%20npy%20array%20of%20audio.ipynb
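The core of that step is the WAV-to-MFCC conversion. Here is a minimal sketch with librosa; the filename is a placeholder, and the linked notebook additionally cuts the audio into half-second chunks:

```python
import librosa

# Load the audio at 44.1 kHz and compute 13 MFCCs per frame.
y, sr = librosa.load("example.wav", sr=44100)          # samples + sample rate
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)     # shape: (13, n_frames)
print(mfcc.shape)
```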

Sound data can also be represented with spectrograms and other image representations of sound (such as oscillograms/waveforms). However, we achieved the best results working with MFCCs. The sample rate of recordings is also an important factor to consider, since it is a measure of how many samples are recorded over a period of time. A high sample rate contains a lot of samples, but might be computationally expensive or contain unnecessarily many samples. A low sample rate involves some information loss, but can be faster to process.

All of these techniques mentioned above were relevant in our Dialogue project, which is a part of the bigger Smart Teddy Bear project. This is a very interesting domain, as it contains a lot of unique problems that are not observed when working with, for instance, numerical data. Not only did we need to create a functional algorithm to classify speech, we also needed to work with and get a deep understanding of audio data, and relate all of our work to the healthcare domain for dementia patients.

4.2 Literature Research

I found several pieces of relevant literature during this minor. One of the more interesting ones is Uddin *et al.* (2018). The topic of their study is ambient sensors for elderly care; it looks at results and data from other works and summarizes their findings. This helped us a lot, since it gave a good overview of other studies with the same end goal (determining quality of life based on household environment data). From studying this paper, it became apparent that using sound data for the purpose of recognizing daily activity is not as common as some other methods, such as video or infrared sensors. Through Uddin *et al.* (2018), I found other interesting studies, such as Vacher *et al.* (2011), a study with some similarities to ours, in that they also process audio data in a household setting for assisted care purposes. Their study mainly relies on audio technologies in smart homes. However, it does not relate to dementia patients, only to the elderly to some degree. This was used to establish some of the background in the paper and give perspective to our research; for the technical details (such as the model architecture of a CNN tuned for voice detection) we had to look to other papers, such as Salehghaffari (2018). In that paper (among others), we found inspiration for parameters like learning rates and epochs for our CNNs.
4.3 Explanation of Terminology, Jargon and Definitions

Below follows an explanation of the terms and definitions that are viewed as important:

  • MFC: Mel-Frequency Cepstrum, an aggregation of several MFCCs (coefficients).
  • MFCC: A coefficient of the MFC, meaning one MFC is made up of many MFCCs. MFCCs are a way of representing features from audio data and are heavily related to feature extraction.
  • Epoch : An iteration over the entire dataset during the training process for a neural network.
  • Learning Rate : The rate at which a neural network adapts to the data. A learning rate that is too large may oscillate and "jump over" the optimal solution, which might mean the model never reaches a good result, while a learning rate that is too small might take very long to train, as the "jumps" it makes are very small.
  • Dataset : A set of data that can be split into train, test and validation parts. Datasets generally consist of negative data (data that is not correct, in our case non-speech) and some positive data (in our case speech). Negative and positive data should generally be balanced to avoid algorithms being biased towards one or the other.
  • Overfitting : Overfitting might occur when a model is trained on a limited data set, and only predicts in accordance with training data instead of adapting to validation or other 'non-training' data.
  • Spectrogram : A visualisation of audio data which highlights changes to sound over time. A spectrogram is generated from a collection of Fourier Transforms, thus creating a more detailed representation of the data.
  • (Machine learning) model: A program that is trained to detect certain patterns in data.
  • Confusion Matrix: A form of model evaluation in which the number of false negatives, false positives and correct estimations is displayed.
  • Sample Rate: An attribute of audio describing the number of samples over a period of time. A high sample rate is generally good, but might be more computationally expensive, while a low sample rate generally means fewer samples over time, but can be easier to process.
  • Loss function: A function that is able to determine how the performance of a model relates to the 'true values' of a dataset used.
  • Neural Network: A type of algorithm that works by using layers containing nodes (also called neurons) that receive and pass on weighted data in order to make predictions on datasets. Needs to be properly trained in order to work.
  • Outliers: Data points that differ a lot from other data in the set.
  • Regression: A method of estimating how a target variable relates to chosen features. The classic example is how the sales prices of houses are related to their size in square footage.

5. Data Preprocessing

Data preprocessing is a very central part of data science, and this chapter discusses my experience with it. Since data preprocessing also carries over to areas other than data science, for instance when working with databases, I think it's a very valuable skill to have in ICT, and I had positive experiences with it.

5.1 Data Exploration

In order to familiarize myself with the data we were using, I had to inspect it to be able to work with it as well as possible. One instance of data exploration I did is in this notebook: https://github.com/Digitalswede/ADS_2021/blob/main/codesamples/wav%20data%20filter%2Bexploration.ipynb. Here, I started experimenting with attributes from the data (such as sample rates) while also looking at the labels for our data, making sure the labels line up with the speech. It helped me learn about the format of our data and what it can be used for. We also based our half-second increments on information retrieved during this exploration.
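As an illustration of this kind of inspection, here is a minimal sketch that reads basic attributes from a WAV file with Python's standard library; the filename is a placeholder:

```python
import wave
import contextlib

# Check the sample rate, channel count and duration of a recording.
with contextlib.closing(wave.open("example.wav", "rb")) as wav:
    frames = wav.getnframes()
    rate = wav.getframerate()
    print(f"sample rate: {rate} Hz, channels: {wav.getnchannels()}, "
          f"duration: {frames / rate:.1f} s")
```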

I also explored the data by looking at it in the software Audacity. Using this software to visualise the amplitude of the audio files helped us select data that was well suited to our purposes (detecting speech). I was primarily looking for data that was neither too loud nor too quiet, as not balancing this correctly might mean our algorithm performs poorly (for example, by training a voice detection model on loud speech only).

5.2 Data Cleansing

Some of the data cleansing I did can be found in this notebook, specifically in block [6] (but also in the blocks leading up to it): https://github.com/Digitalswede/ADS_2021/blob/main/codesamples/dataset%20incl%20neg%20data.ipynb Here, I filter out specific columns (the ones that will be of use to us) from the 'negativedf' dataframe (this dataframe contains the labels for all negative samples). Afterwards, I concatenate this dataframe with our positive data labels, resulting in a cleaned-up version of the negative labels being concatenated to the positive labels.
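In outline, that step looks like the sketch below; the column names and values are hypothetical stand-ins for the actual label files:

```python
import pandas as pd

# Hypothetical label dataframes: negative (non-speech) and positive (speech).
negativedf = pd.DataFrame({"start": [0.0, 3.5], "end": [0.5, 4.0],
                           "label": ["NO_SPEECH", "NO_SPEECH"], "extra": ["x", "y"]})
positivedf = pd.DataFrame({"start": [1.0], "end": [1.5], "label": ["CLEAN_SPEECH"]})

# Keep only the columns we need from the negative labels, then concatenate.
negativedf = negativedf[["start", "end", "label"]]
labels = pd.concat([positivedf, negativedf], ignore_index=True)
print(labels)
```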

I also did some data transformation by overlaying background noises on top of speech. The file I created through this transformation process came to be used a lot and was referred to (internally) as the 'difficult' dataset, which we ran through the first model to evaluate its tolerance to speech with overlapping noise. This step was taken again at the end of the project, but then I also amplified the overlaid background noises by 20 dB, making the dataset even harder for the algorithm. We used this file for evaluation right at the end of the project, and the accuracy of the algorithm (the speech detection model) was reduced by around 5%.
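Conceptually, the overlay works like the sketch below; the signals are randomly generated stand-ins and the helper is illustrative, not the code we actually used:

```python
import numpy as np

# Mix a background-noise signal into a speech signal, optionally boosting the
# noise by a number of decibels before mixing.
def overlay(speech: np.ndarray, noise: np.ndarray, noise_gain_db: float = 0.0):
    noise = noise[: len(speech)] * (10 ** (noise_gain_db / 20))  # dB -> amplitude
    mixed = speech + noise
    return np.clip(mixed, -1.0, 1.0)       # keep samples in a valid float range

speech = np.random.uniform(-0.5, 0.5, 44100)   # 1 s of fake "speech" at 44.1 kHz
noise = np.random.uniform(-0.1, 0.1, 44100)
harder = overlay(speech, noise, noise_gain_db=20)
```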

5.3 Data Preparation

While the project was still using images as input data, I created a dataloader to standardize the data preparation process for the group. Unfortunately this tool never really came to be used since, shortly after I finished it, we stopped using images as our input. Some of my work on data preparation can be found in this notebook: https://github.com/Digitalswede/ADS_2021/blob/main/codesamples/Standardized%20Image%20Generator.ipynb

After the dataloader for images was scrapped due to new requirements, Leander and I created a new version, which was used for the remainder of the project. It can be found here: https://github.com/Digitalswede/ADS_2021/blob/main/codesamples/make%20npy%20array%20of%20audio.ipynb Leander and I made equal contributions to the file.
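The idea behind that version is sketched below: cut a recording into half-second chunks, convert each chunk to MFCCs and save the result as a NumPy array. Filenames and MFCC settings are illustrative, not the exact values from the notebook:

```python
import numpy as np
import librosa

# Cut the audio into half-second chunks and compute MFCCs per chunk.
y, sr = librosa.load("example.wav", sr=44100)
chunk = int(0.5 * sr)                                   # samples per half second
segments = [y[i:i + chunk] for i in range(0, len(y) - chunk + 1, chunk)]
mfccs = np.stack([librosa.feature.mfcc(y=s, sr=sr, n_mfcc=13) for s in segments])
np.save("audio_mfccs.npy", mfccs)
print(mfccs.shape)                                      # (n_chunks, 13, n_frames)
```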

Luckily we did not seem to be impacted by outliers or missing values in our data; our results were high enough without accounting for them. As we created the datasets ourselves, we were confident that the data was consistent and uniform. Since we were working with audio data, though, this was hard to prove. We also did not find many useful strategies for managing outliers in audio, if they even existed in our dataset in the first place.

5.4 Data Explanation

We used multiple datasets and had to combine them ourselves, since our problem owner did not provide data. An important factor for our datasets came to be labeling, which we all spent a lot of time working on. Since the data was not categorical, and we could not label it ourselves in a reliable (or convenient) fashion, all our datasets had to be labeled to describe which parts of the audio contained speech. For the speaker differentiation model, this was even more important, as the speakers now had to be labeled too. Luckily we managed to find good, suitable candidates.
  • AVA-Speech is one of the datasets we used for speech detection. It contains around 45 hours of dialogue from movies, which means it also contains some overlaid background noises. However, the speech is labeled, and by using these labels it is possible to select only "CLEAN_SPEECH", which is speech without overlaid noise. We decided to also use the other labels, to train the algorithm and increase its tolerance. We made sure to balance our dataset to include a 1:1 ratio of true and false data, which was achieved by mixing in data with negative labels. We used 5000 seconds of "SPEECH_WITH_MUSIC", 5000 seconds of "SPEECH_WITH_NOISE", and 5000 seconds of "CLEAN_SPEECH". We combined this with 15000 seconds of "NO_SPEECH", providing us with a total of 30000 seconds of mixed audio data where half is true and half is false. This dataset is recorded at 44.1 kHz.

  • Librispeech, a dataset containing speaker-labeled audiobook data, came to be very useful in the project. Since it does not contain (noticeable) noise, this dataset was primarily used for speaker differentiation. This was convenient, as all speakers in the dataset are labeled. Since this dataset was recorded at 16 kHz, we upsampled it to match the other datasets at 44.1 kHz (see the resampling sketch after this list).

  • CHIME-Home was used for some negative samples, as it partly contains non-speech audio. This dataset was also at a different sampling rate and had to be upsampled, as we did not want our data integrity compromised by irregular data.
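A minimal sketch of that resampling step with librosa (the filename is a placeholder):

```python
import librosa

# LibriSpeech audio comes in at 16 kHz; resample it to 44.1 kHz to match the
# other datasets.
y, sr = librosa.load("example_16k.wav", sr=16000)
y_441 = librosa.resample(y, orig_sr=16000, target_sr=44100)
print(len(y), len(y_441))
```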

5.5 Data Visualisation (Exploratory)

I compared visual representations of the data in order to explore the amplitude of certain segments, to decide which segment we should use to train our algorithm. The source data file was too big and would have been very slow to process, so having a visual representation helped us create a smaller but representative version of the dataset. In this instance, the software Audacity was used to visually represent the data while still being able to listen to the audio, for quality reasons (such as spikes in amplitude that may be loud speech, or just a glitch/unintentional sound from the recording). I think there were limited opportunities for us to visualize our data, since we worked specifically on identifying speech.

Visualising our data didn't seem to have a lot of value, since we were working to identify speech/conversation in audio files. The waveform for speech looks like any other waveform (or spectrogram, MFCC, etc.) to me, as humans can't easily interpret this type of data from a visualisation.

6. Communication

In this section, my participation in presentations and our paper is highlighted. This is one area where I feel I contributed a lot, and I hope the portfolio reflects that. Since we did not use a formal schedule for who was responsible for presentations or for the workload distribution in our paper, it's unfortunately hard to provide 'proof' of it.

6.1 Presentations

The presentations in which I took part are the following:

External Presentation:

Internal Presentation:

6.2 Writing paper

I helped write the paper as much as possible. Before the writing started, I gave a detailed overview of our subquestions and answered them, which helped form the base for our paper. Of course the structure has changed a lot since then, but it was a start. I worked a lot on the introduction, including the background and research problem of the paper. I also wrote content in other sections, but my primary focus (since we divided it up) was the introduction. The introduction was the first section of the paper I started writing, and I later started helping out on other sections. I hope the introduction properly showcases our domain and the purpose of our work.

I also helped other group members writing the paper by giving constructive feedback, always being mindful of other people's work and not being overly critical. I ended up making quite a few corrections to most sections of the paper, an effort that I hope changed our paper for the better, since I feel it's important to deliver a strong paper.

It is difficult to give examples here, since writing the paper was a continuous process and quite hard to measure in terms of contribution. In my eyes, all group members pulled their weight in this department.

Here is a link to our finished paper: https://drive.google.com/file/d/1tm8MRCr17ix6i32tT9nXcVKYS6k9HhKh/view?usp=sharing

Datacamp

All assigned DataCamp courses except for one were completed. The one that was not completed was **Joining Data with Pandas**, since it seemed to have the least in common with our project work.
Outside of the assigned DataCamp courses, roughly 1/4 of another course, **Spoken Language Processing in Python**, was also completed.
Here is a link to a screenshot with the completed courses:
Image of Datacamp Completion


References

A. Ashar, M. S. Bhatti, and U. Mushtaq (2020). Speaker identification using a hybrid CNN-MFCC approach. In 2020 International Conference on Emerging Trends in Smart Technologies (ICETST), pp. 1–4. doi: 10.1109/ICETST49965.2020.9080730

M. Vacher, D. Istrate, F. Portet, T. Joubert, and T. Chevalier (2011). The Sweet-Home project: audio technology in smart homes to improve well-being and reliance. Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2011, pp. 5291–5294. doi: 10.1109/IEMBS.2011.6091309. PMID: 22255532

Z. Uddin, W. Khaksar, and J. Torresen (2018). Ambient Sensors for Elderly Care and Independent Living: A Survey. Sensors 18, no. 7: 2027. https://doi.org/10.3390/s18072027

H. Salehghaffari (2018). Speaker verification using convolutional neural networks. arXiv preprint arXiv:1803.05427
