Audio Classification using Machine Learning and Python | iNNovationMerge






What?

  • Audio data generation and storage are increasing, driven in part by virtual assistants such as Alexa, Siri, and Google Home

  • Audio consumption in the US rose in 2020, with total audio activity up 9.4% year-over-year in the first half of the year

  • Source: marketingcharts.com

  • Industries such as contact centres, BFSI, retail, eCommerce, telecom, IT, and healthcare need audio analytics to automate business processes and reduce the associated costs

Why?

  • As digital content grows across different channels, audio information plays a significant role today. Listening to audio and extracting information from it by hand is a tedious job
  • Hence, there is a need for technology that analyzes audio automatically and extracts insights from it
  • Audio data analysis is about analyzing and understanding audio signals captured by digital devices
  • One such analysis is audio classification: assigning an audio recording to one of a set of predefined classes, such as speech or music
  • Machine learning can automate this process. The automation helps virtual assistants, home automation, content-based recommendation, automatic speech recognition, and text-to-speech applications

How?

  • This article explains how to classify audio into music and speech using an open-source Python library called pyAudioAnalysis. pyAudioAnalysis extracts audio features and classifies audio using pretrained machine learning models such as SVM and KNN
  • pyAudioAnalysis is licensed under the Apache License and is available on GitHub: pyAudioAnalysis.

Software Required:

  • Python 3.6

Network Requirements

  • Internet to download packages

Implementation

  • pyAudioAnalysis can be used to extract audio features, train and apply audio classifiers, and segment an audio stream using supervised or unsupervised machine learning models
  • pyAudioAnalysis has wrappers to train custom models and run classification on an unknown audio file. For training, the code needs a set of WAV files stored in one folder per class. The block diagram below explains the pyAudioAnalysis classification flow
  • Block Diagram (Source: iNNovationMerge)
    Reference
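The training wrapper mentioned above can be sketched as follows. This is a minimal sketch, not the article's own code: the helper names `class_folders` and `train_svm` are hypothetical, and the `extract_features_and_train` call follows the form shown in the pyAudioAnalysis README (1.0 s mid-term window and step, the library's default short-term values, no beat features). Check the arguments against the installed version before relying on them.

```python
from pathlib import Path


def class_folders(data_root):
    """Return the sub-folders of data_root that hold at least one WAV file.

    Each returned folder is treated as one class (hypothetical helper).
    """
    root = Path(data_root)
    return sorted(
        str(d)
        for d in root.iterdir()
        if d.is_dir() and any(f.suffix.lower() == ".wav" for f in d.iterdir())
    )


def train_svm(data_root, model_path):
    """Train an SVM on the class folders found under data_root."""
    # Imported lazily so the folder check above works without the library.
    from pyAudioAnalysis import audioTrainTest as aT

    folders = class_folders(data_root)
    if len(folders) < 2:
        raise ValueError("need at least two class folders containing WAV files")

    # Mid-term window/step of 1.0 s; short-term values are the library defaults.
    aT.extract_features_and_train(
        folders, 1.0, 1.0, aT.shortTermWindow, aT.shortTermStep,
        "svm", model_path, False,
    )
```

For example, `train_svm("data/train", "data/models/my_svm")` would train on folders such as `data/train/music` and `data/train/speech`; both paths are placeholders, not paths from this article.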

Install pyAudioAnalysis

git clone https://github.com/tyiannak/pyAudioAnalysis.git
cd pyAudioAnalysis
pip install -r ./requirements.txt
pip install -e .

Troubleshooting

  • Error: ImportError: failed to find libmagic. Check your installation
  • Solution: pip install python-magic-bin==0.4.14
  • Issue resolved link
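A quick way to confirm the installation, and to see whether the libmagic error above is still present, is to try the imports directly. The sketch below is illustrative only; `check_import` is a hypothetical helper, and `magic` is the import name of the python-magic package.

```python
import importlib


def check_import(module_name):
    """Try to import a module and return (ok, error_message)."""
    try:
        importlib.import_module(module_name)
        return True, ""
    except ImportError as exc:
        # A missing libmagic also surfaces as an ImportError.
        return False, str(exc)


for name in ("pyAudioAnalysis", "magic"):
    ok, err = check_import(name)
    print(f"{name}: {'OK' if ok else 'FAILED - ' + err}")
```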

Data folder structure

  • Folder structure (Source: iNNovationMerge)

Pretrained model file folder structure

  • Pretrained models (Source: iNNovationMerge)

Audio Classification using pretrained two-class SVM model (svm_rbf_sm)

from pyAudioAnalysis import audioTrainTest as aT
c, p, p_nam = aT.file_classification("data/doremi.wav", "data/models/svm_rbf_sm","svm_rbf")
print(f'P({p_nam[0]}={p[0]})')
print(f'P({p_nam[1]}={p[1]})')

Output :

  • P(speech=1.605096086774166e-06)
  • P(music=0.9999983949039132)
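The three return values can be turned into a single label. The sketch below is an assumption-laden illustration: `top_class` is a hypothetical helper, and the check for `-1` assumes that `file_classification` returns `-1` for all three values when the model or audio file is missing, which should be verified against the installed pyAudioAnalysis version. The sample values are copied from the output above, so the snippet runs without the library.

```python
def top_class(c, p, p_nam):
    """Return (class_name, probability) for the most likely class, or None.

    c, p, p_nam are the three values returned by aT.file_classification.
    """
    # Assumption: a failed call returns -1 for every value.
    if isinstance(p_nam, int) and p_nam == -1:
        return None
    best = max(range(len(p_nam)), key=lambda i: p[i])
    return p_nam[best], p[best]


# Values copied from the two-class SVM output in this article.
p_nam = ["speech", "music"]
p = [1.605096086774166e-06, 0.9999983949039132]

result = top_class(1, p, p_nam)
print(result)
```

Taking the maximum of `p` rather than trusting `c` keeps the label and the printed probability consistent with each other.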

Audio Classification using pretrained four-class SVM model (svm_rbf_4class)

from pyAudioAnalysis import audioTrainTest as aT
c, p, p_nam = aT.file_classification("data/doremi.wav", "data/models/svm_rbf_4class","svm")
print(f'P({p_nam[0]}={p[0]})')
print(f'P({p_nam[1]}={p[1]})')
print(f'P({p_nam[2]}={p[2]})')
print(f'P({p_nam[3]}={p[3]})')

Output :

  • P(speech=0.005759308568885096)
  • P(music=0.893296481513386)
  • P(silence=0.0016697600146891997)
  • P(other=0.0992744499030396)

Audio Classification using pretrained eight-class SVM model (svm_rbf_movie8class)

from pyAudioAnalysis import audioTrainTest as aT
c, p, p_nam = aT.file_classification("data/doremi.wav", "data/models/svm_rbf_movie8class","svm")
for k in range(len(p_nam)):
    print(f'P({p_nam[k]}={p[k]})')

Output :

  • P(Speech=0.13171912921682888)
  • P(Music=0.7011823461550247)
  • P(Others1=0.00980347558801528)
  • P(Others2=0.020903164965289794)
  • P(Others3=0.00724064282867633)
  • P(Shots=0.010905822090526755)
  • P(Fights=0.018400925456362103)
  • P(Screams=0.09984449369927641)

Audio Classification using pretrained musical genre SVM model (svm_rbf_musical_genre_6)

from pyAudioAnalysis import audioTrainTest as aT
c, p, p_nam = aT.file_classification("data/doremi.wav", "data/models/svm_rbf_musical_genre_6","svm")
for k in range(len(p_nam)):
    print(f'P({p_nam[k]}={p[k]})')

Output :

  • P(Blues=0.19340571058645775)
  • P(Classical=0.05183508791839752)
  • P(Electronic=0.3333299536117741)
  • P(Jazz=0.02182702350701883)
  • P(Rap=0.22459715419862877)
  • P(Rock=0.17500507017772288)
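When the probabilities are spread across several classes, as in the genre output above, it helps to rank the classes and look at the gap between the top two. The following is a small sketch with a hypothetical helper `rank_classes`; the numbers are copied from the output above.

```python
def rank_classes(probabilities, class_names):
    """Return (name, probability) pairs sorted from most to least likely."""
    return sorted(
        zip(class_names, probabilities),
        key=lambda pair: pair[1],
        reverse=True,
    )


# Values copied from the musical genre SVM output in this article.
p_nam = ["Blues", "Classical", "Electronic", "Jazz", "Rap", "Rock"]
p = [0.19340571058645775, 0.05183508791839752, 0.3333299536117741,
     0.02182702350701883, 0.22459715419862877, 0.17500507017772288]

ranked = rank_classes(p, p_nam)
(best, p_best), (second, p_second) = ranked[0], ranked[1]
margin = p_best - p_second  # small margins indicate an uncertain prediction
print(f"{best} leads {second} by {margin:.3f}")
```

How small a margin counts as uncertain depends on the application; no threshold is suggested by the article.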

Audio Classification using pretrained gender SVM model (svm_rbf_speaker_male_female)

from pyAudioAnalysis import audioTrainTest as aT
c, p, p_nam = aT.file_classification("data/doremi.wav", "data/models/svm_rbf_speaker_male_female","svm")
for k in range(len(p_nam)):
    print(f'P({p_nam[k]}={p[k]})')

Output :

  • P(Male=0.02808339073084605)
  • P(Female=0.9719166092691538)

Audio Classification using pretrained two-class KNN model (knn_sm)

from pyAudioAnalysis import audioTrainTest as aT
c, p, p_nam = aT.file_classification("data/doremi.wav", "data/models/knn_sm","knn")
for k in range(len(p_nam)):
    print(f'P({p_nam[k]}={p[k]})')

Output :

  • P(speech=0.0)
  • P(music=1.0)

Audio Classification using pretrained four-class KNN model (knn_4class)

from pyAudioAnalysis import audioTrainTest as aT
c, p, p_nam = aT.file_classification("data/doremi.wav", "data/models/knn_4class","knn")
for k in range(len(p_nam)):
    print(f'P({p_nam[k]}={p[k]})')

Output :

  • P(speech=0.07692307692307693)
  • P(music=0.07692307692307693)
  • P(silence=0.0)
  • P(other=0.8461538461538461)
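The KNN probabilities are much coarser than the SVM ones. This is consistent with a k-nearest-neighbour classifier reporting, for each class, the fraction of the k nearest training samples that carry that label. The sketch below illustrates the idea in plain Python; it is not the library's code, and the value k = 13 is only inferred from the printed numbers (0.0769... equals 1/13), not stated in the article.

```python
from collections import Counter
from fractions import Fraction


def vote_fractions(neighbor_labels, class_names):
    """Per-class probability as the share of the k nearest neighbours."""
    k = len(neighbor_labels)
    counts = Counter(neighbor_labels)
    return [counts[name] / k for name in class_names]


class_names = ["speech", "music", "silence", "other"]
# Hypothetical neighbour labels that would reproduce the output above.
neighbors = ["other"] * 11 + ["speech", "music"]
probs = vote_fractions(neighbors, class_names)

# Recover a plausible k from a probability printed in the article.
implied = Fraction(0.07692307692307693).limit_denominator(50)
print(probs, implied)
```

The same reading would explain values such as 0.0 and 1.0 in the two-class KNN output: all neighbours voted for one class.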

Audio Classification using pretrained eight-class KNN model (knn_movie8class)

from pyAudioAnalysis import audioTrainTest as aT
c, p, p_nam = aT.file_classification("data/doremi.wav", "data/models/knn_movie8class","knn")
for k in range(len(p_nam)):
    print(f'P({p_nam[k]}={p[k]})')

Output :

  • P(Speech=0.3333333333333333)
  • P(Music=0.5555555555555556)
  • P(Others1=0.0)
  • P(Others2=0.0)
  • P(Others3=0.0)
  • P(Shots=0.0)
  • P(Fights=0.1111111111111111)
  • P(Screams=0.0)

Audio Classification using pretrained musical genre KNN model (knn_musical_genre_6)

from pyAudioAnalysis import audioTrainTest as aT
c, p, p_nam = aT.file_classification("data/doremi.wav", "data/models/knn_musical_genre_6","knn")
for k in range(len(p_nam)):
    print(f'P({p_nam[k]}={p[k]})')

Output :

  • P(Blues=0.0)
  • P(Classical=0.2)
  • P(Electronic=0.2)
  • P(Jazz=0.2)
  • P(Rap=0.2)
  • P(Rock=0.2)

Audio Classification using pretrained gender KNN model (knn_speaker_male_female)

from pyAudioAnalysis import audioTrainTest as aT
c, p, p_nam = aT.file_classification("data/doremi.wav", "data/models/knn_speaker_male_female","knn")
for k in range(len(p_nam)):
    print(f'P({p_nam[k]}={p[k]})')

Output :

  • P(Male=0.0)
  • P(Female=1.0)
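The ten snippets above differ only in the model path and model type, so they can be driven from one loop. This is a sketch under assumptions: `model_specs` and `classify_with_all` are hypothetical helpers, the model names and types are copied from the calls in this article, and the `-1` check assumes the library signals a missing file that way.

```python
# (model file name, model type) pairs as used in the calls in this article.
MODELS = [
    ("svm_rbf_sm", "svm_rbf"),
    ("svm_rbf_4class", "svm"),
    ("svm_rbf_movie8class", "svm"),
    ("svm_rbf_musical_genre_6", "svm"),
    ("svm_rbf_speaker_male_female", "svm"),
    ("knn_sm", "knn"),
    ("knn_4class", "knn"),
    ("knn_movie8class", "knn"),
    ("knn_musical_genre_6", "knn"),
    ("knn_speaker_male_female", "knn"),
]


def model_specs(model_dir="data/models"):
    """Return (model_path, model_type) pairs for every pretrained model."""
    return [(f"{model_dir}/{name}", kind) for name, kind in MODELS]


def classify_with_all(wav_path, model_dir="data/models"):
    """Run every pretrained model on one file; return {model: {class: prob}}."""
    # Imported lazily so model_specs() can be used without the library.
    from pyAudioAnalysis import audioTrainTest as aT

    results = {}
    for model_path, model_type in model_specs(model_dir):
        c, p, p_nam = aT.file_classification(wav_path, model_path, model_type)
        if isinstance(p_nam, int) and p_nam == -1:
            continue  # assumption: -1 signals a missing model or audio file
        results[model_path.rsplit("/", 1)[-1]] = dict(zip(p_nam, p))
    return results
```

Calling `classify_with_all("data/doremi.wav")` would then collect the same probabilities that the individual snippets print.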

Clone and Run the project

