iNNovationMerge

Software

Publish Date: 2020-10-17

What?

Feature extraction from raw data is essential for understanding the relationships present in it.
Features must be extracted and analyzed before building any machine learning model. Structured data is easier to analyze than unstructured data such as audio.
Audio data cannot be understood directly with ordinary media tools. This article explains how to extract features from audio data and how to interpret them.

Why?

Audio feature extraction makes it possible to perform audio classification, audio recommendation, and prediction in the machine learning space.
For example, the thesis "Feature Selection and Analysis for Standard Machine Learning Classification of Audio Beehive Samples" uses audio features of bee buzzing sounds to determine the health of a hive.

How?

This article explains how to extract audio features using an open-source Python library called pyAudioAnalysis.
pyAudioAnalysis performs audio feature extraction in two stages:
- Short-term feature extraction: This splits the input signal into short-term windows (frames) and computes a number of features for each frame. The result is a sequence of short-term feature vectors for the whole signal.
- Mid-term feature extraction: This computes a number of statistics (e.g. mean and standard deviation) over each short-term feature sequence.
pyAudioAnalysis is licensed under the Apache License and is available on GitHub (pyAudioAnalysis).
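
The two stages described above can be illustrated with a small standard-library sketch. It is not the pyAudioAnalysis implementation: the helper names (frame_signal, short_term_features, mid_term_features) and the two toy features are assumptions made for illustration, and the statistics are taken over the whole signal rather than over separate mid-term windows.

```python
from statistics import mean, pstdev


def frame_signal(signal, window, step):
    """Split a signal into frames of `window` samples, advancing by `step` samples."""
    return [signal[i:i + window] for i in range(0, len(signal) - window + 1, step)]


def short_term_features(signal, window, step):
    """Return one toy feature vector per frame: [mean absolute amplitude, peak amplitude]."""
    feats = []
    for frame in frame_signal(signal, window, step):
        abs_frame = [abs(s) for s in frame]
        feats.append([mean(abs_frame), max(abs_frame)])
    return feats


def mid_term_features(short_feats):
    """Summarise each short-term feature sequence by its mean and standard deviation."""
    columns = list(zip(*short_feats))  # one sequence per feature
    return [mean(c) for c in columns] + [pstdev(c) for c in columns]


signal = [0.0, 1.0, 0.0, -1.0] * 4  # 16 samples of a toy waveform
st = short_term_features(signal, window=4, step=2)
mt = mid_term_features(st)
print(len(st), mt)
```

Each frame yields one short-term vector, and the mid-term stage reduces every feature sequence to a fixed-length summary that a classifier can consume.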

Audio Classification using Machine Learning and Python

Software Required:

Python 3.6

Network Requirements

Internet to download packages

Implementation

pyAudioAnalysis provides the feature_extraction() function, which extracts a total of 68 short-term features per frame:
- 34 short-term features
- 34 delta features (one for each short-term feature)
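
The delta features listed above can be read as frame-to-frame changes of the base features. The sketch below assumes that a delta is simply the difference between a feature's value in the current frame and in the previous one, with zeros for the first frame. The add_deltas helper is hypothetical, and the library source should be checked before relying on this exact definition.

```python
def add_deltas(feature_vectors):
    """Append to each frame's features their difference from the previous frame.

    Assumption: delta = current value - previous value, zeros for the first frame.
    """
    out = []
    prev = None
    for vec in feature_vectors:
        if prev is None:
            delta = [0.0] * len(vec)
        else:
            delta = [cur - old for cur, old in zip(vec, prev)]
        out.append(list(vec) + delta)
        prev = vec
    return out


# Three frames with two base features each (made-up values)
frames = [[0.1, 1.0], [0.3, 0.5], [0.2, 0.5]]
with_deltas = add_deltas(frames)
print(len(with_deltas[0]))  # each frame now carries twice as many values
```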

Install pyAudioAnalysis

git clone https://github.com/tyiannak/pyAudioAnalysis.git
cd pyAudioAnalysis
pip install -r requirements.txt
pip install -e .

Troubleshooting

Error: ImportError: failed to find libmagic. Check your installation
Solution: pip install python-magic-bin==0.4.14

Short-term Feature Extraction

from pyAudioAnalysis import audioBasicIO
from pyAudioAnalysis import ShortTermFeatures
import matplotlib.pyplot as plt
import cv2

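# Read the sampling rate (Fs, in Hz) and the samples (x) from the WAV file
Fs, x = audioBasicIO.read_audio_file("data/doremi.wav")
# Use 50 ms windows with a 25 ms step; F holds the feature values, f_names their names
F, f_names = ShortTermFeatures.feature_extraction(x, Fs, 0.050*Fs, 0.025*Fs)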

Short-term Feature (f_names[0]) ID 1 - ZCR

ZCR (Zero Crossing Rate) is the rate of sign changes of the signal within a particular frame.
It is the rate at which the signal changes from positive to zero to negative, or from negative to zero to positive.
Low ZCR values correspond to lower-frequency portions of the signal, and high values to higher-frequency portions.
ZCR features are used to distinguish noise, silence, and speech. One application is voice activity detection (VAD), i.e., determining whether human speech is present in an audio segment.
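
As a minimal sketch of the definition above, the zero crossing rate of one frame can be computed by counting sign changes between consecutive samples and dividing by the number of sample pairs. The function names and the toy frames are illustrative assumptions, not the library's code.

```python
def sign(value):
    """Return 1 for positive, -1 for negative and 0 for zero."""
    return (value > 0) - (value < 0)


def zero_crossing_rate(frame):
    """Sign changes per consecutive sample pair; a pass through zero counts once."""
    changes = sum(abs(sign(b) - sign(a)) for a, b in zip(frame, frame[1:])) / 2
    return changes / (len(frame) - 1)


slow = [1, 1, 1, -1, -1, -1, 1, 1, 1]    # two sign changes in 9 samples
fast = [1, -1, 1, -1, 1, -1, 1, -1, 1]   # a sign change at every step

print(zero_crossing_rate(slow))  # 0.25
print(zero_crossing_rate(fast))  # 1.0
```

The slowly alternating frame gets the lower value, which matches the link between low ZCR and low-frequency content.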

Short-term Feature (f_names[1]) ID 2 - Energy

The energy of an audio signal indicates the strength of the signal. It is the sum of squares of the signal values, normalized by the respective frame length.
This feature helps divide the audio signal into four energy-based regions: noise, low, medium, and high.
Each segment can be annotated with these labels and used to train machine learning algorithms.
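
The energy definition above translates directly into code. In the sketch below, the energy_region helper and its threshold values are hypothetical and chosen only to illustrate the four-region labelling. Real thresholds would have to be tuned to the recordings.

```python
def energy(frame):
    """Sum of squared sample values, normalized by the frame length."""
    return sum(s * s for s in frame) / len(frame)


def energy_region(value, thresholds=(0.001, 0.01, 0.1)):
    """Map an energy value to one of four regions (threshold values are hypothetical)."""
    labels = ("noise", "low", "medium", "high")
    for label, limit in zip(labels, thresholds):
        if value < limit:
            return label
    return labels[-1]


quiet = [0.01, -0.01, 0.01, -0.01]
loud = [0.5, -0.5, 0.5, -0.5]

print(energy_region(energy(quiet)))  # noise
print(energy_region(energy(loud)))   # high
```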

Short-term Feature (f_names[2]) ID 3 - Entropy of Energy

The Entropy of Energy describes the time-domain distribution of the audio signal's energy. It can also be interpreted as a measure of abrupt changes.
This feature is used in audio steganalysis.
Figure: Entropy of Energy Output (Source: iNNovationMerge)
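
One common way to compute this feature, sketched here under assumptions, is to split a frame into equal sub-blocks, express each block's energy as a fraction of the total, and take the entropy of those fractions. The block count and the small eps constant are illustrative choices, not values taken from the article.

```python
import math


def energy_entropy(frame, n_blocks=4, eps=1e-12):
    """Entropy (in bits) of the energy distribution across equal sub-blocks of a frame."""
    block_len = len(frame) // n_blocks
    usable = frame[:block_len * n_blocks]  # drop samples that do not fill a block
    total = sum(s * s for s in usable) + eps
    entropy = 0.0
    for b in range(n_blocks):
        block = usable[b * block_len:(b + 1) * block_len]
        p = sum(s * s for s in block) / total  # share of the energy in this block
        entropy -= p * math.log2(p + eps)
    return entropy


steady = [1.0] * 8                # energy spread evenly over the frame
burst = [0.0] * 6 + [1.0] * 2     # all energy arrives abruptly at the end

print(energy_entropy(steady) > energy_entropy(burst))  # True
```

Evenly spread energy gives the highest entropy, while a sudden burst concentrates the energy in one block and drives the entropy towards zero.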

Short-term Feature (f_names[3]) ID 4 - Spectral Centroid

Spectral Centroid indicates where the center of gravity of the spectrum of a sound is located. It is computed as the weighted mean of the frequencies present in the audio, with their magnitudes as weights.
If the spectral centroid is low, the spectral energy is concentrated in the low-frequency range.
If the frequency content of a piece of music stays the same throughout the audio, the spectral centroid stays close to a single value.
This feature is used in texture classification.
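
The weighted-mean definition above can be sketched with a naive discrete Fourier transform from the standard library. This version returns the centroid in Hz, whereas a library may scale the value differently (for example, to a 0 to 1 range). The function names and test tones are assumptions made for illustration, and the naive DFT is only practical for short frames.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of the first half of a naive DFT (bins from 0 up to Nyquist)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]


def spectral_centroid(frame, sampling_rate):
    """Magnitude-weighted mean of the bin frequencies, in Hz."""
    mags = magnitude_spectrum(frame)
    freqs = [k * sampling_rate / len(frame) for k in range(len(mags))]
    total = sum(mags)
    if total == 0:
        return 0.0  # silent frame: no spectrum to weigh
    return sum(f * m for f, m in zip(freqs, mags)) / total


fs = 64  # toy sampling rate in Hz, so each DFT bin is 1 Hz wide for 64 samples
low_tone = [math.sin(2 * math.pi * 4 * t / fs) for t in range(64)]
high_tone = [math.sin(2 * math.pi * 20 * t / fs) for t in range(64)]

print(round(spectral_centroid(low_tone, fs), 3))   # 4.0
print(round(spectral_centroid(high_tone, fs), 3))  # 20.0
```

A pure tone puts its centroid at the tone's own frequency, so the lower tone gives the lower centroid.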