iNNovationMerge

Software

Publish Date: 2020-10-19

Article Word Count: 543

Reading Time: 3 Min


What?, Why?, How?

Covered in Hidden Features of Audio Data and Extraction using Python - Part 1

Software Required:

Python 3.6

Network Requirements

Internet to download packages

Implementation

pyAudioAnalysis provides the feature_extraction() function, which extracts a total of 68 short-term features per frame:
- 34 short-term features
- 34 delta features (the frame-to-frame change of each of the 34 features above)

Install pyAudioAnalysis

git clone https://github.com/tyiannak/pyAudioAnalysis.git
cd pyAudioAnalysis
pip install -r ./requirements.txt
pip install -e .

Troubleshooting if error

Error : ImportError: failed to find libmagic. Check your installation
Solution : pip install python-magic-bin==0.4.14

short-term Feature Extraction

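from pyAudioAnalysis import audioBasicIO
from pyAudioAnalysis import ShortTermFeatures
import matplotlib.pyplot as plt  # for plotting the extracted features

# Read the audio file: Fs is the sampling rate in Hz, x is the signal
[Fs, x] = audioBasicIO.read_audio_file("data/doremi.wav")
# Use a 50 ms window and a 25 ms step, both expressed in samples
# F is a (features x frames) matrix, f_names holds the name of each row
F, f_names = ShortTermFeatures.feature_extraction(x, Fs, 0.050*Fs, 0.025*Fs)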

short-term Feature(f_names[4]) ID 5 - Spectral Spread

Spectral spread measures how widely the spectrum is distributed around its centroid; it is the second central moment of the spectrum.
- Large spectral spread: noise-like signals
- Low spectral spread: tonal sounds
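
As a rough illustration of the definition above, the sketch below computes the centroid and spread of a single magnitude spectrum in plain Python. The function name `spectral_centroid_and_spread`, the evenly spaced bin layout, and the two toy spectra are assumptions made for this example; the implementation inside pyAudioAnalysis may differ, for instance in how it normalizes the values.

```python
import math

def spectral_centroid_and_spread(magnitudes, sample_rate):
    """Return (centroid_hz, spread_hz) for one frame's magnitude spectrum."""
    n = len(magnitudes)
    # Assumed layout: n bins evenly spaced from 0 Hz up to (not including) sample_rate / 2
    freqs = [k * sample_rate / (2 * n) for k in range(n)]
    total = sum(magnitudes)
    if total == 0:
        return 0.0, 0.0
    # Centroid: magnitude-weighted mean frequency
    centroid = sum(f * m for f, m in zip(freqs, magnitudes)) / total
    # Spread: magnitude-weighted standard deviation around the centroid
    variance = sum(((f - centroid) ** 2) * m for f, m in zip(freqs, magnitudes)) / total
    return centroid, math.sqrt(variance)

# Toy spectra (hypothetical data): one single peak, one completely flat
tonal = [0.0] * 64
tonal[10] = 1.0
noisy = [1.0] * 64

tonal_centroid, tonal_spread = spectral_centroid_and_spread(tonal, 16000)
noisy_centroid, noisy_spread = spectral_centroid_and_spread(noisy, 16000)
print(tonal_spread, noisy_spread)
```

The single-peak spectrum gives a spread of zero, while the flat spectrum gives a much larger value, which matches the tonal versus noise-like distinction in the list above.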

short-term Feature(f_names[5]) ID 6 - Spectral Entropy

Spectral entropy measures how uniformly the energy of each frame is distributed across the spectrum; it can also be used to capture distinct spectral peaks.
The higher the entropy, the more uniform the distribution.
This feature is used in the study "Spectral entropy indicates electrophysiological and hemodynamic changes in drug-resistant epilepsy".
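
To make the link between entropy and uniformity concrete, here is a minimal sketch that treats the normalized power spectrum of one frame as a probability distribution and computes its Shannon entropy. The function name `spectral_entropy` and the toy spectra are assumptions for this example; it is a simplified variant, and the library may compute entropy differently (for example over groups of bins rather than single bins).

```python
import math

def spectral_entropy(magnitudes):
    """Shannon entropy (in bits) of one frame's normalized power spectrum."""
    power = [m * m for m in magnitudes]
    total = sum(power)
    if total == 0:
        return 0.0
    probs = [p / total for p in power]
    # Bins with zero probability contribute nothing to the entropy
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Toy spectra (hypothetical data)
flat = [1.0] * 8        # energy spread evenly over all bins
peaked = [0.0] * 8
peaked[3] = 1.0         # all energy in a single bin

flat_entropy = spectral_entropy(flat)
peaked_entropy = spectral_entropy(peaked)
print(flat_entropy, peaked_entropy)
```

The flat spectrum reaches the maximum possible entropy for 8 bins (3 bits), while the single-peak spectrum has zero entropy.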

short-term Feature(f_names[6]) ID 7 - Spectral Flux

Spectral flux measures the change between the normalized magnitudes of two adjacent frames. It is calculated by comparing the spectrum of one frame against the spectrum of the previous frame.
This feature is used to distinguish music from speech signals; a higher rate of change typically indicates music.
It is used in the study "Speech/Audio Signal Classification Using Spectral Flux Pattern Recognition".
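
The frame-to-frame comparison described above can be sketched as the sum of squared differences between two sum-normalized magnitude spectra. The helper names `normalize` and `spectral_flux` and the three toy frames are assumptions for this example, not the library's code.

```python
def normalize(magnitudes):
    """Scale a magnitude spectrum so that its values sum to 1."""
    total = sum(magnitudes)
    if total == 0:
        return [0.0] * len(magnitudes)
    return [m / total for m in magnitudes]

def spectral_flux(prev_magnitudes, curr_magnitudes):
    """Sum of squared differences between two normalized spectra."""
    prev_norm = normalize(prev_magnitudes)
    curr_norm = normalize(curr_magnitudes)
    return sum((c - p) ** 2 for p, c in zip(prev_norm, curr_norm))

# Toy frames (hypothetical data)
frame_a = [4.0, 2.0, 1.0, 1.0]
frame_b = [8.0, 4.0, 2.0, 2.0]   # same spectral shape as frame_a, only louder
frame_c = [1.0, 1.0, 2.0, 4.0]   # energy moved towards the higher bins

flux_same_shape = spectral_flux(frame_a, frame_b)
flux_changed = spectral_flux(frame_a, frame_c)
print(flux_same_shape, flux_changed)
```

Because the spectra are normalized first, a pure change in loudness produces no flux; only a change in the shape of the spectrum does.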

short-term Feature(f_names[7]) ID 8 - Spectral Rolloff

Spectral rolloff is the frequency below which a fixed percentage of the magnitude distribution of the spectrum is concentrated. Here the percentage is 90%; other sources use different values, e.g. 85%.
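
A minimal sketch of this definition: accumulate the magnitudes bin by bin and return the frequency of the first bin at which the running sum reaches the chosen fraction of the total. The function name `spectral_rolloff`, its `fraction` parameter, and the evenly spaced bin layout are assumptions for this example; the library's version may accumulate energy instead of magnitude or normalize its result.

```python
def spectral_rolloff(magnitudes, sample_rate, fraction=0.90):
    """Frequency (Hz) below which `fraction` of the total magnitude lies."""
    n = len(magnitudes)
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    threshold = fraction * total
    cumulative = 0.0
    for k, m in enumerate(magnitudes):
        cumulative += m
        if cumulative >= threshold:
            # Assumed layout: n bins evenly spaced from 0 Hz up to sample_rate / 2
            return k * sample_rate / (2 * n)
    # Fallback for rounding effects: return the last bin's frequency
    return (n - 1) * sample_rate / (2 * n)

# Toy spectra (hypothetical data): bins are 1000 Hz apart at a 20 kHz sampling rate
flat = [1.0] * 10
low_heavy = [10.0] + [0.0] * 9   # everything concentrated in the lowest bin

rolloff_flat_half = spectral_rolloff(flat, 20000, fraction=0.5)
rolloff_low_heavy = spectral_rolloff(low_heavy, 20000)
print(rolloff_flat_half, rolloff_low_heavy)
```

A spectrum dominated by low frequencies has a low rolloff, while a flat or bright spectrum pushes the rolloff frequency upward.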

short-term Feature(f_names[8]-f_names[20]) ID 9-21 - MFCCs

Mel Frequency Cepstral Coefficients (MFCCs) describe the overall shape of the spectral envelope. These features are based on the Fourier transform. After taking the Fourier transform of an analysis window, the magnitude spectrum is passed through a Mel filterbank whose filters vary in bandwidth to mimic the human ear, i.e. small bandwidth at low frequencies and large bandwidth at high frequencies.
The output energy of each filter is log-transformed, and the MFCCs are obtained by taking the Discrete Cosine Transform of these outputs.
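
The pipeline described above (Mel filterbank, log, Discrete Cosine Transform) can be sketched in plain Python as follows. This is a textbook-style illustration, not the library's implementation: the function names, the triangular filter shape, the common Hz-to-Mel formula 2595 * log10(1 + f / 700), and the defaults of 20 filters and 13 coefficients are all assumptions chosen for the example.

```python
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters whose edges are equally spaced on the Mel scale."""
    nyquist = sample_rate / 2.0
    mel_max = hz_to_mel(nyquist)
    # n_filters + 2 edge points, equally spaced in Mel, converted back to Hz
    edges_hz = [mel_to_hz(i * mel_max / (n_filters + 1)) for i in range(n_filters + 2)]
    # Assumed layout: n_bins evenly spaced from 0 Hz to the Nyquist frequency
    bin_hz = [k * nyquist / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[j - 1], edges_hz[j], edges_hz[j + 1]
        weights = []
        for f in bin_hz:
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))   # falling slope
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank

def dct_ii(values):
    """Unnormalized type-II Discrete Cosine Transform."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n)]

def mfcc(magnitudes, sample_rate, n_filters=20, n_coeffs=13):
    bank = mel_filterbank(n_filters, len(magnitudes), sample_rate)
    energies = [sum(w * m for w, m in zip(weights, magnitudes)) for weights in bank]
    # A small constant avoids log(0) for empty filters
    log_energies = [math.log(e + 1e-10) for e in energies]
    return dct_ii(log_energies)[:n_coeffs]

# Toy input (hypothetical data): a flat 257-bin magnitude spectrum at 16 kHz
bank = mel_filterbank(20, 257, 16000)
coeffs = mfcc([1.0] * 257, 16000)
print(len(coeffs))
```

Counting the non-zero weights of the first and last filter in `bank` shows the bandwidth growing with frequency, which is the behaviour the filterbank is meant to borrow from the human ear.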

short-term Feature(f_names[21]-f_names[32]) ID 22-33 - Chroma Vector

A 12-element representation of the spectral energy, where the bins represent the 12 equal-tempered pitch classes of Western music (semitone spacing).
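
One way to picture this is to map every frequency bin to its nearest semitone, fold all octaves onto 12 pitch classes, and sum the energy per class. The sketch below does this in plain Python; the function name `chroma_vector`, the 27.5 Hz reference (the note A0, so class 0 corresponds to A), and the toy spectrum are assumptions for this example rather than the library's exact procedure.

```python
import math

def chroma_vector(magnitudes, sample_rate, ref_hz=27.5):
    """Fold spectral energy onto 12 pitch classes (class 0 = pitch of ref_hz)."""
    n = len(magnitudes)
    chroma = [0.0] * 12
    # Skip bin 0 (0 Hz), which has no defined pitch
    for k in range(1, n):
        # Assumed layout: n bins evenly spaced from 0 Hz up to sample_rate / 2
        freq = k * sample_rate / (2 * n)
        # Number of semitones above the reference, rounded to the nearest one,
        # then wrapped so that all octaves share the same class
        pitch_class = round(12 * math.log2(freq / ref_hz)) % 12
        chroma[pitch_class] += magnitudes[k] ** 2
    total = sum(chroma)
    if total == 0:
        return chroma
    return [c / total for c in chroma]

# Toy spectrum (hypothetical data): 1000 bins, 4 Hz apart at an 8 kHz sampling rate,
# with peaks at 440 Hz and 880 Hz (the same note, one octave apart)
spectrum = [0.0] * 1000
spectrum[110] = 1.0   # 440 Hz
spectrum[220] = 1.0   # 880 Hz

chroma = chroma_vector(spectrum, 8000)
print(chroma.index(max(chroma)))
```

Both peaks land in the same pitch class even though they are an octave apart, which is what makes the chroma vector useful for describing harmonic content independently of octave.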