Hidden Features of Audio Data and Extraction using Python - Part 2 | iNNovationMerge

What?, Why?, How?

Software Required:

  • Python 3.6

Network Requirements

  • Internet to download packages


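  • pyAudioAnalysis provides a feature_extraction() function that returns 34 short-term features per frame. With deltas enabled (the default), it also returns one delta for each, 68 values in total:
    • 34 short-term features
    • 34 delta features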
  • Block Diagram (Source: iNNovationMerge)
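
The delta features can be read as frame-to-frame differences. Here is a minimal sketch in plain Python, assuming a delta is the current frame's value minus the previous frame's value, with the first frame's deltas set to zero. add_deltas is a hypothetical helper written for illustration, not a pyAudioAnalysis function.

```python
def add_deltas(frames):
    """Append frame-to-frame differences to each feature vector.

    frames: list of per-frame feature vectors (lists of floats).
    Assumption: delta = current value - previous value, 0.0 for the first frame.
    """
    out = []
    prev = None
    for frame in frames:
        if prev is None:
            deltas = [0.0] * len(frame)
        else:
            deltas = [cur - old for cur, old in zip(frame, prev)]
        out.append(list(frame) + deltas)
        prev = frame
    return out


# Three frames with two features each
frames = [[1.0, 10.0], [1.5, 8.0], [1.5, 9.0]]
with_deltas = add_deltas(frames)
```

Each output vector is twice as long as its input: the original features first, then their deltas in the same order.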

Install pyAudioAnalysis

git clone https://github.com/tyiannak/pyAudioAnalysis.git
cd pyAudioAnalysis
pip install -r requirements.txt
pip install -e .

Troubleshooting

  • Error: ImportError: failed to find libmagic. Check your installation
  • Solution: pip install python-magic-bin==0.4.14
  • Issue resolved link

short-term Feature Extraction

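from pyAudioAnalysis import audioBasicIO
from pyAudioAnalysis import ShortTermFeatures
import matplotlib.pyplot as plt

# Read the sampling rate (Fs, in Hz) and the raw samples (x)
Fs, x = audioBasicIO.read_audio_file("data/doremi.wav")
# Feature extraction works on a single channel
x = audioBasicIO.stereo_to_mono(x)
# 50 ms analysis window and 25 ms step, both given in samples
F, f_names = ShortTermFeatures.feature_extraction(x, Fs, int(0.050 * Fs), int(0.025 * Fs))
# F holds one row per feature and one column per frame; f_names labels the rows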
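
Since F holds one row per feature and f_names holds the matching labels, a row can be looked up by name instead of by a hard-coded index. Here is a minimal sketch with stand-in data. feature_row is a hypothetical helper, and the label strings below are assumptions about what f_names contains, so print f_names to check them on your installation.

```python
def feature_row(features, names, name):
    """Return the per-frame values of the feature called `name`."""
    return features[names.index(name)]


# Stand-in for the (F, f_names) pair returned by feature_extraction:
# one row per feature, one column per frame.
f_names_demo = ["zcr", "energy", "energy_entropy", "spectral_centroid", "spectral_spread"]
F_demo = [
    [0.10, 0.12, 0.11],
    [0.02, 0.03, 0.02],
    [3.10, 3.00, 3.20],
    [0.20, 0.22, 0.21],
    [0.18, 0.19, 0.17],
]

spread = feature_row(F_demo, f_names_demo, "spectral_spread")
```

The same helper should work on the real F, because indexing a 2-D array with a single integer also returns one row.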

short-term Feature(f_names[4]) ID 5 - Spectral Spread

  • It measures the average spread of the spectrum around its centroid, i.e. the second central moment of the spectrum.
  • The spectral spread indicates how the energy of the audio signal is distributed around that centroid.
    • Large Spectral Spread - Noise-like signals
    • Low Spectral Spread - Tonal sounds
  • Spectral Spread Formula (Source: ccrma.stanford.edu)
  • Spectral Spread Output (Source: iNNovationMerge)
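
To make the definition concrete, here is a minimal sketch in plain Python. It assumes the usual textbook form: the centroid is the magnitude-weighted mean frequency, and the spread is the magnitude-weighted standard deviation around it. The function name and the bin-to-frequency mapping are assumptions for illustration. pyAudioAnalysis may scale its values differently.

```python
import math


def spectral_centroid_spread(magnitudes, sample_rate):
    """Return (centroid, spread) in Hz for one frame's magnitude spectrum.

    magnitudes: non-negative FFT magnitudes for bins 0..N-1, where bin k is
    assumed to sit at frequency k * sample_rate / (2 * N).
    """
    n = len(magnitudes)
    freqs = [k * sample_rate / (2 * n) for k in range(n)]
    total = sum(magnitudes)
    if total == 0:
        return 0.0, 0.0  # silent frame
    centroid = sum(f * m for f, m in zip(freqs, magnitudes)) / total
    variance = sum((f - centroid) ** 2 * m for f, m in zip(freqs, magnitudes)) / total
    return centroid, math.sqrt(variance)


tonal = [0, 0, 0, 1, 0, 0, 0, 0]  # energy in a single bin
noisy = [1, 1, 1, 1, 1, 1, 1, 1]  # energy spread evenly over all bins
_, tonal_spread = spectral_centroid_spread(tonal, 8000)
_, noisy_spread = spectral_centroid_spread(noisy, 8000)
```

The single-bin spectrum has zero spread, while the flat spectrum has a large one, which matches the tonal versus noise-like distinction above.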

short-term Feature(f_names[5]) ID 6 - Spectral Entropy

short-term Feature(f_names[6]) ID 7 - Spectral Flux

short-term Feature(f_names[7]) ID 8 - Spectral Rolloff

  • Spectral rolloff is the frequency below which a fixed percentage of the magnitude distribution of the spectrum is concentrated. 90% is used here; 85% is another common choice.
  • Spectral Rolloff Formula (Source: egr.msu.edu)
  • Spectral Rolloff Output (Source: iNNovationMerge)
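
The definition translates into a running sum over the spectrum. Here is a minimal sketch in plain Python, assuming the percentage is applied to the summed magnitudes, as the definition above states, and that the result is reported in Hz. Some implementations accumulate squared magnitudes instead, or return a normalized bin position. The function name and signature are illustrative.

```python
def spectral_rolloff(magnitudes, sample_rate, fraction=0.90):
    """Return the frequency (Hz) below which `fraction` of the total magnitude lies.

    Assumes bin k sits at k * sample_rate / (2 * N) for N bins.
    """
    n = len(magnitudes)
    threshold = fraction * sum(magnitudes)
    running = 0.0
    for k, m in enumerate(magnitudes):
        running += m
        if running >= threshold:
            return k * sample_rate / (2 * n)
    return 0.0  # only reached for an empty spectrum


low_heavy = [10, 6, 3, 1, 0, 0, 0, 0]  # most of the magnitude in the low bins
flat = [1, 1, 1, 1, 1, 1, 1, 1]        # magnitude spread evenly

low_rolloff = spectral_rolloff(low_heavy, 8000)
flat_rolloff = spectral_rolloff(flat, 8000)
```

A spectrum dominated by low frequencies gives a low rolloff, while a flat spectrum pushes the rolloff towards the top of the band.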

short-term Feature(f_names[8]-f_names[20]) ID 9-21 - MFCCs

  • Mel-Frequency Cepstral Coefficients (MFCCs) describe the overall shape of the spectral envelope.
  • These features are based on the Fourier transform. After taking the Fourier transform of an analysis window, the magnitude spectrum is passed through a Mel filterbank whose bandwidths vary to mimic the human ear, i.e. small bandwidths at low frequencies and large bandwidths at high frequencies.
  • The output energy of each filter is log-transformed, and the MFCCs are obtained by taking the Discrete Cosine Transform of these log outputs.
  • MFCCs Formula (Source: egr.msu.edu)
  • MFCCs Output (Source: iNNovationMerge)
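
The three steps above (Mel filterbank, log, DCT) can be sketched in plain Python. This is a textbook-style illustration, not pyAudioAnalysis's implementation. The Hz-to-mel formula, the triangular filter shape, the 20 filters, the unnormalised DCT-II and all function names are assumptions chosen for readability.

```python
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_edges_hz(n_filters, sample_rate):
    """Edge frequencies (Hz) of n_filters triangles, equally spaced in mel."""
    top = hz_to_mel(sample_rate / 2.0)
    return [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]


def filterbank_energies(magnitudes, sample_rate, n_filters):
    """Weight the magnitude spectrum with each triangular mel filter."""
    n_bins = len(magnitudes)
    edges = mel_edges_hz(n_filters, sample_rate)
    energies = []
    for i in range(1, n_filters + 1):
        lo, mid, hi = edges[i - 1], edges[i], edges[i + 1]
        total = 0.0
        for k, mag in enumerate(magnitudes):
            freq = k * sample_rate / (2 * n_bins)
            if lo < freq <= mid:      # rising slope of the triangle
                total += mag * (freq - lo) / (mid - lo)
            elif mid < freq < hi:     # falling slope of the triangle
                total += mag * (hi - freq) / (hi - mid)
        energies.append(total)
    return energies


def dct2(values, n_coeffs):
    """Unnormalised DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * c * (j + 0.5) / n) for j, v in enumerate(values))
        for c in range(n_coeffs)
    ]


def mfcc(magnitudes, sample_rate, n_filters=20, n_coeffs=13):
    energies = filterbank_energies(magnitudes, sample_rate, n_filters)
    log_energies = [math.log(e + 1e-10) for e in energies]  # 1e-10 avoids log(0)
    return dct2(log_energies, n_coeffs)


edges = mel_edges_hz(20, 16000)
bandwidths = [edges[i + 2] - edges[i] for i in range(20)]  # width of each filter in Hz
coeffs = mfcc([1.0] * 256, 16000)                          # 13 coefficients for a flat spectrum
```

The bandwidths list grows from the first filter to the last, which is the "small bandwidth at low frequencies, large bandwidth at high frequencies" behaviour described above.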

short-term Feature(f_names[21]-f_names[32]) ID 22-33 - Chroma Vector

  • A 12-element representation of the spectral energy, where the bins represent the 12 equal-tempered pitch classes of Western-type music (semitone spacing).
  • Chroma Vector (Source: iNNovationMerge)
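
Folding a spectrum into 12 pitch classes can be sketched as follows. This is a minimal illustration in plain Python, assuming each FFT bin is assigned to its nearest equal-tempered semitone relative to A4 = 440 Hz and that squared magnitudes are summed per class. The function name, the reference pitch and the normalisation are assumptions, not pyAudioAnalysis's exact procedure.

```python
import math

PITCH_CLASSES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]


def chroma_vector(magnitudes, sample_rate, ref_hz=440.0):
    """Fold one frame's spectral energy into 12 pitch classes.

    Assumes bin k sits at k * sample_rate / (2 * N) and that ref_hz (A4)
    defines pitch class 0.
    """
    n = len(magnitudes)
    chroma = [0.0] * 12
    for k in range(1, n):  # skip the DC bin: it has no pitch
        freq = k * sample_rate / (2 * n)
        semitones = round(12 * math.log2(freq / ref_hz))
        chroma[semitones % 12] += magnitudes[k] ** 2
    total = sum(chroma)
    return [c / total for c in chroma] if total > 0 else chroma


# 1024 bins at 8192 Hz gives a 4 Hz bin spacing
spectrum = [0.0] * 1024
spectrum[110] = 1.0  # 440 Hz (A4)
spectrum[220] = 1.0  # 880 Hz (A5), same pitch class one octave up
spectrum[131] = 1.0  # 524 Hz, close to C5
chroma = chroma_vector(spectrum, 8192)
```

Both A notes land in the same bin regardless of octave, so the "A" class receives two thirds of the energy and the "C" class the remaining third.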

short-term Feature(f_names[33]) ID 34 - Chroma Deviation

  • The standard deviation of the 12 chroma coefficients.
  • Chroma Vector Output (Source: iNNovationMerge)

Clone and Run the project


64 Audio Features Demo