Hidden Features of Audio Data and Extraction using Python - Part 1 | iNNovationMerge



  • Feature extraction from raw data is important for understanding the relationships present in it.
  • Feature extraction and analysis are a must before creating any machine learning model. Structured data is easier to analyze than unstructured data such as audio.
  • Audio data cannot be understood directly with ordinary media tools. This article explains how to extract features from audio data and how to interpret them.



  • This article explains how to extract audio features using an open-source Python library called pyAudioAnalysis.
  • pyAudioAnalysis extracts audio features in two stages:
    • Short-term feature extraction: This splits the input signal into short-term windows (frames) and computes a number of features for each frame. The result is a sequence of short-term feature vectors for the whole signal.
    • Mid-term feature extraction: This computes a number of statistics (e.g. mean and standard deviation) over each short-term feature sequence.
  • pyAudioAnalysis is licensed under the Apache License and is available on GitHub (pyAudioAnalysis).
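
The two stages described above can be sketched with plain NumPy. This is a minimal illustration of the idea, not pyAudioAnalysis's own implementation: the function names, the two toy features (energy and peak amplitude), the 16 kHz sampling rate and the window sizes are all assumptions chosen for the example.

```python
import numpy as np

def split_into_frames(signal, window, step):
    """Return a 2-D array with one row per frame that fits fully in the signal."""
    starts = range(0, len(signal) - window + 1, step)
    return np.array([signal[s:s + window] for s in starts])

def short_term_features(signal, window, step):
    """Stage 1: compute two toy features (energy, peak amplitude) per frame."""
    frames = split_into_frames(signal, window, step)
    energy = np.mean(frames ** 2, axis=1)
    peak = np.max(np.abs(frames), axis=1)
    return np.vstack([energy, peak])  # shape: (n_features, n_frames)

def mid_term_statistics(features, mid_window, mid_step):
    """Stage 2: mean and standard deviation of each feature over blocks of frames."""
    n_frames = features.shape[1]
    stats = []
    for start in range(0, n_frames, mid_step):
        block = features[:, start:start + mid_window]
        stats.append(np.concatenate([block.mean(axis=1), block.std(axis=1)]))
    return np.array(stats).T  # shape: (2 * n_features, n_blocks)

# Hypothetical input: 1 second of random samples at an assumed 16 kHz rate
rng = np.random.default_rng(0)
signal = rng.uniform(-1, 1, 16000)

st = short_term_features(signal, window=800, step=400)  # 50 ms window, 25 ms step
mt = mid_term_statistics(st, mid_window=10, mid_step=10)
print(st.shape, mt.shape)
```

Each column of `st` describes one short frame, while each column of `mt` summarizes a longer stretch of audio through the statistics of the frames it contains.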

Software Required:

  • Python 3.6

Network Requirements

  • Internet to download packages


  • pyAudioAnalysis provides the feature_extraction() function, which extracts a total of 64 short-term features:
    • 34 short-term features
    • 30 delta features
  • Block Diagram (Source: iNNovationMerge)

Install pyAudioAnalysis

git clone https://github.com/tyiannak/pyAudioAnalysis.git
cd pyAudioAnalysis
pip install -r requirements.txt
pip install -e .

Troubleshooting

  • Error : ImportError: failed to find libmagic. Check your installation
  • Solution : pip install python-magic-bin==0.4.14
  • Issue resolved link

Short-term Feature Extraction

from pyAudioAnalysis import audioBasicIO
from pyAudioAnalysis import ShortTermFeatures
import matplotlib.pyplot as plt

# Read the audio file: Fs is the sampling rate (Hz), x is the signal
Fs, x = audioBasicIO.read_audio_file("data/doremi.wav")
# Use 50 ms windows with a 25 ms step, both expressed in samples
F, f_names = ShortTermFeatures.feature_extraction(x, Fs, 0.050*Fs, 0.025*Fs)
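
The last two arguments of the call above are the window and step sizes in samples. The short sketch below shows how 50 ms and 25 ms translate into samples and roughly how many frames result. The 16 kHz sampling rate and the 3-second duration are assumed values for illustration, and the frame count only considers frames that fit fully inside the signal, which may differ slightly from what the library does at the signal's end.

```python
# Hypothetical values: the real sampling rate comes from the audio file
sampling_rate = 16000
window_sec, step_sec = 0.050, 0.025

window = int(round(window_sec * sampling_rate))  # window length in samples
step = int(round(step_sec * sampling_rate))      # hop between frame starts

n_samples = 3 * sampling_rate                    # assumed 3-second signal
n_frames = (n_samples - window) // step + 1      # frames that fit completely

print(window, step, n_frames)
```

Because the step is half the window, consecutive frames overlap by 50%.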

Short-term Feature (f_names[0]) ID 1 - ZCR

  • ZCR (Zero Crossing Rate) is the rate of sign changes of the signal within a particular frame.
  • It is the rate at which the signal changes from positive to zero to negative, or from negative to zero to positive.
  • Low ZCR values correspond to lower-frequency portions of the signal, and high values to higher-frequency portions.
  • Zero crossing rate features are used to distinguish noise, silence, and speech. One application is voice activity detection (VAD), i.e., determining whether human speech is present in an audio segment.
  • ZCR Formula (Source: wikipedia.org)
  • ZCR Output (Source: iNNovationMerge)
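
As a rough illustration of the definition above, the zero crossing rate of a single frame can be computed with NumPy as follows. This is a sketch, not the library's code: the function name, the 8 kHz sampling rate and the two test tones are assumptions made for the example.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Number of sign changes in the frame, divided by the number of sample pairs."""
    signs = np.sign(frame)
    # Each full sign flip (+1 to -1 or back) contributes 2 to the absolute difference
    crossings = np.sum(np.abs(np.diff(signs))) / 2
    return crossings / (len(frame) - 1)

# Hypothetical 50 ms frame at an assumed 8 kHz sampling rate
sampling_rate = 8000
t = np.arange(400) / sampling_rate
low_tone = np.sin(2 * np.pi * 100 * t + 0.1)    # 100 Hz tone
high_tone = np.sin(2 * np.pi * 1000 * t + 0.1)  # 1000 Hz tone

zcr_low = zero_crossing_rate(low_tone)
zcr_high = zero_crossing_rate(high_tone)
print(zcr_low, zcr_high)
```

The higher-frequency tone crosses zero more often within the same frame, so its ZCR is larger.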

Short-term Feature (f_names[1]) ID 2 - Energy

  • The energy of an audio signal indicates the strength of the signal. It is the sum of squares of the signal values, normalized by the respective frame length.
  • This feature helps divide the audio signal into four energy-based regions: noise, low, medium, and high.
  • Each signal can be annotated in this way, and the annotations can be used by machine learning algorithms.
  • Energy Formula (Source: musicinformationretrieval.com)
  • Energy Output (Source: iNNovationMerge)
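
The energy definition above, together with the four-region labeling, can be sketched in NumPy like this. The function names and especially the threshold values are hypothetical: suitable thresholds depend on the recording and are not given in this article.

```python
import numpy as np

def frame_energy(frame):
    """Sum of squared sample values, normalized by the frame length."""
    frame = np.asarray(frame, dtype=float)
    return np.sum(frame ** 2) / len(frame)

def label_energy(energies, thresholds=(0.001, 0.01, 0.1)):
    """Map each energy value to one of four regions (thresholds are assumed)."""
    names = np.array(["noise", "low", "medium", "high"])
    return names[np.digitize(energies, thresholds)]

# Two hypothetical frames: a very quiet one and a loud one
rng = np.random.default_rng(0)
quiet = 0.01 * rng.standard_normal(400)
loud = 0.5 * rng.standard_normal(400)

energies = [frame_energy(quiet), frame_energy(loud)]
print(energies, label_energy(energies))
```

The resulting labels could serve as per-frame annotations for a downstream machine learning model.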

Short-term Feature (f_names[2]) ID 3 - Entropy of Energy

Short-term Feature (f_names[3]) ID 4 - Spectral Centroid

  • The spectral centroid indicates where the center of gravity of a sound's spectrum is located. It is computed as the weighted mean of the frequencies present in the audio, with their magnitudes as weights.
  • A low spectral centroid means that the spectral energy is concentrated in the low-frequency range.
  • If the frequency content of the music stays the same throughout the audio, the spectral centroid stays around a constant value.
  • This feature is used in texture classification.
  • Spectral Centroid Formula (Source: wikipedia.org)
  • Spectral Centroid Output (Source: iNNovationMerge)
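
The weighted-mean definition above can be sketched with NumPy's FFT. This version returns the centroid in Hz. It is an illustration under assumed names and an assumed 8 kHz sampling rate, and its scaling may differ from the values pyAudioAnalysis reports.

```python
import numpy as np

def spectral_centroid(frame, sampling_rate):
    """Magnitude-weighted mean of the frequencies in the frame, in Hz."""
    magnitudes = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sampling_rate)
    # Small constant avoids division by zero for an all-zero frame
    return np.sum(freqs * magnitudes) / (np.sum(magnitudes) + 1e-12)

# Hypothetical test tones at an assumed 8 kHz sampling rate, 1 second each
sampling_rate = 8000
t = np.arange(sampling_rate) / sampling_rate
tone_440 = np.sin(2 * np.pi * 440 * t)
tone_2000 = np.sin(2 * np.pi * 2000 * t)

centroid_440 = spectral_centroid(tone_440, sampling_rate)
centroid_2000 = spectral_centroid(tone_2000, sampling_rate)
print(centroid_440, centroid_2000)
```

For a pure tone the centroid sits at the tone's frequency, so a lower centroid indicates energy concentrated at lower frequencies.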

Other Features

  • Spectral Spread, Spectral Entropy, Spectral Flux, Spectral Rolloff, MFCCs, Chroma Vector, and Chroma Deviation features will be covered in the next part.

Clone and Run the project


64 Audio Features Demo