Publish Date: 2020-11-13

What?, Why?

Covered in Audio Segmentation using Supervised & Unsupervised Algorithms in Python - Part 1

Software Required:

Python 3.6

Network Requirements

Internet to download packages

Prerequisite

How?

Implementation

pyAudioAnalysis is licensed under the Apache License and is available on GitHub (tyiannak/pyAudioAnalysis).
Reference

Install pyAudioAnalysis

git clone https://github.com/tyiannak/pyAudioAnalysis.git
cd pyAudioAnalysis
pip install -r ./requirements.txt
pip install -e .

Troubleshooting

Error: ImportError: failed to find libmagic. Check your installation
Solution: pip install python-magic-bin==0.4.14

Unsupervised - Silence removal

pyAudioAnalysis provides a semi-supervised silence removal function. It takes an uninterrupted audio recording and returns the endpoints of the segments that correspond to individual audio events, so the silent areas of the recording can be dropped.
This is achieved through the following steps:
- An SVM model is trained to distinguish between high-energy and low-energy short-term frames.
- The whole recording is passed to the SVM classifier, which produces a sequence of probabilities. Each probability is the level of confidence that the corresponding short-term frame belongs to an audio event.
- Dynamic thresholding is applied to that sequence to detect the active segments.
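
The thresholding step above can be sketched in plain Python. This is a simplified illustration and not the library's code: `active_segments` and its parameters are hypothetical names, and the threshold rule (a weighted mix of the mean of the lowest and the highest 10% of probabilities) is an assumption about how a `weight`-controlled dynamic threshold might behave.

```python
def active_segments(probs, step, weight=0.5):
    """Return [start, end] pairs (in seconds) where probs exceed a data-driven threshold.

    probs  : per-frame event probabilities (hypothetical classifier output)
    step   : frame step in seconds
    weight : between 0 and 1; higher values give a stricter threshold
    """
    ordered = sorted(probs)
    n_tail = max(1, len(ordered) // 10)          # size of the lowest/highest 10%
    low_mean = sum(ordered[:n_tail]) / n_tail
    high_mean = sum(ordered[-n_tail:]) / n_tail
    threshold = (1 - weight) * low_mean + weight * high_mean

    segments, start = [], None
    for i, p in enumerate(probs):
        if p > threshold and start is None:
            start = i                             # an active segment begins
        elif p <= threshold and start is not None:
            segments.append([start * step, i * step])
            start = None                          # the active segment ends
    if start is not None:                         # recording ends while still active
        segments.append([start * step, len(probs) * step])
    return segments


probs = [0.1, 0.1, 0.9, 0.9, 0.9, 0.1, 0.1, 0.8, 0.8, 0.1]
print(active_segments(probs, step=0.5))
```

Because the threshold is derived from the recording itself, the same `weight` can be reused across recordings with different loudness levels.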

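from pyAudioAnalysis import audioBasicIO as aIO
from pyAudioAnalysis import audioSegmentation as aS

# Read the sampling rate (Hz) and the samples of the recording
Fs, x = aIO.read_audio_file("data/recording1.wav")
# 20 ms short-term window and step; segments holds the endpoints of the detected audio events
segments = aS.silence_removal(x, Fs, 0.020, 0.020, smooth_window=1.0, weight=0.3, plot=True)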
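
Once the endpoints are known, the silent parts can be cut out of the signal. The helper below is a minimal sketch, assuming `segments` is a list of `[start, end]` pairs expressed in seconds; `keep_active_samples` is a hypothetical name and not part of pyAudioAnalysis.

```python
def keep_active_samples(signal, sampling_rate, segments):
    """Concatenate the samples that fall inside the given [start, end] pairs (seconds)."""
    kept = []
    for start_sec, end_sec in segments:
        start = int(round(start_sec * sampling_rate))  # seconds -> sample index
        end = int(round(end_sec * sampling_rate))
        kept.extend(signal[start:end])
    return kept


# Toy signal: 10 samples at 2 Hz, with two active segments
toy_signal = list(range(10))
print(keep_active_samples(toy_signal, 2, [[1.0, 2.0], [3.5, 4.5]]))
```

The function only uses slicing, so it should work the same way on a plain list or on the array returned by `read_audio_file`.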

Unsupervised - Speaker Diarization

Speaker diarization is the task of automatically determining “who spoke when” in a given audio recording.
It requires both segmentation and clustering: the recording is split into speech segments, which are then grouped into speaker-specific clusters.
The main algorithmic steps of the diarization are:
- Feature extraction (short-term and mid-term)
- FLsD (Fisher linear semi-discriminant analysis) step (optional)
- K-means clustering
- Smoothing
```
from pyAudioAnalysis import audioSegmentation

# Number of speakers expected in the recording
num_speakers = 4
audioSegmentation.speaker_diarization("data/diarizationExample.wav", num_speakers, plot_res=True)
```
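
To give a feel for the smoothing step listed above, here is a minimal sketch that works on a sequence of per-window cluster labels. It uses a sliding-window majority vote as a simplified stand-in; the smoothing inside pyAudioAnalysis may differ. `smooth_labels` and `labels_to_turns` are hypothetical helper names.

```python
from collections import Counter


def smooth_labels(labels, window=3):
    """Replace each label by the most frequent label in its neighbourhood."""
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        neighbourhood = labels[max(0, i - half): i + half + 1]
        smoothed.append(Counter(neighbourhood).most_common(1)[0][0])
    return smoothed


def labels_to_turns(labels, step):
    """Merge runs of equal labels into (start_sec, end_sec, speaker) turns."""
    turns, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            turns.append((start * step, i * step, labels[start]))
            start = i
    return turns


# Cluster labels for consecutive 0.5 s windows, with a few isolated errors
raw = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
clean = smooth_labels(raw, window=3)
print(clean)
print(labels_to_turns(clean, step=0.5))
```

Isolated label flips are absorbed by their neighbours, so the output contains two long speaker turns instead of several very short ones.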

Unsupervised - Audio thumbnailing

Audio thumbnailing refers to the unsupervised extraction of the most representative part of a music recording, such as the chorus.
In the pyAudioAnalysis library this is implemented in the musicThumbnailing function (named music_thumbnailing in recent releases).
In the self-similarity matrix of the recording, the automatically annotated diagonal segment marks the area where self-similarity is maximized, which defines the “most common segments” of the audio stream.
```
from pyAudioAnalysis import audioAnalysis
thumbSize = 25  # desired thumbnail length, in seconds
audioAnalysis.thumbnailWrapper("data/recording1.wav", thumbSize)
```
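
The idea of a diagonal segment with maximal self-similarity can be illustrated with a small, self-contained sketch. It is a simplified stand-in and not the library's algorithm: it assumes each frame is already described by a feature vector, uses cosine similarity, and searches exhaustively for the two non-overlapping stretches of `length` frames that resemble each other most. `cosine` and `best_repeated_segment` are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_repeated_segment(features, length):
    """Return (score, i, j): frames i..i+length-1 best match frames j..j+length-1."""
    n = len(features)
    # Self-similarity matrix: sim[i][j] compares frame i with frame j
    sim = [[cosine(features[i], features[j]) for j in range(n)] for i in range(n)]
    best = (-1.0, None, None)
    for i in range(n - length + 1):
        for j in range(i + length, n - length + 1):   # keep the two stretches disjoint
            # Mean similarity along the diagonal that starts at (i, j)
            score = sum(sim[i + k][j + k] for k in range(length)) / length
            if score > best[0]:
                best = (score, i, j)
    return best


# Toy "song": frames 0-1 are repeated at frames 4-5
frames = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0],
          [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
print(best_repeated_segment(frames, length=2))
```

The exhaustive search is quadratic in the number of frames, which is acceptable for a toy example but would need a smarter strategy on full-length recordings.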