Speaker Diarization

Speaker diarization is the task of answering the question "who spoke when".

The pyAudioAnalysis implementation is a variant of the method described in [1].

There are four main algorithmic steps:

1. Feature extraction (short-term and mid-term): for each mid-term segment, the averages and standard deviations of the MFCCs are used, along with an estimate of the probability that the segment belongs to a male or a female speaker (model knnSpeakerFemaleMale).
2. FLsD (optional): the mid-term feature statistic vectors are projected from the original feature space onto the FLsD subspace.
3. Clustering: k-means clustering is applied in either the original feature space or the FLsD subspace. If the number of speakers is unknown, the clustering is repeated for a range of speaker counts and the Silhouette width criterion is used to select the optimal number of speakers.
4. Smoothing: a combination of a median filtering step on the extracted cluster IDs and a Viterbi smoothing step.
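
The speaker-count selection and the median filtering described above can be illustrated with a minimal pure-Python sketch. It is not the pyAudioAnalysis code: the function names (`silhouette_width`, `median_smooth`) are hypothetical, the candidate labellings are assumed to come from separate k-means runs that are not shown, and the Viterbi smoothing step is omitted.

```python
def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def silhouette_width(points, labels):
    """Mean silhouette width of a labelling (higher means better-separated clusters)."""
    if len(set(labels)) < 2:
        raise ValueError("silhouette width needs at least two clusters")
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # Distances from point i to every other point, grouped by cluster.
        by_cluster = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(other, []).append(euclidean(p, q))
        if lab not in by_cluster:
            scores.append(0.0)  # singleton cluster: silhouette is taken as 0
            continue
        a = sum(by_cluster[lab]) / len(by_cluster[lab])  # mean intra-cluster distance
        b = min(sum(d) / len(d) for c, d in by_cluster.items() if c != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def median_smooth(labels, width=3):
    """Median-filter a sequence of integer cluster IDs (odd width, edges replicated)."""
    if not labels:
        return []
    half = width // 2
    padded = [labels[0]] * half + list(labels) + [labels[-1]] * half
    # With an odd window the median is always one of the IDs in the window.
    return [sorted(padded[i:i + width])[half] for i in range(len(labels))]
```

Given one labelling per candidate speaker count, for example a dictionary `candidates` mapping each count to its labels, the count can be chosen with `max(candidates, key=lambda k: silhouette_width(points, candidates[k]))`, and `median_smooth` can then remove isolated label flips from the winning sequence.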

Function speakerDiarization() in audioSegmentation.py extracts a sequence of audio segments and their respective cluster labels, given an audio file.
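
To make the idea of "segments and respective cluster labels" concrete, the sketch below merges a per-window label sequence into (start, end, label) segments. The helper name `labels_to_segments` and the `step` parameter (the hop between consecutive mid-term windows, in seconds) are assumptions for illustration and are not taken from the library.

```python
def labels_to_segments(labels, step):
    """Merge runs of identical per-window labels into (start, end, label) tuples.

    `step` is the assumed time, in seconds, between consecutive windows.
    """
    segments = []
    if not labels:
        return segments
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the sequence or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start * step, i * step, labels[start]))
            start = i
    return segments
```

For example, with a step of 0.5 seconds the labels `[0, 0, 1, 1, 1, 0]` become three segments: speaker 0 from 0.0 to 1.0 s, speaker 1 from 1.0 to 2.5 s, and speaker 0 from 2.5 to 3.0 s.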

Command-line example:

python audioAnalysis.py speakerDiarization -i data/diarizationExample.wav --num 4

This command takes three arguments:

-i <fileName>, where fileName is the audio recording to use as input.
--num <numOfSpeakers>, where numOfSpeakers is the number of speakers in the recording (0 if unknown).
--flsd, a flag that enables the FLsD method.
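
As an illustration of how these three arguments fit together, here is a minimal argparse sketch that accepts the same command line. It is an assumption-based mock-up, not the parser in audioAnalysis.py: `build_parser`, the `dest` names, and the choice to make `-i` and `--num` required are all hypothetical.

```python
import argparse


def build_parser():
    """Build a hypothetical parser mirroring the speakerDiarization arguments."""
    parser = argparse.ArgumentParser(prog="audioAnalysis.py")
    tasks = parser.add_subparsers(dest="task", required=True)
    dia = tasks.add_parser("speakerDiarization")
    dia.add_argument("-i", dest="input", required=True,
                     help="input audio file")
    dia.add_argument("--num", type=int, required=True,
                     help="number of speakers (0 for unknown)")
    dia.add_argument("--flsd", action="store_true",
                     help="enable the FLsD step")
    return parser
```

Parsing `["speakerDiarization", "-i", "data/diarizationExample.wav", "--num", "4"]` with this sketch yields `num == 4` and `flsd == False`; adding `--flsd` switches the flag on.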

Function evaluateSpeakerDiarization() is used in speakerDiarization() to compare the extracted sequence of speaker labels with the ground-truth labels. It also computes the cluster purity and the speaker purity.
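
The two purity measures can be sketched as follows, assuming the common size-weighted definition: cluster purity is the fraction of windows that belong to the dominant speaker of their cluster, and speaker purity is the fraction that fall in the dominant cluster of their speaker. The function name `purities` is hypothetical, and the library may differ in its details.

```python
from collections import Counter


def purities(cluster_labels, speaker_labels):
    """Return (cluster purity, speaker purity) as fractions in [0, 1]."""
    if len(cluster_labels) != len(speaker_labels) or not cluster_labels:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(cluster_labels)
    # Contingency counts: how many windows have cluster c and true speaker s.
    counts = Counter(zip(cluster_labels, speaker_labels))
    best_in_cluster = {}
    best_for_speaker = {}
    for (c, s), count in counts.items():
        best_in_cluster[c] = max(best_in_cluster.get(c, 0), count)
        best_for_speaker[s] = max(best_for_speaker.get(s, 0), count)
    cluster_purity = sum(best_in_cluster.values()) / n
    speaker_purity = sum(best_for_speaker.values()) / n
    return cluster_purity, speaker_purity
```

The two measures pull in opposite directions: splitting one speaker across several clusters keeps cluster purity high but lowers speaker purity, while merging several speakers into one cluster does the reverse.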

The ground-truth file is the file with the .segments extension.
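
The layout of a .segments file is not described here. The sketch below assumes one comma-separated line per segment in the form start,end,label, with times in seconds; that layout, and the helper name `parse_segments`, are assumptions that should be checked against the actual ground-truth files.

```python
import csv
import io


def parse_segments(text):
    """Parse assumed 'start,end,label' lines into (float, float, str) tuples."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        start, end, label = row
        rows.append((float(start), float(end), label.strip()))
    return rows
```

In practice the text would be read from the ground-truth file, for example with `open(path).read()`, before being passed to the parser.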

Function speakerDiarizationEvaluateScript() is used to extract the overall performance measures for a set of audio recordings and their respective `.segments` files, stored in a directory.
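
A batch evaluation of this kind first has to match each recording with its ground-truth file. The sketch below does that for a list of file names, assuming that a recording and its ground truth share the same base name (for example a.wav and a.segments). The helper `pair_recordings` and that naming convention are illustrative assumptions.

```python
from pathlib import PurePath


def pair_recordings(filenames):
    """Return sorted (wav, segments) name pairs that share a base name."""
    names = set(filenames)
    pairs = []
    for name in sorted(names):
        path = PurePath(name)
        if path.suffix == ".wav":
            ground_truth = str(path.with_suffix(".segments"))
            # Keep only recordings that actually have a ground-truth file.
            if ground_truth in names:
                pairs.append((name, ground_truth))
    return pairs
```

The input can be the result of `os.listdir(directory)`; recordings without a matching ground-truth file are skipped.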

Figure 3: result of function speakerDiarization() on data/diarizationExample.wav

Figure 3 is the result of running python audioAnalysis.py speakerDiarization -i data/diarizationExample.wav --num 4

The red solid line is the ground-truth label sequence and the blue solid line is the speaker diarization result; the cluster purity is 86.2% and the speaker purity is 86.2%.
