Speaker Diarization

Speaker diarization is the task of answering the question "who spoke when".

The pyAudioAnalysis implementation is a variant based on [1].

The four main algorithmic steps are feature extraction, FLsD (optional), clustering, and smoothing:

  1. Feature extraction (short-term and mid-term): For each mid-term segment, the averages and standard deviations of the MFCCs are used, along with an estimate of the probability that the segment belongs to a male or a female speaker (model knnSpeakerFemaleMale).
  2. FLsD (optional): The mid-term feature statistic vectors in the original feature space are projected onto the FLsD subspace.
  3. Clustering: k-means clustering is applied in either the original feature space or the FLsD subspace. If the number of speakers is unknown, the clustering is repeated for a range of speaker counts and the Silhouette width criterion is used to select the optimal number of speakers.
  4. Smoothing: A combination of a median filtering step on the extracted cluster IDs and a Viterbi smoothing step.
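
The median filtering part of the smoothing step can be illustrated with a minimal sketch. This is not the library's own code: the function name median_smooth, the window size, and the edge handling (the window is simply truncated at the sequence boundaries) are assumptions made for illustration, and the Viterbi smoothing step is not covered.

```python
def median_smooth(cluster_ids, window=3):
    """Replace each cluster ID with the median of its neighborhood.

    Hypothetical helper for illustration only. `window` should be odd;
    near the edges the neighborhood is truncated rather than padded.
    """
    half = window // 2
    smoothed = []
    for i in range(len(cluster_ids)):
        lo = max(0, i - half)
        hi = min(len(cluster_ids), i + half + 1)
        neighborhood = sorted(cluster_ids[lo:hi])
        # Middle element of the sorted neighborhood (upper median if even).
        smoothed.append(neighborhood[len(neighborhood) // 2])
    return smoothed


# Isolated label flips inside an otherwise stable run are removed.
raw_ids = [0, 0, 0, 1, 0, 0, 2, 2, 2, 0, 2, 2]
smoothed_ids = median_smooth(raw_ids, window=3)
print(smoothed_ids)
```

A larger window removes longer spurious runs, at the cost of blurring short genuine speaker turns.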

Function speakerDiarization() in audioSegmentation.py extracts a sequence of audio segments and their respective cluster labels from an audio file.

Command-line example:

python audioAnalysis.py speakerDiarization -i data/diarizationExample.wav --num 4

This command takes 3 arguments:

  • -i <fileName>: the audio recording to use as input.
  • --num <numOfSpeakers>: the number of speakers in the recording (0 if unknown).
  • --flsd: flag that enables the FLsD method.
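
To make the argument layout concrete, here is a minimal argparse sketch that mirrors the three options listed above. It is an illustration only and not the actual parser in audioAnalysis.py: the function name build_parser, the destination names, and the choice of which options are required are all assumptions.

```python
import argparse


def build_parser():
    """Build a hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(prog="audioAnalysis.py")
    tasks = parser.add_subparsers(dest="task")
    diar = tasks.add_parser("speakerDiarization")
    # Assumption: the input file is mandatory.
    diar.add_argument("-i", dest="input", required=True,
                      help="input audio file")
    # Assumption: 0 (unknown) is used when --num is not given.
    diar.add_argument("--num", type=int, default=0,
                      help="number of speakers (0 for unknown)")
    diar.add_argument("--flsd", action="store_true",
                      help="enable the FLsD method")
    return parser


args = build_parser().parse_args(
    ["speakerDiarization", "-i", "data/diarizationExample.wav", "--num", "4"]
)
print(args.task, args.input, args.num, args.flsd)
```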

Function evaluateSpeakerDiarization() is used in speakerDiarization() to compare the extracted sequence of speaker labels with the ground-truth labels. It also computes the cluster purity and the speaker purity.
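
The two purity measures can be sketched as follows, assuming the common size-weighted definition: cluster purity is the fraction of frames that belong to the dominant speaker of their cluster, and speaker purity is the fraction of frames that fall in the dominant cluster of their speaker. The function name purity_scores is hypothetical, and the exact computation inside evaluateSpeakerDiarization() may differ.

```python
from collections import Counter


def purity_scores(cluster_ids, speaker_ids):
    """Return (cluster_purity, speaker_purity) for two aligned label sequences.

    Hypothetical helper for illustration; both inputs hold one label per frame.
    """
    if len(cluster_ids) != len(speaker_ids) or not cluster_ids:
        raise ValueError("label sequences must be non-empty and equally long")

    # Contingency counts: how often cluster c co-occurs with speaker s.
    pairs = Counter(zip(cluster_ids, speaker_ids))

    best_per_cluster = {}  # size of the dominant speaker in each cluster
    best_per_speaker = {}  # size of the dominant cluster for each speaker
    for (c, s), n in pairs.items():
        best_per_cluster[c] = max(best_per_cluster.get(c, 0), n)
        best_per_speaker[s] = max(best_per_speaker.get(s, 0), n)

    total = len(cluster_ids)
    cluster_purity = sum(best_per_cluster.values()) / total
    speaker_purity = sum(best_per_speaker.values()) / total
    return cluster_purity, speaker_purity


# Toy example: 8 frames, 3 extracted clusters, 2 true speakers.
extracted = [0, 0, 0, 1, 1, 1, 1, 2]
ground_truth = ["a", "a", "b", "b", "b", "b", "a", "a"]
cluster_purity, speaker_purity = purity_scores(extracted, ground_truth)
print(cluster_purity, speaker_purity)
```

Reporting both values matters because each can be inflated on its own: splitting the recording into many small clusters raises cluster purity, while merging everything into one cluster raises speaker purity.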

The ground-truth file is the file with the .segments extension.

Function speakerDiarizationEvaluateScript() is used to extract the overall performance measures for a set of audio recordings and their respective .segments files stored in a directory.

Figure 3: Result of function speakerDiarization() on data/diarizationExample.wav

Figure 3 shows the result of running python audioAnalysis.py speakerDiarization -i data/diarizationExample.wav --num 4

The red solid line is the ground-truth label and the blue solid line is the speaker diarization result. The cluster purity is 86.2% and the speaker purity is 86.2%.
