Sentence Segmentation Plan
match 1,2,3,4 audio (match-based method)
librosa.sequence.dtw
frequency domain noise reduction + silence removal (frequency-based method)
librosa temporal segmentation api
librosa.segment.agglomerative (feature clustering-based method)
waveform threshold (envelope) temporal-based method
pyAudioAnalysis->audioSegmentation.py->silenceRemoval()
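
A rough sketch of the waveform threshold (envelope) idea, for reference while planning. It is not a reproduction of pyAudioAnalysis's silenceRemoval(); it only illustrates the plain temporal approach: compute a short-time RMS envelope, mark frames above a threshold as speech, and cut wherever the silence lasts long enough. The function names (frame_rms, split_on_silence) and all defaults (25 ms frames, 10 ms hop, threshold at 10% of the peak envelope, 200 ms minimum pause) are assumptions chosen for illustration, not values from these notes.

```python
import math


def frame_rms(samples, frame_len, hop_len):
    """Short-time RMS envelope: one value per frame of the waveform."""
    if len(samples) == 0:
        return []
    last_start = max(len(samples) - frame_len, 0)
    envelope = []
    for start in range(0, last_start + 1, hop_len):
        frame = samples[start:start + frame_len]
        envelope.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return envelope


def split_on_silence(samples, sr, frame_ms=25, hop_ms=10,
                     threshold_ratio=0.1, min_silence_ms=200):
    """Return (start_sec, end_sec) pairs for regions whose envelope
    exceeds threshold_ratio * peak, split at pauses >= min_silence_ms.
    All defaults are illustrative assumptions."""
    frame_len = max(1, int(sr * frame_ms / 1000))
    hop_len = max(1, int(sr * hop_ms / 1000))
    envelope = frame_rms(samples, frame_len, hop_len)
    if not envelope:
        return []

    threshold = threshold_ratio * max(envelope)
    min_gap = max(1, int(min_silence_ms / hop_ms))  # pause length in frames

    segments = []      # (first_voiced_frame, last_voiced_frame)
    start = None       # first frame of the segment being built
    silent_run = 0     # consecutive silent frames inside that segment
    for i, value in enumerate(envelope):
        if value > threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_gap:
                # Pause is long enough: close the segment at the
                # last voiced frame before the pause began.
                segments.append((start, i - silent_run))
                start = None
                silent_run = 0
    if start is not None:
        segments.append((start, len(envelope) - 1 - silent_run))

    # Convert frame indices to seconds.
    return [(s * hop_len / sr, (e * hop_len + frame_len) / sr)
            for s, e in segments]
```

Calling split_on_silence(samples, sr) on a mono waveform (a plain list or array of floats) gives one (start, end) pair per candidate sentence. Pauses shorter than min_silence_ms stay inside a segment, so that parameter is the main knob for telling sentence breaks from within-sentence gaps.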