This was my bachelor's (undergraduate) thesis from 2009. The aim was to distinguish accurately when Armin van Buuren was talking, regardless of background silence, music, or other voices (sung or spoken). We achieved strong empirical results, which could be improved further with some basic domain-specific heuristics or compromises on the feature parameters. SVM and Bayesian Logistic Regression produced particularly encouraging results: both yielded ~98% overall classification accuracy and ~99% F-score on the speech class on the highest model, where we increased the verbosity of the underlying feature set. SVM, however, was the most robust across feature variations, significantly outperforming the other classifiers when the feature set was less verbose.
Spectrogram of human speech over music.