UMP Institutional Repository

Result comparison of model validation techniques on audio-visual speech recognition

Thum, Wei Seong and M. Z., Ibrahim and Nurul Wahidah, Arshad and D.J., Mulvaney (2017) Result comparison of model validation techniques on audio-visual speech recognition. In: IT Convergence and Security 2017. Lecture Notes in Electrical Engineering, 449. Springer, Singapore, Berlin, Germany, pp. 1-8. ISBN 978-981-10-6450-0 (Print); 978-981-10-6451-7 (online)

Abstract

This paper implements and compares the performance of a number of techniques proposed for improving the accuracy of Automatic Speech Recognition (ASR) systems. Because the audio input to an audio-only ASR system can be contaminated by environmental noise, some applications may perform better with Audio-Visual Speech Recognition (AVSR), in which recognition uses both audio information and mouth movements obtained from a video recording of the speaker’s face region. In this paper, three model validation techniques, namely the holdout method, leave-one-out cross validation and bootstrap validation, are implemented both to validate the performance of an AVSR system and to compare the validation techniques themselves. A new speech data corpus is used, the Loughborough University Audio-Visual (LUNA-V) dataset, which contains 10 speakers with five sets of samples uttered by each speaker. The database is divided into training and testing sets and processed in a manner suited to each validation technique under investigation. Performance is evaluated over a range of signal-to-noise ratio values and a variety of noise types obtained from the NOISEX-92 dataset.
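
The three validation techniques named in the abstract differ mainly in how the data are partitioned into training and testing sets. The following is a minimal sketch in Python, not the authors' implementation: it assumes, purely for illustration, that the unit being partitioned is the five sample sets recorded per speaker, and the function names and the 20% holdout fraction are hypothetical. The abstract does not state how the LUNA-V data were actually divided.

```python
import random


def holdout_split(items, test_fraction=0.2, seed=0):
    """Holdout: shuffle once, reserve a fixed fraction for testing.

    The 20% default is an assumption, not a value taken from the paper.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def leave_one_out_splits(items):
    """Leave-one-out: each item is the test set exactly once."""
    items = list(items)
    for i in range(len(items)):
        yield items[:i] + items[i + 1:], [items[i]]


def bootstrap_split(items, seed=0):
    """Bootstrap: draw len(items) training items with replacement.

    Items never drawn (the "out-of-bag" items) form the test set,
    which can occasionally be empty for very small collections.
    """
    rng = random.Random(seed)
    items = list(items)
    train = [rng.choice(items) for _ in items]
    drawn = set(train)
    test = [x for x in items if x not in drawn]
    return train, test


# Hypothetical labels for the five sample sets per speaker.
sample_sets = [1, 2, 3, 4, 5]

for train, test in leave_one_out_splits(sample_sets):
    print("train:", train, "test:", test)
```

Holdout trains a single model, leave-one-out trains one model per item, and bootstrap is typically repeated over many resamples, so the three techniques trade computation against the variance of the accuracy estimate.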

Item Type: Book Section
Uncontrolled Keywords: Audio-visual speech recognition; Hidden Markov model; HTK Toolkit; Holdout validation; Leave one out cross validation
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
T Technology > TK Electrical engineering. Electronics Nuclear engineering
Faculty/Division: Faculty of Electrical & Electronic Engineering
Depositing User: Pn. Hazlinda Abd Rahman
Date Deposited: 24 May 2018 06:40
Last Modified: 18 Jul 2018 05:01
URI: http://umpir.ump.edu.my/id/eprint/20566