Question classification of CoQa (QCOC) dataset

Abbas Saliimi, Lokman and Mohamed Ariff, Ameedeen and Ngahzaifa, Ab. Ghani (2021) Question classification of CoQa (QCOC) dataset. In: 2021 International Conference on Software Engineering & Computer Systems and 4th International Conference on Computational Science and Information Management (ICSECS-ICOCSIM), 24-26 August 2021, Pekan. pp. 1-2. ISBN 978-166541407-4

Abstract

This paper proposes a new dataset for question classification. Named QCoC (Question Classification of CoQA), the dataset is built on Stanford’s CoQA (A Conversational Question Answering Challenge) dataset. QCoC contains 116,630 datapoints, the combined total of question-answer pairs in the CoQA training and evaluation sets. Common question classification datasets classify a question by the knowledge in its paired answer (the semantics of the answer’s context). QCoC instead classifies a question by the features of its answer (the semantics and syntax of the answer’s type). The paper reviews question classification datasets and QA datasets and justifies the choice of CoQA as the base for QCoC. It then specifies QCoC in subsections on class definition, classification method, and results. To the authors’ knowledge, no such dataset existed at the time of writing. The paper suggests that this type of dataset is useful for addressing the problem of abstractive answers in Question-Answering (QA) systems. While factual answers can be produced directly by a regular QA system, abstractive answers need additional components. Although the issue is recognized, the lack of a suitable dataset is perhaps the reason this direction has not been pursued. With the QCoC dataset made publicly available, the authors hope this direction is open for further exploration.
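
As an illustration of what "combined question-answer pairs" means in practice, the following minimal Python sketch pairs each question with its answer by turn and counts the pairs. It is not the authors' classification method. The field names (`data`, `questions`, `answers`, `turn_id`, `input_text`) are assumed to follow the layout of the public CoQA JSON files, and the function names and the in-memory `sample` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

# One datapoint: (story id, turn id, question text, answer text).
QAPair = Tuple[str, int, str, str]


def iter_qa_pairs(coqa: Dict) -> Iterator[QAPair]:
    """Yield one question-answer pair per conversational turn.

    Assumes each story holds parallel "questions" and "answers" lists
    whose entries share a "turn_id" (assumed CoQA-style layout).
    """
    for story in coqa["data"]:
        # Index answers by turn so questions can be matched explicitly.
        answers = {a["turn_id"]: a["input_text"] for a in story["answers"]}
        for question in story["questions"]:
            turn = question["turn_id"]
            if turn in answers:  # skip any question without an answer
                yield story["id"], turn, question["input_text"], answers[turn]


def count_pairs(splits: Iterable[Dict]) -> int:
    """Count pairs across several splits, e.g. training plus evaluation."""
    return sum(1 for split in splits for _ in iter_qa_pairs(split))


# Tiny hypothetical stand-in for a CoQA split; real files are much larger.
sample = {
    "data": [
        {
            "id": "story-1",
            "questions": [
                {"turn_id": 1, "input_text": "Who had a cat?"},
                {"turn_id": 2, "input_text": "Was it white?"},
            ],
            "answers": [
                {"turn_id": 1, "input_text": "Anna"},
                {"turn_id": 2, "input_text": "yes"},
            ],
        },
        {
            "id": "story-2",
            "questions": [{"turn_id": 1, "input_text": "Where did they go?"}],
            "answers": [{"turn_id": 1, "input_text": "to the park"}],
        },
    ]
}

pairs = list(iter_qa_pairs(sample))
print(len(pairs))
```

With the real files, each split would be read with `json.load` and both passed to `count_pairs`. A classifier of the kind the abstract describes would then assign a label to each pair based on the answer text, which this sketch leaves out.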

Item Type: Conference or Workshop Item (Lecture)
Additional Information: Indexed by Scopus
Uncontrolled Keywords: Dataset; QA system; Natural language processing
Subjects: Q Science > QA Mathematics > QA76 Computer software
Faculty/Division: Faculty of Computing
Depositing User: Pn. Hazlinda Abd Rahman
Date Deposited: 22 Jun 2022 01:57
Last Modified: 22 Jun 2022 01:57
URI: http://umpir.ump.edu.my/id/eprint/33107