Study of keyword extraction techniques for electric double-layer capacitor domain using text similarity indexes: An experimental analysis

Miah, M. Saef Ullah and Junaida, Sulaiman and Talha, Sarwar and Kamal Z., Zamli and Jose, Rajan (2021) Study of keyword extraction techniques for electric double-layer capacitor domain using text similarity indexes: An experimental analysis. Complexity, 2021 (8192320). pp. 1-12. ISSN 1076-2787. (Published)

Study of keyword extraction techniques for electric double-layer capacitor domain.pdf
Available under License Creative Commons Attribution.


Abstract

Keywords play a significant role in selecting topic-related documents. Keywords or topics assigned by human experts provide accurate information, but this practice is expensive in terms of time and resources. Hence, automated keyword extraction techniques are preferable. Before adopting an automated process, however, it is necessary to check how similar the algorithm-generated keywords are to the expert-provided ones. This paper presents an experimental analysis of the similarity scores between keywords generated by different supervised and unsupervised automated keyword extraction algorithms and expert-provided keywords from the electric double-layer capacitor (EDLC) domain. The paper also analyses which text yields better keywords: only the positive sentences or all sentences of the document. The unsupervised algorithms employed for keyword extraction are YAKE, TopicRank, MultipartiteRank, and KPMiner; the supervised algorithms are KEA and WINGNUS. To assess the similarity of the extracted keywords to the expert-provided keywords, three similarity indexes are used: Jaccard, cosine, and cosine with word vectors. The experiment shows that the MultipartiteRank keyword extraction technique, measured with the cosine with word vector similarity index, produces the best result, with 92% similarity to the expert-provided keywords. This study can help NLP researchers working with the EDLC domain or with recommender systems to select more suitable keyword extraction and similarity index calculation techniques.
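
To make the similarity indexes named in the abstract concrete, the following is a minimal sketch of the Jaccard and plain cosine indexes applied to two keyword lists. It is not the authors' implementation: the token-level comparison, the function names, and the example keywords are all assumptions made for illustration. The cosine with word vector variant is not shown, since it requires a pretrained embedding model.

```python
import math
from collections import Counter


def tokenize(keywords):
    """Lowercase each keyphrase and split it into word tokens."""
    return [tok for phrase in keywords for tok in phrase.lower().split()]


def jaccard_similarity(a, b):
    """Jaccard index over unique tokens: |A & B| / |A | B|."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a, b):
    """Cosine of the angle between the two token-count vectors."""
    vec_a, vec_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical keyword lists, invented for this example (not from the paper).
expert = ["activated carbon", "specific capacitance", "electrolyte"]
extracted = ["activated carbon", "capacitance", "electrode"]

print(jaccard_similarity(expert, extracted))
print(cosine_similarity(expert, extracted))
```

Because Jaccard only rewards exact token overlap while cosine weights tokens by frequency, the two indexes can rank the same extractor differently; a word-vector cosine would additionally credit near-synonyms that share no tokens.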

Item Type: Article
Additional Information: Indexed by Scopus
Uncontrolled Keywords: Automation, Electrochemistry, Extraction
Subjects: Q Science > QA Mathematics > QA76 Computer software
Q Science > QD Chemistry
T Technology > T Technology (General)
Faculty/Division: Faculty of Industrial Sciences And Technology
Institute of Postgraduate Studies
Faculty of Computing
Depositing User: Mr Muhamad Firdaus Janih@Jaini
Date Deposited: 13 Jan 2022 03:17
Last Modified: 13 Jan 2022 03:17
URI: http://umpir.ump.edu.my/id/eprint/33202