A New Unsupervised Technique to Analyze the Centroid and Frequency of Keyphrases from Academic Articles

Miah, Mohammad Badrul Alam and Suryanti, Awang and Rahman, Md Mustafizur and Sanwar Hosen, A. S. M. and Ra, In-Ho (2022) A New Unsupervised Technique to Analyze the Centroid and Frequency of Keyphrases from Academic Articles. Electronics, 11 (17). pp. 1-20. ISSN 2079-9292. (Published)

A New Unsupervised Technique to Analyze.pdf
Available under License Creative Commons Attribution.

Abstract

Automated keyphrase extraction is crucial for extracting and summarizing relevant information from publications across multiple domains. However, extracting good-quality keyphrases and summarizing information to a good standard have become extremely challenging because of the advancement of technology and the exponential growth of digital sources and textual information. As a result, the use of keyphrase features in keyphrase extraction techniques has recently gained tremendous popularity. This paper proposes a new unsupervised region-based keyphrase centroid and frequency analysis technique, named the KCFA technique, to serve as a feature for keyphrase extraction. The proposed technique has five main processes: dataset collection, data pre-processing, statistical methodologies, curve plotting analysis, and curve fitting. First, the technique collects multiple datasets from diverse sources and passes them to the data pre-processing step, which applies several text pre-processing operations. The pre-processed data then go to the region-based statistical methodologies, followed by the curve plotting analysis and, lastly, the curve fitting technique. The proposed technique is tested and evaluated on ten (10) of the best accessible benchmark datasets from various disciplines and compared to our available methods to demonstrate its efficacy, advantages, and importance. The experimental results show that the proposed method works well for analyzing the centroid and frequency of keyphrases in academic articles. It provides a centroid of 706.66 and a frequency of 38.95% in the first region, and 2454.21 and 7.98% in the second region, for a total frequency of 68.11%.
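
The abstract does not define how regions, centroids, or frequencies are computed, so the following is only a minimal sketch under stated assumptions, not the authors' KCFA implementation. It assumes that each keyphrase occurrence is recorded as a word offset from the start of the article, that a region is a half-open range of offsets, that the centroid of a region is the mean offset of the occurrences inside it, and that the frequency is the share of all occurrences falling in that region. The function name `region_stats`, the region boundaries, and the sample offsets are hypothetical.

```python
from statistics import mean


def region_stats(positions, boundaries):
    """Summarize keyphrase occurrences per region (illustrative sketch only).

    positions  -- word offsets of keyphrase occurrences (assumed representation)
    boundaries -- ascending exclusive upper bounds of consecutive regions,
                  starting from offset 0 (assumed region definition)
    """
    total = len(positions)
    stats = []
    lower = 0
    for upper in boundaries:
        # Occurrences whose offset falls in the half-open range [lower, upper).
        in_region = [p for p in positions if lower <= p < upper]
        stats.append({
            "range": (lower, upper),
            # Centroid taken as the mean offset; None when the region is empty.
            "centroid": mean(in_region) if in_region else None,
            # Share of all occurrences in this region, as a percentage.
            "frequency_pct": 100.0 * len(in_region) / total if total else 0.0,
        })
        lower = upper
    return stats


# Hypothetical offsets; the last one lies beyond both regions.
sample_positions = [10, 50, 120, 700, 1500, 2600, 3000, 5200]
for region in region_stats(sample_positions, boundaries=[1000, 4000]):
    print(region)
```

Because occurrences beyond the last boundary are counted in the total but assigned to no region, the per-region frequencies in this sketch need not sum to 100%.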

Item Type: Article
Uncontrolled Keywords: Keyphrase extraction; KCFA technique; Data pre-processing; Curve plotting; Curve fitting technique; Feature; Keyphrase centroid; Keyphrase frequency
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
T Technology > TA Engineering (General). Civil engineering (General)
Faculty/Division: Institute of Postgraduate Studies
College of Engineering
Faculty of Computing
Depositing User: Noorul Farina Arifin
Date Deposited: 06 Sep 2022 08:14
Last Modified: 05 Jan 2024 07:46
URI: http://umpir.ump.edu.my/id/eprint/35105