Keyphrases Concentrated Area Identification from Academic Articles as Feature of Keyphrase Extraction: A New Unsupervised Approach

Miah, Mohammad Badrul Alam and Suryanti, Awang and Md Saiful, Azad and Rahman, Md Mustafizur (2022) Keyphrases Concentrated Area Identification from Academic Articles as Feature of Keyphrase Extraction: A New Unsupervised Approach. International Journal of Advanced Computer Science and Applications (IJACSA), 13 (1). pp. 788-796. ISSN 2156-5570 (Online). (Published)

Keyphrases Concentrated Area Identification.pdf
Available under License Creative Commons Attribution.

Abstract

Extracting high-quality keywords and summarising documents at a high level has become more difficult in current research because of technological advancements and the exponential expansion of textual data and digital sources. Both tasks depend on features for keyphrase extraction, which are becoming more popular. This study proposes a new unsupervised keyphrase concentrated area (KCA) identification approach as a feature of keyphrase extraction. The approach is corpus, domain, and language independent; it is document length-free; and it can be utilized by both supervised and unsupervised techniques. The proposed system has three phases: data pre-processing, data processing, and KCA identification. The system applies various text pre-processing methods to the acquired datasets and passes the pre-processed data to the data processing step. Statistical approaches, curve plotting, and a curve fitting technique are applied in the KCA identification step. The proposed system is then tested and evaluated using benchmark datasets collected from various sources. To demonstrate the effectiveness, merits, and significance of the proposed approach, we compared it with other proposed techniques. The experimental results on eleven (11) datasets show that the proposed approach effectively recognizes the KCA in articles and significantly enhances current keyphrase extraction methods across various text sizes, languages, and domains.
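
The abstract names the three phases but does not give the statistics or the curve model used, so the following is only an illustrative sketch of how such a pipeline might look, not the authors' implementation. It assumes a list of known keyphrases, ten position bins, a quadratic least-squares fit, and a cutoff at half the fitted peak; all function names and parameters are hypothetical.

```python
import re

def tokenize(text):
    """Pre-processing stand-in: lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def phrase_positions(tokens, phrases):
    """Processing stand-in: relative position (0..1) of each phrase occurrence."""
    positions = []
    n = len(tokens)
    for phrase in phrases:
        words = tokenize(phrase)
        k = len(words)
        if k == 0:
            continue
        for i in range(n - k + 1):
            if tokens[i:i + k] == words:
                positions.append(i / n)
    return positions

def histogram(positions, bins=10):
    """Count occurrences per equal-width slice of the document."""
    counts = [0] * bins
    for p in positions:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts

def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x^2 via the normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the 3x4 system.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def concentrated_bins(counts, threshold_ratio=0.5):
    """Assumed KCA rule: bins whose fitted value reaches a share of the peak."""
    if not any(counts):
        return []
    bins = len(counts)
    xs = [(i + 0.5) / bins for i in range(bins)]  # bin centres
    a, b, c = fit_quadratic(xs, counts)
    fitted = [a + b * x + c * x * x for x in xs]
    cutoff = threshold_ratio * max(fitted)
    return [i for i, v in enumerate(fitted) if v >= cutoff]

# Toy document: the phrase appears only in the first tenth of the text.
text = "keyphrase extraction " * 5 + "filler " * 90
counts = histogram(phrase_positions(tokenize(text), ["keyphrase extraction"]))
kca = concentrated_bins(counts)
```

In this toy input every occurrence falls in the first bin, so the fitted curve peaks at the start of the document and the selected bins cluster there. A real system would need to choose the curve family and cutoff from data rather than use the fixed values assumed here.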

Item Type: Article
Uncontrolled Keywords: Keyphrase concentrated area; KCA identification; Feature extraction; Data processing; Keyphrase extraction; Curve fitting
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Faculty/Division: Institute of Postgraduate Studies; College of Engineering; Faculty of Computing
Depositing User: Noorul Farina Arifin
Date Deposited: 07 Feb 2022 08:42
Last Modified: 05 Jan 2024 01:56
URI: http://umpir.ump.edu.my/id/eprint/33320
