Mohammad Badrul Alam, Miah (2024) An extended tree-based keyphrase extraction technique (etket) for academic articles based on syntactic features. PhD thesis, Universiti Malaysia Pahang Al-Sultan Abdullah (Contributors, Thesis advisor: Suryanti, Awang).
PDF: An extended tree-based keyphrase extraction technique (etket) for academic articles based on syntactic features.pdf - Accepted Version (5MB)
Abstract
Recently, automatic keyphrase extraction (AKE) has faced challenges in extracting high-quality keyphrases and summarizing information at a superior level due to technological advancements and the exponential growth of digital sources and textual information. Machine learning, including unsupervised AKE, depends on feature extraction to extract relevant features and on a ranking procedure to choose significant keyphrases. However, existing unsupervised AKE has some limitations, including the inability to recognize appropriate features that provide diversity and topical coverage, which are occasionally neglected, and misguided ranking procedures. In addition, the existing tree-based technique does not use feature extraction, which is vital for achieving good performance, and uses term frequency (TF) as its only key feature. This misguides the ranking procedure when selecting the most significant keyphrases, because the TF values of irrelevant keyphrases are higher than those of relevant keyphrases. Therefore, this thesis sought to develop an extended tree-based keyphrase extraction technique (ETKET) by proposing new keyphrase features with new formulas and an extended ranking procedure to select the topmost significant keyphrases from academic articles. The proposed technique consists of five main processes: data collection and preprocessing; candidate keyphrase selection; candidate keyphrase processing to select the final candidate keyphrases using a keyphrase extraction (KePhEx) tree; feature extraction to extract new features such as keyphrase frequency, keyphrase centroid, keyphrase distance, keyphrase concentration area, keyphrase position, and keyphrase positions in different sentences; and finally, an extended ranking procedure to select the topmost significant keyphrases. The proposed technique was evaluated on five widely used benchmark datasets of long documents (SemEval2010, Schutz2008, Nguyen2007, Citeulike180, and Cacic) to measure its performance and effectiveness. The obtained results were then compared with state-of-the-art techniques, showing that the proposed technique outperformed others in terms of precision, recall, and F1-score. Thus, the results proved that the ETKET was able to extract the topmost significant keyphrases.
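For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how precision, recall, and F1-score are commonly computed for keyphrase extraction. It assumes exact matching after lowercasing; the abstract does not state the matching rule used in the thesis (stemmed matching is also common), and the function name and sample phrases are hypothetical.

```python
def evaluate_keyphrases(predicted, gold):
    """Return (precision, recall, f1) for one document.

    Assumption: a predicted keyphrase counts as correct only if it
    matches a gold keyphrase exactly after lowercasing and trimming.
    """
    pred = {p.strip().lower() for p in predicted}
    ref = {g.strip().lower() for g in gold}
    matched = len(pred & ref)

    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(ref) if ref else 0.0
    # F1 is the harmonic mean of precision and recall (0 when nothing matches).
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


# Hypothetical sample data, for illustration only.
predicted = ["keyphrase extraction", "term frequency", "digital sources"]
gold = ["Keyphrase Extraction", "ranking procedure",
        "term frequency", "syntactic features"]

precision, recall, f1 = evaluate_keyphrases(predicted, gold)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In this example two of the three predicted phrases appear among the four gold phrases, so precision is 2/3 and recall is 2/4.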
Item Type: | Thesis (PhD) |
---|---|
Additional Information: | Thesis (Doctor of Philosophy) -- Universiti Malaysia Pahang – 2024, SV: Assoc. Prof. Ts. Dr. Suryanti Awang, NO.CD:3241 |
Uncontrolled Keywords: | automatic keyphrase extraction (AKE) |
Subjects: | Q Science > QA Mathematics > QA76 Computer software; T Technology > T Technology (General) |
Faculty/Division: | Institute of Postgraduate Studies; Faculty of Computing |
Depositing User: | Mr. Mohd Fakhrurrazi Adnan |
Date Deposited: | 07 May 2025 07:08 |
Last Modified: | 07 May 2025 07:08 |
URI: | http://umpir.ump.edu.my/id/eprint/44274 |