Knowledge based semantic representation for semantic relatedness measurements

ir.Knowledge based semantic representation for semantic relatedness measurements.pdf - Accepted Version

Abstract

Textual analysis has become an increasingly important task owing to the rapid growth in the volume of text. Text is continuously generated in a variety of formats, including social media posts and chats, emails, articles, and news. Handling these texts requires efficient and effective procedures capable of dealing with the linguistic challenges that arise from the complexity of natural language. In recent years, much research has examined the use of semantic features from lexical sources to deal with synonymy and ambiguity in text mining tasks such as document clustering and classification. The main challenges in exploiting lexical knowledge sources such as WordNet are how to incorporate the different types of semantic relations so as to capture more semantic evidence, and how to handle the high dimensionality of current semantic representation approaches. This research proposes a new knowledge-based semantic representation approach for semantic relatedness measurement. A weighting-based method for incorporating the semantic relations in the lexical sources is proposed to form the representation vector of a word. The method depends on the topological parameters (depth, density, descendants, and ancestors) of the semantic taxonomy. To handle the high dimensionality of the weighting-based method, a new topic-based technique is introduced that represents the semantics of words in terms of topics instead of the concepts used in the weighting-based method. The proposed approach relies on the semantic features in lexical sources (such as WordNet) to handle synonymy and ambiguity. It is evaluated on semantic relatedness measurement using six gold-standard test sets. The evaluation results, in terms of correlation measures, demonstrate that the weighting-based method is more effective than state-of-the-art feature-based methods.
The correlation coefficients r and p were calculated for each dataset. On MC30, RG65, WordRel353, MT287, MEN3000, and Rgnew65, the proposed approach obtained r values of 0.82, 0.86, 0.52, 0.53, 0.89, and 0.47, and p values of 0.80, 0.82, 0.52, 0.47, 0.82, and 0.45, respectively. A harmonic measure, a mean-based summary of r and p, was also computed for each dataset so that an anomalous value of either coefficient is reflected in the summary; the resulting values are 0.81, 0.84, 0.46, 0.49, 0.52, and 0.86 for MC30, RG65, WordRel353, MT287, MEN3000, and Rgnew65, respectively. The non-zero (NZ) measure was used to determine the proportion of word pairs that receive a semantic relatedness value greater than zero. On MC30, RG65, WordRel353, MT287, MEN3000, and Rgnew65, the NZ values obtained in the experiments were 0.96, 0.95, 0.95, 0.87, 0.95, and 0.95, respectively.
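
The evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation, and it rests on assumptions that the abstract does not confirm: r is read as the Pearson correlation, p as the Spearman rank correlation, the harmonic measure as the harmonic mean of the two, and NZ as the fraction of word pairs with a relatedness score above zero. All function names and the sample scores are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(xs), ranks(ys))


def harmonic_mean(a, b):
    """Harmonic mean of two positive values (assumed 'harmonic measure')."""
    return 2 * a * b / (a + b)


def non_zero_ratio(scores):
    """Fraction of word pairs whose relatedness score is above zero."""
    return sum(1 for s in scores if s > 0) / len(scores)


# Hypothetical data: human judgements and model scores for five word pairs.
human = [3.9, 3.0, 1.2, 0.4, 0.1]
model = [0.90, 0.70, 0.40, 0.00, 0.10]

print("r  =", round(pearson(human, model), 2))
print("p  =", round(spearman(human, model), 2))
print("NZ =", non_zero_ratio(model))
```

Under these assumptions, harmonic_mean(0.82, 0.80) is approximately 0.81, which agrees with the value reported for MC30.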

Item Type:	Thesis (PhD)
Additional Information:	Thesis (Doctor of Philosophy) -- Universiti Malaysia Pahang – 2022, SV: Ts. Dr. Taha Hussein Alaaldeen Rassem, NO. CD: 13288
Uncontrolled Keywords:	semantic relatedness measurements
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science; T Technology > T Technology (General)
Faculty/Division:	Institute of Postgraduate Studies; Faculty of Computing
Depositing User:	Mr. Nik Ahmad Nasyrun Nik Abd Malik
Date Deposited:	23 May 2023 08:32
Last Modified:	18 Sep 2023 07:52
URI:	http://umpir.ump.edu.my/id/eprint/37679