A semantic taxonomy for weighting assumptions to reduce feature selection from social media and forum posts

Ali Muttaleb, Hasan and Rassem, Taha H. and Noorhuzaimi@Karimah, Mohd Noor and Ahmed Muttaleb, Hasan (2020) A semantic taxonomy for weighting assumptions to reduce feature selection from social media and forum posts. In: 4th International Conference of Reliable Information and Communication Technology, IRICT 2019, 22-23 September 2019, Johor Bahru, Malaysia. pp. 407-419, 1073. ISSN 2194-5357 ISBN 978-303033581-6


Numerous researchers have worked on the knowledge-based semantics of words to resolve the ambiguity of synonyms (https://github.com/alimuttaleb/Ali-Muttaleb/blob/master/Synonym.txt) in various natural-language processing settings, such as Wikipedia, websites, and social networks. This paper attempts to clarify ambiguities in the lexical semantics of taxonomy in social media. It proposes a new knowledge-based semantic representation approach that can handle the ambiguity and high-dimensionality issues in text mining. The proposed approach consists of two main components: a feature-based method for incorporating the relationships between lexical sources, and a topic-based reduction method to overcome high dimensionality. These components help weight and reduce the relevant features of a concept. The proposed approach captures further lexical semantic similarity between words. It also evaluates the use of WordNet 3.1 (https://wordnet.princeton.edu) in text clustering and the constant weighting assumption in the feature-based method used to select concepts/words from social media. To address ambiguity, the semantics of concepts are represented with a small, reduced feature subset, and the performance of the semantic similarity measurement is improved. The proposed method evaluates word semantic similarity using the MC30 dataset (https://github.com/alimuttaleb/semantictaxonomy/blob/master/mc30.txt) in WordNet and obtains the following results for semantic representation: r = 0.82, p = 0.81, m = 0.81, and nz = 0.96.
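As a rough illustration of the general idea of feature-based similarity with a constant weighting assumption, followed by correlation against human judgements, a minimal Python sketch is given here. It is not the authors' implementation: the Tversky-style ratio, the function names (feature_similarity, pearson), the toy feature sets, the weight alpha = 0.5, and the human ratings are all assumptions chosen for illustration. In practice the feature sets would presumably be drawn from WordNet glosses and synonyms, and the ratings from a benchmark such as MC30.

```python
from math import sqrt

# Hypothetical feature sets; real features would come from a lexical source
# such as WordNet glosses and synonyms. These values are made up.
FEATURES = {
    "car": {"vehicle", "wheel", "motor", "road"},
    "automobile": {"vehicle", "wheel", "motor", "road"},
    "journey": {"travel", "road", "distance"},
    "voyage": {"travel", "sea", "distance"},
    "noon": {"time", "midday"},
    "string": {"cord", "fiber"},
}


def feature_similarity(a, b, alpha=0.5):
    """Tversky-style ratio of shared features to shared plus distinct features.

    alpha is a constant weight (an assumption) splitting the penalty
    between features found only in a and features found only in b.
    """
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    denom = common + alpha * only_a + (1 - alpha) * only_b
    return common / denom if denom else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # correlation is undefined for constant input
    return cov / (sd_x * sd_y)


# Word pairs with illustrative human ratings (NOT the MC30 values).
PAIRS = [("car", "automobile", 4.0), ("journey", "voyage", 3.0), ("noon", "string", 0.0)]

scores = [feature_similarity(FEATURES[w1], FEATURES[w2]) for w1, w2, _ in PAIRS]
ratings = [rating for _, _, rating in PAIRS]
r = pearson(scores, ratings)
print(f"similarity scores: {scores}")
print(f"Pearson r against illustrative ratings: {r:.2f}")
```

With alpha = 0.5 the measure is symmetric in its two arguments; other constant weights would favour the distinct features of one word over the other.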

Item Type: Conference or Workshop Item (Speech)
Additional Information: Indexed by Scopus
Uncontrolled Keywords: Feature selection; Feature-based method; Gloss; MC30; Semantic representation; Semantic taxonomy; Social media
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Faculty/Division: Faculty of Computer System And Software Engineering
Depositing User: Dr. Taha Hussein Alaaldeen Rassem
Date Deposited: 13 Jul 2020 06:16
Last Modified: 13 Jul 2020 06:16
URI: http://umpir.ump.edu.my/id/eprint/28450