A Review of Recent Trends: Text Mining of Taxonomy Using WordNet 3.1 for the Solution and Problems of Ambiguity in Social Media

Hasan, Ali Muttaleb and Rassem, Taha H. and Noorhuzaimi@Karimah, Mohd Noor and Hasan, Ahmed Muttaleb (2020) A Review of Recent Trends: Text Mining of Taxonomy Using WordNet 3.1 for the Solution and Problems of Ambiguity in Social Media. In: Intelligent Computing and Innovation on Data Science: Proceedings of ICTIDS 2019, 11-12 October 2019, Kuala Lumpur, Malaysia. pp. 137-152., 118. ISBN 978-981-15-3284-9

Abstract

Text processing plays a major role in information retrieval, where it helps resolve ambiguity in natural language processing applications such as internet search, data mining, and social media. In semantic similarity, it is used to analyze the relationships between word pairs on social media. Organizing a large number of unstructured text documents into a small number of concepts for word sense disambiguation is essential, so that the lexical resource can incorporate features that capture more semantic evidence. Text mining involves pre-processing document collections, text categorization and classification, and extracting information and terms from gold-standard data sets. This work proposes lexical resources drawn from the semantic representation. The paper evaluates advanced measures, including shortest path, depth, and information content measures. We use the same set of measures as previous studies but apply different methods, such as taxonomy on social media based on semantic similarities, including Synonymy (https://github.com/alimuttaleb/Ali-Muttaleb/blob/master/Synonym.txt), Non-taxonomy, Hypernym, and Glosses. The paper focuses on addressing synonymy and ambiguity by incorporating the knowledge in lexical resources. Thus, each word in a document is linked to its corresponding concept in the lexical resources. Approaches to building the semantic representation fall into two main classes: knowledge-based and statistical. The knowledge-based approaches depend on structured information that is normally available in the form of dictionaries, thesauri, lexicons, WordNet 3.1, and ontologies. The statistical approaches are based on finding the semantic relations among words using the frequencies of words in a given corpus.
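
The three families of measures named in the abstract (shortest path, depth, and information content) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy taxonomy and corpus counts are hypothetical and not taken from WordNet 3.1, and the formulas are common textbook representatives of each family (inverse path length, Wu-Palmer, and Resnik), which may differ from the exact variants evaluated in the paper.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent); NOT taken from WordNet 3.1.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}

# Hypothetical corpus counts for each concept (illustrative values only).
COUNTS = {"entity": 0, "animal": 2, "vehicle": 1, "dog": 5, "cat": 4, "car": 8}


def ancestors(concept):
    """Return the chain from a concept up to the root, both inclusive."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def depth(concept):
    """Number of nodes from the root down to the concept (root has depth 1)."""
    return len(ancestors(concept))


def lcs(a, b):
    """Least common subsumer: the deepest ancestor shared by a and b."""
    shared = set(ancestors(a))
    for node in ancestors(b):  # walks upward, so the first hit is the deepest
        if node in shared:
            return node
    raise ValueError("concepts do not share a root")


def path_similarity(a, b):
    """Shortest-path family: 1 / (1 + number of edges between a and b)."""
    s = lcs(a, b)
    edges = (depth(a) - depth(s)) + (depth(b) - depth(s))
    return 1.0 / (1 + edges)


def wu_palmer(a, b):
    """Depth family (Wu-Palmer): 2 * depth(lcs) / (depth(a) + depth(b))."""
    return 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b))


def information_content(concept):
    """IC(c) = -log p(c), where p(c) counts the concept and its descendants."""
    total = sum(COUNTS.values())
    subtree = sum(n for k, n in COUNTS.items() if concept in ancestors(k))
    return math.log(total / subtree)


def resnik(a, b):
    """Information-content family (Resnik): IC of the least common subsumer."""
    return information_content(lcs(a, b))
```

On this toy taxonomy, `path_similarity("dog", "cat")` and `wu_palmer("dog", "cat")` are both higher than the corresponding scores for `("dog", "car")`, and `resnik("dog", "car")` is zero because the only shared ancestor is the root, which carries no information.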

Item Type: Conference or Workshop Item (Lecture)
Uncontrolled Keywords: Semantic similarity, Shortest path measures, Depth measures, Information content measures, WordNet 3.1, Text mining, Knowledge-based approach
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Faculty/Division: Faculty of Computer System And Software Engineering
Depositing User: Dr. Taha Hussein Alaaldeen Rassem
Date Deposited: 26 Jan 2021 03:36
Last Modified: 26 Jan 2021 03:36
URI: http://umpir.ump.edu.my/id/eprint/28455