Use Word Cloud Image Of Web Page Text Content On Convolutional Neural Network (CNN) For Classification Of Web Pages

Siti Hawa, Apandi and Jamaludin, Sallim and Rozlina, Mohamed (2024) Use Word Cloud Image Of Web Page Text Content On Convolutional Neural Network (CNN) For Classification Of Web Pages. International Journal of Computing and Digital Systems, 15 (1). pp. 1-12. ISSN 2210-142X. (Published)

Abstract

In today's environment, people can easily use the internet to find information by visiting web pages. Most people like to visit web pages that offer online games and videos. People who spend a lot of time on such web pages can become addicted to the internet, which can have a harmful effect on them. Access to web pages that offer games and streaming videos therefore needs to be limited to prevent internet addiction, and this requires a tool that can classify a web page's category based on its content. Because matrix representations are unable to handle long web page text content, this study uses word cloud images to visualize the words extracted from a web page's text content after data pre-processing. The most popular words in the text content are displayed in a large font and appear in the center of the word cloud image. These are the words that appear most frequently in the text content, and they describe what the web page is about. The Convolutional Neural Network (CNN) identifies the pattern of words displayed in the central area of the word cloud image to classify the category the web page belongs to. The proposed web page classification model achieves an accuracy of 0.86. An institution could, for example, use the model to set rules that limit users' access to web pages offering games and streaming videos, as one way to prevent internet addiction.
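The abstract does not specify the pre-processing steps, the stop-word list, or the word cloud library used, so what follows is only a minimal, stdlib-only sketch of the idea under stated assumptions. It shows how word frequencies could be mapped to font sizes and to distance from the image center, with the most frequent word largest and central. The names `STOP_WORDS`, `preprocess` and `word_cloud_layout`, the linear size scaling, and the "ring" placement (four words per ring) are hypothetical simplifications. Real word cloud renderers place words along a spiral from the center and draw them into an image, which is what the CNN would consume. The CNN classifier itself is not sketched here.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the paper's actual pre-processing is not described in the abstract.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
              "is", "are", "with", "you", "your", "at", "by", "this", "that", "it"}


def preprocess(text):
    """Lower-case the text, keep alphabetic tokens, drop stop words and tokens of 2 letters or fewer."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]


def word_cloud_layout(text, max_words=10, min_size=10, max_size=60):
    """Return (word, frequency, font_size, ring) tuples, most frequent first.

    font_size grows linearly with frequency between min_size and max_size.
    ring 0 is the center of the image; larger rings lie further out.
    The ring scheme (4 words per ring) is a hypothetical stand-in for spiral placement.
    """
    counts = Counter(preprocess(text)).most_common(max_words)
    if not counts:
        return []
    top, low = counts[0][1], counts[-1][1]
    span = max(top - low, 1)  # avoid division by zero when all counts are equal
    layout = []
    for rank, (word, freq) in enumerate(counts):
        size = min_size + (max_size - min_size) * (freq - low) / span
        ring = 0 if rank == 0 else 1 + (rank - 1) // 4
        layout.append((word, freq, round(size), ring))
    return layout


# Usage: a made-up snippet of text from a gaming web page.
page_text = ("Play free online games. New games every day: puzzle games, racing games "
             "and arcade games. Play now and watch game videos.")
layout = word_cloud_layout(page_text)
for word, freq, size, ring in layout:
    print(f"{word:10s} freq={freq} size={size} ring={ring}")
```

In this toy example "games" dominates the page text, so it receives the largest font size and sits in ring 0 at the center. That central, large word is the kind of visual pattern the abstract says the CNN learns to associate with a category.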

Item Type: Article
Uncontrolled Keywords: Web page classification, document representation, word cloud image, deep learning, Convolutional Neural Network
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Faculty/Division: Institute of Postgraduate Studies
Faculty of Computing
Depositing User: Miss Amelia Binti Hasan
Date Deposited: 16 Jan 2024 07:23
Last Modified: 16 Jan 2024 07:23
URI: http://umpir.ump.edu.my/id/eprint/40041