Data pre-processing of website browsing record: An initial step for web page classification

Siti Hawa, Apandi and Jamaludin, Sallim and Rozlina, Mohamed (2021) Data pre-processing of website browsing record: An initial step for web page classification. In: 7th International Conference on Software Engineering and Computer Systems and 4th International Conference on Computational Science and Information Management, ICSECS-ICOCSIM 2021, 24-26 Aug. 2021, Pekan, Malaysia. pp. 679-684. ISSN 978-166541407-4 ISBN 978-1-6654-1408-1


Abstract

Growing Internet use has increased the number of web pages on the World Wide Web, and web page classification is needed to organize them. This study proposes a web page classification system built with a deep learning algorithm. The initial step in web page classification is data pre-processing. The dataset used in this study is a website browsing record. The raw dataset must be pre-processed to obtain cleaned data by removing records with missing values, redundant records, and erroneous records. Data pre-processing involves several steps, including data cleaning and web content pre-processing. The main contribution of this paper is an investigation of how to pre-process website browsing records, focusing on the Game and Online Video web pages that will serve as the dataset for constructing the web page classification model. After pre-processing, the number of records in the dataset is reduced. This shows that many records were removed because the corresponding pages are inactive and not suitable for use in this study as part of the Game and Online Video dataset.
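The data cleaning step described in the abstract (removing missing-value, redundant, and error data) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the record fields `url` and `title`, the helper names, and the rule that an "error" record is one whose URL lacks an http(s) scheme or a host are all assumptions, since the abstract does not describe the dataset layout. Checking whether a page is still active would require network requests and is left out.

```python
from urllib.parse import urlsplit

# Hypothetical record layout; the abstract does not list the real columns.
REQUIRED_FIELDS = ("url", "title")


def is_valid_url(url):
    """Assumed rule: a usable URL has an http(s) scheme and a host."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. a broken IPv6 host.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def clean_browsing_records(records):
    """Drop records with missing values, malformed URLs, or repeated URLs."""
    seen = set()
    cleaned = []
    for record in records:
        # 1. Missing-value data: a required field is absent, None, or blank.
        if any(not str(record.get(f) or "").strip() for f in REQUIRED_FIELDS):
            continue
        url = str(record["url"]).strip()
        # 2. Error data: the URL is not a well-formed web address.
        if not is_valid_url(url):
            continue
        # 3. Redundant data: this exact URL has already been kept.
        if url in seen:
            continue
        seen.add(url)
        cleaned.append({**record, "url": url})
    return cleaned


# Made-up sample records, for illustration only.
raw = [
    {"url": "https://example.com/game", "title": "Game"},
    {"url": "https://example.com/game", "title": "Game"},    # redundant
    {"url": "", "title": "No address"},                        # missing value
    {"url": "htp:/broken", "title": "Bad address"},            # error
    {"url": "https://example.com/video", "title": "Video"},
]
cleaned = clean_browsing_records(raw)
print(len(raw), "->", len(cleaned))  # prints "5 -> 2"
```

The order of the three checks mirrors the order given in the abstract. On a real browsing record, the duplicate test would probably need a normalization rule (for example, for trailing slashes or query strings), which is a design choice the abstract does not specify.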

Item Type: Conference or Workshop Item (Paper)
Additional Information: Indexed by Scopus
Uncontrolled Keywords: Deep learning; Scientific computing; Web pages; Games; Tag clouds; Cleaning; Tokenization
Subjects: Q Science > Q Science (General)
Q Science > QA Mathematics > QA76 Computer software
Faculty/Division: Institute of Postgraduate Studies
Faculty of Computing
Depositing User: Pn. Hazlinda Abd Rahman
Date Deposited: 16 Jan 2024 06:13
Last Modified: 16 Jan 2024 06:13
URI: http://umpir.ump.edu.my/id/eprint/33291