Noor Fatihah, Mazlam (2012) Enhancement of stemming process for Malay illicit web content. Masters thesis, Universiti Teknologi Malaysia.
Abstract
A web filtering system prevents users from accessing web pages that
contain illicit content. The web filtering process consists of six (6) phases,
one of which is the pre-processing phase. This phase comprises three main
activities: HTML parsing, stemming, and stopping. This research focuses on the
stemming process, which removes the affixes attached to input words from web
pages in order to produce the correct root words. To date, the existing
stemming algorithms for the Malay language, Othman’s stemming algorithm and
Sembok’s stemming algorithm, still produce errors in their results. Hence, the
errors from both stemming algorithms were analyzed.
Several features were created to counter the problems that occur in the
existing stemming algorithms: an initial check against a dictionary, the
implementation of Rule 2, and a check against an additional dictionary that
contains illicit words not included in the initial dictionary. These new
features were added to an enhanced stemming algorithm. To check the
effectiveness of the new features, several tests were run on a sample of web pages.
The results show that only 11% of words were stemmed correctly when the test
was run without the initial dictionary check, compared with 72% when the
process started with the initial dictionary check. In the test of the Rule 2
implementation, Sembok’s algorithm produced only 17% correct words, whereas
the enhanced stemming algorithm produced 62%. In conclusion, the new features
in the enhanced stemming algorithm reduce the errors produced by Sembok’s
stemming algorithm.
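
The dictionary-first idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm: the affix lists, the tiny dictionaries, and the names `candidates`, `stem`, `ROOTS`, `EXTRA`, and `check_first` are all assumptions made for illustration, and Rule 2 is omitted because the abstract does not define it. The sketch only shows how checking a word against a root dictionary before stripping affixes can avoid over-stemming, and how a second dictionary can cover words missing from the first.

```python
# Hypothetical, heavily simplified affix lists (real Malay stemmers use many more rules).
PREFIXES = ("meng", "mem", "men", "me", "ber", "di", "ter", "pe")
SUFFIXES = ("kan", "an", "i")

# Toy dictionaries, assumed for illustration only.
ROOTS = {"makan", "main", "mak"}   # stands in for the initial root-word dictionary
EXTRA = {"judi"}                   # stands in for the additional dictionary


def candidates(word):
    """Yield forms of `word` with a prefix, a suffix, or both removed."""
    no_prefix = [word[len(p):] for p in PREFIXES if word.startswith(p)]
    no_suffix = [word[:-len(s)] for s in SUFFIXES if word.endswith(s)]
    yield from no_prefix
    yield from no_suffix
    for base in no_prefix:
        for s in SUFFIXES:
            if base.endswith(s):
                yield base[:-len(s)]


def stem(word, roots, extra, check_first=True):
    """Return a root for `word`, or `word` unchanged if no known root is found."""
    word = word.lower()
    known = roots | extra
    # Initial dictionary check: a word that is already a root is left untouched.
    if check_first and word in known:
        return word
    # Otherwise accept the first stripped form that appears in either dictionary.
    for form in candidates(word):
        if form in known:
            return form
    return word


print(stem("makan", ROOTS, EXTRA))                     # root kept as-is
print(stem("makan", ROOTS, EXTRA, check_first=False))  # over-stemmed without the check
print(stem("berjudi", ROOTS, EXTRA))                   # root found via the extra dictionary
```

In this toy setup, skipping the initial check lets the suffix rule cut "makan" down to "mak", which is one plausible way a stemmer without that check could produce wrong roots; the actual error causes analyzed in the thesis may differ.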
| Item Type: | Thesis (Masters) |
|---|---|
| Additional Information: | Thesis (Masters of Computer Science (Information Security)) -- Universiti Teknologi Malaysia - 2012 |
| Uncontrolled Keywords: | Computer networks Security measures; World Wide Web Security measures; Web sites Security measures |
| Subjects: | T Technology > TK Electrical engineering. Electronics. Nuclear engineering |
| Faculty/Division: | Unspecified |
| Depositing User: | Muhamad Firdaus Janih@Jaini |
| Date Deposited: | 05 Nov 2015 03:26 |
| Last Modified: | 19 Aug 2021 05:24 |
| URI: | https://umpir.ump.edu.my/id/eprint/9463 |

