Ng, Choon-Ching and Siau-Chuin, Liew and Wan Muhammad Syahrir, Wan Hussin and Tutut, Herawan (2013) Identifying the Dominant Language of Web Page Using Supervised N-grams. 2012 International Conference on Advanced Computer Science Applications and Technologies. pp. 344-348. (Published)
PDF: dentifying_the_Dominant_Language_of_Web_Page_Using_Supervised_N-grams.pdf (Published Version, restricted to repository staff only, 228kB)
Abstract
Natural language processing is an emerging technology in the linguistics industry and an aid to human-computer interaction in computer science. Language identification is a form of pattern recognition that identifies the predefined language of a web page and predicts the unknown language of a given text. Written texts are built from common features such as characters, words and n-grams, and these features differ from one language to another. Experimental results show that the supervised n-gram approach identifies languages accurately and outperforms distance measurement on Arabic-script web pages.
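The abstract contrasts a supervised n-gram approach with a distance-based one. A minimal sketch of both ideas follows, in standard-library Python. Everything in it is an assumption rather than the authors' method: the toy Arabic, Persian and Urdu sentences, the character n-gram sizes (1 to 3), a Cavnar-Trenkle style "out-of-place" rank distance standing in for "distance measurement", and a multinomial Naive Bayes classifier standing in for the supervised step. The keywords suggest the paper used a support vector machine, which this sketch does not reproduce.

```python
import math
from collections import Counter

# Hypothetical toy training data (not the authors' corpus): two short
# sentences each in Arabic, Persian and Urdu, all written in Arabic script.
TRAIN = {
    "ara": ["هذه الصفحة مكتوبة باللغة العربية", "نحن نقرأ الأخبار كل يوم"],
    "fas": ["این صفحه به زبان فارسی نوشته شده است", "ما هر روز اخبار را می خوانیم"],
    "urd": ["یہ صفحہ اردو زبان میں لکھا گیا ہے", "ہم ہر روز خبریں پڑھتے ہیں"],
}


def char_ngrams(text, sizes=(1, 2, 3)):
    """Count character n-grams; whitespace is normalized and the text is
    padded with spaces so that word boundaries form n-grams too."""
    padded = f" {' '.join(text.split())} "
    grams = Counter()
    for n in sizes:
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


# --- Assumed distance baseline: ranked profiles + out-of-place distance ---
def ranked_profile(text, top_k=300):
    """Return up to top_k n-grams, most frequent first (ties broken alphabetically)."""
    counts = char_ngrams(text)
    return sorted(counts, key=lambda g: (-counts[g], g))[:top_k]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank displacements; n-grams absent from lang_profile get the maximum penalty."""
    rank = {g: r for r, g in enumerate(lang_profile)}
    penalty = len(lang_profile)
    return sum(abs(r - rank[g]) if g in rank else penalty
               for r, g in enumerate(doc_profile))


LANG_PROFILES = {lang: ranked_profile(" ".join(texts)) for lang, texts in TRAIN.items()}


def identify_by_distance(text):
    """Pick the language whose profile is nearest to the text's profile."""
    doc = ranked_profile(text)
    return min(LANG_PROFILES, key=lambda lang: out_of_place(doc, LANG_PROFILES[lang]))


# --- Assumed supervised step: multinomial Naive Bayes over n-gram counts ---
class NgramNaiveBayes:
    def fit(self, corpus):
        n_docs = sum(len(texts) for texts in corpus.values())
        self.log_prior = {lang: math.log(len(texts) / n_docs)
                          for lang, texts in corpus.items()}
        self.counts = {lang: Counter() for lang in corpus}
        for lang, texts in corpus.items():
            for text in texts:
                self.counts[lang].update(char_ngrams(text))
        self.vocab_size = len(set().union(*self.counts.values()))
        self.totals = {lang: sum(c.values()) for lang, c in self.counts.items()}
        return self

    def predict(self, text):
        doc = char_ngrams(text)

        def log_score(lang):
            # Laplace (add-one) smoothing so unseen n-grams do not zero out a class.
            denom = self.totals[lang] + self.vocab_size
            return self.log_prior[lang] + sum(
                k * math.log((self.counts[lang][g] + 1) / denom)
                for g, k in doc.items())

        return max(self.counts, key=log_score)


model = NgramNaiveBayes().fit(TRAIN)
```

Both `model.predict(text)` and `identify_by_distance(text)` return a label such as `"fas"`. On real web pages the markup would need to be stripped first, and both methods would need far larger training corpora than these toy sentences.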
Item Type: | Article |
---|---|
Additional Information: | Liew Siaw Chuin (Siau-Chuin Liew) |
Uncontrolled Keywords: | Support vector machine, supervised N-grams, language identification, Arabic script |
Subjects: | Q Science > QA Mathematics > QA76 Computer software |
Faculty/Division: | Faculty of Computer System And Software Engineering |
Depositing User: | Ms. Hazima Anuar |
Date Deposited: | 09 Jan 2015 03:00 |
Last Modified: | 27 Apr 2018 01:15 |
URI: | http://umpir.ump.edu.my/id/eprint/6869 |