UMP Institutional Repository

Resolusi anafora artikel Bahasa Melayu berasaskan pengetahuan terhad dan kelas semantik [Anaphora resolution for Malay-language articles based on limited knowledge and semantic classes]

Noorhuzaimi@Karimah, Mohd Noor (2016) Resolusi anafora artikel Bahasa Melayu berasaskan pengetahuan terhad dan kelas semantik. PhD thesis, Universiti Kebangsaan Malaysia.


Abstract

Anaphora resolution (AR) is the process of resolving the entity to which a pronoun anaphor refers. Anaphora is a phenomenon that occurs in every language, and resolving it requires human experts or specific rules. AR can improve language processing applications such as question answering, text mining, document summarization, and information extraction. Various studies have been carried out on AR, but the majority target languages such as English, Japanese, and Norwegian. Very little research effort has focused on AR for the Malay language. Therefore, the aim of this research is to resolve anaphora in Malay text using a knowledge-poor approach and a semantic class labelling model. To achieve this aim, a framework for Malay AR has been developed as a guide to solving this phenomenon in the Malay language. Meanwhile, the type of usage of the pronoun nya is determined using a set of rules, a set of similar words, and word filtering generated from the semantic class labelling model. This process is important because the pronoun nya is the most frequently used in Malay text, amounting to 68%, compared with other pronouns, which mostly depend on the sociological status of the referring entity or antecedent. Determining the antecedent candidates is another important process. The antecedent candidates can be proper nouns or common nouns. To determine which proper nouns are suitable candidates, two main processes are needed: (1) entity recognition for proper nouns that contain the word 'dan' and the comma symbol (,); and (2) determination of the semantic label of each retrieved candidate in order to establish its sociological status. The research used part of the name gazetteers for people, organization, location and position. Testing was conducted on 60 Malay articles with different classes of proper nouns.
The results were compared with benchmark data tagged by a Malay linguist. The results show average precision and recall values of 85% and 90% respectively. The proposed AR framework using the knowledge-poor approach for Malay text shows a success rate 18.79% higher than the generic approach proposed by Mitkov and Lappin.
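
The two candidate-determination processes described in the abstract (handling proper nouns joined by 'dan' and commas, then assigning a semantic label from gazetteers) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names, the splitting rule, and every gazetteer entry below are assumptions made for illustration only.

```python
import re

# Hypothetical mini-gazetteers. The abstract mentions gazetteers for people,
# organization, location and position; these sample entries are invented.
GAZETTEER = {
    "person": {"Ali", "Siti Aminah"},
    "organization": {"Universiti Kebangsaan Malaysia"},
    "location": {"Kuantan", "Pahang"},
    "position": {"Perdana Menteri"},
}

# Assumed splitting rule: break on a comma (optionally followed by 'dan')
# or on the standalone word 'dan' ("and").
_SEPARATOR = re.compile(r"\s*,\s*(?:dan\s+)?|\s+dan\s+")


def split_coordinated(phrase):
    """Split a coordinated proper-noun phrase into individual candidates."""
    return [part.strip() for part in _SEPARATOR.split(phrase) if part.strip()]


def semantic_label(candidate):
    """Return the gazetteer class containing the candidate, else 'unknown'."""
    for label, names in GAZETTEER.items():
        if candidate in names:
            return label
    return "unknown"


def label_candidates(phrase):
    """Pair every candidate found in the phrase with its semantic label."""
    return [(c, semantic_label(c)) for c in split_coordinated(phrase)]


if __name__ == "__main__":
    for candidate, label in label_candidates("Ali, Siti Aminah dan Kuantan"):
        print(candidate, label)
```

A naive split like this one would wrongly break any single name that itself contains 'dan' or a comma, which is presumably why the abstract treats entity recognition for such proper nouns as a separate process.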

Item Type: Thesis (PhD)
Additional Information: Thesis (Doctor of Philosophy) -- Universiti Kebangsaan Malaysia - 2016, SV: PROF DR. SHAHRUL AZMAN NOAH, NO. CD: 11506
Uncontrolled Keywords: Malay language -- Vocabulary; Malay language -- Grammar
Subjects: P Language and Literature > PL Languages and literatures of Eastern Asia, Africa, Oceania
Faculty/Division: Faculty of Computer System And Software Engineering
Depositing User: Mrs. Sufarini Mohd Sudin
Date Deposited: 12 Jul 2019 02:18
Last Modified: 28 Jul 2021 03:18
URI: http://umpir.ump.edu.my/id/eprint/25341