CROSS-LINGUAL ANNOTATION PROJECTION FOR THE DEVELOPMENT OF MALAY CORPUS
| Main Author: | ZAMIN, NORSHUHANI |
|---|---|
| Format: | Thesis |
| Language: | English |
| Institution: | Universiti Teknologi Petronas |
| Record Id / ISBN-0: | utp-utpedia.21305 / |
| Published: | 2014 |
| Subjects: | |
| Online Access: | http://utpedia.utp.edu.my/21305/1/2014%20-COMPUTER%20%26%20INFORMATION%20SCIENCES%20-%20CROSS-LINGUAL%20ANNOTATION%20PROJECTION%20FOR%20THE%20DEVELOPMENT%20OF%20MALAY%20CORPOS%20-%20NORSHUHANI%20ZAMIN.pdf http://utpedia.utp.edu.my/21305/ |
| Summary: | Cross-lingual annotation projection methods can draw on resource-rich languages to improve the performance of Natural Language Processing (NLP) tasks in less-resourced languages. In this research, Malay serves as the less-resourced language and English as the resource-rich language. The research is proposed to ease the deadlock in Malay computational linguistics research, caused by the shortage of Malay tools and annotated corpora, by exploiting state-of-the-art English tools. The aim of the research is to investigate a suitable cross-lingual annotation projection method based on word alignment between two languages with syntactic differences. A word alignment method known as MEWA (Malay-English Word Aligner), which integrates a Dice coefficient and a bigram string similarity measure with little supervision, is proposed. |
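
The summary names two ingredients of the aligner: a Dice coefficient and a bigram string similarity measure. The record does not say how MEWA computes or combines them, so the following is only a minimal sketch under assumptions: the Dice coefficient is taken over sentence-level co-occurrence counts, the bigram measure is taken over character bigrams, and the two are mixed by a hypothetical `weight` parameter. All function names and the three-sentence toy corpus are made up for illustration and are not taken from the thesis.

```python
from collections import Counter


def char_bigrams(word):
    """Return the multiset of adjacent character pairs in a lowercased word."""
    w = word.lower()
    return Counter(w[i:i + 2] for i in range(len(w) - 1))


def bigram_similarity(a, b):
    """Dice-style overlap of character bigrams, in the range [0, 1]."""
    ba, bb = char_bigrams(a), char_bigrams(b)
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:  # both words shorter than two characters
        return 0.0
    shared = sum((ba & bb).values())  # multiset intersection
    return 2.0 * shared / total


def dice_cooccurrence(src_word, tgt_word, sentence_pairs):
    """Dice coefficient over sentence-level co-occurrence counts."""
    src_count = tgt_count = both = 0
    for src_tokens, tgt_tokens in sentence_pairs:
        in_src = src_word in src_tokens
        in_tgt = tgt_word in tgt_tokens
        src_count += in_src
        tgt_count += in_tgt
        both += in_src and in_tgt
    if src_count + tgt_count == 0:
        return 0.0
    return 2.0 * both / (src_count + tgt_count)


def combined_score(src_word, tgt_word, sentence_pairs, weight=0.5):
    """Linear mix of the two measures; `weight` is a hypothetical parameter."""
    return (weight * dice_cooccurrence(src_word, tgt_word, sentence_pairs)
            + (1.0 - weight) * bigram_similarity(src_word, tgt_word))


# Toy English-Malay sentence pairs, invented for this illustration only.
pairs = [
    (["i", "eat", "rice"], ["saya", "makan", "nasi"]),
    (["i", "drink", "coffee"], ["saya", "minum", "kopi"]),
    (["he", "drinks", "coffee"], ["dia", "minum", "kopi"]),
]

if __name__ == "__main__":
    for tgt in ["kopi", "minum", "nasi"]:
        print("coffee", tgt, combined_score("coffee", tgt, pairs))
```

In this sketch the co-occurrence term captures translation pairs that look nothing alike on the surface, while the string term would favor cognates and loanwords that share spelling. Whether MEWA weights or sequences the two measures in this way is not stated in the record.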