Morphological Analysis for Resource-Poor Machine Translation

Date
2010-12-27T09:39:05Z
Abstract
In statistical machine translation, word-to-word translation probabilities are difficult to estimate reliably because of data sparseness, especially for resource-poor languages. The problem becomes more serious when translating from morphologically complex languages such as Malay or Indonesian into morphologically simple ones such as English, since the system must translate words that occur in many different morphological variants. This paper applies morphological analysis to two such resource-poor, morphologically rich translation tasks: Malay-English and Indonesian-English. Specifically, we use morphological analysis to modify the unknown words in the morphologically complex input, and we examine how the modified input affects translation quality as the number of training sentences varies. The approach was assessed over a number of experimental trials. The results show that the proposed method improves translation quality when the rate of unknown words is higher than 20%, and that the improvement gradually increases as the unknown-word rate increases.
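The idea of modifying unknown words before translation can be illustrated with a minimal Python sketch. It is not the paper's analyzer: the affix lists, the minimum stem length, and the function names `candidate_stems`, `unknown_word_rate`, and `modify_unknown_words` are all assumptions made for illustration, and a real Malay or Indonesian morphological analyzer handles far more (sound changes, circumfixes, reduplication). The sketch only shows the general shape: measure the unknown-word rate against the training vocabulary, then replace each unknown word with a stripped form that the vocabulary does contain.

```python
# Hypothetical, heavily simplified affix lists (an assumption, not the paper's rules).
PREFIXES = ("mem", "men", "me", "ber", "di", "ter", "pe")
SUFFIXES = ("kan", "nya", "an", "i")
MIN_STEM_LEN = 3  # assumed guard against stripping a word down to nothing


def candidate_stems(word):
    """Return stripped forms of `word`: prefix removed, suffix removed, or both."""
    forms = [word]
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            forms.append(word[len(prefix):])
    # Try suffix stripping on the original word and on every prefix-stripped form.
    for form in list(forms):
        for suffix in SUFFIXES:
            if form.endswith(suffix) and len(form) - len(suffix) >= MIN_STEM_LEN:
                forms.append(form[:-len(suffix)])
    return forms[1:]  # drop the unmodified word itself


def unknown_word_rate(tokens, vocab):
    """Fraction of tokens that do not occur in the training vocabulary."""
    if not tokens:
        return 0.0
    return sum(token not in vocab for token in tokens) / len(tokens)


def modify_unknown_words(tokens, vocab):
    """Replace each unknown token with its first stripped form found in `vocab`."""
    modified = []
    for token in tokens:
        if token in vocab:
            modified.append(token)
            continue
        # Fall back to the original token when no stripped form is known.
        replacement = next(
            (stem for stem in candidate_stems(token) if stem in vocab), token
        )
        modified.append(replacement)
    return modified


# Toy example: the vocabulary stands in for words seen in the training data.
vocab = {"saya", "baca", "buku"}
sentence = ["saya", "membaca", "bukunya"]
modified = modify_unknown_words(sentence, vocab)
```

In this toy case two of the three input tokens are unknown, and after modification all three are in the vocabulary. Whether such a substitution actually helps translation depends on the unknown-word rate, which is the effect the paper studies.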