CAT Tool Characteristics

a) SMT

Statistical machine translation (SMT) is an approach to machine translation characterized by the use of machine learning methods: a learning algorithm is applied to a large body of previously translated text, known as a parallel corpus, parallel text, bitext, or multitext (Syahrina, 2011). SMT is based on the concept of probability: among the candidate translations, the one with the highest probability is chosen. The probability score is derived from data obtained by training the system on human-translated documents and from mathematical models, including a language model and a translation model. The source-language text is pre-processed before the language model and global search are applied, and the result is processed again for final presentation as target-language text.
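
To make the idea of "choosing the translation with the highest probability" concrete, here is a minimal sketch in Python. It assumes a toy setup that is not taken from the cited sources: the tables TRANSLATION_MODEL and LANGUAGE_MODEL, the probabilities in them, and the function names score and translate are all hypothetical. Each candidate is scored by combining a translation-model probability with a language-model probability, and the best-scoring candidate is returned.

```python
import math

# Hypothetical translation model: P(source phrase | candidate translation).
# In a real system these numbers are estimated from a parallel corpus.
TRANSLATION_MODEL = {
    ("rumah besar", "big house"): 0.6,
    ("rumah besar", "house big"): 0.3,
    ("rumah besar", "large home"): 0.1,
}

# Hypothetical language model: P(candidate) as fluent target-language text.
LANGUAGE_MODEL = {
    "big house": 0.05,
    "house big": 0.001,
    "large home": 0.02,
}


def score(source, candidate):
    """Return log P(source | candidate) + log P(candidate)."""
    tm = TRANSLATION_MODEL.get((source, candidate), 0.0)
    lm = LANGUAGE_MODEL.get(candidate, 0.0)
    if tm == 0.0 or lm == 0.0:
        return float("-inf")  # unseen events get the worst possible score
    return math.log(tm) + math.log(lm)


def translate(source):
    """Pick the candidate with the highest combined score, or None."""
    candidates = [cand for (src, cand) in TRANSLATION_MODEL if src == source]
    if not candidates:
        return None
    return max(candidates, key=lambda cand: score(source, cand))


print(translate("rumah besar"))
```

In this toy table, "house big" has a reasonable translation-model probability but a very low language-model probability, so the fluent candidate wins. A real SMT system searches over a far larger space of candidates than a lookup table like this one.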

The main goal of SMT is to translate a text given in some source language into a target language (Jussa et al., 2012). SMT began with word-based models, but more recent developments introduced other models, such as phrase-based and syntax-based SMT. Syntax-based SMT was still a subject of research.

b) RBMT

Rule-based machine translation (RBMT) systems were the first commercial machine translation systems (Jussa et al., 2012). RBMT is much more complex than word-for-word translation: these systems develop linguistic rules that allow words to be placed in different positions, to take different meanings depending on context, and so on. The RBMT methodology applies a set of linguistic rules in three phases: analysis, transfer, and generation. A rule-based system therefore requires syntactic analysis, semantic analysis, syntactic generation, and semantic generation.
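
The three phases can be illustrated with a very small Python sketch. This is only an illustration under stated assumptions, not how any particular RBMT product works: the LEXICON table, its part-of-speech tags, the single reordering rule (noun followed by adjective becomes adjective followed by noun), and all function names are hypothetical.

```python
# Hypothetical bilingual lexicon: Indonesian word -> (English word, part of speech).
LEXICON = {
    "rumah": ("house", "NOUN"),
    "besar": ("big", "ADJ"),
    "mobil": ("car", "NOUN"),
    "merah": ("red", "ADJ"),
}


def analyze(sentence):
    """Analysis: split the source sentence and tag each word with a part of speech."""
    tagged = []
    for word in sentence.lower().split():
        _, pos = LEXICON.get(word, (word, "UNK"))
        tagged.append((word, pos))
    return tagged


def transfer(tagged):
    """Transfer: replace each word by its target word, then apply a reordering rule."""
    out = [(LEXICON.get(word, (word, pos))[0], pos) for word, pos in tagged]
    i = 0
    while i < len(out) - 1:
        # Structural rule (assumed): NOUN ADJ in the source becomes ADJ NOUN.
        if out[i][1] == "NOUN" and out[i + 1][1] == "ADJ":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out


def generate(transferred):
    """Generation: produce the target-language surface string."""
    return " ".join(word for word, _ in transferred)


def translate_rbmt(sentence):
    return generate(transfer(analyze(sentence)))


print(translate_rbmt("rumah besar"))
print(translate_rbmt("mobil merah"))
```

Words missing from the lexicon are tagged "UNK" and passed through unchanged, which hints at why such systems depend so heavily on the coverage of their rules and dictionaries.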

Thurmair (2009), as quoted by Syahrina (2011), commented on how RBMT and SMT perform. RBMT systems are weak in lexical selection during transfer and lack robustness when the analysis of a sentence fails. However, they translate more accurately because they try to represent every piece of the input. SMT systems, meanwhile, are more robust and always produce output. Their output reads more fluently, thanks to the use of language models, and they are better at lexical selection. However, they have difficulty coping with phenomena that require linguistic knowledge, such as morphology, syntactic functions, and word order. They also lose adequacy because of missing or spurious translations.

c) HMT

Hybrid machine translation (HMT) was developed because of the weaknesses of the two approaches and the possibility of integrating them (Syahrina, 2011). Syahrina notes that statistical machine translation and rule-based machine translation are two MT approaches that work in opposite yet complementary ways: SMT does not need to learn anything about the language itself, while RBMT is built on gathering language rules.

An HMT architecture has three basic components: identification of the source language by observing chunks (words, phrases, and equivalents), transformation of the chunks into the target language, and generation of the translated language (Thurmair, 2009).
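
One possible reading of these three components is sketched below in Python. It is a toy illustration, not the architecture described by Thurmair: the PHRASE_TABLE (standing in for the statistical side), the DICTIONARY (standing in for the rule-based side), the greedy longest-match chunking, and all function names are assumptions made for the example.

```python
# Hypothetical phrase table (statistical side): source chunk -> [(target, probability)].
PHRASE_TABLE = {
    "terima kasih": [("thank you", 0.9), ("thanks", 0.1)],
    "selamat pagi": [("good morning", 0.95), ("safe morning", 0.05)],
}

# Hypothetical word dictionary (rule-based side), used when no phrase matches.
DICTIONARY = {"teman": "friend", "saya": "I"}


def identify_chunks(sentence, max_len=3):
    """Identification: greedy longest match against the phrase table.

    Words that match no phrase become single-word chunks.
    """
    words = sentence.lower().split()
    chunks = []
    i = 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if n == 1 or candidate in PHRASE_TABLE:
                chunks.append(candidate)
                i += n
                break
    return chunks


def transform_chunk(chunk):
    """Transformation: most probable phrase translation, else dictionary lookup."""
    options = PHRASE_TABLE.get(chunk)
    if options:
        return max(options, key=lambda pair: pair[1])[0]
    return DICTIONARY.get(chunk, chunk)


def generate(target_chunks):
    """Generation: join the translated chunks and capitalize the first letter."""
    text = " ".join(target_chunks)
    return text[:1].upper() + text[1:]


def translate_hybrid(sentence):
    return generate([transform_chunk(c) for c in identify_chunks(sentence)])


print(translate_hybrid("selamat pagi teman"))
```

The design choice here is simply to let the probabilistic table handle multi-word chunks it has seen and to fall back to the dictionary otherwise. Real hybrid systems combine the two approaches in several different ways.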

References:

Jussa, Marta R. Costa, et al. (2012). Study and comparison of rule-based and statistical Catalan-Spanish machine translation systems. Journal of Computing and Informatics, 31: 245–270.

Syahrina, Alvi. (2011). Online machine translator system and result comparison: statistical machine translation vs hybrid machine translation. Unpublished thesis. University of Borås, Sweden.

Thurmair, G. (2009). Comparing different architectures of hybrid machine translation systems. European Association for Machine Translation.

Translate.google.co.id
