CAT Tool Characteristics

a)  SMT
Statistical machine translation (SMT) is an approach to machine translation characterized by the use of machine learning methods: a learning algorithm is applied to a large body of previously translated text, known as a parallel corpus, parallel text, bitext, or multitext (Syahrina : 2011). SMT is based on the concept of probability: the translation chosen is the candidate with the highest probability. The probability score is obtained from training the SMT system on human-translated documents and from mathematical models, including a language model and a translation model. The source-language text is pre-processed before the language model and global search are applied, and the result is processed again for final presentation as target-language text.
The main goal of SMT is to translate a text given in some source language into a target language (Jussa, et al. : 2012). SMT models started with word-based translation, but more recent developments have introduced other models, such as phrase-based and syntax-based SMT. Syntax-based SMT was still under research.
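
The highest-probability selection described in this section can be illustrated with a minimal Python sketch. It assumes a noisy-channel style score, where the translation-model probability and the language-model probability of each candidate are multiplied (added in log space). The source phrase, the candidate list, and every probability value are invented for illustration and do not come from a trained system.

```python
import math

# Hypothetical toy values for translating the Indonesian phrase "rumah itu".
# TRANSLATION_MODEL holds P(source | candidate); LANGUAGE_MODEL holds P(candidate).
# None of these numbers come from real training data.
TRANSLATION_MODEL = {
    "the house": 0.6,
    "house the": 0.6,
    "the home": 0.3,
}
LANGUAGE_MODEL = {
    "the house": 0.05,
    "house the": 0.0001,  # word order that is rare in the target language
    "the home": 0.02,
}


def score(candidate):
    """Log-probability score: translation model plus language model."""
    return math.log(TRANSLATION_MODEL[candidate]) + math.log(LANGUAGE_MODEL[candidate])


def best_translation(candidates):
    """Global search over a tiny candidate list: pick the highest score."""
    return max(candidates, key=score)


best = best_translation(list(TRANSLATION_MODEL))
print(best)
```

In this toy setup the translation model cannot tell "the house" from "house the"; the language model is what penalizes the unnatural word order.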

b)  RBMT
Rule-based machine translation (RBMT) systems were the first commercial machine translation systems (Jussa, et al. : 2012). RBMT is much more complex than word-for-word translation: these systems develop linguistic rules that allow words to be placed in different positions, to take different meanings depending on context, and so on. The RBMT methodology applies a set of linguistic rules in three phases: analysis, transfer, and generation. A rule-based system therefore requires syntax analysis, semantic analysis, syntax generation, and semantic generation.
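
The three phases of analysis, transfer, and generation can be sketched in Python as follows. This is only a toy under stated assumptions: the three-word English-to-Spanish lexicon and the single adjective-noun reordering rule are hypothetical, and a real RBMT system would use far richer syntactic and semantic rules.

```python
# Hypothetical lexicon: source word -> (part of speech, target word).
LEXICON = {
    "the": ("DET", "el"),
    "red": ("ADJ", "rojo"),
    "car": ("NOUN", "coche"),
}


def analyze(sentence):
    """Analysis: tag each source word with its part of speech."""
    return [(word, LEXICON[word][0]) for word in sentence.lower().split()]


def transfer(tagged):
    """Transfer: substitute target words, then reorder ADJ NOUN to NOUN ADJ."""
    out = [(LEXICON[word][1], tag) for word, tag in tagged]
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return out


def generate(transferred):
    """Generation: produce the target-language surface string."""
    return " ".join(word for word, _ in transferred)


def translate(sentence):
    return generate(transfer(analyze(sentence)))


result = translate("the red car")
print(result)
```

A word missing from the lexicon raises a KeyError in `analyze`, which mirrors in miniature how a rule-based system can fail outright when analysis fails.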

Thurmair (2009), quoted by Syahrina (2011), commented on how RBMT and SMT perform. RBMT systems have weaknesses in lexical selection during transfer and lack robustness when the analysis of a sentence fails. However, they translate more accurately because they try to represent every piece of the input. SMT systems, meanwhile, are more robust and always produce output. Their output reads more fluently, owing to the use of language models, and they are better at lexical selection. However, they have difficulty coping with phenomena that require linguistic knowledge, such as morphology, syntactic functions, and word order. They also lose adequacy because of missing or spurious translations.

c)  HMT
Hybrid machine translation (HMT) was developed because of the weaknesses of the two approaches and the possibility of integrating them (Syahrina : 2011). Syahrina notes that statistical machine translation and rule-based machine translation are two MT approaches that work in opposite yet complementary ways: SMT does not need to learn about the language at all, while RBMT's basis is gathering language rules.
The HMT architecture has three basic components: identification of the source language by observing chunks (words, phrases, and equivalents), transformation of the chunks into the target language, and generation of the translated language (Thurmair, 2009).
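
One possible reading of these three components is sketched below in Python. It is an assumption-laden illustration, not Thurmair's architecture: chunks are identified by greedy longest match against a hypothetical English-to-Indonesian phrase table, each chunk is transformed by picking its most probable target candidate, and generation simply joins the results. All table entries and probabilities are invented.

```python
# Hypothetical phrase table: source chunk -> {target candidate: probability}.
PHRASE_TABLE = {
    ("good", "morning"): {"selamat pagi": 0.9, "pagi baik": 0.1},
    ("good",): {"baik": 0.7, "bagus": 0.3},
    ("morning",): {"pagi": 1.0},
    ("teacher",): {"guru": 1.0},
}
MAX_CHUNK_LEN = 2  # longest chunk, in words, that the table contains


def identify_chunks(words):
    """Identification: greedy longest-match chunking of the source words."""
    chunks = []
    i = 0
    while i < len(words):
        for size in range(min(MAX_CHUNK_LEN, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + size])
            if candidate in PHRASE_TABLE:
                chunks.append(candidate)
                i += size
                break
        else:
            # Unknown word: keep it as its own chunk so output is always produced.
            chunks.append((words[i],))
            i += 1
    return chunks


def transform_chunk(chunk):
    """Transformation: choose the most probable target candidate for a chunk."""
    options = PHRASE_TABLE.get(chunk)
    if options is None:
        return " ".join(chunk)  # pass unknown chunks through unchanged
    return max(options, key=options.get)


def generate(target_chunks):
    """Generation: assemble the target-language sentence."""
    return " ".join(target_chunks)


def translate(sentence):
    chunks = identify_chunks(sentence.lower().split())
    return generate([transform_chunk(chunk) for chunk in chunks])


chunks = identify_chunks("good morning teacher".split())
output = translate("Good morning teacher")
print(output)
```

Matching "good morning" as one chunk rather than two separate words is what lets the sketch produce the idiomatic greeting instead of a word-by-word rendering.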

References:

  1. Jussa, Marta R. Costa, et al. (2012). Study and comparison of rule-based and statistical Catalan-Spanish machine translation systems. Journal of Computing and Informatics. 31: 245–270.
  2. Syahrina, Alvi. (2011). Online machine translator system and result comparison – statistical machine translation vs hybrid machine translation. Unpublished thesis. University of Boras, Sweden.
  3. Thurmair, G. (2009). Comparing different architectures of hybrid machine translation systems. European Association of Machine Translation.
  4. Translate.google.co.id
