
Archive for May 1st, 2009

Google translator needs some work

Friday, May 1st, 2009 by Ari

Every so often I run across a Hebrew word or phrase I don’t know while surfing the web, and try to use Google translator to translate it. After much study, I’ve come to the inescapable conclusion that Google translator sucks. I’ve used plenty of translators on the web before, and I know none of them are perfect, but Google’s is far and away the worst one I’ve run across. (This is notable because it is the first time Google has done something significantly worse than the competition.) Here’s one laughable example. Since I run the Rambam study group at shul (Rambam was a 12th century rabbi whose writings are still considered very influential), I use the translator a lot when I prepare by reading his writings online. This is the first sentence of the chapter we’ll be covering this week:

 

 אֵין הַמּוֹצִיא מֵרְשׁוּת לִרְשׁוּת אוֹ הַמַּעְבִּיר בִּרְשׁוּת הָרַבִּים חוּץ לְאַרְבַּע אַמּוֹת, חַיָּב–עַד שֶׁיַּעְקֹר מֵעַל גַּבֵּי מְקוֹם שֶׁיֵּשׁ בּוֹ אַרְבָּעָה טְפָחִים עַל אַרְבָּעָה טְפָחִים אוֹ יָתֵר, וְיַנִּיחַ עַל גַּבֵּי מְקוֹם שֶׁיֵּשׁ בּוֹ אַרְבָּעָה עַל אַרְבָּעָה.

I would translate this more or less as follows:

One who transfers from domain to domain or carries in the public domain beyond four cubits is not liable until he picks up from a place that is four handbreadths by four handbreadths or more, and puts it down in a place that is four by four.

Admittedly it’s a somewhat convoluted and awkward construction in English, but someone who understood the context (Rambam is writing about the restrictions on carrying on the Sabbath) could understand this sentence. On the other hand, here is how Google translates this sentence:

The Authority does not have authority or permission Hmabir many foreign Four die, owing – to Shiakr over a place that has four and nurture and nurture or the four remaining, Weinih on a place that has four on four.

While some of those mistakes can be understood in a context-free environment (domain and permission are the same word in Hebrew), the unforgivable sin (IMHO) is that several of the words in the translation are not actually words. There are also several places where the translation was just… weird. The Hebrew word חוּץ almost always means outside; I don’t know why it was translated as “foreign” here. The interesting part is that if you translate it alone, without any other context, Google does correctly translate the word as “external” or “outside”. This of course means that Google does look for context-sensitive clues when trying to do a translation, which shouldn’t be a surprise given the talent Google has and their desire to do everything well. However, it also means that their context-sensitive translations appear to be even worse (in at least some instances) than the context-free translations they replaced.