Google translator needs some work
May 1st, 2009 by Ari
Every so often I run across a Hebrew word or phrase I don’t know while surfing the web, and try to use Google translator to translate it. After much study, I’ve come to the inescapable conclusion that Google translator sucks. I’ve used plenty of translators on the web before, and I know none of them are perfect, but Google’s is far and away the worst one I’ve run across. (Notable because this is the first time Google has done something significantly worse than the competition.)
Here’s one laughable example. Since I run the Rambam study group at shul (Rambam was a 12th century rabbi whose writings are still considered very influential), I use the translator a lot when I prepare by reading his writings online. This is the first sentence of the chapter we’ll be covering this week:
אֵין הַמּוֹצִיא מֵרְשׁוּת לִרְשׁוּת אוֹ הַמַּעְבִּיר בִּרְשׁוּת הָרַבִּים חוּץ לְאַרְבַּע אַמּוֹת, חַיָּב–עַד שֶׁיַּעְקֹר מֵעַל גַּבֵּי מְקוֹם שֶׁיֵּשׁ בּוֹ אַרְבָּעָה טְפָחִים עַל אַרְבָּעָה טְפָחִים אוֹ יָתֵר, וְיַנִּיחַ עַל גַּבֵּי מְקוֹם שֶׁיֵּשׁ בּוֹ אַרְבָּעָה עַל אַרְבָּעָה.
I would translate this more or less as follows:
One who transfers from domain to domain or carries in the public domain beyond four cubits is not liable until he picks up from a place that is four handbreadths by four handbreadths or more, and puts it down in a place that is four by four.
Admittedly it’s a somewhat convoluted and awkward construction in English, but someone who understood the context (Rambam is writing about the restrictions on carrying on the Sabbath) could understand this sentence. On the other hand, here is how Google translates this sentence:
The Authority does not have authority or permission Hmabir many foreign Four die, owing – to Shiakr over a place that has four and nurture and nurture or the four remaining, Weinih on a place that has four on four.
While some of those mistakes can be understood in a context-free environment (domain and permission are the same word in Hebrew), the unforgivable sin (IMHO) is that several of the words in the translation are not actually words. There are also several places where the translation was just… weird. The Hebrew word חוּץ almost always means outside; I don’t know why it was translated as “foreign” here. The interesting part is that if you translate it alone, without any other context, it does correctly translate the word as “external” or “outside”. This of course means that Google does look for context-sensitive clues when trying to do a translation, which shouldn’t be a surprise given the talent Google has and their desire to do everything well. However, it also means that their context-sensitive translations appear to be even worse (in at least some instances) than the context-free translations they replaced.
May 1st, 2009 at 4:23 pm
Google Reader translates everything–including my friends’ last names in their Facebook status updates–but with no indication of what language they are translated from. So I have a friend “Brigitte Mediator” (her last name is Mittler) and another friend who is a fan of someone who plays on a sports team (maybe the Caps) whose last name translates to–I kid you not–testicles. At least according to Google. Gotta love it.
May 4th, 2009 at 10:10 am
You are correct that Google is using context in their translation. This is a good example of where that is a bad idea. However, this is a pretty unfair example. Imagine if you asked Google to translate a passage from Old English (same time frame as Rambam) to modern French. You would expect complete nonsense as they are significantly different languages. An example from Beowulf: “Þæt wæs god cyning!” is translated as “That was a good king.” Would you ever expect Google Translate to get this correct? For the record, Google totally punts when asked to translate this Old English passage.
The passage from the Rambam is effectively a different language and I would not expect Google to do well on this passage. You can see that Google clearly has never seen the words “amah” and “tefach” before and is unfamiliar with the construction “arba al arba”. The phrase “הָרַבִּים חוּץ לְאַרְבַּע אַמּוֹת” gets translated word for word as “many” (rabim), “foreign” (one possible connotation of chutz, like “chutnik” as foreigner), “Four” (arba), “die” (from the root met, since it doesn’t know amot). Since Google does not support this language, you are much better off translating each word individually and reconstructing the meaning of the passage by yourself. It’s not fair to say that Google Translate is bad when it doesn’t support the language you are providing.
In general, the Google translation system is considered one of the best in the business, although I don’t know how well they do on Hebrew in particular.
May 10th, 2009 at 8:14 am
No, the Hebrew translator isn’t good at all – even though the translators for other languages are pretty good. I’ve tried using it, and even though my Hebrew isn’t that great, I do better than Google 98% of the time.