Machine translation is incredibly difficult. And to prove that, I will now read this introduction
again, after it's been sent through Google's translator -- currently one of the best in
the world -- and then translated back into English.
Machine translation is very difficult. Back then translated into English - is one of the
best in the world right now - it is to prove that, after being sent through Google's translator,
I'll read this again introduced.
Okay, I chose a difficult language, but each one I tried introduced subtle errors in different
ways. Via Chinese, it had been translated by “Google hair”. Via French, the introduction
became a “he”, not an “it”. And those sentences were incredibly simple.
Folks who only speak one language -- and I am embarrassed to say that's a group that
includes me, I'm sorry -- folks who only speak one language often assume that you can
open a translation dictionary, pick an appropriate word, faff around with the grammar a bit,
and have a functional sentence in another language. For simple sentences, yes, that's
true: but very few sentences in the real world are actually that simple.
Google recently released a paper about how they'd reduced machine translation to a
problem in vector space mathematics: representing concepts as points in an abstract language space.
Which is great for mapping concepts to words, and it'll even deal well with homographs,
identical words that mean completely different things. You can deal with those through context:
the days of “hydraulic ram” being translated as “water sheep” are pretty much in the
past.
[OFF SCREEN LAUGHTER]
Spot the engineer.
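To make the "deal with those through context" idea concrete, here is a minimal sketch in Python. It is not the method from Google's paper: the three-dimensional vectors, the word list and the `pick_sense` helper are all invented for illustration, where a real system would learn vectors with hundreds of dimensions from large amounts of text. The point is only that a homograph like "ram" can be resolved by checking which sense lies closest to the words around it.

```python
import math

# Toy "concept space" with three hand-picked axes: machinery, animals, liquids.
# Every number here is made up for illustration; real vectors are learned from data.
WORD_VECTORS = {
    "hydraulic": (0.9, 0.0, 0.4),
    "pump": (0.8, 0.0, 0.5),
    "farm": (0.1, 0.9, 0.0),
    "wool": (0.0, 0.9, 0.0),
}

# Two senses of the homograph "ram", each with its own (hypothetical) vector.
RAM_SENSES = {
    "machine part": (0.9, 0.1, 0.3),
    "male sheep": (0.0, 1.0, 0.1),
}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Component-wise average of a non-empty list of vectors."""
    count = len(vectors)
    return tuple(sum(component) / count for component in zip(*vectors))


def pick_sense(context_words):
    """Return the sense of 'ram' closest to the average of the known context words."""
    known = [WORD_VECTORS[w] for w in context_words if w in WORD_VECTORS]
    if not known:
        raise ValueError("no known context words to disambiguate with")
    context = mean_vector(known)
    return max(RAM_SENSES, key=lambda sense: cosine(RAM_SENSES[sense], context))


print(pick_sense(["hydraulic", "pump"]))  # machine part
print(pick_sense(["farm", "wool"]))       # male sheep
```

With "hydraulic" nearby, the machinery sense wins, so "water sheep" never gets a look in. What the sketch cannot do is everything that comes next: it knows which words sit near each other, not what the speaker meant by them.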
For formal, technical documents, it might even start to work well.
But for more casual communication, it's not so easy.
Heck, translating between British English and American English isn't always easy.
Not because your car's “hood” is our “bonnet”, but because “that's a brave
idea” isn't a compliment in British English, it means you're a prat and your idea is
impossible.
There are concepts which don't quite match between languages. “Bonne nuit” might
literally mean the same as “buenas noches” -- I'm sorry about my pronunciation there
-- but one is meant for saying goodnight at bedtime and the other's for saying hello
or goodbye at any point after dark.
Then you have the concepts that don't translate between languages at all. In French, “you”
translates as “vous” if it's someone you should be respectful towards, and “tu”
if it's a more casual conversation. Or if you're talking to God. No, really. God is
“tu”. A computer will crush both of those to “you” when translating to other languages,
and it won't have any idea which of them to use when translating into French.
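That "crushing" is easy to show as a sketch. The two-entry dictionary below is a deliberately tiny, made-up stand-in for a translation table, not how any real translator stores things, and `invert` is a hypothetical helper. It shows why the trip back into French is ambiguous: two different inputs land on the same output, so the reverse lookup returns two candidates and nothing in the English word says which one to pick.

```python
# Toy French-to-English table: both forms of address collapse to one English word.
FR_TO_EN = {"tu": "you", "vous": "you"}


def invert(mapping):
    """Reverse a word table, keeping every source word that maps to each target."""
    inverse = {}
    for source, target in mapping.items():
        inverse.setdefault(target, []).append(source)
    return inverse


EN_TO_FR = invert(FR_TO_EN)

print(FR_TO_EN["tu"], FR_TO_EN["vous"])  # you you
print(EN_TO_FR["you"])                   # ['tu', 'vous']
```

Choosing between those two candidates needs information that was never in the English sentence: who is speaking, to whom, and how they relate to each other.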
And that is just a simple “honorifics” system. Korean has a much more complicated
set of pronouns for all sorts of situations. Remember this? That repeated line: oppan Gangnam
style. The English translation of “oppa” is usually “a woman's older brother”:
but in everyday speech, “oppa” is used to refer to someone based on a series of complicated
and fuzzy rules that make instinctive sense to native speakers. To make it worse, PSY
is referring to himself in the third person there, which sounds really weird when translated
out of Korean. There is no way to translate all of the meaning in those words into one
English sentence.
Then you have the problem of shared expectations. English-speaking cultures tend to be monochronic:
if you make an appointment to meet someone at 11am, you are expected to be there at about
11am. I mean, groups of friends can often get around this -- “the party starts at
6” often means people will turn up anywhere from 6:30 to 9. But imagine if that lack of
punctuality, and that acceptance of a lack of punctuality, expanded to all aspects of
everyday life. Welcome to the rest of the world. Massive parts of this planet run on
what is called polychronic time. Two appointments at the same time? That's fine, they'll
understand. And they will understand.
Needless to say, there is often quite significant culture clash when monochronic and polychronic
people meet. But a machine translation isn't going to see an English sentence like “I'll
meet you at 7pm” and add a note for someone in a polychronic culture that, no, they really
do mean 7pm, and they're going to be annoyed if you're late.
Ultimately, to accurately translate something, you don't just need to know how words map
to concepts: you need to understand social structures, subtext, nuance, innuendo. You
need at least a basic theory of mind: the idea that the speaker and the listener both
have beliefs and desires expressed by the particular words they've chosen. Translators
need to be able to ask questions of the original author, so you can check that the subtleties
that you have to add to their work reflect their intention.
The problem isn't that language is messy -- computers can cope with messy, heck, they
can pretty much solve CAPTCHAs better than humans these days. The problem is that language
relies on intent, on shared secrets, on group identity, and on hidden knowledge. Machine
translation is a useful tool, don't get me wrong, but trying to get a machine to translate
better than a human is… a brave idea.