How Google Translate Uses Math to Understand 134 Languages | WSJ Tech Behind

Auto-generated by AI

In a fraction of a second, Google Translate can make sense of your surroundings.

But this isn't the same Google Translate from the early 2000s.

Over the past two decades, the technology has gone through a complete overhaul, shifting from a basic pattern matching tool to a sophisticated neural network that handles more than 130 languages.

It works by turning language into something computers can understand:

Math.

Exciting times for people who like language and math.

This is the tech behind Google Translate.

There's very little code left today from the early days of phrase-based translation.

We have shut down and deleted almost all of it.

That Google Translate from two decades ago laid the foundation for what we use today.

When it launched in 2006, it worked by playing a matching game.

First, the model looked at lots of examples of professional translations scraped from the internet.

Then, when users entered sentences for translation, the tool would break them into the longest possible chunks of words it had seen before and combine the chunks.
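The chunk-matching idea can be sketched in a few lines. This is a minimal illustration, assuming a hand-made phrase table (`PHRASE_TABLE` is hypothetical) built around the Italian phrase "chiuso per ferie" ("closed for holidays"); the 2006 system also scored competing chunks and reordered them, which this greedy longest-match version ignores.

```python
# Hypothetical phrase table: source chunks (as word tuples) -> target text.
PHRASE_TABLE = {
    ("chiuso",): "closed",
    ("per",): "for",
    ("ferie",): "holidays",
    ("per", "ferie"): "for holidays",
    ("chiuso", "per", "ferie"): "closed for holidays",
}


def translate_phrase_based(sentence: str, table: dict[tuple[str, ...], str]) -> str:
    """Greedily cover the sentence with the longest chunks seen in the table."""
    words = sentence.lower().split()
    max_len = max(len(chunk) for chunk in table)
    output = []
    i = 0
    while i < len(words):
        # Try the longest possible chunk starting at word i, then shrink it.
        for n in range(min(max_len, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + n])
            if chunk in table:
                output.append(table[chunk])
                i += n
                break
        else:
            # No chunk matched: pass the unknown word through untranslated.
            output.append(words[i])
            i += 1
    return " ".join(output)


print(translate_phrase_based("Chiuso per ferie", PHRASE_TABLE))  # closed for holidays
```

Unknown words fall through unchanged, which hints at why this approach produced stilted output whenever a sentence strayed from phrases it had already seen.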

It now uses a much more sophisticated machine learning approach, a so-called transformer model, which is a core building block of modern AI.

Transformers turn language into math by assigning numbers to words.

The key insight is that a series of numbers can represent a meaning.

You can then do math with those vectors that reveals something about how the meanings of words relate to each other.

For each language Google Translate supports, every word gets converted into a vector, which is written as a list of numbers.

This way, the computer can do math with them.

For instance, king minus man plus woman roughly equals queen.
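A toy version of the king/queen arithmetic, assuming hand-picked 3-dimensional vectors (the numbers in `EMBEDDINGS` are invented so the example works; real embeddings are learned from data and have hundreds of dimensions). The "answer" is the vocabulary word whose vector points in the most similar direction to the result, measured by cosine similarity, with the three input words excluded as is customary for this kind of test.

```python
import math

# Invented 3-D "embeddings", hand-picked so the analogy works.
EMBEDDINGS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.9],
    "apple": [0.0, 0.3, 0.3],
}


def cosine(a, b):
    """Cosine similarity: 1.0 means the two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def analogy(a, b, c, emb):
    """Return the word nearest to emb[a] - emb[b] + emb[c], excluding a, b and c."""
    target = [x - y + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(emb[w], target))


print(analogy("king", "man", "woman", EMBEDDINGS))  # queen
```

The result vector here is [0.9, 0.1, 0.9], which is close to, but not exactly, the vector for "queen"; that is why the match is "roughly equals" rather than an exact equation.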

The specific numbers assigned to each word don't really matter, and they're different in different languages.

But what matters is how each word relates to every other word.

It's all based on machine learning from billions of examples.

But most of the time, what you want to translate isn't just an individual word.

So the computer also has to figure out how words work together.

And this is where transformers, a breakthrough in machine learning, come in.

The next generation of neural translation was called the transformer architecture.

And this added a level: it moved from representing the meaning of one word as a row of numbers to putting the meanings of all the words into a table and doing math on that whole table.

And that enables you to do math that captures not only the meaning of each word, but also the importance of the words' relationships to each other.

Say you're trying to translate this Italian sign into English.

First, Google Translate would turn each word into a vector.

And those vectors would be put into one giant table, or matrix.

Then the computer tries to figure out how each word interacts with every other word on this sign.

Mathematically, this is basically a lot of multiplication.

The most important kind of magical step is laying them out in a matrix and doing what's called matrix multiplication.

And if you do enough of that, you can solve this problem.
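In a transformer, the "table of vectors plus matrix multiplication" step is called self-attention. Below is a minimal single-head sketch in plain Python, assuming made-up word vectors and identity matrices in place of the learned projection matrices (`w_q`, `w_k`, `w_v`); real models stack many such layers, use several attention heads, and add information about word position.

```python
import math


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix; both are lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(row)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the rows of x.

    x has one row (vector) per word; w_q, w_k and w_v are projection
    matrices that a real model would learn during training.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d = len(k[0])
    # Score every word against every other word with one matrix multiplication.
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    # Each output row is a weighted mix of all the words' value vectors.
    return matmul(weights, v), weights


# Three made-up 2-D word vectors stacked into a matrix, one row per word.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` records how much one word draws on each of the others, which is one concrete numeric form of "the importance of the relationships of the words to each other."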

All this creates a new list of numbers.

This is what's called a context vector.

And it's something pretty special.

This list of numbers actually represents what the sentence means.

Not just the sum of all of its words.

At least, if the model has done its job correctly.

If you put that together and are very clever, which the people who invented transformers were, and you train on a lot of data, which we do, you can eventually get to a collection of numbers that meaningfully represents the meaning of the sentence.

So that's called the encoder stage.

Then you have a decoder, which, roughly speaking, is the encoder in reverse.

The computer has to decode this back to human language.

The decoder now also goes through lots and lots of operations.

And finally, you start getting vectors out which can be mapped back to individual words.

So we hopefully get "closed," "for," then "holiday."
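The final decoder step, turning vectors back into words, can be sketched as scoring every vocabulary word against each vector and keeping the highest. This assumes a tiny hypothetical vocabulary (`VOCAB`) and made-up decoder vectors; a real decoder produces one word at a time, feeds each choice back in as input for the next step, and often keeps several candidate sentences (beam search) instead of a single greedy pick.

```python
# Hypothetical output embeddings for a tiny English vocabulary.
VOCAB = {
    "closed":  [1.0, 0.0, 0.0],
    "for":     [0.0, 1.0, 0.0],
    "holiday": [0.0, 0.0, 1.0],
    "open":    [-1.0, 0.0, 0.0],
}


def to_word(vec, vocab):
    """Score each vocabulary word by dot product with vec and keep the best one."""
    return max(vocab, key=lambda word: sum(a * b for a, b in zip(vocab[word], vec)))


# Made-up vectors standing in for three decoder steps.
decoder_outputs = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.0, 0.3, 0.8]]
print([to_word(v, VOCAB) for v in decoder_outputs])  # ['closed', 'for', 'holiday']
```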

So this is how language becomes math.

Getting this math to work requires a lot of training.

Many of the numbers in this math problem start out random and are then refined as the computer learns from billions of examples.
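"Start random, then refine" is gradient descent. As a deliberately tiny stand-in for training a translation model, this sketch fits just two numbers, `w` and `b`, to examples of y = 2x + 1; the learning rate, step count and seed are arbitrary choices for the illustration. A real model adjusts billions of weights in the same basic way, with the gradients computed by backpropagation.

```python
import random


def train_linear(data, steps=2000, lr=0.05, seed=0):
    """Fit y ≈ w*x + b by gradient descent on mean squared error."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # start from random numbers
    n = len(data)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        # Nudge each number a little in the direction that reduces the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


examples = [(x, 2 * x + 1) for x in [0.0, 0.5, 1.0, 1.5, 2.0]]
w, b = train_linear(examples)
print(round(w, 3), round(b, 3))
```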

Before deploying an update with a new set of values and weights, engineers run numerous tests, first with their AI evaluator and then with professional human translators who check accuracy.

But since every possible combination of words leads to a unique equation, it's impossible to test everything.

Because the model has been trained on translations to or from English, translating between two non-English languages often requires more steps.

For example, if you want to translate something in Japanese to Zulu, it will go from Japanese to English and then English to Zulu.
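The two-step route can be sketched as chaining translation steps, assuming tiny hand-made word lists (`JA_TO_EN` and `EN_TO_ZU` are illustrative only; real systems translate whole sentences with neural models, not lookup tables).

```python
# Tiny hand-made word lists (illustrative only).
JA_TO_EN = {"水": "water", "ありがとう": "thank you", "こんにちは": "hello"}
EN_TO_ZU = {"water": "amanzi", "thank you": "ngiyabonga", "hello": "sawubona"}


def translate_via_pivot(text, legs):
    """Run text through each translation step in order, e.g. ja -> en -> zu."""
    for table in legs:
        text = table[text]  # raises KeyError if a step has no entry
    return text


print(translate_via_pivot("ありがとう", [JA_TO_EN, EN_TO_ZU]))  # ngiyabonga
```

One trade-off of pivoting: the second step only sees the English output, so any error or lost nuance from the first step carries into the Zulu result.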

The first thing that happens when you use Google AR Translate is that we have to actually extract the text from the image.

And so as you can see here, it detects that this is Chinese and translates it into English.

It makes information a lot more accessible, because for many people, typing in a foreign language's script is not an option.

The key component is a technology called Optical Character Recognition, or OCR.

Google has been using that since 2002, when it started digitizing libraries for Google Books.

Initially, it would do something very simple like pattern matching.

So you can think of it as, is this the same as this?

Yes, so it's an A or B or whatnot.
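"Is this the same as this?" is template matching. A minimal sketch, assuming tiny hypothetical 5×3 bitmaps for a few letters (`TEMPLATES` is invented for illustration): count the pixels that agree with each stored template and pick the best match.

```python
# Hypothetical 5x3 bitmaps ("#" = ink) for a few characters.
TEMPLATES = {
    "A": [" # ", "# #", "###", "# #", "# #"],
    "B": ["## ", "# #", "## ", "# #", "## "],
    "L": ["#  ", "#  ", "#  ", "#  ", "###"],
}


def match_score(glyph, template):
    """Count pixels that agree between two same-sized bitmaps."""
    return sum(g == t for grow, trow in zip(glyph, template) for g, t in zip(grow, trow))


def recognize(glyph, templates):
    """Return the template character whose pixels agree most with the glyph."""
    return max(templates, key=lambda ch: match_score(glyph, templates[ch]))


# An "A" with its crossbar partly missing still matches best.
noisy_a = [" # ", "# #", "# #", "# #", "# #"]
print(recognize(noisy_a, TEMPLATES))  # A
```

This works for clean, fixed fonts but breaks quickly with new typefaces, blur, or perspective, which is part of why OCR moved to learned models.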

But now, Optical Character Recognition also uses transformers.

First, Google Lens identifies lines of text and text direction.

Then it determines specific characters and words.

Instead of dividing the sentence into words and assigning numbers to each word, though, it divides an image into patches of pixels.

These are called tokens.
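Cutting an image into patch tokens can be sketched as follows, assuming a grayscale image stored as a 2-D list and square, non-overlapping patches, as in vision transformers generally (the patch size here is an arbitrary choice). In a real model each flattened patch would then be projected into a vector; that step is omitted.

```python
def image_to_patches(image, patch):
    """Split a 2-D grid of pixels into flattened patch x patch tokens, row by row."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be a multiple of the patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            # Read one square patch and flatten it into a single list.
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


image = [[r * 4 + c for c in range(4)] for r in range(4)]  # a 4x4 "image"
tokens = image_to_patches(image, 2)
print(len(tokens), tokens[0])  # 4 [0, 1, 4, 5]
```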

The encoder of the transformer processes all of these tokens simultaneously to eventually predict the best character and the best word.

This means that Google Lens, the company's visual search tool, can often read things even when it can't make out every single letter.

With transformers, the models are able to pick up on grammar.

If there is a spelling mistake, the transformer will also be able to use the context to disambiguate and still extract the right word.
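A transformer learns to use context from data. As a much simpler, swapped-in illustration of the same idea, this sketch uses the classic approach of edit distance plus word-pair counts: find dictionary words within a couple of typos, then prefer the one most often seen after the previous word. `VOCAB` and `BIGRAMS` are hypothetical, and this is not how Google Lens works internally.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


# Hypothetical dictionary and counts of (previous word, word) pairs.
VOCAB = {"closed", "for", "holiday", "holy", "hold"}
BIGRAMS = {("closed", "for"): 40, ("for", "holiday"): 50, ("for", "holy"): 1}


def correct(prev_word: str, word: str, max_edits: int = 2) -> str:
    """Replace an unknown word with the nearby dictionary word that best fits prev_word."""
    if word in VOCAB:
        return word
    candidates = [v for v in VOCAB if edit_distance(word, v) <= max_edits]
    if not candidates:
        return word
    # Context first (pair count), then fewer edits as the tie-breaker.
    return max(candidates,
               key=lambda v: (BIGRAMS.get((prev_word, v), 0), -edit_distance(word, v)))


print(correct("for", "holdy"))  # holiday
```

On spelling alone, "hold" and "holy" are one edit away from "holdy" while "holiday" is two; the preceding word "for" is what tips the choice to "holiday."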

After it completes Optical Character Recognition, Google Lens analyzes the layout of all the text.

That's how a computer would know to translate this sign as "you matter, don't give up," rather than "you don't matter, give up."

When humans look at a newspaper, they're excellent at just glancing at it and understanding the reading order: what to read first.

This is a problem that isn't actually easy to solve technically.

It's very hard.

The key is for Optical Character Recognition to understand something about the meaning of what it's reading.

This is also done through extensive training.
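To see why reading order is hard, consider the kind of hand-written rule that a trained model replaces. This is a naive sketch, assuming each text box is reduced to its top-left corner and that columns are separated by a fixed horizontal gap (`Box` and `column_gap` are hypothetical); it breaks on headlines that span columns, sidebars, or stylized signs, which is why the learned approach described above is needed.

```python
from dataclasses import dataclass


@dataclass
class Box:
    text: str
    x: float  # left edge of the text box
    y: float  # top edge; larger values are lower on the page


def reading_order(boxes, column_gap):
    """Group boxes into columns wherever consecutive left edges (sorted by x)
    jump by more than column_gap, then read columns left to right and each
    column top to bottom."""
    if not boxes:
        return []
    by_x = sorted(boxes, key=lambda b: b.x)
    columns, current = [], [by_x[0]]
    for box in by_x[1:]:
        if box.x - current[-1].x > column_gap:
            columns.append(current)
            current = [box]
        else:
            current.append(box)
    columns.append(current)
    return [b.text for col in columns for b in sorted(col, key=lambda b: b.y)]


page = [Box("Story B, part 1", 50, 0), Box("Headline A", 0, 0),
        Box("Story B, part 2", 52, 10), Box("Story A", 0, 10)]
print(reading_order(page, column_gap=20))
```

Sorting the same boxes simply by height would interleave the two columns ("Headline A", "Story B, part 1", "Story A", ...), which is the kind of scrambled order that produces nonsense translations.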

After the chunks of text are sent to the translator, Google Lens uses inpainting models to erase the text from different signs or backgrounds.

That way, translated text can be placed on top of clean surfaces.

Using generative models, it tries to predict and create pixels that match the surrounding pixels so that when we overlay the translated text, it looks very natural and seamless.
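Google Lens uses generative models for this step. As a far simpler, non-generative stand-in, the sketch below fills the erased pixels by repeatedly averaging their four neighbors, a classic diffusion-style inpainting method; it blends smooth backgrounds well but cannot recreate texture or patterns the way a generative model can. The iteration count is an arbitrary choice.

```python
def inpaint(image, mask, iterations=200):
    """Fill masked pixels by repeatedly averaging their 4-connected neighbors.

    image: 2-D list of grayscale values; mask: same shape, True where text was erased.
    Unmasked pixels are never changed.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for _ in range(iterations):
        nxt = [row[:] for row in out]
        for r in range(h):
            for c in range(w):
                if mask[r][c]:
                    neighbors = [out[rr][cc]
                                 for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                                 if 0 <= rr < h and 0 <= cc < w]
                    nxt[r][c] = sum(neighbors) / len(neighbors)
        out = nxt
    return out


# A plain 4x4 background (value 100) with a dark 2x2 "text" blob in the middle.
background = [[100.0] * 4 for _ in range(4)]
for r in (1, 2):
    for c in (1, 2):
        background[r][c] = 0.0
mask = [[r in (1, 2) and c in (1, 2) for c in range(4)] for r in range(4)]
filled = inpaint(background, mask)
print(round(filled[1][1], 3))  # 100.0
```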

This doesn't always work seamlessly.

This one is not picking up the first line. I'm not sure why.

Some translations don't fully account for context, which is why "alto" on this Mexican stop sign might be mistranslated as "high."

And while Optical Character Recognition can frequently identify text in bad lighting or with complicated perspective, it has its limits.

One of them is with deformable objects.

Whenever there is text on something like a sweater or a cookie wrapper, depending on the pose and the angle, it can be more challenging to get the right OCR result.

Well-formed, grammatically correct, fluent text we're quite good at.

Where we have challenges is with people using slang and casual speech in chat and on social media.

We don't necessarily see as much of that because we don't have access to as much data.

Google is working to add some more features, like letting users refine their translations if they want to, similar to how you can ask Google Gemini or ChatGPT to make a translation more or less formal, or to use Chilean Spanish rather than European Spanish.

And it's also working to add more languages.

There are an estimated 6,000 to 7,000 languages in the world.

Our goal is to support all of them.
