Name: Google 最強 AI 「Gemini」海放 ChatGPT？(Google's Gemini just made GPT-4 look like a baby’s toy?)
Uploaded: 2023-12-12T21:01:30.000Z
Duration: 4 min 41 s

Google got obliterated by Microsoft's blitzkrieg attack in the great AI war of 2023.

在 2023 年人工智慧大戰中，Google 被微軟的閃電戰徹底消滅。


GPT-4 captured the zeitgeist of the artificial intelligence age we just entered.

GPT-4 抓住了我們剛剛進入的人工智慧時代的時代精神。

And things got so bad for Google that people unironically started using Bing.

谷歌的情況變得如此糟糕，人們諷刺地開始使用 Bing。

But the war is just getting started and just yesterday, Google unleashed its highly anticipated Gemini model that beats GPT-4 on nearly every benchmark.

但戰爭才剛開始，就在昨天，Google 發布了備受期待的 Gemini 模型，該模型幾乎在所有基準測試上都擊敗了 GPT-4。

It is December 7th 2023, and you are watching the Code Report.

現在是 2023 年 12 月 7 日，你正在觀看 Code Report。

Gemini first became known to the public earlier this year at Google I/O when Sundar explained it like this.

Gemini 在今年稍早的 Google I/O 上首次為大眾所知，當時 Sundar 是這樣解釋的。

You've been applying AI; to make AI; rigorously tested; AI; AI.

你一直在應用人工智慧； 製造人工智慧； 經過嚴格測試； 人工智慧; 人工智慧。

Gemini is a multimodal large language model that will replace LaMDA and PaLM 2.

Gemini 是一個多模態大語言模型，將取代 LaMDA 和 PaLM 2。

Like GPT-4, it's multimodal, which means it's not only trained on text but also sound, images and video.

與 GPT-4 一樣，它是多模態的，這意味著它不僅接受文字訓練，還接受聲音、圖像和影片的訓練。

Google's demo video is absolutely insane.

谷歌的示範影片簡直太瘋狂了。

It can recognize what's going on in a video feed and respond in real time.

它可以識別影片中發生的情況並即時回應。

Like this guy draws a duck, then the AI tells him it's a duck.

就像這個人畫了一隻鴨子，然後人工智慧告訴他這是一隻鴨子。

Like holy fuck, and it can do that in multiple languages.

老天，而且他還可以支援多種語言。

What's really crazy though is that it can keep track of things in an ongoing video feed.

但真正瘋狂的是，它可以追蹤正在進行的影片來源中的內容。

Like it plays the game of find the ball under the cup and even after the cups are scrambled up, it still knows where the ball is.

例如它玩在杯子下面找球的遊戲一樣，即使杯子被打亂，它仍然知道球在哪裡。

And it can even do connect the dots, which makes my five-year-old obsolete.

它甚至可以把這些點連結起來，這讓我五歲的孩子變得過時了。

It also does multimodal outputs like it can generate images on the fly like Stable Diffusion and can even generate music based on a prompt.

它還可以進行多模式輸出，例如可以像穩定擴散一樣動態生成影像，甚至可以根據提示生成音樂。

And not just text to audio but image to audio.

不僅僅是文字轉聲音，還有圖像轉聲音。

It's an "any modality to any modality" model.

這是一種「任何型態轉任何型態」的模式。

It's also good at logic and spatial reasoning.

Using these two pictures, it's able to tell you which car will go faster based on the aerodynamics of the vehicle.

使用這兩張圖片，它可以根據車輛的空氣動力學告訴你哪輛車會跑得更快。

In the future, a civil engineer will be able to just take a picture of some land, then the AI can instantly generate some blueprints for a bridge.

未來，土木工程師只需拍攝一些土地的照片，然後 AI 就可以立即產生一些橋樑的藍圖。

So software engineers aren't the only type of engineers becoming obsolete.

因此，軟體工程師並不是唯一被淘汰的工程師類型。

Although I do of course have some more bad news for programmers.

雖然我也會為程式設計師來帶來一些壞消息。

Google also unveiled AlphaCode 2, which performs better than 90% of competitive programmers.

谷歌也推出了 AlphaCode 2，它的表現優於 90% 的競爭性程式設計師。

And we're talking about programmers solving highly complex abstract problems like you might find in Codeforces competitions.

我們談論的是程式設計師解決高度複雜的抽象問題，就像你在 Codeforces 競賽中可能會發現的那樣。

Like any good programmer, AlphaCode 2 can break down problems into smaller problems using techniques like dynamic programming.

與任何優秀的程式設計師一樣，AlphaCode 2 可以使用動態程式設計等技術將問題分解為更小的問題。
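To make "breaking a problem into smaller problems" concrete, here is a minimal dynamic programming sketch in Python. It is not taken from AlphaCode 2 or from the video; the coin-change task and the function name `min_coins` are illustrative assumptions, chosen because a greedy approach gets the answer wrong while reusing solved subproblems gets it right.

```python
def min_coins(coins, amount):
    """Fewest coins that sum to `amount`, or -1 if it cannot be made.

    Hypothetical example task: best[sub] stores the answer for the
    smaller problem "make `sub`", and each entry is built from
    entries that were already solved.
    """
    INF = float("inf")
    best = [0] + [INF] * amount  # best[0] = 0 coins for amount 0
    for sub in range(1, amount + 1):
        for coin in coins:
            # Reuse the solved subproblem best[sub - coin].
            if coin <= sub and best[sub - coin] + 1 < best[sub]:
                best[sub] = best[sub - coin] + 1
    return -1 if best[amount] == INF else best[amount]


if __name__ == "__main__":
    # Greedy would pick 4 + 1 + 1 (three coins); DP finds 3 + 3.
    print(min_coins([1, 3, 4], 6))  # 2
```

The table `best` is the whole trick: every amount is solved once and then looked up, instead of being recomputed for each combination of coins.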

Now all these demos look really amazing at first glance, but is this all just a marketing sleight of hand from Google?

所有這些示範乍看之下確實令人驚嘆，但這只是谷歌的行銷花招嗎？

Well, currently, Gemini comes in three sizes: tall, grande and venti.

目前，Gemini 有三種尺寸：中杯、大杯和特大杯。

The smallest version is designed to be embedded on devices like Android phones.

最小的版本專為嵌入安卓手機等設備而設計。

While the Pro version is your more general purpose model.

While Ultra is like the Magnum XL of the Gemini family and the one that's blowing everybody's minds.

而 Ultra 就像是 Gemini 家族中的巨無霸 XL，讓每個人都大開眼界。

If you're in the United States, you can actually use Gemini right now in the Bard chatbot.

如果你在美國，現在實際上可以在 Bard 聊天機器人中使用 Gemini。

However, it's using Gemini Pro, the midrange version.

然而，它使用的是 Gemini Pro，中階版本。

Bard is way better than it was six months ago and it's still extremely fast,

Bard 比六個月前好用多了，而且速度仍然非常快，

but after using it for a few minutes, it's pretty obvious that it's not quite as good as GPT-4 Pro.

但使用幾分鐘後，很明顯它不如 GPT-4 Pro。

But GPT-4 is nervous about Gemini Ultra.

但 GPT-4 對 Gemini Ultra 感到緊張。

When I asked about it, it started throwing mad shade at itself and then before it finished, Sam Altman pulled the plug, giving me this network error.

當我詢問它時，它開始向自己投擲瘋狂的陰影，然後在完成之前，山姆·奧爾特曼拔掉了插頭，給了我這個網路錯誤。

When it comes to benchmarks, Gemini Pro underperforms GPT-4 in most situations, but Gemini Ultra outperforms it in almost every single category.

在基準測試方面，Gemini Pro 在大多數情況下都低於 GPT-4，但 Gemini Ultra 在幾乎每個類別上都優於 GPT-4。

Most notably, it's the first model ever to outperform human experts on massive multitask language understanding, which is typically a multiple-choice test over a wide array of subjects.

最值得注意的是，它是第一個在大規模多任務語言理解方面超越人類專家的模型，這通常是針對廣泛主題的多項選擇測試。

What's hella surprising though is that Gemini Ultra underperforms GPT-4 on the HellaSwag benchmark,

但令人驚訝的是，Gemini Ultra 在 HellaSwag 基準測試中的表現低於 GPT-4，

which is designed to evaluate commonsense natural language understanding by having the AI finish a sentence that's often vague and ambiguous.

它旨在透過讓人工智慧完成通常模糊且不明確的句子來評估常識性自然語言。

For example, a man watches a Fireship video and afterwards feels blank.

例如，一個人觀看了 Fireship 影片，然後感到＿＿。

It's a job that's really easy for humans to do and a very important benchmark, because when an AI can't do this well, it doesn't feel very humanlike.

這是一項對人類來說非常容易完成的工作，也是一個非常重要的基準，因為當人工智慧不能很好地完成這項工作時，它會讓人感覺不太像人類。
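A sentence-completion benchmark of this kind can be sketched as a scoring loop: for each context, the model ranks a set of candidate endings, and accuracy is the share of contexts where the top-ranked ending is the labelled one. The Python below is a rough illustration only. The example data, the field names, and `toy_score` are all made up for this sketch; a real evaluation would typically score each ending with the language model's own likelihood rather than word overlap.

```python
def pick_ending(context, endings, score):
    """Index of the candidate ending that `score` rates highest."""
    scores = [score(context, ending) for ending in endings]
    return max(range(len(endings)), key=scores.__getitem__)


def accuracy(examples, score):
    """Fraction of examples where the top-scored ending is the label."""
    correct = sum(
        pick_ending(ex["context"], ex["endings"], score) == ex["label"]
        for ex in examples
    )
    return correct / len(examples)


def toy_score(context, ending):
    """Stand-in scorer (assumption): count of shared words.

    It has no common sense at all, which is the point of the demo.
    """
    return len(set(context.lower().split()) & set(ending.lower().split()))


# Hypothetical examples, not real benchmark items.
EXAMPLES = [
    {
        "context": "a man watches a video about code and afterwards feels",
        "endings": ["ready to write some code", "seven bananas taller"],
        "label": 0,
    },
    {
        "context": "she pours water into the kettle and then",
        "endings": ["reads the kettle a story", "turns on the stove"],
        "label": 1,
    },
]

if __name__ == "__main__":
    print(accuracy(EXAMPLES, toy_score))  # 0.5
```

The word-overlap scorer gets the second example wrong because "reads the kettle a story" shares more words with the context than the sensible ending does, which is the kind of shallow matching this style of benchmark is meant to expose.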

In GPT-4, I can write a vague prompt filled with typos, and somehow it almost always seems to know what I'm talking about.

在 GPT-4 中，我可以寫出充滿拼字錯誤的模糊指令，不知怎的，它似乎總是知道我在說什麼。

The fact that GPT-4 is doing so much better on HellaSwag is hella concerning to say the least.

事實上，GPT-4 在 HellaSwag 上的表現要好得多，至少可以說是令人擔憂的。

But another interesting thing to note from the technical paper is how they trained this beast.

但從技術論文中可以了解到的另一件有趣的事情是他們如何訓練這頭野獸。

They use their newly unveiled version 5 Tensor Processing Units, which are deployed in SuperPODs of 4096 chips.

他們使用新推出的版本 5 張量處理器，這些部署在 4096 個晶片的 SuperPOD 中。

Each SuperPOD has a dedicated optical switch which allows data to transfer quickly between the pods to train in parallel.

每個 SuperPOD 都有一個專用的光開關，可讓資料在 Pod 之間快速傳輸以並行訓練。

Then they can dynamically reconfigure into 3D torus topologies.

然後，它們可以動態地重新配置為 3D 環面拓撲。

In other words, they can shape-shift into donuts to reduce the latency between chips.

換句話說，它們可以變形為甜甜圈形狀，以減少晶片之間的延遲。
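Why a donut shape cuts latency can be shown with a little arithmetic. In a 3D torus every axis wraps around, so a chip on one edge is a direct neighbour of the chip on the opposite edge. The Python sketch below compares hop counts with and without wraparound. The 16 x 16 x 16 layout is an assumption made for illustration (it happens to give 4096 chips); the actual pod geometry is not stated in the video, and the function names are hypothetical.

```python
def torus_neighbors(coord, dims):
    """The six direct neighbours of a chip in a 3D torus."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            nxt = list(coord)
            # Modulo makes each axis wrap around, closing the "donut".
            nxt[axis] = (nxt[axis] + step) % dims[axis]
            neighbors.append(tuple(nxt))
    return neighbors


def torus_hops(a, b, dims):
    """Minimum hops between two chips when every axis wraps around."""
    total = 0
    for x, y, size in zip(a, b, dims):
        direct = abs(x - y)
        total += min(direct, size - direct)  # go whichever way is shorter
    return total


def mesh_hops(a, b):
    """Minimum hops in a plain 3D grid with no wraparound links."""
    return sum(abs(x - y) for x, y in zip(a, b))


DIMS = (16, 16, 16)  # assumed layout: 16 * 16 * 16 = 4096 chips

if __name__ == "__main__":
    corner_a, corner_b = (0, 0, 0), (15, 15, 15)
    print(mesh_hops(corner_a, corner_b))         # 45
    print(torus_hops(corner_a, corner_b, DIMS))  # 3
```

Under these assumptions the worst-case corner-to-corner trip drops from 45 hops to 3, because each axis can be crossed in a single wraparound step.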

And the scale of Gemini Ultra is so large that they had to communicate between multiple data centers.

而且 Gemini Ultra 的規模太大了，需要在多個資料中心之間進行通訊。

The paper also describes the training data set which basically includes everything you can find on the internet, including web pages and YouTube videos as well as scientific papers and books.

該論文還描述了訓練資料集，其中基本上包括在網路上可以找到的所有內容，包括網頁和 YouTube 影片以及科學論文和書籍。

They filter for quality, then use reinforcement learning from human feedback to fine-tune the model and avoid hallucinations.

他們過濾品質，然後透過人類反饋使用強化學習來微調質量並避免幻覺。

Overall, Gemini looks amazing on paper but prepare to be disappointed.

總的來說，Gemini 在紙上看起來很棒，但做好失望的準備。

The Nano and Pro models will be available on Google Cloud on December 13th,

Nano 和 Pro 型號將於 12 月 13 日在 Google Cloud 上提供，

but the Gemini Ultra Pro Max won't be available until next year, once additional safety tests are done and it reaches 100% on the Hella woke benchmark.

但 Gemini Ultra Pro Max 要到明年才能上市，除非完成額外的安全測試，並且在 Hella 喚醒基準上達到 100%。

Thanks for watching and I will see you in the next one.
