  • What's up? Josh here.

  • So in case you missed it, OpenAI has just announced ChatGPT-4o, which is their brand new flagship model that is two times faster and more capable than GPT-4.

  • And good news for all of us: it's going to be free to use.

  • Now, GPT-4 was previously a $20-a-month subscription, but now with 4o being completely free, we also get the benefits of everything that we got with GPT-4.

  • There's Vision, where you can upload images and ask it questions about those images.

  • There's also Browse, where it can scour the internet for more real-time and up-to-date data.

  • There's also Memory, where it can actually remember facts about you.

  • And then lastly, there's Analyzing Complex Data. So you can actually give it like an Excel spreadsheet and ask it questions about that.

  • So all of those features are going to be coming to 4o in the next couple of weeks.

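(A quick aside, not from the video: the Vision feature described above is also exposed through OpenAI's API. Here is a minimal sketch using the official `openai` Python SDK and the documented chat-completions image-input format; the question and image URL are placeholders.)

```python
# Minimal sketch: asking GPT-4o a question about an image via the
# official OpenAI Python SDK. The image URL is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```
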
  • But yeah, first of all, let's just start with everything that's going to be new with GPT-4o.

  • So in the presentation, the most impressive part was obviously the demo. So they did a bunch of stuff.

  • They asked it all kinds of questions, gave it math equations, and asked it to read bedtime stories.

  • And for the most part, I think the intelligence level and, like, the answers it's giving are pretty similar to the current GPT-4, which is why I don't think they updated the name to GPT-5.

  • But surprisingly, the biggest updates of 4o actually come in the voice feature.

  • Hey, ChatGPT, how are you doing?

  • I'm doing fantastic. Thanks for asking. How about you?

  • Pretty good.

  • What's up?

  • So my friend Barrett here, he's been having trouble sleeping lately.

  • And I want you to tell him a bedtime story about robots and love.

  • Oh, a bedtime story about robots and love. I got you covered.

  • So now we have response times as quick as 232 milliseconds, with an average of 320 milliseconds, which is about the average human response time in a conversation.

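(Aside: those latency figures are OpenAI's own numbers for audio responses. If you wanted to roughly measure time-to-first-token yourself, a sketch using the SDK's streaming mode could look like this; note it times a text round trip, not audio, so expect different numbers.)

```python
# Rough sketch: timing how long GPT-4o takes to start responding,
# using streaming so we can stop at the first chunk received.
import time

from openai import OpenAI

client = OpenAI()

start = time.perf_counter()
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Say hello."}],
    stream=True,
)

for chunk in stream:
    # The first streamed chunk approximates time-to-first-token.
    print(f"Time to first chunk: {time.perf_counter() - start:.3f}s")
    break
```
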
  • You can also now just interrupt the conversation simply by speaking, which I think is pretty intuitive.

  • They even put this disclaimer on the website that all of their videos are played at 1x speed, because previously there was such a delay that now it just seems like such a drastic improvement.

  • So yeah, clearly some very impressive stuff here, that they're able to pull off a response time of just milliseconds.

  • And you know what I was thinking: the Humane AI Pin really would have benefited from GPT-4o with its faster response times, because it was largely flamed online for how slow it was to respond.

  • And it was running on GPT-4, which was much slower.

  • Who designed the Washington Monument?

  • But yeah, that is the first thing that I noticed is the speed.

  • But the second thing you might've picked up on already is the emotion behind the voice.

  • How are you?

  • I'm doing well. Thanks for asking. How about you?

  • Hey, ChatGPT. How are you doing?

  • I'm doing fantastic. Thanks for asking. How about you?

  • Me? The announcement is about me? Well, color me intrigued.

  • Are you about to reveal something about AI?

  • So it seems like OpenAI has really just dialed up the expressiveness and just the overall energy of this assistant, which I'm not sure how I feel about.

  • It just feels like you're talking to a friend who is just overly caffeinated and overly energized all of the time,

  • which, for an assistant, I think should honestly just be a little bit more straightforward and straight up.

  • Hopefully in the future, we can have the option to customize the voice.

  • I think that would be a smart move.

  • But also, you can ask it to change its tone.

  • So in the demo, they asked it to be a little bit more dramatic when reading a bedtime story.

  • And they also asked it to read it in a robotic voice.

  • I really want maximal emotion, like maximal expressiveness, much more than you were doing before.

  • Understood. Let's amplify the drama.

  • Once upon a time in a world not too different from ours.

  • Initiating dramatic robotic voice.

  • And then also, apparently the robot can sing, which I'll let you be the judge of.

  • (ChatGPT-4o singing)

  • There's also a new feature that is sort of a subset of Vision, which is being able to take your camera, just point it at something, and ask it questions about that in real time.

  • Sort of like this beta test of giving the AI eyes.

  • What do you see?

  • Oh, I see "I love ChatGPT." That's so sweet of you.

  • Now, as if all of that wasn't enough, they also announced a brand new desktop app where you can do all of those same things like text input, speech input, as well as upload images.

  • But on top of that, you can also screen share.

  • So you can have it sort of just look at your screen and whatever you're looking at, you can ask it questions.

  • Now, I think this is going to be a huge productivity feature for anybody who works on their computer a lot.

  • In the demo, they sort of showed how it could analyze a graph that you're looking at.

  • But also I think it would be really helpful for research purposes.

  • And just, I don't know, there's just so many use cases where I'm on the computer and it would be nice to almost have a conversational, like, assistant or someone to bounce ideas off of.

  • I think that would be really helpful.

  • All right, so it can see our screen.

  • Can you find which one is the hypotenuse?

  • Oh, okay. I see.

  • So I think the hypotenuse is this really long side from A to B.

  • Would that be correct?

  • Exactly. Well done.

  • Now, just to quickly touch on what the "o" in 4o is actually pointing to.

  • It's not pointing so much to the fact that it's omniscient or omnipotent, but rather the fact that it takes your multimodal inputs, which are text, speech, and now vision, all into the same neural network.

  • Whereas before, it was processing those separately.

  • So before with the voice feature on 3.5 and 4, it would actually take your voice and transcribe it into text.

  • And so that's how it was recognizing your input, which basically strips out a lot of information before it ever reaches the LLM.

  • So all of your emotion and the tone that would be captured in an audio format is now just boiled down into text.

  • So you can think of it like texting a friend versus calling a friend.

  • So now with the new Omni model, it is sort of taking all of those things into consideration with its response.

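(Aside: to make that contrast concrete, here is a conceptual sketch of the two designs. Every function below is an invented stub for illustration, not a real API.)

```python
# Conceptual sketch of the two architectures described above.
# Every function here is an invented stub, not a real API.

def speech_to_text(audio):
    # The old 3.5/4 voice pipeline started here: tone, emotion,
    # and pacing are all discarded in the transcription step.
    return "plain transcribed words"

def llm(text):
    # The language model only ever sees a text transcript.
    return f"reply to: {text}"

def text_to_speech(text):
    # The reply is re-synthesized into audio at the end.
    return b"synthesized audio"

def old_voice_pipeline(audio):
    # Three separate models chained together (texting a friend).
    return text_to_speech(llm(speech_to_text(audio)))

def omni_model(audio=None, text=None, image=None):
    # 4o-style "omni" design: one network takes every modality
    # directly, so cues like tone of voice can shape the response
    # (calling a friend).
    return b"response conditioned on raw audio, text, and image together"
```
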
  • But yeah, that is the latest update with OpenAI.

  • Clearly some very impressive stuff cooking under the hood.

  • I'm curious to see what Google is going to come out with tomorrow.

  • So definitely get subscribed for that.

  • And that video is already out.

  • It's probably on the screen somewhere.

  • Hope you enjoyed the video.

  • I'll catch you guys in the next one.

  • Peace.
