Yesterday, China released a state-of-the-art, free and open-source, chain-of-thought reasoning model with performance that rivals OpenAI's o1, which I'm stupidly paying $200 a month for right now.
You see, there are two types of people in the tech world right now.
In one camp, we have the pessimists, who think that AI is overhyped and plateaued with GPT-3.5.
In the other camp, we have the optimists, who think we're about to see the emergence of an artificial superintelligence that will propel humanity into Ray Kurzweil's technological singularity.
Nobody truly knows where things are going, but one thing to remember is that pessimists sound smart, while optimists make money.
But sometimes, it's hard to be an AI optimist, because you need to trust hype Jedis like Sam Altman and closed AI companies like OpenAI.
Well, luckily, on the same day that TikTok's ban was lifted, China gave the world a gift in return in the form of DeepSeek R1.
And in today's video, you'll learn exactly how to use it like a senior prompt engineer.
It is January 21st, 2025, and you're watching The Code Report.
Yesterday, the course of history changed forever.
And no, I'm not talking about the return of the king, but rather the release of DeepSeek, an MIT-licensed chain-of-thought model that you can use freely and commercially to make money in your own applications.
This model came out while Sam Altman was busy at Trump's inauguration, which is a perfect time to use this new meme template, where Zuckerberg appears to detect a rack overflow in this artificial binary code owned by Jeff Bezos.
He's going to have some explaining to do with his wife, but Sam Altman also rained on the AI optimist parade recently when he said that the AI hype was out of control: no, they have not achieved AGI internally.
And that's pretty obvious with how buggy ChatGPT is.
Like recently, a security researcher figured out how to get ChatGPT to DDoS websites for you.
All you have to do is provide it with a list of similar URLs that point to the same website, and it will crawl them all in parallel, which is something that no truly intelligent being would do.
That being said, the release of o1 a few months ago was another step forward in the AI race.
But it didn't take long for open source to catch up, and that's what we have with DeepSeek R1.
As you can see from its benchmarks, DeepSeek R1 is on par with OpenAI o1, and even exceeds it in some benchmarks like math and software engineering.
But let me remind you once again that you should never trust benchmarks.
Just recently, Epoch AI, the company behind a popular math benchmark, disclosed that they've been funded by OpenAI, which feels a bit like a conflict of interest.
I don't care about benchmarks anyway and just go off of vibes, so let's go ahead and try out DeepSeek R1 right now.
DeepSeek has a web-based UI, but you can also use it in places like Hugging Face or download it locally with tools like Ollama.
And that's what I did for its 7 billion parameter model, which weighs about 4.7 gigabytes.
However, if you want to use it in its full glory, it'll take over 400 gigabytes of storage and some pretty heavy-duty hardware to run the full 671 billion parameters.
But if you want something that's on par with o1-mini, you want to go with 32 billion parameters.
Now, one thing that makes DeepSeek different is that it doesn't use any supervised fine-tuning.
Instead, it uses direct reinforcement learning.
But what does that even mean?
Well, normally, with supervised fine-tuning, you show the model a bunch of examples and explain how to solve them step by step, then evaluate the answers with another model or a human.
But R1 doesn't do that. It pulls itself up by its own bootstraps using direct, or pure, reinforcement learning: you give the model a bunch of examples without showing it the solutions first, it tries a bunch of things on its own, and it learns, or reinforces itself, by eventually finding the right solution, just like a real human with reasoning capabilities.
DeepSeek also released a paper that describes the reinforcement learning algorithm.
It looks complicated, but basically, for each problem, the AI tries multiple times to generate answers, which are called outputs.
The answers are then grouped together and given a reward score, so the AI learns to adjust its approach for answers with a higher score.
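The paper calls this algorithm Group Relative Policy Optimization, or GRPO. Here's a minimal sketch of the group-scoring step in Python, with made-up reward values; the real pipeline layers a policy-gradient update on top of this, so treat it as an illustration rather than the actual training code.

```python
from statistics import mean, stdev

# Made-up reward scores for one group of sampled answers to a single
# problem. The model generates several outputs per prompt, and each one
# is scored (e.g., 1.0 if the final answer is correct, else 0.0).
rewards = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0]

# Each output's advantage is its reward relative to the rest of its group:
# above-average answers get a positive advantage (reinforced), and
# below-average answers get a negative one (discouraged).
mu, sigma = mean(rewards), stdev(rewards)
advantages = [(r - mu) / sigma for r in rewards]

for i, (r, a) in enumerate(zip(rewards, advantages)):
    print(f"output {i}: reward={r:.1f}  advantage={a:+.2f}")
```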
That's pretty cool, and we can see the model's actual chain of thought if we go ahead and prompt it here with Ollama.
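As a rough sketch, here's what that looks like from Python against Ollama's local REST API, assuming the server is running on its default port and you've pulled the deepseek-r1:7b tag; the prompt itself is just an example.

```python
import json
import urllib.request

# Send one concise prompt to a locally running Ollama server.
payload = json.dumps({
    "model": "deepseek-r1:7b",   # the 7B model mentioned earlier (~4.7 GB)
    "prompt": "What is the sum of the first 100 positive integers?",
    "stream": False,             # return the full response as one JSON object
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",  # Ollama's default endpoint
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])
```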
When prompting a chain-of-thought model like R1 or o1, you want to keep the prompt as concise and direct as possible, because unlike other models like GPT-4, the idea is that it does the thinking on its own.
Like, if I ask it to solve a math problem, you'll notice that it first shows me all the thinking steps, and then after that thinking process is done, it'll show the actual solution.
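That boundary is explicit in the raw output: R1 wraps its reasoning in <think> tags before the final answer. Here's a small sketch for splitting the two, assuming the output follows that format; the string below is a made-up stand-in for a real response.

```python
import re

# A made-up stand-in for a raw R1 response: reasoning first, answer after.
raw = "<think>The sum 1 + 2 + ... + 100 is 100 * 101 / 2.</think>The answer is 5050."

match = re.search(r"<think>(.*?)</think>(.*)", raw, re.DOTALL)
if match:
    reasoning, answer = match.group(1).strip(), match.group(2).strip()
    print("Chain of thought:", reasoning)
    print("Final answer:", answer)
else:
    print(raw)  # no think block found; print the response as-is
```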
That's pretty cool, but you might be wondering when to use a chain-of-thought model instead of a regular large language model.
Well, basically, chain-of-thought models are much better at complex problem solving: things like advanced math problems, puzzles, or coding problems that require detailed planning.
But if you want to build the future with AI, you need to learn it from the ground up, and you can do that today for free thanks to this video's sponsor, Brilliant.
Their platform provides interactive hands-on lessons that demystify the complexity of deep learning.
With just a few minutes of effort each day, you can understand the math and computer science behind this seemingly magic technology.
I'd recommend starting with Python, then check out their full How Large Language Models Work course if you really want to look under the hood of ChatGPT.
Try everything Brilliant has to offer for free for 30 days by going to brilliant.org/fireship or using the QR code on screen.
This has been The Code Report, thanks for watching, and I will see you in the next one.