  • Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér.

  • Hold on to your papers, because this work on AlphaGo is absolute insanity.

  • In the game of Go, the players put stones on a table where the objective is to surround

  • more territory than the opponent.

  • This is a beautiful game that is particularly interesting for AI research, because the space

  • of possible moves is vastly larger than in chess, which means that using any sort of

  • exhaustive search is out of the question and we have to resort to smart algorithms that are

  • able to identify a small number of strong moves within this stupendously large search

  • space.

  • The first incarnation of DeepMind's Go AI, AlphaGo uses a combination of a policy network

  • that is responsible for predicting the moves, and a value network that predicts the winner

  • of the game after it plays it to the end against itself.

  • These are both deep neural networks and they are then combined with a technique called

  • Monte Carlo Tree Search to be able to narrow down the search in this large search

  • space.
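
As a rough illustration of how a policy network and a value network can steer a tree search, here is a minimal sketch of one selection step. It assumes a PUCT-style scoring rule, which is not spelled out in the video. The names `Edge`, `puct_score`, `select_move` and `c_puct`, as well as all numbers, are hypothetical and chosen only for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Search statistics for one candidate move (illustrative structure)."""
    prior: float             # probability the policy network assigns to the move
    visits: int = 0          # how many times the search has tried the move
    value_sum: float = 0.0   # sum of value estimates collected below the move

    def mean_value(self) -> float:
        # Average value estimate; 0.0 for a move that was never tried.
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(edge: Edge, parent_visits: int, c_puct: float = 1.0) -> float:
    # Assumed PUCT-style rule: average value plus an exploration bonus that
    # is large for moves with a high prior and few visits.
    bonus = c_puct * edge.prior * math.sqrt(parent_visits) / (1 + edge.visits)
    return edge.mean_value() + bonus


def select_move(edges: dict, c_puct: float = 1.0) -> str:
    # Count the current visit as well, so the bonus is non-zero at the start.
    parent_visits = 1 + sum(edge.visits for edge in edges.values())
    return max(edges, key=lambda move: puct_score(edges[move], parent_visits, c_puct))


# Toy position with three candidate moves and made-up priors.
edges = {"D4": Edge(prior=0.6), "Q16": Edge(prior=0.3), "K10": Edge(prior=0.1)}
first_choice = select_move(edges)

# Pretend the search tried D4 ten times and the value estimates were poor.
edges["D4"].visits = 10
edges["D4"].value_sum = -5.0
second_choice = select_move(edges)
print(first_choice, second_choice)
```

The prior focuses the search on a handful of promising moves, while the value estimates let it back away from a move that looked good at first but evaluates badly.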

  • This algorithm started out with a bootstrapping process where it was shown thousands of games

  • that were used to learn the basics of Go.

  • Based on this, it is clear that such an algorithm can learn to be as good as formidable human

  • players.

  • But the big question was, how could it possibly become even better than the professionals

  • that it has observed?

  • How could the disciple become better than its master?

  • The solution is that after it has learned what it can from these games, it plays against

  • itself many, many times to improve its skills.

  • This second phase is the main part of the training that takes the most time.

  • Let's call this base algorithm AlphaGo Fan, which was used to play against Fan Hui, a 2-dan

  • European Go champion, who was defeated 5 to 0.

  • This was a historic moment and the first time an AI beat a professional Go player without

  • a handicap.

  • Fan Hui described his experience as playing against a very strong and stable player and

  • he also mentioned that the algorithm felt very human-like.

  • Some voiced their doubts within the Go community and noted that the algorithm would never be

  • able to beat Lee Sedol, a 9-dan world champion, and winner of 18 international titles.

  • Just to give you an intuition of the difference, based on their Elo points, Lee Sedol is expected

  • to beat Fan Hui 97 times out of 100 games.
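
To get a feel for what "97 times out of 100" means in rating terms, the standard Elo expected-score formula can be inverted. This is a minimal sketch: the function names are made up for the example, no actual ratings of either player are used, and the Elo expected score is treated as a win probability, which assumes draws are negligible.

```python
import math


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def rating_gap_for(expected: float) -> float:
    """Rating difference (A minus B) that yields the given expected score for A."""
    return 400.0 * math.log10(expected / (1.0 - expected))


gap = rating_gap_for(0.97)
print(f"A 97% expected score corresponds to a gap of about {gap:.0f} Elo points")
```

Under these assumptions the gap comes out at roughly 600 Elo points, and the same function shows how quickly the expected score saturates as the gap grows.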

  • So a few months later, DeepMind organized a huge media event where they would challenge

  • him to play against AlphaGo.

  • This was a slightly modified version of the base algorithm that used a deeper neural network

  • with more layers and was trained using more resources than the previous version.

  • There was also an algorithmic change to the policy networks. The details on this are available

  • in the paper in the description. It is a great read, so make sure to have a look.

  • Let's call this algorithm AlphaGo Lee.

  • This event was watched all around the world and can perhaps be compared to Kasparov's

  • public chess games against Deep Blue.

  • I have the fondest memories of waking up super early in the morning, jumping out of the bed

  • in excitement to watch all these Go matches.

  • And in a long and nailbiting series, Lee Sedol was defeated 4 to 1 by the AI.

  • With significantly less media attention, the next phase came bearing the name AlphaGo Master,

  • which used around ten times fewer tensor processing units than AlphaGo Lee and became an even

  • stronger player.

  • This algorithm played against human professionals online in January 2017 and won all 60 matches

  • it had played.

  • This is insanity, but if you think that's it, well, hold on to your papers now.

  • In this newest work, AlphaGo has reached its next form, AlphaGo Zero.

  • This variant does not have access to any human played games in the first phase and learns

  • completely through self-play.

  • It starts out from absolutely nothing, with just the knowledge of the rules of the game.

  • It was trained for 40 days, and by day 3, it reached the level of AlphaGo Lee, which

  • is above world champion level.

  • Around day 21, it hits the level of AlphaGo Master, which is practically unbeatable to

  • all human beings.

  • And get this, at 40 days, this version surpasses all previous AlphaGo versions and defeats

  • the previously published worldbeater version 100-0.

  • This has kept me up for several nights now and I am completely out of words.

  • In this version, the two neural networks are fused into one, which can be trained more

  • efficiently.
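
One way to picture "two networks fused into one" is a shared trunk with two output heads, one for move probabilities and one for the expected outcome. The sketch below is a deliberately tiny, pure-Python stand-in with random weights. The class name `TwoHeadedNet`, the layer sizes and the 3x3 board are assumptions made for illustration and do not reflect the real architecture.

```python
import math
import random


def dense(inputs, weights, biases):
    """One fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Turn raw scores into probabilities that sum to one."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(rng, n_out, n_in):
    """Random weights and zero biases, mimicking a random initialization."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class TwoHeadedNet:
    """Hypothetical toy network: one shared trunk feeding two heads."""

    def __init__(self, n_inputs, n_hidden, n_moves, seed=0):
        rng = random.Random(seed)
        self.trunk = random_layer(rng, n_hidden, n_inputs)
        self.policy_head = random_layer(rng, n_moves, n_hidden)
        self.value_head = random_layer(rng, 1, n_hidden)

    def forward(self, board_features):
        # The shared representation is computed once and reused by both heads.
        hidden = [math.tanh(h) for h in dense(board_features, *self.trunk)]
        move_probs = softmax(dense(hidden, *self.policy_head))
        value = math.tanh(dense(hidden, *self.value_head)[0])
        return move_probs, value


# Toy 3x3 board encoded as nine numbers: 1 = own stone, -1 = opponent, 0 = empty.
net = TwoHeadedNet(n_inputs=9, n_hidden=4, n_moves=9)
move_probs, value = net.forward([1, 0, 0, 0, -1, 0, 0, 0, 0])
print(len(move_probs), round(sum(move_probs), 6), -1.0 <= value <= 1.0)
```

Because both heads read the same hidden features, a single forward pass yields both the move suggestions and the position evaluation.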

  • It is beautiful to see these curves as they show this neural network starting from a random

  • initialization.

  • It knows the rules, but beyond that, it is completely clueless about the game itself,

  • and it rapidly becomes practically unbeatable.

  • And I left the best part for last - it uses only one single machine.

  • I think it is fair to say that this is history unfolding before our eyes.

  • What a time to be alive!

  • Congratulations to the DeepMind team for this remarkable achievement.

  • And, for me, I love talking about research to a wider audience and it is a true privilege

  • to be able to tell these stories to you.

  • Thank you very much for your generous support on Patreon and making me able to spend more

  • and more time with what I love most.

  • Absolutely amazing.

  • And now, I know it's a bit redundant, but from muscle memory, I'll sign out the usual

  • way.

  • Thanks for watching and for your generous support, and I'll see you next time!

New DeepMind AI Beats AlphaGo 100-0 | Two Minute Papers #201

  • Posted by jigme.lee888 on January 14, 2021