MiniGo: TensorFlow Meets Andrew Jackson (TensorFlow Meets)

  • Hi, everyone.

  • My name's Josh, and I'm here today with Andrew Jackson, who is the author of MiniGo, which is an open-source, unofficial implementation of AlphaGo Zero.

  • So, Andrew, thank you very much for coming on the show.

  • Josh, thanks for having me. It's great to be here.

  • It's a lot of fun for me, too.

  • So to kick things off, could you tell me a little bit about MiniGo?

  • So MiniGo is an open-source implementation of AlphaGo Zero that is a ground-up rewrite.

  • I'm not affiliated with the DeepMind team.

  • I'm a Go enthusiast who also works for Google.

  • I had the opportunity to write an implementation based on their published papers, and I haven't looked back.

  • This is sort of a labor of love for me.

  • So Andrew, you have an amazing demo of MiniGo.

  • And I was hoping we could jump right into that.

  • Sure.

  • Explain to me what's on the screen.

  • Definitely, definitely.

  • So what we're looking at right here is MiniGo playing itself. Right now, it is playing both sides of the same game, and it is trying to figure out what the best sequence of moves is for both players.

  • What you're seeing here, if you haven't looked at the game of Go before: these numbered stones that we see here are indicating the order of moves that it thinks would be played.

  • So you see white 1, then black 2, then white 3, then black 4.

  • And then, as white plays moves, they switch: so there's black 1 and white 2, and it switches again.

  • But if you were actually playing a game of Go, it would look like this.

  • Now, the board is a nine-by-nine?

  • That's right.

  • This is a smaller version of the game of Go, usually used for beginners or people just learning how to play.

  • In this case, it's also a little bit easier to see everything that's going on, and it will definitely play many games as we sit here and chat.

  • Also kind of a nice benefit.

  • I can hear your laptop fan whirring away.

  • Oh yeah, it's cranking out a lot of extra heat, for sure, but this is what you would see if you were playing.

  • So I can toggle this button over here to make it such that I can play one side or the other.

  • What we see up here in the upper right-hand corner is the graph of who it thinks is going to win.

  • So it thinks this is a very close game; it's swung slightly back and forth.

  • But it thinks that black maybe has a slight edge in this game.

  • As I'm playing white, I could play a move that's, that's a polite way to say this, not very good.

  • Yes, I could play a really bad move, and as I do that, we will see a blunder.

  • Is that really the word, "blunder"?

  • Well, you know, let's just say "less good."

  • It doesn't actually completely tank yet, so let's play another move.

  • That's probably not the optimal move in this situation, and we'll see, yeah...

  • Wow, wow, it doesn't much care for my move.

  • Yeah, black jumps to a 95% confidence that it's going to win the entire game.

  • That's its confidence that black is going to win? Right.

  • So based on that one mistake, you're basically, basically out of the game.

  • The value network is rather opinionated at this stage in its training; it has a fairly good idea who's going to win or lose.

  • You may have noticed also that this large mass of white stones has disappeared.

  • There were some other opportunities, maybe, for white to have created some complications here.

  • I haven't really been looking at the game that closely, really.

  • But my guess is that actually we can, we can scroll back through what it thought was going to happen.

  • We can see what it thought I would play here instead: it thought that I would take those two stones on the bottom, and black would recapture with 3.

  • And now, this is a real game of Go.

  • If you're interested, definitely, you know, learn more about the game of Go.

  • You can check out the rules or tutorials online.

  • I can imagine sitting down with a couple of chess experts and building that value function that tells me how good a board is.

  • And that's exactly what they did with Deep Blue.

  • That's exactly how IBM built Deep Blue: they worked with a number of grandmasters to say, how do you evaluate this, and why? They extracted generalized rules, and they used those generalized rules.

  • They just codified them, essentially hard-coded those rules, and used that to guide this exhaustive alpha-beta search, or min-max search, where there are a couple of nuances in how that search is performed.

  • But by and large it always assumes that people take the best move available to them.

  • And it tends not to look back, tends not to re-evaluate other moves if it's already found what it thinks is the best one.
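
To make that concrete, here is a minimal sketch (not Deep Blue's actual code) of a depth-limited min-max search with alpha-beta pruning. The `evaluate`, `get_moves`, and `apply_move` callables are hypothetical stand-ins for the hand-coded evaluation rules and the game interface.

```python
def alpha_beta(state, depth, alpha, beta, maximizing, evaluate, get_moves, apply_move):
    """Depth-limited min-max with alpha-beta pruning (illustrative sketch)."""
    moves = get_moves(state)
    if depth == 0 or not moves:
        return evaluate(state)  # hard-coded, expert-derived scoring of the position
    if maximizing:
        best = float("-inf")
        for move in moves:
            best = max(best, alpha_beta(apply_move(state, move), depth - 1,
                                        alpha, beta, False,
                                        evaluate, get_moves, apply_move))
            alpha = max(alpha, best)
            if alpha >= beta:  # prune: the opponent would never allow this line,
                break          # so the branch is never reconsidered
        return best
    best = float("inf")
    for move in moves:
        best = min(best, alpha_beta(apply_move(state, move), depth - 1,
                                    alpha, beta, True,
                                    evaluate, get_moves, apply_move))
        beta = min(beta, best)
        if alpha >= beta:
            break
    return best
```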

  • This is very different from how AlphaGo works, which is using something called Monte Carlo tree search.

  • There's a little bit of confusion, because in the early 2000s there was a breakthrough in Go AI, where they did Monte Carlo rollouts: essentially, when you reached the depth of your search, you would very rapidly play the game out to the end 100,000 times.

  • And those Monte Carlo rollouts would be used to describe the value of the positions very rapidly.

  • If I play the game out as if two complete novices were playing, just randomly picking legal moves, and one side wins slightly more often than the other, then I could guess that this is a good approximation of the value.
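
As a rough illustration of that idea (not the code of any particular early-2000s engine), a rollout-based value estimate could look like the sketch below; `legal_moves`, `play`, `is_over`, and `winner` are hypothetical helpers for a Go implementation.

```python
import random

def rollout_value(state, num_rollouts, legal_moves, play, is_over, winner):
    """Estimate how good a position is by playing it out randomly many times.

    Returns the fraction of rollouts won by the player to move; all helper
    functions are assumed stand-ins for a real Go game implementation.
    """
    player = state.to_play
    wins = 0
    for _ in range(num_rollouts):
        s = state
        while not is_over(s):
            # "Two complete novices": just pick any legal move at random.
            s = play(s, random.choice(legal_moves(s)))
        if winner(s) == player:
            wins += 1
    return wins / num_rollouts  # e.g. 0.55 hints at a slight edge for the player to move
```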

  • Go is a game of perfect information, just like chess.

  • So there is, in theory, a perfect set of moves that could be followed such that the person who plays first could always win, just like tic-tac-toe or Connect Four.

  • It is a game of perfect information.

  • In practice, though, there are so many possible permutations that it is not feasible to explore all of them.

  • There are probably more possible Go games, right, than atoms in the universe.

  • And that is, that is, that's a thing that has been said before. We could roll back.

  • This is, this is how the game starts: it thinks that this 12-move sequence is what's going to happen.

  • But if we go 12 moves in, we can see that that isn't actually the sequence that occurs. While it's trying to look as far into the future as it can, when the future actually arrives, it may be a little bit different than we intended, which is getting a little metaphysical for what is otherwise a fairly concrete task here.

  • But hey, that's good.

  • So I see multiple windows on screen.

  • That's right.

  • So, what are the three small boards?

  • So, what we're looking at... I'm gonna go ahead and, right now, black is going to win. White is going to resign, and it's going to start another one here, as it realized that black is basically a sure thing to win this game.

  • What we're looking at: on the main board, we saw the principal variation, which we were talking about, white 1, black 2, and so on.

  • What we see in the upper right, the uppermost board, is each of the leaves of the search prior to being evaluated by the neural network.

  • Underneath, the middle board is a heat map of the visits, so the darker squares are being explored more often.

  • So this is a quick way to see which moves it's considering.

  • And then the bottommost board is one of the, one of the stranger views of the game: that is the change in the value of the board based on where they play.

  • So, for instance, when it's black's turn, if black doesn't play at this point in the middle, watch what changes: the whole board will suddenly look better for white. Whereas when it's white's turn, if it doesn't play a critical point, the rest of the board suddenly becomes very good for black.

  • It's also cool that it has this heat map showing which moves are likely to be good.

  • So does it always explore the best moves, or does it explore the space in a different way?

  • You know, the idea behind Monte Carlo tree search is to balance exploration and exploitation in a way that doesn't lock it into a single path the way that an alpha-beta search might, or the way that chess engines have historically searched board positions.
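
In AlphaGo Zero-style search, that balance is usually struck with a PUCT-style selection rule. Here is a minimal sketch, assuming each child node tracks a visit count `N`, an accumulated value `W`, and a prior `P` from the policy head, and that `c_puct` is a tunable constant.

```python
import math

def select_child(children, c_puct=1.5):
    """Pick the child maximizing Q + U: exploitation plus an exploration bonus.

    `children` is assumed to be a list of nodes with fields N (visit count),
    W (summed value estimates), and P (prior probability from the policy head).
    """
    total_visits = sum(child.N for child in children)

    def puct_score(child):
        q = child.W / child.N if child.N else 0.0                        # average value so far
        u = c_puct * child.P * math.sqrt(total_visits) / (1 + child.N)   # shrinks as the move gets visited more
        return q + u

    return max(children, key=puct_score)
```

Moves with a high prior or a high average value get visited more, but the exploration bonus keeps less-visited moves from being ignored entirely.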

  • So I know MiniGo is powered by a CNN, or convolutional neural network, and I know these are similar to the CNNs we would use to classify images, cats and dogs, right?

  • So it's a residual network, similar to ResNet or whatever your preferred example is, which has a number of shared layers: convolutional layers with your standard ReLU activations, however you want to pronounce those, followed by another convolutional layer, and then the skip connection.

  • There's a whole stack of these shared layers, and at the top of that stack it fans out into a policy head, which gives it a heat map of the most likely moves to be played, and a value head, which just spits out a single number indicating who it thinks is going to win.
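
As a rough sketch of that shape (layer counts and sizes here are illustrative assumptions, not MiniGo's actual configuration), the shared residual tower with its two heads might look like this in Keras:

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    """Conv -> ReLU -> Conv, then a skip connection back to the block input."""
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    return layers.ReLU()(layers.Add()([x, y]))  # the skip connection

def build_dual_head_net(board_size=9, input_planes=17, blocks=6, filters=64):
    """Shared conv/residual stack that fans out into a policy head and a value head."""
    board = layers.Input(shape=(board_size, board_size, input_planes))
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(board)
    for _ in range(blocks):
        x = residual_block(x, filters)

    # Policy head: a heat map over every board point plus "pass".
    p = layers.Conv2D(2, 1, activation="relu")(x)
    p = layers.Flatten()(p)
    policy = layers.Dense(board_size * board_size + 1, activation="softmax", name="policy")(p)

    # Value head: a single number in [-1, 1] saying who it thinks will win.
    v = layers.Conv2D(1, 1, activation="relu")(x)
    v = layers.Flatten()(v)
    v = layers.Dense(64, activation="relu")(v)
    value = layers.Dense(1, activation="tanh", name="value")(v)

    return tf.keras.Model(inputs=board, outputs=[policy, value])
```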

  • That's a little bit different than what we see in the upper right hand corner.

  • This is actually the average of that value number over all of its searches.

  • So this is not just the output of the value head.

  • That's the average of all of its searching.

  • So that's a little bit different, but it's ultimately made up of those outputs of that value head.
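
In other words, the number on the graph behaves like a running average of the value-head outputs backed up during search; a tiny sketch of that bookkeeping (field names are assumptions, not MiniGo's):

```python
class SearchNode:
    """Minimal illustration of averaging value-head outputs over visits."""

    def __init__(self):
        self.N = 0    # how many searches have passed through this node
        self.W = 0.0  # sum of the value-head outputs backed up so far

    def backup(self, value):
        """Record one value-head evaluation from a search leaf."""
        self.N += 1
        self.W += value

    @property
    def Q(self):
        """Average value over all searches: what the win-rate graph reflects."""
        return self.W / self.N if self.N else 0.0
```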

  • Similarly, the policy, in the end, diverges from the original estimate of the policy heat map based on the results of its search.

  • So the policy heat map says we'll start looking here.

  • But the value network may say, OK, that thing that we thought was a really good move actually doesn't work at all.

  • Let's look at that much less.

  • So this actually is a really nice segue into how MiniGo learns, which is that it's entirely reinforcement driven, reinforcement-learning driven, right?

  • Yeah, where this heat map, in the end, forms the target for the next version of the policy head.

  • So the loss function that we're trying to minimize here is the difference between the policy output and this visit heat map. That is, the loss function is the sum of the cross-entropy error between the policy heat map and the actual visit counts after search, and the mean squared error between the value estimate and the actual result of the game.

  • So each of the input examples to training is the original policy output, the policy output revised by search, the value output, and the actual outcome of the game.

  • So the difference between the moves that we started looking at and the moves we ended up looking at the most shapes the policy network towards the moves that search thought were best in this situation, and the value loss shapes the value network towards predicting better in the future, basically.
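
Put as code, a hedged sketch of that combined objective (cross-entropy between the network's move distribution and the search's visit distribution, plus squared error between the value estimate and the game result) might look like this; the argument names and the assumption that visit counts are already normalized are illustrative, not MiniGo's exact training code.

```python
import tensorflow as tf

def combined_loss(policy_logits, search_visit_probs, value_pred, game_outcome):
    """Sketch of the two loss terms described above.

    policy_logits:      raw (pre-softmax) policy-head outputs, shape [batch, moves]
    search_visit_probs: visit counts after search, normalized to sum to 1
    value_pred:         value-head output in [-1, 1], shape [batch, 1]
    game_outcome:       actual result, +1 or -1 from the current player's view
    """
    policy_loss = tf.nn.softmax_cross_entropy_with_logits(
        labels=search_visit_probs, logits=policy_logits)
    value_loss = tf.square(game_outcome - tf.squeeze(value_pred, axis=-1))
    return tf.reduce_mean(policy_loss + value_loss)
```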

  • So, Andrew, it's been great chatting with you.

  • Thank you very much for coming on the show.

  • Thanks, everyone, for watching. You can check out MiniGo at github.com/tensorflow/minigo, and you can learn more about how it works, and about Go itself, using the resources that Andrew mentioned.

  • So thanks very much.

  • And I will see you guys next time. Send pull requests!
