Placeholder Image

字幕列表 影片播放

  • [WHISTLE] Hello, and welcome to a coding challenge, Tic, Tac,

  • Toe.

  • Oh, with the Minimax algorithm.

  • That's why I'm here, because I made this other coding

  • challenge, Tic-Tac-Toe, where I created a kind of a big--

  • it was kind of a mess if I'm being perfectly honest.

  • But I made a working version of the Tic-Tac-Toe game

  • that just played with two players picking random spots.

  • So since then, you can check one of my live streams

  • if you want to find where I did this.

  • I made some adjustments to it so that I could, as a human being,

  • play the game.

  • So right now I'm going to play the random computer picker.

  • I'm going to go here.

  • And then I'm going to block X. And then I'm going to go here.

  • And ha, ha, I win.

  • O wins.

  • So what I have the adjustment that I

  • made is that I added a mouse pressed function

  • where I find where did I click?

  • And I put my human variable, which

  • is the letter o onto the board.

  • And then I called next turn where next turn

  • picks a random spot in the board and makes

  • that the AI spot or X's spot.

  • So the whole point of this video is

  • for me to implement something called

  • Minimax, which is an algorithm, a search algorithm, if you

  • will, to find the optimal next move for the AI.

  • I mean, it could find it for myself.

  • And then I could implement it.

  • But the idea is I want this player, the computer player

  • to beat me at the game or at least

  • tie to play a perfect game.

  • If you want to learn more about the Minimax algorithm,

  • I would suggest two resources that you

  • can look at that I actually looked at before beginning

  • this coding challenge.

  • I haven't programmed this before.

  • But I watched this video by Sebastian Lague that

  • explains the Minimax algorithm, also something called

  • alpha-beta pruning, which I'm not going to implement

  • but could be a good exercise, next step for you.

  • And I also found this article on the geeksforgeeks.org website,

  • which is three-part series about the Minimax algorithm

  • and how to apply it to Tic-Tac-Toe.

  • So those resources are probably your best bet

  • for learning about the theory of the Minimax algorithm

  • and how it can be applied to a wide variety of scenarios.

  • But I'm going to look at the specifics of Tic-Tac-Toe, which

  • is a great starting example, because it's

  • a simple enough game that I can actually diagram out

  • all of the possibilities, which even if that's computationally

  • possible, it's very hard to diagram.

  • And there's lots of scenarios where you couldn't even compute

  • all of the possible outcomes.

  • And chess is an example of that.

  • So Minimax is typically visualized as a tree.

  • So the root of the tree is the current state

  • of the game board.

  • [MUSIC PLAYING]

  • So I've drawn a configuration here of the Tic-Tac-Toe board

  • midway through the game because I

  • don't want to start at the beginning

  • where there's so many possibilities

  • that I couldn't diagram.

  • So right now it's X's turn.

  • And X has three possible moves.

  • So I could express that in a tree diagram as move one,

  • move two, and move three.

  • So let's take a look at what I mean by that.

  • [MUSIC PLAYING]

  • I'm going to use a blue marker so you can see the new moves.

  • So let's say one possible move is X going here.

  • Let me diagram out the next two possible moves.

  • [MUSIC PLAYING]

  • Another possible move is X going here.

  • [MUSIC PLAYING]

  • And another possible move is X going here.

  • So we need to look at each of these

  • and determine is the game over, or is it continuing?

  • So only in this case is the game over.

  • And in this case, the game is over, and X has one.

  • So I'm going to mark this green for the state of X

  • having won the game.

  • Now, it's O's turn.

  • And for each of these spots, O could

  • make two possible options.

  • [MUSIC PLAYING]

  • O could go here, or O could go here.

  • Oh could go here, or O could go here.

  • And look at these.

  • In this case and this case, O has won.

  • Remember, we're trying to solve for the optimal move

  • for X. X is making the first turn.

  • So this means I can mark these were O wins as red, as

  • like, those are bad outcomes.

  • So while these are not terminal states,

  • there's only one move possible for X. So I could draw an arrow

  • and put that down here.

  • But ultimately I can just consider this

  • as X has to go here.

  • So these, you could consider as terminal and in which case

  • X will win.

  • Now, you see that I visualized all the possible outcomes

  • of the Tic-Tac-Toe game based on this starting configuration.

  • So how does X pick which path to go down

  • where we can see here that the goal is X to win the game

  • to get to here, here, or there.

  • How does it do that?

  • And the way that it does that--

  • by the way, spoiler alert.

  • This is the move it should pick.

  • It should just win instantly.

  • But the idea of the Minimax algorithm

  • is if we could score every possible outcome,

  • those scores can ripple back up the tree

  • and help X decide which way to go to find the highest

  • possible score.

  • The nice thing about Tic-Tac-Toe is

  • there's not a range of scores.

  • The endpoints are either you won, you lost, or you tied.

  • I could say if X wins, it's a score of plus 1 .

  • If O wins, it's a score of negative 1.

  • And if it's a tie, it's a score of 0.

  • So in other words, we can give this a score of plus 1.

  • This a score of plus 1 .

  • This a score of negative 1.

  • This a score of negative 1, and this a score of plus 1.

  • But we've got an issue.

  • I'm trying to pick between this option, this option,

  • and this option.

  • These two don't have a score.

  • How do I pick their score?

  • Well, I can pick their score based on which one of these

  • would be picked.

  • But who's picking these?

  • This is why the algorithm is called Minimax,

  • because we have two players each trying

  • to make an optimal score.

  • X is what's known as the maximizing player.

  • So to go from here to here, it's X's turn.

  • And X is maximizing.

  • But here, it's now O's turn.

  • O wants to win.

  • X can assume that O is going to play their optimal move.

  • Now, you could have a situation where your player doesn't

  • play optimally.

  • And that's something you might have to consider

  • in terms of game theory.

  • But in this case of the algorithm,

  • we can assume that O is going to make the best possible move.

  • I mean, if it makes a worse move, that's only better for us

  • anyway so it's all going to work out.

  • So O is what's referred to as the minimizing player--

  • the minimizing player.

  • So O has the option of picking negative 1 or plus 1,

  • negative 1 or plus 1.

  • Which one is O going to pick as the minimizing player?

  • The option to win so we can then take the best outcome for O,

  • the minimizing outcome for O and pass that back up

  • here's the score and the same thing here.

  • Now, X is the maximizing player.

  • Which one of these is it going to pick--

  • negative 1, plus 1, or negative 1.

  • These are negative 1, because even

  • though it could win here, if X goes this way,

  • O is going to win.

  • If X goes this way, O is going to win if O plays optimally.

  • So this is the way to go.

  • This scenario that I picked might not

  • be the best demonstration of the idea.

  • In fact, there's no ties here.

  • There's only two turns being played.

  • So maybe what you might want to do is pause this video

  • and do the same exact diagram but have

  • X and O with two starting positions

  • and see if you can diagram that whole thing out yourself.

  • So how do we implement this in code?

  • So that your first clue should be the fact

  • that this is a tree diagram.

  • Anytime you see a tree diagram, typically

  • speaking, a recursive algorithm, a function

  • that calls itself is something that will apply.

  • And this case is no different.

  • I want to call Minimax on this board position,

  • try, and then loop through all the possible moves

  • and call Minimax on each one of these board positions, loop

  • through all the possible moves.

  • If I ever get to an end point, return the score,

  • and have that score backtrack up or ripple back up the chain.

  • So let's see if we can write the code to do this.

  • So this here is the place where I want to write the new code.

  • Currently, a move is chosen by picking

  • a random available spot, and I no longer want to do that.

  • I'm actually going to change the name of this function

  • to Best Move.

  • And I don't need to worry about this, what's available.

  • I just want to actually look at all the possible spots.

  • So basically, I just want to know is the spot available?

  • If the spot is available, I want to try going there.

  • So I'm going to say board--

  • IJ is the AI player.

  • And then I want to call Minimax on the board.

  • Why do I want to call Minimax?

  • Because I want to get what is the score?

  • I want Minimax with return to me the score

  • for this particular move.

  • Now, it's going to have to recursively check

  • these two spots.

  • But we'll get there.

  • We're getting there.

  • Ultimately, though, I need to figure out

  • which one was the best.

  • So I need to keep track by saying,

  • like the best score is negative infinity.

  • If score is greater than the best score,

  • then that's the new best score.

  • And the best move is this particular IJ.

  • And let me declare best move.

  • So, again, I haven't implemented Minimax yet.

  • I'm just getting the basic idea down.

  • For the furthest turn, I need to start

  • by looking at all the possible moves, get the score for those

  • by calling this mysterious Minimax

  • function I'm about to write, and find the best one,

  • then apply that move.

  • There's a big issue here, though,

  • because I am changing the global variable aboard.

  • I am altering the game.

  • I am moving X to that spot and then moving X to the next spot.

  • So I can make a copy of the board.

  • But I think an easy way to deal with this would actually just

  • be to undo it.

  • So right afterwards, I'm going to just quickly undo that move.

  • The next step would be to write the Minimax function that truly

  • write the algorithm itself.

  • I kind of want to know to see if there's any errors here.

  • So actually, what I'm going to do

  • is I'm just going to put a placeholder function in,

  • Minimax, which receives a board.

  • And then I'm just going to return the score of 1.

  • So Minimax is always going to return the score of 1,

  • meaning it's not going to be able to pick.

  • They're all going to be a tie.

  • So it still is going to pick the first one.

  • But let's see how that goes.

  • Next turn is not defined because I called it best move.

  • And it's in one more place here in Setup.

  • OK, where?

  • It went there.

  • Watch, it's going to go to the next available spot,

  • to the next available spot, to the next available spot.

  • So you can see this is working.

  • It's working in a sense that the algorithm is making a choice.

  • But no matter what, it's always just going

  • to go in order of the board and oh, X won.

  • There's also a big issue here, which thankfully no

  • major problems happened.

  • But I named this function Best Move.

  • And then I created a variable inside of the function

  • with the same name, which can always run into trouble.

  • So let me actually just change this to move.

  • Sort of by definition, it is the best move--

  • make sure this still works.

  • All right.

  • X is just picking straight down.

  • Now, the hard part.

  • I need to write the Minimax function itself.

  • So one thing that I've skipped here,

  • which is typical of a Minimax algorithm,

  • is I would also keep track of the depth,

  • because ultimately, I could want to consider like,

  • how long is it taking me to get somewhere?

  • What is the depth of the tree that I'm on?

  • So that's something that you'll see often within the algorithm

  • implementation.

  • So let me add that.

  • So I add an argument depth here and then give it a 0 at first.

  • Oh, you know what.

  • I need another really important argument.

  • This is a turn-based game.

  • And the algorithm is going to behave differently,

  • whether it is the maximizing player's turn, X,

  • or the minimizing players turn, O, in this case.

  • And, again, you could have more than two players.

  • Or there could be a lot of weird stuff going on.

  • But that's ultimately what I need here.

  • So let me also add is maximizing--

  • is maximizing.

  • And this here would be true.

  • All right, so I want to perform Minimax

  • on this board with a starting depth to 0.

  • And the first move is the maximizing player.

  • So let's add maximizing.

  • What do I want to do?

  • Actually, what I want to do even before this is

  • check to see if somebody won.

  • [MUSIC PLAYING]

  • I'm pretty sure I already have a check winner function.

  • That's something that I wrote in the previous coding challenge.

  • Let me double-check that--

  • check winner.

  • The winner is null.

  • And then by the end, it returns a tie.

  • It returns null, a tie, or the winner.

  • So the score is--

  • let's make a little lookup table that's basically

  • exactly what I wrote here.

  • If X wins, it's a 0.

  • If O wins, it's a a negative 1.

  • If it's a tie, it's a 0.

  • [MUSIC PLAYING]

  • All right, so this is my lookup table for what the score then--

  • and, again, it's, a very simple scenario.

  • There's only three possible scores.

  • If the result is not equal to null,

  • the score is the number associated

  • with this particular O. It got rid of those.

  • I didn't need those quotes.

  • Visual Studio Code, just clean that up for me,

  • thank you very much.

  • The score is based on whoever won,

  • and then I can return that.

  • So this would be the terminal condition.

  • If I am calling Minimax on this particular board

  • configuration at this particular depth, and it's an end state,

  • if it's a terminal state, just return the score.

  • That's what it's supposed to do.

  • [RINGING]

  • If it's not this state, I can't just return the score.

  • I need to check all the possible next moves.

  • So I got this wrong actually.

  • The next move, this is the AI player

  • trying a spot, their move.

  • So the next move is actually not the maximizing player.

  • This should be false.

  • But I'm going to write this as if it is.

  • So if it's the maximizing player,

  • I can copy, paste that from above,

  • check all the possible spots again.

  • If it's available, try.

  • Now, that's the human, right?

  • Then, oh, no, this is the AI.

  • I'm confused, because clearly the way

  • I'm stepping through this, in my mind it's O's turn.

  • But I'm ready the code for both, whether it's

  • X's turn or O's turn.

  • So for the sake of argument, it's X's turn right now.

  • So I'm checking all of those spots.

  • And then I'm going to say return.

  • I need to find the best move.

  • I'm kind of doing when I did again.

  • Is there a way for me to reduce the amount of code I'm writing?

  • I'm not going to worry about this.

  • I want to, once again, find the best

  • score, which will be in this case, negative infinity.

  • And I want to say the score is the Minimax algorithm

  • of the board at depth plus 1.

  • And now the next player's turn is false

  • because I'm the maximizing player.

  • And then I'm going to undo this just like I did before.

  • Why do I have to write this code twice?

  • I'll think about it later if I can refactor this,

  • only do it once.

  • If score is greater than best score,

  • best score is equal to score.

  • And then after all of this--

  • for loop, for loop, if, it's an empty spot,

  • boom, boom, return the best score.

  • So this is finding the best score

  • for all the possible next turns by the AI player.

  • Or if it's not the AI player, if it's the minimizing player,

  • we can do exactly the same thing.

  • And, again, maybe there's a way.

  • I'm missing a curly bracket?

  • Yes, thank you.

  • There's a curly bracket.

  • So this is very important.

  • There's a lot of ifs and for state for loops here.

  • So maybe there's also a nice way to refactor this and make

  • it more readable.

  • But if it's the maximizing player, check all the spots,

  • find the best possible outcome, and return it.

  • But call Minimax recursively at the next future move.

  • And then here, the minimizing player

  • wants to find the best score for it, which is the lowest score.

  • So and that's the human player.

  • And if the score is less than the best score and then

  • return that best score.

  • Well, oh, forget about this return one.

  • I'm going to stare at this to make sure

  • all my curly brackets line up--

  • if, if, for, for, and then return the best score.

  • Otherwise, if, if, for, for, and return the best score.

  • I think this might actually be good.

  • Should we just try it?

  • [DRUM ROLL]

  • Error on line 33, return-- hold on.

  • Oh, this should be-- ah, look at this.

  • So this is a huge mistake here.

  • Return true-- why did I have that there?

  • I don't know why I did that.

  • That's return score.

  • This should be a return score.

  • The whole point of this-- and I don't even

  • need a separate variable here.

  • I can just say, return scores result. Right,

  • so this is any kind of recursive function

  • needs a terminal condition, needs to exit, right?

  • This function is always going to keep calling itself and calling

  • itself Minimax--

  • Minimax-- Minimax--

  • Minimax.

  • I don't know what this extra code is in here and Minimax,

  • OK.

  • Oh, yes.

  • Simon is pointing out something to me,

  • which is great, which I don't need this IF statement to find

  • the best score--

  • the whole point of this mini.

  • And also, there's a great chance we

  • use the min function and the max function because it's

  • the Minimax algorithm.

  • So I can actually say score is the lower one.

  • The minimum between score and best score

  • is going to make it way easier to read.

  • Thank you.

  • And oh, no, but this one is max--

  • is max.

  • And then this one-- oh, thank you for this.

  • I don't know why I didn't think of this.

  • I'm sure it's in every implementation

  • score is the minimum between score and best score.

  • So this makes it much easier to read here, OK.

  • Let's try it.

  • Let's try it.

  • Why not?

  • Caution to the wind.

  • [DRUM ROLL]

  • OK, X went in the top left.

  • That's definitely the optimal place to go.

  • I'm going to go here.

  • [DRUM ROLL]

  • No, no bad X--

  • bad X. You're not going in the right place.

  • [MUSIC PLAYING]

  • Yeah, I beat you.

  • You're still going in order.

  • Why are you still going in order?

  • Oh, whoa, you're not going in order.

  • You're making weird decisions.

  • Oh, oh, oh.

  • I see the error.

  • I see the error of my ways.

  • This has to be true, right?

  • After the maximizing player's turn,

  • it's the minimizing player's turn.

  • After the minimizing player's turn,

  • it's the maximizing player's turn, OK.

  • [DRUM ROLL]

  • That's a good move, X. I see what you did there.

  • [DRUM ROLL]

  • [MUSIC PLAYING]

  • Why can't you figure out to go there, X?

  • Oh, oh.

  • [BUZZ]

  • OK.

  • This, I'm finding this.

  • Score is the new score.

  • And the best score should be the bigger one,

  • the maximum between score and best score, not score.

  • This has to be this, because I'm returning best score.

  • And this has to be this.

  • OK, OK.

  • [WHISTLING]

  • Let's see.

  • I really should not continue to play this drum sound.

  • [DRUM ROLL]

  • Come on, X-- figure out where to go.

  • OK, I'm going to go here.

  • Oh, shit, come on.

  • Oh, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no,

  • no--

  • [BUZZ]

  • That's in the wrong place.

  • It's in the wrong part.

  • I knew.

  • Look, my brackets-- bad brackets.

  • I need to return the best score, hello,

  • after I checked all of the possibilities,

  • meaning both for loops, the I and the J I've completed.

  • I got that right up here but not down here.

  • Oh, OK.

  • I really think we've got it this time.

  • Tympani sound.

  • [DRUM ROLL]

  • Ah, yes.

  • [DRUM ROLL]

  • [MUSIC PLAYING]

  • Ha, ha, tied.

  • All right.

  • All right, you.

  • We're going to go with the competition now.

  • I'm going let you in by going here.

  • Ah.

  • So if I don't go in, if I go in a corner,

  • yet, if I don't go to the middle,

  • it's always going to win.

  • So X is always going to win, unless I

  • go to the middle in which there's always going to be

  • a tie unless I make a mistake.

  • Then, it's going to win.

  • But if I go to any other spot, X is

  • going to be able to win because it's always going to--

  • [LAUGHS] OK, this work.

  • Hey, man.

  • So that's the Minimax algorithm.

  • [RING]

  • So one thing that might be interesting

  • for me to try here really quickly before I sort

  • of finish off this challenge would just be the comment

  • out the AI going first.

  • So I know this is technically correct because X

  • is supposed to go first.

  • But let's see what happens if I start with a fresh broad

  • and I go first.

  • So I'm going to go here.

  • I'm going to go here.

  • I'm going to here.

  • Oh.

  • I'm going to go here.

  • Oh, that's high.

  • Ooh, and something became undefined.

  • Oh, if I go last, I guess there's no move for X.

  • And I've got an error.

  • So that's [INAUDIBLE].

  • Can I beat--

  • No, I lost.

  • I don't believe there's any way for me to beat the AI player.

  • It's a solved game because it's always

  • going to go to the center.

  • So I could try going here.

  • I could go here, go here.

  • The best I could do is a tie.

  • So you've watched this coding challenge.

  • Maybe you have a creative idea about how

  • to make your own version of this.

  • But I just quickly whipped up thanks to the chat

  • in the Livestream.

  • Also, some suggestions of things you could try.

  • So first of all, what happens if you just

  • make two AI players that just play against each other

  • over and over again?

  • Well, in that case, it's going to be a tie every single time.

  • It's a solved game.

  • But that might lead to some interesting possibilities.

  • One thing that I'm not using is using the depth in the score.

  • So, for example, if I go back to the code,

  • and I change these to, say, 10 and negative 10,

  • I could account for the depth, meaning

  • I could have a higher score if I win more quickly than if it

  • takes longer to win.

  • Right now, I'm not accounting for that.

  • So where in the code would you subtract out depth?

  • That's something you could try.

  • Certainly it would be interesting to make a larger

  • board.

  • So can you play Tic-Tac-Toe in a 5 by 5 or a 7 by 7?

  • The winning conditions have changed.

  • The optimal play would change.

  • That would be exciting to try.

  • I hope somebody does that.

  • You could try more players.

  • Like, how would you have a Tic-Tac-Toe game

  • with three players?

  • That's something you try with a larger board.

  • You could try another game.

  • Maybe Connect Four would work.

  • Might be able to apply Minimax too.

  • Just thinking about the interface, animation.

  • Like, right now whenever I go, the next turn

  • happens immediately.

  • You could think about timing and that sort of thing.

  • But then I would say high degree of difficulty.

  • And maybe worth me coming back to

  • in a future video at some point would be these two last items.

  • One is known as alpha beta pruning.

  • Alpha beta pruning refers to an aspect of the Minimax algorithm

  • where you find an optimal path.

  • And you know that all the other possibilities

  • are going to be worse.

  • And you don't need to go forth and check every possibility.

  • So you could research how that works

  • and add it to this algorithm.

  • It's explained in the Sebastian Lague video.

  • Then there's a possibility of a game where the number of moves

  • is so vast you couldn't possibly compute the entire tree.

  • Chess is the classic example of that.

  • So you need some heuristic or a way

  • of an estimated guess of what the score could

  • be with any given state.

  • So one way of doing this, which I'm not saying

  • is a good strategy.

  • But a simple thing you could do with chess

  • is you could add up the total number of white pieces

  • versus black pieces.

  • And you could say, like, well, the score

  • is higher if the maximizing player has

  • more pieces than the opponent.

  • So that's one way of approaching it.

  • So you don't actually stop at the end of the tree.

  • But you just go some predetermined depth

  • and then calculate an estimate and have that ripple back up

  • the tree.

  • So maybe give that a try with a more complex game.

  • Certainly, other types of AI algorithms and neural networks

  • could play a role at some point with how

  • you make that estimate.

  • But ultimately, that is something

  • that would be interesting to try as well.

  • One more idea that just came from the chat.

  • So this is quite similar to the idea of a larger board.

  • But instead of just a larger two-dimensional board,

  • you could create a three-dimensional board.

  • So Tic-Tac-Toe, that happens in a cube.

  • I should just do that as a coding challenge separately

  • myself.

  • So hopefully, you've learned something.

  • And maybe this algorithm looked a little terrifying to you

  • at one point, now seems quite accessible and doable.

  • Maybe your creative idea is not on this list.

  • Please make your own version of this.

  • Share it with me at the codingtrain.com.

  • They'll be a link in this video's description

  • with instructions for how to share it.

  • If you don't know how to share it, just write a comment,

  • file an issue.

  • We, the coding train community are here to help you.

  • [WHISTLE] And I can't wait to see you in a future video.

  • Goodbye.

  • [MUSIC PLAYING]

[WHISTLE] Hello, and welcome to a coding challenge, Tic, Tac,

字幕與單字

單字即點即查 點擊單字可以查詢單字解釋

A2 初級

編碼挑戰賽154:嘀嘀打車人工智能與Minimax算法 (Coding Challenge 154: Tic Tac Toe AI with Minimax Algorithm)

  • 0 0
    林宜悉 發佈於 2021 年 01 月 14 日
影片單字