We can look at what it thought I would play here instead: it thought that I would take those two stones on the bottom, and Black would recapture with three.
And now this is a real game of Go. If you're interested, definitely, you know, learn more about the game of Go. You can check out the rules or tutorials online.
I can imagine sitting down with a couple of chess experts and building that, uh, value function that tells me how good a board position is.
And that's exactly what they did with Deep Blue. That's exactly how IBM built Deep Blue: they worked with a number of grandmasters to ask, how do you evaluate this position, and why? They extracted generalized rules from those answers and used those generalized rules.
They codified them, essentially hard-coded those rules, and used that to guide this exhaustive alpha-beta search, a minimax search where there are a couple of nuances in how the search is performed.
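To make that concrete, here's a minimal, runnable sketch of alpha-beta minimax in Python. The toy game tree and its leaf scores stand in for the kind of hand-coded, grandmaster-derived evaluation described above; none of it comes from Deep Blue itself.

```python
import math

# Inner nodes are lists of children; leaves are static evaluation scores,
# standing in for hand-coded, expert-derived evaluation rules.
def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True):
    if not isinstance(node, list):
        return node                         # leaf: hard-coded evaluation
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:               # beta cutoff: prune this branch
                break
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:               # alpha cutoff
                break
        return value

# Two-ply example: the maximizer picks the branch whose worst case is best.
tree = [[3, 5], [6, 9], [1, 2]]
print(alphabeta(tree))  # -> 6
```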
So does it always explore the best moves, or does it explore the space in a different way?
You know, the idea behind Monte Carlo Tree Search is to balance exploration and exploitation in a way that doesn't lock it into a single path the way an alpha-beta search might, or the way that chess engines have historically searched board positions.
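For contrast, here's a sketch of the selection rule at the heart of Monte Carlo Tree Search, using the common UCB1 formula. The Node fields, the exploration constant, and the toy two-move demo are illustrative choices, not any particular engine's implementation.

```python
import math
import random

class Node:
    def __init__(self):
        self.visits = 0          # N: how many times this node was selected
        self.total_value = 0.0   # W: sum of simulation results through it
        self.children = []

def ucb1(parent, child, c=1.4):
    """Upper confidence bound: exploitation plus an exploration bonus."""
    if child.visits == 0:
        return math.inf                     # try every move at least once
    exploit = child.total_value / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore

def select_child(parent):
    return max(parent.children, key=lambda ch: ucb1(parent, ch))

# Toy demo: two moves with true win rates 0.6 and 0.5. UCB1 favors the
# better move but keeps revisiting the weaker one as its bonus grows.
random.seed(0)
root = Node()
root.children = [Node(), Node()]
true_win_rate = [0.6, 0.5]
for _ in range(1000):
    child = select_child(root)
    i = root.children.index(child)
    result = 1.0 if random.random() < true_win_rate[i] else 0.0
    child.visits += 1
    child.total_value += result
    root.visits += 1
print([c.visits for c in root.children])    # most visits go to the better move
```

The exploration term shrinks as a child accumulates visits, so the search keeps sampling less-visited moves instead of committing to a single line the way a pruned minimax search can.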
So I know AlphaGo is powered by a CNN, or convolutional neural network, and I know these are similar to the CNNs we would use to classify images, cats and dogs, right?
So it's, ah, a residual network, similar to ResNet or, ah, whatever your preferred example is, which has a number of shared layers, convolutional layers of the standard sort. There's a whole stack of these shared layers, and at the top of that stack it fans out into a policy head, which gives it a heatmap of the most likely moves to be played, and a value head, which just spits out a single number indicating who it thinks is going to win.
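Here's a compact PyTorch-style sketch of that dual-head architecture, assuming a shared residual trunk fanning out into a policy head and a value head. The channel count, block depth, and three input feature planes are toy values for illustration, not AlphaGo's actual configuration.

```python
import torch
import torch.nn as nn

BOARD = 19  # standard Go board size

class ResBlock(nn.Module):
    """One residual block: two 3x3 convolutions plus a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        h = self.relu(self.conv1(x))
        return self.relu(x + self.conv2(h))  # skip connection

class PolicyValueNet(nn.Module):
    """Shared trunk that fans out into a policy head and a value head."""
    def __init__(self, planes=3, channels=64, blocks=4):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(planes, channels, 3, padding=1), nn.ReLU(),
            *[ResBlock(channels) for _ in range(blocks)])
        # Policy head: a heatmap (logits) over all board points.
        self.policy_head = nn.Sequential(
            nn.Conv2d(channels, 2, 1), nn.ReLU(), nn.Flatten(),
            nn.Linear(2 * BOARD * BOARD, BOARD * BOARD))
        # Value head: a single number in [-1, 1] for who is winning.
        self.value_head = nn.Sequential(
            nn.Conv2d(channels, 1, 1), nn.ReLU(), nn.Flatten(),
            nn.Linear(BOARD * BOARD, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh())

    def forward(self, x):
        h = self.trunk(x)
        return self.policy_head(h), self.value_head(h)

net = PolicyValueNet()
policy_logits, value = net(torch.randn(1, 3, BOARD, BOARD))
print(policy_logits.shape, value.shape)  # (1, 361) and (1, 1)
```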
That's a little bit different than what we see in the upper right-hand corner.
So the loss function that we're trying to minimize here is the sum of two terms: the cross-entropy error between the policy heatmap and the actual visit counts after search, and the mean squared error between the value estimate and the actual result of the game.
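As a sketch, that combined loss might be computed like this. The tensor shapes and the equal weighting of the two terms are assumptions for illustration, and the L2 weight-decay term that AlphaGo Zero's loss also includes is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def policy_value_loss(policy_logits, visit_counts, value, outcome):
    """Cross-entropy against normalized visit counts plus MSE on the result."""
    pi = visit_counts / visit_counts.sum(dim=1, keepdim=True)  # search target
    log_p = F.log_softmax(policy_logits, dim=1)
    policy_loss = -(pi * log_p).sum(dim=1).mean()        # cross-entropy term
    value_loss = F.mse_loss(value.squeeze(-1), outcome)  # (z - v)^2 term
    return policy_loss + value_loss

# Toy usage with random tensors standing in for real search data.
logits = torch.randn(8, 361)                 # policy output, batch of 8
counts = torch.rand(8, 361)                  # MCTS visit counts per move
v = torch.tanh(torch.randn(8, 1))            # value head estimates
z = torch.tensor([1.0, -1.0] * 4)            # actual game results
print(policy_value_loss(logits, counts, v, z))
```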