
  • LAURENCE MORONEY: All right.

  • Shall we get started?

  • So thanks, everybody, for coming to this session.

  • I'm going to be talking about TensorFlow

  • and particularly TensorFlow from a programmer's perspective--

  • so machine learning for programmers.

  • I'd like to show some code samples of using TensorFlow

  • in some simple scenarios as well as one slightly more

  • advanced scenario.

  • But before I do that, I always like to just do

  • a little bit of a level set.

  • And if you were at the previous session, sorry,

  • some of the content's going to be similar to what

  • you've seen already.

  • But when I like to think about AI,

  • and when I come to conferences like this one about AI,

  • or if I read the news about AI, there's

  • always stories about what it can do or what it might do,

  • but there's not a whole lot about what it actually is.

  • So part of my mandate and part of what I actually

  • like to educate people around is from a programmer's

  • perspective, what AI actually is, what it is for you,

  • what you can begin to learn how to program,

  • and then how you can apply it to your business scenarios.

  • But we're also at the cusp of this revolution

  • in this technology, and lots of people

  • are calling it like the fourth Industrial Revolution.

  • And for me, I can only describe it

  • as like it's the third big shift in my own personal career.

  • And so for me, the first one came in the early to mid-90s

  • when the web came about.

  • And if you remember when the web came about,

  • we were all desktop programmers.

  • I personally-- my first job was I

  • was a Visual Basic programmer programming

  • Windows applications.

  • Anybody ever do that?

  • It was fun, wasn't it?

  • And so then the web came around, and what happened with the web

  • then is it changed the audience of your application

  • from one person at a time to many people at a time.

  • You had to start thinking differently

  • about how you built your applications to be

  • able to scale it to lots of people using it.

  • And also, the runtime changed.

  • Instead of you being able to write something that

  • had complete control over the machine,

  • you would write something to this sort of virtual machine

  • that the browser gave you.

  • And then maybe that browser would have plugins like Java

  • and stuff like that you could use

  • to make it more intelligent.

  • But as a result, what ended up happening

  • was this paradigm shift gave birth to whole new industries.

  • And I work for a small company called Google.

  • Anybody heard of them?

  • And so things like Google weren't possible.

  • Anybody remember gophers?

  • Yeah, so that's really old school, right?

  • What gophers were was almost the opposite of a search

  • engine.

  • A search engine, like--

  • you type something into it, and it has already

  • found the results, and it gives them to you.

  • A gopher was this little application

  • that you would send out into the nascent internet,

  • and it would crawl everywhere, a little bit like a spider,

  • and then come back with results for you.

  • So for me, whoever had the great idea

  • to say, let's flip the axes on that

  • and come up with this new business paradigm,

  • ended up building the first search engines.

  • And as a result, companies like Google and Yahoo were born.

  • Ditto with things like Facebook--

  • that wouldn't have been possible with the browser.

  • Can you imagine trying to--

  • pre-internet, where there was no standard protocol

  • for communication, and you'd write desktop applications--

  • can you imagine being able to build something

  • like a Facebook or a Twitter?

  • It just wasn't possible.

  • So that became-- to me, the web was

  • this first great tectonic shift in my own personal career.

  • The second one then came with the advent of the smartphone.

  • So now users had this device that they

  • can put in their pocket that's got

  • lots of computational power.

  • It's got memory.

  • It's got storage, and it's loaded

  • with sensors like cameras, and GPS, et cetera.

  • Now think about the types of applications

  • that you could build with that.

  • Now companies like Uber became possible.

  • Now, I personally believe, by the way

  • that all of the applications are built

  • by introverts, because all of these great things

  • that you can do nowadays serve introverts.

  • I'm highly introverted, and one thing I hate to do

  • is stand on a street corner and hail a taxi.

  • So when Uber came along, it was like a dream come true for me

  • that I could just do something on my phone,

  • and a car would show up.

  • And now it's shopping.

  • It's the same kind of thing, right?

  • I personally really dislike going to a store

  • and having somebody say, can I help you?

  • Can I do something for you?

  • Can I help you find something?

  • I'm introverted.

  • I want to go find it myself, put my eyes down, and take it

  • to the cash register, and pay for it.

  • And now online shopping, it's done the same thing.

  • So I don't know why I went down that rabbit hole,

  • but it's just one that I find that the second tectonic shift

  • has been the advent of the mobile application

  • so that these new businesses, these new companies

  • became possible.

  • So the third one that I'm seeing now

  • is the AI and the machine learning revolution.

  • Now, there's so much hype around this,

  • so I like to draw a diagram of the hype cycle.

  • And so if you think about the hype cycle,

  • every hype cycle starts off with some kind

  • of technological trigger.

  • Now, with AI and machine learning,

  • that technological trigger really

  • happened a long time ago.

  • Machine learning has been something,

  • and AI has been something that's been in universities.

  • It's been in industry for quite some time--

  • decades.

  • So it's only relatively recently that the intersection

  • of compute power and data has made it possible so

  • that now everybody can jump on board-- not just university

  • researchers.

  • And with the power of things such as TensorFlow

  • that I'm going to show later, anybody with a laptop

  • can start building neural networks

  • where in the past neural networks

  • were reserved for the very best of universities.

  • So that technological trigger that's rebooted,

  • in many ways, the AI infrastructure,

  • has only happened in the last few years.

  • And with any hype cycle, what happens

  • is you end up with this peak of inflated expectations

  • where everybody is thinking AI's going to be the be-all

  • and end-all, and will change the world,

  • and will change everything as we know it,

  • before it falls into the trough of disillusionment.

  • And then at some point, we get enlightenment,

  • and then we head up into the productivity.

  • So when you think about the web, when

  • you think about mobile phones and those revolutions

  • that I spoke about, they all went through this cycle,

  • and AI went through this cycle.

  • Now, you can ask 100 people where we are on this life cycle

  • right now, and you'd probably get 100 different answers.

  • But I'm going to give my answer that I

  • think we're right about here.

  • And when we start looking at the news cycle,

  • it kind of shows that.

  • We start looking at news.

  • We start looking at glossy marketing videos.

  • AI is going to do this.

  • AI is going to do that.

  • At the end of the day, AI isn't really doing any of that.

  • It's smart people building neural networks

  • with a new metaphor for programming

  • who've been the ones able to build out these new scenarios.

  • So we're still heading up that curve

  • of inflated expectations.

  • And at some point, we're probably

  • going to end up in the trough of disillusionment

  • before things will get real and you'll be able to really build

  • whatever the Uber or the Google of the AI generation's

  • going to be.

  • It may be somebody in this room will do that.

  • I don't know.

  • So at Google, we have this graph that we

  • draw that we train our internal engineers

  • and our internal folks around AI and around the hype around AI.

  • And we like to layer it in these three ways.

  • First of all, AI, from a high level,

  • is the ability to program a computer

  • to act like an intelligent human.

  • And how do you do that?

  • There might be some traditional coding in that,

  • but there may also be something called machine

  • learning in that.

  • And what machine learning is all about is instead of writing

  • code where it's all about how the human solves a problem,

  • how they think about a problem, and expressing that

  • in a language like Java, C#, or C++,

  • it's a case of you train a computer by getting it

  • to recognize patterns and then open up whole new scenarios

  • in that way.

  • I'm going to talk about that in a little bit more.

  • And then another part of that is deep

  • learning, where the idea behind deep learning

  • is machines being able to take over some of the role

  • that humans play in the machine learning phase.

  • And where machine learning is all about--

  • I'm going to, for example, show a slide next

  • about activity detection.

  • But in the case of activity detection,

  • instead of me explicitly programming a computer

  • to detect activities, I will train a computer

  • based on people doing those activities.

  • So let me think about it.

  • Let me describe it this way.

  • First of all, how many people in this room are coders?

  • Have written code?

  • Oh, wow, most of you.

  • OK, cool.

  • What languages, out of interest?

  • Just shout them out.

  • [INTERPOSING VOICES]

  • LAURENCE MORONEY: C#.

  • Thank you.

  • [INTERPOSING VOICES]

  • LAURENCE MORONEY: Python.

  • OK.

  • I've written a bunch of books on C#.

  • I still love it.

  • I don't get to use it anymore, but it's nice to hear.

  • So I heard C#.

  • I heard Python.

  • C++?

  • OK, cool.

  • Now, what do all of these languages have in common?

  • Ruby?

  • Nice.

  • What do all of these languages have in common?

  • That you, as a developer, have to figure out

  • how to express a problem in that language, right?

  • So if you think about it-- if you're

  • building an application for activity detection,

  • and say you want to detect an activity of somebody walking--

  • like I'm wearing a smartwatch right now.

  • I love it because since I started wearing smartwatches,

  • I became much more conscious of my own fitness.

  • And I think about how this smartwatch monitors my activity

  • that when I start running, I want

  • it to know that I'm running, so it logs that I'm running.

  • When I start walking, I want it to do

  • the same thing, and count calories,

  • and all that kind of stuff.

  • But if you think about it from a coding perspective, how would

  • you build a smartwatch like this one if you're a coder?

  • Now, you might, for example, be able to detect the speed

  • that the person's moving at, and you'd write

  • a little bit of code like this.

  • If speed is less than 4, then the person's walking.

  • That's kind of naive because if you're walking uphill,

  • you're probably going slower.

  • If you're walking downhill, you're going faster.

  • But I'll just keep it simple like that.

  • So in code, you have a problem, and you

  • have to express the problem in a way

  • that the computer understands that you can compile,

  • and then you build an application out of it.

  • So now I say, OK, what if I'm running?

  • If I'm running, well, I can probably go by the speed again.

  • And I say, hey, if my speed is less than a certain amount,

  • I'm walking.

  • Otherwise, I'm running.

  • I go, OK.

  • Now I've built an activity detector,

  • and it detects if I'm walking or if I'm running.

  • Pretty cool.

  • Pretty easy to do with code.

  • So I go, OK.

  • Now, my next scenario is biking.

  • And I go, OK.

  • If I'm going based on the speed, the data of my speed,

  • I can do a similar thing.

  • If I say if my speed is less than this much, I'm walking.

  • Otherwise, I'm running, or otherwise I'm biking.

  • So great.

  • I've now written an activity detector--

  • a very naive activity detector-- just by looking at the speed

  • that the person's moving at.
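The naive, speed-only detector described above can be sketched like this (the thresholds are illustrative assumptions, not values from the talk):

```python
def detect_activity(speed_kmh):
    # Naive rule-based activity detection from speed alone.
    # These cutoffs are guesses -- exactly the kind of hand-written
    # rule the talk argues can't stretch to something like golf.
    if speed_kmh < 4:
        return "walking"
    elif speed_kmh < 12:
        return "running"
    else:
        return "biking"
```

Note that no branch could ever return "golfing" from speed alone, which is the point.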

  • But now my boss loves to play golf,

  • and he's like, this is great.

  • I want you to detect golf, and tell me when I'm playing golf,

  • and calculate what I'm doing when I'm playing golf.

  • How do I do that?

  • I'm in what I call, as a programmer,

  • the oh crap phase because now I realize that all of this code

  • that I've written, and all this code that I'm maintaining,

  • I now have to throw away because it can't

  • be used in something like this.

  • This scenario just doesn't become possible with the code

  • that I've written.

  • So when I think about going back to the revolutions

  • that I spoke about--

  • for example, something like an Uber

  • wouldn't have been possible before the mobile phone.

  • Something like a Google wouldn't have been possible

  • before the web.

  • And something like my golf detector,

  • it wouldn't be possible or would be

  • extremely difficult without machine learning.

  • So what is machine learning?

  • So traditional programming I like

  • to summarize in a diagram like this one.

  • And traditional programming is a case of you express rules using

  • a programming language like Ruby, or C#, or whatever.

  • And you have data that you feed into it,

  • and you compile that into something

  • that gives you answers.

  • So keeping the very simple example

  • that I have of an activity detector, that's

  • giving me the answer of you're playing golf.

  • You're running.

  • You're walking-- all those kind of things.

  • The machine learning revolution just flips the axes on this.

  • So the idea behind the machine learning revolution

  • is now I feed in answers, I feed in data, and I get out rules.

  • So instead of me needing to have the intelligence

  • to define the rules for something,

  • this revolution is saying that, OK, I'm

  • going to tell a computer that I'm doing this,

  • I'm doing that, I'm doing the other,

  • and it's going to figure out the rules.

  • It's going to match those patterns

  • and figure out the rules for me.

  • So now something like my activity detector for golf,

  • and walking, and running changes.

  • So now instead of me writing the code for that, I would say, OK.

  • I'm going to get lots of people to walk.

  • I'm going to get lots of people to wear whatever

  • sensor it is-- like maybe it's a watch or a smartphone--

  • in their pocket.

  • And I'm going to gather all that data,

  • and I'm going to tell a computer,

  • this is what walking looks like.

  • I'm going to do the same for running.

  • I'm going to do the same for biking.

  • And I may as well do the same for golfing.

  • So now my scenario becomes expandable,

  • and I can start detecting things that I previously would not

  • have been able to detect.

  • So I've opened up new scenarios that I previously would not be

  • able to program by using if-then rules or using whatever

  • language--

  • C.

  • Anybody remember the language Prolog?

  • Anybody use that?

  • Yeah.

  • Even Prolog couldn't handle that,

  • even though they said Prolog was an AI language.

  • So the idea behind this is it kind of emulates

  • how the human mind works.

  • So instead of me telling the computer

  • by having the intelligence to know what golf looks like, I

  • train the computer by taking data about what

  • golf looks like, and the computer recognizes that data,

  • matches that data.

  • So in the future, when I give it more data,

  • it will say, that kind of looks like golf,

  • so I'm going to say this is golf.

  • So we talk about learning.

  • We talk about the human brain.

  • So I always like to think like, well,

  • think about how you learn something--

  • like maybe this game.

  • Anybody remember this game?

  • Everybody knows how to play this game, right?

  • It seems, by the way, this game has

  • different names in every country,

  • and it's always hard to remember.

  • I grew up calling it noughts and crosses.

  • Ed's nodding.

  • Most people grew up maybe calling

  • in this country tic-tac-toe.

  • I gave a talk similar to this in Japan earlier this year,

  • and they had this really strange name that I

  • couldn't remember for it.

  • But this is a very simple game, right?

  • Now, if I were to ask you to play that game right now,

  • and it's your move, where would you go?

  • How many people would go in the center?

  • How many people would not go in the center?

  • We need to talk.

  • So you've probably learned this as a young child--

  • and maybe you teach this to children.

  • But think about the strategy of winning this game.

  • If it's your turn, you will never win this game

  • by not going in the center first,

  • unless you're playing against somebody

  • who doesn't know how to play the game.

  • Now, remember how you learned that.

  • OK?

  • If you have a really tough teacher like me,

  • I would teach my kids by beating them

  • every time at the game and that kind of stuff.

  • So if they would start in the corner, I would beat them.

  • And they would start somewhere else,

  • and I would beat them-- at the game.

  • And keep doing this kind of thing

  • until they eventually figured out

  • that they have to go in the center,

  • or they're going to lose.

  • So that was a case of this is how the human brain learns.

  • So how do we teach a computer the same way?

  • Now think about, for example, if your kid goes,

  • and they've never seen this board before.

  • So in this society, we read left to right, top to bottom.

  • So the first thing they'd probably do

  • is go in the top left-hand corner.

  • And then you'd go in the center, and then they'd

  • go somewhere else.

  • And you go somewhere else, and then they go somewhere else,

  • and you get three in a row, and you beat them.

  • They now have what, in machine learning parlance,

  • is a labeled example.

  • They see the board.

  • They remember what they did on the board,

  • and that's been labeled as they lost.

  • Then they might play again, and they

  • have another labeled example of they lost.

  • And they'll keep doing that until they have labeled

  • examples of tying and then maybe, eventually,

  • labeled examples of winning.

  • So knowing how to learn is a step

  • towards this kind of intelligence.

  • And this is what we mean when we talk about machine learning:

  • we get our data, and we label it.

  • It's exactly the same as teaching

  • a child how to play tic-tac-toe or noughts and crosses.

  • So let's take a look at--

  • so if I go back to this diagram for a moment

  • before I look at some code, now the idea

  • is thinking in terms of tic-tac-toe,

  • you have the answers of experience of playing the game.

  • You have the labels for that--

  • that you won, you lost, whatever.

  • And out of that, as a human, you'd begin to infer the rules.

  • Did anybody ever teach you that you must

  • go in the center first?

  • If you don't go in the center first, you go in a corner.

  • If you don't go in a corner, you block somebody

  • with two in a row?

  • You don't learn by those if-then rules.

  • I know I didn't, and most people I speak to didn't.

  • So as a result, they ended up playing the game,

  • and they infer the rules for themselves,

  • and it's exactly the same thing with machine learning.

  • So you build something.

  • The computer learns how to infer the rules

  • with a neural network.

  • And then at runtime, you give it data,

  • and it will give you back classifications,

  • or predictions, or give you back intelligent answers based

  • on the data that you've given it.

  • So let's look at some code.

  • So this is what we call the training phase.

  • This is what we call the inference phase.

  • But enough theory.

  • So I like to explain a lot of this in coding.

  • So a very simple Hello, World scenario,

  • as all programmers have, is I'm going to use some numbers.

  • And I'm going to give you some numbers,

  • and there's a relationship between these numbers.

  • And let's see who can figure out what the relationship is.

  • Are you ready?

  • OK.

  • Here's the numbers.

  • So where x is minus 1, y is minus 3.

  • Where x is 0, y is minus 1, et cetera, et cetera.

  • Can you see the relationship between the x and the y?

  • So if y equals something, what would y equal?

  • AUDIENCE: 2x minus 1.

  • LAURENCE MORONEY: 2x minus 1.

  • Excellent.

  • So the relationship here is y equals 2x minus 1.

  • How do you know that?

  • How did you get that?

  • AUDIENCE: [INAUDIBLE]

  • LAURENCE MORONEY: What's that?

  • AUDIENCE: [INAUDIBLE]

  • LAURENCE MORONEY: I can't hear you, sorry.

  • AUDIENCE: It's called a linear fit.

  • LAURENCE MORONEY: Oh, linear fit.

  • OK, thanks.

  • Yeah.

  • So you've probably done some basic geometry in school,

  • and you think about usually there's a relationship.

  • Y equals mx plus c, something along those lines.

  • So you start plugging the m and the c in.

  • And then your mind ultimately finds

  • something that works, right?

  • So you go, OK.

  • Well, if y is minus 3, maybe that's

  • a couple of x's, which will give me minus 2.

  • And I'll subtract 1 from that, give me minus 3.

  • And then I'll try that with 0 and 1.

  • Yep, that works.

  • Now I'll try that with 1 and 1.

  • That works.

  • So what happened is there were a couple of parameters

  • around the y that you started guessing

  • what those parameters were and started trying to fit them in

  • to get that relationship.

  • That's exactly what a neural network does,

  • and that's exactly the process of training a neural network.

  • When you train a neural network to try and pick

  • a relationship between numbers like this, all it's doing

  • is guessing those random parameters, calculating--

  • look through each of the parameters,

  • calculate which ones it got right, which ones it got wrong,

  • calculate how far it got them wrong by,

  • and then try and come up with new values that would be closer

  • to getting more of them right.

  • And that's the process called training.

  • So whenever you see training and talking

  • about needing lots of cycles for training, needing lots of GPU

  • time for training, all the computer is doing is trying,

  • failing, trying, failing, trying, failing, but each time

  • getting a little closer to the answer.
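That try-fail-adjust loop can be sketched in plain Python before any framework gets involved (a toy version of gradient descent fitting the talk's six points, not the speaker's actual code):

```python
# Fit y = w*x + b to the six points from the talk by trial and error:
# guess parameters, measure how wrong they are, nudge them, repeat.
xs = [-1, 0, 1, 2, 3, 4]
ys = [-3, -1, 1, 3, 5, 7]

w, b = 0.0, 0.0   # initial (wrong) guesses for the two parameters
lr = 0.01         # how big each corrective nudge is

for _ in range(500):  # 500 rounds of try, fail, adjust
    # Mean-squared-error gradients tell us which direction is "less wrong."
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * dw
    b -= lr * db

print(w, b)  # close to 2 and -1, i.e. y = 2x - 1
```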

  • So let's look at the code for that.

  • So using TensorFlow and using Keras--

  • I don't have the code on my laptop,

  • so I've got to look back at the screen.

  • Sorry.

  • So using TensorFlow and Keras, here's

  • how I'm going to define a neural network

  • to do that linear fitting in just a few lines.

  • So the first thing I'm going to do

  • is I'm going to create my neural network.

  • This is the simplest possible neural network.

  • It's got one layer with one neuron in it.

  • And this is the code to do that.

  • So where you see keras.layers.Dense(units=1,

  • input_shape=[1]), all that I'm doing is saying

  • I've got a single neuron.

  • I'm going to pass a single number into that,

  • and you're going to try and figure out what the number I

  • want to come out of that is.

  • So very, very simple.

  • So then my next line of code is remember

  • I said all a neural network is going to do

  • is try and guess the parameters that

  • will make all the numbers fit?

  • So it will come up with a couple of rough guesses

  • for these parameters.

  • And then it has these two functions.

  • One's called a loss function, and one's called an optimizer.

  • And all they're doing is--

  • if you remember that set of six numbers I gave you,

  • it's saying, OK.

  • Well if y equals something times x plus something,

  • I'm going to guess those two somethings.

  • I'm going to measure how many of my y's I got right.

  • I'm going to measure how far I'm wrong in all of the ones

  • that I got wrong, and then I'm going

  • to try and guess new values for those somethings.

  • So the loss function is the part where

  • it's measuring how far it got wrong,

  • and the optimizer is saying, OK.

  • Here's what I got the last time.

  • I'm going to try to guess these new parameters,

  • and I'll keep going until I get y equals 2x minus 1

  • or something along those lines.

  • So that's all you do.

  • You just compile your model.

  • You specify the loss function.

  • You specify the optimizer.

  • These are both really heavy mathy things.

  • One of the nice things about Keras,

  • one of the nice things about TensorFlow,

  • is they're all done for you.

  • You're just going to specify them in code.

  • And so I'm going to say, I'm going

  • to try the mean squared error as my loss function.

  • And I'm going to try something called

  • SGD, which is stochastic gradient

  • descent as my optimizer.

  • And every time it loops around, it's

  • going to just guess new parameters based on those.

  • OK.

  • So then the next thing I'm going to do

  • is I'm going to feed my values into my neural network.

  • So I'm going to say, my x is going

  • to be this array-- minus 1, 0, 1, et cetera.

  • My y is going to be this array.

  • So here I'm creating the data.

  • And so I just get them, and I load them

  • into a couple of arrays.

  • This is Python code, by the way.

  • And now all that I'm going to ask my neural network to do

  • is to try and come up with an answer.

  • And I do that with the fit method.

  • So here I just say, hey, try and fit my x's to my y's.

  • And this epochs=500 means you're going to just try 500 times.

  • So it's going to loop 500 times like that.

  • Remember I was saying it's going to get those parameters.

  • It's going to get it wrong.

  • It's going to optimize.

  • It's going to guess again.

  • It's going to get it wrong.

  • It's going to optimize.

  • So in this case in my code, I'm just saying do that 500 times.

  • And at the end of those 500 times,

  • it's going to come up with a model

  • that if I gave it a y-- sorry.

  • If I give it an x, it's going to give me what

  • it thinks the y is for that x.

  • OK.

  • And you do that using model.predict().

  • So if I pass it model.predict() for the value 10,

  • what do you think it would give me?

  • If you remember the numbers from earlier, y is 2x minus 1.

  • What do you think it would give?

  • 19, right?

  • It doesn't because it will give me something really close

  • to 19.

  • It gives me about 18.97, and I'm going

  • to try to run the code in a moment to show.

  • But why do you think it would do that?

  • AUDIENCE: [INAUDIBLE]

  • LAURENCE MORONEY: What's that?

  • AUDIENCE: It's predicting.

  • LAURENCE MORONEY: It's predicting.

  • And it's just been trained on a very few pieces of data.

  • With those six pieces of data, it looks like a line,

  • and it looks like a linear relationship,

  • but it might not be.

  • There's room for error there, because I'm

  • training on very, very little data;

  • this could be a small part of a line that, for all we know,

  • goes like this instead of being linear once you move out

  • of those points.

  • And as a result, those kind of things

  • get factored into the model as the model's training on it.

  • So you'll see it's going to get a very close answer,

  • but it's not going to be an exact answer.
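Putting those pieces together, the whole script amounts to something like this (a reconstruction from the talk's description, not the speaker's exact file):

```python
import numpy as np
from tensorflow import keras

# One layer, one neuron: the simplest possible neural network.
model = keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])

# sgd (stochastic gradient descent) guesses new parameters each round;
# mean squared error measures how far the current guesses are off.
model.compile(optimizer='sgd', loss='mean_squared_error')

# The six points from earlier, where y = 2x - 1.
xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)

# Guess, measure, optimize -- 500 times.
model.fit(xs, ys, epochs=500, verbose=0)

# Close to 19, but not exactly: the model has only ever seen six points.
print(model.predict(np.array([10.0]), verbose=0))
```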

  • Let me see if I can get the code running.

  • It's a little complex with this laptop.

  • When I'm presenting, it's hard to move stuff over

  • to that screen.

  • Just one second.

  • This requires some mouse-fu.

  • All right.

  • So I have that code.

  • Let's see.

  • Yeah.

  • So you can see that code I have now running up there.

  • And if you look right at the bottom of the screen over here,

  • we can see here's where it has actually done the training.

  • It's done 500 epochs worth of training.

  • And then when I called the model.predict(),

  • it gave me this answer, which is 18.976414.

  • And so that was one that I ran earlier.

  • I'm just going to try and run it again now, if I can.

  • But it's really hard to see.

  • So I'll click that Run arrow.

  • So this IDE is PyCharm, by the way.

  • So you see that it ran very quickly because it's

  • a very simple neural network.

  • And as a result, I was able to train it

  • through 500 epochs in whatever that is-- half a second.

  • What did it give me this time?

  • Was it 18.9747?

  • Is that what I see?

  • So again, very simple neural network, very simple code,

  • but this just shows some of the basics for how it works.

  • So next, I want to just get to a slightly more advanced example

  • once I get my slides back.

  • Whoops.

  • OK.

  • So that was very simple.

  • That was Hello, World.

  • We all remember the first Hello, World program that we wrote.

  • If you wrote it in Java, it was like 10 lines.

  • If you wrote it in C#, it was five lines.

  • If you wrote it in Python, it was one line.

  • If you wrote it in C++, it was like 300 lines.

  • [LAUGHTER]

  • Do you remember that--

  • I remember Petzold's book on programming Windows.

  • Anybody ever read that?

  • The whole first chapter was how to do Hello, World in MFC,

  • and it was like 15 pages long.

  • I thought it was great.

  • But that was a pretty easy example.

  • That, to me, is the Hello, World of machine learning-- just

  • doing that basic linear fitting.

  • But let's think about something more complicated.

  • So here are some items of clothing.

  • Now, as a human, you are looking at these items of clothing,

  • and you've instantly classified them.

  • And you instantly recognize them, or at least

  • hopefully most of them.

  • But think about the difficulty for a computer

  • to classify them.

  • For example, there are two shoes on this slide.

  • One is the high heel shoe in the upper right,

  • and one is the sneaker in the second row.

  • But they look really different to each other--

  • other than the fact that they're both red,

  • and you think they vaguely fit a foot.

  • The high heel, obviously your foot has to change to fit it.

  • And the sneaker, the foot is flat.

  • But as a human brain, we automatically recognize these,

  • and we see these as shoes.

  • Or if we look at the two shirts in the image, one of them

  • doesn't have arms because we automatically

  • see it as being folded-- the one with the tie.

  • And then the green one in the lower left--

  • we already know it's a shirt-- it's a t-shirt--

  • because we recognize it as such.

  • But think about how would you program a computer

  • to recognize these things, given the differences?

  • It's really hard to tell the difference

  • between a high heeled shoe and a sneaker, for example.

  • So the idea behind this is there's actually a data

  • set called Fashion MNIST.

  • And what it does is it gets 70,000 items of clothing,

  • and it's labeled those 70,000 items of clothing

  • in 10 different classes from shirts, to shoes, to handbags,

  • and all that kind of thing.

  • And it's built into Keras.

  • So one of the really neat things that came out of the research

  • behind this, by the way, is that the images are only 28

  • by 28 pixels.

  • So if you think about it, it's faster to train a computer

  • if you're using less data.

  • You saw how quickly I trained with my linear example

  • earlier on.

  • But if I were to try and train it

  • with high definition images of handbags

  • and that kind of stuff, it would still work,

  • but it would just be slower.

  • And a lot of the research that's gone into this dataset,

  • they've actually been able to train and show

  • how to train a neural network that all you need

  • is a 28 by 28 pixel image for you

  • to be able to tell the difference

  • between different items of clothing.

  • As you are doing probably right now, you can take a look,

  • and you see which ones are pants,

  • which ones are shoes, which ones are

  • handbags, that kind of thing.

  • So this allows us to build a model that's

  • very, very quick to train.

  • And if I take a look, here's an example

  • of one item of clothing in 28 by 28 pixels.

  • And you automatically recognize that, right?

  • It's a boot, or a shoe, or something along those lines.

  • And so this is the kind of resolution of data--

  • all you need to be able to build an accurate classifier.

  • So let's look at the code for that.

  • So if you remember earlier on, the code that I was building

  • was I created the neural network.

  • I compiled a neural network by specifying

  • the loss function and the optimizer, and then I fit it.

  • So in this case, a little bit more complex.

  • Your code's going to look like-- you're going to use TensorFlow.

  • From TensorFlow, you're going to import the Keras namespace

  • because the Keras namespace really nicely gives you access

  • to that Fashion MNIST dataset.

  • So think about all the code that you'd typically

  • have to write to download those 70,000 images,

  • download their labels, correspond

  • a label with an image, load all of that in--

  • that kind of stuff.

  • All that coding is saved and just put

  • into these two lines of code.

  • And that's one of the neat things also about Python that I

  • find that makes Python great for machine learning because that

  • second line of code there where it's like train_images,

  • train_labels, test_images, test_labels equals

  • fashion_mnist.load_data(), what that's actually doing is

  • it's loading data from the dataset which is stored

  • in the cloud.

  • It's sorting that 70,000 items of data into four sets.

  • Those four sets are then split into two sets, one for training

  • and one for testing.

  • And that data is going to contain-- the one on the left

  • there, the training images, is 60,000 images

  • and 60,000 labels.

  • And then the other side is 10,000 images and 10,000 labels

  • that you're going to use for testing.

  • Now, anybody guess why would you separate them like this?

  • Why would you have a different set

  • for testing than you would have for training?

  • AUDIENCE: [INAUDIBLE]

  • LAURENCE MORONEY: The clue's in the name.

  • So how do you know your neural network

  • is going to work unless you've got something

  • to test it against?

  • Earlier, we could test with our linear thing

  • by feeding 10 in because I know I'm expecting 2x minus 1

  • to give me 19.

  • But now it's a case of, well, it'd

  • be great for me to be able to test it against something

  • that's known, against something that's labeled,

  • so I can measure the accuracy as I go forward.
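The train/test split described above can be sketched like this. The arrays here are placeholders standing in for the real Fashion MNIST images and labels, just to show the shapes that `load_data()` hands back:

```python
import numpy as np

# Stand-ins for the 70,000 labeled Fashion MNIST items (28x28 grayscale
# images with one class label each) -- zeros here, purely to show the
# shape of the split that load_data() performs for you.
images = np.zeros((70000, 28, 28), dtype=np.uint8)
labels = np.zeros(70000, dtype=np.uint8)

# 60,000 items for training, 10,000 held back for testing.
train_images, test_images = images[:60000], images[60000:]
train_labels, test_labels = labels[:60000], labels[60000:]

print(train_images.shape)  # (60000, 28, 28)
print(test_images.shape)   # (10000, 28, 28)
```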

  • So that's all I got to do in code.

  • So now if I come back here, let's look

  • at how we actually define the neural net-- oh, sorry.

  • Before I do that, so the training images

  • are things like the boot that I showed you earlier on.

  • It's 28 by 28 pixels.

  • The labels are actually just going

  • to be numbers rather than like a word like shoe.

  • Why do you think that would be?

  • So that you can define your own labels,

  • and you're not limited to English.

  • So for example, 09 in English could be an ankle boot.

  • The second one is in Chinese.

  • The third one is in Japanese.

  • And the fourth language, can anybody guess?

  • Bróg rúitín?

  • That's actually Irish Gaelic.

  • Sorry, I'm biased.

  • I have to put some in.

  • So now, for example, I could build a classifier

  • not just to give me items of clothing

  • but to do it in different languages.

  • So that's just what my labels are going to look like.
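Because the labels are just the numbers 0 through 9, the human-readable names can live in a lookup of your own, in any language. A partial, illustrative mapping (the language codes and entries here are assumptions; label 9 really is the ankle boot in Fashion MNIST):

```python
# The dataset stores only the numbers 0-9; the readable names are up to
# you, which is why you're not limited to English. Partial mapping for
# illustration only.
class_names = {
    "en": {0: "T-shirt/top", 9: "Ankle boot"},
    "ga": {9: "Bróg rúitín"},  # Irish Gaelic
}

predicted_label = 9
print(class_names["en"][predicted_label])  # Ankle boot
print(class_names["ga"][predicted_label])  # Bróg rúitín
```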

  • So now let's take a look at the code

  • for defining my neural network.

  • So here is-- if you remember the first line of code

  • where I defined the single layer with the single neuron

  • for the classification, this is what it's going to look like.

  • And this is all it takes to build this clothing classifier.

  • So you see there are three layers here.

  • The first layer, where it says keras.layers.Flatten(input

  • shape=(28, 28)), all that is is I'm defining a layer to take

  • in 28 squared values.

  • Remember, the image is a square of 28 by 28 pixels,

  • but you don't feed a neural network with a square.

  • You feed it with a flat layer of values.

  • In this case, the values are between 0 and 255.

  • So I'm just flattening that out, and I'm

  • saying, that's my first layer.

  • You're going to take in whatever 28 squared is.
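Flattening the square into a row can be sketched in NumPy, roughly what that first `Flatten` layer does to each image:

```python
import numpy as np

# A 28x28 "image" of pixel values -- the square that the network can't
# take directly. Flattening turns it into one row of 28 * 28 = 784
# values, which is roughly what keras.layers.Flatten does per image.
image = np.arange(28 * 28, dtype=np.uint8).reshape(28, 28)
flat = image.reshape(-1)

print(flat.shape)  # (784,)
```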

  • My second layer now is just 128 neurons,

  • and there's an activation function on them

  • which I'll explain in a moment.

  • And then my third layer is going to be 10 neurons.

  • Why do you think there are 10 in that one?

  • Can anybody guess?

  • So 28 squared for the inputs, 10 for the output.

  • Anybody remember where the number 10 was mentioned?

  • AUDIENCE: [INAUDIBLE]

  • LAURENCE MORONEY: Yeah, number of labels.

  • There were 10 different classes.

  • So what happens when a neural network,

  • when you train it like this one, it's not just going to pop out

  • and give you an answer and say, this is number 03,

  • or this is number 04.

  • Typically what will happen is that you

  • want to have 10 outputs for your 10 different labels,

  • and each output is going to give you a probability that it

  • is that label.

  • So for example, the boot that I showed earlier on was labeled

  • 09, so neuron 0 is going to give me a very low number.

  • Neuron 1 is going to give me a very low number.

  • Neuron 2 is going to give me a very low number.

  • Neuron 9 is going to give me a very high number.

  • And then by looking at the outputs across all

  • of these neurons, I can now determine

  • which one the neural network thinks it's classified for.

  • Remember, we're training this with a bunch of data.

  • So I'm giving it a whole bunch of data to say this is what

  • a number 09 looks like.

  • This is what a number 04 looks like.

  • This is what a number 03 looks like.

  • By saying, OK, this is what they are.

  • I encode the data in the same way.

  • And as a result, we'll get our output like this.

  • Now, every neuron has what's called an activation function.

  • And the idea behind that-- it's a very mathy kind of thing.

  • But in programmer's terms, the tf.nn.relu that you see there--

  • if you think about this in terms of code,

  • if I say if x is greater than zero, return x.

  • Else, return zero.

  • OK?

  • Very simple function, and that's what the relu is.

  • And all that's going to do is as the code is being filtered in

  • and then down into those neurons,

  • all of the stuff that's negative just gets filtered out.

  • So as a result, it makes it much quicker

  • for you to train your neural network by getting

  • rid of things that are negative, getting rid

  • of things you don't need.

  • So every time when you specify a layer in a neural network,

  • there's usually an activation function like that.

  • Relu is one of the most common ones

  • that you'll see, particularly for classification

  • things like this.

  • But again, relu is a very mathy thing.

  • A lot of times, you go to the documentation,

  • you'll wonder what relu is.

  • You'll go look it up.

  • You'll see a page full of Greek letters.

  • I don't understand that stuff.

  • So for me, something like relu is as simple

  • as if x is greater than zero, return x.

  • Else, return zero.
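That programmer's definition of relu translates directly into code:

```python
def relu(x):
    # "If x is greater than zero, return x. Else, return zero."
    return x if x > 0 else 0

print(relu(3.5))   # 3.5
print(relu(-2.0))  # 0
```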

  • All right.

  • So now I've defined my neural network.

  • And the next thing I'm going to do,

  • you'll see the same code as we saw

  • earlier on where what I'm going to do

  • is compile my neural network.

  • And in compiling my neural network,

  • I've got to specify the loss, and I've

  • got to specify the optimizer.

  • Now, there's a whole bunch of different types

  • of loss functions.

  • There's a whole bunch of different types

  • of optimizer functions.

  • When you read academic research papers around AI, a lot of them

  • specialize on these to say, for this type of problem,

  • you should use a loss function of sparse categorical cross

  • entropy because x.

  • For this type of problem, you should

  • use an optimizer, which is an Adam-based optimizer,

  • because x.

  • A lot of this as a programmer, you just

  • have to learn through trial and error.

  • I could specify the same loss function and the same optimizer

  • that I use for my linear and then try and train

  • my neural network, see how accurate it is,

  • how quick it is.

  • And then I could try these ones, see how accurate it is,

  • see how quick it is.

  • There's a lot of trial and error in that way.

  • And understanding which ones to use right now

  • is an inexact science.

  • It's a lot like, for example, as a traditional coder, which

  • is better-- using a for loop or a do loop?

  • Which is better-- using a while or using a when?

  • Those type of things.

  • And as a result, you see as you're

  • building your neural networks, there's

  • a lot of trial and error that you'll do here.

  • But reading academic papers can certainly

  • help if you can understand them.

  • So in this case now, like for the Fashion MNIST,

  • after a bit of trial and error, we

  • ended up selecting for the tutorial

  • to use these two functions. But as you

  • read through the documentation, you'll

  • see all the functions that are available.

  • So in this case, I'm training it with an AdamOptimizer.

  • And remember, the process of training,

  • every iteration it will make a guess that says, OK.

  • This piece of data, I think it's a shoe.

  • OK, it's not a shoe.

  • It's a dress.

  • Why did I get it wrong?

  • I'll use my loss function to calculate where I got it wrong,

  • and then I'll use my optimizer to change

  • my weights on the next loop to try and see

  • if I can get it better.

  • This is what the neural network is thinking.

  • This how it works as you're actually training it.

  • So in this case, the AdamOptimizer

  • is what it's using to do that optimization.

  • The sparse categorical cross entropy is what it's using for the loss.
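That guess / measure-the-loss / adjust-the-weights loop can be sketched with plain gradient descent on the earlier y = 2x - 1 example. This is not what the AdamOptimizer actually computes internally, just the same cycle in its simplest form; the learning rate and epoch count are arbitrary choices:

```python
# The guess / loss / optimize cycle, in miniature: fit w and b so that
# w * x + b matches the y = 2x - 1 data from the earlier linear example.
xs = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
ys = [-3.0, -1.0, 1.0, 3.0, 5.0, 7.0]

w, b, lr = 0.0, 0.0, 0.01
for _ in range(2000):                      # epochs
    guesses = [w * x + b for x in xs]      # make a guess
    errors = [g - y for g, y in zip(guesses, ys)]
    # gradient of the mean squared error loss with respect to w and b
    dw = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    db = 2 * sum(errors) / len(xs)
    w -= lr * dw                           # the optimizer step
    b -= lr * db

print(round(w, 2), round(b, 2))  # close to 2.0 and -1.0
```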

  • So now if I train it, it's the same thing that we saw earlier

  • on-- model.fit().

  • So all I'm going to say is, hey, model.fit().

  • I'm going to train it with the input images and the input

  • labels, and in this case, I'm going

  • to train it for five epochs.

  • OK.

  • So that epochs number, it's up to you to tweak it.

  • What you'll do as you're training your network

  • and as you're testing your network,

  • you'll see how accurate it is.

  • There's a process called converging, which

  • means the model getting more and more accurate.

  • Sometimes you'll find convergence

  • in only a few epochs.

  • Sometimes, you'll need hundreds of epochs.

  • Of course, the bigger and more complex

  • the dataset, and the more labels that you have,

  • the longer it takes to actually train and converge.

  • But the Fashion MNIST dataset, actually

  • using the neural network that I defined in the previous slide,

  • five epochs is actually pretty accurate.

  • It gets there pretty quickly with just five.

  • OK.

  • And now if I then just want to test it and the model itself--

  • again, the important object here is the model object.

  • So if I call model.evaluate(), and I pass it the test images

  • and the test labels, it will then iterate through the 10,000

  • test images and test labels.

  • It will calculate.

  • It will say, I think it's going to be this.

  • It will compare it with the label.

  • If it gets it right, it improves its score.

  • If it gets it wrong, it decreases its score,

  • and it gives you that score back.

  • So the idea here is-- remember earlier

  • when we separated the data into 60,000 for training and 10,000

  • for test?

  • Instead of you manually writing all that code to do all that,

  • you can just call the evaluate() function on the model,

  • pass it the test stuff, and it will give you back the results.

  • It will do all that looping and checking for you.
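What that looping and checking boils down to can be sketched with NumPy. The labels here are made up for illustration:

```python
import numpy as np

# The essence of evaluate(): compare predicted labels with the known
# test labels and report the fraction it got right.
test_labels = np.array([9, 2, 1, 1, 6, 1, 4, 6, 5, 7])
predicted   = np.array([9, 2, 1, 1, 6, 1, 4, 0, 5, 7])  # one mistake

accuracy = np.mean(predicted == test_labels)
print(accuracy)  # 0.9
```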

  • All right.

  • And then, of course, if I want to predict an image,

  • if I have my own images, and I've formatted them into 28

  • by 28 grayscale, and I put them into a set,

  • now I can just say model.predict() my images,

  • and it will give me back a set of predictions.

  • Now, what do those predictions look like?

  • So for every image, because the output of the neural network

  • was--

  • there were 10 neurons in the final layer, so every image

  • is going to give you back a set of 10 numbers.

  • And those 10 numbers, as I mentioned earlier on,

  • nine of them should be very close to 0, and one of them

  • should be very close to 1.

  • And then using the one that's very close to 1,

  • you could determine your prediction

  • to be whatever that item of clothing is.
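Picking out the slot that's closest to 1 is an argmax. A sketch with a made-up prediction for the ankle boot (label 9):

```python
import numpy as np

# One prediction is a set of 10 numbers, one per class, each the
# probability that the image is that class. The answer is whichever
# slot is closest to 1 -- here slot 9, the ankle boot.
prediction = np.array(
    [0.01, 0.0, 0.0, 0.01, 0.0, 0.02, 0.0, 0.01, 0.0, 0.95])

print(int(np.argmax(prediction)))  # 9
```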

  • So if I demo this and show it in code--

  • let's see.

  • Go back here.

  • It's really hard to see it, so forgive me.

  • Whoops.

  • I'm going to select Fashion--

  • oh.

  • I really need a mouse.

  • I'm going to select Fashion.

  • OK.

  • And can you see the fashion code,

  • or is it's still showing the linear code?

  • AUDIENCE: [INAUDIBLE]

  • LAURENCE MORONEY: Is that fashion right there?

  • AUDIENCE: [INAUDIBLE]

  • LAURENCE MORONEY: All right.

  • OK.

  • Did I just close it?

  • I'm sorry.

  • It's really hard to see.

  • So let me go back.

  • Is that one fashion?

  • AUDIENCE: [INAUDIBLE]

  • LAURENCE MORONEY: Up one?

  • All right.

  • That one?

  • AUDIENCE: [INAUDIBLE]

  • LAURENCE MORONEY: OK.

  • So here's the code that I was showing on the earlier slide.

  • So this is exactly the same code that was on my slides.

  • I'm just going to go down.

  • There's one thing I've done here that I didn't show

  • on the slides, and that was the images themselves

  • were grayscale, so every pixel was between 0 and 255.

  • For training my neural network, it

  • was just easier for me to normalize that data.

  • So instead of it being from 0 to 255,

  • it's a value from 0 to 1, which is relative to that value.

  • And that's what those two lines of code there do.

  • And that's one of the things that makes Python really

  • useful for this kind of thing.

  • Because I can just say that train_images set

  • is a set of 60,000 28 by 28 images,

  • and I can just say divide that by 255,

  • and that normalized that for me.

  • So that's one of the things that makes Python

  • really handy in data science.
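The one-line normalization being described relies on NumPy broadcasting the division across the whole array. A small sketch with a three-pixel stand-in for the real image set:

```python
import numpy as np

# Grayscale pixels run from 0 to 255. Dividing the array by 255.0
# rescales every pixel into the 0-to-1 range in one line -- NumPy
# broadcasts the division across all the values for you.
train_images = np.array([[0, 128, 255]], dtype=np.float64)
train_images = train_images / 255.0

print(train_images.min(), train_images.max())  # 0.0 1.0
```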

  • So we can see it's just the same code.

  • So I'm going to do a bit of live audience participation.

  • Hopefully, I can get it to work with us.

  • So remember I said there are 10,000 testing images?

  • OK.

  • So somebody give me a number between 0 and 9,999.

  • AUDIENCE: [INAUDIBLE]

  • LAURENCE MORONEY: Don't be shy.

  • AUDIENCE: [INAUDIBLE]

  • LAURENCE MORONEY: What's that?

  • Just 27?

  • OK.

  • So hopefully I can see it so I can get it.

  • That's not 27, is it?

  • OK.

  • 27.

  • And here-- 27.

  • I tested it earlier with value 4560.

  • So what's going to happen here is

  • that I'm going to train the neural network to identify

  • those pieces of clothing.

  • And so-- or to be able to identify pieces of clothing.

  • I have no idea what piece of clothing number 27

  • is in the test set.

  • But what it's going to do once it's done is by the end,

  • you'll see it says print the test labels for 27.

  • So whatever item of clothing 27 is,

  • there's is a pre-assigned label for that.

  • It will print that out.

  • And then the next thing it'll do is

  • it will print out what the predicted label will be.

  • And hopefully, the two of them are going to be the same.

  • There's about a 90% chance, if I remember right from this one,

  • that they will.

  • So if I run it, it's going to take a little longer

  • than the previous one.

  • So now we can see it starting to train the network.

  • AUDIENCE: [INAUDIBLE]

  • LAURENCE MORONEY: And because I'm doing in PyCharm,

  • I can see in my debug window.

  • So you can see the epochs--

  • epoch 2, epoch 3, epoch 4.

  • This accuracy number here is how accurate it is against the training data.

  • So it's about 89% correct.

  • And then you see it's actually printed two numbers below,

  • and they're both 0.

  • So that means for item of clothing number 27,

  • that class was 0.

  • And then the predicted for that class was actually also 0,

  • so it got it right.

  • Yay.

  • Anybody want to try one more just to prove that?

  • [INTERPOSING VOICES]

  • LAURENCE MORONEY: Let's see if we can--

  • what's that?

  • AUDIENCE: 42.

  • LAURENCE MORONEY: 42, I love it.

  • That's the ultimate answer, but what is the question?

  • OK.

  • 42.

  • And I'm guessing 42 is probably also item 0, but let's see.

  • Hopefully, I haven't broken any of the bracketing.

  • Let me run it again.

  • So because it's running all of the code,

  • it's just going to train the network again.

  • OK.

  • There's epoch 2, epoch 3.

  • Hello.

  • There we go.

  • So let's remember earlier I said I'm just

  • training it for five epochs.

  • It just makes it a little bit quicker.

  • And I'm also seeing-- if you look at the convergence,

  • on epoch 1 it was 82% accurate.

  • Oh, we got it wrong for 42.

  • It predicted it would be a 6, but it's actually a 3.

  • But the first epoch you see, this accuracy figure--

  • 82.45%.

  • That means it calculated it was 82% accurate.

  • The second epoch, 86% accurate; the third, 87%;

  • all the way down to the fifth--

  • 89%.

  • I could probably train it for 500 epochs,

  • but we don't have the time.

  • But then it might be more likely to get number 42 correct.

  • And thanks, Mr. Douglas Adams, that you've actually

  • given me one that doesn't work, so I can go back and test it.

  • OK.

  • So that's Fashion MNIST and how it works.

  • And so hopefully, this was a good introduction to you

  • for really the concept from a programmer's perspective

  • of what machine learning is all about.

  • And I always like to say at talks

  • that if you only take one slide away from this talk,

  • if you've never done machine learning,

  • or you want to get into programming machine learning,

  • take this one here.

  • Because this is really what the core of the revolution

  • is all about, and hopefully the code that I

  • showed you demonstrates that--

  • that machine learning is really all

  • about taking answers and data and feeding them in

  • to get rules out.

  • I didn't write a single line of code there today that says,

  • this is a t-shirt, or this is a jacket, or this is a handbag.

  • This has sleeves.

  • If has sleeves, then is t-shirt.

  • If has heels, then is shoe.

  • I didn't have to write any of that kind of code.

  • I just trained something on the data

  • using the below thing, the below part of the diagram--

  • feeding in answers, feeding in data,

  • building a model that will then infer the rules about it.

  • So with that, I just want to say thank you very much,

  • and I hope you enjoy the rest of the conference.

  • [APPLAUSE]

TF Machine Learning for Programmers (TensorFlow @ O'Reilly AI Conference, San Francisco '18)