Data Analysis 9: Data Regression - Computerphile

  • Classification lets us pick one or the other or some small number of labels for our data

  • The problem is that real life doesn't fit into these neat little categories

  • What happens when our labelled data isn't a yes or no, or an A, B or C, or some set of labels?

  • Right, then we have what we call a regression problem. We're actually trying to predict actual output values, right, so given these inputs

  • What's the temperature at which something will occur? Or,

  • Given this movie on a streaming site, and the attributes, and the people that have watched it

  • How much action is in it, right, because that informs who should watch that movie

  • There's lots of times when you don't want to say that something is this and isn't this; you want to say it's a little bit of this

  • And a little bit of this

  • and that's what regression is for, and some of the algorithms we use for regression are actually quite similar to

  • Classification. So for example, you can regress using a support vector machine, or support vector regressor, right?

  • But we also use other ones, so we're more likely to use things like linear regression and things like this

  • So let's start off with perhaps the simplest form of regression. That's linear regression, right?

  • It might not occur to people who use linear regression that actually what you're doing is machine learning

  • But you are. Let's imagine we have data that's got just one input

  • so one attribute, attribute one, and

  • Our output, which is Y. This is our table of data just like before, and this is our instance data

  • So we've got one, two, three, four, like this

  • so what we want to do is we want to input attribute one and

  • We want to output Y which instead of being a yes or no is going to be some number on a scale

  • Let's say between naught and one. So really what we're trying to do is, we've got our graph here of our input variable

  • Attribute one and we've got our Y output and these are our data points in our training set

  • So here like this and they sort of go up like this

  • what we're going to do using linear regression is fit a line through this data, and a line is of the form

  • y = mx + c

  • so in this case m is going to be the gradient of our line and c is going to be the intercept, so in this

  • Case, I guess, something along the lines of this, straight up like this

  • So if our m was one in this case, m

  • equals one, or maybe equals 1.2 to make it slightly more interesting, and then our c is going to be, let's say, c

  • is 0.02. These are the values that we're going to learn using linear regression

  • So, how do we train something like this?

  • What we're going to do is we want to find the values for our unknowns which are M and C

  • Given a lot of x and y pairs, right?

  • So we've got our x and y pairs here, and we want to find the optimal values for this data set

  • So we're going to find values for m and c where this distance, the prediction error, is minimized. The better fit

  • This line is, the more the average prediction error is going to go down. If this line is over here

  • It's going to be a huge error. And so the hope is that if we predict this correctly and we have an M

  • And we have a C then when we come up with a new

  • Value that we're trying to predict we can pass it through this formula. We can multiply it by 1.2 and then add

  • 0.02 and that will produce our prediction for y and hopefully that would be quite close to what it is

  • So for example, let's imagine we have a new value for attribute 1. Let's come in here

  • We're gonna look up here, and this is going to be the prediction for our Y, and that's the output of our regressor

  • So this linear regression is capable of producing

  • Predictions based on its attribute. Now, if we have more than one attribute

  • This is called multivariate linear regression, and the principle is exactly the same; we're going to have lots of these multipliers, these m's

  • We could say something like

  • y = m1·x1 + m2·x2 + … and so on, for all of our different attributes

  • so it's going to be a linear combination, a bit like PCA, a linear combination of

  • These different attributes, and it's obviously going to be multi-dimensional

  • So one interesting thing about linear regression is that what it's going to do is predict us a straight line

  • regardless of how many dimensions we've got. Now sometimes, if we want to use this for a classification

  • Purpose, we still can, all right

  • Now I'm supposed to be talking about regression not classification

  • But just briefly, if you indulge me, we can pass this function through something called a logistic function, or a sigmoid curve

  • And we can squash it into something that's this shape

  • And now what we're doing is we're pushing our values up to 1 and down to 0

  • Right and that is our classification between 1 and 0

  • So it is possible to perform linear regression using this additional logistic function to perform

  • Classification, and this is called logistic regression. I

  • Just wanted to mention it, but that's something you will see being done on some data

  • So let's talk a little bit about something more powerful

  • That's artificial neural networks

  • now

  • Anytime in the media at the moment when you see the term AI, what they're actually talking about is machine learning, and what they're talking

  • About is some large neural network. Now, let's keep it a little bit smaller

  • Let's imagine what we want to do is take a number of attributes and map them to some prediction, some regressed value, right?

  • How are we going to do this?

  • Well, what we can do is we can essentially combine a lot of different linear regressions through some nonlinear functions into a really powerful

  • Regression algorithm, right. So let's imagine that we have some data which has got three inputs

  • So we've got our instances and we've got our attributes a B and C. Our inputs are a B and C

  • And then we have some hidden neurons, right, and I'll explain a neuron in a moment

  • Then we have an output value that we'd like to regress. This is where we're trying to predict the value

  • So, you know, how much disease does something have, how hot is it, these kinds of things, depending on our attributes

  • this is where we put in a this is where we put in B and this is where we put in C and

  • Then we perform a weighted sum of all of these things for each of these neurons

  • So for example, this neuron's going to have three inputs from these three here, and this is going to have weight one

  • This is going to be weight two, this is going to be weight three

  • And we're gonna do a weighted sum just like in linear regression

  • So we're going to do weight one times A, plus weight two times B, plus weight three times C, add

  • them together, and then we're going to add any bias that we want to, so this is going to be plus some bias, and that's

  • Going to give us a value for this neuron, which let's call hidden one, right, because, generally speaking

  • We don't look at these values. It's not too important. We're going to do a different sum for this one

  • So I'm going to draw them in different colors so we don't get confused. So this has got three weights

  • So this is going to be a different weight

  • This is going to have another different weight

  • And we're going to do this much times A, plus this much times B, plus this much times C

  • Add them all up, add a bias, and we're going to get hidden two, and we're going to do the same thing

  • With these ones here, like this

  • This is going to be hidden three, hidden four, hidden five, and so on, for as far as we like to go

  • All right

  • now

  • the nice thing about this is for each of these we can calculate a different weighted sum. Now, the problem is that if we just did

  • This, then what happens is we actually get a series of linear regressions

  • All right

  • because this is just multivariate linear regression, and in the end our

  • Algorithm doesn't end up being any good, right? If you combine multiple linear functions together, you just get one different linear function

  • so we pass all of these hidden values through a nonlinear function, like a sigmoid or

  • tanh. So a sigmoid goes between naught and 1, so this is naught and 1, and a tanh

  • Hyperbolic tangent will go between minus 1 and 1

  • Things like this, and what that will do is add a sufficiently complex

  • Function that, when we combine them all together

  • We can actually get quite a powerful algorithm. The way this works is we put in A, B and C

  • We calculate all the weighted sums through these functions into our hidden units, and then we calculate another series of weighted sums

  • that add together to be our final output, and this will be our final output prediction Y

  • Now, the way we train this is we're going to put in lots and lots of training data, where we have the values for A, B, C, and we know what the

  • Output should have been. We go through the network, and then we say, well, actually we were a little bit off

  • So can we change all of these weights so that next time we're a little bit closer to the correct answer and let's keep doing

  • this over and over again in a process called gradient descent and

  • Slowly settle upon some weights where for the most part when we put in our a B and C

  • We get what we want out the other side now, it's unlikely to be perfect

  • but just like with the other machine learning as we've talked about we're going to be trying to make our

  • Prediction on our testing set as good as possible

  • All right

  • So we've put in a lot of training data, and hopefully when we take this network and apply it to some new data it also

  • Performs well. Let's look at an example

  • We looked at credit checks in the previous video, where we classified whether or not someone should be given credit

  • Well, something that we often calculate is a credit rating

  • which is a value from let's say naught to 1 of

  • How good your credit score is. So A could be how much money you have in your bank, B could be whether you have any

  • Loans, and C could be whether you own a car, and obviously there's going to be more of these, because you can't make a decision

  • based on just those three things. So what we do is we get a number of people that we've already made decisions about, right?

  • so we know that person A has a bank account balance of five thousand, two thousand in loans, and he does own a car, and

  • He has a credit rating of 75, or 0.75, whatever your scale is

  • So we put this in, we set the weights

  • So that this is the correct

  • Prediction, and then hopefully when another person comes along with a different set of variables, we'll predict the right thing for them

  • So you can make this network as deep or as big as you want. Typically

  • Multi-layer perceptrons, or artificial neural networks like this, won't be very deep

  • one, two, three hidden layers deep maybe, but what's been shown in the literature is that actually

  • If you have a sufficient number of hidden units

  • You can basically model any function like this right as long as you've got sufficient training data to create it

  • So we're going to use Weka again

  • because Weka has lots of

  • regression algorithms built-in like artificial neural networks

  • And linear regression. So let's open up the data set we're going to use this time

  • So the data set we've got is a data set on superconductivity. Right, now

  • Obviously my knowledge of physics is, should we say, average

  • But a superconductor is something that, when you get it to a critical temperature, it has no resistance

  • Right, which is very useful for electrical circuits

  • And so this is a data set about what are the properties of a material and what is the critical temperature

  • Below which it will be a superconductor

  • Now, I'm sure there's going to be some physicists in the comments that might point out some errors in what I just said

  • But we'll move on. So we're reading a file. This is quite a big data set

  • So we have a lot of input attributes, and then at the end we have this critical temperature that we're trying to predict. This

  • temperature, if we look at this histogram, goes from 0 to

  • 185. If we look at some of the other things, so for example, we've got this entropy atomic radius, which I can pretend

  • I know what that is, which goes from naught to 2.14. Is that good?

  • Right, what we're going to do is we're going to start by using

  • Multivariate linear regression to try and predict this critical temperature as a combination of these input features

  • So I'm going to go to Classify. There's just one Classify tab, even for regression

  • we're going to use our same percentage split as before, so 70%, and

  • We're going to use a simple linear regression function for this

  • Let's go

  • So we've trained our linear regression and what we want to do now is work out whether it's worked or not on our testing set

  • We've got the values we wanted, Y, and we've got the values that have been predicted, Y hat, and

  • Hopefully they're exactly the same if they're exactly the same then they're going to be on a straight line like this

  • So we were hoping to get a Y down here, and we did. Now, of course, this won't actually happen

  • What will happen is these Y's are ever so slightly different from the Y's

  • we were expecting, so you might see a bit of noise around the center, like this, and

  • The way we would normally measure this is something called mean absolute error, or mean squared error, or root mean squared error

  • Which are all very similar ways to measure the same thing

  • It's to measure what is the average distance between what we wanted and what we got

  • so if we were hoping to get a Y of 0.2 but we actually got a Y of 0.4, then our

  • Mistake was we were 0.2 too high

  • And so for every single instance in our test set, we can sum up all of the errors we've got, and we can work out

  • What the average error was, right. So if we have a hundred in our test set

  • We sum up the errors and we divide by a hundred, and that tells us the mean error was a certain amount

  • What will sometimes happen is your predictions will be above or below, right?

  • and so your actual mean error might be zero, because half the time you predicted too high, half the time you predicted too low, and

  • So on average, you've got it exactly right. Obviously, that's not correct

  • So what we tend to do is calculate something called mean absolute error

  • So essentially, if your prediction is too small, we just remove the minus sign and call it an error of that amount

  • All right

  • So if your mean absolute error is 0.4, then what that's saying is that on average you're 0.4 away

  • Either above or below from where you were hoping to be

  • It's also quite common to see similar measures, like root mean squared error: for every instance

  • We take our error, we square it, we sum them all up, and then right at the end

  • We take a square root, right?

  • And again, this is a very similar measure to mean absolute error; the squaring removes our negative symbols for us

  • It's also quite common, particularly in

  • fields like biology and medicine, to see something called R squared, or the R squared coefficient, and this is essentially the

  • Correlation squared. It's a measure of how well, or how tightly, correlated our

  • Predictions and our ground truth were. For example

  • This would be a pretty good correlation, maybe 0.8 or 0.9, if these were our points like this, and if we were

  • Absolutely perfect, that would be an R squared of one. If our points were everywhere

  • That would be an R squared of 0. And what I'm saying is it's a value between 0 and 1 that tells you

  • How well we predicted. Zero means you basically didn't predict anything at all

  • It was completely random output. One means you predicted everything exactly correct

  • Now, of course, that's unlikely to happen on a test set

  • What you'll find is you'll hope to get some number somewhere around 0.7, 0.8, right?

  • But it will depend on how difficult your problem is to solve

  • So maybe on a really difficult problem an r-squared of 0.5 is actually pretty good, right?

  • So it's just going to depend on the situation. So we've got our linear regression trained up

  • We know that the correlation coefficient is 0.85. We know that the mean absolute error for example is 13 degrees

  • What we haven't done is visualize this. Sometimes the simplest way to do

  • This is just to plot a scatter plot of what we wanted and what we actually got from our predictor

  • So I'm going to right click on linear regression, and I'm going to say Visualize

  • classifier errors

  • it's going to be a scatter plot of the

  • Expected value and the prediction we actually got from our network, so you can see, generally speaking, it's not too bad

  • Obviously the data set is quite bunched up in some of these areas which means that it's sometimes harder to predict

  • But we've got a general upward trend which is exactly what we wanted

  • You can see that the prediction around zero is not good at all

  • The x-axis in this instance is the actual critical temperature of that particular substance

  • The y-axis is what the linear regression actually predicted

  • You can see that the range here is from about zero to about

  • 136 on our actual values, and the predicted values are from about minus 30, which doesn't really make sense, to 131, but they're pretty close

  • most of the ones that cause a problem have very low values, right, because you've essentially got lots and lots of values that have

  • a very small critical temperature on this scale, but different attributes, so it's been hard to fit a line to

  • Something more powerful, for example a

  • Multi-layer perceptron, you know, an artificial neural network, might do a better job on those kinds of instances

  • But you can see that there's a general

  • Upward slope in this particular scatter plot. The larger X's represent a larger error, so you can see this is sort of the line

  • We're actually trying to fit here, down here with all these small X's, and there's quite a few of them on there

  • So actually for a lot of these substances the prediction even by linear regression has been pretty good

  • So, regression algorithms

  • Let us predict real scalar values out of our input variables, and this can be really useful in a huge array of different

  • Situations where we want to predict something that doesn't fit neatly into a yes-or-no answer or an A, B, C category label

  • We've looked at linear regression and artificial neural networks, and obviously neural networks get pretty deep these days

  • But these are a great starting point, so

  • Thanks very much for watching. I hope you enjoyed this series on data analysis, something a little bit different from Computerphile

  • I wanted to thank my colleague, Dr. Mercedes Torres Torres, for helping me design the content

  • Please let us know what you liked what you didn't like let us know in the comments what you'd like to see more of

  • And we'll see you back again next time
