What Makes a Good Feature? - Machine Learning Recipes #3

JOSH GORDON: Classifiers are only as good as the features you provide. That means coming up with good features is one of your most important jobs in machine learning. But what makes a good feature, and how can you tell? If you're doing binary classification, then a good feature makes it easy to decide between two different things.

For example, imagine we wanted to write a classifier to tell the difference between two types of dogs: greyhounds and Labradors. Here we'll use two features, the dog's height in inches and their eye color. Just for this toy example, let's make a couple of assumptions about dogs to keep things simple. First, we'll say that greyhounds are usually taller than Labradors. Next, we'll pretend that dogs have only two eye colors, blue and brown, and we'll say the color of their eyes doesn't depend on the breed of dog. This means that one of these features is useful and the other tells us nothing. To understand why, we'll visualize them using a toy dataset I'll create.

Let's begin with height. How useful do you think this feature is? Well, on average, greyhounds tend to be a couple of inches taller than Labradors, but not always. There's a lot of variation in the world. So when we think about a feature, we have to consider how it looks for different values in a population.

Let's head into Python for a programmatic example. I'm creating a population of 1,000 dogs, split 50-50 between greyhounds and Labradors. I'll give each of them a height. For this example, we'll say that greyhounds are on average 28 inches tall and Labradors are 24. Now, all dogs are a bit different. Let's say that height is normally distributed, so we'll make both of these plus or minus 4 inches. This will give us two arrays of numbers, and we can visualize them in a histogram. I'll add a parameter so greyhounds are in red and Labradors are in blue. Now we can run our script.
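
Here's a minimal sketch of the kind of script described, assuming NumPy and matplotlib; the variable names are illustrative, not necessarily the ones used in the episode:

```python
import numpy as np
import matplotlib.pyplot as plt

# 1,000 dogs, split 50-50 between the two breeds.
greyhounds = 500
labs = 500

# Heights are normally distributed: mean 28 inches for greyhounds,
# 24 inches for Labradors, both plus or minus 4 inches (std dev).
grey_height = 28 + 4 * np.random.randn(greyhounds)
lab_height = 24 + 4 * np.random.randn(labs)

# Histogram of both populations: greyhounds in red, Labradors in blue.
plt.hist([grey_height, lab_height], stacked=True, color=['r', 'b'])
plt.show()
```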

This shows how many dogs in our population have a given height. There's a lot of data on the screen, so let's simplify it and look at it piece by piece. We'll start with dogs on the far left of the distribution, say, dogs who are about 20 inches tall. Imagine I asked you to predict whether a dog with this height was a lab or a greyhound. What would you do?

Well, you could figure out the probability of each type of dog given their height. Here, it's more likely the dog is a lab. On the other hand, if we go all the way to the right of the histogram and look at a dog who is 35 inches tall, we can be pretty confident they're a greyhound. Now, what about a dog in the middle? You can see the graph gives us less information here, because the probability of each type of dog is close. So height is a useful feature, but it's not perfect.
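
To make that idea concrete, here's one hedged way to estimate the probability of each breed at a given height from the simulated arrays above; the helper name and the half-inch window are my own choices, not from the episode:

```python
def breed_probability(height, grey_height, lab_height, tolerance=0.5):
    """Estimate P(greyhound | height) by counting simulated dogs
    whose height falls within +/- tolerance inches of the query."""
    greys = ((grey_height > height - tolerance) &
             (grey_height < height + tolerance)).sum()
    labs = ((lab_height > height - tolerance) &
            (lab_height < height + tolerance)).sum()
    total = greys + labs
    return greys / total if total else None

print(breed_probability(20, grey_height, lab_height))  # low: probably a lab
print(breed_probability(35, grey_height, lab_height))  # high: probably a greyhound
print(breed_probability(26, grey_height, lab_height))  # near 0.5: ambiguous
```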

That's why in machine learning, you almost always need multiple features. Otherwise, you could just write an if statement instead of bothering with a classifier. To figure out what types of features you should use, do a thought experiment. Pretend you're the classifier. If you were trying to figure out if this dog is a lab or a greyhound, what other things would you want to know? You might ask about their hair length, or how fast they can run, or how much they weigh. Exactly how many features you should use is more of an art than a science, but as a rule of thumb, think about how many you'd need to solve the problem.

Now let's look at another feature, like eye color. Just for this toy example, let's imagine dogs have only two eye colors, blue and brown, and let's say the color of their eyes doesn't depend on the breed of dog. Here's what a histogram might look like for this example. For both values, the distribution is about 50/50. So this feature tells us nothing, because it doesn't correlate with the type of dog.
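
Here's a small sketch of how that flat distribution could be simulated, reusing the breed counts from the earlier script; encoding eye color as an independent coin flip per dog is my assumption based on the setup above:

```python
# Eye color is independent of breed: each dog gets blue or brown at random.
grey_eyes = np.random.choice(['blue', 'brown'], size=greyhounds)
lab_eyes = np.random.choice(['blue', 'brown'], size=labs)

# Count each color per breed; both come out roughly 50/50.
for color in ['blue', 'brown']:
    print(color,
          (grey_eyes == color).sum(),  # ~250 greyhounds
          (lab_eyes == color).sum())   # ~250 Labradors
```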

Including a useless feature like this in your training data can hurt your classifier's accuracy. That's because there's a chance it might appear useful purely by accident, especially if you have only a small amount of training data. You also want your features to be independent. Independent features give you different types of information.

Imagine we already have a feature, height in inches, in our dataset. Ask yourself, would it be helpful if we added another feature, like height in centimeters? No, because it's perfectly correlated with one we already have. It's good practice to remove highly correlated features from your training data. That's because a lot of classifiers aren't smart enough to realize that height in inches and height in centimeters are the same thing, so they might double-count how important this feature is.
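
One quick way to see that perfect correlation, using np.corrcoef on the simulated greyhound heights from the earlier sketch (2.54 centimeters per inch is the standard conversion):

```python
height_in = grey_height          # feature we already have
height_cm = grey_height * 2.54   # proposed "new" feature

# The off-diagonal entries of the correlation matrix are exactly 1.0,
# so the second feature adds no new information.
print(np.corrcoef(height_in, height_cm))
```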

Last, you want your features to be easy to understand. For a new example, imagine you want to predict how many days it will take to mail a letter between two different cities. The farther apart the cities are, the longer it will take. A great feature to use would be the distance between the cities in miles. A much worse pair of features to use would be the cities' locations given by their latitude and longitude. And here's why: I can look at the distance and make a good guess of how long it will take the letter to arrive. But learning the relationship between latitude, longitude, and time is much harder, and would require many more examples in your training data.
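
As an illustration of deriving the easier feature from the harder pair, here's the standard great-circle (haversine) distance; the function name and the 3,959-mile Earth radius are my choices, not anything from the episode:

```python
import math

def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    lat1, lon1, lat2, lon2 = map(math.radians, [lat1, lon1, lat2, lon2])
    a = (math.sin((lat2 - lat1) / 2) ** 2 +
         math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 3959 * 2 * math.asin(math.sqrt(a))

# New York City to Los Angeles: roughly 2,450 miles.
print(distance_miles(40.71, -74.01, 34.05, -118.24))
```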

Now, there are techniques you can use to figure out exactly how useful your features are, and even what combinations of them are best, so you never have to leave it to chance. We'll get to those in a future episode. Coming up next time, we'll continue building our intuition for supervised learning. We'll show how different types of classifiers can be used to solve the same problem, and dive a little bit deeper into how they work. Thanks very much for watching, and I'll see you then.
