# What Makes a Good Feature? - Machine Learning Recipes #3

• JOSH GORDON: Classifiers are only

• as good as the features you provide.

• That means coming up with good features

• is one of your most important jobs in machine learning.

• But what makes a good feature, and how can you tell?

• If you're doing binary classification,

• then a good feature makes it easy to decide

• between two different things.

• For example, imagine we wanted to write a classifier

• to tell the difference between two types of dogs-- greyhounds and Labradors.

• Here we'll use two features-- the dog's height in inches

• and their eye color.

• Just for this toy example, let's make a couple assumptions

• about dogs to keep things simple.

• First, we'll say that greyhounds are usually taller than Labradors.

• Next, we'll pretend that dogs have only two eye

• colors-- blue and brown.

• And we'll say the color of their eyes

• doesn't depend on the breed of dog.

• This means that one of these features is useful

• and the other tells us nothing.

• To understand why, we'll visualize them using a toy

• dataset I'll create.

• Let's begin with height.

• How useful do you think this feature is?

• Well, on average, greyhounds tend

• to be a couple inches taller than Labradors, but not always.

• There's a lot of variation in the world.

• So when we think of a feature, we

• have to consider how it looks for different values

• in a population.

• Let's head into Python for a programmatic example.

• I'm creating a population of 1,000 dogs.

• I'll give each of them a height.

• For this example, we'll say that greyhounds

• are on average 28 inches tall and Labradors are 24.

• Now, all dogs are a bit different.

• Let's say that height is normally distributed,

• so we'll make both of these plus or minus 4 inches.

• This will give us two arrays of numbers,

• and we can visualize them in a histogram.

• I'll add a parameter so greyhounds are in red

• and Labradors are in blue.
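Here's a minimal sketch of the script described above, assuming "plus or minus 4 inches" means a standard deviation of 4 inches and a 50/50 split of the 1,000 dogs (the variable names are my own):

```python
import numpy as np
import matplotlib.pyplot as plt

# 500 of each breed, 1,000 dogs total.
greyhounds = 500
labs = 500

# Heights are normally distributed: means of 28 and 24 inches,
# with a standard deviation of 4 inches for both breeds.
grey_height = 28 + 4 * np.random.randn(greyhounds)
lab_height = 24 + 4 * np.random.randn(labs)

# Histogram of heights: greyhounds in red, Labradors in blue.
plt.hist([grey_height, lab_height], stacked=True, color=['r', 'b'])
plt.show()
```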

• Now we can run our script.

• This shows how many dogs in our population have a given height.

• There's a lot of data on the screen,

• so let's simplify it and look at it piece by piece.

• Let's start with the dogs on the far left of the distribution-- say, who are about 20 inches tall.

• Imagine I asked you to predict whether a dog with this height

• was a lab or a greyhound.

• What would you do?

• Well, you could figure out the probability of each type

• of dog given their height.

• Here, it's more likely the dog is a lab.

• On the other hand, if we go all the way

• to the right of the histogram and look

• at a dog who is 35 inches tall, we

• can be pretty confident they're a greyhound.

• Now, what about a dog in the middle?

• You can see the graph gives us less information

• here, because the probability of each type of dog is close.
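As a rough sketch of that reasoning, here's one way to estimate the empirical probability from the synthetic arrays generated above (the bin width and function name are my own):

```python
import numpy as np

def prob_greyhound(height, grey_height, lab_height, bin_width=1.0):
    """Empirical P(greyhound | height): count dogs of each breed near this height."""
    near = lambda samples: np.sum(np.abs(samples - height) <= bin_width / 2)
    greys, labs = near(grey_height), near(lab_height)
    total = greys + labs
    return greys / total if total else 0.5  # no nearby data: fall back to the 50/50 prior

# prob_greyhound(20, grey_height, lab_height) comes out near 0 (almost certainly a lab),
# prob_greyhound(35, ...) near 1, and prob_greyhound(26, ...) near 0.5.
```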

• So height is a useful feature, but it's not perfect.

• That's why in machine learning, you almost always

• need multiple features.

• Otherwise, you could just write an if statement

• instead of bothering with the classifier.
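To make that concrete, a single clean feature reduces the whole model to a hand-written threshold (26 inches here is just the midpoint of our two assumed means):

```python
def classify_by_height(height):
    # With one decisive feature, a hand-written threshold does the job.
    return 'greyhound' if height > 26 else 'labrador'
```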

• To figure out what types of features you should use,

• do a thought experiment.

• Pretend you're the classifier.

• If you were trying to figure out if this dog is

• a lab or a greyhound, what other things would you want to know?

• You might ask about their hair length, or how fast they can run, or how much they weigh.

• Exactly how many features you should use

• is more of an art than a science,

• but as a rule of thumb, think about how many you'd

• need to solve the problem.

• Now let's look at another feature like eye color.

• Just for this toy example, let's imagine

• dogs have only two eye colors, blue and brown.

• And let's say the color of their eyes

• doesn't depend on the breed of dog.

• Here's what a histogram might look like for this example.

• For most values, the distribution is about 50/50.

• So this feature tells us nothing,

• because it doesn't correlate with the type of dog.

• Including a useless feature like this in your training

• data can hurt your classifier's accuracy.

• That's because there's a chance it might appear useful purely

• by accident, especially if you have only a small amount

• of training data.
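A minimal sketch of both points under our toy assumptions (the names and random seed are my own): the full population splits near 50/50 for both breeds, while a tiny sample can look predictive by accident.

```python
import numpy as np

rng = np.random.default_rng(0)

# Eye color is assigned at random, independent of breed (our toy assumption).
grey_eyes = rng.choice(['blue', 'brown'], size=500)
lab_eyes = rng.choice(['blue', 'brown'], size=500)
print((grey_eyes == 'blue').mean(), (lab_eyes == 'blue').mean())  # both near 0.5

# With only a handful of training examples, the same useless feature
# can correlate with the breed purely by chance.
few_grey = rng.choice(['blue', 'brown'], size=5)
few_lab = rng.choice(['blue', 'brown'], size=5)
print((few_grey == 'blue').mean(), (few_lab == 'blue').mean())  # can differ a lot
```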

• You also want your features to be independent.

• And independent features give you

• different types of information.

• Imagine we already have a feature-- height in inches--

• in our dataset.

• Would it be useful if we added another feature, like height in centimeters?

• No, because it's perfectly correlated with one we already have.

• It's good practice to remove highly correlated features from your training data.

• That's because a lot of classifiers

• aren't smart enough to realize that height in inches

• and height in centimeters are the same thing,

• so they might double count how important this feature is.
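To see how redundant the second feature is, we can check the correlation directly-- a sketch reusing the height arrays from the script above:

```python
import numpy as np

height_in = np.concatenate([grey_height, lab_height])
height_cm = height_in * 2.54  # the "new" feature is just a rescaling

# The Pearson correlation is exactly 1.0: the feature adds no new information.
print(np.corrcoef(height_in, height_cm)[0, 1])
```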

• Last, you want your features to be easy to understand.

• For a new example, imagine you want

• to predict how many days it will take

• to mail a letter between two different cities.

• The farther apart the cities are, the longer it will take.

• A great feature to use would be the distance

• between the cities in miles.

• A much worse pair of features to use

• would be the cities' locations given by their latitude

• and longitude.

• And here's why.

• I can look at the distance and make

• a good guess of how long it will take the letter to arrive.

• But learning the relationship between latitude, longitude,

• and time is much harder and would require many more

• examples in your training data.
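One way to hand the classifier the easier feature is to compute the great-circle distance from the coordinates yourself; here's a sketch using the haversine formula (the function and example cities are my own, not from the video):

```python
from math import radians, sin, cos, asin, sqrt

def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, via the haversine formula."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 3959 * asin(sqrt(a))  # Earth's radius is about 3,959 miles

# e.g. distance_miles(40.71, -74.01, 34.05, -118.24) is roughly 2,450 miles (NYC to LA)
```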

• Now, there are techniques you can

• use to figure out exactly how useful your features are,

• and even what combinations of them are best,

• so you never have to leave it to chance.

• We'll get to those in a future episode.

• Coming up next time, we'll continue building our intuition

• for supervised learning.

• We'll show how different types of classifiers

• can be used to solve the same problem and dive a little bit

• deeper into how they work.

• Thanks very much for watching, and I'll see you then.
