Training Neural Networks: Crash Course AI #4

  • Hey, I'm Jabril and welcome to Crash Course AI!

  • One way to make an artificial brain is by creating a neural network, which can have

  • millions of neurons and billions (or trillions) of connections between them.

  • Nowadays, some neural networks are fast and big enough to do some tasks even better than

  • humans can, like for example playing chess or predicting the weather!

  • But as we've talked about in Crash Course AI, neural networks don't just work on their

  • own.

  • They need to learn to solve problems by making mistakes.

  • Sounds kind of like us, right?

  • INTRO

  • Neural networks handle mistakes

  • using an algorithm called backpropagation to make sure all the neurons that contributed

  • to an error get their math adjusted, and we'll unpack this a bit later.

  • And neural networks have two main parts: the architecture and the weights.

  • The architecture includes neurons and their connections.

  • And the weights are numbers that fine-tune how the neurons do their math to get an output.

  • So if a neural network makes a mistake, this often means that the weights aren't adjusted

  • correctly and we need to update them so they make better predictions next time.

  • The task of finding the best weights for a neural network architecture is called optimization.

  • And the best way to understand some basic principles of optimization is with an example

  • with the help of my pal John Green-bot.

  • Say that I manage a swimming pool, and I want to predict how many people will come next

  • week, so that I can schedule enough lifeguards.

  • A simple way to do this is by graphing some data points, like the number of swimmers and

  • the temperature in Fahrenheit for every day over the past few weeks.

  • Then, we can look for a pattern in that graph to make predictions.

  • A way computers do this is with an optimization strategy called linear regression.

  • We start by drawing a random straight line on the graph, which kind of fits the data

  • points.

  • To optimize though, we need to know how incorrect this guess is.

  • So we calculate the distance between the line and each of the data points, add it all up,

  • and that gives us the error.

  • We're quantifying how big of a mistake we made.

  • The goal of linear regression is to adjust the line to make the error as small as possible.

  • We want the line to fit the training data as much as it can.

  • The result is called the line of best fit.
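
Here is a minimal sketch of that idea in Python, with made-up data points. We start from a random line, measure the error as the summed squared distances between the line and the points, and repeatedly nudge the slope and intercept to shrink that error:

```python
import random

# Made-up training data: (temperature in °F, swimmers who showed up that day)
days = [(60, 20), (68, 45), (75, 80), (81, 95), (88, 120), (94, 140)]

avg_temp = sum(t for t, _ in days) / len(days)

# Start with a random line: swimmers ≈ slope * (temp - avg_temp) + intercept.
# (Centering the temperatures around 0 just keeps the updates numerically stable.)
slope = random.uniform(-1, 1)
intercept = random.uniform(-1, 1)
learning_rate = 0.001

for step in range(5000):
    grad_slope, grad_intercept = 0.0, 0.0
    for temp, swimmers in days:
        x = temp - avg_temp
        miss = (slope * x + intercept) - swimmers   # how far the line is from this point
        grad_slope += 2 * miss * x                  # how the error changes with the slope
        grad_intercept += 2 * miss                  # ...and with the intercept
    # Nudge the line so the total error shrinks a little
    slope -= learning_rate * grad_slope
    intercept -= learning_rate * grad_intercept

def predict(temp):
    return slope * (temp - avg_temp) + intercept

print("predicted swimmers at 85°F:", round(predict(85)))
print("predicted swimmers at 40°F:", round(predict(40)))   # can come out negative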

  • We can use this straight line to predict how many swimmers will show up for any temperature,

  • but parts of it defy logic.

  • For example, the line predicts a negative number of swimmers on super cold days, and

  • way more people than the pool can handle on dangerously hot days.

  • To get more accurate results, we might want to consider more than two features, like for

  • example adding the humidity, which would turn our 2D graph into 3D.

  • And our line of best fit would be more like a plane of best fit.

  • But if we added a fourth feature, like whether it's raining or not, suddenly we can't

  • visualize this anymore.

  • So as we consider more features, we add more dimensions to the graph, the optimization

  • problem gets trickier, and fitting the training data is tougher.
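
The fitting recipe itself doesn't care how many features we add, even once we can't draw the picture anymore. A small sketch with made-up numbers, using ordinary least squares to find one weight per feature plus an intercept:

```python
import numpy as np

# Made-up days: [temperature, humidity, raining?] and the attendance that day
features = np.array([[60, 80, 1], [68, 70, 0], [75, 55, 0],
                     [81, 60, 0], [88, 45, 0], [94, 30, 0]], dtype=float)
attendance = np.array([5.0, 40.0, 80.0, 95.0, 120.0, 140.0])

# Add a column of 1s so the fit can learn an intercept, then solve for the
# "plane of best fit" -- the same least-squares idea, just in more dimensions.
A = np.hstack([features, np.ones((len(features), 1))])
weights, *_ = np.linalg.lstsq(A, attendance, rcond=None)
print("one weight per feature, plus an intercept:", weights.round(2))
```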

  • This is where neural networks come in handy.

  • Basically, by connecting together many simple neurons with weights, a neural network can

  • learn to solve complicated problems, where the line of best fit becomes a weird multi-dimensional

  • function.

  • Let's give John Green-bot an untrained neural network.

  • To stick with the same example, the input layer of this neural network takes features

  • like temperature, humidity, rain, and so on.

  • And the output layer predicts the number of swimmers that will come to the pool.

  • We're not going to worry about designing the architecture of John Green-bot's neural

  • network right now.

  • Let's just focus on the weights.

  • He'll start, as always, by setting the weights to random numbers, like the random line on

  • the graph we drew earlier.

  • Only this time, it's not just one random line.

  • Because we have lots of inputs, it's lots of lines that are combined to make one big,

  • messy function.

  • Overall, this neural network's function resembles some weird multi-dimensional shape

  • that we don't really have a name for.

  • To train this neural network, we'll start by giving John Green-bot a bunch of measurements

  • from the past 10 days at the swimming pool, because these are the days where we also

  • know the output attendance.

  • We'll start with one day, where it was 80 degrees Fahrenheit, 65% humidity, and not

  • raining (which we'll represent with 0).

  • The neurons will do their thing by multiplying those features by the weights, adding the

  • results together, and passing information to the hidden layers until the output neuron

  • has an answer.
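
A rough sketch of that forward pass in Python. The layer sizes and weights below are made up and untrained, so the output is just whatever number the random weights happen to produce; the point is the arithmetic: multiply by weights, add up, activate, repeat.

```python
import numpy as np

rng = np.random.default_rng(seed=4)

# Features for one day: 80°F, 65% humidity, not raining (0)
features = np.array([80.0, 65.0, 0.0])

# Random, untrained weights: 3 inputs -> 4 hidden neurons -> 1 output
w_hidden = rng.normal(size=(3, 4))
b_hidden = rng.normal(size=4)
w_output = rng.normal(size=4)
b_output = rng.normal()

# Each hidden neuron multiplies the features by its weights, adds them up,
# and passes the result through an activation function (ReLU here).
hidden = np.maximum(0.0, features @ w_hidden + b_hidden)

# The output neuron does the same thing with the hidden neurons' outputs.
predicted_swimmers = hidden @ w_output + b_output
print("predicted swimmers:", round(float(predicted_swimmers), 1))
```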

  • What do you think, John Green-bot?

  • John Green-bot: 145 people were at the pool!

  • Just like before, there is a difference between the neural network's output and the actual

  • swimming pool attendance -- which was recorded as 100 people.

  • Because we just have one output neuron, that difference of 45 people is the error.

  • Pretty simple.

  • In some neural networks though, the output layer may have a lot of neurons.

  • So the difference between the predicted answer and the correct answer is more than just one

  • number.

  • In these cases, the error is represented by what's known as a loss function.
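
The transcript doesn't pin down a specific loss function, but one common choice is mean squared error, which averages the squared misses across all the output neurons. A minimal sketch:

```python
import numpy as np

def mean_squared_error(predicted, actual):
    """One number summarizing how far off every output neuron was."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return np.mean((predicted - actual) ** 2)

# With a single output neuron this is just the squared miss: (145 - 100)^2
print(mean_squared_error([145.0], [100.0]))                  # 2025.0
# With many output neurons, it still boils down to one error number
print(mean_squared_error([145.0, 80.0], [100.0, 90.0]))      # 1062.5
```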

  • Moving forward, we need to adjust the neural network's weights so that the next time

  • we give John Green-bot similar inputs, his math and final output will be more accurate.

  • Basically, we need John Green-bot to learn from his mistakes, a lot like when we pushed

  • a button to supervise his learning when he had the perceptron program.

  • But this is trickier because of how complicated neural networks are.

  • To help neural networks learn, scientists and mathematicians came up with an algorithm

  • called backpropagation of the error, or just backpropagation.

  • The basic goal is to look at the loss function and then assign blame to neurons back in the

  • previous layers of the network.

  • Some neurons' calculations may have been more to blame for the error than others, so

  • their weights will be adjusted more.

  • This information is fed backwards, which is where the idea of backpropagation comes from.

  • So for example, the error from our output neuron would go back a layer and adjust the

  • weights that get applied to our hidden layer neuron outputs.

  • And the error from our hidden layer neurons would go back a layer and adjust the weights

  • that get applied to our features.

  • Remember: our goal is to find the best combination of weights to get the lowest error.
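
To make that blame-passing concrete, here is a hand-written backpropagation sketch for a tiny network with one hidden layer. Everything in it is invented for illustration (the data, the layer sizes, the ReLU activation, the mean-squared-error loss, the learning rate), but the chain-rule bookkeeping is the real mechanism: the output error becomes gradients for the output-layer weights, then flows back a layer to produce gradients for the first-layer weights.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Made-up training data: [temperature, humidity, rain] -> pool attendance
X = np.array([[80, 65, 0], [70, 80, 1], [90, 40, 0], [60, 90, 1]], dtype=float)
y = np.array([100.0, 40.0, 130.0, 10.0])

# Scale inputs and outputs so the weight updates stay well-behaved
X = X / X.max(axis=0)
y_scaled = y / 100.0

# 3 inputs -> 4 hidden neurons -> 1 output, random starting weights
W1 = rng.normal(scale=0.5, size=(3, 4)); b1 = np.zeros(4)
W2 = rng.normal(scale=0.5, size=4);      b2 = 0.0
learning_rate = 0.1

for step in range(5000):
    # Forward pass: multiply, add, activate, repeat
    hidden_in = X @ W1 + b1
    hidden = np.maximum(0.0, hidden_in)        # ReLU activation
    predictions = hidden @ W2 + b2
    error = predictions - y_scaled
    loss = np.mean(error ** 2)
    if step == 0:
        print("loss before training:", round(loss, 4))

    # Backward pass: assign blame layer by layer with the chain rule
    grad_out = 2 * error / len(y_scaled)       # blame at the output neuron
    grad_W2 = hidden.T @ grad_out              # -> output-layer weights
    grad_b2 = grad_out.sum()
    grad_hidden = np.outer(grad_out, W2)       # blame flowing back into the hidden layer
    grad_hidden *= (hidden_in > 0)             # ReLU only passes blame where it was active
    grad_W1 = X.T @ grad_hidden                # -> first-layer weights
    grad_b1 = grad_hidden.sum(axis=0)

    # Nudge every weight a little, in the direction that shrinks the loss
    W1 -= learning_rate * grad_W1; b1 -= learning_rate * grad_b1
    W2 -= learning_rate * grad_W2; b2 -= learning_rate * grad_b2

print("loss after training: ", round(loss, 4))
print("predicted attendance:", (predictions * 100).round(1), "vs actual:", y)
```

The printed loss should drop substantially between the first and last step, which is what "learning from mistakes" means numerically here.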

  • To explain the logic behind optimization with a metaphor, let's send John Green-bot on

  • a metaphorical journey through the Thought Bubble.

  • Let's imagine that weights in our neural network are like latitude and longitude coordinates

  • on a map.

  • And the error of our neural network is the altitude -- lower is better.

  • John Green-bot the explorer is on a quest to find the lowest point in the deepest valley.

  • The latitude and longitude of that lowest point -- where the error is the smallest -- are

  • the weights of the neural network's global optimal solution.

  • But John Green-bot has no idea where this valley actually is.

  • By randomly setting the initial weights of our neural network, we're basically dumping

  • him in the middle of the jungle.

  • All he knows is his current latitude, longitude, and altitude.

  • Maybe we got lucky and he's on the side of the deepest valley.

  • But he could also be at the top of the highest mountain far away.

  • The only way to know is to explore!

  • Because the jungle is so dense, it's hard to see very far.

  • The best John Green-bot can do is look around and make a guess.

  • He notices that he can descend a little by moving northeast, so he takes a step down

  • and updates his latitude and longitude.

  • From this new position, he looks around and picks another step that decreases his altitude

  • a little more.

  • And then another... and another.

  • With every brave step, he updates his coordinates and decreases his altitude.

  • Eventually, John Green-bot looks around and finds that he can't go down anymore.

  • He celebrates, because it seems like he found the lowest point in the deepest valley!

  • Or... so he thinks.

  • If we look at the whole map, we can see that John Green-bot only found the bottom of a

  • small gorge when he ran out of “down.”

  • It's way better than where he started, but it's definitely not the lowest point of

  • the deepest valley.

  • So he just found a local optimal solution, where the weights make the error relatively

  • small, but not the smallest it could be.

  • Sorry, buddy.

  • Thanks, Thought Bubble.

  • Backpropagation and learning always involve lots of little steps, and optimization is

  • tricky with any neural network.

  • If we go back to our example of optimization as exploring a metaphorical map, we're never

  • quite sure if we're headed in the right direction or if we've reached the lowest

  • valley with the smallest error -- again that's the global optimal solution.

  • But tricks have been discovered to help us better navigate.

  • For example, when we drop an explorer somewhere on the map, they could be really far from

  • the lowest valley, with a giant mountain range in the way.

  • So it might be a good idea to try different random starting points to be sure that the

  • neural network isn't getting stuck at a locally optimal solution.

  • Or instead of restarting over and over again, we could have a team of explorers that start

  • from different locations and explore the jungle simultaneously.

  • This strategy of exploring different solutions at the same time on the same neural network

  • is especially useful when you have a giant computer with lots of processors.
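
Here's a sketch of that strategy on a toy one-dimensional "error landscape" invented for illustration: each explorer is plain gradient descent from a random starting weight, and we simply keep whichever one ends up at the lowest error.

```python
import math
import random

def error_surface(w):
    """A made-up bumpy landscape with several valleys of different depths."""
    return math.sin(3 * w) + 0.1 * w ** 2

def slope_at(w, eps=1e-5):
    """Numerical estimate of the slope: which way is downhill, and how steep?"""
    return (error_surface(w + eps) - error_surface(w - eps)) / (2 * eps)

def explore(start, learning_rate=0.01, steps=2000):
    """Plain gradient descent: keep stepping downhill from wherever we start."""
    w = start
    for _ in range(steps):
        w -= learning_rate * slope_at(w)
    return w

# One explorer can get stuck in a shallow gorge, so drop several at random
# spots (in parallel, if we have the processors) and keep the best finisher.
random.seed(4)
finishes = [explore(random.uniform(-5, 5)) for _ in range(8)]
best = min(finishes, key=error_surface)
print("lowest error found:", round(error_surface(best), 3), "at w =", round(best, 3))
```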

  • And we could even adjust the explorers' step size, so that they can step right over

  • small hills as they try to find and descend into a valley.

  • This step size is called the learning rate, and it's how much the neuron weights get

  • adjusted every time backpropagation happens.
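
In code, the learning rate is just the multiplier on each weight update; the weight and gradient values below are arbitrary, the point is how much the step size changes the move.

```python
# One hypothetical weight and the gradient backpropagation computed for it
weight = 0.8
gradient = 2.5

for learning_rate in (0.001, 0.1, 10.0):
    new_weight = weight - learning_rate * gradient
    print(f"learning rate {learning_rate}: weight goes from {weight} to {new_weight}")
# Tiny steps barely move the weight; huge steps can leap right past the valley.
```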

  • We're always looking for more creative ways to explore solutions, try different combinations

  • of weights, and minimize the loss function as we train neural networks.

  • But even if we use a bunch of training data and backpropagation to find the global optimal

  • solution... we're still only halfway done.

  • The other half of training an AI is checking whether the system can answer new questions.

  • It's easy to solve a problem we've seen before, like taking a test after studying

  • the answer key.

  • We may get an A, but we didn't actually learn much.

  • To really test what we've learned, we need to solve problems we haven't seen before.

  • Same goes for neural networks.
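
A minimal sketch of that idea, assuming our labeled pool records live in a list called labeled_days (a hypothetical name): shuffle them, hold a fraction back, and only grade the network on the days it never trained on.

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=4):
    """Shuffle the labeled examples and hold a fraction back for testing."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    cutoff = int(len(examples) * (1 - test_fraction))
    return examples[:cutoff], examples[cutoff:]

# train_days, test_days = train_test_split(labeled_days)
# ...fit the network on train_days only, then measure its error on test_days,
# the "problems we haven't seen before."
```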

  • This whole time, John Green-bot has been training his neural network with swimming pool data.

  • His neural network has dozens of features like temperature, humidity, rain, day of the

  • week, and wind speed... but also grass length, number of butterflies around the pool, and

  • the average GPA of the lifeguards.

  • More data can be better for finding patterns and improving accuracy, as long as the computer can

  • handle it!

  • Over time, backpropagation will adjust the neuron weights so that the neural network's

  • output matches the training data.

  • Remember, that's called fitting to the training data, and with this complicated neural network,

  • we're fitting a multi-dimensional function instead of a simple straight line.

  • And sometimes, backpropagation is too good at making a neural network fit to certain

  • data.

  • See, there are lots of coincidental relationships in big datasets.

  • Like for example, the divorce rate in Maine may be correlated with U.S. margarine consumption,

  • or skiing revenue may be correlated with the number of people dying by getting trapped

  • in their bedsheets.

  • Neural networks are really good at finding these kinds of relationships.

  • And it can be a big problem, because if we give a neural network some new data that doesn't

  • adhere to these silly correlations, then it will probably make some strange errors.

  • That's a danger known as overfitting.

  • The easiest way to prevent overfitting is to keep the neural network simple.
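
Here's a tiny demonstration of that trade-off using polynomials instead of a neural network, with made-up pool data: a flexible degree-9 curve can hug ten noisy training days almost perfectly, while a plain line can't, but the flexible fit tends to do worse on days it has never seen.

```python
import numpy as np

rng = np.random.default_rng(seed=4)

def make_days(n):
    """Made-up data: attendance rises roughly linearly with temperature, plus noise."""
    temps = rng.uniform(60, 95, size=n)
    swimmers = 3 * (temps - 60) + rng.normal(scale=10, size=n)
    return (temps - 77.5) / 17.5, swimmers     # scale temps to roughly [-1, 1]

train_x, train_y = make_days(10)   # days we train on
new_x, new_y = make_days(10)       # days the model has never seen

for degree in (1, 9):
    coeffs = np.polyfit(train_x, train_y, deg=degree)
    train_err = np.mean((np.polyval(coeffs, train_x) - train_y) ** 2)
    new_err = np.mean((np.polyval(coeffs, new_x) - new_y) ** 2)
    print(f"degree {degree}: error on training days {train_err:.0f}, on new days {new_err:.0f}")
```

The simpler model usually wins on the days it has never seen, which is the point of keeping the network simple.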

  • If we retrain John Green-bot's swimming pool program without data like grass length

  • and number of butterflies, and we observe that our accuracy doesn't change, then ignoring

  • those features is best.

  • So training a neural network isn't just a bunch of math!

  • We need to consider how to best represent our various problems as features in AI systems,

  • and to think carefully about what mistakes these programs might make.

  • Next time, we'll jump into our very first lab of the course, where we'll apply all

  • this knowledge and build a neural network together.

  • Crash Course AI is produced in association with PBS Digital Studios.

  • If you want to help keep Crash Course free for everyone, forever, you can join our community

  • on Patreon.

  • And if you want to learn more about the math of k-means clustering, check out this video

  • from Crash Course Statistics.
