
So now you're probably thinking: wow, deep nets are really great! But why did it take so long for them to become popular? Well, as it turns out, when you try to train them with a method called backpropagation, you run into a fundamental problem called the vanishing gradient, or sometimes the exploding gradient. When that happens, training takes too long and the accuracy really suffers. Let's take a closer look.

When you're training a neural net, you're constantly calculating a cost value. The cost is typically the difference between the net's predicted output and the actual output from a set of labelled training data. The cost is then lowered by making slight adjustments to the weights and biases over and over throughout the training process, until the lowest possible value is obtained. Here is that forward prop again, and here are the example weights and biases. The training process utilizes something called a gradient, which measures the rate at which the cost will change with respect to a change in a weight or a bias.
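To make the cost and the gradient concrete, here is a minimal sketch (mine, not from the video) of that idea for a one-weight, one-bias model, using mean squared error as the cost; the data, variable names, and learning rate are all illustrative assumptions:

    # Minimal sketch (illustrative, not from the video): prediction = w * x + b,
    # cost = mean squared error over a small labelled training set.
    xs = [1.0, 2.0, 3.0]   # inputs
    ys = [2.0, 4.0, 6.0]   # actual (labelled) outputs

    w, b = 0.5, 0.0        # initial weight and bias
    learning_rate = 0.05   # size of each "slight adjustment"

    for step in range(1000):
        # cost: average squared difference between predicted and actual output
        cost = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

        # gradients: rate at which the cost changes with respect to w and b
        dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
        db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)

        # nudge the weight and bias in the direction that lowers the cost
        w -= learning_rate * dw
        b -= learning_rate * db

    print(w, b, cost)      # w approaches 2.0, b approaches 0.0, cost approaches 0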

Deep architectures are your best and sometimes your only choice for complex machine learning problems such as facial recognition. But up until 2006, there was no way to accurately train deep nets due to a fundamental problem with the training process: the vanishing gradient.

Let's think of a gradient like a slope, and the training process like a rock rolling down that slope. A rock will roll quickly down a steep slope but will barely move at all on a flat surface. The same is true with the gradient of a deep net. When the gradient is large, the net will train quickly. When the gradient is small, the net will train slowly. Here's that deep net again. And here is how the gradient could potentially vanish or decay back through the net. As you can see, the gradients are much smaller in the earlier layers. As a result, the early layers of the network are the slowest to train. But this is a fundamental problem! The early layers are responsible for detecting the simple patterns and the building blocks: when it came to facial recognition, the early layers detected the edges, which were combined to form facial features later in the network. And if the early layers get it wrong, the result built up by the net will be wrong as well. It could mean that instead of a face like this, your net looks for this.
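As a rough illustration of that decay (my own sketch, not the net from the video), the snippet below stacks several sigmoid layers with small random weights, then pushes a gradient back through them with the chain rule; the size of the gradient reaching each layer shrinks as you move toward the input:

    import numpy as np

    np.random.seed(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    n_layers, width = 8, 10
    # one weight matrix per layer; the 0.5 scale is an arbitrary choice
    # (larger weights could make the gradient explode instead of vanish)
    weights = [0.5 * np.random.randn(width, width) for _ in range(n_layers)]

    # forward prop, remembering each layer's activation
    a = np.random.randn(width)
    activations = [a]
    for W in weights:
        a = sigmoid(W @ a)
        activations.append(a)

    # back-prop: start with a gradient of ones at the output and apply the
    # chain rule layer by layer, from the last layer back to the first
    grad = np.ones(width)
    for layer in reversed(range(n_layers)):
        a_out = activations[layer + 1]                 # this layer's sigmoid output
        grad = weights[layer].T @ (grad * a_out * (1.0 - a_out))
        print(f"layer {layer}: gradient size {np.linalg.norm(grad):.2e}")

The exact numbers depend on the random weights, but with an activation like the sigmoid (whose derivative is at most 0.25) the typical outcome is what the video describes: the gradient reaching the earliest layers is orders of magnitude smaller than the gradient at the output.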

The process used for training a neural net is called back-propagation, or back-prop. We saw before that forward prop starts with the inputs and works forward; back-prop does the reverse, calculating the gradient from right to left. For example, here are 5 gradients: 4 for weights and 1 for a bias. It starts at the output and works back through the layers, like so. Each time it calculates a gradient, it uses all the previous gradients up to that point. So, let's start with that node. That edge uses the gradient at that node. And the next. So far things are simple. As you keep going back, things get a bit more complex; that one, for example, uses a lot of gradients, even though this is a relatively simple net. If your net gets larger and deeper, like this one, it gets even worse. But why is that? Well, a gradient at any point is the product of the previous gradients up to that point. And the product of two numbers between 0 and 1 gives you a smaller number. Say this rectangle is a one. Also, say there are two gradients: a fourth, like that, and a third. If you multiply them, you get a fourth of a third, which is a twelfth. A fourth of a twelfth is a forty-eighth. You can see that the numbers keep getting smaller the more you multiply.
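That repeated product is easy to check; a tiny sketch using the one-quarter and one-third factors from the narration:

    # Multiplying factors between 0 and 1 drives the product toward zero.
    product = 1.0                       # the rectangle that starts as "one"
    for factor in [0.25, 1 / 3, 0.25]:  # a fourth, a third, then a fourth again
        product *= factor
        print(product)                  # 0.25, then 1/12 ~ 0.083, then 1/48 ~ 0.021

    # Over many layers the effect compounds quickly:
    print(0.25 ** 10)                   # ten factors of 1/4 leave about 9.5e-07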

Have you ever had this issue while training a neural network with backpropagation? If so, please comment and let me know your thoughts.

As a result of all this, backprop ends up taking a lot of time to train the net, and the accuracy is often very low.

Up until 2006, deep nets were still underperforming shallow nets and other machine learning algorithms. But everything changed after three breakthrough papers published by Hinton, LeCun, and Bengio in 2006 and 2007. In the next video, we'll begin taking a closer look at these breakthroughs, starting with the Restricted Boltzmann Machine.

An Old Problem - Ep. 5 (Deep Learning SIMPLIFIED)
