  • In order to implement linear regression, the first key step is for us to define something called a cost function.

  • This is something we'll build in this video.

  • The cost function will tell us how well the model is doing so that we can try to get it to do better.

  • Let's look at what this means.

  • Recall that you have a training set that contains input features x and output targets y.

  • The model you're going to use to fit this training set is this linear function f_w,b of x equals w times x plus b.

  • To introduce a little bit more terminology, the w and b are called the parameters of the model.

  • In machine learning, parameters of a model are the variables you can adjust during training in order to improve the model.

  • Sometimes you also hear the parameters w and b referred to as coefficients or as weights.

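To make this concrete, here is a minimal Python sketch of the model (the function name `predict` and the example values are illustrative choices, not something specified in the video):

```python
def predict(x, w, b):
    # Linear model f_w,b(x) = w * x + b for a single input feature x.
    # w and b are the parameters (weights/coefficients) adjusted during training.
    return w * x + b

# Example: with w = 0.5 and b = 1, an input of x = 2 gives a prediction of 2.0
print(predict(2, 0.5, 1))  # -> 2.0
```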

  • Now, let's take a look at what these parameters w and b do.

  • Depending on the values you've chosen for w and b, you get a different function f of x, which generates a different line on the graph.

  • Remember that we can write f of x as a shorthand for f_w,b of x.

  • We're going to take a look at some plots of f of x on a chart.

  • Maybe you're already familiar with drawing lines on charts, but even if this is a review for you, I hope this will help you build intuition on how w and b, the parameters, determine f.

  • When w is equal to 0 and b is equal to 1.5, then f looks like this horizontal line.

  • In this case, the function f of x is 0 times x plus 1.5, so f is always a constant value.

  • It always predicts 1.5 for the estimated value of y, so y hat is always equal to b.

  • Here, b is also called the y-intercept because that's where it crosses the vertical axis or the y-axis on this graph.

  • As a second example, if w is 0.5 and b is equal to 0, then f of x is 0.5 times x.

  • When x is 0, the prediction is also 0, and when x is 2, then the prediction is 0.5 times 2, which is 1.

  • You get a line that looks like this.

  • Notice that the slope is 0.5 divided by 1, so the value of w gives you the slope of the line, which is 0.5.

  • Finally, if w equals 0.5 and b equals 1, then f of x is 0.5 times x plus 1.

  • When x is 0, then f of x equals b, which is 1.

  • The line intersects the vertical axis at b, the y-intercept.

  • Also, when x is 2, then f of x is 2, so the line looks like this.

  • Again, the slope is 0.5 divided by 1, so the value of w gives you the slope, which is 0.5.

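As a rough check of the three examples above, this short sketch (the helper name `f` is illustrative) evaluates each choice of w and b at x equals 0 and x equals 2:

```python
def f(x, w, b):
    return w * x + b

# The three (w, b) choices discussed above
for w, b in [(0.0, 1.5), (0.5, 0.0), (0.5, 1.0)]:
    print(f"w={w}, b={b}: f(0)={f(0, w, b)}, f(2)={f(2, w, b)}")
# w=0.0, b=1.5: f(0)=1.5, f(2)=1.5  -> horizontal line, y-intercept 1.5
# w=0.5, b=0.0: f(0)=0.0, f(2)=1.0  -> slope 0.5 through the origin
# w=0.5, b=1.0: f(0)=1.0, f(2)=2.0  -> slope 0.5, y-intercept 1
```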

  • Recall that you have a training set like the one shown here.

  • With linear regression, what you want to do is to choose values for the parameters w and b, so that the straight line you get from the function f somehow fits the data well, like maybe this line shown here.

  • When I say that the line fits the data visually, you can think of this to mean that the line defined by f is roughly passing through or somewhat close to the training examples as compared to other possible lines that are not as close to these points.

  • Just to remind you of some notation, a training example like this point here is defined by x superscript i, y superscript i, where y is the target.

  • For a given input x i, the function f also makes a predicted value for y, and the value that it predicts for y is y hat i shown here.

  • For our choice of a model, f of x i is w times x i plus b.

  • Stated differently, the prediction y hat i is f_w,b of x i, where for the model we're using, f of x i is equal to w times x i plus b.

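As an illustrative sketch of this notation in code (the array names and the tiny made-up dataset are my own, not from the video), the predictions y hat i for all training examples can be computed in one vectorized step:

```python
import numpy as np

x_train = np.array([1.0, 2.0, 3.0])  # input features x^(i) (made-up values)
y_train = np.array([1.2, 1.9, 3.2])  # targets y^(i) (made-up values)

w, b = 0.5, 1.0                      # one possible choice of parameters

y_hat = w * x_train + b              # y_hat^(i) = f_w,b(x^(i)) for every i
print(y_hat)                         # [1.5 2.  2.5]
```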

  • So now the question is, how do you find values for w and b so that the prediction y hat i is close to the true target y i for many or maybe all training examples x i, y i?

  • To answer that question, let's first take a look at how to measure how well a line fits the training data.

  • To do that, we're going to construct our cost function.

  • The cost function takes the prediction y hat and compares it to the target y by taking y hat minus y.

  • This difference is called the error.

  • We're measuring how far off the prediction is from the target.

  • Next, let's compute the square of this error.

  • Also, we're going to want to compute this term for different training examples i in the training set.

  • So when measuring the error for example i, we'll compute this squared error term.

  • Finally, we want to measure the error across the entire training set.

  • In particular, let's sum up the squared errors like this.

  • We'll sum from i equals 1, 2, 3, all the way up to m.

  • Remember that m is the number of training examples, which is 47 for this dataset.

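Written out with the notation from earlier, the running total described so far is the sum of squared errors over all m training examples:

$$\sum_{i=1}^{m} \left(\hat{y}^{(i)} - y^{(i)}\right)^{2}$$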

  • Notice that if we have more training examples, m is larger and your cost function will calculate a bigger number since it's summing over more examples.

  • So to build a cost function that doesn't automatically get bigger as the training set size gets larger, by convention, we will compute the average squared error instead of the total squared error and we do that by dividing by m like this.

  • We're nearly there.

  • Just one last thing.

  • By convention, the cost function that machine learning people use actually divides by 2 times m.

  • The extra division by 2 is just meant to make some of our later calculations a little bit neater, but the cost function still works whether you include this division by 2 or not.

  • This expression right here is the cost function, and we're going to write J of w, b to refer to the cost function.

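Written as a formula, the cost function just described is:

$$J(w,b) = \frac{1}{2m}\sum_{i=1}^{m}\left(\hat{y}^{(i)} - y^{(i)}\right)^{2}$$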

  • This is also called the squared error cost function and it's called this because you're taking the square of these error terms.

  • In machine learning, different people will use different cost functions for different applications.

  • But the squared error cost function is by far the most commonly used one for linear regression, and for that matter, for all regression problems, where it seems to give good results for many applications.

  • Just as a reminder, the prediction y hat is equal to the output of the model f at x.

  • We can rewrite the cost function J of w, b as 1 over 2m times the sum from i equals 1 to m of f of x i minus y i, the quantity squared.

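Here is a minimal Python sketch of that expression (the function name `compute_cost` and the NumPy arrays are illustrative assumptions, not something given in the video):

```python
import numpy as np

def compute_cost(x, y, w, b):
    # J(w, b) = (1 / (2m)) * sum over i of (f_w,b(x^(i)) - y^(i))^2
    m = x.shape[0]          # number of training examples
    y_hat = w * x + b       # predictions f_w,b(x^(i)) for all i
    return np.sum((y_hat - y) ** 2) / (2 * m)

# Tiny made-up example: a line that fits these points exactly has zero cost
x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([1.0, 2.0, 3.0])
print(compute_cost(x_train, y_train, w=1.0, b=0.0))  # -> 0.0
```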

  • Eventually, we're going to want to find values of w and b that make the cost function small.

  • But before going there, let's first gain more intuition about what J of w, b is really computing.

  • At this point, you might be thinking we've done a whole lot of math to define the cost function, but what exactly is it doing?

  • Let's go on to the next video, where we'll step through one example of what the cost function is really computing, which I hope will help you build intuition about what it means if J of w, b is large versus if the cost J is small.

  • Let's go on to the next video.
