  • In this video, we'll introduce terms and notation that we'll use throughout this course.

    在本視頻中,我們將介紹本課程中會用到的術語和符號。

  • Let's start with variable types.

    讓我們從變量類型開始。

  • We'll compare and contrast two pairs of variable types.

    我們將對比兩對變量類型。

  • Here's the first.

    這是第一個。

  • The first pair is response variable versus explanatory variable.

    第一對是響應變量與解釋變量。

  • The analyst is primarily interested in the response variable.

    分析人員主要關注的是響應變量。

  • We want to know if, and how, we can understand the response variable better using other variables.

    我們想知道是否以及如何利用其他變量更好地理解響應變量。

  • In the Commute and Chris setup, questions 1 and 2 have commute as the response variable.

    在「通勤與克里斯」設置中,問題 1 和 2 將通勤作為響應變量。

  • Chris hopes to understand how commute is affected by other variables.

    克里斯希望瞭解其他變量對通勤的影響。

  • The response variable goes by other names, like output variable and dependent variable.

    響應變量還有其他名稱,如輸出變量和因變量。

  • In contrast, an explanatory variable is any variable used to study the response variable.

    相反,解釋變量是用於研究響應變量的任何變量。

  • The goal here is to find potential relationships between the response variable and an explanatory variable.

    這樣做的目的是找到響應變量與解釋變量之間的潛在關係。

  • In the Commute and Chris setup, question 1 uses departure as an explanatory variable to analyze commute.

    在通勤和克里斯的設置中,問題 1 使用出發作為解釋變量來分析通勤情況。

  • We also call explanatory variables input variables or independent variables.

    我們也稱解釋變量為輸入變量或自變量。

  • Sometimes, we may even call them predictors or features.

    有時,我們甚至可以稱它們為預測因子或特徵。

  • Even though we can call response and explanatory variables dependent and independent variables, they are conceptually different from the independent random variables that we encountered in probability.

    儘管我們可以把響應變量和解釋變量稱為因變量和自變量,但它們在概念上與我們在概率論中遇到的獨立隨機變量不同。

  • Okay, that's our first pair.

    好了,這是我們的第一對。

  • Our second variable type pair is a quantitative variable versus a qualitative variable.

    我們的第二對變量類型是定量變量與定性變量。

  • As the name suggests, quantitative variables take on quantities. We can divide them into two main groups: count variables and continuous variables.

    顧名思義,定量變量具有數量,我們可以將其分為兩大類,即計數變量和連續變量。

  • Count variables take on non-negative integers, while continuous variables take on values from an interval.

    計數變量取值為非負整數,而連續變量取值為區間值。

  • We commonly call qualitative variables categorical variables.

    我們通常將定性變量稱為分類變量。

  • These variables take on a small number of possible categories, also known as classes or levels.

    這些變量有少量可能的類別,也稱為類(classes)或水平(levels)。

  • It's common that we'll assign numbers to the categories, but this does not convert the variables into count variables.

    我們通常會給類別分配數字,但這並不會將變量轉換為計數變量。

  • Okay, let's review the commute and Chris setup and categorize the eight variables into count, continuous, and categorical variables.

    好了,讓我們回顧一下通勤和克里斯的設置,並將八個變量分為計數變量、連續變量和分類變量。

  • Commute is measured in minutes.

    通勤時間以分鐘計算。

  • Because any value exceeding zero is possible, commute is a continuous variable.

    因為任何超過零的值都是可能的,所以通勤是一個連續變量。

  • For similar reasons, departure, temp, and precip chance are also continuous variables.

    出於類似的原因,出發時間、溫度和降水概率也是連續變量。

  • They just take on values from different intervals.

    它們只是在不同的區間內取值。

  • Next, precip, season, and accident all take on two or four possible outcomes, making them categorical variables.

    其次,降水、季節和事故都有兩種或四種可能的結果,是以是分類變量。

  • Last is police, which takes on non-negative integers, making it a count variable.

    最後是警察,它接受非負整數,是一個計數變量。
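
The classification just described can be written down as a small lookup table. This is only a sketch: the snake_case variable names (for example `precip_chance`) and the helper `group_by_type` are illustrative choices, not names taken from the course materials.

```python
# Variable types for the eight variables, as classified in the walkthrough above.
# The spellings of the names are illustrative assumptions.
VARIABLE_TYPES = {
    "commute": "continuous",
    "departure": "continuous",
    "temp": "continuous",
    "precip_chance": "continuous",
    "precip": "categorical",
    "season": "categorical",
    "accident": "categorical",
    "police": "count",
}


def group_by_type(variable_types):
    """Map each variable type to the sorted list of variables of that type."""
    groups = {}
    for name, kind in variable_types.items():
        groups.setdefault(kind, []).append(name)
    return {kind: sorted(names) for kind, names in groups.items()}


groups = group_by_type(VARIABLE_TYPES)
for kind in ("continuous", "categorical", "count"):
    print(kind, groups[kind])
```

Count and continuous variables are both quantitative, so the quantitative group here would be the union of those two lists.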

  • We can subdivide categorical variables further into nominal and ordinal variables.

    我們可以將分類變量進一步細分為名義變量和順序變量。

  • If there's not a meaningful order to the categories, then it's a nominal variable.

    如果分類沒有一個有意義的順序,那麼它就是一個名義變量。

  • In the case where we assign numbers to categories, the numbers only act as labels.

    在我們為類別分配數字的情況下,數字只起到標籤的作用。

  • If there is a meaningful order to the categories, then it's an ordinal variable.

    如果類別有一個有意義的順序,那麼它就是一個序數變量。

  • In the case where numbers are assigned to the categories, the numbers communicate the order.

    在為類別分配數字的情況下,數字表示順序。

  • Let's use season from the commute and Chris setup as an example.

    讓我們以通勤與克里斯設置中的季節為例。

  • If we assign 1 to winter, 2 to spring, 3 to summer, and 4 to fall, then the categories follow the calendar seasons in sequence and therefore have meaningful order.

    如果我們把 1 指定為冬季,2 指定為春季,3 指定為夏季,4 指定為秋季,那麼這些類別就會按照日曆上的季節順序排列,是以就有了有意義的順序。

  • This makes season an ordinal variable.

    這使得季節成為一個順序變量。

  • If instead we assign numbers based on alphabetical order of the seasons, then we do not have meaningful order to the categories.

    如果我們根據季節的字母順序來分配數字,那麼我們的分類順序就沒有意義了。

  • And this makes season a nominal variable.

    這使得季節成為一個名義變量。
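
To make the two codings of season concrete, here is a minimal sketch that builds both and compares the order each one implies. The names `calendar_code`, `alphabetical_code`, and `seasons_in_code_order` are hypothetical helpers introduced only for this illustration.

```python
SEASONS = ["winter", "spring", "summer", "fall"]

# Calendar coding from the example: 1 = winter, 2 = spring, 3 = summer, 4 = fall.
calendar_code = {season: k for k, season in enumerate(SEASONS, start=1)}

# Alphabetical coding: the numbers follow the spelling of the names, not the calendar.
alphabetical_code = {season: k for k, season in enumerate(sorted(SEASONS), start=1)}


def seasons_in_code_order(code):
    """List the seasons from the smallest assigned number to the largest."""
    return sorted(code, key=code.get)


print(seasons_in_code_order(calendar_code))      # follows the calendar sequence
print(seasons_in_code_order(alphabetical_code))  # follows spelling only
```

Under the alphabetical coding, fall receives 1 and winter receives 4, so the numbers act purely as labels.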

  • Now that we've explored variable types, let's establish basic notation that we'll use throughout this course.

    既然我們已經瞭解了變量類型,那麼我們就來建立本課程中將一直使用的基本符號。

  • We denote variables in general by the letter x.

    我們一般用字母 x 來表示變量。

  • If there are multiple variables, we use the subscript j to distinguish between variables.

    如果存在多個變量,我們使用下標 j 來區分變量。

  • However, it's common to use the letter y to denote response variables.

    不過,通常使用字母 y 來表示響應變量。

  • We use p to represent the number of variables in a dataset, excluding the response variable if there is one.

    我們用 p 表示數據集中的變量數量,如果有響應變量,則不包括響應變量。

  • This means j can take on integer values from 1 to p.

    這意味著 j 可以取 1 到 p 的整數值。

  • For example, x sub 2 represents the second explanatory variable.

    例如,x 下標 2 代表第二個解釋變量。

  • Now, what if we want to refer to a specific observation of a variable?

    現在,如果我們想引用變量的某個具體觀測值,該怎麼辦?

  • We use subscript i for this, and the letter n represents the total number of observations in the dataset.

    我們使用下標 i 來表示,字母 n 代表數據集中的觀察結果總數。

  • This means i can take on integer values from 1 to n.

    這意味著 i 可以取 1 到 n 的整數值。

  • Using the commute and Chris scenario as an example, the fifth observation contains values from the fifth recorded day.

    以通勤和克里斯的情況為例,第五個觀測值包含第五個記錄日的值。

  • These include y sub 5, the response variable data point recorded on that day, and x sub 5 comma 1 through x sub 5 comma p, data points for the explanatory variables from the same day.

    其中包括 y 下標 5(當天記錄的響應變量數據點)和 x 下標 5 逗號 1 至 x 下標 5 逗號 p(當天的解釋變量數據點)。
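
The subscript convention can be mirrored in code. The sketch below uses made-up toy numbers (not data from the course) with n = 5 observations and p = 2 explanatory variables; the accessors `x` and `y_at` are hypothetical helpers that translate the 1-based subscripts of the notation into Python's 0-based indexing.

```python
# Made-up toy values for illustration only; they are not data from the course.
y = [31.0, 28.5, 40.0, 33.0, 36.5]   # y_1, ..., y_5: response values

X = [                                 # row i holds x_{i,1}, ..., x_{i,p}
    [7.50, 61.0],
    [7.25, 58.0],
    [8.00, 45.0],
    [7.75, 50.0],
    [8.25, 47.0],
]

n = len(X)       # number of observations
p = len(X[0])    # number of explanatory variables (response excluded)


def x(i, j):
    """Return x_{i,j} for 1-based subscripts i (observation) and j (variable)."""
    if not (1 <= i <= n and 1 <= j <= p):
        raise IndexError("subscripts must satisfy 1 <= i <= n and 1 <= j <= p")
    return X[i - 1][j - 1]


def y_at(i):
    """Return y_i for a 1-based observation subscript i."""
    if not 1 <= i <= n:
        raise IndexError("subscript must satisfy 1 <= i <= n")
    return y[i - 1]


# The fifth observation: y_5 together with x_{5,1}, ..., x_{5,p}.
fifth = (y_at(5), [x(5, j) for j in range(1, p + 1)])
print(fifth)
```

Keeping the first index for the observation and the second for the variable matches the order i, then j, used in the two-number subscripts.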

  • But a word of caution about subscripts.

    但關於下標,還是要提醒一下。

  • When x has two numbers in its subscript, the first number is i and the second number is j, as we have shown.

    當 x 的下標中有兩個數字時,第一個數字是 i,第二個數字是 j,如我們所示。

  • However, if x has only one number in its subscript, it can be either i or j.

    但是,如果 x 的下標只有一個數字,那麼它可以是 i 或 j。

  • So, how can we identify which is which?

    那麼,我們怎樣才能識別哪個是哪個呢?

  • Well, it depends on the context.

    這要看具體情況。

  • Make sure you read carefully.

    請務必仔細閱讀。

  • In general, if there is only one x variable, that is, p equals 1, then there is no purpose for subscript j.

    一般來說,如果只有一個 x 變量,即 p 等於 1,那麼就不需要下標 j 了。

  • So, in this case, the one number in the subscript is usually i.

    是以,在這種情況下,下標中的一個數字通常是 i。

  • However, if there are multiple x variables, then the one number in the subscript is usually j.

    不過,如果有多個 x 變量,那麼下標中的一個數字通常就是 j。

  • You may recall from a probability course that we use uppercase letters to represent random variables, such as capital X and capital Y.

    您可能還記得,在概率課程中,我們用大寫字母來表示隨機變量,如大寫 X 和大寫 Y。

  • We can add subscripts to these letters the same way.

    我們可以用同樣的方法為這些字母添加下標。

  • Introducing subscript i adds clarity, but it can also make equations or expressions messy and difficult to read.

    引入下標 i 會增加清晰度,但也會使等式或表達式變得混亂難讀。

  • We combat this problem by moving to vector and matrix notation.

    不過,我們通過改用向量和矩陣符號來解決這個問題。

  • If we use a matrix to represent a data set, rows represent the observations, while columns represent the variables.

    如果我們用矩陣來表示數據集,那麼行代表觀測值,列代表變量。
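
Written out, the layout described here (rows as observations, columns as variables) looks like the following. The bold symbols for the response vector and the data matrix are an assumed convention for this sketch; the course may typeset them differently.

```latex
\[
\mathbf{y} =
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix},
\qquad
\mathbf{X} =
\begin{pmatrix}
x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\
x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\
\vdots  & \vdots  & \ddots & \vdots  \\
x_{n,1} & x_{n,2} & \cdots & x_{n,p}
\end{pmatrix}
\]
```

Row i of the matrix collects the p explanatory values for observation i, and column j collects all n values of variable j, so the matrix has n rows and p columns.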

  • We'll see more of this in future sections, but for now, let's review some basic facts about matrices.

    我們將在以後的章節中看到更多這方面的內容,但現在,讓我們回顧一下有關矩陣的一些基本事實。

  • First, for matrix A, A superscript T is A's transpose.

    首先,對於矩陣 A,A 的上標 T 是 A 的轉置。

  • Transposing simply means swapping the rows and columns so that the k-th column becomes the k-th row, and vice versa.

    轉置簡單地說就是交換行和列,使第 k 列變成第 k 行,反之亦然。

  • Notice transposing a matrix also reverses its dimensions.

    請注意,矩陣的轉置也會反轉其維度。

  • If A is an a-by-b matrix, then A transpose is a b-by-a matrix.

    如果 A 是一個 a 乘 b 的矩陣,那麼 A 的轉置就是一個 b 乘 a 的矩陣。
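
A transpose is easy to check on a small example. The sketch below stores a matrix as a list of rows; the helpers `transpose` and `shape` are hypothetical names written for this illustration, and the entries of `A` are arbitrary.

```python
def transpose(matrix):
    """Swap rows and columns: the k-th column becomes the k-th row."""
    return [list(column) for column in zip(*matrix)]


def shape(matrix):
    """Return (number of rows, number of columns) of a list-of-rows matrix."""
    return (len(matrix), len(matrix[0]))


A = [
    [1, 2, 3],
    [4, 5, 6],
]                      # a 2-by-3 matrix

A_T = transpose(A)     # a 3-by-2 matrix

print(shape(A), shape(A_T))
print(A_T)
```

Transposing twice returns the original matrix, which is a quick sanity check when experimenting.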

  • And second, A superscript negative one is A's inverse.

    其次,A 的上標負一是 A 的逆矩陣。

  • If we multiply a matrix by its inverse, in either order, we will get the identity matrix.

    如果我們以任一順序將矩陣與它的逆矩陣相乘,就會得到單位矩陣。

  • Note that the identity matrix has ones on its diagonal and zeros elsewhere.

    請注意,單位矩陣的對角線上為 1,其他地方為 0。
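
The defining property of the inverse can be verified numerically. This is a minimal sketch for the 2-by-2 case only, using the standard closed-form inverse and exact fractions; `inverse_2x2`, `matmul`, and `identity` are hypothetical helpers, and the entries of `A` are arbitrary. An inverse exists only for a square matrix with nonzero determinant.

```python
from fractions import Fraction


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
        for r in range(len(A))
    ]


def identity(size):
    """Identity matrix: ones on the diagonal and zeros elsewhere."""
    return [[1 if r == c else 0 for c in range(size)] for r in range(size)]


def inverse_2x2(M):
    """Closed-form inverse of a 2-by-2 matrix; fails if the determinant is zero."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular and has no inverse")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[Fraction(2), Fraction(1)], [Fraction(5), Fraction(3)]]
A_inv = inverse_2x2(A)

# Multiplying in either order gives the identity matrix.
print(matmul(A, A_inv) == identity(2))
print(matmul(A_inv, A) == identity(2))
```

Exact fractions are used here so the comparison with the identity matrix is not affected by floating-point rounding.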

  • One of the enemies in this course is confusion.

    本課程的敵人之一就是混亂。

  • We'll try to minimize confusion by using clear and consistent notation.

    我們將盡量使用清晰一致的符號,以減少混淆。

  • However, don't assume that the conventions that we use here are universal.

    不過,不要以為我們在這裡使用的慣例是通用的。

  • Remember, notation only represents concepts.

    記住,符號只代表概念。

  • Authors may use different notation to suit their needs.

    不過,作者可以根據自己的需要使用不同的符號。

  • They may even use the same notation for different but similar concepts.

    他們甚至可能對不同但相似的概念使用相同的符號。

  • So, train yourself to distinguish the concept from the notation.

    是以,要訓練自己區分概念和符號。
