Hi. It is time for the actual introduction of regressions. Let's start with some dry theory. A linear regression is a linear approximation of a causal relationship between two or more variables. Regression models are highly valuable, as they are one of the most common ways to make inferences and predictions.
The process goes like this: you get sample data, come up with a model that explains the data, and then make predictions for the whole population based on the model you have developed. In the model, there is a dependent variable, labeled y, that is being predicted, and independent variables, labeled x₁, x₂, and so forth. These are the predictors. y is a function of the x variables, and the regression model is a linear approximation of this function.
The easiest regression model is the simple linear regression: y = β₀ + β₁x + ε. Let's see what these values mean. y is the variable we're trying to predict and is called the dependent variable; x is the independent variable. When using regression analysis, we want to predict the value of y provided we have the value of x. But to have a regression, y must depend on x in some causal way: whenever there is a change in x, that change must translate into a change in y.
Think about the following example. The income a person receives depends on the number of years of education that person has received. The dependent variable is income, while the independent variable is years of education. There is a causal relationship between the two: the more education you get, the higher the income you're likely to receive. This relationship is so trivial that it is probably the reason you are watching this course right now. You want a higher income, so you are increasing your education. Now let's pause for a second and think about the reverse relationship. What if education depends on income? This would mean that the higher your income, the more years you spend educating yourself.
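To make the population model concrete, here is a minimal Python sketch that treats y = β₀ + β₁x + ε as a data-generating process for the income and education example. Everything specific in it is an assumption for illustration: the values of BETA_0, BETA_1 and SIGMA, the normally distributed error, the 0 to 20 year education range, and the hypothetical helper names income_model and simulate_population.

```python
import random

# Hypothetical population parameters (assumptions, not figures from the lecture).
BETA_0 = 15_000.0  # constant: predicted yearly income at zero years of education
BETA_1 = 4_000.0   # change in yearly income per additional year of education
SIGMA = 2_000.0    # standard deviation of the error term epsilon


def income_model(years_of_education: float, epsilon: float = 0.0) -> float:
    """Simple linear regression model: y = beta_0 + beta_1 * x + epsilon."""
    return BETA_0 + BETA_1 * years_of_education + epsilon


def simulate_population(n: int, seed: int = 0) -> list[tuple[float, float]]:
    """Draw n (education, income) pairs; epsilon is assumed Normal(0, SIGMA)."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    data = []
    for _ in range(n):
        x = rng.randint(0, 20)          # years of education, 0 to 20 inclusive
        eps = rng.gauss(0.0, SIGMA)     # random error with mean zero
        data.append((float(x), income_model(x, eps)))
    return data


if __name__ == "__main__":
    for x, y in simulate_population(5):
        print(f"education = {x:4.0f} years, income = {y:10.2f}")
```

The error term is what keeps observed incomes scattered around the straight line instead of sitting exactly on it.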
Putting high tuition fees aside, wealthier individuals don't spend more years in school: high school and college take the same number of years no matter your tax bracket. Therefore, a causal relationship like this one is faulty, if not plain wrong, and hence unfit for regression analysis.
Let's go back to the original example. Income is a function of education: the more years you study, the higher the income you will receive. This sounds about right.
What we haven't mentioned so far is that our model contains coefficients. β₁ is the coefficient that stands before the independent variable, and it quantifies the effect of education on income. If β₁ is 50, then for each additional year of education, your income would grow by $50. In the USA, the number is much bigger, somewhere around $3,000 to $5,000. So for each additional year you spend on education, your yearly income is expected to rise by $3,000 to $5,000. And that's not considering higher education or tailored courses like this one.
The other two components are the constant β₀ and the error ε. In this example, you can think of the constant β₀ as the minimum wage. No matter your education, if you have a job, you will get the minimum wage; this is a guaranteed amount. So if you never went to school and plug an education value of zero years into the formula, the regression will predict that your income is the minimum wage. Makes sense, right?
The last term is ε. It represents the error of estimation: the difference between the observed income and the income the regression predicted. On average, across all observations, the error is zero. If you earn more than the regression predicted, then someone else earns less than it predicted. Everything evens out.
The original formula was written with Greek letters. What does this tell us? It was the population formula. But we know statistics is all about sample data.
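The interpretation of the coefficients can be checked with a few lines of arithmetic. This is a hedged sketch: β₁ = 50 follows the lecture's illustrative "$50 per year" figure, while β₀ = 20,000 and the helper names predicted_income and error are hypothetical.

```python
# beta_1 = 50 follows the lecture's illustrative example; beta_0 = 20_000 is an
# assumed constant (the lecture likens beta_0 to a minimum wage).
BETA_0 = 20_000.0
BETA_1 = 50.0


def predicted_income(years: float) -> float:
    """Deterministic part of the model: beta_0 + beta_1 * years."""
    return BETA_0 + BETA_1 * years


def error(observed_income: float, years: float) -> float:
    """Error: observed income minus the income the model predicts."""
    return observed_income - predicted_income(years)


if __name__ == "__main__":
    # One extra year of education raises the prediction by exactly beta_1.
    print(predicted_income(13) - predicted_income(12))  # 50.0
    # Zero years of education leaves only the constant beta_0.
    print(predicted_income(0))  # 20000.0
    # Someone with 12 years (prediction 20_600) earning 21_000 is 400 above it.
    print(error(21_000, 12))  # 400.0
```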
In practice, we use the linear regression equation. It is simply ŷ = b₀ + b₁x. You heard it right: the y here is referred to as "y hat". Whenever we have a hat symbol, it is an estimated or predicted value. b₀ is the estimate of the regression constant β₀, b₁ is the estimate of β₁, and x is the sample data for the independent variable.
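As a sketch of how b₀ and b₁ can be computed from a sample, the snippet below assumes ordinary least squares (OLS), the most common estimation method for simple linear regression. The sample values in EDUCATION_YEARS and INCOME_K are made up, and the helper names fit_simple_ols and predict are hypothetical.

```python
# Hypothetical sample: years of education and yearly income in thousands of
# dollars. These numbers are made up for illustration.
EDUCATION_YEARS = [10.0, 12.0, 14.0, 16.0]
INCOME_K = [30.0, 35.0, 38.0, 45.0]


def fit_simple_ols(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Estimate (b0, b1) for y-hat = b0 + b1 * x by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # b1 = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar) ** 2)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b1 = sxy / sxx
    # The fitted line passes through the point of means (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1


def predict(b0: float, b1: float, x: float) -> float:
    """y-hat: the value predicted by the estimated regression line."""
    return b0 + b1 * x


if __name__ == "__main__":
    b0, b1 = fit_simple_ols(EDUCATION_YEARS, INCOME_K)
    print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")  # b0 = 5.80, b1 = 2.40
    residuals = [y - predict(b0, b1, x) for x, y in zip(EDUCATION_YEARS, INCOME_K)]
    print(f"sum of residuals = {sum(residuals):.6f}")
```

With an intercept in the model, the OLS residuals sum to zero up to floating-point rounding, which is the sample counterpart of the statement that the error averages out to zero.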