
In statistics, logistic regression, or logit regression, is a type of probabilistic statistical classification model. It is used for predicting the outcome of a categorical dependent variable based on one or more predictor variables; that is, it is used in estimating the parameters of a qualitative response model. The probabilities describing the possible outcomes of a single trial are modeled, as a function of the explanatory variables, using a logistic function.

Frequently "logistic regression" is used to refer specifically to the problem in which the dependent variable is binary, that is, the number of available categories is two, while problems with more than two categories are referred to as multinomial logistic regression or, if the multiple categories are ordered, as ordered logistic regression. Logistic regression measures the relationship between a categorical dependent variable and one or more independent variables, which are usually continuous, by using probability scores as the predicted values of the dependent variable. As such it treats the same set of problems as does probit regression, using similar techniques.

## Fields and examples of applications

Logistic regression was put forth in the 1940s as an alternative to Fisher's 1936 classification method, linear discriminant analysis. It is used extensively in numerous disciplines, including the medical and social science fields. For example, the Trauma and Injury Severity Score, which is widely used to predict mortality in injured patients, was originally developed by Boyd et al. using logistic regression. Logistic regression might be used to predict whether a patient has a given disease, based on observed characteristics of the patient. Another example might be to predict whether an American voter will vote Democratic or Republican, based on age, income, gender, race, state of residence, votes in previous elections, and so on. The technique can also be used in engineering, especially for predicting the probability of failure of a given process, system, or product. It is also used in marketing applications such as predicting a customer's propensity to purchase a product or cancel a subscription. In economics it can be used to predict the likelihood of a person's choosing to be in the labor force, and a business application would be to predict the likelihood of a homeowner defaulting on a mortgage. Conditional random fields, an extension of logistic regression to sequential data, are used in natural language processing.

## Basics

Logistic regression can be binomial or multinomial. Binomial or binary logistic regression deals with situations in which the observed outcome for a dependent variable can have only two possible types. Multinomial logistic regression deals with situations where the outcome can have three or more possible types. In binary logistic regression, the outcome is usually coded as "0" or "1", as this leads to the most straightforward interpretation. If a particular observed outcome for the dependent variable is the noteworthy possible outcome, it is usually coded as "1" and the contrary outcome as "0". Logistic regression is used to predict the odds of being a case based on the values of the independent variables. The odds are defined as the probability that a particular outcome is a case divided by the probability that it is a noncase.

Like other forms of regression analysis, logistic regression makes use of one or more predictor variables that may be either continuous or categorical. Unlike ordinary linear regression, however, logistic regression is used for predicting binary outcomes of the dependent variable rather than continuous outcomes. Given this difference, logistic regression takes the natural logarithm of the odds of the dependent variable being a case to create a continuous criterion as a transformed version of the dependent variable. Thus the logit transformation is referred to as the link function in logistic regression: although the dependent variable is binomial, the logit is the continuous criterion upon which linear regression is conducted.

The logit of success is then fit to the predictors using linear regression analysis. The predicted value of the logit is converted back into predicted odds via the inverse of the natural logarithm, namely the exponential function. Therefore, although the observed dependent variable in logistic regression is a zero-or-one variable, the logistic regression estimates the odds, as a continuous variable, that the dependent variable is a success. In some applications the odds are all that is needed. In others, a specific yes-or-no prediction is needed for whether the dependent variable is or is not a case; this categorical prediction can be based on the computed odds of a success, with predicted odds above some chosen cutoff value being translated into a prediction of a success.
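The odds, the logit transform, and the cutoff-based categorical prediction described above can be sketched in a few lines of Python (the probability and cutoff values here are illustrative assumptions, not from the source):

```python
import math

p = 0.8                    # illustrative probability that the outcome is a case
odds = p / (1 - p)         # odds: probability of a case / probability of a noncase
log_odds = math.log(odds)  # the logit: a continuous transformed criterion

# The exponential function inverts the natural logarithm, recovering the odds:
recovered_odds = math.exp(log_odds)

# A yes/no prediction from a chosen cutoff on the odds (a cutoff of 1.0
# corresponds to a probability of 0.5):
prediction = 1 if odds > 1.0 else 0
```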

## Logistic function, odds ratio, and logit

An explanation of logistic regression begins with an explanation of the logistic function, which always takes on values between zero and one:

F(t) = 1 / (1 + e^(−t))

Viewing t as a linear function t = β0 + β1·x of an explanatory variable x, the logistic function can be written as:

F(x) = 1 / (1 + e^(−(β0 + β1·x)))

F(x) is interpreted as the probability of the dependent variable equalling a "success" or "case" rather than a failure or non-case. We also define the inverse of the logistic function, the logit:

g(F(x)) = ln( F(x) / (1 − F(x)) ) = β0 + β1·x

and equivalently:

F(x) / (1 − F(x)) = e^(β0 + β1·x)

A graph of the logistic function is shown in Figure 1. The input is t and the output is F(t). The logistic function is useful because it can take an input with any value from negative infinity to positive infinity, whereas the output is confined to values between 0 and 1 and hence is interpretable as a probability. In the above equations, g refers to the logit function of some given linear combination of the predictors, ln denotes the natural logarithm, F(x) is the probability that the dependent variable equals a case, β0 is the intercept from the linear regression equation, β1·x is the regression coefficient multiplied by some value of the predictor, and the base e denotes the exponential function.

The formula for F(x) illustrates that the probability of the dependent variable equaling a case is equal to the value of the logistic function of the linear regression expression. This is important in that it shows that the value of the linear regression expression can vary from negative to positive infinity and yet, after transformation, the resulting expression for the probability ranges between 0 and 1. The equation for g(F(x)) illustrates that the logit is equivalent to the linear regression expression. Likewise, the next equation illustrates that the odds of the dependent variable equaling a case is equivalent to the exponential function of the linear regression expression. This illustrates how the logit serves as a link function between the probability and the linear regression expression. Given that the logit ranges between negative infinity and positive infinity, it provides an adequate criterion upon which to conduct linear regression, and the logit is easily converted back into the odds.

## Multiple explanatory variables

If there are multiple explanatory variables, then the linear expression above can be revised to

f(i) = β0 + β1·x1,i + β2·x2,i + ... + βm·xm,i

Then, when this is used in the equation relating the logged odds of a success to the values of the predictors, the linear regression will be a multiple regression with m explanators; the parameters βj for all j = 0, 1, 2, ..., m are all estimated.

## Model fitting

### Estimation

#### Maximum likelihood estimation

The regression coefficients are usually estimated using maximum likelihood estimation. Unlike linear regression with normally distributed residuals, it is not possible to find a closed-form expression for the coefficient values that maximize the likelihood function, so an iterative process must be used instead, for example Newton's method. This process begins with a tentative solution, revises it slightly to see if it can be improved, and repeats this revision until the improvement is minute, at which point the process is said to have converged.

In some instances the model may not reach convergence. When a model does not converge, this indicates that the coefficients are not meaningful because the iterative process was unable to find appropriate solutions. A failure to converge may occur for a number of reasons: having a large proportion of predictors to cases, multicollinearity, sparseness, or complete separation.
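The iterative fit can be sketched with Newton's method for a single-predictor model. The dataset below is made up for illustration (and deliberately not completely separated, since separation prevents convergence), and production software adds safeguards this sketch omits:

```python
import math

def fit_logistic(xs, ys, iters=50, tol=1e-10):
    """Fit P(y=1|x) = 1/(1+exp(-(b0+b1*x))) by Newton's method: start from
    a tentative solution and revise until the improvement is minute."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p                  # gradient of the log-likelihood
            g1 += (y - p) * x
            w = p * (1.0 - p)            # weights forming the (negative) Hessian
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det   # Newton step: solve H d = g (2x2 case)
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:        # improvement is minute: converged
            break
    return b0, b1

# Made-up, overlapping data: the classes are not perfectly separable.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
```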

Having a large proportion of variables to cases results in an overly conservative Wald statistic and can lead to nonconvergence. Multicollinearity refers to unacceptably high correlations between predictors. As multicollinearity increases, coefficients remain unbiased but standard errors increase and the likelihood of model convergence decreases. To detect multicollinearity amongst the predictors, one can conduct a linear regression analysis with the predictors of interest for the sole purpose of examining the tolerance statistic used to assess whether multicollinearity is unacceptably high.

Sparseness in the data refers to having a large proportion of empty cells. Zero cell counts are particularly problematic with categorical predictors. With continuous predictors, the model can infer values for the zero cell counts, but this is not the case with categorical predictors. The model will not converge with zero cell counts for categorical predictors because the natural logarithm of zero is an undefined value, so final solutions to the model cannot be reached. To remedy this problem, researchers may collapse categories in a theoretically meaningful way or may consider adding a constant to all cells.

Another numerical problem that may lead to a lack of convergence is complete separation, which refers to the instance in which the predictors perfectly predict the criterion: all cases are accurately classified. In such instances, one should reexamine the data, as there is likely some kind of error. Although not a precise number, as a general rule of thumb, logistic regression models require a minimum of 10 events per explanatory variable.

#### Minimum chi-squared estimator for grouped data

While individual data will have a dependent variable with a value of zero or one for every observation, with grouped data one observation is on a group of people who all share the same characteristics; in this case the researcher observes the proportion of people in the group for whom the response variable falls into one category or the other. If this proportion is neither zero nor one for any group, the minimum chi-squared estimator involves using weighted least squares to estimate a linear model in which the dependent variable is the logit of the proportion: that is, the log of the ratio of the fraction in one group to the fraction in the other group.
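A sketch of that weighted least squares step in Python, using made-up grouped data (the predictor values, group sizes, and counts are illustrative): each group's empirical logit is regressed on the predictor, weighted by the standard asymptotic inverse-variance weight n·p·(1−p).

```python
import math

# Hypothetical grouped data: (predictor value x, group size n, count k of
# "successes"), with each proportion strictly between zero and one.
groups = [(0.0, 50, 10), (1.0, 50, 18), (2.0, 50, 30), (3.0, 50, 41)]

# Weighted least squares of the empirical logit on x.
sw = swx = swxx = swy = swxy = 0.0
for x, n, k in groups:
    p = k / n
    y = math.log(p / (1 - p))   # logit of the observed proportion
    w = n * p * (1 - p)         # inverse of the logit's asymptotic variance
    sw += w; swx += w * x; swxx += w * x * x
    swy += w * y; swxy += w * x * y

b1 = (sw * swxy - swx * swy) / (sw * swxx - swx * swx)  # slope
b0 = (swy - b1 * swx) / sw                              # intercept
```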

## Evaluating goodness of fit

Goodness of fit in linear regression models is generally measured using R2. Since this has no direct analog in logistic regression, various methods, including the following, can be used instead.

### Deviance and likelihood ratio tests

In linear regression analysis, one is concerned with partitioning variance via the sum of squares calculations: variance in the criterion is essentially divided into variance accounted for by the predictors and residual variance. In logistic regression analysis, deviance is used in lieu of sum of squares calculations. Deviance is analogous to the sum of squares calculations in linear regression and is a measure of the lack of fit to the data in a logistic regression model. Deviance is calculated by comparing a given model with the saturated model, a model with a theoretically perfect fit. This computation is called the likelihood-ratio test:

D = −2 ln( likelihood of the fitted model / likelihood of the saturated model )

In the above equation, D represents the deviance and ln represents the natural logarithm. The likelihood ratio compares the fitted model to the saturated model, so its natural logarithm produces a negative value; multiplying by negative two yields a positive value with an approximate chi-squared distribution. Smaller values indicate better fit, as the fitted model deviates less from the saturated model. When assessed upon a chi-square distribution, nonsignificant chi-square values indicate very little unexplained variance and thus good model fit. Conversely, a significant chi-square value indicates that a significant amount of the variance is unexplained.

Two measures of deviance are particularly important in logistic regression: null deviance and model deviance. The null deviance represents the difference between a model with only the intercept and the saturated model, and the model deviance represents the difference between a model with at least one predictor and the saturated model. In this respect, the null model provides a baseline upon which to compare predictor models. Given that deviance is a measure of the difference between a given model and the saturated model, smaller values indicate better fit. Therefore, to assess the contribution of a predictor or set of predictors, one can subtract the model deviance from the null deviance and assess the difference on a chi-square distribution with degrees of freedom equal to the difference in the number of parameters estimated. Let

D_null = −2 ln( likelihood of the null model / likelihood of the saturated model )
D_model = −2 ln( likelihood of the fitted model / likelihood of the saturated model )

Then

D_null − D_model = −2 ln( likelihood of the null model / likelihood of the fitted model )

If the model deviance is significantly smaller than the null deviance, then one can conclude that the predictor or set of predictors significantly improved model fit. This is analogous to the F-test used in linear regression analysis to assess the significance of prediction.
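A Python sketch of null deviance, model deviance, and their difference; the binary outcomes and fitted probabilities are invented for illustration:

```python
import math

# Hypothetical binary outcomes and fitted probabilities from some model.
ys    = [0, 0, 1, 0, 1, 1, 1, 0]
p_hat = [0.1, 0.3, 0.6, 0.4, 0.8, 0.7, 0.9, 0.2]

def deviance(ys, ps):
    """-2 times the log-likelihood relative to the saturated model, which
    fits each binary observation perfectly (so its log-likelihood is 0)."""
    ll = sum(math.log(p) if y == 1 else math.log(1 - p)
             for y, p in zip(ys, ps))
    return -2.0 * ll

p_null = sum(ys) / len(ys)                    # intercept-only fitted probability
d_null = deviance(ys, [p_null] * len(ys))     # null deviance
d_model = deviance(ys, p_hat)                 # model deviance
lr_stat = d_null - d_model                    # assessed on a chi-square distribution
```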

### Pseudo-R2s

In linear regression the squared multiple correlation, R2, is used to assess goodness of fit as it represents the proportion of variance in the criterion that is explained by the predictors. In logistic regression analysis, there is no agreed-upon analogous measure, but there are several competing measures, each with limitations. Three of the most commonly used indices are examined here, beginning with the likelihood ratio R2, R2L:

R2L = (D_null − D_model) / D_null

This is the most analogous index to the squared multiple correlation in linear regression. It represents the proportional reduction in the deviance, wherein the deviance is treated as a measure of variation analogous, but not identical, to the variance in linear regression analysis. One limitation of the likelihood ratio R2 is that it is not monotonically related to the odds ratio, meaning that it does not necessarily increase as the odds ratio increases and does not necessarily decrease as the odds ratio decreases.

The Cox and Snell R2 is an alternative index of goodness of fit related to the R2 value from linear regression. The Cox and Snell index is problematic as its maximum value is .75 when the variance is at its maximum. The Nagelkerke R2 provides a correction to the Cox and Snell R2 so that the maximum value is equal to one. Nevertheless, the Cox and Snell and likelihood ratio R2s show greater agreement with each other than either does with the Nagelkerke R2. Of course, this might not be the case for values exceeding .75, as the Cox and Snell index is capped at this value. The likelihood ratio R2 is often preferred to the alternatives as it is most analogous to R2 in linear regression, is independent of the base rate, and varies between 0 and 1.
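The competing indices can be computed from the two deviances; the sketch below assumes ungrouped binary data, where the saturated log-likelihood is zero so ln L = −D/2, and the deviance values and sample size are invented:

```python
import math

n = 100                        # illustrative sample size
d_null, d_model = 130.0, 95.0  # illustrative null and model deviances

# Likelihood ratio R2: proportional reduction in deviance.
r2_likelihood = (d_null - d_model) / d_null

# Cox and Snell R2 = 1 - (L_null / L_model)**(2/n), rewritten via deviances.
r2_cox_snell = 1 - math.exp(-(d_null - d_model) / n)

# Its ceiling (reached only when the model is perfect), used by Nagelkerke
# to rescale the index so the maximum becomes one.
r2_max = 1 - math.exp(-d_null / n)
r2_nagelkerke = r2_cox_snell / r2_max
```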

A word of caution is in order when interpreting pseudo-R2 statistics. The reason these indices of fit are referred to as pseudo-R2 is that they do not represent the proportionate reduction in error as the R2 in linear regression does. Linear regression assumes homoscedasticity, that the error variance is the same for all values of the criterion. Logistic regression will always be heteroscedastic: the error variances differ for each value of the predicted score. For each value of the predicted score there would be a different value of the proportionate reduction in error. Therefore, it is inappropriate to think of R2 as a proportionate reduction in error in a universal sense in logistic regression.

### Hosmer-Lemeshow test

The Hosmer-Lemeshow test uses a test statistic that asymptotically follows a chi-squared distribution to assess whether or not the observed event rates match expected event rates in subgroups of the model population.

## Evaluating binary classification performance

If the estimated probabilities are to be used to classify each observation of independent variable values as predicting the category that the dependent variable is found in, the various methods below for judging the model's suitability in out-of-sample forecasting can also be used on the data that were used for estimation: accuracy, precision, recall, specificity, and negative predictive value. In each of these evaluative methods, an aspect of the model's effectiveness in assigning instances to the correct categories is measured.

## Coefficients

After fitting the model, it is likely that researchers will want to examine the contribution of individual predictors. To do so, they will want to examine the regression coefficients. In linear regression, the regression coefficients represent the change in the criterion for each unit change in the predictor. In logistic regression, however, the regression coefficients represent the change in the logit for each unit change in the predictor. Given that the logit is not intuitive, researchers are likely to focus on a predictor's effect on the exponential function of the regression coefficient: the odds ratio. In linear regression, the significance of a regression coefficient is assessed by computing a t-test. In logistic regression, there are several different tests designed to assess the significance of an individual predictor, most notably the likelihood ratio test and the Wald statistic.
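The odds-ratio reading of a coefficient can be checked numerically; the intercept and coefficient values here are illustrative assumptions:

```python
import math

b0, b1 = -1.0, 0.8   # illustrative intercept and coefficient

def odds(x):
    """Odds of a case at predictor value x under the fitted model."""
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
    return p / (1.0 - p)

# A one-unit increase in the predictor multiplies the odds by exp(b1),
# the odds ratio, regardless of the starting value of x.
odds_ratio = odds(3.0) / odds(2.0)
assert abs(odds_ratio - math.exp(b1)) < 1e-9
```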

### Likelihood ratio test

The likelihood-ratio test discussed above to assess model fit is also the recommended procedure to assess the contribution of individual "predictors" to a given model. In the case of a single predictor model, one simply compares the deviance of the predictor model with that of the null model on a chi-square distribution with a single degree of freedom. If the predictor model has a significantly smaller deviance, then one can conclude that there is a significant association between the "predictor" and the outcome. Although some common statistical packages do provide likelihood ratio test statistics, without this computationally intensive test it would be more difficult to assess the contribution of individual predictors in the multiple logistic regression case. To assess the contribution of individual predictors one can enter the predictors hierarchically, comparing each new model with the previous to determine the contribution of each predictor. (There is considerable debate among statisticians regarding the appropriateness of so-called "stepwise" procedures. They do not preserve the nominal statistical properties and can be very misleading.[1])

### Wald statistic

Alternatively, when assessing the contribution of individual predictors in a given model, one may examine the significance of the Wald statistic. The Wald statistic, analogous to the t-test in linear