Name: Event History Analysis: the Cox model
Uploaded: 2021-01-14T08:26:39.000Z
Duration: 8 min 25 s

This is an introduction to modeling in event history analysis.

The first part deals with the well-known Cox model. David R. Cox's brilliant idea, in 1972, was to combine two types of analysis: regression and life tables. The Cox model can be seen either as a way of controlling for the effect of explanatory variables in survival analysis through regression, or as a way of introducing the time dimension into regression. The strengths of each technique fill the gaps of the other. In a logit model, the odds of belonging to a category are computed at a given point in the individual's life, regardless of when the status changed. Duration, the elapsed time, is therefore an important dimension that is missing from the logit model.

In particular, censoring by the survey date or by emigration is not taken into account: unless time is considered explicitly, a good part of the sample, namely the individuals whose observations are censored, is left out of the analysis.

On the other hand, if we simply describe the event with the survival-table technique, it is difficult to control for the influence of explanatory variables.
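For comparison, a survival table keeps censored individuals in the analysis for as long as they are observed. The sketch below is a classical actuarial life table (the transcript does not say which variant the lecture has in mind): individuals censored during an interval are counted as exposed for half of it, and survival is the running product of (1 - q) over the intervals. The function name and the data are hypothetical.

```python
def life_table(durations, events, width, n_intervals):
    """Actuarial life table: one (start, q, survival_at_end) row per interval.

    durations: time under observation for each individual
    events: 1 if observation ended with the event, 0 if censored
    Individuals censored inside an interval count as exposed for half of it.
    """
    rows = []
    surv = 1.0
    for k in range(n_intervals):
        start, end = k * width, (k + 1) * width
        entering = sum(1 for d in durations if d >= start)
        deaths = sum(1 for d, e in zip(durations, events) if e and start <= d < end)
        censored = sum(1 for d, e in zip(durations, events) if not e and start <= d < end)
        exposed = entering - censored / 2  # actuarial half-interval adjustment
        q = deaths / exposed if exposed > 0 else 0.0  # probability of the event in the interval
        surv *= 1 - q
        rows.append((start, q, surv))
    return rows


# Hypothetical durations in years; 0 = censored (e.g. by the survey date)
for start, q, s in life_table([0.5, 1.2, 1.5, 2.5, 3.0], [1, 0, 1, 1, 0], width=1.0, n_intervals=3):
    print(f"from year {start:.0f}: q = {q:.3f}, S = {s:.3f}")
```

Controlling for a characteristic this way means building a separate table for each category, each from a smaller sample.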

Splitting the sample into categories by generation, rural origin, etc. leads to sub-samples too small for analysis, especially when measuring the combined influence of several explanatory factors. To solve both the problem of duration and that of explanatory factors, David Cox's idea was to combine survival analysis with regression analysis. First, Cox proposed a regression not on the characteristics acquired by the individual at the end of life or at the time of observation, but on the characteristics acquired in each year of life. In a way, each year lived by each member of the sample constitutes an observation.
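The idea that each year lived is an observation can be sketched as a data transformation. This minimal sketch assumes each individual is summarised by a hypothetical tuple (id, years observed, event flag, characteristic X): every year becomes one record, the event flag is 1 only in the year the event occurs, and censored individuals still contribute their years (with no event) instead of being dropped. A characteristic that changes over time would simply take its value for that year in each record.

```python
def person_years(people):
    """Expand (id, years_observed, event, x) tuples into one record per year lived."""
    records = []
    for pid, years, event, x in people:
        for year in range(1, years + 1):
            # The event can only happen in the last observed year
            happened = 1 if (event and year == years) else 0
            records.append({"id": pid, "year": year, "event": happened, "x": x})
    return records


sample = [
    ("A", 3, 1, 1),  # hypothetical: event during year 3, has characteristic X
    ("B", 2, 0, 0),  # hypothetical: censored after 2 years (e.g. survey date)
]
for row in person_years(sample):
    print(row)
```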

The reference category of the regression is not fixed once for the whole sample: it has its own probability of experiencing the event in each observation period. This series of probabilities makes it possible to establish a reference survival curve, also called the baseline survival function.

This is the nonparametric part of the model. The Cox regression model then estimates the effect of the explanatory variables on the annual risk of experiencing the event. Each variable is associated with a regression coefficient that measures the average effect of that variable on the annual risk.
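The transcript does not say how the coefficients are estimated. The standard method is Cox's partial likelihood: at each event time, the individual who experienced the event is compared with everyone still at risk, and the baseline hazard h0(t) cancels out of that comparison, which is why it can be left unspecified. Below is a minimal sketch for a single covariate, fitted by Newton-Raphson, assuming small in-memory lists and the Breslow handling of tied times; the function name and data are hypothetical.

```python
import math


def cox_fit_single(times, events, x, tol=1e-10, max_iter=50):
    """Estimate B1 for h(t) = h0(t) * exp(B1 * x) by maximising the partial likelihood."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i, (ti, di) in enumerate(zip(times, events)):
            if not di:
                continue  # censored cases only contribute through the risk sets
            # Risk set: everyone still under observation at time ti
            risk = [j for j, tj in enumerate(times) if tj >= ti]
            s0 = sum(math.exp(beta * x[j]) for j in risk)
            s1 = sum(x[j] * math.exp(beta * x[j]) for j in risk)
            s2 = sum(x[j] ** 2 * math.exp(beta * x[j]) for j in risk)
            mean_x = s1 / s0  # risk-weighted average of x in the risk set
            score += x[i] - mean_x  # first derivative of the log partial likelihood
            info += s2 / s0 - mean_x ** 2  # minus the second derivative
        if info <= 0:
            raise ValueError("no information: x must vary within the risk sets")
        step = score / info  # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical data: four individuals, all experiencing the event, x = 1 if exposed
beta = cox_fit_single([1, 2, 3, 4], [1, 1, 1, 1], [1, 0, 1, 0])
print(f"exp(B1) = {math.exp(beta):.4f}")  # prints exp(B1) = 2.5616
```

In practice one would use an established implementation rather than this loop, which rebuilds each risk set from scratch.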

This is the parametric part of the model. In this model, h0(t) is the hazard function for the reference category and the Bi are coefficients associated with the indicator variables Xij. The model therefore has a nonparametric component, the baseline hazard function formed by the series of hazards h0(t), and a parametric component, the coefficients Bi applied to the vector of independent variables. Because of these two components, the Cox model is also called a semi-parametric model.
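The on-screen formula does not appear in the transcript. Written with the symbols the narration names (h0(t), the coefficients Bi written here as β_i, and the indicator variables Xij, taking j to index individuals, which is an assumption), the model presumably reads:

```latex
h_j(t) = h_0(t)\,\exp\Big(\sum_i \beta_i X_{ij}\Big)
```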

In fact, for computational reasons, it is the logarithms of the hazards, not the hazards themselves, that are modeled additively, so the model belongs to the family of log-linear models. At the interpretation stage, however, it is usually the exponentials of the coefficients that are interpreted, as multiplicative effects on the hazard.
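Taking logarithms of the model h_j(t) = h_0(t) exp(Σ_i β_i X_ij) (notation as assumed above) gives the additive log-linear form, and exponentiating a coefficient gives the multiplicative effect on the hazard:

```latex
\log h_j(t) = \log h_0(t) + \sum_i \beta_i X_{ij},
\qquad
\frac{h_j(t)}{h_0(t)} = \exp\Big(\sum_i \beta_i X_{ij}\Big)
```

For instance, a coefficient of 0.693 for an indicator gives exp(0.693) ≈ 2, a doubled hazard, while a coefficient of -0.5 gives exp(-0.5) ≈ 0.61, a hazard about 39% lower.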

The regression coefficients themselves do not have an easy interpretation.

The only explanatory element in this minimal model is the individual's entry into the population at risk with one characteristic or another.

The relation in the diagram reads: entry into observation O at time t - 1, with X a possible cause of the occurrence of event E in the interval (t - 1, t). This representation follows the principle that the cause X precedes the effect E.

The probability of the event occurring varies depending on whether or not the individual has characteristic X. The observation interval is assumed to be short enough for the risk to be constant within it; here again, the shorter the interval, the weaker this assumption. The calculation is repeated for each time interval until the end of observation (OBE).
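The interval-by-interval calculation can be sketched as occurrence/exposure rates. If the risk is assumed constant within an interval, its estimate is the number of events in the interval divided by the person-time spent at risk in it; repeating this for every interval, separately for individuals with and without X, shows how the risk differs between the two groups. The function name, the interval boundaries and the data are hypothetical.

```python
def interval_hazards(durations, events, cut_points):
    """Occurrence/exposure rate in each interval (start, end].

    durations: time under observation for each individual
    events: 1 if observation ended with the event, 0 if censored
    cut_points: interval boundaries, e.g. [0, 1, 2, 3] for yearly intervals
    """
    rates = []
    for start, end in zip(cut_points[:-1], cut_points[1:]):
        exposure = 0.0
        n_events = 0
        for d, e in zip(durations, events):
            if d <= start:
                continue  # left observation before this interval began
            exposure += min(d, end) - start  # time spent at risk in this interval
            if e and d <= end:
                n_events += 1  # event falls inside (start, end]
        rates.append(n_events / exposure if exposure > 0 else float("nan"))
    return rates


# Hypothetical individuals with (X = 1) and without (X = 0) the characteristic
with_x = interval_hazards([0.5, 1.5, 2.0, 2.8], [1, 1, 0, 1], [0, 1, 2, 3])
without_x = interval_hazards([1.2, 2.5, 3.0, 3.0], [1, 0, 1, 0], [0, 1, 2, 3])
print("X = 1:", with_x)
print("X = 0:", without_x)
```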

Although X is not an event, it can be treated as one on the interval (t - 1, t). Indeed, if X is defined at the beginning of each interval and the calculated risk is assumed constant over the interval, we approach the causal relationship in which O, entry into observation at the beginning of the interval, is taken as an explanatory event, since one must be present at time t - 1 to be exposed to the risk in the interval (t - 1, t). This is very close to the basic causal relationship, but not quite: the effect of X is not calculated separately for each time interval but averaged over all intervals. Each variable X is therefore not tied to a particular unit of time, which distinguishes it from a cause that is an event precisely located in time.

One says that the effect of the variable is proportional to the annual risk of experiencing the event: it multiplies that risk by the same factor in every interval. This is why the Cox model is called a proportional hazards model. Let's take a very simple example with a single explanatory variable X1, for example sex, and its coefficient B1. The model is then h(t) = h0(t) · exp(B1 · X1). There are two possible cases: either the individual is exposed or not, for example either he is a man or he is not. If the individual is exposed, X1 equals 1 and the model becomes h0(t) · exp(B1). If the individual is not exposed, X1 equals 0 and the expression reduces to h0(t).

We can see that exp(B1) does not depend on t and therefore multiplies every value of h0(t). The explanatory variables are thus assumed to act on the entire hazard function, whatever the value of t.
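The two cases of the single-variable example can be checked numerically. The baseline hazards and the coefficient below are hypothetical values chosen for illustration; the point is that the ratio between the exposed and unexposed hazards is exp(B1) in every year, however h0(t) varies.

```python
import math

# Hypothetical baseline annual hazards h0(t) for years 1..4 (reference category)
baseline = [0.02, 0.05, 0.08, 0.04]
# Hypothetical coefficient B1 for the indicator X1 (e.g. being a man)
b1 = math.log(1.5)


def hazard(year_index, x1):
    """Cox hazard h0(t) * exp(B1 * X1) for a single binary variable."""
    return baseline[year_index] * math.exp(b1 * x1)


for k in range(len(baseline)):
    exposed, unexposed = hazard(k, 1), hazard(k, 0)
    # The ratio is exp(B1) = 1.5 every year: the two curves differ by a constant factor
    print(k + 1, round(exposed, 4), round(unexposed, 4), round(exposed / unexposed, 4))
```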

This proportionality assumption is quite strong and must be tested for each variable in the model. If it does not hold, the model is no longer valid, and one should then consider stratifying the sample by the offending variable. Graphical and statistical methods make it possible to test this assumption; we will see them in the next screencast.
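As a preview, here is one common graphical check, not necessarily the one the next screencast uses. Under proportional hazards the cumulative hazards satisfy H1(t) = exp(B1) · H0(t), so log H1(t) - log H0(t) should stay roughly constant (equal to B1) over time; a gap that drifts suggests the assumption fails. This minimal sketch estimates each group's cumulative hazard with the Nelson-Aalen estimator; the function names and data are hypothetical.

```python
import math


def nelson_aalen(durations, events):
    """Nelson-Aalen cumulative hazard H(t) at each distinct event time."""
    event_times = sorted({d for d, e in zip(durations, events) if e})
    cum, curve = 0.0, []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        deaths = sum(1 for d, e in zip(durations, events) if e and d == t)
        cum += deaths / at_risk
        curve.append((t, cum))
    return curve


def log_cum_hazard(curve, t):
    """log H(t) read from the step function, or None before the first event."""
    value = None
    for ti, h in curve:
        if ti > t:
            break
        value = math.log(h)
    return value


def log_hazard_gap(group1, group0, times):
    """log H1(t) - log H0(t) at each requested time (None where undefined)."""
    c1, c0 = nelson_aalen(*group1), nelson_aalen(*group0)
    gaps = []
    for t in times:
        a, b = log_cum_hazard(c1, t), log_cum_hazard(c0, t)
        gaps.append(None if a is None or b is None else a - b)
    return gaps


# Hypothetical (durations, events) for two groups of a binary variable
men = ([1, 2, 2, 4, 5], [1, 1, 0, 1, 0])
women = ([2, 3, 4, 6, 7], [1, 0, 1, 1, 0])
print(log_hazard_gap(men, women, [2, 4, 6]))
```

With samples this small the gap is very noisy; the check is only informative on realistic sample sizes.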

Thank you for your attention, and good luck with your work!