|
|
### 1. Introduction
|
|
|
|
|
|
In statistical inference about a study population, it is typically assumed that both the data and the model are given. As a result, valid inference depends on using a model that is a good representation of the data.
|
|
|
|
|
|
We should keep in mind that assessing the goodness-of-fit of a model to a dataset should never be based on a single statistic or statistical test. Rather, evaluating a model is a process of gathering evidence for and against a model or a subset of plausible models. Here we discuss three aspects: global measures of goodness-of-fit, comparison of competing models, and assessment of local lack of fit.
|
|
|
|
|
|
### 2. Global Measures of Fit
|
|
|
|
|
|
Global measures of fit compare the observed values of the response variable with the fitted (predicted) values. Two common measures are the deviance (*Dev*) and the generalized Pearson <img src="https://latex.codecogs.com/svg.latex?\chi^2" title="\chi^2" /> statistic. The deviance compares the maximum value of the likelihood function of the fitted model (<img src="https://latex.codecogs.com/svg.latex?M_1" title="M_1" />) with the maximum possible value of the likelihood function computed from the data themselves, i.e., the saturated model (<img src="https://latex.codecogs.com/svg.latex?M_y" title="M_y" />):
|
|
|
|
|
|
<img src="https://latex.codecogs.com/svg.latex?Dev&space;=&space;-2(ln(L(M_1))-ln(L(M_y)))" title="Dev = -2(ln(L(M_1))-ln(L(M_y)))" />
|
|
|
|
|
|
If the model fits the data perfectly, <img src="https://latex.codecogs.com/svg.latex?Dev&space;=&space;0" title="Dev = 0" />. In practice, <img src="https://latex.codecogs.com/svg.latex?L(M_1)&space;<&space;L(M_y)" title="L(M_1) < L(M_y)" />, and therefore <img src="https://latex.codecogs.com/svg.latex?Dev&space;>&space;0" title="Dev > 0" />. Another common global measure of fit is the generalized Pearson <img src="https://latex.codecogs.com/svg.latex?\chi^2" title="\chi^2" /> statistic:
|
|
|
|
|
|
<img src="https://latex.codecogs.com/svg.latex?\chi^2&space;=&space;\sum_i{\frac{(\mu_i-\hat{\mu}_i)^2}{\sqrt{var(\hat{\mu}_i)}}}" title="\chi^2 = \sum_i{\frac{(\mu_i-\hat{\mu}_i)^2}{\sqrt{var(\hat{\mu}_i)}}}" />
|
|
|
|
|
|
### 3. Comparing Models
|
|
|
|
|
|
There is always a trade-off between the goodness-of-fit of a model and its complexity. Models that are too simple have very low complexity but are not a good representation of the information in the data. On the other hand, models that are too complex do not provide a useful summary of the information in the data and may capture the noise in the data as if it were information.
|
|
|
|
|
|
There are two approaches to comparing different models: the *Likelihood Ratio Test* and *Information Criteria*. The likelihood ratio test is used to compare two models where one is nested within the other (i.e., one is a simpler version of the other). Information criteria weigh both the goodness-of-fit of the model to the data and its complexity, and they can be used for both nested and non-nested models.
|
|
|
|
|
|
#### 3.1 Likelihood Ratio Tests
|
|
|
|
|
|
Likelihood ratio tests are often used to compare two models where one is a special case of the other. For example, in generalized linear models they can be used to compare models with different linear predictors, or to compare models with different distributions when the distribution of one is a special case of the distribution of the other. The likelihood ratio test is a conditional test: given that the full model (the more complex model) fits the data, it tests whether the simpler model also fits the data. If <img src="https://latex.codecogs.com/svg.latex?M_0" title="M_0" /> is the simpler model and <img src="https://latex.codecogs.com/svg.latex?M_1" title="M_1" /> represents the full model, the likelihood ratio statistic equals:
|
|
|
|
|
|
<img src="https://latex.codecogs.com/svg.latex?LR&space;=&space;-2[ln(L(M_0))-ln(L(M_1))]" title="LR = -2[ln(L(M_0))-ln(L(M_1))]" />
|
|
|
|
|
|
where <img src="https://latex.codecogs.com/svg.latex?L(M_0)" title="L(M_0)" /> and <img src="https://latex.codecogs.com/svg.latex?L(M_1)" title="L(M_1)" /> are the maximum values of the likelihood function for the simple and full models, respectively.
|
|
|
|
|
|
It can be shown that, under standard regularity conditions, the statistic is asymptotically chi-square distributed with degrees of freedom equal to the difference between the numbers of parameters of the two models. Having both the *LR* statistic and the degrees of freedom, we can calculate the p-value of the test. If the p-value is less than a predefined threshold (e.g., 0.05), the two models are significantly different and the full model is considered the better fit to the data.
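
A minimal sketch of the test, continuing the assumed simulated Poisson setup from the previous example: the simpler model keeps only the intercept, the full model adds the predictor, the LR statistic is built from the two maximized log-likelihoods, and the p-value comes from the chi-square distribution with the difference in parameter counts as its degrees of freedom.

```python
# Likelihood ratio test sketch: M_0 (intercept only) is nested in M_1 (intercept + x).
import numpy as np
import scipy.stats as st
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=200)
y = rng.poisson(lam=np.exp(0.5 + 0.8 * x))      # simulated Poisson response

X1 = sm.add_constant(x)                         # full model: intercept + x
X0 = np.ones((len(y), 1))                       # simple model: intercept only

m1 = sm.GLM(y, X1, family=sm.families.Poisson()).fit()
m0 = sm.GLM(y, X0, family=sm.families.Poisson()).fit()

# LR = -2 [ln L(M_0) - ln L(M_1)], compared to a chi-square distribution.
lr = -2 * (m0.llf - m1.llf)
df = int(m1.df_model - m0.df_model)             # difference in number of parameters
p_value = st.chi2.sf(lr, df)                    # upper-tail chi-square probability

print(f"LR = {lr:.3f}, df = {df}, p-value = {p_value:.4g}")
```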
|
|
|
|
|
|
#### 3.2 Information Criteria
|
|
|
|
|
|
Information criteria can be used for both nested and non-nested models. Here we introduce two widely used information criteria: Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC). For a model, they can be calculated as:
|
|
|
|
|
|
<img src="https://latex.codecogs.com/svg.latex?AIC&space;=&space;-2ln(L(M_1))+2Q&space;\\&space;\\&space;\indent&space;BIC&space;=&space;-2ln(L(M_1))+Qln(N)" title="AIC = -2ln(L(M_1))+2Q \\ \\ \indent BIC = -2ln(L(M_1))+Qln(N)" />
|
|
|
|
|
|
where Q equals the number of parameters in the model and N is the sample size. **Smaller values of AIC and BIC indicate better models.**
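
To make the roles of Q and N concrete, the sketch below (again using the assumed simulated Poisson setup) computes AIC and BIC by hand from the maximized log-likelihood for two candidate models. Fitted statsmodels results also expose information criteria directly, but computing them from the formulas above keeps the comparison transparent.

```python
# AIC/BIC sketch: compare two candidate Poisson models fitted to the same data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 2, size=200)
y = rng.poisson(lam=np.exp(0.5 + 0.8 * x))      # simulated Poisson response

def aic_bic(result, n_obs):
    """AIC = -2 ln L + 2Q,  BIC = -2 ln L + Q ln(N)."""
    q = result.df_model + 1      # coefficients incl. intercept; Poisson has no extra scale parameter
    aic = -2 * result.llf + 2 * q
    bic = -2 * result.llf + q * np.log(n_obs)
    return aic, bic

m_simple = sm.GLM(y, np.ones((len(y), 1)), family=sm.families.Poisson()).fit()
m_full = sm.GLM(y, sm.add_constant(x), family=sm.families.Poisson()).fit()

for name, res in [("intercept only", m_simple), ("intercept + x", m_full)]:
    aic, bic = aic_bic(res, len(y))
    print(f"{name:15s} AIC = {aic:8.2f}  BIC = {bic:8.2f}")
```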
|
|
|
|
|
|
**Important:** When using AIC and BIC to compare models, the same dataset should be used for both models. This becomes relevant when some cases are excluded from a model due to missing values on some of the variables.
|
|
|
|
|
|
### 4. Local Measures of Fit
|
|
|
|
|
|
Local measures of fit deal with influential observations. A model may represent most of the data well, except for a subset of observations. These influential observations can affect the goodness-of-fit of the model to the data or the estimated parameters.
|
|
|
|
|
|
With respect to the goodness-of-fit of the model to the data, standardized residuals can be examined (e.g., Pearson residuals or deviance residuals); these should be approximately normally distributed. Adjusted residuals can also be computed, which should be distributed as <img src="https://latex.codecogs.com/svg.latex?N(0,1)" title="N(0,1)" />. To find influential observations, we can exclude an observation and re-calculate the statistic for the remaining ones. If the statistic is substantially different from the statistic calculated with the observation included, that observation is influential. Such observations can be considered outliers of the dataset.
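
As a rough sketch of these local checks under the same assumed setup, the code below inspects the standardized residuals and then runs a brute-force leave-one-out loop that refits the model without each observation and records how much the deviance changes. The planted outlier, the 2.5 cutoff, and all names are arbitrary choices made for illustration only.

```python
# Local fit sketch: residual inspection and a leave-one-out influence check.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(0, 2, size=100)
y = rng.poisson(lam=np.exp(0.5 + 0.8 * x))
y[0] = 40                                        # plant one atypical observation

X = sm.add_constant(x)
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()

# Standardized residuals should look roughly N(0, 1) if the model fits well.
pearson_resid = fit.resid_pearson
deviance_resid = fit.resid_deviance
flagged = np.where((np.abs(pearson_resid) > 2.5) | (np.abs(deviance_resid) > 2.5))[0]
print("large residuals at indices:", flagged)

# Leave-one-out check: refit without each observation and compare deviances.
delta_dev = np.empty(len(y))
for i in range(len(y)):
    keep = np.delete(np.arange(len(y)), i)
    refit = sm.GLM(y[keep], X[keep], family=sm.families.Poisson()).fit()
    delta_dev[i] = fit.deviance - refit.deviance  # big drop -> influential for global fit

most_influential = np.argsort(delta_dev)[::-1][:3]
print("largest deviance changes at indices:", most_influential)
```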
|
|