Classic linear models such as ANOVA and multiple regression are known as general linear models. These models make several assumptions:
- Response variables are assumed to be normally distributed.
- Response variables have constant variance over the values of the predictor variables.
- The mean of the response variable is a linear function of the predictor variables.
In this case, if the response variable is not normally distributed or does not have constant variance over the predictor variables, appropriate transformations must be applied to the data. Generalized linear models, which go beyond the general linear models, solve these problems by:
- allowing non-normally distributed response variables,
- allowing heteroscedasticity (non-constant variance), and
- allowing a non-linear relationship between the mean of the response variable and the predictor variables.
The generalized linear model (called GLM hereafter) is a general framework whose special cases include not only linear regression and ANOVA but also logistic regression, probit models, Poisson regression, log-linear models, and many more.
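These special cases differ mainly in which link function and response distribution they pair with the linear predictor. A minimal Python sketch of three common links and their inverse (mean) functions; the example values are illustrative, not from the text:

```python
import numpy as np

# Each GLM special case pairs a response distribution with a link function.
# A few common links and their inverses (mean functions):
links = {
    "identity": (lambda mu: mu, lambda eta: eta),   # linear regression / ANOVA
    "log":      (np.log, np.exp),                   # Poisson / log-linear models
    "logit":    (lambda mu: np.log(mu / (1 - mu)),  # logistic regression
                 lambda eta: 1 / (1 + np.exp(-eta))),
}

eta = np.linspace(-2.0, 2.0, 5)  # example linear-predictor values
for name, (link, inverse) in links.items():
    mu = inverse(eta)
    # Applying the link to the mean recovers the linear predictor.
    assert np.allclose(link(mu), eta)
```

Each inverse link maps the unbounded linear predictor into the range permitted for the mean (e.g. (0, 1) for the logit, positive values for the log).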
### 2. Three Components of a GLM
When constructing a generalized linear model, three major decisions must be made:
1. **Specifying the Random Component:**
An appropriate probability distribution for the response variable must be chosen. This should be a member of the *natural exponential family* of distributions. For psychological and behavioral research, the most commonly used members include the normal, binomial, and Poisson distributions.
2. **Specifying the Systematic Component:**
The predictor variables enter the model through a linear combination called the linear predictor, <img src="https://latex.codecogs.com/svg.latex?\eta=\boldsymbol{x}^T\boldsymbol{\beta}" title="\eta=\boldsymbol{x}^T\boldsymbol{\beta}" />.
3. **Choosing the Link Function:**
A link function <img src="https://latex.codecogs.com/svg.latex?g(\cdot)" title="g(\cdot)" /> must be chosen to connect the mean of the response distribution to the linear predictor, <img src="https://latex.codecogs.com/svg.latex?g(\mu)=\eta" title="g(\mu)=\eta" />.

If a GLM does not fit the data well, the distribution considered for the response variable may not be appropriate, the link function may not be appropriate, the linear predictor may not contain all relevant variables, or a combination of these problems may be present.
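The three decisions above can be sketched as a small simulation. This is an illustrative Python example; the coefficients, sample size, and choice of a Poisson response with a log link are assumptions for the demo:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 1000

# Systematic component: a linear predictor eta = X @ beta
# (intercept plus one hypothetical predictor).
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta = np.array([0.5, 0.3])
eta = X @ beta

# Link function: the log link connects the mean to eta, g(mu) = log(mu),
# so the mean is recovered as mu = exp(eta).
mu = np.exp(eta)

# Random component: the response is drawn from a member of the natural
# exponential family -- here, Poisson with mean mu.
y = rng.poisson(mu)

assert mu.min() > 0  # the log link keeps the Poisson mean positive
```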
### 3. GLM Parameter Estimation
The typical approach to estimating GLM parameters is maximum likelihood estimation (MLE). Given the data (the response variable) and the distribution considered for the data (the random component of the model), a likelihood function can be constructed, and the parameters that maximize it are the solution of the estimation problem. The likelihood function is the same as the distribution function, but viewed as a function of the parameters rather than as a function of the data. For example, for a Poisson distribution, the likelihood function would be:

<img src="https://latex.codecogs.com/svg.latex?L(\mu;y)=\frac{e^{-\mu}\mu^{y}}{y!}" title="L(\mu;y)=\frac{e^{-\mu}\mu^{y}}{y!}" />
which is the same as the probability mass function of the Poisson distribution but considered as a function of <img src="https://latex.codecogs.com/svg.latex?\mu" title="\mu" /> with <img src="https://latex.codecogs.com/svg.latex?y" title="y" /> fixed. Suppose we have N independent observations in a dataset. The final likelihood function would be:

<img src="https://latex.codecogs.com/svg.latex?L(\boldsymbol{\mu};\boldsymbol{y})=\prod_{i=1}^{N}\frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}" title="L(\boldsymbol{\mu};\boldsymbol{y})=\prod_{i=1}^{N}\frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}" />
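The product form can be checked numerically. A small Python sketch with hypothetical counts and a candidate mean, confirming that the summed log-likelihood equals the log of the product of individual pmf values:

```python
import math

y = [2, 0, 3, 1, 4]   # hypothetical observed counts
mu = 1.8              # candidate value of the Poisson mean

# Log-likelihood written out from the pmf: sum_i [-mu + y_i*log(mu) - log(y_i!)]
loglik = sum(-mu + yi * math.log(mu) - math.log(math.factorial(yi)) for yi in y)

# Same number as the log of the product of the individual pmf values.
pmf_product = math.prod(math.exp(-mu) * mu**yi / math.factorial(yi) for yi in y)
assert math.isclose(loglik, math.log(pmf_product))
```

In practice the log-likelihood (the sum) is maximized rather than the product, since the product of many small probabilities underflows quickly.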
In a GLM, we should note that the mean <img src="https://latex.codecogs.com/svg.latex?\mu" title="\mu" /> depends on the predictor variables through the link function:

<img src="https://latex.codecogs.com/svg.latex?g(\mu_i)=\boldsymbol{x}_i^T\boldsymbol{\beta}" title="g(\mu_i)=\boldsymbol{x}_i^T\boldsymbol{\beta}" />
As a result, the likelihood function becomes a function of the regression coefficients <img src="https://latex.codecogs.com/svg.latex?\boldsymbol{\beta}" title="\boldsymbol{\beta}" />. To solve the optimization problem of finding the best <img src="https://latex.codecogs.com/svg.latex?\boldsymbol{\beta}" title="\boldsymbol{\beta}" />, an iterative algorithm such as Newton-Raphson or Fisher scoring is usually used.
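A minimal Newton-Raphson fit for a Poisson GLM with the log link can be sketched as follows; the simulated data and coefficient values are illustrative, not from the text. For the canonical log link, Newton-Raphson and Fisher scoring coincide:

```python
import numpy as np

def fit_poisson_glm(X, y, n_iter=100, tol=1e-10):
    """Newton-Raphson (here identical to Fisher scoring) for a Poisson
    GLM with the canonical log link."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)               # inverse link applied to eta
        score = X.T @ (y - mu)              # gradient of the log-likelihood
        hessian = X.T @ (X * mu[:, None])   # negative second derivative
        step = np.linalg.solve(hessian, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data with known (illustrative) coefficients.
rng = np.random.default_rng(0)
N = 500
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))

beta_hat = fit_poisson_glm(X, y)
```

At convergence the score vector is (numerically) zero, which is exactly the first-order condition of the maximum likelihood problem.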
**Important:** In the optimization procedure, we may encounter problems such as lack of convergence, fitted values outside the permitted range, and a singular or nearly singular Hessian matrix. These problems can generally be solved by modifying the model. For example, an out-of-range estimation problem can be solved by changing the link function. Similarly, if the model has too many predictors or the predictors are highly correlated, the Hessian matrix will be singular or nearly singular; a singular Hessian can be detected by very large standard errors, and the remedy is to modify the linear predictor of the model.
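The singular-Hessian symptom can be demonstrated numerically. A sketch assuming a normal response with identity link and unit error variance, in which case the (negative) Hessian is proportional to X'X:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 1e-6 * rng.normal(size=n)        # nearly collinear with x1
X = np.column_stack([np.ones(n), x1, x2])

# Near-collinearity makes the Hessian nearly singular: its condition
# number blows up ...
hessian = X.T @ X
cond = np.linalg.cond(hessian)

# ... and the resulting standard errors become very large, which is the
# practical symptom described above (unit error variance assumed here).
se = np.sqrt(np.diag(np.linalg.inv(hessian)))
```

Dropping one of the two collinear predictors from the linear predictor restores a well-conditioned Hessian and sensible standard errors.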
### 4. Statistical Inference for Model Parameters
Statistical inference for the parameters includes hypothesis testing and the formation of confidence intervals. Here we discuss only hypothesis testing, using Wald, F, and likelihood ratio tests.
#### 4.1 Hypothesis Testing
When the dispersion parameter is known and need not be estimated, we can perform a Wald test, and the Wald statistic is compared to a chi-squared distribution. If the dispersion parameter must be estimated, an F statistic should be computed and compared to the F distribution. These two tests can be applied to a single fitted model. Likelihood ratio tests are more powerful than Wald or F tests, but they require estimating two models.
**Wald Statistics**
It can be shown that when MLE is used for parameter estimation, the sampling distribution of the parameter estimates is asymptotically multivariate normal (MVN) for large sample sizes:

<img src="https://latex.codecogs.com/svg.latex?\hat{\boldsymbol{\beta}}\sim&space;MVN(\boldsymbol{\beta},\hat{\boldsymbol{V}})" title="\hat{\boldsymbol{\beta}}\sim MVN(\boldsymbol{\beta},\hat{\boldsymbol{V}})" />

where <img src="https://latex.codecogs.com/svg.latex?\hat{\boldsymbol{V}}" title="\hat{\boldsymbol{V}}" /> is the estimated covariance matrix of the estimates.
Using this fact, the <img src="https://latex.codecogs.com/svg.latex?q^{th}" title="q^{th}" /> parameter estimate <img src="https://latex.codecogs.com/svg.latex?\hat{\beta}_q" title="\hat{\beta}_q" /> is normally distributed, <img src="https://latex.codecogs.com/svg.latex?\hat{\beta}_q&space;\sim&space;N(\beta_q,&space;\sigma_{\beta_q}^2)" title="\hat{\beta}_q \sim N(\beta_q, \sigma_{\beta_q}^2)" />. As a result, the Wald statistic to test the null hypothesis <img src="https://latex.codecogs.com/svg.latex?\beta_q=\beta_q^*" title="\beta_q=\beta_q^*" /> can be calculated as:

<img src="https://latex.codecogs.com/svg.latex?W=\left(\frac{\hat{\beta}_q-\beta_q^*}{ASE_q}\right)^2" title="W=\left(\frac{\hat{\beta}_q-\beta_q^*}{ASE_q}\right)^2" />
where <img src="https://latex.codecogs.com/svg.latex?ASE_q" title="ASE_q" /> is the asymptotic standard error (calculated during the parameter estimation procedure). The Wald statistic is then compared to a chi-squared distribution with dof=1.
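A numerical sketch of the single-parameter Wald test; the estimate and standard error below are hypothetical. It also checks the textbook identity that the chi-squared(1) p-value of the squared z statistic equals the two-sided normal p-value:

```python
import numpy as np
from scipy.stats import chi2, norm

beta_hat = 0.42    # hypothetical estimate of one coefficient
ase = 0.15         # its asymptotic standard error
beta_null = 0.0    # null-hypothesis value

z = (beta_hat - beta_null) / ase
wald = z ** 2                      # Wald statistic
p_value = chi2.sf(wald, df=1)      # compare to chi-squared with dof = 1

# The square of a standard-normal variable is chi-squared(1), so the
# chi-squared p-value matches the two-sided z-test p-value.
assert np.isclose(p_value, 2 * norm.sf(abs(z)))
```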
We can also define a more general form of the Wald statistic to simultaneously test multiple hypotheses by introducing contrasts. In this case, the null hypothesis for <img src="https://latex.codecogs.com/svg.latex?Q^*" title="Q^*" /> simultaneous tests is:

<img src="https://latex.codecogs.com/svg.latex?\boldsymbol{C}\boldsymbol{\beta}=\boldsymbol{0}" title="\boldsymbol{C}\boldsymbol{\beta}=\boldsymbol{0}" />
where <img src="https://latex.codecogs.com/svg.latex?\boldsymbol{C}" title="\boldsymbol{C}" /> is a <img src="https://latex.codecogs.com/svg.latex?Q^*\times&space;Q" title="Q^*\times Q" /> matrix of constants called the contrast matrix. The Wald statistic can then be calculated as:

<img src="https://latex.codecogs.com/svg.latex?W=(\boldsymbol{C}\hat{\boldsymbol{\beta}})^T[\boldsymbol{C}\hat{\boldsymbol{V}}\boldsymbol{C}^T]^{-1}(\boldsymbol{C}\hat{\boldsymbol{\beta}})" title="W=(\boldsymbol{C}\hat{\boldsymbol{\beta}})^T[\boldsymbol{C}\hat{\boldsymbol{V}}\boldsymbol{C}^T]^{-1}(\boldsymbol{C}\hat{\boldsymbol{\beta}})" />

where <img src="https://latex.codecogs.com/svg.latex?\hat{\boldsymbol{V}}" title="\hat{\boldsymbol{V}}" /> is the estimated covariance matrix of <img src="https://latex.codecogs.com/svg.latex?\hat{\boldsymbol{\beta}}" title="\hat{\boldsymbol{\beta}}" />,
which can be compared to a chi-squared distribution with dof=<img src="https://latex.codecogs.com/svg.latex?Q^*" title="Q^*" />.
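The contrast form takes only a few lines of linear algebra. The coefficients, covariance matrix, and contrast matrix below are hypothetical:

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical fitted coefficients and their estimated covariance matrix.
beta_hat = np.array([0.9, 0.40, 0.35])
V = np.diag([0.04, 0.010, 0.012])

# Contrast matrix testing two hypotheses at once (Q* = 2):
#   beta_1 = 0   and   beta_1 - beta_2 = 0
C = np.array([[0.0, 1.0,  0.0],
              [0.0, 1.0, -1.0]])

Cb = C @ beta_hat
wald = Cb @ np.linalg.inv(C @ V @ C.T) @ Cb
p_value = chi2.sf(wald, df=C.shape[0])   # dof = Q*, the number of contrasts
```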
**F-Tests**
For models where <img src="https://latex.codecogs.com/svg.latex?\phi" title="\phi" /> is estimated, there is extra variability due to the estimation of <img src="https://latex.codecogs.com/svg.latex?\phi" title="\phi" /> that needs to be taken into account. For a single-parameter test, the statistic is the square root of the Wald statistic (the z score); however, the sampling distribution is Student's t-distribution with <img src="https://latex.codecogs.com/svg.latex?dof&space;=&space;N-Q" title="dof = N-Q" />. Alternatively, we can square the test statistic (the Wald statistic) and compare the result to an F-distribution with <img src="https://latex.codecogs.com/svg.latex?\nu_1=1" title="\nu_1=1" /> and <img src="https://latex.codecogs.com/svg.latex?\nu_2=N-Q" title="\nu_2=N-Q" />. For the general form with contrasts, the F statistic is:

<img src="https://latex.codecogs.com/svg.latex?F=\frac{(\boldsymbol{C}\hat{\boldsymbol{\beta}})^T[\boldsymbol{C}\hat{\boldsymbol{V}}\boldsymbol{C}^T]^{-1}(\boldsymbol{C}\hat{\boldsymbol{\beta}})}{Q^*}" title="F=\frac{(\boldsymbol{C}\hat{\boldsymbol{\beta}})^T[\boldsymbol{C}\hat{\boldsymbol{V}}\boldsymbol{C}^T]^{-1}(\boldsymbol{C}\hat{\boldsymbol{\beta}})}{Q^*}" />
This statistic is then compared to an F distribution with <img src="https://latex.codecogs.com/svg.latex?\nu_1&space;=&space;Q^*" title="\nu_1 = Q^*" /> and <img src="https://latex.codecogs.com/svg.latex?\nu_2&space;=&space;N-Q" title="\nu_2 = N-Q" />.
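A quick numerical check of the relation between the t and F versions of the single-parameter test; the sample size and statistic value are made up:

```python
import numpy as np
from scipy.stats import f, t

N, Q = 50, 3        # hypothetical sample size and number of coefficients
t_stat = 2.1        # hypothetical single-parameter test statistic
df = N - Q

# Squaring the t statistic gives an F statistic with nu1 = 1, nu2 = N - Q,
# and the one-tailed F p-value equals the two-tailed t p-value.
F_stat = t_stat ** 2
assert np.isclose(f.sf(F_stat, 1, df), 2 * t.sf(abs(t_stat), df))
```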
**Likelihood Ratio Tests**
Suppose that we wish to test the hypothesis that <img src="https://latex.codecogs.com/svg.latex?Q^*" title="Q^*" /> regression coefficients are all equal to zero. We can construct a model including all the coefficients (the full model) and a model excluding the <img src="https://latex.codecogs.com/svg.latex?Q^*" title="Q^*" /> coefficients (the nested, or simpler, model). Then we can calculate the LR statistic as:

<img src="https://latex.codecogs.com/svg.latex?G^2=-2(\mathcal{L}_0-\mathcal{L}_1)" title="G^2=-2(\mathcal{L}_0-\mathcal{L}_1)" />

where <img src="https://latex.codecogs.com/svg.latex?\mathcal{L}_0" title="\mathcal{L}_0" /> and <img src="https://latex.codecogs.com/svg.latex?\mathcal{L}_1" title="\mathcal{L}_1" /> are the maximized log-likelihoods of the nested and full models, respectively.
This statistic can then be compared to a chi-squared distribution with <img src="https://latex.codecogs.com/svg.latex?dof&space;=&space;Q^*" title="dof = Q^*" />.
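A self-contained sketch of a likelihood ratio test. To keep the log-likelihoods closed-form, this example assumes a normal response with known unit variance (an assumption for the demo, not the general GLM case), in which case the LR statistic reduces to the drop in residual sum of squares between the nested and full models:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(7)
N = 100
x = rng.normal(size=N)
y = 1.0 + 0.5 * x + rng.normal(size=N)   # illustrative true slope of 0.5

# Full model: intercept + slope; nested model: intercept only (Q* = 1).
X1 = np.column_stack([np.ones(N), x])
X0 = np.ones((N, 1))

def rss(X):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sum((y - X @ beta) ** 2)

# With unit variance the maximized log-likelihood is -RSS/2 + const,
# so G^2 = -2(L0 - L1) = RSS0 - RSS1.
G2 = rss(X0) - rss(X1)
p_value = chi2.sf(G2, df=1)              # dof = Q* excluded coefficients
```

A small p-value leads to rejecting the nested model in favor of the full model, i.e. concluding that the excluded coefficient is not zero.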