Short introduction to Generalized Linear Models

From Teachwiki

Short introduction to Generalized Linear Models (GLM)

Linear regression

In linear regression we assume a model of the form Y_i = b_0 + b_1 x_{1,i} + ... + b_p x_{p,i} + \epsilon_i, i = 1, ..., n, with Y_i a random variable, x_{1,i}, ..., x_{p,i} fixed values and \epsilon_i a random variable with E(\epsilon_i) = 0 describing the error term, so that E(Y_i|x_{1,i},...,x_{p,i}) = b_0 + b_1 x_{1,i} + ... + b_p x_{p,i}. The distribution of Y_i is determined by the distribution of \epsilon_i, which is usually assumed to be normal.
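A minimal Python sketch of this model for p = 1: it simulates Y_i = b_0 + b_1 x_i + \epsilon_i with normal errors and recovers b_0 and b_1 by ordinary least squares. The coefficient values, the noise level and the helper name fit_simple_ols are illustrative assumptions, not part of the original text.

```python
import random

def fit_simple_ols(x, y):
    """Least-squares estimates (b0, b1) for y_i = b0 + b1*x_i + eps_i."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Simulate Y_i = 1 + 2*x_i + eps_i with eps_i ~ N(0, 0.5^2) (illustrative values).
random.seed(1)
x = [i / 10 for i in range(100)]
y = [1.0 + 2.0 * xi + random.gauss(0.0, 0.5) for xi in x]
b0_hat, b1_hat = fit_simple_ols(x, y)
print(round(b0_hat, 2), round(b1_hat, 2))
```

With normal errors the estimates should land close to the true values 1 and 2.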

But what happens if Y_i is not normally distributed, e.g. if Y_i is a zero-one variable describing a failure (usually coded as 0) or a success (usually coded as 1)? Can we extend the linear model such that the easy interpretability of the coefficients is preserved?

Basic generalized linear model

The linear model can be extended as follows:

E(Y|x_1,...,x_p) = G(b_0 + b_1 x_1 + ... + b_p x_p) = G(\eta) = \mu

where \eta = b_0 + b_1 x_1 + ... + b_p x_p is the linear predictor, G is a fixed link function depending on the distribution of Y, and all other parameters are as in the linear model.
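To illustrate the model equation, the following sketch computes \eta and \mu = G(\eta) for a zero-one response, assuming the logistic function G(\eta) = \exp(\eta)/(1+\exp(\eta)) (the Bernoulli entry of the table in the Link functions section). The coefficients, the observation and the function names are hypothetical.

```python
import math

def linear_predictor(b, x):
    """eta = b_0 + b_1 x_1 + ... + b_p x_p; b has length p+1, x has length p."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))

def logistic(eta):
    """G(eta) = exp(eta) / (1 + exp(eta)), written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

# Hypothetical coefficients b = (b_0, b_1, b_2) and one observation x = (x_1, x_2).
b = [-1.0, 0.5, 2.0]
x = [2.0, 0.25]
eta = linear_predictor(b, x)   # -1 + 0.5*2 + 2*0.25 = 0.5
mu = logistic(eta)             # model for P(Y = 1 | x) of a zero-one response
```

The coefficients still act linearly on \eta; only the mapping from \eta to \mu is non-linear.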

In contrast to the linear model, we may not model the variable Y directly but, as in the case of a zero-one variable, the probability P(Y=1|x_1,...,x_p) = E(Y|x_1,...,x_p). This requires a framework for handling different distributions of Y.

Exponential family

Y is called a member of the exponential family if we can write the density or probability function of Y as

f(y, \theta, \psi) = \exp\left(\frac{y\theta - b(\theta)}{a(\psi)} + c(y,\psi)\right).
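The definition translates directly into a generic function that takes b, a and c as arguments. The sketch below (the name exp_family_density is hypothetical) checks it against the Poisson distribution, assuming the Poisson parametrization \theta = \log(\mu), b(\theta) = \exp(\theta), a(\psi) = 1, c(y,\psi) = -\log(y!) listed in the table of the Link functions section.

```python
import math

def exp_family_density(y, theta, psi, b, a, c):
    """f(y, theta, psi) = exp((y*theta - b(theta)) / a(psi) + c(y, psi)).

    b, a and c are passed in as functions, so one routine covers every
    member of the family."""
    return math.exp((y * theta - b(theta)) / a(psi) + c(y, psi))

# Poisson example: theta = log(mu), b = exp, a(psi) = 1, c(y, psi) = -log(y!).
mu = 3.0
f_ef = exp_family_density(
    y=2, theta=math.log(mu), psi=None,
    b=math.exp, a=lambda psi: 1.0, c=lambda y, psi: -math.lgamma(y + 1),
)
# Usual Poisson probability P(Y = 2) = mu^2 exp(-mu) / 2!
f_direct = mu ** 2 * math.exp(-mu) / math.factorial(2)
```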

Example (Normal distribution):

Y \sim N(\mu,\sigma^2) is a member of the exponential family with

<latex template="eqnarray.tex"> E(Y) &=& \mu,\ Var(Y)=\sigma^2\\ f(y) &=& \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right)\\ &=& \exp\left[-\frac{1}{2}\log(2\pi\sigma^2)-\frac{(y-\mu)^2}{2\sigma^2}\right]\\ &=& \exp\left[\underbrace{-\frac{1}{2}\log(2\pi\sigma^2)-\frac{y^2}{2\sigma^2}}_{=c(y,\psi)} + \underbrace{\frac{1}{\sigma^2}}_{=1/a(\psi)} \left(y\underbrace{\mu}_{=\theta}-\underbrace{\frac{\mu^2}{2}}_{=b(\theta)}\right)\right]\\ \mu &=& \theta\\ \psi&=&\sigma\\ b(\theta) &=& \frac{\theta^2}{2}\\ a(\psi)&=& \psi^2\\ c(y,\psi)&=& -\frac{1}{2}\log(2\pi\psi^2)-\frac{y^2}{2\psi^2} </latex>
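A quick numerical check of this decomposition, as a sketch: both functions below should return the same density value. The mean, the standard deviation and the evaluation points are arbitrary illustrative choices.

```python
import math

def normal_pdf(y, mu, sigma):
    """Usual N(mu, sigma^2) density."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def normal_exp_family(y, mu, sigma):
    """Same density as exp((y*theta - b(theta)) / a(psi) + c(y, psi)) with
    theta = mu, psi = sigma, b(theta) = theta^2/2 and a(psi) = psi^2."""
    theta, psi = mu, sigma
    b = theta ** 2 / 2
    a = psi ** 2
    c = -0.5 * math.log(2 * math.pi * psi ** 2) - y ** 2 / (2 * psi ** 2)
    return math.exp((y * theta - b) / a + c)

# Compare both forms at a few points for mu = 1.5, sigma = 0.8.
checks = [(y, normal_pdf(y, 1.5, 0.8), normal_exp_family(y, 1.5, 0.8))
          for y in (-1.0, 0.0, 1.5, 3.2)]
```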

Example (Binomial distribution):

Y \sim B(n, \mu/n) is a member of the exponential family with

<latex template="eqnarray.tex"> E(Y) &=& n\frac{\mu}{n} =\mu,\ Var(Y)=n\frac{\mu}{n}\left(1-\frac{\mu}{n}\right) = \mu\left(1-\frac{\mu}{n}\right)\\ P(Y=y) &=& {n \choose y} \left(\frac{\mu}{n}\right)^y\left(1-\frac{\mu}{n}\right)^{n-y} = {n \choose y} \left(\frac{\frac{\mu}{n}}{1-\frac{\mu}{n}}\right)^y\left(1-\frac{\mu}{n}\right)^{n}\\ &=& \exp\left[\log{n \choose y} +y \log\left(\frac{\frac{\mu}{n}}{1-\frac{\mu}{n}}\right) + n \log\left(1-\frac{\mu}{n}\right) \right]\\ &=& \exp\left[\underbrace{\log{n \choose y}}_{=c(y,\psi)} +y \underbrace{\log\left(\frac{\frac{\mu}{n}}{1-\frac{\mu}{n}}\right)}_{=\theta} - \underbrace{-n \log\left(1-\frac{\mu}{n}\right)}_{=b(\theta)} \right]\\ \mu &=&\frac{n\exp(\theta)}{1+\exp(\theta)}\\ \psi&=&\mbox{ unused }\\ b(\theta) &=& n\log(1+\exp(\theta))\\ a(\psi) &=& 1\\ c(y,\psi) &=& \log{n \choose y} </latex>
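The same kind of check for the Binomial case, as a sketch with the illustrative values n = 10 and \mu = 3: the exponential-family form should reproduce the usual probability function, sum to one and have mean \mu. It assumes Python 3.8 or later for math.comb.

```python
import math

def binomial_pmf(y, n, mu):
    """P(Y = y) for Y ~ B(n, mu/n), i.e. success probability p = mu/n."""
    p = mu / n
    return math.comb(n, y) * p ** y * (1 - p) ** (n - y)

def binomial_exp_family(y, n, mu):
    """Same probability via theta = log(p/(1-p)), b(theta) = n*log(1+exp(theta)),
    a(psi) = 1 and c(y, psi) = log(n choose y)."""
    p = mu / n
    theta = math.log(p / (1 - p))
    b = n * math.log(1 + math.exp(theta))
    c = math.log(math.comb(n, y))
    return math.exp(y * theta - b + c)

n, mu = 10, 3.0  # illustrative values, E(Y) = mu = 3
probs = [binomial_exp_family(y, n, mu) for y in range(n + 1)]
```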

Common properties

Under some regularity conditions, the following properties hold for every member of the exponential family:

  • E(Y) = b^\prime(\theta)
  • Var(Y) = b^{\prime\prime}(\theta)a(\psi)
  • l(Y,\theta,\psi) = \sum_{i=1}^n \left(\frac{y_i \theta_i - b(\theta_i)}{a(\psi)} + c(y_i, \psi)\right).
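The first two properties can be checked numerically. The sketch below assumes the Poisson case (\theta = \log(\mu), b(\theta) = \exp(\theta), a(\psi) = 1), computes E(Y) and Var(Y) directly from the probability function, and compares them with finite-difference approximations of b^\prime(\theta) and b^{\prime\prime}(\theta)a(\psi). The step size h and the truncation at y = 100 are assumptions chosen for this example.

```python
import math

def central_diff(f, t, h=1e-4):
    """First derivative of f at t by central differences."""
    return (f(t + h) - f(t - h)) / (2 * h)

def second_diff(f, t, h=1e-4):
    """Second derivative of f at t by central differences."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / h ** 2

# Poisson: theta = log(mu), b(theta) = exp(theta), a(psi) = 1.
mu = 4.0
theta = math.log(mu)
b = math.exp
a_psi = 1.0

# Moments from the probability function, truncated at y = 100
# (the remaining probability mass is negligible for mu = 4).
pmf = [math.exp(y * theta - b(theta) - math.lgamma(y + 1)) for y in range(101)]
mean = sum(y * p for y, p in enumerate(pmf))
var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))

mean_from_b = central_diff(b, theta)        # approximates b'(theta)
var_from_b = second_diff(b, theta) * a_psi  # approximates b''(theta) a(psi)
```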

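The log-likelihood l(Y,\theta,\psi) depends on the coefficients b_0, ..., b_p through \theta_i, since \mu_i = b^\prime(\theta_i) = G(\eta_i). It is maximized with respect to the coefficients by iterative methods, e.g. the Newton-Raphson method.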
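A minimal sketch of the Newton-Raphson iteration for the Bernoulli case with the logit link and a single covariate, \eta_i = b_0 + b_1 x_i. For this (canonical) link the score is \sum_i (y_i - \mu_i)(1, x_i)^T and the negative Hessian is \sum_i \mu_i(1-\mu_i)(1, x_i)(1, x_i)^T, so each step solves a 2x2 linear system. The data set and the function names are illustrative assumptions.

```python
import math

def fit_logistic_newton(x, y, iterations=25, tol=1e-10):
    """Maximise the Bernoulli log-likelihood for eta_i = b0 + b1*x_i with the
    logit link (theta_i = eta_i) by Newton-Raphson. Returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Score g = sum (y_i - mu_i)(1, x_i) and information
        # H = sum mu_i (1 - mu_i)(1, x_i)(1, x_i)^T at the current estimate.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu
            g1 += (yi - mu) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve H * delta = g for the 2x2 case.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def log_likelihood(x, y, b0, b1):
    """l = sum_i [y_i*theta_i - log(1 + exp(theta_i))], theta_i = b0 + b1*x_i."""
    return sum(yi * (b0 + b1 * xi) - math.log1p(math.exp(b0 + b1 * xi))
               for xi, yi in zip(x, y))

# Small illustrative data set; the two classes overlap, so the
# maximum-likelihood estimate exists.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
b0_hat, b1_hat = fit_logistic_newton(x, y)
```

At the maximum the score equations \sum_i (y_i - \mu_i) = 0 and \sum_i (y_i - \mu_i) x_i = 0 hold.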

Link functions

The following table shows the parameters and the canonical link functions for some distributions. For the canonical link \theta = \eta, so that G(\eta) = b^\prime(\eta):

Distribution | Range of y | \theta | \psi | b(\theta) | c(y,\psi) | b^{\prime\prime}(\theta) | a(\psi) | G(\eta)
--- | --- | --- | --- | --- | --- | --- | --- | ---
Bernoulli B(1,\mu) | \{0,1\} | \log\left(\frac{\mu}{1-\mu}\right) | unused | \log(1+\exp(\theta)) | 0 | \mu(1-\mu) | 1 | \frac{\exp(\eta)}{1+\exp(\eta)}
Binomial B(n,\mu/n), n known | \{0,1,2,...,n\} | \log\left(\frac{\mu}{n-\mu}\right) | unused | n\log(1+\exp(\theta)) | \log{n \choose y} | \mu(1-\mu/n) | 1 | \frac{n\exp(\eta)}{1+\exp(\eta)}
Poisson Po(\mu) | \{0,1,2,...\} | \log(\mu) | unused | \exp(\theta) | -\log(y!) | \mu | 1 | \exp(\eta)
Negative Binomial NB(n,\mu), n known | \{0,1,2,...\} | \log\left(\frac{\mu}{n+\mu}\right) | unused | -n\log(1-\exp(\theta)) | \log{n+y-1 \choose n-1} | \mu(1+\mu/n) | 1 | \frac{n\exp(\eta)}{1-\exp(\eta)}
Normal N(\mu,\sigma^2) | (-\infty,\infty) | \mu | \sigma^2 | \theta^2/2 | -0.5\left(\frac{y^2}{\psi}+\log(2\pi\psi)\right) | 1 | \sigma^2 | \eta
Gamma \Gamma(\mu,\nu) | (0,\infty) | -\frac{1}{\mu} | \nu | -\log(-\theta) | \psi\log(\psi y)-\log(y)-\log(\Gamma(\psi)) | \mu^2 | \frac{1}{\nu} | -\frac{1}{\eta}
Inverse Gaussian IG(\mu,\sigma^2) | (0,\infty) | -\frac{1}{2\mu^2} | \sigma^2 | -\sqrt{-2\theta} | -0.5\left\{\log(2\pi\psi y^3) +\frac{1}{\psi y}\right\} | \mu^3 | \sigma^2 | \frac{1}{\sqrt{-2\eta}}

Note: For all distributions in the table the parameters of Y are scaled such that E(Y)=\mu (see the example for the Binomial distribution); the densities and probability functions are taken from Rinne (2003).
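For the canonical link, b^\prime(\theta) and G(\theta) must both return \mu. The sketch below checks this numerically for each distribution, with \theta, b(\theta) and G(\eta) coded under the scaling E(Y) = \mu stated in the note and an illustrative known n = 5; the mean values are arbitrary points inside each range, and the names rows and b_prime are hypothetical.

```python
import math

def b_prime(b, theta, h=1e-6):
    """Central-difference approximation of b'(theta)."""
    return (b(theta + h) - b(theta - h)) / (2 * h)

n = 5  # illustrative known n for the Binomial and Negative Binomial rows

# Per row: (theta as a function of mu, b(theta), G(eta), a mean mu inside the range).
rows = {
    "Bernoulli": (lambda mu: math.log(mu / (1 - mu)),
                  lambda t: math.log1p(math.exp(t)),
                  lambda eta: math.exp(eta) / (1 + math.exp(eta)), 0.3),
    "Binomial": (lambda mu: math.log(mu / (n - mu)),
                 lambda t: n * math.log1p(math.exp(t)),
                 lambda eta: n * math.exp(eta) / (1 + math.exp(eta)), 2.0),
    "Poisson": (math.log, math.exp, math.exp, 4.0),
    "Negative Binomial": (lambda mu: math.log(mu / (n + mu)),
                          lambda t: -n * math.log1p(-math.exp(t)),
                          lambda eta: n * math.exp(eta) / (1 - math.exp(eta)), 2.0),
    "Normal": (lambda mu: mu, lambda t: t * t / 2, lambda eta: eta, -1.2),
    "Gamma": (lambda mu: -1 / mu, lambda t: -math.log(-t), lambda eta: -1 / eta, 2.5),
    "Inverse Gaussian": (lambda mu: -1 / (2 * mu * mu),
                         lambda t: -math.sqrt(-2 * t),
                         lambda eta: 1 / math.sqrt(-2 * eta), 1.5),
}

for name, (theta_of_mu, b, G, mu) in rows.items():
    theta = theta_of_mu(mu)
    print(f"{name:18s} mu = {mu:5.2f}  b'(theta) = {b_prime(b, theta):.6f}"
          f"  G(theta) = {G(theta):.6f}")
```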

Even for the same distribution of Y we can use different link functions. Only for the canonical link (the one in the table) does \eta coincide with \theta; in general the link is written as \eta = G^{-1}(\mu), e.g.

  • for Bernoulli
    • logit: \eta=\log\left(\frac{\mu}{1-\mu}\right) (the canonical link, see table above)
    • probit: \eta=\Phi^{-1}(\mu), with \Phi the cumulative distribution function of the standard normal distribution
    • complementary log-log: \eta=\log(-\log(1-\mu))
  • and in general (if \mu is positive)
    • power: \eta=\begin{cases} \mu^{\lambda} & \mbox{ if } \lambda\neq0 \\ \log(\mu) & \mbox{ if } \lambda=0\end{cases}
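As a sketch, these alternative links can be written as pairs (\eta = G^{-1}(\mu), \mu = G(\eta)). The code uses statistics.NormalDist from the Python standard library (Python 3.8 or later) for \Phi and \Phi^{-1}; the names links and power_link are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the probit link

# Each entry: (link eta = G^{-1}(mu), inverse mu = G(eta)) for a Bernoulli mean.
links = {
    "logit": (lambda mu: math.log(mu / (1 - mu)),
              lambda eta: 1 / (1 + math.exp(-eta))),
    "probit": (STD_NORMAL.inv_cdf, STD_NORMAL.cdf),
    "cloglog": (lambda mu: math.log(-math.log(1 - mu)),
                lambda eta: 1 - math.exp(-math.exp(eta))),
}

def power_link(lam):
    """Power link eta = mu**lam (lam != 0) or log(mu) (lam == 0), for mu > 0.
    Returns the pair (link, inverse link)."""
    if lam == 0:
        return math.log, math.exp
    return (lambda mu: mu ** lam), (lambda eta: eta ** (1 / lam))

mu = 0.2
for name, (g, G) in links.items():
    eta = g(mu)
    print(f"{name:8s} eta = {eta: .4f}  G(eta) = {G(eta):.4f}")
```

The three Bernoulli links map the same \mu to different values of \eta, but each inverse returns the original \mu.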


References

  • W. Härdle, M. Müller, S. Sperlich, A. Werwatz (2004). Nonparametric and Semiparametric Models, Springer Verlag, Heidelberg.
  • P. McCullagh, J.A. Nelder (1989). Generalized Linear Models, Chapman & Hall, London.
  • H. Rinne (2003). Taschenbuch der Statistik, 3rd edition, Verlag Harri Deutsch.
  • B. Rönz (1999). Modelling the perception of current and prospective economic situation, Statistics Research Report No. 99.002, The Australian National University.