Generalized functional linear model


The Generalized Functional Linear Model (GFLM) is an extension of the Generalized Linear Model (GLM) that allows one to regress univariate responses of various types (continuous or discrete) on functional predictors, which are mostly random trajectories generated by a square-integrable stochastic process. As in the GLM, a link function relates the expected value of the response variable to a linear predictor, which in the case of the GFLM is obtained by forming the scalar product of the random predictor function $X(\cdot)$ with a smooth parameter function $\beta(\cdot)$. Functional linear regression, functional Poisson regression and functional binomial regression, with the important special case of functional logistic regression, are special cases of the GFLM. Applications of the GFLM include classification and discrimination of stochastic processes and functional data.

Overview

A key aspect of the GFLM is estimation of, and inference for, the smooth parameter function $\beta(t)$, which is usually obtained by dimension reduction of the infinite dimensional functional predictor. A common method is to expand the predictor function $X(t)$ in an orthonormal basis of $L^2$ space, the Hilbert space of square integrable functions, with a simultaneous expansion of the parameter function in the same basis. This reduces the contribution of the parameter function to the linear predictor to a finite number of regression coefficients. Functional principal component analysis (FPCA), which employs the Karhunen–Loève expansion, is a common and parsimonious approach to accomplish this, although other orthonormal expansions, like Fourier expansions and B-spline expansions, may also be employed for the dimension reduction step. The Akaike information criterion (AIC) can be used for selecting the number of included components, while minimization of cross-validation prediction error is another criterion often used in classification applications. Once the dimension of the predictor process has been reduced, the simplified linear predictor allows the use of GLM and quasi-likelihood estimation techniques to obtain estimates of the finite dimensional regression coefficients, which in turn provide an estimate of the parameter function in the GFLM.
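
As a toy illustration of this dimension reduction step, the following sketch (with an arbitrarily chosen Fourier-type basis, grid, and truncation level, none of which come from the sources of this article) expands a smooth function in an orthonormal basis and truncates the expansion:

```python
import numpy as np

# Grid on T = [0, 1]; dw is Lebesgue measure, approximated by quadrature weights.
t = np.linspace(0, 1, 201)
w = np.gradient(t)                        # quadrature weights for integrals over T

# Orthonormal Fourier-type basis on [0, 1]: constant, then sin/cos pairs.
def basis(j, t):
    if j == 0:
        return np.ones_like(t)
    k = (j + 1) // 2
    return np.sqrt(2) * (np.sin(2*np.pi*k*t) if j % 2 else np.cos(2*np.pi*k*t))

beta = np.exp(-t) * np.sin(3 * t)         # an arbitrary smooth parameter function
p = 5                                     # truncation level
coef = [np.sum(beta * basis(j, t) * w) for j in range(p)]   # beta_j = ∫ beta ρ_j dw
beta_p = sum(c * basis(j, t) for j, c in enumerate(coef))   # p-truncated expansion
print(np.max(np.abs(beta - beta_p)))      # approximation error shrinks as p grows
```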

Model Components

Linear Predictor

The predictor function $X(t)$, for $t$ in a real interval $\mathcal{T}$, is typically a square integrable stochastic process, and the unknown smooth parameter function $\beta(t)$ is assumed to be square integrable on $\mathcal{T}$. Given a real measure $dw$ on $\mathcal{T}$, the linear predictor is given by $\eta = \alpha + \int_{\mathcal{T}} X(t)\,\beta(t)\,dw(t)$, where $\alpha$ is the intercept. Inclusion of the intercept allows us to require $E(X(t)) = 0$ for all $t$ in $\mathcal{T}$.
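
On densely sampled data, the integral defining the linear predictor is typically approximated by numerical quadrature. A minimal sketch, assuming a uniform grid and Lebesgue measure for $dw$ (the trajectory and parameter function are invented for illustration):

```python
import numpy as np

t = np.linspace(0, 1, 201)                # grid on T
w = np.gradient(t)                        # quadrature weights approximating dw(t)
X = np.sin(2 * np.pi * t) + 0.1 * np.random.randn(t.size)  # one predictor trajectory
beta = 2 * np.cos(np.pi * t)              # parameter function on the same grid
alpha = 0.5                               # intercept

eta = alpha + np.sum(X * beta * w)        # eta = alpha + ∫ X(t) beta(t) dw(t)
```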

Response Variable and Variance Function

The outcome $Y$ is typically a real valued random variable which may be either continuous or discrete. Often the conditional distribution of $Y$ given the predictor process $X(\cdot)$ is specified within the exponential family. However, it is also sufficient to consider the functional quasi-likelihood setup, where instead of the distribution of the response one specifies the conditional variance function $\sigma^2(\mu) = \operatorname{Var}(Y \mid X)$ as a function of the conditional mean $\mu = E(Y \mid X)$.

The link function $g$ is a smooth invertible function that relates the conditional mean of the response $\mu = E(Y \mid X)$ with the linear predictor $\eta$. The relationship is given by $\mu = g(\eta)$.
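
For example, in functional logistic regression the mean is related to the linear predictor through the expit (inverse-logit) function, and the variance function is that of a Bernoulli response. A minimal sketch:

```python
import numpy as np

def g(eta):                  # inverse-logit link: mu = g(eta)
    return 1.0 / (1.0 + np.exp(-eta))

def variance(mu):            # Bernoulli variance function: sigma^2(mu) = mu(1 - mu)
    return mu * (1.0 - mu)

eta = 0.3
mu = g(eta)                  # conditional mean E(Y | X)
print(mu, variance(mu))
```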

Formulation

In order to implement the necessary dimension reduction, the centered predictor function $X^c(t) = X(t) - E(X(t))$ and the parameter function $\beta(t)$ are expanded as

$X^c(t) = \sum_{j=1}^{\infty} \varepsilon_j \rho_j(t)$ and $\beta(t) = \sum_{j=1}^{\infty} \beta_j \rho_j(t)$,

where $\{\rho_j,\ j = 1, 2, \ldots\}$ is an orthonormal basis of the function space $L^2$ such that $\int_{\mathcal{T}} \rho_j(t)\,\rho_k(t)\,dw(t) = \delta_{jk}$ (the Kronecker delta).
The random variables $\varepsilon_j$ are given by $\varepsilon_j = \int_{\mathcal{T}} X^c(t)\,\rho_j(t)\,dw(t)$ and the coefficients $\beta_j$ by $\beta_j = \int_{\mathcal{T}} \beta(t)\,\rho_j(t)\,dw(t)$ for $j = 1, 2, \ldots$.
Note that $E(\varepsilon_j) = 0$ and $\sum_{j=1}^{\infty} \beta_j^2 < \infty$, and, denoting $\sigma_j^2 = \operatorname{Var}(\varepsilon_j) = E(\varepsilon_j^2)$, we also have $\sum_{j=1}^{\infty} \sigma_j^2 = \int_{\mathcal{T}} \operatorname{Var}(X(t))\,dw(t) < \infty$.
From the orthonormality of the basis functions $\rho_j$, it follows immediately that $\int_{\mathcal{T}} X^c(t)\,\beta(t)\,dw(t) = \sum_{j=1}^{\infty} \varepsilon_j \beta_j$.
Therefore $\eta = \alpha^* + \sum_{j=1}^{\infty} \beta_j \varepsilon_j$, where $\alpha^* = \alpha + \int_{\mathcal{T}} E(X(t))\,\beta(t)\,dw(t)$ is a constant term that can be absorbed into the intercept for the required analysis.
The key step is then approximating $\eta$ by the truncated linear predictor $\eta_p = \alpha^* + \sum_{j=1}^{p} \beta_j \varepsilon_j$ for a suitably chosen truncation point $p$.
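
A short numerical check of these identities, reusing the Fourier-type basis from the earlier sketch (all functions invented for illustration); the truncated sum $\sum_{j \le p} \varepsilon_j \beta_j$ approaches the integral $\int_{\mathcal{T}} X^c(t)\,\beta(t)\,dw(t)$ as $p$ grows:

```python
import numpy as np

t = np.linspace(0, 1, 201)
w = np.gradient(t)

def rho(j, t):               # orthonormal Fourier-type basis on [0, 1]
    if j == 0:
        return np.ones_like(t)
    k = (j + 1) // 2
    return np.sqrt(2) * (np.sin(2*np.pi*k*t) if j % 2 else np.cos(2*np.pi*k*t))

rng = np.random.default_rng(0)
Xc = sum(rng.normal(scale=1/(j+1)) * rho(j, t) for j in range(20))  # centered trajectory
beta = np.exp(-t)

p = 10
eps   = np.array([np.sum(Xc   * rho(j, t) * w) for j in range(p)])  # scores eps_j
bcoef = np.array([np.sum(beta * rho(j, t) * w) for j in range(p)])  # coefficients beta_j

lhs = np.sum(Xc * beta * w)      # ∫ X^c(t) beta(t) dw(t)
rhs = eps @ bcoef                # truncated sum over the first p components
print(lhs, rhs)                  # close for moderate p; equal in the limit p → ∞
```
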
FPCA gives the most parsimonious approximation of the linear predictor: for a fixed number $p$ of basis functions, the eigenfunction basis explains more of the variation in the predictor process than any other set of $p$ basis functions.
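
A minimal FPCA sketch via eigendecomposition of the discretized sample covariance (the simulated data and grid are invented; practical implementations typically add smoothing and handle irregular designs):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 200, 101
t = np.linspace(0, 1, m)
h = t[1] - t[0]                        # uniform grid spacing

# Simulated trajectories: random combinations of two smooth modes plus noise.
X = (rng.normal(size=(n, 1)) * np.sin(np.pi * t)
     + 0.5 * rng.normal(size=(n, 1)) * np.cos(2 * np.pi * t)
     + 0.05 * rng.normal(size=(n, m)))

Xc = X - X.mean(axis=0)                # center the trajectories
G = (Xc.T @ Xc) / n                    # discretized covariance kernel G(s, t)
lam, phi = np.linalg.eigh(G * h)       # eigenpairs of the discretized integral operator
order = np.argsort(lam)[::-1]          # sort eigenvalues in decreasing order
lam, phi = lam[order], phi[:, order] / np.sqrt(h)   # rescale so ∫ phi_j(t)^2 dt = 1

p = 2
scores = Xc @ phi[:, :p] * h           # FPC scores eps_ij = ∫ Xc_i(t) phi_j(t) dt
print(lam[:p])                         # leading eigenvalues of the covariance kernel
```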


For a differentiable link function with bounded first derivative, the approximation error of the $p$-truncated model, i.e. the linear predictor truncated to the summation of the first $p$ components, is a constant multiple of $E\big(\sum_{j=p+1}^{\infty} \beta_j \varepsilon_j\big)^2$.
A heuristic motivation for the truncation strategy derives from the fact that

$E\Big(\sum_{j=p+1}^{\infty} \beta_j \varepsilon_j\Big)^2 \le \Big(\sum_{j=p+1}^{\infty} \beta_j^2\Big)\Big(\sum_{j=p+1}^{\infty} \sigma_j^2\Big),$

which is a consequence of the Cauchy–Schwarz inequality, and by noting that the right hand side of the last inequality converges to 0 as $p \to \infty$, since both $\sum_{j=1}^{\infty} \beta_j^2$ and $\sum_{j=1}^{\infty} \sigma_j^2$ are finite.


For the special case of the eigenfunction basis, the sequence of score variances $(\sigma_j^2)$ corresponds to the sequence of eigenvalues $(\lambda_j)$ of the covariance kernel $G(s, t) = \operatorname{Cov}(X(s), X(t))$.
For data with $n$ i.i.d. observations $(X_i, Y_i)$, $i = 1, \ldots, n$, setting $\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{ip})^T$ with $\varepsilon_{ij} = \int_{\mathcal{T}} X_i^c(t)\,\rho_j(t)\,dw(t)$, and $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_p)^T$, the approximated linear predictors can be represented as $\eta_i = \alpha^* + \boldsymbol{\beta}^T \varepsilon_i$, which are related to the means through $\mu_i = g(\eta_i)$.
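
In matrix form, with the scores stacked into an $n \times p$ matrix, the truncated linear predictors and means are obtained in one step (a sketch with invented values):

```python
import numpy as np

def g(eta):                              # inverse-logit link, as an example
    return 1.0 / (1.0 + np.exp(-eta))

n, p = 200, 2
rng = np.random.default_rng(2)
E = rng.normal(size=(n, p))              # score matrix, eps_ij for i = 1..n, j = 1..p
alpha, b = 0.5, np.array([1.0, -0.5])    # intercept and truncated coefficients

eta = alpha + E @ b                      # eta_i = alpha + sum_j beta_j eps_ij
mu = g(eta)                              # mu_i = g(eta_i)
```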

Estimation

The main aim is to estimate the parameter function $\beta(t)$.
Once $p$ has been fixed, standard GLM and quasi-likelihood methods can be used for the $p$-truncated model to estimate $(\alpha^*, \beta_1, \ldots, \beta_p)$ by solving the estimating equation or score equation $U(\boldsymbol{\beta}) = 0$.
The vector valued score function turns out to be $U(\boldsymbol{\beta}) = \sum_{i=1}^{n} \frac{(Y_i - \mu_i)\, g'(\eta_i)}{\sigma^2(\mu_i)}\, \varepsilon_i$, which depends on $\boldsymbol{\beta}$ through $\mu_i$ and $\eta_i$.
Just like in the GLM, the equation $U(\boldsymbol{\beta}) = 0$ is solved using iterative methods like Newton–Raphson (NR), Fisher scoring (FS) or iteratively reweighted least squares (IWLS) to obtain the estimates of the regression coefficients $(\hat{\alpha}^*, \hat{\beta}_1, \ldots, \hat{\beta}_p)$,
leading to the estimate of the parameter function in the GFLM, $\hat{\beta}(t) = \sum_{j=1}^{p} \hat{\beta}_j \rho_j(t)$. When the canonical link function is used, these methods are equivalent.
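
As a sketch of this estimation step for the functional logistic case, assuming the FPC scores have already been computed (the data here are simulated): statsmodels' GLM routine, which fits by IWLS, stands in for the iterative solvers named above.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n, p = 200, 2
E = rng.normal(size=(n, p))                      # FPC scores for n subjects
eta_true = 0.5 + E @ np.array([1.0, -0.5])
y = rng.binomial(1, 1 / (1 + np.exp(-eta_true))) # simulated binary responses

D = sm.add_constant(E)                           # design matrix (1, eps_1, ..., eps_p)
fit = sm.GLM(y, D, family=sm.families.Binomial()).fit()  # solves the score equation
ahat, bhat = fit.params[0], fit.params[1:]       # estimated intercept and coefficients

# Estimated parameter function: beta_hat(t) = sum_j bhat_j * rho_j(t), with rho_j
# the chosen basis evaluated on the time grid (e.g. phi from the FPCA sketch).
```
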
Several analyses of $p$-truncated models as $p \to \infty$ are available in the literature, providing asymptotic inference for the deviation of the estimated parameter function from the true parameter function, as well as asymptotic tests for regression effects and asymptotic confidence regions.