In statistics, the coefficient of variation is defined as the ratio of the standard deviation to the mean, $c_v = \sigma / \mu$. It is frequently used in chemistry, biology, and other laboratory sciences for measurements such as weights and the concentrations of viruses, bacteria, blood, and proteins. Constant coefficient of variation models[1] can be used to model data whose coefficient of variation is constant.
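For a sample, the coefficient of variation is simply the sample standard deviation divided by the sample mean. A minimal sketch in Python (the measurement values below are hypothetical, not taken from the cited sources):

```python
import numpy as np

# Hypothetical replicate concentration measurements (illustrative values only).
y = np.array([12.1, 9.8, 11.4, 13.0, 10.6, 12.7])

cv = y.std(ddof=1) / y.mean()   # sample coefficient of variation = sample SD / sample mean
print(f"coefficient of variation: {cv:.3f}")
```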
Constant coefficient of variation
In linear regression, the variance of the response variable is assumed to be constant, $\operatorname{Var}(Y_i) = \sigma^2$. Poisson regression assumes that the variance of the data is equal to the mean, $\operatorname{Var}(Y_i) = \operatorname{E}(Y_i)$. In constant coefficient of variation models, the data satisfy

$\frac{\sqrt{\operatorname{Var}(Y_i)}}{\operatorname{E}(Y_i)} = \sigma,$

which also takes the form

$\operatorname{Var}(Y_i) = \sigma^2 \mu_i^2,$

where $\sigma$ is a constant and $\mu_i = \operatorname{E}(Y_i)$.
Constant coefficient of variation data are always non-negative. By a Taylor expansion of $\log(Y_i)$ evaluated at $\mu_i$, assuming the remainder and higher-order central moments are negligible,

- first order: $\operatorname{E}[\log(Y_i)] \approx \log(\mu_i)$ and $\operatorname{Var}[\log(Y_i)] \approx \dfrac{\operatorname{Var}(Y_i)}{\mu_i^2} = \sigma^2$
- second order: $\operatorname{E}[\log(Y_i)] \approx \log(\mu_i) - \dfrac{\operatorname{Var}(Y_i)}{2\mu_i^2} = \log(\mu_i) - \dfrac{\sigma^2}{2}$
Therefore, the log transformation stabilizes the variance for data with constant coefficient of variation:

$\operatorname{Var}[\log(Y_i)] \approx \sigma^2$, constant.

The intercept of a least squares fit after the log transformation will, however, be biased by the offset $-\sigma^2/2$ from the second-order term.
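The variance-stabilizing effect can be checked by simulation. The following is a minimal sketch in Python; the means, coefficient of variation, and sample size are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.2                              # constant coefficient of variation
means = [1.0, 5.0, 25.0, 125.0]

for mu in means:
    # Gamma samples with E(Y) = mu and SD(Y) = sigma * mu, so the CV is constant.
    nu = 1.0 / sigma**2                  # gamma shape; CV of a gamma is 1 / sqrt(shape)
    y = rng.gamma(shape=nu, scale=mu / nu, size=100_000)
    print(f"mu={mu:7.1f}  var(Y)={y.var():10.3f}  var(log Y)={np.log(y).var():.4f}")

# var(Y) grows like sigma^2 * mu^2, while var(log Y) stays near sigma^2 = 0.04.
```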
Gamma regression
In statistics, gamma regression is a generalized linear model in which the response variable follows a gamma distribution; gamma data are continuous, non-negative, and right-skewed.
Gamma data
The probability density function of gamma data follows the gamma distribution with shape parameter $\alpha$ and rate parameter $\beta$, and has the form

$f(y; \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} y^{\alpha - 1} e^{-\beta y}, \quad y > 0.$
Since the gamma distribution is in the exponential family, using the parameters $\mu = \alpha/\beta$ and $\nu = \alpha$ makes it easier to show this property, with the form

$f(y; \mu, \nu) = \frac{1}{\Gamma(\nu)} \left(\frac{\nu}{\mu}\right)^{\nu} y^{\nu - 1} \exp\!\left(-\frac{\nu y}{\mu}\right), \quad y > 0,$

where $\nu$ is the nuisance (dispersion) parameter, $\mu$ is the mean, the variance is $\operatorname{Var}(Y) = \mu^2/\nu$, and the variance function is $V(\mu) = \mu^2$.
Therefore, gamma regression is a constant coefficient of variation model, because gamma data satisfy

$\frac{\sqrt{\operatorname{Var}(Y)}}{\operatorname{E}(Y)} = \frac{1}{\sqrt{\nu}}$, constant.
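This mean–variance relationship can be checked with SciPy's gamma distribution; a minimal sketch with hypothetical parameter values:

```python
import numpy as np
from scipy import stats

mu, nu = 10.0, 4.0                       # hypothetical mean and nuisance parameter

# Gamma distribution parameterised so that E(Y) = mu and Var(Y) = mu**2 / nu.
dist = stats.gamma(a=nu, scale=mu / nu)
mean, var = dist.stats(moments="mv")

print(mean, var)                              # 10.0, 25.0
print(np.sqrt(var) / mean, 1 / np.sqrt(nu))   # both 0.5: the CV does not depend on mu
```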
Regression model
The gamma regression model assumes that a function of the mean $\mu_i = \operatorname{E}(Y_i)$ can be modeled by a linear combination of unknown parameters $\beta$ and predictors $x_i$, with the form

$g(\mu_i) = x_i^\top \beta = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip},$

where $g(\cdot)$ is the link function of the GLM. The canonical link for gamma regression is the negative inverse link, $g(\mu) = -1/\mu$. However, the canonical link restricts the linear predictor to $x_i^\top \beta < 0$ so that $\mu_i > 0$. A more practical link is the log link, $g(\mu) = \log(\mu)$; the model can then be written as

$\log(\mu_i) = x_i^\top \beta, \quad \text{i.e.,} \quad \mu_i = \exp(x_i^\top \beta).$
The parameter $\beta$ can be estimated by maximum likelihood, computed with iteratively reweighted least squares (IRLS).
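A minimal sketch of fitting a gamma regression with a log link in Python using statsmodels, whose default GLM fitting algorithm is IRLS; the simulated data and coefficient values are hypothetical:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(0, 2, size=n)
X = sm.add_constant(x)                   # design matrix with an intercept column
mu = np.exp(0.5 + 0.8 * x)               # log link: log(mu) = 0.5 + 0.8 * x
nu = 5.0                                 # dispersion: Var(Y) = mu**2 / nu
y = rng.gamma(shape=nu, scale=mu / nu)

# Gamma GLM with a log link, fitted by IRLS.
model = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Log()))
result = model.fit()
print(result.params)                     # estimates should be close to (0.5, 0.8)
```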
Transformation models
In statistical modeling, transformations are often used when the model shows a lack of fit. Transforming variables presents the data on a different scale that may better satisfy the modeling assumptions. Transformations can be applied to predictors or to responses. There are many types of transformation; some common transformation functions include
- Box–Cox transformations, with the form $y^{(\lambda)} = \frac{y^{\lambda} - 1}{\lambda}$ for $\lambda \neq 0$ and $y^{(0)} = \log(y)$; the simplest cases are linear ($\lambda = 1$), quadratic ($\lambda = 2$), or square root ($\lambda = 1/2$), as in the sketch after this list.
- Inverse transformation: $x' = 1/x$
- Exponential transformation: $x' = e^{x}$
- Two-parameter transformations: $x' = x_1 x_2$ or $x' = x_1 / x_2$, where $x_1$, $x_2$ are two different predictors. A product of predictors should only be used if there is substantial justification for an interaction effect between the predictors.
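These transformations are straightforward to compute; a minimal sketch using SciPy's Box–Cox implementation (the data are simulated purely for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.gamma(shape=2.0, scale=3.0, size=200)   # a positive, right-skewed variable

# Box-Cox transformation; with lmbda unspecified, SciPy estimates lambda by
# maximum likelihood and returns the transformed values together with lambda.
x_bc, lam = stats.boxcox(x)
print(f"estimated lambda: {lam:.2f}")

# Fixed-lambda special cases and the other transformations listed above:
x_sqrt = stats.boxcox(x, lmbda=0.5)   # square-root-type transform
x_log  = stats.boxcox(x, lmbda=0.0)   # log transform
x_inv  = 1.0 / x                      # inverse transformation
x_exp  = np.exp(x)                    # exponential transformation
```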
Linear predictor with transformed variables
For the linear predictor $\eta_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}$, we can replace $x_{ij}$ by a function of $x_{ij}$, where $x_{ij}$ is one of the predictors, $\beta_j$ is the corresponding parameter for $x_{ij}$, and $p$ is the number of predictors. The linear predictor then has the form

$\eta_i = \beta_0 + \sum_{j=1}^{p} \beta_j h_j(x_{ij}),$

where $h_j$ is the transformation function for variable $x_{ij}$ and $\beta_j$ is the corresponding parameter for the transformed variable. When transforming predictors, the model usually keeps the original predictors and adds the transformed variables. Although transformations can improve the goodness of fit of the model, adding too many transformed predictors can obscure the natural information in the original data and make the results harder to interpret. Therefore, model assumptions, possible violations, and the primary question of interest should be checked before and after transformation.
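In practice this amounts to adding transformed columns to the design matrix. A minimal sketch, with a hypothetical log transformation of one predictor:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 300
x1 = rng.normal(size=n)
x2 = rng.gamma(shape=2.0, scale=1.0, size=n)
y = 1.0 + 0.5 * x1 + 1.5 * np.log(x2) + rng.normal(scale=0.3, size=n)

# Keep the original predictors and add the transformed variable as an extra column.
X = sm.add_constant(np.column_stack([x1, x2, np.log(x2)]))
fit = sm.OLS(y, X).fit()
print(fit.params)          # the coefficient on log(x2) should be near 1.5
```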
Transformed responses
In constant coefficient of variation models, the transformation is instead applied to the response, usually so that the non-negative response is mapped onto the real line. The model takes the form

$\operatorname{E}[h(Y_i)] = x_i^\top \beta.$

It can also be written as

$h(Y_i) = x_i^\top \beta + \varepsilon_i,$

where $\varepsilon_i$ is an additive error with mean $0$ and finite variance $\sigma^2$.
Examples
Log-additive model
The log-additive model is a regression method that applies a log transformation to the response variable. The log-additive model takes the form

$\log(Y_i) = x_i^\top \beta + \varepsilon_i.$

The mean and variance can be calculated as

$\operatorname{E}(Y_i) = e^{x_i^\top \beta}\, \operatorname{E}(e^{\varepsilon_i}), \qquad \operatorname{Var}(Y_i) = e^{2 x_i^\top \beta}\, \operatorname{Var}(e^{\varepsilon_i}),$

since $Y_i = e^{x_i^\top \beta} e^{\varepsilon_i}$. Hence

$\frac{\sqrt{\operatorname{Var}(Y_i)}}{\operatorname{E}(Y_i)} = \frac{\sqrt{\operatorname{Var}(e^{\varepsilon_i})}}{\operatorname{E}(e^{\varepsilon_i})},$

which does not depend on $i$, showing that the log-additive model behaves very similarly to a constant coefficient of variation model.
Log-normal model
The log-normal model is used for data that follow a log-normal distribution and works well in constant coefficient of variation situations. In the log-normal model, the error term is assumed to follow a normal distribution after the log transformation. The model can be fitted by ordinary multiple linear regression after the transformation and takes the form

$\log(Y_i) = x_i^\top \beta + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2).$
A transformation model is not a generalized linear model. In a GLM, the non-linear link function transforms the conditional expectation $\operatorname{E}(Y_i \mid x_i)$, whereas in a transformation model the conditional expectation of the transformed response, $\operatorname{E}[h(Y_i) \mid x_i]$, is expressed as a linear combination of parameters and predictors. The link function of gamma regression and the transformation function of the log-linear model are both the logarithm, but the two models are not interchangeable in some cases[2].
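The distinction can be seen by fitting the two models side by side. The following is a rough sketch with simulated log-normal data (not a reproduction of the cited case study); the gamma GLM with a log link models $\log \operatorname{E}(Y_i \mid x_i)$, while ordinary least squares on $\log(Y_i)$ models $\operatorname{E}[\log(Y_i) \mid x_i]$, so the intercepts differ by roughly $\sigma^2/2$:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 2000
x = rng.uniform(0, 2, size=n)
X = sm.add_constant(x)

# Log-normal data: log(Y) = 0.5 + 0.8 x + eps, with eps ~ N(0, 0.3**2).
y = np.exp(0.5 + 0.8 * x + rng.normal(scale=0.3, size=n))

# Gamma GLM with a log link models log E(Y | x).
gamma_fit = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Log())).fit()

# Log-normal / log-additive model: OLS on log(Y) models E(log Y | x).
ols_fit = sm.OLS(np.log(y), X).fit()

print(gamma_fit.params)   # intercept near 0.5 + 0.3**2 / 2 = 0.545, slope near 0.8
print(ols_fit.params)     # intercept near 0.5, slope near 0.8
```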
Application
Constant coefficient of variation models are used in various fields, mostly the laboratory sciences. For example, the log-normal model has been applied to estimate the relationship between children's blood-lead concentrations and residential dust-lead levels[3], and gamma regression has been used to evaluate the relative effects of confounding variables on antibody concentrations in quantitative assay data[4]. Other fields, such as the behavioral sciences and healthcare, have used the log-normal model to estimate response times[5] or surgical procedure times[6], sometimes in combination with Bayesian or machine-learning techniques.
References
- ^ Müller, Hans-Georg. Generalized Linear Models: Lecture Notes for BST/STA 223, Winter 2022. UC Davis.
- ^ Wiens, Brian L. (1999). "When Log-Normal and Gamma Models Give Different Results: A Case Study". The American Statistician. 53: 89–93 – via JSTOR.
- ^ Rust SW, Burgoon DA, Lanphear BP, Eberly S. (1997). "Log-additive versus log-linear analysis of lead-contaminated house dust and children's blood-lead levels. Implications for residential dust-lead standards". Environmental Research. 72: 173–84 – via PMID: 9177659.
- ^ Moulton LH, Halsey NA (1996). "A mixed gamma model for regression analyses of quantitative assay data". Vaccine. 14: 1154–8 – via PMID: 8911013.
- ^ van der Linden WJ (2006). "A Lognormal Model for Response Times on Test Items". Journal of Educational and Behavioral Statistics. 31: 181–204.
- ^ Spangler, W.E., Strum, D.P., Vargas, L.G. (2004). "Estimating Procedure Times for Surgeries by Determining Location Parameters for the Lognormal Model". Health Care Management Science. 7: 97–104 – via Springer.