Multivariate logistic regression

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Faster than Thunder (talk | contribs) at 16:36, 11 April 2025 (Added info from [http://www.med.mcgill.ca/epidemiology/Joseph/courses/EPIB-621/logistic2.pdf]). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Multivariate logistic regression is a type of data analysis that predicts any number of[1] outcomes based on multiple[2] independent variables.[3][4]

Procedure

First, the baseline odds of having a specific outcome versus not having it are calculated without using any predictor, giving a constant (intercept).[5] Next, the independent variables are entered into the model, giving a regression coefficient (beta) and a "P" value for each independent variable.[6] The "P" value indicates whether that independent variable contributes significantly to the occurrence of the outcome.[7]
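
As a minimal sketch of the first step: in a model with no predictors, the intercept equals the natural logarithm of the baseline odds. The counts below are hypothetical and chosen only for illustration; they do not come from any cited source.

```python
import math

# Hypothetical counts, for illustration only (not taken from any study).
with_outcome = 30
without_outcome = 70

# Baseline odds of having the outcome versus not having it.
baseline_odds = with_outcome / without_outcome

# With no predictors in the model, the intercept is the log of the baseline odds.
intercept = math.log(baseline_odds)

# Applying the inverse logit to the intercept recovers the baseline probability.
baseline_probability = 1 / (1 + math.exp(-intercept))

print(f"odds={baseline_odds:.3f}, intercept={intercept:.3f}, "
      f"probability={baseline_probability:.3f}")
```

A negative intercept here reflects that the outcome is less likely than not in the hypothetical sample.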

Formula

Multivariate logistic regression uses a formula similar to univariate logistic regression,[8] but with multiple independent variables:

$$\pi(x) = \frac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_v X_v}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_v X_v}}$$

where π(x) is the probability of the outcome and v is the number of independent variables. Applying the logit transformation shows that, on the log-odds scale, multivariate logistic regression is a standard linear regression model:[9]

$$\operatorname{logit}(\pi(x)) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_v X_v$$
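
The logit formula above can be evaluated numerically. The following is a minimal sketch, not a fitting procedure: the function names and the coefficient values are hypothetical and serve only to show how the linear predictor on the logit scale maps to a probability.

```python
import math

def linear_predictor(intercept, coefficients, values):
    """Return beta_0 + beta_1*X_1 + ... + beta_v*X_v (the logit scale)."""
    if len(coefficients) != len(values):
        raise ValueError("need one coefficient per independent variable")
    return intercept + sum(b * x for b, x in zip(coefficients, values))

def inverse_logit(z):
    """Map a value on the logit scale to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

def logit(p):
    """Natural logarithm of the odds p / (1 - p)."""
    return math.log(p / (1 - p))

# Hypothetical coefficients and observation (v = 2 independent variables).
beta_0 = -1.5
betas = [0.8, -0.4]
x = [2.0, 1.0]

z = linear_predictor(beta_0, betas, x)  # value on the logit scale
pi_x = inverse_logit(z)                 # predicted probability of the outcome

print(f"logit={z:.3f}, probability={pi_x:.3f}")
```

Taking `logit(pi_x)` returns the linear predictor `z` again (up to floating-point error), which is the sense in which the model is linear on the logit scale.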

Types

Linear regression and logistic regression are two related types of regression analysis that are often contrasted.

Linear regression

Linear regression produces a continuous set of results that show a linear relationship with the independent variables (IVs). With a single IV, the results can be plotted on a graph as a straight line.[10]

Logistic regression

In contrast, logistic regression models a nonlinear relationship between the independent variables and the probability of the outcome. Plotted on a graph, the predicted probabilities follow an S-shaped curve called a sigmoid. Unlike linear regression, whose results are continuous, logistic regression sorts outcomes into two or more categories.[11][12][4]

Assumptions

Multivariate logistic regression assumes that the different observations are independent.[13] It also assumes that the natural logarithm of the odds of the outcome (the logit) has a linear relationship with the independent variables. However, it does not assume that the independent variables are normally distributed.
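
One consequence of the linearity assumption can be shown as a short worked derivation (a standard result, stated here as an illustration rather than taken from the cited sources). If X_j increases by one unit while the other independent variables are held fixed, the logit changes by exactly β_j, so the odds of the outcome are multiplied by e^{β_j}:

$$\left[\beta_0 + \dots + \beta_j (X_j + 1) + \dots + \beta_v X_v\right] - \left[\beta_0 + \dots + \beta_j X_j + \dots + \beta_v X_v\right] = \beta_j$$

$$\frac{\text{odds}(X_j + 1)}{\text{odds}(X_j)} = e^{\beta_j}$$

For example, a hypothetical coefficient β_j = 0 gives an odds ratio of e^0 = 1, meaning the variable has no effect on the odds.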

Null hypothesis

A null hypothesis is an assumption that the independent variables do not have any impact on the dependent variable.[14]
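
One common way to test this null hypothesis for a single coefficient is a Wald test, which compares the coefficient to its standard error. The cited sources do not name the test used, so this choice is an assumption; the function name and the numbers below are hypothetical and for illustration only.

```python
import math

def wald_p_value(coefficient, standard_error):
    """Two-sided p-value for H0: coefficient == 0 (normal approximation)."""
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    z = coefficient / standard_error
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical estimate and standard error, not taken from any study.
beta_1 = 0.8
se_1 = 0.3

p = wald_p_value(beta_1, se_1)
print(f"z={beta_1 / se_1:.2f}, p={p:.4f}")
```

A small "P" value is evidence against the null hypothesis that the independent variable has no impact on the dependent variable.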

Dependent variables

There are three main types of dependent variables (DVs) in logistic regression: binary, multi-class, and ordinal.[15]

Binary

A binary dependent variable is a variable with only two possible outcomes, which must be opposites of each other and mutually exclusive.[16]

Multi-class

A multi-class dependent variable is a variable with at least three qualitative (non-numerical) outcomes, each usually represented in datasets by a numerical stand-in.[17]

Ordinal

An ordinal dependent variable is a variable with at least three possible outcomes that are ranked in order.[18]

Uses

Scientists

When scientists use logistic regression, they usually include as many independent variables as necessary.[4]

Doctors and physicians

Multivariate logistic regression is used by physicians to:[19]

  • identify patient characteristics associated with certain outcomes
  • determine the effect of a procedural technique on an outcome
  • compare treatment strategies while adjusting for differences between groups
  • develop risk-prediction models

Market

Multivariate logistic regression is also used to analyze customer preferences for products.[20]

Artificial intelligence

Multivariate logistic regressions are also used in machine learning.[21]

References

  1. ^ "... while multivariate logistic regression deals with multiple outcomes simultaneously." - [1] (This vs. That)
  2. ^ "In contrast, multivariate logistic regression involves analyzing the relationship between multiple outcome variables and one or more predictors. This can lead to more complex interpretations, as researchers must consider the impact of predictors on each outcome variable." - [2] (This vs. That)
  3. ^ "Multivariate logistic regression is a type of analysis that can help predict results when you're working with multiple variables." - [3] (Indeed)
  4. ^ a b c Sperandei, Sandro (2014). "Understanding logistic regression analysis". Biochemia Medica. 24 (1): 12–18. doi:10.11613/BM.2014.003. ISSN 1330-0962. PMC 3936971. PMID 24627710.
  5. ^ "The statistical program first calculates the baseline odds of having the outcome versus not having the outcome without using any predictor." - [4] (National Library of Medicine)
  6. ^ "Then, the chosen independent (input/predictor) variables are entered into the model, and a regression coefficient (known also as “beta”) and “P” value for each of these are calculated." - [5] (National Library of Medicine)
  7. ^ "The “P” value indicates whether the particular variable contributes significantly to the occurrence of the outcome or not." - [6] (National Library of Medicine)
  8. ^ "As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation for modeling the probability, we have: ..." - [7] (McGill Medical)
  9. ^ "... which shows that logistic regression is really just a standard linear regression model, ..." - [8] (McGill Medical)
  10. ^ "Linear regression has a continuous set of results that can easily be mapped on a graph as a straight line." - [9] (Indeed)
  11. ^ "Logistic regressions are non-linear and are portrayed on a graph with a curved shape called a sigmoid. Instead of a continuous set of results, a logistical regression has two or more categories for data." - [10] (Indeed)
  12. ^ "Logistic regression analysis is a statistical technique to evaluate the relationship between various predictor variables (either categorical or continuous) and an outcome which is binary (dichotomous)." - [11] (National Library of Medicine)
  13. ^ "Multiple logistic regression assumes that the observations are independent." - [12] (Statistics LibreTexts)
  14. ^ "The main null hypothesis of a multiple logistic regression is that there is no relationship between the X variables and the Y variable;" - [13] (LibreTexts)
  15. ^ "Logistic regression includes three basic types: ..." - [14] (Indeed)
  16. ^ "A binary output is a variable where there are only two possible outcomes. These outcomes must be opposite of each other and mutually exclusive." - [15] (Indeed)
  17. ^ "A multi-class has three or more categories without any numerical value, though they usually have a numerical stand-in for datasets." - [16] (Indeed)
  18. ^ "An ordinal output also has three or more categories, though they're in a ranked output." - [17] (Indeed)
  19. ^ "Multivariable regression models are used to establish the relationship between a dependent variable (i.e. an outcome of interest) and more than 1 independent variable. Multivariable regression can be used to (i) identify patient characteristics associated with an outcome (often called ‘risk factors’), (ii) determine the effect of a procedural technique on a particular outcome, (iii) adjust for differences between groups to allow a comparison of different treatment strategies, (iv) quantify the magnitude of an effect size, (v) develop a propensity score and (vi) develop risk-prediction models." - [18] (Oxford Academic)
  20. ^ "It is also used in market research to analyze consumer preferences among product categories." - [19] (This vs. That)
  21. ^ "This is a common classification algorithm used in data science and machine learning." - [20] (Indeed)