Draft:Multivariate logistic regression

From Wikipedia, the free encyclopedia

Multivariate logistic regression is a type of statistical analysis that predicts a categorical outcome, most often a binary one, from multiple independent variables.[1][2]
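The draft's sources do not write out the model. In the standard binary formulation, assumed here, the natural logarithm of the odds of the outcome is a linear function of the k independent variables x1, ..., xk, and the predicted probability p of the outcome follows by inverting that relationship:

```latex
\ln\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k,
\qquad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k)}}
```

Here β0 is the intercept (constant) and β1, ..., βk are the regression coefficients.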

Procedure

First, the baseline odds of having a specific outcome versus not having it are calculated without any independent variables; this gives a constant (the intercept), which in this predictor-free model equals the natural logarithm of the baseline odds.[3] Next, the independent variables are entered into the model, and a regression coefficient (beta) and a P value are calculated for each of them.[4] The P value indicates whether that independent variable contributes significantly to the occurrence of the outcome.[5]
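The two steps above can be sketched in Python. This is a minimal illustration, not the procedure of any particular statistical program: it assumes maximum-likelihood fitting by Newton-Raphson and Wald tests for the P values (common choices that the draft's sources do not specify). The function names `fit_logistic` and `invert` and the small hours-studied dataset are hypothetical.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_logistic(X, y, iterations=50, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    X holds one list of independent-variable values per observation (no constant
    column); y holds 0/1 outcomes. Returns (coefficients, standard errors,
    Wald P values), with the intercept first.
    """
    rows = [[1.0] + list(x) for x in X]  # constant column for the intercept
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r)))) for r in rows]
        # Score (gradient of the log-likelihood) and Fisher information X^T W X.
        score = [sum((yi - pi) * r[j] for r, yi, pi in zip(rows, y, p)) for j in range(k)]
        info = [[sum(pi * (1 - pi) * r[i] * r[j] for r, pi in zip(rows, p))
                 for j in range(k)] for i in range(k)]
        cov = invert(info)
        step = [sum(cov[i][j] * score[j] for j in range(k)) for i in range(k)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # At convergence the last step is negligible, so cov estimates the covariance of beta.
    se = [math.sqrt(cov[i][i]) for i in range(k)]
    # Two-sided Wald test: z = beta / SE compared with a standard normal distribution.
    p_values = [math.erfc(abs(b / s) / math.sqrt(2)) for b, s in zip(beta, se)]
    return beta, se, p_values


# Hypothetical illustrative data: hours studied and whether an exam was passed (1) or not (0).
hours = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 1.75, 2.0, 2.25, 2.5,
         2.75, 3.0, 3.25, 3.5, 4.0, 4.25, 4.5, 4.75, 5.0, 5.5]
passed = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1]

# Step 1: baseline odds with no predictors; the intercept-only fit recovers their logarithm.
baseline_odds = sum(passed) / (len(passed) - sum(passed))
intercept_only, _, _ = fit_logistic([[] for _ in passed], passed)

# Step 2: enter the independent variable to get a coefficient and P value for it.
beta, se, p_values = fit_logistic([[h] for h in hours], passed)
print("baseline log-odds:", math.log(baseline_odds), intercept_only[0])
print("coefficients:", beta, "P values:", p_values)
```

Newton-Raphson is used here because its inverse-information matrix doubles as the covariance estimate needed for the Wald standard errors; statistical packages may use other optimizers or report likelihood-ratio tests instead.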

Types

Regression analyses with multiple independent variables fall into two main types: linear regression and logistic regression.

Linear regression

Linear regression predicts a continuous dependent variable as a linear function of the independent variables (IVs). With a single IV, the fitted relationship can be plotted on a graph as a straight line.[6]
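For the single-IV case, the straight line is usually found by ordinary least squares. The sketch below assumes that method; the function `fit_line` and the data points are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares with one independent variable: y is approx. intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance of x and y divided by the variance of x (common 1/n factors cancel).
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the fitted line passes through (mean_x, mean_y)
    return intercept, slope


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # a continuous outcome
intercept, slope = fit_line(x, y)
print(f"y = {intercept:.2f} + {slope:.2f} * x")
```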

Logistic regression

In contrast, logistic regression models a nonlinear relationship between the independent variables and the probability of the outcome: plotted against a single IV, the predicted probability follows an S-shaped curve called a sigmoid. Unlike linear regression, which predicts a continuous value, logistic regression predicts which of two or more categories of the dependent variable an observation falls into.[7][8][2]

Assumptions

Multivariate logistic regression assumes that the observations are independent of one another.[9] It also assumes a linear relationship between the natural logarithm of the odds of the outcome (the log-odds, or logit) and each continuous independent variable. However, unlike linear regression, it does not assume that the dependent variable or the model's errors are normally distributed.
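One informal way to check linearity between the log-odds of the outcome and a continuous independent variable is to group observations by that variable and compute the observed log-odds in each group; roughly straight-line behavior is consistent with the assumption. This "empirical logit" check is a common diagnostic but is not described in the draft's sources; the function `empirical_logits`, the bin count, and the data are hypothetical.

```python
import math


def empirical_logits(x, y, n_bins=4):
    """Sort by x, split into at most n_bins consecutive groups, return (mean x, log-odds) per group.

    0.5 is added to each count, a common adjustment that keeps the log-odds finite
    when a group contains only 0s or only 1s.
    """
    pairs = sorted(zip(x, y))
    size = math.ceil(len(pairs) / n_bins)
    result = []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        ones = sum(yi for _, yi in chunk)
        zeros = len(chunk) - ones
        mean_x = sum(xi for xi, _ in chunk) / len(chunk)
        result.append((mean_x, math.log((ones + 0.5) / (zeros + 0.5))))
    return result


# Hypothetical data: hours studied and exam outcome (1 = passed).
hours = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 1.75, 2.0, 2.25, 2.5,
         2.75, 3.0, 3.25, 3.5, 4.0, 4.25, 4.5, 4.75, 5.0, 5.5]
passed = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1]
logits = empirical_logits(hours, passed)
for mean_x, log_odds in logits:
    print(round(mean_x, 2), round(log_odds, 2))
```

With few observations per group the estimates are noisy, so this is a visual aid rather than a formal test.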

Null hypothesis

The main null hypothesis of a multivariate logistic regression is that there is no relationship between the independent variables and the dependent variable.[10]
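One common way to test this null hypothesis is a likelihood-ratio test, which compares the intercept-only model with the model that includes the independent variables. The sketch below assumes a single independent variable (so the test statistic G has 1 degree of freedom) and fits the model by Newton-Raphson; the function names and the data are hypothetical.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y given predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1 - pi) for yi, pi in zip(y, p))


def fit_one_predictor(x, y, iterations=50):
    """Newton-Raphson fit of log-odds = b0 + b1 * x, solving the 2x2 system directly."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1 - pi) for pi in p]
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix [[a, b], [b, c]]; its inverse is [[c, -b], [-b, a]] / det.
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        c = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * c - b * b
        b0, b1 = b0 + (c * g0 - b * g1) / det, b1 + (a * g1 - b * g0) / det
    return b0, b1


def likelihood_ratio_test(x, y):
    """Test H0 'x has no relationship with y' by comparing nested models."""
    base_rate = sum(y) / len(y)  # the intercept-only model predicts the overall proportion
    ll_null = log_likelihood(y, [base_rate] * len(y))
    b0, b1 = fit_one_predictor(x, y)
    ll_full = log_likelihood(y, [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x])
    g = max(2 * (ll_full - ll_null), 0.0)
    # Under H0, G is approximately chi-square with 1 df, whose upper tail is erfc(sqrt(G / 2)).
    return g, math.erfc(math.sqrt(g / 2))


hours = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 1.75, 2.0, 2.25, 2.5,
         2.75, 3.0, 3.25, 3.5, 4.0, 4.25, 4.5, 4.75, 5.0, 5.5]
passed = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1]
g, p_value = likelihood_ratio_test(hours, passed)
print("G =", round(g, 3), "P =", p_value)
```

A small P value is evidence against the null hypothesis. With several independent variables, G would be compared with a chi-square distribution whose degrees of freedom equal the number of added coefficients.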

Dependent variables

Logistic regression is commonly divided into three main types according to its dependent variable (DV): binary, multi-class, and ordinal.[11]

Binary

A binary dependent variable has only two possible outcomes, which must be opposites of each other and mutually exclusive (for example, yes or no).[12]
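In practice a binary DV is usually coded as 1 for the outcome of interest and 0 for its opposite, which makes the odds a simple ratio of counts. The sketch below assumes that coding convention; the function `encode_binary` and the labels are hypothetical.

```python
import math


def encode_binary(labels, positive):
    """Code a two-outcome variable as 1 for the outcome of interest and 0 for its opposite."""
    if len(set(labels)) != 2 or positive not in labels:
        raise ValueError("expected exactly two distinct outcomes, including `positive`")
    return [1 if label == positive else 0 for label in labels]


outcomes = encode_binary(["yes", "no", "yes", "yes", "no"], positive="yes")
odds = sum(outcomes) / (len(outcomes) - sum(outcomes))  # 3 / 2 = 1.5
log_odds = math.log(odds)  # the quantity a logistic model expresses as a linear function
print(outcomes, odds, log_odds)
```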

Multi-class

A multi-class dependent variable has three or more qualitative (non-numerical) categories with no inherent order; in datasets, each category is usually represented by a numerical stand-in.[13]
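A common model for a multi-class DV is multinomial (softmax) logistic regression, in which each non-reference category gets its own linear predictor, interpreted as the log-odds of that category relative to a chosen reference category. The numerical stand-ins are only labels and are not used as quantities. This model, the function `multinomial_probabilities`, and the category names and values below are illustrative assumptions, not taken from the draft's sources.

```python
import math


def multinomial_probabilities(reference, log_odds):
    """Category probabilities in a multinomial (softmax) logistic model.

    `log_odds` maps each non-reference category to its linear predictor, i.e. the
    log-odds of that category relative to the reference category, whose value is 0.
    """
    scores = dict(log_odds)
    scores[reference] = 0.0
    top = max(scores.values())  # subtracting the maximum avoids overflow in exp
    weights = {category: math.exp(s - top) for category, s in scores.items()}
    total = sum(weights.values())
    return {category: w / total for category, w in weights.items()}


# Hypothetical linear predictors for one observation; "bus" is the reference category.
probs = multinomial_probabilities("bus", {"car": 0.8, "bike": -0.4})
print(probs)
```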

Ordinal

An ordinal dependent variable also has three or more categories, but the categories are ranked (for example, low, medium, and high).[14]
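One widely used model for an ordinal DV is the cumulative-logit (proportional-odds) model, which keeps the ranking by modeling the probability of falling at or below each category. The sketch assumes the sign convention P(Y ≤ j) = sigmoid(threshold_j − linear predictor), which is one of two conventions in use; the function names, thresholds, and category names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ordinal_probabilities(thresholds, linear_predictor):
    """Category probabilities under a cumulative-logit (proportional-odds) model.

    With k - 1 strictly increasing thresholds there are k ordered categories, and
    P(Y <= j) = sigmoid(thresholds[j] - linear_predictor).
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [sigmoid(t - linear_predictor) for t in thresholds] + [1.0]
    # Each category's probability is the step between consecutive cumulative probabilities.
    return [c - prev for c, prev in zip(cumulative, [0.0] + cumulative[:-1])]


# Hypothetical model with categories low < medium < high.
probs = ordinal_probabilities([-1.0, 1.5], linear_predictor=0.5)
print(dict(zip(["low", "medium", "high"], probs)))
```

Under this convention, a larger linear predictor shifts probability toward the higher-ranked categories.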

Scientists

When scientists use logistic regression, they usually include as many independent variables as necessary.[2]

Artificial intelligence

Logistic regression is also a common classification algorithm in data science and machine learning.[15]
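In machine learning, logistic regression is often trained by gradient descent on the log loss (cross-entropy) and then used to classify new observations by comparing the predicted probability with a threshold. The sketch below assumes batch gradient descent and a 0.5 threshold, which are typical but not the only choices; the functions `train` and `predict`, the hyperparameters, and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(X, y, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, k = len(X), len(X[0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            # The derivative of the log loss with respect to the linear predictor is p - y.
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, xi))) - yi
            grad_b += error / n
            for j in range(k):
                grad_w[j] += error * xi[j] / n
        bias -= learning_rate * grad_b
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
    return weights, bias


def predict(weights, bias, x, threshold=0.5):
    """Classify as 1 when the predicted probability reaches the threshold."""
    probability = sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
    return 1 if probability >= threshold else 0


# Hypothetical toy data with two independent variables.
X = [[-2, -1], [-1, -2], [-1, -1], [-2, -2], [1, 2], [2, 1], [1, 1], [2, 2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
weights, bias = train(X, y)
print([predict(weights, bias, x) for x in X])
```

Unlike the statistical use of the model, this workflow focuses on prediction accuracy and does not report P values for the coefficients.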

References

  1. ^ "Multivariate logistic regression is a type of analysis that can help predict results when you're working with multiple variables." (Indeed)
  2. ^ a b c Sperandei, Sandro (2014). "Understanding logistic regression analysis". Biochemia Medica. 24 (1): 12–18. doi:10.11613/BM.2014.003. ISSN 1330-0962. PMC 3936971. PMID 24627710.
  3. ^ "The statistical program first calculates the baseline odds of having the outcome versus not having the outcome without using any predictor." (National Library of Medicine)
  4. ^ "Then, the chosen independent (input/predictor) variables are entered into the model, and a regression coefficient (known also as “beta”) and “P” value for each of these are calculated." (National Library of Medicine)
  5. ^ "The “P” value indicates whether the particular variable contributes significantly to the occurrence of the outcome or not." (National Library of Medicine)
  6. ^ "Linear regression has a continuous set of results that can easily be mapped on a graph as a straight line." (Indeed)
  7. ^ "Logistic regressions are non-linear and are portrayed on a graph with a curved shape called a sigmoid. Instead of a continuous set of results, a logistical regression has two or more categories for data." (Indeed)
  8. ^ "Logistic regression analysis is a statistical technique to evaluate the relationship between various predictor variables (either categorical or continuous) and an outcome which is binary (dichotomous)." (National Library of Medicine)
  9. ^ "Multiple logistic regression assumes that the observations are independent." (Statistics LibreTexts)
  10. ^ "The main null hypothesis of a multiple logistic regression is that there is no relationship between the X variables and the Y variable;" (LibreTexts)
  11. ^ "Logistic regression includes three basic types: ..." (Indeed)
  12. ^ "A binary output is a variable where there are only two possible outcomes. These outcomes must be opposite of each other and mutually exclusive." (Indeed)
  13. ^ "A multi-class has three or more categories without any numerical value, though they usually have a numerical stand-in for datasets." (Indeed)
  14. ^ "An ordinal output also has three or more categories, though they're in a ranked output." (Indeed)
  15. ^ "This is a common classification algorithm used in data science and machine learning." (Indeed)