
Logistic regression

In statistics, the logistic model (or logit model) is used to model the probability of a certain class or event existing, such as pass/fail, win/lose, alive/dead or healthy/sick. This can be extended to model several classes of events, such as determining whether an image contains a cat, dog, lion, etc.; each object detected in the image would be assigned a probability between 0 and 1, with the probabilities summing to one. For example, consider a group of 20 students who spend between 0 and 6 hours studying for an exam: how does the number of hours spent studying affect the probability that a student passes? Logistic regression is a statistical model that in its basic form uses a logistic function to model a binary dependent variable, although many more complex extensions exist. In regression analysis, logistic regression (or logit regression) estimates the parameters of a logistic model (a form of binary regression). Mathematically, a binary logistic model has a dependent variable with two possible values, such as pass/fail, represented by an indicator variable whose two values are labeled '0' and '1'. In the logistic model, the log-odds (the logarithm of the odds) for the value labeled '1' is a linear combination of one or more independent variables ('predictors'); each independent variable can be a binary variable (two classes, coded by an indicator variable) or a continuous variable (any real value).
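The students example above can be made concrete with a small numerical sketch: fitting a one-predictor logistic model by gradient ascent on the Bernoulli log-likelihood. The hours/pass data, learning rate, and iteration count here are illustrative assumptions, not values from the text:

```python
import math

# Illustrative data (an assumption): hours studied by 20 students and
# whether each passed (1) or failed (0) the exam.
hours = [0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50,
         2.75, 3.00, 3.25, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50]
passed = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
          1, 0, 1, 0, 1, 1, 1, 1, 1, 1]

def sigmoid(z):
    # Logistic function: maps log-odds to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

# Fit log-odds = b0 + b1 * hours by gradient ascent on the log-likelihood.
b0, b1, lr = 0.0, 0.0, 0.01
for _ in range(100_000):
    g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in zip(hours, passed))
    g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in zip(hours, passed))
    b0 += lr * g0
    b1 += lr * g1

# Predicted probability of passing after 3 hours of study.
p3 = sigmoid(b0 + b1 * 3.0)
```

The positive fitted slope b1 reflects the intuition that more study hours raise the probability of passing; a production fit would use a library optimizer rather than hand-rolled gradient ascent.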
The corresponding probability of the value labeled '1' can vary between 0 (certainly the value '0') and 1 (certainly the value '1'), hence the labeling; the function that converts log-odds to probability is the logistic function, hence the name. The unit of measurement for the log-odds scale is called a logit, from logistic unit, hence the alternative names. Analogous models with a different sigmoid function instead of the logistic function can also be used, such as the probit model; the defining characteristic of the logistic model is that increasing one of the independent variables multiplicatively scales the odds of the given outcome at a constant rate, with each independent variable having its own parameter; for a binary dependent variable this generalizes the odds ratio.

The binary logistic regression model has extensions to more than two levels of the dependent variable: categorical outputs with more than two values are modeled by multinomial logistic regression, and if the multiple categories are ordered, by ordinal logistic regression, for example the proportional odds ordinal logistic model.

The model itself simply models probability of output in terms of input and does not perform statistical classification (it is not a classifier), though it can be used to make a classifier, for instance by choosing a cutoff value and classifying inputs with probability greater than the cutoff as one class and those below the cutoff as the other; this is a common way to make a binary classifier. The coefficients are generally not computed by a closed-form expression, unlike linear least squares; see § Model fitting. Logistic regression as a general statistical model was originally developed and popularized primarily by Joseph Berkson, beginning in Berkson (1944), where he coined 'logit'; see § History. Logistic regression is used in various fields, including machine learning, most medical fields, and social sciences.
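The logit-to-probability conversion and the cutoff-based classifier described above can be sketched in a few lines; the 0.5 cutoff is an illustrative choice, not a requirement of the model:

```python
import math

def logistic(log_odds):
    # Converts log-odds (a logit) to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-log_odds))

def classify(log_odds, cutoff=0.5):
    # Turns the probability model into a binary classifier via a cutoff.
    return 1 if logistic(log_odds) > cutoff else 0

# Log-odds of 0 correspond to even odds, i.e. probability 0.5.
print(logistic(0.0))   # 0.5
print(classify(2.0))   # 1, since logistic(2) ≈ 0.88 > 0.5
print(classify(-2.0))  # 0
```

Moving the cutoff trades false positives against false negatives; the underlying probability model is unchanged.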
For example, the Trauma and Injury Severity Score (TRISS), which is widely used to predict mortality in injured patients, was originally developed by Boyd et al. using logistic regression. Many other medical scales used to assess the severity of a patient have been developed using logistic regression. Logistic regression may be used to predict the risk of developing a given disease (e.g. diabetes or coronary heart disease), based on observed characteristics of the patient (age, sex, body mass index, results of various blood tests, etc.). Another example might be to predict whether a Nepalese voter will vote for the Nepali Congress, the Communist Party of Nepal, or any other party, based on age, income, sex, race, state of residence, votes in previous elections, etc. The technique can also be used in engineering, especially for predicting the probability of failure of a given process, system or product. It is also used in marketing applications, such as predicting a customer's propensity to purchase a product or halt a subscription. In economics it can be used to predict the likelihood of a person choosing to be in the labor force, and a business application would be to predict the likelihood of a homeowner defaulting on a mortgage. Conditional random fields, an extension of logistic regression to sequential data, are used in natural language processing.

Let us try to understand logistic regression by considering a logistic model with given parameters, then seeing how the coefficients can be estimated from data. Consider a model with two predictors, x₁ and x₂, and one binary (Bernoulli) response variable Y, whose probability we denote p = P(Y = 1). We assume a linear relationship between the predictor variables and the log-odds of the event that Y = 1.
This linear relationship can be written in the following mathematical form (where ℓ is the log-odds, b is the base of the logarithm, and βᵢ are parameters of the model):

ℓ = log_b (p / (1 − p)) = β₀ + β₁x₁ + β₂x₂

We can recover the odds by exponentiating the log-odds:

p / (1 − p) = b^(β₀ + β₁x₁ + β₂x₂)

By simple algebraic manipulation, the probability that Y = 1 is

p = b^(β₀ + β₁x₁ + β₂x₂) / (b^(β₀ + β₁x₁ + β₂x₂) + 1) = 1 / (1 + b^−(β₀ + β₁x₁ + β₂x₂))

The above formulas show that once the βᵢ are fixed, we can easily compute either the log-odds that Y = 1 for a given observation, or the probability that Y = 1 for a given observation. The main use case of a logistic model is to be given an observation (x₁, x₂) and estimate the probability p that Y = 1. In most applications, the base b of the logarithm is taken to be e. However, in some cases it can be easier to communicate results by working in base 2 or base 10. We consider an example with b = 10 and coefficients β₀ = −3, β₁ = 1, and β₂ = 2. To be concrete, the model is

ℓ = log₁₀ (p / (1 − p)) = −3 + x₁ + 2x₂
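The base-10 model above can be evaluated directly. For instance, taking an illustrative observation x₁ = 2, x₂ = 2 (values chosen here for the sketch, not given in the text) yields log-odds ℓ = −3 + 2 + 2·2 = 3, i.e. odds of 10³ = 1000 to 1 in favor of Y = 1:

```python
# Base-10 logistic model with coefficients beta_0 = -3, beta_1 = 1, beta_2 = 2.
def log_odds(x1, x2):
    return -3 + 1 * x1 + 2 * x2

def probability(x1, x2):
    # Recover p from the base-10 log-odds l: p = 1 / (1 + 10**(-l)).
    l = log_odds(x1, x2)
    return 1.0 / (1.0 + 10.0 ** (-l))

print(log_odds(2, 2))     # 3, i.e. odds of 1000:1
print(probability(2, 2))  # 1000/1001 ≈ 0.999
```

Note how base 10 makes the odds easy to read off: each unit of log-odds is a factor of ten in the odds.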
