
Confounding

In statistics, a confounder (also confounding variable, confounding factor, or lurking variable) is a variable that influences both the dependent variable and the independent variable, causing a spurious association. Confounding is a causal concept, and as such cannot be described in terms of correlations or associations.

Confounding is defined in terms of the data-generating model. Let X be some independent variable and Y some dependent variable. To estimate the effect of X on Y, the statistician must suppress the effects of extraneous variables that influence both X and Y. We say that X and Y are confounded by some other variable Z whenever Z is a cause of both X and Y.

Let P(y ∣ do(x)) be the probability of event Y = y under the hypothetical intervention X = x. X and Y are not confounded if and only if the following holds for all values X = x and Y = y:

    P(y ∣ do(x)) = P(y ∣ x)     (1)

where P(y ∣ x) is the conditional probability upon seeing X = x. Intuitively, this equality states that X and Y are not confounded whenever the observationally witnessed association between them is the same as the association that would be measured in a controlled experiment, with x randomized. Conversely, X and Y are confounded whenever

    P(y ∣ do(x)) ≠ P(y ∣ x)     (2)

In principle, the defining equality (1) can be verified from the data-generating model, assuming we have all the equations and probabilities associated with the model. This is done by simulating an intervention do(X = x) (see Bayesian network) and checking whether the resulting probability of Y equals the conditional probability P(y ∣ x). It turns out, however, that graph structure alone is sufficient for verifying the equality (1).

Consider a researcher attempting to assess the effectiveness of drug X from population data in which drug usage was a patient's choice. The data show that gender (Z) influences a patient's choice of drug as well as their chances of recovery (Y). In this scenario, gender Z confounds the relation between X and Y, since Z is a cause of both X and Y. The causal effect of X on Y can nevertheless be recovered by adjusting for Z:

    P(y ∣ do(x)) = Σ_z P(y ∣ x, z) P(z)     (3)

Applied to the drug example, with Z taking the values male and female:

    P(Y = recovered ∣ do(x = give drug))
      = P(Y = recovered ∣ X = give drug, Z = male) P(Z = male)
      + P(Y = recovered ∣ X = give drug, Z = female) P(Z = female)     (4)
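The adjustment formula (3) can be illustrated with a small numerical sketch. The recovery counts below are invented for illustration (a Simpson's-paradox-style setup in which gender influences both drug choice and recovery), not real trial data:

```python
# Hypothetical observational counts by gender (Z), treatment choice (X),
# and recovery (Y). All numbers are invented for illustration.
data = {
    # (gender, took_drug): (recovered, total)
    ("male", True):    (81, 87),
    ("male", False):   (234, 270),
    ("female", True):  (192, 263),
    ("female", False): (55, 80),
}

def p_recover(z, x):
    """Conditional probability P(Y = recovered | X = x, Z = z)."""
    recovered, total = data[(z, x)]
    return recovered / total

# Marginal P(Z = z): each gender's share of all patients.
totals = {z: sum(t for (zz, _), (_, t) in data.items() if zz == z)
          for z in ("male", "female")}
n = sum(totals.values())

def p_do(x):
    """Adjustment formula (3): P(y | do(x)) = sum_z P(y | x, z) P(z)."""
    return sum(p_recover(z, x) * totals[z] / n for z in ("male", "female"))

print(f"P(recovered | do(give drug)) = {p_do(True):.3f}")
print(f"P(recovered | do(no drug))   = {p_do(False):.3f}")
```

Note that `p_do(x)` weights each gender-specific recovery rate by the gender's overall frequency P(Z = z), not by its frequency among patients who chose treatment x; that is exactly what distinguishes the interventional quantity P(y ∣ do(x)) from the observational P(y ∣ x).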
