Từ nhận dạng đến ước tính

Tôi hiện đang đọc tác phẩm của Pearl (Pearl, 2009, tái bản lần thứ 2) về quan hệ nhân quả và đấu tranh để thiết lập mối liên hệ giữa nhận dạng phi mô hình của một mô hình và ước tính thực tế. Thật không may, bản thân Pearl rất im lặng về chủ đề này.

Để đưa ra một ví dụ, tôi có một mô hình đơn giản với một đường dẫn nhân quả, và một bộ gây nhiễu ảnh hưởng đến tất cả các biến , và . Ngoài ra, và có liên quan bởi các ảnh hưởng không quan sát được, . Theo quy tắc tính toán, giờ đây tôi biết rằng phân phối xác suất sau can thiệp (rời rạc) được đưa ra bởi: $x \rightarrow z \rightarrow y$ $w \rightarrow x$ $w \rightarrow z$ $w \rightarrow y$ $x$ $y$ $x \leftarrow \rightarrow y$

P (y ∣ d o (x)) = \sum_{w, z} [P (z ∣ w, x) P (w) \sum_{x} [P (y ∣ w, x, z) P (x ∣ w)]] .

$P(y \mid do(x)) = \sum_{w,z}\bigl[P(z\mid w,x)P(w)\sum_{x}\bigl[P(y\mid w,x,z)P(x\mid w)\bigr]\bigr].$

I know wonder how I can estimate this quantity (non-parametrically or by introducing parametric assumptions)? Especially for the case when $w$ is a set of several confounding variables and quantities of interest are continuous. To estimate the joint pre-intervention distribution of the data appears to be very impractical in this case. Does somebody know an application of Pearl's methods that deals with these problems? I would be very happy for a pointer.

estimation references causality

— PHU
nguồn

If you have unobserved factors that influence both x and y, then I think you cannot estimate this without actually randomizing x. But, although I know quite a lot about the counterfactual approach to causality, I'm not that familiar with Pearl's do-calculus (I'm still working through his book myself).

— Ellie

This is a very good question. First let's verify if your formula is correct. The information you have given corresponds to the following causal model:

And as you have said we can derive the estimand for $P(Y|do(X))$ using the rules of do-calculus. In R we can easily do that with the package causaleffect. We first load igraph to create an object with the causal diagram you are proposing:

library(igraph)
g <- graph.formula(X-+Y, Y-+X, X-+Z-+Y, W-+X, W-+Z, W-+Y, simplify = FALSE)
g <- set.edge.attribute(graph = g, name = "description", index = 1:2, value = "U")

Where the first two terms X-+Y, Y-+X represents the unobserved confounders of $X$ and $Y$ and the rest of the terms represent the directed edges you mentioned.

Then we ask for our estimand:

library(causaleffect)
cat(causal.effect("Y", "X", G = g, primes = TRUE, simp = T, expr = TRUE))

\sum_{W, Z} (\sum_{X^{'}} P (Y | W, X^{'}, Z) P (X^{'} | W)) P (Z | W, X) P (W)

$\sum_{W,Z}\left(\sum_{X^{\prime}}P(Y|W,X^{\prime},Z)P(X^{\prime}|W)\right)P(Z|W,X)P(W)$

Which indeed coincides with your formula --- a case of frontdoor with an observed confounder.

Now let's go to the estimation part. If you assume linearity (and normality), things are greatly simplified. Basically what you want to do is to estimate the coefficients of the path $X \rightarrow Z \rightarrow Y$ .

Let's simulate some data:

set.seed(1)
n <- 1e3
u <- rnorm(n) # y -> x unobserved confounder
w <- rnorm(n)
x <- w + u + rnorm(n)
z <- 3*x + 5*w + rnorm(n)
y <- 7*z + 11*w + 13*u + rnorm(n)

Notice in our simulation the true causal effect of a change of $X$ on $Y$ is 21. You can estimate this by running two regressions. First $Y \sim Z+W+X$ to get the effect of $Z$ on $Y$ and then $Z \sim X + W$ to get the effect of $X$ on $Z$ . Your estimate will be the product of both coefficients:

yz_model <- lm(y ~ z + w + x)
zx_model <- lm(z ~ x + w)

yz <- coef(yz_model)[2]
zx <- coef(zx_model)[2]
effect <- zx*yz
effect
       x 
21.37626

And for inference you may compute the (asymptotic) standard error of the product:

se_yz <- coef(summary(yz_model))[2, 2]
se_zx <- coef(summary(zx_model))[2, 2]
se <- sqrt(yz^2*se_zx^2 + zx^2*se_yz^2)

Which you may use for tests or confidence intervals:

c(effect - 1.96*se, effect + 1.96*se) # 95% CI
       x        x 
19.66441 23.08811

You can also perform (non/semi)-parametric estimation, I will try to update this answer including other procedures later.

— Carlos Cinelli
nguồn