The proofs of the front-door adjustment that I've read take three steps:
- Show P(M|do(X)) is identifiable
- Show P(Y|do(M)) is identifiable
- Multiply the do-free expressions for P(M|do(X)) and P(Y|do(M)) to obtain P(Y|do(X))
where Y, X, and M meet the assumptions for the front-door adjustment. A graph meeting these assumptions is:
X->M;M->Y;U->X;U->Y
I'm sure I'm being daft here, but I don't understand what justifies simply multiplying the expressions together to get P(Y|do(X)).
This is like saying:
P(Y|do(X)) = P(Y|do(M)) * P(M|do(X))
(where perhaps the assumptions for the front-door adjustment are necessary) but I don't recognize this rule in my study of causal inference.
Formal description: In a graph with a mediator and an unobserved confounder like the one you described, the effect of X on Y is confounded by U. Yet the effect of X on M is unconfounded, because the only back-door path, X<-U->Y<-M, is blocked by the collider Y. We can also estimate M->Y by controlling for X, which blocks the back-door path M<-X<-U->Y. Doing so gives us the two causal quantities that make up the path X->M->Y. For a graph with any kind of causal functions, propagating an effect along several edges means composing the functions that describe each edge. In the case of linear functions, this simply amounts to multiplying their coefficients (the assumption of linearity is very common in causal inference). In the general case, composing means weighting P(Y|do(M)) by P(M|do(X)) and summing over the values of M.
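To make the linear case concrete, here is a minimal simulation sketch of the graph X->M;M->Y;U->X;U->Y. Everything specific in it is an assumption chosen for illustration: the coefficients (0.8 for X->M, 1.5 for M->Y, 2.0 and 3.0 for the two edges out of U), the standard Gaussian noise, and the function names. It estimates the X->M slope without adjustment, the M->Y coefficient while controlling for X, and multiplies the two.

```python
import random


def simulate(n=20000, a=0.8, b=1.5, seed=0):
    """Sample from an assumed linear model with graph X->M, M->Y, U->X, U->Y."""
    rng = random.Random(seed)
    xs, ms, ys = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                     # unobserved confounder
        x = 2.0 * u + rng.gauss(0, 1)           # U -> X (assumed coefficient 2.0)
        m = a * x + rng.gauss(0, 1)             # X -> M (coefficient a)
        y = b * m + 3.0 * u + rng.gauss(0, 1)   # M -> Y (coefficient b), U -> Y
        xs.append(x)
        ms.append(m)
        ys.append(y)
    return xs, ms, ys


def cov(p, q):
    """Sample covariance of two equally long lists (divides by n)."""
    mp, mq = sum(p) / len(p), sum(q) / len(q)
    return sum((pi - mp) * (qi - mq) for pi, qi in zip(p, q)) / len(p)


def naive_slope(xs, ys):
    """Slope of Y regressed on X alone; biased here because of U."""
    return cov(xs, ys) / cov(xs, xs)


def front_door_linear(xs, ms, ys):
    """Return (a_hat, b_hat, a_hat * b_hat) using only observed X, M, Y."""
    # Step 1: X -> M needs no adjustment, so a simple regression slope suffices.
    a_hat = cov(xs, ms) / cov(xs, xs)
    # Step 2: coefficient of M in the regression Y ~ M + X (controls for X),
    # obtained by solving the 2x2 normal equations.
    smm, sxx, smx = cov(ms, ms), cov(xs, xs), cov(ms, xs)
    smy, sxy = cov(ms, ys), cov(xs, ys)
    b_hat = (smy * sxx - smx * sxy) / (smm * sxx - smx ** 2)
    # Step 3: compose the two edges by multiplying their coefficients.
    return a_hat, b_hat, a_hat * b_hat


xs, ms, ys = simulate()
a_hat, b_hat, effect = front_door_linear(xs, ms, ys)
print("front-door estimate:", effect)
print("naive slope of Y on X:", naive_slope(xs, ys))
```

Under these assumed coefficients the true effect of X on Y is 0.8 * 1.5 = 1.2, while the population value of the naive slope is 2.4, so the two printed numbers should land near those values.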
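Just like you said, we can write this as
P(Y|do(X)) = sum over m of P(Y|do(M=m)) * P(M=m|do(X))
where the sum runs over the values m of the mediator. In the linear case this reduces to the product of the two edge coefficients.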
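The product of P(Y|do(M)) and P(M|do(X)) from the question can be justified step by step with the do-calculus. The derivation below is a sketch for the graph X->M;M->Y;U->X;U->Y; the lowercase letters for values and the primed summation index x' are notational assumptions, not taken from the question.

```latex
\begin{align*}
P(y \mid do(x))
  &= \sum_m P(y \mid m, do(x))\, P(m \mid do(x))
     && \text{(law of total probability)} \\
  &= \sum_m P(y \mid do(m), do(x))\, P(m \mid do(x))
     && \text{(rule 2: under } do(x) \text{, no back-door path from } M \text{ to } Y \text{ is open)} \\
  &= \sum_m P(y \mid do(m))\, P(m \mid do(x))
     && \text{(rule 3: } X \text{ affects } Y \text{ only through } M\text{)} \\
  &= \sum_m P(m \mid x) \sum_{x'} P(y \mid m, x')\, P(x')
     && \text{(substitute the two do-free expressions)}
\end{align*}
```

The third line is the "multiplication" from the question, with the sum over m made explicit. The last line uses P(m|do(x)) = P(m|x) and the back-door adjustment of M->Y on X.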
Intuition: The whole "trick" of the front-door procedure lies in realizing that we can break up the estimation of a causal path spanning more than one edge into the estimation of its components. Each component can be identified either without adjustment (X->M in our case) or with the back-door criterion (M->Y in our case). Then we simply compose the individual parts to get the overall effect.
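The same composition can be checked exactly on a small discrete example, without any linearity assumption. The sketch below uses binary U, X, M, Y with made-up conditional probability tables (every number and function name is hypothetical). It builds the observational distribution of (X, M, Y) with U marginalized out, applies the front-door formula to that distribution only, and compares the result with the interventional probability computed from the full model.

```python
from itertools import product

# Hypothetical conditional probability tables for the graph
# X->M, M->Y, U->X, U->Y with binary variables.
P_U = {0: 0.6, 1: 0.4}                      # P(U=u)
P_X1_GIVEN_U = {0: 0.2, 1: 0.8}             # P(X=1 | U=u)
P_M1_GIVEN_X = {0: 0.3, 1: 0.9}             # P(M=1 | X=x)
P_Y1_GIVEN_MU = {(0, 0): 0.1, (0, 1): 0.5,  # P(Y=1 | M=m, U=u)
                 (1, 0): 0.4, (1, 1): 0.9}


def bern(p1, value):
    """Probability of `value` for a binary variable with P(1) = p1."""
    return p1 if value == 1 else 1.0 - p1


def observational_joint():
    """P(x, m, y) with the unobserved U summed out."""
    joint = {}
    for u, x, m, y in product((0, 1), repeat=4):
        p = (P_U[u] * bern(P_X1_GIVEN_U[u], x)
             * bern(P_M1_GIVEN_X[x], m) * bern(P_Y1_GIVEN_MU[(m, u)], y))
        joint[(x, m, y)] = joint.get((x, m, y), 0.0) + p
    return joint


def prob(joint, **fixed):
    """Marginal probability of the event given as keywords x=..., m=..., y=..."""
    index = {"x": 0, "m": 1, "y": 2}
    return sum(p for key, p in joint.items()
               if all(key[index[name]] == v for name, v in fixed.items()))


def front_door(joint, x, y=1):
    """P(Y=y | do(X=x)) from observational data only."""
    total = 0.0
    for m in (0, 1):
        # X -> M is unconfounded: P(m | do(x)) = P(m | x).
        p_m_do_x = prob(joint, x=x, m=m) / prob(joint, x=x)
        # M -> Y via back-door adjustment on X:
        # P(y | do(m)) = sum over x' of P(y | m, x') * P(x').
        p_y_do_m = sum(
            prob(joint, x=xp, m=m, y=y) / prob(joint, x=xp, m=m) * prob(joint, x=xp)
            for xp in (0, 1)
        )
        total += p_m_do_x * p_y_do_m
    return total


def truth(x, y=1):
    """P(Y=y | do(X=x)) computed from the full model, including U."""
    return sum(P_U[u] * bern(P_M1_GIVEN_X[x], m) * bern(P_Y1_GIVEN_MU[(m, u)], y)
               for u in (0, 1) for m in (0, 1))


joint = observational_joint()
for x in (0, 1):
    naive = prob(joint, x=x, y=1) / prob(joint, x=x)   # plain P(Y=1 | X=x)
    print(x, front_door(joint, x), truth(x), naive)
```

The front-door value and the ground truth agree for each x, while the plain conditional probability P(Y=1|X=x) does not, which is the confounding by U that the composition works around.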
A nice explanation is also given in this video.