Joint and conditional probability

Suppose that an outcome is exactly one of the events A or B (never both), with probabilities 0.4 and 0.6 respectively when event X happens. If event Y, which is mutually exclusive with X, occurs instead, the probabilities of A and B are split evenly, .5 and .5. These data can be summarized in a Markov matrix:

${\begin{matrix}&X&Y\\A&P(A|X)&P(A|Y)\\B&P(B|X)&P(B|Y)\end{matrix}}\quad =\quad {\begin{matrix}&X&Y\\A&.4&.5\\B&.6&.5\end{matrix}}$

Here, P(A|X) stands for the probability of event A given that X has occurred. In general, P(A|X) denotes the conditional probability of event A under condition X.

Note that each column adds up to 1, since its entries are the probabilities of mutually exclusive events that together cover every outcome: given the column's condition, either A or B happens.

Now, suppose that X occurs with probability .8 and Y with probability .2. We multiply the first column by .8 and the second by .2, so that the total probability of 1 breaks down into

1 = .8 + .2 = 1 * .8 + 1 * .2 = (.4 + .6) * .8 + (.5 + .5) * .2 = (.32 + .48) + (.1 + .1)

where the first parenthesis is the sum of the event probabilities under X and the second, .1 + .1, is the sum of the event probabilities under Y.

This can again be represented by a matrix:

${\begin{matrix}&X|_{P(X)}&Y|_{P(Y)}\\A&P(A\land X)&P(A\land Y)\\B&P(B\land X)&P(B\land Y)\end{matrix}}\quad =\quad {\begin{matrix}&X|_{P(X)}&Y|_{P(Y)}\\A&P(X\land A)&P(Y\land A)\\B&P(X\land B)&P(Y\land B)\end{matrix}}\quad =\quad {\begin{matrix}&X|_{.8}&Y|_{.2}\\A&.32&.1\\B&.48&.1\end{matrix}}$
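The column scaling above can be sketched in a few lines of Python. This is only an illustration using the numbers from the text; the dictionary layout and the helper name `joint_table` are assumptions made for the example, not part of the original material.

```python
# Conditional table from the text: cond[condition][outcome] = P(outcome | condition)
cond = {
    "X": {"A": 0.4, "B": 0.6},
    "Y": {"A": 0.5, "B": 0.5},
}
# Probabilities of the conditions themselves: P(X) and P(Y)
prior = {"X": 0.8, "Y": 0.2}


def joint_table(cond, prior):
    """Scale each column by the probability of its condition:
    P(outcome and condition) = P(condition) * P(outcome | condition)."""
    return {
        c: {o: prior[c] * p for o, p in column.items()}
        for c, column in cond.items()
    }


joint = joint_table(cond, prior)

for c, column in joint.items():
    # Each column of the joint table sums to P(condition), up to float rounding.
    print(c, {o: round(p, 2) for o, p in column.items()},
          "column sum:", round(sum(column.values()), 2))
```

Floating-point arithmetic gives values such as 0.32000000000000006, so the sketch rounds before printing.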

Note that the columns now add up to .8 and .2 respectively, whereas the whole table adds up to .8 + .2 = 1. We have obtained a two-dimensional probability distribution. Every cell holds the joint probability of a pair of events occurring together, e.g. P(A∧X) = .32. The probability of the conjunction A∧X cannot exceed either P(A|X) or P(X) alone, because it is their product: every column added up to 1 in the conditional probability table, but it adds up to P(X) ≤ 1 in the joint distribution table.

The entries of column i add up to the marginal probability of X_i, $P(A_{1}\land X_{i})+P(A_{2}\land X_{i})+\ldots =P(X_{i})$, that is, the probability that a randomly drawn outcome falls in column i. This fact enables us to recover the conditional probabilities. We just need to divide every $P(A_{j}\land X_{i})$ in column i by P(X_i):

${\begin{bmatrix}P(A|X)\\P(B|X)\end{bmatrix}}={\begin{bmatrix}P(A\land X)\\P(B\land X)\end{bmatrix}}{1 \over P(X)}={\begin{bmatrix}.32\\.48\end{bmatrix}}{1 \over .8}={\begin{bmatrix}.4\\.6\end{bmatrix}}$
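As a rough sketch of this normalization, the snippet below divides every column of the joint table by its own sum. The joint values are the ones from the text; the function name `condition_on_columns` is a hypothetical helper chosen for the example.

```python
# Joint table from the text: joint[condition][outcome] = P(outcome and condition)
joint = {
    "X": {"A": 0.32, "B": 0.48},
    "Y": {"A": 0.1, "B": 0.1},
}


def condition_on_columns(joint):
    """Recover P(outcome | condition) by dividing each column by its marginal."""
    result = {}
    for c, column in joint.items():
        marginal = sum(column.values())  # P(condition), the column total
        result[c] = {o: p / marginal for o, p in column.items()}
    return result


recovered = condition_on_columns(joint)

for c, column in recovered.items():
    print(c, {o: round(p, 2) for o, p in column.items()})
```

Each recovered column adds up to 1 again, as in the original conditional table.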

The relationship

$P(A\land X)=P(X)\cdot P(A|X)$

is the basis of the famous Bayes' theorem, $P(X)\cdot P(A|X)=P(A)\cdot P(X|A)$, because we can symmetrically condition the probabilities within each row on the probability of observing that row, $p_{a}=P(A)$ or $p_{b}=P(B)$:

${\begin{bmatrix}(.32+.1)/p_{a}\\(.48+.1)/p_{b}\end{bmatrix}}={\begin{bmatrix}1\\1\end{bmatrix}}={\begin{bmatrix}(.32+.1)/.42\\(.48+.1)/.58\end{bmatrix}}\approx {\begin{bmatrix}.76+.24\\.83+.17\end{bmatrix}}={\begin{bmatrix}P(X|A)+P(Y|A)\\P(X|B)+P(Y|B)\end{bmatrix}}$

That is, the conditional probability P(X|A) = .32/.42 ≈ .76.
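The row-wise conditioning can be sketched the same way as the column-wise one. The code below is a minimal illustration with the numbers from the text; `condition_on_rows` is a hypothetical helper name, and the final check simply compares both sides of Bayes' theorem for A and X.

```python
# Joint table from the text: joint[condition][outcome] = P(outcome and condition)
joint = {
    "X": {"A": 0.32, "B": 0.48},
    "Y": {"A": 0.1, "B": 0.1},
}


def condition_on_rows(joint):
    """Return the row marginals P(outcome) and the table P(condition | outcome)."""
    row_marginal = {}
    for column in joint.values():
        for o, p in column.items():
            row_marginal[o] = row_marginal.get(o, 0.0) + p
    posterior = {
        o: {c: joint[c][o] / row_marginal[o] for c in joint}
        for o in row_marginal
    }
    return row_marginal, posterior


row_marginal, posterior = condition_on_rows(joint)

for o in posterior:
    print(o, "marginal:", round(row_marginal[o], 2),
          {c: round(p, 2) for c, p in posterior[o].items()})

# Bayes' theorem: P(X) * P(A|X) and P(A) * P(X|A) are both the joint P(A and X).
p_x, p_a_given_x = 0.8, 0.4
lhs = p_x * p_a_given_x
rhs = row_marginal["A"] * posterior["A"]["X"]
print("Bayes check:", abs(lhs - rhs) < 1e-9)
```

Both products reduce to the same cell of the joint table, which is why either set of conditional probabilities can be obtained from the other once the marginals are known.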