User:Egm6936.f10/Probability concepts


Probability concepts and notations

djvu notes: Probability, distribution, density

Events (Samples, Outcomes)

An event (or sample, or outcome) is a result of a random experiment, designated by \mathbf\omega. Strictly speaking, \mathbf\omega denotes a single outcome and an event is a set of outcomes, but the terms are used interchangeably here.

For example, the result of tossing a coin once is either head or tail, i.e., \mathbf\omega = head or \mathbf\omega = tail.
If we toss the coin twice, then \mathbf\omega = (head, head) or \mathbf\omega = (head, tail) or \mathbf\omega = (tail, head) or \mathbf\omega = (tail, tail).


Algebra of Events

The algebra of events shares some similarities with the algebra of real numbers, with intersection ( \cap ) corresponding to multiplication ( \times ), complement ( ^{c} ) to subtraction ( \mathbf - or  \setminus  ) and union ( \cup ) to addition ( \mathbf + ).

The union of n events  A_{1},\, A_{2},\, ..., A_{n} is the set of all points that belong to at least one of those events.

Notation:

\displaystyle
A_{1} \cup A_{2} \cup ... \cup A_{n}, or \bigcup_{i=1}^{n}A_{i}

(1)


The intersection of n events  A_{1},\, A_{2},\, ..., A_{n} is the set of all points that belong to every one of those events.

The events are called disjoint if their intersection is empty.

Intersection is associative, as expressed in equation (3).

Notation:

\displaystyle
A_{1} \cap A_{2} \cap ... \cap A_{n} , or \bigcap_{i=1}^{n}A_{i}

(2)


\displaystyle
(A_{1} \cap A_{2}) \cap A_{3} = A_{1} \cap (A_{2} \cap A_{3})

(3)


The complement of an event A in \Omega is the set of all points in \Omega that are not in A. In general, there are two kinds of complement: the relative complement and the absolute complement.

The relative complement of A_{1} with respect to A_{2} is the set of points in A_{2} but not in A_{1}, written A_{2} \setminus A_{1}. If the union of all sets A_{1},\, A_{2},\, ..., A_{n} is taken to be the universal set U, the absolute complement of A_{1} is the set of points in U but not in A_{1}.

Notation:

\displaystyle
A^{c}

(4)
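
The set operations above map directly onto Python's built-in set type. The following is a minimal sketch, assuming the faces of a six-sided die as the universal set; the names omega, a1, a2 and a3 are illustrative and do not come from the notes.

```python
# Universal set: faces of a six-sided die (an illustrative choice).
omega = frozenset(range(1, 7))
a1 = frozenset({1, 2, 3})
a2 = frozenset({2, 3, 4})
a3 = frozenset({3, 5})

union = a1 | a2 | a3                 # points in at least one event
intersection = a1 & a2 & a3          # points in every event
relative_complement = a2 - a1        # in a2 but not in a1
absolute_complement = omega - a1     # in omega but not in a1

# Associativity of intersection, equation (3).
assert (a1 & a2) & a3 == a1 & (a2 & a3)

# Two events are disjoint when their intersection is empty.
disjoint = a1.isdisjoint(frozenset({5, 6}))
```

Using frozenset rather than set keeps the events immutable, which also allows them to be collected into a set of events later on.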


For schematic representations of union, intersection and complement, we can use a Venn diagram.

Probability-Venn Diagram.svg
Fig 1: Venn Diagram


Sample space (Outcome space)

The sample space (in statistical theory) or outcome space (in probability theory) is the collection of all possible outcomes (or events or samples) of a random experiment, denoted by \mathbf \Omega.

For example, in the coin-tossing experiment with a single toss, the outcome space is \mathbf \Omega = {heads, tails}. If the coin is tossed twice, the outcome space is \mathbf\Omega = { (head, head), (head, tail), (tail, head), (tail, tail) }.

(Xiu 2010, p.9[1]; Shao 2007, p.1[2])
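
The outcome space for any number of tosses can be enumerated mechanically. The sketch below assumes that outcomes are represented as ordered tuples; the function name sample_space is hypothetical.

```python
from itertools import product

def sample_space(n_tosses):
    """Return all ordered outcomes of tossing a coin n_tosses times."""
    return list(product(("head", "tail"), repeat=n_tosses))

omega_1 = sample_space(1)   # outcome space for a single toss
omega_2 = sample_space(2)   # outcome space for two tosses

for outcome in omega_2:
    print(outcome)
```

The size of the outcome space doubles with each additional toss, so n tosses give 2^n outcomes.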


Sigma-Field

A sigma-field is a collection of subsets of a sample space \mathbf\Omega (not necessarily all of them), denoted by \mathcal F. For instance, \mathcal F = \{ \emptyset, \{{\rm heads}\}, \{{\rm tails}\}, \mathbf\Omega\} in the coin-tossing experiment.

Three conditions that the sigma-field must satisfy:

\bullet Non-empty: \Omega \in \mathcal F and \emptyset \in \mathcal F;

\bullet Given A \in \mathcal F, then A^c \in \mathcal F;

\bullet Given A_1, A_2,...\in \mathcal F, then

\bigcap_{i=1}^{\infty} A_{i} \in \mathcal F and \bigcup_{i=1}^{\infty} A_{i} \in \mathcal F.

i.e., the union (or "sum") of any countable collection of elements of \mathcal F is again an element of \mathcal F.

(Xiu 2010, p.10[1]; Shao 2007, p.2[2])
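
For a finite sample space the three conditions can be checked by brute force. The following is a rough sketch under that finiteness assumption; is_sigma_field and the example families are hypothetical names used only for illustration.

```python
from itertools import combinations

def is_sigma_field(omega, family):
    """Check the sigma-field conditions for a finite family of subsets of omega."""
    omega = frozenset(omega)
    family = {frozenset(s) for s in family}
    # Condition 1: the whole space and the empty set belong to the family.
    if omega not in family or frozenset() not in family:
        return False
    # Condition 2: closed under complement.
    if any(omega - a not in family for a in family):
        return False
    # Condition 3: for a finite family, closure under pairwise unions and
    # intersections is equivalent to closure under countable ones.
    for a, b in combinations(family, 2):
        if a | b not in family or a & b not in family:
            return False
    return True

omega = {"heads", "tails"}
full = [set(), {"heads"}, {"tails"}, omega]
trivial = [set(), omega]
broken = [set(), {"heads"}, omega]   # missing the complement {"tails"}

print(is_sigma_field(omega, full), is_sigma_field(omega, trivial),
      is_sigma_field(omega, broken))
```

The trivial family containing only the empty set and the whole space passes the check, which illustrates that a sigma-field need not contain every subset.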


Note:
\mathcal F is called a "sigma-field" or "sigma-algebra", written as \sigma-field or \sigma-algebra.
\sigma is a mnemonic for "S" as in "Sum", referring to the closure under countable unions (sums) stated in the third condition.


Probability

Probability measures the likelihood that a certain event (or outcome) occurs. The probability that an outcome \mathbf \omega belongs to an element A \in \mathcal F is a non-negative number (or measure), denoted mathematically by

\displaystyle
P(\omega \in A)=P(A)

(5)

For example, in the coin-tossing experiment,
 P(\{heads\})=P(\{tails\})=\frac{1}{2}, P(\emptyset)=0,  P(\{heads, tails\})=P(\{heads\})+P(\{tails\})=\frac{1}{2}+\frac{1}{2}=1.
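
The coin example can be written as a small probability measure on a finite outcome space. This is a minimal sketch assuming a fair coin; the dictionary mass and the function prob are illustrative names.

```python
from fractions import Fraction

# Point masses of a fair coin (assumed fair for illustration).
mass = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}

def prob(event):
    """P(A) for an event A given as a set of outcomes."""
    return sum((mass[w] for w in event), Fraction(0))

p_heads = prob({"heads"})
p_empty = prob(set())
p_omega = prob({"heads", "tails"})

print(p_heads, p_empty, p_omega)
```

Exact fractions are used instead of floats so that the identities hold without rounding error.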


Algebra of Probability

The complement of an event  A is the event not A (that is, the event of A not occurring); its probability is given by  P(not A) = 1 - P(A). As an example, the chance of not rolling a six on a six-sided die is 1 - (chance of rolling a six) =  1 - \frac{1}{6} = \frac{5}{6}.

If both events  A and  B occur on a single performance of an experiment, this is called the intersection or joint event of  A and  B , denoted A \cap B, with probability P(A \cap B). If two events  A and  B are independent, then the joint probability is

\displaystyle
P(A \mbox{ and }B) =  P(A \cap B) = P(A) P(B)

(6)

For example, if two coins are tossed, the chance of both being heads is \frac{1}{2}\times\frac{1}{2} = \frac{1}{4}.

If either event  A or event  B or both occur on a single performance of an experiment, this is called the union of the events  A and  B , denoted A \cup B, with probability P(A \cup B).

If two events are mutually exclusive, then the probability of either occurring is

\displaystyle
P(A\mbox{ or }B) =  P(A \cup B)= P(A) + P(B)

(7)

For example, the chance of rolling a 2 or 3 or 5 on a six-sided die is P(\{2,\, 3,\, 5\}) = P(\{2\}) + P(\{3\}) + P(\{5\}) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6}= \frac{1}{2}.

If the events are not mutually exclusive then

\displaystyle
\mathrm{P}\left(A \hbox{ or } B\right)=\mathrm{P}\left(A\right)+\mathrm{P}\left(B\right)-\mathrm{P}\left(A \mbox{ and } B\right)

(8)
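
The rules (6) to (8) can be verified on a fair six-sided die. The sketch below assumes a uniform probability on the six faces; the events a and b are chosen only for illustration.

```python
from fractions import Fraction

omega = set(range(1, 7))             # fair six-sided die

def prob(event):
    """Uniform probability: favourable outcomes over total outcomes."""
    return Fraction(len(event & omega), len(omega))

a = {2, 3, 5}        # prime faces
b = {2, 4, 6}        # even faces

p_not_six = 1 - prob({6})                       # complement rule
p_union = prob(a) + prob(b) - prob(a & b)       # equation (8)
p_exclusive = prob({2}) + prob({3}) + prob({5}) # equation (7), disjoint events

# Independence, equation (6): two separate fair coins.
p_two_heads = Fraction(1, 2) * Fraction(1, 2)

print(p_not_six, p_union, p_exclusive, p_two_heads)
```

Here a and b share the face 2, so the simple sum P(a) + P(b) would count it twice; subtracting P(a and b) corrects for that.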



Random variable and vector

Intuitively, a random variable designates a random outcome (event or sample) of a random experiment and is usually denoted by a capital letter, e.g., \mathbf X. It is a numerical description of the outcome of an experiment.

Formally, it is a measurable mapping from a probability space to the real numbers.

\displaystyle
\mathbf X:(\Omega ,\mathcal F ) \to (\mathbb R, \mathcal B)
\displaystyle
\omega \mapsto\mathbf X(\omega)

(9)

where (\Omega ,\mathcal F ) is the event space \Omega endowed with the \sigma-algebra \mathcal F, and (\mathbb R, \mathcal B) is the set of real numbers \mathbb R endowed with the "Borel \sigma-algebra" \mathcal B (the \sigma-algebra generated by the open subsets of \mathbb R). (Shao 2007, p.7[2])

\mathbf X(\omega) is an (arbitrary) number selected to represent each event \mathbf\omega in \mathbf\Omega. For example, in the coin-tossing experiment, we can typically use the number 1 to designate heads and 0 for tails, i.e., \mathbf X(\rm heads)= 1, \mathbf X(\rm tails)= 0. It is also possible to select \mathbf X(\rm heads)= 5, \mathbf X(\rm tails)= -3, even though this is not a good choice, since it is not as mnemonic as \{0,1\}.

Example:

\mathcal B = \sigma \Big( \{(a,b]: a,b \in \mathbb R \} \Big)

where \{(a,b]: a,b \in \mathbb R \} is the set of bounded half-open intervals, which generates \mathcal B.

This choice of \mathcal B allows for the probability of

\mathbf X \in (a,b], i.e., \mathbf P \big(\mathbf X \in (a,b]\big).

(Xiu 2010, p.11[1])


In the case of turbulent flows, the sample space \Omega can be thought of as a set of repeated experiments (samples) carried out to verify, say, a hypothesis or observations on a given flow.

\displaystyle
\Omega = \{ \omega_{1},\, \omega_{2},\, ...,\, \omega_{n_{exp}}\}

where n_{exp} is the total number of repeated experiments, e.g., until the standard deviation is small enough compared to the mean.

The ith velocity component (a random variable) at (x,\, t) in experiment \omega_{k} is U_{i}(x,\, t,\, \omega_{k}).


A random vector \displaystyle \mathbf X is composed of real-valued random variables \displaystyle X_i, i=1,2,...,n. A typical n-dimensional random vector can be represented as \displaystyle \mathbf X = (X_1, X_2,...,X_n).

【Theorem 1】:

Let \displaystyle \mathbf X = (X_1,X_2,...,X_n) be a Gaussian random vector with distribution \displaystyle N(\mu, \mathbf C) and let \displaystyle \mathbf A be an \displaystyle m \times n matrix. Then \displaystyle \mathbf {AX}^T has an \displaystyle N(\mathbf A\mu^T, \mathbf {ACA}^T) distribution.
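
The mean and covariance given by Theorem 1 can be computed with plain matrix arithmetic. The sketch below uses a hypothetical 2-D example; the values of mu, c and a are assumptions chosen for illustration, and matmul and transpose are small helper functions rather than library calls.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]

# Hypothetical 2-D Gaussian vector X ~ N(mu, C).
mu = [[1.0], [2.0]]                  # column vector, i.e. mu^T in the text
c = [[2.0, 0.5],
     [0.5, 1.0]]
# A maps X to (X_1 + X_2, X_1 - X_2).
a = [[1.0, 1.0],
     [1.0, -1.0]]

mean_ax = matmul(a, mu)                          # A mu^T
cov_ax = matmul(matmul(a, c), transpose(a))      # A C A^T

print(mean_ax)
print(cov_ax)
```

The diagonal of cov_ax agrees with the elementary rules Var(X_1 + X_2) = Var(X_1) + Var(X_2) + 2 cov(X_1, X_2) and Var(X_1 - X_2) = Var(X_1) + Var(X_2) - 2 cov(X_1, X_2).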

Probability-Vector.svg
Fig 2: 2-D random vector

If the events are already representable by real numbers, then the event space is already \Omega \equiv \mathbb R. It is then not necessary to mention \mathbf \omega; one can refer directly to \mathbf X. An example of such a variable is a velocity component in a turbulent flow (Pope 2000[3]).

\mathbf P_{\mathbf X} Probability distribution

Probability Distribution is a function that describes the probability of a random variable taking certain values.

\displaystyle
P_X = P\circ X^{-1}:\mathcal B \to \mathbb R_{0}^{+}

(10)

Probability-Mapping.svg
Fig 3: Mapping

In practice, it suffices to refer to half-open intervals (a,\, b] in \mathcal B:

\displaystyle
(a,\, b] \in \mathcal B \;\mapsto\; P_X((a,\, b]) \in \mathbb R_{0}^{+}

(11)

\displaystyle
P_X((a,b]) \equiv P(X( \omega) \in (a,b])

(12)

i.e., the probability that a < X(\omega) \le b.
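
The construction of P_X from P and X in equation (10) can be made concrete on a finite outcome space. The sketch below assumes two fair coin tosses and takes X to be the number of heads; all names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Outcome space for two fair tosses, each outcome with probability 1/4.
omega = list(product("HT", repeat=2))
p = {w: Fraction(1, 4) for w in omega}

def x(w):
    """Random variable: number of heads in the outcome w."""
    return w.count("H")

# Distribution of X: push the probability of each outcome onto its value X(w).
p_x = defaultdict(Fraction)
for w in omega:
    p_x[x(w)] += p[w]

def p_x_interval(a, b):
    """P_X((a, b]), the probability that a < X <= b."""
    return sum((q for v, q in p_x.items() if a < v <= b), Fraction(0))

print(dict(p_x))
print(p_x_interval(0, 2))
```

Only the values of X and their probabilities are needed once p_x has been built; the underlying outcomes no longer appear.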

F_{\mathbf X} Cumulative distribution function (CDF)

The cumulative distribution function (CDF), or simply distribution function, describes the probability that a real-valued random variable \mathbf X with a given probability distribution will be found at a value less than or equal to x. Intuitively, it is the "area so far" function of the probability distribution.

\displaystyle
F_X(x) := P_X((-\infty, x]) = P(X \le x)

(13)


For random vectors,

\displaystyle
F_X(\mathbf x) := P_X((-\infty, \mathbf x]) = P(X_1 \le x_1, X_2 \le x_2,...,X_n \le x_n), \mathbf x = (x_1, x_2,...,x_n) \in \mathbb R^n

(14)


Standard normal (Gaussian) distribution, with \mu = 0 and \sigma = 1

\displaystyle
F_X(x) = \frac12\left[\, 1 + \operatorname{erf} \left(\displaystyle \frac{x}{\sqrt{2}} \right) \right]

(15)

Normal Distribution CDF.svg
Fig 4: CDF of Normal Distribution
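
Equation (15) can be evaluated with math.erf from the Python standard library. The sketch below generalizes it to a mean mu and standard deviation sigma, which is an assumption beyond the formula as written (equation (15) corresponds to mu = 0 and sigma = 1); the function names are illustrative.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of N(mu, sigma^2); equation (15) is the case mu=0, sigma=1."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def interval_prob(a, b, mu=0.0, sigma=1.0):
    """P(a < X <= b) = F_X(b) - F_X(a)."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

print(normal_cdf(0.0))
print(interval_prob(-1.0, 1.0))
```

The second call gives the probability of falling within one standard deviation of the mean.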

f_{\mathbf X} Probability density function (PDF)

Probability Density Function (PDF), or density of a continuous random variable is a function that describes the relative likelihood for this random variable to occur at a given point. The probability for the random variable to fall within a particular region is given by the integral of this variable’s density over the region. The probability density function is nonnegative everywhere, and its integral over the entire space is equal to one.

\displaystyle
f_{\mathbf X}(x):=\frac{d}{dx}\mathbf F_{\mathbf X}(x)

(24)

\displaystyle
P (a < X \leq b) = \int_a^b f_{\mathbf X}(x) \, \mathrm{d}x

(25)

\displaystyle
\mathbf F_{\mathbf X}(x)=\int_{-\infty}^{x} f_{\mathbf X}(t)dt

(26)
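
Equations (25) and (26) can be checked numerically. The sketch below assumes a standard Gaussian density (mean 0, standard deviation 1) and a simple trapezoidal rule; normal_pdf and integrate are illustrative helper names, and the lower limit of minus infinity is truncated to -10.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(f, a, b, n=10_000):
    """Composite trapezoidal rule on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h

# Equation (25): P(-1 < X <= 1) as an integral of the density.
p_interval = integrate(normal_pdf, -1.0, 1.0)
# Equation (26), with -infinity truncated to -10 (the tail beyond is negligible).
cdf_at_zero = integrate(normal_pdf, -10.0, 0.0)

print(p_interval, cdf_at_zero)
```

By symmetry of the density about zero, the value of the CDF at zero should be close to one half.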


For random vectors,

\displaystyle
\mathbf F_{\mathbf X}(\mathbf x)=\mathbf F_{\mathbf X}(x_1,x_2,...,x_n)=\int_{-\infty}^{x_1}...\int_{-\infty}^{x_n} f_{\mathbf X}(t_1,...,t_n)dt_1...dt_n.

(27)

and

\displaystyle
\int_{-\infty}^{+\infty}...\int_{-\infty}^{+\infty} f_{\mathbf X}(t_1,...,t_n)dt_1...dt_n = 1

(28)

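If a random vector \displaystyle \mathbf X has density \displaystyle f_{\mathbf X}, then each of its components (and, more generally, each subvector) also has a density, called a marginal density. The marginal density of \displaystyle X_i is obtained by integrating out all the other variables:

\displaystyle
f_{X_i}(x_i) = \int_{-\infty}^{+\infty}...\int_{-\infty}^{+\infty} f_{\mathbf X}(t_1,...,t_{i-1},x_i,t_{i+1},...,t_n)dt_1...dt_{i-1}dt_{i+1}...dt_n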

(29)

Normal(Gaussian) distribution

\displaystyle
f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}{\rm exp}\left[{-\frac{(x-\mu)^2}{2\sigma^2}}\right]

(30)

Normal Distribution PDF.svg
Fig 7: Normal(Gaussian) distribution

Binomial distribution

\displaystyle
P(X=k)=\binom{n}{k}p^{k}(1-p)^{n-k}, k=0,1,...,n

(31)

Probability Binomial distribution.svg
Fig 8: Binomial distribution

Poisson distribution

\displaystyle
P(X=k)=e^{-\lambda}\frac{\lambda^{k}}{k!}, k=0,1,...

(32)

Probability Poisson distribution.svg
Fig 9: Poisson distribution
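
The binomial and Poisson formulas above translate directly into code. This is a minimal sketch using only the standard library; the parameter values in the example are arbitrary, and the infinite Poisson sum is truncated at 50 terms.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# The probabilities over all admissible k should sum to one.
binom_total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))
poisson_total = sum(poisson_pmf(k, 2.0) for k in range(50))

print(binom_total, poisson_total)
```

Both distributions are discrete, so these functions give probabilities of individual values rather than densities.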

Note:

Notation \mathbf X and x

In recent literature, an uppercase letter, e.g., \mathbf X, is used to designate a random variable, whereas the corresponding lowercase letter, e.g., x, is used to designate a real value that \mathbf X can take, such as the upper bound in F_X(x) = P(X \le x).

Kolmogorov (1933[4]), a famous work that influenced subsequent work in mathematical probability and statistics, uses the following notation:

\displaystyle
F^{(x)}(a) = \displaystyle \int_{-\infty}^{a}f^{(x)}(a)da

(33)

(Kolmogorov, 1933 p.24[4])

Expectations and Moments

The expectation (also called the mean value or the first moment) of a random variable \displaystyle X is defined as follows.

For a continuous distribution with probability density function \displaystyle f_X(x) ,

\displaystyle
\mu_X = \mathbb E[X] := \int_{-\infty}^{+\infty}xf_X(x)dx

(31)


The expectation of \displaystyle g(X), where \displaystyle g is a real-valued function, is:

\displaystyle
\mathbb E[g(X)] := \int_{-\infty}^{+\infty}g(x)f_X(x)dx

(32)


The \displaystyle kth moment of random variable \displaystyle X, where \displaystyle k \in \mathbb N, is:

\displaystyle
\mathbb E[X^k] := \int_{-\infty}^{+\infty}x^kf_X(x)dx

(33)


The variance of random variable \displaystyle X, \displaystyle \sigma_X^2, is:

\displaystyle
\sigma_X^2 = var(X) := \int_{-\infty}^{+\infty}(x-\mu_X)^2f_X(x)dx =: \mathbb E[(X-\mu_X)^2]

(34)


For a discrete distribution with probabilities \displaystyle p_n=P(X = x_n) , the corresponding definitions are:

\displaystyle
\mu_X = \mathbb E(X) := \sum_{n=1}^{+\infty}x_{n}p_{n}

(35)


\displaystyle
\mathbb E[g(X)] := \sum_{n=1}^{+\infty}g(x_{n})p_{n}

(36)


\displaystyle
\mathbb E[X^k] := \sum_{n=1}^{+\infty}x_{n}^{k}p_{n}

(37)


\displaystyle
\sigma_X^2 = var(X) := \sum_{n=1}^{+\infty}(x_{n}-\mu_X)^2p_{n} =: \mathbb E[(X-\mu_X)^2]

(38)


From equations (34) and (38), we can deduce that

\displaystyle
\sigma_X^2 = \mathbb E[(X-\mu_X)^2] = \mathbb E[X^2 - 2X\mu_X + \mu_X^2] = \mathbb E[X^2] - 2\mu_X\mathbb E[X] + \mu_X^2 = \mathbb E[X^2] - 2\mu_X^2 + \mu_X^2 = \mathbb E[X^2] - \mu_X^2

(39)


Variance equals the mean of the square minus the square of the mean.
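
The identity can be checked on a simple discrete distribution. The sketch below assumes a fair six-sided die, so that x_n = 1,...,6 and p_n = 1/6; the variable names are illustrative.

```python
from fractions import Fraction

# Fair six-sided die: values x_n with probabilities p_n = 1/6.
values = range(1, 7)
p = Fraction(1, 6)

mean = sum(x * p for x in values)                       # E[X]
second_moment = sum(x**2 * p for x in values)           # E[X^2]
var_definition = sum((x - mean) ** 2 * p for x in values)
var_shortcut = second_moment - mean**2

print(mean, second_moment, var_definition, var_shortcut)
```

Both ways of computing the variance give the same exact fraction, as the derivation predicts.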

The expectation of a random vector \displaystyle \mathbf X is

\displaystyle
\mu_{\mathbf X}=\mathbb E[\mathbf X] = (\mathbb E[X_1],\mathbb E[X_2],...,\mathbb E[X_n]).

(40)


A frequently used quantity associated with a random vector is the covariance matrix, which is defined as

\displaystyle
\mathbf C_{\mathbf X} = \big( cov(X_i,X_j) \big)_{i,j = 1}^{n}

(41)

where \displaystyle cov(X_i,X_j)=\mathbb E[(X_i-\mu_{X_i})(X_j-\mu_{X_j})]= \mathbb E[X_iX_j]-\mu_{X_i}\mu_{X_j} is the covariance of \displaystyle X_i and \displaystyle X_j. We have \displaystyle cov(X_i,X_i) = \sigma_{X_i}^2.
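
A covariance matrix can be computed exactly for a small discrete random vector. The sketch below assumes two fair coin tosses encoded as 0 and 1, with X_1 the first toss and X_2 the sum of both tosses; this example is hypothetical and chosen so that the components are correlated.

```python
from fractions import Fraction
from itertools import product

# Two fair coin tosses encoded as 0/1; X_1 = first toss, X_2 = sum of both.
outcomes = list(product((0, 1), repeat=2))
p = Fraction(1, len(outcomes))
vectors = [(w[0], w[0] + w[1]) for w in outcomes]

n = 2
mean = [sum(v[i] * p for v in vectors) for i in range(n)]
cov = [[sum((v[i] - mean[i]) * (v[j] - mean[j]) * p for v in vectors)
        for j in range(n)] for i in range(n)]

for row in cov:
    print(row)
```

The matrix is symmetric, and its diagonal entries are the variances of the individual components.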

Convergence modes

Given a sequence of random variables \displaystyle X_1, X_2,..., we define the following convergence modes.

Convergence in distribution, \displaystyle X_n \overset{d}{\rightarrow} X

\displaystyle X_n converges to \displaystyle X in distribution if, for all continuity points \displaystyle x of the distribution function \displaystyle F_X , the relation

\displaystyle
F_{X_n}(x) \rightarrow F_X(x), \;\;\; n \rightarrow \infty

(42)

is satisfied.


Convergence in distribution is also called weak convergence.

Convergence in probability, \displaystyle X_n \overset{P}{\rightarrow} X

\displaystyle X_n converges to \displaystyle X in probability, written \displaystyle X_n \overset{P}{\rightarrow} X , if for any positive \displaystyle \epsilon the probability that the difference between \displaystyle X_n and \displaystyle X exceeds \displaystyle \epsilon in absolute value tends to zero as \displaystyle n \rightarrow \infty :

\displaystyle
P(|X_n - X| > \epsilon) \rightarrow 0, \;\;\; n \rightarrow \infty

(43)

Convergence in probability implies convergence in distribution. The converse holds when \displaystyle X = x for some constant \displaystyle x.
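
As an illustration of equation (43), consider an example that is not part of the notes: let X_n be the fraction of heads in n fair coin tosses and let X be the constant 1/2. The sketch below computes the probability in (43) exactly from the binomial distribution; tail_prob is a hypothetical helper name and the choice of epsilon is arbitrary.

```python
import math

def tail_prob(n, eps, p=0.5):
    """Exact P(|X_n - p| > eps), where X_n is the mean of n Bernoulli(p) tosses."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1)
               if abs(k / n - p) > eps)

eps = 0.1
tails = [tail_prob(n, eps) for n in (10, 100, 1000)]

for n, t in zip((10, 100, 1000), tails):
    print(n, t)
```

The computed probabilities shrink as n grows, which is the behaviour that convergence in probability requires for every fixed positive epsilon.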

Almost sure Convergence, \displaystyle X_n \overset{a.s.}{\rightarrow} X


\displaystyle L^p Convergence, \displaystyle X_n \overset{L^p}{\rightarrow} X



Questions&Answers

References

  1. Xiu, D., Numerical Methods for Stochastic Computations: A Spectral Method Approach, Princeton University Press, 2010.
  2. Shao, J., Mathematical Statistics, 2nd edition, Springer, 2007.
  3. Pope, S. B., Turbulent Flows, 1st edition, Cambridge University Press, 2000.
  4. Kolmogorov, A. N., Foundations of the Theory of Probability, Chelsea Publishing Co., 1933.

