Bayesian statistics

Subject classification: this is a statistics resource.

Educational level: this is a tertiary (university) resource.

Bayesian vs. Frequentist Statistics

Resampling vs. Bayesian Computation

Typically, the question one attempts to answer using statistics is whether there is a relationship between two variables. To demonstrate a relationship, the experimenter must show that when one variable changes the second variable also changes, and that the amount of change is more than would be likely from chance alone.

There are two ways to determine the probability of an event. The first is to calculate mathematically how often the event can happen. The second is to observe how often the event happens, by counting the number of times the event could happen and the number of times it actually does happen.

A mathematical calculation is what lets a person say that the chance of rolling a one on a six-sided die is one in six. The probability is found by counting the number of ways the event can happen and dividing that number by the total number of equally likely outcomes. As another example, what is the probability of drawing a three from a well-shuffled deck of cards? The answer is four in fifty-two, since there are four cards numbered three and a total of fifty-two cards in the deck. The chance of drawing a card in the suit of diamonds is thirteen in fifty-two (there are thirteen cards in each of the four suits). The chance of drawing the three of diamonds is one in fifty-two.
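The counting rule above can be checked by listing every outcome and counting the favourable ones. The following is a minimal sketch in Python; the names `DECK` and `probability` are illustrative choices, not part of the original text, and the calculation assumes every outcome is equally likely.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs


def probability(event, outcomes):
    """Number of favourable outcomes divided by the total number of outcomes."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))


p_one_on_die = probability(lambda face: face == 1, list(range(1, 7)))
p_three = probability(lambda card: card[0] == "3", DECK)
p_diamond = probability(lambda card: card[1] == "diamonds", DECK)
p_three_of_diamonds = probability(lambda card: card == ("3", "diamonds"), DECK)

print(p_one_on_die, p_three, p_diamond, p_three_of_diamonds)
# prints: 1/6 1/13 1/4 1/52
```

`Fraction` reduces its results, so four in fifty-two is printed as 1/13 and thirteen in fifty-two as 1/4.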

Sometimes the size of the total event space, that is, the number of different possible outcomes, is not known. In that case, you need to observe the system and count the number of times the event actually happens versus the number of times it could happen but doesn't.

For instance, a warranty for a coffee maker is a probability statement. The manufacturer estimates that the probability the coffee maker will stop working before the warranty period ends is low. To arrive at that estimate, the manufacturer tests coffee makers and the parts that make them up to find out how long a typical unit continues to function, and then uses probability to choose a warranty period.

Experiments, Outcomes and Events

The easiest way to think of probability is in terms of experiments and their potential outcomes. Many examples can be drawn from everyday experience: On the drive home from work, you can encounter a flat tire, or have an uneventful drive; the outcome of an election can include either a win by candidate A, B, or C, or a runoff.

Definition: The entire collection of possible outcomes from an experiment is termed the sample space, indicated as $\Omega$ (Omega).

The simplest (albeit uninteresting) example would be an experiment with only one possible outcome, say $A$. From elementary set theory, we can express the sample space as follows:

$\Omega =\{A\}$

A more interesting example is the result of rolling a six sided die. The sample space for this experiment is:

$\Omega =\{1,2,3,4,5,6\}$

We may be interested in events in an experiment.

Definition: An event is some subset of outcomes from the sample space.

In the die example, events of interest might include
a) the outcome is an even number
b) the outcome is less than three

These events can be expressed in terms of the possible outcomes from the experiment:
a) : $\{2,4,6\}$
b) : $\{1,2\}$

We can borrow definitions from set theory to express events in terms of outcomes. Here is a refresher of some terminology, and some new terms that will be important later:
$\cup$ represents the union of two events
$\cap$ represents the intersection of two events
$\{\cdots \}^{c}$ represents the complement of an event. For instance, "the outcome is an even number" is the complement of "the outcome is an odd number" in the die example.
$A\backslash B$ represents difference, that is, $A$ but not $B$. For example, we may be interested in the event of drawing the queen of spades from a deck of cards. This can be expressed as the event of drawing a queen, but not drawing a queen of hearts, diamonds or clubs.
$\varnothing$ or $\{\}$ represents an impossible event
$\Omega$ represents a certain event
$A$ and $B$ are called disjoint events if $A\cap B=\varnothing$
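These operations map directly onto Python's built-in set type. The sketch below uses the die example; the variable names are illustrative, and the complement is written as a difference from the sample space because Python sets have no complement operator of their own.

```python
OMEGA = frozenset(range(1, 7))        # sample space of a six-sided die
even = frozenset({2, 4, 6})           # "the outcome is an even number"
odd = frozenset({1, 3, 5})            # "the outcome is an odd number"
less_than_three = frozenset({1, 2})   # "the outcome is less than three"

union = even | less_than_three        # even OR less than three
intersection = even & less_than_three # even AND less than three
complement_of_even = OMEGA - even     # everything in OMEGA that is not even
difference = even - less_than_three   # even but NOT less than three
are_disjoint = even.isdisjoint(odd)   # True when the intersection is empty

print(sorted(union))               # [1, 2, 4, 6]
print(sorted(intersection))        # [2]
print(sorted(complement_of_even))  # [1, 3, 5]
print(sorted(difference))          # [4, 6]
print(are_disjoint)                # True
```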

Probability

Now that we know what events are, we should think a bit about a way to express the likelihood of an event occurring. The frequentist (relative-frequency) definition of probability comes from the following. If we can perform our experiment over and over in a way that is repeatable, we can count the number of times that the experiment gives rise to event $A$. We also keep track of the number of times that we perform the same experiment. If we repeat the experiment a large enough number of times, we can express the probability of event $A$ as follows:
$P(A)={\frac {N_{A}}{N}}$
where $N_{A}$ is the number of times event $A$ occurred, and $N$ is the number of times the experiment was repeated. Therefore the equation can be read as "the probability of event $A$ equals the number of times event $A$ occurs divided by the number of times the experiment was repeated (or the number of times event $A$ could have occurred)." As $N$ approaches infinity, the fraction above approaches the true probability of the event $A$. Since $0\leq N_{A}\leq N$, the value of $P(A)$ is always between 0 and 1. If our event is the certain event $\Omega$, then each time we perform the experiment, the event $\Omega$ is observed; $N_{\Omega }=N$ and $P(\Omega )=1$. If our event is the impossible event $\varnothing$, we know $N_{\varnothing }=0$ and $P(\varnothing )=0$.
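The ratio $N_{A}/N$ can be explored with a simulated die. This is only a sketch: Python's pseudo-random generator stands in for a physical die, the seed is arbitrary, and the function name `relative_frequency` is an illustrative choice.

```python
import random


def relative_frequency(event, n_trials, rng):
    """Estimate P(event) as N_A / N by rolling a simulated six-sided die."""
    n_event = 0
    for _ in range(n_trials):
        face = rng.randint(1, 6)  # one repetition of the experiment
        if event(face):
            n_event += 1          # the event occurred on this trial
    return n_event / n_trials


def is_even(face):
    return face % 2 == 0


rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(is_even, n, rng))
```

With more trials the printed ratio should settle closer to 0.5, although any single run can still deviate from it.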

If $A$ and $B$ are disjoint events, then whenever event $A$ is observed it is impossible for event $B$ to be observed at the same time. Therefore the number of times the event $A\cup B$ occurs equals the number of times $A$ occurs plus the number of times $B$ occurs. This can be expressed as:
$N_{A\cup B}=N_{A}+N_{B}$
Dividing both sides by $N$ and using our definition of probability, we arrive at the following:
$P(A\cup B)=P(A)+P(B)$

At this point it's worth remembering that not all events are disjoint. For events that are not disjoint, we end up with the following more general rule:
$P(A\cup B)=P(A)+P(B)-P(A\cap B)$
How can we see this from an example? Well, let's consider drawing from a deck of cards. I'll define two events: "drawing a queen" and "drawing a spade". It is immediately clear that these are not disjoint events, because you can draw a queen that is also a spade. There are four queens in the deck, so if we perform the experiment of drawing a card, putting it back in the deck and shuffling (what statisticians refer to as sampling with replacement), we will end up with a probability of ${\frac {1}{13}}$ for a queen draw. By the same argument, we obtain a probability of ${\frac {1}{4}}$ for drawing a spade. The expression $P(A\cup B)$ here can be translated as "the chance of drawing a queen or a spade". If we incorrectly assume that for this case $P(A\cup B)=P(A)+P(B)$, we would simply add our probabilities together and get ${\frac {1}{13}}+{\frac {1}{4}}={\frac {17}{52}}$. If we were to gather some data experimentally, we would find that our results differ from this prediction: the probability observed would be slightly less than ${\frac {17}{52}}$. Why? Because we're counting the queen of spades twice in our expression, once as a spade, and again as a queen. We need to count it only once, so we subtract its probability of ${\frac {1}{52}}$ and obtain ${\frac {1}{13}}+{\frac {1}{4}}-{\frac {1}{52}}={\frac {16}{52}}={\frac {4}{13}}$. Still confused?

Proof: If $A$ and $B$ are not disjoint, we have to avoid the double counting problem by exactly specifying their union.
$A\cup B=A\cup (B\backslash A)$ so
$P(A\cup B)=P(A\cup (B\backslash A))$
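$A$ and $B\backslash A$ are disjoint sets. We can then use the addition rule for disjoint events from above to express our desired result:
$P(A\cup B)=P(A)+P(B\backslash A)$
We also know that $B$ is the union of the disjoint sets $B\backslash A$ and $B\cap A$, so $P(B)=P(B\backslash A)+P(B\cap A)$, that is,
$P(B\backslash A)=P(B)-P(B\cap A)$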
so
$P(A\cup B)=P(A)+P(B)-P(B\cap A)$
Whew! Our first proof. I hope that wasn't too dry.
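For readers who prefer a numerical check to a proof, the queen-or-spade example can be verified by enumerating the deck. This is a sketch with illustrative names (`deck`, `queens`, `spades`, `prob`), assuming each card is equally likely to be drawn.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(ranks, suits))

queens = {card for card in deck if card[0] == "Q"}
spades = {card for card in deck if card[1] == "spades"}


def prob(event):
    """Probability of an event (a set of cards) under equally likely draws."""
    return Fraction(len(event), len(deck))


direct = prob(queens | spades)                                   # count the union itself
corrected = prob(queens) + prob(spades) - prob(queens & spades)  # general addition rule
naive = prob(queens) + prob(spades)                              # double counts the queen of spades

print(direct, corrected, naive)
# prints: 4/13 4/13 17/52
```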

Conditional Probability

Many events are conditional on the occurrence of other events. Sometimes this coupling is weak. One event may become more or less probable depending on our knowledge that another event has occurred. For instance, the probability that your friends and relatives will call asking for money is likely to be higher if you win the lottery. In my case, I don't think this probability would change.

Let's get formal for a second and remember our original definition of probability.
$P(A)={\frac {N_{A}}{N}}$
Consider an additional event $B$, and a situation where we are only interested in the probability of the occurrence of $A$ when $B$ occurs. One way to arrive at this probability is to perform a set of experiments (trials) and only record our results when the event $B$ occurs. In other words, we compute
${\frac {N_{A\cap B}}{N_{B}}}$
We can divide top and bottom by $N$, the total number of trials, to get $P(A\cap B)/P(B)$. We define this as the conditional probability (for $P(B)>0$):
$P(A|B)={\frac {P(A\cap B)}{P(B)}}$
which is read as "the probability of $A$ given $B$."
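As a small worked case, take the two die events used earlier, "the outcome is an even number" and "the outcome is less than three". The sketch below computes both conditional probabilities by enumeration; the helper names `prob` and `conditional` are illustrative, and a fair die is assumed.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # fair six-sided die


def prob(event):
    """P(event) for equally likely outcomes in OMEGA."""
    return Fraction(len(event & OMEGA), len(OMEGA))


def conditional(a, b):
    """P(A|B) = P(A and B) / P(B); undefined when P(B) is zero."""
    if prob(b) == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b) / prob(b)


even = frozenset({2, 4, 6})
less_than_three = frozenset({1, 2})

print(conditional(even, less_than_three))  # 1/2
print(conditional(less_than_three, even))  # 1/3
```

The two results differ, which shows that $P(A|B)$ and $P(B|A)$ are not interchangeable in general.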

Bayes' Law

On Wikipedia: Bayes' theorem

An important theorem in statistics and the cornerstone of Bayesian statistics is Bayes' Law (also known as Bayes' theorem), which states that

$P(B|A)={\frac {P(A|B)P(B)}{P(A)}}$

provided $P(A)>0$. It is easy to prove. We start with two expressions for the same quantity, $P(A\cap B)$.
We know that: $P(A\cap B)=P(B\cap A)$,
$P(A\cap B)=P(A|B)P(B)$, and
$P(B\cap A)=P(B|A)P(A)$.
Since $P(A\cap B)=P(B\cap A)$,
$P(A|B)P(B)=P(B|A)P(A)$.

Dividing both sides of the line above by $P(A)$ gives us Bayes' Law.
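Bayes' Law can be checked on a fair die by computing $P(B|A)$ twice: once directly from the definition of conditional probability, and once from $P(A|B)$, $P(B)$ and $P(A)$. The events chosen here (less than three; between two and five) are illustrative assumptions made only for this sketch.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # fair six-sided die


def prob(event):
    return Fraction(len(event & OMEGA), len(OMEGA))


def conditional(a, b):
    """P(A|B) from the definition, assuming P(B) > 0."""
    return prob(a & b) / prob(b)


A = frozenset({1, 2})        # "the outcome is less than three"
B = frozenset({2, 3, 4, 5})  # "the outcome is between two and five"

direct = conditional(B, A)                        # P(B|A) from the definition
via_bayes = conditional(A, B) * prob(B) / prob(A) # P(A|B) P(B) / P(A)

print(conditional(A, B), direct, via_bayes)
# prints: 1/4 1/2 1/2
```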

Independence

Two events $A$ and $B$ are called independent if the occurrence of one has absolutely no effect on the probability of the occurrence of the other. Mathematically, this is expressed as:
$P(A\cap B)=P(A)P(B)$ .

Note that, when $P(B)>0$, this implies that $P(A|B)=P(A)$. That is, adding the information about $B$ is irrelevant to determining the likelihood of $A$.
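The product rule gives a direct test for independence. In the sketch below, "drawing a queen" and "drawing a spade" turn out to be independent, while two die events chosen for contrast are not. The function name `independent` and the second pair of events are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def independent(a, b, omega):
    """True when P(A and B) equals P(A) * P(B) for equally likely outcomes."""
    def prob(event):
        return Fraction(len(event & omega), len(omega))
    return prob(a & b) == prob(a) * prob(b)


ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = frozenset(product(ranks, suits))
queens = frozenset(card for card in deck if card[0] == "Q")
spades = frozenset(card for card in deck if card[1] == "spades")

die = frozenset(range(1, 7))
even = frozenset({2, 4, 6})
greater_than_three = frozenset({4, 5, 6})

print(independent(queens, spades, deck))            # True:  1/52 == 1/13 * 1/4
print(independent(even, greater_than_three, die))   # False: 1/3 != 1/2 * 1/2
```

Knowing that a card is a spade says nothing about whether it is a queen, but knowing that a die shows more than three does make an even result more likely.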

Random Variables

It's usually possible to represent the outcome of experiments in terms of integers or real numbers. For instance, in the case of conducting a poll, it becomes a little cumbersome to present the outcomes of each individual respondent. Let's say we poll ten people for their voting preferences (Republican - R, or Democrat - D) in two different electoral districts. Our results might look like this:
$\{RRRDRRDRRR\}$ and $\{DDDDDRDDDD\}$
But we're probably only interested in the overall breakdown in voting preference for each district. If we assign an integer value to each outcome, say 0 for Democrat and 1 for Republican, we can obtain a concise summary of voting preference by district simply by adding the results together: 8 for the first district and 1 for the second. A rule like this, which assigns a number to each outcome of an experiment, is called a random variable.
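The encoding described above takes only a few lines of Python. The mapping `VALUE` and the function name `republican_count` are illustrative; the response strings are the two districts from the text.

```python
VALUE = {"D": 0, "R": 1}  # 0 for Democrat, 1 for Republican


def republican_count(responses):
    """Map each response to its integer value and add the values together."""
    return sum(VALUE[response] for response in responses)


district_1 = "RRRDRRDRRR"
district_2 = "DDDDDRDDDD"

print(republican_count(district_1))  # 8
print(republican_count(district_2))  # 1
```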

Discrete and Continuous Random Variables

There are two important subclasses of random variables: discrete random variables (DRV) and continuous random variables (CRV). Discrete random variables take only countably many values. This means that we can list the possible values that a discrete random variable can take; the list may be finite or may go on forever, as with the non-negative integers. If the possible values that a DRV $X$ can take are $a_{0},a_{1},a_{2},\ldots ,a_{n}$, the probabilities that $X$ takes each value are $p_{0}=P(X=a_{0})$, $p_{1}=P(X=a_{1})$, $p_{2}=P(X=a_{2}),\ldots ,p_{n}=P(X=a_{n})$. All these probabilities are greater than or equal to zero, and together they sum to one.
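A discrete random variable with finitely many values can be stored as a table from each value to its probability. The sketch below uses a fair die as an assumed example; the names `pmf`, `is_valid_pmf` and `prob_of` are illustrative.

```python
from fractions import Fraction

# Each possible value a_i of X mapped to p_i = P(X = a_i), for a fair die.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def is_valid_pmf(table):
    """Check that no probability is negative and that they sum to one."""
    return all(p >= 0 for p in table.values()) and sum(table.values()) == 1


def prob_of(table, values):
    """P(X takes one of the given values); unlisted values have probability 0."""
    return sum((table.get(value, Fraction(0)) for value in values), Fraction(0))


print(is_valid_pmf(pmf))     # True
print(prob_of(pmf, {1, 2}))  # 1/3
print(prob_of(pmf, {7}))     # 0
```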

For continuous random variables, we cannot list all the possible values, because the variable can take any value in an interval of real numbers and there are uncountably many of them. There is no use in calculating the probability of each value separately, because the probability that the variable takes any one particular value is zero ($P(X=x)=0$). Instead, probabilities are assigned to ranges of values, such as $P(a\leq X\leq b)$.

Distribution Functions

Expectation Values

External links
