Joint, Marginal, and Conditional Probability
- Joint probability is the probability of two or more events occurring simultaneously.
- Marginal probability is the probability of an event irrespective of the outcome of other variables.
- Conditional probability is the probability of one event occurring in the presence of one or more other events.
Probability for Multiple Random Variables
In machine learning, we are likely to work with many random variables. For example, given a table of data, such as a spreadsheet in Excel, each row represents a separate observation or event, and each column represents a separate random variable. Variables may be either discrete, meaning that they take on a finite set of values, or continuous, meaning they take on a real (numerical) value. As such, we are interested in the probability across two or more random variables.
This is complicated, as there are many ways that random variables can interact, which, in turn, impacts their probabilities. The discussion can be simplified by reducing it to just two random variables (X, Y), although the principles generalize to multiple variables.
Therefore, we will introduce the probability of multiple random variables as the probability of event A and event B, which in shorthand is X = A and Y = B. We assume that the two variables are related or dependent in some way. As such, there are three main types of probability we might want to consider:
- Joint Probability: Probability of events A and B occurring together.
- Marginal Probability: Probability of event A irrespective of the outcome of variable Y.
- Conditional Probability: Probability of event A given that event B has occurred.
These types of probability form the basis of much of predictive modeling, with problems such as classification and regression. For example:
- The probability of a row of data is the joint probability across each input variable.
- The probability of a specific value of one input variable is the marginal probability across the values of the other input variables.
- The predictive model itself is an estimate of the conditional probability of an output given an input example.
Joint Probability for Two Variables
We may be interested in the probability of two simultaneous events, e.g. the outcomes of two different random variables. The probability of two (or more) events is called the joint probability. The joint probability of two or more random variables is referred to as the joint probability distribution. For example, the joint probability of event A and event B is written formally as:
P(A and B) = P(A ∩ B) = P(A, B)
The joint probability for events A and B is calculated as the probability of event A given event B multiplied by the probability of event B. This can be stated formally as follows:
P(A ∩ B) = P(A given B) × P(B)
The calculation of the joint probability is sometimes called the fundamental rule of probability or the product rule of probability. Here, P(A given B) is the probability of event A given that event B has occurred, called the conditional probability, described below. The joint probability is symmetrical, meaning that P(A ∩ B) is the same as P(B ∩ A).
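As a quick sanity check of the product rule, here is a minimal Python sketch; the values P(B) = 0.5 and P(A given B) = 0.3 are hypothetical, chosen only for illustration.

```python
# A minimal sketch of the product rule with hypothetical values:
# assume P(B) = 0.5 and P(A given B) = 0.3.
p_b = 0.5          # P(B), assumed for illustration
p_a_given_b = 0.3  # P(A | B), assumed for illustration

# Product rule: P(A ∩ B) = P(A | B) * P(B)
p_a_and_b = p_a_given_b * p_b
print(p_a_and_b)  # 0.15
```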
Marginal Probability
We may be interested in the probability of an event for one random variable, irrespective of the outcome of another random variable. For example, the probability of X = A for all outcomes of Y . The probability of one event in the presence of all (or a subset of) outcomes of the other random variable is called the marginal probability or the marginal distribution. The marginal probability of one random variable in the presence of additional random variables is referred to as the marginal probability distribution.
For a discrete random variable, the marginal probability of X = A is calculated as the sum of the joint probabilities over all outcomes of Y:
P(X = A) = Σ y∈Y P(X = A, Y = y)
This is another important foundational rule in probability, referred to as the sum rule. The marginal probability is different from the conditional probability (described next) because it sums over all outcomes of the second variable rather than conditioning on a single event.
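To make the sum rule concrete, the sketch below marginalizes a small, hypothetical joint distribution of two discrete variables; the table values are made up for illustration and sum to 1.0.

```python
# A minimal sketch of the sum rule over a small, hypothetical joint
# distribution of discrete variables X (values A, B) and Y (values y1, y2).
joint = {
    ('A', 'y1'): 0.10,
    ('A', 'y2'): 0.25,
    ('B', 'y1'): 0.40,
    ('B', 'y2'): 0.25,
}

# Marginal probability of X = A: sum the joint over all outcomes of Y.
p_x_a = sum(p for (x, y), p in joint.items() if x == 'A')
print(p_x_a)  # 0.35
```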
Conditional Probability
We may be interested in the probability of an event given the occurrence of another event. The probability of one event given the occurrence of another event is called the conditional probability. The conditional probability of one random variable given one or more other random variables is referred to as the conditional probability distribution. For example, the conditional probability of event A given event B is written formally as:
P(A given B) = P(A|B)
The conditional probability for event A given event B is calculated as follows:
P(A|B) = P(A ∩ B) / P(B)
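As an illustration, the following sketch estimates a conditional probability by simulation, using a fair six-sided die with assumed events A = "roll is even" and B = "roll is greater than 3"; this example is not from the text.

```python
import random

random.seed(1)

# One fair six-sided die: A = "roll is even", B = "roll is greater than 3".
# Exactly, P(B) = 3/6 and P(A ∩ B) = 2/6, so P(A | B) = 2/3.
n = 100_000
rolls = [random.randint(1, 6) for _ in range(n)]

count_b = sum(1 for r in rolls if r > 3)
count_a_and_b = sum(1 for r in rolls if r > 3 and r % 2 == 0)

# P(A | B) = P(A ∩ B) / P(B); dividing the counts estimates this ratio.
print(count_a_and_b / count_b)  # close to 0.667
```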
Probability for Independence and Exclusivity
When considering multiple random variables, it is possible that they do not interact. We may know or assume that two variables are not dependent upon each other, but instead are independent. Alternatively, the variables may interact, but their events may not occur simultaneously, referred to as exclusivity.
Independence
If one variable is not dependent on a second variable, this is called independence or statistical independence. This has an impact on calculating the probabilities of the two variables. For example, we may be interested in the joint probability of independent events A and B. Probabilities are combined using multiplication; therefore, the joint probability of independent events is calculated as the probability of event A multiplied by the probability of event B. This can be stated formally as follows:
Joint Probability: P(A ∩ B) = P(A) × P(B)
As we might intuit, the marginal probability for an event of an independent random variable is simply the probability of the event. It is the idea of the probability of a single random variable that we are familiar with:
Marginal Probability: P(A)
We refer to the marginal probability of an independent random variable as simply the probability.
Similarly, the conditional probability of A given B when the variables are independent is simply the probability of A, as the probability of B has no effect. For example:
Conditional Probability: P(A|B) = P(A)
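The sketch below checks the independence rule by simulation with two hypothetical fair dice: for independent events, the estimated joint probability should match the product of the marginals.

```python
import random

random.seed(1)

# Two independent dice: A = "first die shows 6", B = "second die shows 6".
# For independent events, P(A ∩ B) = P(A) * P(B) = 1/36 ≈ 0.0278.
n = 200_000
pairs = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(n)]

p_a = sum(1 for a, b in pairs if a == 6) / n
p_b = sum(1 for a, b in pairs if b == 6) / n
p_a_and_b = sum(1 for a, b in pairs if a == 6 and b == 6) / n

print(p_a * p_b)  # product of marginals, ≈ 0.0278
print(p_a_and_b)  # estimated joint, close to the product
```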
We may be familiar with the notion of statistical independence from sampling. This assumes that one sample is unaffected by prior samples and does not affect future samples. Many machine learning algorithms assume that samples from a domain are independent of each other and come from the same probability distribution, referred to as independent and identically distributed, or i.i.d. for short.
Exclusivity
If the occurrence of one event excludes the occurrence of other events, then the events are said to be mutually exclusive. The events are said to be disjoint, meaning that they cannot occur together; note that mutually exclusive events are therefore dependent, not independent, as the occurrence of one tells us the other did not occur. If event A is mutually exclusive with event B, then the joint probability of event A and event B is zero:
P(A ∩ B) = 0.0
Instead, the probability of an outcome can be described as event A or event B, stated formally as follows:
P(A or B) = P(A ∪ B) = P(A) + P(B)
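For example, a short worked calculation with a standard 52-card deck (an illustrative example, not from the text): drawing a heart and drawing a spade are mutually exclusive on a single draw.

```python
# Mutually exclusive events on one draw from a 52-card deck:
# A = "card is a heart", B = "card is a spade", so P(A ∩ B) = 0.
p_a = 13 / 52
p_b = 13 / 52

# P(A ∪ B) = P(A) + P(B) for disjoint events
p_a_or_b = p_a + p_b
print(p_a_or_b)  # 0.5
```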
If the events are not mutually exclusive, we may be interested in the outcome of either event. The probability of non-mutually exclusive events is calculated as the probability of event A plus the probability of event B minus the probability of both events occurring simultaneously. This can be stated formally as follows:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
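Continuing the hypothetical deck example, hearts and face cards are not mutually exclusive, so the joint term must be subtracted.

```python
# General addition rule on a 52-card deck: A = "card is a heart" (13/52),
# B = "card is a face card" (12/52), A ∩ B = "heart face card" (3/52).
p_a = 13 / 52
p_b = 12 / 52
p_a_and_b = 3 / 52

# P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
p_a_or_b = p_a + p_b - p_a_and_b
print(p_a_or_b)  # 22/52 ≈ 0.423
```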