# Probability cheatsheet

*By Afshine Amidi and Shervine Amidi*

## Introduction to Probability and Combinatorics

**Sample space** ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by $S$.

**Event** ― Any subset $E$ of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in $E$, then we say that $E$ has occurred.

**Axioms of probability** ― For each event $E$, we denote $P(E)$ as the probability of event $E$ occurring.

*Axiom 1* ― Every probability is between 0 and 1 included, i.e:

$$0\leqslant P(E)\leqslant 1$$

*Axiom 2* ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:

$$P(S)=1$$

*Axiom 3* ― For any sequence of mutually exclusive events $E_1, ..., E_n$, we have:

$$P\left(\bigcup_{i=1}^nE_i\right)=\sum_{i=1}^nP(E_i)$$

**Permutation** ― A permutation is an arrangement of $r$ objects from a pool of $n$ objects, in a given order. The number of such arrangements is given by $P(n, r)$, defined as:

$$P(n, r)=\frac{n!}{(n-r)!}$$

**Combination** ― A combination is an arrangement of $r$ objects from a pool of $n$ objects, where the order does not matter. The number of such arrangements is given by $C(n, r)$, defined as:

$$C(n, r)=\frac{P(n, r)}{r!}=\frac{n!}{r!(n-r)!}$$

Remark: we note that for $0\leqslant r\leqslant n$, we have $P(n,r)\geqslant C(n,r)$.
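As a quick numerical check, both counts are available directly in Python's standard library (a minimal sketch; `math.perm` and `math.comb` require Python 3.8+):

```python
import math

n, r = 5, 3

# Ordered arrangements: P(n, r) = n! / (n - r)!
p = math.perm(n, r)   # 5 * 4 * 3 = 60

# Unordered selections: C(n, r) = n! / (r! * (n - r)!)
c = math.comb(n, r)   # 10

# P(n, r) >= C(n, r), since each combination corresponds to r! orderings
assert p == c * math.factorial(r)
```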

## Conditional Probability

**Bayes' rule** ― For events $A$ and $B$ such that $P(B)>0$, we have:

$$P(A|B)=\frac{P(B|A)P(A)}{P(B)}$$

Remark: we have $P(A\cap B)=P(A)P(B|A)=P(A|B)P(B)$.
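As a worked example of Bayes' rule, consider a hypothetical diagnostic test; all of the numbers below are made up for illustration:

```python
# Hypothetical numbers: a test with 95% sensitivity and a 5% false
# positive rate, for a condition A with 1% prevalence.
p_A = 0.01           # P(A): prior
p_B_given_A = 0.95   # P(B|A): positive test given condition
p_B_given_notA = 0.05

# P(B) via P(A ∩ B) = P(B|A)P(A) on each branch
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes' rule: P(A|B) = P(B|A) P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
```

Despite the accurate test, the posterior $P(A|B)$ comes out around 16%, because the prior is so small.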

**Partition** ― Let $\{A_i, i\in[\![1,n]\!]\}$ be such that for all $i$, $A_i\neq\varnothing$. We say that $\{A_i\}$ is a partition if we have:

$$\bigcup_{i=1}^nA_i=S\quad\textrm{and}\quad\forall i\neq j,\ A_i\cap A_j=\varnothing$$

Remark: for any event $B$ in the sample space, we have $\displaystyle P(B)=\sum_{i=1}^nP(B|A_i)P(A_i)$.

**Extended form of Bayes' rule** ― Let $\{A_i, i\in[\![1,n]\!]\}$ be a partition of the sample space. We have:

$$P(A_k|B)=\frac{P(B|A_k)P(A_k)}{\displaystyle\sum_{i=1}^nP(B|A_i)P(A_i)}$$
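A small numerical sketch of the extended form, using a hypothetical three-machine partition (all probabilities below are illustrative):

```python
# Hypothetical factory: three machines A_1, A_2, A_3 partition production.
p_A = [0.5, 0.3, 0.2]             # P(A_i), sums to 1
p_B_given_A = [0.01, 0.02, 0.05]  # P(B|A_i): defect rate of each machine

# Law of total probability: P(B) = sum_i P(B|A_i) P(A_i)
p_B = sum(pb * pa for pb, pa in zip(p_B_given_A, p_A))

# Extended Bayes: P(A_k|B) for each machine k, given a defective item B
posterior = [p_B_given_A[k] * p_A[k] / p_B for k in range(3)]

# Posteriors over a partition sum to 1
assert abs(sum(posterior) - 1.0) < 1e-12
```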

**Independence** ― Two events $A$ and $B$ are independent if and only if we have:

$$P(A\cap B)=P(A)P(B)$$

## Random Variables

### Definitions

**Random variable** ― A random variable, often noted $X$, is a function that maps every element in the sample space to the real line:

$$X : S \rightarrow \mathbb{R}$$

**Cumulative distribution function (CDF)** ― The cumulative distribution function $F$, which is monotonically non-decreasing and is such that $\underset{x\rightarrow-\infty}{\textrm{lim}}F(x)=0$ and $\underset{x\rightarrow+\infty}{\textrm{lim}}F(x)=1$, is defined as:

$$F(x)=P(X\leqslant x)$$

Remark: we have $P(a < X\leqslant b)=F(b)-F(a)$.
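This relation can be checked numerically; the sketch below uses `statistics.NormalDist` from Python's standard library (3.8+) for the CDF of a standard normal distribution:

```python
from statistics import NormalDist

# Standard normal CDF F
F = NormalDist(mu=0, sigma=1).cdf

# P(a < X <= b) = F(b) - F(a): probability of falling within one
# standard deviation of the mean
p = F(1.0) - F(-1.0)

# Limiting behaviour of any CDF: F -> 0 at -inf, F -> 1 at +inf
assert F(-10) < 1e-6 and F(10) > 1 - 1e-6
```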

**Probability density function (PDF)** ― The probability density function $f$ describes the probability that $X$ takes on values in the immediate neighborhood of a given point; its exact relation to probabilities differs between the discrete and continuous cases, as detailed below.

### Relationships involving the PDF and CDF

**Discrete case** ― Here, $X$ takes discrete values, such as outcomes of coin flips. By noting $f$ and $F$ the PDF and CDF respectively, we have the following relations:

$$F(x)=\sum_{x_i\leqslant x}P(X=x_i)\quad\textrm{and}\quad f(x_j)=P(X=x_j)$$

On top of that, the PDF is such that:

$$0\leqslant f(x_j)\leqslant1\quad\textrm{and}\quad\sum_{j}f(x_j)=1$$

**Continuous case** ― Here, $X$ takes continuous values, such as the temperature in the room. By noting $f$ and $F$ the PDF and CDF respectively, we have the following relations:

$$F(x)=\int_{-\infty}^{x}f(y)dy\quad\textrm{and}\quad f(x)=\frac{dF}{dx}$$

On top of that, the PDF is such that:

$$f(x)\geqslant0\quad\textrm{and}\quad\int_{-\infty}^{+\infty}f(x)dx=1$$
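The continuous-case relations can be verified numerically; the sketch below integrates the PDF of an exponential distribution with a simple midpoint rule and compares against its closed-form CDF:

```python
import math

lam = 2.0
f = lambda x: lam * math.exp(-lam * x)  # Exp(lambda) PDF, x >= 0
F = lambda x: 1 - math.exp(-lam * x)    # its CDF

# Midpoint-rule approximation of integral_a^b g(x) dx
def integrate(g, a, b, n=100_000):
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

t = 1.5
assert abs(integrate(f, 0, t) - F(t)) < 1e-6  # F(t) = ∫_0^t f(x) dx
assert integrate(f, 0, 50) > 1 - 1e-4         # total mass ≈ 1
```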

## Expectation and Moments of the Distribution

In the following sections, we are going to keep the same notations as before and the formulas will be explicitly detailed for the discrete **(D)** and continuous **(C)** cases.

**Expected value** ― The expected value of a random variable, also known as the mean value or the first moment, is often noted $E[X]$ or $\mu$ and is the value that we would obtain by averaging the results of the experiment infinitely many times. It is computed as follows:

$$\textrm{(D)}\quad E[X]=\sum_{i=1}^nx_if(x_i)\quad\textrm{and}\quad\textrm{(C)}\quad E[X]=\int_{-\infty}^{+\infty}xf(x)dx$$

**Generalization of the expected value** ― The expected value of a function of a random variable $g(X)$ is computed as follows:

$$\textrm{(D)}\quad E[g(X)]=\sum_{i=1}^ng(x_i)f(x_i)\quad\textrm{and}\quad\textrm{(C)}\quad E[g(X)]=\int_{-\infty}^{+\infty}g(x)f(x)dx$$

**$k^{th}$ moment** ― The $k^{th}$ moment, noted $E[X^k]$, is the value of $X^k$ that we expect to observe on average over infinitely many trials. It is computed as follows:

$$\textrm{(D)}\quad E[X^k]=\sum_{i=1}^nx_i^kf(x_i)\quad\textrm{and}\quad\textrm{(C)}\quad E[X^k]=\int_{-\infty}^{+\infty}x^kf(x)dx$$

Remark: the $k^{th}$ moment is a particular case of the previous definition with $g:X\mapsto X^k$.

**Variance** ― The variance of a random variable, often noted Var$(X)$ or $\sigma^2$, is a measure of the spread of its distribution function. It is determined as follows:

$$\textrm{Var}(X)=E[(X-E[X])^2]=E[X^2]-E[X]^2$$

**Standard deviation** ― The standard deviation of a random variable, often noted $\sigma$, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:

$$\sigma=\sqrt{\textrm{Var}(X)}$$
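The definitions above can be checked on a small discrete example; the sketch below computes the mean, second moment, variance, and standard deviation of a fair six-sided die:

```python
import math

# Fair six-sided die: f(x) = 1/6 for x in {1, ..., 6}
xs = [1, 2, 3, 4, 5, 6]
f = 1 / 6

mean = sum(x * f for x in xs)      # E[X], first moment
m2 = sum(x**2 * f for x in xs)     # E[X^2], second moment (g: x -> x^2)
var = m2 - mean**2                 # Var(X) = E[X^2] - E[X]^2
sigma = math.sqrt(var)             # standard deviation, in the die's units

assert abs(mean - 3.5) < 1e-12
assert abs(var - 35 / 12) < 1e-12
```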

**Characteristic function** ― A characteristic function $\psi(\omega)$ is derived from a probability density function $f(x)$ and is defined as:

$$\psi(\omega)=\int_{-\infty}^{+\infty}f(x)e^{i\omega x}dx$$

**Euler's formula** ― For $\theta \in \mathbb{R}$, the Euler formula is the name given to the identity:

$$e^{i\theta}=\cos(\theta)+i\sin(\theta)$$

**Revisiting the $k^{th}$ moment** ― The $k^{th}$ moment can also be computed with the characteristic function as follows:

$$E[X^k]=\frac{1}{i^k}\left[\frac{\partial^k\psi}{\partial\omega^k}\right]_{\omega=0}$$
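This relation can be illustrated numerically: for $X\sim\textrm{Exp}(\lambda)$, the characteristic function is $\psi(\omega)=1/(1-i\omega/\lambda)$, and the derivative at $\omega=0$ can be approximated by a finite difference (a sketch, not a general-purpose method):

```python
lam = 2.0

# Characteristic function of Exp(lambda)
psi = lambda w: 1 / (1 - 1j * w / lam)

# First moment: E[X] = (1/i) * dpsi/dw at w = 0,
# approximated here with a central finite difference
h = 1e-6
dpsi = (psi(h) - psi(-h)) / (2 * h)
m1 = (dpsi / 1j).real

assert abs(m1 - 1 / lam) < 1e-6  # E[X] = 1/lambda for the exponential
```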

**Transformation of random variables** ― Let the variables $X$ and $Y$ be linked by some function. By noting $f_X$ and $f_Y$ the distribution function of $X$ and $Y$ respectively, we have:

$$f_Y(y)=f_X(x)\left|\frac{dx}{dy}\right|$$

**Leibniz integral rule** ― Let $g$ be a function of $x$ and potentially $c$, and $a, b$ boundaries that may depend on $c$. We have:

$$\frac{\partial}{\partial c}\left(\int_{a}^{b}g(x)dx\right)=\frac{\partial b}{\partial c}\cdot g(b)-\frac{\partial a}{\partial c}\cdot g(a)+\int_{a}^{b}\frac{\partial g}{\partial c}(x)dx$$

## Probability Distributions

**Chebyshev's inequality** ― Let $X$ be a random variable with expected value $\mu$. For $k, \sigma>0$, we have the following inequality:

$$P(|X-\mu|\geqslant k\sigma)\leqslant\frac{1}{k^2}$$
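The bound can be checked empirically by simulation; the sketch below uses $\textrm{Exp}(1)$, for which $\mu=\sigma=1$:

```python
import random

random.seed(0)

# Empirical check of Chebyshev's inequality on Exp(1):
# mu = sigma = 1, so P(|X - 1| >= k) <= 1/k^2 for any k > 0.
n = 100_000
samples = [random.expovariate(1.0) for _ in range(n)]

k = 2.0
freq = sum(abs(x - 1.0) >= k for x in samples) / n

assert freq <= 1 / k**2  # empirical frequency respects the bound
```

The inequality is loose: for this distribution the true probability is $e^{-3}\approx0.05$, well under the bound of $0.25$.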

**Discrete distributions** ― Here are the main discrete distributions to have in mind:

| Distribution | $P(X=x)$ | $\psi(\omega)$ | $E[X]$ | $\textrm{Var}(X)$ |
|---|---|---|---|---|
| $X\sim\mathcal{B}(n, p)$ | $\displaystyle\binom{n}{x} p^xq^{n-x}$ | $(pe^{i\omega}+q)^n$ | $np$ | $npq$ |
| $X\sim\textrm{Po}(\mu)$ | $\displaystyle \frac{\mu^x}{x!}e^{-\mu}$ | $e^{\mu(e^{i\omega}-1)}$ | $\mu$ | $\mu$ |
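The binomial row can be verified by direct summation of the PMF (a minimal sketch with illustrative parameters):

```python
import math

n, p = 10, 0.3
q = 1 - p

# Binomial PMF: P(X = x) = C(n, x) p^x q^(n-x) for x in {0, ..., n}
pmf = [math.comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

mean = sum(x * px for x, px in enumerate(pmf))
var = sum(x**2 * px for x, px in enumerate(pmf)) - mean**2

assert abs(sum(pmf) - 1) < 1e-12    # PMF sums to 1
assert abs(mean - n * p) < 1e-9     # E[X] = np
assert abs(var - n * p * q) < 1e-9  # Var(X) = npq
```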

**Continuous distributions** ― Here are the main continuous distributions to have in mind:

| Distribution | $f(x)$ | $\psi(\omega)$ | $E[X]$ | $\textrm{Var}(X)$ |
|---|---|---|---|---|
| $X\sim\mathcal{U}(a, b)$ | $\displaystyle \frac{1}{b-a}$ | $\displaystyle\frac{e^{i\omega b}-e^{i\omega a}}{(b-a)i\omega}$ | $\displaystyle\frac{a+b}{2}$ | $\displaystyle\frac{(b-a)^2}{12}$ |
| $X\sim\mathcal{N}(\mu, \sigma)$ | $\displaystyle \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$ | $e^{i\omega\mu-\frac{1}{2}\omega^2\sigma^2}$ | $\mu$ | $\sigma^2$ |
| $X\sim\textrm{Exp}(\lambda)$ | $\displaystyle \lambda e^{-\lambda x}$ | $\displaystyle\frac{1}{1-\frac{i\omega}{\lambda}}$ | $\displaystyle\frac{1}{\lambda}$ | $\displaystyle\frac{1}{\lambda^2}$ |
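The exponential row can be checked by sampling (a Monte Carlo sketch; the tolerances are loose enough for the sample size used):

```python
import random
import statistics

random.seed(1)

lam = 2.0
n = 200_000
samples = [random.expovariate(lam) for _ in range(n)]

# Sample mean and variance should approach 1/lambda and 1/lambda^2
mean = statistics.fmean(samples)
var = statistics.pvariance(samples, mu=mean)

assert abs(mean - 1 / lam) < 0.01
assert abs(var - 1 / lam**2) < 0.01
```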

## Jointly Distributed Random Variables

**Joint probability density function** ― The joint probability density function of two random variables $X$ and $Y$, that we note $f_{XY}$, is defined as follows:

$$\textrm{(D)}\quad f_{XY}(x_i,y_j)=P(X=x_i\textrm{ and }Y=y_j)$$

$$\textrm{(C)}\quad f_{XY}(x,y)\Delta x\Delta y=P(x\leqslant X\leqslant x+\Delta x\textrm{ and }y\leqslant Y\leqslant y+\Delta y)$$

**Marginal density** ― We define the marginal density for the variable $X$ as follows:

$$\textrm{(D)}\quad f_X(x_i)=\sum_{j}f_{XY}(x_i,y_j)\quad\textrm{and}\quad\textrm{(C)}\quad f_X(x)=\int_{-\infty}^{+\infty}f_{XY}(x,y)dy$$

**Cumulative distribution** ― We define the cumulative distribution $F_{XY}$ as follows:

$$\textrm{(D)}\quad F_{XY}(x,y)=\sum_{x_i\leqslant x}\sum_{y_j\leqslant y}f_{XY}(x_i,y_j)\quad\textrm{and}\quad\textrm{(C)}\quad F_{XY}(x,y)=\int_{-\infty}^{x}\int_{-\infty}^{y}f_{XY}(x',y')dy'dx'$$

**Conditional density** ― The conditional density of $X$ with respect to $Y$, often noted $f_{X|Y}$, is defined as follows:

$$f_{X|Y}(x)=\frac{f_{XY}(x,y)}{f_Y(y)}$$

**Independence** ― Two random variables $X$ and $Y$ are said to be independent if we have:

$$f_{XY}(x,y)=f_X(x)f_Y(y)$$

**Moments of joint distributions** ― We define the moments of joint distributions of random variables $X$ and $Y$ as follows:

$$\textrm{(D)}\quad E[X^pY^q]=\sum_{i}\sum_{j}x_i^py_j^qf_{XY}(x_i,y_j)\quad\textrm{and}\quad\textrm{(C)}\quad E[X^pY^q]=\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}x^py^qf_{XY}(x,y)dydx$$

**Distribution of a sum of independent random variables** ― Let $Y=X_1+...+X_n$ with $X_1, ..., X_n$ independent. We have:

$$\psi_Y(\omega)=\prod_{k=1}^n\psi_{X_k}(\omega)$$
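In the discrete case, multiplying characteristic functions corresponds to convolving PMFs; the sketch below builds the distribution of the sum of two independent fair dice by direct convolution:

```python
# PMF of one fair die over values 1..6
die = {v: 1 / 6 for v in range(1, 7)}

# PMF of the sum of two independent dice: discrete convolution
total = {}
for a, pa in die.items():
    for b, pb in die.items():
        total[a + b] = total.get(a + b, 0) + pa * pb

assert abs(sum(total.values()) - 1) < 1e-12  # still a valid PMF
assert abs(total[7] - 6 / 36) < 1e-12        # 7 is the most likely sum
```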

**Covariance** ― We define the covariance of two random variables $X$ and $Y$, that we note $\sigma_{XY}^2$ or more commonly $\textrm{Cov}(X,Y)$, as follows:

$$\sigma_{XY}^2\triangleq\textrm{Cov}(X,Y)=E[(X-\mu_X)(Y-\mu_Y)]=E[XY]-\mu_X\mu_Y$$

**Correlation** ― By noting $\sigma_X, \sigma_Y$ the standard deviations of $X$ and $Y$, we define the correlation between the random variables $X$ and $Y$, noted $\rho_{XY}$, as follows:

$$\rho_{XY}=\frac{\sigma_{XY}^2}{\sigma_X\sigma_Y}$$

Remark 1: we note that for any random variables $X, Y$, we have $\rho_{XY}\in[-1,1]$.

Remark 2: if $X$ and $Y$ are independent, then $\rho_{XY} = 0$. The converse is not necessarily true.
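Covariance and correlation can be computed directly from their definitions; the paired samples below are hypothetical:

```python
import math

# Paired samples (hypothetical data, close to linear)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
mu_x = sum(xs) / n
mu_y = sum(ys) / n

# Cov(X, Y) = E[(X - mu_X)(Y - mu_Y)]
cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n

# rho = Cov(X, Y) / (sigma_X * sigma_Y), always in [-1, 1]
sigma_x = math.sqrt(sum((x - mu_x) ** 2 for x in xs) / n)
sigma_y = math.sqrt(sum((y - mu_y) ** 2 for y in ys) / n)
rho = cov / (sigma_x * sigma_y)

assert -1 <= rho <= 1
```

Since `ys` is nearly a linear function of `xs`, the correlation comes out close to 1.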