The Central Limit Theorem
March 10, 2026
Around a week ago, my professor in Math 151 brought up the central limit theorem. I remember feeling like the proof was extremely scuffed (and it may be so!), so I'm excited to share that pain here. Before embarking on this proof, it would be best to give a quick statement of the CLT.
(Central Limit Theorem) Let \(X_1,X_2,\dots\) be i.i.d. random variables, each with \(\mathbb E[X_i]=\mu\) and \(\mathrm{Var}(X_i)=\sigma^2\), where \(0<\sigma^2<\infty\). Define the "running sum" to be \(S_n=\sum_{i=1}^n X_i.\) The Central Limit Theorem states that $$ \frac{S_n-n\mu}{\sigma\sqrt n} \overset{d}{\to} \mathcal N(0,1)$$ as \(n\to\infty\).
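Before proving anything, it can help to watch the statement happen numerically. The following is a minimal sketch, not part of the proof: it assumes the \(X_i\) are \(\mathrm{Exp}(1)\) (so \(\mu=\sigma=1\)), and the names `standardized_sum`, `n`, and `trials` are made up for illustration. It draws many copies of the standardized sum and compares a few summary statistics with those of \(\mathcal N(0,1)\).

```python
import math
import random
from statistics import NormalDist, fmean, pvariance

def standardized_sum(n, rng):
    """Draw one sample of (S_n - n*mu) / (sigma*sqrt(n)) for Exp(1) summands."""
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exp(1) (assumed example)
    s_n = sum(rng.expovariate(1.0) for _ in range(n))
    return (s_n - n * mu) / (sigma * math.sqrt(n))

rng = random.Random(0)  # fixed seed so the run is reproducible
n, trials = 100, 10_000
samples = [standardized_sum(n, rng) for _ in range(trials)]

mean_hat = fmean(samples)
var_hat = pvariance(samples)
# Empirical P(standardized sum <= 1) versus the standard normal CDF at 1.
cdf_hat = sum(t <= 1.0 for t in samples) / trials
cdf_normal = NormalDist().cdf(1.0)

print(f"mean ~ {mean_hat:.3f}, variance ~ {var_hat:.3f}")
print(f"P(T <= 1) ~ {cdf_hat:.3f}, Phi(1) = {cdf_normal:.3f}")
```

The skewed exponential distribution was chosen on purpose: the summands look nothing like a bell curve, yet the standardized sum should already be close to one at \(n=100\).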
Today we will prove the CLT under an extra sufficient condition, using the replacement argument usually credited to Lindeberg. To do this, we will assume a lemma, which is left to the reader as an exercise. The proof of this lemma is also extremely scuffed and it may be the subject of a future blog post.
(Lemma 1) Let \(X_1,X_2,\dots\) be a sequence of random variables. Let \(X\) also be a random variable. Then \(X_n\overset d\to X\) if and only if \(\mathbb E[h(X_n)]\overset{n\to\infty}\to\mathbb E[h(X)]\) for all bounded functions \(h\in C^\infty\) with bounded derivatives.
Let \(T_n = \frac{S_n-n\mu}{\sigma\sqrt n}\), \(Z\sim\mathcal N(0,1)\), and \(\varepsilon>0\). A sufficient condition for the CLT (by Lindeberg) is to assume that the third absolute moment \(\mathbb E[|X_1-\mu|^3]\) is finite. Then, for every function \(h\in C^3\) where \(|h'''(x)|\leq c<\infty\) holds for all \(x\in\mathbb R\), we can claim that $$ |\mathbb E[h(T_n)]-\mathbb E[h(Z)]|\leq \varepsilon$$ whenever $$ \varepsilon \geq \frac{c}{6\sqrt n}\left(\frac{\mathbb E[|X_1-\mu|^3]}{\sigma^3}+\mathbb E[|Z|^3]\right).$$ As a consequence, we can conclude that \(T_n\overset d\to Z\).
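To make the bound concrete, here is a small sanity check in one case where both expectations have closed forms. Everything specific here is an assumption chosen for illustration: the \(X_i\) are fair \(\pm1\) coin flips (so \(\mu=0\), \(\sigma=1\), \(\mathbb E[|X_1-\mu|^3]=1\)) and \(h=\cos\), which has \(|h'''|=|\sin|\leq 1\), so \(c=1\). In that case \(\mathbb E[\cos(T_n)]=\cos(1/\sqrt n)^n\) and \(\mathbb E[\cos(Z)]=e^{-1/2}\), both read off from characteristic functions, and \(\mathbb E[|Z|^3]=2\sqrt{2/\pi}\).

```python
import math

# Assumed example: X_i = +/-1 with probability 1/2 each, and h = cos (so c = 1).
C = 1.0
THIRD_MOMENT_X = 1.0                              # E|X_1 - mu|^3 / sigma^3
THIRD_MOMENT_Z = 2.0 * math.sqrt(2.0 / math.pi)   # E|Z|^3 for Z ~ N(0, 1)

def expected_h_tn(n):
    """E[cos(T_n)] for fair +/-1 summands, via the characteristic function at 1."""
    return math.cos(1.0 / math.sqrt(n)) ** n

def expected_h_z():
    """E[cos(Z)] for Z ~ N(0, 1)."""
    return math.exp(-0.5)

def bound(n):
    """(c / (6 sqrt(n))) * (E|X_1 - mu|^3 / sigma^3 + E|Z|^3)."""
    return C / (6.0 * math.sqrt(n)) * (THIRD_MOMENT_X + THIRD_MOMENT_Z)

for n in (1, 10, 100, 1000):
    gap = abs(expected_h_tn(n) - expected_h_z())
    print(f"n={n:5d}  gap={gap:.6f}  bound={bound(n):.6f}")
```

In this example the true gap should shrink noticeably faster than the bound, which is fine: the theorem only promises an upper bound, not a sharp rate.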
To see why this is true, we have to define a bunch of auxiliary random variables. First, let \(Y_i=\frac{X_i-\mu}{\sigma\sqrt n}\) so that \(T_n=\sum_{i=1}^n Y_i\). In particular, notice that \(Y_1,Y_2,\dots,Y_n\) are i.i.d. with mean \(0\) and variance \(1/n\). Next, take random variables \(Z_1,Z_2,\dots, Z_n\) that are i.i.d. \(\mathcal N(0,1/n)\) and independent of all the \(X_i\). Their sum is \(\mathcal N(0,1)\), so we may take \(Z=\sum_{i=1}^n Z_i\). At this point, we are ready to define an "interpolation" of our random variables \(Y_i\) and \(Z_i\). Let $$ A_i=Y_1+Y_2+\cdots+Y_{i}+Z_{i+1}+Z_{i+2}+\cdots+Z_n $$ and $$ B_i=Y_1+Y_2+\cdots+Y_{i-1}+Z_{i+1}+Z_{i+2}+\cdots +Z_n. $$ The neat property about these random variables is that \(A_0=Z\) and \(A_n=T_n\). Note also that \(A_i=B_i+Y_i\) and \(A_{i-1}=B_i+Z_i\), which will be immensely useful later. You may be thinking that this is completely unmotivated, but trust me, it works.
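If the indices in \(A_i\) and \(B_i\) feel slippery, it may help to build them from actual numbers and check the identities above. This is only a bookkeeping sketch: the helper `interpolants` and the choice of uniform \(X_i\) are hypothetical, and the \(Z_i\) are drawn independently of the \(Y_i\).

```python
import math
import random

def interpolants(y, z):
    """Return lists a[0..n] and b[0..n] (b[0] is unused padding).

    y and z are 0-indexed lists holding Y_1..Y_n and Z_1..Z_n, so Y_i is y[i-1].
    """
    n = len(y)
    a = [sum(y[:i]) + sum(z[i:]) for i in range(n + 1)]                   # A_i
    b = [None] + [sum(y[:i - 1]) + sum(z[i:]) for i in range(1, n + 1)]   # B_i
    return a, b

rng = random.Random(1)
n = 8
# Assumed example: X_i uniform on [0, 1], so mu = 1/2 and sigma^2 = 1/12.
mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
y = [(rng.random() - mu) / (sigma * math.sqrt(n)) for _ in range(n)]
z = [rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]  # N(0, 1/n) has sd 1/sqrt(n)

a, b = interpolants(y, z)
t_n, z_total = sum(y), sum(z)

# Endpoints of the interpolation: A_0 = Z and A_n = T_n.
assert math.isclose(a[0], z_total, abs_tol=1e-12)
assert math.isclose(a[n], t_n, abs_tol=1e-12)
for i in range(1, n + 1):
    assert math.isclose(a[i], b[i] + y[i - 1], abs_tol=1e-12)      # A_i = B_i + Y_i
    assert math.isclose(a[i - 1], b[i] + z[i - 1], abs_tol=1e-12)  # A_{i-1} = B_i + Z_i
```

The picture to keep in mind is that moving from \(A_{i-1}\) to \(A_i\) swaps exactly one Gaussian piece \(Z_i\) for one "real" piece \(Y_i\), while \(B_i\) is everything that stays put during that swap.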
Let \(h\in C^3\) as in the theorem statement. The key strategy is to look at the difference \(h(T_n)-h(Z)\) first, relating it back to the random variables \(A_i\) and \(A_{i-1}\). Then, we can take the expectation. To this end, notice that through a telescoping series, we have $$ h(T_n)-h(Z) = \sum_{i=1}^n(h(A_i)-h(A_{i-1})).$$ Let's perform some estimates on \(h(A_i)\) first. By a second-degree Taylor expansion with the Lagrange error bound, we have $$ h(x+\Delta x)= h(x)+h'(x)\Delta x+\frac 1 2h''(x)(\Delta x)^2+R_2(\Delta x),$$ where \(|R_2(\Delta x)|\leq\frac{c|\Delta x|^3}{3!}\) for the constant \(c\) as given in the theorem statement. In pursuit of \(h(A_i)\), let \(x=B_i\) and \(\Delta x=Y_i\) so that $$\begin{align*} h(x+\Delta x)&=h(B_i+Y_i)\\ &=h(B_i+(A_i-B_i))\\ &=h(A_i). \end{align*}$$ Substituting this into our Taylor approximation gives $$ h(A_i)=h(B_i)+h'(B_i)Y_i+\frac 1 2h''(B_i)Y_i^2+R_2(Y_i).$$ Upon applying the bound on the remainder after rearrangement, we get $$ |h(A_i)-h(B_i)-h'(B_i)Y_i-\frac 12h''(B_i)Y_i^2|\leq \frac{c|Y_i|^3}{6}. $$ Repeating this analogously for \(x=B_i\) and \(\Delta x=Z_i\) in pursuit of \(h(A_{i-1})\) yields $$ |h(A_{i-1})-h(B_i)-h'(B_i) Z_i-\frac 1 2 h''(B_i) Z_i^2|\leq\frac{c|Z_i|^3}{6}.$$ We will now compute the expectations of these. Let $$ \begin{cases}P_Y = h(B_i)+h'(B_i)Y_i+\frac 12h''(B_i)Y_i^2 \\ P_Z = h(B_i)+h'(B_i) Z_i+\frac 1 2 h''(B_i) Z_i^2.\end{cases}$$ Since \(|\mathbb E[X]|\leq \mathbb E[|X|]\), conclude that $$\begin{cases}|\mathbb E[h(A_i)-P_Y]|\leq\mathbb E[|h(A_i)-P_Y|]\leq\frac{c\mathbb E[|Y_i|^3]}{6} \\ |\mathbb E[h(A_{i-1})-P_Z]|\leq\mathbb E[|h(A_{i-1})-P_Z|]\leq\frac{c\mathbb E[|Z_i|^3]}{6}.\end{cases}$$ Crucially, notice by definition that \(Y_i\) and \(Z_i\) are independent of the values making up \(B_i\). This means we can split \(\mathbb E[h'(B_i)Y_i]=\mathbb E[h'(B_i)]\mathbb E[Y_i]\) and \(\mathbb E[h'(B_i)Z_i]=\mathbb E[h'(B_i)]\mathbb E[Z_i]\). 
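The Taylor remainder bound \(|R_2(\Delta x)|\leq\frac{c|\Delta x|^3}{6}\) is the engine of the whole argument, so here is a quick numerical check of it. As an assumed example, take \(h=\cos\), for which \(h'=-\sin\), \(h''=-\cos\), and \(|h'''|=|\sin|\leq 1\), so \(c=1\). The grid of \(x\) and \(\Delta x\) values is arbitrary.

```python
import math

def taylor_remainder(x, dx):
    """R_2 for h = cos: h(x + dx) minus its second-degree Taylor polynomial at x."""
    h, h1, h2 = math.cos(x), -math.sin(x), -math.cos(x)
    return math.cos(x + dx) - (h + h1 * dx + 0.5 * h2 * dx * dx)

C = 1.0  # |h'''| = |sin| <= 1
worst_ratio = 0.0
for i in range(-50, 51):
    for j in range(-50, 51):
        x, dx = 0.1 * i, 0.05 * j
        if j == 0:
            continue  # dx = 0 gives a zero remainder and a zero bound
        ratio = abs(taylor_remainder(x, dx)) / (C * abs(dx) ** 3 / 6.0)
        worst_ratio = max(worst_ratio, ratio)

print(f"largest |R_2| / (c |dx|^3 / 6) on the grid: {worst_ratio:.4f}")
```

The ratio should never exceed \(1\), and it gets close to \(1\) near points where \(|h'''|\) is close to its maximum and \(\Delta x\) is small.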
Further, since \(\mathbb E[Y_i]=\mathbb E[Z_i]=0\) and \(\mathbb E[Y_i^2]=\mathbb E[Z_i^2]=\frac 1n\) for any \(i\), and since the same independence gives \(\mathbb E[h''(B_i)Y_i^2]=\mathbb E[h''(B_i)]\,\mathbb E[Y_i^2]\) (and likewise for \(Z_i\)), it is not so hard to put it all together and conclude $$\begin{cases} \left|\mathbb E[h(A_i)] - \mathbb E[h(B_i)] - \frac 1 2 \mathbb E[h''(B_i)] \cdot \frac 1 n\right|\leq\frac{c\mathbb E[|Y_i|^3]}{6}\\ \left|\mathbb E[h(A_{i-1})] - \mathbb E[h(B_i)] - \frac 1 2 \mathbb E[h''(B_i)] \cdot \frac 1 n\right|\leq\frac{c\mathbb E[|Z_i|^3]}{6}. \end{cases}$$ For the sake of convenience, let $$\begin{cases} \alpha:=\mathbb E[h(A_i)] - \mathbb E[h(B_i)] - \frac 1 2 \mathbb E[h''(B_i)] \cdot \frac 1 n\\ \beta:=\mathbb E[h(A_{i-1})] - \mathbb E[h(B_i)] - \frac 1 2 \mathbb E[h''(B_i)] \cdot \frac 1 n. \end{cases} $$ Now, we can use the triangle inequality to combine these: $$\begin{align*} |\mathbb E[h(A_i)]-\mathbb E[h(A_{i-1})]|&= |\alpha-\beta|\\ &\leq |\alpha|+|\beta|\\ &\leq \frac c6(\mathbb E[|Y_i|^3]+\mathbb E[|Z_i|^3]). \end{align*}$$ Finally, we can relate this back to \(|\mathbb E[h(T_n)]-\mathbb E[h(Z)]|\). By our telescoping sum and the triangle inequality, we can write $$\begin{align*} |\mathbb E[h(T_n)]-\mathbb E[h(Z)]|&\leq \sum_{i=1}^n |\mathbb E[h(A_i)]-\mathbb E[h(A_{i-1})]| \tag{$\Delta$-inequality}\\ &\leq \frac c 6\sum_{i=1}^n(\mathbb E[|Y_i|^3]+\mathbb E[|Z_i|^3]). \end{align*}$$ Substituting our definitions for \(Y_i\) and \(Z_i\) will get us to our desired bound: $$\begin{align*} \sum_{i=1}^n(\mathbb E[|Y_i|^3]+\mathbb E[|Z_i|^3])&=\sum_{i=1}^n \left(\frac{\mathbb E[|X_1-\mu|^3]}{\sigma^3n^{3/2}}+\frac{\mathbb E[|Z|^3]}{n^{3/2}}\right)\\ &=n\cdot \left(\frac{\mathbb E[|X_1-\mu|^3]}{\sigma^3n^{3/2}}+\frac{\mathbb E[|Z|^3]}{n^{3/2}}\right)\\ &=\frac{1}{\sqrt n}\left(\frac{\mathbb E[|X_1-\mu|^3]}{\sigma^3}+\mathbb E[|Z|^3]\right), \end{align*}$$ where we have used the fact that the \(X_i\) are i.i.d., so \(\mathbb E[|X_i-\mu|^3]=\mathbb E[|X_1-\mu|^3]\) for every \(i\), and the fact that \(Z_i\) has the same distribution as \(Z/\sqrt n\), so \(\mathbb E[|Z_i|^3]=\mathbb E[|Z|^3]/n^{3/2}\).
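The single-swap estimate and the telescoping step can both be checked exactly in a toy case. As an assumed example (not from the lecture), let the \(X_i\) be fair \(\pm1\) coin flips and \(h=\cos\) with \(c=1\). Then \(A_i\) is a sum of \(i\) independent copies of \(\pm 1/\sqrt n\) and \(n-i\) independent \(\mathcal N(0,1/n)\) terms, so \(\mathbb E[\cos(A_i)]=\cos(1/\sqrt n)^i\,e^{-(n-i)/(2n)}\) by multiplying characteristic functions. The helper names below are hypothetical.

```python
import math

def expected_h_a(i, n):
    """E[cos(A_i)] when Y_j = +/-1/sqrt(n) (fair signs) and Z_j ~ N(0, 1/n)."""
    return math.cos(1.0 / math.sqrt(n)) ** i * math.exp(-(n - i) / (2.0 * n))

def step_bound(n, c=1.0):
    """(c/6) * (E|Y_i|^3 + E|Z_i|^3) for the same example."""
    third_y = 1.0 / n ** 1.5                                 # |Y_i| = 1/sqrt(n) always
    third_z = 2.0 * math.sqrt(2.0 / math.pi) / n ** 1.5      # E|Z|^3 / n^(3/2)
    return c / 6.0 * (third_y + third_z)

n = 50
# One term per swap: E[h(A_i)] - E[h(A_{i-1})] for i = 1..n.
steps = [expected_h_a(i, n) - expected_h_a(i - 1, n) for i in range(1, n + 1)]
total = expected_h_a(n, n) - expected_h_a(0, n)  # E[h(T_n)] - E[h(Z)]

print(f"largest single-swap gap: {max(abs(s) for s in steps):.3e}")
print(f"per-swap bound:          {step_bound(n):.3e}")
print(f"telescoped total gap:    {abs(total):.3e}")
print(f"n * per-swap bound:      {n * step_bound(n):.3e}")
```

Each swap costs on the order of \(n^{-3/2}\) at most, and there are \(n\) swaps, which is exactly where the \(1/\sqrt n\) in the final bound comes from.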
To put it together, we have now shown that $$|\mathbb E[h(T_n)]-\mathbb E[h(Z)]| \leq \frac{c}{6\sqrt n}\left(\frac{\mathbb E[|X_1-\mu|^3]}{\sigma^3}+\mathbb E[|Z|^3]\right).$$
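Concluding, notice that the quantity in parentheses does not depend on \(n\), so the right-hand side tends to \(0\) as \(n\to\infty\): $$ 0\leq|\mathbb E[h(T_n)]-\mathbb E[h(Z)]|\leq \frac{c}{6\sqrt n}\left(\frac{\mathbb E[|X_1-\mu|^3]}{\sigma^3}+\mathbb E[|Z|^3]\right)\overset{n\to\infty}\to 0.$$ It follows (by the squeeze theorem) that \(\mathbb E[h(T_n)]\overset{n\to\infty}\to\mathbb E[h(Z)].\) Every bounded \(h\in C^\infty\) with bounded derivatives is in particular a \(C^3\) function with bounded third derivative, so this covers every test function in Lemma 1, and we can invoke Lemma 1 to win: \(T_n\overset d\to Z.\)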