Sidon Sets

March 13, 2026

Background

Sidon sets are an interesting combinatorical object I came across when studying field theory. They often come up in Fourier analysis, but a pure combinatorical study is also possible. In this blog post, we will go over the definition of a Sidon set, an upper bound on its size as given by Erdős and Turán, and a finite field construction due to Singer and Bose.

(Definition) A Sidon Set \(\mathcal A=\{a_1,a_2,\dots,a_k\}\subset [1,N]\) is a subset of the integers such that every \(a_i,a_j\in\mathcal A\) with \(1\leq i\leq j\leq k\) admits a unique pairwise sum \(a_i+a_j\).

Immediately, we can find an (admittedly bad) upper bound for the size \(k=|\mathcal A|\). To do this, we will look at the size of all possible distinct sums \(a_i+a_j\) where we pick \(a_i,a_j\in\mathcal A\). Since the inequality \(i\leq j\) is not strict, we will choose two elements from the Sidon set with replacement. The correct combinatorical technique is then stars and bars: $$ \binom{k+2-1}{2}=\frac{k(k+1)}{2}.$$ We can get an upper bound by considering all the integers in the interval \([2,2N]\) as \(2\leq a_i+a_j\leq 2N.\) There are \(2N-1\) integers here, so $$ \frac{k(k+1)}{2}\leq 2N-1. $$ Some crude estimation then gives $$ k<\sqrt{2N}+\frac12.$$

Erdős-Turán Bound

We can do so much better! In fact, Erdős and Turán proved that \(|\mathcal A|\leq \sqrt N+\sqrt 2 N^\frac 14.\) This proof is especially cool since it uses concepts from probability theory to justify bounds for the Sidon set.

To recreate the proof due to Erdős and Turán, we will first consider a Sidon set \(\mathcal A\subset [1,N]\) and choose a positive integer \(x\in\mathbb Z^+\) to be determined later. Let \(M\sim\mathrm{Unif}((-x, N])\) be a random variable and define the transformation \(\mathcal A(M)=\mathcal A\cap (M,M+x]\). We can intepret \(\mathcal A(M)\) as the set of Sidon elements in the interval \([M,M+x)\) where \(-x\leq M\leq N-1\). By "sliding" the random variable \(M\) and looking at the expected value of \(|\mathcal A(M)|\), we can get some good bounds!

Since \(|\mathcal A(M)|\) is a transformation of the random variable \(M\), we can use LOTUS to compute the expected value: $$\begin{align*} \mathbb E[|\mathcal A(M)|] &= \sum_{m} |\mathcal A(m)| \mathbb P(M=m)\\ &= \sum_{m} |\mathcal A(m)| \left(\frac{1}{N+x-1}\right)\\ &= \frac{1}{N+x-1} \sum_{m} |\mathcal A(m)|. \end{align*}$$ Now, notice that as \(\sum_{m} |\mathcal A(m)|\) slides across \((-x, N]\) covering the entire Sidon set, there are \(x\) different starting values for \(m\) that capture a specific \(a\in\mathcal A\). Hence \(\sum_{m} |\mathcal A(m)| = kx\), where \(k=|\mathcal A|\), so $$ \mathbb E[|\mathcal A(M)|] = \frac{kx}{N+x-1}.$$ Great! Now we just need to relate this back to \( |\mathcal A(m)|\). In search of more information, let's compute the second moment by LOTUS once again: $$\begin{align*} \mathbb E[|\mathcal A(M)|^2] &= \sum_{m} |\mathcal A(m)|^2 \mathbb P(M=m)\\ &=\frac{1}{N+x-1} \sum_{m} |\mathcal A(m)|^2. \end{align*}$$ As \(\mathrm{Var}(X)\geq 0\) for any random variable \(X\) and \(\mathrm{Var}(X)=\mathbb E[X^2]-(\mathbb E X)^2\), by monotonicity of expectation and the moment calculations above, we have $$ \mathbb E[|\mathcal A(M)|^2] \geq (\mathbb E[|\mathcal A(M)|])^2 \implies \sum_{m} |\mathcal A(m)|^2 \geq \frac{(kx)^2}{N+x-1}.$$

The main problem we're facing is that we don't actually know how to find an exact value for \(\sum_{m} |\mathcal A(m)|^2\). However, we do know how to find an upper bound for it. Consider the quantity $$ \begin{align*} \sum_{m} \binom{|\mathcal A(m)|}{2}&=\frac 12 \sum_m |\mathcal A(m)|^2-\frac 12\sum_m |\mathcal A(m)|\\ &=\frac 12 \sum_m |\mathcal A(m)|^2-\frac 12kx. \end{align*}$$ Intuitively, we are counting the number of times any pair of elements \(a_i,a_j\in\mathcal A\) appear in the same sliding interval, as \(\mathcal A(m)\) slides through the entirety of the Sidon set.

In order to compute the above, let's reframe the problem slightly: now we look at a specific pair \(a,b\in\mathcal A\) and see how many times it appears in different sliding intervals. Let the distance be \(a-b=r\), where WLOG we assume \(a> b\). For both elements to fit within an interval of length \(x\), their difference must be bounded by \(1\leq r\leq x-1\). Now, to count all the sliding intervals containing \(\{a,b\}\), the valid starting points of \(m\) are bounded by \(a-x< m\leq b\). This is because the interval must start at (or below) the smaller element \(b\) and end strictly greater than the larger element. The number of integers within this interval is \(b-(a-x)=x-r\). Therefore any pair of elements with distance \(r\) will be contained in exactly \(x-r\) sliding intervals. By summing over all distances, we are counting the total number of times any pair of elements \(a_i,a_j\in\mathcal A\) appear in the same sliding interval.

Symbolically, $$\begin{align*} \sum_{m} \binom{|\mathcal A(m)|}{2}&=\sum_{\substack{a,b\in\mathcal A\\ a-b=r \\ 1\leq r\leq x-1}}(x-r)\\ &\leq \sum_{r=1}^{x-1}(x-r)\tag{Sidon property}\\ &=\frac{x(x-1)}{2}. \end{align*}$$ Slightly clarifying on the Sidon property, we know that there is at most one pair in \(\mathcal A\) for every distance \(r\). Assuming that every distance \(1\leq r\leq x-1\) is accounted for, we get our upper bound. Finally, we solve for \(\sum_m |\mathcal A(m)|^2\) to yield $$ \sum_m |\mathcal A(m)|^2 \leq x(x-1)+kx.$$ Combining lower and upper bounds and solving for \(k^2\), we get $$ k^2\leq \left (1+\frac kx\right)(N+x-1).$$ Using calculus to minimize in terms of \(x\), it turns out our optimal choice is \(x=\lceil N^\frac 34\rceil\). Some crude estimates upon substitution then yield $$ k\leq \sqrt N+\sqrt 2 N^\frac 14.$$

Singer-Bose Construction

Now that we have shown that we can find a better bound, let's prove (due to Singer and Bose) that there exists a Sidon set of size \(q\), where \(q=p^k\) is a prime power, in the interval \([1,q^2-1]\). We will primarily be working in \(\mathbb F_{p^{2k}}^\times\), the multiplicative group of units of \(\mathbb F_{p^{2k}}\), which is cyclic. To this end, pick a generator \(\alpha\) of \(\mathbb F_{p^{2k}}^\times\).

Notice that as \(d\) ranges from \(1\leq d\leq p^{2k}-1\), the elements \(\alpha^d-\alpha\in \mathbb F_{p^{2k}}^\times\) are all unique (I encourage the reader to check this!) and range over the entire set of elements of \(\mathbb F_{p^{2k}}^\times\) with the exception of \(-\alpha\): this is because we can never have \(\alpha^d=0\). In particular, since \(\mathbb F_{p^{k}}\subset\mathbb F_{p^{2k}}\), and \(-\alpha\not\in \mathbb F_{p^{k}}^\times\) (otherwise it would not be a generator of \( \mathbb F_{p^{2k}}^\times\)), we know that \(\alpha^d-\alpha\) must range through every element of \( \mathbb F_{p^{k}}\).

To this end, define $$\mathcal A =\{d\in[1,p^{2k}-1]:\alpha^d-\alpha \in \mathbb F_{p^k}\}.$$ We claim that \(\mathcal A\) is a Sidon set of size \(q\). Suppose for contradiction that \(\mathcal A\) is not a Sidon set. That means we have distinct pairs \(\{d_1, d_2\}\neq \{d_3,d_4\}\in\mathcal A\) where WLOG \(d_1\leq d_2\) and \(d_3\leq d_4\) such that \(d_1+d_2=d_3+d_4\). Now, define $$a_i=\alpha^{d_i}-\alpha\in \mathbb F_{p^k}^\times \implies \alpha^{d_i}=a_i+\alpha.$$ It follows that $$ d_1+d_2=d_3+d_4 \implies \alpha^{d_1}\alpha^{d_2}=\alpha^{d_3}\alpha^{d_4}.$$ When viewed in light of our definition, we have $$ d_1+d_2=d_3+d_4 \implies (a_1+\alpha)(a_2+\alpha)=(a_3+\alpha)(a_4+\alpha); $$ expanding yields $$ (a_1+a_2)\alpha+a_1a_2=(a_3+a_4)\alpha+a_3a_4.$$ Since \(\alpha\not\in \mathbb F_{p^k}\) (otherwise it would not be a generator of \( \mathbb F_{p^{2k}}^\times\)), we must satisfy \(a_1+a_2=a_3+a_4\) and \(a_1a_2=a_3a_4\).

These two conditions are essentially Vieta's formulas in reverse, so we must necessarily have $$(x+a_1)(x+a_2)=(x+a_3)(x+a_4)$$ in \(\mathbb F_{p^k}[x]\). Since \(\mathbb F_{p^k}\) is a field, \(\mathbb F_{p^k}[x]\) is a Euclidean Domain (and subsequently a UFD). By unique factorization, we must have \(a_1=a_3\) or \(a_1=a_4\). Viewed in light of the definition of \(a_i\), this means \(d_1=d_3\) and \(d_2=d_4\), or \(d_1=d_4\) and \(d_2=d_3\), forming our contradiction.