Lecture notes on recent improvements for the sunflower lemma


Presented January 2022, revised May 2022

This document gives a self-contained presentation of the proof of the (improved) sunflower lemma.

Definition. A sunflower with $r$ petals is a collection of sets $S_1,\dots,S_r$ where for all $\{i,j\}\subseteq[r]$, the intersection $S_i\cap S_j$ has the same value (the common core). Equivalently, for any three sets $S_i,S_j,S_k$ we have $S_i\cap S_j=S_i\cap S_k$, and no element in the universe is contained in at least two but not all of the sets.
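As a quick illustration, here is a minimal Python check of this definition (the function name and example sets are ours, purely illustrative):

```python
from itertools import combinations

def is_sunflower(sets):
    """Check the definition directly: every pairwise intersection
    must equal the same common core."""
    if len(sets) < 2:
        return True
    core = sets[0] & sets[1]
    return all(a & b == core for a, b in combinations(sets, 2))

print(is_sunflower([{1, 2}, {1, 3}, {1, 4}]))        # True: core {1}
# Element 5 lies in two of the sets but not the third, so this fails:
print(is_sunflower([{1, 2}, {1, 3, 5}, {1, 4, 5}]))  # False
```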

Lemma 1. Original sunflower lemma [ER60]. Any set system containing more than $k!\,(r-1)^k$ sets of size $k$ must contain a sunflower with $r$ petals.

Example 2. Application: Erdős and Sárközy. Given $T\subseteq[n]$, let $\mathrm{sum}(T)=\sum_{x\in T}x$. For all $S\subseteq\{1,\dots,n\}$, there are subsets $T_1,\dots,T_r\subseteq S$ with $r$ on the order of $|S|/(\log n)^2$ where $\mathrm{sum}(T_1),\mathrm{sum}(T_2),\dots,\mathrm{sum}(T_r)$ is an arithmetic progression. Proof: there are $\binom{|S|}{\log n}\ge\left(\frac{|S|}{\log n}\right)^{\log n}$ different subsets of $S$ of size $\log n$, and each has sum at most $n\log n\le n^2$; since there are only $n^2$ possible values, many of these subsets share the same sum. By the sunflower lemma, this collection contains a sunflower $S_1,\dots,S_r$; let $V=S_i\cap S_j$ be the common core. Let the subsets be $T_1=S_1$, $T_2=S_1\cup(S_2\setminus V)$, $T_3=S_1\cup(S_2\setminus V)\cup(S_3\setminus V)$, and so on. Each step adds $\mathrm{sum}(S_i)-\mathrm{sum}(V)$, which is the same for every $i$, so the sums form an arithmetic progression.
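The last step can be made concrete with a toy example (a hand-picked sunflower, skipping the pigeonhole step; the numbers are ours):

```python
# Petals with equal sums and common core V: the nested unions
# T_1 <= T_2 <= T_3 have sums in arithmetic progression.
V = {1, 2}                                        # common core, sum(V) = 3
petals = [{1, 2, 9}, {1, 2, 4, 5}, {1, 2, 3, 6}]  # pairwise intersections V, each sums to 12

T, sums = petals[0], [sum(petals[0])]
for S in petals[1:]:
    T = T | (S - V)
    sums.append(sum(T))
print(sums)  # [12, 21, 30]: common difference sum(S_i) - sum(V) = 9
```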

Some history:

Theorem 3. There exists a universal constant $C\le 64$ so that for any $r\ge 1$, $k\ge 2$, every set system containing at least $(Cr\log k)^k$ sets of size $k$ contains an $r$-petaled sunflower.

1 Entry material

Definition. Spread set: for $R>1$, a random set $A$ is $R$-spread if $\Pr[S\subseteq A]\le R^{-|S|}$ for all sets $S$. A set family $\{A_i\}_{i\in I}$ is $R$-spread if a uniformly random set from the family is $R$-spread.

For example, if the set system $\{A_i\}_{i\in I}$ has the property that every set contains one of a special set of $3$ elements, then $\{A_i\}_{i\in I}$ can be no more than $3$-spread: by pigeonhole, one of the special elements $x$ lies in at least a third of the sets, so $\Pr[\{x\}\subseteq A]\ge\frac13$.
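To make the definition concrete, the following sketch computes the largest $R$ for which a small family is $R$-spread by brute force (the function spread and the toy family are ours, for illustration only):

```python
from itertools import combinations

def spread(family):
    """Largest R such that a uniform member A of `family` satisfies
    Pr[S <= A] <= R^(-|S|) for every nonempty S. Only sets S contained
    in some member constrain R, so brute-force over those."""
    n, best, seen = len(family), float('inf'), set()
    for A in family:
        for size in range(1, len(A) + 1):
            for S in combinations(sorted(A), size):
                if S in seen:
                    continue
                seen.add(S)
                count = sum(1 for B in family if set(S) <= B)
                best = min(best, (n / count) ** (1 / size))
    return best

# Every set contains one of the special elements {1, 2, 3}:
fam = [{1, 10}, {2, 11}, {3, 12}, {1, 13}, {2, 14}, {3, 15}]
print(spread(fam))  # ~2.449 here, consistent with the bound of at most 3
```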

Remark 4. Given a random $R$-spread set $A$, let $B$ be a random set for which $B\subseteq A$ almost surely. Then $B$ is also $R$-spread.

Lemma 5. Locating a single petal. Assume $\{A_i\}_{i\in I}$, a family of subsets of $X$ each of size at most $k\ge 2$, is $R$-spread for $R\ge 2^4\log(2k)\cdot 2r$. Let $V$ be a random subset of $X$ where each coordinate is independently included with probability $\frac{1}{2r}$. Then with probability $\ge\frac12$, $V$ contains some $A_i$.

We now show that the sunflower lemma reduces to locating a single petal (Lemma 5).

Proof. Let $\mathcal{A}=\{A_i\}_{i\in I}$ be the given set system, with universe $X$, and all $|A_i|\le k$; we are given that $|I|\ge R^k$. First, we show that it suffices to prove the sunflower lemma in the case where $\{A_i\}_{i\in I}$ is $R$-spread. Assume that $\mathcal{A}$ is not $R$-spread. For each set $S\subseteq X$, define

$$Q(S) := R^{|S|}\,\bigl|\{i\in I : S\subseteq A_i\}\bigr|$$

and let $S_0$ be the set maximizing this. As $Q(S)=0$ if $|S|>k$, it must be that $|S_0|\le k$. Also, $Q(\emptyset)=|I|\ge R^k$. Let $J=\{i\in I : S_0\subseteq A_i\}$. Note that $J$ must be nonempty. Consider now the family of sets $\mathcal{A}'=\{A_j\setminus S_0\}_{j\in J}$. Given a sunflower in $\mathcal{A}'$, we can form a sunflower in $\mathcal{A}$ by adding $S_0$ to each petal. Also, $\mathcal{A}'$ is $R$-spread on $X$, because for any $S\subseteq X$, and $j\in J$ drawn u.a.r.,

$$\Pr[S\subseteq A_j\setminus S_0] = \mathbb{1}_{S\cap S_0=\emptyset}\,\frac{|\{i\in I : S_0\cup S\subseteq A_i\}|}{|\{i\in I : S_0\subseteq A_i\}|} = \mathbb{1}_{S\cap S_0=\emptyset}\,\frac{Q(S_0\cup S)/R^{|S_0\cup S|}}{Q(S_0)/R^{|S_0|}} \le \mathbb{1}_{S\cap S_0=\emptyset}\,R^{|S_0|-|S_0\cup S|} = R^{-|S|}$$

(The inequality comes from the fact that $Q(S_0)\ge Q(S_0\cup S)$, by the maximality of $S_0$. The identity $|S_0|-|S_0\cup S|=-|S|$ can be assumed because, if $S$ and $S_0$ intersect, then $\Pr[S\subseteq A_j\setminus S_0]=0$.)

Thus, it suffices to prove the sunflower lemma when $\mathcal{A}$ is $R$-spread. Randomly partition $X$ into $2r$ subsets $V_1,\dots,V_{2r}$: for each element $i\in X$, sample an independent uniform integer in $[2r]$ and use it to determine which subset contains $i$. The distribution of each individual $V_i$ is the same as if $V_i$ were sampled from the product distribution where each coordinate is present with probability $\frac{1}{2r}$. By Lemma 5, each $V_i$ contains some set of $\mathcal{A}$ with probability $\ge\frac12$. Since the expected number of $V_i$ containing a set in $\mathcal{A}$ is at least $2r\cdot\frac12=r$, there must be a specific partition $\hat V_1,\dots,\hat V_{2r}$ where at least $r$ of the parts contain an element of $\mathcal{A}$. Let $S_1,\dots,S_r$ be these contained elements. They are all in $\mathcal{A}$, and all disjoint, because the $\hat V_i$ are disjoint; hence they form a sunflower with $r$ petals and empty core. □
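The partition step of this proof is easy to visualize in code. Below is a minimal sketch (the toy family and helper name are ours; the actual guarantee of at least $r$ petals requires the spreadness hypothesis and Lemma 5):

```python
import random

def petals_via_partition(family, universe, r, seed=0):
    """Mechanics of the reduction: drop each element of the universe into
    one of 2r parts uniformly at random; every part that fully contains
    some member of the family contributes one petal, and petals found in
    different parts are automatically disjoint."""
    rng = random.Random(seed)
    parts = [set() for _ in range(2 * r)]
    for x in universe:
        parts[rng.randrange(2 * r)].add(x)
    petals = []
    for part in parts:
        member = next((A for A in family if A <= part), None)
        if member is not None:
            petals.append(member)
    return petals  # disjoint sets; a sunflower with empty core if >= r found

# Tiny illustration only:
fam = [{i, i + 100} for i in range(50)]
print(petals_via_partition(fam, set(range(50)) | set(range(100, 150)), r=3))
```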

Definition 6. Given a random variable $X$ and a deterministic function $f$, a conditionally independent copy $X'$ of $X$ subject to the condition $f(X')=f(X)$ is a random variable identically distributed to $X$, where $f(X')=f(X)$ always holds, and the variables $X$ and $X'$ are conditionally independent given $f(X)$.

Lemma 7. Refinement inequality. Let $A$ be an $R$-spread random subset of $X$. Let $W$ be a random subset of $X$ in which each element is included with probability $\delta$, independently of $A$. Let $(A',W')$ be a conditionally independent copy of $(A,W)$ subject to $A\cup W=A'\cup W'$. Then $A'$ is distributed identically to $A$ and satisfies:

$$A'\subseteq A\cup W \qquad\text{and}\qquad E|A'\setminus W| \le \frac{1+H_b(\delta)}{\log(R\delta)}\,E|A|.$$

We now reduce Lemma 5 to the preceding Lemma 7:

Proof. Let $m$ be an integer chosen later, and let $W_1,W_2,\dots,W_m$ be independent random subsets of $X$ where each possible element of $W_i$ is present with probability $\delta=1-\left(1-\frac{1}{2r}\right)^{1/m}$, so that $V=\bigcup_{i=1}^m W_i$ contains each $i\in X$ independently with probability $\frac{1}{2r}=1-(1-\delta)^m$. For brevity, write $U_i=\bigcup_{j=1}^i W_j$, and $C_R=\frac{1+H_b(\delta)}{\log(R\delta)}$. Let $A_0$ be uniformly randomly chosen from the set $\{A_i\}_{i\in I}$, independent of the $(W_i)_{i\in[m]}$, and construct $A_1,\dots,A_m$ iteratively as follows. Say we have constructed $A_i$, which is identically distributed to $A_0$, may be correlated with $(W_1,W_2,\dots,W_i)$, and satisfies $A_i\subseteq A_0\cup U_i$ and $E|A_i\setminus U_i|\le C_R^i\,E|A_0|$. We construct a conditionally independent copy $(A_i',W_{i+1}',U_i')$ of $(A_i,W_{i+1},U_i)$ subject to $U_i'=U_i$ and

$$(A_i'\setminus U_i')\cup W_{i+1}' = (A_i\setminus U_i)\cup W_{i+1}.$$

For each $T\subseteq X$, we note that the conditioned random variable $B_{i,T}=(A_i\setminus U_i \mid U_i=T)$ is $R$-spread, because for any $S\subseteq X$, we have:

$$\Pr[S\subseteq B_{i,T}] = \Pr[S\subseteq A_i\setminus U_i \mid U_i=T] \le \Pr[S\subseteq A_0 \mid U_i=T] = \Pr[S\subseteq A_0] \le R^{-|S|},$$

where we use the property that $A_i\subseteq A_0\cup U_i$, and that $A_0$ is independent of $U_i$. Furthermore, with $B'_{i,T}=(A_i'\setminus U_i' \mid U_i'=T)$, we observe that $(B'_{i,T},W_{i+1}')$ is a conditionally independent copy of $(B_{i,T},W_{i+1})$ subject to $B_{i,T}\cup W_{i+1}=B'_{i,T}\cup W_{i+1}'$. We apply Lemma 7 to $(B_{i,T},W_{i+1})$, finding that $B'_{i,T}\subseteq B_{i,T}\cup W_{i+1}$, and

$$E|B'_{i,T}\setminus W_{i+1}| \le C_R\,E|B_{i,T}|.$$

Combining the results of the lemma over all $T\subseteq X$ gives $A_i'\setminus U_i' \subseteq (A_i\setminus U_i)\cup W_{i+1}$, since for each possible value $T$ of $U_i$, we have $A_i'\setminus T \subseteq (A_i\setminus T)\cup W_{i+1}$. This implies $A_i'\subseteq A_i\cup U_{i+1}$. Similarly,

$$\begin{aligned} E|A_i'\setminus U_{i+1}| &= \sum_{T\subseteq X}\Pr[U_i=T]\;E\bigl[|(A_i'\setminus U_i')\setminus W_{i+1}| \,\big|\, U_i=T\bigr] = \sum_{T\subseteq X}\Pr[U_i=T]\;E|B'_{i,T}\setminus W_{i+1}| \\ &\le C_R\sum_{T\subseteq X}\Pr[U_i=T]\;E|B_{i,T}| = C_R\sum_{T\subseteq X}\Pr[U_i=T]\;E\bigl[|A_i\setminus U_i| \,\big|\, U_i=T\bigr] = C_R\,E|A_i\setminus U_i|. \end{aligned}$$

Finally, letting $A_{i+1}=A_i'$, we observe that $A_{i+1}$ is identically distributed to $A_0$, satisfies $A_{i+1}\subseteq A_0\cup U_{i+1}$, and has

$$E|A_{i+1}\setminus U_{i+1}| \le C_R\,E|A_i\setminus U_i| \le C_R^{\,i+1}\,E|A_0|.$$

Since $|A_i|\le k$ for all $i\in I$, we have $|A_0|\le k$ and hence $E|A_0|\le k$; then using the fact that $V=\bigcup_{i=1}^m W_i=U_m$, we have:

$$E|A_m\setminus V| \le C_R^m\,k.$$

We want the right-hand side to be at most $\frac12$, because then $E|A_m\setminus V|\le\frac12$; since $|A_m\setminus V|$ is integral, this means that at least half the time, $|A_m\setminus V|=0$ and $A_m\subseteq V$ holds. (Since $A_m$ is distributed identically to $A_0$, this means that we can couple a uniform random index $n$ in $I$ to $V$ so that $A_n\subseteq V$ with probability $\ge\frac12$.) Setting $m=\log(2k)$, so that $C_R^m\,k\le k\,2^{-\log(2k)}=\frac12$ whenever $C_R\le\frac12$, we observe that $E|A_m\setminus V|\le\frac12$ if $C_R\le\frac{2}{\log(R\delta)}\le\frac12$, or equivalently, if $R\ge\frac{1}{\delta}\,2^{2\cdot 2}=\frac{16}{\delta}$. Since

$$\delta = 1-\left(1-\frac{1}{2r}\right)^{1/m} = 1-\left(1-\frac{1}{2r}\right)^{1/\log(2k)} \ge \frac{1}{\log(2k)\cdot 2r},$$

it follows that setting

$$R = 2^4\log(2k)\cdot 2r$$

is sufficient. (As we have assumed $k\ge 2$, $\log(2k)\le 2\log k$, so we have $R\le 64\,r\log k$.) □

2 Interlude: Entropy lemmas

Definition 8. Given a random variable $X$ with values in $S$, its Shannon entropy is

$$H(X) := \sum_{x\in S}\Pr[X=x]\log\frac{1}{\Pr[X=x]}.$$

The conditional Shannon entropy on an event $E$ is

$$H(X\mid E) := \sum_{x\in S}\Pr[X=x\mid E]\log\frac{1}{\Pr[X=x\mid E]},$$

and conditioning on a random variable $Y$ with range $T$ gives

$$H(X\mid Y) := \sum_{y\in T}\Pr[Y=y]\,H(X\mid Y=y).$$

Fact 9. Entropy refresher; these facts are sanity-checked numerically in the sketch after the list.

  1. Monotonicity: if $X=f(Y)$ for a fixed function $f$, then $0\le H(X)\le H(Y)$. If $X=f(Y,Z)$, then $0\le H(X\mid Z)\le H(Y\mid Z)$. If $Y=h(Z)$, then $H(X\mid Z)\le H(X\mid Y)$.

  2. Subadditivity: $H(X,Y)\le H(X)+H(Y)$, with equality if $X$ and $Y$ are independent.

  3. Chain rule: $H(X,Y)=H(X)+H(Y\mid X)=H(Y)+H(X\mid Y)$. We also have $0\le H(X\mid Y)\le H(X)$.

  4. Jensen's inequality: for convex $\psi$, $\psi(EX)\le E\psi(X)$. For entropy, this means that if $X$ is supported on $S$, then $H(X)\le\log|S|$. Similarly, if $X$ is supported on a set $S_Y$ depending on $Y$, then $H(X\mid Y)\le E\log|S_Y|$.

  5. Binary entropy: $H_b(p)=p\log\frac1p+(1-p)\log\frac{1}{1-p}$, and $\frac{d}{dp}H_b(p)=\log\frac{1-p}{p}$. Also, $H_b$ is concave, so $\frac{d^2}{dp^2}H_b(p)\le 0$.
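A numeric check of these facts on an arbitrary small joint distribution (the distribution and helper names are ours):

```python
import math
from collections import defaultdict

p = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}  # joint of (X, Y)

def H(dist):
    """Shannon entropy (base 2) of a distribution {value: prob}."""
    return sum(q * math.log2(1 / q) for q in dist.values() if q > 0)

def marginal(p, axis):
    m = defaultdict(float)
    for xy, q in p.items():
        m[xy[axis]] += q
    return m

def H_cond(p):
    """H(Y | X) computed directly from the definition."""
    pX, ys, total = marginal(p, 0), {xy[1] for xy in p}, 0.0
    for x, qx in pX.items():
        total += qx * H({y: p.get((x, y), 0.0) / qx for y in ys})
    return total

pX, pY = marginal(p, 0), marginal(p, 1)
assert abs(H(p) - (H(pX) + H_cond(p))) < 1e-12  # chain rule
assert H(p) <= H(pX) + H(pY) + 1e-12            # subadditivity
assert 0 <= H_cond(p) <= H(pY) + 1e-12          # conditioning reduces entropy
assert H(pX) <= math.log2(len(pX))              # Jensen: H(X) <= log|S|
print("Fact 9 checks pass")
```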

Lemma 10. Subsets of small sets have small conditional entropy. Let $A,B$ be random finite sets. If $A\subseteq B$ almost surely, then $H(A\mid B)\le E|B|$.

Proof. This follows from Jensen's inequality: $H(A\mid B)\le E\log|2^B|=E|B|$, since $A\in 2^B$. □

Lemma 11. Let $A$ be an $R$-spread random set in $X$. Then for any random set $S$ for which $S\subseteq A$ almost surely, $H(A\mid S)\le H(A)-(\log R)\,E|S|$.

Proof. We want to prove that the mutual information $I(A;S)=H(A)-H(A\mid S)$ is at least $(\log R)\,E|S|$. Overloading each symbol for both the random set and its value, write $p(A)=\Pr[A=A]$, $p(S)=\Pr[S=S]$, $p(A,S)=\Pr[A=A\wedge S=S]$, and $p(A\mid S)=p(A,S)/p(S)$. Then:

$$\begin{aligned} H(A)-H(A\mid S) &= \sum_{A\subseteq X}p(A)\log\frac{1}{p(A)} - \sum_{S\subseteq X}p(S)\sum_{A\subseteq X}\frac{p(A,S)}{p(S)}\log\frac{p(S)}{p(A,S)} \\ &= \sum_{S\subseteq X}\sum_{A\subseteq X}p(A,S)\log\frac{p(A,S)}{p(A)\,p(S)} = \sum_{S\subseteq X}p(S)\sum_{A\subseteq X}p(A\mid S)\log\frac{p(A\mid S)}{p(A)} \\ &= \sum_{S\subseteq X}p(S)\sum_{S\subseteq A\subseteq X}p(A\mid S)\log\frac{p(A\mid S)}{p(A)} \qquad\text{because } S\subseteq A \text{ a.s.} \end{aligned}$$

Because A is R-spread, for any S, we have

$$\Pr[S\subseteq A] = \sum_{S\subseteq A\subseteq X}p(A) \le R^{-|S|};$$

note also that $\sum_{S\subseteq A\subseteq X}p(A\mid S)=1$, because $S\subseteq A$ almost surely. Then, because $\log$ is concave, by Jensen's inequality,

$$\begin{aligned} \sum_{S\subseteq X}p(S)\sum_{S\subseteq A\subseteq X}p(A\mid S)\log\frac{p(A)}{p(A\mid S)} &\le \sum_{S\subseteq X}p(S)\log\Bigl(\sum_{S\subseteq A\subseteq X}p(A\mid S)\,\frac{p(A)}{p(A\mid S)}\Bigr) = \sum_{S\subseteq X}p(S)\log\sum_{S\subseteq A\subseteq X}p(A) \\ &\le \sum_{S\subseteq X}p(S)\log R^{-|S|} = -\log R\sum_{S\subseteq X}|S|\,p(S) = -(\log R)\,E|S|. \end{aligned}$$

The left-hand side equals $-I(A;S)$, so $I(A;S)\ge(\log R)\,E|S|$, i.e., $H(A\mid S)\le H(A)-(\log R)\,E|S|$. This completes the proof. □
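As a sanity check of Lemma 11, the following sketch computes both sides exactly for a small $R$-spread family (our toy example: $A$ takes one element from each of two blocks, so $R=2$, and $S$ keeps each element of $A$ independently with probability $\frac12$):

```python
import math
from itertools import product
from collections import defaultdict

def H(dist):
    return sum(q * math.log2(1 / q) for q in dist.values() if q > 0)

R = 2
family = [frozenset(p) for p in product([0, 1], [2, 3])]  # 2-spread, uniform
pAS = defaultdict(float)
for A in family:
    for keep in product([0, 1], repeat=len(A)):
        S = frozenset(x for x, k in zip(sorted(A), keep) if k)
        pAS[(A, S)] += (1 / len(family)) * (1 / 2) ** len(A)

pS = defaultdict(float)
for (A, S), q in pAS.items():
    pS[S] += q
H_A = math.log2(len(family))          # A is uniform
H_A_given_S = H(pAS) - H(pS)          # chain rule
E_S = sum(q * len(S) for (A, S), q in pAS.items())
# Lemma 11 guarantees <=; equality happens to hold in this example:
print(H_A_given_S, H_A - math.log2(R) * E_S)  # 1.0 1.0
```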

Corollary 12. Let $W$ be a random subset of $X$ with each coordinate included independently with probability $\delta$. Let $S$ be a random subset of $X$ for which $S\subseteq W$ almost surely. Then $H(W\mid S)\le H(W)-\log\frac1\delta\,E|S|$.

Proof. $W$ is $\frac1\delta$-spread, because for any $T\subseteq X$, $\Pr[T\subseteq W]=\Pr[\forall i\in T: W_i=1]=\delta^{|T|}$. Then apply Lemma 11. □

Lemma 13. Information properties of uniformly random sets. Let $X$ be a finite set, and $W$ a random subset with each coordinate included independently with probability $\delta$. Let $S$ be a random subset of $X$. Then $H(W\cup S)\le H(W)+\log\frac1\delta\,E|S|$.

Proof. We will show that for any coordinate $i\in X$, writing $W_i=W\cap\{i\}$ and $S_i=S\cap\{i\}$,

$$H(W_i\cup S_i) \le H_b(\delta)+\log\tfrac1\delta\,E|S_i| \qquad (1)$$

Summing this over all $i\in X$ and using subadditivity gives (recalling $H(W)=|X|\,H_b(\delta)$, as the coordinates of $W$ are i.i.d.):

$$H(W\cup S) \le \sum_{i\in X}H\bigl((W\cup S)_i\bigr) \le |X|\,H_b(\delta)+\log\tfrac1\delta\sum_{i\in X}E|S\cap\{i\}| = H(W)+\log\tfrac1\delta\,E|S|.$$

To prove Eq. (1), we first note that since $H_b$ is concave, for all $\gamma$ such that $\delta+\gamma\in[0,1]$,

$$H_b(\delta+\gamma) \le H_b(\delta)+\gamma\cdot\frac{d}{dp}H_b(p)\Big|_{p=\delta} = H_b(\delta)+\gamma\log\frac{1-\delta}{\delta}.$$

By monotonicity and the union bound, $\delta=\Pr[W_i\ne\emptyset]\le\Pr[W_i\cup S_i\ne\emptyset]\le\delta+\Pr[S_i\ne\emptyset]$. This and the previous statement imply:

$$H(W_i\cup S_i) = H_b\bigl(\Pr[W_i\cup S_i\ne\emptyset]\bigr) \le \max_{\gamma\in[0,\Pr[S_i\ne\emptyset]]}H_b(\delta+\gamma) \le H_b(\delta)+\max_{\gamma\in[0,\Pr[S_i\ne\emptyset]]}\gamma\log\frac{1-\delta}{\delta}.$$

We have two cases. If $\delta\le\frac12$, then $\log\frac{1-\delta}{\delta}\ge 0$, and the maximum of this last upper bound is attained at $\gamma=\Pr[S_i\ne\emptyset]$; if $\delta\ge\frac12$, the maximum is attained at $\gamma=0$. In both cases, $\log\frac1\delta\,\Pr[S_i\ne\emptyset]$ is an upper bound, so:

$$H(W_i\cup S_i) \le H_b(\delta)+\begin{cases}0 & \delta\ge\frac12\\ \Pr[S_i\ne\emptyset]\log\frac{1-\delta}{\delta} & \delta\le\frac12\end{cases} \;\le\; H_b(\delta)+\log\tfrac1\delta\,\Pr[S_i\ne\emptyset],$$

which completes the proof, as $\Pr[S_i\ne\emptyset]=E|S_i|$. □
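Both inequalities used above (the tangent bound from concavity, and the final bound with $\log\frac1\delta$) can be checked numerically on a grid; a small sketch:

```python
import math

def Hb(p):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

for d in [0.1, 0.3, 0.5, 0.7]:
    for g in [x / 100 for x in range(0, int((1 - d) * 100) + 1)]:
        # concavity tangent bound at delta = d:
        assert Hb(d + g) <= Hb(d) + g * math.log2((1 - d) / d) + 1e-9
        # the weaker bound used in Eq. (1):
        assert Hb(d + g) <= Hb(d) + g * math.log2(1 / d) + 1e-9
print("tangent and Eq. (1) bounds hold on the grid")
```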

Lemma 14. Let $X$ be a finite set, and $W$ a random subset with each coordinate included independently with probability $\delta$. Let $S$ be a random subset of $X$, independent of $W$. Then $H(S\setminus W\mid S)=H_b(\delta)\,E|S|$.

Proof. $H(S\setminus W\mid S)=E_S\,H(S\setminus W\mid S=S)=E_S\bigl[H_b(\delta)\,|S|\bigr]=H_b(\delta)\,E|S|$, since for any fixed set $T\subseteq X$, we have $H(T\setminus W)=|T|\,H_b(1-\delta)=|T|\,H_b(\delta)$. □

Lemma 15. Properties of the "conditionally independent copy." Given $X$ and deterministic $f$, we can construct a conditionally independent copy $X'$ with the exact same distribution as $X$, but linked so that $f(X')=f(X)$, and $X$, $X'$ are independent when conditioned on $f(X)$.

  1. $H(X'\mid X)=H(X'\mid f(X))=H(X)-H(f(X))$

  2. $H(X,X')=2H(X)-H(f(X))$

Proof. The joint distribution for $(X,X')$ has

$$\Pr[(X,X')=(x,x')] = \mathbb{1}_{f(x)=f(x')}\,\Pr[f(X)=f(x)]\cdot\Pr[X=x\mid f(X)=f(x)]\;\Pr[X=x'\mid f(X)=f(x')].$$

Then, since $X,X'$ are conditionally independent given $f(X)$,

$$H(X'\mid X) = H(X'\mid f(X)) = H(X',f(X))-H(f(X)) = H(X')-H(f(X)) = H(X)-H(f(X)).$$

Adding $H(X)$ to both sides and using the chain rule gives $H(X,X')=2H(X)-H(f(X))$, since $H(X')=H(X)$. □
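The joint-distribution formula and item 2 can be verified numerically; the following sketch builds the conditionally independent copy explicitly for $X$ uniform on $\{0,1,2,3\}$ and $f(x)=x\bmod 2$ (example values are ours):

```python
import math
from collections import defaultdict

def H(dist):
    return sum(q * math.log2(1 / q) for q in dist.values() if q > 0)

pX = {x: 0.25 for x in range(4)}
f = lambda x: x % 2
pf = defaultdict(float)
for x, q in pX.items():
    pf[f(x)] += q

# Pr[(X,X')=(x,x')] = 1{f(x)=f(x')} * Pr[X=x] * Pr[X=x' | f(X)=f(x)]
joint = {}
for x, qx in pX.items():
    for x2, qx2 in pX.items():
        if f(x) == f(x2):
            joint[(x, x2)] = qx * qx2 / pf[f(x)]

print(H(joint), 2 * H(pX) - H(pf))  # both 3.0: H(X,X') = 2H(X) - H(f(X))
```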

3 Conclusion: proving Lemma 7

Proof. We are given a conditionally independent copy $(A',W')$ of $(A,W)$ subject to $A\cup W=A'\cup W'$; equivalently, this ensures

$$A'\subseteq A'\cup W'=A\cup W \qquad\text{and}\qquad A'\setminus W\subseteq A'\cap A.$$

We now prove $E|A'\cap A|\le\frac{1+H_b(\delta)}{\log(R\delta)}\,E|A|$; since $E|A'\setminus W|\le E|A'\cap A|$ by the above, this implies the lemma.¹ We will estimate $H(A,W,A',W')$ in two different ways, and prove lower and upper bounds for it; some light algebra afterwards reveals the result.

For the lower bound:

$$H(A,W,A',W') = H(A,W)+H(A',W'\mid A,W) = H(A)+H(W)+H(A',W'\mid A,W).$$

Now,

$$\begin{aligned} H(A',W'\mid A,W) &= H(A',W'\mid A,W,A\cup W) && A\cup W \text{ a function of } A,W \\ &= H(A',W'\mid A\cup W) && \text{cond. indep. of } A',W' \text{ given } A\cup W \\ &= H(A,W\mid A\cup W) && \text{ident. distrib. of } (A',W'),(A,W) \text{ given } A\cup W \\ &= H(A,W)-H(A\cup W) && \text{defn. cond. entropy; } A\cup W \text{ a function of } A,W \\ &= H(A)+H(W)-H(A\cup W) && \text{independence} \\ &\ge H(A)-\log\tfrac1\delta\,E|A| && \text{by Lemma 13.} \end{aligned}$$

Thus

$$H(A,W,A',W') \ge 2H(A)+H(W)-\log\tfrac1\delta\,E|A|.$$

The upper bound uses a more complicated decomposition:

$$H(A,W,A',W') = H(A,A'\cap A,A',W,W') = H(A)+H(A'\cap A\mid A)+H(A'\mid A,A'\cap A)+H(W\mid A,A'\cap A,A')+H(W'\mid A,A'\cap A,A',W).$$

Since $A'\cap A\subseteq A$, Lemma 10 implies

$$H(A'\cap A\mid A) \le E|A|.$$

By monotonicity and Lemma 11 (applied to $A'$, with $S=A'\cap A$, using $H(A')=H(A)$),

$$H(A'\mid A,A'\cap A) \le H(A'\mid A'\cap A) \le H(A)-(\log R)\,E|A'\cap A|.$$

By monotonicity, and applying Corollary 12 (since $A'\setminus A\subseteq W$),

$$H(W\mid A,A'\cap A,A') \le H(W\mid A'\setminus A) \le H(W)-\log\tfrac1\delta\,E|A'\setminus A|.$$

Finally, as we can write $W'=(A\cup W)\setminus(A'\setminus W')$ (i.e., as a function of $A$, $W$, and $A'\setminus W'$), by applying monotonicity twice, and then Lemma 14:

$$H(W'\mid A,A'\cap A,A',W) \le H(A'\setminus W'\mid A,A'\cap A,A',W) \le H(A'\setminus W'\mid A') = H_b(\delta)\,E|A|.$$

Combining these upper bounds gives:

$$\begin{aligned} H(A,W,A',W') &\le H(A)+E|A|+H(A)-(\log R)\,E|A'\cap A|+H(W)-\log\tfrac1\delta\,E|A'\setminus A|+H_b(\delta)\,E|A| \\ &= 2H(A)+H(W)-\log\tfrac1\delta\,E|A|+(1+H_b(\delta))\,E|A|-(\log R\delta)\,E|A'\cap A|, \end{aligned}$$

where the last line uses $E|A'\setminus A|=E|A'|-E|A'\cap A|=E|A|-E|A'\cap A|$.

Combining this with the lower bound on $H(A,W,A',W')$ gives, after canceling identical terms:

$$0 \le (1+H_b(\delta))\,E|A|-(\log R\delta)\,E|A'\cap A|,$$

hence $E|A'\cap A|\le\frac{1+H_b(\delta)}{\log(R\delta)}\,E|A|$, and since $A'\setminus W\subseteq A'\cap A$, the claimed bound on $E|A'\setminus W|$ follows. □

4 Misc

Remark 16. A remark of [BCW21] implies that the idea of an $R$-spread construction, combined with a random selection of a subset, cannot be further improved. They give the example of a set family $F$ on $V_1\sqcup\dots\sqcup V_k$, where each set $V_i$ has size $R$ and is disjoint from the others. The set family $F$ contains all sets $F_i$ of size $k$ for which $|F_i\cap V_j|=1$ for all $j\in[k]$. This family is $R$-spread, but the probability that a random set of density $\delta=\Theta\bigl(\frac1r\bigr)$ contains any set of $F$ is

$$\bigl(1-(1-\delta)^R\bigr)^k \le e^{-(1-\delta)^R k},$$

which is $\ge\frac12$ roughly when $(1-\delta)^R k\le\ln 2$, i.e., when $R\gtrsim\frac{\ln k}{\delta}$. One would need $R\approx\frac1\delta$ to suffice for the multiple disjoint random sampling trick to work without the logarithmic factor.
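The probability computation in this remark is easy to reproduce (a sketch with our sample values $r=10$, $k=1000$):

```python
import math

def hit_prob(R, k, delta):
    """Probability that a density-delta random set contains some member of
    the [BCW21] family: it must hit each of the k blocks of size R."""
    return (1 - (1 - delta) ** R) ** k

r, k = 10, 1000
delta = 1 / (2 * r)
for R in [2 * r, int(2 * r * math.log(k)), 64 * r * int(math.log2(k))]:
    print(R, hit_prob(R, k, delta))
# R ~ 1/delta gives essentially 0; only R ~ log(k)/delta succeeds.
```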

References

[ALW+20]

Ryan Alweiss, Shachar Lovett, Kewen Wu, and Jiapeng Zhang. Improved bounds for the sunflower lemma. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing (STOC 2020), pages 624–630, Chicago, IL, USA. Association for Computing Machinery, 2020. ISBN 9781450369794. doi:10.1145/3357713.3384234. url: https://doi.org/10.1145/3357713.3384234.

[BCW21]

Tolson Bell, Suchakree Chueluecha, and Lutz Warnke. Note on sunflowers. Discrete Mathematics, 344(7):112367, 2021. ISSN 0012-365X. doi:10.1016/j.disc.2021.112367. url: https://www.sciencedirect.com/science/article/pii/S0012365X21000807.

[ER60]

P. Erdös and R. Rado. Intersection theorems for systems of sets. Journal of the London Mathematical Society, s1-35(1):85–90, 1960. doi:10.1112/jlms/s1-35.1.85. url: https://londmathsoc.onlinelibrary.wiley.com/doi/abs/10.1112/jlms/s1-35.1.85.

[Hu21]

Lunjia Hu. Entropy Estimation via Two Chains: Streamlining the Proof of the Sunflower Lemma. 2021. url: https://web.archive.org/web/20220502190641/https://theorydish.blog/2021/05/19/entropy-estimation-via-two-chains-streamlining-the-proof-of-the-sunflower-lemma/. Accessed: 2022-05-02.

[Rao20]

Anup Rao. Coding for sunflowers, 2020. arXiv:1909.04774 [math.CO].

[Tao20]

Terence Tao. The sunflower lemma via Shannon entropy. 2020. url: https://web.archive.org/web/20220502185424/https://terrytao.wordpress.com/2020/07/20/the-sunflower-lemma-via-shannon-entropy/. Accessed: 2022-05-02.

¹ In fact, not only do we have $E|A'\setminus W|\le E|A'\cap A|$, but it is possible to prove $E|A'\setminus W|=(1-\delta)\,E|A'\cap A|$.

