This answer keeps mutating. The current version does not relate to the discussion I had with @cardinal in the comments (although it was through this discussion that I thankfully realized that the conditioning approach did not appear to lead anywhere).
For this attempt, I will use another part of Hoeffding's original 1963 paper, namely section 5 "Sums of Dependent Random Variables".
Set
$$W_i \equiv \frac{Y_i}{\sum_{i=1}^n Y_i},\qquad \sum_{i=1}^n Y_i \neq 0,\qquad \sum_{i=1}^n W_i = 1,\qquad n\ge 2$$
while we set $W_i = 0$ if $\sum_{i=1}^n Y_i = 0$.
Then we have the variable
$$Z_n = \sum_{i=1}^n W_i X_i,\qquad E(Z_n) \equiv \mu_n$$
We are interested in the probability
$$\Pr(Z_n \ge \mu_n + \epsilon),\qquad \epsilon < 1-\mu_n$$
As for many other inequalities, Hoeffding starts his reasoning by noting that
$$\Pr(Z_n \ge \mu_n + \epsilon) = E\left[\mathbf{1}\{Z_n - \mu_n - \epsilon \ge 0\}\right]$$
and that
$$\mathbf{1}\{Z_n - \mu_n - \epsilon \ge 0\} \le \exp\{h(Z_n - \mu_n - \epsilon)\},\qquad h>0$$
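For the dependent-variables case we follow Hoeffding: we use the fact that $\sum_{i=1}^n W_i = 1$ and invoke Jensen's inequality for the (convex) exponential function, to write
$$e^{hZ_n} = \exp\left\{h\left(\sum_{i=1}^n W_i X_i\right)\right\} \le \sum_{i=1}^n W_i e^{hX_i}$$
and, combining these results, we arrive at
$$\Pr(Z_n \ge \mu_n + \epsilon) \le e^{-h(\mu_n+\epsilon)}\, E\left[\sum_{i=1}^n W_i e^{hX_i}\right]$$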
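Focusing on our case, since $W_i$ and $X_i$ are independent, the expected values can be separated,
$$\Pr(Z_n \ge \mu_n + \epsilon) \le e^{-h(\mu_n+\epsilon)} \sum_{i=1}^n E(W_i)\,E\left(e^{hX_i}\right)$$
In our case, the $X_i$ are i.i.d. Bernoullis with parameter $\theta$, and $E[e^{hX_i}]$ is their common moment generating function in $h$, $E[e^{hX_i}] = 1-\theta+\theta e^{h}$. So
$$\Pr(Z_n \ge \mu_n + \epsilon) \le e^{-h(\mu_n+\epsilon)}\left(1-\theta+\theta e^{h}\right)\sum_{i=1}^n E(W_i)$$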
Minimizing the RHS with respect to h, we get
$$e^{h^*} = \frac{(1-\theta)(\mu_n+\epsilon)}{\theta(1-\mu_n-\epsilon)}$$
Plugging it into the inequality and manipulating we obtain
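$$\Pr(Z_n \ge \mu_n + \epsilon) \le \left(\frac{\theta}{\mu_n+\epsilon}\right)^{\mu_n+\epsilon}\cdot\left(\frac{1-\theta}{1-\mu_n-\epsilon}\right)^{1-\mu_n-\epsilon}\sum_{i=1}^n E(W_i)$$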
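For readers who want the intermediate steps, here is a sketch of the minimization. I write $t \equiv \mu_n + \epsilon$ as shorthand (my notation, not Hoeffding's) and assume $\theta < t < 1$, so that $h^* > 0$. The factor $\sum_{i=1}^n E(W_i)$ does not depend on $h$, so only $g(h)$ below needs to be minimized; since $\ln g$ is convex in $h$, the first-order condition gives the minimum.

$$
\begin{aligned}
g(h) &\equiv e^{-ht}\left(1-\theta+\theta e^{h}\right), \\
\frac{d}{dh}\ln g(h) &= -t + \frac{\theta e^{h}}{1-\theta+\theta e^{h}} = 0
\;\Rightarrow\; e^{h^*} = \frac{(1-\theta)\,t}{\theta\,(1-t)}, \\
1-\theta+\theta e^{h^*} &= \frac{1-\theta}{1-t}, \qquad
e^{-h^* t} = \left(\frac{\theta\,(1-t)}{(1-\theta)\,t}\right)^{t}, \\
g(h^*) &= \left(\frac{\theta}{t}\right)^{t}\left(\frac{1-\theta}{1-t}\right)^{1-t}.
\end{aligned}
$$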
while
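$$\Pr(Z_n \ge \theta + \epsilon) \le \left(\frac{\theta}{\theta+\epsilon}\right)^{\theta+\epsilon}\cdot\left(\frac{1-\theta}{1-\theta-\epsilon}\right)^{1-\theta-\epsilon}\sum_{i=1}^n E(W_i)$$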
Hoeffding shows that
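$$\left(\frac{\theta}{\theta+\epsilon}\right)^{\theta+\epsilon}\cdot\left(\frac{1-\theta}{1-\theta-\epsilon}\right)^{1-\theta-\epsilon} \le e^{-2\epsilon^2}$$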
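As a sanity check (not a proof), Hoeffding's inequality above can be spot-checked numerically on a grid. The function names and the grid resolution below are my own choices for illustration.

```python
import math

def kl_form_bound(theta, eps):
    """(theta/(theta+eps))^(theta+eps) * ((1-theta)/(1-theta-eps))^(1-theta-eps)."""
    t = theta + eps
    return (theta / t) ** t * ((1 - theta) / (1 - t)) ** (1 - t)

def hoeffding_bound(eps):
    """The simpler bound exp(-2 eps^2)."""
    return math.exp(-2 * eps ** 2)

# Grid over theta = i/100 and eps = j/100 with theta + eps <= 0.99 (illustrative resolution).
worst_gap = min(
    hoeffding_bound(j / 100) - kl_form_bound(i / 100, j / 100)
    for i in range(1, 99)
    for j in range(1, 100 - i)
)

# A non-negative gap means the inequality held at every grid point.
print("smallest gap on the grid:", worst_gap)
```

The gap is smallest near $\theta = 1/2$ and small $\epsilon$, which is where the two expressions are closest.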
Courtesy of the OP (thanks, I was getting a bit exhausted...)
$$\sum_{i=1}^n E(W_i) = 1 - \frac{1}{2^n}$$
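This value can be checked by brute force for small $n$. The sketch below assumes the $Y_i$ are i.i.d. Bernoulli(1/2); that is what the value $1 - 1/2^n$ suggests, but it is an assumption on my part here, as is the function name. Since $\sum_i W_i$ equals 1 whenever $\sum_i Y_i \neq 0$ and 0 otherwise, the sum of expectations is just $\Pr(\sum_i Y_i \neq 0)$.

```python
from fractions import Fraction
from itertools import product

def sum_expected_weights(n):
    """Exact sum_i E(W_i), ASSUMING Y_1..Y_n are i.i.d. Bernoulli(1/2)."""
    prob_each = Fraction(1, 2 ** n)        # every 0/1 vector y is equally likely
    total = Fraction(0)
    for y in product((0, 1), repeat=n):
        s = sum(y)
        if s == 0:
            continue                       # all W_i are set to 0 in this case
        total += prob_each * sum(Fraction(y_i, s) for y_i in y)
    return total

for n in range(2, 8):
    print(n, sum_expected_weights(n), 1 - Fraction(1, 2 ** n))
```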
So, finally, the "dependent variables approach" gives us
$$\Pr(Z_n \ge \theta + \epsilon) \le \left(1 - \frac{1}{2^n}\right)e^{-2\epsilon^2} \equiv B_D$$
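To get a feel for how loose this bound is, one can compare it with the exact tail probability for small $n$. The sketch below again assumes the $Y_i$ are i.i.d. Bernoulli(1/2) and independent of the $X_i$ (my reading of the setup, not something stated above). Under that assumption, conditional on $\sum_i Y_i = s \ge 1$, $Z_n$ is the mean of $s$ i.i.d. Bernoulli($\theta$) variables, which is what the code uses. The function names are hypothetical.

```python
import math
from fractions import Fraction

def exact_tail(n, theta, eps):
    """Exact Pr(Z_n >= theta + eps); theta and eps should be Fractions.

    ASSUMES Y_i i.i.d. Bernoulli(1/2), independent of X_i i.i.d. Bernoulli(theta).
    """
    t = theta + eps
    total = Fraction(0)
    for s in range(1, n + 1):                      # s = 0 gives Z_n = 0 < t
        prob_s = Fraction(math.comb(n, s), 2 ** n)  # Pr(sum of Y_i = s)
        tail_s = sum(
            math.comb(s, k) * theta ** k * (1 - theta) ** (s - k)
            for k in range(s + 1)
            if Fraction(k, s) >= t                  # event {mean of s Bernoullis >= t}
        )
        total += prob_s * tail_s
    return total

def dependent_bound(n, eps):
    """B_D = (1 - 2^-n) * exp(-2 eps^2)."""
    return (1 - 2.0 ** -n) * math.exp(-2 * float(eps) ** 2)

theta, eps = Fraction(1, 2), Fraction(1, 4)
for n in range(2, 9):
    print(n, float(exact_tail(n, theta, eps)), dependent_bound(n, eps))
```

Note that $B_D$ barely moves with $n$, while the exact tail keeps shrinking; this is the weakness the comparison with Cardinal's bound makes explicit.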
Let's compare this to Cardinal's bound, $B_I$, which is based on an "independence" transformation. For our bound to be tighter, we need
$$B_D = \left(1 - \frac{1}{2^n}\right)e^{-2\epsilon^2} \le e^{-n\epsilon^2/2} = B_I$$
$$\Rightarrow \frac{2^n - 1}{2^n} \le \exp\left\{\left(\frac{4-n}{2}\right)\epsilon^2\right\}$$
So for $n \le 4$ we have $B_D \le B_I$. For $n \ge 5$, $B_I$ quickly becomes tighter than $B_D$ except for very small $\epsilon$, and even this small "window" shrinks to zero quickly as $n$ grows. For example, for $n = 12$, if $\epsilon \ge 0.008$, then $B_I$ is tighter. So, all in all, Cardinal's bound is more useful.
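The size of that window can be computed directly: solving the last inequality for $\epsilon$ (with $n \ge 5$) gives $\epsilon^2 \le -2\ln(1 - 2^{-n})/(n-4)$. A small sketch, with function names of my own choosing:

```python
import math

def b_dependent(n, eps):
    """B_D = (1 - 2^-n) * exp(-2 eps^2)."""
    return (1 - 2.0 ** -n) * math.exp(-2 * eps ** 2)

def b_independent(n, eps):
    """B_I = exp(-n eps^2 / 2)."""
    return math.exp(-n * eps ** 2 / 2)

def crossover_epsilon(n):
    """Largest eps for which B_D <= B_I; only meaningful for n >= 5."""
    if n <= 4:
        raise ValueError("for n <= 4, B_D <= B_I holds for every eps")
    return math.sqrt(-2 * math.log1p(-(2.0 ** -n)) / (n - 4))

for n in (5, 8, 12, 20):
    print(n, crossover_epsilon(n))
```

For $n = 12$ this gives roughly 0.0078, consistent with the 0.008 figure quoted above.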
COMMENT
To avoid misleading impressions regarding Hoeffding's original paper, I have to mention that Hoeffding examines the case of a deterministic convex combination of dependent random variables. Specifically, his $W_i$'s are numbers, not random variables, and each $X_i$ is a sum of independent random variables, while dependence may exist between the $X_i$'s. He then considers various "U-statistics" that can be represented in this way.