Update: With the benefit of a few years' hindsight, I've penned a more concise treatment of essentially the same material in response to a similar question.
How to Construct a Confidence Region
Let us begin with a general method for constructing confidence regions. It can be applied to a single parameter, to yield a confidence interval or set of intervals; and it can be applied to two or more parameters, to yield higher-dimensional confidence regions.
We assert that the observed statistics $D$ originate from a distribution with parameters $\theta$, namely the sampling distribution $s(d|\theta)$ over possible statistics $d$, and seek a confidence region for $\theta$ in the set of possible values $\Theta$. Define a Highest Density Region (HDR): the $h$-HDR of a PDF is the smallest subset of its domain that supports probability $h$. Denote the $h$-HDR of $s(d|\psi)$ as $H_\psi$, for any $\psi \in \Theta$. Then, the $h$ confidence region for $\theta$, given data $D$, is the set $C_D = \{\phi : D \in H_\phi\}$. A typical value of $h$ would be 0.95.
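The construction can be sketched numerically. The following is a minimal illustration, not a general implementation: it assumes a normal sampling distribution for a sample mean with known standard deviation (so each $h$-HDR is a central interval), and the numbers, the grid, and the function names `hdr_interval` and `confidence_region` are hypothetical.

```python
from statistics import NormalDist


def hdr_interval(psi, sigma, n, h=0.95):
    """h-HDR of s(d|psi), assumed here to be Normal(psi, sigma/sqrt(n)).

    For a unimodal, symmetric PDF the HDR is the central interval.
    """
    z = NormalDist().inv_cdf(0.5 + h / 2)
    half_width = z * sigma / n ** 0.5
    return psi - half_width, psi + half_width


def confidence_region(D, sigma, n, candidates, h=0.95):
    """C_D = {phi : D in H_phi}, evaluated on a finite grid of candidate phi."""
    region = []
    for phi in candidates:
        lo, hi = hdr_interval(phi, sigma, n, h)
        if lo <= D <= hi:
            region.append(phi)
    return region


# Hypothetical data: observed sample mean D from n measurements, known sigma.
D, sigma, n = 10.0, 2.0, 25
grid = [i / 100 for i in range(800, 1201)]  # candidate phi from 8.00 to 12.00
C_D = confidence_region(D, sigma, n, grid)
print(min(C_D), max(C_D))
```

On this grid the region's endpoints should land close to $D \mp 1.96\,\sigma/\sqrt{n}$, which is the familiar 95% interval for a normal mean.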
A Frequentist Interpretation
From the preceding definition of a confidence region follows
$$d \in H_\psi \longleftrightarrow \psi \in C_d$$
with $C_d = \{\phi : d \in H_\phi\}$. Now imagine a large set of (imaginary) observations $\{D_i\}$, taken under similar circumstances to $D$; that is, they are samples from $s(d|\theta)$. Since $H_\theta$ supports probability mass $h$ of the PDF $s(d|\theta)$, $P(D_i \in H_\theta) = h$ for all $i$. Therefore, the fraction of $\{D_i\}$ for which $D_i \in H_\theta$ is $h$. And so, using the equivalence above, the fraction of $\{D_i\}$ for which $\theta \in C_{D_i}$ is also $h$.
This, then, is what the frequentist claim for the $h$ confidence region for $\theta$ amounts to:
Take a large number of imaginary observations $\{D_i\}$ from the sampling distribution $s(d|\theta)$ that gave rise to the observed statistics $D$. Then, $\theta$ lies within a fraction $h$ of the analogous but imaginary confidence regions $\{C_{D_i}\}$.
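The frequentist claim can be checked by simulation. This is a sketch under assumptions that are not part of the argument above: a normal sampling distribution for the sample mean with known standard deviation, a hypothetical true value of $\theta$, and a fixed random seed. The names `covers` and `coverage_fraction` are illustrative.

```python
import random
from statistics import NormalDist


def covers(theta, D, sigma, n, h=0.95):
    """True if theta is in C_D, which is equivalent to D being in H_theta."""
    z = NormalDist().inv_cdf(0.5 + h / 2)
    return abs(D - theta) <= z * sigma / n ** 0.5


def coverage_fraction(theta, sigma, n, trials=20000, h=0.95, seed=0):
    """Fraction of imaginary observations D_i whose region C_{D_i} contains theta."""
    rng = random.Random(seed)
    se = sigma / n ** 0.5  # standard deviation of the sample mean
    hits = 0
    for _ in range(trials):
        D_i = rng.gauss(theta, se)  # one imaginary observation from s(d|theta)
        if covers(theta, D_i, sigma, n, h):
            hits += 1
    return hits / trials


# Hypothetical true parameter and measurement setup.
frac = coverage_fraction(theta=3.0, sigma=2.0, n=25)
print(frac)
```

The printed fraction should sit near $h = 0.95$, up to simulation noise. Note that $\theta$ is held fixed throughout; only the imaginary data vary.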
The confidence region $C_D$ therefore does not make any claim about the probability that $\theta$ lies somewhere! The reason is simply that there is nothing in the formulation that allows us to speak of a probability distribution over $\theta$. The interpretation is just elaborate superstructure, which does not improve the base. The base is only $s(d|\theta)$ and $D$, where $\theta$ does not appear as a distributed quantity, and there is no information we can use to address that. There are basically two ways to get a distribution over $\theta$:
- Assign a distribution directly from the information at hand: $p(\theta|I)$.
- Relate $\theta$ to another distributed quantity: $p(\theta|I) = \int p(\theta x|I)\,dx = \int p(\theta|xI)\,p(x|I)\,dx$.
In both cases, $\theta$ must appear on the left somewhere. Frequentists cannot use either method, because they both require a heretical prior.
A Bayesian View
The most a Bayesian can make of the $h$ confidence region $C_D$, given without qualification, is simply the direct interpretation: that it is the set of $\phi$ for which $D$ falls in the $h$-HDR $H_\phi$ of the sampling distribution $s(d|\phi)$. It does not necessarily tell us much about $\theta$, and here's why.
The probability that $\theta \in C_D$, given $D$ and the background information $I$, is:
$$P(\theta \in C_D | DI) = \int_{C_D} p(\theta|DI)\,d\theta = \int_{C_D} \frac{p(D|\theta I)\,p(\theta|I)}{p(D|I)}\,d\theta$$
Notice that, unlike the frequentist interpretation, we have immediately demanded a distribution over $\theta$. The background information $I$ tells us, as before, that the sampling distribution is $s(d|\theta)$:
$$P(\theta \in C_D | DI) = \int_{C_D} \frac{s(D|\theta)\,p(\theta|I)}{p(D|I)}\,d\theta = \frac{\int_{C_D} s(D|\theta)\,p(\theta|I)\,d\theta}{p(D|I)}$$
i.e.
$$P(\theta \in C_D | DI) = \frac{\int_{C_D} s(D|\theta)\,p(\theta|I)\,d\theta}{\int s(D|\theta)\,p(\theta|I)\,d\theta}$$
Now this expression does not in general evaluate to $h$, which is to say, the $h$ confidence region $C_D$ does not always contain $\theta$ with probability $h$. In fact it can be starkly different from $h$. There are, however, many common situations in which it does evaluate to $h$, which is why confidence regions are often consistent with our probabilistic intuitions.
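To see how far the posterior probability can drift from $h$, here is a small sketch. It assumes a normal sampling distribution for a sample mean with known standard deviation and, purely for illustration, an informative normal prior on $\theta$; neither the prior nor the numbers come from the argument above. With those assumptions the posterior is normal by the standard conjugate update, so the integral over $C_D$ reduces to a difference of CDF values. The name `posterior_prob_in_region` is hypothetical.

```python
from statistics import NormalDist


def posterior_prob_in_region(D, sigma, n, prior_mean, prior_sd, h=0.95):
    """P(theta in C_D | D, I) for a normal likelihood and an assumed normal prior."""
    se = sigma / n ** 0.5
    z = NormalDist().inv_cdf(0.5 + h / 2)
    lo, hi = D - z * se, D + z * se  # the h confidence interval C_D

    # Conjugate normal update: precisions add, means are precision-weighted.
    precision = 1 / prior_sd ** 2 + 1 / se ** 2
    post_mean = (prior_mean / prior_sd ** 2 + D / se ** 2) / precision
    posterior = NormalDist(post_mean, precision ** -0.5)
    return posterior.cdf(hi) - posterior.cdf(lo)


# Hypothetical data and two hypothetical informative priors.
D, sigma, n = 10.0, 2.0, 25
agreeing = posterior_prob_in_region(D, sigma, n, prior_mean=10.0, prior_sd=0.2)
conflicting = posterior_prob_in_region(D, sigma, n, prior_mean=8.0, prior_sd=0.2)
print(agreeing, conflicting)
```

A sharp prior centred on the data pushes the probability well above 0.95, while a sharp prior centred elsewhere drives it close to zero, even though $C_D$ is the same interval in both cases.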
For example, suppose that the prior joint PDF of $d$ and $\theta$ is symmetric in that $p_{d,\theta}(d,\theta|I) = p_{d,\theta}(\theta,d|I)$. (Clearly this involves an assumption that the PDF ranges over the same domain in $d$ and $\theta$.) Then, if the prior is $p(\theta|I) = f(\theta)$, we have $s(D|\theta)\,p(\theta|I) = s(D|\theta)\,f(\theta) = s(\theta|D)\,f(D)$. Hence
$$P(\theta \in C_D | DI) = \frac{\int_{C_D} s(\theta|D)\,d\theta}{\int s(\theta|D)\,d\theta} \quad\text{i.e.}\quad P(\theta \in C_D | DI) = \int_{C_D} s(\theta|D)\,d\theta$$
From the definition of an HDR we know that for any $\psi \in \Theta$
$$\begin{aligned}
\int_{H_\psi} s(d|\psi)\,dd &= h \\
\text{and therefore that}\quad \int_{H_D} s(d|D)\,dd &= h \\
\text{or equivalently}\quad \int_{H_D} s(\theta|D)\,d\theta &= h
\end{aligned}$$
Therefore, given that $s(d|\theta)\,f(\theta) = s(\theta|d)\,f(d)$, $C_D = H_D$ implies $P(\theta \in C_D | DI) = h$. The antecedent satisfies
$$C_D = H_D \longleftrightarrow \forall\psi\,[\psi \in C_D \leftrightarrow \psi \in H_D]$$
Applying the equivalence near the top:
$$C_D = H_D \longleftrightarrow \forall\psi\,[D \in H_\psi \leftrightarrow \psi \in H_D]$$
Thus, the confidence region $C_D$ contains $\theta$ with probability $h$ if for all possible values $\psi$ of $\theta$, the $h$-HDR of $s(d|\psi)$ contains $D$ if and only if the $h$-HDR of $s(d|D)$ contains $\psi$.
Now the symmetric relation $D \in H_\psi \leftrightarrow \psi \in H_D$ is satisfied for all $\psi$ when $s(\psi+\delta|\psi) = s(D-\delta|D)$ for all $\delta$ that span the support of $s(d|D)$ and $s(d|\psi)$. We can therefore form the following argument:
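- $s(d|\theta)\,f(\theta) = s(\theta|d)\,f(d)$ (premise)
- $\forall\psi\,\forall\delta\,[s(\psi+\delta|\psi) = s(D-\delta|D)]$ (premise)
- $\forall\psi\,\forall\delta\,[s(\psi+\delta|\psi) = s(D-\delta|D)] \longrightarrow \forall\psi\,[D \in H_\psi \leftrightarrow \psi \in H_D]$
- $\therefore \forall\psi\,[D \in H_\psi \leftrightarrow \psi \in H_D]$
- $\forall\psi\,[D \in H_\psi \leftrightarrow \psi \in H_D] \longrightarrow C_D = H_D$
- $\therefore C_D = H_D$
- $[s(d|\theta)\,f(\theta) = s(\theta|d)\,f(d) \wedge C_D = H_D] \longrightarrow P(\theta \in C_D | DI) = h$
- $\therefore P(\theta \in C_D | DI) = h$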
Let's apply the argument to a confidence interval on the mean of a 1-D normal distribution $(\mu, \sigma)$, given a sample mean $\bar{x}$ from $n$ measurements. We have $\theta = \mu$ and $d = \bar{x}$, so that the sampling distribution is
$$s(d|\theta) = \frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\, e^{-\frac{n}{2\sigma^2}(d-\theta)^2}$$
Suppose also that we know nothing about $\theta$ before taking the data (except that it's a location parameter) and therefore assign a uniform prior: $f(\theta) = k$. Clearly we now have $s(d|\theta)\,f(\theta) = s(\theta|d)\,f(d)$, so the first premise is satisfied. Let $s(d|\theta) = g((d-\theta)^2)$. (That is, it can be written in that form.) Then
$$\begin{aligned}
s(\psi+\delta|\psi) &= g((\psi+\delta-\psi)^2) = g(\delta^2) \\
\text{and}\quad s(D-\delta|D) &= g((D-\delta-D)^2) = g(\delta^2) \\
\text{so that}\quad &\forall\psi\,\forall\delta\,[s(\psi+\delta|\psi) = s(D-\delta|D)]
\end{aligned}$$
whereupon the second premise is satisfied. Both premises being true, the eight-point argument leads us to conclude that the probability that $\theta$ lies in the confidence interval $C_D$ is $h$!
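This conclusion can be checked numerically by evaluating the ratio of integrals directly, with the uniform prior cancelling between numerator and denominator. The sketch below is only a rough grid approximation under the normal-mean setup just described; the data values, the integration range of 12 standard errors, and the name `flat_prior_probability` are hypothetical choices.

```python
from statistics import NormalDist


def flat_prior_probability(D, sigma, n, h=0.95, steps=20001, width=12.0):
    """Grid estimate of P(theta in C_D | D, I) under a uniform prior on theta.

    Computes sum over C_D of s(D|theta) divided by the sum over the whole grid;
    the constant prior k and the grid spacing cancel in the ratio.
    """
    se = sigma / n ** 0.5
    z = NormalDist().inv_cdf(0.5 + h / 2)
    lo, hi = D - z * se, D + z * se  # the h confidence interval C_D
    kernel = NormalDist(0.0, se)     # s(D|theta) depends only on D - theta

    start = D - width * se
    dtheta = 2 * width * se / (steps - 1)
    numerator = denominator = 0.0
    for i in range(steps):
        theta = start + i * dtheta
        likelihood = kernel.pdf(D - theta)
        denominator += likelihood
        if lo <= theta <= hi:
            numerator += likelihood
    return numerator / denominator


# Hypothetical data: sample mean 10.0 from 25 measurements with sigma = 2.0.
p = flat_prior_probability(10.0, 2.0, 25)
print(p)
```

Up to grid error the result should agree with $h = 0.95$, and it should do so for any observed $D$, since the likelihood depends only on $D - \theta$.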
We therefore have an amusing irony:
- The frequentist who assigns the $h$ confidence interval cannot say that $P(\theta \in C_D) = h$, no matter how innocently uniform $\theta$ looks before incorporating the data.
- The Bayesian who would not assign an $h$ confidence interval in that way knows anyhow that $P(\theta \in C_D | DI) = h$.
Final Remarks
We have identified conditions (i.e. the two premises) under which the $h$ confidence region does indeed yield probability $h$ that $\theta \in C_D$. A frequentist will baulk at the first premise, because it involves a prior on $\theta$, and this sort of deal-breaker is inescapable on the route to a probability. But for a Bayesian, it is acceptable; nay, essential. These conditions are sufficient but not necessary, so there are many other circumstances under which the Bayesian $P(\theta \in C_D | DI)$ equals $h$. Equally though, there are many circumstances in which $P(\theta \in C_D | DI) \neq h$, especially when the prior information is significant.
We have applied a Bayesian analysis just as a consistent Bayesian would, given the information at hand, including statistics $D$. But a Bayesian, if he possibly can, will apply his methods to the raw measurements instead: to the $\{x_i\}$, rather than $\bar{x}$. Oftentimes, collapsing the raw data into summary statistics $D$ destroys information in the data; and then the summary statistics are incapable of speaking as eloquently as the original data about the parameters $\theta$.