From a Bayesian probability perspective, why doesn't a 95% confidence interval contain the true parameter with 95% probability?



From the Wikipedia page on confidence intervals:

... If confidence intervals are constructed across many separate data analyses of repeated (and possibly different) experiments, the proportion of such intervals that contain the true value of the parameter will match the confidence level ...

And from the same page:

A confidence interval does not predict that the true value of the parameter has a particular probability of being in the confidence interval given the data actually obtained.

If I understand correctly, this last statement is made with the frequentist interpretation of probability in mind. However, from a Bayesian probability perspective, why doesn't a 95% confidence interval contain the true parameter with 95% probability? And if it doesn't, what is wrong with the following reasoning?

If I have a process that I know produces a correct answer 95% of the time, then the probability of the next answer being correct is 0.95 (given that I don't have any extra information about the process). Similarly, if someone shows me a confidence interval created by a process that will contain the true parameter 95% of the time, should I not be right in saying that it contains the true parameter with 0.95 probability, given what I know?

This question is similar to, but not the same as, "Why does a 95% CI not imply a 95% chance of containing the mean?". The answers to that question focused on why a 95% CI does not imply a 95% chance of containing the mean from a frequentist perspective. My question is the same, but from a Bayesian probability perspective.


One way to think about this is that the 95% CI is a "long-run average". There are many ways to partition your "short-run" cases to get quite arbitrary coverage — but averaged overall, it comes to 95%. A more abstract way is to generate x_i ~ Bernoulli(p_i) for i = 1, 2, ... such that the p_i average to 0.95. There are infinitely many ways you could do this. Here x_i indicates whether the CI created from the i-th data set contains the parameter, and p_i is the coverage probability for that case.
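A quick simulation makes the "long-run average" point concrete. This is only a sketch with assumed numbers: half the cases have coverage 0.90 and half have 1.00, so the p_i average to 0.95 even though no individual case has 95% coverage.

```python
import numpy as np

rng = np.random.default_rng(0)

# Coverage probabilities p_i that vary case by case but average 0.95:
# half the cases have coverage 0.90, half have 1.00 (assumed numbers).
p = np.tile([0.90, 1.00], 50_000)

# x_i ~ Bernoulli(p_i): does the i-th CI contain the parameter?
x = rng.random(p.size) < p

print(round(p.mean(), 2))   # 0.95: the long-run average coverage
print(round(x.mean(), 2))   # ≈ 0.95 in simulation
```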

Answers:



Update: With the benefit of a few years' hindsight, I have produced a more concise treatment of essentially the same material in answer to a similar question.


How to construct a confidence region

Let's begin with a general method for constructing confidence regions. It can be applied to a single parameter, to yield a confidence interval or set of intervals; and it can be applied to two or more parameters, to yield higher-dimensional confidence regions.

We assert that the observed statistic D originates from a distribution with parameter θ, namely the sampling distribution s(d|θ) over possible statistics d, and we seek a confidence region for θ within the set of possible values Θ. Define a highest density region (HDR): the h-HDR of a PDF is the smallest subset of its domain that supports probability h. Denote the h-HDR of s(d|ψ) as H_ψ, for any ψ ∈ Θ. Then the h confidence region for θ, given data D, is the set C_D = {φ : D ∈ H_φ}. A typical value of h would be 0.95.
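As an illustrative numerical sketch (not part of the original answer), the construction above can be coded directly for a unit-variance normal sampling distribution on a grid; the names `hdr` and `confidence_region` are mine, and all numbers are assumptions.

```python
import numpy as np

def hdr(psi, h=0.95, grid=np.linspace(-10, 10, 4001)):
    """Grid approximation of the h-HDR of s(d|psi) = Normal(psi, 1)."""
    dens = np.exp(-0.5 * (grid - psi) ** 2) / np.sqrt(2 * np.pi)
    order = np.argsort(dens)[::-1]                 # highest density first
    mass = np.cumsum(dens[order]) * (grid[1] - grid[0])
    keep = order[: np.searchsorted(mass, h) + 1]   # smallest set with mass h
    return grid[np.sort(keep)]

def confidence_region(D, h=0.95, params=np.linspace(-5, 5, 1001)):
    """C_D = {phi : D lies in the h-HDR H_phi of s(d|phi)}."""
    return np.array([phi for phi in params
                     if np.any(np.abs(hdr(phi, h) - D) < 1e-2)])

C = confidence_region(D=0.3)
print(round(C.min(), 2), round(C.max(), 2))  # ≈ 0.3 ∓ 1.96 for this normal
```

For a symmetric unit-variance normal, the region recovered is the familiar D ± 1.96.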

Frequentist interpretation

From the preceding definition of a confidence region follows the equivalence

    d ∈ H_ψ ⟺ ψ ∈ C_d

with C_d = {φ : d ∈ H_φ}. Now imagine a large set of (imaginary) observations {D_i}, taken under similar circumstances to D, i.e. they are samples from s(d|θ). Since H_θ supports probability mass h of the PDF s(d|θ), P(D_i ∈ H_θ) = h for all i. Therefore, the fraction of {D_i} for which D_i ∈ H_θ is h. And so, using the equivalence above, the fraction of {D_i} for which θ ∈ C_{D_i} is also h.

This, then, is what the frequentist claim for the h confidence region for θ amounts to:

Take a large number of imaginary observations {D_i} from the sampling distribution s(d|θ) that gave rise to the observed statistic D. Then θ lies within a fraction h of the analogous but imaginary confidence regions {C_{D_i}}.
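This frequentist claim is easy to check by simulation. A minimal sketch, assuming a normal sampling distribution with known σ and the standard z-interval (all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
theta, sigma, n, trials = 2.0, 1.0, 25, 100_000   # assumed values

# Imaginary repeated statistics D_i, i.e. sample means drawn from s(d|theta).
D = rng.normal(theta, sigma / np.sqrt(n), size=trials)

# The 95% region for each imaginary data set: D_i ± 1.96 σ/√n.
half = 1.96 * sigma / np.sqrt(n)
frac = np.mean((D - half <= theta) & (theta <= D + half))

print(round(frac, 2))   # fraction of regions containing theta: ≈ 0.95
```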

The confidence region C_D therefore does not make any claim about the probability that θ lies somewhere! The reason is simply that there is nothing in the formulation that allows us to speak of a probability distribution over θ. The interpretation is just elaborate superstructure, which does not improve the base. The base is only s(d|θ) and D, where θ does not appear as a distributed quantity, and there is no information we can use to address that. There are basically two ways to get a distribution over θ:

  1. Assign a distribution directly from the information at hand: p(θ|I).
  2. Relate θ to another distributed quantity x: p(θ|I) = ∫ p(θx|I) dx = ∫ p(θ|xI) p(x|I) dx.

In both cases, θ must appear on the left somewhere. Frequentists cannot use either method, because they both require a heretical prior.

A Bayesian View

The most a Bayesian can make of the h confidence region C_D, given without qualification, is simply the direct interpretation: that it is the set of φ for which D falls in the h-HDR H_φ of the sampling distribution s(d|φ). It does not necessarily tell us much about θ, and here's why.

The probability that θ ∈ C_D, given D and the background information I, is:

    P(θ ∈ C_D | DI) = ∫_{C_D} p(θ|DI) dθ = ∫_{C_D} [p(D|θI) p(θ|I) / p(D|I)] dθ

Notice that, unlike the frequentist interpretation, we have immediately demanded a distribution over θ. The background information I tells us, as before, that the sampling distribution is s(d|θ):

    P(θ ∈ C_D | DI) = ∫_{C_D} s(D|θ) p(θ|I) dθ / p(D|I)

i.e.

    P(θ ∈ C_D | DI) = ∫_{C_D} s(D|θ) p(θ|I) dθ / ∫ s(D|θ) p(θ|I) dθ
Now this expression does not in general evaluate to h, which is to say, the h confidence region C_D does not always contain θ with probability h. In fact it can be starkly different from h. There are, however, many common situations in which it does evaluate to h, which is why confidence regions are often consistent with our probabilistic intuitions.
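To see the expression depart from h, here is a numerical sketch with made-up numbers: a normal sampling distribution and a deliberately sharp prior, under which the posterior probability of the standard 95% interval comes out well below 0.95 (about 0.79 for these numbers).

```python
import numpy as np

# Observed statistic and its sampling std. dev. (illustrative numbers).
D, sigma_d = 2.0, 1.0

theta = np.linspace(-10, 10, 200_001)
dtheta = theta[1] - theta[0]

# Posterior p(theta|DI) ∝ s(D|theta) p(theta|I) on a grid.
likelihood = np.exp(-0.5 * ((D - theta) / sigma_d) ** 2)
prior = np.exp(-0.5 * (theta / 0.5) ** 2)   # sharp N(0, 0.5²) prior
post = likelihood * prior
post /= post.sum() * dtheta

# The frequentist h = 0.95 region C_D is D ± 1.96 sigma_d.
in_CI = np.abs(theta - D) <= 1.96 * sigma_d
p_in = post[in_CI].sum() * dtheta
print(round(p_in, 2))   # about 0.79 here, not 0.95
```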

For example, suppose that the prior joint PDF of d and θ is symmetric, in that p_{d,θ}(d, θ|I) = p_{d,θ}(θ, d|I). (Clearly this involves an assumption that the PDF ranges over the same domain in d and θ.) Then, if the prior is p(θ|I) = f(θ), we have s(D|θ)p(θ|I) = s(D|θ)f(θ) = s(θ|D)f(D). Hence

    P(θ ∈ C_D | DI) = ∫_{C_D} s(θ|D) dθ / ∫ s(θ|D) dθ

i.e.

    P(θ ∈ C_D | DI) = ∫_{C_D} s(θ|D) dθ
From the definition of an HDR we know that for any ψ ∈ Θ

    ∫_{H_ψ} s(d|ψ) dd = h

and therefore that

    ∫_{H_D} s(d|D) dd = h,   or equivalently   ∫_{H_D} s(θ|D) dθ = h
Therefore, given that s(d|θ)f(θ) = s(θ|d)f(d), C_D = H_D implies P(θ ∈ C_D | DI) = h. The antecedent satisfies

    C_D = H_D ⟺ ∀ψ [ψ ∈ C_D ⟺ ψ ∈ H_D]

Applying the equivalence near the top:

    C_D = H_D ⟺ ∀ψ [D ∈ H_ψ ⟺ ψ ∈ H_D]
Thus, the confidence region C_D contains θ with probability h if, for all possible values ψ of θ, the h-HDR of s(d|ψ) contains D if and only if the h-HDR of s(d|D) contains ψ.

Now the symmetric relation D ∈ H_ψ ⟺ ψ ∈ H_D is satisfied for all ψ when s(ψ+δ|ψ) = s(D−δ|D) for all δ that span the support of s(d|D) and s(d|ψ). We can therefore form the following argument:

  1. s(d|θ)f(θ) = s(θ|d)f(d)   (premise)
  2. ∀ψ ∀δ [s(ψ+δ|ψ) = s(D−δ|D)]   (premise)
  3. ∀ψ ∀δ [s(ψ+δ|ψ) = s(D−δ|D)] ⟹ ∀ψ [D ∈ H_ψ ⟺ ψ ∈ H_D]
  4. ∀ψ [D ∈ H_ψ ⟺ ψ ∈ H_D]
  5. ∀ψ [D ∈ H_ψ ⟺ ψ ∈ H_D] ⟹ C_D = H_D
  6. C_D = H_D
  7. [s(d|θ)f(θ) = s(θ|d)f(d) ∧ C_D = H_D] ⟹ P(θ ∈ C_D | DI) = h
  8. P(θ ∈ C_D | DI) = h

Let's apply the argument to a confidence interval on the mean of a 1-D normal distribution (μ, σ), given a sample mean x̄ from n measurements. We have θ = μ and d = x̄, so that the sampling distribution is

    s(d|θ) = √n/(σ√(2π)) · exp(−n(d−θ)²/(2σ²))
Suppose also that we know nothing about θ before taking the data (except that it is a location parameter) and therefore assign a uniform prior: f(θ) = k. Clearly we now have s(d|θ)f(θ) = s(θ|d)f(d), so the first premise is satisfied. Let s(d|θ) = g((d−θ)²), i.e. it can be written in that form. Then

    s(ψ+δ|ψ) = g((ψ+δ−ψ)²) = g(δ²)   and   s(D−δ|D) = g((D−δ−D)²) = g(δ²)

so that

    ∀ψ ∀δ [s(ψ+δ|ψ) = s(D−δ|D)]
whereupon the second premise is satisfied. Both premises being true, the eight-point argument leads us to conclude that the probability that θ lies in the confidence interval C_D is h!
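The conclusion can be checked numerically. A sketch, assuming known σ and a gridded flat-prior posterior (all numbers illustrative): the central 95% credible interval coincides with the textbook confidence interval.

```python
import numpy as np

rng = np.random.default_rng(2)
mu_true, sigma, n = 5.0, 2.0, 40         # illustrative numbers
xbar = rng.normal(mu_true, sigma, n).mean()
se = sigma / np.sqrt(n)

# Flat prior f(theta) = k  =>  posterior ∝ s(xbar|theta) on a grid.
theta = np.linspace(xbar - 6 * se, xbar + 6 * se, 100_001)
post = np.exp(-0.5 * ((xbar - theta) / se) ** 2)
post /= post.sum()

# Central 95% credible interval from the posterior CDF.
cdf = np.cumsum(post)
lo = theta[np.searchsorted(cdf, 0.025)]
hi = theta[np.searchsorted(cdf, 0.975)]

# Compare with the textbook 95% confidence interval from the same data.
print(round(lo - (xbar - 1.96 * se), 3))   # ≈ 0: the intervals coincide
print(round(hi - (xbar + 1.96 * se), 3))   # ≈ 0
```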

We therefore have an amusing irony:

  1. The frequentist who assigns the h confidence interval cannot say that P(θ ∈ C_D) = h, no matter how innocently uniform θ looks before incorporating the data.
  2. The Bayesian who would not assign an h confidence interval in that way knows anyhow that P(θ ∈ C_D | DI) = h.

Final Remarks

We have identified conditions (i.e. the two premises) under which the h confidence region does indeed yield probability h that θ ∈ C_D. A frequentist will baulk at the first premise, because it involves a prior on θ, and this sort of deal-breaker is inescapable on the route to a probability. But for a Bayesian it is acceptable, nay, essential. These conditions are sufficient but not necessary, so there are many other circumstances under which the Bayesian P(θ ∈ C_D | DI) equals h. Equally though, there are many circumstances in which P(θ ∈ C_D | DI) ≠ h, especially when the prior information is significant.

We have applied a Bayesian analysis just as a consistent Bayesian would, given the information at hand, including the statistic D. But a Bayesian, if he possibly can, will apply his methods to the raw measurements instead: to the {x_i}, rather than x̄. Oftentimes, collapsing the raw data into a summary statistic D destroys information in the data; and then the summary statistic is incapable of speaking as eloquently as the original data about the parameters θ.


Would it be correct to say that a Bayesian is committed to taking all the available information into account, while the interpretation given in the question ignored D in some sense?
qbolec

Is this a good mental picture to illustrate the situation: imagine a grayscale image, where the intensity of pixel (x, y) is the joint probability of the real parameter being y and the observed statistic being x. In each row y, we mark the pixels that hold 95% of the row's mass. For each observed statistic x, we define CI(x) to be the set of rows that have marked pixels in column x. Now, if we choose (x, y) randomly, then CI(x) will contain y iff (x, y) was marked, and the mass of marked pixels is 95% for each y. So frequentists say that keeping y fixed the chance is 95%; the OP says that not fixing y also gives 95%; and Bayesians fix y and don't know.
qbolec

@qbolec It is correct to say that in the Bayesian method one cannot arbitrarily ignore some information while taking account of the rest. Frequentists say that for all y the expectation of y ∈ CI(x) (as a Boolean integer) under the sampling distribution prob(x|y, I) is 0.95. The frequentist 0.95 is not a probability but an expectation.
CarbonFlambe--Reinstate Monica


from a Bayesian probability perspective, why doesn't a 95% confidence interval contain the true parameter with 95% probability?

There are two answers to this, the first being less helpful than the second:

  1. There are no confidence intervals in Bayesian statistics, so the question doesn't pertain.

  2. In Bayesian statistics, there are however credible intervals, which play a similar role to confidence intervals. If you view priors and posteriors in Bayesian statistics as quantifying the reasonable belief that a parameter takes on certain values, then the answer to your question is yes, a 95% credible interval represents an interval within which a parameter is believed to lie with 95% probability.

If I have a process that I know produces a correct answer 95% of the time then the probability of the next answer being correct is 0.95 (given that I don't have any extra information regarding the process).

Yes, the process produces a right answer with 95% probability.

Similarly if someone shows me a confidence interval that is created by a process that will contain the true parameter 95% of the time, should I not be right in saying that it contains the true parameter with 0.95 probability, given what I know?

Just the same as your process, the confidence interval guesses the correct answer with 95% probability. We're back in the world of classical statistics here: before you gather the data you can say there's a 95% probability of randomly gathered data determining the bounds of the confidence interval such that the mean is within the bounds.

With your process, after you've gotten your answer, you can't say based on whatever your guess was, that the true answer is the same as your guess with 95% probability. The guess is either right or wrong.

And just the same as your process, in the confidence interval case, after you've gotten the data and have an actual lower and upper bound, the mean is either within those bounds or it isn't, i.e. the chance of the mean being within those particular bounds is either 1 or 0. (Having skimmed the question you refer to it seems this is covered in much more detail there.)
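The pre-data/post-data distinction can be sketched in a few lines, assuming a normal sampling model with known standard error (values are made up):

```python
import numpy as np

rng = np.random.default_rng(3)
theta, se = 0.0, 1.0   # assumed true parameter and known standard error

# Before gathering data: a statement about the procedure.
D = rng.normal(theta, se, 100_000)
pre_data = np.mean(np.abs(D - theta) <= 1.96 * se)
print(round(pre_data, 2))   # ≈ 0.95

# After gathering data: one realized interval, which either covers or not.
d_obs = rng.normal(theta, se)
covered = abs(d_obs - theta) <= 1.96 * se
print(bool(covered))        # True or False; never "95%"
```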

How to interpret a confidence interval given to you if you subscribe to a Bayesian view of probability.

There are a couple of ways of looking at this

  1. Technically, the confidence interval hasn't been produced using a prior and Bayes theorem, so if you had a prior belief about the parameter concerned, there would be no way you could interpret the confidence interval in the Bayesian framework.

  2. Another widely used and respected interpretation of confidence intervals is that they provide a "plausible range" of values for the parameter (see, e.g., here). This de-emphasises the "repeated experiments" interpretation.

Moreover, under certain circumstances, notably when the prior is uninformative (doesn't tell you anything, e.g. flat), confidence intervals can produce exactly the same interval as a credible interval. In these circumstances, as a Bayesian you could argue that had you taken the Bayesian route you would have gotten exactly the same results, and you could interpret the confidence interval in the same way as a credible interval.


But surely confidence intervals exist even if I subscribe to a Bayesian view of probability; they won't just disappear, right? :) The situation I was asking about was how to interpret a confidence interval given to you if you subscribe to a Bayesian view of probability.
Rasmus Bååth

The problem is that confidence intervals aren't produced using a Bayesian methodology. You don't start with a prior. I'll edit the post to add something which might help.
TooTone


I'll give you an extreme example where they are different.

Suppose I create my 95% confidence interval for a parameter θ as follows. Start by sampling the data. Then generate a random number between 0 and 1. Call this number u. If u is less than 0.95, return the interval (−∞, ∞). Otherwise return the empty ("null") interval.

Now over continued repetitions, 95% of the CIs will be "all numbers" and hence contain the true value. The other 5% contain no values and hence have zero coverage. Overall, this is a useless but technically correct 95% CI.

The Bayesian credible interval will be either 100% or 0%. Not 95%.
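Here is a sketch of that pathological procedure, with an arbitrary θ; the data are irrelevant by construction:

```python
import numpy as np

rng = np.random.default_rng(4)
theta, trials = 3.14, 100_000   # arbitrary true parameter

u = rng.random(trials)
# u < 0.95 -> return (-inf, inf), which always covers theta;
# otherwise -> return the empty interval, which never does.
covers = u < 0.95

print(round(covers.mean(), 2))  # long-run coverage ≈ 0.95: a "valid" 95% CI

# For any single realized interval, though, coverage is known exactly:
print(1 if u[0] < 0.95 else 0)  # 1 or 0, never 0.95
```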


So is it correct to say that before seeing a confidence interval there is a 95% probability that it will contain the true parameter, but for any given confidence interval the probability that it covers the true parameter depends on the data (and our prior)? To be honest, what I'm really struggling with is how useless confidence intervals sound (credible intervals, on the other hand, I like) and the fact that I will nevertheless have to teach them to our students next week... :/
Rasmus Bååth

This question has some more examples, plus a very good paper comparing the two approaches
probabilityislogic


"from a Bayesian probability perspective, why doesn't a 95% confidence interval contain the true parameter with 95% probability? "

In Bayesian statistics the parameter is not an unknown fixed value; it is described by a distribution. There is no interval containing the "true value"; from a Bayesian point of view it does not even make sense. The parameter is a random variable: you can know exactly the probability of its value lying between x_inf and x_max if you know the distribution. It is just a different mindset about parameters; Bayesians usually use the median or mean of the parameter's distribution as an "estimate". There is no confidence interval in Bayesian statistics; the similar concept is called a credible interval.

Now, from a frequentist point of view, the parameter is a fixed value, not a random variable. Can you really obtain a probability interval (a 95% one)? Remember that it is a fixed value, not a random variable with a known distribution. That is why you pasted the text: "A confidence interval does not predict that the true value of the parameter has a particular probability of being in the confidence interval given the data actually obtained."

The idea of repeating the experiment over and over is not Bayesian reasoning; it is frequentist. Imagine a real-life experiment that you can only do once in your lifetime: can you, or should you, build that confidence interval (from the classical point of view)?

But... in real life the results can come out pretty close (Bayesian vs frequentist), and maybe that is why it can be confusing.
