Extreme Value Theory - Show: Normal to Gumbel



The maximum of $X_1,\dots,X_n \sim$ iid standard normals converges to the standard Gumbel distribution according to Extreme Value Theory.

How can we show that?

We have

$$P(\max X_i \le x) = P(X_1 \le x, \dots, X_n \le x) = P(X_1 \le x)\cdots P(X_n \le x) = F(x)^n$$
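The product formula can be checked by simulation. The following is a minimal sketch using only Python's standard library; the helper name `empirical_max_cdf`, the sample size, the evaluation point, and the seed are illustrative choices, not part of the question.

```python
import random
from statistics import NormalDist

def empirical_max_cdf(n, x, trials=50_000, seed=0):
    """Fraction of simulated samples of size n whose maximum is <= x."""
    rng = random.Random(seed)
    hits = sum(
        max(rng.gauss(0.0, 1.0) for _ in range(n)) <= x
        for _ in range(trials)
    )
    return hits / trials

n, x = 5, 1.0
exact = NormalDist().cdf(x) ** n       # F(x)^n from the product formula
estimate = empirical_max_cdf(n, x)     # Monte Carlo estimate of P(max X_i <= x)
print(f"F(x)^n = {exact:.4f}, simulated = {estimate:.4f}")
```

The two numbers should agree up to Monte Carlo error, which shrinks as `trials` grows.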

We need to find/choose sequences of constants $a_n>0$, $b_n\in\mathbb{R}$ such that:

$$F(a_n x+b_n)^n \xrightarrow{n\to\infty} G(x) = e^{-\exp(-x)}$$

Can you solve it or find it in literature?

There are some examples pg.6/71, but not for the Normal case:

$$\Phi(a_n x+b_n)^n = \left(\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{a_n x+b_n} e^{-\frac{y^2}{2}}\,dy\right)^n \to e^{-\exp(-x)}$$

Answers:



An indirect way is as follows:
For absolutely continuous distributions, Richard von Mises (in a 1936 paper, "La distribution de la plus grande de n valeurs", which appears to have been reproduced, possibly in English, in a 1964 edition of his selected papers) provided the following sufficient condition for the maximum of a sample to converge to the standard Gumbel, G(x):

Let F(x) be the common distribution function of n i.i.d. random variables, and f(x) their common density. Then, if

$$\lim_{x\to F^{-1}(1)}\left(\frac{d}{dx}\frac{(1-F(x))}{f(x)}\right) = 0 \;\Rightarrow\; X_{(n)} \xrightarrow{d} G(x)$$

Using the usual notation for the standard normal and calculating the derivative, we have

$$\frac{d}{dx}\frac{(1-\Phi(x))}{\phi(x)} = \frac{-\phi(x)^2-\phi'(x)(1-\Phi(x))}{\phi(x)^2} = \frac{-\phi'(x)}{\phi(x)}\frac{(1-\Phi(x))}{\phi(x)} - 1$$

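Note that $\frac{-\phi'(x)}{\phi(x)} = x$. Also, for the normal distribution, $F^{-1}(1) = \infty$. So we have to evaluate the limit

$$\lim_{x\to\infty}\left(x\frac{(1-\Phi(x))}{\phi(x)} - 1\right)$$

But $\frac{(1-\Phi(x))}{\phi(x)}$ is Mills' ratio, and we know that Mills' ratio for the standard normal behaves like $1/x$ as $x$ grows. So

$$\lim_{x\to\infty}\left(x\frac{(1-\Phi(x))}{\phi(x)} - 1\right) = x\frac{1}{x} - 1 = 0$$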

and the sufficient condition is satisfied.
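As a numerical sanity check of this limit, the sketch below evaluates $x(1-\Phi(x))/\phi(x) - 1$ for increasing $x$. It uses only Python's `math` module; the function names `mills_ratio` and `von_mises_term` are hypothetical, and the upper tail is computed through `erfc` to avoid cancellation at large $x$.

```python
import math

def mills_ratio(x):
    """(1 - Phi(x)) / phi(x) for the standard normal, with the tail taken from erfc."""
    tail = 0.5 * math.erfc(x / math.sqrt(2.0))
    density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return tail / density

def von_mises_term(x):
    """x * (1 - Phi(x)) / phi(x) - 1, the expression whose limit is evaluated above."""
    return x * mills_ratio(x) - 1.0

for x in (1.0, 2.0, 5.0, 10.0, 20.0):
    print(f"x = {x:5.1f}   term = {von_mises_term(x): .6f}")
```

The printed values are negative and shrink toward 0 as $x$ grows, in line with the limit computed above.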

The associated series are given as

$$a_n = \frac{1}{n\phi(b_n)},\qquad b_n = \Phi^{-1}(1-1/n)$$
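To see these constants at work, here is a rough sketch that compares $\Phi(a_n x+b_n)^n$ with $e^{-\exp(-x)}$ on a small grid of $x$ values. It relies on `statistics.NormalDist` from Python's standard library; the helper names and the grid are illustrative assumptions, and the power is evaluated through the upper tail with `log1p` for numerical stability.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def norming_constants(n):
    """b_n = Phi^{-1}(1 - 1/n) and a_n = 1 / (n * phi(b_n))."""
    b = STD.inv_cdf(1.0 - 1.0 / n)
    a = 1.0 / (n * STD.pdf(b))
    return a, b

def max_cdf(n, x):
    """Phi(a_n x + b_n)^n, computed as exp(n * log(1 - tail))."""
    a, b = norming_constants(n)
    tail = 0.5 * math.erfc((a * x + b) / math.sqrt(2.0))
    return math.exp(n * math.log1p(-tail))

def gumbel_cdf(x):
    """Standard Gumbel CDF exp(-exp(-x))."""
    return math.exp(-math.exp(-x))

GRID = [k / 4 for k in range(-8, 17)]   # x from -2 to 4 in steps of 0.25

def sup_error(n):
    """Largest absolute difference between the two CDFs on the grid."""
    return max(abs(max_cdf(n, x) - gumbel_cdf(x)) for x in GRID)

for n in (10**2, 10**4, 10**6):
    print(f"n = {n:>8}   max |error| on grid = {sup_error(n):.4f}")
```

The error shrinks as $n$ grows, but only slowly, so very large samples are needed before the Gumbel approximation becomes tight.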

ADDENDUM

This is from ch. 10.5 of the book H.A. David & H.N. Nagaraja (2003), "Order Statistics" (3d edition).

Here $\xi_a = F^{-1}(a)$. Also, the reference to de Haan is "Haan, L. D. (1976). Sample extremes: an elementary introduction. Statistica Neerlandica, 30(4), 161-172." But beware, because some of the notation means something different in de Haan: for example, in the book $f(t)$ is the probability density function, while in de Haan $f(t)$ means the function $w(t)$ of the book (i.e. Mills' ratio). Also, de Haan examines the sufficient condition already differentiated.

[Image: excerpt from the book, not reproduced here]


I'm not quite sure I understood your solution. So you took $F$ to be the standard normal CDF. I followed through and agree that the sufficient condition is satisfied. But how are the associated series $a_n$ and $b_n$ all of a sudden given by those?
renrenthehamster

@renrenthehamster I think these two parts are independently stated (no direct connection).
emcor

And so how might the associated series be obtained? Anyway, I opened a question about this issue (and more generally, for other distributions beyond the standard normal)
renrenthehamster

@renrenthehamster I have added relevant material. I don't believe there is a standard recipe for all cases, to find these series.
Alecos Papadopoulos


The question asks two things: (1) how to show that the maximum $X_{(n)}$ converges, in the sense that $(X_{(n)}-b_n)/a_n$ converges (in distribution) for suitably chosen sequences $(a_n)$ and $(b_n)$, to the Standard Gumbel distribution and (2) how to find such sequences.

The first is well-known and documented in the original papers on the Fisher-Tippett-Gnedenko theorem (FTG). The second appears to be more difficult; that is the issue addressed here.

Please note, to clarify some assertions appearing elsewhere in this thread, that

  1. The maximum does not converge to anything: it diverges (albeit extremely slowly).

  2. There appear to be different conventions concerning the Gumbel distribution. I will adopt the convention that the CDF of a reversed Gumbel distribution is, up to scale and location, given by $1-\exp(-\exp(x))$. A suitably standardized maximum of iid Normal variates converges to a reversed Gumbel distribution.


Intuition

When the $X_i$ are iid with common distribution function $F$, the distribution of the maximum $X_{(n)}$ is

$$F_n(x) = \Pr(X_{(n)}\le x) = \Pr(X_1\le x)\Pr(X_2\le x)\cdots\Pr(X_n\le x) = F^n(x).$$

When the support of $F$ has no upper bound, as with a Normal distribution, the sequence of functions $F_n$ marches forever to the right without limit:

Figure 1

Partial graphs of $F_n$ for $n=1,2,2^2,2^4,2^8,2^{16}$ are shown.
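The rightward march can also be seen numerically: at any fixed $x$, the value $F^n(x)$ decreases toward 0 as $n$ grows. The sketch below illustrates this for the standard Normal at the arbitrarily chosen point $x=3$, using the same values of $n$ as the figure; the function name `max_cdf_at` is hypothetical.

```python
from statistics import NormalDist

def max_cdf_at(x, n):
    """F_n(x) = F(x)^n for the standard normal F."""
    return NormalDist().cdf(x) ** n

x = 3.0
sizes = [1, 2, 2**2, 2**4, 2**8, 2**16]
values = [max_cdf_at(x, n) for n in sizes]
for n, v in zip(sizes, values):
    print(f"n = {n:>6}   F_n({x}) = {v:.3e}")
```

Because $F(3)$ is very close to 1, it takes a large $n$ before $F_n(3)$ becomes small, which is one way to see how slowly the maximum drifts to the right.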

To study the shapes of these distributions, we can shift each one back to the left by some amount $b_n$ and rescale it by $a_n$ to make them comparable.

Figure 2

Each of the previous graphs has been shifted to place its median at 0 and to make its interquartile range of unit length.

FTG asserts that sequences $(a_n)$ and $(b_n)$ can be chosen so that these distribution functions converge pointwise at every $x$ to some extreme value distribution, up to scale and location. When $F$ is a Normal distribution, the particular limiting extreme value distribution is a reversed Gumbel, up to location and scale.


Solution

It is tempting to emulate the Central Limit Theorem by standardizing $F_n$ to have zero mean and unit variance. This is inappropriate, though, in part because FTG applies even to (continuous) distributions that have no first or second moments. Instead, use a percentile (such as the median) to determine the location and a difference of percentiles (such as the IQR) to determine the spread. (This general approach should succeed in finding $a_n$ and $b_n$ for any continuous distribution.)

For the standard Normal distribution, this turns out to be easy! Let $0<q<1$. A quantile of $F_n$ corresponding to $q$ is any value $x_q$ for which $F_n(x_q)=q$. Recalling the definition of $F_n(x)=F^n(x)$, the solution is

$$x_{q;n} = F^{-1}(q^{1/n}).$$

Therefore we may set

$$b_n = x_{1/2;n},\quad a_n = x_{3/4;n}-x_{1/4;n};\quad G_n(x) = F_n(a_n x+b_n).$$
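A minimal sketch of this construction for the standard Normal, using `statistics.NormalDist` from Python's standard library. The function names (`quantile_of_max`, `sequences`, `G`) are illustrative and not from the answer.

```python
from statistics import NormalDist

STD = NormalDist()

def quantile_of_max(q, n):
    """x_{q;n} = F^{-1}(q^{1/n}) for the standard normal F."""
    return STD.inv_cdf(q ** (1.0 / n))

def sequences(n):
    """b_n is the median of F_n and a_n is its interquartile range."""
    b = quantile_of_max(0.5, n)
    a = quantile_of_max(0.75, n) - quantile_of_max(0.25, n)
    return a, b

def G(n, x):
    """G_n(x) = F_n(a_n x + b_n) = F(a_n x + b_n)^n."""
    a, b = sequences(n)
    return STD.cdf(a * x + b) ** n

for n in (2, 2**6, 2**11, 2**16):
    a, b = sequences(n)
    print(f"n = {n:>6}   a_n = {a:.4f}   b_n = {b:.4f}   G_n(0) = {G(n, 0.0):.4f}")
```

Every row should report $G_n(0)=0.5$ up to rounding, since 0 is the median of $G_n$ by construction.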

Because, by construction, the median of $G_n$ is $0$ and its IQR is $1$, the median of the limiting value of $G_n$ (which is some version of a reversed Gumbel) must be $0$ and its IQR must be $1$. Let the scale parameter be $\beta$ and the location parameter be $\alpha$. Since the median is $\alpha+\beta\log\log(2)$ and the IQR is readily found to be $\beta(\log\log(4)-\log\log(4/3))$, the parameters must be

$$\alpha = \frac{\log\log 2}{\log\log(4/3)-\log\log(4)};\quad \beta = \frac{1}{\log\log(4)-\log\log(4/3)}.$$
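The arithmetic behind these two parameters can be verified directly. The sketch below takes the CDF convention adopted in this answer, $1-\exp(-\exp((x-\alpha)/\beta))$, inverts it to obtain quantiles, and confirms that the stated $\alpha$ and $\beta$ give median 0 and IQR 1; the helper names `loglog` and `quantile` are hypothetical.

```python
import math

def loglog(t):
    """log(log(t)), defined for t > 1."""
    return math.log(math.log(t))

beta = 1.0 / (loglog(4.0) - loglog(4.0 / 3.0))
alpha = loglog(2.0) / (loglog(4.0 / 3.0) - loglog(4.0))

def quantile(q):
    """Quantile of the CDF 1 - exp(-exp((x - alpha) / beta))."""
    return alpha + beta * math.log(-math.log(1.0 - q))

median = quantile(0.5)
iqr = quantile(0.75) - quantile(0.25)
print(f"alpha = {alpha:.6f}, beta = {beta:.6f}")
print(f"median = {median:.2e}, IQR = {iqr:.6f}")
```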

It is not necessary for $a_n$ and $b_n$ to be exactly these values: they need only approximate them, provided the limit of $G_n$ is still this reversed Gumbel distribution. Straightforward (but tedious) analysis for a standard normal $F$ indicates that the approximations

$$a_n = \frac{\log\left(\left(4\log^2(2)\right)/\left(\log^2\left(\frac{4}{3}\right)\right)\right)}{2\sqrt{2\log(n)}},\quad b_n = \sqrt{2\log(n)} - \frac{\log(\log(n))+\log\left(4\pi\log^2(2)\right)}{2\sqrt{2\log(n)}}$$

will work fine (and are as simple as possible).
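One way to gauge how good these approximations are is to standardize the exact quantiles $F^{-1}(q^{1/n})$ of the maximum with the approximate $a_n$ and $b_n$, then see how close the resulting median and IQR are to 0 and 1. This is a rough sketch using Python's standard library; the function names are illustrative assumptions.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def approx_sequences(n):
    """Closed-form approximations to a_n and b_n for the standard normal."""
    root = math.sqrt(2.0 * math.log(n))
    log2_sq = math.log(2.0) ** 2
    a = math.log(4.0 * log2_sq / math.log(4.0 / 3.0) ** 2) / (2.0 * root)
    b = root - (math.log(math.log(n)) + math.log(4.0 * math.pi * log2_sq)) / (2.0 * root)
    return a, b

def standardized_quantile(q, n):
    """Quantile q of G_n, i.e. (x_{q;n} - b_n) / a_n with the approximate sequences."""
    a, b = approx_sequences(n)
    return (STD.inv_cdf(q ** (1.0 / n)) - b) / a

for n in (2**6, 2**11, 2**16):
    med = standardized_quantile(0.5, n)
    iqr = standardized_quantile(0.75, n) - standardized_quantile(0.25, n)
    print(f"n = {n:>6}   median of G_n = {med:+.4f}   IQR of G_n = {iqr:.4f}")
```

The median and IQR are close to, but not exactly, 0 and 1 at finite $n$, which is all that is required of approximate sequences.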

Figure 3

The light blue curves are partial graphs of $G_n$ for $n=2,2^6,2^{11},2^{16}$ using the approximate sequences $a_n$ and $b_n$. The dark red line graphs the reversed Gumbel distribution with parameters $\alpha$ and $\beta$. The convergence is clear (although the rate of convergence for negative $x$ is noticeably slower).


References

B. V. Gnedenko, On The Limiting Distribution of the Maximum Term in a Random Series. In Kotz and Johnson, Breakthroughs in Statistics Volume I: Foundations and Basic Theory, Springer, 1992. Translated by Norman Johnson.


@Vossler The formula in Alecos's post for $a_n$ converges to $0$ as $n\to\infty$. It behaves like $(2\log(n)-\log(2\pi))^{-1/2}$ for large $n$.
whuber

Yes, that's true, I realized this shortly after I posted my comment so I deleted it immediately. Thank you!
Vossler

@Jess I had hoped that this answer would be understood as showing, among other things, that there is no such thing as "the" formula: there are uncountably many correct formulas for the $a_n$ and $b_n$.
whuber

@Jess That's better, because demonstrating an alternative approach was the motivation to write this answer. I don't understand your insinuation that I considered it "useless to write down an answer," because that's explicitly what I have done here.
whuber

@Jess I cannot continue this conversation because it's entirely one-sided: I have yet to recognize anything I have written in any of your characterizations. I'm quitting while I'm behind.
whuber
Licensed under cc by-sa 3.0 with attribution required.