Extreme Value Theory - Show: Normal to Gumbel

The maximum of $X_1,\dots,X_n \sim$ iid standard normals converges to the standard Gumbel distribution according to extreme value theory.

How can we show that?

We have

$P(\max X_i \leq x) = P(X_1 \leq x, \dots, X_n \leq x) = P(X_1 \leq x) \cdots P(X_n \leq x) = F(x)^n$
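The product formula can be checked by simulation. The following is a minimal sketch using only Python's standard library; the helper names `empirical_max_cdf` and `exact_max_cdf`, and the choice $n=10$, $x=1.5$, are illustrative assumptions and not taken from the question.

```python
import random
from statistics import NormalDist

def empirical_max_cdf(n, x, trials=20000, seed=0):
    """Fraction of simulated samples of size n whose maximum is <= x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if max(rng.gauss(0.0, 1.0) for _ in range(n)) <= x:
            hits += 1
    return hits / trials

def exact_max_cdf(n, x):
    """P(max <= x) = F(x)^n for iid standard normals."""
    return NormalDist().cdf(x) ** n

n, x = 10, 1.5
print(empirical_max_cdf(n, x), exact_max_cdf(n, x))
```

The two printed numbers should agree up to Monte Carlo error, which shrinks like one over the square root of `trials`.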

We need to find sequences of constants $a_n>0$ and $b_n\in\mathbb{R}$ such that:

$F\left(a_n x+b_n\right)^n\rightarrow^{n\rightarrow\infty} G(x) = e^{-\exp(-x)}$

Can you solve it or find it in the literature?

There are some examples on pg. 6/71, but not for the Normal case:

$\Phi\left(a_n x+b_n\right)^n=\left(\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{a_n x+b_n} e^{-\frac{y^2}{2}}dy\right)^n\rightarrow e^{-\exp(-x)}$

— emcor

Answers:

An indirect way is as follows:
For absolutely continuous distributions, Richard von Mises (in a 1936 paper, "La distribution de la plus grande de n valeurs", which appears to have been reproduced, possibly in English, in a 1964 edition of his selected papers) provided the following sufficient condition for the maximum of a sample to converge to the standard Gumbel, $G(x)$:

Let $F(x)$ be the common distribution function of $n$ i.i.d. random variables, and $f(x)$ their common density. Then, if

$\lim_{x\rightarrow F^{-1}(1)}\left (\frac d{dx}\frac {(1-F(x))}{f(x)}\right) =0 \Rightarrow X_{(n)} \xrightarrow{d} G(x)$

Using the usual notation for the standard normal and calculating the derivative, we have

$\frac d{dx}\frac {(1-\Phi(x))}{\phi(x)} = \frac {-\phi(x)^2-\phi'(x)(1-\Phi(x))}{\phi(x)^2} = \frac {-\phi'(x)}{\phi(x)}\frac {(1-\Phi(x))}{\phi(x)}-1$

Note that $\frac {-\phi'(x)}{\phi(x)} =x$. Also, for the normal distribution, $F^{-1}(1) = \infty$. So we have to evaluate the limit

$\lim_{x\rightarrow \infty}\left (x\frac {(1-\Phi(x))}{\phi(x)}-1\right)$

But $\frac {(1-\Phi(x))}{\phi(x)}$ is the Mills ratio, and we know that the Mills ratio for the standard normal behaves like $1/x$ as $x$ grows, so that $x\frac {(1-\Phi(x))}{\phi(x)} \to 1$. So

$\lim_{x\rightarrow \infty}\left (x\frac {(1-\Phi(x))}{\phi(x)}-1\right) = 1-1= 0$

and the sufficient condition is satisfied.
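The limit can also be inspected numerically. This is a minimal sketch using only Python's standard library; the function names `mills_ratio` and `von_mises_term` are illustrative. The upper tail is computed with `math.erfc` rather than as `1 - cdf(x)` to avoid cancellation for large $x$.

```python
import math

def mills_ratio(x):
    """(1 - Phi(x)) / phi(x) for the standard normal, via erfc for tail accuracy."""
    upper_tail = 0.5 * math.erfc(x / math.sqrt(2.0))
    density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return upper_tail / density

def von_mises_term(x):
    """x * (1 - Phi(x)) / phi(x) - 1, the quantity whose limit is taken above."""
    return x * mills_ratio(x) - 1.0

for x in (1.0, 5.0, 10.0, 20.0):
    print(x, von_mises_term(x))
```

The printed values are negative and shrink toward zero roughly like $-1/x^2$, which is consistent with the sufficient condition being met.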

The associated sequences of norming constants are given by

$a_n = \frac 1{n\phi(b_n)},\;\;\; b_n = \Phi^{-1}(1-1/n)$
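To see these constants at work, one can compare $\Phi(a_n x+b_n)^n$ with $G(x)=e^{-\exp(-x)}$ on a grid. The sketch below uses only Python's standard library; the function names and the grid $x\in[-2,5]$ are illustrative assumptions. It requires $n \ge 2$, and it evaluates the power through the upper tail so that large $n$ does not lose precision.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def norming_constants(n):
    """b_n = Phi^{-1}(1 - 1/n) and a_n = 1 / (n * phi(b_n)), for n >= 2."""
    b_n = -STD.inv_cdf(1.0 / n)   # symmetry: Phi^{-1}(1 - p) = -Phi^{-1}(p)
    a_n = 1.0 / (n * STD.pdf(b_n))
    return a_n, b_n

def normalized_max_cdf(x, n):
    """Phi(a_n x + b_n)^n, evaluated as exp(n * log(1 - upper tail))."""
    a_n, b_n = norming_constants(n)
    upper_tail = 0.5 * math.erfc((a_n * x + b_n) / math.sqrt(2.0))
    return math.exp(n * math.log1p(-upper_tail))

def gumbel_cdf(x):
    """Standard Gumbel CDF exp(-exp(-x)), as in the question."""
    return math.exp(-math.exp(-x))

def max_abs_error(n):
    """Largest gap between the two CDFs on the grid x = -2.0, -1.9, ..., 5.0."""
    grid = [k / 10 for k in range(-20, 51)]
    return max(abs(normalized_max_cdf(x, n) - gumbel_cdf(x)) for x in grid)

for n in (10, 10**3, 10**6, 10**9):
    print(n, max_abs_error(n))
```

The gap decreases as $n$ grows, but only slowly, which matches the well-known slow (logarithmic) rate of convergence for normal maxima.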

ADDENDUM

This is from ch. 10.5 of the book H.A. David & H.N. Nagaraja (2003), "Order Statistics" (3rd edition).

Here $\xi_a = F^{-1}(a)$. Also, the reference to de Haan is "Haan, L. D. (1976). Sample extremes: an elementary introduction. Statistica Neerlandica, 30(4), 161-172." But beware, because some of the notation has different content in de Haan: for example, in the book $f(t)$ is the probability density function, while in de Haan $f(t)$ means the function $w(t)$ of the book (i.e. the Mills ratio). Also, de Haan examines the sufficient condition already differentiated.

[Image: excerpt from ch. 10.5 of David & Nagaraja (2003)]

— Alecos Papadopoulos

I'm not quite sure I understood your solution. So you took $F$ to be the standard normal CDF. I followed through and agree that the sufficient condition is satisfied. But how are the associated sequences $a_n$ and $b_n$ all of a sudden given by those?

— renrenthehamster

@renrenthehamster I think these two parts are independently stated (no direct connection).

— emcor

And so how might the associated series be obtained? Anyway, I opened a question about this issue (and more generally, for other distributions beyond the standard normal)

— renrenthehamster

@renrenthehamster I have added relevant material. I don't believe there is a standard recipe for finding these sequences in all cases.

— Alecos Papadopoulos

The question asks two things: (1) how to show that the maximum $X_{(n)}$ converges, in the sense that $(X_{(n)}-b_n)/a_n$ converges (in distribution) for suitably chosen sequences $(a_n)$ and $(b_n)$, to the standard Gumbel distribution, and (2) how to find such sequences.

The first is well-known and documented in the original papers on the Fisher-Tippett-Gnedenko theorem (FTG). The second appears to be more difficult; that is the issue addressed here.

Please note, to clarify some assertions appearing elsewhere in this thread, that

- The maximum does not converge to anything: it diverges (albeit extremely slowly).
- There appear to be different conventions concerning the Gumbel distribution. I will adopt the convention that the CDF of a reversed Gumbel distribution is, up to scale and location, given by $1-\exp(-\exp(x))$. A suitably standardized maximum of iid Normal variates converges to a reversed Gumbel distribution.

Intuition

When the $X_i$ are iid with common distribution function $F$, the distribution of the maximum $X_{(n)}$ is

$F_n(x) = \Pr(X_{(n)}\le x) = \Pr(X_1 \le x)\Pr(X_2 \le x) \cdots \Pr(X_n \le x) = F^n(x).$

When the support of $F$ has no upper bound, as with a Normal distribution, the sequence of functions $F^n$ marches forever to the right without limit:

Partial graphs of $F_n$ for $n=1,2,2^2, 2^4, 2^8, 2^{16}$ are shown.
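The drift to the right can be made concrete by tabulating the median of $F_n$ for the same values of $n$. This is a minimal sketch using only Python's standard library; the helper name `max_quantile` is illustrative. It solves $\Phi(x)^n = q$ through the upper tail $1 - q^{1/n}$, which stays accurate when $q^{1/n}$ is very close to $1$.

```python
import math
from statistics import NormalDist

def max_quantile(q, n):
    """Quantile x with Phi(x)^n = q, i.e. x = Phi^{-1}(q^(1/n))."""
    upper_tail = -math.expm1(math.log(q) / n)   # equals 1 - q^(1/n)
    return -NormalDist().inv_cdf(upper_tail)    # symmetry of the normal

for k in (0, 1, 2, 4, 8, 16):
    n = 2 ** k
    print(n, max_quantile(0.5, n))
```

The medians keep increasing with $n$, but very slowly: going from $n=2^8$ to $n=2^{16}$ moves the median by less than two units.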

To study the shapes of these distributions, we can shift each one back to the left by some amount $b_n$ and rescale it by $a_n$ to make them comparable.

Each of the previous graphs has been shifted to place its median at $0$ and to make its interquartile range of unit length.

FTG asserts that sequences $(a_n)$ and $(b_n)$ can be chosen so that these distribution functions converge pointwise at every $x$ to some extreme value distribution, up to scale and location. When $F$ is a Normal distribution, the particular limiting extreme value distribution is a reversed Gumbel, up to location and scale.

Solution

It is tempting to emulate the Central Limit Theorem by standardizing $F_n$ to have zero mean and unit variance. This is inappropriate, though, in part because FTG applies even to (continuous) distributions that have no first or second moments. Instead, use a percentile (such as the median) to determine the location and a difference of percentiles (such as the IQR) to determine the spread. (This general approach should succeed in finding $a_n$ and $b_n$ for any continuous distribution.)

For the standard Normal distribution, this turns out to be easy! Let $0 \lt q \lt 1$. A quantile of $F_n$ corresponding to $q$ is any value $x_q$ for which $F_n(x_q) = q$. Recalling the definition $F_n(x) = F^n(x)$, the solution is

$x_{q;n} = F^{-1}(q^{1/n}).$

Therefore we may set

$b_n = x_{1/2;n},\ a_n = x_{3/4;n} - x_{1/4;n};\ G_n(x) = F_n(a_n x + b_n).$
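The construction above can be sketched in a few lines. The code below uses only Python's standard library; the names `max_quantile`, `quantile_norming` and `G_n` are illustrative, and $n=2^{16}$ is an arbitrary choice. It computes $b_n$ and $a_n$ from the quantiles of $F_n$ and confirms that the resulting $G_n$ has median $0$ and interquartile range $1$.

```python
import math
from statistics import NormalDist

def max_quantile(q, n):
    """x_{q;n} = Phi^{-1}(q^(1/n)), computed via the upper tail 1 - q^(1/n)."""
    upper_tail = -math.expm1(math.log(q) / n)
    return -NormalDist().inv_cdf(upper_tail)

def quantile_norming(n):
    """b_n is the median of F_n, a_n is the interquartile range of F_n."""
    b_n = max_quantile(0.5, n)
    a_n = max_quantile(0.75, n) - max_quantile(0.25, n)
    return a_n, b_n

def G_n(x, n):
    """G_n(x) = Phi(a_n x + b_n)^n, using erfc and log1p for tail accuracy."""
    a_n, b_n = quantile_norming(n)
    upper_tail = 0.5 * math.erfc((a_n * x + b_n) / math.sqrt(2.0))
    return math.exp(n * math.log1p(-upper_tail))

n = 2 ** 16
a_n, b_n = quantile_norming(n)
lower = (max_quantile(0.25, n) - b_n) / a_n   # first quartile of G_n
upper = (max_quantile(0.75, n) - b_n) / a_n   # third quartile of G_n
print(G_n(0.0, n), G_n(lower, n), G_n(upper, n), upper - lower)
```

Up to rounding, the output should be `0.5`, `0.25`, `0.75` and `1.0`, for any $n$, because the standardization is exact by construction.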

Because, by construction, the median of $G_n$ is $0$ and its IQR is $1$, the median of the limiting value of $G_n$ (which is some version of a reversed Gumbel) must be $0$ and its IQR must be $1$. Let the scale parameter be $\beta$ and the location parameter be $\alpha$. Since the median is $\alpha + \beta \log\log(2)$ and the IQR is readily found to be $\beta(\log\log(4) - \log\log(4/3))$, the parameters must be

$\alpha = \frac{\log\log 2}{\log\log(4/3) - \log\log(4)};\ \beta = \frac{1}{\log\log(4) - \log\log(4/3)}.$
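These two parameter values are easy to evaluate and check. The sketch below uses only Python's standard library and assumes the convention adopted in this answer, written with location and scale as the CDF $1-\exp(-\exp((x-\alpha)/\beta))$; that explicit parametrization and the helper names are assumptions made for illustration.

```python
import math

def loglog(t):
    """log(log(t)) for t > 1."""
    return math.log(math.log(t))

spread = loglog(4.0) - loglog(4.0 / 3.0)
beta = 1.0 / spread
alpha = -beta * loglog(2.0)   # same value as loglog(2) / (loglog(4/3) - loglog(4))

def reversed_gumbel_quantile(q, alpha, beta):
    """Quantile of the CDF 1 - exp(-exp((x - alpha) / beta))."""
    return alpha + beta * math.log(-math.log(1.0 - q))

median = reversed_gumbel_quantile(0.5, alpha, beta)
iqr = reversed_gumbel_quantile(0.75, alpha, beta) - reversed_gumbel_quantile(0.25, alpha, beta)
print(alpha, beta, median, iqr)
```

Numerically, $\alpha \approx 0.2331$ and $\beta \approx 0.6359$, and the computed median and IQR come out as $0$ and $1$ up to rounding.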

It is not necessary for $a_n$ and $b_n$ to be exactly these values: they need only approximate them, provided the limit of $G_n$ is still this reversed Gumbel distribution. Straightforward (but tedious) analysis for a standard normal $F$ indicates that the approximations

$a_n^\prime = \frac{\log \left(\left(4 \log^2(2)\right)/\left(\log^2\left(\frac{4}{3}\right)\right)\right)}{2\sqrt{2\log (n)}},\ b_n^\prime = \sqrt{2\log (n)}-\frac{\log (\log (n))+\log \left(4 \pi \log ^2(2)\right)}{2 \sqrt{2\log (n)}}$

will work fine (and are as simple as possible).
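How close the approximations are to the exact quantile-based constants can be checked directly. This is a minimal sketch using only Python's standard library; the names `max_quantile`, `exact_norming`, `approx_norming` and `max_cdf` are illustrative, and $n=2^{16}$ is an arbitrary choice.

```python
import math
from statistics import NormalDist

def max_quantile(q, n):
    """x_{q;n} = Phi^{-1}(q^(1/n)), computed via the upper tail 1 - q^(1/n)."""
    upper_tail = -math.expm1(math.log(q) / n)
    return -NormalDist().inv_cdf(upper_tail)

def exact_norming(n):
    """Median and IQR of F_n, returned as (a_n, b_n)."""
    return max_quantile(0.75, n) - max_quantile(0.25, n), max_quantile(0.5, n)

def approx_norming(n):
    """The closed-form approximations a_n' and b_n' given above."""
    root = math.sqrt(2.0 * math.log(n))
    a = math.log(4.0 * math.log(2.0) ** 2 / math.log(4.0 / 3.0) ** 2) / (2.0 * root)
    b = root - (math.log(math.log(n))
                + math.log(4.0 * math.pi * math.log(2.0) ** 2)) / (2.0 * root)
    return a, b

def max_cdf(x, n):
    """F_n(x) = Phi(x)^n, evaluated through the upper tail."""
    upper_tail = 0.5 * math.erfc(x / math.sqrt(2.0))
    return math.exp(n * math.log1p(-upper_tail))

n = 2 ** 16
a_exact, b_exact = exact_norming(n)
a_approx, b_approx = approx_norming(n)
print(a_exact, a_approx)
print(b_exact, b_approx)
print(max_cdf(b_approx, n))   # would be exactly 0.5 with the exact b_n
```

At this $n$ the two versions of $b_n$ differ by about $0.01$ and the two versions of $a_n$ by a few percent, so the approximate constants place the median of $G_n$ close to, but not exactly at, $0$.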

The light blue curves are partial graphs of $G_n$ for $n=2, 2^6, 2^{11}, 2^{16}$ using the approximate sequences $a_n^\prime$ and $b_n^\prime$. The dark red line graphs the reversed Gumbel distribution with parameters $\alpha$ and $\beta$. The convergence is clear (although the rate of convergence for negative $x$ is noticeably slower).

References

B. V. Gnedenko, On The Limiting Distribution of the Maximum Term in a Random Series. In Kotz and Johnson, Breakthroughs in Statistics Volume I: Foundations and Basic Theory, Springer, 1992. Translated by Norman Johnson.

— whuber

@Vossler The formula in Alecos's post for $a_n$ converges to $0$ as $n\to\infty$. It behaves like $\left(2 \log(n) - \log(2\pi)\right)^{-1/2}$ for large $n$.

— whuber

Yes, that's true, I realized this shortly after I posted my comment so I deleted it immediately. Thank you!

— Vossler

@Jess I had hoped that this answer would be understood as showing, among other things, that there is no such thing as "the" formula: there are uncountably many correct formulas for the $a_n$ and $b_n.$

— whuber

@Jess That's better, because demonstrating an alternative approach was the motivation to write this answer. I don't understand your insinuation that I considered it "useless to write down an answer," because that's explicitly what I have done here.

— whuber

@Jess I cannot continue this conversation because it's entirely one-sided: I have yet to recognize anything I have written in any of your characterizations. I'm quitting while I'm behind.

— whuber