Value that increases the standard deviation

12

I am puzzled by the following statement:

"In order to increase the standard deviation of a set of numbers, you must add a value that is more than one standard deviation away from the mean"

What is the proof of that? Of course I know how the standard deviation is defined, but I seem to be missing that part somehow. Any comments?

standard-deviation

— JohnK

1

Have you tried to work out the algebra involved?

— Alecos Papadopoulos

Yes, I have. I subtracted the sample variance of $n$ values from the variance of $n+1$ values and required that the difference be larger than zero. I could not work it out, though.

— JohnK

3

One of the easiest ways is to differentiate Welford's algorithm with respect to the new value $x_n$ and then integrate to show that if introducing $x_n$ increases the variance, then $(x_n-\bar{x}_{n-1})^2 \ge \frac{n}{n-1}v_{n-1}$, where $\bar{x}_{n-1}$ is the mean of the first $n-1$ values and $v_{n-1}$ is their variance estimate.

— whuber
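For readers who prefer to check the inequality numerically rather than by calculus, here is a minimal sketch of the standard Welford running update. It is not part of the original comment: the function names `welford_add` and `compare_variances` and the sample data are made up for illustration, and the variance uses the usual $n-1$ denominator.

```python
def welford_add(count, mean, m2, x):
    """Fold one value into a running (count, mean, sum of squared deviations)."""
    count += 1
    delta = x - mean           # distance from the mean before the update
    mean += delta / count
    m2 += delta * (x - mean)   # product of old-mean and new-mean residuals
    return count, mean, m2


def compare_variances(previous, x_n):
    """Return (v_prev, v_new, squared_distance, bound) for adding x_n.

    `previous` holds the first n-1 values; `bound` is n/(n-1) * v_prev.
    """
    count, mean, m2 = 0, 0.0, 0.0
    for value in previous:
        count, mean, m2 = welford_add(count, mean, m2, value)
    v_prev = m2 / (count - 1)
    squared_distance = (x_n - mean) ** 2   # measured from the old mean
    n = count + 1                          # n counts the new value too
    bound = n / (n - 1) * v_prev
    count, mean, m2 = welford_add(count, mean, m2, x_n)
    v_new = m2 / (count - 1)
    return v_prev, v_new, squared_distance, bound


data = [2.0, 4.0, 4.0, 5.0, 7.0]   # illustrative values, not from the thread
for x_n in (6.3, 6.9):
    v_prev, v_new, d2, bound = compare_variances(data, x_n)
    print(x_n, v_new > v_prev, d2 >= bound)
```

With this data the old mean is 4.4 and the old SD is about 1.82. The value 6.3 lies 1.9 from the mean, which is more than one SD, yet the variance still goes down because 1.9 is below the bound of about 1.99.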

OK, but can this perhaps be shown with simple algebra? My knowledge of statistics is not advanced.

— JohnK

@JohnK, could you please share the source of the quote?

— Pe Dro

20

For any $N$ numbers $y_1,y_2, \ldots, y_N$ with mean $\displaystyle \bar{y} = \frac{1}{N}\sum_{i=1}^N y_i$, the variance is given by

$\begin{align} \sigma^2 &= \frac{1}{N-1}\sum_{i=1}^N (y_i-\bar{y})^2\\ &= \frac{1}{N-1}\sum_{i=1}^N \left(y_i^2 - 2y_i\bar{y} + \bar{y}^2\right)\\ &= \frac{1}{N-1}\left[\left(\sum_{i=1}^N y_i^2\right) - 2N(\bar{y})^2 + N(\bar{y})^2 \right] \\ \sigma^2 &=\frac{1}{N-1}\sum_{i=1}^N \left(y_i^2 - (\bar{y})^2\right) \tag{1} \end{align}$

Applying $(1)$ to the given set of $n$ numbers $x_1, x_2, \ldots x_n$, which we take for convenience in exposition to have mean $\bar{x} = 0$, we have that

$\sigma^2 = \frac{1}{n-1}\sum_{i=1}^n \left(x_i^2-(\bar{x})^2\right) = \frac{1}{n-1}\sum_{i=1}^n x_i^2$

If we now add a new observation $x_{n+1}$ to this data set, then the new mean of the data set is

$\frac{1}{n+1}\sum_{i=1}^{n+1}x_i = \frac{n\bar{x} + x_{n+1}}{n+1} = \frac{x_{n+1}}{n+1}$

while the new variance is

$\begin{align} \hat{\sigma}^2 &= \frac{1}{n}\sum_{i=1}^{n+1} \left(x_i^2-\frac{x_{n+1}^2}{(n+1)^2}\right)\\ &= \frac{1}{n}\left[\left((n-1)\sigma^2 + x_{n+1}^2\right) - \frac{x_{n+1}^2}{n+1}\right]\\ &= \frac{1}{n}\left[(n-1)\sigma^2 + \frac{n}{n+1}x_{n+1}^2\right]\\ &> \sigma^2 ~ \text{only if}~ x_{n+1}^2 > \frac{n+1}{n}\sigma^2. \end{align}$

So $|x_{n+1}|$ needs to be larger than $\displaystyle\sigma\sqrt{1+\frac{1}{n}}$ or, more generally, $x_{n+1}$ needs to differ from the mean $\bar{x}$ of the original data set by more than $\displaystyle\sigma\sqrt{1+\frac{1}{n}}$, in order for the augmented data set to have larger variance than the original data set. See also Ray Koopman's answer, which points out that the new variance is larger than, equal to, or smaller than the original variance according as $x_{n+1}$ differs from the mean by more than, exactly, or less than $\displaystyle\sigma\sqrt{1+\frac{1}{n}}$.

— Dilip Sarwate
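The closed form reached in the derivation can be checked against a direct recomputation. The sketch below is an illustration rather than part of the answer: `variance_after_adding` is a hypothetical helper, the data are invented, and the old mean is restored in the formula (the answer sets it to 0 for convenience).

```python
import statistics


def variance_after_adding(values, x_new):
    """New sample variance computed from the old mean and variance only.

    Uses (1/n) * [(n-1)*var + n/(n+1) * (x_new - mean)^2], i.e. the last
    line of the derivation with x_{n+1} measured from the old mean.
    """
    n = len(values)
    mean = statistics.mean(values)
    var = statistics.variance(values)      # n-1 denominator
    d2 = (x_new - mean) ** 2
    return ((n - 1) * var + n / (n + 1) * d2) / n


values = [1.0, 3.0, 8.0, 10.0]             # illustrative data
x_new = 20.0
closed_form = variance_after_adding(values, x_new)
direct = statistics.variance(values + [x_new])
print(abs(closed_form - direct) < 1e-9)
```

The helper never touches the individual old values beyond their mean and variance, which is the point of the derivation: those two summaries and $n$ are enough to decide whether the variance grows.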

5

+1 Finally somebody gets it right... ;-) The statement to be proved is correct; it's just not tight. Incidentally, you may also pick your units of measurement to make $\sigma^2=1$, which further simplifies the calculation, reducing it to about two lines.

— whuber

I suggest you use S instead of sigma in the first set of equations and thanks for the derivation. It was good to know :)

— Theoden

3

The puzzling statement gives a necessary but insufficient condition for the standard deviation to increase. If the old sample size is $n$ , the old mean is $m$ , the old standard deviation is $s$ , and a new point $x$ is added to the data, then the new standard deviation will be less than, equal to, or greater than $s$ according as $|x-m|$ is less than, equal to, or greater than $s\sqrt{1+1/n}$ .

— Ray Koopman
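The three cases can be tried out directly. This is a rough sketch, not a proof: `sd_direction` is a hypothetical helper, the data are invented, and the "equal" case is detected with a floating-point tolerance because the boundary value $s\sqrt{1+1/n}$ is irrational here.

```python
import math
import statistics


def sd_direction(values, x, rel_tol=1e-9):
    """Return -1, 0 or 1 if adding x lowers, keeps, or raises the sample SD."""
    old_sd = statistics.stdev(values)
    new_sd = statistics.stdev(list(values) + [x])
    if math.isclose(new_sd, old_sd, rel_tol=rel_tol):
        return 0
    return 1 if new_sd > old_sd else -1


values = [2.0, 4.0, 4.0, 5.0, 7.0]          # illustrative data
n = len(values)
m = statistics.mean(values)
s = statistics.stdev(values)
cutoff = s * math.sqrt(1 + 1 / n)

# Try a point exactly one SD away, one at the cutoff, and one well beyond it,
# on both sides of the old mean.
for distance in (s, cutoff, 1.5 * cutoff):
    print(sd_direction(values, m + distance), sd_direction(values, m - distance))
```

A point exactly one SD from the mean falls in the "decrease" case, which is why the quoted statement is necessary but not sufficient.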

1

Do you have a proof at hand?

— JohnK

2

Leaving aside the algebra (which also works), think about it this way: the standard deviation is the square root of the variance. The variance is the average of the squared distances from the mean. If we add a value that is closer to the mean than this, the variance will shrink. If we add a value that is farther from the mean than this, it will grow.

This is true of any average of values that are non-negative. If you add a value that is higher than the mean, the mean increases. If you add a value that is less, it decreases.

— Peter Flom - Reinstate Monica
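One way to make this averaging picture precise, offered as a sketch under the usual $n-1$ denominator (here $n$, $m$ and $s$ are the old sample size, mean and standard deviation, $x$ is the added value, and $\hat{s}^2$ is notation introduced only for this note): the variance after adding $x$ is a weighted average of two quantities,

$\begin{align} \hat{s}^2 &= \frac{n-1}{n}\,s^2 + \frac{1}{n}\cdot\frac{n}{n+1}\,(x-m)^2, \end{align}$

with weights $\frac{n-1}{n}$ and $\frac{1}{n}$ that sum to 1. The quantity averaged in is not $(x-m)^2$ itself but the shrunken value $\frac{n}{n+1}(x-m)^2$, because adding $x$ also moves the mean. Hence $\hat{s}^2 > s^2$ exactly when $\frac{n}{n+1}(x-m)^2 > s^2$, that is, when $|x-m| > s\sqrt{1+1/n}$, which is slightly more than one standard deviation.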

I would love to see a rigorous proof as well. While I understand the principle I am puzzled by the fact that the value has to be at least 1 deviation away from the mean. Why precisely 1?

— JohnK

I don't see what is confusing. The variance is the average. If you add something greater than the average (that is, more than 1 sd) it increases. But I am not one for formal proofs

— Peter Flom - Reinstate Monica

It could be greater than the average by 0.2 standard deviations. Why wouldn't it increase then?

— JohnK

No, not greater than the mean of the data, greater than the variance, which is the mean of the squared distances.

— Peter Flom - Reinstate Monica

4

It is confusing because including a new value changes the mean, so all the residuals change. It is conceivable that even when the new value is far from the old mean, its contribution to the SD could be compensated by reducing the sum of squares of the residuals of the other values. This is one of the many reasons why rigorous proofs are useful: they provide not only security in one's knowledge, but insight (and even new information) as well. For instance, the proof will show that you have to add a new value that is strictly further than one SD from the mean in order to increase the SD.

— whuber

2

I'll get you started on the algebra, but won't take it quite all of the way. First, standardize your data by subtracting the mean and dividing by the standard deviation:

$Z = \frac{x-\mu}{\sigma} .$

Note that if $x$ is within one standard deviation of the mean, $Z$ is between -1 and 1. $Z$ would be 1 if $x$ were exactly one sd away from the mean. Then look at your equation for standard deviation:

$\sigma = \sqrt{\frac{\sum_{i=1}^{N}Z_i^2}{N-1}}$

What happens to $\sigma$ if $Z_N$ is between -1 and 1?

— wcampbell

A number whose absolute value is less than 1 will, when squared, also be less than 1 in absolute value. Yet what I do not understand is that even if $Z_N$ falls into that category, we are adding a positive value to $\sigma$, so shouldn't it increase?

— JohnK

Yes, you are adding a positive value, but it will be smaller than your average deviation from the mean and therefore reduce sigma. Maybe it would make more sense to consider the value as $Z_{N+1}$.

— wcampbell

1

1) Don't forget, when you add that value, you are also increasing $N$ by 1. 2) You are not adding that value to $\sigma$, you are adding it to $\sum Z_i^2$.

— jbowman

Exactly what I was trying to express!

— wcampbell

It's not that simple: in this answer you have computed the SD as if the new value were already part of the dataset. Instead, the $Z_i$ have to be standardized with respect to the SD and mean of the first $N-1$ values only, not all of them.

— whuber
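To see what standardizing against the existing values only looks like in practice, here is a small sketch. It is an illustration under stated assumptions, not anyone's posted solution: `variance_ratio` is a hypothetical helper, `n` denotes the sample size before the new value is added, and the data are invented. In units where the old variance is 1, the new variance works out to $\frac{n-1}{n} + \frac{z^2}{n+1}$.

```python
import statistics


def variance_ratio(values, x):
    """Return (z, new variance / old variance) when x joins `values`.

    z is standardized with the mean and SD of the existing values only.
    """
    n = len(values)                          # size before adding x
    m = statistics.mean(values)
    s = statistics.stdev(values)
    z = (x - m) / s
    ratio = (n - 1) / n + z * z / (n + 1)    # old variance is 1 in these units
    return z, ratio


values = [2.0, 4.0, 4.0, 5.0, 7.0]           # illustrative data
m, s = statistics.mean(values), statistics.stdev(values)
z, ratio = variance_ratio(values, m + s)     # exactly one SD above the mean
direct = statistics.variance(values + [m + s]) / statistics.variance(values)
print(ratio < 1, abs(ratio - direct) < 1e-9)
```

With $z = 1$ the ratio is $1 - \frac{1}{n(n+1)}$, strictly below 1, so a value sitting exactly one SD from the old mean still lowers the variance slightly.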