Values that increase the standard deviation



I am puzzled by the following statement:

"To increase the standard deviation of a set of numbers, you must add a value that is more than one standard deviation away from the mean"

What is the proof of that? Of course I know how we define the standard deviation, but that part I seem to miss somehow. Any ideas?


Have you tried to work out the algebra involved?
Alecos Papadopoulos

Yes, I have. I subtracted the sample variance of the $n$ values from the variance of the $n+1$ values and required the difference to be greater than 0. However, I could not work it out.
JohnK

One of the simplest ways is to differentiate Welford's algorithm with respect to the new value and then integrate, to show that if introducing $x_n$ increases the variance, then $(x_n - \bar x_{n-1})^2 > \frac{n}{n-1} v_{n-1}$, where $\bar x_{n-1}$ is the mean of the first $n-1$ values and $v_{n-1}$ is their variance estimate.
whuber
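A minimal Python sketch of the inequality in the comment above. It does not reproduce the differentiate-then-integrate argument; it only uses the standard Welford update (running count, mean, and sum of squared deviations `m2`) to check numerically that appending `x` raises the sample variance exactly when $(x_n - \bar x_{n-1})^2 > \frac{n}{n-1} v_{n-1}$. The function names and the data are illustrative assumptions, not taken from any library.

```python
def welford_update(count, mean, m2, x):
    """One step of Welford's online algorithm.

    (count, mean, m2) summarise the values seen so far; m2 is the running
    sum of squared deviations from the current mean.
    """
    count += 1
    delta = x - mean
    mean += delta / count
    m2 += delta * (x - mean)  # uses the *updated* mean
    return count, mean, m2


def summarise(values):
    """Feed all values through the Welford update."""
    count, mean, m2 = 0, 0.0, 0.0
    for v in values:
        count, mean, m2 = welford_update(count, mean, m2, v)
    return count, mean, m2


def increases_variance(values, x):
    """True if appending x raises the sample (n - 1) variance; needs len(values) >= 2."""
    count, mean, m2 = summarise(values)
    old_var = m2 / (count - 1)
    new_count, _, new_m2 = welford_update(count, mean, m2, x)
    return new_m2 / (new_count - 1) > old_var


def stated_criterion(values, x):
    """(x_n - mean_{n-1})^2 > n / (n - 1) * v_{n-1}, where n counts the new value."""
    count, mean, m2 = summarise(values)
    n = count + 1
    v_prev = m2 / (count - 1)  # sample variance of the first n - 1 values
    return (x - mean) ** 2 > n / (n - 1) * v_prev


data = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical sample
for x in (4.5, 5.0):
    print(x, increases_variance(data, x), stated_criterion(data, x))
```

For this sample the two functions agree: 4.5 lies inside the threshold and lowers the variance, 5.0 lies outside it and raises it.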

Okay, but could this perhaps be shown with simple algebra? My knowledge of statistics is not advanced.
JohnK

@JohnK, could you please share the source of the quote?
Pe Dro

Answers:



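For any $N$ numbers $y_1, y_2, \ldots, y_N$ with mean $\bar y = \frac{1}{N}\sum_{i=1}^N y_i$, the variance is given by

$$
\begin{aligned}
\sigma^2 &= \frac{1}{N-1}\sum_{i=1}^N (y_i - \bar y)^2
= \frac{1}{N-1}\sum_{i=1}^N \left(y_i^2 - 2y_i\bar y + \bar y^2\right)\\
&= \frac{1}{N-1}\left[\left(\sum_{i=1}^N y_i^2\right) - 2N(\bar y)^2 + N(\bar y)^2\right]
= \frac{1}{N-1}\sum_{i=1}^N \left(y_i^2 - (\bar y)^2\right). \qquad (1)
\end{aligned}
$$

Applying (1) to the given set of $n$ numbers $x_1, x_2, \ldots, x_n$, which for convenience of exposition we take to have mean $\bar x = 0$, we have

$$
\sigma^2 = \frac{1}{n-1}\sum_{i=1}^n \left(x_i^2 - (\bar x)^2\right) = \frac{1}{n-1}\sum_{i=1}^n x_i^2.
$$

If we now add a new observation $x_{n+1}$ to this data set, the new mean of the data set is

$$
\frac{1}{n+1}\sum_{i=1}^{n+1} x_i = \frac{n\bar x + x_{n+1}}{n+1} = \frac{x_{n+1}}{n+1},
$$

while the new variance is

$$
\begin{aligned}
\hat\sigma^2 &= \frac{1}{n}\sum_{i=1}^{n+1}\left(x_i^2 - \frac{x_{n+1}^2}{(n+1)^2}\right)
= \frac{1}{n}\left[\left((n-1)\sigma^2 + x_{n+1}^2\right) - \frac{x_{n+1}^2}{n+1}\right]\\
&= \frac{1}{n}\left[(n-1)\sigma^2 + \frac{n}{n+1}x_{n+1}^2\right] > \sigma^2
\quad\text{if and only if}\quad x_{n+1}^2 > \frac{n+1}{n}\sigma^2.
\end{aligned}
$$

So $|x_{n+1}|$ needs to be larger than $\sigma\sqrt{1+\frac{1}{n}}$ (in general, without assuming $\bar x = 0$, $|x_{n+1} - \bar x|$ needs to be larger than $\sigma\sqrt{1+\frac{1}{n}}$) in order for the augmented data set to have larger variance than the original data set. See also Ray Koopman's answer, which points out that the new variance is larger than, equal to, or smaller than the original variance according as $x_{n+1}$ differs from the mean by more than, exactly, or less than $\sigma\sqrt{1+\frac{1}{n}}$.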
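A quick numerical check of the final formula, assuming a small hypothetical data set that already has mean 0 (as in the derivation). It compares `statistics.variance` (which uses the $n-1$ denominator, like (1)) on the augmented data with the closed form $\frac{1}{n}\left[(n-1)\sigma^2 + \frac{n}{n+1}x_{n+1}^2\right]$.

```python
import statistics


def augmented_variance_formula(sigma2, n, x_new):
    """New sample variance from the closed form (old data assumed centred at 0)."""
    return ((n - 1) * sigma2 + n / (n + 1) * x_new ** 2) / n


data = [-3.0, -1.0, 0.0, 1.0, 3.0]   # hypothetical data with mean 0
n = len(data)
sigma2 = statistics.variance(data)   # n - 1 denominator, as in (1)
threshold = (sigma2 * (1 + 1 / n)) ** 0.5

for x_new in (0.5, threshold, 4.0):
    direct = statistics.variance(data + [x_new])
    formula = augmented_variance_formula(sigma2, n, x_new)
    print(f"x_new={x_new:.4f} direct={direct:.6f} formula={formula:.6f} old={sigma2:.6f}")
```

At `x_new == threshold` both columns reproduce the old variance, matching the "exactly" case of the rule.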

+1 Finally somebody gets it right... ;-) The statement to be proved is correct; it's just not tight. Incidentally, you may also pick your units of measurement to make $\sigma^2 = 1$, which further simplifies the calculation, reducing it to about two lines.
whuber
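A sketch of how the shortened calculation might look, keeping the convention $\bar x = 0$ from the answer above and rescaling so that $\sigma^2 = 1$ (this is one way to write it, not necessarily the commenter's own two lines). By (1), $\sum_{i=1}^n x_i^2 = n-1$, so

```latex
\hat\sigma^2
  = \frac{1}{n}\left[(n-1) + x_{n+1}^2 - \frac{x_{n+1}^2}{n+1}\right]
  = \frac{1}{n}\left[(n-1) + \frac{n}{n+1}\,x_{n+1}^2\right],
\qquad
\hat\sigma^2 > 1 \iff x_{n+1}^2 > 1 + \frac{1}{n}.
```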

I suggest you use $S$ instead of $\sigma$ in the first set of equations. Thanks for the derivation, it was good to know :)
Theoden


The puzzling statement gives a necessary but insufficient condition for the standard deviation to increase. If the old sample size is $n$, the old mean is $m$, the old standard deviation is $s$, and a new point $x$ is added to the data, then the new standard deviation will be less than, equal to, or greater than $s$ according as $|x - m|$ is less than, equal to, or greater than $s\sqrt{1 + 1/n}$.
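A sketch that checks the three-way rule in this answer on a hypothetical sample, using the sample standard deviation from Python's `statistics` module. The tolerance `tol` is an arbitrary choice so that a point placed exactly on the threshold (up to floating-point error) counts as "equal"; it assumes the data are not all identical, so that $s > 0$.

```python
import math
import statistics


def sd_change(data, x, tol=1e-9):
    """-1, 0 or +1 as the sample SD falls, stays the same, or rises when x is appended."""
    old = statistics.stdev(data)
    new = statistics.stdev(list(data) + [x])
    if abs(new - old) <= tol * old:
        return 0
    return 1 if new > old else -1


def predicted_change(data, x, tol=1e-9):
    """The same classification from the rule: compare |x - m| with s * sqrt(1 + 1/n)."""
    n = len(data)
    m = statistics.mean(data)
    s = statistics.stdev(data)
    gap = abs(x - m) - s * math.sqrt(1 + 1 / n)
    if abs(gap) <= tol * s:
        return 0
    return 1 if gap > 0 else -1


data = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical sample
m, s = statistics.mean(data), statistics.stdev(data)
t = s * math.sqrt(1 + 1 / len(data))   # threshold distance from the mean
for x in (m + 0.5 * t, m + t, m - 1.1 * t):
    print(round(x, 4), sd_change(data, x), predicted_change(data, x))
```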


Do you have a proof at hand?
JohnK


Leaving aside the algebra (which also works), think about it this way: the standard deviation is the square root of the variance, and the variance is the average of the squared distances from the mean. If we add a value whose squared distance from the mean is smaller than this average, the variance will shrink. If we add a value whose squared distance is larger, it will grow.

This is true of any average of values that are non-negative. If you add a value that is higher than the mean, the mean increases. If you add a value that is less, it decreases.
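The averaging argument can be seen in the running-mean update $m' = m + \frac{x - m}{n+1}$: the mean moves in the direction of $x - m$. This sketch only illustrates the statement about plain averages of non-negative values (the list of "squared distances" is hypothetical); for the variance, the mean itself shifts when a value is added, so this alone does not settle that case.

```python
def updated_mean(mean, count, x):
    """Mean after appending x to `count` values whose mean is `mean`."""
    return mean + (x - mean) / (count + 1)


# Hypothetical squared distances from the mean (all non-negative).
sq_dists = [1.0, 2.0, 3.0]
m = sum(sq_dists) / len(sq_dists)  # 2.0
for x in (1.5, 2.0, 6.0):
    print(x, updated_mean(m, len(sq_dists), x))
```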


I would love to see a rigorous proof as well. While I understand the principle, I am puzzled by the fact that the value has to be at least one standard deviation away from the mean. Why precisely one?
JohnK

I don't see what is confusing. The variance is the average. If you add something greater than the average (that is, more than 1 SD away), it increases. But I am not one for formal proofs.
Peter Flom - Reinstate Monica

It could be greater than the average by 0.2 standard deviations. Why wouldn't it increase then?
JohnK

No, not greater than the mean of the data, greater than the variance, which is the mean of the squared distances.
Peter Flom - Reinstate Monica

It is confusing because including a new value changes the mean, so all the residuals change. It is conceivable that even when the new value is far from the old mean, its contribution to the SD could be compensated by reducing the sum of squares of the residuals of the other values. This is one of the many reasons why rigorous proofs are useful: they provide not only security in one's knowledge, but insight (and even new information) as well. For instance, the proof will show that you have to add a new value that is strictly further than one SD from the mean in order to increase the SD.
whuber
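A small numerical check of this comment, on a hypothetical four-point sample: a new value exactly one SD from the mean lowers the SD, and for such a small $n$ even 1.05 SDs is not enough, because the threshold is $\sqrt{1 + 1/4} \approx 1.118$ SDs; a value 1.2 SDs away does raise it.

```python
import statistics


def new_sd_ratio(data, k):
    """New SD / old SD after appending a point k old SDs above the old mean."""
    m = statistics.mean(data)
    s = statistics.stdev(data)
    x = m + k * s
    return statistics.stdev(list(data) + [x]) / s


data = [0.0, 1.0, 2.0, 3.0]  # hypothetical sample, n = 4
for k in (1.0, 1.05, 1.2):
    print(f"{k:.2f} SD away: new/old SD = {new_sd_ratio(data, k):.4f}")
```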


I'll get you started on the algebra, but won't take it quite all of the way. First, standardize your data by subtracting the mean and dividing by the standard deviation:

$$Z = \frac{x - \mu}{\sigma}.$$
Note that if $x$ is within one standard deviation of the mean, $Z$ is between $-1$ and $1$; $|Z|$ would be 1 if $x$ were exactly one SD away from the mean. Then look at your equation for the standard deviation:
$$\sigma = \sqrt{\frac{\sum_{i=1}^N Z_i^2}{N-1}}$$
What happens to $\sigma$ if $Z_N$ is between $-1$ and $1$?
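A sketch of the standardization idea, assuming (as a later comment in this thread insists) that $Z$ is computed from the mean and SD of the original values only. In those units the original data have SD 1 (up to rounding), so the SD of the augmented Z values is directly the ratio of new to old $\sigma$; the data below are hypothetical.

```python
import statistics


def new_sd_in_z_units(old_values, x_new):
    """SD of old_values + [x_new], in units of the OLD standard deviation.

    Z uses the mean and SD of old_values only, so a result below 1 means
    the standard deviation went down when x_new was added.
    """
    mu = statistics.mean(old_values)
    sigma = statistics.stdev(old_values)
    z_old = [(v - mu) / sigma for v in old_values]
    z_new = (x_new - mu) / sigma
    return statistics.stdev(z_old + [z_new])


old = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
mu, sigma = statistics.mean(old), statistics.stdev(old)
for z in (-0.99, 0.0, 0.5, 2.0):
    print(z, round(new_sd_in_z_units(old, mu + z * sigma), 4))
```

Every new $Z$ strictly between $-1$ and $1$ gives a value below 1 here, i.e. $\sigma$ shrinks.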

When a number whose absolute value is less than 1 is squared, the result is also less than 1. What I do not understand is this: even if $Z_N$ falls into that category, we are still adding a positive value to $\sigma$, so shouldn't it increase?
JohnK

Yes, you are adding a positive value, but it will be smaller than your average deviation from the mean and will therefore reduce $\sigma$. Maybe it would make more sense to consider the value as $Z_{N+1}$.
wcampbell

1) Don't forget, when you add that value, you are also increasing $N$ by 1. 2) You are not adding that value to $\sigma$, you are adding it to $\sum Z_i^2$.
jbowman

Exactly what I was trying to express!
wcampbell

It's not that simple: in this answer you have computed the SD as if the new value were already part of the dataset. Instead, the $Z_i$ have to be standardized with respect to the SD and mean of the first $N-1$ values only, not all of them.
whuber
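A hypothetical check of why this matters: if the new value is included when computing the mean and SD used for standardizing, the formula $\sqrt{\sum Z_i^2/(N-1)}$ returns 1 whatever the new value is, so it cannot detect a change. Standardizing with the original values only gives the ratio of new to old SD instead. The helper names and data are illustrative.

```python
import math
import statistics


def sd_from_z(z):
    # sigma = sqrt(sum Z_i^2 / (N - 1)); valid when the Z_i have mean 0.
    return math.sqrt(sum(zi * zi for zi in z) / (len(z) - 1))


def z_scores(values):
    """Standardize values by their own mean and sample SD."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [(v - mu) / sigma for v in values]


old = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
mu, sigma = statistics.mean(old), statistics.stdev(old)
for x_new in (5.0, 20.0):
    # Pitfall: standardizing with the new value included always gives 1.
    circular = sd_from_z(z_scores(old + [x_new]))
    # Standardizing with the old mean and SD only gives new SD / old SD.
    ratio = statistics.stdev([(v - mu) / sigma for v in old + [x_new]])
    print(x_new, round(circular, 12), round(ratio, 4))
```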
Licensed under cc by-sa 3.0 with attribution required.