How can the number of connections be Gaussian if it cannot be negative?


14

I am analyzing social networks (not virtual ones) and observing the connections between people. If each person chose another person to connect with at random, the number of connections within a group of people would be normally distributed, at least according to the book I am currently reading.

How can we know that the distribution is Gaussian (normal)? There are other distributions such as Poisson, Rice, Rayleigh, etc. The problem with the Gaussian distribution is that, in theory, its values range from $-\infty$ to $+\infty$ (although the probability tends to zero), whereas the number of connections cannot be negative.

Does anyone know which distribution can be expected when each person independently (randomly) chooses another person to connect with?


1
Clarification: Is the question about the "total number of connections for the whole group" or "the total number of connections for one person"? My answer implicitly assumes the latter.

1
Riley distribution? That's a new one on me. Do you have a reference or link?
onestop

3
"Rayleigh" maybe?
whuber

Answers:


6

When there are $n$ people and the number of connections made by person $i$, $1 \le i \le n$, is $X_i$, then the total number of connections is $S_n = \sum_{i=1}^{n} X_i / 2$. Now if we take the $X_i$ to be random variables, assume they are independent and their variances are not "too unequal" as more and more people are added to the mix, then the Lindeberg-Levy Central Limit Theorem applies. It asserts that the cumulative distribution function of the standardized sum converges to the cdf of the normal distribution. That means roughly that a histogram of the sum will look more and more like a Gaussian (a "bell curve") as $n$ grows large.

Let's review what this does not say:

  • It does not assert that the distribution of $S_n$ is ever exactly normal. It can't be, for the reasons you point out.

  • It does not imply the expected number of connections converges. In fact, it must diverge (go to infinity). The standardization is a recentering and rescaling of the distribution; the amount of rescaling is growing without limit.

  • It says nothing when the $X_i$ are not independent or when their variances change too much as $n$ grows. (However, there are generalizations of the CLT for "slightly" dependent series of variables.)


Note that I do not interpret the question to state that everyone chooses exactly one other person to connect to--that would lead to a sterile theory because the number of connections would be determined, not random. Instead I have interpreted it to state that everyone, when they enter the network, chooses connections randomly among the $n$ others, winding up with anywhere from $0$ through $n$ connections total. The assumption on the variances is assured when there is a limit on the number of connections any newcomer will make and that number has some "minimal" randomness.
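A minimal simulation sketch of this argument, under assumptions that are not in the answer itself: each $X_i$ is drawn independently and uniformly from $0$ through a hypothetical cap `max_per_person`, and the function names are made up for illustration. It standardizes simulated values of $S_n$ and reports the share falling within one standard deviation of the mean, which would be about 0.683 for an exactly normal variable, along with the smallest raw sum, which is never negative.

```python
import random
import statistics


def total_connections(n, max_per_person=5, rng=random):
    """Return S_n = sum(X_i) / 2, with X_i uniform on 0..max_per_person (assumed model)."""
    return sum(rng.randint(0, max_per_person) for _ in range(n)) / 2


def standardized_sample(n, trials=2000, seed=0):
    """Simulate S_n `trials` times; return the standardized values and the smallest raw sum."""
    rng = random.Random(seed)
    sums = [total_connections(n, rng=rng) for _ in range(trials)]
    mean = statistics.fmean(sums)
    sd = statistics.pstdev(sums)
    return [(s - mean) / sd for s in sums], min(sums)


def fraction_within(z_values, width=1.0):
    """Share of standardized values with |z| <= width."""
    return sum(abs(z) <= width for z in z_values) / len(z_values)


if __name__ == "__main__":
    for n in (5, 50, 500):
        z, smallest = standardized_sample(n)
        print(n, round(fraction_within(z), 3), smallest)
```

Only the standardized values are compared with the normal distribution; the raw sums stay non-negative and their mean keeps growing with $n$, in line with the caveats listed above.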
whuber

I'm a bit confused about $X_i$ and the variance. Does this suggest people have an intrinsic variance?
Andy W

1
@Andy Not people: the number of connections made. The important thing is that there should be a good chance that the number of connections made by individuals actually varies and doesn't settle down to a constant. When that happens, the limiting distribution (of the number of connections) is determined by the finite number of initial connections that do vary, so it's not possible to approach a Normal distribution asymptotically.
whuber

1

The answer is dependent on the assumptions that you are willing to make. A social network constantly evolves over time and hence is not a static entity. Therefore, you need to make some assumptions about how the network evolves over time.

The trivial answer under the stated conditions is: if the network size is $n$, then asymptotically (in the sense of 'as time goes to infinity')

$\text{Prob}(\text{No. of connections for any individual} = n-1) = 1.$

If a person selects another person at random to connect to then eventually everyone will be connected.
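A rough sketch of this limiting behaviour, assuming one particular process that the answer does not spell out: the network size is fixed, and at each step a randomly chosen person picks a different person at random and connects to them if they are not already connected. The function name and the `max_steps` safety cap are hypothetical.

```python
import random


def simulate_until_complete(n, seed=0, max_steps=1_000_000):
    """Add random connections among n people until all pairs are connected.

    Returns (per-person connection counts, total connections, steps used).
    """
    rng = random.Random(seed)
    edges = set()
    target = n * (n - 1) // 2  # number of distinct pairs
    steps = 0
    while len(edges) < target and steps < max_steps:
        a, b = rng.sample(range(n), 2)  # two distinct people
        edges.add((min(a, b), max(a, b)))  # undirected; duplicates are ignored
        steps += 1
    degrees = [0] * n
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees, len(edges), steps


if __name__ == "__main__":
    degrees, total, steps = simulate_until_complete(6)
    print(degrees, total, steps)
```

Under this assumed process every person ends with $n-1$ connections and the group as a whole has $n(n-1)/2$; other ways of reading "selects another person at random" could behave differently.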

However, real life networks do not behave this way. People differ in several aspects.

  1. At any time a person has a fixed network size and the probability of another connection being made is a function of his/her network size (as people introduce other people etc).

  2. A person has his/her own intrinsic tendency to form a connection (as some are introverts, some extroverts, etc.).

These probabilities change over time, context etc. I am not sure there is a straightforward answer unless we make some assumptions about the structure of the network (e.g., density of the network, how people behave etc).


@Srikant Could you explain how you derive the "trivial answer"? (There must be some unstated assumptions behind it.) And to what theorem do you refer when you conclude that "eventually everyone will be connected"? That's not at all obvious!
whuber

@whuber I am assuming that the network size is fixed. The question states: A person picks another person at random to make a connection and presumably this is an ongoing process. Thus, as time goes to infinity everyone should be connected. No theorem, just intuition. Perhaps, I am using imprecise language.

@Srikant I am still confused, because after a long time, "Prob(No of connections=n)" equals 1 when $n = 3$ and otherwise is always zero. After all, when "everyone should be connected" the number of connections equals $n(n-1)/2$. I suspect you may have several different random processes in mind at the same time. It could help to disclose the assumptions you are making and be a little more precise.
whuber
Licensed under cc by-sa 3.0 with attribution required.