Generate uniformly distributed weights that sum to unity?

It is common to use weights in applications such as mixture modeling and linear combinations of basis functions. The weights $w_i$ must often obey $w_i \ge 0$ and $\sum_{i} w_i=1$. I would like to randomly choose a weight vector $\mathbf{w} = (w_1, w_2, \ldots)$ from a uniform distribution of such vectors.

It may be tempting to use $w_i = \frac{\omega_i}{\sum_{j} \omega_j}$ where $\omega_i \sim U(0, 1)$; however, as discussed in the comments below, the resulting distribution of $\mathbf{w}$ is not uniform.
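A quick simulation makes the non-uniformity concrete. The following is a minimal sketch, not part of the original question: the function names are made up for illustration, and it only looks at the case $n = 2$. There the simplex is a line segment, so a uniform $w_1$ would satisfy $P(w_1 < 0.25) = 0.25$, whereas normalizing two uniforms gives $P(3\omega_1 < \omega_2) = 1/6$.

```python
import random

def normalized_uniforms(n, rng):
    """Naive scheme: n iid U(0, 1) draws divided by their sum."""
    u = [rng.random() for _ in range(n)]
    total = sum(u)  # zero only if every draw is exactly 0.0, which is negligible
    return [x / total for x in u]

def fraction_first_below(t, n, trials, rng):
    """Monte Carlo estimate of P(w_1 < t) under the naive scheme."""
    hits = sum(normalized_uniforms(n, rng)[0] < t for _ in range(trials))
    return hits / trials

rng = random.Random(0)  # fixed seed so the run is repeatable
# A uniform w_1 on [0, 1] would give 0.25 here; theory for the naive scheme is 1/6.
naive_estimate = fraction_first_below(0.25, 2, 100_000, rng)
print(naive_estimate)
```

The estimate should land near $1/6$ rather than $1/4$: the naive scheme puts too little mass near the corners of the simplex.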

However, given the constraint $\sum_{i} w_i=1$, it seems that the underlying dimensionality of the problem is $n-1$, and that it should be possible to choose a $\mathbf{w}$ by choosing $n-1$ parameters according to some distribution and then computing the corresponding $\mathbf{w}$ from those parameters (because once $n-1$ of the weights are specified, the remaining weight is fully determined).

The problem appears to be similar to the sphere point picking problem (but, rather than picking 3-vectors whose $ℓ_2$ norm is unity, I want to pick $n$-vectors whose $ℓ_1$ norm is unity).

Thanks!

random-generation

— Chris

Your method does not generate a uniformly distributed vector on the simplex. To do what you want correctly, the most straightforward way is to generate $n$ iid $\mathrm{Exp}(1)$ random variables and then normalize them by their sum. You could try to do it by finding some other method that draws only $n-1$ variates directly, but I have my doubts about the efficiency tradeoff, since $\mathrm{Exp}(1)$ variates can be generated very efficiently from $U(0,1)$ variates.

— cardinal

Answers:

Choose $\mathbf{x} \in [0,1]^{n-1}$ uniformly (by means of $n-1$ uniform reals in the interval $[0,1]$). Sort the coefficients so that $0 \le x_1 \le \cdots \le x_{n-1}$. Set

$\mathbf{w} = (x_1, x_2-x_1, x_3 - x_2, \ldots, x_{n-1} - x_{n-2}, 1 - x_{n-1}).$

Because we can recover the sorted $x_i$ by means of the partial sums of the $w_i$ , the mapping $\mathbf{x} \to \mathbf{w}$ is $(n-1)!$ to 1; in particular, its image is the $n-1$ simplex in $\mathbb{R}^n$ . Because (a) each swap in a sort is a linear transformation, (b) the preceding formula is linear, and (c) linear transformations preserve uniformity of distributions, the uniformity of $\mathbf{x}$ implies the uniformity of $\mathbf{w}$ on the $n-1$ simplex. In particular, note that the marginals of $\mathbf{w}$ are not necessarily independent.
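The sort-and-difference construction can be sketched in a few lines. This is an illustrative Python version using only the standard library; the function name `simplex_by_spacings` and the seed are assumptions, not part of the original answer.

```python
import random

def simplex_by_spacings(n, rng):
    """Draw w on the (n-1)-simplex as the gaps between n-1 sorted U(0, 1) draws."""
    cuts = sorted(rng.random() for _ in range(n - 1))  # 0 <= x_1 <= ... <= x_{n-1}
    edges = [0.0] + cuts + [1.0]
    # Consecutive differences: x_1, x_2 - x_1, ..., 1 - x_{n-1}
    return [b - a for a, b in zip(edges, edges[1:])]

rng = random.Random(1)  # fixed seed so the run is repeatable
w = simplex_by_spacings(5, rng)
print(w, sum(w))
```

The weights are nonnegative by construction and sum to 1 up to floating-point rounding; the sort is what makes this $O(n \log n)$.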

[3D point plot]

This 3D point plot shows the results of 2000 iterations of this algorithm with $n=3$. The points are confined to the simplex and are uniformly distributed over it.

Because the execution time of this algorithm is $O(n \log(n)) \gg O(n)$, it is inefficient for large $n$. But this does answer the question! A better way (in general) to generate uniformly distributed values on the $n-1$-simplex is to draw $n$ uniform reals $(x_1, \ldots, x_n)$ on the interval $[0,1]$, compute

$y_i = -\log(x_i)$

(which makes each $y_i$ positive with probability $1$, whence their sum is almost surely nonzero) and set

$\mathbf w = (y_1, y_2, \ldots, y_n) / (y_1 + y_2 + \cdots + y_n).$

This works because each $y_i$ has a $\Gamma(1)$ distribution, which implies that $\mathbf w$ has a Dirichlet$(1,1,1)$ distribution, and that is uniform.
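As a minimal sketch of this $O(n)$ recipe in standard-library Python (the function name is hypothetical, and `1 - U` is used in place of `U` so that the logarithm never sees an exact zero; both are $U(0,1)$ in distribution):

```python
import math
import random

def simplex_by_exponentials(n, rng):
    """Draw w by normalizing n iid Exp(1) variates y_i = -log(x_i)."""
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1] and log is defined.
    y = [-math.log(1.0 - rng.random()) for _ in range(n)]
    total = sum(y)  # positive with probability 1
    return [v / total for v in y]

rng = random.Random(2)  # fixed seed so the run is repeatable
w = simplex_by_exponentials(3, rng)
print(w, sum(w))
```

No sort is needed, so the cost is linear in $n$. `random.expovariate(1.0)` would be an equivalent way to obtain the $y_i$.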

[3D point plot 2]

— whuber

@Chris If by "Dir(1)" you mean the Dirichlet distribution with parameters $(\alpha_1, \ldots, \alpha_n) = (1,1,\ldots,1)$, then the answer is yes.

— whuber

(+1) One minor comment: The intuition is excellent. Care in interpreting (a) may need to be taken, as it seems that the "linear transformation" in that part is a random one. However, this is easily worked around at the expense of additional formality by using exchangeability of the generating process and a certain invariance property.

— cardinal

More explicitly: For distributions with a density $f$, the density of the order statistics of an iid sample of size $n$ is $n! f(x_1)\cdots f(x_n) 1_{(x_1 < x_2 < \cdots < x_n)}$. In the case of $f = 1_{[0,1]}(x)$, the distribution of the order statistics is uniform on a polytope. Taken from this point, the remaining transformations are deterministic and the result follows.
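As a sketch of that remaining step (a worked equation added for illustration, specialized to the answer's construction, where the sample consists of $n-1$ uniforms): the order statistics have density

$$f_{X_{(1)},\ldots,X_{(n-1)}}(x_1,\ldots,x_{n-1}) = (n-1)!\; 1_{(0 < x_1 < \cdots < x_{n-1} < 1)}.$$

The map $w_1 = x_1$, $w_i = x_i - x_{i-1}$ for $2 \le i \le n-1$ is linear with a lower-triangular matrix that has ones on its diagonal, so its Jacobian determinant is $1$ and

$$f_{W_1,\ldots,W_{n-1}}(w_1,\ldots,w_{n-1}) = (n-1)!\; 1_{(w_i > 0,\; w_1 + \cdots + w_{n-1} < 1)}.$$

This density is constant on its support, and $w_n = 1 - (w_1 + \cdots + w_{n-1})$ is then determined, which is what uniformity on the $n-1$ simplex means.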

— cardinal

$I_{n-1}=[0,1]^{n-1}$ is carved into $(n-1)!$ regions, of which one is distinguished from the others, and there's a predetermined affine bijection between each region and the distinguished one. Whence, the only additional fact we need is that a uniform distribution on a region is uniform on any measurable subset of it, which is a complete triviality.

— whuber

@whuber: Interesting remarks. Thanks for sharing! I always appreciate your insightful thoughts on such things. Regarding my previous comment on "random linear transformation", my point was that, at least through $\mathbf{x}$, the transformation used depends on the sample point $\omega$. Another way to think of it is there is a fixed, predetermined function $T: \mathbb{R}^{n-1} \to \mathbb{R}^{n-1}$ such that $\mathbf{w} = T(\mathbf{x})$, but I wouldn't call that function linear, though it is linear on subsets that partition the $(n-1)$-cube. :)

— cardinal

    zz <- c(0, log(-log(runif(n-1))))
    ezz <- exp(zz)
    w <- ezz/sum(ezz)

The first entry is put to zero for identification; you would see that done in multinomial logistic models. Of course, in multinomial models, you would also have covariates under the exponents, rather than just the random zzs. The distribution of the zzs is the extreme value distribution; you'd need this to ensure that the resulting weights are i.i.d. I initially put rnormals there, but then had a gut feeling that this ain't gonna work.

— StasK

That doesn't work. Did you try looking at a histogram?

— cardinal

Your answer is now almost correct. If you generate $n$ iid $\mathrm{Exp}(1)$ and divide each by the sum, then you will get the correct distribution. See Dirichlet distribution for more details, though it doesn't discuss this explicitly.
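One way to see the gap is a small simulation. The sketch below is a Python transcription of the R snippet, written for illustration with hypothetical function names. It relies on the identity $\exp(\log(-\log u)) = -\log u$, so the snippet's last $n-1$ entries are $\mathrm{Exp}(1)$ draws while its first entry is the constant $\exp(0) = 1$. By symmetry, a uniform draw on the simplex has mean $1/n$ in every coordinate.

```python
import math
import random

def exp1(rng):
    """One Exp(1) draw; 1 - U is used so the logarithm never sees 0."""
    return -math.log(1.0 - rng.random())

def weights_first_entry_fixed(n, rng):
    """Transcription of the R snippet: leading entry exp(0) = 1, the rest Exp(1)."""
    ezz = [1.0] + [exp1(rng) for _ in range(n - 1)]
    total = sum(ezz)
    return [v / total for v in ezz]

def weights_all_exponential(n, rng):
    """The variant from this comment: all n entries are iid Exp(1)."""
    y = [exp1(rng) for _ in range(n)]
    total = sum(y)
    return [v / total for v in y]

rng = random.Random(3)  # fixed seed so the run is repeatable
trials = 50_000
mean_fixed = sum(weights_first_entry_fixed(3, rng)[0] for _ in range(trials)) / trials
mean_exp = sum(weights_all_exponential(3, rng)[0] for _ in range(trials)) / trials
print(mean_fixed, mean_exp)  # compare both with 1/3
```

With $n = 3$, the all-exponential version should average close to $1/3$ in its first coordinate, while the fixed-entry version comes out noticeably higher.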

— cardinal

Given the terminology you are using, you sound a little confused.

— cardinal

Actually, the Wiki link does discuss this (fairly) explicitly. See the second paragraph under the Support heading.

— cardinal

This characterization is both too restrictive and too general. It is too general in that the resulting distribution of $\mathbf{w}$ must be "uniform" on the $n-1$ simplex in $\mathbb{R}^n$. It is too restrictive in that the question is worded generally enough to allow that $\mathbf{w}$ be some function of an $n-1$-variate distribution, which in turn presumably, but not necessarily, consists of $n-1$ independent (and perhaps iid) variables.

— whuber

The solution is obvious. The following MATLAB code provides the answer for 3 weights.

    function [  ] = TESTGEN( )
    SZ = 1000;
    V  = zeros (1, 3);
    VS = zeros (SZ, 3);
    for NIT=1:SZ
        V(1) = rand (1,1);     % uniform generation on the range 0..1
        V(2) = rand (1,1) * (1 - V(1));
        V(3) = 1 - V(1) - V(2);
        PERM = randperm (3);   % random permutation of values 1,2,3
        for NID=1:3
            VS (NIT, NID) = V (PERM(NID));
        end
    end
    figure;
    scatter3 (VS(:, 1), VS(:,2), VS (:,3));
    end

— user96990

Your marginals do not have the correct distribution. Judging from the Wikipedia article on the Dirichlet distribution (random number generation section, which has the algorithm you have coded), you should be using a beta(1,2) distribution for V(1), not a uniform[0,1] distribution.
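A sketch of what the corrected stick-breaking scheme might look like, in standard-library Python rather than MATLAB (the function name is hypothetical). It assumes the standard sequential construction of a Dirichlet$(1,\ldots,1)$ vector: the $k$-th piece takes a $\mathrm{Beta}(1, n-k)$ fraction of whatever remains, which for $n = 3$ means $\mathrm{Beta}(1,2)$ for the first weight and $\mathrm{Beta}(1,1) = U(0,1)$ for the second.

```python
import random

def simplex_by_stick_breaking(n, rng):
    """Draw w by breaking off Beta(1, n - k) fractions of the remaining stick."""
    w = []
    remaining = 1.0
    for k in range(1, n):
        frac = rng.betavariate(1, n - k)  # Beta(1, 2) first when n = 3
        piece = remaining * frac
        w.append(piece)
        remaining -= piece
    w.append(remaining)  # the last weight is whatever is left
    return w

rng = random.Random(4)  # fixed seed so the run is repeatable
w = simplex_by_stick_breaking(3, rng)
print(w, sum(w))
```

With the Beta fractions in place, the random permutation step is no longer needed to symmetrize the coordinates.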

— soakley

It does appear that the density increases in the corners of this tilted triangle. Nonetheless, it provides a nice geometric display of the problem.

— DWin