Computing the p-value in the paired bootstrap

I was reading a recent paper from the Berkeley NLP group on statistical testing, An Empirical Investigation of Statistical Significance in NLP.

The paper gives pseudocode for computing the p-value. Basically, the idea is that a set of samples $x_1,x_2,...,x_N$ is drawn with replacement from the data $x$. Then

$\text{p-value} = \text{count}(\delta(x_i) > 2\delta(x))/N$, where $\delta(x_i)$ is the metric gain on sample $x_i$.
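For concreteness, the procedure described above might be sketched as follows. This is only a minimal sketch under assumptions that are not in the paper: the metric is taken to be a mean of per-example scores (so the gain is the mean paired difference), and the function name `paired_bootstrap_p_value` and its parameters are hypothetical. It also returns the sign-based count $\delta(x_i) < 0$ for comparison.

```python
import numpy as np


def paired_bootstrap_p_value(scores_a, scores_b, num_samples=10_000, seed=0):
    """Paired bootstrap sketch (hypothetical helper, not the paper's code).

    Assumes the metric is the mean of per-example scores, so the gain
    delta(x) is the mean of the paired differences A - B.
    """
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    if scores_a.ndim != 1 or scores_a.shape != scores_b.shape:
        raise ValueError("scores_a and scores_b must be 1-D and the same length")

    rng = np.random.default_rng(seed)
    n = scores_a.shape[0]
    diffs = scores_a - scores_b          # paired per-example differences
    delta_x = diffs.mean()               # observed gain delta(x)

    # Each row is one bootstrap sample x_i: n indices drawn with replacement.
    idx = rng.integers(0, n, size=(num_samples, n))
    delta_xi = diffs[idx].mean(axis=1)   # gain delta(x_i) on each sample

    # Recentered criterion from the question: count(delta(x_i) > 2 * delta(x)) / N
    p_recentered = float(np.mean(delta_xi > 2 * delta_x))
    # Sign-based criterion for comparison: count(delta(x_i) < 0) / N
    p_sign = float(np.mean(delta_xi < 0))
    return float(delta_x), p_recentered, p_sign
```

Because both systems are resampled with the same indices, the pairing between their outputs on each example is preserved, which is what makes the bootstrap "paired".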

I can understand the formula for the p-value in Koehn's paper Statistical Significance Tests for Machine Translation Evaluation, where:

$\text{p-value} = \text{count}(\delta_a(x_i) < \delta_b(x_i))/N$, where $\delta_a$ and $\delta_b$ are the metric gains for systems $a$ and $b$ respectively.

Is there any explanation or reference for the formula $\text{p-value} = \text{count}(\delta(x_i) > 2\delta(x))/N$? The authors also note that if the mean of $\delta(x_i)$ is $\delta(x)$ and the distribution of $\delta(x_i)$ is symmetric, then the two formulas above are equivalent.

hypothesis-testing bootstrap p-value

— Ke Tran

As far as I understand from looking at Section 2, the authors explain their rationale for the bootstrap test as follows:

"the $x_i$ were sampled from $x$ , and so their average $\delta(x_i)$ won’t be zero like the null hypothesis demands; the average will instead be around $\delta(x)$ ... The solution is a re-centering of the mean – we want to know how often $A$ does more than $\delta(x)$ better than expected. We expect it to beat $B$ by $\delta(x)$ . Therefore, we count up how many of the $x_i$ have $A$ beating $B$ by at least $\delta(x)$ ."

The authors want to test whether the gain is non-zero, so the p-value counts how often $\delta(x_i) > 2\delta(x)$, which can be rewritten as $\delta(x_i) - \delta(x) > \delta(x)$. Because $E[\delta(x_i)]=\delta(x)$, the left-hand side is the bootstrap gain re-centered to have mean zero, as $H_0$ demands. The count therefore estimates how often a gain at least as large as the observed $\delta(x)$ would arise under the $H_0$ they are seeking to reject.
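The equivalence with Koehn's formula that the question mentions can be sketched in a few lines. This assumes the distribution of $\delta(x_i)$ is exactly symmetric about $\delta(x)$, and it reads Koehn's criterion $\delta_a(x_i) < \delta_b(x_i)$ as $\delta(x_i) < 0$, with $\delta(x_i) = \delta_a(x_i) - \delta_b(x_i)$ (my notation, not taken from either paper):

$$
\begin{aligned}
\Pr\big(\delta(x_i) > 2\delta(x)\big)
  &= \Pr\big(\delta(x_i) - \delta(x) > \delta(x)\big) \\
  &= \Pr\big(\delta(x_i) - \delta(x) < -\delta(x)\big) && \text{(symmetry about } \delta(x)\text{)} \\
  &= \Pr\big(\delta(x_i) < 0\big).
\end{aligned}
$$

With a finite number of bootstrap samples the two counts only agree approximately, and they can differ noticeably when the bootstrap distribution is skewed.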

— Sameer