Mẫu từ phân phối chuẩn nhưng bỏ qua tất cả các giá trị ngẫu nhiên nằm ngoài phạm vi đã chỉ định trước khi mô phỏng.
Phương pháp này là chính xác, nhưng, như @ Xi'an đã đề cập trong câu trả lời của mình, sẽ mất nhiều thời gian khi phạm vi nhỏ (chính xác hơn, khi số đo của nó nhỏ theo phân phối bình thường).
F−1(U)FU∼Unif(0,1). When F is the distribution obtained by truncating some distribution G on some interval (a,b), this is equivalent to sample G−1(U) with U∼Unif(G(a),G(b)).
However, and this is already mentionned by @Xi'an's in a comment, for some situations the inversion method requires a very precise evaluation of the quantile function G−1, and I would add it also requires a fast computation of G−1. When G is a normal distribution, the evaluation of G−1 is rather slow, and it is not highly precise for values of a and b outside the "range" of G.
Simulate a truncated distribution using importance sampling
A possibility is to use importance sampling. Consider the case of the standard Gaussian distribution N(0,1). Forget the previous notations, now let G be the Cauchy distribution. The two above mentionned requirements are fulfilled for G : one simply has G(q)=arctan(q)π+12 and G−1(q)=tan(π(q−12)). Therefore, the truncated Cauchy distribution is easy to sample by the inversion method and it is a good choice of the instrumental variable for importance sampling of the truncated normal distribution.
After a bit of simplifications, sampling U∼Unif(G(a),G(b)) and taking G−1(U) is equivalent to take tan(U′) with U′∼Unif(arctan(a),arctan(b)):
a <- 1
b <- 5
nsims <- 10^5
sims <- tan(runif(nsims, atan(a), atan(b)))
Now one has to calculate the weight for each sampled value xi, defined as the ratio ϕ(x)/g(x) of the two densities up to normalization, hence we can take
w(x)=exp(−x2/2)(1+x2),
but it could be safer to take the log-weights:
log_w <- -sims^2/2 + log1p(sims^2)
w <- exp(log_w) # unnormalized weights
w <- w/sum(w)
The weighted sample (xi,w(xi)) allows to estimate the measure of every interval [u,v] under the target distribution, by summing the weights of each sampled value falling inside the interval:
u <- 2; v<- 4
sum(w[sims>u & sims<v])
## [1] 0.1418
This provides an estimate of the target cumulative function.
We can quickly get and plot it with the spatsat
package:
F <- spatstat::ewcdf(sims,w)
# estimated F:
curve(F(x), from=a-0.1, to=b+0.1)
# true F:
curve((pnorm(x)-pnorm(a))/(pnorm(b)-pnorm(a)), add=TRUE, col="red")
# approximate probability of u<x<v:
F(v)-F(u)
## [1] 0.1418
Of course, the sample (xi) is definitely not a sample of the target distribution, but of the instrumental Cauchy distribution, and one gets a sample of the target distribution by performing weighted resampling, for instance using the multinomial sampling:
msample <- rmultinom(1, nsims, w)[,1]
resims <- rep(sims, times=msample)
hist(resims)
mean(resims>u & resims<v)
## [1] 0.1446
Another method: fast inverse transform sampling
Olver and Townsend developed a sampling method for a broad class of continuous distribution. It is implemented in the chebfun2 library for Matlab as well as the ApproxFun library for Julia. I have recently discovered this library and it sounds very promising (not only for random sampling). Basically this is the inversion method but using powerful approximations of the cdf and the inverse cdf. The input is the target density function up to normalization.
The sample is simply generated by the following code:
using ApproxFun
f = Fun(x -> exp(-x.^2./2), [1,5]);
nsims = 10^5;
x = sample(f,nsims);
As checked below, it yields an estimated measure of the interval [2,4] close to the one previously obtained by importance sampling:
sum((x.>2) & (x.<4))/nsims
## 0.14191