Monte Carlo Integration

Content summary

A brief introduction to Monte Carlo integration and a few optimization techniques.

Goals

This learning project offers learning activities to Monte Carlo integration. A student should be able to effectively apply Monte Carlo methods to integrate basic functions over set boundaries and apply some level of optimizations to a given problem.

Concepts to learn include: /concepts

Learning materials

Texts

[1] Numerical Mathematics and Computing Chapter 12

Lessons

• Lesson 1: Introduction to Monte Carlo Integration

Monte Carlo methods use random samplings to approximate probability distributions. This

technique has applications from weather prediction to quantum mechanics.

One use for Monte Carlo methods is in the approximation of integrals. This is done by

choosing some number of random points over the desired interval and summing the function

evaluations at these points. The area of the desired interval is then multiplied by the

average function evaluation from the chosen points.

(1) $\int_a^b{f(x)} \approx$ $(b-a) * \sum_1^N{f(x_{n})}\over{N}$


This technique can further be implemented in multiple dimensions where the process becomes

more useful. A rigorous evaluation of this technique finds the error approximately $1\over\sqrt(N)$ [1] which means

$O{({1\over\sqrt{n}})}$ convergence. This is not very useful in the one dimension case above, as better techniques exist, but

since the error is not bounded by the number of dimensions evaluated, when many dimensional

integrals are evaluated, Monte Carlo methods can become increasingly effective.

The following Matlab script will approximate a triple integral for a given function and

boundary.

%F is the vectorized function to be evaluated
%bound is a vector representing the x,y,z bounds to the integrals
%N is the number of samples to use in the approximation
%e.g. MonteCarlo(inline('x.*y.*z'), [0 1 0 1 0 1], 1000)
function est = MonteCarlo(F, bound, N)
B = bound;
R = rand(3, N);
%Set the random samplings to the correct intervals
R(1, :) = (B(2)-B(1))*R(1, :)+ B(1);
R(2, :) = R(2, :)*(B(4) - B(3)) + B(3);
R(3, :) = R(3, :)*(B(6) - B(5)) + B(5);
Volume = (B(2)-B(1))*(B(4)-B(3))*(B(6)-B(5));
s = feval(F, R(1,:), R(2,:), R(3,:));
total = sum(s);
avgF = total/N;
Approx = avgF*Volume;
fprintf('Approximation: %f', Approx);


The next step in Monte Carlo integration is to optimize the evaluation to more accurately

and quickly determine the integral. There are numerous techniques for improving on (1). The

following require some pre-existing knowledge about the function being evaluated:

• Lesson 2: Control Variates

This technique breaks the function being evaluated into pieces in which one or more pieces

have known integral values or are more easily evaluated than the original function. In this

way, the random samplings will be more prevalent with the difficult part of the function

and not be wasted on the already known piece.
e.g.
let

$f(x) = e^{-x^2} + sin{(x)}$

over the interval of 0 to $2\pi$ then,

 $\int_0^{2\pi}{f(x)}$ $=$ $\int_0^{2\pi}{e^{-x^2}} + \int_0^{2\pi}{sin{(x)}}$ $=$ $\int_0^{2\pi}{e^{-x^2}} + 0$

since sin x is an odd function, its integral is 0 on this interval. By decreasing the variance of the

function being integrated, the approximated answer will be more accurate [2].

• Lesson 3: Stratified Sampling

This technique relies on breaking the desired interval into multiple sections and evaluating

the Monte Carlo integration on each section individually. In this way, the more important

sections, i.e. the intervals where f(x) gives its greatest contribution to the integral, are

able to receive more random samplings in approximating their integral values. This will

allow the more important sections to contribute more accurately to the final integral.

e.g.

let $f(x) = x^3$ over the interval [0,1]One can easily show $\int_0^1{x^3} = .25$.
It can also be seen that $\int_{.8}^1{x^3} = .1476$.
This shows that the interval of x from .8 to 1 makes up almost
60% of the integral value. Clearly this section is more important than
when x is between 0 and .8. It logically follows that a more accurate
approximation can be made if more of the samplings are chosen between .8
and 1 than the rest of the interval.


The Monte Carlo method then becomes the following, given that the interval is broken into m sections:

(2)    $\int_a^b{f(x)} \approx$ $\sum_{i=1}^{m}{Length_i * \sum_{j=1}^{N_i}{f(x_j)}\over{N_i}}$
where Length_i is the length of the ith section, N_i is the number of random samplings
chosen from the ith section.


If the interval lengths and number of samples for each length are chosen correctly,

this method can dramatically decrease the error in approximating the integral.

• Lesson 4: Importance Sampling

The previous technique showed that using a non-uniform distribution of random points can

lead to better samplings and more accurate approximations of an integral. Importance

sampling is an extension of this technique, but instead of using grids to split the

interval, a distribution function is used for choosing the random points. Now when picking

the sampling points to evaluate the integral, the points must conform to some distribution

function which ideally approximates the desired function.

i.e.

given a distribution function p(x) which simulates f(x).
Pick N random numbers s.t. the density of the points conforms to p(x)

p(x) = x^2 distribution

e.g.

 if p(x) is x^2 on the interval [0,1] then more of the sampling points should
appear closer to x=1 than x=0.


With the non-uniform distribution of random points, the equation approximating the integral

must be revisited. Now, given a distribution of random points with a density of p(x), the

approximation of the integral becomes:

(3)        $\int_a^b{f(x)} \approx$$(b-a) * \sum_1^N{f(x_{n})\over{p(x_{n})}}\over{N}$ [2]


Assignments

Activities

• Activity 1.

Approximate Pi using Monte Carlo integration and Matlab. [hint: $x^2 + y^2 = r^2$ forms a circle]

• Activity 2.

Extend the given matlab code from the introduction to perform integration on an arbitrary number of dimensions, rather than just three.