Markerless Tracking/Select

From Wikiversity

This page is devoted to the question of how to select the correct data set in the context of Markerless Tracking.

The Problem

After generating a whole bunch of virtual data sets, it is necessary to decide which one best matches the captured input data. The match will likely not be 100%, because we generate only a discrete number of data sets and the data generator is imperfect. In addition, optimization algorithms need good hints about where promising new samples should be generated. Therefore the comparison must yield a smooth, low-frequency similarity function whose peak lies at the perfect match.
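The selection step itself can be sketched as a simple search over the generated candidates. This is only a minimal illustration: the function name `select_best` and the `distance` callable are hypothetical, and the sketch assumes a score where smaller means more similar (that is, the negative of the similarity function described above).

```python
def select_best(candidates, captured, distance):
    """Return (index, score) of the candidate closest to the captured data.

    `distance` is a hypothetical callable: distance(candidate, captured) -> float,
    where a smaller value means a better match (assumption of this sketch).
    """
    best_index, best_score = None, float("inf")
    for index, candidate in enumerate(candidates):
        score = distance(candidate, captured)
        if score < best_score:
            best_index, best_score = index, score
    return best_index, best_score
```

Returning the score together with the index lets an optimization algorithm use the value as a hint for where to generate the next samples.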

Comparing Images

When do two images look most alike? Of course, when they are the same. In this case their pixel colors match exactly at all pixel positions. For most images, a small offset or a missing part in one image will still leave a high proportion of equal or mostly equal pixels. Therefore it is best to compute the mean distance of all pixels in color space:


 \mbox{distance} = \frac{
        \sum_{i,j}^{image\ size} \left | \mbox{rendering}(i,j) - \mbox{capturing}(i,j) \right | 
      }{
        \#\mbox{pixels}
      }
(example source code)

This formula is not well suited to comparing non-realistic images such as a one-pixel black line on a white background, where an offset of a single pixel already removes all overlap. In the camera tracking context, however, we are dealing with areas of color.
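The mean color-space distance defined above could be computed along the following lines. This is a hedged sketch using NumPy, not the page's original example code: the function name `mean_color_distance` is hypothetical, and taking the Euclidean norm over the color channels as the per-pixel distance is an assumption, since the formula does not fix a particular norm.

```python
import numpy as np


def mean_color_distance(rendering, capturing):
    """Mean per-pixel color distance between two images of equal shape."""
    a = np.asarray(rendering, dtype=np.float64)
    b = np.asarray(capturing, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    if a.ndim == 2:
        # Grayscale image: plain absolute difference per pixel.
        per_pixel = np.abs(a - b)
    else:
        # Color image: Euclidean distance over the channel axis (assumption).
        per_pixel = np.linalg.norm(a - b, axis=-1)
    # Sum over all pixels divided by the number of pixels.
    return float(per_pixel.mean())
```

Identical images give a distance of zero, and the value grows gradually as colors drift apart, which is the kind of smooth behavior an optimizer needs.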

Ignoring Unknown Pixels

In practical setups it is not possible and/or desirable to generate the whole real environment. Often it is sufficient to track an object against an arbitrary background. We have come up with a trick in the image comparison for this case:

Only the object is rendered, and all other pixels are marked invalid (with a special color or an alpha of zero). The comparison can then be carried out between only the valid generated pixels and the corresponding pixels of the real image.
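The masking trick can be sketched as follows, assuming the rendering is an RGBA array whose alpha channel is zero for invalid pixels and the captured image is an RGB array of the same height and width. The function name `masked_mean_color_distance`, the Euclidean per-pixel distance, and returning infinity when nothing was rendered are all assumptions of this sketch.

```python
import numpy as np


def masked_mean_color_distance(rendering_rgba, capturing_rgb):
    """Mean color distance over the pixels where the rendering is valid."""
    rendering = np.asarray(rendering_rgba, dtype=np.float64)
    capturing = np.asarray(capturing_rgb, dtype=np.float64)
    # A pixel is valid if its alpha value is non-zero.
    valid = rendering[..., 3] > 0
    if not valid.any():
        # No object pixels were rendered, so no comparison is possible.
        return float("inf")
    # Boolean indexing keeps only the valid pixels, shape (n_valid, 3).
    diff = rendering[..., :3][valid] - capturing[valid]
    return float(np.linalg.norm(diff, axis=-1).mean())
```

Dividing by the number of valid pixels rather than by the full image size is one way to keep scores comparable between candidates in which the object covers different areas.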

Other Algorithms

Most other popular image processing techniques (e.g. edge detectors, SIFT features, ...) are not a good choice because they try to be independent of certain variables (rotation, color, scaling, ...). This can easily result in false evidence of similarity. It may be possible to read out the rotation and scaling information of the SIFT matches and feed it back into the next round of data generation, but this would be tricky to integrate with an optimization algorithm.
