Set the OPENCV_PATH environment variable in Eclipse's preferences (under C/C++ > Build > Environment).

Save your write-up as report.pdf and run create-submission.py in the project folder. This should create a zip archive named [sunet-id]-project-2.zip.

Your submission must be your own work. No external libraries, besides the ones already referenced by the starter code, may be used. We expect all students to adhere to the Stanford Honor Code.
Artsy is an augmented reality app for paintings. Here's a preview of what it can do:
There are two broad tasks demonstrated in the preview: classification and tracking. Artsy automatically classifies the painting using a combination of a convolutional neural network and support vector machines. Next, it tracks the painting as the user moves around. All the processing is done on the device in real time. The figure below shows the various components you'll be working on for this assignment. The shaded boxes denote the components that will be treated as black boxes.
Here's how Artsy works:
We'll be using a small set of 5 paintings for this assignment. It includes:
You'll need a color copy of a few of these paintings (in particular, of Convergence and Concetto Spaziale).
As discussed in class, convolutional neural networks (CNN) have recently dominated the field of image classification. Currently, most CNNs are trained using powerful GPUs over multiple days. Luckily, a single forward pass for classification is tractable on mobile platforms. However, it is not uncommon for these networks to take up over a gigabyte of memory, which becomes an issue on memory-constrained mobile platforms. To work around this, our classifier uses a variant of the network originally published by Krizhevsky, Sutskever, and Hinton, modified to work within our memory limitations. We achieve this reduction in memory usage at the cost of a slight drop in accuracy.
The CNN we will use was originally trained on the ImageNet dataset. However, we're interested in classifying 5 paintings. As it turns out, the features learned by a CNN are quite powerful and are often applicable for tasks beyond classifying their original training set (this paper on CNN features goes over some interesting results). Therefore, we will use the CNN as a feature extractor. The actual classification will be done using one-vs-all SVM classifiers. We have provided you with 5 pre-trained SVM classifiers, one for each painting.
In Classifier.cpp, follow the instructions and implement the classify function.
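At its core, classify reduces to running each one-vs-all SVM on the CNN feature vector and keeping the most confident one. Below is a minimal sketch, assuming OpenCV 2.4's CvSVM interface; the function signature, the features matrix, and the svms vector are placeholders for whatever the starter code actually provides:

```cpp
#include <limits>
#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/ml/ml.hpp>

// Hypothetical sketch: `features` is the CNN feature vector for the input
// frame, `svms` holds the 5 pre-trained one-vs-all classifiers.
int classify(const cv::Mat& features, const std::vector<CvSVM*>& svms) {
    int best_label = -1;
    float best_score = -std::numeric_limits<float>::max();
    for (size_t i = 0; i < svms.size(); ++i) {
        // With returnDFVal = true, predict() returns the signed distance to
        // the decision boundary rather than a class label. Depending on how
        // the SVMs were trained, the sign convention may need to be flipped.
        float score = svms[i]->predict(features, true);
        if (score > best_score) {
            best_score = score;
            best_label = static_cast<int>(i);
        }
    }
    return best_label;  // index of the most confident classifier
}
```

In practice you may also want a minimum-score threshold, so that frames containing none of the 5 paintings are rejected rather than assigned to the least-bad class.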
The tracking subsystem operates in one of the following states:
Initialization occurs when the user taps the screen. The flowchart below describes how we transition between the remaining states.
In KLTTracker.cpp, implement the initialize and track functions.
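The heart of track is a single call to cv::calcOpticalFlowPyrLK, followed by pruning the points that failed to track. A minimal sketch, with the tracker's state passed in explicitly for clarity (the starter code presumably stores the previous frame and point set as members under its own names):

```cpp
#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/video/tracking.hpp>

// Sketch of the core of KLTTracker::track().
void track_points(const cv::Mat& prev_gray, const cv::Mat& curr_gray,
                  std::vector<cv::Point2f>& prev_pts,
                  std::vector<cv::Point2f>& curr_pts) {
    std::vector<unsigned char> status;
    std::vector<float> error;
    // Pyramidal Lucas-Kanade: estimates each point's new location.
    cv::calcOpticalFlowPyrLK(prev_gray, curr_gray, prev_pts, curr_pts,
                             status, error);

    // Drop the points that failed to track; keep prev/curr aligned so a
    // homography can be fit between them later.
    std::vector<cv::Point2f> kept_prev, kept_curr;
    for (size_t i = 0; i < status.size(); ++i) {
        if (status[i]) {
            kept_prev.push_back(prev_pts[i]);
            kept_curr.push_back(curr_pts[i]);
        }
    }
    prev_pts.swap(kept_prev);
    curr_pts.swap(kept_curr);
}
```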
In PlaneTracker.cpp, implement the estimate_homography and track functions.
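For estimate_homography, cv::findHomography with RANSAC does the heavy lifting; the main details are checking that enough correspondences exist and keeping the inlier mask around for later. A sketch under those assumptions (parameter names are illustrative):

```cpp
#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/calib3d/calib3d.hpp>

// Sketch of estimate_homography(): fits H mapping the reference points to
// the current frame's tracked points.
cv::Mat estimate_homography(const std::vector<cv::Point2f>& ref_pts,
                            const std::vector<cv::Point2f>& curr_pts,
                            std::vector<unsigned char>& inlier_mask) {
    // A homography needs at least 4 correspondences.
    if (ref_pts.size() < 4 || ref_pts.size() != curr_pts.size())
        return cv::Mat();

    // RANSAC discards correspondences that disagree with the dominant
    // plane; 3.0 is the reprojection-error threshold in pixels.
    return cv::findHomography(ref_pts, curr_pts, CV_RANSAC, 3.0, inlier_mask);
}
```

The track function can then declare the tracker lost when the returned matrix is empty or the inlier count drops below some minimum.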
In Augmentor.cpp, implement the render_bounds function. You might find it useful to develop this using the provided test videos (in particular, mona-lisa.avi).
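One straightforward render_bounds: transform the reference image's four corners by the current homography and connect them. The signature and drawing style below are illustrative, not the starter code's:

```cpp
#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

// Sketch of render_bounds(): warps the reference image's corners into the
// current frame and outlines the result.
void render_bounds(cv::Mat& frame, const cv::Mat& H, const cv::Size& ref_size) {
    // The four corners of the reference painting, in its own coordinates.
    std::vector<cv::Point2f> corners;
    corners.push_back(cv::Point2f(0.0f, 0.0f));
    corners.push_back(cv::Point2f((float)ref_size.width, 0.0f));
    corners.push_back(cv::Point2f((float)ref_size.width, (float)ref_size.height));
    corners.push_back(cv::Point2f(0.0f, (float)ref_size.height));

    // Apply the homography: each corner x maps to H * x (up to scale).
    std::vector<cv::Point2f> projected;
    cv::perspectiveTransform(corners, projected, H);

    // Connect consecutive corners to draw the painting's boundary.
    for (size_t i = 0; i < projected.size(); ++i) {
        cv::line(frame, projected[i], projected[(i + 1) % projected.size()],
                 cv::Scalar(0, 255, 0), 2);
    }
}
```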
In ORBTracker.cpp, implement the initialize and track functions.
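Inside track, a brute-force Hamming matcher with Lowe's ratio test is a reasonable way to match the frame's ORB descriptors against the reference descriptors computed in initialize. A sketch (the 0.8 ratio is a typical choice, not a value from the starter code):

```cpp
#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>

// Sketch of the matching step inside ORBTracker::track(); names are
// placeholders for the starter code's.
void match_orb(const cv::Mat& ref_desc, const cv::Mat& frame_desc,
               std::vector<cv::DMatch>& good_matches) {
    // Hamming distance is the right metric for binary ORB descriptors.
    cv::BFMatcher matcher(cv::NORM_HAMMING);
    std::vector<std::vector<cv::DMatch> > knn_matches;
    matcher.knnMatch(ref_desc, frame_desc, knn_matches, 2);

    // Lowe's ratio test: keep a match only if it is clearly better than
    // the second-best candidate.
    for (size_t i = 0; i < knn_matches.size(); ++i) {
        if (knn_matches[i].size() == 2 &&
            knn_matches[i][0].distance < 0.8f * knn_matches[i][1].distance) {
            good_matches.push_back(knn_matches[i][0]);
        }
    }
}
```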
In PlaneTracker.cpp, update the track function to include the relocalization logic, as shown in the flowchart above. When re-initializing the KLT tracker, make sure you pass in the RANSAC inliers (as determined during the homography re-estimation following the ORB matching) as the initial points.
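The inlier mask produced during the homography re-estimation makes that re-initialization step mechanical. A small helper of the following shape (the name keep_inliers is hypothetical) is all that's needed:

```cpp
#include <vector>
#include <opencv2/core/core.hpp>

// Hypothetical helper: keeps only the ORB-matched frame points that
// cv::findHomography flagged as RANSAC inliers, so they can be handed to
// KLTTracker::initialize as the initial point set.
std::vector<cv::Point2f> keep_inliers(const std::vector<cv::Point2f>& pts,
                                      const std::vector<unsigned char>& mask) {
    std::vector<cv::Point2f> inliers;
    for (size_t i = 0; i < pts.size(); ++i) {
        if (mask[i]) inliers.push_back(pts[i]);
    }
    return inliers;
}
```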
Test your relocalization system using the mona-lisa-blur.avi test video. You should be able to successfully recover after the motion blur disruption.
Try varying the parameters passed to the calcOpticalFlowPyrLK function call. How does this affect the tracking? Provide an explanation for your observations.

Implement a relocalization system capable of handling mona-lisa-blur-extra-credit.avi. In particular, your tracking results should be reasonably good after the second blur towards the end.
In class, we discussed the Good Features to Track paper by Shi and Tomasi, and the accompanying corner detection algorithm (which is similar to the Harris corner algorithm). In this section, we will replace OpenCV's goodFeaturesToTrack function with one that calls our own version of the Harris corner detector.
The version we covered in class computes the eigenvalues of the second moment matrix:
$$ A = \sum_{x, y} w(x, y) \begin{bmatrix} I_x I_x & I_xI_y \\ I_x I_y & I_y I_y \end{bmatrix} $$
However, computing the eigenvalues $\lambda_1$ and $\lambda_2$ can be expensive. As a result, most practical implementations use the following scoring function instead:
\begin{align} S &= \lambda_1 \lambda_2 - \kappa \cdot (\lambda_1 + \lambda_2)^2 \\ &= \text{det}(A) - \kappa \cdot \text{trace}^2(A) \end{align}
where $\kappa$ is an empirical constant, typically between 0.04 and 0.06.
The score can now be utilized as follows: a large positive $S$ indicates a corner, a negative $S$ indicates an edge, and a small $|S|$ indicates a flat region. In practice, we threshold $S$ and apply non-maximum suppression to select the final corner locations.
In KLTTracker.cpp, implement the harris_corner_detector function. Set use_my_harris_detector = true in the initialize function to test your implementation.
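Putting the pieces together, here is one possible shape for harris_corner_detector: Sobel gradients, Gaussian-weighted second-moment products, the score above, then thresholding and 3×3 non-maximum suppression. The interface below (an output vector, a max_corners cap, a default $\kappa = 0.04$, and the 0.01-of-max threshold) is an assumption about the starter code, not its actual signature:

```cpp
#include <algorithm>
#include <utility>
#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

// Orders candidates by score, strongest first.
struct ScoreGreater {
    bool operator()(const std::pair<float, cv::Point2f>& a,
                    const std::pair<float, cv::Point2f>& b) const {
        return a.first > b.first;
    }
};

void harris_corner_detector(const cv::Mat& gray,
                            std::vector<cv::Point2f>& corners,
                            int max_corners, float kappa = 0.04f) {
    // Image gradients I_x and I_y.
    cv::Mat Ix, Iy;
    cv::Sobel(gray, Ix, CV_32F, 1, 0);
    cv::Sobel(gray, Iy, CV_32F, 0, 1);

    // Entries of the second moment matrix A, with a Gaussian as w(x, y).
    cv::Mat Ixx, Iyy, Ixy;
    cv::GaussianBlur(Ix.mul(Ix), Ixx, cv::Size(5, 5), 1.0);
    cv::GaussianBlur(Iy.mul(Iy), Iyy, cv::Size(5, 5), 1.0);
    cv::GaussianBlur(Ix.mul(Iy), Ixy, cv::Size(5, 5), 1.0);

    // S = det(A) - kappa * trace(A)^2, evaluated at every pixel.
    cv::Mat det = Ixx.mul(Iyy) - Ixy.mul(Ixy);
    cv::Mat trace = Ixx + Iyy;
    cv::Mat score = det - kappa * trace.mul(trace);

    // 3x3 non-maximum suppression: a pixel survives only if it equals the
    // maximum of its neighborhood in the dilated score map.
    cv::Mat dilated;
    cv::dilate(score, dilated, cv::Mat());

    double max_score;
    cv::minMaxLoc(score, 0, &max_score);
    const float threshold = 0.01f * (float)max_score;  // typical heuristic

    std::vector<std::pair<float, cv::Point2f> > candidates;
    for (int y = 0; y < score.rows; ++y) {
        for (int x = 0; x < score.cols; ++x) {
            float s = score.at<float>(y, x);
            if (s > threshold && s >= dilated.at<float>(y, x)) {
                candidates.push_back(
                    std::make_pair(s, cv::Point2f((float)x, (float)y)));
            }
        }
    }

    // Keep the strongest max_corners responses.
    std::sort(candidates.begin(), candidates.end(), ScoreGreater());
    corners.clear();
    for (size_t i = 0; i < candidates.size() && (int)i < max_corners; ++i) {
        corners.push_back(candidates[i].second);
    }
}
```

Note that goodFeaturesToTrack ranks corners by Shi-Tomasi's $\min(\lambda_1, \lambda_2)$ rather than the Harris score, so the two detectors will select slightly different point sets.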