Abstract
When analyzing data, we often hope to find hidden variables that are independent of one another, so that we can clearly see how each variable explains the data. Most of the time, however, the observable variables are mixtures of the underlying ones, or we do not even know what the underlying variables are. Another common situation is having a large number of measured variables of which only a few actually contribute to the data. The Singular Value Decomposition (SVD) and its variant, Principal Component Analysis (PCA), illustrated here give us a powerful tool to separate data into orthogonal variables and to reduce the data to a low-dimensional representation.
Introduction and Overview
Here, we use the following example problem to illustrate how SVD and its variant PCA can be used to reduce redundancy, find independent variables, and denoise data.
Problem Description
We have four tests at hand to demonstrate the abilities of SVD and its variant PCA. During each test, the movement of a spring-mass system is recorded on video from three different angles. With three cameras we have redundancy, which needs to be addressed. Another typical problem during recording is the noise introduced by unstable cameras, which makes the computation more difficult. Moreover, the coordinate system the spring-mass system lives in may differ from the camera frames (e.g., the system might move along the diagonal of the frame), so we need a way to track a specific point on the system.
General Approach
For each test, we first gather the pixel coordinates of a tracked point on the oscillating paint can from each camera's video. We then stack the coordinates from all three cameras into one measurement matrix, compute its covariance matrix, and apply the SVD to extract the principal components, which reveal how many independent modes actually drive the motion and allow us to discard redundancy and noise.
Theoretical Background
In our textbook \cite{kutz_2013}, a decomposition of a matrix $A$ is introduced:
\begin{equation}
    A = \hat{U}\hat{\Sigma}V^*
\end{equation}
where, for $A \in \mathbb{C}^{m \times n}$ with $m \ge n$, $\hat{U} \in \mathbb{C}^{m \times n}$ has orthonormal columns, $\hat{\Sigma} \in \mathbb{R}^{n \times n}$ is diagonal with nonnegative entries $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_n \ge 0$ (the singular values), and $V \in \mathbb{C}^{n \times n}$ is unitary. This is the reduced SVD; each singular value measures how much of the data's energy the corresponding mode carries, which is exactly what PCA exploits to rank and truncate the modes.
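As a concrete illustration, the reduced SVD can be computed and checked numerically; the following is a minimal numpy sketch (not code from this report's implementation):
\begin{verbatim}
import numpy as np

# A small data matrix: m = 6 measurements (rows), n = 4 snapshots (columns).
A = np.random.randn(6, 4)

# full_matrices=False returns the reduced SVD:
# U_hat is 6x4 with orthonormal columns, sigma holds the 4 singular
# values in decreasing order, and Vh is the 4x4 matrix V*.
U_hat, sigma, Vh = np.linalg.svd(A, full_matrices=False)

assert np.allclose(U_hat.T @ U_hat, np.eye(4))      # orthonormal columns
assert np.all(np.diff(sigma) <= 0)                  # sorted singular values
assert np.allclose(A, U_hat @ np.diag(sigma) @ Vh)  # A = U_hat Sigma_hat V*
\end{verbatim}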
Algorithm Implementation and Development
We illustrate here how the first test is processed; the remaining tests are handled in the same way. A sketch of the full pipeline in code follows the steps below.
Load the first test's three camera recordings into $cam_{11}$, $cam_{21}$, $cam_{31}$
Inspect the lengths of the three videos and trim them to the same number of frames
Convert each frame to grayscale to save computation later (the data is also converted to \texttt{uint8}, since that is the type expected by the functions used later)
Algorithm 1 is used on all three cameras' data to find and track a point on the paint can through every frame:
\FOR{each camera's data $k = a, b, c$}
\STATE{Detect salient features of the paint can in the first frame and initialize three of them as starting points to be tracked through the following frames}
\FOR{$j = 2:numFrames$}
\STATE{Get the updated locations of the tracked points in frame $j$}
\STATE{Append the new locations to each point's stored trajectory}
\ENDFOR
\STATE{Manually check which point was tracked correctly and store the best trajectory into $XY_k$}
\ENDFOR
Store the best trajectories $XY_1, XY_2, XY_3$ from the three cameras as the rows of a single measurement matrix $X$ (an $x$ row and a $y$ row per camera), and subtract each row's mean
Obtain the covariance matrix $C_X = \frac{1}{n-1}XX^T$, where $n$ is the number of frames
Apply the SVD to $X/\sqrt{n-1}$, so that the squared singular values give the variance captured by each principal component
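Below is a compact sketch of this pipeline in Python. It is illustrative only: the report's own implementation differs, and the tracking step here assumes OpenCV's optical-flow tracker plus hypothetical variables \texttt{frames1}, \texttt{frames2}, \texttt{frames3} holding the trimmed grayscale frames of the three cameras.
\begin{verbatim}
import numpy as np
import cv2

def track_point(frames):
    """Track one feature through a list of grayscale uint8 frames;
    returns its (x, y) trajectory as an array of shape (numFrames, 2)."""
    # Pick a strong corner feature in the first frame to follow.
    p0 = cv2.goodFeaturesToTrack(frames[0], maxCorners=1,
                                 qualityLevel=0.3, minDistance=7)
    traj = [p0.reshape(2)]
    for j in range(1, len(frames)):
        # Lucas-Kanade optical flow updates the point's location.
        p1, status, err = cv2.calcOpticalFlowPyrLK(
            frames[j - 1], frames[j], p0, None)
        p0 = p1
        traj.append(p1.reshape(2))
    return np.array(traj)

# One trajectory per camera; frames1/2/3 are assumed to be lists of
# grayscale uint8 frames already trimmed to the same length n.
xy1, xy2, xy3 = (track_point(f) for f in (frames1, frames2, frames3))

# Stack into the 6 x n measurement matrix and subtract each row's mean.
X = np.vstack([xy1.T, xy2.T, xy3.T]).astype(float)
X -= X.mean(axis=1, keepdims=True)

n = X.shape[1]
Cx = X @ X.T / (n - 1)                # 6 x 6 covariance matrix
U, sigma, Vh = np.linalg.svd(X / np.sqrt(n - 1), full_matrices=False)
energy = sigma**2 / np.sum(sigma**2)  # fraction of variance per mode
\end{verbatim}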
Computational Results
Test 1: Ideal case
First, as we can see in the covariance matrix (Table~\ref{tab:test1_cov}), the large off-diagonal entries show that the coordinates tracking the vertical oscillation are strongly correlated across the three cameras, which is exactly the redundancy we expected from recording the same motion three times.
Next, in Figure~\ref{fig:test1_eigen} we can see that the first mode is dominant; Figure~\ref{fig:test1_modes} further shows that the first two modes already give a very good reconstruction of the motion.
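In terms of the decomposition above, keeping "the first two modes" simply means a rank-2 projection of the measurement matrix; a minimal sketch, assuming the $X$ and $U$ computed in the pipeline sketch earlier:
\begin{verbatim}
r = 2
# Project the mean-subtracted data onto the first r principal directions
# and map back; X_r is the best rank-r approximation of X.
X_r = U[:, :r] @ (U[:, :r].T @ X)
\end{verbatim}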
Test 2: Noisy case
As we can see in the covariance matrix (Table~\ref{tab:test2_cov}), compared to test 1 the variance is spread more broadly across the coordinates, since the camera shake adds motion that is not part of the oscillation.
Next, in Figure~\ref{fig:test2_eigen} we can see that the first three modes are still dominant, though less sharply than in the first test because of the noise. Figure~\ref{fig:test2_modes} shows that the first two modes nevertheless still give a good reconstruction of the motion.
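This flattening of the singular-value spectrum under noise is easy to reproduce on synthetic data; the toy example below (hypothetical, not the camera data) builds a rank-1 signal and prints the normalized singular values with and without noise:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 200)
# A rank-1 "clean" signal: six channels all driven by one oscillation.
clean = rng.standard_normal((6, 1)) @ np.sin(2 * np.pi * t)[None, :]

for noise in (0.0, 0.5):
    noisy = clean + noise * rng.standard_normal(clean.shape)
    sigma = np.linalg.svd(noisy, compute_uv=False)
    # With noise = 0 the energy sits entirely in the first mode;
    # with noise it leaks into the remaining modes.
    print(noise, np.round(sigma / sigma.sum(), 3))
\end{verbatim}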
Test 3: Horizontal Displacement
As we can see in the covariance matrix (Table~\ref{tab:test3_cov}), compared to test 1 the variance in the horizontal coordinates is much larger, since the paint can is now also displaced in the horizontal direction.
Next, in Figure~\ref{fig:test3_eigen} we can see that the first three modes are still dominant, though not as strongly as in the first test. Figure~\ref{fig:test3_modes} shows that the first two modes still give a good reconstruction of the motion.
Test 4: Horizontal Displacement and Rotation
As we can see in the covariance matrix (Table~\ref{tab:test4_cov}), compared to tests 1 and 3 the variance in the horizontal coordinates remains large, and the rotation adds further variation that is not aligned with a single axis.
Next, in Figure~\ref{fig:test4_eigen} we can see that the first three modes are still dominant, though not as strongly as in the first test. Figure~\ref{fig:test4_modes} shows that the first two modes still give a good reconstruction of the motion.
Summary and Conclusions
As the four tests show, PCA does a very good job of reducing the dimension of the data: in every case, the first two or three modes represent the full data almost perfectly.