
PCA

Abstract

In data analysis, we often hope to find hidden variables that are independent of each other, so that we can clearly see how each one explains the data. Most of the time, however, the obvious variables in the data are mixed together, or we do not even know what the underlying variables are. Another common situation is that we have a large number of variables, but only a few of them actually contribute to the data. Singular Value Decomposition (SVD) and its variant Principal Component Analysis (PCA), illustrated here, give us a powerful tool to separate the data into orthogonal variables and to reduce it to a low-dimensional representation.

Introduction and Overview

Here, we use the following example problem to illustrate how SVD and its variant PCA can be used to reduce redundancy, find independent variables, and denoise the data.

Problem Description

We have four tests at hand to demonstrate the ability of SVD and its variant PCA. In each test, the movement of a spring-mass system is recorded on video from three different angles. With three cameras, we are likely to have redundancy, which needs to be addressed. Another typical problem during recording is the noise introduced by unstable cameras, which makes the computation more difficult. Finally, the coordinate system in which the spring-mass system lives may differ from the camera frames (e.g., the system may move along the diagonal of a frame), so we need a way to track a specific point on the system.

General Approach

For each test, we first gather the $x, y$ coordinates, $x_a, y_a, x_b, y_b, x_c, y_c$, of a chosen point on the spring-mass system over the recording time from the three cameras' data (we use vision.PointTracker to do so here). We then calculate the covariance matrix of the six variables to inspect which variables are of interest and which are redundant. SVD is then applied to the resulting data matrix $X$, and we analyze the decomposition.

Theoretical Background

In our textbook \cite{kutz_2013}, a way to decompose a matrix $A$ is introduced:

$$A = \hat{U}\hat{\Sigma}V^*$$

where $\hat{U}$ has orthonormal columns, $V$ is a unitary matrix, and $\hat{\Sigma}$ is a diagonal matrix. This decomposition exists for any matrix $A$. The basic idea of SVD is to "stretch", "rotate", and "compress" the data so that the resulting variables become independent. The idea of PCA is that, in this basis, we can rank the variables by how much variance they carry, and thus reduce the number of variables according to how important they are to the data.
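To make the connection between SVD and PCA explicit, here is the standard argument (assuming each row of the data matrix $X$ has had its mean subtracted, with $n$ the number of observations, and writing the SVD as $X = U\Sigma V^*$):

$$Cov_X = \frac{1}{n-1}XX^* = \frac{1}{n-1}U\Sigma V^* V\Sigma U^* = \frac{1}{n-1}U\Sigma^2 U^*$$

so the columns of $U$ diagonalize the covariance matrix, and the variance captured by mode $j$ is $\sigma_j^2/(n-1)$. Ranking modes by singular value is therefore the same as ranking them by variance.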

Algorithm Implementation and Development

We illustrate here how the first test is done, and the rest of the tests are done in the same way.

Load the first test's three camera recordings into $cam_{11}, cam_{21}, cam_{31}$

Inspect the lengths of the three videos and trim them to the same length

Convert each frame to grayscale to save computing power later (the data is also converted to uint8, since that is the type expected by the functions used later)
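A minimal MATLAB sketch of these preprocessing steps, assuming the raw recordings are stored as H x W x 3 x frames uint8 arrays; the file and variable names below (e.g. cam1_1.mat, vidFrames1_1) are illustrative guesses, not necessarily the exact names used in this repository:

```matlab
% Load the three recordings of the first test (names assumed for illustration).
load('cam1_1.mat');   % provides vidFrames1_1: H x W x 3 x nFrames uint8
load('cam2_1.mat');   % provides vidFrames2_1
load('cam3_1.mat');   % provides vidFrames3_1

% Trim all three recordings to the same number of frames.
numFrames = min([size(vidFrames1_1,4), size(vidFrames2_1,4), size(vidFrames3_1,4)]);

% Convert each frame to grayscale uint8 to reduce later computation.
gray1 = zeros(size(vidFrames1_1,1), size(vidFrames1_1,2), numFrames, 'uint8');
for j = 1:numFrames
    gray1(:,:,j) = rgb2gray(vidFrames1_1(:,:,:,j));
end
% ... repeat for the other two cameras to obtain gray2 and gray3.
```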

Algorithm 1 below is applied to all three cameras' data to find $x_a, y_a$; $x_b, y_b$; $x_c, y_c$ for each frame and store them in $XY_a, XY_b, XY_c$; a runnable MATLAB sketch of this procedure is given after the pseudocode

    for k = [a, b, c]
        Detect important features of the paint can in the first frame and initialize three of them as starting points to be tracked through the later frames
        for j = 2:numFrames
            Get the updated locations of the tracked points in frame j
            Append the new locations to the stored locations of each point
        end for
        Manually check which point's track is correct and store the best one in $XY_k$
    end for
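Below is a minimal MATLAB sketch of Algorithm 1 for a single camera, using the Computer Vision Toolbox point tracker mentioned earlier; the grayscale frame stack grayFrames and the region of interest roi around the paint can are assumed inputs for illustration:

```matlab
% Detect candidate features on the paint can in the first frame.
firstFrame = grayFrames(:,:,1);
roi = [300 200 100 150];                        % assumed [x y width height] around the can
corners = detectMinEigenFeatures(firstFrame, 'ROI', roi);
points  = corners.selectStrongest(3).Location;  % three candidate points to track

% Initialize the tracker and follow the points through the remaining frames.
tracker = vision.PointTracker('MaxBidirectionalError', 2);
initialize(tracker, points, firstFrame);

XY = zeros(numFrames, 2, 3);                    % (frame, x/y, candidate point)
XY(1,:,:) = reshape(points', 1, 2, 3);
for j = 2:numFrames
    [locs, ~] = tracker(grayFrames(:,:,j));     % updated locations in frame j
    XY(j,:,:) = reshape(locs', 1, 2, 3);
end
% Manually inspect the three tracks and keep the best one, e.g. XY_a = XY(:,:,1)'.
```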

Store $XY_a, XY_b, XY_c$ in $X$, i.e., $X = [XY_a; XY_b; XY_c]$. Now $X$ is a 6-by-numFrames matrix

Obtain the covariance matrix $Cov_X$ of $X'$ and inspect it: variables with large variance suggest the dynamics of interest, while pairs whose covariance is almost as large as their variances indicate redundancy
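In MATLAB this step can be as simple as the following (cov expects observations in rows, hence the transpose of $X$):

```matlab
CovX = cov(X');   % 6-by-6 covariance matrix of the six tracked coordinates
```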

Apply SVD to the matrix $X$ to obtain $U, S, V$. Inspect $S$ to see which modes contain the most energy; inspect $U$ and $V$ to see how the modes are composed from the original variables and how they evolve in time; and inspect the rank-$j$ approximation $X_{approx} = U_j S_j V_j'$ (keeping only the first $j$ modes) to see how well those modes represent the whole data.
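A minimal MATLAB sketch of this step; the mean subtraction and the choice $j = 2$ are illustrative assumptions:

```matlab
Xc = X - mean(X, 2);             % subtract each variable's mean across frames
[U, S, V] = svd(Xc, 'econ');     % reduced SVD: U is 6x6, S is 6x6, V is numFrames-by-6

sigma  = diag(S);
energy = sigma.^2 / sum(sigma.^2);             % fraction of total energy in each mode

j = 2;                                         % number of modes to keep
Xapprox = U(:,1:j) * S(1:j,1:j) * V(:,1:j)';   % rank-j approximation of Xc
```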

Computational Results

Test 1: Ideal case

First, as we can see in the covariance matrix in Table~\ref{tab:test1_cov}, off the diagonal the covariance between $Y_a$ and $X_c$ is very large and very close to the variance of $X_c$, so we should be aware that $X_c$ might be redundant. $Y_a$, $Y_b$ and $X_c$ have very large variances, suggesting they carry the dynamics of interest.

Next, in Figure~\ref{fig:test1_eigen} we can see that the first mode is dominant; Figure~\ref{fig:test1_modes} further shows that the first two modes already give quite good results for $Y_a, Y_b, X_c$, confirming that the first two modes dominate.

[Figures 1-3: Test 1 results]

Test 2: Noisy case

As we can see in the covariance matrix in Table~\ref{tab:test2_cov}, compared to test 1 the variances of $X_a, X_b, Y_c$ have increased, while some of the variances of $Y_a, Y_b, X_c$ have decreased. This is the result of the noise introduced during recording. As before, off the diagonal the covariance between $Y_a$ and $X_c$ is very large and very close to the variance of $X_c$. $Y_a$, $Y_b$ and $X_c$ have very large variances, again suggesting they carry the dynamics of interest.

Next, in Figure~\ref{fig:test2_eigen} we can see that the first three modes are still dominant, though not as clearly as in the first test because of the noise. Figure~\ref{fig:test2_modes} shows that the first two modes still give quite good results for $Y_a, Y_b, X_c$, which demonstrates how well PCA performs in this case even with noise.

[Figures 4-6: Test 2 results]

Test 3: Horizontal Displacement

As we can see in the covariance matrix in Table~\ref{tab:test3_cov}, compared to test 1 the variances of $X_a, X_b, Y_c$ have increased, while some of the variances of $Y_a, Y_b, X_c$ have decreased. This is the result of the horizontal displacement.

Next, in Figure~\ref{fig:test3_eigen} we can see that the first three modes are still dominant, though not as clearly as in the first test. Figure~\ref{fig:test3_modes} shows that the first two modes still give quite good results for $Y_a, Y_b, X_c$. We can also notice that the horizontal movement shows up very strongly in the second camera's recording.

[Figures 7-9: Test 3 results]

Test 4: Horizontal Displacement and Rotation

As we can see in the covariance matrix in Table~\ref{tab:test4_cov}, compared to tests 1 and 3 the variances of $X_a, X_b, Y_c$ have increased considerably, while some of the variances of $Y_a, Y_b, X_c$ have decreased. This is the result of the horizontal displacement and rotation.

Next, in Figure~\ref{fig:test4_eigen} we can see that the first three modes are still dominant, though not as clearly as in the first test. Figure~\ref{fig:test4_modes} shows that the first two modes still give quite good results for $Y_a, Y_b, X_c$, but to get good results for $X_a, X_b, Y_c$ we might need modes 4 or 5. This is because most of the variance is still in $Y_a, Y_b, X_c$, and the first few modes focus on explaining those variables.

[Figures 10-12: Test 4 results]

Summary and Conclusions

As we can see from the four tests, PCA does a very good job of reducing the dimensionality: the first two or three modes represent the whole dataset almost perfectly.
