Course Project for Getting and Cleaning Data (Johns Hopkins University, Coursera)
This repo contains a single script, run_analysis.R. The script merges two data sets and creates a new tidy one. It works as follows:
- The script first loads the three train data files with `read.table()`. It also loads features.txt, which supplies the column names for the "x_train" data set.
- The column names of the "x_train" data set are replaced with the names loaded from features.txt. The first two columns are named "Activity_Label" and "Subject".
- Use `cbind()` to combine the three data sets into a complete train data set.
- Repeat the first three steps for the test data files to build a complete test data set.
- Use `rbind()` to combine the train and test data sets. The result is a single data set (`data1`) that meets the first requirement of the project.
- Use `grep()` to retrieve the `data1` columns whose names contain "mean()" or "std()". The mean and standard deviation of each measurement are kept in `data2`, along with the subjects and activity labels.
- Load activity_labels.txt as `actnames` and merge it with `data2` by activity label. This step translates activity labels into activity names and yields a new clean data set, `data4`.
- Use `aggregate()` to calculate the average of each variable for each activity and each subject. The result is assigned to `result`. This is the tidy data set required by the project.
- The script writes the result data set to a text file named `DataSet.txt`, which is included in this repo as `Result.txt`.
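The steps above can be sketched on a toy data set. This is a minimal illustration, not the contents of run_analysis.R: the helper `make_set()`, the random values, the `Activity_Name` column, and the temporary output path are assumptions made for the example. The object names `data1`, `data2`, `actnames`, `data4`, and `result` follow the description above.

```r
# Toy stand-ins for the UCI HAR files (made-up values, not the real data).
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")

# Hypothetical helper: builds one complete train or test set.
make_set <- function(subjects, activities, seed) {
  set.seed(seed)
  x <- as.data.frame(matrix(rnorm(length(subjects) * length(features)),
                            ncol = length(features)))
  names(x) <- features  # column names taken from the feature list
  # Combine activity labels, subjects and measurements column-wise.
  cbind(Activity_Label = activities, Subject = subjects, x)
}

train <- make_set(subjects = c(1, 1, 1, 1), activities = c(1, 1, 2, 2), seed = 1)
test  <- make_set(subjects = c(2, 2, 2, 2), activities = c(1, 1, 2, 2), seed = 2)

# Stack the train and test rows into one data set.
data1 <- rbind(train, test)

# Keep the two id columns plus the mean() and std() measurements.
keep  <- grep("mean\\(\\)|std\\(\\)", names(data1))
data2 <- data1[, c(1, 2, keep)]

# Translate activity labels into activity names.
actnames <- data.frame(Activity_Label = c(1, 2),
                       Activity_Name  = c("WALKING", "WALKING_UPSTAIRS"))
data4 <- merge(actnames, data2, by = "Activity_Label")

# Average every measurement for each activity and each subject.
measure_cols <- setdiff(names(data4),
                        c("Activity_Label", "Activity_Name", "Subject"))
result <- aggregate(data4[measure_cols],
                    by = list(Activity_Name = data4$Activity_Name,
                              Subject = data4$Subject),
                    FUN = mean)

# Write the tidy result to a text file (a temporary path is used here).
out <- file.path(tempdir(), "DataSet.txt")
write.table(result, out, row.names = FALSE)
```

In the toy data each subject performs each activity twice, so `aggregate()` collapses eight rows into four, one per activity and subject pair.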
- Unzip the data file from the project. You should get a folder named UCI HAR Dataset. Put it in your default R working directory together with run_analysis.R.
- This R script was developed in R v3.0.2 on Mac OS X 10.9.2. It should work regardless of your operating system.
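Before running the script, it may help to confirm that the working directory is set up as described in the notes above. The check below is a hypothetical convenience and is not part of run_analysis.R; the folder and file names are taken from the notes.

```r
# Hypothetical pre-flight check: are the data folder and the script
# both present in the current working directory?
required <- c("UCI HAR Dataset", "run_analysis.R")
present  <- file.exists(required)
names(present) <- required

if (all(present)) {
  message("Ready. Run the script with source(\"run_analysis.R\").")
} else {
  message("Missing from ", getwd(), ": ",
          paste(required[!present], collapse = ", "))
}
```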