---
title: "Reproducible Research: Peer Assessment 1"
output: 
  html_document:
    keep_md: true
---


## Loading and preprocessing the data

```{r echo = TRUE}
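# Use an English locale so that weekday abbreviations ("Sat", "Sun") match below;
# the locale name is platform-specific and may need adjusting
Sys.setlocale(category = "LC_ALL", locale = "en_US.utf8")
library(dplyr)
library(lattice)
unzip("activity.zip")
activity <- read.csv("activity.csv")
# as_tibble() replaces the deprecated tbl_df()
activity_df <- as_tibble(activity)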
```


## What is mean total number of steps taken per day?

```{r steps_perday, echo=TRUE}
activity_perday <- summarise(group_by(activity_df, date), steps = sum(steps))
barplot(activity_perday$steps, names.arg = activity_perday$date,
        ylab = "Number of Steps",
        main = "Total Number of Steps Taken per Day"
        )
```

The mean of the total number of steps per day is **`r mean(activity_perday$steps, na.rm = TRUE)`**, while the median is **`r median(activity_perday$steps, na.rm = TRUE)`**.
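
The `na.rm = TRUE` argument matters here because `sum(steps)` returns `NA` for any day that contains a missing value, so such days drop out of the mean and median entirely. The following is a minimal sketch of that behaviour in base R; the `toy` data frame and its values are hypothetical and are not taken from `activity.csv`.

```r
# Hypothetical toy data: day "d2" contains one missing value
toy <- data.frame(
  date  = c("d1", "d1", "d2", "d2", "d3", "d3"),
  steps = c(10, 20, NA, 5, 30, 40)
)

# sum() without na.rm propagates NA, so the total for "d2" is NA
toy_perday <- tapply(toy$steps, toy$date, sum)

# With na.rm = TRUE, mean() and median() skip the NA day altogether
toy_mean   <- mean(toy_perday, na.rm = TRUE)
toy_median <- median(toy_perday, na.rm = TRUE)
```

In this sketch only the two complete days contribute to `toy_mean` and `toy_median`; the partial data recorded for `d2` is ignored rather than counted as a low total.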

## What is the average daily activity pattern?

```{r average_interval, echo = TRUE}
activity_interval <- summarise(group_by(activity_df, interval), steps = mean(steps, na.rm = TRUE))
plot(activity_interval$interval, activity_interval$steps,
        type = "l",
        ylab = "Number of Steps",
        xlab = "Time Intervals",
        main = "Average Daily Steps by Time (5 Mins per Interval)"
        )
```

Across all days, the 5-minute interval with the highest average number of steps is **`r filter(activity_interval, steps == max(steps))$interval`**, with an average of **`r filter(activity_interval, steps == max(steps))$steps`** steps.
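
The interval identifier is easier to read as a clock time. The sketch below assumes that identifiers encode the time of day as `HHMM` (so that 835 would mean 08:35); this is an assumption about the dataset, and the `toy_interval` values are made up for illustration.

```r
# Hypothetical interval averages; identifiers are assumed to encode HHMM
toy_interval <- data.frame(
  interval = c(0, 5, 830, 835, 840),
  steps    = c(2, 0, 40, 55, 48)
)

# which.max() returns the index of the first maximum
peak <- toy_interval[which.max(toy_interval$steps), ]

# Zero-pad to four digits, then insert a colon between hours and minutes
hhmm      <- sprintf("%04d", as.integer(peak$interval))
peak_time <- paste0(substr(hhmm, 1, 2), ":", substr(hhmm, 3, 4))
```

Using `which.max()` returns a single row even when several intervals tie for the maximum, whereas a filter on `steps == max(steps)` returns all of them.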

## Imputing missing values

The total number of rows with missing values is **`r sum(!complete.cases(activity_df))`**. 
Each missing value is filled with the mean number of steps for its 5-minute interval.

```{r echo = TRUE}
activity_filled <- activity_df

# Look up the interval mean for every row, then overwrite only the missing entries
interval_means <- activity_interval$steps[match(activity_filled$interval, activity_interval$interval)]
missing <- is.na(activity_filled$steps)
activity_filled$steps[missing] <- interval_means[missing]
rm(interval_means, missing)
```

The bar plot below shows the total number of steps per day in the filled dataset.

```{r filled_data,echo = TRUE}
activity_filled_perday <- summarise(group_by(activity_filled, date), steps = sum(steps))
barplot(activity_filled_perday$steps, names.arg = activity_filled_perday$date,
        ylab = "Number of Steps",
        main = "Total Number of Steps Taken per Day (Filled)"
        )
```

The mean of the filled daily totals is **`r mean(activity_filled_perday$steps, na.rm = TRUE)`**, while the median is **`r median(activity_filled_perday$steps, na.rm = TRUE)`**. Because each missing value is replaced by its interval mean, these figures should differ little from those of the original dataset.
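
A small sketch can illustrate why the mean barely moves. It assumes that missing values cover whole days, which is an assumption about the data rather than something this report checks. The `toy` data frame is hypothetical.

```r
# Hypothetical toy data: 3 days x 2 intervals, day "d2" entirely missing
toy <- data.frame(
  date     = rep(c("d1", "d2", "d3"), each = 2),
  interval = rep(c(0, 5), times = 3),
  steps    = c(10, 20, NA, NA, 30, 40)
)

# Mean steps per interval, ignoring missing values
interval_mean <- tapply(toy$steps, toy$interval, mean, na.rm = TRUE)

# Replace each NA with the mean of its interval (looked up by name)
filled  <- toy
missing <- is.na(filled$steps)
filled$steps[missing] <- interval_mean[as.character(filled$interval[missing])]

# Daily totals before and after filling
before <- tapply(toy$steps, toy$date, sum)        # "d2" is NA
after  <- tapply(filled$steps, filled$date, sum)  # "d2" is now filled

mean_before <- mean(before, na.rm = TRUE)
mean_after  <- mean(after)
```

Under this assumption, a filled day's total equals the sum of the interval means, which is the mean daily total of the complete days, so `mean_before` and `mean_after` agree. The median need not behave the same way, since the filled days add new values at the centre of the distribution.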

## Are there differences in activity patterns between weekdays and weekends?

```{r weekend,echo = TRUE}

# Label each row as "weekend" or "weekday" in one vectorised step
activity_week <- mutate(activity_filled,
                        week = ifelse(weekdays(as.Date(date), abbreviate = TRUE) %in% c("Sat", "Sun"),
                                      "weekend", "weekday"))

activity_week_interval <- summarise(group_by(activity_week, interval, week), steps = mean(steps))
xyplot(steps ~ interval | week, data = activity_week_interval,
       layout = c(1, 2),
       type = "l",
       ylab = "Number of Steps",
       xlab = "Time Intervals",
       main = "Average Daily Steps by Time (5 Mins per Interval)"
       )
```

The chart above suggests that activity patterns do differ between weekdays and weekends.
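
The weekday/weekend split in this report relies on English weekday abbreviations and therefore on the locale. As a possible alternative, the sketch below classifies dates by weekday number, which does not depend on the locale. The `dates` vector is a hypothetical sample, not the report's data.

```r
# Hypothetical sample: 2012-10-05 is a Friday, 2012-10-08 a Monday
dates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))

# POSIXlt stores the weekday as a number: 0 = Sunday, ..., 6 = Saturday
wday <- as.POSIXlt(dates)$wday

# Days numbered 0 or 6 are weekend days; everything else is a weekday
week <- factor(ifelse(wday %in% c(0, 6), "weekend", "weekday"),
               levels = c("weekday", "weekend"))
```

Storing the result as a factor with explicit levels also fixes the panel order if the variable is later used for conditioning in a lattice plot.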