-
Notifications
You must be signed in to change notification settings - Fork 0
/
04-import.Rmd
109 lines (86 loc) · 3.41 KB
/
04-import.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
# Importing Data
```{r, include=FALSE}
source("_common.R", local = knitr::knit_global())
```
## Prerequisites
In this chapter, we focus on the use of the `{readr}` package that forms a part
of the `{tidyverse}` package. `{readr}` provides a fast and easy way to parse a
file because it uses a sophisticated parser that guesses the data type for each
column along with the flexibility to specify what to parse. You will also need
the `{here}` package (to easily provide relative file paths) and the
`{lubridate}` package (to work with dates and times).
```{r, message=FALSE}
library(tidyverse)
library(lubridate)
library(here)
```
In this chapter we will also be using the following datasets. If the following
code returns `FALSE`, you should refer to the section [Finding your file] in the
[Preface] on where you can download the datasets.
```{r, eval=FALSE}
file.exists(here("data", "iris.csv"))
file.exists(here("data", "building.csv"))
```
## Finding your file
Before importing data, you need to first let R know where to find the file by
providing the file path. You can provide file paths relative to the top-level
directory of the current R project with the `here()` function.
```{r}
here()
```
To reference a file named `iris.csv` located in the `data` folder that is
located in the top-level directory of your project:
```{r}
here("data", "iris.csv")
```
## Parsing a csv file
We will focus on the `read_csv()` function because building data are typically
stored in `csv` format. You can find out more about other file formats in
[readr's documentation](https://readr.tidyverse.org).
The easiest way to parse a file is by providing the file path. In most cases, it
will just work as expected where `{readr}` correctly guesses the column
specification (that gets printed to the console) and you get a tibble as
specified.
[`readr_example()`](https://readr.tidyverse.org/reference/readr_example.html) is
a function that makes it easy to access example files to demonstrate `{readr}'`s
capabilities.
```{r}
read_csv(here("data", "iris.csv"))
```
## Building data
Data from buildings are often messy without a standardized format. Therefore, it
is not uncommon if `{readr}` does not guess correctly. Parsing date/times is
particularly important when working with time-series building data.
For example, with the `building.csv` dataset, you can see that there are
problems parsing the `timestamp` column. Specifically, column `timestamp` was
parsed as a character vector when it actually contains a date/time. The `skip`
argument is particularly useful since building data often contains meta-data
that you want to exclude. A character vector can also be supplied to specify the
column names.
```{r}
read_csv(
here("data", "building.csv"),
col_names = c(
"datetime",
"power"
),
skip = 2
)
```
You can correct this by taking advantage of the `{lubridate}` package that was
covered in Chapter \@ref(datetime). To recap, you can transform a character
vector into a datetime object by specifying the order of the year `y`, month
`m`, day `d`, hour `h`, minute `m` and second `s` with the date (`y`, `m`, and
`d`) and the time (`h`, `m`, and `s`) separated by an underscore `_`.
```{r}
bldg <- read_csv(
here("data", "building.csv"),
col_names = c(
"datetime",
"power"
),
skip = 2
)
# transform the datetime character vector into a datetime object using {lubridate}
bldg$datetime <- dmy_hm(bldg$datetime)
```