-
Notifications
You must be signed in to change notification settings - Fork 25
/
11-database-connections.Rmd
275 lines (204 loc) · 7.61 KB
/
11-database-connections.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
---
title: "Database connections in R"
subtitle: "Session 11"
institute: "NHS-R Community"
output:
xaringan::moon_reader:
css:
- default
- css/nhsr.css
- css/nhsr-fonts.css
lib_dir: libs
seal: false
self_contained: true
nature:
highlightStyle: googlecode
highlightLines: true
highlightLanguage: ["r"]
countIncrementalSlides: false
ratio: "16:9"
includes:
after_body: [css/insert-logo.html]
---
```{r libs, include=FALSE}
library(knitr)
library(tidyverse)
library(xaringan)
library(magick)
library(dplyr)
library(dbplyr)
library(icons)
library(xaringanExtra)
library(RSQLite)
xaringanExtra::use_share_again() # need to get the slide button on html view
opts_chunk$set(
echo = FALSE,
eval = FALSE,
message = FALSE,
warning = FALSE,
fig.width = 7.252,
fig.height = 4,
dpi = 300,
dev.args = list(type = "cairo")
)
#############################################################################
# Set up exercise database. This will be held in memory, using RSQLite
# You would likely be connecting to SQL Server, Oracle or similar in reality
#############################################################################
# Load beds_data.csv
beds <- read_csv("https://raw.githubusercontent.com/nhs-r-community/intro_r_data/26a2e39bd49d4aa95a8efc5b45c398386e6e4ed4/beds_data.csv",
col_types = cols(date = col_date(format = "%d/%m/%Y")),
skip = 3)
beds$year <- as.double(substring(beds$date,1,4))
beds$month <- as.double(substring(beds$date,6,7))
beds$day <- as.double(substring(beds$date,9,10))
# Set up the database connection
DBI_connection <- dbConnect(RSQLite::SQLite(), ":memory:")
# Write beds table to the database
dbWriteTable(DBI_connection, "MyTable", beds)
#Remove the imported data frame so we have to work on the database
rm(beds)
############################################################################
```
class: title-slide, left, bottom
# `r rmarkdown::metadata$title`
----
## **`r rmarkdown::metadata$subtitle`**
### `r rmarkdown::metadata$author`
### `r rmarkdown::metadata$date`
---
# Two common methods
There are two common methods of connection, both of which use Open Database Connectivity (ODBC) drivers:
1. The `RODBC` package (older method, not shown here)
2. The `DBI` system, using `dplyr,` `dbplyr` and `odbc`.
<br>
+ Both of these create a connection, using a 'connection string' with server details.
+ This can be used to create a connection object that you use in `R`.
+ We can use the object to pull data into `R` or manipulate it in the database.
--
<br>
For these examples, we will assume:
+ SQL Server connection, call 'MyServer'
+ A database called 'MyDB'
+ A table called 'MyTable'
---
# DBI connection
Requires a couple of packages, and knowing your connection details:
+ `DBI` - a common Database Interface engine for use in `S` and `R` (see <a href="https://cran.r-project.org/web/packages/DBI/vignettes/DBI-1.html">here</a>)
+ `odbc`- provides the odbc drivers, but you could use the functions below with other drivers instead.
```{r DBI1, message=FALSE, warning=FALSE, echo=TRUE}
library(DBI)
library(odbc)
DBI_connection <- dbConnect(odbc()
, Driver="SQL Server"
, Server="MyServer"
, database = "MyDB"
, trusted_connection="true"
)
```
--
+ `trusted_connection=true` passes your windows credentials to the server.
+ Sometimes this is: `Trusted_Connection` and `yes` or `no`, instead of `true` or `false`.
+ You can, instead, specify a username (`uid`) and a password (`pwd`)
---
# Using SQL & returning data to `R` (1)
+ Can write an SQL query directly using the `dbSendQuery` function.
+ Executes the query on the server-side only.
+ If you want the results back in `R`, you need to use `dbFetch` as well.
```{r DBIQuery, warning=FALSE, message=FALSE, echo=TRUE}
SomeRecords <-
dbFetch(
dbSendQuery(DBI_connection
, "Select TOP 100 * from MyTable"
)
)
#or
SomeRecords <-
dbSendQuery(DBI_connection
, "Select TOP 100 * from MyTable") %>%
dbFetch()
```
---
# Using tables in the database
Now we can define a table as if it was part of our `R` workspace, using the connection object and the names of the table in the database.
+ `dplyr` - to make the `tbl` and use it, we'll work with `dplyr` syntax.
+ `dbplyr` - this add-on package allows translation from `dplyr` to SQL.
--
```{r dplyr1, message=FALSE, warning=FALSE, collapse=TRUE, eval=TRUE, echo=TRUE}
MyTable <- tbl(DBI_connection, "MyTable")
```
```{r dplyr2, message=FALSE, warning=FALSE, collapse=TRUE, echo=TRUE}
# Or if your table was in a schema called 'ref', rather than default like'dbo'
MyTable_inschema <- tbl(DBI_connection, in_schema("ref", "MyTable"))
```
--
`glimpse` is a useful function that shows you a summary:
```{r dbplyr5, eval=TRUE, eval=TRUE, echo=TRUE}
glimpse(MyTable)
```
---
# Constructing `dplyr` query
+ We can then perform select or aggregate queries without translation.
+ Even though it returns results, the data are still in the database
```{r dplyr5, message=FALSE, warning=FALSE, collapse=FALSE, eval=TRUE, echo=TRUE}
library(dplyr)
library(dbplyr)
MyTable %>%
filter(year == 2015) %>%
group_by(org_name) %>%
summarise(AvgEvents = mean(beds_av)) %>%
arrange(month)
```
---
# Returning data to `R` with `dbplyr`
+ May need to pull the data from the server into memory in `R` sometimes.
+ Can do this with `collect()`
```{r dplyrCollect, warning=FALSE, message=FALSE, eval=TRUE, echo=TRUE}
MyTable_local<- MyTable %>%
filter(year == 2015) %>%
group_by(org_name) %>%
summarise(AvgEvents = mean(beds_av)) %>%
arrange(org_name) %>%
collect()
MyTable_local
```
---
# Writing to databases
You can also write back to a database using the `dbWriteTable` function.
+ For example:
+ Writing a new table current connection, call the table: 'NewDatabaseTable'
+ We are writing back the `R` `data.frame` called "MyTable_local" (that we created in the last section)
+ First time, `dbWriteTable` will create the table.
+ Subsequent uses, you can specify whether to `append` or `overwrite` options using TRUE/FALSE.
```{r dplyrWrite, warning=FALSE, message=FALSE, eval=FALSE, echo=TRUE}
dbWriteTable(DBI_connection
,"NewDatabaseTable"
, MyTable_local
, overwrite=TRUE)
```
---
class: middle, center
# Exercise:
### Follow the instructions in the `11_database_excerise.R` file
---
# Useful DBI commands
|Command | Summary |
| --------------|:------------------------------------------------------:|
|dbConnect() | Create a DBI connection object |
|dbListTables() | List the tables on the connection |
|dbListFields() | List the fields for a given table on a given connection|
|dbSendQuery() | Send a query to execute on the server/connection |
|dbFetch() | Fetch the results from the server/connection |
|dbWriteTable() | Write a table to the connection |
| | |
|tbl() | Set a table on the connection as a 'tibble' for `dplyr`|
|glimpse() | See a summary of the rows, data types and top rows |
|in_schema() | Use a named schema, other than 'dbo' (`dbplyr` package) |
---
#### This work is licensed as
</br> Creative Commons
</br> Attribution
</br> ShareAlike 4.0
</br> International
</br> To view a copy of this license, visit
</br> https://creativecommons.org/licenses/by/4.0/