-
Notifications
You must be signed in to change notification settings - Fork 11
/
README.Rmd
209 lines (146 loc) · 6.34 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
---
title: "Statistics Denmark StatBank API connection"
output: github_document
---
# dkstat <a href='https://ropengov.github.io/dkstat/'><img src='man/figures/logo.png' align="right" height="139" /></a>
<!-- badges: start -->
[![rOG-badge](https://ropengov.github.io/rogtemplate/reference/figures/ropengov-badge.svg)](http://ropengov.org/)
[![Codecov test coverage](https://codecov.io/gh/ropengov/dkstat/graph/badge.svg)](https://app.codecov.io/gh/ropengov/dkstat)
[![R-CMD-check](https://github.com/rOpenGov/dkstat/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/rOpenGov/dkstat/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
This package connects to the [StatBank](http://www.statistikbanken.dk/statbank5a/) API from [Statistics Denmark](http://www.dst.dk).
This package is in early *BETA* and new changes will most likely not have backward compatibility.
### A short message in Danish
Denne R Statistics pakke indeholder funktioner til at give dig adgang til data gennem API'en fra Danmarks Statistik.
Funktionerne henter data fra Statistikbanken og retunerer data.frame objekter med værdierne du spørger efter i dit funktionskald.
## Installation
```{r check_cran, include=FALSE}
if (!require(available)) install.packages("available")
is_on_cran <- !available::available_on_cran("dkstat")
```
```{r cran_instruct, echo=FALSE, results='asis', eval=is_on_cran}
cat("You can install `{dkstat}` from CRAN with:")
```
```{r cran_code, eval=FALSE, include=is_on_cran}
install.packages("dkstat")
```
```{r r_universe_instruct, echo=FALSE, results='asis'}
if (is_on_cran) {
cat("Or you can install the latest pre-release version of `{dkstat}` from r-universe with:")
} else if (!is_on_cran) {
cat("You can install `{dkstat}` from r-universe with:")
}
```
```{r r_universe, eval=FALSE}
install.packages(
"dkstat",
repos = c(
ropengov = "https://ropengov.r-universe.dev",
getOption("repos")
)
)
```
You can install the latest development version of `{dkstat}` from [GitHub](https://github.com/rOpenGov/dkstat) with:
``` r
# install.packages("devtools")
devtools::install_github("rOpenGov/dkstat")
```
## Examples
The default language is danish, but have got a lang parameter that you can change
from "da" to "en" if you wan't the data returned in English.
### Four basic function
There are four basic functions to learn:
- dst_search() This function makes it possible to search through the different tables for a word or a phrase.
- dst_tables() This function downloads all the possible tables available.
- dst_meta() This function lets you download the meta data for a specific table, so you can see the description,
unit, variables and values you can download data for.
- dst_get_data() lets you download the actual data you wan't.
Here are a few simple examples that will go through the basics of requesting data from the StatBank
and the structure of the output.
First, we'll load the package:
```{r, message=FALSE}
library(dkstat)
```
## The search function
The search function let's you.. OK, you might know this already.
Here I search for gdp in the text field of the tables.
```{r, message=FALSE}
dst_search(string = "bnp", field = "text")
```
## Download the tables
The dst_get_tables function downloads all the available tables that the search function use
when searching for a word or a phrase.
```{r, message=FALSE}
head(dst_get_tables(lang = "da"))
```
## Meta data
The dst_meta function retrieves meta data from the table you wan't to take a closer look at. It can be used to create the final request, but if you can figure out the structure of the query you can define it yourself.
We'll get some meta data from the [AULAAR table](http://www.statistikbanken.dk/AULAAR). The AULAAR table has net unemployment numbers.
```{r}
aulaar_meta <- dst_meta(table = "AULAAR", lang = "da")
```
The 'dst_meta' function returns a list with 4 objects:
- basics
- variables
- values
- basic_query
### Basics
Let's see what the basics contains:
```{r}
aulaar_meta$basics
```
There's a table id, a short description, a unit description and when the table was updated.
### Variables
The variables in the list has a short description of each variable as well as the id. You might want to make sure that
you have supplied all the ID's where the elimination columns is equal to FALSE. The IDs where eliminnation is equal
FALSE are mandatory.
```{r}
aulaar_meta$variables
```
### Values
The values is a list object of all the values in each variable. You use the text column to construct your final query:
```{r}
str(aulaar_meta$values)
```
## Get data
You need to build your query based on the text column that each variable contains in the meta_data$values list.
```{r}
aulaar <- dst_get_data(
table = "AULAAR", KØN = "Total", PERPCT = "Per cent of the labour force", Tid = 2013,
lang = "en"
)
str(aulaar)
```
In the request above I don't supply the meta_data to the dst_get_data function, but this is possible
as I will show below. It's a good idea to supply the meta data to the dst_get_data function if you
query the table more than once. If you don't supply the meta data the dst_get_data function will
request the meta data for the table and this will be very ineffecient.
Let's query the statbank using more than one value for each variable.
```{r}
folk1a_meta <- dst_meta("folk1a", lang = "da")
str(dst_get_data(
table = "folk1a",
Tid = "*",
CIVILSTAND = "*",
ALDER = "*",
OMRÅDE = c("Hele landet", "København", "Dragør", "Albertslund"),
lang = "da",
meta_data = folk1a_meta
))
```
I can also build a query beforehand and then use the query in the query parameter. This might
be a good way to split your script up into smaller pieces and make it more structured.
You might have noticed that I use the * as a value in the TID variable. You can use the star as a alternative
to writing all the text values for the variable.
```{r}
my_query <- list(
OMRÅDE = c("Hele landet", "København", "Frederiksberg", "Odense"),
CIVILSTAND = "Ugift",
TID = "*"
)
str(dst_get_data(table = "folk1a", query = my_query, lang = "da"))
str(dst_get_data(table = "AUP01", OMRÅDE = c("Hele landet"), TID = "*", lang = "da"))
```
If you run into problems, then try to set the parse_dst_tid parameter to FALSE as there
are few datasets with non-standard date formats.
Don't hesitate to submit an issue or question on github and I'll try to help as much as I can.