Skip to content

Small R utility to quickly download Redshift data into CSV

Notifications You must be signed in to change notification settings

abrenaut/redshift-r-quick-read

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 

Repository files navigation

Downloading large amount of data from Redshift using RPostgreSQL::dbGetQuery can be pretty slow.

This utility allows to quickly download the result of a Redshift SELECT query and create a data frame from it.

Instead of fetching n rows at a time from the database, it:

  • Unloads the result of the SELECT query to compressed csv files on S3
  • Downloads the csv files locally using the aws cli
  • Reads the csv files and creates a data frame from it

Requirements

  • aws cli installed and configured
  • The following R packages: install.packages('uuid', 'RPostgreSQL')
  • A S3 bucket
  • A AWS role associated to the Redshift cluster with write permission on the bucket
  • The AWS access key and secret of a user with read permission on the bucket

Usage

library(RPostgreSQL)
source("db.R")

drv <- dbDriver("PostgreSQL")
con <- dbConnect(
  drv,
  dbname   = "db_name",
  host     = "host",
  port     = "port",
  user     = "user",
  password = "password"
)

sql.query         <- "SELECT * FROM my_table"
s3.bucket         <- "bucket_name"
redshift.iam.role <- "arn:aws:iam::123456:role/RoleName"
aws.access.key    <- "AWS_ACCESS_KEY_ID"
aws.secret.key    <- "AWS_SECRET_ACCESS_KEY"

df <- ReadSqlResults(con, sql.query, aws.access.key, aws.secret.key, s3.bucket, redshift.iam.role)
head(df)

About

Small R utility to quickly download Redshift data into CSV

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages