redshift_to_pandas write to S3 #24

Open
Gauravshah opened this issue Jun 20, 2018 · 3 comments

Comments

@Gauravshah

Expected Behaviour

It would be more efficient if redshift_to_pandas first unloaded the query result to S3 and then read it from there.

Actual Behaviour

redshift_to_pandas connects directly to Redshift and reads the result over the wire.

Implications

If a large dataset (>100 million rows) is being downloaded, one of Redshift's threads is occupied serving this user for the whole transfer.
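A minimal sketch of the read half of the proposed flow, assuming the query result has already been unloaded as pipe-delimited text parts with no header row (pipe is UNLOAD's default delimiter) and that the values contain no embedded pipes or newlines. The function name read_unloaded_parts is hypothetical and not part of pandas_redshift. The parts are passed in as file-like objects so the sketch runs without AWS; listing and downloading them from S3 (e.g. with boto3) is left out.

```python
import io
from typing import Iterable, List, TextIO

import pandas as pd


def read_unloaded_parts(parts: Iterable[TextIO], columns: List[str],
                        delimiter: str = "|") -> pd.DataFrame:
    """Concatenate headerless UNLOAD part files into one DataFrame.

    Hypothetical helper. Assumes delimited text with no header row,
    no embedded delimiters/newlines, and empty fields meaning NULL.
    """
    frames = []
    for part in parts:
        try:
            frame = pd.read_csv(part, sep=delimiter, header=None, names=columns)
        except pd.errors.EmptyDataError:
            # An empty part file may raise here; skip it.
            continue
        if not frame.empty:
            # An empty part may also parse to an empty frame; skip that too.
            frames.append(frame)
    if not frames:
        # No rows at all: return an empty frame with the expected columns.
        return pd.DataFrame(columns=columns)
    # ignore_index gives one continuous 0..n-1 index across all parts.
    return pd.concat(frames, ignore_index=True)


if __name__ == "__main__":
    parts = [io.StringIO("1|a\n2|b\n"), io.StringIO(""), io.StringIO("3|\n4|d\n")]
    print(read_unloaded_parts(parts, ["id", "name"]))
```

Because the client pulls the files from S3 instead of holding a query connection open, the Redshift side is only busy for the duration of the UNLOAD itself.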

@yaojiach
Collaborator

yaojiach commented Jul 8, 2018

It sounds like a separate concern. If you are using Python >= 3.6, you can try https://github.com/yaojiach/red-panda/blob/master/red_panda/red_panda.py#L664

@PabTorre
Contributor

The syntax on the Redshift side shouldn't be too hard to integrate:
https://docs.aws.amazon.com/redshift/latest/dg/t_Unloading_tables.html
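A rough sketch of how the UNLOAD statement from the linked docs could be generated from the query that redshift_to_pandas already receives. The helper name build_unload_sql is hypothetical, and authorizing with IAM_ROLE is an assumption (UNLOAD also accepts key-based credentials). Per the UNLOAD reference, single quotes inside the quoted query are doubled.

```python
def build_unload_sql(query: str, s3_prefix: str, iam_role: str) -> str:
    """Wrap a SELECT in a Redshift UNLOAD statement (hypothetical helper)."""
    if not s3_prefix.startswith("s3://"):
        raise ValueError("s3_prefix must start with 's3://'")
    # A trailing semicolon inside UNLOAD ('...') would break the statement.
    cleaned = query.strip().rstrip(";")
    # Inside the quoted query, literal single quotes are doubled.
    escaped = cleaned.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "DELIMITER AS '|'\n"  # explicit, although pipe is the default
        "ALLOWOVERWRITE\n"    # replace files left over from a previous run
        "PARALLEL ON;"        # one or more part files per slice
    )


if __name__ == "__main__":
    print(build_unload_sql(
        "select * from venue where venuestate='NV';",
        "s3://mybucket/unload/venue_",
        "arn:aws:iam::0123456789012:role/MyRedshiftRole",
    ))
```

The resulting string could be executed through the already open Redshift connection (for example with a psycopg2 cursor's execute), after which the part files under the prefix can be downloaded and concatenated into a DataFrame.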

@agawronski
Owner

Will look into this.


4 participants