Google Cloud Run volume mounting #2133
@loveeklund-osttra Lack of atomic renames may be a problem in certain cases. In your case, AFAIK, you will always start with clean storage, so you should not have problems with half-committed files from previous runs. You can also look at #2131: if you can read your data in the same order, you'll be able to extract it chunk by chunk.
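To illustrate why atomic renames matter: a common way to avoid half-written files is to write to a temporary file and then rename it into place. On a local POSIX filesystem `os.replace` swaps the file in atomically, so a reader sees either the old file or the complete new one. A Cloud Storage FUSE mount does not guarantee this (see the volume-mount limitations page linked in the issue description), so a crash during the rename could leave a partial file behind. The sketch below shows the pattern using only the standard library; `atomic_write` is a hypothetical helper, and dlt's internals may work differently.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write data to path so readers never see a partially written file.

    Relies on os.replace being atomic, which holds on local POSIX
    filesystems but is NOT guaranteed on a Cloud Storage FUSE mount.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on
    # one filesystem (a cross-device rename would fail).
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes reach storage first
        os.replace(tmp_path, path)  # the single "commit" step
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the rename step is not atomic, the "commit" can be interrupted halfway, which is exactly the half-committed state that starting from clean storage avoids.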
Perfect, then I'll continue using it! Thanks for the response! I don't think it starts with an empty folder if I just mount a volume (it uses the same directory between runs, and I can see the previous runs' content in storage after they complete), but I call … If you decide to add this to the documentation, it may be worth mentioning that you probably want to up the … I did it like this in Terraform.
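Because a mounted bucket keeps its contents between runs, the "clean storage" assumption only holds if the job clears the pipeline's working directory itself. One way to do that is sketched below, assuming dlt keeps each pipeline's files under `<pipelines_dir>/<pipeline_name>` (verify this for your dlt version) and that nothing needs those files between runs. `reset_pipeline_dir` is a hypothetical helper, not a dlt API, and not necessarily what the commenter calls.

```python
import os
import shutil


def reset_pipeline_dir(pipelines_dir: str, pipeline_name: str) -> bool:
    """Delete leftovers of earlier runs of one pipeline on the mount.

    Assumes the pipeline's files live in <pipelines_dir>/<pipeline_name>.
    Only that subdirectory is removed; other pipelines are left alone.
    Returns True if something was deleted.
    """
    working_dir = os.path.join(pipelines_dir, pipeline_name)
    if os.path.isdir(working_dir):
        # Also discards any half-committed files a failed run left behind.
        shutil.rmtree(working_dir)
        return True
    return False
```

Run this before creating the pipeline. Note that it also throws away any locally stored state, so the pipeline must be able to recover (or not need) that state on the next run.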
Also described here. Re #2131
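The point about reading data "in the same order" can be illustrated with keyset pagination: if every query is ordered by a stable key, the last key of a chunk is enough to fetch the next one (or to resume a later run), and only one chunk is held in memory at a time. This generic sketch uses an in-memory SQLite table named `events` purely as a stand-in source; the table, column names, and `extract_in_chunks` are hypothetical and not taken from #2131. A generator like this could back a dlt resource that yields each chunk as a list of rows.

```python
import sqlite3
from typing import Iterator, List, Optional, Tuple

Row = Tuple[int, str]


def extract_in_chunks(
    conn: sqlite3.Connection, chunk_size: int, start_after: Optional[int] = None
) -> Iterator[List[Row]]:
    """Yield rows of the `events` table in primary-key order, one chunk at a time.

    Because every query is ordered by `id`, the last id of a chunk is
    enough to continue; a later run can pass it as `start_after`.
    """
    last_id = start_after
    while True:
        if last_id is None:
            cur = conn.execute(
                "SELECT id, payload FROM events ORDER BY id LIMIT ?", (chunk_size,)
            )
        else:
            cur = conn.execute(
                "SELECT id, payload FROM events WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, chunk_size),
            )
        chunk = cur.fetchall()
        if not chunk:
            return
        yield chunk
        last_id = chunk[-1][0]  # remember where this chunk ended


# Usage: seven rows read three at a time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [(i, f"row {i}") for i in range(1, 8)])
for chunk in extract_in_chunks(conn, chunk_size=3):
    print([row[0] for row in chunk])  # [1, 2, 3], then [4, 5, 6], then [7]
```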
Documentation description
I run dlt in Google Cloud Run and have noticed that when I load big tables the job can run out of memory (OOM), even though dlt writes intermediate data to files: Cloud Run has no "real" disk, so its writable filesystem is held in memory. What I've been doing instead is mounting a Cloud Storage bucket and passing its path as `pipelines_dir`, so the pipeline uses it as its working directory. This works well in the cases I've tested. However, mounting a storage bucket as a directory has limitations, listed here: https://cloud.google.com/run/docs/configuring/jobs/cloud-storage-volume-mounts . It would be good if someone who knows how dlt works under the hood could check whether these limitations might cause issues (for example, if two or more processes or threads write to the same file). If they don't, I think it would be nice to include a section about this on
https://dlthub.com/docs/walkthroughs/deploy-a-pipeline/deploy-with-google-cloud-run
to help others in the future.
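A minimal sketch of the setup described above, assuming the bucket is mounted at a path supplied in a hypothetical `PIPELINES_MOUNT` environment variable (for example `/mnt/dlt`). The helper fails fast if the mount is missing or not writable, so the job does not silently fall back to Cloud Run's in-memory filesystem. The pipeline name, destination, and dataset name are placeholders.

```python
import os
import tempfile


def resolve_pipelines_dir(env_var: str = "PIPELINES_MOUNT") -> str:
    """Return the mounted directory to use as dlt's pipelines_dir.

    Raises RuntimeError if the variable is unset or the path is not a
    directory, and OSError if a probe file cannot be written there.
    """
    path = os.environ.get(env_var)
    if not path:
        raise RuntimeError(f"{env_var} is not set")
    if not os.path.isdir(path):
        raise RuntimeError(f"{path} is not a directory (is the volume mounted?)")
    # Create and immediately delete a probe file to confirm the mount is writable.
    with tempfile.NamedTemporaryFile(dir=path, prefix=".probe-"):
        pass
    return path


def make_pipeline():
    import dlt  # imported here so the helper above works without dlt installed

    return dlt.pipeline(
        pipeline_name="my_pipeline",  # placeholder
        destination="bigquery",  # placeholder destination
        dataset_name="my_dataset",  # placeholder
        pipelines_dir=resolve_pipelines_dir(),  # working files go to the bucket
    )
```

Keeping the check separate from pipeline creation makes it easy to run at container start-up, before any data is extracted.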
Are you a dlt user?
Yes, I run dlt in production.