Debug ETL on managed infrastructure, i.e. [new] staging #226

Open
mattfullerton opened this issue Nov 7, 2016 · 8 comments
@mattfullerton
Contributor

Because ideally this should work here too

@mattfullerton mattfullerton self-assigned this Nov 7, 2016
@mattfullerton
Contributor Author

@davidmihalyi Have you had any recent experience with this? I.e. can we do a full CH load and larger GS loads on staging.resourceprojects.org?

@davidmihalyi

Still not able to get full CH import through on staging... :(

@mattfullerton
Contributor Author

OK, thanks for trying!

@davidmihalyi

also no error message either...

@mattfullerton
Contributor Author

It will be a crash, or the app being killed, because of too many operations. In that case no report is saved, because the process never gets to finish and return one. I will look in the logs.
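
To illustrate why a killed process leaves no report, and one possible way around it: the sketch below appends each progress event to a file as it happens, so whatever was written before the kill survives. This is not how the importer currently works; `IncrementalReport`, `read_report`, and the JSON-lines file layout are hypothetical names and choices made up for this example.

```python
import json


class IncrementalReport:
    """Hypothetical report writer that persists every event immediately."""

    def __init__(self, path):
        self.path = path

    def record(self, event, **details):
        # Append one JSON object per line and close the file right away,
        # so the entry is on disk even if the process is killed afterwards.
        with open(self.path, "a", encoding="utf-8") as handle:
            handle.write(json.dumps({"event": event, **details}) + "\n")


def read_report(path):
    """Read back whatever events were written before the process stopped."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

With something like this, a run that dies halfway would still leave the events recorded up to that point, which would at least show how far the import got.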

@mattfullerton
Contributor Author

The CH import works on a version of staging with more memory (2G) and no LB that would kill it when healthchecks fail (cc @iprunache to confirm I have this right).

We need to make a final decision on whether this is a tenable solution for the future or whether we want to live with giving our instances more memory just because of infrequent imports.

I'm sure some optimization can be done, for example tuning what data is loaded from the DB (i.e. only certain fields, or tuning queries) when checking for existing items. But we will probably not get away from the fact that importing is a) infrequent and b) more memory-intensive than most site requests.
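
As a rough sketch of the "only certain fields" idea, assuming a SQL-style store: the thread does not say which database or schema is involved, so the `sqlite3` backend, the `companies` table, and the `company_number` column below are stand-ins invented for illustration. The point is only that the existence check reads the key column in batches instead of loading whole records.

```python
import sqlite3


def load_existing_keys(conn, batch_size=1000):
    """Collect identifiers already stored, selecting only the key column."""
    # Hypothetical table and column names; no other fields are fetched.
    cursor = conn.execute("SELECT company_number FROM companies")
    keys = set()
    while True:
        rows = cursor.fetchmany(batch_size)  # bounded batch, not fetchall()
        if not rows:
            break
        keys.update(row[0] for row in rows)
    return keys


def split_new_and_existing(conn, incoming):
    """Partition incoming records by whether their key is already stored."""
    existing = load_existing_keys(conn)
    new = [r for r in incoming if r["company_number"] not in existing]
    known = [r for r in incoming if r["company_number"] in existing]
    return new, known
```

This keeps one small value per stored item in memory rather than the full record, which may or may not be enough on its own for a full CH load.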

Once these decisions are made we can close this issue.

@iprunache
Contributor

@mattfullerton, you described the issue accurately.

@davidmihalyi

Understood. We will have to push solving this issue (one way or the other) to next year.

For now I would like one clean round of imports this week (with extra memory) to use for a last data refresh for the year.
Thanks
