Import process from PDC Describe is crashing #711

Open
hectorcorrea opened this issue Nov 4, 2024 · 3 comments · May be fixed by #724

@hectorcorrea

hectorcorrea commented Nov 4, 2024

It looks like the import process from PDC Describe is crashing on one of the new, rather large records. The record in question is https://pdc-describe-prod.princeton.edu/describe/works/470. If you load this record in PDC Describe, you'll notice that the page takes a long time to load and that loading the file list (60,005 files) is also very slow.

This is what Honeybadger reports:

Error importing record from https://pdc-describe-prod.princeton.edu/describe/works/470.json. 
Exception: Net::ReadTimeout 

The record in Discovery still looks like a DataSpace record because the import from Describe is failing: https://datacommons.princeton.edu/discovery/catalog/165364
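
Ruby's Net::HTTP raises Net::ReadTimeout when the server does not send data within the client's read timeout. The issue does not show how the importer issues its request, so the following is only a sketch, assuming a plain Net::HTTP client. `build_http` and `fetch_json` are hypothetical helper names, and the 120-second value is an arbitrary illustration rather than a recommended setting.

```ruby
require "net/http"
require "uri"
require "json"

# Hypothetical helper: builds a Net::HTTP client with explicit timeouts.
# Nothing is sent over the network until a request is made.
def build_http(uri, read_timeout: 120, open_timeout: 10)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  http.open_timeout = open_timeout # seconds allowed to open the connection
  http.read_timeout = read_timeout # seconds allowed for each read from the socket
  http
end

# Hypothetical helper: fetches a URL and parses the body as JSON.
# Net::ReadTimeout is still raised if a read exceeds read_timeout.
def fetch_json(url, read_timeout: 120)
  uri = URI.parse(url)
  http = build_http(uri, read_timeout: read_timeout)
  response = http.request(Net::HTTP::Get.new(uri.request_uri))
  raise "Unexpected HTTP status #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  JSON.parse(response.body)
end
```

Raising the timeout would only work around the symptom; the record would still take just as long to serialize on the PDC Describe side.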

@hectorcorrea

Loading the file list on this record takes about 31 seconds in production:

pdc-describe(prod)> Rails.logger.level = Logger::INFO
=> 1
pdc-describe(prod)> w = Work.find(470)
=> #<Work:0x00007f8782575420 id: 470>
pdc-describe(prod)> w.file_list.count
Loading S3 objects. Bucket: pdc-describe-prod-postcuration. Prefix: 10.34770/n42z-hb72/470/. Elapsed: 30.917009837 seconds
=> 60005
pdc-describe(prod)> 
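
One likely contributor to the elapsed time above is pagination: the S3 ListObjectsV2 API returns at most 1,000 keys per request, so listing 60,005 objects takes at least 61 sequential requests. The arithmetic can be sketched as below. `list_pages` is a hypothetical helper, and the per-page figure assumes that the whole elapsed time was spent listing, which the log does not confirm.

```ruby
# S3 ListObjectsV2 returns at most 1,000 keys per request.
S3_MAX_KEYS_PER_PAGE = 1000

# Hypothetical helper: minimum number of list requests needed
# to enumerate object_count keys (integer ceiling division).
def list_pages(object_count, page_size = S3_MAX_KEYS_PER_PAGE)
  (object_count + page_size - 1) / page_size
end

object_count = 60_005       # file count reported by w.file_list.count
elapsed_seconds = 30.917    # elapsed time reported in the log, rounded

pages = list_pages(object_count)
# Assumption: all of the elapsed time went to the paginated listing.
seconds_per_page = elapsed_seconds / pages

puts "#{pages} requests, about #{seconds_per_page.round(2)} s per request"
```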

@hectorcorrea

hectorcorrea commented Nov 15, 2024

Fetching the data via curl also takes 30+ seconds. Note that the payload is about 13 MB:

~/src $ curl https://datacommons.princeton.edu/describe/works/470?format=json > 470.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13.2M    0 13.2M    0     0   407k      0 --:--:--  0:00:33 --:--:-- 3300k

@kelynch

kelynch commented Dec 10, 2024

We should continue to keep an eye on this and plan to work on this ticket in January.
