response to persist work actor failure #159

Open
lsat12357 opened this issue Nov 27, 2019 · 2 comments

lsat12357 commented Nov 27, 2019

I've been finding works that, according to the aasm_status, failed during the persist_work stage but are actually in Hyrax. Presumably something goes wrong after the object is saved but before the actor stack completes.
The causes I've seen so far (aside from bugs we simply need to fix) involve the attach files job. We could add a job or service that retries the attach files job and checks visibility and, if that succeeds, runs a callback that updates the migrator work status.
We could add more services to cover other failures as they become apparent.
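
A rough sketch of what such a retry service could look like, in plain Ruby. Every name here (`PersistWorkRepair`, the three callables, the status strings) is hypothetical and stands in for whatever the migrator and Hyrax actually expose. It is only meant to show the retry, check, callback shape, not an existing implementation.

```ruby
# Hypothetical sketch: none of these names come from the migrator or Hyrax.
class PersistWorkRepair
  Result = Struct.new(:work_id, :repaired, :attempts, :error)

  # attach_files:  callable that re-runs the file attachment for a work id
  # visibility_ok: callable that returns true when visibility is as expected
  # on_success:    callback that would update the migrator work status
  def initialize(attach_files:, visibility_ok:, on_success:, max_attempts: 3)
    @attach_files = attach_files
    @visibility_ok = visibility_ok
    @on_success = on_success
    @max_attempts = max_attempts
  end

  def call(work_id)
    attempts = 0
    last_error = nil
    while attempts < @max_attempts
      attempts += 1
      begin
        @attach_files.call(work_id)
        if @visibility_ok.call(work_id)
          # Only report success once both the attach and the check passed.
          @on_success.call(work_id)
          return Result.new(work_id, true, attempts, nil)
        end
        last_error = "visibility check failed"
      rescue StandardError => e
        # Treat any error as retryable in this sketch.
        last_error = e.message
      end
    end
    Result.new(work_id, false, attempts, last_error)
  end
end

# Usage with stand-in callables: the first attach attempt fails, the second works.
statuses = { "w1" => "failed" }
calls = 0
repair = PersistWorkRepair.new(
  attach_files: ->(_id) { calls += 1; raise IOError, "end of file reached" if calls == 1 },
  visibility_ok: ->(_id) { true },
  on_success: ->(id) { statuses[id] = "completed" }
)
result = repair.call("w1")
```

Passing the operations in as callables keeps the retry logic separate from the repair steps, so other repair services could reuse the same wrapper.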

@lsat12357 lsat12357 changed the title persist work actor does not always have correct status response to persist work actor failure Dec 26, 2019
@lsat12357 lsat12357 self-assigned this Dec 26, 2019
@lsat12357 lsat12357 reopened this Jan 28, 2020
@lsat12357

Add other services as needed.

@lsat12357 lsat12357 added the Epic label Jan 28, 2020
@lsat12357 lsat12357 removed their assignment Jan 29, 2020

lsat12357 commented Apr 22, 2021

Because of intermittent system errors this time around:

  • ERROR: Undefined namespace prefix: /rdf:RDF/rdf:Description/dc:title/text()
  • Failed to open TCP connection to fcrepo.od2-test.svc.cluster.local:8080 (getaddrinfo: Temporary failure in name resolution)
  • end of file reached

a number of assets were ingested in an incomplete state.
Fixing them required some combination of:

  • attaching the fileset
  • setting visibility
  • setting collections
  • creating new sipity entity

We may want to draw the line somewhere and just delete and reingest. We just had some infrastructure changes, so we likely will not see this level of error when we start migrating for real.
