-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sets support #206
Merged
Merged
sets support #206
Changes from 11 commits
Commits
Show all changes
19 commits
Select commit
Hold shift + click to select a range
68f0d5d
feature: update oai set implementation
rekt-hard a452d80
update: query sets more efficiently
rekt-hard 13e66e2
tests: fix pydocstyle and isort
rekt-hard 98fe14e
fix: use recordindexer to retrieve index
rekt-hard fcf3b38
fix: remove unused parts
rekt-hard 96dd51a
refactor: pass testcases
rekt-hard c425c85
fix: resumption token
rerowep 6614ff7
fix: read metadataPrefix from resumptionToken
rekt-hard a60831d
test: multiple resumption tokens
rerowep 3a1dc34
fix: load resumptionToken via model
rekt-hard 447f7f1
refactor: remove unused code
rekt-hard 48e791e
logging: use application logger
cffd1f3
test: enable set tests for verbs
ec28a02
installation: use invenio_db global version
e15b447
global: remove python2 support
f1d0d9f
tests: implement percolator tests
bd51efd
models: remove unused code
897b3ab
percolators: create less percolator indices
slint 0c01b1d
release: v1.4.0
slint File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The application crashes if the
search_pattern
is empty. Why the default behaviour is not to search in the_oai.set
fields?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, an empty
search_pattern
does not make sense, if there is no way to manually manage sets.The idea is to not store the information of the sets a record belongs to within the record itself. Instead, the record is matched against the queries of the sets (ElasticSearch percolate query) and this information is generated dynamically on request.
If the set argument is provided (which is the case if the code above is to be executed), then only records matching the specified set will be returned. Thus it is necessary to lookup the set and get its search pattern. After fetching the records belonging to this set, the above step is performed (i.e. the records are matched against all the sets queries).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your reply.
Here is some suggestions:
search_pattern
is required, I suggest to add a no null constraint in the db modelListIdentifiers
, you can usesource
for the elasticsearch during theget_records
ListRecords
search_query
asinsitution:test
raise the following error:TypeError: expected string or bytes-like object
, replacing byinstitution:"test"
works (bug)Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably I misunderstand something. If during the request we use the
search_pattern
, is the percolator still useful?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The percolate query is used to get all sets a record belongs to (needed for the header of the response).
The additional query above is used, when a setSpec is specified in the request (f.e. https://127.0.0.1:5000/oai2d?verb=ListRecords&metadataPrefix=oai_dc&set=my-set). It is used during record querying so only records belonging to this set are retrieved. Then we take these records and do a percolate query to get all the sets the records belong to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jma First of all thanks for reviewing!
As @rekt-hard explained, the big change here is that we want to get away from storing OAI sets information inside the record. It's highly very inefficient to do it this way (e.g. define a new set, or modify an existing set and you may need to reindex every single record in the entire repository. The only thing we're loosing is the ability to say that a record was deleted from a set. A harvest can obtain that information by running ListIdentifiers for a set and compare against a local set.
On a side note, can I ask you not to use the "Request changes" when reviewing a PR. Instead, simply prefix your comment with Comment/Question/Minor//Major according (see definitions here). Basically idea is that lack of a green approve, means it's not approved. Just wanted to explain why I'm gonna press the "dismiss review"