Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Solr for newspaper issue listing #158

Merged
merged 11 commits into from
Mar 20, 2018

Conversation

whikloj
Copy link
Member

@whikloj whikloj commented Dec 1, 2017

JIRA Ticket: https://jira.duraspace.org/browse/ISLANDORA-2126

What does this Pull Request do?

Add a option to generate the newspaper issue list from Solr instead of the triplestore.

What's new?

New variables, split the islandora_newspaper_get_issues() function to call either Solr or the resource index.

How should this be tested?

You'll need a newspaper with several issues.

Download this PR and install it.

Go to the admin page, choose Use Solr and edit the 3 new fields as necessary. Save the configuration.

Go to the newspaper and see exactly the same display as before.

Additional Notes:

Any additional information that you think would be helpful when reviewing this PR.

Example:

  • Does this change require documentation to be updated? Documentation included
  • Does this change add any new dependencies? no
  • Does this change require any other modifications to be made to the repository (ie. Regeneration activity, etc.)? no
  • Could this change impact execution of existing code? no

Interested parties

@bgilling @mjordan @rosiel

Copy link
Contributor

@DiegoPino DiegoPino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly feedback on minor stuff

'#title' => t('Use Solr for Newspaper display'),
'#disabled' => (!module_exists('islandora_solr')),
'#description' => t('Use Solr to generate lists of issues for a newspaper object.'),
'#default_value' => (module_exists('islandora_solr') ? variable_get('islandora_newspaper_use_solr', FALSE) :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check if the module exists? Makes sense of course but we have existing settings (isMemberOf_uri_ms) that don't do that/depend on the existence (like 42 to 51). What about fixing all and making module exist part of the Solr options subform validation/ or even better display?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't officially have a dependency on Solr, so I figured it's a good idea to check for it. I'm good either way.

'#title' => t('Issued date field'),
'#description' => t('Solr field that contains the date issued of the newspaper issue.'),
'#size' => 100,
'#default_value' => variable_get('islandora_newspaper_solr_date_field', 'RELS_EXT_dateIssued_literal_ms'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the new logic sorting against a date? In that case, the default could be better _s or even better _dt to allow actually an order by date_issued in the query itself

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this does not do that. It would require more work (which I considered) to check if the field was a date and then sort in Solr. But this returns just returns the results which are grouped and sorted in PHP.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@whikloj don't want to be insistent, but in the rare case that you actually have an _ms field with multiple values, will the display/after logic )(not yours, what is already in place) work out correctly? What happens if you get two dates? This is just plain curiosity now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so for multi-valued fields I am just grabbing the first. With a date issued it seems like you shouldn't have a bunch of them. But I'm open to suggestions.

'#title' => t('Sequence field'),
'#description' => t('Solr field that contains the sequence number of the issues in a newspaper.'),
'#size' => 100,
'#default_value' => variable_get('islandora_newspaper_solr_sequence_field', 'RELS_EXT_isSequenceNumber_literal_ms'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of the same as previous, maybe better to default to something that can be sorted on via Solr (single valued)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same answer around sorting, depends on non-standard Solr config. Could be a new feature to add in later.

* Implements hook_FORM_ID_validate().
*/
function islandora_newspaper_admin_settings_form_validate(array $form, array &$form_state) {
$error = array();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prob this spot is good for checking if solr module exist? (comment only, not requesting a change in this one)

catch (Exception $e) {
// Use the current time as a place holder.
$issued = new DateTime();
$msg = 'Failed to get issued date from SPARQL query for @pid';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this say "From the Solr query?"

);

do {
$start += 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the default dates/sequence, (_the _ms issue I mentioned comments before), could a sort of be added to the Solr query? Not needed?

drupal_set_message(check_plain(t('Error searching Solr index')) . ' ' . $error->getMessage(), 'error');
}

} while ($count > ($rows * $start + $rows) && !isset($error));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should $count be initialized to something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 222 above, basically in the case that you have more than 10,000 issues I wanted to start looping to grab them. So the count tells me the total number but only after we have done a query. I could initialize it before the loop if you think that is better. I added the !isset($error) to halt the loop if you get an error from Solr.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, the !isset($error) is a good one. I was just afraid that at runtime, PHP 7.x could throw some warnings because $count is undefined. I have no strong opinion really but want to be sure the logic does not allow infinite looping (i'm scared of "while" statements always). Seems like the logic won't allow that. good

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Github is doing something weird. I added a $error = 0; to line 193, which you can see on the branch but in the PR it appears like maybe it didn't fully update? Not sure
https://github.com/uml-digitalinitiatives/islandora_solution_pack_newspaper/blob/7.x-ISLANDORA-2126/includes/utilities.inc#L193

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@whikloj weird. I have seen some caching weirdness in GitHub lately. Can you force a new hash via an ammend or a rebase? 😭

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@whikloj still don't see $error = 0; Should i wait for that?

@@ -80,6 +77,41 @@ function islandora_newspaper_get_newspaper($object) {
* - issued: A DateTime object repersenting the date the issue was released.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not your fault, was there from forever but could be good chance to fix "repersenting"

@@ -139,7 +141,7 @@ function islandora_newspaper_admin_settings_form_validate(array $form, array &$f
}
if (count($error) > 0) {
foreach ($error as $field => $message) {
form_set_error($field, $message);
form_set_error($field, check_plain($message));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

@DiegoPino
Copy link
Contributor

@whikloj looks good. Will wait for travis.

@DiegoPino
Copy link
Contributor

/remind me to approve once @whikloj rehashes and then merge in 24 hours

@DiegoPino
Copy link
Contributor

DiegoPino commented Dec 1, 2017

@dannylamb should something had happened with /remind thing? Do we need to have the Bot actually running?

@whikloj
Copy link
Member Author

whikloj commented Dec 1, 2017

Best git commit message ever!

Sorry @DiegoPino I forgot to make the OR explicit

Copy link
Contributor

@DiegoPino DiegoPino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jared. Does removing base filters still allows for namespace and xacml based exclusions?

@whikloj
Copy link
Member Author

whikloj commented Dec 4, 2017

@DiegoPino So namespaces and XACML should be ok. The whole reason to use the IslandoraSolrQueryProcessor is those exclusions.

The issue I hit was I have a base filter which removes newspaper issues from search results, because most people what to see the actual page.

But that filter means that I can't generate a listing of all newspaper issues via Solr unless I remove it.

Here is the admin form for the base filters.
https://github.com/Islandora/islandora_solr_search/blob/7.x/includes/admin.inc#L613-L620

and that we only remove those specific filters from the query.
https://github.com/Islandora/islandora_solr_search/blob/7.x/includes/utilities.inc#L651-L657

I'm not sure what others use that filter for and if this would produce unwanted consequences for them.

@DiegoPino
Copy link
Contributor

@whikloj totally. We will actually need in the near future to separate internal query processor v/s final user display one. Thanks for clarifying this. Now that DCS are Ok, I will test this again during the day. Give me 24 hours before merging.

@whikloj
Copy link
Member Author

whikloj commented Dec 4, 2017

If this seems dangerous, then we can close it. Most of this code is already in my islandora_custom_solr module to allow me to do my queries in Solr.

I have a follow-up that uses Solr date fields to add a range filter to retrieve only one years worth of newspapers at a time. It then uses Ajax (prompted by @bgilling's work here) to load any other year dynamically.
https://github.com/uml-digitalinitiatives/islandora_solution_pack_newspaper/compare/7.x-ISLANDORA-2126...uml-digitalinitiatives:7.x-ISLANDORA-2116-2?expand=1

I can also build this into islandora_custom_solr instead of core if there are concerns are my use of islandora_solr_remove_base_filters().

@DiegoPino
Copy link
Contributor

@whikloj i actually don't have any concerns with islandora_solr_remove_base_filters(). But I guess folks could? Any way we could make some exclusions without touching all the filters? Maybe just too much work, i know. @Islandora/7-x-1-x-committers any concerns?

@whikloj
Copy link
Member Author

whikloj commented Dec 4, 2017

@DiegoPino I guess we could add a toggle, I'm not sure how to clearly explain it.

[x] Remove base filters
This option removes your configured Solr base filters from the queries. If you want your filters to be applied even though they could affect which newspaper issue objects are returned in the list, uncheck this option.

or something....better?

@DiegoPino
Copy link
Contributor

DiegoPino commented Dec 4, 2017 via email

@whikloj
Copy link
Member Author

whikloj commented Dec 11, 2017

Ok, toggle is added.

So to test the toggle, if you go to Admin -> Islandora -> Solr Index -> Solr Settings -> Query Defaults tab

In the Solr base filter field add -RELS_EXT_hasModel_uri_ms:"info:fedora/islandora:newspaperIssueCModel"

Then by default this PR works, but if you un-check the new Remove base Solr filters toggle, then your base filters are NOT removed and you should get 0 issues displayed for your test newspaper.

@DiegoPino
Copy link
Contributor

@whikloj hi, looks almost ready, great. Could you please just fix DCS thing at https://travis-ci.org/Islandora/islandora_solution_pack_newspaper/jobs/314886467#L917 will later approve and go the usual merge workflow if nobody opposes. Thanks

@DiegoPino
Copy link
Contributor

@whiklojI approved the changes here but DCS issue is still unsolved. Please let me know if you want me to pull against your branch to ease on that.

@DiegoPino DiegoPino merged commit bd5ba36 into Islandora:7.x Mar 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants