Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update ElasticSearch and Elastica to 6.x #1880

Open
wants to merge 22 commits into
base: qa/2.x
Choose a base branch
from

Conversation

anvit
Copy link
Contributor

@anvit anvit commented Oct 9, 2024

Update ElasticSearch and Elastica and address issues introduced by breaking changes in ElasticSearch 6.x.

The most notable changes are:

  • Dropped support for multiple mapping types: ES 6.x no longer allows using multiple indices, They suggest handling this by creating multiple indices, each with its own type.
  • Mapping changes: The include_in_all mapping parameter is disallowed and the _all field is disabled by default.

Changes in Elastica:

  • Elastica only has a addDocuments, deleteDocuments, and updateDocuments methods in 6.x and there are no longer addDocument, deleteDocument, updateDocument methods for handling a singular documents.

@anvit anvit self-assigned this Oct 10, 2024
@anvit anvit requested a review from a team October 10, 2024 16:22
@anvit anvit added this to the 2.9.0 milestone Oct 10, 2024
Copy link
Contributor

@jraddaoui jraddaoui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great @anvit!

I have not been able to look at everything in detail yet, but I have added a few comments that I think are important and may require some work already. Let me know your thoughts ;)

anvit and others added 10 commits November 15, 2024 15:53
Update ElasticSearch to 6.8.23 in the Dockerfile and update Elastica to
6.x in composer.
- 'inline' scripts are deprecated, changed to 'source'
Update arElasticSearchPlugin to use multiple indices instead of multiple
types since ES 6.x removed being able to add multiple types. Also update
mapping to remove include_in_all and add that to the _all field using
copy_to instead since include_in_all was also removed in ES 6.x
Update autocompleteAction to use multiple indices
Add the appropriate index types for ES requests that did not need to
specify explicit types in ES 5.x
Add missing document type to partialUpdate in arElasticSearchPlugin
Update condition in arElasticSearchPluginUtil to only get string
fields included in _all if copy_to is explicitly set to _all in the
mappping.
Renamed arElasticSearchIndexDecorator as
arElasticSearchMultiIndexWrapper for clarity and added a few comments
about changes needed for ElasticSearch/Elastica 7.x.
Rename getType from arElasticSearchMultiIndexWrapper to getIndex for
clarity.
Add a dummy ElasticSearch index type since one is still required for
ES 6.x, instead of having a seperate type name for each of the
multiple indices.
Copy link
Contributor

@jraddaoui jraddaoui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @anvit! This looks a lot clearer to me now.

plugins/arElasticSearchPlugin/config/mapping.yml Outdated Show resolved Hide resolved
plugins/arElasticSearchPlugin/config/mapping.yml Outdated Show resolved Hide resolved
plugins/arElasticSearchPlugin/config/mapping.yml Outdated Show resolved Hide resolved
plugins/arElasticSearchPlugin/config/mapping.yml Outdated Show resolved Hide resolved
plugins/arElasticSearchPlugin/config/mapping.yml Outdated Show resolved Hide resolved
Update any references to ElasticSearch index type to index name.
Also add getType as an alias for getIndex for backwards compatibility
with custom themes that still might be making references to
ElasticSearch's getType.
Removed copy_to _all from all fields that previously had include_in_all
set to false.
Remove unnecessary $type ElasticSearch variables in QubitLftSyncer,
arElasticSearchPlugin, and updatePublicationStatusTask
Remove the index prefix from arElasticSearchMultiIndexWrapper and use
AtoM class names internally instead of using prefix to refer to them.
Copy link
Contributor

@jraddaoui jraddaoui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @anvit! Getting there, just a few minor tweaks!

lib/QubitLftSyncer.class.php Outdated Show resolved Hide resolved
lib/QubitLftSyncer.class.php Show resolved Hide resolved
Comment on lines 681 to 688
$this->log('Types that will be indexed:');

foreach ($this->mappings as $typeName => $typeProperties) {
if (!in_array(strtolower($typeName), $excludeTypes)) {
$this->log(' - '.$typeName);
foreach ($this->mappings as $indexName => $indexProperties) {
if (!in_array(strtolower($indexName), $excludeTypes)) {
$this->log(' - '.$indexName);
++$typeCount;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to change this too, "Indices that will be updated." and use the prefixed index name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, maybe not here. These need to match the exclude-types argument from the search:populate task.

Now that I think about that, we should probably revisit how is that done with multiple indexes instead of types.

plugins/arElasticSearchPlugin/config/mapping.yml Outdated Show resolved Hide resolved
plugins/arElasticSearchPlugin/config/mapping.yml Outdated Show resolved Hide resolved
plugins/arElasticSearchPlugin/config/mapping.yml Outdated Show resolved Hide resolved
plugins/arElasticSearchPlugin/config/mapping.yml Outdated Show resolved Hide resolved
Comment on lines +40 to +45
public function delete()
{
foreach ($this->indices as $index) {
$index->delete();
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given all the feedback above, this would be the only function without a $name parameter in the wrapper (apart from the constructor). And, when we revisit the exclude-types option I think we'll add that parameter.

It makes a lot of sense to me that all methods from this class require a name parameter now that it's a multi index wrapper instead of an index decorator.

It would be even clearer if $index was called $indicesWrapper in arElasticSearchPlugin, but preventing issues in custom code is a stronger point than clarity.

Just a note.

Fix bug in search autocompleteAction which was showing duplicate results
for actors under repositories. Also address some of the CR feedback and
update/add methods for add, update, and deleteById in the
arElasticSearchMultiIndexWrapper.
Remove unnecessary foreach loop in arElasticSearchPlugin. Also move code
for adding filter for stripping md tags into a seperate method to make
the initialization code more readable.
Copy link
Contributor

@jraddaoui jraddaoui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks @anvit!

Update the index refresh method to accept an optional name param that
only refreshes that particular index in
arElasticSearchMultiIndexWrapper. Also address some small CR feedback.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants