Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/3490 apostrophes in titles #2360

Open
wants to merge 32 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 23 commits
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
1c87041
added setting for mapping
RK206 Nov 29, 2023
5102c2b
Merge branch 'develop' of https://github.com/DOAJ/doaj into feature/3…
RK206 Jan 22, 2024
2e18716
Merge branch 'develop' of https://github.com/DOAJ/doaj into feature/3…
RK206 Jan 26, 2024
1eb8523
Updated Functional tests section for pull request template
RK206 Feb 15, 2024
f6a170d
Added seamless for article and ascii folded fix for article and journal
RK206 Mar 11, 2024
1c8546c
Merge branch 'develop' of https://github.com/DOAJ/doaj into feature/3…
RK206 Mar 11, 2024
478d97a
reverted changes
RK206 Mar 11, 2024
f1b8fff
added a testcase for searching with ascii folded characters
RK206 Mar 11, 2024
a8d5998
Added more unit tests and added ascii folding for abstract and author
RK206 Mar 13, 2024
b978483
Merge branch 'develop' of https://github.com/DOAJ/doaj into feature/3…
RK206 Mar 13, 2024
6ea21df
Revert "Updated Functional tests section for pull request template"
RK206 Apr 1, 2024
7f9f810
Merge branch 'develop' of https://github.com/DOAJ/doaj into feature/3…
RK206 Apr 1, 2024
23ab229
Done some minor fixes
RK206 Apr 1, 2024
9a3decc
Merge branch 'develop' into feature/3490_apostrophes_in_titles
Steven-Eardley Jul 4, 2024
5ce3957
Merge branch 'develop' into feature/3490_apostrophes_in_titles
Steven-Eardley Jul 4, 2024
e6e39ed
Merge branch 'develop' of https://github.com/DOAJ/doaj into feature/3…
RK206 Aug 12, 2024
2d4bb86
'additional_fields' handled in better way
RK206 Aug 14, 2024
40c4b2a
Merge remote-tracking branch 'origin/feature/3490_apostrophes_in_titl…
RK206 Aug 14, 2024
32bbb74
Merge branch 'develop' of https://github.com/DOAJ/doaj into feature/3…
RK206 Aug 21, 2024
d69fa08
Fixed merge conflicts
RK206 Aug 21, 2024
32eaa52
Merge branch 'develop' of https://github.com/DOAJ/doaj into feature/3…
RK206 Aug 26, 2024
b5ce460
Updated code for better implementation of ascii folding
RK206 Aug 26, 2024
7e29c3d
Merge branch 'develop' of https://github.com/DOAJ/doaj into feature/3…
RK206 Aug 26, 2024
d65640a
Roll back remaining 'additional_fields'
RK206 Aug 28, 2024
e04a3a4
Merge branch 'develop' into feature/3490_apostrophes_in_titles
Steven-Eardley Nov 5, 2024
f3b1106
Merge branch 'develop' of https://github.com/DOAJ/doaj into feature/3…
RK206 Nov 6, 2024
4a9162b
Added Application to ascii folding
RK206 Nov 7, 2024
a566381
Merge branch 'develop' into feature/3490_apostrophes_in_titles
Steven-Eardley Nov 14, 2024
5ddcd5b
Added alternative title for ascii folding
RK206 Dec 11, 2024
e858b71
Merge remote-tracking branch 'origin/feature/3490_apostrophes_in_titl…
RK206 Dec 11, 2024
15c50c3
Merge branch 'develop' of https://github.com/DOAJ/doaj into feature/3…
RK206 Dec 19, 2024
af0097e
Fixed a failing test
RK206 Dec 19, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 16 additions & 0 deletions doajtest/fixtures/article.py
Original file line number Diff line number Diff line change
Expand Up @@ -160,6 +160,22 @@ def find_dict_in_list(lst, key, value):
def make_article_apido_struct():
return deepcopy(ARTICLE_STRUCT)

@staticmethod
def make_article_with_data(title=None, publisher_name=None, abstract=None, country=None, author=None):
source = deepcopy(ARTICLE_SOURCE)
if title:
source["bibjson"]["title"] = title
if publisher_name:
source["bibjson"]["journal"]["publisher"] = publisher_name
if abstract:
source["bibjson"]["abstract"] = abstract
if country:
source["bibjson"]["journal"]["country"] = country
if author:
source["bibjson"]["author"][0]["name"] = author

return source


ARTICLE_SOURCE = {
"id": "abcdefghijk_article",
Expand Down
12 changes: 12 additions & 0 deletions doajtest/fixtures/v2/journals.py
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,18 @@ def make_journal_form():
def make_journal_form_info():
return deepcopy(JOURNAL_FORM_EXPANDED)

@staticmethod
def make_journal_with_data(title=None, publisher_name=None, country=None, in_doaj=True):
journal = deepcopy(JOURNAL_SOURCE)
journal['admin']['in_doaj'] = in_doaj
if title:
journal["bibjson"]["title"] = title
if publisher_name:
journal["bibjson"]["publisher"]["name"] = publisher_name
if country:
journal["bibjson"]["publisher"]["country"] = country
return journal

@staticmethod
def make_bulk_edit_data():
return deepcopy(JOURNAL_BULK_EDIT)
Expand Down
12 changes: 12 additions & 0 deletions doajtest/testbook/public_site/public_search.yml
Original file line number Diff line number Diff line change
Expand Up @@ -186,4 +186,16 @@ tests:
- step: click spacebar to check the filter
results:
- filter is applied
- title: 'Test Public Search Ascii Folding: Articles/Journals'
context:
role: anonymous
steps:
- step: Make sure there is a Journal or Article which has special ascii characters (example - I can’t really think in English ) in one of the following fields
- Title
- Publisher name
- Country name
- step: Go to the DOAJ search page at /search/articles for article search or /search/journals for journal search
- step: search with ascii characters instead of special characters (example - I can't really think in English)
results:
- Same search results will be displayed when searched with special characters (I can’t really think in English)

144 changes: 137 additions & 7 deletions doajtest/unit/test_query.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@
from portality import models

from doajtest.fixtures import AccountFixtureFactory, ArticleFixtureFactory, EditorGroupFixtureFactory, \
ApplicationFixtureFactory
ApplicationFixtureFactory, JournalFixtureFactory
from doajtest.helpers import DoajTestCase, deep_sort

from portality.bll.services.query import QueryService, Query
Expand All @@ -18,6 +18,13 @@
"query_filters" : ["only_in_doaj"],
"result_filters" : ["public_result_filter"],
"dao" : "portality.models.Article"
},
"journal": {
"auth": False,
"role": None,
"query_filters": ["only_in_doaj"],
"result_filters": ["public_result_filter"],
"dao": "portality.models.Journal"
}
},
"publisher_query" : {
Expand Down Expand Up @@ -147,6 +154,13 @@
"search_all_meta" : "portality.lib.query_filters.search_all_meta",
}

MATCH_ALL_RAW_QUERY = {"query": {"match_all": {}}}


def raw_query(query):
return {'query': {'query_string': {'query': query, 'default_operator': 'AND'}}, 'size': 0, 'track_total_hits': True}


def without_keys(d, keys):
return {x: d[x] for x in d if x not in keys}

Expand Down Expand Up @@ -191,8 +205,8 @@ def get_journal_with_notes(self):

def test_01_auth(self):
with self.app_test.test_client() as t_client:
response = t_client.get('/query/journal') # not in the settings above
assert response.status_code == 403, response.status_code
response = t_client.get('/query/journal')
assert response.status_code == 200, response.status_code
Steven-Eardley marked this conversation as resolved.
Show resolved Hide resolved

# theoretically should be a 404, but the code checks QUERY_ROUTE config first, so auth checks go first
response = t_client.get('/query/nonexistent')
Expand Down Expand Up @@ -242,12 +256,15 @@ def test_02_query_gen(self):

q = Query()
q.add_include("last_updated")
assert q.as_dict() == {'track_total_hits' : True, "query": {"match_all": {}},"_source": {"includes": ["last_updated"]}}, q.as_dict()
assert q.as_dict() == {'track_total_hits': True, "query": {"match_all": {}},
"_source": {"includes": ["last_updated"]}}, q.as_dict()

q = Query()
q.add_include(["last_updated", "id"])
assert sorted(q.as_dict()) == sorted({'track_total_hits' : True, "query": {"match_all": {}},"_source": {"includes": ["last_updated", "id"]}}) or sorted(q.as_dict()) == sorted({"query": {"match_all": {}},"_source": {"include": ["last_updated", "id"]}}), sorted(q.as_dict())

assert sorted(q.as_dict()) == sorted({'track_total_hits': True, "query": {"match_all": {}},
"_source": {"includes": ["last_updated", "id"]}}) or sorted(
q.as_dict()) == sorted(
{"query": {"match_all": {}}, "_source": {"include": ["last_updated", "id"]}}), sorted(q.as_dict())

def test_03_query_svc_get_config(self):
qsvc = QueryService()
Expand Down Expand Up @@ -595,6 +612,119 @@ def test_journal_article_query_notes(self):
'size': 0, 'track_total_hits': True}, account=None, additional_parameters={"ref":"fqw"})
assert res['hits']['total']["value"] == 0, res['hits']['total']["value"]

def test_article_query_ascci_folding(self):
self.article12 = models.Article(
**ArticleFixtureFactory.make_article_with_data({"bibjson": {"title": "I can’t really think in English"}}))
self.article12.save(blocking=True)
qsvc = QueryService()

res = qsvc.search('query', 'article', MATCH_ALL_RAW_QUERY, account=None,
additional_parameters={})
assert res['hits']['total']["value"] == 1, res['hits']['total']["value"]

res = qsvc.search('query', 'article', raw_query("I can't really think in English"),
account=None, additional_parameters={"ref": "fqw"})

assert res['hits']['total']["value"] == 1, res['hits']['total']["value"]

res = qsvc.search('query', 'article', raw_query("I can’t really think in English"),
account=None, additional_parameters={"ref": "fqw"})

assert res['hits']['total']["value"] == 1, res['hits']['total']["value"]

def test_journal_query_ascii_folding(self):
self.journal = models.Journal(**JournalFixtureFactory.make_journal_with_data("I can’t really think in English"))
self.journal.save(blocking=True)
qsvc = QueryService()

res = qsvc.search('query', 'journal', MATCH_ALL_RAW_QUERY, account=None,
additional_parameters={})
assert res['hits']['total']["value"] == 1, res['hits']['total']["value"]

res = qsvc.search('query', 'journal', raw_query("I can't really think in English"),
account=None, additional_parameters={"ref": "fqw"})

assert res['hits']['total']["value"] == 1, res['hits']['total']["value"]

res = qsvc.search('query', 'journal', raw_query("I can’t really think in English"),
account=None, additional_parameters={"ref": "fqw"})

assert res['hits']['total']["value"] == 1, res['hits']['total']["value"]

def test_article_query_ascci_folding_data(self):
self.article12 = models.Article(
**ArticleFixtureFactory.make_article_with_data(title="Kadınlarının sağlık",
publisher_name="Ankara Üniversitesi",
abstract="Araştırma grubunu", country="Türkiye",
author="Sultan GÜÇLÜ"))
self.article12.save(blocking=True)
qsvc = QueryService()

res = qsvc.search('query', 'article', MATCH_ALL_RAW_QUERY, account=None,
additional_parameters={})
assert res['hits']['total']["value"] == 1, res['hits']['total']["value"]

# check for title
res = qsvc.search('query', 'article', raw_query("Kadinlarinin saglik"), account=None,
additional_parameters={"ref": "fqw"})

assert res['hits']['total']["value"] == 1, res['hits']['total']["value"]

# echeck for publisher
res = qsvc.search('query', 'article', raw_query("Ankara Universitesi"), account=None,
additional_parameters={"ref": "fqw"})

assert res['hits']['total']["value"] == 1, res['hits']['total']["value"]

# check for abstract
res = qsvc.search('query', 'article', raw_query("Arastırma grubunu"), account=None,
additional_parameters={"ref": "fqw"})

assert res['hits']['total']["value"] == 1, res['hits']['total']["value"]

# check for country
res = qsvc.search('query', 'article', raw_query("Turkiye"), account=None,
additional_parameters={"ref": "fqw"})

assert res['hits']['total']["value"] == 1, res['hits']['total']["value"]

# check for author
res = qsvc.search('query', 'article', raw_query("Sultan GUCLU"), account=None,
additional_parameters={"ref": "fqw"})

assert res['hits']['total']["value"] == 1, res['hits']['total']["value"]

def test_journal_query_ascii_folding_data(self):
self.journal = models.Journal(**JournalFixtureFactory
.make_journal_with_data(title="Kadınlarının sağlık",
publisher_name="Ankara Üniversitesi",
country="Türkiye", ))
self.journal.save(blocking=True)
qsvc = QueryService()

# check if journal exist
res = qsvc.search('query', 'journal', MATCH_ALL_RAW_QUERY, account=None,
additional_parameters={})
assert res['hits']['total']["value"] == 1, res['hits']['total']["value"]

# check for title search
res = qsvc.search('query', 'journal', raw_query("Kadinlarinin saglik"), account=None,
additional_parameters={"ref": "fqw"})

assert res['hits']['total']["value"] == 1, res['hits']['total']["value"]

# check for publisher name
res = qsvc.search('query', 'journal', raw_query("Ankara Universitesi"), account=None,
additional_parameters={"ref": "fqw"})

assert res['hits']['total']["value"] == 1, res['hits']['total']["value"]

# check for country
res = qsvc.search('query', 'journal', raw_query("Turkiye"), account=None,
additional_parameters={"ref": "fqw"})

assert res['hits']['total']["value"] == 1, res['hits']['total']["value"]

def test_search__invalid_from(self):
acc = models.Account(**AccountFixtureFactory.make_managing_editor_source())
acc.save(blocking=True)
Expand All @@ -604,4 +734,4 @@ def test_search__invalid_from(self):
'sort': [{'_score': {'order': 'desc'}}],
'track_total_hits': 'true'}
with pytest.raises(RequestError):
QueryService().search('admin_query', 'journal', query, account=acc, additional_parameters={})
QueryService().search('admin_query', 'journal', query, account=acc, additional_parameters={})
2 changes: 1 addition & 1 deletion portality/lib/dataobj.py
Original file line number Diff line number Diff line change
Expand Up @@ -759,7 +759,7 @@ def _get_list(self, path, coerce=None, by_reference=True, allow_coerce_failure=T
return deepcopy(val)

def _set_single(self, path, val, coerce=None, allow_coerce_failure=False, allowed_values=None, allowed_range=None,
allow_none=True, ignore_none=False):
allow_none=True, ignore_none=False, additional_fields = None):
Steven-Eardley marked this conversation as resolved.
Show resolved Hide resolved

if val is None and ignore_none:
return
Expand Down
9 changes: 8 additions & 1 deletion portality/lib/es_data_mapping.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,8 @@
# ~~->Seamless:Library~~
# ~~->DataObj:Library~~

from copy import deepcopy

from portality.lib import plugin


Expand All @@ -22,6 +24,7 @@ def get_mappings(app):
for cname in mapping_daos:
klazz = plugin.load_class_raw(cname)
mappings[klazz.__type__] = {'mappings': klazz().mappings()}
mappings[klazz.__type__]['settings'] = app.config["DEFAULT_INDEX_SETTINGS"]

return mappings

Expand All @@ -31,7 +34,11 @@ def apply_mapping_opts(field_name, path, spec, mapping_opts):
if dot_path in mapping_opts.get('exceptions', {}):
return mapping_opts['exceptions'][dot_path]
elif spec['coerce'] in mapping_opts['coerces']:
return mapping_opts['coerces'][spec['coerce']]
field_mapping = deepcopy(mapping_opts['coerces'][spec['coerce']])
if 'additional_fields' in spec:
field_mapping = {**field_mapping, **mapping_opts[spec['additional_fields']]}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These new lines no longer needed now, so can be rolled back


return field_mapping
else:
# We have found a data type in the struct we don't have a map for to ES type.
raise Exception("Mapping error - no mapping found for {}".format(spec['coerce']))
Expand Down
2 changes: 1 addition & 1 deletion portality/lib/seamless.py
Original file line number Diff line number Diff line change
Expand Up @@ -354,7 +354,7 @@ def get_single(self, path, coerce=None, default=None, allow_coerce_failure=True)
return val

def set_single(self, path, val, coerce=None, allow_coerce_failure=False, allowed_values=None, allowed_range=None,
allow_none=True, ignore_none=False, context=""):
allow_none=True, ignore_none=False, context="", additional_fields = None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RK206 need to roll back this change on seamless


if val is None and ignore_none:
return
Expand Down
7 changes: 7 additions & 0 deletions portality/migrate/3490_ascii_folding/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
# 09 11 2023; Issue 3575 - Make notes searchable for admin

## Execution

Run the migration with

python portality/scripts/es_reindex.py portality/migrate/3490_ascii_folding/migrate.json
Empty file.
16 changes: 16 additions & 0 deletions portality/migrate/3490_ascii_folding/migrate.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
{
"new_version": "-20240307_ascii_folding",
"old_version": "",
"types": [
{
"type" : "article",
"migrate": true,
"set_alias": false
philipkcl marked this conversation as resolved.
Show resolved Hide resolved
},
{
"type": "journal",
"migrate": true,
"set_alias": false
}
]
}
Loading
Loading