Fewer results when using bif:contains text indexing #1118

Open
rschalkrce opened this issue Sep 26, 2023 · 2 comments

@rschalkrce
Contributor

bif:contains returns fewer results than CONTAINS, and this may be undesirable for some datasets.

I think this is because bif:contains does not support leading wildcards, so it cannot match a search term that occurs in the middle or at the end of a word. Searching for "oplegger" returns "uitschuifbare opleggers" and "opleggers (molenelement)", but not "goudoplegger": bif-vs-contains-cht.

Several datasets currently use bif:contains because of its quicker response times, but it may make results less complete. It would be good to discuss this now that we are also considering text indexing for some PoolParty datasets. We can leave the decision to the data providers, but we should perhaps inform them or mention it in the requirements.
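To make the difference concrete, here is a minimal Python sketch. It is not Virtuoso's actual implementation: it assumes CONTAINS behaves like a plain substring test and that the text index matches only from the start of each word (as with a trailing wildcard such as 'oplegger*'). The function names are made up for illustration.

```python
import re

# Labels taken from the example in this issue.
LABELS = ["uitschuifbare opleggers", "opleggers (molenelement)", "goudoplegger"]


def substring_match(label: str, term: str) -> bool:
    """Approximates SPARQL CONTAINS: the term may occur anywhere in the string."""
    return term.lower() in label.lower()


def token_prefix_match(label: str, term: str) -> bool:
    """Simplified word-index lookup: the term must match the start of a word."""
    tokens = re.findall(r"\w+", label.lower())
    return any(token.startswith(term.lower()) for token in tokens)


substring_hits = [label for label in LABELS if substring_match(label, "oplegger")]
prefix_hits = [label for label in LABELS if token_prefix_match(label, "oplegger")]

print(substring_hits)  # all three labels
print(prefix_hits)     # the two labels containing the word "opleggers"
```

Under these assumptions "goudoplegger" is a single token that starts with "goud", so a word-based lookup never sees "oplegger" inside it.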

Related issue: #1064

@ddeboer
Member

ddeboer commented Sep 26, 2023

Hey, thanks for the clear report!

It will return "uitschuifbare opleggers" and "opleggers (molenelement)" when searching for "oplegger", but not "goudoplegger"

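Yup, this is a common issue with full-text indexes. The source can solve this with proper index configuration: stemming for inflected forms such as "opleggers", and compound-word splitting (decompounding) for compounds such as "goudoplegger".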
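As a rough illustration of what index-side configuration could do for compound words, the Python sketch below adds dictionary words found inside a token to the indexed terms. This is similar in spirit to dictionary-based compound word filters in Lucene, but it is only a sketch: the dictionary, the function names and the prefix lookup are all assumptions made for this example, and whether a given SPARQL endpoint lets you configure anything like this is a separate question.

```python
import re

# Hypothetical dictionary of word parts; a real setup would need a proper Dutch word list.
DICTIONARY = {"goud", "oplegger", "leer", "methode"}


def decompound(token: str, dictionary: set[str] = DICTIONARY) -> set[str]:
    """Return the token itself plus every dictionary word found inside it."""
    parts = {token}
    for start in range(len(token)):
        for end in range(start + 1, len(token) + 1):
            if token[start:end] in dictionary:
                parts.add(token[start:end])
    return parts


def index_terms(label: str) -> set[str]:
    """Terms that would be stored in the index for a label."""
    terms: set[str] = set()
    for token in re.findall(r"\w+", label.lower()):
        terms |= decompound(token)
    return terms


def matches(label: str, term: str) -> bool:
    """Prefix lookup against the indexed terms (like searching for 'term*')."""
    return any(indexed.startswith(term.lower()) for indexed in index_terms(label))


print(sorted(index_terms("goudoplegger")))
print(matches("goudoplegger", "oplegger"))
print(matches("leermethode", "methode"))
```

The trade-off is a larger index and a dependency on the quality of the word list: parts that are missing from the dictionary remain unfindable.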

I’m convinced we still need to push proper text indexes (using bif:contains on Virtuoso and equivalents on other SPARQL endpoints), because:

  • it’s much faster, and speed is quite important for a usable Network of Terms
  • it gives the terminology source greater power in determining what the results should look like: fuzzy searches, relevance ranking etc.

However, there are two caveats:

  • proper search index configuration isn’t easy
  • even if you know the proper configuration, the search indexes included with SPARQL endpoints (for example Virtuoso’s built-in free-text index, or the Lucene index used by Fuseki) may not allow setting advanced configuration anyway.

@rschalkrce
Contributor Author

I get complaints or questions from NoT users almost weekly that they cannot find the terms they are looking for. When I look into it, the cause is almost always bif:contains. For example, users search for 'methode' in Brinkman but do not find 'leermethode'. How do we want to deal with this?
