Fewer results when using bif:contains text indexing #1118

rschalkrce · 2023-09-26T08:09:02Z

bif:contains returns fewer results than CONTAINS, and this may be undesirable for some datasets.

I think this is because bif:contains does not support leading wildcards when searching within strings. It will return "uitschuifbare opleggers" and "opleggers (molenelement)" when searching for "oplegger", but not "goudoplegger" (screenshot: bif-vs-contains-cht).
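
The difference described above can be illustrated with a small sketch. It is a simplified model, not Virtuoso's actual implementation: it assumes the text index stores whole words and that the query adds a trailing wildcard, and the function names are made up for the illustration.

```python
import re

# Labels taken from the example in this issue.
LABELS = [
    "uitschuifbare opleggers",
    "opleggers (molenelement)",
    "goudoplegger",
]


def substring_match(label: str, term: str) -> bool:
    """Model of SPARQL CONTAINS: the term may occur anywhere in the label."""
    return term.lower() in label.lower()


def word_prefix_match(label: str, term: str) -> bool:
    """Simplified model of a word-based text index queried as 'term*':
    some whole word in the label must start with the term."""
    words = re.findall(r"\w+", label.lower())
    return any(word.startswith(term.lower()) for word in words)


term = "oplegger"
substring_hits = [label for label in LABELS if substring_match(label, term)]
index_hits = [label for label in LABELS if word_prefix_match(label, term)]

# Labels that substring search finds but the word-based model misses.
missed = [label for label in substring_hits if label not in index_hits]
print(missed)  # ['goudoplegger']
```

In this model the compound "goudoplegger" is indexed as a single word that does not start with "oplegger", so only a leading wildcard or substring search would find it.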

Several datasets currently use bif:contains because of its quicker response times, but it may negatively affect results. It would be good to discuss this, now that we are also considering text indexing for some Poolparty datasets. We can leave the decision up to data providers, but we should perhaps inform them or mention it in the requirements.

Related issue: #1064

ddeboer · 2023-09-26T08:24:39Z

Hey, thanks for the clear report!

> It will return "uitschuifbare opleggers" and "opleggers (molenelement)" when searching for "oplegger", but not "goudoplegger"

Yup, this is a common issue with full text indexes. The source can solve this with proper stemming configuration.

I’m convinced we still need to push proper text indexes (using bif:contains on Virtuoso and equivalents on other SPARQL endpoints), because:

- it’s much faster, and speed is quite important for a usable Network of Terms
- it gives the terminology source greater power in determining what the results should look like: fuzzy searches, relevance ranking etc.
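
To make the two approaches concrete, here is a minimal sketch of the query fragments involved. The `?label` variable, the helper names and the input check are assumptions for illustration; the actual Network of Terms queries are written by each terminology source and may look different.

```python
import re


def _check(term: str) -> None:
    """Accept only a single word so it can be embedded in a query string safely."""
    if not re.fullmatch(r"\w+", term):
        raise ValueError(f"unsupported search term: {term!r}")


def contains_filter(term: str) -> str:
    """Standard SPARQL substring filter: matches the term anywhere in ?label."""
    _check(term)
    return f'FILTER(CONTAINS(LCASE(STR(?label)), "{term.lower()}"))'


def bif_contains_pattern(term: str) -> str:
    """Virtuoso full-text pattern with a trailing wildcard: matches words
    in ?label that start with the term."""
    _check(term)
    return f"?label bif:contains \"'{term.lower()}*'\" ."


print(contains_filter("oplegger"))
print(bif_contains_pattern("oplegger"))
```

The first fragment scans every label, which is why it is slow but finds "goudoplegger"; the second uses the text index, which is fast but only matches from the start of a word.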

However, there are two caveats:

- proper search index configuration isn’t easy
- even if you know the proper configuration, the search indexes included with SPARQL endpoints (usually Lucene, e.g. with Virtuoso and Fuseki) may not allow advanced configuration anyway.

rschalkrce · 2024-03-01T11:50:47Z

Almost every week I get complaints or questions from NoT users who cannot find the terms they are looking for. When I look into it, this is almost always because of bif:contains. For example, users search for 'methode' in Brinkman but do not find 'leermethode'. How do we want to deal with this?

rschalkrce added the discuss label Sep 26, 2023

rschalkrce assigned ddeboer, EnnoMeijers and rschalkrce Sep 26, 2023
