bif:contains returns fewer results than CONTAINS, which may be undesirable for some datasets.
I think this is because bif:contains does not support leading wildcards, so it cannot match a search term that occurs inside or at the end of a word. When searching for "oplegger", it returns "uitschuifbare opleggers" and "opleggers (molenelement)", but not "goudoplegger": bif-vs-contains-cht.
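A rough way to see the difference: a full-text index stores each label as separate words, and a query with a trailing wildcard can only match the start of a word, whereas CONTAINS looks for the string anywhere in the label. The Python sketch below models both behaviours on the labels from this example. It is a simplification under an assumption: bif:contains is treated as word-prefix matching on lowercase tokens, while Virtuoso's actual behaviour also depends on the query syntax used and on how its index is configured.

```python
import re

LABELS = [
    "uitschuifbare opleggers",
    "opleggers (molenelement)",
    "goudoplegger",
]


def tokenize(label: str) -> list[str]:
    """Split a label into lowercase word tokens, dropping punctuation."""
    return re.findall(r"\w+", label.lower())


def word_prefix_match(query: str, labels: list[str]) -> list[str]:
    """Approximates a token-based full-text search with a trailing wildcard
    ('oplegger*'): the query must match the START of some word."""
    q = query.lower()
    return [label for label in labels
            if any(token.startswith(q) for token in tokenize(label))]


def substring_match(query: str, labels: list[str]) -> list[str]:
    """Approximates FILTER(CONTAINS(LCASE(?label), "oplegger")): the query
    may occur anywhere, including in the middle of a compound word."""
    q = query.lower()
    return [label for label in labels if q in label.lower()]


print(word_prefix_match("oplegger", LABELS))  # misses "goudoplegger"
print(substring_match("oplegger", LABELS))    # finds all three labels
```

The word-prefix variant can be answered from an index of words; the substring variant cannot, which is where the speed difference discussed in this thread comes from.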
Several datasets currently use bif:contains because of its quicker response times, but this may negatively affect results. It would be good to discuss this, now that we are also considering text indexing for some PoolParty datasets. We could leave the decision to data providers, but at least inform them or mention it in the requirements.
> It will return "uitschuifbare opleggers" and "opleggers (molenelement)" when searching for "oplegger", but not "goudoplegger"
Yup, this is a common issue with full-text indexes. The terminology source can solve this with a proper stemming configuration.
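A stemmer would take care of the plural case ("opleggers" → "oplegger"), but a compound such as "goudoplegger" or "leermethode" only becomes findable as a word if the index also splits compounds into their parts at index time. Lucene, for instance, ships a dictionary-based compound word token filter for this purpose. The sketch below is a toy analyzer illustrating the idea, not any engine's real implementation: the word list, the plural-stripping stem() and all helper names are hypothetical.

```python
import re

# Hypothetical word list for compound splitting; a real setup would need a
# much larger Dutch dictionary.
DICTIONARY = {"goud", "oplegger", "leer", "methode"}


def stem(token: str) -> str:
    """Toy stemmer (assumption): strip a plural '-en' or '-s' ending."""
    for suffix in ("en", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def decompound(token: str) -> list[str]:
    """Keep the token and add every dictionary word found inside it."""
    return [token] + [w for w in sorted(DICTIONARY) if w != token and w in token]


def analyze(label: str) -> set[str]:
    """Index-time analysis: tokenize, stem, then split compounds."""
    terms: set[str] = set()
    for token in re.findall(r"\w+", label.lower()):
        terms.update(decompound(stem(token)))
    return terms


def matches(query: str, label: str) -> bool:
    """Query time: the stemmed query must equal one of the indexed terms."""
    return stem(query.lower()) in analyze(label)


print(matches("oplegger", "goudoplegger"))  # True
print(matches("methode", "leermethode"))    # True
```

With an analyzer like this, a plain word search for "oplegger" would find "goudoplegger" without any leading wildcard. The price is a dictionary that has to be maintained, and the endpoint must actually let the source configure its analyzer.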
I’m convinced we still need to push for proper text indexes (using bif:contains on Virtuoso and equivalents on other SPARQL endpoints), because:
- it’s much faster, and speed is quite important for a usable Network of Terms;
- it gives the terminology source more control over what the results look like: fuzzy search, relevance ranking, etc.
However, there are two caveats:
- proper search index configuration isn’t easy;
- even if you know the proper configuration, the search indexes bundled with SPARQL endpoints (for example Virtuoso's built-in free-text index, or Lucene as used by Fuseki) may not allow advanced configuration anyway.
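The speed argument follows from how a text index is organised: it is built once, mapping each word to the labels that contain it, so a query becomes a lookup instead of a pass over every label, which is what a CONTAINS filter has to do. A minimal Python sketch with made-up term IDs and labels; real indexes (Lucene, Virtuoso's free-text index) additionally handle prefixes, ranking and much more.

```python
import re
from collections import defaultdict

# Hypothetical term IDs and labels, for illustration only.
LABELS = {
    "t1": "methode",
    "t2": "leermethode",
    "t3": "methode van onderzoek",
}


def build_index(labels: dict[str, str]) -> dict[str, set[str]]:
    """Done once, ahead of query time: map every lowercase word to the
    IDs of the labels that contain it (an inverted index)."""
    index: dict[str, set[str]] = defaultdict(set)
    for term_id, label in labels.items():
        for word in re.findall(r"\w+", label.lower()):
            index[word].add(term_id)
    return dict(index)


def indexed_search(index: dict[str, set[str]], word: str) -> set[str]:
    """A single dictionary lookup, regardless of how many labels exist."""
    return index.get(word.lower(), set())


def scan_search(labels: dict[str, str], fragment: str) -> set[str]:
    """What a CONTAINS filter amounts to: inspect every label per query."""
    needle = fragment.lower()
    return {tid for tid, label in labels.items() if needle in label.lower()}


INDEX = build_index(LABELS)
print(sorted(indexed_search(INDEX, "methode")))  # ['t1', 't3']
print(sorted(scan_search(LABELS, "methode")))    # ['t1', 't2', 't3']
```

The lookup is fast precisely because it only knows whole words, which is also why "leermethode" (t2) does not show up unless the index was configured to split compounds.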
I get complaints or questions from NoT users almost every week saying they cannot find the terms they are looking for. When I ask, the cause is almost always bif:contains. For example, users search for 'methode' in Brinkman but do not find 'leermethode'. How do we want to deal with this?
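For the 'methode' example, the two options look roughly like this as SPARQL. The Python helpers below only build the query strings; the skos:prefLabel pattern, the function names and the input check are assumptions for illustration, not the actual Network of Terms queries, and the quoting of the wildcard term is an assumption to verify against the Virtuoso documentation. With a trailing wildcard, bif:contains would find labels with words starting with 'methode', but still not 'leermethode', which only the CONTAINS filter matches.

```python
SKOS = "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>"


def _validate(term: str) -> str:
    # Crude safeguard (assumption): accept only letters and digits so the
    # term cannot break out of the quoted string in the query.
    if not term.isalnum():
        raise ValueError(f"unsupported search term: {term!r}")
    return term


def contains_query(term: str) -> str:
    """Portable SPARQL 1.1: substring match anywhere in the label.
    Cannot use a text index, so the endpoint has to check every label."""
    t = _validate(term)
    return f"""{SKOS}
SELECT ?concept ?label WHERE {{
  ?concept skos:prefLabel ?label .
  FILTER(CONTAINS(LCASE(STR(?label)), LCASE("{t}")))
}}"""


def bif_contains_query(term: str) -> str:
    """Virtuoso only (bif: needs no PREFIX there): uses the free-text index.
    A trailing wildcard is allowed, a leading one ('*methode') is not."""
    t = _validate(term)
    return f"""{SKOS}
SELECT ?concept ?label WHERE {{
  ?concept skos:prefLabel ?label .
  ?label bif:contains '"{t}*"' .
}}"""


print(bif_contains_query("methode"))
```

The CONTAINS variant works on any SPARQL 1.1 endpoint but gives up the index; the bif:contains variant is fast but tied to Virtuoso and to word-level matching.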
Related issue: #1064