Query: fh salzburg
should not match: "In Salzburg there is a University and in Vienna there is an FH"
How?
- Search for names and concepts: "fh salzburg", "mountain bike"
- Well accepted by users
- Needs a more advanced index with positional information
Notes:
- How could we implement this?
- Can the current index handle this?
- #1: retrieving more information about information retrieval
- #2: searching and retrieving a book about the search for information
- #3: a book about information
Term (excluding stop words) | Postings (doc ID:[positions]) |
---|---|
book | #2:[3], #3:[1] |
information | #1:[2, 3], #2:[5], #3:[2] |
retriev | #1:[1, 4], #2:[2] |
search | #2:[1, 4] |
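The table above could be produced along the following lines. This is only a minimal sketch: the stop-word list, the suffix-stripping `stem` helper, and the name `build_positional_index` are assumptions made to reproduce the slide's numbers, not the lecture's actual implementation. Positions are counted from 1 after stop words have been removed, which is what the table implies.

```python
from collections import defaultdict

# Assumed stop-word list, chosen only so the output matches the table.
STOP_WORDS = {"a", "about", "and", "for", "more", "the"}


def stem(token):
    """Toy stemmer (hypothetical): strips 'ing' and 'al' suffixes only."""
    for suffix in ("ing", "al"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]}, positions starting at 1."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        terms = [stem(t) for t in text.lower().split() if t not in STOP_WORDS]
        for position, term in enumerate(terms, start=1):
            index[term].setdefault(doc_id, []).append(position)
    return index


DOCS = {
    1: "retrieving more information about information retrieval",
    2: "searching and retrieving a book about the search for information",
    3: "a book about information",
}

index = build_positional_index(DOCS)
for term in sorted(index):
    print(term, index[term])
```

A real system would use a proper tokenizer and stemmer (for example a Porter stemmer); the toy helper here only covers the words in these three documents.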
Notes:
- Audience question
Query: "information retrieval"
- Fetch postings for each query term:
- information: #1:[2, 3], #2:[5], #3:[2]
- retriev: #1:[1, 4], #2:[2]
- Calculate term pair distances per document, e.g. retrieval - information in #1 (retrieving more information about information retrieval), positions [1, 4] - [2, 3]:
- 1 - 2 = -1 != 1
- 1 - 3 = -2 != 1
- 4 - 2 = 2 != 1
- 4 - 3 = 1 → match
- Expensive calculation: every pair of positions is compared in every candidate document
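The matching step can be sketched as follows, using the postings from the slide as hard-coded data. The function name `phrase_match` is hypothetical, and instead of subtracting every pair of positions it checks whether "position + 1" exists in the second term's position set, which gives the same result for a two-term phrase.

```python
# Postings copied from the slide: term -> {doc_id: [positions]}
POSTINGS = {
    "information": {1: [2, 3], 2: [5], 3: [2]},
    "retriev": {1: [1, 4], 2: [2]},
}


def phrase_match(postings, first, second):
    """Return doc IDs where `second` occurs exactly one position after `first`."""
    first_docs = postings.get(first, {})
    second_docs = postings.get(second, {})
    matches = []
    # Only documents that contain both terms can match the phrase.
    for doc_id in sorted(first_docs.keys() & second_docs.keys()):
        second_positions = set(second_docs[doc_id])
        if any(p + 1 in second_positions for p in first_docs[doc_id]):
            matches.append(doc_id)
    return matches


print(phrase_match(POSTINGS, "information", "retriev"))
```

Document #2 contains both terms but is rejected because their positions (5 and 2) are not adjacent; document #3 is never considered because it lacks "retriev".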
Notes:
- Can this use proximity regardless of order, e.g., match "retrieval information" as well?
- Can this support phrase gaps, e.g. information … retrieval?
Supports phrase gaps: "dwayne johnson"~2 matches "dwayne the rock johnson"
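One way to read the gap parameter is "at most N other words between the two terms". The sketch below implements that simplified reading; `proximity_match`, `positions`, and the `ordered` flag are illustrative names, and real search engines may define the gap (often called slop) differently, for example by also counting reordered terms. Stop words are kept here so that "the" occupies a position.

```python
def positions(tokens, term):
    """1-based positions of `term` in a token list."""
    return [i for i, t in enumerate(tokens, start=1) if t == term]


def proximity_match(first_positions, second_positions, slop=0, ordered=True):
    """True if the terms occur with at most `slop` words between them.

    With ordered=True the second term must follow the first;
    with ordered=False either order is accepted.
    """
    for p1 in first_positions:
        for p2 in second_positions:
            gap = p2 - p1
            if not ordered:
                gap = abs(gap)
            if 1 <= gap <= slop + 1:
                return True
    return False


tokens = "dwayne the rock johnson".split()
dwayne = positions(tokens, "dwayne")
johnson = positions(tokens, "johnson")

print(proximity_match(dwayne, johnson, slop=2))
print(proximity_match(dwayne, johnson, slop=1))
```

Setting `slop=0` reduces this to the exact phrase check, and `ordered=False` covers the question of matching the terms regardless of order.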
Notes:
- The most common case is to search for two consecutive words. The intersection algorithm is a bit expensive. Can we speed this up?
- Speed up common phrase queries
- Auxiliary index
- Index term pairs
- Fast lookup of term pairs
#1: "Study at FH Salzburg"
↓
Term | Doc IDs |
---|---|
study at | #1 |
at fh | #1 |
fh salzburg | #1 |
Query: fh salzburg → #1
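The term-pair index above can be sketched in a few lines. The name `build_biword_index` and the plain lowercase whitespace tokenization are assumptions for illustration; the slide does not specify how tokens are normalized.

```python
from collections import defaultdict


def build_biword_index(docs):
    """Map every pair of consecutive tokens to the set of doc IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        # zip pairs each token with its right-hand neighbour.
        for left, right in zip(tokens, tokens[1:]):
            index[f"{left} {right}"].add(doc_id)
    return index


biword_index = build_biword_index({1: "Study at FH Salzburg"})

# A two-word phrase query becomes a single dictionary lookup.
print(biword_index.get("fh salzburg", set()))
```

The trade-off is index size: every adjacent pair becomes its own entry, so this is typically kept as an auxiliary index next to the positional one, which still handles longer phrases and gaps.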
Notes: