minor: poem titles as search results are coming up under previous poem title #54
Comments
@queryluke could this just be a matter of mislabeling, or is it deeper?
It's deeper. When the Erdman XML is parsed, it creates Page JSON objects. These objects have a "headings" attribute, which is a nested array of heading IDs. For page 33, the heading attribute is:
b1.6 is For Children, because there is a little bit of this poem on page 33. So this is a bit of a conundrum. Right now the script always selects the first 2nd-level header in the list (in this case b1.6). I can switch it to always accept the LAST 2nd-level header (b1.7), but then a search for "mother sister" (the last line of Children) would show the result under Marriage. Fixing this on the JavaScript side is nearly impossible and ugly. So it's something you'll want to discuss with Nathan. I'm sure he'll have his own ideas on how to fix it.
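To make the conundrum concrete, here is a minimal sketch of the two page-level rules. The real "headings" value for page 33 is not shown in this thread, so the flat `secondLevelHeadings` list, the `hits` array, and the helper names `pickFirst`, `pickLast`, and `mislabeled` are all assumptions made for illustration; only the ids b1.6 and b1.7 and the two queries come from the discussion.

```javascript
// Hypothetical, simplified data: the real "headings" value for page 33 is not
// shown in the thread, so this flat list of second-level ids is an assumption.
const page33 = { page: 33, secondLevelHeadings: ["b1.6", "b1.7"] };

const titles = {
  "b1.6": "For Children the Gates of Paradise",
  "b1.7": "The Marriage of Heaven and Hell",
};

// Two hits on page 33, each tagged with the work it really belongs to.
const hits = [
  { query: "mother sister", actualHeading: "b1.6" },
  { query: "marriage heaven hell", actualHeading: "b1.7" },
];

// The two page-level rules under discussion.
const pickFirst = (page) => page.secondLevelHeadings[0];
const pickLast = (page) =>
  page.secondLevelHeadings[page.secondLevelHeadings.length - 1];

// Return the hits that a page-level rule files under the wrong work.
function mislabeled(pickHeading) {
  const shown = pickHeading(page33);
  return hits
    .filter((hit) => hit.actualHeading !== shown)
    .map((hit) => ({ query: hit.query, shownAs: titles[shown] }));
}

console.log("first header rule mislabels:", mislabeled(pickFirst));
console.log("last header rule mislabels:", mislabeled(pickLast));
```

Under these assumptions each rule mislabels exactly one of the two hits, which is why swapping first for last only moves the problem: any rule that assigns one heading per page will be wrong for a page that spans two works.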
ok, gonna assign to nathan
It isn't clear exactly what the desired behavior is here. As Luke mentioned, the page title is set to the first header. I can set it to the last header, or the second header (if there is more than one).
The title in the results should be the poem title of the poem/work that contains the line
The information isn't stored that way. Page titles are mapped to headers in a one-to-one relation. If you want, I can stuff all the headers in the mapping, and you can write JavaScript to pick the one that should actually be displayed.
The number of titles per page is inconsistent, so choosing first or second would be arbitrary. Basically the result should correspond to the actual poem it's in
Ok, I'll confer with joe
had a look more closely. i'm not sure what to say. if someone searches "marriage heaven hell" and the title of the poem "The Marriage of Heaven and Hell" is a result, then that is what should show as the header of the result, not the previous poem's title. i understand the issue in the code, but we do need to fix it. i'm not sure what you mean by stuffing all the headers in the mapping--i haven't looked at the code closely--but if you did that, how would we select the right one in javascript? we'd have to do another mini search in the javascript? is there a way to detect a result coming from a poem/work title and then use that title as the result header?
The problem here stems from the fact that your unit of data is a page, but your desired unit of search results is not. In my opinion the best option is not to use the page heading in the search results, but instead to use the page number. That is technically correct and avoids confusion. Probably the most direct way to get the behavior you want is to do a JavaScript search on the page for the relevant text, then work backwards in the DOM from that text node to the previous heading, which you then use for the title. Any other option would require completely redoing how data is stored in Solr, which basically would involve rewriting the entire application.
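The walk-back idea can be sketched roughly as follows. This is not the project's code: it stands in for the DOM with a flat, document-order array of nodes, and the node shape, the placeholder line text, and the function names `findMatches` and `headingForMatch` are assumptions for illustration. In a browser the same backward walk would be done over real elements (for example with `previousElementSibling` or a `TreeWalker`).

```javascript
// Assumed flat, document-order view of page 33. The line texts are
// placeholders, not quotations from the poems.
const page33Nodes = [
  { type: "line", text: "placeholder line containing mother sister" },
  { type: "heading", id: "b1.7", text: "The Marriage of Heaven and Hell" },
  { type: "line", text: "placeholder opening line of the next work" },
];

// Heading carried over from the previous page, for lines that appear
// before any heading on this page (assumed to be the first id, b1.6).
const carriedHeading = "For Children the Gates of Paradise";

// Indices of nodes whose text contains every query term (case-insensitive).
function findMatches(nodes, query) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return nodes
    .map((node, index) => ({ node, index }))
    .filter(({ node }) => {
      const text = node.text.toLowerCase();
      return terms.every((term) => text.includes(term));
    })
    .map(({ index }) => index);
}

// Walk backwards from the matched node to the nearest heading. A match
// inside a heading uses that heading itself; if no heading precedes the
// match on this page, fall back to the carried-over heading.
function headingForMatch(nodes, matchIndex, fallback) {
  for (let i = matchIndex; i >= 0; i -= 1) {
    if (nodes[i].type === "heading") {
      return nodes[i].text;
    }
  }
  return fallback;
}

for (const query of ["marriage heaven hell", "mother sister"]) {
  for (const index of findMatches(page33Nodes, query)) {
    console.log(query, "->", headingForMatch(page33Nodes, index, carriedHeading));
  }
}
```

The point of the design is that the title is chosen per match rather than per page, so a page that spans two works can yield two different result headers. The fallback matters because a poem that started on an earlier page has no heading of its own on the current one.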
we can't use the page number because then the results wouldn't amount to a proper concordance and the information conveyed would be a lot less useful. we'll have to go the javascript way. @queryluke, is this the solution in the javascript that you were thinking of?
@nathan-rice i wanted to remind you of this issue. joe v. just pointed out another instance of it. search "sin". the second result under THE [ FIRST ]BOOK OF URIZEN is actually a line in THE BOOK of AHANIA, which comes after THE [ FIRST ]BOOK OF URIZEN
Page 84 occurs under both the Urizen and Ahania headings due to the structure of the XML. Currently the javascript groups query result text by heading, using the first heading on the page. As a result, though it is in Ahania, the heading for the result is Urizen. Changing this behavior to fix this (for example, by taking the last heading) will just break other cases. The best solution is to move the |
search 'marriage heaven hell'
notice that the first result (which is for the title of the Marriage of Heaven and Hell) is under For Children the Gates of Paradise, which is the poem before it