minor: poem titles as search results are coming up under previous poem title #54

ghost · 2016-12-08T19:25:16Z

search 'marriage heaven hell'

notice that the first result (which is for the title of the Marriage of Heaven and Hell) is under For Children the Gates of Paradise, which is the poem before it

ghost · 2016-12-10T13:25:56Z

@queryluke could this just be a matter of mislabeling, or is it deeper?

queryluke · 2016-12-10T17:52:02Z

It's deeper. When the Erdman XML is parsed it creates Page JSON objects. These objects have a "headings" attribute, which is a nested array of heading IDs. For page 33, the headings attribute is:

"headings":["[[\"b1\", [[\"b1.6\", []], [\"b1.7\", [[\"b1.7.1\", []]]]]]]"],

b1.6 is For Children because there is a little bit of this poem on page 33
b1.7 is Marriage

So this is a bit of a conundrum. Right now the script always selects the first 2nd-level header in the list (in this case b1.6). I can switch it to always select the LAST 2nd-level header (b1.7), but then a search for "mother sister" (the last line of Children) would show the result under Marriage.
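To make the trade-off concrete, here is a minimal Python sketch of how the 2nd-level heading IDs could be pulled out of the "headings" value quoted above. It is an illustration only, not the project's actual script: the function name `second_level_headings` and the `page` dict are hypothetical, and the sketch assumes each entry in "headings" is a JSON-encoded list of `[id, children]` pairs, as the page 33 example suggests.

```python
import json

# Page 33 as quoted above: "headings" holds one JSON-encoded string.
page = {"headings": ['[["b1", [["b1.6", []], ["b1.7", [["b1.7.1", []]]]]]]']}


def second_level_headings(page):
    """Return the IDs of all 2nd-level headings on a page, in document order.

    Hypothetical helper: assumes every node is an [id, children] pair.
    """
    ids = []
    for raw in page["headings"]:
        # Each decoded entry is a list of top-level [id, children] nodes.
        for _top_id, children in json.loads(raw):
            ids.extend(child_id for child_id, _grandchildren in children)
    return ids


second = second_level_headings(page)
first_choice = second[0]   # current behavior: b1.6 (For Children)
last_choice = second[-1]   # proposed switch: b1.7 (Marriage)
```

Neither choice is right for every hit on page 33: the first heading mislabels matches in Marriage, and the last heading mislabels matches in Children.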

Fixing this on the javascript side is nearly impossible and ugly. So it's something you'll want to discuss with Nathan. I'm sure he'll have his own ideas on how to fix it.

ghost · 2016-12-10T18:10:28Z

ok, gonna assign to nathan

nathan-rice · 2017-04-21T23:22:38Z

It isn't clear exactly what the desired behavior is here. As Luke mentioned, the page title is set to the first header. I can set it to the last header, or the second header (if there is more than one).

ghost · 2017-04-21T23:45:43Z

The title in the results should be the poem title of the poem/work that contains the line

nathan-rice · 2017-04-21T23:48:41Z

The information isn't stored that way. Page titles are mapped to headers in a one-to-one relation. If you want, I can stuff all the headers into the mapping, and you can write javascript to pick the one that should actually be displayed.

ghost · 2017-04-21T23:48:46Z

The number of titles per page is inconsistent, so choosing first or second would be arbitrary.

Basically the result should correspond to the actual poem it's in

ghost · 2017-04-21T23:51:32Z

Ok, I'll confer with joe

ghost · 2017-04-22T00:56:23Z

had a look more closely. i'm not sure what to say. if someone searches "marriage heaven hell" and the title of the poem "The Marriage of Heaven and Hell" is a result, then that is what should show as the header of the result, not the previous poem's title. i understand the issue in the code, but we do need to fix it. i'm not sure what you mean by stuffing all the headers in the mapping--i haven't looked at the code closely--but if you did that, how would we select the right one in javascript? we'd have to do another mini search in the javascript? is there a way to detect a result coming from a poem/work title and then use that title as the result header?
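One way to read the last question, sketched in Python for brevity (the real check would live in the site's javascript): if every heading title for a page were available, the result header could be the first title that contains all of the query terms, falling back to the first heading otherwise. The function name `pick_result_title` and the list-of-titles input are assumptions, not anything in the existing code.

```python
def pick_result_title(query, page_titles):
    """Pick a display title for a search hit on a page.

    Hypothetical helper: prefers a heading whose text contains every
    query term (a hit on the title itself), else the first heading.
    """
    terms = query.lower().split()
    for title in page_titles:
        if terms and all(term in title.lower() for term in terms):
            return title
    # No title matched: fall back to the current first-heading behavior.
    return page_titles[0] if page_titles else None


titles = ["For Children the Gates of Paradise", "The Marriage of Heaven and Hell"]
title_hit = pick_result_title("marriage heaven hell", titles)
body_hit = pick_result_title("mother sister", titles)
```

This only covers hits on a title. A match in the body text of a poem still gets the first heading on the page, so it does not solve the general case.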

nathan-rice · 2017-04-22T01:25:54Z

The problem here stems from the fact that your unit of data is a page, but your desired unit of search results is not.

In my opinion the best option is not to use the page heading in the search results, but instead use the page number. That is technically correct and avoids confusion.

Probably the most direct way to get the behavior you want is if you do a javascript search on the page for the relevant text, then work backwards in the dom from that text node to the previous heading, which you then use for the title. Any other option would require completely redoing how data is stored in solr, which basically would involve rewriting the entire application.
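A rough sketch of the "work backwards to the previous heading" idea, written in Python with the standard library XML parser to show the logic only. The markup, the `h2` heading tag, and the function name `heading_for_match` are all hypothetical, and the poem lines are placeholders rather than Blake's text. A javascript version would walk the rendered DOM instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical rendering of a page that holds the end of one work
# and the start of the next. The <p> lines are placeholder text.
PAGE_33 = """
<div>
  <h2>For Children the Gates of Paradise</h2>
  <p>placeholder closing line with the words mother and sister</p>
  <h2>The Marriage of Heaven and Hell</h2>
  <p>placeholder opening line of the next work</p>
</div>
"""


def heading_for_match(page_markup, query, heading_tag="h2"):
    """Return the nearest heading at or before the first matching element.

    Walks elements in document order and remembers the last heading seen,
    which is equivalent to walking backwards from the match. Only each
    element's own leading text is checked (tail text is ignored).
    """
    terms = query.lower().split()
    if not terms:
        return None
    current_heading = None
    for element in ET.fromstring(page_markup).iter():
        text = (element.text or "").lower()
        if element.tag == heading_tag:
            current_heading = element.text
        if all(term in text for term in terms):
            return current_heading
    return None


marriage = heading_for_match(PAGE_33, "marriage heaven hell")
children = heading_for_match(PAGE_33, "mother sister")
```

With this approach a hit on the title itself and a hit in the body both resolve to the work that actually contains the text, regardless of how many headings share the page.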

ghost · 2017-04-22T01:49:34Z

we can't use the page number because then the results wouldn't amount to a proper concordance and the information conveyed would be a lot less useful.

we'll have to go the javascript way. @queryluke, is this the solution in the javascript that you were thinking of?

ghost · 2017-06-20T16:39:13Z

@nathan-rice i wanted to remind you of this issue. joe v. just pointed out another instance of it. search "sin". the second result under THE [ FIRST ]BOOK OF URIZEN is actually a line in THE BOOK of AHANIA, which comes after THE [ FIRST ]BOOK OF URIZEN

nathan-rice · 2017-06-23T00:36:36Z

Page 84 occurs under both the Urizen and Ahania headings due to the structure of the XML. Currently the javascript groups query result text by heading, using the first heading on the page. As a result, though it is in Ahania, the heading for the result is Urizen.

Changing this behavior to fix this (for example, by taking the last heading) will just break other cases. The best solution is to move the <pb page="#"> outside the <div2> containing Urizen. There isn't really a good solution to this problem given the current data model, and I doubt the problem is a big enough deal to warrant overhauling that.

ghost assigned nathan-rice Dec 10, 2016
