-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for search in odoc
#972
Conversation
Looks nice!
I think it would be better to simply have a This script can modify the dom to add the search interface and define the interaction with the search engine the way it wishes. This abstracts away all the webworker stuff, basically pages are completely agnostic to how search works. This would allow themes to redefine search they way they see it fit.
It looks ok to me. Maybe I would rename the Regarding the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a cursory look and left a few comments.
URL.revokeObjectURL(blobUrl); | ||
|
||
return worker; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Welcome to the ridiculous world of web programming. Been there. You should add a comment that you are making that dance to enable webworking over the file://
protocol.
src/search/json_display.ml
Outdated
else None | ||
| Constructor { args; res } -> | ||
Some (" : " ^ display_constructor_type args res) | ||
| Field { mutable_ = _; type_; parent_type = _ } -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you take the habit of priming ('
) identifiers that conflict with keywords rather than suffixing them with _
these kind of patterns look much less confusing:
| Field { mutable' = _; type'; parent_type = _ } ->
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current code, _
is always used to escape keywords. There's one occurrence of module'
, which is a internal function used by module_
.
src/model/semantics.ml
Outdated
incr current_label; | ||
Paths.Identifier.Mk.label | ||
( status.parent_of_sections, | ||
Names.LabelName.make_std ("search_label_" ^ string_of_int !current_label) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not a very nice label. People will start to link to these things so they will show up in URLs. Can we maybe find a shorter scheme and perhaps give a better hint at the kind of element we are linking to ? Maybe something like p%d
, code%d
, math%d
or whatever better scheme you can come up with.
src/xref2/component.mli
Outdated
@@ -432,9 +432,21 @@ and Substitution : sig | |||
end | |||
|
|||
and CComment : sig | |||
type nestable_block_element = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to redefine this here? It looks identical to nestable_block_element
in model/comment.ml
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, it's because it's referencing Label.t
here.
src/xref2/link.ml
Outdated
(* Only the first element is considered. *) | ||
Comment.synopsis [ e ] | ||
| { value = `Paragraph (_, text); _ } :: _ -> Some text | ||
(* | ({ value = #Comment.nestable_block_element; _ } as e) :: _ -> *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's happening here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before, the Component
version of nestable block comments were identical to the Lang
version. Now, they differ, so we cannot anymore call the Lang
function.
The old code now commented slipped in my staged area, I will clean it.
I wonder if we could extract some of the markdown renderer to allow rendering of the doc comments as markdown (in Search.Render)? It's probably not going to make it much bigger, it can still be indexed and the search tool could then show a preview. |
Thanks for the very prompt review!
I think that it is already working similarly to this:
Before I started using webworkers, the integration between search engine and (I hope I did not miss your point!)
I was hesitant between two approaches:
In this not finished work, it is a bit inbetween, but leaning toward the first option...
I'm not sure how this could work, but that's clearly some bytes saved!
That's also a possibility, that removes the need to store two versions of the docstring (the text one and the html one). But I think I prefer one of the option above: either match exactly the fully rendered version, or be tailored to be showed on a search result. |
I'd suggest you don't even go as far as defining this. Basically you shouldn't have to touch the generator (this should not be needed). Just let
I think this integration should be: include the javascript file(s) of my theme in the generated page.
I would rather lean towards the second one. I would like The reason is that I want to be able to do query term highlighting in them (you don't necessarily show the whole
When you retrieve the result you can just lookup the files via
Your search index will not store two versions of the doc string. It will use the I think it's good to have Having |
Thanks for the explanations! About the "include the javascript files of my theme", I think this would be a useful feature, that I would be keen to implement (but, in another PR). It is much more general than "search". However, I'm not sure that the inclusion of the search bar in the HTML should be done by a script by default. Most of the time I don't think the theme should be in charge of adding HTML elements. When the general layout really has to be modified, there is a In any case (supposing the "include the javascript files of my theme" command exists) the workflow you suggest is still possible: The search bar is not included in the HTML when no About how to render search results: I agree with you that having consistency between how entries are displayed in the search results, and how they are displayed in the generated docs, is an important property to have. So, I will modify this PR so that the |
I'm not keen on this; support files doesn't seem like the right place. For instance, I think I might want to have dune index each package of my workspace separately. In which case, the index looks more like an asset than a support file. PS: having the same option name when one of them expect a path and the other one a "name" (basename?) seems like a mistake. |
The diff is huge in part because of this. What do you think of moving that to a different PR ? That could make the history easier to understand and the whole easier to review. |
But then I don't get the link to the index file no ? |
816839b
to
6f58654
Compare
I rewrote the history and opened a new PR for the addition of IDs to block of texts. If we want to split the conversation even further, I can open other PRs for separate features, such as the I haven't yet implemented the review comments to the main commits ("Adding search support in odoc"). In particular, the way to specify the javascript files to include, and the html in the json index will change. |
I think thats a good principle, but it does work in every case. For instance, we want constructors to be searchable, and the html of constructors is not structured in a good way for that. That means that we want to display the following :
instead of
And I do not think the odoc html generator is flexible enough to provide the little functions that would be necessary to get the desired output, from what I remember there is no "constructor printer", just a typedecl one, and it is not exactly easy to extract the required functions (here the function that prints "goo" is not the typeexpr printer, as "goo" may be an inline record). In my opinion we may try two things :
|
4cf7ff8
to
382e2ea
Compare
Here is a summary of the changes that happened in this PR:
I think that the followin can be reviewed:
The points above consist in all the "user facing part", I think, so having it right is the most important for merging this PR! |
7b1294d
to
5c158d1
Compare
Signed-off-by: Paul-Elliot <[email protected]>
The odoc_search library is meant to be used by external (OCaml) indexers, while the JSON index generation is internal to odoc. Signed-off-by: Paul-Elliot <[email protected]>
Signed-off-by: Paul-Elliot <[email protected]>
Signed-off-by: Paul-Elliot <[email protected]>
Signed-off-by: Paul-Elliot <[email protected]>
Signed-off-by: Paul-Elliot <[email protected]>
Signed-off-by: Paul-Elliot <[email protected]>
Adds a "loader" circle when search is performed, and a "no results" entry when there is no results. Signed-off-by: Paul-Elliot <[email protected]>
Signed-off-by: Paul-Elliot <[email protected]>
Also, in the driver, pass the correct list of files Signed-off-by: Paul-Elliot <[email protected]>
Signed-off-by: Paul-Elliot <[email protected]>
Signed-off-by: Paul-Elliot <[email protected]>
- Remove now unused added utils function - Simplified some code - Removed formatting changes in otherwise unchanged cppo-ed files - promote mdx tests Signed-off-by: Paul-Elliot <[email protected]>
Signed-off-by: Paul-Elliot <[email protected]>
Signed-off-by: Paul-Elliot <[email protected]>
Signed-off-by: Paul-Elliot <[email protected]>
Signed-off-by: Paul-Elliot <[email protected]>
Great! thanks everyone for your work on this! |
CHANGES: ### Added - Add support for external search engines (@panglesd, @EmileTrotignon, ocaml/odoc#972) This includes the generation of an index and the display of the results in the UI (HTML only). - Display 'private' keyword for private type extensions (@gpetiot, ocaml/odoc#1019) - Allow to omit parent type in constructor reference (@panglesd, @EmileTrotignon, ocaml/odoc#933) ### Fixed - Warn and exit when table(s) is not closed (@lubegasimon, ocaml/odoc#1050) - Hint when list(s) is not closed (@lubegasimon, ocaml/odoc#1050) - Fix crash on functors returning an alias (@Julow, ocaml/odoc#1046) - Fix rendering of polymorphic variants (@wikku, @panglesd, ocaml/odoc#971) - Add references to extension declarations (@gpetiot, @panglesd, ocaml/odoc#949) ### Changed - Style: Adjusted line height in the TOC to improve readability (@sorawee, ocaml/odoc#1045) - Style: Remove font fallback to Helvetica, Arial (@Julow, ocaml/odoc#1028) - Style: Preformatted elements fallback to UA monospace (@toastal, ocaml/odoc#967) - Style: Sidebar is now stuck to the left of the content instead of the left of the viewport (@EmileTrotignon, ocaml/odoc#999)
CHANGES: ### Added - Add support for external search engines (@panglesd, @EmileTrotignon, ocaml/odoc#972) This includes the generation of an index and the display of the results in the UI (HTML only). - Display 'private' keyword for private type extensions (@gpetiot, ocaml/odoc#1019) - Allow to omit parent type in constructor reference (@panglesd, @EmileTrotignon, ocaml/odoc#933) ### Fixed - Warn and exit when table(s) is not closed (@lubegasimon, ocaml/odoc#1050) - Hint when list(s) is not closed (@lubegasimon, ocaml/odoc#1050) - Fix crash on functors returning an alias (@Julow, ocaml/odoc#1046) - Fix rendering of polymorphic variants (@wikku, @panglesd, ocaml/odoc#971) - Add references to extension declarations (@gpetiot, @panglesd, ocaml/odoc#949) ### Changed - Style: Adjusted line height in the TOC to improve readability (@sorawee, ocaml/odoc#1045) - Style: Remove font fallback to Helvetica, Arial (@Julow, ocaml/odoc#1028) - Style: Preformatted elements fallback to UA monospace (@toastal, ocaml/odoc#967) - Style: Sidebar is now stuck to the left of the content instead of the left of the viewport (@EmileTrotignon, ocaml/odoc#999)
CHANGES: ### Added - Add support for external search engines (@panglesd, @EmileTrotignon, ocaml/odoc#972) This includes the generation of an index and the display of the results in the UI (HTML only). - Display 'private' keyword for private type extensions (@gpetiot, ocaml/odoc#1019) - Allow to omit parent type in constructor reference (@panglesd, @EmileTrotignon, ocaml/odoc#933) ### Fixed - Warn and exit when table(s) is not closed (@lubegasimon, ocaml/odoc#1050) - Hint when list(s) is not closed (@lubegasimon, ocaml/odoc#1050) - Fix crash on functors returning an alias (@Julow, ocaml/odoc#1046) - Fix rendering of polymorphic variants (@wikku, @panglesd, ocaml/odoc#971) - Add references to extension declarations (@gpetiot, @panglesd, ocaml/odoc#949) ### Changed - Style: Adjusted line height in the TOC to improve readability (@sorawee, ocaml/odoc#1045) - Style: Remove font fallback to Helvetica, Arial (@Julow, ocaml/odoc#1028) - Style: Preformatted elements fallback to UA monospace (@toastal, ocaml/odoc#967) - Style: Sidebar is now stuck to the left of the content instead of the left of the viewport (@EmileTrotignon, ocaml/odoc#999)
CHANGES: ### Added - Add support for external search engines (@panglesd, @EmileTrotignon, ocaml/odoc#972) This includes the generation of an index and the display of the results in the UI (HTML only). - Display 'private' keyword for private type extensions (@gpetiot, ocaml/odoc#1019) - Allow to omit parent type in constructor reference (@panglesd, @EmileTrotignon, ocaml/odoc#933) ### Fixed - Warn and exit when table(s) is not closed (@lubegasimon, ocaml/odoc#1050) - Hint when list(s) is not closed (@lubegasimon, ocaml/odoc#1050) - Fix crash on functors returning an alias (@Julow, ocaml/odoc#1046) - Fix rendering of polymorphic variants (@wikku, @panglesd, ocaml/odoc#971) - Add references to extension declarations (@gpetiot, @panglesd, ocaml/odoc#949) ### Changed - Style: Adjusted line height in the TOC to improve readability (@sorawee, ocaml/odoc#1045) - Style: Remove font fallback to Helvetica, Arial (@Julow, ocaml/odoc#1028) - Style: Preformatted elements fallback to UA monospace (@toastal, ocaml/odoc#967) - Style: Sidebar is now stuck to the left of the content instead of the left of the viewport (@EmileTrotignon, ocaml/odoc#999)
CHANGES: ### Added - Add support for external search engines (@panglesd, @EmileTrotignon, ocaml/odoc#972) This includes the generation of an index and the display of the results in the UI (HTML only). - Display 'private' keyword for private type extensions (@gpetiot, ocaml/odoc#1019) - Allow to omit parent type in constructor reference (@panglesd, @EmileTrotignon, ocaml/odoc#933) ### Fixed - Warn and exit when table(s) is not closed (@lubegasimon, ocaml/odoc#1050) - Hint when list(s) is not closed (@lubegasimon, ocaml/odoc#1050) - Fix crash on functors returning an alias (@Julow, ocaml/odoc#1046) - Fix rendering of polymorphic variants (@wikku, @panglesd, ocaml/odoc#971) - Add references to extension declarations (@gpetiot, @panglesd, ocaml/odoc#949) ### Changed - Style: Adjusted line height in the TOC to improve readability (@sorawee, ocaml/odoc#1045) - Style: Remove font fallback to Helvetica, Arial (@Julow, ocaml/odoc#1028) - Style: Preformatted elements fallback to UA monospace (@toastal, ocaml/odoc#967) - Style: Sidebar is now stuck to the left of the content instead of the left of the viewport (@EmileTrotignon, ocaml/odoc#999)
This PR adds support for search in odoc.
It leaves the actual search to an external search engine, but exposes the information needed for such a search engine to work, and defines the communication between the search engine and odoc’s page.
It is not very polished yet, but close, I open the PR to start the review, and discuss some design choices.
More precisely, here is what the PR is about:
Odoc_model.Fold
module, folding on the values of type inLang
(can be useful to generate a glossary of values/modules/... too).Odoc_search.Entry.t
. A value of this type correspond to an entry in a search index. This type is supposed to stay stable as it can be consumed by search indexes written in OCaml (to still allow for breaking modification of theLang
module). Entries can be printed as a JSON object (sometimes simplified).json_search.ml
) and the json outputed by the search engine (injson_display.ml
)And, in terms of CLI:
compile-index
command, which outputs a JSON file, meant to be read for indexation by external search enginesThe addition of a--search-file <path>
option tosupport-files
. This option copies the given file in the relevant directory.The addition of areplaced by:--search-file <name>
option tohtml-generate
. This option activates the search elements on the generated pages (text input and result area), and includes the loading of thename
file (which has to be included with the corresponding option ofsupport-file
).--search-asset asset_reference
option tocompile
. This option tells to use the given asset to load as a search script.Finally, to be able to test:
You can view the result of running the driver directly here. (The minisearch library sometimes behaves strangely, but it is just an example, for reference).
TODOs/Improvements:
href
needs to be fully knowable after the link stage, for the index to have the right URL). However, headings still have their ID disambiguated at the document level. Need to change that to have the correct URLs for headings in the search index.Add labels to basic text block (such as paragraphs and code blocks) #974 was opened, but the decision was to start simple: just link to the page containing the paragraph, for now
Warning, resolved hidden path: Base__.Set_intf.Named.t
being output when indexing.The same warning appear on
html-generate
of the culprit documents. Not introduced in this PR, and not a big deal. Considered done.Entry.t
and generates html, and which does not depend onOdoc_html
.I consider this "done": The code has been improved, and the CSS is satisfying to me (see this comment)
Odoc_search.Entry.t
are the same.I think there is enough info in the JSON entry. Marking it as done.
I think the test is now sufficient.The
compile-index
is almost immediate onodoc
+ all of its dependencies. The efficiency of the search engine is not part of this PR.Separating the search engine from
odoc
As hinted above, the actual search engine is supposed to be external from
odoc
.It can generate its database from a JSON file (whose format is defined int the
src/search/json_search.ml
file), or by usingodoc
as a library.It should a javascript file that will be run as a webworker. This webworker should listen to"webworker messages", each message being a query (given as plain string) to search for.
The result of a queried search must be an array of json object of type defined in
/src/search/json_display.ml
. An odoc-generated value for each entry is included in the input of the search engine, which is welcome to modify it/generate its own, for instance to highlight the reason this result was chosen.As the search script is in a webworker, the result of the search has to be sent back to the main thread via
postMessage
, andodoc
will render the results.This allows to have different scenarios:
dune build @doc
could use that.index.js
file simply does a request to an API. Can search on much bigger dbs, using any search engine (elasticsearch, a custom OCaml-specific search engine, ...).odoc
could also depend on a blessed search engine (sherlodoc
), to allow for a more "out-of-the-box" search support.Sherlodoc
Sherlodoc has already been modified by @EmileTrotignon to use the functionality added by this PR to generate the sherlodoc search index.
It can be compiled to javascript using jsoo, to have client-side search on single libraries, or be used server-side for bigger indexes.
Compared to the current web version (doc.sherlocode.com), the modified version also search inside doc comments, and is not restricted to values (it can search module, module types, constructors, ...).
The corresponding branch is jsoo-compat.
Pinging also @art-w !
Data to build the search index
The data to build the search index can be read in the JSON file generated by the
compile-index
command. However, the code is made so that it is also easy to useodoc
as a library, and use values of typeOdoc_search.Entry.t
(that you get by combiningOdoc_model.Fold
withOdoc_search.Entry.of_{value/type/...}
).Each have its advantages and use-case:
Entry.t
format encodes slightly more information, and is more practical to use by OCaml programs.The
Entry.t
type is defined inOdoc_search.Entry.t
. The JSON format is defined inOdoc_search.Json_search
. An example of a JSON entry would be:It contains:
extra
field containing:src/search/json_output.ml
display
field, corresponding to the JSON value to return in case of inclusion of this entry in the search results:url
to the entry (relative to the base of the compilation unit, which is known in the page executing the search)html
field, containing how the html to use to display the entry in the search result.I am not sure the intermediate representation of a search entry,
Odoc_search.Entry.t
, is a good design. I would be curious about your opinion!Running as a webworker
Running as a webworker is nice to isolate the search engine, and prevent it to freeze the UI at any point.
The browsers interpretation of the CORS origin policy prevents to run webworkers from javascript files fetched from the
file://
protocol. I used a hack to go around this restriction.