Use offline? #33

Open
cassidyjames opened this issue Jul 12, 2022 · 4 comments
Labels
enhancement New feature or request

Comments

@cassidyjames

Quick Lookup currently requires an Internet connection to look words up on Wiktionary. However, it would greatly expand the utility of Quick Lookup if it were usable offline. For example, an offline dictionary app would be great on Endless OS for schools with limited or no Internet connectivity.

@johnfactotum
Owner

As stated in the readme, I don't plan on adding support for other dictionaries (online or offline). The app is deliberately kept as simple as possible (currently a single script with ~600 LOC) for a very simple and narrow use case.

That said, I guess it would make sense to add some support for offline dictd and StarDict dictionaries in the same way Foliate does now. After all, Quick Lookup is basically a spin-off of Foliate's dictionary feature.

@johnfactotum johnfactotum added the enhancement New feature or request label Jul 13, 2022
@da2x

da2x commented Aug 11, 2022

Wiktionary can be used offline. Let’s see what would be required.

  1. Select a language.
  2. Ask the user whether lookups should be made online or offline. Warn that offline mode requires a large download (~1 GiB) and a large installation size (~7 GiB).
  3. Fetch, uncompress, and store https://dumps.wikimedia.org/${language_code}wiktionary/latest/${language_code}wiktionary-latest-pages-articles.xml.bz2
  4. The dictionary file is just one giant XML file. For performance reasons, it would need to be preprocessed into something more useful. I believe that importing it into an SQLite database is probably the easiest option. That process could be time-consuming, but it’s a one-time operation. At the end we’d have an indexed, searchable, and memory-efficient way to interact with the dictionary. The user should probably be prompted to update their dictionary once every 6 months. The user could still use the application in online mode while offline mode is prepared asynchronously.
  5. Query the SQLite database by title instead of the online search API.

It’s quite a bit of work, but doable.
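
Steps 3 to 5 could look roughly like the sketch below. It is only an illustration under a few assumptions: the table name `pages`, the helper names `import_dump` and `lookup`, and the tiny inline sample dump are all made up for this example, and the stored text is the raw wikitext from the dump, not a rendered definition. Reading through `bz2.open` streams the compressed file directly, so the dump would not necessarily have to be uncompressed on disk first.

```python
import bz2
import io
import sqlite3
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace: '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def import_dump(xml_stream, db):
    """Stream <page> elements from a MediaWiki XML export into SQLite."""
    # A TEXT PRIMARY KEY gives us an index on title for fast lookups.
    db.execute(
        "CREATE TABLE IF NOT EXISTS pages (title TEXT PRIMARY KEY, wikitext TEXT)"
    )
    count = 0
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = None
        text = ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        if title is not None:
            db.execute("INSERT OR REPLACE INTO pages VALUES (?, ?)", (title, text))
            count += 1
        elem.clear()  # drop the page's children and text once it is stored
    db.commit()
    return count


def lookup(db, title):
    """Return the raw wikitext for an exact title, or None if absent."""
    row = db.execute(
        "SELECT wikitext FROM pages WHERE title = ?", (title,)
    ).fetchone()
    return row[0] if row else None


# Hypothetical two-entry sample standing in for the real multi-GiB dump.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>cat</title><revision><text>==English==
# A small feline.</text></revision></page>
  <page><title>dog</title><revision><text>==English==
# A canine.</text></revision></page>
</mediawiki>"""

db = sqlite3.connect(":memory:")
compressed = io.BytesIO(bz2.compress(SAMPLE))
imported = import_dump(bz2.open(compressed), db)
print(imported, lookup(db, "cat"))
```

For a real dump, the `compressed` object would be replaced by the path of the downloaded `.xml.bz2` file and `":memory:"` by a database file on disk. Batching the inserts in larger transactions and rendering the wikitext are left out here.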

@johnfactotum
Owner

Apart from the size, one major problem is that the data is in wikitext, so working with it is very different from using the definition API. It would require a wikitext parser and converter. It seems there are also Enterprise HTML Dumps, but those would still require manually parsing the HTML, so they're not a direct replacement for the API. Whether it's wikitext or HTML, it would take a long time just to parse the dump.

I think the better approach would be to support StarDict and dictd, like Foliate does. Then use something like https://github.com/BoboTiG/ebook-reader-dict to pre-generate selected mono- or bilingual dictionaries in those formats. Their en-en dictionary, for example, is only ~30 MB. That would be far better than downloading the whole dump.

@BoboTiG

BoboTiG commented Feb 3, 2023

A note about https://github.com/BoboTiG/ebook-reader-dict: we have been publishing StarDict and DictFile dictionaries for a few months now. They are generated every day alongside the Kobo DictHTML one. Have a look at the English dictionaries, for example: https://github.com/BoboTiG/ebook-reader-dict/blob/master/docs/en/README.md.

We also offer etymology-free versions, which are smaller in size.


4 participants