feat: dictionary as language module (#1185)
* feat: SpellChecker Dictionary as a language module

Signed-off-by: Hiroshi Miura <[email protected]>

* feat: morfologik speller: use dictionary from language module without copying it to the file system

Signed-off-by: Hiroshi Miura <[email protected]>

* feat: update ISpellCheckerDictionary to support legacy hunspell checker

- Add API installHunspellDictionary(Path dictionaryDir)
- Support it on AR, DA and FR
- Provide bundled dictionary on FR

Signed-off-by: Hiroshi Miura <[email protected]>

* feat: hunspell checker use language-module's dictionary

Signed-off-by: Hiroshi Miura <[email protected]>

* feat: add abstract impl classes for spell dictionary

Signed-off-by: Hiroshi Miura <[email protected]>

* feat: implement spell checker dictionaries

Signed-off-by: Hiroshi Miura <[email protected]>

* fix: test case expectation for language-module-km spell dict

Signed-off-by: Hiroshi Miura <[email protected]>

* fix: dutch spell dictionary class

Signed-off-by: Hiroshi Miura <[email protected]>

* fix: typo of dutch spell dictionary class path

Signed-off-by: Hiroshi Miura <[email protected]>

* fix: typo of russian spell dictionary language

Signed-off-by: Hiroshi Miura <[email protected]>

* fix: Abstract*Dictionary to detect supported language

- Accept a given language code (e.g., "sk") when it is contained in a supported code (e.g., "sk_SK")

Signed-off-by: Hiroshi Miura <[email protected]>

* Fix: update supported short language code

Signed-off-by: Hiroshi Miura <[email protected]>

* Fix: slovenian LT dependency and language code

Signed-off-by: Hiroshi Miura <[email protected]>

* Fix: Swedish module LT language dependency

Signed-off-by: Hiroshi Miura <[email protected]>

* fix: sv: test with full sv_SE language specifier

Signed-off-by: Hiroshi Miura <[email protected]>

* fix: Tagalog and Tamil spell dictionary

Signed-off-by: Hiroshi Miura <[email protected]>

* style: copyright header of tagalog module

Signed-off-by: Hiroshi Miura <[email protected]>

* fix: improve test for hunspell and de modules

Signed-off-by: Hiroshi Miura <[email protected]>

* docs: introduce developer manual to create spell-check dictionary plugin

Signed-off-by: Hiroshi Miura <[email protected]>

* docs: update user manual

- explain the folder for user CUSTOM spelling dictionary
- explain the language module will install the spelling dictionary when necessary.
- update explanation of the spelling preference

Signed-off-by: Hiroshi Miura <[email protected]>

* feat: update preferences view of SpellChecker

- Remove URL box and install/uninstall buttons
- Remove DictionaryInstallerDialog
- clean bundles

Signed-off-by: Hiroshi Miura <[email protected]>

* feat: list spelling dictionary from language modules

- Extend SpellCheckerManager to return supported languages
- Update DictionaryManager#getLocalDictionaryCodeList to return languages from language modules

Signed-off-by: Hiroshi Miura <[email protected]>

* Update OmegaT_Preferences.xml

@Kazephil could you check my modifications please?
@miurahr I don’t think we should refer to the developer manual here. We have not done so for other parts of OmegaT. I don’t oppose that, of course, but we need to think about how to do that.

* Rewording of the paragraph on spelling dictionaries

- I tried to reword the paragraph to flow more smoothly. Let me know if anything seems off.

- I agree with Jean-Christophe about the developer manual reference, so I simply deleted it here. We will have to give some thought to how and where we can best make that information available.

* docs: update developer manual

- fix section levels
- update overview section

Signed-off-by: Hiroshi Miura <[email protected]>

---------

Signed-off-by: Hiroshi Miura <[email protected]>
Co-authored-by: Jean-Christophe Helary <[email protected]>
Co-authored-by: kazephil <[email protected]>
3 people authored Dec 5, 2024
1 parent 6523a16 commit 4175085
Showing 164 changed files with 1,261,739 additions and 1,064 deletions.
3 changes: 2 additions & 1 deletion build.gradle
@@ -373,7 +373,6 @@ dependencies {
testImplementation(libs.assertj)
testImplementation(libs.bundles.xmlunit)

testImplementation(project(":language-modules"))
testImplementation(libs.languagetool.server) {
exclude module: "logback-classic"
}
@@ -1592,6 +1591,8 @@ test {
systemProperty 'java.awt.headless', 'true'
}
systemProperty 'java.util.logging.config.file', "${rootDir}/config/test/logger.properties"
// some test cases depend on modules from subprojects
dependsOn subprojects.collect { it.tasks.named('jar') }
}

tasks.register('testIntegration', JavaExec) {
2 changes: 1 addition & 1 deletion doc_src/en/App_ConfigurationFolder.xml
@@ -238,7 +238,7 @@
<varlistentry id="configuration.folder.extra.contents.spelling">
<term id="configuration.folder.extra.contents.spelling.title">spelling/</term>
<listitem>
<para>This folder contains your spelling dictionaries. See the <link
<para>This folder contains your custom spelling dictionaries. See the <link
linkend="dialog.preferences.spellchecker"
endterm="dialog.preferences.spellchecker.title"/> preferences for
details.</para>
5 changes: 3 additions & 2 deletions doc_src/en/Menus_Tools.xml
@@ -45,8 +45,9 @@

<listitem>
<para><guilabel>Spelling Issues</guilabel> (optional): detects
spelling mistakes. Only works if a spelling dictionary is
installed. See the <link linkend="dialog.preferences.spellchecker"
spelling mistakes. It works when a language module supports the target language,
or when a custom dictionary for the target language is installed locally.
See the <link linkend="dialog.preferences.spellchecker"
endterm="dialog.preferences.spellchecker.title"/> preferences for
details.</para>
</listitem>
6 changes: 6 additions & 0 deletions doc_src/en/OmegaT_Preferences.xml
@@ -681,6 +681,12 @@
dictionaries. It is generally in the <link
linkend="configuration.folder" endterm="configuration.folder.title"/>
folder.</para>
<para>If the OmegaT language module for your project target language provides
a Hunspell dictionary, OmegaT automatically installs it in this folder when you
enable spell checking. If the module provides a Morfologik dictionary,
spell checking works without installing a dictionary in this folder.</para>
<para>Put a dictionary in this folder if you want to use a custom spelling dictionary,
or if no OmegaT language module covers your project target language.</para>
</listitem>
</varlistentry>

239 changes: 239 additions & 0 deletions docs_devel/docs/52.HowToSpellCheckDictionary.md
@@ -0,0 +1,239 @@
# How to publish a spell check dictionary as a plugin

## Overview

OmegaT lets translators check their translations with spell-check dictionaries.
Developers can extend this feature by creating custom spell-check dictionary plugins.
These plugins must implement the `ISpellCheckerDictionary` interface, which defines the methods needed
to integrate with OmegaT. OmegaT also provides two abstract base classes for plugins:
`AbstractHunspellDictionary` and `AbstractMorfologikDictionary`.

This document explains how to create a plugin, describes the purpose of the interface methods and the abstract
methods, and gives practical tips for implementation.

## `ISpellCheckerDictionary` Interface

The `ISpellCheckerDictionary` interface defines the methods for integrating custom spell-check dictionaries.
Implementing this interface allows your plugin to support different dictionary types, such as Hunspell and Morfologik.

```java
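import java.io.Closeable;
import java.nio.file.Path;

public interface ISpellCheckerDictionary extends Closeable {
    /**
     * Get the Hunspell dictionary.
     *
     * @param language the target language code.
     * @return the dictionary if the language module provides one; otherwise, null.
     */
    default org.apache.lucene.analysis.hunspell.Dictionary getHunspellDictionary(String language) {
        return null;
    }

    /**
     * Get the Morfologik dictionary.
     *
     * @param language the target language code.
     * @return the dictionary if the language module provides one; otherwise, null.
     */
    default morfologik.stemming.Dictionary getMorfologikDictionary(String language) {
        return null;
    }

    /**
     * Install the Hunspell dictionary files into the given directory.
     *
     * @param dictionaryDir the directory to install the dictionary into.
     * @param language the target language code.
     * @return the path to the installed dictionary, or null if installation is not supported.
     */
    default Path installHunspellDictionary(Path dictionaryDir, String language) {
        return null;
    }

    /**
     * Get the dictionary type.
     *
     * @return the type of dictionary, or null if the module provides none.
     */
    SpellCheckDictionaryType getDictionaryType();
}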
```
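Because every getter defaults to `null`, calling code has to know which one to ask. The following is a minimal, self-contained sketch of one way a caller might do that: read `getDictionaryType()` first, then call only the matching getter. It is not OmegaT's actual caller. `DictType`, `SpellDictionary`, and `lookup` are hypothetical stand-ins for `SpellCheckDictionaryType` and `ISpellCheckerDictionary`, and `Object` replaces the Lucene and Morfologik dictionary classes so the example compiles without those libraries.

```java
import java.util.Optional;

public class DictionaryDispatchSketch {

    // Hypothetical stand-ins: DictType mirrors SpellCheckDictionaryType and
    // SpellDictionary mirrors ISpellCheckerDictionary. Object replaces the
    // Lucene and Morfologik dictionary classes to keep the sketch self-contained.
    enum DictType { HUNSPELL, MORFOLOGIK }

    interface SpellDictionary {
        default Object getHunspellDictionary(String language) {
            return null;
        }

        default Object getMorfologikDictionary(String language) {
            return null;
        }

        DictType getDictionaryType();
    }

    /** Asks the module for its type, then calls only the matching getter. */
    static Optional<Object> lookup(SpellDictionary module, String language) {
        DictType type = module.getDictionaryType();
        if (type == null) {
            return Optional.empty(); // the module provides no dictionary at all
        }
        switch (type) {
            case HUNSPELL:
                return Optional.ofNullable(module.getHunspellDictionary(language));
            case MORFOLOGIK:
                return Optional.ofNullable(module.getMorfologikDictionary(language));
            default:
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        SpellDictionary catalanOnly = new SpellDictionary() {
            @Override
            public Object getHunspellDictionary(String language) {
                return "ca".equals(language) ? "hunspell:ca" : null;
            }

            @Override
            public DictType getDictionaryType() {
                return DictType.HUNSPELL;
            }
        };
        System.out.println(lookup(catalanOnly, "ca")); // Optional[hunspell:ca]
        System.out.println(lookup(catalanOnly, "fr")); // Optional.empty
    }
}
```

Checking the type before the getter means a module only has to override the one getter it supports; the other keeps its `null` default.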

### Method Descriptions

1. **`getHunspellDictionary(String language)`**

- **Purpose:** Provides access to a Hunspell dictionary for the specified language.
- **Return Value:**
- A `Dictionary` object if the language module supports Hunspell.
- `null` if Hunspell is not supported.

**Example Usage:**
```java
@Override
public org.apache.lucene.analysis.hunspell.Dictionary getHunspellDictionary(String language) {
    // Load and return the Hunspell dictionary for the specified language,
    // or return null if this plugin does not provide one.
    return null; // placeholder
}
```

2. **`getMorfologikDictionary(String language)`**

- **Purpose:** Provides access to a Morfologik dictionary for the specified language.
- **Return Value:**
- A `Dictionary` object if the language module supports Morfologik.
- `null` if Morfologik is not supported.

**Example Usage:**
```java
@Override
public morfologik.stemming.Dictionary getMorfologikDictionary(String language) {
    // Load and return the Morfologik dictionary for the specified language,
    // or return null if this plugin does not provide one.
    return null; // placeholder
}
```

3. **`installHunspellDictionary(Path dictionaryDir, String language)`**

- **Purpose:** Installs a Hunspell dictionary for the specified language in a given directory.
- **Parameters:**
- `dictionaryDir`: The directory where the dictionary will be installed.
- `language`: The language code for the dictionary.
- **Return Value:**
- The path to the installed dictionary.
- `null` if installation is not supported.

**Example Usage:**
```java
@Override
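public Path installHunspellDictionary(Path dictionaryDir, String language) {
    // Download or copy the Hunspell dictionary files into dictionaryDir,
    // then return the installed path (or null if installation is not supported).
    return null; // placeholder
}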
```

4. **`getDictionaryType()`**

- **Purpose:** Specifies the type of dictionary supported by the plugin.
- **Return Value:**
- A `SpellCheckDictionaryType` enum value, e.g., `HUNSPELL`, `MORFOLOGIK`.
- `null` if no dictionary type is provided.

**Example Usage:**
```java
@Override
public SpellCheckDictionaryType getDictionaryType() {
return SpellCheckDictionaryType.HUNSPELL;
}
```
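For `installHunspellDictionary`, here is a minimal sketch of the copy step, assuming the plugin ships a `<language>.aff` and a `<language>.dic` file: copy both into `dictionaryDir` and return the installed path. The `loader` parameter is a hypothetical stand-in for the plugin's resource lookup, and returning the `.dic` path is an assumption; check what `AbstractHunspellDictionary` actually returns before relying on it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.Function;

public final class HunspellInstallSketch {

    /**
     * Copies the .aff and .dic files named after {@code language} into dictionaryDir.
     *
     * @param loader hypothetical resource lookup; returns null for a missing file
     * @return the path of the copied .dic file, or null if either file is missing
     */
    static Path install(Path dictionaryDir, String language,
                        Function<String, InputStream> loader) throws IOException {
        String affName = language + ".aff";
        String dicName = language + ".dic";
        // Null resources are allowed here; try-with-resources skips closing them.
        try (InputStream aff = loader.apply(affName);
             InputStream dic = loader.apply(dicName)) {
            if (aff == null || dic == null) {
                return null; // no complete dictionary for this language
            }
            Files.createDirectories(dictionaryDir);
            Files.copy(aff, dictionaryDir.resolve(affName), StandardCopyOption.REPLACE_EXISTING);
            Path dicPath = dictionaryDir.resolve(dicName);
            Files.copy(dic, dicPath, StandardCopyOption.REPLACE_EXISTING);
            return dicPath;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("spelling");
        // Serve dummy file contents from memory instead of from the classpath.
        Function<String, InputStream> loader = name ->
                new ByteArrayInputStream(name.getBytes(StandardCharsets.UTF_8));
        System.out.println(install(dir, "ca", loader)); // prints a path ending in ca.dic
    }
}
```

Opening both streams before copying anything avoids leaving a lone `.aff` file behind when the `.dic` is missing. Passing the loader in as a function also keeps the copy logic testable without packaging resources into a JAR; in a real plugin it would typically wrap `getClass().getResourceAsStream(...)`.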

### Example Implementation

```java
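import java.io.IOException;
import java.io.InputStream;
import java.text.ParseException;

import org.apache.lucene.analysis.hunspell.Dictionary;
import org.apache.lucene.store.ByteBuffersDirectory;

public class MyHunspellDictionaryPlugin implements ISpellCheckerDictionary {

    @Override
    public Dictionary getHunspellDictionary(String language) {
        // Resource names are illustrative; adjust them to where your plugin stores its files.
        try (InputStream affix = getClass().getResourceAsStream("/" + language + ".aff");
             InputStream dic = getClass().getResourceAsStream("/" + language + ".dic")) {
            if (affix == null || dic == null) {
                return null; // this plugin has no dictionary for the language
            }
            // Lucene 8 constructor: temporary directory, temp-file prefix, .aff stream, .dic stream.
            return new Dictionary(new ByteBuffersDirectory(), "hunspell", affix, dic);
        } catch (IOException | ParseException e) {
            return null;
        }
    }

    @Override
    public SpellCheckDictionaryType getDictionaryType() {
        return SpellCheckDictionaryType.HUNSPELL;
    }

    @Override
    public void close() {
        // Clean up resources if needed.
    }
}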
```

## Creating a Hunspell Spell-Check Dictionary Plugin

OmegaT provides an abstract class, `AbstractHunspellDictionary`, to simplify the process of implementing
a Hunspell-based spell-check dictionary. Developers can use this class to create plugins that support specific
languages by implementing a minimal set of methods.

This section explains how to use `AbstractHunspellDictionary`, with method descriptions,
implementation steps, and a complete example for a Catalan Hunspell dictionary.

### Abstract Class: `AbstractHunspellDictionary`

The `AbstractHunspellDictionary` class implements the `ISpellCheckerDictionary` interface and
adds utilities for managing Hunspell dictionaries. Subclass it and implement its abstract methods
to provide language-specific dictionary support.

### Key Features of `AbstractHunspellDictionary`

1. **Dictionary Management**
- Locates and loads Hunspell `.aff` and `.dic` files.
- Provides access to the Hunspell dictionary for a given language.

2. **Helper Methods**
- **`protected abstract String[] getDictionaries()`**
- Returns the list of supported language codes for the dictionary.
- **`protected String getDictionary(String language)`**
- Finds the appropriate dictionary for a given language.
- **`protected abstract InputStream getResourceAsStream(String resource)`**
- Retrieves the dictionary resource stream.

3. **Predefined Implementation of `ISpellCheckerDictionary` Methods**
- **`getHunspellDictionary(String language)`**
- Loads the Hunspell dictionary for the specified language.
- **`installHunspellDictionary(Path dictionaryDir, String language)`**
- Installs the Hunspell dictionary files in a specified directory.
- **`getDictionaryType()`**
- Returns `SpellCheckDictionaryType.HUNSPELL`.
- **`close()`**
- Closes any open streams to release resources.
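The commit that introduced these classes says a bare code such as `sk` should be accepted when a supported code such as `sk_SK` contains it. Below is a minimal sketch of that matching rule for a `getDictionary(String language)`-style helper, assuming an exact (case-insensitive) match is tried first. `DictionaryMatchSketch` and `findDictionary` are hypothetical names; this is not the abstract class's actual code.

```java
import java.util.Locale;

public final class DictionaryMatchSketch {

    /**
     * Picks the supported code that serves the requested language, or null.
     * An exact (case-insensitive) match wins; otherwise a bare code such as
     * "sk" is accepted when a supported code such as "sk_SK" starts with it.
     */
    static String findDictionary(String[] supported, String language) {
        for (String code : supported) {
            if (code.equalsIgnoreCase(language)) {
                return code;
            }
        }
        String prefix = language.toLowerCase(Locale.ROOT) + "_";
        for (String code : supported) {
            if (code.toLowerCase(Locale.ROOT).startsWith(prefix)) {
                return code; // first regional variant in array order
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] supported = {"sk_SK", "de_DE", "de_AT"};
        System.out.println(findDictionary(supported, "sk"));    // sk_SK
        System.out.println(findDictionary(supported, "de_AT")); // de_AT
        System.out.println(findDictionary(supported, "fr"));    // null
    }
}
```

When several regional variants share a language (for example `de_DE` and `de_AT`), this sketch returns whichever comes first in the array, so list the preferred variant first in `getDictionaries()`.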


### Implementation Steps

1. **Subclass `AbstractHunspellDictionary`**
- Create a new class extending `AbstractHunspellDictionary`.

2. **Implement Required Methods**
- Define the supported language codes in `getDictionaries()`.
- Provide logic to retrieve resource streams for dictionary files in `getResourceAsStream(String resource)`.

3. **Package the Implementation**
- Include your dictionary files (`.aff` and `.dic`) in the project resources directory.
- Package the implementation class as a plugin (e.g., a JAR file).


## Example: Catalan Hunspell Dictionary

Below is a complete implementation of a Catalan Hunspell dictionary plugin using `AbstractHunspellDictionary`.

### Dictionary Files

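Place the following files in the `resources` directory. If the resource names passed to `getResourceAsStream`
have no leading `/`, `Class#getResourceAsStream` resolves them relative to the package of the class, so in that
case put the files under the same package path as `CatalanHunspellDictionary`: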
- `ca.aff`
- `ca.dic`

### Implementation

```java
public class CatalanHunspellDictionary extends AbstractHunspellDictionary {

// Supported language codes
private static final String[] HUNSPELL = { "ca" };

/**
* Provides the list of supported languages.
* @return an array of language codes.
*/
@Override
protected String[] getDictionaries() {
return HUNSPELL;
}

/**
* Retrieves the resource stream for a given dictionary file.
* @param resource the resource file name.
* @return an InputStream for the resource.
*/
@Override
protected InputStream getResourceAsStream(final String resource) {
return getClass().getResourceAsStream(resource);
}
}
```

## Conclusion

The `AbstractHunspellDictionary` class reduces the work needed to implement a Hunspell dictionary:
by following the steps and the example above, developers can quickly create plugins for specific languages.
More generally, implementing the `ISpellCheckerDictionary` interface lets developers extend OmegaT
with additional spell-checking languages or dictionary types.
7 changes: 6 additions & 1 deletion docs_devel/docs/index.md
@@ -55,6 +55,11 @@

* [How to write an OmegaT script](51.HowToWriteScript.md)

## Spellchecker dictionary

* [How to publish a spell checker dictionary as a plugin](52.HowToSpellCheckDictionary.md)


## Details of features

* [Editor pane Key assigns](80.EditorKeys.md)
@@ -67,4 +72,4 @@
* [Release procedure](90.ReleaseProcedure.md)
* [Code Signing How-to](92.CodeSigning.md)
* [Building installer](93.BuildingInstallerPackage.md)
* [Appendix](91.appendix.md)
* [Appendix](91.appendix.md)
10 changes: 9 additions & 1 deletion language-modules/ar/build.gradle
@@ -21,11 +21,19 @@ dependencies {
implementation(libs.languagetool.ar) {
exclude module: 'languagetool-core'
}
compileOnly(libs.lucene.analyzers.common)
}
testImplementation(libs.junit4)
testImplementation(libs.assertj)
testImplementation(testFixtures(project.rootProject))
testImplementation(libs.commons.io)
testImplementation(libs.languagetool.core)
testImplementation(project(":spellchecker:hunspell"))
testRuntimeOnly(libs.commons.io)
}

test {
dependsOn jar
dependsOn project(":spellchecker:hunspell").tasks.jar
}

jar {
@@ -0,0 +1,47 @@
/*
* OmegaT - Computer Assisted Translation (CAT) tool
* with fuzzy matching, translation memory, keyword search,
* glossaries, and translation leveraging into updated projects.
*
* Copyright (C) 2023-2024 Hiroshi Miura
* Home page: https://www.omegat.org/
* Support center: https://omegat.org/support
*
* This file is part of OmegaT.
*
* OmegaT is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* OmegaT is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <https://www.gnu.org/licenses/>.
*/
package org.omegat.languages.ar;

import java.io.InputStream;

import org.languagetool.JLanguageTool;

import org.omegat.core.spellchecker.AbstractHunspellDictionary;

public class ArabicHunspellDictionary extends AbstractHunspellDictionary {

private static final String DICTIONARY_BASE = "/org/languagetool/resource/ar/hunspell/";
private static final String[] LANG = {"ar"};

@Override
protected String[] getDictionaries() {
return LANG;
}

@Override
protected InputStream getResourceAsStream(final String resource) {
return JLanguageTool.getDataBroker().getAsStream(DICTIONARY_BASE + resource);
}
}