AutoText is a small autocomplete and autocorrect program written in Java 8. I created this project to use and understand data structures that were new to me.
A trie is an efficient data structure for prefix searching, so it is widely used in autocomplete programs.
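To make the prefix-search idea concrete, here is a minimal sketch of a trie with a capped autocomplete lookup. It is an illustration only, not the code in this repo: the class name `TrieSketch`, its methods, and the choice of a `TreeMap` for children (so that suggestions come back in alphabetical order) are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical minimal trie; not the AutoText implementation. */
final class TrieSketch {
    private static final class Node {
        // TreeMap keeps children sorted, so suggestions come out alphabetically.
        final Map<Character, Node> children = new TreeMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    /** Adds a word by walking or creating one node per character. */
    public void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    /** Returns up to max words that start with the given prefix. */
    public List<String> autocomplete(String prefix, int max) {
        List<String> out = new ArrayList<>();
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return out; // no stored word has this prefix
            }
        }
        collect(node, new StringBuilder(prefix), max, out);
        return out;
    }

    // Depth-first walk below the prefix node, stopping once max words are found.
    private void collect(Node node, StringBuilder current, int max, List<String> out) {
        if (out.size() >= max) {
            return;
        }
        if (node.isWord) {
            out.add(current.toString());
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            current.append(e.getKey());
            collect(e.getValue(), current, max, out);
            current.setLength(current.length() - 1); // backtrack
        }
    }
}
```

The lookup cost depends on the length of the prefix plus the size of the subtree that is actually visited, not on the size of the whole lexicon, which is why the structure suits autocomplete.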
A BK-tree is a metric tree designed for efficient approximate string matching, which makes it a natural fit for autocorrect. In my implementation, I used the Damerau–Levenshtein distance to calculate the edit distance between strings.
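The following is a rough sketch of how a BK-tree and an edit distance fit together. It is not the code in this repo. It assumes the unrestricted Damerau–Levenshtein distance (adjacent transpositions allowed, unit costs), which satisfies the triangle inequality that BK-tree pruning relies on. The names `BKTreeSketch`, `distance`, `add`, and `search` are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal BK-tree; not the AutoText implementation. */
final class BKTreeSketch {
    private static final class Node {
        final String word;
        final Map<Integer, Node> children = new HashMap<>(); // keyed by distance to this word
        Node(String word) { this.word = word; }
    }

    private Node root;

    /** Unrestricted Damerau-Levenshtein distance; the table is shifted by one row and column. */
    static int distance(String a, String b) {
        int n = a.length(), m = b.length(), inf = n + m;
        int[][] d = new int[n + 2][m + 2];
        Map<Character, Integer> lastRow = new HashMap<>(); // last row of a where each char occurred
        d[0][0] = inf;
        for (int i = 0; i <= n; i++) { d[i + 1][0] = inf; d[i + 1][1] = i; }
        for (int j = 0; j <= m; j++) { d[0][j + 1] = inf; d[1][j + 1] = j; }
        for (int i = 1; i <= n; i++) {
            int lastMatchCol = 0; // last column in this row where the characters matched
            for (int j = 1; j <= m; j++) {
                int k = lastRow.getOrDefault(b.charAt(j - 1), 0);
                int l = lastMatchCol;
                int cost = 1;
                if (a.charAt(i - 1) == b.charAt(j - 1)) { cost = 0; lastMatchCol = j; }
                int best = Math.min(d[i][j] + cost,                  // substitution or match
                        Math.min(d[i + 1][j] + 1, d[i][j + 1] + 1)); // insertion, deletion
                best = Math.min(best, d[k][l] + (i - k - 1) + 1 + (j - l - 1)); // transposition
                d[i + 1][j + 1] = best;
            }
            lastRow.put(a.charAt(i - 1), i);
        }
        return d[n + 1][m + 1];
    }

    /** Inserts a word by following the edge labelled with its distance to each node. */
    void add(String word) {
        if (root == null) { root = new Node(word); return; }
        Node node = root;
        while (true) {
            int dist = distance(word, node.word);
            if (dist == 0) return; // already stored
            Node child = node.children.get(dist);
            if (child == null) { node.children.put(dist, new Node(word)); return; }
            node = child;
        }
    }

    /** Returns every stored word within the given distance of the query. */
    List<String> search(String query, int tolerance) {
        List<String> matches = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        if (root != null) stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            int dist = distance(query, node.word);
            if (dist <= tolerance) matches.add(node.word);
            // Triangle inequality: only edges in [dist - tolerance, dist + tolerance] can hold matches.
            for (Map.Entry<Integer, Node> e : node.children.entrySet()) {
                int edge = e.getKey();
                if (edge >= dist - tolerance && edge <= dist + tolerance) stack.push(e.getValue());
            }
        }
        return matches;
    }
}
```

The pruning step is what saves work: subtrees whose edge label falls outside the tolerance window are never visited, so a query usually touches only part of the lexicon.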
Additionally, I created a simple GUI using Java Swing to go along with the program. I used a list from @first20hours (20k.txt) to create the lexicons for the tree structures (I have not uploaded that file into this repo).
Please see the Javadoc comments for a more detailed description of usage.
Here is an example of basic usage:
import main.java.kashiish.autotext.AutoText;
import java.util.ArrayList;

public class Example {
    public static void main(String[] args) {
        // Path to your word list (for example, 20k.txt); adjust as needed.
        String lexiconFileName = "20k.txt";

        // Create a new AutoText instance with a file of words to build a Trie, BKTree,
        // and lexicon for word validation.
        AutoText autotext = new AutoText(lexiconFileName);
        /*
         * Or you can use a different dictionary for each structure:
         * AutoText autotext = new AutoText(lexiconFileName, trieFileName, bktreeFileName);
         */
        ArrayList<String> corrections = autotext.autocorrect("lovly");

        // Set the maximum number of autocomplete suggestions.
        autotext.setMaxSuggestions(3);
        ArrayList<String> suggestions = autotext.autocomplete("ques");

        System.out.println(corrections);
        System.out.println(suggestions);
    }
}
[lovely]
[quest, question, questions]
Please feel free to report or fix any bugs you may find in the program. It's greatly appreciated!
Current issues:
- The NPath complexity of the method that calculates the edit distance between strings is very high.
MIT