THE PROGRAMMING LANGUAGE

Scala has multiple assets that make it a good language for this small project: immutability and surety, functional paradigm, lambdas, parallel programming features, object oriented programming, portability, etc.

COMPILING AND RUNNING

Prerequisites

To compile and run the project, the best way is to use sbt, the Scala build tool. The project is written in Scala 2.13.1 and was tested it with sbt 1.2.8 but any of the greater should do the work.

You can download sbt from Scala website

Installing sbt is the best way to install Scala as pointed out by the Scala download page

If you have not installed sbt before, the installation will probably take some time in order to deploy all the dependencies on your system.

Compiling, packaging, running

To check if the project compile from the command line, go to the root of project with the command line and enter:

$sbt compile

To run the project direclty from sbt in the root folder, enter:

$sbt run

To produce a jar from sbt in the root folder and run the tests, enter:

$sbt assembly

Then, navigate to the project folder and execute the jar with Scala:

$ cd target/scala-X-XX
$ scala search-assembly-X-X.jar path/of/your/folder/of/txt/files

If everything has gone well you should see the following prompt:

Working dir: /path/to/the/project/target/scala-2.13
Please enter a command -->>
      | load: ## Load all resources from the resources folder, no parameters.
      | index: name path ## New document from file.
      | search: List, of, strings ## Exact match. None alphanumeric chars are separator.
      | fuzzy: List, of, strings ## Fuzzy search. Based on Levenshtein distance.
      | quit: ## Quit the app.
      |

If you choose this method to run the program, then you will have to set your own folder. The default folder of the app is for test purpose and work only when executed from sbt at the root folder of the project (or from an IDE).

To set and load the default resources folder then execute from the target/scala.X.XX folder

$ cd target/scala.X.XX
$ scala search-assembly-X-X.jar ../../resources
$
$ Log...

USAGE

The CLI only understand and execute 5 commands:

load : Will load the files from the folder passed as first parameter to the application. This is normally done by default at startup time, but it is possible to re-index
index : index a new document placed at a specific path.
search: Search a list of words with an exact match.
fuzzy : Proceed a fuzzy search. This may take a long time if one of the word in the given list is complex. For small words, it should be relatively usable.
quit : Quit the program.

A help prompt is given when a command fails.

GLOBAL DESIGN

The global design is quite simple. The entry point is in the class com.search.Search, then CLI is executed and can be found at com.search.api.CLI

The CLI interact with the engine described at com.search.engine.Engine which interact with model.

The model can be found at com.search.model and contains three classes:

SearchTree: a generic binary search tree
Token: generic wrapper that hold the data manipulated by the tree
StringItem: a specialization of the Token class dedicated to String. It has been designed to use Java String.

A very simple sequence diagram of a search would look like this:

CLI --> Engine --> SearchTree --> Token

Normally, only the engine is able to interact with the SearchTree and Token objects. The CLI would be the view in a MVC, the engine would be the controller, and the classes in the model package, the model.

CODE READING

The code is documented and contains some notes for further improvements. The override annotated methods are generally not commented.

Tests are given in order to provide clues on how the API of each class is intended to be used.

PERSONAL NOTES:

Implementing a good search engine is not a trivial thing, and I will not pretend that this program is well-designed. It will surely not be suitable for production purposes and is largely perfectible.

Some clues are (for an in memory storage):

Reimplement the search tree to a red / black tree,
Change the approach of proceeding a search (scalability, security, etc.),
Add reference object to a document for each indexed token,
Refactor the engine to proceed search on a global index,
Implement a command pattern for the API package,
Implement a sharding algorithm for distributed search,
And much more...

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
resources		resources
src		src
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

THE PROGRAMMING LANGUAGE

COMPILING AND RUNNING

Prerequisites

Compiling, packaging, running

USAGE

GLOBAL DESIGN

CODE READING

PERSONAL NOTES:

About

Releases

Packages

Languages

alexandre-cremieux/smallsearch

Folders and files

Latest commit

History

Repository files navigation

THE PROGRAMMING LANGUAGE

COMPILING AND RUNNING

Prerequisites

Compiling, packaging, running

USAGE

GLOBAL DESIGN

CODE READING

PERSONAL NOTES:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages