Nihotip is a web application for exploring Japanese text through an interactive interface. With a React frontend and a Python backend, it tokenizes Japanese text and shows detailed information about words, symbols, and their properties in tooltips, at several levels of granularity.
URL of the published version: https://nihotip.netlify.app
- Japanese Text Tokenization: Input Japanese text and have it automatically tokenized into words and symbols.
- Detailed Word and Symbol Insights: Hover over words or symbols to access detailed tooltips that explain the structure, readings, and associated properties of each token.
- Level-based Token Breakdown: Nihotip organizes tokenized text into multiple hierarchical levels for easy navigation (the properties shown for each kind of token are listed in parentheses):
- text
- not a Japanese word
- punctuation
- space
- line break
- string of non-Japanese characters
- Japanese word (part of speech)
- part by reading
- one or multiple kanji (kana reading -> part by reading)
- digraph
- big kana without tenten
- big kana with tenten
- small kana (respective big kana)
- kana without tenten (romaji, association)
- kana with tenten (respective kana without tenten)
- part by reading
- not a Japanese word
- text
- part by reading: Parts are obtained by splitting the reading of a word, which makes it possible to determine the kana reading of each kanji. A part spans multiple characters when the reading of a kanji cannot be separated from the characters around it. For example, the part "大人" of the word "大人買い" uses the special reading "おとな", which cannot be divided between the two kanji, so the reading "おとな" applies to the whole part.
- syllable:
  - single kana
  - digraph
  - kana with "っ", "ッ" or "ー"
  - single kanji
- Tooltip Insights: Show how readings map to individual characters and provide additional details like romaji and kana associations.
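To make the "part by reading" idea and the kana properties above more concrete, here is a minimal Python sketch. It is not Nihotip's actual implementation: the function names `split_by_reading`, `big_kana`, and `without_tenten` are hypothetical, and the sketch assumes the reading is given in hiragana and that any kana inside the word is written exactly as it appears in the reading. It aligns the kana in a word with its reading and assigns whatever is left over to the kanji runs; it has no dictionary, so it never splits adjacent kanji.

```python
import re
import unicodedata

# Assumption: "kanji" means the basic CJK ideograph block plus the iteration mark 々.
KANJI = r"\u4e00-\u9fff\u3005"


def split_by_reading(word, reading):
    """Return (surface, reading) parts by aligning the kana in word with reading."""
    # Split the surface form into alternating runs of kanji and non-kanji.
    runs = re.findall(rf"[{KANJI}]+|[^{KANJI}]+", word)
    # Kanji runs may match any kana; kana runs must appear literally in the reading.
    pattern = "".join(
        "(.+?)" if re.match(rf"[{KANJI}]", run) else f"({re.escape(run)})"
        for run in runs
    )
    match = re.fullmatch(pattern, reading)
    if match is None:
        # The reading cannot be cut along these boundaries: keep a single part.
        return [(word, reading)]
    return list(zip(runs, match.groups()))


# Small kana and their respective big kana (hiragana, then katakana).
SMALL_TO_BIG = str.maketrans(
    "ぁぃぅぇぉっゃゅょゎァィゥェォッャュョヮ",
    "あいうえおつやゆよわアイウエオツヤユヨワ",
)


def big_kana(kana):
    """Map a small kana such as ゃ to its big counterpart や."""
    return kana.translate(SMALL_TO_BIG)


def without_tenten(kana):
    """Strip the tenten (combining dakuten, U+3099) from a kana, e.g. が -> か."""
    decomposed = unicodedata.normalize("NFD", kana)
    return unicodedata.normalize("NFC", decomposed.replace("\u3099", ""))


print(split_by_reading("食べ物", "たべもの"))
# [('食', 'た'), ('べ', 'べ'), ('物', 'もの')]
print(split_by_reading("大人買い", "おとながい"))
# [('大人買', 'おとなが'), ('い', 'い')]
```

Because the sketch cannot separate adjacent kanji, it returns the coarser part "大人買" where Nihotip reports the part "大人" with the reading "おとな"; finding that finer boundary would require per-kanji reading data, which is outside the scope of this example.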
To run the application locally, follow these steps:
- Clone the repository and navigate into the project directory.
- Set up environment variables: create a .env file in each of the following directories.
  - client/.env:
      REACT_APP_BACKEND_URL=http://localhost:3001
  - server/.env:
      PORT=3001
      HOST=localhost
      FRONTEND_URL=http://localhost:3000
- Open two terminal windows and run the following commands, one set per terminal:
    # Start the frontend (React)
    cd client
    npm install
    npm start

    # Start the backend (Python)
    cd server
    pip install -r requirements.txt
    python main.py
- Open your browser and visit http://localhost:3000 to start interacting with Nihotip.
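If the frontend cannot reach the backend, a missing or misspelled variable in one of the two .env files is a likely cause. The following optional helper is a sketch and not part of the repository: the names `parse_env`, `missing_keys`, and `REQUIRED` are hypothetical, and it assumes the simple KEY=VALUE format shown in the steps above. Run from the project root, it reports which expected variables are absent.

```python
from pathlib import Path

# Variables each .env file is expected to define, per the setup steps above.
REQUIRED = {
    "client/.env": {"REACT_APP_BACKEND_URL"},
    "server/.env": {"PORT", "HOST", "FRONTEND_URL"},
}


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so URLs in values stay intact.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def missing_keys(path, text):
    """Return the required variables that the given .env content lacks."""
    return REQUIRED[path] - set(parse_env(text))


if __name__ == "__main__":
    for path in REQUIRED:
        env_file = Path(path)
        # A file that does not exist is treated as empty.
        text = env_file.read_text(encoding="utf-8") if env_file.exists() else ""
        missing = missing_keys(path, text)
        print(path, "OK" if not missing else f"missing: {sorted(missing)}")
```

The check only verifies that the variables are present; it does not validate that the ports or URLs are reachable.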
- Multilingual Tooltips: Add the option to choose the language for tooltips to enhance accessibility for non-Japanese speakers.
- Word Normalization: Implement word normalization for more accurate tokenization results.
- Notes for Ambiguous Words: Provide detailed notes for words that belong to multiple parts of speech or have different interpretations based on context.
We welcome contributions! If you'd like to contribute to Nihotip, feel free to submit issues or pull requests on the GitHub repository.