
How does it work?

Animenosekai edited this page Jan 2, 2021 · 3 revisions

This page explains what goes on behind the scenes of Erina, for those who don't want to dig too deep into the code (which can be really messy at times).

Erina is split up into different components (called APIs) which all do very specific things.
For example, the ErinaTwitter, ErinaDiscord and ErinaLine APIs manage the different available clients, while ErinaSearch searches for things in the database and on the internet.

It uses different caching methods (managed by the ErinaCaches API) to provide information as fast as possible.
All of the caches are stored as .erina files which can be parsed by the new ErinaParser API.

Image Hashing

According to the English Wikipedia:

A hash function takes an input as a key, which is associated with a datum or record and used to identify it to the data storage and retrieval application.

Hashing algorithms transform an input and output something derived from the characteristics of that input.
For example, hashing is used in cryptography: it would be dangerous to store passwords in a database without modifying them, because as soon as someone gets their hands on the database, all of them would be compromised.
Instead, developers store a representation of each password in the database, produced by a function (an algorithm).
A very simple (and not at all secure) hashing function would be to shift every letter by one position in the alphabet: "password" would then be stored as "qbttxpse". A malicious hacker can't directly use this representation to log in to other sites.
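The toy shift function described above can be sketched in Python as follows. The function names are illustrative only; the point is that undoing the shift is exactly as easy as applying it.

```python
import string

ALPHABET = string.ascii_lowercase


def shift_hash(word: str, shift: int = 1) -> str:
    """Toy 'hash': move every lowercase letter `shift` places forward in the alphabet."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) + shift) % 26] if c in ALPHABET else c
        for c in word
    )


def reverse_shift_hash(stored: str, shift: int = 1) -> str:
    """Undo the toy hash by shifting backwards, which is why it is insecure."""
    return shift_hash(stored, -shift)


stored = shift_hash("password")
print(stored)                      # qbttxpse
print(reverse_shift_hash(stored))  # password
```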

This algorithm isn't secure at all because it can be reversed. Cryptographic hash functions cannot be reversed, which makes them far more secure: practically the only option left to a hacker is brute force, i.e. testing millions of words against the algorithm to see if any of them matches what's in the database.
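To make the brute-force idea concrete, here is a sketch using SHA-256 from Python's standard hashlib module. The choice of hash function, the function names and the word list are assumptions for illustration; real password storage also adds a salt and uses a deliberately slow function, which this sketch leaves out.

```python
import hashlib
from typing import List, Optional


def sha256_hex(word: str) -> str:
    """One-way cryptographic hash: easy to compute, infeasible to invert directly."""
    return hashlib.sha256(word.encode("utf-8")).hexdigest()


def brute_force(stored_hash: str, candidates: List[str]) -> Optional[str]:
    """Hash every candidate word and return the one whose hash matches, if any."""
    for word in candidates:
        if sha256_hex(word) == stored_hash:
            return word
    return None


leaked = sha256_hex("sunshine")  # what an attacker would find in the database
wordlist = ["123456", "qwerty", "dragon", "sunshine", "letmein"]
print(brute_force(leaked, wordlist))  # sunshine
```

The attacker never "decrypts" anything: they can only guess, hash the guess, and compare.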

Another use case for hash functions, as with Erina or Shazam, is media recognition.

The principle is the same: you take an image as input, pass it through an algorithm which does some work, and get a text representation of it as output.

Image: ErinaHash example image. The image comes from Hitoribocchi no Marumaruseikatsu. Description: a girl is holding her school bag and talks with another person to his left, saying "I made a new friend"
Hash Representation: f8f8e0e0f0f0f0e1
Found Image: comparison result, basically the same image as the first one without any subtitles
Similarity: 94.03188121493133%

Erina uses image hashing algorithms to identify and compare images without reading the whole image file each time it needs to be accessed.
Notice the similarity value? It is computed from the Hamming distance between the obtained hashes.
The Hamming distance between two hashes of equal length is the number of positions (here, bits) at which they differ: the smaller it is, the more similar the images.
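A minimal sketch of the Hamming distance between two hexadecimal hashes like the one in the example. Turning it into a percentage with 1 - distance / total_bits is an assumption: this page doesn't document ErinaHash's exact formula, and 94.03188121493133% does not correspond to a whole number of matching bits out of 64, so Erina probably computes it differently. The second hash below is hypothetical.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the bits that differ between two hexadecimal hashes of equal length."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 exactly where the two bit patterns differ.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def similarity(hash_a: str, hash_b: str) -> float:
    """Assumed formula: percentage of bits that are identical."""
    total_bits = len(hash_a) * 4  # each hex digit encodes 4 bits
    return 100 * (1 - hamming_distance(hash_a, hash_b) / total_bits)


a = "f8f8e0e0f0f0f0e1"
b = "f8f8e0e0f0f0f0e0"  # hypothetical second hash: last bit flipped
print(hamming_distance(a, b))  # 1
print(similarity(a, b))        # 98.4375
```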

Image hashing is used to recognise images with ErinaDatabase, but also to cache results from the trace.moe API, the SauceNAO API and IQDB.
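One way such a cache could work, as a hypothetical sketch: results are stored under the image hash, and a new image reuses a cached result when its hash is within a few bits of a stored one. The HashCache class, the 5-bit max_distance threshold and the stored strings are illustrative assumptions, not the actual design of ErinaCaches.

```python
from typing import Dict, Optional


def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Number of differing bits between two equal-length hexadecimal hashes."""
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


class HashCache:
    """Hypothetical cache mapping an image hash to a previously fetched search result."""

    def __init__(self, max_distance: int = 5) -> None:
        self.max_distance = max_distance  # assumed tolerance, in bits
        self._entries: Dict[str, str] = {}

    def store(self, image_hash: str, result: str) -> None:
        self._entries[image_hash] = result

    def lookup(self, image_hash: str) -> Optional[str]:
        """Return the result of the closest cached hash if it is close enough."""
        best_hash, best_distance = None, self.max_distance + 1
        for cached_hash in self._entries:
            distance = hamming_distance(image_hash, cached_hash)
            if distance < best_distance:
                best_hash, best_distance = cached_hash, distance
        return self._entries[best_hash] if best_hash is not None else None


cache = HashCache()
cache.store("f8f8e0e0f0f0f0e1", "trace.moe result for this frame")
print(cache.lookup("f8f8e0e0f0f0f0e0"))  # near-duplicate: cached result is reused
print(cache.lookup("0000000000000000"))  # too different: None, so query the API
```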

ErinaSearch

Because of the infrastructure and technical difficulties that come with such databases, Erina uses ErinaDatabase as a proof of concept, but in reality relies (98% of the time) on other services such as trace.moe and SauceNAO, which store a gigantic amount of anime as more complex hashes. IQDB is also used alongside SauceNAO to recognise fan art.

All hashes are made with the ErinaHash API (the hash computations themselves are handled by the ImageHash library).
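The page doesn't say which of ImageHash's algorithms ErinaHash uses. As an illustration, the sketch below re-implements the idea behind average hashing (what imagehash.average_hash computes) in plain Python, assuming the image has already been converted to grayscale and shrunk to 8x8 pixels (ImageHash does that step with Pillow). With the real library, the equivalent would be along the lines of str(imagehash.average_hash(Image.open(path))). The sample frame is hypothetical.

```python
from typing import List


def average_hash(pixels: List[List[int]]) -> str:
    """Average hash of an already downscaled grayscale image (e.g. 8x8).

    Each pixel brighter than the mean becomes a 1 bit, every other pixel a 0 bit;
    the bits are read row by row and written out as hexadecimal.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = "".join("1" if value > mean else "0" for value in flat)
    width = (len(bits) + 3) // 4  # number of hex digits needed
    return format(int(bits, 2), f"0{width}x")


# Hypothetical 8x8 frame: bright left half, dark right half.
frame = [[200] * 4 + [10] * 4 for _ in range(8)]
print(average_hash(frame))  # f0f0f0f0f0f0f0f0
```

Because only "brighter or darker than average" survives, small changes such as compression artifacts or subtitles flip few bits, which is what makes the Hamming distance a useful similarity measure.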

The .erina file format

.erina is a custom, text-based file format (so you can open it with any text editor) which aims to reduce bugs and parsing errors while keeping file sizes small.

Specifications:

  • Each piece of information goes on its own line (lines are separated by a line break, \n)
  • The first colon followed by a space (": ") on a line separates the key/name of the information (on the left) from its value (on the right)
  • The key of a boolean value can be followed by a question mark ? instead of a colon : (e.g. Licensed? True)
  • The elements of a list are separated by three colons :::
  • When a key holds several pieces of information, they are separated by three vertical bars |||
  • The key of an entry in a list of information is written between two brackets, with no colon after it (e.g. [STUDIO])
  • A docType can be specified at the beginning of the file

Example:

   --- ANILIST CACHE ---   

AniList ID: 7088
Romaji Title: Ichiban Ushiro no Daimaou
English Title: Demon King Daimao
Native Title: いちばんうしろの大魔王
Type: ANIME
Format: TV
Season: SPRING
Year: 2010
Country: JP
Licensed? True
Hentai? False
Genres: Action:::Comedy:::Ecchi:::Fantasy
Alternative Title(s): Ichiban Ushiro no Dai Mao
[STUDIO] True|||8|||Artland|||True
[STUDIO] False|||82|||Marvelous Entertainment|||False
[CHARACTER] MAIN|||31967|||Fujiko Etou|||江藤 不二子
[CHARACTER] MAIN|||31723|||Korone|||ころね
[CHARACTER] MAIN|||31757|||Junko Hattori|||服部 絢子
[STAFF] 2nd Key Animation (eps 1, 7)|||121210|||Mariko Iguchi|||井口真理子
[STAFF] ADR Director|||106631|||Steven Foster|||
[STAFF] Animation Director (ep 6)|||122343|||Tetsuya Wakano|||若野哲也
[streaming link] Episode 1 - Your future occupation is: Demon Lord: http://www.crunchyroll.com/demon-king-daimao/episode-1-your-future-occupation-is-demon-lord-539996
[streaming link] Episode 12 - All Done?: http://www.crunchyroll.com/demon-king-daimao/episode-12-all-done-540018
[external link] Crunchyroll: http://www.crunchyroll.com/demon-king-daimao


Cache Timestamp: 1593970643.762534

.erina files are parsed by the ErinaParser API.
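The rules listed in the specification can be turned into a small reader. This is a hypothetical sketch, not ErinaParser itself: the function name, the output structure (plain strings, lists, booleans, and lists of |||-separated fields for bracketed keys) and the treatment of the "--- ... ---" line as the docType are assumptions.

```python
from typing import Any, Dict


def parse_erina(text: str) -> Dict[str, Any]:
    """Hypothetical minimal reader for the .erina rules (not ErinaParser)."""
    data: Dict[str, Any] = {}
    for raw_line in text.split("\n"):  # one piece of information per line
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("---") and line.endswith("---"):
            # Assumed docType header, e.g. "--- ANILIST CACHE ---"
            data["docType"] = line.strip("- ")
            continue
        if line.startswith("[") and "] " in line:
            # "[KEY] a|||b|||c": bracketed keys have no colon and may repeat
            key, _, value = line[1:].partition("] ")
            data.setdefault(key, []).append(value.split("|||"))
            continue
        colon, question = line.find(": "), line.find("? ")
        if question != -1 and (colon == -1 or question < colon):
            # "Key? True": boolean value
            data[line[:question]] = line[question + 2:] == "True"
        elif colon != -1:
            # "Key: value", where ":::" separates list elements
            value = line[colon + 2:]
            data[line[:colon]] = value.split(":::") if ":::" in value else value
    return data


sample = """   --- ANILIST CACHE ---
AniList ID: 7088
Licensed? True
Genres: Action:::Comedy:::Ecchi:::Fantasy
[STUDIO] True|||8|||Artland|||True
[STUDIO] False|||82|||Marvelous Entertainment|||False"""

parsed = parse_erina(sample)
print(parsed["Genres"])     # ['Action', 'Comedy', 'Ecchi', 'Fantasy']
print(parsed["STUDIO"][1])  # ['False', '82', 'Marvelous Entertainment', 'False']
```

Checking bracketed keys before looking for ": " matters: values such as streaming-link titles can themselves contain colons.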

Clients

Twitter

Erina uses the Twitter Stream API along with the Twitter Mentions API to retrieve tweets asking for an anime source in real-time.

Discord

Erina uses cosine similarity to understand commands even when they contain a typo.

Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space.
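A sketch of how cosine similarity can match a mistyped command. Erina's actual vectorization isn't described here; this example assumes each word is turned into a vector of character-bigram counts, and the command names are hypothetical. The similarity is cos(θ) = (a · b) / (‖a‖ ‖b‖).

```python
import math
from collections import Counter
from typing import Iterable


def bigram_vector(text: str) -> Counter:
    """Represent a word as counts of its character bigrams (an assumed vectorization)."""
    text = text.lower()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """cos(theta) = (a . b) / (|a| * |b|) for two non-zero vectors."""
    dot = sum(a[key] * b[key] for key in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # cosine similarity is undefined for zero vectors
    return dot / (norm_a * norm_b)


def closest_command(user_input: str, commands: Iterable[str]) -> str:
    """Pick the known command whose vector points in the most similar direction."""
    query = bigram_vector(user_input)
    return max(commands, key=lambda cmd: cosine_similarity(query, bigram_vector(cmd)))


COMMANDS = ["search", "anime", "help", "stats"]  # hypothetical command names
print(closest_command("serch", COMMANDS))  # search
```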

LINE

Erina uses the LINE Messaging API (a webhook API).
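With the Messaging API, LINE delivers webhook events as HTTP POST requests and signs each request body with HMAC-SHA256 using the channel secret, sending the Base64-encoded digest in the X-Line-Signature header. A minimal sketch of how a webhook receiver can verify that signature with Python's standard library follows; the function name and the secret are hypothetical, and this page doesn't state how Erina performs the check.

```python
import base64
import hashlib
import hmac


def verify_line_signature(channel_secret: str, body: bytes, signature: str) -> bool:
    """Check the X-Line-Signature header of a LINE webhook request.

    The expected signature is Base64(HMAC-SHA256(channel secret, raw request body)).
    """
    digest = hmac.new(channel_secret.encode("utf-8"), body, hashlib.sha256).digest()
    expected = base64.b64encode(digest).decode("ascii")
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(expected, signature)


secret = "hypothetical-channel-secret"
body = b'{"events": []}'
good = base64.b64encode(hmac.new(secret.encode(), body, hashlib.sha256).digest()).decode()
print(verify_line_signature(secret, body, good))        # True
print(verify_line_signature(secret, body, "tampered"))  # False
```

The signature must be computed over the raw body bytes exactly as received, before any JSON parsing.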
