docs: add schema.md
johnhooks committed Dec 15, 2023
1 parent 56cd74e commit 43940e2
Showing 2 changed files with 92 additions and 0 deletions.
14 changes: 14 additions & 0 deletions docs/architecture.md
@@ -0,0 +1,14 @@
# Architecture

- Tokenization / Lexer
- HTML tokenization
- Stemming
- Index
- Repository
- Cron
- Ranking
- Query
- Parser / Lexer
- Repository
- Stop Words
- Config
78 changes: 78 additions & 0 deletions docs/schema.md
@@ -0,0 +1,78 @@
# Schema

The database structure used to index WordPress documents for search.

## Document

Has many words through the `document_word` table.

Has many meta words through the `document_meta` table.

Has many hits through the `query_hit` table.

The biggest documents to index will be posts.

There will need to be a way to indicate the document type.

|id|index_id|
|--|--------|

## Index

Used to group indexed documents and track queries.

Has many documents.

|id|name|description|type|
|--|----|-----------|----|

## DocumentWord

Maps a word to a position in a document. A word's importance within the document could be ranked from its meta words during indexing, provided we know how to rank the words at that time.

|id|doc_id|word_id|position|
|--|------|-------|--------|
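
A minimal sketch of how rows in `document_word` could be written during indexing. It uses Python's `sqlite3` only to stay self-contained; the column types, the `UNIQUE` constraint on `word.value`, the whitespace tokenizer, and the `index_document` helper are all assumptions, not part of the schema above.

```python
import sqlite3


def index_document(conn, doc_id, text):
    """Store each token of `text` with its position in document_word."""
    for position, token in enumerate(text.lower().split()):
        # Reuse the existing word row if this value was seen before.
        conn.execute("INSERT OR IGNORE INTO word (value) VALUES (?)", (token,))
        (word_id,) = conn.execute(
            "SELECT id FROM word WHERE value = ?", (token,)
        ).fetchone()
        conn.execute(
            "INSERT INTO document_word (doc_id, word_id, position) VALUES (?, ?, ?)",
            (doc_id, word_id, position),
        )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE word (id INTEGER PRIMARY KEY, value TEXT NOT NULL UNIQUE);
    CREATE TABLE document_word (
        id INTEGER PRIMARY KEY,
        doc_id INTEGER NOT NULL,
        word_id INTEGER NOT NULL REFERENCES word(id),
        position INTEGER NOT NULL
    );
    """
)

index_document(conn, 1, "Hello world hello")

# Read the document back in word order.
rows = conn.execute(
    "SELECT w.value, dw.position FROM document_word dw "
    "JOIN word w ON w.id = dw.word_id ORDER BY dw.position"
).fetchall()
```

Storing one row per occurrence keeps positions available for phrase matching, at the cost of a larger table than a per-document word count would need.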

## DocumentMeta

Meta word table. `meta_id` is an enum of meta word types.

|id|doc_id|meta_id|position|
|--|------|-------|--------|
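
As an illustration of `meta_id` as an enum, the sketch below uses a Python `IntEnum`. The member names (`TITLE`, `EXCERPT`, `TAG`) are hypothetical, since the actual meta word types are not listed here, and `positions_for` is an illustrative helper.

```python
from enum import IntEnum


class MetaType(IntEnum):
    """Hypothetical meta word types; the real set is not defined in this document."""

    TITLE = 1
    EXCERPT = 2
    TAG = 3


# document_meta rows as (doc_id, meta_id, position); values are illustrative.
rows = [
    (1, MetaType.TITLE, 0),
    (1, MetaType.TAG, 0),
    (1, MetaType.TITLE, 1),
]


def positions_for(rows, doc_id, meta_type):
    """Return the positions recorded for one meta type in one document."""
    return [pos for d, m, pos in rows if d == doc_id and m == meta_type]


title_positions = positions_for(rows, 1, MetaType.TITLE)
```

Because `IntEnum` members compare equal to plain integers, the stored `meta_id` values can be used directly in comparisons after being read back from the database.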

## Word

Has many documents through the `document_word` table.

|id|value|
|--|-----|

## Gram

Unique segments of words.

Has many words through the `gram_word` table.

|id|value|length|
|--|-----|------|

## GramWord

Maps a partial word (gram) to a full word.

|id|word_id|gram_id|hit_count|
|--|-------|-------|---------|
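
One possible reading of "segments" is contiguous character n-grams. The sketch below assumes that reading, with gram lengths of 2 to 3 chosen arbitrarily; `grams` and `build_gram_index` are hypothetical helpers, and the resulting dictionary stands in for the `gram` and `gram_word` tables.

```python
def grams(word, min_len=2, max_len=3):
    """Return the unique contiguous substrings of `word`, sorted."""
    found = set()
    for n in range(min_len, max_len + 1):
        for start in range(len(word) - n + 1):
            found.add(word[start:start + n])
    return sorted(found)


def build_gram_index(words, min_len=2, max_len=3):
    """Map each gram to the set of full words containing it (gram_word rows)."""
    index = {}
    for word in words:
        for gram in grams(word, min_len, max_len):
            index.setdefault(gram, set()).add(word)
    return index


index = build_gram_index(["post", "posts", "page"])

# A partial query such as "pos" resolves to every full word that contains it.
matches = index.get("pos", set())
```

If only prefix matching is wanted, restricting `start` to `0` would shrink the table considerably; which variant is intended is not specified here.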

## Query

User input queries.

|id|value|
|--|-----|

## QueryHit

Matches successful queries to specific words contained in a document, independent of location within the document. The concept is to have a permanent link between query -> word -> document, used for ranking based on previous results.

|id|query_id|word_id|document_id|count|
|--|--------|-------|-----------|-----|
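
A rough sketch of how the query -> word -> document link could be recorded and then used for ranking. It again uses `sqlite3` for self-containment; `record_hit`, `rank_documents`, and the "sum of counts" ranking rule are assumptions rather than a settled design.

```python
import sqlite3


def record_hit(conn, query_id, word_id, document_id):
    """Increment the count for an existing link, or create it with count 1."""
    cur = conn.execute(
        "UPDATE query_hit SET count = count + 1 "
        "WHERE query_id = ? AND word_id = ? AND document_id = ?",
        (query_id, word_id, document_id),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO query_hit (query_id, word_id, document_id, count) "
            "VALUES (?, ?, ?, 1)",
            (query_id, word_id, document_id),
        )


def rank_documents(conn, query_id):
    """Order documents by total previous hits for this query (assumed rule)."""
    return conn.execute(
        "SELECT document_id, SUM(count) AS hits FROM query_hit "
        "WHERE query_id = ? GROUP BY document_id "
        "ORDER BY hits DESC, document_id",
        (query_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE query_hit ("
    "id INTEGER PRIMARY KEY, query_id INTEGER NOT NULL, "
    "word_id INTEGER NOT NULL, document_id INTEGER NOT NULL, "
    "count INTEGER NOT NULL)"
)

record_hit(conn, 1, 10, 100)
record_hit(conn, 1, 10, 100)  # same link again: count becomes 2
record_hit(conn, 1, 11, 200)

ranking = rank_documents(conn, 1)
```

Note that this table names its column `document_id` while `document_word` and `document_meta` use `doc_id`; the sketch follows the table as written.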
