epic: mvp index #3

Closed
wants to merge 12 commits into from
docs: add schema.md
johnhooks committed Dec 15, 2023
commit 43940e23c0323a42d17873920025b64f9e9f2723
14 changes: 14 additions & 0 deletions docs/architecture.md
@@ -0,0 +1,14 @@
# Architecture

- Tokenization / Lexer
- HTML tokenization
- Stemming
- Index
- Repository
- Cron
- Ranking
- Query
- Parser / Lexer
- Repository
- Stop Words
- Config
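
The tokenization and stop-word stages listed above could be sketched roughly as follows. This is only an illustration: `tokenizeHtml()` and its stop-word list are hypothetical names, stemming is left out, and nothing here is taken from the PR's actual code.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical pipeline step: HTML in, filtered lowercase tokens out.
 *
 * @param list<string> $stopWords
 * @return list<string>
 */
function tokenizeHtml(string $html, array $stopWords): array
{
    // Add a space before each tag so adjacent elements do not fuse into one word.
    $text = strip_tags(str_replace('<', ' <', $html));
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // ASCII-only lowercasing for brevity, then split on anything that is not a letter or digit.
    $tokens = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];

    // array_values re-indexes the list after the stop words are filtered out.
    return array_values(array_filter(
        $tokens,
        static fn (string $token): bool => !in_array($token, $stopWords, true)
    ));
}

$tokens = tokenizeHtml('<h1>The Index</h1><p>Searching the &amp; indexing posts</p>', ['the', 'and']);
```

A stemming step would normally run on the returned tokens before they are written to the index.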
78 changes: 78 additions & 0 deletions docs/schema.md
@@ -0,0 +1,78 @@
# Schema

The database structure used to index WordPress documents for search.

## Document

Has many words through the `document_word` table.

Has many meta words through the `document_meta` table.

Has many hits through the `query_hit` table.

Posts will be the largest documents to index.

There will need to be a way to indicate the document type.

|id|index_id|
|--|--------|

## Index

Used to group indexed documents and track queries.

Has many documents.

|id|name|description|type|
|--|----|-----------|----|

## DocumentWord

A word's importance within the document, based on meta words, could be ranked during indexing, provided we know how to rank the words at that time.

|id|doc_id|word_id|position|
|--|------|-------|--------|
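
As a rough illustration of what rows in this table might look like, the sketch below turns a piece of text into `doc_id` / `word_id` / `position` rows. The function name `buildDocumentWordRows()` is hypothetical, word ids are assigned in memory rather than read from the `word` table, and positions are assumed to be zero-based token offsets.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turn plain text into rows shaped like `document_word`.
 * Word ids are assigned in memory here; a real index would look them up
 * in the `word` table.
 *
 * @param array<string, int> $wordIds Map of word value => word id, updated in place.
 * @return list<array{doc_id: int, word_id: int, position: int}>
 */
function buildDocumentWordRows(int $docId, string $text, array &$wordIds): array
{
    // ASCII-only lowercasing for brevity, then split on anything that is not a letter or digit.
    $tokens = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];

    $rows = [];
    foreach ($tokens as $position => $token) {
        // Reuse the id of a known word, otherwise assign the next free one.
        $wordIds[$token] ??= count($wordIds) + 1;
        $rows[] = ['doc_id' => $docId, 'word_id' => $wordIds[$token], 'position' => $position];
    }

    return $rows;
}

$wordIds = [];
$rows = buildDocumentWordRows(1, 'Search the search index', $wordIds);
```

Both occurrences of "search" share one `word_id` but get separate rows, which is what keeps the position information available for later ranking.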

## DocumentMeta

Meta word table. `meta_id` is an enum of meta word types.

|id|doc_id|meta_id|position|
|--|------|-------|--------|

## Word

Has many documents through the `document_word` table.

|id|value|
|--|-----|

## Gram

Unique segments of words.

Has many words through the `gram_word` table.

|id|value|length|
|--|-----|------|

## GramWord

Maps a partial word (gram) to a full word.

|id|word_id|gram_id|hit_count|
|--|-------|-------|---------|
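
The gram-to-word mapping could work along these lines. The schema does not say how grams are cut, so this sketch assumes plain character n-grams of length 2 to 3. `gramsForWord()` and `buildGramMap()` are hypothetical names, and the `hit_count` column is not modelled.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: list the unique character n-grams of a word.
 *
 * @return list<string>
 */
function gramsForWord(string $word, int $minLength = 2, int $maxLength = 3): array
{
    $grams = [];
    $wordLength = strlen($word); // byte length; assumes ASCII words for brevity
    for ($length = $minLength; $length <= $maxLength; $length++) {
        for ($start = 0; $start + $length <= $wordLength; $start++) {
            // Keyed by gram so duplicates collapse.
            $grams[substr($word, $start, $length)] = true;
        }
    }

    // PHP turns numeric string keys into ints, so cast back to strings.
    return array_map('strval', array_keys($grams));
}

/**
 * In-memory stand-in for `gram` and `gram_word`: gram value => words containing it.
 *
 * @param list<string> $words
 * @return array<string, list<string>>
 */
function buildGramMap(array $words): array
{
    $map = [];
    foreach ($words as $word) {
        foreach (gramsForWord($word) as $gram) {
            $map[$gram][] = $word;
        }
    }

    return $map;
}

$map = buildGramMap(['search', 'sea']);
```

Looking up a partial input such as `se` in `$map` then yields every full word that contains it.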

## Query

User input queries.

|id|value|
|--|-----|

## QueryHit

Matches successful queries to specific words contained in a document, independent of their location within the document. The concept is to have a permanent link between query -> word -> document, used for ranking based on previous results.

|id|query_id|word_id|document_id|count|
|--|--------|-------|-----------|-----|
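
One way the `count` column might feed ranking is sketched below with an in-memory stand-in for the table. `QueryHitCounter` and its methods are hypothetical, and summing counts per document is only one assumed ranking rule.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical in-memory stand-in for the `query_hit` table.
 */
final class QueryHitCounter
{
    /** @var array<string, int> Counts keyed by "queryId:wordId:documentId". */
    private array $counts = [];

    /** Record one hit for the query -> word -> document link. */
    public function record(int $queryId, int $wordId, int $documentId): void
    {
        $key = "{$queryId}:{$wordId}:{$documentId}";
        $this->counts[$key] = ($this->counts[$key] ?? 0) + 1;
    }

    /**
     * Total hits per document for one query, highest first.
     *
     * @return array<int, int> document id => hit count
     */
    public function documentsForQuery(int $queryId): array
    {
        $totals = [];
        foreach ($this->counts as $key => $count) {
            // The word id (second part of the key) is not needed for the per-document total.
            [$storedQueryId, , $documentId] = array_map('intval', explode(':', $key));
            if ($storedQueryId === $queryId) {
                $totals[$documentId] = ($totals[$documentId] ?? 0) + $count;
            }
        }
        arsort($totals); // sort by count, descending, keeping document ids as keys

        return $totals;
    }
}

$hits = new QueryHitCounter();
$hits->record(1, 10, 100);
$hits->record(1, 10, 200);
$hits->record(1, 11, 200);
$ranked = $hits->documentsForQuery(1);
```

In a database-backed version the same idea would likely become an upsert on the (`query_id`, `word_id`, `document_id`) triple.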

Unchanged files with check annotations Beta

interface EngineContract
{
public function loadConfig(array $config);

Check failure on line 7 in src/Contracts/EngineContract.php

GitHub Actions / Check Coding Standards

Method WpBlocks\Search\Contracts\EngineContract::loadConfig() has no return type specified.

Check failure on line 7 in src/Contracts/EngineContract.php

GitHub Actions / Check Coding Standards

Method WpBlocks\Search\Contracts\EngineContract::loadConfig() has parameter $config with no value type specified in iterable type array.
public function createIndex(string $indexName);

Check failure on line 9 in src/Contracts/EngineContract.php

GitHub Actions / Check Coding Standards

Method WpBlocks\Search\Contracts\EngineContract::createIndex() has no return type specified.
public function updateInfoTable(string $key, string $value);

Check failure on line 11 in src/Contracts/EngineContract.php

GitHub Actions / Check Coding Standards

Method WpBlocks\Search\Contracts\EngineContract::updateInfoTable() has no return type specified.
public function getValueFromInfoTable(string $value);

Check failure on line 13 in src/Contracts/EngineContract.php

GitHub Actions / Check Coding Standards

Method WpBlocks\Search\Contracts\EngineContract::getValueFromInfoTable() has no return type specified.
public function run();

Check failure on line 15 in src/Contracts/EngineContract.php

GitHub Actions / Check Coding Standards

Method WpBlocks\Search\Contracts\EngineContract::run() has no return type specified.
public function processDocument($row);

Check failure on line 17 in src/Contracts/EngineContract.php

GitHub Actions / Check Coding Standards

Method WpBlocks\Search\Contracts\EngineContract::processDocument() has no return type specified.

Check failure on line 17 in src/Contracts/EngineContract.php

GitHub Actions / Check Coding Standards

Method WpBlocks\Search\Contracts\EngineContract::processDocument() has parameter $row with no type specified.
public function saveToIndex($stems, $docId);

Check failure on line 19 in src/Contracts/EngineContract.php

GitHub Actions / Check Coding Standards

Method WpBlocks\Search\Contracts\EngineContract::saveToIndex() has no return type specified.

Check failure on line 19 in src/Contracts/EngineContract.php

GitHub Actions / Check Coding Standards

Method WpBlocks\Search\Contracts\EngineContract::saveToIndex() has parameter $docId with no type specified.
public function selectIndex($indexName);
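
The check failures above all ask for explicit types. A sketch of what a typed contract could look like follows, limited to the first three methods. The interface name is hypothetical, and every return type and the `array<string, mixed>` shape are assumptions, since the PR does not show what these methods return.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical typed variant of the contract. The `void` return types and
 * the `array<string, mixed>` config shape are assumptions, not taken from the PR.
 */
interface TypedEngineContractSketch
{
    /**
     * The docblock gives the static analyser a value type for the array.
     *
     * @param array<string, mixed> $config
     */
    public function loadConfig(array $config): void;

    public function createIndex(string $indexName): void;

    public function updateInfoTable(string $key, string $value): void;
}
```

Native return types address the "no return type specified" messages, while the `@param` annotation addresses "no value type specified in iterable type array".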