Commit 05f8987 (1 parent: 3a09ee7). Showing 3 changed files with 99 additions and 42 deletions.
@@ -0,0 +1,37 @@
<span style="float:right;"><a href="https://github.com/RubixML/RubixML/blob/master/src/Transformers/BM25Transformer.php">[source]</a></span>

# BM25 Transformer
BM25 is a term weighting scheme that accounts for term frequency (TF) saturation and document length.

> **Note:** This transformer assumes that its input is made up of word frequency vectors such as those produced by [Word Count Vectorizer](word-count-vectorizer.md).

**Interfaces:** [Transformer](api.md#transformer), [Stateful](api.md#stateful), [Elastic](api.md#elastic)

**Data Type Compatibility:** Continuous only

## Parameters
| # | Param | Default | Type | Description |
|---|---|---|---|---|
| 1 | alpha | 1.2 | float | The term frequency (TF) normalization factor. |
| 2 | beta | 0.75 | float | The importance of document length in normalizing term frequency. |

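
As a rough illustration of how the two parameters interact, the sketch below computes the textbook BM25 weight for a single term. It assumes `alpha` and `beta` correspond to the k1 and b parameters described by Robertson et al. and uses one common smoothed IDF variant. The function `bm25Weight` and its argument names are hypothetical and not part of the library, whose implementation may differ in detail.

```php
<?php

/**
 * Textbook BM25 weight for one term in one document.
 * Illustrative helper only, not the library's internal code.
 */
function bm25Weight(
    float $tf,            // count of the term in this document
    int $df,              // number of documents containing the term
    int $numDocs,         // total number of documents
    float $docLength,     // number of tokens in this document
    float $avgDocLength,  // average number of tokens per document
    float $alpha = 1.2,
    float $beta = 0.75
) : float {
    // Smoothed inverse document frequency; the added 1 keeps it positive.
    $idf = log(1.0 + ($numDocs - $df + 0.5) / ($df + 0.5));

    // Length normalization: beta = 0 ignores length, beta = 1 applies it fully.
    $norm = 1.0 - $beta + $beta * ($docLength / $avgDocLength);

    // Saturating term frequency: the ratio approaches (alpha + 1) as tf grows.
    return $idf * ($tf * ($alpha + 1.0)) / ($tf + $alpha * $norm);
}
```

With the defaults, raising the term count from 1 to 10 in an average-length document roughly doubles the weight rather than multiplying it by 10, which is the saturation effect. A longer-than-average document receives a smaller weight for the same count whenever `beta` is greater than 0.
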
## Example
```php
use Rubix\ML\Transformers\BM25Transformer;

$transformer = new BM25Transformer(1.2, 0.75);
```

## Additional Methods
Return the document frequencies calculated during fitting:
```php
public dfs() : ?array
```

Return the average number of tokens per document:
```php
public averageDocumentLength() : ?float
```

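
A minimal end-to-end sketch follows, assuming Rubix ML is installed through Composer and that `Unlabeled` and `WordCountVectorizer` are available as in the library's other docs. The sample sentences are made up for illustration. The text is vectorized first, since the transformer expects word frequency vectors, and then the transformer is fitted so that both statistics are available.

```php
<?php

require 'vendor/autoload.php'; // assumes a Composer install of Rubix ML

use Rubix\ML\Datasets\Unlabeled;
use Rubix\ML\Transformers\BM25Transformer;
use Rubix\ML\Transformers\WordCountVectorizer;

// Three tiny documents, one text feature each (illustrative data).
$dataset = new Unlabeled([
    ['the quick brown fox'],
    ['the lazy dog sleeps all day'],
    ['a quick dog outruns a lazy fox'],
]);

// Turn raw text into word count vectors before weighting.
$dataset->apply(new WordCountVectorizer(1000));

$transformer = new BM25Transformer(1.2, 0.75);

// Fit explicitly so the document statistics are computed, then transform.
$transformer->fit($dataset);
$dataset->apply($transformer);

// Both getters return null until the transformer has been fitted.
var_dump($transformer->averageDocumentLength());
print_r($transformer->dfs());
```
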
### References
>- S. Robertson et al. (2009). The Probabilistic Relevance Framework: BM25 and Beyond.