Merge branch 'main' into add-depth-estimation
xenova committed Nov 19, 2023
2 parents 0c5e279 + 19daf2d commit 8903ede
Showing 8 changed files with 90 additions and 33 deletions.
7 changes: 5 additions & 2 deletions README.md
@@ -15,10 +15,13 @@
<img alt="NPM" src="https://img.shields.io/npm/v/@xenova/transformers">
</a>
<a href="https://www.npmjs.com/package/@xenova/transformers">
-<img alt="Downloads" src="https://img.shields.io/npm/dw/@xenova/transformers">
+<img alt="NPM Downloads" src="https://img.shields.io/npm/dw/@xenova/transformers">
</a>
+<a href="https://www.jsdelivr.com/package/npm/@xenova/transformers">
+<img alt="jsDelivr Hits" src="https://img.shields.io/jsdelivr/npm/hw/@xenova/transformers">
+</a>
<a href="https://github.com/xenova/transformers.js/blob/main/LICENSE">
-<img alt="License" src="https://img.shields.io/github/license/xenova/transformers.js">
+<img alt="License" src="https://img.shields.io/github/license/xenova/transformers.js?color=blue">
</a>
<a href="https://huggingface.co/docs/transformers.js/index">
<img alt="Documentation" src="https://img.shields.io/website/http/huggingface.co/docs/transformers.js/index.svg?down_color=red&down_message=offline&up_message=online">
7 changes: 5 additions & 2 deletions docs/scripts/build_readme.py
@@ -17,10 +17,13 @@
<img alt="NPM" src="https://img.shields.io/npm/v/@xenova/transformers">
</a>
<a href="https://www.npmjs.com/package/@xenova/transformers">
-<img alt="Downloads" src="https://img.shields.io/npm/dw/@xenova/transformers">
+<img alt="NPM Downloads" src="https://img.shields.io/npm/dw/@xenova/transformers">
</a>
+<a href="https://www.jsdelivr.com/package/npm/@xenova/transformers">
+<img alt="jsDelivr Hits" src="https://img.shields.io/jsdelivr/npm/hw/@xenova/transformers">
+</a>
<a href="https://github.com/xenova/transformers.js/blob/main/LICENSE">
-<img alt="License" src="https://img.shields.io/github/license/xenova/transformers.js">
+<img alt="License" src="https://img.shields.io/github/license/xenova/transformers.js?color=blue">
</a>
<a href="https://huggingface.co/docs/transformers.js/index">
<img alt="Documentation" src="https://img.shields.io/website/http/huggingface.co/docs/transformers.js/index.svg?down_color=red&down_message=offline&up_message=online">
6 changes: 3 additions & 3 deletions examples/next-client/package-lock.json

6 changes: 3 additions & 3 deletions examples/next-server/package-lock.json

6 changes: 3 additions & 3 deletions examples/semantic-image-search-client/package-lock.json

6 changes: 3 additions & 3 deletions examples/semantic-image-search/package-lock.json

70 changes: 57 additions & 13 deletions package-lock.json

15 changes: 11 additions & 4 deletions src/tokenizers.js
@@ -270,6 +270,7 @@ class WordPieceTokenizer extends TokenizerModel {
* @param {Object} config.vocab A mapping of tokens to ids.
* @param {string} config.unk_token The unknown token string.
* @param {string} config.continuing_subword_prefix The prefix to use for continuing subwords.
+* @param {number} [config.max_input_chars_per_word=100] The maximum number of characters per word.
*/
constructor(config) {
super(config);
@@ -291,6 +292,12 @@ class WordPieceTokenizer extends TokenizerModel {
*/
this.unk_token = config.unk_token;

+/**
+* The maximum number of characters allowed per word.
+* @type {number}
+*/
+this.max_input_chars_per_word = config.max_input_chars_per_word ?? 100;
+
/**
* An array of tokens.
* @type {string[]}
@@ -310,10 +317,10 @@ class WordPieceTokenizer extends TokenizerModel {
let outputTokens = [];
for (let token of tokens) {
let chars = [...token];
-// TODO add
-// if len(chars) > self.max_input_chars_per_word:
-// output_tokens.append(self.unk_token)
-// continue
+if (chars.length > this.max_input_chars_per_word) {
+outputTokens.push(this.unk_token);
+continue;
+}

let isUnknown = false;
let start = 0;
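For context on the `src/tokenizers.js` change, here is a minimal standalone sketch of the greedy longest-match-first WordPiece loop with the new length guard in place. It assumes a plain-object vocabulary and a BERT-style `##` continuing-subword prefix when none is configured. The class name `MiniWordPiece` is hypothetical: this is an illustration, not the transformers.js `WordPieceTokenizer` class itself.

```javascript
// Hypothetical standalone class illustrating the WordPiece loop;
// not the transformers.js WordPieceTokenizer.
class MiniWordPiece {
  constructor(config) {
    // Mapping of token strings to ids.
    this.vocab = new Map(Object.entries(config.vocab));
    this.unk_token = config.unk_token;
    // Assumption: fall back to the BERT-style '##' prefix if none is given.
    this.continuing_subword_prefix = config.continuing_subword_prefix ?? '##';
    // `??` keeps an explicit 0 (`||` would silently replace it with 100).
    this.max_input_chars_per_word = config.max_input_chars_per_word ?? 100;
  }

  encode(tokens) {
    const outputTokens = [];
    for (const token of tokens) {
      // Spread into code points so a surrogate pair counts as one character.
      const chars = [...token];
      // Over-long words map straight to the unknown token.
      if (chars.length > this.max_input_chars_per_word) {
        outputTokens.push(this.unk_token);
        continue;
      }

      let isUnknown = false;
      let start = 0;
      const subTokens = [];
      while (start < chars.length) {
        // Greedy longest match: shrink `end` until the piece is in the vocab.
        let end = chars.length;
        let currentSubstring = null;
        while (start < end) {
          let substr = chars.slice(start, end).join('');
          if (start > 0) substr = this.continuing_subword_prefix + substr;
          if (this.vocab.has(substr)) {
            currentSubstring = substr;
            break;
          }
          end--;
        }
        // No prefix of the remainder is in the vocab: the whole word is unknown.
        if (currentSubstring === null) {
          isUnknown = true;
          break;
        }
        subTokens.push(currentSubstring);
        start = end;
      }
      if (isUnknown) {
        outputTokens.push(this.unk_token);
      } else {
        outputTokens.push(...subTokens);
      }
    }
    return outputTokens;
  }
}

// Usage with a toy vocabulary.
const vocab = { '[UNK]': 0, un: 1, '##aff': 2, '##able': 3 };
const strict = new MiniWordPiece({ vocab, unk_token: '[UNK]', max_input_chars_per_word: 8 });
console.log(strict.encode(['unaffable', 'un'])); // → [ '[UNK]', 'un' ]
const lenient = new MiniWordPiece({ vocab, unk_token: '[UNK]' });
console.log(lenient.encode(['unaffable'])); // → [ 'un', '##aff', '##able' ]
```

The guard runs before any vocabulary lookup, so a word longer than the limit is emitted as the unknown token without entering the nested substring search.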
