Large memory usage? #21

Open
Vimiso opened this issue Oct 20, 2024 · 1 comment

Comments

Vimiso commented Oct 20, 2024

Take the given test:

$usage = memory()[1];

$provider = new \Yethee\Tiktoken\EncoderProvider;
$provider->setVocabCache(storage_path('app'));
$encoder = $provider->getForModel('gpt-4o-mini');

dd(memory()[1] - $usage); // 26 MB!

26 MB seems like a lot, no? Especially since the cached vocab file is only 3.6 MB.
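
The `memory()` helper in the test above is not defined in this thread, so the measurement is hard to reproduce as written. A minimal sketch of the same before/after measurement using only PHP's built-in `memory_get_usage()` follows; `measureAllocation()` is a hypothetical helper name, and the 1 MB string merely stands in for the encoder so the snippet runs without the library.

```php
<?php

declare(strict_types=1);

/**
 * Run $fn and report how many bytes of Zend Engine memory remain allocated
 * afterwards. The result is returned as well, so it is still alive at the
 * moment the second reading is taken.
 *
 * @return array{0: mixed, 1: int} [result, bytes still allocated]
 */
function measureAllocation(callable $fn): array
{
    gc_collect_cycles();              // drop collectable garbage before the baseline
    $before = memory_get_usage();     // emalloc'd bytes, not memory reserved from the OS
    $result = $fn();
    $after = memory_get_usage();

    return [$result, $after - $before];
}

// Self-contained demo: a 1 MB string stands in for the encoder.
[$payload, $bytes] = measureAllocation(
    static fn (): string => str_repeat('x', 1_000_000)
);

printf("%.2f MB retained\n", $bytes / 1048576);
```

With the library installed, the closure could instead return `$provider->getForModel('gpt-4o-mini')`, which should reproduce the figure reported above under the same conditions.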

yethee (Owner) commented Nov 10, 2024

The token dictionary accounts for most of the allocated memory. The entire dictionary has to stay in memory so that encoding text into tokens, and decoding tokens back into text, is efficient. It is currently stored in PHP's built-in array type. I don't know of a way to reduce the memory consumed here.
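
To illustrate why an array-backed dictionary can be several times larger than the file it was loaded from, here is a rough sketch that builds a synthetic "token => rank" map and compares the memory it occupies with the raw bytes of its keys. The function name, the `tok<N>` keys and the entry count are illustrative assumptions, not the library's actual data or loading code; the point is only that every entry carries a hash bucket and a separately allocated key string on top of the key's bytes.

```php
<?php

declare(strict_types=1);

/**
 * Build a synthetic "token => rank" map and compare the memory it occupies
 * with the raw byte length of its keys.
 *
 * @return array{entries: int, rawBytes: int, allocatedBytes: int}
 */
function measureDictionaryOverhead(int $entries): array
{
    $rawBytes = 0;
    $map = [];
    $before = memory_get_usage();

    for ($rank = 0; $rank < $entries; $rank++) {
        // Synthetic stand-in for a token's bytes; real vocab entries differ.
        $token = 'tok' . $rank;
        $rawBytes += strlen($token);
        $map[$token] = $rank;
    }

    // $map is still alive here, so its storage is included in the reading.
    $allocatedBytes = memory_get_usage() - $before;

    return [
        'entries' => count($map),
        'rawBytes' => $rawBytes,
        'allocatedBytes' => $allocatedBytes,
    ];
}

$stats = measureDictionaryOverhead(100000);

printf(
    "%d entries: %.2f MB of key bytes, %.2f MB allocated\n",
    $stats['entries'],
    $stats['rawBytes'] / 1048576,
    $stats['allocatedBytes'] / 1048576
);
```

If lookups are needed in both directions (token to rank and rank to token) and each direction is its own array, the per-entry overhead is paid twice. That is an assumption about the general approach rather than a statement about this library's internals.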

Profile
<?php

use Yethee\Tiktoken\EncoderProvider;

require_once 'vendor/autoload.php';

$provider = new EncoderProvider();
$encoder = $provider->get('<encoding>');

Top memory consumer: Vocab::fromStream()

Encoding: cl100k_base

*** SPX Report ***

Global stats:

  Called functions    :       81
  Distinct functions  :       50

  Wall time           :  161.9ms
  ZE memory usage     :   11.8MB

Flat profile:

 Wall time           | ZE memory usage     |
 Inc.     | *Exc.    | Inc.     | Exc.     | Called   | Function
----------+----------+----------+----------+----------+----------
   70.2ms |   59.0ms |  432.2KB |  418.5KB |       12 | {closure}
   42.1ms |   38.2ms |   10.8MB |    8.8MB |        1 | Yethee\Tiktoken\Vocab\Vocab::fromStream
   78.5ms |    5.9ms |  839.8KB |  363.7KB |        1 | ComposerAutoloaderInitac9bfb1d4166aeecccdb5d5dfb6f6537::getLoader
    5.0ms |    5.0ms |     120B |     120B |        1 | Yethee\Tiktoken\Vocab\Loader\DefaultVocabLoader::checkHash
    4.0ms |    4.0ms |    2.0MB |    2.0MB |        1 | Yethee\Tiktoken\Vocab\Vocab::__construct
    2.4ms |    2.4ms |   43.0KB |   43.0KB |        1 | ComposerAutoloaderInitac9bfb1d4166aeecccdb5d5dfb6f6537::loadClassLoader
   29.9us |   29.9us |       0B |       0B |        1 | /var/src/tiktoken/vendor/phpunit/phpunit/src/Framework/Assert/Functions.php
   42.1ms |   19.4us |   10.8MB |   -8.0KB |        1 | Yethee\Tiktoken\Vocab\Vocab::fromFile
   15.4us |   15.4us |     424B |     424B |        1 | Composer\Autoload\ClassLoader::initializeIncludeClosure
    5.7ms |   11.7us |     592B |       0B |        6 | Composer\Autoload\ClassLoader::findFile

Encoding: o200k_base

*** SPX Report ***

Global stats:

  Called functions    :       81
  Distinct functions  :       50

  Wall time           :  202.1ms
  ZE memory usage     :   22.7MB

Flat profile:

 Wall time           | ZE memory usage     |
 Inc.     | *Exc.    | Inc.     | Exc.     | Called   | Function
----------+----------+----------+----------+----------+----------
   84.6ms |   76.1ms |   21.8MB |   17.8MB |        1 | Yethee\Tiktoken\Vocab\Vocab::fromStream
   16.4ms |   14.6ms |   64.9KB |   65.1KB |        6 | 1@Composer\Autoload\{closure}
   10.8ms |   10.8ms |     120B |     120B |        1 | Yethee\Tiktoken\Vocab\Loader\DefaultVocabLoader::checkHash
    8.5ms |    8.5ms |    4.0MB |    4.0MB |        1 | Yethee\Tiktoken\Vocab\Vocab::__construct
    2.0ms |    2.0ms |   43.0KB |   43.0KB |        1 | ComposerAutoloaderInitac9bfb1d4166aeecccdb5d5dfb6f6537::loadClassLoader
   31.9us |   31.9us |       0B |       0B |        1 | /var/src/tiktoken/vendor/phpunit/phpunit/src/Framework/Assert/Functions.php
   84.7ms |   23.8us |   21.8MB |   -8.0KB |        1 | Yethee\Tiktoken\Vocab\Vocab::fromFile
    5.5ms |   10.6us |     592B |       0B |        6 | Composer\Autoload\ClassLoader::findFile
    6.8us |    6.8us |      48B |      48B |        1 | Yethee\Tiktoken\EncoderProvider::__construct
  106.4ms |    6.1us |   21.8MB |     432B |        1 | Yethee\Tiktoken\EncoderProvider::getVocab
