
Commit

Merge pull request #19 from A-Baji/dev
thought min max and bold prompts
A-Baji authored Feb 18, 2023
2 parents b02c9c4 + 0e1e196 commit 7035f0f
Showing 5 changed files with 48 additions and 17 deletions.
10 changes: 5 additions & 5 deletions README.md
@@ -30,21 +30,21 @@ Pick a channel and user whose chat logs you want to use for creating your custom

You can follow [this guide](https://turbofuture.com/internet/Discord-Channel-ID) to learn how to find a channel's ID. Make sure that you include the full username with the #id, and wrap it in quotes if it contains spaces. The `--dirty` flag prevents the outputted dataset files from being deleted. Downloaded chat logs get saved and reused, but you can set the `--redownload` flag if you want to update the logs.

- You may have noticed the lack of a model customization process occurring after running that command. This is because no base model was selected, but before you specify a base model, you should analyze the generated dataset located in the directory mentioned in the logs. Chat messages are parsed into a dataset by grouping individual messages sent within a certain timeframe into "thoughts", where each thought is a completion in the dataset. The default for this timeframe is 10 seconds. If your dataset looks a bit off, try different timeframe settings using the `-t` option:
+ You may have noticed that no model customization took place after running that command. This is because no base model was selected. Before you specify a base model, you should analyze the generated dataset located in the directory mentioned in the logs. Chat messages are parsed into a dataset by grouping individual messages sent within a certain timeframe into "thoughts", where each thought becomes a completion in the dataset. The default for this timeframe is 10 seconds. Each thought's length must also fall between the minimum and maximum thought lengths, which default to 4 words and `None` (no maximum). If your dataset looks a bit off, try different settings using the `--ttime`, `--tmin`, and `--tmax` options:

- `discordai model create -c <channel_id> -u "<username#id>" -t <timeframe> --dirty`
+ `discordai model create -c <channel_id> -u "<username#id>" --ttime <timeframe> --tmax <thought_max> --tmin <thought_min> --dirty`
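The thought-grouping step described above can be sketched as follows. This is a hypothetical illustration under assumed names (`group_thoughts` and its parameters are not the actual discordai implementation):

```python
from datetime import datetime

def group_thoughts(messages, thought_time=10, thought_min=4, thought_max=None):
    """Merge consecutive (timestamp, text) messages sent within
    `thought_time` seconds into one "thought", then keep only thoughts
    whose word count falls within [thought_min, thought_max]."""
    thoughts, current, last_ts = [], [], None
    for ts, text in messages:
        # A gap longer than thought_time starts a new thought
        if last_ts is not None and (ts - last_ts).total_seconds() > thought_time:
            thoughts.append(" ".join(current))
            current = []
        current.append(text)
        last_ts = ts
    if current:
        thoughts.append(" ".join(current))
    # Enforce the min/max thought length in words
    return [t for t in thoughts
            if len(t.split()) >= thought_min
            and (thought_max is None or len(t.split()) <= thought_max)]
```

For example, two messages 5 seconds apart merge into one completion, while a longer gap starts a new one; thoughts shorter than `thought_min` words are discarded.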

- After you've found a good timeframe setting, you will want to manage your dataset's size. The larger your dataset, the more openAI credits it will cost to create a custom model. By default, the max dataset size is set to 1000. If your dataset exceeds this limit, it will be reduced using either a "first", "last", "middle", or "even" reduction method. The "first" method will select the first n messages, "last" will select the last n, "middle" will select the middle n, and "even" will select an even distribution of n messages. The default reduction method is even. You can set the max dataset size and reduction mode using the `-m` and `-r` options:
+ After you've found good thought settings, you will want to manage your dataset's size. The larger your dataset, the more OpenAI credits it will cost to create a custom model. By default, the max dataset size is 1000. If your dataset exceeds this limit, it will be reduced using a "first", "last", "middle", or "even" reduction method. The "first" method selects the first n messages, "last" the last n, "middle" the middle n, and "even" an even distribution of n messages. The default reduction method is "even". You can set the max dataset size and reduction mode using the `-m` and `-r` options:

- `discordai model create -c <channel_id> -u "<username#id>" -t <timeframe> -m <max_size> -r <reduction_mode> --dirty`
+ `discordai model create -c <channel_id> -u "<username#id>" --ttime <timeframe> --tmax <thought_max> --tmin <thought_min> -m <max_size> -r <reduction_mode> --dirty`
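The four reduction modes described above behave roughly like this (a hypothetical sketch; `reduce_dataset` is an assumed name, not the library's actual function):

```python
def reduce_dataset(entries, max_size=1000, mode="even"):
    """Shrink `entries` to at most `max_size` items using one of the
    reduction modes: "first", "last", "middle", or "even"."""
    if len(entries) <= max_size:
        return entries
    if mode == "first":
        return entries[:max_size]
    if mode == "last":
        return entries[-max_size:]
    if mode == "middle":
        start = (len(entries) - max_size) // 2
        return entries[start:start + max_size]
    # "even": pick max_size entries spread evenly across the dataset
    step = len(entries) / max_size
    return [entries[int(i * step)] for i in range(max_size)]
```

With 10 entries and `max_size=4`, "even" picks indices 0, 2, 5, and 7, sampling the whole conversation rather than one contiguous stretch of it.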

If you are planning on creating multiple models, you may want to obtain multiple OpenAI API keys to maximize free credit usage. You can assign specific API keys to custom models using the `-o` option. Otherwise, the key provided in your config will be used.

Now that you have fine-tuned your dataset, you can finally begin the customization process by specifying a base model. OpenAI has four base [models](https://beta.openai.com/docs/models/gpt-3): davinci, curie, babbage, and ada, ordered from most to least advanced. Generally you will want davinci, but it is also the most expensive model and the slowest to customize. Select your base model with the `-b` option.

Your final command should look something like this:

- `discordai model create -c <channel_id> -u "<username#id>" -t <timeframe> -m <max_size> -r <reduction_mode> -b <base_model>`
+ `discordai model create -c <channel_id> -u "<username#id>" --ttime <timeframe> --tmax <thought_max> --tmin <thought_min> -m <max_size> -r <reduction_mode> -b <base_model>`

If the training step costs too many credits with your current options, you can cancel it with `discordai job cancel -j <job_id>`, then either lower your max dataset size or choose a different Discord channel and/or user. You can get a list of all your jobs with `discordai job list --simple`.
### Test the new model
43 changes: 37 additions & 6 deletions discordai/command_line.py
@@ -110,6 +110,13 @@ def discordai():
dest='stop_default',
help="Set the stop option to use for completions to True",
)
new_cmd_optional_named.add_argument(
"--bolden",
action='store_true',
required=False,
dest='bolden',
help="Boldens the original prompt in the completion output",
)

delete_cmd = bot_cmds_commands_subcommand.add_parser(
"delete", description="Delete a slash command from your bot"
@@ -179,13 +186,29 @@ def discordai():
help="The base model to use for customization. If none, then skips training step: DEFAULT=none",
)
model_create_optional_named.add_argument(
- "-t", "--thought-time",
+ "--ttime", "--thought-time",
type=int,
default=10,
required=False,
dest='thought_time',
help="The max amount of time in seconds to consider two individual messages to be part of the same \"thought\": DEFAULT=10",
)
model_create_optional_named.add_argument(
"--tmax", "--thought-max",
type=int,
default=None,
required=False,
dest='thought_max',
help="The max length of each thought, in words: DEFAULT=None",
)
model_create_optional_named.add_argument(
"--tmin", "--thought-min",
type=int,
default=4,
required=False,
dest='thought_min',
help="The minimum length of each thought, in words: DEFAULT=4",
)
model_create_optional_named.add_argument(
"-m", "--max-entries",
type=int,
@@ -301,6 +324,13 @@ def discordai():
dest='openai_key',
help="The openAI API key associated with the job to see the status for: DEFAULT=config.openai_key",
)
job_status_optional_named.add_argument(
"--events",
action='store_true',
required=False,
dest='events',
help="Simplify the output to just the event list",
)

job_cancel = job_subcommand.add_parser(
"cancel", description="Cancel an openAI customization job"
@@ -351,17 +381,18 @@ def discordai():
if args.subcommand == "commands":
if args.subsubcommand == "new":
template.gen_new_command(args.model_id, args.command_name, args.temp_default, args.pres_default,
- args.freq_default, args.max_tokens_default, args.stop_default, args.openai_key)
+ args.freq_default, args.max_tokens_default, args.stop_default, args.openai_key,
+ args.bolden)
elif args.subsubcommand == "delete":
template.delete_command(args.command_name)
elif args.command == "model":
if args.subcommand == "list":
openai_wrapper.list_models(args.openai_key, args.simple)
if args.subcommand == "create":
customize.create_model(config["token"], args.openai_key, args.channel, args.user,
- thought_time=args.thought_time, max_entry_count=args.max_entries,
- reduce_mode=args.reduce_mode, base_model=args.base_model, clean=args.dirty,
- redownload=args.redownload)
+ thought_time=args.thought_time, thought_max=args.thought_max, thought_min=args.thought_min,
+ max_entry_count=args.max_entries, reduce_mode=args.reduce_mode, base_model=args.base_model,
+ clean=args.dirty, redownload=args.redownload)
if args.subcommand == "delete":
openai_wrapper.delete_model(args.openai_key, args.model_id)
elif args.command == "job":
@@ -370,7 +401,7 @@ def discordai():
if args.subcommand == "follow":
openai_wrapper.follow_job(args.openai_key, args.job_id)
if args.subcommand == "status":
- openai_wrapper.get_status(args.openai_key, args.job_id)
+ openai_wrapper.get_status(args.openai_key, args.job_id, args.events)
if args.subcommand == "cancel":
openai_wrapper.cancel_job(args.openai_key, args.job_id)
elif args.command == "config":
8 changes: 4 additions & 4 deletions discordai/template.py
@@ -44,10 +44,10 @@ async def customai(self, context: Context, prompt: str = "", temp: float = {temp
frequency_penalty=freqPen,
presence_penalty=presPen,
max_tokens=max_tokens,
- echo=True if prompt else False,
+ echo=False,
stop='.' if stop else None,
)
- await context.send(response[\'choices\'][0][\'text\'][:2000])
+ await context.send(f"{{'**' if {bold} else ''}}{{prompt}}{{'**' if {bold} else ''}}{{response[\'choices\'][0][\'text\'][:2000]}}")
except Exception as error:
print({error})
await context.send(
@@ -63,7 +63,7 @@ async def setup(bot):


def gen_new_command(model_id: str, command_name: str, temp_default: float, pres_default: float, freq_default: float,
- max_tokens_default: int, stop_default: bool, openai_key: str):
+ max_tokens_default: int, stop_default: bool, openai_key: str, bold_prompt: bool):
if getattr(sys, 'frozen', False):
# The code is being run as a frozen executable
data_dir = pathlib.Path(appdirs.user_data_dir(appname="discordai"))
@@ -84,7 +84,7 @@ def gen_new_command(model_id: str, command_name: str, temp_default: float, pres_
command_name=command_name, temp_default=float(temp_default),
pres_default=float(pres_default),
freq_default=float(freq_default),
- max_tokens_default=max_tokens_default, stop_default=stop_default, openai_key=openai_key,
+ max_tokens_default=max_tokens_default, stop_default=stop_default, openai_key=openai_key, bold=bold_prompt,
error="f\"Failed to generate valid response for prompt: {prompt}\\nError: {error}\""))
print(f"Successfully created new slash command: /{command_name} using model {model_id}")

2 changes: 1 addition & 1 deletion discordai/version.py
@@ -1 +1 @@
- __version__ = "1.2.1"
+ __version__ = "1.3.0"
2 changes: 1 addition & 1 deletion requirements.txt
@@ -2,4 +2,4 @@ discord.py
openai
pandas
appdirs
- discordai_modelizer @ git+https://github.com/A-Baji/discordAI-modelizer.git@1.1.0
+ discordai_modelizer @ git+https://github.com/A-Baji/discordAI-modelizer.git@1.2.0
