Step/trial storage and LLM generation strategy management #15
Hi @doxav, these are great suggestions.
Would this meet your need? We're happy to implement new features to make Trace more convenient for you. Just let us know.
| Persistent Storage of Optimization Steps
| Generation-based strategies
I can help on the STOP implementation as well -- it looks like a great way to expand Trace's capability!
@doxav Some clarification on logging.
Sorry, maybe I wasn't clear.
saving the nodes by e.g. deepcopying after
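In case it helps, a rough sketch of what I mean (a hypothetical loop, assuming the usual Trace forward/backward/step pattern; names like num_steps, model, and feedback are placeholders):

import copy

# Sketch only (not an existing Trace API): log a deep copy of the parameter
# nodes after every optimizer step so each trial/attempt can be inspected later.
step_log = []

for step in range(num_steps):
    output = model.forward()
    optimizer.zero_feedback()
    optimizer.backward(output, feedback)
    optimizer.step()

    step_log.append({
        "step": step,
        # deepcopy detaches the snapshot from later in-place updates
        "parameters": [copy.deepcopy(p.data) for p in optimizer.parameters],
    })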
Just to let you know that I won't be able to respond until Wednesday. If you have ideas on the points below in the meantime, we could build it progressively. For the STOP algorithm:
For the trials/attempts logging:
Hi @doxav, you can save the parameters of the model. We actually do this here: https://github.com/microsoft/Trace/blob/main/examples/bbh/run_prompt_bigbench_trace.py
If, after an update, the model performs worse than before, we can reload the parameters from a previous step. In the example above, we reload if there is an error in the code. The schema looks something like:

val_perfs = {}
model = SomeTracedModel()
# save_dir, task_name, epoch, correctness, val_examples, etc. come from the surrounding script
for step in range(20):
    try:
        model.forward()
    except trace.ExecutionError as e:
        if len(val_perfs) > 0:
            # roll back to the best checkpoint seen so far
            best_checkpoint = max(val_perfs, key=val_perfs.get)
            model.load(best_checkpoint)

    checkpoint_name = f"{save_dir}/{task_name}/epoch_{epoch}_step_{step}.pkl"
    if correctness:
        # evaluate on val examples
        val_perf, _ = evaluate(model, val_examples)
        val_perfs[checkpoint_name] = val_perf
        model.save(checkpoint_name)

This type of optimization loop is very common in PyTorch. We can definitely provide a specialized trainer that does this under the hood.
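If you want something closer to Optuna-style trial storage, one possible extension (just a sketch, not an existing Trace feature) is to record each step in a SQLite table next to the pickled checkpoints:

import sqlite3

# Sketch: persist each optimization step, similar to how Optuna stores trials in a study.
conn = sqlite3.connect("trace_study.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS steps (step INTEGER, checkpoint TEXT, val_perf REAL)"
)

def log_step(step, checkpoint_name, val_perf):
    conn.execute(
        "INSERT INTO steps (step, checkpoint, val_perf) VALUES (?, ?, ?)",
        (step, checkpoint_name, val_perf),
    )
    conn.commit()

# Inside the loop above, after model.save(checkpoint_name):
#     log_step(step, checkpoint_name, val_perf)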
I realize you might be asking a different question -- OptoPrime has
Hi, I finally had the time to look into it this afternoon. Since STOP generates several candidates at each iteration, the optimizer must be able to select the best candidate among them via a utility function (e.g. a loss or task-specific metric), or via a comparison between candidates when no metric is available. STOP also requires being able to average the utility of all candidates generated by the optimizer, to estimate whether the optimizer itself has improved; this too could be done via comparison if no metric is available. The comparison trick would cover any case by using an LLM when no utility function is provided/available. To implement this simply, we could:
I like Option A because I think it could easily apply to other Optimizers.

# Option A idea of implementation
from typing import Callable, List, Tuple

from opto.trace import bundle
from opto.optimizers import OptoPrime
from opto.optimizers.optimizer import Optimizer  # base class; exact import path may differ


class OptoPrimeImprover(Optimizer):
    def __init__(self, base_optimizer, num_candidates=3, utility_function=None, preference_function=None, **kwargs):
        super().__init__(base_optimizer.parameters, **kwargs)
        self.base_optimizer = base_optimizer
        self.num_candidates = num_candidates
        self.utility_function = utility_function        # User-provided utility function
        self.preference_function = preference_function  # User-provided preference function

    @bundle(trainable=True)
    def improver(self, summary, mask=None, *args, **kwargs):
        """Improver function that generates multiple candidates and selects the best."""
        candidates = []
        # Generate multiple candidate prompts using the base optimizer's construct_prompt
        for _ in range(self.num_candidates):
            system_prompt, user_prompt = self.base_optimizer.construct_prompt(summary, mask, *args, **kwargs)
            candidates.append((system_prompt, user_prompt))

        # Evaluate candidates and select the best
        if self.utility_function:
            # Evaluate each candidate individually
            best_score = float('-inf')
            best_prompt = None
            for system_prompt, user_prompt in candidates:
                score = self.utility_function(system_prompt, user_prompt)
                if score > best_score:
                    best_score = score
                    best_prompt = (system_prompt, user_prompt)
        elif self.preference_function:
            # Use the preference function to select the best candidate among all
            best_prompt = self.preference_function(candidates)
        else:
            # Default behavior if no utility or preference function is provided
            best_prompt = candidates[0]  # Select the first candidate
        return best_prompt

    def construct_prompt(self, summary, mask=None, *args, **kwargs):
        # Use the improver function to generate and select the best prompt
        return self.improver(summary, mask, *args, **kwargs)

    # Other methods (e.g., _step, call_llm) remain unchanged or inherit from base_optimizer


# Usage example (parameters and config_list defined elsewhere)
base_optimizer = OptoPrime(parameters, config_list=config_list)

# Option 1: Define a utility function (e.g., evaluating prompt quality)
def my_utility_function(system_prompt, user_prompt):
    # return a score for the candidate prompt
    ...

# Option 2: Define a preference function (e.g., using an LLM to compare candidates)
def my_preference_function(candidates):
    # Implement comparison logic to select the best candidate
    best_candidate = max(candidates, key=lambda x: len(x[1]))
    return best_candidate

# Create an OptoPrimeImprover using the base optimizer and a utility function
optimizer = OptoPrimeImprover(base_optimizer=base_optimizer, num_candidates=5,
                              utility_function=my_utility_function)
# OR use preference_function=my_preference_function for Option 2


# Option B idea of implementation
class OptoPrimeEnhanced(OptoPrime):
    def __init__(self, parameters, config_list=None, num_candidates=3,
                 utility_function: Callable[[str, str], float] = None,
                 preference_function: Callable[[List[Tuple[str, str]]], Tuple[str, str]] = None, **kwargs):
        super().__init__(parameters, config_list=config_list, **kwargs)
        self.num_candidates = num_candidates
        self.utility_function = utility_function
        self.preference_function = preference_function

    @bundle(trainable=True)
    def construct_prompt(self, summary, mask=None, *args, **kwargs) -> Tuple[str, str]:
        candidates = []
        for _ in range(self.num_candidates):
            # Generate a candidate prompt
            system_prompt, user_prompt = super().construct_prompt(summary, mask, *args, **kwargs)
            candidates.append((system_prompt, user_prompt))

        # Evaluate and select the best candidate
        if self.utility_function:
            # Use the utility function to score candidates
            best_candidate = max(candidates, key=lambda pair: self.utility_function(pair[0], pair[1]))
        elif self.preference_function:
            # Use the preference function to select the best candidate
            best_candidate = self.preference_function(candidates)
        else:
            # Default to the first candidate if no functions are provided
            best_candidate = candidates[0]
        return best_candidate
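For completeness, Option B could be instantiated directly (hypothetical usage, reusing my_utility_function from the Option A example above):

# Hypothetical usage of Option B, mirroring the Option A usage example
optimizer = OptoPrimeEnhanced(
    parameters,
    config_list=config_list,
    num_candidates=5,
    utility_function=my_utility_function,
    # or: preference_function=my_preference_function
)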
Based on my understanding of STOP, I think Option A as you described is the closest to the original paper's intention. Do you want to create a fork and a new branch, and open a pull request afterwards? It doesn't have to be fully working, and we can iterate.
Ok, I'll try to do that this Monday |
After further analysis, neither Option A nor Option B seems a good track at this stage. I cannot yet answer what the right level is at which to apply STOP optimization:
I have two feature requests related to optimization tracking and strategy management in Trace:
Persistent Storage of Optimization Steps
How can I store each optimization step (including parameters, gradients, and results) in a way that’s similar to how Optuna organizes trials in a study (e.g., using SQLite, PostgreSQL, MySQL, Redis, etc.)?
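For reference, the Optuna workflow I have in mind looks like this (standard Optuna usage, shown only to illustrate the kind of persistence I mean):

import optuna

# Optuna persists every trial (parameters, state, results) to the storage backend,
# so a study can be resumed or inspected later.
study = optuna.create_study(
    study_name="my_study",
    storage="sqlite:///trials.db",  # could also be PostgreSQL, MySQL, Redis, ...
    load_if_exists=True,
)

def objective(trial):
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2

study.optimize(objective, n_trials=20)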
Management and Evolution of Generation Strategies
What is the best approach for implementing and managing different generation strategies (e.g., genetic algorithms, evolutionary strategies...) within Trace?
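To make the second request concrete, here is a purely hypothetical sketch of the kind of strategy interface I mean (none of these names exist in Trace today):

from typing import List, Protocol

class GenerationStrategy(Protocol):
    """Hypothetical interface a generation strategy could implement."""
    def propose(self, population: List[str], scores: List[float]) -> List[str]:
        """Return new candidates given the current population and their scores."""
        ...

class SimpleEvolutionStrategy:
    """Toy evolutionary strategy: keep the best candidate and emit mutated variants."""
    def __init__(self, num_children: int = 3):
        self.num_children = num_children

    def propose(self, population: List[str], scores: List[float]) -> List[str]:
        best = population[max(range(len(scores)), key=scores.__getitem__)]
        # In practice, mutation could call an LLM or a genetic operator to rewrite `best`.
        return [f"{best} (variant {i})" for i in range(self.num_children)]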