
Step/trial storage and LLM generation strategy management #15

Open
doxav opened this issue Nov 12, 2024 · 11 comments

@doxav (Contributor) commented Nov 12, 2024

I have two feature requests related to optimization tracking and strategy management in Trace:

  1. Persistent Storage of Optimization Steps
    How can I store each optimization step (including parameters, gradients, and results) in a way that’s similar to how Optuna organizes trials in a study (e.g., using SQLite, PostgreSQL, MySQL, Redis, etc.)?

    • This would enable analysis of parameter-result correlations, support for experiment reproducibility, distributed optimization, and fault tolerance.
    • Is there a recommended way to achieve this currently, or would this require new functionality? (A small Optuna snippet illustrating the kind of workflow I mean is at the end of this comment.)
  2. Management and Evolution of Generation Strategies
    What is the best approach for implementing and managing different generation strategies (e.g., genetic algorithms, evolutionary strategies...) within Trace?

    • Additionally, is there support or guidance on implementing a self-improving strategy, such as the Self-Taught Optimizer (STOP) approach? (Referencing STOP paper.)
    • Such a feature would enable more sophisticated optimization workflows by adjusting and evolving strategies automatically.
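
To illustrate the kind of workflow I mean, here is the standard Optuna pattern I am referring to (illustrative only; run_experiment is a placeholder objective, not Trace code):

import optuna

def objective(trial):
    # each trial's parameters and result are persisted in the study storage
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    return run_experiment(lr)  # placeholder evaluation returning a score

study = optuna.create_study(
    study_name="trace_steps",
    storage="sqlite:///trace_steps.db",  # or PostgreSQL/MySQL/Redis backends
    direction="maximize",
    load_if_exists=True,
)
study.optimize(objective, n_trials=50)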
@chinganc (Collaborator)
Hi @doxav

These are great suggestions.

  1. Currently we suggest saving the parameter nodes directly. All nodes in Trace support serialization (i.e. pickle and deepcopy) as long as the base object they wrap supports it. After backward is called, each parameter node that was used (i.e. a node set to trainable) also contains the propagated feedback in its feedback attribute, so saving the nodes keeps all of this information. For a trainable function, its parameter can be obtained from its parameter attribute and saved in the same way. (A minimal sketch is at the end of this comment.)

Would this meet your needs? We're happy to implement new features to make Trace more convenient for you; just let us know.

  2. This can be implemented as an Optimizer class or outside of one. The former supports some bookkeeping such as memory or additional validation, but it may have some limitations due to the PyTorch-like design of the optimizer interface. For the most flexibility, I would suggest writing the search algorithm outside and just using Trace optimizers as subroutines.

  3. Thanks for suggesting STOP. That should be straightforward to implement with Trace. If you're interested in contributing, we're happy to work with you; alternatively, we will write an example STOP implementation later on.
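
For point 1, a minimal sketch of what this could look like (hypothetical snippet, not an official Trace API; it assumes the optimizer keeps its parameter nodes in optimizer.parameters and that the wrapped values are picklable):

import pickle
from copy import deepcopy

# After optimizer.backward(target, feedback), each trainable node still holds
# its propagated feedback, so a deepcopy captures both value and feedback.
snapshot = [deepcopy(p) for p in optimizer.parameters]
with open(f"step_{step}.pkl", "wb") as f:  # step is the current iteration index
    pickle.dump(snapshot, f)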

@doxav (Contributor, Author) commented Nov 13, 2024

Persistent Storage of Optimization Steps

  • In my experience, typical step-by-step optimization stores the parameters and the newly measured performance in order to identify top configurations, guide exploration/exploitation, and compare parameter influence/sensitivity.
    In our case, the parameters would be the new values of the trainable nodes'/functions' parameters. Storing the feedback, as well as the diff between the previous and new parameters, also seems required for later analysis.
    The new performance should be measured (utility); does that require running a new loss measurement (feedback)? (A sketch of the kind of trial record I have in mind follows these bullets.)
  • Why store each trial after backward rather than after step()?
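
To make the intent concrete, here is a sketch of such a trial record (hypothetical names, not a Trace API):

from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class TrialRecord:
    """One optimization step, roughly mirroring an Optuna trial."""
    step: int
    parameters: Dict[str, Any]       # new values of the trainable nodes/functions
    parameter_diff: Dict[str, Any]   # diff against the previous step's values
    feedback: str                    # propagated feedback collected after backward
    utility: float                   # measured performance of the new parameters
    metadata: Dict[str, Any] = field(default_factory=dict)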

Generation-based strategies

  • I still need to better understand how the utility function should be measured in the Optimizer class to align properly with STOP and ensure correct implementation.
  • The self-reinforcement of STOP could be executed within a single step (which would be easier for self-refining a step), but this approach seems computationally heavy due to extensive inference. Alternatively, it could follow the current step-by-step flow, but I don't yet see a clear and stable approach. For example, in OptoPrime, this would involve evolving the _step method, though this approach seems to introduce considerable instability.

@allenanie (Collaborator)
I can help with the STOP implementation as well -- it looks like a great way to expand Trace's capabilities!

@chinganc (Collaborator) commented Nov 13, 2024

@doxav Some clarification on logging.

... The new performance should be measured (utility); does that require running a new loss measurement (feedback)?

  • Why store each trial after backward rather than after step()?

Sorry, maybe I wasn't clear. The feedback at a node is computed by backward and is removed when zero_feedback is called. So in the regular usage

optimizer.zero_feedback() # feedback is cleared
optimizer.backward(target, feedback) # feedback is computed
optimizer.step() # feedback persists

saving the nodes (e.g. by deepcopying them after step) would save both the latest value and the feedback that produced it. By tracking these, you should be able to do the analysis you described.
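
A minimal sketch of such a logging loop (hypothetical code: train_step is a placeholder for running the traced workflow and collecting feedback, and it assumes the optimizer keeps its parameter nodes in optimizer.parameters):

from copy import deepcopy

history = []  # one entry per optimization step
for step in range(num_steps):
    target, feedback = train_step()       # placeholder: run the workflow, get output node + feedback
    optimizer.zero_feedback()             # clear feedback from the previous step
    optimizer.backward(target, feedback)  # propagate feedback to the trainable nodes
    optimizer.step()                      # update parameters; feedback persists on the nodes
    history.append({
        "step": step,
        "nodes": [deepcopy(p) for p in optimizer.parameters],  # snapshot of value + feedback
    })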

@doxav (Contributor, Author) commented Nov 14, 2024

Just to let you know that I won’t be able to respond until Wednesday. If you have ideas on the points below in the meantime, we could build it progressively.

For STOP algorithm:

  • It can either self-refine within a single step or guide the process across steps. I think a simpler first approach would be to run it within a step (in any case we need to keep track of candidates to avoid repeating them).
  • It would improve both the generated candidate solution at each step and the candidate generator (the meta-optimizer).
  • We need to properly design the utility / meta-utility function; I think you are best placed to think about it.

For the trials/attempts logging:

  • It is separate from the STOP design, but STOP needs to keep track of past attempts, so the logging could be designed to also serve this purpose.
  • I am still not clear on how Trace/OptoPrime is able to manage long explorations of the parameter space and the associated results without it (I understand feedback and gradients are taken into consideration as long as zero_feedback is not called).

@allenanie (Collaborator)
Hi @doxav, you can save the parameters of the model. We actually do this here: https://github.com/microsoft/Trace/blob/main/examples/bbh/run_prompt_bigbench_trace.py

If, after an update, the model performs worse than before, we can reload the parameters from a previous step.

In the example above, we reload if there is an error in the code.

The schema looks something like:

val_perfs = {}
model = SomeTracedModel()  # placeholder for a trace-decorated model
for step in range(20):
    try:
        model.forward()
    except trace.ExecutionError as e:
        # roll back to the best checkpoint seen so far
        if len(val_perfs) > 0:
            best_checkpoint = max(val_perfs, key=val_perfs.get)
            model.load(best_checkpoint)

    checkpoint_name = f"{save_dir}/{task_name}/epoch_{epoch}_step_{step}.pkl"
    if correctness:  # e.g. the output passed a correctness check
        # evaluate on validation examples and checkpoint the model
        val_perf, _ = evaluate(model, val_examples)
        val_perfs[checkpoint_name] = val_perf
        model.save(checkpoint_name)
This type of optimization loop is very common in PyTorch. We can definitely provide some specialized trainer that can do this under the hood.

@allenanie (Collaborator)
I am still not clear on how Trace/OptoPrime is able to manage long explorations of the parameter space and the associated results without it (I understand feedback and gradients are taken into consideration as long as zero_feedback is not called).

I realize you might be asking a different question -- OptoPrime has self.memory, which presents past variables and their feedback in the prompt context:

Below are some variables and their feedbacks you received in the past:
...
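
For reference, memory is configured through the optimizer's constructor; a hedged example (please double-check the current signature in the repo for the exact argument name):

from opto.optimizers import OptoPrime

# memory_size keeps the last N (variable, feedback) pairs and includes them
# in the prompt context shown above.
optimizer = OptoPrime(model.parameters(), memory_size=5)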

@doxav (Contributor, Author) commented Nov 18, 2024

Hi,

I finally had the time to look into it this afternoon.

As STOP generates several candidates at each iteration, the optimizer must be able to select the best candidate among them via a utility function (e.g. a loss or task-specific metric), or via a comparison between candidates if no metric is available.

STOP also requires averaging the utility of all candidates generated by the optimizer, to estimate whether it is an improved optimizer; this too could be done via comparison if no metric is available.

The comparison trick would cover any case, by falling back to an LLM when no utility function is provided/available.

To implement this simply, we could:

  • Option A: wrap/decorate construct_prompt in an improver that manages the generation of several candidates and selects the best; this improver function could then be evolved iteratively (or in parallel) to find a better improver.
  • Option B: evolve construct_prompt itself to generate several candidates and select the best; the new construct_prompt could then be evolved iteratively (or in parallel) to find a better improver.

I like Option A because I think it could easily apply to other Optimizers.

# Option A idea of implementation

    class OptoPrimeImprover(Optimizer):

        def __init__(self, base_optimizer, num_candidates=3, utility_function=None, preference_function=None, **kwargs):
            super().__init__(base_optimizer.parameters, **kwargs)
            self.base_optimizer = base_optimizer
            self.num_candidates = num_candidates
            self.utility_function = utility_function  # User-provided utility function
            self.preference_function = preference_function  # User-provided preference function

        @bundle(trainable=True)
        def improver(self, summary, mask=None, *args, **kwargs):
            """Improver function that generates multiple candidates and selects the best."""
            candidates = []
            # Generate multiple candidate prompts
            for _ in range(self.num_candidates):
                # Generate a candidate prompt using the base optimizer's construct_prompt
                system_prompt, user_prompt = self.base_optimizer.construct_prompt(summary, mask, *args, **kwargs)
                candidates.append((system_prompt, user_prompt))
            # Evaluate candidates and select the best
            if self.utility_function:
                # Evaluate each candidate individually
                best_score = float('-inf')
                best_prompt = None
                for system_prompt, user_prompt in candidates:
                    score = self.utility_function(system_prompt, user_prompt)
                    if score > best_score:
                        best_score = score
                        best_prompt = (system_prompt, user_prompt)
            elif self.preference_function:
                # Use preference function to select the best candidate among all
                best_prompt = self.preference_function(candidates)
            else:
                # Default behavior if no utility or preference function is provided
                best_prompt = candidates[0]  # Select the first candidate
            return best_prompt

        def construct_prompt(self, summary, mask=None, *args, **kwargs):
            # Use the improver function to generate and select the best prompt
            return self.improver(summary, mask, *args, **kwargs)
        # Other methods (e.g., _step, call_llm) remain unchanged or inherit from base_optimizer

    # Usage example,
    base_optimizer = OptoPrime(parameters, config_list=config_list)

    # Option 1: Define a utility function (e.g., evaluating prompt quality)
    def my_utility_function(system_prompt, user_prompt):
        ...  # return a score evaluating this candidate prompt

    # Option 2: Define a preference function (e.g., using an LLM to compare candidates)
    def my_preference_function(candidates):
        # Implement comparison logic to select the best candidate
        best_candidate = max(candidates, key=lambda x: len(x[1]))
        return best_candidate
    
    # Create an OptoPrimeImprover using the base optimizer and a utility function
    optimizer = OptoPrimeImprover(base_optimizer=base_optimizer, num_candidates=5,
        utility_function=my_utility_function)  # OR use preference_function=my_preference_function for Option 2
# Option B idea of implementation

    from typing import Callable, List, Tuple

    class OptoPrimeEnhanced(OptoPrime):

        def __init__(self, parameters, config_list=None, num_candidates=3,
                     utility_function: Callable[[str, str], float] = None,
                     preference_function: Callable[[List[Tuple[str, str]]], Tuple[str, str]] = None, **kwargs):
            super().__init__(parameters, config_list=config_list, **kwargs)
            self.num_candidates = num_candidates
            self.utility_function = utility_function
            self.preference_function = preference_function

        @bundle(trainable=True)
        def construct_prompt(self, summary, mask=None, *args, **kwargs) -> Tuple[str, str]:
            candidates = []
           
            for _ in range(self.num_candidates):
                # Generate a candidate prompt
                system_prompt, user_prompt = super().construct_prompt(summary, mask, *args, **kwargs)
                candidates.append((system_prompt, user_prompt))
           
            # Evaluate and select the best candidate
            if self.utility_function:
                # Use utility function to score candidates
                best_candidate = max(candidates, key=lambda pair: self.utility_function(pair[0], pair[1]))
            elif self.preference_function:
                # Use preference function to select the best candidate
                best_candidate = self.preference_function(candidates)
            else:
                # Default to the first candidate if no functions are provided
                best_candidate = candidates[0]
           
            return best_candidate

@allenanie (Collaborator)
Based on my understanding of STOP, I think option A as you described is the closest to the original paper's intention.

Do you want to do a fork & new branch and make a pull request after? It doesn't have to be fully working, and we can iterate.

@doxav (Contributor, Author) commented Nov 21, 2024

Ok, I'll try to do that this Monday

@doxav (Contributor, Author) commented Nov 23, 2024

After further analysis, neither Option A nor Option B seems like a good track at this stage. I cannot decide what the right level is for the STOP optimization:

  1. Should we improve at the micro level of call_llm in OptoPrime/OPRO, by generating different response candidates that we cannot assess with a computed utility function or feedback, only with an LLM-based comparison of them? That would be specific to OptoPrime/OPRO, because it is not a generic method. The optimizer could try to vary the prompt, chain, temperature, number of candidates... The comparison/preference would be highly dependent on the prompt and the LLM. It could improve optimization but would be costly :-), and then no longer 3x lighter than TextGrad in LLM calls.
  2. Should we improve at a larger level, in step? But I don't see where.
  3. Should we optimize the Optimizer itself, so that suggestions are propagated on the graph and we can then use either a utility function or maybe the feedback? That would be at a totally different scale.
  4. Or maybe it is not interesting.
