
Search Methods Enhancements to Avoid Duplicate Evaluated Pipelines 🥈 #211

Open
wants to merge 9 commits into
base: master

Conversation


@simonprovost simonprovost commented Dec 4, 2023

Hi @PGijsbers, @leightonvg,

After spending some time examining the concern raised in thread #189, I would like to suggest an initial resolution: the narrowed approach I previously proposed in that thread.

The narrowed approach can be summarised as follows:

Because each search algorithm is highly specific, I am convinced that handling the re-evaluation of previously evaluated pipelines should be left to the algorithms themselves. While I cannot provide a concrete example, a re-evaluated pipeline may still be useful for theories built on this aspect in newly designed algorithms; who knows? Although this is not a compelling reason on its own, approaching the problem narrowly lets each algorithm handle duplicates in the manner it prefers. In other words, handling may differ from algorithm to algorithm: for example, a candidate that keeps reappearing could be given more prominence, with its duplication count dynamically shifting the algorithm's focus. If I am not mistaken, performing such a check within each algorithm opens up additional avenues for deciding what to do.

Lastly, to avoid confusion for designers of new search algorithms, we could log a warning within GAMA's pipeline-evaluation module whenever a duplicate pipeline is re-evaluated, pointing out that the provided search algorithm should be refactored. Handling duplication within each search method would then make it clear what to do.
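To illustrate the warning idea, here is a minimal sketch. The helper name `check_and_record` and the plain-set bookkeeping are illustrative assumptions, not GAMA's actual API; the real evaluation module would derive the key from the individual's pipeline representation.

```python
import logging

log = logging.getLogger(__name__)

def check_and_record(individual_key, evaluated_keys):
    """Return True if this pipeline key was already evaluated; record it either way.

    `individual_key` is assumed to be a canonical string representation
    of the candidate pipeline; `evaluated_keys` is a set of such keys.
    """
    duplicate = individual_key in evaluated_keys
    if duplicate:
        log.warning(
            "Re-evaluating duplicate pipeline %s; consider refactoring "
            "the search algorithm to skip known candidates.",
            individual_key,
        )
    evaluated_keys.add(individual_key)
    return duplicate
```

The search method (or the evaluation module itself) could call this before dispatching an evaluation and decide what to do with the returned flag.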

In the interim, this PR enforces uniqueness of evaluated pipelines in Random Search. Others, such as @leightonvg, or myself if I have the time, could extend this to other algorithms (e.g., EA) after this PR is accepted.

@PGijsbers how do you feel about all this?

🔔 EDIT (Update on Progress – 11 / 12 / 2023):

Following @PGijsbers's comment, available here, here is an update on progress.

I couldn't resist diving into the EvaluationLibrary. I've added an out-of-the-box function that immediately tells us whether a candidate is already known (i.e., evaluated). This should offer flexibility for search methods: they can use this information as they see fit. Right now, we take a simple approach: if a candidate is known, we try another until a maximum attempt count is reached. If designers so desire, however, this could serve as a foundation for more complex strategies in custom search methods.
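The retry strategy described above can be sketched as follows. Note this is a simplified stand-in: the class and method names (`EvaluationLibrary`, `is_evaluated`, `record`, `propose_unique`) are assumptions for illustration and do not mirror GAMA's real signatures.

```python
class EvaluationLibrary:
    """Minimal stand-in for a library of already-evaluated pipelines."""

    def __init__(self):
        self._evaluated = set()

    def is_evaluated(self, individual):
        # A canonical string form of the pipeline serves as the lookup key.
        return str(individual) in self._evaluated

    def record(self, individual):
        self._evaluated.add(str(individual))

def propose_unique(library, sample_candidate, max_attempts=50):
    """Sample candidates until an unseen one appears or attempts run out.

    Falls back to the last sampled candidate if every attempt was a
    duplicate, so search never stalls.
    """
    candidate = sample_candidate()
    for _ in range(max_attempts):
        if not library.is_evaluated(candidate):
            break
        candidate = sample_candidate()
    return candidate
```

A search loop would call `propose_unique` in place of its raw sampling step, then `record` the candidate once its evaluation is dispatched.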

I've implemented this for Random Search, Async EA, and ASHA. I'm confident about Random Search and Async EA. However, I'm slightly less certain about ASHA, so I'd appreciate it if you could give it a once-over to ensure nothing's amiss. As agreed, this PR will stay under the #waiting tag and will follow after #210's merger.

To make things easier, here's a link focusing only on the recent commits for this update: Link to PR changes

Thanks, and looking forward to your feedback!

What contributions have been made

  • ✅ Improved GAMA's out-of-the-box search-method helpers to support avoiding pipeline duplication, if the designer's search algorithm opts in
  • ✅ Refactored Random Search, ASHA, and Async EA to prevent pipeline duplication, retrying up to a maximum number of attempts to produce a new individual when duplicates are identified
  • ✅ Tests pass smoothly
  • ✅ Informative commits

❌ DISCLAIMER ❌

Kindly refrain from merging this PR prior to #210. I will need to perform a rebase from #210 before this PR can be merged!

Cheers,

@simonprovost simonprovost changed the title Refactor/improve random search 💡 Improve random search uniqueness 💅 Dec 4, 2023
@PGijsbers
Member

Seems fine, though I would probably write it slightly differently. I do think it makes sense to have a global way to easily check what individuals have already been evaluated, so that search algorithms may make use of it out of the box (search algorithms may decide not to use it). That said, having an explicit check for this in random search while we do not yet have a general plug-and-play mechanism is a step in the right direction I am okay with. (I do think that perhaps the EvaluationLibrary would also be usable for this? But it's been too long to remember how everything ties together :( )

I'll revisit this after it's rebased on a main where #210 is merged.

@PGijsbers PGijsbers added the waiting PR or Issue depends on some other event or input as explained in one of the comments. label Dec 5, 2023
+ Custom config space is now utilising ConfigSpace
+ Classification.py is now divided into distinct Classifiers and Preprocessors for better understanding and management
+ Thorough documentation for users to add/modify anything
@simonprovost simonprovost force-pushed the refactor/improve_random_search branch 3 times, most recently from a46ec4f to bdbd54a Compare December 9, 2023 01:38
@simonprovost simonprovost changed the title 💡 Improve random search uniqueness 💅 💡 Improve Search Methods uniqueness (No Candidates Duplicate) 💅 Dec 9, 2023
@simonprovost simonprovost changed the title 💡 Improve Search Methods uniqueness (No Candidates Duplicate) 💅 Improve Search Methods uniqueness (No Candidates Duplicate) 🥈 Dec 11, 2023
@simonprovost simonprovost changed the title Improve Search Methods uniqueness (No Candidates Duplicate) 🥈 Improve Search Methods uniqueness (No Evaluated Pipelines Duplicate) 🥈 Dec 11, 2023
@simonprovost
Author

simonprovost commented Dec 11, 2023

Seems fine, though I would probably write it slightly differently. I do think it makes sense to have a global way to easily check what individuals have already been evaluated, so that search algorithms may make use of it out of the box (search algorithms may decide not to use it). That said, having an explicit check for this in random search while we do not yet have a general plug-and-play mechanism is a step in the right direction I am okay with. (I do think that perhaps the EvaluationLibrary would also be usable for this? But it's been too long to remember how everything ties together :( )

I'll revisit this after it's rebased on a main where #210 is merged.

Implemented! 🎉 See the 🔔 EDIT (Update on Progress – 11 / 12 / 2023) in the initial PR description.

Cheers.

@simonprovost simonprovost changed the title Improve Search Methods uniqueness (No Evaluated Pipelines Duplicate) 🥈 Search Methods Enhancements to Avoid Duplicate Evaluated Pipelines 🥈 Dec 11, 2023
@simonprovost
Author

BRIEF UPDATE:

I intend to get back to all this from the end of this month onwards. I have an important deadline on the 20th and have had no time to look into these comments, but I definitely will afterwards.

Cheers for your patience!

Simon
