Search Methods Enhancements to Avoid Duplicate Evaluated Pipelines 🥈 #211
Seems fine, though I would probably write it slightly differently. I do think it makes sense to have a global way to easily check which individuals have already been evaluated, so that search algorithms can make use of it out of the box (or decide not to use it). That said, having an explicit check for this in random search while we do not yet have a general plug-and-play mechanism is a step in the right direction that I am okay with. (I do think that perhaps the …) I'll revisit this after it's rebased on a main where #210 is merged.
+ Custom config space now uses ConfigSpace
+ Classification.py is now divided into distinct Classifiers and Preprocessors for better understanding and management
+ Thorough documentation for users to add/modify anything
a46ec4f to bdbd54a
bdbd54a to d7310cd
Implemented! 🎉 See the 🔔 EDIT (Update on Progress – 11 / 12 / 2023) in the initial PR description. Cheers.
BRIEF UPDATE: I intend to resume work on all this from the end of this month onwards. I have an important deadline on the 20th and have had no time to look into this, but I definitely will afterward. Cheers for your patience! Simon
Hi @PGijsbers, @leightonvg,
After spending some time examining the concern raised in thread #189, I would like to propose an initial resolution, which I previously described in that thread as the narrowed approach.
The narrowed approach can be summarised as follows:
Because each search algorithm has its own specifics, I am convinced that the handling of previously evaluated pipelines should be left to the algorithms themselves. While I am unable to provide a concrete example, re-evaluating a pipeline may be useful for newly designed algorithms whose theory builds on it; who knows? Although this is not a compelling reason on its own, approaching the problem narrowly lets each algorithm handle duplicates in the manner it prefers. The handling may differ from algorithm to algorithm: for example, one candidate could be given more prominence than another based on how many times it has been duplicated, which could dynamically shift the algorithm's focus. If I am not mistaken, performing the check within each algorithm leaves more options open for what to do with a duplicate.
Lastly, to avoid confusion for new search algorithm designers, we could have the evaluation pipeline module of GAMA log a warning when a duplicate pipeline is re-evaluated, noting that the provided search algorithm may need refactoring. Handling duplication within each search method would make it easier to decide what to do about it.
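To make the warning idea concrete, here is a minimal sketch. It assumes pipelines can be identified by a hashable key such as their string form; the class name `DuplicateWarningEvaluator`, the logger name, and the `counts` attribute are hypothetical and not part of GAMA's API.

```python
import logging
from collections import Counter
from typing import Callable, Hashable

log = logging.getLogger("duplicate_warning_sketch")  # hypothetical logger name


class DuplicateWarningEvaluator:
    """Wrap an evaluation function and warn when a pipeline key is seen again.

    Illustrative only: this is not GAMA's evaluation module.
    """

    def __init__(self, evaluate: Callable[[Hashable], float]) -> None:
        self._evaluate = evaluate
        # Number of times each pipeline key has been submitted for evaluation.
        self.counts: Counter = Counter()

    def __call__(self, pipeline_key: Hashable) -> float:
        self.counts[pipeline_key] += 1
        if self.counts[pipeline_key] > 1:
            log.warning(
                "Pipeline %r evaluated %d times; the search method may need "
                "refactoring to avoid duplicates.",
                pipeline_key,
                self.counts[pipeline_key],
            )
        # The duplicate is still evaluated; the search method decides what to do.
        return self._evaluate(pipeline_key)
```

A search method could also read `counts` to weight candidates by how often they reappear, which is one way to realise the "duplication number" idea above.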
In the interim, this PR makes random search avoid evaluating duplicate pipelines. Others, such as @leightonvg, or myself if I have the time, could look into other algorithms, such as EA, after this PR is accepted.
@PGijsbers how do you feel about all this?
🔔 EDIT (Update on Progress – 11 / 12 / 2023):
Following @PGijsbers's comment, available here, here is an update on progress.
I couldn't resist diving into the EvaluationLibrary. I've therefore added an out-of-the-box function that immediately tells us whether a candidate is already known (evaluated). This should offer flexibility for search methods; they can use this information as they see fit. Right now, we have a simple approach: if a candidate is known, we try another until we reach a max attempt count. Designers can use that approach as is, or treat the function as a foundation for more complex strategies in custom search methods.
I've implemented this for Random Search, Async EA, and ASHA. I'm confident about Random Search and Async EA. However, I'm slightly less certain about ASHA, so I'd appreciate it if you could give it a once-over to ensure nothing's amiss. As agreed, this PR will stay under the #waiting tag and will follow after #210 is merged.
To make things easier, here's a link focusing only on the recent commits for this update: Link to PR changes
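The simple approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `SeenLibrary`, `is_known`, `record`, and `sample_unseen` are hypothetical names, candidates are assumed to be hashable, and returning `None` once the attempt budget is exhausted is an assumed fallback.

```python
from typing import Callable, Hashable, Optional, Set


class SeenLibrary:
    """Hypothetical stand-in for an evaluation library; not GAMA's EvaluationLibrary."""

    def __init__(self) -> None:
        self._seen: Set[Hashable] = set()

    def is_known(self, candidate: Hashable) -> bool:
        """Tell whether this candidate has already been evaluated."""
        return candidate in self._seen

    def record(self, candidate: Hashable) -> None:
        """Mark a candidate as evaluated."""
        self._seen.add(candidate)


def sample_unseen(
    generate: Callable[[], Hashable],
    library: SeenLibrary,
    max_attempts: int = 10,
) -> Optional[Hashable]:
    """Draw candidates until one is unknown or the attempt budget runs out.

    Returns None when every attempt produced a known candidate (assumed fallback).
    """
    for _ in range(max_attempts):
        candidate = generate()
        if not library.is_known(candidate):
            return candidate
    return None
```

A search loop would call `sample_unseen`, evaluate the returned candidate, and then `record` it; what to do when `None` comes back is left to the search method.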
Thanks, and looking forward to your feedback!
What contributions have been made
Random Search, ASHA, and Async EA have been refactored to prevent pipeline duplication: if a duplicate is identified, they retry up to a maximum number of attempts to produce a new individual.
❌ DISCLAIMER ❌
Kindly refrain from merging this PR prior to #210. I will need to perform a rebase from #210 before this PR can be merged!
Cheers,