-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
124 integrate refactoring project to herb #126
base: master
Are you sure you want to change the base?
124 integrate refactoring project to herb #126
Conversation
…ate parsing operations into an in-memory array of strings.
…refactoring works.
…optimiser function to HerbSearch.
…m:Herb-AI/HerbSearch.jl into 124-integrate-refactoring-project-to-herb Merged in changes of Pallabi.
…m:Herb-AI/HerbSearch.jl into 124-integrate-refactoring-project-to-herb Merging global changes into local.
…ithub.com/Herb-AI/HerbSearch.jl into 124-integrate-refactoring-project-to-herb
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #126 +/- ##
==========================================
+ Coverage 67.35% 76.01% +8.66%
==========================================
Files 21 28 +7
Lines 729 1009 +280
==========================================
+ Hits 491 767 +276
- Misses 238 242 +4 ☔ View full report in Codecov by Sentry. |
…ithub.com/Herb-AI/HerbSearch.jl into 124-integrate-refactoring-project-to-herb
|
new_grammar_rule = rulenode2expr(tree, grammar) | ||
add_rule!(grammar, :($type = $new_grammar_rule)) | ||
|
||
return grammar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be extend_grammar!
as it alters the grammar anyways.
This is very useful function, we should move it to HerbGrammar
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See: Herb-AI/HerbGrammar.jl#94
If that PR is accepted, extend_grammar and its tests can be removed from here, and its usage should be updated to account for it now being a mutating function
Project.toml
Outdated
Logging = "56ddb016-857b-54e1-b83d-db4d58db5568" | ||
MLStyle = "d8e11817-5142-5d16-987a-aa16d5891078" | ||
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c" | ||
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91" | ||
|
||
[compat] | ||
CSV = "0.10.15" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to load CSV and Dataframes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CSV was used for an old implementation, it can be deleted.
Logging = "56ddb016-857b-54e1-b83d-db4d58db5568" | ||
MLStyle = "d8e11817-5142-5d16-987a-aa16d5891078" | ||
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c" | ||
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91" | ||
|
||
[compat] | ||
CSV = "0.10.15" | ||
Clingo_jll = "5.7.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to load Clingo
all the time you want to use HerbSearch
. Is there any way around this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The instantiation time of Clingo_jll seems minimal, as shown in the figure (instantiation time of all packages). If you want the refactoring to be a part of the main functionality (as it is in the PR now) it seems best to me already load the package in, as the pre-compilation probably speeds up the code.
Otherwise, lazy package loading using requires.jl could be a solution (Github).
# Result | ||
- `c_info::Dict{Int64, NamedTuple{(:size, :occurences), <:Tuple{Int64,Int64}}}`: an dict(key: compression_id, value: Tuple(size, # occurences))) | ||
""" | ||
function generate_stats(d, compressed_AST) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Please add types to your header. You have it in the docstring already.
- please rename to sth like
generate_compression_stats
or so. - could you also clean up the code a bit? i.e. give variables proper names so we still understand what they mean in 6 months from now. Also remove all code that is commented out.
# Result | ||
- `Bool`: true if the RuleNodes are equal, false otherwise | ||
""" | ||
function compare(rn₁, rn₂)::Bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This already exists in HerbCore
: https://github.com/Herb-AI/HerbCore.jl/blob/2d5dcc60adf2d7067cefa5a34ac36895c1715acc/src/rulenode.jl#L112
end | ||
|
||
|
||
function select_compressions(case, c, f_best, verbosity=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add types + docstring
# Result | ||
- `subtrees::(Vector{RuleNode},Vector{RuleNode})`: a tuple of a list of all subtrees of the tree and a list of all other subtrees | ||
""" | ||
function enumerate_subtrees_rec(tree::RuleNode, g::AbstractGrammar) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename this to _enumerate_subtrees
.
# Result | ||
- `subtrees::Vector{RuleNode}`: a list of all subtrees of the tree | ||
""" | ||
function enumerate_subtrees(tree::RuleNode, g::AbstractGrammar) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you could make this an iterator instead? So you don't have to hold every possible sub-tree in memory all the time.
Also you can then collect(iterator)
to get the same list.
src/grammar_optimiser/parse_input.jl
Outdated
- `number::String`: the parsed number | ||
- `i::Int64`: the index of the last character parsed | ||
""" | ||
function parse_number(start_index, input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add types.
src/grammar_optimiser/parse_input.jl
Outdated
- `index::Int64`: the index of the last node parsed | ||
- `output::String`: the parsed tree | ||
""" | ||
function parse_tree(input, global_dict=nothing, start_index=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add types.
src/grammar_optimiser/parse_input.jl
Outdated
# Result | ||
- `(output, global_dict)::(String: the parsed string, Dict`: the global dictionary) | ||
""" | ||
function parse_json(json_content) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Please add types.
- could you either distinguish names a little more from
parse_json
in/parse_output.jl
or combine both into one function?
src/grammar_optimiser/parse_input.jl
Outdated
""" | ||
parse_tree(input::String, global_dict::Dict, start_index::Int64) | ||
|
||
Parses a tree from a string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not clear for me what this function does. What kind of tree is parsed here? Shouldn't the output type then be RuleNode
?
Could you also add the return types?
Nice work! Left some comments mainly on types + docstrings. Some general comments:
|
function grammar_optimiser(trees::Vector{RuleNode}, grammar::AbstractGrammar, subtree_selection_strategy::Int, f_best::Float64, verbosity=0:Int) | ||
# 1. Enumerate subtrees | ||
start_time = time() | ||
verbosity > 0 && print("Stage 1: Select subtrees\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type of logging should be done using @debug
. We can remove the verbosity argument in that case.
# Arguments | ||
- `trees::Vector{RuleNode}`: the trees to optimise the grammar for | ||
- `grammar::AbstractGrammar`: the grammar to optimise | ||
- `subtree_selection_strategy::Int`: the strategy to select subtrees, strategy 1 is based on occurrences and strategy 2 is based on size * occurrences |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be best to use an @enum
here
new_grammar = grammar | ||
|
||
for b in best_compressions | ||
add_rule!(new_grammar, b) | ||
end | ||
verbosity > 1 && print("Time for stage 5 : " * string(time() - start_time) * "\n"); start_time = time() | ||
return new_grammar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double-check that this doesn't modify the old grammar. I'm pretty sure both are modified.
for (i, include) in pairs(perm) | ||
for candidate in subtree_candidates | ||
if include | ||
for (j, child_subtree) in pairs(child_subtrees[i]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for (i, include) in pairs(perm) | |
for candidate in subtree_candidates | |
if include | |
for (j, child_subtree) in pairs(child_subtrees[i]) | |
for (i, include) in enumerate(perm) | |
for candidate in subtree_candidates | |
if include | |
for (j, child_subtree) in enumerate(child_subtrees[i]) |
This is the more common approach, if enumerating is what you intended.
""" | ||
function selection_criteria(tree::RuleNode, subtree::AbstractRuleNode) | ||
size = length(subtree) | ||
return size > 1 && size < length(tree) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 < length(subtree) < length(tree)
also works here.
src/grammar_optimiser/parsing_IO.jl
Outdated
# Result | ||
- `json_string::String`: the JSON string | ||
""" | ||
function parse_subtrees_to_json(subtrees::Vector{Any}, tree::RuleNode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
function parse_subtrees_to_json(subtrees::Vector{Any}, tree::RuleNode) | |
function print_subtrees_to_json(subtrees::Vector{Any}, tree::RuleNode) |
src/grammar_optimiser/parsing_IO.jl
Outdated
# Result | ||
- `json_parsed::Dict`: the parsed JSON content | ||
""" | ||
function read_json(json_content) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
function read_json(json_content) | |
function read_last_witness_from_json(json_content) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renaming looks good 👍
Integrating refactoring project into herb
Testing
Some testing still needs to be finished