Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

124 integrate refactoring project to herb #126

Open
wants to merge 32 commits into
base: master
Choose a base branch
from

Conversation

janvandermeulen
Copy link
Member

@janvandermeulen janvandermeulen commented Nov 15, 2024

Integrating refactoring project into herb

  • Added pipeline into grammar optimizer .
  • Refactored pipeline to not use intermediate files.
  • Imported clingo binaries into Julia such that it no longer needs a local installation.
  • Added documentation.
  • Cleaned up imports and general code quality improvements.

Testing

Some testing still needs to be finished

  • test_enumerate_subtrees.jl
  • test_analyse_compressions.jl
  • test_extend_grammar.jl
  • test_parse_input.jl
  • test_grammar_optimiser.jl
  • test_parse_output.jl
  • test_parse_subtrees_to_json.jl
  • Merge all tests inuto the main pipeline

pujiii and others added 23 commits November 14, 2024 11:04
…ate parsing operations into an in-memory array of strings.
…m:Herb-AI/HerbSearch.jl into 124-integrate-refactoring-project-to-herb

Merged in changes of Pallabi.
…m:Herb-AI/HerbSearch.jl into 124-integrate-refactoring-project-to-herb

Merging global changes into local.
Copy link

codecov bot commented Nov 15, 2024

Codecov Report

Attention: Patch coverage is 98.57143% with 4 lines in your changes missing coverage. Please review.

Project coverage is 76.01%. Comparing base (b70e247) to head (23f717d).

Files with missing lines Patch % Lines
src/grammar_optimiser/extend_grammar.jl 90.47% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #126      +/-   ##
==========================================
+ Coverage   67.35%   76.01%   +8.66%     
==========================================
  Files          21       28       +7     
  Lines         729     1009     +280     
==========================================
+ Hits          491      767     +276     
- Misses        238      242       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@janvandermeulen janvandermeulen marked this pull request as ready for review November 15, 2024 14:14
@THinnerichs
Copy link
Member

getting_started.jl should be refactored to a proper tutorial.

new_grammar_rule = rulenode2expr(tree, grammar)
add_rule!(grammar, :($type = $new_grammar_rule))

return grammar
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be extend_grammar! as it alters the grammar anyways.
This is very useful function, we should move it to HerbGrammar.

Copy link
Member

@ViciousDoormat ViciousDoormat Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See: Herb-AI/HerbGrammar.jl#94
If that PR is accepted, extend_grammar and its tests can be removed from here, and its usage should be updated to account for it now being a mutating function

Project.toml Outdated
Logging = "56ddb016-857b-54e1-b83d-db4d58db5568"
MLStyle = "d8e11817-5142-5d16-987a-aa16d5891078"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"

[compat]
CSV = "0.10.15"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to load CSV and Dataframes?

Copy link
Member Author

@janvandermeulen janvandermeulen Nov 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CSV was used for an old implementation, it can be deleted.

Logging = "56ddb016-857b-54e1-b83d-db4d58db5568"
MLStyle = "d8e11817-5142-5d16-987a-aa16d5891078"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"

[compat]
CSV = "0.10.15"
Clingo_jll = "5.7.1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to load Clingo all the time you want to use HerbSearch. Is there any way around this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The instantiation time of Clingo_jll seems minimal, as shown in the figure (instantiation time of all packages). If you want the refactoring to be a part of the main functionality (as it is in the PR now) it seems best to me already load the package in, as the pre-compilation probably speeds up the code.
image
Otherwise, lazy package loading using requires.jl could be a solution (Github).

# Result
- `c_info::Dict{Int64, NamedTuple{(:size, :occurences), <:Tuple{Int64,Int64}}}`: an dict(key: compression_id, value: Tuple(size, # occurences)))
"""
function generate_stats(d, compressed_AST)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Please add types to your header. You have it in the docstring already.
  • please rename to sth like generate_compression_stats or so.
  • could you also clean up the code a bit? i.e. give variables proper names so we still understand what they mean in 6 months from now. Also remove all code that is commented out.

# Result
- `Bool`: true if the RuleNodes are equal, false otherwise
"""
function compare(rn₁, rn₂)::Bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

end


function select_compressions(case, c, f_best, verbosity=0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add types + docstring

# Result
- `subtrees::(Vector{RuleNode},Vector{RuleNode})`: a tuple of a list of all subtrees of the tree and a list of all other subtrees
"""
function enumerate_subtrees_rec(tree::RuleNode, g::AbstractGrammar)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename this to _enumerate_subtrees.

# Result
- `subtrees::Vector{RuleNode}`: a list of all subtrees of the tree
"""
function enumerate_subtrees(tree::RuleNode, g::AbstractGrammar)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you could make this an iterator instead? So you don't have to hold every possible sub-tree in memory all the time.
Also you can then collect(iterator) to get the same list.

src/grammar_optimiser/extend_grammar.jl Outdated Show resolved Hide resolved
- `number::String`: the parsed number
- `i::Int64`: the index of the last character parsed
"""
function parse_number(start_index, input)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add types.

- `index::Int64`: the index of the last node parsed
- `output::String`: the parsed tree
"""
function parse_tree(input, global_dict=nothing, start_index=0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add types.

# Result
- `(output, global_dict)::(String: the parsed string, Dict`: the global dictionary)
"""
function parse_json(json_content)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Please add types.
  • could you either distinguish names a little more from parse_json in /parse_output.jl or combine both into one function?

"""
parse_tree(input::String, global_dict::Dict, start_index::Int64)

Parses a tree from a string.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear for me what this function does. What kind of tree is parsed here? Shouldn't the output type then be RuleNode?

Could you also add the return types?

@THinnerichs
Copy link
Member

Nice work! Left some comments mainly on types + docstrings. Some general comments:

  • could you rename optimiser -> optimizer? We converged to American English (file names, functions, ... ) Sorry for the hassle.
  • Please add parameter + (if possible) return types to your function headers. Dispatch is the guiding principle of Julia and its simply amazing. But we need types for that. :)
  • to define a return type, you can something like:
    function _rulenode_compare(rn₁::AbstractRuleNode, rn₂::AbstractRuleNode)::Int
    Which returns an Int.
  • When writing types, Int is sufficient, no need to make it an explicit Int64.

function grammar_optimiser(trees::Vector{RuleNode}, grammar::AbstractGrammar, subtree_selection_strategy::Int, f_best::Float64, verbosity=0:Int)
# 1. Enumerate subtrees
start_time = time()
verbosity > 0 && print("Stage 1: Select subtrees\n")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type of logging should be done using @debug. We can remove the verbosity argument in that case.

# Arguments
- `trees::Vector{RuleNode}`: the trees to optimise the grammar for
- `grammar::AbstractGrammar`: the grammar to optimise
- `subtree_selection_strategy::Int`: the strategy to select subtrees, strategy 1 is based on occurrences and strategy 2 is based on size * occurrences
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be best to use an @enum here

Comment on lines 85 to 91
new_grammar = grammar

for b in best_compressions
add_rule!(new_grammar, b)
end
verbosity > 1 && print("Time for stage 5 : " * string(time() - start_time) * "\n"); start_time = time()
return new_grammar
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double-check that this doesn't modify the old grammar. I'm pretty sure both are modified.

Comment on lines 46 to 49
for (i, include) in pairs(perm)
for candidate in subtree_candidates
if include
for (j, child_subtree) in pairs(child_subtrees[i])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (i, include) in pairs(perm)
for candidate in subtree_candidates
if include
for (j, child_subtree) in pairs(child_subtrees[i])
for (i, include) in enumerate(perm)
for candidate in subtree_candidates
if include
for (j, child_subtree) in enumerate(child_subtrees[i])

This is the more common approach, if enumerating is what you intended.

"""
function selection_criteria(tree::RuleNode, subtree::AbstractRuleNode)
size = length(subtree)
return size > 1 && size < length(tree)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 < length(subtree) < length(tree) also works here.

# Result
- `json_string::String`: the JSON string
"""
function parse_subtrees_to_json(subtrees::Vector{Any}, tree::RuleNode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
function parse_subtrees_to_json(subtrees::Vector{Any}, tree::RuleNode)
function print_subtrees_to_json(subtrees::Vector{Any}, tree::RuleNode)

# Result
- `json_parsed::Dict`: the parsed JSON content
"""
function read_json(json_content)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
function read_json(json_content)
function read_last_witness_from_json(json_content)

Copy link
Member

@ReubenJ ReubenJ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renaming looks good 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Integrate refactoring project to Herb
5 participants