124 integrate refactoring project to herb #126

janvandermeulen · 2024-11-15T13:24:43Z

Integrating refactoring project into herb

Added pipeline into grammar optimizer.
Refactored pipeline to not use intermediate files.
Imported clingo binaries into Julia so that a local installation is no longer needed.
Added documentation.
Cleaned up imports and general code quality improvements.

Testing

Some testing still needs to be finished

codecov · 2024-11-15T13:27:02Z

Codecov Report

Attention: Patch coverage is 98.57143% with 4 lines in your changes missing coverage. Please review.

Project coverage is 76.01%. Comparing base (b70e247) to head (23f717d).

Files with missing lines	Patch %	Lines
src/grammar_optimiser/extend_grammar.jl	90.47%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #126      +/-   ##
==========================================
+ Coverage   67.35%   76.01%   +8.66%     
==========================================
  Files          21       28       +7     
  Lines         729     1009     +280     
==========================================
+ Hits          491      767     +276     
- Misses        238      242       +4

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

THinnerichs · 2024-11-15T14:18:33Z

getting_started.jl should be refactored to a proper tutorial.

THinnerichs · 2024-11-15T14:23:55Z

src/grammar_optimiser/extend_grammar.jl

+    new_grammar_rule = rulenode2expr(tree, grammar)
+    add_rule!(grammar, :($type = $new_grammar_rule))
+
+    return grammar


This should be extend_grammar!, as it alters the grammar anyway.
This is a very useful function; we should move it to HerbGrammar.

See: Herb-AI/HerbGrammar.jl#94
If that PR is accepted, extend_grammar and its tests can be removed from here, and its usage should be updated to account for it now being a mutating function.
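
As an illustration of the naming convention discussed here, a minimal sketch with a toy grammar type. `ToyGrammar` and both functions below are hypothetical stand-ins and not the HerbGrammar API; the point is only that a trailing `!` signals in-place modification, while a non-mutating variant works on a copy.

```julia
# Hypothetical stand-in for a grammar; not the HerbGrammar type.
struct ToyGrammar
    rules::Vector{Expr}
end

# Mutating variant: the trailing `!` tells callers that `grammar` is changed in place.
function extend_grammar!(grammar::ToyGrammar, rule::Expr)::ToyGrammar
    push!(grammar.rules, rule)
    return grammar
end

# Non-mutating variant: works on a copy and leaves the input untouched.
function extend_grammar(grammar::ToyGrammar, rule::Expr)::ToyGrammar
    return extend_grammar!(ToyGrammar(copy(grammar.rules)), rule)
end

g = ToyGrammar([:(Int = 1)])
g2 = extend_grammar(g, :(Int = Int + Int))   # g still has 1 rule, g2 has 2
extend_grammar!(g, :(Int = Int * Int))       # g now has 2 rules
```

Callers that relied on the returned grammar being a fresh object would need the copying variant, or an explicit copy before the call.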

THinnerichs · 2024-11-15T14:24:30Z

Project.toml

 Logging = "56ddb016-857b-54e1-b83d-db4d58db5568"
 MLStyle = "d8e11817-5142-5d16-987a-aa16d5891078"
 Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
 StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"

 [compat]
+CSV = "0.10.15"


Why do you need to load CSV and DataFrames?

CSV was used for an old implementation; it can be deleted.

THinnerichs · 2024-11-15T14:25:24Z

Project.toml

 Logging = "56ddb016-857b-54e1-b83d-db4d58db5568"
 MLStyle = "d8e11817-5142-5d16-987a-aa16d5891078"
 Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
 StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"

 [compat]
+CSV = "0.10.15"
+Clingo_jll = "5.7.1"


This means Clingo is loaded every time you want to use HerbSearch. Is there any way around this?

The instantiation time of Clingo_jll seems minimal, as shown in the figure (instantiation time of all packages). If you want the refactoring to be part of the main functionality (as it is in the PR now), it seems best to me to load the package up front, as the pre-compilation probably speeds up the code.

Otherwise, lazy package loading using Requires.jl could be a solution (GitHub).

THinnerichs · 2024-11-20T14:13:22Z

src/grammar_optimiser/analyze_compressions.jl

+# Result 
+- `c_info::Dict{Int64, NamedTuple{(:size, :occurences), <:Tuple{Int64,Int64}}}`: an dict(key: compression_id, value: Tuple(size, # occurences))) 
+"""
+function generate_stats(d, compressed_AST)


Please add types to your header. You have them in the docstring already.

Please rename this to something like generate_compression_stats.

Could you also clean up the code a bit, i.e. give variables proper names so we still understand what they mean six months from now? Also remove all code that is commented out.

THinnerichs · 2024-11-20T14:16:56Z

src/grammar_optimiser/analyze_compressions.jl

+# Result
+- `Bool`: true if the RuleNodes are equal, false otherwise
+"""
+function compare(rn₁, rn₂)::Bool  


This already exists in HerbCore: https://github.com/Herb-AI/HerbCore.jl/blob/2d5dcc60adf2d7067cefa5a34ac36895c1715acc/src/rulenode.jl#L112

THinnerichs · 2024-11-20T14:18:54Z

src/grammar_optimiser/analyze_compressions.jl

+end
+
+
+function select_compressions(case, c, f_best, verbosity=0)


Please add types and a docstring.

THinnerichs · 2024-11-20T14:27:37Z

src/grammar_optimiser/enumerate_subtrees.jl

+# Result
+- `subtrees::(Vector{RuleNode},Vector{RuleNode})`: a tuple of a list of all subtrees of the tree and a list of all other subtrees
+"""
+function enumerate_subtrees_rec(tree::RuleNode, g::AbstractGrammar)


Maybe rename this to _enumerate_subtrees.

THinnerichs · 2024-11-20T14:28:39Z

src/grammar_optimiser/enumerate_subtrees.jl

+# Result
+- `subtrees::Vector{RuleNode}`: a list of all subtrees of the tree
+"""
+function enumerate_subtrees(tree::RuleNode, g::AbstractGrammar)


Maybe you could make this an iterator instead? Then you don't have to hold every possible subtree in memory at once.
You can still call collect(iterator) to get the same list.
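
A minimal sketch of the iterator idea, under simplifying assumptions: `ToyNode` is a hypothetical stand-in for RuleNode, and only node-rooted subtrees are produced, not the full set of candidate subtrees that the PR's enumerate_subtrees builds. A `Channel` hands out one subtree at a time, so the consumer never needs the whole list unless it asks for it.

```julia
# Hypothetical minimal tree type standing in for RuleNode.
struct ToyNode
    ind::Int
    children::Vector{ToyNode}
end
ToyNode(ind::Int) = ToyNode(ind, ToyNode[])

# Lazily yields every node-rooted subtree in pre-order through a Channel,
# so only one subtree is handed to the consumer at a time.
function each_subtree(tree::ToyNode)
    return Channel{ToyNode}() do channel
        stack = [tree]
        while !isempty(stack)
            node = pop!(stack)
            put!(channel, node)
            # Push children in reverse so the leftmost child is visited first.
            append!(stack, reverse(node.children))
        end
    end
end

tree = ToyNode(1, [ToyNode(2), ToyNode(3, [ToyNode(4)])])
for subtree in each_subtree(tree)
    println(subtree.ind)
end

# Materialise the list only when it is actually needed.
all_subtrees = collect(each_subtree(tree))
```

A custom type implementing `Base.iterate` would avoid the task-switching overhead of a `Channel`; the `Channel` version is used here only because it is shorter.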

src/grammar_optimiser/extend_grammar.jl

THinnerichs · 2024-11-20T14:40:39Z

src/grammar_optimiser/parse_input.jl

+- `number::String`: the parsed number
+- `i::Int64`: the index of the last character parsed
+"""
+function parse_number(start_index, input)


Please add types.

THinnerichs · 2024-11-20T14:40:49Z

src/grammar_optimiser/parse_input.jl

+- `index::Int64`: the index of the last node parsed
+- `output::String`: the parsed tree
+""" 
+function parse_tree(input, global_dict=nothing, start_index=0)


Please add types.

THinnerichs · 2024-11-20T14:44:49Z

src/grammar_optimiser/parse_input.jl

+# Result
+- `(output, global_dict)::(String: the parsed string, Dict`: the global dictionary)
+"""
+function parse_json(json_content)


Please add types.

Could you either make the name more distinct from parse_json in /parse_output.jl, or combine both into one function?

THinnerichs · 2024-11-20T14:44:58Z

src/grammar_optimiser/parse_input.jl

+"""
+    parse_tree(input::String, global_dict::Dict, start_index::Int64)
+
+Parses a tree from a string.


It is not clear to me what this function does. What kind of tree is parsed here? Shouldn't the output type then be RuleNode?

Could you also add the return types?

THinnerichs · 2024-11-20T14:48:16Z

Nice work! Left some comments mainly on types + docstrings. Some general comments:

Could you rename optimiser -> optimizer? We converged on American English (file names, functions, ...). Sorry for the hassle.
Please add parameter types and, where possible, return types to your function headers. Dispatch is the guiding principle of Julia and it's simply amazing, but we need types for that. :)
To define a return type, you can write something like:
function _rulenode_compare(rn₁::AbstractRuleNode, rn₂::AbstractRuleNode)::Int
which returns an Int.
When writing types, Int is sufficient; there is no need to make it an explicit Int64.
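
A small sketch of the signature style requested above. The function name `count_occurrences` is hypothetical and not part of the PR; it only shows parameter types, a declared return type, and how dispatch selects a method from the argument types.

```julia
# Hypothetical helper; only the signature style is the point.
# `Int` is an alias for the platform's native integer type (Int64 on 64-bit systems).
function count_occurrences(values::Vector{Int}, target::Int)::Int
    return count(==(target), values)
end

# A second method for strings: Julia picks the method from the argument types.
function count_occurrences(text::AbstractString, target::Char)::Int
    return count(==(target), text)
end

in_vector = count_occurrences([1, 2, 2, 3], 2)   # 2
in_string = count_occurrences("grammar", 'm')    # 2
```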

ReubenJ · 2024-12-11T16:36:46Z

src/grammar_optimiser/grammar_optimiser.jl

+function grammar_optimiser(trees::Vector{RuleNode}, grammar::AbstractGrammar, subtree_selection_strategy::Int, f_best::Float64, verbosity=0:Int)
+    # 1. Enumerate subtrees 
+    start_time = time()
+    verbosity > 0 && print("Stage 1: Select subtrees\n")     


This type of logging should be done using @debug. We can remove the verbosity argument in that case.
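
A minimal sketch of what this could look like, using only the `Logging` standard library. The function `run_stage` is a hypothetical placeholder for one pipeline stage, not code from the PR.

```julia
using Logging

# Hypothetical placeholder for one pipeline stage; returns the elapsed time in seconds.
function run_stage(name::AbstractString)::Float64
    start_time = time()
    @debug "Starting stage" name
    # ... the actual stage work would go here ...
    elapsed = time() - start_time
    @debug "Finished stage" name elapsed
    return elapsed
end

# Silent by default: debug messages are below the default minimum log level.
run_stage("Select subtrees")

# Opt in to debug output for one call instead of threading a verbosity argument through.
with_logger(ConsoleLogger(stderr, Logging.Debug)) do
    run_stage("Select subtrees")
end
```

Debug output can also be switched on per module through the `JULIA_DEBUG` environment variable, which keeps the function signatures free of logging concerns.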

ReubenJ · 2024-12-11T16:38:38Z

src/grammar_optimiser/grammar_optimiser.jl

+# Arguments
+- `trees::Vector{RuleNode}`: the trees to optimise the grammar for
+- `grammar::AbstractGrammar`: the grammar to optimise
+- `subtree_selection_strategy::Int`: the strategy to select subtrees, strategy 1 is based on occurrences and strategy 2 is based on size * occurrences


I think it'd be best to use an @enum here
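
A sketch of the `@enum` approach. The enum and value names are hypothetical; they map to the strategies 1 and 2 described in the docstring above, and `compression_score` is an illustrative helper rather than a function from the PR.

```julia
# Hypothetical names for the two strategies from the docstring.
@enum SubtreeSelectionStrategy begin
    by_occurrences = 1             # rank by number of occurrences
    by_size_times_occurrences = 2  # rank by size * occurrences
end

# Illustrative helper: score one compression under the chosen strategy.
function compression_score(strategy::SubtreeSelectionStrategy, size::Int, occurrences::Int)::Int
    if strategy == by_occurrences
        return occurrences
    else
        return size * occurrences
    end
end

score_a = compression_score(by_occurrences, 3, 4)             # 4
score_b = compression_score(by_size_times_occurrences, 3, 4)  # 12
```

Compared with a bare `Int`, an invalid strategy such as 3 is rejected when the call is made rather than silently falling through a branch.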

ReubenJ · 2024-12-11T16:40:11Z

src/grammar_optimiser/grammar_optimiser.jl

+    new_grammar = grammar
+
+    for b in best_compressions
+        add_rule!(new_grammar, b)
+    end
+    verbosity > 1 && print("Time for stage 5 : " * string(time() - start_time) * "\n"); start_time = time()
+    return new_grammar


Double-check that this doesn't modify the old grammar. I'm pretty sure both are modified.
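
The aliasing concern can be reproduced with a toy type. `DemoGrammar` is a hypothetical stand-in; the behaviour shown is plain Julia assignment semantics and does not depend on the real grammar type.

```julia
# Hypothetical stand-in for a grammar that holds a mutable rule list.
struct DemoGrammar
    rules::Vector{Symbol}
end

original = DemoGrammar([:a, :b])

alias = original                  # same object, just a second name
push!(alias.rules, :c)
length(original.rules)            # 3: the "old" grammar changed too

independent = deepcopy(original)  # separate object with its own rule vector
push!(independent.rules, :d)
length(original.rules)            # still 3
```

So `new_grammar = grammar` followed by `add_rule!(new_grammar, b)` would modify `grammar` as well; either copy first or make the mutation explicit in the function name.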

ReubenJ · 2024-12-12T10:33:51Z

src/grammar_optimiser/enumerate_subtrees.jl

+        for (i, include) in pairs(perm)
+            for candidate in subtree_candidates
+                if include
+                    for (j, child_subtree) in pairs(child_subtrees[i])


Suggested change

-        for (i, include) in pairs(perm)
-            for candidate in subtree_candidates
-                if include
-                    for (j, child_subtree) in pairs(child_subtrees[i])
+        for (i, include) in enumerate(perm)
+            for candidate in subtree_candidates
+                if include
+                    for (j, child_subtree) in enumerate(child_subtrees[i])

This is the more common approach, if enumerating is what you intended.
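
For reference, a short sketch of how the two differ. The variable names are illustrative only. On a plain `Vector` both give the same 1-based indices; they diverge once the keys are not `1:n`.

```julia
perm = [true, false, true]

# On a plain Vector, pairs and enumerate yield the same indices.
kept_via_pairs = [i for (i, keep) in pairs(perm) if keep]          # [1, 3]
kept_via_enumerate = [i for (i, keep) in enumerate(perm) if keep]  # [1, 3]

# For a Dict, pairs yields the actual keys while enumerate yields a running counter.
child_counts = Dict(:left => 2, :right => 5)
keys_via_pairs = sort([key for (key, _) in pairs(child_counts)])        # [:left, :right]
counters_via_enumerate = [i for (i, _) in enumerate(child_counts)]      # [1, 2]
```

If the loop variable is later used to index back into the collection, `pairs` is the safer choice for non-standard indexing; if only a counter is needed, `enumerate` states that intent directly.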

ReubenJ · 2024-12-12T10:40:59Z

src/grammar_optimiser/enumerate_subtrees.jl

+"""
+function selection_criteria(tree::RuleNode, subtree::AbstractRuleNode)
+    size = length(subtree)
+    return size > 1 && size < length(tree)


1 < length(subtree) < length(tree) also works here.
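
A tiny sketch of the chained form, using a hypothetical helper that takes plain sizes instead of RuleNodes so that it runs on its own:

```julia
# Hypothetical helper: true when the subtree is larger than a single node
# and strictly smaller than the full tree.
is_proper_subtree_size(subtree_size::Int, tree_size::Int)::Bool = 1 < subtree_size < tree_size

is_proper_subtree_size(3, 5)  # true
is_proper_subtree_size(1, 5)  # false: a single node
is_proper_subtree_size(5, 5)  # false: the whole tree
```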

ReubenJ · 2024-12-12T11:12:48Z

src/grammar_optimiser/parsing_IO.jl

+# Result
+- `json_string::String`: the JSON string
+"""
+function parse_subtrees_to_json(subtrees::Vector{Any}, tree::RuleNode)


Suggested change

-function parse_subtrees_to_json(subtrees::Vector{Any}, tree::RuleNode)
+function print_subtrees_to_json(subtrees::Vector{Any}, tree::RuleNode)

ReubenJ · 2024-12-12T11:13:10Z

src/grammar_optimiser/parsing_IO.jl

+# Result
+- `json_parsed::Dict`: the parsed JSON content
+"""
+function read_json(json_content)


Suggested change

-function read_json(json_content)
+function read_last_witness_from_json(json_content)

ReubenJ

Renaming looks good 👍

pujiii and others added 23 commits November 14, 2024 11:04

added grammar_optimiser folder

9679b7d

uses binary instead of local clingo installation

e3d90ca

added to project; removed unnecessary "using HerbSearch"s

97d47c6

Refactored pipeline to not use temporary files but safe all intermedi…

375a4b8

…ate parsing operations into an in-memory array of strings.

Added a file which can be used for a future tutorial to show how the …

fb4da5a

…refactoring works.

Small refactoring to add verbosity settings and exposing the grammar_…

854ce60

…optimiser function to HerbSearch.

Update documentation 1.

6ce4108

Improve documentation and code quality improvements.

d48ccc2

Quality of life changes, like documentation and removing unnecessary …

ced87e0

…imports etc...

Added initial testing.

b42137e

test parse subtrees to json

9f37490

Wrote tests for enumerate_subtrees.jl class.

4b5e561

Merge branch '124-integrate-refactoring-project-to-herb' of github.co…

aeeab0a

…m:Herb-AI/HerbSearch.jl into 124-integrate-refactoring-project-to-herb Merged in changes of Pallabi.

integration tests grammar optimiser

31e1cec

cleaned up test code

cad5096

Wrote tests for extend_grammar.jl.

0b1318f

REF: remove unnecessary assign

7af81eb

DOC: Remove unnecessary documentation

6978b97

Wrote tests for parse_input.jl.

88e1daa

Merge branch '124-integrate-refactoring-project-to-herb' of github.co…

0af758a

…m:Herb-AI/HerbSearch.jl into 124-integrate-refactoring-project-to-herb Merging global changes into local.

tests for compare

a18a00a

Merge branch '124-integrate-refactoring-project-to-herb' of https://g…

7cc7ee6

…ithub.com/Herb-AI/HerbSearch.jl into 124-integrate-refactoring-project-to-herb

test analyse compressions

efaeb1e

janvandermeulen requested a review from THinnerichs November 15, 2024 13:24

janvandermeulen linked an issue Nov 15, 2024 that may be closed by this pull request

Integrate refactoring project to Herb #124

Open

THinnerichs mentioned this pull request Nov 15, 2024

Integrate refactoring project to Herb #124

Open

pujiii and others added 3 commits November 15, 2024 14:30

all tests analyze compressions EXCEPT generate_stats

d642691

test generate stats

4dab962

Merge branch '124-integrate-refactoring-project-to-herb' of https://g…

b9f959f

…ithub.com/Herb-AI/HerbSearch.jl into 124-integrate-refactoring-project-to-herb

janvandermeulen and others added 2 commits November 15, 2024 14:55

Added all tests to main testing pipeline.

23f717d

TEST: Added tests to generate tree from compression

b9afeff

janvandermeulen marked this pull request as ready for review November 15, 2024 14:14

THinnerichs reviewed Nov 15, 2024

ViciousDoormat mentioned this pull request Nov 15, 2024

FEAT: Add extend grammar Herb-AI/HerbGrammar.jl#94

Merged

THinnerichs reviewed Nov 20, 2024

ViciousDoormat and others added 2 commits November 25, 2024 15:41

REFACTOR: Replace new_grammar with the new add_rule! function in Herb…

6c7c611

…Grammar

Refactor project and tests

df50101

THinnerichs assigned ViciousDoormat Dec 11, 2024

ReubenJ self-requested a review December 11, 2024 15:16

ReubenJ requested changes Dec 12, 2024

THinnerichs added 2 commits December 12, 2024 13:14

Refactor names, remove getting started towards tutorial

28f24f5

Rewrite combinations

c8b4ee9

ReubenJ reviewed Dec 12, 2024
