Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Turn off identify_connected_components default #760

Merged
merged 9 commits into from
Sep 7, 2023

Conversation

daico007
Copy link
Member

Per discussion stemmed from #757, this PR turned off the default identify_connected_components option in the parameterize.apply.

@codecov
Copy link

codecov bot commented Aug 29, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.01% 🎉

Comparison is base (0011e15) 92.06% compared to head (75b0a11) 92.07%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #760      +/-   ##
==========================================
+ Coverage   92.06%   92.07%   +0.01%     
==========================================
  Files          67       67              
  Lines        6892     6890       -2     
==========================================
- Hits         6345     6344       -1     
+ Misses        547      546       -1     
Files Changed Coverage Δ
gmso/utils/specific_ff_to_residue.py 89.66% <ø> (ø)
gmso/parameterization/parameterize.py 100.00% <100.00%> (ø)
gmso/parameterization/topology_parameterizer.py 97.67% <100.00%> (-0.04%) ⬇️

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@CalCraven CalCraven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per this discussion, a few other changes may as well be made here:

    identify_connected_components: bool, optional, default=False
        A flag which will speed up atomtyping for systems with many repeated molecules in
        the system. Specifically useful if `top.unique_site_labels()` isn't properly identifying 
        repeated. However, if the molecules are identified, then it is recommended to use
        use_molecule_info.
        molecules.

Also, I would set use_molecule_info to True as default, since that shouldn't add any overhead, since it's just checking the molecule tags. The use_molecule_info text should also be changed. Maybe something like:
A flag to determine whether or not to look at site.molcule_name to look for repeated molecules, and copy the parameters out to the remaining molecules.

I was also thinking we could rename these parameters. There are three potential speedup methods:
identify_connected_components=True
use_molecule_info=True
fast_copy=True

What if we named them something something more cohesive so it was clear if things are running slow for you, you should try to use of of these flags?
Something:
speedup_idmolecules
speedup_molecule_tag
speedup_copyexpressions

While we're here, what do we think about reformatting assert_x_parameters as well. What if we condensed it to assert_parameter_connections, and you pass a dict instead of 4 different bools? Might reduce how lengthy this function docstring is.

@chrisjonesBSU
Copy link
Contributor

I like the idea of using speedup in the parameter names to help convey the idea of what these parameters are doing.

@daico007
Copy link
Member Author

Good idea! I think we can go with speedup_by_moltag and speedup_by_molgraph? (short for molecule tag and molecule graph, i.e., connected components)?

@daico007
Copy link
Member Author

Regarding the assert_x_parameters, I agree that we should to put them in one place/parameter. But I am thinking we could reverse it a little bit, instead of the user assert a set of parameters exist, we could turn it into ignore_pararameters, which could take a list/tuple of ('bond', 'angle', 'dihedral', 'improper'). So in the default case, the ignore_parameters will just be ('improper').

@daico007
Copy link
Member Author

I included the proposed changes in the latest commit, lmk what y'all think.

Copy link
Contributor

@chrisjonesBSU chrisjonesBSU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple comments for some small changes.

gmso/parameterization/parameterize.py Show resolved Hide resolved
gmso/parameterization/parameterize.py Outdated Show resolved Hide resolved
Copy link
Contributor

@CalCraven CalCraven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once those few changes are in, this looks good to me.

gmso/parameterization/parameterize.py Outdated Show resolved Hide resolved
gmso/parameterization/parameterize.py Show resolved Hide resolved
gmso/parameterization/parameterize.py Show resolved Hide resolved
gmso/parameterization/parameterize.py Show resolved Hide resolved
@daico007
Copy link
Member Author

daico007 commented Sep 5, 2023

I should have addressed all the comments by @chrisjonesBSU and @CalCraven in the last commit.

Copy link
Contributor

@chrisjonesBSU chrisjonesBSU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I made one commit to change True to False in the doc string for speedup_by_molgraph

@chrisjonesBSU chrisjonesBSU linked an issue Sep 5, 2023 that may be closed by this pull request
@CalCraven
Copy link
Contributor

Few more minor doc string changes and this is good to merge.

@daico007 daico007 merged commit 6230bd9 into mosdef-hub:main Sep 7, 2023
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Performance significantly affected by "flatness" of the compounds
3 participants