Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change args to user_info and fix docstring #253

Closed
wants to merge 4 commits into from

Conversation

yucongalicechen
Copy link
Contributor

@yucongalicechen yucongalicechen commented Dec 19, 2024

closes #245, closes #244
@sbillinge ready for review

Copy link

codecov bot commented Dec 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (641a107) to head (4cfd13e).
Report is 58 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #253   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            8         8           
  Lines          380       379    -1     
=========================================
- Hits           380       379    -1     
Files with missing lines Coverage Δ
tests/test_tools.py 100.00% <100.00%> (ø)

@sbillinge
Copy link
Contributor

I am still struggling a bit with this functionality. I think maybe I got to the botto of why this is the case (though maybe not!). I think the function is trying to do two things which is making me confused. The two things are (a) getting the config information from config files and (b) a workflow to create/maintain the config file. I wonder if this all gets a bit easier if we separate these behaviorws, for example, with a conditional, but this taking place outside the get_info function? Maybe we could sketch out this flow in a diagram?

@yucongalicechen
Copy link
Contributor Author

I am still struggling a bit with this functionality. I think maybe I got to the botto of why this is the case (though maybe not!). I think the function is trying to do two things which is making me confused. The two things are (a) getting the config information from config files and (b) a workflow to create/maintain the config file. I wonder if this all gets a bit easier if we separate these behaviorws, for example, with a conditional, but this taking place outside the get_info function? Maybe we could sketch out this flow in a diagram?

Yes, I agree we can make it a bit more readable! I drew a diagram on this. I am thinking that we can have one private function for each action in the diagram?

workflow

@yucongalicechen
Copy link
Contributor Author

@sbillinge Since we're redoing the workflow anyways, I'm adding the skip_config_creation for #244 here too. Here's a new diagram I'm thinking about:

workflow2

@yucongalicechen
Copy link
Contributor Author

^push an intermediate commit in case anyone would like to comment, I will likely reformat some functions and make the comments clearer

@yucongalicechen
Copy link
Contributor Author

I edited the codes again and it's ready for some feedback now @sbillinge


def clean_dict(obj):

def _clean_dict(obj):
Copy link
Contributor Author

@yucongalicechen yucongalicechen Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made this function private

@@ -71,67 +80,64 @@ def load_config(file_path):
def _sorted_merge(*dicts):
merged = {}
for d in dicts:
d = _clean_dict(d)
Copy link
Contributor Author

@yucongalicechen yucongalicechen Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a step here so we don't need to pass in _clean_dict(d) for every d

f"[{user_info.get('username', '')}]: "
).strip() or user_info.get("username", "")
email = input(f"Please enter the your email " f"[{user_info.get('email', '')}]: ").strip() or user_info.get(
"email", ""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name change from args to user_info

)
return return_bool
config = {"username": _stringify(username), "email": _stringify(email)}
if username and email:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function originally returns a bool that passes back to get_user_info to see if we need to create a global config file. If user only input none or one of username / email, an empty global config file will be created. I simplified the logic here so that we only create a global config file if both username and email are present. Does this make sense though?

expected_username, expected_email = expected
config = get_user_info(args)
def _run_tests(cli_inputs, expected):
user_info = {"username": cli_inputs["cli_username"], "email": cli_inputs["cli_email"]}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"cli_inputs" refer to function arguments passed to get_user_info. Not sure if this is the best name here but I want to distinguish them from user inputs via prompts ("inputs").

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to still have a helper function called _run_tests within each test func?

The name run_tests to me is quite ambiguous - what test is it running? comparing username and email?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops missed this comment. Yeah I think we can keep this test but maybe have a better name

confile = Path().home() / "diffpyconfig.json"
assert confile.is_file()
assert confile.is_file() == expected["config_file_exists"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

global config file should only be created when both username and email are given

inp_iter = iter(inputsb)
monkeypatch.setattr("builtins.input", lambda _: next(inp_iter))
_run_tests(inputsa, expected)
_run_tests(cli_inputs, expected)
confile = Path().home() / "diffpyconfig.json"
assert confile.exists() is False
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config file is never created if user only provides function arguments

user_info = {"username": cli_inputs["cli_username"], "email": cli_inputs["cli_email"]}
config = get_user_info(user_info=user_info, skip_config_creation=cli_inputs["skip_config_creation"])
expected_username = expected["expected_username"]
expected_email = expected["expected_email"]
assert config.get("username") == expected_username
assert config.get("email") == expected_email


params_user_info_with_home_conf_file = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

params_user_info_with_home_conf_file may not be needed - please see our discussion: https://github.com/diffpy/diffpy.utils/pull/236/files#r1885496247 @yucongalicechen

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is probably a different test case than that issue..?

{"expected_username": "cli_username", "expected_email": "[email protected]"},
),
(
{"cli_username": None, "cli_email": "[email protected]", "skip_config_creation": False},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have 5 variables used - cli_username, cli_email, skip_config, expected_username, expected_email, while we have a lots of repeated key values, making it a bi hard to manage.

can we pass variables to @pytest.mark.parametrize("input_username, input_email, ... "). Also, does it have to be called cli...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Bob, thanks for the feedback!! I think these repeated key values correspond to different test cases. Basically this function tries to get username/email from four places: (a) prompt input, (b) function arguments (which we called cli input because it was initially used by labpdfproc, where the function arguments are from cli), (c) local config file, (d) global config file, with priority order.

Currently we have four test functions that correspond to:
(1) we only have global config file, function prioritizes (b) to (d) (that is, replace (d) if (b) is non-empty).
(2) we have local config file with/without global config file, function prioritizes (b) to (c).
(3) we dont have config files, and user does not choose to skip config creation, then function prioritizes (a) to (b), and create a global config file if both username/email arguments are present.
(4) we dont have config files, and user chooses to skip config creation (this means we don't have (a)), then the function reads (b), and replace username/email with an empty string if no value given.

So I think all arguments and test values are necessary here. I will add a bit more comment to the tests so that it's clearer. We might be able to find a better way to organize the tests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @yucongalicechen for the comments. Yeah I had a quick look at the src code.

Username/email from 4 places:

  1. prompt
  2. function args
  3. local config
  4. global config

and the test functions are basically testing the input/out and reading/creating config files as needed..

Noted.

@pytest.mark.parametrize("inputsa, inputsb, expected", params_user_info_no_conf_file_no_inputs)
def test_get_user_info_no_conf_file_no_inputs(monkeypatch, inputsa, inputsb, expected, user_filesystem):
@pytest.mark.parametrize("cli_inputs, expected", params_user_info_no_conf_file_no_inputs)
def test_get_user_info_no_conf_file_no_inputs(monkeypatch, cli_inputs, expected, user_filesystem):
_setup_dirs(monkeypatch, user_filesystem)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you could help me understanding - what's the role of moneypatch and why do we need to setup a dir using _setup_dirs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is to set up the global and local file paths for the test cases because the function will try to read files from these filepaths

@sbillinge
Copy link
Contributor

sbillinge commented Dec 20, 2024

I thought about this more and, as I mentioned before, I think it makes more sense to disentangle the two workflows (1) get_user_info and (2) update_config. I sketched this in two images that I paste below. I don't necessarily expect you to be able to read them, but they show the structure.

The starting point of the UC is labpdfproc (LPP), for example, first running an update_global_config workflow, followed by a separate build_metadata workflow, i.e.,
PXL_20241220_105541819

where the build_metadata workflow looks like
PXL_20241220_105549384 MP

The build_metadata workflow is as we have now, runtime > local config > global config for uname, email, orcid.

The update_global_config checks for global_config. If the file is missing or if unam/email/orcid are missing in the file, it runs something like if not skip_config it prompts users for input and creates/updates the config file.

I think separating these makes them more reusable. I want to start using this (and the get_package_info) in all our codes so we start the arduous process of collecting better metadata.

return {
"username": _stringify(user_info.get("username", "")),
"email": _stringify(user_info.get("email", "")),
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simplified the workflow here

@yucongalicechen
Copy link
Contributor Author

@sbillinge I've simplified the workflow using the idea from the diagram, please check.
In the mean time I'll think about how to make the tests more readable as also suggested by @bobleesj (since we'll add orcid this will make the tests extremely long to account for different cases).

@yucongalicechen
Copy link
Contributor Author

replaced by #267

@yucongalicechen yucongalicechen deleted the user-info branch December 23, 2024 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

I think the get_user_info docs don't match the code implement a skip_config_creation option in get_user_info
3 participants